Package com.selectpdf

Class PdfToTextClient

java.lang.Object
com.selectpdf.ApiClient
com.selectpdf.PdfToTextClient

public class PdfToTextClient
extends ApiClient
Pdf To Text Conversion with SelectPdf Online API.
 
package com.selectpdf;

public class PdfToText {
    public static void main(String[] args) throws Exception {
        String testUrl = "https://selectpdf.com/demo/files/selectpdf.pdf";
        String testPdf = "Input.pdf";
        String localFile = "Result.txt";
        String apiKey = "Your API key here";

        System.out.println(String.format("This is SelectPdf-%s.", ApiClient.CLIENT_VERSION));

        try {
            PdfToTextClient client = new PdfToTextClient(apiKey);

            // set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
            client
                .setStartPage(1) // start page (processing starts from here)
                .setEndPage(0) // end page (set 0 to process the file until the end)
                .setOutputFormat(ApiEnums.OutputFormat.Text) // set output format (0-Text or 1-HTML)
            ;

            System.out.println("Starting pdf to text...");

            // convert local pdf to local text file
            client.getTextFromFileToFile(testPdf, localFile);

            // extract text from local pdf to memory
            // String text = client.getTextFromFile(testPdf);
            // print text
            // System.out.println(text);

            // convert pdf from public url to local text file
            // client.getTextFromUrlToFile(testUrl, localFile);

            // extract text from pdf from public url to memory
            // String text = client.getTextFromUrl(testUrl);
            // print text
            // System.out.println(text);

            System.out.println(String.format("Finished! Number of pages: %d.", client.getNumberOfPages()));

            // get API usage
            UsageClient usageClient = new UsageClient(apiKey);
            String usage = usageClient.getUsage(false);
            System.out.printf("Usage details: %s.\r\n", usage);

            // org.json.JSONObject usageObject = new org.json.JSONObject(usage);
            // int available = usageObject.getInt("available");
            // System.out.printf("Conversions remaining this month: %d.\r\n", available);

        }
        catch (Exception ex) {
            System.out.println("An error occurred: " + ex.getMessage());
        }
    }
        
}
 
 
  • Constructor Details

    • PdfToTextClient

      public PdfToTextClient(java.lang.String apiKey)
      Construct the Pdf To Text Client.
      Parameters:
      apiKey - API Key.
  • Method Details

    • getTextFromFile

      public java.lang.String getTextFromFile(java.lang.String inputPdf)
      Get the text from the specified PDF.
      Parameters:
      inputPdf - Path to a local PDF file.
      Returns:
      Extracted text.
    • getTextFromFileToFile

      public void getTextFromFileToFile(java.lang.String inputPdf, java.lang.String outputFilePath) throws java.io.IOException
      Get the text from the specified PDF and write it to the specified text file.
      Parameters:
      inputPdf - Path to a local PDF file.
      outputFilePath - The output file where the resulting text will be written.
      Throws:
      java.io.IOException
    • getTextFromFileToStream

      public void getTextFromFileToStream(java.lang.String inputPdf, java.io.OutputStream stream) throws java.io.IOException
      Get the text from the specified PDF and write it to the specified stream.
      Parameters:
      inputPdf - Path to a local PDF file.
      stream - The output stream where the resulting text will be written.
      Throws:
      java.io.IOException
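As a rough sketch of how the stream variant might be used, the extracted text can be captured in a ByteArrayOutputStream and decoded into a String. StreamExtractor below is a hypothetical stand-in with the same shape as getTextFromFileToStream, so the sketch compiles without the SelectPdf library. Decoding as UTF-8 is an assumption; the description above does not state the encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class StreamCaptureSketch {
    // Hypothetical stand-in with the same shape as
    // getTextFromFileToStream(String inputPdf, OutputStream stream).
    interface StreamExtractor {
        void extract(String inputPdf, OutputStream stream) throws IOException;
    }

    // Runs the extractor against an in-memory buffer and decodes the bytes.
    // UTF-8 is an assumption of this sketch, not a documented guarantee.
    static String captureText(StreamExtractor extractor, String inputPdf) throws IOException {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            extractor.extract(inputPdf, buffer);
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake extractor, used only so the sketch runs offline.
        StreamExtractor fake = (pdf, out) ->
            out.write(("text of " + pdf).getBytes(StandardCharsets.UTF_8));
        System.out.println(captureText(fake, "Input.pdf"));
    }
}
```

With a real client, the method reference client::getTextFromFileToStream should fit the same functional shape, since its signature takes a String and an OutputStream and throws IOException.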
    • getTextFromFileAsync

      public java.lang.String getTextFromFileAsync(java.lang.String inputPdf)
      Get the text from the specified PDF with an asynchronous call.
      Parameters:
      inputPdf - Path to a local PDF file.
      Returns:
      Extracted text.
    • getTextFromFileToFileAsync

      public void getTextFromFileToFileAsync(java.lang.String inputPdf, java.lang.String outputFilePath) throws java.io.IOException
      Get the text from the specified PDF with an asynchronous call and write it to the specified text file.
      Parameters:
      inputPdf - Path to a local PDF file.
      outputFilePath - The output file where the resulting text will be written.
      Throws:
      java.io.IOException
    • getTextFromFileToStreamAsync

      public void getTextFromFileToStreamAsync(java.lang.String inputPdf, java.io.OutputStream stream) throws java.io.IOException
      Get the text from the specified PDF with an asynchronous call and write it to the specified stream.
      Parameters:
      inputPdf - Path to a local PDF file.
      stream - The output stream where the resulting text will be written.
      Throws:
      java.io.IOException
    • getTextFromUrl

      public java.lang.String getTextFromUrl(java.lang.String url)
      Get the text from the PDF at the specified URL.
      Parameters:
      url - Address of the PDF file.
      Returns:
      Extracted text.
    • getTextFromUrlToFile

      public void getTextFromUrlToFile(java.lang.String url, java.lang.String outputFilePath) throws java.io.IOException
      Get the text from the PDF at the specified URL and write it to the specified text file.
      Parameters:
      url - Address of the PDF file.
      outputFilePath - The output file where the resulting text will be written.
      Throws:
      java.io.IOException
    • getTextFromUrlToStream

      public void getTextFromUrlToStream(java.lang.String url, java.io.OutputStream stream) throws java.io.IOException
      Get the text from the PDF at the specified URL and write it to the specified stream.
      Parameters:
      url - Address of the PDF file.
      stream - The output stream where the resulting text will be written.
      Throws:
      java.io.IOException
    • getTextFromUrlAsync

      public java.lang.String getTextFromUrlAsync(java.lang.String url)
      Get the text from the PDF at the specified URL with an asynchronous call.
      Parameters:
      url - Address of the PDF file.
      Returns:
      Extracted text.
    • getTextFromUrlToFileAsync

      public void getTextFromUrlToFileAsync(java.lang.String url, java.lang.String outputFilePath) throws java.io.IOException
      Get the text from the PDF at the specified URL with an asynchronous call and write it to the specified text file.
      Parameters:
      url - Address of the PDF file.
      outputFilePath - The output file where the resulting text will be written.
      Throws:
      java.io.IOException
    • getTextFromUrlToStreamAsync

      public void getTextFromUrlToStreamAsync(java.lang.String url, java.io.OutputStream stream) throws java.io.IOException
      Get the text from the PDF at the specified URL with an asynchronous call and write it to the specified stream.
      Parameters:
      url - Address of the PDF file.
      stream - The output stream where the resulting text will be written.
      Throws:
      java.io.IOException
    • searchFile

      public java.lang.String searchFile(java.lang.String inputPdf, java.lang.String textToSearch)
      Search for a specific text in a PDF document. The search is case insensitive and also matches partial words. The pages searched are set with the setStartPage() and setEndPage() methods.
      Parameters:
      inputPdf - Path to a local PDF file.
      textToSearch - Text to search.
      Returns:
      List with text positions in the current PDF document.
    • searchFile

      public java.lang.String searchFile(java.lang.String inputPdf, java.lang.String textToSearch, java.lang.Boolean caseSensitive, java.lang.Boolean wholeWordsOnly)
      Search for a specific text in a PDF document. The pages searched are set with the setStartPage() and setEndPage() methods.
      Parameters:
      inputPdf - Path to a local PDF file.
      textToSearch - Text to search.
      caseSensitive - Whether the search is case sensitive.
      wholeWordsOnly - Whether the search matches whole words only.
      Returns:
      List with text positions in the current PDF document.
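To make the two flags concrete, the sketch below reproduces their matching semantics locally with java.util.regex. It only illustrates what "case sensitive" and "whole words only" mean for a search term. It does not call the service, it is not how SelectPdf implements the search, and the character offsets it returns are not the format of the positions returned by searchFile.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchFlagsSketch {
    // Returns the start offset of every match of textToSearch in text.
    static List<Integer> find(String text, String textToSearch,
                              boolean caseSensitive, boolean wholeWordsOnly) {
        // Quote the term so it is matched literally, not as a regex.
        String regex = Pattern.quote(textToSearch);
        if (wholeWordsOnly) {
            regex = "\\b" + regex + "\\b";
        }
        int flags = caseSensitive ? 0 : (Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = Pattern.compile(regex, flags).matcher(text);
        List<Integer> offsets = new ArrayList<>();
        while (matcher.find()) {
            offsets.add(matcher.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "PDF to text; pdfs and Pdf.";
        // Two-argument behaviour: case insensitive, partial words allowed.
        System.out.println(find(text, "pdf", false, false)); // [0, 13, 22]
        // Whole words only: "pdfs" no longer matches.
        System.out.println(find(text, "pdf", false, true));  // [0, 22]
        // Case sensitive: only the lowercase occurrence matches.
        System.out.println(find(text, "pdf", true, false));  // [13]
    }
}
```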
    • searchFileAsync

      public java.lang.String searchFileAsync(java.lang.String inputPdf, java.lang.String textToSearch)
      Search for a specific text in a PDF document with an asynchronous call. The search is case insensitive and also matches partial words. The pages searched are set with the setStartPage() and setEndPage() methods.
      Parameters:
      inputPdf - Path to a local PDF file.
      textToSearch - Text to search.
      Returns:
      List with text positions in the current PDF document.
    • searchFileAsync

      public java.lang.String searchFileAsync(java.lang.String inputPdf, java.lang.String textToSearch, java.lang.Boolean caseSensitive, java.lang.Boolean wholeWordsOnly)
      Search for a specific text in a PDF document with an asynchronous call. The pages searched are set with the setStartPage() and setEndPage() methods.
      Parameters:
      inputPdf - Path to a local PDF file.
      textToSearch - Text to search.
      caseSensitive - Whether the search is case sensitive.
      wholeWordsOnly - Whether the search matches whole words only.
      Returns:
      List with text positions in the current PDF document.
    • searchUrl

      public java.lang.String searchUrl(java.lang.String url, java.lang.String textToSearch)
      Search for a specific text in a PDF document. The search is case insensitive and also matches partial words. The pages searched are set with the setStartPage() and setEndPage() methods.
      Parameters:
      url - Address of the PDF file.
      textToSearch - Text to search.
      Returns:
      List with text positions in the current PDF document.
    • searchUrl

      public java.lang.String searchUrl(java.lang.String url, java.lang.String textToSearch, java.lang.Boolean caseSensitive, java.lang.Boolean wholeWordsOnly)
      Search for a specific text in a PDF document. The pages searched are set with the setStartPage() and setEndPage() methods.
      Parameters:
      url - Address of the PDF file.
      textToSearch - Text to search.
      caseSensitive - Whether the search is case sensitive.
      wholeWordsOnly - Whether the search matches whole words only.
      Returns:
      List with text positions in the current PDF document.
    • searchUrlAsync

      public java.lang.String searchUrlAsync(java.lang.String url, java.lang.String textToSearch)
      Search for a specific text in a PDF document with an asynchronous call. The search is case insensitive and also matches partial words. The pages searched are set with the setStartPage() and setEndPage() methods.
      Parameters:
      url - Address of the PDF file.
      textToSearch - Text to search.
      Returns:
      List with text positions in the current PDF document.
    • searchUrlAsync

      public java.lang.String searchUrlAsync(java.lang.String url, java.lang.String textToSearch, java.lang.Boolean caseSensitive, java.lang.Boolean wholeWordsOnly)
      Search for a specific text in a PDF document with an asynchronous call. The pages searched are set with the setStartPage() and setEndPage() methods.
      Parameters:
      url - Address of the PDF file.
      textToSearch - Text to search.
      caseSensitive - Whether the search is case sensitive.
      wholeWordsOnly - Whether the search matches whole words only.
      Returns:
      List with text positions in the current PDF document.
    • setStartPage

      public PdfToTextClient setStartPage(int startPage)
      Set the start page number. Default value is 1 (first page of the document).
      Parameters:
      startPage - Start page number (1-based).
      Returns:
      Reference to the current object.
    • setEndPage

      public PdfToTextClient setEndPage(int endPage)
      Set the end page number. Default value is 0 (process through the last page of the document).
      Parameters:
      endPage - End page number (1-based), or 0 to process through the last page.
      Returns:
      Reference to the current object.
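The start and end page settings together describe a page range. The sketch below shows one way to read the documented defaults. PageRangeSketch and its resolve helper are hypothetical and not part of the client, and treating the end page as inclusive is an assumption. How the service handles an end page beyond the document length is not stated here, so the sketch leaves such values untouched.

```java
class PageRangeSketch {
    // Returns {firstPage, lastPage}, both 1-based.
    // endPage == 0 is read as "through the last page", per the setEndPage description.
    static int[] resolve(int startPage, int endPage, int totalPages) {
        if (startPage < 1) {
            // Local sanity check only; page numbers are documented as 1-based.
            throw new IllegalArgumentException("startPage is 1-based");
        }
        int lastPage = (endPage == 0) ? totalPages : endPage;
        return new int[] { startPage, lastPage };
    }

    public static void main(String[] args) {
        int[] whole = resolve(1, 0, 12); // defaults: the whole document
        int[] part = resolve(3, 5, 12);  // explicit sub-range
        System.out.println(whole[0] + "-" + whole[1]); // 1-12
        System.out.println(part[0] + "-" + part[1]);   // 3-5
    }
}
```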
    • setUserPassword

      public PdfToTextClient setUserPassword(java.lang.String userPassword)
      Set PDF user password.
      Parameters:
      userPassword - PDF user password.
      Returns:
      Reference to the current object.
    • setTextLayout

      public PdfToTextClient setTextLayout(ApiEnums.TextLayout textLayout)
      Set the text layout. The default value is TextLayout.Original.
      Parameters:
      textLayout - The text layout.
      Returns:
      Reference to the current object.
    • setOutputFormat

      public PdfToTextClient setOutputFormat(ApiEnums.OutputFormat outputFormat)
      Set the output format. The default value is OutputFormat.Text.
      Parameters:
      outputFormat - The output format.
      Returns:
      Reference to the current object.
    • setTimeout

      public PdfToTextClient setTimeout(int timeout)
      Set the maximum amount of time (in seconds) for this job. The default value is 30 seconds. Use a larger value (up to 120 seconds allowed) for large documents.
      Parameters:
      timeout - Timeout in seconds.
      Returns:
      Reference to the current object.
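Since the timeout has a documented default (30 seconds) and a documented ceiling (120 seconds), a caller may want to normalize a user-supplied value before passing it to setTimeout. The helper below is a hypothetical sketch of that idea. Falling back to the default for a missing or non-positive value is an assumption of the sketch, not documented client behaviour.

```java
class TimeoutSketch {
    static final int DEFAULT_SECONDS = 30; // documented default
    static final int MAX_SECONDS = 120;    // documented upper limit

    // Returns a value suitable for passing to setTimeout(int).
    // Null or non-positive input falls back to the default (sketch assumption).
    static int chooseTimeout(Integer requestedSeconds) {
        if (requestedSeconds == null || requestedSeconds <= 0) {
            return DEFAULT_SECONDS;
        }
        return Math.min(requestedSeconds, MAX_SECONDS);
    }

    public static void main(String[] args) {
        System.out.println(chooseTimeout(null)); // 30
        System.out.println(chooseTimeout(90));   // 90
        System.out.println(chooseTimeout(300));  // 120
    }
}
```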
    • setCustomParameter

      public PdfToTextClient setCustomParameter(java.lang.String parameterName, java.lang.String parameterValue)
      Set a custom parameter. Do not use this method unless advised by SelectPdf.
      Parameters:
      parameterName - Parameter name.
      parameterValue - Parameter value.
      Returns:
      Reference to the current object.