SelectPdf Pdf To Text REST API is an online solution that lets you extract text from your PDF documents or search them for specific words. The API is easy to use, and the integration takes only a few minutes and a few lines of code.
The free trial key for the online API is valid for 7 days and it includes 200 conversions.
Features
- Extract text from PDF.
- Search PDF.
- Specify start and end page for partial file processing.
- Specify output format (plain text or html).
- Use a PDF from an online location (url) or upload a local PDF document.
Usage
SelectPdf Pdf To Text REST API is very easy to use. All you need is a license key to start extracting text from your PDFs. If you want to see how to do this in PHP, Java, Ruby, Python, Perl, Node.js, C# or VB.NET, go directly to the Code Examples section.
Here is a basic usage example using a POST request:
Method: POST
Content Type: multipart/form-data
POST /api2/pdftotext/ HTTP/1.1
Content-Type: multipart/form-data; boundary=---011000010111000001101001
Host: selectpdf.com
Content-Length: 273

-----011000010111000001101001
Content-Disposition: form-data; name="key"

_put___your___license___key___here__
-----011000010111000001101001
Content-Disposition: form-data; name="url"

https://selectpdf.com/demo/files/selectpdf.pdf
-----011000010111000001101001--
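The request above can be reproduced with nothing but the Python standard library. The sketch below is only an illustration: build_multipart is a hypothetical helper (not part of any SelectPdf client library) that encodes text fields the same way as the raw request, and the actual network call is left commented out because it needs a valid license key.

```python
import uuid


def build_multipart(fields, boundary=None):
    """Encode text form fields as a multipart/form-data body.

    Hypothetical helper for illustration; returns (content_type, body_bytes).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)  # each part starts with "--" + boundary
        lines.append('Content-Disposition: form-data; name="{0}"'.format(name))
        lines.append("")  # blank line separates part headers from the value
        lines.append(str(value))
    lines.append("--" + boundary + "--")  # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return "multipart/form-data; boundary=" + boundary, body


content_type, body = build_multipart(
    {
        "key": "Your API key here",
        "url": "https://selectpdf.com/demo/files/selectpdf.pdf",
    },
    boundary="---011000010111000001101001",
)
print(content_type)
print(body.decode("utf-8"))

# Sending the request (needs network access and a valid key):
# import urllib.request
# req = urllib.request.Request(
#     "https://selectpdf.com/api2/pdftotext/",
#     data=body,
#     headers={"Content-Type": content_type},
#     method="POST",
# )
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Fixing the boundary makes the printed body match the raw request shown above. In real code a random boundary (the default here) is the safer choice.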
Options
Mandatory Parameters
SelectPdf Pdf To Text REST API has only 2 mandatory parameters: the license key and the PDF to process (either a url or an uploaded file). All other parameters are optional. When an optional parameter is missing, its default value is used.
| Parameter | Description |
| key | Your API license key |
| url | The url of the PDF that is processed |
| — OR — | |
| === Single file upload === | |
Optional Parameters
These parameters are used by both “Convert” and “Search” actions.
| Parameter | Description |
| action | Specify the action performed on the PDF. Possible values: Convert, Search. Default value: Convert. |
| start_page | Start page number in the PDF. Default value is 1 (first page). |
| end_page | End page number in the PDF. Default value is 0 (process till the last page). |
| user_password | PDF document password (if the document is password protected). |
| timeout | Maximum amount of time (in seconds) for this job (up to 120 seconds allowed). Default value: 30 seconds. |
Text Extraction Parameters
These parameters are used only by “Convert” action.
| Parameter | Description |
| text_layout | Output text layout. Possible values: 0 – Original, 1 – Reading Order. Default value: 0. |
| output_format | Output format. Possible values: 0 – Text, 1 – Html. Default value: 0. |
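As a rough sketch of how the common and text extraction parameters fit together, the hypothetical helper below (convert_fields is not part of the SelectPdf client libraries) collects them into form fields for a "Convert" request. The defaults and allowed values are taken from the tables above; the range checks are local sanity checks added for illustration, not documented server behavior.

```python
def convert_fields(key, url, start_page=1, end_page=0, user_password=None,
                   timeout=30, text_layout=0, output_format=0):
    """Build the form fields for a 'Convert' request (illustrative helper)."""
    if start_page < 1:
        raise ValueError("start_page must be 1 or greater")
    if end_page < 0:
        raise ValueError("end_page must be 0 (last page) or a page number")
    if not 0 < timeout <= 120:
        raise ValueError("timeout must be between 1 and 120 seconds")
    if text_layout not in (0, 1):
        raise ValueError("text_layout must be 0 (Original) or 1 (Reading Order)")
    if output_format not in (0, 1):
        raise ValueError("output_format must be 0 (Text) or 1 (Html)")

    fields = {
        "key": key,
        "url": url,
        "action": "Convert",
        "start_page": str(start_page),
        "end_page": str(end_page),
        "timeout": str(timeout),
        "text_layout": str(text_layout),
        "output_format": str(output_format),
    }
    if user_password:
        # only sent for password protected documents
        fields["user_password"] = user_password
    return fields


# pages 1 to 3, reading order layout, HTML output
fields = convert_fields(
    "Your API key here",
    "https://selectpdf.com/demo/files/selectpdf.pdf",
    end_page=3,
    text_layout=1,
    output_format=1,
)
for name, value in fields.items():
    print(name, "=", value)
```

The resulting dictionary can be posted as multipart/form-data with any HTTP client.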
Search Parameters
These parameters are used only by “Search” action.
| Parameter | Description |
| search_text | Text to search. |
| case_sensitive | Specify if the search is case sensitive or not. Possible values: True, False. Default value: False. |
| whole_words_only | Specify if the search works on whole words or not. Possible values: True, False. Default value: False. |
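A "Search" request uses the same mandatory parameters plus the search options above. The sketch below assumes a hypothetical helper named search_fields; it only builds the form fields and sends the boolean options as the documented True / False strings. Rejecting an empty search_text is a local check added for illustration.

```python
def search_fields(key, url, search_text, case_sensitive=False,
                  whole_words_only=False):
    """Build the form fields for a 'Search' request (illustrative helper)."""
    if not search_text:
        raise ValueError("search_text must not be empty")
    return {
        "key": key,
        "url": url,
        "action": "Search",
        "search_text": search_text,
        "case_sensitive": str(bool(case_sensitive)),      # "True" or "False"
        "whole_words_only": str(bool(whole_words_only)),  # "True" or "False"
    }


fields = search_fields(
    "Your API key here",
    "https://selectpdf.com/demo/files/selectpdf.pdf",
    "pdf",
    whole_words_only=True,
)
print(fields["case_sensitive"], fields["whole_words_only"])  # prints: False True
```

A successful Search call returns JSON, so the response body can be parsed with json.loads. The layout of that JSON is not described on this page, so inspect a real response before relying on specific keys.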
Return Codes
Our API returns HTTP response codes, which you can check to see if the conversion was successful or not. Here is the list of HTTP codes used by SelectPdf REST API:
| Code | Description |
| 200 OK | The API call succeeded. If action is ‘Convert’, the extracted text is returned. If action is ‘Search’, a JSON document with the positions of the searched text is returned. |
| 400 Bad Request | Url not specified or file not uploaded. The body of the response contains an explanation in plain text. |
| 401 Authorization Required | License key not specified or invalid. The body of the response contains an explanation in plain text. |
| 415 Unsupported Media Type | The PDF to TEXT API requires data to be posted as multipart/form-data. |
| 499 Custom | Conversion error. The body of the response contains an explanation in plain text. |
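The codes in the table can be handled in one place. The following is a minimal sketch, assuming you already have the numeric status and the response body as text from whatever HTTP client you use; PdfToTextError and check_response are hypothetical names, not part of the SelectPdf client libraries.

```python
class PdfToTextError(Exception):
    """Raised for any non-200 response (hypothetical name)."""

    def __init__(self, status, message):
        super().__init__("HTTP {0}: {1}".format(status, message))
        self.status = status


# short reasons taken from the return codes table
REASONS = {
    400: "Url or file missing",
    401: "License key not specified or invalid",
    415: "Data must be posted as multipart/form-data",
    499: "Conversion error",
}


def check_response(status, body_text):
    """Return the body on 200 OK, otherwise raise PdfToTextError."""
    if status == 200:
        return body_text
    reason = REASONS.get(status, "Unexpected status")
    # the API puts a plain text explanation in the body of error responses
    detail = body_text.strip()
    raise PdfToTextError(status, reason + (" (" + detail + ")" if detail else ""))


print(check_response(200, "Extracted text"))
try:
    check_response(401, "Invalid license key")
except PdfToTextError as err:
    print(err.status, err)
```

Keeping the status on the exception lets callers decide, for example, to retry a 499 but fail fast on a 401.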
Remarks
- Concurrency: depending on payment plan, the following number of concurrent requests can be sent to the API:
Free Trial – 1 request
Entry Level – 2 requests
Standard Level – 4 requests
Advanced Level – 8 requests
Premium Level – 8 requests
Ultra Level – 16 requests
Dedicated Level – 16 requests.
If more requests are sent, they are either queued or rejected with a Too Many Requests error.
- PDF file size: The API accepts files up to 100 MB. Every 50 pages count as 1 conversion credit.
- The API can process PDF documents with selectable text. It will not work with text embedded in images. This is not an OCR tool.
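The concurrency and credit rules above can be respected on the client side. The sketch below is an illustration under two assumptions: a partial group of pages is rounded up to a full credit (so 51 pages cost 2 credits), and each job is a plain Python callable that performs one API request. PLAN_LIMITS, credits_for and run_limited are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# concurrent requests allowed per plan, from the list above
PLAN_LIMITS = {
    "Free Trial": 1,
    "Entry": 2,
    "Standard": 4,
    "Advanced": 8,
    "Premium": 8,
    "Ultra": 16,
    "Dedicated": 16,
}


def credits_for(pages):
    """Estimate conversion credits: 1 credit per started group of 50 pages.

    Rounding up is an assumption; the page only states "50 pages = 1 credit".
    """
    if pages < 1:
        raise ValueError("pages must be 1 or greater")
    return (pages + 49) // 50


def run_limited(jobs, plan):
    """Run callables with at most PLAN_LIMITS[plan] of them in flight."""
    with ThreadPoolExecutor(max_workers=PLAN_LIMITS[plan]) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [future.result() for future in futures]  # results in input order


print(credits_for(120))  # prints: 3

# stand-in jobs; real jobs would each call the API once
jobs = [lambda n=n: "document {0} done".format(n) for n in range(1, 4)]
print(run_limited(jobs, "Entry"))
```

Capping the worker pool at the plan limit keeps requests from being queued or rejected by the server for exceeding the allowed concurrency.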
Code Examples
Pdf To Text Conversion in PHP
<?php
require("SelectPdf.Api.php");

$testUrl = "https://selectpdf.com/demo/files/selectpdf.pdf";
$testPdf = "Input.pdf";
$localFile = "Result.txt";
$apiKey = "Your API key here";

echo ("This is SelectPdf-" . SelectPdf\Api\ApiClient::CLIENT_VERSION . ".\n");

try {
    $client = new SelectPdf\Api\PdfToTextClient($apiKey);

    // set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
    $client
        ->setStartPage(1) // first page to process
        ->setEndPage(0) // 0 = process to the last page
        ->setOutputFormat(SelectPdf\Api\OutputFormat::Text) // 0 - Text, 1 - Html
    ;

    echo ("Starting pdf to text ...\n");

    // extract text from a local pdf to a local text file
    $client->getTextFromFileToFile($testPdf, $localFile);

    // other available calls:
    // $text = $client->getTextFromFile($testPdf); // local pdf -> string
    // $client->getTextFromUrlToFile($testUrl, $localFile); // remote url -> file
    // $text = $client->getTextFromUrl($testUrl); // remote url -> string

    echo ("Finished! Number of pages processed: " . $client->getNumberOfPages() . ".\n");

    // get API usage
    $usageClient = new \SelectPdf\Api\UsageClient($apiKey);
    $usage = $usageClient->getUsage(false);
    echo ("Conversions remained this month: " . $usage["available"] . ".\n");
}
catch (Exception $ex) {
    echo ("An error occurred: " . $ex . ".\n");
}
?>
// Code snippet uses the SelectPdf API Client library for PHP.
Pdf To Text Conversion in Java
package com.selectpdf;

public class PdfToText {
    public static void main(String[] args) throws Exception {
        String testUrl = "https://selectpdf.com/demo/files/selectpdf.pdf";
        String testPdf = "Input.pdf";
        String localFile = "Result.txt";
        String apiKey = "Your API key here";

        System.out.println(String.format("This is SelectPdf-%s.", ApiClient.CLIENT_VERSION));

        try {
            PdfToTextClient client = new PdfToTextClient(apiKey);

            // set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
            client
                .setStartPage(1) // first page to process
                .setEndPage(0) // 0 = process to the last page
                .setOutputFormat(ApiEnums.OutputFormat.Text) // 0 - Text, 1 - HTML
            ;

            System.out.println("Starting pdf to text...");

            // extract text from a local pdf to a local text file
            client.getTextFromFileToFile(testPdf, localFile);

            // other available calls:
            // String text = client.getTextFromFile(testPdf); // local pdf -> string
            // client.getTextFromUrlToFile(testUrl, localFile); // remote url -> file
            // String text = client.getTextFromUrl(testUrl); // remote url -> string

            System.out.println(String.format("Finished! Number of pages: %d.", client.getNumberOfPages()));

            // get API usage
            UsageClient usageClient = new UsageClient(apiKey);
            String usage = usageClient.getUsage(false);
            System.out.printf("Usage details: %s.%n", usage);
        }
        catch (Exception ex) {
            System.out.println("An error occurred: " + ex.getMessage());
        }
    }
}
// Code snippet uses the SelectPdf API Client library for Java.
Pdf To Text Conversion in Ruby
require 'selectpdf'

$stdout.sync = true

print "This is SelectPdf-#{SelectPdf::CLIENT_VERSION}\n"

test_url = 'https://selectpdf.com/demo/files/selectpdf.pdf'
test_pdf = 'Input.pdf'
local_file = 'Result.txt'
api_key = 'Your API key here'

begin
  client = SelectPdf::PdfToTextClient.new(api_key)

  # set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
  client.start_page = 1 # first page to process
  client.end_page = 0 # 0 = process to the last page
  client.output_format = SelectPdf::OutputFormat::TEXT # Text or HTML

  print "Starting pdf to text ...\n"

  # extract text from a local pdf to a local text file
  client.text_from_file_to_file(test_pdf, local_file)

  # other available calls:
  # text = client.text_from_file(test_pdf) # local pdf -> string
  # client.text_from_url_to_file(test_url, local_file) # remote url -> file
  # text = client.text_from_url(test_url) # remote url -> string

  print "Finished! Number of pages processed: #{client.number_of_pages}.\n"

  # get API usage
  usage_client = SelectPdf::UsageClient.new(api_key)
  usage = usage_client.get_usage(false)
  print('Conversions remained this month: ', usage['available'], "\n")
rescue SelectPdf::ApiException => e
  print("An error occurred: #{e}")
end
# Code snippet uses the SelectPdf API Client library for Ruby.
Pdf To Text Conversion in Python
# -*- coding: utf-8 -*-
import selectpdf

testUrl = "https://selectpdf.com/demo/files/selectpdf.pdf"
testPdf = "Input.pdf"
localFile = "Result.txt"
apiKey = "Your API key here"

print ("This is SelectPdf-{0}.".format(selectpdf.CLIENT_VERSION))

try:
    client = selectpdf.PdfToTextClient(apiKey)

    # set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
    client.setStartPage(1) # first page to process
    client.setEndPage(0) # 0 = process to the last page
    client.setOutputFormat(selectpdf.OutputFormat.Text) # 0 - Text, 1 - HTML

    print ("Starting pdf to text ...")

    # extract text from a local pdf to a local text file
    client.getTextFromFileToFile(testPdf, localFile)

    # other available calls:
    # text = client.getTextFromFile(testPdf) # local pdf -> string
    # client.getTextFromUrlToFile(testUrl, localFile) # remote url -> file
    # text = client.getTextFromUrl(testUrl) # remote url -> string

    print ("Finished! Number of pages processed: {0}.".format(client.getNumberOfPages()))

    # get API usage
    usageClient = selectpdf.UsageClient(apiKey)
    usage = usageClient.getUsage()
    print ("Conversions remained this month: {0}.".format(usage["available"]))
except selectpdf.ApiException as ex:
    print ("An error occurred: {0}.".format(ex.getMessage()))
# Code snippet uses the SelectPdf API Client library for Python.
Pdf To Text Conversion in C#
using System;
using SelectPdf.Api;

namespace Samples
{
    public class PdfToText
    {
        public static void Main()
        {
            string testUrl = "https://selectpdf.com/demo/files/selectpdf.pdf";
            string testPdf = "Input.pdf";
            string localFile = "Result.txt";
            string apiKey = "Your API key here";

            Console.WriteLine("This is SelectPdf-{0}.", ApiClient.CLIENT_VERSION);

            try
            {
                PdfToTextClient client = new PdfToTextClient(apiKey);

                // set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
                client
                    .setStartPage(1) // first page to process
                    .setEndPage(0) // 0 = process to the last page
                    .setOutputFormat(OutputFormat.Text) // 0 - Text, 1 - HTML
                ;

                Console.WriteLine("Starting pdf to text ...");

                // extract text from a local pdf to a local text file
                client.getTextFromFileToFile(testPdf, localFile);

                // other available calls:
                // string text = client.getTextFromFile(testPdf); // local pdf -> string
                // client.getTextFromUrlToFile(testUrl, localFile); // remote url -> file
                // string text = client.getTextFromUrl(testUrl); // remote url -> string

                Console.WriteLine("Finished! Number of pages processed: {0}.", client.getNumberOfPages());

                // response telemetry (SDK 1.5.0+)
                Console.WriteLine("Mode: {0}, Execution: {1}.", client.Mode, client.ExecutionMode);
                Console.WriteLine("Credits remaining: {0} / {1}.",
                    client.CreditsRemaining, client.CreditsTotal);

                // get API usage
                UsageClient usageClient = new UsageClient(apiKey);
                UsageInformation usage = usageClient.getUsage(false);
                Console.WriteLine("Conversions remained this month: {0}.", usage.Available);
            }
            catch (ApiException ex)
            {
                Console.WriteLine("API error: {0}", ex.Message);
            }
            catch (Exception ex)
            {
                Console.WriteLine("An error occurred: " + ex.Message);
            }
        }
    }
}
// Code snippet uses the SelectPdf API Client library for .NET.
Pdf To Text Conversion in Go
There is no dedicated Go client library for the SelectPdf API. The example below uses Go’s standard library to POST a multipart/form-data request directly to the API endpoint.
package main

import (
    "bytes"
    "fmt"
    "io"
    "mime/multipart"
    "net/http"
    "os"
)

func main() {
    apiEndpoint := "https://selectpdf.com/api2/pdftotext/"
    apiKey := "Your API key here"
    testUrl := "https://selectpdf.com/demo/files/selectpdf.pdf"
    outFile := "Result.txt"

    body := &bytes.Buffer{}
    writer := multipart.NewWriter(body)

    // mandatory parameter
    _ = writer.WriteField("key", apiKey)

    // input PDF - either remote url or upload a local file
    _ = writer.WriteField("url", testUrl)

    // alternative: upload a local PDF as 'inputPdf'
    // f, err := os.Open("Input.pdf")
    // if err != nil { panic(err) }
    // defer f.Close()
    // part, _ := writer.CreateFormFile("inputPdf", "Input.pdf")
    // _, _ = io.Copy(part, f)

    // optional parameters - see full list at https://selectpdf.com/pdf-to-text-api/
    _ = writer.WriteField("action", "Convert") // Convert (default) or Search
    _ = writer.WriteField("start_page", "1")
    _ = writer.WriteField("end_page", "0") // 0 = until the end
    _ = writer.WriteField("output_format", "0") // 0 - Text, 1 - HTML

    if err := writer.Close(); err != nil { panic(err) }

    req, err := http.NewRequest("POST", apiEndpoint, body)
    if err != nil { panic(err) }
    req.Header.Set("Content-Type", writer.FormDataContentType())

    resp, err := http.DefaultClient.Do(req)
    if err != nil { panic(err) }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        msg, _ := io.ReadAll(resp.Body)
        fmt.Printf("API error %d: %s\n", resp.StatusCode, string(msg))
        return
    }

    out, err := os.Create(outFile)
    if err != nil { panic(err) }
    defer out.Close()

    if _, err := io.Copy(out, resp.Body); err != nil { panic(err) }

    fmt.Printf("Finished! Pages: %s. Text saved to %s.\n",
        resp.Header.Get("X-SelectPdf-Pages"), outFile)
}
Pdf To Text Conversion in Node.js
var selectpdf = require('selectpdf');

console.log("This is SelectPdf-%s.", selectpdf.CLIENT_VERSION);

try {
    var testUrl = 'https://selectpdf.com/demo/files/selectpdf.pdf';
    var testPdf = 'Input.pdf';
    var localFile = 'Result.txt';
    var apiKey = 'Your API key here';

    var client = new selectpdf.PdfToTextClient(apiKey);

    // set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
    client
        .setStartPage(1) // first page to process
        .setEndPage(0) // 0 = process to the last page
        .setOutputFormat(0) // 0 - Text, 1 - HTML
    ;

    console.log('Starting pdf to text ...');

    // extract text from a local pdf to a local text file
    client.getTextFromFileToFile(testPdf, localFile,
        function(err, fileName) {
            if (err) return console.error("An error occurred: " + err);

            console.log("Finished! Result is in file '" + fileName +
                "'. Number of pages processed: " + client.getNumberOfPages());

            var usageClient = new selectpdf.UsageClient(apiKey);
            usageClient.getUsage(false, function(err2, data) {
                if (err2) return console.error("Usage error: " + err2);
                console.log("Conversions remained this month: " + data["available"]);
            });
        }
    );

    // other available calls:
    /*
    client.getTextFromFile(testPdf, function(err, text) { ... }); // local pdf -> string
    client.getTextFromUrlToFile(testUrl, localFile, function(err, f) {...}); // remote url -> file
    client.getTextFromUrl(testUrl, function(err, text) { ... }); // remote url -> string
    */
}
catch (ex) {
    console.log("An error occurred: " + ex);
}
// Code snippet uses the SelectPdf API Client library for Node.js.
Test SelectPdf PDF To TEXT online RESTful API Now
Request a Demo License Key for SelectPdf REST API right now. Feel free to ask any questions if needed.
SelectPdf for Cloud’s platform-independent Pdf To Text API is a true REST API, so it can be used from any language or platform that supports REST: .NET, Java, PHP, Ruby, Python, JavaScript and many more. Almost all platforms and languages provide native REST clients, so there are no language or platform limitations, whether you work on web, desktop, mobile or cloud. Try the PDF to text API now for free.
