The PDF to text REST API — extract text or search PDFs.
Two parameters. One endpoint. The text of any PDF. SelectPdf returns plain text or HTML from a PDF URL or upload — selectable-text PDFs (not OCR), page-range processing, search with case-sensitive and whole-words options.
PDF to text API in eight languages.
Open-source client libraries on GitHub for seven runtimes plus a Go example using the standard library. Same API surface across runtimes — learn once, deploy anywhere.
using System;
using System.Collections.Generic;
using SelectPdf.Api;
PdfToTextClient client = new PdfToTextClient(apiKey);
client.setStartPage(1); // first page to process
client.setEndPage(0); // 0 = process to the end
client.setOutputFormat(OutputFormat.Text); // 0 - Text, 1 - HTML
// --- Extract text from a remote PDF to a local file ---
client.getTextFromUrlToFile(testUrl, localFile);
Console.WriteLine("Pages processed: " + client.getNumberOfPages());
// --- Search a remote PDF for a phrase ---
IList<TextPosition> hits = client.searchUrl(testUrl, "pdf");
Console.WriteLine("Hits: " + hits.Count);
Console.WriteLine("Credits remaining: " + client.CreditsRemaining);
package com.selectpdf;
PdfToTextClient client = new PdfToTextClient(apiKey);
client.setStartPage(1); // first page
client.setEndPage(0); // 0 = to the end
client.setOutputFormat(ApiEnums.OutputFormat.Text); // Text or HTML
// --- Extract text from a remote PDF to a local file ---
client.getTextFromUrlToFile(testUrl, localFile);
System.out.printf("Pages processed: %d%n", client.getNumberOfPages());
// --- Search a remote PDF for a phrase ---
String results = client.searchUrl(testUrl, "pdf");
System.out.printf("Search results JSON: %s%n", results);
// usage telemetry
UsageClient usageClient = new UsageClient(apiKey);
System.out.println("Usage: " + usageClient.getUsage(false));
require("SelectPdf.Api.php");
$client = new SelectPdf\Api\PdfToTextClient($apiKey);
$client->setStartPage(1);
$client->setEndPage(0);
$client->setOutputFormat(SelectPdf\Api\OutputFormat::Text); // Text or HTML
// --- Extract text from a remote PDF to a local file ---
$client->getTextFromUrlToFile($testUrl, $localFile);
echo "Pages processed: " . $client->getNumberOfPages() . "\n";
// --- Search a remote PDF for a phrase ---
$results = $client->searchUrl($testUrl, "pdf");
echo "Hits: " . count($results) . "\n";
// usage telemetry
$usageClient = new \SelectPdf\Api\UsageClient($apiKey);
$usage = $usageClient->getUsage(false);
echo "Credits remaining: " . $usage["available"] . "\n";
import selectpdf
client = selectpdf.PdfToTextClient(apiKey)
client.setStartPage(1)
client.setEndPage(0)
client.setOutputFormat(selectpdf.OutputFormat.Text) # 0 - Text, 1 - HTML
# --- Extract text from a remote PDF to a local file ---
client.getTextFromUrlToFile(testUrl, localFile)
print("Pages processed:", client.getNumberOfPages())
# --- Search a remote PDF for a phrase ---
results = client.searchUrl(testUrl, "pdf")
print("Hits:", len(results))
# usage telemetry
usageClient = selectpdf.UsageClient(apiKey)
print("Credits remaining:", usageClient.getUsage()["available"])
var selectpdf = require('selectpdf');
var client = new selectpdf.PdfToTextClient(apiKey);
client.setStartPage(1).setEndPage(0).setOutputFormat(0); // Text or HTML
// --- Extract text from a remote PDF to a local file ---
client.getTextFromUrlToFile(testUrl, localFile, function(err, file) {
    if (err) return console.error('Extract error:', err);
    console.log('Pages processed:', client.getNumberOfPages());
    // --- Search a remote PDF for a phrase ---
    client.searchUrl(testUrl, 'pdf', false, false, function(err, hits) {
        if (err) return console.error('Search error:', err);
        console.log('Hits:', hits.length);
    });
});
require 'selectpdf'
client = SelectPdf::PdfToTextClient.new(api_key)
client.start_page = 1
client.end_page = 0
client.output_format = SelectPdf::OutputFormat::TEXT # Text or HTML
# --- Extract text from a remote PDF to a local file ---
client.text_from_url_to_file(test_url, local_file)
print "Pages processed: #{client.number_of_pages}\n"
# --- Search a remote PDF for a phrase ---
results = client.search_url(test_url, 'pdf')
print "Hits: #{results.length}\n"
# usage telemetry
usage_client = SelectPdf::UsageClient.new(api_key)
print "Credits remaining: #{usage_client.get_usage(false)['available']}\n"
use SelectPdf;
use JSON;
my $client = SelectPdf::PdfToTextClient->new($apiKey);
$client->setStartPage(1);
$client->setEndPage(0);
$client->setOutputFormat(0); # 0 - Text, 1 - HTML
# --- Extract text from a remote PDF to a local file ---
$client->getTextFromUrlToFile($test_url, $local_file);
print "Pages processed: " . $client->getNumberOfPages() . "\n";
# --- Search a remote PDF for a phrase ---
my $results = $client->searchUrl($test_url, "pdf", "False", "False");
print "Hits: " . scalar(@$results) . "\n";
# usage telemetry
my $usageClient = SelectPdf::UsageClient->new($apiKey);
print "Credits: " . $usageClient->getUsage(0)->{"available"} . "\n";
// No dedicated Go SDK — POST /api2/pdftotext/ over net/http.
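package main

import (
	"bytes"
	"io"
	"log"
	"mime/multipart"
	"net/http"
)

const apiURL = "https://selectpdf.com/api2/pdftotext/"

// Placeholders: substitute your own license key and PDF URL.
const apiKey, testUrl = "YOUR_LICENSE_KEY", "https://selectpdf.com/demo/files/selectpdf.pdf"

// Helper: POST one /api2/pdftotext/ request and return the response body.
func ptt(action, target, query string) ([]byte, error) {
	b := &bytes.Buffer{}
	w := multipart.NewWriter(b)
	w.WriteField("key", apiKey)
	w.WriteField("url", target)
	w.WriteField("action", action)
	if action == "Search" {
		w.WriteField("search_text", query)
	}
	w.Close() // writes the closing multipart boundary
	req, err := http.NewRequest("POST", apiURL, b)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err // resp is nil on error, so return before touching resp.Body
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	text, err := ptt("Convert", testUrl, "") // extracted plain text
	if err != nil {
		log.Fatal(err)
	}
	hits, err := ptt("Search", testUrl, "pdf") // JSON array of TextPosition
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("extracted %d bytes; search result: %s", len(text), hits)
}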
Read any PDF. Text or HTML.
Plain text from any PDF
Return the textual content of a PDF document as a plain-text payload. Same engine as the .NET PDF library — REST means any language, anywhere.
Find phrases with positions
Pass action=Search with search_text and receive a JSON array of every match — page index, rectangle, and surrounding context. Case-sensitive and whole-word toggles.
Process just the pages you need
Set start_page and end_page to extract from a slice of a long PDF — table of contents, an appendix, the first chapter. end_page=0 means "to the end".
Plain text or HTML
Choose output_format=0 for clean plain text or =1 for HTML that keeps line breaks and structure. text_layout picks between the original layout and a reading-order linearization.
URL or local PDF upload
Pass url for an online PDF, or upload a local PDF as multipart/form-data. Password-protected PDFs work too — set user_password and the API opens them.
202 + poll for the long ones
Synchronous for fast extractions (up to timeout=120 s). For longer jobs, set async=True: get a job ID, poll GET /api2/asyncjob/ until the text is ready.
One endpoint, two required things.
POST to /api2/pdftotext/ as multipart/form-data. Pass your license key and either a URL or an uploaded PDF file. The Convert action returns the extracted text in the response body; the Search action returns JSON with match positions.
https://selectpdf.com/api2/pdftotext/
POST /api2/pdftotext/ HTTP/1.1
Host: selectpdf.com
Content-Type: multipart/form-data; boundary=----X

------X
Content-Disposition: form-data; name="key"

YOUR_LICENSE_KEY
------X
Content-Disposition: form-data; name="url"

https://selectpdf.com/demo/files/selectpdf.pdf
------X--
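The raw request above can be reproduced without an SDK. The following is a minimal sketch of the multipart/form-data encoding for text fields only, using the Python standard library. The helper name `encode_multipart` is hypothetical and not part of any SelectPdf client; file uploads are left out because the upload field name is not given here.

```python
import uuid


def encode_multipart(fields, boundary=None):
    """Encode text fields as a multipart/form-data body.

    Hypothetical helper for illustration. Returns (content_type, body_bytes).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")  # part delimiter: two dashes + boundary
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(str(value))
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body


# Same two required fields and the same boundary as the raw request.
content_type, body = encode_multipart(
    {
        "key": "YOUR_LICENSE_KEY",
        "url": "https://selectpdf.com/demo/files/selectpdf.pdf",
    },
    boundary="----X",
)
print(content_type)
print(body.decode("utf-8"))
```

To send it, the body and the `Content-Type` value could be passed to `urllib.request.Request(..., data=body, headers={"Content-Type": content_type}, method="POST")`, or the whole encoding step could be left to an HTTP library that builds multipart bodies itself.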
url
URL · OR · FILE
URL of the PDF document to process. URL-encode the value.
(file upload)
URL · OR · FILE
Or upload a local PDF as multipart/form-data. The SDK handles the field name automatically.
The action defaults to Convert. To search instead, pass action=Search with a search_text value.
Page range defaults to the whole document (start_page=1, end_page=0).
See the full reference in the 14-parameter section below.
Three PDF to text API endpoints. One key.
All endpoints under https://selectpdf.com/api2/. The same license key authenticates every call.
/api2/pdftotext/
Extract text or search a PDF. The main endpoint, synchronous unless async=True. POST only (multipart/form-data); GET returns 400 from the controller.
/api2/asyncjob/
Poll an async PDF-to-text job. Returns the result once ready, or HTTP 202 while still running.
/api2/usage/
Read your subscription: current plan, monthly limit, used and remaining credits.
Every parameter, in one place.
The full 14-parameter surface of the PDF-to-text endpoint.
Mandatory parameters
Only two parameters are required. Every other parameter falls back to a documented default.
key
url
Options shared by both actions
Apply to both the Convert and Search actions. All optional — defaults are used when omitted.
action
Convert
Search
start_page
end_page
user_password
timeout
Convert-only parameters
Used only when action is Convert (the default).
text_layout
0 (Original)
1 (Reading Order)
output_format
0 (Text)
1 (Html)
Search-only parameters
Used only when action is Search. Returns a JSON array of positions where search_text appears in the PDF.
search_text
case_sensitive
True
False
whole_words_only
True
False
Async mode & JSON parameter blob
Submit asynchronously when the sync timeout is not enough, or pass every parameter as a single JSON object instead of many form-data fields.
async
True
False
raw_parameters
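The parameter groups above can be sketched as a small builder that applies the documented defaults (action=Convert, start_page=1, end_page=0) and keeps Convert-only and Search-only options apart. The function name `build_params` and the `run_async` argument are hypothetical. Rejecting mismatched options on the client side is a choice made for this sketch, not documented server behavior, and `raw_parameters` is left out because its exact JSON layout is not described here.

```python
def build_params(key, url, action="Convert", start_page=1, end_page=0,
                 run_async=False, **options):
    """Return the form fields for one /api2/pdftotext/ call (hypothetical helper)."""
    shared = {"user_password", "timeout"}
    per_action = {
        "Convert": {"text_layout", "output_format"},
        "Search": {"search_text", "case_sensitive", "whole_words_only"},
    }
    if action not in per_action:
        raise ValueError(f"unknown action: {action!r}")
    # Client-side strictness is this sketch's choice, not documented server behavior.
    stray = set(options) - shared - per_action[action]
    if stray:
        raise ValueError(f"not used with action={action}: {sorted(stray)}")
    if action == "Search" and not options.get("search_text"):
        raise ValueError("action=Search requires search_text")

    params = {"key": key, "url": url, "action": action,
              "start_page": start_page, "end_page": end_page}
    if run_async:
        params["async"] = "True"  # 'async' is a Python keyword, hence run_async
    params.update(options)
    return params


print(build_params(
    "YOUR_LICENSE_KEY",
    "https://selectpdf.com/demo/files/selectpdf.pdf",
    action="Search",
    search_text="pdf",
    whole_words_only="True",
))
```

The returned dictionary maps one-to-one onto form-data fields, so it can be handed to whatever multipart encoder the caller already uses.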
For PDFs that outlast your HTTP connection.
Recommended for large multi-page PDFs or callers that cannot keep an HTTP connection open for the full extraction. Submit once, poll the result when it is ready.
- POST /api2/pdftotext/ with async=True (and the rest of your parameters).
- The server replies 202 Accepted and returns the job ID in the X-SelectPdf-Job-Id response header.
- Poll GET /api2/asyncjob/?key=YOUR_KEY&job_id=… at a reasonable cadence.
- Each poll: 202 = still processing · 200 = result returned in the response body · 499 = extraction failed (plain-text reason in the body).
https://selectpdf.com/api2/asyncjob/?key=YOUR_KEY&job_id=JOB_ID
The async parameter belongs to /api2/pdftotext/ (see Advanced); job_id is a parameter of the /api2/asyncjob/ polling endpoint, not of this one — the server returns the value to you in the X-SelectPdf-Job-Id response header.
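The polling flow above can be sketched as a loop that is independent of any HTTP library. In this sketch, `poll` is a caller-supplied function (an assumption for illustration) that performs one GET against /api2/asyncjob/ with `key` and `job_id` and returns `(status_code, body)`. The names `wait_for_result` and `ExtractionFailed`, the 2-second interval and the 30-poll cap are all hypothetical choices, not values from the API.

```python
import time


class ExtractionFailed(Exception):
    """Raised when a poll returns 499; carries the plain-text reason."""


def wait_for_result(poll, interval=2.0, max_polls=30, sleep=time.sleep):
    """Call poll() until the async job finishes and return the result body."""
    for _ in range(max_polls):
        status, body = poll()
        if status == 200:  # result returned in the response body
            return body
        if status == 499:  # extraction failed, body holds the reason
            raise ExtractionFailed(body)
        if status != 202:  # anything else is outside the documented flow
            raise RuntimeError(f"unexpected status {status}: {body}")
        sleep(interval)  # 202: still processing, wait and poll again
    raise TimeoutError(f"job still running after {max_polls} polls")


# Offline demonstration with canned responses instead of real HTTP calls.
responses = iter([(202, ""), (202, ""), (200, "extracted text")])
print(wait_for_result(lambda: next(responses), sleep=lambda seconds: None))
```

Injecting `poll` and `sleep` keeps the loop testable without network access; a real `poll` would wrap one request to the polling URL shown above.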
HTTP status, plain English.
Standard HTTP semantics. The body of every non-200 contains a plain-text explanation — no error shapes to parse, no enum to memorize.
200: If action is Convert, the response body is the extracted text or HTML. If action is Search, the body is JSON with the searched text positions.
202: The job ID is returned in the X-SelectPdf-Job-Id header; poll /api2/asyncjob/ until the result is ready.
[…] multipart/form-data.
429: The Retry-After header indicates how long to back off before retrying.
[…] v2).
[…] production for PDF-to-text (no public demo endpoint).
[…] worker for PDF-to-text (no in-process fallback).
[…] -1 means unlimited.
Retry-After: on 429, the suggested back-off in seconds.
Read your subscription. Programmatically.
Two ways to know where your quota stands. Every successful extraction already returns X-SelectPdf-Credits-Total and X-SelectPdf-Credits-Remaining response headers — read those for a zero-extra-cost live view of remaining credits. Use the dedicated /api2/usage/ endpoint below when you also need the subscription tier or a month-by-month history.
https://selectpdf.com/api2/usage/?key=YOUR_KEY&get_history=True
https://selectpdf.com/api2/usage/
{
"key": "YOUR_LICENSE_KEY",
"get_history": "True"
}
{
"status": "License key active.",
"subscription_type": "Entry Level",
"limit": 2000,
"used": 340,
"available": 1660,
"history": [
{ "year": 2026, "month": 4,
"conversions": 340, "credits": 340 },
{ "year": 2026, "month": 3,
"conversions": 1876, "credits": 1923 }
]
}
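As a rough illustration of consuming the usage response, the sketch below parses the sample JSON above and derives a percentage used and a per-month credit table. The function name `summarize_usage` is hypothetical, and treating a non-positive `limit` as "no fixed limit" is an assumption of this sketch rather than a documented rule.

```python
import json


def summarize_usage(payload):
    """Condense a /api2/usage/ JSON response (hypothetical helper)."""
    data = json.loads(payload)
    limit, used = data["limit"], data["used"]
    # Assumption: a non-positive limit means no fixed limit, so no percentage.
    percent_used = round(100 * used / limit, 1) if limit > 0 else None
    return {
        "plan": data["subscription_type"],
        "available": data["available"],
        "percent_used": percent_used,
        "credits_by_month": {
            f'{h["year"]}-{h["month"]:02d}': h["credits"]
            for h in data.get("history", [])
        },
    }


sample = """
{
  "status": "License key active.",
  "subscription_type": "Entry Level",
  "limit": 2000,
  "used": 340,
  "available": 1660,
  "history": [
    {"year": 2026, "month": 4, "conversions": 340, "credits": 340},
    {"year": 2026, "month": 3, "conversions": 1876, "credits": 1923}
  ]
}
"""
print(summarize_usage(sample))
```

In the sample history, March shows more credits than conversions, which is consistent with some extractions costing more than one credit.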
Concurrency scales with your plan.
Each plan allows a fixed number of simultaneous requests. Excess requests are queued or rejected with a 429 Too Many Requests. One conversion credit covers up to 50 PDF pages — a 1-page and a 49-page extraction cost the same. The API accepts PDF files up to 100 MB.
Credits are counted on processed input: each 50 pages counts as one conversion credit. The API processes PDFs with selectable text only; it is not an OCR tool, and PDFs whose text is embedded in images will not return useful output. See API pricing for plan details.
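The credit rule above amounts to a ceiling division. A minimal sketch, assuming a partial block of pages is rounded up to a full credit and that only the pages actually processed count (the function name `credits_for` is hypothetical):

```python
import math

PAGES_PER_CREDIT = 50  # one conversion credit covers up to 50 PDF pages


def credits_for(pages):
    """Estimate the credits charged for processing `pages` pages."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    # Assumption: any partial block of 50 pages is charged as a full credit.
    return math.ceil(pages / PAGES_PER_CREDIT)


for pages in (1, 12, 49, 50, 51, 120):
    print(pages, "pages ->", credits_for(pages), "credit(s)")
```

Under this assumption, narrowing a long document with start_page and end_page reduces the pages processed and therefore the estimate.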
PDF to text API, answered.
Six quick answers about the PDF-to-text REST API. For the full surface, see the 14-parameter reference above.
Set user_password with the document password and the API will open and extract text from password-protected PDFs.
The Convert action (default) returns the extracted text or HTML in the response body. The Search action returns a JSON array of positions where search_text appears, with case_sensitive and whole_words_only options.
Use start_page and end_page to restrict the page range. end_page=0 means "process to the last page".
Set async=True. The API returns 202 Accepted with a job ID in the X-SelectPdf-Job-Id header; poll GET /api2/asyncjob/?key=…&job_id=… until the text is returned. The maximum value of the synchronous timeout parameter is 120 seconds.