§ ONLINE API PDF → TEXT · REST · ANY LANGUAGE

The PDF to text REST API — extract text or search PDFs.

Two parameters. One endpoint. The text of any PDF. SelectPdf returns plain text or HTML from a PDF URL or upload — selectable-text PDFs (not OCR), page-range processing, search with case-sensitive and whole-words options.

7-day free trial · 200 conversions · no credit card

§CLIENT LIBRARIES

PDF to text API in eight languages.

Open-source client libraries on GitHub for seven runtimes plus a Go example using the standard library. Same API surface across runtimes — learn once, deploy anywhere.

PdfToText.cs
using System;
using System.Collections.Generic;
using SelectPdf.Api;

PdfToTextClient client = new PdfToTextClient(apiKey);
client.setStartPage(1);                       // first page to process
client.setEndPage(0);                         // 0 = process through the last page
client.setOutputFormat(OutputFormat.Text);    // 0 - Text, 1 - HTML

// --- Extract text from a remote PDF to a local file ---
client.getTextFromUrlToFile(testUrl, localFile);
Console.WriteLine("Pages processed: " + client.getNumberOfPages());

// --- Search a remote PDF for a phrase ---
IList<TextPosition> hits = client.searchUrl(testUrl, "pdf");
Console.WriteLine("Hits: " + hits.Count);
Console.WriteLine("Credits remaining: " + client.CreditsRemaining);
.NET (C#) CLIENT
Three lines, one extraction.

Bypass the raw HTTP wiring. The official client wraps the REST endpoint with named setters, file/stream convenience methods, and language-idiomatic error handling.

PdfToText.java
package com.selectpdf;

PdfToTextClient client = new PdfToTextClient(apiKey);
client.setStartPage(1);                              // first page
client.setEndPage(0);                                // 0 = through the last page
client.setOutputFormat(ApiEnums.OutputFormat.Text);  // Text or HTML

// --- Extract text from a remote PDF to a local file ---
client.getTextFromUrlToFile(testUrl, localFile);
System.out.printf("Pages processed: %d%n", client.getNumberOfPages());

// --- Search a remote PDF for a phrase ---
String results = client.searchUrl(testUrl, "pdf");
System.out.printf("Search results JSON: %s%n", results);

// usage telemetry
UsageClient usageClient = new UsageClient(apiKey);
System.out.println("Usage: " + usageClient.getUsage(false));
JAVA CLIENT

pdf-to-text.php
require("SelectPdf.Api.php");

$client = new SelectPdf\Api\PdfToTextClient($apiKey);
$client->setStartPage(1);
$client->setEndPage(0);
$client->setOutputFormat(SelectPdf\Api\OutputFormat::Text);  // Text or HTML

// --- Extract text from a remote PDF to a local file ---
$client->getTextFromUrlToFile($testUrl, $localFile);
echo "Pages processed: " . $client->getNumberOfPages() . "\n";

// --- Search a remote PDF for a phrase ---
$results = $client->searchUrl($testUrl, "pdf");
echo "Hits: " . count($results) . "\n";

// usage telemetry
$usageClient = new \SelectPdf\Api\UsageClient($apiKey);
$usage = $usageClient->getUsage(false);
echo "Credits remaining: " . $usage["available"] . "\n";
PHP CLIENT

pdf_to_text.py
import selectpdf

client = selectpdf.PdfToTextClient(apiKey)
client.setStartPage(1)
client.setEndPage(0)
client.setOutputFormat(selectpdf.OutputFormat.Text)  # 0 - Text, 1 - HTML

# --- Extract text from a remote PDF to a local file ---
client.getTextFromUrlToFile(testUrl, localFile)
print("Pages processed:", client.getNumberOfPages())

# --- Search a remote PDF for a phrase ---
results = client.searchUrl(testUrl, "pdf")
print("Hits:", len(results))

# usage telemetry
usageClient = selectpdf.UsageClient(apiKey)
print("Credits remaining:", usageClient.getUsage()["available"])
PYTHON CLIENT

pdf-to-text.js
var selectpdf = require('selectpdf');

var client = new selectpdf.PdfToTextClient(apiKey);
client.setStartPage(1).setEndPage(0).setOutputFormat(0);  // 0 = Text, 1 = HTML

// --- Extract text from a remote PDF to a local file ---
client.getTextFromUrlToFile(testUrl, localFile, function(err, file) {
    if (err) return console.error('Extract error:', err);
    console.log('Pages processed:', client.getNumberOfPages());

    // --- Search a remote PDF for a phrase ---
    client.searchUrl(testUrl, 'pdf', false, false, function(err, hits) {
        if (err) return console.error('Search error:', err);
        console.log('Hits:', hits.length);
    });
});
NODE.JS CLIENT

pdf_to_text.rb
require 'selectpdf'

client = SelectPdf::PdfToTextClient.new(api_key)
client.start_page = 1
client.end_page = 0
client.output_format = SelectPdf::OutputFormat::TEXT  # Text or HTML

# --- Extract text from a remote PDF to a local file ---
client.text_from_url_to_file(test_url, local_file)
print "Pages processed: #{client.number_of_pages}\n"

# --- Search a remote PDF for a phrase ---
results = client.search_url(test_url, 'pdf')
print "Hits: #{results.length}\n"

# usage telemetry
usage_client = SelectPdf::UsageClient.new(api_key)
print "Credits remaining: #{usage_client.get_usage(false)['available']}\n"
RUBY CLIENT

PdfToText.pl
use SelectPdf;
use JSON;

my $client = SelectPdf::PdfToTextClient->new($apiKey);
$client->setStartPage(1);
$client->setEndPage(0);
$client->setOutputFormat(0);  # 0 - Text, 1 - HTML

# --- Extract text from a remote PDF to a local file ---
$client->getTextFromUrlToFile($test_url, $local_file);
print "Pages processed: " . $client->getNumberOfPages() . "\n";

# --- Search a remote PDF for a phrase ---
my $results = $client->searchUrl($test_url, "pdf", "False", "False");
print "Hits: " . scalar(@$results) . "\n";

# usage telemetry
my $usageClient = SelectPdf::UsageClient->new($apiKey);
print "Credits: " . $usageClient->getUsage(0)->{"available"} . "\n";
PERL CLIENT

pdftotext.go
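// No dedicated Go SDK: POST /api2/pdftotext/ over net/http.
package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "mime/multipart"
    "net/http"
)

const (
    apiURL  = "https://selectpdf.com/api2/pdftotext/"
    apiKey  = "YOUR_LICENSE_KEY" // placeholder: replace with your key
    testUrl = "https://selectpdf.com/demo/files/selectpdf.pdf"
)

// ptt POSTs one /api2/pdftotext/ request and returns the response body.
func ptt(action, target, query string) ([]byte, error) {
    b := &bytes.Buffer{}
    w := multipart.NewWriter(b)
    w.WriteField("key", apiKey)
    w.WriteField("url", target)
    w.WriteField("action", action)
    if action == "Search" {
        w.WriteField("search_text", query)
    }
    if err := w.Close(); err != nil { // writes the closing boundary
        return nil, err
    }
    req, err := http.NewRequest(http.MethodPost, apiURL, b)
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", w.FormDataContentType())
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err // resp is nil here, so do not touch resp.Body
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    if err == nil && resp.StatusCode != http.StatusOK {
        err = fmt.Errorf("pdftotext: HTTP %d: %s", resp.StatusCode, body)
    }
    return body, err
}

func main() {
    text, err := ptt("Convert", testUrl, "") // extracted plain text
    if err != nil {
        log.Fatal(err)
    }
    hits, err := ptt("Search", testUrl, "pdf") // JSON array of TextPosition
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%d bytes of text, %d bytes of search JSON\n", len(text), len(hits))
}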
GO CLIENT

No dedicated client library for this runtime — the snippet uses the standard library to POST a multipart/form-data request directly to the REST endpoint.

CLIENT: net/http (no SDK)
ENDPOINT: POST /api2/pdftotext/
§WHAT IT DOES

Read any PDF. Text or HTML.

EXTRACT

Plain text from any PDF

Return the textual content of a PDF document as a plain-text payload. Same engine as the .NET PDF library — REST means any language, anywhere.

SEARCH

Find phrases with positions

Pass action=Search with search_text and receive a JSON array of every match — page index, rectangle, and surrounding context. Case-sensitive and whole-word toggles.

PAGE RANGE

Process just the pages you need

Set start_page and end_page to extract from a slice of a long PDF — table of contents, an appendix, the first chapter. end_page=0 means "to the end".

OUTPUT FORMAT

Plain text or HTML

Choose output_format=0 for clean plain text or output_format=1 for HTML that keeps line breaks and structure. The text_layout parameter picks between the original layout and a reading-order linearization.

INPUTS

URL or local PDF upload

Pass url for an online PDF, or upload a local PDF as multipart/form-data. Password-protected PDFs work too — set user_password and the API opens them.

SYNC + ASYNC

202 + poll for the long ones

Synchronous for fast extractions (up to timeout=120 s). For longer jobs, set async=True: get a job ID, poll GET /api2/asyncjob/ until the text is ready.

§HOW IT WORKS

One endpoint, two required things.

POST to /api2/pdftotext/ as multipart/form-data. Pass your license key and either a URL or an uploaded PDF file. The Convert action returns the extracted text in the response body; the Search action returns JSON with match positions.

POST — multipart/form-data
POST https://selectpdf.com/api2/pdftotext/
Content-Type: multipart/form-data
POST /api2/pdftotext/ HTTP/1.1
Host: selectpdf.com
Content-Type: multipart/form-data; boundary=----X

------X
Content-Disposition: form-data; name="key"

YOUR_LICENSE_KEY
------X
Content-Disposition: form-data; name="url"

https://selectpdf.com/demo/files/selectpdf.pdf
------X--
key REQUIRED

Your API license key. Get one in 30 seconds — free for 7 days.

url URL · OR · FILE

URL of the PDF document to process. URL-encode the value.

(file upload) URL · OR · FILE

Or upload a local PDF as multipart/form-data. The SDK handles the field name automatically.

Reminders. Action defaults to Convert. To search instead, pass action=Search with a search_text value. Page range defaults to the whole document (start_page=1, end_page=0). See the full reference in the 14-parameter section below.
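
As an illustration of the request shape described above, here is a minimal Python sketch that encodes the fields as multipart/form-data using only the standard library. The helper names (encode_form_fields, build_request) are hypothetical and not part of any SelectPdf client; the sketch builds the request without sending it.

```python
import uuid
import urllib.request

API_URL = "https://selectpdf.com/api2/pdftotext/"


def encode_form_fields(fields, boundary=None):
    """Encode text fields as multipart/form-data; return (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(str(value))
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_request(key, pdf_url, **options):
    """Build (but do not send) a POST request; options become extra form fields."""
    fields = {"key": key, "url": pdf_url, **options}
    body, content_type = encode_form_fields(fields)
    return urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

Passing the returned object to urllib.request.urlopen would send it; for a search, call build_request(key, pdf_url, action="Search", search_text="pdf").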
§ENDPOINTS

Three PDF to text API endpoints. One key.

All endpoints under https://selectpdf.com/api2/. The same license key authenticates every call.

POST /api2/pdftotext/: Extract text or search a PDF. The main endpoint, synchronous unless async=True. POST only (multipart/form-data); GET returns 400 from the controller.
GET or POST /api2/asyncjob/: Poll an async PDF-to-text job. Returns the result once ready, or HTTP 202 while still running.
GET or POST /api2/usage/: Read your subscription: current plan, monthly limit, used and remaining credits.
§PARAMETERS

Every parameter, in one place.

The full 14-parameter surface of the PDF-to-text endpoint. Use the side navigation to jump between groups, or the search box to find a parameter by name.

§ MANDATORY 2 PARAMETERS

Mandatory parameters

Only two parameters are required. Every other parameter falls back to a documented default.

key (string, REQUIRED)
Your API license key.

url (string, URL · OR · FILE)
URL of the PDF document to process (URL-encoded). Alternatively, upload a local PDF as a multipart file part. Any field name works; the controller takes the first file from any multipart file part.
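
For the upload variant, the body needs one file part next to the key field. The following Python sketch shows one way to assemble it by hand. The function name, the default filename and the field name pdf_file are all illustrative assumptions; per the description above, the server is said to accept any field name for the file part.

```python
import uuid


def encode_pdf_upload(key, pdf_bytes, filename="document.pdf",
                      field_name="pdf_file", boundary=None):
    """Return (body, content_type) for a multipart upload of a local PDF.

    field_name is arbitrary here (an assumption for illustration): the text
    above states that the first file part is used regardless of its name.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = f"--{boundary}".encode("ascii")
    file_header = (
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"'
    ).encode("utf-8")
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="key"',
        b"",
        key.encode("utf-8"),
        delimiter,
        file_header,
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,          # raw PDF bytes, sent unmodified
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and content type can be sent with any HTTP client, with the content type going into the Content-Type request header.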
§ COMMON OPTIONS 5 PARAMETERS

Options shared by both actions

Apply to both the Convert and Search actions. All optional — defaults are used when omitted.

action (enum: Convert, Search; default: Convert)
Specifies the action performed on the PDF. Convert extracts text or HTML; Search returns the positions of search_text.

start_page (int; default: 1)
Start page number in the PDF document.

end_page (int; default: 0)
End page number in the PDF document. 0 means process to the last page.

user_password (string)
Password used to open password-protected PDF documents.

timeout (number, in seconds; default: 30)
Maximum amount of time the synchronous job can run. Up to 120 seconds.
§ TEXT EXTRACTION 2 PARAMETERS

Convert-only parameters

Used only when action is Convert (the default).

text_layout (enum: 0, 1; default: 0)
Output text layout. 0 = Original (preserve the layout of the PDF); 1 = Reading Order (linearize for reading).

output_format (enum: 0, 1; default: 0)
Output format of the extracted text. 0 = Text; 1 = Html.
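
A small Python sketch of how a caller might assemble these options with their documented defaults before sending them. The function build_fields is hypothetical. The checks on start_page and end_page ordering are assumptions about sensible input, not documented server rules; only the 120-second timeout cap and the 0/1 enum values come from the tables above.

```python
def build_fields(key, url, action="Convert", start_page=1, end_page=0,
                 timeout=30, text_layout=0, output_format=0, search_text=None):
    """Return the form fields for one request, using the documented defaults."""
    if action not in ("Convert", "Search"):
        raise ValueError("action must be 'Convert' or 'Search'")
    if start_page < 1:
        raise ValueError("start_page must be 1 or greater")  # assumed rule
    if end_page != 0 and end_page < start_page:
        raise ValueError("end_page must be 0 (last page) or >= start_page")  # assumed rule
    if not 0 < timeout <= 120:
        raise ValueError("timeout must be > 0 and <= 120 seconds")
    fields = {"key": key, "url": url, "action": action,
              "start_page": start_page, "end_page": end_page, "timeout": timeout}
    if action == "Convert":
        # text_layout and output_format apply to the Convert action only.
        if text_layout not in (0, 1) or output_format not in (0, 1):
            raise ValueError("text_layout and output_format must be 0 or 1")
        fields.update(text_layout=text_layout, output_format=output_format)
    else:
        if not search_text:
            raise ValueError("Search requires search_text")
        fields["search_text"] = search_text
    return fields
```

Validating on the client side avoids spending a round trip on a request that would come back as 400.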
§ ADVANCED 2 PARAMETERS

Async mode & JSON parameter blob

Submit asynchronously when the sync timeout is not enough, or pass every parameter as a single JSON object instead of many form-data fields.

async (bool: True, False; default: False)
Submit the extraction asynchronously. The endpoint returns HTTP 202 with the job ID in the X-SelectPdf-Job-Id header; poll /api2/asyncjob/ to retrieve the text.

raw_parameters (string, JSON)
Alternative to multiple form-data fields: pass a JSON object containing all parameter values as a single form-data field. The controller deserializes it into PdfToTextParameters and then uses the same processing path. Convenient for SDKs and callers that prefer one-shot JSON over multipart key/value pairs.
§ASYNCHRONOUS CONVERSION

For PDFs that outlast your HTTP connection.

Recommended for large multi-page PDFs or callers that cannot keep an HTTP connection open for the full extraction. Submit once, poll the result when it is ready.

  1. POST /api2/pdftotext/ with async=True (and the rest of your parameters).
  2. The server replies 202 Accepted and returns the job ID in the X-SelectPdf-Job-Id response header.
  3. Poll GET /api2/asyncjob/?key=YOUR_KEY&job_id=… at a reasonable cadence.
  4. Each poll: 202 = still processing · 200 = result returned in the response body · 499 = extraction failed (plain-text reason in the body).
GET https://selectpdf.com/api2/asyncjob/?key=YOUR_KEY&job_id=JOB_ID

The async parameter belongs to /api2/pdftotext/ (see Advanced). job_id is a parameter of the /api2/asyncjob/ polling endpoint only, not of /api2/pdftotext/; the server returns its value in the X-SelectPdf-Job-Id response header of the 202 reply.
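
The polling steps above can be sketched in Python as follows. This is an illustration under assumptions, not an official client: poll_job and http_fetch are hypothetical names, and the 2-second interval and 60-poll cap are arbitrary choices since the text only asks for "a reasonable cadence".

```python
import time
import urllib.error
import urllib.parse
import urllib.request

ASYNC_URL = "https://selectpdf.com/api2/asyncjob/"


def http_fetch(key):
    """Return a fetch(job_id) callable that performs one polling GET."""
    def fetch(job_id):
        query = urllib.parse.urlencode({"key": key, "job_id": job_id})
        try:
            with urllib.request.urlopen(f"{ASYNC_URL}?{query}") as resp:
                return resp.status, resp.read().decode("utf-8")
        except urllib.error.HTTPError as err:
            # urllib raises for non-2xx codes such as 499.
            return err.code, err.read().decode("utf-8", "replace")
    return fetch


def poll_job(fetch, job_id, interval=2.0, max_polls=60, sleep=time.sleep):
    """Poll until the job finishes; return the result body or raise."""
    for _ in range(max_polls):
        status, body = fetch(job_id)
        if status == 200:
            return body  # extraction finished
        if status == 202:
            sleep(interval)  # still processing
            continue
        if status == 499:
            raise RuntimeError(f"extraction failed: {body}")
        raise RuntimeError(f"unexpected HTTP {status}: {body}")
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

A caller would take job_id from the X-SelectPdf-Job-Id header of the 202 reply and run poll_job(http_fetch(key), job_id). Injecting fetch and sleep keeps the loop testable without network access.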

§RESPONSE

HTTP status, plain English.

Standard HTTP semantics. The body of every non-200 contains a plain-text explanation — no error shapes to parse, no enum to memorize.

200
OK
The API call succeeded. If action is Convert, the response body is the extracted text or HTML. If action is Search, the body is JSON with the searched text positions.
202
Accepted — asynchronous job
The extraction was accepted asynchronously. The job ID is returned in the X-SelectPdf-Job-Id header; poll /api2/asyncjob/ until the result is ready.
400
Bad Request
URL or file not specified, or parameter validation failure. The response body explains in plain text.
401
Authorization Required
License key not specified or invalid. The response body contains an explanation in plain text.
415
Unsupported Media Type
The PDF to text API requires data to be posted as multipart/form-data.
429
Too Many Requests
Concurrency limit on your plan exceeded. Requests are either queued or rejected with 429. The Retry-After header indicates how long to back off before retrying.
499
Custom — Conversion Error
Something went wrong during the extraction (file unreadable, password incorrect, timeout, etc.). The body contains an explanation in plain text.
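
One way a client might act on these status codes, shown as a hedged Python sketch. The function next_step and its action labels are hypothetical, and the 1-second fallback when Retry-After is missing is an assumption. Header names are looked up exactly as written, so a case-insensitive header mapping should be normalized first.

```python
def next_step(status, headers=None):
    """Map a status code from the list above to a client-side action."""
    headers = headers or {}
    if status == 200:
        return ("done", None)
    if status == 202:
        # Async job accepted: the job ID arrives in a response header.
        return ("poll", headers.get("X-SelectPdf-Job-Id"))
    if status == 429:
        # Retry-After is given in seconds; 1 s is an assumed fallback.
        return ("retry", float(headers.get("Retry-After", 1)))
    if status in (400, 401, 415, 499):
        # The body carries a plain-text explanation; retrying will not help.
        return ("fail", None)
    return ("unknown", None)
```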
RESPONSE HEADERS
X-SelectPdf-Api: API version (currently v2).
X-SelectPdf-Pages: Number of pages in the processed PDF document.
X-SelectPdf-Job-Id: UUID of the asynchronous job (returned with HTTP 202).
X-SelectPdf-Mode: Endpoint mode. Always production for PDF-to-text (no public demo endpoint).
X-SelectPdf-Execution: Server execution path. Always worker for PDF-to-text (no in-process fallback).
X-SelectPdf-Credits-Total: Monthly conversion credit limit on the calling key. -1 means unlimited.
X-SelectPdf-Credits-Remaining: Credits remaining for the current month.
Retry-After: Returned with 429: suggested back-off in seconds.
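
The headers above can be read from a successful response roughly as in this Python sketch. The function summarize_headers is hypothetical, and it assumes the page and credit headers are present and hold integer values.

```python
def summarize_headers(headers):
    """Extract page count and credit figures from response headers.

    Accepts any mapping of header name to value; names are matched
    case-insensitively. credits_total is None when the header is -1.
    """
    h = {name.lower(): value for name, value in headers.items()}
    total = int(h["x-selectpdf-credits-total"])
    return {
        "pages": int(h["x-selectpdf-pages"]),
        "credits_total": None if total == -1 else total,  # -1 means unlimited
        "credits_remaining": int(h["x-selectpdf-credits-remaining"]),
    }
```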
§USAGE TRACKING

Read your subscription. Programmatically.

Two ways to know where your quota stands. Every successful extraction already returns X-SelectPdf-Credits-Total and X-SelectPdf-Credits-Remaining response headers — read those for a zero-extra-cost live view of remaining credits. Use the dedicated /api2/usage/ endpoint below when you also need the subscription tier or a month-by-month history.

GET https://selectpdf.com/api2/usage/?key=YOUR_KEY&get_history=True
POST https://selectpdf.com/api2/usage/
application/json
{
  "key": "YOUR_LICENSE_KEY",
  "get_history": "True"
}
response.json
{
  "status": "License key active.",
  "subscription_type": "Entry Level",
  "limit": 2000,
  "used": 340,
  "available": 1660,
  "history": [
    { "year": 2026, "month": 4,
      "conversions": 340, "credits": 340 },
    { "year": 2026, "month": 3,
      "conversions": 1876, "credits": 1923 }
  ]
}
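
A short Python sketch of reading that response, assuming the field names shown in the sample JSON above. The function summarize_usage and the percent_used figure are illustrative additions, not part of the API.

```python
import json


def summarize_usage(payload):
    """Condense a /api2/usage/ JSON response into a small report."""
    usage = json.loads(payload)
    limit, used = usage["limit"], usage["used"]
    history = usage.get("history", [])  # only present when get_history is set
    return {
        "plan": usage["subscription_type"],
        "available": usage["available"],
        # Guard against a non-positive limit (for example an unlimited plan).
        "percent_used": round(100 * used / limit, 1) if limit > 0 else None,
        "months": [(m["year"], m["month"], m["credits"]) for m in history],
    }
```

With the sample response above, the report shows 17.0 percent of the monthly limit used (340 of 2000 credits).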
§LIMITS

Concurrency scales with your plan.

Each plan allows a fixed number of simultaneous requests. Excess requests are queued or rejected with a 429 Too Many Requests. One conversion credit covers up to 50 PDF pages — a 1-page and a 49-page extraction cost the same. The API accepts PDF files up to 100 MB.

PLAN · CREDITS · CONCURRENT REQUESTS
Free Trial · 200 /mo · 1
Entry · 2,000 /mo · 2
Standard · 5,000 /mo · 4
Advanced · 20,000 /mo · 8
Premium · 50,000 /mo · 8
Ultra · 100,000 /mo · 16
Dedicated · Unlimited /mo · 16
CREDIT MATH
50 pages = 1 credit · 100 MB file cap

Each 50 pages of processed input counts as one conversion credit. A 12-page PDF, a 1-page PDF and a 49-page PDF all cost the same. The API accepts PDF files up to 100 MB and processes PDFs with selectable text only — this is not an OCR tool; PDFs with text embedded in images will not return useful output. See API pricing for plan details.
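
The credit rule amounts to rounding the page count up to the next multiple of 50. A minimal Python sketch, with credits_for as a hypothetical helper name and the rejection of page counts below 1 as an assumption:

```python
import math

PAGES_PER_CREDIT = 50


def credits_for(pages):
    """Return the conversion credits charged for a given processed page count."""
    if pages < 1:
        raise ValueError("pages must be at least 1")  # assumed lower bound
    return math.ceil(pages / PAGES_PER_CREDIT)
```

So credits_for(1), credits_for(12) and credits_for(49) each give 1, while credits_for(51) gives 2.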

?FAQ

PDF to text API, answered.

Six quick answers about the PDF-to-text REST API. For the full surface, see the 14-parameter reference above.

Can the API read password-protected PDFs?
Yes. Pass user_password with the document password and the API will open and extract text from password-protected PDFs.

Does the API perform OCR on scanned PDFs?
No. The API extracts text from PDFs that already have selectable text. It is not an OCR tool: PDFs whose pages are images of text will not return useful output.

What does the API return?
The Convert action (default) returns the extracted text or HTML in the response body. The Search action returns a JSON array of positions where search_text appears, with case_sensitive and whole_words_only options.

How is usage billed?
Each 50 pages of input PDF counts as one conversion credit. A 49-page PDF costs one credit, a 51-page PDF costs two. Monthly plans run from $19 to $449.

Can I process only part of a PDF?
Yes. Set start_page and end_page to restrict the page range. end_page=0 means "process to the last page".

How do I handle extractions that outlast the synchronous timeout?
Submit the request with async=True. The API returns 202 Accepted with a job ID in the X-SelectPdf-Job-Id header; poll GET /api2/asyncjob/?key=…&job_id=… until the text is returned. The maximum synchronous timeout parameter is 120 seconds.