SelectPdf Online REST API – Python Client Library

SelectPdf Online REST API is a platform-independent PDF manipulation API. Because it is cloud-based, it can be used from any language: .NET (C# or VB.NET), Java, PHP, Python, Go, Ruby, Node.js, Perl and many more. Today we present the dedicated Python client library for the SelectPdf API.

The SelectPdf Python client library can be used to take advantage of the features offered by the SelectPdf Online REST API:

HTML to PDF REST API – Use the SelectPdf HTML to PDF Online REST API to generate PDF documents from web page URLs or raw HTML code.

PDF to TEXT REST API – The SelectPdf PDF to Text REST API is a cloud-based solution that can be used to extract text from PDF documents or to search PDF documents for specific words.

PDF Merge REST API – SelectPdf Pdf Merge REST API is an online solution that can be used to merge local or remote PDFs into a final PDF document.

All these APIs can be easily integrated with Python scripts and applications using the dedicated client library.

Installation

Download selectpdf-api-python-client-1.4.0.zip, unzip it and run:

cd selectpdf-api-python-client-1.4.0
python setup.py install

OR

Install the SelectPdf Python Client for Online API from PyPI (SelectPdf API on PyPI):

pip install selectpdf
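Whichever installation method is used, a quick check can confirm that the package is importable. The sketch below relies only on the standard library (Python 3, since `importlib.util.find_spec` is not available in Python 2) and assumes the importable module is named `selectpdf`, as in the samples further down.

```python
import importlib.util


def selectpdf_installed():
    """Return True if a module named 'selectpdf' can be found on the import path."""
    return importlib.util.find_spec("selectpdf") is not None


if __name__ == "__main__":
    if selectpdf_installed():
        import selectpdf
        # CLIENT_VERSION is the attribute the samples below print
        print("selectpdf client version:", selectpdf.CLIENT_VERSION)
    else:
        print("selectpdf is not installed; try: pip install selectpdf")
```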

OR

Clone selectpdf-api-python-client from GitHub and install the library:

git clone https://github.com/selectpdf/selectpdf-api-python-client
cd selectpdf-api-python-client
python setup.py install

Get a trial key for SelectPdf online REST API

Once the library is installed, you need a key to access the API.
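The samples below hard-code the key as `apiKey = "Your API key here"`. A common alternative is to keep the key out of the source and read it from an environment variable. This is a minimal sketch; the variable name `SELECTPDF_API_KEY` is a hypothetical choice for this example, and the client library does not read it on its own.

```python
import os


def load_api_key(env_var="SELECTPDF_API_KEY"):
    """Read the SelectPdf API key from an environment variable.

    SELECTPDF_API_KEY is a hypothetical name chosen for this sketch;
    selectpdf itself does not look at this variable.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError("Set the {0} environment variable to your API key.".format(env_var))
    return key
```

The result can then be passed wherever the samples use `apiKey`, for example `selectpdf.HtmlToPdfClient(load_api_key())`.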

The free trial key for the online API is valid for 7 days and it includes 200 conversions.
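Since the trial key is limited to a fixed number of conversions, it can help to check the remaining quota before starting a batch. Every sample below reads the remaining count from `usage["available"]`, where `usage` is the value returned by `selectpdf.UsageClient(apiKey).getUsage()`. The helper below is a sketch built on that assumption only; it does not call the API itself.

```python
def has_enough_conversions(usage, needed=1):
    """Return True if the usage data reports at least `needed` conversions left.

    Assumes `usage` is the dict returned by UsageClient.getUsage(), which the
    samples read through the "available" key. A missing key counts as zero.
    """
    available = usage.get("available", 0)
    return int(available) >= needed
```

In a script it would be called as `has_enough_conversions(usageClient.getUsage(), needed=3)` before running three conversions, skipping the batch when it returns False.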

Sample Code

The Python client library makes accessing the SelectPdf online REST API very easy. Here are a few samples that present the main features of the API. For details and the full list of parameters, see the individual API pages: HTML to PDF API, PDF to TEXT API or PDF Merge API.

Convert HTML to PDF in Python

The following sample shows the main features of the HTML to PDF API. Comment or uncomment the relevant lines to convert a URL or raw HTML, either to a file or to memory.

# -*- coding: utf-8 -*-

import sys, json
import selectpdf

url = "https://selectpdf.com"
localFile = "Test.pdf"
apiKey = "Your API key here"

pythonVersion = "Python 3" if selectpdf.IS_PYTHON3 else "Python 2"
print ("This is SelectPdf-{0} using {1}.".format(selectpdf.CLIENT_VERSION, pythonVersion))

try:
    client = selectpdf.HtmlToPdfClient(apiKey)

    # set parameters - see full list at https://selectpdf.com/html-to-pdf-api/

    # main properties

    client.setPageSize(selectpdf.PageSize.A4) # PDF page size
    client.setPageOrientation(selectpdf.PageOrientation.Portrait) # PDF page orientation
    client.setMargins(0) # PDF page margins
    client.setRenderingEngine(selectpdf.RenderingEngine.WebKit) # rendering engine
    client.setConversionDelay(1) # conversion delay
    client.setNavigationTimeout(30) # navigation timeout
    client.setShowPageNumbers(False) # page numbers
    client.setPageBreaksEnhancedAlgorithm(True) # enhanced page break algorithm

    # additional properties
    
    # client.setUseCssPrint(True) # enable CSS media print
    # client.setDisableJavascript(True) # disable javascript
    # client.setDisableInternalLinks(True) # disable internal links
    # client.setDisableExternalLinks(True) # disable external links
    # client.setKeepImagesTogether(True) # keep images together
    # client.setScaleImages(True) # scale images to create smaller pdfs
    # client.setSinglePagePdf(True) # generate a single page PDF
    # client.setUserPassword("password") # secure the PDF with a password

    # generate automatic bookmarks

    # client.setPdfBookmarksSelectors("H1, H2") # create outlines (bookmarks) for the specified elements
    # client.setViewerPageMode(selectpdf.PageMode.UseOutlines) # display outlines (bookmarks) in viewer

    print ("Starting conversion ...")
    
    # convert url to file
    client.convertUrlToFile(url, localFile)

    # convert url to memory
    # pdf = client.convertUrl(url)

    # convert html string to file
    # client.convertHtmlStringToFile("This is some html.", localFile)

    # convert html string to memory
    # pdf = client.convertHtmlString("This is some html.")

    print ("Finished! Number of pages: {0}.".format(client.getNumberOfPages()))

    # get API usage
    usageClient = selectpdf.UsageClient(apiKey)
    usage = usageClient.getUsage()
    print("Conversions remaining this month: {0}.".format(usage["available"]))

except selectpdf.ApiException as ex:
    print ("An error occurred: {0}.".format(ex.getMessage()))
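Instead of commenting and uncommenting lines, the choice between the four conversion calls can be made by a small helper. This is a sketch that only uses the method names shown in the sample above (`convertUrlToFile`, `convertUrl`, `convertHtmlStringToFile`, `convertHtmlString`); the helper name `convert` and its parameters are hypothetical.

```python
def convert(client, source, output_file=None, is_html=False):
    """Run one HTML to PDF conversion with a selectpdf.HtmlToPdfClient.

    source      -- a URL, or raw HTML when is_html is True
    output_file -- if given, the PDF is written there and None is returned;
                   otherwise the PDF content is returned in memory
    """
    if is_html:
        if output_file:
            client.convertHtmlStringToFile(source, output_file)
            return None
        return client.convertHtmlString(source)
    if output_file:
        client.convertUrlToFile(source, output_file)
        return None
    return client.convertUrl(source)
```

For example, `pdf = convert(client, "This is some html.", is_html=True)` keeps the result in memory, while `convert(client, url, localFile)` saves it to disk. Any `selectpdf.ApiException` raised by the client still propagates, so the existing `try`/`except` continues to apply.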

Convert HTML to PDF with custom header/footer in Python

The following sample shows how to convert a web page to PDF and set a custom header or footer.

# -*- coding: utf-8 -*-

import sys, json
import selectpdf

url = "https://selectpdf.com"
localFile = "Test.pdf"
apiKey = "Your API key here"

pythonVersion = "Python 3" if selectpdf.IS_PYTHON3 else "Python 2"
print ("This is SelectPdf-{0} using {1}.".format(selectpdf.CLIENT_VERSION, pythonVersion))

try:
    client = selectpdf.HtmlToPdfClient(apiKey)

    # set parameters - see full list at https://selectpdf.com/html-to-pdf-api/

    client.setMargins(0) # PDF page margins
    client.setPageBreaksEnhancedAlgorithm(True) # enhanced page break algorithm

    # header properties

    client.setShowHeader(True) # display header
    # client.setHeaderHeight(50) # header height
    # client.setHeaderUrl(url) # header url
    client.setHeaderHtml("This is the HEADER!!!!") # header html

    # footer properties

    client.setShowFooter(True) # display footer
    # client.setFooterHeight(60) # footer height
    # client.setFooterUrl(url) # footer url
    client.setFooterHtml("This is the FOOTER!!!!") # footer html

    # footer page numbers
    
    client.setShowPageNumbers(True) # show page numbers in footer
    client.setPageNumbersTemplate("{page_number} / {total_pages}") # page numbers template
    client.setPageNumbersFontName("Verdana") # page numbers font name
    client.setPageNumbersFontSize(12) # page numbers font size
    client.setPageNumbersAlignment(selectpdf.PageNumbersAlignment.Center) # page numbers alignment (2-Center)

    print ("Starting conversion ...")
    
    # convert url to file
    client.convertUrlToFile(url, localFile)

    # convert url to memory
    # pdf = client.convertUrl(url)

    # convert html string to file
    # client.convertHtmlStringToFile("This is some html.", localFile)

    # convert html string to memory
    # pdf = client.convertHtmlString("This is some html.")

    print ("Finished! Number of pages: {0}.".format(client.getNumberOfPages()))

    # get API usage
    usageClient = selectpdf.UsageClient(apiKey)
    usage = usageClient.getUsage()
    print("Conversions remaining this month: {0}.".format(usage["available"]))

except selectpdf.ApiException as ex:
    print ("An error occurred: {0}.".format(ex.getMessage()))
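The page numbers template is passed to `setPageNumbersTemplate` as a plain string, and the `{page_number}` and `{total_pages}` placeholders are filled in for each page during conversion. Because Python's `str.format` and f-strings use the same brace syntax, formatting the template locally before passing it would replace the placeholders too early. The sketch below only previews what the footer is expected to read like, assuming those two placeholders are the only ones in the template; the helper name is hypothetical.

```python
PAGE_NUMBERS_TEMPLATE = "{page_number} / {total_pages}"


def preview_page_numbers(template, page_number, total_pages):
    """Show locally how a page numbers template is expected to render.

    Preview only: pass the raw template, not this result, to
    client.setPageNumbersTemplate so each page gets its own number.
    """
    return template.format(page_number=page_number, total_pages=total_pages)


if __name__ == "__main__":
    for page in range(1, 4):
        print(preview_page_numbers(PAGE_NUMBERS_TEMPLATE, page, 3))
```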

Extract text from PDF in Python

The following sample shows how to extract the text from a PDF document using the SelectPdf API. Comment or uncomment the relevant lines to extract text from a local PDF or from a PDF at a remote URL, either to a file or to memory.

# -*- coding: utf-8 -*-

import sys, json
import selectpdf

testUrl = "https://selectpdf.com/demo/files/selectpdf.pdf"
testPdf = "Input.pdf"
localFile = "Result.txt"
apiKey = "Your API key here"

pythonVersion = "Python 3" if selectpdf.IS_PYTHON3 else "Python 2"
print ("This is SelectPdf-{0} using {1}.".format(selectpdf.CLIENT_VERSION, pythonVersion))

try:
    client = selectpdf.PdfToTextClient(apiKey)

    # set parameters - see full list at https://selectpdf.com/pdf-to-text-api/

    client.setStartPage(1) # start page (processing starts from here)
    client.setEndPage(0) # end page (set 0 to process the file until the end)
    client.setOutputFormat(selectpdf.OutputFormat.Text) # set output format (0-Text or 1-HTML)

    print ("Starting pdf to text ...")
    
    # convert local pdf to local text file
    client.getTextFromFileToFile(testPdf, localFile)

    # extract text from local pdf to memory
    # text = client.getTextFromFile(testPdf)
    # print (text)

    # convert pdf from public url to local text file
    # client.getTextFromUrlToFile(testUrl, localFile)

    # extract text from pdf from public url to memory
    # text = client.getTextFromUrl(testUrl)
    # print (text)

    print ("Finished! Number of pages processed: {0}.".format(client.getNumberOfPages()))

    # get API usage
    usageClient = selectpdf.UsageClient(apiKey)
    usage = usageClient.getUsage()
    print("Conversions remaining this month: {0}.".format(usage["available"]))

except selectpdf.ApiException as ex:
    print ("An error occurred: {0}.".format(ex.getMessage()))
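The sample processes the whole document with `setStartPage(1)` and `setEndPage(0)`, where an end page of 0 means "until the last page". When only part of a document is needed, a small helper can translate a 1-based, inclusive page range into those two values. This is a sketch following that convention; the helper name `page_range` is hypothetical.

```python
def page_range(first=1, last=None):
    """Translate a 1-based inclusive page range into (start_page, end_page).

    Follows the convention used in the sample: pages start at 1 and an end
    page of 0 means "process until the last page" (last=None maps to 0).
    """
    if first < 1:
        raise ValueError("first page must be 1 or greater")
    if last is None:
        return first, 0
    if last < first:
        raise ValueError("last page must not come before first page")
    return first, last
```

For example, `start, end = page_range(3)` followed by `client.setStartPage(start)` and `client.setEndPage(end)` would process the document from page 3 to the end.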

Search for text in PDF using Python

The following sample shows how to search a PDF document for a specific text.

# -*- coding: utf-8 -*-

import sys, json
import selectpdf

testUrl = "https://selectpdf.com/demo/files/selectpdf.pdf"
testPdf = "Input.pdf"
apiKey = "Your API key here"

pythonVersion = "Python 3" if selectpdf.IS_PYTHON3 else "Python 2"
print ("This is SelectPdf-{0} using {1}.".format(selectpdf.CLIENT_VERSION, pythonVersion))

try:
    client = selectpdf.PdfToTextClient(apiKey)

    # set parameters - see full list at https://selectpdf.com/pdf-to-text-api/

    client.setStartPage(1) # start page (processing starts from here)
    client.setEndPage(0) # end page (set 0 to process the file until the end)
    client.setOutputFormat(selectpdf.OutputFormat.Text) # set output format (0-Text or 1-HTML)

    print ("Starting search pdf ...")
    
    # search local pdf
    results = client.searchFile(testPdf, "pdf")

    # search pdf from public url
    # results = client.searchUrl(testUrl, "pdf")

    print ("Search results:\n{0}\nSearch results count: {1}.".format(json.dumps(results, indent=4), len(results)))

    print ("Finished! Number of pages processed: {0}.".format(client.getNumberOfPages()))

    # get API usage
    usageClient = selectpdf.UsageClient(apiKey)
    usage = usageClient.getUsage()
    print("Conversions remaining this month: {0}.".format(usage["available"]))

except selectpdf.ApiException as ex:
    print ("An error occurred: {0}.".format(ex.getMessage()))
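The search results are printed with `json.dumps` and counted with `len`, so they behave as a list of JSON-like records. A per-page summary is often easier to read than the full dump. The field that holds the page number is not shown in this article, so the sketch below takes its name as a parameter (`page_key`); check the printed JSON for the actual field name before using it.

```python
from collections import Counter


def count_matches_per_page(results, page_key):
    """Count search hits per page.

    results  -- the list returned by client.searchFile or client.searchUrl
    page_key -- name of the field holding the page number in each result
                (an assumption: read it from the printed JSON)
    """
    counts = Counter(item[page_key] for item in results)
    return dict(sorted(counts.items()))
```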

Merge PDFs using Python

The following sample shows how to merge several PDF documents into a final file. The source PDFs can be local files or PDFs from remote URLs. The final PDF can be retrieved in memory or saved to a local file.

# -*- coding: utf-8 -*-

import sys, json
import selectpdf

testUrl = "https://selectpdf.com/demo/files/selectpdf.pdf"
testPdf = "Input.pdf"
localFile = "Result.pdf"
apiKey = "Your API key here"

pythonVersion = "Python 3" if selectpdf.IS_PYTHON3 else "Python 2"
print ("This is SelectPdf-{0} using {1}.".format(selectpdf.CLIENT_VERSION, pythonVersion))

try:
    client = selectpdf.PdfMergeClient(apiKey)

    # set parameters - see full list at https://selectpdf.com/pdf-merge-api/

    # specify the pdf files that will be merged (order will be preserved in the final pdf)

    client.addFile(testPdf) # add PDF from local file
    client.addUrlFile(testUrl) # add PDF from public url
    # client.addFileWithPassword(testPdf, "pdf_password") # add PDF (that requires a password) from local file
    # client.addUrlFileWithPassword(testUrl, "pdf_password") # add PDF (that requires a password) from public url

    print ("Starting pdf merge ...")
    
    # merge pdfs to local file
    client.saveToFile(localFile)

    # merge pdfs to memory
    # pdf = client.save()

    print ("Finished! Number of pages: {0}.".format(client.getNumberOfPages()))

    # get API usage
    usageClient = selectpdf.UsageClient(apiKey)
    usage = usageClient.getUsage()
    print("Conversions remaining this month: {0}.".format(usage["available"]))

except selectpdf.ApiException as ex:
    print ("An error occurred: {0}.".format(ex.getMessage()))
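Because the merge preserves the order in which files are added, a list of sources can be added in a loop. The sketch below uses only the four methods shown in the sample (`addFile`, `addUrlFile`, `addFileWithPassword`, `addUrlFileWithPassword`); treating entries that start with `http://` or `https://` as URLs and accepting `(source, password)` tuples are conventions chosen for this example, and the helper name is hypothetical.

```python
def add_sources(client, sources):
    """Add PDFs to a selectpdf.PdfMergeClient in the given order.

    Each entry is either a path/URL string or a (path_or_url, password) tuple.
    Entries starting with http:// or https:// are treated as public URLs,
    everything else as local files.
    """
    for source in sources:
        password = None
        if isinstance(source, tuple):
            source, password = source
        is_url = source.lower().startswith(("http://", "https://"))
        if password is None:
            if is_url:
                client.addUrlFile(source)
            else:
                client.addFile(source)
        elif is_url:
            client.addUrlFileWithPassword(source, password)
        else:
            client.addFileWithPassword(source, password)
```

For example, `add_sources(client, [testPdf, testUrl])` adds the same two files as the sample, in the same order, before `client.saveToFile(localFile)` is called.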

The above Python samples can also be found in the GitHub repository: Python Samples.