Pdf Library for .NET

PdfToText Class

Extracts text from PDF documents and searches a PDF document for specific text.
Inheritance Hierarchy
System.Object
  SelectPdf.PdfTool
    SelectPdf.PdfToText

Namespace:  SelectPdf
Assembly:  Select.Pdf (in Select.Pdf.dll) Version: 24.1
Syntax
public class PdfToText : PdfTool

The PdfToText type exposes the following members.

Constructors
  Name  Description
Public method PdfToText
Instantiates the PdfToText object.
Properties
  Name  Description
Public property ClipText
When enabled, hidden text in the PDF document is not returned.
Public property DocumentInformation
Information about the PDF document. This property is populated after the requested operation (GetText, GetHtml, etc.) finishes.
(Inherited from PdfTool.)
Public property EndPageNumber
The page number at which the current operation ends. The default value is 0, which means the document is processed from the StartPageNumber page through the last page.
(Inherited from PdfTool.)
Public property HtmlCharset
The charset declared by the meta tag added to the HTML document generated by the GetHtml() method. The default value is UTF-8.
Public property Layout
Gets or sets the layout of the output text. The default value is Original.
Public property MarkPageBreaks
Inserts a special character after the text extracted from each PDF page. The character is defined by the PageBreakMark property.
Public property PageBreakMark
Gets the page break mark character used when the MarkPageBreaks property is true.
Public property StartPageNumber
The page number at which the current operation starts. The default value is 1, which means the operation starts from the first page.
(Inherited from PdfTool.)
Public property Timeout
The timeout, in seconds, for the current operation. The default value is 600 seconds.
(Inherited from PdfTool.)
Public property UserPassword
The user password used to open the PDF document for reading. The default value is null, which means no password is used.
(Inherited from PdfTool.)
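How StartPageNumber, EndPageNumber and MarkPageBreaks combine can be illustrated with a minimal, self-contained C# sketch. It models only the documented behavior (start defaults to 1, an end of 0 means "through the last page", and a mark is inserted after each page's text). PageRangeModel, Resolve and SplitPages are hypothetical names, not Select.Pdf members, and the rejection of out-of-range values is an assumption of this sketch, not documented library behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, self-contained helpers. They are NOT part of Select.Pdf; they only
// model the documented meaning of StartPageNumber, EndPageNumber and MarkPageBreaks.
public static class PageRangeModel
{
    // Returns the inclusive page range an operation would cover.
    // Documented defaults: StartPageNumber = 1 (first page) and
    // EndPageNumber = 0 (process through the last page).
    // Assumption: out-of-range values are rejected here; how the real
    // library treats them is not described on this page.
    public static (int First, int Last) Resolve(int startPageNumber, int endPageNumber, int pageCount)
    {
        if (pageCount < 1)
            throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (startPageNumber < 1 || startPageNumber > pageCount)
            throw new ArgumentOutOfRangeException(nameof(startPageNumber));

        int last = endPageNumber == 0 ? pageCount : endPageNumber;
        if (last < startPageNumber || last > pageCount)
            throw new ArgumentOutOfRangeException(nameof(endPageNumber));

        return (startPageNumber, last);
    }

    // Splits text produced with MarkPageBreaks = true into one string per page.
    // The mark follows each page's text, so splitting leaves an empty trailing
    // element, which is dropped. Assumption: the mark never occurs inside the
    // page text itself.
    public static List<string> SplitPages(string text, string pageBreakMark)
    {
        if (string.IsNullOrEmpty(pageBreakMark))
            throw new ArgumentException("A non-empty page break mark is required.", nameof(pageBreakMark));

        var pages = new List<string>(text.Split(new[] { pageBreakMark }, StringSplitOptions.None));
        if (pages.Count > 0 && pages[pages.Count - 1].Length == 0)
            pages.RemoveAt(pages.Count - 1);
        return pages;
    }
}
```

In real code the page count would presumably come from GetPageCount() and the mark from the PageBreakMark property (converted with ToString(), since the sketch takes a string to stay agnostic about the property's exact type).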
Methods
  Name  Description
Public method ExtractText
Extracts the text from the specified page and coordinates.
Public method GetHtml
Gets the text from a PDF document and wraps it in HTML tags.
Public method GetInfo
Gets information about the loaded PDF document.
(Inherited from PdfTool.)
Public method GetPageCount
Gets the number of pages in the loaded PDF document.
(Inherited from PdfTool.)
Public method GetText
Gets the text from a range of pages of a PDF document.
Public method Load(Byte[])
Loads a PDF document from a byte array.
(Inherited from PdfTool.)
Public method Load(Stream)
Loads a PDF document from the specified stream.
(Inherited from PdfTool.)
Public method Load(String)
Loads an existing PDF file.
(Inherited from PdfTool.)
Public method Load(Byte[], String)
Loads a password-protected PDF document from a byte array.
(Inherited from PdfTool.)
Public method Load(Stream, String)
Loads a password-protected PDF document from the specified stream.
(Inherited from PdfTool.)
Public method Load(String, String)
Loads an existing password-protected PDF file.
(Inherited from PdfTool.)
Public method SaveHtml(String)
Gets the text from a PDF document, wraps it in HTML tags and saves it to a file.
Public method SaveHtml(String, Encoding)
Gets the text from a PDF document, wraps it in HTML tags and saves it to a file using the specified encoding.
Public method SaveText(String)
Gets the text from a PDF document and saves it to a file.
Public method SaveText(String, Encoding)
Gets the text from a PDF document and saves it to a file using the specified encoding.
Public method Search(String)
Searches a PDF document for specific text. The search is case-insensitive and also matches partial words.
Public method Search(String, Boolean, Boolean)
Searches a PDF document for specific text.
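The difference between the two Search overloads can be illustrated on plain text. The sketch below is hypothetical: SearchModel.FindMatches is not part of Select.Pdf, it works on a string rather than on a loaded PDF, and it assumes that the two Boolean parameters of Search(String, Boolean, Boolean) select case-sensitive matching and whole-word matching. Its defaults mirror the documented behavior of Search(String): case-insensitive, with partial words included.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical plain-text model of the two documented search modes. NOT part of
// Select.Pdf: the real Search methods work on the loaded PDF document.
public static class SearchModel
{
    // Returns the start offset of every occurrence of term in text.
    // Defaults mirror Search(String): case-insensitive, partial words included.
    // caseSensitive and wholeWordsOnly model the ASSUMED meaning of the two
    // Boolean parameters of Search(String, Boolean, Boolean).
    public static List<int> FindMatches(string text, string term,
        bool caseSensitive = false, bool wholeWordsOnly = false)
    {
        var matches = new List<int>();
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(term))
            return matches;

        StringComparison comparison = caseSensitive
            ? StringComparison.Ordinal
            : StringComparison.OrdinalIgnoreCase;

        int index = text.IndexOf(term, 0, comparison);
        while (index >= 0)
        {
            if (!wholeWordsOnly || IsWholeWord(text, index, term.Length))
                matches.Add(index);
            // Resume one character later so overlapping occurrences are reported too.
            index = text.IndexOf(term, index + 1, comparison);
        }
        return matches;
    }

    // A match counts as a whole word when neither neighbor is a letter or digit.
    private static bool IsWholeWord(string text, int start, int length)
    {
        int end = start + length;
        bool leftBoundary = start == 0 || !char.IsLetterOrDigit(text[start - 1]);
        bool rightBoundary = end == text.Length || !char.IsLetterOrDigit(text[end]);
        return leftBoundary && rightBoundary;
    }
}
```

For example, searching "Invoice total: see INVOICES and invoice." for "invoice" with the defaults finds three occurrences, including the partial match inside "INVOICES"; enabling both options leaves only the final lowercase word. The real methods report match locations inside PDF pages, so check their documented return type rather than relying on this model.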
See Also