Search for Text in PDF


The Pdf To Text Converter provided by SelectPdf, described in the Extract Text from PDF section, offers another useful feature: it can search for text in a PDF document and return the location of every occurrence found.

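To do this, call the Search method of the PdfToText class. The method returns an array of TextPosition objects, one per occurrence; each gives the page number of the match and the X, Y, Width and Height of the rectangle that encloses it.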
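
As a minimal sketch of the call described above, the following console-style helper loads a PDF, runs a search and prints each position. It uses only the members that appear in the sample on this page (Load, Search, and the PageNumber, X, Y, Width and Height properties). The class name SearchSketch, the helper Describe and the method PrintMatches are illustrative names, not part of the library, and the meaning of the two boolean arguments is inferred from the variable names used in the sample.

```csharp
using System;
using System.Globalization;
using SelectPdf;

static class SearchSketch
{
    // Illustrative helper (not a SelectPdf API): formats one match for display.
    // Parameters are double so that float or double coordinates both convert implicitly.
    public static string Describe(int pageNumber, double x, double y, double width, double height)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "page {0}: x={1:F1}, y={2:F1}, width={3:F1}, height={4:F1}",
            pageNumber, x, y, width, height);
    }

    // Prints every occurrence of searchText found in the PDF at pdfPath.
    public static void PrintMatches(string pdfPath, string searchText)
    {
        PdfToText pdfToText = new PdfToText();
        pdfToText.Load(pdfPath);

        // Assumption: the two flags are caseSensitive and wholeWordsOnly, as in the sample
        TextPosition[] positions = pdfToText.Search(searchText, false, false);

        Console.WriteLine("{0} occurrence(s) found", positions.Length);
        foreach (TextPosition position in positions)
        {
            Console.WriteLine(Describe(position.PageNumber, position.X, position.Y,
                position.Width, position.Height));
        }
    }
}
```

A call such as SearchSketch.PrintMatches(pathToPdf, "pdf"), where pathToPdf is any local PDF file, would list one line per match.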

Sample Code

This sample uses the SelectPdf PDF Library for .NET to search for text in a PDF document, then produces a new PDF in which every occurrence found is highlighted.

C#

// the test file
string filePdf = Server.MapPath("~/files/selectpdf.pdf");

// settings
bool caseSensitive = ChkCaseSensitive.Checked;
bool wholeWordsOnly = ChkWholeWordsOnly.Checked;

// instantiate a pdf to text converter object
PdfToText pdfToText = new PdfToText();

// load PDF file
pdfToText.Load(filePdf);

// search for text and retrieve all found text positions
TextPosition[] positions = pdfToText.Search(TxtSearchText.Text, 
    caseSensitive, wholeWordsOnly);

// open the existing PDF document in editing mode
PdfDocument doc = new PdfDocument(filePdf);

// highlight the found text in the existing PDF document
for (int i = 0; i < positions.Length; i++)
{
    TextPosition position = positions[i];

    // PageNumber starts at 1, while the Pages collection is indexed from 0
    PdfPage page = doc.Pages[position.PageNumber - 1];

    PdfRectangleElement rect = new PdfRectangleElement(
        position.X, position.Y, position.Width, position.Height);
    rect.BackColor = new PdfColor(240, 240, 0);
    rect.Transparency = 30;
    page.Add(rect);
}

// save pdf document
doc.Save(Response, false, "Sample.pdf");

// close pdf document
doc.Close();

VB.NET

' the test file
Dim filePdf As String = Server.MapPath("~/files/selectpdf.pdf")

' settings
Dim caseSensitive As Boolean = ChkCaseSensitive.Checked
Dim wholeWordsOnly As Boolean = ChkWholeWordsOnly.Checked

' instantiate a pdf to text converter object
Dim pdfToText As New PdfToText()

' load PDF file
pdfToText.Load(filePdf)

' search for text and retrieve all found text positions
Dim positions As TextPosition() = pdfToText.Search( _
    TxtSearchText.Text, caseSensitive, wholeWordsOnly)

' open the existing PDF document in editing mode
Dim doc As New PdfDocument(filePdf)

' highlight the found text in the existing PDF document
For i As Integer = 0 To positions.Length - 1
    Dim position As TextPosition = positions(i)

    ' PageNumber starts at 1, while the Pages collection is indexed from 0
    Dim page As PdfPage = doc.Pages(position.PageNumber - 1)

    Dim rect As New PdfRectangleElement( _
        position.X, position.Y, position.Width, position.Height)
    rect.BackColor = New PdfColor(240, 240, 0)
    rect.Transparency = 30
    page.Add(rect)
Next

' save pdf document
doc.Save(Response, False, "Sample.pdf")

' close pdf document
doc.Close()
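
The samples above depend on ASP.NET Web Forms objects (Server, Response and the page's input controls). Outside a web page, the same steps could be wrapped in a method that writes the highlighted document to disk. The sketch below is one possible arrangement, not the library's prescribed pattern: HighlightSketch and HighlightMatches are hypothetical names, the empty-text guard is an added design choice, and the output path is assumed to differ from the input path.

```csharp
using System;
using SelectPdf;

static class HighlightSketch
{
    // Highlights every occurrence of searchText in inputPath, saves the result
    // to outputPath and returns the number of occurrences highlighted.
    public static int HighlightMatches(string inputPath, string outputPath, string searchText,
        bool caseSensitive = false, bool wholeWordsOnly = false)
    {
        // Guard added for this sketch; it runs before any file is opened
        if (string.IsNullOrEmpty(searchText))
            throw new ArgumentException("Search text must not be empty.", nameof(searchText));

        PdfToText pdfToText = new PdfToText();
        pdfToText.Load(inputPath);
        TextPosition[] positions = pdfToText.Search(searchText, caseSensitive, wholeWordsOnly);

        PdfDocument doc = new PdfDocument(inputPath);
        try
        {
            foreach (TextPosition position in positions)
            {
                // PageNumber starts at 1, while the Pages collection is indexed from 0
                PdfPage page = doc.Pages[position.PageNumber - 1];

                PdfRectangleElement rect = new PdfRectangleElement(
                    position.X, position.Y, position.Width, position.Height);
                rect.BackColor = new PdfColor(240, 240, 0);
                rect.Transparency = 30;
                page.Add(rect);
            }

            // Write to a file instead of the HTTP response
            doc.Save(outputPath);
        }
        finally
        {
            // Close the document even if highlighting or saving fails
            doc.Close();
        }

        return positions.Length;
    }
}
```

Returning the match count lets the caller report "nothing found" without reopening the output file.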

Reference

SelectPdf

PdfToText

Other Resources

Select.Pdf Online Demo with C# Sample Code

Select.Pdf Online Demo with Vb.Net Sample Code