DEV Community

IronSoftware

Posted on • Originally published at ironsoftware.com

Create Searchable PDFs by OCR

We can use IronOCR's advanced Tesseract engine (IronTesseract) to convert images into searchable PDFs. It can also make existing PDFs searchable.

Searchable text lets search engines index the content of your documents, which helps SEO, and lets internal search tools index files stored on intranets and in databases.

C#:

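using IronOcr;

var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    // Add each page image; different image formats can be mixed in one input
    Input.Add(@"images\page1.png");
    Input.Add(@"images\page2.bmp");
    Input.Add(@"images\page3.tiff");

    // Straighten skewed scans to improve recognition accuracy
    Input.Deskew();

    // Run OCR, then save the pages as a PDF with a searchable text layer
    var Result = Ocr.Read(Input);
    Result.SaveAsSearchablePdf("searchable.pdf");
}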

VB:

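Imports IronOcr

Dim Ocr = New IronTesseract()
Using Input = New OcrInput()
    ' Add each page image; different image formats can be mixed in one input
    Input.Add("images\page1.png")
    Input.Add("images\page2.bmp")
    Input.Add("images\page3.tiff")

    ' Straighten skewed scans to improve recognition accuracy
    Input.Deskew()

    ' Run OCR, then save the pages as a PDF with a searchable text layer
    Dim Result = Ocr.Read(Input)
    Result.SaveAsSearchablePdf("searchable.pdf")
End Using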
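The same pipeline should also work for making an existing scanned PDF searchable, by feeding the PDF in place of loose image files. This is a minimal sketch, assuming your IronOCR release exposes OcrInput.AddPdf(path), as the 2020-era API used above (with Input.Add) did. Input-loading method names have changed between IronOCR versions, so confirm against the API reference for the version you install. The file names scanned.pdf and scanned-searchable.pdf are hypothetical placeholders.

```csharp
using IronOcr;

// Sketch: OCR an existing image-only (scanned) PDF and save a searchable copy.
// ASSUMPTIONS: "scanned.pdf" is a hypothetical input file, and this IronOCR
// version provides OcrInput.AddPdf (check your version's API reference).
var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    // Load every page of the existing PDF as OCR input
    Input.AddPdf("scanned.pdf");

    // Straighten skewed scanned pages before recognition
    Input.Deskew();

    // Recognize text on all pages
    var Result = Ocr.Read(Input);

    // Write a new PDF that pairs the page images with the recognized text
    Result.SaveAsSearchablePdf("scanned-searchable.pdf");
}
```

Writing to a new file rather than overwriting the original keeps the source scan untouched if recognition quality turns out to be poor.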
