IronSoftware

Posted on • Originally published at ironsoftware.com

Create Searchable PDFs by OCR

IronOCR's Tesseract engine can convert images into searchable PDFs. It can also make existing PDFs searchable.

Because the recognized text becomes machine-readable, searchable PDFs can help with SEO and can be indexed by internal search on intranets and in databases.

C#:

using IronOcr;

var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    // Queue each page image for OCR
    Input.Add(@"images\page1.png");
    Input.Add(@"images\page2.bmp");
    Input.Add(@"images\page3.tiff");

    // Straighten skewed scans before recognition
    Input.Deskew();

    var Result = Ocr.Read(Input);
    // Save the OCR result as a searchable PDF
    Result.SaveAsSearchablePdf("searchable.pdf");
}

VB:

Imports IronOcr

Dim Ocr = New IronTesseract()
Using Input = New OcrInput()
    ' Queue each page image for OCR
    Input.Add("images\page1.png")
    Input.Add("images\page2.bmp")
    Input.Add("images\page3.tiff")

    ' Straighten skewed scans before recognition
    Input.Deskew()

    Dim Result = Ocr.Read(Input)
    ' Save the OCR result as a searchable PDF
    Result.SaveAsSearchablePdf("searchable.pdf")
End Using
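
The examples above start from image files. For the other case mentioned at the start, making an existing PDF searchable, a minimal C# sketch might look like the following. It assumes the `OcrInput.AddPdf` method from the IronOCR release this post targets (newer releases may expose a differently named loader), and the file names are hypothetical placeholders.

```csharp
using IronOcr;

var ocr = new IronTesseract();
using (var input = new OcrInput())
{
    // Assumption: AddPdf loads the pages of an existing (scanned) PDF;
    // check the method name against the IronOCR version you have installed.
    input.AddPdf(@"scans\existing.pdf"); // hypothetical path

    var result = ocr.Read(input);

    // Same output call as in the image example above
    result.SaveAsSearchablePdf("existing-searchable.pdf"); // hypothetical output name
}
```

Everything after the input step is the same as for images, so only the loading call should need to change.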
