IronSoftware

Posted on • Originally published at ironsoftware.com

Make any PDF have Searchable, Copyable Text (Code Example)

We can use Iron's advanced Tesseract engine to make scanned PDF documents searchable and indexable, with text that users can copy and paste.

C#:

using IronOcr;

var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    Input.AddPdf("scan.pdf", "password");

    // clean up twisted pages
    Input.Deskew();

    var Result = Ocr.Read(Input);
    Result.SaveAsSearchablePdf("searchable.pdf");
}

VB:

Imports IronOcr

Dim Ocr = New IronTesseract()
Using Input = New OcrInput()
    Input.AddPdf("scan.pdf", "password")

    ' clean up twisted pages
    Input.Deskew()

    Dim Result = Ocr.Read(Input)
    Result.SaveAsSearchablePdf("searchable.pdf")
End Using
