<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hani Amro</title>
    <description>The latest articles on DEV Community by Hani Amro (@h_amro_13de6b93cc1ce).</description>
    <link>https://dev.to/h_amro_13de6b93cc1ce</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3998576%2Fe09c9e46-6dd1-4d85-aa45-98f1ff600aab.png</url>
      <title>DEV Community: Hani Amro</title>
      <link>https://dev.to/h_amro_13de6b93cc1ce</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/h_amro_13de6b93cc1ce"/>
    <language>en</language>
    <item>
      <title>Arabic OCR with an API: Make Scanned Arabic PDFs Searchable (Python)</title>
      <dc:creator>Hani Amro</dc:creator>
      <pubDate>Tue, 23 Jun 2026 11:25:30 +0000</pubDate>
      <link>https://dev.to/h_amro_13de6b93cc1ce/arabic-ocr-with-an-api-make-scanned-arabic-pdfs-searchable-python-5hah</link>
      <guid>https://dev.to/h_amro_13de6b93cc1ce/arabic-ocr-with-an-api-make-scanned-arabic-pdfs-searchable-python-5hah</guid>
      <description>&lt;p&gt;If you've ever tried to extract text from a scanned Arabic document, you already know the pain. Most OCR tooling is built English-first. Arabic adds three problems on top:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Right-to-left (RTL) text&lt;/strong&gt; that breaks naive layout assumptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connected letters (cursive joining and ligatures)&lt;/strong&gt;: the same letter changes shape depending on its position in the word.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diacritics and a different numeral set&lt;/strong&gt; that generic models drop or mangle.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result: you run a scanned Arabic contract, invoice, or government form through a typical "PDF to text" tool and get back garbage — reversed words, missing letters, or nothing at all.&lt;/p&gt;

&lt;p&gt;This post shows a practical way to turn a &lt;strong&gt;scanned Arabic PDF into a searchable PDF&lt;/strong&gt; (a real, selectable text layer underneath the original page image) with a single API call — no ML pipeline to build, no GPU, no model weights to host. Code is in Python, cURL, and JavaScript.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What "searchable PDF" actually means&lt;/li&gt;
&lt;li&gt;The approach&lt;/li&gt;
&lt;li&gt;Tips for better Arabic OCR results&lt;/li&gt;
&lt;li&gt;Honest limitations&lt;/li&gt;
&lt;li&gt;Why an API instead of self-hosting Tesseract&lt;/li&gt;
&lt;li&gt;Pricing, briefly&lt;/li&gt;
&lt;li&gt;Wrap-up&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What "searchable PDF" actually means
&lt;/h2&gt;

&lt;p&gt;There are two different things people call "OCR":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text extraction&lt;/strong&gt; — you get back a string of the recognized text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Searchable PDF&lt;/strong&gt; — you get back a &lt;em&gt;PDF that looks identical to the scan&lt;/em&gt;, but now has an invisible text layer, so &lt;code&gt;Ctrl+F&lt;/code&gt;, copy-paste, and indexing all work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The second is what most real workflows need: you keep the original document exactly as scanned (important for legal/official docs), but it becomes searchable and accessible. That's what we'll produce here.&lt;/p&gt;

&lt;h2&gt;
  
  
  The approach
&lt;/h2&gt;

&lt;p&gt;We'll use the &lt;strong&gt;PDF Tools API&lt;/strong&gt; &lt;code&gt;/ocr&lt;/code&gt; endpoint. Under the hood it runs Tesseract with the Arabic (&lt;code&gt;ara&lt;/code&gt;) and English (&lt;code&gt;eng&lt;/code&gt;) language models and rebuilds the PDF with an invisible OCR text layer. The relevant detail for us: you can pass &lt;code&gt;lang=eng+ara&lt;/code&gt; to recognize &lt;strong&gt;mixed Arabic/English documents&lt;/strong&gt; in one pass, which is what most real MENA paperwork actually is (Arabic body text, English brand names, Western digits).&lt;/p&gt;

&lt;p&gt;You'll need a free API key from the listing (the free tier is 1,000 requests/month, no card). Then:&lt;/p&gt;

&lt;h3&gt;
  
  
  Python
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_RAPIDAPI_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;HOST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pdf-tools-api2.p.rapidapi.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arabic_scan.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;HOST&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/ocr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-RapidAPI-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-RapidAPI-Host&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HOST&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arabic_scan.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lang&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eng+ara&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;   &lt;span class="c1"&gt;# mixed Arabic + English
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;searchable.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Done — searchable.pdf now has a real text layer.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Open &lt;code&gt;searchable.pdf&lt;/code&gt; and try selecting the Arabic text or searching it. It's there now.&lt;/p&gt;
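&lt;p&gt;&lt;code&gt;raise_for_status()&lt;/code&gt; catches HTTP errors, but as an extra guard it can help to confirm the body really is a PDF before saving it. The sketch below is a minimal check and not part of the API: the helper name &lt;code&gt;looks_like_pdf&lt;/code&gt; is hypothetical, and it relies only on the fact that PDF files begin with the &lt;code&gt;%PDF-&lt;/code&gt; signature.&lt;/p&gt;

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF file signature."""
    # Hypothetical helper: tolerate leading whitespace before the signature.
    return data.lstrip()[:5] == b"%PDF-"


# Example payloads: a PDF header versus a JSON error body.
print(looks_like_pdf(b"%PDF-1.7\n..."))
print(looks_like_pdf(b'{"message": "invalid key"}'))
```

&lt;p&gt;In the Python example above, this could wrap the write step, so that &lt;code&gt;resp.content&lt;/code&gt; is only saved when &lt;code&gt;looks_like_pdf(resp.content)&lt;/code&gt; is true.&lt;/p&gt;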
&lt;h3&gt;
  
  
  cURL
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"https://pdf-tools-api2.p.rapidapi.com/ocr"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-RapidAPI-Key: YOUR_RAPIDAPI_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-RapidAPI-Host: pdf-tools-api2.p.rapidapi.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"file=@arabic_scan.pdf"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"lang=eng+ara"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; searchable.pdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  JavaScript (Node / browser)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;form&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FormData&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;form&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;file&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fileInput&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="nx"&gt;form&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;lang&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;eng+ara&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://pdf-tools-api2.p.rapidapi.com/ocr&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;X-RapidAPI-Key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;YOUR_RAPIDAPI_KEY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;X-RapidAPI-Host&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pdf-tools-api2.p.rapidapi.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;form&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
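&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OCR request failed: &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// don't save an error body as a PDF&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;blob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// application/pdf, now searchable&lt;/span&gt;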

&lt;span class="c1"&gt;// Browser: download the searchable PDF&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createObjectURL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;a&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;href&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;download&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;searchable.pdf&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;revokeObjectURL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Just need the raw text instead of a searchable PDF?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you only want the extracted string (for a database, a search index, an LLM pipeline), run the searchable PDF through &lt;code&gt;/extract-text&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Reuses requests, API_KEY and HOST from the Python example above&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;searchable.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://pdf-tools-api2.p.rapidapi.com/extract-text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-RapidAPI-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-RapidAPI-Host&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HOST&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;searchable.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
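&lt;p&gt;Extracted Arabic text often benefits from light normalization before it goes into a search index, since diacritics and Arabic-Indic digits would otherwise have to match exactly. The sketch below is one possible approach and not something the API does for you: the function name &lt;code&gt;normalize_for_search&lt;/code&gt; is hypothetical, and which marks to strip is a judgment call that depends on your data.&lt;/p&gt;

```python
import re

# Assumption: strip harakat (U+064B to U+0652), superscript alef (U+0670)
# and tatweel (U+0640). Adjust the set to suit your documents.
_MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Map Arabic-Indic and Extended Arabic-Indic digits to ASCII digits.
_DIGITS = {}
for i in range(10):
    _DIGITS[0x0660 + i] = str(i)  # Arabic-Indic digits
    _DIGITS[0x06F0 + i] = str(i)  # Extended (Persian/Urdu) digits


def normalize_for_search(text: str) -> str:
    """Remove diacritics and tatweel, and convert digits to ASCII."""
    return _MARKS.sub("", text).translate(_DIGITS)


# A vocalized word followed by Arabic-Indic digits.
sample = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F \u0661\u0662\u0663"
print(normalize_for_search(sample))
```

&lt;p&gt;Index the normalized string, and apply the same function to user queries so both sides match.&lt;/p&gt;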




&lt;h2&gt;
  
  
  Tips for better Arabic OCR results
&lt;/h2&gt;

&lt;p&gt;OCR quality depends mostly on the &lt;strong&gt;input scan&lt;/strong&gt;, not the engine. To get clean output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scan at 300 DPI&lt;/strong&gt; or higher. Below ~200 DPI, connected Arabic letters blur together.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deskew&lt;/strong&gt; crooked scans before sending. Even 2–3° of rotation hurts RTL recognition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;eng+ara&lt;/code&gt;, not &lt;code&gt;ara&lt;/code&gt; alone&lt;/strong&gt;, for any document that mixes Latin characters (almost all real-world ones do).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep it to 15 pages or fewer per request&lt;/strong&gt; (split larger docs first; there's a &lt;code&gt;/split&lt;/code&gt; endpoint).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Black-on-white&lt;/strong&gt; beats colored backgrounds; if your scan is noisy, that's the biggest quality lever.&lt;/li&gt;
&lt;/ul&gt;
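&lt;p&gt;For the page-limit tip, the splitting arithmetic can be sketched as below. This only computes page ranges; it does not call &lt;code&gt;/split&lt;/code&gt;, whose parameters are not covered here. The helper name &lt;code&gt;page_batches&lt;/code&gt; and the default cap of 15 pages are assumptions for illustration.&lt;/p&gt;

```python
def page_batches(total_pages: int, max_pages: int = 15) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first, last) page ranges of at most max_pages."""
    if 1 > max_pages:
        raise ValueError("max_pages must be at least 1")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


# A 40-page scan becomes three OCR-sized ranges.
print(page_batches(40))  # [(1, 15), (16, 30), (31, 40)]
```

&lt;p&gt;Each range would then be split out and sent to &lt;code&gt;/ocr&lt;/code&gt; as its own request.&lt;/p&gt;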
&lt;h2&gt;
  
  
  Honest limitations
&lt;/h2&gt;


&lt;div class="crayons-card c-embed"&gt;

&lt;p&gt;This is Tesseract-based OCR, not a frontier vision model. It's excellent for &lt;strong&gt;printed&lt;/strong&gt; Arabic (forms, contracts, books, invoices). It is &lt;strong&gt;not&lt;/strong&gt; built for handwritten Arabic, heavily stylized calligraphy, or low-resolution phone photos; accuracy drops sharply there, as it does with most print-oriented OCR engines. For clean printed scans it's genuinely good and, importantly, it's &lt;em&gt;available&lt;/em&gt;, which is more than most PDF APIs can say for Arabic at all.&lt;/p&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why an API instead of self-hosting Tesseract
&lt;/h2&gt;

&lt;p&gt;You &lt;em&gt;can&lt;/em&gt; &lt;code&gt;apt install tesseract-ocr-ara&lt;/code&gt; and wire up the PDF rebuild yourself. People do. But you then own:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;installing and updating Tesseract + the Arabic language data,&lt;/li&gt;
&lt;li&gt;the rasterize → OCR → re-embed-text-layer pipeline (the fiddly part),&lt;/li&gt;
&lt;li&gt;font/encoding edge cases for the invisible RTL text layer,&lt;/li&gt;
&lt;li&gt;scaling it without melting your server on a 15-page scan.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If Arabic OCR is core to your product, self-hosting is fine. If it's one feature among many, one HTTP call with a flat cost you can put in a spreadsheet beats a maintenance project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing, briefly
&lt;/h2&gt;

&lt;p&gt;The API is &lt;strong&gt;flat per-request&lt;/strong&gt; — one OCR call is one request, whether it's a 1-page or 15-page scan. No credit tables, no per-page billing (iLovePDF, for comparison, charges OCR per page in credits). Free tier is 1,000 requests/month, permanently, no card. The same key also does merge, split, compress, encrypt, HTML→PDF, Office→PDF, redaction, and table extraction — 26 endpoints total.&lt;/p&gt;
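&lt;p&gt;To see how flat per-request pricing plays out against the free tier, here is a rough estimate, assuming each document is split into chunks of at most 15 pages. It counts OCR calls only; any &lt;code&gt;/split&lt;/code&gt; calls would presumably count as requests too. The function name and constants are illustrative, not part of the API.&lt;/p&gt;

```python
import math

FREE_TIER_REQUESTS = 1000   # monthly free quota quoted in this post
MAX_PAGES_PER_REQUEST = 15  # assumed cap, based on the 15-page figure above


def requests_needed(page_counts, max_pages=MAX_PAGES_PER_REQUEST):
    """Count OCR requests when each document is sent in chunks of max_pages."""
    return sum(math.ceil(pages / max_pages) for pages in page_counts)


docs = [1, 15, 16, 40]  # page counts of four sample documents
used = requests_needed(docs)
print(used, "requests;", FREE_TIER_REQUESTS - used, "left in the free tier")
```

&lt;p&gt;For the four sample documents this comes to 7 OCR requests (1 + 1 + 2 + 3), leaving 993 of the monthly quota.&lt;/p&gt;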

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;Arabic OCR has a reputation for being painful, and self-hosting it is. But for printed documents, turning a scanned Arabic PDF into a searchable one is now a single API call with &lt;code&gt;lang=eng+ara&lt;/code&gt;. If you're digitizing Arabic archives, building a MENA document-management product, or just need &lt;code&gt;Ctrl+F&lt;/code&gt; to work on a scanned contract, this gets you there in five minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your turn:&lt;/strong&gt; what trips you up most with Arabic OCR — RTL layout, connected-letter ligatures, or diacritics getting dropped? And what are you digitizing: contracts, old books, or handwritten notes? Tell me in the comments. 👇&lt;/p&gt;

&lt;p&gt;&lt;a href="https://rapidapi.com/thabatnajm/api/pdf-tools-api2" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Try the Arabic OCR API free — 1,000 requests/month, no card&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Built and maintained by a solo developer (based in Syria) who actually answers — questions welcome in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>api</category>
      <category>python</category>
      <category>ocr</category>
      <category>arabic</category>
    </item>
  </channel>
</rss>
