<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AI Viewz</title>
    <description>The latest articles on DEV Community by AI Viewz (@aiviewz_team).</description>
    <link>https://dev.to/aiviewz_team</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3389193%2F55749656-22da-4b63-a82c-510d6ca99628.png</url>
      <title>DEV Community: AI Viewz</title>
      <link>https://dev.to/aiviewz_team</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aiviewz_team"/>
    <language>en</language>
    <item>
      <title>Tesseract OCR 5.5 Installation</title>
      <dc:creator>AI Viewz</dc:creator>
      <pubDate>Fri, 26 Sep 2025 21:43:31 +0000</pubDate>
      <link>https://dev.to/aiviewz_team/tesseract-ocr-55-installation-384g</link>
      <guid>https://dev.to/aiviewz_team/tesseract-ocr-55-installation-384g</guid>
      <description>&lt;h2&gt;
  
  
  How to Install the Latest Version of Tesseract OCR 5.5
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
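&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Add the Tesseract PPAs and refresh the package index
sudo add-apt-repository ppa:alex-p/tesseract-ocr5.4
sudo apt update

sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
sudo apt update

sudo apt install tesseract-ocr       # OCR engine
sudo apt install tesseract-ocr-eng   # English language data
sudo apt install tesseract-ocr-ara   # Arabic language data

tesseract --version                  # Confirm the installed version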
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
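
&lt;p&gt;To check from Python that the language packs above are visible to Tesseract, a minimal sketch along these lines may help. It only relies on the &lt;code&gt;tesseract --list-langs&lt;/code&gt; command; the helper name &lt;code&gt;installed_languages&lt;/code&gt; is our own, and the header filtering assumes the listing starts with a "List of" line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import shutil
import subprocess


def installed_languages(binary="tesseract"):
    """Return the language codes Tesseract reports, or None if it is not on PATH."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        [binary, "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some builds print the listing to stderr, so read both streams.
    lines = (result.stdout + result.stderr).splitlines()
    # Skip blank lines and the header line (assumed to start with "List of").
    return [ln.strip() for ln in lines if ln.strip() and not ln.startswith("List of")]


if __name__ == "__main__":
    langs = installed_languages()
    if langs is None:
        print("tesseract not found on PATH")
    else:
        for code in ("eng", "ara"):
            print(code, "installed" if code in langs else "missing")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;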



&lt;p&gt;For a more detailed walkthrough, see our &lt;a href="https://www.aiviewz.com/posts/how-to-install-latest-version-5-4-of-tesseract-ocr" rel="noopener noreferrer"&gt;Tesseract installation guide&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>ocr</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Generating Synthetic RTL OCR Data for Donut with SynthDoG-RTL</title>
      <dc:creator>AI Viewz</dc:creator>
      <pubDate>Tue, 23 Sep 2025 20:11:56 +0000</pubDate>
      <link>https://dev.to/aiviewz_team/generating-synthetic-rtl-ocr-data-for-donut-with-synthdog-rtl-3ghi</link>
      <guid>https://dev.to/aiviewz_team/generating-synthetic-rtl-ocr-data-for-donut-with-synthdog-rtl-3ghi</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Building OCR models for right-to-left (RTL) languages such as Arabic, Urdu, Persian, or Hebrew is often held back by a lack of annotated training data. &lt;strong&gt;SynthDoG-RTL&lt;/strong&gt; is a synthetic document generator adapted from Donut’s SynthDoG and extended to render RTL text correctly. In this post, we’ll walk experienced developers through generating large-scale synthetic datasets compatible with &lt;a href="https://github.com/clovaai/donut" rel="noopener noreferrer"&gt;Donut&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is SynthDoG-RTL?
&lt;/h2&gt;

&lt;p&gt;SynthDoG (Synthetic Document Generator) was introduced with Donut to create training data on the fly for document understanding. &lt;a href="https://github.com/aiviewz/Synthdog-RTL" rel="noopener noreferrer"&gt;SynthDoG-RTL&lt;/a&gt; extends it by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supporting RTL text direction and contextual script shaping.&lt;/li&gt;
&lt;li&gt;Including sample corpora, fonts, and templates for Arabic, Urdu, Persian, Hebrew, and others.&lt;/li&gt;
&lt;li&gt;Allowing custom YAML configuration for layouts, distortions, and effects.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Installation and Setup
&lt;/h2&gt;

&lt;p&gt;Clone the repository and install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/aiviewz/Synthdog-RTL.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Synthdog-RTL

conda create &lt;span class="nt"&gt;-n&lt;/span&gt; synthdog &lt;span class="nv"&gt;python&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3.8 &lt;span class="nt"&gt;-y&lt;/span&gt;
conda activate synthdog
pip &lt;span class="nb"&gt;install &lt;/span&gt;synthtiger
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure &lt;a href="https://github.com/HOST-Oman/libraqm" rel="noopener noreferrer"&gt;libraqm&lt;/a&gt; and the libraries it builds on are installed for proper Arabic/RTL shaping. On Debian/Ubuntu:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install &lt;/span&gt;libfreetype6-dev libharfbuzz-dev libfribidi-dev libraqm-dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
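
&lt;p&gt;As a quick sanity check, the sketch below asks Pillow whether it can use Raqm. It assumes text rendering in your setup goes through Pillow, which we have not verified for every SynthTIGER version; the helper name &lt;code&gt;raqm_status&lt;/code&gt; is our own:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import importlib


def raqm_status():
    """Return True or False for Raqm support, or None if Pillow is not installed."""
    try:
        features = importlib.import_module("PIL.features")
    except ImportError:
        return None
    # PIL.features.check reports whether the named optional feature is usable.
    return bool(features.check("raqm"))


if __name__ == "__main__":
    status = raqm_status()
    if status is None:
        print("Pillow is not installed in this environment")
    elif status:
        print("Raqm layout is available")
    else:
        print("Raqm layout is NOT available; RTL shaping may be wrong")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the check reports that Raqm is unavailable, revisit the system libraries above before generating data, since unshaped Arabic-script glyphs would silently corrupt the dataset.&lt;/p&gt;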



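&lt;p&gt;On macOS, set this environment variable before generating with multiple workers to avoid fork-safety crashes in the Objective-C runtime:&lt;br&gt;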
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OBJC_DISABLE_INITIALIZE_FORK_SAFETY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YES
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Preparing Resources
&lt;/h2&gt;

&lt;p&gt;Each language needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Corpus&lt;/strong&gt;: UTF-8 text file under &lt;code&gt;resources/corpus/&lt;/code&gt; (e.g., &lt;code&gt;urdu.txt&lt;/code&gt;, &lt;code&gt;arabic.txt&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fonts&lt;/strong&gt;: Place &lt;code&gt;.ttf/.otf&lt;/code&gt; fonts in &lt;code&gt;resources/font/&amp;lt;lang_code&amp;gt;/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backgrounds&lt;/strong&gt;: Optional textures under &lt;code&gt;resources/backgrounds/&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resources/
 ├─ corpus/
 │   ├─ urdu.txt
 │   └─ arabic.txt
 └─ font/
     ├─ ur/
     │   └─ NotoNastaliq.ttf
     └─ ar/
         └─ NotoNaskh.ttf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
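
&lt;p&gt;Before generating anything, it can be worth checking that this layout is in place. The following is a minimal sketch using only the standard library; the function name &lt;code&gt;check_resources&lt;/code&gt; and its arguments are hypothetical and not part of SynthDoG-RTL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path


def check_resources(root, corpus_name, lang_code):
    """Return a list of problems found for one language; an empty list means OK."""
    root = Path(root)
    problems = []

    # The corpus must exist and decode cleanly as UTF-8.
    corpus = root / "corpus" / corpus_name
    if not corpus.is_file():
        problems.append(f"missing corpus: {corpus}")
    else:
        try:
            corpus.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            problems.append(f"corpus is not valid UTF-8: {corpus}")

    # At least one .ttf or .otf font is expected in the language folder.
    font_dir = root / "font" / lang_code
    fonts = []
    if font_dir.is_dir():
        fonts = [p for p in font_dir.iterdir() if p.suffix.lower() in (".ttf", ".otf")]
    if not fonts:
        problems.append(f"no .ttf/.otf fonts in: {font_dir}")

    return problems


if __name__ == "__main__":
    for issue in check_resources("resources", "urdu.txt", "ur"):
        print(issue)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;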






&lt;h2&gt;
  
  
  Configuring Generation
&lt;/h2&gt;

&lt;p&gt;YAML config files (e.g., &lt;code&gt;config_ur.yaml&lt;/code&gt;) define page size, font range, distortions, and paths.&lt;/p&gt;

&lt;p&gt;Example Urdu config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;corpus_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resources/corpus/urdu.txt"&lt;/span&gt;
&lt;span class="na"&gt;font_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resources/font/ur/"&lt;/span&gt;
&lt;span class="na"&gt;page_width&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1240&lt;/span&gt;
&lt;span class="na"&gt;page_height&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1754&lt;/span&gt;
&lt;span class="na"&gt;min_font_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
&lt;span class="na"&gt;max_font_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;40&lt;/span&gt;
&lt;span class="na"&gt;rotate_angle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;-2&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;2&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;background_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resources/backgrounds/paper/"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Generating Synthetic Data
&lt;/h2&gt;

&lt;p&gt;Run the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;synthtiger &lt;span class="nt"&gt;-o&lt;/span&gt; ./outputs/synthdog_ur &lt;span class="nt"&gt;-c&lt;/span&gt; 1000 &lt;span class="nt"&gt;-w&lt;/span&gt; 8 &lt;span class="nt"&gt;-v&lt;/span&gt; template.py SynthDoG config_ur.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This generates 1,000 samples using 8 worker processes and writes the images and their text to &lt;code&gt;./outputs/synthdog_ur/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Repeat with &lt;code&gt;config_ar.yaml&lt;/code&gt;, &lt;code&gt;config_fa.yaml&lt;/code&gt;, etc. for multiple languages.&lt;/p&gt;
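
&lt;p&gt;If you generate several languages regularly, a small helper can build the same command per language code. This is only a sketch: it reuses the flags from the command above, the function name &lt;code&gt;build_command&lt;/code&gt; is our own, and it assumes your config files follow the &lt;code&gt;config_&amp;lt;lang&amp;gt;.yaml&lt;/code&gt; naming used in this post:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import shlex


def build_command(lang, count=1000, workers=8):
    """Build the synthtiger argument list for one language code."""
    return [
        "synthtiger",
        "-o", f"./outputs/synthdog_{lang}",
        "-c", str(count),
        "-w", str(workers),
        "-v", "template.py", "SynthDoG", f"config_{lang}.yaml",
    ]


if __name__ == "__main__":
    for lang in ("ur", "ar", "fa"):
        # Print only; pass the list to subprocess.run(..., check=True) to execute.
        print(shlex.join(build_command(lang)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;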




&lt;h2&gt;
  
  
  Formatting for Donut
&lt;/h2&gt;

&lt;p&gt;Donut expects each image to be paired with a &lt;strong&gt;JSON&lt;/strong&gt; ground truth. Structure your dataset like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my_dataset/
 ├─ train/
 │   ├─ metadata.jsonl
 │   ├─ 00000001.png
 │   └─ ...
 ├─ validation/
 │   └─ ...
 └─ test/
     └─ ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
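
&lt;p&gt;One way to fill these three folders is a reproducible random split of the generated file names. The sketch below is an illustration only: the 10% validation and 10% test ratios are our assumption, not a Donut requirement, and &lt;code&gt;split_samples&lt;/code&gt; is a hypothetical helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random


def split_samples(names, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle file names reproducibly and split them into three lists."""
    names = sorted(names)  # Sort first so the result depends only on the seed.
    random.Random(seed).shuffle(names)
    n_val = int(len(names) * val_ratio)
    n_test = int(len(names) * test_ratio)
    return {
        "validation": names[:n_val],
        "test": names[n_val:n_val + n_test],
        "train": names[n_val + n_test:],
    }


if __name__ == "__main__":
    files = [f"{i:08d}.png" for i in range(1, 1001)]
    for split, members in split_samples(files).items():
        print(split, len(members))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;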



&lt;p&gt;Each line in &lt;code&gt;metadata.jsonl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"file_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"00000001.png"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ground_truth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;gt_parse&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;text_sequence&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;یہ اردو کا متن ہے&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}}"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



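&lt;p&gt;Donut will tokenize this internally. Note that &lt;code&gt;ground_truth&lt;/code&gt; is itself a JSON-encoded string, which is why its inner quotes are escaped. Ensure that &lt;code&gt;file_name&lt;/code&gt; matches your image and that &lt;code&gt;text_sequence&lt;/code&gt; contains the RTL ground truth text.&lt;/p&gt;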
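
&lt;p&gt;A line in the shape shown above can be produced with the standard &lt;code&gt;json&lt;/code&gt; module. This is a minimal sketch; the helper names &lt;code&gt;metadata_line&lt;/code&gt; and &lt;code&gt;write_metadata&lt;/code&gt; are ours, and how you collect the (file name, text) pairs from the generator output is left to you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json


def metadata_line(file_name, text):
    """Serialize one sample in the metadata.jsonl shape shown above."""
    # ground_truth is a JSON string nested inside the outer JSON object.
    ground_truth = json.dumps(
        {"gt_parse": {"text_sequence": text}}, ensure_ascii=False
    )
    # ensure_ascii=False keeps Arabic-script text readable instead of \u escapes.
    return json.dumps(
        {"file_name": file_name, "ground_truth": ground_truth}, ensure_ascii=False
    )


def write_metadata(path, samples):
    """Write (file_name, text) pairs as one JSON object per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as handle:
        for file_name, text in samples:
            handle.write(metadata_line(file_name, text) + "\n")


if __name__ == "__main__":
    print(metadata_line("00000001.png", "یہ اردو کا متن ہے"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Encoding twice is deliberate: the outer object is what the dataset loader reads per line, while the inner string is decoded separately to recover &lt;code&gt;gt_parse&lt;/code&gt;.&lt;/p&gt;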




&lt;h2&gt;
  
  
  Advanced Tips
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layouts&lt;/strong&gt;: Customize &lt;code&gt;template.py&lt;/code&gt; for multi-column, headers, or tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Effects&lt;/strong&gt;: Add noise, blur, or perspective distortion in YAML for realism.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fonts&lt;/strong&gt;: Use multiple fonts per language to avoid overfitting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixed Scripts&lt;/strong&gt;: Include English corpora to simulate bilingual documents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling&lt;/strong&gt;: Generate 10k–100k samples to pre-train Donut effectively.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;With &lt;a href="https://github.com/aiviewz/Synthdog-RTL" rel="noopener noreferrer"&gt;SynthDoG-RTL&lt;/a&gt; you can quickly bootstrap synthetic OCR datasets for RTL languages such as Arabic, Urdu, Persian, and Hebrew. Once formatted as described above, the generated data can be used with Donut, letting you train or fine-tune document understanding models even in low-resource settings.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/clovaai/donut" rel="noopener noreferrer"&gt;Donut GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aiviewz/Synthdog-RTL" rel="noopener noreferrer"&gt;SynthDoG-RTL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/clovaai/synthtiger" rel="noopener noreferrer"&gt;SynthTIGER&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.aiviewz.com/posts/how-to-create-synthetic-dataset-for-donut-ocr-for-your-custom-language" rel="noopener noreferrer"&gt;SynthDoG-RTL tutorial blog post&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>ocr</category>
      <category>computervision</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
