<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bato</title>
    <description>The latest articles on DEV Community by Bato (@kaniel_outis).</description>
    <link>https://dev.to/kaniel_outis</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3761117%2F99cec290-03c2-4e65-8a70-546ce8545dbb.jpeg</url>
      <title>DEV Community: Bato</title>
      <link>https://dev.to/kaniel_outis</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kaniel_outis"/>
    <language>en</language>
    <item>
      <title>How Well Can OCR Read Doctor Handwriting in 2026?</title>
      <dc:creator>Bato</dc:creator>
      <pubDate>Tue, 07 Apr 2026 07:18:06 +0000</pubDate>
      <link>https://dev.to/kaniel_outis/how-well-can-ocr-read-doctor-handwriting-in-2026-54hn</link>
      <guid>https://dev.to/kaniel_outis/how-well-can-ocr-read-doctor-handwriting-in-2026-54hn</guid>
      <description>&lt;p&gt;&lt;em&gt;Benchmarking four open-source OCR engines on 5,578 handwritten medical prescriptions&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PP-OCRv5 (5M parameters) and GLM-OCR (0.9B parameters) both achieve 20%+ exact match on handwritten prescriptions, a 10x jump over Tesseract and EasyOCR&lt;/li&gt;
&lt;li&gt;GLM-OCR leads on character accuracy (CER 0.328), while PP-OCRv5 leads on word accuracy (WER 0.789)&lt;/li&gt;
&lt;li&gt;A 5M-parameter model trained on curated data rivals a 900M-parameter vision-language model&lt;/li&gt;
&lt;li&gt;Neither engine is clinically deployable yet: even the best gets only 1 in 3 words exactly right&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;p&gt;Last month I spent some time squinting at prescription scans, trying to figure out if a doctor wrote &lt;em&gt;Amoxicillin&lt;/em&gt; or &lt;em&gt;Amitriptyline&lt;/em&gt;. I got it wrong twice. That got me wondering: how would today's OCR engines handle this?&lt;/p&gt;

&lt;p&gt;The stakes are real. Medication errors injure approximately 1.3 million people annually in the United States alone and cost an estimated $42 billion globally (&lt;a href="https://www.who.int/news/item/29-03-2017-who-launches-global-effort-to-halve-medication-related-errors-in-5-years" rel="noopener noreferrer"&gt;WHO&lt;/a&gt;, 2017). Illegible handwriting is a well-documented contributor: 35.7% of handwritten prescriptions contain errors, compared to just 2.5% of electronic ones (&lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC4281619/" rel="noopener noreferrer"&gt;Albarrak et al.&lt;/a&gt;, 2014).&lt;/p&gt;

&lt;p&gt;A study of 4,183 prescriptions found that 10.21% were illegible and a further 19.39% barely legible (&lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10686667/" rel="noopener noreferrer"&gt;Albalushi et al.&lt;/a&gt;, 2023). The global OCR market is projected to reach $32.9 billion by 2030, growing at a 14.8% CAGR (&lt;a href="https://www.grandviewresearch.com/industry-analysis/optical-character-recognition-market" rel="noopener noreferrer"&gt;Grand View Research&lt;/a&gt;, 2025), with healthcare among its fastest-growing verticals. But I couldn't find a public benchmark comparing today's open-source OCR engines on handwritten medical text.&lt;/p&gt;

&lt;p&gt;So I ran one myself. I tested four engines on &lt;strong&gt;5,578 handwritten prescription word images&lt;/strong&gt;, and the results surprised me.&lt;/p&gt;




&lt;h2&gt;Who Are the Four Contenders?&lt;/h2&gt;

&lt;p&gt;These four engines span three generations of OCR thinking, from traditional pattern matching to specialized deep learning to generative vision-language models.&lt;/p&gt;

&lt;h3&gt;Tesseract: The Veteran&lt;/h3&gt;

&lt;p&gt;Google-backed and over 18 years old, Tesseract is the default OCR engine for a generation of developers. It uses an LSTM-based architecture designed primarily for printed text. Stable, well-documented, and runs everywhere, but handwritten cursive is not its strength.&lt;/p&gt;

&lt;h3&gt;EasyOCR: The Accessible One&lt;/h3&gt;

&lt;p&gt;Built on a CRNN (Convolutional Recurrent Neural Network) architecture with roughly &lt;strong&gt;10 million parameters&lt;/strong&gt;, EasyOCR's selling point is simplicity: &lt;code&gt;pip install easyocr&lt;/code&gt; and you're recognizing text in 80+ languages. It uses deep learning but remains a traditional detection-recognition pipeline.&lt;/p&gt;

&lt;h3&gt;PP-OCRv5: The Data-Centric Specialist&lt;/h3&gt;

&lt;p&gt;Baidu's latest, with just &lt;strong&gt;5 million parameters&lt;/strong&gt;. PP-OCRv5 uses an SVTR_LCNet architecture with a Guided Training of CTC (GTC) strategy. The real innovation isn't the architecture, though. It's the training data.&lt;/p&gt;

&lt;p&gt;The PP-OCRv5 paper (&lt;a href="https://arxiv.org/abs/2603.24373" rel="noopener noreferrer"&gt;Cui et al.&lt;/a&gt;, 2026) shows that &lt;em&gt;data quality trumps model scale&lt;/em&gt;. They curated &lt;strong&gt;22.6 million training samples&lt;/strong&gt; by filtering along three dimensions. First, &lt;strong&gt;difficulty&lt;/strong&gt;: they use model confidence as a proxy and found that samples in the [0.95, 0.97] range hit a sweet spot, hard enough to teach the model something new, but not so hard that the labels are unreliable. Second, &lt;strong&gt;accuracy&lt;/strong&gt;: they cross-check predictions against labels to weed out mislabeled samples. Third, &lt;strong&gt;diversity&lt;/strong&gt;: they cluster training images into 1,000 visual groups using CLIP embeddings and ensure each cluster is represented. Together, these filters yielded 2-3x improvements in handwritten recognition from v3 to v5 without changing the model architecture.&lt;/p&gt;
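The difficulty filter is easy to picture in code. Here is a minimal sketch of the idea, assuming a list of samples carrying recognizer confidences; the function name, filenames, labels, and confidence values are all illustrative, not from the paper:

```python
# Sketch of PP-OCRv5-style difficulty filtering: keep samples whose
# recognizer confidence falls in the [0.95, 0.97] "sweet spot":
# hard enough to be informative, confident enough to trust the label.
def in_difficulty_window(confidence, low=0.95, high=0.97):
    return low <= confidence <= high

# Illustrative samples (filenames, labels, and confidences are made up).
samples = [
    {"image": "rx_001.png", "label": "Amoxicillin", "conf": 0.99},  # too easy
    {"image": "rx_002.png", "label": "Metformin",   "conf": 0.96},  # in window
    {"image": "rx_003.png", "label": "Ceftriaxone", "conf": 0.80},  # label unreliable
]

curated = [s for s in samples if in_difficulty_window(s["conf"])]
```

In this toy run only the 0.96-confidence sample survives; the real pipeline applies this alongside the accuracy and diversity filters described above.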

&lt;h3&gt;GLM-OCR: The Compact Vision-Language Model&lt;/h3&gt;

&lt;p&gt;From Zhipu AI and Tsinghua University, GLM-OCR takes a fundamentally different approach. It's a &lt;strong&gt;0.9-billion-parameter&lt;/strong&gt; multimodal model combining a 0.4B CogViT vision encoder with a 0.5B GLM language decoder (&lt;a href="https://arxiv.org/abs/2603.10910" rel="noopener noreferrer"&gt;Duan et al.&lt;/a&gt;, 2026). Rather than traditional CTC or attention-based sequence recognition, it &lt;em&gt;generates&lt;/em&gt; text autoregressively, like a language model that reads images.&lt;/p&gt;

&lt;p&gt;An important note: 0.9B is &lt;strong&gt;compact&lt;/strong&gt; for a vision-language model. For comparison, Qwen3-VL has 235 billion parameters and GPT-4o is even larger. GLM-OCR was designed for efficiency, using Multi-Token Prediction (MTP) to generate approximately 5.2 tokens per decoding step, yielding a roughly 50% throughput improvement over standard autoregressive generation. It's trained through a 4-stage pipeline that includes supervised fine-tuning and GRPO reinforcement learning.&lt;/p&gt;

&lt;p&gt;These four engines represent a clear spectrum: &lt;strong&gt;traditional&lt;/strong&gt; (Tesseract), &lt;strong&gt;specialized deep learning&lt;/strong&gt; (EasyOCR, PP-OCRv5), and &lt;strong&gt;generative VLM&lt;/strong&gt; (GLM-OCR). &lt;/p&gt;




&lt;h2&gt;What Makes the RxHandBD Dataset So Hard?&lt;/h2&gt;

&lt;p&gt;We use &lt;strong&gt;RxHandBD&lt;/strong&gt; (&lt;a href="https://data.mendeley.com/datasets" rel="noopener noreferrer"&gt;Shovon et al.&lt;/a&gt;, Mendeley Data), a dataset of 5,578 cropped word images extracted from handwritten medical prescriptions written by doctors in Bangladesh. Each image contains a single word with a corresponding ground-truth label.&lt;/p&gt;

&lt;p&gt;This dataset is hard for four reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Doctor's handwriting.&lt;/strong&gt; Enough said.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medical terminology.&lt;/strong&gt; Drug names like "Amoxicillin" and "Metformin" alongside dosage notations like "5% dns" and "1+0+1."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixed language.&lt;/strong&gt; English medical terms interspersed with Bangla script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistent quality.&lt;/strong&gt; Varying paper backgrounds, pen types, and image capture conditions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here are a few samples to give you a sense of the difficulty range:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjtlg5pbxm179b0val0h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjtlg5pbxm179b0val0h.png" alt="OCR Samples" width="800" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;From top to bottom: a drug name both modern engines nail ("Ronem"), cases where only one engine succeeds ("Vineet" and "Eylox"), and a close call where GLM-OCR gets closest but still misses ("Ambrox").&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;One important methodological note: these are &lt;strong&gt;pre-cropped word-level images&lt;/strong&gt;. We're isolating the &lt;strong&gt;recognition&lt;/strong&gt; half of the OCR pipeline, not testing detection (locating text regions on a full page). This means our results reflect recognition accuracy only. Real-world performance also depends on how well each engine detects text regions before recognizing them.&lt;/p&gt;




&lt;h2&gt;How Did We Run the Benchmark?&lt;/h2&gt;

&lt;h3&gt;Metrics&lt;/h3&gt;

&lt;p&gt;We evaluate using four complementary metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CER (Character Error Rate):&lt;/strong&gt; Edit distance between predicted and reference strings, normalized by reference length. If the label says "Amoxicillin" (11 characters) and the OCR outputs "Amoxicilin" (1 deletion), the CER is 1/11 = 0.09. Lower is better.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WER (Word Error Rate):&lt;/strong&gt; Same concept at the word level. Any mistake in a word counts the entire word as wrong. Lower is better.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exact Match Rate:&lt;/strong&gt; The strictest metric. Did the OCR output match the ground truth character-for-character after normalization? Higher is better.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency:&lt;/strong&gt; Wall-clock milliseconds per image, measuring practical throughput.&lt;/li&gt;
&lt;/ul&gt;
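To make the metrics concrete, here is a minimal pure-Python sketch of CER and exact match, using the worked example above. The normalization step (lowercasing plus whitespace stripping) is my assumption; the benchmark only says "after normalization":

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]

def cer(prediction: str, reference: str) -> float:
    """Character Error Rate: edit distance normalized by reference length."""
    return edit_distance(prediction, reference) / len(reference)

def exact_match(prediction: str, reference: str) -> bool:
    """Strict equality after a simple (assumed) normalization."""
    return prediction.strip().lower() == reference.strip().lower()

# Worked example from the text: one deleted 'l' in an 11-character word.
assert round(cer("Amoxicilin", "Amoxicillin"), 2) == 0.09
```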

&lt;h3&gt;Setup&lt;/h3&gt;

&lt;p&gt;All engines run on an Apple Silicon Mac (CPU only). Single run. Default configurations for each engine with no fine-tuning or domain-specific adjustments. PP-OCRv5 runs its full detection + recognition pipeline even on pre-cropped images. GLM-OCR processes each image with the prompt "Text Recognition:".&lt;/p&gt;
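The harness itself isn't published with the article, so here is a minimal sketch of what such a loop might look like, with the engine call stubbed out (any real engine would be dropped in as the `recognize` callable; the lowercase-and-strip exact-match normalization is an assumption):

```python
import time

def run_benchmark(images, labels, recognize):
    """Run one engine over pre-cropped word images, recording per-image
    wall-clock latency and the exact-match rate."""
    latencies_ms, exact = [], 0
    for img, label in zip(images, labels):
        t0 = time.perf_counter()
        pred = recognize(img)  # engine-specific call goes here
        latencies_ms.append((time.perf_counter() - t0) * 1000)
        exact += pred.strip().lower() == label.strip().lower()
    return {
        "exact_match_rate": exact / len(labels),
        "mean_latency_ms": sum(latencies_ms) / len(latencies_ms),
    }

# Stub standing in for Tesseract / EasyOCR / PP-OCRv5 / GLM-OCR.
stats = run_benchmark(["a.png", "b.png"], ["Ronem", "Eylox"],
                      recognize=lambda img: "Ronem")
```

The stub gets one of two words right, so `exact_match_rate` comes out at 0.5; swapping in a real engine only changes the `recognize` argument.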




&lt;h2&gt;What Do the Numbers Say?&lt;/h2&gt;

&lt;p&gt;GLM-OCR achieves the lowest character error rate at 0.328, while PP-OCRv5 leads on word-level accuracy with a WER of 0.789. Both modern engines dramatically outperform the older generation, with exact match rates 8-13x higher than Tesseract or EasyOCR.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Engine&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;CER (lower=better)&lt;/th&gt;
&lt;th&gt;WER (lower=better)&lt;/th&gt;
&lt;th&gt;Exact Match (higher=better)&lt;/th&gt;
&lt;th&gt;Latency (ms/img)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tesseract&lt;/td&gt;
&lt;td&gt;LSTM&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;0.785&lt;/td&gt;
&lt;td&gt;1.043&lt;/td&gt;
&lt;td&gt;2.5%&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EasyOCR&lt;/td&gt;
&lt;td&gt;CRNN&lt;/td&gt;
&lt;td&gt;~10M&lt;/td&gt;
&lt;td&gt;0.695&lt;/td&gt;
&lt;td&gt;1.074&lt;/td&gt;
&lt;td&gt;2.6%&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PP-OCRv5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SVTR+CTC&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.477&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.789&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;21.4%&lt;/td&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GLM-OCR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VLM&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.9B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.328&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.801&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;32.6%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;141&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff7aqwpajca70pmcs3hjl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff7aqwpajca70pmcs3hjl.png" alt="CER Comparison" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fueyrt23geii16juchya0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fueyrt23geii16juchya0.png" alt="Accuracy vs Latency" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let me walk through what these numbers actually mean.&lt;/p&gt;

&lt;h3&gt;Finding 1: Tesseract and EasyOCR Barely Function on Handwriting&lt;/h3&gt;

&lt;p&gt;Both Tesseract and EasyOCR achieve under &lt;strong&gt;3% exact match&lt;/strong&gt;, meaning they get only about 1 in 40 words perfectly right. Their WER exceeds 1.0, which means on average they produce &lt;em&gt;more errors than there are words&lt;/em&gt;. For practical purposes, these engines are unusable on handwritten medical text.&lt;/p&gt;

&lt;p&gt;This isn't a knock on either project. Tesseract's LSTM architecture was optimized for printed text, and EasyOCR's CRNN similarly excels on clean, well-formatted inputs. Handwritten medical cursive is simply a different problem.&lt;/p&gt;

&lt;h3&gt;Finding 2: The Modern Engines Are a Generational Leap&lt;/h3&gt;

&lt;p&gt;PP-OCRv5 and GLM-OCR both break the 20% exact match barrier, a qualitative jump from the sub-3% performance of the older engines. The gap between "old" and "new" (a roughly 10x improvement in exact match) is far larger than the gap between PP-OCRv5 and GLM-OCR themselves.&lt;/p&gt;

&lt;h3&gt;Finding 3: Why Does GLM-OCR Win on Characters but Lose on Words?&lt;/h3&gt;

&lt;p&gt;This is the most interesting finding. GLM-OCR achieves a CER of &lt;strong&gt;0.328&lt;/strong&gt;, which is 31% lower than PP-OCRv5's 0.477. At the character level, the vision-language approach genuinely helps. GLM-OCR can use its language decoder's knowledge of likely character sequences to infer partially visible characters.&lt;/p&gt;

&lt;p&gt;But PP-OCRv5 edges ahead on WER: &lt;strong&gt;0.789 vs 0.801&lt;/strong&gt;. It makes fewer word-level mistakes.&lt;/p&gt;

&lt;p&gt;Why the divergence? My hypothesis: GLM-OCR's autoregressive generation occasionally produces subtle extra tokens or formatting variations. The GLM-OCR paper itself acknowledges "minor stochastic variation in formatting behaviors, particularly in line breaks and whitespace handling" (&lt;a href="https://arxiv.org/abs/2603.10910" rel="noopener noreferrer"&gt;Duan et al.&lt;/a&gt;, 2026). These small artifacts barely affect CER but can flip a word from "correct" to "incorrect" in WER/exact-match scoring.&lt;/p&gt;

&lt;p&gt;The practical takeaway: &lt;strong&gt;which engine is "better" depends on your error metric.&lt;/strong&gt; If you care about getting as close as possible character-by-character (for downstream spell-correction, for example), GLM-OCR wins. If you need clean word-level outputs with minimal postprocessing, PP-OCRv5 has the edge.&lt;/p&gt;

&lt;h3&gt;Finding 4: What Are the Real Deployment Trade-offs?&lt;/h3&gt;

&lt;p&gt;Both engines are fast enough for practical use. PP-OCRv5 processes images at &lt;strong&gt;103ms each&lt;/strong&gt; and GLM-OCR at &lt;strong&gt;141ms&lt;/strong&gt;. GLM-OCR's Multi-Token Prediction (generating roughly 5.2 tokens per step instead of one) keeps its inference speed competitive despite the larger model.&lt;/p&gt;

&lt;p&gt;The bigger difference is &lt;strong&gt;model size and memory footprint&lt;/strong&gt;. PP-OCRv5's 5M parameters take up roughly 20MB on disk, small enough for a Raspberry Pi or embedded device. GLM-OCR's 0.9B parameters need around 1.8GB at FP16 (less with quantization). Both run on CPU without a GPU, as our Apple Silicon benchmark shows, but GLM-OCR consumes significantly more RAM. The GLM-OCR paper notes that the model "enables deployment in both large-scale and resource-constrained edge scenarios" and supports frameworks like Ollama for local inference (&lt;a href="https://arxiv.org/abs/2603.10910" rel="noopener noreferrer"&gt;Duan et al.&lt;/a&gt;, 2026). For deploying across thousands of low-spec workstations, PP-OCRv5's small footprint is a clear advantage. For a centralized server or any machine with a few GB of RAM to spare, GLM-OCR is equally practical.&lt;/p&gt;




&lt;h2&gt;What Do the Predictions Actually Look Like?&lt;/h2&gt;

&lt;p&gt;Numbers tell one story. Seeing the actual predictions tells another. Here are four representative samples from the benchmark, hand-picked to illustrate each finding above.&lt;/p&gt;

&lt;h3&gt;GLM-OCR Nails a Complex Drug Name&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff50vz8qzjypfxds957f8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff50vz8qzjypfxds957f8.png" alt="GLM-OCR nails " width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;"Ceftriaxone" (a common antibiotic), 11 characters long. Tesseract reads "Wau" and EasyOCR produces "@@FNRH," both useless. PP-OCRv5 gets close with "CEFRAXONE" (CER=0.18, missing the 'ti'). Only GLM-OCR reads it perfectly. On long drug names, the language decoder's knowledge of likely character sequences gives it a real edge.&lt;/p&gt;

&lt;h3&gt;When PP-OCRv5 Is Closer and GLM-OCR Hallucinates&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzhur0d9njtprem6cvds.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzhur0d9njtprem6cvds.png" alt="PP-OCRv5 closer on " width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;"Zolfin" (a proton pump inhibitor). PP-OCRv5 reads "zolfiu" (CER=0.17, just the last character wrong). GLM-OCR outputs "2016'u" (CER=1.0), completely misreading the word as a number. This is the flip side of the VLM approach: when the handwriting doesn't match patterns in the training data, the language model can steer the output in the wrong direction entirely.&lt;/p&gt;

&lt;h3&gt;GLM-OCR Nearly Perfect&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45x5mr6cgpg2r54q6xmy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45x5mr6cgpg2r54q6xmy.png" alt="GLM-OCR nearly perfect on " width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;"Pantonix" (a pantoprazole brand). GLM-OCR outputs "Pantomix" (CER=0.125, one character off). PP-OCRv5 gets "Pantomy" (CER=0.375). Both are close, but GLM-OCR is three times more accurate by CER. Neither gets an exact match, which is the pattern behind Finding 3: GLM-OCR consistently gets closer character-by-character, even when neither engine gets the word exactly right.&lt;/p&gt;

&lt;h3&gt;When the VLM Goes Off the Rails&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8fsukpn1xj6tw0jve4c7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8fsukpn1xj6tw0jve4c7.png" alt="GLM-OCR outputs LaTeX for " width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;"Fexo" (fexofenadine, an antihistamine). EasyOCR reads "Texo" (CER=0.25) and PP-OCRv5 reads "-exo" (CER=0.25), both reasonable attempts. GLM-OCR outputs LaTeX math notation: &lt;code&gt;$ \sqrt{e} x_{0} $&lt;/code&gt; (CER=4.75). This is a rare but real failure mode of generative VLMs: the model interprets the handwriting as a math expression instead of text. It produced more characters than the ground truth has, which is why CER exceeds 1.0.&lt;/p&gt;




&lt;h2&gt;What Are the Limitations?&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single dataset, single run.&lt;/strong&gt; RxHandBD is one dataset of Bangladeshi prescriptions. US, European, or East Asian handwriting styles may produce different rankings. We don't have confidence intervals from multiple runs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Word-level only.&lt;/strong&gt; We tested recognition on pre-cropped word images, not full-page detection + recognition. Real-world performance depends on the complete pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU-only.&lt;/strong&gt; All engines ran on CPU. GPU acceleration could significantly change the latency picture, particularly for GLM-OCR.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default configs.&lt;/strong&gt; No engine was fine-tuned on medical data. Domain-specific adaptation could improve all of them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No clinical validation.&lt;/strong&gt; OCR accuracy and clinical safety are different things. A 32.6% exact match rate is impressive for research, but not nearly sufficient for automated prescription processing without human review.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Modern OCR has made a genuine leap on handwritten medical text. &lt;strong&gt;PP-OCRv5&lt;/strong&gt; (5M parameters, best word-level accuracy) and &lt;strong&gt;GLM-OCR&lt;/strong&gt; (0.9B parameters, best character-level accuracy) both dramatically outperform Tesseract and EasyOCR.&lt;/p&gt;

&lt;p&gt;The two champions represent fundamentally different design philosophies: a data-centric specialized pipeline vs. a compact vision-language model. Yet they arrive at remarkably similar performance levels. Both are open-source and practically deployable.&lt;/p&gt;

&lt;p&gt;For practitioners building healthcare OCR systems: these two engines deserve serious evaluation. Start with your specific error tolerance, hardware constraints, and whether you need word-level recognition or full-page document understanding.&lt;/p&gt;

&lt;p&gt;For researchers: this is a domain with high clinical impact and, as these results show, plenty of room for improvement. Even the best engine here gets only 1 in 3 words exactly right on doctor handwriting. There's real work left to do.&lt;/p&gt;

&lt;p&gt;What datasets or engines should I test next? Let me know in the comments.&lt;/p&gt;




&lt;h2&gt;References&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;World Health Organization. "WHO launches global effort to halve medication-related errors in 5 years." March 29, 2017. &lt;a href="https://www.who.int/news/item/29-03-2017-who-launches-global-effort-to-halve-medication-related-errors-in-5-years" rel="noopener noreferrer"&gt;https://www.who.int/news/item/29-03-2017-who-launches-global-effort-to-halve-medication-related-errors-in-5-years&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Albarrak AI, Al Rashidi EA, Fatani RK, Al Ageel SI, Mohammed R. "Assessment of legibility and completeness of handwritten and electronic prescriptions." &lt;em&gt;Saudi Pharmaceutical Journal&lt;/em&gt;, 2014. &lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC4281619/" rel="noopener noreferrer"&gt;https://pmc.ncbi.nlm.nih.gov/articles/PMC4281619/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Albalushi AK, et al. "Assessment of Legibility of Handwritten Prescriptions and Adherence to W.H.O. Prescription Writing Guidelines." &lt;em&gt;J. of Pharmaceutical Research International&lt;/em&gt;, 2023. &lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10686667/" rel="noopener noreferrer"&gt;https://pmc.ncbi.nlm.nih.gov/articles/PMC10686667/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cui C, Zhang Y, Sun T, et al. "PP-OCRv5: A Specialized 5M-Parameter Model Rivaling Billion-Parameter Vision-Language Models on OCR Tasks." arXiv:2603.24373, March 2026. &lt;a href="https://arxiv.org/abs/2603.24373" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2603.24373&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Duan S, Xue Y, Wang W, et al. "GLM-OCR Technical Report." arXiv:2603.10910, March 2026. &lt;a href="https://arxiv.org/abs/2603.10910" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2603.10910&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shovon MSH, et al. "RxHandBD: A Handwritten Prescription Recognition Dataset from Bangladesh." &lt;em&gt;Mendeley Data&lt;/em&gt;. &lt;a href="https://data.mendeley.com/datasets" rel="noopener noreferrer"&gt;https://data.mendeley.com/datasets&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Grand View Research. "Optical Character Recognition Market Analysis." 2025. &lt;a href="https://www.grandviewresearch.com/industry-analysis/optical-character-recognition-market" rel="noopener noreferrer"&gt;https://www.grandviewresearch.com/industry-analysis/optical-character-recognition-market&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Benchmark conducted independently. No affiliation with any OCR project. All engines evaluated using default configurations.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;About the author:&lt;/strong&gt; Botao Deng is an ML/AI engineer and researcher who builds and evaluates production models. &lt;a href="https://github.com/robot010" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ocr</category>
      <category>medical</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>To the Programmer Quietly Drowning in AI Anxiety</title>
      <dc:creator>Bato</dc:creator>
      <pubDate>Sun, 22 Feb 2026 07:22:04 +0000</pubDate>
      <link>https://dev.to/kaniel_outis/to-the-programmer-quietly-drowning-in-ai-anxiety-42pm</link>
      <guid>https://dev.to/kaniel_outis/to-the-programmer-quietly-drowning-in-ai-anxiety-42pm</guid>
      <description>&lt;p&gt;A quiet word for those who feel like they’re falling behind&lt;/p&gt;




&lt;p&gt;Let me guess how your morning went.&lt;/p&gt;

&lt;p&gt;You opened your phone, scrolled through some tech feed: Twitter, Hacker News, Reddit, whatever your poison is, and within thirty seconds, you saw someone claim they built an entire SaaS product over the weekend using nothing but prompts and vibes. Then you saw a thread about a new model that makes the one you just learned obsolete. Then a CEO somewhere declared that software engineers have maybe five good years left.&lt;/p&gt;

&lt;p&gt;You put your phone down. You picked up your coffee. And somewhere between the first sip and the second, a familiar knot tightened in your chest.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Am I falling behind? Should I be doing more? Is everything I've built going to be worthless?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yeah. I know that feeling. I want to talk about it.&lt;/p&gt;




&lt;h2&gt;The Treadmill That Never Stops&lt;/h2&gt;

&lt;p&gt;The pace right now is genuinely absurd. It’s not just fast, it’s &lt;em&gt;disorienting&lt;/em&gt;. Last month’s breakthrough is this month’s footnote. You barely finish a tutorial on one framework before the community has already moved on to something shinier. The vocabulary alone is exhausting: RAG, LoRA, Agents, MCP, function calling, each one demanding your attention like a toddler pulling at your sleeve.&lt;/p&gt;

&lt;p&gt;And the showcase culture makes it worse. Every feed is a highlight reel. Everyone seems to be shipping, building, launching. Nobody posts about the afternoon they spent confused, reading the same documentation page four times. Nobody talks about the tools they tried that turned out to be useless.&lt;/p&gt;

&lt;p&gt;It creates this illusion that there's a speeding train, and everyone is on it except you.&lt;/p&gt;




&lt;h2&gt;But Here's What I've Learned From Watching a Few Trains Go By&lt;/h2&gt;

&lt;p&gt;I've been in tech long enough to remember when the shift from classical machine learning to deep learning felt like the sky was falling. People who had spent a decade perfecting feature engineering, tuning gradient-boosted trees, building meticulous pipelines — they woke up one day and the entire conversation had moved to neural networks. A decade of expertise suddenly felt quaint.&lt;/p&gt;

&lt;p&gt;Then deep learning itself went through its own upheavals. CNNs gave way to RNNs, then LSTMs, then attention mechanisms, then Transformers swallowed everything whole. At each turn, someone's specialty became a paragraph in a history chapter.&lt;/p&gt;

&lt;p&gt;Then came BERT, then GPT, and suddenly pre-training plus fine-tuning was the only game in town. Another reshuffling. Another wave of existential dread.&lt;/p&gt;

&lt;p&gt;You know what I noticed, though? The people who came through all of that, the ones who are still here and still relevant, they weren't the ones who had the best grip on any single technology. They were the ones who had learned how to &lt;em&gt;learn&lt;/em&gt;. They had developed a kind of peripheral vision for change: the ability to sense what mattered, what was temporary, and when to invest their energy.&lt;/p&gt;

&lt;p&gt;That skill set doesn't expire.&lt;/p&gt;




&lt;h2&gt;Not Every Wave Deserves Your Weekend&lt;/h2&gt;

&lt;p&gt;Here's something nobody tells you when you're in the thick of it: the shelf life of most technical hype is shockingly short. The vast majority of tools, frameworks, and paradigms that seem world-ending today will be footnotes in two years. Some of them will be footnotes in six months.&lt;/p&gt;

&lt;p&gt;This doesn't mean none of it matters. It means &lt;em&gt;not all of it matters equally&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;And if you try to sprint after every single thing, if you treat every new announcement as a personal emergency, you will burn out. That's not a motivational cliché. It's a mechanical fact. Human beings are not designed to sustain a permanent state of urgency.&lt;/p&gt;

&lt;p&gt;The more useful discipline isn't relentless pursuit. It's &lt;em&gt;discernment&lt;/em&gt;. Learning to sit with the noise long enough to separate the signal. Asking: is this a real shift in how problems get solved, or is this just a new coat of paint on an old idea? Is this changing the &lt;em&gt;questions&lt;/em&gt; we ask, or just the tools we use to answer them?&lt;/p&gt;

&lt;p&gt;That kind of judgment is slow to build. But it's the thing that compounds.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Don't Think We're Getting Replaced
&lt;/h2&gt;

&lt;p&gt;I've heard the "programmers are done" narrative enough times to have an opinion on it, so here's mine: I think it's mostly wrong, and wrong in an interesting way.&lt;/p&gt;

&lt;p&gt;The argument assumes that programming is fundamentally about producing code, and if a machine can produce code faster, then programmers lose. But that was never quite right. The hard part of software was never typing. It was &lt;em&gt;figuring out what to type&lt;/em&gt;. Understanding messy requirements. Navigating system constraints. Making tradeoffs that don't have clean answers. Debugging not just logic errors, but &lt;em&gt;conceptual&lt;/em&gt; errors, the kind where the code works perfectly and the product is still wrong.&lt;/p&gt;

&lt;p&gt;AI is extraordinary at generation. It's getting better at reasoning. But it still needs someone to point it at the right problem, to validate its output against reality, to integrate it into systems that have history and politics and technical debt. That "someone" looks a lot like an engineer to me.&lt;/p&gt;

&lt;p&gt;And here's the irony that I think gets lost in the panic: programmers are already the people &lt;em&gt;closest&lt;/em&gt; to this technology. &lt;strong&gt;We're the ones working with the models every day, feeling out their edges, learning their failure modes. The anxiety often comes from proximity: when you're standing right under a wave, it looks like it's going to crush you.&lt;/strong&gt; But proximity is also an advantage. We're not watching this from the shore. We're already in the water.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Case for Going Slow
&lt;/h2&gt;

&lt;p&gt;I want to end with something that might sound counterintuitive in an industry obsessed with speed.&lt;/p&gt;

&lt;p&gt;It's okay to be slow.&lt;/p&gt;

&lt;p&gt;It's okay to not have an opinion on the model that dropped yesterday. It's okay to skip a hype cycle. It's okay to spend your weekend doing something that has nothing to do with AI and not feel guilty about it.&lt;/p&gt;

&lt;p&gt;The people who build lasting careers in technology aren't the ones who mass-produce side projects on every trending tool. They're the ones who develop &lt;em&gt;taste&lt;/em&gt;, a quiet, hard-won instinct for what matters and what doesn't. That kind of taste doesn't come from chasing everything. It comes from watching patiently, choosing deliberately, and trusting that you don't have to catch every wave to have a good ride.&lt;/p&gt;

&lt;p&gt;So if the anxiety has been getting to you, if you've been lying awake wondering whether your skills still matter, whether you're doing enough, whether the ground beneath you is about to give way, let me say this plainly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You are not behind. You are in the middle of a very loud, very confusing moment. And loud, confusing moments always feel more permanent than they are.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The wave will keep moving. So will you. And at your own pace, in your own way, you'll find where you stand.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I wrote this as much for myself as for anyone else. If it landed, I'd love to hear what you're going through. I suspect a lot more of us feel this way than the highlight reels suggest.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>genai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>A Beginner's Guide to Multi-Agent Systems: How AI Agents Work Together</title>
      <dc:creator>Bato</dc:creator>
      <pubDate>Sat, 21 Feb 2026 18:08:55 +0000</pubDate>
      <link>https://dev.to/kaniel_outis/a-beginners-guide-to-multi-agent-systems-how-ai-agents-work-together-d43</link>
      <guid>https://dev.to/kaniel_outis/a-beginners-guide-to-multi-agent-systems-how-ai-agents-work-together-d43</guid>
      <description>&lt;p&gt;You've probably heard the term "AI agents" thrown around a lot lately. But recently, a new idea has been taking over engineering discussions: &lt;strong&gt;multi-agent systems&lt;/strong&gt;. Not one AI doing everything: but a team of AIs, each with a specific job, collaborating to tackle complex problems.&lt;/p&gt;

&lt;p&gt;Here's a surprise: if you've ever used &lt;strong&gt;Claude Code&lt;/strong&gt; to refactor a large codebase or fix a tricky bug, you've already seen a multi-agent system at work, you just might not have known it.&lt;/p&gt;

&lt;p&gt;If that sounds complicated, don't worry. By the end of this guide, you'll understand what multi-agent systems are, why they matter, and how to build a simple one yourself (no PhD required).&lt;/p&gt;




&lt;h2&gt;
  
  
  First: What Even Is an "Agent"?
&lt;/h2&gt;

&lt;p&gt;Before we go multi, let's make sure we're clear on what a single agent is.&lt;/p&gt;

&lt;p&gt;A traditional LLM (like GPT or Claude) takes input and produces output — one shot, done. An &lt;strong&gt;agent&lt;/strong&gt; goes further: it can &lt;strong&gt;reason&lt;/strong&gt;, &lt;strong&gt;use tools&lt;/strong&gt;, and &lt;strong&gt;take actions in a loop&lt;/strong&gt; until a goal is completed.&lt;/p&gt;

&lt;p&gt;Think of it this way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LLM&lt;/strong&gt;: "Here's a summary of that article."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent&lt;/strong&gt;: "I'll search the web for that article, read it, cross-check it with two other sources, and then give you a summary with citations."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agents typically follow a loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Observe → Think → Act → Observe again → ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A common implementation looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_final_answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

        &lt;span class="c1"&gt;# The LLM decided to use a tool
&lt;/span&gt;        &lt;span class="n"&gt;tool_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple enough. So why do we need multiple agents?&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With One Agent Doing Everything
&lt;/h2&gt;

&lt;p&gt;Imagine you ask a single agent to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Research our top 3 competitors, write a market analysis report, and then draft 5 LinkedIn posts based on it."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's three very different jobs: researcher, analyst, copywriter. Cramming all of that into one agent creates real problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context window overload&lt;/strong&gt; — Long tasks fill up the LLM's memory fast, causing it to "forget" earlier steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lack of specialization&lt;/strong&gt; — An agent trying to do everything tends to do nothing particularly well.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard to debug&lt;/strong&gt; — When something goes wrong, you don't know which "part" failed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No parallelism&lt;/strong&gt; — One agent does things one at a time. What if subtasks could run simultaneously?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the problem multi-agent systems solve.&lt;/p&gt;
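&lt;p&gt;The parallelism point is worth seeing in code. Here's a minimal sketch using Python's standard library; &lt;code&gt;research_competitor&lt;/code&gt; is a hypothetical stub standing in for an LLM-backed subagent:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def research_competitor(name: str) -> str:
    # Stub for an LLM-backed research subagent; a real one would
    # call a model plus web-search tools here.
    return f"Summary of {name}"

def run_in_parallel(competitors: list) -> list:
    # Each competitor is an independent subtask, so the subagents
    # can run concurrently, which a single sequential agent cannot do.
    with ThreadPoolExecutor(max_workers=3) as pool:
        return list(pool.map(research_competitor, competitors))

results = run_in_parallel(["Acme", "Globex", "Initech"])
```

&lt;p&gt;With real LLM calls, the wall-clock win from running independent subtasks concurrently is often the most immediate benefit of splitting one agent into several.&lt;/p&gt;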




&lt;h2&gt;
  
  
  You're Already Using Multi-Agent AI
&lt;/h2&gt;

&lt;p&gt;Before we get to theory, let's look at a tool many developers already have in their terminal: &lt;strong&gt;Claude Code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When you ask Claude Code something simple like &lt;code&gt;"fix the bug on line 42"&lt;/code&gt;, it handles it in a single pass. But ask it something more complex, like &lt;code&gt;"refactor this entire module, write tests, and check for regressions"&lt;/code&gt;, and something more interesting happens under the hood.&lt;/p&gt;

&lt;p&gt;Claude Code acts as an &lt;strong&gt;orchestrator&lt;/strong&gt;. Instead of trying to hold the entire task in one context window, it breaks the work down and can spin up &lt;strong&gt;subagents&lt;/strong&gt;: separate Claude instances with specific, scoped roles. One subagent might be tasked with exploring the codebase structure, another with writing the actual refactored code, and another with running the test suite and reporting results. Each subagent operates independently, does its job, and reports back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You
 └─▶ Claude Code (Orchestrator)
        ├─▶ Subagent A: "Explore the repo and map dependencies"
        ├─▶ Subagent B: "Rewrite the module based on the map"
        └─▶ Subagent C: "Run tests and report failures"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator then assembles the results and gives you a single coherent answer — as if one very capable developer had done it all.&lt;/p&gt;

&lt;p&gt;This is the multi-agent pattern in action. And the same design is behind tools like Devin, OpenAI's Operator, and many of the AI-powered developer tools launching in 2025–2026. Now let's understand how it works so you can build your own.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a Multi-Agent System?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;multi-agent system (MAS)&lt;/strong&gt; is a setup where multiple AI agents work together — each with a defined role — to complete a larger task. Think of it like a software engineering team: you have a project manager, a frontend dev, a backend dev, and a QA engineer. Each is an expert in their lane, and a coordinator ties their work together.&lt;/p&gt;

&lt;p&gt;The key building blocks are:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Orchestrator (a.k.a. the "Manager Agent")
&lt;/h3&gt;

&lt;p&gt;This is the brain that receives the high-level goal, breaks it into subtasks, assigns those subtasks to specialized agents, and assembles the final result. The orchestrator doesn't necessarily do the actual work — it delegates.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Subagents (a.k.a. "Worker Agents")
&lt;/h3&gt;

&lt;p&gt;These agents handle specific, well-scoped tasks. A &lt;code&gt;ResearchAgent&lt;/code&gt; searches the web. A &lt;code&gt;WriterAgent&lt;/code&gt; drafts content. A &lt;code&gt;CodeAgent&lt;/code&gt; writes and runs code. Each has its own set of tools appropriate to its role.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Tools
&lt;/h3&gt;

&lt;p&gt;Tools are functions that agents can call — web search, code execution, API calls, database queries, file I/O. Tools are what make agents actually &lt;em&gt;useful&lt;/em&gt; in the real world.&lt;/p&gt;
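&lt;p&gt;A common way to wire this up is a small registry mapping tool names to descriptions and callables, which the agent loop dispatches into. This is a sketch, not any particular framework's API; &lt;code&gt;get_weather&lt;/code&gt; is a made-up example tool:&lt;/p&gt;

```python
def get_weather(city: str) -> str:
    # Stand-in for a real weather API call.
    return f"Sunny in {city}"

# Each entry carries what the model needs to pick a tool (name,
# description, parameters) plus the function to actually run.
TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city",
        "parameters": {"city": "string"},
        "fn": get_weather,
    }
}

def execute_tool(name: str, args: dict) -> str:
    # The agent loop dispatches the model's tool call by name.
    return TOOLS[name]["fn"](**args)

report = execute_tool("get_weather", {"city": "Oslo"})
```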

&lt;h3&gt;
  
  
  4. Memory
&lt;/h3&gt;

&lt;p&gt;Agents need context. Memory can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Short-term&lt;/strong&gt; (conversation history within a session)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-term&lt;/strong&gt; (a vector database or knowledge store that persists between runs)&lt;/li&gt;
&lt;/ul&gt;
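&lt;p&gt;The two tiers can be sketched as a tiny class. &lt;code&gt;AgentMemory&lt;/code&gt; and its method names are illustrative, and the plain dict stands in for a real vector store:&lt;/p&gt;

```python
class AgentMemory:
    def __init__(self):
        self.short_term = []  # conversation history for this session
        self.long_term = {}   # stand-in for a vector DB / knowledge store

    def remember(self, role: str, content: str):
        # Short-term memory: just the running message list.
        self.short_term.append({"role": role, "content": content})

    def persist(self, key: str, fact: str):
        # Long-term memory: in production this would embed the fact
        # and upsert it into a vector database so it survives the session.
        self.long_term[key] = fact

mem = AgentMemory()
mem.remember("user", "Our launch date is March 3")
mem.persist("launch_date", "March 3")
```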

&lt;h3&gt;
  
  
  5. Communication
&lt;/h3&gt;

&lt;p&gt;Agents pass messages to each other — typically as structured text or JSON. The orchestrator sends a task; the subagent returns a result.&lt;/p&gt;
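&lt;p&gt;In practice that message passing often looks like a small JSON envelope. The field names here are one reasonable convention, not a standard:&lt;/p&gt;

```python
import json

def make_task(agent: str, instruction: str) -> str:
    # Envelope the orchestrator sends to a subagent.
    return json.dumps({"to": agent, "type": "task", "instruction": instruction})

def make_result(agent: str, output: str, ok: bool = True) -> str:
    # Envelope the subagent sends back; the ok flag lets the
    # orchestrator decide whether to retry or move on.
    return json.dumps({"from": agent, "type": "result", "ok": ok, "output": output})

task = make_task("ResearchAgent", "Summarize competitor pricing")
reply = make_result("ResearchAgent", "3 key findings...")
parsed = json.loads(reply)
```

&lt;p&gt;Structured envelopes like this make handoffs easy to log, validate, and replay — which matters once you're debugging a chain of agents.&lt;/p&gt;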




&lt;h2&gt;
  
  
  Building a Simple Multi-Agent System
&lt;/h2&gt;

&lt;p&gt;Let's put this into code. We'll build a small, framework-agnostic example: a two-agent system where one agent researches a topic and another writes a blog intro based on the research.&lt;/p&gt;

&lt;p&gt;We'll use Python and the OpenAI API (you can swap this for any LLM provider, the pattern stays the same).&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-api-key-here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A simple wrapper to call an LLM with a system + user prompt.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Research Agent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;research_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    A specialized agent whose only job is to gather key facts about a topic.
    In a real system, this agent would have web search tools.
    For simplicity, we&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re having the LLM draw on its training knowledge.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are a research assistant. Your job is to provide a concise,
    factual summary of a given topic — 5 key bullet points, nothing more.
    Focus on accuracy and relevance. Do not editorialize.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research this topic: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[ResearchAgent] Done. Key facts gathered.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Writer Agent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;writer_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    A specialized agent whose only job is to write engaging content
    based on provided research. It doesn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t search — it just writes.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are a skilled technical writer for a developer blog.
    Given a topic and research notes, write a compelling, friendly
    introduction paragraph (3-4 sentences) that hooks the reader.
    Write for developers, not academics.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;user_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Topic: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Research notes:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Write the intro paragraph now.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[WriterAgent] Done. Intro written.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Orchestrator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;orchestrator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    The orchestrator receives a high-level goal, breaks it into subtasks,
    delegates to specialized agents, and assembles the final output.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Orchestrator] Goal received: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Orchestrator] Delegating research task...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Extract the topic from the goal (in a real system,
&lt;/span&gt;    &lt;span class="c1"&gt;# the orchestrator would use an LLM to parse the goal)
&lt;/span&gt;    &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt;  &lt;span class="c1"&gt;# Simplified for this example
&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 2: Delegate to ResearchAgent
&lt;/span&gt;    &lt;span class="n"&gt;research_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;research_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Delegate to WriterAgent, passing the research output
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Orchestrator] Delegating writing task...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;final_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;writer_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;research_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Return assembled result
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Orchestrator] All tasks complete. Returning final output.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_output&lt;/span&gt;


&lt;span class="c1"&gt;# Run it
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;orchestrator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The rise of multi-agent AI systems in 2025&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== FINAL OUTPUT ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Sample Output
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Orchestrator] Goal received: 'The rise of multi-agent AI systems in 2025'
[Orchestrator] Delegating research task...

[ResearchAgent] Done. Key facts gathered.

[Orchestrator] Delegating writing task...

[WriterAgent] Done. Intro written.

[Orchestrator] All tasks complete. Returning final output.

=== FINAL OUTPUT ===
In 2025, AI stopped being a solo act. Multi-agent systems — where
teams of specialized AI models collaborate on complex tasks — emerged
from research labs into production engineering stacks at companies like
Google, OpenAI, and Anthropic. Rather than asking one model to do
everything, developers are now designing pipelines where a "manager"
agent delegates research, writing, coding, and verification to expert
subagents. If you've been wondering what all the buzz is about, you're
in exactly the right place.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the core pattern. In a production system, you'd add real web search tools, error handling, retry logic, agent memory, and parallel execution — but the &lt;strong&gt;orchestrator → delegate → assemble&lt;/strong&gt; structure stays the same.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;Multi-agent systems shine whenever a task is too large, complex, or varied for a single agent. Here are three common patterns you'll see in the wild:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Automated research pipelines&lt;/strong&gt;&lt;br&gt;
One agent searches and gathers sources, another reads and extracts key points, a third synthesizes findings into a report. No single agent's context window gets overwhelmed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. AI coding assistants (like Claude Code)&lt;/strong&gt;&lt;br&gt;
This is the most accessible real-world example. Claude Code uses an orchestrator-subagent model: when given a complex task, the main agent breaks it into subtasks and delegates — one subagent explores the codebase, one writes or modifies code, one runs shell commands and tests. Each subagent has a narrow, well-defined job. This same pattern powers tools like Devin and SWE-agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Customer support automation&lt;/strong&gt;&lt;br&gt;
An &lt;code&gt;IntentAgent&lt;/code&gt; classifies the user's issue, a &lt;code&gt;KnowledgeAgent&lt;/code&gt; retrieves the relevant documentation, and a &lt;code&gt;ResponseAgent&lt;/code&gt; drafts the reply. Each agent is small, fast, and easy to tune independently.&lt;/p&gt;
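&lt;p&gt;That three-stage pipeline is easy to prototype. Here's a toy version with stub functions in place of LLM-backed agents (the agent names come from the pattern above; the routing rules are illustrative):&lt;/p&gt;

```python
def intent_agent(message: str) -> str:
    # Classifies the user's issue; a real version would be an LLM call.
    return "billing" if "invoice" in message.lower() else "general"

def knowledge_agent(intent: str) -> str:
    # Retrieves the relevant documentation for the classified intent.
    docs = {"billing": "See the refunds policy.", "general": "See the FAQ."}
    return docs[intent]

def response_agent(message: str, doc: str) -> str:
    # Drafts the final reply from the retrieved documentation.
    return f"Thanks for reaching out. {doc}"

message = "Where is my invoice?"
reply = response_agent(message, knowledge_agent(intent_agent(message)))
```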




&lt;h2&gt;
  
  
  Common Pitfalls to Avoid
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Giving agents too much responsibility.&lt;/strong&gt; The whole point of multi-agent systems is specialization. If your &lt;code&gt;ResearchAgent&lt;/code&gt; is also writing and formatting the output, it's not really specialized.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Forgetting error handling between agents.&lt;/strong&gt; What happens if the research agent returns nothing? Your writer agent will hallucinate. Always validate the output of one agent before passing it to the next.&lt;/p&gt;
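&lt;p&gt;A minimal handoff guard might look like this (the function name and the empty-output rule are illustrative; real validators could also check schemas, citation counts, or token limits):&lt;/p&gt;

```python
def validated_handoff(payload: str, producer: str) -> str:
    """Fail fast at the handoff instead of letting the next agent hallucinate."""
    if payload is None or not payload.strip():
        raise ValueError(f"{producer} returned empty output; aborting the chain")
    return payload

# e.g. writer_input = validated_handoff(research_output, "ResearchAgent")
```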

&lt;p&gt;&lt;strong&gt;Ignoring cost and latency.&lt;/strong&gt; Each agent call costs money and time. More agents ≠ better results. Start with the minimum number of agents needed and add more only when you hit a real bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No logging or tracing.&lt;/strong&gt; In a chain of agents, debugging is hard without visibility. Add logs at every handoff (like the &lt;code&gt;print&lt;/code&gt; statements in our example), and consider tools like LangSmith or Langfuse for production tracing.&lt;/p&gt;
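&lt;p&gt;Before reaching for a full tracing service, a plain decorator at each handoff already buys a lot of visibility. A sketch (the agent function below is a stub):&lt;/p&gt;

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("handoff")

def traced(agent_name: str):
    """Log every inter-agent handoff: which agent produced output, and its size."""
    def wrap(fn):
        def inner(*args, **kwargs):
            out = fn(*args, **kwargs)
            log.info("%s produced %d chars", agent_name, len(str(out)))
            return out
        return inner
    return wrap

@traced("ResearchAgent")
def research(topic: str) -> str:
    # stand-in for a real agent call
    return f"notes about {topic}"
```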




&lt;h2&gt;
  
  
  Where to Go From Here
&lt;/h2&gt;

&lt;p&gt;You now understand the fundamentals. Here are some good next steps depending on where you want to go:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Try LangGraph&lt;/strong&gt; if you want a production-grade framework for building stateful, graph-based agent workflows with built-in support for cycles and conditional edges.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try Google's Agent Development Kit (ADK)&lt;/strong&gt; if you want Google's official framework — it was just announced and has great tooling for building hierarchical agent systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try OpenAI's Agents SDK&lt;/strong&gt; if you're already in the OpenAI ecosystem and want handoffs and tool-calling built in out of the box.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read "Patterns for Building LLM-based Systems"&lt;/strong&gt; by Eugene Yan — one of the best practical overviews of agent design patterns available.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Multi-agent systems aren't magic, and they're not just hype either. They're a practical engineering pattern for solving problems that are genuinely hard for a single AI to handle — tasks that are too long, too complex, or too diverse.&lt;/p&gt;

&lt;p&gt;The pattern is simple: &lt;strong&gt;break down the goal → assign specialized agents → orchestrate the results&lt;/strong&gt;. Start small, keep your agents focused, and add complexity only when you need it.&lt;/p&gt;

&lt;p&gt;The era of AI teamwork is just getting started, and now you know how to build your own team.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Did this help? Drop a comment with what you're building — I'd love to hear what multi-agent use cases you're exploring.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>genai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Pandas 3.0's PyArrow String Revolution: A Deep Dive into Memory and Performance</title>
      <dc:creator>Bato</dc:creator>
      <pubDate>Mon, 16 Feb 2026 07:50:49 +0000</pubDate>
      <link>https://dev.to/kaniel_outis/pandas-30s-pyarrow-string-revolution-a-deep-dive-into-memory-and-performance-357g</link>
      <guid>https://dev.to/kaniel_outis/pandas-30s-pyarrow-string-revolution-a-deep-dive-into-memory-and-performance-357g</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Pandas 3.0 made a game-changing decision: &lt;strong&gt;PyArrow-backed strings are now the default&lt;/strong&gt;. Instead of storing strings as Python objects (the old &lt;code&gt;object&lt;/code&gt; dtype), pandas now uses Apache Arrow's columnar format with the new &lt;code&gt;string[pyarrow]&lt;/code&gt; dtype.&lt;/p&gt;

&lt;p&gt;But here's the question that matters: &lt;strong&gt;How much does this new string dtype actually improve performance and memory usage in real-world scenarios?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To find out, I ran comprehensive benchmarks across diverse datasets and common string operations. The results? &lt;strong&gt;51.8% memory savings&lt;/strong&gt; on average, with operations running &lt;strong&gt;2-27x faster&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This isn't a theoretical improvement; it's a fundamental shift in how pandas handles string data.&lt;/p&gt;
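&lt;p&gt;If you want to reproduce the measurement yourself, the core of it is &lt;code&gt;memory_usage(deep=True)&lt;/code&gt;. Here's a minimal sketch; the demo column is synthetic, and the &lt;code&gt;string[pyarrow]&lt;/code&gt; conversion assumes pyarrow is installed (it falls back to pandas' plain string dtype otherwise):&lt;/p&gt;

```python
import pandas as pd

def memory_mb(df: pd.DataFrame) -> float:
    # deep=True matters: without it, object columns report only 8 bytes per pointer
    return float(df.memory_usage(deep=True).sum()) / 1e6

def savings_pct(before_mb: float, after_mb: float) -> float:
    return 100.0 * (1.0 - after_mb / before_mb)

# illustrative frame with categorical-like strings
df_obj = pd.DataFrame({"status": ["pending", "completed", "failed"] * 100_000}).astype(object)

try:
    df_new = df_obj.astype("string[pyarrow]")  # explicit Arrow backing; needs pyarrow
except ImportError:
    df_new = df_obj.astype("string")           # fallback without pyarrow

print(f"object: {memory_mb(df_obj):.1f} MB, new: {memory_mb(df_new):.1f} MB, "
      f"saved: {savings_pct(memory_mb(df_obj), memory_mb(df_new)):.1f}%")
```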




&lt;h2&gt;
  
  
  The Results: Summary Dashboard
&lt;/h2&gt;

&lt;p&gt;Let me start with the headline numbers, then we'll dive into how I got them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb19prr1lm77mc5zf2s0r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb19prr1lm77mc5zf2s0r.png" alt="Result Summary" width="800" height="509"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Four Key Metrics
&lt;/h3&gt;

&lt;h3&gt;
  
  
  1. 51.8% Memory Savings
&lt;/h3&gt;

&lt;p&gt;Across all test datasets, the new PyArrow string dtype used half the memory of the old object dtype. This isn't a marginal improvement; it's transformative for memory-constrained environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. 6.17x Average Operation Speedup
&lt;/h3&gt;

&lt;p&gt;String operations aren't just more memory-efficient: they're dramatically faster. On average, operations like &lt;code&gt;str.lower()&lt;/code&gt;, &lt;code&gt;str.contains()&lt;/code&gt;, and &lt;code&gt;str.len()&lt;/code&gt; run 6x faster with PyArrow strings.&lt;/p&gt;

&lt;p&gt;Some operations are even more impressive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;str.len()&lt;/code&gt;: 27x faster&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;str.startswith()&lt;/code&gt;: 16x faster&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;str.endswith()&lt;/code&gt;: 15x faster&lt;/li&gt;
&lt;/ul&gt;
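&lt;p&gt;You can sanity-check these speedups on your own machine with a quick timing loop. A sketch (absolute timings will vary; the fallback path covers environments without pyarrow):&lt;/p&gt;

```python
import time
import pandas as pd

s_obj = pd.Series(["alpha", "bravo", "charlie"] * 200_000).astype(object)
try:
    s_new = s_obj.astype("string[pyarrow]")  # Arrow-backed; requires pyarrow
except ImportError:
    s_new = s_obj.astype("string")

def bench(series: pd.Series):
    # time a single vectorized string operation
    t0 = time.perf_counter()
    out = series.str.len()
    return time.perf_counter() - t0, out

t_obj, len_obj = bench(s_obj)
t_new, len_new = bench(s_new)
print(f"str.len(): object {t_obj*1e3:.1f} ms, new dtype {t_new*1e3:.1f} ms")
```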

&lt;h3&gt;
  
  
  3. 889 MB Total Memory Saved
&lt;/h3&gt;

&lt;p&gt;Across our test datasets (totaling 645 MB on disk), we saved nearly &lt;strong&gt;1 GB of RAM&lt;/strong&gt;. For a real data pipeline processing dozens of datasets, this compounds quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Memory Overhead: The Game Changer
&lt;/h3&gt;

&lt;p&gt;The bottom chart reveals something crucial about how pandas handles strings:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old string dtype (object):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CSV files on disk: 645 MB&lt;/li&gt;
&lt;li&gt;Loaded into pandas: 1,714 MB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory overhead: 165.7%&lt;/strong&gt; (more than doubles!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;New string dtype (PyArrow):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CSV files on disk: 645 MB&lt;/li&gt;
&lt;li&gt;Loaded into pandas: 825 MB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory overhead: 27.9%&lt;/strong&gt; (minimal overhead)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What does this mean?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When pandas reads a CSV file, it doesn't just store the raw bytes: it creates in-memory data structures for fast operations. The old object dtype was incredibly inefficient, wrapping every value in a full Python object and storing an extra pointer per row. The new PyArrow string dtype keeps overhead minimal with a smarter, contiguous memory layout.&lt;/p&gt;

&lt;p&gt;This is the difference between pandas 2's Python-object approach and pandas 3's columnar Arrow approach.&lt;/p&gt;
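&lt;p&gt;You can observe this overhead gap directly by comparing bytes on disk with bytes in memory. A sketch using an in-memory CSV (exact percentages depend on your pandas version and data):&lt;/p&gt;

```python
import io
import pandas as pd

# synthetic single-column CSV of repeated statuses
csv_text = "status\n" + "\n".join(["pending", "completed", "failed"] * 50_000)
disk_bytes = len(csv_text.encode("utf-8"))

df = pd.read_csv(io.StringIO(csv_text))
mem_bytes = int(df.memory_usage(deep=True).sum())

overhead_pct = 100 * (mem_bytes - disk_bytes) / disk_bytes
print(f"disk {disk_bytes/1e6:.2f} MB, memory {mem_bytes/1e6:.2f} MB, overhead {overhead_pct:.0f}%")
```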




&lt;h2&gt;
  
  
  The Methodology: Why 5 Different Datasets?
&lt;/h2&gt;

&lt;p&gt;Now that you've seen the results, let me explain how I tested this. Real-world data comes in many shapes and sizes. A single benchmark on one type of data wouldn't tell the whole story.&lt;/p&gt;

&lt;p&gt;That's why I created &lt;strong&gt;5 distinct datasets&lt;/strong&gt;, each representing common patterns you'll encounter in production:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Low Cardinality Dataset&lt;/strong&gt; (1M rows)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; Repeated categorical values like product categories, status codes, regions, and priorities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; This is typical of business data: think order statuses, customer segments, or department codes. The same values repeat millions of times.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example columns:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;category&lt;/code&gt;: "Electronics", "Clothing", "Food" (10 unique values)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;status&lt;/code&gt;: "pending", "completed", "failed" (4 unique values)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;High Cardinality Dataset&lt;/strong&gt; (1M rows)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; Mostly unique strings like user IDs, email addresses, and session tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; When every row is different (like customer emails or transaction IDs), pandas can't use simple optimizations. This tests worst-case scenarios.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example columns:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;user_id&lt;/code&gt;: "USER_00000001", "USER_00000002"... (1M unique)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;email&lt;/code&gt;: "&lt;a href="mailto:user123@example45.com"&gt;user123@example45.com&lt;/a&gt;" (1M unique)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Mixed String Lengths Dataset&lt;/strong&gt; (1M rows)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A combination of short codes (2-5 chars), medium names (20-50 chars), and long descriptions (100-300 chars).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Real data isn't uniform. You might have product codes next to customer addresses next to order notes. This tests how pandas handles variable-length strings.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Dataset With Nulls&lt;/strong&gt; (1M rows)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; Data with missing values (10-33% nulls in different columns).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Messy data is reality. How does pandas 3.0 handle missing string data compared to pandas 2?&lt;/p&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;Large Dataset&lt;/strong&gt; (10M rows)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A scaled-up version to test performance at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Memory savings that look good at 1M rows might behave differently at 10M rows. This validates that the findings scale linearly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Memory Savings by Dataset Type
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Frobot010%2FDev-Post-Code%2Fblob%2F04343be535204b9093df487d1bcfb921d9d00f2e%2Fpandas-memory-saving%2Fvisualizations%2Fmemory_comparison.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Frobot010%2FDev-Post-Code%2Fblob%2F04343be535204b9093df487d1bcfb921d9d00f2e%2Fpandas-memory-saving%2Fvisualizations%2Fmemory_comparison.png%3Fraw%3Dtrue" alt="Memory Consumption Comparison" width="4171" height="2368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The memory savings from PyArrow strings vary significantly by dataset characteristics:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Best Case: Low Cardinality Data (-71.6%)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When data has &lt;strong&gt;repeated values&lt;/strong&gt; (like categories), PyArrow strings shine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Object dtype: 219 MB&lt;/li&gt;
&lt;li&gt;PyArrow string dtype: 62 MB&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings: 71.6%&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Worst Case: Mixed String Lengths (-30.6%)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Variable-length strings see smaller (but still significant) savings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Object dtype: 383 MB&lt;/li&gt;
&lt;li&gt;PyArrow string dtype: 266 MB&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings: 30.6%&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Pattern&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Notice how savings correlate with &lt;strong&gt;data characteristics&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Repeated values&lt;/strong&gt; (low cardinality) → Best savings (64-72%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unique values&lt;/strong&gt; (high cardinality) → Good savings (53-55%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Variable length&lt;/strong&gt; (mixed sizes) → Moderate savings (31%)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; PyArrow strings help everywhere, but they're &lt;em&gt;especially&lt;/em&gt; powerful for categorical-like data.&lt;/p&gt;




&lt;h2&gt;
  
  
  Performance: Operation-Specific Speedups
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5evoggnd0cs0yp0xkf0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5evoggnd0cs0yp0xkf0.png" alt="Operation Speedup Heatmap" width="800" height="606"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This heatmap shows how much &lt;strong&gt;faster&lt;/strong&gt; PyArrow strings are compared to object dtype for common string operations (values &amp;gt; 1.0 mean PyArrow is faster).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Fastest Operations&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;str.len()&lt;/code&gt;: 10-27x faster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;str.startswith()&lt;/code&gt; and &lt;code&gt;str.endswith()&lt;/code&gt;: 11-18x faster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;str.contains()&lt;/code&gt;: 3-5x faster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;str.split()&lt;/code&gt;: 1-8x faster&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Pattern&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Read operations (like &lt;code&gt;len()&lt;/code&gt;, &lt;code&gt;startswith()&lt;/code&gt;) → Massive speedups (10-27x)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These operations just examine existing data without modification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Transform operations (like &lt;code&gt;replace()&lt;/code&gt;, &lt;code&gt;split()&lt;/code&gt;) → Good speedups (2-5x)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These operations create new data, which limits the performance gains&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Trade-off: CSV Loading Time
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwd0etv119sj0o6yiut5e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwd0etv119sj0o6yiut5e.png" alt="Load Time Comparison" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There's no such thing as a free lunch. While PyArrow strings save memory and run operations faster, &lt;strong&gt;loading CSV files is 9-61% slower&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why the Slowdown?
&lt;/h3&gt;

&lt;p&gt;When pandas reads a CSV with PyArrow strings enabled:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It parses the text (same as before)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It converts strings to PyArrow's columnar format&lt;/strong&gt; (extra step)&lt;/li&gt;
&lt;li&gt;This conversion involves building dictionary encodings and optimized memory structures&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Pandas is doing &lt;strong&gt;more work upfront&lt;/strong&gt; to enable better performance downstream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world impact:&lt;/strong&gt; On our 10M row dataset, the difference is &lt;strong&gt;1.63s vs 2.02s&lt;/strong&gt;, an extra 0.4 seconds for 10 million rows. For many data pipelines, this upfront cost might be negligible compared to the 2-27x speedup in subsequent operations.&lt;/p&gt;
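&lt;p&gt;To judge whether the upfront cost matters for &lt;em&gt;your&lt;/em&gt; workload, time both sides of the trade. A sketch with a synthetic CSV:&lt;/p&gt;

```python
import io
import time
import pandas as pd

csv_text = "user,comment\n" + "\n".join(f"u{i},hello world {i}" for i in range(200_000))

t0 = time.perf_counter()
df = pd.read_csv(io.StringIO(csv_text))
load_s = time.perf_counter() - t0

t0 = time.perf_counter()
mask = df["comment"].str.startswith("hello")
op_s = time.perf_counter() - t0

# If op_s times the number of string ops in your pipeline dwarfs the extra
# load time, the Arrow conversion pays for itself.
print(f"load {load_s:.3f}s, one string op {op_s:.3f}s")
```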




&lt;h2&gt;
  
  
  Pros and Cons: Should You Adopt PyArrow Strings?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Benefits of PyArrow String Dtype&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Massive Memory Savings (30-72%)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dramatically Faster String Operations (2-27x)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Minimal Memory Overhead (28% vs 166%)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modern Data Ecosystem Integration&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Trade-offs to Consider&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Slower CSV Loading (9-61% slower)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial data ingestion takes longer&lt;/li&gt;
&lt;li&gt;May impact workflows that repeatedly load small files&lt;/li&gt;
&lt;li&gt;The trade-off: slower start, much faster operations&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Behavioral Changes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;String dtype behaves differently from object dtype in edge cases&lt;/li&gt;
&lt;li&gt;Need to update code that explicitly checks for &lt;code&gt;object&lt;/code&gt; dtype&lt;/li&gt;
&lt;li&gt;Testing required for migration&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
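&lt;p&gt;For the second trade-off, the usual migration fix is to stop comparing against &lt;code&gt;object&lt;/code&gt; and use pandas' dtype predicates instead. A sketch (the literal &lt;code&gt;== object&lt;/code&gt; check behaves differently across pandas versions):&lt;/p&gt;

```python
import pandas as pd
from pandas.api.types import is_string_dtype

s = pd.Series(["a", "b", "c"])

# Brittle: True on pandas 2.x (object dtype), False on pandas 3.x (str dtype)
legacy_check = s.dtype == object

# Portable: True for object, string[python], and Arrow-backed string columns
portable_check = is_string_dtype(s)
print(legacy_check, portable_check)
```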

&lt;h3&gt;
  
  
  &lt;strong&gt;The Recommendation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For most data workflows, &lt;strong&gt;PyArrow strings are a clear win&lt;/strong&gt;. The memory and performance benefits far outweigh the trade-offs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consider staying with object dtype if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You rarely work with string columns&lt;/li&gt;
&lt;li&gt;Your datasets easily fit in memory&lt;/li&gt;
&lt;li&gt;Load time is critical and you rarely perform string operations&lt;/li&gt;
&lt;li&gt;You have legacy code that's deeply coupled to object dtype behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Definitely adopt PyArrow strings if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You process large datasets with text data&lt;/li&gt;
&lt;li&gt;String operations are a significant part of your workflow&lt;/li&gt;
&lt;li&gt;Memory is a constraint in your environment&lt;/li&gt;
&lt;li&gt;You're building production data pipelines&lt;/li&gt;
&lt;li&gt;You work with modern data tools (Parquet, Arrow, DuckDB, etc.)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Our comprehensive analysis across &lt;strong&gt;5 diverse datasets&lt;/strong&gt; and &lt;strong&gt;15+ string operations&lt;/strong&gt; conclusively shows that PyArrow-backed strings deliver transformative improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;51.8% average memory savings&lt;/strong&gt; across all dataset types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;6.17x average operation speedup&lt;/strong&gt; for string operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal memory overhead&lt;/strong&gt; (28% vs 166% with Python objects)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PyArrow strings aren't just an incremental improvement; they're a fundamental reimagining of how pandas handles text data. By adopting Apache Arrow's proven columnar format, pandas has joined the modern data ecosystem while delivering massive performance and memory improvements.&lt;/p&gt;

&lt;p&gt;For most data practitioners working with text data, the question isn't "Should I use PyArrow strings?" but rather "How quickly can I migrate?"&lt;/p&gt;




&lt;p&gt;Questions or feedback? Feel free to open an issue or contribute to this analysis! The code we used in this analysis has been uploaded to this &lt;a href="https://github.com/robot010/Dev-Post-Code/tree/04343be535204b9093df487d1bcfb921d9d00f2e/pandas-memory-saving" rel="noopener noreferrer"&gt;repo&lt;/a&gt;. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>datascience</category>
      <category>dataengineering</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>We All Accepted the "Python Tax.", Pandas 3.0 Just Reduced It.</title>
      <dc:creator>Bato</dc:creator>
      <pubDate>Sun, 15 Feb 2026 06:51:21 +0000</pubDate>
      <link>https://dev.to/kaniel_outis/we-all-accepted-the-python-tax-pandas-30-just-reduced-it-1n43</link>
      <guid>https://dev.to/kaniel_outis/we-all-accepted-the-python-tax-pandas-30-just-reduced-it-1n43</guid>
      <description>&lt;p&gt;I’ve been there. You have a "small" 3GB CSV file. You load it into a Pandas DataFrame on a 16GB machine, and suddenly everything freezes. You start manually chunking data, deleting columns, and praying to the OOM (Out of Memory) gods 🙃.&lt;/p&gt;

&lt;p&gt;We’ve accepted this as the "Python Tax." We tell ourselves that &lt;code&gt;object&lt;/code&gt; dtypes are just the price we pay for flexibility. Spoiler: They aren't. And we’ve been wasting RAM for years.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Object" Lie
&lt;/h2&gt;

&lt;p&gt;For a decade, Pandas stored strings as NumPy &lt;code&gt;object&lt;/code&gt; arrays. This was a beautiful abstraction with a dark secret: it’s incredibly inefficient. Each string is wrapped in a heavy Python object header. When you have 10 million rows, you aren’t just storing data; you’re storing a massive, fragmented mess of pointers.&lt;/p&gt;
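&lt;p&gt;You can see that overhead with nothing but the standard library (sizes are for CPython 3.x and vary by build):&lt;/p&gt;

```python
import sys

value = "pending"

# A 7-character ASCII string costs far more than 7 bytes as a Python object:
# it carries a refcount, a type pointer, a length field, and a cached hash.
obj_bytes = sys.getsizeof(value)   # ~56 bytes on CPython 3.x

# An object-dtype column then adds an 8-byte pointer per row on top of that.
row_overhead = obj_bytes + 8
print(obj_bytes, row_overhead)
```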

&lt;h2&gt;
  
  
  The 10-Minute Upgrade That Saved 60% of My RAM
&lt;/h2&gt;

&lt;p&gt;With the release of Pandas 3.0, the game changed. By default, it now uses a dedicated &lt;code&gt;str&lt;/code&gt; type backed by PyArrow.&lt;/p&gt;

&lt;p&gt;I ran the numbers because, honestly, I didn't believe it at first. I kept my code exactly the same: no special flags, no engine tweaks, just a plain &lt;code&gt;pd.read_csv()&lt;/code&gt;. Here is what happens when you stop using legacy NumPy objects:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Results are Actually Insane:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory Slashing&lt;/strong&gt;: In a mixed-type dataset of 10M rows, I saw a 53.2% drop in memory usage just by upgrading to version 3.0.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Text-Only DataFrame&lt;/strong&gt;: In my experiment with 10M pure string rows, memory usage fell from 658 MB to 267 MB, a 59.4% drop!&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr15fot7k6xzyje7xgu92.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr15fot7k6xzyje7xgu92.png" alt="Memory consumption comparison" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;
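&lt;p&gt;If you want to check your own frames before and after upgrading, two lines do it. A sketch with an inline CSV standing in for a real file:&lt;/p&gt;

```python
import io
import pandas as pd

# inline stand-in for pd.read_csv("your_file.csv")
buf = io.StringIO("name,city\nalice,berlin\nbob,paris\n")
df = pd.read_csv(buf)

print(df.dtypes)              # pandas 3.x: str; pandas 2.x: object
df.info(memory_usage="deep")  # deep counts the string payloads, not just pointers
```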

&lt;h2&gt;
  
  
  Pragmatism &amp;gt; Perfection
&lt;/h2&gt;

&lt;p&gt;Is Pandas 3.0 perfect? No. But if you are working with text-heavy data, ignoring this upgrade is effectively choosing to pay for cloud resources you don't need.&lt;/p&gt;

&lt;p&gt;What’s your weirdest pandas "Out of Memory" story? This type of error never fails to bring me back to the early days of pandas dev 😁 &lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repository: &lt;a href="https://github.com/robot010/Dev-Post-Code/tree/aef81da0767404f179dd9e9303de97283c278209/pandas-memory-saving" rel="noopener noreferrer"&gt;GitHub link&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>pandas</category>
      <category>performance</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
