<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Piyush Choudhari</title>
    <description>The latest articles on DEV Community by Piyush Choudhari (@piyush_choudhari_a5b29f7f).</description>
    <link>https://dev.to/piyush_choudhari_a5b29f7f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3425069%2F18dfe6a4-b639-4e16-af36-1ff03b933041.png</url>
      <title>DEV Community: Piyush Choudhari</title>
      <link>https://dev.to/piyush_choudhari_a5b29f7f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/piyush_choudhari_a5b29f7f"/>
    <language>en</language>
    <item>
      <title>Making A Peer Review System for My Blogs Using Google-ADK &amp; Mem0</title>
      <dc:creator>Piyush Choudhari</dc:creator>
      <pubDate>Thu, 27 Nov 2025 04:18:24 +0000</pubDate>
      <link>https://dev.to/piyush_choudhari_a5b29f7f/making-a-peer-review-system-for-my-blogs-using-google-adk-mem0-3ejg</link>
      <guid>https://dev.to/piyush_choudhari_a5b29f7f/making-a-peer-review-system-for-my-blogs-using-google-adk-mem0-3ejg</guid>
      <description>&lt;h2&gt;
  
  
  My Process
&lt;/h2&gt;

&lt;p&gt;When writing my technical blogs, I have a &lt;strong&gt;very rigid process&lt;/strong&gt; I like to follow.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Research the topic I am interested in&lt;/li&gt;
&lt;li&gt;Create a &lt;strong&gt;structured research roadmap&lt;/strong&gt; for the knowledge I need to gain about the particular topic&lt;/li&gt;
&lt;li&gt;Go through the roadmap and try to &lt;strong&gt;learn/research the concepts&lt;/strong&gt; as in-depth as I can&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Start coding&lt;/strong&gt; whatever the relevant implementation for that topic is&lt;/li&gt;
&lt;li&gt;Finally, &lt;strong&gt;start writing the blog&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But one thing always bugs me: &lt;strong&gt;&lt;em&gt;"Is my blog factually correct, and have I compromised the integrity of my blog anywhere?"&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That leads me to frantically go through my sources repeatedly and ask tools like Perplexity about the blog. So, I had the idea to &lt;strong&gt;automate this process by creating a Peer Review System.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What This System Does
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47edtu6sxmqw7tl837j9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47edtu6sxmqw7tl837j9.png" alt="flowchart" width="800" height="643"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. What the System Focuses On&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It behaves like a technical editor, not just a grammar checker.&lt;/li&gt;
&lt;li&gt;It evaluates writing for:

&lt;ul&gt;
&lt;li&gt;Structure&lt;/li&gt;
&lt;li&gt;Clarity&lt;/li&gt;
&lt;li&gt;Factual accuracy&lt;/li&gt;
&lt;li&gt;Tone correctness&lt;/li&gt;
&lt;li&gt;Proper use of supporting evidence&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;The purpose is to help the writer produce content that is accurate, readable, and consistent.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. How It Reviews Content&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The system doesn’t read content blindly.&lt;/li&gt;
&lt;li&gt;It uses uploaded reference files as a knowledge base.&lt;/li&gt;
&lt;li&gt;Relevant information from those files is retrieved using semantic search rather than keyword matching.&lt;/li&gt;
&lt;li&gt;If a statement appears in the writing:

&lt;ul&gt;
&lt;li&gt;The system first checks if it exists in the uploaded sources.&lt;/li&gt;
&lt;li&gt;If confirmed, the system becomes more confident in that claim.&lt;/li&gt;
&lt;li&gt;If not found, it triggers an external web-based fact check.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
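&lt;p&gt;The check-then-escalate logic above can be sketched in a few lines. This is an illustrative stand-in (word overlap in place of real embeddings, and a hypothetical &lt;code&gt;web_checker&lt;/code&gt; callable), not the actual project code:&lt;/p&gt;

```python
def overlap_score(claim, source):
    """Crude stand-in for semantic similarity: fraction of claim words found in the source."""
    claim_words = set(claim.lower().split())
    source_words = set(source.lower().split())
    if not claim_words:
        return 0.0
    return len(claim_words.intersection(source_words)) / len(claim_words)

def verify_claim(claim, sources, web_checker=None, threshold=0.5):
    """Check a claim against uploaded sources first; escalate to the web only if unsupported."""
    scored = [(overlap_score(claim, s), s) for s in sources]
    best_score, best_source = max(scored) if scored else (0.0, None)
    if best_score >= threshold:
        # Confirmed by an uploaded source: confidence in the claim rises.
        return {"status": "supported", "evidence": best_source}
    if web_checker is not None:
        # Not found in the sources: trigger the external web-based fact check.
        return {"status": "web-checked", "evidence": web_checker(claim)}
    return {"status": "unverified", "evidence": None}
```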

&lt;p&gt;&lt;strong&gt;3. How Memory Improves Review Quality&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feedback adapts over time instead of resetting with each review.&lt;/li&gt;
&lt;li&gt;The system tracks repeated mistakes or patterns such as:

&lt;ul&gt;
&lt;li&gt;Missing citations&lt;/li&gt;
&lt;li&gt;Style inconsistencies&lt;/li&gt;
&lt;li&gt;Formatting issues&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;If the same issue shows up again, the system highlights it more firmly.&lt;/li&gt;

&lt;li&gt;This turns the review into a learning process rather than a one-time correction.&lt;/li&gt;

&lt;/ul&gt;
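&lt;p&gt;The escalation idea is simple to sketch: count how often an issue type appeared in past reviews and bump its severity accordingly. A minimal illustration, not the system's actual memory logic:&lt;/p&gt;

```python
from collections import Counter

SEVERITIES = ["note", "warning", "critical"]

def escalate(issue_type, history):
    """Raise severity for issues that keep recurring across past reviews."""
    seen = Counter(history)[issue_type]        # how often this issue appeared before
    level = min(seen, len(SEVERITIES) - 1)     # cap at the highest severity
    return SEVERITIES[level]
```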




&lt;h3&gt;
  
  
  Screenshots:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://drive.google.com/file/d/1VTvQBkQ4753NbpVFlVPr5v6_kjH3SXjZ/view?usp=sharing" rel="noopener noreferrer"&gt;https://drive.google.com/file/d/1VTvQBkQ4753NbpVFlVPr5v6_kjH3SXjZ/view?usp=sharing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://drive.google.com/file/d/14sfHrC0Lw0pvU4oX7U18Ydv61a8RNTu0/view?usp=sharing" rel="noopener noreferrer"&gt;https://drive.google.com/file/d/14sfHrC0Lw0pvU4oX7U18Ydv61a8RNTu0/view?usp=sharing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A Demo Peer Review Report:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://drive.google.com/file/d/1HQh5stEAj4tkh3E7Fyf1jOse52ZDw-bb/view?usp=sharing" rel="noopener noreferrer"&gt;https://drive.google.com/file/d/1HQh5stEAj4tkh3E7Fyf1jOse52ZDw-bb/view?usp=sharing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Workflow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36or22x0d9cs9k0u8to7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36or22x0d9cs9k0u8to7.png" alt="sequence" width="800" height="987"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1: Ingestion&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fetches content from URLs if needed&lt;/li&gt;
&lt;li&gt;Loads past review history for the project&lt;/li&gt;
&lt;li&gt;Examines uploaded source documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Phase 2: Verification&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identifies all factual claims in the content&lt;/li&gt;
&lt;li&gt;Searches uploaded sources for supporting evidence&lt;/li&gt;
&lt;li&gt;Uses Google search for external fact-checking&lt;/li&gt;
&lt;li&gt;Validates technical assertions and statistics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Phase 3: Evaluation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assesses clarity, flow, and structure&lt;/li&gt;
&lt;li&gt;Checks accuracy against evidence&lt;/li&gt;
&lt;li&gt;Evaluates tone for target audience&lt;/li&gt;
&lt;li&gt;Compares to past feedback to track improvement&lt;/li&gt;
&lt;li&gt;Flags recurring issues with escalated severity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Phase 4: Synthesis&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generates structured report&lt;/li&gt;
&lt;li&gt;Provides evidence for all major issues&lt;/li&gt;
&lt;li&gt;References past feedback when relevant&lt;/li&gt;
&lt;li&gt;Gives actionable, constructive feedback&lt;/li&gt;
&lt;/ul&gt;
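&lt;p&gt;The four phases can be pictured as plain functions chained together. The bodies below are toy stand-ins (the real agents do far more), but the data flow mirrors the diagram: each phase feeds the next.&lt;/p&gt;

```python
def ingest(draft, sources, history):
    """Phase 1: gather the draft, uploaded sources, and past review history."""
    return {"draft": draft, "sources": sources, "history": history}

def verify(context):
    """Phase 2: extract claims (here: a naive sentence split) for fact-checking."""
    return [s.strip() for s in context["draft"].split(".") if s.strip()]

def evaluate(context, claims):
    """Phase 3: judge the draft; recurring issues would escalate here."""
    return {"claims_checked": len(claims), "past_reviews": len(context["history"])}

def synthesize(findings):
    """Phase 4: fold the findings into a structured report."""
    return {"report": findings}

def run_review(draft, sources, history):
    context = ingest(draft, sources, history)
    claims = verify(context)
    findings = evaluate(context, claims)
    return synthesize(findings)
```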




&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Model Flexibility&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You aren’t locked into one AI provider.&lt;/li&gt;
&lt;li&gt;Switching between models like Gemini, Claude, GPT, or Ollama only requires changing one environment variable.&lt;/li&gt;
&lt;li&gt;This gives control over:

&lt;ul&gt;
&lt;li&gt;Cost&lt;/li&gt;
&lt;li&gt;Performance&lt;/li&gt;
&lt;li&gt;Privacy&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;The review logic remains consistent across models.&lt;/li&gt;

&lt;/ul&gt;
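&lt;p&gt;In spirit, the switch looks like reading one environment variable. The variable name and default below are illustrative assumptions, but the principle matches the post: one setting decides which model backs the reviewer, and the review logic never changes.&lt;/p&gt;

```python
import os

def get_review_model(default="gemini-2.0-flash"):
    """Read the model identifier from the environment, falling back to a default.

    REVIEW_MODEL is a hypothetical variable name used for illustration.
    """
    return os.environ.get("REVIEW_MODEL", default)
```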

&lt;p&gt;&lt;strong&gt;2. Context-Aware Retrieval&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uploaded reference files are stored in a vector database.&lt;/li&gt;
&lt;li&gt;The system breaks them into chunks, embeds them, and indexes them for efficient search.&lt;/li&gt;
&lt;li&gt;During review, it retrieves relevant sections using semantic similarity rather than simple keyword matching.&lt;/li&gt;
&lt;li&gt;This helps the system understand meaning, not just match exact text.&lt;/li&gt;
&lt;/ul&gt;
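&lt;p&gt;A minimal sketch of the chunk-and-index step, with &lt;code&gt;embed&lt;/code&gt; standing in for a sentence-transformers model (the real pipeline chunks by tokens, not words):&lt;/p&gt;

```python
def chunk_words(text, size=40, overlap=10):
    """Split a document into overlapping chunks so retrieval stays fine-grained."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def build_index(docs, embed):
    """Embed every chunk once and keep (chunk, vector) pairs for semantic search."""
    index = []
    for doc in docs:
        for ch in chunk_words(doc):
            index.append({"text": ch, "vec": embed(ch)})
    return index
```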

&lt;p&gt;&lt;strong&gt;3. Automated Fact Verification&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a claim isn’t supported by uploaded sources, the system escalates verification.&lt;/li&gt;
&lt;li&gt;A separate search agent performs a structured web lookup.&lt;/li&gt;
&lt;li&gt;The goal is not to rewrite content, but to confirm whether the information is reliable and accurate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Built-In Memory&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The system remembers past reviews and writing patterns.&lt;/li&gt;
&lt;li&gt;If a mistake repeats, the system identifies it as a recurring issue.&lt;/li&gt;
&lt;li&gt;Instead of pointing it out repeatedly at the same level, the feedback becomes stronger and more specific.&lt;/li&gt;
&lt;li&gt;This encourages long-term improvement rather than one-off corrections.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;The verification is only as good as the model plus the search results&lt;/li&gt;
&lt;li&gt;Source reliability isn’t enforced&lt;/li&gt;
&lt;li&gt;Web search can surface low quality or outdated material&lt;/li&gt;
&lt;li&gt;The model is still the final judge. It can misinterpret sources, over-trust weak evidence, or fabricate justification&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Implementation: &lt;a href="https://github.com/capybara-brain346/peer-review-agent" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/h3&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>automation</category>
    </item>
    <item>
      <title>Training a Mixture-of-Experts Router</title>
      <dc:creator>Piyush Choudhari</dc:creator>
      <pubDate>Mon, 17 Nov 2025 13:54:25 +0000</pubDate>
      <link>https://dev.to/piyush_choudhari_a5b29f7f/training-a-mixture-of-experts-router-5b4p</link>
      <guid>https://dev.to/piyush_choudhari_a5b29f7f/training-a-mixture-of-experts-router-5b4p</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddcxf5nb902zpkj62oea.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddcxf5nb902zpkj62oea.png" alt="moe-implementation" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ever since DeepSeek-MoE put the MoE architecture in the spotlight, I was aware of it and saw its adoption across open-source and proprietary model providers, but I never tried to understand the idea more deeply. The idea that you can expand a model’s capacity without sending every token through a huge feed-forward block felt very interesting. &lt;/p&gt;

&lt;p&gt;I tried to write the whole thing myself, from data loading and tokenization to a GPT-style transformer with optional MoE layers. This included the dataset pipeline, transformer blocks, routing logic, expert modules, and a training loop that tracked timing, throughput, losses, and expert usage. &lt;/p&gt;

&lt;p&gt;This blog walks through what I learned, how each component fits together, and the results that stood out, with the hope that these insights help anyone curious about MoE models or planning to build one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Plan
&lt;/h2&gt;

&lt;p&gt;Before writing any code, I outlined the system I wanted. I needed a modular setup that let me swap dense layers for MoE layers, try different routing strategies, and run controlled comparisons without constant rewrites. That meant keeping data, model, MoE components, and training logic cleanly separated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbtph3633ajd32uaj2us6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbtph3633ajd32uaj2us6.png" alt="project plan" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I began with the dataset pipeline. A steady source of tokenized text is essential for any language model experiment, and I wanted the option to fall back to synthetic data. After that, I focused on the model architecture. I planned to build a small GPT-style transformer first, since it provided a stable baseline and a familiar structure to extend with MoE layers. The goal was to keep the dense path intact while making the MoE path a drop-in replacement so both versions could be compared under identical conditions.&lt;/p&gt;

&lt;p&gt;Next came the MoE module, which required the most iteration. I wanted per-token routing, top-k selection, load balancing, and expert statistics without creating a messy forward pass. I mapped out how the router, experts, and auxiliary loss would interact and built an interface that let transformer blocks treat dense and MoE layers the same.&lt;/p&gt;

&lt;p&gt;The final piece was the training loop. I needed detailed metrics: throughput, timing, auxiliary losses, temperature schedules, and expert usage. The Trainer class would handle epochs, collect metrics, and coordinate evaluation so experiments remained consistent.&lt;/p&gt;

&lt;p&gt;With the components defined, the plan was straightforward: build the dataset module, implement the transformer, add the MoE layer, wire everything in the Trainer, and run dense and MoE configurations under a shared framework.&lt;/p&gt;




&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ixmz84ckvn0wfmrrbag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ixmz84ckvn0wfmrrbag.png" alt="implementation" width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the plan was set, the next step was turning each idea into a working module. I wanted the codebase to feel like a compact training stack with clear boundaries. That shaped how the transformer, MoE layer, and Trainer were built.&lt;/p&gt;

&lt;p&gt;The transformer came first, following a standard GPT-style decoder with token embeddings, positional embeddings, a stack of self-attention blocks, and a final projection. The key feature was a pluggable feed-forward sublayer. If a layer index matched an MoE position, the dense FFN was swapped for an MoE layer through a shared interface. This made it easy to alternate between dense and sparse configurations without altering the architecture.&lt;/p&gt;
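&lt;p&gt;The swap can be captured by a small factory: each block only decides which feed-forward sublayer to construct. A schematic sketch with illustrative names, not the project's actual code:&lt;/p&gt;

```python
def make_ffn(layer_idx, moe_positions, make_dense, make_moe):
    """Pick the feed-forward sublayer for a block: MoE at designated layers, dense elsewhere.

    Both factories must return objects with the same forward interface, so the
    rest of the transformer never needs to know which variant it got.
    """
    return make_moe() if layer_idx in moe_positions else make_dense()
```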

&lt;p&gt;The MoE layer required the most careful engineering. Routing occurs per token, so the implementation flattens batch and sequence dimensions, applies a linear router, then uses a temperature scaled softmax to produce expert probabilities. Tokens pick top experts, pass through identical feed-forward experts, and are recombined with routing weights. The layer also tracks usage, probabilities, and entropy, which was crucial for spotting imbalance and specialization patterns.&lt;/p&gt;
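&lt;p&gt;A toy NumPy rendition of that forward pass; the real layer is a PyTorch module with vectorised dispatch, but the steps are the same: flatten the batch and sequence dimensions, route with a temperature-scaled softmax, run the top-k experts, and recombine with the routing weights.&lt;/p&gt;

```python
import numpy as np

def moe_forward(x, w_router, experts, top_k=2, temperature=1.0):
    """Per-token MoE routing sketch.

    x: (batch, seq, d) activations; w_router: (d, n_experts); experts: list of
    callables mapping (m, d) arrays to (m, d) arrays.
    """
    b, s, d = x.shape
    tokens = x.reshape(-1, d)                     # flatten batch and sequence dims
    logits = tokens @ w_router / temperature      # temperature-scaled router logits
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = probs / probs.sum(axis=-1, keepdims=True)
    top = np.argsort(probs, axis=-1)[:, -top_k:]  # indices of the top-k experts
    out = np.zeros_like(tokens)
    for t in range(tokens.shape[0]):
        weights = probs[t, top[t]]
        weights = weights / weights.sum()         # renormalise over selected experts
        for w, e in zip(weights, top[t]):
            out[t] += w * experts[e](tokens[t:t + 1])[0]
    return out.reshape(b, s, d)
```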

&lt;p&gt;With the model ready, the Trainer handled timing, throughput, auxiliary losses, and temperature schedules. It recorded detailed metrics, measured forward and backward phases separately, and supported early stopping. Together, these components formed a focused environment for comparing dense and MoE models and revealing their trade-offs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Dataset
&lt;/h2&gt;

&lt;p&gt;For these experiments I needed a dataset that was structured enough to reveal real modeling behavior yet small enough for fast iteration. WikiText-2 fit well. It contains high-quality English text from Wikipedia, offering natural sentence structures, topic shifts, and long-range dependencies that make model behavior easy to inspect.&lt;/p&gt;

&lt;p&gt;I kept the raw text but used a custom tokenization pipeline instead of the original large vocabulary. I mapped everything into a fixed vocabulary of 10000 tokens. This kept the embedding matrix small and made the model lighter, while also shifting the dataset’s statistics. A smaller vocabulary increases sequence length due to more subword splits and raises the frequency of common tokens. That steeper distribution created clearer patterns early in training and made capacity differences between dense and MoE variants easier to observe.&lt;/p&gt;

&lt;p&gt;The dataset is stored in Parquet format and loaded with Polars. After tokenization, it produces about 2.1 million training tokens and about 217000 validation tokens. These totals stay similar after remapping, although some lines become longer and the token distribution grows more concentrated.&lt;/p&gt;

&lt;p&gt;A streaming dataset slices training sequences with sliding windows, keeping memory use low and avoiding preprocessing. I also added a synthetic fallback for development. Overall, WikiText-2 with a 10000-token vocabulary offered a realistic and efficient environment for comparing dense and MoE models.&lt;/p&gt;
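&lt;p&gt;The sliding-window slicing itself is tiny; a generator keeps memory use low because windows are produced lazily rather than materialised up front. A simplified version of the idea:&lt;/p&gt;

```python
def sliding_windows(token_ids, window=128, stride=64):
    """Yield fixed-length training sequences over a token stream, lazily."""
    for start in range(0, len(token_ids) - window + 1, stride):
        yield token_ids[start:start + window]
```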




&lt;h2&gt;
  
  
  MoE vs Dense: Experiment Results and Recommendations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddqjgb054uxmtt8jfhjb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddqjgb054uxmtt8jfhjb.jpg" alt="summary" width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The dense model achieved the best validation loss and the highest training throughput. It remains the strongest performer for this setup when evaluating generalization versus raw speed.&lt;/li&gt;
&lt;li&gt;Mixture-of-experts (MoE) variants increased model capacity (roughly 17.3M parameters vs 9.9M for the dense model) but introduced substantial runtime overhead, primarily in the backward pass. That overhead reduced tokens/sec compared with the dense baseline.&lt;/li&gt;
&lt;li&gt;Among MoE variants, &lt;code&gt;throughput_opt&lt;/code&gt; substantially reduced backward cost compared with the other MoEs and delivered the best throughput among MoEs, but it did not match the dense model in tokens/sec or in best validation loss.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;top2&lt;/code&gt; reached the lowest training loss but the worst validation loss, suggesting overfitting or instability in gating/generalization.&lt;/li&gt;
&lt;li&gt;Routing entropy and per-expert usage indicate reasonably balanced expert assignment across experiments, but layer- and experiment-level differences remain and may explain some of the generalization differences.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key numbers (single-epoch timing, parameters, best validation loss, avg throughput)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Best validation loss&lt;/th&gt;
&lt;th&gt;Avg throughput (tokens/sec)&lt;/th&gt;
&lt;th&gt;Forward (s)&lt;/th&gt;
&lt;th&gt;Backward (s)&lt;/th&gt;
&lt;th&gt;Optimizer (s)&lt;/th&gt;
&lt;th&gt;Total per-epoch (s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;9,924,608&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.3698&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;68,825&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.40&lt;/td&gt;
&lt;td&gt;0.20&lt;/td&gt;
&lt;td&gt;0.03&lt;/td&gt;
&lt;td&gt;0.63&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MoE-baseline&lt;/td&gt;
&lt;td&gt;17,286,656&lt;/td&gt;
&lt;td&gt;9.3809&lt;/td&gt;
&lt;td&gt;44,976&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;1.17&lt;/td&gt;
&lt;td&gt;0.05&lt;/td&gt;
&lt;td&gt;1.97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MoE-top2&lt;/td&gt;
&lt;td&gt;17,286,656&lt;/td&gt;
&lt;td&gt;9.3921&lt;/td&gt;
&lt;td&gt;37,387&lt;/td&gt;
&lt;td&gt;0.90&lt;/td&gt;
&lt;td&gt;1.47&lt;/td&gt;
&lt;td&gt;0.05&lt;/td&gt;
&lt;td&gt;2.42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MoE-strong_reg&lt;/td&gt;
&lt;td&gt;17,286,656&lt;/td&gt;
&lt;td&gt;9.3798&lt;/td&gt;
&lt;td&gt;44,243&lt;/td&gt;
&lt;td&gt;0.76&lt;/td&gt;
&lt;td&gt;1.19&lt;/td&gt;
&lt;td&gt;0.05&lt;/td&gt;
&lt;td&gt;2.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MoE-throughput_opt&lt;/td&gt;
&lt;td&gt;17,286,656&lt;/td&gt;
&lt;td&gt;9.3870&lt;/td&gt;
&lt;td&gt;54,784&lt;/td&gt;
&lt;td&gt;0.56&lt;/td&gt;
&lt;td&gt;0.96&lt;/td&gt;
&lt;td&gt;0.03&lt;/td&gt;
&lt;td&gt;1.55&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Training and validation dynamics
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9rtzeziudyy734jwigh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9rtzeziudyy734jwigh.png" alt="training loss" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxy53ugo4yykw5ylh8wew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxy53ugo4yykw5ylh8wew.png" alt="validation loss" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Training loss&lt;/strong&gt;: All variants show steadily decreasing training loss. &lt;code&gt;MoE-top2&lt;/code&gt; reaches the lowest training loss across epochs, indicating higher capacity or faster training fit.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Validation loss&lt;/strong&gt;: Dense achieves the lowest validation loss (9.3698). MoE variants either match or exceed (worse) the dense validation loss:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;MoE-baseline&lt;/code&gt; and &lt;code&gt;MoE-strong_reg&lt;/code&gt; are close to each other and only marginally worse than dense.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MoE-top2&lt;/code&gt; exhibits the largest gap and a clear upward trend in validation loss after epoch 4, indicating overfitting or gating instability.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MoE-throughput_opt&lt;/code&gt; shows stable validation behavior but does not surpass dense.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Interpretation: The MoE configurations give higher representational capacity but require careful gating/regularization and optimization to realize generalization benefits. Without such tuning, larger capacity can overfit or destabilize validation performance.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Throughput and timing breakdown
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1c3oawsgkswxlnyeig0t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1c3oawsgkswxlnyeig0t.png" alt="throughput" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The dominant cost for MoE models is the backward pass. Backward times:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;MoE-top2&lt;/code&gt;: 1.47 s&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MoE-baseline&lt;/code&gt;: 1.17 s&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MoE-strong_reg&lt;/code&gt;: 1.19 s&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MoE-throughput_opt&lt;/code&gt;: 0.96 s&lt;/li&gt;
&lt;li&gt;Dense: 0.20 s&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;code&gt;throughput_opt&lt;/code&gt; reduced the backward cost significantly compared with other MoEs, producing the best MoE throughput (54,784 tokens/sec), but still below dense.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Total per-epoch wall-clock is smallest for dense (0.63s) and largest for &lt;code&gt;top2&lt;/code&gt; (2.42s).&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Interpretation: MoE overheads are chiefly in expert-specific gradient/communication during backward. Optimizations that reduce communication or reduce expert work in backward propagate directly to throughput gains (as &lt;code&gt;throughput_opt&lt;/code&gt; demonstrates).&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Expert routing behavior: usage and entropy
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgf5hkku4l149z8169g83.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgf5hkku4l149z8169g83.jpg" alt="expert entropy" width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entropy&lt;/strong&gt;: Routing entropy per layer is close to the theoretical maximum (max ≈ 2.08 for 8 experts). That indicates routing is using many experts rather than collapsing to a single expert. Specific observations:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;top2&lt;/code&gt; layer 2 entropy reaches very close to max early, which matches its aggressive expert usage behavior.&lt;/li&gt;
&lt;li&gt;Layer 4 entropies are slightly lower and more variable across experiments.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Per-expert usage&lt;/strong&gt;: Usage bars across models and layers show modest deviations from perfect uniformity (uniform = 12.5% for 8 experts). Most experiments show usage within roughly 10.6% to 14.7% per expert. A few spots show underused experts (for example &lt;code&gt;throughput_opt&lt;/code&gt; layer 4 had some experts near 9.8–9.9%).&lt;/li&gt;

&lt;li&gt;Interpretation: Routing is generally balanced, but small non-uniformities exist and could cause local specialization or slight load imbalance. Very skewed usage might lead to undertrained experts and affect generalization.&lt;/li&gt;

&lt;/ul&gt;
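&lt;p&gt;The theoretical maximum quoted above is just the entropy of a uniform distribution over 8 experts, ln 8 ≈ 2.079 nats, which a few lines confirm:&lt;/p&gt;

```python
import math

def routing_entropy(probs):
    """Shannon entropy (in nats) of an expert-assignment distribution."""
    return -sum(p * math.log(p) for p in probs if p)

# Uniform routing over 8 experts gives the theoretical maximum, ln 8.
max_entropy = routing_entropy([1.0 / 8] * 8)
```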

&lt;h3&gt;
  
  
  Trade-offs and interpretation
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Dense vs MoE capacity&lt;/strong&gt;: MoE increases the parameter count substantially, but this did not translate into better validation loss in these runs. Extra capacity alone is not sufficient to improve generalization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throughput trade-offs&lt;/strong&gt;: Dense model is faster and achieves better validation performance. For these experiments, MoE costs (especially backward step) made them slower despite higher capacity. &lt;code&gt;throughput_opt&lt;/code&gt; shows that MoE overhead can be reduced but not fully eliminated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MoE-top2 anomaly&lt;/strong&gt;: The &lt;code&gt;top2&lt;/code&gt; variant fits training data best but generalizes worst. This suggests gating choices (top-2 routing) can increase overfitting risk or cause training instability unless balanced with stronger regularization or gating temperature tuning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Balancing and entropy&lt;/strong&gt;: Entropy values close to the max indicate gating is not collapsing, which is good for utilization. However, small routing imbalances can still impact performance. Regularization techniques to encourage balanced loads may help.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Implementation: &lt;a href="https://github.com/capybara-brain346/moe-router" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Working through this project gave me a clearer sense of how MoE models behave in practice. I expected the extra capacity from experts to translate quickly into better validation performance, but the results were more nuanced. The dense baseline stayed the most stable and consistent in both speed and generalization, which reinforced the idea that complexity only helps when training dynamics are tuned to support it.&lt;/p&gt;

&lt;p&gt;The MoE experiments showed where the real challenges appear. Routing adds flexibility but also brings noise, imbalance, and overhead that a standard transformer avoids. Watching how validation loss shifted under different routing strategies, or how backward times rose as soon as experts were introduced, clarified why production MoE systems depend on tight engineering and regularization. Even in a small setup, these effects surfaced immediately.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gpt3</category>
      <category>mixtureofexperts</category>
    </item>
    <item>
      <title>Building a Vector Database from Scratch - CapybaraDB</title>
      <dc:creator>Piyush Choudhari</dc:creator>
      <pubDate>Tue, 11 Nov 2025 02:45:42 +0000</pubDate>
      <link>https://dev.to/piyush_choudhari_a5b29f7f/building-a-vector-database-from-scratch-capybaradb-ek8</link>
      <guid>https://dev.to/piyush_choudhari_a5b29f7f/building-a-vector-database-from-scratch-capybaradb-ek8</guid>
      <description>&lt;h2&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#introduction" rel="noopener noreferrer"&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Vector databases are among the most popular and widely used systems in the tech industry. Their market was valued at roughly $2.5 billion in 2024 and is projected to exceed $3 billion in 2025. Over 70% of organizations investing in or implementing AI use vector databases for search and embeddings.&lt;/p&gt;

&lt;p&gt;I have used vector databases in multiple use cases and projects, be it RAG, searching and filtering documents, or even feeding context to agents. After using several databases like FAISS, ChromaDB, Pinecone, and pgvector, I was fascinated by vector databases and their internal workings.&lt;/p&gt;

&lt;p&gt;Hence, I decided to implement one myself.&lt;/p&gt;

&lt;p&gt;CapybaraDB is a lightweight vector database built from scratch in Python:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Semantic search using sentence-transformers embeddings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Built-in token-based chunking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CUDA acceleration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Precision control (float32, float16, binary).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;.npz file storage for persistence.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcpf7o3yreq4vmr31325w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcpf7o3yreq4vmr31325w.png" alt="capybaradb" width="800" height="622"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#what-is-a-vector-database" rel="noopener noreferrer"&gt;&lt;strong&gt;What is a Vector Database?&lt;/strong&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;A vector database is a specialized database optimized for storing and searching high-dimensional vector embeddings. Embeddings are numerical representations of data such as text, images, video, or audio. Structurally, an embedding is an array of floating-point numbers encoding the direction and magnitude of a vector.&lt;/p&gt;

&lt;p&gt;A traditional database looks for an exact match to a query, but a vector database finds items by measuring the distance between the query vector and the stored vectors in a multidimensional space. Metrics such as Euclidean distance or cosine similarity are used to measure these distances.&lt;/p&gt;

&lt;p&gt;They're essential for modern AI applications including semantic search (finding meaning, not just keywords), recommendation systems, RAG (Retrieval Augmented Generation) for chatbots, image similarity search, and anomaly detection.&lt;/p&gt;

&lt;p&gt;Popular examples include Pinecone, Weaviate, Milvus, Qdrant, and Chroma. They've become crucial infrastructure as AI applications need to search through millions of embeddings in milliseconds while maintaining accuracy.&lt;/p&gt;
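&lt;p&gt;The distance metrics above can be sketched in a few lines of NumPy (the vectors here are made-up toy values, not real embeddings):&lt;/p&gt;

```python
import numpy as np

# Two toy 4-dimensional "embeddings" (hypothetical values).
a = np.array([0.2, 0.7, 0.1, 0.5])
b = np.array([0.3, 0.6, 0.0, 0.4])

# Euclidean distance: straight-line distance between the two vectors.
euclidean = np.linalg.norm(a - b)

# Cosine similarity: angle-based closeness, independent of magnitude.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(round(euclidean, 3))  # 0.2
print(round(cosine, 3))     # roughly 0.98: nearly parallel vectors
```

&lt;p&gt;On L2-normalized embeddings (which CapybaraDB produces), cosine similarity reduces to a plain dot product, which is why the search code later uses matrix multiplication.&lt;/p&gt;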

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7oauwos37uqu3q6k5uz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7oauwos37uqu3q6k5uz.png" alt="vector db illustration" width="800" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#design-philosophy" rel="noopener noreferrer"&gt;&lt;strong&gt;Design Philosophy&lt;/strong&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Simplicity&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;* A "toy" vector db implementation, aiming for minimal complexity

* Straightforward APIs (`add_document`, `search`, `get_document`)

* Minimal config to get started
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Flexibility&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;* Utility support for multiple file formats

* Configurable precision levels (float32, float16 and binary)

* Choice to keep in-memory store or on disk

* GPU support
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Minimal dependencies&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;* Core dependencies limited to essential libs

* Lightweight footprint for prototyping and learning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Educational focus&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;* Demonstrating fundamental vector database concepts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#metrics--benchmarks" rel="noopener noreferrer"&gt;&lt;strong&gt;Metrics &amp;amp; Benchmarks&lt;/strong&gt;&lt;/a&gt;
&lt;/h2&gt;
&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#indexing-performance" rel="noopener noreferrer"&gt;&lt;strong&gt;Indexing Performance&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Data source: &lt;a href="https://github.com/capybara-brain346/capybaradb/blob/main/benchmark_results/indexing_performance.json?utm_source=www.piyushchoudhari.me&amp;amp;utm_medium=portfolio_website&amp;amp;utm_campaign=referral" rel="noopener noreferrer"&gt;&lt;strong&gt;benchmark_results/indexing_performance.json&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Document counts tested: 10, 50, 100, 500, 1000&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Total times (s): 0.138, 1.015, 2.388, 23.126, 76.331&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Average time per doc (s): 0.0138, 0.0203, 0.0239, 0.0463, 0.0763&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage times remain small relative to embedding time even at 1k docs (≈0.122 s)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Index size (MB): 0.020, 0.089, 0.174, 0.859, 1.715&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Peak memory (MB): ~2.2–65.4 across scales&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Embedding dominates total indexing time. Storage overhead is negligible in comparison.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Linear growth with dataset size; average time per document rises as batches get larger and memory pressure appears.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Index size scales linearly and remains compact for thousands of chunks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Refer to &lt;code&gt;benchmark_results/indexing_performance.png&lt;/code&gt; for the trend lines and &lt;code&gt;indexing_performance_breakdown.png&lt;/code&gt; for stacked time components.&lt;/p&gt;


&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#query-performance" rel="noopener noreferrer"&gt;&lt;strong&gt;Query Performance&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Data source: &lt;a href="https://github.com/capybara-brain346/capybaradb/blob/main/benchmark_results/query_performance.json?utm_source=www.piyushchoudhari.me&amp;amp;utm_medium=portfolio_website&amp;amp;utm_campaign=referral" rel="noopener noreferrer"&gt;&lt;strong&gt;benchmark_results/query_performance.json&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Dataset sizes tested: 100, 500, 1000, 2500, 5000&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Average query latency (ms): 7.79, 7.54, 9.10, 8.52, 8.45&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Throughput (qps): 128.3, 132.6, 109.9, 117.4, 118.3&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;p50 latency (ms): 7.45–8.79&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;p95 latency (ms): 10.09–12.01&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;p99 latency (ms): 11.80–16.39&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Breakdown (avg):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Embedding time (ms): ~3.87–4.53&lt;/li&gt;
&lt;li&gt;Retrieval time (ms): ~3.50–4.57&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Latency remains stable and low (≈7–9 ms on average) from 100 to 5000 vectors for top-k search, reflecting efficient vectorized exact search.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Throughput remains &amp;gt;100 qps at all tested sizes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The split between query embedding and retrieval remains balanced; both contribute roughly half of total latency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Note: one anomalous value appears in &lt;code&gt;min_latency_ms&lt;/code&gt; at 500 (-524.27 ms). This is a measurement artifact and should be ignored; distributional statistics (p50/p95/p99) are consistent and reliable.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Charts: &lt;code&gt;benchmark_results/query_performance.png&lt;/code&gt; and &lt;code&gt;query_performance_breakdown.png&lt;/code&gt; visualize latency distributions and the embedding vs retrieval split.&lt;/p&gt;


&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#retrieval-quality-synthetic" rel="noopener noreferrer"&gt;&lt;strong&gt;Retrieval Quality (Synthetic)&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Data source: &lt;a href="https://github.com/capybara-brain346/capybaradb/blob/main/benchmark_results/retrieval_quality_synthetic.json?utm_source=www.piyushchoudhari.me&amp;amp;utm_medium=portfolio_website&amp;amp;utm_campaign=referral" rel="noopener noreferrer"&gt;&lt;strong&gt;benchmark_results/retrieval_quality_synthetic.json&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Dataset: Synthetic&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk size: 512&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quality metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Precision@k: P@1=1.00, P@3≈0.756, P@5≈0.480, P@10≈0.240&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Recall@k: R@1≈0.433, R@3≈0.956, R@5=1.00, R@10=1.00&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;F1@k: F1@1=0.60, F1@3≈0.836, F1@5≈0.643, F1@10≈0.385&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nDCG@k: nDCG@1=1.00, nDCG@3≈0.954, nDCG@5≈0.979, nDCG@10≈0.979&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interpretation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Very strong early precision (P@1=1.0) and nDCG across cutoffs indicate effective ranking of the most relevant content.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Near-perfect recall by k=5 shows top-5 captures essentially all relevant items.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See &lt;code&gt;benchmark_results/retrieval_quality_synthetic.png&lt;/code&gt; for the quality curves.&lt;/p&gt;
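&lt;p&gt;For reference, Precision@k and Recall@k for a single query can be computed as below (the document IDs and relevance judgments are hypothetical, not taken from the benchmark data):&lt;/p&gt;

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Precision@k and Recall@k for one query.

    retrieved: ranked list of result IDs; relevant: set of relevant IDs.
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k, hits / len(relevant)

# Hypothetical ranking with 3 relevant documents; the top hit is relevant.
retrieved = ["d1", "d7", "d2", "d9", "d3"]
relevant = {"d1", "d2", "d3"}

print(precision_recall_at_k(retrieved, relevant, 1))  # (1.0, ~0.33)
print(precision_recall_at_k(retrieved, relevant, 5))  # (0.6, 1.0)
```

&lt;p&gt;This also illustrates why P@k necessarily falls as k grows past the number of relevant items, exactly the pattern in the table above (P@5≈0.48 with ~2–3 relevant chunks per query).&lt;/p&gt;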

&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt; ⚠️: The documents in the dataset used here are relatively short (typically well under 512 tokens).&lt;br&gt;&lt;br&gt;
As a result, a chunk size of 512 effectively corresponds to &lt;em&gt;document-level embeddings&lt;/em&gt; — each document was indexed as a single vector.&lt;br&gt;&lt;br&gt;
While this setup is sufficient for small-scale or toy benchmarks, it may not generalize to longer documents where sub-document (passage-level) chunking becomes necessary for finer-grained retrieval.&lt;br&gt;&lt;br&gt;
Future evaluations will include experiments with smaller chunk sizes (e.g., 128–256) and longer document corpora to assess chunk-level retrieval effects.&lt;/p&gt;


&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#what-these-results-mean-from-a-perspective-of-a-toy-database" rel="noopener noreferrer"&gt;&lt;strong&gt;What These Results Mean from a Perspective of A "Toy Database"&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Small to medium collections (≤10k chunks): exact search is fast, simple, and accurate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Low latency: median ≈7–9 ms per query with &amp;gt;100 qps throughput in benchmarks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Strong quality: excellent early precision and recall on the synthetic task with coherent chunking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scales linearly: indexing and index size grow linearly; storage overhead is minimal compared to embedding time.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#core-architecture" rel="noopener noreferrer"&gt;&lt;strong&gt;Core Architecture&lt;/strong&gt;&lt;/a&gt;
&lt;/h2&gt;
&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#1-baseindex" rel="noopener noreferrer"&gt;&lt;strong&gt;1. BaseIndex&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The main in-memory (temp) data store of CapybaraDB is the &lt;code&gt;BaseIndex&lt;/code&gt; class, a data structure that holds:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class BaseIndex:    documents: Dict[str, str]           # doc_id -&amp;gt; full document text    chunks: Dict[str, Dict[str, str]]   # chunk_id -&amp;gt; {text, doc_id}    vectors: Optional[torch.Tensor]     # All chunk embeddings    chunk_ids: List[str]                # Order-preserving chunk IDs    total_chunks: int    total_documents: int    embedding_dim: Optional[int]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This design keeps documents and their chunks separate while maintaining relationships through IDs. Why this separation? It allows us to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Return full documents when retrieving search results&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Track which chunk belongs to which document&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Maintain metadata without duplicating data&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
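&lt;p&gt;A minimal sketch of that separation, with hypothetical IDs, shows how a matched chunk is traced back to its parent document:&lt;/p&gt;

```python
# Documents and chunks live in separate dicts, linked only by IDs
# (the IDs and texts here are made up for illustration).
documents = {"doc-1": "Full text of the first document..."}
chunks = {
    "chunk-a": {"text": "Full text of", "doc_id": "doc-1"},
    "chunk-b": {"text": "the first document...", "doc_id": "doc-1"},
}

# A chunk that matched a query leads back to its parent document:
hit = chunks["chunk-b"]
parent = documents[hit["doc_id"]]
print(parent)  # the complete document, not just the matching chunk
```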

&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#2-index" rel="noopener noreferrer"&gt;&lt;strong&gt;2. Index&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Index&lt;/code&gt; class extends &lt;code&gt;BaseIndex&lt;/code&gt; with persistence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Index(BaseIndex):    def __init__(self, storage_path: Optional[Path] = None):        super().__init__()        self.storage = Storage(storage_path)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where auto-loading happens. When you create an &lt;code&gt;Index&lt;/code&gt;, it checks whether a persisted version exists and loads it automatically. This is, of course, optional: if no path is provided, the database stays in-memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#3-capybaradb-the-main-interface" rel="noopener noreferrer"&gt;&lt;strong&gt;3. CapybaraDB: The Main Interface&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;CapybaraDB&lt;/code&gt; class which exposes the API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class CapybaraDB:    def __init__(        self,        collection: Optional[str] = None,        chunking: bool = False,        chunk_size: int = 512,        precision: Literal["binary", "float16", "float32"] = "float32",        device: Literal["cpu", "cuda"] = "cpu",    ):
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can create multiple collections, control chunking, adjust precision, and choose your compute device.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#embeddings" rel="noopener noreferrer"&gt;&lt;strong&gt;Embeddings&lt;/strong&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#1-architecture" rel="noopener noreferrer"&gt;&lt;strong&gt;1. Architecture&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;CapybaraDB uses &lt;code&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/code&gt;, a lightweight transformer model that converts text into 384-dimensional vectors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class EmbeddingModel:    def __init__(        self,        precision: Literal["binary", "float16", "float32"] = "float32",        device: Literal["cpu", "cuda"] = "cpu",    ):        self.model_name = "sentence-transformers/all-MiniLM-L6-v2"        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)        self.model = AutoModel.from_pretrained(self.model_name).to(device)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model is initialized once and reused for all embeddings, keeping operations fast and memory-efficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#2-the-embedding-process" rel="noopener noreferrer"&gt;&lt;strong&gt;2. The Embedding Process&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;When you call &lt;code&gt;embed()&lt;/code&gt; on a document, here's what happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def embed(self, documents: Union[str, List[str]]) -&amp;gt; torch.Tensor:    encoded_documents = self.tokenizer(        documents, padding=True, truncation=True, return_tensors="pt"    )        with torch.no_grad():        model_output = self.model(**encoded_documents)        sentence_embeddings = self._mean_pooling(        model_output, encoded_documents["attention_mask"]    )    sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Tokenization: the model's tokenizer converts the input text into token IDs, with padding and truncation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generation: the transformer produces context-aware representations for each token position, which are mean-pooled into a single sentence vector&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Normalization: L2 normalization converts all vectors to unit length for accurate retrieval&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
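&lt;p&gt;The &lt;code&gt;_mean_pooling&lt;/code&gt; helper referenced above is not shown in the snippet; a NumPy sketch of the standard masked mean pooling it presumably performs looks like this:&lt;/p&gt;

```python
import numpy as np

def mean_pooling(token_embeddings, attention_mask):
    """Average token embeddings, ignoring padded positions via the mask.

    token_embeddings: (seq_len, dim); attention_mask: (seq_len,) of 0/1.
    """
    mask = attention_mask[:, None].astype(float)    # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # sum of real tokens only
    counts = np.clip(mask.sum(), 1e-9, None)        # avoid divide-by-zero
    return summed / counts

# Toy sequence: two real tokens plus one padded position.
tokens = np.array([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]])  # last row is padding
mask = np.array([1, 1, 0])
print(mean_pooling(tokens, mask))  # [2. 3.]
```

&lt;p&gt;The padded row contributes nothing: only the two real token vectors are averaged.&lt;/p&gt;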

&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#3-precision-modes" rel="noopener noreferrer"&gt;&lt;strong&gt;3. Precision Modes&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;CapybaraDB supports three precision modes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Float32 (default)&lt;/strong&gt;: Full precision, highest accuracy&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Float16&lt;/strong&gt;: Half precision, ~50% memory savings, minimal accuracy loss&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Binary&lt;/strong&gt;: Each dimension becomes 0 or 1, resulting in memory savings. The embedding process converts values &amp;gt; 0 to 1.0:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if self.precision == "binary":    sentence_embeddings = (sentence_embeddings &amp;gt; 0).float()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Binary embeddings use a scaled dot product during search to compensate for information loss.&lt;/p&gt;
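&lt;p&gt;A small NumPy sketch of this quantization and the scaled dot product (toy values, not the library's actual code):&lt;/p&gt;

```python
import numpy as np

# Hypothetical float embeddings, two stored vectors of dimension 4.
emb = np.array([[0.3, -0.2, 0.8, -0.1],
                [-0.5, 0.1, -0.4, 0.6]])
binary = (emb > 0).astype(float)  # each dimension becomes 0 or 1

# Quantize the query the same way.
query = (np.array([0.4, -0.3, 0.5, 0.2]) > 0).astype(float)

# Scaled dot product: fraction of dimensions where both vectors are "active".
scores = binary @ query / query.size
print(scores)  # scores of 0.5 and 0.25 for the two stored vectors
```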

&lt;h2&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#document-processing-pipeline" rel="noopener noreferrer"&gt;&lt;strong&gt;Document Processing Pipeline&lt;/strong&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#adding-a-document" rel="noopener noreferrer"&gt;&lt;strong&gt;Adding a Document&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;When you add a document, here's the full journey:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def add_document(self, text: str, doc_id: Optional[str] = None) -&amp;gt; str:    if doc_id is None:        doc_id = str(uuid.uuid4())        self.index.documents[doc_id] = text    self.index.total_documents += 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 1: ID Generation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If no ID is provided, we generate a UUID. This ensures every document is uniquely identifiable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Chunking (Optional)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If chunking is enabled, the document is split using token-based chunking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if self.chunking:    enc = tiktoken.get_encoding("cl100k_base")    token_ids = enc.encode(text)    chunks = []    for i in range(0, len(token_ids), self.chunk_size):        tok_chunk = token_ids[i : i + self.chunk_size]        chunk_text = enc.decode(tok_chunk)        chunks.append(chunk_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why token-based chunking instead of character-based?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Respects word boundaries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Considers tokenizer structure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Produces more semantically coherent chunks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Works better with the embedding model&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
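&lt;p&gt;The chunking loop above can be sketched without external dependencies; here a whitespace split stands in for tiktoken's &lt;code&gt;cl100k_base&lt;/code&gt; encode/decode, but the slicing logic is the same:&lt;/p&gt;

```python
def chunk_tokens(text, chunk_size):
    # Whitespace split stands in for enc.encode(text); " ".join for enc.decode.
    tokens = text.split()
    return [" ".join(tokens[i : i + chunk_size])
            for i in range(0, len(tokens), chunk_size)]

doc = "vector databases store embeddings for fast semantic search"
print(chunk_tokens(doc, 3))
# ['vector databases store', 'embeddings for fast', 'semantic search']
```

&lt;p&gt;Note the last chunk may be shorter than &lt;code&gt;chunk_size&lt;/code&gt;; no text is dropped.&lt;/p&gt;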

&lt;p&gt;&lt;strong&gt;Step 3: Create Chunks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each chunk gets its own UUID and is stored with metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for chunk in chunks:    chunk_id = str(uuid.uuid4())    self.index.chunks[chunk_id] = {"text": chunk, "doc_id": doc_id}    chunk_ids.append(chunk_id)    self.index.total_chunks += 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: Generate Embeddings&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All chunks are embedded in one batch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chunk_texts = [self.index.chunks[cid]["text"] for cid in chunk_ids]chunk_embeddings = self.model.embed(chunk_texts)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Batch processing is key to performance. Embedding 100 chunks together is much faster than 100 individual embeddings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Append to Vector Store&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where the vectors are added to the index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if self.index.vectors is None:    self.index.vectors = chunk_embeddings    self.index.chunk_ids = chunk_ids    self.index.embedding_dim = chunk_embeddings.size(1)else:    self.index.vectors = torch.cat(        [self.index.vectors, chunk_embeddings], dim=0    )    self.index.chunk_ids.extend(chunk_ids)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first document creates the tensor. Subsequent documents are concatenated along the batch dimension.&lt;/p&gt;
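&lt;p&gt;The same append pattern, shown with a NumPy analogue (&lt;code&gt;torch.cat&lt;/code&gt; behaves the same along dim 0; the batch shapes are made up):&lt;/p&gt;

```python
import numpy as np

# First batch creates the matrix; later batches are concatenated
# along the row (batch) dimension, mirroring the index logic above.
vectors = None
for batch in (np.ones((2, 4)), np.zeros((3, 4))):
    if vectors is None:
        vectors = batch
    else:
        vectors = np.concatenate([vectors, batch], axis=0)

print(vectors.shape)  # (5, 4)
```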

&lt;p&gt;&lt;strong&gt;Step 6: Persistence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If not in-memory mode, the index is saved immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if not self.index.storage.in_memory:    self.index.save()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means you can add documents and they're persisted incrementally—no manual save needed!&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#the-search-engine" rel="noopener noreferrer"&gt;&lt;strong&gt;The Search Engine&lt;/strong&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#the-search-process" rel="noopener noreferrer"&gt;&lt;strong&gt;The Search Process&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Here's how search works end-to-end:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def search(self, query: str, top_k: int = 5):    if self.index.vectors is None:        return []        self.index.ensure_vectors_on_device(target_device)    indices, scores = self.model.search(query, self.index.vectors, top_k)        results = []    for idx, score in zip(indices.tolist(), scores.tolist()):        chunk_id = self.index.chunk_ids[idx]        chunk_info = self.index.chunks[chunk_id]        doc_id = chunk_info["doc_id"]                results.append({            "doc_id": doc_id,            "chunk_id": chunk_id,            "text": chunk_info["text"],            "score": score,            "document": self.index.documents[doc_id],        })        return results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 1: Query Embedding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The query text is embedded using the same model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def search(self, query: str, embeddings: torch.Tensor, top_k: int):    query_embedding = self.embed(query)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Similarity Computation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The similarity between query and all stored vectors is computed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if self.precision == "binary":    similarities = torch.matmul(        embeddings.float(),         query_embedding.t().float()    ) / query_embedding.size(1)else:    similarities = torch.matmul(embeddings, query_embedding.t())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each query, the system computes similarity with all stored vectors. In &lt;strong&gt;binary precision&lt;/strong&gt;, it performs a dot product between 0/1 vectors, scaled by the embedding dimension—yielding the fraction of matching active bits. In &lt;strong&gt;float precision&lt;/strong&gt;, normalized embeddings use standard cosine similarity via matrix multiplication with the query embedding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Top-K Selection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We use &lt;code&gt;torch.topk&lt;/code&gt; to find the most similar vectors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scores, indices = torch.topk(    similarities.squeeze(),    min(top_k, embeddings.size(0)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: Result Assembly&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For each result, we reconstruct the full context by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Looking up the chunk text&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finding the parent document ID&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Retrieving the full document&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This gives you both the specific chunk that matched and the full document context.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#storage-and-persistence" rel="noopener noreferrer"&gt;&lt;strong&gt;Storage and Persistence&lt;/strong&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#the-storage-layer" rel="noopener noreferrer"&gt;&lt;strong&gt;The Storage Layer&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Storage&lt;/code&gt; class handles persistence with NumPy's compressed NPZ format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def save(self, index) -&amp;gt; None:    data = {        "vectors": index.vectors.cpu().numpy(),        "chunk_ids": np.array(index.chunk_ids),        "chunk_texts": np.array([index.chunks[cid]["text"] for cid in index.chunk_ids]),        "chunk_doc_ids": np.array([index.chunks[cid]["doc_id"] for cid in index.chunk_ids]),        "doc_ids": np.array(list(index.documents.keys())),        "doc_texts": np.array(list(index.documents.values())),        "total_chunks": index.total_chunks,        "total_documents": index.total_documents,        "embedding_dim": index.embedding_dim or 0,    }        np.savez_compressed(self.file_path, **data)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why NPZ?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Compressed by default (saves space)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Efficient binary format&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Handles large arrays well&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cross-platform and language-agnostic&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
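&lt;p&gt;A quick round trip shows the NPZ format in action (the array names and sizes here are illustrative, not CapybaraDB's exact schema):&lt;/p&gt;

```python
import os
import tempfile

import numpy as np

# Save a small "index" with a vector matrix and string chunk IDs.
vectors = np.random.rand(10, 384).astype(np.float32)
chunk_ids = np.array(["c1", "c2"])

path = os.path.join(tempfile.mkdtemp(), "index.npz")
np.savez_compressed(path, vectors=vectors, chunk_ids=chunk_ids)

# Load it back: each keyword argument becomes a named array in the archive.
loaded = np.load(path)
assert np.array_equal(loaded["vectors"], vectors)
print(loaded["chunk_ids"])  # ['c1' 'c2']
```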

&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#in-memory-vs-persistent" rel="noopener noreferrer"&gt;&lt;strong&gt;In-Memory vs Persistent&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;CapybaraDB supports two modes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In-Memory&lt;/strong&gt;: No file path specified. Data stays in RAM, lost on exit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persistent&lt;/strong&gt;: File path specified. Data is saved to disk after each &lt;code&gt;add_document()&lt;/code&gt; call.&lt;/p&gt;

&lt;p&gt;This dual-mode design enables both temporary experiments (in-memory) and production use (persistent).&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#putting-it-all-together" rel="noopener noreferrer"&gt;&lt;strong&gt;Putting It All Together&lt;/strong&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#example-simple-document-search" rel="noopener noreferrer"&gt;&lt;strong&gt;Example: Simple Document Search&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from capybaradb.main import CapybaraDB # Initializedb = CapybaraDB(    collection="research_papers",    chunking=True,    chunk_size=512,    device="cuda") # Add documentsdoc1_id = db.add_document("Machine learning is transforming NLP...")doc2_id = db.add_document("Deep neural networks excel at image recognition...") # Searchresults = db.search("artificial intelligence", top_k=2) # Use resultsfor result in results:    print(f"Score: {result['score']:.4f}")    print(f"Matched text: {result['text'][:100]}...")    print(f"Full document: {result['document']}")    print("---")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;a href="https://www.piyushchoudhari.me/blog/Building-A-Vector-Database-from-Scratch-CapybaraDB#conclusion" rel="noopener noreferrer"&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Implementation: &lt;a href="https://github.com/capybara-brain346/capybaradb.git?utm_source=www.piyushchoudhari.me&amp;amp;utm_medium=portfolio_website&amp;amp;utm_campaign=referral" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This implementation of CapybaraDB was purely for educational purposes and my own learning. I had a great time figuring out the nitty-gritty details behind vector databases and will definitely take on more challenging implementations in the future.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vectordatabase</category>
      <category>rag</category>
    </item>
    <item>
      <title>Be Curious About Your Compute</title>
      <dc:creator>Piyush Choudhari</dc:creator>
      <pubDate>Wed, 24 Sep 2025 13:03:39 +0000</pubDate>
      <link>https://dev.to/piyush_choudhari_a5b29f7f/be-curious-about-your-compute-2gfh</link>
      <guid>https://dev.to/piyush_choudhari_a5b29f7f/be-curious-about-your-compute-2gfh</guid>
      <description>&lt;h2&gt;
  
  
  Hardware Often Takes The Backseat
&lt;/h2&gt;

&lt;p&gt;I've been a software guy throughout my journey and &lt;strong&gt;I've rarely tried to lift the curtain and focus on the hardware&lt;/strong&gt;. Those rare times probably happened during my IoT study sessions in my engineering course.&lt;/p&gt;

&lt;p&gt;Most of the blockers I've ever faced during my AI engineering journey have been related to software: broken drivers, unpatched source code, outdated libraries and, of course, classic Python quirks.&lt;/p&gt;

&lt;p&gt;But one day I faced a blocker, not knowing at the time that it was hardware related. At my internship, I was trying to &lt;strong&gt;draw inference from an Automatic1111 server on AWS&lt;/strong&gt;. The server was equipped with two specific diffusion models: SDXL (very heavy) and SD1.5 (lighter). However, &lt;strong&gt;drawing inference from one model caused the next inference from the second model to be very slow&lt;/strong&gt;. My first reaction was that it was because of caching.&lt;/p&gt;

&lt;p&gt;But after debugging the issue, I learned that this was &lt;strong&gt;expected behavior of the Automatic1111 module&lt;/strong&gt;. It loaded a model into VRAM and "kept it hot" for fast inference, so switching to the second model meant loading it from scratch, hence the slow inference.&lt;/p&gt;
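&lt;p&gt;For intuition, here is a rough back-of-envelope for what a cold model swap costs. The checkpoint sizes and bandwidths below are illustrative assumptions, not measurements from that server:&lt;/p&gt;

```python
# Back-of-envelope: why swapping diffusion models between requests is slow.
# All numbers below are illustrative assumptions, not measurements.

def swap_cost_seconds(model_size_gb: float, disk_gbps: float, pcie_gbps: float) -> float:
    """Time to cold-load a model into VRAM: read the weights from disk,
    then copy them over PCIe to the GPU."""
    return model_size_gb / disk_gbps + model_size_gb / pcie_gbps

sdxl_gb = 6.9     # assumed SDXL checkpoint size (GB)
sd15_gb = 4.0     # assumed SD1.5 checkpoint size (GB)
disk_gbps = 0.5   # assumed cloud-disk read bandwidth (GB/s)
pcie_gbps = 16.0  # assumed effective PCIe bandwidth (GB/s)

# A cold swap pays both transfers; a model already "kept hot" in VRAM pays ~0.
print(f"SDXL cold load:  ~{swap_cost_seconds(sdxl_gb, disk_gbps, pcie_gbps):.1f} s")
print(f"SD1.5 cold load: ~{swap_cost_seconds(sd15_gb, disk_gbps, pcie_gbps):.1f} s")
```

&lt;p&gt;The disk read dominates, which is exactly why Automatic1111 prefers to keep one model resident in VRAM.&lt;/p&gt;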

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpyjo37u8v6l7q4t71pn8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpyjo37u8v6l7q4t71pn8.png" alt="my story" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That day I learnt that understanding how software and hardware interact goes a long way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Not thinking about the hardware"&lt;/strong&gt; is something I observe a lot from peers and engineers. So, I had the idea to write this blog.&lt;/p&gt;




&lt;h2&gt;
  
  
  Types Of Available Hardware
&lt;/h2&gt;

&lt;h3&gt;
  
  
  CPU vs GPU vs TPU
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;CPU&lt;/th&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;TPU&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Role&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;General-purpose processor, brain of the computer&lt;/td&gt;
&lt;td&gt;Originally for graphics, now massively parallel compute&lt;/td&gt;
&lt;td&gt;Custom ASIC by Google for tensor operations &amp;amp; deep learning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cores&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Few powerful cores (2–64, more in servers)&lt;/td&gt;
&lt;td&gt;Thousands of lightweight cores (A100 ~7,000+)&lt;/td&gt;
&lt;td&gt;Systolic arrays with Matrix Multiply Units (MXUs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Execution Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low-latency, sequential, strong branch handling&lt;/td&gt;
&lt;td&gt;SIMT/SIMD, warp scheduling, high throughput&lt;/td&gt;
&lt;td&gt;Data flows across systolic array; highly specialized instructions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large cache hierarchy, moderate bandwidth&lt;/td&gt;
&lt;td&gt;High-bandwidth VRAM (HBM/GDDR), smaller caches, bulk data optimized&lt;/td&gt;
&lt;td&gt;HBM tightly coupled with compute; optimized for DL precision formats&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Strengths&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Versatile, single-thread performance, task switching&lt;/td&gt;
&lt;td&gt;Matrix/vector math, AI/ML training, rendering, high throughput&lt;/td&gt;
&lt;td&gt;Extremely efficient at AI training/inference, high perf-per-watt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Weaknesses&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited parallelism, not efficient for massive matrix ops&lt;/td&gt;
&lt;td&gt;Poor at branch-heavy sequential code, higher latency, needs CUDA/OpenCL&lt;/td&gt;
&lt;td&gt;Not general-purpose, tied to Google ecosystem, less flexible&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;AI, or any computationally expensive workload, &lt;strong&gt;requires extremely high throughput&lt;/strong&gt;, as the calculations are extremely straightforward (matmuls, gradient averages, dot products, etc.), unlike the work a CPU does, which involves complex branching logic and demands minimal latency. Hence GPUs and TPUs are extremely popular choices.&lt;/p&gt;
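&lt;p&gt;One way to see why these "straightforward" calculations suit GPUs and TPUs is to compute the arithmetic intensity of a matrix multiply: how many FLOPs you get per byte moved. A quick sketch (the matrix sizes are arbitrary):&lt;/p&gt;

```python
# Arithmetic intensity of C = A @ B, with A of shape (M, K) and B of shape (K, N).
# High intensity (many FLOPs per byte moved) is what throughput machines are built for.

def matmul_flops(m: int, k: int, n: int) -> int:
    return 2 * m * k * n  # one multiply + one add per inner-product term

def matmul_bytes(m: int, k: int, n: int, bytes_per_elem: int = 4) -> int:
    return (m * k + k * n + m * n) * bytes_per_elem  # read A, read B, write C

m = k = n = 4096
intensity = matmul_flops(m, k, n) / matmul_bytes(m, k, n)
print(f"FLOPs: {matmul_flops(m, k, n):.3e}")
print(f"Bytes moved (ideal): {matmul_bytes(m, k, n):.3e}")
print(f"Arithmetic intensity: {intensity:.0f} FLOPs/byte")
```

&lt;p&gt;Hundreds of FLOPs per byte is the regime where thousands of simple cores shine; branchy CPU code, by contrast, does only a handful of operations per byte it touches.&lt;/p&gt;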

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkgd9ecta4y4ersngdkc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkgd9ecta4y4ersngdkc.png" alt="cpu vs cpu vs tpu" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GPUs, in particular, have extremely high bandwidth and, unlike TPUs, &lt;strong&gt;interface well with the wider ecosystem: CUDA libraries are mature, and PyTorch and TensorFlow offer deep interoperability with them&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bottlenecks
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd75scf2s9p7tbca44yaj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd75scf2s9p7tbca44yaj.png" alt="Bottlenecks" width="689" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is normal to focus on metrics like TFLOPs. But a system's performance and turnaround time is often limited by how quickly data can be moved and accessed, not just by how fast it can be processed.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory capacity (VRAM) and bandwidth are frequently more important than raw compute power&lt;/strong&gt;. A processor is inefficient if it sits idle waiting for data. VRAM capacity is a key constraint, as it determines whether a large model, such as an LLM, can fit onto a single GPU. For instance, a high-TFLOP consumer GPU like an RTX 4090 with 24 GB of VRAM cannot train models that require the 80 GB or more offered by datacenter GPUs. VRAM size also limits the training batch size, affecting efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When scaling to multiple GPUs for larger models, &lt;strong&gt;the interconnect, the communication pathway between GPUs could become the main bottleneck&lt;/strong&gt;. Standard interconnects like PCIe have limited bandwidth (around 64 GB/s), which can get saturated when GPUs synchronize data, leading to diminishing returns when scaling beyond a few GPUs. In contrast, proprietary technologies like NVIDIA’s NVLink provide vastly superior bandwidth (up to 900 GB/s), resulting in efficient scaling for training massive foundation models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The entire system can be constrained by &lt;strong&gt;slow data I/O from storage&lt;/strong&gt;. If data cannot be loaded from disks to the GPUs quickly enough, even the most powerful hardware will be wasted, creating a foundational bottleneck.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
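&lt;p&gt;Both the VRAM point and the interconnect point can be sanity-checked with rough arithmetic. The sketch below uses illustrative numbers (a 7B-parameter model, naive FP32 training state, and a ring all-reduce bandwidth model), not benchmarks:&lt;/p&gt;

```python
# 1) Why a 24 GB card can't train what an 80 GB card can: training memory is
# roughly weights + gradients + Adam optimizer states, before counting activations.

def training_vram_gb(params_billions: float, bytes_per_param: int = 4) -> float:
    weights = grads = params_billions * bytes_per_param  # GB, since 1e9 params
    adam_states = 2 * params_billions * 4  # two FP32 states (m, v) per parameter
    return weights + grads + adam_states

print(f"7B model, FP32 weights/grads + Adam: ~{training_vram_gb(7):.0f} GB")

# 2) Why the interconnect matters: time for a ring all-reduce of the gradients.
def allreduce_seconds(grad_gb: float, n_gpus: int, link_gbps: float) -> float:
    moved = 2 * (n_gpus - 1) / n_gpus * grad_gb  # GB sent per GPU in a ring
    return moved / link_gbps

grads_gb = 28.0  # FP32 gradients of a 7B model
print(f"PCIe (~64 GB/s):    {allreduce_seconds(grads_gb, 8, 64.0):.2f} s per step")
print(f"NVLink (~900 GB/s): {allreduce_seconds(grads_gb, 8, 900.0):.3f} s per step")
```

&lt;p&gt;Over 100 GB of training state for a 7B model makes the 24 GB vs 80 GB gap concrete, and the order-of-magnitude difference in synchronization time per step is exactly the "diminishing returns" you hit when scaling over PCIe.&lt;/p&gt;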




&lt;h2&gt;
  
  
  Accelerators
&lt;/h2&gt;

&lt;p&gt;Modern AI accelerators are designed around &lt;strong&gt;specific philosophies&lt;/strong&gt;, and each type of accelerator handles different types of workloads.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NVIDIA H100&lt;/strong&gt; focuses on cutting-edge training with its &lt;strong&gt;Transformer Engine&lt;/strong&gt; and &lt;strong&gt;FP8 support&lt;/strong&gt;, demanding extreme bandwidth and power efficiency for massive LLMs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google TPU v5p&lt;/strong&gt; uses a &lt;strong&gt;systolic array (MXU)&lt;/strong&gt; for extreme efficiency in large-scale, matrix-heavy workloads, tightly coupled with Google’s distributed infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AMD MI300&lt;/strong&gt; competes by integrating &lt;strong&gt;CPU and GPU components&lt;/strong&gt; into one package, offering a &lt;strong&gt;large unified memory pool&lt;/strong&gt; — attractive for diverse HPC and AI workloads where flexibility and capacity matter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These differences reflect the &lt;strong&gt;architectural trade-offs&lt;/strong&gt; between general-purpose flexibility (GPU) and specialized efficiency (TPU), with AMD carving a hybrid path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Accelerator Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Accelerator&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;Bandwidth&lt;/th&gt;
&lt;th&gt;Key Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NVIDIA H100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;80 GB HBM3&lt;/td&gt;
&lt;td&gt;~3.0 TB/s&lt;/td&gt;
&lt;td&gt;Cutting-edge LLM training, HPC workloads with extreme throughput needs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AMD MI300X&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;192 GB HBM3&lt;/td&gt;
&lt;td&gt;~5.3 TB/s&lt;/td&gt;
&lt;td&gt;Large-scale AI training, HPC, and very large models (fits datasets in memory)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google TPU v5p&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;95 GB HBM2e per chip&lt;/td&gt;
&lt;td&gt;~2.77 TB/s&lt;/td&gt;
&lt;td&gt;Large-scale AI training/inference; matrix-heavy workloads in Google ecosystem&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Practical Tips
&lt;/h2&gt;

&lt;p&gt;To optimize AI workloads, we need to focus on data movement too, not just compute speed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Profile your system:&lt;/strong&gt; The first step is to identify where the bottlenecks are. Is it memory capacity (VRAM), memory bandwidth, interconnect speed between GPUs, or slow data loading from storage?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use mixed precision (AMP):&lt;/strong&gt; Modern GPUs have specialized Tensor Cores that accelerate FP16/BF16 computations. Using Automatic Mixed Precision (AMP) in frameworks like PyTorch significantly speeds up training by using lower precision for matrix math while maintaining accuracy with FP32 for sensitive operations like loss calculation. This also reduces VRAM usage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tune batch size and data loaders:&lt;/strong&gt; VRAM capacity limits your batch size. A larger batch can improve hardware utilization, but a small one may be forced by memory constraints. Ensure your data loading pipeline from storage to GPU is not the bottleneck, as even the fastest GPU is wasted if it's waiting for data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leverage optimized frameworks:&lt;/strong&gt; For multi-GPU training, use libraries like PyTorch's &lt;code&gt;DistributedDataParallel&lt;/code&gt; with NCCL backends, or advanced frameworks like DeepSpeed, to efficiently manage gradient synchronization, which is often a key bottleneck. These tools are crucial for scaling effectively.&lt;/li&gt;
&lt;/ul&gt;
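&lt;p&gt;As a sketch of the batch-size tip: once weights, gradients and optimizer states claim their fixed share of VRAM, the batch size is whatever fits in the remainder. All sizes below are illustrative assumptions for a hypothetical model, not measurements:&lt;/p&gt;

```python
# Rough batch-size budgeting: how many samples fit once the model, gradients,
# and optimizer states have claimed their fixed share of VRAM.
# All sizes are illustrative assumptions for a hypothetical model.

def max_batch_size(vram_gb: float, fixed_gb: float, per_sample_gb: float) -> int:
    """Largest batch whose activations still fit in the remaining VRAM."""
    free = vram_gb - fixed_gb
    return max(0, int(free // per_sample_gb))

vram_gb = 24.0       # e.g. an RTX 4090
fixed_gb = 14.0      # assumed weights + grads + optimizer states
per_sample_gb = 0.6  # assumed activation memory per sample

print(f"Fits batch size: {max_batch_size(vram_gb, fixed_gb, per_sample_gb)}")
# Mixed precision roughly halves activation memory, so the batch can grow:
print(f"With AMP (~half activations): {max_batch_size(vram_gb, fixed_gb, per_sample_gb / 2)}")
```

&lt;p&gt;In practice you would refine these numbers by profiling, but the budgeting logic stays the same: it's the memory, not the TFLOPs, that sets your batch size.&lt;/p&gt;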




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You don't need to be a GPU engineer or a chip designer to understand the nuances of hardware and the interaction between hardware and software. The only requirements are to keep the hardware in mind while designing systems and to &lt;strong&gt;&lt;em&gt;Be Curious About Your Compute&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>hardware</category>
      <category>gpu</category>
      <category>tpu</category>
    </item>
    <item>
      <title>How I Built an Automated Social Media Workflow with LangGraph</title>
      <dc:creator>Piyush Choudhari</dc:creator>
      <pubDate>Sun, 21 Sep 2025 02:25:06 +0000</pubDate>
      <link>https://dev.to/piyush_choudhari_a5b29f7f/how-i-built-an-automated-social-media-workflow-with-langgraph-58jp</link>
      <guid>https://dev.to/piyush_choudhari_a5b29f7f/how-i-built-an-automated-social-media-workflow-with-langgraph-58jp</guid>
      <description>&lt;h2&gt;
  
  
Why automate?
&lt;/h2&gt;

&lt;p&gt;I recently started blogging; more specifically, my interest is in technical blogging. I enjoy going through papers, articles and GitHub repos that pique my interest, which led me to the idea of posting about them. But I realized technical blogging can be &lt;strong&gt;more than just a medium to share my interests; it can be a powerful way to build up my personal brand.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Speaking of creating a personal brand, a lot goes into it, but in my simple viewpoint it &lt;strong&gt;only requires two things:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High value and high impact content&lt;/li&gt;
&lt;li&gt;A great distribution strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Social media is the "great distribution strategy".&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But writing content optimized for social media, specifically LinkedIn and X, was &lt;strong&gt;not an area I wanted to spend a lot of time in&lt;/strong&gt;, and &lt;strong&gt;being an engineer, I was attracted to automating it away.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
How it works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;em&gt;Central Idea&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;At the heart of this project is a &lt;strong&gt;LangGraph workflow&lt;/strong&gt; that automates my content pipeline: right from the idea stage to the draft and publish stage of my social media posts (not my blogs), the workflow takes care of the entire generation, answering questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;What kind of posts do I want?&lt;/em&gt; &lt;/li&gt;
&lt;li&gt;
&lt;em&gt;What type of post do I need depending on the day of the week?&lt;/em&gt; &lt;/li&gt;
&lt;li&gt;&lt;em&gt;Is the content of the post legitimate and factual?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Does the post make any claims with no basis?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Does the post state any statements without proper citations?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
&lt;em&gt;Why LangGraph?&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;After using multiple tools for creating agents, like &lt;strong&gt;LCEL, CrewAI and DSPy&lt;/strong&gt;, I felt LangGraph was extremely easy to follow in terms of the workflow, as it &lt;strong&gt;maps very well to the flowcharts and UMLs&lt;/strong&gt; we encounter on a daily basis. Also, I am familiar with LangGraph, as &lt;strong&gt;I used it extensively at my internship&lt;/strong&gt; to build agentic systems. &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;em&gt;Flowchart&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pb4gkund70zvrm56kj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pb4gkund70zvrm56kj0.png" alt="post automation flowchart" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;em&gt;Here’s a breakdown of the process:&lt;/em&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Capture &amp;amp; Research&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The workflow starts by capturing a raw idea (&lt;code&gt;capture_idea&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;It then pulls in notes and references from Obsidian (&lt;code&gt;obsidian_research&lt;/code&gt;), ensuring the idea is grounded in prior research.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Planning &amp;amp; Teaser Generation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;planner_agent&lt;/code&gt; structures the idea into a roadmap.&lt;/li&gt;
&lt;li&gt;Depending on the phase, the system may generate an early &lt;strong&gt;teaser post&lt;/strong&gt; (&lt;code&gt;teaser_generator&lt;/code&gt;) to share on Monday as a preview.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Drafting the Blog&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If we’re in the drafting phase, the &lt;code&gt;blog_drafter&lt;/code&gt; creates a long-form draft.&lt;/li&gt;
&lt;li&gt;Once the final blog URL is available, the system scrapes the published content (&lt;code&gt;scraper&lt;/code&gt;) and builds a concise summary (&lt;code&gt;summarizer&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Social Media Post Creation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using the blog summary, the workflow generates tailored LinkedIn posts (&lt;code&gt;final_post_generator&lt;/code&gt;) and X/Twitter posts (&lt;code&gt;x_generator&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Validation &amp;amp; Review&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Posts are validated for structure, tone, and platform-fit (&lt;code&gt;validator&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;If issues are found, they are sent for peer review (&lt;code&gt;peer_reviewer&lt;/code&gt;) and, if necessary, refined by the &lt;code&gt;content_improver&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The workflow limits improvements to 3 iterations to avoid endless loops.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Self-Evaluation &amp;amp; Recovery&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before finalizing, posts undergo a self-check (&lt;code&gt;self_evaluator&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;If errors occur (e.g., missing blog URL, broken logic), a &lt;code&gt;recovery_agent&lt;/code&gt; steps in to fix or gracefully exit.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Completion&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Once validated, the system outputs &lt;strong&gt;ready-to-publish posts&lt;/strong&gt; for LinkedIn and X.&lt;/li&gt;
&lt;li&gt;If human oversight is required (e.g., controversial phrasing or ambiguous context), the workflow pauses and flags it for manual review.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
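&lt;p&gt;The iteration cap is the detail worth copying: without it, a strict validator and an eager improver can bounce a post back and forth forever. Here is a minimal, plain-Python sketch of that bounded routing decision; the state shape and helper names are illustrative, not my exact code:&lt;/p&gt;

```python
# Bounded review loop: validate -> improve -> validate, at most MAX_ITERATIONS times.
# This mirrors the validator/content_improver edges in the graph; the state dict
# and node names here are illustrative.

MAX_ITERATIONS = 3

def route_after_validation(state: dict) -> str:
    """Decide the next node after the validator runs."""
    if state["validation_passed"]:
        return "END"
    if state["iterations"] >= MAX_ITERATIONS:
        return "peer_reviewer"  # stop auto-fixing, escalate for review
    return "content_improver"

state = {"validation_passed": False, "iterations": 0}
while (nxt := route_after_validation(state)) == "content_improver":
    state["iterations"] += 1  # content_improver runs, then we re-validate
    state["validation_passed"] = state["iterations"] >= 2  # pretend the 2nd fix passes

print(nxt, state["iterations"])  # the loop terminates instead of spinning forever
```

&lt;p&gt;The same guard expressed as a conditional edge is what keeps the real graph from looping endlessly.&lt;/p&gt;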

&lt;p&gt;&lt;strong&gt;The workflow code snippet:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_workflow&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AutomationState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;capture_idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_idea&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;obsidian_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;process_obsidian_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;planner_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;planner_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;teaser_generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;teaser_generator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blog_drafter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;blog_drafter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scrape_blog_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generate_blog_summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_post_generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generate_linkedin_posts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x_generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generate_x_posts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;validate_posts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;peer_reviewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;peer_review_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_improver&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_improver_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;self_evaluator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self_evaluator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recovery_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recovery_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;capture_idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;capture_idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;obsidian_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;obsidian_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;planner_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;planner_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;should_generate_teaser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;teaser_generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;teaser_generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;planner_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;planner_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;teaser_generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;should_generate_blog_draft&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blog_drafter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blog_drafter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;planner_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;planner_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blog_drafter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;should_scrape_blog&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;END&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_post_generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_post_generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x_generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x_generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;should_validate_or_end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recovery_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recovery_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;END&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;should_improve_or_evaluate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;peer_reviewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;peer_reviewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;self_evaluator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;self_evaluator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recovery_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recovery_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;peer_reviewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;should_improve_or_end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_improver&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_improver&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;self_evaluator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;self_evaluator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recovery_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recovery_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_improver&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;self_evaluator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;should_loop_or_end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recovery_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recovery_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;END&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recovery_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
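
&lt;p&gt;&lt;em&gt;For context on the conditional edges above:&lt;/em&gt; each router passed to &lt;code&gt;add_conditional_edges&lt;/code&gt; is just a plain function that inspects the graph state and returns one of the keys declared in the mapping. As a minimal sketch, &lt;code&gt;should_validate_or_end&lt;/code&gt; could look like the following (the &lt;code&gt;error&lt;/code&gt; and &lt;code&gt;x_post&lt;/code&gt; state keys are assumptions for illustration; the actual routers are in the full source):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def should_validate_or_end(state: dict) -&amp;gt; str:
    # Route from x_generator: recover on failure, validate a finished
    # post, or end the run. The returned string must match a key in the
    # mapping given to add_conditional_edges.
    if state.get("error"):
        return "recovery_agent"
    if state.get("x_post"):
        return "validator"
    return "END"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The same pattern applies to the other routers (&lt;code&gt;should_improve_or_evaluate&lt;/code&gt;, &lt;code&gt;should_loop_or_end&lt;/code&gt;, etc.): state in, mapping key out.&lt;/p&gt;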



&lt;h3&gt;
  
  
  &lt;em&gt;Entire code for my automation:&lt;/em&gt; &lt;a href="https://github.com/capybara-brain346/post-automation.git" rel="noopener noreferrer"&gt;&lt;u&gt;GitHub&lt;/u&gt;&lt;/a&gt;
&lt;/h3&gt;




&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Building out this automation was a lot of fun. It's too early to call it a boon for my productivity, but I will be keeping a &lt;strong&gt;close eye on the gains (or losses) it produces, and I will certainly be posting about them.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>langgraph</category>
      <category>agents</category>
      <category>aiml</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
