<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ma Uttaram</title>
    <description>The latest articles on DEV Community by Ma Uttaram (@ma_uttaram_f822b3b02ec546).</description>
    <link>https://dev.to/ma_uttaram_f822b3b02ec546</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3817033%2Fa22df131-c16d-488d-8322-e273f2357c0e.png</url>
      <title>DEV Community: Ma Uttaram</title>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ma_uttaram_f822b3b02ec546"/>
    <language>en</language>
    <item>
      <title>Life Cycle of Training</title>
      <dc:creator>Ma Uttaram</dc:creator>
      <pubDate>Fri, 08 May 2026 15:43:06 +0000</pubDate>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546/life-cycle-of-training-1cdo</link>
      <guid>https://dev.to/ma_uttaram_f822b3b02ec546/life-cycle-of-training-1cdo</guid>
      <description>&lt;p&gt;Here is the standard "Life Cycle" of a single training step:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Forward Pass (The Guess)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Linear: $z = w \cdot x + b$&lt;/li&gt;
&lt;li&gt;Activation: $a = f(z)$ (ReLU, Sigmoid, etc.)&lt;/li&gt;
&lt;li&gt;This happens layer by layer until you get a final prediction.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Loss Calculation (The Error)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You compare the prediction to the Actual Target.&lt;/li&gt;
&lt;li&gt;Cross-Entropy: Used for classification (e.g., "Is this a cat or a dog?").&lt;/li&gt;
&lt;li&gt;MSE (Mean Squared Error): Used for regression (e.g., "What is the price of this house?").&lt;/li&gt;
&lt;/ul&gt;
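&lt;p&gt;Both losses on toy numbers (a minimal sketch; binary cross-entropy stands in for the classification case, and all values below are made up):&lt;/p&gt;

```python
import math

# MSE for regression, cross-entropy for classification, on toy numbers.
y_true, y_pred = [3.0, 5.0], [2.5, 5.5]
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Binary cross-entropy: true label is "cat", model says P(cat) = 0.9
p_cat = 0.9
bce = -math.log(p_cat)   # low loss when the model is confident and right
```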

&lt;h2&gt;
  
  
  3. Backpropagation (The Blame)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You calculate the Gradients.&lt;/li&gt;
&lt;li&gt;This tells you exactly how much each weight ($w$) and bias ($b$) contributed to the error.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. Optimization (The Fix)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You use Gradient Descent to slightly nudge the weights in the direction that reduces the Loss.&lt;/li&gt;
&lt;li&gt;$w = w - (\text{learning rate} \cdot \text{gradient})$&lt;/li&gt;
&lt;/ul&gt;
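&lt;p&gt;The four steps above can be sketched for a single linear neuron (no activation, one sample; every number below is a made-up toy value):&lt;/p&gt;

```python
# One training step: forward, loss, backward, update.
x, y_true = 2.0, 9.0
w, b, lr = 1.0, 0.0, 0.1

# 1. Forward pass: z = w*x + b
z = w * x + b
# 2. Loss: squared error for this one sample
loss = (z - y_true) ** 2
# 3. Backpropagation: chain rule gives dLoss/dw and dLoss/db
dz = 2 * (z - y_true)
dw = dz * x
db = dz
# 4. Optimization: nudge the weights against the gradient
w = w - lr * dw
b = b - lr * db
```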

&lt;p&gt;To get the output of a neural network, the input is passed through the layers one after another: Layer 1 (weighted sum -&amp;gt; activation function) -&amp;gt; Layer 2 (weighted sum -&amp;gt; activation function) -&amp;gt; final prediction.&lt;/p&gt;

&lt;h2&gt;
  
  
  🏁 Summary
&lt;/h2&gt;

&lt;p&gt;Forward (Guess) --&amp;gt; Loss (Check) --&amp;gt; Backward (Assign Blame) --&amp;gt; Optimizer (Update).&lt;/p&gt;

&lt;p&gt;Topics to explore next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How the Cross-Entropy math actually works.&lt;/li&gt;
&lt;li&gt;The difference between Weights and Biases in the update step.&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Linear Algebra</title>
      <dc:creator>Ma Uttaram</dc:creator>
      <pubDate>Fri, 08 May 2026 11:47:33 +0000</pubDate>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546/linear-algebra-25hl</link>
      <guid>https://dev.to/ma_uttaram_f822b3b02ec546/linear-algebra-25hl</guid>
      <description>&lt;h2&gt;
  
  
  Linear Regression
&lt;/h2&gt;

&lt;p&gt;The Math: $y = Xw + b$&lt;/p&gt;

&lt;h4&gt;
  
  
  The Loop:
&lt;/h4&gt;

&lt;p&gt;Use Gradient Descent: predict $y_{pred}$, then calculate the error $(y_{pred} - y_{true})$.&lt;/p&gt;

&lt;h4&gt;
  
  
  Compute Gradient:
&lt;/h4&gt;

&lt;p&gt;The gradient is $X^T \cdot \text{Error} / n$. Update the weights: $w = w - (\text{learning rate} \times \text{gradient})$.&lt;/p&gt;
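&lt;p&gt;The whole loop (predict, error, gradient, update) fits in a few lines of plain Python; the toy dataset below was generated from $y = 2x + 1$ for illustration:&lt;/p&gt;

```python
# Gradient descent for y = w*x + b on a toy single-feature dataset.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated by y = 2x + 1
w, b, lr, n = 0.0, 0.0, 0.05, len(xs)

for _ in range(2000):
    # Predict, then compute the error for every sample
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Gradient: (X^T . Error) / n for w, mean error for b
    grad_w = sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = sum(errors) / n
    # Update step: w = w - learning_rate * gradient
    w -= lr * grad_w
    b -= lr * grad_b
```

After the loop, $w$ and $b$ land very close to the true 2 and 1.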

&lt;h2&gt;
  
  
  Logistic Regression
&lt;/h2&gt;

&lt;h4&gt;
  
  
  The Difference:
&lt;/h4&gt;

&lt;p&gt;Same as Linear Regression, but pass the result through the Sigmoid Function: $1 / (1 + e^{-z})$.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Goal:
&lt;/h4&gt;

&lt;p&gt;Maps any number to a probability between 0 and 1.&lt;/p&gt;

&lt;h2&gt;
  
  
  K-Means
&lt;/h2&gt;

&lt;h4&gt;
  
  
  The Logic:
&lt;/h4&gt;

&lt;p&gt;Pick $k$ random points as "centroids." Assign every data point to the nearest centroid. Move each centroid to the average of its assigned points. Repeat until the centroids stop moving.&lt;/p&gt;
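&lt;p&gt;A minimal one-dimensional sketch of that loop with $k = 2$ (the points and starting centroids are toy values):&lt;/p&gt;

```python
# One-dimensional k-means sketch (k=2), toy data only.
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centroids = [0.0, 10.0]          # two "random" starting centroids

for _ in range(10):              # 10 rounds is plenty for this tiny dataset
    # Assign every point to the nearest centroid
    clusters = {0: [], 1: []}
    for p in points:
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Move each centroid to the average of its assigned points
    centroids = [sum(c) / len(c) for c in clusters.values()]
```

The centroids settle near 1.0 and 8.0, the two obvious cluster centers.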

</description>
    </item>
    <item>
      <title>Impute+Encoding+Scaling+Hyperparam tuning</title>
      <dc:creator>Ma Uttaram</dc:creator>
      <pubDate>Fri, 08 May 2026 11:41:36 +0000</pubDate>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546/imputeencodingscalinghyperparam-tuning-ifp</link>
      <guid>https://dev.to/ma_uttaram_f822b3b02ec546/imputeencodingscalinghyperparam-tuning-ifp</guid>
      <description>&lt;h2&gt;
  
  
  Data Cleaning &amp;amp; Feature EngineeringImputation:
&lt;/h2&gt;

&lt;p&gt;Don't just delete missing values; fill them with the median or use a KNN &lt;/p&gt;

&lt;h2&gt;
  
  
  imputer.Encoding:
&lt;/h2&gt;

&lt;p&gt;Use One-Hot Encoding for categories with no order (colors) and Label Encoding for order (sizes like S, M, L).&lt;/p&gt;
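&lt;p&gt;A hand-rolled sketch of both encodings on toy data (in a real project you would reach for scikit-learn's encoders instead):&lt;/p&gt;

```python
# Toy categorical columns, chosen for illustration only.
colors = ["red", "green", "blue"]   # no natural order: one-hot encode
sizes = ["S", "M", "L"]             # ordered: label/ordinal encode

# One-Hot: one binary column per category
categories = sorted(set(colors))
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]

# Label encoding: map the known order to integers
order = {"S": 0, "M": 1, "L": 2}
encoded_sizes = [order[s] for s in sizes]
```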

&lt;h4&gt;
  
  
  Scaling:
&lt;/h4&gt;

&lt;p&gt;Always use StandardScaler or MinMaxScaler for SVMs and Linear models; otherwise, large numbers will "bully" small ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hyperparameter Tuning
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Grid Search: tests every possible combination (slow but thorough).&lt;/li&gt;
&lt;li&gt;Random Search: tests random combinations (faster, often just as good).&lt;/li&gt;
&lt;li&gt;Bayesian Optimization: uses math to "guess" which parameters will work best next, based on previous results.&lt;/li&gt;
&lt;/ul&gt;
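&lt;p&gt;A minimal Random Search sketch; the grid and the score function below are hypothetical stand-ins for a real cross-validated model:&lt;/p&gt;

```python
import random

random.seed(0)
grid = {"lr": [0.001, 0.01, 0.1], "depth": [3, 5, 7], "trees": [100, 300, 500]}

def score(params):
    # Stand-in for cross-validated accuracy; a real run would train a model here.
    return -abs(params["lr"] - 0.01) - abs(params["depth"] - 5) * 0.001

# Random Search: try 10 random combinations instead of all 27
candidates = [{k: random.choice(v) for k, v in grid.items()} for _ in range(10)]
best = max(candidates, key=score)
```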

</description>
    </item>
    <item>
      <title>XGBoost+Random Forest+SVM</title>
      <dc:creator>Ma Uttaram</dc:creator>
      <pubDate>Fri, 08 May 2026 11:38:55 +0000</pubDate>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546/xboostrandom-forestsvm-kke</link>
      <guid>https://dev.to/ma_uttaram_f822b3b02ec546/xboostrandom-forestsvm-kke</guid>
      <description>&lt;h2&gt;
  
  
  🌲 Random Forest (The Stable One)
&lt;/h2&gt;

&lt;p&gt;Imagine asking 100 people a "Yes/No" question and taking the majority vote. That is Random Forest.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Concept: It creates a "Forest" of many Decision Trees. Each tree is trained on a random subset of the data and a random subset of the features.&lt;/li&gt;
&lt;li&gt;The "Stability" Factor: Because it averages the results of many trees, one "bad" or "weird" tree can't ruin the final prediction.&lt;/li&gt;
&lt;li&gt;Best For: When you want a model that "just works" without hours of tuning. It is very hard to break and handles messy data (outliers) beautifully.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🚀 Gradient Boosting (The "King")
&lt;/h2&gt;

&lt;p&gt;If Random Forest is a group of people voting simultaneously, Gradient Boosting is a team of students learning from their mistakes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Concept: It builds trees one after the other (sequentially). Tree #1 makes a guess. Tree #2 focuses only on the errors Tree #1 made. Tree #3 focuses on the errors left over by Tree #2.&lt;/li&gt;
&lt;li&gt;The "King" Status: Algorithms like XGBoost or LightGBM are incredibly fast and precise. They win almost every competition for structured data because they can find very complex patterns.&lt;/li&gt;
&lt;li&gt;Catch: They are prone to overfitting if you don't tune the hyperparameters (the "knobs") correctly.&lt;/li&gt;
&lt;/ul&gt;
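&lt;p&gt;The sequential error-fixing idea can be sketched in one dimension, where each "tree" is just a single-leaf stump that predicts the mean residual (toy data; this is the intuition only, not how XGBoost is implemented):&lt;/p&gt;

```python
# Boosting intuition: each "tree" fits whatever error is still left over.
ys = [3.0, 5.0, 8.0, 12.0]
preds = [0.0] * len(ys)
lr = 0.5                      # shrinkage: each tree only partially corrects

for _ in range(20):           # build "trees" one after the other
    residuals = [y - p for y, p in zip(ys, preds)]   # the unexplained error
    tree = sum(residuals) / len(residuals)           # this "tree" fits the residuals
    preds = [p + lr * tree for p in preds]           # add its shrunken correction
```

With only single-leaf stumps the predictions converge to the mean of the targets; real boosting uses deeper trees so each one can correct different samples differently.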




&lt;h2&gt;
  
  
  🛣️ Support Vector Machines (The "Widest Street")
&lt;/h2&gt;

&lt;p&gt;SVM is about finding the cleanest possible boundary between two groups.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Concept: It doesn't just draw a line; it looks for the Maximum Margin. It tries to create the widest possible "neutral zone" (the street) between classes.&lt;/li&gt;
&lt;li&gt;The Kernel Trick: Sometimes, data points are so mixed up in 2D that you can't draw a line between them. SVM uses math to "lift" the data into 3D space. Suddenly, you can slide a flat sheet of paper (a hyperplane) between the groups. When you project it back down to 2D, that flat sheet looks like a perfect circular or curved boundary.&lt;/li&gt;
&lt;li&gt;Best For: Smaller, clean datasets where you need high precision (like medical diagnosis or image recognition).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  💡 Summary Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Main Strength&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Random Forest&lt;/td&gt;
&lt;td&gt;Voting in parallel&lt;/td&gt;
&lt;td&gt;Reliability; hard to mess up.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradient Boosting&lt;/td&gt;
&lt;td&gt;Learning in sequence&lt;/td&gt;
&lt;td&gt;Pure power; highest accuracy.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SVM&lt;/td&gt;
&lt;td&gt;Geometric separation&lt;/td&gt;
&lt;td&gt;High precision in complex spaces.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;These three algorithms represent the "Top Tier" of traditional Machine Learning. Most professional data science projects for tabular data (Excel-style data) use one of these.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1uhprlgucby403tjh4k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1uhprlgucby403tjh4k.png" alt=" " width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Data Science Phase1</title>
      <dc:creator>Ma Uttaram</dc:creator>
      <pubDate>Fri, 01 May 2026 19:09:39 +0000</pubDate>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546/phase1-1bal</link>
      <guid>https://dev.to/ma_uttaram_f822b3b02ec546/phase1-1bal</guid>
      <description>&lt;h2&gt;
  
  
  Phase 1: The Engine (Linear Algebra)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The Goal: Moving and transforming data.&lt;/li&gt;
&lt;li&gt;Key Formula: $y = Wx + b$&lt;/li&gt;
&lt;li&gt;$x$ (Input Vector): Your raw data (e.g., $[7, 3]$ for Sleep and Coffee).

&lt;ul&gt;
&lt;li&gt;$W$ (Weight Matrix): The "importance" values the AI gives each feature.&lt;/li&gt;
&lt;li&gt;$b$ (Bias): A baseline "nudge" (the score if sleep and coffee were zero).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Dot Product: Multiplying matching elements and summing them up. It measures similarity.&lt;/li&gt;

&lt;li&gt;Matrix Multiplication: Running many students through the model at once. Rule: Order matters ($AB \neq BA$).&lt;/li&gt;

&lt;/ul&gt;
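&lt;p&gt;The formula $y = Wx + b$ with plain Python lists; the weight and bias values below are made up for illustration:&lt;/p&gt;

```python
# y = Wx + b for a single [sleep, coffee]-style input vector.
x = [7.0, 3.0]                    # input vector
W = [[0.5, -0.2], [0.1, 0.4]]     # 2x2 weight matrix (made-up "importance" values)
b = [1.0, -1.0]                   # bias vector: the baseline "nudge"

# Dot product of each weight row with x, plus the matching bias entry
y = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j
     for row, b_j in zip(W, b)]
```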

&lt;h2&gt;
  
  
  Phase 2: The Map (Geometry)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The Concept: A matrix is a transformation machine that warps space.&lt;/li&gt;
&lt;li&gt;The Grid: The columns of your matrix are the "New Rulers." They tell you where the original axes land after the transformation.&lt;/li&gt;
&lt;li&gt;The Determinant: The "Squash Factor."&lt;/li&gt;
&lt;li&gt;$\text{Det} = 1$: Area stays the same.

&lt;ul&gt;
&lt;li&gt;$\text{Det} = 0$: The 2D world collapses into a 1D line (information is lost).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
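&lt;p&gt;The "Squash Factor" computed directly for a 2x2 matrix (a toy matrix chosen so the stretching and squashing cancel out):&lt;/p&gt;

```python
# Determinant of a 2x2 matrix: the area scale factor of the transformation.
m = [[2.0, 0.0],
     [0.0, 0.5]]   # stretches x by 2, squashes y by 1/2
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
# det comes out to 1.0: the area of any shape is preserved
```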

&lt;h2&gt;
  
  
  Phase 3: The Skeleton (Eigen-Concepts)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Eigenvector: A special "direction" (profile) that never tilts during a transformation. It only gets longer or shorter.&lt;/li&gt;
&lt;li&gt;Eigenvalue ($\lambda$): The number that tells you how much the eigenvector was stretched.&lt;/li&gt;
&lt;li&gt;AI Insight: The eigenvector with the largest eigenvalue represents the most important trend in your data.&lt;/li&gt;
&lt;li&gt;Characteristic Equation: $\det(A - \lambda I) = 0$. We subtract $\lambda$ diagonally to find the value that "collapses" the matrix.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 4: The Steering Wheel (Calculus)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The Derivative: A sensor that detects if the Error goes up or down when you change a weight.&lt;/li&gt;
&lt;li&gt;The Gradient ($\nabla$): A vector of all derivatives. It’s a compass pointing toward the "Mountain of Error."&lt;/li&gt;
&lt;li&gt;Gradient Descent: The process of walking opposite the gradient to find the "Valley of Minimum Error."&lt;/li&gt;
&lt;li&gt;Formula: $w_{new} = w_{old} - (\text{Learning Rate} \times \text{Gradient})$&lt;/li&gt;
&lt;li&gt;Convergence: When the Gradient is zero, the AI has found the best possible weights.&lt;/li&gt;
&lt;/ul&gt;
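&lt;p&gt;Gradient Descent walking down a one-dimensional "Mountain of Error" (the loss $(w - 3)^2$ is a toy curve chosen for illustration):&lt;/p&gt;

```python
# Gradient descent on loss(w) = (w - 3)^2, whose minimum sits at w = 3.
w, lr = 0.0, 0.1

for _ in range(100):
    gradient = 2 * (w - 3)        # derivative of (w - 3)^2
    w = w - lr * gradient         # step opposite the gradient
# Convergence: the gradient shrinks toward 0 as w approaches 3
```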

&lt;h2&gt;
  
  
  Phase 5: The Gut Check (Probability)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Confidence: Measured by Standard Deviation ($\sigma$).&lt;/li&gt;
&lt;li&gt;Low $\sigma$: The AI is "sure" (tight bell curve).

&lt;ul&gt;
&lt;li&gt;High $\sigma$: The AI is "unsure" (wide bell curve).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Softmax: A formula that turns raw scores (like 10 and 2) into probabilities that add up to 100% (here about 99.97% and 0.03%).&lt;/li&gt;

&lt;li&gt;Bayes' Theorem: How the AI updates its "opinion" (Prior) when it sees new data (Likelihood) to get a new result (Posterior).&lt;/li&gt;

&lt;/ul&gt;
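&lt;p&gt;Softmax on the raw scores 10 and 2 from the notes (a minimal sketch; for this pair the split comes out near 99.97% / 0.03%):&lt;/p&gt;

```python
import math

# Softmax: raw scores become probabilities that sum to 1 (i.e., 100%).
scores = [10.0, 2.0]
shifted = [s - max(scores) for s in scores]   # subtract the max for numerical stability
exps = [math.exp(s) for s in shifted]
probs = [e / sum(exps) for e in exps]
```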




&lt;p&gt;Quick-Reference Math Symbols:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$W$: Weights (The "Importance")&lt;/li&gt;
&lt;li&gt;$\eta$ (Eta): Learning Rate (The "Step Size")&lt;/li&gt;
&lt;li&gt;$\nabla$ (Nabla): The Gradient (The "Direction to fix")&lt;/li&gt;
&lt;li&gt;$\lambda$ (Lambda): Eigenvalue (The "Strength of a trend")&lt;/li&gt;
&lt;li&gt;$\det$: Determinant (The "Scaling/Squashing factor")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You’ve officially covered the "Big Five" of AI math!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>System Design - Part 4</title>
      <dc:creator>Ma Uttaram</dc:creator>
      <pubDate>Thu, 30 Apr 2026 21:33:06 +0000</pubDate>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546/part-4-2bd6</link>
      <guid>https://dev.to/ma_uttaram_f822b3b02ec546/part-4-2bd6</guid>
      <description>&lt;p&gt;Strava&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8l9ltc0b86s3i7t82njt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8l9ltc0b86s3i7t82njt.png" alt=" " width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Online Auction &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faj579rbhl4czodhnbguf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faj579rbhl4czodhnbguf.png" alt=" " width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Facebook Live Video&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgldycqnrsiycsxp9zme.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgldycqnrsiycsxp9zme.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Facebook Post Search&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2uwdkaepe2ybuimw7yrk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2uwdkaepe2ybuimw7yrk.png" alt=" " width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>System Design - Part 3</title>
      <dc:creator>Ma Uttaram</dc:creator>
      <pubDate>Fri, 17 Apr 2026 21:11:04 +0000</pubDate>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546/part-3-3119</link>
      <guid>https://dev.to/ma_uttaram_f822b3b02ec546/part-3-3119</guid>
      <description>&lt;p&gt;Facebook News Feed&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa4cc8cn88qlusa23es7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa4cc8cn88qlusa23es7z.png" alt=" " width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tinder&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5x880yx1657ow5d3o02.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5x880yx1657ow5d3o02.png" alt=" " width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LeetCode&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy3snt7prr8ucw7sxdxt5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy3snt7prr8ucw7sxdxt5.png" alt=" " width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yelp&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fty6ka25k0k0o5bgppgso.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fty6ka25k0k0o5bgppgso.png" alt=" " width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Rate Limiter&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fearmyfj26khkwi3f5lgs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fearmyfj26khkwi3f5lgs.png" alt=" " width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>System Design - Part 2</title>
      <dc:creator>Ma Uttaram</dc:creator>
      <pubDate>Fri, 17 Apr 2026 19:04:45 +0000</pubDate>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546/part-2-2j59</link>
      <guid>https://dev.to/ma_uttaram_f822b3b02ec546/part-2-2j59</guid>
      <description>&lt;p&gt;Uber &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmrtgw3jc5lmff4914og8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmrtgw3jc5lmff4914og8.png" alt=" " width="800" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Robinhood&lt;br&gt;
-- Key: an excessive number of requests from individual IPs is not allowed, so rate-limit by source IP.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmd72hj2is581l7z1m55.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmd72hj2is581l7z1m55.png" alt=" " width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Google Docs&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsj2ztc28n7btgervlw4d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsj2ztc28n7btgervlw4d.png" alt=" " width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Distributed Cache&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5kok2mj9fi3wk2rbuq6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5kok2mj9fi3wk2rbuq6.png" alt=" " width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;YouTube&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4if2qyfuqw750h2gouq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4if2qyfuqw750h2gouq.png" alt=" " width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>System Design - Part 1</title>
      <dc:creator>Ma Uttaram</dc:creator>
      <pubDate>Thu, 16 Apr 2026 19:49:43 +0000</pubDate>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546/parts-1-1cho</link>
      <guid>https://dev.to/ma_uttaram_f822b3b02ec546/parts-1-1cho</guid>
      <description>&lt;p&gt;News Feed Service&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf0w0ex2bbhtjp0pfg4v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf0w0ex2bbhtjp0pfg4v.png" alt=" " width="800" height="518"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Dropbox&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwse0z1h91f229dv3tu9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwse0z1h91f229dv3tu9.png" alt=" " width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tiny URL&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce37ls4pawt4abzap8v0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce37ls4pawt4abzap8v0.png" alt=" " width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Local Delivery Service&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpa0hmisn6npqmc11tgdt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpa0hmisn6npqmc11tgdt.png" alt=" " width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ticket Master&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhnelzwlqn49pwr0i136.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhnelzwlqn49pwr0i136.png" alt=" " width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>DAG vs LangGraph Nodes</title>
      <dc:creator>Ma Uttaram</dc:creator>
      <pubDate>Sun, 29 Mar 2026 00:39:21 +0000</pubDate>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546/dag-vs-langraph-nodes-3en2</link>
      <guid>https://dev.to/ma_uttaram_f822b3b02ec546/dag-vs-langraph-nodes-3en2</guid>
      <description>&lt;p&gt;When we have DAG that represents our tasks and its dependencies do we still need Langraph nodes?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpavz85becb7fws0kmr3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpavz85becb7fws0kmr3y.png" alt=" " width="800" height="795"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LangGraph nodes are most valuable when your flow has cycles — loops, retries, HITL resume, conditional branching back to earlier steps. A pure DAG has none of that, so LangGraph's core value proposition doesn't apply.&lt;/p&gt;

&lt;p&gt;So are they mutually exclusive? No! They can complement each other. The project breakdown below shows why.&lt;/p&gt;

&lt;p&gt;LangGraph nodes for the orchestration layer: intent_parser → entity_extractor → plan_builder → hitl_plan → cot_builder → hitl_confirm. This is the stateful, cyclic part.&lt;/p&gt;

&lt;p&gt;Your own DAG executor for task execution: takes the confirmed TaskGraph, runs a topological sort, fans out independent tasks with asyncio, collects results. Twenty to thirty lines of plain Python. No framework.&lt;/p&gt;
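&lt;p&gt;That executor can be sketched with the standard library's graphlib (the task names below are hypothetical; the marked spot is where an asyncio or thread-pool fan-out would slot in):&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of its prerequisites.
graph = {
    "extract": set(),
    "clean":   {"extract"},
    "train":   {"clean"},
    "report":  {"clean"},      # "train" and "report" are independent of each other
}

def run(name):
    return f"{name} done"      # stand-in for real task execution

ts = TopologicalSorter(graph)
ts.prepare()                   # also raises if the graph has a cycle
results = {}
while ts.is_active():
    ready = list(ts.get_ready())   # every task here has all prerequisites met
    for task in ready:             # fan these out in parallel (asyncio, threads)
        results[task] = run(task)
        ts.done(task)
```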

&lt;p&gt;The &lt;code&gt;Send&lt;/code&gt; API covers parallel dispatch: LangGraph-native fan-out, no manual thread or asyncio management, and results are merged automatically.&lt;/p&gt;

&lt;p&gt;The DAG (networkx) and LangGraph Send API solve different problems and are complementary — not alternatives. See the 'Why both networkx DAG and LangGraph Send API?' section added to the Overview. Short answer: networkx tells us WHAT to run (which steps are ready, via topological ordering and predecessor checks); LangGraph Send API handles HOW to run them concurrently (fan-out to parallel node invocations, automatic state merge on fan-in). You could replace Send with asyncio.gather inside a single node, but you'd lose per-step checkpointing (partial progress survives failures), automatic state merging, and consistency with the rest of the LangGraph pipeline.&lt;/p&gt;

&lt;p&gt;Your mental model was almost right. Here is the full split:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Concern&lt;/th&gt;&lt;th&gt;Tool&lt;/th&gt;&lt;th&gt;Why&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Dependency graph structure&lt;/td&gt;&lt;td&gt;networkx DAG&lt;/td&gt;&lt;td&gt;Planner builds it, CoT validator walks it, executor queries it for readiness&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Parallel task execution&lt;/td&gt;&lt;td&gt;LangGraph Send API&lt;/td&gt;&lt;td&gt;Fan out ready steps as concurrent node invocations; LangGraph auto-merges results&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;HITL + session resume&lt;/td&gt;&lt;td&gt;LangGraph interrupt() + PostgresSaver&lt;/td&gt;&lt;td&gt;Only LangGraph provides checkpointed pause/resume semantics&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The DAG is used across all pipeline stages, not just the executor, as a data structure for graph operations (cycle detection, topological sort, predecessor queries). LangGraph is the execution engine throughout. The executor uses both: the DAG to decide which steps are ready, and the Send API to run them in parallel. A dedicated section in the executor design doc explains this.&lt;/p&gt;

&lt;p&gt;The DAG (networkx) is a data structure for representing and querying step dependencies — it lives in dag.py and is used by three different pipeline stages (planner for cycle detection, CoT validator for topological walk, executor for readiness checks). LangGraph graph nodes (validate_cot, execute_step, etc.) are the execution units in the agent runtime. They use the DAG as a utility library. There's no conflict — the DAG tells the LangGraph nodes what order/parallelism is required; LangGraph handles actually running them, checkpointing state, and managing HITL interrupts.&lt;/p&gt;
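&lt;p&gt;The three graph operations described above can be sketched with networkx; the step names here are invented for illustration:&lt;/p&gt;

```python
# Sketch of the DAG-as-utility idea: one networkx graph serves the
# planner (cycle check), the CoT validator (topological walk), and the
# executor (readiness query). Step names are made up.
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([("plan", "fetch"), ("fetch", "parse"), ("parse", "report")])

# Planner-side check: the task graph must be acyclic.
assert nx.is_directed_acyclic_graph(g)

# CoT validator walk: a valid execution order.
order = list(nx.topological_sort(g))

# Executor readiness query: steps whose predecessors are all complete.
completed = {"plan", "fetch"}
ready = [n for n in g.nodes
         if n not in completed and set(g.predecessors(n)).issubset(completed)]
```

&lt;p&gt;None of this executes anything; the graph only answers structural questions, which is exactly the "utility library" role described above.&lt;/p&gt;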

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftz6dg03tw8y7nwzkjyly.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftz6dg03tw8y7nwzkjyly.png" alt=" " width="800" height="1045"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>LangGraph vs LangChain</title>
      <dc:creator>Ma Uttaram</dc:creator>
      <pubDate>Sat, 28 Mar 2026 22:32:39 +0000</pubDate>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546/langraph-vs-langchain-23g6</link>
      <guid>https://dev.to/ma_uttaram_f822b3b02ec546/langraph-vs-langchain-23g6</guid>
      <description>&lt;p&gt;As I was working on HITL, I checked whether we could start with existing agentic frameworks like LangGraph and LangChain. The comparison below was very interesting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F106ynunkc0z3l1qe04p2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F106ynunkc0z3l1qe04p2.png" alt=" " width="800" height="850"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Evaluation Techniques</title>
      <dc:creator>Ma Uttaram</dc:creator>
      <pubDate>Sat, 28 Mar 2026 20:07:01 +0000</pubDate>
      <link>https://dev.to/ma_uttaram_f822b3b02ec546/evaluation-techniques-50mm</link>
      <guid>https://dev.to/ma_uttaram_f822b3b02ec546/evaluation-techniques-50mm</guid>
      <description>&lt;p&gt;There are eight main evaluation techniques, falling into two broad families: those that compare against a known answer, and those that rely on judgment. Here's each technique explained:&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Exact match&lt;/strong&gt; — the simplest form. You know the correct answer, and you check whether the output equals it exactly. Works well for structured tasks: intent classification ("is this a booking request?"), entity extraction where the expected output is a fixed JSON, or tool selection ("should the agent call the calendar API or the email API here?"). Brittle for open-ended text because two correct answers can be worded differently.&lt;/p&gt;
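&lt;p&gt;A toy exact-match check over intent labels (the labels are invented for illustration):&lt;/p&gt;

```python
# Exact-match eval sketch: compare predicted intent labels to gold
# labels and compute accuracy. Labels are hypothetical examples.
gold = ["booking", "cancel", "booking", "faq"]
pred = ["booking", "cancel", "faq", "faq"]

matches = sum(1 for g, p in zip(gold, pred) if g == p)
accuracy = matches / len(gold)   # 3 of 4 predictions match
```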

&lt;p&gt;&lt;strong&gt;Schema / constraint validation&lt;/strong&gt; — instead of checking exact values, you check the shape of the output. Does entity extraction return a valid &lt;code&gt;Task&lt;/code&gt; schema with all required fields? Did the plan builder produce a properly ordered list? This is what Pydantic and Zod do, and it's directly relevant to BuddingBuilder's FR #7. It catches malformed outputs even when the content is hard to verify.&lt;/p&gt;
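&lt;p&gt;A hand-rolled sketch of what this validation does; in practice Pydantic or Zod generates these checks from the model definition, and the field names here are assumptions:&lt;/p&gt;

```python
# Schema validation sketch: check that an extracted task object has the
# required fields with the right types. Pydantic automates exactly this.
REQUIRED = {"title": str, "priority": int}

def validate_task(obj):
    errors = []
    for field, typ in REQUIRED.items():
        if field not in obj:
            errors.append(f"missing field: {field}")
        elif not isinstance(obj[field], typ):
            errors.append(f"bad type for {field}")
    return errors

ok = validate_task({"title": "Book flight", "priority": 1})       # no errors
bad = validate_task({"title": "Book flight", "priority": "high"}) # type error
```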

&lt;p&gt;&lt;strong&gt;Code execution / unit test&lt;/strong&gt; — the gold standard for any agent that produces code or structured plans. You run the output and check whether tests pass. For BuddingBuilder this applies to any task whose result is deterministically verifiable — a calculation, a formatted document, a database query result.&lt;/p&gt;
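&lt;p&gt;A minimal sketch of this idea: run the agent's generated snippet in an isolated namespace and check it with a unit-style assertion. The snippet is a stand-in for real agent output:&lt;/p&gt;

```python
# Code-execution eval sketch: execute generated code, then verify its
# behavior deterministically. The generated function is hypothetical.
generated = "def add(a, b):\n    return a + b\n"

namespace = {}
exec(generated, namespace)                 # run the agent's output
passed = namespace["add"](2, 3) == 5       # unit-style check
```

&lt;p&gt;In a real pipeline you would sandbox the execution (subprocess, container, timeouts) rather than calling &lt;code&gt;exec&lt;/code&gt; directly.&lt;/p&gt;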

&lt;p&gt;&lt;strong&gt;Reference-based LLM judge&lt;/strong&gt; — you have a golden answer, and you ask a judge model to compare the agent's output against it and score the match. Returns a score and a reason. More flexible than exact match because it can handle paraphrasing, but requires you to maintain a library of golden examples, which takes effort to build and keep current.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rubric-based LLM judge&lt;/strong&gt; — no golden answer needed. You give the judge a scoring rubric ("rate this response 1–5 on correctness, task completion, and safety") and it evaluates the output on its own. This is the most practical technique for staging, because you can write rubrics faster than you can curate golden answers. The key is writing rubrics that are specific enough that the judge can't wriggle around them.&lt;/p&gt;
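&lt;p&gt;A sketch of building such a rubric prompt; the dimensions and scale below are assumptions for illustration, and no judge model is called:&lt;/p&gt;

```python
# Rubric-based judge prompt sketch: no golden answer needed, just a
# scoring rubric. Criteria and wording here are hypothetical.
RUBRIC = [
    ("correctness", "Is the response factually and logically correct?"),
    ("task_completion", "Does it fully accomplish what was asked?"),
    ("safety", "Does it avoid harmful or out-of-scope actions?"),
]

def rubric_prompt(response):
    lines = [f"- {name}: {question} Score 1-5." for name, question in RUBRIC]
    criteria = "\n".join(lines)
    return (
        "Rate the following agent response on each criterion.\n"
        f"{criteria}\n\nResponse:\n{response}\n"
        "Return one line per criterion as 'name: score'."
    )

prompt = rubric_prompt("Booked the 9am flight and emailed the receipt.")
```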

&lt;p&gt;&lt;strong&gt;Pairwise preference&lt;/strong&gt; — the judge sees two outputs side by side and picks the better one. You're not asking "is this good?" but "which is better — the old prompt or the new one?" This is the right technique for promotion gates: before moving from dev to staging, run pairwise eval between the new version and the current prod version. If the new version wins consistently, promote. This is also how RLHF preference data is collected.&lt;/p&gt;
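&lt;p&gt;A sketch of a pairwise-preference judge prompt (template only, no API call; the exact wording is an assumption):&lt;/p&gt;

```python
# Pairwise preference sketch: the judge sees two outputs side by side
# and is forced to pick one. Task and responses are placeholders.
def pairwise_prompt(task, output_a, output_b):
    return (
        f"Task: {task}\n\n"
        f"Response A:\n{output_a}\n\n"
        f"Response B:\n{output_b}\n\n"
        "Which response better completes the task? "
        "Answer with exactly 'A' or 'B' and one sentence of justification."
    )

prompt = pairwise_prompt("Summarize the meeting notes",
                         "Short summary...", "Detailed summary...")
```

&lt;p&gt;Randomizing which version appears as A vs B across runs helps guard against position bias in the judge.&lt;/p&gt;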

&lt;p&gt;&lt;strong&gt;Human eval&lt;/strong&gt; — a human reads and rates the output. Highest signal, but too slow and expensive to run on everything. Its real job is to calibrate your automated judges — you periodically sample flagged traces, have a human rate them, and check whether your judge model's scores agree. If they don't, your rubric needs refining.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Online monitoring&lt;/strong&gt; — the only technique running continuously in prod. The guard model scores inputs before the agent acts; the output validator scores responses after. Neither produces a detailed critique — they produce a fast pass/fail signal with enough metadata to route flagged interactions to the human review queue. This is what closes BuddingBuilder's prod → dev feedback loop.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://claude.ai/public/artifacts/6e95af46-0c33-4cce-a6dd-eeb9679a9a28" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;claude.ai&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
