<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ibrohim syarif</title>
    <description>The latest articles on DEV Community by ibrohim syarif (@ibrohhm).</description>
    <link>https://dev.to/ibrohhm</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F306251%2F1f2b50c6-3c55-4e8c-848a-8d24245ef573.jpeg</url>
      <title>DEV Community: ibrohim syarif</title>
      <link>https://dev.to/ibrohhm</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ibrohhm"/>
    <language>en</language>
    <item>
      <title>Building an Autonomous Agent Team That Replicates My Engineering Workflow</title>
      <dc:creator>ibrohim syarif</dc:creator>
      <pubDate>Mon, 15 Jun 2026 17:22:23 +0000</pubDate>
      <link>https://dev.to/ibrohhm/building-an-autonomous-agent-team-that-replicates-my-engineering-workflow-2ne3</link>
      <guid>https://dev.to/ibrohhm/building-an-autonomous-agent-team-that-replicates-my-engineering-workflow-2ne3</guid>
      <description>&lt;p&gt;I've been working closely with agentic AI, and after a lot of iteration, I built a small agent team that can replicate the way I actually work — from reading a task to pushing a reviewable branch.&lt;/p&gt;

&lt;p&gt;In this post, I'll walk through the four specialized agents and the one skill that orchestrates them end to end.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecubifgxxxscoqjn4jdu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecubifgxxxscoqjn4jdu.png" alt="agent team workflow" width="799" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mental Model
&lt;/h2&gt;

&lt;p&gt;When I pick up a task, my workflow looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjaj2zkztvvu8uy9y7bc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjaj2zkztvvu8uy9y7bc.png" alt="simple flow" width="798" height="112"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The agent team mirrors this exactly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/ship &amp;lt;task or Jira key&amp;gt;
      └─ clarifier     — is the task specific enough?
      └─ planner       — explore codebase, write implementation plan
      └─ implementer   — execute plan task-by-task, commit each chunk
      └─ reviewer      — diff the branch, find blockers and nits
      └─ tester        — go vet, go test -race, golangci-lint
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;The key insight: each agent has one job and a fixed output contract. There is no free-form chat: agents emit structured tokens (PLAN_WRITTEN, REVIEW_RESULT, TEST_RESULT) that the orchestrator parses to route the next step. This is cheaper and faster, and it leaves less room for the model to hallucinate.&lt;/p&gt;
&lt;/blockquote&gt;
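
&lt;p&gt;To make the routing idea concrete, here is a minimal Go sketch of how an orchestrator could pick a token out of an agent's output and choose the next step. The function names (ParseToken, NextStep) and the step names it returns are assumptions made for illustration; in the setup described here the orchestration lives in a skill prompt, not in Go code.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// ParseToken scans agent output line by line and returns the first
// "KEY: value" pair whose key is one of the known tokens.
func ParseToken(output string, known []string) (string, string, bool) {
	for _, line := range strings.Split(output, "\n") {
		key, value, found := strings.Cut(strings.TrimSpace(line), ":")
		if !found {
			continue
		}
		for _, want := range known {
			if key == want {
				return key, strings.TrimSpace(value), true
			}
		}
	}
	return "", "", false
}

// NextStep maps a parsed token to the next pipeline step.
// The step names are hypothetical labels, not part of the original setup.
func NextStep(key, value string) string {
	switch key {
	case "PLAN_WRITTEN":
		return "implementer"
	case "AMBIGUOUS":
		return "ask-user"
	case "REVIEW_RESULT":
		switch value {
		case "PASS":
			return "tester"
		case "BLOCKED":
			return "implementer-fix"
		}
	case "TEST_RESULT":
		if value == "PASS" {
			return "done"
		}
		return "report-failure"
	}
	// Anything unrecognized stops the pipeline instead of guessing.
	return "stop"
}

func main() {
	known := []string{"PLAN_WRITTEN", "AMBIGUOUS", "REVIEW_RESULT", "TEST_RESULT"}
	output := "REVIEW_RESULT: BLOCKED\nBLOCKERS:\n- path/to/file.go:15 unchecked error"
	key, value, ok := ParseToken(output, known)
	fmt.Println(key, value, ok, NextStep(key, value))
}
```

&lt;p&gt;The routing is a plain string comparison, so an unexpected reply falls through to "stop" rather than being interpreted.&lt;/p&gt;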

&lt;h2&gt;
  
  
  Planner
&lt;/h2&gt;

&lt;p&gt;Before writing any code, we explore the codebase: find what already exists, check the dependencies, spot the blockers. The planner agent does the same.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhz6andyp5k7dcyg1mjy2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhz6andyp5k7dcyg1mjy2.png" alt="Planner Agent Flow" width="648" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Planner reads a task description or Jira key, explores the codebase, then outputs a detailed implementation plan: file paths to create or modify, checkbox steps, and exact code changes. A detailed plan removes guesswork for the subagents that run after it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;planner&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Planner&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;receives&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;codebase&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;directory,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;explores&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;relevant&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;files,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;writes&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;detailed&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;implementation&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;plan&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;writing-plans&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;format."&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Read, Glob, Grep, Bash&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;opus&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Planner&lt;/span&gt;

Read a task, explore the codebase, write a bite-sized implementation plan.

&lt;span class="gu"&gt;## Input&lt;/span&gt;

TASK: &lt;span class="nt"&gt;&amp;lt;task&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt; &lt;span class="na"&gt;or&lt;/span&gt; &lt;span class="na"&gt;Jira&lt;/span&gt; &lt;span class="na"&gt;ticket&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
PLAN_PATH: &lt;span class="nt"&gt;&amp;lt;absolute&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt; &lt;span class="na"&gt;save&lt;/span&gt; &lt;span class="na"&gt;the&lt;/span&gt; &lt;span class="na"&gt;plan&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
WORKDIR: &lt;span class="nt"&gt;&amp;lt;repo&lt;/span&gt; &lt;span class="na"&gt;root&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
JIRA_KEY: &lt;span class="nt"&gt;&amp;lt;optional&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;e.g.&lt;/span&gt; &lt;span class="na"&gt;TASK-1234&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gu"&gt;## Process&lt;/span&gt;

Use absolute paths throughout. Grep key terms, read relevant files, find reuse candidates

&lt;span class="gu"&gt;## Plan Format&lt;/span&gt;

&lt;span class="gh"&gt;# &amp;lt;Feature Name&amp;gt; Implementation Plan&lt;/span&gt;

&lt;span class="gs"&gt;**Goal:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;one&lt;/span&gt; &lt;span class="na"&gt;sentence&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="gs"&gt;**Architecture:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;&lt;/span&gt;&lt;span class="err"&gt;2&lt;/span&gt;&lt;span class="na"&gt;-3&lt;/span&gt; &lt;span class="na"&gt;sentences&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="gs"&gt;**Tech Stack:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;key&lt;/span&gt; &lt;span class="na"&gt;technologies&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;
---
&lt;/span&gt;
Followed by numbered tasks. Each task must have:
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`**Files:**`&lt;/span&gt; — exact paths to create/modify/test
&lt;span class="p"&gt;-&lt;/span&gt; Checkbox steps (&lt;span class="sb"&gt;`- [ ]`&lt;/span&gt;)
&lt;span class="p"&gt;-&lt;/span&gt; Real code in every code step (no placeholders)
&lt;span class="p"&gt;-&lt;/span&gt; Exact shell commands with expected output
&lt;span class="p"&gt;-&lt;/span&gt; TDD order: write failing test → run → implement → run again → commit

Rules:
&lt;span class="p"&gt;-&lt;/span&gt; No TBD, no TODO, no "similar to above"
&lt;span class="p"&gt;-&lt;/span&gt; Stage specific files: &lt;span class="sb"&gt;`git add &amp;lt;file&amp;gt;`&lt;/span&gt; (never &lt;span class="sb"&gt;`git add .`&lt;/span&gt;)
&lt;span class="p"&gt;-&lt;/span&gt; Commit format: &lt;span class="sb"&gt;`&amp;lt;type&amp;gt;(&amp;lt;scope&amp;gt;): &amp;lt;subject&amp;gt;`&lt;/span&gt;

&lt;span class="gu"&gt;## Output&lt;/span&gt;
PLAN_WRITTEN: &lt;span class="nt"&gt;&amp;lt;PLAN_PATH&amp;gt;&lt;/span&gt;

If task is too vague:
AMBIGUOUS: &lt;span class="nt"&gt;&amp;lt;single&lt;/span&gt; &lt;span class="na"&gt;question&lt;/span&gt; &lt;span class="na"&gt;that&lt;/span&gt; &lt;span class="na"&gt;unblocks&lt;/span&gt; &lt;span class="na"&gt;planning&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Implementer
&lt;/h2&gt;

&lt;p&gt;The Implementer agent reads the plan, creates a new branch, executes every task in order, and commits each chunk before moving to the next. It has two modes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Normal mode&lt;/strong&gt; — follows the plan step by step. Stops immediately on test failure or build error. Never guesses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blocker-fix mode&lt;/strong&gt; — activated when REVIEW_BLOCKERS is passed. Ignores the original plan. Fixes only the listed issues, re-runs tests, commits with fix(review): resolve review blockers.&lt;/p&gt;

&lt;p&gt;This dual mode is what makes the review-retry loop work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9vrhtnurbo69yniyuim.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9vrhtnurbo69yniyuim.png" alt="implementer agent workflow" width="702" height="435"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;implementer&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Implementer agent reads an implementation plan and executes it task-by-task, committing each chunk to the current branch.&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Read, Write, Edit, Bash, Glob, Grep&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sonnet&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Implementer&lt;/span&gt;

Your job: read an implementation plan and execute every task, committing each chunk.

&lt;span class="gu"&gt;## Input&lt;/span&gt;

You receive a message in this format:

PLAN_PATH: &lt;span class="nt"&gt;&amp;lt;absolute&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt; &lt;span class="na"&gt;the&lt;/span&gt; &lt;span class="na"&gt;plan&lt;/span&gt; &lt;span class="na"&gt;markdown&lt;/span&gt; &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
BRANCH: &lt;span class="nt"&gt;&amp;lt;current&lt;/span&gt; &lt;span class="na"&gt;branch&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
REVIEW_BLOCKERS: (optional)

&lt;span class="gu"&gt;## Process&lt;/span&gt;

&lt;span class="gs"&gt;**If `REVIEW_BLOCKERS` is present in the input:**&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Ignore the plan at PLAN_PATH entirely
&lt;span class="p"&gt;2.&lt;/span&gt; Fix only the issues listed under REVIEW_BLOCKERS
&lt;span class="p"&gt;3.&lt;/span&gt; Run tests after fixing: &lt;span class="sb"&gt;`go vet ./... &amp;amp;&amp;amp; go test -race -short -count=1 ./...`&lt;/span&gt;
&lt;span class="p"&gt;4.&lt;/span&gt; If tests fail: stop immediately and report
&lt;span class="p"&gt;5.&lt;/span&gt; Stage and commit only the fixed files:
&lt;span class="p"&gt;   -&lt;/span&gt; Commit message: &lt;span class="sb"&gt;`fix(review): resolve review blockers`&lt;/span&gt;

&lt;span class="gs"&gt;**If `REVIEW_BLOCKERS` is absent (normal mode):**&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Read the plan at PLAN_PATH
&lt;span class="p"&gt;2.&lt;/span&gt; Execute tasks in order. For each task:
&lt;span class="p"&gt;   -&lt;/span&gt; Follow the checkbox steps exactly
&lt;span class="p"&gt;   -&lt;/span&gt; Run tests after each implementation step
&lt;span class="p"&gt;   -&lt;/span&gt; If a test fails: stop immediately
&lt;span class="p"&gt;   -&lt;/span&gt; If a build error occurs: stop immediately
&lt;span class="p"&gt;   -&lt;/span&gt; Stage and commit specific files after completing the task
&lt;span class="p"&gt;3.&lt;/span&gt; Count commits made

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Commit format: &lt;span class="sb"&gt;`&amp;lt;type&amp;gt;(&amp;lt;scope&amp;gt;): &amp;lt;subject&amp;gt;`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; If a step says "run test to verify it fails" and it passes — stop and report the discrepancy
&lt;span class="p"&gt;-&lt;/span&gt; If blocked or confused — stop and report, do not guess

&lt;span class="gu"&gt;## Output&lt;/span&gt;

On success:
DONE: &lt;span class="nt"&gt;&amp;lt;N&amp;gt;&lt;/span&gt; commits on &lt;span class="nt"&gt;&amp;lt;BRANCH&amp;gt;&lt;/span&gt;

On failure:
FAIL: Task &lt;span class="nt"&gt;&amp;lt;N&amp;gt;&lt;/span&gt; "&lt;span class="nt"&gt;&amp;lt;task&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;" — &lt;span class="nt"&gt;&amp;lt;what&lt;/span&gt; &lt;span class="na"&gt;went&lt;/span&gt; &lt;span class="na"&gt;wrong&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Reviewer
&lt;/h2&gt;

&lt;p&gt;The Reviewer agent compares the branch against the default base branch and classifies every finding:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blocker&lt;/strong&gt; — correctness bugs, security issues, data loss risk, nil dereference, breaking API contract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nit&lt;/strong&gt; — naming inconsistency, redundant code, observability gaps, pattern deviation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Signal bar&lt;/strong&gt; - findings below ~80% confidence are dropped, which cuts noise from the review.&lt;/p&gt;

&lt;p&gt;All blocker findings are sent back to the Implementer agent to fix.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcyyi05nk3motb507bf72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcyyi05nk3motb507bf72.png" alt="reviewer agent flow" width="506" height="534"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;reviewer&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Reviewer agent diffs a branch against the default base branch and emits structured Blocker/Nit findings. Blockers stop the pipeline.&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Read, Bash, Glob, Grep&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sonnet&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Reviewer&lt;/span&gt;

Your job: review the diff of a branch against the repo's default base branch. Emit findings. Blockers stop the ship pipeline.

&lt;span class="gu"&gt;## Input&lt;/span&gt;

You receive a message in this format:
BRANCH: &lt;span class="nt"&gt;&amp;lt;branch&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt; &lt;span class="na"&gt;review&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gu"&gt;## Process&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Detect base branch:
   BASE=$(git remote show origin 2&amp;gt;/dev/null | grep 'HEAD branch' | awk '{print $NF}')
   BASE=${BASE:-main}
&lt;span class="p"&gt;2.&lt;/span&gt; Get the diff:
   git diff ${BASE}...HEAD
&lt;span class="p"&gt;3.&lt;/span&gt; List changed files:
   git diff --name-only ${BASE}...HEAD
&lt;span class="p"&gt;4.&lt;/span&gt; For each changed file, read it in full if needed for context
&lt;span class="p"&gt;5.&lt;/span&gt; For each changed file, read surrounding code and direct callers for context — one level up only, at most 3 additional files total. Do not recurse further.
&lt;span class="p"&gt;6.&lt;/span&gt; Identify findings:
&lt;span class="p"&gt;   -&lt;/span&gt; &lt;span class="gs"&gt;**Blocker**&lt;/span&gt;: correctness bug, security issue (SQL injection, secrets in code, auth bypass), data loss risk, nil/null dereference, off-by-one in critical path, missing error check on I/O, missing timeout/deadline on I/O call, missing idempotency key on mutation/payment op, inconsistent state risk (e.g. DB write succeeds but queue emit can fail with no rollback), breaking API contract (removed/renamed exported symbol, changed Kafka schema, removed HTTP route)
&lt;span class="p"&gt;   -&lt;/span&gt; &lt;span class="gs"&gt;**Nit**&lt;/span&gt;: naming inconsistency, redundant code, minor style deviation, missing doc comment on exported symbol, observability gap on critical path (missing metric, log correlation ID, or tracing span), pattern deviation (similar integrations in the codebase all have X — this one doesn't)
&lt;span class="p"&gt;   -&lt;/span&gt; &lt;span class="gs"&gt;**Signal bar**&lt;/span&gt;: only flag when confident. Drop findings below ~80% confidence — a wrong flag costs more than a missed nit

&lt;span class="gu"&gt;## Output format&lt;/span&gt;

Return exactly this structure when no blockers:
REVIEW_RESULT: PASS
BLOCKERS: none
NITS:
&lt;span class="p"&gt;-&lt;/span&gt; path/to/file.go:42 — unused variable &lt;span class="sb"&gt;`err`&lt;/span&gt; shadowed by inner scope

Or when blockers exist:

REVIEW_RESULT: BLOCKED
BLOCKERS:
&lt;span class="p"&gt;-&lt;/span&gt; path/to/file.go:15 — error from &lt;span class="sb"&gt;`rows.Scan`&lt;/span&gt; not checked, data silently ignored
NITS:
&lt;span class="p"&gt;-&lt;/span&gt; path/to/file.go:99 — naming: &lt;span class="sb"&gt;`getUser`&lt;/span&gt; should be &lt;span class="sb"&gt;`GetUser`&lt;/span&gt; (exported)

Rules:
&lt;span class="p"&gt;-&lt;/span&gt; Only flag real issues. Do not flag style preferences as blockers.
&lt;span class="p"&gt;-&lt;/span&gt; If diff is empty, return &lt;span class="sb"&gt;`REVIEW_RESULT: PASS`&lt;/span&gt; with &lt;span class="sb"&gt;`BLOCKERS: none`&lt;/span&gt; and &lt;span class="sb"&gt;`NITS: none`&lt;/span&gt;.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tester
&lt;/h2&gt;

&lt;p&gt;The last agent is the Tester. It makes a final check that the changes do not break the code by running the full test suite. Since I work mostly in Go, this tester only handles Go repositories.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vgcch069dab1w01njov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vgcch069dab1w01njov.png" alt="Tester agent flow" width="457" height="547"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tester&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Tester agent runs go vet, go test -race -short, and golangci-lint (if .golangci.yml present). Returns PASS or FAIL with compact summary.&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bash, Read&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;haiku&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Tester&lt;/span&gt;

Your job: run the test suite and report a one-line verdict.

&lt;span class="gu"&gt;## Input&lt;/span&gt;

You receive a message in this format:
WORKDIR: &lt;span class="nt"&gt;&amp;lt;absolute&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt; &lt;span class="na"&gt;repo&lt;/span&gt; &lt;span class="na"&gt;root&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gu"&gt;## Process&lt;/span&gt;

First, detect repo type:
find &lt;span class="nt"&gt;&amp;lt;WORKDIR&amp;gt;&lt;/span&gt; -name "&lt;span class="err"&gt;*&lt;/span&gt;.go" | head -1

If no &lt;span class="sb"&gt;`.go`&lt;/span&gt; files found → return &lt;span class="sb"&gt;`TEST_RESULT: PASS`&lt;/span&gt; with note &lt;span class="sb"&gt;`No Go files found — skipping Go checks.`&lt;/span&gt; and stop.

If &lt;span class="sb"&gt;`.go`&lt;/span&gt; files exist, run these commands in order, stopping on first failure:
&lt;span class="p"&gt;
1.&lt;/span&gt; Go vet:
   go vet ./...
   (run from WORKDIR)
&lt;span class="p"&gt;
2.&lt;/span&gt; Go test:
   go test -race -short -count=1 -timeout 120s ./...
   (run from WORKDIR)
   Note: &lt;span class="sb"&gt;`-short`&lt;/span&gt; skips tests marked with &lt;span class="sb"&gt;`testing.Short()`&lt;/span&gt; — integration tests using that flag will not run.
&lt;span class="p"&gt;
3.&lt;/span&gt; Lint (only if &lt;span class="sb"&gt;`.golangci.yml`&lt;/span&gt; exists in WORKDIR):
   golangci-lint run
   (run from WORKDIR)

&lt;span class="gu"&gt;## Output&lt;/span&gt;

On full pass:
TEST_RESULT: PASS
All checks passed.

On failure:
TEST_RESULT: FAIL
&lt;span class="nt"&gt;&amp;lt;step&lt;/span&gt; &lt;span class="na"&gt;that&lt;/span&gt; &lt;span class="na"&gt;failed&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;: &lt;span class="nt"&gt;&amp;lt;error&lt;/span&gt; &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

Error output rules:
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`go vet`&lt;/span&gt;: include all output (usually short)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`go test`&lt;/span&gt;: include all lines containing &lt;span class="sb"&gt;`FAIL`&lt;/span&gt;, &lt;span class="sb"&gt;`panic`&lt;/span&gt;, or &lt;span class="sb"&gt;`Error`&lt;/span&gt;, plus the last 40 lines of output
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`golangci-lint`&lt;/span&gt;: include the first 30 lines of lint errors

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Ship Skill
&lt;/h2&gt;

&lt;p&gt;These agents do not run on their own; a skill is still needed to orchestrate them into a workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/ship &amp;lt;task or Jira key&amp;gt;

Pipeline:

0. Clarifier → CLEAR or ask user one question
1. Create branch
2. Planner → PLAN_WRITTEN
3. Implementer (initial)
4. Review-retry loop (max 2 attempts)
   └─ BLOCKED → implementer fixes blockers → reviewer retries
5. Tester
6. Success summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
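
&lt;p&gt;Step 4 of the pipeline can be sketched in Go as a bounded loop. This is only an illustration under stated assumptions: ReviewFunc, FixFunc, and ReviewLoop are hypothetical names standing in for the subagent calls, and I read "max 2 attempts" as at most two fix-and-re-review rounds after the first review.&lt;/p&gt;

```go
package main

import "fmt"

// ReviewFunc stands in for a reviewer agent call (hypothetical signature).
// It returns the REVIEW_RESULT value and the list of blockers.
type ReviewFunc func(branch string) (result string, blockers []string)

// FixFunc stands in for the implementer running in blocker-fix mode.
type FixFunc func(branch string, blockers []string) error

// maxRetries caps the fix-and-re-review rounds (assumed meaning of "max 2 attempts").
const maxRetries = 2

// ReviewLoop runs the reviewer, and while the result is BLOCKED it asks the
// implementer to fix the blockers and reviews again, up to maxRetries times.
// It reports whether the review passed and how many retries were used.
func ReviewLoop(branch string, review ReviewFunc, fix FixFunc) (bool, int, error) {
	retries := 0
	result, blockers := review(branch)
	for result == "BLOCKED" {
		if retries == maxRetries {
			// Cap reached: stop and hand the remaining blockers to a human.
			return false, retries, nil
		}
		if err := fix(branch, blockers); err != nil {
			return false, retries, err
		}
		retries++
		result, blockers = review(branch)
	}
	return result == "PASS", retries, nil
}

func main() {
	calls := 0
	// Fake reviewer: blocks on the first call, passes afterwards.
	review := func(branch string) (string, []string) {
		calls++
		if calls == 1 {
			return "BLOCKED", []string{"path/to/file.go:15 unchecked error"}
		}
		return "PASS", nil
	}
	fix := func(branch string, blockers []string) error { return nil }

	passed, retries, err := ReviewLoop("feature/demo", review, fix)
	fmt.Println("passed:", passed, "retries:", retries, "err:", err)
}
```

&lt;p&gt;Whatever the reviewer returns, the loop ends after a fixed number of rounds, which is what keeps the cost bounded.&lt;/p&gt;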



&lt;h2&gt;
  
  
  Design Decisions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Agents emit structured tokens (PLAN_WRITTEN:, REVIEW_RESULT:, TEST_RESULT:), not prose. Prose forces the orchestrator to run a second LLM call just to extract intent: added latency, added cost, and a new failure surface for hallucinated routing. Structured tokens let the orchestrator branch with a simple string match: deterministic, no extra inference, no misrouting&lt;/li&gt;
&lt;li&gt;80% confidence threshold — the most critical quality lever. False positives teach engineers to ignore the reviewer; high-noise output gets skipped, not fixed&lt;/li&gt;
&lt;li&gt;Different agents have different cost/capability tradeoffs. Planner needs deep reasoning (Opus). Reviewer needs precision (Sonnet). Tester just runs commands (Haiku). Wrong model assignment burns budget or misses findings&lt;/li&gt;
&lt;li&gt;Review-retry capped at 2. Uncapped loops are a denial-of-wallet attack on API credits&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;If you've ever caught yourself doing the same "explore → plan → implement → review → test" loop for the tenth time, you don't have to. The loop is automatable. You just have to write it down.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>agents</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>The Dangers of High-Cardinality Labels in Prometheus</title>
      <dc:creator>ibrohim syarif</dc:creator>
      <pubDate>Sun, 22 Feb 2026 04:51:57 +0000</pubDate>
      <link>https://dev.to/ibrohhm/the-dangers-of-high-cardinality-labels-in-prometheus-poi</link>
      <guid>https://dev.to/ibrohhm/the-dangers-of-high-cardinality-labels-in-prometheus-poi</guid>
      <description>&lt;p&gt;We're all familiar with the warnings: "&lt;em&gt;Don't use user_id as a Prometheus label&lt;/em&gt;" or "&lt;em&gt;Don't use transaction codes as labels — they can crash Prometheus&lt;/em&gt;". But do we really understand why these are so dangerous?&lt;/p&gt;

&lt;p&gt;To answer that, we first need to know how Prometheus works.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Prometheus Works
&lt;/h2&gt;

&lt;p&gt;Prometheus is an open-source systems monitoring and alerting tool that collects and stores its metrics as time-series data. It periodically scrapes metrics from your services based on the configured interval.&lt;/p&gt;

&lt;p&gt;This is an example of the config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scrape_configs:
  - job_name: 'golang-app'
    static_configs:
      - targets: ['localhost:8080']
    scrape_interval: 5s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This config tells Prometheus to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Target&lt;/strong&gt;: send an HTTP request, &lt;code&gt;GET http://localhost:8080/metrics&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Periodically&lt;/strong&gt;: every 5 seconds&lt;br&gt;
&lt;strong&gt;Label&lt;/strong&gt;: attach &lt;code&gt;job="golang-app"&lt;/code&gt; to every scraped series&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe42h41ov5e38ed2hpzvj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe42h41ov5e38ed2hpzvj.png" alt="how_prometheus_works" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prometheus has four core metric types: counter, gauge, histogram, and summary. This post looks at three of them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gauges&lt;/strong&gt; represent current measurements and reflect the current state of a system, such as CPU usage and memory usage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Counters&lt;/strong&gt; are cumulative values that only ever increase (or reset to zero when the process restarts). Common examples are the number of HTTP requests received, CPU seconds spent, and bytes sent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Histograms&lt;/strong&gt; track the distribution of observed values. For a base metric name &lt;code&gt;&amp;lt;basename&amp;gt;&lt;/code&gt;, a histogram exposes multiple related time series:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;basename&amp;gt;_bucket{le="..."}&lt;/code&gt;: cumulative counters, one per bucket, each counting the observations less than or equal to that bucket's upper bound &lt;code&gt;le&lt;/code&gt;&lt;br&gt;
&lt;code&gt;&amp;lt;basename&amp;gt;_sum&lt;/code&gt;: the total sum of all observed values&lt;br&gt;
&lt;code&gt;&amp;lt;basename&amp;gt;_count&lt;/code&gt;: the count of events that have been observed&lt;/p&gt;
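
&lt;p&gt;The cumulative bucket semantics are easier to see in code. The following Go sketch is a toy model of what a histogram exposes, not the Prometheus client library; the Histogram type, the bucket bounds, and the metric name request_duration_seconds are all assumptions chosen for illustration.&lt;/p&gt;

```go
package main

import "fmt"

// Histogram is a toy model of the series a Prometheus histogram exposes.
type Histogram struct {
	Bounds  []float64 // bucket upper bounds (the le label), ascending
	Buckets []uint64  // Buckets[i] counts observations at or below Bounds[i]
	Sum     float64   // corresponds to the _sum series
	Count   uint64    // corresponds to _count and to the le="+Inf" bucket
}

// NewHistogram creates a histogram with the given upper bounds.
func NewHistogram(bounds ...float64) Histogram {
	return Histogram{Bounds: bounds, Buckets: make([]uint64, len(bounds))}
}

// Observe records one value. Buckets are cumulative, so every bucket
// whose upper bound is at or above the value is incremented.
func (h *Histogram) Observe(v float64) {
	for i, bound := range h.Bounds {
		if bound >= v {
			h.Buckets[i]++
		}
	}
	h.Sum += v
	h.Count++
}

func main() {
	h := NewHistogram(0.1, 0.5, 1)
	for _, v := range []float64{0.05, 0.3, 0.7, 2} {
		h.Observe(v)
	}
	// Print the series in exposition-like form (hypothetical metric name).
	for i, bound := range h.Bounds {
		fmt.Printf("request_duration_seconds_bucket{le=\"%g\"} %d\n", bound, h.Buckets[i])
	}
	fmt.Printf("request_duration_seconds_bucket{le=\"+Inf\"} %d\n", h.Count)
	fmt.Printf("request_duration_seconds_sum %g\n", h.Sum)
	fmt.Printf("request_duration_seconds_count %d\n", h.Count)
}
```

&lt;p&gt;Note that a single histogram with three buckets already produces six series (three buckets, the +Inf bucket, the sum, and the count) for every label combination.&lt;/p&gt;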
&lt;h2&gt;
  
  
  Time Series Database (TSDB)
&lt;/h2&gt;

&lt;p&gt;Prometheus collects and stores metrics as time series. Each time series is uniquely identified by a metric name and a set of labels, while each sample within the series contains a timestamp and a value. For a request counter labeled with method, path, and status, each unique combination of label values is a separate time series whose value increases as more requests are processed. The total number of series can therefore reach &lt;code&gt;method x path x status&lt;/code&gt;, that is, the number of distinct values of each label multiplied together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdn5fmkmn7auzhkgka8rn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdn5fmkmn7auzhkgka8rn.png" alt="methodxpathxstatus" width="800" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the Counter metrics example, suppose we have two endpoints, &lt;code&gt;GET: /api/data&lt;/code&gt; and &lt;code&gt;GET: /api/users&lt;/code&gt;, each of which can return either a &lt;code&gt;200&lt;/code&gt; or a &lt;code&gt;500&lt;/code&gt; status code. This results in the following metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http_requests_total{method="GET", path="/api/data",  status="200"} 17
http_requests_total{method="GET", path="/api/data",  status="500"} 0
http_requests_total{method="GET", path="/api/users", status="200"} 10
http_requests_total{method="GET", path="/api/users", status="500"} 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because each time series represents a &lt;em&gt;unique combination of labels&lt;/em&gt;, these four label combinations produce four distinct time series. In the time-series database (TSDB), each of these time series is stored independently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// time series 1
2026-02-19 09:00:00 | {__name__="http_requests_total", method="GET", path="/api/data", status="200"} | 15
2026-02-19 09:00:05 | {__name__="http_requests_total", method="GET", path="/api/data", status="200"} | 16
2026-02-19 09:00:10 | {__name__="http_requests_total", method="GET", path="/api/data", status="200"} | 17

// time series 2
2026-02-19 09:00:00 | {__name__="http_requests_total", method="GET", path="/api/data", status="500"} | 0
2026-02-19 09:00:05 | {__name__="http_requests_total", method="GET", path="/api/data", status="500"} | 0
2026-02-19 09:00:10 | {__name__="http_requests_total", method="GET", path="/api/data", status="500"} | 0

// time series 3
2026-02-19 09:00:00 | {__name__="http_requests_total", method="GET", path="/api/users", status="200"} | 8
2026-02-19 09:00:05 | {__name__="http_requests_total", method="GET", path="/api/users", status="200"} | 9
2026-02-19 09:00:10 | {__name__="http_requests_total", method="GET", path="/api/users", status="200"} | 10

// time series 4
2026-02-19 09:00:00 | {__name__="http_requests_total", method="GET", path="/api/users", status="500"} | 1
2026-02-19 09:00:05 | {__name__="http_requests_total", method="GET", path="/api/users", status="500"} | 2
2026-02-19 09:00:10 | {__name__="http_requests_total", method="GET", path="/api/users", status="500"} | 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
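&lt;p&gt;As a rough mental model (not how the Prometheus TSDB is actually implemented), we can think of the storage as a map from a label set to its list of samples. The sketch below assumes a toy in-memory &lt;code&gt;store&lt;/code&gt; type and a &lt;code&gt;seriesKey&lt;/code&gt; helper, both made up for this example.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is one (timestamp, value) point in a series.
type sample struct {
	ts    string
	value float64
}

// seriesKey builds a canonical identity from the metric name and its labels.
// Labels are sorted so the same label set always maps to the same key.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

// store is a toy stand-in for a TSDB: one slice of samples per series.
type store map[string][]sample

// add appends a sample to the series identified by name and labels.
func (s store) add(name string, labels map[string]string, ts string, v float64) {
	key := seriesKey(name, labels)
	s[key] = append(s[key], sample{ts: ts, value: v})
}

func main() {
	db := store{}
	ok := map[string]string{"method": "GET", "path": "/api/data", "status": "200"}
	fail := map[string]string{"method": "GET", "path": "/api/data", "status": "500"}

	db.add("http_requests_total", ok, "09:00:00", 15)
	db.add("http_requests_total", ok, "09:00:05", 16)
	db.add("http_requests_total", fail, "09:00:00", 0)

	// Three samples were written, but only two label sets exist.
	fmt.Println("series:", len(db))
}
```

&lt;p&gt;New samples for a known label set extend an existing series, while a label set that has never been seen before creates a brand new one.&lt;/p&gt;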



&lt;h2&gt;
  
  
  The Dangers
&lt;/h2&gt;

&lt;p&gt;Let's go back to the warning: "&lt;em&gt;Don't use user_id as a Prometheus label&lt;/em&gt;" or "&lt;em&gt;Don't use transaction codes as labels — they can crash Prometheus.&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;Imagine you want to record transaction latency using metric labels such as:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;status&lt;/code&gt;: &lt;code&gt;pending&lt;/code&gt;, &lt;code&gt;paid&lt;/code&gt;, &lt;code&gt;success&lt;/code&gt;, &lt;code&gt;failed&lt;/code&gt;&lt;br&gt;
&lt;code&gt;payment_type&lt;/code&gt;: &lt;code&gt;wallet&lt;/code&gt;, &lt;code&gt;cash&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here, &lt;code&gt;status&lt;/code&gt; has 4 possible values and &lt;code&gt;payment_type&lt;/code&gt; has 2, which produces &lt;code&gt;status (4) x payment_type (2) = 8 time series&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0s1s5xzzvem70wkw12qj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0s1s5xzzvem70wkw12qj.png" alt="statusxpayment_type" width="771" height="203"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is an example of the resulting metrics:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F100krkandp6x0r5wsj3s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F100krkandp6x0r5wsj3s.png" alt="total processing time rate" width="800" height="218"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are exactly 8 label combinations for the metric.&lt;/p&gt;

&lt;p&gt;Then, you adjust the metric by adding a &lt;code&gt;code&lt;/code&gt; label so that request rates, error rates, and traffic patterns can be broken down per transaction.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;code&lt;/code&gt;: a unique identifier for each transaction&lt;/p&gt;

&lt;p&gt;However, &lt;code&gt;code&lt;/code&gt; is unique for every transaction and grows continuously with request volume. As a result, the number of possible values for &lt;code&gt;code&lt;/code&gt; is &lt;strong&gt;unbounded&lt;/strong&gt; and &lt;strong&gt;increases over time&lt;/strong&gt;. &lt;code&gt;status (4) × payment_type (2) × code (∞) = ∞ time series&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzg78isnri0arom4r2vxs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzg78isnri0arom4r2vxs.png" alt="statusxpayment_typexcode" width="800" height="144"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is an example of the resulting metrics:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbxpfskghz8w5v40ofxxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbxpfskghz8w5v40ofxxr.png" alt="total processing time rate with code" width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This single unbounded label is enough to turn an otherwise manageable metric into a high-cardinality time-series explosion that can exhaust memory and degrade query performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsyxl4vuasgyc9gya0vfj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsyxl4vuasgyc9gya0vfj.png" alt="nuke" width="559" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Adding labels whose values grow unbounded over time (such as UUIDs, timestamps, user IDs, or transaction codes) is strongly discouraged. These labels rarely add meaningful value at the metrics level and introduce high cardinality. For high-cardinality data, use logs instead of metrics.&lt;/p&gt;

&lt;p&gt;High-cardinality labels can lead to serious issues, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Huge memory usage&lt;/strong&gt; — each unique label set creates a new time series&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rapid disk growth&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Slow queries&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scrape performance issues&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's better to use labels that carry semantic meaning, and it's strongly recommended to keep the number of labels to a minimum.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A wise man says, "Never use a label whose value grows with users, requests, or time."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Good labels describe what something is, not who or which exact instance.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;code: &lt;a href="https://github.com/ibrohhm/prometheus-grafana-golang" rel="noopener noreferrer"&gt;https://github.com/ibrohhm/prometheus-grafana-golang&lt;/a&gt;&lt;/p&gt;

</description>
      <category>prometheus</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Circuit Breaker Pattern</title>
      <dc:creator>ibrohim syarif</dc:creator>
      <pubDate>Wed, 04 Dec 2024 12:00:00 +0000</pubDate>
      <link>https://dev.to/ibrohhm/circuit-breaker-pattern-1775</link>
      <guid>https://dev.to/ibrohhm/circuit-breaker-pattern-1775</guid>
      <description>&lt;p&gt;Integrating with partners often brings unexpected behavior when an issue on their server affects the performance of our service. Let's say the integration flow looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ludyh9alen5a1jd7f7b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ludyh9alen5a1jd7f7b.png" alt="simple partner integration" width="800" height="323"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the partner responds successfully, our service forwards the response data to the client. Otherwise, if the partner returns an error, our service relays the error message to the client. Just like our own server, the partner's server can go through maintenance or hit an unexpected issue that makes it inaccessible. When it fails to respond, every request to it waits unnecessarily before failing with a not-responding error, and under heavy traffic this can very well cause our server to crash. So what should we do to prevent that?&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;The issue in this article isn't the persistent errors from the partner but rather the additional response time caused by these errors, which could lead to our server crashing (see this article &lt;a href="https://dev.to/ibrohhm/crash-and-timeout-simulation-jbp"&gt;https://dev.to/ibrohhm/crash-and-timeout-simulation-jbp&lt;/a&gt;). To solve this, we need to add another layer that manages the partner connection. It acts as a circuit breaker: if the connection goes bad, it breaks the connection and returns immediately without waiting for the partner's response.&lt;/p&gt;

&lt;p&gt;The circuit breaker pattern has three states:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;closed&lt;/strong&gt; means the service is allowed to make connections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;half-open&lt;/strong&gt; means the service is allowed to make a limited number of connections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;open&lt;/strong&gt; means the service is not allowed to make connections, so it returns an error immediately&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the detailed circuit breaker flow:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubm3qled4qjt4ujuycio.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubm3qled4qjt4ujuycio.png" alt="circuit breaker flow" width="499" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A circuit breaker allows us to control the partner connection effectively. By implementing a circuit breaker in our integration flow, we no longer have to worry about unexpected partner failures: it cuts the connection automatically and protects our service from potential crashes caused by unnecessary waiting times.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Crash and Timeout Simulation</title>
      <dc:creator>ibrohim syarif</dc:creator>
      <pubDate>Sun, 21 Jul 2024 08:35:21 +0000</pubDate>
      <link>https://dev.to/ibrohhm/crash-and-timeout-simulation-jbp</link>
      <guid>https://dev.to/ibrohhm/crash-and-timeout-simulation-jbp</guid>
      <description>&lt;p&gt;Imagine you have an app that needs to call a partner to serve your data. The partner sometimes behaves unexpectedly in ways we cannot control, let's say with a random delay every time you send it a request. It's a tiny detail, but if we don't handle the partner request well, it can bring our server down. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Mh96w9l6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://media1.giphy.com/media/GGyEfuIWI43zq/200.webp%3Fcid%3D790b76117uyc65rzr9vkylyz4c2io08uj7brbqj1a98hs4xb%26ep%3Dv1_gifs_search%26rid%3D200.webp%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Mh96w9l6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://media1.giphy.com/media/GGyEfuIWI43zq/200.webp%3Fcid%3D790b76117uyc65rzr9vkylyz4c2io08uj7brbqj1a98hs4xb%26ep%3Dv1_gifs_search%26rid%3D200.webp%26ct%3Dg" alt="crash" width="300" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article focuses on simulating how your server handles this partner behavior.&lt;/p&gt;

&lt;p&gt;To simulate this we will create three services (client, server, partner) using Go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;client --&amp;gt; server --&amp;gt; partner
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;client: calls the server using goroutines&lt;/li&gt;
&lt;li&gt;server: forwards the request from the client to the partner, acting as middleware&lt;/li&gt;
&lt;li&gt;partner: a simple hello-world Go service with a random delay&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Partner
&lt;/h3&gt;

&lt;p&gt;The partner service is a simple HTTP service with a random delay.&lt;/p&gt;

&lt;p&gt;The partner service has only a &lt;code&gt;GET /data&lt;/code&gt; endpoint that responds with &lt;em&gt;Hello from Partner Service&lt;/em&gt; after a random delay generated on every request (0-10 seconds). The partner also logs the &lt;em&gt;delay_set&lt;/em&gt; and the &lt;em&gt;time&lt;/em&gt; of each request, so we can monitor the requests easily.&lt;/p&gt;

&lt;p&gt;See the implementation: &lt;a href="https://github.com/ibrohhm/crash_and_timeout_simulation/blob/master/partner/partner.go" rel="noopener noreferrer"&gt;partner service&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How to run: &lt;code&gt;go run partner.go&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Server
&lt;/h3&gt;

&lt;p&gt;The server service is your internal service that handles requests from the client. To simulate the crash, we set a memory allocation limit (&lt;code&gt;MemoryLimit&lt;/code&gt;) so we can crash the server without crashing your laptop. While running, the server checks its memory usage every 1 second (&lt;code&gt;getMemoryUsage&lt;/code&gt;), and if the usage exceeds &lt;code&gt;MemoryLimit&lt;/code&gt; we stop the server. The service also logs the &lt;em&gt;method&lt;/em&gt;, &lt;em&gt;url&lt;/em&gt;, &lt;em&gt;latency&lt;/em&gt;, &lt;em&gt;status&lt;/em&gt;, &lt;em&gt;error&lt;/em&gt;, and &lt;em&gt;memory_usage&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;See the implementation: &lt;a href="https://github.com/ibrohhm/crash_and_timeout_simulation/blob/master/server/server.go" rel="noopener noreferrer"&gt;server service&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How to run: &lt;code&gt;go run server.go&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Client
&lt;/h3&gt;

&lt;p&gt;The client service is a simple Go app that sends 100 requests in 1 second to the server using goroutines. I chose to write a client service instead of using a load-testing application like &lt;code&gt;JMeter&lt;/code&gt; so we can see the log for every request.&lt;/p&gt;

&lt;p&gt;See the implementation: &lt;a href="https://github.com/ibrohhm/crash_and_timeout_simulation/blob/master/client/client.go" rel="noopener noreferrer"&gt;client service&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How to run: &lt;code&gt;go run client.go&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Simulation
&lt;/h2&gt;

&lt;p&gt;In this section we will run three simulations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;partner with no delay&lt;/li&gt;
&lt;li&gt;partner with random delay but no timeout set in the server&lt;/li&gt;
&lt;li&gt;partner with random delay with timeout set in the server&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_2Mubwzf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://media0.giphy.com/media/eMB8ru08jqn8wbjmgM/giphy.webp%3Fcid%3Decf05e47rfcnxhben1oyox4g5b3fyazmb682a2skuos5blyh%26ep%3Dv1_gifs_search%26rid%3Dgiphy.webp%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_2Mubwzf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://media0.giphy.com/media/eMB8ru08jqn8wbjmgM/giphy.webp%3Fcid%3Decf05e47rfcnxhben1oyox4g5b3fyazmb682a2skuos5blyh%26ep%3Dv1_gifs_search%26rid%3Dgiphy.webp%26ct%3Dg" alt="simulation" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Case 1
&lt;/h3&gt;

&lt;p&gt;Every scenario has a happy case, and this is it: our partner service has good specs and never delays our requests. To make this possible, change the delay in the partner code from &lt;code&gt;delay := time.Duration(rand.Intn(11))&lt;/code&gt; to &lt;code&gt;delay := time.Duration(0)&lt;/code&gt; (&lt;a href="https://github.com/ibrohhm/crash_and_timeout_simulation/blob/master/partner/partner.go#L14" rel="noopener noreferrer"&gt;ref&lt;/a&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;client --&amp;gt; server --&amp;gt; partner
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpqtzb4rn8si4hfkapro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpqtzb4rn8si4hfkapro.png" alt="case 1" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The partner, the server, and the client are all good. Everyone is happy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3aVMesuI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://media3.giphy.com/media/xSM46ernAUN3y/giphy.webp%3Fcid%3D790b7611j8zbdbqkmkc8ijz491n7ea1h060b1s3ukkaw3niw%26ep%3Dv1_gifs_search%26rid%3Dgiphy.webp%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3aVMesuI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://media3.giphy.com/media/xSM46ernAUN3y/giphy.webp%3Fcid%3D790b7611j8zbdbqkmkc8ijz491n7ea1h060b1s3ukkaw3niw%26ep%3Dv1_gifs_search%26rid%3Dgiphy.webp%26ct%3Dg" alt="happy" width="245" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Case 2
&lt;/h3&gt;

&lt;p&gt;Our partner service has a random delay (&lt;code&gt;delay := time.Duration(rand.Intn(11))&lt;/code&gt;) and our server service does not set a timeout when it calls the partner service&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;client --&amp;gt; server --&amp;gt; partner (random delay)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sdbpsnmsby0z7insh9y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sdbpsnmsby0z7insh9y.png" alt="case 2" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our server got killed within 24 seconds because its memory usage exceeded the MemoryLimit. The client service continuously spawns new requests to the server service using goroutines, and since there is no limit on the number of goroutines being spawned, the server service sends a huge number of requests to the partner service. Because of the partner delay, most of these requests are in flight at the same time and consume all the available memory, which leads to a crash.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pMVDgnIp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://media4.giphy.com/media/v1.Y2lkPTc5MGI3NjExZmVveWE4dGEzZXI3NzRvbmpneWo2Y3dmOWFjd2V5eXA0YTVoOWF0byZlcD12MV9naWZzX3NlYXJjaCZjdD1n/9M5jK4GXmD5o1irGrF/giphy.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pMVDgnIp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://media4.giphy.com/media/v1.Y2lkPTc5MGI3NjExZmVveWE4dGEzZXI3NzRvbmpneWo2Y3dmOWFjd2V5eXA0YTVoOWF0byZlcD12MV9naWZzX3NlYXJjaCZjdD1n/9M5jK4GXmD5o1irGrF/giphy.webp" alt="this is fine" width="436" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What happens if we set a request timeout on the server service?&lt;/p&gt;

&lt;h3&gt;
  
  
  Case 3
&lt;/h3&gt;

&lt;p&gt;Our partner has a random delay, but our server sets a request timeout. We need to change the &lt;code&gt;Timeout&lt;/code&gt; variable in the server to some number, let's say 3 seconds: &lt;code&gt;const Timeout = 3&lt;/code&gt; (&lt;a href="https://github.com/ibrohhm/crash_and_timeout_simulation/blob/master/server/server.go#L16" rel="noopener noreferrer"&gt;ref&lt;/a&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;client --&amp;gt; server --[with timeout]--&amp;gt; partner (random delay)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the result:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsrcmdmq5qohk7h9yxmvw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsrcmdmq5qohk7h9yxmvw.png" alt="case 3" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you run the simulation, you'll see that our server service is no longer killed for exceeding its memory allocation. If you look more closely at the logs, the memory_usage of the server stays around 6MB - 12MB (never exceeding 20MB), because the timeout cancels the ongoing request and releases the memory it was holding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo1as5q0w0go5zccmjzco.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo1as5q0w0go5zccmjzco.png" alt="time out logger" width="800" height="142"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jbx9RfOv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://media3.giphy.com/media/qwXFFwQATRG4o/giphy.webp%3Fcid%3D790b7611rf3qii0dabti82nahvuohf6iwi9yrsk2nlcpvd4n%26ep%3Dv1_gifs_search%26rid%3Dgiphy.webp%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jbx9RfOv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://media3.giphy.com/media/qwXFFwQATRG4o/giphy.webp%3Fcid%3D790b7611rf3qii0dabti82nahvuohf6iwi9yrsk2nlcpvd4n%26ep%3Dv1_gifs_search%26rid%3Dgiphy.webp%26ct%3Dg" alt="better" width="220" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The partner service's behavior is external and outside our control, so we cannot trust the partner to always behave well. Sometimes it is delayed, and sometimes we cannot access it because of an internal error on their side or something else. Even a small delay can bring our server down (as in the simulation), so we had better prevent that from happening, and one way to do it is to add a timeout when the server calls the partner.&lt;/p&gt;

&lt;p&gt;source code and simulation videos: &lt;a href="https://github.com/ibrohhm/crash_and_timeout_simulation" rel="noopener noreferrer"&gt;https://github.com/ibrohhm/crash_and_timeout_simulation&lt;/a&gt;&lt;/p&gt;

</description>
      <category>simulation</category>
      <category>timeout</category>
      <category>crash</category>
      <category>go</category>
    </item>
    <item>
      <title>Know Better About N+1 Queries Problem</title>
      <dc:creator>ibrohim syarif</dc:creator>
      <pubDate>Wed, 29 Nov 2023 16:41:42 +0000</pubDate>
      <link>https://dev.to/ibrohhm/know-better-about-n1-queries-problem-gpc</link>
      <guid>https://dev.to/ibrohhm/know-better-about-n1-queries-problem-gpc</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;In engineering work we often need to query all the data that belongs to a parent record so it can be used elsewhere. For example, let's say there is a &lt;code&gt;users&lt;/code&gt; table with a one-to-many relationship to the &lt;code&gt;transactions&lt;/code&gt; table, and you need to get all the users and their transactions for the &lt;code&gt;user_ids&lt;/code&gt; passed as an argument. The simplest logic is to get all the users whose id is included in &lt;code&gt;user_ids&lt;/code&gt;, then get the transactions of each user one by one&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def load_data(user_ids)
    result = []
    users = User.where(id: user_ids)
    users.each do |user|
        result &amp;lt;&amp;lt; { user: user, transactions: user.transactions }
    end

    result
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's really simple logic, but is it good enough? Can our service endure high throughput with it? Is there any way to make it more efficient?&lt;/p&gt;

&lt;h2&gt;
  
  
  Look Inside the Query
&lt;/h2&gt;

&lt;p&gt;Let's say we have these models&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class User &amp;lt; ApplicationRecord
  has_many :transactions
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Transaction &amp;lt; ApplicationRecord
  belongs_to :user
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We're going to simulate the query in the Rails console (run: &lt;code&gt;rails console&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3s669swxvh01ezrqb7l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3s669swxvh01ezrqb7l.png" alt="load_data method"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the image shows, the &lt;code&gt;load_data&lt;/code&gt; method issues 4 queries to the database. The first one fetches all the users matching the user_ids, and the other 3 fetch the transactions of each user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT "users".* FROM "users" WHERE "users"."id" IN (?, ?, ?)  [["id", 1], ["id", 2], ["id", 3]]
SELECT "transactions".* FROM "transactions" WHERE "transactions"."user_id" = ? /* loading for inspect */ LIMIT ?  [["user_id", 1], ["LIMIT", 11]]
SELECT "transactions".* FROM "transactions" WHERE "transactions"."user_id" = ? /* loading for inspect */ LIMIT ?  [["user_id", 2], ["LIMIT", 11]]
SELECT "transactions".* FROM "transactions" WHERE "transactions"."user_id" = ? /* loading for inspect */ LIMIT ?  [["user_id", 3], ["LIMIT", 11]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What happens if user_ids is very large? Will we run as many transaction queries as we have users? Now we are facing the N+1 queries problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  N+1 query problem?
&lt;/h2&gt;

&lt;p&gt;This is a common problem in database querying: the query is executed one by one for every instance instead of in 1 or 2 queries. In the example above we fetch the data of all three users, then query the transactions of each user, which adds up to 4 queries (1+3). If there are N users, it first fetches all N users and then queries the transactions of each user, hence the name N+1 queries.&lt;/p&gt;

&lt;p&gt;The problem with N+1 queries is that each query takes some amount of time: the more data we fetch, the more time we need, and we may run into timeout issues. N+1 queries are bad for performance, so we need to find a solution.&lt;/p&gt;

&lt;p&gt;*We can ignore the N+1 query problem if the data is small or the throughput is low.&lt;/p&gt;

&lt;h2&gt;
  
  
  Eager Load
&lt;/h2&gt;

&lt;p&gt;In Rails (Active Record), we have an eager loading mechanism that loads the data and its associations in a fixed number of queries instead of one query per record. One of the methods that triggers eager loading is &lt;code&gt;.includes&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def load_data_with_eager_load(user_ids)
    result = []
    users = User.includes(:transactions).where(id: user_ids)
    users.each do |user|
        result &amp;lt;&amp;lt; { user: user, transactions: user.transactions }
    end

    result
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The method above gives this result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcxfvyxruf9z5yfzw5s3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcxfvyxruf9z5yfzw5s3.png" alt="load_data_with_eager_load method"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we look closely, the &lt;code&gt;load_data_with_eager_load&lt;/code&gt; method only triggers two queries. The first query gets all the users, and the second gets all the transactions for those users&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT "users".* FROM "users" WHERE "users"."id" IN (?, ?, ?)  [["id", 1], ["id", 2], ["id", 3]]
SELECT "transactions".* FROM "transactions" WHERE "transactions"."user_id" IN (?, ?, ?)  [["user_id", 1], ["user_id", 2], ["user_id", 3]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reduces the number of database queries significantly: eager loading costs only 2 queries no matter how much data we have.&lt;/p&gt;

</description>
      <category>query</category>
    </item>
  </channel>
</rss>
