<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joud Awad</title>
    <description>The latest articles on DEV Community by Joud Awad (@thejoud1997).</description>
    <link>https://dev.to/thejoud1997</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1238326%2F5d65a5d6-611d-4526-9bc2-d2d8643d5226.png</url>
      <title>DEV Community: Joud Awad</title>
      <link>https://dev.to/thejoud1997</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thejoud1997"/>
    <language>en</language>
    <item>
      <title>56/60 Days System Design Questions</title>
      <dc:creator>Joud Awad</dc:creator>
      <pubDate>Wed, 01 Jul 2026 16:29:39 +0000</pubDate>
      <link>https://dev.to/thejoud1997/5660-days-system-design-questions-17hi</link>
      <guid>https://dev.to/thejoud1997/5660-days-system-design-questions-17hi</guid>
      <description>&lt;p&gt;Your background job ran for 4 minutes and nobody knows if it finished.&lt;/p&gt;

&lt;p&gt;That's not a job queue problem. That's a missing design problem.&lt;/p&gt;

&lt;p&gt;Long-running jobs break every assumption you built for synchronous APIs. Your load balancer times out after 30s. Your mobile client doesn't know whether to retry. Your retry logic re-runs a job that was already half complete.&lt;/p&gt;

&lt;p&gt;Here's the real scenario:&lt;/p&gt;

&lt;p&gt;You're processing a video upload. The job takes 2–8 minutes. Millions of users.&lt;/p&gt;

&lt;p&gt;What do you expose to the client?&lt;/p&gt;

&lt;p&gt;A) Polling endpoint — client hits /jobs/:id/status every 5s until done&lt;br&gt;
B) Webhook — job fires a POST to client's callback URL on completion&lt;br&gt;
C) SSE / WebSocket — server pushes progress updates in real time&lt;br&gt;
D) Synchronous wait — keep the HTTP connection open until the job finishes&lt;/p&gt;

&lt;p&gt;One scales to millions without coupling your infrastructure to client uptime.&lt;/p&gt;

&lt;p&gt;The others have hard production failure modes most teams don't discover until 3 AM.&lt;/p&gt;

&lt;p&gt;The deeper problem isn't transport — it's these 4 things nobody gets right the first time:&lt;/p&gt;

&lt;p&gt;→ Idempotency. Every job must be safe to re-run. If your retry logic can double-charge, double-send, or double-process, you don't have retries; you have bugs waiting to happen.&lt;/p&gt;

&lt;p&gt;→ Progress granularity. "0% → 100%" is useless for a 6-minute job. You need intermediate states: queued, processing, transcoding, uploading, complete. Clients need something to show users.&lt;/p&gt;

&lt;p&gt;→ Timeout vs failure. A job that stops responding isn't the same as a job that failed. Dead workers, OOM kills, spot instance evictions — your queue needs a heartbeat or a visibility timeout, not just a try/catch.&lt;/p&gt;

&lt;p&gt;→ Deduplication. The client will retry. Your queue will redeliver. You need a dedup key scoped to the original request — not the job run.&lt;/p&gt;
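
&lt;p&gt;As a rough illustration of the idempotency and deduplication points, here is a minimal Python sketch. It assumes an in-memory &lt;code&gt;JobStore&lt;/code&gt; standing in for a durable job table, and it derives the dedup key by hashing the user id and request body. Both are hypothetical choices for this example; a real API might instead accept an explicit idempotency key from the client.&lt;/p&gt;

&lt;pre&gt;
```python
import hashlib
import json


class JobStore:
    """In-memory stand-in for a durable job table (hypothetical)."""

    def __init__(self):
        self.jobs = {}        # dedup key -> job record
        self.enqueued = []    # job ids handed to the worker queue

    @staticmethod
    def dedup_key(user_id, request_body):
        # Derived from the original request, not from a job run, so a
        # client retry or a queue redelivery maps to the same job.
        canonical = json.dumps(request_body, sort_keys=True)
        digest = hashlib.sha256((user_id + ":" + canonical).encode("utf-8"))
        return digest.hexdigest()

    def submit(self, user_id, request_body):
        key = self.dedup_key(user_id, request_body)
        job = self.jobs.get(key)
        if job is None:
            job = {"id": key[:16], "state": "queued"}
            self.jobs[key] = job
            self.enqueued.append(job["id"])  # enqueue once per request
        return job


store = JobStore()
first = store.submit("user-42", {"video": "holiday.mp4"})
retry = store.submit("user-42", {"video": "holiday.mp4"})
print(first["id"] == retry["id"], len(store.enqueued))
```
&lt;/pre&gt;

&lt;p&gt;A retried submission returns the existing job record instead of creating a second one, so the client can keep using the same job id however many times it retries.&lt;/p&gt;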

&lt;p&gt;Pick one — A, B, C, or D — and tell me why. Full breakdown in the comments.&lt;/p&gt;

&lt;p&gt;#30DaysOfSystemDesign #SystemDesign #BackendEngineering #DistributedSystems&lt;/p&gt;

</description>
      <category>abotwrotethis</category>
      <category>systemdesign</category>
      <category>backend</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>55/60 Days System Design Questions</title>
      <dc:creator>Joud Awad</dc:creator>
      <pubDate>Tue, 30 Jun 2026 16:46:23 +0000</pubDate>
      <link>https://dev.to/thejoud1997/5560-days-system-design-questions-1bj</link>
      <guid>https://dev.to/thejoud1997/5560-days-system-design-questions-1bj</guid>
      <description>&lt;p&gt;You built an agent. It works.&lt;/p&gt;

&lt;p&gt;Now you need 5 of them running in parallel, sharing state, and handing off work to each other.&lt;/p&gt;

&lt;p&gt;Your pipeline breaks on the first real workload.&lt;/p&gt;

&lt;p&gt;Here's the setup:&lt;br&gt;
You're building a research agent system. A user asks a complex question. You need to:&lt;/p&gt;

&lt;p&gt;• Fan out to 3 specialized sub-agents simultaneously&lt;br&gt;
• One agent might spawn 2 more based on what it finds&lt;br&gt;
• They all write back to shared context&lt;br&gt;
• A final agent synthesizes everything&lt;/p&gt;

&lt;p&gt;Classic multi-agent orchestration. You have 4 options for how agents coordinate.&lt;/p&gt;

&lt;p&gt;A) Centralized Orchestrator — one controller agent dispatches tasks, collects results, manages shared state. Agents are dumb workers.&lt;/p&gt;

&lt;p&gt;B) Decentralized Peer Handoff — each agent decides who gets the task next. No central controller. Agents communicate directly.&lt;/p&gt;

&lt;p&gt;C) Shared Message Queue + Blackboard — all agents read/write to a shared blackboard. Coordination happens through state, not calls.&lt;/p&gt;

&lt;p&gt;D) Hierarchical Nesting — orchestrator spawns sub-orchestrators. Each sub-tree is self-coordinating. Recursive decomposition of the problem.&lt;/p&gt;

&lt;p&gt;Pick one — A, B, C, or D — and tell me why. Full breakdown in the comments.&lt;/p&gt;

&lt;p&gt;Drop your answer 👇&lt;/p&gt;

</description>
      <category>abotwrotethis</category>
      <category>ai</category>
      <category>agents</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>54/60 Days System Design Questions</title>
      <dc:creator>Joud Awad</dc:creator>
      <pubDate>Mon, 29 Jun 2026 16:20:56 +0000</pubDate>
      <link>https://dev.to/thejoud1997/5460-days-system-design-questions-4ojp</link>
      <guid>https://dev.to/thejoud1997/5460-days-system-design-questions-4ojp</guid>
      <description>&lt;p&gt;You built a RAG pipeline. Works great in dev.&lt;/p&gt;

&lt;p&gt;6 months later, your users complain: "The search results are garbage."&lt;/p&gt;

&lt;p&gt;You haven't changed a line of code.&lt;/p&gt;

&lt;p&gt;Here's what happened:&lt;/p&gt;

&lt;p&gt;Your product evolved. New features, new docs, new support tickets. The data drifted — but your embedding index didn't.&lt;/p&gt;

&lt;p&gt;Now you're serving a 400GB FAISS index that was last rebuilt in January. Your chunks are stale. Your nearest-neighbor results point to deprecated docs. Your LLM is confidently hallucinating from outdated context.&lt;/p&gt;

&lt;p&gt;You need to fix this. 4 engineers each propose a solution:&lt;/p&gt;

&lt;p&gt;A) Scheduled full rebuild&lt;br&gt;
Every Sunday, re-embed the entire corpus from scratch. Replace the index atomically. Slow (4h+ at scale), expensive, but always fresh.&lt;/p&gt;

&lt;p&gt;B) Incremental upserts + soft delete&lt;br&gt;
On every document change, re-embed only the affected chunks. Mark deleted chunks as tombstoned. Keep a version field on each vector. Index size grows over time; compact quarterly.&lt;/p&gt;

&lt;p&gt;C) Embedding version registry + hot swap&lt;br&gt;
Track which embedding model version produced each vector. When the model drifts (fine-tuned or upgraded), invalidate the mismatched vectors and rebuild only those. Two indexes run in parallel during migration. Route traffic by model version.&lt;/p&gt;

&lt;p&gt;D) Approximate staleness detection&lt;br&gt;
Run a nightly job that samples 1% of your corpus, re-embeds it, and measures cosine distance against the stored vector. If drift exceeds a threshold, trigger a full rebuild. Otherwise, skip it. Cheap monitoring, reactive rebuilds.&lt;/p&gt;
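
&lt;p&gt;To make option D concrete, here is a minimal Python sketch of the sampling check. The function names, the 0.05 threshold, and the &lt;code&gt;re_embed&lt;/code&gt; callback are assumptions made for this example; the callback stands in for whatever embedding API you actually call, and vectors are assumed to be non-zero.&lt;/p&gt;

&lt;pre&gt;
```python
import math
import random


def cosine_distance(a, b):
    # Assumes both vectors are non-zero and have the same length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def check_drift(stored, re_embed, sample_rate=0.01, threshold=0.05, seed=0):
    """Sample stored vectors, re-embed them, and compare.

    stored:   dict mapping chunk id to its stored vector
    re_embed: hypothetical callback, chunk id in, fresh vector out
    Returns (mean cosine distance over the sample, rebuild flag).
    """
    rng = random.Random(seed)
    ids = sorted(stored)
    k = max(1, int(len(ids) * sample_rate))
    sample = rng.sample(ids, k)
    total = sum(cosine_distance(stored[i], re_embed(i)) for i in sample)
    mean_drift = total / k
    return mean_drift, mean_drift > threshold


stored = {i: [1.0, 0.0] for i in range(200)}
print(check_drift(stored, lambda i: [1.0, 0.0]))
print(check_drift(stored, lambda i: [0.0, 1.0]))
```
&lt;/pre&gt;

&lt;p&gt;The first call compares identical vectors and reports no drift; the second re-embeds every sampled chunk to an orthogonal vector and flags a rebuild. Note that this only measures drift in vectors that already exist, so it says nothing about documents that were added or deleted since the last build.&lt;/p&gt;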

&lt;p&gt;Real constraint: your corpus is 50M chunks. Full rebuild = 4 hours + ~$800 in embedding API cost. You deploy model updates every 6 weeks.&lt;/p&gt;

&lt;p&gt;Pick one — A, B, C, or D — and tell me why. Full breakdown in the comments.&lt;/p&gt;

&lt;p&gt;#30DaysOfSystemDesign #SystemDesign #MachineLearning #MLEngineering&lt;/p&gt;

</description>
      <category>abotwrotethis</category>
      <category>ai</category>
      <category>rag</category>
      <category>database</category>
    </item>
    <item>
      <title>53/60 Days System Design Questions</title>
      <dc:creator>Joud Awad</dc:creator>
      <pubDate>Sun, 28 Jun 2026 16:20:06 +0000</pubDate>
      <link>https://dev.to/thejoud1997/5360-days-system-design-questions-35hn</link>
      <guid>https://dev.to/thejoud1997/5360-days-system-design-questions-35hn</guid>
      <description>&lt;p&gt;Your migration ran fine in staging.&lt;/p&gt;

&lt;p&gt;Then you ran it in production.&lt;/p&gt;

&lt;p&gt;The app went down.&lt;/p&gt;

&lt;p&gt;Not because the SQL was wrong. Because you ran it on a live table with 40 million rows while 8 services were actively writing to it.&lt;/p&gt;

&lt;p&gt;Your setup:&lt;br&gt;
→ PostgreSQL. users table. 40M rows. Active writes from 8 services.&lt;br&gt;
→ Product request: split full_name into first_name + last_name.&lt;br&gt;
→ You have a 2-hour maintenance window tonight.&lt;/p&gt;

&lt;p&gt;The engineering question: how do you ship this without downtime?&lt;/p&gt;

&lt;p&gt;A) Run ALTER TABLE to drop full_name and add first_name + last_name in a single migration during the maintenance window.&lt;/p&gt;

&lt;p&gt;B) Add first_name + last_name as nullable columns first → backfill → update all services to write to both → drop full_name only after everything is migrated.&lt;/p&gt;

&lt;p&gt;C) Create a new users_v2 table with the target schema → dual-write to both tables → flip the read pointer → drain the old table.&lt;/p&gt;

&lt;p&gt;D) Add a DB view that aliases full_name as first_name || ' ' || last_name → let each service migrate off it at its own pace.&lt;/p&gt;

&lt;p&gt;Drop your answer 👇&lt;/p&gt;

</description>
      <category>abotwrotethis</category>
      <category>database</category>
      <category>backend</category>
      <category>software</category>
    </item>
    <item>
      <title>52/60 Days System Design Questions</title>
      <dc:creator>Joud Awad</dc:creator>
      <pubDate>Sat, 27 Jun 2026 16:12:50 +0000</pubDate>
      <link>https://dev.to/thejoud1997/5260-days-system-design-questions-2kd4</link>
      <guid>https://dev.to/thejoud1997/5260-days-system-design-questions-2kd4</guid>
      <description>&lt;p&gt;Your API just shipped a breaking change.&lt;/p&gt;

&lt;p&gt;/users now returns fullName instead of first_name + last_name. 3 mobile clients broke. 1 partner integration went down. Your on-call is not happy.&lt;/p&gt;

&lt;p&gt;You had a versioning strategy. It just wasn't the right one.&lt;/p&gt;

&lt;p&gt;There are 4 ways to version an API. Here's what actually happens when you pick each one in production:&lt;/p&gt;

&lt;p&gt;A — URL path versioning (/v1/users, /v2/users)&lt;br&gt;
Simple. Explicit. Every request makes the version visible in logs and caches. But now you're maintaining 2 full route trees. A bugfix in the business logic layer has to be patched in both. Teams quietly let v1 rot.&lt;/p&gt;

&lt;p&gt;B — Header versioning (API-Version: 2)&lt;br&gt;
Clean URLs. Version negotiation happens in HTTP headers, not the path. It is harder to test in a browser and invisible in logs unless you instrument for it, and clients forget to send the header, defaulting to whatever your server decides "latest" means.&lt;/p&gt;

&lt;p&gt;C — Query param versioning (/users?version=2)&lt;br&gt;
Fast to implement. Zero client SDK changes. Cache-unfriendly — every CDN layer treats ?version=1 and ?version=2 as separate cache keys. Works until you have 40 endpoints and version drift becomes untrackable.&lt;/p&gt;

&lt;p&gt;D — Content negotiation (Accept: application/vnd.api.v2+json)&lt;br&gt;
The REST-purist approach. Semantically correct — you're asking for a representation, not a route. Almost nobody implements it right. Client library support is inconsistent. One wrong Accept header and you get a 406.&lt;/p&gt;

&lt;p&gt;Which strategy does your team use?&lt;/p&gt;

</description>
      <category>abotwrotethis</category>
      <category>systemdesign</category>
      <category>api</category>
      <category>backend</category>
    </item>
    <item>
      <title>51/60 Days System Design Questions</title>
      <dc:creator>Joud Awad</dc:creator>
      <pubDate>Fri, 26 Jun 2026 16:37:54 +0000</pubDate>
      <link>https://dev.to/thejoud1997/5160-days-system-design-questions-10kh</link>
      <guid>https://dev.to/thejoud1997/5160-days-system-design-questions-10kh</guid>
      <description>&lt;p&gt;You're building a B2B SaaS product. 50 enterprise customers. Each one wants their data isolated. Some are on free plans. A few are paying $50k/year and demanding SLA guarantees.&lt;/p&gt;

&lt;p&gt;Your current setup:&lt;br&gt;
→ One database. One schema. A tenant_id column on every table.&lt;br&gt;
→ One app server handling all traffic.&lt;br&gt;
→ A free-tier customer running a badly-written bulk export just hammered your DB for 40 seconds. A paying enterprise customer's checkout flow timed out.&lt;/p&gt;

&lt;p&gt;Your investors are not happy. Neither is that enterprise customer.&lt;/p&gt;

&lt;p&gt;The engineering question: how do you isolate tenants without rebuilding the whole product?&lt;/p&gt;

&lt;p&gt;A) Keep one shared DB — add row-level security + query budgets per tenant to enforce limits.&lt;/p&gt;

&lt;p&gt;B) Schema-per-tenant — every customer gets their own schema in the same Postgres instance, migrations run per-schema.&lt;/p&gt;

&lt;p&gt;C) Database-per-tenant (silo model) — each enterprise customer gets a dedicated DB. Free tier stays pooled.&lt;/p&gt;

&lt;p&gt;D) Middleware bridge — route requests to tenant-specific DB clusters based on a tenant registry, free tier stays on shared pool.&lt;/p&gt;

&lt;p&gt;One of these is a band-aid. One will collapse under 500 tenants. One is how Notion, Salesforce, and every serious B2B at scale actually operates.&lt;/p&gt;

&lt;p&gt;Pick one — A, B, C, or D — and tell me why. Full breakdown in the comments (including the noisy neighbor failure mode nobody warns you about).&lt;/p&gt;

&lt;p&gt;If your team is building multi-tenant systems, share this. The wrong isolation model is a rewrite waiting to happen.&lt;/p&gt;

&lt;p&gt;Drop your answer 👇&lt;/p&gt;

&lt;p&gt;#30DaysOfSystemDesign #SystemDesign #BackendEngineering #SoftwareArchitecture&lt;/p&gt;

</description>
      <category>abotwrotethis</category>
      <category>systemdesign</category>
      <category>database</category>
      <category>software</category>
    </item>
    <item>
      <title>50/60 Days System Design Questions!</title>
      <dc:creator>Joud Awad</dc:creator>
      <pubDate>Thu, 25 Jun 2026 15:49:24 +0000</pubDate>
      <link>https://dev.to/thejoud1997/5060-days-system-design-questions-2126</link>
      <guid>https://dev.to/thejoud1997/5060-days-system-design-questions-2126</guid>
      <description>&lt;p&gt;Your ML team trained a model that hits 94% accuracy in the notebook.&lt;/p&gt;

&lt;p&gt;Then you deploy it.&lt;/p&gt;

&lt;p&gt;Peak traffic: 3,000 inference requests/second.&lt;br&gt;
P99 latency shoots to 4.2 seconds.&lt;br&gt;
GPU utilization: 23%.&lt;/p&gt;

&lt;p&gt;The model works. The serving layer is the bottleneck.&lt;/p&gt;

&lt;p&gt;Here's the setup:&lt;/p&gt;

&lt;p&gt;• PyTorch model, ~6B params&lt;br&gt;
• Single A100 GPU, 80GB VRAM&lt;br&gt;
• FastAPI wrapper calling model.predict() one request at a time&lt;br&gt;
• No batching, FP32 weights, no quantization&lt;/p&gt;

&lt;p&gt;Users are hitting timeouts. GPUs are sitting mostly idle. Your infra bill is climbing.&lt;/p&gt;

&lt;p&gt;What do you fix first?&lt;/p&gt;

&lt;p&gt;A) Enable dynamic batching — buffer incoming requests, group into batches, process in one forward pass.&lt;/p&gt;

&lt;p&gt;B) Quantize the model to INT8 — reduce weight precision from 32-bit to 8-bit, shrink memory footprint and speed up inference.&lt;/p&gt;

&lt;p&gt;C) Switch to tensor parallelism — split the model across multiple GPUs, distribute the compute.&lt;/p&gt;

&lt;p&gt;D) Add a request queue + async workers — decouple HTTP receiving from model inference, process jobs in background.&lt;/p&gt;

&lt;p&gt;All four are real production patterns. Only one directly fixes the combination of low GPU utilization + high latency at 3K RPS.&lt;/p&gt;

&lt;p&gt;Pick one — A, B, C, or D — and tell me why. Full breakdown in the comments.&lt;/p&gt;

</description>
      <category>abotwrotethis</category>
      <category>machinelearning</category>
      <category>distributedsystems</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>49/60 Days System Design Questions</title>
      <dc:creator>Joud Awad</dc:creator>
      <pubDate>Wed, 24 Jun 2026 19:43:26 +0000</pubDate>
      <link>https://dev.to/thejoud1997/4960-days-system-design-questions-gop</link>
      <guid>https://dev.to/thejoud1997/4960-days-system-design-questions-gop</guid>
      <description>&lt;p&gt;Your data team just opened a $4,200 BigQuery bill.&lt;/p&gt;

&lt;p&gt;For a single month. One analyst. 12 queries.&lt;/p&gt;

&lt;p&gt;The queries weren't wrong. They weren't inefficient SQL. They were reasonable analytics queries — "give me last 30 days of events for customer X." The problem was that every single one scanned the full 3.2 TB table. No partition pruning. No cost control. Just full scans, every time.&lt;/p&gt;

&lt;p&gt;This is the most expensive silent bug in data engineering. You write a query. It looks fast. It returns results. And every run quietly eats through terabytes you're paying per-byte to scan.&lt;/p&gt;

&lt;p&gt;The fix is partition strategy — but picking the wrong one doesn't just fail to help, it actively makes things worse.&lt;/p&gt;

&lt;p&gt;Here's the setup:&lt;/p&gt;

&lt;p&gt;You're running a 3.2 TB events table on BigQuery. 18 months of data. Ingested daily. Analyst queries almost always filter on two things: a date range ("last 30 days") and a customer_id ("for customer X").&lt;/p&gt;

&lt;p&gt;Which partition + clustering strategy do you pick?&lt;/p&gt;

&lt;p&gt;A) Partition by ingestion date — date range queries only scan the relevant day-partitions.&lt;/p&gt;

&lt;p&gt;B) Partition by customer_id ranges — customer queries only scan the relevant ID bucket.&lt;/p&gt;

&lt;p&gt;C) No partitioning — use Redshift DISTKEY on customer_id + SORTKEY on event_time.&lt;/p&gt;

&lt;p&gt;D) Partition by ingestion date + cluster on customer_id — date pruning cuts the partition, clustering prunes within it.&lt;/p&gt;

&lt;p&gt;Three of these reduce your bill. Only one fully prunes on both filters your analysts actually use.&lt;/p&gt;

&lt;p&gt;Pick one (A, B, C, or D) and tell me why. Full breakdown in the comments.&lt;/p&gt;

&lt;p&gt;Drop your answer 👇&lt;/p&gt;

&lt;p&gt;#30DaysOfSystemDesign #SystemDesign #DataEngineering #BigQuery&lt;/p&gt;

</description>
      <category>abotwrotethis</category>
      <category>dataengineering</category>
      <category>systemdesign</category>
      <category>database</category>
    </item>
    <item>
      <title>48/60 Days System Design Questions</title>
      <dc:creator>Joud Awad</dc:creator>
      <pubDate>Tue, 23 Jun 2026 17:14:44 +0000</pubDate>
      <link>https://dev.to/thejoud1997/4860-days-system-design-questions-11a5</link>
      <guid>https://dev.to/thejoud1997/4860-days-system-design-questions-11a5</guid>
      <description>&lt;p&gt;Your AI agent just got a user message: “Book me a flight to Dubai next Friday.”&lt;/p&gt;

&lt;p&gt;The LLM has access to 12 tools: search_flights, get_user_preferences, check_calendar, book_flight, send_confirmation, get_weather…&lt;/p&gt;

&lt;p&gt;How does the agent decide which tools to call, in what order, and when to stop?&lt;/p&gt;

&lt;p&gt;A) ReAct loop — model reasons step-by-step, emits a “thought” then picks one tool at a time, observes output, repeats until it self-decides it’s done&lt;/p&gt;

&lt;p&gt;B) Parallel tool calling — model emits ALL required tool calls in a single response, executes them concurrently, feeds all results back in one context update&lt;/p&gt;

&lt;p&gt;C) Forced function schema — you lock the model into a strict JSON schema per turn; it can’t produce free text, only structured tool calls you defined&lt;/p&gt;

&lt;p&gt;D) Planner-executor split — a lightweight planner LLM creates a tool call DAG upfront, a separate executor runs the graph, results flow back to planner only at checkpoints&lt;/p&gt;

&lt;p&gt;Pick one — A, B, C, or D — and tell me why you’d use it in production.&lt;/p&gt;

&lt;p&gt;Full breakdown in the comments. 👇&lt;/p&gt;

</description>
      <category>abotwrotethis</category>
      <category>ai</category>
      <category>agents</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>47/60 Days System Design Questions</title>
      <dc:creator>Joud Awad</dc:creator>
      <pubDate>Mon, 22 Jun 2026 16:44:53 +0000</pubDate>
      <link>https://dev.to/thejoud1997/4760-days-system-design-questions-2jmh</link>
      <guid>https://dev.to/thejoud1997/4760-days-system-design-questions-2jmh</guid>
      <description>&lt;p&gt;Your monolith is 6 years old.&lt;/p&gt;

&lt;p&gt;500k lines of code. One deploy every 3 weeks. One bad migration takes down the whole thing.&lt;/p&gt;

&lt;p&gt;Leadership says: "rewrite it." You've seen what happens when teams do that. 18 months. Budget overruns. Half the features never make it back. The new system launches and nobody trusts it.&lt;/p&gt;

&lt;p&gt;There's a better way. You don't rewrite — you strangle.&lt;/p&gt;

&lt;p&gt;Here's your system:&lt;/p&gt;

&lt;p&gt;• OrderService (monolith) handles: order creation, fulfillment, invoicing, returns&lt;br&gt;
• 40 engineers. 3 teams. All deploying to the same codebase.&lt;br&gt;
• Returns processing is a bottleneck. The business wants it extracted first.&lt;br&gt;
• You can't freeze feature work during migration.&lt;/p&gt;

&lt;p&gt;You need to extract Returns into a standalone service without a big-bang rewrite.&lt;/p&gt;

&lt;p&gt;What's your migration strategy?&lt;/p&gt;

&lt;p&gt;A) Strangler Fig — route /returns traffic to a new service via a proxy/facade, keep the monolith intact until the new service is proven. Deprecate monolith code after.&lt;/p&gt;

&lt;p&gt;B) Branch by Abstraction — introduce an interface inside the monolith, build the new implementation behind it, flip the switch when ready, then move it out.&lt;/p&gt;

&lt;p&gt;C) Big Bang Rewrite — freeze the monolith, rebuild everything in microservices in parallel, cut over when done.&lt;/p&gt;

&lt;p&gt;D) Database-First Migration — extract the returns database tables into a separate schema first, build the service around it, then reroute traffic.&lt;/p&gt;

&lt;p&gt;One of these lets you migrate live, with zero freeze periods, and with a clear rollback at every step.&lt;/p&gt;

&lt;p&gt;Pick one — A, B, C, or D — and tell me why. Full breakdown in the comments.&lt;/p&gt;

&lt;p&gt;Drop your answer 👇&lt;/p&gt;

</description>
      <category>abotwrotethis</category>
      <category>backend</category>
      <category>systemdesign</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Queues vs Streams vs Event Bus: The Mental Model That Makes System Design Click</title>
      <dc:creator>Joud Awad</dc:creator>
      <pubDate>Mon, 22 Jun 2026 10:51:27 +0000</pubDate>
      <link>https://dev.to/thejoud1997/queues-vs-streams-vs-event-bus-the-mental-model-that-makes-system-design-click-58f</link>
      <guid>https://dev.to/thejoud1997/queues-vs-streams-vs-event-bus-the-mental-model-that-makes-system-design-click-58f</guid>
      <description>&lt;p&gt;I struggled with system design and event-driven architecture for years.&lt;/p&gt;

&lt;p&gt;Then I learned that queues, streams, and event buses are three completely different things, even though they look identical on a whiteboard.&lt;/p&gt;

&lt;p&gt;You draw the same boxes every time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Producers&lt;/li&gt;
&lt;li&gt;A broker&lt;/li&gt;
&lt;li&gt;Something subscribing on the other end&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That picture is what kept tripping me up, because it hides the only part that actually matters: what's going on inside the broker, and the way each one pushes your system to scale in a different direction.&lt;/p&gt;

&lt;p&gt;Here's the mental model that finally got it to stick.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A queue is a to-do list. A message lands in line, one worker picks it up, finishes it, and it's gone. Consumed once, then deleted. Want to go faster? Add more workers pulling from the same line, and that's pretty much the whole trick. SQS, RabbitMQ, and Celery-style task queues all follow this model: one job, handled by one consumer. You reach for it when each piece of work should be handled by a single worker, like charging a card or firing off a welcome email. Most brokers only guarantee at-least-once delivery, so the handler should still be safe to re-run.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A stream is a logbook you can rewind. Reading a message doesn't delete it. Each consumer just keeps a little bookmark, an offset, marking where it left off, so two teams can read the same data at once at completely different speeds. One of them can jump back to last Tuesday while the other stays on the live edge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An event bus is a switchboard. An event shows up, and the bus works out who should hear about it from rules you write, instead of you wiring every connection by hand. "order.paid" fires once, and shipping, analytics, and the fraud check all react to it without any of them knowing the others exist.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
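
&lt;p&gt;The three models above can be sketched in a few lines of Python. These toy classes are illustrations I made up for this post, not the API of any real broker: they only show what happens to a message after it is read.&lt;/p&gt;

&lt;pre&gt;
```python
from collections import deque


class Queue:
    """To-do list: each message goes to one worker, then it is gone."""

    def __init__(self):
        self._items = deque()

    def publish(self, msg):
        self._items.append(msg)

    def receive(self):
        return self._items.popleft() if self._items else None


class Stream:
    """Rewindable log: reads never delete; each consumer keeps an offset."""

    def __init__(self):
        self._log = []
        self._offsets = {}

    def publish(self, msg):
        self._log.append(msg)

    def read(self, consumer):
        pos = self._offsets.get(consumer, 0)
        if pos >= len(self._log):
            return None
        self._offsets[consumer] = pos + 1
        return self._log[pos]

    def seek(self, consumer, offset):
        self._offsets[consumer] = offset


class EventBus:
    """Switchboard: rules decide which handlers hear about an event."""

    def __init__(self):
        self._rules = []  # (predicate, handler) pairs

    def subscribe(self, predicate, handler):
        self._rules.append((predicate, handler))

    def publish(self, event):
        for predicate, handler in self._rules:
            if predicate(event):
                handler(event)


stream = Stream()
stream.publish("mon")
stream.publish("tue")
print(stream.read("team-a"), stream.read("team-a"), stream.read("team-b"))
stream.seek("team-a", 0)  # team-a rewinds; team-b is unaffected
print(stream.read("team-a"))
```
&lt;/pre&gt;

&lt;p&gt;Scaling follows from the data structure: the queue goes faster by adding workers that call &lt;code&gt;receive&lt;/code&gt;, the stream serves more readers by adding offsets, and the bus grows by adding rules.&lt;/p&gt;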

&lt;p&gt;So same picture, three problems that barely overlap.&lt;/p&gt;

&lt;p&gt;I went through all three properly in my latest video, including how each one works under the hood and how it changes the way your system scales.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/GQzXLhB7f2Q"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>software</category>
      <category>eventdriven</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>46/60 Days System Design Questions</title>
      <dc:creator>Joud Awad</dc:creator>
      <pubDate>Sun, 21 Jun 2026 17:06:55 +0000</pubDate>
      <link>https://dev.to/thejoud1997/4660-days-system-design-questions-4a7l</link>
      <guid>https://dev.to/thejoud1997/4660-days-system-design-questions-4a7l</guid>
      <description>&lt;p&gt;Your production LLM agent just returned this JSON to your order processing service:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;{&lt;br&gt;
  "action": "refund",&lt;br&gt;
  "amount": "fifty dollars",&lt;br&gt;
  "order_id": null,&lt;br&gt;
  "confidence": "pretty high"&lt;br&gt;
}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Your downstream service crashes. The retry hits the same model. Same broken output. The refund never fires — but the user got a confirmation email.&lt;/p&gt;

&lt;p&gt;You need your agent to return valid, typed, structured output — every time. What do you do?&lt;/p&gt;

&lt;p&gt;A) Prompt-engineer harder — add "Always return valid JSON with these exact fields" to your system prompt and document the schema inline.&lt;/p&gt;

&lt;p&gt;B) Use structured outputs / function calling (OpenAI, Bedrock tool use, Gemini response schema) — constrain the model at the API level to return a typed schema.&lt;/p&gt;

&lt;p&gt;C) Post-process with a validation layer — parse the output, run JSON Schema or Pydantic validation, retry with corrective context if it fails (max 2 retries).&lt;/p&gt;

&lt;p&gt;D) Add a second LLM as a judge — pass the first model's output to a smaller, faster model that scores and flags invalid responses before they reach your service.&lt;/p&gt;
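
&lt;p&gt;For reference, the validate-and-retry loop described in option C could look roughly like the Python sketch below. It uses only the standard library rather than JSON Schema or Pydantic, the field rules are guesses based on the broken payload above, and &lt;code&gt;call_model&lt;/code&gt; is a hypothetical function that takes a prompt string and returns the raw model text.&lt;/p&gt;

&lt;pre&gt;
```python
import json

MAX_RETRIES = 2


def validate_refund(raw):
    """Return (payload, None) when valid, else (None, error message)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, "not valid JSON: " + str(exc)
    if not isinstance(data, dict):
        return None, "top-level value must be an object"
    if data.get("action") != "refund":
        return None, "action must be 'refund'"
    amount = data.get("amount")
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    if not is_number or not amount > 0:
        return None, "amount must be a positive number"
    order_id = data.get("order_id")
    if not isinstance(order_id, str) or not order_id:
        return None, "order_id must be a non-empty string"
    return data, None


def call_with_validation(call_model, prompt):
    """Call the model, validate, and retry with corrective context."""
    error = None
    for _attempt in range(1 + MAX_RETRIES):
        full_prompt = prompt
        if error is not None:
            full_prompt = prompt + "\nPrevious output was rejected: " + error
        payload, error = validate_refund(call_model(full_prompt))
        if payload is not None:
            return payload
    raise ValueError("model output failed validation: " + error)
```
&lt;/pre&gt;

&lt;p&gt;The important property is that an invalid payload never reaches the order service: after the retries are exhausted the caller gets an exception it can route to a fallback or a human review queue.&lt;/p&gt;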

&lt;p&gt;Three of these are patterns used in production AI systems. One of them is wishful thinking.&lt;/p&gt;

&lt;p&gt;Pick one — A, B, C, or D — and tell me why. I'll drop the full breakdown in the comments, including the pattern that looks defensive but actually makes hallucinations worse under load.&lt;/p&gt;

&lt;p&gt;Drop your answer 👇&lt;/p&gt;

&lt;p&gt;#30DaysOfSystemDesign #SystemDesign #AIEngineering #AgenticsAI&lt;/p&gt;

</description>
      <category>abotwrotethis</category>
      <category>ai</category>
      <category>agents</category>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
