<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Malay Mehta</title>
    <description>The latest articles on DEV Community by Malay Mehta (@malaymehta).</description>
    <link>https://dev.to/malaymehta</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4000772%2Febcee762-3124-47d5-b222-dcf2be03896f.png</url>
      <title>DEV Community: Malay Mehta</title>
      <link>https://dev.to/malaymehta</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/malaymehta"/>
    <language>en</language>
    <item>
      <title>System Design for Working Engineers, Not Interview Prep</title>
      <dc:creator>Malay Mehta</dc:creator>
      <pubDate>Fri, 26 Jun 2026 18:22:31 +0000</pubDate>
      <link>https://dev.to/malaymehta/system-design-for-working-engineers-not-interview-prep-43nf</link>
      <guid>https://dev.to/malaymehta/system-design-for-working-engineers-not-interview-prep-43nf</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://malaymehta.com/blog/system-design-for-working-engineers" rel="noopener noreferrer"&gt;malaymehta.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Interview Trap
&lt;/h2&gt;

&lt;p&gt;If you look at most system design tutorials, you get an extreme use case. Design Twitter. Design YouTube. Scale it to a billion users. Draw boxes on a whiteboard for 45 minutes.&lt;/p&gt;

&lt;p&gt;Do you think your app will be used by a billion users on day one? The answer is almost always no. But the tutorials don't teach you what to do when you have 500 users, unclear requirements, a team of four, and a quarter to ship something that works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real system design is nothing like a whiteboard interview. You don't get clean requirements, you don't design from scratch, and nobody asks you to handle a billion requests per second on day one.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Real System Design Starts with Questions, Not Diagrams
&lt;/h2&gt;

&lt;p&gt;The very first thing that matters in system design is something most tutorials skip entirely: unclear and chaotic requirements. In the real world, requirements don't come as a clean problem statement. They come from non-technical business teams, and you need to navigate through cross-questions to get all the clarity you need.&lt;/p&gt;

&lt;p&gt;Ask as many questions as possible. Understand your functional and non-functional requirements. Which features need to be synchronous and which can be async? What are the read and write load patterns? What is the maximum and average number of concurrent users right now? What does authentication look like? Do you need role-based access control?&lt;/p&gt;

&lt;p&gt;These questions drive your choices. You don't always need an axe where a knife will do. Being minimalist with a reasonable growth prediction and a 3, 6, 9 month plan will take you in the right direction.&lt;/p&gt;

&lt;p&gt;There will be things the situation demands immediately but would take more time than expected. Taking a predictable hit now and fixing it at the right future time without missing that balance is truly important. Weighing what will be expensive to change later, in terms of dollar cost or human effort, is how real architectural decisions get made.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pushing Back on Bad Requirements
&lt;/h3&gt;

&lt;p&gt;Many times requirements come from non-technical business teams and you need to push back on why certain things should not be done the way they expect.&lt;/p&gt;

&lt;p&gt;Here is a real example. A business person once asked to duplicate data into another Kafka topic because their prediction was that the existing topic would not handle more load from a new subscriber. The technical reality? Kafka is built for exactly this. A new consumer group on the same topic would work without impacting existing consumers at all. If you don't push back, you end up creating tech debt with support and maintenance costs forever, just for replicating data that never needed to be replicated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-off Decisions Nobody Teaches
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Monolith vs Microservices
&lt;/h3&gt;

&lt;p&gt;Typically the very first thing engineers want to talk about is microservices and how they can help. But do you realistically have even 100 users on the product? Why do you need K8s, Docker, distributed tracing, cross-cutting async messaging, and service mesh? Do you really need that scale, or are you doing it to make your resume look better?&lt;/p&gt;

&lt;p&gt;If you have no real users in the thousands, a modular monolith is the best choice. Deploy everything as one server on Linux with a reverse proxy and a CNAME record. That simple. You need a database, sure. But you don't need Kafka, distributed tracing, auto-scaling, or any complex distributed computing to begin with.&lt;/p&gt;

&lt;p&gt;When predictable growth comes, add monitoring and observability to understand which requests are hitting hardest. Decouple the modules doing heavy work into independent microservices. Then pivot. That is the right sequence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Synchronous vs Async
&lt;/h3&gt;

&lt;p&gt;If you don't need to process something immediately, decoupling via async helps. If it is fire-and-forget, use a simple queue. If you need multiple services to consume the same event with highly scalable producers and consumers, use Kafka. If the user is waiting for a response, keep it synchronous via a RESTful API because it needs to happen right now.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build vs Buy
&lt;/h3&gt;

&lt;p&gt;Rule of thumb: never reinvent the wheel. If something already exists at low cost and does the job, buy it. If companies like OpenAI and Anthropic are not building their own payment systems and instead use established financial integrators, then you should trust that. If giants are not building everything from scratch, why should you? Building only makes sense when no existing solution fits your needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Consistency vs Availability: Real-World CAP
&lt;/h3&gt;

&lt;p&gt;When you are dealing with transactions that need ACID guarantees, use SQL. Ticket booking, inventory updates, financial debits and credits. These cannot tolerate stale reads or lost writes.&lt;/p&gt;

&lt;p&gt;If you need consistency and partition tolerance where stale reads must be errored out, NoSQL works better. Social media feeds, messaging, analytics, and streaming. If you need availability and partition tolerance with tolerance for eventual consistency, columnar databases like Cassandra fit well. IoT data, time series, high write throughput with low read frequency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Perfect Architecture vs Shipping This Quarter
&lt;/h3&gt;

&lt;p&gt;Perfect architecture is always the goal, but if you can balance it with shipping this quarter, that brings real business value and revenue. Find the healthy mix. Build a base that requires very little change even if the actual decision evolves later.&lt;/p&gt;

&lt;p&gt;For example, tightly couple your audit logging service synchronously because you don't have async processing yet. It ships real business value now. Later, when async communication is added, you decouple it without changing how the end user experience works.&lt;/p&gt;

&lt;p&gt;Analytics is another one. You might not have the full setup of MySQL CDC to Debezium to ClickHouse yet. But you can start by ingesting specific tables into ClickHouse directly for analytics. Solve it elegantly later when DevOps capacity allows the full event streaming pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Scale and When Not To
&lt;/h2&gt;

&lt;p&gt;The time to scale is based on observability data and predictive customer expansion patterns. Your business understanding combined with analytical thinking will surface the signals that tell you when scaling is actually needed.&lt;/p&gt;

&lt;p&gt;Before jumping to horizontal or vertical scaling, check the basics first. Does your database have optimal indexing? Is your application connection pool configured properly? Are there N+1 queries firing hundreds of calls where one would do? These are high-level checks. Deeper concepts like partitioning and sharding are problems you encounter with billions of records, not a few million.&lt;/p&gt;

&lt;p&gt;Horizontal scaling is generally the better approach because it guarantees higher throughput with the ability to scale up or down without downtime. But only when you actually need it.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Real Story: Premature Scaling Gone Wrong
&lt;/h3&gt;

&lt;p&gt;I worked with a company that had fewer than 50-100 customers and less than 5,000 business transactions total. They had already added Docker, Kubernetes, Kafka, distributed monitoring, and auto-scaling. Now they had two problems instead of one: the real business problem and a tech problem.&lt;/p&gt;

&lt;p&gt;Very few developers on the team understood microservices as a whole. Nobody knew DevOps practices well enough to manage how scaling actually works. It was not just slowing business delivery but also burning cloud costs because nobody knew how to optimize the infrastructure bill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A double-edged sword. Premature scaling without proper architectural guidance creates more problems than it solves.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Every Architecture Decision Is a Cost Decision
&lt;/h2&gt;

&lt;p&gt;Will you use managed Kubernetes or bare K8s? New Relic or Dynatrace or open-source alternatives? Managed database or self-hosted? It all depends on who is owning what. If you have DevOps engineers who can manage the nightmare of persistent storage, networking, constant upgrades, and maintenance, then self-hosted can work. If the answer is no, managed is better but it comes with a higher price tag.&lt;/p&gt;

&lt;p&gt;It is equally important to monitor your cloud costs and understand the incremental bills. Is your Docker image lifecycle policy set to delete old images within a few days? Is your S3 storage persistent forever or only for a retention period? Have you optimized or dropped high-cardinality metrics in your distributed tracing to save cost? How about networking costs for transporting data across regions? It all adds up.&lt;/p&gt;

&lt;p&gt;Here is a question I ask teams all the time: will you optimize your MySQL queries and indexing, or will you throw more money at bigger database instances so the app functions at a dollar cost that keeps increasing? Unless the root cause is identified and fixed, you are just burning money.&lt;/p&gt;

&lt;p&gt;Small teams with few users almost always face overly expensive microservices hosting and management. The operational overhead, debugging complexity, and cognitive load on the team need to be balanced against the actual benefits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Database Design Is Architecture
&lt;/h2&gt;

&lt;p&gt;Schema design decisions haunt you for years. The table structure you choose in month one determines how painful your queries are in year two. Foreign keys, indexes, data types, normalization vs denormalization. These are architecture decisions, not database admin tasks.&lt;/p&gt;

&lt;p&gt;Pages taking 10+ seconds to load because nobody thought about indexing. N+1 queries firing hundreds of database calls. Unused columns bloating tables. No caching layer. Complex business logic with if-else ladders that nobody can follow.&lt;/p&gt;

&lt;p&gt;For SQL vs NoSQL, the real answer is simpler than the blog posts make it: if you need transactions and relationships, use SQL. If you need flexible schema with high write throughput and can tolerate eventual consistency, use NoSQL. Most applications should start with SQL and add NoSQL for specific use cases when needed.&lt;/p&gt;

&lt;p&gt;Caching strategy is another design decision that gets treated as an afterthought. Cache the data that is read frequently but changes rarely. Product catalogs, user profiles, configuration data. Invalidate on write. Start with a simple TTL-based approach and add event-driven invalidation when your system complexity demands it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability Is a Design Decision, Not an Afterthought
&lt;/h2&gt;

&lt;p&gt;Will your app monitoring give you real analytical insights, or is it just collecting logs nobody reads? P99 and P95 latency metrics, structured logs, alerts that tell you production broke at 12 PM instead of finding out at 2 PM when customers start calling.&lt;/p&gt;

&lt;p&gt;The real power of observability is the transactional breakdown: cache hit vs miss ratio, time spent in business logic, time spent in SQL queries, external API call latency, queue processing time. When you can see all of this in one trace, debugging goes from guesswork to precision.&lt;/p&gt;

&lt;p&gt;Being preventive rather than reactive is what separates teams that sleep well from teams that get paged at 2 AM.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture Review Nobody Does
&lt;/h2&gt;

&lt;p&gt;Most teams never review their architecture after the initial design. Not a priority. Time constraints. Always something more urgent. But it does not take long for a system to become legacy if it does not keep up with the right patterns.&lt;/p&gt;

&lt;p&gt;Consider how fast things move in our industry. Teams still running Java 8 in 2026 because nobody reviewed the stack. Apps built on deprecated frameworks because there was never a checkpoint to evaluate alternatives. It takes no time for things to become stale if nobody reviews how they are done.&lt;/p&gt;

&lt;p&gt;In many architecture review meetings, I see people not coming with homework. They did not state why they considered alternatives. They have no POC results to support their design. If you only made a choice instead of evaluating options, it is a pure assumption that you struck gold. In reality it might be silver, and you would not know without backed proof.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The best architecture reviews are the ones where someone walks in and says "here is what I considered, here is what I ruled out, and here is why this approach wins." That is the difference between engineering and guessing.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  System Design Is Not a Diagram. It Is a Series of Decisions.
&lt;/h2&gt;

&lt;p&gt;Every section in this post is a decision. Monolith or microservices. Sync or async. SQL or NoSQL. Build or buy. Scale now or scale later. Spend on managed services or invest in DevOps. Ship the perfect version or ship what works this quarter.&lt;/p&gt;

&lt;p&gt;The tutorials teach you to draw boxes and arrows. Real system design is about making the right call at the right time with incomplete information and real constraints. That is a skill you build through experience, not through watching YouTube videos about designing Netflix.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm &lt;a href="https://malaymehta.com" rel="noopener noreferrer"&gt;Malay Mehta&lt;/a&gt;, a Software Architect and Engineering Mentor with 726+ mentoring sessions and a 5.0 rating. I mentor working engineers on real-world architecture decisions, not interview prep. &lt;a href="https://malaymehta.com/blog" rel="noopener noreferrer"&gt;Read more on my blog&lt;/a&gt; or &lt;a href="https://malaymehta.com/contact" rel="noopener noreferrer"&gt;get in touch&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>architecture</category>
      <category>backend</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Why Your Vibe Coded App Breaks in Production</title>
      <dc:creator>Malay Mehta</dc:creator>
      <pubDate>Wed, 24 Jun 2026 14:45:24 +0000</pubDate>
      <link>https://dev.to/malaymehta/why-your-vibe-coded-app-breaks-in-production-40i2</link>
      <guid>https://dev.to/malaymehta/why-your-vibe-coded-app-breaks-in-production-40i2</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://malaymehta.com/blog/why-your-vibe-coded-app-breaks-in-production" rel="noopener noreferrer"&gt;malaymehta.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  It Starts with an Idea
&lt;/h2&gt;

&lt;p&gt;Let me walk you through a story that's playing out thousands of times right now.&lt;/p&gt;

&lt;p&gt;It starts with an idea. A real one. Something that cracks a genuine problem. In the past, turning that idea into software required a developer squad, a QA team, and a bit of DevOps to get an MVP out the door.&lt;/p&gt;

&lt;p&gt;But things changed fast. Now someone with an idea and no technical background can build it themselves. We call it vibe coding. Hand the idea to an AI agent, iterate on the output, verify the end state, and ship it. If you mastered AI looping and prompt chaining, hats off. That part is genuinely impressive.&lt;/p&gt;

&lt;p&gt;The demo works. Early users sign up. You feel like you cracked it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Wall
&lt;/h2&gt;

&lt;p&gt;Then comes the next logical thought: let me verify code quality. So you put another AI agent on it. The agent reviews the code, says it looks good, maybe suggests some minor refactors. Feels reassuring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Until you realize what just happened: AI wrote the code, and AI reviewed the code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Nobody with actual engineering judgment ever looked at it. Nobody walked through the code themselves. Nobody asked whether the architecture makes sense for what comes next.&lt;/p&gt;

&lt;p&gt;And I haven't even gotten to code security and vulnerabilities yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually See in These Codebases
&lt;/h2&gt;

&lt;p&gt;I review AI-generated and vibe coded repos regularly. Here is what shows up every single time:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Users can see each other's data by changing an ID in the URL.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the most common and most dangerous pattern. The app works perfectly in testing because you only ever test with one account. The moment two real users are on the system, one can access the other's data just by modifying a request parameter. Data isolation was never designed because the AI never thought about it, and nobody asked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication and token management full of holes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Modified tokens are accepted. Privilege escalation is possible. The auth layer looks correct on the surface but has no real validation depth. An experienced attacker would walk through it in minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secrets floating in the code like public property.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;API keys, database credentials, third-party tokens hardcoded in source files. Open invitations for anyone who gets access to the repo. AI generates these inline because it doesn't understand the difference between making something work locally and making something safe for production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Messy business logic with unnecessary abstractions everywhere.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI loves abstracting things. Layers upon layers of patterns that add complexity without value. When something breaks, nobody knows where to start looking because the logic is spread across files that reference each other in ways that don't map to anything in the business domain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nobody can do root cause analysis because nobody walked the code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the fundamental problem. When a production issue hits, the team stares at code they didn't write, don't understand, and can't trace. Debugging becomes guesswork. Fixes introduce new bugs. The codebase fights you at every turn.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your App Works with 100 Users. It Won't Work with 10,000.
&lt;/h2&gt;

&lt;p&gt;The app that worked fine in development starts showing cracks under real load. Here is what actually happens:&lt;/p&gt;

&lt;p&gt;Your database isn't functioning properly under heavy load because nobody thought about connection pooling, proper indexing, or slow query monitoring. API latency P99 and P95 are spiking, giving you sleepless nights. Is it a microservices mess, a debugging hell, or everything together?&lt;/p&gt;

&lt;p&gt;Do you have the right observability in place to even know what's happening? Are you following OWASP security principles? For most vibe coded apps, the answer is no.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers Back This Up
&lt;/h2&gt;

&lt;p&gt;This is not just my observation. &lt;strong&gt;81% of enterprise tech leaders&lt;/strong&gt; report increased production issues from AI-generated code. &lt;strong&gt;43% of AI code changes&lt;/strong&gt; need manual debugging in production even after passing QA. AI-generated code introduces &lt;strong&gt;1.7x more issues&lt;/strong&gt; than human-written code across production systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Ship Without a Sailor
&lt;/h2&gt;

&lt;p&gt;If all of this sounds like afterthoughts, then it's the right time to think. You may have gained moderate to good technical knowledge along the way, but is it enough to choose the right things now? To prioritize what to fix first? To know which shortcuts will cost you and which are fine?&lt;/p&gt;

&lt;p&gt;Without guidance, it's like a ship without a sailor. It goes wherever the wind takes it. In your case, the wind is a hallucinating AI agent that's only as good as the direction you give it.&lt;/p&gt;

&lt;p&gt;Does it seem too late? It can feel that way. You already have users growing and production burning with issues. But this is exactly the point where the right guidance makes the biggest difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Look for Before It's Too Late
&lt;/h2&gt;

&lt;p&gt;If you're running a vibe coded app in production right now, check these in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data isolation&lt;/strong&gt; - can one user access another's data by modifying request parameters? Test with two accounts. If yes, stop everything and fix this first.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Auth and tokens&lt;/strong&gt; - try modifying a JWT token. Does your API still accept it? Can you escalate privileges?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Secrets in code&lt;/strong&gt; - search your repo for API keys, database passwords, and tokens. Move them to environment variables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Database under load&lt;/strong&gt; - check connection pooling, add indexes to your most queried columns, set up slow query logging.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt; - if you can't see what's happening in production without reading logs manually, you don't have observability. Set up structured logging, metrics, and alerting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Walk the code yourself&lt;/strong&gt; - the single most important thing. Read through the critical paths. Understand how a request flows from the user to the database and back. If you can't trace it, neither can anyone you hire to fix it.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Right Guidance Changes Everything
&lt;/h2&gt;

&lt;p&gt;This is the vibe coded production mess. And it's fixable. An experienced mentor who has seen these patterns before can help you steer production, rescue catastrophic issues, and fix things in a way that actually stays fixed.&lt;/p&gt;

&lt;p&gt;The goal is not to throw away what you built. The goal is to save the momentum you have and build a real foundation under it.&lt;/p&gt;

&lt;p&gt;AI will keep getting better at generating code. But right now, the gap between code that works on a demo and code that works in production is where the real engineering happens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The typing was never the hard part. The thinking is.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm &lt;a href="https://malaymehta.com" rel="noopener noreferrer"&gt;Malay Mehta&lt;/a&gt;, a Software Architect and Engineering Mentor with 726+ mentoring sessions and a 5.0 rating. I help teams fix AI-generated software and mentor engineers on building the judgment AI can't replace. &lt;a href="https://malaymehta.com/blog" rel="noopener noreferrer"&gt;Read more on my blog&lt;/a&gt; or &lt;a href="https://malaymehta.com/contact" rel="noopener noreferrer"&gt;get in touch&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>architecture</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
