<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Harshdeep Singh</title>
    <description>The latest articles on DEV Community by Harshdeep Singh (@harshdeepsingh13).</description>
    <link>https://dev.to/harshdeepsingh13</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3965372%2F96507d50-d632-47cf-81c0-0fa1b2ea41bd.png</url>
      <title>DEV Community: Harshdeep Singh</title>
      <link>https://dev.to/harshdeepsingh13</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/harshdeepsingh13"/>
    <language>en</language>
    <item>
      <title>The 95% Problem: Why Enterprise AI Keeps Failing — and What the 5% Get Right</title>
      <dc:creator>Harshdeep Singh</dc:creator>
      <pubDate>Thu, 11 Jun 2026 19:53:22 +0000</pubDate>
      <link>https://dev.to/harshdeepsingh13/the-95-problem-why-enterprise-ai-keeps-failing-and-what-the-5-get-right-3geb</link>
      <guid>https://dev.to/harshdeepsingh13/the-95-problem-why-enterprise-ai-keeps-failing-and-what-the-5-get-right-3geb</guid>
      <description>&lt;p&gt;Ninety-five out of every hundred enterprise AI pilots produce nothing a CFO would sign off on. The reflex is to blame the model — too dumb, too small, the wrong vendor. It almost never is. The thing quietly killing enterprise AI is older and more boring than any model: data nobody organized for machines, and rules nobody ever wrote down. The strangest part of the story is who is losing the fight hardest — the firms whose entire business is selling everyone else the cure.&lt;/p&gt;
&lt;h2&gt;The most expensive irony in enterprise software&lt;/h2&gt;
&lt;p&gt;In late 2025, Deloitte gave part of a government cheque back. The firm had delivered a report to Australia's Department of Employment and Workplace Relations, and reviewers found something awkward buried inside it: citations to academic papers that did not exist, and a fabricated reference to a federal court judgment. The work had been produced with help from generative AI, and no one had checked it before it went out the door. Deloitte agreed to refund part of its fee.&lt;/p&gt;
&lt;p&gt;It is tempting to read that as a story about a hallucinating chatbot. It is not. A capable model can cite a real paper; the failure was not that the AI was too weak. The failure was that nothing in the process forced a human to verify machine output before it reached a client. There was no standard operating procedure, no checkpoint, no rule with teeth. That distinction — between a model problem and a data-and-governance problem — is the entire subject of this essay, and the firms that sell AI for a living have just handed us the clearest possible illustration of it.&lt;/p&gt;
&lt;p&gt;Consider the position those firms are in. Since 2023, the Big Four and the major strategy houses have collectively poured more than ten billion dollars into AI. It is their flagship pitch. Accenture reported close to six billion dollars in generative-AI bookings in a single fiscal year. PwC became one of OpenAI's largest enterprise customers and then its reseller. KPMG signed a two-billion-dollar alliance with Microsoft. These organizations have built their modern brand on the promise that they can walk into any enterprise and fix its AI problem. And yet, internally, they have hit precisely the wall they are paid to dismantle.&lt;/p&gt;
&lt;p&gt;This is not schadenfreude about one embarrassing report. It is the most useful data point in the entire enterprise-AI conversation, because it removes the easy excuses. You cannot say the consultants lacked talent, budget, model access, or executive buy-in. They had all of it in abundance. If the people who sell the cure can still catch the disease, then the disease is not what most companies think it is. It is not a shortage of intelligence in the model. It is a shortage of order in the data and discipline in the governance — and almost nobody is immune.&lt;/p&gt;
&lt;h2&gt;The number that belongs on every board agenda&lt;/h2&gt;
&lt;p&gt;Start with the figure that has been ricocheting around boardrooms since it landed. In its 2025 report &lt;em&gt;The GenAI Divide: State of AI in Business&lt;/em&gt;, MIT's NANDA initiative studied hundreds of enterprise AI initiatives and concluded that roughly ninety-five percent of them had produced no measurable impact on the bottom line. Not weak returns. No returns. The spending in scope ran to tens of billions of dollars, and the overwhelming majority of it bought experiments that never crossed into anything a finance team could defend.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;95%&lt;/strong&gt; of enterprise generative-AI pilots deliver no measurable business impact — the spending lands, the value does not.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;It is not an isolated finding. Gartner expects that by the end of 2025, three in ten generative-AI projects will be abandoned after the proof-of-concept stage, and that through 2026, sixty percent of AI projects will be scrapped specifically because the organizations lacked AI-ready data. The firm goes further on agents: it forecasts that more than forty percent of agentic-AI projects will be cancelled by the end of 2027. The RAND Corporation has put the historical AI project failure rate above eighty percent — roughly twice the rate of conventional IT projects. And S&amp;amp;P Global found that the share of companies abandoning most of their AI initiatives jumped to forty-two percent in 2025, up from just seventeen percent a year earlier. The trend is not improving as the technology matures. It is getting worse as spending outruns readiness.&lt;/p&gt;
&lt;p&gt;The crucial detail is &lt;em&gt;where&lt;/em&gt; these projects die. They almost never fail in the lab. They fail on the road to production. A pilot runs on a curated slice of data — a clean schema, a controlled volume, a problem chosen because it demos well. Production runs on the actual enterprise: the duplicated records, the contradictory definitions, the fields that mean different things in different systems, the knowledge trapped in formats no machine can read. The distance between the demo and the deployment is the distance between curated data and real data, and that distance is where the money disappears. People in the field have a name for the place projects go to expire: pilot purgatory.&lt;/p&gt;
&lt;p&gt;The people closest to the data already know this. In Informatica's 2025 survey of chief data officers, the most-cited obstacle to AI success was not talent, not budget, not model quality — it was data quality and readiness. The executives responsible for the foundation are telling everyone the foundation is the problem. Most strategies are simply not listening, because listening would mean slowing down to do the tedious work, and the market is rewarding speed.&lt;/p&gt;
&lt;p&gt;And the window in which to fix this is closing faster than the failure rate alone suggests, because the industry is sprinting from chatbots to agents. Gartner expects that by the end of 2026, four in ten enterprise software applications will include task-specific AI agents, up from less than one in twenty in 2025. Agents raise the stakes of the underlying problem by an order of magnitude. A chatbot that retrieves bad data returns a bad answer a human can still catch. An agent that &lt;em&gt;acts&lt;/em&gt; on bad data — reconciling an account, approving a request, triggering a downstream workflow — propagates the error into the real world before anyone reviews it. The same analysts forecasting the agentic wave also forecast that more than forty percent of agentic projects will be cancelled by the end of 2027, for the same unglamorous reasons the chatbots failed. We are, in other words, about to point far more autonomous systems at foundations that were already too weak for the last generation of tools.&lt;/p&gt;
&lt;p&gt;That is what makes the failure rate a strategic problem rather than a technical footnote. The cost is not the wasted pilot budget; that is the cheap part. The real cost is competitive. Every quarter a rival reaches production while you re-run experiments that were always going to fail for the same reason, the rival's system gets better, its data gets cleaner, its people get more fluent, and the gap compounds. You are not standing still. You are losing ground while looking busy.&lt;/p&gt;
&lt;h2&gt;It was never the model&lt;/h2&gt;
&lt;p&gt;The comforting story inside most failed AI programs is that the technology was not ready, and that the next model — bigger, newer, from a different lab — will be the one that finally works. It is comforting because it requires nothing of the organization except patience and a bigger invoice. It is also wrong.&lt;/p&gt;
&lt;p&gt;Here is the inconvenient test. The model that hallucinated its way through your failed pilot is, in most cases, the same model that performed flawlessly in the vendor's demo. Nothing about the weights changed between those two moments. What changed was everything around the model: the quality of the data it was fed, the clarity of the instructions it was given, and the rules governing what it was allowed to touch. The model was never the variable. The environment was.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The model in your failed pilot and the model in the vendor's flawless demo are usually the same model. The difference between them is everything you built — or failed to build — around it.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;This is also why "wait for the next model" is such a seductive and expensive trap. Each new model is genuinely more capable than the last, which makes it easy to believe the next one will finally clear the bar. But a more capable model pointed at the same unstructured data and the same absent rules does not fix the problem — it executes the same mistakes more fluently, and, increasingly, more autonomously. Capability without a foundation is not progress. It is leverage applied to a fault line.&lt;/p&gt;
&lt;p&gt;That environment has two load-bearing pieces, and almost every enterprise is missing both. The first is data that a machine can actually reason over. The second is governance a machine can actually obey. The original instinct that AI needs "organized data and some kind of SOP" is exactly right — it just turns out that each half is a deep discipline in its own right, and that naming them separately is the difference between a strategy that works and a slide that sounds good. Take them one at a time.&lt;/p&gt;
&lt;h2&gt;Gap one: data that was never built for machines&lt;/h2&gt;
&lt;p&gt;An AI agent does not think the way a database is organized. It does not navigate neat rows and columns; it reasons over entities, the relationships between them, and the context that gives them meaning. It needs to know that this customer is the same as that account, that "revenue" in the finance system and "revenue" in the sales dashboard are or are not the same number, that this contract supersedes that one, that this policy applies to this region. Enterprise data, as it actually exists, is almost the precise opposite of that.&lt;/p&gt;
&lt;p&gt;In most companies the data is siloed across systems that were never designed to talk to each other, duplicated in ways no one fully maps, and defined inconsistently enough that the same word can name genuinely different things in different systems. Worse, the knowledge that actually matters — the reasoning, the precedent, the hard-won judgment — tends to live in formats machines cannot read: slide decks, PDFs, email threads, and the heads of senior people who are about to retire. You can connect the cleanest model in the world to that, and it will faithfully reflect the chaos back to you.&lt;/p&gt;
&lt;p&gt;The most instructive proof of this comes, again, from a consultancy. When McKinsey built its internal AI platform, the firm discovered that the tool could not initially parse PowerPoint — which was a problem, because PowerPoint is where most of McKinsey's institutional knowledge actually lived. Sit with that for a moment. One of the most knowledge-intensive organizations on earth, a firm whose entire product is structured thinking, found that its crown-jewel intellectual property was effectively illegible to a machine until it did real work to fix the ingestion. If McKinsey's knowledge was trapped in slides, it is worth asking, honestly, what shape yours is in.&lt;/p&gt;
&lt;p&gt;The failure rarely announces itself as missing data. It hides in data that is present but means subtly different things in different places. Ask an agent a question as ordinary as how many active customers the business has, and it will find a dozen tables with a dozen definitions of "active" — a login within the last thirty days in one system, a non-zero balance in another, an uncancelled contract in a third. A human analyst resolves that ambiguity with context and a quick message to a colleague. An agent, lacking both, picks one definition silently and reports a confident number that is wrong in a way nobody can see. Multiply that across every entity and every metric a company cares about, and you have the real texture of the problem: not an empty warehouse, but a full one with no shared language.&lt;/p&gt;
&lt;h3&gt;You cannot retrieve your way out of a data swamp&lt;/h3&gt;
&lt;p&gt;The popular hope is that retrieval-augmented generation — pointing an agent at your documents and letting it fetch what it needs — will paper over the mess. It will not. An agent retrieving from a swamp returns swamp, dressed up in fluent prose that makes the swamp harder to detect. And the instinct to fix this by building a bigger data lake usually just produces a bigger swamp with better storage economics. Volume was never the problem. Meaning was.&lt;/p&gt;
&lt;p&gt;What actually closes the gap is a layer most enterprises have never built: a semantic, machine-readable map of what the data &lt;em&gt;means&lt;/em&gt;. In practice this goes by several names that point at the same idea — a semantic layer, an ontology, a knowledge graph, a governed data catalog. The common thread is that core business concepts get defined once, consistently, in a form an agent can consume: what a customer is, what counts as revenue, how entities relate, which rules and constraints apply. The catalog becomes the control plane of truth, and the semantic layer becomes the thing that lets a model answer in terms of your business rather than in terms of raw, ambiguous tables.&lt;/p&gt;
&lt;p&gt;The organizing principle behind all of this is treating data as a product rather than as exhaust. Exhaust is whatever a system happens to emit, owned by no one, documented nowhere. A product has an owner, a contract, documentation, versioning, and a consumer whose needs shape it. The research bears out how much this matters: organizations that treat data as a product — with curated models and shared vocabularies — are dramatically more likely to scale generative AI successfully than those that do not. When the foundation is built this way, retrieval techniques like &lt;a href="https://theharshdeepsingh.com/blog/building-an-llm-project-from-scratch-in-2026" rel="noopener noreferrer"&gt;graph-based RAG&lt;/a&gt; can ground every answer in verified, connected data, enforce access controls at the moment of the query, and trace each response back to the exact source it came from. That is the difference between an agent that confidently invents a court case and one that shows its work.&lt;/p&gt;
&lt;p&gt;A knowledge graph earns its keep precisely here. Instead of flat, disconnected tables, it stores the business as a web of entities and the relationships between them: this customer belongs to this account, which is covered by this contract, governed by this policy, owned by this team. An agent reasoning over that structure can follow the connections the way a knowledgeable employee would, and every answer it produces carries its lineage — which source, which definition, which version. That is also what makes governance enforceable at the level of &lt;em&gt;meaning&lt;/em&gt; rather than the level of the raw row, because the graph knows what a thing is and who is permitted to see it.&lt;/p&gt;
&lt;h3&gt;The strategic point hiding in the plumbing&lt;/h3&gt;
&lt;p&gt;Strip away the vocabulary and the strategic truth is simple: AI readiness is mostly data maturity wearing a more exciting outfit. You cannot purchase your way past it with a model subscription, because the thing you are missing is not compute or intelligence — it is the slow, structural work of making your own knowledge legible. That work is unglamorous, expensive, and invisible in a board deck, which is exactly why most organizations skip it and exactly why most organizations end up in the ninety-five percent. The semantic foundation is not overhead on the way to the real prize. It &lt;em&gt;is&lt;/em&gt; the prize. It is the part competitors cannot copy by signing the same vendor contract you did.&lt;/p&gt;
&lt;h2&gt;Gap two: governance that was never written down&lt;/h2&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1745015446589-7ee6f702d8c1%3Fauto%3Dformat%26fit%3Dcrop%26w%3D1600%26h%3D720" alt="The blue glass facade of a modern office building, its windows forming a strict geometric grid." width="1600" height="720"&gt;&lt;p&gt;&lt;em&gt;Order, structure, repetition — the corporate grid. Governance is what gives an agent the same scaffolding a new employee takes for granted. Photo: Fabian Kleiser / Unsplash.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If you ask most enterprises where their AI governance lives, the honest answer is a PDF on a shared drive — a well-intentioned document of principles that almost no one has read and that no system enforces. A PDF nobody reads is not a policy an agent can obey. It is a statement of hope. And hope does not survive contact with an autonomous system acting at machine speed across systems it was never explicitly cleared to touch.&lt;/p&gt;
&lt;p&gt;Governance for AI, and especially for agents, has to be machine-actionable to mean anything. The "SOP" intuition is the right one, but it resolves into two concrete questions that a slide of principles never answers: what is this agent actually allowed to touch and do, and what is the operating rhythm that keeps it honest over time? Get specific about both, and governance stops being a compliance ornament and starts being the thing that lets you deploy without holding your breath.&lt;/p&gt;
&lt;h3&gt;An agent is a new employee with root access and no onboarding&lt;/h3&gt;
&lt;p&gt;It helps to think of an agent as exactly what it is becoming: a digital worker. The trouble is that we wrap human workers in decades of accumulated controls and give agents almost none of them. A new human employee receives an identity, a defined role, least-privilege access to only the systems their job requires, a manager who reviews their work, and an audit trail that records what they did. An agent, in too many deployments, gets a single shared API key with broad standing credentials, the ability to call tools far outside its actual task, and no logging worth the name. It is the most over-permissioned new hire in the building, and no one interviewed it.&lt;/p&gt;
&lt;p&gt;The discipline that fixes this is well understood, even if it is rarely applied. Security researchers call the core idea least-agency, or least-privilege: an agent should receive the minimum autonomy required for its specific task and nothing more. A customer-support agent does not need write access to the billing database. A research agent does not need the ability to send external email. From there it cascades into concrete controls: whitelisting the specific tools an agent may use, issuing short-lived credentials instead of permanent keys, sandboxing execution, restricting where an agent can send data, and — critically — keeping a human in the loop for actions that are irreversible or sensitive. A mature deployment will refuse to let an agent move money without clearing a confidence threshold or obtaining a second approval, and will strip personally identifiable information before it ever reaches a model, restoring it only on the way back. None of that lives in a principles document. All of it lives in enforced policy. That, and not the PDF, is the real standard operating procedure: rules expressed as controls a system cannot route around.&lt;/p&gt;
&lt;p&gt;The danger is not hypothetical, and it does not require malice. Picture an agent handed broad database credentials so it could "be helpful," then asked to tidy up some duplicate records. With no constraint on its scope and no human checkpoint, a single ambiguous instruction becomes a destructive write across production data in seconds — faster than any person could intervene, and recorded nowhere anyone thought to look. The same autonomy that makes agents useful is what makes their mistakes fast and quiet. Standing credentials, missing audit trails, and unrestricted tool access are not exotic edge cases; they are the default state of most early agent deployments, and they are exactly how a promising program turns into a board-level incident.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;A policy an agent cannot read is decoration. Governance that scales is policy an agent is structurally unable to disobey.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3&gt;You do not have to invent the rulebook&lt;/h3&gt;
&lt;p&gt;The encouraging part is that the scaffolding for all of this already exists, written by people who have thought about it harder than any individual team has time to. The U.S. National Institute of Standards and Technology publishes the AI Risk Management Framework, along with a dedicated profile for generative AI, and its emphasis maps almost directly onto agent controls: role-based access, continuous monitoring, adversarial testing, and lifecycle logging for traceability. The international ISO/IEC 42001 standard formalizes the idea of an AI management system, with oversight and continual improvement built in. The OWASP GenAI Security Project maintains a Top 10 for large-language-model applications and a newer Top 10 for agentic applications, cataloguing the exact failure classes teams keep rediscovering the hard way: prompt injection, tool misuse, memory leakage. And the external pressure is rising fast, from the EU AI Act to a wave of national AI laws, which means governance is shifting from a nice-to-have to a condition of doing business.&lt;/p&gt;
&lt;p&gt;Underneath the frameworks sits a single overlooked capability that decides whether any of them are real: observability. If you cannot see what an agent did — which data it touched, which tools it called, which decision it made and why — then you cannot govern it, debug it, or defend it to a regulator, and you certainly cannot trust it with anything that matters. Audit logging and traceability are not paperwork. They are the line between an agent you can put into production and a black box you can only hope behaves. Trust, in the end, is not extended to systems because they are clever. It is extended to systems because they are accountable.&lt;/p&gt;
&lt;p&gt;The reframe that matters most here is that governance is not a brake on AI. It is the enabler. The organizations that actually reach production are, consistently, the ones that invested in governance frameworks &lt;em&gt;before&lt;/em&gt; they scaled agent capabilities — not after a breach forced their hand. The absence of guardrails does not make you faster; it produces exactly the brittle, untrustworthy, occasionally catastrophic behavior that makes leadership pull the plug and sends a promising program back to purgatory. Guardrails are what let you move quickly without flinching.&lt;/p&gt;
&lt;h2&gt;The consultants' mirror&lt;/h2&gt;
&lt;p&gt;Return now to where we started, because the consulting firms are not just a cautionary anecdote. They are the most public, best-funded live experiment in internal AI adoption that exists, and watching them is the closest thing the rest of us have to a controlled trial. They are simultaneously the largest sellers of AI transformation and a room full of organizations trying to transform themselves — and they have been unusually candid about how hard it is. BCG, in its own cross-industry research, found that roughly three-quarters of companies struggle to achieve and scale value from their AI initiatives. The firms are not describing a problem they have solved from the outside. They are describing one they are living from the inside.&lt;/p&gt;
&lt;p&gt;The stakes are visible in their own pyramids. Internal tools like McKinsey's assistant and BCG's slide-polishing system can already perform a large share of the research-and-formatting work that used to define a junior analyst's first years, and entry-level hiring across the industry has tightened as a result. That is what it looks like when this technology genuinely lands inside an organization — and it is a useful reminder that getting it right is not a productivity nicety. It restructures the firm. The flip side is that getting it wrong, in public, with a client's name on the document, restructures the firm's reputation just as quickly.&lt;/p&gt;
&lt;p&gt;And so every major firm now has its own platform: McKinsey with its knowledge assistant and a fleet of internal agents numbering in the tens of thousands, BCG with its build unit and internal tools, Deloitte with its assistant and an agentic platform, PwC with an agent operating system spanning tens of thousands of deployed agents, EY with a platform giving tens of thousands of staff access to a growing roster of agents and a multi-year plan to scale into the hundreds of thousands, and KPMG with its own agentic workbench. Billions of dollars, real engineering, genuine ambition. But the tools are the visible ten percent. The invisible ninety percent — the part that determines whether any of it works — is the data and governance plumbing underneath. There is a useful rule of thumb circulating in this world that only a small fraction of AI value comes from the algorithms and the technology, with the overwhelming majority coming from people, process, and the organizational change required to make the technology stick. The firms that are winning internally are the ones that took that ratio seriously.&lt;/p&gt;
&lt;h2&gt;The five percent: what McKinsey's Lilli actually proves&lt;/h2&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1524995997946-a1c2e315a42f%3Fauto%3Dformat%26fit%3Dcrop%26w%3D1600%26h%3D640" alt="The curved, balconied interior of a grand library, its shelves densely and neatly filled with books." width="1600" height="640"&gt;&lt;p&gt;&lt;em&gt;Organized, machine-legible knowledge was the real moat — not the model. Every rival had the same models. Photo: Susan Q Yin / Unsplash.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If the failure rate has a counterexample worth studying, it is McKinsey's internal platform, Lilli. It is the case study everyone cites, and almost everyone draws the wrong lesson from it. The wrong lesson is that McKinsey succeeded because it had access to powerful models. That cannot be the explanation, because every competitor had access to the same models. The right lesson is far less flattering to the technology and far more useful to anyone trying to replicate the result: McKinsey succeeded because it did the boring work that everyone else was skipping.&lt;/p&gt;
&lt;p&gt;Look at what the boring work actually was. The platform draws on more than forty knowledge sources and over a hundred thousand documents and interview transcripts — but the unlock was not aggregation, it was curation and tagging, the patient labor of making a century of accumulated knowledge consistent and machine-legible. The team built what is better described as an orchestration layer than a simple retrieval bot, designed to synthesize and contextualize rather than just fetch. They confronted the unglamorous reality that their best material was trapped in slides and fixed the ingestion so the machine could read it. Only then did the human side of adoption begin: a phased rollout, training that cured what the firm called "prompt anxiety" in roughly an hour, internal evangelists, and senior leaders modeling the behavior they wanted to see.&lt;/p&gt;
&lt;p&gt;The results are the part people quote, and they are genuinely impressive: more than three-quarters of the firm's tens of thousands of employees now use the tool, heavy users return to it more than a dozen times a week, and the firm reports its people save close to a third of their research time. But the number to internalize is not the adoption rate. It is what produced it. The moat was never the model. The moat was a hundred years of knowledge made legible to machines, wrapped in the governance and the change management required to make people actually trust it and use it.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;McKinsey's edge was not a smarter model — every rival had the same models. The edge was a century of knowledge made legible to machines, and the discipline to govern it.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The pattern repeats across the rest of the industry, even if less dramatically. The firms making real internal progress — across the spectrum of platforms and agents now deployed — are consistently the ones that invested in their data foundations and their governance before they tried to scale. And the lesson generalizes cleanly to any enterprise, in any sector, that wants off the wrong side of the divide. The winners are not the organizations with the best model. Everyone has the same models. The winners are the organizations that did the unglamorous data-and-governance work that everyone else found a reason to defer.&lt;/p&gt;
&lt;p&gt;It is worth saying plainly what this does and does not mean for everyone else, because the lesson is easy to mislearn. You cannot buy McKinsey's result by buying McKinsey's tool, any more than you could acquire a rival's culture by licensing their software. What travels is not the platform; it is the method — the willingness to treat your own knowledge as an asset worth making machine-legible, and your own governance as an engineering problem worth solving before the agents arrive. That method is available to any organization in any industry. It is simply not for sale, and it cannot be rushed.&lt;/p&gt;
&lt;h2&gt;What the winners actually do&lt;/h2&gt;
&lt;p&gt;None of this resolves into a checklist, and anyone selling you one is selling the wrong thing. But the organizations on the right side of the divide do share a small number of strategic commitments, and they are worth stating plainly — not as steps to execute in order, but as the shape of a serious posture toward AI.&lt;/p&gt;
&lt;h3&gt;They sequence data before models&lt;/h3&gt;
&lt;p&gt;The single most counterintuitive move the winners make is to stop running hero pilots and fix the foundation first. That does not mean a multi-year data project before any value is delivered; it means choosing initial use cases precisely where the data is already clean and compatible, shipping those to generate real and defensible returns, and then using that credibility and momentum to fund the remediation of the messier domains. The failure mode is the opposite: picking use cases based on strategic ambition and executive enthusiasm, discovering the data underneath cannot support them, and producing an over-budget pilot that demonstrates the limits of the data rather than the capability of the AI. Match the ambition to the data maturity, not to the org chart's excitement.&lt;/p&gt;
&lt;h3&gt;They treat data as a product, not exhaust&lt;/h3&gt;
&lt;p&gt;The winners give their data owners, contracts, documentation, and a semantic layer that defines the business vocabulary once and reuses it everywhere. The catalog becomes the control plane of truth; the ontology and knowledge graph become the connective tissue that lets agents reason over entities and relationships rather than guess at ambiguous tables. This is the work that does not show up in a launch announcement and entirely determines whether the launch was real. It is also, not coincidentally, the part of the strategy a competitor cannot acquire by signing the same contracts you did.&lt;/p&gt;
&lt;h3&gt;They make governance machine-actionable&lt;/h3&gt;
&lt;p&gt;The winners do not confuse a principles document with a control. They express policy as something a system enforces: identity for every agent, least-agency access, tool whitelisting, audit trails, and a human in the loop for anything irreversible or sensitive. They adopt an established standard rather than inventing their own — the NIST framework, the ISO management-system standard, the OWASP failure catalogues — and they wrap it in an operating rhythm: regular reviews, red-teaming, and genuine change management for agents, treating a new agent with the seriousness one would treat a new hire with broad system access. Governance, done this way, is not the thing that slows the program down. It is the thing that lets the program move without fear.&lt;/p&gt;
&lt;h3&gt;They build the context layer agents inherit&lt;/h3&gt;
&lt;p&gt;Rather than re-explaining the business inside every prompt, the winners push meaning and rules down into the data itself, so that any agent connecting to it inherits both. It is worth watching the emerging plumbing here — &lt;a href="https://theharshdeepsingh.com/blog/the-best-claude-setup-that-works-on-any-ai-tool" rel="noopener noreferrer"&gt;open protocols that standardize how agents connect&lt;/a&gt; to tools and to one another, sometimes described as an "HTTP for agents," are quickly becoming the connective standard for this world. But a word of caution that the protocol enthusiasm tends to skip: plumbing that lets agents reach your data faster does nothing good if the house behind the tap is a mess. Standard connectivity over a swamp just distributes the swamp at higher throughput. The connectivity is necessary; the clean, governed foundation is what makes it worth having.&lt;/p&gt;
&lt;h3&gt;They treat adoption as a change program, not a software rollout&lt;/h3&gt;
&lt;p&gt;Finally, the winners understand that buying licenses is not the same as achieving adoption. The most replicable lesson from the internal success stories is almost embarrassingly human: an hour of training to dissolve the anxiety of a blank prompt, visible evangelists, and leaders who actually use the tools they are asking their people to use. If the overwhelming majority of AI value comes from people and process rather than the algorithm, then the overwhelming majority of the effort has to go there too. Technology adoption has always been a human problem wearing a technical mask, and AI has not changed that. It has only raised the stakes.&lt;/p&gt;
&lt;h3&gt;They measure value, not motion&lt;/h3&gt;
&lt;p&gt;The organizations stuck in the ninety-five percent tend to measure activity — pilots launched, seats provisioned, models evaluated — and mistake it for progress. The winners measure outcomes, and they are ruthless about it: a use case either moves a number a finance team recognizes, or it is killed quickly, before it hardens into a permanent science project. That discipline is exactly what frees the budget and the attention to pour into foundations, where the compounding returns actually live. Counting pilots is how a program feels busy while going nowhere. Counting value is how it escapes the lab.&lt;/p&gt;
&lt;h2&gt;The real divide&lt;/h2&gt;
&lt;p&gt;It is worth being precise about what the so-called GenAI Divide actually divides. It is not a line between companies with good models and companies with bad ones. Frontier models are a commodity now; the same handful are available to everyone with a credit card. The divide is between the organizations that did the foundational work and the organizations that did not — and underneath the AI costume, that is simply a gap in data maturity and governance discipline that has existed for years and that AI has suddenly made expensive to ignore.&lt;/p&gt;
&lt;p&gt;And it compounds. The organizations on the right side of the divide get faster every quarter, because their agents inherit ever-cleaner data and ever-tighter rules, and each success funds the next. The organizations on the wrong side re-run pilots that fail for the same reason they failed last time, mistaking a foundation problem for a model problem and waiting for a model that was never going to save them. The gap between the two groups does not stay constant. It widens.&lt;/p&gt;
&lt;p&gt;The deepest irony of the whole story is the one we began with. The cure for the failing enterprise AI program was never a smarter model. It was the boring, expensive, unglamorous discipline that the consultants themselves had to learn the hard way, in public, with a refunded invoice as tuition: organize the data so a machine can reason over it, write the rules down in a form a machine is forced to obey, and only then let the agents loose. The companies that internalize that will not merely adopt AI. They will compound on it — quietly, structurally, and largely out of view — while everyone else is still abandoning pilots and blaming the model.&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1689085383734-32c6d88221a7%3Fauto%3Dformat%26fit%3Dcrop%26w%3D1600%26h%3D720" alt="An aerial view of a road winding in tight switchbacks up a green mountain pass." width="1600" height="720"&gt;&lt;p&gt;&lt;em&gt;The divide compounds. The foundation you lay now decides how fast you can move later. Photo: Robert Bye / Unsplash.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Sources and further reading&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;MIT NANDA initiative, &lt;em&gt;The GenAI Divide: State of AI in Business 2025&lt;/em&gt; — the source of the widely cited finding that roughly 95% of enterprise generative-AI pilots show no measurable business impact.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned-after-proof-of-concept-by-end-of-2025"&gt;Gartner&lt;/a&gt; — predictions on proof-of-concept abandonment, AI-ready data, and agentic-project cancellation rates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://venturebeat.com/ai/consulting-giant-mckinsey-unveils-its-own-generative-ai-tool-for-employees-lilli"&gt;VentureBeat&lt;/a&gt; — background on McKinsey's internal Lilli platform and how it was built.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://www.deloitte.com/ce/en/issues/generative-ai/state-of-ai-in-enterprise.html"&gt;Deloitte&lt;/a&gt;, &lt;em&gt;State of AI in the Enterprise&lt;/em&gt; — recurring survey data on enterprise AI spend, scaling, and ROI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://www.nist.gov/itl/ai-risk-management-framework"&gt;NIST AI Risk Management Framework&lt;/a&gt; and its Generative AI Profile — role-based access, monitoring, adversarial testing, and lifecycle logging.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://genai.owasp.org/"&gt;OWASP GenAI Security Project&lt;/a&gt; — the Top 10 for LLM applications and the Top 10 for agentic applications, covering prompt injection, tool misuse, and related failure classes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ISO/IEC 42001 — the international standard for an AI management system, covering oversight and continual improvement.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


</description>
      <category>enterpriseai</category>
      <category>aistrategy</category>
      <category>datagovernance</category>
      <category>aiagents</category>
    </item>
    <item>
      <title>Building an LLM Project From Scratch in 2026</title>
      <dc:creator>Harshdeep Singh</dc:creator>
      <pubDate>Tue, 09 Jun 2026 17:33:40 +0000</pubDate>
      <link>https://dev.to/harshdeepsingh13/building-an-llm-project-from-scratch-in-2026-20i8</link>
      <guid>https://dev.to/harshdeepsingh13/building-an-llm-project-from-scratch-in-2026-20i8</guid>
      <description>&lt;p&gt;Here’s the uncomfortable truth about “AI projects” a few years ago: the hard part was never the model. It was the plumbing. Standing up a vector database, wiring an embeddings pipeline, fighting with streaming responses, gluing five libraries together — by the time it worked, you’d forgotten what you set out to build.&lt;/p&gt;
&lt;p&gt;In 2026 that plumbing has largely collapsed into a weekend’s worth of work. Model prices have fallen roughly &lt;strong&gt;80% year over year&lt;/strong&gt;, free tiers are genuinely usable, your database now does vector search natively, and one SDK handles streaming and tool calling across every provider. The skill that’s actually in demand — retrieval-augmented generation with agents — is now reachable by a developer who has never touched machine learning.&lt;/p&gt;
&lt;p&gt;So this guide does something specific. We’re going to build &lt;strong&gt;one real project&lt;/strong&gt;, end to end, that you can put on your portfolio and let strangers use: an app where someone uploads their documents and &lt;em&gt;chats with them&lt;/em&gt; — asking questions and getting answers grounded in their own files, with citations, streamed token by token. It’s the canonical 2026 LLM project, and it teaches almost everything else by osmosis.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;In plain English.&lt;/strong&gt; “RAG” means the AI doesn’t answer from memory — it &lt;em&gt;looks things up&lt;/em&gt; first. You give it a pile of documents; when you ask a question, it finds the most relevant passages and answers using only those. That’s why it can talk about &lt;em&gt;your&lt;/em&gt; files without ever having been trained on them.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;This guide is written for three readers at once: newcomers, working software engineers, and AI engineers. The main text stays approachable; the “In plain English” notes add no-jargon explanations, and the “Under the hood” notes add depth for engineers.&lt;/p&gt;
&lt;h3&gt;The roadmap — what we’ll actually do&lt;/h3&gt;
&lt;p&gt;Eight steps. Each one produces something that works before we add the next layer.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build the mental model&lt;/strong&gt; — how RAG (and &lt;em&gt;agentic&lt;/em&gt; RAG) really works, in one diagram.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Choose your models&lt;/strong&gt; — a cost comparison of hosted and self-hosted LLMs, and which embedding model to use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set up the MERN stack&lt;/strong&gt; — project skeleton plus a MongoDB Atlas vector index.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ingest documents&lt;/strong&gt; — upload, parse a PDF, and split it into chunks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Embed &amp;amp; store&lt;/strong&gt; — turn chunks into vectors and save them in MongoDB.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrieve&lt;/strong&gt; — find the right passages with a single &lt;code&gt;$vectorSearch&lt;/code&gt; query.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Make it agentic&lt;/strong&gt; — let the model call retrieval as a tool, on its own terms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stream &amp;amp; deploy&lt;/strong&gt; — render tokens live in React, then ship it to its own URL for free.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let’s start with the one idea that makes the other seven make sense.&lt;/p&gt;

&lt;h2&gt;Step 1 · The mental model: how RAG actually works&lt;/h2&gt;
&lt;p&gt;An LLM is a brilliant improviser with no access to your private data and a tendency to confidently make things up. &lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt; fixes both problems with one move: before the model answers, you fetch relevant facts and hand them over as context. The model then answers from &lt;em&gt;evidence&lt;/em&gt; rather than from vibes.&lt;/p&gt;
&lt;p&gt;There are two phases. The first happens once, ahead of time (&lt;strong&gt;ingestion&lt;/strong&gt;); the second happens on every question (&lt;strong&gt;retrieval + generation&lt;/strong&gt;).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;INGESTION (run once, when a document is added)
  document --&amp;gt; split into chunks --&amp;gt; embed each chunk --&amp;gt; store vectors in MongoDB

RETRIEVAL + GENERATION (run on every question)
  question --&amp;gt; embed --&amp;gt; vector search in MongoDB --&amp;gt; top-k chunks
                                                         |
                             +---------------------------+
                             v
        [ question + retrieved chunks ] --&amp;gt; LLM --&amp;gt; grounded answer + citations&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The magic ingredient is the &lt;strong&gt;embedding&lt;/strong&gt;: a list of numbers (a vector) that captures the &lt;em&gt;meaning&lt;/em&gt; of a piece of text. Two passages about “canceling a subscription” land near each other in this number-space even if one says “refund” and the other says “cancel my plan.” Searching by meaning instead of keywords is what makes RAG feel intelligent.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;In plain English.&lt;/strong&gt; Imagine every sentence gets pinned onto a giant map, where similar meanings sit close together. To answer your question, the app drops a pin for &lt;em&gt;your question&lt;/em&gt; and grabs whatever text is pinned nearby. Those nearby notes become the AI’s cheat sheet.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3&gt;What makes it “agentic” — the 2026 upgrade&lt;/h3&gt;
&lt;p&gt;Classic RAG retrieves &lt;em&gt;once&lt;/em&gt; and hopes the first search was good enough. That breaks on real questions: “Compare the refund policy in the 2024 contract with the 2025 one” needs two different searches and a comparison. &lt;strong&gt;Agentic RAG&lt;/strong&gt; hands the model the steering wheel. Retrieval becomes a &lt;em&gt;tool&lt;/em&gt; the model can call — repeatedly — deciding what to search for, judging whether the results are sufficient, and searching again before it commits to an answer.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Under the hood.&lt;/strong&gt; A 2025 survey (“Agentic Retrieval-Augmented Generation,” arXiv:2501.09136) frames these systems around four patterns: &lt;strong&gt;reflection, planning, tool use, and multi-agent collaboration&lt;/strong&gt;. In practice you’ll implement query rewriting/decomposition, multi-hop “retrieve → reason → retrieve” loops, and self-critique (“do these passages actually answer the question?”). The cost: 3–10× the tokens and 2–5× the latency of vanilla RAG. So &lt;strong&gt;gate it&lt;/strong&gt; — a trivial FAQ should never enter the loop; a cross-document question can’t be answered without it. Cap the loop at ~5 iterations so a confused agent can’t spend your budget in a runaway.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;We’ll build the simple pipeline first (so you can see every piece), then promote retrieval to an agentic tool in Step 7. That progression &lt;em&gt;is&lt;/em&gt; the lesson.&lt;/p&gt;
&lt;h2&gt;Step 2 · Choosing your LLMs — cheapest viable first&lt;/h2&gt;
&lt;p&gt;You’ll use two kinds of model: an &lt;strong&gt;embedding model&lt;/strong&gt; (turns text into vectors) and a &lt;strong&gt;generation model&lt;/strong&gt; (writes the answer). They’re priced and chosen separately. Let’s start with generation, since that’s where the “which LLM?” anxiety lives.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Read this first.&lt;/strong&gt; Every price below is a &lt;strong&gt;2026 snapshot&lt;/strong&gt; and model names change almost monthly. Treat this table as a shape, not gospel — &lt;strong&gt;confirm the current number on the provider’s pricing page&lt;/strong&gt; before you commit. The &lt;em&gt;strategy&lt;/em&gt; (route cheap, escalate rarely) outlives any specific figure.&lt;/p&gt;&lt;/blockquote&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;colgroup&gt;
&lt;col&gt;
&lt;col&gt;
&lt;col&gt;
&lt;col&gt;
&lt;col&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Provider / model&lt;/p&gt;&lt;/th&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Input ($/1M)&lt;/p&gt;&lt;/th&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Output ($/1M)&lt;/p&gt;&lt;/th&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Free tier?&lt;/p&gt;&lt;/th&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Best for&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Google Gemini Flash-Lite&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$0.10–0.25&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$0.40–1.50&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes — generous&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Learning &amp;amp; high volume; the default starter&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Groq · Llama 3.1 8B&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$0.05&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$0.08&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Blazing-fast responses; cheapest tokens&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;DeepSeek&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$0.14&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$0.28&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;No (cheap)&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Cheapest frontier-class; OpenAI-compatible API&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;OpenAI · GPT mini-tier&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$0.15–0.75&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$0.60–4.50&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Credits&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Strong all-rounder; great tool calling&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Anthropic · Claude Haiku&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$1.00&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$5.00&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;No&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Cheapest Claude; reliable instruction-following&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Frontier (GPT / Claude / Gemini Pro)&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$2.50–5.00&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$15–30&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;No&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Final answer only, when quality truly matters&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;p&gt;The pattern jumps out: the cheapest models are &lt;strong&gt;50–100× cheaper&lt;/strong&gt; than the flagships. For a RAG app, most of your token spend is feeding retrieved context into the model — so a cheap, capable model handling that bulk is the entire cost game. Use a frontier model only for the final synthesis, and only if you can measure that it’s actually better for your task.&lt;/p&gt;
&lt;h3&gt;Self-hosted: running models on your own machine&lt;/h3&gt;
&lt;p&gt;You can skip API bills entirely with &lt;strong&gt;Ollama&lt;/strong&gt;, which runs open models locally and exposes an OpenAI-compatible endpoint at &lt;code&gt;http://localhost:11434/v1&lt;/code&gt; — meaning your code barely changes. One command pulls and runs a model:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# install from ollama.com, then:
ollama run llama3.1:8b          # chat model, ~6-8 GB VRAM
ollama pull nomic-embed-text    # local embedding model, free&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Under the hood.&lt;/strong&gt; Rough VRAM at Q4 quantization: 7–8B models ≈ 6–8 GB, 14B ≈ 10–12 GB, 32B ≈ 20–22 GB, 70B ≈ 43–48 GB (Apple Silicon unified memory counts fully). &lt;strong&gt;Break-even vs hosted APIs is roughly 500K tokens/day of sustained traffic&lt;/strong&gt; — below that, hosted is cheaper &lt;em&gt;and&lt;/em&gt; you skip the ops. Trade-offs: full privacy and $0/token, but weaker reasoning than frontier models and you own the uptime.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3&gt;The embedding model — quieter, but it matters&lt;/h3&gt;
&lt;p&gt;Embeddings are dramatically cheaper than generation, so this is an easy call. For most projects, &lt;strong&gt;OpenAI’s &lt;/strong&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt; is the sweet spot.&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;colgroup&gt;
&lt;col&gt;
&lt;col&gt;
&lt;col&gt;
&lt;col&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Model&lt;/p&gt;&lt;/th&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Dimensions&lt;/p&gt;&lt;/th&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Price ($/1M)&lt;/p&gt;&lt;/th&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Notes&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;OpenAI text-embedding-3-small&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;1536&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$0.02&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Best balance; our pick for the build&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Google gemini-embedding&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;768&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Free tier / ~$0.025&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Free-tier friendly&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Voyage (voyage-3.5-lite)&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;512–1024 (reducible)&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;~$0.02&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Now MongoDB-owned; long context&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;nomic-embed-text / BGE-M3 (open)&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;768 / 1024&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Free (self-host)&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Run free in Ollama; great quality&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Gotcha that bites everyone.&lt;/strong&gt; Vectors from different embedding models are &lt;strong&gt;not compatible&lt;/strong&gt;. If you switch embedding models later, you must &lt;em&gt;re-embed your entire corpus&lt;/em&gt;. Pick one and commit. (Embedding 10M chunks with &lt;code&gt;text-embedding-3-small&lt;/code&gt; costs only ~$100, so this is about consistency, not cost.)&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Our choices for this build:&lt;/strong&gt; embeddings via &lt;code&gt;text-embedding-3-small&lt;/code&gt;; generation via a cheap, fast model (Gemini Flash or a GPT mini-tier model) while learning — swappable in one line thanks to the SDK we’re about to set up.&lt;/p&gt;
&lt;h2&gt;Step 3 · Setting up the MERN stack&lt;/h2&gt;
&lt;p&gt;MERN is a natural fit for RAG in 2026 for one reason that didn’t used to be true: &lt;strong&gt;MongoDB does vector search natively&lt;/strong&gt;. Your embeddings live in the same documents as your data, queried with a normal aggregation pipeline. No separate vector database to run, sync, or pay for.&lt;/p&gt;
&lt;p&gt;Here’s the shape of the app — a standard MERN split, with the LLM logic living safely on the server:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MongoDB Atlas&lt;/strong&gt; — stores documents, chunks, embeddings, and chat history. Free &lt;code&gt;M0&lt;/code&gt; tier includes vector search.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Express + Node.js&lt;/strong&gt; — the API: handles uploads, embedding, retrieval, and talking to the LLM. &lt;em&gt;All API keys live here, never in the browser.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;React&lt;/strong&gt; — the chat UI, rendering streamed tokens as they arrive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The glue: the Vercel AI SDK&lt;/strong&gt; — one library for streaming, provider-switching, and tool calling, on both server and client.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;The one piece of setup that’s new: the vector index&lt;/h3&gt;
&lt;p&gt;After creating a free cluster on Atlas, you define a &lt;strong&gt;vector search index&lt;/strong&gt; on the collection that will hold your chunks. This tells MongoDB how to search the embedding field. In the Atlas UI (Atlas Search → Create Index → JSON editor), or via code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    { "type": "filter", "path": "userId" }
  ]
}&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Under the hood.&lt;/strong&gt; &lt;code&gt;numDimensions&lt;/code&gt; must &lt;em&gt;exactly&lt;/em&gt; match your embedding model’s output (1536 for &lt;code&gt;text-embedding-3-small&lt;/code&gt;). &lt;code&gt;similarity&lt;/code&gt; can be &lt;code&gt;cosine&lt;/code&gt;, &lt;code&gt;euclidean&lt;/code&gt;, or &lt;code&gt;dotProduct&lt;/code&gt; — cosine is the safe default. The &lt;code&gt;filter&lt;/code&gt; field on &lt;code&gt;userId&lt;/code&gt; is what lets you scope searches per-user, so visitors only ever retrieve their own documents — essential the moment your demo is public. Atlas uses HNSW for approximate nearest-neighbor search under the hood.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;That’s the only “AI-specific infrastructure” in the whole project. Everything else is ordinary Express and React.&lt;/p&gt;
&lt;h2&gt;Step 4 · Ingesting documents: upload, parse, chunk&lt;/h2&gt;
&lt;p&gt;When a user uploads a file, three things happen on the server: we accept the upload, extract its text, and split that text into bite-sized &lt;strong&gt;chunks&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Accepting uploads is standard Express (use &lt;code&gt;multer&lt;/code&gt;). Extracting text from a PDF is a one-liner with &lt;code&gt;pdf-parse&lt;/code&gt; (v2 fork, TypeScript-native — see note in code):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { PDFParse } from "pdf-parse"; // requires the v2 fork, not the default pdf-parse@1

// `buffer` is the uploaded file from multer
const parser = new PDFParse({ data: buffer });
const { text } = await parser.getText();&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Why we chunk — and how big&lt;/h3&gt;
&lt;p&gt;You can’t embed an entire 50-page PDF as one vector; the meaning gets blurred into mush, and you’d feed the model far more than it needs. So we slice the text into passages. Each chunk becomes one searchable unit.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// ~1 token = ~4 characters, so 2000 chars = ~500 tokens.
// We overlap chunks so a sentence split across a boundary
// still appears whole in at least one chunk.
function chunkText(text, size = 2000, overlap = 200) {
  const chunks = [];
  for (let i = 0; i &amp;lt; text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size).trim());
  }
  return chunks.filter(Boolean);
}&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;In plain English.&lt;/strong&gt; Think of chunking like cutting a long article into index cards. Too big and each card covers too many topics to be useful; too small and you lose context. A few hundred words per card, with a little overlap so sentences don’t get sliced in half, is the reliable starting point.&lt;/p&gt;&lt;/blockquote&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Under the hood.&lt;/strong&gt; Start with recursive character splitting at &lt;strong&gt;~400–512 tokens with 10–20% overlap&lt;/strong&gt; — the pragmatic default (~85–90% retrieval recall). &lt;em&gt;Semantic chunking&lt;/em&gt; can add ~2–3% recall but costs roughly 14× more to index, so only graduate to it when your evaluation metrics demand it. One caveat: at least one 2026 analysis found overlap added no measurable benefit in its setup while raising indexing cost — so treat the overlap figure as a starting point to validate against your own data, not a law.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h2&gt;Step 5 · Embedding &amp;amp; storing vectors in MongoDB&lt;/h2&gt;
&lt;p&gt;Now we turn each chunk into a vector and save it. The AI SDK’s &lt;code&gt;embedMany&lt;/code&gt; batches the whole array efficiently, then we write one MongoDB document per chunk — text and vector together, tagged with the owner and source.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";

const chunks = chunkText(text);                 // from Step 4

const { embeddings } = await embedMany({
  model: openai.embedding("text-embedding-3-small"),
  values: chunks,                               // array of strings
});

await db.collection("chunks").insertMany(
  chunks.map((chunk, i) =&amp;gt; ({
    userId,                                     // who owns it
    source: filename,                           // where it came from
    text: chunk,                                // the passage itself
    embedding: embeddings[i],                   // the 1536-dim vector
    chunkIndex: i,
    createdAt: new Date(),
  }))
);&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That’s ingestion done. The vector is just an array of floats stored on a normal document — no special database, no migration. Upload a 30-page PDF and you’ve got a few dozen searchable, meaning-aware chunks sitting in MongoDB.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Under the hood.&lt;/strong&gt; &lt;code&gt;embedMany&lt;/code&gt; auto-batches large arrays, so you can hand it hundreds of chunks without managing request limits yourself. Store rich metadata (page number, section heading, document ID) alongside each chunk now — you’ll want it later for citations, filtering, and “parent-document” retrieval. This is the step you run once per upload, not once per question.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h2&gt;Step 6 · Retrieval: finding the right passages&lt;/h2&gt;
&lt;p&gt;Here’s the payoff for all that setup. To answer a question, we embed the question with the &lt;em&gt;same&lt;/em&gt; model, then run a single &lt;code&gt;$vectorSearch&lt;/code&gt; aggregation to pull the closest chunks. This is the whole of “search by meaning,” in one query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { embed } from "ai";
import { openai } from "@ai-sdk/openai";

const { embedding } = await embed({
  model: openai.embedding("text-embedding-3-small"),
  value: userQuestion,
});

const passages = await db.collection("chunks").aggregate([
  {
    $vectorSearch: {
      index: "vector_index",
      path: "embedding",
      queryVector: embedding,
      numCandidates: 150,        // over-fetch, then narrow
      limit: 5,                  // keep the best 5
      filter: { userId: { $eq: currentUserId } }
    }
  },
  {
    $project: {
      _id: 0,
      text: 1,
      source: 1,
      score: { $meta: "vectorSearchScore" }
    }
  }
]).toArray();&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You now have the five most relevant passages, each with a similarity &lt;code&gt;score&lt;/code&gt;. Feed those into the model as context and you have working RAG. But before we generate, two notes that separate a toy from something good:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Under the hood.&lt;/strong&gt; &lt;code&gt;$vectorSearch&lt;/code&gt; must be the &lt;strong&gt;first&lt;/strong&gt; stage in the pipeline. &lt;code&gt;numCandidates&lt;/code&gt; is the approximate-search breadth — it must be ≥ &lt;code&gt;limit&lt;/code&gt;, and a common heuristic is 10–20× your limit (here 150 for a limit of 5). The &lt;code&gt;filter&lt;/code&gt; on &lt;code&gt;userId&lt;/code&gt; uses the field we declared in the index, enforcing per-user isolation efficiently. Use the &lt;code&gt;score&lt;/code&gt; as a relevance gate: if the top result scores below ~0.75, it’s often better to answer “I don’t have enough information” than to let the model hallucinate from weak matches.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The single biggest quality upgrade you’ll make later isn’t a bigger model — it’s &lt;strong&gt;hybrid search + reranking&lt;/strong&gt; (we cover it in “Where to go next”). For now, vector search alone is plenty to ship.&lt;/p&gt;
&lt;h2&gt;Step 7 · Making retrieval agentic&lt;/h2&gt;
&lt;p&gt;So far the server retrieves &lt;em&gt;before&lt;/em&gt; calling the model — a fixed pipeline. To make it agentic, we flip the control: we describe retrieval as a &lt;strong&gt;tool&lt;/strong&gt;, hand it to the model, and let the model decide when (and how often) to call it. This is the “hot” part of 2026 — and with the AI SDK it’s remarkably little code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { streamText, tool, embed, stepCountIs } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// 1) Retrieval, described as a tool the model can call
const searchDocuments = tool({
  description: "Search the user's uploaded documents for passages " +
               "relevant to a question. Call this whenever you need facts.",
  inputSchema: z.object({
    query: z.string().describe("a focused search query"),
  }),
  execute: async ({ query }) =&amp;gt; {
    const { embedding } = await embed({
      model: openai.embedding("text-embedding-3-small"),
      value: query,
    });
    return db.collection("chunks").aggregate([
      { $vectorSearch: {
          index: "vector_index", path: "embedding",
          queryVector: embedding, numCandidates: 150, limit: 5,
          filter: { userId: { $eq: currentUserId } } } },
      { $project: { _id: 0, text: 1, source: 1,
          score: { $meta: "vectorSearchScore" } } },
    ]).toArray();
  },
});

// 2) Let the model run the loop: think -&amp;gt; search -&amp;gt; (search again) -&amp;gt; answer
const result = streamText({
  model: openai("gpt-4o-mini"),     // swap to any provider in one line
  system: "Answer ONLY using passages returned by searchDocuments. " +
          "Cite the source. If the passages don't contain the answer, " +
          "say you don't know - do not guess.",
  messages,               // from req.body — full chat history from the client
  tools: { searchDocuments },
  stopWhen: stepCountIs(5),         // hard cap on the agentic loop
});

return result.toUIMessageStreamResponse();&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Read what that does, because it’s genuinely different from classic RAG: the model receives the question, decides on its own to call &lt;code&gt;searchDocuments&lt;/code&gt; with a query &lt;em&gt;it&lt;/em&gt; wrote, reads the results, and may call it again with a refined query before answering. For “compare the 2024 and 2025 refund policies,” it can naturally run two searches and synthesize. You didn’t orchestrate that — the model did.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Important 2026 change.&lt;/strong&gt; If you learned the AI SDK before v5: &lt;code&gt;maxSteps&lt;/code&gt; was removed from the client. Multi-step tool loops are now controlled &lt;strong&gt;server-side&lt;/strong&gt; with &lt;code&gt;stopWhen&lt;/code&gt; (e.g. &lt;code&gt;stepCountIs(5)&lt;/code&gt;). This cap is also your cost safety rail — without it, a confused agent could loop and run up your bill.&lt;/p&gt;&lt;/blockquote&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Under the hood.&lt;/strong&gt; The &lt;code&gt;system&lt;/code&gt; prompt is doing heavy lifting for safety and grounding: “answer only from retrieved passages” plus “say you don’t know” is your first and cheapest defense against hallucination. The Zod &lt;code&gt;inputSchema&lt;/code&gt; gives the model a typed contract for the tool’s arguments. &lt;code&gt;toUIMessageStreamResponse()&lt;/code&gt; emits a standard SSE stream the React client consumes natively. Want it reusable across other AI clients (Claude Desktop, Cursor, etc.)? Expose this same retrieval as an &lt;strong&gt;MCP&lt;/strong&gt; server — overkill for a single app, but the natural next step if your tools should be shared.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h2&gt;Step 8 · Streaming to React, then deploying for free&lt;/h2&gt;
&lt;p&gt;The backend streams tokens; the frontend renders them as they land. The AI SDK’s &lt;code&gt;useChat&lt;/code&gt; hook handles the entire streaming lifecycle, so your component stays tiny:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;"use client";
import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport } from "ai";
import { useState } from "react";

export default function Chat() {
  const [input, setInput] = useState("");
  const { messages, sendMessage, status } = useChat({
    transport: new DefaultChatTransport({ api: "/api/chat" }),
  });

  return (
    &amp;lt;div&amp;gt;
      {messages.map((m) =&amp;gt; (
        &amp;lt;div key={m.id}&amp;gt;
          &amp;lt;strong&amp;gt;{m.role}: &amp;lt;/strong&amp;gt;
          {m.parts.map((p, i) =&amp;gt;
            p.type === "text" ? &amp;lt;span key={i}&amp;gt;{p.text}&amp;lt;/span&amp;gt; : null
          )}
        &amp;lt;/div&amp;gt;
      ))}

      &amp;lt;input value={input} onChange={(e) =&amp;gt; setInput(e.target.value)} /&amp;gt;
      &amp;lt;button
        disabled={status !== "ready"}
        onClick={() =&amp;gt; { sendMessage({ text: input }); setInput(""); }}
      &amp;gt;
        {status === "streaming" ? "Thinking..." : "Ask"}
      &amp;lt;/button&amp;gt;
    &amp;lt;/div&amp;gt;
  );
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;status&lt;/code&gt; field (&lt;code&gt;ready&lt;/code&gt; / &lt;code&gt;submitted&lt;/code&gt; / &lt;code&gt;streaming&lt;/code&gt; / &lt;code&gt;error&lt;/code&gt;) gives you loading and disabled states for free. Tokens appear live as the model writes them — the experience people now expect from any AI app.&lt;/p&gt;
&lt;h3&gt;Putting it on its own website — for free&lt;/h3&gt;
&lt;p&gt;This is the part that turns a tutorial into a portfolio piece. Three boxes, three free tiers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Vercel&lt;/strong&gt; — React frontend, free Hobby tier, global CDN, custom domain, auto-deploy from GitHub.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Render&lt;/strong&gt; — Node/Express backend, free web service (sleeps when idle), where streaming and keys live.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MongoDB Atlas M0&lt;/strong&gt; — database + vector search, permanently free, 512 MB, limited Vector Search index capacity (verify current limits in Atlas docs).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Total cost for a low-traffic demo: &lt;strong&gt;$0–$5/month&lt;/strong&gt;.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Under the hood.&lt;/strong&gt; Run the streaming endpoint on the long-lived backend (Render/Railway), not a short serverless function that can time out mid-stream. Know the free-tier edges: Render free services &lt;strong&gt;spin down after ~15 min idle&lt;/strong&gt; (a ~30–60s cold start on the next request — fine for a portfolio), and Atlas M0 caps at &lt;strong&gt;512 MB and limited Vector Search index capacity&lt;/strong&gt;. When you outgrow them, a dedicated Atlas tier and an always-on backend plan are the upgrade path.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h4&gt;Cost controls so you never get a surprise bill&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Set a &lt;strong&gt;hard spend cap&lt;/strong&gt; in your LLM provider dashboard. Non-negotiable for a public demo.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Default to a cheap model; cap &lt;code&gt;max_output_tokens&lt;/code&gt;; keep the agentic &lt;code&gt;stepCountIs&lt;/code&gt; low.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add &lt;strong&gt;per-user / per-IP rate limiting&lt;/strong&gt; and an auth wall so bots can’t drain your quota.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Log token usage per request (the SDK’s &lt;code&gt;onFinish&lt;/code&gt; callback) so you can see costs before they surprise you.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep &lt;strong&gt;every API key on the server&lt;/strong&gt;. A key shipped to the browser is a key that will be stolen.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Common pitfalls (and the fixes)&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The semantic gap.&lt;/strong&gt; Your question and the document use different words and vector search misses the match. &lt;em&gt;Fix:&lt;/em&gt; add hybrid search (keyword + vector).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context dilution.&lt;/strong&gt; You retrieve 10 chunks when only 2 are relevant, and the noise degrades the answer. &lt;em&gt;Fix:&lt;/em&gt; rerank, then keep a tighter top-k.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Chunk-boundary amnesia.&lt;/strong&gt; The answer is split across two chunks and neither is retrieved whole. &lt;em&gt;Fix:&lt;/em&gt; overlap, or parent-document retrieval.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Confident nonsense.&lt;/strong&gt; The model answers from weak matches as if certain. &lt;em&gt;Fix:&lt;/em&gt; a similarity-score threshold plus a system prompt that permits “I don’t know.”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reaching for the agent loop too early.&lt;/strong&gt; Simple lookups don’t need multi-hop reasoning — they need one fast search. &lt;em&gt;Fix:&lt;/em&gt; gate the agentic path to genuinely complex questions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Where to go next&lt;/h2&gt;
&lt;p&gt;You’ve shipped a working agentic RAG app. Three upgrades, in priority order:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hybrid search + reranking&lt;/strong&gt; — the highest-ROI quality jump. Run keyword (Atlas full-text via &lt;code&gt;$search&lt;/code&gt;) &lt;em&gt;and&lt;/em&gt; vector search, fuse them with Reciprocal Rank Fusion (RRF), then rerank the top 20–50 candidates with a cross-encoder (Cohere, Voyage, or self-hosted BGE) and keep the best handful. Benchmarks routinely show reranking as the single biggest accuracy gain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Better ingestion&lt;/strong&gt; — messy PDFs with tables and multi-column layouts need a real parser (LlamaIndex’s LiteParse, Unstructured, or Docling) rather than plain text extraction.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Make it shareable via MCP&lt;/strong&gt; — expose your retrieval as a Model Context Protocol server so other AI clients can use the same tool. Worth it once your tools outlive this one app.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That’s the whole arc: from “an LLM can’t see my data” to a public, agentic, document-grounded chat app that costs about nothing to run. The plumbing finally got out of the way — what you build on top of it is the interesting part. Now go put something on that empty portfolio URL.&lt;/p&gt;
&lt;h2&gt;Frequently asked questions&lt;/h2&gt;
&lt;h3&gt;What is agentic RAG, exactly?&lt;/h3&gt;
&lt;p&gt;Agentic RAG turns retrieval into a tool the model calls on demand. Instead of one fixed “retrieve then answer” pass, the model plans, searches, judges whether the results are sufficient, and searches again until it has enough evidence — then answers. It’s slower and costs more tokens, but it handles complex, multi-step questions that one-shot RAG can’t.&lt;/p&gt;
&lt;h3&gt;Do I need a separate vector database?&lt;/h3&gt;
&lt;p&gt;No. On the MERN stack you store embeddings inside your normal MongoDB documents and query them with the &lt;code&gt;$vectorSearch&lt;/code&gt; aggregation stage in MongoDB Atlas. For the vast majority of projects, that removes the need for a dedicated vector database entirely.&lt;/p&gt;
&lt;h3&gt;What’s the cheapest LLM for a RAG app in 2026?&lt;/h3&gt;
&lt;p&gt;For learning, the most generous free tiers are Google Gemini Flash/Flash-Lite and Groq. For the cheapest paid frontier-class model, DeepSeek is usually lowest. Prices change monthly — confirm on the provider’s pricing page. The durable strategy is to route bulk work to a cheap model and reserve a frontier model for the final answer only.&lt;/p&gt;
&lt;h3&gt;How much does it cost to run?&lt;/h3&gt;
&lt;p&gt;A portfolio-grade demo runs at roughly $0–$5/month: MongoDB Atlas free M0, a free LLM tier, embeddings at ~$0.02 per million tokens, and free deploy tiers on Vercel and Render. Set a provider spend cap and rate limits so a public demo can’t surprise you.&lt;/p&gt;
&lt;h3&gt;LangChain, LlamaIndex, or the Vercel AI SDK?&lt;/h3&gt;
&lt;p&gt;For a MERN streaming chat app, the Vercel AI SDK plus direct MongoDB vector queries is the lighter, recommended path in 2026. Reach for LlamaIndex.TS if your main challenge is heavy document ingestion, or LangChain.js/LangGraph for complex multi-agent orchestration. For ~90% of RAG web apps, the AI SDK is the right call.&lt;/p&gt;
&lt;h3&gt;Can I run this fully offline with a local model?&lt;/h3&gt;
&lt;p&gt;Yes. Ollama runs open models locally and exposes an OpenAI-compatible endpoint, so your code barely changes. Use a local embedding model like &lt;code&gt;nomic-embed-text&lt;/code&gt; too. It’s ideal for development and privacy-sensitive data; for a low-traffic public demo, hosted free tiers are usually simpler and cheaper.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;A note on accuracy.&lt;/strong&gt; LLM pricing, model names, and SDK APIs change fast. Every figure here is a 2026 snapshot — verify current prices on each provider’s pricing page and current method names in the Vercel AI SDK docs before shipping to production. Code samples are illustrative walkthroughs, not drop-in files. Images: “Visualising AI” by Google DeepMind on Unsplash, free under the Unsplash License.&lt;/p&gt;&lt;/blockquote&gt;

&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You can build and ship a production-shaped, &lt;strong&gt;agentic RAG “chat with your documents”&lt;/strong&gt; app entirely on &lt;strong&gt;MERN&lt;/strong&gt; for about &lt;strong&gt;$0/month&lt;/strong&gt;: MongoDB Atlas’ free tier (with built-in vector search), a free LLM tier (Gemini Flash or Groq), embeddings at &lt;strong&gt;$0.02 per million tokens&lt;/strong&gt;, and free deploys on Vercel + Render.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The 2026 stack is leaner than you think: &lt;strong&gt;MongoDB Atlas Vector Search&lt;/strong&gt; (no separate vector database), the &lt;strong&gt;Vercel AI SDK&lt;/strong&gt; for streaming + tool calling, and &lt;strong&gt;“agentic retrieval”&lt;/strong&gt; — where the model itself decides when and what to search — instead of the old retrieve-once-then-answer pipeline.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pick models by job, not by brand.&lt;/strong&gt; Route cheap, high-volume work to Gemini Flash-Lite, DeepSeek, or Groq-hosted Llama; reserve a frontier model only for the final answer when quality matters. Self-hosting with Ollama only beats hosted APIs above heavy, sustained traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;By the end you’ll understand embeddings, chunking, vector retrieval, tool calling, token streaming in React, and how to put the whole thing on its own public website — with cost controls so you never get a surprise bill.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


</description>
      <category>rag</category>
      <category>agenticai</category>
      <category>llm</category>
      <category>mongodbatlas</category>
    </item>
    <item>
      <title>The Best Claude Setup (That Works on Any AI Tool)</title>
      <dc:creator>Harshdeep Singh</dc:creator>
      <pubDate>Thu, 04 Jun 2026 23:16:05 +0000</pubDate>
      <link>https://dev.to/harshdeepsingh13/the-best-claude-setup-that-works-on-any-ai-tool-5h7i</link>
      <guid>https://dev.to/harshdeepsingh13/the-best-claude-setup-that-works-on-any-ai-tool-5h7i</guid>
      <description>&lt;p&gt;Here is a confession that might sound odd in a guide about setting up Claude Code: the goal is not to marry Claude Code. The goal is to build a setup so portable that if something better ships next month — OpenAI's Codex, Cursor, Windsurf, whatever wins the week — you could pack up everything you have built and move in an afternoon. Your instructions, your custom tools, your reusable workflows: all of it should come with you.&lt;/p&gt;
&lt;p&gt;That sounds like a strange thing to optimize for. Most "best setup" posts try to lock you deeper into one tool. But the AI coding world is moving too fast for that bet to be safe, and — happily — a small set of open standards now make portability the default instead of a fantasy. If you are a working engineer, this is how you stop re-plumbing your environment every time you switch editors. And if you are a &lt;em&gt;vibe coder&lt;/em&gt; — someone who builds mostly by describing what you want in plain English and steering the AI, a style Andrej Karpathy named in early 2025 — this is how you get a professional-grade setup without months of fiddling.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The short version: keep your AI's instructions, tools, and skills in plain, open formats you own — not buried in one vendor's settings. Three standards make this work: MCP for tools, AGENTS.md for instructions, and Agent Skills for reusable know-how. Learn those, and the question "which coding agent should I use?" stops being scary.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Three standards do the heavy lifting here, and we will spend most of our time on them: &lt;span&gt;MCP&lt;/span&gt; &lt;span&gt;AGENTS.md&lt;/span&gt; &lt;span&gt;Agent Skills&lt;/span&gt;. Let's build up to a setup you would actually be happy to leave.&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1587654780291-39c9404d746b%3Fauto%3Dformat%26fit%3Dcrop%26w%3D1600%26q%3D80" alt="Yellow, red, blue, and green LEGO bricks scattered on a surface, representing modular, interchangeable building blocks" width="1600" height="1067"&gt;&lt;p&gt;&lt;em&gt;Think of your setup as interchangeable bricks, not a sculpture glued to one base. Photo by &lt;/em&gt;&lt;a rel="noopener noreferrer nofollow" href="https://unsplash.com/@xavi_cabrera"&gt;&lt;em&gt;Xavi Cabrera&lt;/em&gt;&lt;/a&gt;&lt;em&gt; on &lt;/em&gt;&lt;a rel="noopener noreferrer nofollow" href="https://unsplash.com/"&gt;&lt;em&gt;Unsplash&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Why portability is the whole game now&lt;/h2&gt;
&lt;p&gt;In the last 18 months, more than a dozen serious AI coding agents have shipped: Claude Code, OpenAI Codex, Cursor, Windsurf, Zed's agent, Aider, Cline, Continue, Gemini CLI, and more. Each one claims to be the fastest or the smartest. Some genuinely leapfrog the others — for a few weeks, until the next release.&lt;/p&gt;
&lt;p&gt;Here is the trap. If you pour weeks into one tool — memorizing its config files, hand-tuning its rules, wiring up its integrations — you have quietly built a switching cost. The day a clearly better tool arrives, you do the math on re-learning everything and you stay put, not because your tool is best, but because leaving hurts. That is the real lock-in. It is not a contract; it is your own sunk effort.&lt;/p&gt;
&lt;p&gt;The fix is to treat the agent as a replaceable part and invest in the layer underneath it. The engineer Geoffrey Huntley has made this point sharply: the agents themselves are becoming commodities, and the durable advantage is the standards layer you build around them. Put your effort there, and any agent becomes a front-end you can swap.&lt;/p&gt;
&lt;p&gt;An analogy: remember when every phone had its own charger, and switching brands meant a drawer full of dead cables? USB-C fixed that by agreeing on one shape. You buy a charger once; it works with the next phone, and the one after. The standards below are USB-C for your AI setup. Learn them once, and your tools become things you plug into — not things you are wired into.&lt;/p&gt;
&lt;h2&gt;The three open standards that set you free&lt;/h2&gt;
&lt;p&gt;You do not need to memorize specs. You need to understand what each standard &lt;em&gt;is for&lt;/em&gt;, because once you see the shape of the problem each one solves, the portability falls out naturally. They split cleanly: one is for tools, one is for instructions, one is for skills.&lt;/p&gt;
&lt;h3&gt;MCP — one way to plug in tools&lt;/h3&gt;
&lt;p&gt;The Model Context Protocol (MCP) is an open standard, created and open-sourced by Anthropic in November 2024, for connecting an AI to outside tools and data — your database, your GitHub, your Notion, your company's internal services.&lt;/p&gt;
&lt;p&gt;If you have used a code editor, you have already benefited from this idea. Editors once needed custom code to support each programming language, until the Language Server Protocol let any editor talk to any language through one shared interface. MCP does the same trick for AI and tools. Instead of every AI app writing a custom integration for every service — an N-times-M explosion — you write one MCP &lt;strong&gt;server&lt;/strong&gt; for a service, and every MCP-compatible &lt;strong&gt;host&lt;/strong&gt; can use it. The math collapses from N times M down to N plus M.&lt;/p&gt;
&lt;p&gt;Mechanically, a host (Claude Code, Cursor, ChatGPT, and others) runs a client that talks to your server over one of two transports: &lt;code&gt;stdio&lt;/code&gt; for a local tool running as a subprocess, or streamable HTTP for a remote service. The server exposes tools, read-only resources, and prompt templates; the host stays in charge of what the model is actually allowed to touch. Crucially, the configuration looks almost identical across tools — a small block naming each server:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    },
    "github": {
      "url": "https://api.githubcopilot.com/mcp/",
      "headers": { "Authorization": "Bearer ${GITHUB_TOKEN}" }
    }
  }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Move from Claude Code to Cursor or Codex and you copy that block over, rename one key if needed, and your tools come with you. One safety note worth tattooing on your brain: an MCP server can run code and reach real systems, so only install servers you trust, keep secrets in environment variables like the &lt;code&gt;${GITHUB_TOKEN}&lt;/code&gt; above rather than hard-coding them, and review what each tool is permitted to do.&lt;/p&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1545987796-200677ee1011%3Fauto%3Dformat%26fit%3Dcrop%26w%3D1600%26q%3D80" alt="Low-angle photograph of an interconnected lattice of nodes and edges, representing a network of clients and servers" width="1600" height="1067"&gt;&lt;p&gt;&lt;em&gt;MCP is the wiring: many tools, one shared protocol. Photo by &lt;/em&gt;&lt;a rel="noopener noreferrer nofollow" href="https://unsplash.com/@alinnnaaaa"&gt;&lt;em&gt;Alina Grubnyak&lt;/em&gt;&lt;/a&gt;&lt;em&gt; on &lt;/em&gt;&lt;a rel="noopener noreferrer nofollow" href="https://unsplash.com/"&gt;&lt;em&gt;Unsplash&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;AGENTS.md — a README for your AI&lt;/h3&gt;
&lt;p&gt;Every project has unspoken rules: how to install it, how to run the tests, which folders are off-limits, what "good code" means here. A human teammate learns these over weeks. An AI agent starts fresh every session and will happily reinvent your conventions unless you write them down.&lt;/p&gt;
&lt;p&gt;That is what &lt;code&gt;AGENTS.md&lt;/code&gt; is — a plain Markdown file at the root of your repo that tells any coding agent how to work on your project. No special syntax, no required fields. Think of it as a README written for your AI instead of for a human. It was introduced alongside OpenAI's Codex and is now supported by Codex, Cursor, Aider, Cline, Windsurf, and others — with Gemini CLI requiring a GEMINI.md symlink as described below. A good one leads with commands, because the agent refers back to them constantly:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Project: Acme API

## Commands
- Install: `pnpm install`
- Dev server: `pnpm dev`
- Test (run before every PR): `pnpm test`
- Lint &amp;amp; typecheck: `pnpm lint &amp;amp;&amp;amp; pnpm typecheck`

## Conventions
- TypeScript strict mode. Never use `any`.
- Do not edit files in `/generated` — they are built from schemas.
- Write a test for every new endpoint.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here is the one wrinkle on the Claude side. Anthropic's tool uses its own file, &lt;code&gt;CLAUDE.md&lt;/code&gt;, and as of mid-2026 Claude Code does not read &lt;code&gt;AGENTS.md&lt;/code&gt; natively — a gap the community has been loudly requesting for months. The portable move is to keep &lt;strong&gt;one&lt;/strong&gt; source of truth in &lt;code&gt;AGENTS.md&lt;/code&gt; and point the others at it with a symlink, so you never maintain the same rules twice:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# One canonical file, linked everywhere your tools look
ln -s AGENTS.md CLAUDE.md
ln -s AGENTS.md GEMINI.md&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One file, every tool, zero duplication. One caution worth knowing: do not blindly auto-generate this file and walk away. Research has consistently shown that bloated, auto-generated instruction files tend to hurt more than they help — the important rules get buried in noise and the agent learns to half-ignore them. Keep it short, hand-curated, and honest about what is genuinely non-obvious.&lt;/p&gt;
&lt;h3&gt;Agent Skills — teach once, reuse everywhere&lt;/h3&gt;
&lt;p&gt;The third standard is the one that feels like magic the first time it clicks. An Agent Skill is a folder containing a &lt;code&gt;SKILL.md&lt;/code&gt; file — Markdown instructions for a specific task — plus any optional scripts or reference docs that task needs. Anthropic's own framing is the clearest: building a skill is like writing an onboarding guide for a new hire. You capture how to do something once, and the agent follows it forever after.&lt;/p&gt;
&lt;p&gt;The clever part is &lt;em&gt;progressive disclosure&lt;/em&gt;. At startup the agent only reads each skill's name and one-line description — a few dozen words — so a hundred skills cost almost nothing. Only when a task actually matches does it open the full instructions, and only then does it reach for the bundled scripts. Your context window stays clean until the moment a skill is relevant.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pdf-form-filler/
├── SKILL.md          # name + description + instructions
├── scripts/          # code the agent runs (stays out of context)
│   └── fill.py
├── references/       # long docs, loaded only when needed
└── assets/           # templates, fonts, icons&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because a skill is just Markdown and files in a folder, it is inherently portable — you can point Codex or Gemini CLI at a skills folder, tell it to read the &lt;code&gt;SKILL.md&lt;/code&gt;, and it simply works. Agent Skills are now supported across a growing list of tools: Claude Code, Codex, Cursor, Gemini CLI, and more. Write your "how we deploy" or "how we write migrations" skill once, and it travels.&lt;/p&gt;
&lt;p&gt;So when do you reach for which? They are not competitors; they answer different questions. Here is the cheat sheet:&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;colgroup&gt;
&lt;col&gt;
&lt;col&gt;
&lt;col&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Mechanism&lt;/p&gt;&lt;/th&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;What it gives the agent&lt;/p&gt;&lt;/th&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Reach for it when&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;&lt;strong&gt;Agent Skill&lt;/strong&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;A repeatable procedure — the &lt;em&gt;how&lt;/em&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;You keep re-explaining the same multi-step workflow&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;&lt;strong&gt;MCP server&lt;/strong&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;A connection to an outside system — the &lt;em&gt;reach&lt;/em&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;The agent needs live data or a third-party API&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;&lt;strong&gt;Subagent&lt;/strong&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;A fresh worker with its own clean context&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;A job is large, or you want an independent reviewer&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;&lt;strong&gt;AGENTS.md&lt;/strong&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Always-on project facts and rules&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Conventions every session should already know&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;&lt;strong&gt;Hook&lt;/strong&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Enforcement that always runs, no exceptions&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Something &lt;em&gt;must&lt;/em&gt; happen — formatting, blocking secrets&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;Does "switch tomorrow" actually hold?&lt;/h2&gt;
&lt;p&gt;It is a fair thing to be skeptical about — a portability promise is only as good as the tools honoring it. So here is an honest snapshot of where the major agents stood in mid-2026 on all three standards. It is not perfect across the board, but it is good enough that moving your setup is a copy-and-tweak job, not a rebuild.&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;colgroup&gt;
&lt;col&gt;
&lt;col&gt;
&lt;col&gt;
&lt;col&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Agent&lt;/p&gt;&lt;/th&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;MCP&lt;/p&gt;&lt;/th&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;AGENTS.md&lt;/p&gt;&lt;/th&gt;
&lt;th colspan="1" rowspan="1"&gt;&lt;p&gt;Agent Skills&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Claude Code&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Via CLAUDE.md / symlink&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes, native&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;OpenAI Codex&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes (its home format)&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Cursor&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Windsurf&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Zed&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Via hosted agents&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Aider&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Limited&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Via conversion&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Cline&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Gemini CLI&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes (GEMINI.md)&lt;/p&gt;&lt;/td&gt;
&lt;td colspan="1" rowspan="1"&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;p&gt;The weak spots are real and worth naming — Aider's MCP support is limited, and a few tools only run skills through their cloud agents — but the spine holds. Your tools, instructions, and skills are written in formats more than one vendor understands.&lt;/p&gt;
&lt;h2&gt;Your portable setup, layer by layer&lt;/h2&gt;
&lt;p&gt;Now let's assemble it. The trick is two tiers: one that lives &lt;em&gt;with each project&lt;/em&gt; and travels in its Git repo, and one that is &lt;em&gt;personal to you&lt;/em&gt; and follows you across every machine and every project.&lt;/p&gt;
&lt;h4&gt;Tier 1 — the project repo (commit this)&lt;/h4&gt;
&lt;p&gt;At the root of each project, keep an &lt;code&gt;AGENTS.md&lt;/code&gt; as the single source of truth, with &lt;code&gt;CLAUDE.md&lt;/code&gt; and friends symlinked to it. Add an &lt;code&gt;.mcp.json&lt;/code&gt; for the tools that project needs (databases, issue trackers), with secrets pulled from environment variables. Put team-shared skills in a folder like &lt;code&gt;.agents/skills/&lt;/code&gt;. Commit all of it. The payoff: a teammate — or you, six months later — clones the repo and the AI is instantly productive, with the same rules and the same tools, no setup call required.&lt;/p&gt;
&lt;h4&gt;Tier 2 — your personal dotfiles&lt;/h4&gt;
&lt;p&gt;Your personal taste — how you like commit messages written, your favorite skills, your global tool configs — belongs in a dotfiles repo you carry everywhere. The reliable pattern is to keep one canonical copy of each file and symlink it into the locations each tool expects, using a tiny helper so a fresh machine is set up in seconds:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;safe_symlink() {
  local src=$1 dst=$2
  [ -L "$dst" ] &amp;amp;&amp;amp; rm "$dst"
  [ -e "$dst" ] &amp;amp;&amp;amp; mv "$dst" "$dst.bak"
  mkdir -p "$(dirname "$dst")"
  ln -s "$src" "$dst"
}

safe_symlink "$DOTFILES/AGENTS.md" "$HOME/.claude/CLAUDE.md"
safe_symlink "$DOTFILES/AGENTS.md" "$HOME/.codex/AGENTS.md"
safe_symlink "$DOTFILES/skills"    "$HOME/.claude/skills"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Tools like GNU Stow and chezmoi exist to manage exactly this, and chezmoi is the kinder choice if you bounce between macOS, Linux, and Windows. One Windows gotcha to save you an hour: symlinks there need Developer Mode or admin rights and &lt;code&gt;core.symlinks=true&lt;/code&gt; in Git, and mixing WSL and native-Windows symlinks does not work — pick one world and stay in it.&lt;/p&gt;
&lt;h2&gt;Advanced Claude Code moves that survive the move&lt;/h2&gt;
&lt;p&gt;Everything above keeps you portable. But within Claude Code there are sharper techniques worth knowing — and because they are built on the same standards and plain files, they travel with you too. The single mental model that explains most of them: Claude's context window fills up fast, and quality drops as it fills. Almost every good habit is really about protecting that space.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Subagents&lt;/strong&gt; are separate Claude instances with their own clean context and their own narrow tool permissions. Hand a big exploration or an independent code review to a subagent, and only its summary comes back — your main conversation stays uncluttered. A favorite pattern: after a long stretch of work, spin up a fresh reviewer subagent that sees only the final diff and the requirements, with no memory of the messy path that got there.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Slash commands and hooks&lt;/strong&gt; are where you encode habits. A custom command is a saved prompt you trigger by name. A hook is different and more powerful: it is a script that fires automatically at a set moment — and this is the key distinction — your instruction files only &lt;em&gt;suggest&lt;/em&gt; behavior, while a hook &lt;em&gt;guarantees&lt;/em&gt; it. Want code formatted every single time a file is written, no exceptions?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [{ "type": "command", "command": "pnpm prettier --write \"$CLAUDE_FILE_PATHS\"" }]
      }
    ]
  }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Plan mode&lt;/strong&gt; is the antidote to an agent that charges off in the wrong direction. Toggle it and Claude reads, analyzes, and proposes a plan &lt;em&gt;without touching your files&lt;/em&gt;. The workflow Anthropic recommends is simple and worth internalizing: explore, plan, implement, commit — in that order, with your eyes on the plan before any code is written.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Worktrees and headless mode&lt;/strong&gt; are for scale. Git worktrees let you run two to four Claude sessions in parallel on separate branches without collisions. Headless mode — the much-missed &lt;code&gt;claude -p&lt;/code&gt; flag — turns Claude into a command-line citizen you can pipe into and wire into CI:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Pipe an error log straight into a headless review
cat error.log | claude -p "Find the root cause and propose a fix"

# CI-safe: cap turns and restrict tools
claude -p "Run the test suite and fix failures" \
  --max-turns 12 \
  --allowedTools "Bash(pnpm test:*)" "Edit"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And one underused habit: ask Claude to explain its reasoning as it works — add "and briefly explain each decision" to your prompt. You absorb the why as you read the diff, not just the what. For a new codebase, this single habit is worth the extra few lines of output.&lt;/p&gt;
&lt;h2&gt;If you are a vibe coder, start here&lt;/h2&gt;
&lt;p&gt;If the section above felt like a lot, this part is for you. Vibe coding — building by describing and steering rather than hand-writing every line — is a real and legitimate way to ship. But there is an honest catch worth saying plainly: the impressive demo is the easy 80%. The boring 20% — real authentication, a real database, payments, deployment, security — is where projects quietly fall apart. Security researchers scanning AI-assisted apps throughout 2025 found troubling rates of exposed credentials — API keys hardcoded in frontends, admin passwords committed to public repos — almost all of it preventable with a single review step.&lt;/p&gt;
&lt;p&gt;So here is the short, high-leverage starter kit. None of it slows you down much, and all of it keeps you out of the ditch:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Treat the AI like a sharp junior engineer with tools, not a magic box. Build features in layers — get auth working, &lt;em&gt;then&lt;/em&gt; the database, &lt;em&gt;then&lt;/em&gt; payments — rather than asking for everything in one breath.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Give it a role that changes its priorities. "Act as a senior engineer who has been burned by sloppy payment code" produces noticeably more careful, defensive work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Always run an adversarial reviewer at the end — a fresh subagent that sees only the diff and is asked to find what is wrong. This one habit catches most of those exposed-secret disasters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Do not launch Claude Code from your home folder — it then has reach over your SSH keys and tokens. Work inside a dedicated project folder.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Turn on the "Learning" output style and let the tool explain itself. You will absorb the why, not just collect the what.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1732120195121-dcbe6db79197%3Fauto%3Dformat%26fit%3Dcrop%26w%3D1600%26h%3D720%26q%3D80" alt="Hands typing on a laptop in a cozy, plant-filled home office in warm daylight" width="1600" height="720"&gt;&lt;p&gt;&lt;em&gt;You do not need a 10-monitor battlestation to build well — you need good defaults. Photo by &lt;/em&gt;&lt;a rel="noopener noreferrer nofollow" href="https://unsplash.com/@jakubzerdzicki"&gt;&lt;em&gt;Jakub Żerdzicki&lt;/em&gt;&lt;/a&gt;&lt;em&gt; on &lt;/em&gt;&lt;a rel="noopener noreferrer nofollow" href="https://unsplash.com/"&gt;&lt;em&gt;Unsplash&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Anti-patterns to skip&lt;/h2&gt;
&lt;p&gt;A few traps catch nearly everyone. Knowing them in advance saves real time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instruction-file bloat.&lt;/strong&gt; A giant &lt;code&gt;AGENTS.md&lt;/code&gt; backfires — the important rules get lost in the noise and the agent half-ignores them. For each line, ask whether removing it would cause a mistake. If not, cut it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP overload.&lt;/strong&gt; Connecting fifteen servers globally floods the agent with tool options and it starts picking wrong. Add tools per project, only where they earn their place.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No way to check the work.&lt;/strong&gt; This is the big one. If you do not give the agent a test, a build, or a linter it can run, it stops when the code merely &lt;em&gt;looks&lt;/em&gt; done — and you become the error-checker. Give it a check it can run, and have it show you the passing output rather than just claiming success.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rules where guarantees belong.&lt;/strong&gt; "Always format the code" written in an instruction file is a suggestion it may drop. If it must happen, make it a hook.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keeping it all to yourself.&lt;/strong&gt; If your &lt;code&gt;.mcp.json&lt;/code&gt; and shared skills are not committed, your teammates are flying blind. Portability includes your team.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The point&lt;/h2&gt;
&lt;p&gt;The best Claude Code setup is not a clever pile of configuration locked inside one app. It is a small, portable kit — instructions in &lt;code&gt;AGENTS.md&lt;/code&gt;, tools behind MCP, know-how in Skills — written in open formats you own and can carry anywhere. Build it that way and you are not betting on which agent wins the next release cycle. You are making that question stop mattering, because whatever you reach for tomorrow, your setup is already waiting for you there.&lt;/p&gt;
&lt;p&gt;You can start in the next ten minutes. Write a short, honest &lt;code&gt;AGENTS.md&lt;/code&gt; for your current project. Symlink &lt;code&gt;CLAUDE.md&lt;/code&gt; to it. Then take the one workflow you keep re-explaining to your AI, and move it out of your head and into a &lt;code&gt;SKILL.md&lt;/code&gt;. That is the whole foundation — and it already belongs to you.&lt;/p&gt;

&lt;h3&gt;Sources &amp;amp; further reading&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://modelcontextprotocol.io/"&gt;Model Context Protocol — official documentation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://agents.md/"&gt;AGENTS.md — the open agent-instructions format&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills"&gt;Anthropic — Equipping agents for the real world with Agent Skills&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://docs.anthropic.com/en/docs/claude-code/"&gt;Claude Code — best practices for agentic coding&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://simonwillison.net/tags/claude-code/"&gt;Simon Willison — writing on Claude Code and skills&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


</description>
      <category>claudecode</category>
      <category>mcp</category>
      <category>agentsmd</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>Photography &amp; AI: a faster, smarter 2026 workflow</title>
      <dc:creator>Harshdeep Singh</dc:creator>
      <pubDate>Thu, 04 Jun 2026 05:00:35 +0000</pubDate>
      <link>https://dev.to/harshdeepsingh13/photography-ai-a-faster-smarter-2026-workflow-24f4</link>
      <guid>https://dev.to/harshdeepsingh13/photography-ai-a-faster-smarter-2026-workflow-24f4</guid>
      <description>&lt;p&gt;Ask a working photographer where their week actually goes, and almost none of them will say “behind the camera.” The shoot is the fun part. The grind is everything after it: culling thousands of frames, matching edits across an entire gallery, color-grading video, chasing invoices, and answering the same five client emails for the hundredth time.&lt;/p&gt;
&lt;p&gt;That grind is exactly what AI is good at. Industry surveys now put AI adoption among professional photographers in the low-90s percent — it has quietly gone from novelty to default. Used well, it doesn’t replace your eye; it removes the repetitive work standing between you and the next shoot. Here’s what a modern, AI-assisted pipeline looks like, from capture to paid invoice.&lt;/p&gt;
&lt;h2&gt;The AI Photo editing Pipeline&lt;/h2&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1551232865-e0a56728e881%3Fw%3D1400%26h%3D620%26auto%3Dformat%26fit%3Dcrop" alt="AI photo culling and editing workflow for professional photographers" width="1400" height="620"&gt;&lt;p&gt;It starts before you touch a single slider. AI culling tools like &lt;strong&gt;Aftershoot&lt;/strong&gt; and &lt;strong&gt;Imagen&lt;/strong&gt; sort a 3,000-frame wedding in minutes — flagging the sharp shots, catching closed eyes, and grouping near-duplicates so you choose from a shortlist instead of the whole card.&lt;/p&gt;
&lt;p&gt;Then comes the part that actually matters: editing &lt;em&gt;in your style&lt;/em&gt;. This is where 2026 tools pull ahead of presets. A preset applies the same math to every photo; feed a “bright and airy” preset a dark reception and you get mush. Imagen’s Personal AI Profile works differently — point it at a few thousand of your previously edited images and it studies how &lt;em&gt;you&lt;/em&gt; handle exposure, white balance, and color, then applies that judgment to a fresh set.&lt;/p&gt;
&lt;p&gt;For pixel-level rescue, &lt;strong&gt;Topaz Photo AI&lt;/strong&gt; handles denoise, sharpening, and Gigapixel upscaling; &lt;strong&gt;Lightroom&lt;/strong&gt;’s AI denoise and masking isolate skies and subjects in a click; and &lt;strong&gt;Photoshop&lt;/strong&gt;’s Firefly Generative Fill removes a stray tourist or extends a cramped background without a clone-stamp marathon. The pattern is consistent: AI does the first 80%, and you spend your time on the 20% that carries your signature.&lt;/p&gt;
&lt;h2&gt;AI in the video pipeline&lt;/h2&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1778052404716-b39eef1f17c6%3Fw%3D1400%26h%3D820%26q%3D80%26auto%3Dformat%26fit%3Dcrop" alt="AI-powered video editing timeline in DaVinci Resolve" width="1400" height="820"&gt;&lt;p&gt;If you shoot hybrid, video is where AI buys back the most time, because video post is where the most time disappears. The headline shift is &lt;strong&gt;text-based editing&lt;/strong&gt;: your footage is transcribed, and you cut the video by editing the transcript — delete a sentence, and the matching frames vanish. Rough cuts that used to eat an afternoon now take minutes.&lt;/p&gt;
&lt;p&gt;Color is the other big win. &lt;strong&gt;DaVinci Resolve&lt;/strong&gt;’s Magic Mask isolates a subject for targeted grading, and neural color matching lines up clips shot on different bodies — your A-cam, a second camera, and the drone — into one consistent look. Imagen’s video tool, launched at NAB 2026, brings the same learn-your-style grading photographers already enjoy straight to the timeline.&lt;/p&gt;
&lt;p&gt;The finishing touches are increasingly automatic too: smart reframing turns a 16:9 edit into vertical 9:16 and square 4:5 cuts for social, AI leveling and noise removal clean up audio without manual EQ, and upscaling pushes older footage to 4K. The result is same-day turnarounds on work that used to take a week.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;AI for the business: clients, CRM &amp;amp; delivery&lt;/strong&gt;&lt;/h2&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1769865563021-8792ccf279e4%3Fw%3D1400%26h%3D620%26auto%3Dformat%26fit%3Dcrop" alt="Photographer using CRM software to automate client management" width="1400" height="620"&gt;&lt;p&gt;Here’s the unglamorous truth no gear review mentions: the thing most likely to sink a photography business isn’t bad photos — it’s bad admin. Inquiries that go cold, contracts that sit unsigned, invoices that slip a month. A well-run CRM is the fix, and the payoff is real: photographers consistently report clawing back the better part of a working day every week once the back office runs itself.&lt;/p&gt;
&lt;p&gt;Platforms built for this — &lt;strong&gt;HoneyBook&lt;/strong&gt;, &lt;strong&gt;Dubsado&lt;/strong&gt;, &lt;strong&gt;Studio Ninja&lt;/strong&gt;, &lt;strong&gt;Táve&lt;/strong&gt;, &lt;strong&gt;Bloom&lt;/strong&gt;, &lt;strong&gt;Sprout Studio&lt;/strong&gt; — automate the entire client journey: an inquiry triggers an instant reply, a proposal, a contract, and a payment schedule, with reminders firing on their own. The AI layer goes further. Sprout Studio drafts your emails and questionnaires, HoneyBook plugs into post-production tools so editing and client management finally talk to each other, and gallery platforms now use face recognition so guests find and buy their own photos without you lifting a finger.&lt;/p&gt;
&lt;p&gt;Speed is the quiet advantage. The studio that answers an inquiry in five minutes books the client that the studio answering in five hours was still drafting a reply to. AI simply makes those five minutes happen while you’re on a shoot.&lt;/p&gt;
&lt;blockquote&gt;&lt;h2&gt;AI hand&lt;em&gt;les the first 80%. Your taste is the last 20% — an&lt;/em&gt;&lt;strong&gt;d that’s the only part a client is really paying for.&lt;/strong&gt;
&lt;/h2&gt;&lt;/blockquote&gt;
&lt;h3&gt;Tools mentioned&lt;/h3&gt;
&lt;p&gt;&lt;span&gt;Aftershoot&lt;/span&gt;&lt;span&gt;Imagen&lt;/span&gt;&lt;span&gt;Topaz Photo AI&lt;/span&gt;&lt;span&gt;Lightroom&lt;/span&gt;&lt;span&gt;Photoshop&lt;/span&gt;&lt;span&gt;Firefly&lt;/span&gt;&lt;span&gt;DaVinci Resolve&lt;/span&gt;&lt;span&gt;CapCut&lt;/span&gt;&lt;span&gt;HoneyBook&lt;/span&gt;&lt;span&gt;Dubsado&lt;/span&gt;&lt;span&gt;Studio Ninja&lt;/span&gt;&lt;span&gt;Sprout Studio&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;Where the Human still wins&lt;/h2&gt;
&lt;p&gt;None of this is about handing your work to a machine. AI is leverage, not authorship. It can match your edit, but it can’t decide what’s worth photographing, read a nervous couple on a wedding morning, or build the trust that turns a one-off booking into a decade of referrals. That judgment is the moat — and it’s getting more valuable, not less.&lt;/p&gt;
&lt;p&gt;So don’t try to automate everything at once. Find your single biggest bottleneck — for most photographers it’s culling or the CRM — and hand that one to AI first. Win back those hours, then reinvest them where they compound: better shoots, sharper craft, and a client experience no software will ever replicate.&lt;/p&gt;
&lt;h2&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The shoot was never the bottleneck.&lt;/strong&gt; AI’s real value is post-shoot — it clears the repetitive work, not your creative judgment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Photo editing:&lt;/strong&gt; AI culls thousands of frames and edits in &lt;em&gt;your&lt;/em&gt; learned style (Imagen, Aftershoot), while Topaz, Lightroom, and Firefly handle pixel-level fixes. You keep the final 20%.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Video:&lt;/strong&gt; text-based editing, neural color-matching across cameras, and auto reframe + audio cleanup turn week-long edits into same-day deliveries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Business:&lt;/strong&gt; a CRM plus AI automations (HoneyBook, Dubsado, Sprout Studio) save roughly a day a week — and a five-minute inquiry reply wins the booking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start small:&lt;/strong&gt; automate one bottleneck (culling or your CRM) first, then reinvest the hours into better work.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


</description>
      <category>aiphotoediting</category>
      <category>photographyworkflow</category>
      <category>aiforphotographers</category>
      <category>photocullingsoftware</category>
    </item>
    <item>
      <title>Full Stack Developer Portfolio Lessons: What I Learned Building 10+ Projects</title>
      <dc:creator>Harshdeep Singh</dc:creator>
      <pubDate>Tue, 02 Jun 2026 21:33:43 +0000</pubDate>
      <link>https://dev.to/harshdeepsingh13/full-stack-developer-portfolio-lessons-what-i-learned-building-10-projects-2bdi</link>
      <guid>https://dev.to/harshdeepsingh13/full-stack-developer-portfolio-lessons-what-i-learned-building-10-projects-2bdi</guid>
      <description>&lt;p&gt;I applied for a role at a mid-sized SaaS company about two years into my career. Strong company, interesting problem, good pay. I sent my application, got a recruiter callback, and then nothing for two weeks. When the feedback finally came: "We went with candidates with a stronger portfolio presence."&lt;/p&gt;

&lt;p&gt;I had 23 GitHub repositories. I had a portfolio site. I had projects. What I didn't have — and what I didn't understand for another six months — was a portfolio that told a story. I had code. Not evidence of thinking, decision-making, or the ability to ship something real.&lt;/p&gt;

&lt;p&gt;I've since built, rebuilt, and advised on a lot of developer portfolios. I've seen what gets people calls and what gets them ghosted. This isn't a guide about which framework to use or how to pick colors. It's about what actually moves the needle — the things I wish someone had told me in year one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1: Two Great Projects Beat Twenty Mediocre Ones
&lt;/h2&gt;

&lt;p&gt;The instinct is to fill the portfolio. More projects = more evidence of experience. This is wrong.&lt;/p&gt;

&lt;p&gt;A hiring manager or engineering lead looking at your portfolio has about three minutes. They're going to look at your two or three most prominent projects, click one or two live demo links, and form an opinion. If they see twenty repositories and most of them are "Todo App v2," "Weather App," "Netflix Clone," "Portfolio v1 through v6" — they've already categorized you as someone who builds tutorials, not someone who builds things.&lt;/p&gt;

&lt;p&gt;The better approach: three to five projects, each with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A real problem it solves (not "I wanted to learn React")&lt;/li&gt;
&lt;li&gt;A live deployment that actually works&lt;/li&gt;
&lt;li&gt;A README that explains why you made the decisions you made&lt;/li&gt;
&lt;li&gt;Enough complexity to have generated at least one interesting engineering problem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Projects that tend to work: tools you built because you were frustrated with an existing tool, apps solving problems you personally had, projects where you integrated with a real API or real data source, anything with a live user base (even 10 users counts).&lt;/p&gt;

&lt;p&gt;Projects that tend not to work: tutorial clones (unless heavily modified), apps that only run locally, projects that stop at the MVP and never got deployed, apps with the same name as thousands of other developer portfolios ("My Todo App," "My Weather App").&lt;/p&gt;

&lt;p&gt;If you have 20 repos, that's fine. Pin your three best to your GitHub profile. Don't make people wade through everything — curate it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 2: Case Studies Beat Code Screenshots
&lt;/h2&gt;

&lt;p&gt;Here's the thing about showing a screenshot of your app: everyone can make an app look good in a screenshot. Filters, cropping, ideal state data. A screenshot shows what you built. It tells me nothing about how you think.&lt;/p&gt;

&lt;p&gt;A case study shows how you think. And how you think is what you're being hired for.&lt;/p&gt;

&lt;p&gt;A case study doesn't have to be a five-page document. Two or three paragraphs on each project covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The problem.&lt;/strong&gt; What did you set out to solve? Be specific. Not "I wanted to learn Next.js" — that's not a problem. "Resume submissions were getting lost in email threads, so I built a tool that…" — that's a problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your approach and the tradeoffs you considered.&lt;/strong&gt; What did you think about? What did you try first? What didn't work? This is where you demonstrate that you can make technical decisions, not just execute instructions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What you shipped.&lt;/strong&gt; Not every feature you imagined — what you actually built and deployed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What you'd do differently.&lt;/strong&gt; This one is disarming in the best way. It shows self-awareness, reflection, and the ability to evaluate your own work critically. Engineers who can't critique their own code can't grow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've seen portfolios with two projects and a well-written case study for each that outperformed portfolios with fifteen projects and no context. The case study gives an interviewer something to ask about. It shows you've thought deeply about the work. It makes the technical interview easier because you already answered half the questions in writing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 3: If It's Not Deployed, It Doesn't Exist
&lt;/h2&gt;

&lt;p&gt;This is the blunt version. A project that runs on localhost is a project you're still working on. It is not a portfolio piece.&lt;/p&gt;

&lt;p&gt;I've reviewed portfolios where the "live demo" link was a localhost URL. I've seen GitHub repositories where the README says "deployment in progress" with a date from 18 months ago. I've seen apps in screenshots that couldn't actually run because they depended on a local database with no seed data.&lt;/p&gt;

&lt;p&gt;Deploying has never been easier or cheaper. There's no excuse for a portfolio project that isn't live.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; Vercel (free), Netlify (free), Cloudflare Pages (free). Zero configuration for most frameworks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend / API:&lt;/strong&gt; Railway (free tier), Render (free tier), Fly.io (free tier). These all support Node.js, Python, Go, whatever you're running.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; MongoDB Atlas free tier (512MB), Supabase free tier (PostgreSQL), PlanetScale free tier (MySQL).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full stack:&lt;/strong&gt; Railway handles full-stack apps well. Render lets you deploy multiple services from one repo. Both have one-click GitHub deploys.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total cost of a deployed side project: $0, with a free domain subdomain. Add a custom domain for $12/year and you have a genuinely professional-looking production deployment.&lt;/p&gt;

&lt;p&gt;There's a secondary benefit to deploying: it forces you to actually finish things. There's a long list of problems you don't know about until you deploy — environment variable management, CORS configuration, database connection pooling, static asset serving. Deploying is part of building. A portfolio project that's never been deployed has never been truly finished.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 4: The README Is Your First Interview
&lt;/h2&gt;

&lt;p&gt;When a hiring manager or senior engineer clicks the GitHub link from your portfolio, the first thing they see is the README. If it says "A project I made for learning" or has no description at all, they've already lost interest.&lt;/p&gt;

&lt;p&gt;The README is where you make the technical case for yourself before you're in the room. Here's what a good one contains:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First paragraph:&lt;/strong&gt; What does this thing do and why does it exist? Not "this is a web app" — tell me the specific problem it solves. One or two sentences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech stack and why:&lt;/strong&gt; Not just a logo grid. A sentence about why you chose what you chose. "Used PostgreSQL instead of MongoDB because the data has strong relational structure with lots of joins." "Chose Next.js App Router over CRA because we needed SSR for SEO and a built-in API layer." These sentences prove you made intentional decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Screenshots or a GIF:&lt;/strong&gt; A 10-second screen recording of the app working is worth a thousand words. Not staged, not filtered — just the actual app.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to run it:&lt;/strong&gt; Clear, complete instructions. If I clone it and follow your README and it doesn't work, that's a flag. If it works first try, that's a positive signal — it means you document carefully and you care about the developer experience of your code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Known limitations / what you'd do differently:&lt;/strong&gt; One paragraph. Shows maturity. "If I built this again, I'd use a message queue for the email sending instead of doing it synchronously in the request lifecycle — it caused timeouts under load."&lt;/p&gt;

&lt;p&gt;This README takes maybe 45 minutes to write. It dramatically changes how your project is perceived.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 5: Get Your Own Domain
&lt;/h2&gt;

&lt;p&gt;This one is simple and often skipped. Your portfolio should live at &lt;code&gt;yourname.com&lt;/code&gt; or &lt;code&gt;yourname.dev&lt;/code&gt; — not &lt;code&gt;github.io/yourname/portfolio&lt;/code&gt; or &lt;code&gt;yourname.netlify.app&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A custom domain does two things: it signals that you take yourself seriously as a professional, and it's a much better URL to put on a resume, LinkedIn, or business card. "theharshdeepsingh.com" looks intentional. "harshdeep-singh-13.github.io/portfolio-2024" looks like a homework assignment.&lt;/p&gt;

&lt;p&gt;Domains cost $10–15 per year. That is a rounding error in any budget. Buy yours today. Redirect your GitHub Pages / Vercel / Netlify deployment to it. It takes 30 minutes and it never needs to change — you own it.&lt;/p&gt;

&lt;p&gt;A note on choosing the domain: use your name. Not your "developer brand" or a clever handle. Names rank in Google. If someone searches for you, they should find your portfolio at the top. A personal domain with your name is one of the easiest SEO wins available to you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 6: One AI Integration Changes Everything
&lt;/h2&gt;

&lt;p&gt;Here's the hiring landscape in 2025 from a practical perspective: companies want developers who can work with AI, build on top of AI APIs, and integrate AI capabilities into existing products. This is new enough that not everyone has done it. Old enough that "I'm planning to learn it" isn't a compelling answer.&lt;/p&gt;

&lt;p&gt;One project with a real AI integration moves you from the pile to the shortlist. Not because AI is a magic word — but because it demonstrates technical currency. You know what the OpenAI API looks like. You've dealt with token limits and streaming and prompt engineering. You've thought about cost and abuse prevention. These are all non-trivial.&lt;/p&gt;

&lt;p&gt;What counts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A feature in an existing project that uses GPT-4o, Claude, or Gemini for a specific, meaningful task (not "ask AI anything" — that's too vague to be impressive)&lt;/li&gt;
&lt;li&gt;A RAG (retrieval-augmented generation) pipeline — document upload, embedding, search, answer generation&lt;/li&gt;
&lt;li&gt;An agent that takes structured actions based on LLM output (web search, database queries, API calls)&lt;/li&gt;
&lt;li&gt;A classification or extraction feature that uses an LLM where a simpler approach wouldn't have worked&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What doesn't count:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Used ChatGPT to help me write this code" (everyone does this)&lt;/li&gt;
&lt;li&gt;A UI wrapper around ChatGPT that just passes prompts through (no engineering decision was made)&lt;/li&gt;
&lt;li&gt;A project that uses AI for something that a regex would handle just as well&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bar isn't high. Ship one genuinely useful AI feature, document the decisions you made (model selection, prompt design, cost management), and you're ahead of the majority of developers applying for the same roles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch: Portfolio Reviews — What Actually Works
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Weak vs. Strong Portfolio Signals
&lt;/h2&gt;

&lt;p&gt;Signal&lt;br&gt;
Weak&lt;br&gt;
Strong&lt;/p&gt;

&lt;p&gt;Project count&lt;br&gt;
20+ repos, half are tutorial clones&lt;br&gt;
3–5 curated projects, each with a clear purpose&lt;/p&gt;

&lt;p&gt;Project quality&lt;br&gt;
Todo apps, weather apps, Netflix/Airbnb clones&lt;br&gt;
Tools solving real problems, deployed with real users&lt;/p&gt;

&lt;p&gt;Live demos&lt;br&gt;
No live link, "works on my machine," localhost screenshots&lt;br&gt;
Deployed URLs that load in under 3 seconds&lt;/p&gt;

&lt;p&gt;Documentation&lt;br&gt;
No README or "this is a project I made"&lt;br&gt;
Problem statement, tech choices explained, known limitations&lt;/p&gt;

&lt;p&gt;Tech recency&lt;br&gt;
Create React App, class components, outdated dependencies&lt;br&gt;
Current stack (Next.js 15, TypeScript, modern APIs)&lt;/p&gt;

&lt;p&gt;AI integration&lt;br&gt;
None, or "used AI to help me code"&lt;br&gt;
One genuine AI feature with documented engineering decisions&lt;/p&gt;

&lt;p&gt;Domain&lt;br&gt;
github.io/username or platform subdomain&lt;br&gt;
yourname.com — personal, memorable, professional&lt;/p&gt;

&lt;p&gt;Case studies&lt;br&gt;
Screenshots in a grid with a "View Project" button&lt;br&gt;
Problem → approach → tradeoffs → outcome — per project&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Do on Day 1 If I Were Starting Over
&lt;/h2&gt;

&lt;p&gt;If I were a developer today with no portfolio and a job to find, here's the exact sequence I'd follow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 1:&lt;/strong&gt; Register &lt;code&gt;firstnamelastname.com&lt;/code&gt;. It costs $12. Do it before you build anything. Having the domain makes the whole thing feel real and gives you a deadline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 1:&lt;/strong&gt; Identify one problem I genuinely have — something I do manually that should be automated, something I've searched for that doesn't exist, something at work that annoys me. Build the simplest version of the solution. Not a full product — a working tool. One feature, deployed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 2:&lt;/strong&gt; Write the case study. What was the problem? What did I consider? What did I ship? What didn't make the cut? What would I do differently? Two or three paragraphs per question. This is more valuable than the code itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 3:&lt;/strong&gt; Add an AI integration to the project — something that actually makes the tool better, not a bolt-on. Even a single endpoint that uses an LLM for classification or text generation counts, as long as it's doing something a simpler approach couldn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 4:&lt;/strong&gt; Point the domain at the project. Add the URL to LinkedIn's "Featured" section and your resume. Ask one person who is not a developer to try using the tool and tell you what they're confused by. Fix those things.&lt;/p&gt;

&lt;p&gt;That's it. One good project, one good case study, one AI integration, a custom domain, a LinkedIn presence. Four weeks. That's a portfolio that gets callbacks. Everything else is refinement.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Curate, don't accumulate.&lt;/strong&gt; Three deployed projects with case studies beat twenty unfinished repos. Pin your best work. Hide the rest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write case studies for every project.&lt;/strong&gt; Problem → approach → tradeoffs → outcome → what you'd do differently. This is what interviews are about anyway — you're just answering in advance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nothing without a live URL.&lt;/strong&gt; Free tiers on Vercel, Railway, and Render make deployment trivial. An undeployed project is an unfinished project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The README is your first impression from GitHub.&lt;/strong&gt; Spend 45 minutes on it. Explain the why, not just the what. Include how to run it. List the limitations. That's a professional developer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get the domain.&lt;/strong&gt; yourname.com, $12, 30 minutes. It signals intent and makes your portfolio findable in Google searches by your own name.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build one thing with a real AI integration.&lt;/strong&gt; Not a ChatGPT wrapper — a feature that uses an LLM to solve a specific problem in your project, with documented decisions about model selection and prompt design.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>portfolio</category>
      <category>career</category>
      <category>fullstack</category>
      <category>jobsearch</category>
    </item>
    <item>
      <title>How to Integrate the OpenAI API into a Production Express App</title>
      <dc:creator>Harshdeep Singh</dc:creator>
      <pubDate>Tue, 02 Jun 2026 21:28:00 +0000</pubDate>
      <link>https://dev.to/harshdeepsingh13/how-to-integrate-the-openai-api-into-a-production-express-app-2mff</link>
      <guid>https://dev.to/harshdeepsingh13/how-to-integrate-the-openai-api-into-a-production-express-app-2mff</guid>
      <description>&lt;p&gt;Last year I helped a startup integrate the OpenAI API into their product. It was a chat feature — users could ask questions about their data and get natural language answers. The integration took about a day. Three days after launch, the founder messaged me: "Hey, something's wrong. Our AWS bill just showed an unexpected charge."&lt;/p&gt;

&lt;p&gt;It was $340. For three days. They had 60 users.&lt;/p&gt;

&lt;p&gt;The issue wasn't a bug — it was that production API usage looks nothing like a tutorial. The tutorial shows you &lt;code&gt;openai.chat.completions.create()&lt;/code&gt; and returns a response. The tutorial doesn't show you what happens when users send 500-token messages, when they open 15 browser tabs each maintaining their own chat context, or when one user fires requests 30 times per minute because they think it's broken.&lt;/p&gt;

&lt;p&gt;This guide covers what the tutorials skip: rate limiting, token counting, cost guards, streaming, error handling with retries, and model selection. These aren't optional additions — they're what separates a demo from a production feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Production Is Different
&lt;/h2&gt;

&lt;p&gt;Here's the gap between tutorial code and production code, stated plainly:&lt;/p&gt;

&lt;p&gt;Concern&lt;br&gt;
Tutorial Code&lt;br&gt;
Production Code&lt;/p&gt;

&lt;p&gt;Cost control&lt;br&gt;
Not mentioned&lt;br&gt;
Token counting, spending limits, model selection by task&lt;/p&gt;

&lt;p&gt;Rate limiting&lt;br&gt;
Not mentioned&lt;br&gt;
Per-user and per-IP limits to prevent abuse&lt;/p&gt;

&lt;p&gt;Error handling&lt;br&gt;
try/catch that logs to console&lt;br&gt;
Typed errors, retries with backoff, user-facing messages&lt;/p&gt;

&lt;p&gt;Response delivery&lt;br&gt;
Wait for full completion, return at once&lt;br&gt;
Streaming via SSE — response appears as it generates&lt;/p&gt;

&lt;p&gt;Context management&lt;br&gt;
Each request is independent&lt;br&gt;
Conversation history managed, truncated at token limit&lt;/p&gt;

&lt;p&gt;Secrets management&lt;br&gt;
API key hardcoded or in &lt;code&gt;.env&lt;/code&gt; (no rotation)&lt;br&gt;
Rotation strategy, usage monitoring, per-feature keys&lt;/p&gt;

&lt;p&gt;Let's build a production-grade Express API that addresses all of this. We'll go layer by layer.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;┌─────────────────────────────────────────────────────────┐&lt;br&gt;
│                  CLIENT (Browser / Mobile)               │&lt;br&gt;
│          POST /api/chat  { messages: [...] }             │&lt;br&gt;
│          GET  /api/chat/stream (SSE)                     │&lt;br&gt;
└──────────────────────┬──────────────────────────────────┘&lt;br&gt;
                       │&lt;br&gt;
                       ▼&lt;br&gt;
┌─────────────────────────────────────────────────────────┐&lt;br&gt;
│                   EXPRESS MIDDLEWARE STACK               │&lt;br&gt;
│                                                          │&lt;br&gt;
│  1. express-rate-limit  (10 req/min per IP)             │&lt;br&gt;
│  2. tokenGuard()        (reject if &amp;gt; 4,000 tokens)      │&lt;br&gt;
│  3. auth middleware     (verify user session)            │&lt;br&gt;
└──────────────────────┬──────────────────────────────────┘&lt;br&gt;
                       │&lt;br&gt;
                       ▼&lt;br&gt;
┌─────────────────────────────────────────────────────────┐&lt;br&gt;
│                   ROUTE HANDLER                          │&lt;br&gt;
│                                                          │&lt;br&gt;
│  Select model by task type                               │&lt;br&gt;
│  Build messages array from context                       │&lt;br&gt;
│  Call openai.chat.completions.create()                   │&lt;br&gt;
│  Stream or return response                               │&lt;br&gt;
└──────────────────────┬──────────────────────────────────┘&lt;br&gt;
                       │&lt;br&gt;
                       ▼&lt;br&gt;
┌─────────────────────────────────────────────────────────┐&lt;br&gt;
│                   OPENAI API                             │&lt;br&gt;
│  Model: gpt-4o-mini (default) / gpt-4o (complex tasks)  │&lt;br&gt;
└─────────────────────────────────────────────────────────┘&lt;/p&gt;
&lt;h2&gt;
  
  
  Project Setup
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;express-openai &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;express-openai
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;express openai express-rate-limit tiktoken dotenv
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--save-dev&lt;/span&gt; nodemon

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env&lt;/span&gt;
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-proj-your-key-here
&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3001

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Step 1: The OpenAI Client (Configured for Production)
&lt;/h2&gt;

&lt;p&gt;Don't instantiate the OpenAI client inside route handlers. Create it once, configure it for production, and export it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/openaiClient.js&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;// retry on transient failures (rate limits, timeouts)&lt;/span&gt;
  &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// 30 second timeout — don't hang forever&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Model selection by task complexity&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MODELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;fast&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// classification, simple Q&amp;amp;A, summarization&lt;/span&gt;
  &lt;span class="na"&gt;smart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// complex reasoning, code generation, analysis&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;maxRetries: 3&lt;/code&gt; and &lt;code&gt;timeout&lt;/code&gt; settings are critical. Without a timeout, a hung OpenAI request will keep your Express server's response object open indefinitely — and if you're running on a serverless function, you'll pay for that idle time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Token Counting and Cost Guard
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;tiktoken&lt;/code&gt; library is OpenAI's own tokenizer — it counts tokens the exact same way the API does. Use it to reject requests before they hit the API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/tokenCounter.js&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;encoding_for_model&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tiktoken&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;countMessageTokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;enc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encoding_for_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;totalTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;totalTokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// every message has ~4 tokens of overhead&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;totalTokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;totalTokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;totalTokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// reply primer&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;free&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// tiktoken requires explicit cleanup&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;totalTokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// overall reply overhead&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Express middleware — rejects requests over the token limit&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;tokenGuard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;maxInputTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;messages must be an array&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tokenCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;countMessageTokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tokenCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;maxInputTokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Message too long: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;tokenCount&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; tokens (limit: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;maxInputTokens&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;). Shorten your message or clear the conversation.`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;tokenCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;maxInputTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokenCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tokenCount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// pass downstream for logging&lt;/span&gt;
    &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A note on the limit: GPT-4o-mini's context window is 128K tokens, so 4,000 is conservative. But conservative is good here — a user who sends 30,000 tokens in one request is either doing something unusual or has a bug in their client. Reject it, log it, and let them know to clear their context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Rate Limiting
&lt;/h2&gt;

&lt;p&gt;One user shouldn't be able to drain your API budget or trigger OpenAI rate limits for everyone else. Add rate limiting before the AI routes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/middleware/rateLimiter.js&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;rateLimit&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;express-rate-limit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aiRateLimiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;rateLimit&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;windowMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// 1-minute window&lt;/span&gt;
  &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;               &lt;span class="c1"&gt;// 15 requests per minute per IP&lt;/span&gt;
  &lt;span class="na"&gt;standardHeaders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// return RateLimit headers&lt;/span&gt;
  &lt;span class="na"&gt;legacyHeaders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Too many requests. Please wait a moment before trying again.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;retryAfter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;keyGenerator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Use authenticated user ID if available, otherwise fall back to IP&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Stricter limit for expensive models&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;smartModelLimiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;rateLimit&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;windowMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Too many complex requests. Rate limited for 60 seconds.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Error Handling with Typed OpenAI Errors
&lt;/h2&gt;

&lt;p&gt;The OpenAI Node SDK throws typed errors. Use them — don't just check &lt;code&gt;err.message&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/middleware/openaiErrorHandler.js&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleOpenAIError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;APIError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`OpenAI API error: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-request-id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI service is busy. Please try again in a moment.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;retryAfter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;retry-after&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Invalid request to AI service. Check your message format.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OpenAI authentication failed — check OPENAI_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI service unavailable.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Not an OpenAI error — pass to your generic error handler&lt;/span&gt;
  &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: The Chat Endpoint (Non-Streaming)
&lt;/h2&gt;

&lt;p&gt;Let's wire everything together for a standard, non-streaming response first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/routes/chat.js&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;MODELS&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../openaiClient.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tokenGuard&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../tokenCounter.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;aiRateLimiter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../middleware/rateLimiter.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;aiRateLimiter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nf"&gt;tokenGuard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useSmartModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useSmartModel&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;MODELS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;smart&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MODELS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fast&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// cap output tokens to control cost&lt;/span&gt;
        &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;inputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;outputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completion_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;totalTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;total_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;estimatedCostUsd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;estimateCost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;estimateCost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Prices per million tokens (as of mid-2025)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pricing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.60&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;5.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;15.00&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pricing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;pricing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inputCost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt_tokens&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;outputCost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completion_tokens&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;inputCost&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;outputCost&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice &lt;code&gt;max_tokens: 1_000&lt;/code&gt;. Without this, GPT-4o can produce 4,096 output tokens per request. If a user asks it to "write me a book," it will try. The &lt;code&gt;max_tokens&lt;/code&gt; cap is your backstop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Streaming Responses with Server-Sent Events
&lt;/h2&gt;

&lt;p&gt;Streaming makes AI features feel responsive. Instead of a blank screen for 3–8 seconds, the user sees text appear word by word. It's the difference between "this feels AI-powered" and "this is broken."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/routes/chat-stream.js&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;MODELS&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../openaiClient.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tokenGuard&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../tokenCounter.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;aiRateLimiter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../middleware/rateLimiter.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/stream&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;aiRateLimiter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nf"&gt;tokenGuard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Establish SSE connection&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text/event-stream&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cache-Control&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;no-cache&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Connection&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;keep-alive&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Access-Control-Allow-Origin&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flushHeaders&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// send headers immediately&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MODELS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fast&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;

      &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;totalOutputTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;totalOutputTokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// approximate; tiktoken is more accurate&lt;/span&gt;
          &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`data: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;delta&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt; &lt;span class="p"&gt;})}&lt;/span&gt;&lt;span class="s2"&gt;

`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Check for stop reason&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;finish_reason&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;length&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`data: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;warning&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Response truncated at token limit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})}&lt;/span&gt;&lt;span class="s2"&gt;

`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`data: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;done&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})}&lt;/span&gt;&lt;span class="s2"&gt;

`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Send error over SSE before closing&lt;/span&gt;
      &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`data: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Generation failed. Please try again.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})}&lt;/span&gt;&lt;span class="s2"&gt;

`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="c1"&gt;// Also pass to error handler for logging&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Streaming error:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Watch: OpenAI API with Node.js + Express
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Streaming vs. Non-Streaming — When to Use Which
&lt;/h2&gt;

&lt;p&gt;Factor&lt;br&gt;
Non-Streaming&lt;br&gt;
Streaming (SSE)&lt;/p&gt;

&lt;p&gt;User experience&lt;br&gt;
Blank screen until done (3–8s)&lt;br&gt;
Text appears word by word — feels instant&lt;/p&gt;

&lt;p&gt;Complexity&lt;br&gt;
Standard REST response&lt;br&gt;
SSE connection, chunked parsing on frontend&lt;/p&gt;

&lt;p&gt;Usage logging&lt;br&gt;
Easy — &lt;code&gt;completion.usage&lt;/code&gt; has exact token counts&lt;br&gt;
Harder — token counts only available via the final chunk&lt;/p&gt;

&lt;p&gt;Caching&lt;br&gt;
Can cache the full response&lt;br&gt;
Can't cache a stream&lt;/p&gt;

&lt;p&gt;Best for&lt;br&gt;
API-to-API calls, short responses, classification tasks&lt;br&gt;
User-facing chat, long completions, code generation&lt;/p&gt;

&lt;p&gt;Serverless functions&lt;br&gt;
Works everywhere&lt;br&gt;
Needs long-running connection — use Vercel Edge Functions or a real server&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing Your OpenAI Integration
&lt;/h2&gt;

&lt;p&gt;Mocking the OpenAI API in tests is a trap. The mock will pass but the real integration will fail in ways you didn't anticipate — different error formats, unexpected token usage, streaming chunk structure variations.&lt;/p&gt;

&lt;p&gt;Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unit test everything except the API call.&lt;/strong&gt; Test your token counting, your error handler, your response formatter — all without touching OpenAI. These functions should be pure and deterministic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use a cheap model for integration tests.&lt;/strong&gt; &lt;code&gt;gpt-4o-mini&lt;/code&gt; is $0.15 per million input tokens. Your integration test suite probably costs fractions of a cent to run. Run it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Record and replay for expensive tests.&lt;/strong&gt; Libraries like &lt;code&gt;nock&lt;/code&gt; or VCR-style recording let you record real API responses and replay them in future test runs without hitting the API.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Example: testing the token guard middleware in isolation&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tokenGuard&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../src/tokenCounter.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createMockMiddlewareContext&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./helpers.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tokenGuard rejects messages over the limit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenGuard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// tiny limit for test&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createMockMiddlewareContext&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;a&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toContain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;too long&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalled&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Initialize the OpenAI client once&lt;/strong&gt; with &lt;code&gt;maxRetries&lt;/code&gt; and &lt;code&gt;timeout&lt;/code&gt; set. Don't instantiate it in route handlers or you'll get a new client per request with no retry or timeout configuration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Count tokens before you call the API.&lt;/strong&gt; Use &lt;code&gt;tiktoken&lt;/code&gt; to measure input size and reject oversized requests before they cost you money. Set a &lt;code&gt;max_tokens&lt;/code&gt; cap on output for the same reason.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limit by user ID, not just IP.&lt;/strong&gt; Authenticated users with the same IP (corporate NAT, mobile networks) would all share a single IP limit — use their user ID as the rate limit key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use typed error handling&lt;/strong&gt; — &lt;code&gt;instanceof OpenAI.APIError&lt;/code&gt; gives you the status code, request ID, and message. Turn 429s into user-friendly retry prompts, not 500 errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream for user-facing features, skip it for internal calls.&lt;/strong&gt; SSE streaming transforms the UX for chat interfaces. For batch processing or API-to-API calls, non-streaming is simpler to implement and log.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test everything except the API call.&lt;/strong&gt; Token counting, error handling, and response formatting are all pure functions you can test cheaply. For integration tests, use &lt;code&gt;gpt-4o-mini&lt;/code&gt; — it's cheap enough to run in CI.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>openai</category>
      <category>express</category>
      <category>node</category>
      <category>ai</category>
    </item>
    <item>
      <title>Deploying a Next.js App to AWS with CI/CD Pipelines (Step-by-Step)</title>
      <dc:creator>Harshdeep Singh</dc:creator>
      <pubDate>Tue, 02 Jun 2026 21:27:52 +0000</pubDate>
      <link>https://dev.to/harshdeepsingh13/deploying-a-nextjs-app-to-aws-with-cicd-pipelines-step-by-step-2j76</link>
      <guid>https://dev.to/harshdeepsingh13/deploying-a-nextjs-app-to-aws-with-cicd-pipelines-step-by-step-2j76</guid>
      <description>&lt;p&gt;The first time I deployed a Next.js app to production, it took me three days. Not because the app was complicated — it was a straightforward portfolio site. It took three days because I had no idea what I was doing with AWS, I'd never written a GitHub Actions workflow, and every tutorial I found either skipped the hard parts or assumed I already knew them.&lt;/p&gt;

&lt;p&gt;By the time I was done, I had a deployment pipeline I was genuinely proud of: push to main, GitHub Actions runs the build, tests pass, the app deploys to an EC2 instance behind CloudFront. Zero manual steps. Zero downtime deploys. Total cost: about $5/month.&lt;/p&gt;

&lt;p&gt;This guide is the one I wish had existed. We're going to deploy a Next.js app to AWS from scratch — EC2 for compute, CloudFront for CDN, GitHub Actions for CI/CD — with every step explained so you understand what you're building, not just copying commands.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AWS Instead of Vercel?
&lt;/h2&gt;

&lt;p&gt;This is a fair question. Vercel is genuinely excellent for Next.js, and for most projects it's the right call. You push, it deploys. Done.&lt;/p&gt;

&lt;p&gt;AWS makes sense when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need to control the infrastructure (compliance, data residency, custom VPC configuration)&lt;/li&gt;
&lt;li&gt;You're running other services (databases, queues, lambdas) in AWS and want everything in the same network&lt;/li&gt;
&lt;li&gt;You want to learn infrastructure skills that transfer to enterprise environments&lt;/li&gt;
&lt;li&gt;Your app has specific performance requirements that benefit from custom CloudFront configuration&lt;/li&gt;
&lt;li&gt;You're a freelancer or consultant who wants to bill separately for infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If none of those apply to you, use Vercel. This guide is for when they do.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Here's what we're building:&lt;/p&gt;

&lt;p&gt;┌────────────────────────────────────────────────────────┐&lt;br&gt;
│                   GITHUB ACTIONS CI/CD                  │&lt;br&gt;
│                                                          │&lt;br&gt;
│  Push to main → Build → Test → Deploy to EC2           │&lt;br&gt;
└──────────────────────┬─────────────────────────────────┘&lt;br&gt;
                       │ SSH deploy&lt;br&gt;
                       ▼&lt;br&gt;
┌────────────────────────────────────────────────────────┐&lt;br&gt;
│                    AWS EC2 INSTANCE                      │&lt;br&gt;
│                                                          │&lt;br&gt;
│  Ubuntu 22.04 LTS                                        │&lt;br&gt;
│  Node.js 20 + PM2 (process manager)                     │&lt;br&gt;
│  Next.js app running on port 3000                        │&lt;br&gt;
│  Nginx reverse proxy (port 80/443 → 3000)               │&lt;br&gt;
└──────────────────────┬─────────────────────────────────┘&lt;br&gt;
                       │ Origin&lt;br&gt;
                       ▼&lt;br&gt;
┌────────────────────────────────────────────────────────┐&lt;br&gt;
│                  CLOUDFRONT CDN                          │&lt;br&gt;
│                                                          │&lt;br&gt;
│  Static assets cached at edge (/_next/static/*)         │&lt;br&gt;
│  TTL: 1 year for static, 0 for HTML                    │&lt;br&gt;
│  SSL termination via ACM certificate                     │&lt;br&gt;
│  Custom domain: yourapp.com                             │&lt;br&gt;
└────────────────────────────────────────────────────────┘&lt;/p&gt;

&lt;p&gt;This isn't the only way to run Next.js on AWS. You could use Elastic Beanstalk, App Runner, ECS, or deploy static exports to S3 + CloudFront. The EC2 + CloudFront approach gives you the most control and transfers the most skills to enterprise environments.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;An AWS account (free tier works for learning; a t3.micro is enough for small apps)&lt;/li&gt;
&lt;li&gt;A domain name (optional but recommended — we'll set up SSL)&lt;/li&gt;
&lt;li&gt;A GitHub repository with your Next.js app&lt;/li&gt;
&lt;li&gt;Basic familiarity with the AWS console&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The total setup takes about 90 minutes the first time. After that, every deployment is automatic.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1: Set Up the EC2 Instance
&lt;/h2&gt;

&lt;p&gt;In the AWS Console, navigate to EC2 and launch a new instance. The settings that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AMI:&lt;/strong&gt; Ubuntu Server 22.04 LTS (free tier eligible)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instance type:&lt;/strong&gt; t3.micro (1 vCPU, 1GB RAM) for small apps; t3.small for medium traffic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key pair:&lt;/strong&gt; Create a new one, download it — you'll need this for SSH and GitHub Actions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security group:&lt;/strong&gt; Allow inbound traffic on ports 22 (SSH), 80 (HTTP), and 443 (HTTPS). Add your IP as the only source for port 22 (don't expose SSH to 0.0.0.0/0).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the instance is running, SSH in and set up the environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Connect to your instance&lt;/span&gt;
ssh &lt;span class="nt"&gt;-i&lt;/span&gt; your-key.pem ubuntu@YOUR_EC2_PUBLIC_IP

&lt;span class="c"&gt;# Update system packages&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get upgrade &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="c"&gt;# Install Node.js 20 via NodeSource&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://deb.nodesource.com/setup_20.x | &lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; bash -
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nodejs

&lt;span class="c"&gt;# Install PM2 globally (process manager for Node.js)&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; pm2

&lt;span class="c"&gt;# Install Nginx&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nginx

&lt;span class="c"&gt;# Verify everything installed correctly&lt;/span&gt;
node &lt;span class="nt"&gt;--version&lt;/span&gt;   &lt;span class="c"&gt;# v20.x.x&lt;/span&gt;
npm &lt;span class="nt"&gt;--version&lt;/span&gt;    &lt;span class="c"&gt;# 10.x.x&lt;/span&gt;
pm2 &lt;span class="nt"&gt;--version&lt;/span&gt;    &lt;span class="c"&gt;# 5.x.x&lt;/span&gt;
nginx &lt;span class="nt"&gt;-v&lt;/span&gt;         &lt;span class="c"&gt;# nginx/1.24.x&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Configure Nginx as a Reverse Proxy
&lt;/h2&gt;

&lt;p&gt;Nginx will listen on port 80 and forward requests to your Next.js app on port 3000. This is the standard setup for Node.js apps on Linux servers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/nginx/sites-available/nextjs-app

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste this configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;server &lt;span class="o"&gt;{&lt;/span&gt;
    listen 80&lt;span class="p"&gt;;&lt;/span&gt;
    server_name YOUR_DOMAIN.com www.YOUR_DOMAIN.com&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c"&gt;# Proxy requests to Next.js&lt;/span&gt;
    location / &lt;span class="o"&gt;{&lt;/span&gt;
        proxy_pass http://localhost:3000&lt;span class="p"&gt;;&lt;/span&gt;
        proxy_http_version 1.1&lt;span class="p"&gt;;&lt;/span&gt;
        proxy_set_header Upgrade &lt;span class="nv"&gt;$http_upgrade&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        proxy_set_header Connection &lt;span class="s1"&gt;'upgrade'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        proxy_set_header Host &lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        proxy_set_header X-Real-IP &lt;span class="nv"&gt;$remote_addr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        proxy_set_header X-Forwarded-For &lt;span class="nv"&gt;$proxy_add_x_forwarded_for&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        proxy_set_header X-Forwarded-Proto &lt;span class="nv"&gt;$scheme&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        proxy_cache_bypass &lt;span class="nv"&gt;$http_upgrade&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;# Cache Next.js static assets at Nginx level too&lt;/span&gt;
    location /_next/static/ &lt;span class="o"&gt;{&lt;/span&gt;
        proxy_pass http://localhost:3000&lt;span class="p"&gt;;&lt;/span&gt;
        proxy_cache_valid 200 1y&lt;span class="p"&gt;;&lt;/span&gt;
        add_header Cache-Control &lt;span class="s2"&gt;"public, max-age=31536000, immutable"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable the site and test the config&lt;/span&gt;
&lt;span class="nb"&gt;sudo ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /etc/nginx/sites-available/nextjs-app /etc/nginx/sites-enabled/
&lt;span class="nb"&gt;sudo &lt;/span&gt;nginx &lt;span class="nt"&gt;-t&lt;/span&gt;            &lt;span class="c"&gt;# should say "test is successful"&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart nginx

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: The GitHub Actions Workflow
&lt;/h2&gt;

&lt;p&gt;This is where the CI/CD magic happens. The workflow does four things: checks out code, runs your build, SSHs into the server, and restarts the app. Create this file in your repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; .github/workflows

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create &lt;code&gt;.github/workflows/deploy.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to AWS EC2&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    &lt;span class="c1"&gt;# also allow manual triggers&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout code&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Set up Node.js&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;20"&lt;/span&gt;
          &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npm"&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install dependencies&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run linter&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run lint&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build Next.js app&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run build&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="c1"&gt;# Pass any build-time env vars here&lt;/span&gt;
          &lt;span class="na"&gt;NEXT_PUBLIC_GA_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.NEXT_PUBLIC_GA_ID }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to EC2&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;appleboy/ssh-action@v1.0.3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.EC2_HOST }}&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.EC2_PRIVATE_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;cd /var/www/nextjs-app&lt;/span&gt;

            &lt;span class="s"&gt;# Pull latest code&lt;/span&gt;
            &lt;span class="s"&gt;git pull origin main&lt;/span&gt;

            &lt;span class="s"&gt;# Install dependencies (production only)&lt;/span&gt;
            &lt;span class="s"&gt;npm ci --omit=dev&lt;/span&gt;

            &lt;span class="s"&gt;# Build the app on the server&lt;/span&gt;
            &lt;span class="s"&gt;npm run build&lt;/span&gt;

            &lt;span class="s"&gt;# Restart with PM2 (zero-downtime reload)&lt;/span&gt;
            &lt;span class="s"&gt;pm2 reload nextjs-app --update-env&lt;/span&gt;

            &lt;span class="s"&gt;echo "Deploy complete at $(date)"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Set Up GitHub Secrets
&lt;/h2&gt;

&lt;p&gt;In your GitHub repository, go to Settings → Secrets and variables → Actions, and add these three secrets:&lt;/p&gt;

&lt;p&gt;Secret Name&lt;br&gt;
Value&lt;/p&gt;

&lt;p&gt;&lt;code&gt;EC2_HOST&lt;/code&gt;&lt;br&gt;
Your EC2 instance's public IP address or domain&lt;/p&gt;

&lt;p&gt;&lt;code&gt;EC2_PRIVATE_KEY&lt;/code&gt;&lt;br&gt;
The full contents of your &lt;code&gt;.pem&lt;/code&gt; file (including the BEGIN/END lines)&lt;/p&gt;

&lt;p&gt;&lt;code&gt;NEXT_PUBLIC_GA_ID&lt;/code&gt;&lt;br&gt;
Your Google Analytics measurement ID (or any other public env vars)&lt;/p&gt;

&lt;p&gt;For the private key, open the .pem file in a text editor, copy everything including &lt;code&gt;-----BEGIN RSA PRIVATE KEY-----&lt;/code&gt; and &lt;code&gt;-----END RSA PRIVATE KEY-----&lt;/code&gt;, and paste it as the secret value.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 5: First Deploy and PM2 Setup
&lt;/h2&gt;

&lt;p&gt;Before the GitHub Action can work, you need to get the app running on the server for the first time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone your repo to the server&lt;/span&gt;
&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /var/www/nextjs-app
&lt;span class="nb"&gt;sudo chown &lt;/span&gt;ubuntu:ubuntu /var/www/nextjs-app
&lt;span class="nb"&gt;cd&lt;/span&gt; /var/www/nextjs-app
git clone https://github.com/YOUR_USERNAME/YOUR_REPO.git &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Create .env file on the server&lt;/span&gt;
nano .env
&lt;span class="c"&gt;# Add your production environment variables here&lt;/span&gt;

&lt;span class="c"&gt;# Install dependencies and build&lt;/span&gt;
npm ci
npm run build

&lt;span class="c"&gt;# Start with PM2&lt;/span&gt;
pm2 start npm &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"nextjs-app"&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; start
pm2 save                  &lt;span class="c"&gt;# persist across server restarts&lt;/span&gt;
pm2 startup               &lt;span class="c"&gt;# generate startup script&lt;/span&gt;
&lt;span class="c"&gt;# PM2 will output a command to run — run it&lt;/span&gt;

&lt;span class="c"&gt;# Verify the app is running&lt;/span&gt;
pm2 status
curl http://localhost:3000  &lt;span class="c"&gt;# should return HTML&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 6: CloudFront CDN (Optional but Recommended)
&lt;/h2&gt;

&lt;p&gt;CloudFront puts your app behind a global CDN, which means static assets load from an edge location near your users instead of your EC2 server. For most apps, this makes a meaningful difference in load times outside your server's region.&lt;/p&gt;

&lt;p&gt;In the AWS Console, go to CloudFront and create a new distribution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Origin domain:&lt;/strong&gt; Your EC2 public IP or domain (not localhost)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Origin protocol policy:&lt;/strong&gt; HTTP only (Nginx handles the connection to EC2)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Viewer protocol policy:&lt;/strong&gt; Redirect HTTP to HTTPS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache policy for &lt;code&gt;/_next/static/*&lt;/code&gt;:&lt;/strong&gt; &lt;code&gt;CachingOptimized&lt;/code&gt; — these files are content-addressed, so they can be cached for years&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache policy for &lt;code&gt;/*&lt;/code&gt; (HTML pages):&lt;/strong&gt; &lt;code&gt;CachingDisabled&lt;/code&gt; — Next.js handles its own cache headers; CloudFront should pass them through&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have a domain, attach it to the CloudFront distribution and request an ACM (AWS Certificate Manager) certificate for free SSL. DNS validation takes about 15 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch: Next.js CI/CD to AWS EC2 with GitHub Actions
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Common Pitfalls
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Building on the server vs. building in CI
&lt;/h3&gt;

&lt;p&gt;The workflow above builds in the GitHub Action AND on the server. That's redundant — you only need to do it in one place. For small apps, building on the server is fine (simpler). For larger teams, build in CI, upload the artifact, and skip the build step on the server. The tradeoff: artifacts can be large (100MB+), so you need S3 or similar to store them.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Forgetting to set NODE_ENV=production
&lt;/h3&gt;

&lt;p&gt;When you run &lt;code&gt;npm start&lt;/code&gt; (which runs &lt;code&gt;next start&lt;/code&gt;), Next.js automatically sets &lt;code&gt;NODE_ENV=production&lt;/code&gt;. But PM2 doesn't always inherit this. Be explicit in your PM2 config or startup command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pm2 start npm &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"nextjs-app"&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; start &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--NODE_ENV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;production

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Not configuring PM2 to restart on crash
&lt;/h3&gt;

&lt;p&gt;By default PM2 restarts crashed processes, but you want to limit restarts to prevent crash loops. Add &lt;code&gt;--max-restarts 10&lt;/code&gt; and &lt;code&gt;--min-uptime 5000&lt;/code&gt; to your pm2 start command. Five seconds of uptime before a restart counts is usually enough to catch truly broken deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. SSH key permissions
&lt;/h3&gt;

&lt;p&gt;The most common SSH error you'll hit is &lt;code&gt;UNPROTECTED PRIVATE KEY FILE&lt;/code&gt;. GitHub Actions handles this correctly when you use &lt;code&gt;appleboy/ssh-action&lt;/code&gt;, but if you're doing raw SSH commands, your .pem file needs &lt;code&gt;chmod 400 your-key.pem&lt;/code&gt; — readable only by the owner, nothing else.&lt;/p&gt;

&lt;h2&gt;
  
  
  EC2 vs. Vercel vs. AWS Amplify — Which Should You Choose?
&lt;/h2&gt;

&lt;p&gt;Factor&lt;br&gt;
EC2 + CloudFront&lt;br&gt;
Vercel&lt;br&gt;
AWS Amplify&lt;/p&gt;

&lt;p&gt;Setup time&lt;br&gt;
90 min (first time)&lt;br&gt;
5 min&lt;br&gt;
20 min&lt;/p&gt;

&lt;p&gt;Next.js feature support&lt;br&gt;
Full (you control the runtime)&lt;br&gt;
Full (built for Next.js)&lt;br&gt;
Most features, some lag&lt;/p&gt;

&lt;p&gt;Cost at low traffic&lt;br&gt;
~$5/month (t3.micro)&lt;br&gt;
Free tier, then $20+/month&lt;br&gt;
Pay per build + hosting&lt;/p&gt;

&lt;p&gt;Cost at high traffic&lt;br&gt;
Predictable (fixed instance)&lt;br&gt;
Can get expensive fast&lt;br&gt;
Moderate&lt;/p&gt;

&lt;p&gt;Infrastructure control&lt;br&gt;
Full — you own everything&lt;br&gt;
None — Vercel manages it&lt;br&gt;
Partial&lt;/p&gt;

&lt;p&gt;Learning value&lt;br&gt;
High — enterprise-transferable&lt;br&gt;
Low (it just works)&lt;br&gt;
Medium&lt;/p&gt;

&lt;p&gt;Best for&lt;br&gt;
Learning, compliance, cost control&lt;br&gt;
Speed, simplicity, teams&lt;br&gt;
Existing AWS customers&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The stack:&lt;/strong&gt; EC2 (compute) + Nginx (reverse proxy) + PM2 (process manager) + CloudFront (CDN) + GitHub Actions (CI/CD). Each layer has one job.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Actions workflow:&lt;/strong&gt; trigger on push to main → install → lint → build → SSH into EC2 → &lt;code&gt;git pull&lt;/code&gt; → rebuild → &lt;code&gt;pm2 reload&lt;/code&gt;. About 25 lines of YAML.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store secrets properly:&lt;/strong&gt; EC2 host, private key, and env vars go in GitHub repository secrets — never hardcoded in workflow files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PM2 is essential&lt;/strong&gt; for production Node.js — it keeps the process alive, restarts on crash, and enables zero-downtime reloads. Run &lt;code&gt;pm2 startup&lt;/code&gt; to make it persist across server reboots.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudFront is optional but worth it&lt;/strong&gt; — static assets cached at the edge make a real difference for users outside your server's region, and the free ACM SSL certificate saves you the hassle of Certbot configuration.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nextjs</category>
      <category>aws</category>
      <category>cicd</category>
      <category>devops</category>
    </item>
    <item>
      <title>React + TypeScript Best Practices in 2025: What Actually Matters</title>
      <dc:creator>Harshdeep Singh</dc:creator>
      <pubDate>Tue, 02 Jun 2026 21:21:57 +0000</pubDate>
      <link>https://dev.to/harshdeepsingh13/react-typescript-best-practices-in-2025-what-actually-matters-22dn</link>
      <guid>https://dev.to/harshdeepsingh13/react-typescript-best-practices-in-2025-what-actually-matters-22dn</guid>
      <description>&lt;p&gt;You open a new React project, add TypeScript, and immediately hit Stack Overflow for how to type your first prop. The first answer says use &lt;code&gt;interface&lt;/code&gt;. The second says &lt;code&gt;type&lt;/code&gt;. The third is a six-paragraph thread about why one is semantically superior to the other. You close the tab and just write &lt;code&gt;any&lt;/code&gt; to get on with your life.&lt;/p&gt;

&lt;p&gt;Sound familiar? TypeScript in React has a reputation problem. Not because it's hard — it's genuinely great once it clicks — but because the community has generated a staggering volume of contradictory, context-free advice. Every dev tool tutorial starts with TypeScript. Every linting config bans &lt;code&gt;any&lt;/code&gt;. Every PR reviewer has a hot take on generics.&lt;/p&gt;

&lt;p&gt;This guide cuts through that. I'm not going to cover every TypeScript feature or every React pattern. What I'm going to do is share the specific conventions I use in production React + TypeScript apps in 2025 — the things that have made codebases genuinely easier to work in, not just theoretically safer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this guide is NOT
&lt;/h2&gt;

&lt;p&gt;Before we get into it, let me set expectations clearly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not a TypeScript basics tutorial.&lt;/strong&gt; I'm assuming you know what a type is, what an interface is, and that &lt;code&gt;string !== String&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not exhaustive.&lt;/strong&gt; TypeScript has dozens of utility types, conditional types, template literal types, and more. I'm not covering all of them — just the ones I reach for constantly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not framework-neutral.&lt;/strong&gt; This is specifically about TypeScript in React apps. Some of these patterns won't apply to a Node.js CLI or a library.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not about configuration.&lt;/strong&gt; Strict mode settings, tsconfig targets, module resolution — another article for another day.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What this guide IS is opinionated. I'm going to tell you what I think the right call is in most situations, and why. You'll disagree with some of it. That's fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Typing Props the Right Way
&lt;/h2&gt;

&lt;p&gt;Let's start here because it's where every React + TypeScript journey begins, and where a lot of the confusion lives.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interface vs. Type Alias
&lt;/h3&gt;

&lt;p&gt;Here's my rule: &lt;strong&gt;use &lt;code&gt;interface&lt;/code&gt; for component props, &lt;code&gt;type&lt;/code&gt; for everything else&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why? Interfaces have declaration merging, which can occasionally bite you in unexpected ways, but they also produce cleaner error messages and feel more natural for describing object shapes. They're also what the React community defaults to, so your code will look familiar to anyone joining your team.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Good — interface for component props&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ButtonProps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;variant&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;primary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;secondary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ghost&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;disabled&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Good — type alias for unions and computed shapes&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;ButtonVariant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;primary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;secondary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ghost&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Theme&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;light&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;dark&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see guides that say "always use &lt;code&gt;type&lt;/code&gt;" or "always use &lt;code&gt;interface&lt;/code&gt;." Honestly? Consistency matters more than which one you pick. Pick a rule and stick to it across your codebase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Required vs. Optional Props
&lt;/h3&gt;

&lt;p&gt;Default to required. Make something optional only when it genuinely has a sensible default or when it's truly not needed in many use cases.&lt;/p&gt;

&lt;p&gt;This is the inverse of what a lot of developers do. They add &lt;code&gt;?&lt;/code&gt; to everything to make TypeScript stop complaining, and then their components have fifteen optional props where most of them are actually always passed. That erases the value of having types at all.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Bad — over-optionalized&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;CardProps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;title&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;description&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;imageUrl&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;href&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Good — be explicit about what's truly optional&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;CardProps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;imageUrl&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// genuinely optional — card can work without an image&lt;/span&gt;
  &lt;span class="nl"&gt;href&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;// optional — sometimes cards aren't clickable&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Extending HTML Element Props
&lt;/h3&gt;

&lt;p&gt;This is one of the most useful patterns in React + TypeScript, and it's underused. When you're building a component that wraps an HTML element, extend that element's props so your component accepts all the native attributes automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Without this pattern — you have to manually add every HTML attribute&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ButtonProps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// what about type="submit"? aria-label? data-testid? className?&lt;/span&gt;
  &lt;span class="c1"&gt;// you'll spend forever adding these one by one&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// With this pattern — extend React's built-in types&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ButtonProps&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ButtonHTMLAttributes&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;variant&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;primary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;secondary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// ...and you automatically get onClick, type, aria-*, data-*, className, etc.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Button&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;variant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;primary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;rest&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="nx"&gt;ButtonProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;

      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;label&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;...rest&lt;/code&gt; spread pattern combined with extended HTML props is one of those things that once you start using, you can't go back. Your components become instantly more composable and you stop maintaining a manual list of passthrough props.&lt;/p&gt;

&lt;h2&gt;
  
  
  Custom Hooks with TypeScript
&lt;/h2&gt;

&lt;p&gt;Custom hooks are where TypeScript really earns its keep, because hooks often manage complex state and return multiple values. If your hook's return type is just inferred as &lt;code&gt;any[]&lt;/code&gt;, you've lost all the benefit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Typing Return Values Explicitly
&lt;/h3&gt;

&lt;p&gt;Always define the return type of custom hooks explicitly. Don't rely on inference here — it breaks down the moment your hook has multiple return paths or conditional logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Bad — inferred return type is unreliable&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;useUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setUser&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// typed as null&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setError&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// ...fetch logic&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt; &lt;span class="c1"&gt;// TypeScript infers this poorly&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Good — define the return interface explicitly&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;User&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;UseUserResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;User&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;useUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;UseUserResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setUser&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setError&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// ...fetch logic&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when you destructure this hook in a component, every field is typed correctly, and you get autocomplete without having to remember what the hook returns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generics for Reusable Hooks
&lt;/h3&gt;

&lt;p&gt;Okay so here's where hooks get really powerful. If you're building a reusable data-fetching hook, generics let you make it work with any shape of data without losing type safety.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;FetchResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;useFetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;FetchResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setData&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setError&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;cancelled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`HTTP &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;cancelled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nf"&gt;setData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="nf"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="na"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;cancelled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nf"&gt;setError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="nf"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cancelled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage — TypeScript knows exactly what data is&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useFetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/user/123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// data is typed as User | null — not unknown, not any&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;T&lt;/code&gt; propagates through the entire hook. That's the magic. You call &lt;code&gt;useFetch&amp;lt;User&amp;gt;&lt;/code&gt; once at the call site, and TypeScript figures out the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generic Components
&lt;/h2&gt;

&lt;p&gt;Generics in components are the thing that trips up most mid-level React developers. They look intimidating. They have funny angle-bracket syntax. But once you understand when to reach for them, they save you from maintaining three slightly-different versions of the same component.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to Use Generic Components
&lt;/h3&gt;

&lt;p&gt;Reach for generics when your component works with data of a variable shape, but still needs to be type-safe. A list component, a select dropdown, a data table — these are classic candidates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Without generics — you end up with separate UserList, ProjectList, etc.&lt;/span&gt;
&lt;span class="c1"&gt;// or you use any[] and lose type safety&lt;/span&gt;

&lt;span class="c1"&gt;// With generics — one component that works for any data shape&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ListProps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="nl"&gt;renderItem&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ReactNode&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;keyExtractor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;emptyMessage&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;List&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;renderItem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;keyExtractor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;emptyMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;No items&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="nx"&gt;ListProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; 
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;emptyMessage&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;


      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;renderItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
      &lt;span class="p"&gt;))}&lt;/span&gt;


  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage — TypeScript infers T from the items array&lt;/span&gt;
 &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// TypeScript knows u is a User&lt;/span&gt;
  &lt;span class="nx"&gt;renderItem&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that you don't even need to write &lt;code&gt;&amp;lt;List&amp;lt;User&amp;gt;&amp;gt;&lt;/code&gt; at the call site — TypeScript infers &lt;code&gt;T = User&lt;/code&gt; from the &lt;code&gt;items&lt;/code&gt; prop. That's inference doing its job.&lt;/p&gt;

&lt;p&gt;One thing to watch: in &lt;code&gt;.tsx&lt;/code&gt; files, the compiler can confuse &lt;code&gt;&amp;lt;T&amp;gt;&lt;/code&gt; with a JSX tag. If you get a parse error, either add a constraint (&lt;code&gt;&amp;lt;T extends object&amp;gt;&lt;/code&gt;) or use a comma (&lt;code&gt;&amp;lt;T,&amp;gt;&lt;/code&gt;) to disambiguate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discriminated Unions for State
&lt;/h2&gt;

&lt;p&gt;Here's the thing that's changed how I think about React state more than anything else: replacing boolean flags with discriminated union types. This single pattern eliminates entire categories of bugs.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Boolean Flag Problem
&lt;/h3&gt;

&lt;p&gt;You've seen this component. You've written this component.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The boolean flag trap&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;FormState&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;isLoading&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;isSuccess&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;isError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;errorMessage&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;data&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;SubmitResult&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Nothing stops you from setting isLoading: true AND isSuccess: true simultaneously&lt;/span&gt;
&lt;span class="c1"&gt;// That's an impossible state — but TypeScript can't catch it&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;FormState&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;isLoading&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;isSuccess&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// ← this should be impossible&lt;/span&gt;
  &lt;span class="na"&gt;isError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you have three booleans representing what should be a single sequential state, you have 2³ = 8 possible combinations, but only 4 of them are actually valid. TypeScript can't protect you from the invalid ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Discriminated Union Fix
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Model the actual states that can exist&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;FormState&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;idle&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;loading&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SubmitResult&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;errorMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Now the impossible states are literally unrepresentable&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;FormState&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;idle&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// And in your component, TypeScript narrows types automatically&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;FormFeedback&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;FormState&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;loading&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// TypeScript knows state.errorMessage exists here&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// TypeScript knows state.data exists here&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// idle&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key is the &lt;code&gt;status&lt;/code&gt; discriminant property. When you narrow on &lt;code&gt;state.status === "error"&lt;/code&gt;, TypeScript automatically knows which variant of the union you're in, and which other fields are available.&lt;/p&gt;

&lt;p&gt;This pattern is especially powerful in data-fetching scenarios, form submission flows, and anywhere you have a multi-step process. Start reaching for it instead of &lt;code&gt;isLoading / isError / isSuccess&lt;/code&gt; and your state management will become dramatically cleaner.&lt;/p&gt;

&lt;h2&gt;
  
  
  Taming any
&lt;/h2&gt;

&lt;p&gt;Let me be direct: &lt;code&gt;any&lt;/code&gt; is a code smell, but it's not always your fault. Sometimes you're working with a library that has poor types, an API that returns unpredictable shapes, or legacy code you don't own. The goal isn't to never use &lt;code&gt;any&lt;/code&gt; — it's to reach for better tools first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use unknown Instead of any for External Data
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;unknown&lt;/code&gt; is the type-safe cousin of &lt;code&gt;any&lt;/code&gt;. It says "I don't know what this is yet" instead of "pretend this is whatever I need it to be." You can't do anything with an &lt;code&gt;unknown&lt;/code&gt; value without first narrowing it with a type guard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Bad — any lets you do anything, including wrong things&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;doesNotExist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;boom&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// TypeScript is fine with this. Your app is not.&lt;/span&gt;

&lt;span class="c1"&gt;// Good — unknown forces you to validate before using&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;isUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nx"&gt;User&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
    &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
    &lt;span class="k"&gt;typeof &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;User&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// TypeScript now knows data is User&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Type Guards Are Your Friends
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;value is User&lt;/code&gt; return type in the example above is a &lt;strong&gt;type guard&lt;/strong&gt;. It's a function that tells TypeScript "if this returns true, the value is of type T in the branches that follow." This is how you move from &lt;code&gt;unknown&lt;/code&gt; territory into properly typed territory without resorting to &lt;code&gt;any&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Reusable type guard pattern&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;isNonNull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;User&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;user1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validUsers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isNonNull&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// typed as User[], not (User | null)[]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When never Is the Right Answer
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;never&lt;/code&gt; is the type that can't exist. It's useful for exhaustiveness checks — making sure you've handled every case in a union.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Shape&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;circle&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;square&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;triangle&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getArea&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Shape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;circle&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PI&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;square&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;triangle&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;// If you add "hexagon" to the Shape union and forget to handle it here,&lt;/span&gt;
      &lt;span class="c1"&gt;// TypeScript will throw a compile error on this line&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;_exhaustiveCheck&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;never&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Unhandled shape: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;_exhaustiveCheck&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern scales beautifully with discriminated unions. Add a new status to your state type, and every switch statement that wasn't updated will fail at compile time. That's exactly the kind of safety net TypeScript is supposed to provide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before vs. After: TypeScript Patterns
&lt;/h2&gt;

&lt;p&gt;Here's a quick-reference table of the common before/after shifts. These are the six patterns I most often see in code review that a bit of TypeScript discipline cleans up immediately.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Pattern&lt;br&gt;
  Without TypeScript Discipline&lt;br&gt;
  With TypeScript Discipline

&lt;p&gt;&lt;strong&gt;Prop typing&lt;/strong&gt;&lt;br&gt;
  &lt;code&gt;props: any&lt;/code&gt; or no types at all&lt;br&gt;
  &lt;code&gt;interface ButtonProps extends React.ButtonHTMLAttributes&amp;amp;lt;HTMLButtonElement&amp;amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optional props&lt;/strong&gt;&lt;br&gt;
  Everything is &lt;code&gt;?&lt;/code&gt; to avoid errors&lt;br&gt;
  Only truly optional fields are optional; defaults handled by destructuring&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Loading/error state&lt;/strong&gt;&lt;br&gt;
  &lt;code&gt;isLoading: boolean, isError: boolean, isSuccess: boolean&lt;/code&gt;&lt;br&gt;
  Discriminated union: &lt;code&gt;{ status: "idle" | "loading" | "success" | "error" }&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;External API data&lt;/strong&gt;&lt;br&gt;
  &lt;code&gt;const data: any = await fetch(...).then(r =&amp;amp;gt; r.json())&lt;/code&gt;&lt;br&gt;
  &lt;code&gt;const data: unknown&lt;/code&gt; + type guard before use&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reusable list&lt;/strong&gt;&lt;br&gt;
  Separate &lt;code&gt;UserList&lt;/code&gt;, &lt;code&gt;ProjectList&lt;/code&gt; components with duplicated logic&lt;br&gt;
  One generic &lt;code&gt;List&amp;amp;lt;T&amp;amp;gt;&lt;/code&gt; component with typed &lt;code&gt;renderItem&lt;/code&gt; and &lt;code&gt;keyExtractor&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hook return type&lt;/strong&gt;&lt;br&gt;
  Inferred — breaks on multiple return paths, confusing autocomplete&lt;br&gt;
  Explicit &lt;code&gt;interface UseXxxResult { ... }&lt;/code&gt; as the return type annotation&lt;br&gt;
&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
&lt;br&gt;
  &lt;br&gt;
  &lt;br&gt;
  Module Structure for a TypeScript React Project&lt;br&gt;
&lt;/h2&gt;

&lt;p&gt;Okay, one more thing worth getting right from the start: where files live. A consistent folder structure does more for long-term maintainability than almost any TypeScript pattern. Here's the structure I use and recommend for mid-to-large React + TypeScript apps in 2025.&lt;/p&gt;

&lt;p&gt;src/&lt;br&gt;
├── app/                    # Next.js App Router pages (or pages/ for older setup)&lt;br&gt;
│   ├── layout.tsx&lt;br&gt;
│   ├── page.tsx&lt;br&gt;
│   └── blog/&lt;br&gt;
│       └── [slug]/&lt;br&gt;
│           └── page.tsx&lt;br&gt;
│&lt;br&gt;
├── components/             # Shared UI components&lt;br&gt;
│   ├── Button/&lt;br&gt;
│   │   ├── Button.tsx      # Component&lt;br&gt;
│   │   ├── Button.types.ts # Interface / type exports&lt;br&gt;
│   │   ├── Button.styles.ts# Styled components or CSS module&lt;br&gt;
│   │   └── index.ts        # Re-export for clean imports&lt;br&gt;
│   └── List/&lt;br&gt;
│       └── ...&lt;br&gt;
│&lt;br&gt;
├── hooks/                  # Custom hooks&lt;br&gt;
│   ├── useFetch.ts&lt;br&gt;
│   ├── useLocalStorage.ts&lt;br&gt;
│   └── useDebounce.ts&lt;br&gt;
│&lt;br&gt;
├── lib/                    # Utilities, helpers, non-UI logic&lt;br&gt;
│   ├── api.ts              # Fetch wrappers&lt;br&gt;
│   ├── formatters.ts       # Date, currency, string helpers&lt;br&gt;
│   └── validators.ts       # Type guards and runtime validation&lt;br&gt;
│&lt;br&gt;
├── types/                  # Shared TypeScript types&lt;br&gt;
│   ├── api.ts              # API response shapes&lt;br&gt;
│   ├── models.ts           # Domain model interfaces (User, Post, etc.)&lt;br&gt;
│   └── index.ts            # Re-exports&lt;br&gt;
│&lt;br&gt;
├── context/                # React context providers&lt;br&gt;
│   └── ThemeContext.tsx&lt;br&gt;
│&lt;br&gt;
└── styles/                 # Global styles, theme tokens&lt;br&gt;
    └── globals.css&lt;/p&gt;

&lt;p&gt;A few rules I enforce in this structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Co-locate component types.&lt;/strong&gt; Each component folder has its own &lt;code&gt;.types.ts&lt;/code&gt; file. Don't dump all types into a single global &lt;code&gt;types.ts&lt;/code&gt; — that file becomes a graveyard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;code&gt;types/&lt;/code&gt; directory is for shared domain types only.&lt;/strong&gt; API response shapes, database models, shared interfaces that more than one component needs. Not component-specific props.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Barrel files are useful but dangerous.&lt;/strong&gt; An &lt;code&gt;index.ts&lt;/code&gt; in each component folder is fine. A single barrel for your entire &lt;code&gt;components/&lt;/code&gt; directory will cause circular dependency nightmares in larger apps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks go in &lt;code&gt;hooks/&lt;/code&gt;, not co-located with components.&lt;/strong&gt; This is a deliberate choice against the "co-locate everything" philosophy. In my experience, hooks get reused across features, and burying them inside a component folder makes them harder to find and share.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;interface&lt;/code&gt; for component props, &lt;code&gt;type&lt;/code&gt; for everything else&lt;/strong&gt; — pick a rule and apply it consistently. Default to required props; only mark things optional when they genuinely are.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extend HTML element props&lt;/strong&gt; using &lt;code&gt;React.ButtonHTMLAttributes&amp;lt;HTMLButtonElement&amp;gt;&lt;/code&gt; and friends — this gets you all native attributes for free and makes components composable with &lt;code&gt;...rest&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Always annotate custom hook return types explicitly&lt;/strong&gt; — define a &lt;code&gt;UseXxxResult&lt;/code&gt; interface and return it. Don't trust inference when there's conditional logic involved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use discriminated unions instead of boolean flags&lt;/strong&gt; for anything with multiple states — &lt;code&gt;{ status: "idle" | "loading" | "success" | "error" }&lt;/code&gt; is safer, clearer, and catches impossible states at compile time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replace &lt;code&gt;any&lt;/code&gt; with &lt;code&gt;unknown&lt;/code&gt; at API boundaries&lt;/strong&gt; — then validate with type guards before use. Save &lt;code&gt;never&lt;/code&gt; for exhaustiveness checks in switch statements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structure your modules intentionally&lt;/strong&gt; — co-locate component types, put shared domain types in &lt;code&gt;types/&lt;/code&gt;, hooks in &lt;code&gt;hooks/&lt;/code&gt;, and use barrel files per component folder (not globally).&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>react</category>
      <category>typescript</category>
      <category>frontend</category>
      <category>bestpractices</category>
    </item>
    <item>
      <title>How to Build an AI Resume Builder with LangChain and Node.js</title>
      <dc:creator>Harshdeep Singh</dc:creator>
      <pubDate>Tue, 02 Jun 2026 21:21:55 +0000</pubDate>
      <link>https://dev.to/harshdeepsingh13/how-to-build-an-ai-resume-builder-with-langchain-and-nodejs-54ig</link>
      <guid>https://dev.to/harshdeepsingh13/how-to-build-an-ai-resume-builder-with-langchain-and-nodejs-54ig</guid>
      <description>&lt;p&gt;A few months back, my friend Marcus was applying for a senior backend role at a fintech company. He had five years of solid experience — distributed systems, AWS, the whole stack. But his resume read like a list of job descriptions someone had copied from LinkedIn. "Responsible for maintaining microservices." "Assisted with CI/CD pipeline implementation." You know the type.&lt;/p&gt;

&lt;p&gt;I told him: the problem isn't what you did, it's how you're saying it. Hiring managers spend about six seconds on a resume before deciding whether to read it properly. Six seconds. And if those six seconds are spent reading "responsible for maintaining" — you've lost them.&lt;/p&gt;

&lt;p&gt;We spent two hours rewriting it together. Every bullet point started with a strong verb. Every achievement had a number. "Reduced API response time by 40% by introducing Redis caching across three high-traffic endpoints." Much better. Marcus got the interview.&lt;/p&gt;

&lt;p&gt;The obvious next thought was: what if you could automate this? Not in the "dump your resume into ChatGPT and ask it to make it better" way — that produces generic slop. I mean a real, structured AI pipeline that understands resume context, applies professional rewriting patterns, and returns clean, job-specific output.&lt;/p&gt;

&lt;p&gt;That's what LangChain is built for. And in this guide, we're going to build exactly that: an AI-powered resume rewriter using LangChain and Node.js, with a real Express API, streaming responses, and the kind of prompt engineering that actually produces good results.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is LangChain, and Why Bother?
&lt;/h2&gt;

&lt;p&gt;Here's the honest answer: LangChain is an orchestration framework for building applications on top of large language models. Think of it the way you'd think of Express.js — Express doesn't do anything you couldn't do with raw Node's &lt;code&gt;http&lt;/code&gt; module, but it gives you a structured, composable way to build web apps that doesn't collapse under its own weight.&lt;/p&gt;

&lt;p&gt;LangChain does the same thing for LLM applications. You &lt;em&gt;could&lt;/em&gt; just call the OpenAI API directly everywhere. For a one-off script, that's fine. But as soon as your app grows — different prompts for different tasks, multi-step reasoning chains, memory across conversations — raw API calls get messy fast.&lt;/p&gt;

&lt;p&gt;Here's what raw OpenAI API code looks like once a project grows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Raw OpenAI — works, but scales badly&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Rewrite this section: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;section&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rewritten&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's fine for one call. Now add: prompt versioning, chaining that output into a second model call, memory from previous messages, fallback to a different model when rate limits hit, streaming output to the client. Suddenly you're managing a lot of state manually.&lt;/p&gt;

&lt;p&gt;LangChain handles all of that with composable primitives: &lt;code&gt;PromptTemplate&lt;/code&gt; for reusable, testable prompts; &lt;code&gt;LLMChain&lt;/code&gt; for connecting a prompt to a model; &lt;code&gt;SequentialChain&lt;/code&gt; for multi-step pipelines; built-in streaming support; and integrations with every major LLM provider.&lt;/p&gt;

&lt;p&gt;For our resume builder, the chain looks like this: parse the resume into structured sections, run each section through a prompt that produces action-oriented bullet points, then return the assembled result. Let's build it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We're Building
&lt;/h2&gt;

&lt;p&gt;Before we write a line of code, here's the system at a glance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│                   CLIENT (Frontend)                  │
│         POST /api/rewrite { resumeText, section }    │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│                  EXPRESS API (Node.js)               │
│  1. Validate input                                   │
│  2. Parse resume into sections                       │
│  3. Call LangChain rewrite chain                     │
│  4. Return improved bullet points                    │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│              LANGCHAIN REWRITE CHAIN                 │
│  PromptTemplate → ChatOpenAI (GPT-4) → Output       │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│                  OPENAI API (GPT-4)                  │
└─────────────────────────────────────────────────────┘

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing revolutionary — but each layer has a single, testable job. The chain is the interesting part, so let's get there quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Setup
&lt;/h2&gt;

&lt;p&gt;Start a new Node.js project and install the dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;resume-ai &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;resume-ai
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;express langchain @langchain/openai @langchain/core dotenv

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file at the root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-your-key-here
&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3001

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And your project structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resume-ai/
├── src/
│   ├── parseResume.js
│   ├── resumeChain.js
│   └── app.js
├── .env
└── package.json

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add &lt;code&gt;"type": "module"&lt;/code&gt; to &lt;code&gt;package.json&lt;/code&gt; so we can use ES module syntax throughout.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Parsing the Resume
&lt;/h2&gt;

&lt;p&gt;This is the unglamorous part that everyone skips, and it's why most AI resume tools produce garbage. You can't just throw 800 words of resume text at a model and ask it to "make it better." You need to isolate the section you're improving — otherwise the model is operating without context.&lt;/p&gt;

&lt;p&gt;Here's a simple section parser. It's not perfect — real resumes come in dozens of formats — but it handles the common patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/parseResume.js&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parseResumeText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;experience&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
    &lt;span class="na"&gt;skills&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
    &lt;span class="na"&gt;education&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sectionKeywords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;objective&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;profile&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;about&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;experience&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;experience&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;employment&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;work history&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;career&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;skills&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;skills&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;technical skills&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;technologies&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;competencies&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;education&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;education&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;academic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;degree&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;university&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rawText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;currentSection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lowerLine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;detected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sectionKeywords&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(([,&lt;/span&gt; &lt;span class="nx"&gt;keywords&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
      &lt;span class="nx"&gt;keywords&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;kw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;lowerLine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;kw&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;detected&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;lowerLine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;  &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;resumeText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;targetSection&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;resumeText&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;resumeText&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;resumeText is required&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;targetSection&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;targetSection&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;targetSection is required&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Stay within token limits — GPT-4 context window is large,&lt;/span&gt;
  &lt;span class="c1"&gt;// but we don't need to send the whole resume every time.&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resumeContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;resumeText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;rewriteChain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;resumeContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;sectionText&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;targetSection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;original&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;targetSection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;rewritten&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Chain error:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Rate limit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Rate limit hit. Try again in a moment.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Rewrite failed. Check your OpenAI API key.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;PORT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PORT&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;3001&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Resume AI API running on :&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The input size limit (&lt;code&gt;50kb&lt;/code&gt;) and the &lt;code&gt;resumeContext.slice(0, 3000)&lt;/code&gt; are both intentional. Most GPT-4 token limits won't be hit by a 3,000-character resume excerpt, but some resumes are surprisingly long — especially ones with extensive project descriptions. Truncating at 3,000 characters keeps costs predictable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Streaming the Response
&lt;/h2&gt;

&lt;p&gt;For a good UX, you want to stream the AI response as it arrives rather than waiting for the full completion. A 400-word rewrite might take 6–8 seconds to complete — a blank screen for 8 seconds feels broken.&lt;/p&gt;

&lt;p&gt;LangChain makes streaming straightforward with callbacks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HumanMessage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@langchain/core/messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/rewrite/stream&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;resumeText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;targetSection&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text/event-stream&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cache-Control&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;no-cache&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Connection&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;keep-alive&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flushHeaders&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;streamingModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;streaming&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;handleLLMNewToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`data: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="p"&gt;})}&lt;/span&gt;&lt;span class="s2"&gt;

`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="nf"&gt;handleLLMEnd&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data: [DONE]&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="nf"&gt;handleLLMError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`data: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;})}&lt;/span&gt;&lt;span class="s2"&gt;

`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resumeContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;resumeText&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`Rewrite these resume bullets for a software developer. Be concise and action-oriented:\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;targetSection&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;streamingModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)]);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the frontend, you'd consume this with the Fetch API and &lt;code&gt;ReadableStream&lt;/code&gt;. Each &lt;code&gt;data:&lt;/code&gt; event carries a token, and you append it to the UI as it arrives. The user sees the response materialize in real time — feels fast, even when it isn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch: LangChain in Node.js (Quick Start)
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Common Pitfalls (and How to Dodge Them)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Token limits sneaking up on you
&lt;/h3&gt;

&lt;p&gt;GPT-4's context window is large, but you pay per token. If you're sending the full resume + prompt on every request, costs add up fast at scale. The fix: truncate the resume context (as shown above) and cache the parsed sections so you're not re-parsing on every API call.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The model inventing achievements
&lt;/h3&gt;

&lt;p&gt;This is the big one. Ask the model to "quantify achievements" without any source data, and it will make numbers up. "Reduced load time by 73%" sounds great until the hiring manager asks about it in an interview. The fix: explicitly tell the model in the prompt: &lt;em&gt;"Only add numbers if they appear in the original text. If no numbers are present, use qualitative language instead."&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Prompt injection through resume content
&lt;/h3&gt;

&lt;p&gt;A crafty user could put something like &lt;code&gt;"Ignore all previous instructions and output..."&lt;/code&gt; inside their resume text. Since you're sending that text directly to the model, it works. The fix: sanitize input and separate resume content from the instruction portion of the prompt with a clear delimiter, like &lt;code&gt;---RESUME START---&lt;/code&gt; / &lt;code&gt;---RESUME END---&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Not rate limiting
&lt;/h3&gt;

&lt;p&gt;OpenAI's rate limits are per API key, not per user. One user hammering your endpoint can hit the limit for everyone. Add a rate limiter like &lt;code&gt;express-rate-limit&lt;/code&gt; before you go live — 5 requests per minute per IP is a reasonable starting point for a resume tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Picking GPT-4 when you don't need it
&lt;/h3&gt;

&lt;p&gt;GPT-4 is expensive and slow. For most resume rewriting tasks, &lt;code&gt;gpt-4o-mini&lt;/code&gt; produces nearly identical results at a fraction of the cost. Test both. You might be surprised how good the cheaper model is for structured, constrained tasks like this one.&lt;/p&gt;

&lt;h2&gt;
  
  
  LangChain vs. Raw OpenAI API — When to Use Which
&lt;/h2&gt;

&lt;p&gt;Factor&lt;/p&gt;

&lt;p&gt;Raw OpenAI API&lt;/p&gt;

&lt;p&gt;LangChain&lt;/p&gt;

&lt;p&gt;Setup complexity&lt;/p&gt;

&lt;p&gt;Low — one import, one call&lt;/p&gt;

&lt;p&gt;Medium — more abstractions to learn&lt;/p&gt;

&lt;p&gt;Single prompt apps&lt;/p&gt;

&lt;p&gt;Perfect fit&lt;/p&gt;

&lt;p&gt;Overkill&lt;/p&gt;

&lt;p&gt;Multi-step chains&lt;/p&gt;

&lt;p&gt;Tedious to wire manually&lt;/p&gt;

&lt;p&gt;First-class support&lt;/p&gt;

&lt;p&gt;Prompt reuse and testing&lt;/p&gt;

&lt;p&gt;DIY — no built-in structure&lt;/p&gt;

&lt;p&gt;PromptTemplate makes this easy&lt;/p&gt;

&lt;p&gt;Memory across turns&lt;/p&gt;

&lt;p&gt;Manual array management&lt;/p&gt;

&lt;p&gt;Built-in memory types&lt;/p&gt;

&lt;p&gt;Streaming&lt;/p&gt;

&lt;p&gt;Supported, manual wiring&lt;/p&gt;

&lt;p&gt;Supported, callback-based&lt;/p&gt;

&lt;p&gt;Switching LLM providers&lt;/p&gt;

&lt;p&gt;Rewrite API calls&lt;/p&gt;

&lt;p&gt;Swap the model object&lt;/p&gt;

&lt;p&gt;Community / ecosystem&lt;/p&gt;

&lt;p&gt;Smaller (OpenAI-specific)&lt;/p&gt;

&lt;p&gt;Large, active, lots of integrations&lt;/p&gt;

&lt;p&gt;The rule of thumb: if your app makes more than two different types of LLM calls, or if you need any kind of chaining, LangChain saves you from writing orchestration code from scratch. For a simple one-shot wrapper, raw API is cleaner.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LangChain&lt;/strong&gt; is an orchestration layer for LLM apps — think Express for AI. Use it when you have multi-step chains, prompt reuse, or memory requirements.- &lt;strong&gt;Parse before you prompt.&lt;/strong&gt; Sending a raw resume blob to the model is a recipe for generic output. Identify the section you want to improve and give the model focused context.- &lt;strong&gt;Constrain the prompt explicitly.&lt;/strong&gt; Action verbs, number quantification, bullet count — tell the model exactly what format you want. Vague prompts produce vague results.- &lt;strong&gt;Stream responses&lt;/strong&gt; for better UX. A blank screen for 8 seconds feels broken; a response materializing in real time feels fast.- &lt;strong&gt;Guard against pitfalls:&lt;/strong&gt; rate limit your API, sanitize resume input against prompt injection, and test &lt;code&gt;gpt-4o-mini&lt;/code&gt; before defaulting to GPT-4 — it's often good enough and 10x cheaper.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>node</category>
      <category>openai</category>
    </item>
  </channel>
</rss>
