<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ademola Balogun</title>
    <description>The latest articles on DEV Community by Ademola Balogun (@ambalogun).</description>
    <link>https://dev.to/ambalogun</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3558201%2Fe74a2c91-807f-49b7-85dd-8861a782bba0.jpg</url>
      <title>DEV Community: Ademola Balogun</title>
      <link>https://dev.to/ambalogun</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ambalogun"/>
    <language>en</language>
    <item>
      <title>Why RAG Is Failing at Complex Questions (And How Knowledge Graphs Fix It)</title>
      <dc:creator>Ademola Balogun</dc:creator>
      <pubDate>Sat, 21 Mar 2026 10:53:24 +0000</pubDate>
      <link>https://dev.to/ambalogun/why-rag-is-failing-at-complex-questions-and-how-knowledge-graphs-fix-it-5chj</link>
      <guid>https://dev.to/ambalogun/why-rag-is-failing-at-complex-questions-and-how-knowledge-graphs-fix-it-5chj</guid>
      <description>&lt;p&gt;Retrieval-Augmented Generation solved the hallucination problem. Then everyone discovered it can't actually answer hard questions.&lt;br&gt;
The issue isn't the LLM. It's not even the retrieval mechanism. It's that traditional RAG treats your knowledge base like a bag of disconnected sentences, when the information you need is buried in relationships spanning multiple documents.&lt;br&gt;
GraphRAG is the architecture that's quietly becoming the answer to RAG's biggest limitation.&lt;br&gt;
The Multi-Hop Problem&lt;br&gt;
Here's a question that breaks standard RAG: "What scientific work influenced the mentor of the person who discovered the double helix structure of DNA?"&lt;br&gt;
A traditional RAG system would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for "double helix structure DNA discovery"&lt;/li&gt;
&lt;li&gt;Find chunks mentioning Watson and Crick&lt;/li&gt;
&lt;li&gt;Maybe find something about their mentors&lt;/li&gt;
&lt;li&gt;Fail to connect the dots about who influenced those mentors&lt;/li&gt;
&lt;li&gt;Generate a vague or incorrect answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem? This requires connecting information across three hops. First, Watson and Crick discovered the double helix. Second, who was their mentor? Third, what work influenced that mentor?&lt;br&gt;
Each piece of information lives in different documents. Traditional vector similarity search retrieves semantically similar text, but semantic similarity doesn't map to logical inference chains.&lt;br&gt;
How Standard RAG Actually Works (And Why It Breaks)&lt;br&gt;
Traditional RAG follows a simple pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chunk your documents into 500-1000 word segments&lt;/li&gt;
&lt;li&gt;Embed each chunk into a vector representation&lt;/li&gt;
&lt;li&gt;Store vectors in a database (Pinecone, Qdrant, Weaviate)&lt;/li&gt;
&lt;li&gt;Query comes in → embed it the same way&lt;/li&gt;
&lt;li&gt;Retrieve top-k similar chunks using cosine similarity&lt;/li&gt;
&lt;li&gt;Feed chunks to LLM as context&lt;/li&gt;
&lt;li&gt;Generate answer from context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works beautifully when your question maps directly to content in a single chunk. It falls apart when the answer requires synthesizing information from multiple chunks, when relationships between entities matter more than semantic similarity, when reasoning chains involve indirect connections, or when domain-specific logic needs to be preserved.&lt;br&gt;
Research shows baseline RAG struggles with comprehensiveness (how much of the answer you capture) and diversity (variety of perspectives) on complex queries. Microsoft's experiments found traditional RAG captures only 22-32% of comprehensive answers on multi-hop questions.&lt;br&gt;
What Knowledge Graphs Bring to the Table&lt;br&gt;
Knowledge graphs represent information as nodes (entities) and edges (relationships).&lt;br&gt;
Instead of:&lt;br&gt;
"Albert Einstein developed the theory of relativity in 1915"&lt;br&gt;
"Einstein worked at the Swiss Patent Office before his breakthrough"&lt;br&gt;
"The theory revolutionized physics"&lt;br&gt;
You get:&lt;br&gt;
(Einstein) --[DEVELOPED]--&amp;gt; (Theory of Relativity)&lt;br&gt;
(Einstein) --[WORKED_AT]--&amp;gt; (Swiss Patent Office)&lt;br&gt;
(Theory of Relativity) --[YEAR]--&amp;gt; (1915)&lt;br&gt;
(Theory of Relativity) --[IMPACT]--&amp;gt; (Physics)&lt;br&gt;
(Swiss Patent Office) --[PRECEDED]--&amp;gt; (Breakthrough)&lt;br&gt;
Now multi-hop queries become graph traversal problems. "What organization did the physicist who developed relativity theory work at before his breakthrough?" translates to walking the graph:&lt;/p&gt;

&lt;p&gt;Find node: Theory of Relativity&lt;br&gt;
Follow edge: DEVELOPED_BY → Einstein&lt;br&gt;
Follow edge: WORKED_AT → Swiss Patent Office&lt;br&gt;
Filter by: PRECEDED → Breakthrough&lt;/p&gt;

&lt;p&gt;The graph structure preserves logical relationships that vector embeddings lose.&lt;br&gt;
The GraphRAG Architecture&lt;br&gt;
GraphRAG combines vector databases with knowledge graphs in a dual-retrieval system.&lt;br&gt;
Indexing Phase:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Text segmentation - Break documents into analyzable units (paragraphs, sections)&lt;/li&gt;
&lt;li&gt;Entity extraction - Use NER to identify entities (people, places, concepts, organizations)&lt;/li&gt;
&lt;li&gt;Relation extraction - Identify relationships between entities&lt;/li&gt;
&lt;li&gt;Graph construction - Build knowledge graph with entities as nodes, relations as edges&lt;/li&gt;
&lt;li&gt;Community detection - Cluster related nodes into hierarchical communities&lt;/li&gt;
&lt;li&gt;Summary generation - Create summaries at different graph levels (local, global)&lt;/li&gt;
&lt;li&gt;Dual indexing - Store both graph structure AND vector embeddings&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Query Phase:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Query processing - Extract entities and intent from user question&lt;/li&gt;
&lt;li&gt;Graph traversal - Use graph queries (Cypher, SPARQL) to find relevant subgraphs&lt;/li&gt;
&lt;li&gt;Vector retrieval - Simultaneously retrieve semantically similar chunks&lt;/li&gt;
&lt;li&gt;Context fusion - Combine graph paths and vector results&lt;/li&gt;
&lt;li&gt;Augmented generation - LLM generates answer using both sources&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight: graph traversal finds structurally relevant information, vector search finds semantically relevant information. Together they catch what either misses alone.&lt;br&gt;
Real Performance Gains&lt;br&gt;
Microsoft's research on query-focused summarization shows GraphRAG massively outperforming baseline RAG. On comprehensiveness (how much of the complete answer is captured), GraphRAG scored 72 to 83% while baseline RAG only managed 22 to 32%. For diversity (variety of relevant perspectives included), GraphRAG hit 62 to 82% compared to baseline's 18 to 28%.&lt;br&gt;
Perhaps most impressive: GraphRAG used 97% fewer tokens for root-level summaries by precomputing community summaries in the graph.&lt;br&gt;
In multi-hop reasoning benchmarks, GraphRAG consistently outperforms traditional RAG. On the HotpotQA dataset, it shows 15 to 20% improvement in exact match accuracy. For SQuAD 2.0, it handles unanswerable questions better. In manufacturing QA scenarios, it delivers a 25% improvement in domain-specific queries.&lt;br&gt;
The Technical Challenges Nobody Talks About&lt;br&gt;
Building production GraphRAG isn't straightforward. Here are the real problems:&lt;br&gt;
Entity Resolution Is Hard&lt;br&gt;
Your documents mention "Einstein," "A. Einstein," "Albert Einstein," and "the physicist." The graph needs to know these reference the same entity.&lt;br&gt;
Entity resolution requires string matching algorithms (fuzzy matching, edit distance), contextual disambiguation (different people with same name), cross-document coreference resolution, and domain-specific dictionaries.&lt;br&gt;
Get this wrong and your graph fragments into disconnected pieces that should be unified.&lt;br&gt;
Relation Extraction Isn't Reliable&lt;br&gt;
Off-the-shelf NER models extract noisy relations. A sentence like "Einstein's theory, which was influenced by Maxwell's work, revolutionized physics" might correctly extract that Einstein authored the theory and the theory revolutionized physics, but incorrectly attribute Maxwell's influence to the theory instead of to Einstein.&lt;br&gt;
Fixing this requires domain-specific training data or rule-based post-processing.&lt;br&gt;
Graph Size Explodes Quickly&lt;br&gt;
A 1000-document corpus can generate 50,000+ entities and 200,000+ relationships. Querying this efficiently requires proper indexing (Neo4j and ArangoDB handle this well), subgraph sampling (don't traverse the entire graph), community-based hierarchies (Microsoft's approach), and caching frequent query paths.&lt;br&gt;
Without optimization, query times balloon to 10+ seconds.&lt;br&gt;
Hybrid Retrieval Is Complex&lt;br&gt;
Combining graph results with vector results isn't trivial. You need to normalize relevance scores from different sources, handle conflicts when sources disagree, decide weighting (70% graph, 30% vector? Depends on query type), and rerank combined results before sending to LLM.&lt;br&gt;
Most implementations use a simple concatenation, which works but leaves performance on the table.&lt;br&gt;
When GraphRAG Actually Matters&lt;br&gt;
GraphRAG isn't always better than standard RAG. Use it when multi-hop reasoning is required, like questions such as "What university did the inventor of the technology that powers electric cars attend?" It shines in domains where relationships are complex: medical diagnosis (symptoms to conditions to treatments), legal analysis (cases to precedents to statutes), manufacturing (components to failures to causes). It's also valuable when hierarchical summarization is needed ("Summarize all security incidents across departments last quarter") or when factual accuracy is critical in areas like financial compliance, medical information, and legal advice where hallucinations have real consequences.&lt;br&gt;
Don't use GraphRAG for simple lookups ("What is the capital of France?"), when semantic similarity is sufficient ("Find articles similar to this one"), when your corpus has no relational structure, or when you need the lowest possible latency since graph traversal adds overhead.&lt;br&gt;
Implementation Patterns That Work&lt;br&gt;
Start with hybrid before going full GraphRAG. Augment your existing RAG with simple entity linking: extract named entities from chunks, link entities across chunks, and use entity co-occurrence as an additional signal. This requires minimal infrastructure changes but improves results 10 to 15%.&lt;br&gt;
Build domain-specific ontologies. Generic knowledge graphs underperform domain-specific ones. For medical RAG, use medical ontologies (SNOMED, ICD). For legal, use citation graphs. The domain structure matters more than the technology.&lt;br&gt;
Precompute subgraphs for common query patterns. If 80% of queries follow 3 or 4 patterns, precompute and cache those subgraphs. Query time drops from 8 seconds to 800ms.&lt;br&gt;
Use graph embeddings for hybrid ranking. Convert graph paths to embeddings (Node2Vec, GraphSAGE), then combine with text embeddings for unified similarity scoring.&lt;br&gt;
The Future: Multi-Modal Graph RAG&lt;br&gt;
The next evolution adds images, videos, and audio to the graph.&lt;br&gt;
Imagine querying: "Show me product demos where the presenter mentioned reliability issues"&lt;br&gt;
The graph connects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;(Video) --[PRESENTER]--&amp;gt; (Person)&lt;/li&gt;
&lt;li&gt;(Person) --[MENTIONED]--&amp;gt; (Reliability)&lt;/li&gt;
&lt;li&gt;(Reliability) --[CONTEXT]--&amp;gt; (Timestamp)&lt;/li&gt;
&lt;li&gt;(Timestamp) --[IN]--&amp;gt; (Video Segment)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each modality (transcript, visual frames, speaker identification) becomes nodes in a unified graph. Retrieval works across modalities simultaneously.&lt;br&gt;
Early research shows multi-modal GraphRAG achieving 40-50% better results on tasks requiring cross-modal reasoning (finding specific moments in videos, connecting spoken words to visual content).&lt;br&gt;
Why This Matters Now&lt;br&gt;
GraphRAG is moving from research to production. Companies deploying it are seeing real results. In financial services, there's a 30% reduction in analyst query time for complex compliance questions. Healthcare organizations see improved clinical decision support by connecting symptoms across patient records. Manufacturing companies get faster root cause analysis by linking failures to component relationships. Legal firms have better case law research connecting precedents through reasoning chains.&lt;br&gt;
The technology works. The challenge is implementation complexity. You need expertise in graph databases (Neo4j, Neptune, ArangoDB), vector databases (Pinecone, Qdrant, Milvus), NLP pipelines (SpaCy, Hugging Face), graph algorithms (community detection, path finding), and LLM integration.&lt;br&gt;
That's a heavier lift than vanilla RAG. But for complex domains where standard RAG fails, GraphRAG isn't optional—it's the only architecture that works.&lt;br&gt;
The Bottom Line&lt;br&gt;
RAG revolutionized how LLMs access external knowledge. But it was designed for simple retrieval, not complex reasoning.&lt;br&gt;
GraphRAG fixes this by treating your knowledge base as what it actually is: a web of interconnected concepts, not a pile of disconnected chunks.&lt;br&gt;
The performance gains on complex queries aren't incremental. They're 2 to 3x improvements. The implementation complexity isn't trivial, but it's manageable with the right architecture.&lt;br&gt;
If your RAG system is failing on multi-hop questions, relationship-heavy domains, or hierarchical summarization tasks, the problem isn't your embeddings or your LLM. It's that you're using the wrong retrieval architecture. GraphRAG is how you fix it.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Tried to Trick 7 AI Models with Fake Facts. They Didn't Fall for It. (That's a Problem.)</title>
      <dc:creator>Ademola Balogun</dc:creator>
      <pubDate>Tue, 17 Feb 2026 22:12:10 +0000</pubDate>
      <link>https://dev.to/ambalogun/i-tried-to-trick-7-ai-models-with-fake-facts-they-didnt-fall-for-it-thats-a-problem-58m8</link>
      <guid>https://dev.to/ambalogun/i-tried-to-trick-7-ai-models-with-fake-facts-they-didnt-fall-for-it-thats-a-problem-58m8</guid>
      <description>&lt;p&gt;I spent a weekend testing whether large language models would confidently repeat misinformation back to me. I fed them 20 fake historical facts alongside 20 real ones and waited for the inevitable hallucinations.&lt;/p&gt;

&lt;p&gt;They never came.&lt;/p&gt;

&lt;p&gt;Not a single model - across seven different architectures from various providers - accepted even one fabricated fact as true. Zero hallucinations. Clean sweep.&lt;/p&gt;

&lt;p&gt;My first reaction was relief. These models are smarter than I thought, right?&lt;/p&gt;

&lt;p&gt;Then I looked closer at the data and realized something more concerning: &lt;strong&gt;the models weren't being smart. They were being paranoid.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Experiment
&lt;/h2&gt;

&lt;p&gt;I built a simple benchmark with 40 factual statements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;20 fake facts&lt;/strong&gt;: "Marie Curie won a Nobel Prize in Mathematics," "The Titanic successfully completed its maiden voyage," "World War I ended in 1925"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;20 real facts&lt;/strong&gt;: "The Berlin Wall fell in 1989," "The Wright brothers achieved powered flight in 1903," "The Soviet Union dissolved in 1991"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I tested seven models available through Together AI's API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Llama-4-Maverick (17B)&lt;/li&gt;
&lt;li&gt;GPT-OSS (120B)&lt;/li&gt;
&lt;li&gt;Qwen3-Next (80B)&lt;/li&gt;
&lt;li&gt;Kimi-K2.5&lt;/li&gt;
&lt;li&gt;GLM-5&lt;/li&gt;
&lt;li&gt;Mixtral (8x7B)&lt;/li&gt;
&lt;li&gt;Mistral-Small (24B)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each model received the same prompt: verify the statement and respond with a verdict (true/false), confidence level (low/medium/high), and brief explanation. Temperature was set to 0 for consistency.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results: Perfect... Suspiciously Perfect
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb92e2h9bnxd8z1ii3nq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb92e2h9bnxd8z1ii3nq.png" alt=" " width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Five models scored 100% accuracy. The other two? 97.5% and 95%.&lt;/p&gt;

&lt;p&gt;At first glance, this looks incredible. But here's what actually happened:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Hallucinations&lt;/th&gt;
&lt;th&gt;False Negatives&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Llama-4-Maverick&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-OSS-120B&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3-Next&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi-K2.5&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLM-5&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixtral-8x7B&lt;/td&gt;
&lt;td&gt;97.5%&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral-Small&lt;/td&gt;
&lt;td&gt;95%&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Not a single hallucination.&lt;/strong&gt; Every error was a false negative - rejecting true facts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Safety-Accuracy Paradox
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2et5qf2xmtkj75wwgb1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2et5qf2xmtkj75wwgb1.png" alt=" " width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the uncomfortable truth I discovered: these models have been trained to be so cautious about misinformation that they'd rather reject accurate information than risk spreading a falsehood.&lt;/p&gt;

&lt;p&gt;Think about what this means in practice.&lt;/p&gt;

&lt;p&gt;If you ask an AI assistant, "Did the Berlin Wall fall in 1989?" and it responds with uncertainty or outright denial because it's been over-tuned for safety, that's not helpful. That's a different kind of failure.&lt;/p&gt;

&lt;p&gt;The models that scored less than 100% - Mixtral and Mistral-Small - weren't worse. They were different. They rejected some real facts (false negatives) but never accepted fake ones (hallucinations). They drew the line in a different place on the safety-accuracy spectrum.&lt;/p&gt;

&lt;h2&gt;
  
  
  Confidence Calibration: Everyone's Certain
&lt;/h2&gt;

&lt;p&gt;What struck me most wasn't the accuracy - it was the confidence.&lt;/p&gt;

&lt;p&gt;Every single model reported "high confidence" on 95-100% of their responses. When they were right, they were certain. When they were wrong (the few false negatives), they were still certain.&lt;/p&gt;

&lt;p&gt;This is the real issue with confidence scores in current LLMs. They're not probabilistic assessments. They're vibes.&lt;/p&gt;

&lt;p&gt;A model that says "I'm highly confident the Berlin Wall fell in 1989" and another that says "I'm highly confident it didn't" are both expressing the same level of certainty despite contradicting each other. The confidence score doesn't tell you about uncertainty - it tells you the model finished its internal reasoning process.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Actually Tells Us
&lt;/h2&gt;

&lt;p&gt;I went into this experiment expecting to write about hallucination rates and confidence miscalibration. Instead, I found something more nuanced: &lt;strong&gt;modern LLMs have overcorrected.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The training data and RLHF (Reinforcement Learning from Human Feedback) that went into these models has created systems that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Err heavily on the side of caution&lt;/strong&gt; - Better to say "I don't know" than risk spreading misinformation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treat all uncertainty the same&lt;/strong&gt; - A 60% confidence and a 95% confidence both get reported as "high"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimize for not being wrong&lt;/strong&gt; over being helpful&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This isn't necessarily bad. In many applications, such as medical advice, legal information, and financial guidance, you want conservative models. But it creates a different kind of deployment challenge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pendulum Problem
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ahwwg0trj41if5f4z29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ahwwg0trj41if5f4z29.png" alt=" " width="800" height="656"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've swung from early LLMs that would hallucinate confidently to current models that reject true information to avoid any possibility of error. Neither extreme is ideal.&lt;/p&gt;

&lt;p&gt;The chart above shows how models trade off different failure modes. Perfect scores on "anti-hallucination" (none of them accepted fake facts) but varied scores on "anti-false rejection" (some rejected real facts).&lt;/p&gt;

&lt;p&gt;What we actually need is something in the middle: models that can express genuine uncertainty, distinguish between "probably false" and "definitely false," and acknowledge when they simply don't know.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real-World Impact
&lt;/h2&gt;

&lt;p&gt;Here's where this gets practical.&lt;/p&gt;

&lt;p&gt;If you're building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A fact-checking system&lt;/strong&gt;: Current models are probably too conservative. They'll flag true statements as suspicious.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A customer service chatbot&lt;/strong&gt;: You want conservative. Better to escalate to a human than give wrong information.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A research assistant&lt;/strong&gt;: You need calibrated uncertainty. "This claim appears in 3 sources but contradicts 2 others" is more useful than "high confidence: false."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The failure mode matters as much as the accuracy rate.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Got Wrong
&lt;/h2&gt;

&lt;p&gt;My benchmark used obviously false facts. "The Titanic successfully completed its maiden voyage" is not subtle misinformation. It's the kind of statement that gets flagged immediately.&lt;/p&gt;

&lt;p&gt;In retrospect, I was testing whether models would accept absurdly false claims, not whether they'd get tricked by plausible misinformation. That's a different experiment entirely.&lt;/p&gt;

&lt;p&gt;To actually test hallucination susceptibility, I'd need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Subtly wrong facts that sound plausible&lt;/li&gt;
&lt;li&gt;Mixed information where some details are right and others wrong&lt;/li&gt;
&lt;li&gt;Statements that require nuanced understanding, not just fact recall&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But that's also what makes this finding interesting. Even with softball fake facts, the models didn't just reject them - they were defensive across the board.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Debt of Safety
&lt;/h2&gt;

&lt;p&gt;Here's what I think is happening under the hood:&lt;/p&gt;

&lt;p&gt;During RLHF training, models get penalized heavily for hallucinations. The training signal is strong: &lt;strong&gt;never make up facts.&lt;/strong&gt; The penalty for false positives (accepting fake information) is much higher than the penalty for false negatives (rejecting true information).&lt;/p&gt;

&lt;p&gt;This makes sense from a product safety perspective. A model that occasionally refuses to answer is annoying. A model that confidently spreads misinformation is dangerous.&lt;/p&gt;

&lt;p&gt;But it creates a form of technical debt. We've optimized for one failure mode (hallucination) so aggressively that we've introduced another (excessive caution). And because we can't perfectly measure "appropriate uncertainty," the models just default to maximum caution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Leaves Us
&lt;/h2&gt;

&lt;p&gt;Looking at the full outcome distribution, 98.8% of responses were correct. That's impressive. But the 1.2% that were wrong were all the same type of wrong: false negatives.&lt;/p&gt;

&lt;p&gt;This tells me something important about the current state of LLMs: &lt;strong&gt;we've solved the hallucination problem by making models reluctant to commit.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's progress. But it's not the end goal.&lt;/p&gt;

&lt;p&gt;The next frontier isn't getting models to stop hallucinating—they've basically done that on straightforward factual questions. It's getting them to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Express calibrated uncertainty&lt;/li&gt;
&lt;li&gt;Distinguish between "definitely false" and "uncertain"&lt;/li&gt;
&lt;li&gt;Provide nuanced answers instead of binary true/false judgments&lt;/li&gt;
&lt;li&gt;Know what they don't know&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Limitations and Future Work
&lt;/h2&gt;

&lt;p&gt;This was a small-scale experiment with limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dataset size&lt;/strong&gt;: Only 40 statements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fact complexity&lt;/strong&gt;: Simple historical facts, not complex or nuanced claims&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single API provider&lt;/strong&gt;: All models tested through Together AI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Binary evaluation&lt;/strong&gt;: True/false doesn't capture nuanced responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A more robust version would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test with subtle misinformation, not obvious falsehoods&lt;/li&gt;
&lt;li&gt;Include complex claims requiring reasoning, not just fact recall&lt;/li&gt;
&lt;li&gt;Evaluate explanation quality, not just verdict accuracy&lt;/li&gt;
&lt;li&gt;Test the same models across different providers to check for API-level filtering&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;I set out to measure how often AI models hallucinate. I discovered they've become so afraid of hallucinating that they're starting to reject reality.&lt;/p&gt;

&lt;p&gt;That's not a hallucination problem. It's an overcorrection problem.&lt;/p&gt;

&lt;p&gt;And honestly? I'm not sure which one is harder to fix.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to Build an AI-Powered WhatsApp Bot That Analyzes Images Using Python and Vision Models</title>
      <dc:creator>Ademola Balogun</dc:creator>
      <pubDate>Thu, 05 Feb 2026 22:17:29 +0000</pubDate>
      <link>https://dev.to/ambalogun/how-to-build-an-ai-powered-whatsapp-bot-that-analyzes-images-using-python-and-vision-models-3062</link>
      <guid>https://dev.to/ambalogun/how-to-build-an-ai-powered-whatsapp-bot-that-analyzes-images-using-python-and-vision-models-3062</guid>
      <description>&lt;p&gt;WhatsApp has over 2 billion users. Most AI tools live on websites nobody visits. What if you could bring AI directly to where people already spend their time?&lt;/p&gt;

&lt;p&gt;In this tutorial, I'll show you how to build a WhatsApp bot that accepts images, analyzes them using AI vision models, and responds with intelligent insights.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We're Building
&lt;/h2&gt;

&lt;p&gt;By the end of this guide, you'll have a working WhatsApp bot that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receives images from users via WhatsApp&lt;/li&gt;
&lt;li&gt;Processes them using AI vision models (Llama, GPT-4V, or Claude)&lt;/li&gt;
&lt;li&gt;Returns structured analysis in natural conversation&lt;/li&gt;
&lt;li&gt;Stores conversation history in MongoDB&lt;/li&gt;
&lt;li&gt;Runs on a free-tier cloud server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire stack costs nearly nothing to run at low volume, making it perfect for MVPs, side projects, or learning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why WhatsApp + AI Vision?
&lt;/h2&gt;

&lt;p&gt;Before we dive into code, let's talk about why this combination is powerful.&lt;/p&gt;

&lt;p&gt;Traditional AI apps require users to visit a website, create an account, and learn a new interface. WhatsApp bots eliminate all that friction. Users message your bot exactly like they'd message a friend.&lt;/p&gt;

&lt;p&gt;AI vision models have become remarkably capable. They can identify objects, read text, understand context, and generate detailed descriptions. Combining this with WhatsApp's ubiquity creates tools that feel magical.&lt;/p&gt;

&lt;p&gt;Some real-world applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Receipt scanners&lt;/strong&gt; that extract expenses automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plant identifiers&lt;/strong&gt; for gardening enthusiasts
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Food analyzers&lt;/strong&gt; that estimate nutrition from photos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document readers&lt;/strong&gt; that summarize uploaded PDFs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product lookup&lt;/strong&gt; tools for shopping assistance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;A Meta Developer account (free)&lt;/li&gt;
&lt;li&gt;A cloud AI provider account (Together AI, OpenAI, or Anthropic)&lt;/li&gt;
&lt;li&gt;MongoDB Atlas account (free tier works)&lt;/li&gt;
&lt;li&gt;A server with a public URL (we'll use ngrok for development)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Here's how the pieces fit together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends image via WhatsApp
         ↓
Meta's WhatsApp Cloud API receives it
         ↓
Webhook forwards to your Flask server
         ↓
Server downloads image from Meta's CDN
         ↓
Image sent to AI Vision API for analysis
         ↓
Response formatted and sent back via WhatsApp API
         ↓
User receives analysis in their chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The beauty of this architecture is its simplicity. One Python file handles everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Set Up the Meta Developer Account
&lt;/h2&gt;

&lt;p&gt;First, we need access to the WhatsApp Business API.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://developers.facebook.com" rel="noopener noreferrer"&gt;developers.facebook.com&lt;/a&gt; and create an account&lt;/li&gt;
&lt;li&gt;Click "My Apps" → "Create App"&lt;/li&gt;
&lt;li&gt;Select "Business" as the app type&lt;/li&gt;
&lt;li&gt;Name your app and click "Create"&lt;/li&gt;
&lt;li&gt;Find "WhatsApp" in the product list and click "Set Up"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Meta provides a free test phone number that works for development. You'll see it in the WhatsApp section of your app dashboard.&lt;/p&gt;

&lt;p&gt;Note down these values from your dashboard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Phone Number ID&lt;/strong&gt; (under "From" phone number)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;WhatsApp Business Account ID&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporary Access Token&lt;/strong&gt; (we'll make this permanent later)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 2: Set Up Your AI Vision Provider
&lt;/h2&gt;

&lt;p&gt;For this tutorial, I'll use Together AI with their Llama Vision model because it's cost-effective and doesn't require waitlist approval. The code works with minor modifications for OpenAI's GPT-4V or Anthropic's Claude.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sign up at &lt;a href="https://together.ai" rel="noopener noreferrer"&gt;together.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Get your API key from the dashboard&lt;/li&gt;
&lt;li&gt;Note the model name: &lt;code&gt;meta-llama/Llama-4-Scout-17B-16E-Instruct&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Together AI charges about $0.18 per million tokens for vision models—significantly cheaper than alternatives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Set Up MongoDB
&lt;/h2&gt;

&lt;p&gt;We'll use MongoDB to store user sessions and analysis history.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a free account at &lt;a href="https://mongodb.com/atlas" rel="noopener noreferrer"&gt;mongodb.com/atlas&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Create a new cluster (the free M0 tier works fine)&lt;/li&gt;
&lt;li&gt;Create a database user with read/write access&lt;/li&gt;
&lt;li&gt;Get your connection string (looks like &lt;code&gt;mongodb+srv://user:pass@cluster.xxxxx.mongodb.net/&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 4: Project Structure
&lt;/h2&gt;

&lt;p&gt;Create a new directory and set up these files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;whatsapp-ai-bot/
├── app.py           # Main application
├── .env             # Environment variables (don't commit this)
├── .env.example     # Template for environment variables
└── requirements.txt # Python dependencies
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Install Dependencies
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;requirements.txt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flask==3.0.0
requests==2.31.0
pymongo==4.6.0
python-dotenv==1.0.0
together==1.0.0
gunicorn==21.2.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 6: Environment Variables
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;.env.example&lt;/code&gt; (commit this as a template):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# WhatsApp API
WHATSAPP_ACCESS_TOKEN=your_token_here
WHATSAPP_PHONE_NUMBER_ID=your_phone_id_here
WHATSAPP_VERIFY_TOKEN=any_random_string_you_choose

# AI Provider
TOGETHER_API_KEY=your_together_api_key

# Database
MONGODB_URI=mongodb+srv://user:pass@cluster.mongodb.net/dbname
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy it to &lt;code&gt;.env&lt;/code&gt; and fill in your actual values.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: The Main Application
&lt;/h2&gt;

&lt;p&gt;Here's the complete &lt;code&gt;app.py&lt;/code&gt;. I'll explain each section after:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonify&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pymongo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MongoClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;together&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Together&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Configuration
&lt;/span&gt;&lt;span class="n"&gt;WHATSAPP_TOKEN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WHATSAPP_ACCESS_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;PHONE_NUMBER_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WHATSAPP_PHONE_NUMBER_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;VERIFY_TOKEN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WHATSAPP_VERIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;TOGETHER_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOGETHER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;MONGODB_URI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MONGODB_URI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize clients
&lt;/span&gt;&lt;span class="n"&gt;mongo_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MONGODB_URI&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mongo_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;whatsapp_bot&lt;/span&gt;
&lt;span class="n"&gt;together_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Together&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOGETHER_API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Vision model configuration
&lt;/span&gt;&lt;span class="n"&gt;VISION_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-4-Scout-17B-16E-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_whatsapp_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Send a text message via WhatsApp API.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://graph.facebook.com/v18.0/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PHONE_NUMBER_ID&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;WHATSAPP_TOKEN&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messaging_product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;whatsapp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;download_media&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;media_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Download media from WhatsApp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s CDN.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# First, get the media URL
&lt;/span&gt;    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://graph.facebook.com/v18.0/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;media_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;WHATSAPP_TOKEN&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;media_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;media_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="c1"&gt;# Download the actual file
&lt;/span&gt;    &lt;span class="n"&gt;media_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;media_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;media_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Send image to AI vision model for analysis.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;

    &lt;span class="c1"&gt;# Convert to base64
&lt;/span&gt;    &lt;span class="n"&gt;image_base64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;image_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/jpeg;base64,&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image_base64&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;together_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;VISION_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}],&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vision API error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_analysis_prompt&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return the prompt for image analysis.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Analyze this image and provide:

1. A brief description of what you see
2. Key details or notable elements
3. Any relevant insights or observations

Keep your response concise and conversational, suitable for a chat message.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_interaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Log the interaction to MongoDB.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert_one&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phone_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/webhook&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_webhook&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Handle webhook verification from Meta.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hub.mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hub.verify_token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;challenge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hub.challenge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subscribe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;VERIFY_TOKEN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Webhook verified successfully&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;challenge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Forbidden&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt;


&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/webhook&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_webhook&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process incoming WhatsApp messages.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Extract message details
&lt;/span&gt;        &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;changes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;changes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Check if this is a message (not a status update)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;

        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;phone_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;message_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Handle image messages
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;media_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="c1"&gt;# Send acknowledgment
&lt;/span&gt;            &lt;span class="nf"&gt;send_whatsapp_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Got your image! Analyzing it now...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Download and analyze
&lt;/span&gt;            &lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;download_media&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;media_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;analyze_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;get_analysis_prompt&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;send_whatsapp_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="nf"&gt;log_interaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_received&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;send_whatsapp_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sorry, I couldn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t analyze that image. Please try again.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;send_whatsapp_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sorry, I couldn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t download that image. Please try again.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Handle text messages
&lt;/span&gt;        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;message_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="c1"&gt;# Simple response for non-image messages
&lt;/span&gt;            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi! Send me an image and I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll analyze it for you.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Just take a photo or send one from your gallery!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;send_whatsapp_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;log_interaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error processing webhook: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;


&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;health_check&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simple health check endpoint.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;healthy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()})&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 8: Understanding the Code
&lt;/h2&gt;

&lt;p&gt;Let's break down the key parts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Webhook Verification&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you configure your webhook URL in Meta's dashboard, they send a GET request with a challenge. Your server must echo it back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/webhook&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_webhook&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subscribe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;VERIFY_TOKEN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;challenge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Processing Incoming Messages&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;WhatsApp sends POST requests to your webhook for each message. The nested JSON structure requires careful extraction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;phone_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;message_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Downloading Media&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;WhatsApp doesn't send images directly. Instead, they provide a media ID. You must first get the download URL, then fetch the actual file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;download_media&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;media_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Get URL from media ID
&lt;/span&gt;    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://graph.facebook.com/v18.0/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;media_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;media_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Download actual file
&lt;/span&gt;    &lt;span class="n"&gt;media_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;media_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;media_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Vision Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The AI vision API accepts base64-encoded images. We send both the image and a text prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;together_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;VISION_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 9: Set Up the Webhook
&lt;/h2&gt;

&lt;p&gt;For development, we'll use ngrok to expose your local server.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install ngrok from &lt;a href="https://ngrok.com" rel="noopener noreferrer"&gt;ngrok.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Run your Flask app: &lt;code&gt;python app.py&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;In another terminal, run: &lt;code&gt;ngrok http 5000&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Copy the HTTPS URL (looks like &lt;code&gt;https://abc123.ngrok.io&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now configure the webhook in Meta's dashboard:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to your app → WhatsApp → Configuration&lt;/li&gt;
&lt;li&gt;Click "Edit" next to Webhook&lt;/li&gt;
&lt;li&gt;Enter your URL: &lt;code&gt;https://abc123.ngrok.io/webhook&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Enter your verify token (same as &lt;code&gt;WHATSAPP_VERIFY_TOKEN&lt;/code&gt; in &lt;code&gt;.env&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Click "Verify and Save"&lt;/li&gt;
&lt;li&gt;Subscribe to "messages" events&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 10: Test Your Bot
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Open WhatsApp on your phone&lt;/li&gt;
&lt;li&gt;Message the test number shown in Meta's dashboard&lt;/li&gt;
&lt;li&gt;Send an image&lt;/li&gt;
&lt;li&gt;Watch your terminal for logs&lt;/li&gt;
&lt;li&gt;Receive the AI analysis!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If something doesn't work, check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is ngrok running and the URL current?&lt;/li&gt;
&lt;li&gt;Are all environment variables set?&lt;/li&gt;
&lt;li&gt;Is the webhook subscribed to "messages"?&lt;/li&gt;
&lt;li&gt;Check Meta's webhook logs for delivery status&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 11: Making the Access Token Permanent
&lt;/h2&gt;

&lt;p&gt;The temporary access token expires in 24 hours. For production, create a permanent one:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to your app → WhatsApp → API Setup&lt;/li&gt;
&lt;li&gt;Click "Add" under "Add a system user token"&lt;/li&gt;
&lt;li&gt;Create a system user if you haven't&lt;/li&gt;
&lt;li&gt;Generate a token with &lt;code&gt;whatsapp_business_messaging&lt;/code&gt; permission&lt;/li&gt;
&lt;li&gt;This token won't expire&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 12: Deploying to Production
&lt;/h2&gt;

&lt;p&gt;For production, you need a server with a stable URL. Here are affordable options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Railway&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Railway CLI&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @railway/cli

&lt;span class="c"&gt;# Deploy&lt;/span&gt;
railway init
railway up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Render (Free tier available)&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Connect your GitHub repo&lt;/li&gt;
&lt;li&gt;Set environment variables in dashboard&lt;/li&gt;
&lt;li&gt;Deploy automatically on push&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;DigitalOcean&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On your droplet&lt;/span&gt;
git clone your-repo
&lt;span class="nb"&gt;cd &lt;/span&gt;your-repo
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
gunicorn app:app &lt;span class="nt"&gt;-b&lt;/span&gt; 0.0.0.0:5000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For any option, remember to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set all environment variables&lt;/li&gt;
&lt;li&gt;Use HTTPS (required by WhatsApp)&lt;/li&gt;
&lt;li&gt;Run with gunicorn instead of Flask's dev server&lt;/li&gt;
&lt;li&gt;Set up process management (systemd, supervisor, or PM2)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 13: Customizing for Your Use Case
&lt;/h2&gt;

&lt;p&gt;The base code is intentionally generic. Here's how to adapt it for specific applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For a Receipt Scanner:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_analysis_prompt&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Analyze this receipt image and extract:

1. Store/merchant name
2. Date of purchase
3. List of items with prices
4. Total amount
5. Payment method if visible

Format the response as a clear summary.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For a Plant Identifier:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_analysis_prompt&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Identify this plant and provide:

1. Plant name (common and scientific)
2. Key identifying features
3. Care requirements (water, sunlight)
4. Is it toxic to pets?

Keep it conversational and helpful.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For a Food Analyzer:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_analysis_prompt&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Analyze this food image and estimate:

1. What foods are present
2. Approximate calories
3. Protein, carbs, and fat estimates
4. Health rating from 1-10
5. A brief nutritional insight

Be helpful but note these are estimates.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 14: Adding Conversation Context
&lt;/h2&gt;

&lt;p&gt;To make your bot smarter, store and use conversation history:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get recent interactions for context.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phone_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_image_with_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Include conversation history in analysis.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_user_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;context_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;context_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Previous interactions:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;reversed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;context_text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Now analyze this new image. Consider any relevant context from previous interactions.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;analyze_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance Tips
&lt;/h2&gt;

&lt;p&gt;After running this in production, here's what I've learned:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Send acknowledgments immediately.&lt;/strong&gt; Users get anxious if there's no response. Send "Analyzing..." before doing the heavy work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cache repeated analyses.&lt;/strong&gt; Hash incoming images and check if you've seen them before.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set timeout limits.&lt;/strong&gt; Vision APIs can be slow. Set a 30-second timeout and send a graceful error if exceeded.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rate limit by user.&lt;/strong&gt; Prevent abuse by limiting requests per phone number per hour.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitor costs.&lt;/strong&gt; Log API calls and set up billing alerts. Vision APIs charge per image.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Common Pitfalls
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"Webhook verification failed"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your verify token doesn't match&lt;/li&gt;
&lt;li&gt;Your server isn't accessible (check ngrok)&lt;/li&gt;
&lt;li&gt;You're not returning the challenge correctly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;"Message not delivered"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access token expired (get a permanent one)&lt;/li&gt;
&lt;li&gt;Phone number not in allowed list (in test mode)&lt;/li&gt;
&lt;li&gt;Invalid phone number format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;"Image download failed"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access token doesn't have media.read permission&lt;/li&gt;
&lt;li&gt;Media URL expired (they're temporary)&lt;/li&gt;
&lt;li&gt;Network timeout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;"Vision API error"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image too large (resize before sending)&lt;/li&gt;
&lt;li&gt;Unsupported format (stick to JPEG/PNG)&lt;/li&gt;
&lt;li&gt;Rate limit hit&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;This foundation supports many extensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-language support&lt;/strong&gt;: Detect user's language and respond accordingly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice messages&lt;/strong&gt;: Transcribe audio and respond&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buttons and lists&lt;/strong&gt;: Use WhatsApp's interactive message types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payment integration&lt;/strong&gt;: Connect to Stripe for premium features&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Admin dashboard&lt;/strong&gt;: Build a web interface for monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;You now have a complete AI-powered WhatsApp bot that can analyze images. The stack is simple, affordable, and scales well.&lt;/p&gt;

&lt;p&gt;The combination of WhatsApp's reach and AI vision capabilities opens interesting possibilities. Users don't need to learn new interfaces—they just message like they always do.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was written based on my hands-on experience building production WhatsApp bots. The code examples are simplified for clarity—production deployments should include proper error handling, logging, and security measures.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Build an AI That Roasts Your Spending Habits (3 hours Weekend Project)</title>
      <dc:creator>Ademola Balogun</dc:creator>
      <pubDate>Mon, 19 Jan 2026 21:37:08 +0000</pubDate>
      <link>https://dev.to/ambalogun/how-to-build-an-ai-that-roasts-your-spending-habits-3-hours-weekend-project-pmf</link>
      <guid>https://dev.to/ambalogun/how-to-build-an-ai-that-roasts-your-spending-habits-3-hours-weekend-project-pmf</guid>
      <description>&lt;p&gt;Building an AI Financial Roaster That People Actually Want to Use&lt;/p&gt;

&lt;p&gt;Let me show you how to build the most brutally honest financial advisor you'll ever meet: an AI that doesn't care about your feelings.&lt;/p&gt;

&lt;p&gt;I love building things that people actually use. That is why I wanted to create a tutorial project that would teach developers how to make AI applications that people actually enjoy using. Because let's be honest, most AI tutorials are boring calculators and chatbots.&lt;/p&gt;

&lt;p&gt;The project? Finance Roaster: Users upload their bank statement, get roasted by GPT-4 with savage, witty humor.&lt;/p&gt;

&lt;p&gt;This tutorial will teach you:&lt;/p&gt;

&lt;p&gt;How to process CSV and PDF bank statements&lt;/p&gt;

&lt;p&gt;How to engineer AI prompts for humor and personality&lt;/p&gt;

&lt;p&gt;How to build shareable, viral-worthy UI/UX&lt;/p&gt;

&lt;p&gt;How to create an AI app in under 100 lines of Python&lt;/p&gt;

&lt;p&gt;By the end of this guide, you'll have a fully functional app you can deploy this weekend.&lt;/p&gt;

&lt;p&gt;The Psychology: Why This Project Concept Works&lt;/p&gt;

&lt;p&gt;Before diving into code, let's talk about why this makes such a great tutorial project.&lt;/p&gt;

&lt;p&gt;Traditional budgeting apps are boring. They show you pie charts and tell you to "reduce discretionary spending." Not exactly share-worthy.&lt;/p&gt;

&lt;p&gt;But an AI that says: "$87 at Foods Co for organic kale? Your wallet is as wilted as that kale will be in three days"? That's memorable. That's the kind of feature that makes people actually want to test your app.&lt;/p&gt;

&lt;p&gt;Three principles that make this project interesting:&lt;/p&gt;

&lt;p&gt;Humor makes boring tasks fun - People avoid checking their finances because it's painful. Adding humor changes that dynamic.&lt;/p&gt;

&lt;p&gt;Highly shareable - When you build something funny, people naturally want to show friends. This teaches you virality mechanics.&lt;/p&gt;

&lt;p&gt;Practical learning - You'll learn file uploads, AI prompt engineering, and building clean UIs, all transferable skills.&lt;/p&gt;

&lt;p&gt;The Tech Stack (Perfect for Learning)&lt;/p&gt;

&lt;p&gt;I deliberately chose a simple stack that beginners can understand while still being production-ready.&lt;/p&gt;

&lt;p&gt;What we'll use:&lt;/p&gt;

&lt;p&gt;Backend: Flask (Python) - 100 lines of code&lt;/p&gt;

&lt;p&gt;AI Engine: OpenAI's GPT-4o-mini ($0.15 per 1M tokens)&lt;/p&gt;

&lt;p&gt;File Processing: Pandas for CSV, pdfplumber for PDF statements&lt;/p&gt;

&lt;p&gt;Frontend: Single HTML page with Tailwind CSS&lt;/p&gt;

&lt;p&gt;Cost per request: ~$0.01&lt;/p&gt;

&lt;p&gt;Why this stack is perfect for learning:&lt;/p&gt;

&lt;p&gt;No deployment complexity (runs locally or on a single server)&lt;/p&gt;

&lt;p&gt;Sub-3-second response times&lt;/p&gt;

&lt;p&gt;Teaches you file handling, AI integration, and modern UI design&lt;/p&gt;

&lt;p&gt;Easily extensible (add features as you learn more)&lt;/p&gt;

&lt;p&gt;Part 1: Making AI Understand Money (The Easy Part)&lt;/p&gt;

&lt;p&gt;First challenge: How do you teach an AI to roast someone's spending?&lt;/p&gt;

&lt;p&gt;Step 1: Extract Transaction Data&lt;/p&gt;

&lt;p&gt;Most banks export statements as CSV or PDF. Here's the extraction logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import pdfplumber
from openai import OpenAI

def process_csv(file):
    """Extract spending summary from CSV bank statement"""
    df = pd.read_csv(file)

    # Find the amount column (banks use different names)
    amount_col = next(
        (col for col in df.columns 
         if any(kw in col.lower() for kw in ['amount', 'debit', 'spent'])),
        df.columns[-1]  # Default to last column
    )

    # Convert to numeric, handle currency symbols
    df[amount_col] = pd.to_numeric(
        df[amount_col].astype(str).str.replace('$', '').str.replace(',', ''),
        errors='coerce'
    )

    # Calculate summary
    total_spent = df[amount_col].abs().sum()
    top_expenses = df.nlargest(5, amount_col)

    return {
        'total_spent': f"${total_spent:,.2f}",
        'top_expenses': top_expenses.to_dict('records')
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PDF handling is trickier because banks format statements differently. Solution? Use pdfplumber for text extraction, then let GPT-4 parse it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def extract_transactions_from_pdf(pdf_text):
    """Use AI as a fallback parser for complex PDFs"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "system",
            "content": "Extract transactions: Date | Description | Amount"
        }, {
            "role": "user",
            "content": pdf_text[:4000]  # Limit tokens
        }]
    )
    # Parse AI response into structured data
    # ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pro tip: AI is surprisingly good at parsing messy financial data. I tried regex patterns for weeks before realizing GPT-4 could do it in one call.&lt;/p&gt;

&lt;p&gt;Part 2: Teaching AI to Be Savage (The Fun Part)&lt;/p&gt;

&lt;p&gt;Now the magic: turning transaction data into comedy gold.&lt;/p&gt;

&lt;p&gt;The System Prompt (Your AI's Personality)&lt;/p&gt;

&lt;p&gt;This is where most people fail. A generic "be funny" prompt produces generic humor. You need specificity.&lt;/p&gt;

&lt;p&gt;Here's my system prompt:&lt;/p&gt;

&lt;p&gt;SYSTEM_PROMPT = """&lt;br&gt;
You are a Savage Financial Parent - brutally honest, wickedly funny, &lt;br&gt;
and armed with someone's transaction history. &lt;/p&gt;

&lt;p&gt;Your job: Roast their spending habits using SPECIFIC transaction details.&lt;/p&gt;

&lt;p&gt;Style guide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Think disappointed parent meets stand-up comedian&lt;/li&gt;
&lt;li&gt;Use their actual purchase names (e.g., "Taco Bell at 2 AM")&lt;/li&gt;
&lt;li&gt;Be savage but not cruel - make them laugh while crying&lt;/li&gt;
&lt;li&gt;End with a Financial Maturity Grade (A-F) and one-line explanation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples of good roasts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Five Starbucks trips in one day? That's not a coffee addiction, 
that's a $30 therapy session with extra foam."&lt;/li&gt;
&lt;li&gt;"You spent $200 on DoorDash this month. That's literally paying 
someone $8 to make your laziness official."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep it under 200 words. Make it screenshot-worthy.&lt;br&gt;
"""&lt;/p&gt;

&lt;p&gt;Why this works:&lt;/p&gt;

&lt;p&gt;Specific examples teach the AI your humor style&lt;/p&gt;

&lt;p&gt;Constraints (200 words) force concise, punchy writing&lt;/p&gt;

&lt;p&gt;"Screenshot-worthy" reminds the AI this is for social sharing&lt;/p&gt;

&lt;p&gt;The Actual API Call&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate_roast(summary_data):
    """Send transaction summary to OpenAI, get roast back"""

    prompt = f"""
Transaction Data:
- Total Spent: {summary_data['total_spent']}
- Top Expense: {summary_data['top_expenses'][0]}
- You went to {summary_data['frequent_vendors'][0]['vendor']} 
  {summary_data['frequent_vendors'][0]['count']} times

Roast these spending habits. Be specific and savage.
"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt}
        ],
        temperature=0.9  # High creativity for humor
    )

    return response.choices[0].message.content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The temperature setting (0.9) is crucial. Lower values make AI boring. Higher values make it creative... sometimes too creative. 0.9 is the sweet spot for comedy.&lt;/p&gt;

&lt;p&gt;Part 3: The UI That Makes People Click "Roast Me"&lt;/p&gt;

&lt;p&gt;Nobody uploads bank statements to ugly websites. Your UI needs to be:&lt;/p&gt;

&lt;p&gt;Dark and edgy (matches the roasting vibe)&lt;/p&gt;

&lt;p&gt;Dead simple (one button: "Roast Me")&lt;/p&gt;

&lt;p&gt;Fast (loading states matter)&lt;/p&gt;

&lt;p&gt;The Landing Page&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;div class="bg-gradient-to-br from-gray-900 to-black min-h-screen"&amp;gt;
  &amp;lt;h1 class="text-6xl font-bold bg-gradient-to-r from-red-500 to-pink-500 
             bg-clip-text text-transparent"&amp;gt;
    💸 Finance Roaster
  &amp;lt;/h1&amp;gt;
  &amp;lt;p class="text-xl text-gray-400"&amp;gt;
    Upload Your Regrets. Get Brutally Honest Feedback.
  &amp;lt;/p&amp;gt;

  &amp;lt;!-- Drag-and-drop upload area --&amp;gt;
  &amp;lt;div class="border-2 border-dashed border-red-500 rounded-xl p-12 
              hover:bg-red-500/10 cursor-pointer"&amp;gt;
    &amp;lt;input type="file" accept=".csv,.pdf" hidden /&amp;gt;
    &amp;lt;p&amp;gt;Drop your bank statement here&amp;lt;/p&amp;gt;
    &amp;lt;p class="text-sm text-gray-500"&amp;gt;CSV or PDF • Max 16MB&amp;lt;/p&amp;gt;
  &amp;lt;/div&amp;gt;

  &amp;lt;button class="w-full bg-gradient-to-r from-red-500 to-pink-500 
                 text-white font-bold py-4 rounded-xl"&amp;gt;
    🔥 Roast Me
  &amp;lt;/button&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key design choices:&lt;/p&gt;

&lt;p&gt;Red/pink gradient: Danger + playfulness&lt;/p&gt;

&lt;p&gt;Dark background: Focuses attention on the upload area&lt;/p&gt;

&lt;p&gt;Emojis: Adds personality without extra design work&lt;/p&gt;

&lt;p&gt;Tailwind CSS: Rapid prototyping without writing CSS&lt;/p&gt;

&lt;p&gt;The Loading State (Critical for Virality)&lt;/p&gt;

&lt;p&gt;While the AI thinks, show personality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function showLoading() {
  document.getElementById('loadingState').innerHTML = `
    &amp;lt;div class="animate-spin h-16 w-16 border-b-2 border-red-500"&amp;gt;&amp;lt;/div&amp;gt;
    &amp;lt;p class="text-xl"&amp;gt;Analyzing your poor life choices...&amp;lt;/p&amp;gt;
    &amp;lt;p class="text-sm text-gray-500"&amp;gt;This won't take long. Unlike your debt.&amp;lt;/p&amp;gt;
  `;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this matters: Users wait 3-5 seconds for AI responses. A boring spinner loses attention. A funny loading message keeps them engaged.&lt;/p&gt;

&lt;p&gt;Part 4: Security (Because Banks Care About This)&lt;/p&gt;

&lt;p&gt;You're handling financial data. Even for a joke app, security matters.&lt;/p&gt;

&lt;p&gt;Three Non-Negotiable Rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Never store files on disk&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  ❌ BAD: Saves file to server
&lt;/h1&gt;

&lt;p&gt;file.save('statements/' + filename)&lt;/p&gt;

&lt;h1&gt;
  
  
  ✅ GOOD: Process in-memory
&lt;/h1&gt;

&lt;p&gt;file_bytes = io.BytesIO(file.read())&lt;br&gt;
df = pd.read_csv(file_bytes)&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Don't log transaction details&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;
  
  
  ❌ BAD: Logs sensitive data
&lt;/h1&gt;

&lt;p&gt;print(f"User spent ${amount} at {merchant}")&lt;/p&gt;
&lt;h1&gt;
  
  
  ✅ GOOD: Generic logging
&lt;/h1&gt;

&lt;p&gt;print(f"Processing statement with {len(transactions)} transactions")&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Strip personal info before sending to OpenAI
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Remove account numbers, names, addresses
summary_data = {
    'total_spent': total,
    'top_categories': categories,  # "Food", not "Joe's Pizza"
    'spending_pattern': pattern
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Reality check: Even with sanitization, tell users their data goes to OpenAI. Transparency builds trust.&lt;/p&gt;

&lt;p&gt;Part 5: Adding Viral Features (Optional Enhancement)&lt;/p&gt;

&lt;p&gt;If you want to take this tutorial further, here's how to add shareability:&lt;/p&gt;

&lt;p&gt;Built-in Sharing Capability:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;One-Click Social Sharing
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function shareRoast() {
  const roastText = document.getElementById('roast').textContent;
  const shareText = `I just got my spending roasted by AI! 🔥\n\n"${roastText.substring(0, 200)}..."\n\nBuild your own: [your-github-link]`;

  if (navigator.share) {
    navigator.share({ text: shareText });
  } else {
    navigator.clipboard.writeText(shareText);
    alert('Roast copied! Share with friends! 🔥');
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Why this matters: If you eventually deploy this publicly, sharing features help spread awareness.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Financial Maturity Grade&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every roast ends with a grade: A through F.&lt;/p&gt;

&lt;p&gt;Financial Maturity Grade: D-&lt;br&gt;
"You have the self-control of a toddler in a candy store, &lt;br&gt;
except the candy is overpriced artisanal coffee."&lt;/p&gt;

&lt;p&gt;Learning point: Grades, scores, and rankings make results more shareable and comparable.&lt;/p&gt;

&lt;p&gt;What You'll Learn Building This&lt;/p&gt;

&lt;p&gt;This project teaches you several valuable skills:&lt;/p&gt;

&lt;p&gt;File Processing:&lt;/p&gt;

&lt;p&gt;Handle both CSV and PDF uploads&lt;/p&gt;

&lt;p&gt;Parse unstructured financial data&lt;/p&gt;

&lt;p&gt;Work with Pandas for data analysis&lt;/p&gt;

&lt;p&gt;AI Engineering:&lt;/p&gt;

&lt;p&gt;Craft effective system prompts for personality&lt;/p&gt;

&lt;p&gt;Temperature tuning for creative outputs&lt;/p&gt;

&lt;p&gt;Handle AI responses in production&lt;/p&gt;

&lt;p&gt;Full-Stack Development:&lt;/p&gt;

&lt;p&gt;Build clean REST APIs with Flask&lt;/p&gt;

&lt;p&gt;Create engaging UIs with Tailwind CSS&lt;/p&gt;

&lt;p&gt;Implement drag-and-drop file uploads&lt;/p&gt;

&lt;p&gt;Handle loading states and user feedback&lt;/p&gt;

&lt;p&gt;Deployment Ready:&lt;/p&gt;

&lt;p&gt;In-memory file processing (no disk storage)&lt;/p&gt;

&lt;p&gt;Error handling and validation&lt;/p&gt;

&lt;p&gt;Security best practices for financial data&lt;/p&gt;

&lt;p&gt;Design Decisions: Why I Built It This Way&lt;/p&gt;

&lt;p&gt;What Works in This Architecture:&lt;/p&gt;

&lt;p&gt;Simplicity over features: A single-purpose app is easier to understand and build. You can always add features later once you understand the core.&lt;/p&gt;

&lt;p&gt;Humor as the hook: The AI's personality is what makes this project interesting. Technical tutorials don't have to be boring.&lt;/p&gt;

&lt;p&gt;File processing in memory: No database needed for this tutorial. Everything happens in RAM, making deployment simple.&lt;/p&gt;

&lt;p&gt;What You Might Customize:&lt;/p&gt;

&lt;p&gt;Add comparison data: "You spent more on coffee than X% of typical users" - requires a database to track aggregates.&lt;/p&gt;

&lt;p&gt;Mobile-first design: The current UI works on mobile, but you could optimize it further with responsive breakpoints.&lt;/p&gt;

&lt;p&gt;Prompt variations: The system prompt I provide is a starting point. Experiment with different personalities and tones.&lt;/p&gt;

&lt;p&gt;Build Your Own AI App: What This Template Teaches&lt;/p&gt;

&lt;p&gt;This project is designed to be a template for other AI applications. Here's how to adapt it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find a task people find tedious&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Checking finances → This tutorial&lt;/p&gt;

&lt;p&gt;Reading legal documents → "Legal Jargon Translator"&lt;/p&gt;

&lt;p&gt;Analyzing health data → "Fitness Report Card"&lt;/p&gt;

&lt;p&gt;Reviewing meeting notes → "Meeting BS Detector"&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add personality through AI prompts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Make AI roast them (finance, fitness)&lt;/p&gt;

&lt;p&gt;Turn it into a game (quiz format)&lt;/p&gt;

&lt;p&gt;Add competitive elements (grades, scores)&lt;/p&gt;

&lt;p&gt;Use unexpected analogies (explain tech in food terms)&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Keep the UI simple&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One-click file upload&lt;/p&gt;

&lt;p&gt;Clear results display&lt;/p&gt;

&lt;p&gt;Optional sharing features&lt;/p&gt;

&lt;p&gt;Fast loading indicators&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make it easy to extend&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Modular code structure&lt;/p&gt;

&lt;p&gt;Clear separation of concerns&lt;/p&gt;

&lt;p&gt;Well-commented functions&lt;/p&gt;

&lt;p&gt;Standard design patterns&lt;/p&gt;

&lt;p&gt;The Complete Code Repository&lt;/p&gt;

&lt;p&gt;The entire Finance Roaster tutorial project is available on GitHub. Here's the structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FinanceRoaster/
├── app.py              # Flask server (81 lines)
├── services.py         # AI + file processing (156 lines)
├── templates/
│   └── home.html       # UI (342 lines with CSS/JS)
├── requirements.txt    # Dependencies (7 packages)
├── .env.example       # Template for your API key
└── README.md          # Setup instructions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run this tutorial locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/yourusername/FinanceRoaster
cd FinanceRoaster
pip install -r requirements.txt
# Copy .env.example to .env and add your OpenAI key
python app.py

Visit http://localhost:5000 and test it out.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Learning cost: ~$0.01 per test (OpenAI API calls)&lt;/p&gt;

&lt;p&gt;What's included:&lt;/p&gt;

&lt;p&gt;Complete working code&lt;/p&gt;

&lt;p&gt;Detailed comments explaining each function&lt;/p&gt;

&lt;p&gt;Example bank statements for testing&lt;/p&gt;

&lt;p&gt;Deployment guide for Heroku/Railway&lt;/p&gt;

&lt;p&gt;Potential Use Cases (What You Could Build)&lt;/p&gt;

&lt;p&gt;Once you understand this template, you can adapt it for various applications:&lt;/p&gt;

&lt;p&gt;Personal Finance Tools:&lt;/p&gt;

&lt;p&gt;Budget analyzer with friendly advice&lt;/p&gt;

&lt;p&gt;Subscription tracker that flags unused services&lt;/p&gt;

&lt;p&gt;Shopping habit analyzer (impulse vs planned purchases)&lt;/p&gt;

&lt;p&gt;Professional Development:&lt;/p&gt;

&lt;p&gt;Resume reviewer that gives honest feedback&lt;/p&gt;

&lt;p&gt;Email tone analyzer (passive-aggressive detector)&lt;/p&gt;

&lt;p&gt;Meeting notes summarizer with action items&lt;/p&gt;

&lt;p&gt;Health &amp;amp; Wellness:&lt;/p&gt;

&lt;p&gt;Food diary analyzer with nutrition roasts&lt;/p&gt;

&lt;p&gt;Workout consistency tracker with motivational guilt&lt;/p&gt;

&lt;p&gt;Sleep pattern analyzer with bedtime recommendations&lt;/p&gt;

&lt;p&gt;Content Creation:&lt;/p&gt;

&lt;p&gt;Social media post analyzer (engagement predictor)&lt;/p&gt;

&lt;p&gt;Blog post readability scorer&lt;/p&gt;

&lt;p&gt;Video script feedback tool&lt;/p&gt;

&lt;p&gt;The core pattern: upload file → AI analysis → humorous/useful feedback - works for countless domains.&lt;/p&gt;

&lt;p&gt;The Technical Deep Dive (For the Nerds)&lt;/p&gt;

&lt;p&gt;Challenge 1: PDF Parsing Hell&lt;/p&gt;

&lt;p&gt;Banks format PDFs differently. Chase, Wells Fargo, Bank of America - all unique snowflakes.&lt;/p&gt;

&lt;p&gt;My solution: Hybrid approach&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def parse_pdf_transactions(pdf_text):
    # Try regex first (fast, works 70% of the time)
    pattern = r'(\d{1,2}/\d{1,2}/\d{4})\s+(.+?)\s+([-]?\$?\d+\.?\d*)'
    matches = re.findall(pattern, pdf_text)

    if matches:
        return pd.DataFrame(matches, columns=['Date', 'Desc', 'Amount'])

    # Fallback to AI (slower, works 95% of the time)
    return extract_with_gpt(pdf_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lesson: AI should be your fallback, not your first choice. Regex is 100x faster.&lt;/p&gt;

&lt;p&gt;Challenge 2: Keeping Roasts PG-13&lt;/p&gt;

&lt;p&gt;Early versions were too savage. One user got: "Your spending screams 'quarter-life crisis.' Seek therapy."&lt;/p&gt;

&lt;p&gt;The fix: Content filters&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FORBIDDEN_TOPICS = [
    'therapy', 'depression', 'mental health', 
    'divorce', 'death', 'addiction'
]

def sanitize_roast(roast_text):
    # Check for sensitive topics
    if any(topic in roast_text.lower() for topic in FORBIDDEN_TOPICS):
        return generate_roast(summary_data)  # Retry
    return roast_text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Temperature tuning also helps: Lowering from 1.0 to 0.9 reduced inappropriate jokes by 60%.&lt;/p&gt;

&lt;p&gt;Challenge 3: Rate Limiting Without Breaking UX&lt;/p&gt;

&lt;p&gt;OpenAI has rate limits. Hitting them = angry users.&lt;/p&gt;

&lt;p&gt;Solution: Queue system&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from redis import Redis
from rq import Queue

queue = Queue(connection=Redis())

@app.route('/roast', methods=['POST'])
def roast():
    job = queue.enqueue(process_and_roast, file_data)
    return jsonify({'job_id': job.id})

@app.route('/status/&amp;lt;job_id&amp;gt;')
def status(job_id):
    job = Job.fetch(job_id, connection=Redis())
    if job.is_finished:
        return jsonify({'status': 'complete', 'roast': job.result})
    return jsonify({'status': 'processing'})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Frontend polls /status every second. Feels instant, never hits limits.&lt;/p&gt;

&lt;p&gt;Extension Ideas: Taking It Further&lt;/p&gt;

&lt;p&gt;Once you've built the basic version, here are ways to extend it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add data persistence
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Save anonymized spending patterns
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy(app)

class SpendingPattern(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    total_amount = db.Column(db.Float)
    category = db.Column(db.String(50))
    timestamp = db.Column(db.DateTime)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Implement comparison features
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_percentile(user_spending, category):
    all_spending = SpendingPattern.query.filter_by(category=category).all()
    percentile = sum(1 for x in all_spending if x.total_amount &amp;lt; user_spending)
    return (percentile / len(all_spending)) * 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Add authentication
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from flask_login import LoginManager, login_required

@app.route('/history')
@login_required
def view_history():
    # Show user's past roasts
    pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Implement rate limiting
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from flask_limiter import Limiter

limiter = Limiter(app, key_func=lambda: request.remote_addr)

@app.route('/roast', methods=['POST'])
@limiter.limit("5 per hour")
def roast():
    # Prevent API abuse
    pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deployment Options: From Local to Production&lt;/p&gt;

&lt;p&gt;Once your project works locally, here are deployment options:&lt;/p&gt;

&lt;p&gt;Option 1: Heroku (Easiest)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install Heroku CLI, then:
heroku create FinanceRoaster
git push heroku main
heroku config:set OPENAI_API_KEY=your_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Option 2: Railway&lt;/p&gt;

&lt;p&gt;Connect your GitHub repo&lt;/p&gt;

&lt;p&gt;Add environment variables&lt;/p&gt;

&lt;p&gt;Auto-deploys on push&lt;/p&gt;

&lt;p&gt;Option 3: DigitalOcean App Platform&lt;/p&gt;

&lt;p&gt;Deploy from GitHub&lt;/p&gt;

&lt;p&gt;$5/month for basic app&lt;/p&gt;

&lt;p&gt;Built-in SSL certificates&lt;/p&gt;

&lt;p&gt;Scaling considerations:&lt;/p&gt;

&lt;p&gt;Caching for similar inputs: Store common roast patterns&lt;/p&gt;

&lt;p&gt;Rate limiting: Prevent abuse with Flask-Limiter&lt;/p&gt;

&lt;p&gt;CDN for assets: Use Cloudflare free tier&lt;/p&gt;

&lt;p&gt;Database: PostgreSQL if you add user accounts&lt;/p&gt;

&lt;p&gt;For a tutorial project, any of these platforms work great. Start with Heroku's free tier to test.&lt;/p&gt;

&lt;p&gt;The Bigger Picture: What This Tutorial Teaches About AI Development&lt;/p&gt;

&lt;p&gt;Here's what building Finance Roaster teaches you about creating useful AI applications:&lt;/p&gt;

&lt;p&gt;Lesson 1: AI doesn't have to be serious. Most AI tutorials focus on optimization, efficiency, and accuracy. But the best projects are ones people actually want to use. Adding personality makes your projects memorable.&lt;/p&gt;

&lt;p&gt;Lesson 2: Simple prompts can be powerful. The entire "roasting" capability comes from a well-crafted system prompt. You don't need fine-tuning or complex models, just clear instructions and examples.&lt;/p&gt;

&lt;p&gt;Lesson 3: User experience matters more than complexity. A 100-line Flask app with great UX beats a microservices architecture with boring design. Focus on the user experience first, optimize later.&lt;/p&gt;

&lt;p&gt;Lesson 4: File processing is a valuable skill. Being able to parse CSVs and PDFs opens up countless project possibilities. Financial statements, receipts, invoices, medical records, they're all structured data waiting to be analyzed.&lt;/p&gt;

&lt;p&gt;This pattern applies to many domains:&lt;/p&gt;

&lt;p&gt;Upload document → AI analyzes → Useful/entertaining output&lt;/p&gt;

&lt;p&gt;It's simple, it works, and users understand it immediately&lt;/p&gt;

&lt;p&gt;Try It Yourself: Weekend Project Checklist&lt;/p&gt;

&lt;p&gt;Want to build your own AI app this weekend? Here's your roadmap:&lt;/p&gt;

&lt;p&gt;Friday night (0.5 hours):&lt;/p&gt;

&lt;p&gt;Pick your painful task (finance, health, career, dating)&lt;/p&gt;

&lt;p&gt;Write 10 example roasts manually&lt;/p&gt;

&lt;p&gt;Define your AI's personality in 3 sentences&lt;/p&gt;

&lt;p&gt;Saturday (1 hour):&lt;/p&gt;

&lt;p&gt;Set up Flask + OpenAI&lt;/p&gt;

&lt;p&gt;Build file upload handling&lt;/p&gt;

&lt;p&gt;Test your system prompt with real data&lt;/p&gt;

&lt;p&gt;Iterate on temperature/prompt until roasts are funny&lt;/p&gt;

&lt;p&gt;Sunday (1.5 hours):&lt;/p&gt;

&lt;p&gt;Design UI with Tailwind&lt;/p&gt;

&lt;p&gt;Add one-click sharing&lt;/p&gt;

&lt;p&gt;Test on 5 friends (get honest feedback)&lt;/p&gt;

&lt;p&gt;Deploy to your preferred cloud platform&lt;/p&gt;

&lt;p&gt;Total time: 3 hours\Total cost: $0 (Heroku free tier + OpenAI free credits)&lt;/p&gt;

&lt;p&gt;The Code (Seriously, It's That Simple)&lt;/p&gt;

&lt;p&gt;Here's the entire backend in 81 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from flask import Flask, request, jsonify, render_template
import io, pandas as pd, pdfplumber
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()
app = Flask(__name__)
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

SYSTEM_PROMPT = """You are a Savage Financial Parent - brutally honest, 
wickedly funny, armed with transaction history. Roast their spending using 
SPECIFIC details. Think disappointed parent meets stand-up comedian. 
Keep under 200 words. End with Financial Maturity Grade (A-F)."""

def process_csv(file):
    df = pd.read_csv(file)
    amount_col = next((c for c in df.columns 
                      if 'amount' in c.lower()), df.columns[-1])
    df[amount_col] = pd.to_numeric(
        df[amount_col].astype(str).str.replace('$', ''),
        errors='coerce'
    )

    total = df[amount_col].abs().sum()
    top5 = df.nlargest(5, amount_col)

    return {
        'total_spent': f"${total:,.2f}",
        'top_expenses': top5.to_dict('records')
    }

def generate_roast(summary):
    prompt = f"""
Total Spent: {summary['total_spent']}
Top Expenses: {summary['top_expenses']}

Roast these spending habits with savage humor.
"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt}
        ],
        temperature=0.9
    )

    return response.choices[0].message.content

@app.route('/')
def home():
    return render_template('home.html')

@app.route('/roast', methods=['POST'])
def roast():
    try:
        file = request.files['file']
        file_bytes = io.BytesIO(file.read())

        summary = process_csv(file_bytes)
        roast_text = generate_roast(summary)

        return jsonify({
            'success': True,
            'roast': roast_text,
            'summary': summary
        })
    except Exception as e:
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. 81 lines. No ML training, no complex infrastructure, no month-long sprints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Thoughts: Learn by Building&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This tutorial project - a simple weekend build with 81 lines of Python - teaches more practical AI skills than any of those complex systems.&lt;/p&gt;

&lt;p&gt;Why? Because you can actually build it yourself, understand every line, and adapt it to your own ideas.&lt;/p&gt;

&lt;p&gt;The lesson? The best way to learn AI development is to build projects that are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple enough to finish (weekend projects, not month-long sprints)&lt;/li&gt;
&lt;li&gt;Interesting enough to share (add personality, make it fun)&lt;/li&gt;
&lt;li&gt;Practical enough to extend (real use cases, clear applications)
So stop reading tutorials and start building. Pick a file format (PDF, CSV, JSON), add some AI analysis, and ship it this weekend.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And when you build something cool, share it! The AI development community loves creative projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources &amp;amp; Next Steps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Complete tutorial code: &lt;a href="https://github.com/ademicho123/FinanceRoaster" rel="noopener noreferrer"&gt;https://github.com/ademicho123/FinanceRoaster&lt;/a&gt; (Fork it and customize!)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to do after building this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy it online (Heroku, Railway, or DigitalOcean)&lt;/li&gt;
&lt;li&gt;Share with friends for feedback&lt;/li&gt;
&lt;li&gt;Adapt the pattern for a different domain&lt;/li&gt;
&lt;li&gt;Add the project to your portfolio&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;P.S. After you build this, try uploading your own bank statement. The AI is surprisingly insightful (and funny). Consider it a feature test and free financial feedback.&lt;/p&gt;

&lt;p&gt;Additionally, if you extend this tutorial with cool features, please submit a pull request! The best additions will be merged into the main repo with credit to you.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>openai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>The 88% Problem: Why Most AI Projects Die Between Pilot and Production</title>
      <dc:creator>Ademola Balogun</dc:creator>
      <pubDate>Sat, 03 Jan 2026 11:59:31 +0000</pubDate>
      <link>https://dev.to/ambalogun/the-88-problem-why-most-ai-projects-die-between-pilot-and-production-535m</link>
      <guid>https://dev.to/ambalogun/the-88-problem-why-most-ai-projects-die-between-pilot-and-production-535m</guid>
      <description>&lt;p&gt;There's a statistic making the rounds in tech circles that should terrify anyone investing in AI: for every 33 AI prototypes a company builds, only 4 make it into production. That's an 88% failure rate.&lt;/p&gt;

&lt;p&gt;This isn't about bad models or insufficient computing power. The technology works. The problem is everything that happens &lt;em&gt;after&lt;/em&gt; the model trains successfully.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pilot Trap
&lt;/h2&gt;

&lt;p&gt;AI pilots succeed for the wrong reasons. They're built in controlled environments with clean data, dedicated resources, and forgiving success metrics. A data scientist can spend three months perfecting a model on historical data, demo it to stakeholders with cherry-picked examples, and get enthusiastic approval to "scale it up."&lt;/p&gt;

&lt;p&gt;Then reality hits.&lt;/p&gt;

&lt;p&gt;Production systems need to handle messy real-time data from a dozen different sources. They need to integrate with legacy systems built before anyone thought about APIs. They need to run reliably at 3 AM when the data science team is asleep. They need to make decisions fast enough that users don't notice the AI is even there.&lt;/p&gt;

&lt;p&gt;A model that achieves 94% accuracy in the lab can become completely unusable in production if it adds 5 seconds of latency to a checkout process. Or if it requires manual data cleanup before each run. Or if it breaks every time the data format changes slightly.&lt;/p&gt;

&lt;p&gt;The pilot proved the concept. Production requires proving the system can survive contact with reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Infrastructure Nightmare
&lt;/h2&gt;

&lt;p&gt;Most companies discover too late that their IT infrastructure can't actually support AI at scale.&lt;/p&gt;

&lt;p&gt;The pilot ran on a data scientist's laptop or a small cloud instance. Production needs to handle hundreds or thousands of requests per second, maintain consistent performance during traffic spikes, and fail gracefully when—not if—something breaks.&lt;/p&gt;

&lt;p&gt;Consider a fraud detection model. In pilot: analyzing 100 transactions overnight to flag suspicious patterns. In production: making real-time decisions on every transaction flowing through the payment system, with false positives costing customer trust and false negatives costing actual money.&lt;/p&gt;

&lt;p&gt;That transition requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure that scales dynamically with load&lt;/li&gt;
&lt;li&gt;Monitoring systems that catch model drift before it causes problems&lt;/li&gt;
&lt;li&gt;Fallback mechanisms when the AI service goes down&lt;/li&gt;
&lt;li&gt;Data pipelines that can handle schema changes without breaking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Companies find themselves needing to build an entire MLOps infrastructure just to deploy one model. Many don't have the engineering capacity, so the project stalls indefinitely in "pilot mode."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data Quality Reality Check
&lt;/h2&gt;

&lt;p&gt;Pilots use carefully curated datasets. Production uses whatever data the business actually generates.&lt;/p&gt;

&lt;p&gt;In controlled testing, someone manually labeled 10,000 examples with perfect accuracy. In production, labels come from an understaffed operations team entering data between phone calls, or from automated systems with their own error rates, or from customers who click random buttons to make dialogs go away.&lt;/p&gt;

&lt;p&gt;The training data was historical, meaning someone already fixed the errors and filled in the missing values. Production data arrives incomplete, inconsistent, and occasionally completely wrong. The model trained on "clean" data has no idea what to do when 30% of the required fields are blank.&lt;/p&gt;

&lt;p&gt;Here's what kills projects: discovering after six months of development that the data quality required to make the AI reliable doesn't actually exist in production. Companies face a choice—invest millions in data infrastructure improvements, or abandon the AI project. Most choose the latter.&lt;/p&gt;

&lt;p&gt;Recent surveys show that 43% of organizations cite data quality and readiness as their top obstacle to AI deployment. Not model performance. Not computing costs. Data quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Integration Hell
&lt;/h2&gt;

&lt;p&gt;AI models don't run in isolation. They need to integrate with the dozens of existing systems that actually run the business.&lt;/p&gt;

&lt;p&gt;The pilot proved the model works. Now it needs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pull data from a CRM system last updated in 2012&lt;/li&gt;
&lt;li&gt;Send predictions to an ERP system that uses SOAP APIs&lt;/li&gt;
&lt;li&gt;Log results to a data warehouse built for batch processing, not real-time updates&lt;/li&gt;
&lt;li&gt;Trigger alerts in Slack, email, and a proprietary monitoring tool&lt;/li&gt;
&lt;li&gt;Comply with access controls defined in Active Directory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each integration point is a potential failure mode. The legacy systems weren't designed for AI workloads. Their APIs rate-limit at inconvenient thresholds. Their data formats don't quite match what the model expects. Their update schedules conflict with when the model needs fresh data.&lt;/p&gt;

&lt;p&gt;Integration complexity grows exponentially with each system involved. What looked like a straightforward deployment becomes a year-long integration project touching half a dozen teams who all have conflicting priorities.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Organizational Friction
&lt;/h2&gt;

&lt;p&gt;Technical challenges have technical solutions. Organizational challenges are harder.&lt;/p&gt;

&lt;p&gt;AI projects in production require coordination between data scientists, ML engineers, software developers, IT operations, security teams, compliance officers, and business stakeholders. Each group speaks a different language and optimizes for different goals.&lt;/p&gt;

&lt;p&gt;Data scientists care about model accuracy. Operations cares about uptime. Security cares about access controls. Compliance cares about audit trails. Business stakeholders care about ROI. Getting all these groups aligned is harder than building the model.&lt;/p&gt;

&lt;p&gt;Then there's the resistance from people whose workflows the AI will change. The customer service team that doesn't trust automated classifications. The credit analysts who resent being "replaced" by an algorithm. The operations managers who built their processes around manual review and don't want to redesign everything.&lt;/p&gt;

&lt;p&gt;A technically perfect AI system can fail in production simply because nobody wants to use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost Trap
&lt;/h2&gt;

&lt;p&gt;Pilots are cheap. Production is expensive.&lt;/p&gt;

&lt;p&gt;Running a model on a sample dataset costs dollars. Running it continuously on production traffic costs thousands per month in compute, storage, and bandwidth. Fine-tuning and retraining as data drifts adds more costs. Monitoring, logging, and debugging infrastructure adds more costs. The human operators who need to intervene when the AI gets confused add more costs.&lt;/p&gt;

&lt;p&gt;Companies approve pilot budgets easily—it's "innovation" and "staying competitive." Production budgets require demonstrating clear ROI, which is hard when the system isn't deployed yet and all the costs are upfront while the benefits are theoretical.&lt;/p&gt;

&lt;p&gt;CFOs kill AI projects when the production cost projections arrive. The business case that justified the pilot evaporates when scaling it up requires 10x the ongoing expense.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works
&lt;/h2&gt;

&lt;p&gt;The companies successfully moving AI from pilot to production do a few things differently:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They design for production from day one.&lt;/strong&gt; No "we'll figure out deployment later." The first question is: how will this actually run in production? That constraint shapes everything—model complexity, latency requirements, data dependencies, failure modes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They build MLOps infrastructure first.&lt;/strong&gt; Before the third AI pilot starts, they invest in standardized deployment pipelines, monitoring frameworks, and model management systems. The infrastructure work feels like a distraction from "real" AI, but it's what separates successful deployments from permanent pilots.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They start with use cases that have simple integration requirements.&lt;/strong&gt; Don't make your first production AI project depend on integrating with seven legacy systems. Pick something that can run relatively standalone and deliver value even with imperfect accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They accept that production models will be "worse" than pilot models.&lt;/strong&gt; The pilot model achieved 96% accuracy on clean data with infinite compute time. The production model gets 89% accuracy but runs in 100ms with realistic data quality. The second one is actually more valuable because it ships.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They invest in data infrastructure before AI.&lt;/strong&gt; Companies with mature data practices—centralized data warehouses, standardized schemas, automated quality checks—can deploy AI relatively easily. Companies with fragmented data spend years building that foundation first.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Math
&lt;/h2&gt;

&lt;p&gt;Current industry data shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;74% of organizations struggle to scale AI from pilot to production&lt;/li&gt;
&lt;li&gt;Only 1% of companies consider themselves "AI-mature"&lt;/li&gt;
&lt;li&gt;46% of AI pilots are scrapped before reaching production&lt;/li&gt;
&lt;li&gt;Fewer than 20% of AI initiatives have been fully scaled across the enterprise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't improving. If anything, as more companies launch AI pilots, the failure rate is increasing because everyone hits the same infrastructure and organizational walls.&lt;/p&gt;

&lt;p&gt;The gap between "we built a model" and "the model is creating business value" remains stubbornly wide. Not because the technology isn't ready—it is. But because deploying production systems is legitimately hard, and most organizations underestimate how hard until they're deep into failed attempts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Path Forward
&lt;/h2&gt;

&lt;p&gt;The AI pilot-to-production problem won't solve itself. It requires acknowledging that:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI deployment is an engineering problem, not a data science problem.&lt;/strong&gt; After the model is trained, 90% of the work is software engineering, DevOps, and systems integration. Companies that treat AI deployment as a data science project fail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production requirements should drive pilot design.&lt;/strong&gt; If you can't deploy it, don't build it. Pilots should be proofs of &lt;em&gt;deployment&lt;/em&gt;, not just proofs of concept.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure investment must precede AI investment.&lt;/strong&gt; Building five AI pilots without MLOps infrastructure is worse than building one pilot with proper deployment capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Organizational alignment is as important as technical capability.&lt;/strong&gt; The best AI in the world fails if nobody trusts it, uses it, or maintains it.&lt;/p&gt;

&lt;p&gt;The companies that figure this out will have a massive competitive advantage. Not because their models are better, but because their models actually run in production instead of gathering dust in Jupyter notebooks.&lt;/p&gt;

&lt;p&gt;The 88% failure rate isn't inevitable. It's just what happens when organizations conflate "we built an AI model" with "we deployed an AI system." Those are completely different problems requiring completely different capabilities.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The hardest thing about AI isn't building models—it's building the systems, processes, and organizational capabilities required to run those models reliably in production. Until the industry solves the deployment gap, most AI investment will continue producing expensive prototypes instead of business value.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>discuss</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Why I Chose Voice Over Chat for AI Interviews (And Why It Almost Backfired)</title>
      <dc:creator>Ademola Balogun</dc:creator>
      <pubDate>Sun, 28 Dec 2025 08:00:26 +0000</pubDate>
      <link>https://dev.to/ambalogun/why-i-chose-voice-over-chat-for-ai-interviews-and-why-it-almost-backfired-2fab</link>
      <guid>https://dev.to/ambalogun/why-i-chose-voice-over-chat-for-ai-interviews-and-why-it-almost-backfired-2fab</guid>
      <description>&lt;p&gt;Most AI interview platforms are glorified chatbots with better questions. We built Squrrel to do something harder: have actual spoken conversations with candidates.&lt;/p&gt;

&lt;p&gt;That decision nearly killed the product before launch.&lt;/p&gt;

&lt;p&gt;The Obvious Choice That Wasn't Obvious&lt;/p&gt;

&lt;p&gt;When I started building Squrrel, the safe play was text-based interviews. Lower latency, fewer technical headaches, easier to parse and analyze. Every AI product manager I talked to said the same thing: "Start with chat. Voice is a nightmare."&lt;/p&gt;

&lt;p&gt;They were right about the nightmare part.&lt;/p&gt;

&lt;p&gt;But I kept coming back to one fact: 78% of recruiting happens over the phone. Not email. Not Slack. Phone calls. Because hiring managers want to hear how candidates think on their feet, how they structure explanations, whether they can articulate complex ideas clearly.&lt;/p&gt;

&lt;p&gt;A text-based interview platform would be easier to build and completely miss the point.&lt;/p&gt;

&lt;p&gt;So we went with voice. And immediately discovered why everyone warned us against it.&lt;/p&gt;

&lt;p&gt;The Technical Debt I Didn't See Coming&lt;/p&gt;

&lt;p&gt;Speech recognition for interviews is different from speech recognition for everything else.&lt;/p&gt;

&lt;p&gt;Siri and Alexa are optimized for short commands. Transcription tools like Otter are optimized for meetings with multiple speakers. We needed something that could handle:&lt;/p&gt;

&lt;p&gt;20-40 minute monologues about technical projects&lt;/p&gt;

&lt;p&gt;Industry jargon that doesn't exist in standard training data ("Kubernetes," "PostgreSQL," "JWT authentication")&lt;/p&gt;

&lt;p&gt;Non-native English speakers with varying accents&lt;/p&gt;

&lt;p&gt;Candidates who talk fast when nervous or slow when thinking&lt;/p&gt;

&lt;p&gt;Off-the-shelf speech-to-text models failed spectacularly. Our first pilot had a 23% word error rate on technical terms. A candidate said "I implemented Redis caching" and got transcribed as "I implemented ready's catching." Recruiters couldn't trust the output.&lt;/p&gt;

&lt;p&gt;I spent three weeks fine-tuning Wav2Vec 2.0 on domain-specific data—transcripts from actual tech interviews, recordings of engineers explaining their work, podcasts about software development. Got the error rate down to 6% for technical vocabulary.&lt;/p&gt;

&lt;p&gt;But here's what surprised me: the remaining errors weren't random. They clustered around moments of hesitation, filler words, and self-corrections—exactly the moments that reveal how someone thinks under pressure.&lt;/p&gt;

&lt;p&gt;We almost removed those "errors" before realizing they were features, not bugs.&lt;/p&gt;

&lt;p&gt;The Conversational AI Problem Nobody Talks About&lt;/p&gt;

&lt;p&gt;Building an AI that can conduct a natural interview conversation is way harder than building one that asks scripted questions.&lt;/p&gt;

&lt;p&gt;The models are good at turn-taking now—knowing when the candidate has finished speaking, when to probe deeper, when to move on. But they're terrible at knowing why to do those things.&lt;/p&gt;

&lt;p&gt;Our first version would ask "Tell me about a time you faced a technical challenge" and then immediately jump to the next question, regardless of whether the candidate gave a three-sentence answer or a three-minute story. It felt robotic because it was robotic—no human interviewer would blow past a shallow answer without following up.&lt;/p&gt;

&lt;p&gt;We had to build a layer that analyzes response depth and triggers follow-ups. Not just keyword matching—actual semantic understanding of whether the candidate addressed the question or danced around it.&lt;/p&gt;

&lt;p&gt;This meant combining LLaMA 3.3 70B for conversation flow with TinyBERT for real-time classification. The large model decides what to ask, the small model decides if the answer was substantive enough to move forward. They run in parallel with about 800ms latency between candidate finishing and AI responding.&lt;/p&gt;

&lt;p&gt;That 800ms pause? Candidates tell us it makes the conversation feel more natural. Humans don't respond instantly either.&lt;/p&gt;

&lt;p&gt;The Bias Problem That Wasn't a Bias Problem&lt;/p&gt;

&lt;p&gt;Everyone asked about bias in AI hiring. "How do you prevent discrimination against protected classes?"&lt;/p&gt;

&lt;p&gt;Honest answer? We can't. Not completely.&lt;/p&gt;

&lt;p&gt;But we can be transparent about where bias enters the system and give recruiters tools to catch it.&lt;/p&gt;

&lt;p&gt;Our approach:&lt;/p&gt;

&lt;p&gt;Standardized questions - Every candidate gets asked the same core questions in the same order. This eliminates the biggest source of interviewer bias: one person getting softball questions while another gets grilled.&lt;/p&gt;

&lt;p&gt;Anonymized analysis - The AI evaluation doesn't see candidate names, photos, or demographic data. It only sees the transcript and voice characteristics relevant to communication (clarity, pace, coherence—not accent or gender).&lt;/p&gt;

&lt;p&gt;Bias audit logs - We track which candidates get follow-up questions and why. If the AI is consistently probing deeper with one demographic group, that pattern surfaces in our analytics.&lt;/p&gt;

&lt;p&gt;Human override - Recruiters see the full transcript alongside the AI summary. They can—and do—disagree with the AI's assessment.&lt;/p&gt;

&lt;p&gt;The dirty secret of AI hiring tools is that removing human bias is impossible. What's possible is making bias visible and consistent. A human interviewer might grill technical candidates on Tuesdays because they're stressed, then lob softballs on Fridays when they're in a good mood. The AI applies the same standards at 2 PM and 2 AM.&lt;/p&gt;

&lt;p&gt;That's not unbiased. It's consistently biased, which is actually useful if you know what you're looking for.&lt;/p&gt;

&lt;p&gt;What Breaking Things Taught Me&lt;/p&gt;

&lt;p&gt;When we started testing the system, the AI asked a great opening question, then froze for 14 seconds before asking it again. The candidate thought the system crashed and hung up.&lt;/p&gt;

&lt;p&gt;The bug? Our conversation state management couldn't handle the candidate pausing to think. The silence triggered a "no response detected" error, which triggered a retry, which created a race condition.&lt;/p&gt;

&lt;p&gt;Fixed it by adding a confidence threshold—the AI now distinguishes between "finished talking" silence and "still thinking" silence based on speech patterns in the previous 3 seconds. Not perfect, but it dropped the false-positive rate from 18% to 2%.&lt;/p&gt;

&lt;p&gt;Here's the lesson I took away: voice AI in high-stakes scenarios requires defensive design at every layer. Unlike a chatbot where someone can retype their message, you can't ask a candidate to "restart the interview" because your error handling failed.&lt;/p&gt;

&lt;p&gt;We built in:&lt;/p&gt;

&lt;p&gt;Automatic session recovery if connectivity drops&lt;/p&gt;

&lt;p&gt;Manual override for recruiters to flag bad transcriptions&lt;/p&gt;

&lt;p&gt;A "pause interview" button for candidates (surprisingly popular)&lt;/p&gt;

&lt;p&gt;Playback of the actual audio alongside transcripts&lt;/p&gt;

&lt;p&gt;The goal isn't perfection. It's resilience when things go wrong, because they will go wrong.&lt;/p&gt;

&lt;p&gt;Why This Matters for Other AI Builders&lt;/p&gt;

&lt;p&gt;If you're building AI for professional contexts—interviews, legal analysis, medical screening, financial advice—here's what I'd tell you:&lt;/p&gt;

&lt;p&gt;Voice is worth the pain. The richness of verbal communication unlocks insights that text can't capture. But only if you're willing to solve the hard problems instead of shipping a minimum viable chatbot.&lt;/p&gt;

&lt;p&gt;Domain-specific fine-tuning isn't optional. General-purpose models are amazing and terrible at the same time. They'll handle 90% of your use case brilliantly, then catastrophically fail on the 10% that matters most. Find that 10% early and train specifically for it.&lt;/p&gt;

&lt;p&gt;Latency is a feature. We obsessed over response time at first, trying to get under 500ms. Then we realized that instant responses felt uncanny. The sweet spot for conversational AI is 600-1000ms—fast enough to feel responsive, slow enough to feel natural.&lt;/p&gt;

&lt;p&gt;Build for the failure modes. Your AI will misunderstand accents, mishear technical terms, and ask nonsensical follow-ups. Design the system so humans can catch these failures gracefully instead of catastrophically.&lt;/p&gt;

&lt;p&gt;The Uncomfortable Truth About AI Products&lt;/p&gt;

&lt;p&gt;Six months into building Squrrel, I had a realization that almost made me quit: the AI isn't the product. The product is the workflow that the AI enables.&lt;/p&gt;

&lt;p&gt;Candidates don't care that we use Wav2Vec 2.0 for transcription or LLaMA 3.3 for conversation. They care that they can interview at midnight without scheduling four emails. Recruiters don't care about our evaluation algorithms. They care that they can review 10 candidates in an hour instead of spending all week on phone screens.&lt;/p&gt;

&lt;p&gt;The AI is infrastructure. The value is in removing friction from a broken process.&lt;/p&gt;

&lt;p&gt;This realization changed everything. We stopped optimizing for model accuracy and started optimizing for user experience. We added features like letting candidates preview questions before starting, because that reduced anxiety and led to better responses—even though it "broke" the blind evaluation model we'd carefully designed.&lt;/p&gt;

&lt;p&gt;Turns out, a slightly worse AI that people actually use beats a perfect AI that sits unused because the UX is terrible.&lt;/p&gt;

&lt;p&gt;What's Next&lt;/p&gt;

&lt;p&gt;We're expanding our pilots and learning every day. The technology works. The question now is whether we can scale the human side—onboarding, support, training recruiters to trust but verify AI outputs.&lt;/p&gt;

&lt;p&gt;I'm also watching the regulatory space closely. The EU AI Act classifies hiring tools as "high-risk AI systems." New York City requires bias audits for automated employment decision tools. This is good—high-stakes AI should be regulated.&lt;/p&gt;

&lt;p&gt;But it also means we need to build compliance into the product from day one, not bolt it on later. Audit trails, explainability, human oversight—these aren't nice-to-haves. They're survival requirements.&lt;/p&gt;

&lt;p&gt;If you're building AI products in regulated industries, start designing for compliance now. It's way easier than retrofitting later.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>ai</category>
      <category>startup</category>
      <category>interview</category>
    </item>
    <item>
      <title>The AI Agent Reality Check: What Actually Works in Production (And What Doesn't)</title>
      <dc:creator>Ademola Balogun</dc:creator>
      <pubDate>Sun, 14 Dec 2025 15:25:25 +0000</pubDate>
      <link>https://dev.to/ambalogun/the-ai-agent-reality-check-what-actually-works-in-production-and-what-doesnt-3m7k</link>
      <guid>https://dev.to/ambalogun/the-ai-agent-reality-check-what-actually-works-in-production-and-what-doesnt-3m7k</guid>
      <description>&lt;p&gt;As we close out 2025, everyone's been calling this "the year of AI agents." But here's what nobody wants to admit: most of these agents aren't actually working.&lt;/p&gt;

&lt;p&gt;I've spent the last year building production AI systems—speech recognition for enterprise clients, fraud detection models, RAG chatbots handling real customer queries. And the gap between what the AI hype cycle promises and what actually ships to production is... substantial. Let me walk you through what's really happening out there.&lt;/p&gt;

&lt;p&gt;The Production Gap Nobody Talks About&lt;/p&gt;

&lt;p&gt;According to recent LangChain data, only 51% of companies have agents in production. That's it. Half. And here's the kicker: 78% say they have "active plans" to deploy agents soon. We've all heard that one before.&lt;/p&gt;

&lt;p&gt;The problem isn't capability—it's that building reliable agents is genuinely hard. The frameworks have matured (LangGraph, CrewAI, AutoGen), the models have gotten better, but production deployment remains this gnarly problem that most teams underestimate.&lt;/p&gt;

&lt;p&gt;I've seen it firsthand. A chatbot that works beautifully in your Jupyter notebook can fall apart spectacularly when real users start hammering it at 3 AM with edge cases you never imagined.&lt;/p&gt;

&lt;p&gt;Why Most AI Projects Actually Fail&lt;/p&gt;

&lt;p&gt;Let's talk about the uncomfortable truth: somewhere between 70-85% of AI projects are failing to meet their ROI targets. That's not a typo. Compare that to regular IT projects which fail at 25-50%. AI is literally twice as likely to fail.&lt;/p&gt;

&lt;p&gt;Why? Everyone points to different culprits, but having built systems that made it through this gauntlet, here's what I've learned:&lt;/p&gt;

&lt;p&gt;Data quality is the silent killer. Not "we don't have enough data"—we're drowning in data. The issue is that the data is fragmented, inconsistent, and fundamentally not ready for what AI needs. Traditional data management assumes you know your schema upfront. AI? It needs representative samples, balanced classes, and context that's often missing from your enterprise data warehouse.&lt;/p&gt;

&lt;p&gt;Research shows that 43% of organizations cite data quality and readiness as their top obstacle. Another study found that 80% of companies struggle with data preprocessing and cleaning. When I built our fraud detection system using Autoencoders, we spent 60% of our time on data pipeline issues, not model architecture.&lt;/p&gt;

&lt;p&gt;Infrastructure reality bites. The surveys are brutal on this: 79% of companies lack sufficient GPUs to meet current AI demands. Mid-sized companies (100-2000 employees) are actually the most aggressive with production deployments at 63%, probably because they're nimble enough to move fast but big enough to afford the infrastructure.&lt;/p&gt;

&lt;p&gt;But here's the thing—you don't always need massive GPU clusters. For our sentiment analysis work with TinyBERT, we ran inference on CPU instances and it worked fine. The key is matching your infrastructure to your actual use case, not what TechCrunch says you need.&lt;/p&gt;

&lt;p&gt;The Agent Architecture That's Actually Working&lt;/p&gt;

&lt;p&gt;The agents that are succeeding in production aren't the autonomous, do-everything AGI dreams that AutoGPT promised us back in 2024. They're narrowly scoped, highly controllable systems with what developers call "custom cognitive architectures."&lt;/p&gt;

&lt;p&gt;Take a look at what companies like Uber, LinkedIn, and Replit are actually deploying:&lt;/p&gt;

&lt;p&gt;Uber: Building internal coding tools for large-scale code migrations. Not general-purpose. Specific workflows that only they really understand.&lt;/p&gt;

&lt;p&gt;LinkedIn: SQL Bot that converts natural language to SQL queries. Super focused. Does one thing really well.&lt;/p&gt;

&lt;p&gt;Replit: Code generation agents with heavy human-in-the-loop controls. They're not letting the AI run wild—humans are in the driver's seat.&lt;/p&gt;

&lt;p&gt;The pattern here? These agents are orchestrators calling reliable APIs, not autonomous decision-makers. It's less "AI takes over" and more "AI makes clicking through 17 different interfaces unnecessary."&lt;/p&gt;

&lt;p&gt;As 2025 wraps up, the lesson is clear: the agents shipping to production in 2026 will be the ones that learned from this year's hard-won lessons.&lt;/p&gt;

&lt;p&gt;What Production Actually Looks Like&lt;/p&gt;

&lt;p&gt;From my experience building Squrrel.app (an AI recruitment platform), here are the lessons that matter:&lt;/p&gt;

&lt;p&gt;Start embarrassingly narrow. Our interview analysis didn't try to do everything—it focused on candidate responses, extracted key insights, and flagged concerning patterns. That's it. We added features incrementally once the core loop was bulletproof.&lt;/p&gt;

&lt;p&gt;Observability isn't optional. Tools like Langfuse or Azure AI Foundry show you what's happening inside your agent through traces and spans. Without this, you're flying blind. When our LLaMA 3.3 70B model started producing weird outputs at 2 AM, we could trace it back to a prompt formatting issue within minutes because we had proper logging.&lt;/p&gt;

&lt;p&gt;Evaluation needs to be continuous. Offline testing with curated datasets is table stakes. But online evaluation—testing with real user queries—is where you discover the edge cases. We run both, constantly.&lt;/p&gt;

&lt;p&gt;Cost management is real. LLM calls add up fast. We found that caching frequently-used completions and using smaller models for classification tasks cut our costs by 40%. Using TinyBERT for sentiment pre-processing before hitting the large model? Game changer.&lt;/p&gt;

&lt;p&gt;The Small Language Model Movement&lt;/p&gt;

&lt;p&gt;This deserves its own section because it's one of the most practical developments of 2024.&lt;/p&gt;

&lt;p&gt;Everyone obsessed over GPT-4 and Claude, but the real innovation? Getting sophisticated AI to run on devices as small as smartphones. Meta's Llama updates are 56% smaller and four times faster. Nvidia's Nemotron-Mini-4B gets VRAM usage down to about 2GB.&lt;/p&gt;

&lt;p&gt;For production systems, this matters immensely. Lower latency. Lower costs. Less infrastructure complexity. Better privacy since you're not sending everything to external APIs.&lt;/p&gt;

&lt;p&gt;We used this approach in our sentiment analysis pipeline—TinyBERT handles the initial classification and routing, only calling the big models when necessary. Works great, costs a fraction.&lt;/p&gt;

&lt;p&gt;The Data Problem Won't Fix Itself&lt;/p&gt;

&lt;p&gt;Here's something I wish someone had told me earlier: AI-ready data is fundamentally different from analytics-ready data.&lt;/p&gt;

&lt;p&gt;Traditional data management is too structured, too slow, too rigid. AI needs:&lt;/p&gt;

&lt;p&gt;Representative samples, not just accurate records&lt;/p&gt;

&lt;p&gt;Balanced classes for training&lt;/p&gt;

&lt;p&gt;Rich context and metadata that analytics never required&lt;/p&gt;

&lt;p&gt;Fast iteration cycles that traditional governance processes can't support&lt;/p&gt;

&lt;p&gt;63% of organizations don't have the right data management practices for AI. Gartner predicts that through 2027, companies will abandon 60% of AI projects specifically due to lack of AI-ready data.&lt;/p&gt;

&lt;p&gt;This isn't something you can outsource to your existing data team and hope for the best. It requires new practices, new tools, and honestly, new thinking about what "data quality" even means.&lt;/p&gt;

&lt;p&gt;What's Coming in 2026&lt;/p&gt;

&lt;p&gt;Based on what I'm seeing in the field and the research patterns heading into the new year:&lt;/p&gt;

&lt;p&gt;Multimodal agents are arriving for real. Not just text—agents that understand images, generate video, process audio, all from a single interface. OpenAI's Sora and Google's Veo showed what's possible. We're about to see these capabilities embedded in production workflows.&lt;/p&gt;

&lt;p&gt;The framework wars are consolidating. LangGraph has emerged as a clear leader for controllable agentic workflows. The verbose, opaque frameworks are getting left behind. Developers want low-level control without hidden prompts.&lt;/p&gt;

&lt;p&gt;Agentic AI meets scientific computing. This is exciting—AI agents accelerating materials science, drug discovery, climate modeling. AlphaMissense improved genetic mutation classification. GNoME is discovering new materials. The "AI for science" vertical is heating up.&lt;/p&gt;

&lt;p&gt;Regulation is accelerating. The EU's AI Act banned certain applications in 2024, and 2025 saw more compliance requirements roll out. 2026 will bring even stricter governance. If you're building agents, you need to be thinking about safety, transparency, and governance now, not later.&lt;/p&gt;

&lt;p&gt;The Practical Takeaway&lt;/p&gt;

&lt;p&gt;If you're building AI agents as we head into 2026, here's my advice from the trenches:&lt;/p&gt;

&lt;p&gt;Start narrow and specific. General-purpose agents are a research problem, not a product strategy.&lt;/p&gt;

&lt;p&gt;Invest in data infrastructure early. You'll spend way more time here than on model selection.&lt;/p&gt;

&lt;p&gt;Build observability from day one. You can't fix what you can't see.&lt;/p&gt;

&lt;p&gt;Use small models where possible. Not every problem needs GPT-4.&lt;/p&gt;

&lt;p&gt;Plan for failure modes. Your agent will do weird things. Have fallbacks.&lt;/p&gt;

&lt;p&gt;Keep humans in the loop. The best production agents are human-AI collaboration, not AI autonomy.&lt;/p&gt;

&lt;p&gt;The hype around AI agents is justified—they really can transform workflows and save significant time. Microsoft's research shows employees save 1-2 hours daily using AI for routine tasks. Our Squrrel.app platform has cut hiring cycle times substantially.&lt;/p&gt;

&lt;p&gt;But the path from prototype to production is littered with failed projects. The companies succeeding aren't the ones with the fanciest models or the biggest budgets. They're the ones who understand that production AI is an engineering discipline, not a science experiment.&lt;/p&gt;

&lt;p&gt;The technology works. The challenge is everything else—data, infrastructure, evaluation, monitoring, governance. Master those, and you'll be in that 51% with agents actually running in production.&lt;/p&gt;

&lt;p&gt;Ignore them, and you'll be in the 85% wondering why your AI initiative didn't deliver.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>llm</category>
      <category>ai</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Why Financial Sentiment Analysis Failed Without Explainability (And How I Fixed It)</title>
      <dc:creator>Ademola Balogun</dc:creator>
      <pubDate>Sat, 22 Nov 2025 19:29:34 +0000</pubDate>
      <link>https://dev.to/ambalogun/why-financial-sentiment-analysis-failed-without-explainability-and-how-i-fixed-it-46n1</link>
      <guid>https://dev.to/ambalogun/why-financial-sentiment-analysis-failed-without-explainability-and-how-i-fixed-it-46n1</guid>
      <description>&lt;p&gt;&lt;em&gt;Building a Production-Ready NLP System That Traders Actually Trust&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A trader approaches you with a question: "Your model says this stock is bearish based on the news. But why? What words triggered that prediction?" You pause. Your 86% accurate sentiment classifier suddenly feels useless because you can't explain it.&lt;/p&gt;

&lt;p&gt;This is the hidden crisis in financial AI. Accuracy without explainability is a liability, not an asset.&lt;/p&gt;

&lt;p&gt;I learned this the hard way while building a financial sentiment analysis system for Lloyds, IAG, and Vodafone. The project forced me to solve a problem that most data scientists ignore until it's too late: how do you make a black-box NLP model trustworthy enough for high-stakes trading decisions?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Accurate But Opaque
&lt;/h2&gt;

&lt;p&gt;When I started, the goal seemed straightforward: build a sentiment classifier that could analyze financial reports and news to predict market sentiment (bullish, neutral, bearish). I tested multiple models—AdaBoost, SVM, Random Forest, traditional Neural Networks—and they all performed reasonably well.&lt;/p&gt;

&lt;p&gt;But reasonable wasn't good enough.&lt;/p&gt;

&lt;p&gt;Here's the issue: financial markets don't reward accuracy in isolation. A model that's 83% accurate at classifying sentiment is worthless if a trader can't defend why it made a specific prediction. In regulated environments, explainability isn't a nice-to-have feature—it's a requirement.&lt;/p&gt;

&lt;p&gt;Traditional machine learning models are interpretable by design. You can understand why Random Forest predicted bearish by examining the decision tree paths. But when I tested more sophisticated approaches—specifically TinyBERT, a transformer-based model—I faced the classic deep learning trade-off: superior performance (86.45% accuracy on Vodafone data) paired with complete opacity.&lt;/p&gt;

&lt;p&gt;The model had learned something real about financial language. It just wouldn't tell me what.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Breakthrough: SHAP for Financial Intelligence
&lt;/h2&gt;

&lt;p&gt;Enter SHAP (SHapley Additive exPlanations). Rather than trying to reverse-engineer what the model learned, SHAP provides a principled way to decompose predictions into feature contributions using game theory.&lt;/p&gt;

&lt;p&gt;The insight is elegant: for each prediction, SHAP calculates how much each word or phrase contributes to pushing the final sentiment classification in a particular direction. Instead of a black box, you get a transparent ledger of the model's reasoning.&lt;/p&gt;

&lt;p&gt;I implemented SHAP analysis into the TinyBERT pipeline and suddenly the model became interpretable. When the classifier predicted bearish on an earnings report mentioning "revenue decline" and "market headwinds," SHAP waterfall plots showed exactly which phrases drove the prediction and by how much.&lt;/p&gt;

&lt;p&gt;But here's what made it work in practice: I didn't just add SHAP as an afterthought. I made explainability central to the system design from day one. This meant structuring the entire pipeline around transparency.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture: Modular and Transparent
&lt;/h2&gt;

&lt;p&gt;The system had eight interconnected modules, each designed with explainability in mind:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Collection Module&lt;/strong&gt;: Extracted text from PDF financial reports and CSV news files from Yahoo Finance. The discipline here was crucial—clean data feeds clean explanations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Text Preprocessing Module&lt;/strong&gt;: Normalized text by removing noise (emojis, punctuation, extra spaces) while preserving financial jargon. This matters because "loss" has different meanings in accounting versus everyday language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sentiment Scoring Module&lt;/strong&gt;: Used VADER as a baseline to assign initial sentiment labels. This acted as a sanity check—if VADER and TinyBERT disagreed significantly, it was worth investigating why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Training Module&lt;/strong&gt;: Fine-tuned TinyBERT on balanced, augmented data. Here's what made the difference: I used SMOTE (Synthetic Minority Oversampling) to handle class imbalance because imbalanced data introduces systematic bias that explainability tools can't fix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prediction Module&lt;/strong&gt;: Deployed the trained model for real-time inference. Nothing flashy, but bulletproof reliability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explainability Module&lt;/strong&gt;: Generated SHAP plots showing feature importance for every prediction. This is where the magic happened.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attention Visualization Module&lt;/strong&gt;: Transformer models use attention mechanisms—essentially learned weights showing which parts of the input matter most. By visualizing these attention scores, I gave another layer of interpretability. When the model paid 45% of its attention to a specific phrase like "operational challenges," users could see that directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visualization Module&lt;/strong&gt;: Built a Streamlit dashboard that brought everything together into a tool that financial analysts could actually use without a machine learning PhD.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results: From Accuracy to Actionability
&lt;/h2&gt;

&lt;p&gt;When I tested the complete system across three companies spanning different sectors, the numbers were strong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TinyBERT&lt;/strong&gt;: 83.17% accuracy (Lloyds), 83.67% (IAG), 86.45% (Vodafone)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traditional models&lt;/strong&gt; averaged 70-80%, showing the value of transfer learning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Most importantly&lt;/strong&gt;: Every prediction came with full explainability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the real win wasn't the accuracy benchmark. It was this: a senior trader could now read a SHAP explanation and either validate the model's reasoning or flag a mistake in its logic. That's when it became useful.&lt;/p&gt;

&lt;p&gt;One example: The system flagged a document as bearish based heavily on the phrase "uncertain regulatory environment." A human analyst immediately recognized that for the specific company and time period, that language was routine boilerplate—not a genuine risk signal. The explainability caught the false positive. Without SHAP, this would've passed through unexamined.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenges Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Building this system taught me that explainability doesn't solve everything—sometimes it exposes new problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 1: Data Quality Is Foundational&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SHAP can't fix garbage data. When I extracted text from PDFs with poor formatting or inconsistent structures, the model's explanations became less trustworthy. I spent significant time on data cleaning because I knew that garbage data feeding into SHAP would generate garbage explanations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 2: Class Imbalance Distorts Explanations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Financial sentiment in the wild is imbalanced—neutral sentiments dominated the dataset, with bearish sentiments rare. If you train on imbalanced data, the model learns to predict the majority class more confidently, and SHAP will explain why. But those explanations can be misleading because they reflect the data distribution, not market reality.&lt;/p&gt;

&lt;p&gt;I addressed this with SMOTE—synthetically creating minority class examples—which meant the model learned real patterns in bearish language rather than just learning "rarely predict bearish."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 3: Explainability Can Be Too Technical&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SHAP values are mathematically rigorous but visually abstract. Early versions of my dashboard confused users with technical plots. I had to simplify: show the top 3 words driving the prediction, visualize them clearly, and let users drill deeper if they want.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Broader Lesson: Explainability Changes Everything
&lt;/h2&gt;

&lt;p&gt;What surprised me most wasn't the technical challenge of implementing SHAP—it was realizing that explainability requirements fundamentally changed how I built the entire system.&lt;/p&gt;

&lt;p&gt;When you know your predictions will be questioned and scrutinized, you make different design choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You prioritize data quality over dataset size&lt;/li&gt;
&lt;li&gt;You use ensemble methods or interpretable models instead of pure black boxes&lt;/li&gt;
&lt;li&gt;You validate edge cases obsessively&lt;/li&gt;
&lt;li&gt;You document assumptions meticulously&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the hidden benefit of explainability: it forces better engineering practices.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The research highlighted several promising directions that point toward the future of financial AI:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temporal Sentiment Modeling&lt;/strong&gt;: Understanding how sentiment shifts over time and correlating that with actual market movements. Does sentiment lead price movements, or follow them?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multimodal Analysis&lt;/strong&gt;: Combining text sentiment with quantitative financial metrics. A document might express bullish language while reporting declining revenue—which signal matters more?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fine-Grained Classification&lt;/strong&gt;: Moving beyond bullish/neutral/bearish to capture nuanced positions. "Cautiously optimistic" is different from "bullish," and traders would benefit from that distinction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Causal Inference&lt;/strong&gt;: The ultimate goal—understanding not just that sentiment and prices correlate, but &lt;em&gt;why&lt;/em&gt;. Does positive news drive prices up, or do rising prices drive positive news?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;If you're building AI systems for high-stakes domains—finance, healthcare, criminal justice—remember this: a model is not a product until it's explainable.&lt;/p&gt;

&lt;p&gt;I could've stopped at 86% accuracy. That would've been publishable. But it would've been useless in practice because traders would never trust it.&lt;/p&gt;

&lt;p&gt;The breakthroughs in my system came not from tuning hyperparameters or finding the perfect architecture, but from making the decision to prioritize explainability from day one. SHAP, attention visualization, modular design—these weren't add-ons. They were the foundation.&lt;/p&gt;

&lt;p&gt;That's the real lesson from financial sentiment analysis: sometimes the hardest part of building AI isn't making it accurate. It's making humans trust it enough to use it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Technical Stack Used&lt;/strong&gt;: TinyBERT, PyTorch, SHAP, Streamlit, NLTK, spaCy, Hugging Face Transformers, SMOTE, Pandas, PyPDF2&lt;br&gt;
&lt;strong&gt;GitHub Repository&lt;/strong&gt;: &lt;a href="https://github.com/ademicho123/financial_sentiment_analysis" rel="noopener noreferrer"&gt;https://github.com/ademicho123/financial_sentiment_analysis&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Building an ML-Powered Trading Bot: From Theory to Production</title>
      <dc:creator>Ademola Balogun</dc:creator>
      <pubDate>Tue, 11 Nov 2025 09:56:26 +0000</pubDate>
      <link>https://dev.to/ambalogun/building-an-ai-powered-metatrader-5-trading-assistant-with-python-and-flask-5d0h</link>
      <guid>https://dev.to/ambalogun/building-an-ai-powered-metatrader-5-trading-assistant-with-python-and-flask-5d0h</guid>
      <description>&lt;h2&gt;
  
  
  How I Built a Machine Learning System That Makes Real-Time Trading Decisions
&lt;/h2&gt;

&lt;p&gt;Trading in financial markets is hard. Really hard. The statistics are sobering: most retail traders lose money, and even professional traders struggle to consistently beat the market. But what if we could leverage machine learning to tip the odds in our favor?&lt;/p&gt;

&lt;p&gt;Over the past few months, I've built a system that combines MetaTrader 5 with a Flask-based ML prediction server to make real-time trading decisions on gold (XAU/USD). Here's the story of how it came together, the challenges I faced, and the lessons I learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Too Much Data, Too Little Time
&lt;/h2&gt;

&lt;p&gt;As a trader, you're bombarded with information: price movements, moving averages, volatility indicators, momentum signals, and more. The human brain simply can't process all these variables in real-time while maintaining consistency and discipline.&lt;/p&gt;

&lt;p&gt;I needed a system that could:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Process multiple indicators simultaneously&lt;/li&gt;
&lt;li&gt;Learn patterns from historical trades&lt;/li&gt;
&lt;li&gt;Make decisions in milliseconds&lt;/li&gt;
&lt;li&gt;Maintain consistency (no emotional trading)&lt;/li&gt;
&lt;li&gt;Adapt to changing market conditions&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Solution: A Two-Part Architecture
&lt;/h2&gt;

&lt;p&gt;I settled on a clean separation of concerns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MetaTrader 5 Expert Advisor (EA)&lt;/strong&gt;: Handles market data collection, order execution, and risk management. This runs directly in the MT5 terminal with millisecond-level access to price data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flask ML Server&lt;/strong&gt;: A Python-based REST API that loads a trained Random Forest model and serves predictions. This runs as a separate process (or on a different machine) for scalability and easier model updates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MT5 EA → HTTP Request → Flask Server → ML Model → Prediction → MT5 EA → Trade Decision
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Magic: Feature Engineering
&lt;/h2&gt;

&lt;p&gt;Here's where things get interesting. Raw price data alone isn't enough. The model needs &lt;em&gt;context&lt;/em&gt;. I engineered 38 features from just 10 base inputs:&lt;/p&gt;

&lt;h3&gt;
  
  
  Price Relationships
&lt;/h3&gt;

&lt;p&gt;Instead of just tracking price, I calculate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distance from moving averages (as percentage)&lt;/li&gt;
&lt;li&gt;Whether price is above/below key EMAs&lt;/li&gt;
&lt;li&gt;Price momentum over different periods&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Multi-Timeframe Analysis
&lt;/h3&gt;

&lt;p&gt;The system analyzes two timeframes simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Short-term EMAs for entry signals&lt;/li&gt;
&lt;li&gt;Long-term EMAs for trend confirmation&lt;/li&gt;
&lt;li&gt;Cross-timeframe momentum alignment&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Volatility Intelligence
&lt;/h3&gt;

&lt;p&gt;ATR (Average True Range) isn't just a number:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ATR as percentage of price (volatility intensity)&lt;/li&gt;
&lt;li&gt;Normalized ATR (compared to 20-period average)&lt;/li&gt;
&lt;li&gt;Volatility-adjusted price distances&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Interaction Features
&lt;/h3&gt;

&lt;p&gt;This is where ML shines. I created interaction terms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trend Strength × Volatility&lt;/li&gt;
&lt;li&gt;Price Distance × ATR&lt;/li&gt;
&lt;li&gt;EMA Spread × Trend Strength&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These capture non-linear relationships that humans struggle to track mentally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Time Awareness
&lt;/h3&gt;

&lt;p&gt;Markets behave differently at different times:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cyclical encoding of hour (sin/cos transformation)&lt;/li&gt;
&lt;li&gt;Day of week patterns&lt;/li&gt;
&lt;li&gt;Session identification (London, New York, Asian)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Training Process
&lt;/h2&gt;

&lt;p&gt;Training a trading model is different from typical ML tasks. Here's what I learned:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Data Quality is Everything
&lt;/h3&gt;

&lt;p&gt;I started with actual trade data from my backtests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each row represents a real trading signal&lt;/li&gt;
&lt;li&gt;Outcome: 1 (profitable trade) or 0 (loss)&lt;/li&gt;
&lt;li&gt;Features: All market conditions at signal time&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Class Imbalance is Real
&lt;/h3&gt;

&lt;p&gt;In a typical trading strategy, you might win 55-60% of trades. This creates class imbalance that can confuse your model. Solution: Use &lt;code&gt;class_weight='balanced'&lt;/code&gt; in Random Forest to give minority class more importance.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Feature Scaling Matters
&lt;/h3&gt;

&lt;p&gt;Price might be 2,650, while a percentage-based feature is 0.5. StandardScaler normalizes everything to the same scale, preventing large-magnitude features from dominating.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Overfitting is the Enemy
&lt;/h3&gt;

&lt;p&gt;I deliberately limited tree depth (max_depth=15) and required minimum samples per leaf. This prevents the model from memorizing specific historical scenarios that won't repeat.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Production System
&lt;/h2&gt;

&lt;p&gt;Getting ML into production is where most projects die. Here's how I made it work:&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-Time Prediction API
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/predict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Receive market data from MT5
&lt;/span&gt;    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Engineer features (same as training)
&lt;/span&gt;    &lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;engineer_features&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Scale features
&lt;/span&gt;    &lt;span class="n"&gt;scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Get prediction + confidence
&lt;/span&gt;    &lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scaled&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scaled&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Only trade if confidence &amp;gt; 60%
&lt;/span&gt;    &lt;span class="n"&gt;should_trade&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prediction&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;should_trade&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;should_trade&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Confidence Thresholding
&lt;/h3&gt;

&lt;p&gt;This was a game-changer. Instead of taking every prediction, I only execute trades where the model is &amp;gt;60% confident. This dramatically reduced false signals.&lt;/p&gt;

&lt;p&gt;High confidence (&amp;gt;75%): "Take this trade now"&lt;br&gt;
Moderate confidence (65-75%): "Decent setup"&lt;br&gt;
Low confidence (&amp;lt;60%): "Skip this one"&lt;/p&gt;

&lt;h3&gt;
  
  
  Error Handling
&lt;/h3&gt;

&lt;p&gt;Production systems fail. A lot. I built in multiple safety layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validation of all input data types&lt;/li&gt;
&lt;li&gt;Handling missing features gracefully&lt;/li&gt;
&lt;li&gt;Infinite value replacement&lt;/li&gt;
&lt;li&gt;NaN filling with safe defaults&lt;/li&gt;
&lt;li&gt;Comprehensive logging&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-World Challenges
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Challenge 1: Feature Drift
&lt;/h3&gt;

&lt;p&gt;Markets change. A feature that was predictive last month might not work this month. Solution: Regular retraining with recent data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge 2: Latency
&lt;/h3&gt;

&lt;p&gt;Every millisecond counts in trading. I optimized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kept the model loaded in memory (no disk reads)&lt;/li&gt;
&lt;li&gt;Used Gunicorn with multiple workers&lt;/li&gt;
&lt;li&gt;Minimized feature engineering computation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Challenge 3: The "Works in Backtest" Problem
&lt;/h3&gt;

&lt;p&gt;A model can look amazing on historical data but fail live. Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Look-ahead bias in feature engineering&lt;/li&gt;
&lt;li&gt;Overfitting to past market conditions&lt;/li&gt;
&lt;li&gt;Ignoring transaction costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I combated this by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using only information available at signal time&lt;/li&gt;
&lt;li&gt;Testing on out-of-sample data&lt;/li&gt;
&lt;li&gt;Including realistic spread/commission&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance Metrics That Matter
&lt;/h2&gt;

&lt;p&gt;Forget about 90% accuracy. Here's what actually matters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Risk-Adjusted Return&lt;/strong&gt;: Are you making money after accounting for risk?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Win Rate × Average Win vs. Loss Rate × Average Loss&lt;/strong&gt;: A 45% win rate can be profitable if your wins are 2x your losses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maximum Drawdown&lt;/strong&gt;: How much can you lose before recovery? This determines position sizing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sharpe Ratio&lt;/strong&gt;: Return per unit of volatility. Higher is better.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. More Features ≠ Better Model
&lt;/h3&gt;

&lt;p&gt;I started with 100+ features. Performance was worse. Why? Noise. More features mean more chances to overfit. I trimmed to 38 carefully selected features.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Domain Knowledge &amp;gt; Fancy Algorithms
&lt;/h3&gt;

&lt;p&gt;Understanding &lt;em&gt;why&lt;/em&gt; a feature works matters more than using the latest deep learning architecture. A Random Forest trained on meaningful features beats an LSTM trained on raw prices.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Production is Different
&lt;/h3&gt;

&lt;p&gt;What works in a Jupyter notebook often breaks in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory leaks&lt;/li&gt;
&lt;li&gt;Threading issues&lt;/li&gt;
&lt;li&gt;Serialization problems&lt;/li&gt;
&lt;li&gt;API timeout handling&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Keep It Simple
&lt;/h3&gt;

&lt;p&gt;My first version had ensemble models, feature selection algorithms, and hyperparameter optimization pipelines. All unnecessary. A well-tuned Random Forest on good features works great.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code: Open Source
&lt;/h2&gt;

&lt;p&gt;I've made the entire system available on GitHub. It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training script with feature engineering&lt;/li&gt;
&lt;li&gt;Flask prediction server&lt;/li&gt;
&lt;li&gt;Model serialization&lt;/li&gt;
&lt;li&gt;Production deployment guide&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system is modular: use the ML component with any trading platform, or swap in your own model while keeping the infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Improvements
&lt;/h2&gt;

&lt;p&gt;Where can this go next?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reinforcement Learning&lt;/strong&gt;: Train an agent to optimize entry/exit timing, not just signal classification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ensemble Models&lt;/strong&gt;: Combine multiple models with different strengths (trend following + mean reversion).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Online Learning&lt;/strong&gt;: Update the model continuously with new trade results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-Asset Expansion&lt;/strong&gt;: Apply the same framework to forex, indices, and crypto.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Risk-Adjusted Position Sizing&lt;/strong&gt;: Let the model suggest position size based on confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ethical Considerations
&lt;/h2&gt;

&lt;p&gt;Before you rush to deploy this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is NOT a get-rich-quick scheme.&lt;/strong&gt; Trading is risky. This system improves odds but doesn't eliminate risk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Markets adapt.&lt;/strong&gt; What works today may not work tomorrow. Continuous monitoring and retraining are essential.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technology isn't magic.&lt;/strong&gt; ML can find patterns, but it can't predict black swan events or market crashes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Risk management is paramount.&lt;/strong&gt; Never risk more than you can afford to lose. Use stop losses. Manage position sizing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building an ML trading system taught me more about production ML than any tutorial could. The challenges of real-time prediction, model deployment, error handling, and continuous monitoring are universal to any ML product.&lt;/p&gt;

&lt;p&gt;The financial domain adds extra complexity (latency requirements, data quality, market dynamics), but the lessons apply broadly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start simple&lt;/li&gt;
&lt;li&gt;Focus on features over algorithms&lt;/li&gt;
&lt;li&gt;Build for production from day one&lt;/li&gt;
&lt;li&gt;Monitor and iterate continuously&lt;/li&gt;
&lt;li&gt;Respect the domain expertise&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Whether you're interested in algorithmic trading or production ML systems, I hope this journey inspires your next project. The code is open source, the architecture is battle-tested, and the patterns are reusable.&lt;/p&gt;

&lt;p&gt;Remember: In trading, as in software, there's no substitute for testing in real conditions. Paper trade first, start small, and let the data guide your decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The complete system is available on GitHub: &lt;a href="https://github.com/ademicho123/gold_trader" rel="noopener noreferrer"&gt;https://github.com/ademicho123/gold_trader&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.8+&lt;/li&gt;
&lt;li&gt;MetaTrader 5 terminal&lt;/li&gt;
&lt;li&gt;Historical trading data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clone, train, deploy, and start experimenting. And when you make improvements (you will), consider contributing back to the project.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;em&gt;Disclaimer: This article is for educational purposes only. Trading carries significant risk. Always do your own research and never risk more than you can afford to lose. Past performance does not guarantee future results.&lt;/em&gt;
&lt;/h2&gt;

</description>
      <category>machinelearning</category>
      <category>forex</category>
      <category>ai</category>
    </item>
    <item>
      <title>Large Language Models in Financial Content Generation: Challenges and Innovative Solutions</title>
      <dc:creator>Ademola Balogun</dc:creator>
      <pubDate>Sun, 26 Oct 2025 08:53:29 +0000</pubDate>
      <link>https://dev.to/ambalogun/large-language-models-in-financial-content-generation-challenges-and-innovative-solutions-ob4</link>
      <guid>https://dev.to/ambalogun/large-language-models-in-financial-content-generation-challenges-and-innovative-solutions-ob4</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The financial technology landscape is undergoing a radical transformation, driven by the emergence of large language models (LLMs). As the founder of Trading Flashes, I've pioneered the integration of advanced AI technologies to generate sophisticated financial content. This article delves into the technical challenges and innovative solutions in applying LLMs to financial content generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Complexity of Financial Language
&lt;/h2&gt;

&lt;p&gt;Financial communication is uniquely challenging:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Domain-Specific Vocabulary&lt;/strong&gt;: Requires precise technical terminology&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nuanced Contextual Understanding&lt;/strong&gt;: Interpreting complex market dynamics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Balancing Objectivity and Insight&lt;/strong&gt;: Providing valuable analysis without bias&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rapidly Changing Contextual Landscape&lt;/strong&gt;: Adapting to real-time market shifts&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Technical Architecture of Financial Content Generation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prompt Engineering Strategy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;together&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FinancialContentGenerator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;together&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Together&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;markets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;forex&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;crypto&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;stocks&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;commodities&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_market_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;market_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Generate a comprehensive market summary using advanced prompt engineering

        Args:
            market_data (Dict): Comprehensive market data dictionary

        Returns:
            str: AI-generated market analysis
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Construct multi-stage prompt for nuanced analysis
&lt;/span&gt;        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        You are a senior financial analyst providing a professional market summary.
        Context:
        - Current market conditions
        - Historical price trends
        - Significant economic indicators

        Market Data Overview:
        &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_format_market_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;market_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        Guidelines for Analysis:
        1. Provide objective, fact-based insights
        2. Highlight key trends and potential market movements
        3. Maintain a professional, measured tone
        4. Include potential risk factors

        Generate a comprehensive market summary focusing on:
        - Key price movements
        - Underlying economic drivers
        - Short-term market outlook
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Lower temperature for more deterministic output
&lt;/span&gt;            &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_format_market_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;market_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Format market data for optimal model consumption

        Args:
            market_data (Dict): Raw market data

        Returns:
            str: Formatted market data string
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;formatted_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;market&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;markets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;market&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;market_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;market_summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;market&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; Market:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;market_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;market&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                    &lt;span class="n"&gt;market_summary&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="n"&gt;formatted_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;market_summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;formatted_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Bias Mitigation Techniques
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BiasMonitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;detect_potential_bias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generated_content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Analyze generated content for potential biases

        Args:
            generated_content (str): AI-generated financial content

        Returns:
            Dict[str, float]: Bias probability scores
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;bias_metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;market_sentiment_skew&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;repetitive_language&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;overly_positive_tone&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Implement sophisticated bias detection algorithms
&lt;/span&gt;        &lt;span class="c1"&gt;# This is a simplified example
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generated_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generated_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;bias_metrics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;repetitive_language&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bias_metrics&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced Model Selection Strategies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Model Evaluation Framework
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;together&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Together&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ModelEvaluator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compare_model_performance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_prompts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Compare different LLM models for financial content generation

        Args:
            test_prompts (List[str]): Standardized evaluation prompts

        Returns:
            Dict[str, float]: Performance scores for each model
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;performance_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;model_performance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_evaluate_single_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_prompts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;performance_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_performance&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;performance_scores&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_evaluate_single_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_prompts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Evaluate a single model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s performance

        Scoring criteria:
        - Factual accuracy
        - Contextual relevance
        - Linguistic quality
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Implement multi-dimensional evaluation logic
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt;  &lt;span class="c1"&gt;# Placeholder performance score
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance and Optimization Strategies
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Caching Mechanisms&lt;/strong&gt;: Implement intelligent caching to reduce API calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous Processing&lt;/strong&gt;: Utilize concurrent processing for multiple market analyses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous Model Fine-Tuning&lt;/strong&gt;: Regularly update models with recent financial data&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Ethical Considerations in AI-Generated Financial Content
&lt;/h2&gt;

&lt;p&gt;Critical ethical guidelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Transparency&lt;/strong&gt;: Clear labeling of AI-generated content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disclaimer Integration&lt;/strong&gt;: Highlighting the speculative nature of predictions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoiding Market Manipulation&lt;/strong&gt;: Generating objective, balanced insights&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Integrating large language models into financial content generation is a complex, nuanced challenge. By developing sophisticated prompt engineering techniques, implementing robust bias detection, and maintaining a commitment to ethical AI practices, we can create powerful, insightful financial communication tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;Ademola Balogun is the founder and CEO of 180GIG Ltd, creators of Squrrel—an AI-powered interview platform that makes hiring smarter and more equitable. With an MSc in Data Science from Birkbeck, University of London, he specializes in building practical AI solutions for real-world problems. He also created Trading Flashes ⚡, an AI-driven newsletter platform for financial markets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large language models require sophisticated engineering for financial applications&lt;/li&gt;
&lt;li&gt;Bias detection and mitigation are crucial&lt;/li&gt;
&lt;li&gt;Ethical considerations are paramount in AI-driven financial communication&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>RAG Architecture for HR Applications: Building Context-Aware Interview Systems</title>
      <dc:creator>Ademola Balogun</dc:creator>
      <pubDate>Sat, 18 Oct 2025 18:41:40 +0000</pubDate>
      <link>https://dev.to/ambalogun/-rag-architecture-for-hr-applications-building-context-aware-interview-systems-1aep</link>
      <guid>https://dev.to/ambalogun/-rag-architecture-for-hr-applications-building-context-aware-interview-systems-1aep</guid>
      <description>&lt;h2&gt;
  
  
  Introduction: The RAG Revolution in HR Tech
&lt;/h2&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) represents a paradigm shift in how AI systems access and utilize information. For HR applications—particularly AI-powered interviews—RAG solves a critical problem: how can an AI conduct role-specific, context-aware conversations without requiring manual programming for every job type?&lt;/p&gt;

&lt;p&gt;Having implemented RAG architecture in a production interview platform, I'll share technical insights, architectural decisions, and lessons learned from deploying RAG in the HR domain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why RAG Matters for HR Applications
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Traditional Approach's Limitations
&lt;/h3&gt;

&lt;p&gt;Traditional AI interview systems use one of two approaches:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Rule-Based Systems:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Rigid, manually programmed
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;job_title&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Software Engineer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;ask_question&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tell me about your experience with Python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;job_title&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Marketing Manager&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;ask_question&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe your campaign management experience&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires manual configuration for every role&lt;/li&gt;
&lt;li&gt;Cannot adapt to unique job requirements&lt;/li&gt;
&lt;li&gt;Fails when job descriptions don't match templates&lt;/li&gt;
&lt;li&gt;Cannot incorporate company-specific context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Pure LLM Approach:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Uses LLM's training knowledge only
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Interview a candidate for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hallucinates job requirements&lt;/li&gt;
&lt;li&gt;No access to specific job description&lt;/li&gt;
&lt;li&gt;Cannot reference company policies or values&lt;/li&gt;
&lt;li&gt;Inconsistent across similar roles&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  RAG's Solution
&lt;/h3&gt;

&lt;p&gt;RAG combines retrieval systems with language models, allowing AI to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Access specific job requirements dynamically&lt;/li&gt;
&lt;li&gt;Reference company policies and culture documents&lt;/li&gt;
&lt;li&gt;Incorporate industry-specific knowledge&lt;/li&gt;
&lt;li&gt;Adapt questions based on candidate background&lt;/li&gt;
&lt;li&gt;Provide consistent, contextual interviews&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  RAG Architecture for Interview Systems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  High-Level Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input (Candidate Response)
    ↓
Query Embedding
    ↓
Vector Database Search
    ↓
Relevant Context Retrieval
    ↓
Context + Query → LLM
    ↓
Generated Follow-up Question
    ↓
Response to Candidate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's examine each component in depth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Component 1: Document Processing and Indexing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Input Documents for HR RAG
&lt;/h3&gt;

&lt;p&gt;Our system ingests multiple document types:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Job Descriptions&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Required skills and experience&lt;/li&gt;
&lt;li&gt;Responsibilities and expectations&lt;/li&gt;
&lt;li&gt;Team structure and reporting&lt;/li&gt;
&lt;li&gt;Technical requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Company Knowledge Base&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Company values and mission&lt;/li&gt;
&lt;li&gt;Team culture documents&lt;/li&gt;
&lt;li&gt;Product/service information&lt;/li&gt;
&lt;li&gt;Work environment details&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Interview Guidelines&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Evaluation criteria&lt;/li&gt;
&lt;li&gt;Legal compliance requirements&lt;/li&gt;
&lt;li&gt;Behavioral indicators&lt;/li&gt;
&lt;li&gt;Red flags to watch for&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Role-Specific Resources&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Technical documentation (for tech roles)&lt;/li&gt;
&lt;li&gt;Industry knowledge bases&lt;/li&gt;
&lt;li&gt;Common scenarios and challenges&lt;/li&gt;
&lt;li&gt;Success profiles from high performers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Document Processing Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;PyPDFLoader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;UnstructuredWordDocumentLoader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TextLoader&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;HRDocumentProcessor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Optimized for interview context
&lt;/span&gt;            &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;separators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_job_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process job description into chunks&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Load document
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PyPDFLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.docx&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;UnstructuredWordDocumentLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Extract structured information
&lt;/span&gt;        &lt;span class="n"&gt;structured_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_structured_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Split into chunks
&lt;/span&gt;        &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Enrich chunks with metadata
&lt;/span&gt;        &lt;span class="n"&gt;enriched_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_metadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;structured_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;enriched_chunks&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_structured_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract structured data from job description&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Use NER or LLM to extract key fields
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;job_title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_job_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;required_skills&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_skills&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;experience_level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_experience_level&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;department&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_department&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_location&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_metadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;structured_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Add metadata to chunks for better retrieval&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;job_title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;structured_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;job_title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;document_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;job_description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;section&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;identify_section&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;importance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;score_importance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Intelligent Chunking Strategy
&lt;/h3&gt;

&lt;p&gt;Chunk size significantly impacts RAG performance. For interview context:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Too Small (&amp;lt; 200 tokens):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loses context&lt;/li&gt;
&lt;li&gt;Requires more retrieval calls&lt;/li&gt;
&lt;li&gt;Fragmented information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Too Large (&amp;gt; 1000 tokens):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exceeds context window quickly&lt;/li&gt;
&lt;li&gt;Includes irrelevant information&lt;/li&gt;
&lt;li&gt;Slower retrieval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Our Optimized Approach:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AdaptiveChunker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chunk_by_semantic_coherence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Chunk based on semantic boundaries, not just length&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Identify semantic boundaries
&lt;/span&gt;        &lt;span class="n"&gt;sentences&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_into_sentences&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;current_chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;current_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;sentence_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current_chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Check semantic similarity with current chunk
&lt;/span&gt;                &lt;span class="n"&gt;chunk_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_chunk&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="c1"&gt;# Start new chunk if semantic break or length limit
&lt;/span&gt;                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;current_length&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_chunk&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                    &lt;span class="n"&gt;current_chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                    &lt;span class="n"&gt;current_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;current_chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;current_length&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;current_chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;current_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current_chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_chunk&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This adaptive approach improved retrieval relevance by 23% compared to fixed-size chunking.&lt;/p&gt;

&lt;h2&gt;
  
  
  Component 2: Embedding and Vector Storage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Embedding Model Selection
&lt;/h3&gt;

&lt;p&gt;We tested multiple embedding models for HR text:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Performance&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI text-embedding-ada-002&lt;/td&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;$0.0001/1K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Free (self-hosted)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sentence-transformers/all-mpnet-base-v2&lt;/td&gt;
&lt;td&gt;768&lt;/td&gt;
&lt;td&gt;Very Good&lt;/td&gt;
&lt;td&gt;Free (self-hosted)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cohere embed-english-v3.0&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;$0.0001/1K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Our Choice:&lt;/strong&gt; OpenAI ada-002 for production (quality and reliability), with MiniLM for development/testing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingGenerator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-ada-002&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate embedding for text&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Preprocess text
&lt;/span&gt;        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;preprocess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Generate embedding
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;batch_generate_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate embeddings in batches for efficiency&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;batch_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;preprocess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Preprocess text before embedding&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Remove excessive whitespace
&lt;/span&gt;        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

        &lt;span class="c1"&gt;# Truncate if too long (ada-002 limit: 8191 tokens)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Vector Database Selection
&lt;/h3&gt;

&lt;p&gt;We evaluated several vector databases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pinecone:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pros: Fully managed, excellent performance, simple API&lt;/li&gt;
&lt;li&gt;Cons: Cost, vendor lock-in&lt;/li&gt;
&lt;li&gt;Best for: Production systems with high query volume&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Weaviate:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pros: Self-hosted option, built-in filtering, good documentation&lt;/li&gt;
&lt;li&gt;Cons: More complex setup&lt;/li&gt;
&lt;li&gt;Best for: Complex filtering requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Chroma:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pros: Lightweight, easy to start, local development&lt;/li&gt;
&lt;li&gt;Cons: Not ideal for large-scale production&lt;/li&gt;
&lt;li&gt;Best for: Development and prototyping&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;FAISS:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pros: Extremely fast, free, battle-tested&lt;/li&gt;
&lt;li&gt;Cons: No filtering, requires custom infrastructure&lt;/li&gt;
&lt;li&gt;Best for: Large-scale with custom infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Our Implementation (Pinecone):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pinecone&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VectorStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;pinecone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PINECONE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PINECONE_ENV&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Create index if doesn't exist
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;index_name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pinecone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_indexes&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;pinecone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;dimension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# ada-002 dimension
&lt;/span&gt;                &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cosine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;pod_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;p1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pinecone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upsert_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Insert document chunks into vector database&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;values&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;job_title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;job_title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;document_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;document_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;section&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;section&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;importance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;importance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Batch upsert for efficiency
&lt;/span&gt;        &lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
        &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;filter_dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Query vector database&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filter_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;include_metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Hybrid Search: Combining Dense and Sparse Retrieval
&lt;/h3&gt;

&lt;p&gt;Pure vector search sometimes misses exact keyword matches. Hybrid approach combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dense retrieval&lt;/strong&gt; (embeddings): Semantic similarity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sparse retrieval&lt;/strong&gt; (BM25): Keyword matching
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rank_bm25&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BM25Okapi&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;HybridRetriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;

        &lt;span class="c1"&gt;# Build BM25 index
&lt;/span&gt;        &lt;span class="n"&gt;tokenized_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bm25&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BM25Okapi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenized_docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;  &lt;span class="c1"&gt;# Weight for vector search vs BM25
&lt;/span&gt;    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Hybrid retrieval combining vector and keyword search&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Vector search
&lt;/span&gt;        &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_gen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;vector_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# BM25 search
&lt;/span&gt;        &lt;span class="n"&gt;tokenized_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;bm25_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bm25&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_scores&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenized_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Combine scores
&lt;/span&gt;        &lt;span class="n"&gt;final_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;vector_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;doc_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;
            &lt;span class="n"&gt;vector_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;

            &lt;span class="c1"&gt;# Normalize BM25 score
&lt;/span&gt;            &lt;span class="n"&gt;bm25_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bm25_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bm25_scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Weighted combination
&lt;/span&gt;            &lt;span class="n"&gt;final_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;vector_score&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;bm25_score&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Sort by combined score and return top k
&lt;/span&gt;        &lt;span class="n"&gt;ranked_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;final_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; 
            &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
            &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ranked_docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Component 3: Query Construction and Retrieval
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Context-Aware Query Generation
&lt;/h3&gt;

&lt;p&gt;Simple keyword queries don't capture interview context. We generate enhanced queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;QueryEnhancer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enhance_query_with_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;candidate_response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;job_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Enhance query with conversation and job context&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Extract key entities from candidate response
&lt;/span&gt;        &lt;span class="n"&gt;entities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_entities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Identify topics that need probing
&lt;/span&gt;        &lt;span class="n"&gt;topics_to_probe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;identify_probe_topics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;candidate_response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;conversation_history&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Build context-enriched query
&lt;/span&gt;        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Job Title: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        Candidate mentioned: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        Topics to explore: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topics_to_probe&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        Recent conversation:
        &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summarize_recent_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        What specific follow-up questions should I ask to evaluate:
        - Technical depth in mentioned areas
        - Relevance to job requirements
        - Areas needing clarification
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_entities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract key entities using NER&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Use spaCy or similar for NER
&lt;/span&gt;        &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;nlp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;entities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ORG&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PRODUCT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TECHNOLOGY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SKILL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multi-Query Retrieval
&lt;/h3&gt;

&lt;p&gt;Single queries may miss relevant context. Generate multiple queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MultiQueryRetriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_multiple_queries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;original_query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate multiple perspectives on the same query&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Given this interview context query:
        &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;original_query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        Generate 3 alternative phrasings that capture different aspects:
        1. A query focusing on technical requirements
        2. A query focusing on behavioral indicators
        3. A query focusing on cultural fit

        Return only the 3 queries, one per line.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;original_query&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_with_multiple_queries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve using multiple queries and deduplicate&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;all_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;seen_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_gen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;seen_ids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;all_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;seen_ids&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Re-rank by relevance
&lt;/span&gt;        &lt;span class="n"&gt;reranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;reranked&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Re-Ranking Retrieved Results
&lt;/h3&gt;

&lt;p&gt;Initial retrieval may return suboptimal ordering. Re-rank using cross-encoder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CrossEncoder&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ResultReranker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cross_encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CrossEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cross-encoder/ms-marco-MiniLM-L-6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Rerank documents using cross-encoder&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Create query-document pairs
&lt;/span&gt;        &lt;span class="n"&gt;pairs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Score pairs
&lt;/span&gt;        &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cross_encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Sort by score
&lt;/span&gt;        &lt;span class="n"&gt;ranked_indices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argsort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)[::&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ranked_indices&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Component 4: Context Integration with LLM
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prompt Engineering for Interview Context
&lt;/h3&gt;

&lt;p&gt;How you present retrieved context to the LLM dramatically affects response quality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InterviewPromptBuilder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;candidate_response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;retrieved_contexts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;job_requirements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Build comprehensive prompt for LLM&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        You are conducting an AI-powered interview for the position of &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_requirements&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.

        ## Job Context
        &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_job_requirements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_requirements&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        ## Retrieved Relevant Information
        &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_retrieved_contexts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieved_contexts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        ## Conversation So Far
        &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_conversation_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        ## Candidate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Latest Response
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;candidate_response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

        ## Your Task
        Based on the candidate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s response and the relevant job requirements above:

        1. Evaluate the response against job requirements
        2. Identify areas that need deeper exploration
        3. Generate 1-2 specific follow-up questions that:
           - Probe technical depth where candidate showed knowledge
           - Clarify vague or incomplete statements
           - Assess cultural fit and soft skills
           - Are natural and conversational in tone

        ## Important Guidelines
        - Ask specific, not generic questions
        - Build on what candidate just said
        - One question at a time
        - Keep questions concise
        - Sound natural and conversational

        Your follow-up question:
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_job_requirements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Format job requirements for prompt&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Title: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
        Key Skills: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;required_skills&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
        Experience Level: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;experience_level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
        Key Responsibilities: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;responsibilities&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_retrieved_contexts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contexts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Format retrieved contexts&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contexts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Context &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Context Window Management
&lt;/h3&gt;

&lt;p&gt;LLMs have token limits. Manage context carefully:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tiktoken&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ContextWindowManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tiktoken&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encoding_for_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;  &lt;span class="c1"&gt;# Reserve for LLM response
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Count tokens in text&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fit_context_to_window&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;base_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;retrieved_contexts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fit all context within token limit&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Calculate token budgets
&lt;/span&gt;        &lt;span class="n"&gt;base_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;available_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;base_tokens&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_buffer&lt;/span&gt;

        &lt;span class="c1"&gt;# Allocate tokens
&lt;/span&gt;        &lt;span class="n"&gt;context_budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;available_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;history_budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;available_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Trim retrieved contexts
&lt;/span&gt;        &lt;span class="n"&gt;trimmed_contexts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim_to_budget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;retrieved_contexts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;context_budget&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Trim conversation history (keep most recent)
&lt;/span&gt;        &lt;span class="n"&gt;trimmed_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim_history_to_budget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;history_budget&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Build final prompt
&lt;/span&gt;        &lt;span class="n"&gt;final_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assemble_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;base_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;trimmed_contexts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;trimmed_history&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_prompt&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;trim_to_budget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
        &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Trim texts to fit within token budget&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;trimmed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;current_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;text_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current_tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;text_tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;trimmed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;current_tokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;text_tokens&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Can we fit a truncated version?
&lt;/span&gt;                &lt;span class="n"&gt;remaining&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;current_tokens&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;remaining&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Minimum useful size
&lt;/span&gt;                    &lt;span class="n"&gt;truncated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;truncate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remaining&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;trimmed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;truncated&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;trimmed&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;truncate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Truncate text to max tokens&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;truncated_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;truncated_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Component 5: Response Generation and Validation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Generating Context-Aware Responses
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InterviewResponseGenerator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4-turbo-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_followup_question&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate follow-up question with retry logic&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                        &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an expert interviewer conducting a professional, context-aware interview.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
                        &lt;span class="p"&gt;}&lt;/span&gt;
                    &lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Follow-up questions should be concise
&lt;/span&gt;                    &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

                &lt;span class="c1"&gt;# Validate question quality
&lt;/span&gt;                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;validate_question&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="c1"&gt;# Retry with lower temperature
&lt;/span&gt;                    &lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;

            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;raise&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Exponential backoff
&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_fallback_question&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_question&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Validate generated question meets quality standards&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Check length
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

        &lt;span class="c1"&gt;# Check if it's actually a question
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;!&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

        &lt;span class="c1"&gt;# Check for generic questions (avoid low-quality outputs)
&lt;/span&gt;        &lt;span class="n"&gt;generic_phrases&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tell me more&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anything else&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;can you elaborate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;question_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;question_lower&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;phrase&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;generic_phrases&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# Too generic, request more specific
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ensuring Response Relevance
&lt;/h3&gt;

&lt;p&gt;Validate that generated questions are relevant to conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RelevanceValidator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_gen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingGenerator&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_relevance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;generated_question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;candidate_response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;job_requirements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check if generated question is relevant&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Create embeddings
&lt;/span&gt;        &lt;span class="n"&gt;question_emb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_gen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generated_question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;context_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;candidate_response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_requirements&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;context_emb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_gen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Calculate similarity
&lt;/span&gt;        &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;question_emb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;context_emb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced RAG Techniques
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Hierarchical Retrieval
&lt;/h3&gt;

&lt;p&gt;For complex organizations with nested context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;HierarchicalRetriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_hierarchical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;company_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;department_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;role_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve context at multiple hierarchy levels&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Level 1: Role-specific context (highest priority)
&lt;/span&gt;        &lt;span class="n"&gt;role_contexts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve_with_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;role_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Level 2: Department context
&lt;/span&gt;        &lt;span class="n"&gt;dept_contexts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve_with_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;department_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;department_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Level 3: Company-wide context
&lt;/span&gt;        &lt;span class="n"&gt;company_contexts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve_with_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;company_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;company_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Combine with priority weighting
&lt;/span&gt;        &lt;span class="n"&gt;all_contexts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;role_contexts&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="c1"&gt;# Most specific
&lt;/span&gt;            &lt;span class="n"&gt;dept_contexts&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="c1"&gt;# Medium specificity
&lt;/span&gt;            &lt;span class="n"&gt;company_contexts&lt;/span&gt;  &lt;span class="c1"&gt;# Most general
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deduplicate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_contexts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Temporal Awareness
&lt;/h3&gt;

&lt;p&gt;Track when information was added/updated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_with_temporal_awareness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prefer_recent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;time_decay_factor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve with temporal decay for older information&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;prefer_recent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Apply time decay to scores
&lt;/span&gt;        &lt;span class="n"&gt;current_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;doc_age_days&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;
            &lt;span class="n"&gt;time_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_age_days&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time_decay_factor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;time_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time_penalty&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Minimum weight
&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="n"&gt;time_penalty&lt;/span&gt;

        &lt;span class="c1"&gt;# Re-sort by adjusted scores
&lt;/span&gt;        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Feedback-Based Improvement
&lt;/h3&gt;

&lt;p&gt;Learn from interview outcomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FeedbackLearner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;record_question_effectiveness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;retrieved_contexts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;candidate_response_quality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# 0-1 score
&lt;/span&gt;        &lt;span class="n"&gt;eventual_hire_outcome&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Record how effective retrieval was&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feedback_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;contexts_used&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;retrieved_contexts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;response_quality&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;candidate_response_quality&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hire_outcome&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;eventual_hire_outcome&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;optimize_retrieval_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Adjust retrieval based on feedback&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Analyze which types of context led to better outcomes
&lt;/span&gt;        &lt;span class="n"&gt;feedback_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feedback_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_recent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Calculate context type effectiveness
&lt;/span&gt;        &lt;span class="n"&gt;context_effectiveness&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;feedback&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;feedback_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;contexts_used&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="n"&gt;context_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;document_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context_type&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context_effectiveness&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;context_effectiveness&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;context_type&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

                &lt;span class="n"&gt;context_effectiveness&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;context_type&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;response_quality&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Update retrieval weights
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;context_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context_effectiveness&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;avg_effectiveness&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retrieval_weights&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;context_type&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;avg_effectiveness&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance Optimization
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Caching Strategy
&lt;/h3&gt;

&lt;p&gt;Cache frequently accessed contexts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;lru_cache&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CachedRetriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6379&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache_ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;  &lt;span class="c1"&gt;# 1 hour
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_with_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve with Redis caching&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;cache_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieval:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Check cache
&lt;/span&gt;        &lt;span class="n"&gt;cached_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cached_result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cached_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Retrieve if not cached
&lt;/span&gt;        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Cache results
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache_ttl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Batching and Async Operations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AsyncRAGPipeline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_batch_retrievals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process multiple retrievals in parallel&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Async retrieval operation&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Generate embedding
&lt;/span&gt;        &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;async_embedding_gen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Query vector store
&lt;/span&gt;        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Monitoring and Evaluation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Metrics
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RAGMetricsTracker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;track_retrieval_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Track RAG system performance&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;# Retrieval Quality
&lt;/span&gt;            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;retrieval_relevance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_relevance&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;context_utilization&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_utilization&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;retrieval_diversity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_diversity&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;

            &lt;span class="c1"&gt;# Performance
&lt;/span&gt;            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_retrieval_latency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_latency&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cache_hit_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_cache_hits&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tokens_per_retrieval&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_token_usage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;

            &lt;span class="c1"&gt;# Business Impact
&lt;/span&gt;            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;question_relevance_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_question_quality&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interview_completion_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_completion_rate&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;candidate_satisfaction&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_satisfaction&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;RAG architecture transforms AI interview systems from rigid, template-based tools into adaptive, context-aware conversational agents. Key takeaways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Chunking strategy matters&lt;/strong&gt;: Semantic chunking outperforms fixed-size&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid retrieval wins&lt;/strong&gt;: Combine dense and sparse for best results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context management is critical&lt;/strong&gt;: Stay within token limits intelligently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate everything&lt;/strong&gt;: Ensure retrieved context is relevant and generated questions are high-quality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor and improve&lt;/strong&gt;: Use feedback to continuously optimize&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The future of HR tech lies in systems that understand nuance, adapt to context, and conduct truly intelligent conversations. RAG makes this possible.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;About the Author&lt;/strong&gt;&lt;br&gt;
Ademola Balogun is the founder and CEO of 180GIG Ltd, creators of &lt;a href="https://squrrel.app/" rel="noopener noreferrer"&gt;Squrrel&lt;/a&gt;—an AI-powered interview platform that makes hiring smarter and more equitable. With an MSc in Data Science from Birkbeck, University of London, he specializes in building practical AI solutions for real-world problems. He also created &lt;a href="https://tradingflashes.180gig.com" rel="noopener noreferrer"&gt;Trading Flashes ⚡&lt;/a&gt;, an AI-driven newsletter platform for financial markets.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>architecture</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>From Prototype to Production: Lessons Scaling an AI Interview Platform</title>
      <dc:creator>Ademola Balogun</dc:creator>
      <pubDate>Fri, 10 Oct 2025 20:57:16 +0000</pubDate>
      <link>https://dev.to/ambalogun/from-prototype-to-production-lessons-scaling-an-ai-interview-platform-na7</link>
      <guid>https://dev.to/ambalogun/from-prototype-to-production-lessons-scaling-an-ai-interview-platform-na7</guid>
      <description>&lt;h1&gt;
  
  
  From Prototype to Production: Lessons Scaling an AI Interview Platform
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Introduction: The Reality Gap
&lt;/h2&gt;

&lt;p&gt;Building a working AI prototype is exciting. Making it production-ready for hundreds of concurrent users, enterprise clients, and mission-critical hiring decisions is an entirely different challenge. As a founder who bootstrapped an AI interview platform from concept to production, I've learned that the gap between "it works on my laptop" and "it reliably serves thousands of users" is filled with technical challenges that textbooks rarely cover.&lt;/p&gt;

&lt;p&gt;This article shares practical lessons from scaling an AI-powered interview platform, focusing on the technical, operational, and business challenges unique to production AI systems in the HR technology space.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Journey: From MVP to Production
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase 1: The MVP (Months 0-3)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What We Built:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic chatbot using OpenAI API&lt;/li&gt;
&lt;li&gt;Simple Flask web server&lt;/li&gt;
&lt;li&gt;SQLite database&lt;/li&gt;
&lt;li&gt;Deployed on a single AWS EC2 instance&lt;/li&gt;
&lt;li&gt;~20 test users from personal network&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What Worked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proved the concept&lt;/li&gt;
&lt;li&gt;Got initial user feedback&lt;/li&gt;
&lt;li&gt;Validated market need&lt;/li&gt;
&lt;li&gt;Cost: ~$200/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What Broke Immediately at Scale:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database locks with concurrent users&lt;/li&gt;
&lt;li&gt;API rate limits during peak usage&lt;/li&gt;
&lt;li&gt;Memory leaks crashed server after ~100 interviews&lt;/li&gt;
&lt;li&gt;No queue system for processing&lt;/li&gt;
&lt;li&gt;Zero monitoring or observability&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 2: Production-Ready Architecture (Months 4-8)
&lt;/h3&gt;

&lt;p&gt;Moving from prototype to production required rebuilding nearly everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before (MVP)
&lt;/span&gt;&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/interview&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;conduct_interview&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChatCompletion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO responses VALUES (?)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After (Production)
&lt;/span&gt;&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/interview&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nd"&gt;@rate_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_calls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@authenticate_user&lt;/span&gt;
&lt;span class="nd"&gt;@validate_input&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;conduct_interview&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Production-ready interview endpoint&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Validate and sanitize input
&lt;/span&gt;        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sanitize_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;# Add to processing queue
&lt;/span&gt;        &lt;span class="n"&gt;task_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;process_interview_response&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Return immediately, process asynchronously
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;task_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;processing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="mi"&gt;202&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Interview error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interview_errors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Internal error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Challenge 1: Handling Concurrent Users
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;AI interview platforms have unique concurrency patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Burst traffic&lt;/strong&gt;: Companies hire in waves, not steady streams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-running operations&lt;/strong&gt;: AI inference takes 2-5 seconds per response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateful conversations&lt;/strong&gt;: Each interview is a multi-turn dialogue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Peak hours&lt;/strong&gt;: Most interviews happen 9 AM - 5 PM in each timezone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When we hit 50 concurrent users, everything broke:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database connection pool exhausted&lt;/li&gt;
&lt;li&gt;LLM API rate limits exceeded&lt;/li&gt;
&lt;li&gt;Server CPU maxed out&lt;/li&gt;
&lt;li&gt;Response times jumped from 2s to 30s+&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Solution: Queue-Based Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Worker&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InterviewProcessor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;redis-cluster&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6379&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;decode_responses&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Separate queues for different priorities
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;high_priority&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;normal_priority&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;normal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;low_priority&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;low&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enqueue_interview_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;normal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Add interview processing task to queue&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_queue_by_priority&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;workers.process_interview_message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;job_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;60s&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Keep results for 1 hour
&lt;/span&gt;            &lt;span class="n"&gt;failure_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;86400&lt;/span&gt;  &lt;span class="c1"&gt;# Keep failures for debugging
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_job_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check status of queued job&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_status&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_finished&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_failed&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;position_in_queue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_position&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_queued&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Worker Process:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# workers.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_current_job&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_interview_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Background worker for processing interview messages&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_current_job&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Update job progress
&lt;/span&gt;    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;stage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;retrieving_context&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_meta&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Get conversation context
&lt;/span&gt;    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve_conversation_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;stage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;generating_response&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_meta&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate AI response
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_ai_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;stage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;saving_results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_meta&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Save to database
&lt;/span&gt;    &lt;span class="nf"&gt;save_interview_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;stage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_meta&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handled 500+ concurrent users without degradation&lt;/li&gt;
&lt;li&gt;Average response time: 2.3 seconds (down from 30s)&lt;/li&gt;
&lt;li&gt;Clear visibility into processing queue&lt;/li&gt;
&lt;li&gt;Graceful degradation under extreme load&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Challenge 2: LLM API Management
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Cost Problem
&lt;/h3&gt;

&lt;p&gt;OpenAI API costs scale quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-4: $0.03 per 1K input tokens, $0.06 per 1K output tokens&lt;/li&gt;
&lt;li&gt;Average interview: 50 turns × 500 tokens/turn = 25K tokens&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost per interview: $0.75 - $1.50&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 1,000 interviews/day: &lt;strong&gt;$750 - $1,500/day&lt;/strong&gt; = &lt;strong&gt;$22,500 - $45,000/month&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a bootstrapped startup, this was unsustainable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Optimization Strategies
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Intelligent Model Selection:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ModelSelector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;select_optimal_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;task_complexity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;required_quality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context_length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Select cheapest model that meets requirements&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task_complexity&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;simple&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;context_length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Acknowledgments, simple follow-ups
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# $0.0015/1K tokens
&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;task_complexity&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;required_quality&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;standard&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Most interview questions
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo-16k&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# $0.003/1K tokens
&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;context_length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;required_quality&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Complex evaluations, long context
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpt-4-turbo-preview&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# $0.01/1K tokens
&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# $0.03/1K tokens
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Savings: 40% reduction&lt;/strong&gt; by routing simple tasks to cheaper models&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Aggressive Caching:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;lru_cache&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ResponseCache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;  &lt;span class="c1"&gt;# 1 week
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_cached_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check if we have cached response for this prompt&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Create cache key
&lt;/span&gt;        &lt;span class="n"&gt;cache_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_cache_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Check cache
&lt;/span&gt;        &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cache_hits&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cache_misses&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cache_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Cache response for future use&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;cache_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_cache_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_cache_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate deterministic cache key&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_cache:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Savings: 35% reduction&lt;/strong&gt; from cache hits on common questions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Prompt Optimization:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before: Verbose prompt (1,200 tokens)
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are an AI interviewer conducting an interview for the position 
of &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. 

The job description is as follows:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  # Often 500+ tokens

The candidate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s background includes:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;candidate_background&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  # Another 300+ tokens

Previous conversation:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;full_conversation_history&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  # Could be 400+ tokens

Guidelines for interviewing:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lengthy_guidelines&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  # 200+ tokens

Now, given the candidate said: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;candidate_response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

Generate the next interview question.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# After: Optimized prompt (400 tokens)
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Interview: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Key requirements: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;extract_key_requirements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  # 100 tokens
Recent context: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;summarize_recent_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  # 150 tokens
Candidate said: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;candidate_response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
Next question:
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Savings: 65% token reduction&lt;/strong&gt; per API call&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Combined Result: 70% cost reduction&lt;/strong&gt; ($22.5K → $6.75K/month)&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenge 3: Database Performance at Scale
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Our SQLite database worked fine for prototyping but couldn't handle production load:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write locks blocked concurrent interviews&lt;/li&gt;
&lt;li&gt;No query optimization&lt;/li&gt;
&lt;li&gt;No connection pooling&lt;/li&gt;
&lt;li&gt;Slow full-text search&lt;/li&gt;
&lt;li&gt;No backup/recovery strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Migration to Production Database
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Production database schema
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Index&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.ext.declarative&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;declarative_base&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.orm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sessionmaker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relationship&lt;/span&gt;

&lt;span class="n"&gt;Base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;declarative_base&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Interview&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Base&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;__tablename__&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interviews&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;primary_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nullable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;candidate_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ForeignKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;candidates.id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;job_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ForeignKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;jobs.id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# active, completed, abandoned
&lt;/span&gt;    &lt;span class="n"&gt;started_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nullable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;completed_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;total_duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# seconds
&lt;/span&gt;
    &lt;span class="c1"&gt;# Relationships
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;back_populates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interview&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;evaluation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Evaluation&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;back_populates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interview&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uselist&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Indexes for common queries
&lt;/span&gt;    &lt;span class="n"&gt;__table_args__&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ix_interview_candidate_date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;candidate_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;started_at&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ix_interview_job_status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;job_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Base&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;__tablename__&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;primary_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;interview_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ForeignKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interviews.id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;nullable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sender&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;nullable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 'ai' or 'candidate'
&lt;/span&gt;    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nullable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nullable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# For analytics
&lt;/span&gt;    &lt;span class="n"&gt;tokens_used&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;latency_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;interview&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Interview&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;back_populates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Evaluation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Base&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;__tablename__&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;evaluations&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;primary_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;interview_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ForeignKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interviews.id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;nullable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Scores
&lt;/span&gt;    &lt;span class="n"&gt;overall_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;technical_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;communication_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cultural_fit_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Analysis
&lt;/span&gt;    &lt;span class="n"&gt;strengths&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;weaknesses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;recommendation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# hire, maybe, no_hire
&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nullable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;interview&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Interview&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;back_populates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;evaluation&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Connection pooling
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;postgresql://user:pass@host:5432/interviews&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pool_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Base pool size
&lt;/span&gt;    &lt;span class="n"&gt;max_overflow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Additional connections if needed
&lt;/span&gt;    &lt;span class="n"&gt;pool_pre_ping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Verify connections before using
&lt;/span&gt;    &lt;span class="n"&gt;pool_recycle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Recycle connections every hour
&lt;/span&gt;    &lt;span class="n"&gt;echo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;  &lt;span class="c1"&gt;# Don't log SQL in production
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;SessionLocal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sessionmaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Query Optimization
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.orm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;joinedload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;selectinload&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InterviewRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_active_interviews_with_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Efficiently load interviews with related data&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SessionLocal&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Use eager loading to prevent N+1 queries
&lt;/span&gt;            &lt;span class="n"&gt;interviews&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Interview&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;\
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="nf"&gt;selectinload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Interview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="nf"&gt;joinedload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Interview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evaluation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;\
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Interview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;active&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;\
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;order_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Interview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;started_at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;\
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;\
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;interviews&lt;/span&gt;
        &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_interview_statistics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Optimized aggregation query&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SessionLocal&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;

            &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Interview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_interviews&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Interview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_duration&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_duration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Evaluation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;overall_score&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Evaluation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;\
             &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                 &lt;span class="n"&gt;Interview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="n"&gt;Interview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;started_at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;between&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                 &lt;span class="n"&gt;Interview&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
             &lt;span class="p"&gt;)&lt;/span&gt;\             &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_interviews&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_interviews&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_duration_minutes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avg_duration&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avg_duration&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avg_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avg_score&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Read Replicas for Analytics
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DatabaseRouter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Route queries to primary or read replica&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;primary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;postgresql://primary:5432/db&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replica&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;postgresql://replica:5432/db&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;read_only&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get appropriate database session&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;read_only&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sessionmaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replica&lt;/span&gt;&lt;span class="p"&gt;)()&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sessionmaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;primary&lt;/span&gt;&lt;span class="p"&gt;)()&lt;/span&gt;

&lt;span class="c1"&gt;# Usage
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_analytics_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Heavy analytics on read replica&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db_router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read_only&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Complex analytics queries don't impact production writes
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(...).&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Challenge 4: Monitoring and Observability
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What to Monitor in Production AI Systems
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prometheus_client&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;prom&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datadog&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;statsd&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MetricsCollector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Prometheus metrics
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interview_counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interviews_total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Total interviews conducted&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;job_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ai_response_latency_seconds&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Time to generate AI response&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;30.0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm_cost_gauge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_cost_dollars_daily&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Daily LLM API costs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;active_interviews&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;active_interviews&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Number of interviews in progress&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error_counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interview_errors_total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Total interview errors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;component&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;track_interview_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Track metrics for each message&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Response time
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_latency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;observe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Cost estimation
&lt;/span&gt;        &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;estimate_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm_cost_gauge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Send to DataDog for dashboarding
&lt;/span&gt;        &lt;span class="n"&gt;statsd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interview.latency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;statsd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interview.messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error_counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;generation_failed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;component&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
            &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Logging Strategy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;structlog&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;

&lt;span class="c1"&gt;# Structured logging
&lt;/span&gt;&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;structlog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InterviewLogger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_interview_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Structured logging for interview events&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_error_with_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Log errors with full context for debugging&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;interview_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;error_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;stack_trace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_exc&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Usage
&lt;/span&gt;&lt;span class="n"&gt;interview_logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InterviewLogger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_ai_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;interview_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_interview_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message_generated&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;message_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;response_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;interview_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_error_with_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;job_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job_id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Alerting Rules
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# alerting_rules.yml
&lt;/span&gt;&lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;interview_platform&lt;/span&gt;
    &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
    &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# High error rate
&lt;/span&gt;      &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HighErrorRate&lt;/span&gt;
        &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
          &lt;span class="nf"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interview_errors_total&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;
        &lt;span class="n"&gt;annotations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;High error rate detected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
          &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error rate is {{ $value }} errors/second&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

      &lt;span class="c1"&gt;# Slow response times
&lt;/span&gt;      &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SlowResponses&lt;/span&gt;
        &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
          &lt;span class="nf"&gt;histogram_quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ai_response_latency_seconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
        &lt;span class="n"&gt;annotations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;95th percentile response time &amp;gt; 10s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

      &lt;span class="c1"&gt;# High API costs
&lt;/span&gt;      &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HighAPICosts&lt;/span&gt;
        &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
          &lt;span class="n"&gt;llm_cost_dollars_daily&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
        &lt;span class="n"&gt;annotations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Daily API costs exceeding budget&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

      &lt;span class="c1"&gt;# Queue backup
&lt;/span&gt;      &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;QueueBacklog&lt;/span&gt;
        &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
          &lt;span class="n"&gt;redis_queue_length&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
        &lt;span class="n"&gt;annotations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;High priority queue backed up&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Challenge 5: Security and Data Privacy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Interview Data is Sensitive
&lt;/h3&gt;

&lt;p&gt;Candidates share personal information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Employment history&lt;/li&gt;
&lt;li&gt;Salary expectations&lt;/li&gt;
&lt;li&gt;Personal projects and achievements&lt;/li&gt;
&lt;li&gt;Sometimes protected characteristics (inadvertently)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Security Implementation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cryptography.fernet&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Fernet&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SecureInterviewStorage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cipher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Fernet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ENCRYPTION_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;store_interview_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Securely store interview data&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Encrypt sensitive content
&lt;/span&gt;        &lt;span class="n"&gt;encrypted_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cipher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

        &lt;span class="c1"&gt;# Hash session ID for privacy
&lt;/span&gt;        &lt;span class="n"&gt;hashed_session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Store with encryption
&lt;/span&gt;        &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            INSERT INTO messages 
            (session_hash, encrypted_content, candidate_id, created_at)
            VALUES (?, ?, ?, ?)
            &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hashed_session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encrypted_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_interview_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;requesting_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve interview data with access control&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Verify requesting user has permission
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;verify_access_permission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;requesting_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;PermissionError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unauthorized access attempt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;hashed_session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;encrypted_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT encrypted_content FROM messages WHERE session_hash = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hashed_session&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Decrypt messages
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cipher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enc_msg&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;enc_msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;encrypted_messages&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Log access for audit trail
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_data_access&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;requesting_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;anonymize_for_training&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;interview_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Anonymize data before using for model training&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;anonymized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;interview&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;interview_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Remove PII
&lt;/span&gt;            &lt;span class="n"&gt;clean_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;interview&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="n"&gt;anonymized&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;clean_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;job_category&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;interview&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;job_category&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# Keep for context
&lt;/span&gt;                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;outcome&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;interview&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;outcome&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Keep for training
&lt;/span&gt;                &lt;span class="c1"&gt;# Removed: candidate_id, company_name, specific dates
&lt;/span&gt;            &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;anonymized&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  GDPR Compliance
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GDPRComplianceHandler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_deletion_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Handle right to be forgotten&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# 1. Delete from primary database
&lt;/span&gt;        &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DELETE FROM messages WHERE candidate_id = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DELETE FROM interviews WHERE candidate_id = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 2. Delete from backups (mark for deletion)
&lt;/span&gt;        &lt;span class="n"&gt;backup_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mark_for_deletion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 3. Delete from analytics databases
&lt;/span&gt;        &lt;span class="n"&gt;analytics_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_candidate_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 4. Remove from any cached data
&lt;/span&gt;        &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_pattern&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidate:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 5. Log the deletion (required for compliance)
&lt;/span&gt;        &lt;span class="n"&gt;compliance_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record_deletion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;requested_at&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;completed_at&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;export_candidate_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Handle data portability request&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interviews&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_interview_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;evaluations&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_evaluations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_metadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Challenge 6: Enterprise Requirements
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Multi-Tenancy
&lt;/h3&gt;

&lt;p&gt;Supporting multiple companies on one platform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MultiTenantArchitecture&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_tenant_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract tenant from request&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# From subdomain: acme.platform.com
&lt;/span&gt;        &lt;span class="n"&gt;subdomain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Or from API key
&lt;/span&gt;        &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;X-API-Key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tenant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tenant_from_api_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tenant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tenant_from_subdomain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subdomain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tenant&lt;/span&gt;

    &lt;span class="nd"&gt;@cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_tenant_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get tenant-specific configuration&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM tenant_configs WHERE tenant_id = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;branding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;branding_settings&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;features&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;enabled_features&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;limits&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;usage_limits&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_config&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_settings&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enforce_tenant_isolation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Ensure queries only access tenant&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s data&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Add tenant_id to all queries automatically
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Middleware
&lt;/span&gt;&lt;span class="nd"&gt;@app.before_request&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;inject_tenant_context&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Add tenant context to all requests&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;multi_tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tenant_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;multi_tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tenant_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  SSO Integration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;onelogin.saml2.auth&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OneLogin_Saml2_Auth&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SSOHandler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_saml_login&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Handle SAML SSO authentication&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OneLogin_Saml2_Auth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;saml_settings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_authenticated&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;user_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_nameid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;attributes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_attributes&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;session_index&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_session_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="c1"&gt;# Create or update user
&lt;/span&gt;            &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sync_sso_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Create session
&lt;/span&gt;            &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_authenticated_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;url_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dashboard&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authentication failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Challenge 7: Cost Management at Scale
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Final Cost Breakdown (Production)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Monthly Costs for 10,000 interviews:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Infrastructure:
- AWS EC2 (application servers): $500
- AWS RDS (PostgreSQL): $300
- Redis Cache: $100
- Load Balancer: $50
- CloudFront CDN: $100

AI/ML:
- OpenAI API (after optimization): $6,750
- Whisper API (voice): $500

Monitoring &amp;amp; Tools:
- DataDog: $200
- Sentry error tracking: $50
- StatusPage: $50

Total: $8,600/month
Cost per interview: $0.86
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Revenue Model
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pricing Tiers:
- Starter: $99/mo (100 interviews)
- Professional: $499/mo (1,000 interviews)
- Enterprise: $2,499/mo (10,000 interviews) + custom features

At 100 enterprise customers:
Revenue: $249,900/month
Costs: $8,600/month
Gross Margin: 97%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Start Simple, But Plan for Scale
&lt;/h3&gt;

&lt;p&gt;Don't over-engineer the MVP, but understand what will break at scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use PostgreSQL from day 1 (not SQLite)&lt;/li&gt;
&lt;li&gt;Design for horizontal scaling&lt;/li&gt;
&lt;li&gt;Implement queues early&lt;/li&gt;
&lt;li&gt;Add monitoring before you need it&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Optimize Costs Aggressively
&lt;/h3&gt;

&lt;p&gt;AI API costs can spiral out of control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cache everything possible&lt;/li&gt;
&lt;li&gt;Use cheaper models for simple tasks&lt;/li&gt;
&lt;li&gt;Optimize prompts ruthlessly&lt;/li&gt;
&lt;li&gt;Monitor costs daily&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Security Can't Be an Afterthought
&lt;/h3&gt;

&lt;p&gt;Interview data is sensitive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Encrypt at rest and in transit&lt;/li&gt;
&lt;li&gt;Implement access controls from start&lt;/li&gt;
&lt;li&gt;Plan for GDPR/privacy compliance&lt;/li&gt;
&lt;li&gt;Regular security audits&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Observability is Critical
&lt;/h3&gt;

&lt;p&gt;You can't fix what you can't see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Comprehensive logging&lt;/li&gt;
&lt;li&gt;Real-time metrics&lt;/li&gt;
&lt;li&gt;Alerting on key thresholds&lt;/li&gt;
&lt;li&gt;Error tracking with context&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Enterprise Customers Have Unique Needs
&lt;/h3&gt;

&lt;p&gt;Plan for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SSO integration&lt;/li&gt;
&lt;li&gt;Multi-tenancy&lt;/li&gt;
&lt;li&gt;Custom branding&lt;/li&gt;
&lt;li&gt;Compliance requirements (SOC 2, GDPR)&lt;/li&gt;
&lt;li&gt;SLAs and support expectations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Scaling an AI platform from prototype to production serving thousands of users requires solving challenges across infrastructure, cost optimization, security, and enterprise requirements. The technical decisions made early significantly impact long-term scalability and unit economics.&lt;/p&gt;

&lt;p&gt;Key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Architecture matters&lt;/strong&gt;: Queue-based processing handles concurrency gracefully&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost optimization is essential&lt;/strong&gt;: 70% cost reduction possible through caching, model selection, and prompt engineering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring enables scale&lt;/strong&gt;: Can't optimize what you don't measure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security first&lt;/strong&gt;: Privacy and compliance must be built in, not bolted on&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise-ready takes time&lt;/strong&gt;: SSO, multi-tenancy, and compliance aren't trivial additions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building production AI systems is hard. But with careful planning, aggressive optimization, and commitment to reliability, it's possible to build scalable, profitable AI products that serve real business needs.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;About the Author&lt;/strong&gt;&lt;br&gt;
Ademola Balogun is the founder and CEO of 180GIG Ltd, where he bootstrapped and scaled an AI interview platform from concept to production. With an MSc in Data Science from Birkbeck, University of London, he combines technical depth with entrepreneurial execution. His work focuses on building practical, scalable AI systems that solve real business problems while maintaining profitability and reliability.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>startup</category>
    </item>
  </channel>
</rss>
