<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Artem KK</title>
    <description>The latest articles on DEV Community by Artem KK (@kazkozdev).</description>
    <link>https://dev.to/kazkozdev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3834673%2Fe455ed15-4828-4ced-b5c4-97bf43cb3143.jpg</url>
      <title>DEV Community: Artem KK</title>
      <link>https://dev.to/kazkozdev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kazkozdev"/>
    <language>en</language>
    <item>
      <title>I Studied How GitHub READMEs Are Actually Evaluated — Here Are the 5 Things That Matter</title>
      <dc:creator>Artem KK</dc:creator>
      <pubDate>Sat, 04 Apr 2026 05:01:44 +0000</pubDate>
      <link>https://dev.to/kazkozdev/i-studied-how-github-readmes-are-actually-evaluated-here-are-the-5-things-that-matter-2epd</link>
      <guid>https://dev.to/kazkozdev/i-studied-how-github-readmes-are-actually-evaluated-here-are-the-5-things-that-matter-2epd</guid>
      <description>&lt;p&gt;I spent weeks reading hiring threads, portfolio guides, recruiter-facing articles, Reddit discussions, and academic papers to answer one question: &lt;strong&gt;what do people actually look at when they evaluate a GitHub profile?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I expected to find a clear standard. I didn't.&lt;/p&gt;

&lt;p&gt;What I found was more useful: most README "best practices" aren't rules — they're &lt;strong&gt;signals&lt;/strong&gt;. And there's a formal framework for understanding why some signals matter and others don't.&lt;/p&gt;

&lt;p&gt;I wrote up the full deep-dive with all sources and references. Here's the short version of what I verified.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Your README Is a Screening Surface, Not Documentation
&lt;/h2&gt;

&lt;p&gt;People don't start with a deep code review. The first pass is shallow: they're scanning for signs of seriousness. Eye-tracking research on résumé screening suggests recruiters spend about 7 seconds on an initial pass. Your README's first job isn't to explain everything. It's to &lt;strong&gt;justify continued attention&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Tests and CI Are Signals, Not Checkboxes
&lt;/h2&gt;

&lt;p&gt;Tests, CI, &lt;code&gt;.env.example&lt;/code&gt;, meaningful commits — these details keep showing up in advice because they compress information. They help a reviewer infer &lt;em&gt;how you work&lt;/em&gt;. But here's the caveat: a CI pipeline on a three-file todo app is a weak signal. These things only matter when attached to substantive work.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The Clone Problem Is Real, but Misunderstood
&lt;/h2&gt;

&lt;p&gt;The internet loves to say "remove your Netflix clone." The real issue isn't that familiar project shapes are bad — it's that simple clones signal tutorial-following more than independent judgment. The real divide is &lt;strong&gt;replication as an endpoint&lt;/strong&gt; vs. &lt;strong&gt;replication as a starting point for something original&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Proof Beats Description — Every Time
&lt;/h2&gt;

&lt;p&gt;Proof can be a live demo, a screenshot, a short GIF, a deployment link, or even a small number of real users. What matters is whether the reader has to &lt;em&gt;imagine&lt;/em&gt; the project works, or can &lt;em&gt;see&lt;/em&gt; that it does. After I added a deployment link and a 10-second GIF to one of my projects, people stopped asking "what does it do?" and started asking implementation questions.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Writing Helps Only When It Extends Real Work
&lt;/h2&gt;

&lt;p&gt;Blog posts don't replace projects. But "I built X, here's what broke, and here's what I learned" is powerful — because it shows how you think. A post-mortem of a real project is hard to fake. Generic "Top 10 tips" pieces are not.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Actually Do Now
&lt;/h2&gt;

&lt;p&gt;If I were cleaning up a GitHub repo today, I'd focus on five things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Explain what the project is in &lt;strong&gt;one clear sentence&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Show &lt;strong&gt;proof&lt;/strong&gt; that it works: demo, screenshot, deployment&lt;/li&gt;
&lt;li&gt;Include signals of engineering discipline: tests, CI, setup clarity&lt;/li&gt;
&lt;li&gt;Explain &lt;strong&gt;why&lt;/strong&gt; the project exists, not just what it does&lt;/li&gt;
&lt;li&gt;Add a section showing &lt;strong&gt;reflection&lt;/strong&gt;: trade-offs, challenges, what you'd change&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The question that ties it all together:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What uncertainty is this README removing for the person reading it?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;The full article goes deeper into the academic research behind these ideas — including signaling theory, eye-tracking studies, and peer-reviewed work on how GitHub profiles are evaluated in hiring.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://medium.com/local-llm-lab/the-great-readme-hunt-what-readme-best-practices-actually-signal-d9df9782b512" rel="noopener noreferrer"&gt;Read the full deep-dive on Medium&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📂 All sources and references are also available on GitHub: &lt;a href="https://github.com/KazKozDev/github-rabbit-hole" rel="noopener noreferrer"&gt;github.com/KazKozDev/github-rabbit-hole&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What do you think is the stronger signal in a portfolio: a polished clone, or a rough project with real users?&lt;/strong&gt; I'd love to hear your take in the comments.&lt;/p&gt;

</description>
      <category>github</category>
      <category>softwareengineering</category>
      <category>career</category>
      <category>portfolio</category>
    </item>
    <item>
      <title>Building a Perplexity Clone for Local LLMs in 50 Lines of Python</title>
      <dc:creator>Artem KK</dc:creator>
      <pubDate>Fri, 20 Mar 2026 05:42:39 +0000</pubDate>
      <link>https://dev.to/kazkozdev/building-a-perplexity-clone-for-local-llms-in-50-lines-of-python-2p79</link>
      <guid>https://dev.to/kazkozdev/building-a-perplexity-clone-for-local-llms-in-50-lines-of-python-2p79</guid>
      <description>&lt;p&gt;Your local LLM is smart but blind — it can't see the internet. Here's how to give it eyes, a filter, and a citation engine.&lt;/p&gt;




&lt;p&gt;This is a hands-on tutorial. We'll install a library, run a real query, break down every stage of what happens inside, and look at the actual output your LLM receives.&lt;/p&gt;

&lt;p&gt;By the end, you'll have a working pipeline that turns any local model (Ollama, LM Studio, anything with a text input) into something that searches the web, reads pages, ranks the results, and generates a structured prompt with inline citations — like a self-hosted Perplexity.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Background:&lt;/strong&gt; If you want to understand the architecture this is based on, I wrote a &lt;a href="https://medium.com/@kazkozdev/how-perplexity-actually-searches-the-internet-ae4b50dd9837" rel="noopener noreferrer"&gt;deep dive into how Perplexity actually works&lt;/a&gt; — the five-stage RAG pipeline, hybrid retrieval on Vespa.ai, Cerebras-accelerated inference, the citation integrity problems. This tutorial is the practical counterpart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/KazKozDev/production_rag_pipeline" rel="noopener noreferrer"&gt;github.com/KazKozDev/production_rag_pipeline&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What We're Building
&lt;/h2&gt;

&lt;p&gt;A pipeline that does this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your question
    ↓
Search (Bing + DuckDuckGo, parallel)
    ↓
Semantic pre-filter (drop irrelevant results before fetching)
    ↓
Fetch pages (only the ones that passed filtering)
    ↓
Extract content (strip boilerplate, ads, navigation)
    ↓
Chunk + Rerank (BM25 + semantic + answer-span + MMR)
    ↓
LLM-ready prompt with numbered citations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pipeline does NOT include the LLM itself — it builds the prompt. You plug in whatever model you want.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/KazKozDev/production_rag_pipeline.git
&lt;span class="nb"&gt;cd &lt;/span&gt;production_rag_pipeline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pick your install level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Minimal — BM25 ranking, BeautifulSoup extraction. No ML models.&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Better extraction with trafilatura (quotes stop zsh from globbing the brackets)&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;".[extraction]"&lt;/span&gt;

&lt;span class="c"&gt;# Semantic ranking with sentence-transformers (recommended)&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;".[semantic]"&lt;/span&gt;

&lt;span class="c"&gt;# Everything&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;".[full]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For this tutorial, use &lt;code&gt;.[full]&lt;/code&gt;. The first run downloads embedding models (roughly 100–500 MB, depending on language); this happens only once.&lt;/p&gt;

&lt;p&gt;No API keys needed. Bing and DuckDuckGo are queried without authentication.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Your First Query — 3 Lines
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;production_rag_pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;build_llm_prompt&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_llm_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latest AI news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire interface. &lt;code&gt;build_llm_prompt&lt;/code&gt; runs the full pipeline — search, filter, fetch, extract, rerank — and returns a formatted string ready to paste into any LLM.&lt;/p&gt;

&lt;h3&gt;
  
  
  CLI alternative
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;production-rag-pipeline &lt;span class="s2"&gt;"latest AI news"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or with options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Search-only mode (no page fetching)&lt;/span&gt;
production-rag-pipeline &lt;span class="s2"&gt;"Bitcoin price"&lt;/span&gt; &lt;span class="nt"&gt;--mode&lt;/span&gt; search

&lt;span class="c"&gt;# Russian query&lt;/span&gt;
production-rag-pipeline &lt;span class="s2"&gt;"новости ИИ"&lt;/span&gt; &lt;span class="nt"&gt;--mode&lt;/span&gt; &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;--lang&lt;/span&gt; ru
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  macOS users
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./run_llm_query.command
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This bootstraps a virtual environment automatically on first run.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: What Just Happened — Stage by Stage
&lt;/h2&gt;

&lt;p&gt;Let's trace what the pipeline actually does with &lt;code&gt;"latest AI news"&lt;/code&gt;. Enable debug mode to see it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;production_rag_pipeline.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;search_extract_rerank&lt;/span&gt;

&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fetched_urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_extract_rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latest AI news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_fetch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Stage 1: Dual-Engine Search
&lt;/h3&gt;

&lt;p&gt;Bing and DuckDuckGo are searched &lt;strong&gt;in parallel&lt;/strong&gt;. Results are merged with position-based scoring — first result from each engine scores highest, and results that appear in both engines get a boost.&lt;/p&gt;
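
&lt;p&gt;Here is a minimal sketch of that merge, not the library's actual code. The &lt;code&gt;merge_results&lt;/code&gt; helper, the reciprocal-rank scoring, and the &lt;code&gt;overlap_boost&lt;/code&gt; value are assumptions chosen for illustration; the real weights may differ.&lt;/p&gt;

```python
from collections import defaultdict


def merge_results(engine_results, overlap_boost=0.5):
    """Merge ranked URL lists from several engines into one scored ranking.

    engine_results maps an engine name to its URLs in rank order.
    The 1/rank scoring and overlap_boost are illustrative assumptions.
    """
    scores = defaultdict(float)
    seen_in = defaultdict(set)
    for engine, urls in engine_results.items():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank      # top positions contribute the most
            seen_in[url].add(engine)
    for url, engines in seen_in.items():
        if len(engines) > 1:               # returned by more than one engine
            scores[url] += overlap_boost
    # Highest score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda u: (-scores[u], u))


merged = merge_results({
    "bing": ["a.com", "b.com", "c.com"],
    "ddg": ["b.com", "d.com", "a.com"],
})
print(merged)
```

&lt;p&gt;With these toy lists, &lt;code&gt;b.com&lt;/code&gt; ranks first because it sits near the top of both engines' results.&lt;/p&gt;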

&lt;p&gt;The pipeline detects keywords like "news", "latest", "breaking" and switches DDG to its &lt;strong&gt;News index&lt;/strong&gt; — returning actual articles instead of generic homepages.&lt;/p&gt;
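
&lt;p&gt;The keyword check could look something like the sketch below. The function name &lt;code&gt;is_news_query&lt;/code&gt; is hypothetical, and the keyword set contains only the three examples mentioned above; the library's real list is probably longer.&lt;/p&gt;

```python
# Only the three keywords named in the text; the real list is an unknown
NEWS_KEYWORDS = {"news", "latest", "breaking"}


def is_news_query(query):
    """Return True when any word in the query is a news keyword."""
    words = query.lower().split()
    return any(word.strip(".,!?") in NEWS_KEYWORDS for word in words)


print(is_news_query("latest AI news"))
print(is_news_query("python decorators explained"))
```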

&lt;h3&gt;
  
  
  Stage 2: Semantic Pre-Filtering
&lt;/h3&gt;

&lt;p&gt;This is the key optimization. Before fetching any page, the pipeline computes &lt;strong&gt;cosine similarity&lt;/strong&gt; between the query embedding and each result's title+snippet embedding.&lt;/p&gt;

&lt;p&gt;Results below threshold get dropped:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;English: threshold 0.30&lt;/li&gt;
&lt;li&gt;Russian: threshold 0.25&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, &lt;strong&gt;~11 out of 20 results get filtered&lt;/strong&gt; — saving about 6 seconds of HTTP fetches.&lt;/p&gt;

&lt;p&gt;Example from a real run with &lt;code&gt;"LLM agents news"&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✗ flutrackers.com     sim=0.12  → filtered (irrelevant)
✓ llm-stats.com       sim=0.68  → fetched
✗ reddit.com/r/gaming  sim=0.15  → filtered
✓ arxiv.org/abs/2503   sim=0.71  → fetched
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No hardcoded domain lists. Pure semantic relevance.&lt;/p&gt;
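
&lt;p&gt;To make the filter concrete, here is a small sketch that uses toy 3-dimensional vectors in place of real sentence embeddings. The thresholds are the ones quoted above; the &lt;code&gt;prefilter&lt;/code&gt; helper and the example URLs are assumptions, not the library's API.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Thresholds quoted in the article; the dict itself is illustrative
THRESHOLDS = {"en": 0.30, "ru": 0.25}


def prefilter(query_vec, candidates, lang="en"):
    """Keep candidates whose title+snippet embedding is close to the query.

    candidates is a list of (url, embedding) pairs.
    """
    threshold = THRESHOLDS[lang]
    return [
        url for url, vec in candidates
        if cosine_similarity(query_vec, vec) >= threshold
    ]


# Toy vectors stand in for real embeddings (assumption for the demo)
query = [1.0, 0.0, 0.0]
candidates = [
    ("relevant.example", [0.9, 0.1, 0.0]),
    ("offtopic.example", [0.1, 0.9, 0.4]),
]
print(prefilter(query, candidates))
```

&lt;p&gt;Because the filter only needs titles and snippets, it runs before any page is downloaded, which is where the time savings come from.&lt;/p&gt;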

&lt;h3&gt;
  
  
  Stage 3: Parallel Fetch + Content Extraction
&lt;/h3&gt;

&lt;p&gt;Surviving results (typically 5–9 URLs) are fetched in parallel. Content extraction runs a two-stage quality check:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structural check:&lt;/strong&gt; Does &amp;gt;30% of lines look like numbers/prices/tables?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic check:&lt;/strong&gt; If flagged, is the table relevant to the query?&lt;/p&gt;

&lt;p&gt;This is how exchange rate tables from &lt;code&gt;cbr.ru&lt;/code&gt; pass for a currency query (similarity 0.75) but CS:GO price lists get rejected (similarity 0.05).&lt;/p&gt;

&lt;p&gt;After extraction, boilerplate is stripped — navigation, ads, newsletter signup patterns, cookie banners.&lt;/p&gt;
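
&lt;p&gt;The two-stage check can be sketched as follows. The regular expression, the helper names, and the &lt;code&gt;min_similarity&lt;/code&gt; cutoff are assumptions; only the 30% line ratio comes from the description above.&lt;/p&gt;

```python
import re

# A line counts as "tabular" if it holds only digits, currency symbols,
# separators, and whitespace. This pattern is an assumption.
NUMERIC_LINE = re.compile(r"^[\d\s.,:;%$€₽|/+\-]+$")


def looks_like_table(text, ratio=0.30):
    """Structural check: are more than `ratio` of non-empty lines numeric?"""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    numeric = sum(1 for line in lines if NUMERIC_LINE.match(line))
    return numeric / len(lines) > ratio


def keep_page(text, query_similarity, min_similarity=0.30):
    """Semantic check: table-heavy pages survive only if they are relevant."""
    if not looks_like_table(text):
        return True
    return query_similarity >= min_similarity


price_list = "Item prices\n1.20\n3.50\n7.00"
article = "AI news today\nModel released\nVersion 2 ships"
print(keep_page(price_list, 0.75), keep_page(price_list, 0.05), keep_page(article, 0.05))
```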

&lt;h3&gt;
  
  
  Stage 4: Chunking + Multi-Signal Reranking
&lt;/h3&gt;

&lt;p&gt;Extracted content is chunked, then reranked by four signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BM25&lt;/strong&gt; — classic lexical term-frequency matching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic similarity&lt;/strong&gt; — cosine between query and chunk embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Answer-span detection&lt;/strong&gt; — does this chunk directly answer the question?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MMR diversity&lt;/strong&gt; — prevents top results from all being paraphrases of the same paragraph&lt;/li&gt;
&lt;/ul&gt;
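
&lt;p&gt;The MMR step is the least intuitive of the four, so here is a generic greedy sketch of Maximal Marginal Relevance. It is the textbook technique, not necessarily how the library implements it; the &lt;code&gt;lam&lt;/code&gt; weight of 0.7 and the toy scores are assumptions.&lt;/p&gt;

```python
def mmr_select(relevance, similarity, k, lam=0.7):
    """Greedy Maximal Marginal Relevance selection.

    relevance[i]      relevance of chunk i to the query
    similarity[i][j]  similarity between chunks i and j
    lam               1.0 means pure relevance, 0.0 means pure diversity
    """
    selected = []
    remaining = list(range(len(relevance)))
    while remaining and k > len(selected):
        def mmr_score(i):
            # Penalize chunks that resemble something already picked
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Chunks 0 and 1 are near-duplicates; chunk 2 is less relevant but different
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(mmr_select(relevance, similarity, k=2))
```

&lt;p&gt;With these numbers the second pick is chunk 2 rather than chunk 1: the near-duplicate loses to a less relevant but distinct chunk.&lt;/p&gt;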

&lt;p&gt;Optionally, a &lt;strong&gt;cross-encoder&lt;/strong&gt; rescores the final shortlist. It is slower, but more accurate.&lt;/p&gt;

&lt;p&gt;For news queries, &lt;strong&gt;freshness penalties&lt;/strong&gt; apply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Content &amp;gt;7 days old: −1 confidence&lt;/li&gt;
&lt;li&gt;Content &amp;gt;30 days old: −2 confidence&lt;/li&gt;
&lt;li&gt;Outdated sources flagged in the prompt with exact age&lt;/li&gt;
&lt;/ul&gt;
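
&lt;p&gt;As a rough sketch, the penalty tiers can be expressed like this. I'm assuming the tiers are not cumulative (a 40-day-old source gets −2, not −3) and that age is measured in whole days; the function names are hypothetical.&lt;/p&gt;

```python
from datetime import date


def freshness_penalty(published, today):
    """Confidence penalty for a news source, based on its age in days.

    Assumes non-cumulative tiers: older than 30 days is -2, older than 7 is -1.
    """
    age_days = (today - published).days
    if age_days > 30:
        return -2
    if age_days > 7:
        return -1
    return 0


def age_label(published, today):
    """Exact age that could be attached to an outdated source in the prompt."""
    return f"{(today - published).days} days old"


today = date(2026, 3, 20)
print(freshness_penalty(date(2026, 3, 1), today), age_label(date(2026, 3, 1), today))
```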

&lt;h3&gt;
  
  
  Stage 5: Prompt Assembly with Citation Binding
&lt;/h3&gt;

&lt;p&gt;The pipeline builds a structured prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;production_rag_pipeline.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;build_llm_context&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;production_rag_pipeline.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;build_llm_prompt&lt;/span&gt;

&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_mapping&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grouped_sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_llm_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;fetched_urls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fetched_urls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;renumber_sources&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# ← fixes phantom citation numbers
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Citation numbers are renumbered after every filtering step. If three sources survive, they're numbered [1], [2], [3] — never [1], [3], [7] with phantom gaps.&lt;/p&gt;

&lt;p&gt;Current date and time are injected into the prompt so the LLM can reason about source freshness.&lt;/p&gt;
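
&lt;p&gt;The renumbering idea described above is simple enough to show in a few lines. This is a sketch under assumptions: the chunk layout (dicts with a &lt;code&gt;source_id&lt;/code&gt; key) and the helper name are mine, not the library's internals.&lt;/p&gt;

```python
def renumber_sources(chunks):
    """Assign consecutive citation numbers to the sources that survived.

    chunks is a list of dicts with "source_id" (original number) and "text".
    Returns the renumbered chunks plus the old-to-new mapping.
    """
    mapping = {}
    renumbered = []
    for chunk in chunks:
        old_id = chunk["source_id"]
        if old_id not in mapping:
            mapping[old_id] = len(mapping) + 1   # next free number, no gaps
        renumbered.append({**chunk, "source_id": mapping[old_id]})
    return renumbered, mapping


survivors = [
    {"source_id": 1, "text": "first"},
    {"source_id": 3, "text": "second"},
    {"source_id": 7, "text": "third"},
    {"source_id": 3, "text": "fourth"},
]
renumbered, mapping = renumber_sources(survivors)
print(mapping)
```

&lt;p&gt;Sources 1, 3, and 7 become 1, 2, and 3, and both chunks from the old source 3 share the new number 2.&lt;/p&gt;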




&lt;h2&gt;
  
  
  Step 4: What the Output Looks Like
&lt;/h2&gt;

&lt;p&gt;The final prompt looks roughly like this (abbreviated):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Current date: 2026-03-20

Answer the user's question using ONLY the provided sources.
Cite sources using [1], [2], etc. Do not make claims without a citation.

=== SOURCES ===

[1] OpenAI announces GPT-5 Turbo with 1M context window
Source: techcrunch.com | Published: 2026-03-19
OpenAI today released GPT-5 Turbo, featuring a 1 million token
context window and improved reasoning capabilities...

[2] Google DeepMind publishes Gemini 2.5 technical report
Source: blog.google | Published: 2026-03-18
The technical report details architectural changes including
mixture-of-experts scaling to 3.2 trillion parameters...

[3] Anthropic raises $5B Series E at $90B valuation
Source: reuters.com | Published: 2026-03-17
Anthropic closed a $5 billion funding round, bringing its
total raised to over $15 billion...

=== QUESTION ===

latest AI news
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop this into Ollama, LM Studio, or any API. The model sees curated, relevant, cited content — not raw web pages.&lt;/p&gt;
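
&lt;p&gt;For Ollama, one way to send the prompt is its local HTTP API. The sketch below assumes an Ollama server running on the default port 11434 and a model called &lt;code&gt;llama3&lt;/code&gt; that you have already pulled; swap in whatever model you use. It relies only on the standard library.&lt;/p&gt;

```python
import json
import urllib.request

# Ollama's default local endpoint; change it if your server runs elsewhere
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(prompt, model="llama3"):
    """Build a non-streaming generate request for a local Ollama server.

    The model name is an assumption; use any model you have pulled.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt, model="llama3", timeout=120):
    """Send the prompt and return the generated text."""
    request = build_ollama_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

&lt;p&gt;Calling &lt;code&gt;ask_ollama(prompt)&lt;/code&gt; with the string from &lt;code&gt;build_llm_prompt&lt;/code&gt; should return the model's cited answer, provided the server is up.&lt;/p&gt;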




&lt;h2&gt;
  
  
  Step 5: Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dataclass
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;production_rag_pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RAGConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;build_llm_prompt&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RAGConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;num_per_engine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# results per search engine
&lt;/span&gt;    &lt;span class="n"&gt;top_n_fetch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# max pages to fetch
&lt;/span&gt;    &lt;span class="n"&gt;fetch_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# seconds per page
&lt;/span&gt;    &lt;span class="n"&gt;total_context_chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# chunks in final prompt
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_llm_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latest AI news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  YAML
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;production-rag-pipeline &lt;span class="s2"&gt;"latest AI news"&lt;/span&gt; &lt;span class="nt"&gt;--config&lt;/span&gt; config.example.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Environment variables
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;RAG_TOP_N_FETCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;RAG_FETCH_TIMEOUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10
production-rag-pipeline &lt;span class="s2"&gt;"latest AI news"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
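
&lt;p&gt;If you're curious how environment overrides like these can map onto a config dataclass, here is a self-contained sketch. &lt;code&gt;SketchConfig&lt;/code&gt; is a hypothetical stand-in, not the library's &lt;code&gt;RAGConfig&lt;/code&gt;, and its default values are simply copied from the example above.&lt;/p&gt;

```python
import os
from dataclasses import dataclass, fields


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the library's RAGConfig."""
    num_per_engine: int = 12
    top_n_fetch: int = 8
    fetch_timeout: int = 10
    total_context_chunks: int = 12


def config_from_env(environ=None, prefix="RAG_"):
    """Build a config, letting RAG_-prefixed variables override defaults."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for field in fields(SketchConfig):
        raw = environ.get(prefix + field.name.upper())
        if raw is not None:
            overrides[field.name] = int(raw)   # every field here is an integer
    return SketchConfig(**overrides)


config = config_from_env({"RAG_TOP_N_FETCH": "5"})
print(config)
```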






&lt;h2&gt;
  
  
  Step 6: The 50-Line Version
&lt;/h2&gt;

&lt;p&gt;Here's the entire pipeline, from query to LLM-ready prompt, using the module-level API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;production_rag_pipeline.search&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;search&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;production_rag_pipeline.fetch&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fetch_pages_parallel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;production_rag_pipeline.extract&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;extract_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_text&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;production_rag_pipeline.rerank&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;rerank_chunks&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;production_rag_pipeline.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;build_llm_context&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;production_rag_pipeline.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;build_llm_prompt&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Search
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latest AI news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_per_engine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Fetch
&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="n"&gt;pages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_pages_parallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Extract + Chunk
&lt;/span&gt;&lt;span class="n"&gt;all_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;all_chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Rerank
&lt;/span&gt;&lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;rerank_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;all_chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Build prompt
&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mapping&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_llm_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ranked&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;renumber_sources&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_llm_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



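&lt;p&gt;This is roughly what &lt;code&gt;build_llm_prompt("latest AI news")&lt;/code&gt; does internally, broken into visible steps. As written, this version fetches the top 8 search results directly; the semantic pre-filter from Stage 2 does not appear as a separate step.&lt;/p&gt;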




&lt;h2&gt;
  
  
  Graceful Degradation
&lt;/h2&gt;

&lt;p&gt;The pipeline works at every install level:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Install&lt;/th&gt;
&lt;th&gt;Ranking&lt;/th&gt;
&lt;th&gt;Extraction&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pip install .&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;BM25 only&lt;/td&gt;
&lt;td&gt;BeautifulSoup&lt;/td&gt;
&lt;td&gt;Fastest, least accurate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pip install .[extraction]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;BM25 only&lt;/td&gt;
&lt;td&gt;Trafilatura&lt;/td&gt;
&lt;td&gt;Better content quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pip install .[semantic]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;BM25 + semantic + MMR&lt;/td&gt;
&lt;td&gt;BeautifulSoup&lt;/td&gt;
&lt;td&gt;Much better ranking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pip install .[full]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;BM25 + semantic + cross-encoder + MMR&lt;/td&gt;
&lt;td&gt;Trafilatura&lt;/td&gt;
&lt;td&gt;Best quality&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No GPU required. Semantic models run on CPU — slower, but functional.&lt;/p&gt;
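
&lt;p&gt;A minimal sketch of how this kind of fallback can be wired up, not the project's actual code. The helper names &lt;code&gt;has_module&lt;/code&gt; and &lt;code&gt;plan_pipeline&lt;/code&gt; are hypothetical, and the module names &lt;code&gt;trafilatura&lt;/code&gt; and &lt;code&gt;sentence_transformers&lt;/code&gt; are assumptions about what the extras install. The stage names mirror the table above.&lt;/p&gt;

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def plan_pipeline(extraction: bool = False,
                  semantic: bool = False,
                  cross_encoder: bool = False) -> dict:
    """Pick pipeline stages from the optional features that are available.

    Hypothetical helper: the flags stand in for whatever checks the real
    pipeline performs at import time.
    """
    ranking = ["BM25"]  # always available in the base install
    if semantic:
        ranking.append("semantic")
        # Assumption: cross-encoder reranking only runs on top of the
        # semantic stage, as in the [full] row of the table.
        if cross_encoder:
            ranking.append("cross-encoder")
        ranking.append("MMR")
    extractor = "Trafilatura" if extraction else "BeautifulSoup"
    return {"ranking": ranking, "extraction": extractor}


if __name__ == "__main__":
    # Module names below are assumptions, not taken from the project.
    plan = plan_pipeline(
        extraction=has_module("trafilatura"),
        semantic=has_module("sentence_transformers"),
    )
    print(plan)
```

&lt;p&gt;Checking with &lt;code&gt;importlib.util.find_spec&lt;/code&gt; avoids paying the import cost of a heavy model library just to learn whether it is installed.&lt;/p&gt;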




&lt;h2&gt;
  
  
  How It Compares to Perplexity
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Perplexity&lt;/th&gt;
&lt;th&gt;production-rag-pipeline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Index&lt;/td&gt;
&lt;td&gt;200B+ pre-indexed URLs&lt;/td&gt;
&lt;td&gt;Real-time Bing + DDG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;358 ms median&lt;/td&gt;
&lt;td&gt;8–15 s on a MacBook&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Models&lt;/td&gt;
&lt;td&gt;20+ with dynamic routing&lt;/td&gt;
&lt;td&gt;You choose (Ollama, LM Studio, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;Cerebras CS-3, 1,200 tok/s&lt;/td&gt;
&lt;td&gt;Your hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;$20/mo Pro&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privacy&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code&lt;/td&gt;
&lt;td&gt;Closed&lt;/td&gt;
&lt;td&gt;Open source, MIT&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The gap is real — especially on latency and index size. But for a tool that runs on your laptop, feeds any local model, and costs nothing, the tradeoff is worth it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multilingual Support
&lt;/h2&gt;

&lt;p&gt;The pipeline auto-detects the language from the ratio of Cyrillic characters (10% threshold) and picks an embedding model to match:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;English&lt;/strong&gt; → &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; (fast, English-optimized)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Russian&lt;/strong&gt; → &lt;code&gt;paraphrase-multilingual-MiniLM-L12-v2&lt;/code&gt; (multilingual, 50+ languages)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cross-encoder reranking also switches models per language. No manual configuration needed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;production-rag-pipeline &lt;span class="s2"&gt;"новости ИИ"&lt;/span&gt; &lt;span class="nt"&gt;--lang&lt;/span&gt; ru
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
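
&lt;p&gt;The detection rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, the ratio is taken over alphabetic characters only, the threshold is treated as inclusive, and "Cyrillic" means the basic Unicode block U+0400 to U+04FF. The model names are the ones listed above.&lt;/p&gt;

```python
# Basic Cyrillic Unicode block (assumption: extended blocks are ignored).
CYRILLIC = range(0x0400, 0x0500)

# Embedding models per detected language, as listed in the article.
MODELS = {
    "en": "all-MiniLM-L6-v2",
    "ru": "paraphrase-multilingual-MiniLM-L12-v2",
}


def cyrillic_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the Cyrillic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0  # no letters at all: nothing to measure
    return sum(ord(ch) in CYRILLIC for ch in letters) / len(letters)


def detect_lang(text: str, threshold: float = 0.10) -> str:
    """Return "ru" once the Cyrillic share reaches the threshold, else "en"."""
    return "ru" if cyrillic_ratio(text) >= threshold else "en"


def pick_embedding_model(query: str) -> str:
    """Map a query to the embedding model for its detected language."""
    return MODELS[detect_lang(query)]


if __name__ == "__main__":
    for query in ("latest AI news", "новости ИИ"):
        print(query, "->", detect_lang(query), pick_embedding_model(query))
```

&lt;p&gt;A ratio rather than a simple "contains Cyrillic" check keeps a mostly English query with one stray Cyrillic word from being routed to the multilingual model.&lt;/p&gt;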



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdza4pi269yswcmh6q0wt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdza4pi269yswcmh6q0wt.png" alt="Dark abstract 3D render with the word BUILD in large translucent letters" width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This is Part 2 of a series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://medium.com/@kazkozdev/how-perplexity-actually-searches-the-internet-ae4b50dd9837" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;&lt;/strong&gt; — How Perplexity Actually Searches the Internet (architecture teardown)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 2&lt;/strong&gt; — You're reading it (build the local equivalent)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Star the repo if this is useful: &lt;a href="https://github.com/KazKozDev/production_rag_pipeline" rel="noopener noreferrer"&gt;github.com/KazKozDev/production_rag_pipeline&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Issues and contributions welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>python</category>
    </item>
  </channel>
</rss>
