<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: VLSiddarth</title>
    <description>The latest articles on DEV Community by VLSiddarth (@vlsiddarth).</description>
    <link>https://dev.to/vlsiddarth</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1314572%2F772fbff0-de5e-43eb-a504-ce619239388d.png</url>
      <title>DEV Community: VLSiddarth</title>
      <link>https://dev.to/vlsiddarth</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vlsiddarth"/>
    <language>en</language>
    <item>
      <title>Andrej Karpathy said manual data ingest for AI agents is too slow. I built the fix.</title>
      <dc:creator>VLSiddarth</dc:creator>
      <pubDate>Tue, 07 Apr 2026 09:00:05 +0000</pubDate>
      <link>https://dev.to/vlsiddarth/andrej-karpathy-said-manual-data-ingest-for-ai-agents-is-too-slow-i-built-the-fix-2co8</link>
      <guid>https://dev.to/vlsiddarth/andrej-karpathy-said-manual-data-ingest-for-ai-agents-is-too-slow-i-built-the-fix-2co8</guid>
      <description>&lt;h2&gt;
  
  
  Andrej Karpathy said manual data ingest for AI agents is too slow. I built the fix.
&lt;/h2&gt;

&lt;p&gt;Last week Andrej Karpathy posted about building personal knowledge &lt;br&gt;
bases for LLM agents. He described his workflow: manually indexing &lt;br&gt;
source documents into a raw/ directory, writing custom search tools, &lt;br&gt;
building a naive search engine over his wiki.&lt;/p&gt;

&lt;p&gt;Then he wrote this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I think there is room here for an incredible new product &lt;br&gt;
instead of a hacky collection of scripts."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;He was right. So I built it.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Problem He Identified
&lt;/h2&gt;

&lt;p&gt;Karpathy's workflow is brilliant, but it requires him to manually &lt;br&gt;
curate every source. He clips articles with Obsidian Web Clipper, &lt;br&gt;
downloads images locally, and feeds them one by one to his LLM agent.&lt;/p&gt;

&lt;p&gt;For a researcher at his level, that works. For a developer building &lt;br&gt;
production AI agents for clients, it doesn't scale.&lt;/p&gt;

&lt;p&gt;Here's the specific failure mode I kept hitting:&lt;/p&gt;

&lt;p&gt;You build a RAG pipeline. It works. A user asks about a Python library.&lt;br&gt;
Your retriever finds a Stack Overflow answer with cosine similarity &lt;strong&gt;0.94&lt;/strong&gt;.&lt;br&gt;
The LLM answers confidently. The user follows the advice. It breaks their project.&lt;/p&gt;

&lt;p&gt;The Stack Overflow answer was from 2021. The library changed its API in 2023.&lt;/p&gt;

&lt;p&gt;Your retriever did its job perfectly. Your vector store had no concept &lt;br&gt;
of when that document was written. No exception was raised. No warning &lt;br&gt;
was shown. The cosine similarity score told you nothing about whether &lt;br&gt;
the knowledge was still true.&lt;/p&gt;

&lt;p&gt;This is the silent failure mode of every RAG pipeline in production. &lt;br&gt;
Tavily, Exa, and SerpAPI don't tell you when their results are stale.&lt;/p&gt;

&lt;p&gt;So I built a retrieval API that does.&lt;/p&gt;


&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Knowledge Universe&lt;/strong&gt; is an open-source retrieval API that gives &lt;br&gt;
Karpathy's LLM wiki agents something they currently don't have: &lt;br&gt;
a production-grade data ingestion layer that crawls 18 knowledge &lt;br&gt;
sources simultaneously, scores every result for freshness, and &lt;br&gt;
returns structured documents in 3 seconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install the CLI&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;knowledge-universe

&lt;span class="c"&gt;# Get a free API key&lt;/span&gt;
ku signup you@email.com

&lt;span class="c"&gt;# Run your first query&lt;/span&gt;
ku discover &lt;span class="s2"&gt;"transformer architecture"&lt;/span&gt; &lt;span class="nt"&gt;--difficulty&lt;/span&gt; 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
Found 8 sources [2980ms]&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;🟢 [arxiv] Learning Novel Transformer Architecture for Time-series&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2502.13721v1" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2502.13721v1&lt;/a&gt;&lt;br&gt;
decay=0.23 (fresh)  quality=8.5/10&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;⚪ [kaggle] In-depth guide to Transformer architecture&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.kaggle.com/code/tientd95/in-depth-guide-to-transformer-arc" rel="noopener noreferrer"&gt;https://www.kaggle.com/code/tientd95/in-depth-guide-to-transformer-arc&lt;/a&gt;&lt;br&gt;
decay=0.40 (unknown)  quality=4.5/10&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🟢 [paperswithcode] Gradient Boosting within a Single Attention Layer&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2604.03190" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2604.03190&lt;/a&gt;&lt;br&gt;
decay=0.01 (fresh)  quality=7.6/10&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🟠 [github] An-Jhon/Hand-Drawn-Transformer&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/An-Jhon/Hand-Drawn-Transformer" rel="noopener noreferrer"&gt;https://github.com/An-Jhon/Hand-Drawn-Transformer&lt;/a&gt;&lt;br&gt;
decay=0.68 (stale)  quality=2.6/10&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🟡 [semantic_scholar] Transformer+transformer architecture for image captioni&lt;/p&gt;

&lt;p&gt;&lt;a href="https://doi.org/10.11591/ijai.v14.i3.pp2338-2346" rel="noopener noreferrer"&gt;https://doi.org/10.11591/ijai.v14.i3.pp2338-2346&lt;/a&gt;&lt;br&gt;
decay=0.44 (aging)  quality=5.1/10&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🟢 [arxiv] A Survey of Graph Transformers: Architectures, Theories&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2502.16533v2" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2502.16533v2&lt;/a&gt;&lt;br&gt;
decay=0.23 (fresh)  quality=8.9/10&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;⚪ [kaggle] LB 0.73 single fold transformer architecture&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.kaggle.com/code/hengck23/lb-0-73-single-fold-transformer-a" rel="noopener noreferrer"&gt;https://www.kaggle.com/code/hengck23/lb-0-73-single-fold-transformer-a&lt;/a&gt;&lt;br&gt;
decay=0.40 (unknown)  quality=4.5/10&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🟠 [github] tum-pbs/pde-transformer&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/tum-pbs/pde-transformer" rel="noopener noreferrer"&gt;https://github.com/tum-pbs/pde-transformer&lt;/a&gt;&lt;br&gt;
decay=0.70 (stale)  quality=2.5/10&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cache hit: False  |  Time: 2980ms&lt;/p&gt;

&lt;p&gt;Every result tells you not just what it found, but how much to trust it.&lt;/p&gt;

&lt;p&gt;Live API: &lt;a href="https://vlsiddarth-knowledge-universe.hf.space" rel="noopener noreferrer"&gt;https://vlsiddarth-knowledge-universe.hf.space&lt;/a&gt;&lt;br&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/VLSiddarth/Knowledge-Universe" rel="noopener noreferrer"&gt;https://github.com/VLSiddarth/Knowledge-Universe&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;The core idea: run 18 crawlers in parallel, score everything, &lt;br&gt;
return the best 8-10 results with freshness metadata attached.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your App / Agent
          │
          ▼  POST /v1/discover
┌─────────────────────────────────────────────────┐
│            Knowledge Universe API               │
│                                                 │
│  1. Cache check (Redis) ──── HIT → 200ms return │
│        │ MISS                                   │
│  2. asyncio.gather(18 crawlers, per-timeouts)   │
│        ├── arXiv          (25s timeout)         │
│        ├── CrossRef       (8s) ← [Academic]     │
│        ├── PapersWithCode (8s) ← [SOTA Models]  │
│        ├── Documentation  (3s) ← [Fast-Fail]    │
│        ├── GitHub         (8s)                  │
│        ├── StackOverflow  (6s)                  │
│        ├── HuggingFace    (8s)                  │
│        ├── Kaggle         (6s)                  │
│        ├── YouTube        (8s)                  │
│        ├── Sketchfab      (5s) ← [3D Spatial]   │
│        ├── Freesound      (5s) ← [Audio]        │
│        ├── Wikipedia      (5s)                  │
│        ├── MIT OCW        (5s)                  │
│        ├── OpenLibrary    (5s)                  │
│        ├── Podcast Index  (5s)                  │
│        ├── Libgen         (4s)                  │
│        ├── CommonCrawl    (2s) ← [Fast-Fail]    │
│        └── GH Archive     (2s) ← [Fast-Fail]    │
│        │                                        │
│  3. Semantic pre-filter (cosine sim &amp;gt; 0.25)     │
│  4. Quality ranker (5-dimension scoring)        │
│  5. Knowledge Decay Engine                      │
│  6. LLM reranker (all-MiniLM-L6-v2)             │
│  7. Coverage Confidence Score                   │
│  8. Cache result (Redis, 4h TTL)                │
│        │                                        │
│  Return: sources + decay_scores + confidence    │
└─────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why asyncio.gather and not threads?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each crawler is an async HTTP call. &lt;code&gt;asyncio.gather&lt;/code&gt; runs all 18 &lt;br&gt;
simultaneously, so the total wall time equals the &lt;strong&gt;slowest crawler&lt;/strong&gt;, &lt;br&gt;
not the sum. The parallel ceiling is arXiv at ~2.5s for complex queries.&lt;/p&gt;
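
&lt;p&gt;A minimal sketch of that fan-out, assuming each crawler exposes an async &lt;code&gt;crawl()&lt;/code&gt; coroutine (the timeout values mirror the diagram above; the helper names are illustrative, not the project's actual internals):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

# Per-crawler budgets in seconds, mirroring the diagram above (illustrative)
CRAWLER_TIMEOUTS = {"arxiv": 25, "github": 8, "stackoverflow": 6, "commoncrawl": 2}

async def run_crawler(name, crawler, topic, difficulty):
    # wait_for enforces the per-crawler budget; a slow or failing source
    # contributes an empty list instead of stalling the whole request
    try:
        return await asyncio.wait_for(
            crawler.crawl(topic, difficulty),
            timeout=CRAWLER_TIMEOUTS.get(name, 8),
        )
    except Exception:
        return []

async def discover(crawlers: dict, topic: str, difficulty: int) -&amp;gt; list:
    # All crawlers start at once, so wall time tracks the slowest finisher
    batches = await asyncio.gather(
        *(run_crawler(name, c, topic, difficulty) for name, c in crawlers.items())
    )
    return [doc for batch in batches for doc in batch]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;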

&lt;p&gt;One lesson that cost me 5 seconds of latency: one blocking crawler &lt;br&gt;
kills everything. My original Kaggle integration used the official &lt;br&gt;
SDK which runs synchronous urllib3 under the hood. I wrapped it in &lt;br&gt;
&lt;code&gt;run_in_executor&lt;/code&gt; thinking that was fine. It held a thread pool slot &lt;br&gt;
for 2.5 seconds on every query and pushed cold latency from 3s to 8s.&lt;/p&gt;

&lt;p&gt;The fix: replace the SDK with direct async HTTP calls using httpx:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before — blocks the thread pool
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;crawl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;difficulty&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;loop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_event_loop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_in_executor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_crawl_sync&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;difficulty&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After — true async, 300ms for same results
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;crawl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;difficulty&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;datasets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.kaggle.com/api/v1/datasets/list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sortBy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usability&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Basic &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_encoded_creds&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;difficulty&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cold latency dropped from 8.8s to 3.1s with that one change.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Knowledge Decay Formula
&lt;/h2&gt;

&lt;p&gt;This is the part that doesn't exist in any other retrieval API.&lt;/p&gt;

&lt;p&gt;Every result gets a decay score computed from its age and source type:&lt;br&gt;
&lt;code&gt;decay = 1 - 0.5 ^ (age_days / half_life)&lt;/code&gt;&lt;br&gt;
&lt;code&gt;freshness = 1 - decay&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Half-lives are tuned per platform based on how fast knowledge &lt;br&gt;
in that domain becomes outdated:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Half-life&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HuggingFace&lt;/td&gt;
&lt;td&gt;120 days&lt;/td&gt;
&lt;td&gt;ML model landscape changes monthly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub&lt;/td&gt;
&lt;td&gt;180 days&lt;/td&gt;
&lt;td&gt;Dependencies update constantly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YouTube&lt;/td&gt;
&lt;td&gt;270 days&lt;/td&gt;
&lt;td&gt;Library tutorials date quickly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stack Overflow&lt;/td&gt;
&lt;td&gt;365 days&lt;/td&gt;
&lt;td&gt;API answers age with framework versions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;arXiv&lt;/td&gt;
&lt;td&gt;1,095 days&lt;/td&gt;
&lt;td&gt;Research papers have longer shelf life&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wikipedia&lt;/td&gt;
&lt;td&gt;1,460 days&lt;/td&gt;
&lt;td&gt;Actively maintained, slow decay&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open Library&lt;/td&gt;
&lt;td&gt;1,825 days&lt;/td&gt;
&lt;td&gt;Books revised infrequently&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
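
&lt;p&gt;A minimal sketch of that formula in Python, using the half-lives from the table (the function name and return shape are illustrative, not the library's actual API; the per-topic volatility multiplier described below is omitted):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import datetime, timezone

# Half-lives in days, per the table above
HALF_LIFE_DAYS = {
    "huggingface": 120, "github": 180, "youtube": 270,
    "stackoverflow": 365, "arxiv": 1095, "wikipedia": 1460,
    "openlibrary": 1825,
}

def decay_score(published_at: datetime, platform: str) -&amp;gt; dict:
    # After one half-life, freshness drops to 0.5; after two, to 0.25
    age_days = (datetime.now(timezone.utc) - published_at).days
    half_life = HALF_LIFE_DAYS.get(platform, 365)
    decay = 1 - 0.5 ** (age_days / half_life)
    return {
        "decay_score": round(decay, 3),
        "freshness": round(1 - decay, 3),
        "age_days": age_days,
        "half_life_days": half_life,
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;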

&lt;p&gt;The output on every result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"stackoverflow:59523557"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decay_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.986&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"freshness"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.014&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"decayed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"age_days"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2263&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"half_life_days"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;365&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That Stack Overflow answer from 2021 with cosine similarity 0.94?&lt;br&gt;
Its &lt;code&gt;freshness&lt;/code&gt; score is 0.014. It gets downweighted before it &lt;br&gt;
reaches your LLM. The silent failure mode is no longer silent.&lt;/p&gt;

&lt;p&gt;Fast-moving topics get an additional volatility multiplier.&lt;br&gt;
Topics like "LLMs", "React", "Docker", "Claude" use a ×1.1 &lt;br&gt;
multiplier on the decay rate — knowledge in those domains &lt;br&gt;
goes stale faster than the platform average.&lt;/p&gt;


&lt;h2&gt;
  
  
  The 5-Dimension Quality Ranker
&lt;/h2&gt;

&lt;p&gt;Before decay is applied, each source gets a base quality score:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;WEIGHTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;            &lt;span class="mf"&gt;0.35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# platform trust + content type
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;difficulty_alignment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# how well difficulty matches request
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completeness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;         &lt;span class="mf"&gt;0.20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# metadata richness
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;social_proof&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;         &lt;span class="mf"&gt;0.10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# stars, citations, views (log-scaled)
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accessibility&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# open access bonus
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;final_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_quality&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;decay_penalty_multiplier&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Difficulty alignment&lt;/strong&gt; is the most impactful dimension for &lt;br&gt;
Karpathy's use case specifically. When you're feeding an LLM wiki &lt;br&gt;
agent, you want sources matched to the agent's context level. &lt;br&gt;
A research synthesis agent should get arXiv papers, not YouTube &lt;br&gt;
explainers. A learning tool for beginners should get the opposite.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# difficulty gap penalty
# gap=0: score 10.0  (perfect match)
# gap=1: score 8.5   (acceptable)  
# gap=2: score 6.0   (marginal)
# gap=3: score 2.0   (nearly blocked)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
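
&lt;p&gt;A hedged sketch of how these pieces combine, reusing the &lt;code&gt;WEIGHTS&lt;/code&gt; dict above (the gap penalties come from the comment block; the exact shape of the decay multiplier is an assumption, not the shipped code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Gap penalties from the comment block above, on a 0-10 scale
GAP_PENALTY = {0: 10.0, 1: 8.5, 2: 6.0, 3: 2.0}

def score_source(dims: dict, requested: int, actual: int, freshness: float) -&amp;gt; float:
    # dims holds the other dimension scores (authority, completeness, ...)
    gap = abs(requested - actual)
    dims["difficulty_alignment"] = GAP_PENALTY.get(gap, 0.5)  # gaps beyond 3: assumed floor
    base_quality = sum(WEIGHTS[k] * dims.get(k, 0.0) for k in WEIGHTS)
    # Assumed multiplier shape: stale sources are downweighted, not dropped
    decay_penalty_multiplier = 0.5 + 0.5 * freshness
    return base_quality * decay_penalty_multiplier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;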






&lt;h2&gt;
  
  
  The Coverage Confidence Score
&lt;/h2&gt;

&lt;p&gt;This is the feature that surprised me most when I built it.&lt;/p&gt;

&lt;p&gt;After reranking, the API computes the average cosine similarity &lt;br&gt;
between your query and the top results. If the average falls below &lt;br&gt;
0.45, it warns you and suggests better queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"coverage_intelligence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.36&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence_label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"low"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"coverage_warning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"warning_message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Low confidence — results may not match intent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"suggested_queries"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"attention mechanism self-attention explained"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"transformer encoder decoder architecture tutorial"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"attention is all you need paper walkthrough"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
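
&lt;p&gt;A minimal sketch of that check with the shared &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; model (the 0.45 threshold comes from the text above; the query-suggestion step is omitted):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def coverage_confidence(query: str, titles: list[str], threshold: float = 0.45) -&amp;gt; dict:
    # Average cosine similarity between the query and the top result titles
    q_emb = model.encode(query, convert_to_tensor=True)
    t_embs = model.encode(titles, convert_to_tensor=True)
    confidence = float(util.cos_sim(q_emb, t_embs).mean())
    return {
        "confidence": round(confidence, 2),
        "confidence_label": "low" if confidence &amp;lt; threshold else "ok",
        "coverage_warning": confidence &amp;lt; threshold,
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;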



&lt;p&gt;This matters for LLM wiki agents specifically. When the agent asks &lt;br&gt;
an ambiguous question that doesn't match how sources are indexed, &lt;br&gt;
instead of silently returning mediocre results, the API tells the &lt;br&gt;
agent to rephrase. The agent can use the suggested queries directly.&lt;/p&gt;
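
&lt;p&gt;A hedged example of that loop (&lt;code&gt;discover_raw&lt;/code&gt; is a hypothetical stand-in for whatever function returns the raw &lt;code&gt;/v1/discover&lt;/code&gt; JSON in your agent):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def discover_with_retry(topic: str, max_retries: int = 2) -&amp;gt; dict:
    # Retry with the API's own suggested queries when coverage is weak
    resp = discover_raw(topic)  # hypothetical helper returning the raw JSON
    for _ in range(max_retries):
        cov = resp.get("coverage_intelligence", {})
        suggestions = cov.get("suggested_queries", [])
        if not cov.get("coverage_warning") or not suggestions:
            break
        resp = discover_raw(suggestions[0])
    return resp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;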

&lt;p&gt;Karpathy mentioned he runs "health checks" over his wiki to find &lt;br&gt;
inconsistent data and impute missing data. Coverage confidence is &lt;br&gt;
essentially that health check, automated and running on every query.&lt;/p&gt;


&lt;h2&gt;
  
  
  Performance vs Competitors
&lt;/h2&gt;

&lt;p&gt;Tested against Tavily, Exa, and SerpAPI using identical queries:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Knowledge Universe&lt;/th&gt;
&lt;th&gt;Tavily&lt;/th&gt;
&lt;th&gt;Exa&lt;/th&gt;
&lt;th&gt;SerpAPI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cold latency&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.1s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5.4s&lt;/td&gt;
&lt;td&gt;1.5s&lt;/td&gt;
&lt;td&gt;3.5s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache hit&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;220ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decay scoring&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;✅&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Confidence score&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;✅&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Difficulty ranking&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;✅&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Source diversity&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;18 typed platforms (incl. 3D &amp;amp; Audio)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web only&lt;/td&gt;
&lt;td&gt;Web only&lt;/td&gt;
&lt;td&gt;Google only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;KU is faster than Tavily on cold queries despite hitting 18 typed &lt;br&gt;
sources vs Tavily's general web index. The parallel architecture &lt;br&gt;
is what makes this possible — the wall clock time equals the &lt;br&gt;
slowest single crawler, not the sum of all crawlers.&lt;/p&gt;

&lt;p&gt;Note on Exa: Exa is faster (1.5s) because it uses a single unified &lt;br&gt;
search index rather than parallel crawling. The tradeoff is no decay &lt;br&gt;
scoring and no source type diversity — you get whatever their index &lt;br&gt;
decided to rank.&lt;/p&gt;


&lt;h2&gt;
  
  
  LangChain Integration — Drop-in Ready
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.documents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_knowledge_universe_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;difficulty&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;formats&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_key_here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;min_freshness&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# filter sources below 30% fresh
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;

    &lt;span class="n"&gt;formats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;formats&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stackoverflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://vlsiddarth-knowledge-universe.hf.space/v1/discover&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-API-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;difficulty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;difficulty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;formats&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;formats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Check coverage confidence
&lt;/span&gt;    &lt;span class="n"&gt;cov&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coverage_intelligence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cov&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coverage_warning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;⚠️  Low confidence. Try: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cov&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;suggested_queries&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;decay_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decay_scores&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;decay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decay_map&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
        &lt;span class="n"&gt;freshness&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decay&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;freshness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Filter stale sources before they reach your LLM
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;freshness&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;min_freshness&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;         &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;           &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source_platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;freshness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="n"&gt;freshness&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decay_label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;decay&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quality_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quality_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;difficulty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;difficulty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;


&lt;span class="c1"&gt;# Usage — drop into any existing LangChain RAG chain
&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_knowledge_universe_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transformer architecture&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;difficulty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ku_test_...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;decay_label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  freshness=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;freshness&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  url=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;For Karpathy's LLM wiki use case specifically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Give your LLM wiki agent a tool that does the manual ingest
# he described — automatically, with freshness scoring
&lt;/span&gt;
&lt;span class="n"&gt;wiki_sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_knowledge_universe_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mixture of experts routing algorithms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;difficulty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# researcher-level sources only
&lt;/span&gt;    &lt;span class="n"&gt;min_freshness&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# only recent sources go into the wiki
&lt;/span&gt;    &lt;span class="n"&gt;formats&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# papers and implementations
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Feed directly to your wiki agent
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;wiki_sources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ingest_to_wiki&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The manual raw/ directory collection Karpathy describes is now &lt;br&gt;
three lines of code.&lt;/p&gt;


&lt;h2&gt;
  
  
  Things That Didn't Work
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;MinHash LSH deduplication misses near-identical titles.&lt;/strong&gt;&lt;br&gt;
Wikipedia returns both "Neural network" and "Neural network &lt;br&gt;
(machine learning)" as separate articles. Even after normalization &lt;br&gt;
the titles still differ, so both pass deduplication. Fixed with a &lt;br&gt;
parenthetical-stripping step before the hash comparison.&lt;/p&gt;
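
&lt;p&gt;The added normalization step is roughly this (a sketch; the real pipeline runs it before the MinHash signatures are computed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

def normalize_title(title: str) -&amp;gt; str:
    # Drop a trailing parenthetical such as "(machine learning)" so that
    # "Neural network" and "Neural network (machine learning)" collide
    title = re.sub(r"\s*\([^)]*\)\s*$", "", title)
    return re.sub(r"\s+", " ", title).strip().lower()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;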

&lt;p&gt;&lt;strong&gt;A single global timeout was the wrong abstraction.&lt;/strong&gt;&lt;br&gt;
I started with a global 8s timeout for all crawlers. &lt;br&gt;
CommonCrawl and GHArchive always timed out at 8s with 0 results, &lt;br&gt;
wasting the full 8s on every query. Setting them to a 2s fast-fail &lt;br&gt;
dropped the parallel ceiling from 8s to 3s.&lt;/p&gt;

&lt;p&gt;Lesson: &lt;strong&gt;profile each crawler individually before setting any &lt;br&gt;
global timeout.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic Scholar blocked Hugging Face IP addresses.&lt;/strong&gt;&lt;br&gt;
I originally used Semantic Scholar for academic papers. In late 2024, they changed their policy and started throwing &lt;code&gt;403 Forbidden&lt;/code&gt; errors for server-to-server requests from Hugging Face Spaces' free tier.&lt;/p&gt;

&lt;p&gt;The fix: I ripped it out and integrated CrossRef. They index 150M+ scholarly works, have a fully open API (CC0 metadata), and actively encourage programmatic access via their "polite pool" (just pass your email in the User-Agent). It gave me access to IEEE, ACM, and Nature papers that arXiv misses, with zero rate-limit blocks.&lt;/p&gt;
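
&lt;p&gt;For reference, a polite-pool CrossRef request is a plain HTTP GET; passing a &lt;code&gt;mailto&lt;/code&gt; parameter (or an email in the User-Agent) is what routes you into the polite pool. A sketch with httpx, with the response parsing simplified:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import httpx

async def crawl_crossref(topic: str, email: str = "you@email.com") -&amp;gt; list[dict]:
    async with httpx.AsyncClient(timeout=8) as client:
        resp = await client.get(
            "https://api.crossref.org/works",
            params={"query": topic, "rows": 10, "mailto": email},
        )
        items = resp.json().get("message", {}).get("items", [])
        # Titles come back as lists of strings; DOIs resolve via doi.org
        return [
            {"title": (item.get("title") or [""])[0], "doi": item.get("DOI")}
            for item in items
        ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;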

&lt;p&gt;&lt;strong&gt;&lt;code&gt;_is_stale()&lt;/code&gt; was silently killing cache hit rate.&lt;/strong&gt;&lt;br&gt;
The function checked if a cached result was older than 80% of &lt;br&gt;
the TTL (14,400s × 0.8 = 11,520s). Any query between 3.2 and &lt;br&gt;
4 hours old triggered a full cold re-crawl even though Redis &lt;br&gt;
still had the result. Cache hit rate was 25%. One-line fix: &lt;br&gt;
use the full TTL. Hit rate went to 50%+ immediately.&lt;/p&gt;
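
&lt;p&gt;Reconstructed from the description above (not the exact diff), the change was just the comparison boundary:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;CACHE_TTL_SECONDS = 14_400  # 4 hours

def _is_stale(age_seconds: float) -&amp;gt; bool:
    # Before: anything older than 80% of the TTL forced a cold re-crawl
    # return age_seconds &amp;gt; CACHE_TTL_SECONDS * 0.8
    # After: trust cached results for their full lifetime
    return age_seconds &amp;gt; CACHE_TTL_SECONDS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;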

&lt;p&gt;&lt;strong&gt;Duplicate model loading took too long to notice.&lt;/strong&gt;&lt;br&gt;
Both &lt;code&gt;LocalLLMReranker&lt;/code&gt; and &lt;code&gt;CoverageConfidenceScorer&lt;/code&gt; were &lt;br&gt;
independently loading &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; (90MB, ~300MB RAM). &lt;br&gt;
Loading twice pushed HuggingFace Spaces free tier (2GB limit) &lt;br&gt;
near the ceiling and added ~500ms to first requests. Fixed with &lt;br&gt;
a module-level singleton:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# src/integrations/shared_model.py
&lt;/span&gt;&lt;span class="n"&gt;_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_model_lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_shared_model&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_model&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_model&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_model&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;_model_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_model&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;
            &lt;span class="n"&gt;_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_model&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both classes now call &lt;code&gt;get_shared_model()&lt;/code&gt;. Model loads once. &lt;br&gt;
Shared embeddings from the reranker pass directly to the &lt;br&gt;
confidence scorer — zero extra encode() calls per request.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Now
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install the Python SDK&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;knowledge-universe

&lt;span class="c"&gt;# Or Node&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;knowledge-universe

&lt;span class="c"&gt;# Get a free API key (500 calls/month, no credit card)&lt;/span&gt;
ku signup you@email.com

&lt;span class="c"&gt;# Query the live API directly&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://vlsiddarth-knowledge-universe.hf.space/v1/discover &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-API-Key: ku_test_..."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "topic": "transformer architecture",
    "difficulty": 3,
    "formats": ["pdf", "github", "html", "video"],
    "max_results": 10
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Live API + Swagger docs:&lt;br&gt;&lt;br&gt;
&lt;a href="https://vlsiddarth-knowledge-universe.hf.space" rel="noopener noreferrer"&gt;https://vlsiddarth-knowledge-universe.hf.space&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub (MIT licensed):&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/VLSiddarth/Knowledge-Universe" rel="noopener noreferrer"&gt;https://github.com/VLSiddarth/Knowledge-Universe&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Two things I'm actively building:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Streaming results&lt;/strong&gt; — return sources as they arrive from &lt;br&gt;
each crawler rather than waiting for all 18. The first 3 results &lt;br&gt;
could be in your agent pipeline within 800ms. You see something &lt;br&gt;
immediately; the pipeline enriches as more crawlers complete.&lt;/p&gt;
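
&lt;p&gt;A sketch of how streaming could look with &lt;code&gt;asyncio.as_completed&lt;/code&gt;, reusing the &lt;code&gt;run_crawler&lt;/code&gt; helper sketched earlier (illustrative, not the shipped endpoint):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def discover_streaming(crawlers: dict, topic: str, difficulty: int):
    # Yield each crawler's batch in completion order instead of gather()ing
    tasks = [
        asyncio.create_task(run_crawler(name, c, topic, difficulty))
        for name, c in crawlers.items()
    ]
    for finished in asyncio.as_completed(tasks):
        batch = await finished
        if batch:
            yield batch  # the fastest sources reach the agent first
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;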

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;/v1/monitor&lt;/code&gt; webhook alerts&lt;/strong&gt; — register a topic and a &lt;br&gt;
webhook URL. Knowledge Universe checks that topic daily. When &lt;br&gt;
freshness drops below a threshold or a significantly better &lt;br&gt;
source appears, it pushes an update to your endpoint. Your &lt;br&gt;
RAG pipeline stays current without polling.&lt;/p&gt;

&lt;p&gt;If you're building LLM agents that need external knowledge — &lt;br&gt;
whether it's Karpathy's wiki pattern, a production RAG pipeline, &lt;br&gt;
or something in between — I'd genuinely like to hear what breaks.&lt;/p&gt;

&lt;p&gt;What's your current approach to handling source freshness in &lt;br&gt;
retrieval? Drop it in the comments.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: python, ai, machinelearning, webdev&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
