<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Praful Reddy</title>
    <description>The latest articles on DEV Community by Praful Reddy (@prafulreddy).</description>
    <link>https://dev.to/prafulreddy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3871637%2Fdfb49680-d4a7-46ca-b39b-0859d4c61fbd.jpg</url>
      <title>DEV Community: Praful Reddy</title>
      <link>https://dev.to/prafulreddy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/prafulreddy"/>
    <language>en</language>
    <item>
      <title>Why I used a 50-year-old algorithm instead of embeddings to cut Claude API token costs</title>
      <dc:creator>Praful Reddy</dc:creator>
      <pubDate>Wed, 22 Apr 2026 04:16:37 +0000</pubDate>
      <link>https://dev.to/prafulreddy/why-i-used-a-50-year-old-algorithm-instead-of-embeddings-to-cut-claude-api-token-costs-5g6j</link>
      <guid>https://dev.to/prafulreddy/why-i-used-a-50-year-old-algorithm-instead-of-embeddings-to-cut-claude-api-token-costs-5g6j</guid>
      <description>&lt;p&gt;I built &lt;strong&gt;Prism&lt;/strong&gt; — a local proxy that routes only relevant knowledge to Claude per query using BM25, with zero extra API calls, zero embeddings, and zero vector databases.&lt;br&gt;
Every time you send a prompt to Claude, it considers its entire &lt;br&gt;
knowledge space. A question about a React bug still costs tokens &lt;br&gt;
on geography, cooking, history, and every other domain Claude was &lt;br&gt;
trained on. Nobody talks about this because the context window is &lt;br&gt;
large enough that it "works." But it's wasteful by design — and &lt;br&gt;
it produces padded, unfocused responses as a side effect.&lt;/p&gt;

&lt;p&gt;I spent two weeks building a fix. The result is &lt;br&gt;
&lt;a href="https://github.com/YOUR_USERNAME/prism-ai" rel="noopener noreferrer"&gt;Prism&lt;/a&gt; — a local &lt;br&gt;
proxy that intercepts your Claude API calls and routes only &lt;br&gt;
relevant knowledge to each query. Zero extra API calls. Zero &lt;br&gt;
embeddings. Zero vector database. Pure deterministic math.&lt;/p&gt;

&lt;h2&gt;The problem with every other approach&lt;/h2&gt;

&lt;p&gt;Before I explain what Prism does, I want to explain what it &lt;br&gt;
deliberately does &lt;em&gt;not&lt;/em&gt; do — because the distinction matters.&lt;/p&gt;

&lt;p&gt;Every existing context optimizer I found uses a second, smaller &lt;br&gt;
LLM to compress the input to the main LLM. LLMlingua, Selective &lt;br&gt;
Context, LLM-DCP — they all work the same way: call a model to &lt;br&gt;
decide what to keep, then call the main model with the compressed &lt;br&gt;
input.&lt;/p&gt;

&lt;p&gt;That's two inference calls instead of one. You're burning tokens &lt;br&gt;
to save tokens. The abstraction is broken at the foundation.&lt;/p&gt;

&lt;p&gt;I kept asking: do you actually need a model to decide what's &lt;br&gt;
relevant? For most prompts, the answer is no. The relevant &lt;br&gt;
domain is &lt;em&gt;structurally detectable&lt;/em&gt; from the words in the query &lt;br&gt;
itself. You don't need intelligence to know that "fix this &lt;br&gt;
TypeError" is a code/debug question. You need pattern matching.&lt;/p&gt;

&lt;p&gt;So I reached for &lt;strong&gt;BM25&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;What BM25 is and why it works here&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;BM25&lt;/strong&gt; (Best Match 25) is an information retrieval algorithm from &lt;br&gt;
&lt;strong&gt;1994&lt;/strong&gt;, built on TF-IDF principles from the &lt;strong&gt;1970s&lt;/strong&gt;. It's the &lt;br&gt;
ranking function that powered search engines before neural &lt;br&gt;
networks existed. It scores documents against a query using &lt;br&gt;
term frequency, inverse document frequency, and document length &lt;br&gt;
normalization. No model. No training. Pure math.&lt;/p&gt;

&lt;p&gt;Here's the key insight: I pre-built a corpus of 40 knowledge &lt;br&gt;
domain nodes — javascript, security, databases, geography, &lt;br&gt;
history, medicine, etc. Each node has a keyword set describing &lt;br&gt;
that domain. At query time, BM25 scores every domain node &lt;br&gt;
against the incoming prompt in microseconds and returns the &lt;br&gt;
top 5 by relevance.&lt;/p&gt;
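
&lt;p&gt;To make that concrete, here is a minimal sketch of the scoring step in Python (Prism itself is TypeScript; this illustrates the idea rather than its actual code). The domain keyword sets and the k1/b constants are assumptions for the example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math
from collections import Counter

# Illustrative domain "documents": keyword sets, not Prism's real corpus.
DOMAINS = {
    "javascript": "javascript typescript node promise async await undefined typeerror",
    "security":   "auth token jwt csrf xss middleware session password",
    "geography":  "country capital river mountain continent border",
}

K1, B = 1.5, 0.75  # common BM25 defaults; Prism's constants are not stated

def bm25_top_domains(query, domains=DOMAINS, top_n=5):
    docs = {name: text.split() for name, text in domains.items()}
    avg_len = sum(len(d) for d in docs.values()) / len(docs)
    n_docs = len(docs)
    scores = Counter()
    for term in query.lower().split():
        df = sum(1 for d in docs.values() if term in d)   # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for name, d in docs.items():
            tf = d.count(term)
            denom = tf + K1 * (1 - B + B * len(d) / avg_len)
            scores[name] += idf * tf * (K1 + 1) / denom
    return scores.most_common(top_n)

print(bm25_top_domains("fix the TypeError in my auth middleware"))
# security and javascript score highest; geography stays near zero
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;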

&lt;p&gt;The Knowledge Graph then walks relationship edges — if &lt;br&gt;
"javascript" scores highest, its related domains (node, react, &lt;br&gt;
typescript) get included at a discounted score. The result is &lt;br&gt;
a focused set of 3-5 domains that actually matter for this &lt;br&gt;
specific query.&lt;/p&gt;

&lt;p&gt;Building the index takes ~2ms at startup. Each query takes &lt;br&gt;
under 1ms. The entire operation costs zero dollars and works &lt;br&gt;
completely offline.&lt;/p&gt;

&lt;h2&gt;The four pipeline stages&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Intent Classifier&lt;/strong&gt;&lt;br&gt;
Reads the prompt and assigns one of six intent types: CODE, &lt;br&gt;
DEBUG, FACTUAL, CONCEPTUAL, DECISION, or CREATIVE. Uses a &lt;br&gt;
deterministic keyword graph — trigger words, regex patterns, &lt;br&gt;
confidence thresholds. No model call. Under 1ms.&lt;/p&gt;
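
&lt;p&gt;A rough sketch of what a deterministic classifier like this can look like, with illustrative trigger patterns and thresholds (not Prism's actual keyword graph), covering three of the six intents:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Illustrative trigger patterns for three of the six intents;
# not Prism's actual keyword graph or thresholds.
INTENT_PATTERNS = {
    "DEBUG":   [r"\bfix\b", r"\berror\b", r"\btypeerror\b", r"\btraceback\b"],
    "CODE":    [r"\bimplement\b", r"\bwrite a function\b", r"\brefactor\b"],
    "FACTUAL": [r"\bwhat is\b", r"\bwhen did\b", r"\bcapital of\b"],
}

def classify_intent(prompt, threshold=0.5):
    text = prompt.lower()
    hits = {
        intent: sum(bool(re.search(p, text)) for p in patterns)
        for intent, patterns in INTENT_PATTERNS.items()
    }
    intent, score = max(hits.items(), key=lambda kv: kv[1])
    confidence = score / len(INTENT_PATTERNS[intent])   # crude confidence
    if confidence &gt;= threshold:
        return intent, confidence
    return "CONCEPTUAL", confidence   # fallback intent

print(classify_intent("fix the TypeError in my auth middleware"))  # ('DEBUG', 0.5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;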

&lt;p&gt;&lt;strong&gt;2. Knowledge Graph&lt;/strong&gt;&lt;br&gt;
BM25-scores 40 domain nodes against the prompt + intent. &lt;br&gt;
Applies intent affinity boosts (DEBUG queries get a 1.4x &lt;br&gt;
multiplier on security and language domains). Walks relationship &lt;br&gt;
edges for related domains. Returns top 5 nodes with scores.&lt;/p&gt;
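
&lt;p&gt;Sketched out, the boosting and edge walk might look like this; the graph edges, the 0.5 neighbor discount, and most boost values are assumptions, only the 1.4x DEBUG multiplier comes from the description above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of stage 2: apply the intent-affinity boost, then pull the top
# domain's neighbors in at a discount. Edges and the 0.5 discount are
# assumptions; the 1.4x DEBUG multiplier is from the post.
EDGES = {"javascript": ["node", "react", "typescript"]}
AFFINITY = {"DEBUG": {"security": 1.4, "javascript": 1.4}}

def activate_domains(bm25_scores, intent, top_n=5, discount=0.5):
    boosted = {
        name: score * AFFINITY.get(intent, {}).get(name, 1.0)
        for name, score in bm25_scores.items()
    }
    best = max(boosted, key=boosted.get)
    for neighbor in EDGES.get(best, []):
        boosted[neighbor] = max(boosted.get(neighbor, 0.0),
                                boosted[best] * discount)
    ranked = sorted(boosted.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]

print(activate_domains({"javascript": 3.1, "security": 2.4}, "DEBUG"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;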

&lt;p&gt;&lt;strong&gt;3. Context Injector&lt;/strong&gt;&lt;br&gt;
Builds a focused system prompt fragment under 300 tokens. &lt;br&gt;
Format varies by intent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DEBUG: "Diagnose from [security] perspective. State cause 
first. Then fix. Then why."&lt;/li&gt;
&lt;li&gt;FACTUAL: "Answer from [geography] knowledge. One direct 
answer. No padding. Facts only."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Always appends: "Be dense. Replace meta-commentary with labels: &lt;br&gt;
[reason] [context] [caveat]. Skip preamble and sign-off."&lt;/p&gt;
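
&lt;p&gt;Roughly, the injector is a template lookup. The templates below are paraphrased from the examples above; the real fragments and the 300-token budgeting logic may differ:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of stage 3: build a short system-prompt fragment from the detected
# intent and activated domains. Templates are paraphrased from the post.
TEMPLATES = {
    "DEBUG":   "Diagnose from [{domains}] perspective. State cause first. Then fix. Then why.",
    "FACTUAL": "Answer from [{domains}] knowledge. One direct answer. No padding. Facts only.",
}
SUFFIX = ("Be dense. Replace meta-commentary with labels: "
          "[reason] [context] [caveat]. Skip preamble and sign-off.")

def build_fragment(intent, domains):
    template = TEMPLATES.get(intent, "Answer from [{domains}] knowledge.")
    return template.format(domains=", ".join(domains)) + " " + SUFFIX

print(build_fragment("DEBUG", ["security", "javascript", "node"]))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;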

&lt;p&gt;&lt;strong&gt;4. Response Enforcer&lt;/strong&gt;&lt;br&gt;
Post-processes Claude's raw response before returning it to &lt;br&gt;
you. Runs 111 filler phrase patterns — "Here's the thing &lt;br&gt;
about", "Let me walk you through", "I hope this helps", &lt;br&gt;
"Great question" and 108 others. Prefix and suffix patterns &lt;br&gt;
are deleted entirely. Inline patterns are replaced with &lt;br&gt;
compact semantic labels. Result: 30-50% shorter responses &lt;br&gt;
that are actually denser with information.&lt;/p&gt;
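
&lt;p&gt;The mechanism is plain regex substitution. Here is a toy version with a three-pattern subset standing in for the 111 real patterns:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# A tiny illustrative subset of the filler patterns described above.
PREFIXES = [r"^(I'd be happy to help[^.!]*[.!]\s*)", r"^(Great question[.!]\s*)"]
SUFFIXES = [r"(\s*I hope this helps[.!]?\s*)$"]
INLINE   = {r"Let me walk you through": "[context]"}

def enforce(text):
    removed = 0
    for pat in PREFIXES + SUFFIXES:
        text, n = re.subn(pat, "", text, flags=re.IGNORECASE)
        removed += n
    for pat, label in INLINE.items():
        text, n = re.subn(pat, label, text, flags=re.IGNORECASE)
        removed += n
    return text.strip(), removed

clean, removed = enforce("Great question! The cause is a missing await. I hope this helps.")
print(clean, removed)  # The cause is a missing await. 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;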

&lt;h2&gt;Before and after&lt;/h2&gt;

&lt;p&gt;Here's a real example. Prompt: &lt;em&gt;"fix the TypeError in my &lt;br&gt;
auth middleware"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without Prism:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full knowledge space considered&lt;/li&gt;
&lt;li&gt;Response opens with: "I'd be happy to help you fix that 
TypeError! Let me walk you through what's likely happening 
here. TypeErrors in Express middleware are quite common and 
usually fall into a few categories..."&lt;/li&gt;
&lt;li&gt;Tokens in: ~800 (with any system context)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With Prism:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intent: DEBUG (0.94 confidence)&lt;/li&gt;
&lt;li&gt;Domains activated: security (0.91), javascript (0.87), 
node (0.72)&lt;/li&gt;
&lt;li&gt;System fragment injected: 52 tokens&lt;/li&gt;
&lt;li&gt;Response opens directly with the diagnosis&lt;/li&gt;
&lt;li&gt;Filler removed: 6 phrases&lt;/li&gt;
&lt;li&gt;Tokens saved: ~140&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The response isn't just shorter — it's structured differently. &lt;br&gt;
It leads with cause, then fix, then explanation. That's the &lt;br&gt;
intent-specific formatting doing its job.&lt;/p&gt;

&lt;h2&gt;How to use it&lt;/h2&gt;

&lt;p&gt;Prism runs as a local proxy on port 3179. You change one &lt;br&gt;
thing in your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.anthropic.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// After — that's it&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:3179&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the SDK directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;prism&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;prism-ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prism&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fix the TypeError in my auth middleware&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANTHROPIC_API_KEY&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// DEBUG&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domains&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// ['security', 'javascript', 'node']&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokensIn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// 312&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fillerRemoved&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 6&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx prism-ai start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire setup.&lt;/p&gt;

&lt;h2&gt;Prism Agent&lt;/h2&gt;

&lt;p&gt;I also built &lt;a href="https://github.com/YOUR_USERNAME/prism-agent" rel="noopener noreferrer"&gt;Prism Agent&lt;/a&gt; &lt;br&gt;
on top of the SDK — a Claude Code alternative with a live &lt;br&gt;
knowledge graph pane in the terminal. Every turn shows you &lt;br&gt;
which domains activated, their BM25 scores, tokens saved, &lt;br&gt;
and filler removed. You can pin domains (always include this) &lt;br&gt;
or suppress them (never use this). First coding agent that &lt;br&gt;
isn't a black box.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; prism-agent
prism-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why this matters beyond token costs
&lt;/h2&gt;

&lt;p&gt;The token savings are real but they're not the main point. &lt;br&gt;
The main point is that focused context produces better &lt;br&gt;
responses. When Claude is directed at a specific domain &lt;br&gt;
with a specific response format for a specific intent type, &lt;br&gt;
the quality of the answer goes up; it doesn't just get &lt;br&gt;
shorter. The Response Enforcer compounds this by removing the &lt;br&gt;
preamble padding that dilutes the actual answer.&lt;/p&gt;

&lt;p&gt;BM25 and the TF-IDF ideas it builds on have been solving &lt;br&gt;
information retrieval problems for 50 years. It doesn't &lt;br&gt;
hallucinate. It doesn't drift. It doesn't need a GPU. It runs &lt;br&gt;
in a for loop. For the specific problem of "which knowledge &lt;br&gt;
domain is this query about," it's more than sufficient — and &lt;br&gt;
it's the right tool precisely because it's so much simpler &lt;br&gt;
than the alternatives.&lt;/p&gt;

&lt;p&gt;Both projects are fully open source and MIT licensed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;prism-ai&lt;/strong&gt;: &lt;a href="https://npmjs.com/package/prism-ai" rel="noopener noreferrer"&gt;npm&lt;/a&gt; | 
&lt;a href="https://github.com/YOUR_USERNAME/prism-ai" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;prism-agent&lt;/strong&gt;: 
&lt;a href="https://github.com/YOUR_USERNAME/prism-agent" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building anything on top of Claude or other LLM &lt;br&gt;
APIs and want to talk through the BM25 implementation, &lt;br&gt;
drop a comment — happy to go deeper on any part of this.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>typescript</category>
      <category>llm</category>
    </item>
    <item>
      <title>I built an open-source Python tool to detect drift in embedding spaces</title>
      <dc:creator>Praful Reddy</dc:creator>
      <pubDate>Fri, 17 Apr 2026 00:19:56 +0000</pubDate>
      <link>https://dev.to/prafulreddy/i-built-an-open-source-python-tool-to-detect-drift-in-embedding-spaces-2ea4</link>
      <guid>https://dev.to/prafulreddy/i-built-an-open-source-python-tool-to-detect-drift-in-embedding-spaces-2ea4</guid>
      <description>&lt;h1&gt;
  
  
  I built an open-source Python tool to detect drift in embedding spaces
&lt;/h1&gt;

&lt;p&gt;Most monitoring pipelines wait for a downstream metric to break: accuracy drops, retrieval quality slips, or user-facing behavior gets worse.&lt;/p&gt;

&lt;p&gt;By then, the shift has already happened.&lt;/p&gt;

&lt;p&gt;I wanted a simpler way to catch changes earlier by looking directly at the embeddings themselves.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;drift-lens-monitor&lt;/strong&gt; — an open-source Python package for detecting drift in embedding spaces.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/PRAFULREDDYM/Drift_lense" rel="noopener noreferrer"&gt;https://github.com/PRAFULREDDYM/Drift_lense&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;PyPI: &lt;a href="https://pypi.org/project/drift-lens-monitor/" rel="noopener noreferrer"&gt;https://pypi.org/project/drift-lens-monitor/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What problem this solves
&lt;/h2&gt;

&lt;p&gt;A lot of modern ML systems depend on embeddings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic search&lt;/li&gt;
&lt;li&gt;RAG pipelines&lt;/li&gt;
&lt;li&gt;recommenders&lt;/li&gt;
&lt;li&gt;clustering&lt;/li&gt;
&lt;li&gt;classification pipelines with embedding-based features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even when the raw system looks “healthy,” the embedding space can start changing underneath you.&lt;/p&gt;

&lt;p&gt;That change may come from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;new user behavior&lt;/li&gt;
&lt;li&gt;model updates&lt;/li&gt;
&lt;li&gt;data source changes&lt;/li&gt;
&lt;li&gt;upstream preprocessing differences&lt;/li&gt;
&lt;li&gt;gradual distribution shift over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you only monitor business metrics or model accuracy, you often detect the issue late.&lt;/p&gt;

&lt;p&gt;The idea behind this project is simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;take snapshots of embeddings over time and compare them directly.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What drift-lens-monitor includes
&lt;/h2&gt;

&lt;p&gt;The package currently supports three drift detection approaches.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Fréchet Embedding Distance (FED)
&lt;/h3&gt;

&lt;p&gt;This is inspired by FID-style comparison, but applied to arbitrary embeddings.&lt;/p&gt;

&lt;p&gt;At a high level, it compares the mean and covariance of two embedding distributions.&lt;/p&gt;

&lt;p&gt;Useful when you want a compact statistical distance between a reference snapshot and a current snapshot.&lt;/p&gt;
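
&lt;p&gt;For intuition, here is a sketch of that computation with NumPy and SciPy, using the standard Fréchet/FID formula; it illustrates the idea and is not necessarily the package's exact implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from scipy.linalg import sqrtm

def frechet_embedding_distance(ref, cur):
    """Frechet (FID-style) distance between two (n, d) embedding arrays."""
    mu1, mu2 = ref.mean(axis=0), cur.mean(axis=0)
    cov1 = np.cov(ref, rowvar=False)
    cov2 = np.cov(cur, rowvar=False)
    covmean = sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):   # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))

rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 64))
current = rng.normal(loc=0.3, size=(500, 64))   # shifted snapshot
print(frechet_embedding_distance(reference, current))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;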

&lt;h3&gt;
  
  
  2) Maximum Mean Discrepancy (MMD)
&lt;/h3&gt;

&lt;p&gt;This is a kernel-based, non-parametric method for comparing two samples.&lt;/p&gt;

&lt;p&gt;I included permutation-based p-values so it can be used not just as a raw distance, but also as a statistical test.&lt;/p&gt;

&lt;p&gt;Useful when you want a more flexible distribution comparison without assuming Gaussian structure.&lt;/p&gt;
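
&lt;p&gt;A minimal NumPy sketch of the idea, with an RBF kernel and a permutation p-value; the bandwidth and permutation count are illustrative defaults, not the package's:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Biased squared MMD between samples X and Y with an RBF kernel."""
    def k(A, B):
        # squared Euclidean distances via the usual expansion
        d2 = (A * A).sum(1)[:, None] + (B * B).sum(1)[None, :] - 2.0 * A @ B.T
        return np.exp(-gamma * np.maximum(d2, 0.0))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

def mmd_permutation_test(X, Y, n_perm=200, gamma=1.0, seed=0):
    """Permutation p-value: how often a random split looks at least as different."""
    rng = np.random.default_rng(seed)
    observed = rbf_mmd2(X, Y, gamma)
    pooled = np.vstack([X, Y])
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        Xp, Yp = pooled[idx[:len(X)]], pooled[idx[len(X):]]
        if rbf_mmd2(Xp, Yp, gamma) &gt;= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;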

&lt;h3&gt;
  
  
  3) Persistent homology
&lt;/h3&gt;

&lt;p&gt;This is the unusual one.&lt;/p&gt;

&lt;p&gt;Instead of only asking whether two embedding clouds differ statistically, this looks at whether their &lt;strong&gt;shape&lt;/strong&gt; changes.&lt;/p&gt;

&lt;p&gt;It builds topological summaries over the point cloud and compares them using Wasserstein distance.&lt;/p&gt;

&lt;p&gt;Why that matters:&lt;/p&gt;

&lt;p&gt;A system can preserve rough averages while still changing structurally. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clusters merge&lt;/li&gt;
&lt;li&gt;clusters split&lt;/li&gt;
&lt;li&gt;holes or loops appear/disappear&lt;/li&gt;
&lt;li&gt;local geometry shifts in ways mean/covariance may not capture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes persistent homology an interesting complement to more standard drift metrics.&lt;/p&gt;
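
&lt;p&gt;As a sketch of the approach, the same comparison can be done with off-the-shelf TDA libraries such as ripser and persim (an assumption about tooling here, not a statement about the package's internals):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of topological drift using generic TDA libraries (ripser, persim).
# This illustrates the idea; drift-lens-monitor's own implementation may differ.
import numpy as np
from ripser import ripser
from persim import wasserstein

def topological_drift(ref, cur, dim=1):
    """Wasserstein distance between persistence diagrams of two point clouds.
    In practice, subsample large embedding sets first: persistence is expensive."""
    dgm_ref = ripser(ref, maxdim=dim)['dgms'][dim]
    dgm_cur = ripser(cur, maxdim=dim)['dgms'][dim]
    return wasserstein(dgm_ref, dgm_cur)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;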

&lt;h2&gt;
  
  
  Design goals
&lt;/h2&gt;

&lt;p&gt;I wanted the tool to stay practical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;local-first&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;no cloud dependency&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;no API keys&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;simple snapshot storage&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;easy to inspect and extend&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Snapshots are stored as &lt;strong&gt;parquet files&lt;/strong&gt;, so the workflow stays lightweight and reproducible.&lt;/p&gt;

&lt;p&gt;The package can be used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;as a Python library&lt;/li&gt;
&lt;li&gt;through a &lt;strong&gt;Streamlit dashboard&lt;/strong&gt; for visual exploration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
pip install drift-lens-monitor


&lt;h2&gt;Example workflow&lt;/h2&gt;

&lt;p&gt;The intended workflow is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Save a reference embedding snapshot&lt;/li&gt;
&lt;li&gt;Save a new embedding snapshot later&lt;/li&gt;
&lt;li&gt;Compare them using one or more drift methods&lt;/li&gt;
&lt;li&gt;Inspect drift scores and visualize the changes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This makes it usable for both experimentation and production-adjacent monitoring workflows.&lt;/p&gt;
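
&lt;p&gt;Sketched with plain pandas parquet snapshots, that loop looks roughly like this; the helper names are hypothetical, not the package's actual API, and pyarrow is assumed for parquet support:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical snapshot-and-compare loop: pandas parquet snapshots plus a
# drift metric. The helper names here are illustrative, not the package's API.
import numpy as np
import pandas as pd

def save_snapshot(embeddings, path):
    cols = [str(i) for i in range(embeddings.shape[1])]   # parquet wants string column names
    pd.DataFrame(embeddings, columns=cols).to_parquet(path)

def load_snapshot(path):
    return pd.read_parquet(path).to_numpy()

# 1. save a reference snapshot at deploy time
save_snapshot(np.random.default_rng(0).normal(size=(1000, 64)), "ref.parquet")
# 2. save a fresh snapshot later, from current traffic
save_snapshot(np.random.default_rng(1).normal(loc=0.2, size=(1000, 64)), "cur.parquet")
# 3. load both and compare with one or more drift metrics (FED, MMD, topology)
ref, cur = load_snapshot("ref.parquet"), load_snapshot("cur.parquet")
print(np.linalg.norm(ref.mean(axis=0) - cur.mean(axis=0)))  # crude mean-shift check
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;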

&lt;h2&gt;Why I built it&lt;/h2&gt;

&lt;p&gt;I was interested in a gap I keep seeing in ML tooling: we monitor model outputs, latency, costs, and downstream metrics heavily, but we often do much less direct monitoring of the representation space itself.&lt;/p&gt;

&lt;p&gt;Embeddings are doing a huge amount of work in modern AI systems. They deserve first-class monitoring too.&lt;/p&gt;

&lt;p&gt;I also wanted to explore whether more unusual techniques like &lt;strong&gt;topological drift detection&lt;/strong&gt; could add signal beyond standard statistical distances.&lt;/p&gt;

&lt;h2&gt;What I’d love feedback on&lt;/h2&gt;

&lt;p&gt;I’d especially love feedback on three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Does persistent homology feel genuinely useful here, or too heavyweight?&lt;/li&gt;
&lt;li&gt;What baselines or benchmark datasets would make this more convincing?&lt;/li&gt;
&lt;li&gt;How should the package/API be improved to make it easier to use in real workflows?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/PRAFULREDDYM/Drift_lense" rel="noopener noreferrer"&gt;https://github.com/PRAFULREDDYM/Drift_lense&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;PyPI: &lt;a href="https://pypi.org/project/drift-lens-monitor/" rel="noopener noreferrer"&gt;https://pypi.org/project/drift-lens-monitor/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>A zero-token progress bar for Claude Code</title>
      <dc:creator>Praful Reddy</dc:creator>
      <pubDate>Fri, 10 Apr 2026 11:48:59 +0000</pubDate>
      <link>https://dev.to/prafulreddy/a-zero-token-progress-bar-for-claude-code-51bp</link>
      <guid>https://dev.to/prafulreddy/a-zero-token-progress-bar-for-claude-code-51bp</guid>
      <description>&lt;p&gt;Every Claude Code extension I've seen shows the same thing: token counts, API costs, model info. Useful, but it doesn't answer the question I actually care about mid-session — how much of the work is done.&lt;br&gt;
So I built task-progress-bar. It reads Claude Code's native task list from disk and renders a live ASCII progress bar with time estimates. It runs as a PostToolUse hook, which means it consumes zero tokens — it's a subprocess that Claude never sees.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tasks [████████░░] 8/10 (~3m left) | ✓8 ⟳1 ○1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Claude Code persists tasks as JSON files in ~/.claude/tasks/. The plugin watches for TodoWrite, TodoRead, TaskCreate, and TaskUpdate tool calls via the PostToolUse hook, then:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parses every JSON file in the tasks directory&lt;/li&gt;
&lt;li&gt;Counts completed, in-progress, and pending tasks&lt;/li&gt;
&lt;li&gt;Computes a time estimate using an exponential moving average (EMA)&lt;/li&gt;
&lt;li&gt;Outputs a single status line to Claude Code's statusLine renderer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The time estimation is straightforward. Each time a task moves to completed, the timestamp is logged. The interval between consecutive completions feeds into an EMA with α=0.3:&lt;/p&gt;

&lt;p&gt;EMA_new = 0.3 × latest_interval + 0.7 × EMA_old&lt;br&gt;
estimated_remaining = EMA × tasks_left&lt;/p&gt;

&lt;p&gt;Intervals over 1 hour are clamped to avoid skew from session breaks. The first 3 completions show "calculating..." until there's enough data.&lt;br&gt;
The entire plugin is a single Python file — stdlib only, no pip dependencies. It uses json, pathlib, time, and sys. Nothing else.&lt;/p&gt;
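
&lt;p&gt;A stdlib-only sketch of that estimate (the 0.3 alpha and the one-hour clamp come from above; the function shape is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Stdlib-only sketch of the EMA estimate described above. The 0.3 alpha and
# one-hour clamp come from the post; the function shape itself is illustrative.
ALPHA = 0.3
CLAMP_SECONDS = 3600  # ignore gaps longer than an hour (session breaks)

def update_estimate(ema, completion_times, tasks_left):
    """completion_times: timestamps of completed tasks, oldest first."""
    if len(completion_times) &lt; 4:          # first few completions: not enough data
        return ema, "calculating..."
    interval = min(completion_times[-1] - completion_times[-2], CLAMP_SECONDS)
    ema = interval if ema is None else ALPHA * interval + (1 - ALPHA) * ema
    remaining = ema * tasks_left
    return ema, f"~{int(remaining // 60)}m left"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;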

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -fsSL https://raw.githubusercontent.com/PRAFULREDDYM/task-progress-bar/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The installer checks for Python 3.8+, copies the script to ~/.claude/, and patches settings.json with the hook configuration. There's a matching uninstall.sh for clean removal.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it looks like
&lt;/h2&gt;

&lt;p&gt;The progress bar color-codes by completion percentage — red below 33%, yellow from 33–66%, green above 66%. When all tasks finish, it shows ✅ All done! for 30 seconds and then hides.&lt;br&gt;
If you run it standalone (python3 task_progress_bar.py), you get a full multi-line colored view with a task breakdown and per-task average.&lt;/p&gt;
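
&lt;p&gt;The rendering itself is a few lines; a sketch with assumed ANSI codes and bar width, using the thresholds above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative rendering of the status line; bar width and ANSI codes are
# assumptions, the 33%/66% colour thresholds come from the post.
RED, YELLOW, GREEN, RESET = "\033[31m", "\033[33m", "\033[32m", "\033[0m"

def render_bar(done, total, width=10):
    pct = done / total if total else 0.0
    colour = RED if pct &lt; 0.33 else YELLOW if pct &lt; 0.66 else GREEN
    filled = int(round(pct * width))
    bar = "█" * filled + "░" * (width - filled)
    return f"{colour}Tasks [{bar}] {done}/{total}{RESET}"

print(render_bar(8, 10))  # Tasks [████████░░] 8/10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;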

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://dev.tourl"&gt;github.com/PRAFULREDDYM/task-progress-bar&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Requirements: Python 3.8+, Claude Code v2.1+&lt;/li&gt;
&lt;li&gt;License: MIT&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>claudecode</category>
      <category>productivity</category>
      <category>terminal</category>
      <category>python</category>
    </item>
  </channel>
</rss>
