DEV Community: Imtiaz Ahmad

My LLM pipeline passed every eval. Then it started lying to users in production.

Imtiaz Ahmad — Sat, 06 Jun 2026 16:57:35 +0000


Not dramatically. Quietly. Confidently wrong.

Here's what happened.

I shipped a RAG pipeline for one of our SaaS products. Tested it on 40 documents. Response quality was sharp — I was genuinely proud of it. Onboarded the first real users. Within 72 hours, the system was returning answers that sounded authoritative but were just... fabricated. Hallucinated policy details. Made-up clause numbers.

I dug into the traces. The context window was silently overflowing. When the retrieved chunks exceeded the limit, nothing in the stack threw an error, and the input wasn't cleanly truncated either. The model just started confabulating to fill the gap, and nothing in my eval suite caught it because my test docs were small enough to always fit.

The fix took 45 minutes. A token counter, a hard limit, a fallback message. Done.
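Those three pieces (a token counter, a hard limit, a fallback message) fit in a few dozen lines. Below is a minimal sketch, not the production fix: approx_tokens is a crude ~4-characters-per-token estimate standing in for a real tokenizer, retrieved chunks are assumed to arrive sorted best-first, budget is assumed to be the tokens left for context after reserving room for instructions, the question and the answer, and pack_context, build_prompt, answer and the fallback wording are all hypothetical names.

```python
from typing import Callable, List, Optional

FALLBACK_MESSAGE = (
    "I couldn't fit enough source material to answer that reliably. "
    "Could you narrow the question?"
)


def approx_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token for English text).
    # Replace with the tokenizer that matches your model.
    return max(1, len(text) // 4)


def pack_context(
    chunks: List[str],
    budget: int,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> List[str]:
    """Keep retrieved chunks, assumed sorted best-first, until the token budget is spent."""
    packed: List[str] = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # hard limit: drop the rest rather than overflow the window
        packed.append(chunk)
        used += cost
    return packed


def build_prompt(question: str, chunks: List[str], budget: int) -> Optional[str]:
    context = pack_context(chunks, budget)
    if not context:
        return None  # nothing fits, so don't call the model at all
    joined = "\n\n".join(context)
    return f"Answer only from the context below.\n\n{joined}\n\nQuestion: {question}"


def answer(
    question: str,
    chunks: List[str],
    budget: int,
    call_model: Callable[[str], str],
) -> str:
    prompt = build_prompt(question, chunks, budget)
    if prompt is None:
        return FALLBACK_MESSAGE  # explicit fallback instead of a guessed answer
    return call_model(prompt)
```

Stopping at the first chunk that doesn't fit preserves the retrieval ranking; a variant could skip an oversized chunk and keep trying smaller ones further down the list.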

But that 72-hour window cost me real trust with early users.

The lesson I keep relearning: LLMs don't fail loudly. They fail smoothly. The model will always return something — it will just sometimes be fiction dressed as fact. If you're shipping LLM features without per-call tracing and a token budget enforced at retrieval time, you are not testing for the failure mode that will actually hurt you.
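Per-call tracing doesn't need a platform on day one. A minimal sketch, assuming a synchronous call_model(prompt) -> str function; traced, TRACE_LOG (an in-memory list standing in for a real tracing backend) and the record fields are hypothetical. It records the prompt size against the model's limit on every call, so an overflow shows up in the traces instead of in a user's answer.

```python
import functools
import json
import logging
import time
import uuid
from typing import Any, Callable, Dict, List

logger = logging.getLogger("llm.trace")
TRACE_LOG: List[Dict[str, Any]] = []  # in-memory stand-in for a tracing backend


def traced(count_tokens: Callable[[str], int], token_limit: int):
    """Decorator: record size, latency and outcome of every LLM call."""

    def decorator(call_model: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(call_model)
        def wrapper(prompt: str) -> str:
            prompt_tokens = count_tokens(prompt)
            record: Dict[str, Any] = {
                "trace_id": uuid.uuid4().hex,
                "prompt_tokens": prompt_tokens,
                "token_limit": token_limit,
                "over_limit": prompt_tokens > token_limit,
            }
            start = time.perf_counter()
            try:
                output = call_model(prompt)
                record["status"] = "ok"
                record["output_chars"] = len(output)
                return output
            except Exception as exc:
                record["status"] = f"error: {exc!r}"
                raise
            finally:
                record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
                TRACE_LOG.append(record)
                # Oversized prompts are the silent failure, so log them at WARNING.
                level = logging.WARNING if record["over_limit"] else logging.INFO
                logger.log(level, json.dumps(record))

        return wrapper

    return decorator
```

Decorate whichever function actually calls the provider, then alert on over_limit rather than waiting for a user to notice.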

Build the guardrails before you need them. Not after you read a support ticket, wondering why your product confidently told someone the wrong thing.

Building a Chrome Extension with Manifest V3, React, and a Shadow DOM Panel Injected on Amazon

Imtiaz Ahmad — Thu, 04 Jun 2026 15:40:07 +0000

Built Profit Scout, an MV3 Chrome extension that injects a React-powered sourcing panel directly onto Amazon product pages. It reads the ASIN from the DOM, calls a Node.js backend for margin calculations and supplier data, and renders the results in a shadow DOM panel. The MV3 service worker constraint was the biggest architectural challenge: the worker can be shut down at any time, so it can't hold state in memory.

Content Script — Reading the Amazon DOM

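// content.js: injected on amazon.com/dp/* pages
const getProductData = () => {
  // Prefer the ASIN in the URL: the first [data-asin] node on the page
  // may belong to another product or carry an empty value.
  const asin =
    location.pathname.match(/\/dp\/([A-Z0-9]{10})/)?.[1] ??
    document.querySelector('[data-asin]')?.dataset.asin;
  // Strip "$" and thousands separators; parseFloat("1,299.99") would return 1.
  const price = document.querySelector('.a-price .a-offscreen')
    ?.textContent.replace(/[^0-9.]/g, '');
  const title = document.querySelector('#productTitle')
    ?.textContent.trim();
  return { asin, price: price ? parseFloat(price) : null, title };
};

chrome.runtime.sendMessage({ type: 'ANALYSE_PRODUCT', data: getProductData() });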

Service Worker — MV3 Stateless Pattern

// service-worker.js
chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
  if (msg.type === 'ANALYSE_PRODUCT') {
    fetchAnalysis(msg.data)
      .then(async (result) => {
        // Persist to storage: service workers can terminate at any time
        await chrome.storage.local.set({ [msg.data.asin]: result });
        sendResponse(result);
      })
      // Always answer, or the content script never hears back on failure
      .catch((err) => sendResponse({ error: err.message }));
    return true; // keep the channel open for the async sendResponse
  }
});

// Fetch JSON and fail on HTTP errors (fetch only rejects on network errors)
const getJson = async (url) => {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`${url} returned ${res.status}`);
  return res.json();
};

const fetchAnalysis = async ({ asin }) => {
  const q = encodeURIComponent(asin);
  const [fees, suppliers, demand] = await Promise.all([
    getJson(`https://api.profitscout.io/fba-fees?asin=${q}`),
    getJson(`https://api.profitscout.io/suppliers?asin=${q}`),
    getJson(`https://api.profitscout.io/demand?asin=${q}`),
  ]);
  return { fees, suppliers, demand };
};

Shadow DOM Panel — Injecting React Without Conflicts

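// inject-panel.js (bundled, so the imports below are resolved at build time)
import React from 'react';
import { createRoot } from 'react-dom/client';
import { Panel } from './Panel'; // hypothetical: your root panel component

const host = document.createElement('div');
host.id = 'profit-scout-root';
document.body.appendChild(host);

const shadow = host.attachShadow({ mode: 'open' });

// Inject styles into the shadow root so Amazon's stylesheets can't match the panel.
// Inherited properties (font, color) still flow in from the host; a
// ":host { all: initial; }" rule in PANEL_STYLES is one way to reset them.
const style = document.createElement('style');
style.textContent = PANEL_STYLES; // bundled CSS string
shadow.appendChild(style);

const mountPoint = document.createElement('div');
shadow.appendChild(mountPoint);

// Mount the panel component inside the shadow root
createRoot(mountPoint).render(React.createElement(Panel));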

MV3 Gotchas I Hit

Service worker terminates after ~30s idle — never assume state persists. Use chrome.storage.local for everything.
No XMLHttpRequest in service workers — use fetch() only.
host_permissions required for cross-origin calls — declare your API domain in manifest.json.
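Content Security Policy: Amazon's CSP blocks inline scripts, so don't append script tags to the page; they execute in the page's own context under its CSP. Ship the panel as bundled JS in the content script, which runs in the extension's isolated world. Shadow DOM handles style isolation; it doesn't affect CSP.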
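The permission side of these gotchas lives in manifest.json. A minimal sketch of one, generated by a small Python build step (the tooling is an assumption; a hand-written JSON file works the same). The file names come from the snippets above, while the match patterns, version string and output path are assumptions. Note that chrome.storage.local also needs the "storage" permission.

```python
import json
from pathlib import Path
from typing import Any, Dict


def build_manifest() -> Dict[str, Any]:
    return {
        "manifest_version": 3,
        "name": "Profit Scout",
        "version": "1.0.0",  # placeholder
        # MV3 replaces persistent background pages with a service worker.
        "background": {"service_worker": "service-worker.js"},
        "content_scripts": [
            {
                # Product pages with and without a title slug before /dp/.
                "matches": [
                    "https://www.amazon.com/dp/*",
                    "https://www.amazon.com/*/dp/*",
                ],
                # Bundled output files (assumed names).
                "js": ["content.js", "inject-panel.js"],
            }
        ],
        # chrome.storage.local requires the "storage" permission.
        "permissions": ["storage"],
        # Lets the service worker call the backend cross-origin.
        "host_permissions": ["https://api.profitscout.io/*"],
    }


def write_manifest(path: str = "dist/manifest.json") -> None:
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(build_manifest(), indent=2))
```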

Built at Ai Soft Tech Solution — aisofttechsolution.com

Building a persistent AI business assistant with LangChain, FastAPI, and Redis

Imtiaz Ahmad — Wed, 03 Jun 2026 12:50:23 +0000

TL;DR: I built a personal AI assistant that actually knows my business — using a LangChain agent, dual-layer memory (Redis + pgvector), and a model router that switches between GPT-4o and Claude 3.5 by task type. Here's the full architecture.

The architecture

The system has three layers:

Frontend — Next.js 14, WebSocket streaming for real-time responses
Agent layer — FastAPI + LangChain AgentExecutor with four tools (email, CRM, tasks, calendar)
Memory layer — Redis for session state, Supabase pgvector for long-term RAG

The memory problem

Most LLM demos are stateless: each request hits the API cold. Jarvis, my assistant, solves this with a hybrid retriever: BM25 keyword search for exact names and dates, plus semantic cosine search for concepts. A cross-encoder re-ranker then trims the combined results to the top 5 chunks before they are injected into the prompt.
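A dependency-free sketch of that retrieval flow, under several assumptions: embeddings are already computed (query_vec, doc_vecs), the two rankings are merged with reciprocal rank fusion (the post doesn't say how BM25 and cosine results are combined, so RRF is an assumed stand-in), and rerank(query, doc) -> float is a hypothetical hook where the cross-encoder goes, for example a sentence-transformers CrossEncoder. All function names here are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence


def bm25_scores(query: str, docs: Sequence[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for the query (whitespace tokenization)."""
    tokenized = [d.lower().split() for d in docs]
    if not tokenized:
        return []
    n = len(tokenized)
    avgdl = (sum(len(t) for t in tokenized) / n) or 1.0
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rrf(rankings: List[List[int]], k: int = 60) -> List[int]:
    """Reciprocal rank fusion: sum 1 / (k + rank) across rankings."""
    fused: Dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, idx in enumerate(ranking, start=1):
            fused[idx] += 1.0 / (k + rank)
    return sorted(fused, key=fused.__getitem__, reverse=True)


def hybrid_retrieve(
    query: str,
    query_vec: Sequence[float],
    docs: Sequence[str],
    doc_vecs: Sequence[Sequence[float]],
    rerank: Callable[[str, str], float],
    candidates: int = 20,
    top_k: int = 5,
) -> List[str]:
    lexical = bm25_scores(query, docs)
    semantic = [cosine(query_vec, v) for v in doc_vecs]
    ids = range(len(docs))
    by_lexical = sorted(ids, key=lambda i: lexical[i], reverse=True)[:candidates]
    by_semantic = sorted(ids, key=lambda i: semantic[i], reverse=True)[:candidates]
    fused = rrf([by_lexical, by_semantic])[:candidates]
    # Cross-encoder stage: score each (query, doc) pair and keep the best top_k.
    reranked = sorted(fused, key=lambda i: rerank(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:top_k]]
```

RRF only uses rank positions, which sidesteps the fact that BM25 scores and cosine similarities live on different scales.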

The model router

Not all tasks need the same model. I route tool-use tasks (CRM lookups, scheduling, email sends) to GPT-4o function calling, and writing/reasoning tasks to Claude 3.5 Sonnet. This cuts costs and improves output quality vs. using one model for everything.
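The routing rule can be as small as a lookup table. A minimal sketch: the TaskType values, the keyword classifier and the model identifier strings are assumptions (check each provider's current model names), and in practice the classification step could just as well be a cheap LLM call or a flag sent by the frontend.

```python
import re
from enum import Enum
from typing import Dict, Tuple


class TaskType(Enum):
    CRM_LOOKUP = "crm_lookup"
    SCHEDULING = "scheduling"
    EMAIL_SEND = "email_send"
    WRITING = "writing"
    REASONING = "reasoning"


# Assumed model identifiers; check each provider's current model list.
TOOL_MODEL = "gpt-4o"
WRITING_MODEL = "claude-3-5-sonnet-latest"

MODEL_FOR: Dict[TaskType, str] = {
    TaskType.CRM_LOOKUP: TOOL_MODEL,
    TaskType.SCHEDULING: TOOL_MODEL,
    TaskType.EMAIL_SEND: TOOL_MODEL,
    TaskType.WRITING: WRITING_MODEL,
    TaskType.REASONING: WRITING_MODEL,
}

# Crude keyword classifier, checked in insertion order.
KEYWORDS: Dict[TaskType, Tuple[str, ...]] = {
    TaskType.CRM_LOOKUP: ("crm", "customer", "lead", "deal"),
    TaskType.SCHEDULING: ("schedule", "meeting", "calendar", "reschedule"),
    TaskType.EMAIL_SEND: ("send", "email"),
    TaskType.WRITING: ("write", "draft", "rewrite", "summarize"),
}


def classify(request: str) -> TaskType:
    words = set(re.findall(r"[a-z]+", request.lower()))
    for task, keywords in KEYWORDS.items():
        if words.intersection(keywords):
            return task
    return TaskType.REASONING  # default: open-ended questions go to the writing model


def route(request: str) -> Tuple[TaskType, str]:
    task = classify(request)
    return task, MODEL_FOR[task]
```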

Code snippet — tool registration in LangChain:

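from langchain.agents import AgentType, initialize_agent
from langchain_openai import ChatOpenAI

# The four tool classes are custom (not shown); supabase, sendgrid and redis
# are already-configured client instances.
llm = ChatOpenAI(model="gpt-4o", temperature=0)  # tool-use tasks run on GPT-4o
tools = [
    CRMQueryTool(db=supabase),
    EmailDraftTool(client=sendgrid),
    TaskManagerTool(db=redis),
    CalendarReaderTool(),
]
# Returns an AgentExecutor; initialize_agent is deprecated in newer LangChain releases
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS)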

Key learnings

Context injection strategy matters more than model choice
Redis TTL for session memory should match your average session length (I use 2h)
Always stream responses — users abandon non-streaming AI UIs within 3 seconds
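The 2-hour TTL point, sketched as sliding-expiry session memory: each write re-sets the key with ex=, so the countdown restarts while the user is active and the history disappears about two hours after the last message. SessionMemory and the key format are hypothetical; FakeRedis is an in-memory stand-in that mimics the set(name, value, ex=...) and get(name) calls of a redis-py client so the sketch runs without a server.

```python
import json
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

SESSION_TTL_SECONDS = 2 * 60 * 60  # 2h, matched to average session length


class FakeRedis:
    """In-memory stand-in for the two redis-py calls used below: set(..., ex=) and get()."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def set(self, name: str, value: str, ex: Optional[int] = None) -> bool:
        expires_at = self._clock() + ex if ex else float("inf")
        self._data[name] = (value, expires_at)
        return True

    def get(self, name: str) -> Optional[str]:
        item = self._data.get(name)
        if item is None or item[1] <= self._clock():
            self._data.pop(name, None)  # expired keys behave as missing
            return None
        return item[0]


class SessionMemory:
    """Chat history per session, expiring SESSION_TTL_SECONDS after the last write."""

    def __init__(self, client: Any, ttl: int = SESSION_TTL_SECONDS):
        self.client = client  # a redis.Redis(...) client, or FakeRedis for tests
        self.ttl = ttl

    @staticmethod
    def _key(session_id: str) -> str:
        return f"session:{session_id}:messages"

    def load(self, session_id: str) -> List[Dict[str, str]]:
        raw = self.client.get(self._key(session_id))
        if raw is None:
            return []
        if isinstance(raw, bytes):  # redis-py returns bytes unless decode_responses=True
            raw = raw.decode()
        return json.loads(raw)

    def append(self, session_id: str, role: str, content: str) -> None:
        messages = self.load(session_id)
        messages.append({"role": role, "content": content})
        # Re-setting with ex= restarts the TTL, so active sessions never expire mid-use.
        self.client.set(self._key(session_id), json.dumps(messages), ex=self.ttl)
```

One JSON blob per session keeps the sketch small, but append is a read-modify-write. With real Redis, RPUSH plus EXPIRE in one pipeline (redis-py pipelines wrap commands in MULTI/EXEC by default) avoids lost updates when two requests for the same session overlap.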

Full repo coming soon. Follow for updates.