Harish Kotra (he/him)

Building a Local-First Research Agent that Actually Remembers (using AIsa, Cognee & Ollama)

AI agents are great at searching the web. They are terrible at remembering what they found yesterday.

Most "memory" in AI apps is just vector similarity search—retrieving chunks of text that mathematically look like your query. This fails when you need structured reasoning over time (e.g., "How has the sentiment on usage-based pricing changed in the last 30 days?").

In this post, I'll show you how to build a Founder Research Agent that combines:

  1. AIsa.one: For high-quality live web intelligence and LLM routing.
  2. Cognee: For deterministic, graph-based memory (running locally!).
  3. Ollama: For local inference (using Gemma 3 12b) to keep reasoning free and private.

The Stack

  • Backend: Node.js (Express) acting as the orchestrator.
  • Memory Service: Python (FastAPI) wrapping Cognee, because Cognee is Python-native.
  • Intelligence: AIsa API for search + Ollama for local processing.

Why this architecture?

We decouple the brain (LLM) from the memory (Cognee).

  • AIsa acts as our eyes and ears. Instead of maintaining 10 different scraper APIs, we use AIsa's unified gateway to search the web and scholar sources.
  • Cognee structures this raw text into a Knowledge Graph. Instead of just saving "Pricing is popular", it creates typed nodes and relationships:

    (Concept: Usage-Based Pricing) --[relationship: increasing_adoption_in]--> (Market: SaaS).

  • Ollama runs the loop. We use gemma3:12b locally to extract these triples, saving huge API costs (a sketch of this loop follows this list).
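
As a concrete illustration, here's roughly what that extract-and-remember loop looks like in Python. It assumes the ollama and cognee Python packages and a local Ollama server already running gemma3:12b; the prompt and helper names are my own sketch, not code from the repo.

# memory_loop.py (illustrative sketch, not the repo's actual code)
import asyncio

import cognee
import ollama

EXTRACT_PROMPT = (
    "Extract (subject, relationship, object) triples from the text below.\n"
    "Return one per line as: subject | relationship | object.\n\n{text}"
)

def extract_triples(text: str) -> str:
    """Use a local gemma3:12b via Ollama to turn raw text into triples."""
    response = ollama.chat(
        model="gemma3:12b",
        messages=[{"role": "user", "content": EXTRACT_PROMPT.format(text=text)}],
    )
    return response["message"]["content"]

async def remember(raw_text: str) -> None:
    """Extract triples locally, then persist them in Cognee's graph memory."""
    triples = extract_triples(raw_text)
    await cognee.add(triples)  # stage the structured text
    await cognee.cognify()     # build/update the knowledge graph

asyncio.run(remember(
    "Several SaaS founders report rising adoption of usage-based pricing."
))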

Key Implementation Details

1. Reliable Search with AIsa

We switched from a myriad of tools to a single AIsa endpoint.

// src/services/aisa.js
import axios from 'axios';

const response = await axios.post('https://api.aisa.one/v1/chat/completions', {
  model: 'gpt-4o', // Using the best model for extraction
  messages: [
    { role: "system", content: "You are a search engine. Return JSON..." },
    { role: "user", content: query }
  ]
});
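
Since AIsa exposes an OpenAI-style chat-completions endpoint, swapping the extraction model is a one-line change to the model field, with no per-provider client code to maintain.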

2. Structured Evidence with Cognee

The magic happens when we store data. We don't just dump text. We normalize it into a strict Pydantic model in our Python microservice:

from pydantic import BaseModel

class Evidence(BaseModel):
    id: str
    type: str
    summary: str
    source_url: str
    sentiment: str

This allows us to ask rigorous questions later, like "Show me all evidence that contradicts this hypothesis from trusted sources."
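
To show the shape of that microservice, here's a minimal FastAPI sketch that ingests Evidence into Cognee and exposes a search route. The route names and wiring are illustrative assumptions; only the Evidence model above comes from the actual service.

# memory_service.py (illustrative sketch; route names are assumptions)
from fastapi import FastAPI
from pydantic import BaseModel

import cognee

app = FastAPI()

class Evidence(BaseModel):
    # Same model as above, repeated so the sketch is self-contained.
    id: str
    type: str
    summary: str
    source_url: str
    sentiment: str

@app.post("/ingest")
async def ingest(evidence: Evidence):
    # Store the normalized record, then rebuild the knowledge graph.
    await cognee.add(evidence.model_dump_json())
    await cognee.cognify()
    return {"status": "stored", "id": evidence.id}

@app.get("/search")
async def search(q: str):
    # Graph-aware retrieval over everything ingested so far.
    results = await cognee.search(query_text=q)
    return {"results": [str(r) for r in results]}

With this split, the Node.js orchestrator only needs plain HTTP calls to /ingest and /search, and Cognee stays entirely inside the Python process.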

We now have an agent that runs locally, costs pennies (thanks to caching & local LLMs), and builds a clearer picture of the world the more you use it.


Here's the complete code.
