Have you ever wanted an AI assistant that actually remembers market context without relying on cloud APIs or paying token fees? Today, I'm going to walk you through how we built a Trading Research Memory Agent, a fully local Python architecture using Zvec, Ollama, and HuggingFace Datasets.
This system ingests thousands of daily market rows, extracts the sentiment and articles, embeds them via local models, and allows a Local LLM to synthesize research insights. We are bringing the research desk to the terminal.
🛠️ The Tech Stack
To ensure that the entire pipeline is fast, offline, and free from API-rate limits, we utilized:
- Zvec (0.2.0) - A blazing-fast vector database that supports dense vectors, sparse vectors, scalar filtering, and hybrid search, built right into Python.
- Ollama - Powering both the local embeddings (`nomic-embed-text`) and the LLM agent's query parsing and answer synthesis (`llama3.2`).
- HuggingFace - Providing the foundational dataset (`danilocorsi/LLMs-Sentiment-Augmented-Bitcoin-Dataset`), packed with 25k+ days of detailed Bitcoin pricing, LLM-reasoned sentiment scores, and historical articles.
🏗️ Architecture Overview
The system pipeline is entirely self-contained. It operates in two major phases: Ingestion and Querying.
1. Ingestion Phase
The dataset stores complex nested data: each row represents a calendar day of Bitcoin information alongside JSON-encoded lists of articles from CoinTelegraph, Bitcoin News, and Reddit.
Our script (`ingest.py`):

- Explodes the arrays so each physical article becomes its own vector document (turning 5,000 late-2024 rows into tens of thousands of individual records).
- Embeds the text using a custom `DenseEmbeddingFunction` hooked into Ollama (`nomic-embed-text`) and a custom `BM25EmbeddingFunction` for sparse lexical representation.
- Persists it locally using `zvec.create_and_open()`.
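The explode step above is the heart of ingestion. Here is a minimal sketch of what it might look like in pure Python; the column names (`cointelegraph_articles`, etc.) are illustrative assumptions, not the dataset's actual schema:

```python
import json

# Illustrative column names; the real dataset's fields may differ.
ARTICLE_COLUMNS = ["cointelegraph_articles", "bitcoin_news_articles", "reddit_posts"]

def explode_row(row: dict) -> list[dict]:
    """Turn one daily row into one record per article, copying day-level metadata."""
    records = []
    for col in ARTICLE_COLUMNS:
        raw = row.get(col) or "[]"
        # Articles arrive as JSON-encoded strings in the raw dataset.
        articles = json.loads(raw) if isinstance(raw, str) else raw
        for article in articles:
            records.append({
                "text": article,                        # the content to embed
                "source": col,                          # which feed it came from
                "date": row["date"],                    # day-level metadata survives
                "sentiment_class": row["sentiment_class"],
            })
    return records

day = {
    "date": "2024-11-05",
    "sentiment_class": "bearish",
    "cointelegraph_articles": '["BTC dips below support", "Miners capitulate"]',
    "bitcoin_news_articles": "[]",
    "reddit_posts": '["Is this the bottom?"]',
}
print(len(explode_row(day)))  # 3 article records from 1 daily row
```

Each exploded record keeps its parent day's metadata, which is what makes the typed filtering below possible per-article.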
Crucially, Zvec 0.2 introduces typed data schemas, so our metadata filters (like `sentiment_class = 'bearish'` or `fng_value = 0.2`) are typed strictly into INT64, DOUBLE, or STRING fields, allowing for lightning-fast SQL-style exclusion routing during queries.
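To make the semantics concrete, here is a toy illustration of what a typed-metadata filter does; this is plain Python for clarity only, not Zvec's actual engine:

```python
# Toy illustration of typed metadata filtering (NOT zvec's implementation):
# each record carries strictly typed fields, so comparisons are exact and cheap.
records = [
    {"id": 1, "year": 2024, "fng_value": 0.2, "sentiment_class": "bearish"},
    {"id": 2, "year": 2023, "fng_value": 0.7, "sentiment_class": "bullish"},
    {"id": 3, "year": 2024, "fng_value": 0.8, "sentiment_class": "bullish"},
]

def matches(rec, sentiment=None, year=None):
    """Equivalent of a filter like: sentiment_class = 'bullish' and year = 2024."""
    return ((sentiment is None or rec["sentiment_class"] == sentiment)
            and (year is None or rec["year"] == year))

hits = [r["id"] for r in records if matches(r, sentiment="bullish", year=2024)]
print(hits)  # [3]
```

In the real system this pruning happens inside Zvec before any vector distance is computed, which is why it is so fast.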
2. Query Phase
When a user types a query like "Show me strong buy signals with high confidence from 2024", our `agent.py`:

- Parses the natural language into strict Zvec string filters (`action_class = 'buy' and year = 2024`).
- Encodes the query into both dense and sparse vectors simultaneously.
- Executes a hybrid search on Zvec, using the built-in `RrfReRanker` (Reciprocal Rank Fusion) to combine semantic relevance (dense) with exact keyword matching (sparse).
- Synthesizes the top 8 documents by passing them as context into our `llama3.2` agent prompt, returning a clean, analytical summary.
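Reciprocal Rank Fusion itself is refreshingly simple: a document's fused score is the sum of `1 / (k + rank)` over every result list it appears in, with `k ≈ 60` as the conventional smoothing constant. A minimal sketch, independent of Zvec:

```python
def rrf_fuse(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked id lists with Reciprocal Rank Fusion: score = sum 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranked in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]    # semantic neighbours
sparse = ["d1", "d9", "d3"]   # keyword matches
print(rrf_fuse(dense, sparse))  # ['d1', 'd3', 'd9', 'd7']
```

Documents that appear in both lists get credit twice, so hybrid hits float to the top even when neither individual ranking put them first.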
🧩 Building on Top of This (Forking)
The beauty of Zvec's Python implementation combined with Ollama's local hosting is how effortlessly you can modify the agent:
- Swap the Asset: The HuggingFace dataset tracks Bitcoin, but you can build your own CSV pipeline for stocks, ETH, or commodities and just feed it through the `explode_row` function.
- Upgrade the Agent: Currently, `llama3.2` runs a zero-shot synthesis prompt. By extending the CLI loop in `agent.py`, you could turn this into an iterative ReAct loop that triggers follow-up searches.
- Experiment with Re-Ranking: We used standard RRF, but `zvec` supports weighted re-rankers and cross-encoder architectures that you can snap in.
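The ReAct-style upgrade mentioned above boils down to a small control loop. This is a hypothetical sketch: `search` and `ask_llm` are stand-ins for the real Zvec hybrid search and Ollama call in `agent.py`, and the loop shape is the point, not the stubs:

```python
# Hypothetical ReAct-style extension of the agent's CLI loop.
def react_loop(question: str, search, ask_llm, max_steps: int = 3) -> str:
    context: list[str] = []
    query = question
    answer = ""
    for _ in range(max_steps):
        context.extend(search(query))               # gather more evidence
        answer, follow_up = ask_llm(question, context)
        if follow_up is None:                       # model is satisfied: stop searching
            return answer
        query = follow_up                           # otherwise refine and search again
    return answer

# Toy stubs to show the control flow:
def fake_search(q):
    return [f"doc about {q}"]

def fake_llm(question, context):
    # Ask one follow-up, then answer once enough context is gathered.
    if len(context) < 2:
        return ("", "buy signals 2024")
    return (f"Answer using {len(context)} docs", None)

print(react_loop("strong buy signals?", fake_search, fake_llm))  # Answer using 2 docs
```

Capping `max_steps` keeps a confused model from searching forever, which matters when every step is a local LLM call.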
Getting Started
Everything runs through standard Python. All you need is `ollama serve` running in the background.
```bash
git clone https://github.com/harishkotra/zvec-trading-research-agent zvec-research-agent
cd zvec-research-agent
pip install -r requirements.txt
ollama pull nomic-embed-text
ollama pull llama3.2

python ingest.py   # Loads and embeds the data locally
python agent.py    # Starts the interactive terminal
```
Build locally. Keep your data private. Happy coding!

