Have you ever wanted an AI assistant that actually remembers market context without relying on cloud APIs or paying token fees? Today, I'm going to walk you through how we built a Trading Research Memory Agent, a fully local Python architecture using Zvec, Ollama, and HuggingFace Datasets.
This system ingests thousands of daily market rows, extracts the sentiment and articles, embeds them via local models, and allows a Local LLM to synthesize research insights. We are bringing the research desk to the terminal.
🛠️ The Tech Stack
To ensure that the entire pipeline is fast, offline, and free from API-rate limits, we utilized:
- Zvec (0.2.0) - A blazing-fast vector database that supports dense vectors, sparse vectors, scalar filtering, and hybrid search, built right into Python.
- Ollama - Powering both the local embeddings (`nomic-embed-text`) and the LLM agent's query parsing and answer synthesis (`llama3.2`).
- HuggingFace - Providing the foundational dataset (`danilocorsi/LLMs-Sentiment-Augmented-Bitcoin-Dataset`), packed with 25k+ days of detailed Bitcoin pricing, LLM-reasoned sentiment scores, and historical articles.
🏗️ Architecture Overview
The system pipeline is entirely self-contained. It operates in two major phases: Ingestion and Querying.
1. Ingestion Phase
The dataset stores complex nested data: each row represents a calendar day of Bitcoin information alongside JSON-encoded lists of articles from CoinTelegraph, Bitcoin News, and Reddit.
Our script (`ingest.py`):

- Explodes the arrays so each physical article becomes its own vector document (turning 5,000 late-2024 rows into tens of thousands of individual records).
- Embeds the text using a custom `DenseEmbeddingFunction` hooked into Ollama (`nomic-embed-text`) and a custom `BM25EmbeddingFunction` for sparse lexical representation.
- Persists it locally using `zvec.create_and_open()`.
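The explode step above is the heart of ingestion. Here is a minimal sketch of what it might look like in pure Python; the column names (`cointelegraph_articles`, etc.) are illustrative assumptions, not the dataset's actual schema:

```python
import json

# Illustrative column names; the real dataset's fields may differ.
ARTICLE_COLUMNS = ["cointelegraph_articles", "bitcoin_news_articles", "reddit_posts"]

def explode_row(row: dict) -> list[dict]:
    """Turn one daily row into one record per article, copying day-level metadata."""
    records = []
    for col in ARTICLE_COLUMNS:
        raw = row.get(col) or "[]"
        # Articles arrive as JSON-encoded strings in the raw dataset.
        articles = json.loads(raw) if isinstance(raw, str) else raw
        for article in articles:
            records.append({
                "text": article,                        # the content to embed
                "source": col,                          # which feed it came from
                "date": row["date"],                    # day-level metadata survives
                "sentiment_class": row["sentiment_class"],
            })
    return records

day = {
    "date": "2024-11-05",
    "sentiment_class": "bearish",
    "cointelegraph_articles": '["BTC dips below support", "Miners capitulate"]',
    "bitcoin_news_articles": "[]",
    "reddit_posts": '["Is this the bottom?"]',
}
print(len(explode_row(day)))  # 3 article records from 1 daily row
```

Each exploded record keeps its parent day's metadata, which is what makes the typed filtering below possible per-article.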
Crucially, Zvec 0.2 introduces typed data schemas, so our metadata filters (like `sentiment_class = 'bearish'` or `fng_value = 0.2`) are typed strictly into INT64, DOUBLE, or STRING fields, allowing for lightning-fast SQL-style exclusion routing during queries.
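To make the semantics concrete, here is a toy illustration of what a typed-metadata filter does; this is plain Python for clarity only, not Zvec's actual engine:

```python
# Toy illustration of typed metadata filtering (NOT zvec's implementation):
# each record carries strictly typed fields, so comparisons are exact and cheap.
records = [
    {"id": 1, "year": 2024, "fng_value": 0.2, "sentiment_class": "bearish"},
    {"id": 2, "year": 2023, "fng_value": 0.7, "sentiment_class": "bullish"},
    {"id": 3, "year": 2024, "fng_value": 0.8, "sentiment_class": "bullish"},
]

def matches(rec, sentiment=None, year=None):
    """Equivalent of a filter like: sentiment_class = 'bullish' and year = 2024."""
    return ((sentiment is None or rec["sentiment_class"] == sentiment)
            and (year is None or rec["year"] == year))

hits = [r["id"] for r in records if matches(r, sentiment="bullish", year=2024)]
print(hits)  # [3]
```

In the real system this pruning happens inside Zvec before any vector distance is computed, which is why it is so fast.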
2. Query Phase
When a user types a query like "Show me strong buy signals with high confidence from 2024", our `agent.py`:

- Parses the natural language into strict Zvec string filters (`action_class = 'buy' and year = 2024`).
- Encodes the query into both dense and sparse vectors simultaneously.
- Executes a hybrid search on Zvec, using the built-in `RrfReRanker` (Reciprocal Rank Fusion) to combine semantic relevance (dense) with exact keyword matching (sparse).
- Synthesizes the top 8 documents by passing them as context into our `llama3.2` agent prompt, returning a clean, analytical summary.
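Reciprocal Rank Fusion itself is refreshingly simple: a document's fused score is the sum of `1 / (k + rank)` over every result list it appears in, with `k ≈ 60` as the conventional smoothing constant. A minimal sketch, independent of Zvec:

```python
def rrf_fuse(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked id lists with Reciprocal Rank Fusion: score = sum 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranked in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]    # semantic neighbours
sparse = ["d1", "d9", "d3"]   # keyword matches
print(rrf_fuse(dense, sparse))  # ['d1', 'd3', 'd9', 'd7']
```

Documents that appear in both lists get credit twice, so hybrid hits float to the top even when neither individual ranking put them first.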
🧩 Building on Top of This (Forking)
The beauty of Zvec's Python implementation combined with Ollama's local hosting is how effortlessly you can modify the agent:
- Swap the Asset: The HuggingFace dataset tracks Bitcoin, but you can build your own CSV pipeline for stocks, ETH, or commodities and just feed it through the `explode_row` function.
- Upgrade the Agent: Currently, `llama3.2` runs a zero-shot synthesis prompt. By extending the CLI loop in `agent.py`, you could turn this into an iterative ReAct loop that triggers follow-up searches.
- Experiment with Re-Ranking: We used standard RRF, but `zvec` supports weighted re-rankers and cross-encoder architectures that you can snap in.
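The ReAct-style upgrade mentioned above boils down to a small control loop. This is a hypothetical sketch: `search` and `ask_llm` are stand-ins for the real Zvec hybrid search and Ollama call in `agent.py`, and the loop shape is the point, not the stubs:

```python
# Hypothetical ReAct-style extension of the agent's CLI loop.
def react_loop(question: str, search, ask_llm, max_steps: int = 3) -> str:
    context: list[str] = []
    query = question
    answer = ""
    for _ in range(max_steps):
        context.extend(search(query))               # gather more evidence
        answer, follow_up = ask_llm(question, context)
        if follow_up is None:                       # model is satisfied: stop searching
            return answer
        query = follow_up                           # otherwise refine and search again
    return answer

# Toy stubs to show the control flow:
def fake_search(q):
    return [f"doc about {q}"]

def fake_llm(question, context):
    # Ask one follow-up, then answer once enough context is gathered.
    if len(context) < 2:
        return ("", "buy signals 2024")
    return (f"Answer using {len(context)} docs", None)

print(react_loop("strong buy signals?", fake_search, fake_llm))  # Answer using 2 docs
```

Capping `max_steps` keeps a confused model from searching forever, which matters when every step is a local LLM call.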
Getting Started
Everything runs through standard Python. All you need is `ollama serve` running in the background.
```bash
git clone https://github.com/harishkotra/zvec-trading-research-agent zvec-research-agent
cd zvec-research-agent
pip install -r requirements.txt
ollama pull nomic-embed-text
ollama pull llama3.2

python ingest.py   # Loads and embeds the data locally
python agent.py    # Starts the interactive terminal
```
Build locally. Keep your data private. Happy coding!

