We’re generating more data than ever, and AI‑powered search is great—until your dataset gets huge and your RAM starts crying for mercy. Most vector search systems rely on expensive DRAM to keep indexes fast, but that approach doesn’t scale. KIOXIA’s AiSAQ (All‑in‑Storage ANNS with Product Quantization) flips the script: it runs approximate nearest neighbor search directly on SSD, slashing DRAM usage by 3,200× in billion‑scale workloads. The vic_aisaq_demo repo from ARPA Hellenic Logical Systems puts this tech into a practical, local‑first retrieval pipeline that’s as auditable as it is efficient.
TL;DR:
vic_aisaq_demo combines tiered metadata filtering with flash‑optimized vector search to keep memory low and answers relevant. It’s a live demo of storage‑aware AI for edge and controller‑style environments.
The Problem: DRAM Is the Bottleneck
Graph‑based nearest neighbor search (like HNSW) is fast, but it keeps key index structures in DRAM. With billion‑scale datasets, memory costs explode. Even compressed representations can still require tens of gigabytes of RAM. KIOXIA’s AiSAQ technology changes that by moving those compressed vectors to flash storage, consuming as little as 10 MB of DRAM during search without sacrificing recall.
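To make the trade‑off concrete, here is a minimal, self‑contained sketch of the underlying idea: product quantization (PQ) with the compressed codes kept on storage. This is an illustration of the principle, not KIOXIA’s implementation — the tiny codebooks live in RAM, while the per‑vector codes are written to disk and memory‑mapped only at query time.

```python
# Illustrative sketch only -- not AiSAQ's actual code. It shows the core idea
# behind product quantization (PQ): keep tiny codebooks in RAM and leave the
# compressed vector codes on disk, loading them only when a query arrives.
import numpy as np

rng = np.random.default_rng(0)
DIM, SUBSPACES, CENTROIDS = 64, 8, 256     # 64-d vectors, 8 sub-vectors, 8-bit codes
SUB_DIM = DIM // SUBSPACES

# 1. "Train" codebooks (random here; real systems run k-means per subspace).
codebooks = rng.normal(size=(SUBSPACES, CENTROIDS, SUB_DIM)).astype(np.float32)

def encode(vectors: np.ndarray) -> np.ndarray:
    """Compress float vectors to one byte per subspace (DIM*4 bytes -> SUBSPACES bytes)."""
    codes = np.empty((len(vectors), SUBSPACES), dtype=np.uint8)
    for s in range(SUBSPACES):
        sub = vectors[:, s * SUB_DIM:(s + 1) * SUB_DIM]
        dists = ((sub[:, None, :] - codebooks[s][None, :, :]) ** 2).sum(-1)
        codes[:, s] = dists.argmin(1)
    return codes

# 2. Encode a toy corpus and park the codes on "storage" (a file, standing in for SSD).
corpus = rng.normal(size=(10_000, DIM)).astype(np.float32)
np.save("pq_codes.npy", encode(corpus))    # DRAM now only holds the codebooks

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Asymmetric distance computation: build a lookup table, stream codes from disk."""
    table = np.stack([
        ((codebooks[s] - query[s * SUB_DIM:(s + 1) * SUB_DIM]) ** 2).sum(-1)
        for s in range(SUBSPACES)
    ])                                               # shape: (SUBSPACES, CENTROIDS)
    codes = np.load("pq_codes.npy", mmap_mode="r")   # read lazily from flash
    approx = table[np.arange(SUBSPACES), codes].sum(1)
    return np.argsort(approx)[:k]

print(search(rng.normal(size=DIM).astype(np.float32)))
```

The real backend layers a flash‑friendly graph index on top of this, but the memory arithmetic is the point: a handful of bytes per vector sitting in storage instead of full‑precision floats sitting in DRAM.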
But low DRAM is only half the story. You also need a retrieval strategy that doesn’t waste time parsing irrelevant files.
How vic_aisaq_demo Works: Tiered Retrieval Meets Flash‑Native Search
The demo builds on two open‑source building blocks:
- lc0_vic – a tiered retrieval controller that plans and orchestrates search in layers (L0 → L1 → L2).
- aisaq-diskann – a flash‑oriented ANN backend optimized for low‑DRAM environments.
The execution flow is refreshingly simple:
- Librarian / Plan – Turn a natural‑language question into retrieval intent using a lightweight LLM (e.g., qwen2.5:0.5b via Ollama).
- L0 Metadata Filter – Narrow down candidate files by extension, size, time, or path hints. Cheap and fast.
- L1 Vector Search – Run native AiSAQ ANN search over embeddings to find semantically similar content.
- L2 Deep Read – Parse only the top few files and extract evidence snippets.
- Ranked Response – Return paths, scores, and run metrics.
The tiered approach keeps deep parsing affordable at scale; the sketch below shows roughly how the stages fit together.
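For readers who prefer code to diagrams, here is a hypothetical end‑to‑end sketch of the L0 → L1 → L2 funnel. The function names and the keyword‑overlap scoring stub are mine, not the repo’s API; in the real pipeline, plan() calls the small Ollama model and l1_vector_search() hits the AiSAQ index.

```python
# Hypothetical sketch of the tiered flow -- function names and signatures are
# illustrative, not the actual lc0_vic / aisaq-diskann APIs.
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Hit:
    path: Path
    score: float
    tier: str
    snippet: str = ""

def plan(question: str) -> dict:
    # Librarian/Plan: a small LLM would turn the question into retrieval intent.
    # Hard-coded here to keep the sketch self-contained.
    return {"extensions": [".pdf", ".docx"], "keywords": ["contract", "penalty"]}

def l0_metadata_filter(root: Path, intent: dict) -> list[Path]:
    # L0: cheap filesystem-level filtering by extension (size/time checks would go here too).
    return [p for p in root.rglob("*") if p.suffix.lower() in intent["extensions"]]

def l1_vector_search(candidates: list[Path], question: str, top_k: int = 10) -> list[Hit]:
    # L1: in the real pipeline this runs AiSAQ ANN search over precomputed embeddings.
    # Stub: score by filename keyword overlap so the sketch runs without an index.
    scored = [Hit(p, sum(k in p.name.lower() for k in question.lower().split()), "L1")
              for p in candidates]
    return sorted(scored, key=lambda h: h.score, reverse=True)[:top_k]

def l2_deep_read(hits: list[Hit], intent: dict, top_n: int = 3) -> list[Hit]:
    # L2: parse only the handful of survivors and pull evidence snippets.
    for hit in hits[:top_n]:
        text = hit.path.read_text(errors="ignore")[:2000]
        hit.snippet = next((line for line in text.splitlines()
                            if any(k in line.lower() for k in intent["keywords"])), "")
        hit.tier = "L2"
    return hits[:top_n]

def run_query(root: Path, question: str) -> list[Hit]:
    intent = plan(question)
    return l2_deep_read(l1_vector_search(l0_metadata_filter(root, intent), question), intent)

if __name__ == "__main__":
    for hit in run_query(Path("."), "Find the Q3 2025 contract that mentions penalty clauses"):
        print(f"[{hit.tier}] {hit.score:>4.1f}  {hit.path}  {hit.snippet[:60]}")
```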
Benchmark results show that latency remains stable as the dataset grows, while the DRAM footprint stays near zero. The funnel chart below visualises how each tier slashes the candidate pool, and the second chart shows how the pipeline shifts results from superficial matching to true semantic evidence.
Try It Yourself
The repo is built to be reproducible and local‑first. You’ll need:
- WSL (Ubuntu) for building the AiSAQ binaries
- Ollama running locally (or over the network) with two models (a quick sanity check is sketched after this list):
  - Planner model: qwen2.5:0.5b
  - Embedding model: embeddinggemma
- Python 3.13 and the usual suspects (see requirements.txt)
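Before building the index, it can help to confirm that both models respond. The snippet below is a sanity check of my own, assuming Ollama’s standard REST endpoints (/api/generate and /api/embeddings) on the default port 11434; it is not one of the repo’s scripts.

```python
# Sanity-check the two Ollama models before building the AiSAQ index.
# Assumes Ollama's standard REST API on the default port; adjust OLLAMA_HOST if remote.
import requests

OLLAMA_HOST = "http://localhost:11434"

# Planner model: ask the small LLM to sketch retrieval intent for a query.
plan = requests.post(f"{OLLAMA_HOST}/api/generate", json={
    "model": "qwen2.5:0.5b",
    "prompt": "List file types likely to contain a Q3 2025 contract with penalty clauses.",
    "stream": False,
}, timeout=60).json()
print("planner says:", plan["response"][:200])

# Embedding model: embed a snippet and check the vector dimensionality.
emb = requests.post(f"{OLLAMA_HOST}/api/embeddings", json={
    "model": "embeddinggemma",
    "prompt": "penalty clause in a Q3 2025 supply contract",
}, timeout=60).json()
print("embedding dims:", len(emb["embedding"]))
```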
Once you’ve built the AiSAQ index from a sample drive, a query like:
python3 scripts/run_query.py \
"Find the Q3 2025 contract that mentions penalty clauses" \
--aisaq-root /home/$USER/aisaq-diskann
…will return ranked files with evidence snippets, tier labels, and latency metrics.
Why This Matters
vic_aisaq_demo isn’t just a toy. It demonstrates a realistic, storage‑aware retrieval pattern that could run on devices with tight memory budgets—think edge gateways, embedded controllers, or even future SSD firmware that embeds intelligence directly on the drive. The Computational Storage Landscape report maps this evolution, and this repo is one of the first runnable examples that puts those ideas into practice.
The two charts below summarise that systems‑level trade‑off and the scaling behaviour.
The takeaway? You don’t need a cluster of DRAM‑heavy servers to run effective semantic search. Sometimes the smartest storage is the one that knows what not to load into memory.
Check out the full repo: ARPAHLS/vic_aisaq_demo



