Yuuichi Eguchi

I built two high-performance Python libraries for production AI: LLM log analytics and vector similarity search

Hello everyone,

I'm excited to share two Python libraries I've been working on recently: llmlog_engine and mini_faiss. Both tackle performance-critical problems in production AI systems with C++ implementations under the hood while providing clean, Pythonic APIs.

For context, I've been building LLM-powered applications in production, and two recurring bottlenecks kept appearing. First, analyzing application logs to understand model behavior, error rates, and latency patterns was painfully slow with pandas alone. Second, for similarity search on embeddings in retrieval systems, full FAISS felt like overkill at smaller dataset sizes, yet pure NumPy was too slow.

I explored existing solutions but found a gap, so I built two libraries to fill it: llmlog_engine is a lightweight, embedded analytics engine designed specifically for LLM logs, and mini_faiss is a minimal vector search library that's easier to understand and integrate than full FAISS but significantly faster than NumPy.

Both libraries share the same philosophy: solve one problem exceptionally well with minimal dependencies and maximum performance.

What My Projects Do

llmlog_engine: Columnar Analytics for LLM Logs

A specialized embedded database for analyzing LLM application logs stored as JSONL.

Core capabilities:

  • Fast JSONL ingestion into columnar storage format
  • Efficient filtering on numeric and string columns
  • Group-by aggregations (COUNT, SUM, AVG, MIN, MAX)
  • Dictionary encoding for low-cardinality strings (model names, routes)
  • SIMD-friendly memory layout for performance
  • pandas DataFrame integration

Performance:

  • 6.8x faster than pure Python on 100k rows
  • Benchmark: Filter by model + latency, group by route, compute 6 metrics
    • Pure Python: 0.82s
    • C++ Engine: 0.12s

mini_faiss: Lightweight Vector Similarity Search

A focused, high-performance library for similarity search in dense embeddings.

Core capabilities:

  • SIMD-accelerated distance computation (L2 and inner product)
  • NumPy-friendly API with clean type signatures
  • ~1500 lines of readable C++ code
  • Support for both Euclidean and cosine similarity
  • Heap-based top-k selection

Performance:

  • ~7x faster than pure NumPy on typical workloads
  • Benchmark: 100k vectors, 768 dimensions
    • mini_faiss: 0.067s
    • NumPy: 0.48s

Architecture Philosophy

Both libraries follow the same design pattern:

  1. Core logic in C++17: Performance-critical operations using modern C++
  2. Python bindings via pybind11: Zero-copy data transfer with NumPy
  3. Minimal dependencies: No heavy frameworks or complex build chains
  4. Columnar/SIMD-friendly layouts: Data structures optimized for CPU cache
  5. Type safety: Strict validation at Python/C++ boundary

This approach delivers near-native performance while maintaining Python's developer experience.
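As a concrete (hypothetical) example of point 5, here's the kind of validation a thin Python wrapper can perform before an array crosses the pybind11 boundary. The function is illustrative, not either library's actual API:

import numpy as np

def _validate_matrix(x: np.ndarray, dim: int) -> np.ndarray:
    """Validate dtype, shape, and layout before crossing into C++."""
    if x.ndim != 2 or x.shape[1] != dim:
        raise ValueError(f"expected shape (n, {dim}), got {x.shape}")
    if x.dtype != np.float32:
        raise TypeError(f"expected float32, got {x.dtype}")
    # Zero-copy handoff via pybind11 requires a C-contiguous buffer;
    # this copies only if the input isn't already contiguous.
    return np.ascontiguousarray(x)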

Syntax Examples

llmlog_engine

Load and analyze logs:

from llmlog_engine import LogStore

# Load JSONL logs
store = LogStore.from_jsonl("production_logs.jsonl")

# Analyze slow responses by model
slow_by_model = (store.query()
    .filter(min_latency_ms=500)
    .aggregate(
        by=["model"],
        metrics={
            "count": "count",
            "avg_latency": "avg(latency_ms)",
            "max_latency": "max(latency_ms)"
        }
    ))

print(slow_by_model)  # Returns pandas DataFrame

Error analysis:

# Analyze error rates by model and route
errors = (store.query()
    .filter(status="error")
    .aggregate(
        by=["model", "route"],
        metrics={"count": "count"}
    ))

Combined filters:

# Filter by multiple conditions (AND logic)
result = (store.query()
    .filter(
        model="gpt-4.1",
        min_latency_ms=1000,
        route="chat"
    )
    .aggregate(
        by=["model"],
        metrics={"avg_tokens": "avg(tokens_output)"}
    ))

Expected JSONL format:

{"ts": "2024-01-01T12:00:00Z", "model": "gpt-4.1", "latency_ms": 423, "tokens_input": 100, "tokens_output": 921, "route": "chat", "status": "ok"}
{"ts": "2024-01-01T12:00:15Z", "model": "gpt-4.1-mini", "latency_ms": 152, "tokens_input": 50, "tokens_output": 214, "route": "rag", "status": "ok"}
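If you're wondering how to produce logs in this shape, a few lines of standard-library Python suffice. The helper below is a hypothetical example, not part of the library; only the field names come from the sample above:

import json
from datetime import datetime, timezone

def log_llm_call(path, model, latency_ms, tokens_in, tokens_out, route, status="ok"):
    record = {
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "model": model,
        "latency_ms": latency_ms,
        "tokens_input": tokens_in,
        "tokens_output": tokens_out,
        "route": route,
        "status": status,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")  # one JSON object per line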

mini_faiss

Basic similarity search:

import numpy as np
from mini_faiss import IndexFlatL2

# Create index for 768-dimensional vectors
d = 768
index = IndexFlatL2(d)

# Add vectors to index
xb = np.random.randn(10000, d).astype("float32")
index.add(xb)

# Search for nearest neighbors
xq = np.random.randn(5, d).astype("float32")
distances, indices = index.search(xq, k=10)

print(distances.shape)  # (5, 10) - 5 queries, 10 neighbors each
print(indices.shape)    # (5, 10)

Cosine similarity search:

from mini_faiss import IndexFlatIP

# Create inner product index
index = IndexFlatIP(d=768)

# Normalize database vectors for cosine similarity
xb = np.random.randn(10000, 768).astype("float32")
xb /= np.linalg.norm(xb, axis=1, keepdims=True)
index.add(xb)

# Queries must be normalized the same way
xq = np.random.randn(5, 768).astype("float32")
xq_normalized = xq / np.linalg.norm(xq, axis=1, keepdims=True)

scores, indices = index.search(xq_normalized, k=10)
# With inner product, higher scores mean more similar

Implementation Highlights

llmlog_engine

Columnar storage with dictionary encoding:

  • String columns (model, route, status) mapped to int32 IDs
  • Numeric columns stored as contiguous arrays
  • Filtering operates on compact integer representations

Query execution:

  1. Build boolean mask from filter predicates (AND logic)
  2. Group matching rows by specified columns
  3. Compute aggregations only on filtered rows
  4. Return pandas DataFrame

Example internal representation:

Column: model       [0, 1, 0, 2, 0, ...] (int32 IDs)
Column: latency_ms  [423, 1203, 512, ...] (int32)
Dictionary: model   {0: "gpt-4.1-mini", 1: "gpt-4.1", 2: "gpt-4-turbo"}
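Putting the pieces together, here is a pure-NumPy model of that pipeline. This is a sketch of the idea, not the engine's actual C++ internals:

import numpy as np

# Dictionary-encoded string column: each model name becomes an int32 ID
models = np.array([0, 1, 0, 2, 0], dtype=np.int32)
model_dict = {0: "gpt-4.1-mini", 1: "gpt-4.1", 2: "gpt-4-turbo"}

latency_ms = np.array([423, 1203, 512, 980, 77], dtype=np.int32)

# Step 1: build a boolean mask from the filter predicates (AND logic)
mask = latency_ms >= 500

# Steps 2-3: group the surviving rows and aggregate only those rows
for model_id in np.unique(models[mask]):
    group = latency_ms[mask & (models == model_id)]
    print(model_dict[model_id], len(group), group.mean(), group.max())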

mini_faiss

Distance computation:

  • L2 via the expansion ||q - x||^2 = ||q||^2 - 2*(q·x) + ||x||^2 for each database vector x
  • Precomputes database norms for efficiency
  • Vectorizable loops enable SIMD auto-vectorization
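For intuition, here is that expansion written out in NumPy. This is a sketch of the idea, not the library's C++ code; the payoff is that the database norms are computed once, at add time:

import numpy as np

def l2_sqr_expanded(q, db, db_norms):
    # ||q - x||^2 = ||q||^2 - 2*(q . x) + ||x||^2, with db_norms precomputed
    q_norms = (q ** 2).sum(axis=1, keepdims=True)    # (nq, 1)
    dots = q @ db.T                                  # (nq, n)
    return q_norms - 2.0 * dots + db_norms[None, :]  # (nq, n)

db = np.random.randn(1000, 64).astype("float32")
db_norms = (db ** 2).sum(axis=1)                     # once, when vectors are added
q = np.random.randn(3, 64).astype("float32")
dists = l2_sqr_expanded(q, db, db_norms)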

Top-k selection:

  • Heap-based algorithm: O(N log k) per query
  • Efficient for typical case where k << N
  • Separate implementations for min (L2) and max (inner product)
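A rough Python equivalent of the selection step, using heapq (illustrative only; for L2, a bounded max-heap keeps the k smallest distances seen so far):

import heapq

def top_k_smallest(distances, k):
    """Keep the k smallest distances with an O(N log k) bounded heap."""
    heap = []  # max-heap via negated values: heap[0] tracks the worst kept
    for idx, d in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, idx))  # evict the current worst
    return sorted((-nd, idx) for nd, idx in heap)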

Row-major storage:

data = [v_0[0], v_0[1], ..., v_0[d-1],
        v_1[0], v_1[1], ..., v_1[d-1],
        ...]

Cache-friendly for batch distance computation.
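For reference, this is exactly the layout of a C-contiguous float32 NumPy array of shape (n, d), which is what makes the zero-copy handoff to C++ possible:

import numpy as np

xb = np.arange(6, dtype=np.float32).reshape(2, 3)  # two 3-d vectors
assert xb.flags["C_CONTIGUOUS"]
print(xb.ravel())  # [v_0[0] v_0[1] v_0[2] v_1[0] v_1[1] v_1[2]]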

Installation

Both libraries use standard Python packaging:

# llmlog_engine
git clone https://github.com/yuuichieguchi/llmlog_engine.git
cd llmlog_engine
pip install -e .

# mini_faiss
git clone https://github.com/yuuichieguchi/mini_faiss.git
cd mini_faiss
pip install .

Requirements:

  • Python 3.8+
  • C++17 compiler (GCC, Clang, MSVC)
  • CMake 3.15+
  • pybind11 (installed via pip)

Use Cases

llmlog_engine

  • Monitor LLM application health in production
  • Analyze latency patterns by model and endpoint
  • Track error rates and failure modes
  • Debug performance regressions
  • Generate usage reports for cost analysis

mini_faiss

  • Dense retrieval for RAG systems
  • Document similarity search
  • Image search using vision model embeddings
  • Recommendation systems (nearest neighbor recommendations)
  • Prototyping before scaling to full FAISS

Known Limitations

llmlog_engine

  • In-memory only (no persistence yet)
  • Single-threaded query execution
  • No complex expressions or nested objects
  • No distributed processing

mini_faiss

  • Brute force search only (no approximate methods)
  • Append-only index (no deletion/updates)
  • Fixed vector dimension per index
  • Single machine, memory-limited (~1M vectors at 768d ≈ 3GB)
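For context on that last number, the raw buffer size is easy to estimate:

n, d = 1_000_000, 768
bytes_needed = n * d * 4   # float32 = 4 bytes
print(bytes_needed / 1e9)  # ~3.07 GB for the raw vectors alone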

Both libraries prioritize simplicity and correctness in V1. Advanced features (parallel execution, approximate search, compression) can be added without breaking APIs.

Target Audience

These libraries are for Python developers who:

  • Need better performance than pure Python/NumPy
  • Want minimal dependencies and simple APIs
  • Prefer understanding their dependencies (both are <2000 lines of C++)
  • Are building small to medium-scale systems
  • Value type safety and clean abstractions

I'm actively using both in production, so they're battle-tested against real workloads.

Comparison to Alternatives

llmlog_engine vs. pandas/DuckDB:

  • More specialized: purpose-built for LLM log schema
  • Faster for common queries on columnar data
  • Simpler: no SQL, just Python method chaining
  • Embedded: no external process or server

mini_faiss vs. FAISS/NumPy:

  • Simpler than FAISS: easier to understand, modify, debug
  • Faster than NumPy: SIMD acceleration, optimized layout
  • Smaller scope: does one thing well (exact search)
  • Better for learning: clean, readable implementation

Future Roadmap

llmlog_engine

  • Memory-mapped on-disk format
  • Parallel query execution
  • SIMD micro-optimizations
  • Timestamp range filters
  • Compression for numeric columns

mini_faiss

  • Approximate search methods (IVF, PQ, HNSW)
  • GPU acceleration (CUDA/Metal)
  • Index serialization (save/load)
  • Multi-threaded search
  • Custom distance functions

Feedback Welcome

I'd love to hear:

  • Does this solve problems you're facing?
  • What features would make these more useful?
  • Any bugs or edge cases I should handle?
  • Performance bottlenecks in your use cases?

Both projects are MIT licensed and contributions are welcome!

llmlog_engine: https://github.com/yuuichieguchi/llmlog_engine

mini_faiss: https://github.com/yuuichieguchi/mini_faiss

Thanks for reading!