Hello everyone,
I'm excited to share two Python libraries I've been working on recently: llmlog_engine and mini_faiss. Both tackle performance-critical problems in production AI systems with C++ implementations under the hood while providing clean, Pythonic APIs.
For context, I've been building LLM-powered applications in production, and two recurring bottlenecks kept appearing. First, analyzing application logs to understand model behavior, error rates, and latency patterns was painfully slow with pandas alone. Second, for smaller embedding datasets, pulling in full FAISS for similarity search felt like overkill, yet pure NumPy was too slow.
I explored existing solutions and found a gap: llmlog_engine fills the need for a lightweight, embedded analytics engine designed specifically for LLM logs, while mini_faiss provides a minimal vector search library that's easier to understand and integrate than full FAISS, yet significantly faster than pure NumPy.
Both libraries share the same philosophy: solve one problem exceptionally well with minimal dependencies and maximum performance.
What My Projects Do
llmlog_engine: Columnar Analytics for LLM Logs
A specialized embedded database for analyzing LLM application logs stored as JSONL.
Core capabilities:
- Fast JSONL ingestion into columnar storage format
- Efficient filtering on numeric and string columns
- Group-by aggregations (COUNT, SUM, AVG, MIN, MAX)
- Dictionary encoding for low-cardinality strings (model names, routes)
- SIMD-friendly memory layout for performance
- pandas DataFrame integration
Performance:
- 6.8x faster than pure Python on 100k rows
- Benchmark: Filter by model + latency, group by route, compute 6 metrics
  - Pure Python: 0.82s
  - C++ Engine: 0.12s
mini_faiss: Lightweight Vector Similarity Search
A focused, high-performance library for similarity search in dense embeddings.
Core capabilities:
- SIMD-accelerated distance computation (L2 and inner product)
- NumPy-friendly API with clean type signatures
- ~1500 lines of readable C++ code
- Support for both Euclidean and cosine similarity
- Heap-based top-k selection
Performance:
- ~7x faster than pure NumPy on typical workloads
- Benchmark: 100k vectors, 768 dimensions
  - mini_faiss: 0.067s
  - NumPy: 0.48s
Architecture Philosophy
Both libraries follow the same design pattern:
- Core logic in C++17: Performance-critical operations using modern C++
- Python bindings via pybind11: Zero-copy data transfer with NumPy
- Minimal dependencies: No heavy frameworks or complex build chains
- Columnar/SIMD-friendly layouts: Data structures optimized for CPU cache
- Type safety: Strict validation at Python/C++ boundary
This approach delivers near-native performance while maintaining Python's developer experience.
Syntax Examples
llmlog_engine
Load and analyze logs:
from llmlog_engine import LogStore
# Load JSONL logs
store = LogStore.from_jsonl("production_logs.jsonl")
# Analyze slow responses by model
slow_by_model = (store.query()
    .filter(min_latency_ms=500)
    .aggregate(
        by=["model"],
        metrics={
            "count": "count",
            "avg_latency": "avg(latency_ms)",
            "max_latency": "max(latency_ms)"
        }
    ))
print(slow_by_model) # Returns pandas DataFrame
Error analysis:
# Analyze error rates by model and route
errors = (store.query()
    .filter(status="error")
    .aggregate(
        by=["model", "route"],
        metrics={"count": "count"}
    ))
Combined filters:
# Filter by multiple conditions (AND logic)
result = (store.query()
    .filter(
        model="gpt-4.1",
        min_latency_ms=1000,
        route="chat"
    )
    .aggregate(
        by=["model"],
        metrics={"avg_tokens": "avg(tokens_output)"}
    ))
Expected JSONL format:
{"ts": "2024-01-01T12:00:00Z", "model": "gpt-4.1", "latency_ms": 423, "tokens_input": 100, "tokens_output": 921, "route": "chat", "status": "ok"}
{"ts": "2024-01-01T12:00:15Z", "model": "gpt-4.1-mini", "latency_ms": 152, "tokens_input": 50, "tokens_output": 214, "route": "rag", "status": "ok"}
mini_faiss
Basic similarity search:
import numpy as np
from mini_faiss import IndexFlatL2
# Create index for 768-dimensional vectors
d = 768
index = IndexFlatL2(d)
# Add vectors to index
xb = np.random.randn(10000, d).astype("float32")
index.add(xb)
# Search for nearest neighbors
xq = np.random.randn(5, d).astype("float32")
distances, indices = index.search(xq, k=10)
print(distances.shape) # (5, 10) - 5 queries, 10 neighbors each
print(indices.shape) # (5, 10)
Cosine similarity search:
from mini_faiss import IndexFlatIP
# Create inner product index
index = IndexFlatIP(d=768)
# Normalize database vectors for cosine similarity
xb = np.random.randn(10000, 768).astype("float32")
xb /= np.linalg.norm(xb, axis=1, keepdims=True)
index.add(xb)
# Normalize queries the same way
xq = np.random.randn(5, 768).astype("float32")
xq_normalized = xq / np.linalg.norm(xq, axis=1, keepdims=True)
scores, indices = index.search(xq_normalized, k=10)
# Higher scores = more similar
Implementation Highlights
llmlog_engine
Columnar storage with dictionary encoding:
- String columns (model, route, status) mapped to int32 IDs
- Numeric columns stored as contiguous arrays
- Filtering operates on compact integer representations
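As an illustration of the idea (not the engine's actual code), dictionary encoding a string column boils down to something like this:
import numpy as np
# Map each distinct string to a small int32 id
values = ["gpt-4.1-mini", "gpt-4.1", "gpt-4.1-mini", "gpt-4-turbo", "gpt-4.1-mini"]
dictionary = {}  # string -> int32 id
ids = np.empty(len(values), dtype=np.int32)
for i, v in enumerate(values):
    ids[i] = dictionary.setdefault(v, len(dictionary))
# A filter like model == "gpt-4.1" becomes a comparison on int32 ids
mask = ids == dictionary["gpt-4.1"]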
Query execution:
- Build boolean mask from filter predicates (AND logic)
- Group matching rows by specified columns
- Compute aggregations only on filtered rows
- Return pandas DataFrame
Example internal representation:
Column: model [0, 1, 0, 2, 0, ...] (int32 IDs)
Column: latency_ms [423, 1203, 512, ...] (int32)
Dictionary: model {0: "gpt-4.1-mini", 1: "gpt-4.1", 2: "gpt-4-turbo"}
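Putting the encoded columns and the execution steps above together, a rough NumPy sketch of a filter + group-by (illustrative only; the real engine does this in C++):
import numpy as np
model_ids = np.array([0, 1, 0, 2, 0], dtype=np.int32)  # dictionary-encoded model column
latency_ms = np.array([423, 1203, 512, 830, 95], dtype=np.int32)
mask = latency_ms >= 500  # boolean mask from the filter predicate
groups, inverse = np.unique(model_ids[mask], return_inverse=True)
counts = np.bincount(inverse)  # COUNT per group
avg_latency = np.bincount(inverse, weights=latency_ms[mask]) / counts  # AVG per group
# groups holds int32 ids; the dictionary maps them back to strings for the output DataFrame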
mini_faiss
Distance computation:
- L2 via the expansion ||q - db||^2 = ||q||^2 - 2*q·db + ||db||^2
- Precomputes database norms for efficiency
- Vectorizable loops enable SIMD auto-vectorization
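In NumPy terms, the same trick reads roughly like this (a sketch of the idea, not the actual C++ kernel):
import numpy as np
xb = np.random.randn(10000, 768).astype("float32")  # database vectors
xb_norms = (xb ** 2).sum(axis=1)  # ||db||^2, precomputed once at add() time
def l2_sq_distances(xq):
    # ||q - db||^2 = ||q||^2 - 2*q·db + ||db||^2, evaluated for all query/database pairs at once
    q_norms = (xq ** 2).sum(axis=1, keepdims=True)
    return q_norms - 2.0 * xq @ xb.T + xb_norms  # shape (n_queries, n_database)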
Top-k selection:
- Heap-based algorithm: O(N log k) per query
- Efficient for typical case where k << N
- Separate implementations for min (L2) and max (inner product)
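The same selection strategy in plain Python with heapq (a sketch; mini_faiss implements it in C++):
import heapq
def topk_smallest(distances, k):
    # Keep the k smallest distances seen so far in a size-k max-heap
    # (heapq is a min-heap, so negated distances are stored).
    heap = []
    for idx, dist in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-dist, idx))
        elif -dist > heap[0][0]:  # dist beats the current worst of the k
            heapq.heapreplace(heap, (-dist, idx))
    return sorted((-d, i) for d, i in heap)  # k (distance, index) pairs, nearest first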
Row-major storage:
data = [v_0[0], v_0[1], ..., v_0[d-1],
v_1[0], v_1[1], ..., v_1[d-1],
...]
Cache-friendly for batch distance computation.
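In index terms, vector i is simply the contiguous slice data[i*d : (i+1)*d] (illustrative):
import numpy as np
d = 4
vectors = np.arange(12, dtype=np.float32).reshape(3, d)  # 3 vectors of dimension 4
data = vectors.ravel()  # flat row-major buffer
v1 = data[1 * d : 2 * d]  # second vector, one contiguous slice
assert np.array_equal(v1, vectors[1])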
Installation
Both libraries use standard Python packaging:
# llmlog_engine
git clone https://github.com/yuuichieguchi/llmlog_engine.git
cd llmlog_engine
pip install -e .
# mini_faiss
git clone https://github.com/yuuichieguchi/mini_faiss.git
cd mini_faiss
pip install .
Requirements:
- Python 3.8+
- C++17 compiler (GCC, Clang, MSVC)
- CMake 3.15+
- pybind11 (installed via pip)
Use Cases
llmlog_engine
- Monitor LLM application health in production
- Analyze latency patterns by model and endpoint
- Track error rates and failure modes
- Debug performance regressions
- Generate usage reports for cost analysis
mini_faiss
- Dense retrieval for RAG systems (see the sketch after this list)
- Document similarity search
- Image search using vision model embeddings
- Recommendation systems (nearest neighbor recommendations)
- Prototyping before scaling to full FAISS
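For the dense-retrieval case, a minimal RAG-style retrieval step might look like this (a sketch: embed() and documents stand in for whatever embedding model and corpus you use; normalization follows the cosine-similarity recipe above):
import numpy as np
from mini_faiss import IndexFlatIP
# embed() is a placeholder for your embedding model, assumed to return float32 arrays of shape (n, 768)
doc_vecs = embed(documents)
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)
index = IndexFlatIP(d=768)
index.add(doc_vecs)
q = embed(["How do I rotate my API key?"])
q /= np.linalg.norm(q, axis=1, keepdims=True)
scores, ids = index.search(q, k=5)
top_passages = [documents[i] for i in ids[0]]  # feed these to the LLM as context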
Known Limitations
llmlog_engine
- In-memory only (no persistence yet)
- Single-threaded query execution
- No complex expressions or nested objects
- No distributed processing
mini_faiss
- Brute force search only (no approximate methods)
- Append-only index (no deletion/updates)
- Fixed vector dimension per index
- Single machine, memory-limited (~1M vectors at 768d ≈ 3GB)
Both libraries prioritize simplicity and correctness in V1. Advanced features (parallel execution, approximate search, compression) can be added without breaking APIs.
Target Audience
These libraries are for Python developers who:
- Need better performance than pure Python/NumPy
- Want minimal dependencies and simple APIs
- Prefer understanding their dependencies (both are <2000 lines of C++)
- Are building small to medium-scale systems
- Value type safety and clean abstractions
I'm actively using both in production, so they're battle-tested against real workloads.
Comparison to Alternatives
llmlog_engine vs. pandas/DuckDB:
- More specialized: purpose-built for LLM log schema
- Faster for common queries on columnar data
- Simpler: no SQL, just Python method chaining
- Embedded: no external process or server
mini_faiss vs. FAISS/NumPy:
- Simpler than FAISS: easier to understand, modify, debug
- Faster than NumPy: SIMD acceleration, optimized layout
- Smaller scope: does one thing well (exact search)
- Better for learning: clean, readable implementation
Future Roadmap
llmlog_engine
- Memory-mapped on-disk format
- Parallel query execution
- SIMD micro-optimizations
- Timestamp range filters
- Compression for numeric columns
mini_faiss
- Approximate search methods (IVF, PQ, HNSW)
- GPU acceleration (CUDA/Metal)
- Index serialization (save/load)
- Multi-threaded search
- Custom distance functions
Feedback Welcome
I'd love to hear:
- Does this solve problems you're facing?
- What features would make these more useful?
- Any bugs or edge cases I should handle?
- Performance bottlenecks in your use cases?
Both projects are MIT licensed and contributions are welcome!
llmlog_engine: https://github.com/yuuichieguchi/llmlog_engine
mini_faiss: https://github.com/yuuichieguchi/mini_faiss
Thanks for reading!