I spent the last 2 weeks compiling every production-ready LLM tool I could find. Not research papers. Not demos. Tools you can actually deploy today.
The result: a curated list organized by what you actually need to build.
The Stack
Here's the minimal stack for a production LLM application:
1. Inference → How you run the model
2. Vector DB → How you store embeddings (for RAG)
3. Framework → How you orchestrate prompts and chains
4. Monitoring → How you track quality and costs
5. Testing → How you ensure output quality
Let me break down the best tool in each category.
1. Inference: vLLM (self-hosted) or Groq (API)
Self-hosted: vLLM gives you the highest throughput for open models. Pair it with a quantized Llama 3 model and you get enterprise-grade inference for the cost of renting a GPU.
API: Groq has the fastest inference speeds I've seen — tokens come back nearly instantly. Their free tier is generous enough for prototyping.
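One practical upside: both vLLM and Groq expose an OpenAI-compatible chat completions API, so a thin stdlib client covers either. A sketch — the model name is a placeholder, and the request shape is the standard OpenAI format:

```python
import json
import urllib.request

def build_chat_payload(model, user_message, temperature=0.2):
    """Build an OpenAI-compatible /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }

def chat(base_url, api_key, model, user_message):
    """POST to any OpenAI-compatible server (vLLM, Groq, ...)."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(model, user_message)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Point base_url at http://localhost:8000/v1 for a local vLLM server,
# or at https://api.groq.com/openai/v1 for Groq.
```

Swapping providers becomes a one-line config change instead of a rewrite.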
2. Vector DB: pgvector (if you already use Postgres)
Don't add another database to your stack unless you need to. If you're on Postgres, pgvector handles most RAG workloads. It won't beat Pinecone on billion-vector searches, but for 99% of apps it's fine.
```sql
-- That's literally it
CREATE EXTENSION vector;

CREATE TABLE documents (
  id serial PRIMARY KEY,
  content text,
  embedding vector(1536)
);

-- Similarity search
SELECT content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;
```
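The `<=>` in that query is pgvector's cosine-distance operator. As a sanity check, here's what it computes in plain Python (a sketch of the math, not pgvector's actual implementation):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: what pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Vectors pointing the same way -> distance 0; orthogonal -> distance 1.
```

`ORDER BY embedding <=> query LIMIT 5` just returns the five rows with the smallest value of this function.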
3. Framework: Just use the API directly
Controversial take: you probably don't need LangChain. For most applications, the OpenAI/Anthropic SDK + a simple retry wrapper is enough. LangChain adds complexity that makes debugging harder.
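That "simple retry wrapper" really is a dozen lines. A sketch with exponential backoff — the function name and defaults are my own, not from any SDK:

```python
import time

def with_retries(fn, max_attempts=4, base_delay=1.0, retriable=(Exception,)):
    """Call fn(); on a retriable failure, back off exponentially and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Usage: with_retries(lambda: client.chat.completions.create(...))
```

In practice you'd narrow `retriable` to rate-limit and timeout errors so genuine bugs fail fast.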
If you DO need a framework, Instructor is excellent for structured output.
4. Monitoring: Langfuse
Open source, self-hostable, tracks everything you need: token usage, latency, cost, output quality. It's what most startups I've talked to are using.
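If you want a feel for what these tools record before adopting one, the core of cost monitoring is just per-call accounting. A minimal sketch — the model name and per-million-token prices below are placeholders, not real rates:

```python
from dataclasses import dataclass, field

# Placeholder prices per million tokens -- check your provider's pricing page.
PRICES = {"example-model": {"input": 0.50, "output": 1.50}}

@dataclass
class UsageTracker:
    total_cost: float = 0.0
    calls: list = field(default_factory=list)

    def record(self, model, input_tokens, output_tokens, latency_s):
        """Log one LLM call and return its dollar cost."""
        p = PRICES[model]
        cost = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
        self.calls.append({"model": model, "cost": cost, "latency_s": latency_s})
        self.total_cost += cost
        return cost
```

A real tool adds tracing, dashboards, and quality scores on top, but this is the data model underneath.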
5. Testing: Promptfoo
Write test cases for your prompts like you write unit tests for code. Catch regressions before they hit production.
```yaml
prompts:
  - "Classify this support ticket: {{input}}"
tests:
  - vars:
      input: "My account is locked"
    assert:
      - type: contains
        value: "account"
```
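The same regression-test idea works in plain Python if you want to try it before adopting a tool. Here the model is stubbed out — swap `fake_model` for a real API call:

```python
def fake_model(prompt):
    # Stand-in for a real LLM call, so the example is self-contained.
    return "This looks like an account lockout issue."

def run_prompt_tests(model, cases):
    """Each case is (ticket_text, substring the output must contain)."""
    failures = []
    for text, expected in cases:
        output = model(f"Classify this support ticket: {text}")
        if expected.lower() not in output.lower():
            failures.append((text, expected, output))
    return failures

failures = run_prompt_tests(fake_model, [("My account is locked", "account")])
```

Run it in CI and fail the build when `failures` is non-empty — same idea, no framework.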
The Full List
I put together a comprehensive list with 50+ tools across all categories: Awesome LLM Tools 2026
It covers inference engines, quantization tools, RAG frameworks, vector databases, prompt engineering, monitoring, fine-tuning, deployment platforms, and cost optimization strategies.
What's Missing?
What tools are you using that I should add to the list? Especially interested in:
- Evaluation frameworks
- Cost optimization tools
- Self-hosting solutions
More resources:
- Awesome LLM Tools — the full curated list
- The Real Cost of Running an LLM in Production
- Free API Directory — 100+ free APIs