Alex Spinov

Posted on Mar 25 • Edited on Mar 27

Every Tool You Need to Build an LLM App in 2026 (One List)

#ai #webdev #discuss #machinelearning

I spent the last 2 weeks compiling every production-ready LLM tool I could find. Not research papers. Not demos. Tools you can actually deploy today.

The result: a curated list organized by what you actually need to build.

The Stack

Here's the minimal stack for a production LLM application:

1. Inference    → How you run the model
2. Vector DB    → How you store embeddings (for RAG)
3. Framework    → How you orchestrate prompts and chains
4. Monitoring   → How you track quality and costs
5. Testing      → How you ensure output quality

Let me break down the best tool in each category.

1. Inference: vLLM (self-hosted) or Groq (API)

Self-hosted: vLLM gives you some of the highest throughput available for open models, largely thanks to continuous batching and PagedAttention. Pair it with a quantized Llama 3 model and you get enterprise-grade inference for the price of a rented GPU.

API: Groq has the fastest inference speeds I've seen: tokens come back nearly instantly. Their free tier is generous enough for prototyping.
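To make "GPU rental cost" concrete for the self-hosted option, here is a rough back-of-the-envelope sketch. The function name and the example numbers are my own assumptions, not measurements: plug in the hourly price you actually pay and the sustained throughput you actually observe.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Estimate USD per 1M generated tokens for a fully utilized rented GPU.

    Both inputs are assumptions you must supply yourself:
    - gpu_hourly_usd: what your provider charges per GPU hour
    - tokens_per_second: sustained throughput you measured under load
    """
    if gpu_hourly_usd < 0 or tokens_per_second <= 0:
        raise ValueError("price must be >= 0 and throughput must be > 0")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000
```

For example, a hypothetical $2.00/hour GPU sustaining 1,000 tokens per second works out to roughly $0.56 per million tokens. Real utilization is rarely 100%, so treat the result as a lower bound.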

2. Vector DB: pgvector (if you already use Postgres)

Don't add another database to your stack unless you need to. If you're on Postgres, pgvector handles most RAG workloads. It won't beat a dedicated service like Pinecone at billion-vector scale, but for the vast majority of apps it's fine.

-- That's literally it
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
  id serial PRIMARY KEY,
  content text,
  embedding vector(1536)  -- dimension must match your embedding model
);

-- Similarity search: <=> is cosine distance, so smaller means more similar
SELECT content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;
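If you want to see what that query computes, here is a minimal brute-force sketch in plain Python. It is not how pgvector is implemented; it only mirrors the ordering that `<=>` (cosine distance) produces. The function names and the tiny 2-dimensional vectors are made up for illustration, and it assumes no vector is all zeros.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, the quantity ordered by in the query above."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    k: int = 5,
) -> List[str]:
    """Brute-force equivalent of ORDER BY embedding <=> query LIMIT k."""
    ranked = sorted(docs, key=lambda doc: cosine_distance(query, doc[1]))
    return [content for content, _ in ranked[:k]]
```

This scans every row, which is also what Postgres does until you add an index. That is fine for small tables and a good reason to measure before reaching for anything fancier.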

3. Framework: Just use the API directly

Controversial take: you probably don't need LangChain. For most applications, the OpenAI/Anthropic SDK + a simple retry wrapper is enough. LangChain adds complexity that makes debugging harder.
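For reference, the "simple retry wrapper" can be about this small. This is a sketch under my own assumptions: `with_retries` is a name I made up, it retries on any exception by default (in practice you would narrow `retry_on` to your SDK's rate-limit and timeout errors), and it takes a zero-argument callable so it stays independent of whichever SDK you use.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying with exponential backoff and jitter on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error unchanged
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries from clients that failed together.
            sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable")
```

Usage is a lambda around whatever SDK call you already make, e.g. `with_retries(lambda: my_llm_call(prompt))`, where `my_llm_call` stands in for your own function.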

If you DO need a framework, Instructor is excellent for structured output.
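To show what "structured output" buys you, here is a standard-library sketch of the underlying idea: parse the model's JSON and reject anything that doesn't match the shape you expect. This is not Instructor's API. The `TicketLabel` type, the category set, and the field names are all hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical label set for a support-ticket classifier.
ALLOWED_CATEGORIES = {"account", "billing", "technical", "other"}


@dataclass(frozen=True)
class TicketLabel:
    category: str
    confidence: float


def parse_ticket_label(raw: str) -> TicketLabel:
    """Parse model output into a TicketLabel, raising ValueError if it is malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    category = data.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {category!r}")
    confidence = data.get("confidence")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return TicketLabel(category=category, confidence=float(confidence))
```

A library like Instructor handles this validation for you and can re-ask the model when it fails, which is where it starts to earn its place.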

4. Monitoring: Langfuse

Open source, self-hostable, tracks everything you need: token usage, latency, cost, output quality. It's what most startups I've talked to are using.
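If you're wondering what gets recorded per call, here is a rough standard-library sketch of the core data: tokens, latency, and estimated cost. It is not Langfuse's SDK. The prices are hypothetical placeholders, and it assumes your model call returns its text along with input and output token counts.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical USD prices per 1M tokens; replace with your provider's real rates.
PRICE_PER_MILLION = {"input": 0.50, "output": 1.50}


@dataclass
class CallRecord:
    feature: str          # which part of your app made the call
    input_tokens: int
    output_tokens: int
    latency_s: float
    cost_usd: float


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call from its token counts."""
    total = (
        input_tokens * PRICE_PER_MILLION["input"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


def tracked_call(
    feature: str,
    call: Callable[[], Tuple[str, int, int]],
) -> Tuple[str, CallRecord]:
    """Run call() and return its text plus a record of latency, tokens, and cost.

    call must return (text, input_tokens, output_tokens).
    """
    start = time.perf_counter()
    text, input_tokens, output_tokens = call()
    latency = time.perf_counter() - start
    record = CallRecord(
        feature=feature,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        latency_s=latency,
        cost_usd=estimate_cost(input_tokens, output_tokens),
    )
    return text, record
```

A real monitoring tool adds storage, dashboards, and tracing across multi-step calls on top of this, which is the part you don't want to build yourself.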

5. Testing: Promptfoo

Write test cases for your prompts like you write unit tests for code. Catch regressions before they hit production.

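# promptfooconfig.yaml
prompts:
  - "Classify this support ticket: {{input}}"

# Swap in whichever provider you actually use
providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      input: "My account is locked"
    assert:
      # icontains is the case-insensitive variant of contains
      - type: icontains
        value: "account"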
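To make the unit-test analogy concrete, here is the same kind of check as a tiny hand-rolled harness. It is a sketch of the idea, not how Promptfoo works internally: `run_tests` and `fake_model` are hypothetical names, and the fake model stands in for a real API call so the example runs offline. The substring check ignores case.

```python
from typing import Callable, Dict, List

PROMPT_TEMPLATE = "Classify this support ticket: {input}"

TEST_CASES: List[Dict[str, str]] = [
    {"input": "My account is locked", "contains": "account"},
]


def run_tests(model: Callable[[str], str], cases: List[Dict[str, str]]) -> List[str]:
    """Run each case through the model and return a list of failure messages."""
    failures = []
    for case in cases:
        prompt = PROMPT_TEMPLATE.format(input=case["input"])
        output = model(prompt)
        # Case-insensitive substring assertion on the model output.
        if case["contains"].lower() not in output.lower():
            failures.append(
                f"input {case['input']!r}: expected {case['contains']!r} in {output!r}"
            )
    return failures


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs without network access."""
    return "Category: Account access"
```

An empty list from `run_tests(fake_model, TEST_CASES)` means every case passed. Wire it into CI and a prompt change that breaks classification fails the build.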

The Full List

I put together a comprehensive list with 50+ tools across all categories: Awesome LLM Tools 2026

It covers inference engines, quantization tools, RAG frameworks, vector databases, prompt engineering, monitoring, fine-tuning, deployment platforms, and cost optimization strategies.

What's Missing?

What tools are you using that I should add to the list? Especially interested in:

Evaluation frameworks
Cost optimization tools
Self-hosting solutions

More resources:

Awesome LLM Tools — the full curated list
The Real Cost of Running an LLM in Production
Free API Directory — 100+ free APIs

Top comments (1)

Max Quimby • Mar 31

Comprehensive list — bookmarking this as a reference. A few thoughts from building LLM apps in production:

The trap I see most teams fall into is treating this like a "pick one from each category" problem. In practice, your choices cascade. Pick vLLM for inference and you're probably using safetensors weights, which means your quantization options differ from teams on Ollama with GGUF. Pick LangChain for orchestration and your observability needs differ from teams using raw API calls with structured outputs.

Two categories I'd add that are becoming critical in 2026:

Cost attribution — when you're running multiple models across multiple providers, knowing which feature/user/team is responsible for what spend is harder than it sounds. Tools like Helicone and Portkey's analytics are becoming essential, not optional.

Prompt versioning — once you have more than 3-4 prompts in production, you need version control beyond "it's in git." Knowing which prompt version generated which outputs for debugging regressions is something that'll save you at 2am.

What's been the most surprising tool in your stack — something that wasn't on your radar initially but became essential?