DEV Community

# llm

Posts

TOON Benchmarks: A Critical Analysis of Different Results

2
Comments 1
7 min read
Prompt Caching Slashed My AI Bills by 90%. Here's What Nobody Tells You.

Comments
5 min read
AI-Powered Resume & Job Description Matching with RAG

Comments
1 min read
LLPY-14: Evaluation and Quality Metrics - Measuring RAG Success

Comments
12 min read
Agent Optimization: Why Context Engineering Isn’t Enough

Comments
5 min read
Train it or feed it? Teaching LLMs your data the smart way

Comments
4 min read
Understanding RAG: How AI Models Learn to Search Before They Speak

1
Comments
3 min read
🧑‍🚀 Choosing the Right Engine to Launch Your LLM (LM Studio, Ollama, and vLLM)

Comments 3
3 min read
AI Security Tools Find Critical curl Vulnerabilities

Comments
9 min read
Why Claude Code's Unix Philosophy Beats Other AI Assistants

Comments
8 min read
AutoAgents – a Rust-Based Multi-Agent Framework for LLM-Powered Intelligence

7
Comments
1 min read
Step-by-Step: Manual vLLM Setup on Google Cloud L4 (Debian)

Comments
2 min read
🧩 Runtime Snapshots #3 — QA That Speaks JSON

4
Comments
1 min read
Gemini 2.5 Flash-Lite: Speed > Scale — 887 TPS, 50% Less Verbosity, Real-World Wins

Comments
1 min read
About context and LLM

2
Comments
7 min read
How to Build Developer Trust in AI‑Powered Code Generation Through Data‑Driven Feedback and Evaluation

1
Comments 1
8 min read
How to Improve Cross-Lingual Retrieval Accuracy in Bilingual RAG Chatbots

1
Comments 1
9 min read
Granite 4: IBM introduces a line of small but fast LLMs

Comments
2 min read
OpenAI's SORA 2 Release Pattern: What It Means for AI Video

Comments
9 min read
The RAG Debugging Playbook: A Step-by-Step Guide to Trace-Level Failures and Fixes

Comments
10 min read
Synthetic Data for RAG: Safe Generation, Deduplication, and Drift-Aware Curation in 2025

2
Comments
10 min read
Why We Need AI Observability

1
Comments 1
9 min read
AI Browsers and Prompt Injection: The New Cybersecurity Frontier

3
Comments 5
6 min read
🚀 TOON (Token-Oriented Object Notation) — The Smarter, Lighter JSON for LLMs

40
Comments 11
3 min read
LLMs: Decoding the Geometry of Alignment

Comments
2 min read