DEV Community

# llm

Posts

The Most Dangerous Bias of Your AI Assistant Is That It Agrees With You

Transcript history as a reward signal

8
Comments 13
6 min read
When I started running models locally, I thought quantization meant squeezing more into RAM. Turns o

Comments 1
1 min read
Building Lookspan: local-first observability & replay for LLM apps (v0.4.0)

Comments
2 min read
MachinaCheck: Manufacturing Agents That Actually Ship

Comments
2 min read
Lost in the Middle: Why LLMs Quietly Ignore the Centre of Their Own Context Window

Comments
3 min read
Stop AI Hallucinations: How to Make Natural Language Testing Real with "Harness Engineering"

Comments 1
9 min read
Compass v1.1.0 · we shipped a memory plugin that catches its own consumption drift

Comments
5 min read
We Built a 'Grovel Index' to Measure LLM Sycophancy —Here's What We Found

2
Comments 3
5 min read
LangChain vs LangGraph: Why AI Agents Need Stateful Orchestration

Comments 1
4 min read
Why “Local Document AI” Is Really an OCR + RAG + Local Inference Problem

5
Comments
4 min read
Prompt Engineering Is Table Stakes. Context Engineering Is the Next Frontier.

1
Comments
3 min read
Two Pre-Registered Benchmarks for Audit-Native RAG: RAB (EU AI Act 10/12/19) + LRB (Time-Travel Retrieval)

1
Comments
3 min read
OWASP Top 10 for LLMs: A Practitioner’s Implementation Guide

Comments
9 min read
What is an LLM evaluation harness? A deep dive into lm-eval-harness

1
Comments
7 min read
DeepClaude Merges Two AI Models Into One Agent Loop

Comments
6 min read