DEV Community

# llm

Posts

KVQuant: Run 70B LLMs on 8GB RAM with 4-bit KV Cache Quantization

Comments
1 min read
Securing Agentic Workflows: A Deterministic 'Human-in-the-Loop' Pattern for LLMs

Comments
5 min read
I just wanted to chat with my Raspberry Pi.

Comments
9 min read
Introducing KORA: Open-Source AI Orchestration for Task Graphs

2
Comments 1
2 min read
Fix Your Prompt Structure Before You Touch Your Infrastructure

Comments
4 min read
The AI Tasks Developers Trust And the Ones They Double-Check

Comments
11 min read
27/30 Days System Design Questions!

1
Comments 4
2 min read
I measured MCP vs a CLI for agent search. The MCP used 17x more tokens per call.

11
Comments 2
6 min read
Why File-to-Markdown Conversion Is Becoming an AI Input Layer

Comments 1
7 min read
Function-calling eval was a 2024 problem. Tool-using agents are the 2026 one.

1
Comments
5 min read
Running 35B–400B LLMs on a GPU-less Cluster to Mine 10,000 Papers — and the 4 Bugs That Almost Ruined the Data

1
Comments
9 min read
I Compressed GPT-2 to Run on an Arduino

Comments
1 min read
Agent Series (11): A2A Protocol — How Agents Collaborate with Each Other

Comments
5 min read
TurboQuant on a MacBook Pro, part 2: perplexity, KL divergence, and asymmetric K/V on M5 Max

Comments
8 min read
Why I'm Building a Local-First AI Coding Workspace (And How Behavioral Routing Makes It Work)

Comments
6 min read