DEV Community

# llm

Posts

GPU autoscaling on Kubernetes with KEDA: building an external scaler with NVML

Comments
3 min read
Why Your AI Agent Keeps Overreaching — And How to Fix It with a Boundary Contract

Comments
4 min read
New `llama.cpp` Updates, AI Agents for Any LLM, and Quantized Vector Index for Local Inference

Comments
3 min read
I wired 908 creator dossiers into my Substack commenter. Here is what changed.

Comments
3 min read
Token Consumption Optimization in LLM Applications

1
Comments
2 min read
The 20% of your AI agent's tool schemas that's pure cruft (and the one-liner to strip it)

Comments
2 min read
We stopped Googling and started Prompting

Comments
4 min read
Compass v1.1.0 · we shipped a memory plugin that catches its own consumption drift

Comments
5 min read
Running Local LLMs Without Burning Out Your GPU

Comments
3 min read
From Code Completion to Autonomous Reasoning: What the Oceanus Leak Tells Us About the Future of AI Software Engineering

Comments
7 min read
Building a Practical AI Assistant with Python: From Prompt to Production Thinking

3
Comments 2
3 min read
How to Handle LLM API Errors & Rate Limits in Node.js

Comments
4 min read
I measured the token cost of 13 real AI agents (GitHub's MCP server alone is 3,546 tokens/turn)

Comments 1
2 min read
AIchain Reasoning: One Parameter for Every Provider

Comments
5 min read
MarginGate: Margin-Gated Verification for Batch-Invariant Decoding

Comments
5 min read