DEV Community

# llm

Posts

Running OpenAI's gpt-oss-20b with 128k Context on a Single L4 GPU

13 min read
Your AI speed benchmark is measuring the one workload you don't run

3 min read
Your Tech Stack Has an AI Problem: How to Audit and Fix It in 2026

8 min read
RAG Series (21): Performance Optimization — Faster and Cheaper

7 min read
How the itrstats tax assistant works: one query, every layer

10 min read
The Shai-Hulud Worm Is Now Open Source — Here's How to Stop Self-Replicating Prompts Before They Reach Your LLM

5 min read
The Hype Correction

4 min read
Building llama.cpp from source on a Dell Precision T5820 with an RTX 3090 Ti (after seven power cycles)

16 min read
The LLM Kept Saying “Fixed.” For Three Months, It Wasn’t.

7 min read
Inference Arbitrage: How I Route 200+ Daily LLM Calls Across Five Models

10 min read
Three Months of Speed-Up Experiments on a 3090 Ti: Autoregressive DFlash MTP for Qwen3.6-27B

18 min read
LLM Benchmark Rankings 2026: 15 Models Tested on 38 Real Coding Tasks

28 min read
How I Track Claude, Codex, and Gemini Quotas from One Script

6 min read
Designing a Multi-Agent AI System for Content Analysis and Recommendations

7 min read
A Practical Model Selection Matrix for Multi-Model AI Apps

2 min read