DEV Community

# llm

Posts

Why RAG is the Must-Have AI Skill in 2026: 11 Types Explained!

Comments
14 min read
You've Been Breached for 3 Weeks. Your SIEM Has No Idea.

2
Comments
6 min read
Stream LLM responses in a voice pipeline: Tool calling, structured outputs, and real-time actions

Comments
10 min read
GPT-5.5 Pro vs. Instant: Is It Worth It When the Price Differs 6x?

Comments
6 min read
AI Agents vs. Traditional Automation: When to Use Each

Comments
13 min read
The Softmax Bottleneck: Why Making LLMs Bigger Doesn't Always Make Them Smarter

1
Comments
4 min read
Counterintuitive: WSL2 + vllm cannot fit Qwen2.5-7B-1M on 6GB VRAM where Windows transformers can

Comments
2 min read
Why I Chose Free AI Models Over GPT-4 for Code Generation (And What Happened)

1
Comments
5 min read
Model Showdown Round 4: Opus vs Qwen — Writers, Not Coders

Comments
10 min read
SubQ Model: Can Subquadratic Make Long-Context AI More Efficient?

1
Comments
9 min read
🔬 Direction 1 closure on JAMES — when the hypothesis fails but the data turns "7-tier monotonic natural-stop gradient"

Gemma 4 Challenge: Write about Gemma 4 Submission

1
Comments
2 min read
TokenSpeed and the Quiet Race to Make LLM Inference Boring

1
Comments 1
5 min read
ExLlamaV3 Updates, Unsloth Qwen GGUFs & Phi3 Autonomous Bridge

Comments 1
3 min read
Part 8 — Token-by-Token: Why AI Generates Text One Word at a Time (And Why It Costs 4x More)

Comments 1
9 min read
How Large Language Models Work — From Transformers to Conversational AI

Comments
4 min read