# gpu

Posts

- Why I Self-Host 7 RTX 5090 GPUs Instead of Using Cloud AI (6 min read)
- Hopper/Blackwell Tensor Core Optimization, llama.cpp VRAM Fix & 4W NPU Inference (3 min read)
- Why I Self-Host 7 RTX 5090 GPUs Instead of Using AWS (6 min read)
- 8-Bit Quantization Destroyed 92% of Code Generation — The Culprit Wasn't Bit Count (5 min read)
- I Couldn't Build a Local LLM PC for $1,300 — Budget Tiers and the VRAM Cliffs Between Them (6 min read)
- From one model to seven — what it took to make TurboQuant model-portable (3 min read)
- GPU Power Tools & CUDA Deep Dives for Local LLM Builders (3 min read)
- Parameter Count Is the Worst Way to Pick a Model on 8GB VRAM (5 min read)
- How Much GPU Memory Does Your LLM Actually Need? (2 min read)
- I Couldn’t Debug My AI/ML GPU Incident - So I Built gpuxray (3 min read)
- What do you want to know about hardware acceleration? Ask the Google team! (1 min read, 8 reactions, 1 comment)
- MoE Beat Dense 27B by 2.4x on 8GB VRAM — The 35B-A3B Benchmark Nobody Expected (5 min read)
- The Memory Bandwidth Gap Is 49x and Growing — Why Local LLMs Hit a Ceiling (7 min read)
- PyRadiomics Inefficiency in Large-Scale Studies Addressed by GPU Acceleration for Faster Processing (8 min read)
- I Tested TurboQuant KV Cache Compression on Consumer GPUs. Here's What Actually Happened. (6 min read)