DEV Community

plasmon

Qiita_Blog

Ollama, LM Studio, and GPT4All Are All Just llama.cpp — Here's Why Performance Still Differs

6 min read

99.8% of LLM Inference Power Isn't Spent on Computation

7 min read
Q4 KV Cache Fit 32K Context into 8GB VRAM — Only Math Broke

8 min read
HBM4 Didn't Break the Memory Wall — It Just Moved It

6 min read
Running Just One LLM on 8GB VRAM Is a Waste

8 min read
Light Just Cut KV Cache Memory Traffic to 1/16th

7 min read
Even in Tool Calling, the Bigger Model Didn't Win

4 min read
Measuring RAG Retrieval Accuracy on 3 Axes: The Optimal Setup Changed Completely with the Conditions

3 min read
They Routed Power Through the Back of the Chip and 30% IR Drop Vanished

6 min read
Letting AI Control RAG Search Improved Accuracy by 79%

6 min read
If Memory Could Compute, Would We Still Need GPUs?

6 min read
I Couldn't Build a Local LLM PC for $1,300 — Budget Tiers and the VRAM Cliffs Between Them

6 min read
8-Bit Quantization Destroyed 92% of Code Generation — The Culprit Wasn't Bit Count

5 min read
The Recursive Loop Has Started: AI Is Now Designing AI Chips

7 min read
ML Hit 99% Accuracy on Yield Prediction — The Factory Floor Ignored It

8 min read
3 Classifiers, 3 Answers: Why CoT Faithfulness Scores Are Meaningless

6 min read
Parameter Count Is the Worst Way to Pick a Model on 8GB VRAM

5 min read
The Memory Bandwidth Gap Is 49x and Growing — Why Local LLMs Hit a Ceiling

7 min read
MoE Beat Dense 27B by 2.4x on 8GB VRAM — The 35B-A3B Benchmark Nobody Expected

5 min read
I Designed a Memory System for Claude Code — 'Forgetting' Was the Hardest Part

6 min read
80% of LLM 'Thinking' Is a Lie — What CoT Faithfulness Research Actually Shows

7 min read
Can Spiking Neural Networks Kill the GPU? 3 Papers Show the Reality

6 min read
I Let Claude Code Run My Tech Blog. A Fake Article Passed Every Quality Check.

10 min read
How Many Nanometers Until Physics Says No? The 3 Walls Beyond 2nm, Read Through Papers in 2026

14 min read
Still Picking API vs Local LLM by Gut Feeling? A Framework With Real Benchmarks

6 min read
I Tried Speculative Decoding on RTX 4060 8GB — Every Config Was Slower Than Baseline

8 min read
Stop Letting AI Be Nice — LLM Sycophancy Mode Is Killing Your Engineering Thinking

6 min read
95% of LLM Inference Energy Is Wasted on Data Movement — Why Optical Interconnects (CPO) Can't Fix It

7 min read
3D Chip Stacking Hits a 200μm Warpage Wall — Why Your Next GPU Memory Might Crack

11 min read
I Pitted 3 Qwen3.5 Models Against Each Other on an RTX 4060 8GB — What Spec Sheets Don't Tell You

8 min read
What Happens When You Bring LLMs Into a Semiconductor FAB — 5 ArXiv Papers, Brutally Honest Reviews

9 min read
I Built a Fully Local Paper RAG on an RTX 4060 8GB — BGE-M3 + Qwen2.5-32B + ChromaDB

10 min read
Running Qwen2.5-32B on RTX 4060 8GB — Beating M4 at 10.8 t/s with llama.cpp

7 min read