Distributed LLM Series' Articles
Demystifying GPUs: From Core Architecture to Scalable Systems
Lewis Won · Jul 20 '25 · 12 min read
#nvidia #gpu #architecture #cuda
83 reactions · 2 comments
From Scatter to All-Reduce: A Plain-English Guide to Collective Operations
Lewis Won · Jul 25 '25 · 21 min read
#programming #distributedsystems
13 reactions
ZeRO by hand with a 4-parameter model
Lewis Won · Aug 1 '25 · 23 min read
#distributedsystems #llm #machinelearning #ai
1 reaction · 1 comment
Tensor parallelism by hand
Lewis Won · Aug 23 '25 · 28 min read
#machinelearning #pytorch #distributedsystems #llm
14 reactions