
Jay

DO YOU KNOW HOW BIG GPT-4 IS?

1.8 TRILLION parameters across 120 layers, making it 10 times larger than GPT-3!
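A quick back-of-the-envelope check of that 10x claim (a minimal sketch; the 175B figure is GPT-3's published parameter count, the 1.8T figure is the reported one above):

```python
# Rough sanity check: how much bigger is 1.8T parameters than GPT-3's 175B?
gpt4_params = 1.8e12   # reported GPT-4 total parameter count
gpt3_params = 175e9    # GPT-3's published parameter count

print(f"GPT-4 is ~{gpt4_params / gpt3_params:.1f}x the size of GPT-3")
# GPT-4 is ~10.3x the size of GPT-3
```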

16 EXPERTS within the model (a mixture-of-experts design), each with 111 BILLION parameters for its MLP!
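To give a feel for what a mixture-of-experts layer does, here is a minimal top-2 routing sketch in PyTorch. This is an illustrative toy, not GPT-4's actual implementation: the tiny dimensions, the top-2 routing, and every class and variable name here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy mixture-of-experts MLP: a router picks top-k experts per token.

    Illustrative only -- tiny dimensions, not GPT-4's real architecture.
    """
    def __init__(self, d_model=64, d_hidden=256, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):             # only the top-k experts run per token
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

moe = ToyMoELayer()
tokens = torch.randn(8, 64)
print(moe(tokens).shape)  # torch.Size([8, 64])
```

Note how the arithmetic lines up: 16 experts x 111B MLP parameters is about 1.78T, most of the 1.8T total, yet with top-k routing only a couple of experts actually run for each token.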

13 TRILLION tokens of training data, including text-based and code-based data, plus fine-tuning data from ScaleAI and internal sources!

$63 MILLION in training costs, taking into account computational power and training time!
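For intuition on where a number like $63M can come from, here is a back-of-the-envelope sketch. The GPU count, training duration, and hourly rate below are illustrative assumptions, not confirmed figures:

```python
# Hypothetical training cost estimate:
# cost ~= num_gpus * hours * price_per_gpu_hour (all three values are assumptions)
num_gpus = 25_000          # assumed A100 cluster size
days = 100                 # assumed training duration
price_per_gpu_hour = 1.05  # assumed $/GPU-hour at scale

cost = num_gpus * days * 24 * price_per_gpu_hour
print(f"~${cost / 1e6:.0f}M")  # ~$63M
```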

3 TIMES MORE expensive to run than the 175B-parameter Davinci, due to the larger clusters required and lower utilization rates!

128 GPUs for inference, using 8-way tensor parallelism and 16-way pipeline parallelism!
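To see how 8-way tensor times 16-way pipeline parallelism accounts for 128 GPUs, here is a small sketch of a rank decomposition. The layout convention below is an assumption; real frameworks such as Megatron-LM define their own orderings:

```python
TENSOR_PARALLEL = 8     # shards each layer's matrices across 8 GPUs
PIPELINE_PARALLEL = 16  # splits the layer stack into 16 sequential stages

world_size = TENSOR_PARALLEL * PIPELINE_PARALLEL
print(world_size)  # 128 GPUs total

def gpu_coords(rank):
    """Map a global GPU rank to (pipeline_stage, tensor_shard).

    Assumed layout: consecutive ranks share a pipeline stage.
    """
    return divmod(rank, TENSOR_PARALLEL)

print(gpu_coords(0))    # (0, 0)  first stage, first shard
print(gpu_coords(127))  # (15, 7) last stage, last shard
```

One wrinkle worth noticing: 120 layers over 16 stages is 7.5 layers per stage on average, so the pipeline stages cannot all hold the same whole number of layers.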

A VISION ENCODER so autonomous agents can read web pages and transcribe images and videos, adding more parameters and fine-tuned with another 2 TRILLION tokens!

And, get this... GPT-5 might have 10 TIMES THE PARAMETERS of GPT-4! That could mean even larger embedding dimensions, more layers, and double the number of experts!
