
Jay

DO YOU KNOW HOW BIG GPT-4 IS?

1.8 TRILLION parameters across 120 layers, making it 10 times larger than GPT-3!
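A quick back-of-the-envelope check of that 10x claim (a minimal sketch; the 175B figure is GPT-3's published parameter count, the 1.8T figure is the reported one above):

```python
# Rough sanity check: how much bigger is 1.8T parameters than GPT-3's 175B?
gpt4_params = 1.8e12   # reported GPT-4 total parameter count
gpt3_params = 175e9    # GPT-3's published parameter count

print(f"GPT-4 is ~{gpt4_params / gpt3_params:.1f}x the size of GPT-3")
# GPT-4 is ~10.3x the size of GPT-3
```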

16 EXPERTS within the model (a mixture-of-experts design), each with 111 BILLION parameters for its MLP!
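To give a feel for what a mixture-of-experts layer does, here is a minimal top-2 routing sketch in PyTorch. This is an illustrative toy, not GPT-4's actual implementation: the tiny dimensions, the top-2 routing, and every class and variable name here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy mixture-of-experts MLP: a router picks top-k experts per token.

    Illustrative only -- tiny dimensions, not GPT-4's real architecture.
    """
    def __init__(self, d_model=64, d_hidden=256, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):             # only the top-k experts run per token
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

moe = ToyMoELayer()
tokens = torch.randn(8, 64)
print(moe(tokens).shape)  # torch.Size([8, 64])
```

Note how the arithmetic lines up: 16 experts x 111B MLP parameters is about 1.78T, most of the 1.8T total, yet with top-k routing only a couple of experts actually run for each token.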

13 TRILLION tokens of training data, including text-based and code-based data, plus fine-tuning data from ScaleAI and internal sources!

$63 MILLION in training costs, taking into account computational power and training time!
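For intuition on where a number like $63M can come from, here is a back-of-the-envelope sketch. The GPU count, training duration, and hourly rate below are illustrative assumptions, not confirmed figures:

```python
# Hypothetical training cost estimate:
# cost ~= num_gpus * hours * price_per_gpu_hour (all three values are assumptions)
num_gpus = 25_000          # assumed A100 cluster size
days = 100                 # assumed training duration
price_per_gpu_hour = 1.05  # assumed $/GPU-hour at scale

cost = num_gpus * days * 24 * price_per_gpu_hour
print(f"~${cost / 1e6:.0f}M")  # ~$63M
```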

3 TIMES MORE expensive to run than the 175B-parameter Davinci, due to the larger clusters required and lower utilization rates!

128 GPUs for inference, using 8-way tensor parallelism and 16-way pipeline parallelism!
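To see how 8-way tensor times 16-way pipeline parallelism accounts for 128 GPUs, here is a small sketch of a rank decomposition. The layout convention below is an assumption; real frameworks such as Megatron-LM define their own orderings:

```python
TENSOR_PARALLEL = 8     # shards each layer's matrices across 8 GPUs
PIPELINE_PARALLEL = 16  # splits the layer stack into 16 sequential stages

world_size = TENSOR_PARALLEL * PIPELINE_PARALLEL
print(world_size)  # 128 GPUs total

def gpu_coords(rank):
    """Map a global GPU rank to (pipeline_stage, tensor_shard).

    Assumed layout: consecutive ranks share a pipeline stage.
    """
    return divmod(rank, TENSOR_PARALLEL)

print(gpu_coords(0))    # (0, 0)  first stage, first shard
print(gpu_coords(127))  # (15, 7) last stage, last shard
```

One wrinkle worth noticing: 120 layers over 16 stages is 7.5 layers per stage on average, so the pipeline stages cannot all hold the same whole number of layers.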

A VISION ENCODER so autonomous agents can read web pages and transcribe images and videos, adding more parameters and fine-tuned with another 2 TRILLION tokens!

And, get this... GPT-5 might have 10 TIMES THE PARAMETERS of GPT-4! That could mean even larger embedding dimensions, more layers, and double the number of experts!
