DEV Community

Ben Halpern

Explain GPT-3 Like I'm Five

GPT-3 was recently released and has been trending in different circles.

First described in May 2020, Generative Pre-trained Transformer 3 (GPT-3) is an unsupervised transformer language model and the successor to GPT-2.[109][110][111] OpenAI stated that the full version of GPT-3 contains 175 billion parameters,[111] two orders of magnitude larger than the 1.5 billion parameters[112] in the full version of GPT-2 (although GPT-3 models with as few as 125 million parameters were also trained).[113]

OpenAI stated that GPT-3 succeeds at certain "meta-learning" tasks. It can generalize the purpose of a single input-output pair. The paper gives an example of translation and cross-linguistic transfer learning between English and Romanian, and between English and German.[111]

GPT-3 dramatically improved benchmark results over GPT-2. OpenAI cautioned that such scaling up of language models could be approaching or encountering the fundamental capability limitations of predictive language models.[114] Pre-training GPT-3…

Can anyone offer their most straightforward explanation for some of the concepts involved?

Top comments (9)

amananandrai • Edited

GPT-3 is a type of neural language model. It is used for solving various Natural Language Processing tasks like question answering, text summarisation, machine translation (basically conversion from one language to another), and many more. It is the most powerful language model built so far, which you can see from the sheer number of parameters it contains and the time required to train it. "Neural language model" means that it is a type of neural network. It is also a few-shot learning model, which means it needs very little data to perform a new task: for example, if it has been trained to translate from English to French, just a hundred or so sentences are enough to get it translating from English to German or some other language it hasn't seen.
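
To make "few-shot" concrete: the examples aren't used to retrain the model; they are simply pasted into the prompt, and the model continues the pattern. Here's a minimal sketch in plain Python (the example pairs and the `build_prompt` helper are made up for illustration):

```python
# Few-shot prompting: the "training data" is just a handful of
# examples placed in the prompt itself; the model's weights never change.

# Hypothetical example pairs for English -> German translation.
examples = [
    ("cheese", "Käse"),
    ("good morning", "guten Morgen"),
    ("thank you", "danke"),
]

def build_prompt(pairs, query):
    """Format the example pairs plus the new query as one prompt string."""
    lines = ["English: {}\nGerman: {}".format(en, de) for en, de in pairs]
    lines.append("English: {}\nGerman:".format(query))
    return "\n\n".join(lines)

print(build_prompt(examples, "see you tomorrow"))
# A model given this prompt is asked to continue the text after the
# final "German:", so it completes the pattern the examples establish.
```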

It is gaining great attention because it can generate text as well as code after some fine-tuning. It is able to generate code for working React apps, and it can write poems, stories, etc. from just a few sentences of input. That's the power of few-shot learning. OpenAI has released its API for developers to try it out and build new tools.

It is also based on Transformers, like other language models such as BERT, ERNIE, etc. It is pretrained on a very large dataset, and an unsupervised learning technique is used for training GPT-3. The concept of transfer learning, which was popular in image classification, is now being applied in the field of NLP.
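
GPT-3 itself is only available through OpenAI's API, but the same pretrained-Transformer idea can be tried locally with its predecessor. A sketch using the Hugging Face transformers library (assuming it is installed, e.g. `pip install transformers`):

```python
# Text generation with a pretrained Transformer (GPT-2 here, since
# GPT-3's weights were never released publicly).
from transformers import pipeline

# Downloads pretrained GPT-2 weights on first use.
generator = pipeline("text-generation", model="gpt2")

# The model continues the prompt token by token, using only what it
# learned during large-scale unsupervised pretraining.
result = generator("GPT-3 is a language model that", max_length=40)
print(result[0]["generated_text"])
```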

spiritupbro

Can it spin articles? Say we have an article from Medium, for example, and we want to generate a new article based on that article. Is that possible?

amananandrai

It just needs a little bit of a hint (a prompt). If you give it a prompt like "write an article on Albert Einstein" or "write an article on JavaScript", it will do that for you.
So yes, it can write an article for you based on a previous article.

Lou Franco

Imagine a 2D map of the United States. Now, imagine I have a function that takes a place name and provides the lat/long of that place on the map. So, I pass in "Miami, FL" and "Ft. Lauderdale, FL" and I get coordinates that are much closer together than those for "New York City" and "London". Using this function, I could implement functionality on top of it -- I could estimate driving time for nearby locations, for example. It would obviously break down for some places, but it would be good enough for some applications.

The goal of models like GPT-x is to define a coordinate system (with many, many dimensions) such that text mapping to similar concepts lands close together. Relative distance and direction correspond to relative closeness of meaning, and combinations seem to behave as they would on a physical map. It also seems to preserve "stylistic" attributes -- texts that are close in style have close coordinates.

Once you have a function that can produce a coordinate like this, you can build applications on top of it. Like the driving time application above, sometimes it works well, and sometimes it breaks down.
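
The "map" in this analogy is usually called an embedding space. A toy sketch of the distance idea with NumPy, using made-up 3-dimensional vectors (real models use hundreds or thousands of dimensions, and these numbers are invented purely for illustration):

```python
import numpy as np

# Hypothetical coordinates, hand-picked so that related places sit
# close together, the way a real model's learned coordinates would.
coords = {
    "Miami, FL":          np.array([0.90, 0.10, 0.20]),
    "Ft. Lauderdale, FL": np.array([0.88, 0.12, 0.25]),
    "London":             np.array([0.10, 0.90, 0.60]),
}

def similarity(a, b):
    """Cosine similarity: close to 1.0 means 'near' on the map."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(similarity(coords["Miami, FL"], coords["Ft. Lauderdale, FL"]))  # high
print(similarity(coords["Miami, FL"], coords["London"]))              # much lower
```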

Andreas Møller

You have never met a 5 year old have you?

amananandrai

Here's an example of the GPT-3 API being used to create code just from a description.
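
For anyone curious what such a call looks like, here is a sketch using the openai Python client from the 2020 beta (the prompt and parameters are placeholders, and you would need your own API key and access):

```python
# Sketch of the GPT-3 completion API: describe the code you want in
# plain English and let the model complete it.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; the beta required an invite

response = openai.Completion.create(
    engine="davinci",  # the largest GPT-3 engine at launch
    prompt="# A Python function that returns the n-th Fibonacci number\ndef",
    max_tokens=64,
    temperature=0,
)
print(response.choices[0].text)
```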

shadowtime2000

It's basically GPT-2, but larger, so it doesn't need as much training data for a new task.

mayank joshi

It's a machine learning model that works like magic.

spiritupbro

man i really need this thread bookmarked