
Explain GPT-3 Like I'm Five

Ben Halpern ・1 min read

GPT-3 was recently released and has been trending in different circles.

Generative Pretrained Transformer 3, commonly known by its abbreviated form GPT-3, is an unsupervised Transformer language model and the successor to GPT-2. It was first described in May 2020. OpenAI stated that the full version of GPT-3 contains 175 billion parameters, two orders of magnitude larger than the 1.5 billion parameters in the full version of GPT-2 (although GPT-3 models with as few as 125 million parameters were also trained).

OpenAI stated that GPT-3 succeeds at certain "meta-learning" tasks. It can generalize the purpose of a single input-output pair. The paper gives an example of translation and cross-linguistic transfer learning between English and Romanian, and between English and German.

GPT-3 dramatically improved benchmark results over GPT-2. OpenAI cautioned that such scaling up of language models could be approaching or encountering fundamental capability limitations. Pre-training GPT-3 required several thousand petaflop/s-days of compute, compared to tens of petaflop/s-days for the full GPT-2 model…
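To make the "two orders of magnitude" claim concrete, here is a quick back-of-the-envelope check using only the parameter counts quoted above. The variable names are just for illustration.

```python
import math

# Parameter counts as quoted in the post above.
gpt3_params = 175e9  # full GPT-3
gpt2_params = 1.5e9  # full GPT-2

# How many times larger is GPT-3, and how many powers of ten is that?
ratio = gpt3_params / gpt2_params
orders_of_magnitude = math.log10(ratio)

print(f"GPT-3 is about {ratio:.0f}x the size of GPT-2")
print(f"That is about {orders_of_magnitude:.1f} orders of magnitude")
```

So "two orders of magnitude" is a rounded way of saying "a bit more than 100 times as many parameters".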

Can anyone offer their most straightforward explanation for some of the concepts involved?

Discussion

 

GPT-3 is a type of neural language model, meaning it is a kind of neural network. It is used for solving various Natural Language Processing tasks such as question answering, text summarisation, machine translation (basically conversion from one language to another) and many more. At the time of writing it is the most powerful language model built, which can be seen from the number of parameters in it and the time required to train it. It is a few-shot learning model, which means that it needs very little data to perform a new task. For example, if it can already translate from English to French, it will need only about 100 example sentences, often far fewer, to start translating from English to German or another language.
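With GPT-3, "few-shot" usually means putting a handful of worked examples straight into the prompt and letting the model continue the pattern, without retraining it. Here is a minimal sketch of what such a prompt could look like. The function `build_few_shot_prompt` and the "English:" / "German:" layout are assumptions made up for illustration, not part of any official API, and no model is actually called here.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a plain-text few-shot prompt (hypothetical helper).

    examples is a list of (english, german) pairs shown to the model
    before the sentence we actually want translated.
    """
    lines = [instruction, ""]
    for english, german in examples:
        lines.append(f"English: {english}")
        lines.append(f"German: {german}")
        lines.append("")
    # The final entry is left open so the model fills in the answer.
    lines.append(f"English: {query}")
    lines.append("German:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to German.",
    [("Good morning", "Guten Morgen"), ("Thank you", "Danke")],
    "Good night",
)
print(prompt)
```

The resulting text would then be sent to the model, which is expected to complete the last "German:" line.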

It is gaining a lot of attention because it can generate text as well as code after some fine-tuning. It is able to generate code for working React apps. It is also able to write poems, stories, etc. when given just a few sentences. That's the power of few-shot learning. OpenAI has released its API for developers to try and build new tools.

It is also based on Transformers, like other language models such as BERT and ERNIE. It is pretrained on a very large dataset using unsupervised learning. The concept of transfer learning, which was popular in image classification, is now being applied in the field of NLP.

 

Can it spin an article? For example, if we have an article from Medium and we want to generate a new article based on it, is that possible?

 

It just needs a little bit of a hint (a prompt). If you give it a prompt like "write an article on Albert Einstein" or "write an article on JavaScript", it will do that for you.
So yes, it can write an article based on a previous article for you.

 

Imagine a 2D map of the world. Now, imagine I have a function that takes a place name and provides the lat/long of that place on the map. So, I pass "Miami, FL" and "Ft. Lauderdale, FL" and I get coordinates that are closer together than the ones for "New York City" and "London". And using this function I could implement functionality on top of it -- I could give an estimate of driving time for closer locations (for example) -- it would obviously break down for some places, but it would be good enough for some applications.

The goal of models like GPTx is to define a coordinate system (with many, many dimensions) such that pieces of text that map to similar concepts are closer together. Relative distance maps to relative closeness, and direction and combinations seem to behave as they would on a physical map. It also seems to preserve "stylistic" attributes -- texts that are close in style have close coordinates.

Once you have a function that can produce a coordinate like this, you can build applications on top of it. Like the driving time application above, sometimes it works well, and sometimes it breaks down.
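Here is a toy sketch of that idea. The coordinates below are made up by hand purely for illustration; a real model learns them from data and uses far more than three dimensions. The names `TOY_COORDINATES`, `distance` and `nearest` are hypothetical.

```python
import math

# Hand-written, made-up coordinates (an assumption for this sketch).
TOY_COORDINATES = {
    "kitten": (0.9, 0.1, 0.0),
    "cat": (0.8, 0.2, 0.1),
    "spreadsheet": (0.0, 0.9, 0.7),
}


def distance(a, b):
    """Straight-line distance between the coordinates of two words."""
    return math.dist(TOY_COORDINATES[a], TOY_COORDINATES[b])


def nearest(word):
    """Return the other known word whose coordinates are closest."""
    others = [w for w in TOY_COORDINATES if w != word]
    return min(others, key=lambda w: distance(word, w))


print(nearest("kitten"))
print(distance("kitten", "cat") < distance("kitten", "spreadsheet"))
```

An application "built on top" is then just code that asks questions about these distances, the same way the driving-time estimate uses distances on the map.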

 

You have never met a 5-year-old, have you?

 

Here's an example of the GPT-3 API being used to create code just from a description.

 

It's GPT-2, but larger, so it doesn't need as much task-specific training data.

 

It's a machine learning model that works like magic.

 

man i really need this thread bookmarked