Vikas_Brilworks

Understanding How Generative AI Works

The world has been fascinated by generative AI and its potential to change the way we work and live. In 2023, generative AI took center stage, moving from theory to practice.

Hundreds of apps powered by generative models now handle tasks across different industries, from e-commerce to media and entertainment. In large part, customer-facing applications such as ChatGPT have pushed AI into mainstream technology in a short span of time, as these chatbots possess an awe-inspiring ability to mimic human creativity and conversation.

These groundbreaking applications are all powered by generative AI, the branch of AI driving today's systems that can engage in remarkably human-like conversations.

Many of you have heard of generative AI, but do you know how it works? How can it understand our sentiments, emotions (even though it has none of its own), and context? In this article, we will learn how generative AI works.

What is generative AI?

Generative AI refers to technology that generates different kinds of content, from text and images to audio and video. For text, it works by predicting the next token in a sequence; image and video generation apply the same principle by predicting the next pixel or region.

A generative AI program can leverage different algorithms (or models) and techniques for generating content, although many of them share common building blocks.

Let’s take ChatGPT as an example. It leverages GPT models, short for generative pre-trained transformers. GPT models are built on an architecture called the transformer, one kind of neural network.

Neural networks power today’s artificial intelligence. When neural networks are designed and trained in distinctive ways, they get different names to distinguish them from other architectures.

Let’s take the example of CNNs (convolutional neural networks), introduced in the 1990s but widely recognized after 2012, which revolutionized computer vision. GANs (generative adversarial networks), introduced by Ian Goodfellow in 2014, transformed the field of generative AI. And transformers, introduced by Vaswani et al. in the seminal paper “Attention Is All You Need,” have pushed the boundaries of neural networks; they power today’s popular apps, such as Gemini and ChatGPT.

Neural networks, one of the generative AI terms you will often encounter, are the backbone of any model. Common types include CNNs, GANs, and transformers, as introduced above.

Let’s first understand how Generative AI works using a hypothetical case of generating handwritten digits. We’ll use a Generative Adversarial Network (GAN) as the example, which has two main components: a discriminator and a generator.

An example of generating handwritten digits with GANs
A GAN is a pair of neural networks: a generator that takes random noise as input and produces images, and a discriminator that tries to distinguish between real images from the dataset and the images produced by the generator.

The discriminator learns to classify real images as real (label = 1) and fake images as fake (label = 0).

The generator aims to improve its ability to trick the discriminator, while the discriminator becomes better at telling real from fake. This process continues until the discriminator can’t distinguish between real and generated images.
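To make this adversarial loop concrete, here is a minimal PyTorch sketch of the training step described above. It is an illustration rather than a working recipe: the layer sizes, learning rates, and the idea of feeding flattened 28x28 digit images are all simplifying assumptions, and a real setup would stream batches from a dataset such as MNIST.

```python
import torch
import torch.nn as nn

# Minimal GAN sketch for flattened 28x28 digit images.
# Layer sizes and learning rates are illustrative assumptions.
generator = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh(),      # outputs a fake image
)
discriminator = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),          # probability the image is real
)

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images):
    """One adversarial round. real_images: (batch, 784) tensor."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)    # real = 1
    fake_labels = torch.zeros(batch, 1)   # fake = 0

    # 1) Discriminator learns to classify real vs. generated images.
    fake_images = generator(torch.randn(batch, 100)).detach()
    d_loss = (loss_fn(discriminator(real_images), real_labels)
              + loss_fn(discriminator(fake_images), fake_labels))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Generator learns to make the discriminator answer "real".
    g_loss = loss_fn(discriminator(generator(torch.randn(batch, 100))), real_labels)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

Each call first sharpens the discriminator on a mix of real and fake images, then nudges the generator using the discriminator’s feedback, repeating the back-and-forth game described above.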

How Generative AI Works in Simple Words with Transformers
GPT (generative pre-trained transformer) and BERT (Bidirectional Encoder Representations from Transformers) are two well-known transformer-based models. Transformers are the backbone of many of today’s state-of-the-art generative AI tools.

In this example, we will look at how LLMs use transformers to generate content that seems original.

Let’s understand how an AI tool can create an article titled “Best Exercise Routines for Busy Professionals” by integrating information from documents about exercise, routines, and busy lifestyles.

Before the AI can process the text, it breaks it into smaller segments called “tokens.” Tokens are the smallest units of text a model works with; they can be as short as a single character or as long as a word.

For example, “Exercise is important daily” becomes [“Exercise,” “is,” “important,” “daily”].

This segmentation helps the model handle manageable chunks of text and comprehend sentence structures.
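As a rough sketch, a naive whitespace tokenizer reproduces the example above in a few lines of Python. Keep in mind this is a simplification: real LLM tokenizers (such as byte-pair encoding) split text into subword units, not whole words.

```python
sentence = "Exercise is important daily"

# Naive whitespace tokenization; real tokenizers use subword units.
tokens = sentence.split()
print(tokens)  # ['Exercise', 'is', 'important', 'daily']

# Models then map each token to an integer ID from a fixed vocabulary.
vocab = {token: idx for idx, token in enumerate(sorted(set(tokens)))}
token_ids = [vocab[t] for t in tokens]
print(token_ids)  # e.g. [0, 3, 2, 1]
```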

Next comes embedding: each token is converted into a vector (a list of numbers).

If you don’t know what a vector embedding is, it is the process of converting text into numbers that capture its meaning and relationships.
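Here is a minimal NumPy sketch of an embedding lookup: each token ID selects a row from an embedding matrix. The vocabulary size and dimension are arbitrary assumptions, and the random matrix stands in for weights that a real model learns so that related words end up with similar vectors.

```python
import numpy as np

vocab_size, embed_dim = 10_000, 64  # illustrative sizes
embedding_matrix = np.random.randn(vocab_size, embed_dim)  # learned in practice

token_ids = [42, 7, 301, 99]            # e.g. "Exercise is important daily"
vectors = embedding_matrix[token_ids]   # (4, 64): one vector per token
```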

Transformers, the technology behind today’s most advanced generative AI models, use a sophisticated positional encoding scheme. This “positional encoding” process uniquely represents the position of words in a sequence.
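The original “Attention Is All You Need” paper implements this scheme with sine and cosine functions at different frequencies. Here is a small NumPy sketch of that scheme, with the sequence length and dimension chosen arbitrarily:

```python
import numpy as np

def positional_encoding(seq_len, dim):
    # Sinusoidal scheme from "Attention Is All You Need":
    # even columns use sine, odd columns use cosine, at varying frequencies.
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    enc = np.zeros((seq_len, dim))
    enc[:, 0::2] = np.sin(positions * freqs)
    enc[:, 1::2] = np.cos(positions * freqs)
    return enc

token_vectors = np.random.randn(4, 64)       # stand-in for embedded tokens
token_vectors += positional_encoding(4, 64)  # inject word-order information
```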

This encoding adds positional information to each word vector, ensuring the model retains the order of words. Transformers also employ an attention mechanism, a process that weighs tokens by their importance.

For example, if the model has read texts about “exercise” and understands its importance for health, and has also read about “busy professionals” needing efficient routines, it will pay attention to these connections.

Similarly, if it has read about “routines” that are quick and effective, it can link the concept of “exercise” to “busy professionals.”

Now, it connects the information or context from different parts to give a clearer picture of the text’s purpose.

So, even if the original texts never specifically mentioned “exercise routines for busy professionals,” it will generate relevant information by combining the concepts of “exercise,” “routines,” and “busy professionals.”

This is because it has learned the broader contexts around each of these terms.

After the model has analyzed the input using its attention mechanism and other processing steps, it predicts the likelihood (probability) of each word in its vocabulary being the next word in the sequence of text it’s generating. This helps the model decide what word should come next based on what it has learned from the input.

It might determine that after words like “best” and “exercise,” the word “routines” is likely to come next. Similarly, it might associate “routines” with the interests of “busy professionals.”
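Concretely, the model produces a raw score (a logit) for every word in its vocabulary and converts those scores into probabilities with a softmax. The words and numbers below are made up purely to illustrate the idea:

```python
import numpy as np

# Made-up vocabulary and logits, purely for illustration.
vocab = ["routines", "plans", "tips", "banana"]
logits = np.array([4.1, 2.3, 1.9, -3.0])  # raw scores after "best exercise"

probs = np.exp(logits) / np.exp(logits).sum()  # softmax
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")
# "routines" receives the highest probability and is likely picked next.
```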

How transformers and attention work
Transformers, developed by researchers at Google, have revolutionized the generative AI field.

As we have seen in the above example, they leverage the attention mechanism for tasks like language modeling and classification.

The transformer architecture includes an encoder that processes the input sequence token by token and converts those tokens into vector representations.

Now, the self-attention mechanism enables the model to weigh the importance of each word (token) in the context of other words in the sequence.

There are typically three types of attention mechanisms in transformers:

Self-attention, which captures the dependencies and relationships between tokens within a sequence.
Encoder-decoder attention, used in sequence-to-sequence tasks, where the decoder attends to the encoder’s output.
Multi-head attention, which runs several attention operations in parallel to improve learning.
However, the interesting thing is that the transformer does not inherently understand the order of tokens; therefore, it employs positional encoding, which provides information about the token’s position in the sequence.

Attention mechanisms in transformers enable parallel computation across tokens in a sequence, making transformers incredibly powerful in tasks such as machine translation, text generation, and sentiment analysis.
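Here is a compact NumPy sketch of scaled dot-product self-attention, following the formula softmax(QK^T / sqrt(d)) V from the original paper. The projection matrices are randomly initialized for illustration, whereas a trained transformer learns them:

```python
import numpy as np

def self_attention(x):
    # x: (seq_len, dim) token vectors.
    dim = x.shape[-1]
    # In a trained transformer these projections are learned weights.
    wq, wk, wv = (np.random.randn(dim, dim) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv

    scores = q @ k.T / np.sqrt(dim)  # how strongly each token attends to others
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v               # weighted mix of value vectors

tokens = np.random.randn(4, 64)  # e.g. four embedded tokens
out = self_attention(tokens)     # (4, 64): context-aware representations
```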

How to Evaluate Generative AI Models?
If you want to leverage the capabilities of these generative AI models, there are standard ways to check the quality of their output, such as BLEU and ROUGE scores.

Metrics like BLEU (for language generation) or FID (Fréchet Inception Distance, for image generation) are used to quantitatively measure the similarity and quality of generated outputs compared to ground truth or reference data.
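For instance, BLEU measures n-gram overlap between a generated sentence and reference text. Here is a quick example using NLTK (assuming the nltk package is installed; the sentences are made up):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["exercise", "is", "important", "for", "busy", "professionals"]]
candidate = ["exercise", "is", "vital", "for", "busy", "professionals"]

# Smoothing avoids zero scores on short sentences with missing n-grams.
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```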

Conclusion
Generative AI has revolutionized numerous industries, from healthcare to finance, by enabling machines to learn and adapt autonomously. This technology facilitates efficient data analysis and predictive modeling, driving innovation and enhancing decision-making across sectors.

As AI continues to evolve, its capacity for generating insights and solutions promises to redefine the future landscape of technology and business operations.

Today, AI is all the rage, extending the potential of traditional technologies. From businesses to daily life, AI is popping up everywhere. In this article, we have learned how generative AI works and looked at the methods and technology that make it so powerful, along with an example of how a generative tool can quickly write an article.
