Kazuya

AWS re:Invent 2025 - Intelligent vs. Knowledgeable Models through the Lens of Data (AIM358)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Intelligent vs. Knowledgeable Models through the Lens of Data (AIM358)

In this video, Or Lenchner, CEO of Bright Data, distinguishes between intelligent and knowledgeable AI models. While LLMs excel at reasoning and can win gold medals at the Math Olympiad, they fail at simple real-time tasks like purchasing products without access to current web data. Bright Data serves 20,000 customers, including major foundational model builders, processing 50 billion web pages daily. Lenchner demonstrates how an open-source GPT model becomes useful only when paired with the Bright Data MCP for real-time information, using salary data retrieval as an example. He predicts 2026 will merge the intelligence and knowledge layers, enabling true automation of daily tasks across e-commerce and travel.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

The Intelligence Gap: Why Smart AI Models Still Fail at Simple Tasks

Hello everyone. I'm Or Lenchner, the CEO of Bright Data, and today we're going to talk about something that sounds trivial but actually isn't: intelligent models versus knowledgeable models, and what that distinction means.

It's been exactly three years since the launch of ChatGPT and that magical moment. To me, the magic continues. I keep working with these tools like everyone else here, and we're experiencing that same magical moment almost on an hourly basis when we're using them, right? These are super intelligent models. They know everything about the world. They answer very complicated questions. They're winning gold medals at the Math Olympiad, so their intelligence level is extremely high.

By definition, the models that we're using today are the smartest the world has ever seen, and their next version that might come out tomorrow, or the next foundational model builder that might launch their model in a week, will by definition be smarter. So the intelligence level keeps growing and expanding, and it's just magical. However, all of you know the feeling of using this intelligent model and finding that, at the same time, it can be, for lack of a better word, extremely stupid at various things you actually want to accomplish in your daily routine. You know, buying something, or asking something simple that every five-year-old can answer. If the model is lacking the knowledge, intelligence is not enough, and this is what we're going to talk about today.

We're going to look at it from the data perspective. Pausing that thought for a second, I want to briefly introduce Bright Data and show you why Bright Data is in a position to look at this intelligence-versus-knowledge question across the model ecosystem, and our unique place in the AI industry. Bright Data serves 20,000 customers today that are obsessed with getting the data they need from the public web in order to do various things, including AI. This includes most of the large language models and foundational models, and pretty much every other AI company that builds on top of these models. When they need data for training, or even data in real time, they come to us.

It also means that the largest e-commerce platforms, which need to understand what's going on in the world and be competitive when you come to their website to buy something, are also using our infrastructure to get web data. The same goes for financial services. Think of the largest American banks that need to make smart investment decisions. They need data for that. Cybersecurity as well: to find a threat in this huge ecosystem of the World Wide Web, you actually need to collect a lot of data. And there are many other industries and vertical use cases that all need web data.

This creates massive scale. We see roughly 50 billion web pages a day. Our customers use the Bright Data platform to collect data from across this huge portion of the internet on a daily basis. That's three times the number of Google search queries worldwide every single day. So the scale is massive, and it's not an easy thing to do.

We invest a lot in research and development and are considered one of the leading companies in the world when it comes to the different technologies around web data collection and web scraping. We also retain a very large portion of the internet that we see, and our web archive, which just passed 450 billion web pages, is growing very, very fast. And to do all of that, we operate the largest known pool of bots, which are browsing the web every second of the day to collect this information.

So when we talk about AI through the lens of data, we know what we're talking about, because we are the largest data vendor for the AI industry. Now, when you want to build an intelligent model, you need three major ingredients, and that's fairly obvious to everyone today, right?

First, you need very smart people to write the sophisticated algorithms of the large language model. Then you need a lot of compute to train the model, and obviously you need a lot of data to feed the massive clusters of GPUs required for training.

Now, the trends that we're seeing as the largest web data provider for this industry are as follows. On the algorithm side, open-source and open-weight models are already on par with closed-source, closed-weight models: you will get almost identical performance on almost all of the evaluation tests out there when you compare the two. In addition, training is becoming cheaper. It's not just the cost of acquiring compute and GPUs; it's also what you can do today with previous generations of GPUs. The cost of energy and of cooling, which are essential to training large language models, is also decreasing, and it's going down pretty fast.

So if the starting point to train new models is actually easier than ever before, I'm not saying it's easy, but it's easier than ever before, then what we're seeing as a data company is that the data becomes the major moat. Because if you can all get the same compute and use the same open source, open weight models, then the data matters the most.

Merging Knowledge with Intelligence: Real-Time Data as the Missing Layer for AI in 2026

Now, with that background on our scale, what we're seeing, and the fact that we service the vast majority of foundational model builders, let's get back to the subject we started with: intelligent versus knowledgeable LLMs. An analogy I like to use is physics. Can a theoretical physicist prove a new phenomenon? Just think about the brain of a physicist. The answer is no. What they can do is describe the new phenomenon. They have a theory. They can describe it. They can propose and develop theories and models in order to prove or disprove it, but it stops there. They can be very, very smart, very intelligent, but it stops there.

What they can do afterwards is work with their colleagues, the experimental physicists, to run an experiment that provides evidence and proves or disproves this new theory or phenomenon. LLMs today work the same way. So, using the same analogy, I'll ask a different question: can an LLM purchase the right product for you today? We'll also talk about the future, but today, with this very intelligent and sophisticated model that you can use, the answer is no. It can't actually do the simple task of, let's say, buying the milk that you want to pour into your bowl of cereal at breakfast tomorrow morning. It just can't do that.

So today the models are more like the theoretical physicist. They're still lacking the knowledge to run the experiment in physics, or, in the real world, to buy that carton of milk so you can eat your cereal. If the model doesn't have the real-time price, to take just one example of a crucial piece of information, then it can't be efficient, smart, and useful for you. It's not there yet.

And as the Internet, the World Wide Web, the last 30 years of information built by the largest tech companies and all of the data being generated, becomes the infrastructure layer for the new Internet, the chatbots that we're all using, we see two major trends taking shape in 2025. And something big is going to happen in 2026, which is only a few weeks away; we're going to cover that as well. What we're seeing is that everyone is consuming as much data as they can to build the most intelligent models. If this graph represents the growth in data usage for training, you can see that it is growing and it's not going to stop growing. However, it is growing in a linear manner. And you already have these very sophisticated, intelligent models, right?

Let's use the milk example again. These models can tell you everything about the ingredients of the milk. They can tell you about the class-action lawsuit the milk company faced 10 years ago and what the outcome was. They're perfect for that. But what we're seeing now is an absolute explosion, exponential growth, in real-time data used to add the knowledge layer on top of the intelligence layer, so the model can also tell you the price of the milk, not just everything about its ingredients. This is not a vision. This is what we're seeing from our perspective as the largest data vendor for this industry, and we're seeing it not just with milk, but everywhere. Just think about the world around you. Think about what you're doing every day.

You want to know everything about the Las Vegas Sphere, but you also want to go to a show. The intelligence layer of the model will tell you everything about the construction of the Sphere, how much it cost, who the construction company was, and anything else you want to know about it. But you can't buy a ticket to tonight's show at the Sphere unless you add the knowledge layer and allow the LLM to access it. And this is the current trend that is starting to take shape. Again, it's not there yet, but this is the trend that we're seeing: adding knowledge on top of intelligence.

Now, I took a very popular evaluation test that specifically measures reasoning and knowledge, exactly what we're here to talk about, and you can see all of the models performing one way or another. For this event, I focused on the open-source GPT model with 120 billion parameters that everyone can use and test as an open-source tool. What we're going to see now is an example of using only that intelligence, and the open-source GPT model is, again, very intelligent, but lacking the knowledge layer; that will be on the left-hand side. On the right-hand side, we will see exactly the same model using the knowledge layer, in this case the Bright Data MCP, which gives that open-source LLM access to real-time information from the web.

Let's take a look. It's a bit hard to see from far away, so I'll explain. We are asking about salaries at different companies from Glassdoor. We want to know this because we want to recruit. On the left-hand side, without access to real-time web data, the model says, in a very apologetic way, sorry, I can't get this information. So it's actually completely useless. I just got the most sophisticated model out there, and it's completely useless for a daily task I want to perform. On the right-hand side, with the Bright Data MCP, it actually produced a complete table with all of the information you wanted: the salaries per position at different companies.
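
To make that setup concrete, here is a minimal sketch (not the session's actual code) of how an application could hand a model real-time web knowledge through an MCP client. It uses the open-source `mcp` Python SDK; the server launch command (`npx @brightdata/mcp`), the `API_TOKEN` environment variable, and the `search_engine` tool name are assumptions for illustration, so check the Bright Data MCP documentation for the exact values.

```python
# Minimal sketch: connect to an MCP server over stdio and fetch fresh web
# results that can be prepended to the LLM prompt as the "knowledge layer".
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumptions: package name, env var, and tool name below are illustrative.
SERVER = StdioServerParameters(
    command="npx",
    args=["@brightdata/mcp"],                                # assumed package
    env={"API_TOKEN": os.environ["BRIGHTDATA_API_TOKEN"]},   # assumed env var
)


async def fetch_live_context(query: str) -> str:
    """Return real-time web snippets to ground the model's answer."""
    async with stdio_client(SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("available tools:", [t.name for t in tools.tools])
            # "search_engine" is a hypothetical tool name used for illustration.
            result = await session.call_tool("search_engine", {"query": query})
            return "\n".join(
                block.text for block in result.content if hasattr(block, "text")
            )


if __name__ == "__main__":
    snippets = asyncio.run(
        fetch_live_context("software engineer salaries site:glassdoor.com")
    )
    # These snippets supply the knowledge the training-cutoff-bound model lacks.
    print(snippets[:500])
```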

As a human, you could do this in five minutes, but that super smart model that can win a gold medal at the Math Olympiad simply can't do it without the knowledge layer that gives it access to the web. From working with these companies, we're seeing that in 2026 the knowledge layer and the intelligence layer are going to merge. This is something huge for us, and I would even say for humanity, because this is where it actually starts to automate the things we do. It's not going to happen in one day, but you're starting to see it in e-commerce and in travel, when you're no longer just using these models to plan your trip or to ask a question about a product.

We're starting to see and sense the ability to actually get the relevant real-time information: the availability of the product, the price of the product, the shipping time to your location, or the price of that flight or hotel room. And again, from our perspective, from the data perspective, we're seeing that in 2026 this is going to unlock a huge opportunity for everyone who's working on AI, working with AI, and simply for everyone in the world who is using these tools. Because when you combine knowledge with intelligence, it really unlocks everything that we're doing as humans. It can and will take over a lot of the tasks we do on a daily basis, so you don't need to put in the effort anymore. You can use your time doing other things.
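
As a thought experiment, here is a small, self-contained sketch of what that merge could look like for a purchase task: an "intelligence" step reasoning over fresh offers supplied by a "knowledge" step. The offer data is stubbed out for illustration; in a real system it would come from a live web-data lookup such as the MCP call sketched above, and none of these names represent an actual Bright Data API.

```python
# Illustrative sketch only: the "knowledge layer" is stubbed with static data.
from dataclasses import dataclass


@dataclass
class Offer:
    store: str
    price: float
    in_stock: bool
    shipping_days: int


def fetch_live_offers(product: str) -> list[Offer]:
    """Knowledge layer: stand-in for a real-time web data lookup."""
    return [
        Offer("StoreA", 3.49, True, 1),
        Offer("StoreB", 2.99, False, 2),   # cheapest, but out of stock
        Offer("StoreC", 3.19, True, 3),    # in stock, but ships too slowly
    ]


def choose_offer(product: str, max_shipping_days: int = 2) -> Offer | None:
    """Intelligence layer: reason over fresh data instead of stale training data."""
    viable = [
        o for o in fetch_live_offers(product)
        if o.in_stock and o.shipping_days <= max_shipping_days
    ]
    return min(viable, key=lambda o: o.price, default=None)


if __name__ == "__main__":
    best = choose_offer("whole milk, 1 gallon")
    print(best)  # Offer(store='StoreA', price=3.49, in_stock=True, shipping_days=1)
```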

I'll finish by inviting everyone to our very cool, interesting, and unique event tonight in the BattleBot Arena. You can visit our booth right after this talk to get all the details, and we'll see you there at a very exciting bot event. Thank you for joining.


This article is entirely auto-generated using Amazon Bedrock.
