Kazuya

Posted on Dec 8, 2025

AWS re:Invent 2025-From idea to instant production readiness:building AI agents on Cloudflare-AIM119

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!

Overview

In this video, David from Cloudflare addresses the challenge that Gen AI projects typically take 29 weeks and cost about $1 million, with 30% being canceled. He introduces Cloudflare's developer platform, which runs on Cloudflare-owned data centers in over 335 cities with built-in security and performance. The platform now serves 4 million developers and was rated the #1 cloud platform among people learning to code in Stack Overflow's yearly developer survey. Cloudflare bet on inference over training infrastructure, installing GPUs in 210+ cities. With 76% of developers now using AI for coding, Cloudflare's next bet is agentic AI automation. Key advantages include pay-per-use GPU billing that excludes wait time, deployment close to users around the globe, and rapid development: PayPal built their MCP server in just 3 days. Cloudflare partners with Anthropic on the MCP protocol.

This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

The Million-Dollar Problem: Why Gen AI Projects Fail and How Cloudflare's Developer Platform Solves It

Thanks, Robbie. All right. Hey, everybody. I'm David from Cloudflare. Who here has heard of Cloudflare? Okay, the majority of you. Who here has heard about the Cloudflare developer platform? Okay, a couple of you guys. Okay, so this is going to be valuable because I'm on the developer platform side. So let's actually go into how that's going to help you guys, all right?

There's a big problem. So are the majority of you guys developers, engineering managers? Yeah? Okay. Cool. So you'll probably relate to this problem. Your VP, your CEO has assigned you this project. You've got to make a Gen AI application, right? And you're like, great, I'll do that. On average, it takes about 29 weeks to complete one of these Gen AI projects. That's over six months. During that time, the labor of your engineers, the cost to do some POCs, the inference costs, all of that, it equates to about a million dollars according to Gartner, all right, for this one Gen AI project.

We're at the end of the year now. And guess what? 30% of those Gen AI projects that were started earlier this year, they've been canceled, done, removed. And so you as an engineering manager, a developer, or a product manager, you, of course, do a debrief with your team, like, hey, what happened, right? Why did it cost like a million dollars, and why did it take us like six months to get this off the ground? And one of the things that developers typically say in these postmortems is that half their time wasn't even spent on developing the application. It was on provisioning servers, managing the servers, choosing which region this Gen AI project should run in. And that's an issue. If you want to be moving fast, your developers need to be spending their time actually developing, right?

So how do we set up engineers to execute quickly without making mistakes? You need an opinionated platform that prevents your engineers from reinventing the table stakes of infrastructure every time they need to deploy an application. So this is where Cloudflare comes in. We're focused on modern architectures, modern code, modern developer experiences. But how do we do that, right?

So, at the beginning, the majority of you guys raised your hands. You knew about Cloudflare, and you probably use one of our application services like CDN and performance, right? You know us for DNS, or you may be using some of our Zero Trust products from Cloudflare One. But with our developer platform, you get the benefit of the first two here, the application services and the security, because when you deploy your applications on Cloudflare, everything's built in. The security, the speed is built in. And you see all these dots across the world.

We have data centers in over 335 cities. We own those, right? We're not renting them from another hyperscaler, we own those. And the reason why we built out all those data centers was actually for the application services, the CDN, right? But now we're piggybacking off of that infrastructure. And we're allowing you to deploy your applications in those data centers as well. So we've gone through, you know, what makes apps work well, what scales, what doesn't, because 20% of the internet runs through Cloudflare, right? And so now, when you deploy an application on Cloudflare, all of those learnings are yours by default.

And so, in order to actually build those application services, we had our own developer platform. We were using it internally, and more and more customers wanted that type of technology.

As we were using our own developer platform, we came up with primitives, building blocks in these four categories. We have the compute, the brains in our infrastructure, so we're talking about CPUs here. We have data, so we have object storage and we have durable execution. We have media, so if you're watching live sports or if you are trying to stream something, we have solutions for that. And lastly, we have AI, and what I mean by that is we have added GPUs to our data centers as well. They're in over 210 cities right now, and it's still growing.

This is what I was just talking about, right? So you have complete reach. It's your application, our network. You deploy it once on Cloudflare, and you're going to run better than anybody else because we specialize in speed, performance with our data centers. And we're getting really popular. Last year, the number was 3 million developers using Cloudflare to deploy their applications. This year, we're at 4 million.

Stack Overflow, you guys know Stack Overflow, right? I mean, I don't use it that much anymore because of Claude and similar tools, right? I feel kind of sad, but they do do a yearly survey. They surveyed about 50,000 developers, and Cloudflare was rated the number one cloud platform where people are learning to code, right? Because it is so easy to just click deploy, and that's it. You're not provisioning servers, you're not choosing what region. We make those decisions on your behalf. That's where that opinionated platform comes into play.

Betting on Inference and Agents: Cloudflare's GPU Strategy and Cost-Efficient AI Infrastructure

And then we're top two when it comes to learners that are using AI, right? So vibe coding, people who are vibe coding, they're deploying on Cloudflare infrastructure. All right, so this whole conference, I've noticed a lot of the talks are all around agents. I'm going to drive that home, okay, guys?

So just to give you a little brief history lesson, we took some bets at Cloudflare with our infrastructure. Last year, you know, we were looking at the numbers, and 44% of developers were already using AI for generating code, right? We believed that AI adoption was going to explode like everybody did. But the bet that we took was that with all those dots around the world, do we install GPUs for training, or do we install GPUs for inference? Because they're two different things here.

We made a bet that the majority of traffic or the needs is actually on the inference side, not on the training side. So we didn't want to play in the training business. We wanted to play in the inference business. That's where we shifted our focus. We built out our infrastructure, our GPUs to do inference closest to the user, right, with those hundreds of cities that we have our GPUs in.

That was a good bet, because what happened was, of course, if you guys remember, DeepSeek kind of proved that theory of we don't need that much training infrastructure to make these models anymore. But the real usage was actually with the inference, with us using DeepSeek or us using ChatGPT's models, right? That's where most of the traffic is at.

And by the way, remember it was 44% of developers who were using AI to help them with coding? Well, today, that's grown to 76%, and I'm sure this is going to continue to grow. So what's next, right? We talked about some bets on the inference side. What's our next bet?

Our next bet is related to agents. It's the automation part, right? I feel like inference now is kind of commoditized, so we want to build on top of that. We want to offer services for you to not only do inference, but to actually automate things. I know there's a lot of terms being thrown around. The automation part is the agentic AI part, right? We're past the phase of just betting on generative AI where you have an augmented human. So for example, me writing the copy for this script, I was using Gen AI.

And so what's interesting with the automation part, the agentic AI by contrast, is you give AI a goal, a task to complete, and it goes ahead and does it. So for example, for this conference, if I wanted to send you all an email, instead of doing that manually, I would set up an agent. It would pull the attendee list, draft up the email, and have me review it, so that's the human in the loop part, and then it would fire off the emails to you guys and follow up with you guys automatically.
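The flow described above (gather the recipients, draft, pause for human review, then send) can be sketched in a few lines of TypeScript. This is only an illustration of the human-in-the-loop pattern, not Cloudflare's agent API: `Attendee`, `Draft`, `draftEmails`, `runEmailAgent`, and the `approve` callback are hypothetical names, the drafting step stands in for an LLM call, and sending is simulated by returning the approved drafts.

```typescript
// Hypothetical types for illustration; not part of any Cloudflare SDK.
interface Attendee {
  name: string;
  email: string;
}

interface Draft {
  to: string;
  subject: string;
  body: string;
}

// Step 1: draft one email per attendee (a real agent would call an LLM here).
function draftEmails(attendees: Attendee[]): Draft[] {
  return attendees.map((a) => ({
    to: a.email,
    subject: "Thanks for attending",
    body: `Hi ${a.name}, thanks for coming to the session.`,
  }));
}

// Step 2: the human-in-the-loop gate. Only drafts the reviewer approves
// are "sent"; sending is simulated by returning the approved drafts.
function runEmailAgent(
  attendees: Attendee[],
  approve: (draft: Draft) => boolean,
): Draft[] {
  const drafts = draftEmails(attendees);
  return drafts.filter((d) => approve(d));
}

// Usage: the reviewer rejects any draft addressed to an example.org mailbox.
const sent = runEmailAgent(
  [
    { name: "Ada", email: "ada@example.com" },
    { name: "Lin", email: "lin@example.org" },
  ],
  (draft) => !draft.to.endsWith("@example.org"),
);
console.log(sent.map((d) => d.to));
```

The approval callback is the point where the agent stops and waits for a person, which is also the kind of idle time the talk later says is not billed.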

Okay, so let's actually see what you can build on Cloudflare. Each one of these icons is a building block, a primitive that we have on the developer platform. So this is a CRM agent. Say you want to build a voice agent, so someone's calling in to the helpline. We have technology for that. We have WebRTC, we have our real-time product. And we can have our voice models that are running on our AI infrastructure answer those calls. Once you answer that call, you want to also observe it. You guys are probably building these Gen AI applications or agents, and you want to see what they're doing. Well, we have a product called AI Gateway. It's a gateway, it gives you observability, and you can control costs with caching. There are guardrails as well, so if there's any PII or harmful content, we can block that too. And it goes on and on and on. And of course, we have security, all of that built in because of our Zero Trust solutions.

But you might be thinking, this is great, but Cloudflare is not the only one that has a lot of these pieces that I just showed you. So what is it that makes Cloudflare a perfect platform for building agents and deploying them? Why not build agents on other clouds? So remember, agents are AI. They have the brain. There are LLMs. There are also some workflows, and then some APIs.

Let's talk about the LLM piece. Our CEO Matthew Prince, on a recent earnings call, said that GPU utilization is only averaging around 30%. And a third of those are utilizing less than 15% of the GPUs that are provisioned. But what's the cause of this low utilization? It's because GPU use cases for AI can be broken down into either training or inference. Training is very predictable. It's constant. But the interesting part around inference, or when an agent takes an action, is that it's very bursty. With Black Friday, you may have a ton of traffic and then it just dies.

So what we did is we said, okay, we're only going to charge you for when you're using the GPU. That's it. So you're not overprovisioning your GPU usage on Cloudflare with your agents. You get charged pay as you go. That's the first thing.
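To make the utilization point concrete, here is a small arithmetic sketch. It assumes a made-up price of $2 per provisioned GPU-hour (not a real quote from any provider) and shows what each hour of useful work effectively costs when capacity is paid for whether or not it is busy. The function name `effectiveCostPerUsedHour` is hypothetical.

```typescript
// Effective cost per hour of useful GPU work when paying for provisioned
// capacity regardless of how busy it is.
function effectiveCostPerUsedHour(hourlyRate: number, utilization: number): number {
  if (utilization <= 0 || utilization > 1) {
    throw new RangeError("utilization must be in (0, 1]");
  }
  return hourlyRate / utilization;
}

// Assumed price in dollars per provisioned GPU-hour; purely illustrative.
const hypotheticalRate = 2.0;

// Utilization figures quoted in the talk: about 30% average, under 15% for some.
const atAverage = effectiveCostPerUsedHour(hypotheticalRate, 0.30);
const atLowEnd = effectiveCostPerUsedHour(hypotheticalRate, 0.15);
console.log(atAverage.toFixed(2), atLowEnd.toFixed(2));
```

Under this assumption, every useful GPU-hour costs roughly 3.3 times the listed rate at 30% utilization and roughly 6.7 times at 15%. The sketch only shows the overhead of idle provisioned capacity; it does not model any provider's actual pay-per-use pricing.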

The other interesting thing is wall clock time, so waiting time. The human in the loop with your agents, sometimes they have to wait for you to say go or no go, or they're waiting for some other database to give them the data that they need. All that waiting time, we don't charge you for it. So in this example here, there's a total waiting time of 207 milliseconds. We don't charge you for all of that. We only charge you for when there's compute. So it's only 7 milliseconds of billable time here.
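The billing idea can be shown with a toy timeline. The breakdown below is hypothetical: it assumes the 207 milliseconds is the full wall clock time of the request, made up of 200 ms of waiting and 7 ms of compute, and the individual segment durations are invented to add up to those totals. `Segment` and `totalMs` are illustrative names only.

```typescript
// One step in a request's timeline: either active compute or idle waiting.
type Segment = { kind: "cpu" | "wait"; ms: number };

// Sum the durations of all segments of a given kind.
function totalMs(timeline: Segment[], kind: Segment["kind"]): number {
  return timeline
    .filter((s) => s.kind === kind)
    .reduce((sum, s) => sum + s.ms, 0);
}

// Hypothetical breakdown chosen to match the totals quoted in the talk.
const timeline: Segment[] = [
  { kind: "cpu", ms: 3 },
  { kind: "wait", ms: 120 }, // e.g. waiting on a database
  { kind: "cpu", ms: 2 },
  { kind: "wait", ms: 80 }, // e.g. waiting on a human or an upstream API
  { kind: "cpu", ms: 2 },
];

const billableMs = totalMs(timeline, "cpu"); // only compute is billed
const waitingMs = totalMs(timeline, "wait"); // idle time is not billed
const wallClockMs = billableMs + waitingMs;
console.log({ wallClockMs, waitingMs, billableMs });
```

Under wall clock billing the whole 207 ms would count; under the compute-only model described in the talk, only the 7 ms of `cpu` segments would.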

And then, what makes Cloudflare AI workloads faster, it's because of the dots that you saw on the map. We can run your agents closest to your users, wherever they are, or closest to the services that they need to connect to. And again, it's just all done automatically on the back end. And so yeah, going back to that Stack Overflow survey, why people are learning to develop on Cloudflare, it's just really easy. It's one command line, and you're deployed across the globe.

PayPal's Three-Day Success Story: Building MCP Servers on Cloudflare

You don't get charged extra just because it's deployed across the globe, right? You only get charged when you're actually using these services.

All right, so real-life example. MCP servers are very useful for companies to extend their services to their clients. PayPal built their MCP server on Cloudflare, but what does that actually mean? So let's say I'm a small business owner, right, and I'm using Claude. I just sold some goods. So here, Claude is connected to the PayPal MCP server. It's connected to my PayPal account, and I tell Claude, this is my client's email address, this is the product that they bought, this is where they're located, and Claude then sends that information to the PayPal MCP server.

What gets returned is a few things. So it says, great, okay, you did some house cleaning services, David. The cost is 250 bucks. Oh, and by the way, I looked up the local tax law with RAG and applied the 10% tax rate over there in California. And now the invoice is ready and you can just send it to your client. So that's one way to go about using PayPal's MCP server.
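The arithmetic behind that invoice is simple enough to sketch. The code below is not PayPal's MCP server or its API; `InvoiceInput`, `Invoice`, and `buildInvoice` are hypothetical names, and the tax rate is passed in directly rather than looked up with RAG. It only shows the calculation a tool handler might perform with the figures from the talk.

```typescript
// Hypothetical invoice shape; amounts are kept in integer cents to avoid
// floating-point rounding problems.
interface InvoiceInput {
  description: string;
  subtotalCents: number;
  taxRateBasisPoints: number; // 1000 basis points = 10%
}

interface Invoice extends InvoiceInput {
  taxCents: number;
  totalCents: number;
}

// Apply the tax rate to the subtotal and return the completed invoice.
function buildInvoice(input: InvoiceInput): Invoice {
  const taxCents = Math.round(
    (input.subtotalCents * input.taxRateBasisPoints) / 10_000,
  );
  return { ...input, taxCents, totalCents: input.subtotalCents + taxCents };
}

const invoice = buildInvoice({
  description: "House cleaning services",
  subtotalCents: 25_000, // the $250 from the talk
  taxRateBasisPoints: 1_000, // the 10% rate mentioned in the talk
});
console.log(`$${(invoice.totalCents / 100).toFixed(2)}`);
```

With the talk's numbers, the tax comes to $25 and the invoice total to $275.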

The really cool one is this. Traditionally, as a small business owner, I would need to hire a bookkeeper to look at which invoices are unpaid and which invoices are in draft. Now, using Claude and PayPal's MCP server, I can ask all those questions, have it analyze my accounts receivable and payable, and within 30 seconds, I have all that information, right? So this is a really fast, cost-saving service that PayPal built on Cloudflare.

And how long did it take PayPal to build this on Cloudflare? Three days. That's it, right? So compared to the 29 weeks we were talking about, it took three days, and that's how easy it is to build on Cloudflare. There are a ton of companies that have built their MCP servers on Cloudflare, including Atlassian, Asana, companies that are here on the show floor. We have a partnership with Anthropic. We're building the MCP protocol with them. We're influencing that.

And then finally, the last thing that I'll leave you with is scan the QR code, try to build something on Cloudflare just to see how easy it is. You can use any vibe coding platform as well. And if you guys have any other questions, I'll be outside over there. And if not, come to the Cloudflare booth. It's 1384. And that's it. Thank you.

This article is entirely auto-generated using Amazon Bedrock.