Kazuya

Posted on Dec 8, 2025

AWS re:Invent 2025 - PhysicsX: Scaling Physics AI for Automotive Aerodynamics (STP109)

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!

Overview


In this video, James Leahy from PhysicsX presents foundation models for automotive aerodynamics, demonstrating scaling laws where more data yields better models. He showcases how their pre-trained model reduces CFD simulation time from 4 days to 6 seconds. The presentation reveals that models trained on public datasets like DrivAerNet (8,000 simulations) and Luminary (2,000 simulations) perform poorly out-of-distribution, but their diverse dataset of 18,000 simulations with 120+ baselines achieves significantly better zero-shot performance. The system uses Amazon SageMaker HyperPod for both GPU-based Siemens Star simulations and model training, with an active learning loop driven by uncertainty quantification and generative geometry models.

This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

PhysicsX: Building Foundation Models to Transform Engineering Simulation

My name is James Leahy, and I'm a Principal Research Scientist at PhysicsX and a mathematician. I'm here to tell you about how we're building foundation models for physics, specifically for aerodynamics, and in this case automotive aerodynamics, and I'm going to show you some scaling laws that we're seeing. More data, better models. It's kind of obvious when you see this happening in language and video, but for engineering and physics, this is very far from a trivial problem. It's an ongoing effort, and we'll see some early results today.

Here's what we're going to cover. First, I'm going to give you a quick introduction to who PhysicsX is. Then I'll explain how pre-trained models are the next frontier for physics AI. I'll talk about our new pre-trained model for automotive aerodynamics, and then I'll show you some scaling laws. We'll look at how AWS infrastructure is making this possible, and then I'll close with where we're heading next.

So let's start with who PhysicsX is. PhysicsX is an AI company, and we're focused on industrial applications. Our mission is simple: build and deploy AI to accelerate industrial innovation. We've raised over $155 million in Series B funding. We have over 200 engineers, a mix of AI researchers, simulation engineers, and software engineers. I think crucially, it's important to understand that we are not just a research organization. We work directly with our customers to build the things that are usable and used. We partner with leading players across aerospace and defense, semiconductors, materials, energy and renewables, industrials, and automotive.

So what you're seeing here is a computational fluid dynamics simulation that's approximating the Navier-Stokes equations around an immersed body, in this case an airplane. CFD was a major step change from building physical prototypes. Before CFD, if you wanted to know how air flowed around a design, you had to build it and put it in a wind tunnel. Wind tunnels still play a vital role today, but everyone knows that simulations have fundamentally increased the speed at which we can innovate. Simulations, though, still are a bottleneck, and there's room to innovate faster.

That's where physics AI comes in. A legacy simulation like the one you just saw, which was a 7-second transient simulation consisting of over a billion mesh elements, takes 4 days to compute. Using physics AI, we can do this in a 6-second inference step. 4 days becomes 6 seconds in inference. That's not just an incremental improvement, that's a fundamental transformation in how manufacturers can innovate and design. The models are called surrogates because they act as stand-ins for expensive physics simulations. A well-trained surrogate can give you answers that are accurate enough to make engineering decisions at a fraction of the computational cost.
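To put those two figures side by side, here is the arithmetic as a small sketch. The variable names are illustrative; the only inputs are the 4 days and 6 seconds quoted above.

```python
# Speed-up implied by the two figures quoted in the talk.
SECONDS_PER_DAY = 24 * 60 * 60

simulation_seconds = 4 * SECONDS_PER_DAY  # 4-day legacy CFD run
inference_seconds = 6                     # surrogate inference step

speedup = simulation_seconds / inference_seconds
print(f"{simulation_seconds} s / {inference_seconds} s = {speedup:,.0f}x")
# prints "345600 s / 6 s = 57,600x"
```

That is between four and five orders of magnitude, before counting the time needed to set up a simulation.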

We don't just work on aircraft. Our technology applies across engineering domains, as I spoke about earlier: aerodynamics for vehicles, heat transfer for thermal management, chemistry for reactions, and electromagnetics for electronics. The physics is different in each domain, but the fundamental approach is the same. We learn from simulation and, hopefully, more experimental data and make fast, accurate predictions.

So what does it take to build an AI surrogate? Well, you have to do some of those expensive simulations that I talked about earlier up front to generate the training data. You need to store that data. You need to process it, you need to clean it, and convert it to formats that machine learning models can consume. Then you need to train the models, validate them, and deploy them. We're building a platform designed to completely support the AI lifecycle for engineering and physics: the data management, the model training, the fine-tuning, and all deployed as an agentic workflow. We have a simulation workbench which orchestrates the simulations. We have the AI workbench to develop and deploy models from the AI model catalog, some of which are pre-trained, some of which are not. And then we have a unified data backbone with enterprise security and multi-cloud scalability. This is not just a research infrastructure. This is production-grade software.

So now let me talk about our pre-trained model for automotive aerodynamics. I spoke earlier about how numerical simulation was the first major step change for engineering. The simulations are complex to set up. They take hours or days, but you get cheaper and faster exploration than physical prototypes.

Then came traditional machine learning models and surrogates. You need a training data set, you need a machine learning model and a pipeline, and once you're trained, you can get vast design space exploration. But for every customer we encounter, every new geometry we see, we have to collect a new data set from scratch.

So what we're doing is going out and generating large pre-training data sets, coupling them with foundational architectures and the serious infrastructure needed to get everything from the simulations to the data loader and to train at scale using dynamic data parallelism. What you get from these pre-trained foundation models is lower upfront data needs for customers and better out-of-the-box performance, both zero-shot and fine-tuned.

Creating Diverse Pre-Training Data Sets for Automotive Aerodynamics

So when we wanted to build the automotive aerodynamics model, we faced some challenges. The challenge is that there wasn't a very large diverse corpus of data. Some of the largest public data sets include DrivAerNet, which is out of Faez Ahmed's MIT DeCoDE Lab and is a remarkable contribution to the field, make no mistake. It consists of 8,000 Reynolds-averaged Navier-Stokes (RANS) simulations. These are steady-state, time-averaged approximations. They are the fastest amongst the different simulation types.

And if you look at what the data comprises, it's 8 cars that are morphed, and morphing is like stretching and pulling different parts of the car. You get topologically very similar cars. Another data set is Luminary's SHIFT-SUV, which is 2,000 detached eddy simulations. These are of higher fidelity than RANS. And here there are only two baseline cars that are morphed 1,000 times. So that's roughly 10 cars, each with 1,000 morphs. It's not very diverse.
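To make "stretching and pulling" concrete, here is a deliberately simple sketch of one kind of morph: lengthening everything behind a chosen station along the car. This is an illustration only, not the morphing used to build any of the data sets mentioned; the function name, the coordinate convention (x running from front to rear, in metres), and the sample points are all assumptions.

```python
def stretch_rear(vertices, x_start, factor):
    """Scale the x-distance behind x_start by `factor`.

    Vertices at or ahead of x_start are returned unchanged. The number and
    order of vertices never changes, so the mesh connectivity is preserved.
    """
    morphed = []
    for x, y, z in vertices:
        if x > x_start:
            x = x_start + (x - x_start) * factor
        morphed.append((x, y, z))
    return morphed


# Three made-up surface points on a baseline car (hypothetical values).
baseline = [(0.0, 0.0, 0.5), (2.0, 0.0, 1.4), (4.0, 0.0, 1.2)]
variant = stretch_rear(baseline, x_start=2.0, factor=1.1)

for before, after in zip(baseline, variant):
    print(before, "->", after)
```

Because a morph like this only moves existing points, every variant keeps the baseline's topology, which is why a thousand morphs of one car add far less diversity than a thousand different cars.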

Another challenge I'll mention is that when you have this diverse data, you have different fidelity simulations, different turbulence closure models, and this is still an open research problem as to whether you should pre-train these models on diverse sets of simulations and how exactly you should do it. We'll only focus on the geometric diversity today.

So I said that these data sets aren't diverse, but why does this matter? We pre-trained various attention-based architectures. If you're familiar with them: Transolver, AB-UPT, point cloud attention, linear attention. These meshes are very large, so you have to use variants of attention that are suitable. And we trained them on Luminary and DrivAerNet, these two public data sets that I just spoke about, and we looked at their out-of-distribution performance on a representative customer data set. We call it CustomerNet. That's one baseline simulation with the very intricate morphs that we see our customers have.

What sort of performance are we looking at? I'll first talk about the quantities that we're looking at. There's the drag coefficient. This is a measure of the aerodynamic resistance of the car. A typical sedan has a drag coefficient between 0.25 and 0.30. Understanding this number is how automotive manufacturers can optimize for fuel efficiency and EV range and other aspects.
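For reference, the drag coefficient is the drag force normalised by dynamic pressure and frontal area, C_d = F_d / (0.5 * rho * v^2 * A). The sketch below encodes that standard definition. The air density, speed, and frontal area used in the example are illustrative assumptions, not values from the talk.

```python
def drag_coefficient(drag_force_n, air_density, speed, frontal_area):
    """C_d = F_d / (0.5 * rho * v^2 * A), in SI units."""
    dynamic_pressure = 0.5 * air_density * speed ** 2
    return drag_force_n / (dynamic_pressure * frontal_area)


def drag_force(cd, air_density, speed, frontal_area):
    """Inverse of drag_coefficient: the force implied by a given C_d."""
    return cd * 0.5 * air_density * speed ** 2 * frontal_area


# Assumed conditions: sea-level air, 30 m/s (108 km/h), 2.2 m^2 frontal area.
RHO, SPEED, AREA = 1.225, 30.0, 2.2

force_low = drag_force(0.25, RHO, SPEED, AREA)
force_high = drag_force(0.30, RHO, SPEED, AREA)
print(f"C_d 0.25 -> {force_low:.1f} N, C_d 0.30 -> {force_high:.1f} N")
```

At a fixed speed and frontal area the drag force scales linearly with C_d, which is why small changes in the coefficient translate directly into efficiency and range.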

The pressure coefficient is the pressure on the surface of the car. This tells you how pressure varies across every point of the vehicle's surface. You have a high pressure region in the front and a low pressure in the wake, and these differences are what create the drag. And if you can predict the pressure field accurately, you understand why the design has its drag.
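The pressure coefficient is conventionally defined as C_p = (p - p_inf) / (0.5 * rho * v_inf^2), where p_inf and v_inf are the freestream pressure and speed. A minimal sketch of that definition follows; the numerical values are assumptions chosen only to show the two regimes described above.

```python
def pressure_coefficient(pressure, freestream_pressure, air_density, freestream_speed):
    """C_p = (p - p_inf) / (0.5 * rho * v_inf^2), in SI units."""
    dynamic_pressure = 0.5 * air_density * freestream_speed ** 2
    return (pressure - freestream_pressure) / dynamic_pressure


# Assumed freestream: standard atmosphere, 30 m/s.
P_INF, RHO, V_INF = 101_325.0, 1.225, 30.0
Q_INF = 0.5 * RHO * V_INF ** 2

# At an incompressible stagnation point the surface pressure is p_inf + q_inf.
cp_front = pressure_coefficient(P_INF + Q_INF, P_INF, RHO, V_INF)
# A surface pressure below freestream, as in the wake, gives a negative C_p.
cp_wake = pressure_coefficient(P_INF - 0.2 * Q_INF, P_INF, RHO, V_INF)
print(cp_front, cp_wake)
```

High C_p on forward-facing surfaces and low C_p on rearward-facing ones both push the car backwards, which is the pressure contribution to drag.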

So let's look at these graphs. If we look at the top one, this is the drag coefficient, and along the y-axis is the Spearman rank correlation coefficient. It measures how accurately you're ranking the designs. You look in the green and the orange. The green is the model that's trained on DrivAerNet. The orange is the one that's trained on Luminary. And if you look at the performance within distribution, it's quite good. If we look at how it performs out of distribution on the CustomerNet, it's orders of magnitude worse.
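As a reminder of what the y-axis measures, here is a minimal, dependency-free sketch of the Spearman rank correlation: the Pearson correlation of the two sequences after replacing each value by its rank. The drag values in the example are made up for illustration and are not results from the talk.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


simulated_cd = [0.26, 0.28, 0.30, 0.27]   # hypothetical CFD values
predicted_cd = [0.30, 0.33, 0.36, 0.31]   # biased, but in the same order
print(spearman(simulated_cd, predicted_cd))
```

In this example the predictions are all too high, yet the score is 1.0 because the designs are ranked in the same order. That is the property that matters when the surrogate is used to choose between candidate designs.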

We see a similar story for the pressure coefficient here. Lower is better. We're looking at the mean absolute error of the pressure coefficient on the surface of the car. So it seems pretty obvious you need to go out and you need to generate a very diverse data set.
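The error metric here can be sketched as follows. This is the plain, unweighted mean absolute error over surface points; the talk does not say whether the reported numbers are weighted by cell area, so treat the unweighted form as an assumption. The sample values are made up.

```python
def mean_absolute_error(predicted, reference):
    """Unweighted mean of |predicted - reference| over all surface points."""
    if len(predicted) != len(reference):
        raise ValueError("fields must have the same number of points")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(reference)


# Hypothetical C_p values at three surface points.
cp_simulated = [0.5, -0.5, 0.5]
cp_predicted = [1.0, -0.5, 0.0]
print(mean_absolute_error(cp_predicted, cp_simulated))
```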

So this data set is growing as we speak. We are still generating this data and training the model. We'll talk a little bit about how we're getting more diverse designs through active learning.

So here, what you see are some numbers from a model trained on 18,000 simulations with over 120 baselines, at a fixed inlet velocity for now. What we see here is that our model still performs well on DrivAerNet and Luminary, these public datasets, but when we look at the zero-shot performance, we get significantly better results, and this is just zero-shot. This isn't when you fine-tune.

So now you want to understand as we're growing this dataset, what's happening out of distribution. Is your out-of-distribution performance improving? What we're seeing here is a graph where along the y-axis is the out-of-distribution mean absolute error of the pressure coefficient, and along the x-axis you see the training set size, and you see this decreasing. Now when you look along the x-axis, you say training set size, are these just morphs or are these baselines? These are some of the questions we're trying to understand. How many times should you morph a car? Does that improve the performance? Here this is going along basically 100 morphs per baseline that we're increasing along the x-axis. We'll talk about how we're now implementing active learning, and that's going to increase the slope of these curves.
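One common way to summarise a curve like this is to fit a power law, error ≈ a * N^(-b), by linear regression in log-log space; the exponent b is then the slope being discussed. The sketch below assumes that functional form, which the talk does not state, and runs on synthetic points generated from a known power law rather than on any PhysicsX result.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(a) - b * log(size).

    Returns (a, b) such that error is approximately a * size ** (-b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic data only: points generated from error = 0.5 * N ** -0.3.
train_sizes = [1_000, 2_000, 4_000, 8_000, 16_000]
synthetic_errors = [0.5 * n ** -0.3 for n in train_sizes]

a, b = fit_power_law(train_sizes, synthetic_errors)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A larger fitted b means each additional simulation buys more out-of-distribution accuracy, so comparing b across data-collection strategies is one way to quantify a steeper slope.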

So accuracy alone isn't enough for engineering applications. Engineers need to know when to trust the model. If you're predicting a drag coefficient, a difference of 0.01 matters. You need to have confidence intervals. If we look on the right, this is showing how drag accumulates along the length of the car from the front to the back. You get this by integrating the pressure and the shear stress on the surface, and as you get all the way to the right of the graph, the final number on the upper right is the coefficient of drag. You can see that the blue band is the uncertainty region, and at that final point on the right, it represents the confidence interval around the final drag coefficient. This is the sort of uncertainty quantification we expect from our models, and we'll see that this uncertainty quantification isn't just crucial for the end use. You need to know for which designs the models are uncertain in order to know which designs to go for next, and that's the active learning loop that I'll speak about next.
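A plot like this can be built from per-slice drag contributions. The sketch below assumes the uncertainty comes from an ensemble of predictions and that a normal approximation (mean ± 1.96 standard deviations) is acceptable; the talk does not say how the band is actually computed, and all numbers are made up.

```python
from itertools import accumulate
from statistics import mean, stdev


def cumulative_drag_band(member_slices, z=1.96):
    """Cumulative drag along the car with an uncertainty band.

    member_slices[m][i] is the drag-coefficient contribution of slice i
    (ordered front to back) according to ensemble member m.
    Returns one (lower, centre, upper) tuple per slice station.
    """
    curves = [list(accumulate(slices)) for slices in member_slices]
    band = []
    for station in zip(*curves):
        centre = mean(station)
        spread = z * stdev(station)
        band.append((centre - spread, centre, centre + spread))
    return band


# Hypothetical contributions from three ensemble members over four slices.
members = [
    [0.10, 0.05, 0.02, 0.10],
    [0.11, 0.05, 0.03, 0.10],
    [0.09, 0.05, 0.01, 0.10],
]
band = cumulative_drag_band(members)
lower, centre, upper = band[-1]  # last station = total drag coefficient
print(f"C_d = {centre:.3f} (interval {lower:.3f} to {upper:.3f})")
```

The last tuple is the interval around the total drag coefficient, corresponding to the point at the far right of the graph.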

Leveraging AWS Infrastructure for Active Learning and Future Multiphysics Applications

We're at an AWS conference, so let me now talk about how we're using AWS to simulate and train these models. Let me start with the diverse geometries. Say we have a diverse set of geometries. These then go into our aerodynamic simulation. We're using Siemens Star, and we're using GPU-based solvers, and these are on SageMaker HyperPod, the same clusters that our models are being trained on. At the output of the aerodynamic simulations, you get the diverse resolved flows, a very large set of fields, terabytes upon terabytes. Those are stored in S3. That's our long-term storage.

Those fields are then preprocessed in parallel using AWS Batch to get them into a form that can be ingested by our models. Our models are then also trained on SageMaker HyperPod using data parallelism, and the storage that's used there is FSx for Lustre. Then finally, I said that our model has uncertainty quantification. On top of that, last year at AWS we spoke about our large generative geometry model. This is a model that can generate different designs. The uncertainty coupled with the generative geometry mechanism is what drives this active learning loop. Once we have a set of baseline geometries to fine-tune the geometry model, this can then tell us in which parts of the design space the model is most uncertain, and we can get those geometries, which then feed back into the loop to go through the aerodynamic simulation.
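The selection step at the centre of that loop can be sketched as below. This is a minimal illustration, assuming each candidate geometry comes with a predicted drag coefficient and a standard deviation from the surrogate, and that the most uncertain designs are simulated first. The `Candidate` type, the function name, and the sample pool are hypothetical and do not describe the PhysicsX implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    design_id: str
    predicted_cd: float
    cd_std: float  # surrogate's uncertainty for this design


def select_most_uncertain(candidates, budget):
    """Return the `budget` candidates with the highest predicted uncertainty."""
    ranked = sorted(candidates, key=lambda c: c.cd_std, reverse=True)
    return ranked[:budget]


# Hypothetical pool, e.g. designs proposed by a generative geometry model.
pool = [
    Candidate("morph-a", 0.27, 0.004),
    Candidate("morph-b", 0.31, 0.020),
    Candidate("morph-c", 0.25, 0.011),
]
next_batch = select_most_uncertain(pool, budget=2)
print([c.design_id for c in next_batch])
# prints "['morph-b', 'morph-c']"
```

The selected designs would then go through the aerodynamic simulation, and their results would be added to the training set before the next round.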

Now that aerodynamic simulation I spoke about was Siemens Star. That's RANS at the moment. That's not a transient simulation. I think there's a lot of evidence showing that training on transient data is more useful, but to train on transient data, you need to have an online data loader. That's where it becomes even more important that the simulator runs and the models are trained in the same place. So building this infrastructure even for RANS is setting the stage for the next development.

I'll go over this again in a little more detail, focusing on the AWS architecture. So at the heart is SageMaker HyperPod.

We have Slurm for job management and a controller machine that works across multiple compute nodes. For the simulation node, you have FSx for Lustre, which is the file system layer. It's bidirectionally synced with S3, and FSx is where the models and the data loader are. The network architecture has two subnets: the private one, which contains the HyperPod cluster, and the public one, which hosts the MLflow dashboard where we see the results. All of our researchers enter through AWS Systems Manager.

So let's wrap up and see what we learned and where PhysicsX is headed next. So we talked about simulations and we saw how ML surrogates were a major leap forward to be able to drive down inference time. Then we talked about pre-training models and how this will help us accelerate our customers' workflows by requiring less data from them up front, and we're seeing that scaling laws are emerging.

Now, what's next for us? We started with automotive aerodynamics and we will continue to innovate here, as I mentioned, going to transient and so on. But we'll also be looking at other domains: radar cross section, aeroelastics, and structural mechanics. And you should ask, well, is this going to be all one model or are these going to be separate models? Both. We'll train the best model in class and we'll look at what happens when you continue to increase the scope of the model for multiphysics. I think we're working with our customers to understand what are the best models for them, and so we'll build both.

I encourage you to visit physicsx.ai to learn more. We'll be releasing a blog post soon about the model. And I just want to thank you for your time today and for listening.

This article is entirely auto-generated using Amazon Bedrock.