🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.
Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!
Overview
📖 AWS re:Invent 2025 - Modernizing Legacy Systems: Boeing's PLM Cloud Transformation (IND321)
In this video, AWS and Boeing discuss modernizing PLM operations on AWS. Justin Iravani from AWS Professional Services outlines key PLM challenges including performance, data friction, and global collaboration. Jim Gallagher and Dan Meyering from Boeing detail their migration of Dassault Systèmes 3D Experience from on-premises to AWS, achieving a 99% reduction in environment provisioning time (from 30 days to 5 hours), 78% decrease in manual tasks, and over 40% cost savings. They leveraged Infrastructure as Code with Terraform, EC2, RDS, EFS, and Application Load Balancers to eliminate bureaucratic bottlenecks and enable parallel deployment of hundreds of environments. The session concludes with future opportunities using generative AI and Amazon Bedrock for natural language PLM queries and automated workflows.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: PLM Modernization Challenges and the Boeing-AWS Partnership
My name is Justin Iravani, and I am a Senior Cloud Infrastructure Architect with AWS Professional Services. Today we're going to be talking about the current state of PLM in terms of challenges, the perceived barriers to modernization, and some principles for addressing those challenges. Then I'll turn it over to Jim and Dan, and we'll talk through how Boeing was able to modernize their PLM operations on AWS and what's needed to operate hundreds of environments on AWS.
When it comes to product data and product lifecycle management use cases, we see three key challenges raised by our customers. The first is performance and scalability. As product data and analysis solutions expand due to things like increases in 3D model fidelity, business intelligence, and metadata and variation complexity, keeping up with performance and scale becomes more challenging. Additionally, the needed reach of these solutions and applications across regions, organizations, and functions means more data velocity over greater distances.
This leads to our second challenge, which is data friction. Data friction has largely been an artifact of historical on-premises architectures and integrations based on legacy infrastructure and applications. These functional silos also create corresponding data silos that are isolated for reasons like security and access controls. Due in large part to the first two challenges, global collaboration continues to be difficult as companies engage in engineering activities that span multiple regions across the globe. Additionally, these connections are becoming more and more prevalent as companies continue to accelerate mergers, acquisitions, and partnerships around new technologies and innovations.
Barriers to PLM Migration and the Three Pillars: Digital Twins, Threads, and Fabric
For those manufacturers whose business is dependent upon PLM today, the prospect of upgrades, let alone migrations, is very daunting. Service interruptions must be minimized, and new implementations have to be handled in a way that's durable and reliable. Manufacturers also have significant investment in training the teams and engineers using these systems, and they want to make sure that they continue to maximize that knowledge into the future. There are also often multiple layers of applications built on top of these PLM systems as well as very tight integrations with other critical systems like ERP, MES, and SCM.
While there's a large volume of data in these legacy systems, there's plenty of opportunity for design reuse, but that can often be difficult to manage. Additionally, migration projects involving PLM often last months or even years and can be very expensive. Any PLM solution running on AWS needs to account for the three core PLM pillars: digital twins, digital threads, and the so-called digital fabric.
AWS helps our customers establish various digital twins that represent many aspects of product definition. Some of these are more obvious, like 3D product representations, but others may involve requirements-based digital twins that capture how the product should behave as well. PLM and engineering digital twins are established by hosting partner solutions as well as by using storage and database services to build engineering digital twins that support the various solutions that exist. Just as an interesting side note, Boeing was one of the pioneers in this digital twin space. Back in the early 2000s, this began with the so-called virtual airplane project, so they definitely continue to innovate down this path.
The product data and lifecycle management thread represents the connectivity between the various data sources. These threads are what deliver a richer set of information for decision making and aid overall processes for more efficient and effective outcomes, allowing for interaction with disparate data sources in real time with appropriate context. The third pillar for product data and lifecycle management involves the ecosystem of the so-called digital fabric. This builds upon the development data and digital twins with interconnectivity through the threads and adds stakeholders from across the network. Global connectivity and availability is critical in leveraging and maximizing the skills and resources required to deliver today's innovations.
AWS Modernization Approach: Customer Obsession, Automation, and Iterative Value Delivery
R&D is at the heart of most manufacturers. Market pressure to deliver increasingly complex products faster has businesses exploring new methods to keep pace and unlock new revenue streams. However, traditional R&D infrastructure and the complex integrations built on top of that infrastructure barely keep up with the existing product development work streams.
This infrastructure is inflexible, poorly utilized, and can be managed in year-long cycles. Increasingly, customers are looking to AWS and our partners for best practices around R&D IT infrastructure modernization and application modernization to enable outcomes like agile engineering, multi-disciplinary optimization, concurrent engineering, and smart products.
The AWS approach to modernization can help our customers overcome these perceived barriers. Our approach starts with being customer obsessed: understanding the needs of our particular customer and working with other customers in the marketplace. Then we work backwards to break down these complex problems into smaller, more manageable components. We obviously also need to focus on building solutions and architectures that scale appropriately with our customers, accounting not only for architecture but also for operational complexity, for example by facilitating things like self-service. By leveraging automation tools like CloudFormation and Terraform, we can ensure consistency, reliability, efficiency, and quality by reducing the potential for manual mistakes.
Lastly, we also prioritize speed and agility over perfection. The goal is to deliver customer value rapidly and iteratively rather than having a perfect solution. This often means focusing on what we can do within our given span of control, or as I'm known for saying, we can always make this more complicated later. With that, I'd love to hand it over to Jim.
Boeing's PLM Landscape: 3D Experience and Legacy On-Premises Infrastructure Challenges
Thanks, Justin. My name is Jim Gallagher. I'm the lead architect for 3D Experience at Boeing. First, maybe a blurb about what PLM is. PLM is Product Lifecycle Management: the industry concept of a womb-to-tomb system for managing product engineering, manufacturing, and support data. It includes CAD, CAM, CAE simulation, tooling, planning, change management, and configuration management, representing the as-designed, as-planned, as-built, and as-supported lifecycle states for all systems: mechanical structures, electrical, hydraulics, HVAC, tooling, ground support equipment, and so on.
The goal is a digital twin of each product we manage at any given lifecycle state, along with all the data and analysis leading to and supporting the management of that state. PLM systems are used across all Boeing product lines. Our next generation PLM system at Boeing is Dassault Systèmes 3D Experience. Boeing is investing in enhancements in 3D Experience to best execute Boeing design and manufacturing processes. 3D Experience is currently in production for several dozen smaller programs with several thousand users. 3D Experience will be the PLM system for the next new commercial airliner.
As Justin mentioned in the introduction, legacy on-premises infrastructure management presented common challenges. Infrastructure capacity is based on yearly budget and capital acquisition cycles. Management of limited resources requires process: assemble a hosting package, request, review, approve or reject, provision. All this takes time and means competition between projects for resources. Projects may not get what they need when they need it. There are different cost models for different services. Chargeback models must be developed and maintained, including which services charge back to projects and which are peanut butter spread across the enterprise.
Often, there's a significant lag between approval and provisioning because of handoffs and work queues. Once in hand, legacy managed infrastructure has additional drawbacks for both the project and the enterprise. There's significant incentive for projects not to give back if they think they might be able to reuse. Chargeback models do not always reflect actual or proportional cost of the infrastructure. It's difficult to right size.
Expanding often means a request approval cycle. Shrinking means a project may not get it back later. So PLM to AWS.
Migration Strategy: Cloud Opportunities, New Responsibilities, and Desired Outcomes
Cloud opportunities. Boeing leadership saw opportunities in AWS to enable application teams to move faster and save money. We created an enterprise internal cloud. The enterprise worked with AWS to create network-segregated regions in AWS GovCloud for Boeing that appear to be on the Boeing network. This greatly simplifies application access and systems integrations.
Empower the team. Turn on the AWS accounts, enable the architects to design and the operations team to build. But why is it taking so long? Our app has a large and complicated footprint. It was not as easy as it seemed at first.
Rehosting and reaping benefits requires time for analysis, learning, and iterative experimentation. We established an engagement with AWS ProServe to assist and guide our PLM team. Beware of the devil you don't know. On-premises was slow and inefficient, but the processes and patterns were well known. There were many divisions of labor and hosted services.
Legacy infrastructure included OS admin support, systems integrator support for server tracking and compliance, backups, network attached storage, load balancing, OS blockpoint and patching. Our new responsibilities, skills, and AWS services. We needed to take care of those things ourselves. The vast majority of those legacy managed services were not available to us in AWS. There was no AWS experience on our team.
Success depended on working closely with DSO and AWS ProServe. It was clear we would need to work closely with DSO infrastructure architects and AWS ProServe to establish new architectural patterns to best meet our objectives. What were the migration outcomes that Boeing was looking for? Accelerate infrastructure operations, time to provision, time to decommission, time to install and upgrade the application.
Avoid bureaucracy, no approval or request cycles. AWS proficiency. Ops and architecture need to understand what can be done and what should be done and be able to execute. Avoid dependencies. No waiting on other teams to provision what is already approved or needed, or troubleshooting.
Reduce cost. Enable elastic compute. Don't provision for the peak, provision for the baseline. Implement processes and tools to monitor and turn off VMs not being used. Network storage billed by consumption, not by provisioned capacity. Improve configuration management via infrastructure as code. Reduce variation between servers and simplify provisioning.
The next two desired outcomes were not specific to AWS, but we used the migration as a forcing event to accelerate these additional outcomes. Follow the sun. Leverage AWS account segregation to enable our colleagues in India to provision and manage environments that do not contain export controlled data. Tech insertion. Address OS obsolescence as part of the migration to AWS.
On-premises, we were running Red Hat 7, which is at end of life, and so our newly provisioned systems in AWS are Red Hat 8. Lastly, the Boeing 3D Experience PLM releases are major undertakings with hundreds of people involved. We wanted to avoid impacting release schedules.
So we partnered with AWS Professional Services to bring their experience with other 3D Experience customers and deep AWS expertise to bear on our desired outcomes. We also engaged with Dassault Systèmes. We work continuously with our Dassault Systèmes colleagues, and this project required establishing some new priorities around how to best map the out-of-the-box 3D Experience architecture to AWS infrastructure.
We needed to establish capability priorities. We had a long list of many science projects to work through for different capabilities: infrastructure as code, load balancing, data replication, disaster recovery, and so on. Priorities and minimum viable products for different phases of the path to production were determined. We also had to establish a schedule for which environments would migrate as part of which PLM release. Don't try to migrate everything at once.
The Bureaucratic Nightmare: Manual Processes and Inconsistent Infrastructure Before AWS
Now I will hand it over to Mr. Dan Meyering. Thank you, Jim. Good afternoon, everyone. My name is Dan. I'm a lead systems integrator at Boeing working in PLM. I'm here to start off with everyone's favorite topic, of course: bureaucracy. The doors are locked so you can't run at this point. That's why we put it in the middle.
To set the stage, in order to support 3D Experience, we need a predictable stack: compute hosts, databases, load balancers, network attached storage, file systems, DNS aliases, and so on for each environment. Importantly, the work doesn't end once the resources are allocated. Post-provisioning steps like patching, troubleshooting configuration issues when it's delivered, access configuration, and compliance tasks are all essential. Our goal has always been, like anybody's would be, repeatable infrastructure that allows the application to run reliably.
Before AWS, each component—the VMs, the network attached storage, file systems, databases, and DNS— they all required requests to separate teams, and those teams not only provisioned the resources but also owned ongoing support and many configuration tasks for them. That model meant many handoffs, a lot of emails, coordination, phone calls, and instant messaging, depending on the preferences of that actual support team. On paper, it worked for a long time and it looked structured. But in practice, the coordination costs and delays were significant, especially in hindsight.
Submitting requests was a bureaucratic nightmare. Teams required specific forms and formats. Mistakes in a request could potentially cost weeks of delay. The images on the right of the slides here are actual example snapshots from some of my team's internal instructions. All of them were earned in blood, and all of them exist to try and ensure the request for a specific resource was submitted correctly to minimize delays. The lower left-hand side, that long thing, is actually shorter than reality. I had to cut it there, otherwise it looked awkward. That is a single request for a network attached storage file system for one region exporting to one net group. Typically we could put a couple on one request, but anything beyond that, it's likely to get messed up for some reason.
We spent a lot of time following up, clarifying requirements, and reacting to inconsistent results. That daily overhead obviously diverted engineering time away from delivery and created a cycle of frustration and rework. Our reward for that headache of all that manual model was inconsistent infrastructure: servers that were missized for the workload, network attached storage file systems hosted in the wrong region or exporting to the wrong net group, databases configured slightly improperly, and systems that required bespoke fixes.
Those handoffs increased operational risk, sometimes surfacing as late-night outages, such as when a database listener wasn't quite configured to start up automatically after a maintenance window, and so we would get a production call in the middle of the night. Typically not production, but we would get a call nevertheless to support some environment that went down. It all reduced our ability to iterate or experiment because we had to overprovision to hedge against variability. So with that mess in mind, let's explore what happened when we moved to AWS.
AWS Services Transformation: ALB, EC2, ASG, EFS, and RDS Replace Legacy Systems
Well, a concrete early win was replacing our single Apache HTTP host with Amazon's Application Load Balancer. This is kind of starting out from the user perspective, I guess, when a user hits the link. It allowed us to remove a single point of failure we had in our environments and eliminate the need to maintain complex load balancing configs in Apache across services.
For the most part, the single point of failure was that we were hosting Apache for an entire environment, usually a multi-tier environment, on a single VM which also shared that host with other services. So if that host went down, even if the other services were fine, all access to the application was gone. The ALB, on the other hand, gives us centralized health checks visible in the console, highly available integration with EC2 and auto scaling, support for the application cookies and sticky sessions required by 3D Experience (especially for auto scaling), and it simplifies DNS and the ingress controls.
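To make the pattern concrete, here is a minimal Terraform sketch of an internal ALB fronting a web-tier target group with health checks. The resource names, port, health-check path, and variable references are illustrative assumptions rather than Boeing's actual configuration.

```hcl
# Minimal sketch: an internal ALB fronting a 3DX web tier, replacing a single Apache host.
# Names, ports, and subnet/security-group references are illustrative placeholders.
resource "aws_lb" "plm_alb" {
  name               = "plm-dev-alb"
  internal           = true
  load_balancer_type = "application"
  subnets            = var.private_subnet_ids
  security_groups    = [var.alb_security_group_id]
}

resource "aws_lb_target_group" "web_tier" {
  name     = "plm-dev-web"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    path                = "/health" # illustrative health endpoint
    healthy_threshold   = 3
    unhealthy_threshold = 2
    interval            = 30
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.plm_alb.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-2016-08"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web_tier.arn
  }
}
```

In this shape, losing any single web host only takes that target out of rotation instead of taking down access to the whole environment.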
For compute, EC2 gives us API-driven, right-sized instances. We can select CPU and RAM profiles per service and use AWS recommendations, such as those from Compute Optimizer, to tune the sizing. In addition, by baking our prerequisites into the custom AMIs the EC2 instances are built from and using instance user data for specific bootstrapping, we have reduced the manual setup and sped up instance readiness significantly. Last but not least, the tags on the EC2 instances have also enabled all sorts of metadata and discovery that we did not have on-premises.
An additional note worth mentioning is about the AMIs. For our on-premises instances, there was usually a forced wait period because we had a push security patching model: the further a base image was from its publication and the patching cycle when we consumed it, the longer it would take before the instance was actually ready to use, sometimes up to three hours. That's not great for an ASG; in fact, it effectively makes auto scaling impossible. The pre-baked AMIs have completely wiped that out.
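As a rough sketch of that compute pattern, the following Terraform resolves the latest pre-baked AMI, launches a right-sized instance with bootstrap user data, and tags it for discovery. The AMI naming convention, instance type, template file, and tag values are all hypothetical.

```hcl
# Minimal sketch: a right-sized service host launched from a pre-baked custom AMI,
# with user data for service-specific bootstrapping and tags for discovery.
data "aws_ami" "plm_base" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["plm-3dx-rhel8-*"] # hypothetical AMI naming convention
  }
}

resource "aws_instance" "tier_host" {
  ami           = data.aws_ami.plm_base.id
  instance_type = "m5.2xlarge" # chosen per service profile
  subnet_id     = var.private_subnet_id

  user_data = templatefile("${path.module}/bootstrap.sh.tpl", {
    service_name = "3dspace" # hypothetical service identifier
  })

  tags = {
    Environment = "plm-dev-01"
    Tier        = "web"
    ManagedBy   = "terraform"
  }
}
```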
Similar to the EC2 auto scaling I just mentioned, the ASGs, the auto scaling groups, have let us scale services in and out automatically or manually. Our pattern uses user data scripts and pulls the binaries from the EFS that's attached to every single host. It runs a startup bash script, starts the services, does a little finagling and grepping sometimes, and it just works seamlessly. This approach supports scaling driven by application metrics. CloudWatch can trigger a scale event on either Java metrics or user counts, and it doubles as a recovery mechanism as well because launch artifacts are stored on the EFS, which is replicated to the DR region.
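A minimal sketch of that auto scaling pattern might look like the following, assuming hypothetical names and, for simplicity, a CPU target-tracking policy in place of the custom Java or user-count metrics mentioned above.

```hcl
# Minimal sketch: an Auto Scaling group whose instances bootstrap from a user data
# script that mounts the shared EFS and starts the service, plus a simple
# target-tracking policy as one way to scale on a CloudWatch metric.
resource "aws_launch_template" "service" {
  name_prefix   = "plm-3dx-service-"
  image_id      = var.base_ami_id # pre-baked AMI with prerequisites installed
  instance_type = "m5.xlarge"
  user_data     = base64encode(file("${path.module}/start_service.sh"))
}

resource "aws_autoscaling_group" "service" {
  name                = "plm-dev-3dx-service"
  min_size            = 1
  max_size            = 4
  desired_capacity    = 1
  vpc_zone_identifier = var.private_subnet_ids
  target_group_arns   = [var.web_target_group_arn]

  launch_template {
    id      = aws_launch_template.service.id
    version = "$Latest"
  }
}

resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "scale-on-cpu"
  autoscaling_group_name = aws_autoscaling_group.service.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 60
  }
}
```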
So speaking of EFS, it has replaced our legacy network attached storage. No more of those long request forms that always got messed up at one end or another. It's API driven, automatically scales, and supports per-environment volumes, so teams and environments no longer share a single file system due to scarcity, and we don't have to try to anticipate capacity needs. That also means we're not paying for storage we might have needed, though that typically wasn't the problem; we usually ran out of space.
On-premises, this was always a headache due to limited network attached storage sizing. Anytime we went over two terabytes in a request, for some reason it became a slightly different request process. All of this added up to an unreasonable amount of overhead to provision storage: for every request we had to decide repeatedly what its purpose was, what region it was going to, which environment, which permissions, and what would happen when we ran out of space. Inevitably we did run out of space, so we would then have to rob space from other environments. But now EFS has significantly reduced that headache.
It's reduced the blast radius of concern for that network attached storage. It's vaporized, it's gone. There's no concern anymore, and it's removed all the repetitive decision making we used to make, all while avoiding resource contention as well, and we're not paying for unnecessary storage.
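For reference, a per-environment EFS file system in Terraform can be as small as the sketch below; the tags, subnet list, and security group variable are placeholders.

```hcl
# Minimal sketch: a per-environment EFS file system with a mount target per subnet,
# replacing shared legacy NAS.
resource "aws_efs_file_system" "env_share" {
  encrypted = true

  tags = {
    Environment = "plm-dev-01"
    Purpose     = "3dx-binaries-and-shared-data"
  }
}

resource "aws_efs_mount_target" "env_share" {
  for_each        = toset(var.private_subnet_ids)
  file_system_id  = aws_efs_file_system.env_share.id
  subnet_id       = each.value
  security_groups = [var.efs_security_group_id]
}
```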
For databases, we now use RDS. It provides a managed database service with high availability across multiple availability zones, automated backups, point in time recovery, read replicas, snapshots, you name it.
We can import Oracle dump files from S3 for priming new databases without repetitive DBA involvement. We didn't really gather metrics on how much it saved the DBA's time as well, but there's definitely been a lot less interaction with them, so they're freed up to do all sorts of stuff, as are all the various teams that I mentioned that we used to send a request to. Surely we're not interfacing with them nearly as much, so they're working on other things, no doubt.
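A hedged sketch of what a managed database for one environment might look like in Terraform follows; the engine edition, instance class, storage, and credential handling are illustrative assumptions, not the actual production configuration.

```hcl
# Minimal sketch: a managed Oracle database for a 3DX environment with Multi-AZ
# and automated backups. Values are illustrative, not Boeing's configuration.
resource "aws_db_instance" "plm" {
  identifier              = "plm-dev-01-db"
  engine                  = "oracle-se2"            # assumed edition for the sketch
  license_model           = "license-included"
  instance_class          = "db.m5.2xlarge"
  allocated_storage       = 500
  multi_az                = true
  backup_retention_period = 14
  username                = "admin"
  password                = var.db_master_password  # sourced from a secret store in practice
  db_subnet_group_name    = var.db_subnet_group
  vpc_security_group_ids  = [var.db_security_group_id]
  skip_final_snapshot     = false
}
```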
So to sum up our new infrastructure: we now have resources that are quick to provision as well as to tear down. The infrastructure for each environment is consistent, right-sized to specific resource needs, and performant, with EC2 and RDS types tailored to the actual workload, as well as being scalable and automatable.
Infrastructure as Code with Terraform: Templates, Parallel Deployment, and 60% Time Reduction
So how has that enabled us to accelerate our operations? Well, obviously AWS exposes infrastructure through APIs, and Infrastructure as Code turns those APIs into well-defined code by consolidating all those docs and instructions and manual steps into Terraform code bases. We've achieved consistency across all the EC2s, EFSs, databases, DNS, load balancers, and other related services. The benefits are faster deployments, improved reliability, consistency, and repeatability.
So zooming out a little bit on the Infrastructure as Code, we're using Terraform. We started off with CloudFormation, and it was great, but the enterprise was standardizing on Terraform and we wanted to use their modules, so we're using Terraform. Our code lives in GitLab. A GitLab runner executes the pipeline to plan intended changes, and we use approval gates and validations to ensure expected outcomes before triggering the apply stage. The CI/CD approach enforces peer review, creates an auditable change history, and replaces all sorts of manual provisioning steps with an automated, repeatable flow.
Zooming in a little bit further, to standardize outcomes, we created environment templates. Essentially we have small, medium, and large templates, and we have monolithic, where all the services are hosted on one EC2, essentially a dev box. Each environment template defines different baseline EC2 sizing, RDS classes, LDAP bootstrap settings, DNS aliases, ASG configuration, volume sizing, and load balancing configurations. The templates have reduced the overhead and ensured deployments match the intended performance and scale profile, stuff that we used to have to research. We did not have standardized sizing on-premises. Instead, previously we would effectively find a close-match environment, then try to provision architecture based on that close match, which was a heavily manual and investigative process. A lot of that was downstream of the inconsistency in some of the infrastructure we got; we'd have to rob stuff or patch things.
So zooming out a little bit, in the top left there's an example of a small; that's the template I was just referring to. These environment definitions provide a minimal set of parameters for an effective environment size. Our Terraform 3DX core module then forms the scaffolding for all resources in AWS, and it iterates through the definitions of, say, the small environment to deploy the specific configurations that environment needs for all required resources. Then at the build level for the environment, we source the core module using smart defaults to remove repetitive manual entry while still allowing for overrides, such as pinning an older AMI when we need to reproduce an issue from production on a dev box, or overriding the Route 53 DNS alias for the environment, little things like that.
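The template idea can be sketched in Terraform roughly as follows, with named sizing profiles feeding a shared core module and per-environment overrides layered on top. The module path, variable names, and sizing values are hypothetical, not the real 3DX core module interface.

```hcl
# Minimal sketch of the template idea: named sizing profiles feeding a shared core
# module, with per-environment overrides such as pinning an older AMI.
locals {
  env_templates = {
    small = {
      web_instance_type = "m5.xlarge"
      db_instance_class = "db.m5.large"
      asg_max_size      = 2
    }
    medium = {
      web_instance_type = "m5.2xlarge"
      db_instance_class = "db.m5.2xlarge"
      asg_max_size      = 4
    }
  }
}

module "plm_dev_01" {
  source = "../modules/3dx-core" # hypothetical module path

  environment = "plm-dev-01"
  sizing      = local.env_templates["small"]

  # Optional overrides on top of the template defaults:
  ami_id    = var.pinned_ami_id # e.g. pin an older AMI for issue reproduction
  dns_alias = "plm-dev-01.example.internal"
}
```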
So now that all our infrastructure is defined by code, we can deploy multiple environments in parallel without sequential bottlenecks that we used to see. This ability dramatically shortens wait times for teams needing fresh environments and supports a more agile development cadence. The benefit of this beyond just automation cannot be overstated. Previously someone on the ops team would be bogged down for a day or a week deploying just one or a small handful of environments due to the manual effort and coordination that was required.
And because of the pipeline and the modular nature of the code, we can easily deploy one environment or 10 environments or conceivably hundreds of environments in parallel in only a few hours. On top of that, most of that time is passive, so someone on the ops team who's deploying these things can just go and work on other more important tasks instead of just watching it cook.
Anecdotally, I've gone out and pushed the limits. I tried to build a multi-tier environment as big as it could be, and I ran out of IPs, so I had to bring that down, but that was not the fault of the Terraform and the Infrastructure as Code; the account simply ran out of IPs because it's a small test account. The ops team and I have also easily deployed 80 dev boxes in under an hour or so, all at the same time. Okay, so that said, we'll zoom out to the highest level we have now, with the AWS 3DX core Terraform codebase at the lower level.
Within that, we also have multiple repos to separate the concerns for the PLM infrastructure across multiple AWS accounts. We have an AMI account and an AMI repo with Chef cookbooks that produce an AMI in our AMI pipeline. The pipeline dumps that AMI into the AWS account for AMIs, it goes through some testing and trials, and once it has passed muster, we share it with our other accounts and pin it into the latest version of our 3DX core infrastructure. Of course we have the 3DX core infrastructure repo itself. We have a PLM environment repo for the per-environment definitions, which refer to the core. And we have a prerequisites repo to standardize accounts, IAM profiles, roles, et cetera, that all the accounts just have; it's largely one and done, with some tweaks over time as we learn things.
Zooming out further, this is the broad architecture for our entire pipeline. The enterprise standard AMI feeds into our AMI pipeline; that's in the bottom left-hand corner there. Ours layers on the pieces for 3D Experience, and that is then referenced by our AWS 3DX core, which is literally at the core of this image. That core also refers to enterprise Terraform modules for all the resources I just mentioned in order to maintain IaC standards across the company.
Okay, so that's all infrastructure. We've been talking infrastructure this whole time. Let's get to deploying 3D Experience. For configuration management, we use Ansible for the deployment of 3D Experience. This image right here is an actual normalized inventory document for one of our environments, what it might have looked like in yesteryear. But now we have dynamic inventories driven by the resource tags we put on the EC2s, et cetera. So there's no manual copy and paste of host names, which obviously caused all sorts of issues. Anytime a human is interacting, there are going to be problems.
Beyond that, with all the spare time we have, we've started automating a lot more of the infrastructure, the dynamic inventory, and so on. Since we're not laboring with that paperwork anymore to wire infrastructure together manually, we've been able to refactor all of our Ansible playbooks into roles, turn linear steps into roles, and streamline even our inventory. I think our inventory is even smaller than that now; it's down to about four lines. This makes the Ansible scripts simpler, deployments more reliable, and has positioned us to integrate application deployment into CI/CD more tightly, with fewer manual touch points. In fact, that dynamic inventory file is now dynamically generated as well, so it's down to about one human interaction.
Okay, so we're still working on deploying. Let's look at what the deployment pattern of 3D Experience looked like on-prem historically. To conserve the on-premises resources, we allocated multiple 3D Experience services on a single host that forced serial, brittle installs.
Only one service could be installed at a time on a host, and a single problematic service could affect the others, and often did. What's more, our Ansible playbooks had to account for many deployment patterns, special cases, and the variability of the infrastructure resources, thereby becoming increasingly complex to maintain.
Those conditions resulted in inconsistent deployments between environments, making each environment kind of feel like a game of Tetris, especially if there's a deadline and we need to get it up and running soon. You're just slamming in pieces, and by the time it gets filled to the top, you see all the little blocks you missed and you're like, oh, I'll deal with that later. And we did, usually at 2 a.m. In short, or just as a frame of reference, the full deployment of an application under ideal conditions was about 9.5 hours for a single environment, and that was on-premises. I did it again. I clicked the thing. Oops, let me go back.
However, now in AWS with our right-sized hosts and AMIs that bake in the prerequisites, we deploy each service to its own host, very fancy, and we run the installs in parallel. Pre-baked AMIs along with parallel deployment patterns have shortened install time by over 60%. It's down to about 3.5 hours, and that was simply the potential opened up by moving from on-premises onto AWS; it wasn't the deployment of the application itself that changed. And the combined result, obviously, is faster, more predictable deployments and simpler automation. Our playbooks no longer need so many special cases to handle ad hoc configurations or infrastructure robbed from one environment or another.
Challenges, Lessons Learned, and New Opportunities in the AWS Migration Journey
But moving to AWS was not without its challenges. Of course, it did introduce a learning curve across cloud services. Like Jim mentioned, we didn't really have any experience with AWS, Terraform, et cetera. So Terraform, GitOps, CI/CD, systems admin, database admin, compliance: these are all roles that were once handled by other teams but are now our responsibility, so we did have to take that on. I learned that it's a shallow learning curve, not a steep one; I've always referred to it as a steep learning curve, but it just took a little while.
A major risk we encountered, another challenge, was the blast radius when learning about infrastructure as code with pipelines; it's come up a lot. We started off adding a bunch of environments to a single main Terraform file. However, when we wanted to alter one of those environments in that collection, Terraform would nevertheless still need to verify the plan and the state file against the other environments in that same commit, which is not what we wanted. So we started separating the environments and segmenting them into their own branches and smaller commits.
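One common way to get that isolation, sketched below under assumed names, is to give each environment its own state key (or its own root module) so a plan for one environment never has to walk every other environment's resources.

```hcl
# Minimal sketch of state isolation: each environment gets its own backend key, so a
# plan for one environment no longer touches the state of every other environment.
# Bucket, key, and region are placeholders.
terraform {
  backend "s3" {
    bucket = "plm-terraform-state" # hypothetical state bucket
    key    = "environments/plm-dev-01/terraform.tfstate"
    region = "us-gov-west-1"
  }
}
```

Because backend blocks cannot reference variables, this is typically wired up with per-environment backend config files or -backend-config arguments in the pipeline.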
Another challenge was that Terraform will do, like any code I suppose, exactly what you codify. So critical resources can be accidentally destroyed without proper guardrails, hence the approval gates that we added around the Terraform plan. Ironically, on-premises, the hurdle to creating all those resources was also the bulwark against accidentally destroying them. But we've learned a lot.
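Alongside pipeline approval gates, Terraform itself offers a code-level guardrail; a minimal sketch:

```hcl
# Minimal sketch of a code-level guardrail: Terraform refuses any plan that would
# destroy a resource marked prevent_destroy, complementing pipeline approval gates.
resource "aws_efs_file_system" "critical_share" {
  encrypted = true

  lifecycle {
    prevent_destroy = true # any plan that would destroy this resource fails outright
  }
}
```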
One final challenge was the subtleties of the ASGs, auto scaling groups. We learned this the hard way. If a replacement process was not suspended and we stopped an EC2 in that ASG, then, you know, whoopsie, the ASG would do exactly what it's configured to do and destroy that EC2. That functionality was not what we were ready for, though we deployed it with the intent of getting there. Lesson learned, suspend certain auto scaling processes that you actually don't want active unless you're ready for them.
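That lesson maps to the suspended_processes setting on the Auto Scaling group; a minimal Terraform sketch with illustrative names:

```hcl
# Minimal sketch: suspending the ASG processes that replace or rebalance instances,
# so a manually stopped EC2 in the group is not immediately terminated and replaced.
resource "aws_autoscaling_group" "service" {
  name                = "plm-dev-3dx-service"
  min_size            = 1
  max_size            = 4
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  suspended_processes = ["ReplaceUnhealthy", "HealthCheck", "AZRebalance"]
}
```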
And then there were some bleeding-edge headaches with 3D Experience itself being on AWS. On-premises we couldn't do auto scaling, so we never had to configure the cookies in special ways for 3DX itself. But now we had to tune the application load balancer's cookie persistence to support scale-out behavior for 3DX services, and we adjusted database practices because RDS enforces more granular permissions instead of the on-premises grant-all that we were using. A little bit wild west, but RDS won't allow that; it's best practice not to do it, and RDS ensures you follow best practices. And of course, all these fixes and lessons learned are documented in the DS 3DX knowledge base now. DS gets that stuff in there real quick so other customers don't have to run into the same issues.
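Cookie persistence for scale-out can be expressed on the ALB target group; a minimal sketch, with the cookie name as a placeholder rather than the actual 3DX session cookie:

```hcl
# Minimal sketch: application-cookie stickiness on the ALB target group so scaled-out
# 3DX services keep users pinned to the instance holding their session.
resource "aws_lb_target_group" "threedx_service" {
  name     = "plm-dev-3dx-svc"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  stickiness {
    enabled     = true
    type        = "app_cookie"
    cookie_name = "JSESSIONID" # illustrative; use the cookie the application issues
  }
}
```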
And of course, with every challenge, the obverse of that coin is that there are new opportunities. We have already been seeing many benefits, and we continue to evolve. We're collecting metrics for components such as 3D Space Index, which basically indexes all the search data so people can type things and they come up. We're fine-tuning things to refine trade-offs between indexing time and database sizing and enabling increasingly granular optimizations with the EC2s and all that.
Additionally, our team has greatly broadened its own skill set, now having been freed from the shackles of bureaucracy. Our time is better spent on creative tasks now rather than repetitive tasks. Additionally, operationally, we're moving toward running Ansible playbooks through CI/CD pipelines and plan to try out a pull model to improve deployment performance even further. For disaster recovery, just going down the list of opportunities, we are evaluating a smaller RDS class for the read replica in the recovery region because the replica is not actually serving user traffic, so it just has to keep up with the big boss, the primary instance in whatever region that's in.
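A hedged sketch of that DR replica idea in Terraform, assuming a provider alias for the recovery region and placeholder identifiers (actual replica support depends on the database engine and edition in use):

```hcl
# Minimal sketch: a cross-region read replica in the DR region using a smaller instance
# class than the primary, since it only needs to keep up with replication rather than
# serve user traffic.
resource "aws_db_instance" "dr_replica" {
  provider            = aws.dr_region          # provider alias for the recovery region
  identifier          = "plm-prod-db-replica"
  replicate_source_db = var.primary_db_arn     # cross-region replicas reference the source ARN
  instance_class      = "db.m5.large"          # smaller than the primary
  skip_final_snapshot = true
}
```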
We're also working toward enabling developers to trigger a kind of build, deploy, test, destroy workflow from a CI/CD pipeline or dashboard in order to streamline and accelerate the development and test cycles. Similarly, we're working toward a build, deploy, test, destroy workflow for our own team so that playbooks are constantly tested to failure, fixed, then merged back in for all developers to enjoy. Okay, so what's the outcome of basically everything I've been talking about? We've got some bar charts over here, and those are always fun.
Measurable Impact: 99% Faster Deployments, 78% Fewer Manual Tasks, and Significant Cost Savings
The impact of everything has been measurable, and immeasurable in a lot of ways. The manual tasks and touch points for a single environment, the left-hand bar chart there, have dropped by 78%. That was 75% a few weeks ago, and now it's 78%. That's a lot, reducing opportunities for error and tribal knowledge. A full environment build as well, the right-hand bar chart, that giant tower versus the little sliver at the bottom to compare: the infrastructure provisioning, the configuration, and the application install now complete in roughly five hours versus up to 30 days. That's a 99% improvement for a single environment.
Scale that out to potentially hundreds of environments, and you start to see the picture. That was all time people spent bogged down on mundane, repetitive tasks as well; they've freed up all that potential. What's more, the remaining 22% of manual tasks and touch points are just much simpler. We can now deploy potentially hundreds of environments in parallel instead of spending days or weeks on a single build. Fewer handoffs mean fewer errors and much faster time to value.
Some of the benefits are more difficult to gather metrics for, but anecdotally we've gone from an operations team that spent a lot of their time stuck in the mire of paperwork and then putting out fires or fielding support tickets for the rest of the time to now vanishingly few support tickets and no 2 a.m. fires to put out in my recollection, at least in the past 18 months. We're spending the majority of our time deploying environments, developing our tools, and experimenting with new services or strategies. I mean, there's a panoply of services in AWS to use, and we haven't even scratched the surface at this point. Most of our time is spent more valuably now. Quite frankly, we are a whole different team than we were two years ago.
This kind of feeds into the roadmap and the future, some pie-in-the-sky stuff that is now kind of in work with this newfound spare time. We've implemented dynamic inventories and are refactoring playbooks into roles. Our current work streams include moving toward an Ansible pull-based configuration model, as I mentioned, integrating Ansible into CI/CD so developers can self-provision environments, optimizing auto-scaling group metrics, and optimizing deployments by varying the EC2 and RDS classes within an environment. That can be used for deployment performance, for instance. If we just want to get something deployed quickly, we can just scale all the resources up, get it deployed, get over that hump, works as a catalyst, and then we just down-regulate the sizing to whatever the baseline is we expect for the user.
Services can scale themselves out by demand, hopefully enabling AMI rotations without data loss. Not all the services can be hosted in Auto Scaling Groups, which would have otherwise simplified the rotation of the AMIs. These efforts focus on making the experience for developers and users self-sufficient or self-serve and resilient. Next steps include quantifying the improvements in support load and continuing to expand self-service.
Getting to setting the sun on my portion here, in order to support our follow the sun operating model, we have established distinct accounts and IAM boundaries so teams have the privileges they need while limiting exposure. This enables distributed teams to work around the clock and respond faster to incidents. Equally important, dev environments are now cheap and disposable. Developers can rebuild instead of hoarding. They can shut down, start up, restart, and save money when not in use. Changes like those reduce support calls, late night wake-ups, and improve developer velocity.
Overall, we're getting closer to a complete cattle, not pets operational model for both our infrastructure as well as our dev environments. On a personal note, we would not be here if it weren't for the professional services team. They came on board. Aaron Brown, Justin Iravani, who I have the privilege of sharing the stage with, they came in and they walked our poor souls through the valley of death. We have no idea what it would have looked like without them. It definitely would have been more expensive, messier, and not the high quality that it actually is now and keeps getting better. I'll end it there. I'll hand it back to Jim now. Thank you for your time.
All right. PLM operations controlling costs. Migrating to AWS services enabled us to get precise usage-based cost. Now we only pay for what we use. Compute, network, storage, billed by consumption, not billed by capacity. Operating expenses, we got out of the capital budget cycles. We're no longer purchasing infrastructure annually, but instead we get a monthly bill for our infrastructure. Infrastructure stopped when not used. If we're not using it, we just turn it off, which simultaneously stops the cost.
By leveraging EC2 APIs, the team also developed a self-service webpage where developers can start and stop their VMs without AWS privileges. Right-sized infrastructure. Because of all the various EC2 instance sizes, we're able to provision EC2 sizes that are right for each app service and the right quantity for the environment size. We no longer provision for the peak; we provision for the baseline but plan for the peak via auto scaling, which Dan talked about.
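One way such self-service access could be scoped, sketched here with hypothetical tag keys and policy names, is an IAM policy that only allows starting and stopping instances carrying a designated tag, which a self-service page's backend role could use:

```hcl
# Minimal sketch: an IAM policy permitting start/stop only on instances tagged for
# self-service, plus read-only describe access. Tag key and values are illustrative.
data "aws_iam_policy_document" "self_service_start_stop" {
  statement {
    effect    = "Allow"
    actions   = ["ec2:StartInstances", "ec2:StopInstances"]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "ec2:ResourceTag/SelfService"
      values   = ["true"]
    }
  }

  statement {
    effect    = "Allow"
    actions   = ["ec2:DescribeInstances"]
    resources = ["*"] # DescribeInstances does not support resource-level restrictions
  }
}

resource "aws_iam_policy" "self_service_start_stop" {
  name   = "plm-self-service-start-stop"
  policy = data.aws_iam_policy_document.self_service_start_stop.json
}
```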
Infrastructure can be divested. It's super easy for us to decommission things we're not using, just terminate it. We don't need to worry if we need it again because we have the power to provision. Latest infrastructure. We update to the latest EC2 and RDS types, which are usually less expensive than the old ones, just by shutdown, change type, and restart as the new types become available.
Time for improvement. We have moved away from the heavy processes that Dan talked about, enabling more time for continuous improvement. At the end of the day, real savings vary depending on environment type. Monolithic developer sandbox servers with all the services on one host are approximately the same cost between on-prem and AWS, and that's taking into account the Boeing chargeback model. However, disaster recovery enabled production environments are less than half as expensive because of right sizing and auto scaling.
The more servers associated with the environment, the more savings we were able to realize with the move to AWS. Now, let's talk about the actual business outcomes that all this new technology enables. Accelerate operations: major improvements in time to provision and decommission realized. Avoiding bureaucracy: a 90% reduction in requests and approvals needed. Some requests are still needed, mainly around networking, subnet provisioning, routing, information security, and managed firewalls.
AWS proficiency. Our operations team has done a tremendous job assembling the skills to operate and automate infrastructure in AWS. Follow the sun. We got a couple of bullets out of order here. I'm going to come back to that one. Avoid dependencies. No waiting on other teams to provision what is already approved or needed. Still, some dependencies exist for troubleshooting the areas that belong to other teams, such as networking and firewalls.
Cost reduction. We have realized significant savings. Improving configuration management via infrastructure as code. As Dan described, we are fully invested in infrastructure as code and reaping the benefits. The next two outcomes were not specific to AWS, but we used the migration as a forcing event. Follow the sun. We successfully enabled follow the sun for PLM operations.
Tech insertion. OS and CPU obsolescence, both realized. And lastly, maintain release schedule. Early on, we established that our PLM release dates anchored our AWS migration activities. We respond to the PLM schedule, not the other way around. We maintained this principle and it has worked well for us. And now, back to Justin.
Key Takeaways and the Future of PLM: Generative AI and Agentic Workflows
Great. So, thanks Jim. So let's talk about some of the takeaways. So as you just heard, by leveraging AWS services and partnering with AWS ProServe as well as using the AWS ways of working, the Boeing PLM team was able to greatly increase their speed to execute as well as their operating flexibility, having time to do more things. We heard they were able to reduce the application deployment time by more than 99% end to end, which is really quite a tremendous accomplishment.
They were able to gain those consistent environments, and developers obviously love that consistency. Because of the automation and infrastructure as code, they were able to cut down on that tribal knowledge, so the time to first commit for a new team member is basically within a day; you can be deploying these environments. Also, there are no longer handoffs between teams, between the DBA team, the storage team, and so on.
By right-sizing their resources and being able to start and stop those instances, they were able to reduce their overall infrastructure costs by more than 40%. I'm sure all of your CFOs would love that as well. By leveraging the latest in AWS compute and networking services, the 3DExperience implementation is snappier, definitely snappier as described, and again, a lot more consistent. So in terms of the overall performance, it's been a lot more stable for the developers to get in and use their environments.
Boeing also gained new features by enabling AWS services such as easy data replication across regions for disaster recovery and auto scaling. A wise man once told me that's the holy grail for the 3DExperience application. So, now that we've talked about how modernization led to real business outcomes for Boeing, let's talk about what the future of PLM looks like.
So, PLM systems are about building high quality products. The more time teams spend focused on problem solving, as compared to being down in the minutiae doing a lot of tasks, the better.
This is really where generative AI and agentic AI come into the picture. Using these cutting-edge tools, we are able to do things like ask the PLM natural language questions. For example, what material is Part A made of? Why is Part B 5 millimeters and not 6 millimeters? Who are the approved suppliers for Part C? Being able to ask the PLM these questions provides a lot of benefits for the developers there.
Furthermore, by integrating agentic AI into the day-to-day workflows, we can do things like put an agent in the publishing workflow and check for incompatible materials, cutting down on rework. Additionally, we can also do real-time bill of materials analysis and provide real-time change recommendations. We can also do natural language operational activities. So for example, if I get a new vendor rather than having a heavyweight process, I can just say in Slack, "Hey, PLM, add this user to this service."
One of the device teams within AWS has been piloting an AI-powered PLM platform which leverages AWS services such as Amazon Bedrock and Amazon Q. Some of the initial feedback is very, very promising. The feedback from a variety of roles, from sustainability scientists to product design engineers, is that the generative PLM platform lets them innovate by cutting down on the amount of time they need to spend looking for things, really streamlining knowledge discovery, and that saves tons and tons of time for them.
It accelerates workflows by automating repetitive information gathering tasks, so you can set up a little agent every day to aggregate information on your behalf. This allows the teams to focus on productivity and higher-value activities like innovating and problem solving we talked about, and this leads to increased productivity. It also enhances decision making by providing relevant contextual information grounded in organizational knowledge, resulting in more informed decisions that align with business goals and priorities.
It also improves collaboration by making knowledge more accessible and discoverable across teams. Obviously, it's a big deal to have that information flow happen. This fosters a culture of knowledge sharing and drives continuous learning and success outcomes. As AI services improve, these types of tooling will become more impactful, and we're just getting started.
So if you're interested in exploring the future of PLM with AWS, please reach out to your account team. Thank you all for coming. It's been my honor to present our journey to you here today, and I look forward to exploring the future with you. Thank you all so much. And please remember to fill out your session survey so that Dan and Jim can get invited back next year and we can see what they did. So thank you all.
; This article is entirely auto-generated using Amazon Bedrock.