Session: WPS201 - AWS re:Invent 2025
This comprehensive session explores how to migrate complex enterprise and federal workloads to AWS without disrupting critical missions or compromising security controls. The presentation combines core migration patterns, AWS Well-Architected guidance, and real-world customer journeys from the National Reconnaissance Office (NRO) and Fannie Mae.
Why large migrations are hard
The session opens by revisiting familiar migration drivers: lowering compute costs and increasing innovation velocity as organizations move from on-premises environments to rehosted, replatformed, and cloud-native architectures.
The 7 Rs of Migration
The speaker frames migration strategy in terms of the "7 Rs," emphasizing that large-scale programs almost always use a mix of these approaches rather than a single pattern:
| Strategy | Description | Best For |
|---|---|---|
| Rehost | "Lift and shift" to cloud | Quick migration, minimal changes |
| Relocate | Move to different infrastructure | Hypervisor-level migrations |
| Repurchase | Move to SaaS solutions | Replacing custom applications |
| Retain | Keep on-premises | Compliance or latency requirements |
| Retire | Eliminate unused applications | Reducing technical debt |
| Replatform | Minor cloud optimizations | Moderate benefits with low risk |
| Refactor | Redesign for cloud-native | Maximum cloud benefits |
⚠️ Key Message: Tooling alone is not enough, especially when dealing with diverse legacy systems, complex databases, and tight budgets. Organizations with limited cloud experience must invest early in automation, governance, and repeatable mechanisms for account management.
Foundational patterns: MRA, phases, and landing zones
The session recommends using an AWS Migration Readiness Assessment (MRA) as an on-ramp for any large program. The MRA process helps:
- Inventory applications and discover dependencies
- Determine appropriate migration strategies per workload
- Avoid one-off, unstructured decisions
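To make the dependency-discovery step concrete, here is a minimal sketch of turning an application inventory into ordered migration waves. The application names and the rule "migrate dependencies first" are assumptions for illustration; the session did not describe a specific wave-planning algorithm.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each application maps to the applications it depends on.
dependencies = {
    "reporting-ui": {"forecast-api"},
    "forecast-api": {"loan-db"},
    "batch-export": {"loan-db"},
    "loan-db": set(),
}


def plan_waves(deps):
    """Group applications into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = plan_waves(dependencies)
for number, apps in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(apps)}")
```

In a real program the dependency map would come from discovery tooling rather than a hand-written dict, and a circular dependency would signal applications that need to move together.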
Migration Phases and Tools
| Phase | Focus | Key AWS Tools |
|---|---|---|
| Assess | Discovery & planning | Application Discovery Service, Migration Readiness Assessment |
| Mobilize | Setup & preparation | Control Tower, Landing Zone, Account vending |
| Migrate/Modernize | Execution & optimization | Migration tools, Well-Architected reviews |
In parallel, organizations should establish an AWS Control Tower-style landing zone or "account vending machine" to provision new accounts quickly and consistently, with built-in guardrails for security, compliance, and governance.
Applying the AWS Well-Architected Framework
Throughout the session, the AWS Well-Architected Framework acts as the backbone for design and review. The six pillars are positioned as a way to reason about trade-offs for each workload, not as a checklist to satisfy after migration.
AWS Well-Architected Framework Pillars
| Pillar | Focus Area | Key Considerations |
|---|---|---|
| Operational Excellence | Running & monitoring systems | Automation, procedures, continuous improvement |
| Security | Protecting information & systems | Identity, permissions, data protection |
| Reliability | System availability & recovery | Fault tolerance, backup, disaster recovery |
| Performance Efficiency | Using resources effectively | Right-sizing, monitoring, technology selection |
| Cost Optimization | Delivering value at lowest cost | Resource optimization, pricing models |
| Sustainability | Minimizing environmental impact | Efficiency improvements, renewable energy |
💡 Practical Tip: Ask business stakeholders which pillars matter most for each application. Some workloads may emphasize reliability and operational excellence, while others prioritize cost or performance. These decisions should drive architecture choices, capacity planning, and operational runbooks.
Organizational enabler: a Cloud Center of Excellence
A recurring theme is the importance of a Cloud Center of Excellence (CCoE) for sustained modernization. The CCoE is described as a cross-functional team spanning:
- 🏗️ Cloud architecture
- 🖥️ Infrastructure
- 🔒 Security
- ⚙️ Operations
- 💻 Software engineering
CCoE Responsibilities
Rather than a purely formal committee, the CCoE serves as a pragmatic mechanism for scaling expertise:
- ✅ Standardizing landing zones and network patterns
- 🔐 Defining IAM controls and security policies
- 💰 Establishing cost management practices
- 📚 Codifying best practices and lessons learned
- 🚀 Enabling innovation across business units
Both Fannie Mae and the NRO leveraged CCoEs to significantly increase their "innovation velocity" after completing initial migrations.
Fannie Mae: reimagining financial forecasting
🏦 Fannie Mae Case Study: Financial Forecasting Transformation
The Fannie Mae story focuses on transforming a decades-old financial forecasting process into a modern, scalable platform on AWS. As a cornerstone of the U.S. housing finance system, Fannie Mae:
- 🏠 Purchases home loans from lenders
- 📦 Packages them into mortgage-backed securities
- 💼 Sells them to investors
This makes accurate, timely forecasting of portfolio performance a mission-critical capability.
Legacy System Challenges
| Challenge | Impact |
|---|---|
| Spreadsheet-heavy processes | Manual errors, limited scalability |
| Fragmented systems | Data silos, integration complexity |
| End-of-life infrastructure | Performance bottlenecks, maintenance costs |
| Spread-thin expertise | Knowledge silos, single points of failure |
| Heavy customizations | Slow change cycles, brittle systems |
These limitations severely constrained the organization's ability to provide forward-looking insights in a dynamic macroeconomic environment.
Ambitious goals: from 80 days to 10 days
🎯 Program Objectives: Transforming at Scale
Launched in September 2023, the Forecast Transformation Program set ambitious goals:
| Metric | Before | Target | Improvement |
|---|---|---|---|
| Forecasting lifecycle | ~80 days | ~10 days | 87.5% reduction |
| Stress-test execution | Sequential | Parallel | Concurrent processing |
| System consolidation | ~80 systems | 1 platform | 98.75% reduction |
| Calculations consolidated | ~2,000 | Unified | Single source of truth |
| Loan records capacity | Limited | 1.5B per run | Massive scale |
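The percentages in the Improvement column follow directly from the before and target values. A quick check, using a hypothetical helper named `percent_reduction`:

```python
def percent_reduction(before, after):
    """Return the drop from before to after as a percentage of before."""
    return (before - after) * 100 / before


lifecycle = percent_reduction(80, 10)  # forecasting lifecycle, in days
systems = percent_reduction(80, 1)     # systems consolidated into one platform

print(f"Forecasting lifecycle: {lifecycle}% reduction")  # 87.5
print(f"System consolidation: {systems}% reduction")     # 98.75
```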
Additional Requirements
- ⚡ Auto-scaling compute based on demand
- 🔒 Strict regulatory compliance (security, auditability, data quality)
- 🎭 Holistic approach: tackling people, process, technology, and governance simultaneously
Target architecture: S3 backbone, EMR, and Step Functions
🏗️ Solution Architecture
Fannie Mae selected AWS for its comprehensive storage and database services and for its ability to meet stringent security and compliance requirements.
| Component | AWS Service | Purpose | Key Benefits |
|---|---|---|---|
| Data Backbone | Amazon S3 | Unified data domain | Input data, model outputs, calculation results, analytics data |
| Compute Engine | Amazon EMR | Big data processing | Billions of loan records, multiple clusters, auto-scaling |
| Orchestration | AWS Step Functions | End-to-end workflow | Data ingestion, models, calculations, system integration |
| Metadata & Config | Aurora + DynamoDB | Configuration management | Scenarios, model parameters, calculation rules |
| User Interface | Angular on Fargate | Business user experience | Scenario definition, input specification, execution triggers |
| Analytics | SageMaker + Tableau | Reporting & analysis | Regulatory reporting, internal analytics |
💡 Smart Design Choice: YAML Business Logic
One notable design choice is to encode business calculations in YAML, which:
- ✅ Decouples business logic from application code
- ✅ Enables business-driven changes without code redeployment
- ✅ Reduces development cycle time for rule updates
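The session did not show the rule format, so the following is only a sketch of how rules-as-data can work. The rule names, fields (`name`, `op`, `inputs`), and the sample record are all hypothetical. To stay dependency-free, the rules appear as the Python structure a YAML parser would produce, with the equivalent YAML in a comment.

```python
import operator

# Hypothetical rule set, shown as the structure a YAML parser would
# return for a document like:
#
#   - name: net_interest_income
#     op: subtract
#     inputs: [interest_income, interest_expense]
#   - name: net_revenue
#     op: add
#     inputs: [net_interest_income, fee_income]
RULES = [
    {"name": "net_interest_income", "op": "subtract",
     "inputs": ["interest_income", "interest_expense"]},
    {"name": "net_revenue", "op": "add",
     "inputs": ["net_interest_income", "fee_income"]},
]

# Only whitelisted operations can run, so rule files stay data, not code.
OPERATIONS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
}


def apply_rules(rules, values):
    """Evaluate rules in order; later rules can use earlier results."""
    results = dict(values)
    for rule in rules:
        func = OPERATIONS[rule["op"]]
        left, right = (results[name] for name in rule["inputs"])
        results[rule["name"]] = func(left, right)
    return results


record = {"interest_income": 120, "interest_expense": 45, "fee_income": 10}
output = apply_rules(RULES, record)
print(output["net_revenue"])  # 85
```

Because the engine only looks up operations by name, changing a calculation means editing the rule file rather than redeploying application code, which is the decoupling the bullets above describe.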
Execution flow: from user input to analytics
🔄 Execution Flow: From Input to Analytics
```mermaid
graph TD
    A[Business User Input] --> B[Data Ingestion]
    B --> C[S3 Data Backbone]
    C --> D[Model Execution]
    D --> E[EMR Calculations]
    E --> F[Output Processing]
    F --> G[Analytics & Reporting]
```
Step-by-Step Process
- 👤 User Input: Business users select scenario types and enter assumptions via UI
- 📥 Data Ingestion: Platform pulls data from multiple systems of record
- 🗄️ Data Storage: All data written to S3 data backbone
- 🤖 Model Execution: Invokes existing and new platform-specific models
- ⚙️ Rule Application: EMR applies thousands of YAML-defined business rules
- 📤 Output Distribution: Results routed to downstream systems via APIs and SNS
- 📊 Analytics Preparation: Data prepared for regulatory and management tools
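The steps above could be expressed as an AWS Step Functions state machine. The skeleton below is a sketch in Amazon States Language, built as a Python dict. The state names and the choice of service integration for each step are assumptions, not details from the session, and each task would still need a `Parameters` block (function name, cluster ID, topic ARN, and so on) before it could be deployed.

```python
import json

# Step Functions service integration resource identifiers.
LAMBDA = "arn:aws:states:::lambda:invoke"
EMR_STEP = "arn:aws:states:::elasticmapreduce:addStep.sync"
SNS = "arn:aws:states:::sns:publish"


def task(resource, next_state=None):
    """Build a minimal Task state; Parameters are omitted for brevity."""
    state = {"Type": "Task", "Resource": resource}
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


# Hypothetical state names mirroring the step-by-step process.
definition = {
    "Comment": "Sketch of a forecast run",
    "StartAt": "IngestData",
    "States": {
        "IngestData": task(LAMBDA, "ExecuteModels"),
        "ExecuteModels": task(EMR_STEP, "ApplyRules"),
        "ApplyRules": task(EMR_STEP, "DistributeOutput"),
        "DistributeOutput": task(SNS, "PrepareAnalytics"),
        "PrepareAnalytics": task(LAMBDA),
    },
}


def execution_order(defn):
    """Follow Next pointers from StartAt to the terminal state."""
    order, name = [], defn["StartAt"]
    while name:
        order.append(name)
        name = defn["States"][name].get("Next")
    return order


asl_json = json.dumps(definition, indent=2)
print(" -> ".join(execution_order(definition)))
```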
Real-world engineering challenges
🏗️ Engineering Challenges at Scale
Building this platform was not straightforward. The team faced the following challenges:
Team & Process Challenges
| Challenge | Scale | Solution Approach |
|---|---|---|
| Team coordination | 100+ engineers | Standardized guardrails, coding standards, integration patterns |
| System integration | 76 upstream/downstream systems | Careful API design, minimal disruption approach |
| Program duration | Multi-year timeline | Consistent governance, regular architecture reviews |
Technical & Operational Challenges
- 💾 Data Volume: Each scenario generates ~500 TB of data
  - Required early archival strategy design
  - Drove significant storage cost considerations
- ⚡ Performance Optimization: Multiple EMR clusters processing massive workloads
  - Minimized cross-AZ traffic for latency reduction
  - Controlled data transfer costs
- 📈 Scaling Limits: Constantly hitting AWS service quotas
  - Continuous monitoring and quota increases
  - Proactive capacity planning
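To illustrate why archival had to be designed early, the sketch below combines the ~500 TB per scenario figure with the 20+ scenarios per month reported in the outcomes, then builds an S3 lifecycle rule that tiers older objects to colder storage. The `scenarios/` prefix and the 90-day and 365-day thresholds are assumptions; the session did not state the actual retention policy.

```python
TB_PER_SCENARIO = 500      # from the session
SCENARIOS_PER_MONTH = 20   # lower bound from the reported outcomes

# Upper bound on new data per month if every scenario were kept in full.
monthly_tb = TB_PER_SCENARIO * SCENARIOS_PER_MONTH
print(f"Up to {monthly_tb:,} TB of new data per month")


def archival_rule(prefix, glacier_after_days, deep_archive_after_days):
    """Build one S3 lifecycle rule that tiers objects under a prefix."""
    if deep_archive_after_days <= glacier_after_days:
        raise ValueError("deep archive must come after the Glacier transition")
    return {
        "ID": f"archive-{prefix.strip('/').replace('/', '-')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
            {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }


# Hypothetical prefix and thresholds.
lifecycle_configuration = {"Rules": [archival_rule("scenarios/", 90, 365)]}
```

The resulting dict has the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument.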
Cost, performance, and reliability: applied Well-Architected
💰 Cost Optimization: Business Requirement, Not Nice-to-Have
Cost optimization was treated as a business requirement, with continuous architecture refinement using the Well-Architected Framework.
Optimization Strategies
| Strategy | Implementation | Benefit |
|---|---|---|
| S3 Partitioning | Smart data organization | Reduced query costs, improved performance |
| Redshift Spectrum | Analytics over S3 data | Cost-efficient analytics without data movement |
| Pre-partitioning | S3 optimization | Support for tens of millions of transactions per second (TPS) |
| Real-time monitoring | Cost visibility framework | Proactive cost management |
| Pre-execution estimates | UI cost preview | Business user cost awareness |
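One common way to organize data in S3 so that query engines such as Redshift Spectrum can skip irrelevant objects is Hive-style partitioned keys. The sketch below assumes partitioning by scenario and run date and a Parquet file format; the session did not specify the actual key layout.

```python
from datetime import date


def partitioned_key(scenario_id, run_date, part, dataset="calc_results"):
    """Build a Hive-style partitioned object key (hypothetical layout)."""
    return (
        f"{dataset}/scenario_id={scenario_id}/"
        f"run_date={run_date.isoformat()}/part-{part:05d}.parquet"
    )


key = partitioned_key("baseline", date(2025, 1, 31), 7)
print(key)
# calc_results/scenario_id=baseline/run_date=2025-01-31/part-00007.parquet
```

With keys shaped this way, a query filtered on `scenario_id` or `run_date` only needs to read objects under the matching prefixes, which is where the reduced query cost comes from.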
EMR Fleet Strategy
Mixed instance types and architectures for optimal cost-performance:
- 🔧 Intel processors for compatibility
- ⚡ AMD processors for cost-performance balance
- 🚀 AWS Graviton for energy efficiency and cost savings
- 🎯 Workload-based selection for optimal resource utilization
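A sketch of workload-based selection, assuming each workload is tagged as Arm-compatible or not. The specific instance types are examples of each processor family, not the ones Fannie Mae used, and keeping x86 and Graviton types in separate fleets is an assumption made here because job binaries must match the CPU architecture.

```python
# Hypothetical instance choices; the session named processor families only.
X86_TYPES = ["m5.2xlarge", "m5a.2xlarge"]  # Intel and AMD
ARM_TYPES = ["m6g.2xlarge"]                # AWS Graviton


def core_fleet(arm_compatible, target_capacity):
    """Return a core instance fleet in the shape EMR's RunJobFlow expects."""
    instance_types = ARM_TYPES if arm_compatible else X86_TYPES
    return {
        "Name": "core-fleet",
        "InstanceFleetType": "CORE",
        "TargetOnDemandCapacity": target_capacity,
        "InstanceTypeConfigs": [
            {"InstanceType": name, "WeightedCapacity": 1}
            for name in instance_types
        ],
    }


graviton_fleet = core_fleet(arm_compatible=True, target_capacity=10)
x86_fleet = core_fleet(arm_compatible=False, target_capacity=10)
print([c["InstanceType"] for c in x86_fleet["InstanceTypeConfigs"]])
```

Listing several instance types in one fleet lets EMR fill the target capacity from whichever types are available, which is one way a mixed Intel and AMD fleet can balance cost and capacity.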
Measurable outcomes
📈 Measurable Outcomes: Transformational Results
The transformation delivered significant and measurable improvements across all key metrics:
Performance & Cost Results
| Metric | Before | After | Improvement |
|---|---|---|---|
| Infrastructure costs | Baseline | 61% of baseline | 39% reduction |
| Execution time | Baseline | 30% of baseline | 70% reduction |
| Manual processes | Baseline | 40% of baseline | 60% reduction |
| System execution | ~20 days | ~3 days | 85% reduction |
| Loan records/scenario | ~1.4M | ~36M | ~25x increase |
Operational Efficiency
- 📊 Scenario capacity: 20+ scenarios per month at massive scale
- 👁️ Observability: Dramatically improved monitoring and governance
- 👥 Team efficiency: From 15 squads (build) → 2 squads (support)
- 🎯 Operational complexity: Significant reduction in day-to-day management
These results represent a complete transformation from a legacy, constraint-heavy system to a modern, scalable, cloud-native platform.
Lessons you can apply
🎓 Key Lessons for Your Organization
The session concludes with actionable lessons that extend beyond this single use case:
Core Principles
| Principle | Description | Why It Matters |
|---|---|---|
| Marathon mindset | Many sprints, not a big bang | Sustainable progress, manageable risk |
| Celebrate incremental wins | Acknowledge progress milestones | Maintain team motivation, stakeholder support |
| People over technology | Focus on collaboration & governance | Technology is an enabler, people determine success |
| Business-first approach | Start with problems, not solutions | Ensures technology serves business objectives |
Non-Negotiable Foundations
🗄️ Data Quality & Governance
- Critical for migration success
- Essential for advanced analytics and AI enablement
- Must be established from day one, not retrofitted
🤝 Agile Stakeholder Engagement
- Iterative collaboration with business stakeholders
- Continuous refinement with cloud providers
- Architecture evolution based on real-world feedback
💡 Final Takeaway: Successful large-scale migrations require a balanced approach combining technical excellence, organizational change management, and continuous stakeholder engagement. Technology amplifies good processes and governance; it does not replace them.