Ahmed Adel

Architecting Large-Scale Migrations with Fannie Mae and the NRO (AWS re:Invent 2025 – WPS201)

Session: WPS201 - AWS re:Invent 2025

The session covers how to migrate complex enterprise and federal workloads to AWS without disrupting critical missions or weakening security controls. It combines core migration patterns, AWS Well-Architected guidance, and customer journeys from the National Reconnaissance Office (NRO) and Fannie Mae.

Why large migrations are hard

The session opens by revisiting familiar migration drivers: lowering compute costs and increasing innovation velocity as organizations move from on-premises environments to rehosted, replatformed, and cloud-native architectures.

The 7 Rs of Migration

The speaker frames migration strategy in terms of the "7 Rs," emphasizing that large-scale programs almost always use a mix of these approaches rather than a single pattern:

| Strategy | Description | Best For |
| --- | --- | --- |
| Rehost | "Lift and shift" to cloud | Quick migration, minimal changes |
| Relocate | Move to different infrastructure | Hypervisor-level migrations |
| Repurchase | Move to SaaS solutions | Replacing custom applications |
| Retain | Keep on-premises | Compliance or latency requirements |
| Retire | Eliminate unused applications | Reducing technical debt |
| Replatform | Minor cloud optimizations | Moderate benefits with low risk |
| Refactor | Redesign for cloud-native | Maximum cloud benefits |

⚠️ Key Message: Tooling alone is not enough, especially when dealing with diverse legacy systems, complex databases, and tight budgets. Organizations with limited cloud experience must invest early in automation, governance, and repeatable mechanisms for account management.

Foundational patterns: MRA, phases, and landing zones

The session recommends using an AWS Migration Readiness Assessment (MRA) as an on-ramp for any large program. The MRA process helps:

  • Inventory applications and discover dependencies
  • Determine appropriate migration strategies per workload
  • Avoid one-off, unstructured decisions
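
Once an inventory and its dependencies exist, one way to turn them into a repeatable plan is to group applications into migration waves. The sketch below is illustrative only: the application names are hypothetical, and migrating dependencies before their dependents is an assumption rather than something the session prescribes.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each application maps to the applications it depends on.
dependencies = {
    "reporting-ui": {"forecast-api"},
    "forecast-api": {"loan-db", "auth-service"},
    "loan-db": set(),
    "auth-service": set(),
}


def plan_waves(deps):
    """Group applications into waves so every dependency lands in an earlier wave."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = plan_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

A circular dependency surfaces as an error at planning time, which is usually a signal that the applications involved need to move together.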

Migration Phases and Tools

| Phase | Focus | Key AWS Tools |
| --- | --- | --- |
| Assess | Discovery & planning | Application Discovery Service, Migration Readiness Assessment |
| Mobilize | Setup & preparation | Control Tower, Landing Zone, Account vending |
| Migrate/Modernize | Execution & optimization | Migration tools, Well-Architected reviews |

In parallel, organizations should establish an AWS Control Tower-style landing zone or "account vending machine" to provision new accounts quickly and consistently, with built-in guardrails for security, compliance, and governance.
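
To make the "account vending machine" idea concrete, here is a minimal sketch of the validation step such a mechanism might perform before provisioning. Everything in it is hypothetical (the required tags, the guardrail names, and the `vend_account` function); a real landing zone would define its own baseline and call the provider's provisioning APIs.

```python
from dataclasses import dataclass, field

# Hypothetical baseline; these names are illustrative, not AWS identifiers.
REQUIRED_TAGS = {"owner", "cost-center", "environment"}
BASELINE_GUARDRAILS = ["enable-audit-logging", "deny-public-storage", "restrict-regions"]


@dataclass
class AccountRequest:
    name: str
    tags: dict = field(default_factory=dict)


def vend_account(request):
    """Validate a request and return the provisioning plan for a new account."""
    missing = REQUIRED_TAGS - set(request.tags)
    if missing:
        raise ValueError(f"missing required tags: {sorted(missing)}")
    # Every account receives the same guardrails, so consistency is built in.
    return {
        "account_name": request.name,
        "tags": dict(request.tags),
        "guardrails": list(BASELINE_GUARDRAILS),
    }


plan = vend_account(
    AccountRequest(
        "forecast-dev",
        {"owner": "team-a", "cost-center": "1234", "environment": "dev"},
    )
)
print(plan["guardrails"])
```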

Applying the AWS Well-Architected Framework

Throughout the session, the AWS Well-Architected Framework acts as the backbone for design and review. The six pillars are positioned as a way to reason about trade-offs for each workload, not as a checklist to satisfy after migration.

AWS Well-Architected Framework Pillars

| Pillar | Focus Area | Key Considerations |
| --- | --- | --- |
| Operational Excellence | Running & monitoring systems | Automation, procedures, continuous improvement |
| Security | Protecting information & systems | Identity, permissions, data protection |
| Reliability | System availability & recovery | Fault tolerance, backup, disaster recovery |
| Performance Efficiency | Using resources effectively | Right-sizing, monitoring, technology selection |
| Cost Optimization | Delivering value at lowest cost | Resource optimization, pricing models |
| Sustainability | Minimizing environmental impact | Efficiency improvements, renewable energy |

💡 Practical Tip: Ask business stakeholders which pillars matter most for each application. Some workloads may emphasize reliability and operational excellence, while others prioritize cost or performance. These decisions should drive architecture choices, capacity planning, and operational runbooks.
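
One lightweight way to record those stakeholder priorities is a weighted score per architecture option. The sketch below assumes a 1-5 rating scale and made-up weights and ratings; it illustrates the trade-off reasoning and is not a scoring method from the session.

```python
PILLARS = [
    "operational_excellence",
    "security",
    "reliability",
    "performance_efficiency",
    "cost_optimization",
    "sustainability",
]


def score_option(weights, ratings):
    """Weighted average of 1-5 pillar ratings; weights need not sum to 1."""
    total_weight = sum(weights[p] for p in PILLARS)
    return sum(weights[p] * ratings[p] for p in PILLARS) / total_weight


# Hypothetical stakeholder weights for a reliability-sensitive workload.
weights = {
    "operational_excellence": 2, "security": 3, "reliability": 5,
    "performance_efficiency": 2, "cost_optimization": 1, "sustainability": 1,
}

# Hypothetical ratings for two candidate strategies.
options = {
    "rehost": {
        "operational_excellence": 3, "security": 3, "reliability": 3,
        "performance_efficiency": 2, "cost_optimization": 4, "sustainability": 2,
    },
    "refactor": {
        "operational_excellence": 4, "security": 4, "reliability": 5,
        "performance_efficiency": 4, "cost_optimization": 2, "sustainability": 4,
    },
}

best = max(options, key=lambda name: score_option(weights, options[name]))
for name, ratings in options.items():
    print(f"{name}: {score_option(weights, ratings):.2f}")
print("preferred:", best)
```

Changing the weights (for example, raising `cost_optimization`) can flip the preferred option, which is the point of asking stakeholders first.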

Organizational enabler: a Cloud Center of Excellence

A recurring theme is the importance of a Cloud Center of Excellence (CCoE) for sustained modernization. The CCoE is described as a cross-functional team spanning:

  • 🏗️ Cloud architecture
  • 🖥️ Infrastructure
  • 🔒 Security
  • ⚙️ Operations
  • 💻 Software engineering

CCoE Responsibilities

Rather than a purely formal committee, the CCoE serves as a pragmatic mechanism for scaling expertise:

  • Standardizing landing zones and network patterns
  • 🔐 Defining IAM controls and security policies
  • 💰 Establishing cost management practices
  • 📚 Codifying best practices and lessons learned
  • 🚀 Enabling innovation across business units

Both Fannie Mae and the NRO leveraged CCoEs to significantly increase their "innovation velocity" after completing initial migrations.

Fannie Mae: reimagining financial forecasting

🏦 Fannie Mae Case Study: Financial Forecasting Transformation

The Fannie Mae story focuses on transforming a decades-old financial forecasting process into a modern, scalable platform on AWS. As a cornerstone of the U.S. housing finance system, Fannie Mae:

  • 🏠 Purchases home loans from lenders
  • 📦 Packages them into mortgage-backed securities
  • 💼 Sells them to investors

This makes accurate, timely forecasting of portfolio performance a mission-critical capability.

Legacy System Challenges

| Challenge | Impact |
| --- | --- |
| Spreadsheet-heavy processes | Manual errors, limited scalability |
| Fragmented systems | Data silos, integration complexity |
| End-of-life infrastructure | Performance bottlenecks, maintenance costs |
| Spread-thin expertise | Knowledge silos, single points of failure |
| Heavy customizations | Slow change cycles, brittle systems |

These limitations severely constrained the organization's ability to provide forward-looking insights in a dynamic macroeconomic environment.

Ambitious goals: from 80 days to 10 days

🎯 Program Objectives: Transforming at Scale

Launched in September 2023, the Forecast Transformation Program set ambitious goals:

| Metric | Before | Target | Improvement |
| --- | --- | --- | --- |
| Forecasting lifecycle | ~80 days | ~10 days | 87.5% reduction |
| Stress-test execution | Sequential | Parallel | Concurrent processing |
| System consolidation | ~80 systems | 1 platform | 98.75% reduction |
| Calculations consolidated | ~2,000 | Unified | Single source of truth |
| Loan records capacity | Limited | 1.5B per run | Massive scale |
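
The percentage figures in the table follow directly from the before and target values. A quick check, using only the approximate numbers quoted above:

```python
def percent_reduction(before, after):
    """Reduction from `before` to `after`, expressed as a percentage of `before`."""
    return (before - after) * 100 / before


lifecycle = percent_reduction(80, 10)  # forecasting lifecycle, in days
systems = percent_reduction(80, 1)     # systems consolidated into one platform

print(f"Forecasting lifecycle: {lifecycle}% reduction")
print(f"System consolidation: {systems}% reduction")
```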

Additional Requirements

  • Auto-scaling compute based on demand
  • 🔒 Strict regulatory compliance (security, auditability, data quality)
  • 🎭 Holistic approach - tackling people, process, technology, and governance simultaneously

Target architecture: S3 backbone, EMR, and Step Functions

🏗️ Solution Architecture

Fannie Mae selected AWS for its comprehensive storage and database services and ability to meet stringent security and compliance requirements.

| Component | AWS Service | Purpose | Key Benefits |
| --- | --- | --- | --- |
| Data Backbone | Amazon S3 | Unified data domain | Input data, model outputs, calculation results, analytics data |
| Compute Engine | Amazon EMR | Big data processing | Billions of loan records, multiple clusters, auto-scaling |
| Orchestration | AWS Step Functions | End-to-end workflow | Data ingestion, models, calculations, system integration |
| Metadata & Config | Aurora + DynamoDB | Configuration management | Scenarios, model parameters, calculation rules |
| User Interface | Angular on Fargate | Business user experience | Scenario definition, input specification, execution triggers |
| Analytics | SageMaker + Tableau | Reporting & analysis | Regulatory reporting, internal analytics |

💡 Smart Design Choice: YAML Business Logic

A particularly clever design encodes business calculations in YAML, which:

  • Decouples business logic from application code
  • Enables business-driven changes without code redeployment
  • Reduces development cycle time for rule updates
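
The session does not show Fannie Mae's rule format, so the following is only a sketch of the general "rules as data" pattern. The rule schema (`name`, `op`, `inputs`), the field names, and the numbers are all hypothetical. To stay dependency-free, the rules appear as the Python structure a YAML parser would produce, with the equivalent YAML shown in a comment.

```python
import operator

# The only operations a rule may reference; rules are data, never executed as code.
OPERATIONS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}

# Hypothetical rule set. In YAML the first rule might be written as:
#   - name: net_income
#     op: subtract
#     inputs: [interest_income, credit_losses]
RULES = [
    {"name": "net_income", "op": "subtract", "inputs": ["interest_income", "credit_losses"]},
    {"name": "loss_ratio", "op": "divide", "inputs": ["credit_losses", "interest_income"]},
]


def apply_rules(rules, record):
    """Evaluate rules in order; each result becomes available to later rules."""
    values = dict(record)
    for rule in rules:
        left, right = (values[name] for name in rule["inputs"])
        values[rule["name"]] = OPERATIONS[rule["op"]](left, right)
    return values


result = apply_rules(RULES, {"interest_income": 200.0, "credit_losses": 50.0})
print(result["net_income"], result["loss_ratio"])
```

Because the engine only looks up named operations, changing a calculation means editing the rule file rather than redeploying code, which is the decoupling described above.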

Execution flow: from user input to analytics

🔄 Execution Flow: From Input to Analytics

```mermaid
graph TD
    A[Business User Input] --> B[Data Ingestion]
    B --> C[S3 Data Backbone]
    C --> D[Model Execution]
    D --> E[EMR Calculations]
    E --> F[Output Processing]
    F --> G[Analytics & Reporting]
```

Step-by-Step Process

  1. 👤 User Input: Business users select scenario types and enter assumptions via UI
  2. 📥 Data Ingestion: Platform pulls data from multiple systems of record
  3. 🗄️ Data Storage: All data written to S3 data backbone
  4. 🤖 Model Execution: The platform invokes existing models and new platform-specific models
  5. ⚙️ Rule Application: EMR applies thousands of YAML-defined business rules
  6. 📤 Output Distribution: Results routed to downstream systems via APIs and SNS
  7. 📊 Analytics Preparation: Data prepared for regulatory and management tools
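
Since AWS Step Functions orchestrates this flow, steps 2 through 7 can be sketched as a linear state machine in Amazon States Language. This is an assumption-heavy illustration: the state names, the region, the account ID, and the Lambda function ARNs are placeholders, and the real workflow likely includes branching, retries, and EMR integrations that are not shown in the session summary.

```python
import json

# Placeholder values; none of these identify real resources.
REGION = "us-east-1"
ACCOUNT_ID = "123456789012"

# Hypothetical state names mirroring steps 2-7 above (step 1 is the user trigger).
STEPS = [
    "IngestData",
    "WriteToDataBackbone",
    "RunModels",
    "ApplyRules",
    "DistributeOutputs",
    "PrepareAnalytics",
]


def task_state(name, next_state=None):
    """Build one Task state that either chains to `next_state` or ends the workflow."""
    state = {
        "Type": "Task",
        "Resource": f"arn:aws:lambda:{REGION}:{ACCOUNT_ID}:function:{name}",
    }
    if next_state is not None:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def build_state_machine(steps):
    """Chain the steps in order into a state machine definition."""
    states = {}
    for index, name in enumerate(steps):
        next_state = steps[index + 1] if index + 1 < len(steps) else None
        states[name] = task_state(name, next_state)
    return {
        "Comment": "Sketch of a forecasting run",
        "StartAt": steps[0],
        "States": states,
    }


definition = build_state_machine(STEPS)
print(json.dumps(definition, indent=2))
```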

Real-world engineering challenges

🏗️ Engineering Challenges at Scale

Building this platform was not straightforward - here's what the team faced:

Team & Process Challenges

| Challenge | Scale | Solution Approach |
| --- | --- | --- |
| Team coordination | 100+ engineers | Standardized guardrails, coding standards, integration patterns |
| System integration | 76 upstream/downstream systems | Careful API design, minimal disruption approach |
| Program duration | Multi-year timeline | Consistent governance, regular architecture reviews |

Technical & Operational Challenges

  • 💾 Data Volume: Each scenario generates ~500 TB of data

    • Required early archival strategy design
    • Drove significant storage cost considerations
  • Performance Optimization: Multiple EMR clusters processing massive workloads

    • Minimized cross-AZ traffic for latency reduction
    • Controlled data transfer costs
  • 📈 Scaling Limits: Constantly hitting AWS service quotas

    • Continuous monitoring and quota increases
    • Proactive capacity planning
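
At roughly 500 TB per scenario, an archival strategy is typically expressed as S3 lifecycle rules. The sketch below builds a rule in the shape that S3 lifecycle configurations use; the prefix and the day thresholds are hypothetical, and regulated data may need far longer retention than shown here.

```python
import json


def archival_rule(prefix, warm_days, cold_days, expire_days):
    """Build one lifecycle rule: move to infrequent access, then archive, then expire."""
    if not warm_days < cold_days < expire_days:
        raise ValueError("expected warm_days < cold_days < expire_days")
    return {
        "ID": f"archive-{prefix.strip('/').replace('/', '-')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": warm_days, "StorageClass": "STANDARD_IA"},
            {"Days": cold_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
    }


# Hypothetical prefix and thresholds for intermediate scenario outputs.
lifecycle = {"Rules": [archival_rule("scenarios/intermediate/", 30, 90, 365)]}
print(json.dumps(lifecycle, indent=2))
```

A dictionary like `lifecycle` matches the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`, although applying it is left out here to keep the sketch free of external dependencies.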

Cost, performance, and reliability: applied Well-Architected

💰 Cost Optimization: Business Requirement, Not Nice-to-Have

Cost optimization was treated as a business requirement, with continuous architecture refinement using the Well-Architected Framework.

Optimization Strategies

| Strategy | Implementation | Benefit |
| --- | --- | --- |
| S3 Partitioning | Smart data organization | Reduced query costs, improved performance |
| Redshift Spectrum | Analytics over S3 data | Cost-efficient analytics without data movement |
| Pre-partitioning | S3 optimization | Support for tens of millions TPS |
| Real-time monitoring | Cost visibility framework | Proactive cost management |
| Pre-execution estimates | UI cost preview | Business user cost awareness |
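
The session summary does not specify the partitioning scheme, so as an illustration only, here is a common Hive-style layout in which partition values are encoded in the S3 key prefix. The partition keys (`dataset`, `scenario`, `run_date`) and the sample values are assumptions. Query engines that understand this layout can skip prefixes that do not match a filter, which is where the cost and performance gains come from.

```python
def partition_prefix(scenario_id, run_date, dataset):
    """Build a Hive-style key prefix with one key=value segment per partition."""
    return f"dataset={dataset}/scenario={scenario_id}/run_date={run_date}/"


def prune(prefixes, **filters):
    """Keep only prefixes whose partition values match every filter."""
    def matches(prefix):
        parts = dict(part.split("=", 1) for part in prefix.strip("/").split("/"))
        return all(parts.get(key) == value for key, value in filters.items())

    return [prefix for prefix in prefixes if matches(prefix)]


# Hypothetical scenarios and datasets.
prefixes = [
    partition_prefix("baseline", "2025-01-31", "cashflows"),
    partition_prefix("stress-a", "2025-01-31", "cashflows"),
    partition_prefix("baseline", "2025-01-31", "balances"),
]

selected = prune(prefixes, scenario="baseline", dataset="cashflows")
print(selected)
```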

EMR Fleet Strategy

Mixed instance types and architectures for optimal cost-performance:

  • 🔧 Intel processors for compatibility
  • AMD processors for cost-performance balance
  • 🚀 AWS Graviton for energy efficiency and cost savings
  • 🎯 Workload-based selection for optimal resource utilization
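
Workload-based selection can be sketched as "pick the cheapest option that satisfies the workload's constraints." The instance types below are real examples of Intel, AMD, and Graviton families, but the `relative_cost` values are a made-up index rather than actual prices, and the single `requires_x86` constraint is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceOption:
    instance_type: str
    architecture: str     # "x86_64" or "arm64"
    relative_cost: float  # hypothetical cost index, not a real price


OPTIONS = [
    InstanceOption("m5.4xlarge", "x86_64", 1.00),   # Intel
    InstanceOption("m5a.4xlarge", "x86_64", 0.90),  # AMD
    InstanceOption("m6g.4xlarge", "arm64", 0.80),   # AWS Graviton
]


def choose_instance(requires_x86):
    """Return the lowest-cost option compatible with the workload."""
    candidates = [
        option for option in OPTIONS
        if not requires_x86 or option.architecture == "x86_64"
    ]
    return min(candidates, key=lambda option: option.relative_cost)


print(choose_instance(requires_x86=True).instance_type)
print(choose_instance(requires_x86=False).instance_type)
```

A workload with x86-only dependencies stays on an x86 family, while one without that constraint can move to Graviton.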

Measurable outcomes

📈 Measurable Outcomes: Transformational Results

The transformation delivered significant and measurable improvements across all key metrics:

Performance & Cost Results

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Infrastructure costs | Baseline | 61% of baseline | 39% reduction |
| Execution time | Baseline | 30% of baseline | 70% reduction |
| Manual processes | Baseline | 40% of baseline | 60% reduction |
| System execution | ~20 days | ~3 days | 85% reduction |
| Loan records/scenario | ~1.4M | ~36M | 25x increase |

Operational Efficiency

  • 📊 Scenario capacity: 20+ scenarios per month at massive scale
  • 👁️ Observability: Dramatically improved monitoring and governance
  • 👥 Team efficiency: From 15 squads (build) → 2 squads (support)
  • 🎯 Operational complexity: Significant reduction in day-to-day management

These results represent a complete transformation from a legacy, constraint-heavy system to a modern, scalable, cloud-native platform.

Lessons you can apply

🎓 Key Lessons for Your Organization

The session concludes with actionable lessons that extend beyond this single use case:

Core Principles

| Principle | Description | Why It Matters |
| --- | --- | --- |
| Marathon mindset | Many sprints, not a big bang | Sustainable progress, manageable risk |
| Celebrate incremental wins | Acknowledge progress milestones | Maintain team motivation, stakeholder support |
| People over technology | Focus on collaboration & governance | Technology is an enabler, people determine success |
| Business-first approach | Start with problems, not solutions | Ensures technology serves business objectives |

Non-Negotiable Foundations

🗄️ Data Quality & Governance

  • Critical for migration success
  • Essential for advanced analytics and AI enablement
  • Must be established from day one, not retrofitted

🤝 Agile Stakeholder Engagement

  • Iterative collaboration with business stakeholders
  • Continuous refinement with cloud providers
  • Architecture evolution based on real-world feedback

💡 Final Takeaway: Successful large-scale migrations require a balanced approach combining technical excellence, organizational change management, and continuous stakeholder engagement. Technology amplifies good processes and governance - it doesn't replace them.
