Speaker: Martin Nieuwoudt, Hossein Johari, Srinivas Kandi, Amit Maindola @ AWS FSI Meetup 2025 Q4
Stifel Overview:
Around 10,000 associates across North America and Europe
Approximately $540 billion in assets under management
Mission: Making dreams come true through mortgages, retirement planning, and capital provision
Notable project: Financing the Mackinac Bridge in the 1950s
Data Journey Analogy:
Pre-modernization: Like waiting for a ferry with limited capacity, long wait times, and high operational overhead
Post-modernization: Like the Mackinac Bridge, enabling a continuous flow and seamless connection
Modern Data Platform:
Goal: Connect people, processes, and insights seamlessly
Analogous to the Mackinac Bridge uniting Michigan
Unites the business through data
Pre-Modernization Environment:
Ran on an expensive, powerful SQL Server platform
High availability across six nodes
One node resided in the AWS cloud
Strained by rapid growth, both organic and through mergers and acquisitions
Pre-Modernization Challenges:
Resource contention, since storage and compute shared the same servers
Rapid growth led to multiple business teams developing their own business logic
Created technical sprawl with minor differences in processing for different business units
Lack of data governance, data catalog, and knowledge of available data
Business Drivers for Modernization:
Unified set of data with fully integrated business logic in one place
No duplication; the system should scale without limits
Technology should align with business needs rather than processes shaped by perceived importance
Improve operational efficiency, reduce friction, ensure data availability to clients
Enhance governance, compliance, and control
High-Level Platform Requirements:
Centralized business logic
Predefined data products approved by business owners
Scalable, easily adjustable system that stays performant under any data volume or processing load
Metadata-driven, event-based notifications for seamless integration
Continuous processing as soon as sources are ready, to meet SLAs
Immediate notification to operations department in case of issues
Comprehensive monitoring, alerting, and ticketing for observability
Chosen Approach: Data Mesh Architecture
Hub-and-spoke model with data domains aligned to business units
Data products shared between business units, so a unit can access products it does not own
Metadata-driven, event-based notifications for seamless integration
Continuous processing, immediate issue notification, and comprehensive monitoring
Data Catalog and Centralized Publishing:
Fully open data catalog to inform the business of available data
Centralized place for publishing and defining data to the business
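The catalog entries described above can be sketched as a minimal data-product record with owner approval enforced at publish time. The field names and `publish` helper here are illustrative assumptions, not Stifel's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Illustrative catalog entry for a published data product (hypothetical schema)."""
    name: str
    domain: str            # owning business domain
    owner: str             # business owner who approves the product
    description: str       # business-glossary definition shown to consumers
    approved: bool = False # products are published only after owner approval
    tags: list = field(default_factory=list)

catalog = {}

def publish(product: DataProduct) -> None:
    """Add a product to the central catalog, enforcing owner approval."""
    if not product.approved:
        raise ValueError(f"{product.name} is not approved by its business owner")
    catalog[product.name] = product

publish(DataProduct(
    name="trades_daily",
    domain="capital-markets",
    owner="trading-ops",
    description="End-of-day trade activity, T+1",
    approved=True,
))
```

In a real deployment the catalog would live in a central service (for example, AWS Glue Data Catalog) rather than an in-memory dict; the point is that every published product carries an owner and a business-facing definition.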
Three-Tier Architecture:
Raw Data Ingestion:
Collects data from various vendors and trading systems
Data in different formats, some from outside the country
Stored in a data lake with historical data (up to 20+ years)
Serves as the most up-to-date layer of the data lake
Central Governance Account:
Acts as the glue between all components
Supports data sharing between data domains and the raw ingestion tier
Sends notifications when new data is available in the data lake
Responsible for governance, data catalog, and business glossary
Daily processes collect data catalog information and build business glossaries
Data Domains:
Aligned with business operations
Each domain owns and produces its data, but shares with other domains
Analytics data domain collects all data for analytics, BI dashboards, reporting, AI applications, and intelligent processes
Key Takeaways for Implementing the Architecture:
Federated Governance:
Balances domain autonomy with organizational consistency
Empowers domains to create and manage data products while adhering to centralized quality standards
Ensures innovation, reliability, and operational excellence
Improves operational efficiency by reducing system interdependencies and streamlining maintenance
Achieves cost optimization through reduced development and infrastructure expenses
Technical Best Practices:
Event-Driven Architecture:
Embraces publish-subscribe patterns
Messaging pattern: publishers categorize messages into classes and send them without needing to know which components will receive them
Moves away from rigid batch dependencies
Enables real-time data flows that respond dynamically to business events
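The publish-subscribe pattern above can be sketched with a minimal in-process broker. Topic names and payloads are illustrative assumptions; in an AWS deployment a managed service such as Amazon SNS or EventBridge would play the broker role:

```python
from collections import defaultdict

# Minimal in-process publish-subscribe broker (illustrative only).
subscribers = defaultdict(list)
received = []

def subscribe(topic, handler):
    """Register a handler for a topic; the publisher never sees it."""
    subscribers[topic].append(handler)

def publish(topic, message):
    # The publisher categorizes the message by topic and sends it
    # without knowing which components, if any, will receive it.
    for handler in subscribers[topic]:
        handler(message)

# A downstream pipeline reacts the moment new raw data lands in the lake,
# instead of waiting for a rigid batch schedule.
subscribe("raw-data-available", lambda msg: received.append(msg))
publish("raw-data-available", {"dataset": "trades", "partition": "2025-01-15"})
```

The decoupling is the key property: new consumers can subscribe to the same topic without any change to the publisher.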
Metadata-Driven Architecture:
Centrally manages dependencies and pipeline states
Allows intelligent decisions about workflow execution and resource allocation
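A metadata-driven orchestrator can be sketched as a central table of pipeline dependencies and states that gates workflow execution. The pipeline names and state values here are hypothetical, not the talk's actual metadata model:

```python
# Hypothetical central metadata store: each pipeline lists its upstream
# dependencies and a state flag the platform maintains.
pipeline_meta = {
    "ingest_trades":    {"depends_on": [], "state": "succeeded"},
    "ingest_positions": {"depends_on": [], "state": "succeeded"},
    "build_analytics":  {"depends_on": ["ingest_trades", "ingest_positions"],
                         "state": "pending"},
}

def ready_to_run(name: str) -> bool:
    """A pending pipeline may run once every upstream dependency has succeeded."""
    meta = pipeline_meta[name]
    return meta["state"] == "pending" and all(
        pipeline_meta[dep]["state"] == "succeeded" for dep in meta["depends_on"]
    )
```

Because the dependency graph lives in metadata rather than in code, the platform can reorder, retry, or parallelize work by consulting the store instead of redeploying pipelines.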
Standardization on Open Data Formats:
Uses Apache Hudi for data lake storage
Ensures interoperability across tech stack
Provides optimized storage patterns for batch and streaming workloads
Maintains data consistency
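As a rough sketch of how a Hudi table write might be configured: the options below are standard Apache Hudi writer settings, but the table, key, and partition field names are assumptions. The actual write needs a Spark session with the Hudi bundle on its classpath, so it is shown commented out:

```python
# Core Apache Hudi writer options for an upsert into a lake table.
# Table, record-key, and partition field names are illustrative assumptions.
hudi_options = {
    "hoodie.table.name": "trades",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "trade_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.partitionpath.field": "trade_date",
}

# With a Spark session that has the Hudi bundle available:
# df.write.format("hudi").options(**hudi_options) \
#     .mode("append").save("s3://data-lake/trades")
```

The record key plus the precombine field is what lets Hudi deduplicate and upsert, keeping the lake consistent for both batch and streaming writers.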
Organizational Transformation:
Breaking Down Data Silos:
Standardizes toolset and implements data product approach
Enables seamless data sharing and improves cross-functional collaboration
Empowering Business Domains:
Grants greater autonomy in data governance
Allows informed decisions on data sharing vs. domain-specific data
Customer-First Mindset:
Implements systems for real-time data processing and personalized customer experiences
Enhances ability to respond to customer needs dynamically
Agile and Responsive Organization:
Focuses on creating an organization that can better serve customers
Maintains balance between centralization and domain autonomy
Embraces latest technological changes
Journey Timeline:
2021: Started with high-level architecture
2023: Refined architecture, presented to leadership, and received approval
Engaged AWS ProServe to help build the architecture
Implementation Phases:
September 2024: Built and tested core components
January 2025: Migrated an initial, limited set of data to the new architecture
September 2025: Built API endpoints using data within the architecture
Current: APIs are part of an application supporting clients; ongoing work to enrich data domains and onboard more applications
Future Focus:
Building New Data Domains: While enriching existing domains
Enterprise-Wide Adoption: As more data becomes available on the platform
Operational Efficiencies: With the shift away from the legacy platform
Unstructured Data and AI Use Cases: Expanding the platform to include unstructured data and emerging AI applications
Data Mesh Overview:
Decentralized data architecture
Treats data as a product
Shifts ownership from central team to individual business domains
Key Principles:
Domain-Oriented Decentralization:
Data ownership and management by individual business domains
Each domain manages its own data products
Data as a Product:
Data is treated as a consumable product with clear value proposition
Focus on data quality, discoverability, and accessibility
Self-Serve Data Infrastructure:
Provides a platform that enables domains to manage their data independently
Empowers domains with tools and capabilities for data processing and analytics
Federated Computational Governance:
Establishes standards and guidelines for data quality, security, and compliance
Balances domain autonomy with organizational consistency
Ensures data products meet organizational standards while allowing for innovation
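Federated computational governance can be sketched as a central standard that every domain's product is checked against before it is shared: the center defines the rules, the domains retain autonomy over content. The required field names below are illustrative assumptions:

```python
# Central standard: metadata every shared data product must carry.
# Field names are illustrative assumptions, not an actual policy.
REQUIRED_FIELDS = {"name", "domain", "owner", "classification", "quality_checks"}

def violations(product: dict) -> set:
    """Return which centrally mandated fields a data product is missing."""
    return REQUIRED_FIELDS - product.keys()

compliant_product = {
    "name": "positions_eod",
    "domain": "wealth-management",
    "owner": "wm-data-team",
    "classification": "internal",
    "quality_checks": ["row_count > 0", "no_null_account_id"],
}
```

Making the check computational (run automatically at publish time) rather than procedural is what lets domains innovate freely while the organization keeps a consistent quality and compliance bar.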
Benefits:
Improved agility and scalability
Better management, sharing, and analysis of data products
Enhanced collaboration and cross-functional data usage
Increased operational efficiency and cost optimization
Support for real-time data processing and personalized customer experiences