Speaker: Martin Nieuwoudt, Hossein Johari, Srinivas Kandi, Amit Maindola @ AWS FSI Meetup 2025 Q4
Stifel Overview:
Around 10,000 associates across North America and Europe
Approximately $540 billion in assets under management
Mission: Making dreams come true through mortgages, retirement planning, and capital provision
Notable project: Financing the Mackinac Bridge in the 1950s
Data Journey Analogy:
Pre-modernization: Like waiting for a ferry with limited capacity, long wait times, and high operational overhead
Post-modernization: Like the Mackinac Bridge, enabling a continuous flow and seamless connection
Modern Data Platform:
Goal: Connect people, processes, and insights seamlessly
Analogous to the Mackinac Bridge uniting Michigan
Unites the business through data
Pre-Modernization Environment:
Ran on an expensive, powerful SQL Server platform
High availability across six nodes
One node resided in the AWS cloud
Strained by rapid growth, both organic and through mergers and acquisitions
Pre-Modernization Challenges:
Resource contention, since storage and compute shared the same servers
Rapid growth led to multiple business teams developing their own business logic
Created technical sprawl with minor differences in processing for different business units
Lack of data governance, data catalog, and knowledge of available data
Business Drivers for Modernization:
Unified set of data with fully integrated business logic in one place
No duplication; the system should scale without limits
Technology should align with business needs rather than processes shaped by perceived importance
Improve operational efficiency, reduce friction, ensure data availability to clients
Enhance governance, compliance, and control
High-Level Platform Requirements:
Centralized business logic
Predefined data products approved by business owners
Scalable, easily adjustable system that stays performant under any data volume or processing load
Metadata-driven, event-based notifications for seamless integration
Continuous processing as soon as sources are ready, to meet SLAs
Immediate notification to operations department in case of issues
Comprehensive monitoring, alerting, and ticketing for observability
Chosen Approach: Data Mesh Architecture
Hub-and-spoke model with data domains aligned to business units
Data products shared between business units, so a unit can access products it does not own
Metadata-driven, event-based notifications for seamless integration
Continuous processing, immediate issue notification, and comprehensive monitoring
Data Catalog and Centralized Publishing:
Fully open data catalog to inform the business of available data
Centralized place for publishing and defining data to the business
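The catalog entries described above can be sketched as a minimal data-product record with owner approval enforced at publish time. The field names and `publish` helper here are illustrative assumptions, not Stifel's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Illustrative catalog entry for a published data product (hypothetical schema)."""
    name: str
    domain: str            # owning business domain
    owner: str             # business owner who approves the product
    description: str       # business-glossary definition shown to consumers
    approved: bool = False # products are published only after owner approval
    tags: list = field(default_factory=list)

catalog = {}

def publish(product: DataProduct) -> None:
    """Add a product to the central catalog, enforcing owner approval."""
    if not product.approved:
        raise ValueError(f"{product.name} is not approved by its business owner")
    catalog[product.name] = product

publish(DataProduct(
    name="trades_daily",
    domain="capital-markets",
    owner="trading-ops",
    description="End-of-day trade activity, T+1",
    approved=True,
))
```

In a real deployment the catalog would live in a central service (for example, AWS Glue Data Catalog) rather than an in-memory dict; the point is that every published product carries an owner and a business-facing definition.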
Three-Tier Architecture:
Raw Data Ingestion:
Collects data from various vendors and trading systems
Data in different formats, some from outside the country
Stored in a data lake with historical data (up to 20+ years)
Serves as the most up-to-date layer of the data lake
Central Governance Account:
Acts as the glue between all components
Supports data sharing between data domains and the raw ingestion tier
Sends notifications when new data is available in the data lake
Responsible for governance, data catalog, and business glossary
Daily processes collect data catalog information and build business glossaries
Data Domains:
Aligned with business operations
Each domain owns and produces its data, but shares with other domains
Analytics data domain collects all data for analytics, BI dashboards, reporting, AI applications, and intelligent processes
Key Takeaways for Implementing the Architecture:
Federated Governance:
Balances domain autonomy with organizational consistency
Empowers domains to create and manage data products while adhering to centralized quality standards
Ensures innovation, reliability, and operational excellence
Improves operational efficiency by reducing system interdependencies and streamlining maintenance
Achieves cost optimization through reduced development and infrastructure expenses
Technical Best Practices:
Event-Driven Architecture:
Embraces publish-subscribe patterns
Messaging pattern: publishers categorize messages into classes and send them without needing to know which components will receive them
Moves away from rigid batch dependencies
Enables real-time data flows that respond dynamically to business events
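The publish-subscribe pattern above can be sketched with a minimal in-process broker. Topic names and payloads are illustrative assumptions; in an AWS deployment a managed service such as Amazon SNS or EventBridge would play the broker role:

```python
from collections import defaultdict

# Minimal in-process publish-subscribe broker (illustrative only).
subscribers = defaultdict(list)
received = []

def subscribe(topic, handler):
    """Register a handler for a topic; the publisher never sees it."""
    subscribers[topic].append(handler)

def publish(topic, message):
    # The publisher categorizes the message by topic and sends it
    # without knowing which components, if any, will receive it.
    for handler in subscribers[topic]:
        handler(message)

# A downstream pipeline reacts the moment new raw data lands in the lake,
# instead of waiting for a rigid batch schedule.
subscribe("raw-data-available", lambda msg: received.append(msg))
publish("raw-data-available", {"dataset": "trades", "partition": "2025-01-15"})
```

The decoupling is the key property: new consumers can subscribe to the same topic without any change to the publisher.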
Metadata-Driven Architecture:
Centrally manages dependencies and pipeline states
Allows intelligent decisions about workflow execution and resource allocation
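A metadata-driven orchestrator can be sketched as a central table of pipeline dependencies and states that gates workflow execution. The pipeline names and state values here are hypothetical, not the talk's actual metadata model:

```python
# Hypothetical central metadata store: each pipeline lists its upstream
# dependencies and a state flag the platform maintains.
pipeline_meta = {
    "ingest_trades":    {"depends_on": [], "state": "succeeded"},
    "ingest_positions": {"depends_on": [], "state": "succeeded"},
    "build_analytics":  {"depends_on": ["ingest_trades", "ingest_positions"],
                         "state": "pending"},
}

def ready_to_run(name: str) -> bool:
    """A pending pipeline may run once every upstream dependency has succeeded."""
    meta = pipeline_meta[name]
    return meta["state"] == "pending" and all(
        pipeline_meta[dep]["state"] == "succeeded" for dep in meta["depends_on"]
    )
```

Because the dependency graph lives in metadata rather than in code, the platform can reorder, retry, or parallelize work by consulting the store instead of redeploying pipelines.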
Standardization on Open Data Formats:
Uses Apache Hudi for data lake storage
Ensures interoperability across tech stack
Provides optimized storage patterns for batch and streaming workloads
Maintains data consistency
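As a rough sketch of how a Hudi table write might be configured: the options below are standard Apache Hudi writer settings, but the table, key, and partition field names are assumptions. The actual write needs a Spark session with the Hudi bundle on its classpath, so it is shown commented out:

```python
# Core Apache Hudi writer options for an upsert into a lake table.
# Table, record-key, and partition field names are illustrative assumptions.
hudi_options = {
    "hoodie.table.name": "trades",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "trade_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.partitionpath.field": "trade_date",
}

# With a Spark session that has the Hudi bundle available:
# df.write.format("hudi").options(**hudi_options) \
#     .mode("append").save("s3://data-lake/trades")
```

The record key plus the precombine field is what lets Hudi deduplicate and upsert, keeping the lake consistent for both batch and streaming writers.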
Organizational Transformation:
Breaking Down Data Silos:
Standardizes toolset and implements data product approach
Enables seamless data sharing and improves cross-functional collaboration
Empowering Business Domains:
Grants greater autonomy in data governance
Allows informed decisions on data sharing vs. domain-specific data
Customer-First Mindset:
Implements systems for real-time data processing and personalized customer experiences
Enhances ability to respond to customer needs dynamically
Agile and Responsive Organization:
Focuses on creating an organization that can better serve customers
Maintains balance between centralization and domain autonomy
Embraces latest technological changes
Journey Timeline:
2021: Started with high-level architecture
2023: Refined architecture, presented to leadership, and received approval
Engaged AWS ProServe to help build the architecture
Implementation Phases:
September 2024: Built and tested core components
January 2025: Migrated an initial, limited set of data to the new architecture
September 2025: Built API endpoints using data within the architecture
Current: APIs are part of an application supporting clients; ongoing work to enrich data domains and onboard more applications
Future Focus:
Building New Data Domains: While enriching existing domains
Enterprise-Wide Adoption: As more data becomes available on the platform
Operational Efficiencies: With the shift away from the legacy platform
Unstructured Data and AI Use Cases: Expanding the platform to include unstructured data and emerging AI applications
Data Mesh Overview:
Decentralized data architecture
Treats data as a product
Shifts ownership from central team to individual business domains
Key Principles:
Domain-Oriented Decentralization:
Data ownership and management by individual business domains
Each domain manages its own data products
Data as a Product:
Data is treated as a consumable product with clear value proposition
Focus on data quality, discoverability, and accessibility
Self-Serve Data Infrastructure:
Provides a platform that enables domains to manage their data independently
Empowers domains with tools and capabilities for data processing and analytics
Federated Computational Governance:
Establishes standards and guidelines for data quality, security, and compliance
Balances domain autonomy with organizational consistency
Ensures data products meet organizational standards while allowing for innovation
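Federated computational governance can be sketched as a central standard that every domain's product is checked against before it is shared: the center defines the rules, the domains retain autonomy over content. The required field names below are illustrative assumptions:

```python
# Central standard: metadata every shared data product must carry.
# Field names are illustrative assumptions, not an actual policy.
REQUIRED_FIELDS = {"name", "domain", "owner", "classification", "quality_checks"}

def violations(product: dict) -> set:
    """Return which centrally mandated fields a data product is missing."""
    return REQUIRED_FIELDS - product.keys()

compliant_product = {
    "name": "positions_eod",
    "domain": "wealth-management",
    "owner": "wm-data-team",
    "classification": "internal",
    "quality_checks": ["row_count > 0", "no_null_account_id"],
}
```

Making the check computational (run automatically at publish time) rather than procedural is what lets domains innovate freely while the organization keeps a consistent quality and compliance bar.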
Benefits:
Improved agility and scalability
Better management, sharing, and analysis of data products
Enhanced collaboration and cross-functional data usage
Increased operational efficiency and cost optimization
Support for real-time data processing and personalized customer experiences