
Eliana Lam

Posted on • Originally published at aws-user-group.com

Stifel Modern Data Platform

Speakers: Martin Nieuwoudt, Hossein Johari, Srinivas Kandi, Amit Maindola @ AWS FSI Meetup 2025 Q4



Stifel Overview:

  • Around 10,000 associates across North America and Europe

  • Approximately $540 billion in assets under management

  • Mission: Making dreams come true through mortgages, retirement planning, and capital provision

  • Notable project: Financing the Mackinac Bridge in the 1950s

Data Journey Analogy:

  • Pre-modernization: Like waiting for a ferry with limited capacity, long wait times, and high operational overhead

  • Post-modernization: Like the Mackinac Bridge, enabling a continuous flow and seamless connection

Modern Data Platform:

  • Goal: Connect people, processes, and insights seamlessly

  • Analogous to the Mackinac Bridge uniting Michigan

  • Unites the business through data

Premodernization Environment:

  • Ran on an expensive, powerful SQL Server environment

  • High availability with six nodes

  • One node resided in the AWS cloud

  • Faced challenges from growth, both organic and through mergers and acquisitions



Challenges Premodernization:

  • Resource contention arose because storage and compute shared the same servers

  • Rapid growth led to multiple business teams developing their own business logic

  • Created technical sprawl with minor differences in processing for different business units

  • Lack of data governance, data catalog, and knowledge of available data

Business Drivers for Modernization:

  • Unified set of data with fully integrated business logic in one place

  • No duplication, system should grow without limitations

  • Technology should align with business needs, rather than teams developing processes based on perceived importance

  • Improve operational efficiency, reduce friction, ensure data availability to clients

  • Enhance governance, compliance, and control

High-Level Platform Requirements:

  • Centralized business logic

  • Predefined data products approved by business owners

  • Scalable system, easily adjustable, performant under any data or process pressure

  • Metadata-driven, event-based notifications for seamless integration

  • Continuous processing that starts as soon as sources are ready, in order to meet SLAs

  • Immediate notification to operations department in case of issues

  • Comprehensive monitoring, alerting, and ticketing for observability

Chosen Approach: Data Mesh Architecture

  • Hub-and-spoke model with data domains aligned to business units

  • Data products shared between business units, so a unit can access products it does not own

  • Metadata-driven, event-based notifications for seamless integration

  • Continuous processing, immediate issue notification, and comprehensive monitoring

Data Catalog and Centralized Publishing:

  • Fully open data catalog to inform the business of available data

  • Centralized place for publishing and defining data to the business



Three-Tier Architecture:

Raw Data Ingestion:

  • Collects data from various vendors and trading systems

  • Data in different formats, some from outside the country

  • Stored in a data lake with historical data (up to 20+ years)

  • Most up-to-date section of the data lake

Central Governance Account:

  • Acts as the glue between all components

  • Supports data sharing between data domains and raw ingestions

  • Sends notifications when new data is available in the data lake

  • Responsible for governance, data catalog, and business glossary

  • Processes run daily to collect data catalog information and develop business glossaries
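
As a rough sketch (not Stifel's actual implementation), the governance account's "new data is available" notification could be an EventBridge-style event routed to the domains subscribed to that dataset. All dataset, domain, and path names below are hypothetical:

```python
import json

# Hypothetical subscriptions: which data domains listen for which datasets.
SUBSCRIPTIONS = {
    "trading.positions": ["analytics", "risk"],
    "vendor.market_data": ["analytics"],
}

def make_event(dataset: str, s3_path: str) -> dict:
    """Build an EventBridge-style 'data available' event payload."""
    return {
        "source": "governance.catalog",
        "detail-type": "DataAvailable",
        "detail": {"dataset": dataset, "location": s3_path},
    }

def route(event: dict) -> list:
    """Return the domains that should be notified for this event."""
    dataset = event["detail"]["dataset"]
    return SUBSCRIPTIONS.get(dataset, [])

event = make_event("trading.positions", "s3://raw-lake/trading/positions/2025-01-15/")
print(json.dumps(event["detail"]))
print(route(event))  # domains subscribed to trading positions
```

In a real deployment the routing table would live in the event bus rules rather than in application code; the point is that the governance account publishes once and never needs to know which domains consume.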

Data Domains:

  • Aligned with business operations

  • Each domain owns and produces its data, but shares with other domains

  • Analytics data domain collects all data for analytics, BI dashboards, reporting, AI applications, and intelligent processes



Key Takeaways for Implementing the Architecture:

Federated Governance:

  • Balances domain autonomy with organizational consistency

  • Empowers domains to create and manage data products while adhering to centralized quality standards

  • Ensures innovation, reliability, and operational excellence

  • Improves operational efficiency by reducing system interdependencies and streamlining maintenance

  • Achieves cost optimization through reduced development and infrastructure expenses

Technical Best Practices:

Event-Driven Architecture:

  • Embraces publish-subscribe patterns

  • Messaging pattern: message senders, called publishers, categorize messages into classes and send them without needing to know which components will receive them

  • Moves away from rigid batch dependencies

  • Enables real-time data flows that respond dynamically to business events
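
The publish-subscribe pattern described above can be reduced to a minimal in-process sketch (real systems would use a managed bus such as SNS/SQS or EventBridge; the broker, topic, and message here are illustrative):

```python
from collections import defaultdict
from typing import Callable

class Broker:
    """Minimal in-process publish-subscribe broker (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable) -> None:
        """Register a handler for a class of messages."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # The publisher does not know (or care) who receives the message.
        for handler in self._subscribers[topic]:
            handler(message)

broker = Broker()
received = []
broker.subscribe("positions.updated", received.append)
broker.publish("positions.updated", {"account": "A-1", "qty": 100})
print(received)
```

Because publishers and subscribers only share a topic name, new consumers can be added without touching upstream code, which is what removes the rigid batch dependencies.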

Metadata-Driven Architecture:

  • Centrally manages dependencies and pipeline states

  • Allows intelligent decisions about workflow execution and resource allocation
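
One way to picture metadata-driven orchestration is a central table of pipeline dependencies and states that the scheduler consults before launching anything. This is a sketch under assumed names, not the platform's actual schema:

```python
# Hypothetical pipeline metadata: upstream dependencies and current state.
PIPELINE_META = {
    "raw.trades":    {"depends_on": [], "state": "READY"},
    "raw.positions": {"depends_on": [], "state": "RUNNING"},
    "curated.pnl":   {"depends_on": ["raw.trades", "raw.positions"],
                      "state": "PENDING"},
}

def runnable(dataset: str, meta: dict) -> bool:
    """A pipeline may start only when all upstream dependencies are READY."""
    deps = meta[dataset]["depends_on"]
    return all(meta[d]["state"] == "READY" for d in deps)

# curated.pnl cannot start yet: raw.positions is still RUNNING.
print(runnable("curated.pnl", PIPELINE_META))
```

Keeping this state in one place is what lets the platform make "intelligent decisions" centrally: the same table drives what runs next, what waits, and where resources should go.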

Standardization on Open Data Formats:

  • Uses Apache Hudi for data lake storage

  • Ensures interoperability across tech stack

  • Provides optimized storage patterns for batch and streaming workloads

  • Maintains data consistency
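
For a sense of what standardizing on Apache Hudi looks like in practice, here is a sketch of the write options a PySpark upsert job might pass. The option keys are standard Hudi datasource options; the table, field, and path names are hypothetical:

```python
# Sketch of Apache Hudi write options for a PySpark upsert job.
hudi_options = {
    "hoodie.table.name": "positions",
    "hoodie.datasource.write.recordkey.field": "position_id",   # dedup key
    "hoodie.datasource.write.precombine.field": "updated_at",   # latest wins
    "hoodie.datasource.write.partitionpath.field": "as_of_date",
    "hoodie.datasource.write.operation": "upsert",  # merge into existing data
}

# With a SparkSession in hand, the write would look roughly like:
# (df.write.format("hudi").options(**hudi_options)
#    .mode("append").save("s3://data-lake/curated/positions/"))
print(sorted(hudi_options))
```

The upsert operation plus a precombine field is what lets the same open table serve both batch reloads and streaming updates while staying consistent for every engine that reads it.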

Organizational Transformation:

Breaking Down Data Silos:

  • Standardizes toolset and implements data product approach

  • Enables seamless data sharing and improves cross-functional collaboration

Empowering Business Domains:

  • Grants greater autonomy in data governance

  • Allows informed decisions on data sharing vs. domain-specific data

Customer-First Data Set:

  • Implements systems for real-time data processing and personalized customer experiences

  • Enhances ability to respond to customer needs dynamically

Agile and Responsive Organization:

  • Focuses on creating an organization that can better serve customers

  • Maintains balance between centralization and domain autonomy

  • Embraces latest technological changes



Journey Timeline:

  • 2021: Started with high-level architecture

  • 2023: Refined architecture, presented to leadership, and received approval

  • Engaged AWS ProServe: Brought in AWS Professional Services to help build the architecture

Implementation Phases:

  • September 2024: Built and tested core components

  • January 2025: Onboarded a limited set of data to the new architecture

  • September 2025: Built API endpoints using data within the architecture

  • Current: APIs are part of an application supporting clients; ongoing work to enrich data domains and onboard more applications

Future Focus:

  • Building New Data Domains: While enriching existing domains

  • Enterprise-Wide Adoption: As more data becomes available on the platform

  • Operational Efficiencies: With the shift away from the legacy platform

  • Unstructured Data and AI Use Cases: Expanding the platform to include unstructured data and emerging AI applications



Data Mesh Overview:

  • Decentralized data architecture

  • Treats data as a product

  • Shifts ownership from central team to individual business domains

Key Principles:

Domain-Oriented Decentralization:

  • Data ownership and management by individual business domains

  • Each domain manages its own data products

Data as a Product:

  • Data is treated as a consumable product with clear value proposition

  • Focus on data quality, discoverability, and accessibility
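
"Data as a product" can be made concrete as a contract that every published dataset carries: an owner, a description, a schema for discoverability, and quality guarantees. The fields and example values below are illustrative, not a schema from the talk:

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Illustrative 'data as a product' contract: an owned, discoverable,
    quality-checked dataset published for other domains to consume."""
    name: str
    owner_domain: str
    description: str
    schema: dict          # column name -> type, for discoverability
    sla_hours: int        # freshness guarantee consumers can rely on
    quality_checks: list = field(default_factory=list)

positions = DataProduct(
    name="curated.positions",
    owner_domain="trading",
    description="End-of-day positions, approved by the business owner",
    schema={"position_id": "string", "qty": "decimal", "as_of_date": "date"},
    sla_hours=24,
    quality_checks=["position_id is unique", "qty is not null"],
)
print(positions.owner_domain, positions.sla_hours)
```

Publishing this contract into the central catalog is what turns a raw table into a consumable product with a clear value proposition.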

Self-Serve Data Infrastructure:

  • Provides a platform that enables domains to manage their data independently

  • Empowers domains with tools and capabilities for data processing and analytics

Federated Computational Governance:

  • Establishes standards and guidelines for data quality, security, and compliance

  • Balances domain autonomy with organizational consistency

  • Ensures data products meet organizational standards while allowing for innovation
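
The balance federated governance strikes can be sketched as a small central validation step: the center defines a few non-negotiable standards, and each domain is free beyond them. The standards and field names here are invented for illustration:

```python
# Hypothetical central standards every domain's data product must meet;
# domains remain free to define everything else about their products.
CENTRAL_STANDARDS = {
    "required_fields": {"name", "owner_domain", "classification"},
    "allowed_classifications": {"public", "internal", "confidential"},
}

def validate_product(product: dict, standards: dict) -> list:
    """Return a list of violations; an empty list means compliant."""
    violations = []
    missing = standards["required_fields"] - product.keys()
    if missing:
        violations.append(f"missing fields: {sorted(missing)}")
    cls = product.get("classification")
    if cls is not None and cls not in standards["allowed_classifications"]:
        violations.append(f"unknown classification: {cls}")
    return violations

ok = {"name": "curated.pnl", "owner_domain": "finance",
      "classification": "internal"}
bad = {"name": "scratch.tmp"}
print(validate_product(ok, CENTRAL_STANDARDS))   # compliant
print(validate_product(bad, CENTRAL_STANDARDS))  # flagged
```

Running a check like this at publish time is how autonomy and consistency coexist: domains innovate inside their products, while the organization keeps a uniform floor for quality, security, and compliance.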

Benefits:

  • Improved agility and scalability

  • Better management, sharing, and analysis of data products

  • Enhanced collaboration and cross-functional data usage

  • Increased operational efficiency and cost optimization

  • Support for real-time data processing and personalized customer experiences
