Speaker: Jayaseelan Shanmugam @ AWS FSI Meetup 2025 Q4
Introduction to PayPal:
PayPal is a global payment service provider processing about $1.7 trillion in annual payment volume.
Operates in 200 global markets with 430 million active accounts.
Processes approximately 900 transactions per second, with peaks during holiday seasons like Black Friday and Cyber Monday.
Critical Problem:
Ensuring the accuracy and reconciliation of the massive transaction volume.
Reconciling transactions across multiple systems within PayPal, with external processors, and with card networks.
Matching transactions to ensure no data or financial discrepancies.
Validating that financial records accurately reflect actual customer payments.
Reconciliation Process:
Transactions flow through PayPal’s system and are recorded in multiple internal ledgers.
Transactions are sent to external processors for clearing and confirmation by the network.
PayPal settles the funds for the transaction with the merchant.
Reconciliation involves matching transactions across PayPal’s internal systems, processor acknowledgments, and funding settlement summaries (end of day, T+1, T+2).
Primary goal: Ensure transactions are not lost and there are no discrepancies.
Why Reconciliation Matters:
Three-way matching problem: PayPal internal ledger, external processor records, and network confirmations.
Critical for financial accuracy and customer trust.
Ensures that financial records reflect actual customer payments.
High-Level Architecture:
Focusing on how PayPal achieved near real-time reconciliation.
Technologies and strategies used to handle the scale and complexity of the problem.
Business Impact:
Reduction in reconciliation time from 24 hours to 15 minutes.
Improved accuracy with minimal discrepancies.
Enhanced customer trust and operational efficiency.
Continued Discussion on PayPal’s Reconciliation System
Three-Way Matching Problem:
PayPal Internal Ledger:
- Records transactions within PayPal’s systems.
External Processor:
- Registers transactions in their local system.
Network Confirmation:
- Confirms whether the transaction has been successfully made.
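The three-way match can be sketched as a join of the three record sets on a shared transaction identifier. This is only an illustration, not PayPal's implementation: the field layout (a `txn_id` mapped to an amount), the `three_way_match` function, and the status labels are all assumptions.

```python
from decimal import Decimal


def three_way_match(ledger, processor, network):
    """Classify every transaction ID seen in any of the three sources.

    Each argument maps a hypothetical txn_id to an amount (Decimal).
    Returns {txn_id: status}.
    """
    sources = {"LEDGER": ledger, "PROCESSOR": processor, "NETWORK": network}
    results = {}
    for txn_id in set(ledger) | set(processor) | set(network):
        missing = [name for name, recs in sources.items() if txn_id not in recs]
        if missing:
            # The record is absent from at least one source.
            results[txn_id] = "MISSING_IN_" + "_AND_".join(missing)
        elif len({recs[txn_id] for recs in sources.values()}) > 1:
            # Present everywhere, but the amounts disagree.
            results[txn_id] = "AMOUNT_MISMATCH"
        else:
            results[txn_id] = "MATCHED"
    return results


ledger = {"t1": Decimal("10.00"), "t2": Decimal("25.50"), "t3": Decimal("7.00")}
processor = {"t1": Decimal("10.00"), "t2": Decimal("25.00"), "t3": Decimal("7.00")}
network = {"t1": Decimal("10.00"), "t2": Decimal("25.50")}

outcome = three_way_match(ledger, processor, network)
```

Here `t1` agrees in all three sources, `t2` has a differing amount at the processor, and `t3` has no network confirmation, so only `t1` would count as reconciled.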
Responsibilities:
PayPal is responsible for matching transactions at every stage from entry into the system to settlement with the merchant.
Manual reconciliation is impractical due to the high volume of transactions.
Automated State Machine with Rule Engine:
Utilizes an automated state machine and high-level rule engine.
Configured to handle transaction processing with external vendors and timelines for acknowledgments and funding summaries.
Ensures transactions are reconciled efficiently.
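One way to picture the per-vendor timeline configuration is a small lookup table of deadlines. The sketch below is an assumption-laden illustration: the partner names, the `PartnerTimeline` fields, and the specific deadlines are hypothetical and were not given in the talk.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone


@dataclass(frozen=True)
class PartnerTimeline:
    ack_within: timedelta   # maximum wait for a processor acknowledgment
    funding_lag_days: int   # 0 = end of day, 1 = T+1, 2 = T+2


# Hypothetical partners and deadlines, for illustration only.
TIMELINES = {
    "processor_a": PartnerTimeline(timedelta(hours=1), 1),
    "processor_b": PartnerTimeline(timedelta(hours=4), 2),
}


def ack_overdue(partner: str, sent_at: datetime, now: datetime) -> bool:
    """True if the acknowledgment window for this partner has elapsed."""
    return now - sent_at > TIMELINES[partner].ack_within


def funding_due(partner: str, txn_date: date) -> date:
    """Date on which the funding summary is expected.

    Simplification: counts calendar days, ignoring weekends and holidays.
    """
    return txn_date + timedelta(days=TIMELINES[partner].funding_lag_days)


sent = datetime(2025, 1, 6, 12, 0, tzinfo=timezone.utc)
now = datetime(2025, 1, 6, 14, 0, tzinfo=timezone.utc)
overdue_a = ack_overdue("processor_a", sent, now)  # 2h elapsed, 1h allowed
overdue_b = ack_overdue("processor_b", sent, now)  # 2h elapsed, 4h allowed
```

Keeping deadlines in data rather than in code means a new vendor or a changed cutoff becomes a configuration change.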
Importance of Reconciliation:
Critical for comparing what was recorded with what actually happened.
Ensures transactions are auditable and compliant with regulatory standards (PCI DSS).
Provides a clear record of when transactions were recorded and settled.
Current Gaps in Legacy System:
The legacy system relies on end-of-day batch processing.
Uses a store and process mechanism where transactions accumulate throughout the day and are reconciled at the end of the day.
Reconciliation does not read directly from the operational data store, to avoid performance and latency impacts on it.
Utilizes an ETL system sourcing data from Oracle GoldenGate.
ETL pipeline involves transformation and formatting, leading to potential data mismatches or inconsistencies.
Need for Improvement:
Move away from batch processing to near real-time processing.
Reduce reliance on ETL systems to minimize data transformation issues.
Enhance automation to ensure accurate and timely reconciliation.
Problems with Legacy System:
Reconciliation is delayed because all transactions accumulate until the end of the day.
Transactions are matched and account books are closed only at the end of the day.
This delay is a significant problem.
Objective:
Transition to a modern, cloud-based platform.
Leverage AWS infrastructure to solve the aforementioned problems.
Key Objectives of the New Solution:
End-to-End Data Integrity Across Payment Lifecycle:
Ensure data integrity from the moment a record enters the real-time payment processing system.
Track transactions across multiple systems within PayPal.
Link all transactions with the correct identifier and timestamp.
Match outbound files sent to processors with inbound records received from networks or vendors.
Automated State-Driven Match Logic:
Move from a store-and-process mechanism to a stream-and-process mechanism.
Reduce the entire reconciliation cycle.
Real-Time Monitoring:
Identify exceptions while matching transactions within the internal system or records from external vendors.
Record exceptions where matches fail between received records and local ledger transactions.
The operations team acts on these recorded exceptions.
Technical Architecture:
Data Ingestion:
- Sources from which data is ingested into the reconciler.
Reconciliation Process:
- Methods and processes involved in performing reconciliation.
Storage of Reconciliation Outcomes:
- How the results of the reconciliation are stored.
Operational Team Leverage:
- How the operational team uses reconciliation exceptions and acts on them.
High-Level Technical Reconciliation Overview
Scope Confinement:
Upstream payment processing systems are abstracted out.
Focus starts with the real-time payment card processor.
Real-Time Payment Card Processor:
Runs on Amazon EKS and receives millions of transactions per day (expected ~300 million transactions daily).
Each transaction is recorded in AWS DynamoDB, which serves as the source of truth and operational data store.
Data Flow:
[ 1 ] DynamoDB to Kinesis Data Streams:
Transactions recorded in DynamoDB are streamed via Kinesis Data Streams.
Kinesis preserves the ordering of transactions (ordering is guaranteed within a shard).
[ 2 ] Amazon Data Firehose:
Transactions are bucketed and chunked based on different business parameters.
[ 3 ] AWS S3:
Transactions are recorded in AWS S3.
S3 acts as a secondary data store for transactions but as the primary store for file processing.
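The bucketing step can be sketched as deriving a partition prefix from each transaction's business attributes. The talk did not name the business parameters, so the choice of partner and event hour, the key layout, and the record fields below are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def s3_prefix(partner: str, event_time: datetime) -> str:
    """Build a Hive-style partition prefix (hypothetical layout).

    event_time must be timezone-aware; it is normalized to UTC.
    """
    t = event_time.astimezone(timezone.utc)
    return f"txns/partner={partner}/dt={t:%Y-%m-%d}/hour={t:%H}/"


def bucket(records):
    """Group transaction IDs by the prefix they would be written under."""
    buckets = defaultdict(list)
    for rec in records:
        key = s3_prefix(rec["partner"], rec["event_time"])
        buckets[key].append(rec["txn_id"])
    return dict(buckets)


records = [
    {"txn_id": "t1", "partner": "processor_a",
     "event_time": datetime(2025, 1, 6, 9, 5, tzinfo=timezone.utc)},
    {"txn_id": "t2", "partner": "processor_a",
     "event_time": datetime(2025, 1, 6, 9, 40, tzinfo=timezone.utc)},
    {"txn_id": "t3", "partner": "processor_b",
     "event_time": datetime(2025, 1, 6, 10, 1, tzinfo=timezone.utc)},
]
buckets = bucket(records)
```

A partitioned layout like this lets a later batch job read only the prefixes relevant to one partner or time range.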
Reconciliation Process:
[ 1 ] Inbound Transactions in PayPal:
Processed transactions in PayPal are translated into file format in AWS S3.
[ 2 ] External Partner Processing:
Chunked transactions are translated into outbound files and sent to external partners for processing.
Inbound records from external partners are returned to S3.
AWS S3 as Central Source:
- S3 holds both internal transaction footprints and external processed transactions received as inbound files.
Data Processing:
EventBridge Scheduling:
Triggers an Apache Spark job on an AWS EMR cluster every 15 minutes.
Distributed processing of transactions in S3 using a preconfigured rule engine.
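A 15-minute schedule can be expressed in EventBridge with a rate expression such as `rate(15 minutes)`. Each run then needs to know which slice of data it owns. The sketch below shows one way to derive that slice; the `window_for` helper and the choice to process the most recently completed window are assumptions, not details from the talk.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_for(run_time: datetime):
    """Return (start, end) of the last fully completed 15-minute window.

    run_time must be timezone-aware. The end is run_time floored to a
    15-minute boundary, so a late-starting run still picks the same slice.
    """
    completed = (run_time - EPOCH) // WINDOW  # whole windows since the epoch
    end = EPOCH + completed * WINDOW
    return end - WINDOW, end


start, end = window_for(datetime(2025, 1, 6, 10, 7, 30, tzinfo=timezone.utc))
```

Deriving the window from the clock rather than from "time since the last run" keeps reruns idempotent: rerunning a job for the same trigger time selects the same data.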
Rule Engine:
Comprises multiple state graphs.
Categorizes transactions and determines terminal states.
Includes complex rules based on market operations, external partners, and cutoff times for data export/import.
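A state graph with terminal states can be sketched as a transition table. The states, events, and the rule that an unexpected event becomes an exception are all hypothetical; the talk did not describe the actual graphs.

```python
# Hypothetical state graph: state -> {event: next_state}
STATE_GRAPH = {
    "RECORDED": {"sent_to_processor": "SENT"},
    "SENT": {"ack_received": "ACKNOWLEDGED", "ack_timeout": "EXCEPTION"},
    "ACKNOWLEDGED": {
        "funding_matched": "RECONCILED",
        "funding_mismatch": "EXCEPTION",
        "cutoff_passed": "EXCEPTION",
    },
}
TERMINAL = {"RECONCILED", "EXCEPTION"}


def run(events, state="RECORDED"):
    """Replay a transaction's events and return the state it ends in."""
    for event in events:
        if state in TERMINAL:
            break  # terminal states absorb any further events
        transitions = STATE_GRAPH[state]
        if event not in transitions:
            return "EXCEPTION"  # assumption: unexpected events are exceptions
        state = transitions[event]
    return state


happy = run(["sent_to_processor", "ack_received", "funding_matched"])
timed_out = run(["sent_to_processor", "ack_timeout"])
pending = run(["sent_to_processor"])
```

A transaction that ends in a non-terminal state such as `SENT` is still pending and would be picked up again in a later cycle.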
Technical Reconciliation Architecture
AWS EMR Cluster and Rule Engine:
The AWS EMR cluster uses the rule engine to match transactions.
Successful reconciliation results are written back to AWS S3.
Exceptions are sent to AWS EventBridge, which triggers a Lambda function to enrich and report exceptions back to S3.
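The enrichment step might look like the handler below. EventBridge delivers events as JSON with standard top-level fields such as `time` and `detail`. Everything inside `detail`, the severity rule, and the S3 key layout are hypothetical. The actual write to S3 is omitted so the sketch stays self-contained.

```python
import json


def handler(event, context=None):
    """Enrich a reconciliation exception delivered by EventBridge.

    Returns the S3 key and JSON body that would be written; the upload
    itself is left out of this sketch.
    """
    detail = event["detail"]  # hypothetical payload emitted by the Spark job
    enriched = {
        "txn_id": detail["txn_id"],
        "reason": detail["reason"],
        "partner": detail.get("partner", "unknown"),
        "detected_at": event["time"],
        # Assumed rule: amount mismatches are treated as more urgent.
        "severity": "high" if detail["reason"] == "AMOUNT_MISMATCH" else "medium",
    }
    key = f"exceptions/partner={enriched['partner']}/{enriched['txn_id']}.json"
    return {"key": key, "body": json.dumps(enriched, sort_keys=True)}


sample_event = {
    "detail-type": "ReconciliationException",
    "time": "2025-01-06T10:00:00Z",
    "detail": {"txn_id": "t2", "reason": "AMOUNT_MISMATCH", "partner": "processor_a"},
}
result = handler(sample_event)
```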
Storage and Operational Aspects:
Data is stored as Parquet files in S3.
An AWS Glue Data Catalog is configured on top of the Parquet files.
The operational team can query data using Amazon Athena in SQL fashion.
Custom-built UI portal on top of Glue Catalog provides detailed reconciliation states and outcomes for specific days, settlements, or partners.
Architecture Highlights:
Active-Active Architecture:
Operates across multiple AWS regions.
Ensures high availability with zero recovery point objective (RPO) and recovery time objective (RTO).
If one region goes down, another can process transactions seamlessly.
In-flight transactions are managed using Amazon Kinesis Data Streams and DynamoDB for consistency across regions.
Technology and Architecture Decisions:
AWS EMR vs. Redshift:
Considered using Redshift for a data lake solution but opted for AWS EMR due to cost efficiency.
The EMR cluster is an extension of the core processing system and reuses the existing S3 data store.
Using AWS EMR to address the problem statement kept the solution low-cost.
Business Impact of the New Reconciliation Solution
Accuracy:
- Improved from three nines (99.9%) to four nines (99.99%).
Speed:
Reduced reconciliation time from 24 hours to a 15-minute cycle.
Horizontal scaling of the cluster keeps processing time consistent (max 30 minutes) regardless of transaction volume (1 million to 300 million transactions).
Risk Reduction:
Faster reconciliation (within 15 minutes) minimizes potential fraud and risk.
Allows for quicker action on system or external issues.
Cost Optimization:
Chose an AWS EMR cluster over data lake solutions for cost efficiency.
EMR cluster and Lambda functions operate on-demand, not continuously.
Computing instances have a limited lifetime, freeing up resources and minimizing costs once processing is complete.