DEV Community

Bikash Kh

How I Replicated Uber's Core Marketplace in Python: A Technical Deep Dive

I Built UberSim v2.0: A Production-Grade Urban Mobility Intelligence Platform 🚗🧠

Every time you open Uber and see a 2.1× surge multiplier, a complex system has already predicted demand, optimized prices, matched drivers, and logged events for future learning, all within milliseconds.

I wanted to understand how those systems work.

So I built UberSim v2.0, a Python-based urban mobility intelligence platform that simulates the core engineering challenges behind modern ride-sharing marketplaces.

Instead of building another dashboard project, I wanted to recreate the intelligence layer behind a ride-sharing platform from scratch.

🚀 What's Inside?

🧠 Demand Forecasting

  • Spatio-temporal demand prediction (R² = 0.89)
  • Weather effects, seasonality, lag features, and neighboring zone influence
  • Predicts ride demand across multiple city zones
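To make the lag and neighboring-zone features concrete, here is a minimal NumPy sketch. It is not the code from the repo: the function name `build_features`, the hourly `(T, Z)` demand layout, and the choice of lags are all assumptions for illustration.

```python
import numpy as np

def build_features(demand, adjacency, lags=(1, 2, 24)):
    """Build lag and neighbor-average features (hypothetical helper).

    demand:    rides per time step, shape (T, Z) for T steps and Z zones
    adjacency: 0/1 matrix, shape (Z, Z); 1 where two zones are neighbors
    Returns X with one row per (time step, zone) pair and the target y.
    """
    demand = np.asarray(demand, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)
    max_lag = max(lags)
    total_steps = demand.shape[0]

    # Row-normalize so each zone sees the mean demand of its neighbors.
    degree = adjacency.sum(axis=1, keepdims=True)
    norm_adj = adjacency / np.maximum(degree, 1.0)

    # One column per lag: the zone's own demand `lag` steps earlier.
    columns = [demand[max_lag - lag : total_steps - lag] for lag in lags]

    # Neighbor influence: mean neighbor demand one step earlier.
    previous = demand[max_lag - 1 : total_steps - 1]
    columns.append(previous @ norm_adj.T)

    X = np.stack([col.ravel() for col in columns], axis=1)
    y = demand[max_lag:].ravel()
    return X, y
```

The resulting `X` and `y` can be passed to any regressor, for example one from Scikit-Learn. Weather and seasonality columns would be appended the same way.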

🕸️ Graph Neural Networks

  • Models the city as a graph
  • Nodes = city zones
  • Edges = historical trip flows
  • Captures spatial mobility patterns that traditional models miss
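As a rough illustration of the zones-as-nodes idea, the sketch below builds a trip-flow adjacency matrix and runs one mean-aggregation message-passing step in plain NumPy. The function names and the single-layer design are assumptions, not the architecture used in UberSim.

```python
import numpy as np

def trip_flow_adjacency(trips, num_zones):
    """Count historical trips between zones; trips holds (origin, destination) pairs."""
    adj = np.zeros((num_zones, num_zones))
    for origin, dest in trips:
        adj[origin, dest] += 1.0
    return adj

def propagate(adj, features, weight):
    """One message-passing step: average over flow-weighted neighbors, then ReLU."""
    a_hat = adj + np.eye(adj.shape[0])                 # self-loop keeps a zone's own features
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)   # row-normalize the flow weights
    return np.maximum(a_hat @ features @ weight, 0.0)
```

Each zone's new embedding mixes its own features with those of the zones its trips flow to, which is the spatial signal a purely tabular model does not see.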

🤖 Reinforcement Learning Pricing

Built a PPO-based surge pricing engine that learns pricing policies instead of relying on hand-crafted rules.

Optimizes multiple objectives simultaneously:

  • 📈 Platform revenue
  • 🚕 Driver earnings
  • 😊 Rider welfare
  • ⏱️ Wait times
  • ⚖️ Fairness constraints
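One common way to feed several objectives to a single PPO agent is to scalarize them into one reward. The sketch below shows that idea only; the `StepOutcome` fields, the weights, and the fairness measure (gap between best- and worst-served zone) are hypothetical and not taken from the project.

```python
from dataclasses import dataclass

@dataclass
class StepOutcome:
    """Marketplace metrics for one simulation step (hypothetical fields)."""
    revenue: float             # platform revenue
    driver_earnings: float     # total paid out to drivers
    rider_surplus: float       # proxy for rider welfare
    mean_wait_min: float       # average pickup wait in minutes
    zone_service_rates: list   # fraction of requests served, per zone

def reward(outcome, weights=(1.0, 0.5, 0.5, 0.2, 2.0)):
    """Weighted sum of objectives; wait time and unfairness are penalties."""
    w_rev, w_drv, w_rider, w_wait, w_fair = weights
    rates = outcome.zone_service_rates
    fairness_gap = max(rates) - min(rates)  # 0 when every zone is served equally
    return (
        w_rev * outcome.revenue
        + w_drv * outcome.driver_earnings
        + w_rider * outcome.rider_surplus
        - w_wait * outcome.mean_wait_min
        - w_fair * fairness_gap
    )
```

A function like this would be called inside the `step` method of a Gymnasium environment, so the agent only ever sees the combined scalar.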

One interesting finding:

The RL agent learned to gradually increase surge prices instead of aggressively reacting to demand spikes. This behavior wasn't explicitly programmed.


⚡ Kafka-Style Real-Time Streaming

Implemented an event-driven architecture with:

  • Ride request streams
  • Driver status updates
  • Pricing events
  • Match results

Supports historical replay and live marketplace metrics.
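The core of a Kafka-style design is an append-only log per topic, where consumers read from an offset. Here is a minimal in-memory sketch of that pattern; `EventLog`, `publish`, and `replay` are illustrative names, not the project's API.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    offset: int    # position within the topic's log
    topic: str
    payload: dict

class EventLog:
    """Append-only, per-topic event log with offset-based replay."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, payload):
        """Append an event and return the offset it was stored at."""
        log = self._topics[topic]
        event = Event(offset=len(log), topic=topic, payload=payload)
        log.append(event)
        return event.offset

    def replay(self, topic, from_offset=0):
        """Yield stored events in order, starting at from_offset."""
        yield from self._topics[topic][from_offset:]
```

Because events are never mutated or removed, replaying from offset 0 reproduces the full history, and a live consumer simply remembers the last offset it processed.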


🧠 Driver State LSTM

Predicts four operational driver states:

  • online_idle
  • online_busy
  • relocating
  • offline

Built entirely in NumPy with Backpropagation Through Time and Adam optimization.
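For readers who have not written an LSTM by hand, this is roughly what a single forward step plus a softmax over the four states looks like in NumPy. The weight layout (one stacked matrix `W` for all four gates) is an assumption, and the backward pass (BPTT) and Adam update are left out.

```python
import numpy as np

STATES = ["online_idle", "online_busy", "relocating", "offline"]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM forward step.

    x: input (D,), h_prev/c_prev: hidden and cell state (H,)
    W: stacked gate weights (4*H, D+H), b: stacked biases (4*H,)
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:H])            # input gate
    f = sigmoid(z[H:2 * H])       # forget gate
    o = sigmoid(z[2 * H:3 * H])   # output gate
    g = np.tanh(z[3 * H:])        # candidate cell values
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

def predict_state_probs(h, W_out, b_out):
    """Softmax over the four driver states; W_out has shape (4, H)."""
    logits = W_out @ h + b_out
    logits = logits - logits.max()   # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()
```

The predicted state is `STATES[int(np.argmax(probs))]`. Training would unroll `lstm_step` over a driver's event sequence and backpropagate through every step.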


🧪 Counterfactual A/B Testing

Implemented production-style experimentation techniques:

  • IPS (Inverse Propensity Scoring)
  • Doubly Robust Estimation
  • CUPED variance reduction
  • Bootstrap confidence intervals

This allows evaluating policies without deploying every experiment in production.
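To show how off-policy evaluation works in the simplest case, here is a sketch of the IPS estimator with a percentile bootstrap interval. It covers only two of the four techniques listed, and the function names and argument layout are assumptions rather than the project's interface.

```python
import numpy as np

def ips_estimate(rewards, logging_probs, target_probs):
    """Inverse propensity scoring estimate of the target policy's mean reward.

    rewards:       observed reward per logged decision
    logging_probs: probability the logging policy gave to the action it took
    target_probs:  probability the new policy gives to that same action
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return float(np.mean(weights * rewards))

def bootstrap_ci(rewards, logging_probs, target_probs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the IPS estimate."""
    rewards = np.asarray(rewards, dtype=float)
    logging_probs = np.asarray(logging_probs, dtype=float)
    target_probs = np.asarray(target_probs, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(rewards)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample logged rows with replacement
        estimates[b] = ips_estimate(rewards[idx], logging_probs[idx], target_probs[idx])
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)
```

IPS is unbiased only when the logging policy gave non-zero probability to every action the target policy might take. Small logging probabilities inflate the weights and the variance, which is the motivation for doubly robust estimation.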


🗺️ Multi-Modal Transit Planning

Journey planning across six transportation modes:

  • 🚗 Rideshare
  • 🚌 Bus
  • 🚇 Subway
  • 🚲 Bike
  • 🛴 Scooter
  • 🚶 Walking

Uses A*/Dijkstra optimization to balance:

  • Travel time
  • Cost
  • CO₂ emissions
  • Number of transfers
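A simple way to balance these criteria is to collapse them into one weighted edge cost and run Dijkstra over (stop, mode) states, so that switching modes can be penalized as a transfer. The sketch below does that; the graph format, weight keys, and `transfer_penalty` value are assumptions for illustration.

```python
import heapq
import itertools

def plan_journey(graph, source, target, weights, transfer_penalty=5.0):
    """Dijkstra over (node, mode) states with a weighted multi-criteria cost.

    graph:   {node: [(neighbor, mode, {"time": ..., "cost": ..., "co2": ...}), ...]}
    weights: {"time": w1, "cost": w2, "co2": w3}, all non-negative
    Returns (total_cost, legs) where legs is a list of (mode, node) pairs.
    """
    tie = itertools.count()   # tie-breaker so the heap never compares leg lists
    heap = [(0.0, next(tie), source, None, [])]
    best = {}
    while heap:
        cost, _, node, mode, legs = heapq.heappop(heap)
        if node == target:
            return cost, legs
        if best.get((node, mode), float("inf")) <= cost:
            continue   # this state was already settled at a lower or equal cost
        best[(node, mode)] = cost
        for nxt, nxt_mode, attrs in graph.get(node, []):
            step = sum(weights[key] * attrs[key] for key in weights)
            if mode is not None and nxt_mode != mode:
                step += transfer_penalty   # changing modes counts as a transfer
            heapq.heappush(
                heap, (cost + step, next(tie), nxt, nxt_mode, legs + [(nxt_mode, nxt)])
            )
    return float("inf"), []
```

Changing the weights shifts the answer: a rider who weights time heavily gets routed through rideshare, while one who weights cost or CO₂ gets walking and transit.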

💡 What I Learned

The hardest problem isn't maximizing revenue.

It's maximizing revenue while remaining fair.

Without constraints, optimization naturally prioritizes high-demand areas and disadvantages low-supply neighborhoods.

Adding fairness fundamentally changes the optimization landscape.

Some other takeaways:

  • RL discovers strategies humans don't explicitly program.
  • GNNs capture spatial relationships that tabular models miss.
  • Causal inference is essential for policy evaluation.
  • Pure NumPy is more powerful than people think.

🛠️ Tech Stack

Python · Streamlit · Plotly · Stable-Baselines3 · NetworkX · NumPy · Scikit-Learn · Gymnasium


🔮 What's Next?

  • [ ] Graph Attention Networks (GAT)
  • [ ] Multi-Agent Reinforcement Learning
  • [ ] Real Kafka Broker Integration
  • [ ] WebGL City Visualization
  • [ ] Real-World Dataset Integration (NYC TLC, Chicago Divvy)

🔗 GitHub

https://github.com/kh-bikash/ubersim

Feedback, ideas, and contributions are welcome 🚀
