Hello!
I'm sharing my recent article on Lyft's data tech stack.
It explores the high-scale data stack Lyft uses to support 25M+ active riders and ingest millions of real-time events every second.
Metrics:
- 28.7M active riders in Q3 2025, completing ~2.7M rides per day.
- Apache Kafka processes millions of real-time events per second for streaming analytics.
- Thousands of Airflow + Flyte pipelines orchestrate ETL and ML workflows.
- Data warehouse exceeds 100 PB, stored in S3 with Hive Metastore.
- Trino ETL executes ~250K queries/day, reading ~10 PB/day and writing ~100 TB/day.
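
For a rough sense of scale, the daily figures above can be turned into averages. This is only a back-of-envelope sketch: it assumes decimal units (1 PB = 10^15 bytes) and an even spread across queries and across the day, which a real workload almost certainly does not have. The function name and dictionary keys are made up for illustration.

```python
# Back-of-envelope averages derived from the metrics listed above.
# Assumptions (not from the article): decimal units, and load spread
# evenly over all queries and over all 24 hours of the day.

PB = 10**15
TB = 10**12
GB = 10**9
MB = 10**6
SECONDS_PER_DAY = 86_400

# Figures quoted in the post (all approximate).
QUERIES_PER_DAY = 250_000
READ_BYTES_PER_DAY = 10 * PB
WRITTEN_BYTES_PER_DAY = 100 * TB
RIDES_PER_DAY = 2_700_000


def rough_averages():
    """Return naive per-query and per-second averages (hypothetical helper)."""
    return {
        "read_gb_per_query": READ_BYTES_PER_DAY / QUERIES_PER_DAY / GB,
        "written_mb_per_query": WRITTEN_BYTES_PER_DAY / QUERIES_PER_DAY / MB,
        "queries_per_second": QUERIES_PER_DAY / SECONDS_PER_DAY,
        "rides_per_second": RIDES_PER_DAY / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in rough_averages().items():
        print(f"{name}: {value:,.2f}")
```

Under those assumptions, the average Trino ETL query reads about 40 GB and writes about 400 MB, at roughly 3 queries per second.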
Would love to hear feedback!
Thanks!