DEV Community

Architecture for High-Throughput Low-Latency Big Data Pipeline on Cloud

Satish Chandra Gupta
Machine Learning Practitioner. I learn & write about doing ML in production. Cofounder: SlangLabs.in. Ex: Amazon, Microsoft Research. Newsletter: ML4Devs.com
Originally published at satishchandragupta.com

Scalable and efficient data pipelines are as important for the success of analytics and ML as reliable supply lines are for winning a war.


When deploying big-data analytics, data science, and machine learning (ML) applications in the real world, analytics tuning and model training account for only around 25% of the work. Approximately 50% of the effort goes into making data ready for analytics and ML. The remaining 25% goes into making insights and model inferences easily consumable at scale. The data pipeline puts it all together. It is the railroad on which the heavy and marvelous wagons of ML run. Long-term success depends on getting the data pipeline right.

This article gives an introduction to the data pipeline and an overview of architecture alternatives.
