Building Distributed Systems with Ray—Just Like Running a Restaurant

#dataengineering #architecture #tutorial #ray

In our fast-paced digital world, data keeps piling up at a staggering rate. To make sense of it all, we need some supercharged computing tools that can handle what we throw at them. That is where Ray comes in. It is an open-source framework designed to help you build distributed applications without all the headaches. In the following sections, we’ll break down each Ray component using our restaurant analogy, translating abstract concepts into familiar, real-world operations. Let’s get cooking!

Ray Data: Ingredient Prep Station
Role: Prepares every ingredient (chopping, washing, stocking) so that everything is ready the moment the chefs start cooking.
Ray Version: Handles loading and transforming large datasets ahead of time, so preprocessing does not eat into training time.
Key Analogy: If you have a hundred tomatoes to chop or a hundred bags of flour to measure out, it is much easier to get that done at the prep station before service than in the middle of cooking. In the same way, Ray Data gets the data ready before training starts.
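To make the prep station a little more concrete, here is a minimal sketch, assuming Ray is installed with its Data extras (`pip install "ray[data]"`). The names `wash_and_chop` and `prep_ingredients` and the row fields are made up for illustration; only `ray.data.from_items`, `Dataset.map`, and `Dataset.take_all` come from Ray.

```python
def wash_and_chop(row: dict) -> dict:
    """Plain per-row transform: tidy the name and convert grams to kilograms."""
    return {"name": row["name"].strip().lower(), "kg": row["grams"] / 1000}


def prep_ingredients(rows: list) -> list:
    """Run the same transform through Ray Data (requires Ray to be installed)."""
    import ray  # imported here so wash_and_chop stays usable without Ray

    ds = ray.data.from_items(rows)  # build a Dataset from in-memory rows
    ds = ds.map(wash_and_chop)      # Ray applies the transform row by row
    return ds.take_all()            # bring the prepared rows back to the driver
```

Calling `prep_ingredients([{"name": " Tomato ", "grams": 500}])` on a machine with Ray would push the rows through the prep station. Keeping the transform a plain function means it can be checked on its own before it is handed to Ray.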

Ray Train: The Chefs Cooking the Main Dishes
Role: The cooks share out the dishes being prepared, follow the recipe, manage temperature and timing, and check in along the way.
Ray Version: Distributes model training across CPUs/GPUs and reports progress along the way.
Key Analogy: You want several cooks (hardware) working at the same time, whether on different parts of one dish or on different dishes (models), so the kitchen works efficiently.

Ray Tune: The Research Team Experimenting with Recipes
Role: The restaurant's R&D team is constantly exploring new tastes, iterating on sauces, pasta thickness, ice cream flavors, or any other course on the menu.
Ray Version: Runs many trials in parallel with different parameters to find the best hyperparameters for a model.
Key Analogy: Tune is like a lab that tries dozens or hundreds of iterations of a single recipe to find the most palatable version.
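As a rough illustration of the tasting lab, the sketch below assumes Ray is installed with the Tune extras (`pip install "ray[tune]"`). The objective `taste_score` and its parameters `salt` and `simmer_minutes` are invented for this example; a real objective would train and evaluate a model.

```python
def taste_score(config: dict) -> dict:
    """Hypothetical objective: lower is better, best at salt=2.0 and 30 minutes."""
    score = (config["salt"] - 2.0) ** 2 + abs(config["simmer_minutes"] - 30) / 10
    return {"score": score}


def find_best_recipe() -> dict:
    """Try every combination as a separate Tune trial (requires Ray)."""
    from ray import tune  # imported here so taste_score works without Ray

    search_space = {
        "salt": tune.grid_search([1.0, 2.0, 3.0]),
        "simmer_minutes": tune.grid_search([20, 30, 40]),
    }
    tuner = tune.Tuner(taste_score, param_space=search_space)
    results = tuner.fit()  # runs the trials, in parallel where resources allow
    return results.get_best_result(metric="score", mode="min").config
```

Each grid combination becomes one trial, and `get_best_result` picks the configuration with the lowest reported `score`.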

Ray Serve: Waitstaff and Quick Service Station
Role: Delivers courses to diners promptly, at the speed they expect, and brings in extra hands when there is a rush.
Ray Version: Deploys and manages models/APIs, routing client requests and autoscaling with demand.
Key Analogy: Serve does not train models, just as waitstaff do not cook; it takes what was made in the kitchen (Train) to clients and can handle many requests at once without sacrificing turnaround time.

Ray RLlib: The Robot Chef Experimenting with Reinforcement Learning
Role: A chef who learns over time, finding the best way to make each dish from the feedback received along the way.
Ray Version: Builds and runs distributed reinforcement learning algorithms, in which agents learn over time from the feedback they get.
Key Analogy: RLlib is a robot chef who can learn through trial and error, figuring out which seasonings people like after adjusting and trying repeatedly.

Ray Core: Kitchen Manager and Scheduling System
Role: Schedules all kitchen staff, knows what each cook is doing and who is available for which task, and makes sure order fulfillment never bottlenecks.
Ray Version: The core component that handles fundamental parallel computing: starting and stopping worker processes, tracking which tasks are running where, and keeping account of the resources assigned to them.
Key Analogy: Without a manager, the next order waits behind a busy cook while others stand idle, and everything takes longer. Ray Core keeps everyone in line by handing work to whoever is free.

Ray Cluster = The Kitchen: Just like a restaurant has a kitchen where all the cooking happens, a Ray cluster is a collection of computers that work together. They pool their resources, whether it is memory or processing power, to get things done efficiently.

Ray Workers = The Kitchen Staff: In any kitchen, you have distinct roles. There are chefs whipping up meals, sous chefs prepping ingredients, and dishwashers keeping things clean. Ray workers are like that staff: they are processes running on the cluster's machines that tackle their tasks independently.

Ray Scheduler = The Head Chef: Think of the head chef as the one in charge of getting things done. They assign tasks to the staff, making sure everyone knows what to work on and when. The Ray scheduler does the same thing by distributing tasks to the workers and keeping everything on track.

Ray Tasks = Recipes: Just like a well-structured recipe guides a chef, Ray tasks are steps that need to be executed. Whether it is crunching data or performing calculations, these tasks are straightforward units of work.
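A task can be sketched in a few lines, assuming Ray is installed (`pip install ray`). The function names `chop` and `chop_in_parallel` are hypothetical; `ray.remote`, `.remote()`, and `ray.get` are the Ray calls doing the work.

```python
def chop(ingredient: str) -> str:
    """An ordinary Python function: one small, self-contained unit of work."""
    return f"chopped {ingredient}"


def chop_in_parallel(ingredients: list) -> list:
    """Run chop() as Ray tasks (requires Ray to be installed)."""
    import ray  # imported here so chop() stays usable without Ray

    ray.init(ignore_reinit_error=True)  # start or reuse a local Ray instance
    chop_task = ray.remote(chop)        # wrap the function as a remote task
    refs = [chop_task.remote(i) for i in ingredients]  # returns references at once
    return ray.get(refs)                # block until every task has finished
```

Each `.remote()` call hands one recipe to the scheduler and returns immediately; `ray.get` collects the results in the same order as the references.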

Ray Jobs = Customer Orders: When a customer puts in an order, the kitchen team jumps into action to make it happen. A Ray job is like that order—it is made up of a group of tasks that need to come together to complete a bigger goal, such as training a machine learning model or processing a big dataset.

Ray Actors = Specialized Chefs/Stations: Some chefs have specialties; for instance, a pastry chef only focuses on desserts. In Ray, actors are specialized workers that keep their state between task executions, just like those chefs focus on their craft.
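The difference from a plain task is the remembered state. The sketch below assumes Ray is installed; `PastryStation` and `run_pastry_station` are illustrative names, not part of Ray.

```python
class PastryStation:
    """A plain class whose instance remembers how many desserts it has made."""

    def __init__(self):
        self.desserts_made = 0

    def bake(self, dessert: str) -> str:
        self.desserts_made += 1  # state survives from one call to the next
        return f"{dessert} #{self.desserts_made}"


def run_pastry_station() -> list:
    """Host PastryStation as a Ray actor (requires Ray to be installed)."""
    import ray  # imported here so PastryStation stays usable without Ray

    ray.init(ignore_reinit_error=True)
    PastryActor = ray.remote(PastryStation)  # turn the class into an actor class
    station = PastryActor.remote()           # start one dedicated actor process
    refs = [station.bake.remote(d) for d in ["tart", "eclair", "tart"]]
    return ray.get(refs)
```

All three `bake` calls go to the same actor, so the counter keeps climbing instead of resetting the way it would in independent tasks.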

Ray Tasks with Dependencies = Recipe Steps: In cooking, some steps cannot happen until others are finished—like you need to caramelize onions before adding them to the dish. Ray takes care of ordering these tasks properly, ensuring everything gets done at the right time.
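One way to express such a dependency, assuming Ray is installed, is to pass the reference returned by one task straight into the next. The function names here are hypothetical.

```python
def caramelize(onions: str) -> str:
    return f"caramelized {onions}"


def assemble(base: str, topping: str) -> str:
    return f"{base} with {topping}"


def cook_with_dependencies() -> str:
    """Chain two Ray tasks so the second waits for the first (requires Ray)."""
    import ray  # imported here so the plain functions stay usable without Ray

    ray.init(ignore_reinit_error=True)
    caramelize_task = ray.remote(caramelize)
    assemble_task = ray.remote(assemble)

    onions_ref = caramelize_task.remote("onions")  # step 1 starts right away
    # Passing the reference as an argument tells Ray that assemble depends on
    # step 1; Ray substitutes the finished value before assemble runs.
    dish_ref = assemble_task.remote("burger", onions_ref)
    return ray.get(dish_ref)
```

No explicit waiting is written between the two steps; the ordering falls out of which references are passed where.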

Ray Libraries = Specialized Cookbooks: Just as a chef might have cookbooks for various cuisines, Ray comes with built-in libraries that help you deal with common tasks. For instance, it has RLlib for reinforcement learning and Tune for optimizing hyperparameters.

Ray Client = Restaurant Manager/Front of House: Finally, much like a restaurant manager bridges the kitchen and the dining area, the Ray Client makes it easy for users outside the cluster to interact with it.

Ray's Object Store is like a restaurant's central pantry: it is where all Ray components (tasks, actors, workers) put what they produce and pick up what they need.

The Central Pantry (Object Store):
The Object Store holds the results of finished tasks and the inputs of upcoming ones, so everyone has one place to exchange them. In a large restaurant, a single shared pantry is the only practical way to operate, and the same holds for a distributed system.
The Ingredients (Objects):
Everything that eventually gets served starts out in the pantry, stored as individual items or groups of items, each with its own characteristics. In Ray, every piece of data in the store is kept as an "object".
Access and Sharing:
Since the pantry is shared, one ingredient can be used by many chefs or dishes without anyone stocking a duplicate. In Ray, the Object Store lets multiple workers read or reuse what another worker has produced, which makes distributed processing easier and quicker without wasted effort.
Efficiency:
A shared pantry saves time otherwise spent searching and cuts down on duplicate orders and extra deliveries, because everything the staff needs is in-house. The same goes for Ray: if the Object Store already holds a piece of data, Ray saves the network transfer and compute time it would take to send or produce it again.
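The shared pantry idea can be sketched as follows, assuming Ray is installed. `count_matching` and `share_pantry` are hypothetical names; `ray.put` is the Ray call that stores an object once so several tasks can read it.

```python
def count_matching(stock: list, wanted: str) -> int:
    """Count how many items in the stock list match the wanted ingredient."""
    return sum(1 for item in stock if item == wanted)


def share_pantry() -> list:
    """Store the stock once and let several tasks read it (requires Ray)."""
    import ray  # imported here so count_matching stays usable without Ray

    ray.init(ignore_reinit_error=True)
    stock = ["tomato", "flour", "tomato", "basil"]
    stock_ref = ray.put(stock)  # place the list in the object store one time

    count_task = ray.remote(count_matching)
    # Every task receives the same reference rather than its own copy of the
    # list being shipped with each call.
    refs = [count_task.remote(stock_ref, w) for w in ["tomato", "basil"]]
    return ray.get(refs)
```

For a four-item list the saving is negligible, but the same pattern applies when the shared object is a large dataset or a set of model weights.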

Connecting it All — Ray as a Restaurant (single end-to-end flow)

Guest places order (user request / Job starts): someone orders a meal — a job is sent to Ray.
Kitchen manager (Ray Core / Scheduler) takes the order and breaks it down: Ray Core splits the job into tasks and schedules them onto workers.
Pantry & prep station (Ray Data) prepare and stage ingredients: data pipelines partition, preprocess, cache, and share datasets so workers can access them efficiently.
Head chef assigns & staff coordinate (Scheduler + Workers / Ray Train): the scheduler gives tasks to the cooks; workers perform distributed compute and training jobs (Ray Train), coordinating their efforts for cooking and baking.
Line cooks & sous-chefs execute dishes (Workers / Ray Train): workers perform the compute-heavy steps (model training, batch jobs, data transforms) on the ingredients.
Tasting lab runs experiments (Ray Tune): parallel experiments evaluate various recipe configurations (hyperparameters), feeding better versions back into training.
Robot chef explores improvements continuously (Ray RLlib): reinforcement-learning agents autonomously learn better policy/strategies that improve recipes/models.
Waitstaff & hosts deliver dishes (Ray Serve): Ray Serve hosts the trained model, manages inference requests, scales instances, adapts to traffic — bringing results to the user.
Meal served & feedback loop (Job done → iterate): the order is fulfilled; metrics and feedback flow back into Ray Tune, RLlib, and retraining for the next iteration.

Ray is like a restaurant that can easily ramp up for a busy night: it lets many computers work together on otherwise complicated tasks and scale as demand grows.

Scaling the Kitchen
Easy Scaling: Ray can add more machines to the cluster, just as a restaurant brings in one, two, or three more trained chefs to handle a rush.
Execution Efficiency: Ray spreads work across many workers so that many hands cut down long, complicated processes, like a kitchen where each team member plays a unique, critical role.
Complicated Execution: Ray can coordinate complex workloads with many interdependent parts, like a restaurant that puts out lots of meals with disparate processes, techniques, and timing.

Real-World Examples
Recommendation Systems
Developers use Ray to run thousands of computations that need a quick turnaround, such as a movie recommendation engine that must analyze audience behavior almost instantly to surface the best results.
Scientific Inquiry
Researchers can use Ray throughout a study, from raw data assessment to model training, whether the goal is simple predictive analytics or extensive exploration of existing datasets.
Conclusion
Ray streamlines distributed computing for developers. When a once-strained restaurant manager can easily expand, organize, and optimize service during a rush, it creates room for better work and better ideas. Similarly, Ray makes scaling and building powerful parallel projects simple and worthwhile, turning developer ideas into reality.

Connect with me on LinkedIn
