Batch serving is useful when we need to carry out predictions asynchronously, unlike a stateless serving function, which processes one instance (or at most a few thousand instances) embedded in a single request. Typical use cases include:
- Determining whether to reorder a stock-keeping unit, which needs to be carried out on an hourly basis
- Creating personalized song playlists
- Recommendation engines with periodic refresh rates: say the refresh rate is hourly; then we run inference only for those users who visited the website in the last hour
To achieve asynchronous predictions, batch serving makes use of distributed data processing infrastructures such as BigQuery, Apache Beam, etc.
Consider the example below, where we run inference on approximately 1.5 million rows of data using BigQuery:
```sql
WITH all_complaints AS (
  SELECT *
  FROM ML.PREDICT(MODEL external_model,
    (SELECT consumer_complaint_narrative AS reviews
     FROM `bigquery-public-data`.cfpb_complaints.complaint_database
     WHERE consumer_complaint_narrative IS NOT NULL))
)
SELECT *
FROM all_complaints
ORDER BY positive_review_probability DESC
LIMIT 5
```
Here, the following operations take place in order:
- The consumer_complaint_narrative column is read from the dataset wherever it is not NULL. Let's assume this is a total of X values, which are then distributed across N shards.
- N workers process the N shards in parallel, each reading its shard's data and running inference using the model files.
- Each of the N workers finds the 5 most positive complaints in its shard.
- The resulting (5 * N) complaints are sorted, and the top 5 are selected as the final result.
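The scatter-gather pattern above can be sketched in plain Python. This is a toy simulation, not BigQuery's actual implementation: the `score` function stands in for the real model, the "shards" are just list slices, and the thread pool stands in for the N distributed workers.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def score(review: str) -> float:
    # Stand-in for the real model: any callable returning a
    # probability-like number would do here.
    positives = ("good", "great", "love")
    return sum(review.count(w) for w in positives) / (len(review.split()) or 1)

def top5_of_shard(shard):
    # Each worker scores its own shard and keeps only its local top 5.
    return heapq.nlargest(5, ((score(r), r) for r in shard))

def batch_top5(reviews, n_shards=4):
    # Distribute the X rows across N shards.
    shards = [reviews[i::n_shards] for i in range(n_shards)]
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        local_tops = pool.map(top5_of_shard, shards)
    # Merge the (5 * N) local winners and take the global top 5.
    merged = [item for top in local_tops for item in top]
    return heapq.nlargest(5, merged)
```

The key property is that the global top 5 must appear among the per-shard top 5s, so merging only (5 * N) candidates gives the same answer as sorting all X rows, while each worker touches only its own shard.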