
Subhasis Das
DAY 13 - End-to-End Architecture Design

Day 13 of Phase 3 (Performance & Production Thinking) in the Databricks 14 Days AI Challenge – 2 (Advanced) focused on designing and documenting the end-to-end architecture of the system developed throughout the challenge.

The first task involved creating an architecture diagram that represents the complete data and machine learning workflow. The architecture illustrates how raw e-commerce event data flows through a layered lakehouse design. Raw CSV data is ingested into the Bronze layer where it is stored as Delta tables. From there, feature engineering transforms event-level data into curated user-level features within the Silver layer. These features are used to construct the training dataset for machine learning models. Logistic Regression and Random Forest models are trained and evaluated, with experiments tracked using MLflow. The trained model is then used within a batch inference pipeline to score users and generate predictions that are stored in the Gold layer. In parallel, a collaborative filtering recommendation system using ALS generates product recommendations based on user interaction data.
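The Bronze-to-Silver step described above can be sketched in plain pandas (a minimal illustration only; the actual challenge pipeline uses PySpark and Delta tables on Databricks, and the column names and aggregations here are assumptions, not the challenge's real schema):

```python
import pandas as pd

# Hypothetical event-level data as it might look in the Bronze layer
events = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 2, 3],
    "event_type": ["view", "purchase", "view", "view", "purchase", "view"],
    "price":      [10.0, 10.0, 5.0, 8.0, 8.0, 3.0],
})

# Silver layer: aggregate event-level rows into curated user-level features
features = events.groupby("user_id").agg(
    total_events=("event_type", "size"),
    purchases=("event_type", lambda s: int((s == "purchase").sum())),
    avg_price=("price", "mean"),
).reset_index()

# Label for the training dataset: did the user ever purchase?
features["purchased"] = (features["purchases"] > 0).astype(int)
```

On Databricks the same aggregation would be a `groupBy(...).agg(...)` over a Delta table, with the result written back as the Silver-layer feature table.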

Visual Concept

The second task required documenting the pipeline flow. This step connected the individual components implemented across earlier phases of the challenge. The pipeline begins with data ingestion and Delta table creation, followed by feature engineering and dataset preparation. Model training and evaluation occur after the training dataset is generated, with experiment tracking handled through MLflow. The inference stage then produces prediction outputs for downstream analysis. Supporting layers such as job orchestration, streaming ingestion capability, performance monitoring, and cost optimization were incorporated to reflect how such a pipeline would operate in a real production environment.
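The documented flow can be made explicit by chaining the stages in order. The sketch below uses hypothetical stage functions (the names and the context dictionary are illustrative, not the challenge's actual code) to show how each stage's output feeds the next:

```python
# Each stage takes the pipeline context (a dict) and returns it updated.
def ingest(ctx):
    # Bronze: raw CSV ingested and stored as a Delta table
    ctx["bronze"] = "events_delta"
    return ctx

def engineer_features(ctx):
    # Silver: event-level data transformed into user-level features
    ctx["silver"] = f"features_from_{ctx['bronze']}"
    return ctx

def train_and_evaluate(ctx):
    # Train Logistic Regression / Random Forest; track runs in MLflow
    ctx["model"] = f"model_on_{ctx['silver']}"
    return ctx

def batch_inference(ctx):
    # Gold: score users and store predictions for downstream analysis
    ctx["gold"] = f"predictions_from_{ctx['model']}"
    return ctx

PIPELINE = [ingest, engineer_features, train_and_evaluate, batch_inference]

def run_pipeline():
    ctx = {}
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx
```

In production, this ordering would be enforced by job orchestration (e.g. Databricks Workflows) rather than a Python loop, with monitoring and cost controls wrapped around each stage.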

The third task focused on defining a retraining strategy. A production-ready system must continuously adapt to evolving data patterns, so retraining can be triggered through scheduled jobs or changes in data distribution. The retraining workflow rebuilds the training dataset from updated Delta tables, retrains the models, evaluates performance metrics, and logs experiments through MLflow. The best-performing model is then deployed back into the inference pipeline.
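The two decision points in that strategy, when to retrain and which model to deploy, can be sketched in plain Python. Both functions are assumptions for illustration: a real system would use richer drift statistics than a mean shift, and model selection would query MLflow's tracking server rather than a list of dicts:

```python
def should_retrain(baseline_mean, current_mean, threshold=0.1):
    """Trigger retraining when a feature's mean drifts by more than
    `threshold` (relative change) from the training-time baseline."""
    if baseline_mean == 0:
        return current_mean != 0
    return abs(current_mean - baseline_mean) / abs(baseline_mean) > threshold

def select_best(runs, metric="f1"):
    """Pick the run with the highest metric, mirroring how the
    best-performing MLflow run is promoted into the inference pipeline."""
    return max(runs, key=lambda r: r["metrics"][metric])
```

Scheduled retraining covers gradual change, while the drift check catches abrupt shifts in data distribution between scheduled runs.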

Visual Concept

During the design and documentation process, ChatGPT assisted with structuring the architecture, organizing the pipeline flow, and refining the retraining strategy within the Databricks environment.

This exercise highlighted how individual data engineering and machine learning components can be integrated into a cohesive and scalable system architecture.

Diagram generated by ChatGPT

