Akan

Building Production AI: A Three-Part MLOps Journey - Pt.3

Part 3: Deployment & Monitoring
"Production Deployment: UI, CI/CD, and Observability"
The Gist: We’ve built the engine and tested it on the track. Now, it’s time to open the showroom. In this final part, we aren’t just 'running code'—we’re launching a product. We’ll build a slick user interface, set up an automated 'safety net' (CI/CD) so we don't accidentally ship bugs, and install 'CCTV' (Monitoring) to make sure the AI stays healthy once it's out in the wild.

1. The Front Door: Gradio Application

Nobody wants to generate art by typing code into a terminal. We use Gradio to build a professional 'Front Door.' It’s a simple Python-based UI that lets users type a prompt and get an Adire masterpiece in seconds.

But here’s the pro secret: this app isn't just for the user. It’s wired into MLflow. Every time a user generates an image, the app 'telemeters' the performance data back to us. If a specific prompt is causing errors or taking 60 seconds to load, we’ll know immediately.

# app/gradio_app.py
import gradio as gr
from diffusers import StableDiffusionPipeline
import torch
import mlflow
from datetime import datetime

class InferenceApp:
    def __init__(self, model_path: str):
        # We wake up the brain and load our Adire LoRA
        self.pipe = self._load_model(model_path)
        mlflow.set_tracking_uri("../mlruns")
        mlflow.set_experiment("production_inference")

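    def _load_model(self, model_path: str):
        # float16 is only reliable on GPU, so fall back to float32 on CPU
        device = "cuda" if torch.cuda.is_available() else "cpu"
        dtype = torch.float16 if device == "cuda" else torch.float32
        pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5",
            torch_dtype=dtype
        )
        # Attach the fine-tuned Adire LoRA weights to the UNet
        pipe.unet.load_attn_procs(model_path)
        return pipe.to(device)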

    def generate(self, prompt: str, steps: int, guidance: float):
        """Generate image with full observability"""
        start = datetime.now()

        with mlflow.start_run(run_name="inference"):
            # We track exactly what the user asked for
            mlflow.log_params({"prompt": prompt, "steps": steps, "guidance": guidance})

            try:
                image = self.pipe(prompt, num_inference_steps=steps, guidance_scale=guidance).images[0]
                duration = (datetime.now() - start).total_seconds()

                # We log the 'health' of this specific request
                mlflow.log_metrics({"generation_time": duration, "prompt_length": len(prompt.split())})
                return image, f"✓ Generated in {duration:.2f}s"
            except Exception as e:
                mlflow.log_param("error", str(e))
                return None, f"✗ Error: {str(e)}"

    def launch(self):
        # The 'Layout' of our showroom
        with gr.Blocks() as demo:
            gr.Markdown("# Nigerian Adire Style Generator")
            with gr.Row():
                with gr.Column():
                    prompt = gr.Textbox(label="Prompt", placeholder="a nigerian_adire_style...")
                    steps = gr.Slider(20, 100, value=50, label="Steps")
                    guidance = gr.Slider(1, 15, value=7.5, label="Guidance")
                    btn = gr.Button("Generate", variant="primary")
                with gr.Column():
                    output = gr.Image(label="Generated Image")
                    status = gr.Textbox(label="Status")

            btn.click(fn=self.generate, inputs=[prompt, steps, guidance], outputs=[output, status])
        demo.launch(share=True)

2. Shipping the Goods: Hugging Face Deployment

Hugging Face is the 'App Store' for AI. Instead of just sending someone a file, we deploy our model weights there. Our deployment script doesn't just upload the model; it also creates a Model Card. Think of this as the 'Instruction Manual' and 'Nutrition Label' for your AI: it tells people what it is, how to use it, and what its quality scores were during training.
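
The upload script itself isn't reproduced in this post, so here is a minimal sketch of what such a step might look like, assuming the `huggingface_hub` client library and an already configured access token. The helper names (`build_model_card`, `push_to_hub`), the metric name, and the card layout are illustrative assumptions, not the project's actual script.

```python
from pathlib import Path


def build_model_card(repo_id: str, base_model: str, metrics: dict) -> str:
    """Render a minimal README.md (the model card) as a string."""
    # The YAML front matter is what the Hub reads for tags and the base model
    lines = [
        "---",
        f"base_model: {base_model}",
        "tags:",
        "- stable-diffusion",
        "- lora",
        "---",
        f"# {repo_id}",
        "",
        "## Usage",
        "Load the base model, then apply these LoRA weights.",
        "",
        "## Evaluation",
    ]
    # One bullet per metric recorded during evaluation
    lines += [f"- {name}: {value:.3f}" for name, value in metrics.items()]
    return "\n".join(lines) + "\n"


def push_to_hub(model_dir: str, repo_id: str, card: str) -> None:
    """Write the card next to the weights and upload the whole folder."""
    # Imported lazily so the card builder works without the package installed
    from huggingface_hub import HfApi

    Path(model_dir, "README.md").write_text(card, encoding="utf-8")
    api = HfApi()
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    api.upload_folder(folder_path=model_dir, repo_id=repo_id, repo_type="model")


card = build_model_card(
    "AfroLogicInsect/sd-lora-nigerian-adire",
    "runwayml/stable-diffusion-v1-5",
    {"quality_score": 0.5},  # placeholder value, not a real result
)
```

Keeping the card builder separate from the upload call means the card text can be checked in CI without touching the network.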

3. The Safety Net: CI/CD Pipeline

In professional software, we don't just 'upload and pray.' We use a GitHub Action (CI/CD). Every time we update the code, this automated 'Robot' wakes up and:

  1. Tests: Does the code even run?
  2. Evaluates: Does the new model version still meet our 0.75 quality score?
  3. Deploys: Only if everything is perfect does it push the update to the live app.

It's how you sleep soundly at night knowing a small typo won't crash your production service.
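
The 'Evaluates' step can be sketched as a small gate script that the workflow runs before deploying. This is only an illustration: the report file and its `quality_score` key are hypothetical, and the real pipeline may compute or store the score differently. The one idea that matters is the exit code, since a non-zero exit fails the CI job and the deploy step never runs.

```python
import json
import sys
from pathlib import Path

QUALITY_THRESHOLD = 0.75  # the minimum quality score quoted above


def passes_gate(score: float, threshold: float = QUALITY_THRESHOLD) -> bool:
    """Return True when the evaluated model is good enough to deploy."""
    return score >= threshold


def main(report_path: str) -> int:
    """Read the evaluation report and return a process exit code."""
    # Hypothetical report format: {"quality_score": <float>}
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    score = float(report["quality_score"])
    if passes_gate(score):
        print(f"Gate passed: {score:.3f} >= {QUALITY_THRESHOLD}")
        return 0
    print(f"Gate failed: {score:.3f} < {QUALITY_THRESHOLD}")
    return 1  # non-zero exit fails the CI job, so the deploy step is skipped


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

In a workflow, this would run as its own step between the tests and the deployment, with the report path passed as the first argument.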

4. The CCTV: Production Monitoring

Once your model is live, it can 'drift.' Maybe users start using slang the model doesn't understand, or maybe the GPU starts slowing down. We build a Monitoring Dashboard that acts like a heart rate monitor for our AI. If the average generation time spikes above 30 seconds, the system sends us an Alert.

# monitoring/dashboard.py
from datetime import datetime, timedelta

import pandas as pd

class MonitoringDashboard:
    def check_degradation(self, df: pd.DataFrame) -> bool:
        """Alert if performance degrades"""
        # Only look at requests from the last 24 hours
        recent = df[df["timestamp"] > datetime.now() - timedelta(hours=24)]
        if recent.empty:
            return False  # no recent traffic, so nothing to judge
        avg_time = recent["generation_time"].mean()

        # If the model gets 'tired' (slow), we trigger an alarm
        if avg_time > 30:
            print(f"⚠️ ALERT: Avg generation time {avg_time:.2f}s > 30s SLA")
            return True
        return False

5. The Turbo Boost: Performance Optimization

Finally, we want our AI to be fast. In 2026, we have a few 'Cheat Codes' to speed up Stable Diffusion. By using torch.compile and xFormers, we can often get close to double the generation speed, though the exact gain depends on your GPU and image size. We calculate our 'Speedup Ratio' as baseline time divided by optimized time: $S = T_{\text{baseline}} / T_{\text{optimized}}$. If S = 2.0, your users are getting their art twice as fast, and you're paying roughly half the price for GPU time per image. It's a win-win.

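import torch

def optimize_pipeline(pipe):
    # Reduces VRAM (at some cost in speed) so we can run on cheaper hardware
    pipe.enable_attention_slicing()
    # Memory-efficient attention from the xformers package;
    # the size of the speed boost depends on your GPU, so measure it
    pipe.enable_xformers_memory_efficient_attention()
    # torch.compile (PyTorch 2.0+) 'compiles' the math into a faster format;
    # do this last so the compiled UNet uses the attention settings above
    pipe.unet = torch.compile(pipe.unet)
    return pipe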
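
To put a number on the 'Speedup Ratio', here is a minimal timing sketch using only the standard library. It takes the ratio as baseline time divided by optimized time, so a value above 1 means the optimized pipeline is faster. The helper names `time_call` and `speedup_ratio` are made up for this example.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the average wall-clock seconds per call over `repeats` runs."""
    fn()  # warm-up run, not timed (torch.compile compiles on the first call)
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def speedup_ratio(baseline_seconds: float, optimized_seconds: float) -> float:
    """S = baseline time / optimized time; S > 1 means the optimized run is faster."""
    if optimized_seconds <= 0:
        raise ValueError("optimized_seconds must be positive")
    return baseline_seconds / optimized_seconds
```

In practice you would time something like `lambda: pipe(prompt, num_inference_steps=50)` once on the original pipeline and once on the optimized one, using the same prompt and settings, then pass both averages to `speedup_ratio`.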

We've finally completed the series; we’ve gone from a few images of Adire fabric to a production-ready AI system. It’s fast, it’s monitored, it’s automated, and most importantly, it’s ready for real users. You’re no longer just playing with AI; you’re an AI Engineer.

All the resources used here are available for free:
Hugging Face Repository: https://huggingface.co/AfroLogicInsect/sd-lora-nigerian-adire
GitHub Repository: https://github.com/AkanimohOD19A/adire_mlops_poc
Gradio UI: https://huggingface.co/AfroLogicInsect/sd-adire-demo
