Oluwatobiloba Olatunji

Building a Production-Ready AI Loan Prediction System: A 15-Day Journey

From Zero to Production: How I Built a Full-Stack ML Application with 88.62% Accuracy

A comprehensive technical deep-dive into building, optimizing, and deploying a multi-model machine learning system


THE JOURNEY

In 15 days, I built a production-ready loan prediction system featuring:

  • 4 ML models with 88.62% best accuracy
  • REST API with Flask and PostgreSQL
  • 83% test coverage with pytest
  • Professional Swagger documentation
  • 3-4x performance optimization
  • Live deployment on Render

Live Demo | GitHub | API Docs


Table of Contents

  1. Introduction
  2. The Problem
  3. System Architecture
  4. Week 1: Foundation
  5. Week 2: Production
  6. Week 3: Polish
  7. Technical Deep-Dives
  8. Challenges & Solutions
  9. Performance Optimization
  10. Lessons Learned
  11. What's Next

Introduction

Three weeks ago, I had an idea: build a production-grade machine learning system that demonstrates not just ML skills, but full-stack engineering, DevOps, and software craftsmanship.

The result? A loan prediction API that's currently serving real predictions with 88.62% accuracy, complete with professional documentation, comprehensive testing, and performance optimizations that make it 3-4x faster than the initial version.

Homepage Screenshot
The landing page showing live statistics and model performance

This post is a technical deep-dive into how I built it, the challenges I faced, and the lessons I learned along the way.


The Problem

Challenge: Build a system that predicts loan approval decisions based on applicant information.

Requirements:

  • High accuracy (>85%)
  • Production-ready code
  • Comprehensive testing
  • Professional documentation
  • Performance optimization
  • Real-time predictions
  • Scalable architecture

Constraints:

  • 15-day timeline
  • Solo developer
  • Limited budget (free tier hosting)

System Architecture

The system follows a layered architecture pattern:

┌─────────────────────────────────────────┐
│         Client Layer                    │
│  (Web Browser, API Clients)             │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│      Presentation Layer                 │
│  (Flask Routes, Swagger UI)             │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│      Application Layer                  │
│  (Validation, Caching, Rate Limiting)   │
└──────────────┬──────────────────────────┘
               │
       ┌───────┴──────┐
       │              │
┌──────▼─────┐  ┌─────▼──────┐
│  ML Layer  │  │ Data Layer │
│ (4 Models) │  │(PostgreSQL)│
└────────────┘  └────────────┘

Architecture Diagram
High-level system architecture showing separation of concerns

Tech Stack:

  • Backend: Flask 3.0, Python 3.10
  • ML: scikit-learn, pandas, numpy
  • Database: PostgreSQL (production), SQLite (dev)
  • Testing: pytest, coverage
  • Documentation: Flasgger (Swagger/OpenAPI)
  • Performance: Flask-Caching, Flask-Limiter, Flask-Compress
  • Deployment: Render, Gunicorn
  • Frontend: Vanilla JavaScript, CSS3, HTML5
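
To keep those layers from bleeding into each other, everything is wired together in one place. Here's a minimal sketch of the shape (an app-factory pattern; module names like routes are illustrative, not the exact repo layout):

# Minimal sketch of how the layers are wired -- illustrative, not the literal repo code
from flask import Flask
from flask_caching import Cache
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()
cache = Cache()

def create_app(config_object="config.ProductionConfig"):
    app = Flask(__name__)
    app.config.from_object(config_object)

    # Data layer
    db.init_app(app)

    # Application layer: caching and rate limiting
    cache.init_app(app, config={"CACHE_TYPE": "simple"})
    Limiter(get_remote_address, app=app)

    # Presentation layer: routes live in a blueprint
    from routes import api_bp  # hypothetical module name
    app.register_blueprint(api_bp)

    return app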

Week 1: Foundation (Days 1-6)

Day 1-2: API Setup & Data Exploration

I started with the basics: a Flask API and understanding the data.

# Initial Flask setup
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    # TODO: Add ML model
    return jsonify({"prediction": "pending"})

The dataset had 614 loan applications with 12 features:

  • Applicant demographics (Gender, Married, Dependents, Education)
  • Financial data (ApplicantIncome, CoapplicantIncome, LoanAmount)
  • Loan details (Loan_Amount_Term, Credit_History)
  • Property information (Property_Area)

Key insights from EDA:

  • 69% approval rate in training data
  • Credit_History was the strongest predictor
  • Income and loan amount had right-skewed distributions
  • Missing values in multiple columns (~10-15%)
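
All of these insights came from a few lines of pandas. A sketch of the checks (the CSV path is illustrative; Loan_Status is the target column):

import pandas as pd

df = pd.read_csv("data/loan_train.csv")  # illustrative path

# Approval rate (~69% in the training data)
print(df["Loan_Status"].value_counts(normalize=True))

# Income and loan amount are strongly right-skewed
print(df[["ApplicantIncome", "CoapplicantIncome", "LoanAmount"]].skew())

# Share of missing values per column (~10-15% in several)
print(df.isna().mean().sort_values(ascending=False))

# Approval rate by credit history -- the strongest single predictor
print(df.groupby("Credit_History")["Loan_Status"].value_counts(normalize=True))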

EDA Visualization
Exploratory data analysis showing feature distributions and correlations

Day 3-4: Data Preprocessing & Feature Engineering

Missing values were handled strategically:

# Categorical: mode imputation
df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])
df['Married'] = df['Married'].fillna(df['Married'].mode()[0])

# Numerical: median imputation
df['LoanAmount'] = df['LoanAmount'].fillna(df['LoanAmount'].median())

Feature engineering made a huge difference:

import numpy as np

# Created features that improved accuracy by ~8 percentage points
# (assumes Dependents has already been mapped to a number, e.g. '3+' -> 3)
df['Total_Income'] = df['ApplicantIncome'] + df['CoapplicantIncome']
df['Loan_to_Income_Ratio'] = df['LoanAmount'] / (df['Total_Income'] + 1)
df['Income_per_Dependent'] = df['ApplicantIncome'] / (df['Dependents'] + 1)
df['Log_ApplicantIncome'] = np.log1p(df['ApplicantIncome'])
df['Log_LoanAmount'] = np.log1p(df['LoanAmount'])

Day 5-6: First ML Model

Started with Random Forest:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_split=5,
    random_state=42
)

model.fit(X_train, y_train)

Results:

  • Initial accuracy: 79.83%
  • After feature engineering: 88.62%
  • Precision: 0.89
  • Recall: 0.96
  • F1-Score: 0.92
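
For reference, the metrics above come straight from scikit-learn's evaluation helpers (assuming the target is encoded 0 = Rejected, 1 = Approved):

from sklearn.metrics import accuracy_score, classification_report

y_pred = model.predict(X_test)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(classification_report(y_test, y_pred, target_names=["Rejected", "Approved"]))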

Model Performance
Performance metrics for Random Forest model


Week 2: Production (Days 7-13)

Day 7-8: ML Integration & Validation

Integrated the model with comprehensive input validation:

class LoanApplicationValidator:
    MIN_INCOME = 0
    MAX_INCOME = 100000
    MIN_LOAN_AMOUNT = 0
    MAX_LOAN_AMOUNT = 10000

    def validate_loan_application(self, data):
        errors = []
        warnings = []

        # Required field validation
        if 'ApplicantIncome' not in data:
            errors.append("ApplicantIncome is required")
        elif data['ApplicantIncome'] <= self.MIN_INCOME:
            errors.append("ApplicantIncome must be greater than 0")

        # Range validation
        if data.get('LoanAmount', 0) > self.MAX_LOAN_AMOUNT:
            warnings.append("Loan amount is unusually high")

        return len(errors) == 0, errors, warnings

This prevented bad predictions and improved user experience.
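
Wired into the /predict route, the validator short-circuits bad requests before they ever reach the model. A simplified sketch (run_model_prediction stands in for the actual model call):

validator = LoanApplicationValidator()

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()

    is_valid, errors, warnings = validator.validate_loan_application(data)
    if not is_valid:
        # 400 with structured errors -- the UI renders these directly
        return jsonify({"error": "Validation failed",
                        "validation_errors": errors}), 400

    result = run_model_prediction(data)  # placeholder for the model call
    result["warnings"] = warnings
    return jsonify(result)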

Validation Errors
User-friendly validation error messages in the UI

Day 9-10: Database Integration

Added PostgreSQL for prediction history:

import json
from datetime import datetime

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy(app)

class Prediction(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    timestamp = db.Column(db.DateTime, default=datetime.utcnow)
    prediction = db.Column(db.String(20))
    confidence = db.Column(db.Float)
    input_data = db.Column(db.JSON)
    warnings = db.Column(db.Text)
    ip_address = db.Column(db.String(45))  # 45 chars covers IPv6

    @classmethod
    def from_request(cls, input_data, result, warnings, ip):
        return cls(
            prediction=result['prediction'],
            confidence=result['confidence'],
            input_data=input_data,
            warnings=json.dumps(warnings),
            ip_address=ip
        )

This enabled analytics and monitoring.
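
With every prediction persisted, the statistics endpoint reduces to a few aggregates. Something along these lines (a sketch using SQLAlchemy's func):

from sqlalchemy import func

def get_statistics():
    total = db.session.query(func.count(Prediction.id)).scalar()
    approved = (db.session.query(func.count(Prediction.id))
                .filter(Prediction.prediction == "Approved")
                .scalar())
    avg_conf = db.session.query(func.avg(Prediction.confidence)).scalar()

    return {
        "total_predictions": total,
        "approval_rate": approved / total if total else 0.0,
        "average_confidence": float(avg_conf or 0.0),
    }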

Prediction History
Database-backed prediction history with filtering and pagination

Day 11-12: Testing

Achieved 83% test coverage with pytest:

def test_predict_success(client):
    """Test successful prediction"""
    data = {
        "ApplicantIncome": 5000,
        "LoanAmount": 150,
        "Credit_History": 1
    }

    response = client.post('/predict', json=data)

    assert response.status_code == 200
    assert 'prediction' in response.json
    assert response.json['prediction'] in ['Approved', 'Rejected']
    assert 0 <= response.json['confidence'] <= 1

def test_predict_validation_error(client):
    """Test validation error handling"""
    data = {"ApplicantIncome": -1000}  # Invalid

    response = client.post('/predict', json=data)

    assert response.status_code == 400
    assert 'validation_errors' in response.json
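
The client fixture lives in conftest.py. A minimal version, assuming an app factory and a testing config (names here are illustrative):

# conftest.py -- sketch; assumes a create_app() factory and a testing config
import pytest

from app import create_app, db

@pytest.fixture
def client():
    app = create_app("config.TestingConfig")  # e.g. in-memory SQLite
    with app.app_context():
        db.create_all()
        with app.test_client() as client:
            yield client
        db.drop_all()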

Test Coverage
pytest coverage report showing 83% code coverage

Day 13: Deployment

Deployed to Render with:

# render.yaml
services:
  - type: web
    name: loan-predictor-api
    env: python
    buildCommand: pip install -r requirements.txt
    startCommand: gunicorn app:app
    envVars:
      - key: PYTHON_VERSION
        value: 3.10.0

Deployment challenges:

  • PostgreSQL connection string format differences
  • Environment variable management
  • Cold start optimization
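
Most of these traced back to configuration that differs between local dev and Render, so I centralized it in one env-driven config object. Roughly (the postgres:// scheme fix is covered under Challenges below):

import os

class Config:
    # Render injects DATABASE_URL; local dev falls back to SQLite
    SQLALCHEMY_DATABASE_URI = os.environ.get("DATABASE_URL", "sqlite:///dev.db")
    SECRET_KEY = os.environ.get("SECRET_KEY", "dev-only-secret")
    DEBUG = os.environ.get("FLASK_DEBUG", "0") == "1"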

Week 3: Polish (Days 14-18)

Day 14: Frontend Development

Built a responsive UI with vanilla JavaScript:

async function makePrediction() {
    const data = {
        ApplicantIncome: parseFloat(document.getElementById('income').value),
        LoanAmount: parseFloat(document.getElementById('loan').value),
        Credit_History: parseInt(document.getElementById('credit').value)
    };

    const response = await fetch(`${API_URL}/predict`, {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify(data)
    });

    const result = await response.json();
    displayResult(result);
}

Design principles:

  • Mobile-first responsive design
  • Clear visual hierarchy
  • Immediate feedback
  • Error handling with helpful messages

Responsive Design
Mobile and desktop views showing responsive layout

Day 15: API Documentation

Added Swagger/OpenAPI documentation:

from flasgger import Swagger, swag_from

swagger = Swagger(app, template=swagger_template)

@app.route('/predict', methods=['POST'])
@swag_from('docs/swagger/predict.yml')
def predict():
    """Make loan approval prediction"""
    # Implementation

This made the API self-documenting and easy to test.
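
The swagger_template passed to Swagger() above is just a spec dict. A trimmed-down version might look like this (title and version strings are illustrative):

swagger_template = {
    "swagger": "2.0",
    "info": {
        "title": "Loan Prediction API",
        "description": "ML-powered loan approval predictions",
        "version": "2.0.0",
    },
    "schemes": ["https"],
}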

Swagger Documentation
Interactive API documentation with Swagger UI

Day 16: Performance Optimization

Implemented caching, rate limiting, and compression:

from flask_caching import Cache
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
from flask_compress import Compress

cache = Cache(app, config={'CACHE_TYPE': 'simple'})
limiter = Limiter(get_remote_address, app=app)  # Flask-Limiter 3.x signature
compress = Compress(app)

@app.route('/statistics')
@cache.cached(timeout=60)  # Cache for 1 minute
def statistics():
    return jsonify(get_statistics())

@app.route('/predict', methods=['POST'])
@limiter.limit("100 per hour")  # Rate limit
def predict():
    # Implementation

Performance improvements:

  • Average response time: 500ms → 150ms (3.3x faster)
  • P95 response time: 800ms → 250ms (3.2x faster)
  • Cache hit rate: 85%
  • Response size: 70% smaller with compression
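
These numbers come from timing repeated requests. To reproduce the measurement, a quick script like this does the job (URL and sample size are illustrative):

import statistics
import time

import requests

URL = "http://localhost:5000/statistics"  # point at your own deployment

latencies = []
for _ in range(100):
    start = time.perf_counter()
    requests.get(URL, timeout=10)
    latencies.append((time.perf_counter() - start) * 1000)  # ms

latencies.sort()
print(f"avg: {statistics.mean(latencies):.0f}ms")
print(f"p95: {latencies[int(len(latencies) * 0.95)]:.0f}ms")  # approximate p95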

Day 17: Multi-Model System

Trained and compared 4 different models:

Model                 Accuracy   Precision   Recall   F1-Score   Training Time
Random Forest         88.62%     0.89        0.96     0.92       2.3s
Logistic Regression   84.55%     0.85        0.94     0.89       0.1s
Gradient Boosting     87.80%     0.88        0.95     0.91       5.7s
SVM                   83.74%     0.84        0.93     0.88       1.2s

Random Forest won on accuracy, but Logistic Regression was 23x faster to train.

Model Comparison
Side-by-side comparison of four different ML models
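
The table itself came from a loop that trains and times each candidate on the same split. A sketch (hyperparameters elided; assumes the 0/1 target encoding from earlier):

import time

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.svm import SVC

candidates = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "SVM": SVC(probability=True, random_state=42),  # probability for predict_proba
}

for name, clf in candidates.items():
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    elapsed = time.perf_counter() - start

    y_pred = clf.predict(X_test)
    print(f"{name}: acc={accuracy_score(y_test, y_pred):.4f} "
          f"f1={f1_score(y_test, y_pred):.2f} train={elapsed:.1f}s")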

Added model comparison endpoint:

@app.route('/models/benchmark', methods=['POST'])
def benchmark_models():
    """Test all models with same input"""
    results = {}

    for model_name, model in available_models.items():
        prediction = model.predict(df_processed)[0]
        results[model_name] = {
            'prediction': 'Approved' if prediction == 1 else 'Rejected',
            'confidence': float(max(model.predict_proba(df_processed)[0]))
        }

    return jsonify(results)

Day 18: Demo Video & Documentation

Created comprehensive portfolio assets:

  • 2-minute demo video
  • 15+ screenshots
  • Architecture diagrams
  • Technical documentation

Technical Deep-Dives

1. Feature Engineering Impact

Feature engineering improved accuracy from 79.83% to 88.62% (+8.79 percentage points).

Most impactful features:

  1. Credit_History (original) - 42% importance
  2. Total_Income (engineered) - 18% importance
  3. Loan_to_Income_Ratio (engineered) - 15% importance
  4. Log_ApplicantIncome (engineered) - 12% importance

# Feature importance analysis
importances = model.feature_importances_
feature_importance_df = pd.DataFrame({
    'feature': feature_names,
    'importance': importances
}).sort_values('importance', ascending=False)

print(feature_importance_df.head(10))

Feature Importance
Feature importance chart showing engineered features in top 5

2. Handling Class Imbalance

The dataset had 69% approvals vs 31% rejections.

Solution: Stratified train-test split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=42,
    stratify=y  # Maintains class distribution
)

This ensured both sets had similar approval rates.
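
It's easy to verify: both splits should show roughly the same positive rate (assuming y is encoded 0/1):

print(f"train approval rate: {y_train.mean():.3f}")
print(f"test approval rate:  {y_test.mean():.3f}")
# With stratify=y both print ~0.69; without it they can drift apart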

3. Model Serialization

Used joblib for efficient model saving:

import joblib

# Save
joblib.dump(model, 'models/loan_model_v2.pkl')

# Load
model = joblib.load('models/loan_model_v2.pkl')

joblib handles large numpy arrays much more efficiently than plain pickle, and with compression enabled the saved models are a fraction of the size.
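
Most of the size win comes from compression, which joblib supports directly:

# compress=3 trades a little save/load time for a much smaller file on disk
joblib.dump(model, 'models/loan_model_v2.pkl.gz', compress=3)
model = joblib.load('models/loan_model_v2.pkl.gz')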

4. Database Query Optimization

Optimized statistics endpoint with eager loading:

from sqlalchemy.orm import joinedload

# Before: N+1 queries (slow)
predictions = Prediction.query.all()
for pred in predictions:
    user = User.query.get(pred.user_id)  # Separate query each time

# After: single query with a join (fast)
predictions = Prediction.query.options(
    joinedload(Prediction.user)
).all()

Result: 10x faster for 100+ predictions.

5. Caching Strategy

Different cache times for different data volatility:

# Rarely changes - long cache
@cache.cached(timeout=3600)  # 1 hour
def model_info():
    return model_metadata

# Changes occasionally - medium cache
@cache.cached(timeout=60)  # 1 minute
def statistics():
    return get_statistics()

# Changes frequently - short cache
@cache.cached(timeout=30)  # 30 seconds
def recent_predictions():
    return get_recent_predictions(10)

Challenges & Solutions

Challenge 1: PostgreSQL Connection Issues

Problem: Render uses postgres:// but SQLAlchemy 1.4+ requires postgresql://

Solution:

import os

database_url = os.environ.get('DATABASE_URL')
if database_url and database_url.startswith('postgres://'):
    database_url = database_url.replace('postgres://', 'postgresql+psycopg2://', 1)

Challenge 2: Cold Start Performance

Problem: First request after inactivity took 5+ seconds

Solution:

  • Implemented health check endpoint
  • Added model preloading on startup
  • Used Render's always-on plan

# Load model on startup, not on first request
with app.app_context():
    load_model()

Challenge 3: Missing Value Handling

Problem: Real-world data had missing values not in training

Solution: Defensive preprocessing with defaults

def preprocess_input(data):
    # Provide sensible defaults
    if 'LoanAmount' not in data or pd.isna(data['LoanAmount']):
        data['LoanAmount'] = 128.0  # Median from training

    if 'Credit_History' not in data:
        data['Credit_History'] = 1.0  # Most common value

    return data

Challenge 4: Test Coverage Gaps

Problem: Initial coverage was only 45%

Solution: Systematic test writing

import pytest

# Test matrix approach
test_cases = [
    # (input, expected_status, expected_keys)
    ({'ApplicantIncome': 5000}, 200, ['prediction', 'confidence']),
    ({'ApplicantIncome': -100}, 400, ['error', 'validation_errors']),
    ({}, 400, ['error']),
]

@pytest.mark.parametrize("input_data,status,keys", test_cases)
def test_predict(client, input_data, status, keys):
    response = client.post('/predict', json=input_data)
    assert response.status_code == status
    for key in keys:
        assert key in response.json

Result: 45% → 83% coverage


Performance Optimization

Before vs After

Metric              Before     After      Improvement
Avg Response Time   500ms      150ms      3.3x faster
P95 Response Time   800ms      250ms      3.2x faster
Throughput          10 req/s   35 req/s   3.5x higher
Response Size       2.5 KB     0.7 KB     72% smaller
Cache Hit Rate      0%         85%        ∞ improvement

Optimization Techniques

1. Response Caching

@cache.cached(timeout=60, query_string=True)
def history():
    # Expensive database query
    return get_recent_predictions()

2. Database Indexing

class Prediction(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    timestamp = db.Column(db.DateTime, index=True)  # Index for sorting
    prediction = db.Column(db.String(20), index=True)  # Index for filtering

3. Compression

compress = Compress(app)  # Automatic gzip compression

4. Connection Pooling

app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {
    'pool_size': 10,
    'pool_recycle': 3600,
    'pool_pre_ping': True
}

Performance Metrics
Response time comparison before and after optimization


Lessons Learned

1. Feature Engineering > Algorithm Selection

I spent 2 days trying different algorithms (Random Forest, XGBoost, Neural Networks) and got marginal improvements (1-2%).

Then I spent 4 hours on feature engineering and got +8% accuracy.

Lesson: Understand your data first. Good features beat fancy algorithms.

2. Testing Saves Time

Initially, I skipped tests to "move faster." Then I spent 3 days debugging production issues.

After adding comprehensive tests, I could refactor confidently and caught bugs before deployment.

Lesson: Tests are not overhead. They're insurance.

3. Documentation is for Future You

I didn't document my preprocessing steps initially. Two weeks later, I couldn't remember why I used median vs mean imputation.

Good documentation saved me hours of re-research.

Lesson: Document decisions, not just code.

4. Performance Matters

My initial API took 500ms per request. Users noticed. After optimization (150ms), the experience felt instant.

Lesson: Performance is a feature.

5. Production ≠ Development

What worked locally didn't always work in production:

  • Database connection strings were different
  • Environment variables needed careful management
  • Cold starts were a real issue

Lesson: Test in production-like environments early.


What's Next

Immediate Improvements

  • [ ] Add user authentication
  • [ ] Implement A/B testing framework
  • [ ] Add model monitoring and drift detection
  • [ ] Create admin dashboard
  • [ ] Add email notifications

Future Features

  • [ ] Explainable AI (SHAP values)
  • [ ] Real-time model retraining
  • [ ] Multi-language support
  • [ ] Mobile app (React Native)
  • [ ] Batch prediction API

Technical Debt

  • [ ] Migrate to Redis for distributed caching
  • [ ] Add Celery for background tasks
  • [ ] Implement proper logging (ELK stack)
  • [ ] Add monitoring (Prometheus + Grafana)
  • [ ] Set up CI/CD pipeline

Conclusion

Building this system taught me that production ML is 20% modeling and 80% engineering.

The model was working on Day 5. Making it production-ready took 13 more days.

But that's what separates a Jupyter notebook from a real product:

  • Comprehensive testing
  • Professional documentation
  • Performance optimization
  • Error handling
  • Monitoring and logging
  • Security considerations
  • User experience

If you're building an ML system, focus on these fundamentals. The fancy algorithms can wait.


Resources

Connect with me:


Appendix: Complete Tech Stack

Backend

  • Python 3.10
  • Flask 3.0
  • Gunicorn 21.2
  • scikit-learn 1.3
  • pandas 2.0
  • numpy 1.24

Database

  • PostgreSQL 14
  • SQLAlchemy 2.0
  • psycopg2-binary 2.9

Testing

  • pytest 7.4
  • pytest-cov 4.1
  • coverage 7.3

Performance

  • Flask-Caching 2.1
  • Flask-Limiter 3.5
  • Flask-Compress 1.14

Documentation

  • Flasgger 0.9.7
  • Swagger UI

Frontend

  • HTML5
  • CSS3 (Grid, Flexbox)
  • JavaScript (ES6+)
  • Fetch API

DevOps

  • Git/GitHub
  • Render
  • Environment variables
  • Logging

Thanks for reading! If you found this helpful, please share it and star the repo on GitHub.

Questions? Comments? Reach out on LinkedIn or open an issue on GitHub.

