Oluwatobiloba Olatunji

Building a Production-Ready AI Loan Prediction System: A 15-Day Journey

From Zero to Production: How I Built a Full-Stack ML Application with 88.62% Accuracy

A comprehensive technical deep-dive into building, optimizing, and deploying a multi-model machine learning system


THE JOURNEY

In 15 days, I built a production-ready loan prediction system featuring:

  • 4 ML models with 88.62% best accuracy
  • REST API with Flask and PostgreSQL
  • 83% test coverage with pytest
  • Professional Swagger documentation
  • 3-4x performance optimization
  • Live deployment on Render

Live Demo | GitHub | API Docs


Table of Contents

  1. Introduction
  2. The Problem
  3. System Architecture
  4. Week 1: Foundation
  5. Week 2: Production
  6. Week 3: Polish
  7. Technical Deep-Dives
  8. Challenges & Solutions
  9. Performance Optimization
  10. Lessons Learned
  11. What's Next

Introduction

Three weeks ago, I had an idea: build a production-grade machine learning system that demonstrates not just ML skills, but full-stack engineering, DevOps, and software craftsmanship.

The result? A loan prediction API that's currently serving real predictions with 88.62% accuracy, complete with professional documentation, comprehensive testing, and performance optimizations that make it 3-4x faster than the initial version.

Homepage Screenshot
The landing page showing live statistics and model performance

This post is a technical deep-dive into how I built it, the challenges I faced, and the lessons I learned along the way.


The Problem

Challenge: Build a system that predicts loan approval decisions based on applicant information.

Requirements:

  • High accuracy (>85%)
  • Production-ready code
  • Comprehensive testing
  • Professional documentation
  • Performance optimization
  • Real-time predictions
  • Scalable architecture

Constraints:

  • 15-day timeline
  • Solo developer
  • Limited budget (free tier hosting)

System Architecture

The system follows a layered architecture pattern:

┌─────────────────────────────────────────┐
│         Client Layer                    │
│  (Web Browser, API Clients)             │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│      Presentation Layer                 │
│  (Flask Routes, Swagger UI)             │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│      Application Layer                  │
│  (Validation, Caching, Rate Limiting)   │
└──────────────┬──────────────────────────┘
               │
       ┌───────┴──────┐
       │              │
┌──────▼─────┐  ┌─────▼──────┐
│  ML Layer  │  │ Data Layer │
│ (4 Models) │  │(PostgreSQL)│
└────────────┘  └────────────┘

Architecture Diagram
High-level system architecture showing separation of concerns

Tech Stack:

  • Backend: Flask 3.0, Python 3.10
  • ML: scikit-learn, pandas, numpy
  • Database: PostgreSQL (production), SQLite (dev)
  • Testing: pytest, coverage
  • Documentation: Flasgger (Swagger/OpenAPI)
  • Performance: Flask-Caching, Flask-Limiter, Flask-Compress
  • Deployment: Render, Gunicorn
  • Frontend: Vanilla JavaScript, CSS3, HTML5
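
To keep those layers from bleeding into each other, everything is wired together in one place. Here's a minimal sketch of the shape (an app-factory pattern; module names like routes are illustrative, not the exact repo layout):

# Minimal sketch of how the layers are wired -- illustrative, not the literal repo code
from flask import Flask
from flask_caching import Cache
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()
cache = Cache()

def create_app(config_object="config.ProductionConfig"):
    app = Flask(__name__)
    app.config.from_object(config_object)

    # Data layer
    db.init_app(app)

    # Application layer: caching and rate limiting
    cache.init_app(app, config={"CACHE_TYPE": "simple"})
    Limiter(get_remote_address, app=app)

    # Presentation layer: routes live in a blueprint
    from routes import api_bp  # hypothetical module name
    app.register_blueprint(api_bp)

    return app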

Week 1: Foundation (Days 1-6)

Day 1-2: API Setup & Data Exploration

I started with the basics: a Flask API and understanding the data.

# Initial Flask setup
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    # TODO: Add ML model
    return jsonify({"prediction": "pending"})

The dataset had 614 loan applications with 12 features:

  • Applicant demographics (Gender, Married, Dependents, Education)
  • Financial data (ApplicantIncome, CoapplicantIncome, LoanAmount)
  • Loan details (Loan_Amount_Term, Credit_History)
  • Property information (Property_Area)

Key insights from EDA:

  • 69% approval rate in training data
  • Credit_History was the strongest predictor
  • Income and loan amount had right-skewed distributions
  • Missing values in multiple columns (~10-15%)
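
All of these insights came from a few lines of pandas. A sketch of the checks (the CSV path is illustrative; Loan_Status is the target column):

import pandas as pd

df = pd.read_csv("data/loan_train.csv")  # illustrative path

# Approval rate (~69% in the training data)
print(df["Loan_Status"].value_counts(normalize=True))

# Income and loan amount are strongly right-skewed
print(df[["ApplicantIncome", "CoapplicantIncome", "LoanAmount"]].skew())

# Share of missing values per column (~10-15% in several)
print(df.isna().mean().sort_values(ascending=False))

# Approval rate by credit history -- the strongest single predictor
print(df.groupby("Credit_History")["Loan_Status"].value_counts(normalize=True))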

EDA Visualization
Exploratory data analysis showing feature distributions and correlations

Day 3-4: Data Preprocessing & Feature Engineering

Missing values were handled strategically:

# Categorical: mode imputation
df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])
df['Married'] = df['Married'].fillna(df['Married'].mode()[0])

# Numerical: median imputation
df['LoanAmount'] = df['LoanAmount'].fillna(df['LoanAmount'].median())

Feature engineering made a huge difference:

import numpy as np

# Created features that improved accuracy by ~8 percentage points
# (assumes Dependents has already been mapped to a number, e.g. '3+' -> 3)
df['Total_Income'] = df['ApplicantIncome'] + df['CoapplicantIncome']
df['Loan_to_Income_Ratio'] = df['LoanAmount'] / (df['Total_Income'] + 1)
df['Income_per_Dependent'] = df['ApplicantIncome'] / (df['Dependents'] + 1)
df['Log_ApplicantIncome'] = np.log1p(df['ApplicantIncome'])
df['Log_LoanAmount'] = np.log1p(df['LoanAmount'])

Day 5-6: First ML Model

Started with Random Forest:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_split=5,
    random_state=42
)

model.fit(X_train, y_train)

Results:

  • Initial accuracy: 79.83%
  • After feature engineering: 88.62%
  • Precision: 0.89
  • Recall: 0.96
  • F1-Score: 0.92
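
For reference, the metrics above come straight from scikit-learn's evaluation helpers (assuming the target is encoded 0 = Rejected, 1 = Approved):

from sklearn.metrics import accuracy_score, classification_report

y_pred = model.predict(X_test)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(classification_report(y_test, y_pred, target_names=["Rejected", "Approved"]))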

Model Performance
Performance metrics for Random Forest model


Week 2: Production (Days 7-13)

Day 7-8: ML Integration & Validation

Integrated the model with comprehensive input validation:

class LoanApplicationValidator:
    MIN_INCOME = 0
    MAX_INCOME = 100000
    MIN_LOAN_AMOUNT = 0
    MAX_LOAN_AMOUNT = 10000

    def validate_loan_application(self, data):
        errors = []
        warnings = []

        # Required field validation
        if 'ApplicantIncome' not in data:
            errors.append("ApplicantIncome is required")
        elif data['ApplicantIncome'] <= self.MIN_INCOME:
            errors.append("ApplicantIncome must be greater than 0")

        # Range validation
        if data.get('LoanAmount', 0) > self.MAX_LOAN_AMOUNT:
            warnings.append("Loan amount is unusually high")

        return len(errors) == 0, errors, warnings

This prevented bad predictions and improved user experience.
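
Wired into the /predict route, the validator short-circuits bad requests before they ever reach the model. A simplified sketch (run_model_prediction stands in for the actual model call):

validator = LoanApplicationValidator()

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()

    is_valid, errors, warnings = validator.validate_loan_application(data)
    if not is_valid:
        # 400 with structured errors -- the UI renders these directly
        return jsonify({"error": "Validation failed",
                        "validation_errors": errors}), 400

    result = run_model_prediction(data)  # placeholder for the model call
    result["warnings"] = warnings
    return jsonify(result)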

Validation Errors
User-friendly validation error messages in the UI

Day 9-10: Database Integration

Added PostgreSQL for prediction history:

import json
from datetime import datetime

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy(app)

class Prediction(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    timestamp = db.Column(db.DateTime, default=datetime.utcnow)
    prediction = db.Column(db.String(20))
    confidence = db.Column(db.Float)
    input_data = db.Column(db.JSON)
    warnings = db.Column(db.Text)
    ip_address = db.Column(db.String(45))  # 45 chars covers IPv6

    @classmethod
    def from_request(cls, input_data, result, warnings, ip):
        return cls(
            prediction=result['prediction'],
            confidence=result['confidence'],
            input_data=input_data,
            warnings=json.dumps(warnings),
            ip_address=ip
        )

This enabled analytics and monitoring.
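
With every prediction persisted, the statistics endpoint reduces to a few aggregates. Something along these lines (a sketch using SQLAlchemy's func):

from sqlalchemy import func

def get_statistics():
    total = db.session.query(func.count(Prediction.id)).scalar()
    approved = (db.session.query(func.count(Prediction.id))
                .filter(Prediction.prediction == "Approved")
                .scalar())
    avg_conf = db.session.query(func.avg(Prediction.confidence)).scalar()

    return {
        "total_predictions": total,
        "approval_rate": approved / total if total else 0.0,
        "average_confidence": float(avg_conf or 0.0),
    }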

Prediction History
Database-backed prediction history with filtering and pagination

Day 11-12: Testing

Achieved 83% test coverage with pytest:

def test_predict_success(client):
    """Test successful prediction"""
    data = {
        "ApplicantIncome": 5000,
        "LoanAmount": 150,
        "Credit_History": 1
    }

    response = client.post('/predict', json=data)

    assert response.status_code == 200
    assert 'prediction' in response.json
    assert response.json['prediction'] in ['Approved', 'Rejected']
    assert 0 <= response.json['confidence'] <= 1

def test_predict_validation_error(client):
    """Test validation error handling"""
    data = {"ApplicantIncome": -1000}  # Invalid

    response = client.post('/predict', json=data)

    assert response.status_code == 400
    assert 'validation_errors' in response.json
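
The client fixture lives in conftest.py. A minimal version, assuming an app factory and a testing config (names here are illustrative):

# conftest.py -- sketch; assumes a create_app() factory and a testing config
import pytest

from app import create_app, db

@pytest.fixture
def client():
    app = create_app("config.TestingConfig")  # e.g. in-memory SQLite
    with app.app_context():
        db.create_all()
        with app.test_client() as client:
            yield client
        db.drop_all()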

Test Coverage
pytest coverage report showing 83% code coverage

Day 13: Deployment

Deployed to Render with:

# render.yaml
services:
  - type: web
    name: loan-predictor-api
    env: python
    buildCommand: pip install -r requirements.txt
    startCommand: gunicorn app:app
    envVars:
      - key: PYTHON_VERSION
        value: 3.10.0

Deployment challenges:

  • PostgreSQL connection string format differences
  • Environment variable management
  • Cold start optimization
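
Most of these traced back to configuration that differs between local dev and Render, so I centralized it in one env-driven config object. Roughly (the postgres:// scheme fix is covered under Challenges below):

import os

class Config:
    # Render injects DATABASE_URL; local dev falls back to SQLite
    SQLALCHEMY_DATABASE_URI = os.environ.get("DATABASE_URL", "sqlite:///dev.db")
    SECRET_KEY = os.environ.get("SECRET_KEY", "dev-only-secret")
    DEBUG = os.environ.get("FLASK_DEBUG", "0") == "1"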

Week 3: Polish (Days 14-18)

Day 14: Frontend Development

Built a responsive UI with vanilla JavaScript:

async function makePrediction() {
    const data = {
        ApplicantIncome: parseFloat(document.getElementById('income').value),
        LoanAmount: parseFloat(document.getElementById('loan').value),
        Credit_History: parseInt(document.getElementById('credit').value)
    };

    const response = await fetch(`${API_URL}/predict`, {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify(data)
    });

    const result = await response.json();
    displayResult(result);
}

Design principles:

  • Mobile-first responsive design
  • Clear visual hierarchy
  • Immediate feedback
  • Error handling with helpful messages

Responsive Design
Mobile and desktop views showing responsive layout

Day 15: API Documentation

Added Swagger/OpenAPI documentation:

from flasgger import Swagger, swag_from

swagger = Swagger(app, template=swagger_template)

@app.route('/predict', methods=['POST'])
@swag_from('docs/swagger/predict.yml')
def predict():
    """Make loan approval prediction"""
    # Implementation

This made the API self-documenting and easy to test.
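
The swagger_template passed to Swagger() above is just a spec dict. A trimmed-down version might look like this (title and version strings are illustrative):

swagger_template = {
    "swagger": "2.0",
    "info": {
        "title": "Loan Prediction API",
        "description": "ML-powered loan approval predictions",
        "version": "2.0.0",
    },
    "schemes": ["https"],
}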

Swagger Documentation
Interactive API documentation with Swagger UI

Day 16: Performance Optimization

Implemented caching, rate limiting, and compression:

from flask_caching import Cache
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
from flask_compress import Compress

cache = Cache(app, config={'CACHE_TYPE': 'simple'})
limiter = Limiter(get_remote_address, app=app)  # Flask-Limiter 3.x signature
compress = Compress(app)

@app.route('/statistics')
@cache.cached(timeout=60)  # Cache for 1 minute
def statistics():
    return jsonify(get_statistics())

@app.route('/predict', methods=['POST'])
@limiter.limit("100 per hour")  # Rate limit
def predict():
    # Implementation

Performance improvements:

  • Average response time: 500ms → 150ms (3.3x faster)
  • P95 response time: 800ms → 250ms (3.2x faster)
  • Cache hit rate: 85%
  • Response size: 70% smaller with compression
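
These numbers come from timing repeated requests. To reproduce the measurement, a quick script like this does the job (URL and sample size are illustrative):

import statistics
import time

import requests

URL = "http://localhost:5000/statistics"  # point at your own deployment

latencies = []
for _ in range(100):
    start = time.perf_counter()
    requests.get(URL, timeout=10)
    latencies.append((time.perf_counter() - start) * 1000)  # ms

latencies.sort()
print(f"avg: {statistics.mean(latencies):.0f}ms")
print(f"p95: {latencies[int(len(latencies) * 0.95)]:.0f}ms")  # approximate p95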

Day 17: Multi-Model System

Trained and compared 4 different models:

Model                 Accuracy   Precision   Recall   F1-Score   Training Time
Random Forest         88.62%     0.89        0.96     0.92       2.3s
Logistic Regression   84.55%     0.85        0.94     0.89       0.1s
Gradient Boosting     87.80%     0.88        0.95     0.91       5.7s
SVM                   83.74%     0.84        0.93     0.88       1.2s

Random Forest won on accuracy, but Logistic Regression was 23x faster to train.

Model Comparison
Side-by-side comparison of four different ML models
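
The table itself came from a loop that trains and times each candidate on the same split. A sketch (hyperparameters elided; assumes the 0/1 target encoding from earlier):

import time

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.svm import SVC

candidates = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "SVM": SVC(probability=True, random_state=42),  # probability for predict_proba
}

for name, clf in candidates.items():
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    elapsed = time.perf_counter() - start

    y_pred = clf.predict(X_test)
    print(f"{name}: acc={accuracy_score(y_test, y_pred):.4f} "
          f"f1={f1_score(y_test, y_pred):.2f} train={elapsed:.1f}s")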

Added model comparison endpoint:

@app.route('/models/benchmark', methods=['POST'])
def benchmark_models():
    """Test all models with same input"""
    results = {}

    for model_name, model in available_models.items():
        prediction = model.predict(df_processed)[0]
        results[model_name] = {
            'prediction': 'Approved' if prediction == 1 else 'Rejected',
            'confidence': float(max(model.predict_proba(df_processed)[0]))
        }

    return jsonify(results)

Day 18: Demo Video & Documentation

Created comprehensive portfolio assets:

  • 2-minute demo video
  • 15+ screenshots
  • Architecture diagrams
  • Technical documentation

Technical Deep-Dives

1. Feature Engineering Impact

Feature engineering improved accuracy from 79.83% to 88.62% (+8.79 percentage points).

Most impactful features:

  1. Credit_History (original) - 42% importance
  2. Total_Income (engineered) - 18% importance
  3. Loan_to_Income_Ratio (engineered) - 15% importance
  4. Log_ApplicantIncome (engineered) - 12% importance

# Feature importance analysis
importances = model.feature_importances_
feature_importance_df = pd.DataFrame({
    'feature': feature_names,
    'importance': importances
}).sort_values('importance', ascending=False)

print(feature_importance_df.head(10))

Feature Importance
Feature importance chart showing engineered features in top 5

2. Handling Class Imbalance

The dataset had 69% approvals vs 31% rejections.

Solution: Stratified train-test split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=42,
    stratify=y  # Maintains class distribution
)

This ensured both sets had similar approval rates.
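
It's easy to verify: both splits should show roughly the same positive rate (assuming y is encoded 0/1):

print(f"train approval rate: {y_train.mean():.3f}")
print(f"test approval rate:  {y_test.mean():.3f}")
# With stratify=y both print ~0.69; without it they can drift apart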

3. Model Serialization

Used joblib for efficient model saving:

import joblib

# Save
joblib.dump(model, 'models/loan_model_v2.pkl')

# Load
model = joblib.load('models/loan_model_v2.pkl')

joblib handles large numpy arrays much more efficiently than plain pickle, and with compression enabled the saved models are a fraction of the size.
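
Most of the size win comes from compression, which joblib supports directly:

# compress=3 trades a little save/load time for a much smaller file on disk
joblib.dump(model, 'models/loan_model_v2.pkl.gz', compress=3)
model = joblib.load('models/loan_model_v2.pkl.gz')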

4. Database Query Optimization

Optimized statistics endpoint with eager loading:

from sqlalchemy.orm import joinedload

# Before: N+1 queries (slow)
predictions = Prediction.query.all()
for pred in predictions:
    user = User.query.get(pred.user_id)  # Separate query each time

# After: single query with a join (fast)
predictions = Prediction.query.options(
    joinedload(Prediction.user)
).all()

Result: 10x faster for 100+ predictions.

5. Caching Strategy

Different cache times for different data volatility:

# Rarely changes - long cache
@cache.cached(timeout=3600)  # 1 hour
def model_info():
    return model_metadata

# Changes occasionally - medium cache
@cache.cached(timeout=60)  # 1 minute
def statistics():
    return get_statistics()

# Changes frequently - short cache
@cache.cached(timeout=30)  # 30 seconds
def recent_predictions():
    return get_recent_predictions(10)

Challenges & Solutions

Challenge 1: PostgreSQL Connection Issues

Problem: Render uses postgres:// but SQLAlchemy 1.4+ requires postgresql://

Solution:

import os

database_url = os.environ.get('DATABASE_URL')
if database_url and database_url.startswith('postgres://'):
    database_url = database_url.replace('postgres://', 'postgresql+psycopg2://', 1)

Challenge 2: Cold Start Performance

Problem: First request after inactivity took 5+ seconds

Solution:

  • Implemented health check endpoint
  • Added model preloading on startup
  • Used Render's always-on plan

# Load model on startup, not on first request
with app.app_context():
    load_model()

Challenge 3: Missing Value Handling

Problem: Real-world data had missing values not in training

Solution: Defensive preprocessing with defaults

def preprocess_input(data):
    # Provide sensible defaults
    if 'LoanAmount' not in data or pd.isna(data['LoanAmount']):
        data['LoanAmount'] = 128.0  # Median from training

    if 'Credit_History' not in data:
        data['Credit_History'] = 1.0  # Most common value

    return data

Challenge 4: Test Coverage Gaps

Problem: Initial coverage was only 45%

Solution: Systematic test writing

import pytest

# Test matrix approach
test_cases = [
    # (input, expected_status, expected_keys)
    ({'ApplicantIncome': 5000}, 200, ['prediction', 'confidence']),
    ({'ApplicantIncome': -100}, 400, ['error', 'validation_errors']),
    ({}, 400, ['error']),
]

@pytest.mark.parametrize("input_data,status,keys", test_cases)
def test_predict(client, input_data, status, keys):
    response = client.post('/predict', json=input_data)
    assert response.status_code == status
    for key in keys:
        assert key in response.json

Result: 45% → 83% coverage


Performance Optimization

Before vs After

Metric              Before     After      Improvement
Avg Response Time   500ms      150ms      3.3x faster
P95 Response Time   800ms      250ms      3.2x faster
Throughput          10 req/s   35 req/s   3.5x higher
Response Size       2.5 KB     0.7 KB     72% smaller
Cache Hit Rate      0%         85%        ∞ improvement

Optimization Techniques

1. Response Caching

@cache.cached(timeout=60, query_string=True)
def history():
    # Expensive database query
    return get_recent_predictions()

2. Database Indexing

class Prediction(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    timestamp = db.Column(db.DateTime, index=True)  # Index for sorting
    prediction = db.Column(db.String(20), index=True)  # Index for filtering

3. Compression

compress = Compress(app)  # Automatic gzip compression

4. Connection Pooling

app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {
    'pool_size': 10,
    'pool_recycle': 3600,
    'pool_pre_ping': True
}

Performance Metrics
Response time comparison before and after optimization


Lessons Learned

1. Feature Engineering > Algorithm Selection

I spent 2 days trying different algorithms (Random Forest, XGBoost, Neural Networks) and got marginal improvements (1-2%).

Then I spent 4 hours on feature engineering and got +8% accuracy.

Lesson: Understand your data first. Good features beat fancy algorithms.

2. Testing Saves Time

Initially, I skipped tests to "move faster." Then I spent 3 days debugging production issues.

After adding comprehensive tests, I could refactor confidently and caught bugs before deployment.

Lesson: Tests are not overhead. They're insurance.

3. Documentation is for Future You

I didn't document my preprocessing steps initially. Two weeks later, I couldn't remember why I used median vs mean imputation.

Good documentation saved me hours of re-research.

Lesson: Document decisions, not just code.

4. Performance Matters

My initial API took 500ms per request. Users noticed. After optimization (150ms), the experience felt instant.

Lesson: Performance is a feature.

5. Production ≠ Development

What worked locally didn't always work in production:

  • Database connection strings were different
  • Environment variables needed careful management
  • Cold starts were a real issue

Lesson: Test in production-like environments early.


What's Next

Immediate Improvements

  • [ ] Add user authentication
  • [ ] Implement A/B testing framework
  • [ ] Add model monitoring and drift detection
  • [ ] Create admin dashboard
  • [ ] Add email notifications

Future Features

  • [ ] Explainable AI (SHAP values)
  • [ ] Real-time model retraining
  • [ ] Multi-language support
  • [ ] Mobile app (React Native)
  • [ ] Batch prediction API

Technical Debt

  • [ ] Migrate to Redis for distributed caching
  • [ ] Add Celery for background tasks
  • [ ] Implement proper logging (ELK stack)
  • [ ] Add monitoring (Prometheus + Grafana)
  • [ ] Set up CI/CD pipeline

Conclusion

Building this system taught me that production ML is 20% modeling and 80% engineering.

The model was working on Day 5. Making it production-ready took 13 more days.

But that's what separates a Jupyter notebook from a real product:

  • Comprehensive testing
  • Professional documentation
  • Performance optimization
  • Error handling
  • Monitoring and logging
  • Security considerations
  • User experience

If you're building an ML system, focus on these fundamentals. The fancy algorithms can wait.


Resources

Connect with me:


Appendix: Complete Tech Stack

Backend

  • Python 3.10
  • Flask 3.0
  • Gunicorn 21.2
  • scikit-learn 1.3
  • pandas 2.0
  • numpy 1.24

Database

  • PostgreSQL 14
  • SQLAlchemy 2.0
  • psycopg2-binary 2.9

Testing

  • pytest 7.4
  • pytest-cov 4.1
  • coverage 7.3

Performance

  • Flask-Caching 2.1
  • Flask-Limiter 3.5
  • Flask-Compress 1.14

Documentation

  • Flasgger 0.9.7
  • Swagger UI

Frontend

  • HTML5
  • CSS3 (Grid, Flexbox)
  • JavaScript (ES6+)
  • Fetch API

DevOps

  • Git/GitHub
  • Render
  • Environment variables
  • Logging

Thanks for reading! If you found this helpful, please share it and star the repo on GitHub.

Questions? Comments? Reach out on LinkedIn or open an issue on GitHub.

