Sentinel-Net: Predicting Software Failures Before They Happen
Modern software systems move fast. Teams ship features rapidly and rely on complex, distributed systems. In this environment, failures are inevitable, but many of them do not have to come as a surprise.
We built Sentinel-Net because we were tired of watching software failures blindside teams. A critical bug sneaks into production, performance degrades without warning, and by the time anyone notices, your users are already frustrated. Sentinel-Net watches your codebase, understands what is really happening in your commits and issues, and predicts failure risk before disaster strikes.
Here is a look at the theory, architecture, and code that make it work.
The Problem with Reactive Monitoring
Traditional monitoring tools focus on observability after deployment. Metrics like CPU usage and error rates provide valuable insight, but they are inherently reactive. By the time a server alert fires, engineers are already in firefighting mode.
Systems actually emit weak signals long before a failure occurs. Think about increased commit churn, rising unresolved issues, and stress indicators in developer language. The challenge is capturing these signals and combining them into meaningful predictions so you can catch issues before they escalate.
Architecture and Asynchronous Ingestion
Sentinel-Net uses a Django full-stack architecture alongside MongoDB for document storage, Celery for background processing, and Django Channels for WebSockets. The frontend consumes this data to display a live command center.
To predict failures, we have to ingest repository activity via GitHub webhooks. The practical problem here is that machine learning and NLP processing take time. If we processed each event synchronously, the webhook would time out and GitHub would assume the delivery failed. We solve this by using Celery to handle the heavy lifting asynchronously: GitHub gets an immediate response while background workers process the commit data at their own pace.
from celery import shared_task

from .ml_pipeline import evaluate_repository_risk
from .nlp_engine import analyze_commit
from .models import RepositorySignal


@shared_task
def process_commit_event(repo_id, message, author, timestamp):
    # Run NLP on the commit message, persist the signal, then
    # queue a repository-level risk re-evaluation.
    nlp_results = analyze_commit(message)
    signal = RepositorySignal(
        repository_id=repo_id,
        event_type='commit',
        raw_data={'author': author, 'message': message},
        sentiment_score=nlp_results['sentiment'],
        intent_category=nlp_results['intent'],
        risk_score=nlp_results['risk_score']
    )
    signal.save()
    evaluate_repository_risk.delay(repo_id)
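Before the task ever runs, the webhook endpoint should also verify that the payload really came from GitHub. The Django view itself is omitted here, but the signature check is pure standard library. This sketch (the function name is ours, for illustration) validates GitHub's documented X-Hub-Signature-256 header, which carries an HMAC-SHA256 of the raw request body keyed by the webhook secret:

```python
import hashlib
import hmac


def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a GitHub webhook's X-Hub-Signature-256 header.

    GitHub signs the raw request body with HMAC-SHA256 using the
    webhook secret and sends it as 'sha256=<hexdigest>'.
    """
    expected = 'sha256=' + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information to an attacker.
    return hmac.compare_digest(expected, signature_header)


secret = b'sentinel-secret'
body = b'{"ref": "refs/heads/main"}'
good = 'sha256=' + hmac.new(secret, body, hashlib.sha256).hexdigest()

print(verify_github_signature(secret, body, good))          # True
print(verify_github_signature(secret, body, 'sha256=00'))   # False
```

Rejecting unsigned or mis-signed deliveries before enqueueing the Celery task keeps forged events out of the signal store entirely.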
Transforming Noise into Features
Raw GitHub data is incredibly noisy. A single commit does not tell you much on its own. The theory behind our feature engineering is that human behavior in a codebase reveals risk. We engineer features like commit burstiness and issues-per-commit ratios over a rolling 30-day window. We use log scaling to capture non-linear growth without letting massive outliers ruin the dataset. These features give the machine learning model the mathematical context it needs to identify risky patterns over time.
import pandas as pd
import numpy as np
from datetime import datetime, timedelta


def generate_feature_matrix(events_df):
    # Restrict to a rolling 30-day window of repository events.
    events_df['timestamp'] = pd.to_datetime(events_df['timestamp'])
    cutoff_date = datetime.now() - timedelta(days=30)
    recent_events = events_df[events_df['timestamp'] >= cutoff_date]

    features = {}
    commits = recent_events[recent_events['type'] == 'commit']
    features['total_commits_30d'] = len(commits)
    features['avg_commits_per_day'] = len(commits) / 30.0

    issues_opened = recent_events[(recent_events['type'] == 'issue') & (recent_events['action'] == 'opened')]
    issues_closed = recent_events[(recent_events['type'] == 'issue') & (recent_events['action'] == 'closed')]
    # Net issues opened in the window (negative means the backlog shrank).
    features['open_issues_30d'] = len(issues_opened) - len(issues_closed)

    features['active_contributors'] = commits['author'].nunique()
    # Log scaling keeps huge repositories from dominating the ratio.
    features['issues_per_commit'] = np.log1p(len(issues_opened)) / np.log1p(len(commits)) if len(commits) > 0 else 0
    return pd.Series(features)
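The two ideas behind those features can be seen in isolation with nothing but the standard library. This sketch (illustrative numbers and a hypothetical `burstiness` helper, not code from Sentinel-Net) shows how `log1p` compresses an outlier repository, and one simple way to quantify "bursty" commit activity:

```python
import math
import statistics

# Raw 30-day commit counts for three repositories; the last is an outlier.
commit_counts = [12, 40, 5000]

# log1p compresses the outlier so it no longer dwarfs the other repos.
scaled = [round(math.log1p(c), 2) for c in commit_counts]
print(scaled)  # [2.56, 3.71, 8.52]


def burstiness(daily_commits):
    """Coefficient of variation of daily commit counts.

    A steady repo scores low; a repo that commits in spikes scores high.
    """
    mean = statistics.mean(daily_commits)
    if mean == 0:
        return 0.0
    return statistics.pstdev(daily_commits) / mean


steady = [3, 4, 3, 4, 3, 4, 3]
spiky = [0, 0, 24, 0, 0, 0, 0]
print(burstiness(steady) < burstiness(spiky))  # True
```

Both repositories in the second example have the same weekly total; only the shape of the activity differs, which is exactly the kind of signal a raw count would miss.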
The Predictive ML Ensemble
When predicting system failure, a single algorithm is rarely enough. Our theory is built around an ensemble approach. These engineered features feed into a Stacking Regressor.
We combine Random Forest to capture non-linear relationships, Gradient Boosting to iteratively correct residual errors, Extra Trees to handle edge cases by injecting randomness, and HistGradientBoosting for raw computational speed. Instead of simply averaging their answers, a Ridge meta-learner weighs their predictions and learns which base model to trust in which scenarios.
from sklearn.ensemble import (
    RandomForestRegressor,
    GradientBoostingRegressor,
    ExtraTreesRegressor,
    HistGradientBoostingRegressor,
    StackingRegressor
)
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_ensemble_model():
    base_learners = [
        ('rf', RandomForestRegressor(n_estimators=100, random_state=42)),
        ('gb', GradientBoostingRegressor(n_estimators=100, random_state=42)),
        ('et', ExtraTreesRegressor(n_estimators=100, random_state=42)),
        ('hgb', HistGradientBoostingRegressor(random_state=42))
    ]
    # The Ridge meta-learner combines the base models' out-of-fold predictions.
    stacked_ensemble = StackingRegressor(
        estimators=base_learners,
        final_estimator=Ridge()
    )
    return Pipeline([
        ('scaler', StandardScaler()),
        ('ensemble', stacked_ensemble)
    ])
Extracting Context with NLP
Numbers lack human context. A massive code deletion might look terrifying to a pure statistics model, but if a senior engineer notes that they are simply cleaning up dead code, it is actually a healthy sign. We need semantic understanding.
We analyze commit messages and issue descriptions using NLTK VADER for sentiment analysis. However, standard NLP models struggle with developer slang. We supplemented the model with custom dictionaries so it understands that words like lgtm or wip carry specific contextual weight in software engineering.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
# Teach VADER developer slang it does not know out of the box.
sia.lexicon.update({'lgtm': 0.5, 'wip': 0.0, 'wtf': -0.8, 'borked': -0.7})


def analyze_commit(message):
    sentiment_score = sia.polarity_scores(message)['compound']
    message_lower = message.lower()

    # Keyword triage: classify the commit's intent and assign a base risk.
    if any(word in message_lower for word in ['critical', 'hotfix', 'borked']):
        intent, risk_base = 'urgent', 0.85
    elif any(word in message_lower for word in ['fix', 'bug']):
        intent, risk_base = 'bug_fix', 0.60
    elif any(word in message_lower for word in ['add', 'feature']):
        intent, risk_base = 'feature', 0.40
    else:
        intent, risk_base = 'chore', 0.20

    # Positive sentiment nudges the risk down, negative nudges it up;
    # the result is clamped to [0, 1].
    final_risk = max(0.0, min(risk_base - (sentiment_score * 0.1), 1.0))
    return {
        'sentiment': sentiment_score,
        'intent': intent,
        'risk_score': final_risk
    }
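To see how the sentiment adjustment behaves at the boundaries, the clamping step can be exercised on its own. The helper name below is ours, for illustration; the formula matches the one above:

```python
def adjust_risk(risk_base: float, sentiment: float) -> float:
    """Nudge a base risk by sentiment and clamp the result to [0, 1].

    VADER's compound score lies in [-1, 1], so the adjustment can
    shift the risk by at most 0.1 in either direction.
    """
    return max(0.0, min(risk_base - sentiment * 0.1, 1.0))


# Cheerful hotfix: high base risk, slightly reduced by positive sentiment.
print(round(adjust_risk(0.85, 0.9), 2))    # 0.76
# Frustrated chore: low base risk, slightly raised by negative sentiment.
print(round(adjust_risk(0.20, -1.0), 2))   # 0.3
# Clamping keeps extreme combinations inside [0, 1].
print(round(adjust_risk(0.05, 1.0), 2))    # 0.0
```

Because the adjustment is capped at ±0.1, sentiment refines the keyword-based risk without ever overriding it, which is the intended division of labor.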
Flexible Document Storage
Relational databases are a nightmare for storing diverse metadata. A pull request looks entirely different from a single commit or a server alert. MongoDB allows us to store wildly different data shapes without strict SQL migrations. We use MongoEngine to integrate this natively with Django, adding compound indexes to ensure our dashboard queries remain blazing fast even as the collection grows.
from mongoengine import Document, StringField, FloatField, DictField, DateTimeField, IntField
from datetime import datetime


class RepositorySignal(Document):
    meta = {
        'collection': 'repository_signals',
        # Indexes on repo and recency keep dashboard queries fast as data grows.
        'indexes': ['repository_id', '-timestamp']
    }

    repository_id = IntField(required=True)
    event_type = StringField(choices=('commit', 'issue'), required=True)
    timestamp = DateTimeField(default=datetime.utcnow)
    raw_data = DictField(required=True)
    sentiment_score = FloatField()
    intent_category = StringField()
    risk_score = FloatField()
Real-Time Dashboard Updates
A predictive monitor is useless if you have to refresh the page to see it. When Celery finishes updating the overall ML risk score, it pushes the data to the frontend via Django Channels. The UI uses Vanilla JavaScript to handle the WebSocket connection and inject the data live, ensuring a seamless experience that feels instantly responsive.
const repoId = document.getElementById('repo-data').dataset.id;
const wsUrl = `ws://${window.location.host}/ws/dashboard/${repoId}/`;

let ws;

function connect() {
    ws = new WebSocket(wsUrl);

    ws.onmessage = function(event) {
        const data = JSON.parse(event.data);
        if (data.type === 'RISK_SCORE_UPDATE') {
            const riskElement = document.getElementById('risk-score');
            riskElement.innerText = `${data.current_risk.toFixed(1)}%`;

            if (data.latest_signal) {
                const feed = document.getElementById('signal-feed');
                const signalHtml = `
                    <div class="p-3 bg-gray-700 rounded border-l-4">
                        <span class="text-xs text-gray-400">[${data.latest_signal.intent}]</span>
                        <p class="text-sm mt-1">${data.latest_signal.message}</p>
                    </div>
                `;
                feed.insertAdjacentHTML('afterbegin', signalHtml);
            }
        }
    };

    // Reconnect via connect() so the replacement socket re-registers
    // these handlers; a bare `new WebSocket(wsUrl)` would not.
    ws.onclose = function(e) {
        console.log('Socket closed. Reconnecting in 2 seconds...');
        setTimeout(connect, 2000);
    };
}

connect();
Acknowledgments and Team
This project was developed by:
- Tejas Keerthi
- Lokesh M
- Srinish Reddy
- Ayan Maji
We would like to express our sincere gratitude to @chanda_rajkumar for his valuable guidance and support throughout this project. His insights into system design, architecture, and development played a key role in shaping Sentinel-Net.
Conclusion
Sentinel-Net demonstrates that software failures can be anticipated. By combining machine learning, NLP, and real-time data pipelines, we shift monitoring from reactive to predictive. This empowers development teams to act early, reduce downtime, and build more reliable systems. The future of software reliability lies not in detecting failures, but in preventing them.