We've all been there. You train an AI agent or a model. It has 95% accuracy in testing. You deploy it. Two weeks later, users are complaining, but your dashboard still says "Average Accuracy: 94%."
The problem is volatility. Your model might be oscillating wildly (100% correct, then 0% correct), but the average hides it.
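A quick, made-up example shows the failure mode: both streams below average exactly 0.5, but only one of them is predictable.

from statistics import mean, pstdev

# Illustrative numbers only: two agents with the same average accuracy.
steady      = [0.50, 0.50, 0.50, 0.50, 0.50, 0.50]
oscillating = [1.00, 0.00, 1.00, 0.00, 1.00, 0.00]  # 100% correct, then 0% correct

print(mean(steady), pstdev(steady))            # 0.5  0.0  -> boring, but dependable
print(mean(oscillating), pstdev(oscillating))  # 0.5  0.5  -> same average, wildly unstable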
I built a tool to solve this. It's called the Predictability API, and it acts as a "FICO Score" for your data stability.
Here's how to use it to detect drift in real time with Python.
The Code
You don't need to install heavy libraries like scikit-learn just to check for drift. The plain requests library is all it takes.
import requests

# 1. Your data stream (e.g., confidence scores from your agent).
#    Notice the dip in the middle? That's drift.
confidence_scores = [0.95, 0.94, 0.96, 0.60, 0.55, 0.62, 0.95, 0.96]

# 2. Define the API endpoint
url = "https://predictability-api.com/api/v1/sliding_window"

# 3. Send the data (window_size=3 checks stability in chunks)
payload = {
    "scores": confidence_scores,
    "window_size": 3,
    "k": 1.0  # Sensitivity factor
}

# 4. Get the analysis
response = requests.post(url, json=payload)
results = response.json()['sliding_window_results']

# 5. Print the "Stability Score" for each window
for window in results:
    print(f"Window {window['window_index']}: Score {window['score']}")
The Output
Window 0: Score 98.5 (Stable)
Window 1: Score 65.2 (Drift Detected!)
Window 2: Score 58.1 (Unstable)
Window 3: Score 97.8 (Recovered)
Why this matters
By monitoring the Predictability Score instead of just the raw average, you can trigger alerts the moment your agent becomes erratic, even if the overall average looks fine.
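As a rough sketch of what that alerting could look like, reusing the results list from the code above (the 80-point threshold and the alert_team function are my own placeholders, not part of the API):

STABILITY_THRESHOLD = 80.0  # hypothetical cut-off; tune it for your workload

def alert_team(message):
    # Placeholder: swap in Slack, PagerDuty, email, etc.
    print(f"[ALERT] {message}")

for window in results:
    if window['score'] < STABILITY_THRESHOLD:
        alert_team(f"Drift detected in window {window['window_index']} (score {window['score']})")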
I built this API using Flask and Numba to make it incredibly fast. It's free to use for testing.
Try the interactive calculator here: https://predictability-api.com
Read the Docs: https://predictability-api.com/apidocs
Let me know if you find this useful for your pipelines!