
Tanisha

# When Our AI Flood Prediction System Broke Because of Bad Data — and How We Fixed It 🌧️🛡️

WeCoded 2026: Echoes of Experience 💜

This is a submission for the 2026 WeCoded Challenge: Echoes of Experience


The Moment Everything Broke

It was the middle of a hackathon.

Our AI model was ready.
The dashboard looked beautiful.
The architecture was solid.

But our system had one fatal flaw.

It had no reliable data.

And without data, our flood prediction platform was useless.


The Idea: Build an AI Guardian for Floods

Our team built StormShield AI, a real-time flood prediction and civic alert platform designed for Montgomery, Alabama.

The goal was simple:

Predict flood risks before they happen and alert citizens early.

The system combined:

  • Real-time environmental data
  • Machine learning forecasting
  • Generative AI alerts
  • An interactive dashboard

In theory, it was powerful.

But in practice, something kept breaking.


The Hidden Enemy of AI Projects

People think building AI systems is about models.

But the real challenge is something else:

Data pipelines.

StormShield depended on several live data sources:

  • USGS stream gauge data
  • NOAA weather alerts
  • Additional environmental signals

Some sources had APIs.

Others didn’t.

For those missing APIs, we turned to Bright Data web scraping.

That’s when the real problems began.


When Scraping Started Failing

The Bright Data scraping pipeline started behaving unpredictably.

We ran into multiple issues:

  • Dynamic websites loading content with JavaScript
  • Inconsistent HTML structures
  • Rate limiting during scraping
  • Partial or delayed responses

The result?

Our real-time dashboard would randomly stop updating.

Which meant our prediction engine stopped working too.


The Turning Point: Rethinking the Architecture

Instead of forcing scraping to work perfectly, we redesigned the pipeline.

We switched to a hybrid ingestion system.

Priority 1: Official APIs

We prioritized structured data from:

  • USGS Water Services API
  • NOAA/NWS Alerts

These became the backbone of our data pipeline.
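As a rough sketch of what pulling from the USGS Water Services API looks like, here is a minimal URL builder and response parser. The site ID, and the choice of parameter code `00065` (gage height), are illustrative assumptions; the nested response shape matches the USGS waterservices JSON format.

```python
import json
from urllib.parse import urlencode

USGS_IV_URL = "https://waterservices.usgs.gov/nwis/iv/"

def build_usgs_url(site: str, parameter: str = "00065") -> str:
    """Build an instantaneous-values request URL (00065 = gage height, ft)."""
    query = urlencode({"format": "json", "sites": site, "parameterCd": parameter})
    return f"{USGS_IV_URL}?{query}"

def parse_gage_heights(payload: dict) -> list[tuple[str, float]]:
    """Extract (timestamp, reading) pairs from a USGS waterservices response."""
    readings = []
    for series in payload["value"]["timeSeries"]:
        for block in series["values"]:
            for point in block["value"]:
                readings.append((point["dateTime"], float(point["value"])))
    return readings
```

Keeping the URL construction and parsing separate makes the parser easy to test against canned payloads, without any network calls.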


Priority 2: Scraping as a Backup

Bright Data scraping became a fallback enrichment layer.

This meant scraping failures no longer broke the system.
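The fallback logic can be sketched in a few lines. The function names are hypothetical (the actual Bright Data scraper call is not shown); the point is that a scraping failure degrades gracefully instead of crashing the pipeline.

```python
def fetch_with_fallback(primary, fallback):
    """Try the official API first; use the scraper only if the API fails.

    Returns (data, source) so callers can log where the reading came from.
    """
    try:
        return primary(), "api"
    except Exception:
        pass  # API failed; try the enrichment/scraping layer
    try:
        return fallback(), "scrape"
    except Exception:
        return None, "unavailable"
```

Tagging each reading with its source also makes it easy to measure how often the scraper is actually needed.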


Priority 3: Background Data Scheduler

We implemented APScheduler jobs that:

  • continuously poll data
  • cache responses
  • smooth noisy readings
  • retry failures

This made our system fault tolerant.


The Moment the Dashboard Came Alive

After hours of debugging…

The dashboard suddenly updated.

Live water levels appeared.

Prediction graphs began moving.

Alert statuses started changing dynamically.

Our system was finally working end-to-end.

That moment felt like magic.


What We Built: StormShield AI 🛡️

StormShield AI is a real-time flood prediction and civic alert system designed to help communities respond to dangerous weather events faster.

Key Capabilities

🌊 Real-Time Data Ingestion

Streams water-level telemetry from USGS gauges.


📈 Flood Prediction Engine

An XGBoost model predicts water levels 30 minutes ahead (T+30).
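One common way to frame a T+30 forecast as supervised learning is lag features: recent readings as inputs, the level `horizon` steps ahead as the target. This is a sketch under the assumption of evenly spaced readings (with 15-minute USGS readings, `horizon=2` targets 30 minutes out); the real feature set (rainfall, upstream gauges, etc.) is not shown.

```python
def make_training_rows(series, n_lags: int = 3, horizon: int = 2):
    """Turn a water-level series into (lag-features, T+horizon target) rows.

    Each row's features are the current reading plus the n_lags - 1 before it;
    the target is the reading `horizon` steps in the future.
    """
    rows = []
    for i in range(n_lags - 1, len(series) - horizon):
        features = series[i - n_lags + 1 : i + 1]  # window ending at "now"
        target = series[i + horizon]               # value 2 steps (30 min) ahead
        rows.append((features, target))
    return rows
```

An XGBoost regressor would then be fit on these rows; at inference time the same windowing is applied to the live cache to produce the T+30 prediction.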


🚨 Smart Alert System

| Status | Trigger | Meaning |
| --- | --- | --- |
| 🟢 GREEN | Normal conditions | Safe |
| 🟡 YELLOW | Rapid rise detected | Prepare |
| 🔴 RED | Flood stage predicted | Take action |
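The three-tier status boils down to a small classifier over the prediction and the rise rate. The thresholds below are placeholder values, not real flood-stage figures for any Montgomery gauge:

```python
FLOOD_STAGE_FT = 30.0        # hypothetical flood stage for a given gauge
RAPID_RISE_FT_PER_HR = 0.5   # hypothetical rapid-rise threshold

def alert_status(predicted_level: float, rise_rate: float) -> str:
    """Map a T+30 prediction and rise rate to the three-tier status."""
    if predicted_level >= FLOOD_STAGE_FT:
        return "RED"     # flood stage predicted: take action
    if rise_rate >= RAPID_RISE_FT_PER_HR:
        return "YELLOW"  # rapid rise detected: prepare
    return "GREEN"       # normal conditions: safe
```

Checking flood stage before rise rate means a gauge already past flood stage is never downgraded to a mere "prepare" warning.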

🤖 Generative AI Alerts

Using Gemini 2.0 Flash, StormShield generates clear public safety messages that avoid panic but encourage action.
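The prompt sent to the model might look something like this. The wording is an illustrative assumption (the actual Gemini API call through Google's SDK is not shown); the key idea is baking the "clear, actionable, no panic" constraint directly into the prompt:

```python
def build_alert_prompt(status: str, site_name: str, predicted_level: float) -> str:
    """Compose the LLM prompt for a public safety message."""
    return (
        f"Write a short public flood alert for {site_name}. "
        f"Current status: {status}. Predicted water level in 30 minutes: "
        f"{predicted_level:.1f} ft. Be clear and actionable; "
        "encourage preparation without causing panic."
    )
```

Keeping the prompt in one tested function also makes it easy to tune the tone without touching the rest of the alert pipeline.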


🗺️ Interactive Dashboard

Built using Streamlit, featuring:

  • live telemetry
  • predictive charts
  • flood zone maps
  • AI Q&A assistant

Tech Stack

Backend

  • Python
  • FastAPI
  • APScheduler
  • Uvicorn

Frontend

  • Streamlit
  • Plotly
  • Folium Maps

AI / ML

  • XGBoost forecasting
  • Gemini 2.0 Flash (alerts + RAG assistant)

Data Sources

  • USGS Water Services API
  • NOAA / NWS Alerts
  • Bright Data scraping


The Biggest Lesson I Learned

Before this project, I thought the hardest part of AI engineering was training models.

Now I know the truth.

The hardest part is getting reliable data into the system.

Building StormShield forced us to rethink architecture, redesign pipelines, and adapt under pressure.

And that’s the real engineering skill:

resilience when things break.


Why Experiences Like This Matter

Tech often looks polished from the outside.

But behind every working system are:

  • broken pipelines
  • failed experiments
  • late-night debugging sessions

StormShield AI reminded me that innovation doesn’t come from perfect plans.

It comes from figuring things out when nothing works at first.


Project Links

GitHub
https://github.com/Tanishaaaaaaa/StormShield

As a woman in tech, participating in hackathons like this reminds me that innovation often comes from diverse perspectives and collaborative problem solving.
