Let’s be honest: there is nothing quite like the feeling of moving a machine learning model off a messy local Jupyter Notebook and seeing it live, breathing, and clickable on the internet.
A few weeks ago, I decided to tackle a fun challenge: predicting the knockout stages of the upcoming 2026 FIFA World Cup. Instead of just looking at standard text readouts on my terminal, I wanted to build a complete, interactive visual experience that anyone could play with.
Here is the story of how I built the data pipeline, why I chose my model, and the cheeky trick I used to keep the app online 24/7 for free.
🏗️ Breaking Down the Pipeline
I wanted to keep this workspace modular and clean. Instead of one massive script that does everything, I split the workflow across four dedicated stages:
- The Ingestion (01_data_alignment.ipynb): Cleaning up raw data and aligning historical match records.
- The Feature Store (02_historical_feature_store.ipynb): This is where the magic happens. I engineered metrics to track team form, "attacking velocity," and defensive stability over time.
- The Baseline (03_local_validation_and_baseline.ipynb): Setting up cross-validation loops to ensure the model wasn't just guessing blindly.
- The Interface (app.py): The final production script handling the simulation logic and rendering the front-end layout.
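To make the feature-store stage a bit more concrete, here is a minimal sketch of a rolling "form" feature. It is not the code from the notebook: the field names (`home_goals`, `attack_form`, and so on), the window size, and the toy matches are all assumptions for illustration. The one idea it does demonstrate is that each match only sees results from earlier matches, so a result never leaks into its own features.

```python
from collections import defaultdict, deque


def add_rolling_form(matches, window=5):
    """Attach each team's average goals scored/conceded over its previous matches.

    `matches` must already be sorted by date. Only the last `window` matches
    played BEFORE the current one are used, so there is no target leakage.
    """
    # team -> most recent (goals_scored, goals_conceded) pairs
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for match in matches:
        row = dict(match)
        for side in ("home", "away"):
            past = history[match[side]]
            n = len(past)
            # None marks "no history yet" rather than pretending the form is 0
            row[f"{side}_attack_form"] = sum(s for s, _ in past) / n if n else None
            row[f"{side}_defense_form"] = sum(c for _, c in past) / n if n else None
        rows.append(row)
        # Update the history only after the features for this match are built
        history[match["home"]].append((match["home_goals"], match["away_goals"]))
        history[match["away"]].append((match["away_goals"], match["home_goals"]))
    return rows


# Toy data with placeholder team names, not real results
matches = [
    {"date": "2024-01-01", "home": "A", "away": "B", "home_goals": 2, "away_goals": 0},
    {"date": "2024-01-05", "home": "B", "away": "A", "home_goals": 1, "away_goals": 3},
    {"date": "2024-01-09", "home": "A", "away": "B", "home_goals": 0, "away_goals": 0},
]
features = add_rolling_form(matches, window=2)
print(features[2]["home_attack_form"])  # 2.5 (team A scored 2 and 3 in its last two games)
```

The same pattern extends to any per-team running statistic; in a real notebook a pandas `groupby` with a shifted rolling mean would be the more usual way to write it.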
🤖 The Engine: Why LightGBM?
For the predictive heavy lifting, I went with LightGBM.
International football data is tabular but full of tricky, non-linear relationships (e.g., how a team's current attacking form clashes with an opponent's specific defensive structure). Gradient-boosted trees like LightGBM pick up these interactions without my having to hand-craft them. Plus, it trains fast and lets me output a probability distribution over match outcomes rather than just a boring binary "Win/Loss" result.
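Why does a probability distribution matter more than a binary label? In a knockout round somebody has to go through, so a draw after extra time still needs resolving. Here is a small sketch of one way to turn a three-way distribution into an "advances" probability. The numbers are made up (they are not model output), and both the three-class setup and the `shootout_edge` parameter are my assumptions here, not something taken from the repo.

```python
def advance_probability(p_win, p_draw, p_loss, shootout_edge=0.5):
    """Probability that a team advances from a knockout tie.

    Assumes a drawn match goes to penalties, which the team wins with
    probability `shootout_edge` (0.5 means a pure coin flip).
    """
    total = p_win + p_draw + p_loss
    if abs(total - 1.0) > 1e-6:
        raise ValueError("outcome probabilities must sum to 1")
    return p_win + p_draw * shootout_edge


# Hypothetical win/draw/loss probabilities for one fixture
probs = {"win": 0.45, "draw": 0.30, "loss": 0.25}
p_advance = advance_probability(probs["win"], probs["draw"], probs["loss"])
print(round(p_advance, 2))  # 0.6
```

A plain "Win" label would hide the fact that this team is far from a lock: 40% of the time it goes home.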
🎨 Fighting with Streamlit and CSS
I wanted this app to be intuitive for regular football fans, not just techies.
I built a sidebar playground where you can match any two qualified nations head-to-head and tweak "volatility sliders" to simulate massive underdog upsets.
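If you are wondering what a "volatility slider" could do under the hood, one simple option is to blend the model's probabilities with a uniform distribution. This is a sketch of that idea, not necessarily what app.py does; the function name and the 0 to 1 scale are assumptions.

```python
def apply_volatility(probs, volatility):
    """Blend outcome probabilities with a uniform distribution.

    volatility = 0.0 returns the probabilities unchanged;
    volatility = 1.0 makes every outcome equally likely.
    """
    if not 0.0 <= volatility <= 1.0:
        raise ValueError("volatility must be between 0 and 1")
    uniform = 1.0 / len(probs)
    return [(1.0 - volatility) * p + volatility * uniform for p in probs]


# Hypothetical favourite-vs-underdog probabilities
favourite_vs_underdog = [0.70, 0.30]
for v in (0.0, 0.5, 1.0):
    print(v, [round(p, 2) for p in apply_volatility(favourite_vs_underdog, v)])
# 0.0 [0.7, 0.3]
# 0.5 [0.6, 0.4]
# 1.0 [0.5, 0.5]
```

Because the blend is linear, the output still sums to 1, so it can be fed straight into whatever samples the match winner.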
But the hardest part? The bracket. Streamlit is amazing, but drawing a fully responsive, clean 16-team tournament tree using standard components is tough. To fix it, I injected raw custom CSS blocks directly into the layout. Now, it draws crisp vector grid lines mapping progression straight from the Round of 16 down to a gold-bordered podium crowning the grand champion.
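For anyone who hasn't done the CSS trick before, the general shape is: build an HTML string with a `<style>` block, then hand it to Streamlit. The sketch below is a stripped-down illustration, not the stylesheet from the repo. The class names, colours, and team labels are placeholders, and it only lays out one column per round with a gold border on the last one (no connector lines).

```python
from html import escape

# Illustrative styles only; every class name here is a placeholder
BRACKET_CSS = """
<style>
.bracket { display: grid; grid-auto-flow: column; gap: 2rem; }
.round { display: flex; flex-direction: column; justify-content: space-around; }
.match { border: 1px solid #888; border-radius: 4px; padding: 0.25rem 0.5rem; margin: 0.25rem 0; }
.champion { border: 2px solid gold; }
</style>
"""


def render_bracket_html(rounds):
    """Build bracket markup from a list of rounds (each a list of team names).

    The final round gets the extra `champion` class for the gold border.
    """
    columns = []
    for index, teams in enumerate(rounds):
        css_class = "match champion" if index == len(rounds) - 1 else "match"
        # escape() keeps stray characters in team names from breaking the HTML
        cells = "".join(f'<div class="{css_class}">{escape(team)}</div>' for team in teams)
        columns.append(f'<div class="round">{cells}</div>')
    body = "".join(columns)
    return BRACKET_CSS + f'<div class="bracket">{body}</div>'


rounds = [
    ["Team A", "Team B", "Team C", "Team D"],  # semifinalists
    ["Team A", "Team C"],                      # finalists
    ["Team A"],                                # champion
]
html_out = render_bracket_html(rounds)

# Inside a Streamlit app this would be rendered with:
#   import streamlit as st
#   st.markdown(html_out, unsafe_allow_html=True)
```

`unsafe_allow_html=True` is what lets the raw `<style>` and `<div>` tags through, which is also why escaping the team names is worth the extra line.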
😴 The "Never Sleep" Secret
I deployed the app for free using Streamlit Community Cloud. It’s an incredible service, but there is a catch: if your app doesn't get traffic for 12 hours, the server puts it to sleep. The next visitor gets a slow "This app is hibernating" loading screen.
To bypass this for my portfolio, I set up a simple GitHub Actions workflow (.github/workflows/keep_alive.yml). Every 10 hours, it automatically pushes an empty commit (one with no file changes) to the repository. The cloud server treats the push as active development and resets the hibernation timer to zero. Result? The app stays awake 24/7!
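The workflow itself is YAML, but the step that matters boils down to two git commands. Here is a rough Python sketch of that step, the kind of helper a scheduled job could call. The function names and commit message are hypothetical, and it assumes the job has already checked out the repo with push permissions and a configured git user.

```python
import subprocess


def keep_alive_commands(message="chore: keep app awake"):
    """Return the git commands for an empty commit followed by a push."""
    return [
        # --allow-empty lets git record a commit even though no files changed
        ["git", "commit", "--allow-empty", "-m", message],
        ["git", "push"],
    ]


def run_keep_alive(dry_run=True):
    """Print the commands when dry_run is True, otherwise execute them."""
    for command in keep_alive_commands():
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if git exits non-zero
            subprocess.run(command, check=True)


run_keep_alive(dry_run=True)
```

One scheduling detail worth knowing: a cron expression like `0 */10 * * *` fires at 00:00, 10:00, and 20:00 UTC, so the gaps are 10, 10, and 4 hours rather than a strict every-10-hours cycle. That still stays inside a 12-hour window.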
🎮 Take it for a Spin!
The entire project is public, and I’d love for you to play around with it.
- 🌍 Live App: Test Your 2026 World Cup Predictions
- 📂 GitHub Source: MouroshK/World_cup_2026
🏆 The Ultimate Prediction
To give you a sneak peek of the dashboard in action, here is the full knockout bracket generated by the pipeline's baseline run:
[Image: 2026 World Cup Predictive Bracket Simulation]
As you can see, the LightGBM engine forecasts an absolute heavyweight clash for the final, with France edging out Argentina to lift the trophy, while England and Brazil fall just short in the semifinals.
