<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vesna</title>
    <description>The latest articles on DEV Community by Vesna (@vesna123best).</description>
    <link>https://dev.to/vesna123best</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3995348%2F244dd6bf-5b7e-47dd-a53c-4d3461ede923.png</url>
      <title>DEV Community: Vesna</title>
      <link>https://dev.to/vesna123best</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vesna123best"/>
    <language>en</language>
    <item>
      <title>I Built a World Cup 2026 Prediction Pipeline with Sportmicro, Python, and GitHub Actions</title>
      <dc:creator>Vesna</dc:creator>
      <pubDate>Sun, 21 Jun 2026 13:27:25 +0000</pubDate>
      <link>https://dev.to/vesna123best/-i-built-a-world-cup-2026-prediction-pipeline-with-sportmicro-python-and-github-actions-4h03</link>
      <guid>https://dev.to/vesna123best/-i-built-a-world-cup-2026-prediction-pipeline-with-sportmicro-python-and-github-actions-4h03</guid>
      <description>&lt;p&gt;I wanted a football prediction project that felt closer to a real data product than a one-off notebook.&lt;/p&gt;

&lt;p&gt;So I built a pipeline that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pulls World Cup and national-team data from Sportmicro&lt;/li&gt;
&lt;li&gt;engineers match and team-form features&lt;/li&gt;
&lt;li&gt;trains a hybrid prediction model&lt;/li&gt;
&lt;li&gt;generates upcoming fixture predictions and provisional title odds&lt;/li&gt;
&lt;li&gt;exports machine-readable outputs&lt;/li&gt;
&lt;li&gt;can refresh itself on a schedule through GitHub Actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repo is here:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;world-cup-2026-prediction-sportmicro-api&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What problem I was trying to solve
&lt;/h2&gt;

&lt;p&gt;A lot of sports prediction demos stop at one model and one CSV. That is fine for experimentation, but it breaks down quickly if you want something you can rerun, automate, or publish.&lt;/p&gt;

&lt;p&gt;For this project, I wanted a workflow that could:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;discover the relevant World Cup league data dynamically&lt;/li&gt;
&lt;li&gt;combine tournament history with recent national-team form&lt;/li&gt;
&lt;li&gt;train models from cached data&lt;/li&gt;
&lt;li&gt;generate fresh outputs without manual cleanup&lt;/li&gt;
&lt;li&gt;fit naturally into CI/CD&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result is a small but production-style sports analytics pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python for data fetching, feature engineering, training, and reporting&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scikit-learn&lt;/code&gt; for classification and score expectation models&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pandas&lt;/code&gt; and &lt;code&gt;numpy&lt;/code&gt; for the data layer&lt;/li&gt;
&lt;li&gt;Node.js only where it adds value: generating Sportmicro endpoint paths through the official &lt;code&gt;@sportmicro/endpoint&lt;/code&gt; package&lt;/li&gt;
&lt;li&gt;GitHub Actions for scheduled refreshes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That Python + Node split is deliberate. The modeling and orchestration live in Python, while the API query construction stays aligned with Sportmicro's official endpoint builder.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the Sportmicro integration is interesting
&lt;/h2&gt;

&lt;p&gt;Instead of hardcoding raw query strings, the repo sends a JSON request spec from Python to a Node helper script, and the helper builds the final endpoint path using &lt;code&gt;@sportmicro/endpoint&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That means filters such as &lt;code&gt;eq&lt;/code&gt;, &lt;code&gt;gte&lt;/code&gt;, and &lt;code&gt;in(...)&lt;/code&gt;, along with pagination and ordering, are all constructed in a consistent way.&lt;/p&gt;

&lt;p&gt;This is the part I like most architecturally: Python stays clean, and the final URL generation still uses the official tooling intended for the API.&lt;/p&gt;
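
&lt;p&gt;A minimal sketch of that handoff, assuming the helper reads a JSON spec on stdin and prints the endpoint path on stdout. The spec keys (&lt;code&gt;resource&lt;/code&gt;, &lt;code&gt;filters&lt;/code&gt;, &lt;code&gt;order&lt;/code&gt;, &lt;code&gt;page&lt;/code&gt;), the field names, the league id value, and the stdin/stdout contract are all illustrative assumptions, not the actual interface of &lt;code&gt;scripts/build_endpoint.mjs&lt;/code&gt; or &lt;code&gt;@sportmicro/endpoint&lt;/code&gt;:&lt;/p&gt;

```python
import json
import subprocess
from typing import Any, Optional


def make_request_spec(resource: str, filters: list, order_by: Optional[str] = None,
                      page: int = 1) -> str:
    """Serialize a request spec to JSON (hypothetical schema, not the real one)."""
    spec: dict = {"resource": resource, "filters": filters, "page": page}
    if order_by is not None:
        spec["order"] = order_by
    # sort_keys keeps the payload deterministic, which helps when caching responses.
    return json.dumps(spec, sort_keys=True)


def build_endpoint(spec_json: str, helper: str = "scripts/build_endpoint.mjs") -> str:
    """Ask the Node helper for the final path (assumed stdin/stdout contract)."""
    result = subprocess.run(
        ["node", helper],
        input=spec_json,
        capture_output=True,
        text=True,
        check=True,  # raise if the helper exits with a non-zero status
    )
    return result.stdout.strip()


# Hypothetical filters: the field names and the league id are placeholders.
spec = make_request_spec(
    "matches",
    filters=[
        {"field": "league_id", "op": "eq", "value": 1},
        {"field": "date", "op": "gte", "value": "2021-01-01"},
    ],
    order_by="date",
)
```

&lt;p&gt;Calling &lt;code&gt;build_endpoint(spec)&lt;/code&gt; would additionally need Node and the helper script on disk, so only the spec construction runs standalone here.&lt;/p&gt;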

&lt;p&gt;The whole workflow runs from a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; wc2026_predictor run-all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, that flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;discovers the World Cup league id&lt;/li&gt;
&lt;li&gt;downloads seasons and historical World Cup matches&lt;/li&gt;
&lt;li&gt;pulls recent national-team matches&lt;/li&gt;
&lt;li&gt;trains the models&lt;/li&gt;
&lt;li&gt;predicts future World Cup fixtures&lt;/li&gt;
&lt;li&gt;writes reports and CSV/JSON outputs&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How the model works
&lt;/h2&gt;

&lt;p&gt;This is not a single-model predictor.&lt;/p&gt;

&lt;p&gt;The repo uses a layered approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Elo-style team strength updates&lt;/li&gt;
&lt;li&gt;rolling form features from recent matches&lt;/li&gt;
&lt;li&gt;a &lt;code&gt;RandomForestClassifier&lt;/code&gt; for match outcome classification&lt;/li&gt;
&lt;li&gt;two &lt;code&gt;PoissonRegressor&lt;/code&gt; models for expected home and away goals&lt;/li&gt;
&lt;li&gt;a hybrid ensemble that blends ML probabilities, Elo expectation, and Poisson-derived outcome probabilities&lt;/li&gt;
&lt;/ul&gt;
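
&lt;p&gt;To make two of those layers concrete, here is a rough sketch of an Elo-style update and of turning two expected-goal rates into home/draw/away probabilities. It is not the code in &lt;code&gt;modeling.py&lt;/code&gt;: the K-factor of 20, the conventional 400-point Elo scale, the independence of home and away goals, and the 10-goal cutoff are assumptions chosen for illustration.&lt;/p&gt;

```python
import math


def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score for team A under the conventional Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return both new ratings; score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def poisson_pmf(goals: int, rate: float) -> float:
    """Probability of exactly `goals` goals given an expected-goal rate."""
    return math.exp(-rate) * rate ** goals / math.factorial(goals)


def outcome_probs(home_rate: float, away_rate: float, max_goals: int = 10) -> dict:
    """Home/draw/away probabilities from two independent Poisson goal rates."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalize because the score grid is truncated
    return {"home": home / total, "draw": draw / total, "away": away / total}
```

&lt;p&gt;In this sketch the two rates would come from the home and away goal regressors, and the Elo ratings would be updated match by match in chronological order.&lt;/p&gt;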

&lt;p&gt;I used this design because football is noisy. A pure classifier can miss score dynamics, and a pure Poisson model can miss richer recent-form patterns. Combining them gives a more balanced prediction layer.&lt;/p&gt;
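
&lt;p&gt;One simple way to combine the three signals is a weighted average of their outcome distributions. The sketch below assumes each source has already been expressed as home/draw/away probabilities (the Elo expectation would need a draw model for that), and the weights and numbers are made up for illustration rather than taken from the repo:&lt;/p&gt;

```python
def blend(sources: dict, weights: dict) -> dict:
    """Weighted average of outcome distributions, renormalized to sum to 1."""
    outcomes = ("home", "draw", "away")
    total_weight = sum(weights[name] for name in sources)
    mixed = {
        o: sum(weights[name] * probs[o] for name, probs in sources.items()) / total_weight
        for o in outcomes
    }
    norm = sum(mixed.values())  # guards against inputs that do not sum exactly to 1
    return {o: p / norm for o, p in mixed.items()}


# Illustrative inputs only: neither the probabilities nor the weights come from the repo.
example = blend(
    {
        "ml": {"home": 0.50, "draw": 0.25, "away": 0.25},
        "elo": {"home": 0.40, "draw": 0.30, "away": 0.30},
        "poisson": {"home": 0.45, "draw": 0.27, "away": 0.28},
    },
    weights={"ml": 0.5, "elo": 0.2, "poisson": 0.3},
)
```

&lt;p&gt;Keeping the weights explicit makes it easy to check how much each component moves the final probabilities.&lt;/p&gt;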

&lt;h2&gt;
  
  
  What the pipeline outputs
&lt;/h2&gt;

&lt;p&gt;Each run produces artifacts that are useful both for people and for downstream systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;predictions/latest_match_predictions.csv&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;predictions/latest_match_predictions.json&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;predictions/title_odds.csv&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;predictions/title_odds.json&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;predictions/report.md&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The match predictions include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;predicted result&lt;/li&gt;
&lt;li&gt;home/draw/away probabilities&lt;/li&gt;
&lt;li&gt;expected goals&lt;/li&gt;
&lt;li&gt;most likely scoreline&lt;/li&gt;
&lt;/ul&gt;
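
&lt;p&gt;As an illustration of how those fields can fit together, the sketch below derives the most likely scoreline from two expected-goal values and assembles one prediction record. The record keys, the placeholder team names, and the independent-Poisson assumption are hypothetical; the actual column names in the CSV and JSON outputs may differ.&lt;/p&gt;

```python
import math


def poisson_pmf(goals: int, rate: float) -> float:
    """Probability of exactly `goals` goals given an expected-goal rate."""
    return math.exp(-rate) * rate ** goals / math.factorial(goals)


def most_likely_scoreline(home_xg: float, away_xg: float, max_goals: int = 8):
    """Scoreline with the highest joint probability, assuming independent Poisson goals."""
    grid = [(h, a) for h in range(max_goals + 1) for a in range(max_goals + 1)]
    return max(grid, key=lambda s: poisson_pmf(s[0], home_xg) * poisson_pmf(s[1], away_xg))


def prediction_record(home: str, away: str, probs: dict, home_xg: float, away_xg: float) -> dict:
    """Assemble one output row (hypothetical field names)."""
    h, a = most_likely_scoreline(home_xg, away_xg)
    return {
        "home_team": home,
        "away_team": away,
        "predicted_result": max(probs, key=probs.get),
        "prob_home": probs["home"],
        "prob_draw": probs["draw"],
        "prob_away": probs["away"],
        "expected_home_goals": home_xg,
        "expected_away_goals": away_xg,
        "most_likely_score": f"{h}-{a}",
    }


record = prediction_record(
    "Team A", "Team B",
    {"home": 0.47, "draw": 0.27, "away": 0.26},
    home_xg=1.6, away_xg=1.1,
)
```

&lt;p&gt;Note that the predicted result and the most likely scoreline can disagree: here the home side is favored overall while the single most probable score is a 1-1 draw.&lt;/p&gt;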

&lt;p&gt;The title odds are intentionally labeled as provisional when the full tournament structure is not yet available from the API.&lt;/p&gt;

&lt;h2&gt;
  
  
  A small detail that matters: automation
&lt;/h2&gt;

&lt;p&gt;I also wired the repo to run through GitHub Actions on a schedule and by manual dispatch.&lt;/p&gt;

&lt;p&gt;The workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;installs Python and Node&lt;/li&gt;
&lt;li&gt;installs dependencies&lt;/li&gt;
&lt;li&gt;runs the full prediction pipeline&lt;/li&gt;
&lt;li&gt;commits refreshed outputs back into the repository&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That turns the project from "interesting code" into something that can act like a living forecast feed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;world-cup-2026-prediction-sportmicro-api/
├─ .github/workflows/automation.yml
├─ scripts/build_endpoint.mjs
├─ scripts/run_pipeline.ps1
├─ src/wc2026_predictor/
│  ├─ cli.py
│  ├─ endpoint_builder.py
│  ├─ features.py
│  ├─ modeling.py
│  ├─ pipeline.py
│  ├─ reporting.py
│  └─ sportmicro.py
├─ data/
├─ artifacts/
└─ predictions/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I tried to keep the boundaries clean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sportmicro.py&lt;/code&gt; handles API access and normalization&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;features.py&lt;/code&gt; builds the training and fixture frames&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;modeling.py&lt;/code&gt; owns training and prediction logic&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pipeline.py&lt;/code&gt; orchestrates the end-to-end flow&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reporting.py&lt;/code&gt; turns outputs into a readable Markdown report&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Running it locally
&lt;/h2&gt;

&lt;p&gt;Setup is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install
&lt;/span&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; pip
python &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; ".[dev]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create &lt;code&gt;.env&lt;/code&gt; from &lt;code&gt;.env.example&lt;/code&gt; and set your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SPORTMICRO_API_KEY=your_sportmicro_api_key_here
SPORTMICRO_BASE_URL=https://football.sportmicro.com
WC2026_PREDICTION_START_DATE=2026-01-01
WC2026_RECENT_MATCH_START_DATE=2021-01-01
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
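
&lt;p&gt;For reference, a small sketch of how these variables could be read on the Python side. It assumes the values are already present in the process environment (for example, exported by a dotenv loader) and treats the example values above as fallbacks; the &lt;code&gt;Settings&lt;/code&gt; class and &lt;code&gt;load_settings&lt;/code&gt; function are hypothetical names, not part of the repo.&lt;/p&gt;

```python
import os
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    prediction_start: date
    recent_match_start: date


def load_settings(env=None) -> Settings:
    """Read pipeline settings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    api_key = env.get("SPORTMICRO_API_KEY", "")
    if not api_key:
        # Fail early rather than sending unauthenticated requests.
        raise RuntimeError("SPORTMICRO_API_KEY is not set")
    return Settings(
        api_key=api_key,
        base_url=env.get("SPORTMICRO_BASE_URL", "https://football.sportmicro.com").rstrip("/"),
        prediction_start=date.fromisoformat(env.get("WC2026_PREDICTION_START_DATE", "2026-01-01")),
        recent_match_start=date.fromisoformat(env.get("WC2026_RECENT_MATCH_START_DATE", "2021-01-01")),
    )
```

&lt;p&gt;Parsing the dates up front means a malformed value fails at startup instead of halfway through a run.&lt;/p&gt;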



&lt;p&gt;Run the full workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; wc2026_predictor run-all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the PowerShell wrapper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;/scripts/run_pipeline.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I would improve next
&lt;/h2&gt;

&lt;p&gt;There is still plenty of room to push this further:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add probability calibration analysis&lt;/li&gt;
&lt;li&gt;bring in more match-statistics features&lt;/li&gt;
&lt;li&gt;simulate bracket progression once the full 2026 tournament structure is available&lt;/li&gt;
&lt;li&gt;publish a small dashboard on top of the generated outputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are the kinds of upgrades that would move the project from "useful forecasting repo" to "full sports analytics product."&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;The part I care about most in projects like this is not just prediction accuracy. It is whether the system is structured well enough to rerun, inspect, automate, and extend.&lt;/p&gt;

&lt;p&gt;That was the goal here: build a World Cup 2026 prediction repo that is practical, composable, and easy to evolve.&lt;/p&gt;

&lt;p&gt;If you are building in sports analytics, APIs, or ML pipelines, I think this pattern is more useful than another isolated notebook.&lt;/p&gt;

</description>
      <category>python</category>
      <category>api</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
