
🎮 (JumpNet) Part 3: Real-Time Inference — Watching JumpNet Come Alive

⬅️ Read Part 1: Building the Data Pipeline
⬅️ Read Part 2: Training the Model


🎬 Project Goal Recap

We started with screenshots and keypresses (Part 1), turned that into a labeled dataset, and trained a model to predict jumps and hold durations (Part 2). Now, in this final part, we’re putting that model into action — inside a real game.

The ultimate goal: Can an AI play a platformer using only one key?

To answer that, I built a simulation GUI, loaded the trained model, and observed how it performs frame by frame — all in real-time. It’s the moment where the code stops being code and becomes a decision-making agent, playing a game just like you or me.


🖼️ The Simulation GUI

At the heart of this system is a custom-built Python GUI using tkinter. Here's what it does (a minimal capture sketch follows the list):

  • Allows region snipping on the screen (fullscreen overlay)
  • Loads a PyTorch .pt model (JumpNet)
  • Captures screenshots from the defined region every N milliseconds
  • Sends each frame through the model
  • Simulates a spacebar press (with duration) if the model decides to jump
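
To make the capture cycle concrete, here's a minimal sketch of how the periodic grab-and-infer loop can be scheduled with tkinter's after() and mss. The region dict, interval, and names here are illustrative, not the project's exact GUI code:

# capture_loop_sketch.py — illustrative, not the project's exact GUI code
import tkinter as tk
import numpy as np
import mss

REGION = {"left": 100, "top": 200, "width": 480, "height": 270}  # set by the snipping overlay
INTERVAL_MS = 100                                                # capture every N milliseconds

root = tk.Tk()
root.title("JumpNet inference")
sct = mss.mss()

def tick():
    shot = sct.grab(REGION)            # BGRA screenshot of the game region
    frame = np.array(shot)[:, :, :3]   # keep the BGR channels only
    # ...this is where the frame goes through the model (see the inference loop below)
    root.after(INTERVAL_MS, tick)      # reschedule without blocking the GUI

root.after(INTERVAL_MS, tick)
root.mainloop()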

GUI screenshot

🎛️ Tuning matters. During early testing, I tweaked the threshold and interval parameters live during gameplay. This helped me find a sweet spot where the model acts quickly but not too aggressively. Every millisecond matters in a twitch-based game, and fine-tuning these values made the difference between clearing an obstacle and face-planting into it.
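
For reference, the live tuning can be as simple as two tkinter Scale widgets that the inference loop reads on every cycle. This is a sketch of the idea, not the project's actual control panel:

# tuning_sliders_sketch.py — illustrative live controls for threshold and interval
import tkinter as tk

root = tk.Tk()

threshold_slider = tk.Scale(root, from_=0.0, to=1.0, resolution=0.01,
                            orient="horizontal", label="jump threshold")
threshold_slider.set(0.5)
threshold_slider.pack(fill="x")

interval_slider = tk.Scale(root, from_=20, to=300, resolution=10,
                           orient="horizontal", label="interval (ms)")
interval_slider.set(100)
interval_slider.pack(fill="x")

# Inside the inference loop, read the current values each cycle:
#   threshold = threshold_slider.get()
#   interval_ms = int(interval_slider.get())

root.mainloop()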

This GUI isn't just an interface. It became the lab where experiments were run, logs were collected, and breakthrough moments were celebrated.


🧠 Real-Time Inference Logic

The main loop of inference works like this:

  1. Grab current frame using mss
  2. Preprocess it (resize, normalize)
  3. Run model → get jump_prob, hold_duration
  4. Decide if jump_prob > threshold
  5. If jump, simulate a spacebar press for hold_duration
  6. Wait for interval, repeat

# main.py (simplified loop)
frame = get_screen_region()                 # mss grab of the snipped region
img_tensor = transform(frame).unsqueeze(0)  # resize + normalize, add batch dim

with torch.no_grad():                       # inference only, no gradient tracking
    jump_pred, hold_pred = model(img_tensor)

if jump_pred.item() > threshold:            # classification head: should we jump?
    press_space_for(hold_pred.item())       # regression head: how long to hold
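
The press_space_for helper is the bridge between the model and the game. The post doesn't pin down which input library the project uses, so here's one plausible implementation, assuming pynput:

# keypress_sketch.py — one way press_space_for could be implemented (pynput assumed)
import time
from pynput.keyboard import Controller, Key

keyboard = Controller()

def press_space_for(duration_s: float) -> None:
    """Hold the spacebar for the predicted duration, then release it."""
    keyboard.press(Key.space)
    time.sleep(max(duration_s, 0.0))   # guard against negative regression outputs
    keyboard.release(Key.space)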

Each cycle of this loop takes around 50ms to complete on CPU, which is more than fast enough for games with reaction windows above 200ms. However, even in this modest latency range, consistency is key.
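
If you want to check that figure on your own hardware, one rough way is to wrap a single capture-plus-inference cycle in time.perf_counter, reusing the same helpers as the loop above:

# latency_check_sketch.py — rough per-cycle timing, reusing the helpers above
import time
import torch

t0 = time.perf_counter()
frame = get_screen_region()
img_tensor = transform(frame).unsqueeze(0)
with torch.no_grad():
    jump_pred, hold_pred = model(img_tensor)
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"cycle latency: {elapsed_ms:.1f} ms")   # expect roughly 50 ms on CPU here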


📽️ Demo Video

Want to see it in action? Here's JumpNet controlling the game in real-time:

🎬 Watch the YouTube Demo

In this demo, you can see how JumpNet reacts to incoming platforms. The most satisfying moment? Watching the AI make a perfect sequence of jumps, just as I would.


🦾 Logging & Debugging

Every decision JumpNet makes is logged (see the sketch after this list):

  • Frame Index
  • Jump Prediction
  • Hold Duration
  • Actual Keypress Time
  • Whether action was taken or skipped
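
Concretely, each decision can be appended to a simple CSV with those fields. The exact format in the repo may differ; this is a minimal sketch of the idea:

# decision_logger_sketch.py — minimal CSV logger for per-frame decisions
import csv
import time

LOG_FIELDS = ["frame_idx", "jump_prob", "hold_duration", "keypress_time", "action_taken"]

def open_log(path="inference_log.csv"):
    f = open(path, "w", newline="")
    writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
    writer.writeheader()
    return f, writer

def log_decision(writer, frame_idx, jump_prob, hold_duration, action_taken):
    writer.writerow({
        "frame_idx": frame_idx,
        "jump_prob": round(jump_prob, 4),
        "hold_duration": round(hold_duration, 3),
        "keypress_time": time.time(),    # wall-clock timestamp of the (attempted) keypress
        "action_taken": action_taken,    # True if the space press was actually simulated
    })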

Screenshots of the log window

This made debugging a breeze and also revealed when things went wrong — like:

  • Duplicate keypresses from overlapping inference intervals
  • Tiny fluctuations in jump_prob near the threshold
  • Holding the key too long, missing landing platforms

I also found subtle bugs in my simulation logic by comparing logs to screen recordings. For example, one issue where two overlapping frames caused a second jump to interrupt a successful one was only discovered by analyzing frame-level timestamps in the logs.


⚠️ What Went Wrong (and Why That’s Fine)

Let’s be honest — even a “perfect” model on paper can stumble in reality.

I ran into these real-world issues:

  • Lag spikes during frame capture caused the game to desync with keypresses.
  • Jump timing was too early or too late because of interval jitter.
  • Hold duration drift on certain platforms, leading to overshooting.

All of these point to a single fact:

🎯 Inference isn't just about accuracy. It's about timing, consistency, and real-world sync.

These problems echo what we saw in Part 2’s loss spikes — especially in regression. Slight instabilities in prediction had amplified effects when deployed. Even a 0.1-second deviation in hold time made the difference between success and failure.

To mitigate some of these, I added the following safeguards (sketched after the list):

  • Minimum hold threshold (to prevent 0.05s "micro-jumps")
  • Cooldown window after jump to avoid overlaps
  • Optional manual override to retune threshold mid-run
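
Here's a minimal sketch of how the first two safeguards slot into the decision step. The constants are illustrative, and press_space_for is the same keypress helper used in the main loop:

# safeguards_sketch.py — minimum hold + cooldown around the jump decision
import time

MIN_HOLD_S = 0.10      # ignore "micro-jumps" shorter than this (illustrative value)
COOLDOWN_S = 0.30      # no new jump while the previous one may still be in flight
last_jump_at = 0.0

def maybe_jump(jump_prob: float, hold_duration: float, threshold: float) -> bool:
    """Apply the safeguards, then jump if the model says so. Returns True if a jump fired."""
    global last_jump_at
    now = time.time()
    if jump_prob <= threshold:
        return False                                  # model says: don't jump
    if now - last_jump_at < COOLDOWN_S:
        return False                                  # still cooling down, skip to avoid overlaps
    press_space_for(max(hold_duration, MIN_HOLD_S))   # enforce the minimum hold
    last_jump_at = now
    return True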

None of these solutions were perfect, but together they stabilized performance and turned the agent into something almost reliable.


🧰 Conclusion — A Playable Baseline

Despite challenges, the agent successfully passed the first few obstacles using its own vision and learned jump strategy. It wasn’t perfect — but it was autonomous. That alone is a massive milestone.

Watching an AI play a level you know intimately is like seeing your reflection behave independently. It’s eerie, thrilling, and absolutely rewarding.

This completes the initial trilogy:

  • 🏗️ Part 1: Built the dataset
  • 🧠 Part 2: Trained the model
  • 🎮 Part 3: Watched it play

And the best part? Every piece of this pipeline is modular and extensible. I can swap the model, change the input resolution, or even hook in another task entirely (like shooting or ducking).

This isn’t just a demo. It’s a real, scalable testbed for interactive vision-based agents.


🔁 Try the Full Pipeline Yourself

Curious to build your own AI agent from scratch and watch it play a game in real-time? Everything you need is available and documented:

  • 🧱 Data Collection Tool
  • 🧠 Model Training Scripts
  • 🖥️ Real-Time Inference GUI
  • 🗃️ Logs and Dataset Viewers

Start from Part 1 and follow the journey — or dive straight in using the code:

🔗 GitHub: Full Data & Inference Toolkit

You’ll be surprised how far one key and one neural network can take you.

🔗 GitHub: JumpNet Project Repository


"When your model finally makes a decision on its own — and it works — it's like watching a baby take its first step."

⬅️ Back to Part 1
⬅️ Back to Part 2

