
Ertugrul


Modular Snip Recorder: A Data Collection Tool for Behavior Cloning (1/2)

📊 Project Origins: Why I Built This Tool

Okay, so I’ve always been super curious about how machines can learn just by watching us, and more specifically, how they can mimic our actions. That’s what Behavior Cloning is all about. But then I hit a wall: I needed data. Not just any data, but frame-accurate screen captures, with keypresses synced like a metronome.

So, like any over-caffeinated student with an idea, I built my own tool to collect that data!

"There is no intelligent model without intelligent data."


👨‍💻 Architecture Overview: Modular and Scalable Design

This started as a messy Python script. Then it turned into something modular, clean, and — dare I say — fun to use.

1. User Interface (GUI)

Made with good ol’ Tkinter, the interface is beginner-friendly but surprisingly powerful:

  • ✉️ Key Configuration: Pick which keys you want to track. It’s flexible!
  • 🔍 Monitor Selection: Multi-monitor support? Check.
  • ✂️ Snip Region: Drag and drop to select a specific screen area. No more cropping.
  • ▶️ Live Preview: Real-time visuals of what’s being recorded.
  • Stop & Save: One click, and you get a structured .npz and synced .mp4.

[Image: main window]

2. Overlay Region Selector

This might be my favorite part. The overlay lets you physically draw the recording area on screen. It’s intuitive and feels kind of magical the first time you use it.

# Example region output: (left, top, width, height)
region = (600, 200, 227, 227)
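
To show where a tuple like this can come from, here is a minimal sketch of turning a mouse drag into a region. The function name `region_from_drag` and the idea of passing in the press and release coordinates are my assumptions for illustration, not the tool's actual code.

```python
def region_from_drag(x0, y0, x1, y1):
    """Convert a drag from (x0, y0) to (x1, y1) into (left, top, width, height).

    Hypothetical helper: the drag may go in any direction, so the top-left
    corner is the minimum of each coordinate pair.
    """
    left, top = min(x0, x1), min(y0, y1)
    width, height = abs(x1 - x0), abs(y1 - y0)
    return (left, top, width, height)


# Dragging from bottom-right to top-left gives the same region
# as dragging the other way.
region = region_from_drag(827, 427, 600, 200)
```

Normalizing the corners this way means the overlay doesn't have to care which direction the user drags in.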

3. Screen Capture + Key Tracking

Here’s where the magic happens under the hood:

  • Every time you press or release a key, the screen is captured.
  • A snapshot is taken with metadata that includes the key status, press durations, and frame index.
  • And yes, it’s real-time. I still smile when I see how precise the timing is.
(
 image,              # screenshot (float32 RGB)
 prev_interval,      # time since last release
 multi_hot,          # binary vector of active keys
 hold_durations,     # how long each key has been held
 "press",            # event type: "press" or "release"
 current_video_frame # corresponding video frame index
)
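
As a rough illustration of the `multi_hot` and `hold_durations` fields, here is a sketch that derives both from a dictionary of press timestamps. The helper `encode_key_state` and the `pressed_at` dictionary are hypothetical names; the real tool may track key state differently.

```python
def encode_key_state(tracked_keys, pressed_at, now):
    """Build the multi-hot vector and hold durations for one sample.

    tracked_keys: ordered list of key names being recorded
    pressed_at:   dict mapping currently held keys to their press timestamp
    now:          timestamp of the current event, same clock as pressed_at
    (All three are assumptions made for this sketch.)
    """
    multi_hot = [1 if key in pressed_at else 0 for key in tracked_keys]
    hold_durations = [
        now - pressed_at[key] if key in pressed_at else 0.0
        for key in tracked_keys
    ]
    return multi_hot, hold_durations


# "w" held since t=10.0 and "d" since t=11.5, sampled at t=12.0
multi_hot, hold_durations = encode_key_state(
    ["w", "a", "s", "d"], {"w": 10.0, "d": 11.5}, now=12.0
)
```

Keeping the vector in the same order as the tracked key list is what lets the `keys` array in the dataset act as a legend for every sample.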

4. Live Video Recording

Oh, and did I mention it records video too? Every frame of the region you selected is saved as .mp4, perfectly synced with the .npz entries.

  • This is so helpful for debugging.
  • And honestly, just watching the dataset back looks cool.

[Image: main window with live preview]

5. NPZ Dataset Format

In the end, everything gets saved into a .npz file:

  • data: All the keyframe examples
  • keys: Which keys were being tracked

Loading it back is a breeze with NumPy:

import numpy as np

loaded = np.load("collected/a.npz", allow_pickle=True)  # samples are Python objects, so pickle is needed
data = loaded["data"]
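
If you want to see the whole round trip, here is a small self-contained sketch that writes a tiny synthetic dataset and unpacks it again. Packing the sample tuples into a 1-D object array is my assumption about a workable layout, and `make_sample`, `save_samples`, and `load_samples` are hypothetical helpers, not functions from the repository.

```python
import os
import tempfile

import numpy as np


def make_sample(frame_index, event):
    """Create a fake sample with the same six fields as the tuple above."""
    image = np.zeros((4, 4, 3), dtype=np.float32)    # stand-in screenshot
    multi_hot = np.array([1, 0], dtype=np.uint8)     # first key held
    hold = np.array([0.25, 0.0], dtype=np.float32)   # seconds held per key
    return (image, 0.1, multi_hot, hold, event, frame_index)


def save_samples(path, samples, keys):
    """Store tuples in a 1-D object array so NumPy keeps each one intact."""
    data = np.empty(len(samples), dtype=object)
    for i, sample in enumerate(samples):
        data[i] = sample
    np.savez(path, data=data, keys=np.array(keys))


def load_samples(path):
    """Read the samples and key names back; object arrays need allow_pickle."""
    with np.load(path, allow_pickle=True) as loaded:
        return list(loaded["data"]), [str(k) for k in loaded["keys"]]


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.npz")
    save_samples(demo_path, [make_sample(0, "press"), make_sample(3, "release")], ["w", "a"])
    samples, keys = load_samples(demo_path)

image, prev_interval, multi_hot, hold, event, frame_idx = samples[1]
```

One caveat worth knowing: `allow_pickle=True` runs the pickle loader, so only load `.npz` files you trust.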

🔄 Why Not Just Use Screen Recorders?

Honestly, I asked myself this before even starting.

“Unlike traditional screen recorders, this tool captures data in a machine-readable format synchronized with keyboard events.”

OBS is great. But it doesn’t give you:

  • ❌ Frame-accurate key labels
  • ❌ Key hold durations
  • ❌ Structured datasets
  • ❌ Replayable keypress metadata

This tool does. And that’s why it was worth building. I wanted to collect data I could actually train a model on.


💥 Bonus: How I Failed a Few Times Before It Worked

This whole thing didn’t just work out perfectly from the start. Oh no — there were bugs, dead-ends, and a lot of late-night debugging with ice-cold tea on my desk.

📦 The .npy Struggles

At first, I tried saving everything as .npy. It was quick and simple — just np.save() the array and call it a day. But soon, I realized I needed to store more than just a flat array:

  • Each sample was a tuple: image, timing, key vector, durations, etc.
  • .npy didn’t support storing multiple arrays in one file cleanly.
  • I kept overwriting or misloading my data — and let’s not even talk about corrupted writes after crashes.

Then I discovered .npz. Game changer! 🎉

It let me store both the data array and a keys array in a single file, each under its own name. (np.savez itself doesn’t compress; np.savez_compressed does the same thing with compression.) Suddenly everything made sense:

np.savez("dataset.npz", data=collected_data, keys=key_list)

Simple. Organized. And easy to inspect or load later.

🎬 The Video Sync Nightmare

Integrating video recording was a beast. I wanted every frame captured in the preview to also be saved as a .mp4, and synchronized with the data entries.

But...

  • OpenCV’s VideoWriter was picky about frame dimensions and codecs.
  • My previews were RGB, but OpenCV wanted BGR.
  • Frames dropped and the app crashed if I wrote too fast or too early.

It took hours of testing frame conversions, debugging off-by-one errors in frame count, and fixing timestamp mismatches. But eventually — finally — the video recorder synced perfectly with my data stream.
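
The RGB/BGR mismatch is easy to sketch with plain NumPy. This is a minimal example, assuming the captured frame is float32 RGB with values in the 0 to 255 range (the post only says "float32 RGB", so the range is my assumption); `to_bgr_uint8` is a hypothetical helper name.

```python
import numpy as np


def to_bgr_uint8(frame_rgb, value_range=255.0):
    """Convert an RGB frame to the uint8 BGR layout OpenCV writers expect.

    value_range is the maximum value of the input (assumed 255.0 here;
    pass 1.0 if the frames are normalized to 0..1).
    """
    scaled = np.asarray(frame_rgb, dtype=np.float32) * (255.0 / value_range)
    frame_u8 = np.clip(scaled, 0, 255).astype(np.uint8)
    # Reverse the channel axis (RGB -> BGR) and make the result contiguous,
    # since a reversed view has negative strides.
    return np.ascontiguousarray(frame_u8[..., ::-1])


# A pure red RGB frame ends up with its value in the last (R) slot of BGR.
red_rgb = np.zeros((2, 2, 3), dtype=np.float32)
red_rgb[..., 0] = 255.0
red_bgr = to_bgr_uint8(red_rgb)
```

The converted frame can then be handed to the video writer, as long as its height and width match the size the writer was created with.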

When I got the first synced .npz and .mp4 pair, I literally said out loud: “It works!” 😄


📎 Appendix: How It Works Internally

For those of you who want to go a little deeper — here are some behind-the-scenes details that didn’t fit naturally into the main story but are super important for understanding how everything holds together:

🔁 Real-Time Preview Loop

In Data_scrap_tool.py, a background thread runs self._preview_loop, which continuously:

  • Grabs the selected screen region
  • Checks which keys are currently pressed
  • Draws key overlays using get_preview_image
  • Updates the GUI and writes the frame to the video stream

This loop runs at ~30 FPS and ensures both the live preview and .mp4 file stay in sync with the key state.
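
For a feel of how a loop can hold a steady rate, here is a generic fixed-rate loop sketch. It is not the code from `_preview_loop`; `run_fixed_rate`, `step`, and `should_stop` are hypothetical names, and `step` stands in for "grab, draw, update, write".

```python
import time


def run_fixed_rate(step, fps=30.0, max_frames=None, should_stop=lambda: False):
    """Call step(frame_index) roughly `fps` times per second.

    Returns the number of frames processed. Deadlines are scheduled from the
    previous deadline, not from "now", so small delays don't accumulate.
    """
    period = 1.0 / fps
    next_deadline = time.perf_counter()
    frame_index = 0
    while not should_stop() and (max_frames is None or frame_index < max_frames):
        step(frame_index)
        frame_index += 1
        next_deadline += period
        delay = next_deadline - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        else:
            # Running behind: reset the deadline instead of bursting to catch up.
            next_deadline = time.perf_counter()
    return frame_index
```

Because the same call to `step` would feed both the preview and the video file, the frame index it receives is a natural candidate for the `current_video_frame` value stored with each sample.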

🔄 Keyboard Listener Lifecycle

The data_collector.py module runs a global key listener using pynput. But here’s the trick:

  • When you change the tracked keys from the GUI, the listener stops and restarts with the new configuration.
  • This is handled inside open_key_settings() in RecorderGUI.

This ensures that your new key list applies immediately without restarting the app.
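
The stop-and-restart idea can be sketched without pynput at all. In the sketch below, `ListenerManager` and `FakeListener` are hypothetical; in the real app the factory would presumably build a `pynput.keyboard.Listener`, which also exposes `start()` and `stop()` and cannot be started again once stopped, so a fresh one is created each time.

```python
class FakeListener:
    """Stand-in for a keyboard listener so this sketch runs without pynput."""

    def __init__(self, tracked_keys):
        self.tracked_keys = tracked_keys
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class ListenerManager:
    """Owns the current listener and swaps it when the key list changes."""

    def __init__(self, listener_factory):
        self._factory = listener_factory  # callable: tracked_keys -> listener
        self._listener = None

    def restart(self, tracked_keys):
        # Stop the old listener first so two listeners never run at once.
        if self._listener is not None:
            self._listener.stop()
        self._listener = self._factory(tuple(tracked_keys))
        self._listener.start()
        return self._listener


manager = ListenerManager(FakeListener)
first = manager.restart(["w", "a"])
second = manager.restart(["w", "a", "space"])
```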

🛑 Graceful Stop & Recovery

When you stop recording:

  • The .npz file is saved using save_dataset_npz()
  • The temporary .mp4 file is renamed and synced
  • Edge cases like name collisions or IO failures are handled using os.replace() wrapped in a try/except block

This keeps a name collision or a failed file operation from taking the whole save down with it.
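
Here is a minimal sketch of the rename step, assuming the recording is first written to a temporary path. `finalize_file` is a hypothetical helper, not the repository's function; the only real API used is `os.replace`, which silently overwrites an existing destination file.

```python
import os


def finalize_file(tmp_path, final_path):
    """Move tmp_path to final_path and return where the data ended up.

    If the move fails (missing file, permissions, locked file), the temporary
    file is left in place and its path is returned so nothing is lost.
    """
    try:
        os.replace(tmp_path, final_path)  # overwrites final_path if it exists
        return final_path
    except OSError as exc:
        print(f"Could not move {tmp_path} -> {final_path}: {exc}")
        return tmp_path
```

Returning the path instead of raising lets the caller tell the user where the recording actually lives, even after a failure.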

🧵 Threading and GUI Safety

Tkinter is single-threaded — but preview and capture operations need to run in the background. That’s why:

  • The preview loop runs in a daemon=True thread
  • The preview keeps displaying because a reference to the current image (self.preview_imgtk) is retained; without it, Python would garbage-collect the image and the widget would go blank
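
One common way to keep background work and a single-threaded GUI apart is a queue handoff. This is a generic pattern rather than what the tool does: the worker thread only puts frames into a queue, and the GUI thread would poll it (in Tkinter, typically via `root.after`) and touch widgets itself. `producer`, `drain_latest`, and `grab_frame` are hypothetical names.

```python
import queue
import threading


def producer(frame_queue, stop_event, grab_frame, interval=0.01):
    """Worker-thread loop: capture frames and hand them to the GUI thread."""
    while not stop_event.is_set():
        frame = grab_frame()
        try:
            frame_queue.put_nowait(frame)
        except queue.Full:
            pass  # GUI is behind; drop this frame instead of blocking
        stop_event.wait(interval)  # sleeps, but wakes early on stop


def drain_latest(frame_queue):
    """GUI-thread side: return the newest queued frame, or None if empty."""
    latest = None
    while True:
        try:
            latest = frame_queue.get_nowait()
        except queue.Empty:
            return latest


def start_worker(frame_queue, stop_event, grab_frame):
    """Start the producer in a daemon thread so it never blocks app exit."""
    worker = threading.Thread(
        target=producer, args=(frame_queue, stop_event, grab_frame), daemon=True
    )
    worker.start()
    return worker
```

In a Tkinter app, a function scheduled with `root.after` would call `drain_latest`, update the label image, store the reference, and reschedule itself.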

These are the sorts of little things that took hours to get right, but they make a huge difference in stability and usability. If you're planning to extend or modify this project, these internals are where the real action is!


🧪 Try It Yourself on GitHub

If you're curious to experiment with the tool or even contribute to it, you're absolutely welcome!

🔗 GitHub Repository: https://github.com/Ertugrulmutlu/-Data-Scrap-Tool-Advanced-Dataset-Viewer

You’ll find:

  • All the source code with clear module separation
  • Instructions to run it locally (just Python + pip install)
  • Bonus scripts for inspecting .npz files and previewing key sequences

Feel free to fork, star ⭐, or open issues if something breaks. I’d love to hear how others use or build on top of this.


🎥 Watch it in Action

