📊 Project Origins: Why I Built This Tool
Okay, so I’ve always been super curious about how machines can learn just by watching us, and more specifically, how they can mimic our actions. That’s what Behavior Cloning is all about. But then I hit a wall: I needed data. Not just any data, but frame-perfect screen captures, with keypresses synced like a metronome.
So, like any over-caffeinated student with an idea, I built my own tool to collect that data!
"There is no intelligent model without intelligent data."
👨‍💻 Architecture Overview: Modular and Scalable Design
This started as a messy Python script. Then it turned into something modular, clean, and — dare I say — fun to use.
1. User Interface (GUI)
Made with good ol’ Tkinter, the interface is beginner-friendly but surprisingly powerful:
- ✉️ Key Configuration: Pick which keys you want to track. It’s flexible!
- 🔍 Monitor Selection: Multi-monitor support? Check.
- ✂️ Snip Region: Drag and drop to select a specific screen area. No more cropping.
- ▶️ Live Preview: Real-time visuals of what’s being recorded.
- ⏹ Stop & Save: One click, and you get a structured .npz and a synced .mp4.
2. Overlay Region Selector
This might be my favorite part. The overlay lets you physically draw the recording area on screen. It’s intuitive and feels kind of magical the first time you use it.
# Example region output: (left, top, width, height)
region = (600, 200, 227, 227)
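Here is a minimal sketch of how two drag corners could become that tuple. The function name drag_to_region is hypothetical and not taken from the tool; the only thing borrowed from the post is the (left, top, width, height) layout.

```python
def drag_to_region(x0, y0, x1, y1):
    """Convert two drag corners into (left, top, width, height).

    Works regardless of drag direction, because the top-left corner is
    taken as the minimum of each coordinate pair.
    """
    left, top = min(x0, x1), min(y0, y1)
    width, height = abs(x1 - x0), abs(y1 - y0)
    return (left, top, width, height)


# Dragging from bottom-right to top-left still yields a valid region.
region = drag_to_region(827, 427, 600, 200)
print(region)  # (600, 200, 227, 227)
```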
3. Screen Capture + Key Tracking
Here’s where the magic happens under the hood:
- Every time you press or release a key, the screen is captured.
- A snapshot is taken with metadata that includes the key status, press durations, and frame index.
- And yes, it’s real-time. I still smile when I see how precise the timing is.
(
image, # screenshot (float32 RGB)
prev_interval, # time since last release
multi_hot, # binary vector of active keys
hold_durations, # how long each key has been held
"press", # event type: "press" or "release"
current_video_frame # corresponding video frame index
)
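To make the multi_hot and hold_durations fields concrete, here is a small sketch of how they could be derived from the current key state. The helper encode_key_state and the pressed_at dictionary are assumptions for illustration; the tool's own bookkeeping may look different.

```python
import time

import numpy as np


def encode_key_state(tracked_keys, pressed_at, now=None):
    """Build (multi_hot, hold_durations) for the tracked keys.

    tracked_keys: ordered list of key names, e.g. ["w", "a", "s", "d"]
    pressed_at:   dict mapping currently held keys to their press timestamp
    now:          current timestamp (defaults to time.perf_counter())
    """
    now = time.perf_counter() if now is None else now
    multi_hot = np.zeros(len(tracked_keys), dtype=np.float32)
    hold_durations = np.zeros(len(tracked_keys), dtype=np.float32)
    for i, key in enumerate(tracked_keys):
        if key in pressed_at:
            multi_hot[i] = 1.0                          # key is active
            hold_durations[i] = now - pressed_at[key]   # seconds held so far
    return multi_hot, hold_durations


multi_hot, hold_durations = encode_key_state(
    ["w", "a", "s", "d"], {"w": 10.0, "d": 10.5}, now=11.0
)
```

Keeping the vector order tied to the tracked key list is what lets the keys array in the dataset explain each position later.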
4. Live Video Recording
Oh, and did I mention it records video too? Every frame of the region you selected is saved to an .mp4, perfectly synced with the .npz entries.
- This is so helpful for debugging.
- And honestly, just watching the dataset back looks cool.
5. NPZ Dataset Format
In the end, everything gets saved into a .npz file:
- data: All the keyframe examples
- keys: Which keys were being tracked
Loading it back is a breeze with NumPy:
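import numpy as np
# allow_pickle=True is required because the samples are stored as Python objects
loaded = np.load("collected/a.npz", allow_pickle=True)
data = loaded["data"]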
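Once data is loaded, each entry unpacks into the six fields shown earlier. The sketch below counts press and release events on a tiny synthetic array standing in for loaded["data"]; the helper count_events is hypothetical. One caution that applies to any pickled file: only use allow_pickle=True on datasets you trust, since unpickling can run arbitrary code.

```python
import numpy as np


def count_events(data):
    """Count 'press' and 'release' samples in an array of sample tuples."""
    counts = {"press": 0, "release": 0}
    for sample in data:
        # Field order follows the tuple layout described in the post.
        image, prev_interval, multi_hot, hold_durations, event, frame_idx = sample
        counts[event] += 1
    return counts


# Synthetic stand-in for loaded["data"]: one tuple per slot of an object array.
blank = np.zeros((2, 2, 3), dtype=np.float32)
data = np.empty(3, dtype=object)
data[0] = (blank, 0.0, np.array([1, 0]), np.array([0.0, 0.0]), "press", 0)
data[1] = (blank, 0.0, np.array([0, 0]), np.array([0.4, 0.0]), "release", 12)
data[2] = (blank, 0.3, np.array([0, 1]), np.array([0.0, 0.0]), "press", 21)

counts = count_events(data)
print(counts)
```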
🔄 Why Not Just Use Screen Recorders?
Honestly, I asked myself this before even starting.
“Unlike traditional screen recorders, this tool captures data in a machine-readable format synchronized with keyboard events.”
OBS is great. But it doesn’t give you:
- ❌ Frame-accurate key labels
- ❌ Key hold durations
- ❌ Structured datasets
- ❌ Replayable keypress metadata
This tool does. And that’s why it was worth building. I wanted to collect data I could actually train a model on.
💥 Bonus: How I Failed a Few Times Before It Worked
This whole thing didn’t just work out perfectly from the start. Oh no — there were bugs, dead-ends, and a lot of late-night debugging with ice-cold tea on my desk.
📦 The .npy Struggles
At first, I tried saving everything as .npy. It was quick and simple: just np.save() the array and call it a day. But soon, I realized I needed to store more than just a flat array:
- Each sample was a tuple: image, timing, key vector, durations, etc.
- .npy didn’t support storing multiple arrays in one file cleanly.
- I kept overwriting or misloading my data, and let’s not even talk about corrupted writes after crashes.
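Then I discovered .npz. Game changer! 🎉
It let me store both the data array and a keys array in a single file, each under its own name (np.savez() writes them uncompressed; np.savez_compressed() takes the same arguments and compresses them). Suddenly everything made sense: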
np.savez("dataset.npz", data=collected_data, keys=key_list)
Simple. Organized. And easy to inspect or load later.
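For a fuller picture, here is a hedged round-trip sketch. It assumes samples are plain Python tuples and packs them into a 1-D object array before saving; pack_samples is a hypothetical helper, and the tool's own save_dataset_npz() may do this differently. It uses np.savez_compressed so the file really is compressed.

```python
import os
import tempfile

import numpy as np


def pack_samples(samples):
    """Put heterogeneous sample tuples into a 1-D object array, one per slot."""
    data = np.empty(len(samples), dtype=object)
    for i, sample in enumerate(samples):
        data[i] = sample  # stored as-is, no attempt to stack the fields
    return data


blank = np.zeros((4, 4, 3), dtype=np.float32)
samples = [
    (blank, 0.0, np.array([1, 0]), np.array([0.0, 0.0]), "press", 0),
    (blank, 0.25, np.array([0, 0]), np.array([0.25, 0.0]), "release", 7),
]

path = os.path.join(tempfile.mkdtemp(), "dataset.npz")
np.savez_compressed(path, data=pack_samples(samples), keys=np.array(["w", "a"]))

# Object arrays are pickled inside the archive, so loading needs allow_pickle.
with np.load(path, allow_pickle=True) as loaded:
    data, keys = loaded["data"], loaded["keys"]
```

Filling a preallocated object array avoids NumPy trying to broadcast the tuple fields into one rectangular array, which fails or misbehaves when the fields have different shapes.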
🎬 The Video Sync Nightmare
Integrating video recording was a beast. I wanted every frame captured in the preview to also be saved to an .mp4 and synchronized with the data entries.
But...
- OpenCV’s VideoWriter was picky about frame dimensions and codecs.
- My previews were RGB, but OpenCV wanted BGR.
- Frames dropped and the app crashed if I wrote too fast or too early.
It took hours of testing frame conversions, debugging off-by-one errors in frame count, and fixing timestamp mismatches. But eventually — finally — the video recorder synced perfectly with my data stream.
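The first two problems can be guarded against before a frame ever reaches the writer. This sketch uses only NumPy and is an illustration, not the tool's code: to_bgr_uint8 is a hypothetical helper, and it assumes pixel values are in the 0 to 255 range. The expected size is given as (width, height) because that is the order cv2.VideoWriter takes for its frame size.

```python
import numpy as np


def to_bgr_uint8(rgb_frame, expected_size):
    """Validate an RGB frame and convert it to the BGR uint8 layout OpenCV expects.

    expected_size is (width, height); NumPy shapes are (height, width, channels).
    Assumes pixel values are already in the 0..255 range.
    """
    width, height = expected_size
    if rgb_frame.shape != (height, width, 3):
        raise ValueError(
            f"frame shape {rgb_frame.shape} does not match writer size {expected_size}"
        )
    frame = np.clip(rgb_frame, 0, 255).astype(np.uint8)
    # Reverse the channel axis (RGB -> BGR) and make the result contiguous.
    return np.ascontiguousarray(frame[:, :, ::-1])


rgb = np.zeros((2, 3, 3), dtype=np.float32)
rgb[..., 0] = 255.0  # red channel
rgb[..., 1] = 10.0   # green channel
bgr = to_bgr_uint8(rgb, expected_size=(3, 2))
```

Raising on a size mismatch is a deliberate choice here: a frame whose size differs from the one the writer was opened with tends to be dropped without any error, which is painful to debug later.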
When I got the first synced .npz and .mp4 pair, I literally said out loud: “It works!” 😄
📎 Appendix: How It Works Internally
For those of you who want to go a little deeper — here are some behind-the-scenes details that didn’t fit naturally into the main story but are super important for understanding how everything holds together:
🔁 Real-Time Preview Loop
In Data_scrap_tool.py, there’s a background thread running self._preview_loop that continuously:
- Grabs the selected screen region
- Checks which keys are currently pressed
- Draws key overlays using get_preview_image
- Updates the GUI and writes the frame to the video stream
This loop runs at ~30 FPS and ensures both the live preview and the .mp4 file stay in sync with the key state.
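A fixed-rate loop like this can be sketched with the standard library alone. The version below is an assumption about the general pattern, not a copy of self._preview_loop: run_paced_loop and its step callback are hypothetical names, and step stands in for the grab, overlay, and write work.

```python
import threading
import time


def run_paced_loop(step, stop_event, fps=30.0):
    """Call step(frame_index) at roughly `fps` until stop_event is set."""
    period = 1.0 / fps
    next_tick = time.perf_counter()
    frame_index = 0
    while not stop_event.is_set():
        step(frame_index)
        frame_index += 1
        next_tick += period
        delay = next_tick - time.perf_counter()
        if delay > 0:
            stop_event.wait(delay)            # sleeps, but wakes early on stop
        else:
            next_tick = time.perf_counter()   # fell behind: resync, do not burst
    return frame_index


frames = []
stop = threading.Event()
worker = threading.Thread(
    target=run_paced_loop, args=(frames.append, stop), daemon=True
)
worker.start()
time.sleep(0.2)
stop.set()
worker.join(timeout=1.0)
```

Scheduling against an absolute next_tick instead of sleeping a fixed amount keeps the average rate steady even when individual iterations take uneven time.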
🔄 Keyboard Listener Lifecycle
The data_collector.py module runs a global key listener using pynput. But here’s the trick:
- When you change the tracked keys from the GUI, the listener stops and restarts with the new configuration.
- This is handled inside open_key_settings() in RecorderGUI.
This ensures that your new key list applies immediately without restarting the app.
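The stop-and-restart step matters because a pynput listener is a thread, and a thread cannot be started a second time; a fresh listener has to be created. Below is a sketch of that pattern with the listener injected, so it runs without pynput or a display. ListenerManager and FakeListener are hypothetical names; in the real tool the factory would presumably build a pynput.keyboard.Listener(on_press=..., on_release=...).

```python
class ListenerManager:
    """Owns one key listener and swaps it out when the tracked keys change."""

    def __init__(self, make_listener):
        # make_listener(tracked_keys) must return an object with start()/stop().
        self._make_listener = make_listener
        self._listener = None

    def restart(self, tracked_keys):
        if self._listener is not None:
            self._listener.stop()      # the old listener cannot be reused
        self._listener = self._make_listener(frozenset(tracked_keys))
        self._listener.start()
        return self._listener


class FakeListener:
    """Stand-in for a real keyboard listener, used only for this demo."""

    def __init__(self, tracked_keys):
        self.tracked_keys = tracked_keys
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


manager = ListenerManager(FakeListener)
first = manager.restart(["w", "s"])
second = manager.restart(["a", "d"])   # e.g. after the user edits key settings
```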
🛑 Graceful Stop & Recovery
When you stop recording:
- The .npz file is saved using save_dataset_npz()
- The temporary .mp4 file is renamed and synced
- Edge cases like name collisions or IO failures are handled using os.replace() wrapped in a try/except block
This protects your data even if the app closes abruptly or file access fails.
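As a rough illustration of the rename step, here is one way to combine a collision check with os.replace() inside try/except. The function finalize_video and the numbered-suffix naming are assumptions, not the tool's actual behavior. Note that os.replace() silently overwrites an existing target, which is why the collision check happens first.

```python
import os
import tempfile


def finalize_video(temp_path, final_path):
    """Move temp_path to final_path; return the path used, or None on failure."""
    base, ext = os.path.splitext(final_path)
    target, n = final_path, 1
    while os.path.exists(target):      # avoid clobbering an earlier recording
        target = f"{base}_{n}{ext}"
        n += 1
    try:
        os.replace(temp_path, target)
    except OSError as exc:             # missing source, permissions, locked file
        print(f"could not finalize video: {exc}")
        return None
    return target


# Demo in a throwaway directory: "a.mp4" already exists, so "a_1.mp4" is used.
demo_dir = tempfile.mkdtemp()
temp_video = os.path.join(demo_dir, "recording.tmp.mp4")
existing = os.path.join(demo_dir, "a.mp4")
for p in (temp_video, existing):
    open(p, "wb").close()

saved_as = finalize_video(temp_video, existing)
```

On failure the temporary file is left where it was, so the footage can still be recovered by hand.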
🧵 Threading and GUI Safety
Tkinter is single-threaded — but preview and capture operations need to run in the background. That’s why:
- The preview loop runs in a daemon=True thread
- The preview image keeps displaying because references to it (self.preview_imgtk) are retained, so it is not garbage-collected while Tkinter is still showing it
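The post doesn't spell out how frames travel from the background thread to the widgets. One common pattern, offered here only as a sketch, is to push frames onto a queue and let the Tk main thread poll it with root.after(), showing just the newest frame. The helper latest_item is hypothetical, and the Tk side is described in comments so the block runs without a display.

```python
import queue


def latest_item(q):
    """Drain q and return the newest item, or None if it was empty."""
    item = None
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return item


# Background thread side: frame_queue.put(frame)
# Main thread side (hypothetical poll function scheduled with root.after):
#     frame = latest_item(frame_queue)
#     if frame is not None:
#         self.preview_imgtk = <convert frame to a Tk image>  # keep the reference
#         label.configure(image=self.preview_imgtk)
#     root.after(33, poll)

frame_queue = queue.Queue()
for i in range(3):
    frame_queue.put(f"frame-{i}")

newest = latest_item(frame_queue)
```

Dropping stale frames keeps the preview from lagging behind if the GUI is briefly slower than the capture thread.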
These are the sorts of little things that took hours to get right, but they make a huge difference in stability and usability. If you're planning to extend or modify this project, these internals are where the real action is!
🧪 Try It Yourself on GitHub
If you're curious to experiment with the tool or even contribute to it, you're absolutely welcome!
🔗 GitHub Repository: https://github.com/Ertugrulmutlu/-Data-Scrap-Tool-Advanced-Dataset-Viewer
You’ll find:
- All the source code with clear module separation
- Instructions to run it locally (just Python + pip install)
- Bonus scripts for inspecting
.npz
files and previewing key sequences
Feel free to fork, star ⭐, or open issues if something breaks. I’d love to hear how others use or build on top of this.