<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: yubin hong</title>
    <description>The latest articles on DEV Community by yubin hong (@zero_to_one0to1).</description>
    <link>https://dev.to/zero_to_one0to1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4014265%2F2cfd9d07-bff6-475e-bafc-3b3366834c62.jpg</url>
      <title>DEV Community: yubin hong</title>
      <link>https://dev.to/zero_to_one0to1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zero_to_one0to1"/>
    <language>en</language>
    <item>
      <title>We took highlight detection from 0.56 to 0.86 — with zero new footage and zero cloud training</title>
      <dc:creator>yubin hong</dc:creator>
      <pubDate>Sat, 04 Jul 2026 01:06:38 +0000</pubDate>
      <link>https://dev.to/zero_to_one0to1/we-took-highlight-detection-from-056-to-086-with-zero-new-footage-and-zero-cloud-training-j7g</link>
      <guid>https://dev.to/zero_to_one0to1/we-took-highlight-detection-from-056-to-086-with-zero-new-footage-and-zero-cloud-training-j7g</guid>
      <description>&lt;p&gt;SportZone turns a parent's phone video of a youth game into a highlight reel. The hard part is finding the decisive moment. On real footage we were missing nearly half of them. Here's how a week of measuring — not guessing — fixed it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fxgjfoe4ll2i8r0ls6zdg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fxgjfoe4ll2i8r0ls6zdg.png" alt=" " width="799" height="265"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;01 THE PROBLEM&lt;/em&gt;&lt;br&gt;
On real phone footage, the model went half-blind&lt;/p&gt;

&lt;p&gt;Our classifier scored 0.82 on curated YouTube clips. Confident, we ran it on genuine parent-filmed smartphone video for the first time. The number that matters is recall on decisive moments (did we catch it?), and it came back at 0.56. We were missing 44% of the highlights. A highlight tool that misses half the highlights isn't a tool.&lt;/p&gt;

&lt;p&gt;The curated-vs-real gap is the whole game. So the first thing we built wasn't a fix — it was a way to measure honestly on the real domain.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;02 THE WRONG TURN&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We bet on the obvious culprit. We lost.&lt;/p&gt;

&lt;p&gt;"The boxes look tight — the detector must be missing people." Reasonable. So we did the expensive thing: assembled ~9,100 commercially-licensed sports images, fine-tuned a detector on a cloud GPU, hit a healthy mAP of 0.716, and plugged it back in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F3s39q8n1ojdppgu0pm15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F3s39q8n1ojdppgu0pm15.png" alt=" " width="800" height="96"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hours of data-wrangling and training bought us nothing. Frustrating — but the failure was the clue. If a better detector changes nothing, detection was never the bottleneck. We just didn't have the evidence yet.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;03 THE MEASUREMENT THAT CHANGED EVERYTHING&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We decomposed the failures instead of arguing about them&lt;/p&gt;

&lt;p&gt;We took every missed moment and tagged where in the pipeline it leaked: detection → tracking → pose → impact. One script, no opinions. The breakdown ended the debate:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fadybujwsj8c63r4j1ovq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fadybujwsj8c63r4j1ovq.png" alt=" " width="800" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Detection accounted for only 15% of the misses. The real leak, 55%, was downstream: the person was found and tracked, but the pose signal was too weak for our kinematics to register the impact. We'd been paving the wrong road. Lesson: decompose before you invest.&lt;/p&gt;
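
&lt;p&gt;A minimal sketch of that tagging step, under stated assumptions: each missed moment is a dict of per-stage pass/fail flags, and a miss is attributed to the first stage that failed. The record layout and the names &lt;code&gt;first_failed_stage&lt;/code&gt; and &lt;code&gt;failure_breakdown&lt;/code&gt; are hypothetical, and the sample records are toy data, not SportZone results.&lt;/p&gt;

```python
from collections import Counter

# Pipeline stages in order; a miss is attributed to the first stage that failed.
STAGES = ("detection", "tracking", "pose", "impact")


def first_failed_stage(stage_ok):
    """Return the earliest stage whose check failed, or None if all passed."""
    for stage in STAGES:
        if not stage_ok.get(stage, False):
            return stage
    return None


def failure_breakdown(missed_moments):
    """Map each stage to its share of the missed moments (fractions sum to 1)."""
    counts = Counter(first_failed_stage(m) for m in missed_moments)
    counts.pop(None, None)  # ignore records where every stage passed
    total = sum(counts.values())
    if total == 0:
        return {}
    return {stage: counts[stage] / total for stage in STAGES if counts[stage]}


# Toy records, invented for illustration only (hypothetical layout).
toy_misses = [
    {"detection": False},
    {"detection": True, "tracking": True, "pose": False},
    {"detection": True, "tracking": True, "pose": False},
    {"detection": True, "tracking": False},
]
print(failure_breakdown(toy_misses))
```

&lt;p&gt;Attributing each miss to the first failing stage keeps the shares from double-counting: a moment lost at detection never reaches pose, so it cannot also be blamed on pose.&lt;/p&gt;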

&lt;p&gt;&lt;em&gt;04 FIXING THE REAL BOTTLENECK&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Sharper pose signal broke a ceiling we'd been stuck under&lt;/p&gt;

&lt;p&gt;Now aimed at the right target, the fixes were cheap and pure software: upscale the pose crop (256 → 384), use a higher-complexity pose model (model_complexity 1 → 2), and bias the crop downward so it stops cutting off the legs. No new data. No GPU. This wasn't moving an operating point. It was a genuinely cleaner signal, and it broke past an F1 ceiling of 0.54 that pure threshold-tuning had never cracked.&lt;/p&gt;
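
&lt;p&gt;One way a downward crop bias could work, as a sketch only: pad the person box on every side, then add extra padding below it so the feet stay in frame. The function name &lt;code&gt;pose_crop_box&lt;/code&gt;, the &lt;code&gt;margin&lt;/code&gt; parameter, and the reading of &lt;code&gt;bottom_bias&lt;/code&gt; as a fraction of box height are all assumptions; the post does not spell out the exact definition.&lt;/p&gt;

```python
def pose_crop_box(box, frame_w, frame_h, margin=0.10, bottom_bias=0.15):
    """Expand a person box (x1, y1, x2, y2) into a pose crop, padding the bottom more.

    margin pads every side by a fraction of the box size (hypothetical knob).
    bottom_bias adds an extra fraction of the box height below the box
    (assumed meaning). The result is clamped to the frame bounds.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    left = max(0, x1 - margin * w)
    top = max(0, y1 - margin * h)
    right = min(frame_w, x2 + margin * w)
    bottom = min(frame_h, y2 + (margin + bottom_bias) * h)
    return tuple(int(round(v)) for v in (left, top, right, bottom))


# A 100x200 person box in a 1280x720 frame.
print(pose_crop_box((500, 300, 600, 500), 1280, 720))  # (490, 280, 610, 550)
```

&lt;p&gt;The resulting region would then be resized to the pose model's input size (384 in the winning config) before inference.&lt;/p&gt;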

&lt;p&gt;&lt;em&gt;05 THE TRAP&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The improvement first looked like a regression&lt;/p&gt;

&lt;p&gt;We flipped on four improvements at once and recall dropped: 0.769 → 0.692. The tempting move: lower the detection threshold until the number looks good again. That would have papered over a real defect instead of finding it. So we ran a clean one-at-a-time ablation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F54co5lzp92xo05vx2w7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F54co5lzp92xo05vx2w7z.png" alt=" " width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The ablation pinned the drop on a single flag, video_mode. This is the same lesson as the detector detour, one layer deeper: a number that lies to you is more dangerous than a number that's low. Ablation is how you tell them apart.&lt;/p&gt;
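
&lt;p&gt;A one-at-a-time ablation can be sketched as a small harness: score the baseline, then score each change applied alone on top of it. Everything named here is hypothetical, including &lt;code&gt;ablate_one_at_a_time&lt;/code&gt;, the &lt;code&gt;flag_x&lt;/code&gt; setting, and the stand-in metric &lt;code&gt;toy_recall&lt;/code&gt;, which exists only so the example runs. A real harness would run the pipeline on the labeled clips instead.&lt;/p&gt;

```python
def ablate_one_at_a_time(baseline, changes, evaluate):
    """Score the baseline, then each change applied alone on top of it.

    Returns (baseline_score, rows), where rows is a list of
    (name, score, delta_vs_baseline) sorted worst-first, so a change
    that hurts on its own appears at the top.
    """
    base_score = evaluate(baseline)
    rows = []
    for name, override in changes.items():
        config = {**baseline, **override}  # exactly one change per run
        score = evaluate(config)
        rows.append((name, score, score - base_score))
    rows.sort(key=lambda row: row[2])
    return base_score, rows


# Stand-in metric with invented numbers (not measured results).
def toy_recall(config):
    score = 0.70
    score += 0.05 if config["crop_size"] == 384 else 0.0
    score -= 0.10 if config["flag_x"] else 0.0
    return round(score, 3)


baseline = {"crop_size": 256, "flag_x": False}
changes = {
    "bigger_crop": {"crop_size": 384},
    "flag_x_on": {"flag_x": True},
}
base, rows = ablate_one_at_a_time(baseline, changes, toy_recall)
for name, score, delta in rows:
    print(f"{name:12s} recall={score:.3f} delta={delta:+.3f}")
```

&lt;p&gt;Because every run differs from the baseline by exactly one setting, a negative delta points at one change instead of at the bundle.&lt;/p&gt;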

&lt;p&gt;&lt;em&gt;06 THE RESULT&lt;/em&gt;&lt;br&gt;
0.56 → 0.86, and the weak sports came home&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fnbtmkifmx6aqk0s7l4cd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fnbtmkifmx6aqk0s7l4cd.png" alt=" " width="800" height="736"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Shipping note: precision sits at 0.51, so roughly half of the flagged moments are false highlights. For a highlight reel that's the right trade: better to over-catch than miss the goal. We'll tighten it after beta.&lt;/p&gt;
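
&lt;p&gt;To see what that trade does to a combined score, precision and recall can be folded into F1, their harmonic mean. The sketch below assumes the headline 0.86 is recall and pairs it with the 0.51 precision. The resulting value is arithmetic on those two numbers, not a figure reported from the evaluation.&lt;/p&gt;

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: 0.86 is recall, paired with the 0.51 precision from the note.
print(round(f1_score(0.51, 0.86), 2))  # 0.64
```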

&lt;p&gt;&lt;em&gt;07 WHAT WE'D TATTOO ON OUR ARM&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Three lessons, paid for in wasted GPU hours&lt;/p&gt;

&lt;p&gt;i.&lt;br&gt;
Decompose before you invest. We burned a cloud fine-tune chasing a bottleneck that was 15% of the problem. A one-afternoon failure breakdown would have redirected the whole week.&lt;/p&gt;

&lt;p&gt;ii.&lt;br&gt;
Ablate one change at a time. Four fixes at once hid a regression inside a net gain. Isolation named the single culprit in one pass.&lt;/p&gt;

&lt;p&gt;iii.&lt;br&gt;
Distrust the number that recovers too easily. Lowering a threshold would have masked the video-mode bug. The convenient fix and the correct fix are rarely the same move.&lt;/p&gt;

&lt;h1&gt;The winning config: all software, no new data&lt;/h1&gt;

&lt;pre&gt;&lt;code&gt;z_threshold: 2.0       # impact sensitivity
video_mode: false      # the one flag that was the culprit
crop_size: 384         # sharper pose signal
model_complexity: 2    # pose precision
bottom_bias: 0.15      # stop cropping off the legs
tcn_rescue: on         # classifier rescues weak candidates&lt;/code&gt;&lt;/pre&gt;
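
&lt;p&gt;The config describes &lt;code&gt;z_threshold&lt;/code&gt; only as impact sensitivity. One plausible reading, sketched here as an assumption and not as SportZone's actual kinematics, is a z-score test on a per-frame motion signal: flag frames whose speed sits more than &lt;code&gt;z_threshold&lt;/code&gt; standard deviations above the clip mean. The function &lt;code&gt;impact_frames&lt;/code&gt; and the wrist-speed series are hypothetical toy inputs.&lt;/p&gt;

```python
import statistics


def impact_frames(speeds, z_threshold=2.0):
    """Return indices whose speed is more than z_threshold std devs above the mean."""
    if not speeds:
        return []
    mean = statistics.fmean(speeds)
    spread = statistics.pstdev(speeds)
    if spread == 0:
        return []  # flat signal: nothing stands out
    return [i for i, s in enumerate(speeds) if (s - mean) / spread > z_threshold]


# Toy per-frame wrist speeds (invented) with one spike at index 6.
toy_speeds = [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 6.0, 1.1, 0.9, 1.0]
print(impact_frames(toy_speeds))  # [6]
```

&lt;p&gt;Under this reading, a weak pose signal flattens the spike, its z-score falls below the threshold, and the impact goes unregistered, which matches the failure mode described earlier.&lt;/p&gt;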

</description>
      <category>ai</category>
      <category>deeplearning</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
