<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ayush Gupta</title>
    <description>The latest articles on DEV Community by Ayush Gupta (@ayushgupta07xx).</description>
    <link>https://dev.to/ayushgupta07xx</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3959686%2F49d9a1eb-3b7e-4e66-ad44-ba9b8b2146a2.jpeg</url>
      <title>DEV Community: Ayush Gupta</title>
      <link>https://dev.to/ayushgupta07xx</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ayushgupta07xx"/>
    <language>en</language>
    <item>
      <title>Building a Free OSHA Compliance Tool — 8 Weeks Solo</title>
      <dc:creator>Ayush Gupta</dc:creator>
      <pubDate>Sat, 30 May 2026 10:13:05 +0000</pubDate>
      <link>https://dev.to/ayushgupta07xx/building-a-free-osha-compliance-tool-8-weeks-solo-325p</link>
      <guid>https://dev.to/ayushgupta07xx/building-a-free-osha-compliance-tool-8-weeks-solo-325p</guid>
      <description>&lt;p&gt;Commercial workplace-safety software — Protex AI, Intenseye, and the rest — runs $500 to $2,000 a month. It watches camera feeds for PPE violations: a worker without a hard hat, a missing high-vis vest, no fall harness at height. The technology isn't exotic anymore. The price tag is.&lt;/p&gt;

&lt;p&gt;So over eight weeks, solo, I built &lt;strong&gt;SafetyVision&lt;/strong&gt; — an open-source PPE compliance monitor that does the core job for free and runs on $0 of infrastructure. Not a toy: a fine-tuned detection model, explainable predictions, OSHA-grounded incident reports, compliance forecasting, a documented API and SDK, and a one-command self-host. Three live surfaces, all free-tier.&lt;/p&gt;

&lt;p&gt;▶ &lt;strong&gt;&lt;a href="https://youtu.be/I9FxbBiZ18c" rel="noopener noreferrer"&gt;3-minute walkthrough&lt;/a&gt;&lt;/strong&gt; · &lt;strong&gt;&lt;a href="https://safetyvision.vercel.app" rel="noopener noreferrer"&gt;Live app&lt;/a&gt;&lt;/strong&gt; · &lt;strong&gt;&lt;a href="https://github.com/ayushgupta07xx/SafetyVision" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the story of the decisions that mattered — including the ones that didn't go to plan.&lt;/p&gt;




&lt;h2&gt;
  
  
  The product, in one breath
&lt;/h2&gt;

&lt;p&gt;Upload a worksite photo. SafetyVision finds each worker, flags missing PPE in red and ranks it by risk, shows you &lt;em&gt;why&lt;/em&gt; it flagged each item (a GradCAM heatmap and SHAP attribution), writes an incident report citing the actual OSHA regulation, exports an audit-ready PDF, and forecasts the site's 7-day compliance trend. Every inspection is saved to your history.&lt;/p&gt;

&lt;p&gt;It runs three ways: a Next.js web app on Vercel (the product), a no-signup Gradio demo on Hugging Face Spaces (the open-source try-it), and a serverless REST API on AWS Lambda (for developers). The same core powers all three.&lt;/p&gt;

&lt;p&gt;The compromises in this project are about &lt;em&gt;scale&lt;/em&gt; — free tiers, a small model, a modest training set — never about &lt;em&gt;sophistication&lt;/em&gt;. Here's where the sophistication went.&lt;/p&gt;

&lt;h2&gt;
  
  
  The model, and an honest 0.763
&lt;/h2&gt;

&lt;p&gt;Detection is a fine-tuned YOLOv8, exported to ONNX so it runs on a plain CPU, with no GPU required for end users. Version 1 was YOLOv8&lt;em&gt;n&lt;/em&gt; (nano), trained on ~58k images, landing at &lt;strong&gt;0.701 mAP@50&lt;/strong&gt;. Decent, but it had a clear weakness: it was biased toward frontal poses and missed workers seen from the side, from the back, or partially occluded.&lt;/p&gt;

&lt;p&gt;For v2 I went bigger: YOLOv8&lt;em&gt;s&lt;/em&gt; (small), 80k+ images, and an aggressive Albumentations augmentation pipeline (random occlusion, brightness/contrast jitter, motion blur, mosaic) aimed squarely at that frontal bias. The target was &lt;strong&gt;mAP@50 ≥ 0.78&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It landed at &lt;strong&gt;0.763&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I could have buried that. Instead it's in the README, the model card, and the demo's closing line. Here's why: a recruiter or a safety officer evaluating this doesn't trust a project with no failure modes — they trust one that knows exactly where it's weak. v2 is a real improvement (Fall-Detected hits 0.956, hard hats 0.936), and the per-class breakdown shows precisely which classes still struggle (NO-Safety-Vest at 0.382). An honest 0.763 with a documented gap is worth more than a suspicious 0.78.&lt;/p&gt;

&lt;p&gt;That became the project's organizing principle: &lt;strong&gt;the demo is curated, the model card is honest.&lt;/strong&gt; The demo shows the best-case path because that's what every product demo does; the model card lists every failure mode because that's what every responsible model card does. Both exist on purpose.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why two explainers, not one
&lt;/h2&gt;

&lt;p&gt;Every detection ships with &lt;em&gt;both&lt;/em&gt; a GradCAM heatmap and SHAP attribution. That's deliberate redundancy, and it's the feature I'm most attached to.&lt;/p&gt;

&lt;p&gt;GradCAM answers "where did the model look?" — it paints a heatmap over the image so you can see it attended to the head region when it flagged a missing hard hat. It's spatial and immediately intuitive; a safety officer with no ML background gets it in two seconds.&lt;/p&gt;

&lt;p&gt;SHAP answers a different question: "which pixels actually moved the prediction?" — per-pixel attribution that a technical reviewer can interrogate. It's slower to compute (the heaviest step in the pipeline) and harder to read, but it's the one that holds up under scrutiny.&lt;/p&gt;

&lt;p&gt;A black-box safety tool is a non-starter — if the system flags a worker, someone needs to be able to ask &lt;em&gt;why&lt;/em&gt;. Shipping both means the answer satisfies the floor manager and the auditor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Grounding the reports in real regulations
&lt;/h2&gt;

&lt;p&gt;A generic "this worker is missing a hard hat" message isn't useful. A citation of &lt;strong&gt;29 CFR 1910.135(a)(1)&lt;/strong&gt; is. So the incident report is generated by a multimodal Gemini Flash model that receives three things: the annotated image (so it sees what the camera sees), the structured violation data, and the relevant OSHA regulation text — retrieved by a RAG pipeline (Qdrant vector store + BGE embeddings) over the actual 29 CFR 1910 and 1926 standards.&lt;/p&gt;
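
&lt;p&gt;To make the retrieval step concrete, here is a toy sketch of top-k lookup by cosine similarity. It is not the project's pipeline: the in-memory list stands in for the Qdrant collection, the 3-dimensional vectors stand in for BGE embeddings, and the function names &lt;code&gt;cosine&lt;/code&gt; and &lt;code&gt;top_k&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed_passages, k=2):
    """Return the ids of the k passages most similar to query_vec."""
    ranked = sorted(
        indexed_passages,
        key=lambda passage: cosine(query_vec, passage[1]),
        reverse=True,
    )
    return [passage_id for passage_id, _ in ranked[:k]]


# Toy vectors (assumption): a real index would hold embedding-model outputs.
passages = [
    ("1910.135 head protection", [0.9, 0.1, 0.0]),
    ("1926.501 fall protection", [0.1, 0.9, 0.1]),
    ("1910.132 general PPE", [0.6, 0.3, 0.2]),
]
print(top_k([1.0, 0.0, 0.0], passages, k=2))
```

&lt;p&gt;The retrieved passage text is what would be placed in the prompt next to the annotated image and the structured violation data.&lt;/p&gt;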

&lt;p&gt;Does the RAG grounding actually help, or is it theater? I A/B tested it. With RAG vs. without, judged on report quality: &lt;strong&gt;RAG wins, Cohen's d = 0.65, p = 0.0197&lt;/strong&gt; (paired t-test, N=16). Small sample, but a real and significant effect. I ran a second A/B on the detection confidence threshold (0.40 vs 0.55): &lt;strong&gt;0.40 wins, McNemar p = 4×10⁻⁵&lt;/strong&gt; on 200 held-out images. Decisions backed by numbers, not vibes.&lt;/p&gt;
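
&lt;p&gt;For readers who want to check numbers like these, here is a minimal standard-library sketch of the two statistics. The helper names are hypothetical and this is not the project's evaluation code. It assumes Cohen's d was computed on the paired differences, in which case the paired t statistic is d × √N, and it uses the exact binomial form of McNemar's test on the discordant pairs.&lt;/p&gt;

```python
import math
from statistics import mean, stdev


def paired_effect(with_rag, without_rag):
    """Return (Cohen's d on the paired differences, paired t statistic)."""
    diffs = [a - b for a, b in zip(with_rag, without_rag)]
    d = mean(diffs) / stdev(diffs)  # effect size on the differences
    t = d * math.sqrt(len(diffs))   # paired t statistic, df = N - 1
    return d, t


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant-pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

&lt;p&gt;Under that assumption, d = 0.65 with N = 16 gives t = 0.65 × 4 = 2.6 on 15 degrees of freedom, which is consistent with a two-sided p of about 0.02.&lt;/p&gt;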

&lt;h2&gt;
  
  
  The infrastructure war stories
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The GCP quota wall
&lt;/h3&gt;

&lt;p&gt;I planned to train on GCP with the $300 free credit. Every GPU VM request bounced — across dozens of zones and machine types. The error messages pointed at regional quotas that &lt;em&gt;looked&lt;/em&gt; fine. The real culprit took systematic testing to find: a global &lt;code&gt;GPUS_ALL_REGIONS&lt;/code&gt; umbrella quota that defaults to &lt;strong&gt;0&lt;/strong&gt; on new paid accounts and silently overrides every regional quota. You can have regional GPU quota of 1 and still be blocked because the global cap is 0.&lt;/p&gt;
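
&lt;p&gt;The check I wish I had run on day one can be sketched as below. The function name is hypothetical, and the quota records are assumed to be dictionaries with &lt;code&gt;metric&lt;/code&gt;, &lt;code&gt;limit&lt;/code&gt;, and &lt;code&gt;usage&lt;/code&gt; keys; the regional metric name is only an illustration. The point is that a request has to fit under every applicable quota, so the effective limit is the smallest one.&lt;/p&gt;

```python
def blocking_quotas(project_quotas, regional_quotas, needed=1):
    """Return the metric names of every GPU quota that cannot fit `needed` more GPUs."""
    blocked = []
    for quota in list(project_quotas) + list(regional_quotas):
        if "GPU" not in quota["metric"]:
            continue
        headroom = quota["limit"] - quota["usage"]
        if needed > headroom:
            blocked.append(quota["metric"])
    return blocked


# Illustrative values: the regional quota looks fine, the global cap is 0.
project = [{"metric": "GPUS_ALL_REGIONS", "limit": 0, "usage": 0}]
regional = [{"metric": "NVIDIA_T4_GPUS", "limit": 1, "usage": 0}]
print(blocking_quotas(project, regional))  # ['GPUS_ALL_REGIONS']
```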

&lt;p&gt;For v1 I pivoted to Kaggle's free 2×T4 notebooks and trained around the 12-hour session cap with checkpoint-resume. For v2, after the account aged and an explicit quota request cleared the global cap, I trained on a single GCP L4, then wound the whole GCP footprint down to $0 once the weights were on Hugging Face. I documented the entire diagnosis as an architecture decision record, because the next person hitting that wall deserves better than the error message I got.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda Function URLs over API Gateway
&lt;/h3&gt;

&lt;p&gt;For the API, I chose a &lt;strong&gt;Lambda Function URL&lt;/strong&gt; over API Gateway. The reasoning: Function URLs are free &lt;em&gt;forever&lt;/em&gt;, while API Gateway's free tier expires after 12 months — and for a single &lt;code&gt;/analyze&lt;/code&gt; endpoint, I didn't need API Gateway's usage plans or request transformations. API-key auth and rate-limiting live at the handler level instead. It's the kind of trade you make explicit so the alternative is on record, not the kind you default into.&lt;/p&gt;
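
&lt;p&gt;Handler-level auth and rate-limiting can be sketched roughly as follows. This is an illustration, not the project's handler: the &lt;code&gt;make_handler&lt;/code&gt; factory, the &lt;code&gt;x-api-key&lt;/code&gt; header name, and the limits are all assumptions, and the counter lives in one container's memory, so a real deployment would need a shared store to enforce the limit across concurrent instances. It also assumes header names arrive lowercased in the event.&lt;/p&gt;

```python
import hmac
import json
import time


def make_handler(api_key, max_requests=30, window_seconds=60):
    """Build a Lambda-style handler that checks an API key, then a rate limit."""
    hits = {}  # per-container memory only; concurrent instances do not share it

    def reply(status, payload):
        return {
            "statusCode": status,
            "headers": {"content-type": "application/json"},
            "body": json.dumps(payload),
        }

    def handler(event, context=None, now=None):
        supplied = (event.get("headers") or {}).get("x-api-key", "")
        # Constant-time comparison avoids leaking key contents through timing.
        if not api_key or not hmac.compare_digest(supplied.encode(), api_key.encode()):
            return reply(401, {"error": "invalid or missing API key"})
        # Fixed-window counter: at most max_requests per key per window.
        window = int((time.time() if now is None else now) // window_seconds)
        count = hits.get((supplied, window), 0)
        if count >= max_requests:
            return reply(429, {"error": "rate limit exceeded"})
        hits[(supplied, window)] = count + 1
        # The real detection pipeline would run here.
        return reply(200, {"status": "accepted"})

    return handler
```

&lt;p&gt;In a deployment, the key would come from configuration at module load and the returned function would be registered as the Lambda entry point. The &lt;code&gt;now&lt;/code&gt; parameter exists only so the window logic can be tested without waiting.&lt;/p&gt;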

&lt;h3&gt;
  
  
  The 4MB that looked like 6MB
&lt;/h3&gt;

&lt;p&gt;Lambda Function URLs cap payloads at 6MB. I built the frontend to that limit, and 5MB images started returning 413s. The cause: &lt;strong&gt;base64 inflation.&lt;/strong&gt; Base64 turns every 3 bytes into 4 characters (~33% overhead), so a 6MB on-the-wire cap holds at most about 4.5MB of raw image, and I treat ~4MB as the safe figure once the rest of the JSON envelope is counted. And it bites the &lt;em&gt;response&lt;/em&gt; too: my annotated image, GradCAM, and SHAP visuals were going out as PNG and blowing the ceiling. Fix: JPEG q85 instead of PNG, cap input resolution at 1280px, and set the real frontend limit to 4MB. The kind of constraint that's invisible until production traffic finds it.&lt;/p&gt;
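
&lt;p&gt;The arithmetic is easy to verify. In this sketch the function names are hypothetical and 1MB is taken as 1024 × 1024 bytes, which is an assumption about how the cap is counted. Padded base64 emits 4 characters for every 3 input bytes, rounded up to a whole group.&lt;/p&gt;

```python
import base64
import math


def encoded_size(raw_bytes):
    """Length of the padded base64 text for a payload of raw_bytes bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def max_raw_bytes(wire_cap, envelope_bytes=0):
    """Largest raw payload whose base64 form plus envelope fits within wire_cap."""
    return (wire_cap - envelope_bytes) // 4 * 3


MB = 1024 * 1024
# Cross-check the formula against the standard library encoder.
assert encoded_size(1000) == len(base64.b64encode(bytes(1000)))
print(encoded_size(5 * MB) / MB)   # a 5MB image grows to about 6.67MB of base64
print(max_raw_bytes(6 * MB) / MB)  # 4.5: the raw ceiling before any JSON envelope
```

&lt;p&gt;Every other field in the request, and every image in the response, eats into that 4.5MB, which is why a 4MB frontend limit leaves sensible headroom.&lt;/p&gt;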

&lt;h2&gt;
  
  
  $0, on purpose
&lt;/h2&gt;

&lt;p&gt;The hard constraint was zero ongoing cost, and every runtime service honors it: AWS Lambda/S3/DynamoDB/ECR (always-free, no 12-month cliff), Supabase for Postgres + auth, Vercel for the frontend, Hugging Face for hosting and weights, Qdrant Cloud for vectors, Google AI Studio for the LLM. Cost per analysis: $0.&lt;/p&gt;

&lt;p&gt;That's not a limitation to apologize for — for a small factory that can't justify $2,000/month, the free version &lt;em&gt;is&lt;/em&gt; the product-relevant version. And the architecture is built so the expensive upgrades (a bigger model on a GPU endpoint, a frontier LLM, multi-seed evals) are config flags away, not rewrites. I built the cheap version of an upgrade-ready system.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do with more time
&lt;/h2&gt;

&lt;p&gt;None of these are blind spots — each was a conscious trade against "ship the rigorous free version."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Close the mAP gap to 0.78+&lt;/strong&gt; — more side/back-view and occluded training data; the augmentation helped but didn't fully solve the pose bias.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A semantic guardrail / second model&lt;/strong&gt; for the report layer, beyond the current prompt-level grounding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-seed evals and bigger A/B samples&lt;/strong&gt; — the current intervals are wide; the effects are directional, not bankable beyond the strong ones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RTSP / live-camera ingestion&lt;/strong&gt; — the obvious product next step, but it needs persistent compute, which breaks the $0 rule.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What it actually demonstrates
&lt;/h2&gt;

&lt;p&gt;Eight weeks, solo, $0: a fine-tuned and ONNX-exported detector, dual explainability, RAG-grounded multimodal reporting, time-series forecasting with a baseline, statistically validated A/B tests, a three-surface deployment (Next.js + Vercel, Gradio + HF Spaces, serverless AWS via Terraform), a published PyPI SDK with a CLI, and honest metrics throughout.&lt;/p&gt;

&lt;p&gt;The point was never to out-spend the incumbents. It was to show that the capability is no longer the moat — and to build the free version well enough that someone would actually use it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://safetyvision.vercel.app" rel="noopener noreferrer"&gt;Try the live demo&lt;/a&gt;&lt;/strong&gt; · &lt;strong&gt;&lt;a href="https://huggingface.co/ayushgupta7777/safetyvision-yolov8" rel="noopener noreferrer"&gt;Read the model card&lt;/a&gt;&lt;/strong&gt; · &lt;strong&gt;&lt;a href="https://github.com/ayushgupta07xx/SafetyVision" rel="noopener noreferrer"&gt;Deploy your own&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;SafetyVision is an AI-assisted pre-screening tool to support human safety officers — not a replacement for human judgment.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>aws</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
