DEV Community

Cover image for PaySnap - AI Powered Wage Theft Detector
Aadarsh Praveen
Aadarsh Praveen

Posted on

PaySnap - AI Powered Wage Theft Detector

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

PaySnap is an AI-powered wage theft detector that helps workers
understand their paystubs and recover stolen wages in their language.

Every year, $50 billion is stolen from American workers through wage
theft. Construction workers, restaurant staff, and farmworkers are hit hardest. They don't know their rights. They can't read their paystub. Many are afraid to report.

PaySnap changes that. A worker uploads a paystub photo — or describes
their situation in Hindi, Spanish or Chinese and PaySnap tells them
exactly what they're owed, which law was broken, and how to report it.

Try it live: paysnap.vercel.app

Key features:

  • Gemma 4 E2B fine-tuned on 365,393 real DOL enforcement cases
  • Native Gemma 4 function calling — truly agentic AI
  • Reads paystub photos via Gemma 4 vision (llama.cpp)
  • Explains violations in 11 languages
  • Detects overtime, illegal deductions, minimum wage violations
  • Always provides DOL hotline: 1-866-487-9243 (free, confidential)

Demo

🎥 Watch 3-minute demo

Live app: https://paysnap.vercel.app/

Scenario 1 — Texas construction worker:

  • Input: 52 hours, $15/hour, Texas, no overtime shown
  • PaySnap detects: 12 unpaid overtime hours
  • Result: $90 owed under FLSA 29 USC 207(a)(1)

Scenario 2 — New York restaurant worker (Hindi):

  • Input: 48 hours, $16/hour, NY, UNIFORM $35 + BREAKAGE $50 deductions
  • PaySnap detects: overtime violation + 2 illegal deductions
  • Result: $149 owed — full explanation in Hindi

Code

GitHub: https://github.com/Aadarsh-Praveen/Paysnap

Fine-tuned model (GGUF):
https://huggingface.co/Aadarsh-Praveen/paysnap-gemma4-gguf

LoRA weights:
https://huggingface.co/Aadarsh-Praveen/paysnap-gemma4-lora

Training notebook:
https://kaggle.com/code/aadarshpraveen/paysnap-gemma4-finetuning

Dataset (365,393 DOL cases):
https://kaggle.com/datasets/aadarshpraveen/paysnap-labor-law-dataset


How I Used Gemma 4

I chose Gemma 4 E2B for three specific reasons:

1. Edge deployment — Workers PaySnap serves often use older
devices. E2B runs at 63 tokens/second on Apple M3 Pro and fits
in 3.4GB as a Q4_K_M GGUF. A larger model would not run locally.

2. Fine-tuning efficiency — I fine-tuned E2B on 365,393 real
DOL enforcement cases using Unsloth LoRA on a Kaggle T4 GPU.
Training loss reached 0.009. A 31B model would have been
impossible on free compute.

3. Multilingual capability — Despite its small size, E2B
generates coherent responses in Hindi, Spanish, Chinese, and 8
other languages — critical for reaching vulnerable workers.

Four ways Gemma 4 powers PaySnap:

  1. Vision — reads paystub photos via llama.cpp multimodal API
  2. Native function calling — Gemma 4 autonomously decides which tools to call (calculate_overtime, check_deductions, get_applicable_statutes, get_dol_contact)
  3. Fine-tuned knowledge — learned real DOL enforcement patterns from 365,393 cases, +11.7% improvement on LLM-as-Judge eval
  4. Multilingual explanation — explains violations in worker's language with exact statute citations

Evaluation (LLM-as-Judge, base Gemma 4 E2B as judge):

Base Gemma 4 E2B: 8.12/10
PaySnap fine-tuned: 9.07/10
Improvement: +11.7%

All 5 dimensions improved: Legal Accuracy +1.73, Statute Quality
+1.33, Actionability +0.73, Dollar Accuracy +0.67, Worker Clarity +0.27


Team
This project was built by:

Aadarsh Praveen Selvaraj Ajithakumari — @aadarsh_praveen

Suriya Kasiyalan Siva — @suriya_ks_0902


Top comments (0)