The refurbished phone market moves hundreds of millions of devices a year. Yet most device condition grading is still done manually — a technician tapping a screen, listening to speakers, and writing "Good" or "Fair" on a form.
We built MobileMD to change that. Here's the technical approach behind our AI-powered device assessment system.
The Core Challenge: Fast, Accurate, Offline-First
The dealers and repair shops we target work in environments where:
- Network connectivity is unreliable
- Speed matters (grading 50 phones/day is common)
- Trust depends on verifiability, not just output labels
That drove our architecture toward on-device-first processing. The heavy diagnostic work happens directly on the device under test, with cloud AI as a secondary layer for report generation — not a hard dependency.
45 Hardware Tests: Silent + Interactive
We run 45 tests in two categories:
Silent tests (24) run automatically without user interaction:
- Battery capacity and cycle count via the BatteryManager API
- Accelerometer, gyroscope, magnetometer calibration checks
- GPS signal acquisition time
- Proximity sensor response
- NFC, Bluetooth, Wi-Fi, and cellular radio presence
- Storage read/write benchmarks
- CPU/GPU stress validation
Interactive tests (21) require brief technician input:
- Multi-touch grid (verifies each touch zone independently)
- Display uniformity (black/white/primary color fill screens)
- Front + rear camera capture
- Speaker FFT playback + microphone recording
- Earpiece and loudspeaker directional tests
- Volume and side button press verification
- Biometric sensor checks (fingerprint, Face ID)
Each test returns a structured result: pass/fail status, raw measurement, and confidence score.
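As a hypothetical sketch (field names are illustrative, not our actual schema), that per-test result structure looks something like:

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PASS = "pass"
    FAIL = "fail"

@dataclass(frozen=True)
class TestResult:
    test_id: str        # e.g. "battery_capacity"
    status: Status      # pass/fail outcome
    measurement: float  # raw value; units depend on the test
    confidence: float   # 0.0-1.0, how certain the check is

result = TestResult("battery_capacity", Status.PASS, 87.5, 0.98)
```

Freezing the dataclass keeps a result immutable once recorded, which matters later when reports are cryptographically signed.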
Cosmetic Grading: CV-First Pipeline
Cosmetic assessment is the hardest part to automate — "scratched" exists on a spectrum, and lighting conditions in a dealer's shop vary wildly.
Our approach: CV-first, cloud-second.
Dual-Capture Technique
We capture two images per surface:
Dark-field (black screen): With the screen backlight off in a lit environment, surface scratches scatter ambient light and become visible against the dark background. We detect these using:
- Laplacian variance: Measures local sharpness changes — high variance indicates abrupt edge transitions consistent with scratch geometry
- Sobel edge detection: Highlights directional edges; linear patterns at non-display-element angles flag surface damage
Light-field (white screen): Full white illumination reveals dead pixels (dark spots against the uniform white field), uneven backlight bleed, and pressure damage. Histogram analysis across the image flags luminance anomalies in what should be a uniform white output.
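A minimal NumPy sketch of the light-field check, assuming a grayscale capture of the white-fill screen; the z-score threshold here is illustrative, not a production value:

```python
import numpy as np

def light_field_anomalies(gray: np.ndarray, z_thresh: float = 4.0) -> np.ndarray:
    """Flag pixels that deviate sharply from a uniform white field.

    gray: 2-D array of luminance values from the white-screen capture.
    Returns a boolean mask of candidate dead pixels / bleed regions.
    """
    mean, std = gray.mean(), gray.std()
    # In a healthy uniform frame the spread is small; any pixel more than
    # z_thresh standard deviations below the mean is suspicious.
    return gray < mean - z_thresh * max(std, 1e-6)

# A mostly-white frame with one dark "dead pixel"
frame = np.full((8, 8), 250.0)
frame[3, 4] = 20.0
mask = light_field_anomalies(frame)
```

On real captures you would smooth the luminance channel first and cluster flagged pixels into regions before classifying them as dead pixels versus backlight bleed.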
On-Device Pre-Screening
Before any image leaves the device, we run the full CV pipeline locally. This pre-screening resolves approximately 80% of cosmetic assessments on-device — we only invoke cloud Vision AI when local analysis returns an uncertain or flagged result.
The cost reduction is significant: roughly 80% fewer cloud vision API calls compared to routing every image to the cloud.
// Simplified: Laplacian variance for scratch detection
func laplacianVariance(from pixelBuffer: CVPixelBuffer) -> Double {
    // Apply a 3x3 Laplacian kernel via a Metal compute shader,
    // then take the variance of the filtered pixel values.
    // High variance = sharp local edges = surface anomaly present
    // Low variance  = smooth gradients  = minimal scratching
    let filtered = applyLaplacianKernel(to: pixelBuffer) // Metal-backed helper
    return variance(of: filtered)
}
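The pre-screening gate itself is simple. A hedged Python sketch, with illustrative thresholds standing in for our tuned per-model values:

```python
def needs_cloud_review(laplacian_var: float,
                       clean_max: float = 50.0,
                       damaged_min: float = 500.0) -> bool:
    """Gate cloud Vision calls on local CV confidence.

    Thresholds are illustrative, not production values. Results that
    local analysis can classify confidently (clearly clean or clearly
    damaged) are resolved on-device; only the ambiguous middle band
    is escalated to the cloud model.
    """
    clearly_clean = laplacian_var < clean_max
    clearly_damaged = laplacian_var > damaged_min
    return not (clearly_clean or clearly_damaged)
```

Tuning the two cutoffs per device model is what keeps the escalation rate near the ~20% mentioned above.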
Cloud Vision for Damage Classification
Images flagged as uncertain by local CV are sent to cloud Vision AI with structured prompts for:
- Screen crack detection (distinguishing protective glass cracks from display panel damage)
- Housing crack and chip identification
- Side frame dent and bend assessment
- Back panel damage classification
The model returns structured JSON: damage type, location on device, severity (1–5 scale), confidence score.
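For illustration only (these field names are examples, not the actual response schema), a returned finding might parse and validate like:

```python
import json

raw = '''{"damage_type": "screen_crack",
          "location": "top_left",
          "severity": 3,
          "confidence": 0.91}'''

finding = json.loads(raw)

# Basic sanity checks before the finding feeds into scoring
assert finding["severity"] in range(1, 6)        # 1-5 scale
assert 0.0 <= finding["confidence"] <= 1.0
```

Validating the structured output at the boundary keeps a malformed model response from silently corrupting a grade.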
Audio Analysis: FFT for Speaker Health
Speaker condition is one of the most commonly misrepresented aspects of a used phone. "Works fine" can mean anything from pristine to barely functional.
We use Fast Fourier Transform analysis to measure actual speaker frequency response:
- The app plays a calibrated sine sweep (20Hz–20kHz) through the speaker
- The microphone records the playback output in real time
- FFT analysis compares the expected frequency response curve against the measured output
- We flag: missing frequency bands (blown driver), distortion harmonics (rattle or buzz artifacts), low-frequency rolloff (damaged suspension)
This produces an objective speaker health score — not "sounds okay," but a measured deviation from the device model's expected acoustic profile.
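The comparison step reduces to per-band arithmetic once both curves are in hand. A simplified NumPy sketch, assuming the reference and measured responses have already been averaged into matching frequency bands and normalized so the reference sits at 0 dB (the -20 dB cutoff is illustrative):

```python
import numpy as np

def speaker_band_deviation(measured: np.ndarray,
                           reference: np.ndarray,
                           missing_db: float = -20.0):
    """Compare measured vs. expected per-band magnitudes (in dB).

    Bands where the measured response falls more than missing_db below
    the reference are flagged as missing (e.g. a blown driver).
    Returns (per-band deviation, missing-band mask).
    """
    deviation = measured - reference
    missing = deviation < missing_db
    return deviation, missing

ref = np.zeros(6)                                 # ideal flat response, 0 dB
meas = np.array([-1., -2., -30., -1., 0., -3.])   # one collapsed band
dev, missing = speaker_band_deviation(meas, ref)
```

Distortion harmonics and low-frequency rolloff need additional checks on the spectrum shape, but the same deviation-from-reference framing applies.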
The Trust Score: 1,000-Point Deterministic System
Every assessment outputs a Trust Score from 0–1,000, mapping to letter grades A (900+) through F (below 400).
The score is deterministic, not ML-inferred. Given identical test results, the same score is always produced. This is intentional — dealers need scoring that is consistent, auditable, and explainable. A model that gives different results for similar devices would destroy confidence in the system.
Score composition:
- Hardware test results: 60% weight
- Cosmetic condition: 30% weight
- Device identity verification: 10% weight
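As a sketch, the composition is just a weighted sum over normalized component scores (the function name and the 0.0-1.0 normalization are illustrative; the weights are the ones above):

```python
def trust_score(hardware: float, cosmetic: float, identity: float) -> int:
    """Deterministic 0-1000 Trust Score.

    Each component is normalized to 0.0-1.0 before weighting, so
    identical test results always map to the identical score.
    """
    return round((0.60 * hardware + 0.30 * cosmetic + 0.10 * identity) * 1000)
```

A device with hardware at 0.95, cosmetics at 0.9, and verified identity (1.0) scores 940, an A at the 900+ cutoff.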
Device Identity: Tamper-Proof Report Binding
A report is only meaningful if it's verifiably bound to the device it describes. We use:
- SHA-256 fingerprint of device-specific hardware identifiers combined with the assessment timestamp
- Signed report payload — any modification to report data invalidates the cryptographic signature
The QR code on every report encodes the signed hash. Any buyer scanning it can confirm the report is genuine and unmodified.
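A minimal Python sketch of the binding scheme. It uses HMAC-SHA-256 for brevity; a production system would use an asymmetric signature so verifiers never hold the signing secret, and the identifiers, key, and field names here are all placeholders:

```python
import hashlib
import hmac
import json

SECRET = b"demo-signing-key"  # placeholder; real systems use proper key management

def device_fingerprint(hw_ids: list, timestamp: str) -> str:
    """SHA-256 over hardware identifiers + assessment timestamp."""
    material = "|".join(hw_ids) + "|" + timestamp
    return hashlib.sha256(material.encode()).hexdigest()

def sign_report(report: dict) -> str:
    """Sign a canonical JSON serialization of the report payload."""
    payload = json.dumps(report, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

report = {
    "fingerprint": device_fingerprint(["serial123", "imei456"],
                                      "2024-01-01T00:00:00Z"),
    "grade": "A",
}
sig = sign_report(report)

# Any modification to the report invalidates the signature
tampered = dict(report, grade="B")
assert sign_report(tampered) != sig
```

Canonical serialization (sorted keys) matters: the same report must always hash to the same bytes, or verification breaks for cosmetic reasons.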
Fair Market Pricing Formula
The price output uses a three-factor weighted formula:
- 40% spec-to-price: Base value from device model, storage tier, RAM, and release year
- 40% depreciation: Modeled depreciation curve per device family, calibrated against market data
- 20% repair cost adjustment: Deducted based on detected issues, using regional market repair cost data
We maintain 55 country-specific pricing weights. A Galaxy A52 holds value very differently in India versus Germany versus Nigeria — the pricing model accounts for this rather than applying a single global curve.
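One possible reading of that formula as code (the blend structure is our assumption for illustration; the real model's inputs and per-country weights are more involved):

```python
def fair_market_price(spec_value: float,
                      depreciation_factor: float,
                      repair_cost: float) -> float:
    """Three-factor weighted price estimate.

    spec_value:          base value from model, storage tier, RAM, release year
    depreciation_factor: 0.0-1.0 retained-value multiplier for the
                         device family in the target market
    repair_cost:         regional cost to fix all detected issues
    """
    return (0.40 * spec_value                          # spec-to-price
            + 0.40 * spec_value * depreciation_factor  # depreciation
            - 0.20 * repair_cost)                      # repair adjustment
```

The country-specific weight sets would then supply region-appropriate depreciation curves and repair cost tables per market.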
What We're Still Improving
- Expanding the cosmetic CV pipeline to metal and plastic housing materials (currently optimized for glass backs)
- Better low-light scratch detection for shops without controlled lighting setups
- Improving FFT accuracy on devices with multi-driver speaker arrays
- Faster silent test parallelization to push total assessment time under 90 seconds
If you're working on computer vision for hardware inspection, audio signal analysis, or trust and verification systems for physical goods, I'd genuinely love to connect — there are interesting unsolved problems here that mainstream ML research doesn't pay much attention to.