DEV Community: Gagandeep Singh

Gagandeep Singh for Innerkore Technologies

AI Can Build Your UI in Seconds. Who's Handling the Forms?

Gagandeep Singh — Fri, 03 Jul 2026 07:23:37 +0000

We Keep Reinventing the Same Wheel

Every project I've started in the last five years has had the same moment. You build the UI, you wire up state, you get the design looking sharp — and then someone says "we need a contact form." Or a feedback form. Or a waitlist sign-up.

And you spend the next afternoon doing the same thing you've done a dozen times before: writing a route that accepts a POST, validating the body, pushing rows into a Google Sheet, sending a confirmation email, hoping nothing breaks at 2am.

FormProxy exists because I got tired of that afternoon.

What FormProxy Actually Does

FormProxy is a form backend. That sounds small until you think about what "form backend" actually means:

Accepting submissions from any HTML form or API call, without you writing a handler
Storing everything with configurable retention (7 days on free, 90 days on paid plans)
Routing submissions to wherever your team actually lives — Slack, email, Google Sheets, webhooks, Zapier
Managing multiple forms across workspaces, with per-form signing secrets and activation toggles

You drop a form endpoint URL into your <form action="..."> and you're done. No backend code. No database schema. No queue to maintain.

Why This Matters Even More Now That Everyone Has AI

Here's the thing nobody says out loud: AI has made it dramatically easier to build frontends, but the boring infrastructure behind forms hasn't gotten any smarter.

ChatGPT or Claude can generate a beautiful, accessible contact form in seconds. Cursor can wire up the validation. But the moment that form needs to do something — store a submission, notify Slack, sync to a spreadsheet — you're back to writing SMTP handlers and webhook endpoints by hand.

The gap between "AI can build my UI" and "AI can run my backend" is real. Tools like v0, Lovable, and bolt.new are accelerating frontend development faster than backend infrastructure can keep up.

FormProxy fills exactly that gap. When your AI-generated landing page needs a "Join Waitlist" form that:

Stores signups with timestamps
Pings your team in Slack instantly
Syncs every row to a Google Sheet your non-technical co-founder can read
Sends a confirmation email to the user

...you don't need to write any of that. You configure it in a UI and point your form at the endpoint.

The Integrations Are the Point

The real value isn't storage — it's routing. Here's what FormProxy can do with a submission the moment it lands:

Google Sheets

Every submission becomes a row. Columns are inferred automatically from your form fields — add a new field to your form and it appears as a new column on the next submission, no schema migration required.
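As a rough illustration of that column-inference behavior, here is a minimal in-memory sketch. It is not FormProxy's implementation: the function name and the list-of-lists "sheet" are hypothetical stand-ins for the real Google Sheets calls.

```python
def append_submission(header: list[str], rows: list[list[str]],
                      submission: dict[str, str]) -> None:
    """Append one submission, adding a column for any field not seen before."""
    for field in submission:
        if field not in header:
            header.append(field)
    # Fields missing from this submission become empty cells. Older rows stay
    # shorter than the header, which a spreadsheet shows as blank cells.
    rows.append([submission.get(column, "") for column in header])


header: list[str] = []
rows: list[list[str]] = []
append_submission(header, rows, {"email": "a@example.com"})
append_submission(header, rows, {"email": "b@example.com", "company": "Acme"})

print(header)  # ['email', 'company']
print(rows)    # [['a@example.com'], ['b@example.com', 'Acme']]
```

The new "company" field shows up as a second column on the submission that introduces it, and earlier rows are left untouched.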

We use Google's OAuth flow with drive.file scope (not the overly broad spreadsheets scope), so you retain control of exactly which files FormProxy can touch. You pick the spreadsheet via a Picker UI — no copy-pasting spreadsheet IDs.

Slack

Instant notification to any channel. Good for lead forms, support requests, anything your team wants to see in real-time.

Webhooks

Full POST to any URL with HMAC signing (using the form's signing secret), so your own services can verify the submission is genuine before processing it.
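On the receiving side, verification could look roughly like the sketch below. The post only says the POST is HMAC-signed with the form's signing secret. The SHA-256 digest, the hex encoding, and the way the signature reaches your service are assumptions here, so check FormProxy's documentation for the actual scheme.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (algorithm and encoding are assumed)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_genuine(secret: str, body: bytes, received_signature: str) -> bool:
    """Compare in constant time so the check does not leak timing information."""
    return hmac.compare_digest(sign(secret, body), received_signature)


# Hypothetical values, for illustration only.
secret = "form-signing-secret"
body = b'{"email": "user@example.com"}'
signature = sign(secret, body)  # stands in for the value sent with the webhook

print(is_genuine(secret, body, signature))                 # True
print(is_genuine(secret, body + b" tampered", signature))  # False
```

Whatever the real scheme is, the signature has to be computed over the raw request bytes, before any JSON parsing or re-serialization changes them.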

Email

Confirmation emails to submitters, notification emails to your team. Configurable templates, no mail server setup on your end.

Zapier

One integration that unlocks 5,000+ apps. If FormProxy doesn't have a native integration for your tool, Zapier fills the gap.

A Real Example: AI-Generated SaaS Landing Page

Here's a workflow I've used in production:

Use v0 to generate a landing page with a "Get Early Access" form
Point <form action="https://app.formproxy.com/f/{uid}"> at FormProxy
Configure: Google Sheets sync (so I have a CRM-lite spreadsheet), Slack notification (so I see signups immediately), email reply (so users get a confirmation)
Done in under 10 minutes

No Netlify Functions. No serverless cold starts. No "oh we missed 3 signups because the function timed out."

The AI built the frontend. FormProxy handled everything else.

What's Next

We're actively building:

AI-powered submission summaries — ask natural language questions about your form data
Conditional routing — send to Slack only if a field contains a certain value
Team comments on submissions
File uploads — already supported on paid plans, expanding to more types

If any of this resonates, try it: app.formproxy.com. Free tier is genuinely useful — 3 forms, 7-day retention, all integrations.

Would love to hear what integrations you'd want to see next. Drop a comment.

FormProxy is built with FastAPI + Next.js

Part 2: Why We Built Our Own TinyBERT (and How It Beat Shiprocket's) - Indian Address Parser

Gagandeep Singh — Fri, 03 Jul 2026 06:41:45 +0000

This is Part 2 of a series on building an open-source Indian address parser. Part 1 covered fine-tuning Qwen3-0.6B with LoRA and our first benchmark against Shiprocket's open-tinybert-indian-address-ner. This post covers the third and final model in the series, and what happened when we pointed it at the exact model that inspired it.

The itch Shiprocket's benchmark left behind

When we first benchmarked our Qwen3-0.6B model against Shiprocket's open-tinybert-indian-address-ner, the headline was good news wrapped in a caveat: we won on every one of the nine conceptually-shared fields, sometimes by a wide margin — but Shiprocket's model was doing it with a 6-layer, 768-hidden BERT variant that ran in 19 milliseconds per address on CPU. Ours took over four seconds. That's not a rounding error; that's a 240x gap.

It's easy to wave that away — "different tradeoff, different use case" — and mostly that's true. But it kept nagging at us. Shiprocket had clearly made a deliberate choice: trade some accuracy for a model small enough to run in a hot path, cheaply, at scale. We'd built the opposite thing. Neither choice is wrong, but we only had one point on that curve.

So the natural next step wasn't "make Qwen faster." It was: what if we made the same architectural bet Shiprocket did, but trained it on our own gold-labeled data? Same idea — small BERT encoder, BIO tagging instead of JSON generation — applied to our 13-field schema instead of theirs.

The base model for that bet is huawei-noah/TinyBERT_General_4L_312D: 4 layers, 312 hidden dimensions, about 14 million parameters. For comparison, that's roughly 5x smaller than our flan-t5-small model and 40x smaller than the Qwen3-0.6B LoRA setup. It fine-tunes in about two minutes on a laptop.

The catch: our data wasn't built for this

Here's the thing nobody tells you when you decide "let's just add a BERT-style token classifier": your training data has to actually support it, and if you've been building a generative-model pipeline for months, it probably doesn't — not in the shape you need.

Every model in this project so far had been trained the same way: given a raw address string, generate a JSON object mapping 13 field names to substrings of that address. The gold labels were always verbatim extractions — never paraphrased, never normalized. If the source text said "Kamrup Unclassified AS 781029", the district field was exactly "Kamrup", copied character-for-character, not corrected to "Kamrup Metropolitan" or expanded to "Assam".

Token classification wants something different: a BIO tag (B-district, I-district, O, ...) on every single token. We didn't have that. What we had was JSON.

The good news is that "verbatim extraction" and "convertible to BIO tags" are almost the same property. If a gold value is a real substring of the raw address, you can find its character span, then map that span onto whatever tokens your tokenizer produces. We measured how often that actually holds:

exact substring found: 25,910 / 25,915 gold field values (99.98%)

Nearly perfect. The handful of misses were genuine data artifacts — cases where two fields got glued together during earlier normalization ("PALASHBARI Kamrup" spanning a comma that got collapsed somewhere upstream). Not something a training script should paper over; we just skip labeling those and move on.
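The substring check and the span-to-tag mapping can be sketched together. This is a simplified illustration, not the project's script: it tokenizes on whitespace, whereas a real pipeline would use the subword tokenizer's character offsets, and it takes only the first occurrence of each value.

```python
import re
from typing import Optional


def find_span(raw: str, value: str) -> Optional[tuple[int, int]]:
    """Character span of the first verbatim occurrence of value, or None."""
    start = raw.find(value)
    return None if start == -1 else (start, start + len(value))


def bio_tags(raw: str, spans: dict[str, tuple[int, int]]) -> list[tuple[str, str]]:
    """Tag whitespace tokens B-/I-<field> when they fall inside a span, else O."""
    tagged = []
    for match in re.finditer(r"\S+", raw):
        tag = "O"
        for field, (start, end) in spans.items():
            if start <= match.start() and match.end() <= end:
                tag = ("B-" if match.start() == start else "I-") + field
                break
        tagged.append((match.group(), tag))
    return tagged


raw = "MG ROAD GUWAHATI Kamrup AS 781029"
gold = {"street": "MG ROAD", "city": "GUWAHATI", "district": "Kamrup",
        "locality": "PALASHBARI Kamrup"}  # the last value is a glued artifact

# Keep only values that are real substrings; misses are skipped, not force-fitted.
spans: dict[str, tuple[int, int]] = {}
for field, value in gold.items():
    span = find_span(raw, value)
    if span is not None:
        spans[field] = span
print(f"exact substring found: {len(spans)} / {len(gold)}")  # 3 / 4

for token, tag in bio_tags(raw, spans):
    print(f"{token:<10}{tag}")
```

The glued value never gets a span, so its tokens are tagged from whichever other field covers them, or O.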

The bug that almost shipped: duplicate values

The harder problem showed up once we started converting real examples. Consider an address where both city and district are gold-labeled as "Chandigarh" — genuinely happens, since Chandigarh is its own city and district. A naive raw_text.find("Chandigarh") always returns the first occurrence. Both fields collapse onto the same span. One of them silently loses its label.

We caught this by measuring, not by inspection — we ran a full round-trip test (gold → character spans → BIO tags → reconstructed fields) across the training set and it came back at 93.75% instead of the >99% we expected. Digging into the mismatches surfaced the collision pattern immediately: any two fields sharing a value, wherever the text happened to repeat that value, were fighting over the same characters.
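A round-trip check of this kind can be sketched as follows. The decoder and the toy example are illustrative assumptions. The tags are hand-written to mimic the collision, with city and district both mapped onto the first "Chandigarh", so only one of them keeps a label.

```python
def decode_bio(tokens: list[str], tags: list[str]) -> dict[str, str]:
    """Rebuild field values by joining each B-/I- run of tokens with spaces."""
    fields: dict[str, str] = {}
    current = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            fields[current] = token
        elif tag.startswith("I-") and current == tag[2:]:
            fields[current] += " " + token
        else:
            current = None
    return fields


def round_trip_rate(examples: list[tuple[dict[str, str], list[str], list[str]]]) -> float:
    """Share of gold field values recovered exactly after decoding the tags."""
    recovered = total = 0
    for gold, tokens, tags in examples:
        decoded = decode_bio(tokens, tags)
        for field, value in gold.items():
            total += 1
            recovered += decoded.get(field) == value
    return recovered / total


gold = {"locality": "SECTOR 17", "city": "Chandigarh",
        "district": "Chandigarh", "pincode": "160017"}
tokens = ["SECTOR", "17", "Chandigarh", "Chandigarh", "160017"]
# The second "Chandigarh" stays O because both fields claimed the first one.
tags = ["B-locality", "I-locality", "B-city", "O", "B-pincode"]

print(round_trip_rate([(gold, tokens, tags)]))  # 0.75
```

Nothing raises an error when a label is lost, so the loss only shows up as a rate like this one.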

The fix: track every occurrence of a value in the text, not just the first, and let the overlap-resolution logic (which already claims longer spans before shorter ones, so "village" doesn't get clobbered by a "locality" substring it happens to contain) pick a distinct occurrence for each field. That brought the round-trip ceiling up to 96.68% — which we now treat as this data's honest upper bound, not a bug to keep chasing. The residual gap is two well-understood cases: fields sharing a value with too few occurrences to give each one its own span, and tokens that straddle a span boundary on already-documented data artifacts (the same glued-substring issue that shows up elsewhere in this project's known limitations).
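One way to implement that idea is sketched below, under stated assumptions. The function names are hypothetical, longer values claim their span first, and ties between equal-length values are broken by dictionary order. The project's actual overlap-resolution logic may differ.

```python
def all_occurrences(text: str, value: str) -> list[tuple[int, int]]:
    """Every (start, end) span where value occurs in text, left to right."""
    spans = []
    start = text.find(value)
    while start != -1:
        spans.append((start, start + len(value)))
        start = text.find(value, start + 1)
    return spans


def assign_spans(text: str, gold: dict[str, str]) -> dict[str, tuple[int, int]]:
    """Give each field its own span: longer values claim first, then the first
    occurrence that does not overlap anything already claimed."""
    claimed: list[tuple[int, int]] = []
    assigned: dict[str, tuple[int, int]] = {}
    for field, value in sorted(gold.items(), key=lambda item: -len(item[1])):
        for start, end in all_occurrences(text, value):
            if all(end <= s or start >= e for s, e in claimed):
                claimed.append((start, end))
                assigned[field] = (start, end)
                break
    return assigned


gold = {"city": "Chandigarh", "district": "Chandigarh", "pincode": "160017"}

spans = assign_spans("SECTOR 17 Chandigarh Chandigarh 160017", gold)
print(spans)  # city and district now sit on different occurrences

# With only one occurrence in the text, one of the two fields stays unlabeled.
short = assign_spans("SECTOR 17 Chandigarh 160017", gold)
print(sorted(short))  # ['city', 'pincode']
```

The second call shows the first residual case: when fewer occurrences exist than fields sharing the value, no span assignment can recover every label.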

Training and the surprising result

With the BIO conversion pipeline verified, training itself was almost anticlimactic. Ten epochs, batch size 32, cosine learning rate schedule — done in 133 seconds on Apple Silicon. Eval loss dropped from 1.91 to 0.77 and plateaued cleanly around epoch 8.
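For reference, a cosine schedule over that run looks roughly like this. The epoch count, batch size, and training-set size come from the post. The peak learning rate and the absence of warmup are assumptions, since neither is stated.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Cosine decay from base_lr at step 0 down to 0 at total_steps (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


EPOCHS, BATCH_SIZE = 10, 32   # from the post
TRAIN_EXAMPLES = 4110         # training-set size quoted later in the post
BASE_LR = 5e-5                # assumed; the post does not state it

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = EPOCHS * steps_per_epoch
for epoch in (0, 5, 8, 10):
    rate = cosine_lr(epoch * steps_per_epoch, total_steps, BASE_LR)
    print(f"epoch {epoch:>2}: lr = {rate:.2e}")
```

Under this schedule the rate at epoch 8 is already below a tenth of its peak, so a flattening loss curve late in training is unsurprising.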

Then evaluation came back and it was, frankly, better than expected:

Model	Params	Mean field accuracy
Qwen3-0.6B + LoRA	~596M	82.4%
flan-t5-small	~77M	80.6%
TinyBERT 4L/312D	~14M	78.8%

A model 40x smaller than our best one landed within four points of it. It's not free — subLocality and village recall are both effectively 0%, meaning the model just defaults to null on those far more than gold does, a real weakness worth being upfront about. But on the fields that matter most for downstream use (district, state, city, pincode), it's solidly in the same range as its much bigger siblings.

The comparison that actually mattered

All of this was interesting on its own, but it wasn't the real test. The real test was going back to the model that started this whole detour: Shiprocket's open-tinybert-indian-address-ner.

Same name. Same task family — BIO tagging on Indian addresses. This should be the closest thing to an apples-to-apples comparison in the whole project.

Except it wasn't, and we found that out the moment we checked the config instead of assuming:

shiprocket-ai/open-tinybert-indian-address-ner
  hidden_size: 768, num_hidden_layers: 6
  params: 66,382,103

Despite the "tinybert" name, Shiprocket's model is a 6-layer, 768-hidden BERT — closer in scale to BERT-base than to the original TinyBERT paper's smallest configuration. It has 4.7x more parameters than ours. We reported that up front rather than letting a same-name comparison imply a same-size one.
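The size gap can be sanity-checked from the config numbers alone. The estimate below assumes the standard BERT vocabulary (30,522), 512 positions, the usual feed-forward widths (1,200 for the 4L/312D TinyBERT, 3,072 for a 768-hidden model), no pooler, and guessed label counts. None of these are stated in the post, and the label count barely moves the total.

```python
def bert_token_classifier_params(hidden: int, layers: int, intermediate: int,
                                 num_labels: int, vocab: int = 30522,
                                 max_positions: int = 512, type_vocab: int = 2) -> int:
    """Parameter count for a BERT encoder plus a linear token-classification head."""
    layer_norm = 2 * hidden  # weight and bias
    embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm
    attention = 4 * (hidden * hidden + hidden) + layer_norm  # Q, K, V, output
    feed_forward = (hidden * intermediate + intermediate
                    + intermediate * hidden + hidden) + layer_norm
    head = hidden * num_labels + num_labels
    return embeddings + layers * (attention + feed_forward) + head


# num_labels values are assumptions (B-/I- per field plus O).
ours = bert_token_classifier_params(hidden=312, layers=4, intermediate=1200, num_labels=27)
theirs = bert_token_classifier_params(hidden=768, layers=6, intermediate=3072, num_labels=23)

print(f"4L/312D: ~{ours / 1e6:.1f}M")
print(f"6L/768D: ~{theirs / 1e6:.1f}M")
print(f"ratio:   {theirs / ours:.1f}x")
```

Most of the smaller model's parameters sit in the embedding table, which is why shrinking the hidden size matters more than dropping two layers.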

With that caveat stated plainly, we ran both models on the same 237-example held-out gold test set:

Field	Ours (4L/312D, ~14M)	Shiprocket (6L/768D, ~66.4M)
houseNumber	79.8%	27.1%
houseName	81.7%	72.1%
street	50.0%	27.0%
locality	36.5%	6.7%
city	82.6%	17.4%
state	84.2%	41.5%
pincode	99.2%	69.2%
poi	20.5%	10.3%
subLocality	0.0%	0.0%

We won on eight of the nine shared fields; the ninth, subLocality, is a 0.0% tie. The margins are not narrow: on city it's 82.6% vs 17.4%, and on houseNumber it's 79.8% vs 27.1%. The accuracy didn't cost speed either: 11ms/address vs 16ms/address, in line with ours being the smaller model.

We didn't take the win at face value either. When we inspected Shiprocket's raw, unaggregated per-token predictions, the pattern was clear: on longer administrative-suffix text, the model's tag predictions genuinely flip-flop mid-word, with confidence scores dropping to 0.3–0.5 exactly where that happens. For "Kamrup Unclassified", the token "Kam" gets tagged B-sub_locality at 0.45 confidence, and the very next token, "rup", gets tagged I-locality at 0.42 — genuinely uncertain, internally inconsistent output, not an artifact of how we ran the comparison.
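That kind of inspection can be automated with a small consistency check over raw per-token output. The helper below is a hypothetical sketch. The two tokens, tags, and scores are the ones quoted above, and the 0.5 threshold is an arbitrary choice.

```python
def inconsistent_transitions(tags: list[str]) -> list[int]:
    """Indices where an I- tag does not continue an entity of the same type."""
    bad = []
    previous = "O"
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and previous[2:] != tag[2:]:
            bad.append(i)
        previous = tag
    return bad


tokens = ["Kam", "rup"]
tags = ["B-sub_locality", "I-locality"]
scores = [0.45, 0.42]

flips = inconsistent_transitions(tags)
low_confidence = [i for i, score in enumerate(scores) if score < 0.5]

print([tokens[i] for i in flips])           # ['rup']
print([tokens[i] for i in low_confidence])  # ['Kam', 'rup']
```

A token that appears in both lists is the pattern described here: the tag changes type mid-word exactly where the model is least sure.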

Our read on why the gap is this large: fine-tuning on task-specific gold data seems to matter more here than raw parameter count. Shiprocket's model is bigger, but it wasn't fine-tuned on this exact 13-field taxonomy and this exact address distribution. Ours was — on the same 4,110 verbatim-extraction examples that trained the Qwen3 and flan-t5-small models before it.

Where this leaves the project

Three models now sit behind one interface:

from indian_address_parser import AddressParser

parser = AddressParser()                  # tinybert — the default now
parser = AddressParser(backend="t5")       # a couple points more accurate, slower
parser = AddressParser(backend="qwen")     # the most accurate, and the heaviest

TinyBERT became the default in v0.3.0 as a deliberate choice, not by omission. It's the cheapest model in the series to download and run (a single forward pass instead of autoregressive generation, no adapter/base-model split to manage), and it gives up only a few points of accuracy to do it. For most people integrating this into a pipeline, that's the right trade to start from; the other two backends are one keyword argument away when the accuracy matters more than the footprint.

All three models, the benchmark scripts, and the full per-field breakdowns are public:

Code & benchmarks: github.com/innerkorehq/indian-address-parser
TinyBERT model: huggingface.co/gagan1985/tinybert-4l-312d-indian-address-parser
PyPI: pip install indian-address-parser

If you want to reproduce the Shiprocket comparison yourself, it's a five-minute run:

pip install indian-address-parser transformers torch
git clone https://github.com/innerkorehq/indian-address-parser
cd indian-address-parser/benchmarks
python compare_tinybert.py --out results.json

Have you run into the same "same architecture name, different actual size" trap comparing models? Curious what other small-model fine-tunes people have benchmarked against their pretrained namesakes — drop it in the comments.

Building an Open-Source Indian Address Parser: From Raw MCA/Bank Data to a Fine-Tuned LLM

Gagandeep Singh — Thu, 02 Jul 2026 08:32:43 +0000

Cross-posting the full pipeline — data labeling, LoRA fine-tuning, cross-framework conversion, and a benchmark against an existing NER model — because most of the interesting bugs weren't in the ML at all.

The problem

Indian addresses are notoriously unstructured. A single line can look like this:

FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029

House number, building name, street, locality, district, state, and pincode — all jammed into one free-text string with zero consistent formatting. If you've worked with Indian company registry data, bank KYC records, or delivery logistics, you already know this pain.

I set out to build something that turns strings like the above into:

{
  "houseNumber": "FLAT NO.32",
  "houseName": "UTTARA TOWERS",
  "street": "MG ROAD",
  "city": "GUWAHATI",
  "district": "Kamrup",
  "state": "AS",
  "pincode": "781029",
  "poi": null, "subsubLocality": null, "subLocality": null,
  "locality": null, "village": null, "subDistrict": null
}

13 fields, always present, null when absent. Here's the whole pipeline, warts included.

Getting labeled data without a labeling budget

Starting point: 4.37M raw addresses from two very differently-shaped sources — Indian MCA (Ministry of Corporate Affairs) company registrations, and bank/business-correspondent branch records. No labels.

Manual labeling doesn't scale to that volume, so the pipeline is layered:

Rule-based tagging — regex + gazetteer cross-checks (pincode → district/state lookup from India Post's official pincode CSV) give every record a confidence score. High-confidence ones auto-accept as "silver" labels.
LLM-assisted labeling for the rest — batched calls to an LLM via OpenRouter, with a system prompt that requires every extracted value to be copied verbatim from the source text. If the model's field value isn't a substring of the input, it gets dropped rather than trusted. This alone eliminates a whole class of hallucination.
A small human-reviewed slice as a sanity check against the LLM's own accuracy before scaling up.
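The verbatim rule in the second step is simple to enforce in code. A minimal sketch, with a hypothetical function name and a made-up LLM response:

```python
from typing import Optional


def keep_verbatim(raw: str, predicted: dict[str, Optional[str]]) -> dict[str, Optional[str]]:
    """Null out any predicted value that is not an exact substring of raw."""
    return {field: value if value is not None and value in raw else None
            for field, value in predicted.items()}


raw = "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029"
llm_output = {
    "city": "GUWAHATI",
    "district": "Kamrup Metropolitan",  # "corrected" by the model, not in the text
    "state": "Assam",                   # expanded from "AS", not in the text
    "pincode": "781029",
}

print(keep_verbatim(raw, llm_output))
# {'city': 'GUWAHATI', 'district': None, 'state': None, 'pincode': '781029'}
```

The filter is deliberately strict: a plausible but paraphrased value is dropped, not trusted.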

One subtlety that actually mattered: MCA addresses have a machine-generated tail like "...Kamrup Unclassified AS 781029", where "Unclassified" is a fixed placeholder meaning "no sub-district classification recorded" — not a place name. Early runs had the LLM tagging "Unclassified" as a subDistrict value. Fixed by explicitly teaching the model about this convention in the prompt. Small thing, but it's the kind of domain quirk no generic address parser would know to avoid.

Also worth calling out: field taxonomy design is harder than model training. The first schema (Google Maps' full geocoding component taxonomy, 35 types) was too granular for anyone — human or LLM — to label consistently. Collapsed it to 13 fields based on what a human reviewer could actually apply without agonizing over edge cases.

Fine-tuning

LoRA on Qwen/Qwen3-0.6B, trained via MLX on an M4 Mac (mlx-lm's lora command — genuinely pleasant to work with on Apple Silicon, no CUDA/bitsandbytes wrangling).

rank=16, alpha=32, dropout=0.05
target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
16 of 28 layers fine-tuned, 2000 iterations, ~1.8 hours

Results on a 237-example held-out gold test set:

Metric	Value
JSON parse rate	100%
Mean per-field accuracy	82.4%
Overall exact match (all fields)	30.8%

The gap between per-field accuracy and exact-match is the interesting bit. Digging into disagreements, most of it isn't the model being wrong — it's schema ambiguity. locality/subLocality/subsubLocality/village represent the same "named area, different granularity" concept, and even the gold labels are sometimes inconsistent about which bucket a given place name belongs in (I found gold records where the same string was labeled as both locality and village simultaneously). That's a taxonomy problem, not a model problem, and no amount of additional training fixes it without a firmer labeling convention.
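A toy example makes the effect concrete. The evaluation function and the three records are illustrative, not the project's benchmark code. Each misplaced place name costs two fields and the whole record's exact match.

```python
def evaluate(gold: list[dict], pred: list[dict],
             fields: list[str]) -> tuple[dict[str, float], float]:
    """Return ({field: accuracy}, exact-match rate over all listed fields)."""
    n = len(gold)
    per_field = {f: sum(g.get(f) == p.get(f) for g, p in zip(gold, pred)) / n
                 for f in fields}
    exact = sum(all(g.get(f) == p.get(f) for f in fields)
                for g, p in zip(gold, pred)) / n
    return per_field, exact


fields = ["city", "locality", "village"]  # a subset of the 13, to keep this short
gold = [
    {"city": "A", "locality": "X", "village": None},
    {"city": "B", "locality": "Y", "village": None},
    {"city": "C", "locality": None, "village": "Z"},
]
pred = [
    {"city": "A", "locality": "X", "village": None},
    {"city": "B", "locality": None, "village": "Y"},  # right name, wrong bucket
    {"city": "C", "locality": "Z", "village": None},  # right name, wrong bucket
]

per_field, exact = evaluate(gold, pred, fields)
print(per_field)
print(exact)

# If errors were independent across 13 fields at 82.4% each, exact match
# would be far lower than the reported 30.8%.
print(f"{0.824 ** 13:.3f}")
```

The last line comes out near 0.08, which suggests the real errors cluster in a minority of records and are not spread evenly across them.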

Getting it to run outside MLX

This is where most of the actual debugging time went, and none of it was ML.

mlx-lm produces its own adapter format — not PEFT-compatible. To make the model usable on CUDA/CPU (not just Apple Silicon), I had to hand-derive the weight conversion:

# mlx-lm: lora_a [in_features, r], lora_b [r, out_features], used as x @ A @ B
# PEFT:   lora_A.weight [r, in_features], lora_B.weight [out_features, r]
# So: peft_A = mlx_a.T, peft_B = mlx_b.T

I verified this against mlx-lm's own fuse() source (delta = (scale * lora_b.T) @ lora_a.T) rather than trusting my own derivation, then confirmed numerically — ran the same 15 addresses through both the original MLX adapter and the converted PEFT version. 13/15 identical outputs; the 2 mismatches landed exactly on the already-known-ambiguous fields, consistent with floating-point differences between backends on a near-tied softmax decision rather than a conversion bug.
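The transpose relationship can be checked on tiny matrices without any ML framework. This sketch uses plain Python lists and made-up values, and leaves out the LoRA scale factor, which multiplies both sides equally.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product of two row-major matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*m)]


# mlx-lm layout: lora_a is [in_features, r], lora_b is [r, out_features]
mlx_a = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]           # in_features=3, r=2
mlx_b = [[1.0, 0.0, 2.0, -1.0], [0.5, 1.0, 0.0, 2.0]]   # r=2, out_features=4
x = [[1.0, 2.0, 3.0]]                                   # one input row

mlx_out = matmul(matmul(x, mlx_a), mlx_b)               # x @ A @ B

# PEFT layout: lora_A.weight is [r, in_features], lora_B.weight is [out_features, r]
peft_A = transpose(mlx_a)
peft_B = transpose(mlx_b)
# A linear layer computes x @ weight.T, so the PEFT update is x @ A.T @ B.T
peft_out = matmul(matmul(x, transpose(peft_A)), transpose(peft_B))

# Fused form: delta = lora_b.T @ lora_a.T has shape [out_features, in_features]
delta = matmul(transpose(mlx_b), transpose(mlx_a))
fused_out = matmul(x, transpose(delta))

print(mlx_out)  # [[11.0, 0.0, 22.0, -11.0]]
print(mlx_out == peft_out)
```

All three paths give the same update for this input, which is the property the conversion relies on.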

Publishing, and the dependency-floor whack-a-mole

Published the model to Hugging Face (both formats — PEFT at root, MLX in a subfolder), then wrapped it as a pip install-able package: indian-address-parser on PyPI, source on GitHub.

Then real users tried to install it into their existing environments (Anaconda base envs, specifically), and things broke in sequence:

peft imports transformers.BloomPreTrainedModel, whose lazy-loading chain unconditionally does import tensorflow. In a conda env with a mismatched TF/numpy/h5py install, that crashed the whole thing before ever touching TensorFlow functionality. Fix: os.environ["USE_TF"] = "0" before any transformers/peft import, so transformers' TF-detection short-circuits.
qwen3 model type not recognized. Turns out transformers only added Qwen3 support at exactly version 4.51.0 — verified by bisecting real PyPI releases (4.50.0: no, 4.51.0: yes). My dependency floor (>=4.45.0) was loose enough that pip left an old transformers in place instead of upgrading it.
hf_hub_download() got an unexpected keyword argument 'use_auth_token'. peft<0.18.0 unconditionally passes use_auth_token=None into hf_hub_download, regardless of whether the caller asked for it. Recent huggingface_hub (1.x) dropped that deprecated kwarg entirely. Bisected peft's source across ten versions to find the exact fix boundary (0.17.1: unconditional pass, 0.18.0: conditional via walrus operator).

Each fix was verified against the actual reported failure, not just plausible-sounding — I built a venv pinned to the exact stale dependency trio from the bug report, installed the patched package, confirmed pip auto-upgraded everything, and ran real inference before calling it fixed.

The lesson, if there is one: >=X.Y.Z floors need to be the actual minimum that works, verified, not "whatever I happened to have installed while developing." Loose floors don't fail for you — they fail for whoever has an older version already sitting in their environment.
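A floor check like the following can run in CI or at import time. It is a simplified sketch: the floors are the two boundaries found above, and the version parsing only looks at the first three numeric components. A real check would be better served by the `packaging` library.

```python
import re
from importlib import metadata

# Floors taken from the failures described above.
FLOORS = {"transformers": "4.51.0", "peft": "0.18.0"}


def version_tuple(version: str) -> tuple[int, ...]:
    """First three numeric components only: '4.51.0.dev0' -> (4, 51, 0)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def below_floor(installed: dict[str, str]) -> dict[str, str]:
    """Packages whose installed version is older than the declared floor."""
    return {name: version for name, version in installed.items()
            if name in FLOORS and version_tuple(version) < version_tuple(FLOORS[name])}


def installed_versions() -> dict[str, str]:
    """Versions of the floored packages present in the current environment."""
    found = {}
    for name in FLOORS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed, so nothing to compare
    return found


# The last failing versions named above, one release below each floor.
print(below_floor({"transformers": "4.50.0", "peft": "0.17.1"}))
print(below_floor(installed_versions()))
```

Testing against the version just below each floor is what turns a guessed minimum into a verified one.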

Benchmarking against an existing model

Once things were stable, I compared against Shiprocket's open-tinybert-indian-address-ner — a 6-layer TinyBERT doing BIO-tagged token classification, a fundamentally different architecture (and a different field taxonomy) than a 0.6B causal LM generating JSON.

Built an explicit field mapping covering the 9 conceptually-overlapping fields (their house_details ↔ my houseNumber, road ↔ street, etc.) and scored both against the same 237-example held-out set:

Field	Mine	Shiprocket's
city	91.3%	17.4%
state	96.2%	41.5%
pincode	100.0%	69.2%
houseNumber	84.5%	27.1%

Higher accuracy on every shared field — but Shiprocket's model is ~240x faster per address (19ms vs 4.6s). That's not a quality artifact, it's architecture: a 6-layer classifier doing a single forward pass vs. autoregressive generation. If your use case needs high-throughput/low-latency parsing over perfect accuracy, that's a legitimate reason to pick the other model. I'd rather publish that tradeoff honestly than pretend the comparison only cuts one way.
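To put the latency gap in workload terms, here is the arithmetic on the two per-address figures quoted above. Using the 4.37M-record raw corpus as the workload, single-threaded with no batching, is an assumption made only for scale.

```python
SECONDS_PER_ADDRESS = {"qwen3-0.6b-lora": 4.6, "shiprocket-tinybert": 0.019}  # from the post

speedup = SECONDS_PER_ADDRESS["qwen3-0.6b-lora"] / SECONDS_PER_ADDRESS["shiprocket-tinybert"]
print(f"speedup: {speedup:.0f}x")  # speedup: 242x

RAW_CORPUS = 4_370_000  # raw corpus size, used here only as an example workload
hours = {name: RAW_CORPUS * seconds / 3600
         for name, seconds in SECONDS_PER_ADDRESS.items()}
for name, value in hours.items():
    print(f"{name}: {value:,.0f} hours single-threaded")
```

At that scale the difference is roughly a day of compute against most of a year, which is the practical meaning of the tradeoff.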

Publishing the data too

Also shipped the underlying data as two HF datasets:

indian-addresses-raw — the full 4.37M-record unlabeled corpus
indian-addresses-gold — 4,834 span-labeled training examples

Before publishing the raw corpus, I found something worth mentioning: bank/BC address records are KYC-style data, and some of them embed real customer phone numbers and relational-name markers (S/O, D/O, W/O, C/O: "son of", "daughter of", "wife of", "care of", standard on Indian address forms). That's different from MCA's superficially similar C/O <company director> convention, which is already public disclosure. I wrote a targeted redaction pass for the bank source (verified against the corpus, not assumed; it caught a "Door No." vs "D/O [name]" false-positive collision along the way). For the gold dataset specifically, I dropped the small number of affected records instead of redacting in place, since redacting text shifts the character offsets that the span labels depend on.
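A redaction pass in this spirit might look like the sketch below. The patterns are assumptions, not the ones actually used. The phone regex targets 10-digit Indian mobile numbers, and the loose pattern is included only to show one way a "Door No." collision could arise.

```python
import re

# Strict: the slash is required, so "Door No." cannot match.
RELATION = re.compile(r"\b([SDWC]/O)[\s.:]+[^,]+", re.IGNORECASE)
# Loose variant, shown only to illustrate how a false positive could happen.
LOOSE_RELATION = re.compile(r"\b[SDWC]/?O", re.IGNORECASE)
# 10 digits starting 6-9, optional +91 prefix; a 6-digit pincode never matches.
PHONE = re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)")


def redact(text: str) -> str:
    """Replace the name after a relational marker, then any phone number."""
    text = RELATION.sub(lambda m: f"{m.group(1)} [NAME]", text)
    return PHONE.sub("[PHONE]", text)


record = "D/O RAMESH KUMAR, Door No. 12, MG ROAD GUWAHATI 781029 PH 9876543210"
redacted = redact(record)

print(redacted)
# D/O [NAME], Door No. 12, MG ROAD GUWAHATI 781029 PH [PHONE]
print(LOOSE_RELATION.search("Door No. 12") is not None)  # True: the collision
print(len(record), len(redacted))  # lengths differ, so character offsets shift
```

The changed length is why in-place redaction is unsafe for span-labeled records: every span after the first replacement would point at the wrong characters.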

Try it

pip install indian-address-parser

from indian_address_parser import AddressParser

parser = AddressParser()  # pulls weights from HF automatically
parser.parse("FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029")

Everything's open source and Apache 2.0: model · GitHub · PyPI · datasets

Feedback and PRs welcome, especially on the locality/subLocality boundary ambiguity — I have a hypothesis for a firmer labeling convention that would help, but haven't tested whether it actually resolves the disagreement rate or just moves it around.