<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Srikar Phani Kumar Marti</title>
    <description>The latest articles on DEV Community by Srikar Phani Kumar Marti (@mspk97).</description>
    <link>https://dev.to/mspk97</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2519231%2F04ef8b8c-91fd-4ba1-a9b7-1e98f84baf6a.png</url>
      <title>DEV Community: Srikar Phani Kumar Marti</title>
      <link>https://dev.to/mspk97</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mspk97"/>
    <language>en</language>
    <item>
      <title>I built a backend platform that generates REST APIs from a schema — no code, no server setup</title>
      <dc:creator>Srikar Phani Kumar Marti</dc:creator>
      <pubDate>Fri, 29 May 2026 04:09:40 +0000</pubDate>
      <link>https://dev.to/mspk97/i-built-a-backend-platform-that-generates-rest-apis-from-a-schema-no-code-no-server-setup-184g</link>
      <guid>https://dev.to/mspk97/i-built-a-backend-platform-that-generates-rest-apis-from-a-schema-no-code-no-server-setup-184g</guid>
      <description>&lt;p&gt;Every side project I've shipped starts the same way: I have a frontend idea, and I immediately have to stop and go build a backend for it.&lt;/p&gt;

&lt;p&gt;Not because the backend is hard. Because it's &lt;em&gt;tedious&lt;/em&gt;. The same patterns, every time. Define a model. Wire up routes. Handle errors. Write docs nobody reads. Set up auth. Repeat.&lt;/p&gt;

&lt;p&gt;At some point I stopped asking "how do I build this backend faster" and started asking "why am I building it at all?"&lt;/p&gt;

&lt;p&gt;That question became &lt;a href="https://crudly.org" rel="noopener noreferrer"&gt;Crudly&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;What it actually does&lt;/h2&gt;

&lt;p&gt;You create a project, add a collection (think: a database table with a name and fields), and Crudly instantly gives you a live REST API at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;https://api.crudly.org/v1/{your-project}/{collection-name}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GET, POST, PUT, DELETE — all working, all authenticated, all documented. You didn't write a single line of backend code.&lt;/p&gt;

&lt;p&gt;Here's what that looks like in practice. I created a project called &lt;code&gt;sample&lt;/code&gt;, added a &lt;code&gt;todos&lt;/code&gt; collection, and within about 30 seconds I had:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List all todos&lt;/span&gt;
curl &lt;span class="s2"&gt;"https://api.crudly.org/v1/sample/todos"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt;

&lt;span class="c"&gt;# Create a todo&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"https://api.crudly.org/v1/sample/todos"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"title": "Repair Watch", "done": false}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No Express. No Postgres migrations. No deploy step.&lt;/p&gt;

&lt;h2&gt;The architecture decision that makes this possible&lt;/h2&gt;

&lt;p&gt;Crudly doesn't generate code. There's no Express app being scaffolded and deployed per user. That approach sounds appealing until you think about what it costs — cold starts, isolated infra per project, deployment latency, and a maintenance nightmare.&lt;/p&gt;

&lt;p&gt;Instead, the API server uses &lt;strong&gt;dynamic runtime routing&lt;/strong&gt;. Every request to &lt;code&gt;api.crudly.org/v1/{project}/{collection}&lt;/code&gt; hits the same handler. At request time, it reads your schema configuration, validates the request against it, and serves or persists data accordingly.&lt;/p&gt;

&lt;p&gt;The schema is the source of truth. The routes don't exist as code — they're resolved at runtime from what you defined in the dashboard.&lt;/p&gt;
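&lt;p&gt;To make the idea concrete, here is a minimal sketch of runtime routing in plain JavaScript. It is an illustration under assumptions, not Crudly's actual code: the &lt;code&gt;schemas&lt;/code&gt; registry, the in-memory &lt;code&gt;records&lt;/code&gt; store, and the &lt;code&gt;validate&lt;/code&gt; and &lt;code&gt;handle&lt;/code&gt; functions are hypothetical names, and only GET and POST are shown.&lt;/p&gt;

```javascript
// Hypothetical schema registry; a real service would load this from storage.
const schemas = {
  sample: {
    todos: { title: 'string', done: 'boolean' },
  },
};

// Hypothetical in-memory store, keyed by "project/collection".
const records = {};

// Check a JSON body against the field types declared in the schema.
function validate(schema, body) {
  const errors = [];
  const known = Object.keys(schema);
  for (const field of Object.keys(body)) {
    if (!known.includes(field)) errors.push('unknown field: ' + field);
  }
  for (const field of known) {
    if (typeof body[field] !== schema[field]) {
      errors.push(field + ' must be ' + schema[field]);
    }
  }
  return errors;
}

// One handler for every route: the path is treated as data, not code.
function handle(method, path, body) {
  const match = /^\/v1\/([^\/]+)\/([^\/]+)$/.exec(path);
  if (match === null) return { status: 404, body: { error: 'not found' } };
  const project = match[1];
  const collection = match[2];
  const schema = (schemas[project] || {})[collection];
  if (schema === undefined) {
    return { status: 404, body: { error: 'unknown collection' } };
  }
  const key = project + '/' + collection;
  const rows = records[key] || (records[key] = []);
  if (method === 'GET') return { status: 200, body: rows };
  if (method === 'POST') {
    const errors = validate(schema, body || {});
    if (errors.length > 0) return { status: 400, body: { errors: errors } };
    rows.push(body);
    return { status: 201, body: body };
  }
  return { status: 405, body: { error: 'method not allowed' } };
}
```

&lt;p&gt;In this sketch, adding a collection means adding an entry to &lt;code&gt;schemas&lt;/code&gt;. No new route code is written, which is the property this section describes.&lt;/p&gt;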

&lt;p&gt;This is the same pattern that makes tools like Hasura and PostgREST compelling, except Crudly's surface area is intentionally smaller. You're not writing GraphQL. You're not pointing it at your own Postgres instance. You create a collection, you get endpoints. That's the whole contract.&lt;/p&gt;

&lt;h2&gt;What ships with it&lt;/h2&gt;

&lt;p&gt;The thing I kept running into with minimal API tools is that they solve one problem and leave you to figure out everything adjacent. Crudly ships the full surface:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Playground&lt;/strong&gt; — A browser-based HTTP client (Postman style) built into the dashboard. Select your collection, pick a method, fire a request. I built this because switching to Postman or Insomnia mid-flow kills momentum. The response comes back formatted, with status codes and timing.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0xssbschdzw2frk7slu8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0xssbschdzw2frk7slu8.png" alt="Playground" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auto-generated docs&lt;/strong&gt; — Every collection gets docs that reflect the actual schema. They update when the schema changes. The curl examples use your real endpoint URLs. No Swagger YAML to maintain.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flrwnflf7l018z9ip6vzi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flrwnflf7l018z9ip6vzi.png" alt="Auto Generated Docs" width="799" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Request logs&lt;/strong&gt; — Real-time traffic visible from the dashboard, filterable by method, status, endpoint, and date. Useful enough that I've caught integration bugs in client apps just from watching the log stream.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrh5n5nombbenwzrujjk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrh5n5nombbenwzrujjk.png" alt="Request Logs" width="799" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Webhooks&lt;/strong&gt; — POST callbacks on create, update, or delete events per collection. If you're integrating with n8n, Zapier, or your own service — this is how you wire it up without polling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API key auth&lt;/strong&gt; — Read-only and read-write keys per project. Generate and revoke from the dashboard. Keys go in the &lt;code&gt;Authorization: Bearer&lt;/code&gt; header.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Admin Panel&lt;/strong&gt; — A generated data dashboard. Browse records, create entries, edit, delete. Useful during development and for non-technical stakeholders who need to manage content.&lt;/p&gt;
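&lt;p&gt;The read-only versus read-write key split described under API key auth can be sketched as a small authorization check. This is a sketch under assumptions rather than Crudly's implementation: the key values, the &lt;code&gt;keys&lt;/code&gt; table, the function names, and the choice of 401 and 403 status codes are all hypothetical.&lt;/p&gt;

```javascript
// Hypothetical key table for illustration; a real service would store hashed keys.
const keys = new Map([
  ['ro_example_key', 'read'],
  ['rw_example_key', 'read-write'],
]);

// Methods that only read data.
const READ_METHODS = ['GET', 'HEAD'];

// Pull the token out of an "Authorization: Bearer TOKEN" header value.
function bearerToken(header) {
  const match = /^Bearer\s+(\S+)$/.exec(header || '');
  return match === null ? null : match[1];
}

// Return an HTTP status: 401 for a missing or unknown key,
// 403 when a read-only key attempts a write, 200 otherwise.
function authorize(header, method) {
  const scope = keys.get(bearerToken(header));
  if (scope === undefined) return 401;
  if (READ_METHODS.includes(method)) return 200;
  return scope === 'read-write' ? 200 : 403;
}
```

&lt;p&gt;With this shape, a read-only key can be shipped in a public frontend for listing data, while the read-write key stays on a trusted client.&lt;/p&gt;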

&lt;h2&gt;What I didn't build (deliberately)&lt;/h2&gt;

&lt;p&gt;No custom logic. No middleware hooks. No computed fields. No joins across collections.&lt;/p&gt;

&lt;p&gt;If you need those things, you need a real backend and you should build one. Crudly is not trying to be Firebase or Supabase. It targets a specific use case: you have a frontend, you need persistent data with a REST interface, and you want it running in under five minutes.&lt;/p&gt;

&lt;p&gt;The constraint is the feature. Every time I was tempted to add "just a bit of custom logic support," I asked whether it would break that five-minute promise. Usually it would.&lt;/p&gt;

&lt;h2&gt;Where it actually fits&lt;/h2&gt;

&lt;p&gt;The use cases where Crudly earns its keep:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prototyping and MVPs&lt;/strong&gt; — Wire up a real API before you've decided whether the product is worth building a proper backend for.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend demos and portfolios&lt;/strong&gt; — Stop mocking data in JSON files. Ship something that actually persists.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal tools&lt;/strong&gt; — A quick admin data store for a side dashboard, without standing up another service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hackathons&lt;/strong&gt; — The entire backend setup takes less time than arguing about which framework to use.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;Crudly is live at &lt;a href="https://crudly.org" rel="noopener noreferrer"&gt;crudly.org&lt;/a&gt;. Free plan available. Create a project, add a collection, hit the endpoint — it takes about two minutes to get to a working API call.&lt;/p&gt;

&lt;p&gt;If you run into anything broken or have a use case it doesn't handle, I'm interested. Still actively building this.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>api</category>
      <category>backend</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>I Ran AI Models Directly in the Browser and Measured What It Did to Core Web Vitals</title>
      <dc:creator>Srikar Phani Kumar Marti</dc:creator>
      <pubDate>Sun, 17 May 2026 07:37:49 +0000</pubDate>
      <link>https://dev.to/mspk97/i-ran-ai-models-directly-in-the-browser-and-measured-what-it-did-to-core-web-vitals-4adj</link>
      <guid>https://dev.to/mspk97/i-ran-ai-models-directly-in-the-browser-and-measured-what-it-did-to-core-web-vitals-4adj</guid>
      <description>&lt;p&gt;Everyone is shipping AI features. Sentiment analysis on user input, speech recognition without sending audio to a server, image classification that never leaves the device. The privacy pitch is real, the latency pitch is real. But nobody's asking the obvious question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does running a neural network in the browser actually cost the user?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I decided to find out. I built a benchmark harness, ran four quantized models in Chrome stable, and measured the impact on Core Web Vitals, specifically INP, the responsiveness metric that is part of the page experience signals Google uses in ranking.&lt;/p&gt;

&lt;p&gt;Here's what I found.&lt;/p&gt;




&lt;h2&gt;The Setup&lt;/h2&gt;

&lt;p&gt;The test uses &lt;a href="https://huggingface.co/docs/transformers.js" rel="noopener noreferrer"&gt;Transformers.js&lt;/a&gt; — the library that lets you run Hugging Face models directly in the browser via WebAssembly. All models were loaded in INT8 quantized format (q8) to reflect real production conditions.&lt;/p&gt;

&lt;p&gt;Four models, chosen to cover different architectures and modalities:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DistilBERT&lt;/td&gt;
&lt;td&gt;66M&lt;/td&gt;
&lt;td&gt;Sentiment analysis&lt;/td&gt;
&lt;td&gt;Encoder (6 layers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BERT-base&lt;/td&gt;
&lt;td&gt;110M&lt;/td&gt;
&lt;td&gt;Feature extraction&lt;/td&gt;
&lt;td&gt;Encoder (12 layers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Whisper Tiny&lt;/td&gt;
&lt;td&gt;39M&lt;/td&gt;
&lt;td&gt;Speech recognition&lt;/td&gt;
&lt;td&gt;Encoder-Decoder&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MobileViT-S&lt;/td&gt;
&lt;td&gt;5.7M&lt;/td&gt;
&lt;td&gt;Image classification&lt;/td&gt;
&lt;td&gt;Vision Transformer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The benchmark harness is live at &lt;strong&gt;&lt;a href="https://benchmark.mspk.me" rel="noopener noreferrer"&gt;benchmark.mspk.me&lt;/a&gt;&lt;/strong&gt; and open source at &lt;strong&gt;&lt;a href="https://github.com/srikarphanikumar/cwv-ai-benchmark" rel="noopener noreferrer"&gt;github.com/srikarphanikumar/cwv-ai-benchmark&lt;/a&gt;&lt;/strong&gt;. Run it yourself.&lt;/p&gt;




&lt;h2&gt;What Is INP and Why Does It Matter?&lt;/h2&gt;

&lt;p&gt;INP (Interaction to Next Paint) replaced First Input Delay as Google's interactivity metric in March 2024. It measures how long it takes for the browser to respond to a user interaction — a click, a tap, a keypress — and paint the result.&lt;/p&gt;

&lt;p&gt;Google's thresholds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Good&lt;/strong&gt;: under 200ms&lt;/li&gt;
&lt;li&gt;⚠️ &lt;strong&gt;Needs Improvement&lt;/strong&gt;: 200–500ms&lt;/li&gt;
&lt;li&gt;❌ &lt;strong&gt;Poor&lt;/strong&gt;: over 500ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;INP is a Core Web Vital, and Core Web Vitals feed into Google's search ranking. More importantly, it affects whether users feel your app is responsive or broken.&lt;/p&gt;

&lt;p&gt;When you run neural network inference on the browser's main thread, you're blocking it. That means if a user clicks something while inference is running, their click won't be processed until the model finishes. That delay IS your INP.&lt;/p&gt;
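&lt;p&gt;The blocking effect is easy to reproduce without any model at all. The sketch below is not part of the benchmark harness: &lt;code&gt;blockFor&lt;/code&gt; is a hypothetical busy-wait standing in for synchronous inference, and a zero-delay timer stands in for a click handler waiting in the task queue.&lt;/p&gt;

```javascript
// Busy-wait to stand in for synchronous model inference (hypothetical stand-in).
function blockFor(ms) {
  const start = Date.now();
  while (ms > Date.now() - start) {
    // spin: nothing else can run on this thread
  }
}

// Measure how long a queued task (a stand-in for a click handler) waits
// when synchronous work occupies the thread first.
function measureInputDelay(blockMs) {
  return new Promise(function (resolve) {
    const queuedAt = Date.now();
    setTimeout(function () {
      resolve(Date.now() - queuedAt);
    }, 0);
    blockFor(blockMs);
  });
}

// Example: block for 100ms and report how long the queued task waited.
measureInputDelay(100).then(function (delay) {
  console.log('queued task waited ' + delay + 'ms');
});
```

&lt;p&gt;The queued task cannot start until the blocking call returns, so its wait is at least as long as the blocking work. That wait is the input-delay portion of INP.&lt;/p&gt;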




&lt;h2&gt;The Results&lt;/h2&gt;

&lt;p&gt;Here's the full table from Chrome stable on an Apple M-series MacBook Pro, 16GB RAM:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Load Time&lt;/th&gt;
&lt;th&gt;Avg Inference&lt;/th&gt;
&lt;th&gt;INP&lt;/th&gt;
&lt;th&gt;INP Class&lt;/th&gt;
&lt;th&gt;Mem Δ&lt;/th&gt;
&lt;th&gt;Mem Pressure&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DistilBERT&lt;/td&gt;
&lt;td&gt;7.85s&lt;/td&gt;
&lt;td&gt;25.1ms ±0.5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;27.8ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Good&lt;/td&gt;
&lt;td&gt;+59.6MB&lt;/td&gt;
&lt;td&gt;2.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BERT-base&lt;/td&gt;
&lt;td&gt;6.07s&lt;/td&gt;
&lt;td&gt;83.3ms ±1.5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;85.0ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⚠️ Needs Improvement&lt;/td&gt;
&lt;td&gt;+65.3MB&lt;/td&gt;
&lt;td&gt;4.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Whisper Tiny&lt;/td&gt;
&lt;td&gt;6.71s&lt;/td&gt;
&lt;td&gt;496.9ms ±6.2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;540.3ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Poor&lt;/td&gt;
&lt;td&gt;+123.9MB&lt;/td&gt;
&lt;td&gt;7.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MobileViT-S&lt;/td&gt;
&lt;td&gt;1.15s&lt;/td&gt;
&lt;td&gt;66.7ms ±1.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;75.6ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⚠️ Needs Improvement&lt;/td&gt;
&lt;td&gt;+37.0MB&lt;/td&gt;
&lt;td&gt;8.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;The Surprising Findings&lt;/h2&gt;

&lt;h3&gt;1. Parameter count doesn't predict INP&lt;/h3&gt;

&lt;p&gt;Whisper Tiny has only 39M parameters, fewer than either BERT-family model; only MobileViT-S (5.7M) is smaller. It also produces the worst INP at 540.3ms, more than 19x worse than DistilBERT, which has 66M parameters.&lt;/p&gt;

&lt;p&gt;The culprit is architecture, not size. Whisper is an encoder-decoder model. It doesn't process the full input in a single forward pass — it runs an &lt;strong&gt;autoregressive decode loop&lt;/strong&gt;, generating output tokens one at a time. Each iteration blocks the main thread. The total blocking time accumulates regardless of how aggressively you quantize the weights.&lt;/p&gt;

&lt;p&gt;This means &lt;strong&gt;no amount of quantization will fix Whisper's INP on the main thread&lt;/strong&gt;. It's an architectural constraint, not a tuning problem.&lt;/p&gt;
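&lt;p&gt;The decode loop described above can be sketched as a toy greedy decoder. This is an illustration, not Whisper's or Transformers.js's actual code: &lt;code&gt;step&lt;/code&gt; is a hypothetical function standing in for one full decoder forward pass, and &lt;code&gt;eosId&lt;/code&gt; and &lt;code&gt;maxTokens&lt;/code&gt; are assumed parameters.&lt;/p&gt;

```javascript
// Toy greedy decoder. Each loop iteration is one full decoder forward pass,
// so the synchronous cost grows with the number of output tokens.
function greedyDecode(step, encoded, eosId, maxTokens) {
  const tokens = [];
  while (maxTokens > tokens.length) {
    const next = step(encoded, tokens); // hypothetical forward pass
    if (next === eosId) break;          // end-of-sequence token stops the loop
    tokens.push(next);
  }
  return tokens;
}

// Example with a scripted "model" that emits three tokens and then stops.
let forwardPasses = 0;
const script = [7, 8, 9, 0];
const output = greedyDecode(function (encoded, tokens) {
  forwardPasses += 1;
  return script[tokens.length];
}, null, 0, 10);
console.log(output.length + ' tokens took ' + forwardPasses + ' forward passes');
```

&lt;p&gt;An encoder-only model pays for one forward pass per input. Here the number of passes scales with the output length, and quantization only makes each pass cheaper without reducing how many there are.&lt;/p&gt;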

&lt;h3&gt;2. MobileViT-S loads 6x faster but still misses "Good"&lt;/h3&gt;

&lt;p&gt;MobileViT-S loads in 1.15s, compared with 6–8 seconds for the other three models. That's a huge UX win for initial load. But its INP of 75.6ms puts it in "Needs Improvement" territory despite having only 5.7M parameters.&lt;/p&gt;

&lt;p&gt;Vision transformer inference carries disproportionate cost relative to parameter count in WASM environments. Something to watch if you're building image classification features.&lt;/p&gt;

&lt;h3&gt;3. Memory pressure ≠ memory delta&lt;/h3&gt;

&lt;p&gt;MobileViT-S has the lowest absolute memory consumption (+37MB) but the &lt;strong&gt;highest memory pressure at 8.0%&lt;/strong&gt;. That 37MB represents a larger fraction of the available JS heap than you'd expect — with implications for mid-range Android devices where heap limits are much tighter.&lt;/p&gt;




&lt;h2&gt;What This Means for Your Architecture&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If you're building with encoder-only text models (DistilBERT class):&lt;/strong&gt;&lt;br&gt;
You're fine on the main thread. 27.8ms INP is negligible. Trigger inference directly on user interactions without worrying about CWV degradation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're using larger encoder models (BERT-base class):&lt;/strong&gt;&lt;br&gt;
Don't trigger inference synchronously on interactions. At 85ms, stacking this with other main thread work risks crossing 200ms. Move it to a post-interaction background step — run inference after you've already painted the response.&lt;/p&gt;
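&lt;p&gt;One way to run inference only after the response has painted is sketched below. It is an assumption-laden example rather than code from the benchmark: &lt;code&gt;afterNextPaint&lt;/code&gt; and &lt;code&gt;handleInteraction&lt;/code&gt; are hypothetical helper names, and &lt;code&gt;updateUi&lt;/code&gt; and &lt;code&gt;runInference&lt;/code&gt; stand in for your own functions.&lt;/p&gt;

```javascript
// Hypothetical helper: resolve once the browser has had a chance to paint.
function afterNextPaint() {
  return new Promise(function (resolve) {
    if (typeof requestAnimationFrame === 'function') {
      // The rAF callback runs before the next paint; the nested timeout runs after it.
      requestAnimationFrame(function () {
        setTimeout(resolve, 0);
      });
    } else {
      // Fallback so the sketch also runs outside a browser.
      setTimeout(resolve, 0);
    }
  });
}

// Paint cheap feedback first, then do the heavy work outside the interaction.
async function handleInteraction(updateUi, runInference) {
  updateUi();             // synchronous visual response to the click
  await afterNextPaint(); // let the frame reach the screen
  return runInference();  // inference no longer delays this interaction's paint
}
```

&lt;p&gt;The inference still blocks the main thread while it runs, so a second interaction during that window can be delayed. This pattern only keeps the model's cost out of the interaction that triggered it.&lt;/p&gt;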

&lt;p&gt;&lt;strong&gt;If you're using any encoder-decoder model (Whisper, T5, BART, etc.):&lt;/strong&gt;&lt;br&gt;
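You &lt;strong&gt;must&lt;/strong&gt; offload to a Web Worker. This isn't an optimization; it's a requirement. The main thread will be blocked for hundreds of milliseconds no matter what you do. Transformers.js runs inside a Web Worker, so create the pipeline in the worker script and pass audio in and results out with messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// worker.js: start it from the page with new Worker(url, { type: 'module' })&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;pipeline&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@xenova/transformers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Build the pipeline once, inside the worker, so inference never runs on the main thread&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transcriberPromise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;automatic-speech-recognition&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Xenova/whisper-tiny&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// event.data is the audio to transcribe (a URL string or Float32Array samples)&lt;/span&gt;
&lt;span class="nb"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onmessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transcriber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;transcriberPromise&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nb"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;postMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;transcriber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;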
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;If you're using vision transformers:&lt;/strong&gt;&lt;br&gt;
Test on actual mobile hardware before shipping. The memory pressure numbers on an M-series Mac will look very different on a mid-range Android.&lt;/p&gt;




&lt;h2&gt;Limitations to Know&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;TBT couldn't be captured in the deployed environment.&lt;/strong&gt; The harness got no usable Long Tasks API data in the deployed build; I could only capture it when serving locally or driving Chrome through the DevTools Protocol. The INP measurements are real, but the full main thread blocking profile requires a different setup to measure properly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;All numbers are from high-end hardware.&lt;/strong&gt; An Apple M-series Mac is not the median global web user's device. INP values on mid-range Android will be significantly higher — potentially 3–5x. The relative ordering of models should hold, but don't use these absolute numbers as production thresholds for mobile.&lt;/p&gt;




&lt;h2&gt;Try It Yourself&lt;/h2&gt;

&lt;p&gt;The benchmark is live and open source. Run it on your device, your network conditions, your hardware profile. Export the results as JSON or CSV.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live benchmark&lt;/strong&gt;: &lt;a href="https://benchmark.mspk.me" rel="noopener noreferrer"&gt;benchmark.mspk.me&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source code&lt;/strong&gt;: &lt;a href="https://github.com/srikarphanikumar/cwv-ai-benchmark" rel="noopener noreferrer"&gt;github.com/srikarphanikumar/cwv-ai-benchmark&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full paper&lt;/strong&gt;: arXiv link coming soon&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you run it on a mid-range Android or a low-end device and want to share the numbers, I'd love to see them — that's exactly the follow-on data this research needs.&lt;/p&gt;




&lt;h2&gt;TL;DR&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;DistilBERT is the only model that stays in Google's "Good" INP range on the main thread&lt;/li&gt;
&lt;li&gt;Whisper Tiny is "Poor" despite having fewer parameters than either BERT model: architecture beats quantization&lt;/li&gt;
&lt;li&gt;Encoder-decoder models require Web Worker offloading — no exceptions&lt;/li&gt;
&lt;li&gt;Parameter count is a bad proxy for browser inference cost&lt;/li&gt;
&lt;li&gt;Memory pressure on mobile is a separate concern from memory consumption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The era of client-side AI is here. Now we need to measure what it actually costs.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>webvitals</category>
      <category>corewebvitals</category>
    </item>
  </channel>
</rss>
