We’re a quarter into the 21st century, and the browser has quietly evolved into something much more than just a UI layer. It can run complex comput...
Perfect demo! I love your work exploring the performance edge of the browser.
I feel the root of the problem is language-based. The WAT format is not really useful for direct coding, so Rust may be the perfect WASM solution, but I think it's worth spending the time to create a better direct WASM text format. I started the work, but mordorjs slowed me down a bit.
The other language is the shader, which is much more problematic to master, because the whole GPU concept is totally different from standard CPU-based programming logic. We are standing at the opening gate of the threatening AI age, but we still don't even understand how to control our own machines.
A good example is on Shadertoy (a great place to play with shaders):
shadertoy.com/view/ffSSzW
If you take a look, it is really hard to understand what is going on at the hardware level, compared to simple assembly, which is much more straightforward.
A few years ago I spent a lot of time on Shadertoy, but honestly, right now I am not able to write working shader code.
This is my minimal JavaScript shader bootstrap (without dependencies, from 8 years ago; as a VibeArcheologist I have a lot of stuff in my past):
Thanks a lot for this comment, I really appreciate it 🙌
And yes, totally agree, understanding shaders is hard. It’s a completely different mental model compared to CPU programming, and it takes time to even start thinking in that way. I’m also really curious how this will evolve in the future. My gut feeling is that we won’t see average developers writing shaders directly, more likely we’ll rely on higher-level abstractions or libraries built on top of them.
I’ve actually been thinking about building a small POC around GPGPU myself, but before that I’d like to properly measure the performance, trade-offs, and whether it really makes sense in typical frontend scenarios.
Also, I’m curious, what’s your take on Zig for writing WASM? I haven’t tried it myself yet, but I’ve been hearing quite a lot of good things about it.
This is a great post! I'm not familiar with WebAssembly, and only somewhat with WebGPU, but this gave me a basic understanding of their importance, mainly when it comes to performance!
Speaking of performance, how would you determine that you've reached your limit and therefore need WebAssembly or WebGPU? I can imagine some people will use them right off the bat, but if we're talking about actually needing them, what do I look for, and how do I measure it?
Thanks! Hope this makes sense. I might be overthinking the question, and it might already be answered in the post, but I started asking this since you mentioned "But the moment you start hitting performance limits, or your problem shifts from “moving data around” to “actually computing things”… you might realize that the platform already had the solution all along."
Thanks, that’s a great question 🙂
I wouldn’t start with WebAssembly or WebGPU right away. First, I’d check where the real bottleneck is. In many apps it’s still the network, too much data or too many requests, and in that case these tools won’t help.
What I look for is whether the app becomes compute-bound. If the UI starts lagging, animations drop frames, or the main thread is busy for too long, that’s usually a sign. Then I’d profile and see which part of the code is actually expensive.
WASM makes sense for CPU-heavy tasks like data processing or transformations. WebGPU is more for massively parallel work like physics, graphics, or large-scale computations.
So the key is profiling first. If the problem is really about computation and not data transfer, that’s the moment when these tools start to make sense 👍
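To make the "is the UI compute-bound?" check concrete, here's a minimal sketch. The helper name and the 1.5× threshold are my own illustrative choices, not anything from the post; in the browser you would feed it timestamps collected via `requestAnimationFrame`.

```javascript
// Hypothetical helper: given a series of frame timestamps (ms), estimate
// what fraction of frame intervals blew past the 60 fps budget (~16.7 ms).
// The 1.5x slack factor is an arbitrary illustrative threshold.
function droppedFrameRatio(timestamps, budgetMs = 1000 / 60) {
  let dropped = 0;
  for (let i = 1; i < timestamps.length; i++) {
    if (timestamps[i] - timestamps[i - 1] > budgetMs * 1.5) dropped++;
  }
  return dropped / Math.max(1, timestamps.length - 1);
}

// In the browser you would collect timestamps like this:
//   const ts = [];
//   function loop(t) { ts.push(t); requestAnimationFrame(loop); }
//   requestAnimationFrame(loop);

console.log(droppedFrameRatio([0, 16, 33, 50]));     // 0   (smooth 60 fps)
console.log(droppedFrameRatio([0, 16, 66, 82, 132])); // 0.5 (every other frame ~50 ms)
```

If that ratio stays high after you've ruled out network waits, the main thread is genuinely busy computing, and that's the signal to profile deeper.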
The Canvas 2D version is not an optimal Canvas 2D implementation. It's a software rasterizer written in JavaScript that happens to use Canvas 2D only as a final `putImageData` blit. That makes the "JS + Canvas 2D vs. WebGPU" comparison somewhat unfair. An optimal Canvas 2D version would use `OffscreenCanvas` for pre-rendering one soft glowing particle sprite once, transforms, Web Workers so the main thread only issues draw calls, etc.

Thanks a lot for the comment, I really appreciate it, and it's great that you took the time to look into the code 🙌
That’s a fair point. This was definitely not meant to be the ultimate optimized Canvas 2D implementation.
I did also try more canvas-native approaches, closer to things like fillRect-style rendering, and Canvas still lost pretty badly there too, although the visual result was a bit different. So I agree that this specific comparison is not “best possible Canvas 2D vs best possible WebGPU”.
The bigger point I wanted to show was that once every particle has its own state and physics, this kind of workload is simply much more natural for the GPU. Canvas 2D can definitely be pushed further with techniques like OffscreenCanvas, sprite reuse, workers, and so on, but for me that only reinforces the main idea: at some stage, the real win comes not from squeezing the CPU path harder, but from moving the work to a better execution layer.
That said, you’re right that a more apples-to-apples comparison could be something like compute shaders vs a heavily optimized CPU approach. I did think about going in that direction, but ran out of time for this demo. I think we can all kind of guess what the outcome would be 😄
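To illustrate why per-particle state makes this CPU-unfriendly, here is a minimal sketch of what every frame has to do on the CPU path. The field names and the simple Euler integration are illustrative, not the demo's actual code.

```javascript
// Minimal sketch of the CPU-side cost: every particle carries its own
// state and must be integrated on every frame. Field names are illustrative.
function stepParticles(particles, dt, gravity = 9.8) {
  for (const p of particles) {
    p.vy += gravity * dt; // integrate velocity
    p.x += p.vx * dt;     // integrate position
    p.y += p.vy * dt;
    p.life -= dt;         // age the particle
  }
}

// 500k particles means 500k of these updates per frame, ~60 times a second,
// all on one thread. A GPU runs the same update for thousands of particles
// in parallel, which is why the workload maps so naturally onto it.
const particles = Array.from({ length: 500_000 }, () => ({
  x: 0, y: 0, vx: 1, vy: 0, life: 5,
}));

const t0 = performance.now();
stepParticles(particles, 1 / 60);
console.log(`one frame of updates: ${(performance.now() - t0).toFixed(1)} ms`);
```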
The 500,000 particles vs 40,000 is the signal. Not because anyone needs half a million particles. But because the gap shows you what's possible when you stop treating the browser as a document viewer. Most apps are slower than they need to be because developers assume the constraint is the platform. It's not. It's the assumption. WebGPU and WASM don't solve every problem. But they prove that the ceiling is higher than most people are willing to reach.
Exactly! This demo is mostly for fun, but I can absolutely imagine real scenarios where someone would actually need 500k particles, like physics simulations, complex visualizations, or scientific models.
That’s kind of the point: once you realize the browser can handle this level of computation, it changes how you think about what’s possible.
The WebAssembly + WebGPU angle is where most teams are still leaving huge wins on the table. One underrated slowdown I keep seeing in "fast" React apps: the waterfall of `useEffect` fetches that happen after hydration. Moving data fetching to the server component or a single initial payload collapses 3-4 roundtrips into one and usually cuts TTI by 30-50% on mid-spec mobile. WebGPU is the ceiling lift; fetch architecture is the floor nobody talks about. What's the most common "easy to fix, huge impact" pattern you see in audits?

Exactly, that’s how it usually looks 🙌
In React apps it’s very often about fetch patterns, too many requests, pulling too much data, or just inefficient queries. I also see a lot of cases where everything still ends up in one big bundle, even though on paper there’s “code splitting”.
These are usually quick wins and can make a huge difference. And then, if that’s already in a good place and the problem is still heavy computation, that’s where WASM or WebGPU start to make sense.
And the gains there can be really significant. We often fight for milliseconds, but here you can get a 2× speedup with WASM, and in parallel workloads WebGPU can be 200–300× faster than JavaScript.
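The fetch-waterfall point above can be sketched in a few lines. The endpoint paths and the `fetchJson` parameter are stand-ins for illustration; the only real point is that independent requests should not be awaited one after another.

```javascript
// Three independent requests issued as a waterfall: each await blocks the
// next, so the total cost is roughly three roundtrips.
async function loadWaterfall(fetchJson) {
  const user = await fetchJson('/api/user');
  const prefs = await fetchJson('/api/prefs'); // waits for user first
  const feed = await fetchJson('/api/feed');   // waits for prefs first
  return { user, prefs, feed };
}

// The same data loaded in parallel: one roundtrip of latency instead of three.
async function loadParallel(fetchJson) {
  const [user, prefs, feed] = await Promise.all([
    fetchJson('/api/user'),
    fetchJson('/api/prefs'),
    fetchJson('/api/feed'),
  ]);
  return { user, prefs, feed };
}
```

With a 100 ms roundtrip, the waterfall version costs roughly 300 ms of pure latency while the parallel one costs roughly 100 ms, which is exactly the kind of cheap, high-impact fix worth making before reaching for WASM or WebGPU.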
Good post.
What stood out to me is how quickly the JS version starts struggling once the workload shifts, while the WebGPU version keeps scaling.
Feels like a lot of apps don’t get slower because they do too much, but because the work stays in the same execution layer for too long.
People keep pushing everything through the same JS path even after the workload clearly changed.
That is why this matters. Not because every app needs WebGPU or WASM, but because most teams never revisit the execution layer once the app starts growing.
Exactly this! 🙌
And then, of course, we love to blame the backend, the network, anything really… 😅
But sometimes a simple shift in the execution layer can make a massive difference in performance. It’s not always about doing less work, it’s about doing it in the right place.
yeah exactly, that’s the pattern I keep seeing
things start to slow down or behave weirdly and the first instinct is to blame infra, but a lot of the time it’s just that the execution model hasn’t evolved with the workload
what worked at small scale just keeps getting stretched instead of rethought
that shift you mentioned is underrated, it’s not about optimizing harder, it’s about realizing you’re solving the problem in the wrong place
I don't even try to switch to the Canvas 2D tab 😂
Haha no way 😄 almost 2.5 million particles and still decent FPS. That’s insane, this should totally be the cover photo 😂
And yeah… better not switch to Canvas 2D at that point, poor CPU wouldn’t survive that experiment 😅
I guess this only works on Chrome?
Good question! 🙂
It works the same way as WebGPU support in general, so not just Chrome. It’s available in modern browsers like Firefox and Edge as well.
That said, WebGPU is still not a fully finalized standard yet, and there are some gaps, especially on mobile devices and certain environments like Linux. So we’re not quite at “works everywhere” yet.
But it’s definitely moving in that direction, it’s more a matter of time than anything else. That’s also why I mentioned in the article that having a fallback is still important for now 👍
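A minimal sketch of what that fallback check can look like, with `navigator.gpu` passed in as a parameter so the decision logic stays testable outside the browser. The function name and the `'canvas2d'` fallback label are illustrative; `gpu.requestAdapter()` returning `null` on unsupported setups is the real WebGPU API behavior.

```javascript
// Decide which rendering backend to use. In the browser you would call
// pickBackend(navigator.gpu); here the dependency is injected for testability.
async function pickBackend(gpu) {
  if (!gpu) return 'canvas2d';                // WebGPU not exposed at all
  const adapter = await gpu.requestAdapter(); // may resolve to null, e.g. on
  return adapter ? 'webgpu' : 'canvas2d';     // some Linux/mobile setups
}
```

This keeps the "works everywhere" story honest: the app prefers WebGPU where it exists and quietly degrades where it doesn't.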
Nice Post again. I am following you on LinkedIn now.
But: The article does not say anything about real apps that are slow.
If I look at the apps I use that feel really slow, I don't think WebAssembly and WebGPU will help.
Deezer (like Spotify):
Search seems slow, which is a backend problem. They also seem to fetch my favorite artists each time I open them. This all seems to be a caching problem and not a rendering problem.
Aumio (Children's meditation):
I don't know what's so hard about making an app like that. It's a shame that it is so slow, given that it is "only" an audio player. I thought about ripping the tracks just to avoid this unbelievably slow UI. I think there are much easier ways to improve performance than switching to this stack.
Roborock (Roomba):
Maybe it's the communication, or the data comes from the robot and isn't cached. But slow map-loading probably also is a caching problem.
What "real" apps do you use that feel slow and might actually benefit from this stack?
At my former employer I evaluated solutions to render 100,000 rectangles that may be transformed in different ways. Rendering was not an issue once I started using the CSS `transform` property.
I tweaked the drag-and-drop performance by using the `willChange` CSS property, by reducing the number of GC calls, and by not rendering the update each time the drag/move event is fired. Needless to say, at this point you don't send/render all elements on each frame, but only the changes.
Later these rectangles become buttons, labels and input fields and more. They can be styled using CSS. At this point performance must be reevaluated. But on the other hand: I wonder if there is a graphics library as capable and as performant as the DOM I could use inside WebAssembly/WebGPU 🤔
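The "don't render on every drag/move event" trick above can be sketched as a small coalescing helper. The helper name is illustrative, and the scheduler is injected (in the browser it would be `requestAnimationFrame`) so the logic runs anywhere.

```javascript
// Coalesce high-frequency pointer events into at most one render per frame:
// only the latest position is applied when the scheduled frame fires.
function makeDragCoalescer(schedule, apply) {
  let pending = null;
  let queued = false;
  return (x, y) => {
    pending = { x, y };
    if (queued) return; // a frame is already scheduled; just update pending
    queued = true;
    schedule(() => {
      queued = false;
      apply(pending);   // render only the most recent position
    });
  };
}

// In the browser, combined with the CSS transform approach:
//   const onMove = makeDragCoalescer(requestAnimationFrame, ({ x, y }) => {
//     el.style.transform = `translate(${x}px, ${y}px)`;
//   });
//   el.addEventListener('pointermove', (e) => onMove(e.clientX, e.clientY));
```

Pointer events can fire far more often than the display refreshes, so this alone can cut the per-drag render work substantially without touching the rendering code itself.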
Thanks a lot for this comment, really appreciate it 🙌
I agree with you. Many slow apps are limited by network, backend, or caching, and in those cases WASM or WebGPU won’t help much.
At the same time, it’s not always that simple. A colleague of mine gave a talk in Bologna about optimizing an app for Adobe, and even small things like adding a custom header made a difference, because every GET triggered a preflight. So sometimes milliseconds really matter.
For many apps, better caching and fewer requests are the biggest wins. But there’s also a growing class of apps where computation is the bottleneck. You can see it in tools like Figma or Canva, 3D viewers, video or image processing, and AI in the browser.
Also, your example with 100k rectangles is a great one, but it focuses mostly on rendering. In my case, each “rectangle” has its own physics and is updated every frame, which is a very different kind of problem. That’s exactly the kind of scenario where CPU starts to struggle and GPU-based approaches really shine, especially in games or simulations.
So I see this as one more tool. Not always needed, but very useful when the problem is actually about computation 🙂
Interesting for offline-first apps where you don’t have a backend to lean on.
It’s about choosing the right layer for the job.
😎
Exactly! 😎
And now that we have WebGPU, we can finally actually choose the right layer for the job, not just default to the CPU or the backend.
This is a great reminder that a lot of frontend performance issues are not always about optimizing React or shaving a few re-renders. Sometimes the bigger question is whether the workload belongs on the CPU, the GPU, or even in JavaScript at all.
I liked the way the demo separates responsibilities too: Canvas for text rendering, WASM for CPU-heavy particle generation, and WebGPU for the actual high-volume rendering. That feels like the practical takeaway here: not just "use WebGPU everywhere", but use the browser platform more intentionally when the problem becomes more complex.
The 500k particle example also makes the point really clearly. At that scale, Canvas 2D and WebGPU are not just different implementations, they are actually different performance models. This is where the frontend starts feeling less like UI work and more like "systems engineering" inside the browser.
Every time I try to “optimize” something, I realize the problem isn’t React, not JS, not even the backend — it’s that we’re building like the browser is still a thin UI layer.
Meanwhile it can literally run GPU workloads now.
The wild part is most apps don’t feel slow because of one big mistake — it’s death by 100 small decisions:
extra abstractions, unnecessary state sync, over-fetching, “just in case” logic.
And suddenly you’re shipping a dashboard that needs a small team’s worth of compute to render a table.
Feels like Wirth’s law in action — software getting slower faster than hardware gets faster 
Curious where you draw the line in practice:
when do you actually reach for WebAssembly/WebGPU vs just fixing architecture?
That’s a great question 🙂
For me, the line is where fixing the architecture no longer helps. I’d always start with the “cheaper” wins first, reducing over-fetching, unnecessary state, re-renders, and general complexity. A lot of apps are slow because of many small decisions, not because they lack WASM or GPU.
But if the architecture is already reasonable and the problem is simply that you need to do a lot of computation, then it becomes a different story. At that point it’s less about optimizing code and more about choosing the right execution layer.
So I see WASM/WebGPU not as a replacement for good design, but as the next step when good design is no longer enough 🙂
Excellent work on the demo, the JS → WASM → WebGPU pipeline really drives the point home.
The 2-3x WASM vs optimized JS benchmark highlights something we all do: we spend hours micro-optimizing code without ever questioning if the runtime itself is the right tool for the job. In data it's the same thing, you can spend 3 days tuning a SQL query when the real problem is you're running computation in the wrong place. The "I optimize what I know" reflex is comfortable but it's often what keeps you from seeing the actual solution.
Exactly this! 🙌
It’s the same on the frontend, we often spend hours optimizing milliseconds, while here you can get a 2× gain with WASM and even 200–300× with WebGPU. That really puts things into perspective.
And sure, WebGPU isn’t fully supported everywhere yet, so you need to be mindful there. But WASM? That’s already widely available and production-ready, and still surprisingly underused for cases where it can make a real difference.
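To show just how little is needed to get started with that widely available WASM, here is a hand-assembled minimal module, an `add` function, loaded through the standard `WebAssembly.instantiate` API. This is my own illustrative example, not code from the post; the equivalent WAT is shown in the comment.

```javascript
// A hand-assembled minimal WASM module exporting add(a, b) -> a + b.
// Equivalent WAT:
//   (module (func (export "add") (param i32 i32) (result i32)
//     local.get 0 local.get 1 i32.add))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // func 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, // code body:
  0x6a, 0x0b,                                           //   local.get 0/1, i32.add, end
]);

// Works in every modern browser and in Node, no toolchain required.
WebAssembly.instantiate(bytes).then(({ instance }) => {
  console.log(instance.exports.add(2, 40)); // 42
});
```

In practice you'd compile the module from Rust, Zig, or C rather than writing bytes by hand, but the loading side stays exactly this small.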
I am truly impressed by your work; you have a very promising future ahead! Your deep dive into WebGPU and WebAssembly performance is exactly what the modern web needs. Most developers scratch the surface, but you are exploring the real engine under the hood. Keep pushing the boundaries—the digital world needs more pioneers like you! 🚀💻
This is the kind of performance article we need more of: practical, not theoretical.
The gap between "works on my machine" and "works for users on 3G with a mid-range phone" is where most apps die. And it's not because developers are bad. It's because dev tools and local environments hide the slowness. Chrome DevTools on a MacBook Pro with fast internet tells you nothing about real user experience.
I've definitely been guilty of shipping something that felt fast to me, only to realize later that users were waiting seconds for things I didn't even notice.
The live demo approach is so valuable. Watching something be slow in real time hits different than reading about it.
What's the one performance issue you see most often in the wild that developers seem to ignore?
Thanks for this, bookmarking 🙌
Thanks a lot, really appreciate it 🙌 and yes, exactly — these problems are very often ignored.
At least with DevTools we have some way to simulate slower network conditions and get a glimpse of what users might experience. Testing on weaker hardware is definitely harder though. That said, this is where things like WASM can actually help quite a bit, especially for heavier computations, because of its more predictable performance. And if you can push work to the GPU, even better — even a low-end GPU will usually handle parallel workloads much better than the CPU.
But overall, I still think the most commonly ignored performance issues are related to the network layer. In many small and mid-sized apps, it’s not the computation that hurts the most, but sending too much data, too often, and sometimes completely unnecessarily.
This is such a good demo of “wrong tool vs right tool”
Most teams spend weeks micro-optimizing JS when the real win is changing the execution layer entirely
Oh yes, exactly! 🙌
And there are actually quite a few options here depending on the problem. I’m starting to think this might deserve a separate post on its own 😄
Well this was really fun!! I just spent way too long typing random things into the demo and watching them disintegrate. Idk why that is so satisfying LOL. Thank you for sharing 🙂
Hahaha Evan, exactly! 😄 Totally useless… but so satisfying 😄 I also loved typing way too long phrases just to see if the GPU would still keep up 😂
Cool demo. This had my GPU utilization at 35%, not bad
Haha not bad at all 😄 finally giving the GPU something useful to do! 🤣
I like your article but I like most the Text Goes Boom app, it's quite relaxing and just what I needed a few moments ago :-D
Haha, right? 😄 I had the exact same experience, I even played with it for a moment during my talk just to relax a bit. Turns out “completely useless” apps can be surprisingly therapeutic 😄
The gap between what the browser can do and what most production apps actually do is embarrassing at this point. WebAssembly and WebGPU have been stable enough to ship for a while, but the default React mental model still treats the browser like a dumb terminal waiting for a JSON payload. The live demo approach is the right move here — showing the perf delta in-browser is more convincing than any benchmark screenshot.