WebGPU for local inference is the right direction. We hit a similar problem building data infrastructure for edge AI — when your robot needs to make decisions in under 50ms, round-tripping to an API is not even an option.
The 1.5-3GB model download concern you mentioned is real though. In our case we solved it by shipping a smaller quantized model (Q4) as part of the binary itself. The tradeoff is accuracy vs. startup time, but for many edge use cases that tradeoff makes sense.
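The size-vs-accuracy tradeoff can be made explicit in code. Here's a minimal sketch of a variant picker — the variant names and sizes are illustrative placeholders, not real model files — that selects the largest quantization that fits a memory budget and falls back when nothing fits:

```typescript
// Sketch: pick the largest quantization variant that fits a memory budget.
// Sizes and names are illustrative, not taken from any real model.
interface Variant {
  name: string;
  sizeGb: number;
}

// Sorted largest-first, so the first fit is the highest-quality option.
const VARIANTS: Variant[] = [
  { name: "f16", sizeGb: 3.0 }, // best accuracy, biggest download
  { name: "q8", sizeGb: 1.6 },
  { name: "q4", sizeGb: 0.9 }, // what we ship in the binary
];

function pickVariant(budgetGb: number): Variant | null {
  for (const v of VARIANTS) {
    if (v.sizeGb <= budgetGb) return v;
  }
  return null; // nothing fits — fall back to remote inference
}
```

On the edge device we hardcode the choice at build time, but in a browser you could drive the budget from something like `navigator.deviceMemory` where available.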
One question: how does react-brai handle tab-level coordination? If a user has 3 tabs open, do they each download their own model copy? That was a nasty issue we had with Web Workers — each worker loading its own model into memory and OOM-killing the browser.
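For what it's worth, the fix we landed on for the duplicate-load problem was single-flight loading: concurrent callers share one in-flight load instead of each kicking off their own. Cross-tab, the Web Locks API (`navigator.locks.request`) gives you the same guarantee; the in-process version below is a testable sketch of the idea, where `load` stands in for whatever actually fetches the weights:

```typescript
// Sketch: single-flight loading. All concurrent callers await the same
// in-flight promise, so the expensive load runs at most once at a time.
type Loader<T> = () => Promise<T>;

function singleFlight<T>(load: Loader<T>): Loader<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (!inFlight) {
      // Clear the slot on failure so a later call can retry cleanly.
      inFlight = load().catch((err) => {
        inFlight = null;
        throw err;
      });
    }
    return inFlight;
  };
}

// Usage (fetchWeights is a hypothetical loader):
//   const loadModel = singleFlight(fetchWeights);
//   await Promise.all([loadModel(), loadModel()]); // fetchWeights runs once
```

Pair this with Cache Storage so the second tab (or a reload) reads bytes from disk instead of the network, and the 1.5-3GB download only ever happens once per origin.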