I Built a "GPT" in My Browser in One Evening. The Journey from Amnesia to Stable Learning with Pure JS.

Pavel on August 12, 2025

Hello, community! https://github.com/Xzdes/slmnetGPT Sometimes, the best projects are born from a simple "What if...?" question. One evening, I w...
Prema Ananda

Nice! This could actually save tons of electricity. Why run a big model on the server for simple responses like "hello" or "thanks"? Your browser thing can handle that locally and use way less energy.

If this gets implemented everywhere, servers would run cooler and consume less power. Simple questions - handled locally, complex ones - sent to server. Makes sense!

Pavel

That's right, it's a waste to spend so many resources on such a brief chat.

Pavel

Or here's another thought for future readers! The little model doesn't just respond to "Thank you," it becomes a context manager. Here's how neat that is:
User: "Hi! Tell me about hybrid AI."
Little GPT (in the browser) responds instantly: "Hello there! Of course, I'll be happy to tell you."
Behind the scenes: at the same moment, the small model sends a request to the large LLM that looks something like this:
Main request: tell the user about hybrid AI.
Context package: { "status": "dialog started", "user_greeting": "Hi!", "robot_reply": "Hello there! Of course, I'll be happy to tell you.", "tone": "friendly" }
Having received such a package, the LLM immediately grasps the whole picture, with no unnecessary introductions:
The user has already been greeted.
He's friendly.
The beginning of the answer has already been given, and the LLM needs to continue it organically.
The LLM's response is relevant right away, with no repeated "Hello! How can I help you?" The LLM just continues the conversation as if it had been part of it from the very beginning.
This solves the key problems:
Completely seamless operation: the user never notices the "switch" between models.
Resource savings on a new level: the LLM doesn't waste energy analyzing the start of the dialogue; it gets straight to the point.
Deep understanding of context: the dialogue becomes personalized and much more natural.
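A minimal sketch of what this handoff could look like in vanilla JS. The localModel object, its respond() method, the /api/llm endpoint, and the chat helpers are all invented for illustration; only the flow matches the idea above:

```js
// Hypothetical helpers standing in for real chat UI code.
const showInChat = (text) => console.log("[local]", text);
const appendToChat = (text) => console.log("[server]", text);

async function handleUserMessage(text, localModel) {
  // 1. The small in-browser model answers instantly.
  const quickReply = localModel.respond(text); // e.g. "Hello there! Of course..."
  showInChat(quickReply);

  // 2. In parallel, ship a context package to the big LLM so it can
  //    pick up mid-conversation instead of starting from scratch.
  const contextPackage = {
    status: "dialog started",
    user_greeting: text,
    robot_reply: quickReply,
    tone: "friendly",
  };
  const res = await fetch("/api/llm", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: text, context: contextPackage }),
  });

  // 3. The LLM continues organically from where the small model left off.
  const { completion } = await res.json();
  appendToChat(completion);
}
```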

Pavel

Or the idea of a hybrid AI response:
Instant start: as soon as you ask a question, the in-browser network immediately produces the first part of the answer, for example: "Of course, I'll help you now!"
The full answer follows: while you're reading that opener, a powerful neural network on the server is already preparing a detailed answer, which appears a moment later.
Result: to the user, the exchange feels instant, with no pauses or waiting. Interaction with the AI becomes faster, smoother, more natural, and seamless.
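A quick sketch of that "instant start" pattern. localModel.quickOpener() and the streaming /api/answer endpoint are assumptions made up for this example, but the streaming plumbing itself is standard browser fetch:

```js
const chatBox = document.querySelector("#chat"); // assumed chat container
const render = (text) => chatBox.append(text);

async function instantStart(question, localModel) {
  // The in-browser model fills the first moments with an opener.
  render(localModel.quickOpener(question)); // "Of course, I'll help you now!"

  // Meanwhile the server prepares the real answer and streams it in.
  const res = await fetch("/api/answer", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    render(decoder.decode(value, { stream: true })); // append chunks on arrival
  }
}
```

By the time the user has finished reading the canned opener, the first server chunks are usually already rendering, which is what makes the switch invisible.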

Pavel

I've published a follow-up post about the project's continued development! I think it turned out great!

dev.to/xzdes/i-supercharged-my-bro...

Pavel

I was sitting here thinking about why I created this neural network, and then it dawned on me: it can be part of a query-optimization pipeline! LLM users write lots of similar, simple messages like "Thank you!" or "How are you?" Why send those to the server when you can handle them on the client, and only fall back to a server request when the local model can't cope?
How do you like the idea?
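A sketch of that routing logic. The classify() method and its confidence score are assumptions about the local model's API; the point is the threshold-based fallback:

```js
const SMALL_TALK_THRESHOLD = 0.9; // hypothetical confidence cutoff

async function route(message, localModel) {
  const { reply, confidence } = localModel.classify(message);

  // Cheap cases ("Thank you!", "How are you?") never leave the browser.
  if (confidence >= SMALL_TALK_THRESHOLD) return reply;

  // Anything the small model is unsure about goes to the server LLM.
  const res = await fetch("/api/llm", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  return (await res.json()).completion;
}
```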

Pavel

So, I created a project aimed at improving API usage: this little bot can learn from the LLM and tell it "don't bother" when the user writes a message like "Thank you".
github.com/Xzdes/slmnet-Hybrid

Pavel

I've completely rebuilt the project, transforming it from a classifier chatbot into a full-fledged GPT that runs in the browser using only vanilla JavaScript. I believe I've pushed this to its absolute limit!

Training takes about 10 minutes and generation takes up to 2 minutes, all happening client-side in the browser. I couldn't squeeze any more performance out of it. It's not much, but it's honest generation. The model is dumb and slow, but it's a real GPT!

Thank you for taking the time to read through this. I appreciate all of your feedback and support.

Parag Nandy Roy

Turning what if into heck yeah in just one evening is peak dev energy...

Pavel

Thank you very much!

Daniel Chifamba

This is brilliant! I love it. Thanks for sharing

Pavel

Thank you very much!

Pavel

Here's another application! You could also use this in your own neural-network aggregator service to avoid burning through your usage limits!

Willam stock

Wow, that's awesome! Building a GPT in the browser with pure JS in one evening is seriously impressive.

Pavel

Thank you very much! But technically it's an imitation of a GPT, and a GPT is an imitation of communication :)

Pavel

I've created a new project and renamed it slmnet-Hybrid, though it's really not a GPT.
github.com/Xzdes/slmnet-Hybrid
I'll write a new post about it soon.

Thomas TS

The browser can use the GPU for things like the animations on windy.com.

Is this 'nano GPT' making use of the GPU as well?

Pavel

No, this implementation runs entirely on the CPU. The project's goal is purely educational: to demonstrate the inner workings of a transformer model without relying on GPU-accelerated libraries.
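For a feel of what "entirely on the CPU" means here: a dependency-free transformer spends almost all its time in nested-loop matrix multiplies like the generic sketch below (not taken from the slmnetGPT source):

```js
// Naive matrix multiply in plain JS: the kind of CPU-bound kernel a
// dependency-free transformer runs for every attention and MLP layer.
// a is n x m, b is m x p, both stored as flat Float32Arrays.
function matmul(a, b, n, m, p) {
  const out = new Float32Array(n * p); // zero-initialized result, n x p
  for (let i = 0; i < n; i++) {
    for (let k = 0; k < m; k++) {
      const aik = a[i * m + k];
      for (let j = 0; j < p; j++) {
        out[i * p + j] += aik * b[k * p + j];
      }
    }
  }
  return out;
}
```

A WebGL/WebGPU backend would parallelize exactly this loop across the GPU, which is where most of the speed difference comes from.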

Emir Taner

Love it - you basically built a mini GPT pet that doesn’t forget you after refresh

Pavel

Here is the latest version: a full-fledged GPT, but in a small cage.