I Built a "GPT" in My Browser in One Evening. The Journey from Amnesia to Stable Learning with Pure JS.

Pavel on August 12, 2025

Hello, community! https://github.com/Xzdes/slmnetGPT Sometimes, the best projects are born from a simple "What if...?" question. One evening, I w...
Prema Ananda

Nice! This could actually save tons of electricity. Why run a big model on the server for simple responses like "hello" or "thanks"? Your browser thing can handle that locally and use way less energy.

If this gets implemented everywhere, servers would run cooler and consume less power. Simple questions - handled locally, complex ones - sent to server. Makes sense!

Pavel

That's right, it's a waste to spend so many resources on such a brief chat.

Pavel

Or here's another thought for future readers! The little model doesn't just respond to "Thank you," it becomes a context manager. Here's how neat that is:
User: "Hi! Tell me about hybrid AI."
Little GPT (in the browser) responds instantly: "Hello there! Of course, I'll be happy to tell you."
Behind the scenes: at the same moment, the small model sends a request to the large LLM that looks something like this:
Main request: tell the user about hybrid AI.
Context package: { "status": "dialog started", "user_greeting": "Hi!", "robot_reply": "Hello there! Of course, I'll be happy to tell you.", "tone": "friendly" }
Having received such a package, the LLM immediately grasps the whole picture, with no unnecessary introductions:
The user has already been greeted.
He's friendly.
The beginning of the answer has already been given, and the LLM needs to continue it organically.
The LLM's response is relevant right away, with no repeated "Hello! How can I help you?" The LLM just continues the conversation as if it had been part of it from the very beginning.
This solves the key problems:
Completely seamless operation: the user never notices the "switch" between models.
Resource savings on a new level: the LLM doesn't waste energy analyzing the start of the dialogue; it gets straight to the point.
Deep understanding of context: the dialogue becomes personalized and much more natural.
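A minimal sketch of what this handoff could look like in vanilla JS. The localModel object, its respond() method, the /api/llm endpoint, and the chat helpers are all invented for illustration; only the flow matches the idea above:

```js
// Hypothetical helpers standing in for real chat UI code.
const showInChat = (text) => console.log("[local]", text);
const appendToChat = (text) => console.log("[server]", text);

async function handleUserMessage(text, localModel) {
  // 1. The small in-browser model answers instantly.
  const quickReply = localModel.respond(text); // e.g. "Hello there! Of course..."
  showInChat(quickReply);

  // 2. In parallel, ship a context package to the big LLM so it can
  //    pick up mid-conversation instead of starting from scratch.
  const contextPackage = {
    status: "dialog started",
    user_greeting: text,
    robot_reply: quickReply,
    tone: "friendly",
  };
  const res = await fetch("/api/llm", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: text, context: contextPackage }),
  });

  // 3. The LLM continues organically from where the small model left off.
  const { completion } = await res.json();
  appendToChat(completion);
}
```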

Pavel

Or the idea of a hybrid AI response:
Instant start: as soon as you ask a question, the in-browser network immediately produces the first part of the answer, for example: "Of course, I'll help you now!"
The full answer follows: while you're reading that opener, a powerful neural network on the server is already preparing a detailed answer, which appears a moment later.
Result: to the user, the exchange feels instant, with no pauses or waiting. Interaction with the AI becomes faster, smoother, more natural, and seamless.
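A quick sketch of that "instant start" pattern. localModel.quickOpener() and the streaming /api/answer endpoint are assumptions made up for this example, but the streaming plumbing itself is standard browser fetch:

```js
const chatBox = document.querySelector("#chat"); // assumed chat container
const render = (text) => chatBox.append(text);

async function instantStart(question, localModel) {
  // The in-browser model fills the first moments with an opener.
  render(localModel.quickOpener(question)); // "Of course, I'll help you now!"

  // Meanwhile the server prepares the real answer and streams it in.
  const res = await fetch("/api/answer", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    render(decoder.decode(value, { stream: true })); // append chunks on arrival
  }
}
```

By the time the user has finished reading the canned opener, the first server chunks are usually already rendering, which is what makes the switch invisible.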

Pavel

I've published a follow-up post about the project's continued development! I think it turned out great!

dev.to/xzdes/i-supercharged-my-bro...

Pavel

I was sitting here thinking about why I created this neural network, and then it dawned on me: it can be part of a query-optimization pipeline! LLM users write lots of similar, simple messages like "Thank you!" or "How are you?" Why send those to the server when you can handle them on the client, and only fall back to a server request when the local model can't cope?
How do you like the idea?
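A sketch of that routing logic. The classify() method and its confidence score are assumptions about the local model's API; the point is the threshold-based fallback:

```js
const SMALL_TALK_THRESHOLD = 0.9; // hypothetical confidence cutoff

async function route(message, localModel) {
  const { reply, confidence } = localModel.classify(message);

  // Cheap cases ("Thank you!", "How are you?") never leave the browser.
  if (confidence >= SMALL_TALK_THRESHOLD) return reply;

  // Anything the small model is unsure about goes to the server LLM.
  const res = await fetch("/api/llm", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  return (await res.json()).completion;
}
```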

Pavel

So, I created a project aimed at improving API usage: this little bot can learn from the LLM and tell it "don't bother" when the user writes a message like "Thank you".
github.com/Xzdes/slmnet-Hybrid

Pavel

I've completely rebuilt the project, transforming it from a classifier chatbot into a full-fledged GPT that runs in the browser using only vanilla JavaScript. I believe I've pushed this to its absolute limit!

Training takes about 10 minutes and generation takes up to 2 minutes, all happening client-side in the browser. I couldn't squeeze any more performance out of it. It's not much, but it's honest generation. The model is dumb and slow, but it's a real GPT!

Thank you for taking the time to read through this. I appreciate all of your feedback and support.

Parag Nandy Roy

Turning what if into heck yeah in just one evening is peak dev energy...

Pavel

Thank you very much!

Daniel Chifamba

This is brilliant! I love it. Thanks for sharing

Pavel

Thank you very much!

Pavel

Here's another application! You could also use this in your own neural-network aggregator service to avoid burning through your usage limits!

Willam stock

Wow, that's awesome! Building a GPT in the browser with pure JS in one evening is seriously impressive.

Pavel

Thank you very much! But technically it's an imitation of a GPT, and a GPT is an imitation of communication :)

Pavel

I've created a new project and renamed it slmnet-Hybrid, though it's really not a GPT.
github.com/Xzdes/slmnet-Hybrid
I'll write a new post about it soon.

Thomas TS

The browser can use the GPU for things like the animations on windy.com.

Is this 'nano GPT' making use of the GPU as well?

Pavel

No, this implementation runs entirely on the CPU. The project's goal is purely educational: to demonstrate the inner workings of a transformer model without relying on GPU-accelerated libraries.
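For a feel of what "entirely on the CPU" means here: a dependency-free transformer spends almost all its time in nested-loop matrix multiplies like the generic sketch below (not taken from the slmnetGPT source):

```js
// Naive matrix multiply in plain JS: the kind of CPU-bound kernel a
// dependency-free transformer runs for every attention and MLP layer.
// a is n x m, b is m x p, both stored as flat Float32Arrays.
function matmul(a, b, n, m, p) {
  const out = new Float32Array(n * p); // zero-initialized result, n x p
  for (let i = 0; i < n; i++) {
    for (let k = 0; k < m; k++) {
      const aik = a[i * m + k];
      for (let j = 0; j < p; j++) {
        out[i * p + j] += aik * b[k * p + j];
      }
    }
  }
  return out;
}
```

A WebGL/WebGPU backend would parallelize exactly this loop across the GPU, which is where most of the speed difference comes from.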

Emir Taner

Love it - you basically built a mini GPT pet that doesn’t forget you after refresh

Pavel

Here is the latest version: a full-fledged GPT, but in a small cage.