Hello everyone!
We are two 13-year-old students from Russia, and we want to show
you our model: QwenSeek-2B!
It is based on Qwen3.5-2B, which we fine-tuned on ~8K
reasoning examples distilled from DeepSeek-V4-Flash using the Unsloth framework.
For us, this is a big result! We released the model on Hugging Face, and the
GGUF version has already passed 1,000 downloads! We are just incredibly
happy about this!!!
Training Details:
Total epochs: 1 (250 steps)
Training time: ~10 hours (at a speed of ~0.01 it/s...)
Total examples: ~8K (input/output pairs)
Dataset: https://huggingface.co/datasets/Jackrong/DeepSeek-V4-Distill-8000x
Hardware: 1× T4 (Kaggle)
Context window: 4096 tokens
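
For the curious, here is roughly what this setup looks like in code. This is a minimal sketch, not our exact notebook: the LoRA settings, batch size, learning rate, and the dataset's text column name are assumptions; the model, dataset, step count, and context length come from the list above.

```python
# Sketch of an Unsloth LoRA fine-tune matching the details above.
# Hyperparameters marked "assumed" are illustrative, not our exact config.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

max_seq_length = 4096  # context window from the list above

# Base model from the post; dtype=None lets Unsloth pick a dtype
# (on our T4 we ended up in FP32, see further down).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen3.5-2B",
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=False,
)

# Attach LoRA adapters; rank/alpha/target modules are assumed values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("Jackrong/DeepSeek-V4-Distill-8000x", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumed column name
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        output_dir="qwenseek-2b",
        per_device_train_batch_size=2,   # assumed
        gradient_accumulation_steps=16,  # assumed: ~8K examples / 32 ≈ 250 steps
        max_steps=250,                   # one epoch in our run
        learning_rate=2e-4,              # assumed
        logging_steps=10,
    ),
)
trainer.train()
```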
We will keep releasing new, even cooler models! And maybe in the near
future we will train a model from scratch! But for now we are looking for good
GPUs; maybe we will even apply somewhere for GPU grants!
By the way, we said the training took 10 hours, but we faced huge obstacles
before we could finally get there! For a couple of days we suffered through:
There were times when the loss spiked to 3+...
There were times when we hit OOM in the middle of training...
And other minor bugs... But we handled it!
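
If you run into the same things, two standard knobs in Hugging Face's `TrainingArguments` are worth knowing about. This is general advice, not a record of exactly what we did; the values are just examples.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwenseek-2b",
    max_grad_norm=1.0,            # gradient clipping tames loss spikes
    gradient_checkpointing=True,  # trades compute for VRAM to dodge OOM
    save_steps=50,                # frequent checkpoints so a crash isn't fatal
)

# After an OOM crash, resume from the last checkpoint instead of step 0:
# trainer.train(resume_from_checkpoint=True)
```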
Also: we trained in FP32, because Qwen3.5 refused to work in FP16, and BF16
simply doesn't work on a T4... 🤣
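
For context: the T4 (compute capability 7.5) has no hardware BF16, and you can check that at runtime. Here is a tiny sketch of the fallback logic; since the FP16 failure was a Qwen3.5-specific quirk we hit, the non-BF16 branch just goes straight to FP32.

```python
import torch

# The T4 (compute capability 7.5) has no hardware BF16, so this is False there.
if torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    # FP16 would be the usual T4 choice, but Qwen3.5 misbehaved in it for us,
    # so we fell back to full FP32: slow, but it trains.
    dtype = torch.float32

print(f"training dtype: {dtype}")
```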
We just decided to make a "Home" DeepSeek! A Mini DeepSeek!
That's how it is: 13-year-old students are slowly teaching AI! :)
And now... If you want, you
can try our models, or just take a look:
https://huggingface.co/faunix/QwenSeek-2B
And also the GGUF version:
https://huggingface.co/faunix/QwenSeek-2B-GGUF
Try running it, see how it thinks, and maybe send us feedback!
We will be very grateful! Ask us your questions!
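
If you want to try the GGUF version locally, here is a minimal sketch using llama-cpp-python (the quant filename below is our assumption; check the repo's file list for the real names):

```python
# Minimal local chat with the GGUF build via llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="faunix/QwenSeek-2B-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quant; pick a real file from the repo
    n_ctx=4096,               # matches the training context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Think step by step: what is 17 * 24?"}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```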
And let's go build open AI
together! :)