Fareed Khan
Train LLM From Scratch

I created an end-to-end LLM training project that covers everything from downloading the training dataset to generating text with the trained model. It currently supports the Pile dataset, a large and diverse text corpus for LLM training. You can limit the dataset size, customize the default transformer architecture and training configuration, and more.

This is what the output of my 13-million-parameter LLM looks like after training on a Colab T4 GPU:

In \*\*\*1978, The park was returned to the factory-plate that the public share to the lower of the electronic fence that follow from the Station's cities. The Canal of ancient Western nations were confined to the city spot. The villages were directly linked to cities in China that revolt that the US budget and in Odambinais is uncertain and fortune established in rural areas.

It's more about learning than making the absolute best AI right away.
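To get a feel for how architecture settings translate into model size, here is a minimal sketch of a configuration object and a parameter counter for a GPT-style decoder. Everything in it is an assumption made for illustration: the `ModelConfig` class, its field names, and its default values are hypothetical and are not taken from the project, and the counting formula assumes a standard block layout (four attention projections, a 4x-wide MLP, two layer norms, all with biases). The vocabulary size is the GPT-2 BPE size, which may differ from the tokenizer the project uses.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Hypothetical field names and defaults, for illustration only.
    vocab_size: int = 50_257      # GPT-2 BPE vocabulary size (assumed tokenizer)
    context_length: int = 512     # maximum sequence length
    n_embed: int = 128            # embedding / hidden dimension
    n_layers: int = 4             # number of transformer blocks
    tie_embeddings: bool = True   # share input embedding with the output head


def count_parameters(cfg: ModelConfig) -> int:
    """Estimate trainable parameters of a GPT-style decoder (assumed layout)."""
    d = cfg.n_embed
    token_emb = cfg.vocab_size * d
    pos_emb = cfg.context_length * d          # learned positional embeddings
    attn = 4 * d * d + 4 * d                  # Q, K, V, output projections + biases
    mlp = 8 * d * d + 5 * d                   # d -> 4d -> d, with biases
    norms = 4 * d                             # two layer norms (scale + shift each)
    block = attn + mlp + norms
    final_norm = 2 * d
    # An untied output head adds a second vocab_size x d matrix.
    lm_head = 0 if cfg.tie_embeddings else cfg.vocab_size * d
    return token_emb + pos_emb + cfg.n_layers * block + final_norm + lm_head


if __name__ == "__main__":
    for tied in (True, False):
        cfg = ModelConfig(tie_embeddings=tied)
        print(f"tie_embeddings={tied}: {count_parameters(cfg):,} parameters")
```

At this scale the embedding table accounts for most of the parameters, so changing the vocabulary size or tying the embeddings moves the total far more than adding a layer does.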

Code, documentation, and examples are all available on GitHub:

GitHub Link
