<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Boris B.</title>
    <description>The latest articles on DEV Community by Boris B. (@iamtechonda).</description>
    <link>https://dev.to/iamtechonda</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F223309%2F4a0ef499-6fdd-450c-9078-82e54b148df6.jpg</url>
      <title>DEV Community: Boris B.</title>
      <link>https://dev.to/iamtechonda</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/iamtechonda"/>
    <language>en</language>
    <item>
      <title>Why you shouldn't Train your LLM from Scratch</title>
      <dc:creator>Boris B.</dc:creator>
      <pubDate>Tue, 22 Oct 2024 08:35:00 +0000</pubDate>
      <link>https://dev.to/iamtechonda/why-you-shouldnt-train-your-llm-from-scratch-2jb1</link>
      <guid>https://dev.to/iamtechonda/why-you-shouldnt-train-your-llm-from-scratch-2jb1</guid>
      <description>&lt;p&gt;Being the smart and curious person you are, you likely find the prospect of creating an LLM from scratch exciting.&lt;/p&gt;

&lt;p&gt;Or at least, you're curious about what it takes to build one from the ground up. That's completely understandable - who wouldn't be?&lt;br&gt;
You probably already suspect you can't, but want to know regardless. To be blunt: training an LLM from scratch is impractical for most individuals and organisations.&lt;/p&gt;

&lt;p&gt;But knowledge is free, so let's see what it takes to build an LLM from scratch 😊.&lt;/p&gt;
&lt;h2&gt;
  
  
  Think It's Expensive? You Have No Idea
&lt;/h2&gt;

&lt;p&gt;Let's use GPT-4 as an example, since it's the model with the most public information about its training costs. Training it reportedly took &lt;strong&gt;25,000 Nvidia A100 GPUs&lt;/strong&gt; running non-stop for &lt;strong&gt;90–100 days&lt;/strong&gt;. At around $15K per A100, the GPU hardware alone comes to about &lt;strong&gt;$375M&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To put that into perspective, these costs surpass the annual budgets of many mid-sized companies.&lt;/p&gt;

&lt;p&gt;If buying the hardware seems too steep, renting might appear more accessible. However, renting A100 GPUs on cloud platforms like AWS costs about $3 per hour, which puts the cost of GPT-4's training at roughly &lt;strong&gt;$180M&lt;/strong&gt; - cheaper than buying the hardware, but not cheap either.&lt;/p&gt;
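&lt;p&gt;A quick back-of-the-envelope check of that rental figure (using the reported GPU count, the upper end of the day range, and an assumed $3/hour rate - estimates, not official numbers):&lt;/p&gt;

```python
# Rough rental-cost estimate for GPT-4-scale training.
# All figures are the public estimates quoted above, not official numbers.
n_gpus = 25_000          # reported A100 count
days = 100               # upper end of the reported 90-100 day range
rate_per_hour = 3.0      # approximate A100 on-demand price in USD

rental_cost = n_gpus * days * 24 * rate_per_hour
print(f"~${rental_cost / 1e6:.0f}M")  # roughly $180M
```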

&lt;p&gt;Similarly, Llama 3 was trained on 24,000 Nvidia H100 GPUs, putting its estimated GPU hardware cost at around &lt;strong&gt;$720M&lt;/strong&gt;. These two examples give a good idea of the main cost of training.&lt;/p&gt;

&lt;p&gt;Funny enough, when people consider the cost of training an LLM, they often focus solely on the GPU expenses above, forgetting less-discussed costs like electricity, supporting hardware, and personnel.&lt;/p&gt;

&lt;p&gt;But enough on the GPUs, let's now talk data.&lt;/p&gt;
&lt;h2&gt;
  
  
  Data - Feeding the Beast
&lt;/h2&gt;

&lt;p&gt;My simple definition of an AI model, one I've used since before ChatGPT, has always been: a model is an algorithm combined with data. The interesting thing about LLMs is that they take the data piece to a whole new level. We are talking hundreds of gigabytes, if not terabytes, of text.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Artificial Intelligence Model = Algorithm + Data&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not only do you need large amounts of data to feed this beast, you need diversity as well. The data has to be varied enough to help the model understand language in all its forms.&lt;/p&gt;

&lt;p&gt;That means ingesting everything from classic literature and code to the latest blog posts - a blend of wonderful Shakespearean writing and people like me on the internet with chaotic writing styles, all in one mix. And all of it has to be high quality and representative of the world we live in today: languages, cultures, you name it.&lt;/p&gt;

&lt;p&gt;Sticking with our GPT-4 example, the model was trained on about 10 trillion words. To give you an idea, it would take all Twitter users over 3.5 years to generate 10 trillion words at current rates.&lt;/p&gt;
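&lt;p&gt;That Twitter comparison holds up as a rough order-of-magnitude estimate; both rates below are commonly cited assumptions for illustration, not measured values:&lt;/p&gt;

```python
# Order-of-magnitude check of the "3.5 years of Twitter" comparison.
# Both rates are rough assumptions, not measured values.
words_target = 10e12        # ~10 trillion training words
tweets_per_day = 500e6      # assumed tweets posted per day
words_per_tweet = 15        # assumed average words per tweet

days_needed = words_target / (tweets_per_day * words_per_tweet)
years_needed = days_needed / 365
print(f"~{years_needed:.1f} years")  # roughly 3.7 years
```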

&lt;p&gt;That said, one could argue that more and more free datasets for training LLMs are becoming available (The Pile - 825 GiB, Common Crawl), making this "easier". True, but most of them still require extensive cleaning and formatting. Moreover, handling data at this scale requires robust infrastructure for storage and fast access during training.&lt;/p&gt;
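&lt;p&gt;To make "extensive cleaning" concrete, here's a toy sketch of the kind of preprocessing even public datasets need. Real pipelines add fuzzy deduplication, language identification, and quality scoring on top of this; the sample texts are made up:&lt;/p&gt;

```python
import re

def clean_corpus(documents):
    """Toy text-cleaning pass: normalize whitespace, drop tiny
    fragments, and remove exact duplicates. Real LLM data pipelines
    do far more (fuzzy dedup, language ID, quality scoring)."""
    seen = set()
    cleaned = []
    for doc in documents:
        text = re.sub(r"\s+", " ", doc).strip()  # normalize whitespace
        # keep only reasonably long, previously unseen texts
        if len(text.split()) > 4 and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned

raw = [
    "To be, or not to be,  that is the question.",
    "To be, or not to be, that is the question.",  # duplicate after cleanup
    "lol",                                          # too short to keep
    "Large language models eat data for breakfast.",
]
print(clean_corpus(raw))  # 2 documents survive
```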
&lt;h2&gt;
  
  
  Not Just Another Neural Network
&lt;/h2&gt;

&lt;p&gt;There's a good reason why only a few people can DIY an LLM. In fact, Mistral AI, a French company, managed to raise an astounding &lt;a href="https://medium.com/r/?url=https%3A%2F%2Fsifted.eu%2Farticles%2Fpitch-deck-mistral" rel="noopener noreferrer"&gt;$113 million&lt;/a&gt; in seed funding without a product, simply by telling investors they had five employees with the expertise to create an LLM from scratch.&lt;/p&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvu4x1dntycv5gf4hb00z.jpg" width="800" height="533"&gt;Photo by Clint Adair on Unsplash
  
&lt;br&gt;


&lt;p&gt;Large Language Models use an advanced type of neural network called the Transformer. Transformers are especially good at predicting the next word in a sequence, which is key to generating coherent text. And while bigger isn't always better in most fields, with LLMs it often is. The challenge then becomes getting your large collected dataset into this sophisticated algorithm, the process known as training.&lt;/p&gt;

&lt;p&gt;Because training can be incredibly time-consuming, optimization becomes a must. This usually involves techniques like distributed training or parallelization to handle computations more efficiently, mixed precision (16- and 32-bit floats) to reduce memory usage, and checkpointing to save your progress over time.&lt;/p&gt;
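&lt;p&gt;As a rough illustration, here's what mixed precision and checkpointing look like in PyTorch. This is a minimal sketch with a stand-in model and made-up file name, not an actual LLM training loop:&lt;/p&gt;

```python
import torch
import torch.nn as nn

# Stand-in model and optimizer; a real LLM would be a large transformer.
model = nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 16)
target = torch.randn(8, 4)

# Mixed precision: run the forward pass in a lower-precision dtype
# (bfloat16 on CPU here; float16 is common on GPUs) to cut memory use.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), target)

loss.backward()
optimizer.step()
optimizer.zero_grad()

# Checkpointing: periodically save model and optimizer state so a
# weeks-long run can resume after a crash instead of starting over.
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "step": 1,
}
torch.save(checkpoint, "checkpoint.pt")
```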

&lt;p&gt;Not to bore you with the details, but my point here is that training an LLM is not for the weak. In fact, at times, a Python implementation of a transformer architecture can look less like Python and more like C.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MultiHeadAttention&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_head&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MultiHeadAttention&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_head&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n_head&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attention&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ScaleDotProductAttention&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;w_q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;w_k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;w_v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;w_concat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;w_q&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;w_k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;w_v&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attention&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;attention&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;w_concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        split tensor by number of head

        :param tensor: [batch_size, length, d_model]
        :return: [batch_size, head, length, d_tensor]
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;d_tensor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_head&lt;/span&gt;
        &lt;span class="n"&gt;tensor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_head&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_tensor&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;transpose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tensor&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        inverse function of self.split(tensor : torch.Tensor)

        :param tensor: [batch_size, head, length, d_tensor]
        :return: [batch_size, length, d_model]
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_tensor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;d_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;head&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;d_tensor&lt;/span&gt;

        &lt;span class="n"&gt;tensor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transpose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;contiguous&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tensor&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
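&lt;p&gt;The split and concat methods above are pure reshapes. A quick shape check with illustrative sizes (batch 2, length 10, d_model 512, 8 heads - my numbers, not from any particular model) shows that concat exactly inverts split:&lt;/p&gt;

```python
import numpy as np

batch_size, length, d_model, n_head = 2, 10, 512, 8
d_tensor = d_model // n_head  # 64 dimensions per head

x = np.random.randn(batch_size, length, d_model)

# split: [batch, length, d_model] becomes [batch, head, length, d_tensor]
split = x.reshape(batch_size, length, n_head, d_tensor).transpose(0, 2, 1, 3)

# concat: transpose back and merge the head dimension again
merged = split.transpose(0, 2, 1, 3).reshape(batch_size, length, d_model)

print(split.shape)             # (2, 8, 10, 64)
print(np.allclose(merged, x))  # True: concat inverts split
```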



&lt;h2&gt;
  
  
  Beyond Training
&lt;/h2&gt;

&lt;p&gt;After training, you need to make sure you have something useful. And, fortunately or unfortunately, the bar for a useful model is quite high these days.&lt;/p&gt;

&lt;p&gt;Without proper evaluation, your model might spit out nonsense or even harmful content. The best way to assess it is by benchmarking against existing LLMs. But sometimes, only a human can catch the subtle details, so human evaluation is imperative at this stage.&lt;/p&gt;

&lt;p&gt;If you're lucky enough to have a model that performs well, you can move on to post-training techniques like &lt;strong&gt;fine-tuning&lt;/strong&gt; and &lt;strong&gt;prompt engineering&lt;/strong&gt; - methods you might be more familiar with, which let you adjust the model based on the evaluation results to improve its performance.&lt;br&gt;
And if you're feeling up for it, why not release it to the world with built-in feedback loops to refine your model further over time?&lt;/p&gt;




&lt;p&gt;Still thinking about creating an LLM from scratch? Go ahead, be my guest!&lt;/p&gt;

&lt;p&gt;Personally, I'll be sticking to fine-tuning and prompt engineering my way through existing Large Language Models. And though I'm a Data Scientist, I'll only consider training Machine Learning models for specific use cases where LLMs are too costly or fail altogether.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;br&gt;
Like this story? Subscribe below, or Connect with me on &lt;a href="https://www.linkedin.com/in/bamboriz/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; &amp;amp; &lt;a href="https://x.com/iamtechonda" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>llm</category>
      <category>gpu</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Emergence of AI Tools: A New Era of Productivity and Efficiency</title>
      <dc:creator>Boris B.</dc:creator>
      <pubDate>Sat, 29 Apr 2023 23:25:53 +0000</pubDate>
      <link>https://dev.to/iamtechonda/the-emergence-of-ai-tools-a-new-era-of-productivity-and-efficiency-63j</link>
      <guid>https://dev.to/iamtechonda/the-emergence-of-ai-tools-a-new-era-of-productivity-and-efficiency-63j</guid>
      <description>&lt;p&gt;As we move further into the digital age, the demand for tools and software that can boost productivity and efficiency has never been higher. In recent years, artificial intelligence (AI) has emerged as a game-changer in this space, with AI tools becoming increasingly prevalent across multiple industries.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In fact, according to AI tool sites like AIToolFinder and Theresanaiforthat, over 2000 AI products have launched in the past year alone. With the launch of GPT-4, it's clear that the influence of AI tools will only continue to grow.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this article, we'll explore the prevalence of AI tools and their impact on productivity, as well as the challenges that come with the abundance of choices. We'll also delve into the future of these tools and how they will continue to shape our society. So buckle up and let's take a deep dive into the world of &lt;strong&gt;AI tools&lt;/strong&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  Impact on Productivity
&lt;/h2&gt;

&lt;p&gt;There's no denying that AI tools have a significant impact on productivity. In fact, in some cases, they have been shown to boost productivity by more than five times. This is because they can automate repetitive and time-consuming tasks, allowing us to focus on more complex and creative work.&lt;/p&gt;

&lt;p&gt;This impact is being felt across multiple industries, from marketing to music, programming and art. For example, in marketing, AI tools can analyze vast amounts of data to help businesses make data-driven decisions and create personalized marketing campaigns. &lt;/p&gt;

&lt;p&gt;In the music industry, AI tools can help artists create new sounds and generate music based on different genres and styles. In the art world, AI tools can help artists create stunning pieces by generating new ideas and designs.&lt;/p&gt;

&lt;p&gt;However, it's important to exercise caution when it comes to our dependence on these tools. The risk of becoming too reliant is high. While they can undoubtedly improve productivity, it's important to remember that they are not infallible. &lt;/p&gt;

&lt;p&gt;This is because they are only as good as the data they are trained on, and there is always the risk of bias or errors in the data. Moreover, if we become too reliant, we risk losing critical thinking skills and creativity, which are essential for problem-solving and innovation.&lt;/p&gt;

&lt;p&gt;In the next section, we'll explore the challenge of too many AI tools and how to navigate this landscape to make informed decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Too Many AI Tools
&lt;/h2&gt;

&lt;p&gt;While the proliferation of AI tools has undoubtedly improved productivity, it has also created a new problem: too many options. With the launch of GPT-4, this number is likely to increase even further.&lt;/p&gt;

&lt;p&gt;The sheer number of AI tools available can make it challenging to decide which one to use. Moreover, many of these tools are GPT-powered, which means they offer similar functionality. As a result, it can be difficult to discern which one is the best fit for your needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2zabltxm154w05h38lw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2zabltxm154w05h38lw.jpg" alt="Confused with options" width="800" height="830"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One solution to this problem is to look for a tool that combines multiple functionalities needed for your work. For instance, some tools may offer both natural language processing and image recognition capabilities, which can be useful for marketing campaigns that require both text and visuals. &lt;/p&gt;

&lt;p&gt;Additionally, directory sites like &lt;a href="https://aitoolfinder.net/" rel="noopener noreferrer"&gt;AItoolfinder&lt;/a&gt; can be helpful because they categorize tools by industry, task, and other parameters. These sites may also offer exclusive deals on certain products, making them an attractive option.&lt;/p&gt;

&lt;p&gt;For builders or individuals with a curiosity for how these tools work, there is a different problem. While it's natural to want to build one's own AI tool, it's important to keep in mind that most of these tools powered by GPT are simply extensions of it. &lt;/p&gt;

&lt;p&gt;This means that a simple feature addition by OpenAI could render hundreds of these tools useless. However, it's worth noting that OpenAI and GPT still struggle with user experience, which presents a unique opportunity for builders to create a better user experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future of AI Tools for Builders
&lt;/h2&gt;

&lt;p&gt;For those with a builder mindset, the future of AI tools presents an exciting opportunity. While the majority of current tools are powered by GPT and are merely extensions, there is still room for real innovation and growth.&lt;/p&gt;

&lt;p&gt;As AI technology continues to improve, we can expect to see more AI-enhanced traditional systems across different industries. For instance, AI tools can be used to automate repetitive tasks, analyze data, and make predictions. &lt;br&gt;
This will enable companies to work more efficiently and make better decisions. A nice example is &lt;em&gt;EarlyBird&lt;/em&gt; - a voice-based onboarding tool powered by AI for employability providers.&lt;/p&gt;

&lt;p&gt;Moreover, as AI becomes more ubiquitous, there will be a growing demand for customization and integration. This presents an opportunity for builders to create new and innovative systems that meet specific needs and integrate with existing systems.&lt;/p&gt;

&lt;p&gt;In addition to building new tools, builders can also contribute to the development of GPT and other AI technologies. By identifying and solving UX problems, builders can help improve the user experience of AI tools and make them more accessible to a wider audience.&lt;/p&gt;

&lt;p&gt;Ultimately, the future of AI tools for builders is one of endless possibility. As the technology evolves, so too will the opportunities for builders to create new and innovative products that reshape the way we work and live.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up ...
&lt;/h2&gt;

&lt;p&gt;In conclusion, the prevalence of AI tools has brought about both benefits and challenges. On the one hand, these tools have the potential to revolutionize productivity and reshape society. On the other hand, the sheer number of tools available has led to the problem of choice overload, and the risk of dependence on these tools is high.&lt;/p&gt;

&lt;p&gt;Despite these challenges, the future of AI tools looks bright. With the continued development of AI technologies, we can expect to see more advanced and sophisticated tools that are tailored to specific needs and industries.&lt;/p&gt;

&lt;p&gt;Nonetheless, it's important to approach AI with a critical eye, understanding the potential benefits and limitations. It's also crucial to consider the ethical implications of AI tools and to ensure that they are used in a responsible and equitable manner.&lt;/p&gt;

&lt;p&gt;For those with a builder mindset, there is a wealth of opportunity in the AI tool space. By building new and innovative products, and contributing to the development of existing ones, builders can help shape the future of work and technology. &lt;/p&gt;

&lt;p&gt;If you are a builder interested in discussing ideas or potential collaborations, feel free to reach out to me on &lt;a href="https://twitter.com/iamtechonda" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>futuristic</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>How much does your GitHub profile reflect your experience as a DEV ?</title>
      <dc:creator>Boris B.</dc:creator>
      <pubDate>Thu, 23 Jul 2020 09:54:03 +0000</pubDate>
      <link>https://dev.to/iamtechonda/how-much-does-your-github-profile-reflect-your-experience-as-a-dev-56be</link>
      <guid>https://dev.to/iamtechonda/how-much-does-your-github-profile-reflect-your-experience-as-a-dev-56be</guid>
      <description>&lt;p&gt;I find myself having an empty GitHub profile mainly because I work mostly with companies who tend to have private repos on GitHub. As a result, my personal Github is empty and does not reflect my experience as a Dev.  &lt;/p&gt;

&lt;p&gt;What are your thoughts ?&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>github</category>
    </item>
    <item>
      <title>Strapi vs API Platform</title>
      <dc:creator>Boris B.</dc:creator>
      <pubDate>Wed, 29 Apr 2020 22:27:45 +0000</pubDate>
      <link>https://dev.to/iamtechonda/strapi-vs-api-platform-gdl</link>
      <guid>https://dev.to/iamtechonda/strapi-vs-api-platform-gdl</guid>
      <description>&lt;p&gt;What are your thoughts on these 2 options for building backend APIs ?&lt;/p&gt;

</description>
      <category>discuss</category>
    </item>
    <item>
      <title>Do u use Tailwind in React?</title>
      <dc:creator>Boris B.</dc:creator>
      <pubDate>Mon, 16 Dec 2019 15:05:45 +0000</pubDate>
      <link>https://dev.to/iamtechonda/do-u-use-tailwind-in-react-46m8</link>
      <guid>https://dev.to/iamtechonda/do-u-use-tailwind-in-react-46m8</guid>
      <description>&lt;p&gt;Hearing à lot about Tailwind CSS recently. Is there a React implementation and is it better than Material-ui ? &lt;/p&gt;

</description>
      <category>css</category>
      <category>react</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Is React ➕ Firebase the perfect combo?</title>
      <dc:creator>Boris B.</dc:creator>
      <pubDate>Fri, 13 Dec 2019 22:28:41 +0000</pubDate>
      <link>https://dev.to/iamtechonda/is-react-firebase-the-perfect-combo-med</link>
      <guid>https://dev.to/iamtechonda/is-react-firebase-the-perfect-combo-med</guid>
      <description>&lt;p&gt;Am just learning React js and am wondering if React and Firebase are the right pick for my projects going forward. What do y'all think? &lt;/p&gt;

</description>
      <category>react</category>
      <category>firebase</category>
      <category>discuss</category>
    </item>
    <item>
      <title>CORS is always so annoying</title>
      <dc:creator>Boris B.</dc:creator>
      <pubDate>Sat, 05 Oct 2019 11:44:48 +0000</pubDate>
      <link>https://dev.to/iamtechonda/cors-is-always-so-annoying-1jco</link>
      <guid>https://dev.to/iamtechonda/cors-is-always-so-annoying-1jco</guid>
      <description>&lt;p&gt;Why is CORS always so annoying?&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>aws</category>
    </item>
    <item>
      <title>The Untold Secret to SYMFONY REST APIs in Less Than Ten Minutes</title>
      <dc:creator>Boris B.</dc:creator>
      <pubDate>Sun, 15 Sep 2019 21:23:17 +0000</pubDate>
      <link>https://dev.to/iamtechonda/the-untold-secret-to-symfony-rest-apis-in-less-than-ten-minutes-52fh</link>
      <guid>https://dev.to/iamtechonda/the-untold-secret-to-symfony-rest-apis-in-less-than-ten-minutes-52fh</guid>
      <description>&lt;p&gt;Hello, today I would love to share a very interesting experience I had this year with API development: how I was able to build a production-ready Symfony REST API with little prior knowledge of the framework.&lt;/p&gt;

&lt;p&gt;So I had this project a client wanted me to work on, but they were very specific about using Symfony. It was a backend API meant to power a front-end React SPA. Even though I had never worked with Symfony, I felt I could handle it. I did have some experience with CodeIgniter 3 and Slim PHP that I knew I could leverage, but I had never built an API before. Still, I could not pass this up: the experience alone was worth it, plus the client was paying really well 😉. So I got to work …&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Magic of Books&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;“Books are a uniquely portable magic.” ― Stephen King&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I needed some magic. I had to build not just an API but a good one. And yes, I am one of those who believe in doing work only if you can do it well. So I turned to books. The first that crossed my mind was Phil Sturgeon’s &lt;strong&gt;Build APIs You Won’t Hate&lt;/strong&gt;. A friend had once recommended it, but I didn’t need it at the time.&lt;/p&gt;

&lt;p&gt;I grabbed the book and finished it in no time. What I really loved was how it demystified the whole thing for me. Phil makes it fun, and I immediately felt much more confident once I knew all these new concepts. I then went on to read another book, &lt;strong&gt;Undisturbed REST&lt;/strong&gt; by Michael Stowe. After those two, I felt like APIs run the world.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;SYMFONY REST APIs: A First Try&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I was now ready to get my hands dirty. But first I had to learn Symfony.&lt;/p&gt;

&lt;p&gt;I watched YouTube videos and took some courses, but the best by far was the &lt;strong&gt;Stellar Development with Symfony 4&lt;/strong&gt; course on &lt;a href="https://symfonycasts.com/screencast/symfony" rel="noopener noreferrer"&gt;SymfonyCasts&lt;/a&gt;. If you are really new to Symfony like I was, SymfonyCasts is the place to go. Symfony 4 is very interesting, and when I found out that most of the cool stuff came in version 4, I was glad I had never tried Symfony before 😊. Once the fundamentals were in place, I had to figure out how to bring the whole API side into the picture. That’s when I came across FOSRestBundle and API Platform. And now there was a dilemma!&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;FOSRestBundle vs API Platform&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;There are a lot of articles and posts about this, so I am not going to rant about which is better, but rather share what worked for me. I read all the comparisons I could find and decided to try the two out for myself. I first built a simple API in plain Symfony (no tools), which was really cool and simple. I then tried doing the same with &lt;a href="https://github.com/FriendsOfSymfony/FOSRestBundle" rel="noopener noreferrer"&gt;FOSRestBundle&lt;/a&gt; but found it a bit tricky, and I spent a lot of time figuring out how to make it work. It did make the process easier, but not as much as I would have liked.&lt;/p&gt;
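&lt;p&gt;To make the comparison concrete, here is roughly what the "no tools" plain-Symfony approach looks like. This is only a sketch, not code from the actual project: the &lt;code&gt;BookController&lt;/code&gt; and its route are made-up examples in the Symfony 4 annotation style.&lt;/p&gt;

```php
&lt;?php
// src/Controller/BookController.php: a hypothetical plain-Symfony JSON endpoint
namespace App\Controller;

use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\Routing\Annotation\Route;

class BookController
{
    /**
     * @Route("/api/books", methods={"GET"})
     */
    public function list(): JsonResponse
    {
        // In a real project this data would come from Doctrine
        $books = [
            ['id' =&gt; 1, 'title' =&gt; 'Build APIs You Won\'t Hate'],
        ];

        return new JsonResponse($books);
    }
}
```

&lt;p&gt;Simple enough for one endpoint, but you write every route, serializer call and status code yourself, which is exactly the boilerplate the bundles try to remove.&lt;/p&gt;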

&lt;p&gt;At that point, I was leaning towards building the API with plain Symfony (no tools), but then I gave API Platform a try and, sincerely, I was blown away!&lt;/p&gt;

&lt;p&gt;So there I was, excited about this new discovery that made building an API so super easy. It seemed too good to be true, so I did a lot of research, and a very good article I read was one by Marek Gajda: &lt;a href="https://tsh.io/blog/practical-guide-to-api-platform-how-to-tell-if-its-the-right-framework-for-you/" rel="noopener noreferrer"&gt;Practical guide to API Platform: How to tell if it’s the right framework for you&lt;/a&gt;. And then I decided API Platform was the one for my Symfony REST API.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To keep it really brief, API Platform is a framework built on Symfony that speeds up the development of RESTful APIs. It is based on best practices and focuses on not reinventing the wheel when it comes to API development.&lt;/p&gt;
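&lt;p&gt;To give a rough idea of why it felt so easy: with API Platform, exposing a full CRUD REST API can be as little as annotating a Doctrine entity. Again, this is only a sketch in the API Platform 2.x / Symfony 4 annotation style, and the &lt;code&gt;Book&lt;/code&gt; entity is a made-up example:&lt;/p&gt;

```php
&lt;?php
// src/Entity/Book.php: a hypothetical entity exposed through API Platform
namespace App\Entity;

use ApiPlatform\Core\Annotation\ApiResource;
use Doctrine\ORM\Mapping as ORM;

/**
 * @ApiResource()
 * @ORM\Entity
 */
class Book
{
    /**
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     */
    private $id;

    /**
     * @ORM\Column(type="string")
     */
    public $title;

    public function getId(): ?int
    {
        return $this-&gt;id;
    }
}
```

&lt;p&gt;That single &lt;code&gt;@ApiResource&lt;/code&gt; annotation gives you GET, POST, PUT and DELETE endpoints plus auto-generated OpenAPI (Swagger) documentation, with pagination and validation hooked in.&lt;/p&gt;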

&lt;p&gt;Symfony REST APIs with API Platform are definitely worth considering for small, medium or large-scale projects. Although it is far from perfect, I really believe in the project and the team. They do a great job fixing issues and answering questions on their Slack channel. That’s what enabled me to complete my first project and build a production-ready RESTful API with little prior experience in Symfony.&lt;/p&gt;

&lt;p&gt;I will be putting up a full post on my &lt;a href="https://camertechtrends.com/secret-to-symfony-rest-apis-in-less-than-ten-minutes/" rel="noopener noreferrer"&gt;blog&lt;/a&gt; about some challenges I faced with API Platform (and still face now), some tips on how to get through them, and how I finished the API and hosted it on AWS Elastic Beanstalk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do leave any questions you have, and if you have any experience with API Platform, I would love to hear it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;

</description>
      <category>php</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>What’s the Difference Between Data Science and Data Analytics?</title>
      <dc:creator>Boris B.</dc:creator>
      <pubDate>Thu, 12 Sep 2019 17:02:46 +0000</pubDate>
      <link>https://dev.to/iamtechonda/what-s-the-difference-between-data-science-and-data-analytics-3917</link>
      <guid>https://dev.to/iamtechonda/what-s-the-difference-between-data-science-and-data-analytics-3917</guid>
      <description>&lt;p&gt;As businesses collect more and more data, the roles of data scientists and data analysts have grown exponentially.&lt;/p&gt;

&lt;p&gt;But what exactly do they each do?&lt;/p&gt;

&lt;p&gt;To get a better idea of which path is right for you, you must first decide whether you’re more interested in working with numbers and business intelligence tools or with programming and machine learning.&lt;/p&gt;

&lt;p&gt;You must also determine where you see yourself sitting in your future office environment. If you see yourself collaborating with business managers, stakeholders and CEOs, the path of a data analyst will challenge you daily to contribute to business growth. If you find yourself more comfortable with engineering teams, you may be best suited to the field of data science.&lt;/p&gt;

&lt;p&gt;Just thought I would share some notes from a webinar I participated in.&lt;/p&gt;

&lt;p&gt;Cheers!&lt;/p&gt;

</description>
      <category>career</category>
    </item>
    <item>
      <title>The Early Internet is disappearing</title>
      <dc:creator>Boris B.</dc:creator>
      <pubDate>Mon, 09 Sep 2019 08:30:25 +0000</pubDate>
      <link>https://dev.to/iamtechonda/the-early-internet-is-disappearing-5789</link>
      <guid>https://dev.to/iamtechonda/the-early-internet-is-disappearing-5789</guid>
      <description>&lt;p&gt;Online Now ≠ Online Tomorrow&lt;br&gt;
The early #internet is disappearing. Wanna at least know what it looked like? 🤓&lt;/p&gt;

&lt;p&gt;&lt;a href="https://oneterabyteofkilobyteage.tumblr.com/" rel="noopener noreferrer"&gt;https://oneterabyteofkilobyteage.tumblr.com/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>html</category>
    </item>
    <item>
      <title>Googling as a Software Engineer</title>
      <dc:creator>Boris B.</dc:creator>
      <pubDate>Wed, 04 Sep 2019 14:49:59 +0000</pubDate>
      <link>https://dev.to/iamtechonda/googling-as-a-software-engineer-3a1n</link>
      <guid>https://dev.to/iamtechonda/googling-as-a-software-engineer-3a1n</guid>
      <description>&lt;p&gt;How much Googling do you do as a Software Engineer? As for me ... A LOT&lt;br&gt;
&lt;a href="https://localghost.dev/2019/09/everything-i-googled-in-a-week-as-a-professional-software-engineer/" rel="noopener noreferrer"&gt;https://localghost.dev/2019/09/everything-i-googled-in-a-week-as-a-professional-software-engineer/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>career</category>
      <category>jokes</category>
    </item>
  </channel>
</rss>
