Hey Mr. DJ! Which large language model should I use?
Presented by Ezequiel Lanza — AI Open Source Evangelist (Intel)
Developers working in artificial intelligence must make a pivotal decision at the start of any language project. Do you use a foundation model or a fine-tuned model? How do you decide? If you’ve ever planned a party, it’s a lot like choosing a DJ.
Imagine you’re planning an event, and you need to pick a DJ for the evening entertainment. Do you go with someone who sticks to the tried-and-true hits or the one who dives deep into the knobs for a mind-blowing, unique experience? Will you go with the DJ who plays a broad mix of the classics, spanning multiple crowd-pleasing genres? Or will you choose the one with the hip-hop playlist that would make DJ Kool Herc jealous?
The answer depends on the type of party you want to throw — just as selecting the right language model depends on the type of results you want to achieve. In this post, we’ll explore the differences between classically trained foundation models and fine-tuned models to help you decide which type of large language model works best for your project.
Foundation Models (FM): The Wedding DJ of LLMs
Foundation models are like wedding DJs, both trained to have mass appeal by ingesting a large amount of general data, or in the case of DJs, music ranging from Bruno Mars to ABBA to Shania Twain. Foundation models aren't designed to solve specific tasks, but rather to capture general information in the data, such as semantics, grammar, and basic facts — much like a wedding DJ chooses and serves up music from a broad collection of songs designed to appeal to everyone from the bride to that cousin you didn't expect to actually show up.
Photo by Andreas Rønningen on Unsplash
GPT, LLaMA, and BLOOM are all examples of foundation models. Training these models demands an enormous amount of data, which poses challenges around data requirements, computational resources (requiring large, distributed computing systems), and technical expertise. Due to the high cost of training and deployment, most individuals — outside of perhaps Elon Musk or Taylor Swift — can't afford to train a foundation model without help from multiple stakeholders. For instance, it's estimated that Meta spent a whopping 10 million dollars training LLaMA — try expensing that in Concur.
Fine-Tuned Models: The Perfected Playlist
A DJ who wants to specialize in a particular genre or era needs to absorb and focus on all the top hits from that niche, much like a fine-tuned LLM trains on a specific dataset. Transforming a wedding DJ, who has already studied a wide range of music artists, styles, and genres, into a specialized 80s DJ takes far less time than training a DJ from scratch, just as fine-tuning a foundation model is significantly faster and requires fewer hardware resources than building one from the ground up.
Photo by Natalie Cardona on Unsplash
Any developer with a specific dataset can fine-tune a foundation model, customizing it for topics from music to medicine to whatever you can imagine. The open source community shares its fine-tuned models on Hugging Face, making it easy to find a customized model to suit your specific needs. For example, if you explore the platform in need of medical assistance, you might discover ChatDoctor, a great fine-tuned model with the ability to understand patients' needs, provide informed advice, and offer valuable assistance in a variety of medical fields. It was trained on data from doctor-patient interactions, using LLaMA as its foundation model.
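To make that concrete, here's a minimal, hedged sketch of what parameter-efficient fine-tuning can look like with the Hugging Face transformers and peft libraries. The base model name, the domain_corpus.txt file, and the hyperparameters are placeholders for illustration, not the actual recipe behind ChatDoctor or any other model mentioned here:

```python
# Minimal sketch (not the exact recipe for ChatDoctor or Intel Neural Chat):
# fine-tune an open foundation model on a domain corpus with LoRA adapters.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "mistralai/Mistral-7B-v0.1"   # placeholder: any open foundation model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA adds small trainable adapters while the original weights stay frozen,
# which is why fine-tuning needs far less hardware than pre-training.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Hypothetical domain data: one training example per line of plain text
# (e.g., doctor-patient dialogues or Kubernetes documentation).
dataset = load_dataset("text", data_files="domain_corpus.txt")["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="my-fine-tuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=1,
                           learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

The key design choice here is training only the small adapter layers rather than all of the base model's weights, which is what keeps the time and hardware cost so much lower than pre-training from scratch.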
However, not all fine-tuned models are designed for a narrower task; some aim to achieve similar results with fewer parameters in order to reduce hardware requirements. A great example is the recently launched Intel Neural Chat (https://huggingface.co/Intel/neural-chat-7b-v3-3), a fine-tuned model based on the open source foundation model Mistral 7B (from Mistral AI). This post explains how Mistral 7B was fine-tuned using the Intel® Gaudi® 2 accelerator and software optimizations like Intel® Extension for Transformers, which helps compress models through techniques such as quantization, distillation, and pruning.
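Because Intel Neural Chat is published on Hugging Face, trying it out takes only a few lines of code. The sketch below uses the standard transformers API; the prompt and generation settings are illustrative defaults, so check the model card for the recommended chat template:

```python
# Minimal sketch: prompt the fine-tuned Intel Neural Chat model with the
# standard Hugging Face transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Intel/neural-chat-7b-v3-3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain the difference between a foundation model and a fine-tuned model."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```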
Foundation Models vs. Fine-Tuned Models In Action
Suppose we have a model fine-tuned with Kubernetes information, and we want to use it to answer domain-specific questions. In our case, we want to ask the LLM to define “Dragon,” a scheduling and scaling controller for managing distributed deep-learning jobs in a Kubernetes cluster.
Let’s ask ChatGPT.
ChatGPT (FM) responds to a prompt by acknowledging its lack of specific knowledge when asked about something outside the scope of its training.
Now, let's ask a model that's been fine-tuned on Kubernetes and containers what a "Dragon" is:
A model fine-tuned on Kubernetes topics can recognize the term "Dragon" without requiring additional information.
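If you want to reproduce this kind of side-by-side comparison yourself, a simple approach is to send the same domain-specific question to both models. In the sketch below, the foundation model ID is a generic open model and the fine-tuned model ID is a hypothetical placeholder, since the exact Kubernetes model used above isn't named:

```python
# Sketch: ask the same domain-specific question to a general foundation model
# and to a fine-tuned model, then compare the answers.
from transformers import pipeline

question = "In Kubernetes, what is Dragon?"

models = [
    "mistralai/Mistral-7B-Instruct-v0.2",  # general-purpose foundation model
    "your-org/kubernetes-finetuned-llm",   # hypothetical fine-tuned model ID
]

for model_id in models:
    generator = pipeline("text-generation", model=model_id)
    answer = generator(question, max_new_tokens=100)[0]["generated_text"]
    print(f"--- {model_id} ---\n{answer}\n")
```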
So, Which Model Should You Use?
There’s no specific rule to determine which model is better; rather, it depends on what aligns best with your specific environment. However, three factors can guide your decision-making process:
Scope and versatility:
- Foundation Models (FMs): Opt for FMs when a broad, versatile language model is needed for general tasks, such as sentiment analysis or common queries.
- Fine-Tuned Models (FTMs): Choose FTMs for task-specific requirements, where customization and expertise in a particular domain are crucial.
Resource efficiency:
- Foundation Models (FMs): Consider the substantial computational resources and costs associated with training FMs, along with the accessibility options, whether free downloads or paid APIs.
- Fine-Tuned Models (FTMs): Assess the quick and less resource-intensive fine-tuning process, making it a practical choice for specific applications with limited resources.
Customization and privacy:
- Foundation Models (FMs): Consider FMs for their general capabilities and openness, which make them suitable for diverse applications. Evaluate the trade-offs between technical capabilities and licensing implications when choosing between freely accessible models like Falcon or Dolly and exclusive APIs like GPT-4.
- Fine-Tuned Models (FTMs): Leverage FTMs for customization, particularly in scenarios requiring privacy, like healthcare. Explore pre-trained models on platforms like Hugging Face for quick deployment, while considering the privacy implications of your specific use case. Keep in mind that you will need a decent number of examples of the behavior you want to fine-tune your model on.
Now that you’ve decided which large language model to use, relax and pump up the jams with this AI-inspired list of top 80s tunes.
https://open.spotify.com/playlist/7b8yw4Zw94zpdY4hbIORtQ?si=b2228b493afc4a5b
Find the video related to this post here: https://www.youtube.com/watch?v=fNXJ8NSfsIo
Let me know in the comments which models you’re using.
About the author
Ezequiel Lanza is an open source evangelist on Intel's Open Ecosystem team, passionate about helping people discover the exciting world of AI. He's also a frequent AI conference presenter and creator of use cases, tutorials, and guides to help developers adopt open source AI tools like TensorFlow and Hugging Face*. Find him on X at @eze_lanza.
For more open source content from Intel, check out open.intel