Novita AI

Databricks Dolly: A Free Powerful Open Source Large Language Model For Business

Introduction

The artificial intelligence market is expanding and diversifying rapidly, driven by growing demand across industries. Databricks Dolly has emerged as a notable contender in the large language model (LLM) landscape, offering an open alternative to established products such as ChatGPT and Google Bard. When assessing LLM options for your enterprise, it’s crucial to understand where models like Dolly stand in the market, so that your choice aligns with your infrastructure, business objectives, and operational needs.

AI technologies have become more capable and more accessible, and adoption has widened as the technology matures. The market is seeing a surge in AI-powered products and services, from analytics and automation tools to AI-driven customer service and marketing solutions.

What is Databricks Dolly?

Databricks Dolly is an open-source large language model designed for natural language instruction-following. It generates text responses for tasks like summarization, question answering, and brainstorming. In contrast to closed options like ChatGPT, the current release, Databricks Dolly 2.0, was fine-tuned on a training dataset crowdsourced from Databricks employees. This 12 billion-parameter model was fine-tuned on roughly 15,000 demonstrations of instruction-following behavior (the databricks-dolly-15k dataset), provided by more than 5,000 Databricks employees between March and April 2023.

Databricks Dolly 2.0 vs 1.0

In the blog post announcing Databricks Dolly 2.0, the company explained its choice of an open-source model and a custom dataset: organizations are free to build or customize their own large language models without paying for API access or sharing data with third parties.

Databricks describes its fine-tuning dataset, databricks-dolly-15k, as the first open-source, human-generated instruction dataset designed specifically to give large language models the interactive quality of ChatGPT. Its predecessor, Dolly 1.0, unveiled in March 2023, was built on a model with half as many parameters, was fine-tuned on a dataset of roughly 50k instruction/response pairs that included ChatGPT output, and could not be licensed for commercial use. Dolly 2.0, launched in April 2023, is built on a 12 billion-parameter open-source model from EleutherAI and, importantly, can be deployed commercially because its question and answer pairs were crowdsourced and written by humans.

Enterprises considering Databricks Dolly 2.0 can also have some confidence in the dataset’s provenance. Databricks set clear guidelines for creating its questions and answers, requiring every response to be original and written by a human.

Who Should Use Databricks Dolly?

Dolly’s unique selling point lies in its open-source dataset, presenting distinct advantages for enterprises seeking to develop AI solutions tailored to specific use cases. Here are some scenarios where Databricks Dolly might be particularly suitable:

  1. Enterprises subject to stringent data compliance regulations: Because Dolly is open source and can run on your own infrastructure rather than behind a third-party API, organizations in highly regulated sectors can build AI solutions without the data security and compliance issues typically associated with API-dependent tools.
  2. AI researchers and developers: Dolly offers a capable yet highly adaptable base, so researchers and developers can modify the model quickly as needed. This flexibility leaves more room for innovation and experimentation.
  3. Enhancement of existing question and answer solutions: An existing solution’s question and answer pairs map naturally onto the instruction/response structure Dolly is trained on. For instance, a technical support database organized around Q&A could be turned into an interactive experience.
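
As a rough illustration of the third scenario, the sketch below converts rows from a support Q&A table into instruction records that could later be used for fine-tuning. The field names (instruction, context, response, category) follow the layout of the published databricks-dolly-15k dataset as far as we are aware; the `InstructionRecord` class, the helper functions, and the sample rows are hypothetical and exist only for this example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InstructionRecord:
    """One training example, using dolly-15k style field names (assumed)."""
    instruction: str
    context: str
    response: str
    category: str


def support_rows_to_records(rows):
    """Convert (question, answer, reference_text) rows into instruction records.

    Rows with an empty question or answer are skipped. A row that carries
    reference text is labeled closed_qa, otherwise open_qa.
    """
    records = []
    for question, answer, reference in rows:
        if not question.strip() or not answer.strip():
            continue
        category = "closed_qa" if reference.strip() else "open_qa"
        records.append(
            InstructionRecord(question.strip(), reference.strip(), answer.strip(), category)
        )
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(r)) for r in records)


# Made-up support data, purely for illustration.
rows = [
    ("How do I reset my password?",
     "Open Settings, choose Security, then select Reset password.", ""),
    ("Which plan includes SSO?",
     "The Enterprise plan.", "SSO is available on the Enterprise plan only."),
    ("", "Orphaned answer with no question.", ""),
]
records = support_rows_to_records(rows)
print(to_jsonl(records))
```

JSON Lines is used here only because it is a common interchange format for fine-tuning data; any format your training tooling accepts would work equally well.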

Truly open large language models

According to Databricks, its customers have consistently said they would rather own their models, which lets them build higher-quality models for their domain-specific applications without handing their sensitive data to third parties.

Databricks also argues that important issues such as bias, accountability, and AI safety should be addressed by a broad community of diverse stakeholders rather than by just a few large companies. Open-sourced datasets and models encourage commentary, research, and innovation, so that everyone can benefit from advances in artificial intelligence technology.

As a technical and research artifact, Dolly is not expected to be state-of-the-art in terms of effectiveness. However, both Dolly and the open-source dataset are expected to serve as the seeds for many follow-on works, which may help bootstrap even more powerful language models.

Use Cases of Databricks Dolly

In its official blog, Databricks describes the task categories covered by databricks-dolly-15k, which double as the main applications and use cases of Dolly:

  1. Open Q&A: Examples include inquiries such as “Why do individuals enjoy comedy films?” or “What is the capital of France?” Some questions have no definitive answer, while others necessitate drawing from a broad understanding of the world.
  2. Closed Q&A: These questions can be answered solely using the information within a given reference text. For instance, when provided with a paragraph from a Wikipedia article on atoms, one might ask, “What is the ratio of protons to neutrons in the nucleus?”
  3. Extracting information from Wikipedia: In this task, an annotator selects a paragraph from a Wikipedia article and identifies entities or factual details such as weights or measurements contained within the passage.
  4. Summarizing information from Wikipedia: Annotators are tasked with condensing a passage from Wikipedia into a brief summary.
  5. Brainstorming: This involves generating open-ended ideas and providing a list of potential options. For example, “What are some enjoyable activities to do with friends this weekend?”
  6. Classification: Annotators make determinations regarding class membership, such as categorizing items in a list as animals, minerals, or vegetables, or assessing the characteristics of a short passage, such as the sentiment expressed in a movie review.
  7. Creative writing: Tasks in this category include composing poetry or crafting a love letter.
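
The seven categories above can be treated as labels on each instruction record. The sketch below is one way to check that a record is well formed for its category. The label strings match the ones used in the published dataset as far as we know, but the "needs a reference passage" flags are our own reading of the descriptions above, and the `validate` helper and sample records are hypothetical.

```python
# Category label -> whether the task requires a reference passage (our assumption).
CATEGORIES = {
    "open_qa": False,
    "closed_qa": True,
    "information_extraction": True,
    "summarization": True,
    "brainstorming": False,
    "classification": False,
    "creative_writing": False,
}


def validate(record):
    """Return a list of problems found in one instruction record (empty if none)."""
    problems = []
    category = record.get("category")
    if category not in CATEGORIES:
        problems.append(f"unknown category: {category!r}")
    elif CATEGORIES[category] and not record.get("context", "").strip():
        problems.append(f"{category} needs a reference passage in 'context'")
    for field in ("instruction", "response"):
        if not record.get(field, "").strip():
            problems.append(f"missing {field}")
    return problems


# Made-up records for illustration; they are not taken from the dataset.
good = {
    "instruction": "What is the capital of France?",
    "context": "",
    "response": "Paris.",
    "category": "open_qa",
}
bad = {
    "instruction": "What is the ratio of protons to neutrons in the nucleus?",
    "context": "",
    "response": "It depends on the element.",
    "category": "closed_qa",
}
print(validate(good))
print(validate(bad))
```

A check like this is most useful when you add your own records to the dataset, since closed Q&A, extraction, and summarization examples are only meaningful with their source passage attached.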

Example of Open QA in databricks-dolly-15k

How to install Databricks Dolly

To download the Dolly 2.0 model weights, visit the Databricks page on Hugging Face. The databricks-dolly-15k dataset is available from the Dolly repo under databrickslabs on GitHub.
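
If you prefer to fetch the model programmatically, a minimal sketch using the Hugging Face `transformers` and `datasets` libraries might look like the following. It assumes those libraries plus `torch` and `accelerate` are installed, that the model is published under the ID `databricks/dolly-v2-12b`, and that the dataset is also mirrored on the Hugging Face Hub as `databricks/databricks-dolly-15k`. The function names are our own, and the heavy imports sit inside the functions so nothing is downloaded until you call them.

```python
MODEL_ID = "databricks/dolly-v2-12b"            # assumed Hugging Face model ID
DATASET_ID = "databricks/databricks-dolly-15k"  # assumed Hugging Face dataset ID


def pipeline_kwargs(model_id=MODEL_ID):
    """Keyword arguments for transformers.pipeline, kept separate for inspection."""
    return {
        "model": model_id,
        "trust_remote_code": True,  # the model repo ships its own instruction pipeline
        "device_map": "auto",       # spreads weights across available devices (needs accelerate)
    }


def load_dolly(model_id=MODEL_ID):
    """Download the weights (tens of gigabytes) and build a text-generation pipeline."""
    import torch
    from transformers import pipeline

    return pipeline(torch_dtype=torch.bfloat16, **pipeline_kwargs(model_id))


def load_dolly_dataset():
    """Download the instruction dataset and return its training split."""
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split="train")
```

Calling `load_dolly()` returns a pipeline object that you can invoke with an instruction string, for example `generate = load_dolly()` followed by `generate("Summarize the benefits of open-source LLMs.")`. Expect the first call to take a while, since the weights have to be downloaded and loaded onto your hardware.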

Limitations of the Databricks Dolly LLM

While the open-source Databricks Dolly model offers numerous advantages, particularly for highly targeted commercial applications, it’s not a universal solution. Compared to larger, closed models such as ChatGPT, Dolly’s smaller training set may result in less refined outputs. There are language limitations too: currently, Dolly 2.0 responds only in English.

Deploying an open-source LLM also requires in-house expertise. It often demands a thorough understanding of how to train and operate AI solutions, as well as substantial compute resources within the enterprise’s environment. In contrast, closed generative AI models are typically ready to use and integrate with custom solutions out of the box. The right choice of model usually comes down to balancing commercial viability against resource constraints.
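
To put "substantial compute resources" in perspective, here is a back-of-the-envelope estimate of the memory needed just to hold a 12 billion-parameter model's weights. It is a simplification under stated assumptions: it uses the nominal parameter count from this article and ignores activations, the attention cache, and framework overhead, so real requirements will be higher. The function name and the dtype table are ours.

```python
# Bytes needed to store one parameter at each numeric precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="bfloat16"):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Nominal 12 billion parameters, as quoted for Dolly 2.0.
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(12e9, dtype):.1f} GiB")
```

At 16-bit precision the weights alone come to roughly 22 GiB, which is more than a typical consumer GPU holds. This is one reason self-hosting a model of this size usually calls for data-center hardware or reduced precision.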

On the other hand, an open-source LLM does not guarantee privacy by itself; that depends on how securely you host and operate it. If you are looking for a safe, reliable, and cost-effective closed-source LLM, you can choose ours.

We have also released LLM APIs that integrate seamlessly with your LLM applications. With low pricing and scalable models, the Novita AI LLM Inference API offers stable service and latency of under 2 seconds.

Our LLM API also supports character role-play. By importing your favorite character card, you can chat with that character about anything.

Which Large Language Model Is Right for You?

The introduction of Databricks Dolly LLM marks a significant leap in commercially viable open-source large language models tailored for enterprise use. While it may not suit every scenario, its potential at the enterprise level is immense. Whether it’s custom solution development, enhancing existing tools, or pioneering new AI applications, Databricks Dolly merits consideration if your organization requires an open-source model.

Should you have inquiries regarding the suitability of Databricks Dolly 2.0 for your specific context or require assistance in leveraging Dolly LLM effectively, don’t hesitate to reach out to us at Graphable. Our Custom Development services are geared towards helping you conceptualize and realize solutions aligned with your unique enterprise objectives.

Databricks Dolly offers a versatile, adaptable model for text generation tasks such as summarization, question answering, and brainstorming. These tasks come up across many industries and functions, which makes Dolly useful to professionals working in an enterprise environment. As AI continues to advance, Dolly’s role in streamlining and enriching data-driven decision-making may expand further.

Conclusion

In summary, Databricks Dolly LLM offers a significant advancement in open-source large language models tailored for enterprise use. While it may not suit every scenario, its potential for customization and innovation within specific use cases is substantial. Organizations seeking higher-quality models while retaining ownership of their data may find Databricks Dolly to be a compelling option.

However, it’s essential to consider the trade-offs. Open-source models like Databricks Dolly offer flexibility and community-driven innovation but may not always deliver the same level of refinement or language coverage as closed models. For those balancing reliability, safety, and cost-effectiveness, closed-source LLMs present a viable alternative.

Ultimately, the choice between open-source and closed-source LLMs depends on your organization’s specific needs, objectives, and resources. Whether you opt for Databricks Dolly or another model, it’s crucial to align your decision with your unique enterprise requirements to enhance data-driven decision-making effectively.

Originally published at novita.ai
novita.ai is a one-stop platform for limitless creativity that gives you access to 100+ APIs, from image generation and language processing to audio enhancement and video manipulation. With affordable pay-as-you-go pricing, it frees you from GPU maintenance hassles while you build your own products. Try it for free.