Embedchain: Building LLM-Powered Bots with Ease
Embedchain is a framework designed to simplify the process of creating large language model (LLM) powered bots over any dataset. It provides an abstraction layer that handles dataset loading, chunking, embedding creation, and storage in a vector database.
By using the .add and .add_local functions, you can easily add single or multiple datasets to your bot. Then, you can use the .query function to retrieve answers from the added datasets.
Let's say you want to create a bot based on Naval Ravikant, incorporating one YouTube video, one book in PDF format, two blog posts, and a question and answer pair. With Embedchain, all you need to do is provide the links to the video, PDF, and blog posts, as well as the Q&A pair. Embedchain will handle the rest, creating a bot that answers from those sources.
from embedchain import App
naval_chat_bot = App()
# Embed Online Resources
naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_chat_bot.add("web_page", "https://nav.al/feedback")
naval_chat_bot.add("web_page", "https://nav.al/agi")
# Embed Local Resources
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
# Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
Getting Started
Installation
First, make sure you have the Embedchain package installed. If not, you can install it via pip:
pip install embedchain
Usage
To get started with Embedchain, you'll need an OpenAI account and an API key. If you don't have an API key, you can create one from the API keys page of your OpenAI account.
Once you have your API key, set it as an environment variable named OPENAI_API_KEY:
import os
os.environ["OPENAI_API_KEY"] = "sk-xxxx"
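Hard-coding the key as shown above is fine for a quick test, but in shared code you may prefer to export OPENAI_API_KEY in your shell and only check for it in Python. A minimal sketch, assuming a hypothetical helper named require_api_key (it is not part of Embedchain):

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the value of the environment variable `name`.

    Hypothetical helper, not an Embedchain function. It raises a clear
    error when the variable is missing or blank, so the problem shows up
    before any request is made.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the App")
    return key
```

Calling require_api_key() once at startup, before creating the App, reports a missing key immediately rather than at the first query.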
Next, import the App class from Embedchain and use the .add function to add datasets to your bot:
from embedchain import App
naval_chat_bot = App()
# Embed Online Resources
naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_chat_bot.add("web_page", "https://nav.al/feedback")
naval_chat_bot.add("web_page", "https://nav.al/agi")
# Embed Local Resources
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
If your script already defines or imports something else named App, you can import Embedchain's App under an alias instead:
from embedchain import App as EmbedChainApp
# or
from embedchain import App as ECApp
Now that your app is created, you can use the .query function to retrieve answers for any query:
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
# Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
Supported Formats
Embedchain supports the following formats for dataset embedding:
YouTube Video
To add a YouTube video to your app, use the data type (first argument to .add) as "youtube_video". For example:
app.add('youtube_video', 'a_valid_youtube_url_here')
PDF File
To add a PDF file, use the data type as "pdf_file". For example:
app.add('pdf_file', 'a_valid_url_where_pdf_file_can_be_accessed')
Note that password-protected PDFs are not supported.
Web Page
To add a web page, use the data type as "web_page". For example:
app.add('web_page', 'a_valid_web_page_url')
Text
To supply your own text, use the data type as "text" and enter a string. The text is not processed, making it highly versatile. For example:
app.add_local('text', 'Seek wealth, not money or status. Wealth is having assets that earn while you sleep. Money is how we transfer time and wealth. Status is your place in the social hierarchy.')
Note: This example is not used in the provided code snippets because it's more common to supply a whole paragraph or file, which couldn't fit in the code examples.
Q&A Pair
To supply your own question and answer pair, use the data type as "qna_pair" and enter a tuple. For example:
app.add_local('qna_pair', ("Question", "Answer"))
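Taken together, the formats above split into two groups: URL-based types go through .add, while "text" and "qna_pair" go through .add_local. The sketch below routes a list of sources accordingly. The helper add_sources and the RecordingApp stand-in are hypothetical names used for illustration only. RecordingApp merely records calls so the example runs without an API key, and in real use you would pass an embedchain App instead.

```python
# Data types taken from the sections above; the grouping is this sketch's own.
ONLINE_TYPES = {"youtube_video", "pdf_file", "web_page"}
LOCAL_TYPES = {"text", "qna_pair"}


def add_sources(app, sources):
    """Route each (data_type, content) pair to app.add or app.add_local."""
    for data_type, content in sources:
        if data_type in ONLINE_TYPES:
            app.add(data_type, content)
        elif data_type in LOCAL_TYPES:
            # A Q&A pair must be a (question, answer) tuple.
            if data_type == "qna_pair" and not (
                isinstance(content, tuple) and len(content) == 2
            ):
                raise ValueError("qna_pair expects a (question, answer) tuple")
            app.add_local(data_type, content)
        else:
            raise ValueError(f"unsupported data type: {data_type!r}")


class RecordingApp:
    """Hypothetical stand-in for embedchain's App that only records calls."""

    def __init__(self):
        self.calls = []

    def add(self, data_type, url):
        self.calls.append(("add", data_type, url))

    def add_local(self, data_type, content):
        self.calls.append(("add_local", data_type, content))


app = RecordingApp()
add_sources(app, [
    ("web_page", "https://nav.al/agi"),
    ("qna_pair", ("Question", "Answer")),
])
print(app.calls)
```

Keeping the sources in a plain list makes it easy to see at a glance what a bot was built from.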
More Formats Coming Soon
If you want to add any other format, please create an issue, and we will consider adding it to the list of supported formats.
How Does It Work?
Creating a chat bot based on a dataset involves several steps, each with its own nuances. Embedchain simplifies this process and provides a straightforward interface to create bots over any dataset.
The steps involved in creating and querying a bot are as follows:
- Load the Data: Load the dataset into the bot.
- Create Meaningful Chunks: Break the data into meaningful chunks, determining the appropriate chunk size.
- Create Embeddings: Generate embeddings for each chunk using an embedding model.
- Store Chunks in a Vector Database: Store the chunks, along with their embeddings, in a vector database.
- Query the Bot: When a user asks a query, create an embedding for the query and retrieve similar documents from the vector database.
- Obtain the Answer: Pass the similar documents as context to the LLM and obtain the final answer.
Embedchain takes care of these steps and handles the underlying complexities. It provides a simplified interface to create bots over any dataset, allowing you to focus on building and deploying your application quickly.
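To make the six steps concrete, here is a toy, standard-library-only sketch of the same flow. It is not Embedchain's implementation and rests on several assumptions: chunks are single sentences, word counts stand in for a real embedding model, an in-memory list stands in for the vector database, and the final LLM call is omitted (the sketch stops at building the prompt). All function and class names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text):
    """Step 2: split the loaded text into sentence-sized chunks."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def embed(text):
    """Step 3 (toy): a bag-of-words count vector instead of a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Step 4 (toy): keeps (chunk, embedding) pairs in a Python list."""

    def __init__(self):
        self.rows = []

    def add(self, chunk):
        self.rows.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Step 5: embed the query and return the k most similar chunks."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Step 6 (partial): assemble the context an LLM would receive."""
    context = "\n".join(context_chunks)
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"


# Step 1: load the data (here, the text sample from the Text section above).
document = (
    "Seek wealth, not money or status. "
    "Wealth is having assets that earn while you sleep. "
    "Money is how we transfer time and wealth. "
    "Status is your place in the social hierarchy."
)

store = ToyVectorStore()
for chunk in chunk_text(document):
    store.add(chunk)

question = "What is status?"
top = store.search(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Word-count vectors only match on shared words, which is why a real embedding model is used in practice: it can also retrieve chunks that are related in meaning but phrased differently.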
Tech Stack
Embedchain is built on the following technology stack:
- [Langchain](https://github.com/hwchase17/langchain): An LLM (large language model) framework used to load, chunk, and index data.
- OpenAI's Ada embedding model: An embedding model provided by OpenAI used to generate embeddings.
- OpenAI's ChatGPT API: An LLM provided by OpenAI used to generate answers given a context.
- Chroma: A vector database used to store embeddings.
Author
- Taranjeet Singh (@taranjeetio)
Embedchain simplifies the process of creating LLM-powered bots over any dataset. Whether you want to create a chatbot for a specific domain or build a knowledge base for a particular topic, it handles loading, chunking, embedding, and retrieval so that you can focus on building and deploying your application.