Adheeban Manoharan

LlamaHub: The one stop data solution for your LLMs🦙🏠

In the 4th part of this blog series, we'll have a look at the most popular adapters available at llamahub.ai. LlamaHub is a library of data connectors that turn raw data from various sources into documents, which can then be indexed (as vectors, for example) and fed to LLMs like ChatGPT and Google Bard.

This is an open-source project, and the adapters/connectors created there are gradually being integrated into projects like LangChain and Llama-Index. If you are building a custom document bot for your firm, or even for your personal use, I think you'll find this super useful.


Now, let's get started!


For the sake of simplicity, in this walkthrough we will access the LlamaHub adapters through Llama-Index.

Adapters Assemble 🔨⚡

First up, here's how you can fetch any loader with Llama-Index:

from llama_index import download_loader

Loader = download_loader("<NAME_OF_THE_LOADER>")

The download_loader helper method makes sure the named loader is loaded along with all the dependencies it needs. To check the name of the loader that you want to use, visit this documentation. With that out of the way, let's have a look at some of the important data loaders in Llama-Index.
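
As a quick aside before the loaders themselves: if you're wondering what a "fetch a loader by name" helper has to do, here is a toy sketch. It is not the real download_loader implementation. The catalogue, the EchoReader class and the fetch_loader function are all made up for illustration, and the source comes from an in-memory dict rather than the network.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical stand-in for a remote catalogue: loader name -> module source.
# A real helper would fetch this source over the network instead.
FAKE_CATALOGUE = {
    "EchoReader": (
        "class EchoReader:\n"
        "    def load_data(self, text):\n"
        "        return [text]\n"
    ),
}


def fetch_loader(name: str, cache_dir: Path) -> type:
    """Write the loader's source into cache_dir, import it, and return the class."""
    if name not in FAKE_CATALOGUE:
        raise ValueError(f"Unknown loader: {name}")
    module_path = cache_dir / f"{name.lower()}.py"
    module_path.write_text(FAKE_CATALOGUE[name], encoding="utf-8")

    # Import the freshly written file as a module, then pull the class out of it.
    spec = importlib.util.spec_from_file_location(name.lower(), module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, name)


with tempfile.TemporaryDirectory() as tmp:
    EchoReader = fetch_loader("EchoReader", Path(tmp))
    loaded = EchoReader().load_data("hello")
```

The point is only that the helper hands you back a class, which you then instantiate and call load_data on, just like in the snippets that follow.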

SimpleDirectoryReader:

This is a universal data connector available in Llama-Index that should be able to handle pretty much any file type you throw at it. Under the hood, it uses various third-party libraries, like PyPDF2 for PDFs and python-pptx for PowerPoint files. The general idea behind this data connector is to read either a whole directory of mixed file types or individual files, no matter what kind of file each one is.

SimpleDirectoryReader = download_loader("SimpleDirectoryReader")
documents = SimpleDirectoryReader(input_dir='foo/').load_data()
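
To make the "one reader, many file types" idea a bit more concrete, here is a minimal standard-library sketch of dispatching on the file extension. The Doc class, the PARSERS table and read_directory are hypothetical names of mine, not Llama-Index APIs, and this version only knows plain-text formats and simply skips anything else.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def read_plain_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")


# Extension -> parser. A fuller connector would register PDF, PPTX, ... parsers here.
PARSERS = {".txt": read_plain_text, ".md": read_plain_text}


def read_directory(input_dir: str) -> List[Doc]:
    """Return one Doc per file with a known extension; skip everything else."""
    docs = []
    for path in sorted(Path(input_dir).rglob("*")):
        parser = PARSERS.get(path.suffix.lower())
        if path.is_file() and parser is not None:
            docs.append(Doc(parser(path), {"file_name": path.name}))
    return docs


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "notes.txt").write_text("hello", encoding="utf-8")
    (Path(tmp) / "image.bin").write_bytes(b"\x00\x01")  # unknown type, skipped
    docs = read_directory(tmp)
```

Supporting a new file type in a design like this is just one more entry in the parser table.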

SimpleWebPageReader:

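This is the web equivalent of the SimpleDirectoryReader. It fetches the pages at the URLs you give it and returns their content as documents. If you specifically want BeautifulSoup4-based HTML parsing, that is handled by the separate BeautifulSoupWebReader loader.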

SimpleWebPageReader = download_loader("SimpleWebPageReader")
documents = SimpleWebPageReader().load_data(urls=['https://foo.com/index'])
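
Whatever library does the heavy lifting, a web page reader eventually has to turn HTML into text an LLM can read. Here is a rough standard-library sketch of that step. TextExtractor and html_to_text are my own illustrative names, it works on an HTML string rather than a live URL, and the real loader may well handle this differently.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p>"
    "<script>var x = 1;</script></body></html>"
)
page_text = html_to_text(sample)
```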

YoutubeTranscriptReader:

This loader will attempt to get a transcript for any YouTube video that you provide. It first looks for available captions using the youtube-transcript-api, and if no captions are available, it uses OpenAI's Whisper API to transcribe the video.

YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")
documents = YoutubeTranscriptReader().load_data(ytlinks=['https://youtu.be/5MuIMqhT8DM'])
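
The "captions first, speech-to-text as a fallback" flow described above can be sketched roughly like this. Note that fetch_captions and transcribe_audio are hypothetical callables you would have to supply yourself (here they are faked with a dict and a lambda), so this shows the control flow only, not the loader's actual code.

```python
from typing import Callable, Optional
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> str:
    """Extract the video ID from youtu.be/<id> or youtube.com/watch?v=<id> links."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    query = parse_qs(parsed.query)
    if "v" in query:
        return query["v"][0]
    raise ValueError(f"Could not find a video ID in {url!r}")


def get_transcript(
    url: str,
    fetch_captions: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
) -> str:
    """Prefer existing captions; only fall back to transcription when there are none."""
    video_id = video_id_from_url(url)
    captions = fetch_captions(video_id)
    if captions is not None:
        return captions
    return transcribe_audio(video_id)


# Fake backends for demonstration purposes only.
fake_captions = {"5MuIMqhT8DM": "caption text"}


def fake_transcribe(video_id: str) -> str:
    return f"audio transcript for {video_id}"


with_captions = get_transcript(
    "https://youtu.be/5MuIMqhT8DM", fake_captions.get, fake_transcribe
)
without_captions = get_transcript(
    "https://www.youtube.com/watch?v=abc123", fake_captions.get, fake_transcribe
)
```

Trying captions first makes sense because they are already text, while transcribing audio is the slower and costlier path.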

The above are some of the loaders that I think have a lot of use cases, but there are more, and you can find the complete list here. If you want to load the documents into LangChain's Document format, you can use <Loader>.load_langchain_documents() instead of load_data; it will output the documents in the LangChain-supported format.
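
Conceptually, switching formats is mostly a field-renaming exercise: LangChain documents carry their text in page_content alongside a metadata dict. Here is a small sketch of that mapping. Both dataclasses are hypothetical stand-ins I wrote for illustration (including the extra_info field name), not the real classes from either library.

```python
from dataclasses import dataclass, field


@dataclass
class LoaderDoc:
    """Hypothetical stand-in for a document returned by load_data."""
    text: str
    extra_info: dict = field(default_factory=dict)


@dataclass
class LangchainStyleDoc:
    """Hypothetical stand-in mirroring LangChain's page_content/metadata shape."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def to_langchain_style(doc: LoaderDoc) -> LangchainStyleDoc:
    # Copy the metadata so later edits to one object do not leak into the other.
    return LangchainStyleDoc(page_content=doc.text, metadata=dict(doc.extra_info))


converted = to_langchain_style(LoaderDoc("hello", {"source": "foo/notes.txt"}))
```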

Once you load the documents, it is a cakewalk from there. You can visit the earlier posts in this series to see how you could use them as context for the ChatGPT API.

That's about it for LlamaHub. See ya in the next one 😉
