<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Namee</title>
    <description>The latest articles on DEV Community by Namee (@noberst).</description>
    <link>https://dev.to/noberst</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1161319%2F58d313f5-ca44-4c89-99d4-d0f859fab4ed.png</url>
      <title>DEV Community: Namee</title>
      <link>https://dev.to/noberst</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/noberst"/>
    <language>en</language>
    <item>
      <title>How I Combined Small Language Models to Automate Fact-Based Workflows like Financial Research</title>
      <dc:creator>Namee</dc:creator>
      <pubDate>Mon, 29 Apr 2024 14:15:16 +0000</pubDate>
      <link>https://dev.to/llmware/how-i-combine-small-language-models-to-automate-fact-based-workflow-like-financial-research-29np</link>
      <guid>https://dev.to/llmware/how-i-combine-small-language-models-to-automate-fact-based-workflow-like-financial-research-29np</guid>
      <description>&lt;p&gt;More and more people are recognizing that small language models can provide great, on par results when used in specific workflows. &lt;/p&gt;

&lt;p&gt;Clem Delangue, CEO of Hugging Face, even suggested in a recent VentureBeat article that up to 99% of use cases could be addressed using SLMs.&lt;/p&gt;




&lt;h1&gt;
  
  
  The next wave of Gen AI will be automating workflows
&lt;/h1&gt;

&lt;p&gt;While chatbots are one of the great use cases for large language models, the next truly groundbreaking use case for Gen AI in 2024 will be automating workflows.&lt;/p&gt;

&lt;p&gt;Of course, if you are an AI expert reading this, the idea of chaining together various AI agent workflows with prompts is not new. However, what is new, and still largely unexplored, is the concept of multi-model agentic workflows built with small language models (SLMs).&lt;/p&gt;




&lt;h2&gt;
  
  
  What are Small Language Models?
&lt;/h2&gt;

&lt;p&gt;The definition of an SLM varies depending on who you ask.&lt;/p&gt;

&lt;p&gt;In my opinion, an SLM is a model that can run fairly well without a GPU. That may sound simplistic, but it is my general rule, and it means that models of 7 billion parameters and under fit the definition today.&lt;/p&gt;

&lt;p&gt;As of today, models larger than 7B tend to run excruciatingly slowly without a GPU, even when quantized.&lt;/p&gt;
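&lt;p&gt;As a rough illustration of why that 7B cutoff matters, here is some back-of-the-envelope arithmetic for the weight memory a 7B-parameter model needs at common precisions (illustrative numbers only, not benchmarks):&lt;/p&gt;

```python
# Approximate weight memory for a 7B-parameter model at common precisions.
# Illustrative arithmetic only -- real footprints also include activations,
# KV cache, and runtime overhead.
PARAMS = 7_000_000_000

bytes_per_param = {
    "fp16": 2.0,   # half-precision weights
    "int8": 1.0,   # 8-bit quantization
    "int4": 0.5,   # 4-bit quantization (e.g., GGUF Q4 variants)
}

for precision, nbytes in bytes_per_param.items():
    gb = PARAMS * nbytes / (1024 ** 3)
    print(f"{precision}: ~{gb:.1f} GB of weights")
```

Even at 4-bit, the weights of a 7B model take roughly 3-4 GB, which fits comfortably in laptop RAM; double the parameter count and quantized inference on CPU starts to crawl.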




&lt;h2&gt;
  
  
  Why use Small Language Models?
&lt;/h2&gt;

&lt;p&gt;There are many reasons to use SLMs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the most obvious - you don't need GPUs;&lt;/li&gt;
&lt;li&gt;they can easily be run on-prem, in a private cloud, on a laptop, or on edge devices;&lt;/li&gt;
&lt;li&gt;they are more targeted and focused in the scope of their training, and much easier to fine-tune; and&lt;/li&gt;
&lt;li&gt;they are easier to keep track of and audit.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I could honestly add many more, but I will stop here because these are probably the most important ones.&lt;/p&gt;

&lt;p&gt;(I will add the caveat here that I am the founder of LLMWare, an open-source project providing an LLM-based application framework and over 60 SLMs on Hugging Face, and that the example to follow uses LLMWare.)&lt;/p&gt;




&lt;p&gt;Here is a full end-to-end example of stacking 3 popular small language models (SLIM Extract, SLIM Summary, and BLING StableLM 3B) together with 2 popular web services (Yahoo Finance and Wikipedia) to complete a financial research assignment with 30 different information keys, all on your laptop.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Example Use Case: Combining 3 Different Models and 2 Web Services for Complex Financial Research
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Extracting key information from the source text using models.&lt;/li&gt;
&lt;li&gt;Performing secondary lookups with the extracted information, using web services like YFinance for stock data and Wikipedia for company background information.&lt;/li&gt;
&lt;li&gt;Summarizing and structuring the extracted information into a comprehensive dictionary.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Models Used:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;slim-extract-tool&lt;/li&gt;
&lt;li&gt;slim-summary-tool&lt;/li&gt;
&lt;li&gt;bling-stablelm-3b-tool&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Web Services Used:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;YFinance for stock ticker information&lt;/li&gt;
&lt;li&gt;Wikipedia for company background information&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setup and Imports
&lt;/h2&gt;

&lt;p&gt;First, let's import the necessary libraries and modules for our analysis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from llmware.util import YFinance
from llmware.models import ModelCatalog
from llmware.parsers import WikiParser
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Input Data
&lt;/h2&gt;

&lt;p&gt;Our input for this example is a financial news article about NIKE, Inc. We will extract and analyze information from this text.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text = ("_BEAVERTON, Ore.--(BUSINESS WIRE)--NIKE, Inc. (NYSE:NKE) today reported fiscal 2024 financial results for its "
        "third quarter ended February 29, 2024.) “We are making the necessary adjustments to drive NIKE’s next chapter "
        "of growth Post this Third quarter revenues were slightly up on both a reported and currency-neutral basis* "
        "at $12.4 billion NIKE Direct revenues were $5.4 billion, slightly up on a reported and currency-neutral basis "
        "NIKE Brand Digital sales decreased 3 percent on a reported basis and 4 percent on a currency-neutral basis "
        "Wholesale revenues were $6.6 billion, up 3 percent on a reported and currency-neutral basis Gross margin "
        "increased 150 basis points to 44.8 percent, including a detriment of 50 basis points due to restructuring charges "
        "Selling and administrative expense increased 7 percent to $4.2 billion, including $340 million of restructuring "
        "charges Diluted earnings per share was $0.77, including $0.21 of restructuring charges. Excluding these "
        "charges, Diluted earnings per share would have been $0.98* “We are making the necessary adjustments to "
        "drive NIKE’s next chapter of growth,” said John Donahoe, President &amp;amp; CEO, NIKE, Inc. “We’re encouraged by "
        "the progress we’ve seen, as we build a multiyear cycle of new innovation, sharpen our brand storytelling and "
        "work with our wholesale partners to elevate and grow the marketplace_.")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 1: Extract Information from Source Text
&lt;/h2&gt;

&lt;p&gt;We begin by loading the models and extracting key information from the source text. The keys we are interested in include the stock ticker, company name, total revenues, and more.&lt;/p&gt;

&lt;p&gt;Load the models:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelCatalog&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slim-extract-tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelCatalog&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slim-summary-tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelCatalog&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bling-stablelm-3b-tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;research_summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="c1"&gt;# Extract information
&lt;/span&gt;&lt;span class="n"&gt;extract_keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stock ticker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total revenues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restructuring charges&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;digital growth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ceo comment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quarter end date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;extract_keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;dict_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dict_key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;dict_key&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;research_summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;dict_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
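&lt;p&gt;To make the shape of the data concrete: the slim-extract function call returns a dictionary whose &lt;code&gt;llm_response&lt;/code&gt; maps each underscore-normalized key to a list of extracted values. The sketch below mocks the model (the &lt;code&gt;MockModel&lt;/code&gt; class and its outputs are stand-ins, not the real slim-extract-tool) so you can see how the loop builds &lt;code&gt;research_summary&lt;/code&gt;:&lt;/p&gt;

```python
# Illustrative mock of the extraction loop above. MockModel is a stand-in
# for the real slim-extract-tool; only the response shape is faithful:
# {"llm_response": {<key_with_underscores>: [value, ...]}}
class MockModel:
    def function_call(self, text, params=None):
        key = params[0].replace(" ", "_")
        # pretend the model extracted one value for the requested key
        return {"llm_response": {key: [f"<extracted {params[0]}>"]}}

model = MockModel()
research_summary = {}

for key in ["stock ticker", "company name"]:
    response = model.function_call("...source text...", params=[key])
    dict_key = key.replace(" ", "_")   # "stock ticker" -> "stock_ticker"
    if dict_key in response["llm_response"]:
        research_summary[dict_key] = response["llm_response"][dict_key][0]

print(research_summary)
# {'stock_ticker': '<extracted stock ticker>', 'company_name': '<extracted company name>'}
```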



&lt;h2&gt;
  
  
  Step 2: Secondary Lookups Using Extracted Information
&lt;/h2&gt;

&lt;p&gt;With the extracted information, we perform secondary lookups using the YFinance web service to enrich our data with stock information, financial summaries, and company details.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stock_ticker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;research_summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;ticker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;research_summary&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stock_ticker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;ticker_core&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Adjusting ticker format if needed
&lt;/span&gt;
    &lt;span class="c1"&gt;# Fetch stock summary information from YFinance
&lt;/span&gt;    &lt;span class="n"&gt;yf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;YFinance&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get_stock_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ticker_core&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yahoo finance stock info: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;yf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Update research summary with financial data from YFinance
&lt;/span&gt;    &lt;span class="n"&gt;financial_keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current_stock_price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high_ltm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low_ltm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trailing_pe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forward_pe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;volume&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;financial_keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;research_summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;yf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;

    &lt;span class="c1"&gt;# Fetch detailed financial summary
&lt;/span&gt;    &lt;span class="n"&gt;yf2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;YFinance&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get_financial_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ticker_core&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yahoo finance financial info - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;yf2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;market_cap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price_to_sales&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;revenue_growth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ebitda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gross_margin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;currency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;research_summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;yf2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
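&lt;p&gt;One small detail worth calling out: the extracted ticker may arrive in &lt;code&gt;EXCHANGE:SYMBOL&lt;/code&gt; form (the NIKE press release writes it as &lt;code&gt;NYSE:NKE&lt;/code&gt;), so taking the last colon-separated segment normalizes it before the YFinance lookup:&lt;/p&gt;

```python
# Normalizing the ticker: split(":")[-1] handles both "NYSE:NKE" and a
# bare "NKE", returning the plain symbol YFinance expects.
for raw in ["NYSE:NKE", "NKE"]:
    print(raw.split(":")[-1])  # -> "NKE" in both cases
```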



&lt;h2&gt;
  
  
  Step 3: Use Extracted Company Name for Wikipedia Lookup
&lt;/h2&gt;

&lt;p&gt;Next, we use the extracted company name to fetch background information from Wikipedia. This includes a company overview, founding date, and other relevant details.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;research_summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;company_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;research_summary&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;wiki_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WikiParser&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;add_wiki_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract and summarize company overview from Wikipedia
&lt;/span&gt;    &lt;span class="n"&gt;company_overview&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;wiki_output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blocks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company_overview&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company history (5)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;research_summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract founding date and company description
&lt;/span&gt;    &lt;span class="n"&gt;founding_date_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company_overview&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;founding date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;company_description_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company_overview&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;research_summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;founding_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;founding_date_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;founding_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;company_description_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# Direct questions to the model about the company's business and products
&lt;/span&gt;    &lt;span class="n"&gt;business_overview_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is an overview of company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s business?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;company_overview&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;origin_of_name_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the origin of the company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s name?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;company_overview&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;products_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the product names&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;company_overview&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;research_summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;business_overview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;business_overview_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;origin_of_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;origin_of_name_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;products_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
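&lt;p&gt;The Wikipedia lookup returns its article text as a list of parsed blocks, and the overview passed to the models is just the text of the first three blocks joined together. A small mock (the block contents here are placeholders, not real parser output) shows the pattern:&lt;/p&gt;

```python
# Mock of the WikiParser output structure used above: a dict with a
# "blocks" list, each block carrying a "text" field. Only the first
# three blocks are kept for the company overview.
wiki_output = {
    "blocks": [
        {"text": "Nike, Inc. is an American athletic footwear company. "},
        {"text": "It was founded in 1964 as Blue Ribbon Sports. "},
        {"text": "It is headquartered near Beaverton, Oregon. "},
        {"text": "This fourth block is not included in the overview. "},
    ]
}

company_overview = "".join(block["text"] for block in wiki_output["blocks"][:3])
print(company_overview)
```

Capping the overview at three blocks keeps the context short enough for small models while preserving the lead of the article, which is where Wikipedia concentrates the facts we query (founding date, description, products).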



&lt;h2&gt;
  
  
  Step 4: Completed Research - Summary Output
&lt;/h2&gt;



&lt;p&gt;Finally, we display the structured research summary, which includes all the extracted and enriched information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Completed Research - Summary Output&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;item_counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;research_summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\t\t&lt;/span&gt;&lt;span class="s"&gt; -- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item_counter&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - &lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s"&gt; - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ljust&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ljust&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;item_counter&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is a video tutorial if you are more of a visual learner:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/l0jzsg1_Ik0"&gt;
&lt;/iframe&gt;
 &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;⭐️ Star LLMWare ⭐️&lt;/a&gt;
 &lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>python</category>
      <category>tutorial</category>
      <category>ai</category>
    </item>
    <item>
      <title>Long Context Windows in LLMs are Deceptive (Lost in the Middle problem)🧐</title>
      <dc:creator>Namee</dc:creator>
      <pubDate>Wed, 20 Mar 2024 21:20:17 +0000</pubDate>
      <link>https://dev.to/llmware/why-long-context-windows-for-llms-can-be-deceptive-lost-in-the-middle-problem-oj2</link>
      <guid>https://dev.to/llmware/why-long-context-windows-for-llms-can-be-deceptive-lost-in-the-middle-problem-oj2</guid>
      <description>&lt;p&gt;It seems like OpenAI and Anthropic have been in a battle of context windows for the better part of a year. &lt;/p&gt;

&lt;p&gt;In May of 2023, Anthropic breathlessly announced: "We've expanded Claude's context window from 9K to &lt;strong&gt;100K&lt;/strong&gt; tokens, corresponding to around 75,000 words!"&lt;/p&gt;

&lt;p&gt;(Note: 75,000 words is about 300 pages)&lt;/p&gt;

&lt;p&gt;Not to be outdone, OpenAI released its &lt;strong&gt;128K&lt;/strong&gt; context window in November 2023.&lt;/p&gt;

&lt;p&gt;Only to be outdone again by Anthropic's &lt;strong&gt;200K&lt;/strong&gt; context window in March 2024.&lt;/p&gt;

&lt;p&gt;Is this back and forth tennis match for context window sizes really necessary?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/sI41tjGpDvxzXRJq3Q/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/sI41tjGpDvxzXRJq3Q/giphy.gif"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What are context windows?
&lt;/h2&gt;

&lt;p&gt;A context window is the amount of text, measured in tokens (a token is roughly a word), that an LLM can process at one time when generating a response.&lt;/p&gt;
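&lt;p&gt;To make this concrete, here is a tiny Python sketch using the rough rule of thumb that a token is about 3/4 of a word (the helper names are made up for illustration, and this is not any model's real tokenizer):&lt;/p&gt;

```python
# Rough rule of thumb (an assumption, not any model's real tokenizer):
# ~4 tokens for every 3 words, i.e. about 0.75 words per token.

def estimate_tokens(text):
    """Approximate token count from word count."""
    return round(len(text.split()) * 4 / 3)

def fits_in_context(text, context_window_tokens):
    """Check whether the estimated token count fits the window."""
    return estimate_tokens(text) <= context_window_tokens

doc = "word " * 75_000  # about 75,000 words, i.e. roughly 300 pages
print(estimate_tokens(doc))            # ~100,000 tokens, matching Claude's 100K claim
print(fits_in_context(doc, 128_000))   # True
print(fits_in_context(doc, 9_000))     # False
```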

&lt;p&gt;People assume that the larger the context window, the more text that can be input to search, for example. &lt;/p&gt;

&lt;p&gt;However, long context windows in LLMs can be misleading, because many users assume that if the context window is big enough, you don't need RAG at all. &lt;/p&gt;




&lt;h2&gt;
  
  
  Lost in the Middle Problem
&lt;/h2&gt;

&lt;p&gt;Studies and experiments, however, have shown that LLMs with long context windows struggle when asked to find a specific fact or passage buried in the input.&lt;/p&gt;

&lt;p&gt;The most vivid illustration of this problem for me showed up in this YouTube video.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/mWrivekFZMM"&gt;
&lt;/iframe&gt;
 &lt;/p&gt;

&lt;p&gt;Here, the experimenter uses a context length of only 2k tokens (remember, GPT-4 Turbo has a 128K-token limit) to search for a simple sentence in the middle that reads: &lt;/p&gt;

&lt;p&gt;"Astrofield creates a normal understanding of non-celestial phenomena." &lt;/p&gt;

&lt;p&gt;And guess what? About &lt;strong&gt;2/3 of these models fail this test&lt;/strong&gt;! They literally can't find this sentence in only 2k tokens!&lt;/p&gt;
&lt;h2&gt;
  
  
  The Winners and Losers
&lt;/h2&gt;

&lt;p&gt;🏆 Here is the list of the models that &lt;strong&gt;passed&lt;/strong&gt; the 2k context window test: ChatGPT Turbo, Open Hermes 2.5 - Mistral 7B, Mistral 7b Instruct (passed once at 10:43 and failed once at 3:47), and Yi 34B Chat&lt;/p&gt;

&lt;p&gt;👎 Here is a list of the models that &lt;strong&gt;failed&lt;/strong&gt; the test: Mixtral 8x7B Instruct, Mistral Medium, Claude 2.0, GPT 4 Turbo, Gemini, Mistral 7B Instruct, Zephyr 7B Beta, PPIX 70B, Starling 7B - alpha, Llama 2 - 70B chat, Vicuna 33B and Mixtral 8x7B Instruct&lt;/p&gt;


&lt;h2&gt;
  
  
  Same Experiment with RAG
&lt;/h2&gt;

&lt;p&gt;Now a small disclaimer about me -- I am the founder of LLMWare, an open source project where we publish models on Hugging Face and build a platform for LLM-based workflows.&lt;/p&gt;

&lt;p&gt;I was inspired to recreate this experiment, so we made up a document of about 11,000 tokens (far more than the 2k) about astrophysics, added the queried sentence "Astrofield creates a normal understanding of non-celestial phenomena" somewhere in the middle of the document, and ran RAG on our LLMWare platform. &lt;/p&gt;
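&lt;p&gt;You can see why retrieval sidesteps the lost-in-the-middle problem with a toy, standard-library-only version of this experiment (an illustrative sketch with a naive keyword-overlap retriever, not the actual LLMWare pipeline):&lt;/p&gt;

```python
# Toy recreation (stdlib only): plant the "needle" sentence in the middle of
# ~3,200 words of filler, then use a naive keyword-overlap retriever to pull
# back just the chunk that contains it.

NEEDLE = "Astrofield creates a normal understanding of non-celestial phenomena."
filler = ["Stellar spectroscopy reveals the composition of distant stars."] * 400
document = " ".join(filler[:200] + [NEEDLE] + filler[200:])

def chunk(text, size=200):
    """Split the text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query, chunks):
    """Return the chunk sharing the most words with the query."""
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

best = retrieve("what does astrofield create", chunk(document))
print(NEEDLE in best)  # True: retrieval surfaces the buried sentence
```

Only the retrieved chunk, not the full document, then needs to fit in the model's context window, which is why even a small model can answer.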

&lt;p&gt;We then tried this against 3 models - LLMWare BLING Tiny Llama 1.1B, LLMWare DRAGON Yi-6b, and also the Zephyr 7B Beta (which had failed the test in the YT video).&lt;/p&gt;

&lt;p&gt;Here are some screenshots of the results. As you can see, with RAG and fine-tuning, even a 1.1B model can find the answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLMWare Bling Tiny Llama 1.1B:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Finds this content with no problem. 💯&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnorl6mkrfowupqrch6xu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnorl6mkrfowupqrch6xu.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLMWare Dragon Yi 6B:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Also finds this content with no problem. 💯&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo1jn19n09fx9ds7tgktu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo1jn19n09fx9ds7tgktu.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zephyr 7B Beta (not finetuned for RAG so a little more chatty but still finds it with RAG where it had failed before):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid2k0b8mcwdbb0lyym9d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid2k0b8mcwdbb0lyym9d.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The Lesson: Large Context Windows Are Ineffective for Fact-Based Search
&lt;/h2&gt;

&lt;p&gt;As our experiment shows, when coupled with the right RAG workflow, even a tiny 1.1B-parameter model can do a better job than GPT-4 Turbo at fact-based search. It is much better to use a small model with RAG than to rely on a large (or in this case, not even that large at just 2k tokens) context window alone.&lt;/p&gt;

&lt;p&gt;I hope this experiment underscored the importance of a good LLM-based workflow using RAG. If you want to learn about RAG, here is an article I wrote recently on Dev.to to help you get started.&lt;/p&gt;


&lt;div class="ltag__link"&gt;
  &lt;a href="/llmware" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__org__pic"&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F8208%2F4bf5768d-460d-460b-9ccc-a80499ca040e.png" alt="LLMWare"&gt;
      &lt;div class="ltag__link__user__pic"&gt;
        &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1161319%2F58d313f5-ca44-4c89-99d4-d0f859fab4ed.png" alt=""&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="/llmware/become-a-rag-professional-in-2024-go-from-beginner-to-expert-41mg" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;🔥 How to Learn RAG in 2024: Go from Beginner to Expert (Step by Step) 🚀&lt;/h2&gt;
      &lt;h3&gt;Namee for LLMWare ・ Mar 4&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#beginners&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#programming&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#python&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


&lt;p&gt;So the next time someone tries to impress you with just a long context window, look critically at the surrounding workflow to make sure you are getting the answer you want.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Explore LLMWare on GitHub ⭐️&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Please join our LLMWare community on discord to learn more about RAG and LLMs! &lt;a href="https://discord.gg/5mx42AGbHm" rel="noopener noreferrer"&gt;https://discord.gg/5mx42AGbHm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>productivity</category>
      <category>beginners</category>
    </item>
    <item>
      <title>🛠️How to Go from Software Engineer to AI Developer - What it means for YOU (Insider's View)🤖</title>
      <dc:creator>Namee</dc:creator>
      <pubDate>Sat, 09 Mar 2024 14:20:26 +0000</pubDate>
      <link>https://dev.to/noberst/go-from-software-to-ai-development-what-it-means-for-you-insiders-view-1fpk</link>
      <guid>https://dev.to/noberst/go-from-software-to-ai-development-what-it-means-for-you-insiders-view-1fpk</guid>
      <description>&lt;p&gt;I recently saw a lot of headlines that said Jensen Huang proclaims: 'Don't Teach Your Kids Programming.'  &lt;/p&gt;

&lt;p&gt;As we are hurtling toward an AI-dominant technocratic world, that was really shocking, so I dug in a little more. What Jensen Huang actually said was:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"It is our job to create computing technology such that  NOBODY has to program."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's unpack this because it is so easy to read only the headlines and panic without actually thinking about what this means for YOU, the software developer, for TODAY (because you need to work to buy food to eat TODAY). &lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/yoJC2K6rCzwNY2EngA/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/yoJC2K6rCzwNY2EngA/giphy.gif" width="500" height="313"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Software Development is a Completely Oversaturated Market
&lt;/h2&gt;

&lt;p&gt;I am the founder of an open source AI start-up called LLMWare. &lt;br&gt;
Every single day, without fail, I get at least one, if not three, emails or LinkedIn messages asking me if I need Software Developers. Name the region, price point, or skill set. &lt;/p&gt;

&lt;p&gt;Starting as low as $5, in every country imaginable, I have been pitched some general software development service.&lt;/p&gt;

&lt;p&gt;What have I not been pitched yet? Credible AI Developers. &lt;/p&gt;

&lt;p&gt;Don't get me wrong -- almost everyone says they are experts in AI. But when I have interviewed some of these "experts," it was clear to me that they truly didn't know much about AI at all. &lt;/p&gt;

&lt;p&gt;(By the way, prompt engineering does not make you an AI expert. Long prompts have already been discredited because they produce inconsistent results, so please don't waste time with "AI agents" that rely on absurdly long prompts.)&lt;/p&gt;

&lt;p&gt;But then I meet and talk to people everyday who want to build private Chatbots to incorporate into their companies to work with sensitive data. I hear all the time how clients can't find expert people to do real AI work. They are right - true AI experts are really hard to find.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/e5RffKbpWzmuZEFPUM/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/e5RffKbpWzmuZEFPUM/giphy.gif" width="440" height="248"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  There is a Huge Mismatch in Supply of Software vs. AI Developers
&lt;/h2&gt;

&lt;p&gt;So on the one hand, we have this mismatch of massive numbers of software developers who are looking for work (as is evidenced by the countless agencies who are driving down the price - supply and demand in action). Then on the other hand, I am talking to people who are asking for crazy amounts of money for even front-end developers IF they have some knowledge of AI!&lt;/p&gt;


&lt;h2&gt;
  
  
  You are Only a Hop, Skip and a Jump away from Changing Your Career
&lt;/h2&gt;

&lt;p&gt;And the craziest part of this from my perspective is that the journey from being a general Software Developer to an AI Developer is not that hard! It's not like you need to learn a special language or some other skill set. You just need PYTHON to get started!&lt;/p&gt;

&lt;p&gt;Moreover, there are so many resources out there to help you get started for FREE! We ourselves have put out a series of 7 videos to really teach you Retrieval Augmented Generation (RAG) so that you can learn the fundamental steps in a no BS way. Here is the first of these 7 videos to get started:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/uW3fElxcri4"&gt;
&lt;/iframe&gt;
 &lt;/p&gt;

&lt;p&gt;And here is an article on dev.to with more details: &lt;/p&gt;


&lt;div class="ltag__link"&gt;
  &lt;a href="/llmware" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__org__pic"&gt;
      &lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F8208%2F4bf5768d-460d-460b-9ccc-a80499ca040e.png" alt="LLMWare" width="430" height="435"&gt;
      &lt;div class="ltag__link__user__pic"&gt;
        &lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1161319%2F58d313f5-ca44-4c89-99d4-d0f859fab4ed.png" alt="" width="621" height="1344"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="/llmware/become-a-rag-professional-in-2024-go-from-beginner-to-expert-41mg" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;🔥 How to Learn RAG in 2024: Go from Beginner to Expert (Step by Step) 🚀&lt;/h2&gt;
      &lt;h3&gt;Namee for LLMWare ・ Mar 4&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#beginners&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#programming&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#python&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;





&lt;p&gt;And as with all things in life, these tools are going to help you change the trajectory of your life if you put in the work. Or, you can watch helplessly as your teammates and colleagues leap out of the gates with all this AI knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  You are 100% Capable of Becoming an AI Developer
&lt;/h2&gt;

&lt;p&gt;Even more mind-blowing: If you are smart and dedicated enough to be currently making a living as a software developer, I guarantee that you are smart and dedicated enough to learn a few more skills to become an AI developer.&lt;/p&gt;

&lt;p&gt;Does this mean that you are going to start making millions of dollars a year to train models for OpenAI? Nope. BUT, I promise you that if you get started and truly grapple with this, based on the fact that you already know how to code, you can start to pivot your career to something that is much more relevant and in demand (meaning more $$$) in the near future.&lt;/p&gt;

&lt;p&gt;So back to Jensen Huang's recent quote. Do you want to be a part of the movement that makes it possible for people to interact with AI without coding? &lt;/p&gt;

&lt;p&gt;If YES, please get started. Ask questions. Experiment on your own time. Make the investment. &lt;/p&gt;

&lt;p&gt;I am seriously rooting for you. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExamZqMDN0NnpjODZyYWNqbndnY3htaHJmeWdkNTd1NTdsdXQ2dG9ldCZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/3oEduLl7trWHEWdO5a/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExamZqMDN0NnpjODZyYWNqbndnY3htaHJmeWdkNTd1NTdsdXQ2dG9ldCZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/3oEduLl7trWHEWdO5a/giphy.gif" width="325" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please check out our Github and leave a star! &lt;a href="https://github.com/llmware-ai/llmware" rel="noopener noreferrer"&gt;https://github.com/llmware-ai/llmware&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow us on discord here: &lt;a href="https://discord.gg/MgRaZz2VAB" rel="noopener noreferrer"&gt;https://discord.gg/MgRaZz2VAB&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>python</category>
      <category>ai</category>
      <category>career</category>
    </item>
    <item>
      <title>🔥 How to Learn RAG in 2024: Go from Beginner to Expert (Step by Step) 🚀</title>
      <dc:creator>Namee</dc:creator>
      <pubDate>Mon, 04 Mar 2024 10:28:32 +0000</pubDate>
      <link>https://dev.to/llmware/become-a-rag-professional-in-2024-go-from-beginner-to-expert-41mg</link>
      <guid>https://dev.to/llmware/become-a-rag-professional-in-2024-go-from-beginner-to-expert-41mg</guid>
      <description>&lt;p&gt;Everyone seems to be worried about how AI can take away our jobs. &lt;/p&gt;

&lt;p&gt;But it is surprising how very few people have actually gotten into even the fundamental facets of working with AI models in a real practical setting.&lt;/p&gt;

&lt;p&gt;By now, most technical people have heard of RAG - Retrieval Augmented Generation. In simple terms, RAG is just a way to link documents or some knowledge source to AI models.&lt;/p&gt;

&lt;p&gt;Sounds easy enough if you're thinking of it with, let's say, 5 documents and ChatGPT. However, if you think about how anyone, or a company, would need to do this with thousands, tens of thousands, or millions of files, it is a different problem entirely.&lt;/p&gt;

&lt;p&gt;This is an issue that almost all companies have. That is why I am a huge advocate for everyone having at least a foundational understanding of what RAG is, because it is one of the fundamental pieces of knowledge you need to work with AI models.&lt;/p&gt;




&lt;h2&gt;
  
  
  Upskill Your AI Knowledge in 2024
&lt;/h2&gt;

&lt;p&gt;This is what inspired us at LLMWare to create free step-by-step videos on YouTube that teach you the foundational elements of RAG, so that in 7 short videos (ranging from 8 to 15 min each), you can learn RAG. Note: Basic Python is a prerequisite.&lt;/p&gt;

&lt;p&gt;As with anything in life, if you take it seriously, this can be the launching point of becoming an AI expert. Even if you are not interested in becoming an AI expert, knowing how all the pieces of RAG work will definitely serve you well, as many companies will be incorporating these workflows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Ready to get started? Introduction
&lt;/h2&gt;

&lt;p&gt;This Introduction to RAG video walks you through the basic components of RAG so you can start your AI journey with LLMWare.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/uW3fElxcri4"&gt;
&lt;/iframe&gt;
 &lt;/p&gt;




&lt;p&gt;&lt;strong&gt;1. Parsing, Text Chunking, Indexing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create your first library and get started with the basic steps. The documents in your library need to be parsed into a uniform format, separated into smaller pieces of text (chunking), and then indexed with all the metadata. This video will walk you through this step.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/2xDefZ4oBOM"&gt;
&lt;/iframe&gt;
 &lt;/p&gt;
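&lt;p&gt;As a rough illustration of the chunking step above (a simplified sketch with made-up names, not LLMWare's actual parser), overlapping word-based chunks with metadata can look like this in plain Python:&lt;/p&gt;

```python
# A simplified sketch of the chunking step (illustrative names, not
# LLMWare's parser): overlapping word windows, each carrying metadata
# so the indexed chunk can be traced back to its position.

def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping word chunks with simple metadata."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(words) - overlap, 1), step):
        chunks.append({
            "chunk_id": len(chunks),
            "start_word": start,
            "text": " ".join(words[start:start + chunk_size]),
        })
    return chunks

chunks = chunk_text("word " * 120, chunk_size=50, overlap=10)
print(len(chunks))               # 3 overlapping chunks cover the 120 words
print(chunks[1]["start_word"])   # 40: each chunk repeats the last 10 words
```

The overlap is a common design choice: it keeps a sentence that straddles a chunk boundary fully visible in at least one chunk.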




&lt;p&gt;&lt;strong&gt;2. Build Embeddings&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What are embeddings, embedding models, vectors and vector databases? In this tutorial, you will learn the fundamental concepts behind each of them.&lt;/p&gt;

&lt;p&gt;You will build your first embeddings with models from Hugging Face to store to a database and use these embeddings to run your queries.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/xQEk6ohvfV0"&gt;
&lt;/iframe&gt;
 &lt;/p&gt;
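&lt;p&gt;To build intuition before the video, here is the core idea of embeddings in miniature -- a toy bag-of-words model (not a real neural embedding model, and the vocabulary is made up for illustration), where similar texts end up closer together under cosine similarity:&lt;/p&gt;

```python
import math

def embed(text, vocabulary):
    """Map text to a vector of word counts over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, 0.0 for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vocab = ["contract", "payment", "star", "galaxy", "clause"]
legal = embed("the contract clause requires payment", vocab)
space = embed("the galaxy contains a bright star", vocab)
query = embed("which clause covers payment", vocab)

# The legal sentence clusters with the legal query; the space one does not.
print(cosine(query, legal) > cosine(query, space))  # True
```

Real embedding models replace the word counts with learned dense vectors, and a vector database replaces the brute-force comparison, but the query-by-similarity idea is the same.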




&lt;p&gt;&lt;strong&gt;3. Prompt Models&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Learn how to prompt models by loading and prompting models from Hugging Face and OpenAI. &lt;/p&gt;

&lt;p&gt;Start inferencing models using this example and see how you can check whether the model is providing the right answer based on the context that was passed. Learn how to capture the model's usage data, such as token consumption, the output, and the total processing time.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/swiu4oBVfbA"&gt;
&lt;/iframe&gt;
 &lt;/p&gt;




&lt;p&gt;&lt;strong&gt;4. RAG with Text Query&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With this video, you will start to search. We will take some form of knowledge in a library with embeddings and bring the pieces together with a model. &lt;/p&gt;

&lt;p&gt;Learn how to put together the right RAG strategy with a thoughtful retrieval and querying strategy combined with the right model to do the job.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/6oALi67HP7U"&gt;
&lt;/iframe&gt;
 &lt;/p&gt;
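&lt;p&gt;Under the hood, a text-query RAG step usually boils down to packing the top retrieved chunks into the prompt ahead of the question. Here is a generic sketch (the prompt template, function name, and example data are illustrative, not LLMWare's exact format):&lt;/p&gt;

```python
# Illustrative prompt assembly (the template and names are made up, not
# LLMWare's exact format): top-ranked chunks go into the context, followed
# by the user's question.

def build_rag_prompt(question, ranked_chunks, max_chunks=3):
    """Pack the top retrieved chunks into a grounded prompt."""
    context = "\n\n".join(ranked_chunks[:max_chunks])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = ["The term of the lease is 24 months.", "Rent is due on the first."]
prompt = build_rag_prompt("How long is the lease term?", chunks)
print("24 months" in prompt)  # True: the evidence travels with the question
```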




&lt;p&gt;&lt;strong&gt;5. RAG with Semantic Query&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By now, you're really making progress. You are now ready to start semantic searching. Also known as natural language querying, this is where we reap the benefits of embeddings and vector databases. &lt;/p&gt;

&lt;p&gt;You will be able to query your knowledge base using natural language to ask questions to derive answers from even the most complex legal documents.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/XT4kIXA9H3Q"&gt;
&lt;/iframe&gt;
 &lt;/p&gt;




&lt;p&gt;&lt;strong&gt;6. RAG with Multi-Step, Hybrid Query&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's Graduation Day! (If you have been following along this far.) We will quickly recap all that you have learned in the previous videos and learn how to use a quantized DRAGON-YI-6b-GGUF model from Hugging Face on a laptop.&lt;/p&gt;

&lt;p&gt;Perform multi-step hybrid queries to get the responses you need.&lt;/p&gt;

&lt;p&gt;Also learn how to perform evidence verification (guard against model hallucinations) and how to save all the output as JSON or CSV files for future datasets or audits.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/G1Q6Ar8THbo"&gt;
&lt;/iframe&gt;
 &lt;/p&gt;
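&lt;p&gt;The evidence-verification idea can be sketched in a few lines (a naive substring check with made-up example data, not LLMWare's actual fact-checking):&lt;/p&gt;

```python
import json

def verify_evidence(answer, context):
    """Naive hallucination guard: is the answer literally in the context?"""
    return answer.lower() in context.lower()

# Made-up example data for illustration.
context = "The base salary of the executive shall be $350,000 per year."
response = {"query": "What is the base salary?", "answer": "$350,000"}
response["verified"] = verify_evidence(response["answer"], context)

record = json.dumps(response)  # ready to write out for a future audit trail
print(response["verified"])    # True: the answer is supported by the context
```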

&lt;p&gt;Once you are all done with these videos, you are ready to dig in and play with all the really cool aspects of Generative AI, including building complex Agent workflows.&lt;/p&gt;

&lt;p&gt;Please check out our Github and leave a star! &lt;a href="https://github.com/llmware-ai/llmware" rel="noopener noreferrer"&gt;https://github.com/llmware-ai/llmware&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow us on discord here: &lt;a href="https://discord.gg/MgRaZz2VAB" rel="noopener noreferrer"&gt;https://discord.gg/MgRaZz2VAB&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>programming</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>Easy RAG and LLM tutorials to learn how to get started to become an "AI Developer" (Beginner Friendly)</title>
      <dc:creator>Namee</dc:creator>
      <pubDate>Mon, 29 Jan 2024 12:26:19 +0000</pubDate>
      <link>https://dev.to/llmware/how-to-get-started-to-become-an-ai-developer-beginner-friendly-4067</link>
      <guid>https://dev.to/llmware/how-to-get-started-to-become-an-ai-developer-beginner-friendly-4067</guid>
      <description>&lt;p&gt;We have all heard stories of "AI" experts making hundreds of thousands to millions of dollars each year. As an example, the current pay at OpenAI is reported to be a total compensation package of US$916,000 for L5 engineers. &lt;/p&gt;

&lt;p&gt;It is estimated that there are 30 million developers, 300,000 Machine Learning engineers, and only 30,000 ML researchers in the world. &lt;/p&gt;

&lt;p&gt;Does that mean that only 1% of the developers in the world are qualified to be AI Developers? Do you need to be an expert in ML to qualify as an "AI Developer"?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbud826iaiccgh4be8oh3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbud826iaiccgh4be8oh3.png" alt="GEN AI Developer"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  You do NOT need to be an ML Expert to be a GEN AI Developer
&lt;/h2&gt;

&lt;p&gt;In a broad sense, because of the prevalence of Gen AI, almost every developer will in some sense interact with and use Gen AI in their job. Think of all the code generation apps and image generation apps you may already be using today.&lt;/p&gt;

&lt;p&gt;Even if you only use LLMs like ChatGPT and OpenAI APIs to help generate content, when dealing with a large amount of data, you will still need to learn how to automate workflows and to retrieve the right data to send to the LLMs.&lt;/p&gt;

&lt;p&gt;I have heard the argument that all developers will soon be AI developers in a general sense. But it will be difficult to earn the pay rates that skilled AI Developers command if you are a mere consumer of AI, as opposed to a skilled developer who can create and enable AI use cases and workflows.&lt;/p&gt;

&lt;p&gt;Using resources available for FREE, you can start to learn the fundamentals of Generative AI so that you too can become a skilled AI Developer.&lt;/p&gt;

&lt;p&gt;To get started in your AI Developer journey, here are the first steps to help you on your way. 👇&lt;/p&gt;

&lt;p&gt;(Remember that Rome wasn't built in a day. You have to start digging in to learn anything, right?) &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2lc5w46g4asv5ldnp64v.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2lc5w46g4asv5ldnp64v.gif" alt="Believe to Achieve"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Learn About RAG
&lt;/h2&gt;

&lt;p&gt;Retrieval Augmented Generation (RAG) is one of the fundamental aspects of Generative AI.  RAG is basically a short-hand way of describing the workflow of linking documents or knowledge to LLMs.  While this is foundational to using Gen AI in many enterprise workflows, many developers have not yet had first-hand experience with it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLMWare: Easy lessons in RAG for AI Beginners&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A little about me -- I am the founder of LLMWare. LLMWare is a RAG platform that is designed for both beginners and experts alike. &lt;/p&gt;

&lt;p&gt;In 'FAST-START: Learning RAG with LLMWare through 6 Examples', novice AI developers will learn the fundamentals of RAG in Python by:&lt;/p&gt;

&lt;p&gt;1) Creating your first library;&lt;/p&gt;

&lt;p&gt;2) Building embeddings (an embedding is a way of representing data as points in n-dimensional space so that similar data points cluster together);&lt;/p&gt;

&lt;p&gt;3) Learning how to use prompts and models;&lt;/p&gt;

&lt;p&gt;4) Text query using RAG;&lt;/p&gt;

&lt;p&gt;5) Semantic query using RAG (natural language querying for retrieval); and&lt;/p&gt;

&lt;p&gt;6) RAG with multi-step queries.&lt;/p&gt;
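&lt;p&gt;To make the embedding and semantic-query steps above concrete, here is a from-scratch toy sketch -- this is not LLMWare code, and the bag-of-words "embeddings", vocabulary, and chunks are all made up for illustration. Real embedding models produce dense learned vectors, but the geometry (similar texts end up close together) is the same idea:&lt;/p&gt;

```python
import math

# Toy "embedding": a bag-of-words count vector over a tiny fixed vocabulary.
VOCAB = ["revenue", "profit", "cloud", "gpu", "training"]

def embed(text):
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

chunks = [
    "quarterly revenue and profit rose sharply",
    "gpu training clusters moved to the cloud",
]
index = [(embed(c), c) for c in chunks]  # a miniature "vector store"

# Semantic query: embed the question, retrieve the nearest chunk.
query = embed("how did profit and revenue change")
best = max(index, key=lambda pair: cosine(query, pair[0]))[1]
print(best)  # the financial chunk is the nearest neighbor
```

&lt;p&gt;The retrieved chunk is what then gets handed to an LLM as context -- that hand-off is the "augmented generation" half of RAG.&lt;/p&gt;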

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Explore LLMWare on GitHub ⭐️&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;In about an hour or two, you will have experimented with all of the basic components in RAG so that you will become familiar with how the pieces (like text parsing, chunking, indexing, vector databases, and LLMs) fit together in a very easy-to-use RAG workflow. &lt;/p&gt;
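&lt;p&gt;Of those pieces, chunking is the easiest to demystify: long text is split into overlapping windows so each piece fits in a model's context and nothing is lost at a boundary. Below is a minimal word-based sketch for illustration only -- real parsers chunk more carefully, by sentences, pages, or document layout:&lt;/p&gt;

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into windows of chunk_size words, overlapping by overlap words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(120))
pieces = chunk_text(doc, chunk_size=50, overlap=10)
print(len(pieces))  # 3 windows: words 0-49, 40-89, 80-119
```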

&lt;p&gt;Once you have passed the Fast-Start stage, there are over 50 'cut and paste' recipes to help you dive into the more advanced capabilities around RAG. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6exliu0pzveg7xh9g88b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6exliu0pzveg7xh9g88b.png" alt="RAG platform"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Learn about LLMs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Hugging Face Tutorials on LLMs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you have learned about RAG, you will need to learn about models.&lt;/p&gt;

&lt;p&gt;There are significant benefits to using a pre-trained model. It reduces computation costs and your carbon footprint, and allows you to use state-of-the-art models without having to train one from scratch. &lt;/p&gt;

&lt;p&gt;Transformers provides access to thousands of pretrained models for a wide range of tasks. When you use a pretrained model, you train it on a dataset specific to your task. This is known as fine-tuning, an incredibly powerful training technique. &lt;/p&gt;

&lt;p&gt;In these tutorials, you will learn to fine-tune a pretrained model with a deep learning framework of your choice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-tune a pretrained model with 🤗 Transformers Trainer.&lt;/li&gt;
&lt;li&gt;Fine-tune a pretrained model in TensorFlow with Keras.&lt;/li&gt;
&lt;li&gt;Fine-tune a pretrained model in native PyTorch.&lt;/li&gt;
&lt;/ul&gt;
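&lt;p&gt;The core idea behind fine-tuning -- start from weights that have already been trained, then keep training on your own task data -- can be shown without any framework at all. The one-parameter model below is a toy stand-in; real fine-tuning updates millions of weights via the Trainer, Keras, or PyTorch loops covered in the tutorials:&lt;/p&gt;

```python
# "Pretrained" weight: imagine it was learned earlier on a huge generic dataset.
w = 2.0

# Small task-specific dataset where the true relation is y = 3x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

def loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

loss_before = loss(w, data)

# Fine-tune: continue training with a few gradient-descent steps on task data.
lr = 0.01
for _ in range(100):
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

loss_after = loss(w, data)
print(round(w, 2), loss_after < loss_before)  # w moved from 2.0 toward 3.0
```

&lt;p&gt;The pretrained starting point is why fine-tuning is cheap: the weights only need a small nudge, not training from scratch.&lt;/p&gt;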

&lt;p&gt;&lt;a href="https://huggingface.co/docs/transformers/training" rel="noopener noreferrer"&gt;https://huggingface.co/docs/transformers/training&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Large Language Model Course by Maxime Labonne&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Maxime Labonne is currently a senior applied researcher at JPMorgan Chase who published a beloved open-source course that teaches you the true nuts and bolts of LLMs. It is a comprehensive, deep-dive course that starts with the mathematics of Machine Learning (linear algebra, calculus, and probability &amp;amp; statistics) and goes all the way to Natural Language Processing.&lt;/p&gt;

&lt;p&gt;His LLM course is divided into three parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM Fundamentals covers essential knowledge about mathematics, Python, and neural networks.&lt;/li&gt;
&lt;li&gt;The LLM Scientist focuses on building the best possible LLMs using the latest techniques.&lt;/li&gt;
&lt;li&gt;The LLM Engineer focuses on creating LLM-based applications and deploying them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although this course is not going to be mastered in an afternoon, it is definitely worth working through in detail if you want to truly take your AI skills to the next level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mlabonne/llm-course" rel="noopener noreferrer"&gt;https://github.com/mlabonne/llm-course&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi1pnx9dbx3r13gqr799l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi1pnx9dbx3r13gqr799l.png" alt="llm-course"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;I hope you can use these resources to fast track your way to becoming an AI Developer in 2024.&lt;/p&gt;

&lt;p&gt;Please join our LLMWare community on Discord! &lt;a href="https://discord.gg/5mx42AGbHm" rel="noopener noreferrer"&gt;https://discord.gg/5mx42AGbHm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>python</category>
      <category>beginners</category>
      <category>ai</category>
    </item>
    <item>
      <title>📈 Top 5 Beginner-Friendly Open Source Libraries for RAG 🚀</title>
      <dc:creator>Namee</dc:creator>
      <pubDate>Sun, 21 Jan 2024 15:53:55 +0000</pubDate>
      <link>https://dev.to/llmware/top-5-beginner-friendly-open-source-libraries-for-rag-1mhb</link>
      <guid>https://dev.to/llmware/top-5-beginner-friendly-open-source-libraries-for-rag-1mhb</guid>
      <description>&lt;h1&gt;
  
  
  Introduction to RAG
&lt;/h1&gt;

&lt;p&gt;One of the most valuable skills you can learn today as a developer is learning how to build Retrieval Augmented Generation (RAG) applications using Large Language Models (LLMs).&lt;/p&gt;

&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;Because there are over 64 zettabytes of data in the world, and this doesn't even include physical data like books and paper documents. (For your reference, 1 zettabyte is a trillion gigabytes.)&lt;/p&gt;
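&lt;p&gt;That conversion is easy to sanity-check:&lt;/p&gt;

```python
GIGABYTE = 10**9    # bytes
ZETTABYTE = 10**21  # bytes

# 1 zettabyte is a trillion (10**12) gigabytes
assert ZETTABYTE // GIGABYTE == 10**12

# 64 zettabytes expressed in gigabytes: 64 trillion GB
print(64 * ZETTABYTE // GIGABYTE)
```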

&lt;p&gt;Not only that, 90% of the world’s data was created in the last two years, and the volume of data is doubling every two years. So basically, companies are swimming in mountains of data that grow larger by the day.&lt;/p&gt;

&lt;p&gt;How will companies access and use all this data?&lt;/p&gt;

&lt;p&gt;Everyone has by now heard of using Retrieval Augmented Generation (RAG) to find information with AI. Being able to access and use the ever-growing volumes of data is a key skill that every company needs.&lt;/p&gt;

&lt;p&gt;Even if you know that RAG is basically a short-hand way of describing the workflow of linking documents or knowledge to LLMs, many developers have not tried or experimented with this themselves (yet).&lt;/p&gt;

&lt;p&gt;The internet is full of lists of libraries, but how do you get started? &lt;/p&gt;

&lt;p&gt;Here is a short list of the best libraries to help you start with RAG.&lt;/p&gt;




&lt;h1&gt;
  
  
  1. &lt;a href="https://github.com/llmware-ai/llmware" rel="noopener noreferrer"&gt;LLMWare&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;In LLMWare, you can upload documents, and with a few lines of code, start retrieving information. It handles the entire process that is required in RAG: document ingestion, parsing, chunking, indexing, embedding, storing to vector database, and linking to LLMs to retrieve the answer.&lt;/p&gt;

&lt;p&gt;LLMWare is designed to be integrated and end-to-end so all of these steps are accessible out of the box. It assembles all the pieces so you don't have to.&lt;/p&gt;

&lt;p&gt;LLMWare makes it very simple and easy to get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RAG workflow through end-to-end examples in just a few lines of code&lt;/li&gt;
&lt;li&gt;Create a library and load files seamlessly&lt;/li&gt;
&lt;li&gt;Generate embeddings effortlessly&lt;/li&gt;
&lt;li&gt;Conduct semantic searches with ease&lt;/li&gt;
&lt;li&gt;Utilize any Hugging Face model or a closed-source model like GPT-4 to answer questions from the data&lt;/li&gt;
&lt;li&gt;Examples include RAG with no-GPU-required models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Disclaimer: I am the founder of LLMWare.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;⭐️ Star LLMWare ⭐️&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdy22hfga9az7uol150ar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdy22hfga9az7uol150ar.png" alt="LLMWARE"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  2. &lt;a href="https://github.com/mongodb/mongo" rel="noopener noreferrer"&gt;MongoDB&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;MongoDB is a widely used, open-source NoSQL database program. It falls under the category of document-oriented databases, which means it stores and organizes data in a format similar to JSON documents. MongoDB is designed to be flexible and scalable, making it suitable for a variety of applications and industries.&lt;/p&gt;

&lt;p&gt;Databases like MongoDB are a very important part of RAG because they store the information, including key metadata, that is extracted from documents or knowledge bases before embeddings are built.&lt;/p&gt;
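&lt;p&gt;Concretely, what lands in the database is one record per text chunk, with metadata travelling alongside the text. The JSON-shaped record below is hypothetical -- the field names are illustrative, not LLMWare's or MongoDB's actual schema:&lt;/p&gt;

```python
# One document-oriented record per parsed chunk; a NoSQL store like MongoDB
# keeps exactly this kind of JSON-shaped structure.
chunk_record = {
    "doc_name": "annual_report_2023.pdf",  # hypothetical source file
    "page": 12,
    "chunk_id": 7,
    "text": "Revenue grew 14% year over year...",
    "embedding_status": "pending",  # embeddings are built later from this text
}

records = [chunk_record]

# Metadata makes retrieval filterable, e.g. only chunks from a given file.
hits = [r for r in records if r["doc_name"] == "annual_report_2023.pdf"]
print(len(hits), hits[0]["page"])
```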

&lt;p&gt;&lt;a href="https://github.com/mongodb/mongo" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;⭐️ Star MongoDB ⭐️&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fii6vl3tu41gfo6v3zt5g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fii6vl3tu41gfo6v3zt5g.png" alt="MongodbVector"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  3. &lt;a href="https://github.com/milvus-io/milvus" rel="noopener noreferrer"&gt;Milvus Vector DB&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment.&lt;/p&gt;

&lt;p&gt;Milvus, or a similar vector DB, is a crucial piece of RAG: it is where vector embeddings are stored for similarity search. A vector database lets people ask questions in natural language and retrieve related results. Without good embeddings and a vector DB, the LLMs will not receive the right chunks of text to read.&lt;/p&gt;
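&lt;p&gt;Under the hood, the core operation of a vector DB is nearest-neighbor search over stored embeddings. Here is a brute-force sketch with made-up three-dimensional vectors; engines like Milvus use approximate indexes (and much higher dimensions) to do this at scale:&lt;/p&gt;

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# The "vector DB": embedding vectors stored alongside their source chunks.
store = [
    ([0.9, 0.1, 0.0], "chunk about revenue"),
    ([0.1, 0.9, 0.0], "chunk about hiring"),
    ([0.0, 0.1, 0.9], "chunk about datacenters"),
]

def top_k(query_vec, k=2):
    """Rank every stored vector by similarity to the query; return the k best."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# A query embedding pointing mostly in the "revenue" direction.
print(top_k([1.0, 0.2, 0.1]))
```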

&lt;p&gt;&lt;a href="https://github.com/milvus-io/milvus" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;⭐️ Star Milvus ⭐️&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5t6jyfaidyfvxwd8nb7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5t6jyfaidyfvxwd8nb7.png" alt="Milvus DB"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  4. &lt;a href="https://github.com/huggingface" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;If you haven't visited Hugging Face, you really should. It is THE place to go for open-source models, and it is single-handedly saving the world from AI monopolies. Just as GitHub is the home of open-source projects, Hugging Face is the home of open-source models. There are over 450,000 models, all FREE, for anyone who wants to use them. &lt;/p&gt;

&lt;p&gt;Hugging Face's Transformers library is the go-to library that provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.&lt;/p&gt;

&lt;p&gt;These models can be applied on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.&lt;/li&gt;
&lt;li&gt;Images, for tasks like image classification, object detection, and segmentation.&lt;/li&gt;
&lt;li&gt;Audio, for tasks like speech recognition and audio classification.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Transformer models can also perform tasks on several modalities combined, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/huggingface" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;⭐️ Star Hugging Face ⭐️&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpn47bs3bd97eu4gulkzb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpn47bs3bd97eu4gulkzb.png" alt="Hugging Face"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  5. &lt;a href="https://github.com/ggerganov/llama.cpp" rel="noopener noreferrer"&gt;Llama.cpp&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;No GPU? No problem!&lt;br&gt;
Llama.cpp to the rescue!&lt;/p&gt;

&lt;p&gt;The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Plain C/C++ implementation without dependencies&lt;/li&gt;
&lt;li&gt;Apple silicon first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks&lt;/li&gt;
&lt;li&gt;AVX, AVX2 and AVX512 support for x86 architectures&lt;/li&gt;
&lt;li&gt;Mixed F16 / F32 precision&lt;/li&gt;
&lt;li&gt;2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit integer quantization support&lt;/li&gt;
&lt;li&gt;CUDA, Metal and OpenCL GPU backend support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once quantized, larger models can be run on CPUs with very little performance loss. Look for GGUF versions of models to try with LLMWare or other RAG workflows.&lt;/p&gt;
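&lt;p&gt;To see why low-bit quantization shrinks models so dramatically while losing so little, here is a toy round trip: map floats onto the 16 levels a 4-bit integer can hold, then back. llama.cpp's real schemes are block-wise and considerably cleverer; this only shows the principle:&lt;/p&gt;

```python
def quantize_4bit(values):
    """Map floats to 4-bit integers (0..15) with a shared scale and offset."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0  # guard against all-equal inputs
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo

def dequantize(q, scale, lo):
    return [i * scale + lo for i in q]

weights = [-0.8, -0.3, 0.0, 0.45, 1.2]  # pretend model weights
q, scale, lo = quantize_4bit(weights)

restored = dequantize(q, scale, lo)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)                  # integers that fit in 4 bits each
print(round(max_err, 3))  # reconstruction error is at most scale / 2
```

&lt;p&gt;Each weight now needs 4 bits instead of 32 -- an 8x reduction -- at the cost of a small, bounded rounding error.&lt;/p&gt;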

&lt;p&gt;&lt;a href="https://github.com/ggerganov/llama.cpp" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;⭐️ Star Llama.cpp ⭐️&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvev1gzen7oolvnug4vfl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvev1gzen7oolvnug4vfl.png" alt="Llama.cpp"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;This is a VERY basic overview to get you started with RAG. If you want an integrated solution that is a one-stop shop for all of these libraries seamlessly working together, visit LLMWare's GitHub library to find over 50 great examples to help you get started.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://discord.gg/fCztJQeV7J" rel="noopener noreferrer"&gt;Find us in discord&lt;/a&gt; - we would love to hear from you!&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>productivity</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
