<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Julia Zhou</title>
    <description>The latest articles on DEV Community by Julia Zhou (@jjuliazhou).</description>
    <link>https://dev.to/jjuliazhou</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1768232%2F1e58f87b-0a12-46be-b502-95b8ed0778d4.jpeg</url>
      <title>DEV Community: Julia Zhou</title>
      <link>https://dev.to/jjuliazhou</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jjuliazhou"/>
    <language>en</language>
    <item>
      <title>How I Learned Generative AI in Two Weeks (and You Can Too): Part 3 - Prompts &amp; Models</title>
      <dc:creator>Julia Zhou</dc:creator>
      <pubDate>Wed, 14 May 2025 12:05:49 +0000</pubDate>
      <link>https://dev.to/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-3-prompts-models-dd7</link>
      <guid>https://dev.to/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-3-prompts-models-dd7</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;It's been a few months since the last iteration in this series, but new year, more LLMWare Fast Start to RAG examples! In the previous articles, we covered creating libraries and transforming this information into embeddings. Now that we have done the heavy lifting, so to speak, we are ready to begin writing prompts and getting responses. This will be the focus of today's article.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extra resources
&lt;/h2&gt;

&lt;p&gt;A few notes before we start! In case you missed them, I will link the previous articles in this series. This example will build upon &lt;a href="https://dev.to/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-1-libraries-215h"&gt;example 1&lt;/a&gt; and &lt;a href="https://dev.to/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-2-embeddings-2ppc"&gt;example 2&lt;/a&gt; and will assume prior understanding of these topics.  &lt;/p&gt;

&lt;p&gt;For visual learners, here is a video that works through example 3. Feel free to watch the video before following the steps in this article. Also, here is a Python Notebook that breaks down this example's code alongside the output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware/blob/cbed79d0c185ab2626a2c53fe20c262734d4e7f5/examples/Notebooks/fast_start_examples/example_3_prompts_and_models_version_1.ipynb" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Notebook for example 3: prompts and models&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/swiu4oBVfbA"&gt;
  &lt;/iframe&gt;
 &lt;/p&gt;
&lt;h2&gt;
  
  
  The code
&lt;/h2&gt;

&lt;p&gt;Now, we are ready to take a look at the example's code! This LLMWare Faststart example can be run in the same way as the previous ones, but instructions can be found in our &lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/README.md" rel="noopener noreferrer"&gt;README file&lt;/a&gt; if needed. Example 3 is directly copy-paste ready!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/rag/example-3-prompts_and_models.py" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Code for example 3: prompts and models&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1 - What are prompts?
&lt;/h2&gt;

&lt;p&gt;While working through this example, I read an MIT Sloan Teaching &amp;amp; Learning Technologies article titled "Effective Prompts for AI: The Essentials". The entire article is definitely worth a read, but I wanted to share a quote that summarizes what &lt;strong&gt;prompts&lt;/strong&gt; are in the AI world. To read the whole article, check out &lt;a href="https://mitsloanedtech.mit.edu/ai/basics/effective-prompts/" rel="noopener noreferrer"&gt;this link&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Prompts are your input into the AI system to obtain specific results. In other words, prompts are conversation starters: what and how you tell something to the AI for it to respond in a way that generates useful responses for you ... It’s like having a conversation with another person, only in this case the conversation is text-based, and your interlocutor is AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In other words, the prompt you provide determines how the AI responds. To create the most effective prompts, use specific wording and consider providing context, including in the form of additional text paragraphs. &lt;/p&gt;
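&lt;p&gt;As a rough illustration (the helper below is hypothetical, not part of LLMWare or any library), a context-grounded prompt can be assembled by pairing a specific question with its supporting passage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def build_prompt(question, context):
    """Hypothetical helper: pair a specific question with supporting context."""
    return ("Please read the following passage and answer the question.\n"
            "Passage: " + context + "\n"
            "Question: " + question)

prompt = build_prompt("What is the total amount of the invoice?",
                      "Invoice #1042. Total due upon receipt: $22,500.00.")
print(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;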

&lt;h2&gt;
  
  
  Part 2 - Which model should I use?
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;model catalog&lt;/strong&gt; is a list of all the models LLMWare has registered. Like a dictionary, each model in the catalog is automatically linked with configuration data and implementation classes for easy use. The goal of the catalog is exactly that: ease of use. Given only its name, any model present in the catalog can be loaded and run without any other information. &lt;/p&gt;

&lt;p&gt;The following lines of code provide lists of models included in the catalog. More information about the capabilities and performances of these models is included as comments in the Python code file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#   all generative models
llm_models = ModelCatalog().list_generative_models()

#   if you only want to see the local models
llm_local_models = ModelCatalog().list_generative_local_models()

#   to see only the open source models
llm_open_source_models = ModelCatalog().list_open_source_models()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
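&lt;p&gt;For a sense of what these lists look like, here is a self-contained sketch that assumes each catalog entry is a dictionary carrying at least a "model_name" key (the entries below are mock data for illustration, not the real catalog):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#   mock catalog entries (illustrative only - not the real LLMWare catalog)
llm_models = [
    {"model_name": "bling-1b-0.1", "model_category": "generative_local"},
    {"model_name": "gpt-4", "model_category": "generative-api"},
]

#   pull out just the model names for a quick overview
model_names = [entry["model_name"] for entry in llm_models]
print(model_names)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;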



&lt;p&gt;The following line of code selects a model by index. To choose a different model, simply replace the index value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model_name = gguf_generative_models[0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, we can choose a specific model by name. For those interested in exploring RAG through &lt;strong&gt;OpenAI&lt;/strong&gt;, all of the LLMWare examples are ready to use. In this particular example, uncomment the following lines and insert the necessary information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#   model_name = "gpt-4"
#   os.environ["USER_MANAGED_OPENAI_API_KEY"] = "&amp;lt;insert-your-openai-key&amp;gt;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, these examples also encourage the use of &lt;strong&gt;open-source&lt;/strong&gt; models. These are locally deployed models that produce top-notch quality right on your laptop. The progress in open-source models over the past few years cannot be overstated. The future of AI is here in these small, specialized models optimized for specific purposes. &lt;/p&gt;

&lt;p&gt;For example, LLMWare's &lt;strong&gt;Bling 1B&lt;/strong&gt; is a small, fast model fine-tuned for RAG that runs on your local machine. &lt;/p&gt;

&lt;p&gt;To learn more about LLMWare's &lt;strong&gt;Bling&lt;/strong&gt; and &lt;strong&gt;Dragon&lt;/strong&gt; models, consider visiting their &lt;a href="https://huggingface.co/models?other=llmware-rag" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; page!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExbjR3NmsxcnR0am1jN3BxNzdoM3hqMGF1ZnU1cTZmOXF5amIxOGU1cSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2FHzPtbOKyBoBFsK4hyc%2Fgiphy.gif" class="article-body-image-wrapper"&gt;&lt;img width="50%" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExbjR3NmsxcnR0am1jN3BxNzdoM3hqMGF1ZnU1cTZmOXF5amIxOGU1cSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2FHzPtbOKyBoBFsK4hyc%2Fgiphy.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3 - Main example script
&lt;/h2&gt;

&lt;p&gt;Now, we can head to the main example script, &lt;code&gt;fast_start_prompting&lt;/code&gt;. We will follow four general steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pull sample questions&lt;/li&gt;
&lt;li&gt;Load the model&lt;/li&gt;
&lt;li&gt;Prompt the model&lt;/li&gt;
&lt;li&gt;Get results&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The sample questions (each with query, answer, and context) are found at the top of the Python file. They cover a variety of fields with a little extra emphasis on business, financial, and legal applications. However, it is always encouraged to change these questions or add to them to better suit your interests and needs! All of the questions will be pulled in through the &lt;code&gt;test_list&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;To use the model, we create a &lt;strong&gt;prompt object&lt;/strong&gt;. The prompt object is our interface to the model: when we have a question and its context, we pass them through the prompt object to receive a response. This line of code loads the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prompter = Prompt().load_model(model_name)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first time we load the model, it needs to "move" from the LLMWare Hugging Face repository to your local system, which can take a few minutes. However, once that is complete, all the work the model does will happen locally on your computer!&lt;/p&gt;

&lt;p&gt;Now, we loop through our list of questions. The key method &lt;code&gt;.prompt_main&lt;/code&gt; in the prompt class runs inference on the model. The only mandatory parameter for this method is the query. Optionally, &lt;code&gt;context&lt;/code&gt;, &lt;code&gt;prompt_name&lt;/code&gt;, and &lt;code&gt;temperature&lt;/code&gt; can also be passed in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;output = prompter.prompt_main(entries["query"],
                                      context=entries["context"],
                                      prompt_name="default_with_context",
                                      temperature=0.30)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;context&lt;/strong&gt; is a passage of information we want the model to read before answering the question. It lets us specify exactly what the model should consider, and the model will answer based on that passage. &lt;/p&gt;

&lt;p&gt;The prompt catalog supports a range of &lt;strong&gt;prompt names&lt;/strong&gt;. The code uses &lt;code&gt;default_with_context&lt;/code&gt;, which tells the model to read the provided context and answer the question. &lt;/p&gt;

&lt;p&gt;Adjusting the temperature will change the results of the query. In general, a lower temperature will yield more factual responses directly relating to the context. Higher temperatures are more appropriate when we require a more creative response from the model. For RAG-based applications, we set the temperature comparatively low to yield the most consistency and quality. &lt;/p&gt;
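&lt;p&gt;To build intuition for why lower temperatures yield more consistent answers, here is a small self-contained sketch (plain Python, independent of LLMWare) of temperature-scaled softmax over a model's output logits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def softmax_with_temperature(logits, temperature):
    """Lower temperature sharpens the distribution toward the top token."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.3)   # near-deterministic
high = softmax_with_temperature(logits, 1.5)  # flatter, more "creative"
print(low[0], high[0])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At temperature 0.3, nearly all probability mass lands on the top token; at 1.5, the alternatives get a real chance of being sampled.&lt;/p&gt;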

&lt;p&gt;The &lt;code&gt;output&lt;/code&gt; is a dictionary with two keys: &lt;code&gt;llm_response&lt;/code&gt; and &lt;code&gt;usage&lt;/code&gt;. &lt;/p&gt;
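&lt;p&gt;As a sketch of working with this dictionary (the values below are mocked to mirror the structure described above, not taken from a real model call):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#   mock output dictionary mirroring the structure described above
output = {
    "llm_response": " 22,500.00",
    "usage": {"input": 209, "output": 9, "total": 218, "metric": "tokens"},
}

#   normalize the response before comparing it against a gold answer
answer = output["llm_response"].strip()
gold_answer = "$22,500.00"
is_match = answer == gold_answer.lstrip("$")
print(answer, is_match)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;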

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExZWZ3cjk3ejJjZnd4Zzc1OGpsYTR2em5za3dyamY3cjluc2gxOHN5YyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2F8bE0EERrvXkq5S9BCa%2Fgiphy.gif" class="article-body-image-wrapper"&gt;&lt;img width="50%" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExZWZ3cjk3ejJjZnd4Zzc1OGpsYTR2em5za3dyamY3cjluc2gxOHN5YyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2F8bE0EERrvXkq5S9BCa%2Fgiphy.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 4 - Running the model
&lt;/h2&gt;

&lt;p&gt;Once you run the code, you will see the queries being iterated through and printed out. Each of these print-outs has an &lt;strong&gt;LLM Response&lt;/strong&gt; and a &lt;strong&gt;Gold Answer&lt;/strong&gt;. The &lt;strong&gt;LLM Response&lt;/strong&gt; is the model's response while the &lt;strong&gt;Gold Answer&lt;/strong&gt; is an "answer key" we created that the model does not see. This allows us to quickly compare the two answers and check for the model's accuracy. &lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;LLM Usage&lt;/strong&gt; line provides additional information about how the model formulated its response. In particular, you can see the "processing_time" for each query, which showcases the model's speed. Of course, the computer you run the models on will also cause speed to vary - the amount of RAM available is especially impactful for efficiency.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Query: What is the total amount of the invoice?
LLM Response: 22,500.00
Gold Answer: $22,500.00
LLM Usage: {'input': 209, 'output': 9, 'total': 218, 'metric': 'tokens', 'processing_time': 2.0669240951538086}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above output is a sample response. The LLM correctly responded to the query since its response matches the gold answer. &lt;/p&gt;
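&lt;p&gt;One handy derived metric is throughput. Using the usage numbers from the sample output above (a hedged sketch with the values hard-coded rather than taken from a live run):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#   usage values copied from the sample output above
usage = {"input": 209, "output": 9, "total": 218, "metric": "tokens",
         "processing_time": 2.0669240951538086}

#   output tokens generated per second of processing time
tokens_per_second = usage["output"] / usage["processing_time"]
print(round(tokens_per_second, 2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;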

&lt;p&gt;We have successfully received answers to our questions! Congrats on reaching the end of this example. Here is a link to the full working code!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware/blob/cbed79d0c185ab2626a2c53fe20c262734d4e7f5/fast_start/rag/example-3-prompts_and_models.py" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;FULL CODE&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 5 - Further exploration
&lt;/h2&gt;

&lt;p&gt;To experiment more with this example, consider changing out the &lt;code&gt;model_name&lt;/code&gt; for other models! How does the LLMWare Bling model compare to the LLMWare Dragon model or OpenAI? Will these models generate the same response when provided the same queries and context? Once you try out these questions, let us know what you think!&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this example about prompts and models! The next example will be about &lt;strong&gt;RAG text query&lt;/strong&gt;, so stay tuned for that article. &lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia4.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExZHZnd2pleXYzeHYzMTdpeDZtdnoxbWp0c2h2YmVhNm0ycXUzaWU2byZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2FAdX5dZF7LjigSMZZMm%2Fgiphy.gif" class="article-body-image-wrapper"&gt;&lt;img width="50%" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia4.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExZHZnd2pleXYzeHYzMTdpeDZtdnoxbWp0c2h2YmVhNm0ycXUzaWU2byZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2FAdX5dZF7LjigSMZZMm%2Fgiphy.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  To see more ...
&lt;/h2&gt;

&lt;p&gt;Please join our LLMWare community on Discord to learn more about RAG / LLMs and share your thoughts! &lt;a href="https://discord.gg/5mx42AGbHm" rel="noopener noreferrer"&gt;https://discord.gg/5mx42AGbHm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Visit LLMWare's Website&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Explore LLMWare on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.freepik.com/free-vector/personal-computer-screen-with-old-software-windows_36102472.htm#fromView=image_search_similar&amp;amp;page=1&amp;amp;position=49&amp;amp;uuid=6524a515-1b2e-441e-8993-357233bf186d&amp;amp;query=cute+coding" rel="noopener noreferrer"&gt;Image from Freepik&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>ai</category>
      <category>learning</category>
      <category>python</category>
    </item>
    <item>
      <title>How I Learned Generative AI in Two Weeks (and You Can Too): Part 2 - Embeddings</title>
      <dc:creator>Julia Zhou</dc:creator>
      <pubDate>Fri, 11 Oct 2024 11:41:08 +0000</pubDate>
      <link>https://dev.to/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-2-embeddings-2ppc</link>
      <guid>https://dev.to/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-2-embeddings-2ppc</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A few weeks ago, I shared my experience learning about Generative AI Libraries through LLMWare's Fast Start to RAG example 1. Today, I will continue this series by taking you through example 2. This is personally one of my favorite "lessons" in this LLMWare series, so I hope you will find it thought-provoking as well! This example will focus on &lt;strong&gt;embeddings and vectors&lt;/strong&gt;. Let us start by exploring what exactly these terms mean! &lt;/p&gt;

&lt;h2&gt;
  
  
  How do embedding models work?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Embedding models&lt;/strong&gt; are trained on large amounts of language tokens to either predict the next token or fill in missing tokens. In either case, these models learn how to represent language! They take large chunks of text as input and process them through tokenization (breaking the text into smaller pieces), conversion into numbers, and various layers of transformations. These steps build a representation of the input text that forms the output: vectors. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vectors&lt;/strong&gt; are created when the input text is translated into the language through which the model sees the world. Geometrically speaking, they are points in n-dimensional space, where "n" is the number of embedding dimensions (typically, n is 768). The dimensions are represented by n floats, usually ranging between 0 and 1 or between -1 and 1. Converting the text to numbers allows the model to more easily compare the similarity of two texts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/i1xNYj8xdUmR7z3CrH/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/i1xNYj8xdUmR7z3CrH/giphy.gif" width="480" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try thinking back to high school geometry! You might remember that two points (or shapes) that are close to each other are considered more similar to one another than two far away points. This process is exactly what the model performs to compare texts and is known as a &lt;strong&gt;semantic search&lt;/strong&gt;. Once a query is converted to a vector, that vector is compared to all the other vectors in the database. The ones that are the most similar are returned. &lt;/p&gt;
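&lt;p&gt;The idea can be sketched in a few lines of plain Python. Toy 3-dimensional vectors stand in for real embeddings here, and cosine similarity stands in for whatever metric a given vector database uses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def cosine_similarity(a, b):
    """Higher values mean the two vectors point in more similar directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

#   toy 3-dimensional "embeddings" (real models use hundreds of dimensions)
database = {
    "bonus plan": [0.9, 0.1, 0.0],
    "weather report": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]   # stand-in for the embedded query text

#   brute-force semantic search: return the most similar stored vector
best = max(database, key=lambda k: cosine_similarity(query_vector, database[k]))
print(best)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;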

&lt;p&gt;Now, we are ready to take a look at the example's code! This LLMWare Faststart example can be run in the same way as example 1, but instructions can be found in our &lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/README.md" rel="noopener noreferrer"&gt;README file&lt;/a&gt; if needed. Example 2 is directly copy-paste ready!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/rag/example-2-build_embeddings.py" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Example 2: Embeddings&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Extra resources
&lt;/h2&gt;

&lt;p&gt;In case you missed it, I will link my previous article in this series since this example will continue building on the foundation we built in example 1. The same process for creating libraries is utilized in example 2, so I will skip over it here. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-1-libraries-215h" class="ltag_cta ltag_cta--branded"&gt;Article - Example 1: Libraries&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;For visual learners, here is a video that works through example 2. Feel free to watch the video before following the steps in this article. Also, here is a Python Notebook that breaks down this example's code alongside the output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/Notebooks/fast_start_examples/example_2_build_embeddings_version_1.ipynb" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Example 2 Notebook&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/2xDefZ4oBOM"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1 - Creating embeddings &amp;amp; storing vectors
&lt;/h2&gt;

&lt;p&gt;As mentioned above, we will not cover the library building process in this article and will move directly into embedding models. For this demo, we will use the "mini-lm-sbert" model, which is efficient and is included in the default LLMWare package. Feel free to experiment with different models, including the OpenAI Text Embedding Ada!&lt;/p&gt;

&lt;p&gt;Recall that in example 1, we not only created our library but also added our documents into a database. This database will make it extremely convenient to access text chunks that we can give to the embedding model. &lt;/p&gt;

&lt;p&gt;Once the library has been created, let us focus our attention on the most important line of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library.install_new_embedding(embedding_model_name=embedding_model, vector_db=vector_db,batch_size=100)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This line calls the &lt;code&gt;install_new_embedding&lt;/code&gt; function and passes in the embedding model name and the vector database name as parameters. The final parameter, &lt;code&gt;batch_size&lt;/code&gt;, determines how many text chunks will be processed at a time. Considerations like efficiency, memory, model capability, and database size all factor into choosing the most appropriate batch size. &lt;/p&gt;
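&lt;p&gt;As a quick sanity check on what a batch size of 100 implies, here is a small sketch using the 2211-block count that appears in this example's output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

#   block count taken from this example's sample output
total_blocks = 2211
batch_size = 100

#   number of embedding passes needed to cover every text chunk
num_batches = math.ceil(total_blocks / batch_size)
print(num_batches)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;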

&lt;p&gt;We can confirm that our embedding creation and vector storage was a success!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;update = Status().get_embedding_status(library_name, embedding_model)
print("update: Embeddings Complete - Status() check at end of embedding - ", update)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Part 2 - Queries
&lt;/h2&gt;

&lt;p&gt;Now that we have the vector database, we can begin running queries on it! We will begin by creating a very simple query, passing it into the library, and running a &lt;strong&gt;semantic query&lt;/strong&gt; on it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sample_query = "incentive compensation"
query_results = Query(library).semantic_query(sample_query, result_count=20)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will use the following portion of code to iterate through the query results to view them, and we will especially look at the &lt;code&gt;distance&lt;/code&gt; parameter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i, entries in enumerate(query_results):
  text = entries["text"]
  document_source = entries["file_source"]
  page_num = entries["page_num"]
  vector_distance = entries["distance"]

  if len(text) &amp;gt; 125: text = text[0:125] + " ... "

  print("\nupdate: query results - {} - document - {} - page num - {} distance - {} ".format(i, document_source, page_num, vector_distance))

  print("update: text sample - ", text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let us run the example to see the results in action!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/nFLW7PNGgN3lI68rdv/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/nFLW7PNGgN3lI68rdv/giphy.gif" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3 - The results
&lt;/h2&gt;

&lt;p&gt;Through the output, we can see that at first, we have no embeddings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;embedding record - before embedding  [{'embedding_status': 'no', 'embedding_model': 'none', 'embedding_db': 'none', 'embedded_blocks': 0, 'embedding_dims': 0, 'time_stamp': 'NA'}]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, there are a series of outputs showing that we are creating embeddings in batches of 100, as expected. By the end, all of the text chunks will be converted to vectors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;update: Embeddings Complete - Status() check at end of embedding -  [{'_id': 2, 'key': 'example2_library_embedding_mini-lm-sbert', 'summary': '2211 of 2211 blocks', 'start_time': '1717690179.087806', 'end_time': '1717690199.5373614', 'total': 2211, 'current': 2211, 'units': 'blocks'}]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we have arrived back at the query result for-loop mentioned above. Looking at the first result, we can see that one of the many metadata fields output is distance. This distance value can be considered the distance between the vector for our query ("incentive compensation") and the vector for this sample block.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;update: query results - 0 - document - Artemis Poseidon EXECUTIVE EMPLOYMENT AGREEMENT.pdf - page num - 4 distance - 0.24837934970855713 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query results are sorted from lowest to highest distance - that is, from most to least similar. For comparison, we can see that the tenth query result returned has a higher distance than the first one!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;update: query results - 10 - document - Eileithyia EXECUTIVE EMPLOYMENT AGREEMENT.pdf - page num - 3 distance - 0.27305811643600464 
update: text sample -  in Employer's annual cash incentive   bonus plan (the “Plan”), based on the same terms and conditions as in existence for oth ... 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
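&lt;p&gt;This ordering can be checked in a few lines of plain Python (the mock entries below mirror the two results shown above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#   mock query results mirroring the output above; semantic query results
#   carry a "distance" field and arrive sorted ascending (most similar first)
query_results = [
    {"file_source": "Artemis Poseidon EXECUTIVE EMPLOYMENT AGREEMENT.pdf",
     "distance": 0.24837934970855713},
    {"file_source": "Eileithyia EXECUTIVE EMPLOYMENT AGREEMENT.pdf",
     "distance": 0.27305811643600464},
]

distances = [entry["distance"] for entry in query_results]
is_sorted = distances == sorted(distances)
print(is_sorted)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;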



&lt;h2&gt;
  
  
  Part 4 - Further exploration
&lt;/h2&gt;

&lt;p&gt;For this example, we used the "faiss" vector database, but I encourage you to experiment with others as well. &lt;/p&gt;

&lt;p&gt;Similarly, try using different embedding models to see how their characteristics might be optimized for certain types of inputs! A series of examples involving embeddings can be found on the LLMWare GitHub page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware%20ai/llmware/tree/main/examples/Embedding" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Embeddings Examples&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this example about embeddings and vectors! The next example will be about &lt;strong&gt;prompts and models&lt;/strong&gt;, so stay tuned for that article. &lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

&lt;h2&gt;
  
  
  To see more ...
&lt;/h2&gt;

&lt;p&gt;Please join our LLMWare community on Discord to learn more about RAG and LLMs! &lt;a href="https://discord.gg/5mx42AGbHm" rel="noopener noreferrer"&gt;https://discord.gg/5mx42AGbHm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Visit LLMWare's Website&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Explore LLMWare on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.freepik.com/free-photo/view-adorable-3d-cat_45138549.htm#fromView=search&amp;amp;page=1&amp;amp;position=1&amp;amp;uuid=c7c3603a-a846-4ddf-8a71-b0346612cef6" rel="noopener noreferrer"&gt;Image from Freepik&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>ai</category>
      <category>learning</category>
      <category>python</category>
    </item>
    <item>
      <title>How I Learned Generative AI in Two Weeks (and You Can Too): Part 1 - Libraries</title>
      <dc:creator>Julia Zhou</dc:creator>
      <pubDate>Thu, 12 Sep 2024 21:54:51 +0000</pubDate>
      <link>https://dev.to/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-1-libraries-215h</link>
      <guid>https://dev.to/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-1-libraries-215h</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;For reference, prior to this journey, I barely had more knowledge about AI than the average person. Sure, I fired off the occasional ChatGPT request for one task or another, but I was always more focused on coding than AI, having picked up Python and Java during quarantine.  &lt;/p&gt;

&lt;p&gt;Despite my initial skepticism at being able to successfully understand the examples, particularly in a short time frame, I found LLMWare's "Fast Start to RAG" series highly accessible. I will cover example one of the course in this article - hopefully it can help you as well! If you are interested in learning more about LLMWare, feel free to check out our &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;website&lt;/a&gt; as well as another &lt;a href="https://dev.to/llmware/become-a-rag-professional-in-2024-go-from-beginner-to-expert-41mg"&gt;DEV article&lt;/a&gt; outlining the Fast Start to RAG examples. &lt;/p&gt;

&lt;p&gt;To clarify, extensive knowledge of coding, specifically Python 3, is not necessarily a prerequisite for the examples that I used to get my start in AI and RAG. However, a basic understanding is certainly helpful for comprehending the content and parsing the code. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/CuuSHzuc0O166MRfjt/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/CuuSHzuc0O166MRfjt/giphy.gif" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;p&gt;To run these examples, you will need to install the LLMWare package by running &lt;code&gt;pip3 install llmware&lt;/code&gt; in the command line. Further instructions can be found in our &lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/README.md" rel="noopener noreferrer"&gt;README file&lt;/a&gt;. Then, you will be able to run &lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/example-1-create_first_library.py" rel="noopener noreferrer"&gt;example 1&lt;/a&gt;, which is directly copy-paste ready.  &lt;/p&gt;

&lt;p&gt;I will also point out that the AI community tends to use acronyms (like AI itself!) and technical language extending beyond the scope of everyday conversation. The acronym "RAG" stands for Retrieval Augmented Generation, which enhances the outputs of LLMs (Large Language Models) using external knowledge. In example 1, we will focus on the first step in RAG: converting a pile of files into an AI-ready knowledge base. &lt;/p&gt;

&lt;h2&gt;
  
  
  Extra resources
&lt;/h2&gt;

&lt;p&gt;For visual learners, here is a video that works through example 1. Feel free to watch the video before following the steps in this article. Also, here is a Python Notebook that breaks down this example's code alongside the output: &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/Notebooks/NoteBook_Examples/example_1_create_first_library.ipynb" rel="noopener noreferrer"&gt;Example 1 Notebook&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/2xDefZ4oBOM"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1 - Execution configuration
&lt;/h2&gt;

&lt;p&gt;By default, the active database is "mongo", but we will select "sqlite" since it does not require a separate installation. &lt;/p&gt;

&lt;p&gt;Additionally, we can use the debug mode options to control how much information is displayed during processing. We can set &lt;code&gt;debug_mode&lt;/code&gt; to 2 for more detailed output than 0, the default. &lt;/p&gt;

&lt;p&gt;For this example, sample data sets are imported through &lt;code&gt;from llmware.setup import Setup&lt;/code&gt; and are stored in &lt;code&gt;sample_folders&lt;/code&gt;. These sets include documents of different subjects and sizes, but you will be able to replace them with your own data as well. We can choose a name for our library (go ahead and customize!) and select a folder from the samples before running the main script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLMWareConfig().set_active_db("sqlite")

LLMWareConfig().set_config("debug_mode", 2)

sample_folders = ["Agreements", "Invoices", "UN-Resolutions-500", "SmallLibrary", "FinDocs", "AgreementsLarge"]
library_name = "example1_library"
selected_folder = sample_folders[0]     # e.g., "Agreements"

output = parsing_documents_into_library(library_name, selected_folder)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://i.giphy.com/media/ua7vVw9awZKWwLSYpW/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/ua7vVw9awZKWwLSYpW/giphy.gif" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2 - Main body
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Now, we can create our library! This line of code will set up the database tables as well as supporting file repositories to store information about the library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library = Library().create_new_library(library_name)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Steps 2 and 3:&lt;/strong&gt; However, our library is still completely empty, so we need to fill it up. To do so, we will load in the LLMWare sample files and save them in &lt;code&gt;sample_files_path&lt;/code&gt;. If you are using your own data sets, you will need to point to a local folder path with your documents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sample_files_path = Setup().load_sample_files(over_write=False)
ingestion_folder_path = os.path.join(sample_files_path, sample_folder)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4:&lt;/strong&gt; While adding files to a library, LLMWare performs parsing, text chunking, and indexing in the SQLite database. It will automatically choose the correct parser based on a file's extension. The parser extracts the text and stores it as text chunks in the database. Although this may seem like a lot of steps, it all happens incredibly quickly behind the scenes!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;parsing_output = library.add_files(ingestion_folder_path)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
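&lt;p&gt;LLMWare's parsers handle all of this internally, but the core idea of text chunking can be sketched in a few lines of plain Python. This is an illustrative sketch only - the chunk size and overlap below are made-up values, and llmware's real parsers chunk by document structure rather than raw character counts.&lt;/p&gt;

```python
def chunk_text(text, chunk_size=400, overlap=50):
    """Split text into overlapping, fixed-size chunks.

    Illustrative only: real parsers chunk by document
    structure, not raw character counts.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

# A stand-in "document" built from a repeated sentence.
doc = ("Base salary shall be paid in accordance with the "
       "employer's standard payroll practices. ") * 20
blocks = chunk_text(doc)
print("created", len(blocks), "blocks from", len(doc), "characters")
```

&lt;p&gt;Each chunk overlaps the next by a fixed number of characters so that a phrase falling on a chunk boundary is still retrievable from at least one block.&lt;/p&gt;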



&lt;p&gt;&lt;strong&gt;Step 5:&lt;/strong&gt; To check our progress, we can look at the &lt;code&gt;updated_library_card&lt;/code&gt;, which contains key metadata, document and block counts, and other important information. The &lt;code&gt;.get_library_card()&lt;/code&gt; method can be called at any time to retrieve information about your library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;updated_library_card = library.get_library_card()
doc_count = updated_library_card["documents"]
block_count = updated_library_card["blocks"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Steps 6 and 7:&lt;/strong&gt; At this point we could inspect the library's main folder structure, but the library is already ready to run queries! We will do this by instantiating a Query object with our library. The &lt;code&gt;test_query&lt;/code&gt; may need to be adjusted to best suit the data set. For this example, we chose the "Agreements" sample set, so we can use "base salary" as a "hello world"-esque query. &lt;/p&gt;

&lt;p&gt;Now, the text query will scan every chunk of text and return the ones that contain "base salary". The Query class contains many methods for different query types; today, we will use the simplest, the &lt;code&gt;text_query&lt;/code&gt; method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query_results = Query(library).text_query(test_query, result_count=10)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
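&lt;p&gt;Conceptually, a text query like this is just a scan over the stored chunks. Here is a minimal pure-Python sketch of that idea - the in-memory block list and the helper function below are hypothetical stand-ins for illustration; llmware's real &lt;code&gt;text_query&lt;/code&gt; runs against the SQLite index, not a Python list.&lt;/p&gt;

```python
# Hypothetical in-memory stand-in for the library's text blocks.
blocks = [
    {"doc_ID": 1, "block_ID": 0,
     "text": "The executive's base salary shall be $300,000 per year."},
    {"doc_ID": 1, "block_ID": 1,
     "text": "Vacation days accrue monthly."},
    {"doc_ID": 2, "block_ID": 0,
     "text": "Base salary is reviewed annually by the board."},
]

def naive_text_query(blocks, query, result_count=10):
    """Return up to result_count blocks whose text contains
    the query string (case-insensitive substring match)."""
    hits = [b for b in blocks if query.lower() in b["text"].lower()]
    return hits[:result_count]

results = naive_text_query(blocks, "base salary")
for i, result in enumerate(results):
    print("query results:", i, result["doc_ID"], result["block_ID"])
```

&lt;p&gt;Only the two chunks containing the phrase come back; the unrelated vacation clause is filtered out. The real method additionally returns the rich metadata (page numbers, match positions, and so on) shown below.&lt;/p&gt;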



&lt;p&gt;We can print out our results, giving us a look at the metadata and attributes of the individual text blocks we created!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i, result in enumerate(query_results):
        #   here are a few useful attributes
        text = result["text"]
        file_source = result["file_source"]
        page_number = result["page_num"]
        doc_id = result["doc_ID"]
        block_id = result["block_ID"]
        matches = result["matches"]

        print("query results: ", i, result)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://i.giphy.com/media/Z3VgQu8hkVeB1bakS9/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/Z3VgQu8hkVeB1bakS9/giphy.gif" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3 - The results
&lt;/h2&gt;

&lt;p&gt;The output summary will include key information such as &lt;code&gt;total pdf files processed&lt;/code&gt;, &lt;code&gt;total blocks created&lt;/code&gt;, &lt;code&gt;total pages added&lt;/code&gt;, and &lt;code&gt;time elapsed&lt;/code&gt;. Try to spot all of them! &lt;/p&gt;

&lt;p&gt;In particular, the LLMWare package includes C-based parsers that can quickly and efficiently parse files. Once completed, the parsed information is returned as a dictionary. You will see the results of your work in the previous steps!&lt;/p&gt;

&lt;p&gt;To summarize, we took our documents and broke them down into thousands of blocks. Then, we extracted the text information and put it into the SQLite database. Lastly, we ran a text search against that data to retrieve our results (including details as fine-grained as pixel coordinates and character-level matches!).&lt;/p&gt;

&lt;p&gt;You just completed your first example, but there is so much more for you to explore! I would suggest rerunning this example with varied data sets to tap into the true potential of this technology, and of course, continue onto example 2 about &lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/example-2-build_embeddings.py" rel="noopener noreferrer"&gt;building embeddings&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 4 - To see more ...
&lt;/h2&gt;

&lt;p&gt;Please join our LLMWare community on Discord to learn more about RAG and LLMs! &lt;a href="https://discord.gg/5mx42AGbHm" rel="noopener noreferrer"&gt;https://discord.gg/5mx42AGbHm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Visit LLMWare's Website&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Explore LLMWare on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.freepik.com/free-photo/computer-scientist-updating-ai-systems_237235999.htm#fromView=image_search_similar&amp;amp;page=1&amp;amp;position=3&amp;amp;uuid=0b4ac661-5087-4321-a5cf-0838108d5997" rel="noopener noreferrer"&gt;Image by DC Studio on Freepik&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>ai</category>
      <category>learning</category>
      <category>python</category>
    </item>
  </channel>
</rss>
