<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Robert Boys</title>
    <description>The latest articles on DEV Community by Robert Boys (@robnotbob).</description>
    <link>https://dev.to/robnotbob</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3914766%2Ffbd6c1f3-3e98-499e-bbd0-2adb741bd067.png</url>
      <title>DEV Community: Robert Boys</title>
      <link>https://dev.to/robnotbob</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/robnotbob"/>
    <language>en</language>
    <item>
      <title>A 19th Century Author Taught Me RAG.</title>
      <dc:creator>Robert Boys</dc:creator>
      <pubDate>Sat, 09 May 2026 14:51:08 +0000</pubDate>
      <link>https://dev.to/robnotbob/a-19th-century-author-taught-me-rag-346h</link>
      <guid>https://dev.to/robnotbob/a-19th-century-author-taught-me-rag-346h</guid>
      <description>&lt;p&gt;I asked a 14 billion parameter LLM to remember a short story by Nathaniel Hawthorne and it told me it was written by Edith Wharton. This was a good thing, because I wanted to see first hand how Retrieval Augmented Generation (RAG) really works. I am writing this post to share the process and my results, in case there are others who are new to RAG and would like to read a case study on it.&lt;/p&gt;

&lt;p&gt;RAG is a method for improving the accuracy of LLM output. It builds a separate database that supplies information the LLM either lacks or covers only thinly in its pretraining memory. The relevant context is pulled from the database and silently added to the user's prompt. With this additional information, the LLM has more to work with and, hopefully, produces a better result. There are several design decisions involved in building this process, which we will explore in a hands-on experiment using entirely local tools.&lt;/p&gt;
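&lt;p&gt;The retrieve-then-augment step can be sketched in a few lines. This is a minimal illustration, not the project's actual code; the chunk list here is a stand-in for what the retrieval step will later supply:&lt;/p&gt;

```python
def build_rag_prompt(question, retrieved_chunks):
    """Silently prepend retrieved context to the user's question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following passages to answer the question.\n\n"
        "Context:\n" + context + "\n\n"
        "Question: " + question
    )

# Stand-in chunks; a real pipeline would pull these from the vector database.
chunks = ["Beatrice tends the garden.", "The plants are poisonous."]
prompt = build_rag_prompt("Who tends the garden?", chunks)
print(prompt)
```

&lt;p&gt;The augmented prompt, not the bare question, is what actually reaches the model.&lt;/p&gt;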

&lt;p&gt;This study was intentionally old school in its setup: I used a Large Language Model (LLM) from the days of AI antiquity, 2024, running on my local laptop. Specifically, the study was built entirely in Python using ChromaDB 1.5.2 as the vector database, nomic-embed-text for embeddings, and phi4 via Ollama as the LLM. All code can be found on my GitHub at &lt;a href="https://github.com/RobNotBob/rag-rappaccini" rel="noopener noreferrer"&gt;https://github.com/RobNotBob/rag-rappaccini&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  The Text
&lt;/h2&gt;

&lt;p&gt;My first decision was to select a source text to evaluate. I soon settled on Nathaniel Hawthorne's short story &lt;em&gt;Rappaccini's Daughter&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Published in 1844, &lt;em&gt;Rappaccini’s Daughter&lt;/em&gt; is the tragic love story of a young university student, Giovanni Guasconti. The young man comes from southern Italy to study at Padua, at the same university as Prof. Baglioni, an old friend of his father. The property adjacent to his room is an elaborate garden lush with exotic plants. He observes that the garden is presided over by a pale and stern-faced man, Dr. Rappaccini, and his beautiful daughter. &lt;/p&gt;

&lt;p&gt;He falls in love with the girl, but she warns him not to enter the garden or touch her. We learn the garden is actually full of highly toxic plants that Dr. Rappaccini uses in his medical experiments. His daughter grew up among the plants and so is immune to their toxic nature. Despite the warnings of Prof. Baglioni, Giovanni is driven by his love to get closer to her. When her fingers accidentally touch his wrist, his skin breaks out in a horrible rash. Prof. Baglioni, observing how the toxins are affecting the son of his friend, offers an antidote for both him and Rappaccini’s daughter. He is cured, but when she drinks it she dies. &lt;/p&gt;

&lt;p&gt;It is a sad story, but one well suited to this study: it is in the public domain, easily available, and more obscure than, say, Shakespeare or Charles Dickens. This meant that phi4's pretraining exposure to the story would likely be low. &lt;/p&gt;

&lt;p&gt;I obtained the text from the Project Gutenberg page linked below. The story is contained in the book &lt;em&gt;Mosses from an Old Manse&lt;/em&gt;, published in 1846. This source mattered because the file was not cluttered with formatting scripts and had clear headings, which made it easier to extract and prepare the document for RAG processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.gutenberg.org/files/512/512-h/512-h.htm#rappaccini" rel="noopener noreferrer"&gt;https://www.gutenberg.org/files/512/512-h/512-h.htm#rappaccini&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  Baseline Performance
&lt;/h2&gt;

&lt;p&gt;Before building a RAG pipeline, I needed to assess whether such a process was even needed. Without one, the LLM must rely on its pretraining content to respond to my prompts. The following are some of the important evaluations to consider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recitation&lt;/strong&gt;. I prompted the phi4 LLM, "Recite the first four sentences of Rappaccini's Daughter by Nathaniel Hawthorne." The response claimed the author was Edith Wharton and then proceeded to hallucinate the entire quote.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Factual Recall&lt;/strong&gt;. I asked a series of verifiable questions about the short story and examined the responses. Out of ten basic questions it missed only one outright - where Giovanni came from. Two other responses were either mis-framed or anachronistic (calling the plants "genetically modified" in a 19th century text that knew no such thing). Of the five harder questions, it gave fully or partially wrong answers to all but one. The most egregious was claiming Dr. Rappaccini's reputation among scientists was that he turned humans into plants. The pattern of phi4 is that it anchors on something true and then confidently fills in the gaps, which is how it hallucinates. For simple facts the LLM is mostly correct, but with scene details it is consistently wrong or padded with errors. Note that the LLM never once signaled uncertainty, even though I prompted it to do so.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plausible Distractors&lt;/strong&gt;. I then tested the LLM by providing two different plot points - one real and one plausible but false. This assesses whether the LLM "knows" the text or is simply pattern-matching on the literary genre. Phi4 performed much better on distractors than on factual recall. This test presents two anchors, which makes it easier for the LLM to identify the more plausible option, whereas factual recall requires producing a specific detail from nothing. A RAG process should help with that drawback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intentional Error Detection&lt;/strong&gt;. The final test provided statements containing an error and asked the LLM to both find the error and correct it. Phi4 answered all five questions correctly, which again illustrates that the LLM performs well when given an anchor.&lt;/p&gt;

&lt;p&gt;The conclusion of all these tests is that the LLM struggles to answer non-simple factual questions or recite specific lines of text. When given an anchor, the LLM can perform much better. This is a concern because the prompter will not always know enough to provide the proper anchor. A RAG process could indeed improve this performance.&lt;/p&gt;

&lt;p&gt;Here is the list of questions that the LLM failed during the baseline testing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recite the first 4 sentences.&lt;/li&gt;
&lt;li&gt;Question 5: What country does Giovanni come from?&lt;/li&gt;
&lt;li&gt;Question 12: What flower does Rappaccini give Beatrice when she enters the garden?&lt;/li&gt;
&lt;li&gt;Question 14: What does Professor Baglioni tell Giovanni about Rappaccini's reputation among scientists?&lt;/li&gt;
&lt;/ul&gt;
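&lt;p&gt;I graded these by hand, but a simple keyword check can approximate the process. This helper is a hypothetical sketch, not part of the study's code:&lt;/p&gt;

```python
def grade_answer(answer, required_keywords):
    """Mark an answer correct only if every expected keyword appears."""
    text = answer.lower()
    return all(kw.lower() in text for kw in required_keywords)

# Question 5 expects the answer to mention southern Italy:
print(grade_answer("Giovanni comes from the south of Italy.", ["south", "Italy"]))  # True
print(grade_answer("He comes from France.", ["south", "Italy"]))  # False
```

&lt;p&gt;Keyword matching is crude - it cannot catch mis-framed or anachronistic answers - but it makes re-running the baseline repeatable.&lt;/p&gt;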

&lt;h2&gt;
  Document Preparation
&lt;/h2&gt;

&lt;p&gt;Now that I had a good baseline and confirmed the need for improvement, I needed to prepare the text document for use with the LLM. This is a significant first step for the overall performance of RAG. The more a document or data file is cluttered with irrelevant formatting or information, the harder it becomes to apply a chunking strategy cleanly (more on this later). Fortunately, the Project Gutenberg text was clean, and the only real effort was extracting Rappaccini's Daughter from the compilation of stories within the single text document.&lt;/p&gt;

&lt;p&gt;Here is an example of what the text looked like when downloaded from Project Gutenberg:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;h2&amp;gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"chap04"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
YOUNG GOODMAN BROWN&lt;span class="nt"&gt;&amp;lt;/h2&amp;gt;&lt;/span&gt;
--
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;&lt;span class="c"&gt;&amp;lt;!--end chapter--&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"chapter"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;h2&amp;gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"chap05"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
RAPPACCINI’S DAUGHTER&lt;span class="nt"&gt;&amp;lt;/h2&amp;gt;&lt;/span&gt;
--
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;&lt;span class="c"&gt;&amp;lt;!--end chapter--&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"chapter"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;h2&amp;gt;&amp;lt;a&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"chap06"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
MRS. BULLFROG&lt;span class="nt"&gt;&amp;lt;/h2&amp;gt;&lt;/span&gt;
--
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
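&lt;p&gt;Because each story sits between named chapter anchors (chap05 and chap06 in the excerpt above), a short script can slice out just Rappaccini's Daughter. This is a sketch of the idea rather than the exact extraction code; it assumes each anchor name appears once:&lt;/p&gt;

```python
def extract_story(html, start_anchor="chap05", end_anchor="chap06"):
    """Slice out one story from the compilation using its chapter anchors."""
    start = html.find(start_anchor)
    end = html.find(end_anchor)
    if start == -1 or end == -1:
        raise ValueError("anchor not found")
    return html[start:end]

# Tiny synthetic stand-in for the Gutenberg file:
page = "...chap04 YOUNG GOODMAN BROWN ... chap05 RAPPACCINI ... chap06 MRS. BULLFROG"
story = extract_story(page)
print("RAPPACCINI" in story)  # True
```

&lt;p&gt;Stripping the remaining HTML tags from the extracted slice is left out here; for this file it was a minor cleanup step.&lt;/p&gt;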



&lt;h2&gt;
  Chunking Strategy
&lt;/h2&gt;

&lt;p&gt;Once the text is ready, the next step is to determine a chunking strategy. Chunking splits the text into meaningful pieces that can be quietly added to a user's prompt to improve the quality of the request sent to the LLM. Without chunking, the context window would be overwhelmed by unfocused content, which would likely make performance even worse.&lt;/p&gt;

&lt;p&gt;For example, if I wanted to know what happened to the insect that was near Beatrice, I would want the RAG process to find that one relevant paragraph and include it with my prompt to the LLM. The question is how to create these chunks. If made too large, you risk including irrelevant details. If made too small, some critical context could be left behind. I examined two different chunking strategies - fixed-size and paragraph boundary.&lt;/p&gt;

&lt;p&gt;Fixed-size chunking takes a set number of characters for each chunk. An overlap is included to help when an idea carries across chunk boundaries. Here is the Python code to do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chunk_fixed_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt;           &lt;span class="c1"&gt;# Define the end of this chunk
&lt;/span&gt;        &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;    &lt;span class="c1"&gt;# Extract the chunk
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;                          &lt;span class="c1"&gt;# Only add non-empty chunks
&lt;/span&gt;            &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;overlap&lt;/span&gt;     &lt;span class="c1"&gt;# Move start less overlap
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;

&lt;span class="c1"&gt;# Function call executed and stored in a variable:
&lt;/span&gt;&lt;span class="n"&gt;fixed_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chunk_fixed_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paragraph-boundary chunking can be an effective strategy for text documents if the prose style is compatible, which is something this study examined. Knowing that the source text uses a double newline to start each new paragraph, the script used this as the marker for splitting the text.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chunk_by_paragraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;raw_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# Split on double newlines
&lt;/span&gt;    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;                       &lt;span class="c1"&gt;# Clean each chunk, filter short ones
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw_chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;min_length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;

&lt;span class="c1"&gt;# Function call executed and stored in a variable:
&lt;/span&gt;&lt;span class="n"&gt;para_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chunk_by_paragraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The story had 66,335 characters. For the fixed-size method, 148 chunks were created with an average size of 497 characters. Here is an example, chunk #10:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ni stood at the window he heard a rustling behind a screen of
leaves, and became aware that a person was at work in the garden. His figure soon emerged into view, and showed itself to be that of no common laborer, but a tall, emaciated, sallow, and sickly-looking man, dressed in a scholar’s garb of black. He was beyond the middle term of life, with gray hair, a thin, gray beard, and a face singularly marked with intellect and cultivation, but which could never, even in his more youthful days,
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the paragraph boundary method, 117 chunks were created with an average size of 554 characters. The shortest chunk was 100 characters while the largest was 2,118 characters. Here is an example of its chunk #10:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Soon there emerged from under a sculptured portal the figure of a young girl, arrayed with as much richness of taste as the most splendid of the flowers, beautiful as the day, and with a bloom so deep and vivid that one shade more would have been too much. She looked redundant with life, health, and energy; all of which attributes were bound down and compressed, as it were and girdled tensely, in their luxuriance, by her virgin zone. Yet Giovanniâs fancy must have grown morbid while he looked down into the garden; for the impression which the fair stranger made upon him was as if here were another flower, the human sister of those vegetable ones, as beautiful as they, more beautiful than the richest of them, but still to be touched only with a glove, nor to be approached without a mask. As Beatrice came down the garden path, it was observable that she handled and inhaled the odor of several of the plants which
her father had most sedulously avoided.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that the fixed-size chunk starts with a word fragment and ends mid-sentence. This loss of context may impair the LLM. For the paragraph boundaries, we are fortunate that Hawthorne wrote in similarly sized paragraphs, with at least one very large outlier. The weakness of this strategy is its complete dependence on the writer's paragraphing style.&lt;/p&gt;
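&lt;p&gt;The chunk statistics quoted above are easy to reproduce for any chunk list. This small helper is illustrative, not taken from the project code:&lt;/p&gt;

```python
def chunk_stats(chunks):
    """Summarize a chunk list: count, average, min, and max lengths."""
    lengths = [len(c) for c in chunks]
    return {
        "count": len(lengths),
        "avg": sum(lengths) / len(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }

# Three synthetic chunks matching the extremes reported for paragraph chunking:
print(chunk_stats(["a" * 100, "b" * 500, "c" * 2118]))
# {'count': 3, 'avg': 906.0, 'min': 100, 'max': 2118}
```

&lt;p&gt;Checking these numbers after each chunking run is a quick sanity test that the splitter behaved as intended.&lt;/p&gt;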

&lt;h2&gt;
  Embedding Strategy
&lt;/h2&gt;

&lt;p&gt;Each of these two lists of strings, one per chunking method, must be converted into vectors of numbers so that vector search - the mathematical comparison at the heart of retrieval - can be performed. A vector is a long list of numbers that represents meaning. For example, the word "dolphin" would have numbers representing ideas of water, swimming, mammal, fish-eater, echolocation, and so on. The word "snorkel" would have numbers representing ideas of water, swimming, equipment, plastic, manufactured, and so on. By representing each idea with a number, the vectors can be compared mathematically. Here is what one of these embedding vectors looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"embedding"&lt;/span&gt;:[1.1739883422851562,-0.32890406250953674,-3.7720236778259277,0.4036591947078705,1.3114928007125854,-0.17043189704418182,-1.1893550157546997,...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To work with vectors, a vector database is needed to store and query this data. The vector database used in this study was ChromaDB. It was selected because it runs entirely in Python and stores its data locally in a SQLite database, which avoids the need for a separate server.&lt;/p&gt;

&lt;p&gt;ChromaDB comes with its own embedding model, all-MiniLM-L6-v2, which stores meaning in 384-dimensional vectors and requires 87 MB of storage. For this project, I used a different embedding model - nomic-embed-text. This model creates 768-dimensional vectors using 137M parameters. The choice provides a much larger context window, avoiding truncation of chunks, at the cost of speed: nomic takes about five times longer to run. However, for this study the embedding process took only 1-2 minutes and was not a factor.&lt;/p&gt;

&lt;p&gt;The script for sending the text and receiving the vector is pretty simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nomic-embed-text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each vector is then stored in a database table, which in vector terminology is called a &lt;strong&gt;collection&lt;/strong&gt;. The embeddings field holds the vector, documents holds the original text, and metadatas provides extra information we can filter on later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Get the embedding vector for this chunk
&lt;/span&gt;    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;metadatas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk_index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_or_create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rappaccini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hnsw:space&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cosine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When doing vector searches, we need to specify a mathematical method for comparing vectors. There are several to choose from; I chose cosine, which measures the angle between vectors and is better suited to text than raw distance measurements. This is one parameter that can be tuned in a RAG process.&lt;/p&gt;
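&lt;p&gt;For intuition, cosine distance can be computed by hand from two vectors. This is a from-scratch sketch of the metric ChromaDB applies internally when the collection is configured with cosine space:&lt;/p&gt;

```python
import math

def cosine_distance(a, b):
    """1 minus cosine similarity: 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # 0.0 (identical direction)
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0 (opposite direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0 (orthogonal)
```

&lt;p&gt;Because only the angle matters, two chunks of very different lengths can still score as close neighbors if they are about the same thing.&lt;/p&gt;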

&lt;p&gt;Both the database and the embedding model were run locally on a CPU-powered laptop. The process ran at about five chunks per second, which was fine given the roughly 150 chunks produced by each of the two methods being tested.&lt;/p&gt;

&lt;h2&gt;
  Retrieval Quality
&lt;/h2&gt;

&lt;p&gt;Now that we have a vector database loaded with the contents of Rappaccini's Daughter - chunked both by fixed length and by paragraph boundary - we need to assess how well the retrieval step surfaces the right chunks. This process is hidden from the regular user but is essential to the performance of RAG. Retrieving the wrong chunk dooms the process to failure.&lt;/p&gt;

&lt;p&gt;Reviewing the list of failures in the baseline testing, I selected the following question to perform the test because it is a concrete detail that can be located in a specific paragraph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What does Professor Baglioni tell Giovanni about Rappaccini's reputation among scientists?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To do this test, I converted the question into a vector using the same nomic embedding model and script described above. Then I ran a query against the collection and asked it to return the top three results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;query_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;distances&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadatas&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For cosine distances, the metric ranges from 0 (identical) to 2 (opposite). For the paragraph chunking, the top three results were 0.3437, 0.3598, and 0.3630. The fixed-length chunking had similar scores of 0.3671, 0.3760, and 0.3834. &lt;/p&gt;
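&lt;p&gt;To make these numbers concrete: cosine distance is 1 minus the cosine similarity of the two embedding vectors. Here is a minimal sketch with toy vectors (real embeddings have hundreds of dimensions, but the metric behaves the same way):&lt;/p&gt;

```python
import math

def cosine_distance(a, b):
    """1 minus cosine similarity: 0 = same direction, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # 0.0 (identical)
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0 (opposite)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0 (orthogonal)
```

&lt;p&gt;So scores in the 0.34–0.38 range mean the chunks are reasonably, but far from perfectly, aligned with the query.&lt;/p&gt;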

&lt;p&gt;The top result for the paragraph boundary was Chunk 44:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Distance score: 0.3437 (lower = more similar)
Chunk index: 44 | Length: 253 chars

"Patience! patience!" "I tell thee, my poor Giovanni, that Rappaccini has a scientific interest in thee. Thou hast fallen into fearful hands! And the Signora Beatrice,—what part does she act in this mystery?" "Are there such idle rumors?" "Do people say that I am skilled in my father's science of plants? What
a jest is there! No; though I have grown up among these flowers, I know no more of them than their hues and perfume; and sometimes methinks I would fain rid myself of even that small knowledge. There are many flowers here, and those not
the least brilliant, that shock and offend me when they meet my eye. But pray, signor, do not believe these stories about my science. Believe nothing of me save what you see with your own eyes." "Aha!" "So now our friend Giovanni's secret is out. You have heard of this daughter, whom all the young men in Padua are
wild about, though not half a dozen have ever had the good hap to see her face. I know little of the Signora Beatrice save that Rappaccini is said to have instructed her deeply in his science, and that, young and beautiful as fame reports her, she is already qualified to fill a professor's chair. Perchance
her father destines her for mine! Other absurd rumors there be, not worth talking about or listening to. So now, Signor Giovanni, drink off your glass of lachryma." "And what ar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The top result for the fixed-length method was Chunk 27. I am including Chunk 28 because it contains the rest of the relevant content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--- Result 1 ---
Distance score: 0.3671
Chunk index: 27 | Length: 500 chars

hand, I should answer it but scantily to my conscience were I to permit a worthy youth like yourself, Signor Giovanni, the son of an ancient friend, to imbibe erroneous ideas respecting a man who might hereafter chance to hold your life and death in his hands. The truth is, our worshipful Dr. Rappaccini has as much science as any member of the faculty—with perhaps one single exception—in Padua, or all Italy; but there are certain grave objections to his professional character." "And what ar


--- Result 2 ---
Distance score: 0.376
Chunk index: 28 | Length: 499 chars

to his professional character." "And what are they?" "Has my friend Giovanni any disease of body or heart, that he is so inquisitive about physicians?" "But as for Rappaccini, it is said of him—and I, who know the man well, can answer for its
truth—that he cares infinitely more for science than for mankind. His patients are interesting to him only as subjects for some new experiment. He would sacrifice human life, his own amon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that the distance scores are comparable across the two methods. However, fixed-length Chunk 27 contained more of the correct content than paragraph Chunk 44, despite being smaller and having a worse distance score. Overlap helps the fixed-length method here: the next chunk picks up the remainder of the relevant context. Both methods surfaced the right passage among their top three results, though neither captured it perfectly in the top hit. Since both chunking methods retrieved better material than the LLM could recall on its own, I proceeded to the next step.&lt;/p&gt;
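&lt;p&gt;For reference, an overlapping fixed-length chunker of the kind used here can be sketched in a few lines. The 500-character size matches the output above; the 100-character overlap is an illustrative assumption, not necessarily the exact setting:&lt;/p&gt;

```python
def chunk_fixed(text, size=500, overlap=100):
    # Step forward by (size - overlap) so consecutive chunks share
    # `overlap` characters, keeping sentences that are cut at one
    # chunk boundary recoverable from the start of the next chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```

&lt;p&gt;The trade-off is redundancy: every character in the overlap region is embedded and stored twice.&lt;/p&gt;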

&lt;h2&gt;
  
  
  Generation
&lt;/h2&gt;

&lt;p&gt;Now it is time to bring this all together and run the RAG-enhanced LLM process. To do this, I used some simple Python scripts to control the evaluation process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context_chunks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Build the context string from retrieved chunks
&lt;/span&gt;    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;---&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# This is the RAG prompt - it instructs phi4 to use
&lt;/span&gt;    &lt;span class="c1"&gt;# only the provided context to answer the question
&lt;/span&gt;    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are answering questions about Rappaccini&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Daughter &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;by Nathaniel Hawthorne. Use only the context provided &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;below to answer the question. Be specific and concise. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If the answer is not contained in the context, say so.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONTEXT:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;QUESTION:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Send to phi4 via Ollama generate endpoint
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phi4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are some highlights from this testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Positional Questions are Challenging
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Recite the first 4 sentences of Rappaccinis Daughter by Nathaniel Hawthorne.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I had expected this to be a simple question, since the chunks are stored in sequential order and numbered. That assumption proved false.&lt;/p&gt;

&lt;p&gt;The boundary chunking method produced this response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Distance scores: [0.4081, 0.4126, 0.4138]
Generating answer with phi4...
Answer:
I'm sorry, but I can't provide verbatim excerpts from "Rappaccini's Daughter" as it is not included in my training data. However, I can tell you that "Rappaccini's Daughter" begins with Giovanni Guasconti arriving at the University of Padua and renting a room in an old building near the garden belonging to Professor Baglioni. If you're looking for the exact text, I recommend checking a copy of the story itself or a reliable literary source.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fixed chunking method produced a very similar response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Distance scores: [0.4121, 0.4174, 0.4219]
Generating answer with phi4...
Answer:
I'm sorry, but I can't provide verbatim text from "Rappaccini's Daughter" as it is not included in my training data. However, you might consider looking up the opening lines directly from a copy of the story to see them for yourself. If there are specific themes or questions about the beginning of the story that you would like to discuss, feel free to ask!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even when supplied with RAG chunks, the LLM refused to use them, insisting the text was not in its training data. There are two plausible causes for this behavior. First, it could simply be a &lt;strong&gt;retrieval failure&lt;/strong&gt;: given the mediocre distance scores, the assembled prompt itself may have misled the model. Second, it could be intentional &lt;strong&gt;diversion training&lt;/strong&gt; built into the phi4 model over literary copyright concerns. &lt;/p&gt;

&lt;p&gt;To evaluate these options, I examined the top chunks from each method; neither set included the correct one, Chunk 0. When I forced the retrieval of Chunk 0 for each method, I received the same, correct, response, although the fixed-length version was cut off because the requested passage runs longer than 500 characters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A young man, named Giovanni Guasconti, came, very long ago, from the more
southern region of Italy, to pursue his studies at the University of Padua.
Giovanni, who had but a scanty supply of gold ducats in his pocket, took
lodgings in a high and gloomy chamber of an old edifice which looked not...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Regardless of the cause, this result should caution RAG developers that responses may still deviate from expected behavior. RAG is designed for semantic search, not positionally specific content, and that mismatch alone can trigger hallucinations. Note also that diversion training baked into a model can significantly influence how it responds, even when the relevant context is supplied.&lt;/p&gt;
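&lt;p&gt;One possible mitigation is to route obviously positional queries around the vector search entirely and return chunks by their stored index instead. A rough sketch of that idea (the keyword list and the in-memory chunk list are hypothetical simplifications, not part of my scripts):&lt;/p&gt;

```python
POSITIONAL_HINTS = ("first", "opening", "beginning", "last sentence", "recite")

def route_query(query, chunks, retrieve_fn, n_results=3):
    # chunks: ordered list of chunk texts, index 0 = start of document.
    # If the query looks positional, return the document's opening
    # chunks directly; otherwise fall back to semantic retrieval.
    lowered = query.lower()
    if any(hint in lowered for hint in POSITIONAL_HINTS):
        return chunks[:n_results]
    return retrieve_fn(query, n_results)
```

&lt;p&gt;A keyword heuristic like this is crude, but it illustrates the general fix: positional requests need direct lookup, not similarity search.&lt;/p&gt;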

&lt;h2&gt;
  
  
  Distance Scores Can Mislead
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Question: What does Professor Baglioni tell Giovanni about Rappaccinis reputation among scientists?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The baseline answer was wrong:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Rappaccini is considered a dangerous scientist who has crossed
ethical boundaries by turning humans into plants.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The paragraph answer was not much better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Retrieved chunks: 3
Distance scores: [0.3568, 0.3571, 0.3709]

Generating answer with phi4...

Answer:
The context provided does not contain specific information regarding Professor Baglioni's comments on Rappaccini's reputation among scientists. It mentions that Rappaccini has a scientific interest in Giovanni and instructs Beatrice deeply in his science, but no explicit details about Rappaccini's reputation are given.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fixed-chunks answer was far better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Retrieved chunks: 3
Distance scores: [0.3679, 0.3777, 0.3839]

Generating answer with phi4...

Answer:
Professor Baglioni tells Giovanni that Dr. Rappaccini is highly regarded for his scientific knowledge, possibly equal to any member of the faculty in Padua or all Italy with one exception. However, there are significant objections to Rappaccini's professional character; he cares infinitely more for science than for mankind and views patients as subjects for experiments rather than individuals deserving care.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that even though the distance scores of the fixed-length chunks were worse than those of the paragraph boundary method, the fixed-length method produced a much better result. The paragraph chunk merged several scenes with multiple speakers, which confused the LLM. The fixed-length method retrieved two consecutive chunks that together provided the full context for the question. It could still have been better: the third chunk cut off a sentence in which Baglioni says Dr. Rappaccini would sacrifice human life, including his own, for the cause of science.&lt;/p&gt;
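&lt;p&gt;Since the fixed-length method succeeded here by surfacing two consecutive chunks, one possible refinement is to stitch together retrieved chunks that are adjacent in the source, dropping the repeated overlap so the model sees one continuous passage. A sketch, assuming the chunk index is stored in metadata and the overlap size is known:&lt;/p&gt;

```python
def merge_adjacent(results, overlap=100):
    # results: list of (chunk_index, text) pairs from retrieval.
    # Consecutive indices are stitched together, dropping the repeated
    # overlap region, so the model sees one continuous passage.
    merged = []  # each entry: [start_index, end_index, text]
    for idx, text in sorted(results):
        if merged and idx == merged[-1][1] + 1:
            merged[-1][1] = idx
            merged[-1][2] += text[overlap:]
        else:
            merged.append([idx, idx, text])
    return [m[2] for m in merged]
```

&lt;p&gt;Non-adjacent chunks are left as separate passages, which is the right behavior when retrieval pulls from different scenes.&lt;/p&gt;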

&lt;h2&gt;
  
  
  Regression Testing
&lt;/h2&gt;

&lt;p&gt;I performed regression testing against the baseline for each method and found that most responses showed no degradation in performance. The paragraph chunking method generally had better distance scores, as it tended to preserve the full context in a single chunk compared with fixed-length chunking.&lt;/p&gt;

&lt;p&gt;This is illustrated with the following question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Question: Does Professor Baglioni warn Giovanni out of genuine concern or professional jealousy toward Rappaccini?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The baseline answer, without RAG enhancement, responded correctly with this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Baglioni acts from both genuine concern AND professional jealousy.
The text deliberately leaves his motives ambiguous.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The paragraph boundary method also provided the correct answer, with even more context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;istance scores: [0.3623, 0.3659, 0.3702]

Generating answer with phi4...

Answer:
The context suggests that Professor Baglioni's warning to Giovanni may be driven by both concern and professional jealousy. The text describes him as "imperturbable" and notes his mention of Rappaccini's "scientific interest" in Giovanni, indicating a potential rivalry or mistrust toward Rappaccini. Additionally, the professor expresses fear that Giovanni has "fallen into fearful hands," which implies genuine concern for Giovanni's well-being. Therefore, Baglioni's warning likely stems from a combination of both motivations.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fixed-length, however, stumbled on this one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Distance scores: [0.3819, 0.3833, 0.3945]

Generating answer with phi4...

Answer:
The context provided does not contain information regarding Professor Baglioni's motivations for warning Giovanni. It mentions his reaction to Beatrice and Giovanni but does not specify whether his concerns are due to genuine care or professional jealousy towards Rappaccini. Therefore, the answer cannot be determined from the given context.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This result is to be expected: paragraph boundary chunks break at natural points rather than at arbitrary character counts. Because this prompt required synthesizing text from multiple places in the story, it was easier for RAG to identify and prioritize the relevant passages when each chunk held a complete thought.&lt;/p&gt;
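&lt;p&gt;For comparison with the fixed-length sketch earlier, paragraph-boundary chunking can be sketched as splitting on blank lines and packing whole paragraphs into a chunk up to a size budget (the budget value here is illustrative):&lt;/p&gt;

```python
def chunk_paragraphs(text, max_chars=1000):
    # Split on blank lines, then greedily pack whole paragraphs into
    # a chunk until adding another would exceed the budget.
    # Paragraphs are never cut mid-thought, which is this method's
    # main advantage over fixed-length slicing.
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paras:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = (current + "\n\n" + p) if current else p
    if current:
        chunks.append(current)
    return chunks
```

&lt;p&gt;Note the failure mode this implies: a single paragraph longer than the budget becomes one oversized chunk, which is exactly why long-paragraph authors are a poor fit for this method.&lt;/p&gt;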

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;So what can we conclude from this study? RAG can certainly improve LLM responses, but it is not infallible, and the techniques used can materially affect results. I had thought the paragraph boundary method would be superior across the board, but I was mistaken. &lt;/p&gt;

&lt;p&gt;It should be noted that the paragraph boundary method is highly dependent on the writing style of the source text. It is fortunate in this particular case that Nathaniel Hawthorne largely kept each narrative point contained within a single paragraph. Other authors, such as William Faulkner, Virginia Woolf, and Leo Tolstoy, are known to let a single paragraph flow across several pages. It is therefore important to know what documents you are working with and to ensure your starting point has an uncluttered format. RAG systems often fail not because of the process or the tools, but because of data cleanliness. The old adage of "garbage in, garbage out" certainly applies here. The more a document contains only relevant data, with clearly defined headings, the better RAG will perform.&lt;/p&gt;

&lt;p&gt;It is tempting to treat distance scores as a quantifiable metric of quality. However, this study showed that is not the case. Distance scores do not measure the correctness of an answer; they only indicate how topically related a chunk is to the prompt.&lt;/p&gt;

&lt;p&gt;For factual queries, fixed-length chunks consistently provided the better answers, because such prompts ask for concentrated details that can be located within a single chunk. When a prompt required reasoning across a broader span of the text, however, paragraph chunking did better at finding and prioritizing the content in fewer chunks. Non-semantic prompts, such as asking the model to recite the first few sentences of a text, were a struggle for both methods and highlight a structural weakness of RAG. Lastly, for simple factual questions, a sufficiently trained LLM should be able to answer without RAG assistance. In the case of Rappaccini's Daughter, however, 14 billion parameters were not enough.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query Type&lt;/th&gt;
&lt;th&gt;Better Strategy&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Factual — concentrated detail&lt;/td&gt;
&lt;td&gt;Fixed chunks&lt;/td&gt;
&lt;td&gt;Isolates specific content cleanly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning — across narrative&lt;/td&gt;
&lt;td&gt;Paragraph chunks&lt;/td&gt;
&lt;td&gt;Preserves complete thoughts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Positional — first sentences&lt;/td&gt;
&lt;td&gt;Neither&lt;/td&gt;
&lt;td&gt;Requires direct retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Simple known fact&lt;/td&gt;
&lt;td&gt;Both equal&lt;/td&gt;
&lt;td&gt;RAG adds no value&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here are some questions to ask when you are looking at RAG solutions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What chunking strategy do you use and why?&lt;/li&gt;
&lt;li&gt;How do you handle positional queries?&lt;/li&gt;
&lt;li&gt;How do you measure retrieval quality separately from answer quality?&lt;/li&gt;
&lt;li&gt;What is your average chunk size and how was it determined?&lt;/li&gt;
&lt;li&gt;How does your system perform on queries requiring reasoning across multiple passages?&lt;/li&gt;
&lt;li&gt;Can you show me examples of retrieval failures and how the system handles them?&lt;/li&gt;
&lt;/ol&gt;
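
&lt;p&gt;For question 3 in particular, a simple recall@k check over a handful of hand-labeled questions measures retrieval on its own, before generation ever runs. A sketch (the labeled pairs and retrieval function are placeholders you would supply yourself):&lt;/p&gt;

```python
def recall_at_k(labeled_queries, retrieve_fn, k=3):
    # labeled_queries: list of (question, gold_chunk_index) pairs.
    # retrieve_fn(question, k) returns the indices of the top-k chunks.
    # This scores retrieval alone: did the right chunk surface at all?
    hits = 0
    for question, gold_idx in labeled_queries:
        if gold_idx in retrieve_fn(question, k):
            hits += 1
    return hits / len(labeled_queries)
```

&lt;p&gt;Separating this number from answer quality tells you whether a bad response came from retrieval or from generation.&lt;/p&gt;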

</description>
      <category>python</category>
      <category>ai</category>
      <category>beginners</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Hello, World</title>
      <dc:creator>Robert Boys</dc:creator>
      <pubDate>Tue, 05 May 2026 22:33:21 +0000</pubDate>
      <link>https://dev.to/robnotbob/hello-world-16ah</link>
      <guid>https://dev.to/robnotbob/hello-world-16ah</guid>
      <description>&lt;p&gt;I figured there is no better way to establish a presence in a new environment than to say, Hello World! &lt;/p&gt;

&lt;h2&gt;
  
  
  My Background
&lt;/h2&gt;

&lt;p&gt;My journey in the world of computer science started at a young age. As a young child, I remember writing out BASIC programs from a book with my dad on a Timex Sinclair 1000 machine. It was a momentous day when my father got the 16k expansion memory box that plugged into the back of the small black box that was little more than a keyboard.&lt;/p&gt;

&lt;p&gt;I drifted away from coding in the following years - relying more on programs and games you would install from a cassette tape or floppy disk. In high school I gravitated to science and engineering and learned Pascal. However, in college I moved into the physical sciences and earned a degree in chemical engineering. &lt;/p&gt;

&lt;p&gt;After several years working in the environmental consulting and remediation field, I returned to school to earn an MBA in finance and accounting. It was there that I started moving back into the realm of computer science and coding. One of my professors was passionate about databases and believed they were a vital skill for business literacy. His belief proved true for me: in my new career in corporate finance, building and running databases became a key driver of efficiency and quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I Am Now
&lt;/h2&gt;

&lt;p&gt;Now, after 20 years in corporate finance and accounting, I am finding myself formally in an IT function as a product owner. I am entering my third career and I could not be more excited. With my engineering background, I have always been curious to learn how things work. In this era of AI, the opportunity to learn has never been greater. I firmly believe that to be a superior product owner and lead development vision, you need to roll up your sleeves and get your hands dirty with the toolsets. &lt;/p&gt;

&lt;h2&gt;
  
  
  What You Will See Here
&lt;/h2&gt;

&lt;p&gt;My goal of posting on dev.to is to engage with the professionals who are driving innovation and to share what I have learned for others who may be on a similar path. I imagine there will be errors and simple concepts that I spend too much time explaining. However, I have found that there truly are no stupid questions. Sometimes we need to return to the basics and test ourselves against them. We should always be honestly fearless with ourselves and hold ourselves accountable to what we truly understand. That is authentic growth. &lt;/p&gt;

&lt;p&gt;I will also make a commitment to you. While I will use AI to learn and to (likely extensively) help write code, the text prose you read will always be my own. I firmly believe in the writing process as a critical step in learning. Outsourcing human cognition is one of the great dangers of AI.&lt;/p&gt;

&lt;p&gt;Let's enjoy the journey.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>career</category>
      <category>learning</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
