<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Eti Ijeoma </title>
    <description>The latest articles on DEV Community by Eti Ijeoma  (@aijeyomah).</description>
    <link>https://dev.to/aijeyomah</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1069843%2F6d883091-4801-4303-b463-1e4f7849a260.jpg</url>
      <title>DEV Community: Eti Ijeoma </title>
      <link>https://dev.to/aijeyomah</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aijeyomah"/>
    <language>en</language>
    <item>
      <title>Docker Volumes vs. Bind Mounts: Choosing the Right Storage for Your Containers.</title>
      <dc:creator>Eti Ijeoma </dc:creator>
      <pubDate>Tue, 07 Jan 2025 21:10:50 +0000</pubDate>
      <link>https://dev.to/aijeyomah/docker-volumes-vs-bind-mounts-choosing-the-right-storage-for-your-containers-3pb8</link>
      <guid>https://dev.to/aijeyomah/docker-volumes-vs-bind-mounts-choosing-the-right-storage-for-your-containers-3pb8</guid>
      <description>&lt;p&gt;Data persistence in application deployment using containerization is often a critical challenge that can make or break your application’s performance and reliability. Containers are ephemeral, meaning they spin up, execute tasks, and can be destroyed in moments.&lt;/p&gt;

&lt;p&gt;When a containerized application is removed or destroyed, all changes made to the container itself, including the data stored in it, are lost. As a result, any files or data stored in the container's file system are erased when the containers are removed, and a new container will be created without the previous changes.&lt;/p&gt;

&lt;p&gt;This behavior can pose a threat when working with containerized applications that rely on persisting data such as logs, databases, or sensitive configuration files. Losing such data every time a container is recreated can disrupt your software development workflow, and affect the overall functionality of the application. &lt;/p&gt;

&lt;p&gt;To avoid losing your application data, Docker provides two key features, Docker volumes and bind mounts, as solutions to persist data within your Docker containers. &lt;/p&gt;

&lt;p&gt;In this tutorial, we will take an extensive look at Docker volumes and bind mounts, covering their unique features, a comparative analysis, and use case recommendations for both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker Volumes
&lt;/h3&gt;

&lt;p&gt;Docker volumes are data stores used for persistent data storage for your containerized applications. In the Docker environment, you can create a volume using the docker volume create command or use the default volumes that are created whenever a container is created.&lt;/p&gt;

&lt;p&gt;Docker volumes are decoupled from the host’s specific file system structure: Docker stores them in a dedicated directory within its own storage area, typically located at &lt;code&gt;/var/lib/docker/volumes/&lt;/code&gt; on Linux and Unix-based systems. This helps to eliminate the complexity of managing storage locations.&lt;/p&gt;

&lt;p&gt;The following commands can be used to create, list, inspect, and remove volumes using the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ docker volume create my_volume   # Create a new volume
$ docker volume ls                 # List all volumes
$ docker volume inspect my_volume  # Inspect a specific volume
$ docker volume rm my_volume       # Remove a volume
$ docker run -v &amp;lt;my_volume&amp;gt;:&amp;lt;container_path&amp;gt; &amp;lt;image_name&amp;gt;:&amp;lt;tag&amp;gt;  # Run a container with the specified volume mounted to the container path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Consider a scenario with a PostgreSQL database. With a Docker volume, you can ensure that the database files persist even if the container is deleted by running the following command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -v postgres_data:/var/lib/postgresql/data postgres: latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, the PostgreSQL data stored in &lt;code&gt;/var/lib/postgresql/data&lt;/code&gt; is persisted in the &lt;code&gt;postgres_data&lt;/code&gt; volume, which will survive container restarts and removals.&lt;/p&gt;

&lt;p&gt;Docker volumes provide a separation layer between the container environment and the storage. The containers can access these volumes using mount points, while Docker manages the overall storage infrastructure.&lt;/p&gt;

&lt;p&gt;Volumes work with Linux, Unix, and Windows Docker environments. Different volume drivers store data in different backing services: local storage is the default, but drivers for alternatives such as NFS volumes and CIFS shares are also available.&lt;/p&gt;

&lt;p&gt;Docker’s API also allows volumes to be created dynamically, which is especially useful within a continuous integration and deployment pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features of Docker Volumes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Independence&lt;/strong&gt;&lt;br&gt;
Docker volumes offer isolation and independence from host directory structures. They can also be moved from one environment to another with minimal access to the host system, enhancing security.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Easy Backup and Migration capabilities&lt;/strong&gt;&lt;br&gt;
Since Docker volumes are portable, they offer a good mechanism for backup and migration within your infrastructure. Volumes can easily be copied using Docker CLI commands, and there is also support for cloud-based backup strategies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Universal Storage compatibility&lt;/strong&gt;&lt;br&gt;
Docker volumes are platform-agnostic, which means that they provide uniform functionality across different operating systems, cloud platforms, and several other containerized environments.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
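&lt;p&gt;To illustrate the backup capability described above, the following command sketch (assuming a volume named &lt;code&gt;my_volume&lt;/code&gt;; the names are placeholders) mounts the volume read-only into a temporary container and archives its contents to the current host directory:&lt;/p&gt;

```shell
# Back up the contents of a named volume to a tar archive on the host.
# my_volume is a placeholder; the temporary alpine container is removed afterwards (--rm).
docker run --rm \
  -v my_volume:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/my_volume_backup.tar.gz -C /data .
```

&lt;p&gt;Restoring is the reverse: mount an empty volume and extract the archive into it.&lt;/p&gt;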

&lt;h2&gt;
  
  
  When to use Docker Volumes
&lt;/h2&gt;

&lt;p&gt;Docker volumes are specifically designed to support stateful applications, making them ideal for various use cases, including the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Database storage&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When working with databases such as MySQL, PostgreSQL, or MongoDB, you should mount a volume to the storage directories used by the database to ensure that your data persists after the container restarts.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Application Data Storage&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data that is generated by your application should be stored in a volume for persistent storage. Data includes documents, photos, and file uploads.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cached Data Storage.&lt;/strong&gt;
Use a volume to persist any cached data generated within your application that would take time to rebuild if the container restarts. &lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Bind Mounts
&lt;/h3&gt;

&lt;p&gt;Bind Mounts are one of the most direct and fundamental methods of persistent storage within a containerized environment. A bind mount creates a bi-directional and real-time mapping that mirrors the host filesystem’s exact state within your container environment. &lt;/p&gt;

&lt;p&gt;Unlike Docker volumes, bind mounts allow you to mount a specific file or directory from the host's file system into the container. This connection creates a direct link between the host and container file systems, with the same path structure and access to the host files.&lt;/p&gt;

&lt;p&gt;When you create a bind mount, Docker creates a reference point between the host directory and the container directory. Any change made in either the host or container directory reflects on the corresponding location.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Basic bind mount syntax
docker run -v /host/path:/container/path image_name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, &lt;code&gt;/host/path&lt;/code&gt; is mounted to &lt;code&gt;/container/path&lt;/code&gt; to create a connection between the host and the container filesystem.&lt;/p&gt;

&lt;p&gt;Within the Docker ecosystem, bind mounts existed before Docker volumes and are built on Linux mount features. When Docker was created, bind mounts were the initial method for persistent storage.&lt;/p&gt;

&lt;p&gt;Consider a scenario in which you need to mount a web development environment to a development container. The following Docker script can achieve this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -v /home/developer/project:/app \
-w /app \
node:latest \
npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The command above mounts the local project directory into the container’s &lt;code&gt;/app&lt;/code&gt; directory, allowing quick, real-time code changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features of Bind Mounts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direct access to the host machine’s filesystem&lt;/strong&gt;
Bind mounts create a direct connection to the host machine's filesystem, providing minimal latency and instant synchronization between the host and the container environment. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control over the mounted directory&lt;/strong&gt;
Developers gain fine-grained control over which files and directories are exposed to containers.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Mount a specific subdirectory
docker run -v /home/user/project/src:/app/src image_name
# Mount individual files
docker run -v /home/user/config.json:/app/config.json image_name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The setup above selects the exact files to be exposed to the container environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Cases of Bind Mounts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Local development environments&lt;/strong&gt;&lt;br&gt;
Bind mounts work well in local development workflows because they provide instant code synchronization, which helps maintain consistency in the development environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sharing of configuration files&lt;/strong&gt;&lt;br&gt;
Here, you can mount host configuration files into a container to configure web servers, databases, and application systems.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -v /host/nginx.conf:/etc/nginx/nginx.conf nginx: latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Database Data Storage&lt;/strong&gt;
Bind mounts help map database storage directories on the host to the container for data persistence. Consider the scenario for a PostgreSQL database.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -v /host/pgdata:/var/lib/postgresql/data postgres: latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Comparative Analysis Between Docker Volumes and Bind Mounts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Performance Comparison
&lt;/h3&gt;

&lt;p&gt;Docker volumes and bind mounts differ in the performance characteristics they offer in a containerized environment.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Docker Volumes&lt;/strong&gt;: Volumes are designed for optimized storage, offering faster I/O operations and caching across host systems. Docker volumes manage storage allocation efficiently, but they often have a higher memory overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bind Mounts&lt;/strong&gt;: Bind mounts provide direct access to the host's file system. &lt;br&gt;
However, their performance depends on the host filesystem. Since they map directly to that filesystem, they use minimal memory. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The performance of Docker volumes and Bind Mount can also depend on the type of storage (HDD vs. SSD), the host filesystem (NTFS, ext4), and container runtime settings.&lt;br&gt;
In terms of performance, Docker volumes provide more consistent and optimized performance with good storage management features. They are ideal for production environments and in the management of large-scale stateful applications like databases.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Security Considerations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Docker Volumes&lt;/strong&gt;: Volumes provide an environment isolated from the host filesystem, reducing potential security risks. Controlled access mechanisms also ensure limited exposure to the host system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bind Mounts&lt;/strong&gt;: Bind mounts expose the host filesystem directly to the container and inherit the host system’s permissions. This may lead to unauthorized access, with unintended file modifications. To avoid this, use read-only mounts and implement strict access controls for the host directories.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
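&lt;p&gt;The read-only mounts mentioned above are created by appending &lt;code&gt;:ro&lt;/code&gt; to the mount specification; a sketch with placeholder paths and image name:&lt;/p&gt;

```shell
# The container can read /app/config but cannot modify the host files.
# /host/config and image_name are placeholders.
docker run -v /host/config:/app/config:ro image_name
```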

&lt;h3&gt;
  
  
  3. Portability and Compatibility
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Docker Volumes&lt;/strong&gt;: These are portable and consistent across different platforms, making storage management, backup, and migration strategies easier. They also work well with other Docker tools, such as Docker Desktop. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bind Mounts&lt;/strong&gt;: These are platform-dependent, and their behavior is tied to the host operating system. Configuring them may require extensive path mapping, especially when the source and target systems run different operating systems. File system differences may also surface during configuration, so extra attention is required.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Use Case Recommendations for Docker Volumes vs Bind Mounts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;When to use Docker Volumes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Docker volumes are the best choice for production environments where reliability and scalability are critical. They also work well in microservice architectures, where different containers need shared access to data. When working with persistent database storage, Docker volumes keep files safe after a container restarts or is recreated. This makes them ideal for setting up database infrastructure such as PostgreSQL or MongoDB.&lt;/p&gt;

&lt;p&gt;Due to their platform-agnostic nature, Docker volumes are ideal for cross-platform deployments. This feature makes it easier to deploy containerized applications across diverse operating systems, both on-premise and in the cloud.&lt;/p&gt;

&lt;p&gt;In addition, Docker volumes are ideal in a CI/CD pipeline, where sensitive files and data, such as logs, artifacts, build files and dependencies, need to be stored and reused. This ensures proper data protection and isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use Bind Mounts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bind mounts are the best option for development and testing scenarios. They offer flexibility and real-time synchronization of files from the host to the container. Bind mounts also help developers edit code and make changes in the container. Thus, they are ideal for web development scenarios where comprehensive testing is important in the application's development stage.&lt;/p&gt;

&lt;p&gt;Bind mounts have low overhead requirements, making them a lightweight option for applications that do not require persistent storage or cross-platform compatibility. They are best for short-lived containers that will not scale to multiple environments.&lt;/p&gt;
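&lt;p&gt;The recommendations above can be combined in one deployment description. The following &lt;code&gt;docker-compose.yml&lt;/code&gt; fragment is a hypothetical sketch (service names, build context, and paths are placeholders) that uses a named volume for the database and a bind mount for live code editing:&lt;/p&gt;

```yaml
services:
  db:
    image: postgres:latest
    volumes:
      - postgres_data:/var/lib/postgresql/data   # named volume: survives container recreation
  web:
    build: .
    volumes:
      - ./src:/app/src   # bind mount: real-time code synchronization during development
volumes:
  postgres_data:   # declares the named volume managed by Docker
```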

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Docker Volumes and bind mounts are two approaches to container storage in a containerized environment. They each have unique strengths and trade-offs. In this article, we have shown how Docker volumes provide platform-agnostic storage with great security features across different environments, while bind mounts offer access to the host filesystem with minimal overhead.&lt;/p&gt;

&lt;p&gt;This article provides the information you need to adopt the best practices for a successful application deployment. A proper understanding of Docker volumes and bind mounts is essential for choosing the right persistent storage strategy for your application. &lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
    </item>
    <item>
      <title>Building RAG-Powered Applications with LangChain, Pinecone, and OpenAI</title>
      <dc:creator>Eti Ijeoma </dc:creator>
      <pubDate>Fri, 13 Dec 2024 02:41:26 +0000</pubDate>
      <link>https://dev.to/aijeyomah/building-rag-powered-applications-with-langchain-pinecone-and-openai-2501</link>
      <guid>https://dev.to/aijeyomah/building-rag-powered-applications-with-langchain-pinecone-and-openai-2501</guid>
      <description>&lt;p&gt;The rise of large language models (LLMs) that can understand and generate human-like text has really transformed the area of artificial intelligence, allowing machines to understand and generate text that feels very human-like. While there are numbers of LLMs are available, In this article, we will focus on the generative pre-trained transformer (GPT), one of OpenAI's most advanced models.&lt;/p&gt;

&lt;p&gt;GPT models were initially trained on large datasets, most of which were gathered from the Internet. This greatly improved the models' reasoning ability. As a result, they perform well on different natural language processing (NLP) tasks, including question-answering, summarizing, and generating human-like text.&lt;/p&gt;

&lt;p&gt;However, LLMs are limited to the knowledge captured in their training data and can hallucinate when asked about unfamiliar or recent information. In this article, we will explore how the retrieval-augmented generation technique can mitigate these limitations and improve the performance of language models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retrieval Augmentation Generation
&lt;/h2&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is a technique for generating a high-quality, context-aware response by combining the initial prompt with information retrieved from external knowledge sources and passing the augmented prompt to the LLM. These knowledge sources may include collections of web pages, documents, or other textual resources that improve the LLM's understanding of information. Giving language models access to external data after initial training reduces the need to retrain them. In this article, we will leverage the capabilities of OpenAI alongside Langchain and Pinecone to create a context-aware chatbot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Langchain
&lt;/h2&gt;

&lt;p&gt;Langchain is a tool that is great for creating applications that use large language models. It is available as a Python or JavaScript package, allowing software developers to create applications based on pre-existing AI models. Langchain can connect LLMs to data sources, allowing them to interact with their environment. Check out the Langchain documentation for a guide on how to get started.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pinecone
&lt;/h2&gt;

&lt;p&gt;Pinecone is a cloud-based vector database specializing in efficiently storing, indexing, and querying high-dimensional vectors. It is designed for effective similarity searches, enabling you to find vectors that are most similar based on metrics like &lt;a href="https://en.wikipedia.org/wiki/Euclidean_vector" rel="noopener noreferrer"&gt;Euclidean distance&lt;/a&gt; or &lt;a href="https://en.wikipedia.org/wiki/Cosine_similarity" rel="noopener noreferrer"&gt;cosine similarity&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To improve information retrieval for our AI model, we convert our knowledge documents into a special form called a "word embedding" or "word vector" by using an embedding model and storing the vectors in a vector database. This makes searching for relevant information faster and more accurate, especially when dealing with unstructured or semi-structured text data.&lt;/p&gt;
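&lt;p&gt;To make the similarity idea concrete, here is a minimal, self-contained sketch of cosine similarity over toy three-dimensional "embeddings" (real embedding models produce hundreds or thousands of dimensions; the vectors below are invented for illustration):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors: semantically close words should get close vectors.
dog = [0.9, 0.1, 0.3]
puppy = [0.8, 0.2, 0.3]
car = [0.1, 0.9, 0.5]

print(cosine_similarity(dog, puppy))  # close to 1.0
print(cosine_similarity(dog, car))    # much smaller
```

&lt;p&gt;This is the metric Pinecone applies, at scale and with indexing, when it ranks stored vectors against a query vector.&lt;/p&gt;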

&lt;h2&gt;
  
  
  Bringing it All Together
&lt;/h2&gt;

&lt;p&gt;With our understanding of retrieval augmentation generation, we will leverage the power of OpenAI LLM, Langchain, and Pinecone to create a question-answering application.&lt;/p&gt;

&lt;p&gt;The implementation will be as follows: we will provide knowledge base documents, embed them, and store them in Pinecone. When a query is provided, it is converted into word embeddings using the same embedding model as the knowledge base text. The embedded query is then used to query Pinecone to find the most similar and relevant vectors. These similar vectors are translated back into the original text and used to help the LLM generate context-based responses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Steps Involved in Retrieval Augmented Generation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Data preparation and Ingestion&lt;/strong&gt;: After we gather our knowledge base data from different sources, it is important to ingest (transform) the data into a standard structure that the language model can easily process. Langchain provides document loader tools that are responsible for loading text from different sources (text files, CSV files, YouTube transcripts, etc.) to create a Langchain document. A Langchain document has two fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;page_content&lt;/strong&gt;: Contains the text of the file&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;metadata&lt;/strong&gt;: It stores additional relevant information about the text such as the text URL.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples of document loaders are text loaders, which can open a text file, and transform loaders, which can open any of a list of specific formats (HTML, CSV, etc.) and load the contents into a document.&lt;/p&gt;
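&lt;p&gt;For intuition, the two-field document structure can be sketched as a plain Python dataclass (a stand-in for illustration only, not the actual Langchain class; the content and metadata values are invented):&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Minimal stand-in for a Langchain document: the text plus its metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

doc = Document(
    page_content="Scotland beat England 2-1.",
    metadata={"source": "results.csv", "row": 42},  # e.g. where the text came from
)
print(doc.metadata["source"])  # results.csv
```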

&lt;p&gt;&lt;strong&gt;Chunking Text&lt;/strong&gt;: This entails breaking down a document into smaller, more meaningful fragments, often shaped like sentences. This procedure is critical, especially when dealing with lengthy text inside the context window constraints of GPT-4, which presently supports a maximum of 8,192 tokens. The idea is to construct manageable fragments that are semantically coherent.&lt;/p&gt;

&lt;p&gt;To accomplish this, we use a length function, such as tiktoken, to calculate the sizes of these smaller fragments. The goal is to avoid abruptly splitting related information during chunking. In addition, we include an overlap that allows adjacent chunks to share content. This overlap contributes to continuity by repeating common words or phrases at the end of one chunk and the start of the next.&lt;/p&gt;
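&lt;p&gt;The overlap idea can be illustrated with a simplified, self-contained chunker that counts words instead of tokens (later in this article we use RecursiveCharacterTextSplitter with tiktoken; this sketch only shows how neighbouring chunks share content):&lt;/p&gt;

```python
def chunk_words(text, chunk_size=8, overlap=3):
    """Split text into word chunks; consecutive chunks share `overlap` words."""
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

text = " ".join(f"w{i}" for i in range(20))
chunks = chunk_words(text)
print(len(chunks))             # 4
print(chunks[0].split()[-3:])  # the same words that start the next chunk
```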

&lt;p&gt;&lt;strong&gt;Embedding&lt;/strong&gt;: This process transforms complex data e.g. text, images, and audio into high-dimensional numeric vectors. Numerical storage allows for effective storage and processing. Also, embedding methods such as Word embeddings (Word2Vec) capture the semantic relationship between words and concepts. This semantic information is essential during the retrieval augmentation where understanding the context and relationships between terms is crucial for generating relevant content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embedding tools&lt;/strong&gt;: Langchain integrates with several models for generating word or sentence embeddings. In this article, we will make use of the OpenAIEmbeddings to create embeddings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Storage&lt;/strong&gt;: The embedded documents are then stored in Pinecone’s vector database. Pinecone has some indexing techniques that organize and optimize the search to facilitate efficient retrieval of vectors similar to a query vector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retrieval&lt;/strong&gt;: The received query is embedded and then used to search Pinecone’s vector database for the most relevant documents. For example, in a similarity search, the term "dog" may be represented numerically as [0.617]. Whenever a word is searched, it is also converted into a vector. A good model ensures that words with similar contexts, such as "puppy," yield closely aligned number series, like [0.691], reflecting the shared context between the words.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generation&lt;/strong&gt;: The language model produces an accurate response for the query by utilizing the retrieved document as an additional context.&lt;/p&gt;
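&lt;p&gt;Putting the retrieval and generation steps together, here is a toy end-to-end sketch with stubbed components: word-overlap ranking stands in for the vector search, and a prompt template stands in for the LLM call (all function names and documents are invented for illustration):&lt;/p&gt;

```python
def retrieve(query, knowledge_base, top_k=2):
    """Toy retrieval: rank documents by word overlap with the query."""
    query_words = set(query.lower().split())
    def overlap(doc):
        return len(query_words.intersection(doc.lower().split()))
    return sorted(knowledge_base, key=overlap, reverse=True)[:top_k]

def build_augmented_prompt(query, docs):
    """Combine the retrieved context with the user's question (RAG augmentation)."""
    context = "\n".join(docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

kb = [
    "Scotland scored 419 away goals in the last century.",
    "The first international match was played in 1872.",
    "Brazil has won five World Cups.",
]
query = "How many away goals did Scotland score?"
prompt = build_augmented_prompt(query, retrieve(query, kb))
print(prompt.splitlines()[1])  # the most relevant document comes first
```

&lt;p&gt;In the real pipeline below, Pinecone's similarity search replaces the toy ranking and an OpenAI model consumes the augmented prompt.&lt;/p&gt;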

&lt;p&gt;&lt;strong&gt;Implementing the question-and-answering chatbot&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this section, we’ll walk through a practical example of building an international football match question-answering bot. The context-aware documents come from this &lt;a href="https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017" rel="noopener noreferrer"&gt;Kaggle data set&lt;/a&gt;, which you can download.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setting up the environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To begin, we will create a .env file in which we will keep the Pinecone and OpenAI secret keys. To generate or find the secret keys for &lt;a href="https://app.pinecone.io/" rel="noopener noreferrer"&gt;Pinecone&lt;/a&gt; and &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;, follow the attached links.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPENAI_API_KEY=""
PINECONE_ENV=""
PINECONE_API_KEY=""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we'll utilize pip to install the necessary packages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pip install openai \
      \ langchain 
       \ pinecone 
       \ tiktoken
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we'll import the relevant libraries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import time
from langchain.llms import OpenAI
from langchain.vectorstores import Pinecone
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import ConversationalRetrievalChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT
from langchain.chains.question_answering import load_qa_chain
from langchain.embeddings.openai import OpenAIEmbeddings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We'll then use the Langchain DirectoryLoader to load documents from a directory. The documents in this example will be in CSV format and placed in the /data directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# In your project folder, create this directory structure which will hold the context-aware documents downloaded earlier 
directory = '/data' 
def load_docs(directory):
  loader = DirectoryLoader(directory, glob='**/*.csv', show_progress=True, loader_cls=CSVLoader)
  documents = loader.load()
  return documents

documents = load_docs(directory)
print(f"Loaded {len(documents)} documents")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we will split the documents into smaller chunks. This can be complex, but we will simplify it using RecursiveCharacterTextSplitter from Langchain. First, we will define a custom length function using the tiktoken library, which will enable us to recursively split the data into chunks of n tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Tell tiktoken what model we'd like to use for embeddings
tiktoken.encoding_for_model('text-embedding-ada-002')

# Intialize a tiktoken tokenizer (i.e. a tool that identifies individual tokens (words))
tokenizer = tiktoken.get_encoding('cl100k_base')

# Create our custom tiktoken function
def tiktoken_len(text: str) -&amp;gt; int:
    """
    Split up a body of text using a custom tokenizer.

    :param text: Text we'd like to tokenize.
    """
    tokens = tokenizer.encode(
        text,
        disallowed_special=()
    )
    return len(tokens)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def chunk_by_size(text: str, size: int = 50) -&amp;gt; list[Document]:
    """
    Chunk up text recursively.

    :param text: Text to be chunked up
    :return: List of Document items (i.e. chunks).|
    """
    text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = size,
    chunk_overlap = 20,
    length_function = tiktoken_len,
    add_start_index = True,
)
    return text_splitter.create_documents([text])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we will initialize our OpenAI embedding model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Initialize our OpenAI model
OPENAI_API_KEY = getpass("OpenAI API Key: ")
model_name = 'text-embedding-ada-002'

embeddings = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=OPENAI_API_KEY
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, Pinecone will be initialized using the environment and Pinecone API key. If an index does not already exist, one will be created and configured to store 1536 &lt;a href="https://docs.pinecone.io/docs/choosing-index-type-and-size" rel="noopener noreferrer"&gt;dimension vectors&lt;/a&gt; that correspond to the embedding's length, using cosine similarity as the &lt;a href="https://www.pinecone.io/learn/vector-similarity/" rel="noopener noreferrer"&gt;similarity metric&lt;/a&gt;. The Pinecone instance will be created using the embeddings and index that have been provided, and the documents will be added to the vector store using the Pinecone.from_documents() method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PINECONE_API_KEY = os.getenv('PINECONE_API_KEY')
PINECONE_ENV = os.getenv('PINECONE_ENV')
# Initialize Pinecone client
pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=tPINECONE_ENV
)
index_name = 'international-sport'

if index_name not in pinecone.list_indexes():
    print(f'Creating Index {index_name}...')
    pinecone.create_index(index_name, dimension=1536, metric='cosine') # Create index. This might take a while to create
    # wait a moment for the index to be fully initialized
     time.sleep(1)
     print('Done')
else:
    print(f'index {index_name} already exists')
index = Pinecone.from_documents(docs, embeddings, index_name=index_name) 

# to retieve the number of vectors in the embedding
index = pinecone.Index(INDEX_NAME)
print(index.describe_index_stats())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Finding similar documents&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We will now define the function to search Pinecone for similar documents based on the user query input&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;get_similar_docs(cls, query, index, k=5):
    found_docs = index.similarity_search(query, k=k)
    print(found_docs)
    if len(found_docs) == 0:
      return "Sorry, There is no relevant answer to your question. Please try again."
    logger.info("Found document similar to the query")
    return found_docs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we'll use the Langchain PromptTemplate to create a predefined parameterized text format that will be used to direct response generation in a specific context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;template = """
You are an AI chatbot with a sense of humor.
Your mission is to turn the user's input into funny jokes.

{chat_history}
Human: {human_input}
Chatbot:"""

new_prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"],
    template=template
)
new_memory = ConversationBufferMemory(memory_key="chat_history")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we will write a function that uses OpenAI LLM, a question-answering chain &lt;code&gt;load_qa_chain&lt;/code&gt; from Langchain, and takes the user's query as input. You can choose from a variety of chain types depending on your use case; in this case, we'll use the stuff &lt;code&gt;chain_type&lt;/code&gt;, which uses all of the text from the prompt's documents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model_name = "gpt-3.5-turbo-0301"
llm = OpenAI(model_name=model_name)

chain = load_qa_chain(llm, chain_type="stuff")

def retrieve_answer(query):
  similar_docs = get_similar_docs(query)
  return chain.run(input_documents=similar_docs, question=query)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lastly, we will test the functionality of the question-answering system using the query below&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query = "How many away goals did scotland score in the last century?"
reponse = retrieve_answer(query)
print(answer)
 //419
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this article, we explained how we can mitigate hallucination during response generation by giving the model some context. We wrapped up by building a context-aware question-answering system that utilizes the power of semantic search to extract relevant information from a set of documents, giving context to the OpenAI model.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>rag</category>
      <category>python</category>
      <category>langchain</category>
    </item>
  </channel>
</rss>
