<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mahmoud Sehsah</title>
    <description>The latest articles on DEV Community by Mahmoud Sehsah (@mahmoudrasmyfathy1).</description>
    <link>https://dev.to/mahmoudrasmyfathy1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1261043%2F74f9a4d5-99aa-4791-86b5-977a49bcaaa9.jpeg</url>
      <title>DEV Community: Mahmoud Sehsah</title>
      <link>https://dev.to/mahmoudrasmyfathy1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mahmoudrasmyfathy1"/>
    <language>en</language>
    <item>
      <title>Getting Started with Natural Language Toolkit (NLTK)</title>
      <dc:creator>Mahmoud Sehsah</dc:creator>
      <pubDate>Sat, 27 Jan 2024 22:22:19 +0000</pubDate>
      <link>https://dev.to/mahmoudrasmyfathy1/getting-started-with-natural-language-toolkit-nltk-3eok</link>
      <guid>https://dev.to/mahmoudrasmyfathy1/getting-started-with-natural-language-toolkit-nltk-3eok</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;NLTK (Natural Language Toolkit) is one of the most popular Python libraries for working with human language data (i.e., text). This tutorial will guide you through the installation process, basic concepts, and some key functionalities of NLTK.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://github.com/mahmoudrasmyfathy1/NLP-Tutorial/blob/main/getting-started-nltk.ipynb"&gt;Link for the Notebook&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  1. Installation
&lt;/h1&gt;

&lt;p&gt;First, you need to install NLTK, which you can do easily with pip. In a notebook cell, run the command below; in a terminal (Command Prompt, etc.), drop the leading exclamation mark:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pip install nltk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  2. Understanding the Role of nltk.download() in NLTK Setup
&lt;/h1&gt;

&lt;p&gt;NLTK ships its datasets and pretrained models separately from the library itself. Use nltk.download() to fetch these resources; called with no arguments, it opens an interactive downloader from which you can pick what you need.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import nltk
nltk.download()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  3. Tokenization
&lt;/h1&gt;

&lt;p&gt;Tokenization is the process of splitting a text into meaningful units, such as words or sentences.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from nltk.tokenize import word_tokenize, sent_tokenize

text = "Hello there! How are you? I hope you're learning a lot from this tutorial."

# Sentence Tokenization
sentences = sent_tokenize(text)
print(sentences)

# Word Tokenization
words = word_tokenize(text)
print(words)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n2fcdultieimked6ls3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n2fcdultieimked6ls3.png" alt="Image description" width="800" height="42"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  4. Part-of-Speech (POS) Tagging
&lt;/h1&gt;

&lt;p&gt;POS tagging means labeling words with their part of speech (noun, verb, adjective, etc.).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from nltk import pos_tag
from nltk.tokenize import word_tokenize

words = word_tokenize("I am learning NLP with NLTK")
pos_tags = pos_tag(words)
print(pos_tags)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskpvotlg539wkwp2jw5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskpvotlg539wkwp2jw5h.png" alt="Image description" width="755" height="36"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  5. Stopwords
&lt;/h1&gt;

&lt;p&gt;Stopwords are common words that are usually removed from text because they carry little meaningful information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

words = word_tokenize("Hello there! How are you? I hope you're learning a lot from this tutorial.")
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in words if word not in stop_words]
print(filtered_words)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5mlwzq8au79misi94qm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5mlwzq8au79misi94qm.png" alt="Image description" width="638" height="47"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  6. Stemming
&lt;/h1&gt;

&lt;p&gt;Stemming is a process of stripping suffixes from words to extract the base or root form, known as the 'stem'. For example, the stem of the words 'waiting', 'waited', and 'waits' is 'wait'.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
ps = PorterStemmer()
sentence = "It's important to be waiting patiently when you're learning to code."
words = word_tokenize(sentence)
stemmed_words = [ps.stem(word) for word in words]
print(stemmed_words)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
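The example in the text ('waiting', 'waited', and 'waits' all stemming to 'wait') is easy to check directly with the same PorterStemmer:

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()
# Porter stemming strips the suffixes, leaving the common stem.
stems = [ps.stem(word) for word in ["waiting", "waited", "waits"]]
print(stems)  # ['wait', 'wait', 'wait']
```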



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fosu9aeem6e52yxtr823g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fosu9aeem6e52yxtr823g.png" alt="Image description" width="773" height="39"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  7. Lemmatization
&lt;/h1&gt;

&lt;p&gt;Lemmatization is the process of reducing a word to its base or dictionary form, known as the 'lemma'. Unlike stemming, lemmatization considers the context and converts the word to its meaningful base form. For instance, 'is', 'are', and 'am' would all be lemmatized to 'be'.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('wordnet')  # pass download_dir='...' only if your NLTK data lives outside the default search path

lemmatizer = WordNetLemmatizer()
sentence = "The leaves on the ground were raked by the gardener, who was also planting bulbs for the coming spring."
words = word_tokenize(sentence)
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
print(lemmatized_words)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbty4dryzocb7me3nq3eu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbty4dryzocb7me3nq3eu.png" alt="Image description" width="800" height="71"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  8. Frequency Distribution
&lt;/h1&gt;

&lt;p&gt;This is used to find the frequency of each vocabulary item in the text.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize

words = word_tokenize("I need to write a very, very simple sentence")
fdist = FreqDist(words)
print(fdist.most_common(1))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
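FreqDist is a subclass of Python's Counter, so besides most_common() you can index it like a dictionary to get the count of a single token. A short sketch on a pre-split token list (no downloads required):

```python
from nltk.probability import FreqDist

# Splitting on whitespace here keeps the example free of tokenizer downloads.
tokens = "I need to write a very , very simple sentence".split()
fdist = FreqDist(tokens)

print(fdist["very"])         # 2
print(fdist.most_common(2))  # [('very', 2), ('I', 1)]
```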

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0peeuozxgjtahmy84km.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0peeuozxgjtahmy84km.png" alt="Image description" width="139" height="40"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  9. Named Entity Recognition (NER)
&lt;/h1&gt;

&lt;p&gt;NER is used to identify entities like names, locations, dates, etc., in the text.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
from nltk.chunk import ne_chunk

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

sentence = "I will travel to Spain"
# Tokenize the sentence
words = word_tokenize(sentence)
# Part-of-speech tagging
pos_tags = pos_tag(words)
# Named entity recognition
named_entities = ne_chunk(pos_tags)
# Print named entities
print(named_entities)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrflv3veizzjbiqv6sjj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrflv3veizzjbiqv6sjj.png" alt="Image description" width="487" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>llm</category>
    </item>
    <item>
      <title>Deploying HuggingFace Chat UI with the Hugging Face Text Generation Inference Server</title>
      <dc:creator>Mahmoud Sehsah</dc:creator>
      <pubDate>Tue, 23 Jan 2024 02:01:34 +0000</pubDate>
      <link>https://dev.to/mahmoudrasmyfathy1/deploying-huggingface-chat-ui-with-the-hugging-face-text-generation-inference-server-n3h</link>
      <guid>https://dev.to/mahmoudrasmyfathy1/deploying-huggingface-chat-ui-with-the-hugging-face-text-generation-inference-server-n3h</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Before we dive into deploying the Hugging Chat UI, let's first explore the capabilities of the Hugging Face Text Generation Inference Server. We'll start with a practical walkthrough, demonstrating how to access and utilize its API endpoints effectively. This initial exploration is key to understanding the various configurations available for text generation and how they can enhance your AI interactions. &lt;/p&gt;

&lt;h2&gt;
  
  
  Start The Hugging Face Inference Server
&lt;/h2&gt;

&lt;p&gt;In this section, we focus on launching the Hugging Face Text Generation Inference Server, specifically configured with 8-bit quantization. This setting is pivotal for optimizing GPU memory utilization and ensuring efficient resource management. For the detailed setup instructions, please refer to &lt;a href="https://dev.to/mahmoudrasmyfathy1/deploy-mistral-llm-on-google-compute-engine-with-docker-gpu-support-and-hugging-face-inference-server-dbb"&gt;this link&lt;/a&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

export model=mistralai/Mistral-7B-v0.1
export volume=$PWD/data 


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.3 --quantize=bitsandbytes --model-id $model


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxrczdn6xl22v5w24gb8g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxrczdn6xl22v5w24gb8g.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Discover Hugging Face Inference Server endpoints
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Call the default generate Endpoint
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

curl --location 'http://127.0.0.1:8080/generate' \
--header 'Content-Type: application/json' \
--data '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}'


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
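The same request can be issued from Python. A minimal sketch using the requests library; the URL and payload mirror the curl call above and assume the inference server from the earlier section is running locally:

```python
import requests

TGI_URL = "http://127.0.0.1:8080/generate"  # the server started earlier in this post

def build_payload(prompt, **parameters):
    """Build the same JSON body the curl examples send."""
    return {"inputs": prompt, "parameters": parameters}

def generate(prompt, **parameters):
    """POST a prompt to the /generate endpoint and return the generated text."""
    response = requests.post(TGI_URL, json=build_payload(prompt, **parameters), timeout=60)
    response.raise_for_status()
    return response.json()["generated_text"]

# Usage (with the server running):
#   print(generate("What is Deep Learning?", max_new_tokens=20))
```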
&lt;h3&gt;
  
  
  Call the streaming endpoint
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

curl --location 'http://127.0.0.1:8080/generate_stream' \
--header 'Content-Type: application/json' \
--data '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}'


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Call the generate endpoint while activating sampling
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

curl --location 'http://127.0.0.1:8080/generate' \
--header 'Content-Type: application/json' \
--data '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":100, "do_sample":true, "top_k":50 }}'


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Call the generate endpoint while changing temperature
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

curl --location 'http://127.0.0.1:8080/generate' \
--header 'Content-Type: application/json' \
--data '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":50, "do_sample":true, "top_k":50, "temperature":0.2 }}'


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;For more Generation strategies please refer to this link : &lt;a href="https://huggingface.co/docs/transformers/generation_strategies" rel="noopener noreferrer"&gt;https://huggingface.co/docs/transformers/generation_strategies&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Monitoring with Health, Info, and Metrics API Endpoints
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Ensuring System Health
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

curl --location 'http://127.0.0.1:8080/health'


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Retrieving Server Information
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

curl --location 'http://127.0.0.1:8080/info'


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5dnpy20e434qqsc086vk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5dnpy20e434qqsc086vk.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Accessing Performance Metrics Endpoint
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

curl --location 'http://127.0.0.1:8080/metrics'


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnqz471wb1o1g9x8ip6x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnqz471wb1o1g9x8ip6x.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Install Hugging Face Chat UI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Clone the Repository
&lt;/h3&gt;

&lt;p&gt;Initiate your project by cloning the Hugging Face Chat UI repository:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

git clone https://github.com/huggingface/chat-ui.git


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Configure the Environment
&lt;/h3&gt;

&lt;p&gt;After cloning the repository, you'll need to set up your environment by editing the .env file. This involves specifying the correct IP addresses for your MongoDB instance and the Hugging Face Text Generation Inference Server.&lt;/p&gt;
&lt;h4&gt;
  
  
  Editing MongoDB Configuration:
&lt;/h4&gt;

&lt;p&gt;Locate and edit the MONGODB_URL in the .env file to point to your MongoDB instance. Replace ${MONGO_DB_IP} with the actual IP address of your MongoDB server.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

MONGODB_URL=mongodb://${MONGO_DB_IP}:27017


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Setting Up Text Generation Inference Server Connection:&lt;/p&gt;

&lt;p&gt;In the same .env file, ensure that the Hugging Face Text Generation Inference Server is correctly configured. Below is a JSON configuration snippet that you'll need to adjust based on your setup; note that the MODELS object encapsulates your models' configurations:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

{
      "name": "mistralai/Mistral-7B-Instruct-v0.1-local",
      "displayName": "mistralai/Mistral-7B-Instruct-v0.1-name",
      "description": "Mistral 7B is a new Apache 2.0 model, released by Mistral AI that outperforms Llama2 13B in benchmarks.",
      "websiteUrl": "https://mistral.ai/news/announcing-mistral-7b/",
      "preprompt": "",
      "chatPromptTemplate" : "&amp;lt;s&amp;gt;{{#each messages}}{{#ifUser}}[INST] {{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}}{{content}} [/INST]{{/ifUser}}{{#ifAssistant}}{{content}}&amp;lt;/s&amp;gt;{{/ifAssistant}}{{/each}}",
      "parameters": {
        "temperature": 0.1,
        "top_p": 0.95,
        "repetition_penalty": 1.2,
        "top_k": 50,
        "max_new_tokens": 1024,
        "stop": ["&amp;lt;/s&amp;gt;"]
      },
      "endpoints": [{
        "type" : "tgi",
        "url": "http://${TEXT_GENERATION_INFERENCE_SERVER}:80/"
        }],
      "promptExamples": [
      {
          "title": "Assist in a task",
          "prompt": "How do I make a delicious lemon cheesecake?"
        }
      ]
    }


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Build the Chat UI Docker image
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

DOCKER_BUILDKIT=1 docker build -t hugging-face-ui .


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Run MongoDB
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

docker run -d -p 27017:27017 --name mongo-chatui mongo:latest


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Run the Hugging-Face Chat UI
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

docker run -p 3000:3000 hugging-face-ui


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>llm</category>
      <category>mlops</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Deploy Mistral LLM on Google Compute Engine with Docker, GPU Support, and Hugging Face Inference Server</title>
      <dc:creator>Mahmoud Sehsah</dc:creator>
      <pubDate>Sun, 21 Jan 2024 20:21:46 +0000</pubDate>
      <link>https://dev.to/mahmoudrasmyfathy1/deploy-mistral-llm-on-google-compute-engine-with-docker-gpu-support-and-hugging-face-inference-server-dbb</link>
      <guid>https://dev.to/mahmoudrasmyfathy1/deploy-mistral-llm-on-google-compute-engine-with-docker-gpu-support-and-hugging-face-inference-server-dbb</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This is a practical guide to setting up Large Language Models (LLMs) on Google Compute Engine using GPUs. It is designed to walk you through the process step by step, making it easy for you to take advantage of the powerful combination of Google's cloud infrastructure and NVIDIA's GPU technology.&lt;/p&gt;

&lt;h2&gt;
  
  
  Machine Specs for the tutorial
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Hardware Specifications:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  GPU Information:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;GPU Type: Nvidia T4&lt;/li&gt;
&lt;li&gt;Number of GPUs: 2&lt;/li&gt;
&lt;li&gt;GPU Memory: 16 GB GDDR6 (per GPU)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Google Compute Engine Machine Type:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Type: n1-highmem-4&lt;/li&gt;
&lt;li&gt;vCPUs: 4&lt;/li&gt;
&lt;li&gt;Cores: 2&lt;/li&gt;
&lt;li&gt;Memory: 26 GB&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Disk Information:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Disk Type: Balanced Persistent Disk&lt;/li&gt;
&lt;li&gt;Disk Size: 150 GB&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Software Specifications:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Operating System:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Ubuntu Version: 20.04 LTS&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  CUDA version:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;CUDA version: 12.3&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1. Setting Up Docker
&lt;/h2&gt;

&lt;p&gt;Follow these simple steps to get Docker up and running on your system:&lt;/p&gt;

&lt;h3&gt;
  
  
  1.1 Adding Docker's Official GPG Key
&lt;/h3&gt;

&lt;p&gt;add Docker’s official GPG key to your system. This step is crucial for validating the authenticity of the Docker packages you'll be installing&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1.2 Adding Docker Repository to Apt Sources
&lt;/h3&gt;

&lt;p&gt;Add Docker's repository to your system's Apt sources. This allows you to fetch Docker packages from their official repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release &amp;amp;&amp;amp; echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list &amp;gt; /dev/null
sudo apt-get update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1.3 Installing Docker
&lt;/h3&gt;

&lt;p&gt;Install Docker using the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get --reinstall install docker-ce
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1.4 Create Docker Group
&lt;/h3&gt;

&lt;p&gt;If not already present, add the 'docker' group to your system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo groupadd docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1.5 Add Default User to Docker Group
&lt;/h3&gt;

&lt;p&gt;Add your default user to the 'docker' group so you can manage Docker as a non-root user; log out and back in (or run newgrp docker) for the group change to take effect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo usermod -aG docker $USER
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1.6 Check Docker Status
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;systemctl status docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  2. Install NVIDIA Container Toolkit
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 Add NVIDIA GPG Key and NVIDIA Container Toolkit Repository
&lt;/h3&gt;

&lt;p&gt;Start by adding the NVIDIA GPG key to ensure the authenticity of the software packages and add the NVIDIA Container Toolkit repository to your system's software sources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg &amp;amp;&amp;amp; curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2.2 Enable Experimental Features (Optional)
&lt;/h3&gt;

&lt;p&gt;If you wish to use experimental features, uncomment the respective lines in the sources list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2.3 Update Package Index and Install NVIDIA Toolkit
&lt;/h3&gt;

&lt;p&gt;Update your package index and install the NVIDIA Container Toolkit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. Configure Container Toolkit
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Configure NVIDIA Container Toolkit
&lt;/h3&gt;

&lt;p&gt;Configure the NVIDIA Container Toolkit to work with Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo nvidia-ctk runtime configure --runtime=docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijs3b8cbudlgzbr8nkym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijs3b8cbudlgzbr8nkym.png" alt="Image description" width="705" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 Restart Docker Service
&lt;/h3&gt;

&lt;p&gt;Apply the changes by restarting the Docker service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo systemctl restart docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  4. Prerequisites Before Installing CUDA Drivers
&lt;/h2&gt;

&lt;p&gt;Ensure your system meets the following prerequisites before proceeding with the CUDA driver installation. For detailed guidance, refer to the official NVIDIA CUDA installation guide (&lt;a href="https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions"&gt;https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 Verify CUDA-Capable GPU
&lt;/h3&gt;

&lt;p&gt;First, confirm that your system has an NVIDIA GPU installed; this command should return information about the NVIDIA graphics card if one is present.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lspci | grep -i nvidia
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfesb9cnkln1dauq7dy0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfesb9cnkln1dauq7dy0.png" alt="Image description" width="508" height="55"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Confirm Supported Linux Version
&lt;/h3&gt;

&lt;p&gt;Ensure your Linux distribution is supported by checking its version; this command will display the architecture of your system and details about your Linux distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uname -m &amp;amp;&amp;amp; cat /etc/*release
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.3 Check Kernel Headers and Development Packages
&lt;/h3&gt;

&lt;p&gt;Verify that your system has the appropriate kernel headers and development packages, which are essential for building the NVIDIA kernel module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uname -r
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5. Installing NVIDIA Drivers
&lt;/h2&gt;

&lt;p&gt;Follow these steps to install NVIDIA drivers on your system. For detailed instructions, you can refer to the NVIDIA Tesla Installation Notes (&lt;a href="https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html"&gt;https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Install Required Kernel Headers
&lt;/h3&gt;

&lt;p&gt;Start by installing the Linux kernel headers corresponding to your current kernel version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get install linux-headers-$(uname -r)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5.2 Add the NVIDIA CUDA Repository
&lt;/h3&gt;

&lt;p&gt;Identify your distribution's version and add the NVIDIA CUDA repository to your system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5.3 Update and Install NVIDIA Drivers
&lt;/h3&gt;

&lt;p&gt;Finally, update your package lists and install the CUDA drivers. A system restart is required after the installation completes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get update
sudo apt-get -y install cuda-drivers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  6. Post-Installation Steps for NVIDIA Driver
&lt;/h2&gt;

&lt;p&gt;After successfully installing the NVIDIA drivers, perform the following post-installation steps to ensure everything is set up correctly. For a comprehensive guide, consult the NVIDIA CUDA Installation Guide for Linux (&lt;a href="https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions"&gt;https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  6.1 Verify NVIDIA Persistence Daemon
&lt;/h3&gt;

&lt;p&gt;Check the status of the NVIDIA Persistence Daemon to ensure it's running correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;systemctl status nvidia-persistenced
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82adao7n5ih3asgcok4q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82adao7n5ih3asgcok4q.png" alt="Image description" width="736" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  6.2 Monitor GPU Utilization
&lt;/h3&gt;

&lt;p&gt;To confirm that your GPU is recognized and monitor its utilization, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1svq0ged70yt0pb0qtn7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1svq0ged70yt0pb0qtn7.png" alt="Image description" width="650" height="426"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Define model configuration
&lt;/h2&gt;

&lt;p&gt;To set up the model configuration, define the following environment variables. The variable model is set to mistralai/Mistral-7B-v0.1, the model used in this tutorial. The variable volume is set to the present working directory ($PWD) followed by /data, the directory where the model weights will be stored:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export model=mistralai/Mistral-7B-v0.1
export volume=$PWD/data 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Run text-generation-inference using Docker
&lt;/h2&gt;

&lt;p&gt;To perform text generation inference, we will use the Hugging Face text generation inference server (for more details, see &lt;a href="https://huggingface.co/docs/text-generation-inference/index"&gt;https://huggingface.co/docs/text-generation-inference/index&lt;/a&gt;). Execute the Docker command below; its parameters are explained first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;--gpus all: Enables GPU support for Docker containers.&lt;/li&gt;
&lt;li&gt;--shm-size 1g: Sets the shared memory size to 1 gigabyte.&lt;/li&gt;
&lt;li&gt;-p 8080:80: Maps port 8080 on the host to port 80 in the Docker container.&lt;/li&gt;
&lt;li&gt;-v $volume:/data: Mounts the local data volume specified by $volume inside the Docker container at the /data path.&lt;/li&gt;
&lt;li&gt;ghcr.io/huggingface/text-generation-inference:1.3: Specifies the Docker image for text-generation-inference with the version tag 1.3.&lt;/li&gt;
&lt;li&gt;--model-id $model: Passes the specified model identifier ($model) to the text-generation-inference application.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.3 --model-id $model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Check GPU utilization
&lt;/h2&gt;

&lt;p&gt;Run the GPU monitoring command again to check memory utilization after the model weights have been loaded into GPU memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fh2mfw3kazms45meqli.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fh2mfw3kazms45meqli.png" alt="Image description" width="646" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Test API endpoint
&lt;/h2&gt;

&lt;p&gt;To test the API endpoint, use the following curl command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figwgqbv5kp5b4tvobzmn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figwgqbv5kp5b4tvobzmn.png" alt="Image description" width="800" height="35"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>cloud</category>
      <category>ai</category>
      <category>mlops</category>
    </item>
  </channel>
</rss>
