Introduction
Before we dive into deploying the Hugging Chat UI, let's first explore the capabilities of the Hugging Face Text Generation Inference Server. We'll start with a practical walkthrough, demonstrating how to access and utilize its API endpoints effectively. This initial exploration is key to understanding the various configurations available for text generation and how they can enhance your AI interactions.
Start the Hugging Face Inference Server
In this section, we focus on launching the Hugging Face Text Generation Inference Server, specifically configured with 8-bit quantization. This setting is pivotal for optimizing GPU memory utilization and ensuring efficient resource management. For detailed setup instructions, please refer to this link.
export model=mistralai/Mistral-7B-v0.1
export volume=$PWD/data
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.3 --quantize=bitsandbytes --model-id $model
Discover Hugging Face Inference Server endpoints
Call the default generate endpoint
curl --location 'http://127.0.0.1:8080/generate' \
--header 'Content-Type: application/json' \
--data '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}'
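The same request can be issued programmatically. The sketch below, using only the Python standard library, builds the JSON body that `/generate` expects and posts it to the server started above. The helper names (`build_payload`, `generate`) and the local URL are illustrative assumptions, not part of the TGI API itself.

```python
import json
import urllib.request

# Assumes the TGI container from the docker run command above is listening here.
TGI_URL = "http://127.0.0.1:8080/generate"

def build_payload(prompt, **parameters):
    """Build the JSON body expected by TGI's /generate endpoint."""
    return {"inputs": prompt, "parameters": parameters}

def generate(prompt, **parameters):
    """POST the payload and return the generated text from the response."""
    body = json.dumps(build_payload(prompt, **parameters)).encode()
    req = urllib.request.Request(
        TGI_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

Calling `generate("What is Deep Learning?", max_new_tokens=20)` is equivalent to the curl command above.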
Call the streaming endpoint
curl --location 'http://127.0.0.1:8080/generate_stream' \
--header 'Content-Type: application/json' \
--data '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}'
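The streaming endpoint returns server-sent events: each generated token arrives as a line of the form `data:{...}`, with blank lines separating events. A minimal sketch of how a client could reassemble the streamed tokens is shown below; the function names are illustrative, and the event fields (`token.text`) reflect the TGI stream payload.

```python
import json

def parse_sse_events(raw_stream):
    """Yield the JSON payload of each server-sent event from /generate_stream.

    TGI emits lines of the form `data:{...}`; other lines are ignored.
    """
    for line in raw_stream.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):])

def collect_tokens(raw_stream):
    """Concatenate the text of every streamed token into the final answer."""
    return "".join(event["token"]["text"] for event in parse_sse_events(raw_stream))
```

In a real client you would feed this the response body line by line as it arrives, printing each token immediately for a typewriter-style UI.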
Call the generate endpoint while activating sampling
curl --location 'http://127.0.0.1:8080/generate' \
--header 'Content-Type: application/json' \
--data '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":100, "do_sample":true, "top_k":50 }}'
Call the generate endpoint while changing temperature
curl --location 'http://127.0.0.1:8080/generate' \
--header 'Content-Type: application/json' \
--data '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":50, "do_sample":true, "top_k":50, "temperature":0.2 }}'
For more Generation strategies please refer to this link : https://huggingface.co/docs/transformers/generation_strategies
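To build intuition for what `do_sample`, `top_k`, and `temperature` do, here is a toy, self-contained sketch of the sampling step: logits are divided by the temperature (lower values sharpen the distribution toward the most likely token), optionally restricted to the `top_k` highest-scoring candidates, converted to probabilities with a softmax, and then sampled. This is an illustration of the general technique, not TGI's actual implementation.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample one token from a {token: logit} dict using temperature and top-k."""
    rng = rng or random.Random(0)
    # Keep only the top_k highest-scoring tokens, if requested.
    if top_k is not None:
        cutoff = sorted(logits.values(), reverse=True)[top_k - 1]
        logits = {t: l for t, l in logits.items() if l >= cutoff}
    # Temperature scaling: low temperature sharpens, high temperature flattens.
    scaled = {t: l / temperature for t, l in logits.items()}
    # Numerically stable softmax.
    m = max(scaled.values())
    exps = {t: math.exp(l - m) for t, l in scaled.items()}
    total = sum(exps.values())
    # Sample from the resulting distribution.
    r = rng.random()
    acc = 0.0
    for token, e in exps.items():
        acc += e / total
        if r <= acc:
            return token
    return token
```

With `temperature=0.2` (as in the curl command above), the highest-logit token is chosen almost every time, which is why low temperatures give more deterministic output.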
Monitoring with Health, Info, and Metrics API Endpoints
Ensuring System Health
curl --location 'http://127.0.0.1:8080/health'
Retrieving Server Information
curl --location 'http://127.0.0.1:8080/info'
Accessing Performance Metrics Endpoint
curl --location 'http://127.0.0.1:8080/metrics'
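The `/metrics` endpoint returns data in the Prometheus text exposition format: one metric per line, with `#`-prefixed HELP/TYPE comments. A quick way to inspect it without a full Prometheus stack is a minimal parser like the one below; the sample metric names used in the usage note are hypothetical, not guaranteed TGI metric names.

```python
def parse_metrics(text):
    """Parse Prometheus text format into a {metric_name: value} dict.

    Labels (if any) are kept as part of the metric name; comment lines
    and blank lines are skipped. A minimal reader for quick inspection,
    not a full Prometheus parser.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics
```

Piping the curl output into this function gives you a dict you can filter for request counts, latencies, and queue sizes.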
Install Hugging Face Chat UI
Clone the Repository
Initiate your project by cloning the Hugging Face Chat UI repository:
git clone https://github.com/huggingface/chat-ui.git
Configure the Environment
After cloning the repository, you'll need to set up your environment by editing the .env file. This involves specifying the correct IP addresses for your MongoDB instance and the Hugging Face Text Generation Inference Server.
Editing MongoDB Configuration:
Locate and edit the MONGODB_URL in the .env file to point to your MongoDB instance. Replace ${MONGO_DB_IP} with the actual IP address of your MongoDB server.
MONGODB_URL=mongodb://${MONGO_DB_IP}:27017
Setting Up Text Generation Inference Server Connection:
In the same .env file, ensure that the Hugging Face Text Generation Inference Server is correctly configured. Below is a JSON configuration snippet that you'll need to adjust based on your setup. Note that the MODELS object encapsulates your models' configurations:
{
"name": "mistralai/Mistral-7B-Instruct-v0.1-local",
"displayName": "mistralai/Mistral-7B-Instruct-v0.1-name",
"description": "Mistral 7B is a new Apache 2.0 model, released by Mistral AI that outperforms Llama2 13B in benchmarks.",
"websiteUrl": "https://mistral.ai/news/announcing-mistral-7b/",
"preprompt": "",
"chatPromptTemplate" : "<s>{{#each messages}}{{#ifUser}}[INST] {{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}}{{content}} [/INST]{{/ifUser}}{{#ifAssistant}}{{content}}</s>{{/ifAssistant}}{{/each}}",
"parameters": {
"temperature": 0.1,
"top_p": 0.95,
"repetition_penalty": 1.2,
"top_k": 50,
"max_new_tokens": 1024,
"stop": ["</s>"]
},
"endpoints": [{
"type" : "tgi",
"url": "http://${TEXT_GENERATION_INFERENCE_SERVER}:80/"
}],
"promptExamples": [
{
"title": "Assist in a task",
"prompt": "How do I make a delicious lemon cheesecake?"
}
]
}
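To make the `chatPromptTemplate` above less opaque, the following sketch reproduces in plain Python what that Handlebars template emits for Mistral-Instruct models: each user turn is wrapped in `[INST] ... [/INST]`, each assistant turn is terminated with `</s>`, and the `preprompt` is prepended to the first user message. This is an illustrative re-implementation for understanding the format, not code from the Chat UI itself.

```python
def render_mistral_prompt(messages, preprompt=""):
    """Render a Mistral-Instruct prompt from [{"from": ..., "content": ...}] turns."""
    out = "<s>"
    for i, msg in enumerate(messages):
        if msg["from"] == "user":
            # The preprompt is injected only into the very first message.
            prefix = f"{preprompt}\n" if i == 0 and preprompt else ""
            out += f"[INST] {prefix}{msg['content']} [/INST]"
        else:
            out += f"{msg['content']}</s>"
    return out
```

Seeing the rendered string makes it clear why `"stop": ["</s>"]` is set in the parameters: the model emits `</s>` to close each assistant turn.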
Build the Chat UI Docker image
DOCKER_BUILDKIT=1 docker build -t hugging-face-ui .
Run MongoDB
docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
Run the Hugging Face Chat UI
docker run -p 3000:3000 hugging-face-ui