karthik Balasubramanian

Deploying the DeepSeek R1 Model on AWS EC2 for Scalable AI Solutions

Introduction

DeepSeek R1 is a state-of-the-art machine learning model designed for advanced data analysis, predictive modeling, and real-time inference. This guide walks through self-hosting DeepSeek R1 on an AWS EC2 instance and covers its use cases, advantages, cost optimization strategies, and key considerations for a successful deployment. EC2 is an ideal platform for deploying machine learning models thanks to its scalability, flexibility, and seamless integration with other AWS services.


Use Cases

DeepSeek R1 supports a wide range of applications across industries, including:

  • Fraud Detection: Detect anomalies in real-time financial transactions.
  • Predictive Maintenance: Analyze IoT data to prevent equipment failures.
  • Healthcare Diagnostics: Process large medical datasets for faster, more accurate diagnoses.
  • NLP Applications: Power conversational AI models or chatbots for improved customer interactions.

Deploying on EC2 provides a customizable and controlled environment to host these applications effectively.


Advantages of Deployment on EC2

  • Scalability: Dynamically scale resources based on real-time workloads, reducing costs during low usage periods.
  • Customization: Full control over the operating system, libraries, and model-specific configurations.
  • Performance: Utilize GPU-enabled EC2 instances (e.g., P3 or G5) for faster inference and training.
  • Data Privacy: Enhanced security through encryption, IAM roles, and VPC isolation.
  • Integration: Seamless integration with AWS services like S3, CloudWatch, and Lambda.

Instance Selection Guide for Developers

Recommended Instances:

  • Full Model Loading & Partial Inference:
    • Instance: p4d.24xlarge or higher.
  • Budget-Constrained Deployments:
    • Instance: p3.16xlarge or g5.48xlarge.
  • Lightweight Prototyping:
    • Instance: g4dn.12xlarge (4 NVIDIA T4 GPUs, 192 GB RAM).

My Configuration:

  • Instance Type: g4dn.12xlarge
  • GPU: 4 NVIDIA T4 GPUs (16 GB GDDR6 per GPU).
  • vCPUs: 48
  • Memory: 192 GB RAM
  • Storage: 1.9 TB NVMe SSD
  • Cost: ~$3.91 per hour (on-demand, us-east-1; pricing varies by region, and Spot capacity is often significantly cheaper).
  • Operating System: Ubuntu
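
If you prefer to script instance creation rather than click through the console, here is a minimal AWS CLI sketch. The AMI, key pair, security group, and subnet IDs below are placeholders; substitute values from your own account and region:

# Launch a g4dn.12xlarge instance (all IDs are placeholders)
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type g4dn.12xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-0abcdef1234567890 \
  --subnet-id subnet-0abcdef1234567890 \
  --count 1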

Installing Ollama

To install and set up Ollama, follow these steps:

  1. Install Ollama:

   curl -fsSL https://ollama.com/install.sh | sudo sh

  2. Download the DeepSeek R1 model:

   ollama pull deepseek-r1:7b

  3. Test Ollama:

   ollama run deepseek-r1:7b
   ollama show deepseek-r1:7b
   ollama list

  4. Start the Ollama server:

   ollama serve
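
On Linux, the install script typically registers Ollama as a systemd service, so the API may already be listening on the default port 11434 before you run ollama serve yourself. A quick health check (the /api/tags endpoint lists the models available locally):

# Verify the service and the HTTP API
systemctl status ollama
curl http://localhost:11434/api/tags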


Setting Up the UI for Ollama Model

Using Open WebUI with Docker provides an intuitive interface for interacting with the model.

Step 1: Install Docker

SSH into the instance and run:

sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER

Log out and back in so the docker group membership takes effect.

Step 2: Pull the Open WebUI Docker Image

docker pull ghcr.io/open-webui/open-webui:main

Step 3: Run the Docker Container

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
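
Before opening the browser, it is worth confirming that the container came up cleanly:

# Check that the container is running and inspect its startup logs
docker ps --filter name=open-webui
docker logs open-webui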


Step 4: Access the WebUI

After starting the container, access Open WebUI at:

http://localhost:3000

This URL works from the instance itself. To reach the UI from your own machine, open port 3000 in the instance's security group and browse to http://<instance-public-ip>:3000.
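
Alternatively, you can keep port 3000 closed to the internet and forward it over SSH instead. A minimal sketch, assuming an Ubuntu AMI (default user ubuntu) and a key file named my-key.pem (both placeholders):

# Forward local port 3000 to port 3000 on the instance
ssh -i my-key.pem -L 3000:localhost:3000 ubuntu@<instance-public-ip>

With the tunnel open, http://localhost:3000 in your local browser reaches the WebUI running on the instance.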


Accessing Ollama via REST API

Ollama provides a REST API for managing and interacting with models.

Generate a Response

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Why is the sky blue?"
}'

Chat with the Model

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:7b",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ]
}'
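
By default, both endpoints stream the response back as a series of newline-delimited JSON objects. To receive a single JSON object instead, set the API's stream option to false:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:7b",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ],
  "stream": false
}'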

Monitor GPU usage during inference using the command:

nvidia-smi
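
For a continuously refreshing view while a prompt is being processed, wrap it in watch:

watch -n 1 nvidia-smi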


Next Steps: Optimizations

  • Proxy Implementation: Put a reverse proxy such as nginx in front of Open WebUI for improved performance and scalability (see the sketch after this list).
  • Stable Environment: Automate deployment workflows with tools like Terraform or AWS CloudFormation.
  • Cost Optimization: Leverage Spot Instances or Savings Plans to reduce operating costs.
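
As a starting point for the reverse proxy item above, here is a minimal sketch that writes an nginx site configuration proxying port 80 to Open WebUI. It assumes nginx is installed on the instance; the server_name is a placeholder, and the WebSocket headers matter because Open WebUI streams chat responses over a persistent connection:

# Write a minimal nginx site config for Open WebUI (server_name is a placeholder)
sudo tee /etc/nginx/sites-available/open-webui >/dev/null <<'EOF'
server {
    listen 80;
    server_name your-domain.example.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
EOF

# Enable the site and reload nginx after a config check
sudo ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx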
