Karthik Balasubramanian

Deploying the DeepSeek R1 Model on AWS EC2 for Scalable AI Solutions

Introduction

DeepSeek R1 is a state-of-the-art open-weight reasoning model suited to advanced data analysis, predictive modeling, and real-time inference. This guide walks through deploying DeepSeek R1 on your own AWS EC2 instance and covers use cases, advantages, cost optimization strategies, and key considerations for a successful deployment. EC2 is a natural fit for self-hosting machine learning models thanks to its scalability, flexibility, and seamless integration with other AWS services.


Use Cases

The DeepSeek Model R1 supports various applications across industries, including:

  • Fraud Detection: Detect anomalies in real-time financial transactions.
  • Predictive Maintenance: Analyze IoT data to prevent equipment failures.
  • Healthcare Diagnostics: Process large medical datasets for faster, more accurate diagnoses.
  • NLP Applications: Power conversational AI models or chatbots for improved customer interactions.

Deploying on EC2 provides a customizable and controlled environment to host these applications effectively.


Advantages of Deployment on EC2

  • Scalability: Dynamically scale resources based on real-time workloads, reducing costs during low usage periods.
  • Customization: Full control over the operating system, libraries, and model-specific configurations.
  • Performance: Utilize GPU-enabled EC2 instances (e.g., P3 or G5) for faster inference and training.
  • Data Privacy: Enhanced security through encryption, IAM roles, and VPC isolation.
  • Integration: Seamless integration with AWS services like S3, CloudWatch, and Lambda.

Instance Selection Guide for Developers

Recommended Instances:

  • Full Model Loading & Partial Inference:
    • Instance: p4d.24xlarge or higher.
  • Budget-Constrained Deployments:
    • Instance: p3.16xlarge or g5.48xlarge.
  • Lightweight Prototyping:
    • Instance: g4dn.12xlarge (4 NVIDIA T4 GPUs, 192 GB RAM).

My Configuration:

  • Instance Type: g4dn.12xlarge
  • GPU: 4 NVIDIA T4 GPUs (16 GB GDDR6 per GPU).
  • vCPUs: 48
  • Memory: 192 GB RAM
  • Storage: 1.9 TB NVMe SSD
  • Cost: ~$3.91 per hour (on-demand, us-east-1; varies by region).
  • Operating System: Ubuntu
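
For reference, here is a hedged sketch of launching an equivalent instance from the AWS CLI; the AMI ID, key pair, and security group below are placeholders to replace with your own values:

# Placeholders: substitute your own Ubuntu AMI ID, key pair, and security group.
aws ec2 run-instances \
  --region us-east-1 \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type g4dn.12xlarge \
  --key-name my-key \
  --security-group-ids sg-xxxxxxxx \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":200,"VolumeType":"gp3"}}]'

The generous EBS root volume leaves room for Docker images and model weights alongside the instance's local NVMe storage.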

Installing Ollama

To install and set up Ollama, follow these steps:

  1. Install Ollama:
   curl -fsSL https://ollama.com/install.sh | sudo sh
  2. Download the DeepSeek R1 model:
   ollama pull deepseek-r1:7b
  3. Test Ollama:
   ollama run deepseek-r1:7b
   ollama show deepseek-r1:7b
   ollama list
  4. Start the Ollama server:
   ollama serve
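
One note on ollama serve: on Linux the install script typically registers Ollama as a systemd service that is already listening on 127.0.0.1:11434, so running ollama serve manually may report that the address is in use. For the Open WebUI container (set up below) to reach Ollama through host.docker.internal, the server generally needs to listen on all interfaces. A minimal sketch, assuming the service is named ollama:

# Make the Ollama service listen on all interfaces so containers can reach it.
sudo mkdir -p /etc/systemd/system/ollama.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama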


Setting Up the UI for Ollama Model

Using Open WebUI with Docker provides an intuitive interface for interacting with the model.

Step 1: Install Docker

SSH into the instance and run:

sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
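
The usermod change only takes effect in a new login session. To use Docker without sudo right away and confirm the daemon is healthy, you can run something like:

# Start a shell with the docker group applied, then run a quick smoke test.
newgrp docker
docker run --rm hello-world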

Step 2: Pull the Open WebUI Docker Image

docker pull ghcr.io/open-webui/open-webui:main

Step 3: Run the Docker Container

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
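
To confirm the container started cleanly, check its status and tail its logs (the name open-webui matches the --name flag above):

docker ps --filter name=open-webui
docker logs -f open-webui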


Step 4: Access the WebUI

After the container starts, Open WebUI listens on port 3000 of the instance. If you open port 3000 in the instance's security group, you can reach it from your browser at:

http://<EC2-public-IP>:3000

Alternatively, leave the port closed and tunnel it over SSH, in which case the UI is available at http://localhost:3000 on your workstation.
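
A minimal tunneling sketch, assuming the default Ubuntu AMI user and a key pair file named my-key.pem (both placeholders):

# Forward local port 3000 to the instance, then browse to http://localhost:3000
ssh -i my-key.pem -L 3000:localhost:3000 ubuntu@<EC2-public-IP>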


Accessing Ollama via REST API

Ollama provides a REST API for managing and interacting with models.

Generate a Response

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Why is the sky blue?"
}'
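
By default the generate endpoint streams the answer as newline-delimited JSON chunks. For a single consolidated reply, streaming can be disabled and the text extracted with jq (assuming jq is installed on the instance):

curl -s http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}' | jq -r '.response'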

Chat with the Model

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:7b",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ]
}'
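
The chat endpoint is stateless, so a multi-turn conversation is carried by resending the full message history on each call. A sketch of a follow-up turn with streaming disabled (the assistant message below is illustrative):

curl -s http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:7b",
  "stream": false,
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" },
    { "role": "assistant", "content": "Because of Rayleigh scattering of sunlight." },
    { "role": "user", "content": "Summarize that in one sentence." }
  ]
}' | jq -r '.message.content'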

Monitor GPU usage during inference using the command:

nvidia-smi
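
For a rolling view while a prompt is being processed, nvidia-smi's query flags can refresh utilization and memory usage every second:

# Print per-GPU utilization and memory usage once per second
nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv -l 1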


Next Steps: Optimizations

  • Proxy Implementation: Put a reverse proxy (such as Nginx) in front of Open WebUI for improved performance and scalability; a sketch follows this list.
  • Stable Environment: Automate deployment workflows with tools like Terraform or AWS CloudFormation.
  • Cost Optimization: Leverage Spot Instances or Savings Plans to reduce operating costs.
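
As a starting point for the proxy item above, here is a minimal Nginx sketch, assuming Nginx is installed from apt and Open WebUI is still bound to port 3000; the server name, TLS setup, and hardening are placeholders left to the reader:

sudo apt-get install -y nginx
cat <<'EOF' | sudo tee /etc/nginx/sites-available/open-webui
server {
    listen 80;
    server_name _;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # Forward WebSocket upgrade headers used by the UI for streaming updates
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
EOF
sudo ln -sf /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/open-webui
sudo nginx -t && sudo systemctl reload nginx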
