The landscape of artificial intelligence, particularly in the domain of large language models (LLMs), is rapidly evolving. Accessing and deploying these powerful models can present significant challenges regarding computational resources and infrastructure management. This article, based on a recent video on my YouTube channel, explores how Google Cloud Platform (GCP) Model Garden on Vertex AI simplifies accessing and utilizing cutting-edge open-source models like DeepSeek. We will delve into the features of the Model Garden, the characteristics of DeepSeek, the synergistic benefits of using them together, and the associated considerations.
Google Cloud Model Garden: A Hub for AI Models
The Google Cloud Model Garden is a centralized repository designed to facilitate the discovery, deployment, and fine-tuning of various machine learning models. It hosts both proprietary models, such as the Gemini family, and, importantly, a wide array of open-source models. A key feature highlighted is the ability to deploy models with a single click onto Google Cloud’s infrastructure. This significantly reduces the overhead associated with setting up the necessary hardware and software environment.
Within the Model Garden, users can find models tailored for specific tasks by leveraging filters based on categories like language models, sentiment analysis, translation, and so on. Furthermore, the platform integrates with the popular HuggingFace Hub, granting access to its vast collection of hundreds of thousands of models. This integration allows users familiar with HuggingFace to seamlessly transition to deploying and experimenting with these models within the Google Cloud ecosystem. The Model Garden supports various deployment options, including Vertex AI’s fully managed environment, which abstracts away infrastructure management, and the possibility of deploying on existing Kubernetes clusters for users with pre-existing infrastructure.
Advantages and Disadvantages of the Model Garden on GCP
The Model Garden offers several notable advantages for users seeking to leverage AI models:
Simplified deployment: The primary advantage emphasized is the ease of deploying models. With just a few clicks, users can have a model running on Google Cloud infrastructure, eliminating the complexities of manual setup and configuration.
Access to a wide range of models: The platform serves as a central hub, providing access to both Google’s proprietary models and a vast selection of open-source models, including those from HuggingFace. This allows users to explore and experiment with different architectures and capabilities.
Managed infrastructure: Deployment through Vertex AI offers a fully managed environment, relieving users from the burden of infrastructure management and hardware considerations.
Flexibility in deployment options: The Model Garden caters to different user needs by offering deployment options on fully managed services or user-managed Kubernetes clusters.
Facilitates experimentation and evaluation: The platform provides tools for testing deployed models directly through a user interface, allowing for quick experimentation with different prompts and parameters.
There are also some considerations and potential limitations to keep in mind:
Cost Considerations: While deployment is simplified, the underlying infrastructure usage incurs costs. Users must be mindful of the hourly charges associated with the selected machine types and perform cost estimations based on their usage patterns.
Dependency on Google Cloud: Utilizing the Model Garden inherently ties users to the Google Cloud ecosystem.
Potential Learning Curve: While deployment is simplified, understanding the different deployment options and the intricacies of cloud infrastructure requires some learning.
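To make the cost point concrete, a quick back-of-the-envelope estimate helps before deploying. The hourly rate below is a placeholder, not a real price; always check the current Vertex AI pricing page for the machine type you select:

```python
# Rough monthly cost estimate for an always-on endpoint.
# HOURLY_RATE_USD is a hypothetical placeholder -- look up the real
# rate for your chosen machine type on the Vertex AI pricing page.
HOURLY_RATE_USD = 2.50   # assumed rate for a GPU-backed machine type
HOURS_PER_MONTH = 730    # average hours in a month

monthly_cost = HOURLY_RATE_USD * HOURS_PER_MONTH
print(f"Estimated monthly cost: ${monthly_cost:,.2f}")
```

Even a modest hourly rate adds up quickly for an endpoint that stays up around the clock, which is why undeploying idle endpoints matters.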
DeepSeek: Open Source Innovation in Language Models
DeepSeek is presented as a significant player in the open-source LLM space. Three key characteristics are highlighted:
Open Source: DeepSeek models are not only freely available for commercial use, but the process of their creation is also publicly documented. This transparency fosters community collaboration and allows for deeper understanding and scrutiny of the model development.
Resource Efficiency: DeepSeek models are claimed to have been trained using fewer computational resources (less hardware and infrastructure). This advancement makes it possible to develop highly capable models with reduced environmental impact and potentially lower development costs.
Distilled Models: DeepSeek has produced smaller, “Distilled” models that retain high reasoning capabilities and good performance compared to much larger models. This is a crucial aspect as it opens up possibilities for broader accessibility and deployment on less powerful hardware.
How to Use DeepSeek with the Model Garden
The Google Cloud Model Garden provides a streamlined way to use DeepSeek models, particularly the distilled versions available on Hugging Face. The process involves the following steps:
1. Accessing the Model Garden: Navigate to the Model Garden service within the Google Cloud Console.
2. Searching for DeepSeek Models: Utilize the search or filtering options to find DeepSeek models. The platform lists models such as deepseek-r1-distill-qwen-1.5b and deepseek-llm-7b-chat.
3. Selecting a Model: Choose the desired DeepSeek model.
4. Initiating Deployment: Click on the deployment option for the selected model.
5. Configuring Deployment Settings: Provide a name for the deployed model (endpoint name), select the deployment region (e.g., us-central1), and choose the appropriate machine type recommended for the model, often involving NVIDIA GPUs.
6. Deployment: Initiate the deployment process, which takes a few minutes to provision the necessary infrastructure.
7. Testing the Endpoint: Once deployed, the Model Garden provides an interface to interact with the model. Users can input prompts in a JSON format, specifying parameters like the prompt itself, maximum tokens, temperature, top_k, and so on. The platform returns the model’s predictions in a JSON response.
8. Programmatic Access (Python Example): The deployed DeepSeek model can also be queried programmatically using the Vertex AI SDK for Python. This involves:
- Initializing the Vertex AI platform with the Project ID and region.
- Defining the endpoint name (ID) of the deployed DeepSeek model.
- Creating an endpoint object using the Vertex AI Endpoint class.
- Constructing a JSON payload with the prompt and generation parameters.
- Calling the predict() method on the endpoint object with the payload.
- Processing the response to extract the model’s predictions.
```python
from google.cloud import aiplatform

# Identifiers for the deployed endpoint.
PROJECT_ID = "devhack-3f0c2"
REGION = "us-central1"
ENDPOINT_ID = "5340205930317348864"

# Initialize the Vertex AI SDK for this project and region.
aiplatform.init(project=PROJECT_ID, location=REGION)

# Build the fully qualified endpoint resource name and wrap it.
endpoint_name = f"projects/{PROJECT_ID}/locations/{REGION}/endpoints/{ENDPOINT_ID}"
endpoint = aiplatform.Endpoint(endpoint_name=endpoint_name)

# Request payload: the prompt plus generation parameters.
instances = [
    {
        "prompt": "create a song for my developer community",
        "max_tokens": 200,
        "temperature": 0.7,
    }
]

# Call the endpoint and print its predictions.
resp = endpoint.predict(instances=instances)
print(resp.predictions)
```
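For reference, the raw JSON payload you would paste into the Model Garden test interface (step 7) follows the same shape as the `instances` list above. The exact set of accepted fields depends on the serving container, so treat this as an illustrative sketch:

```json
{
  "instances": [
    {
      "prompt": "Explain what a distilled model is in one sentence.",
      "max_tokens": 200,
      "temperature": 0.7,
      "top_k": 40
    }
  ]
}
```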
Benefits of Using DeepSeek with the Model Garden
Combining DeepSeek with the Google Cloud Model Garden offers several compelling benefits:
Easy Access to Powerful Open-Source Models: The Model Garden simplifies the discovery and access to DeepSeek’s innovative language models, particularly those available on HuggingFace.
Rapid Deployment: The one-click deployment feature of the Model Garden significantly reduces the time and effort required to get DeepSeek models up and running on cloud infrastructure.
Simplified Infrastructure Management: By deploying through Vertex AI, users can leverage a fully managed environment, abstracting away the complexities of hardware and infrastructure management.
Cost-Effective Experimentation: The ability to quickly deploy and test DeepSeek’s distilled models allows users to evaluate their suitability for specific use cases without significant upfront investment in hardware or complex setup.
Flexibility in Usage: Once deployed, DeepSeek models can be accessed through a user-friendly interface for quick testing or programmatically via SDKs and REST APIs for integration into applications.
Facilitates Comparison and Evaluation: The Model Garden enables users to easily deploy and compare the performance and cost-effectiveness of different models, including various DeepSeek sizes, before committing to a specific solution.
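Since deployed endpoints are also reachable over REST, here is a hedged sketch of how the predict URL for an endpoint is constructed. The project, region, and endpoint ID reuse the values from the SDK example; sending the request requires an OAuth token, which is only hinted at in the comment:

```python
def predict_url(project_id: str, region: str, endpoint_id: str) -> str:
    """Build the Vertex AI REST :predict URL for a deployed endpoint."""
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{region}/"
        f"endpoints/{endpoint_id}:predict"
    )

url = predict_url("devhack-3f0c2", "us-central1", "5340205930317348864")
payload = {"instances": [{"prompt": "Hello", "max_tokens": 50}]}

# To actually send the request, attach an access token, e.g.:
#   curl -X POST \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -H "Content-Type: application/json" \
#     -d '{"instances": [...]}' "$URL"
print(url)
```

This is the same endpoint the SDK calls under the hood, which makes it straightforward to integrate the model from languages without a Vertex AI SDK.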
Conclusions
DeepSeek’s resource-efficient, high-performing distilled language models pair naturally with the user-friendly deployment capabilities of Model Garden on Vertex AI. The Model Garden significantly lowers the barrier to entry for utilizing advanced LLMs by simplifying deployment and infrastructure management. While the cost and scalability of directly deployed DeepSeek models deserve consideration, the platform offers a valuable avenue for experimenting with and deploying these models, especially the accessible distilled versions available through the HuggingFace integration. By leveraging the Model Garden, users can rapidly prototype, evaluate, and potentially scale applications powered by cutting-edge open-source language models like DeepSeek, ultimately fostering greater accessibility and innovation in the field of artificial intelligence.
I hope you find this information useful. Remember to share this blog post; your comments are always welcome.
Visit my social networks:
https://twitter.com/jggomezt
https://www.youtube.com/devhack
https://devhack.co/
More Info
https://youtu.be/Ur6kNST9MPQ?si=1FNMi21fiwgDFlgR
https://cloud.google.com/model-garden?hl=en