DEV Community

Gleb Otochkin

Posted on • Originally published at Medium

Hugging Face and OpenAI Models in AlloyDB Omni

In my previous post I showed how to register some Vertex AI models in the AlloyDB Omni database and how to generate text or embeddings using just SQL. But what if you want to call an AI model outside of Google infrastructure? In such a case you can still register the model using the credentials and endpoint provided by the third-party provider. In this post I will show you how to register a model hosted by OpenAI and one hosted on Hugging Face.

A quick reminder that everything I describe here is still in preview and some things may change later.

We’ve already covered how to install and enable the AI integration in AlloyDB Omni in the previous post. Here we continue assuming that the database is created, the AI integration is enabled, and the latest version of the google_ml_integration extension is installed. As of now, even when we work with third-party models we still need the integration with Google Cloud, because we use Google Cloud Secret Manager to store credentials such as API tokens for third-party providers. That integration uses the same service account we created for the Vertex AI authentication, and the same service account is used to retrieve our secrets from Secret Manager.

We will go step by step. First, we have to enable the Secret Manager API:

gleb@cloudshell:~ (gleb-test-project)$ gcloud services enable secretmanager.googleapis.com
Operation "operations/acat.p2-870304593371-39f6ec47-ab61-45b5-9d42-ba128a5af575" finished successfully.
gleb@cloudshell:~ (gleb-test-project)$

Now let us move to the OpenAI platform and generate an API key. We need to go to https://platform.openai.com/api-keys, push the “Create new secret key” button at the top right of the screen, give the key a name, and push the “Create secret key” button:

It will open another popup window with the key value. Copy it somewhere safe for later use, since it is shown only once, when you create the key.

Later you can change the key name and permissions, but you will not be able to read the key value again.

Then we can create a secret in Google Secret Manager using either the Google Cloud Console or gcloud commands like the following:

gleb@cloudshell:~ (gleb-test-project)$ gcloud secrets create openai-api-key 
Created secret [openai-api-key].
gleb@cloudshell:~ (gleb-test-project)$ echo -n "sk-omni-access-key-...........................8hP9" | gcloud secrets versions add openai-api-key --data-file=-
Created version [1] of the secret [openai-api-key].
gleb@cloudshell:~ (gleb-test-project)$

Then we grant our service account access to the secret. We are still using the same vertex-ai-connect Google service account we used for the Vertex AI integration.

gleb@cloudshell:~ (gleb-test-project)$ gcloud secrets add-iam-policy-binding openai-api-key --member='serviceAccount:vertex-ai-connect@gleb-test-project.iam.gserviceaccount.com' --role='roles/secretmanager.secretAccessor'
Updated IAM policy for secret [openai-api-key].
bindings:
- members:
  - serviceAccount:vertex-ai-connect@gleb-test-project.iam.gserviceaccount.com
  role: roles/secretmanager.secretAccessor
etag: BwYZr3y9lZ4=
version: 1
gleb@cloudshell:~ (gleb-test-project)$

Now we need to register the created secret using the google_ml.create_sm_secret procedure, which defines the mapping between the google_ml secret_id and the path to our secret in Google Secret Manager. In a psql session connected to the ai_demo database, execute:

ai_demo=# CALL google_ml.create_sm_secret( 
    secret_id => 'openai-api-key', 
    secret_path => 'projects/gleb-test-project/secrets/openai-api-key/versions/1'
);
CALL
ai_demo=#

For our test we are going to use one of the latest OpenAI models, gpt-4o. The API documentation for the chat interface is published here. From the documentation we take all required parameters and the input and output formats for our queries to the AI model.
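To make the documented formats concrete, here is a minimal Python sketch of the request payload and the response shape (the sample response below is abbreviated and illustrative, not real API output):

```python
import json

# Request payload for the chat completions endpoint; this is the same
# JSON document we will pass to google_ml.predict_row() in the database.
request_payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is AlloyDB Omni?"}],
}

# Abbreviated, illustrative shape of a chat completions response.
sample_response = json.loads("""
{
  "choices": [
    {"message": {"role": "assistant", "content": "AlloyDB Omni is ..."}}
  ]
}
""")

# Equivalent of the SQL JSON path ->'choices'->0->'message'->'content'
content = sample_response["choices"][0]["message"]["content"]
print(content)
```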

And here is the call to register the OpenAI gpt-4o model using the endpoint from the OpenAI documentation, the previously registered secret, and the model name:

CALL google_ml.create_model(
    model_id => 'gpt-4o', 
    model_provider => 'open_ai', 
    model_request_url =>'https://api.openai.com/v1/chat/completions', 
    model_type => 'generic', 
    model_auth_type => 'secret_manager', 
    model_auth_id => 'openai-api-key', 
    model_qualified_name => 'gpt-4o');

Now we can run the google_ml.predict_row() function, supplying the model id and the request in JSON format according to the OpenAI API specification. We send a simple request asking the model what AlloyDB Omni is:

select google_ml.predict_row('gpt-4o','{"model" : "gpt-4o", "messages" : [{"role": "user", "content": "What is AlloyDB Omni?"}]}')->'choices'->0->'message'->'content';

And here is the result. It looks factually correct and provides a very decent high-level overview of what AlloyDB Omni is:

"AlloyDB Omni is a version of AlloyDB that is designed to run in environments beyond Google Cloud Platform (GCP). With AlloyDB Omni, you can deploy and manage AlloyDB on various platforms including on-premises data centers, other cloud providers, and edge locations. This flexibility allows organizations to utilize AlloyDB's features and capabilities while leveraging existing infrastructure or taking advantage of multi-cloud strategies."

I want to give one more example for another provider, Hugging Face. To register a model from Hugging Face we have to create an API token there first. We open the https://huggingface.co/settings/tokens page and push the “New token” button. It opens a pop-up window where we name the token and optionally define its scopes.

After pushing the “Generate token” button we get the new token, which can be copied and used to create the secret.

The new token must be granted permission to make calls to the Inference endpoints. You can always modify the permissions using the “Manage” button.

The process of creating the secret is the same as the one we used for the OpenAI token and requires a few commands. Here is an example:

gcloud secrets create hugginface-api-key
echo -n "hf_....................oofm" | gcloud secrets versions add hugginface-api-key --data-file=-
gcloud secrets add-iam-policy-binding hugginface-api-key --member='serviceAccount:vertex-ai-connect@gleb-test-project.iam.gserviceaccount.com' --role='roles/secretmanager.secretAccessor'

And here is the command executed in a psql session connected to the ai_demo database:

CALL google_ml.create_sm_secret( 
    secret_id => 'hugginface-api-key', 
    secret_path => 'projects/gleb-test-project/secrets/hugginface-api-key/versions/1'
);

Now we can register one of the models served on the Hugging Face platform. Here I’ve decided to give the Mistral-7B model a try. For this model we can use the free serverless endpoints. To get the required parameters, such as the JSON payload format and the endpoint, we need to open the “Inference API (serverless)” page and click on “Curl” at the top of the pop-up window.

Then we can register our model in the AlloyDB Omni using the provided information.

CALL google_ml.create_model(
    model_id => 'Mistral-7B-Instruct-v0.3', 
    model_provider => 'custom', 
    model_request_url =>'https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3', 
    model_type => 'generic', 
    model_auth_type => 'secret_manager', 
    model_auth_id => 'hugginface-api-key', 
    model_qualified_name => 'Mistral-7B-Instruct-v0.3');

And we try to ask exactly the same question using the google_ml.predict_row() function. You may have noticed that I parse the JSON output, taking the first value from the array and the value for the “generated_text” key:

select google_ml.predict_row('Mistral-7B-Instruct-v0.3','{"inputs" : "What is AlloyDB Omni?"}')->0->'generated_text';
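Note that, unlike the OpenAI response, which is a single JSON object, the serverless Inference API returns a JSON array, so the ->0 step selects the first array element before reading the “generated_text” key. Here is a minimal Python sketch of the same extraction (with an abbreviated, illustrative response, not real model output):

```python
import json

# The serverless Inference API returns a JSON array of objects,
# each carrying a "generated_text" key.
sample_response = json.loads("""
[
  {"generated_text": "What is AlloyDB Omni? AlloyDB Omni is ..."}
]
""")

# Equivalent of the SQL JSON path ->0->'generated_text'
text = sample_response[0]["generated_text"]
print(text)
```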

And here is the response:

"What is AlloyDB Omni? AlloyDB Omni is a fully managed and scalable database service for running dynamic and complex workloads across multiple regions with low-latency, high-throughput, and consistent performance. It’s a global database offering that combines the high-performance and scalable characteristics of AlloyDB for PostgreSQL with the distributed and resilient capabilities of a global database. AlloyDB Omni is ideal for building and deploying applications that require low-latency and high"

The response is not entirely accurate and probably needs some grounding, but the wording itself is not too bad, and perhaps it could be improved with the addition of RAG or tuning on the Google Cloud documentation. But that is outside the scope of this post.

This is the second post in the series about AI on AlloyDB Omni. It has covered the registration of AI models hosted by third-party providers, using OpenAI and Hugging Face as examples. You can also watch on YouTube when I tried it for the first time with OpenAI models. Hopefully this is not the last post in the series. I have some ideas for the next post and, if you have ideas or want particular information about AlloyDB or AlloyDB Omni, please ping me here, on X (Twitter), or on LinkedIn. Stay tuned and happy testing.

