Have you ever dreamed of having an AI assistant inside your database, helping you optimize queries and explore vast datasets?
Well, that dream is about to become reality. In this article, I'll walk you step by step through the exciting world of integrating Amazon Bedrock with RDS Aurora MySQL. Get ready to discover how this Generative AI combination can revolutionize the way you interact with your data and optimize your SQL queries.
Let's start this journey toward the future of AI-powered databases!
What Is Amazon Bedrock?
Amazon Bedrock is a fully managed Generative AI service launched in 2023 that provides access to multiple cutting-edge AI models through a single API.
This service has many features and is constantly evolving and growing; here are the most important ones from my perspective:
- Access to AI models: It offers access to large language models (LLMs) and other AI models from leading companies: Anthropic, AI21 Labs, Meta, Cohere, Mistral AI, Stability AI, and Amazon.
- Unified API: Developers can access and use different AI models through a single interface, which simplifies integration. With Bedrock, switching from one model to another is just a matter of slightly changing the call, making it easy to test and evaluate which model best fits our use case.
- AWS integration: It integrates seamlessly with other AWS services.
- Security and privacy: A critical element for Generative AI; Bedrock includes options for secure data handling and regulatory compliance.
Prerequisites: Setting the Stage
Before we dive into the integration, let's make sure we have everything ready:
1. Access to the Anthropic Claude 3.5 Sonnet Model
Before starting the configuration process, it's important to request access to the models you'll need from the Bedrock console. For this exercise, I'll use the most advanced Anthropic model available in Bedrock, which is Claude 3.5 Sonnet.
This is done in the 'Model access' section of the Bedrock console, where we confirm the corresponding permission for that model.
Tip: Enable other models if you want to experiment!
2. RDS Aurora MySQL
We need to have a properly provisioned RDS Aurora MySQL cluster, with at least version 3.06 since that's the minimum version with support for this feature.
As part of this exercise, we'll use the popular MySQL test database called Sakila, so you should already have it properly installed on your cluster.
Configuration: Step by Step Toward Integration
1. Create an IAM Role and Policy
This integration requires 'AWS Identity and Access Management' (IAM) roles and policies to allow the Aurora MySQL cluster to access and use Amazon Bedrock services.
First, we create a new IAM policy that must contain the following:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "*"
        }
    ]
}
```
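If you prefer to script this step, the same policy document can be generated and saved programmatically before creating it with the AWS CLI. A minimal sketch in Python (the output file name is just an illustrative choice):

```python
import json

# Policy allowing the cluster's role to call the Bedrock InvokeModel API.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "*",  # Consider narrowing this to specific model ARNs.
        }
    ],
}

# Write it to a file you can pass to:
#   aws iam create-policy --policy-name BedrockInvokeModel \
#       --policy-document file://bedrock-invoke-policy.json
with open("bedrock-invoke-policy.json", "w") as f:
    json.dump(policy, f, indent=2)

print(json.dumps(policy))
```

Note the `Resource: "*"` wildcard: for production, restricting it to the ARNs of the models you actually use is a safer default.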
💡 Tip: Save this policy with the name BedrockInvokeModel. We'll use it later.
Now, we need to create a role. We must select 'Add Role to Database' as the use case, as shown in the image.
Next, in the permissions section, we need to associate the previously created policy.
The final result should look like this:
Take note of the ARN of this new role, as we'll use it later; its format is similar to: arn:aws:iam::XXXXX:role/RDSCallBedrockRole.
2. Create a Parameter Group in RDS
Now, we need a new custom DB cluster parameter group for our cluster (aws_default_bedrock_role is a cluster-level parameter):
Once created, we'll edit the aws_default_bedrock_role parameter in this group to set the ARN of the role we created in the previous step.
After that, we need to modify the RDS cluster to use our new custom parameter group.
As a final step, we associate the same role to the Aurora cluster.
It's important to reboot the cluster so that the changes we've made take effect.
Want to verify everything is in order? Run this command:
```sql
SHOW GLOBAL VARIABLES LIKE 'aws_default%';
```
And you should see the role we've been using as the value.
| Variable_name | Value |
|---|---|
| aws_default_bedrock_role | arn:aws:iam::XXXXX:role/RDSCallBedrockRole |
3. Create User and Permissions
For this exercise, I'll assume you already have a user with full permissions on the sakila database; we'll call it demo.
We need to grant the following permission to our user:
```sql
GRANT AWS_BEDROCK_ACCESS TO 'demo'@'%';
```
And then we set the effective privileges in our session.
```sql
SET ROLE AWS_BEDROCK_ACCESS;
```
If I tried to test Bedrock access right now, I'd get a connectivity error because the network configuration doesn't allow it yet. We'll fix this in the next step.
4. Network Configuration
There are several ways to configure communication between RDS and Bedrock, but for this case we'll use a VPC Endpoint.
First, we need to create a new endpoint and select com.amazonaws.<region>.bedrock-runtime as the service, since that's the service that exposes the InvokeModel API this integration uses. Be careful to select that one and not one of the other available options, such as bedrock-agent-runtime.
Then we need to associate it with our VPC, the appropriate subnets, and select the security group to attach. In my example, I used the database's security group — just make sure it has ingress traffic permissions authorized for itself.
That wraps up the VPC configuration.
Our Assistant: A SQL Expert at Your Disposal
Imagine having a virtual SQL assistant that not only optimizes your queries but also explains why it does so. That's exactly what we're going to build!
My motivation goes back to my years as a Database Administrator: even today, I regularly see developers write SQL statements that lack the minimum elements needed to be considered adequately optimized. So it hit me: what if we gave them a tool that takes a SQL statement and has an assistant recommend how to rewrite it properly, considering the database schema, while also measuring the impact on execution time?
Key Components:

- `query_history` table: Stores the before and after of your queries, along with their execution times.
- `generate_optimized_query` function: Leverages the power of Claude 3.5 Sonnet to improve your queries.
- `analyze_and_optimize_query` procedure: The brain of the operation. It:
  - Gathers schema information from the current database.
  - Generates an optimized version of the input query using the AI model.
  - Executes both the original and optimized queries, measuring their execution times.
  - Stores the results in the history table.
  - Displays a comparison of the queries and their execution times.
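The repository contains the full DDL; as a rough sketch, the history table only needs the two query texts and their timings, along these lines (column names here are illustrative, not necessarily those used in the repo):

```sql
CREATE TABLE query_history (
    id INT AUTO_INCREMENT PRIMARY KEY,
    original_query TEXT,
    optimized_query TEXT,
    original_execution_time DECIMAL(10, 6),
    optimized_execution_time DECIMAL(10, 6),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```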
Code
The complete source code is in the following GitHub repository. Here I'll share the most relevant parts.
First, we create a function that invokes the Claude 3.5 Sonnet model in Bedrock — note the model ID shown there. This function receives a JSON argument.
```sql
CREATE FUNCTION invoke_sonnet (request_body TEXT)
RETURNS TEXT
ALIAS AWS_BEDROCK_INVOKE_MODEL
MODEL ID 'anthropic.claude-3-5-sonnet-20240620-v1:0'
CONTENT_TYPE 'application/json'
ACCEPT 'application/json';
```
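Assuming model access and networking are already configured, you can smoke-test the function directly with a hand-written payload before building anything on top of it (a minimal, hypothetical prompt just to confirm the round trip works):

```sql
SELECT JSON_UNQUOTE(JSON_EXTRACT(
    invoke_sonnet('{"anthropic_version": "bedrock-2023-05-31", "max_tokens": 50,
                    "messages": [{"role": "user",
                                  "content": [{"type": "text", "text": "Reply with the word pong."}]}]}'),
    '$.content[0].text')) AS reply;
```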
This model ID can be obtained in at least two ways:

- Directly in the Bedrock console, where you can find the identifier in the base models section.
- Using the AWS CLI and running the following command (if you have the appropriate permissions):

```shell
aws bedrock list-foundation-models --query '*[].[modelName,modelId]' --out table
```

which returns the list of all available foundation models, for example:
| Model | Model Id |
|---|---|
| Titan Multimodal Embeddings G1 | amazon.titan-embed-image-v1 |
| SDXL 1.0 | stability.stable-diffusion-xl-v1:0 |
| Jurassic-2 Ultra | ai21.j2-ultra |
| Claude 3 Sonnet | anthropic.claude-3-sonnet-20240229-v1:0 |
| Claude 3 Haiku | anthropic.claude-3-haiku-20240307-v1:0 |
| Claude 3.5 Sonnet | anthropic.claude-3-5-sonnet-20240620-v1:0 |
| Llama 3 70B Instruct | meta.llama3-70b-instruct-v1:0 |
| Mistral Large (2402) | mistral.mistral-large-2402-v1:0 |
Our next function is generate_optimized_query. In it, we set up a prompt instructing the model to act as an optimization expert, taking a SQL statement and the corresponding schema information as input. I also limit the response to a maximum of 500 tokens and build the JSON according to the specification required by Claude 3.5 Sonnet. Getting predictable, structured responses from the LLM is key in this context; if you want to go deeper into techniques for achieving deterministic outputs from language models, I recommend checking out this article on deterministic LLMs.
```sql
DELIMITER //

CREATE FUNCTION generate_optimized_query(input_query TEXT, schema_info TEXT)
RETURNS TEXT
BEGIN
    DECLARE result TEXT;
    DECLARE prompt TEXT;
    DECLARE json_payload TEXT;

    SET prompt = CONCAT('Act as a MySQL database optimization expert. ',
                        'Given the following SQL query and schema information, ',
                        'provide an optimized version of the query. ',
                        'Only return the optimized query, without explanations. ',
                        'Original query: "', input_query, '" ',
                        'Schema information: "', schema_info, '"');

    SET json_payload = JSON_OBJECT(
        'anthropic_version', 'bedrock-2023-05-31',
        'max_tokens', 500,
        'messages', JSON_ARRAY(
            JSON_OBJECT(
                'role', 'user',
                'content', JSON_ARRAY(
                    JSON_OBJECT(
                        'type', 'text',
                        'text', prompt
                    )
                )
            )
        )
    );

    SET result = invoke_sonnet(json_payload);
    RETURN JSON_UNQUOTE(JSON_EXTRACT(result, '$.content[0].text'));
END //

DELIMITER ;
```
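If you want to prototype the prompt and payload outside MySQL first, the same structure the function assembles with JSON_OBJECT and JSON_ARRAY can be sketched in Python (a sketch mirroring the SQL above; the function name is mine, not part of the repo):

```python
import json

def build_claude_payload(input_query: str, schema_info: str, max_tokens: int = 500) -> str:
    """Build the request body the Claude Messages API expects,
    mirroring what generate_optimized_query assembles in SQL."""
    prompt = (
        "Act as a MySQL database optimization expert. "
        "Given the following SQL query and schema information, "
        "provide an optimized version of the query. "
        "Only return the optimized query, without explanations. "
        f'Original query: "{input_query}" '
        f'Schema information: "{schema_info}"'
    )
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
    })

payload = build_claude_payload("SELECT * FROM film", "film(film_id, title, rating)")
print(payload)
```

Iterating on the prompt this way is usually faster than redefining the stored function on every tweak.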
A simple way to find out which JSON each model expects is to go to the Bedrock console, select the providers list, click on the model of interest, and at the bottom you'll find an API example.
For our test, I'll send a SQL statement to our assistant to validate its behavior:
```sql
CALL analyze_and_optimize_query('
SELECT c.first_name, c.last_name,
       COUNT(r.rental_id) as rental_count,
       SUM(p.amount) as total_spent
FROM customer c
JOIN rental r ON c.customer_id = r.customer_id
JOIN payment p ON r.rental_id = p.rental_id
JOIN inventory i ON r.inventory_id = i.inventory_id
JOIN film f ON i.film_id = f.film_id
WHERE f.rating = "PG" AND YEAR(r.rental_date) = 2005
GROUP BY c.customer_id
HAVING rental_count > 5
ORDER BY total_spent DESC
LIMIT 10
');
```
The result we get is the following:
| Statement | Query | Execution Time |
|---|---|---|
| Original | SELECT c.first_name, c.last_name, COUNT(r.rental_id) as rental_count, SUM(p.amount) as total_spent FROM customer c JOIN rental r ON c.customer_id = r.customer_id JOIN payment p ON r.rental_id = p.rental_id JOIN inventory i ON r.inventory_id = i.inventory_id JOIN film f ON i.film_id = f.film_id WHERE f.rating = "PG" AND YEAR(r.rental_date) = 2005 GROUP BY c.customer_id HAVING rental_count > 5 ORDER BY total_spent DESC LIMIT 10 | 0.070305 |
| Optimized | SELECT c.first_name, c.last_name, COUNT(r.rental_id) as rental_count, SUM(p.amount) as total_spent FROM customer c JOIN rental r ON c.customer_id = r.customer_id JOIN payment p ON r.rental_id = p.rental_id JOIN inventory i ON r.inventory_id = i.inventory_id JOIN film f ON i.film_id = f.film_id WHERE f.rating = 'PG' AND r.rental_date >= '2005-01-01' AND r.rental_date < '2006-01-01' GROUP BY c.customer_id HAVING rental_count > 5 ORDER BY total_spent DESC LIMIT 10 | 0.052826 |
The result? A 24.86% improvement in execution time. Not bad for a few minutes of work!
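For reference, that figure is just the relative difference between the two timings from the table above, which you can reproduce in a couple of lines:

```python
original_time = 0.070305   # seconds, original query
optimized_time = 0.052826  # seconds, optimized query

# Relative improvement in execution time, as a percentage.
improvement = (original_time - optimized_time) / original_time * 100
print(f"{improvement:.2f}%")  # prints 24.86%
```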
Let's go a bit further — if I modify the prompt and ask it to justify its changes, we get the following reasoning:
- Eliminating double quotes: We changed double quotes to single quotes in `f.rating = 'PG'` to follow SQL best practices.
- Optimizing HAVING: We changed `HAVING rental_count > 5` to `HAVING COUNT(r.rental_id) > 5`. This avoids using an alias in the HAVING clause, which can be more efficient in some database engines.
- Optimizing ORDER BY: We changed `ORDER BY total_spent DESC` to `ORDER BY SUM(p.amount) DESC`. Just like with HAVING, using the expression directly instead of the alias can be more efficient in some cases.
- Indexes: Although we can't create indexes in this query, make sure indexes exist on the columns used in the JOIN, WHERE, and ORDER BY clauses. Specifically:
  - `customer_id` in the customer and rental tables
  - `rental_id` in the rental and payment tables
  - `inventory_id` in the rental and inventory tables
  - `film_id` in the inventory and film tables
  - `rating` in the film table
  - `rental_date` in the rental table
- Partitioning: If the rental table is very large, consider partitioning it by year to improve the performance of the `YEAR(r.rental_date) = 2005` clause.
- Materialized views: If this query runs frequently, consider creating a materialized view containing the pre-aggregated data.
These optimizations should improve query performance, especially when implemented alongside proper indexes and other database-level optimizations.
That's a pretty reasonable explanation — it details each action and its justification.
From here, we can keep evolving our assistant and send it more database context such as indexes, running queries, open connection counts, among many other things. This would further expand the possibilities this assistant opens up. To efficiently manage the connection pool to Aurora in high-concurrency scenarios, consider using RDS Proxy, which centralizes and optimizes database connection management.
Conclusions: The Future Is Now
Integrating Amazon Bedrock with Aurora MySQL isn't just a technical improvement — it's a huge leap in how we interact with our databases:
- Automatic optimization: Imagine having an expert DBA working 24/7 on your queries.
- Continuous learning: Every optimization is a lesson for your team.
- Time and resource savings: Less time debugging, more time innovating.
- Scalability: As your database grows, your assistant grows with you.
But this is just the beginning. Can you imagine integrating sentiment analysis into your SQL queries? RDS Aurora MySQL and PostgreSQL have support for Amazon Comprehend.
Or maybe generating automatic reports based on your data? Well, you can also integrate with SageMaker. The only limit is our imagination.
Next Steps:
- 🚀 Experiment with different Bedrock models
- 📊 Create dashboards that show query performance improvements
- 🤝 Share your experiences and learnings with the community
Start experimenting today!
I hope this article has been useful and that it motivates you to try new things on AWS!
Questions? Comments? Leave them below! And don't forget to share this article if you found it helpful.








