
Create and manage inference profiles on Amazon Bedrock

Amazon Bedrock inference profiles are a powerful feature that lets you track costs and usage metrics when invoking foundation models on Bedrock. Think of them as custom aliases for your models that provide detailed insight into how your AI applications consume resources. In this tutorial, I'll show you how to manage inference profiles.

What Are Inference Profiles?

Inference profiles in Amazon Bedrock serve two main purposes:

  • Cost Tracking: Monitor expenses per application, team, or project.
  • Usage Analytics: Track invocation patterns and resource consumption.

There are two types:

  • System-defined profiles: Created by AWS for cross-region inference (you can list these with the CLI, as shown below).

  • Application profiles: Custom profiles you create for your specific needs. You can't view application inference profiles in the Amazon Bedrock console, only through the CLI or API.
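
For example, you can list the system-defined profiles available in a region with this command:

aws bedrock list-inference-profiles --region us-east-1 --type-equals SYSTEM_DEFINED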

Each model supports different inference types. Here is a breakdown by inferenceTypesSupported:

inferenceTypesSupported is ['ON_DEMAND'] or ['ON_DEMAND', 'PROVISIONED']

  • ✅ Can invoke directly
  • ✅ Can create application inference profiles from these

inferenceTypesSupported is ['INFERENCE_PROFILE'] or ['PROVISIONED']

  • ❌ Cannot invoke directly
  • ❌ Cannot create application inference profiles directly
  • ✅ Can only access through system-defined profiles

If you want to create an application inference profile, inferenceTypesSupported must include ON_DEMAND.
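
To quickly check a single model, you can query its metadata with the CLI (amazon.nova-pro-v1:0 here is just an example model ID):

aws bedrock get-foundation-model --region us-east-1 --model-identifier amazon.nova-pro-v1:0 --query 'modelDetails.inferenceTypesSupported'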

You can use the function below to list all the models that support ON_DEMAND in your selected region.

Remember that each region has a different inferenceTypesSupported list for each model.

import boto3
import json
from botocore.exceptions import ClientError

def list_available_models():
    """List foundation models that support ON_DEMAND inference in the region."""
    bedrock_client = boto3.client('bedrock', region_name='us-east-1')
    try:
        response = bedrock_client.list_foundation_models()
        models = response.get('modelSummaries', [])
        available_models = []
        for model in models:
            # Only models that support ON_DEMAND can back an application inference profile
            if 'ON_DEMAND' in model.get('inferenceTypesSupported', []):
                model_info = {
                    'modelId': model.get('modelId'),
                    'modelArn': model.get('modelArn'),
                    'modelName': model.get('modelName'),
                    'inferenceTypes': model.get('inferenceTypesSupported', [])
                }
                available_models.append(model_info)
        print(available_models)
        return available_models

    except ClientError as e:
        print(f"Error listing models: {e}")
        return []
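
A quick usage example:

models = list_available_models()
print(f"Found {len(models)} ON_DEMAND models in us-east-1")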

In us-east-1, you can create profiles for almost every model from every provider, including Amazon Nova.

But in ap-southeast-1, you can only create profiles for 8 models, from Anthropic and Cohere.

Currently, you can only create an application inference profile through the Amazon Bedrock API. You can use this function to create a profile:

def create_inference_profile(
    region,
    profile_name,
    model_arn,
    model_name,
    tags=None,
    description=None
):
    bedrock_client = boto3.client('bedrock', region_name=region)

    try:
        # modelSource.copyFrom takes the ARN of the foundation model to wrap
        params = {
            'inferenceProfileName': profile_name,
            'modelSource': {
                'copyFrom': model_arn
            }
        }

        if description:
            params['description'] = description

        if tags:
            params['tags'] = tags

        response = bedrock_client.create_inference_profile(**params)

        profile_info = {
            'inferenceProfileArn': response['inferenceProfileArn'],
            'inferenceProfileId': response.get('inferenceProfileId'),
            'status': response.get('status'),
            'profileName': profile_name,
            'region': region,
            'model': model_name,
            'modelArn': model_arn
        }

        print(f"Profile's ARN: {profile_info['inferenceProfileArn']}")

        return profile_info
    except ClientError as e:
        print(f"ErrorCode: {e.response['Error']['Code']}")
        print(f"Error: {e.response['Error']['Message']}")

        return None

For example, here I'm creating a profile for Amazon Nova Pro in us-east-1 (remember to add your desired tags):

profile = create_inference_profile(
    region="us-east-1",
    profile_name="Nova Pro Production",
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0",
    model_name="Nova Pro",
    tags= [
        {
            'key': 'region',
            'value': 'us-east-1'
        },
        {
            'key': 'project',
            'value': 'my-project'
        },
        {
            'key': 'env',
            'value': 'production'
        },
    ],
    description="Nova Pro for my application running on production"
)

if profile:
    print(f"Inference Profile Created Successfully!")
    print(f"Name: {profile['profileName']}")
    print(f"ARN: {profile['inferenceProfileArn']}")
    print(f"Region: {profile['region']}")
    print(f"Model: {profile['model']}")

Unfortunately, as of August 2025, you cannot view application inference profiles in the AWS console. You can, however, list all of your profiles with this command:

aws bedrock list-inference-profiles --region us-east-1 --type-equals APPLICATION
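
If you prefer Python, a minimal boto3 equivalent (assuming the same region and credentials) looks like this:

bedrock_client = boto3.client('bedrock', region_name='us-east-1')
response = bedrock_client.list_inference_profiles(typeEquals='APPLICATION')
for profile in response.get('inferenceProfileSummaries', []):
    print(profile['inferenceProfileName'], profile['inferenceProfileArn'])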

Get more details and tags for your inference profile:

# Get detailed information about a specific profile
aws bedrock get-inference-profile --region us-east-1 --inference-profile-identifier arn:aws:bedrock:us-east-1:xxxxxxx:application-inference-profile/xxxxxxx

# View tags for your profile
aws bedrock list-tags-for-resource --region us-east-1 --resource-arn arn:aws:bedrock:us-east-1:xxxxxxx:application-inference-profile/xxxxxxxx
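
You can also add or update tags on an existing profile later; the tag key and value below are just illustrative:

aws bedrock tag-resource --region us-east-1 --resource-arn arn:aws:bedrock:us-east-1:xxxxxxx:application-inference-profile/xxxxxxx --tags key=env,value=staging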

Now let's try invoking a model with the newly created inference profile:

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

try:
    request_body = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "text": "What is the capital of Canada?"
                    }
                ]
            }
        ],
    }

    # Pass the application inference profile ARN as the modelId
    response = bedrock_runtime.invoke_model(
        modelId="arn:aws:bedrock:us-east-1:xxxxxxx:application-inference-profile/xxxxxxx",
        body=json.dumps(request_body),
        contentType="application/json",
    )

    response_body = json.loads(response["body"].read())
    print(response_body)
except ClientError as e:
    print(e.response['Error']['Code'])
    print(e.response['Error']['Message'])

As you can see, it works.
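
The Converse API accepts the profile ARN as the modelId as well, giving you a model-agnostic request shape. A minimal sketch, assuming the same profile ARN:

response = bedrock_runtime.converse(
    modelId="arn:aws:bedrock:us-east-1:xxxxxxx:application-inference-profile/xxxxxxx",
    messages=[{"role": "user", "content": [{"text": "What is the capital of Canada?"}]}],
)
print(response["output"]["message"]["content"][0]["text"])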

Now all costs incurred through this inference profile will be tagged, and you can manage the cost of your application more efficiently.
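
Note that tags only appear in Cost Explorer after they are activated as cost allocation tags, for example (using the project tag key from earlier):

aws ce update-cost-allocation-tags-status --cost-allocation-tags-status TagKey=project,Status=Active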

You can delete the profile with this command:

aws bedrock delete-inference-profile --inference-profile-identifier "arn:aws:bedrock:us-east-1:xxxxxx:application-inference-profile/xxxxxx"  --region us-east-1

There is also the AWS Bedrock Inference Profile Management Tool (reference 3 below) to help you manage inference profiles; have a look!

Create an inference profile with AWS CDK

from aws_cdk import (
    Stack,
    custom_resources as cr,
    aws_iam as iam,
    CfnOutput,
)
from constructs import Construct


class BedrockInferenceProfileStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        self.profile_name = "inference-profile"
        self.inference_profile = cr.AwsCustomResource(
            self,
            "BedrockInferenceProfile",
            on_create=cr.AwsSdkCall(
                service="Bedrock",
                action="createInferenceProfile",
                parameters={
                    "inferenceProfileName": self.profile_name,
                    "description": "Claude Inference Profile",
                    "modelSource": {
                        "copyFrom": "arn:aws:bedrock:ap-southeast-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0"
                    },
                },
                # Use the returned profile ARN as the physical resource ID so the
                # update and delete calls below can reference the profile by ARN
                # (get/delete expect the profile ARN or ID, not its name)
                physical_resource_id=cr.PhysicalResourceId.from_response(
                    "inferenceProfileArn"
                ),
            ),
            on_update=cr.AwsSdkCall(
                service="Bedrock",
                action="getInferenceProfile",
                # PhysicalResourceIdReference resolves to the profile ARN at deploy time
                parameters={
                    "inferenceProfileIdentifier": cr.PhysicalResourceIdReference()
                },
                physical_resource_id=cr.PhysicalResourceId.from_response(
                    "inferenceProfileArn"
                ),
            ),
            on_delete=cr.AwsSdkCall(
                service="Bedrock",
                action="deleteInferenceProfile",
                parameters={
                    "inferenceProfileIdentifier": cr.PhysicalResourceIdReference()
                },
            ),
            policy=cr.AwsCustomResourcePolicy.from_statements(
                [
                    iam.PolicyStatement(
                        actions=[
                            "bedrock:CreateInferenceProfile",
                            "bedrock:DeleteInferenceProfile",
                            "bedrock:GetInferenceProfile",
                        ],
                        resources=["*"],
                    )
                ]
            ),
        )

        self.profileARN = self.inference_profile.get_response_field(
            "inferenceProfileArn"
        )

        CfnOutput(
            self,
            "ProfileARN",
            value=self.profileARN,
            description="Inference Profile",
        )

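To deploy the stack, wire it into a minimal CDK app entry point and run cdk deploy. The module name in the import is an assumption; adjust it to your project layout:

# app.py
import aws_cdk as cdk
from bedrock_inference_profile_stack import BedrockInferenceProfileStack  # assumed module name

app = cdk.App()
BedrockInferenceProfileStack(app, "BedrockInferenceProfileStack")
app.synth()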

Hope you can create your inference profile successfully!

References:

  1. https://aws.amazon.com/blogs/machine-learning/track-allocate-and-manage-your-generative-ai-cost-and-usage-with-amazon-bedrock/
  2. https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles.html
  3. https://github.com/aws-samples/sample-bedrock-inference-profile-mgmt-tool
