DEV Community

Cover image for Bedrock Inference Profiles — From Flying Blind to Understanding Your AWS Bedrock Usage in Detail

Bedrock Inference Profiles — From Flying Blind to Understanding Your AWS Bedrock Usage in Detail

The starting point: zero visibility

Every organization is using Claude, Codex, Cursor, Cline, and many other wonderful tools right now to improve team productivity and, by a rippling effect, the products they ship. Most of these tools are developer-oriented, but a new generation of applications has been increasingly growing in numbers — agentic applications.

Rather than interacting with a simple, naive bank app where you click through menus and filters that barely work, you can now talk with them in natural language and ask things like "where is my money going this month?" That kind of feature is raising the bar on what a product delivers. And in my case as an AWS consultant, most of these agentic features are powered behind the scenes by AWS Bedrock — though you can also find them on other clouds with their respective LLMs, Ollama, or even direct OpenAI or Anthropic keys if you want.

Right now, several applications in your organization may already be using AWS Bedrock and improving your products. But are you aware of the ROI? Or do you only see a total bill without being able to answer:

  • Which app is calling Claude the most?
  • Which team is burning tokens on Opus when Haiku would do?
  • Which application ran a loop last night and spent $40 on something trivial?

This is where Bedrock inference profiles help. You can now see, in every account and region, which users and applications are using which models — and how much it's costing. Isn't that nice?

# IAM Caller Department Model Requests Input Tokens Output Tokens Est. Cost
1 sgomez Digital Banking OPUS 64,736 8,747,730 39,697,047 $1,007.47
2 vrodriguez Core Banking OPUS 56,296 9,829,576 21,772,001 $528.79
3 smartinez Fraud & Risk OPUS 25,166 13,829,878 8,723,278 $249.95
4 cperez Payments & Transfers OPUS 13,459 2,476,498 8,256,859 $216.67
5 agarcia Mobile Banking OPUS 14,247 6,575,622

The solution: three layers

For the impatient builders: GitHub Code

Layer 1 — Capture every invocation

This is the foundation. You tell Bedrock to log every API call to two places:

  • S3 (invocations/ prefix) — durable, cheap, queryable with Athena
  • CloudWatch (/aws/bedrock/invocations) — for real-time tailing and alerting

Now you can see in CloudWatch each request and response, including the tokens used. Without this step, everything else is blind. Run it once per account/region.

Layer 2 — Tag every invocation with an identity

This is the key insight. Logging alone tells you that a call happened and how many tokens it used — but not who made it from an application perspective. All calls to the same model look identical in the logs.

Application inference profiles solve this. Each app gets a profile that is a named copy of a system model, carrying tags:

tags: { app: "community-bank", team: "cto" }
Enter fullscreen mode Exit fullscreen mode

The app swaps its modelId for the profile ARN. That's the only change required — no code changes, just configuration. The profile ARN flows into every log entry, so every token is now stamped with an app and team identity.

That's all you need. From here you can go to Athena and start answering your questions.

Layer 3 — Query the data with Athena

Once logs are flowing with identity stamps, Athena turns your S3 bucket into a queryable warehouse:

  • Tokens per app per day
  • Estimated cost per app in USD
  • Spend per IAM caller — catches developers calling Bedrock directly from their laptops, not through a profile

Bonus: cross-region inference and data residency

There's one more concept worth understanding before you design your profiles. The source model ID used to create a profile has a geographic prefix:

us.anthropic.claude-haiku-4-5-20251001-v1:0
eu.anthropic.claude-haiku-4-5-20251001-v1:0
ap.anthropic.claude-haiku-4-5-20251001-v1:0
Enter fullscreen mode Exit fullscreen mode

That prefix is not cosmetic. Bedrock has three geographic routing pools and when you copy from one of these system profiles, your application profile inherits that routing — meaning Bedrock automatically distributes traffic across regions within that pool for higher availability and better throughput.

Prefix Pool Use case
us. US cross-region Production apps, US data
eu. EU cross-region GDPR, EU data residency
ap. AP cross-region Asia-Pacific latency

If you have GDPR obligations or customers in Europe, source your profiles from eu. and data never leaves EU regions. This turns inference profiles into a data governance tool, not just a cost governance tool.


The governance arc

  1. Before — bill arrives, no idea who spent what
  2. Enable logging — raw data flows, but it's all ARNs and roles, still hard to read
  3. Add profiles — one config change per app unlocks full attribution, no code changes
  4. Athena — token-level drill-down, estimated USD per app/day, per IAM caller
  5. Cost Explorer — activate the app/team tags for budget-level visibility and alerts

From nothing to full observability. That's the journey.


Implementation Guide

Technical setup for Bedrock cost tracking using application inference profiles and invocation logging.

Prerequisites

  • AWS CLI configured with permissions for IAM, Bedrock, S3, CloudWatch Logs
  • An S3 bucket for invocation logs
  • Region: us-east-1 (or override via AWS_REGION)

Architecture

bedrock-cost-tracking/
├── 01-enable-logging.sh             # Step 1: enable invocation logging (run once per account)
├── 02-create-inference-profiles.sh  # Step 2: create per-app/team profiles
├── 03-validate.sh                   # Step 3: verify the full pipeline
├── invoke_profiles.py               # Fire test calls through each app profile
├── check_logs.py                    # Tail CloudWatch logs and summarise usage
├── trust-policy.json                # IAM trust policy for BedrockInvocationLoggingRole
├── permissions-policy.json          # IAM permissions for the logging role
└── athena-usage-query.sql           # Token-level usage + estimated cost per app/day
Enter fullscreen mode Exit fullscreen mode

Setup

Step 1 — Enable invocation logging (once per account/region)

export BEDROCK_LOG_BUCKET=your-bedrock-logs-bucket
bash 01-enable-logging.sh
Enter fullscreen mode Exit fullscreen mode

This creates:

  • CloudWatch log group /aws/bedrock/invocations (90-day retention)
  • IAM role BedrockInvocationLoggingRole with S3 + CloudWatch write permissions
  • Bedrock model invocation logging configuration pointing at both destinations

Step 2 — Create application inference profiles

bash 02-create-inference-profiles.sh
Enter fullscreen mode Exit fullscreen mode

Each profile is tagged with app and team. The script prints the ARN for each created profile — copy these into invoke_profiles.py and athena-usage-query.sql.

Step 3 — Validate

# Set profile ARNs from step 2 output
export PROFILE_ALPHA=arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:application-inference-profile/<ID>
export PROFILE_BETA=arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:application-inference-profile/<ID>
export PROFILE_AIHUB=arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:application-inference-profile/<ID>

python3 invoke_profiles.py   # fire one test call per profile
# wait ~90 seconds for logs to appear
python3 check_logs.py        # confirm attribution is working
bash 03-validate.sh          # full infrastructure check
Enter fullscreen mode Exit fullscreen mode

Add a new app profile

1. Find the source model ARN

aws bedrock list-inference-profiles --region us-east-1 --type-equals SYSTEM_DEFINED \
  --query 'inferenceProfileSummaries[?contains(inferenceProfileId, `haiku`) == `true`].{ID:inferenceProfileId,ARN:inferenceProfileArn}' \
  --output table
Enter fullscreen mode Exit fullscreen mode

Use the full ARN from this output — the copyFrom field requires it.

2. Create the profile

Replace app-gamma, gamma, and data-science with your app name and team.

SOURCE_ARN="arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:inference-profile/us.anthropic.claude-haiku-4-5-20251001-v1:0"

aws bedrock create-inference-profile \
  --region us-east-1 \
  --inference-profile-name "app-gamma-claude-haiku" \
  --description "Gamma app profile for Claude Haiku 4.5" \
  --model-source "{\"copyFrom\":\"$SOURCE_ARN\"}" \
  --tags "[{\"key\":\"app\",\"value\":\"gamma\"},{\"key\":\"team\",\"value\":\"data-science\"}]" \
  --query '{ARN:inferenceProfileArn,Status:status}' \
  --output table
Enter fullscreen mode Exit fullscreen mode

3. Add the new ARN to athena-usage-query.sql — add a row to the pricing CTE with the ARN, app label, and token prices.

4. Run a validation call

python3 invoke_profiles.py   # add the new profile to APP_PROFILES first
python3 check_logs.py        # confirm it shows up attributed correctly
Enter fullscreen mode Exit fullscreen mode

Migrate an existing app (one config change, no code changes)

Apps pass the profile ARN as modelId — the API call shape is identical to a direct model call:

# Before: direct model call (no attribution)
BEDROCK_MODEL_ID=us.anthropic.claude-haiku-4-5-20251001-v1:0

# After: routed through profile (tagged in logs and Cost Explorer)
BEDROCK_MODEL_ID=arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:application-inference-profile/<PROFILE_ID>
Enter fullscreen mode Exit fullscreen mode

No SDK changes. The response shape is identical.

Cost attribution

Cost Explorer (budget-level)

  1. Activate cost allocation tags in AWS Billing console: app, team
  2. Cost Explorer → Group by tag → filter by app or team
  3. Set budget alerts per tag value to catch spend anomalies early

Athena (token-level)

Run athena-usage-query.sql against your S3 log bucket for daily token counts and estimated USD cost per app profile. The query file contains five progressive steps:

Query What it shows
Step 1 Create the Athena table over S3 logs
Step 2 Daily usage by profile ARN (tokens + requests)
Step 3 Estimated cost per app using a pricing CTE
Step 4 Per IAM caller — tracks developer/role-level spend
Step 5 Combined view: user + app + estimated cost in one query

Pricing reference (us-east-1, on-demand)

Model Input per 1K tokens Output per 1K tokens
Claude Haiku 4.5 $0.00080 $0.00400
Claude Sonnet 4.5 $0.00300 $0.01500
Claude Opus 4 $0.01500 $0.07500

IAM-level attribution

Invocation logs capture the caller's IAM ARN (identity.arn). This enables per-developer or per-role spend queries — useful for tracking AI coding assistant usage separately from production workloads without needing a separate profile per developer.

Key concepts

System inference profiles are AWS-managed cross-region routing profiles (e.g. us.anthropic.claude-haiku-4-5-20251001-v1:0). They route to the best available region automatically for resilience.

Application inference profiles are account-owned copies of a system profile that add tagging metadata. They are the attribution layer — there is no routing or model-behavior difference.

Invocation logging captures every request/response at the Bedrock service level, including token counts, model ID (which resolves to the profile ARN when profiles are used), and the IAM identity of the caller.


There has never been a better time to be an engineer and create value in society through software.

If you enjoyed the articles, visit my blog at jorgetovar.dev.

Top comments (1)

Collapse
 
topstar_ai profile image
Luis

This is a really useful breakdown of something that’s often confusing in Bedrock—how inference profiles actually shape cost visibility and model usage tracking. The “flying blind” analogy is accurate; without proper observability, it’s very easy to lose track of which models are driving spend and latency. I also like the focus on bringing structure to usage patterns, since most teams only notice inefficiencies after bills spike. One thing I’d be curious about is how inference profiles behave in multi-environment setups (dev/staging/prod) and whether they can be reliably used for governance at scale. Overall, a practical guide for real-world Bedrock usage.