AWS Bedrock Pricing Guide: Compare Amazon, OpenAI, and Anthropic

*Cover image: three cloud providers with pricing charts.*

Quick note: Prices and limits change periodically. The examples below are accurate as of Fall 2025, but always check each provider’s current pricing page before you deploy.

Ever tried to decipher AWS Bedrock’s pricing and felt lost? This post explains tokens, billing, and real costs in plain English so you can plan with confidence.

About This Guide

Generative AI is everywhere, but pricing can be baffling if you’ve never used it before. As someone who helps businesses integrate AWS Bedrock into real workflows, I’ve seen that understanding a few basics (tokens, model limits, and billing) can save time and money.

This article explains, in plain English:

  • What AWS Bedrock is
  • What “tokens” are and why they’re the key to pricing
  • How much Amazon’s models actually cost
  • How it compares to OpenAI and Anthropic

By the end, pricing charts will feel clearer and less intimidating.


What Is AWS Bedrock?

AWS Bedrock is like an app store for AI brains. Amazon runs the heavy hardware for you. You just:

  • Pick a model you want to use (Amazon’s or a partner’s)
  • Send it text and get results back
  • Pay only for what you use

This is different from OpenAI or Anthropic, which only give you their own models. With Bedrock, you can mix and match: Amazon’s models for cost-effective tasks, third-party models for niche needs, all behind a consistent AWS experience.

  • Why teams like it: unified security, familiar AWS tooling, consolidated billing, and flexibility to switch models without re-architecting everything.
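
If you're curious what that catalog looks like from code, here's a minimal sketch using boto3 (Python's AWS SDK). It assumes your AWS credentials are configured and Bedrock is enabled; the region is illustrative:

```python
# A minimal sketch: browse the Bedrock model catalog with boto3.
# Assumes AWS credentials are configured; region is illustrative.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# List the foundation models available to your account, by provider.
for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(f'{model["providerName"]:<12} {model["modelId"]}')
```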

What Are “Tokens”? Think Text Lego Bricks

AI systems don’t actually “read” or “think” in whole words the way people do. Instead, they split everything you type into tiny pieces called tokens. You can picture tokens like Lego bricks of text. The model stacks and rearranges these bricks internally to understand and generate language.

  • How big is a token?

In English, one token is usually about 4 characters (roughly three-quarters of a word). Short words like “cat” are one token, while long words may be broken into several tokens.

  • How many tokens per page?

As a rough guide, 1,000 tokens ≈ 750 words (around one to two typed pages). This is why you’ll often see prices quoted per 1,000 tokens.

  • How are you billed?

You’re charged for the tokens you send in (input tokens) and the tokens the AI sends back (output tokens). The more text you send or receive, the more tokens you use.

  • A simple example:

You send a 300-token question and get a 200-token answer. Your total usage for that exchange is 500 tokens. Multiply the input and output counts by the provider’s respective per-1,000-token prices to get the cost.
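
If you want to sanity-check that math in code, here's a tiny sketch. The 4-characters-per-token rule is only a heuristic, and the prices are the Nova Micro On-Demand rates quoted later in this post:

```python
# Rough cost math for the 300-in / 200-out exchange above.

def estimate_tokens(text: str) -> int:
    """Heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

input_tokens, output_tokens = 300, 200
price_in, price_out = 0.000035, 0.00014  # Nova Micro On-Demand, per 1K tokens

cost = (input_tokens / 1000) * price_in + (output_tokens / 1000) * price_out
print(f"One exchange: ${cost:.7f}")  # $0.0000385
```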

This token system is essentially the same across AWS Bedrock, OpenAI, and Anthropic, which makes apples-to-apples comparisons possible.


What Are Embeddings (and How Do They Differ From Regular Text)?

When you send regular text to an AI model (like a chat prompt), you’re asking it to read your words and generate a human-readable response. Behind the scenes, the model briefly turns your text into numbers to “think,” but you never see those numbers; it just returns more text.

An embedding is when you explicitly ask a model to convert text into a vector (a list of numbers) that captures meaning. Think of it like getting the GPS coordinates of your text in “meaning space.” Texts that mean similar things end up with embeddings that are close together, even if the wording is totally different. This makes embeddings incredibly useful for search, recommendations, and retrieval-augmented generation (RAG), where you want to find information by meaning instead of just matching keywords.

  • Billing difference:

With regular text requests, you pay for input + output tokens. With embeddings, you typically pay for input only, because the model isn’t generating a big block of text; it’s just converting your input into vectors. In AWS Bedrock, Titan Embeddings is the service that does this, and it’s one of the cheapest ways anywhere to turn text into numbers for search, recommendations, or RAG.
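
As a concrete sketch, here's roughly what a Titan Embeddings call looks like through the bedrock-runtime InvokeModel API. The model ID and response fields below match Titan Text Embeddings V2 as of this writing, but verify them against the current AWS docs:

```python
# A sketch: turn text into a vector with Titan Embeddings.
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({"inputText": "Where is my order?"}),
)

result = json.loads(response["body"].read())
vector = result["embedding"]            # the "GPS coordinates" in meaning space
tokens = result["inputTextTokenCount"]  # the input tokens you're billed for
print(f"{tokens} input tokens -> {len(vector)}-dimensional vector")
```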


How AWS Bedrock Charges You

There’s no flat monthly fee. You pay for how much text you process, measured in tokens. Bedrock offers three ways to pay:

  • On-Demand: Pay per request. Ideal for pilots, prototypes, and unpredictable workloads.
  • Batch: Cheaper, but runs in the background (great for large jobs overnight or non-urgent processing).
  • Provisioned: Reserve a set capacity for steady, high-volume needs to get predictable performance and pricing.

How to Choose the Right Pricing Mode

  • On-Demand if you’re experimenting or traffic is unpredictable.
  • Batch if you have large but non-urgent jobs, such as summarizing thousands of support tickets overnight.
  • Provisioned if you’ve reached steady, high usage and want guaranteed throughput at a lower per-token cost.

Many teams start with On-Demand, switch to Batch for bulk tasks, and move to Provisioned once patterns stabilize.

Every model shows a price per 1,000 tokens, with separate rates for input and output. (Embeddings are typically “input only,” since you’re just turning your text into vectors.)
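
To see where those numbers come from in practice, here's a sketch of a single On-Demand request through Bedrock's Converse API; the usage block in the response is what you'd log for cost tracking. The model ID is illustrative, and some regions require an inference profile ID (e.g., us.amazon.nova-micro-v1:0) instead:

```python
# A sketch: one On-Demand request, with the token usage you pay for.
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.converse(
    modelId="amazon.nova-micro-v1:0",  # illustrative; check your region
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize our refund policy in two sentences."}],
    }],
)

# Input and output tokens are billed at separate rates.
usage = response["usage"]
print(usage["inputTokens"], usage["outputTokens"], usage["totalTokens"])
```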


Actual Prices for Amazon’s Models

(Accurate as of Fall 2025 and subject to change.)

Nova (writes or rewrites text)

| Model | Cost In (per 1K) | Cost Out (per 1K) | Notes |
| --- | --- | --- | --- |
| Nova Micro (On-Demand) | $0.000035 | $0.00014 | |
| Nova Micro (Batch) | $0.0000175 | $0.00007 | ~50% off, asynchronous |
| Nova Lite | $0.00006 | $0.00024 | |
| Nova Pro | $0.0008 | $0.0032 | |
  • Batch is roughly half price but runs asynchronously, so it’s best for large, non-interactive jobs.

Titan Embeddings (turns text into numbers for search/recommendation)

  • $0.00002 per 1,000 tokens (input only)
  • Can handle about 8,000 tokens in one go
  • Among the cheapest options anywhere for embeddings, which you’d use for search, recommendations, or retrieval-augmented generation (RAG)

How Much Text Can You Send at Once?

Each model has a limit on how much text it can handle in a single request (input + output combined). If you go over, the request fails, so people often split long content into chunks.

  • Some examples:

| Model | Max Text It Can Handle |
| --- | --- |
| Titan Text Lite | ~4,000 tokens |
| Titan Express | ~8,000 tokens |
| Titan Premier | ~32,000 tokens |
| Claude Sonnet 4 (Anthropic) | ~65,000 tokens |
| Claude 3.7 Sonnet | ~131,000 tokens |

Tip: When you set max_tokens (the longest answer the AI can produce), remember it counts against that total too. If your prompt is huge, reduce the expected answer length — or vice versa.
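
Here's a simple chunking sketch using the same 4-characters-per-token heuristic. The 3,000-token limit is illustrative; size it to your model and leave headroom for the answer:

```python
# A sketch: split long text so each piece fits a model's limit.

def chunk_text(text: str, max_input_tokens: int = 3000) -> list[str]:
    """Split text into pieces of roughly max_input_tokens each (~4 chars/token)."""
    max_chars = max_input_tokens * 4
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)]

# With a ~4,000-token model and a 500-token answer budget,
# capping prompts near 3,000 tokens leaves comfortable headroom.
pieces = chunk_text("..." * 10_000)
print(len(pieces), "chunks")
```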


How AWS Bedrock Compares to OpenAI and Anthropic

Here’s a simple side-by-side using common choices:

| Platform / Model | Cost In (per 1K) | Cost Out (per 1K) | Max Text (context) |
| --- | --- | --- | --- |
| AWS Nova Micro | $0.000035 | $0.00014 | A few thousand tokens |
| AWS Titan Embeddings | $0.00002 | n/a | ~8K tokens input |
| OpenAI GPT-3.5 Turbo | ~$0.0005 | ~$0.0015 | ~16K tokens |
| OpenAI GPT-4 (8K) | ~$0.03 | ~$0.06 | ~8K tokens |
| Anthropic Claude 3.7 Sonnet | ~$0.003 | ~$0.015 | Up to ~131K tokens |
  • Bottom line: Amazon’s models are extremely cheap per token for everyday tasks. OpenAI and Anthropic tend to cost more but can offer more advanced reasoning, longer context windows, or specific strengths depending on the version.

Which Amazon Model Should You Use?

  • Nova (Micro/Lite/Pro): General text generation (chatbots, summaries, Q&A, rewriting).
    • Micro is the cheapest and surprisingly capable for simple tasks.
    • Lite and Pro give you higher quality and longer, more nuanced outputs.
  • Titan Text (Lite/Express/Premier): Amazon’s text generators at different sizes.
    • Good when you want a native Amazon LLM with predictable behavior and AWS integration.
  • Titan Embeddings: Converts text into vectors (numbers) for search, recommendations, or RAG.
    • If you’re building a knowledge base, semantic search, or “retrieve then answer” system, this is your workhorse.
  • Rerank: Improves the order of search results.
    • If you already have a list of candidates (e.g., documents or products), Rerank helps surface the most relevant ones first.

Rule of thumb: Pick the smallest model that meets your quality needs; you’ll save money without sacrificing results.


Tips to Keep Your Costs Down

  • Measure your tokens: Log the average input and output sizes per request. Even a slight trim (e.g., tighter prompts, shorter answers) adds up quickly at scale.
  • Use Batch mode when latency isn’t critical: If your job doesn’t require instant answers (such as bulk summarization or nightly indexing), Batch pricing can reduce costs by around 50% on supported models.
  • Set max_tokens thoughtfully: Cap the length of outputs to prevent accidentally generating long, costly responses (see the sketch after this list).
  • Chunk long documents: If you’re near a model’s limit, split content into smaller pieces. This avoids errors and gives you more predictable usage.
  • Cache repeated instructions (where supported): If your prompts contain long, repetitive system instructions, consider caching mechanisms or template your prompts to reduce overhead.
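
For the max_tokens tip specifically, here's how the cap looks with the Converse API (the model ID and 256-token cap are illustrative):

```python
# A sketch: cap output length so a chatty model can't run up the bill.
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.converse(
    modelId="amazon.nova-micro-v1:0",  # illustrative; check your region
    messages=[{"role": "user", "content": [{"text": "One-line summary, please."}]}],
    inferenceConfig={"maxTokens": 256},  # hard cap on billable output tokens
)
```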

A Real Example (With a Quick Calculator Mindset)

Let’s say you do 100,000 requests a month, each with 300 tokens in and 200 tokens out, using Nova Micro:

  • Input = 100,000 × 300 = 30,000,000 tokens

30,000,000 ÷ 1,000 = 30,000 units × $0.000035 = $1.05

  • Output = 100,000 × 200 = 20,000,000 tokens

20,000,000 ÷ 1,000 = 20,000 units × $0.00014 = $2.80

  • Total monthly cost ≈ $3.85 for 100k interactions.
  • Why this matters: You can estimate costs before you ship. Multiply your expected token usage by the per-1,000-token price. If it’s too high, reduce output length, prune your inputs, or use Batch/Provisioned pricing.
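
The same arithmetic as a small reusable helper, using the per-1K prices quoted in this post (swap in current rates before relying on it):

```python
# A sketch: estimate monthly cost from token counts and per-1K prices.

def monthly_cost(requests: int, tokens_in: int, tokens_out: int,
                 price_in: float, price_out: float) -> float:
    cost_in = requests * tokens_in / 1000 * price_in
    cost_out = requests * tokens_out / 1000 * price_out
    return cost_in + cost_out

# The Nova Micro example: 100k requests, 300 tokens in / 200 out.
print(f"${monthly_cost(100_000, 300, 200, 0.000035, 0.00014):.2f}")  # $3.85
```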

Common Mistakes Beginners Make With AI Pricing

  • Forgetting that output tokens count too; a long answer costs more.
  • Using a vast, expensive model for a simple task where a smaller one would suffice.
  • Not setting max_tokens and getting unexpected multi-paragraph answers.
  • Sending the same long instruction text with every prompt instead of caching or templating it.

Avoiding these pitfalls can reduce bills by 30–50% without compromising quality.


Conclusion: Taking the Mystery Out of AI Pricing

AI pricing doesn’t have to feel like a secret code. Once you know that tokens are the unit of usage, input and output are billed separately, each model has a max text limit, and different models are built for different tasks, the numbers start to make sense. You can forecast costs, choose the right model for your job, and avoid surprises.

Need hands-on help integrating AWS Bedrock into your workflow? My team at Forged Concepts specializes in building scalable AI solutions for businesses.

If you’d like to discuss AWS Bedrock further, including choosing the right model or setting up your infrastructure, please feel free to connect with me on LinkedIn or email Forged Concepts at info@forgedconcepts.com. We are happy to share insights or point you to resources.

References

  1. AWS Bedrock Official Documentation
  2. OpenAI Pricing
  3. Anthropic Pricing Guide

Author's Note: I'm Jerry Warren, founder of Forged Concepts. We help startups and growing businesses unlock the full potential of the cloud by designing secure, scalable, and cost-efficient solutions on AWS. From DevOps integration to managed cloud services, our mission is to simplify complexity so your team can focus on innovation and growth.

If this guide was valuable, I'd love for you to explore more on our website or connect with us at Forged Concepts directly!
