When working with large language models through Amazon Bedrock, understanding token consumption helps you manage costs and stay within model limits. While the Bedrock console provides token counts after each API call, developers need a way to measure tokens before sending requests, especially when building applications that process large volumes of text or require precise truncation.
Amazon Bedrock offers a CountTokens API that provides exact token measurements for the supported models, currently:
- Anthropic Claude Sonnet 4
- Anthropic Claude Opus 4
- Anthropic Claude 3.7 Sonnet
- Anthropic Claude 3.5 Sonnet
- Anthropic Claude 3.5 Haiku
However, integrating this API into development workflows means getting the request syntax right and, when truncation is needed, implementing an efficient search algorithm. This is where ttok4bedrock comes in: a command-line tool and Python library that makes token counting as simple as it should be.
```bash
# Count tokens (default: Claude Sonnet 4)
uv run ttok4bedrock "Hello, world!"
# Output: 11

# Count from stdin
echo "Count these tokens" | uv run ttok4bedrock
cat document.txt | uv run ttok4bedrock

# Truncate to N tokens
uv run ttok4bedrock -t 100 "Very long text..."
cat large.txt | uv run ttok4bedrock -t 100 > truncated.txt

# Use a specific Bedrock model (full model ID)
uv run ttok4bedrock -m anthropic.claude-3-5-sonnet-20241022-v2:0 "Text"
uv run ttok4bedrock -m anthropic.claude-3-7-sonnet-20250219-v1:0 "Text"

# Specify the AWS Region (uses your default if not specified)
uv run ttok4bedrock --aws-region us-west-2 "Text"
```
## Standing on the Shoulders of Giants
Simon Willison's `ttok` has become a standard tool for token counting with OpenAI models, valued for its simplicity and versatility. Rather than creating something entirely new, I built `ttok4bedrock` as a drop-in replacement that maintains complete compatibility with `ttok`'s interface while leveraging Bedrock's native CountTokens API.
The goal was straightforward: preserve the developer experience that made `ttok` successful while adapting to Bedrock's requirements. This means you can switch from `ttok "Count my tokens"` to `ttok4bedrock "Count my tokens"` without changing your scripts or learning new commands. The tool automatically handles AWS authentication using the standard `boto3` credential chain and can work with any AWS Region.
## Solving the Truncation Challenge
One of the most requested features in token counting tools is intelligent truncation: cutting text to fit within a specific token limit. This is harder than it sounds when the only primitive available is counting tokens.
The truncation algorithm I implemented uses an adaptive approach that minimizes API calls while achieving exact results. It begins by analyzing text characteristics such as punctuation density and word length to estimate the character-to-token ratio. Through iterative refinement using linear interpolation, it finds the precise character boundary where the token count meets your target. The algorithm typically converges in 3-5 API calls for most texts, with built-in caching to eliminate redundant API requests.
For developers, this means you can pipe any text through the tool with a token limit and get perfectly truncated output: `cat large_document.txt | ttok4bedrock -t 1000 > truncated.txt`. The truncation is exact, not approximate, ensuring you maximize the content within your token budget.
The tool includes self-imposed limits to prevent runaway API usage, capping truncation attempts at 20 API calls. In practice, this limit is rarely reached, but it provides a safety net against unexpected edge cases or malformed input.
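To make the approach concrete, here is a self-contained sketch of interpolation-based truncation under a call budget. The function and variable names are mine, not the tool's actual implementation, and the `count_tokens` callable is pluggable: in ttok4bedrock it would be backed by the Bedrock CountTokens API, while here any `text -> int` function works, which keeps the sketch runnable offline.

```python
MAX_API_CALLS = 20  # self-imposed cap, mirroring the tool's safety net


def truncate_to_tokens(text, limit, count_tokens):
    """Return the longest prefix of `text` whose token count is <= limit.

    Uses linear interpolation on the observed characters-per-token ratio,
    with a per-length cache so no prefix is ever counted twice.
    """
    cache = {}
    calls = 0

    def count(n):
        nonlocal calls
        if n not in cache:
            if calls >= MAX_API_CALLS:
                raise RuntimeError("truncation exceeded the API call budget")
            calls += 1
            cache[n] = count_tokens(text[:n])
        return cache[n]

    total = count(len(text))
    if total <= limit:
        return text  # already fits, nothing to cut

    lo, hi = 0, len(text)  # lo: known to fit, hi: known too long
    guess = int(len(text) * limit / total)  # first chars-per-token estimate
    while hi - lo > 1:
        guess = min(max(guess, lo + 1), hi - 1)  # keep probe inside (lo, hi)
        tokens = count(guess)
        if tokens <= limit:
            lo = guess
        else:
            hi = guess
        # re-estimate the boundary from the freshest ratio
        guess = int(guess * limit / max(tokens, 1))
    return text[:lo]
```

With a whitespace-based stand-in counter, a 6-word input truncated to 3 tokens converges in four counting calls, which matches the 3-5 call behavior described above.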
## Handling AWS Integration Properly
Working with AWS services requires attention to authentication and configuration patterns. The tool respects the standard AWS credential chain, working seamlessly whether you're using environment variables, AWS profiles, IAM roles on EC2, or any other standard authentication method. Region selection follows the same precedence rules as other AWS tools, checking command-line arguments, environment variables, and configuration files in that order.
The tool requires minimal IAM permissions: just `bedrock:CountTokens` on the foundation model resources. This follows the principle of least privilege while keeping setup simple.
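A minimal policy along these lines should suffice; the wildcard resource ARN is my assumption for illustration, and in practice you would scope it to the specific foundation models you call:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "bedrock:CountTokens",
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}
```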
## Technical Details That Matter
An interesting quirk of the Amazon Bedrock CountTokens API is that it wraps text in message structures, adding approximately 7 tokens of overhead. This overhead is invisible but affects the count, potentially confusing developers who expect the raw token count of their text. The `ttok4bedrock` library automatically detects and subtracts this overhead, providing intuitive results that match what developers expect.
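A sketch of the correction might look like the following. The request shape mirrors Bedrock's Converse-style input but should be treated as an assumption to check against the AWS documentation, and the fake client stands in for a `boto3` `bedrock-runtime` client so the example runs without AWS credentials:

```python
MESSAGE_OVERHEAD = 7  # approximate wrapper tokens, per the observation above


def count_text_tokens(client, model_id, text):
    """Count tokens for bare text, correcting for the message wrapper.

    `client` is expected to expose a count_tokens operation shaped like
    bedrock-runtime's (a modelId plus a Converse-style input).
    """
    response = client.count_tokens(
        modelId=model_id,
        input={"converse": {"messages": [
            {"role": "user", "content": [{"text": text}]}
        ]}},
    )
    # Subtract the wrapper overhead so the caller sees the text's own count.
    return max(response["inputTokens"] - MESSAGE_OVERHEAD, 0)


class FakeBedrockClient:
    """Offline stand-in: pretends every 4 characters cost one token."""

    def count_tokens(self, modelId, input):
        text = input["converse"]["messages"][0]["content"][0]["text"]
        return {"inputTokens": len(text) // 4 + MESSAGE_OVERHEAD}
```

Swapping `FakeBedrockClient` for a real `bedrock-runtime` client is the only change needed to count against an actual model.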
Model selection is explicit: `ttok4bedrock -m anthropic.claude-3-5-sonnet-20241022-v2:0 "Your text here"`. Claude Sonnet 4 is the default if no model is specified.
## Integration Patterns for Developers
For Python developers, the library offers the same API as `ttok`, making migration trivial. Import `ttok4bedrock` as `ttok`, and your existing code continues to work with Bedrock models for the functionality provided (token counting and truncation).
The CLI tool fits naturally into Unix-style pipelines, accepting input from stdin and outputting to stdout. This design enables powerful compositions with other text processing tools, making it easy to integrate token counting into existing workflows. Whether you're building a document processing pipeline or analyzing prompt efficiency, the tool adapts to your needs.
## Practical Applications
Token counting might seem like a utility concern, but it enables important optimizations in production systems. Accurately measuring tokens before API calls helps with prompt and context engineering, allowing developers to maximize the information within model context windows. For applications that process user-generated content, pre-flight token counting prevents errors and improves user experience by providing immediate feedback about text length.
The truncation capability is particularly valuable for RAG (Retrieval-Augmented Generation) systems where you need to fit retrieved documents within prompt limits. Instead of crude character-based cutting that might break mid-word or mid-sentence, the tool provides clean truncation at exact token boundaries.
## Getting Started
Installation is straightforward using `uv`, the fast Python package installer. After cloning the repository and running `uv sync`, you're ready to count tokens.
For teams already using `ttok` in their workflows, migration is as simple as aliasing `ttok4bedrock` to `ttok`. The identical command-line interface means existing scripts, documentation, and muscle memory all transfer seamlessly.
The next time you're working with Claude models on Amazon Bedrock and need to count or truncate tokens, give `ttok4bedrock` a try.