How to Sample Multiple Completions (n) Directly from Claude API without a for loop?

brando90 — Thu, 15 Aug 2024 00:38:16 +0000

I'm using the Anthropic Claude API and I'm trying to generate multiple completions (n completions) for a given prompt in a single API call. OpenAI's API provides an n parameter in their sampling settings to achieve this, but I can't find an equivalent option in the Claude API.

My Current Approach:

I'm currently using a retry mechanism to handle potential errors during API calls, which looks like this:

from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential

def before_sleep(retry_state):
    print(f"(Tenacity) Retry, error that caused it: {retry_state.outcome.exception()}")

def should_retry(exception: BaseException) -> bool:
    # Retrying cannot fix an over-long prompt, so let that error propagate immediately.
    exception_str = str(exception)
    return not ("prompt is too long" in exception_str and "400" in exception_str)

# tenacity invokes retry_error_callback only once all attempts are exhausted,
# so the early exit is expressed through the `retry` predicate instead.
@retry(stop=stop_after_attempt(20), wait=wait_exponential(multiplier=2, max=256),
       before_sleep=before_sleep, retry=retry_if_exception(should_retry))
def call_to_anthropic_client_api_with_retry(gen: AnthropicGenerator, prompt: str) -> dict:
    response = gen.llm.messages.create(
        model=gen.model,
        max_tokens=gen.sampling_params.max_tokens,
        system=gen.system_prompt,
        messages=[
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
        temperature=gen.sampling_params.temperature,
        top_p=gen.sampling_params.top_p,
        n=gen.sampling_params.n,  # Intended to generate multiple completions
        stop_sequences=gen.sampling_params.stop[:3],
    )
    return response

Problem:

I can't find an n parameter in the Anthropic API documentation that allows generating multiple completions in one request.

Questions:

Does the Claude API support generating multiple completions (n completions) directly within a single API call?
If not, is there a recommended workaround or best practice to achieve this without resorting to looping over multiple requests?

Any guidance or suggestions would be greatly appreciated!
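For context, here is a minimal sketch of the fallback I would otherwise use, assuming the Messages API really has no n parameter: send n independent requests concurrently from a thread pool. It is still n requests under the hood; it only hides the loop and overlaps the latency. The names sample_n, make_request, and max_workers are hypothetical, and make_request stands in for any zero-argument callable that performs one API call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def sample_n(make_request: Callable[[], T], n: int, max_workers: int = 8) -> List[T]:
    """Run make_request n times concurrently and return the results in submission order."""
    if n <= 0:
        return []
    # Cap the pool size so a large n does not open an unbounded number of connections.
    with ThreadPoolExecutor(max_workers=min(n, max_workers)) as pool:
        futures = [pool.submit(make_request) for _ in range(n)]
        # .result() re-raises any exception raised inside the worker thread.
        return [future.result() for future in futures]
```

With the code above, make_request could be something like lambda: call_to_anthropic_client_api_with_retry(gen, prompt), with the n= argument removed from the messages.create call. Keeping max_workers small seems prudent, since every request counts against the account's rate limits.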

Cross-posted: https://stackoverflow.com/questions/78873304/how-to-sample-multiple-completions-n-directly-from-claude-api-without-a-for-lo

Best practices to handle prompts that are too long for the LLM API (e.g., Anthropic, OpenAI)?

brando90 — Fri, 02 Aug 2024 21:41:10 +0000

I am working with the Anthropic API to process text prompts, but I keep encountering the following error when my prompt exceeds the maximum token limit:

Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'prompt is too long: 200936 tokens > 199999 maximum'}}

I need to ensure my prompts are within the 199,999 token limit before sending them to the API. Here's what I have so far:

I generate a long prompt with approximately 150K words.
I use the count_tokens method to check the token count.
If the token count exceeds the limit, I trim the prompt and retry.

Here's the code I'm using:

from anthropic_bedrock import AnthropicBedrock
import anthropic
import random
import string

# Function to generate a random word
def generate_random_word(length):
    return ''.join(random.choices(string.ascii_lowercase, k=length))

# Generate ~150K words
words = [generate_random_word(random.randint(3, 10)) for _ in range(150000)]
print(f'Number of words: {len(words)}')

test_prompt = ' '.join(words)

# Function to count the number of tokens
def count_number_tokens(prompt: str, verbose: bool = False) -> tuple[int, int]:
    bedrock_client = AnthropicBedrock()
    anthropic_client = anthropic.Client()

    try:
        token_count_bedrock = bedrock_client.count_tokens(prompt)
    except Exception as e:
        token_count_bedrock = -1
        if verbose:
            print(f"Error counting tokens with Bedrock: {e}")

    try:
        token_count_anthropic = anthropic_client.count_tokens(prompt)
    except Exception as e:
        token_count_anthropic = -1
        if verbose:
            print(f"Error counting tokens with Anthropic: {e}")

    if verbose:
        print(f'token_count_bedrock={token_count_bedrock}, token_count_anthropic={token_count_anthropic}')
    return token_count_bedrock, token_count_anthropic

# Maximum token limit
max_tokens = 199_999

# Function to trim the prompt
def trim_prompt(prompt: str, max_tokens: int) -> str:
    while prompt:
        _, token_count = count_number_tokens(prompt)
        if token_count < 0:
            # count_number_tokens returns -1 on failure; don't mistake that for a short prompt
            raise RuntimeError("Token counting failed; cannot trim the prompt safely.")
        if token_count <= max_tokens:
            break
        # Reduce the size of the prompt; the length strictly decreases, so the loop terminates
        prompt = prompt[:max(len(prompt) - 1000, 0)]
    return prompt

# Trim the prompt to fit within the token limit
trimmed_prompt = trim_prompt(test_prompt, max_tokens)

# Final check
final_token_count_bedrock, final_token_count_anthropic = count_number_tokens(trimmed_prompt, verbose=True)
print(f'Final prompt length: {len(trimmed_prompt)} characters')
print(f'Final token count (Bedrock): {final_token_count_bedrock}')
print(f'Final token count (Anthropic): {final_token_count_anthropic}')

Questions:

Is there a more efficient way to handle prompts that are too long for the Anthropic API?
Are there any best practices or recommended approaches for trimming prompts to fit within token limits?
How can I ensure my approach does not inadvertently lead to an infinite loop or excessive API calls?
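
One idea on the efficiency question: the 400 error message itself reports both the actual and the maximum token count, so a first cut can be estimated from that ratio without any extra counting calls. The sketch below assumes the message keeps the format shown in the error above and that tokens are spread roughly evenly over the characters; estimate_safe_length and the safety margin are hypothetical names and values, not part of any SDK.

```python
import re
from typing import Optional

# Matches the message format from the error shown earlier in this post.
_TOO_LONG = re.compile(r"prompt is too long: (\d+) tokens > (\d+) maximum")


def estimate_safe_length(prompt: str, error_message: str, safety: float = 0.98) -> Optional[int]:
    """Estimate a character length that should fit, or None if the message does not match."""
    match = _TOO_LONG.search(error_message)
    if match is None:
        return None
    used, limit = int(match.group(1)), int(match.group(2))
    # Scale the character length by the token ratio, then shave off a small margin.
    return int(len(prompt) * limit / used * safety)
```

The result is only an estimate, so prompt[:estimate_safe_length(...)] would still need one confirming token count or a retry on failure.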

Any guidance or suggestions would be greatly appreciated!

Note: it's basically impossible to deduce the exact token index at which to truncate the string, since those companies don't return it, as far as I know.
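
Since the token index is not available, one alternative to trimming 1000 characters at a time is a binary search over the character length, which bounds the number of counting calls at roughly log2(len(prompt)), so on the order of 20 calls for a million-character prompt. This is only a sketch: trim_to_token_limit is a hypothetical helper, count_tokens is any callable that returns a token count for a string, and the search assumes a longer prefix never has fewer tokens than a shorter one.

```python
from typing import Callable


def trim_to_token_limit(prompt: str, max_tokens: int, count_tokens: Callable[[str], int]) -> str:
    """Return the longest prefix of prompt whose token count is at most max_tokens."""
    if count_tokens(prompt) <= max_tokens:
        return prompt
    # Invariant: prompt[:lo] fits within the limit, prompt[:hi] does not.
    lo, hi = 0, len(prompt)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if count_tokens(prompt[:mid]) <= max_tokens:
            lo = mid
        else:
            hi = mid
    return prompt[:lo]
```

Passing the counter in as an argument keeps the search independent of any particular client, and it makes the loop easy to test with a cheap stand-in such as a whitespace word count.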

Ref:

Anthropic's Discord Channel: https://discord.com/channels/1072196207201501266/1268741091377676309

DEV Community: brando90
