5 Hidden AI API Parameters (Most Developers Miss)

#ai #api #llm #programming

You're probably only using model and messages. Here are 5 advanced parameters that'll make your AI app faster, cheaper, and smarter.

Most developers treat AI APIs like a black box: send a prompt, get a response.

But the real magic is in the parameters. Here are 5 you should be using:

1. temperature — Control Randomness

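from openai import OpenAI  # assumes the OpenAI Python SDK (v1+)

# Reads OPENAI_API_KEY from the environment; pass base_url=... for other OpenAI-compatible providers
client = OpenAI()

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Write a poem"}],
    temperature=0.2  # lower = more focused and predictable, higher = more varied
)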

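Use case: 0.0 for code generation, 0.7 for creative writing. Even at 0.0, output is only near-deterministic; most providers don't guarantee identical responses across calls.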
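
To see why lower values are more predictable, here's a minimal sketch of how temperature is commonly applied: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The function name and logit values are made up for illustration, and real providers may implement sampling differently (temperature 0 is usually special-cased, since you can't divide by zero).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
focused = softmax_with_temperature(logits, 0.2)
varied = softmax_with_temperature(logits, 1.0)
print([round(p, 3) for p in focused])
print([round(p, 3) for p in varied])
```

With these made-up numbers, the top token gets over 99% of the probability at 0.2, compared with roughly 63% at 1.0.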

2. max_tokens — Prevent Runaway Costs

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Explain AI"}],
    max_tokens=200  # ← Limit response length
)

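Why it matters: A verbose model can burn through your budget, so cap it. The cap applies to output tokens only, and a reply that reaches it is cut off mid-sentence rather than summarized.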
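
When the cap is hit, OpenAI-compatible APIs typically report it by setting finish_reason to "length" on the choice. Here's a small sketch of checking for that. The helper name was_truncated is hypothetical, and the stand-in response only mimics the fields used here so the snippet runs without a network call.

```python
from types import SimpleNamespace

def was_truncated(response):
    """Return True if the first choice stopped because it hit max_tokens."""
    return response.choices[0].finish_reason == "length"

# Stand-in for a real API response (assumption: same attribute layout as the SDK object)
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(
        finish_reason="length",
        message=SimpleNamespace(content="AI is the field of"),
    )]
)

if was_truncated(fake_response):
    print("Reply hit the max_tokens cap; raise it or ask for a shorter answer.")
```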

3. top_p — Nucleus Sampling

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Suggest a startup idea"}],
    top_p=0.1  # Sample only from the most likely tokens covering 10% of probability mass
)

What it does: Instead of sampling from every possible next token, the model samples only from the smallest set of most likely tokens whose combined probability reaches p. Lower values make output more focused.

Rule of thumb: Use temperature OR top_p, not both.
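
Here's a rough sketch of the filtering step behind nucleus sampling, to make the "probability mass" idea concrete. The function name and the toy distribution are invented for illustration; a real model does this over its whole vocabulary and then samples from what's left.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1 again
    return {token: p / cumulative for token, p in kept.items()}

# Hypothetical next-token distribution
probs = {"app": 0.5, "tool": 0.3, "bot": 0.15, "zebra": 0.05}
print(nucleus_filter(probs, 0.1))  # only "app" survives
print(nucleus_filter(probs, 0.9))  # "app", "tool", and "bot" survive
```

Notice that top_p=0.1 here leaves a single candidate, which is why very low values behave almost like picking the most likely token every time.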

4. frequency_penalty — Reduce Repetition

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "List 10 AI tools"}],
    frequency_penalty=0.5  # Penalize tokens more the more often they've already appeared
)

Use case: Great for listicles, brainstorming, or any task where repetition sucks.
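
Under the hood, a common formulation subtracts the penalty from a token's score once for every time that token has already been generated. The sketch below assumes that formulation; the function name, scores, and token list are hypothetical, and providers may apply the penalty differently.

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, penalty):
    """Lower each token's score by penalty times the number of times it was already generated."""
    counts = Counter(generated_tokens)
    return {token: score - penalty * counts[token] for token, score in logits.items()}

logits = {"ChatGPT": 3.0, "Copilot": 2.5, "Claude": 2.2}  # hypothetical scores
already_generated = ["ChatGPT", "ChatGPT", "Copilot"]
adjusted = apply_frequency_penalty(logits, already_generated, 0.5)
print(adjusted)
```

With a penalty of 0.5, the two tokens that already appeared drop to 2.0 each, so the unused "Claude" ends up with the highest score.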

5. stream — Make Your App Feel 3x Faster

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True  # ← Stream tokens as they're generated
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

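Why it matters: Total generation time stays the same, but users see the first token right away (say, in under 500 ms) instead of staring at a blank screen for 3 seconds while the full response is generated.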
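
If you want to check that claim for your own setup, here's a sketch that collects the streamed text and records the time to the first piece of content. consume_stream and fake_stream are hypothetical names; the fake stream only imitates the chunk shape used in the loop above, so you can run it without an API key and then swap in a real response.

```python
import time
from types import SimpleNamespace

def consume_stream(chunks):
    """Collect streamed text and record how long the first piece of content took."""
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some providers send chunks with no choices (e.g. usage info)
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            if first_token_at is None:
                first_token_at = time.perf_counter() - start
            parts.append(piece)
    return "".join(parts), first_token_at

def fake_stream():
    """Stand-in for a real streaming response: yields chunks with a small delay."""
    for piece in ["Quantum ", "computing ", None, "uses qubits."]:
        time.sleep(0.01)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=piece))])

text, ttft = consume_stream(fake_stream())
print(text)
print(f"first token after {ttft:.3f}s")
```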

The "Pro" Setup

Here's how I combine them in a production app. top_p is deliberately left out, following the rule of thumb above:

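# Assumes `client` is an OpenAI-compatible SDK client created elsewhere
def ask_ai(prompt, task_type="general"):
    """Return a streaming response tuned for the task type ("general", "code", or "creative")."""
    params = {
        "model": "deepseek-v4-flash",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,     # caller iterates over chunks as they arrive
        "max_tokens": 500,  # hard cap on output length and cost
    }

    if task_type == "code":
        params["temperature"] = 0.0  # most predictable output
    elif task_type == "creative":
        params["temperature"] = 0.8  # more varied wording
        params["frequency_penalty"] = 0.3  # discourage repeated tokens
    # Any other task_type keeps the provider's default sampling settings

    return client.chat.completions.create(**params)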