DEV Community

Hassann

Posted on • Originally published at apidog.com

How to Use the Hy3 Preview API for Free?

Tencent open-sourced Hy3 Preview on April 22, 2026. Within a day, OpenRouter listed it as a fully free endpoint—no credit card, no metering, no trial window. You can call the same 295B-parameter Mixture-of-Experts model powering Tencent’s Yuanbao app and CodeBuddy assistant from your own code, for $0.


This guide shows you how to use the Hy3 Preview API for free via OpenRouter, the Hugging Face Space, or by running the raw Hy3 repo yourself. It covers Hy3’s reasoning modes and walks through testing the API in Apidog—no throwaway scripts required.

If you want the fastest way to a working response, jump to “Step-by-step: call Hy3 Preview free on OpenRouter.”

TL;DR

  • Hy3 Preview is free on OpenRouter under model ID tencent/hy3-preview:free, with $0 input and output pricing.
  • Mixture-of-Experts model: 295B total parameters, 21B active, 192 experts (top-8 routing), and a 256K-token context window.
  • Three reasoning modes: no_think (fast), low, and high (deep chain-of-thought for coding/agent tasks).
  • Benchmarks: SWE-bench Verified 74.4, Terminal-Bench 2.0 54.4, GPQA Diamond 87.2, MMLU 87.42.
  • Three free ways to run: OpenRouter free tier, Hugging Face Hy3-preview Space, or local inference with vLLM and open weights.
  • Apidog is a great fit with the OpenRouter endpoint: Hy3 uses the OpenAI Chat Completions schema; just point your request at OpenRouter.

What is Hy3 Preview?

Hy3 Preview is Tencent’s first flagship release from their restructured Hunyuan foundation-model team, now led by Yao Shunyu (ex-OpenAI). It’s Tencent’s most capable model to date—a direct response to top Chinese open-weights models from DeepSeek, Alibaba, and Zhipu.

Hy3 Model Overview

Technical highlights (official model card):

  • Architecture: Mixture-of-Experts, 80 layers + one MTP layer, 64 attention heads with grouped-query attention.
  • Parameters: 295B total, 21B active per forward pass.
  • Experts: 192 specialists, top-8 routing per token.
  • Context: 256K tokens (262,144 on OpenRouter).
  • Tokenizer: 120,832 entries, BF16 precision.
  • License: Tencent Hy Community License (commercial use allowed with conditions).

Agentic training and improved RL infrastructure enable strong results on SWE-bench Verified, Terminal-Bench 2.0, and code/shell tasks—close to top closed models.
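To make the "top-8 of 192" routing concrete, here is a toy Python sketch of the common top-k gating scheme: score all experts, keep the eight best, and softmax-normalize their weights. This illustrates the general MoE routing pattern, not Tencent's actual router implementation.

```python
import math
import random

def top_k_route(logits, k=8):
    """Pick the k highest-scoring experts for one token and
    softmax-normalize their gate weights over just those k."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)                 # subtract max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return {expert: w / total for expert, w in zip(top, exps)}

random.seed(0)
router_logits = [random.gauss(0, 1) for _ in range(192)]   # one token's router scores
gates = top_k_route(router_logits, k=8)                    # 8 experts, weights sum to 1
```

Each token thus activates only 8 of the 192 experts, which is why a 295B-parameter model only computes with roughly 21B parameters per forward pass.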

Benchmarks

Three free ways to use Hy3 Preview

Choose based on your workflow:

| Path | What it is | Free? | Good for |
| --- | --- | --- | --- |
| OpenRouter tencent/hy3-preview:free | Hosted OpenAI-compatible API | Yes, $0 in/out | Agents, scripts, backend features |
| Hugging Face Space | Browser chat demo | Yes | Quick prompts, smoke tests |
| Self-hosted weights (vLLM) | Run on your own GPUs | Free software, hardware cost | Privacy, high volume, custom work |

Most developers should start with OpenRouter—quick setup, generous free-tier rate limits, and OpenAI API compatibility.

Step-by-step: call Hy3 Preview free on OpenRouter

Minimal steps to your first free call:

OpenRouter Hy3 Preview

  1. Create an OpenRouter account. Sign up at openrouter.ai. Email only; no payment required for free-tier models.
  2. Generate an API key. In your dashboard, go to “Keys,” create a new key, and export it (e.g., export OPENROUTER_API_KEY=sk-or-...).
  3. Open the model page. Visit the Hy3 Preview free listing. Confirm the “Free” status and review usage stats (at launch: 6.81B prompt tokens/day).

Usage stats

  4. Send your first request. OpenRouter uses the OpenAI Chat Completions schema. Any OpenAI SDK works; here’s a curl example:
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tencent/hy3-preview:free",
    "messages": [
      {"role": "user", "content": "Explain the MoE routing decision inside a top-8 of 192 setup in 3 sentences."}
    ],
    "temperature": 0.9,
    "top_p": 1.0
  }'
  5. Enable reasoning as needed. Add a reasoning parameter with effort set to low or high. OpenRouter returns a reasoning_details array:
{
  "model": "tencent/hy3-preview:free",
  "messages": [
    {"role": "user", "content": "Plan, then write a Bash script that rotates daily log files older than 30 days into a dated archive folder."}
  ],
  "reasoning": {"effort": "high"}
}
  6. Iterate. Use the same thread to maintain context—Hy3’s 256K window can handle entire codebases.

That’s it. The OpenRouter free-tier model is identical to the one published on Hugging Face; quality is not downgraded.
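The curl call above maps directly to any OpenAI-compatible client. Here is a stdlib-only Python sketch that assembles the same request; the helper name build_request is illustrative, and OPENROUTER_API_KEY is assumed to be exported in your environment:

```python
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt, effort=None):
    """Assemble a Chat Completions request for the Hy3 free route.
    `effort` ("low" or "high") maps to OpenRouter's optional
    reasoning parameter; omit it for the fast no_think mode."""
    body = {
        "model": "tencent/hy3-preview:free",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.9,
        "top_p": 1.0,
    }
    if effort:
        body["reasoning"] = {"effort": effort}
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=json.dumps(body).encode(), headers=headers)

# To actually send it:
# with urllib.request.urlopen(build_request("Explain MoE routing in 3 sentences.")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Swapping in the official OpenAI SDK is the same idea: point base_url at https://openrouter.ai/api/v1 and pass your OpenRouter key.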

Free, Plus, and self-host: comparison

Choose the right path for your use case:

| Capability | OpenRouter Free | OpenRouter Paid | Self-hosted (vLLM/SGLang) |
| --- | --- | --- | --- |
| Per-token cost | $0 | Per provider | Electricity + GPU amortization |
| Reasoning modes | no_think, low, high | Same | Same |
| Context length | 256K | 256K | 256K (memory permitting) |
| Throughput under load | Shared, deprioritized | Dedicated | Your hardware |
| Rate limits | OpenRouter free cap | Provider-specific | None |
| Data retention | OpenRouter policy | Provider-specific | Stays on your hardware |
| Reasoning token visibility | Yes (reasoning_details) | Yes | Yes |

Free is ideal for prototyping, benchmarks, and low-traffic agents. Paid/self-hosted is better for lower latency or if you exceed rate caps.

Prompt and parameter tips to maximize Hy3

  • Match temperature to mode. Use temperature=0.9, top_p=1.0 for creative work; drop temperature to 0.3 for structured output.
  • Use no_think for chat. Default mode is fastest; use low/high only for planning, multi-step code, or math.
  • Name tools in the system prompt. Even with OpenRouter, explicitly describe tools for better results.
  • Quote code. Paste full files and then ask your question; don’t summarize.
  • Batch multi-file edits. Hy3 performs best with all relevant files provided at once.
  • Ask for a plan first. Use a two-step prompt: plan, then execute for better agentic task results.
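The last tip, plan-then-execute, is easy to wire up as a two-turn conversation. A minimal sketch: plan_then_execute and ask_model are hypothetical names, and ask_model stands in for whatever function sends a messages list to the API and returns the reply text.

```python
def plan_then_execute(task, ask_model):
    """Two-step prompting: first ask for a plan, then feed the plan
    back into the history and ask for the implementation."""
    messages = [{
        "role": "user",
        "content": f"Outline a step-by-step plan for: {task}. Do not write code yet.",
    }]
    plan = ask_model(messages)                       # turn 1: the plan
    messages += [
        {"role": "assistant", "content": plan},      # keep the plan in context
        {"role": "user", "content": "Now execute the plan and write the full code."},
    ]
    return plan, ask_model(messages)                 # turn 2: the implementation

# Demo with a stub in place of a real API call:
fake = lambda msgs: f"reply#{len(msgs)}"
plan, code = plan_then_execute("rotate logs older than 30 days", fake)
```

Because the plan stays in the message history, the second turn can reference it directly instead of re-deriving the approach.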

Limits worth knowing

  • Rate limits flex with load. Free tier is shared; peak hours get 429s. Use retries with exponential backoff.
  • Reasoning tokens count as output. On the free tier, reasoning_details are free, but they’re billed on paid routes.
  • License is not Apache 2.0. Tencent Hy Community License allows commercial use with attribution and policy compliance—read the full license.
  • Tool calling requires the correct parser. For self-hosting, use --tool-call-parser hy_v3 with vLLM (or hunyuan with SGLang).
  • English and Chinese best supported. Other languages work, but quality drops.
  • Trailing US flagship models on some reasoning benchmarks. Hy3 is top-tier for Chinese models, but OpenAI/Google DeepMind still lead on the hardest tasks.
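The retry advice above (back off exponentially on 429s) can be sketched in a few lines. This is an illustrative pattern, not a specific client's API: with_backoff is a made-up helper, and RateLimited stands in for whatever 429 exception your HTTP client raises.

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for the HTTP 429 error your API client raises."""

def with_backoff(call, max_tries=5, base=1.0):
    """Retry `call` on rate limits, sleeping roughly base * 2**attempt
    seconds (plus jitter) between attempts; re-raise after max_tries."""
    for attempt in range(max_tries):
        try:
            return call()
        except RateLimited:
            if attempt == max_tries - 1:
                raise
            time.sleep(base * (2 ** attempt + random.random()))
```

The jitter spreads retries out so many clients throttled at the same moment don't all hammer the endpoint again in lockstep.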

The developer fast path: Hy3 Preview plus Apidog

Command-line curl works for demos, but for real iteration use a visual API client.

  1. Open Apidog and create a new project. Import the OpenAI Chat Completions OpenAPI spec.
  2. Set base URL to https://openrouter.ai/api/v1 and add an environment variable for OPENROUTER_API_KEY.
  3. Create a request to /chat/completions with model set to tencent/hy3-preview:free.
  4. Fork requests to compare reasoning modes. Duplicate and tweak one parameter to run side by side (no_think, low, high).
  5. Save prompt templates. Use Apidog environments and variables to manage system prompts, schemas, and user turns for reuse.

If you’re migrating from Postman, see the API testing without Postman in 2026 guide. Prefer VS Code? Use Apidog inside VS Code to keep prompt tuning next to your code.

Free alternatives when you hit the cap

If OpenRouter’s free pool is throttled, other free-tier models on OpenRouter can stand in. They don’t match Hy3’s coding scores, but cover chat/multilingual/multimodal use cases. For production, set up one Apidog collection per model and benchmark on your actual prompts.

Self-hosting Hy3 Preview with vLLM

If you have sufficient hardware, local inference is an option. Hy3’s model card recommends vLLM with tensor parallelism (8-way) and multi-token prediction:

vllm serve tencent/Hy3-preview \
  --tensor-parallel-size 8 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1 \
  --tool-call-parser hy_v3 \
  --reasoning-parser hy_v3 \
  --enable-auto-tool-choice \
  --served-model-name hy3-preview

SGLang: use --tool-call-parser hunyuan and --reasoning-parser hunyuan. Once the server is up at http://localhost:8000/v1, point any OpenAI SDK to it as you would OpenRouter—just update the base URL and key.

Expect to need eight H100-class GPUs (BF16) for the full model. Quantized versions may appear later, but full precision is currently required.
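Because both the local server and OpenRouter speak the OpenAI schema, switching between them is a configuration change, not a code change. A hypothetical env-driven selector (the HY3_LOCAL variable and endpoint_config name are illustrative):

```python
import os

def endpoint_config():
    """Pick an OpenAI-compatible endpoint: the local vLLM server when
    HY3_LOCAL is set, otherwise OpenRouter's free route."""
    if os.environ.get("HY3_LOCAL"):
        return {
            "base_url": "http://localhost:8000/v1",
            "api_key": "unused",        # vLLM accepts any key unless one is configured
            "model": "hy3-preview",     # matches --served-model-name above
        }
    return {
        "base_url": "https://openrouter.ai/api/v1",
        "api_key": os.environ.get("OPENROUTER_API_KEY", ""),
        "model": "tencent/hy3-preview:free",
    }
```

Feed the returned values to any OpenAI SDK client and the rest of your code stays identical across both backends.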

FAQ

Is Hy3 Preview free?

Yes. OpenRouter’s tencent/hy3-preview:free is $0 per million input/output tokens. Reasoning tokens are free on the free tier but count toward rate limits. Confirm current status on the OpenRouter model page.

How does Hy3 Preview compare to DeepSeek V3 and Qwen 3?

Hy3’s SWE-bench Verified (74.4) and Terminal-Bench 2.0 (54.4) scores put it with the best Chinese open models, with a strong agent/tool use focus. For chat, Qwen 3 and DeepSeek V3 are competitive; for agent/coding workflows, Hy3’s RL-trained tool use stands out.

What are Hy3’s reasoning modes?

Three: no_think (default), low, and high. Set via the reasoning parameter (OpenRouter) or chat_template_kwargs={"reasoning_effort": "high"} (direct model call). Use high for planning/multi-step code/math; default to off for chat.
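The two surfaces take the same setting in different places. A small sketch of both payload shapes, based on the parameter names above (the reasoning_payload helper itself is made up for illustration):

```python
def reasoning_payload(messages, effort, via="openrouter"):
    """Attach reasoning effort where each surface expects it:
    OpenRouter nests it under "reasoning", while a direct model
    call passes it through chat_template_kwargs."""
    body = {"messages": messages}
    if via == "openrouter":
        body["model"] = "tencent/hy3-preview:free"
        body["reasoning"] = {"effort": effort}
    else:
        body["chat_template_kwargs"] = {"reasoning_effort": effort}
    return body
```

Either way, leaving the setting off entirely gives you the fast no_think default.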

Can I use Hy3 Preview commercially?

Yes, under the Tencent Hy Community License. Commercial use requires attribution and compliance—read full terms.

What context length does the free tier support?

256K tokens (262,144). Paste entire codebases and still have room for tool schemas and history.

How do I test Hy3 Preview without coding?

Use the Hugging Face Space for browser chat or point Apidog at the OpenRouter endpoint. Apidog supports the OpenAI OpenAPI spec—just set base URL, API key, and model name.
