<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hassann</title>
    <description>The latest articles on DEV Community by Hassann (@hassann).</description>
    <link>https://dev.to/hassann</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3890506%2F89a141f2-4995-48b3-b5f2-e00ba5055afb.png</url>
      <title>DEV Community: Hassann</title>
      <link>https://dev.to/hassann</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hassann"/>
    <language>en</language>
    <item>
      <title>How to Run DeepSeek V4 Locally?</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 24 Apr 2026 04:22:00 +0000</pubDate>
      <link>https://dev.to/hassann/how-to-run-deepseek-v4-locally--45jo</link>
      <guid>https://dev.to/hassann/how-to-run-deepseek-v4-locally--45jo</guid>
      <description>&lt;p&gt;DeepSeek V4 dropped on April 23, 2026 with MIT-licensed weights on Hugging Face. That single license choice opens up frontier AI for any team wanting to run models on their own hardware. V4-Flash (284B total, 13B active) fits on two H100s at FP8. V4-Pro (1.6T total, 49B active) requires a cluster but matches GPT-5.5 and Claude Opus 4.6 on code and reasoning workloads.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide walks through local deployment: hardware requirements, quantization, vLLM and SGLang setup, tool-use configuration, and a validation workflow in Apidog to confirm your local server before sending production traffic.&lt;/p&gt;

&lt;p&gt;For product overview, see &lt;a href="http://apidog.com/blog/what-is-deepseek-v4?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;what is DeepSeek V4&lt;/a&gt;. For hosted API usage, see &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to use the DeepSeek V4 API&lt;/a&gt;. For cost details, see &lt;a href="http://apidog.com/blog/deepseek-v4-api-pricing?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 API pricing&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;V4-Flash:&lt;/strong&gt; Runs on 2 × H100 80GB at FP8, or 1 × H100 at INT4. Weights ≈ 500GB (FP8).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;V4-Pro:&lt;/strong&gt; Needs 16+ H100s at FP8 for production throughput.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vLLM:&lt;/strong&gt; Fastest path to OpenAI-compatible server. &lt;code&gt;vllm&amp;gt;=0.9.0&lt;/code&gt; adds V4 support.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SGLang:&lt;/strong&gt; Alternative for better tool-use and structured-output features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quantization:&lt;/strong&gt; AWQ INT4 or GPTQ INT4 fits V4-Flash on a single 80GB card (~5% quality loss).&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to test &lt;code&gt;http://localhost:8000/v1&lt;/code&gt; and reuse your hosted API collections.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Who should self-host
&lt;/h2&gt;

&lt;p&gt;Self-hosting V4 is right for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Compliance-bound teams:&lt;/strong&gt; Health, finance, legal, or defense use cases where data cannot leave the network. MIT-licensed open weights mean no usage agreements and no cross-border data flows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large stable workloads:&lt;/strong&gt; Above ~200B tokens/month, dedicated hardware beats API costs. Example: V4-Pro API = $1.74/M input + $3.48/M output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning and research:&lt;/strong&gt; Base checkpoints are for further pre-training/domain adaptation. MIT license allows redistribution of tuned models.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Not for:&lt;/strong&gt; Prototypers, teams without GPU ops experience, or workloads &amp;lt; $200/month on the hosted API—operational overhead will outweigh cost savings at small scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hardware requirements
&lt;/h2&gt;

&lt;p&gt;DeepSeek V4 uses FP4 + FP8 mixed precision, so VRAM needs are lower than raw parameter counts suggest.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variant&lt;/th&gt;
&lt;th&gt;Total params&lt;/th&gt;
&lt;th&gt;Active params&lt;/th&gt;
&lt;th&gt;FP8 VRAM&lt;/th&gt;
&lt;th&gt;INT4 VRAM&lt;/th&gt;
&lt;th&gt;Minimum cards&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;V4-Flash&lt;/td&gt;
&lt;td&gt;284B&lt;/td&gt;
&lt;td&gt;13B&lt;/td&gt;
&lt;td&gt;~500GB&lt;/td&gt;
&lt;td&gt;~140GB&lt;/td&gt;
&lt;td&gt;2 × H100 80GB (FP8) or 1 × H100 (INT4)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;V4-Pro&lt;/td&gt;
&lt;td&gt;1.6T&lt;/td&gt;
&lt;td&gt;49B&lt;/td&gt;
&lt;td&gt;~2.4TB&lt;/td&gt;
&lt;td&gt;~700GB&lt;/td&gt;
&lt;td&gt;16 × H100 80GB (FP8) or 8 × H100 (INT4)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MoE memory is total, not active:&lt;/strong&gt; All experts must fit in VRAM, not just the active subset.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;H200 and MI300X:&lt;/strong&gt; 141GB/192GB cards need fewer GPUs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumer GPUs:&lt;/strong&gt; Not supported (V4-Flash at INT4 won't run on RTX 5090 24GB).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apple Silicon:&lt;/strong&gt; M3/M4 Max with 128GB unified memory can run V4-Flash under aggressive quantization, but only for development, not deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Download the weights
&lt;/h2&gt;

&lt;p&gt;Official Hugging Face repos:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash" rel="noopener noreferrer"&gt;&lt;code&gt;deepseek-ai/DeepSeek-V4-Flash&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro" rel="noopener noreferrer"&gt;&lt;code&gt;deepseek-ai/DeepSeek-V4-Pro&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;deepseek-ai/DeepSeek-V4-Flash-Base&lt;/code&gt; and &lt;code&gt;DeepSeek-V4-Pro-Base&lt;/code&gt; for fine-tuning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Download example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; &lt;span class="s2"&gt;"huggingface_hub[cli]"&lt;/span&gt;
huggingface-cli login

huggingface-cli download deepseek-ai/DeepSeek-V4-Flash &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--local-dir&lt;/span&gt; ./models/deepseek-v4-flash &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--local-dir-use-symlinks&lt;/span&gt; False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Reserve ~500GB disk for V4-Flash, several TBs for V4-Pro.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://modelscope.cn/models/deepseek-ai/DeepSeek-V4-Flash" rel="noopener noreferrer"&gt;ModelScope&lt;/a&gt; is faster for users in China.&lt;/li&gt;
&lt;/ul&gt;
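
&lt;p&gt;If you prefer to script the pull, &lt;code&gt;snapshot_download&lt;/code&gt; from &lt;code&gt;huggingface_hub&lt;/code&gt; is the Python equivalent of the CLI command above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from huggingface_hub import snapshot_download

# Python equivalent of the huggingface-cli download shown above.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V4-Flash",
    local_dir="./models/deepseek-v4-flash",
)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;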

&lt;h2&gt;
  
  
  Step 2: Pick a serving engine
&lt;/h2&gt;

&lt;p&gt;Two main options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;vLLM:&lt;/strong&gt; High throughput, OpenAI-compatible, largest community—recommended for most teams.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SGLang:&lt;/strong&gt; Better for tool-use, structured output, and long context. Use if you need advanced function calling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both support V4 as of their April 2026 releases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Serve V4-Flash with vLLM
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"vllm&amp;gt;=0.9.0"&lt;/span&gt;

vllm serve deepseek-ai/DeepSeek-V4-Flash &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tensor-parallel-size&lt;/span&gt; 2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 1048576 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dtype&lt;/span&gt; auto &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-prefix-caching&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Flags:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--tensor-parallel-size 2&lt;/code&gt;: Splits model across 2 H100s. Raise for more GPUs.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--max-model-len 1048576&lt;/code&gt;: Full 1M-token context window. Reduce to save VRAM.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--enable-prefix-caching&lt;/code&gt;: Enables fast repeated prefixes (mirrors hosted API cache).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--dtype auto&lt;/code&gt;: Picks up the checkpoint's native FP8 mixed precision.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Server runs OpenAI-compatible endpoints at &lt;code&gt;http://localhost:8000/v1&lt;/code&gt;.&lt;/p&gt;
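
&lt;p&gt;A minimal smoke test with the OpenAI Python SDK, assuming the setup above (the &lt;code&gt;model&lt;/code&gt; value must match the name vLLM registered; the key is ignored unless you configured one):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",  # must match the served model name
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    max_tokens=16,
)
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;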

&lt;h2&gt;
  
  
  Step 4: Serve V4-Pro with vLLM
&lt;/h2&gt;

&lt;p&gt;Requires a cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve deepseek-ai/DeepSeek-V4-Pro &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tensor-parallel-size&lt;/span&gt; 8 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--pipeline-parallel-size&lt;/span&gt; 2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 524288 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-prefix-caching&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--max-model-len 524288&lt;/code&gt; (512K) fits on a 16-H100 box; increase if VRAM allows.&lt;/li&gt;
&lt;li&gt;Use both pipeline and tensor parallelism for multi-node setups.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 5: Serve with SGLang (the tool-use alternative)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"sglang[all]&amp;gt;=0.4.0"&lt;/span&gt;

python &lt;span class="nt"&gt;-m&lt;/span&gt; sglang.launch_server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-path&lt;/span&gt; deepseek-ai/DeepSeek-V4-Flash &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tp&lt;/span&gt; 2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--context-length&lt;/span&gt; 1048576 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 30000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;OpenAI-compatible endpoint at &lt;code&gt;http://localhost:30000/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;SGLang's &lt;code&gt;lang&lt;/code&gt; DSL enables richer function calling and structured output; a minimal tool-call sketch follows.&lt;/li&gt;
&lt;/ul&gt;
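
&lt;p&gt;Because the endpoint speaks the OpenAI schema, a standard tool-call round trip works unchanged. A sketch with a made-up &lt;code&gt;get_weather&lt;/code&gt; tool:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="not-needed")

# Hypothetical example tool; replace with your real function schemas.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",  # must match the served model name
    messages=[{"role": "user", "content": "What is the weather in Lisbon?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;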

&lt;h2&gt;
  
  
  Step 6: Quantize for a single-GPU box
&lt;/h2&gt;

&lt;p&gt;INT4 quantization allows V4-Flash on a single 80GB GPU with minimal quality drop.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWQ (recommended)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;autoawq

python &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = './models/deepseek-v4-flash'
out_path = './models/deepseek-v4-flash-awq'
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.quantize(tokenizer, quant_config={'w_bit': 4, 'q_group_size': 128})
model.save_quantized(out_path)
tokenizer.save_pretrained(out_path)
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  GPTQ
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;auto-gptq
&lt;span class="c"&gt;# Follow the GPTQ quantization recipe; similar pattern to AWQ.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Serve quantized checkpoints with vLLM using &lt;code&gt;--quantization awq&lt;/code&gt; or &lt;code&gt;--quantization gptq&lt;/code&gt;; a quick Python-API check is sketched below.&lt;/li&gt;
&lt;/ul&gt;
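
&lt;p&gt;For a quick offline sanity check of the quantized checkpoint, vLLM's Python API accepts the same quantization setting (a sketch; the path assumes the AWQ output directory created above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from vllm import LLM, SamplingParams

# Load the AWQ checkpoint produced in the quantization step above.
llm = LLM(
    model="./models/deepseek-v4-flash-awq",
    quantization="awq",
    tensor_parallel_size=1,
)

params = SamplingParams(max_tokens=64, temperature=1.0)
outputs = llm.generate(["Say hello in three languages."], params)
print(outputs[0].outputs[0].text)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;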

&lt;h2&gt;
  
  
  Step 7: Test with Apidog
&lt;/h2&gt;

&lt;p&gt;Always validate your local server before sending production traffic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhlube65mg8kn9sbhwv0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhlube65mg8kn9sbhwv0.png" alt="Apidog Validation" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download Apidog.&lt;/li&gt;
&lt;li&gt;Create a collection targeting &lt;code&gt;http://localhost:8000/v1/chat/completions&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Paste in your standard test prompt (same as hosted API).&lt;/li&gt;
&lt;li&gt;Run a 500K-token context test to confirm KV cache stability.&lt;/li&gt;
&lt;li&gt;Run a tool-calling flow end-to-end before connecting agent loops.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your hosted &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 API&lt;/a&gt; collections work locally—just change the base URL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability and monitoring
&lt;/h2&gt;

&lt;p&gt;Track these from day one:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tokens per second:&lt;/strong&gt; Both prompt and generation. vLLM exposes &lt;code&gt;/metrics&lt;/code&gt; in Prometheus format.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU utilization:&lt;/strong&gt; Use &lt;code&gt;nvidia-smi&lt;/code&gt; or DCGM. Sustained &amp;lt;70% means batch size is likely too small.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KV cache hit rate:&lt;/strong&gt; With &lt;code&gt;--enable-prefix-caching&lt;/code&gt;, vLLM reports this. Falling rates signal prompt churn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request latency (p50/p95/p99):&lt;/strong&gt; Use tracing. High p99 with stable p50 means some requests are stalling the queue.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Send all four to Grafana or your existing observability stack.&lt;/p&gt;
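
&lt;p&gt;A quick way to eyeball the token counters without standing up Prometheus (a sketch; metric names vary across vLLM versions, so the filter is deliberately loose):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import requests

# Scrape vLLM's Prometheus endpoint and print the token-related counters.
text = requests.get("http://localhost:8000/metrics", timeout=5).text
for line in text.splitlines():
    if line.startswith("vllm:") and "tokens" in line:
        print(line)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;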

&lt;h2&gt;
  
  
  Fine-tuning V4 Base checkpoints
&lt;/h2&gt;

&lt;p&gt;Base checkpoints are for continued pre-training and SFT. Standard SFT (with LoRA):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"torch&amp;gt;=2.6"&lt;/span&gt; transformers accelerate peft trl

&lt;span class="c"&gt;# Standard SFT with LoRA on V4-Flash-Base&lt;/span&gt;
trl sft &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model_name_or_path&lt;/span&gt; deepseek-ai/DeepSeek-V4-Flash-Base &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dataset_name&lt;/span&gt; your-org/your-sft-set &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output_dir&lt;/span&gt; ./models/v4-flash-custom &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--per_device_train_batch_size&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gradient_accumulation_steps&lt;/span&gt; 16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--learning_rate&lt;/span&gt; 2e-5 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bf16&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--use_peft&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--lora_r&lt;/span&gt; 64 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--lora_alpha&lt;/span&gt; 128
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Full-parameter tuning on V4-Pro is for research labs. LoRA adapters on V4-Flash-Base deliver substantial quality gains at practical compute cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common pitfalls
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;OOM at startup:&lt;/strong&gt; Usually &lt;code&gt;--max-model-len&lt;/code&gt; is too high or &lt;code&gt;--tensor-parallel-size&lt;/code&gt; too low. Lower context or increase parallelism.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow first request:&lt;/strong&gt; vLLM compiles kernels lazily. Warm up with a dummy request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool-use parsing errors:&lt;/strong&gt; DeepSeek's tool-call encoding differs from OpenAI's. Use SDK versions with explicit V4 support.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FP8 errors on old GPUs:&lt;/strong&gt; A100s lack FP8 support. Use BF16 and expect 2x VRAM needs.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  When self-hosting pays off
&lt;/h2&gt;

&lt;p&gt;Break-even vs. &lt;a href="http://apidog.com/blog/deepseek-v4-api-pricing?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;hosted DeepSeek V4 pricing&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;V4-Flash at 200B input + 20B output/month:&lt;/strong&gt; ~$33.6K on API. 8 × H100 box rents ≈ $20K/month. Self-hosting saves ~40%.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;V4-Pro at 500B input + 50B output/month:&lt;/strong&gt; ~$1.04M on API. 16 × H100 cluster rents ≈ $35K/month. Self-hosting saves &amp;gt;95%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Break-even for V4-Flash: ≈ 100B tokens/month. Below that, hosted API is cheaper and simpler.&lt;/p&gt;
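
&lt;p&gt;A back-of-the-envelope version of that break-even math, using the hosted rate card above (the rental price and output ratio are assumptions; substitute your own quotes):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# V4-Flash hosted rates, $ per million tokens; rental cost is assumed.
FLASH_IN, FLASH_OUT = 0.14, 0.28
BOX_MONTHLY = 20_000  # assumed 8 x H100 rental, $/month

def api_cost(input_b, output_b):
    # input_b / output_b are billions of tokens; 1B tokens = 1,000 M tokens.
    return input_b * 1_000 * FLASH_IN + output_b * 1_000 * FLASH_OUT

for input_b in (50, 100, 200):
    cost = api_cost(input_b, input_b * 0.1)  # assume output is 10% of input
    winner = "self-host" if cost &amp;gt; BOX_MONTHLY else "hosted API"
    print(f"{input_b}B input/month: API ${cost:,.0f} -- cheaper: {winner}")
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;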

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can I run V4-Flash on a single A100?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes, with heavy quantization and reduced context (INT4 on an 80GB A100 yields roughly 5–15 tok/s), but an H100 is much faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does V4 support LoRA fine-tuning?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Use Base checkpoints with TRL or Axolotl pipelines. MoE routing doesn't impact LoRA.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the local server OpenAI-compatible?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Both vLLM and SGLang expose &lt;code&gt;/v1/chat/completions&lt;/code&gt; and &lt;code&gt;/v1/completions&lt;/code&gt; with OpenAI request shape. The &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;hosted API guide&lt;/a&gt; applies to localhost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I enable thinking mode locally?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Pass &lt;code&gt;thinking_mode: "thinking"&lt;/code&gt; or &lt;code&gt;"thinking_max"&lt;/code&gt; in the request body. vLLM and SGLang forward the flag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I stream from a local V4 server?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Set &lt;code&gt;stream: true&lt;/code&gt; as you would for OpenAI or hosted DeepSeek API.&lt;/p&gt;
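
&lt;p&gt;For example, with the OpenAI SDK pointed at the local server:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Stream tokens as they arrive, exactly as with the hosted API.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;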

&lt;p&gt;&lt;strong&gt;Cheapest way to experiment before buying hardware?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Rent a single H100 on RunPod or Lambda, run V4-Flash at INT4, and benchmark with your prompts. $10–$30 is enough for a real-world throughput check.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>DeepSeek V4 API Pricing</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 24 Apr 2026 04:19:13 +0000</pubDate>
      <link>https://dev.to/hassann/deepseek-v4-api-pricing-2j8f</link>
      <guid>https://dev.to/hassann/deepseek-v4-api-pricing-2j8f</guid>
      <description>&lt;p&gt;DeepSeek released V4 pricing on April 23, 2026, resetting expectations for frontier AI costs. V4-Flash starts at &lt;strong&gt;$0.14 per million input tokens and $0.28 per million output tokens&lt;/strong&gt;. V4-Pro is priced at &lt;strong&gt;$1.74 input and $3.48 output&lt;/strong&gt; per million tokens. Both support a 1M-token context window and up to 384K output tokens, with a cache-hit discount that cuts input costs by 80–90% on repeated prompts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide covers the full rate card, how context caching affects real per-call costs, a comparison with GPT-5.5 and Claude Opus, and four actionable methods to keep your spend predictable in Apidog.&lt;/p&gt;

&lt;p&gt;For additional details, see &lt;a href="http://apidog.com/blog/what-is-deepseek-v4?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;what is DeepSeek V4&lt;/a&gt;, the &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 API walkthrough&lt;/a&gt;, and &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to use DeepSeek V4 for free&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;V4-Flash:&lt;/strong&gt; $0.14 / M input (cache miss), $0.028 / M input (cache hit), $0.28 / M output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;V4-Pro:&lt;/strong&gt; $1.74 / M input (cache miss), $0.145 / M input (cache hit), $3.48 / M output&lt;/li&gt;
&lt;li&gt;Context window: &lt;strong&gt;1M tokens&lt;/strong&gt; input, &lt;strong&gt;384K tokens&lt;/strong&gt; output on both&lt;/li&gt;
&lt;li&gt;Cache-hit discount: &lt;strong&gt;~80% off Flash&lt;/strong&gt;, &lt;strong&gt;~92% off Pro&lt;/strong&gt; on repeated prefixes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;deepseek-chat&lt;/code&gt; and &lt;code&gt;deepseek-reasoner&lt;/code&gt; deprecated July 24, 2026; billing maps to V4-Flash&lt;/li&gt;
&lt;li&gt;At cache-miss rates, V4-Pro is &lt;strong&gt;~2.9x cheaper than GPT-5.5&lt;/strong&gt; on input and &lt;strong&gt;~8.6x cheaper&lt;/strong&gt; on output&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Full Rate Card
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input (cache miss)&lt;/th&gt;
&lt;th&gt;Input (cache hit)&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;deepseek-v4-flash&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;$0.14 / M&lt;/td&gt;
&lt;td&gt;$0.028 / M&lt;/td&gt;
&lt;td&gt;$0.28 / M&lt;/td&gt;
&lt;td&gt;1M / 384K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;deepseek-v4-pro&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;$1.74 / M&lt;/td&gt;
&lt;td&gt;$0.145 / M&lt;/td&gt;
&lt;td&gt;$3.48 / M&lt;/td&gt;
&lt;td&gt;1M / 384K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;deepseek-chat&lt;/code&gt; (deprecated)&lt;/td&gt;
&lt;td&gt;maps to V4-Flash non-thinking&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;deepseek-reasoner&lt;/code&gt; (deprecated)&lt;/td&gt;
&lt;td&gt;maps to V4-Flash thinking&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key implementation details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pricing is set by model ID&lt;/strong&gt;—thinking/non-thinking mode only affects how many tokens you consume, not the rate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache-hit pricing is automatic&lt;/strong&gt;—any repeated prefix ≥1,024 tokens (byte-for-byte match) within the same account gets discounted input pricing. No setup required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Old model IDs&lt;/strong&gt; (&lt;code&gt;deepseek-chat&lt;/code&gt;, &lt;code&gt;deepseek-reasoner&lt;/code&gt;) are now V4-Flash aliases. If you haven’t migrated, you’re already billed at V4-Flash rates. The deprecation deadline is July 24, 2026.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Context Caching Explained
&lt;/h2&gt;

&lt;p&gt;Context caching is the biggest lever to reduce DeepSeek V4 costs. Any repeated content across calls—such as long system prompts, agent schemas, or RAG context—is billed at a heavily discounted input rate after the first call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: Agent with Static System Prompt
&lt;/h3&gt;

&lt;p&gt;Suppose you run an agent with a 20,000-token system prompt and 100 user questions (200 tokens each, with ~500-token answers).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without caching:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: 100 × 20,200 × $1.74 / M = $3.52&lt;/li&gt;
&lt;li&gt;Output: 100 × 500 × $3.48 / M = $0.17&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: $3.69&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With caching (1 miss, 99 hits):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First input: 20,200 × $1.74 / M = $0.035&lt;/li&gt;
&lt;li&gt;99 cache-hit prefixes: 99 × 20,000 × $0.145 / M = $0.287&lt;/li&gt;
&lt;li&gt;99 user turns: 99 × 200 × $1.74 / M = $0.034&lt;/li&gt;
&lt;li&gt;Output: 100 × 500 × $3.48 / M = $0.174&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total: $0.53&lt;/strong&gt; (about 7x cheaper)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On V4-Flash, the effect is even more pronounced due to the already low base rate.&lt;/p&gt;
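
&lt;p&gt;The same arithmetic as a script, if you want to plug in your own prompt sizes (rates are the V4-Pro figures from the table above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Reproduces the caching example above. Rates in $ per million tokens.
MISS, HIT, OUT = 1.74, 0.145, 3.48

system, question, answer, calls = 20_000, 200, 500, 100

uncached = calls * ((system + question) * MISS + answer * OUT) / 1e6

cached = (
    (system + question) * MISS       # first call pays the full rate
    + (calls - 1) * system * HIT     # 99 cache-hit prefixes
    + (calls - 1) * question * MISS  # 99 fresh user turns
    + calls * answer * OUT           # output is never discounted
) / 1e6

print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")  # ~$3.69 vs ~$0.53
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;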

&lt;h2&gt;
  
  
  Comparing DeepSeek V4 to GPT-5.5 and Claude
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input (std)&lt;/th&gt;
&lt;th&gt;Input (cached)&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4-Flash&lt;/td&gt;
&lt;td&gt;$0.14 / M&lt;/td&gt;
&lt;td&gt;$0.028 / M&lt;/td&gt;
&lt;td&gt;$0.28 / M&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4-Pro&lt;/td&gt;
&lt;td&gt;$1.74 / M&lt;/td&gt;
&lt;td&gt;$0.145 / M&lt;/td&gt;
&lt;td&gt;$3.48 / M&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;$5 / M&lt;/td&gt;
&lt;td&gt;$1.25 / M&lt;/td&gt;
&lt;td&gt;$30 / M&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5 Pro&lt;/td&gt;
&lt;td&gt;$30 / M&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;$180 / M&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;td&gt;$15 / M&lt;/td&gt;
&lt;td&gt;$1.50 / M&lt;/td&gt;
&lt;td&gt;$75 / M&lt;/td&gt;
&lt;td&gt;200K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;V4-Pro is ~8.6x cheaper than GPT-5.5&lt;/strong&gt; and &lt;strong&gt;~21x cheaper than Claude Opus 4.6&lt;/strong&gt; on output tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cached input:&lt;/strong&gt; V4-Pro's cache-hit rate is ~9x cheaper than GPT-5.5's cached input and ~10x cheaper than Claude's.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmarking:&lt;/strong&gt; V4-Pro matches or beats GPT-5.5 on LiveCodeBench (93.5 vs top tier) and Codeforces (3206 vs 3168). For full benchmarks, see &lt;a href="http://apidog.com/blog/what-is-deepseek-v4?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;what is DeepSeek V4&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Caveats:&lt;/strong&gt; Claude outperforms V4-Pro on long-context retrieval, and Gemini 3.1 Pro leads on MMLU-Pro. If your workload depends on long-context retrieval, weigh quality vs. price savings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Modeling for Common Workloads
&lt;/h2&gt;

&lt;p&gt;Here’s what typical workloads cost on V4-Pro (cache-miss baseline):&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Agentic Coding Loop (50K context, 2K output, 20 calls)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input: 50,000 × 20 × $1.74 / M = $1.74
Output: 2,000 × 20 × $3.48 / M = $0.14
Per-task cost: ~$1.88
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;GPT-5.5: ~$6.20 per task.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Long-Document Q&amp;amp;A (500K context, 1K output)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input: 500,000 × $1.74 / M = $0.87
Output: 1,000 × $3.48 / M = $0.003
Per-call cost: ~$0.87
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;GPT-5.5: ~$2.53 per call.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. High-Volume Classification (2K context, 200 output, 10,000 calls)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Use V4-Flash; V4-Pro is overkill.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input: 2,000 × 10,000 × $0.14 / M = $2.80
Output: 200 × 10,000 × $0.28 / M = $0.56
Run cost: ~$3.36
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;GPT-5.5: ~$160 per run.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Repeated-Prompt Chatbot (10K system, 500 user, 1K output, 1,000 sessions)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First input: 10,500 × $1.74 / M = $0.018
Cache-hit input: 999 × 10,000 × $0.145 / M = $1.45
Cache-miss user: 999 × 500 × $1.74 / M = $0.87
Output: 1,000 × 1,000 × $3.48 / M = $3.48
Session run cost: ~$5.82
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;GPT-5.5 (with caching): ~$45.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Hidden Costs to Watch
&lt;/h2&gt;

&lt;p&gt;Be aware of these cost traps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Thinking-mode token inflation:&lt;/strong&gt; &lt;code&gt;thinking_max&lt;/code&gt; burns 3–10x more output tokens. Only use Think Max for critical tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silent context growth:&lt;/strong&gt; Agent loops that feed entire conversations back into each turn can balloon costs. Truncate or summarize aggressively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retry storms:&lt;/strong&gt; Uncapped retries (e.g., on every HTTP 500) can quickly double your bill. Implement exponential backoff and set a hard retry cap; a minimal pattern is sketched after this list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Development churn:&lt;/strong&gt; Iterating with raw curl replays the full context each time. Use &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; for variable substitution and to avoid unnecessary prompt replays.&lt;/li&gt;
&lt;/ol&gt;
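
&lt;p&gt;A minimal retry wrapper along those lines (a sketch; tune the cap and base delay to your traffic):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random
import time

def call_with_backoff(make_request, max_retries=3):
    """Retry a callable with exponential backoff and a hard cap."""
    for attempt in range(max_retries + 1):
        try:
            return make_request()
        except Exception:
            if attempt == max_retries:
                raise  # hard cap: never retry forever
            # Sleep 1s, 2s, 4s, ... plus jitter to avoid synchronized storms.
            time.sleep(2 ** attempt + random.random())
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;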

&lt;h2&gt;
  
  
  Track Cost in Apidog
&lt;/h2&gt;

&lt;p&gt;Optimize your workflow and avoid billing surprises:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog&lt;/a&gt; and store your &lt;code&gt;DEEPSEEK_API_KEY&lt;/code&gt; as a secret per environment.&lt;/li&gt;
&lt;li&gt;Save a POST request to &lt;code&gt;https://api.deepseek.com/v1/chat/completions&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In the response panel, pin &lt;code&gt;usage.prompt_tokens&lt;/code&gt;, &lt;code&gt;usage.completion_tokens&lt;/code&gt;, and &lt;code&gt;usage.reasoning_tokens&lt;/code&gt;—you’ll see cost metrics with every call.&lt;/li&gt;
&lt;li&gt;Parameterize &lt;code&gt;model&lt;/code&gt; and &lt;code&gt;thinking_mode&lt;/code&gt; so you can A/B V4-Flash vs V4-Pro, and Non-Think vs Think Max, without duplicating requests.&lt;/li&gt;
&lt;li&gt;Mirror the collection for GPT-5.5 using the &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 API guide&lt;/a&gt;. One window, both providers, full cost transparency.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This setup catches most cost surprises before they show up on invoices.&lt;/p&gt;
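
&lt;p&gt;Outside Apidog, the same telemetry takes a few lines of Python (a sketch; &lt;code&gt;reasoning_tokens&lt;/code&gt; follows the field name used above and may be absent on non-thinking calls):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Classify this ticket: refund not received."}],
    max_tokens=64,
)

# Log all three counters on every call; alert on reasoning-token spikes.
u = response.usage
print(u.prompt_tokens, u.completion_tokens, getattr(u, "reasoning_tokens", 0))
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;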

&lt;h2&gt;
  
  
  Four Rules to Keep Spend Predictable
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Default to V4-Flash.&lt;/strong&gt; Use V4-Pro only if a measurable quality gap impacts revenue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default to Non-Think.&lt;/strong&gt; Escalate to Think High as needed; reserve Think Max for correctness-critical work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cap &lt;code&gt;max_tokens&lt;/code&gt;.&lt;/strong&gt; The 384K output ceiling is a safety net, not a target. Production answers usually fit in 2K.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ship usage telemetry.&lt;/strong&gt; Log &lt;code&gt;prompt_tokens&lt;/code&gt;, &lt;code&gt;completion_tokens&lt;/code&gt;, and &lt;code&gt;reasoning_tokens&lt;/code&gt; on every call. Alert on reasoning-token spikes—they often signal prompt drift into Think Max.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is there a free tier?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No usage-free API tier, but new accounts may get a trial credit. For zero-cost options, see &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to use DeepSeek V4 for free&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does cache-hit pricing work?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Prefixes ≥1,024 tokens that repeat across requests in the same account are billed at the cache-hit rate. First call is full rate; subsequent identical-prefix calls are discounted. Caching is automatic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do thinking modes cost more?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Per-token rates are unchanged. Thinking modes generate more tokens (reasoning traces). Monitor &lt;code&gt;reasoning_tokens&lt;/code&gt; in the &lt;code&gt;usage&lt;/code&gt; object to assess real cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is pricing stable?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
DeepSeek updates pricing periodically. V3.2 rates lasted most of 2025; V4 pricing has no published end-date. Always check the &lt;a href="https://api-docs.deepseek.com/quick_start/pricing" rel="noopener noreferrer"&gt;live pricing page&lt;/a&gt; before budgeting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are V4-Pro and V4-Flash output rates the same?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. V4-Pro output is $3.48 / M; V4-Flash is $0.28 / M. The 12.4x difference is the main reason to default to V4-Flash.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does the Anthropic-format endpoint change pricing?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. &lt;code&gt;https://api.deepseek.com/anthropic&lt;/code&gt; uses the same pricing as the OpenAI-format endpoint. Format does not affect cost.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to Use DeepSeek V4 for Free?</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 24 Apr 2026 04:16:30 +0000</pubDate>
      <link>https://dev.to/hassann/how-to-use-deepseek-v4-for-free--4470</link>
      <guid>https://dev.to/hassann/how-to-use-deepseek-v4-for-free--4470</guid>
      <description>&lt;p&gt;DeepSeek V4 launched on April 23, 2026, with real free access options. The official web chat runs V4-Pro—no credit card needed—and the weights are MIT-licensed and downloadable. Aggregators like OpenRouter and Chutes enable free tiers within days of a DeepSeek launch. With these methods, you can run substantial V4 workloads for free before considering paid plans.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide outlines all verified no-cost paths, how they fit different use cases, and how to set up a production-ready collection in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; for a smooth transition to paid billing as your needs grow.&lt;/p&gt;

&lt;p&gt;For a product overview, see &lt;a href="http://apidog.com/blog/what-is-deepseek-v4?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;what is DeepSeek V4&lt;/a&gt;. For API details, see &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to use the DeepSeek V4 API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="http://chat.deepseek.com" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt;&lt;/strong&gt; — Free web chat on V4-Pro with Think High and Think Max toggles. No card needed. Available now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hugging Face weights + your own GPU&lt;/strong&gt; — MIT license. V4-Flash runs on 2–4 H100s; V4-Pro requires a cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter and Chutes free tiers&lt;/strong&gt; — Third-party gateways typically open free quota on DeepSeek models within a week of launch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hugging Face Inference Providers&lt;/strong&gt; — Shared, rate-limited endpoint for early experimentation with V4.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kaggle, Colab, and RunPod trial credits&lt;/strong&gt; — Free compute for one-off self-hosting tests.&lt;/li&gt;
&lt;li&gt;All free paths have usage caps. For production workloads, switch to paid billing before hitting limits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F04%2Fimage-225.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F04%2Fimage-225.png" alt="" width="800" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 1: chat.deepseek.com (the default free path)
&lt;/h2&gt;

&lt;p&gt;The quickest, most reliable free option is the official chat interface. V4-Pro is the default model; use the top toggle to switch between Non-Think, Think High, and Think Max reasoning modes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F04%2Fimage-224.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F04%2Fimage-224.png" alt="" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://chat.deepseek.com/" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Sign in with email, Google, or WeChat.&lt;/li&gt;
&lt;li&gt;Ensure the active model is V4-Pro.&lt;/li&gt;
&lt;li&gt;Start chatting.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  What you get
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Full 1M-token context window.&lt;/li&gt;
&lt;li&gt;File upload (PDFs, images, code bundles).&lt;/li&gt;
&lt;li&gt;On-demand web search.&lt;/li&gt;
&lt;li&gt;All three reasoning modes, including Think Max.&lt;/li&gt;
&lt;li&gt;Conversation history and folder organization.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Usage caps
&lt;/h3&gt;

&lt;p&gt;There’s no published hard daily message cap; free usage is soft-throttled under load. Heavy use may slow responses or queue requests, but hard blocks are rare. If you see persistent rate limits, slow down or switch to the API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Testing prompts, reviewing large codebases or documents, running Think Max on complex inputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not for:&lt;/strong&gt; Automation or reproducible workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 2: Self-host V4-Flash on your own GPU
&lt;/h2&gt;

&lt;p&gt;V4-Flash is MIT-licensed and practical for self-hosting. At 284B total, 13B active, a multi-H100 machine runs it in FP8; with INT4 quantization, it fits on a single 80GB card.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The main cost is hardware. If you have GPUs, this is the most robust free path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pull the weights
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; &lt;span class="s2"&gt;"huggingface_hub[cli]"&lt;/span&gt;
huggingface-cli login
huggingface-cli download deepseek-ai/DeepSeek-V4-Flash &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--local-dir&lt;/span&gt; ./models/deepseek-v4-flash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Reserve about 500GB disk for FP8 weights.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Serve with vLLM
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"vllm&amp;gt;=0.9.0"&lt;/span&gt;

vllm serve deepseek-ai/DeepSeek-V4-Flash &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tensor-parallel-size&lt;/span&gt; 4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 1048576 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dtype&lt;/span&gt; auto &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once running, use any OpenAI-compatible client with &lt;code&gt;http://localhost:8000/v1&lt;/code&gt;. The API shape matches the paid DeepSeek API; &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; can treat this as a new base URL with your saved collections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware requirements
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variant&lt;/th&gt;
&lt;th&gt;Minimum cards (FP8)&lt;/th&gt;
&lt;th&gt;Minimum cards (INT4)&lt;/th&gt;
&lt;th&gt;Realistic throughput&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;V4-Flash&lt;/td&gt;
&lt;td&gt;2 × H100 80GB&lt;/td&gt;
&lt;td&gt;1 × H100 80GB&lt;/td&gt;
&lt;td&gt;50 to 150 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;V4-Pro&lt;/td&gt;
&lt;td&gt;16 × H100 80GB&lt;/td&gt;
&lt;td&gt;8 × H100 80GB&lt;/td&gt;
&lt;td&gt;cluster-dependent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you don’t own GPU capacity, paid APIs are cheaper than renting. Self-hosting suits teams with existing hardware or strict compliance needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 3: OpenRouter free tier
&lt;/h2&gt;

&lt;p&gt;OpenRouter aggregates open and closed models behind a single API. Free tiers have appeared for each recent DeepSeek release (V3, V3.1, V3.2).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F04%2Fimage-226.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F04%2Fimage-226.png" alt="" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Sign up at &lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;openrouter.ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Create an API key.&lt;/li&gt;
&lt;li&gt;Check the model catalog for &lt;code&gt;deepseek/deepseek-v4-pro&lt;/code&gt; or &lt;code&gt;deepseek/deepseek-v4-flash&lt;/code&gt;; free variants are usually suffixed with &lt;code&gt;:free&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Use the OpenAI-compatible SDK:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;OPENROUTER_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://openrouter.ai/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek/deepseek-v4-flash:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python CLI for semver bumping.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Usage caps
&lt;/h3&gt;

&lt;p&gt;OpenRouter free tiers typically allow a few hundred requests per day per key and lower priority under load. Good for prototyping, not for production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 4: Hugging Face Inference Providers
&lt;/h2&gt;

&lt;p&gt;Hugging Face offers hosted inference endpoints for V4 checkpoints soon after release. These are rate-limited and may have variable latency, but are free to use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;huggingface_hub&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InferenceClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InferenceClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-ai/DeepSeek-V4-Flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the V4 technical report in 5 bullets.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;HF tokens are free. For heavier use, a Pro account offers higher limits, still at a lower cost than the official API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 5: Trial credits on Colab, Kaggle, RunPod, and Lambda
&lt;/h2&gt;

&lt;p&gt;Major GPU-rental providers offer trial credits for short-term use.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google Colab&lt;/strong&gt;: Free T4 is too small; Colab Pro+ gives 500 units/month—enough for V4-Flash experiments on A100.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kaggle&lt;/strong&gt;: Free weekly T4/P100 hours; usually too small for V4-Pro but can handle quantized V4-Flash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RunPod&lt;/strong&gt;: $10 trial covers a few hours on H100—enough for vLLM benchmarking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda&lt;/strong&gt;: Occasionally offers free hours on H100/H200; check signup for current promos.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These options support bounded experiments, not ongoing free usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build a provider-agnostic Apidog collection
&lt;/h2&gt;

&lt;p&gt;You can test the same prompt across all providers without duplicating work. Recommended workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download Apidog.&lt;/li&gt;
&lt;li&gt;Create a collection with four environments: &lt;code&gt;chat&lt;/code&gt; (placeholder), &lt;code&gt;deepseek&lt;/code&gt; (&lt;code&gt;https://api.deepseek.com/v1&lt;/code&gt;), &lt;code&gt;openrouter&lt;/code&gt; (&lt;code&gt;https://openrouter.ai/api/v1&lt;/code&gt;), &lt;code&gt;self-hosted&lt;/code&gt; (&lt;code&gt;http://localhost:8000/v1&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Save a POST request to &lt;code&gt;{{BASE_URL}}/chat/completions&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Store each provider’s key as a secret variable for consistent requests across environments.&lt;/li&gt;
&lt;li&gt;Switch environments to A/B test prompts on every backend.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This matches the pattern from the &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 free-tier collection&lt;/a&gt;: one tool, every provider, no duplicate setup.&lt;/p&gt;
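
&lt;p&gt;The same pattern in plain Python, if you want to script the comparison (the per-provider model IDs are assumptions; check each catalog):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from openai import OpenAI

# Base URL, model id, and env var holding the key for each backend.
BACKENDS = {
    "deepseek": ("https://api.deepseek.com/v1", "deepseek-v4-flash", "DEEPSEEK_API_KEY"),
    "openrouter": ("https://openrouter.ai/api/v1", "deepseek/deepseek-v4-flash:free", "OPENROUTER_KEY"),
    "self-hosted": ("http://localhost:8000/v1", "deepseek-ai/DeepSeek-V4-Flash", None),
}

for name, (base_url, model, key_var) in BACKENDS.items():
    api_key = os.environ.get(key_var, "not-needed") if key_var else "not-needed"
    client = OpenAI(base_url=base_url, api_key=api_key)
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hi in one word."}],
        max_tokens=8,
    )
    print(name, reply.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;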

&lt;h2&gt;
  
  
  Which free path should you pick?
&lt;/h2&gt;

&lt;p&gt;Use these heuristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Want a quick opinion?&lt;/strong&gt; Use &lt;a href="http://chat.deepseek.com" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototyping a product?&lt;/strong&gt; Use OpenRouter’s free tier until capped, then switch to paid DeepSeek.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Have GPUs &amp;amp; compliance needs?&lt;/strong&gt; Self-host V4-Flash with vLLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need long-term free usage?&lt;/strong&gt; No sustainable option—combine &lt;a href="http://chat.deepseek.com" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt; for manual work and paid API for automation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When to move off free
&lt;/h2&gt;

&lt;p&gt;Move to paid billing if:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Rate-limited more than once daily&lt;/strong&gt;—your workload justifies a budget.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need SLAs&lt;/strong&gt;—free tiers don’t provide them; the official API does.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need logging, auditing, or compliance&lt;/strong&gt;—paid API delivers billing records; most free tiers don’t.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When these apply, use the &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;official API&lt;/a&gt;. Minimum top-up is $2, and per-token pricing is among the lowest.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is &lt;a href="http://chat.deepseek.com" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt; really free?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. No credit card or trial period. Service is soft-throttled, not paywalled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need a Hugging Face account to download weights?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Not strictly, but a logged-in account gets better download rate limits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which free path runs real V4-Pro?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="http://chat.deepseek.com" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt; provides full V4-Pro. OpenRouter’s free tier is usually V4-Flash. For V4-Pro output without paying, use the web chat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I put a free tier behind a product?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. Free tiers can rate-limit, change terms, or disappear. For customer-facing products, use the paid API or self-host.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is self-hosting actually free?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
License is free; hardware is not. If you own GPUs, marginal cost is electricity. Renting usually costs more than the paid API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will there be an Apidog free tier for testing?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; is free for API design and testing; credits are only needed for paid APIs. You can use a free Apidog workspace with &lt;a href="http://chat.deepseek.com" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt; or OpenRouter for a fully free workflow.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to Use DeepSeek V4: Web Chat, API, and Self-Hosted Paths</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 24 Apr 2026 04:11:47 +0000</pubDate>
      <link>https://dev.to/hassann/how-to-use-deepseek-v4-web-chat-api-and-self-hosted-paths-mh1</link>
      <guid>https://dev.to/hassann/how-to-use-deepseek-v4-web-chat-api-and-self-hosted-paths-mh1</guid>
      <description>&lt;p&gt;DeepSeek V4 launched on April 23, 2026, offering four checkpoints, a live API, and open weights (MIT license) on Hugging Face. You can access it instantly, make production API calls, or self-host for on-premise deployment. This guide covers all three options with actionable steps, tradeoffs, and a production-ready prompt workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;If you need an overview, start with &lt;a href="http://apidog.com/blog/what-is-deepseek-v4?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;what is DeepSeek V4&lt;/a&gt;. For API integration, read the &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 API guide&lt;/a&gt;. For zero-cost usage, see &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to use DeepSeek V4 for free&lt;/a&gt;. When you’re ready to test, download &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to pre-build your API collection.&lt;/p&gt;

&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Fastest: &lt;a href="https://chat.deepseek.com/" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt; — Free chat UI, V4-Pro by default, three reasoning modes.&lt;/li&gt;
&lt;li&gt;Production: Use &lt;code&gt;https://api.deepseek.com/v1/chat/completions&lt;/code&gt; with &lt;code&gt;deepseek-v4-pro&lt;/code&gt; or &lt;code&gt;deepseek-v4-flash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Self-hosted: Pull weights from &lt;a href="https://huggingface.co/collections/deepseek-ai/deepseek-v4" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;, run the &lt;code&gt;/inference&lt;/code&gt; code.&lt;/li&gt;
&lt;li&gt;Choose &lt;strong&gt;Non-Think&lt;/strong&gt; for fast routing/classification, &lt;strong&gt;Think High&lt;/strong&gt; for code/analysis, &lt;strong&gt;Think Max&lt;/strong&gt; for accuracy-critical tasks.&lt;/li&gt;
&lt;li&gt;Recommended sampling: &lt;code&gt;temperature=1.0, top_p=1.0&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; as your API client. The OpenAI-compatible format allows easy replay across DeepSeek, OpenAI, and Anthropic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F04%2Fimage-220.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F04%2Fimage-220.png" alt="" width="800" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id="pick-the-right-path-for-your-workload"&gt;Pick the right path for your workload&lt;/h2&gt;

&lt;p&gt;Choose the integration that fits your needs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Setup time&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="http://chat.deepseek.com" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;30 seconds&lt;/td&gt;
&lt;td&gt;Quick tests, ad-hoc work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek API&lt;/td&gt;
&lt;td&gt;Per-token billing&lt;/td&gt;
&lt;td&gt;5 minutes&lt;/td&gt;
&lt;td&gt;Production, agents, batch jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-hosted V4-Flash&lt;/td&gt;
&lt;td&gt;Hardware cost only&lt;/td&gt;
&lt;td&gt;A few hours&lt;/td&gt;
&lt;td&gt;On-prem compliance, offline inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-hosted V4-Pro&lt;/td&gt;
&lt;td&gt;Cluster cost only&lt;/td&gt;
&lt;td&gt;A day&lt;/td&gt;
&lt;td&gt;Research, custom fine-tunes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenRouter / aggregator&lt;/td&gt;
&lt;td&gt;Per-token billing&lt;/td&gt;
&lt;td&gt;2 minutes&lt;/td&gt;
&lt;td&gt;Multi-provider fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2 id="path-1-use-v4-in-the-web-chat"&gt;Path 1: Use V4 in the web chat&lt;/h2&gt;

&lt;p&gt;For the quickest evaluation, use the official chat interface:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://chat.deepseek.com/" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Sign in (email, Google, or WeChat).&lt;/li&gt;
&lt;li&gt;V4-Pro is the default. Toggle between Non-Think, Think High, and Think Max at the top.&lt;/li&gt;
&lt;li&gt;Enter your prompt and send.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F04%2Fimage-221.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F04%2Fimage-221.png" alt="" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Web chat supports file uploads, web search, and full 1M-token context. Rate limits are per account; heavy use may slow responses.&lt;/p&gt;

&lt;p&gt;Best for: error trace diagnosis, summarizing large PDFs, quick benchmark tests. Not suitable for automation or repeatable workflows.&lt;/p&gt;

&lt;h2 id="path-2-use-the-deepseek-api"&gt;Path 2: Use the DeepSeek API&lt;/h2&gt;

&lt;p&gt;The API is OpenAI-compatible and ready for production. Model IDs (&lt;code&gt;deepseek-v4-pro&lt;/code&gt;, &lt;code&gt;deepseek-v4-flash&lt;/code&gt;) are stable.&lt;/p&gt;

&lt;h3 id="get-a-key"&gt;Get an API key&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Sign up at &lt;a href="https://platform.deepseek.com/" rel="noopener noreferrer"&gt;platform.deepseek.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Add a payment method (top-ups from $2).&lt;/li&gt;
&lt;li&gt;Create an API key under &lt;strong&gt;API Keys&lt;/strong&gt; and copy it (you won’t see it again).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Set the key in your environment:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;export DEEPSEEK_API_KEY="sk-..."
&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id="the-minimum-viable-request"&gt;Minimum viable request (curl)&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;curl https://api.deepseek.com/v1/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [
      {"role": "user", "content": "Refactor this Python function to async. Reply with code only."}
    ],
    "thinking_mode": "thinking"
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Switch &lt;code&gt;deepseek-v4-pro&lt;/code&gt; to &lt;code&gt;deepseek-v4-flash&lt;/code&gt; for lower cost, or set &lt;code&gt;"thinking_mode": "non-thinking"&lt;/code&gt; for faster outputs.&lt;/p&gt;

&lt;h3 id="python-client"&gt;Python client (OpenAI SDK)&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a concise senior engineer."},
        {"role": "user", "content": "Explain the CSA+HCA hybrid attention stack."},
    ],
    extra_body={"thinking_mode": "thinking_max"},
    temperature=1.0,
    top_p=1.0,
)

print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Any OpenAI-compatible library (LangChain, LlamaIndex, DSPy) works by changing only the base URL.&lt;/p&gt;

&lt;h3 id="node-client"&gt;Node client (OpenAI SDK)&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a fizzbuzz in Rust." }],
  temperature: 1.0,
  top_p: 1.0,
});

console.log(response.choices[0].message.content);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;See the &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 API guide&lt;/a&gt; for endpoint details, parameters, and error handling.&lt;/p&gt;

&lt;h2 id="path-3-iterate-with-apidog"&gt;Path 3: Iterate with Apidog&lt;/h2&gt;

&lt;p&gt;For repeated API calls, Apidog streamlines your workflow and helps manage credits.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download Apidog for your OS: &lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Mac, Windows, Linux&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Create a new API project. Add a POST request to &lt;code&gt;https://api.deepseek.com/v1/chat/completions&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;Authorization: Bearer {{DEEPSEEK_API_KEY}}&lt;/code&gt; as a header. Store the key in environment variables.&lt;/li&gt;
&lt;li&gt;Paste your JSON request body and save. Replay or tweak with a click.&lt;/li&gt;
&lt;li&gt;Use the response viewer to compare outputs (e.g., Non-Think vs Think Max modes).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can store multiple requests (OpenAI, Claude, DeepSeek) side by side for easy A/B testing and billing visibility. To migrate, simply change the base URL in your existing &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 API collection&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id="path-4-self-host-v4-flash"&gt;Path 4: Self-host V4-Flash&lt;/h2&gt;

&lt;p&gt;The MIT license allows full self-hosting. This is ideal for compliance, air-gapped environments, or custom economics.&lt;/p&gt;

&lt;h3 id="hardware"&gt;Hardware requirements&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;V4-Flash (13B active, 284B total):&lt;/strong&gt; 2–4 H100/H200/MI300X GPUs (FP8). Quantized INT4 fits on a single 80GB card for small batches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;V4-Pro (49B active, 1.6T total):&lt;/strong&gt; Requires 16–32 H100s for production inference.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id="get-the-weights"&gt;Download model weights&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Install Hugging Face CLI
pip install -U "huggingface_hub[cli]"

# (Optional) Log in for higher download rate limits
huggingface-cli login

# Download V4-Flash weights
huggingface-cli download deepseek-ai/DeepSeek-V4-Flash \
  --local-dir ./models/deepseek-v4-flash \
  --local-dir-use-symlinks False
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;V4-Flash is ~500GB at FP8; V4-Pro is several TB.&lt;/p&gt;

&lt;h3 id="run-inference"&gt;Run inference (vLLM)&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;pip install "vllm&amp;gt;=0.9.0"

vllm serve deepseek-ai/DeepSeek-V4-Flash \
  --tensor-parallel-size 4 \
  --max-model-len 1048576 \
  --dtype auto
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Once running, point OpenAI-compatible clients to &lt;code&gt;http://localhost:8000/v1&lt;/code&gt;. You can use the same Apidog collection with the new base URL.&lt;/p&gt;
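
&lt;p&gt;As a quick smoke test, the sketch below points the OpenAI Python SDK at the local vLLM server. The prompt is a placeholder, and &lt;code&gt;api_key="EMPTY"&lt;/code&gt; is the usual convention when the vLLM server runs without authentication:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from openai import OpenAI

# vLLM's OpenAI-compatible server listens on port 8000 by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",  # must match the model you served
    messages=[{"role": "user", "content": "Reply with the word OK."}],
    max_tokens=16,
)
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If this round-trips, your existing clients and collections need only the base-URL change.&lt;/p&gt;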

&lt;h2 id="prompting-v4-effectively"&gt;Prompting V4 effectively&lt;/h2&gt;

&lt;p&gt;DeepSeek V4’s prompt handling differs from GPT-5.5 or Claude. For best results:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Set &lt;code&gt;thinking_mode&lt;/code&gt; explicitly&lt;/strong&gt; for each task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use system prompts for persona only&lt;/strong&gt;, not task instructions. Place the main task in the user message.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For code tasks, provide a test harness&lt;/strong&gt; or failing test case to increase solution accuracy.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For long-context prompts, keep the most relevant info at the beginning and end. V4’s attention is optimized, but recency and primacy effects remain.&lt;/p&gt;
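
&lt;p&gt;A minimal sketch applying all three rules, using the &lt;code&gt;thinking_mode&lt;/code&gt; values from the requests above (the failing test is a hypothetical placeholder):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)

failing_test = "def test_parse(): assert parse('1,2') == [1, 2]  # raises ValueError"

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        # Rule 2: the system prompt carries persona only.
        {"role": "system", "content": "You are a senior Python engineer."},
        # Rule 3: the task plus a failing test live in the user message.
        {"role": "user", "content": f"Fix parse() so this test passes:\n{failing_test}"},
    ],
    # Rule 1: set thinking_mode explicitly for the task at hand.
    extra_body={"thinking_mode": "thinking"},
)
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;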

&lt;h2 id="cost-control"&gt;Cost control&lt;/h2&gt;

&lt;p&gt;To prevent overspending, apply these safeguards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Default to V4-Flash&lt;/strong&gt;. Use V4-Pro only for proven quality needs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default to Non-Think&lt;/strong&gt;. Use Think High or Think Max as required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set &lt;code&gt;max_tokens&lt;/code&gt;&lt;/strong&gt;. The 1M context is a limit, not a target. Most responses fit in 2,000 tokens. A sketch applying these defaults follows this list.&lt;/li&gt;
&lt;/ul&gt;
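
&lt;p&gt;A minimal sketch combining those defaults, reusing the OpenAI-compatible client pattern from above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",                     # default to the cheaper model
    messages=[{"role": "user", "content": "Summarize the text that follows: ..."}],
    extra_body={"thinking_mode": "non-thinking"},  # default to Non-Think
    max_tokens=2000,  # cap output; the 1M context is a limit, not a target
)
print(response.usage)  # inspect token counts to catch runaway prompts early
&lt;/code&gt;&lt;/pre&gt;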

&lt;p&gt;In Apidog, use environment variables for &lt;code&gt;DEEPSEEK_API_KEY&lt;/code&gt; to separate test and production billing. &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; also records token counts per response to help spot runaway prompts.&lt;/p&gt;

&lt;h2 id="migrating-from-deepseek-v3-or-other-models"&gt;Migrating from DeepSeek V3 or other models&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;From &lt;code&gt;deepseek-chat&lt;/code&gt; / &lt;code&gt;deepseek-reasoner&lt;/code&gt;:&lt;/strong&gt; Change the model ID to &lt;code&gt;deepseek-v4-pro&lt;/code&gt; or &lt;code&gt;deepseek-v4-flash&lt;/code&gt;. The old IDs are deprecated on July 24, 2026.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;From OpenAI GPT-5.x:&lt;/strong&gt; Change base URL to &lt;code&gt;https://api.deepseek.com/v1&lt;/code&gt; and model ID. Request shape is otherwise unchanged. See the &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 API guide&lt;/a&gt; for reference.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;From Anthropic Claude:&lt;/strong&gt; Use &lt;code&gt;https://api.deepseek.com/anthropic&lt;/code&gt; for the Anthropic message format, or convert to OpenAI format for the main endpoint. A sketch of the Anthropic path follows this list.&lt;/li&gt;
&lt;/ul&gt;
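
&lt;p&gt;For the Anthropic path, here is a minimal sketch with the official &lt;code&gt;anthropic&lt;/code&gt; Python SDK, assuming the endpoint accepts the standard Messages API with DeepSeek model IDs:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import os
from anthropic import Anthropic

# Point the Anthropic SDK at DeepSeek's Anthropic-compatible endpoint.
client = Anthropic(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/anthropic",
)

message = client.messages.create(
    model="deepseek-v4-pro",
    max_tokens=1024,  # required by the Messages API
    messages=[{"role": "user", "content": "Say hello."}],
)
print(message.content[0].text)
&lt;/code&gt;&lt;/pre&gt;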

&lt;h2 id="faq"&gt;FAQ&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Do I need a paid account?&lt;/strong&gt; Web chat is free. API access requires a minimum $2 top-up. See &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to use DeepSeek V4 for free&lt;/a&gt; for no-cost options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which variant should I use?&lt;/strong&gt; Start with V4-Flash in Non-Think mode, measure quality, and only upgrade if necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I run V4 on a MacBook?&lt;/strong&gt; V4-Flash runs on M3/M4 Max with 128GB RAM (INT4, slow). V4-Pro requires much more. For laptops, use the API or web chat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does V4 support function calling?&lt;/strong&gt; Yes. The OpenAI-compatible endpoint accepts the standard &lt;code&gt;tools&lt;/code&gt; array, and responses return &lt;code&gt;tool_calls&lt;/code&gt;. The Anthropic endpoint uses its own native tool schema.&lt;/p&gt;
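
&lt;p&gt;A minimal &lt;code&gt;tools&lt;/code&gt; sketch, reusing the Python client from above (the &lt;code&gt;get_weather&lt;/code&gt; function is a hypothetical example, not part of the API):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# When the model decides to call the function, it returns tool_calls instead of text.
print(response.choices[0].message.tool_calls)
&lt;/code&gt;&lt;/pre&gt;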

&lt;p&gt;&lt;strong&gt;How do I stream responses?&lt;/strong&gt; Set &lt;code&gt;stream: true&lt;/code&gt; in your request body. Responses are OpenAI-compatible SSE streams; any compatible library works out of the box.&lt;/p&gt;
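
&lt;p&gt;Streaming with the same client is one flag plus a loop over the chunks:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a delta; content can be empty on role/stop chunks.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
&lt;/code&gt;&lt;/pre&gt;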

&lt;p&gt;&lt;strong&gt;Are there rate limits?&lt;/strong&gt; Hosted API has per-tier limits (&lt;a href="https://api-docs.deepseek.com/" rel="noopener noreferrer"&gt;api-docs.deepseek.com&lt;/a&gt;). Self-hosted: limited only by hardware.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>What Is DeepSeek V4?</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 24 Apr 2026 04:09:03 +0000</pubDate>
      <link>https://dev.to/hassann/what-is-deepseek-v4-338f</link>
      <guid>https://dev.to/hassann/what-is-deepseek-v4-338f</guid>
      <description>&lt;p&gt;DeepSeek V4: The Dev-Focused Guide to the 1.6T Open Model&lt;/p&gt;

&lt;p&gt;DeepSeek released V4 on April 23, 2026—a major Mixture-of-Experts (MoE) family upgrade. Four checkpoints dropped, led by DeepSeek-V4-Pro (1.6T parameters, MIT license, 1M-token context). V4-Flash is the smaller sibling (284B parameters, same context, open weights). Benchmarks put V4-Pro ahead of Claude Opus 4.6 for code, close behind GPT-5.4 xHigh on MMLU-Pro.&lt;/p&gt;

&lt;p&gt;If you’re choosing between Claude, GPT-5.5, Qwen, or DeepSeek V4, this guide covers what’s new, architectural changes from V3.2, implementation details, and how to run it right now.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;For hands-on integration, see the &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 API guide&lt;/a&gt;, &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;free-access guide&lt;/a&gt;, and the &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;full usage walkthrough&lt;/a&gt;. The API mirrors OpenAI’s request shape, so you can pre-build collections in &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; before you have an API key.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek V4&lt;/strong&gt;: Mixture-of-Experts, released April 23, 2026, MIT license.&lt;/li&gt;
&lt;li&gt;Four checkpoints: &lt;strong&gt;V4-Pro&lt;/strong&gt;, &lt;strong&gt;V4-Pro-Base&lt;/strong&gt;, &lt;strong&gt;V4-Flash&lt;/strong&gt;, &lt;strong&gt;V4-Flash-Base&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;V4-Pro&lt;/strong&gt;: 1.6T total, 49B active params; &lt;strong&gt;V4-Flash&lt;/strong&gt;: 284B total, 13B active.&lt;/li&gt;
&lt;li&gt;Both: &lt;strong&gt;1M-token context&lt;/strong&gt;, three reasoning modes (Non-Think, Think High, Think Max).&lt;/li&gt;
&lt;li&gt;Headline scores (Pro): &lt;strong&gt;LiveCodeBench 93.5&lt;/strong&gt;, &lt;strong&gt;Codeforces 3206&lt;/strong&gt;, &lt;strong&gt;MMLU-Pro 87.5&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;API live at &lt;code&gt;api.deepseek.com&lt;/code&gt; (model IDs: &lt;code&gt;deepseek-v4-pro&lt;/code&gt;, &lt;code&gt;deepseek-v4-flash&lt;/code&gt;). Weights on Hugging Face and ModelScope.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What DeepSeek V4 Actually Is
&lt;/h2&gt;

&lt;p&gt;V4 succeeds the V3 and V3.2 lines, keeping the MoE architecture but changing model shape. &lt;strong&gt;V4-Pro&lt;/strong&gt; activates 49B of 1.6T parameters per token—so inference is closer to a 50B dense model. Full technical details: &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro" rel="noopener noreferrer"&gt;DeepSeek V4 model card&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpejjjliky55psxexo3l2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpejjjliky55psxexo3l2.png" alt="V4 architecture overview" width="800" height="158"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Checkpoints
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek-V4-Pro&lt;/strong&gt;: 1.6T total, 49B active, 1M context. This is the main production API model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek-V4-Pro-Base&lt;/strong&gt;: Pre-trained only, for custom fine-tunes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek-V4-Flash&lt;/strong&gt;: 284B total, 13B active, 1M context. For latency-sensitive or local deployment on 2–4 H100s.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek-V4-Flash-Base&lt;/strong&gt;: Pre-trained, for research/fine-tune.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All checkpoints: MIT license. You can download, mirror, fine-tune, and deploy with no license fee.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Changed from V3.2
&lt;/h2&gt;

&lt;p&gt;V4 improves on code and reasoning benchmarks by rewriting the attention stack and training pipeline.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;V3.2&lt;/th&gt;
&lt;th&gt;V4-Pro&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total parameters&lt;/td&gt;
&lt;td&gt;685B&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.6T&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active parameters&lt;/td&gt;
&lt;td&gt;37B&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;49B&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1M&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference FLOPs (1M context)&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;27%&lt;/strong&gt; of V3.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KV cache (1M context)&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;10%&lt;/strong&gt; of V3.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Precision&lt;/td&gt;
&lt;td&gt;FP8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;FP4 + FP8 mixed&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;DeepSeek License&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MIT&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning modes&lt;/td&gt;
&lt;td&gt;single&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;three&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Key drivers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid attention stack&lt;/strong&gt;: Combines Compressed Sparse Attention and Heavily Compressed Attention for efficient, long-context inference, shrinking KV cache to 10% and FLOPs to 27% of V3.2 at 1M tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manifold-Constrained Hyper-Connections&lt;/strong&gt;: Stabilizes gradients for deep stacking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Muon optimizer&lt;/strong&gt;: Faster convergence and better handling of large gradient norms.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Training corpus: 32T+ tokens; post-training: two-stage pipeline (domain experts, then on-policy distillation).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ohkm6y7q9i5q2swecu2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ohkm6y7q9i5q2swecu2.png" alt="V4 technical improvements" width="800" height="550"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmarks That Matter
&lt;/h2&gt;

&lt;p&gt;V4-Pro is top-tier on code and knowledge tasks, with some gaps in long-context retrieval.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqn2q9hlfiouvipu8z13b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqn2q9hlfiouvipu8z13b.png" alt="V4 benchmark scores" width="800" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;V4-Flash&lt;/strong&gt; delivers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MMLU-Pro&lt;/strong&gt;: 86.2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPQA Diamond&lt;/strong&gt;: 88.1&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiveCodeBench&lt;/strong&gt;: 91.6&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codeforces&lt;/strong&gt;: 3052&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SWE Verified&lt;/strong&gt;: 79.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash" rel="noopener noreferrer"&gt;DeepSeek V4-Flash card&lt;/a&gt; for full tables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;V4-Pro leads on code and factual recall, but Gemini 3.1 Pro leads MMLU-Pro; Claude Opus is stronger for 1M-token retrieval.&lt;/li&gt;
&lt;li&gt;For coding, agentic tasks, and complex analysis, V4-Pro is competitive. For “needle-in-a-haystack” retrieval, Claude is better.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Three Reasoning Modes
&lt;/h2&gt;

&lt;p&gt;Each V4 checkpoint exposes three modes, controlled by the &lt;code&gt;thinking_mode&lt;/code&gt; parameter (API) or a script flag:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Non-Think&lt;/strong&gt;: Fast, no reasoning tokens. Use for classification, routing, or summaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Think High&lt;/strong&gt;: Default for complex tasks. The model generates reasoning traces and plans actions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Think Max&lt;/strong&gt;: Longer reasoning, self-critique, recommended for 384K+ context. Highest accuracy, highest token cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sampling settings:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;temperature=1.0, top_p=1.0&lt;/code&gt; across all modes.&lt;/p&gt;
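
&lt;p&gt;A sketch comparing the modes on one prompt, assuming the OpenAI-compatible endpoint accepts &lt;code&gt;thinking_mode&lt;/code&gt; as an extension and that the mode strings below match your API version (check the official docs):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

prompt = "How many weekdays fall between 2026-04-23 and 2026-05-23?"

for mode in ("non-thinking", "thinking", "thinking_max"):
    r = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=[{"role": "user", "content": prompt}],
        extra_body={"thinking_mode": mode},
        temperature=1.0,  # recommended sampling for all modes
        top_p=1.0,
    )
    print(mode, r.usage.completion_tokens, r.choices[0].message.content[:80])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Expect completion-token counts (and cost) to climb with each mode; pay for Think Max only where the accuracy gain matters.&lt;/p&gt;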




&lt;h2&gt;
  
  
  Architecture in Plain English
&lt;/h2&gt;

&lt;p&gt;Three architectural choices drive V4's efficiency:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid attention&lt;/strong&gt;: Most layers use Compressed Sparse Attention (full attention on "important" tokens, compression elsewhere); some use Heavily Compressed Attention (close to linear cost). This enables efficient scaling to 1M tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manifold-Constrained Hyper-Connections&lt;/strong&gt;: Residuals are constrained to stable manifolds, enabling deeper stacking without gradient instability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Muon optimizer&lt;/strong&gt;: Replaces AdamW, better for MoE gradients and faster convergence.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Availability Today
&lt;/h2&gt;

&lt;p&gt;All four checkpoints and the API are live as of April 24, 2026.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Surface&lt;/th&gt;
&lt;th&gt;Access&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://chat.deepseek.com/" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Free web chat, V4-Pro default, login required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek API&lt;/td&gt;
&lt;td&gt;Live at &lt;code&gt;api.deepseek.com&lt;/code&gt;; model IDs &lt;code&gt;deepseek-v4-pro&lt;/code&gt;, &lt;code&gt;deepseek-v4-flash&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hugging Face weights&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro" rel="noopener noreferrer"&gt;V4-Pro&lt;/a&gt;, &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash" rel="noopener noreferrer"&gt;V4-Flash&lt;/a&gt;, both MIT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ModelScope&lt;/td&gt;
&lt;td&gt;Mirrored weights for users in China&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenRouter and aggregators&lt;/td&gt;
&lt;td&gt;Expected within days; typical DeepSeek launch pattern&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;deepseek-chat&lt;/code&gt; / &lt;code&gt;deepseek-reasoner&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Deprecated July 24, 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Migration note:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If using &lt;code&gt;deepseek-chat&lt;/code&gt; in production, migrate to &lt;code&gt;deepseek-v4-pro&lt;/code&gt; or &lt;code&gt;deepseek-v4-flash&lt;/code&gt; within three months.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Compares to GPT-5.5 and Claude
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: V4-Pro/Flash are open weights (MIT). GPT-5.5 and Claude Opus are closed. Self-hosting V4 is cheaper at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coding&lt;/strong&gt;: V4-Pro (LiveCodeBench 93.5, Codeforces 3206) beats GPT-5.5 and Claude on code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge&lt;/strong&gt;: Gemini 3.1 Pro leads MMLU-Pro. V4-Pro/GPT-5.5 tie at 87.5.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-context retrieval&lt;/strong&gt;: Claude Opus leads MRCR 1M. For deep retrieval, Claude is safer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: MIT lets you embed V4-Pro in your product freely.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What to Build With It
&lt;/h2&gt;

&lt;p&gt;Good fits for DeepSeek V4:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Agentic coding loops&lt;/strong&gt;: Multi-file debugging, repo refactoring, test fixes. Use with &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; for API inspection and prompt tuning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-document reasoning&lt;/strong&gt;: 1M tokens handles large repos, contracts, research corpora. Use Think High mode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosted AI products&lt;/strong&gt;: V4-Flash is the first open-weights model with frontier quality for on-prem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research/fine-tuning&lt;/strong&gt;: Base checkpoints let you train specialist models with your own data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Not ideal for: high-volume classification, embedding retrieval, or short-prompt chat (older DeepSeek models are cheaper for those).&lt;/p&gt;




&lt;h2&gt;
  
  
  Pricing in One Line
&lt;/h2&gt;

&lt;p&gt;As of writing, V4 API pricing is not final. V3.2 was ~$0.28/million input tokens, $0.42/million output tokens. Expect V4-Flash at a similar rate, V4-Pro at a premium. Closed competitors charge $5–$15/million input tokens. For updates, see the &lt;a href="https://api-docs.deepseek.com/" rel="noopener noreferrer"&gt;DeepSeek pricing page&lt;/a&gt;.&lt;/p&gt;
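
&lt;p&gt;As a back-of-the-envelope sketch (illustrative only, plugging in the V3.2 rates above since V4 pricing is not final):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical per-million-token rates, borrowed from V3.2 as placeholders.
INPUT_PER_M = 0.28
OUTPUT_PER_M = 0.42

def request_cost(input_tokens, output_tokens):
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# A 100K-token prompt with a 2K-token answer:
print(f"${request_cost(100_000, 2_000):.4f}")  # $0.0288
&lt;/code&gt;&lt;/pre&gt;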




&lt;h2&gt;
  
  
  How to Test V4 Today
&lt;/h2&gt;

&lt;p&gt;Three ways to get started (fastest first):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Web chat&lt;/strong&gt;: Go to &lt;a href="https://chat.deepseek.com/" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt;, sign in. V4-Pro is default. Toggle to Think High in UI. Free, no card.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: Get an API key, point your client at &lt;code&gt;https://api.deepseek.com&lt;/code&gt;, set &lt;code&gt;"model": "deepseek-v4-pro"&lt;/code&gt;. Request shape is OpenAI-compatible—swap the base URL in any OpenAI client; a minimal sketch follows this list. Full guide: &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 API guide&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local weights&lt;/strong&gt;: Download from Hugging Face or ModelScope. V4-Flash runs on 2–4 H100s, V4-Pro requires a cluster. Inference code is in &lt;code&gt;/inference&lt;/code&gt; of the repo.&lt;/li&gt;
&lt;/ol&gt;
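
&lt;p&gt;For step 2, a minimal Python sketch with the OpenAI SDK:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # OpenAI-compatible base URL
)

r = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "One-line summary of MoE routing."}],
)
print(r.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;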

&lt;p&gt;For a prompt-iteration workflow using Apidog, see &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to use DeepSeek V4&lt;/a&gt;. For zero-cost usage, see &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to use DeepSeek V4 for free&lt;/a&gt;. &lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog&lt;/a&gt; and pre-build your collection; the OpenAI-compatible format supports DeepSeek, OpenAI, and other APIs with one request.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is DeepSeek V4 really open source?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. All checkpoints are MIT licensed for commercial use, modification, and redistribution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need a GPU cluster to run V4-Flash?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For full precision: 2–4 H100s/H200s for V4-Flash. Less if quantized. V4-Pro needs a full cluster. To test without hardware, use the API or &lt;a href="https://chat.deepseek.com/" rel="noopener noreferrer"&gt;chat.deepseek.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When does V4 hit the DeepSeek API?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Live as of April 23, 2026. Model IDs: &lt;code&gt;deepseek-v4-pro&lt;/code&gt;, &lt;code&gt;deepseek-v4-flash&lt;/code&gt;. Old IDs (&lt;code&gt;deepseek-chat&lt;/code&gt;, &lt;code&gt;deepseek-reasoner&lt;/code&gt;) deprecated July 24, 2026.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does V4 compare to Kimi and Qwen?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
V4-Pro posts higher LiveCodeBench and Codeforces numbers than Kimi K2 and Qwen 3 Max. All are open-weights MoE models; choose based on the benchmarks closest to your workload.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I fine-tune V4 on my data?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes, use the Base checkpoints and a standard SFT pipeline. MIT license permits commercial redistribution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will V4 work with my existing OpenAI-compatible tooling?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes, the API accepts OpenAI and Anthropic formats:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;https://api.deepseek.com&lt;/code&gt; (OpenAI format)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;https://api.deepseek.com/anthropic&lt;/code&gt; (Anthropic format)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most OpenAI clients work with a base-URL change. For a parallel pattern, see the &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 API walkthrough&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>GPT-5.5 Pricing: Full Breakdown of API, Codex, and ChatGPT Costs (April 2026)</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 24 Apr 2026 02:26:01 +0000</pubDate>
      <link>https://dev.to/hassann/gpt-55-pricing-full-breakdown-of-api-codex-and-chatgpt-costs-april-2026-14id</link>
      <guid>https://dev.to/hassann/gpt-55-pricing-full-breakdown-of-api-codex-and-chatgpt-costs-april-2026-14id</guid>
      <description>&lt;p&gt;OpenAI’s April 23, 2026 release of GPT-5.5 doubled the per-token price compared to GPT-5.4: input tokens now cost $5.00/M, output tokens $30.00/M. Pro API pricing remains at $30/$180. Knowing the detailed pricing surfaces—API, Batch, Flex, Priority, and Codex limits—is essential to avoid surprise bills and optimize your workload before upgrading to GPT-5.5.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;For a high-level overview, see &lt;a href="http://apidog.com/blog/what-is-gpt-5-5?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;What is GPT-5.5&lt;/a&gt;. For API integration steps, check &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the GPT-5.5 API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Surface&lt;/th&gt;
&lt;th&gt;Input / M&lt;/th&gt;
&lt;th&gt;Output / M&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5 standard API&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5 Pro API&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;td&gt;$180.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5 Batch (50% off)&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5 Flex (50% off)&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5 Priority (2.5×)&lt;/td&gt;
&lt;td&gt;$12.50&lt;/td&gt;
&lt;td&gt;$75.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4 standard API&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4-mini API&lt;/td&gt;
&lt;td&gt;$0.25&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GPT-5.5 costs 2× GPT-5.4 per token, but OpenAI claims ~20% higher net intelligence when factoring in token efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Headline Numbers
&lt;/h2&gt;

&lt;p&gt;OpenAI’s &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;API pricing page&lt;/a&gt; lists current rates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5&lt;/strong&gt;: $5.00/M input, $30.00/M output tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5 Pro&lt;/strong&gt;: $30.00/M input, $180.00/M output tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context window&lt;/strong&gt;: 1M tokens for both variants. Reasoning tokens count toward both context and output billing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Batch, Flex, and Priority
&lt;/h2&gt;

&lt;p&gt;OpenAI provides alternative pricing tiers for specific workload needs:&lt;/p&gt;

&lt;h3&gt;
  
  
  Batch API
&lt;/h3&gt;

&lt;p&gt;Use the Batch endpoint for queued requests at 50% standard pricing. Turnaround is &amp;lt;24 hours. Ideal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overnight dataset evaluations.&lt;/li&gt;
&lt;li&gt;Historical re-processing.&lt;/li&gt;
&lt;li&gt;Any workflow with latency tolerance measured in hours.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Batch pricing:&lt;/strong&gt; GPT-5.5 at $2.50 / $15.00 per million tokens—same as GPT-5.4 standard. Use Batch for offline workloads to avoid the price hike.&lt;/p&gt;

&lt;h3&gt;
  
  
  Flex Processing
&lt;/h3&gt;

&lt;p&gt;Flex offers 50% off standard rates with variable latency, from seconds to several minutes depending on demand. Choose Flex when you can tolerate unpredictable latency but still want near-synchronous responses at Batch-level savings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Priority Processing
&lt;/h3&gt;

&lt;p&gt;Priority tier charges 2.5× standard rates ($12.50/$75.00 per million tokens for GPT-5.5) in exchange for faster throughput, higher rate limits, and minimal queue time. Reserve for latency-critical, user-facing production scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thinking Mode Cost Math
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;reasoning.effort&lt;/code&gt; increases the number of tokens used per request, not the per-token price. Adjust your math based on effort:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Effort&lt;/th&gt;
&lt;th&gt;Output-token multiplier&lt;/th&gt;
&lt;th&gt;When to use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;low&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1×&lt;/td&gt;
&lt;td&gt;Routine calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;medium&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.3–2×&lt;/td&gt;
&lt;td&gt;Multi-step coding, structured generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;high&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2–4×&lt;/td&gt;
&lt;td&gt;Deep research, correctness-critical review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;xhigh&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3–8×&lt;/td&gt;
&lt;td&gt;Agent loops, dense planning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A single &lt;code&gt;xhigh&lt;/code&gt; call on a long prompt may use 20K reasoning tokens—$0.60 just for reasoning at standard output rates. &lt;strong&gt;Budget by workload, not per request.&lt;/strong&gt;&lt;/p&gt;
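
&lt;p&gt;A quick estimator for that math (the multipliers are rough midpoints of the ranges in the table above, not published constants):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;OUTPUT_PER_M = 30.00  # GPT-5.5 standard output rate, $/M tokens

# Assumed midpoints of the output-token multipliers above.
EFFORT_MULTIPLIER = {"low": 1.0, "medium": 1.6, "high": 3.0, "xhigh": 5.0}

def output_cost(base_output_tokens, effort):
    tokens = base_output_tokens * EFFORT_MULTIPLIER[effort]
    return tokens / 1e6 * OUTPUT_PER_M

# 20K reasoning tokens at the standard output rate:
print(f"${20_000 / 1e6 * OUTPUT_PER_M:.2f}")  # $0.60, matching the example above
&lt;/code&gt;&lt;/pre&gt;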

&lt;h2&gt;
  
  
  Codex Pricing
&lt;/h2&gt;

&lt;p&gt;Codex access is tied to your ChatGPT plan, not token billing. As of April 2026:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Codex access&lt;/th&gt;
&lt;th&gt;GPT-5.5&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Yes (limited time)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Tight weekly caps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;Yes (limited time)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;2× Free caps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plus ($20/mo)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Standard caps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro ($200/mo)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes + Thinking + Pro (in ChatGPT)&lt;/td&gt;
&lt;td&gt;Highest per-user caps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Seat-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise/Edu&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Contract-based&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For CLI-based terminal workflows, Plus or Pro is the most cost-effective way to access GPT-5.5 after a few hundred thousand tokens/day. See the &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-free-codex?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;free path guide&lt;/a&gt; for no-cost entry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison: GPT-5.5 vs Other Models
&lt;/h2&gt;

&lt;p&gt;Choose based on your workload’s output and risk profile:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input / M&lt;/th&gt;
&lt;th&gt;Output / M&lt;/th&gt;
&lt;th&gt;Cost per 1K output tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4-mini&lt;/td&gt;
&lt;td&gt;$0.25&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;$0.0020&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;td&gt;$0.0150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;td&gt;$0.0300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5 Pro&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;td&gt;$180.00&lt;/td&gt;
&lt;td&gt;$0.1800&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-volume, low-risk output&lt;/strong&gt; (classification, summarization): &lt;strong&gt;GPT-5.4-mini&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;General workloads where GPT-5.4 is sufficient&lt;/strong&gt;: &lt;strong&gt;GPT-5.4&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex coding, agentic tasks, research&lt;/strong&gt;: &lt;strong&gt;GPT-5.5&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correctness-critical output&lt;/strong&gt;: &lt;strong&gt;GPT-5.5 Pro&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Worked Example: Coding Agent Cost per Task
&lt;/h2&gt;

&lt;p&gt;Typical agentic coding session on GPT-5.5 (&lt;code&gt;reasoning.effort: "medium"&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input tokens&lt;/strong&gt;: ~15,000&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output tokens&lt;/strong&gt;: ~3,000&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning tokens&lt;/strong&gt;: ~6,000 (medium effort)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost per task (standard pricing):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: 15K × $5.00/M = &lt;strong&gt;$0.075&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Output: (3K + 6K) × $30.00/M = &lt;strong&gt;$0.27&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: $0.345 per coding task&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Same workload on GPT-5.4:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: 15K × $2.50/M = &lt;strong&gt;$0.0375&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Output: 9K × $15.00/M = &lt;strong&gt;$0.135&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: $0.1725 per task&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GPT-5.5 is 2× the cost per task at equal reasoning. If GPT-5.5 closes more tasks successfully (higher quality), the upgrade may pay for itself by reducing retries.&lt;/p&gt;
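
&lt;p&gt;The same arithmetic as a reusable helper, so you can plug in your own token counts:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def task_cost(input_tokens, output_tokens, reasoning_tokens,
              input_per_m, output_per_m):
    # Reasoning tokens bill at the output rate.
    inp = input_tokens / 1e6 * input_per_m
    out = (output_tokens + reasoning_tokens) / 1e6 * output_per_m
    return inp + out

print(task_cost(15_000, 3_000, 6_000, 5.00, 30.00))  # 0.345  (GPT-5.5)
print(task_cost(15_000, 3_000, 6_000, 2.50, 15.00))  # 0.1725 (GPT-5.4)
&lt;/code&gt;&lt;/pre&gt;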

&lt;h2&gt;
  
  
  Day 1 Cost Controls to Implement
&lt;/h2&gt;

&lt;p&gt;To manage GPT-5.5 costs, build these controls into your stack:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Enforce &lt;code&gt;max_output_tokens&lt;/code&gt; caps&lt;/strong&gt;: Default to 2,000 unless longer output is necessary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strict JSON schemas&lt;/strong&gt;: Prevents malformed output and expensive retries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Route by difficulty&lt;/strong&gt;: Use GPT-5.4-mini for easy requests; escalate hard ones to GPT-5.5.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Batch for offline jobs&lt;/strong&gt;: Evaluations, reports, etc. get 50% off.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor &lt;code&gt;usage.reasoning_tokens&lt;/code&gt;&lt;/strong&gt;: High-effort reasoning tokens are the most common source of bill spikes. A sketch wiring up controls 1 and 5 follows this list.&lt;/li&gt;
&lt;/ol&gt;
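
&lt;p&gt;A minimal sketch of controls 1 and 5 with the OpenAI Python SDK, assuming the Responses API and the &lt;code&gt;gpt-5.5&lt;/code&gt; model ID from this article (field layouts may differ by SDK version):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.5",                 # model ID assumed from this article
    input="Review this function for off-by-one errors: ...",
    reasoning={"effort": "medium"},  # pick effort deliberately, not by default
    max_output_tokens=2000,          # control 1: hard cap on output spend
)

# Control 5: log token usage, including reasoning tokens, per request.
print(response.usage)
&lt;/code&gt;&lt;/pre&gt;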

&lt;h2&gt;
  
  
  Per-Plan Monthly Cost Estimate
&lt;/h2&gt;

&lt;p&gt;If you’re choosing a ChatGPT plan for GPT-5.5, see the breakdown:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Monthly Price&lt;/th&gt;
&lt;th&gt;Best Fit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Trying GPT-5.5 via Codex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;$4/mo&lt;/td&gt;
&lt;td&gt;Students, light users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plus&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;td&gt;Devs using Codex + ChatGPT daily&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;$200/mo&lt;/td&gt;
&lt;td&gt;Power users needing Thinking/Pro modes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business&lt;/td&gt;
&lt;td&gt;$25/seat/mo&lt;/td&gt;
&lt;td&gt;Teams needing shared workspaces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise/Edu&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Contracted, SLA-based use&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If your API usage exceeds ~4M output tokens/month, Pro plus Codex CLI is usually cheaper—as long as your context fits in the 400K CLI window.&lt;/p&gt;

&lt;h2&gt;
  
  
  Price Change Signals to Watch
&lt;/h2&gt;

&lt;p&gt;Track these if budgeting long-term:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5 API general availability&lt;/strong&gt;: Pricing may drop in response to competition (Claude Mythos, Gemini 3.5, open weights).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pro model democratization&lt;/strong&gt;: OpenAI has historically lowered Pro-tier prices 3–6 months post-launch. Don’t assume $30/$180 is permanent.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Does caching reduce input cost?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Cached input tokens are billed at a fraction of standard rate. Check the &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI pricing page&lt;/a&gt; for details. Cache reusable system prompts and repo contexts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is there a volume discount?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Not officially. Enterprise contracts may have custom rates. For sustained, large-scale use, talk to sales.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does thinking mode cost extra?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. It increases token usage, not the per-token rate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is Codex CLI usage billed separately?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Only if you sign in with an API key. ChatGPT sign-ins use the plan fee; API keys use usage-based billing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s the cheapest way to try GPT-5.5?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Free or Go plan plus Codex CLI. See our &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;free path guide&lt;/a&gt; for details.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to Use GPT-5.5 for Free with Codex?</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 24 Apr 2026 02:25:23 +0000</pubDate>
      <link>https://dev.to/hassann/how-to-use-gpt-55-for-free-with-codex-2em0</link>
      <guid>https://dev.to/hassann/how-to-use-gpt-55-for-free-with-codex-2em0</guid>
      <description>&lt;p&gt;OpenAI released GPT-5.5 on April 23, 2026, and introduced Codex access to every ChatGPT plan—including Free and Go—for a limited time. The fastest way to try GPT-5.5 for free is to install the Codex CLI, sign in with your ChatGPT account, and start using GPT-5.5 from your terminal—no API key or credit card required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This actionable guide covers Codex CLI installation, authentication, model selection, usage limits, and how to integrate Codex into your dev workflow. For a deeper dive into the model, see &lt;a href="http://apidog.com/blog/what-is-gpt-5-5?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;What is GPT-5.5&lt;/a&gt;. For more free usage options, check the &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 for free guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; lets you run GPT-5.5 on your local repo with a 400K context window.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All ChatGPT plans&lt;/strong&gt; (Free, Go, Plus, Pro, Business, Enterprise, Edu) get Codex; Free and Go are &lt;strong&gt;limited time&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @openai/codex
  &lt;span class="c"&gt;# or&lt;/span&gt;
  brew &lt;span class="nb"&gt;install &lt;/span&gt;codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Authenticate:&lt;/strong&gt; Use ChatGPT OAuth in a browser, or device-code flow on headless servers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch models:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  /model gpt-5.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check quota with &lt;code&gt;/status&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Integrate with &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;:&lt;/strong&gt; Prototype and test API calls before production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Codex is the Easiest Free Path
&lt;/h2&gt;

&lt;p&gt;OpenAI’s API is paid-only by default—GPT-5.5 on the Responses endpoint is $5/million input tokens and $30/million output tokens after general release. Codex sidesteps this: it wraps the same model in a CLI that authenticates via your ChatGPT account, not an API key. Your ChatGPT plan controls rate limits, but you get the real GPT-5.5 model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnuhczptpnd6thkkkxlhc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnuhczptpnd6thkkkxlhc.png" alt="Codex vs API" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Install Codex CLI
&lt;/h2&gt;

&lt;p&gt;Codex CLI supports two install methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# npm (cross-platform)&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @openai/codex

&lt;span class="c"&gt;# or Homebrew (macOS / Linux)&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see version &lt;code&gt;0.28.0&lt;/code&gt; or newer—older versions won’t include GPT-5.5.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3u8pyal71lu6rbud8cet.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3u8pyal71lu6rbud8cet.png" alt="Codex CLI Install" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Authenticate with a ChatGPT Account
&lt;/h2&gt;

&lt;p&gt;On first run, Codex prompts for authentication.&lt;/p&gt;

&lt;h3&gt;
  
  
  Browser OAuth (Local Machines)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A browser window opens; log in with your ChatGPT account. The CLI caches your session for future use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Device Code Flow (Headless Servers)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex login &lt;span class="nt"&gt;--device-auth&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll get a code and URL—open the URL on any device, paste the code, and approve. The CLI on your server completes sign-in automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Key Fallback
&lt;/h3&gt;

&lt;p&gt;If you prefer to use a paid OpenAI API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;printenv &lt;/span&gt;OPENAI_API_KEY | codex login &lt;span class="nt"&gt;--with-api-key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This charges your API billing account instead of ChatGPT. Useful for team billing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pick GPT-5.5 as the Model
&lt;/h2&gt;

&lt;p&gt;Codex defaults to the “recommended” model for your plan. On paid plans, that’s usually &lt;code&gt;gpt-5.5&lt;/code&gt;. Free and Go users may need to switch manually.&lt;/p&gt;

&lt;h3&gt;
  
  
  During a Session
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/model gpt-5.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI displays the active model and your rate limits.&lt;/p&gt;

&lt;h3&gt;
  
  
  Launch Flag
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex &lt;span class="nt"&gt;--model&lt;/span&gt; gpt-5.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Check Remaining Quota
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Displays your weekly message budget, context window, and trial expiration for Free/Go plans.&lt;/p&gt;

&lt;h2&gt;
  
  
  First Session: Practical Example
&lt;/h2&gt;

&lt;p&gt;Codex provides a full-screen terminal UI that reads your repo, runs commands, and edits files. Example workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~/Projects/my-app
codex &lt;span class="nt"&gt;--model&lt;/span&gt; gpt-5.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside Codex, try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Read README.md, then open scripts/deploy.sh and summarize what it does in five bullets.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Codex will summarize the script. Next:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Refactor deploy.sh so it exits on any failed step, and add a dry-run flag. Keep backwards compatibility.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GPT-5.5 proposes a diff; approve to apply changes.&lt;/p&gt;

&lt;p&gt;Then run tests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Run the deploy test suite and show me the failing case.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test output streams into the session. If anything fails, ask Codex to fix and loop until tests pass.&lt;/p&gt;

&lt;p&gt;This workflow matches the model’s strengths. See &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;OpenAI’s launch post&lt;/a&gt; for benchmarks and scores.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Codex Adds Beyond Raw API Calls
&lt;/h2&gt;

&lt;p&gt;The Codex CLI layers these features over the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo context:&lt;/strong&gt; Reads your file tree and indexes relevant files so you don’t have to paste raw content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Command execution with approval:&lt;/strong&gt; You approve all CLI-suggested commands before execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diff previews:&lt;/strong&gt; All file edits show as unified diffs—accept, reject, or edit before committing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session persistence:&lt;/strong&gt; Project sessions persist; pick up where you left off.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building these features over the API alone is non-trivial; the &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 API guide&lt;/a&gt; explains the DIY approach.&lt;/p&gt;
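&lt;p&gt;As a rough illustration of the work involved, here is a minimal sketch of just one of those layers, the command-approval gate. It is a hypothetical Python loop, not Codex’s actual implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess

def approve_and_run(proposed_cmd: str) -&amp;gt; str:
    """Gate a model-proposed shell command behind explicit human approval."""
    print(f"Model wants to run: {proposed_cmd}")
    if input("Approve? [y/N] ").strip().lower() != "y":
        return "(skipped by user)"
    result = subprocess.run(
        proposed_cmd, shell=True, capture_output=True, text=True, timeout=60
    )
    return result.stdout + result.stderr

# Example: a command the model might propose mid-session.
print(approve_and_run("git status --short"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;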

&lt;h2&gt;
  
  
  Rate Limits and Caps per Plan
&lt;/h2&gt;

&lt;p&gt;Caps as of April 23, 2026:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;GPT-5.5 access in Codex&lt;/th&gt;
&lt;th&gt;Weekly cap&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Yes (limited time)&lt;/td&gt;
&lt;td&gt;Tight; enough for prototyping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;Yes (limited time, 2× Free)&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plus&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Mid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;Yes, highest solo caps&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business&lt;/td&gt;
&lt;td&gt;Yes, seat-based&lt;/td&gt;
&lt;td&gt;High per seat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise / Edu&lt;/td&gt;
&lt;td&gt;Yes, contract-based&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you hit your cap, Codex returns a clear error—no silent degradation. Use &lt;code&gt;/status&lt;/code&gt; to check usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Editor and IDE Integration
&lt;/h2&gt;

&lt;p&gt;Codex authentication is shared across the CLI, VS Code extension, JetBrains plugin, and Codex cloud app. Sign in once via CLI; IDE extensions reuse your credentials.&lt;/p&gt;

&lt;p&gt;For &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; users, a practical workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prototype a request in Codex CLI (&lt;code&gt;run the GPT-5.5 prompt against this file&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Export the prompt and output into an &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; collection for team sharing.&lt;/li&gt;
&lt;li&gt;Switch from Codex to direct API calls as your contract stabilizes and you move to paid keys.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;See &lt;a href="http://apidog.com/blog/how-to-use-apidog-inside-vscode?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog in VS Code&lt;/a&gt; for integration details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping the Workflow Safe on Free and Go
&lt;/h2&gt;

&lt;p&gt;Set these guardrails on day one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Require approval for file writes:&lt;/strong&gt; In &lt;code&gt;~/.codex/config.json&lt;/code&gt;, set &lt;code&gt;"autoApproveWrites": false&lt;/code&gt; (one way to script this is sketched after this list). Defaults are safe for Free, but Go plans may auto-apply trivial diffs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limit workspace scope:&lt;/strong&gt; Run &lt;code&gt;codex&lt;/code&gt; from your project directory. Codex reads from the current directory down—avoid running from &lt;code&gt;~&lt;/code&gt; or other broad scopes.&lt;/li&gt;
&lt;/ul&gt;
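&lt;p&gt;If you want the write-approval guardrail pinned in place, a small script can enforce it. The &lt;code&gt;autoApproveWrites&lt;/code&gt; key comes from the list above; treat the exact config shape as an assumption and verify it against your Codex version:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from pathlib import Path

# Pin "autoApproveWrites" to false in ~/.codex/config.json.
# Assumption: this key and file location match your Codex version.
config_path = Path.home() / ".codex" / "config.json"
config = json.loads(config_path.read_text()) if config_path.exists() else {}
config["autoApproveWrites"] = False
config_path.parent.mkdir(exist_ok=True)
config_path.write_text(json.dumps(config, indent=2))
print(f"Wrote {config_path}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;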

&lt;p&gt;OpenAI performed third-party safety reviews and &lt;a href="https://www.cnbc.com/2026/04/23/openai-announces-latest-artificial-intelligence-model.html" rel="noopener noreferrer"&gt;cyber red-teaming&lt;/a&gt; for GPT-5.5, but always review diffs before applying.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Move Off the Free Path
&lt;/h2&gt;

&lt;p&gt;Free and Go access is time-limited. Plan to upgrade if:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You consistently hit quotas:&lt;/strong&gt; Upgrade to Plus or Pro for more capacity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need direct API access:&lt;/strong&gt; See the &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 API guide&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team billing is needed:&lt;/strong&gt; Business or Enterprise plans—see the &lt;a href="http://apidog.com/blog/gpt-5-5-pricing?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;pricing breakdown&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The underlying model remains the same; only the billing and access method change.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Does Codex run GPT-5.5 Pro too?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. Pro isn’t currently exposed in Codex. The CLI uses standard GPT-5.5; Pro is available in the ChatGPT web/app and (eventually) the direct API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use Codex without a ChatGPT account?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. You need either a ChatGPT login or an OpenAI API key. Free access requires ChatGPT authentication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How long will Free and Go access last?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
OpenAI says “limited time.” Expect weeks to a few months—upgrade as usage grows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Codex work offline?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. Every GPT-5.5 operation requires a connection to OpenAI’s servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is Codex different from the ChatGPT web app?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Codex runs in your terminal with local filesystem and repo context, plus shell access. The web app doesn’t have these capabilities.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to Use the GPT-5.5 API for Free</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 24 Apr 2026 02:12:57 +0000</pubDate>
      <link>https://dev.to/hassann/how-to-use-the-gpt-55-api-for-free-59dm</link>
      <guid>https://dev.to/hassann/how-to-use-the-gpt-55-api-for-free-59dm</guid>
      <description>&lt;p&gt;GPT-5.5 launched on April 23, 2026, behind a paywall for most users—Plus, Pro, Business, and Enterprise plans in ChatGPT, and paid API tokens for programmatic use. However, there are three current, verifiable ways to access GPT-5.5 for free. If you can work within rate limits and accept temporary access, you can make actual GPT-5.5 calls without adding payment details.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide covers all free options, maps them to use cases, and shows how to set up a production-ready request collection in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to ensure a smooth transition when you need to move off the free tier.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI on ChatGPT Free or Go&lt;/strong&gt; — Use the Codex command-line tool for free, temporary GPT-5.5 access. No credit card required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI trial credit for new API accounts&lt;/strong&gt; — New accounts get a small starter balance, usable for GPT-5.5 calls once the Responses API is available.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter and aggregator free tiers&lt;/strong&gt; — Some third-party gateways offer free quota for new models shortly after launch.&lt;/li&gt;
&lt;li&gt;All paths have usage caps. For any production workload, switch to paid billing before free access runs out.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Path 1: Codex CLI (the most practical free route)
&lt;/h2&gt;

&lt;p&gt;OpenAI includes Codex with every ChatGPT plan (Free and Go for a limited time). Codex exposes GPT-5.5 using ChatGPT sign-in, not an API key. Sign in with a free account, run the CLI, and interact with GPT-5.5 using a 400K-token window.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @openai/codex
&lt;span class="c"&gt;# or&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Authenticate
&lt;/h3&gt;

&lt;p&gt;First run of &lt;code&gt;codex&lt;/code&gt; opens a browser for ChatGPT OAuth. For headless environments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex login &lt;span class="nt"&gt;--device-auth&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This uses device code authentication—copy the URL and code to another machine to authorize. No API key is needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pick the model
&lt;/h3&gt;

&lt;p&gt;Within Codex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/model gpt-5.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or start Codex directly with the model flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex &lt;span class="nt"&gt;--model&lt;/span&gt; gpt-5.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check your remaining quota with &lt;code&gt;/status&lt;/code&gt;. Free and Go plans have tighter limits than paid, but enough for prototyping.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is available (and what is not)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You get:&lt;/strong&gt; Actual GPT-5.5 model, 400K context, file reads, terminal command execution, repo editing within the CLI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You do not get:&lt;/strong&gt; Direct API access. GPT-5.5 is only accessible through Codex when signed in.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a step-by-step Codex guide, see &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-free-codex?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;our free GPT-5.5 with Codex guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;OpenAI's "limited time" window means Codex access for Free and Go users will end. Make your project flexible by letting the model ID be configurable—swap to a paid plan or API when needed without rewriting code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 2: OpenAI trial credit for new API accounts
&lt;/h2&gt;

&lt;p&gt;New OpenAI developer accounts typically receive a small trial credit. The amount can vary; historically it has been $5 for 90 days, with more for .edu emails. When the GPT-5.5 API is generally available, you can use this trial balance for real calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to claim trial credit
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create a new developer account at &lt;code&gt;platform.openai.com&lt;/code&gt; using a fresh email (previous billing history may disqualify you).&lt;/li&gt;
&lt;li&gt;Verify your phone number. This is required for the trial.&lt;/li&gt;
&lt;li&gt;Create a project-scoped API key under the trial organization.&lt;/li&gt;
&lt;li&gt;Check the usage dashboard for your credit amount and expiry date.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  GPT-5.5 usage with trial credit
&lt;/h3&gt;

&lt;p&gt;With $5 of credit and current pricing ($5/M input, $30/M output tokens), you get about 1M input tokens or roughly 167K output tokens from GPT-5.5. That is enough to prototype, benchmark, or validate workflows, but not enough for production.&lt;/p&gt;
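&lt;p&gt;The arithmetic is worth scripting so you can re-check it as prices change; a quick back-of-the-envelope:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Back-of-the-envelope trial budget at the listed GPT-5.5 prices.
credit_usd = 5.00
input_price_per_m = 5.00    # dollars per 1M input tokens
output_price_per_m = 30.00  # dollars per 1M output tokens

print(f"All-input budget:  {credit_usd / input_price_per_m * 1_000_000:,.0f} tokens")
print(f"All-output budget: {credit_usd / output_price_per_m * 1_000_000:,.0f} tokens")
# Prints 1,000,000 and 166,667 respectively.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;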

&lt;h4&gt;
  
  
  Save costs during trial
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Batch mode:&lt;/strong&gt; Batch API requests cost 50% less. Great for workflows not requiring instant responses. See &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI API pricing&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep &lt;code&gt;reasoning.effort&lt;/code&gt; at &lt;code&gt;low&lt;/code&gt;:&lt;/strong&gt; It is already the default (same as GPT-5.4), and higher values consume more tokens; see the sketch after this list.&lt;/li&gt;
&lt;/ul&gt;
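&lt;p&gt;A credit-conserving Responses call might look like the sketch below. It borrows the request shape from our GPT-5.5 API guide, so treat the exact fields as assumptions until the API is GA:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Conserve trial credit: low reasoning effort plus a tight output cap.
response = client.responses.create(
    model="gpt-5.5",
    input="List three risks of running database migrations during peak traffic.",
    reasoning={"effort": "low"},
    max_output_tokens=300,
)
print(response.output_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;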

&lt;h3&gt;
  
  
  Important limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Trial credit is one-time. When used up, calls return 402. You cannot get a second trial on the same payment or phone credentials.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Path 3: Aggregator free tiers
&lt;/h2&gt;

&lt;p&gt;Third-party gateways (OpenRouter, Together, Groq) sometimes offer free quota for new models. These offers are temporary—check availability before relying on them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Typical setup pattern
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Sign up and verify your email with the aggregator.&lt;/li&gt;
&lt;li&gt;Obtain an aggregator API key.&lt;/li&gt;
&lt;li&gt;Change your SDK’s base URL to point to the aggregator.&lt;/li&gt;
&lt;li&gt;Use the aggregator’s model alias, e.g., &lt;code&gt;openai/gpt-5.5&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example with OpenRouter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://openrouter.ai/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-or-v1-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain the Responses API in two paragraphs.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Caveats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggregators have their own rate limits.&lt;/li&gt;
&lt;li&gt;Free quota is shared and can disappear fast.&lt;/li&gt;
&lt;li&gt;When free access ends, requests return 402 or 429; the fallback sketch after this list handles both. Use these tiers for prototyping only.&lt;/li&gt;
&lt;/ul&gt;
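&lt;p&gt;A hedged fallback pattern for those failure codes, assuming you hold both an aggregator key and a paid OpenAI key:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI, APIStatusError

free = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-v1-...")
paid = OpenAI()  # defaults to api.openai.com with OPENAI_API_KEY

def ask(prompt: str) -&amp;gt; str:
    """Try the aggregator's free tier first; fall back to paid on 402/429."""
    try:
        reply = free.chat.completions.create(
            model="openai/gpt-5.5",
            messages=[{"role": "user", "content": prompt}],
        )
    except APIStatusError as err:
        if err.status_code not in (402, 429):
            raise
        reply = paid.chat.completions.create(
            model="gpt-5.5",
            messages=[{"role": "user", "content": prompt}],
        )
    return reply.choices[0].message.content

print(ask("One-line summary of the Responses API."))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;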

&lt;h2&gt;
  
  
  Which free path should you pick?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Best free path&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Terminal-based coding assistant&lt;/td&gt;
&lt;td&gt;Codex CLI (Path 1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quick Python or Node experiments&lt;/td&gt;
&lt;td&gt;Trial credit (Path 2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testing from a hosted app&lt;/td&gt;
&lt;td&gt;Aggregator (Path 3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comparing GPT-5.5 against GPT-5.4 on real prompts&lt;/td&gt;
&lt;td&gt;Trial credit + Apidog collection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One-off “can this answer my question” research&lt;/td&gt;
&lt;td&gt;ChatGPT Plus (not free, but cheapest per hour)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For anything beyond a prototype, these free paths run out quickly. Use them to learn request shapes and tune prompts before switching to paid tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pre-build the request shape in Apidog
&lt;/h2&gt;

&lt;p&gt;To avoid rewrites when moving from free to paid, build your request in a version-controlled Apidog collection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F04%2Fimage-213.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F04%2Fimage-213.png" alt="" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Apidog:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new collection and add a &lt;code&gt;POST https://api.openai.com/v1/responses&lt;/code&gt; request.&lt;/li&gt;
&lt;li&gt;Set the auth header from an environment variable to swap keys without editing requests.&lt;/li&gt;
&lt;li&gt;Save an example response so downstream developers can work with mocks when keys are missing.&lt;/li&gt;
&lt;li&gt;Clone the collection for aggregator testing: point &lt;code&gt;baseUrl&lt;/code&gt; at OpenRouter and change the model string.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When trial credit expires or you upgrade, just update the environment variable—no other changes required. For VS Code integration, see our &lt;a href="http://apidog.com/blog/how-to-use-apidog-inside-vscode?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog in VS Code walkthrough&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free-path limitations to plan around
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rate limits vary by load:&lt;/strong&gt; Codex Free/Go slows down during peak hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trial credits do not stack:&lt;/strong&gt; Duplicate accounts (same card/phone/IP) do not get multiple trials.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5 Pro is always paid:&lt;/strong&gt; No free access to Pro.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thinking mode uses more quota:&lt;/strong&gt; Keep &lt;code&gt;reasoning.effort&lt;/code&gt; at &lt;code&gt;low&lt;/code&gt; unless you need accuracy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free windows are temporary:&lt;/strong&gt; Codex Free/Go access is time-limited (&lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;OpenAI launch announcement&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A realistic free-tier prototype workflow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Pick a real team task: report, code review, research brief, etc.&lt;/li&gt;
&lt;li&gt;Run 10 examples through GPT-5.4 (existing tools), log output quality.&lt;/li&gt;
&lt;li&gt;Run the same 10 examples through GPT-5.5 (Codex CLI or trial credit).&lt;/li&gt;
&lt;li&gt;Compare output-per-token and error rate.&lt;/li&gt;
&lt;li&gt;Decide if the upgrade justifies the extra cost for your workload.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This can be done in an afternoon and will inform your production choice before you commit to paid usage.&lt;/p&gt;
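&lt;p&gt;If you would rather script the comparison than eyeball it, a minimal harness is enough. The prompts below are placeholders for your own ten examples, and quality scoring stays manual:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI()
prompts = ["Summarize this PR: ...", "Review this migration: ..."]  # your real examples

for model in ("gpt-5.4", "gpt-5.5"):
    total_output = 0
    for prompt in prompts:
        r = client.responses.create(model=model, input=prompt, max_output_tokens=500)
        total_output += r.usage.output_tokens
        print(f"[{model}] {r.output_text[:80]}")  # log full outputs for manual review
    print(f"{model}: {total_output} output tokens over {len(prompts)} prompts")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;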

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is Codex Free/Go trial permanent?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;OpenAI calls it “limited time.”&lt;/a&gt; Expect it to end within months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does ChatGPT Free provide GPT-5.5 in-browser?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. Free ChatGPT uses GPT-5.3. GPT-5.5 requires Plus or higher.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I run GPT-5.5 for free via Hugging Face or Ollama?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. GPT-5.5 is closed-weight; only available via OpenAI or Codex sign-in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is there a student discount?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
OpenAI has offered .edu email perks before. Check the &lt;a href="https://openai.com/education/" rel="noopener noreferrer"&gt;OpenAI education page&lt;/a&gt; for current offers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I switch from free to paid without rewriting code?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Use environment variables for &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; and &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt;. When your trial ends, just change these. See our &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 API guide&lt;/a&gt; for best practices.&lt;/p&gt;
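&lt;p&gt;As a sketch, code like the following survives the free-to-paid switch untouched, since the official SDK reads both variables automatically:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
from openai import OpenAI

# No hard-coded key or URL: the SDK picks up OPENAI_API_KEY and OPENAI_BASE_URL.
client = OpenAI()

reply = client.chat.completions.create(
    model=os.environ.get("MODEL_ID", "gpt-5.5"),  # keep the model ID configurable too
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;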

</description>
    </item>
    <item>
      <title>How to Use the GPT-5.5 API</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 24 Apr 2026 02:00:21 +0000</pubDate>
      <link>https://dev.to/hassann/how-to-use-the-gpt-55-api-55kd</link>
      <guid>https://dev.to/hassann/how-to-use-the-gpt-55-api-55kd</guid>
      <description>&lt;p&gt;GPT-5.5 launched on April 23, 2026. OpenAI immediately opened the model for ChatGPT and Codex, with Responses and Chat Completions APIs coming “very soon.” This guide covers both: how to call GPT-5.5 as soon as API keys work, and how to access it today via the Codex sign-in path.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This article includes endpoint shapes, authentication, Python and Node examples, the parameter table, pricing breakdown, error handling, and a testing workflow in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to help you save credits while iterating.&lt;/p&gt;

&lt;p&gt;For a product overview, see &lt;a href="http://apidog.com/blog/what-is-gpt-5-5?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;What is GPT-5.5&lt;/a&gt;. For a free-tier guide, see &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use GPT-5.5 API for free&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;GPT-5.5 is available via &lt;strong&gt;Responses&lt;/strong&gt; and &lt;strong&gt;Chat Completions&lt;/strong&gt; endpoints. Model IDs: &lt;code&gt;gpt-5.5&lt;/code&gt; and &lt;code&gt;gpt-5.5-pro&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;API pricing: &lt;strong&gt;$5 / M input&lt;/strong&gt;, &lt;strong&gt;$30 / M output&lt;/strong&gt;; Pro: &lt;strong&gt;$30 / M input&lt;/strong&gt;, &lt;strong&gt;$180 / M output&lt;/strong&gt;.&lt;/li&gt;
  &lt;li&gt;Context window: &lt;strong&gt;1M tokens&lt;/strong&gt; (API), &lt;strong&gt;400K&lt;/strong&gt; (Codex CLI).&lt;/li&gt;
  &lt;li&gt;Until API GA, access GPT-5.5 via Codex with ChatGPT sign-in.&lt;/li&gt;
  &lt;li&gt;Use &lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to pre-build collections; request shape matches GPT-5.4 with new model ID and expanded &lt;code&gt;reasoning&lt;/code&gt; block.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id="prerequisites"&gt;Prerequisites&lt;/h2&gt;

&lt;p&gt;Before making your first call, ensure:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;OpenAI developer account&lt;/strong&gt; with a billable tier. ChatGPT Plus/Pro is separate from API billing; to use both the UI and the API, you need both subscriptions.&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;API key&lt;/strong&gt; with GPT-5 access. Prefer project-scoped keys for production workloads.&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;SDK version supporting &lt;code&gt;gpt-5.5&lt;/code&gt;&lt;/strong&gt;: Python &lt;code&gt;openai&amp;gt;=2.1.0&lt;/code&gt;, Node &lt;code&gt;openai@5.1.0&lt;/code&gt; or newer.&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;API client&lt;/strong&gt; for easy request replay. Use curl for one-off, then switch to Apidog or similar for iteration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Export your API key:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-proj-..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2 id="endpoint-and-authentication"&gt;Endpoint and authentication&lt;/h2&gt;

&lt;p&gt;GPT-5.5 uses the same endpoints as GPT-5:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.openai.com/v1/responses
POST https://api.openai.com/v1/chat/completions
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Responses API is tool-aware (it supports thinking mode, web search, and computer use). Chat Completions maintains compatibility with legacy integrations.&lt;/p&gt;

&lt;p&gt;Authenticate using a bearer token. Every request sends a JSON body with model ID, prompt/message array, and additional parameters as needed.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.openai.com/v1/responses &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$OPENAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gpt-5.5",
    "input": "Summarize the last 10 releases of the openai/codex repo in three bullets.",
    "reasoning": { "effort": "medium" }
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Successful calls return a JSON object with an &lt;code&gt;output&lt;/code&gt; array and a &lt;code&gt;usage&lt;/code&gt; block (input, output, reasoning tokens). Errors return a standard OpenAI envelope with &lt;code&gt;code&lt;/code&gt; and &lt;code&gt;message&lt;/code&gt;; see the error table below.&lt;/p&gt;

&lt;h2 id="request-parameters"&gt;Request parameters&lt;/h2&gt;

&lt;p&gt;Here’s a full map of &lt;code&gt;gpt-5.5&lt;/code&gt; parameters and their effects:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Values&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;gpt-5.5&lt;/code&gt;, &lt;code&gt;gpt-5.5-pro&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Required. Pro is 6× the cost.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;input&lt;/code&gt; / &lt;code&gt;messages&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;string or array&lt;/td&gt;
&lt;td&gt;Prompt or chat array&lt;/td&gt;
&lt;td&gt;Required. Use &lt;code&gt;input&lt;/code&gt; for Responses, &lt;code&gt;messages&lt;/code&gt; for Chat Completions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;reasoning.effort&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;none&lt;/code&gt;, &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, &lt;code&gt;xhigh&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Default: &lt;code&gt;low&lt;/code&gt;. &lt;code&gt;xhigh&lt;/code&gt; = max depth, higher cost.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max_output_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;integer&lt;/td&gt;
&lt;td&gt;1 – 128000&lt;/td&gt;
&lt;td&gt;Output cap, excludes reasoning tokens.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tools&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;array&lt;/td&gt;
&lt;td&gt;Function, web_search, file_search, computer_use, code_interpreter&lt;/td&gt;
&lt;td&gt;Define available tools. Model chains them as needed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tool_choice&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string/object&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;auto&lt;/code&gt;, &lt;code&gt;none&lt;/code&gt;, or a specific tool&lt;/td&gt;
&lt;td&gt;Force specific tool usage.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;response_format&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ "type": "json_schema", "schema": {...} }&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Structured output. Strict mode default.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stream&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;boolean&lt;/td&gt;
&lt;td&gt;true / false&lt;/td&gt;
&lt;td&gt;Server-sent events; reasoning tokens streamed separately.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;user&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;Free-form&lt;/td&gt;
&lt;td&gt;Helps abuse detection. Pass a hashed user ID.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;metadata&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;td&gt;Up to 16 key-value pairs&lt;/td&gt;
&lt;td&gt;Visible in OpenAI dashboard/logs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;seed&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;integer&lt;/td&gt;
&lt;td&gt;Any int32&lt;/td&gt;
&lt;td&gt;Soft determinism; output is similar for same prompt + seed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;temperature&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;td&gt;0 – 2&lt;/td&gt;
&lt;td&gt;Ignored if &lt;code&gt;reasoning.effort &amp;gt;= medium&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Parameters most affecting cost: &lt;code&gt;reasoning.effort&lt;/code&gt;, &lt;code&gt;max_output_tokens&lt;/code&gt;, and &lt;code&gt;tools&lt;/code&gt;. High or xhigh &lt;code&gt;reasoning.effort&lt;/code&gt; can increase output tokens 3–8× compared to low.&lt;/p&gt;

&lt;h2 id="python-example"&gt;Python example&lt;/h2&gt;

&lt;p&gt;SDK usage mirrors GPT-5.4; update the model ID and use the expanded &lt;code&gt;reasoning.effort&lt;/code&gt; range.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior Go engineer. Answer in terse, runnable code.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a worker pool with bounded concurrency and a context &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancellation path. No third-party deps.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;max_output_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
  &lt;li&gt;
&lt;code&gt;response.output_text&lt;/code&gt; flattens the output array. For structured events (tool calls, citations, etc.), use &lt;code&gt;response.output&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;
&lt;code&gt;usage&lt;/code&gt; contains &lt;code&gt;input_tokens&lt;/code&gt;, &lt;code&gt;output_tokens&lt;/code&gt;, and &lt;code&gt;reasoning_tokens&lt;/code&gt;. You are billed against all three; a cost sketch follows this list.&lt;/li&gt;
&lt;/ul&gt;
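&lt;p&gt;A quick cost estimate from that block, assuming reasoning tokens bill at the output rate (verify against your invoice):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI()
response = client.responses.create(model="gpt-5.5", input="ping", max_output_tokens=50)

def estimate_cost_usd(usage) -&amp;gt; float:
    """Rough GPT-5.5 cost at $5/M input and $30/M output.

    Assumption: reasoning tokens bill at the output rate; confirm on your invoice.
    """
    input_cost = usage.input_tokens * 5 / 1_000_000
    output_cost = (usage.output_tokens + usage.reasoning_tokens) * 30 / 1_000_000
    return input_cost + output_cost

print(f"~${estimate_cost_usd(response.usage):.4f} for this call")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;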

&lt;h2 id="node-example"&gt;Node example&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5.5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a careful reviewer.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Review this migration and flag any operation that would lock a write-heavy table for more than 200 ms.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;effort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;file_search&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="na"&gt;max_output_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set &lt;code&gt;reasoning.effort&lt;/code&gt; to &lt;code&gt;high&lt;/code&gt; for review tasks where correctness outweighs cost.&lt;/p&gt;

&lt;h2 id="thinking-mode"&gt;Thinking mode&lt;/h2&gt;

&lt;p&gt;Thinking mode uses &lt;code&gt;reasoning.effort&lt;/code&gt; set to &lt;code&gt;high&lt;/code&gt; or &lt;code&gt;xhigh&lt;/code&gt; with a higher &lt;code&gt;max_output_tokens&lt;/code&gt;. There’s no special model ID—just adjust these parameters per request.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;medium&lt;/code&gt;&lt;/strong&gt; for most tasks (agentic work, multi-file debugging, doc generation). Costs remain close to GPT-5.4.&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;high&lt;/code&gt;/&lt;code&gt;xhigh&lt;/code&gt;&lt;/strong&gt; for research, correctness-critical tasks, and long tool chains. Budget for 3–8× output tokens and longer response times.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If using &lt;code&gt;computer_use&lt;/code&gt; or long web-search chains, higher effort reduces hallucinations (see OpenAI’s &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;launch post&lt;/a&gt;).&lt;/p&gt;

&lt;h2 id="structured-output"&gt;Structured output&lt;/h2&gt;

&lt;p&gt;Strict JSON output is the default. Pass a schema to the SDK to guarantee the JSON structure.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract the title, speaker, and start time from this transcript chunk.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_extract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speaker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speaker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date-time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For pipelines that feed downstream code, always set a schema. This prevents malformed output and eliminates manual retry logic.&lt;/p&gt;
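&lt;p&gt;Downstream parsing then becomes trivial. A short sketch continuing the example above, against the &lt;code&gt;session_extract&lt;/code&gt; schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

# With strict mode, output_text is guaranteed to match the declared schema.
session = json.loads(response.output_text)
print(session["title"], session["speaker"], session["start_time"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;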

&lt;h2 id="tool-use-and-agents"&gt;Tool use and agents&lt;/h2&gt;

&lt;p&gt;The Responses API exposes five first-party tool types:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;code&gt;web_search&lt;/code&gt;: real-time search with citations&lt;/li&gt;
  &lt;li&gt;
&lt;code&gt;file_search&lt;/code&gt;: vector search over uploaded files&lt;/li&gt;
  &lt;li&gt;
&lt;code&gt;code_interpreter&lt;/code&gt;: sandboxed Python&lt;/li&gt;
  &lt;li&gt;
&lt;code&gt;computer_use&lt;/code&gt;: mouse, keyboard, and browser via Operator stack&lt;/li&gt;
  &lt;li&gt;
&lt;code&gt;function&lt;/code&gt;: custom callbacks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GPT-5.5 chains tools more effectively than 5.4. In tests like &lt;a href="https://the-decoder.com/openai-unveils-gpt-5-5-claims-a-new-class-of-intelligence-at-double-the-api-price/" rel="noopener noreferrer"&gt;The Decoder’s&lt;/a&gt;, 5.5 completed 11% more multi-step tool chains without user intervention.&lt;/p&gt;
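&lt;p&gt;Custom &lt;code&gt;function&lt;/code&gt; tools follow the same shape as GPT-5.4. A sketch; the &lt;code&gt;get_release_notes&lt;/code&gt; function is hypothetical, a callback you would implement yourself:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "name": "get_release_notes",  # hypothetical callback you implement yourself
    "description": "Fetch release notes for a GitHub repo tag.",
    "parameters": {
        "type": "object",
        "required": ["repo", "tag"],
        "properties": {"repo": {"type": "string"}, "tag": {"type": "string"}},
    },
}]

response = client.responses.create(
    model="gpt-5.5",
    input="What changed in openai/codex v0.28.0?",
    tools=tools,
)

# Walk the structured output for tool calls the model wants executed.
for item in response.output:
    if item.type == "function_call":
        print(item.name, item.arguments)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;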

&lt;h2 id="error-handling-and-retries"&gt;Error handling and retries&lt;/h2&gt;

&lt;p&gt;Handle these common error codes explicitly:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Code&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Retry?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;429 rate_limit_exceeded&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rate cap hit.&lt;/td&gt;
&lt;td&gt;Yes, use exponential backoff + jitter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;400 context_length_exceeded&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Input + output + reasoning &amp;gt; 1M tokens.&lt;/td&gt;
&lt;td&gt;No; shorten input.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;500 server_error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI server error.&lt;/td&gt;
&lt;td&gt;Yes, up to 3 attempts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;403 policy_violation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Safety refusal.&lt;/td&gt;
&lt;td&gt;No; rewrite prompt.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Reasoning tokens count toward the context window. For example, &lt;code&gt;reasoning.effort: "xhigh"&lt;/code&gt; on a 900K-token input can trigger a context overflow.&lt;/p&gt;
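&lt;p&gt;A retry wrapper matching the table, as a sketch: exponential backoff with jitter for 429, up to three attempts for 5xx, and no retry otherwise:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random
import time

from openai import OpenAI, APIStatusError

client = OpenAI()

def call_with_retries(max_attempts: int = 3, **kwargs):
    for attempt in range(1, max_attempts + 1):
        try:
            return client.responses.create(**kwargs)
        except APIStatusError as err:
            retryable = err.status_code == 429 or err.status_code &amp;gt;= 500
            if not retryable or attempt == max_attempts:
                raise
            # Exponential backoff with jitter: 1s, 2s, 4s, plus up to 1s of noise.
            time.sleep(2 ** (attempt - 1) + random.random())

response = call_with_retries(model="gpt-5.5", input="Say hi.", max_output_tokens=50)
print(response.output_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;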

&lt;h2 id="testing-workflow-with-apidog"&gt;Testing workflow with Apidog&lt;/h2&gt;

&lt;p&gt;Due to GPT-5.5’s cost, avoid burning tokens with repeated trial runs. Recommended workflow:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Build the request in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;, save it in a collection, and tag the environment (dev/staging/prod).&lt;/li&gt;
  &lt;li&gt;Use Apidog’s mock server to replay the last real response while refining downstream code.&lt;/li&gt;
  &lt;li&gt;Switch to a live key only when your schema and logic are stable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Apidog also integrates with Claude Code and Cursor, so you can access collections directly from your editor. See the &lt;a href="http://apidog.com/blog/how-to-use-apidog-inside-vscode?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;VS Code walkthrough&lt;/a&gt; and &lt;a href="http://apidog.com/blog/api-testing-without-postman-2026?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog vs. Postman comparison&lt;/a&gt; for setup instructions.&lt;/p&gt;

&lt;h2 id="calling-gpt-55-before-the-api-is-general"&gt;Calling GPT-5.5 before the API is general&lt;/h2&gt;

&lt;p&gt;Until OpenAI’s Responses API is fully available, use the Codex sign-in flow for early access. The &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-free-codex?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Codex free guide&lt;/a&gt; explains how to install the CLI, authenticate with ChatGPT, and select the model.&lt;/p&gt;

&lt;h2 id="faq"&gt;FAQ&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is there a &lt;code&gt;gpt-5.5-mini&lt;/code&gt;?&lt;/strong&gt; Not at launch. &lt;code&gt;gpt-5.4-mini&lt;/code&gt; remains the cost-optimized option.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context window size?&lt;/strong&gt; 1M tokens (API), 400K (Codex CLI). Both count reasoning tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need to rewrite GPT-5.4 code?&lt;/strong&gt; No. Swap the model ID, adjust &lt;code&gt;max_output_tokens&lt;/code&gt; if needed, and tune &lt;code&gt;reasoning.effort&lt;/code&gt; as appropriate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to reduce cost?&lt;/strong&gt; Options: Batch (50% off), Flex (50% off with slower queue), and strict schemas to avoid retries. See the &lt;a href="http://apidog.com/blog/gpt-5-5-pricing?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 pricing breakdown&lt;/a&gt; for details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where to get API GA updates?&lt;/strong&gt; Watch the &lt;a href="https://community.openai.com/" rel="noopener noreferrer"&gt;OpenAI developer community&lt;/a&gt; and &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI API pricing page&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>What Is GPT-5.5? OpenAI's New Frontier Model Explained</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 24 Apr 2026 01:46:22 +0000</pubDate>
      <link>https://dev.to/hassann/what-is-gpt-55-openais-new-frontier-model-explained-1e9p</link>
      <guid>https://dev.to/hassann/what-is-gpt-55-openais-new-frontier-model-explained-1e9p</guid>
      <description>&lt;p&gt;OpenAI shipped GPT-5.5 on April 23, 2026, just six weeks after GPT-5.4. Marketed as “a new class of intelligence for real work,” GPT-5.5 is a frontier model designed for multi-step coding, advanced computer use, and deep research. It’s available today in ChatGPT and Codex, with API access launching soon.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;If you’re evaluating whether GPT-5.5 is worth adopting, this guide focuses on what’s new, how it differs from 5.4, key benchmarks, actionable usage steps, and practical caveats.&lt;/p&gt;

&lt;p&gt;For hands-on guides, see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 API guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Free-access guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-free-codex?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Codex free path&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://apidog.com/blog/gpt-5-5-pricing?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 pricing breakdown&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To prepare for the API launch, get &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; and pre-build your collection.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5&lt;/strong&gt;: OpenAI’s advanced coding/reasoning model, released April 23, 2026.&lt;/li&gt;
&lt;li&gt;Scores &lt;strong&gt;88.7% on SWE-bench&lt;/strong&gt; and &lt;strong&gt;92.4% on MMLU&lt;/strong&gt;; &lt;strong&gt;60% fewer hallucinations&lt;/strong&gt; vs. 5.4.&lt;/li&gt;
&lt;li&gt;Three variants: &lt;strong&gt;GPT-5.5 standard&lt;/strong&gt;, &lt;strong&gt;GPT-5.5 Thinking&lt;/strong&gt; (extended reasoning), &lt;strong&gt;GPT-5.5 Pro&lt;/strong&gt; (highest accuracy).&lt;/li&gt;
&lt;li&gt;Available now in &lt;strong&gt;ChatGPT Plus, Pro, Business, Enterprise, Edu&lt;/strong&gt; and &lt;strong&gt;Codex&lt;/strong&gt; across all plans (including a temporary free window for Free and Go).&lt;/li&gt;
&lt;li&gt;API access is staged—developers can use it via the &lt;strong&gt;Codex sign-in path&lt;/strong&gt; now; the full API is rolling out soon.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API pricing&lt;/strong&gt;: $5/M input tokens, $30/M output tokens—double GPT-5.4, but more token-efficient.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What GPT-5.5 Actually Is
&lt;/h2&gt;

&lt;p&gt;GPT-5.5 sits at the top of the GPT-5 family, above GPT-5.4, 5.4-mini, and 5.3. Its internal codename was “Spud”; the official name is GPT-5.5.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdz76bgxfqjme3i4hca84.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdz76bgxfqjme3i4hca84.png" alt="GPT-5.5 model diagram" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Variants:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5 (default):&lt;/strong&gt; Faster, sharper, and more token-efficient for most tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5 Thinking:&lt;/strong&gt; Same model but with a larger reasoning budget—ideal for complex spreadsheets, dense research, and multi-file debugging. Capped at ~3,000 messages/week in ChatGPT.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5 Pro:&lt;/strong&gt; Highest-accuracy; for correctness-critical work. Only on Pro, Business, and Enterprise plans.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key features:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Trained to plan, select tools, and self-check outputs. Expect fewer prompts, more accurate tables, and more clarifying questions instead of hallucinations.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed from GPT-5.4
&lt;/h2&gt;

&lt;p&gt;The six-week interval means targeted upgrades rather than a generational leap. Here’s the practical diff:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;GPT-5.4&lt;/th&gt;
&lt;th&gt;GPT-5.5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-bench&lt;/td&gt;
&lt;td&gt;~74%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;88.7%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MMLU&lt;/td&gt;
&lt;td&gt;91.1%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;92.4%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hallucination rate&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−60 %&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window (API)&lt;/td&gt;
&lt;td&gt;1.05M&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1M&lt;/strong&gt; (Codex: 400K)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API input price&lt;/td&gt;
&lt;td&gt;$2.50/M&lt;/td&gt;
&lt;td&gt;$5.00/M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API output price&lt;/td&gt;
&lt;td&gt;$15.00/M&lt;/td&gt;
&lt;td&gt;$30.00/M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Computer use&lt;/td&gt;
&lt;td&gt;Improving&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Production-grade&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-step tool chains&lt;/td&gt;
&lt;td&gt;Single-shot&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Full autonomous loops&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;SWE-bench&lt;/strong&gt; is the headline metric. GPT-5.5’s 88.7% means it closes GitHub issues at a senior engineering level (per OpenAI). Test it on your codebase for real-world validation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing&lt;/strong&gt; doubled, but token efficiency improves. Independent tests (&lt;a href="https://the-decoder.com/openai-unveils-gpt-5-5-claims-a-new-class-of-intelligence-at-double-the-api-price/" rel="noopener noreferrer"&gt;The Decoder&lt;/a&gt;) show net cost rises ~20% overall, less on short-prompt workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Is Good At
&lt;/h2&gt;

&lt;p&gt;OpenAI targets four use cases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Agentic coding:&lt;/strong&gt; Repo reading, file ops, running tests, and iterative development—the capability the SWE-bench score measures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Computer use:&lt;/strong&gt; Drives browsers and shells, fills forms, scrapes data, and recovers from intermediate errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep research:&lt;/strong&gt; Longer reasoning chains, better web search, and improved summarization. “Thinking” variant is optimized for this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document/spreadsheet generation:&lt;/strong&gt; Fewer layout errors, correct formulas, cleaner slides. Available in ChatGPT Plus/Business.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Not ideal for:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Cheap, high-volume tasks (classification, embeddings, bulk summarization)—use GPT-5.4-mini or 5.3 for better cost efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Availability Today
&lt;/h2&gt;

&lt;p&gt;Here’s the access snapshot (as of April 23, 2026):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Surface&lt;/th&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Access&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;td&gt;Free / Go&lt;/td&gt;
&lt;td&gt;GPT-5.3 default, no GPT-5.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;td&gt;Plus&lt;/td&gt;
&lt;td&gt;GPT-5.5 standard + Thinking (3,000/week)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;td&gt;Pro / Business / Enterprise / Edu&lt;/td&gt;
&lt;td&gt;Standard + Thinking + Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;All plans (incl. Free/Go)&lt;/td&gt;
&lt;td&gt;GPT-5.5 with 400K context; Free/Go on limited-time trial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Responses / Chat Completions&lt;/td&gt;
&lt;td&gt;“Very soon”; not GA at launch&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Codex&lt;/strong&gt; is the actionable path: GPT-5.5 is live in Codex, so you can use it from the CLI today—no API keys required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing in One Line
&lt;/h2&gt;

&lt;p&gt;Budgeting? Here’s what to expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5 API:&lt;/strong&gt; $5/M input, $30/M output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.5 Pro API:&lt;/strong&gt; $30/M input, $180/M output (same as 5.4 Pro).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch/Flex:&lt;/strong&gt; Half standard rate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Priority:&lt;/strong&gt; 2.5× standard rate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI:&lt;/strong&gt; Included with Plus, Pro, Business, Enterprise, Edu, and Go, and temporarily with the Free plan (plan caps apply).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full details and per-workload estimates in the &lt;a href="http://apidog.com/blog/gpt-5-5-pricing?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 pricing article&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How To Try It Today
&lt;/h2&gt;

&lt;p&gt;Get started with minimal friction:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Open ChatGPT (paid plan):&lt;/strong&gt;
Select GPT-5.5 from the model picker.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install Codex CLI:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @openai/codex
   &lt;span class="c"&gt;# or&lt;/span&gt;
   brew &lt;span class="nb"&gt;install &lt;/span&gt;codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run &lt;code&gt;codex&lt;/code&gt; and sign in with your ChatGPT account. Use &lt;code&gt;/model gpt-5.5&lt;/code&gt; to switch. Free/Go plans are included for a limited time.&lt;br&gt;&lt;br&gt;
   Full walkthrough: &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-free-codex?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use GPT-5.5 for free with Codex&lt;/a&gt;&lt;/p&gt;
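&lt;p&gt;In the terminal, the session looks roughly like this (the &lt;code&gt;/model&lt;/code&gt; command comes from the walkthrough above; the prompt is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex   # launch the interactive session, then sign in with ChatGPT
# inside the session:
#   /model gpt-5.5
#   Fix the failing tests in packages/api and explain the root cause.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;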

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Pre-build for API rollout:&lt;/strong&gt;
The API is coming soon. Build your request collection now in &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; so you’re ready when the model ID goes live.
See the &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 API guide&lt;/a&gt; for endpoint specs.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Safety and Red-Teaming
&lt;/h2&gt;

&lt;p&gt;OpenAI ran GPT-5.5 through third-party cyber and bio risk testing ahead of launch, and safeguards—especially around offensive-security tasks—tighten with each release.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For developers:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expect stricter refusal behavior on dual-use code.&lt;/li&gt;
&lt;li&gt;The API rollout is staged while new safeguards are finalized.&lt;/li&gt;
&lt;li&gt;If you’re building consumer-facing agents, plan for more restrictive default policies than 5.4.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Should You Switch?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Decision matrix:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coding agents:&lt;/strong&gt; Switch now. SWE-bench gains stack up, and Codex access is live.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-volume inference:&lt;/strong&gt; Stick with GPT-5.4-mini for defaults; use GPT-5.5 only for hard cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumer-facing products:&lt;/strong&gt; Wait for API GA, then A/B test. The price jump is real; only switch if hallucination reduction matters for your users.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full decision flow: &lt;a href="http://apidog.com/blog/gpt-5-5-pricing?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;pricing breakdown&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is GPT-5.5 available on the API?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Not yet for direct key-based calls (as of April 23, 2026). It’s live in Codex after ChatGPT sign-in, so early testers have access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the context window?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
1M tokens in ChatGPT and (soon) the API; 400K tokens in Codex CLI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do Thinking and Pro differ?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Thinking&lt;/em&gt; extends the reasoning budget on the standard model; &lt;em&gt;Pro&lt;/em&gt; is a separate, higher-accuracy variant for critical tasks. Pro is available on Pro, Business, and Enterprise only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is GPT-5.5 free?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No, only paid ChatGPT plans. Codex offers temporary free access for Free and Go plans (rate limits apply). See the &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;free guide&lt;/a&gt; for no-cost usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I still use GPT-5.4?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. GPT-5.4 is still available and cheaper ($2.50/$15 per million tokens). For cost-sensitive pipelines, keep it as your default.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to Use the Hy3 Preview API for Free ?</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Thu, 23 Apr 2026 10:41:35 +0000</pubDate>
      <link>https://dev.to/hassann/how-to-use-the-hy3-preview-api-for-free--25on</link>
      <guid>https://dev.to/hassann/how-to-use-the-hy3-preview-api-for-free--25on</guid>
      <description>&lt;p&gt;Tencent open-sourced Hy3 Preview on April 22, 2026. Within a day, OpenRouter listed it as a fully free endpoint—no credit card, no metering, no trial window. You can call the same 295B-parameter Mixture-of-Experts model powering Tencent’s Yuanbao app and CodeBuddy assistant from your own code, for $0.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide shows you how to use the Hy3 Preview API for free via OpenRouter, the Hugging Face Space, or by self-hosting the open weights. It covers Hy3’s reasoning modes and shows how to test the API in Apidog—no throwaway scripts required.&lt;/p&gt;

&lt;p&gt;If you want the fastest way to a working response, jump to “Step-by-step: call Hy3 Preview free on OpenRouter.”&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hy3 Preview is free on OpenRouter&lt;/strong&gt; under model ID &lt;code&gt;tencent/hy3-preview:free&lt;/code&gt;, with $0 input and output pricing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixture-of-Experts model&lt;/strong&gt;: 295B total parameters, 21B active, 192 experts (top-8 routing), and a &lt;strong&gt;256K-token context window&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three reasoning modes&lt;/strong&gt;: &lt;code&gt;no_think&lt;/code&gt; (fast), &lt;code&gt;low&lt;/code&gt;, and &lt;code&gt;high&lt;/code&gt; (deep chain-of-thought for coding/agent tasks).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmarks&lt;/strong&gt;: SWE-bench Verified 74.4, Terminal-Bench 2.0 54.4, GPQA Diamond 87.2, MMLU 87.42.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three free ways to run&lt;/strong&gt;: OpenRouter free tier, &lt;a href="https://huggingface.co/spaces/tencent/Hy3-preview" rel="noopener noreferrer"&gt;Hugging Face Hy3-preview Space&lt;/a&gt;, or local inference with vLLM and open weights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apidog&lt;/strong&gt; is a great fit with the OpenRouter endpoint: Hy3 uses the OpenAI Chat Completions schema; just point your request at OpenRouter.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is Hy3 Preview?
&lt;/h2&gt;

&lt;p&gt;Hy3 Preview is Tencent’s first flagship release from their restructured Hunyuan foundation-model team, now led by Yao Shunyu (ex-OpenAI). It’s Tencent’s most capable model to date—a direct response to top Chinese open-weights models from DeepSeek, Alibaba, and Zhipu.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3okmwbravxkkwnbywfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3okmwbravxkkwnbywfk.png" alt="Hy3 Model Overview" width="800" height="535"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical highlights&lt;/strong&gt; (&lt;a href="https://huggingface.co/tencent/Hy3-preview" rel="noopener noreferrer"&gt;official model card&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Architecture&lt;/strong&gt;: Mixture-of-Experts, 80 layers + one MTP layer, 64 attention heads with grouped-query attention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parameters&lt;/strong&gt;: 295B total, 21B active per forward pass.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experts&lt;/strong&gt;: 192 specialists, top-8 routing per token.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context&lt;/strong&gt;: 256K tokens (262,144 on OpenRouter).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tokenizer&lt;/strong&gt;: 120,832-entry vocabulary; weights shipped in BF16 precision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: Tencent Hy Community License (commercial use allowed with conditions).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agentic training and improved RL infrastructure enable strong results on SWE-bench Verified, Terminal-Bench 2.0, and code/shell tasks—close to top closed models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr9r1k4hiwnqmxfzcxkrv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr9r1k4hiwnqmxfzcxkrv.png" alt="Benchmarks" width="800" height="823"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Three free ways to use Hy3 Preview
&lt;/h2&gt;

&lt;p&gt;Choose based on your workflow:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;What it is&lt;/th&gt;
&lt;th&gt;Free?&lt;/th&gt;
&lt;th&gt;Good for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenRouter &lt;code&gt;tencent/hy3-preview:free&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Hosted OpenAI-compatible API&lt;/td&gt;
&lt;td&gt;Yes, $0 in/out&lt;/td&gt;
&lt;td&gt;Agents, scripts, backend features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hugging Face Space&lt;/td&gt;
&lt;td&gt;Browser chat demo&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Quick prompts, smoke tests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-hosted weights (vLLM)&lt;/td&gt;
&lt;td&gt;Run on your own GPUs&lt;/td&gt;
&lt;td&gt;Free software, hardware cost&lt;/td&gt;
&lt;td&gt;Privacy, high volume, custom work&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most developers should start with OpenRouter—quick setup, generous free-tier rate limits, and OpenAI API compatibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-step: call Hy3 Preview free on OpenRouter
&lt;/h2&gt;

&lt;p&gt;Minimal steps to your first free call:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa447qs4ksvaydcv2u7nb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa447qs4ksvaydcv2u7nb.png" alt="OpenRouter Hy3 Preview" width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create an OpenRouter account.&lt;/strong&gt; Sign up at &lt;a href="https://openrouter.ai" rel="noopener noreferrer"&gt;openrouter.ai&lt;/a&gt;. Email only; no payment required for free-tier models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate an API key.&lt;/strong&gt; In your dashboard, go to “Keys,” create a new key, and export it (e.g., &lt;code&gt;export OPENROUTER_API_KEY=sk-or-...&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open the model page.&lt;/strong&gt; Visit the &lt;a href="https://openrouter.ai/tencent/hy3-preview:free" rel="noopener noreferrer"&gt;Hy3 Preview free listing&lt;/a&gt;. Confirm the “Free” status and review usage stats (at launch: 6.81B prompt tokens/day).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z8fvz17rlpo92tt34x4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z8fvz17rlpo92tt34x4.png" alt="Usage stats" width="800" height="208"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Send your first request.&lt;/strong&gt; OpenRouter uses the OpenAI Chat Completions schema. Any OpenAI SDK works; here’s a &lt;code&gt;curl&lt;/code&gt; example:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://openrouter.ai/api/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "tencent/hy3-preview:free",
    "messages": [
      {"role": "user", "content": "Explain the MoE routing decision inside a top-8 of 192 setup in 3 sentences."}
    ],
    "temperature": 0.9,
    "top_p": 1.0
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;
&lt;strong&gt;Enable reasoning as needed.&lt;/strong&gt; Add a &lt;code&gt;reasoning&lt;/code&gt; parameter with &lt;code&gt;effort&lt;/code&gt; set to &lt;code&gt;low&lt;/code&gt; or &lt;code&gt;high&lt;/code&gt;. OpenRouter returns a &lt;code&gt;reasoning_details&lt;/code&gt; array:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tencent/hy3-preview:free"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Plan, then write a Bash script that rotates daily log files older than 30 days into a dated archive folder."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"effort"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="6"&gt;
&lt;li&gt;
&lt;strong&gt;Iterate.&lt;/strong&gt; Use the same thread to maintain context—Hy3’s 256K window can handle entire codebases.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s it. The OpenRouter free-tier model is identical to the one published on Hugging Face; quality is not downgraded.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free, paid, and self-host: comparison
&lt;/h2&gt;

&lt;p&gt;Choose the right path for your use case:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;OpenRouter Free&lt;/th&gt;
&lt;th&gt;OpenRouter Paid&lt;/th&gt;
&lt;th&gt;Self-hosted (vLLM/SGLang)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Per-token cost&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Per provider&lt;/td&gt;
&lt;td&gt;Electricity + GPU amortization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning modes&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;no_think&lt;/code&gt;, &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context length&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;256K (memory permitting)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput under load&lt;/td&gt;
&lt;td&gt;Shared, deprioritized&lt;/td&gt;
&lt;td&gt;Dedicated&lt;/td&gt;
&lt;td&gt;Your hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate limits&lt;/td&gt;
&lt;td&gt;OpenRouter free cap&lt;/td&gt;
&lt;td&gt;Provider-specific&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data retention&lt;/td&gt;
&lt;td&gt;OpenRouter policy&lt;/td&gt;
&lt;td&gt;Provider-specific&lt;/td&gt;
&lt;td&gt;Stays on your hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning token visibility&lt;/td&gt;
&lt;td&gt;Yes (&lt;code&gt;reasoning_details&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Free is ideal for prototyping, benchmarks, and low-traffic agents. Paid/self-hosted is better for lower latency or if you exceed rate caps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt and parameter tips to maximize Hy3
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Match temperature to mode.&lt;/strong&gt; Use &lt;code&gt;temperature=0.9&lt;/code&gt;, &lt;code&gt;top_p=1.0&lt;/code&gt; for creative work; drop to &lt;code&gt;0.3&lt;/code&gt; for structured output (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;no_think&lt;/code&gt; for chat.&lt;/strong&gt; Default mode is fastest; use &lt;code&gt;low&lt;/code&gt;/&lt;code&gt;high&lt;/code&gt; only for planning, multi-step code, or math.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Name tools in the system prompt.&lt;/strong&gt; Even with OpenRouter, explicitly describe tools for better results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paste code verbatim.&lt;/strong&gt; Provide full files, then ask your question; don’t summarize them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch multi-file edits.&lt;/strong&gt; Hy3 performs best with all relevant files provided at once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ask for a plan first.&lt;/strong&gt; Use a two-step prompt: plan, then execute for better agentic task results.&lt;/li&gt;
&lt;/ul&gt;
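
&lt;p&gt;Putting the first tip to work, here is a low-temperature request for structured output—a sketch; the requested JSON keys are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Structured-output settings: temperature down to 0.3, top_p at 1.0.
# The requested keys ("title", "summary") are illustrative.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tencent/hy3-preview:free",
    "temperature": 0.3,
    "top_p": 1.0,
    "messages": [
      {"role": "user", "content": "Return only JSON with keys title and summary for the text that follows: ..."}
    ]
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;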

&lt;h2&gt;
  
  
  Limits worth knowing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rate limits flex with load.&lt;/strong&gt; The free tier is shared, so peak hours bring 429s. Retry with exponential backoff (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning tokens count as output.&lt;/strong&gt; On the free tier, &lt;code&gt;reasoning_details&lt;/code&gt; are free, but they’re billed on paid routes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License is not Apache 2.0.&lt;/strong&gt; Tencent Hy Community License allows commercial use with attribution and policy compliance—&lt;a href="https://github.com/Tencent-Hunyuan/Hy3-preview" rel="noopener noreferrer"&gt;read the full license&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling requires the correct parser.&lt;/strong&gt; For self-hosting, use vLLM/SGLang with &lt;code&gt;--tool-call-parser hy_v3&lt;/code&gt; (or &lt;code&gt;hunyuan&lt;/code&gt; for SGLang).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;English and Chinese best supported.&lt;/strong&gt; Other languages work, but quality drops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trailing US flagship models on some reasoning benchmarks.&lt;/strong&gt; Hy3 is top-tier for Chinese models, but OpenAI/Google DeepMind still lead on the hardest tasks.&lt;/li&gt;
&lt;/ul&gt;
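
&lt;p&gt;For the rate-limit point above, a minimal retry loop—a sketch, assuming plain &lt;code&gt;curl&lt;/code&gt; and up to five attempts:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Sketch: retry on HTTP 429 with exponential backoff (2s, 4s, 8s...).
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o response.json -w "%{http_code}" \
    https://openrouter.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $OPENROUTER_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "tencent/hy3-preview:free", "messages": [{"role": "user", "content": "ping"}]}')
  if [ "$status" != "429" ]; then
    break                   # success or a non-rate-limit error: stop retrying
  fi
  sleep $((2 ** attempt))   # back off before the next attempt
done
cat response.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;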

&lt;h2&gt;
  
  
  The developer fast path: Hy3 Preview plus Apidog
&lt;/h2&gt;

&lt;p&gt;Command-line &lt;code&gt;curl&lt;/code&gt; works for demos, but for real iteration use a visual API client.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Open Apidog&lt;/strong&gt; and create a new project. Import the OpenAI Chat Completions OpenAPI spec.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set base URL&lt;/strong&gt; to &lt;code&gt;https://openrouter.ai/api/v1&lt;/code&gt; and add an environment variable for &lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create a request&lt;/strong&gt; to &lt;code&gt;/chat/completions&lt;/code&gt; with &lt;code&gt;model&lt;/code&gt; set to &lt;code&gt;tencent/hy3-preview:free&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fork requests&lt;/strong&gt; to compare reasoning modes. Duplicate and tweak one parameter to run side by side (&lt;code&gt;no_think&lt;/code&gt;, &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Save prompt templates.&lt;/strong&gt; Use Apidog environments and variables to manage system prompts, schemas, and user turns for reuse.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you’re migrating from Postman, see the &lt;a href="http://apidog.com/blog/api-testing-without-postman-2026?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;API testing without Postman in 2026 guide&lt;/a&gt;. Prefer VS Code? Use &lt;a href="http://apidog.com/blog/how-to-use-apidog-inside-vscode?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog inside VS Code&lt;/a&gt; to keep prompt tuning next to your code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free alternatives when you hit the cap
&lt;/h2&gt;

&lt;p&gt;If OpenRouter’s free pool is throttled, try:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hugging Face Space:&lt;/strong&gt; &lt;a href="https://huggingface.co/spaces/tencent/Hy3-preview" rel="noopener noreferrer"&gt;Hy3-preview Space&lt;/a&gt; for browser chat. Not scriptable, but free and good for comparisons.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Other free Chinese open-weights models:&lt;/strong&gt; Alibaba’s Qwen 3.5 Omni and Zhipu GLM 5V Turbo have generous free tiers:

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://apidog.com/blog/qwen-3-5-omni-announcement?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Qwen 3.5 Omni announcement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://apidog.com/blog/how-to-use-qwen-3-5-omni?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Qwen 3.5 Omni how-to&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://apidog.com/blog/glm-5v-turbo-api-guide?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GLM 5V Turbo API guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;These don’t match Hy3’s coding scores, but cover chat/multilingual/multimodal use cases. For production, set up one Apidog collection per model and benchmark on your actual prompts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-hosting Hy3 Preview with vLLM
&lt;/h2&gt;

&lt;p&gt;If you have sufficient hardware, local inference is an option. Hy3’s model card recommends vLLM with tensor parallelism (8-way) and multi-token prediction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve tencent/Hy3-preview &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tensor-parallel-size&lt;/span&gt; 8 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--speculative-config&lt;/span&gt;.method mtp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--speculative-config&lt;/span&gt;.num_speculative_tokens 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tool-call-parser&lt;/span&gt; hy_v3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reasoning-parser&lt;/span&gt; hy_v3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-auto-tool-choice&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--served-model-name&lt;/span&gt; hy3-preview
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SGLang: use &lt;code&gt;--tool-call-parser hunyuan&lt;/code&gt; and &lt;code&gt;--reasoning-parser hunyuan&lt;/code&gt;. Once the server is up at &lt;code&gt;http://localhost:8000/v1&lt;/code&gt;, point any OpenAI SDK to it as you would OpenRouter—just update the base URL and key.&lt;/p&gt;
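
&lt;p&gt;A quick smoke test against the local server (assumes the serve command above and the default port 8000):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Smoke test: the model name matches --served-model-name above.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hy3-preview",
    "messages": [{"role": "user", "content": "Reply with one short sentence."}]
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;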

&lt;p&gt;Expect to need eight H100-class GPUs (BF16) for the full model. Quantized versions may appear later, but full precision is currently required.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is Hy3 Preview free?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. OpenRouter’s &lt;code&gt;tencent/hy3-preview:free&lt;/code&gt; is $0 per million input/output tokens. Reasoning tokens are free on the free tier but count toward rate limits. Confirm current status on the &lt;a href="https://openrouter.ai/tencent/hy3-preview:free" rel="noopener noreferrer"&gt;OpenRouter model page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Hy3 Preview compare to DeepSeek V3 and Qwen 3?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Hy3’s SWE-bench Verified (74.4) and Terminal-Bench 2.0 (54.4) scores put it with the best Chinese open models, with a strong agent/tool use focus. For chat, Qwen 3 and DeepSeek V3 are competitive; for agent/coding workflows, Hy3’s RL-trained tool use stands out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are Hy3’s reasoning modes?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Three: &lt;code&gt;no_think&lt;/code&gt; (default), &lt;code&gt;low&lt;/code&gt;, and &lt;code&gt;high&lt;/code&gt;. Set via the &lt;code&gt;reasoning&lt;/code&gt; parameter (OpenRouter) or &lt;code&gt;chat_template_kwargs={"reasoning_effort": "high"}&lt;/code&gt; (direct model call). Use &lt;code&gt;high&lt;/code&gt; for planning/multi-step code/math; default to off for chat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use Hy3 Preview commercially?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes, under the Tencent Hy Community License. Commercial use requires attribution and compliance—&lt;a href="https://github.com/Tencent-Hunyuan/Hy3-preview" rel="noopener noreferrer"&gt;read full terms&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What context length does the free tier support?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
256K tokens (262,144). Paste entire codebases and still have room for tool schemas and history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I test Hy3 Preview without coding?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Use the &lt;a href="https://huggingface.co/spaces/tencent/Hy3-preview" rel="noopener noreferrer"&gt;Hugging Face Space&lt;/a&gt; for browser chat or point &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; at the OpenRouter endpoint. Apidog supports the OpenAI OpenAPI spec—just set base URL, API key, and model name.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Best Prediction Market API for 2026</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Thu, 23 Apr 2026 09:45:49 +0000</pubDate>
      <link>https://dev.to/hassann/best-prediction-market-api-for-2026-1kig</link>
      <guid>https://dev.to/hassann/best-prediction-market-api-for-2026-1kig</guid>
      <description>&lt;p&gt;Prediction markets let developers and traders bet on real-world outcomes—elections, Fed decisions, crypto prices, and more. By 2026, these markets are a major data source, with Polymarket alone handling billions in volume. If you’re building trading bots, dashboards, forecasting tools, or news products, you need a reliable prediction market API feed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;The challenge: every API differs. Some use on-chain WebSockets, others offer CFTC-regulated REST endpoints, and a few are built for play money prototyping. Picking the wrong API can waste weeks due to mismatched auth flows, rate limits, or unsupported data formats.&lt;/p&gt;

&lt;p&gt;This guide compares the top prediction market APIs for 2026, explains their strengths, and shows how to evaluate and test them with &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;. Official docs are spread across &lt;a href="https://docs.polymarket.com/" rel="noopener noreferrer"&gt;Polymarket&lt;/a&gt;, &lt;a href="https://docs.kalshi.com/" rel="noopener noreferrer"&gt;Kalshi&lt;/a&gt;, and &lt;a href="https://docs.manifold.markets/api" rel="noopener noreferrer"&gt;Manifold Markets&lt;/a&gt;. If you’re working on-chain, see our guide to the &lt;a href="http://apidog.com/blog/best-crypto-wallet-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;best crypto wallet API&lt;/a&gt; for Polymarket and Augur integrations.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Polymarket&lt;/strong&gt;: Deepest liquidity, on-chain CLOB and Gamma APIs. Ideal for high-volume trading and election data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kalshi&lt;/strong&gt;: Only CFTC-regulated US event exchange. REST and WebSocket APIs; requires KYC for trading.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manifold Markets&lt;/strong&gt;: Play-money, clean REST API. Great for prototyping, research, and learning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Augur v2&lt;/strong&gt;: Runs on Ethereum via subgraph. Niche, low volume, fully decentralized.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PredictIt&lt;/strong&gt;: Public read-only feed, strict rate limits. Use for data, not trading.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metaculus&lt;/strong&gt;: REST API for research and aggregated forecasts. No trading.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What to Look for in a Prediction Market API
&lt;/h2&gt;

&lt;p&gt;Before integrating, evaluate these seven criteria:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Market coverage&lt;/strong&gt;: Does it cover politics, sports, crypto, and more, or just one category? (Polymarket/Kalshi = broad; PredictIt = US politics only.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Liquidity &amp;amp; volume data&lt;/strong&gt;: Look for endpoints exposing 24h volume, open interest, and resolution status.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time feeds&lt;/strong&gt;: REST polling is limited. Prefer APIs with WebSocket support for live data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Historical data&lt;/strong&gt;: Backtesting requires long-term, minute/tick-level data. Some APIs limit free access to 30 days.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulated status&lt;/strong&gt;: US users need CFTC-regulated APIs (Kalshi) or must handle IP restrictions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth model&lt;/strong&gt;: Trading often requires API keys, wallet signatures, or KYC.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limits &amp;amp; SDKs&lt;/strong&gt;: Know the per-IP or per-key limits. (Polymarket: ~50 req/sec; Kalshi: tiered; Manifold: relaxed.)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Comparison Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;API style&lt;/th&gt;
&lt;th&gt;Trading auth&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Polymarket&lt;/td&gt;
&lt;td&gt;Decentralized, on-chain (Polygon)&lt;/td&gt;
&lt;td&gt;REST (CLOB, Gamma) + WebSocket&lt;/td&gt;
&lt;td&gt;EIP-712 wallet signature&lt;/td&gt;
&lt;td&gt;High-volume crypto-native trading and election data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kalshi&lt;/td&gt;
&lt;td&gt;CFTC-regulated US exchange&lt;/td&gt;
&lt;td&gt;REST + WebSocket&lt;/td&gt;
&lt;td&gt;Email/password + API key, KYC&lt;/td&gt;
&lt;td&gt;US-compliant event contracts and regulated products&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manifold Markets&lt;/td&gt;
&lt;td&gt;Play-money social market&lt;/td&gt;
&lt;td&gt;REST (clean JSON)&lt;/td&gt;
&lt;td&gt;API key&lt;/td&gt;
&lt;td&gt;Prototyping, research, teaching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Augur v2&lt;/td&gt;
&lt;td&gt;Decentralized (Ethereum)&lt;/td&gt;
&lt;td&gt;The Graph subgraph + contracts&lt;/td&gt;
&lt;td&gt;Wallet signature&lt;/td&gt;
&lt;td&gt;Fully decentralized, censorship-resistant markets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PredictIt&lt;/td&gt;
&lt;td&gt;Regulated US political market&lt;/td&gt;
&lt;td&gt;Public JSON feed (read)&lt;/td&gt;
&lt;td&gt;No public trading API&lt;/td&gt;
&lt;td&gt;Historical US political sentiment data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metaculus&lt;/td&gt;
&lt;td&gt;Forecasting research platform&lt;/td&gt;
&lt;td&gt;REST&lt;/td&gt;
&lt;td&gt;Token auth&lt;/td&gt;
&lt;td&gt;Aggregated expert forecasts, research datasets&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Top Prediction Market API Providers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Polymarket (CLOB and Gamma)
&lt;/h3&gt;

&lt;p&gt;Polymarket is the largest decentralized prediction market (Polygon, USDC). It exposes two REST APIs plus a WebSocket feed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CLOB API&lt;/strong&gt;: Orderbook data, trades, order placement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gamma API&lt;/strong&gt;: Market metadata, event grouping, category browsing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebSocket&lt;/strong&gt;: Real-time trades and orderbook updates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Read&lt;/strong&gt; endpoints are open. &lt;strong&gt;Trading&lt;/strong&gt; requires EIP-712 signatures from a Polygon wallet. Use the Polymarket SDK to generate signatures, and integrate wallet providers like Privy or MetaMask. See our guides on &lt;a href="http://apidog.com/blog/how-to-use-privy-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;using the Privy API&lt;/a&gt; and &lt;a href="http://apidog.com/blog/how-to-use-metamask-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;MetaMask API&lt;/a&gt; for implementation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; High-volume trading, election markets, on-chain teams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Get Polymarket markets&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://clob.polymarket.com/api/markets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Kalshi
&lt;/h3&gt;

&lt;p&gt;Kalshi is a CFTC-regulated US event exchange. It covers macro (CPI, Fed), weather, politics, sports, and entertainment. The API uses REST and WebSocket; &lt;a href="https://docs.kalshi.com/" rel="noopener noreferrer"&gt;docs here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trading&lt;/strong&gt; requires a verified account (KYC) and API keys. Orderbooks and trades need authentication. The API enforces per-tier rate limits and token refresh flows, so handle token caching and rotation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; US-compliant trading, macro and sports automation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Get Kalshi markets&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer {TOKEN}"&lt;/span&gt; https://trading-api.kalshi.com/trade-api/v2/markets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Manifold Markets
&lt;/h3&gt;

&lt;p&gt;Manifold uses “mana” (play money), so you can prototype with zero risk. The REST API (&lt;a href="https://docs.manifold.markets/api" rel="noopener noreferrer"&gt;docs&lt;/a&gt;) is straightforward. Read markets, comments, and profiles without auth. Betting requires an API key; creating a test account is simple.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Prototyping, tutorials, student projects, hackathons.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Get Manifold markets&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.manifold.markets/v0/markets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Augur v2
&lt;/h3&gt;

&lt;p&gt;Augur is a decentralized Ethereum prediction market (REP token). Most access is via &lt;a href="https://thegraph.com/" rel="noopener noreferrer"&gt;The Graph&lt;/a&gt; subgraph (GraphQL). For trading, interact with smart contracts directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Developer notes:&lt;/strong&gt; Documentation is sparse, markets are slow, and Ethereum gas costs apply. Integrate with an Ethereum node provider; see our &lt;a href="http://apidog.com/blog/how-to-use-alchemy-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Alchemy API guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Decentralized apps, censorship resistance, research.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Query Augur with GraphQL&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight graphql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="n"&gt;markets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;outcomes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  PredictIt
&lt;/h3&gt;

&lt;p&gt;PredictIt is a regulated US political market. No trading API, but a public JSON feed is available at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://www.predictit.org/api/marketdata/all/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt; Consume it as a read-only data source. Rate limits are strict; cache responses and back off exponentially on 429 errors (see the sketch after the example below).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; US political price data, sentiment research.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Fetch PredictIt data&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://www.predictit.org/api/marketdata/all/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
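
&lt;p&gt;A minimal caching wrapper keeps you under the feed’s limits—a sketch, assuming a five-minute refresh window and a &lt;code&gt;/tmp&lt;/code&gt; cache path:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Sketch: serve a cached copy unless it is older than 5 minutes.
CACHE=/tmp/predictit.json
if [ ! -f "$CACHE" ] || [ -n "$(find "$CACHE" -mmin +5)" ]; then
  curl -s https://www.predictit.org/api/marketdata/all/ -o "$CACHE"
fi
cat "$CACHE"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;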



&lt;h3&gt;
  
  
  Metaculus
&lt;/h3&gt;

&lt;p&gt;Metaculus is a forecasting platform—no trading, but high-quality aggregated predictions. The REST API (&lt;a href="https://www.metaculus.com/api/" rel="noopener noreferrer"&gt;docs&lt;/a&gt;) returns questions, predictions, and history using token auth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Research, dashboards, academic datasets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Get Metaculus questions&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://www.metaculus.com/api2/questions/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to Choose
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compliance&lt;/strong&gt;: For US retail and real money, pick Kalshi. For crypto-native with wallet auth, pick Polymarket.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coverage&lt;/strong&gt;: Elections—Polymarket (liquidity), Kalshi (wider contract list). Crypto—Polymarket. Play money/education—Manifold.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stack fit&lt;/strong&gt;: On-chain teams: Polymarket, Augur. REST-first teams: Kalshi.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backtesting/data&lt;/strong&gt;: PredictIt (US politics), Metaculus (aggregated forecasts).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Testing Prediction Market APIs with Apidog
&lt;/h2&gt;

&lt;p&gt;Each API uses different auth and flows. Kalshi rotates login tokens, Polymarket requires EIP-712 signatures, Manifold uses API keys, Metaculus uses token auth. Managing these in Postman is messy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; provides a unified workspace for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Importing OpenAPI specs&lt;/li&gt;
&lt;li&gt;Attaching environment-specific auth profiles&lt;/li&gt;
&lt;li&gt;Chaining requests into test scenarios&lt;/li&gt;
&lt;li&gt;Mocking API responses&lt;/li&gt;
&lt;li&gt;Running pre-request flows (e.g., Kalshi login)&lt;/li&gt;
&lt;li&gt;Diffing payloads to catch schema changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For multi-venue dashboards or bots, Apidog can save hours over juggling curl, Postman, and scripts. &lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog&lt;/a&gt; and import the Polymarket/Kalshi OpenAPI spec to get started.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Which prediction market has the most liquidity in 2026?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Polymarket leads, especially for elections and macro events. Kalshi is second, growing on US-regulated contracts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I trade Polymarket from the US?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No—Polymarket blocks US IPs for trading, per CFTC settlement. Market data is still accessible. Use Kalshi for US-compliant trading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need a crypto wallet for Polymarket’s API?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes, for trading. Read endpoints are public, but placing/canceling orders needs EIP-712 signatures from a Polygon wallet. See our &lt;a href="http://apidog.com/blog/how-to-use-metamask-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;MetaMask API guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is there a free prediction market API for learning?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Manifold Markets is fully free (play money). Metaculus is also free for read access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kalshi vs. Polymarket for developers?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Kalshi: REST/WebSocket, email auth, CFTC-compliant.&lt;br&gt;&lt;br&gt;
Polymarket: On-chain, wallet signatures, higher liquidity, no US retail trading. Choose based on jurisdiction and settlement needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I avoid rate limits while backtesting?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Cache snapshots locally, respect 429s with exponential backoff, and batch WebSocket subscriptions if possible. For more, see &lt;a href="http://apidog.com/blog/api-testing-without-postman-2026?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;API testing without Postman in 2026&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
