<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Novita AI</title>
    <description>The latest articles on DEV Community by Novita AI (@novita_ai).</description>
    <link>https://dev.to/novita_ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1183161%2F844aefc6-6de4-4095-92b6-6cc3eb4d8d2d.png</url>
      <title>DEV Community: Novita AI</title>
      <link>https://dev.to/novita_ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/novita_ai"/>
    <language>en</language>
    <item>
      <title>Best Text-to-Speech APIs in 2026: 8 Providers Compared</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Wed, 06 May 2026 14:12:21 +0000</pubDate>
      <link>https://dev.to/novita_ai/best-text-to-speech-apis-in-2026-8-providers-compared-5edg</link>
      <guid>https://dev.to/novita_ai/best-text-to-speech-apis-in-2026-8-providers-compared-5edg</guid>
      <description></description>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>Kimi K2.6 on Novita AI: API Pricing ($0.95/$4.00), SWE-Bench &amp; Agentic Coding</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Wed, 29 Apr 2026 14:59:50 +0000</pubDate>
      <link>https://dev.to/novita_ai/kimi-k26-on-novita-ai-api-pricing-095400-swe-bench-agentic-coding-2644</link>
      <guid>https://dev.to/novita_ai/kimi-k26-on-novita-ai-api-pricing-095400-swe-bench-agentic-coding-2644</guid>
      <description>&lt;h1&gt;
  
  
  Kimi K2.6: Open-Source Agent for 13-Hour Coding Sessions
&lt;/h1&gt;

&lt;p&gt;Your coding agent halts after 20 minutes, burns through its context, and leaves you with a half-finished PR. You switch to a closed frontier model: it lasts longer but costs 5× more per run. Kimi K2.6, Moonshot AI's newly open-sourced model, is built specifically to break that trade-off. It has run autonomous coding sessions of up to 13 hours and 4,000+ tool calls, and it scores 58.6% on SWE-Bench Pro, edging out GPT-5.4 (57.7%) and outperforming Claude Opus 4.6 (53.4%), at a fraction of the closed-model price. &lt;em&gt;(Benchmarks sourced from &lt;a href="https://www.kimi.com/blog/kimi-k2-6.html" rel="noopener noreferrer"&gt;kimi.com/blog/kimi-k2-6&lt;/a&gt;.)&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: Kimi K2.6 is now on Novita AI. 1T MoE open-source model, 256K context, 58.6% SWE-Bench Pro — built for long-horizon agentic coding. Try free via OpenAI-compatible API.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Kimi K2.6 is now available on Novita AI via OpenAI-compatible API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In short:&lt;/strong&gt; Kimi K2.6 is a 1-trillion-parameter open-source MoE model (32B activated) from Moonshot AI, specialized for agentic coding, long-horizon task execution, and multi-agent coordination — with a 256K context window and OpenAI-compatible API access on Novita AI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models?q=kimi" rel="noopener noreferrer"&gt;Try Kimi K2.6 on Novita AI →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Kimi K2.6?
&lt;/h2&gt;

&lt;p&gt;Kimi K2.6 is an open-source, native multimodal agentic model released by Moonshot AI in April 2026. It is a direct evolution of Kimi K2.5 — the same MoE architecture, now significantly improved for real-world long-horizon tasks, coding-driven UI generation, and coordinated multi-agent execution.&lt;/p&gt;

&lt;p&gt;At its core, K2.6 is a 1-trillion-parameter Mixture-of-Experts (MoE) model with only 32B parameters activated per token. This gives it frontier-class benchmark results at a per-token compute cost closer to that of a dense ~32B model, although all 1T parameters still have to be held in memory for serving. The architecture uses Multi-head Latent Attention (MLA), SwiGLU activations, 384 experts with 8 selected per token, and a 256K-token context window. The model is released under a modified MIT license.&lt;/p&gt;
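
&lt;p&gt;To make "384 experts with 8 selected per token" concrete, here is a minimal sketch of generic top-k expert gating in plain Python. It assumes a simple softmax over the 8 highest router scores. Moonshot's production router (its scoring function, any shared experts, load balancing) is not described in this article and may differ, and the random logits are placeholders for a real router's output.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math
import random

NUM_EXPERTS = 384  # total routed experts (from the K2.6 spec)
TOP_K = 8          # experts selected per token (from the K2.6 spec)


def route_token(router_logits, top_k=TOP_K):
    """Pick the top-k experts for one token and renormalize their gate weights.

    Generic softmax-over-top-k gating for illustration; not Moonshot's exact router.
    """
    # Indices of the k largest router scores
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Softmax over the selected experts only (subtract the max for numerical stability)
    m = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: w / total for i, w in exps.items()}


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # placeholder router scores
gates = route_token(logits)
print(len(gates), "of", NUM_EXPERTS, "experts active")  # 8 of 384 experts active
print(round(sum(gates.values()), 6))                    # gate weights sum to 1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only the 8 selected experts run their feed-forward computation for that token, which is why per-token compute tracks the 32B activated parameters rather than the full 1T.&lt;/p&gt;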

&lt;p&gt;Key capabilities at a glance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long-horizon coding&lt;/strong&gt; — sustained autonomous execution across hours and thousands of tool calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-language generalization&lt;/strong&gt; — strong performance in Rust, Go, Python, and niche languages like Zig&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coding-driven design&lt;/strong&gt; — turns prompts and visual inputs into production-ready front-end interfaces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Swarm scaling&lt;/strong&gt; — coordinates up to 300 sub-agents across 4,000 parallel steps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native multimodal&lt;/strong&gt; — processes images and text natively via the MoonViT vision encoder&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function calling &amp;amp; structured output&lt;/strong&gt; — OpenAI-compatible tool use, ideal for building agent pipelines and RAG systems&lt;/li&gt;
&lt;/ul&gt;
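
&lt;p&gt;Function calling uses the OpenAI request format. The sketch below builds, but does not send, a chat-completions request that offers the model one tool. The &lt;code&gt;run_tests&lt;/code&gt; tool is hypothetical, and the &lt;code&gt;/chat/completions&lt;/code&gt; path and Bearer header follow the usual OpenAI convention rather than anything stated in this article. Only the standard library is used, so nothing goes over the network.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request

BASE_URL = "https://api.novita.ai/v3/openai"  # Novita's OpenAI-compatible base URL
MODEL = "moonshotai/kimi-k2.6"

# Hypothetical tool, described in the OpenAI function-calling schema format.
run_tests_tool = {
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return a summary of failures.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Directory containing the tests"},
            },
            "required": ["path"],
        },
    },
}


def build_request(prompt, api_key):
    """Build (but do not send) a chat-completions request that offers the tool to the model."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [run_tests_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": "Bearer " + api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_request("Fix the failing tests in ./src", "YOUR_NOVITA_API_KEY")
print(req.full_url)
# Sending it with urllib.request.urlopen(req) would return JSON whose
# choices[0].message.tool_calls lists any tool the model chose to call.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In an agent loop, your code would execute the requested tool, append its result as a &lt;code&gt;tool&lt;/code&gt; role message, and call the endpoint again.&lt;/p&gt;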

&lt;h2&gt;
  
  
  What Makes Kimi K2.6 Different from Other Open-Source Models?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Long-Horizon Coding
&lt;/h3&gt;

&lt;p&gt;Many LLMs degrade after a few hundred tool calls. K2.6 was explicitly trained for multi-hour sessions spanning thousands of calls. In one case study, it deployed a local Qwen3.5-0.8B model on a Mac and rewrote its inference engine in Zig over 12 hours and 4,000+ tool calls, raising throughput from ~15 to ~193 tokens/sec, roughly 20% faster than LM Studio. In another, it autonomously refactored exchange-core, an 8-year-old financial matching engine, across a 13-hour session, executing 12 optimization strategies and modifying 4,000+ lines of code for a 185% throughput gain.&lt;/p&gt;
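
&lt;p&gt;The case-study numbers translate into the ratios below. The LM Studio figure is only implied by the "roughly 20% faster" wording rather than reported directly, so treat it as approximate.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Figures quoted in the Zig inference-engine case study (approximate).
baseline_tps = 15.0    # tokens/sec before the rewrite
rewritten_tps = 193.0  # tokens/sec after ~12 hours and 4,000+ tool calls

speedup = rewritten_tps / baseline_tps
# "Roughly 20% faster than LM Studio" implies LM Studio ran at about 193 / 1.2 tokens/sec.
implied_lm_studio_tps = rewritten_tps / 1.20

# exchange-core case study: a 185% gain means 2.85x the original throughput.
exchange_core_multiplier = 1 + 185 / 100

print(f"Zig rewrite speedup: {speedup:.1f}x")
print(f"Implied LM Studio throughput: ~{implied_lm_studio_tps:.0f} tokens/sec")
print(f"exchange-core throughput multiplier: {exchange_core_multiplier:.2f}x")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;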

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.novita.ai%2Fwp-content%2Fuploads%2F2026%2F04%2Fkimi-code-bench.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.novita.ai%2Fwp-content%2Fuploads%2F2026%2F04%2Fkimi-code-bench.webp" alt="Kimi Code Bench: K2.6 (68.2) vs K2.5 (57.4) coding performance" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Kimi Code Bench: K2.6 scores 68.2 vs K2.5's 57.4 (+19%). [Source: &lt;a href="https://www.kimi.com/blog/kimi-k2-6.html" rel="noopener noreferrer"&gt;Kimi Official Blog&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;According to Moonshot AI's &lt;a href="https://www.kimi.com/blog/kimi-k2-6.html" rel="noopener noreferrer"&gt;launch blog&lt;/a&gt;, beta partners including Baseten, Blackbox.ai, Factory.ai, and Fireworks.ai noted that K2.6 maintains "architectural integrity over extended coding sessions" and surfaces "non-obvious bugs that would normally take significant developer time to uncover."&lt;/p&gt;

&lt;h3&gt;
  
  
  Coding-Driven Design
&lt;/h3&gt;

&lt;p&gt;K2.6 can generate structured front-end layouts, interactive elements, scroll-triggered animations, and lightweight full-stack workflows — authentication, session management, database operations — from a simple text or image prompt. Moonshot AI's internal Kimi Design Bench, covering Visual Input Tasks, Landing Page Construction, Full-Stack App Development, and General Creative Programming, shows K2.6 competitive with Google AI Studio across all four categories.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.novita.ai%2Fwp-content%2Fuploads%2F2026%2F04%2Fkimi-design-bench.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.novita.ai%2Fwp-content%2Fuploads%2F2026%2F04%2Fkimi-design-bench.webp" alt="Kimi Design Bench: K2.6 (47.5%) vs Google AI Studio (31.4%)" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Kimi Design Bench: K2.6 (47.5%) outperforms Google AI Studio (31.4%) on UI generation tasks. [Source: &lt;a href="https://www.kimi.com/blog/kimi-k2-6.html" rel="noopener noreferrer"&gt;Kimi Official Blog&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Elevated Agent Swarm
&lt;/h3&gt;

&lt;p&gt;K2.6 scales the agent swarm architecture from K2.5's 100 sub-agents / 1,500 steps to &lt;strong&gt;300 sub-agents executing across 4,000 coordinated steps simultaneously&lt;/strong&gt;. The coordinator dynamically assigns tasks to agents based on skill profiles, detects failures, reassigns work, and manages the full lifecycle from initiation to validation. Outputs span documents, websites, slides, and spreadsheets — produced in a single autonomous run. Moonshot AI's own marketing team uses a K2.6-backed Claw Group internally, with specialized agents for demo creation, benchmarking, social media, and video production all coordinated by K2.6.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.novita.ai%2Fwp-content%2Fuploads%2F2026%2F04%2Fkimi-claw-bench.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.novita.ai%2Fwp-content%2Fuploads%2F2026%2F04%2Fkimi-claw-bench.webp" alt="Kimi Claw Bench: K2.6 (65.5) vs K2.5 (59.6) agent task completion" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Kimi Claw Bench: K2.6 scores 65.5 vs K2.5's 59.6 (+9.9%) on multi-step agent tasks. [Source: &lt;a href="https://www.kimi.com/blog/kimi-k2-6.html" rel="noopener noreferrer"&gt;Kimi Official Blog&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;
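
&lt;p&gt;The coordinator behavior described above (skill-based assignment, failure detection, reassignment) can be illustrated with a toy scheduler. This is a minimal sketch, not Moonshot's Agent Swarm implementation: the agent names, skill sets, and simulated failure are all hypothetical, and a real sub-agent call would replace &lt;code&gt;run_agent&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import deque

# Hypothetical skill profiles; "fails_on" simulates sub-agent failures for the demo.
AGENTS = {
    "scraper": {"skills": {"web"}, "fails_on": {"task-2"}},
    "analyst": {"skills": {"analysis", "web"}, "fails_on": set()},
    "writer": {"skills": {"writing"}, "fails_on": set()},
}


def run_agent(agent, task_id):
    """Stand-in for a sub-agent call; returns False to simulate a failed run."""
    return task_id not in AGENTS[agent]["fails_on"]


def coordinate(tasks, max_attempts=3):
    """Assign tasks by skill, detect failures, and reassign to another capable agent."""
    queue = deque((task_id, skill, []) for task_id, skill in tasks)
    results = {}
    while queue:
        task_id, skill, tried = queue.popleft()
        candidates = [a for a, p in AGENTS.items() if skill in p["skills"] and a not in tried]
        if not candidates or len(tried) == max_attempts:
            results[task_id] = None  # give up: no capable agent left to try
            continue
        agent = candidates[0]
        if run_agent(agent, task_id):
            results[task_id] = agent  # success: record which agent completed it
        else:
            queue.append((task_id, skill, tried + [agent]))  # failure: requeue for reassignment
    return results


print(coordinate([("task-1", "web"), ("task-2", "web"), ("task-3", "writing"), ("task-4", "video")]))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here &lt;code&gt;task-2&lt;/code&gt; fails on the scraper and is reassigned to the analyst, while &lt;code&gt;task-4&lt;/code&gt; has no agent with a matching skill and is marked unresolved.&lt;/p&gt;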

&lt;h3&gt;
  
  
  Proactive Background Agents
&lt;/h3&gt;

&lt;p&gt;One of the more striking K2.6 use cases from Moonshot's own RL infrastructure team: a K2.6-backed agent ran autonomously for &lt;strong&gt;5 days&lt;/strong&gt;, handling monitoring, incident response, and system operations — persistent context, multi-threaded task management, and full-cycle execution from alert to resolution, without human intervention. This kind of persistent, 24/7 background agent is a specific design target for K2.6.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does Kimi K2.6 Perform on Agentic Coding Benchmarks?
&lt;/h2&gt;

&lt;p&gt;K2.6 competes directly with top closed models. It posts the top score on SWE-Bench Pro, HLE-Full with tools, and DeepSearchQA, and trails the leader by less than five points on every other benchmark below:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coding Benchmarks&lt;/strong&gt; &lt;em&gt;(Last verified: 2026-04-21, source: kimi.com/blog/kimi-k2-6)&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Kimi K2.6&lt;/th&gt;
&lt;th&gt;GPT-5.4 (xhigh)&lt;/th&gt;
&lt;th&gt;Claude Opus 4.6 (max)&lt;/th&gt;
&lt;th&gt;Gemini 3.1 Pro (thinking)&lt;/th&gt;
&lt;th&gt;Kimi K2.5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;58.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;57.7&lt;/td&gt;
&lt;td&gt;53.4&lt;/td&gt;
&lt;td&gt;54.2&lt;/td&gt;
&lt;td&gt;50.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Verified&lt;/td&gt;
&lt;td&gt;80.2&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;80.8&lt;/td&gt;
&lt;td&gt;80.6&lt;/td&gt;
&lt;td&gt;76.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Multilingual&lt;/td&gt;
&lt;td&gt;76.7&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;77.8&lt;/td&gt;
&lt;td&gt;76.9&lt;/td&gt;
&lt;td&gt;73.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terminal-Bench 2.0&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;66.7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;65.4&lt;/td&gt;
&lt;td&gt;65.4&lt;/td&gt;
&lt;td&gt;68.5&lt;/td&gt;
&lt;td&gt;50.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LiveCodeBench (v6)&lt;/td&gt;
&lt;td&gt;89.6&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;88.8&lt;/td&gt;
&lt;td&gt;91.7&lt;/td&gt;
&lt;td&gt;85.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Agentic Benchmarks&lt;/strong&gt; &lt;em&gt;(Last verified: 2026-04-21)&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Kimi K2.6&lt;/th&gt;
&lt;th&gt;GPT-5.4 (xhigh)&lt;/th&gt;
&lt;th&gt;Claude Opus 4.6 (max)&lt;/th&gt;
&lt;th&gt;Gemini 3.1 Pro&lt;/th&gt;
&lt;th&gt;Kimi K2.5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HLE-Full w/ tools&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;54.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;52.1&lt;/td&gt;
&lt;td&gt;53.0&lt;/td&gt;
&lt;td&gt;51.4&lt;/td&gt;
&lt;td&gt;50.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSearchQA (f1-score)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;92.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;78.6&lt;/td&gt;
&lt;td&gt;91.3&lt;/td&gt;
&lt;td&gt;81.9&lt;/td&gt;
&lt;td&gt;89.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BrowseComp&lt;/td&gt;
&lt;td&gt;83.2&lt;/td&gt;
&lt;td&gt;82.7&lt;/td&gt;
&lt;td&gt;83.7&lt;/td&gt;
&lt;td&gt;85.9&lt;/td&gt;
&lt;td&gt;74.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OSWorld-Verified&lt;/td&gt;
&lt;td&gt;73.1&lt;/td&gt;
&lt;td&gt;75.0&lt;/td&gt;
&lt;td&gt;72.7&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;63.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Toolathlon&lt;/td&gt;
&lt;td&gt;50.0&lt;/td&gt;
&lt;td&gt;54.6&lt;/td&gt;
&lt;td&gt;47.2&lt;/td&gt;
&lt;td&gt;48.8&lt;/td&gt;
&lt;td&gt;27.8&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The headline: K2.6 leads every listed model on SWE-Bench Pro (58.6%), HLE-Full with tools (54.0%), and DeepSearchQA (92.5%). Its DeepSearchQA margin is wide over GPT-5.4 (78.6) but narrow over Claude Opus 4.6 (91.3), and on Terminal-Bench 2.0 it edges both GPT-5.4 and Claude Opus 4.6 by 1.3 points (66.7 vs. 65.4). Gemini 3.1 Pro leads it on Terminal-Bench (68.5 vs. 66.7) and LiveCodeBench (91.7 vs. 89.6). Its reasoning scores (AIME 2026: 96.4%, GPQA-Diamond: 90.5%) are competitive but trail Gemini and GPT-5.4; this is a coding-first model, not a math olympiad specialist.&lt;/p&gt;
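
&lt;p&gt;To check those comparisons yourself, the two tables fit in a few lines of Python that report each row's leader and K2.6's gap to it. The scores are copied from the tables above; cells shown as a dash become &lt;code&gt;None&lt;/code&gt; and are skipped.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Scores copied from the coding and agentic tables; None marks a cell with no reported score.
MODELS = ["Kimi K2.6", "GPT-5.4", "Claude Opus 4.6", "Gemini 3.1 Pro", "Kimi K2.5"]
SCORES = {
    "SWE-Bench Pro": [58.6, 57.7, 53.4, 54.2, 50.7],
    "SWE-Bench Verified": [80.2, None, 80.8, 80.6, 76.8],
    "SWE-Bench Multilingual": [76.7, None, 77.8, 76.9, 73.0],
    "Terminal-Bench 2.0": [66.7, 65.4, 65.4, 68.5, 50.8],
    "LiveCodeBench (v6)": [89.6, None, 88.8, 91.7, 85.0],
    "HLE-Full w/ tools": [54.0, 52.1, 53.0, 51.4, 50.2],
    "DeepSearchQA (f1)": [92.5, 78.6, 91.3, 81.9, 89.0],
    "BrowseComp": [83.2, 82.7, 83.7, 85.9, 74.9],
    "OSWorld-Verified": [73.1, 75.0, 72.7, None, 63.3],
    "Toolathlon": [50.0, 54.6, 47.2, 48.8, 27.8],
}


def leader_and_gap(row):
    """Return the top model for a row and K2.6's gap to it (0.0 when K2.6 leads)."""
    scored = [(s, m) for s, m in zip(row, MODELS) if s is not None]
    best_score, best_model = max(scored)
    return best_model, round(best_score - row[0], 1)


for bench, row in SCORES.items():
    model, gap = leader_and_gap(row)
    print(f"{bench:24s} leader: {model:16s} K2.6 gap: {gap}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;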

&lt;h2&gt;
  
  
  How to Use Kimi K2.6 on Novita AI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Playground
&lt;/h3&gt;

&lt;p&gt;Navigate to &lt;a href="https://novita.ai/models/model-detail/moonshotai-kimi-k2.6" rel="noopener noreferrer"&gt;Kimi K2.6 on Novita AI&lt;/a&gt; and click &lt;strong&gt;Try in Playground&lt;/strong&gt;. No API key needed to start.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: API (Python)
&lt;/h3&gt;

&lt;p&gt;Novita serves Kimi K2.6 through an OpenAI-compatible API, so the official OpenAI Python SDK works as-is. Swap in the Novita base URL and your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_NOVITA_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.novita.ai/v3/openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;moonshotai/kimi-k2.6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your prompt here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get your API key at &lt;a href="https://novita.ai/settings" rel="noopener noreferrer"&gt;novita.ai/settings&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 3: Third-Party Tools
&lt;/h3&gt;

&lt;p&gt;Because Novita's API is OpenAI-compatible, Kimi K2.6 works out of the box with LangChain, LlamaIndex, OpenWebUI, and coding assistants like Cursor or Continue. Point the base URL to &lt;code&gt;https://api.novita.ai/v3/openai&lt;/code&gt; and set the model name to &lt;code&gt;moonshotai/kimi-k2.6&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Should You Use Kimi K2.6 Instead of GPT-4o or Claude?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario 1: Long-Running Engineering Agents
&lt;/h3&gt;

&lt;p&gt;K2.6 is well-suited for long-running engineering agents — legacy codebase refactoring, CI/CD pipeline debugging, and infrastructure optimization. Its Kimi Code Bench results and the exchange-core case study show it maintains task coherence across thousands of tool calls without drifting from the original objective.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 2: Design-to-Code Pipelines
&lt;/h3&gt;

&lt;p&gt;Designers drop a mockup; K2.6 produces a working React/HTML/CSS implementation with animations and responsive layouts. The model's native multimodal input (via MoonViT) means it processes the image reference directly rather than relying on a verbal description. This makes it a strong backbone for AI-assisted UI generation workflows.&lt;/p&gt;
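
&lt;p&gt;For design-to-code requests, the mockup is typically sent as an image content part next to the text instructions, using the OpenAI message format. The sketch below only builds the message: &lt;code&gt;mockup_message&lt;/code&gt; is a hypothetical helper, and whether Novita's endpoint accepts image input for &lt;code&gt;moonshotai/kimi-k2.6&lt;/code&gt; should be confirmed on the model page before relying on it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import base64


def mockup_message(image_bytes, instructions, mime="image/png"):
    """Build an OpenAI-style user message pairing a design mockup with text instructions."""
    # Inline the image as a base64 data URL so no separate upload is needed
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": instructions},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# Placeholder bytes stand in for a real file, e.g. open("mockup.png", "rb").read().
msg = mockup_message(b"fake-png-bytes", "Implement this landing page in React with responsive CSS.")
print(msg["content"][1]["image_url"]["url"][:22])  # data:image/png;base64,
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The resulting dict would go in the &lt;code&gt;messages&lt;/code&gt; list of the &lt;code&gt;client.chat.completions.create&lt;/code&gt; call shown in Option 2 above.&lt;/p&gt;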

&lt;h3&gt;
  
  
  Scenario 3: Multi-Agent Orchestration
&lt;/h3&gt;

&lt;p&gt;When you need to coordinate specialized agents in parallel — one scraping data, another writing analysis, a third formatting output — K2.6 acts as the coordinator layer. Its 300-agent / 4,000-step architecture makes it a practical choice for content pipelines, research workflows, or any task where parallel specialization reduces latency compared to sequential single-agent runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 4: Migrating from Claude or GPT-4o Agent Pipelines
&lt;/h3&gt;

&lt;p&gt;If you're running agentic coding workflows on Claude Opus or OpenAI's GPT models and want to cut costs while keeping comparable benchmark performance, K2.6 is a strong open-source candidate. Its SWE-Bench Pro score (58.6%) exceeds both Claude Opus 4.6 (53.4%) and GPT-5.4 (57.7%); GPT-4o was not part of the published comparison. Because the API is OpenAI-compatible, migrating usually means changing only the base URL, API key, and model name.&lt;/p&gt;
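
&lt;p&gt;A minimal sketch of that switch, assuming the pipeline already uses the OpenAI Python SDK: only three values differ between providers. The &lt;code&gt;PROVIDERS&lt;/code&gt; table and the &lt;code&gt;NOVITA_API_KEY&lt;/code&gt; environment-variable name are illustrative choices, not requirements of either service.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

# Hypothetical provider table; the env var names are illustrative.
PROVIDERS = {
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-4o",
        "key_env": "OPENAI_API_KEY",
    },
    "novita": {
        "base_url": "https://api.novita.ai/v3/openai",
        "model": "moonshotai/kimi-k2.6",
        "key_env": "NOVITA_API_KEY",
    },
}


def client_settings(provider, environ=os.environ):
    """Return the values that change when an OpenAI-SDK pipeline switches providers."""
    cfg = PROVIDERS[provider]
    return {
        "base_url": cfg["base_url"],
        "api_key": environ.get(cfg["key_env"], ""),
        "model": cfg["model"],
    }


settings = client_settings("novita", {"NOVITA_API_KEY": "sk-example"})
print(settings["base_url"], settings["model"])
# With the openai package installed, the rest of the pipeline stays the same:
#   client = OpenAI(api_key=settings["api_key"], base_url=settings["base_url"])
#   client.chat.completions.create(model=settings["model"], messages=...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;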

&lt;h2&gt;
  
  
  How Much Does Kimi K2.6 Cost on Novita AI?
&lt;/h2&gt;

&lt;p&gt;Kimi K2.6 on Novita AI is priced as follows &lt;em&gt;(Last verified: 2026-04-21)&lt;/em&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input ($/M tokens)&lt;/th&gt;
&lt;th&gt;Cache Read ($/M tokens)&lt;/th&gt;
&lt;th&gt;Output ($/M tokens)&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kimi K2.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.95&lt;/td&gt;
&lt;td&gt;$0.16&lt;/td&gt;
&lt;td&gt;$4.00&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi K2.5&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For long-horizon agentic runs where cache hit rates are high, the $0.16/M cache-read price makes extended autonomous sessions materially cheaper than the headline input price suggests.&lt;/p&gt;
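
&lt;p&gt;One rough way to see that effect is to price a session with and without cache hits. This sketch uses the Novita list prices from the table and assumes cached input tokens are billed at the cache-read rate instead of (not in addition to) the regular input rate; the token counts and the 80% hit rate are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Novita AI list prices for Kimi K2.6, USD per million tokens (table above, verified 2026-04-21).
PRICE_INPUT = 0.95
PRICE_CACHE_READ = 0.16
PRICE_OUTPUT = 4.00


def session_cost(input_tokens, output_tokens, cache_hit_rate):
    """Estimate USD cost; assumes cache-hit input tokens bill at the cache-read rate only."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    total = uncached * PRICE_INPUT + cached * PRICE_CACHE_READ + output_tokens * PRICE_OUTPUT
    return total / 1_000_000


# Hypothetical agent session: 10M input tokens (mostly re-sent context) and 1M output tokens.
with_cache = session_cost(10_000_000, 1_000_000, cache_hit_rate=0.8)
no_cache = session_cost(10_000_000, 1_000_000, cache_hit_rate=0.0)
print(f"80% cache hits: ${with_cache:.2f}  vs  no caching: ${no_cache:.2f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumptions the cached run costs $7.18 against $13.50 uncached; output tokens ($4.00 of each total) become the dominant cost once most input is cached.&lt;/p&gt;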

&lt;h2&gt;
  
  
  What Are the Technical Specs of Kimi K2.6?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Architecture&lt;/td&gt;
&lt;td&gt;Mixture-of-Experts (MoE)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total Parameters&lt;/td&gt;
&lt;td&gt;1T&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Activated Parameters&lt;/td&gt;
&lt;td&gt;32B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Number of Layers&lt;/td&gt;
&lt;td&gt;61 (incl. 1 dense layer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Number of Experts&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Selected Experts per Token&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Length&lt;/td&gt;
&lt;td&gt;256K tokens (262,144)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Attention Mechanism&lt;/td&gt;
&lt;td&gt;MLA (Multi-head Latent Attention)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vision Encoder&lt;/td&gt;
&lt;td&gt;MoonViT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vocabulary Size&lt;/td&gt;
&lt;td&gt;160K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;Modified MIT&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Full architecture details, weights, and evaluation code are available on the &lt;a href="https://huggingface.co/moonshotai/Kimi-K2.6" rel="noopener noreferrer"&gt;Kimi K2.6 HuggingFace model card&lt;/a&gt;. Benchmark methodology is published on the &lt;a href="https://www.kimi.com/blog/kimi-k2-6.html" rel="noopener noreferrer"&gt;Moonshot AI blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Is Kimi K2.6 the Right Model for Your Agent Pipeline?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Kimi K2.6 is one of the strongest open-source models for long-horizon agentic coding as of April 2026. Its 58.6% on SWE-Bench Pro beats both GPT-5.4 and Claude Opus 4.6 on that benchmark, its MoE design (32B of 1T parameters active per token) keeps per-token inference cost reasonable, and its 256K context supports long agent sessions. That makes it a compelling alternative to Claude or GPT-4o for agent pipeline developers.&lt;/p&gt;

&lt;p&gt;It is not the top reasoning model overall: GPT-5.4 and Gemini 3.1 Pro lead on pure reasoning benchmarks (AIME, HLE without tools). But for developers building coding agents, design-to-code pipelines, or multi-agent orchestration systems, K2.6 is a strong open-source option available on the Novita AI API today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended Reading&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blogs.novita.ai/how-to-access-kimi-k2-5-web-api-claude-code-self-host/" rel="noopener noreferrer"&gt;How to Access Kimi K2.5: Web, API, Claude Code, Self-Host&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blogs.novita.ai/top-10-ai-inference-platforms-2026/" rel="noopener noreferrer"&gt;Top 8 AI Inference Platforms in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blogs.novita.ai/qwen3-coder-vs-deepseek-v3-1-choosing-the-right-llm-for-your-program/" rel="noopener noreferrer"&gt;Qwen3 Coder vs DeepSeek V3.1: Choosing the Right LLM for Your Program&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models?q=kimi" rel="noopener noreferrer"&gt;Try Kimi K2.6 Free →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Kimi K2.6?
&lt;/h3&gt;

&lt;p&gt;Kimi K2.6 is an open-source, native multimodal agentic model from Moonshot AI, released in April 2026. It is a 1-trillion-parameter Mixture-of-Experts model (32B activated) with a 256K context window, built for long-horizon coding, autonomous agent execution, and multi-agent swarm coordination.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I access Kimi K2.6 via API on Novita AI?
&lt;/h3&gt;

&lt;p&gt;Use the OpenAI Python SDK with &lt;code&gt;base_url="https://api.novita.ai/v3/openai"&lt;/code&gt; and model ID &lt;code&gt;moonshotai/kimi-k2.6&lt;/code&gt;. Get your API key at &lt;a href="https://novita.ai/settings" rel="noopener noreferrer"&gt;novita.ai/settings&lt;/a&gt;. No special SDK or wrapper required.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Kimi K2.6 compare to Claude Opus 4.6 for coding tasks?
&lt;/h3&gt;

&lt;p&gt;On SWE-Bench Pro, Kimi K2.6 scores 58.6% vs. Claude Opus 4.6's 53.4% — a 5-point gap on real-world software engineering tasks. K2.6 also beats Claude on DeepSearchQA (92.5% vs. 91.3%) and Terminal-Bench 2.0 (66.7% vs. 65.4%); Gemini 3.1 Pro tops Terminal-Bench at 68.5%. For pure reasoning benchmarks like AIME or HLE without tools, Claude Opus 4.6 holds a slight edge.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the context window for Kimi K2.6?
&lt;/h3&gt;

&lt;p&gt;Kimi K2.6 supports a 256K-token context window (262,144 tokens). On Novita AI, both the context length and max output are set to 262,144 tokens, making it suitable for long-document analysis and sustained multi-turn agentic sessions.&lt;/p&gt;
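
&lt;p&gt;The two figures agree: 256 × 1024 = 262,144. When planning &lt;code&gt;max_tokens&lt;/code&gt; for long sessions, a small helper can clamp the request so the prompt and completion fit. It assumes both share the single 262,144-token window, the usual arrangement for chat models, so check Novita's model page for exact limits; the prompt sizes are given directly rather than measured with a tokenizer.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;CONTEXT_WINDOW = 262_144  # 256K tokens (256 * 1024)


def max_output_budget(prompt_tokens, requested_max_tokens, context_window=CONTEXT_WINDOW):
    """Clamp max_tokens so prompt + completion fit, assuming both share one context window."""
    remaining = max(0, context_window - prompt_tokens)
    return min(requested_max_tokens, remaining)


print(256 * 1024)                           # 262144
print(max_output_budget(200_000, 100_000))  # 62144: only 62,144 tokens of room remain
print(max_output_budget(10_000, 8_192))     # 8192: the request already fits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;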

&lt;h3&gt;
  
  
  What is the pricing for Kimi K2.6 on Novita AI?
&lt;/h3&gt;

&lt;p&gt;On Novita AI, Kimi K2.6 is priced at &lt;strong&gt;$0.95 per million input tokens&lt;/strong&gt;, &lt;strong&gt;$0.16 per million cache-read tokens&lt;/strong&gt;, and &lt;strong&gt;$4.00 per million output tokens&lt;/strong&gt;. The 256K context window and max output are both included. &lt;a href="https://novita.ai/models/model-detail/moonshotai-kimi-k2.6" rel="noopener noreferrer"&gt;View current pricing on Novita AI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Novita AI is an AI &amp;amp; Agent Cloud for developers — offering 200+ models via serverless API alongside Agent Sandbox infrastructure and GPU Cloud. Build, scale, and deploy AI applications without managing infrastructure. &lt;a href="https://novita.ai" rel="noopener noreferrer"&gt;Get started at novita.ai&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>DeepSeek-V4-Flash on Novita AI: Fast Reasoning at Lower Cost</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Wed, 29 Apr 2026 14:58:49 +0000</pubDate>
      <link>https://dev.to/novita_ai/deepseek-v4-flash-on-novita-ai-fast-reasoning-at-lower-cost-154l</link>
      <guid>https://dev.to/novita_ai/deepseek-v4-flash-on-novita-ai-fast-reasoning-at-lower-cost-154l</guid>
      <description>&lt;h1&gt;
  
  
  DeepSeek-V4-Flash on Novita AI: 1M Context at $0.14/M Input Tokens
&lt;/h1&gt;

&lt;p&gt;Many open-source models with reasoning capabilities force a trade-off: small context windows, slow throughput, or prices that climb above $1/M tokens the moment you enable extended thinking. DeepSeek-V4-Flash is designed to avoid that trade-off, with 284B parameters (only 13B activated per token), a native 1,048,576-token context window, and three selectable reasoning modes. At $0.14/M input tokens, it sits in a price tier where reasoning-capable models are rare.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: DeepSeek-V4-Flash is now available via Novita AI. 284B MoE model, 1M token context, selectable reasoning modes. $0.14/M input. OpenAI-compatible API.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;In short:&lt;/strong&gt; DeepSeek-V4-Flash is an MoE model from DeepSeek AI that brings 1M-token context and adjustable reasoning depth to developers who need throughput without the closed-model price premium. It is now available through the Novita AI API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models-console/model-detail/deepseek/deepseek-v4-flash" rel="noopener noreferrer"&gt;Try DeepSeek-V4-Flash on Novita AI →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is DeepSeek-V4-Flash?
&lt;/h2&gt;

&lt;p&gt;DeepSeek-V4-Flash is a Mixture-of-Experts (MoE) language model from &lt;a href="https://huggingface.co/deepseek-ai" rel="noopener noreferrer"&gt;DeepSeek AI&lt;/a&gt;, released as part of the DeepSeek-V4 series alongside the larger DeepSeek-V4-Pro. The model has 284B total parameters with 13B activated at inference — keeping per-token compute cost low while retaining the parameter capacity of a much larger model.&lt;/p&gt;

&lt;p&gt;Key capabilities at a glance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;284B total / 13B activated parameters&lt;/strong&gt; — MoE architecture, low inference cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1,048,576-token context window&lt;/strong&gt; (1M tokens) — enabled by Hybrid Attention Architecture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three reasoning modes:&lt;/strong&gt; Non-think (fast), Think (step-by-step), Think Max (maximum reasoning budget)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function calling support&lt;/strong&gt; — tool use, structured outputs, JSON mode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trained on 32T+ tokens&lt;/strong&gt; with multi-stage post-training (SFT, RL with GRPO, on-policy distillation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MIT License&lt;/strong&gt; — weights available for download on &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash" rel="noopener noreferrer"&gt;HuggingFace&lt;/a&gt;; commercial use permitted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FP4 + FP8 mixed precision&lt;/strong&gt; — MoE expert weights in FP4, remaining layers in FP8&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Features: Why DeepSeek-V4-Flash Stands Out
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Selectable Reasoning Depth Without Switching Models
&lt;/h3&gt;

&lt;p&gt;Many models lock you into a single inference mode, either reasoning on or reasoning off. DeepSeek-V4-Flash offers three distinct operating modes on the same API endpoint:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Characteristics&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Non-think&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast, no chain-of-thought&lt;/td&gt;
&lt;td&gt;High-volume tasks, chat, summarization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Think&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Step-by-step reasoning, balanced&lt;/td&gt;
&lt;td&gt;Complex Q&amp;amp;A, code generation, analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Think Max&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Maximum reasoning budget&lt;/td&gt;
&lt;td&gt;Math competitions, hard coding tasks, benchmarks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The gap between modes is significant: on GPQA Diamond, V4-Flash Non-think scores 71.2 vs Think at 87.4 and Think Max at 88.1. On LiveCodeBench, Think Max reaches 91.6 vs Non-think's 55.2. You choose cost vs quality per request — no infrastructure change required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hybrid Attention Architecture for 1M-Token Context
&lt;/h3&gt;

&lt;p&gt;Native million-token context is harder than it sounds. DeepSeek-V4-Flash achieves it through a purpose-built Hybrid Attention Architecture that combines two mechanisms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compressed Sparse Attention (CSA)&lt;/strong&gt; — dramatically reduces the attention compute budget for long sequences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heavily Compressed Attention (HCA)&lt;/strong&gt; — compresses KV cache footprint for 1M-context inference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: inference over 1M-token inputs with manageable FLOP and memory cost. For workloads like codebase analysis, legal document review, or long-session agents, this architecture makes the difference between feasible and prohibitive.&lt;/p&gt;

&lt;h3&gt;
  
  
  MoE Efficiency: 13B Activated at 284B Scale
&lt;/h3&gt;

&lt;p&gt;The 284B/13B split is where the cost efficiency comes from. Only 13B parameters are active per forward pass, so per-token compute is roughly that of a 13B dense model, while the full 284B parameter pool provides knowledge capacity closer to a much larger dense network. The trade-off is memory: all 284B parameters still have to be loaded to serve the model. FP4 + FP8 mixed precision shrinks that footprint and reduces memory-bandwidth pressure on the expert weights.&lt;/p&gt;
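&lt;p&gt;A back-of-the-envelope sketch of what the 284B/13B split means in practice. It assumes the common rule of thumb of roughly 2 FLOPs per active parameter per generated token, and it only bounds raw weight memory (everything in FP4 vs everything in FP8), because the exact FP4/FP8 split across layers is not given here. KV cache, activations, and framework overhead are ignored.&lt;/p&gt;

```python
TOTAL_PARAMS = 284e9   # total parameters, per the figures above
ACTIVE_PARAMS = 13e9   # parameters activated per token

def active_fraction(total: float, active: float) -> float:
    """Share of the parameter pool used on each forward pass."""
    return active / total

def flops_per_token(active: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2 * active

def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Raw weight storage in GB (1 GB = 1e9 bytes), no overhead included."""
    return params * bits_per_param / 8 / 1e9

if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    print(f"~{flops_per_token(ACTIVE_PARAMS) / 1e9:.0f} GFLOPs per token")
    lower = weight_memory_gb(TOTAL_PARAMS, 4)   # if every weight were FP4
    upper = weight_memory_gb(TOTAL_PARAMS, 8)   # if every weight were FP8
    print(f"weights: between {lower:.0f} GB and {upper:.0f} GB")
```

&lt;p&gt;Under these assumptions, each token touches about 4.6% of the parameters (roughly 26 GFLOPs), but the weights alone occupy somewhere between 142 GB and 284 GB, which is why MoE models are cheap per token yet still need substantial accelerator memory to serve.&lt;/p&gt;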

&lt;h3&gt;
  
  
  Strong Post-Training Pipeline
&lt;/h3&gt;

&lt;p&gt;DeepSeek-V4-Flash follows a two-stage post-training process: first, domain-specific expert cultivation via SFT and reinforcement learning with GRPO; then, unified model consolidation through on-policy distillation. This produces a single model with differentiated capability profiles across coding, reasoning, and general knowledge — not a generic instruction-follower.&lt;/p&gt;
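&lt;p&gt;For readers new to GRPO (Group Relative Policy Optimization, introduced in DeepSeek's earlier DeepSeekMath work): instead of training a separate value model, it samples a group of answers for each prompt and scores every answer relative to the rest of its group. The sketch below shows only that group-relative advantage step in its commonly described form (population standard deviation here; implementations differ). It is not DeepSeek-V4's training code, and the rewards are made-up illustration values.&lt;/p&gt;

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward by its group's mean and std.

    An answer that beats its siblings gets a positive advantage;
    one that underperforms gets a negative advantage.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + eps) for r in rewards]

if __name__ == "__main__":
    # Hypothetical rewards for 4 sampled answers to one prompt (1 = correct, 0 = wrong).
    rewards = [1.0, 0.0, 0.0, 1.0]
    print([round(a, 3) for a in group_relative_advantages(rewards)])
```

&lt;p&gt;If every answer in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal, which is one reason group size and prompt difficulty matter in this style of RL.&lt;/p&gt;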

&lt;h2&gt;
  
  
  Benchmark Performance
&lt;/h2&gt;

&lt;p&gt;The benchmark story for DeepSeek-V4-Flash is about reasoning mode selection. In Non-think mode, it behaves like an efficient 13B-activated model. Switch to Think Max and it moves into a different tier: LiveCodeBench rises from 55.2 to 91.6, and HMMT 2026 Feb from 40.8 to 94.8.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.novita.ai%2Fwp-content%2Fuploads%2F2026%2F04%2Fdsv4_performance-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.novita.ai%2Fwp-content%2Fuploads%2F2026%2F04%2Fdsv4_performance-1.png" alt="DeepSeek-V4-Flash benchmark comparison chart showing performance across reasoning modes"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;DeepSeek-V4-Flash performance across modes vs frontier models [Source: &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash" rel="noopener noreferrer"&gt;DeepSeek AI / HuggingFace&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Across Reasoning Modes
&lt;/h3&gt;

&lt;p&gt;Below are V4-Flash's scores across key benchmarks, comparing all three operating modes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;V4-Flash Non-Think&lt;/th&gt;
&lt;th&gt;V4-Flash Think&lt;/th&gt;
&lt;th&gt;V4-Flash Think Max&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LiveCodeBench (Pass@1)&lt;/td&gt;
&lt;td&gt;55.2&lt;/td&gt;
&lt;td&gt;88.4&lt;/td&gt;
&lt;td&gt;91.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPQA Diamond (Pass@1)&lt;/td&gt;
&lt;td&gt;71.2&lt;/td&gt;
&lt;td&gt;87.4&lt;/td&gt;
&lt;td&gt;88.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HMMT 2026 Feb (Pass@1)&lt;/td&gt;
&lt;td&gt;40.8&lt;/td&gt;
&lt;td&gt;91.9&lt;/td&gt;
&lt;td&gt;94.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IMOAnswerBench (Pass@1)&lt;/td&gt;
&lt;td&gt;41.9&lt;/td&gt;
&lt;td&gt;85.1&lt;/td&gt;
&lt;td&gt;88.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codeforces Rating&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;2816&lt;/td&gt;
&lt;td&gt;3052&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE Verified (Resolved)&lt;/td&gt;
&lt;td&gt;73.7&lt;/td&gt;
&lt;td&gt;78.6&lt;/td&gt;
&lt;td&gt;79.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MRCR 1M (MMR)&lt;/td&gt;
&lt;td&gt;37.5&lt;/td&gt;
&lt;td&gt;76.9&lt;/td&gt;
&lt;td&gt;78.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCPAtlas (Pass@1)&lt;/td&gt;
&lt;td&gt;64.0&lt;/td&gt;
&lt;td&gt;67.4&lt;/td&gt;
&lt;td&gt;69.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MMLU-Pro (EM)&lt;/td&gt;
&lt;td&gt;83.0&lt;/td&gt;
&lt;td&gt;86.4&lt;/td&gt;
&lt;td&gt;86.2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Last verified: 2026-04-27. Source: &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash" rel="noopener noreferrer"&gt;DeepSeek-V4 technical report and HuggingFace model card&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
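&lt;p&gt;The gain from switching modes is far from uniform. This small script uses only the numbers from the table above and computes how many points Think Max adds over Non-think on each benchmark (Codeforces is skipped because no Non-think rating is reported).&lt;/p&gt;

```python
# Scores copied from the table above: (Non-think, Think, Think Max).
SCORES = {
    "LiveCodeBench": (55.2, 88.4, 91.6),
    "GPQA Diamond": (71.2, 87.4, 88.1),
    "HMMT 2026 Feb": (40.8, 91.9, 94.8),
    "IMOAnswerBench": (41.9, 85.1, 88.4),
    "SWE Verified": (73.7, 78.6, 79.0),
    "MRCR 1M": (37.5, 76.9, 78.7),
    "MCPAtlas": (64.0, 67.4, 69.0),
    "MMLU-Pro": (83.0, 86.4, 86.2),
}

def think_max_gain(scores: dict) -> list[tuple[str, float]]:
    """Points gained by Think Max over Non-think, largest gain first."""
    gains = [(name, round(t_max - non_think, 1))
             for name, (non_think, _think, t_max) in scores.items()]
    return sorted(gains, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for name, gain in think_max_gain(SCORES):
        print(f"{name:15s} +{gain}")
```

&lt;p&gt;Math, competitive coding, and long-context retrieval gain 36 to 54 points, while SWE Verified, MCPAtlas, and MMLU-Pro gain only about 3 to 5. On this data, Non-think may be sufficient for knowledge-style and tool-use requests, while reasoning-heavy requests benefit most from Think or Think Max.&lt;/p&gt;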

&lt;h3&gt;
  
  
  How V4-Flash Compares to Competitors
&lt;/h3&gt;

&lt;p&gt;V4-Flash Think Max (79.0 SWE Verified, 91.6 LiveCodeBench) competes with models running at much higher per-token cost. It doesn't top any row in the table below, and V4-Pro Max scores higher on every benchmark listed, but for developers weighing cost per task rather than raw peak performance, the trade-off is favorable:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;V4-Flash Max&lt;/th&gt;
&lt;th&gt;V4-Pro Max&lt;/th&gt;
&lt;th&gt;Claude Opus 4.6 Max&lt;/th&gt;
&lt;th&gt;Gemini 3.1 Pro High&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LiveCodeBench (Pass@1)&lt;/td&gt;
&lt;td&gt;91.6&lt;/td&gt;
&lt;td&gt;93.5&lt;/td&gt;
&lt;td&gt;88.8&lt;/td&gt;
&lt;td&gt;91.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPQA Diamond (Pass@1)&lt;/td&gt;
&lt;td&gt;88.1&lt;/td&gt;
&lt;td&gt;90.1&lt;/td&gt;
&lt;td&gt;91.3&lt;/td&gt;
&lt;td&gt;94.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE Verified (Resolved)&lt;/td&gt;
&lt;td&gt;79.0&lt;/td&gt;
&lt;td&gt;80.6&lt;/td&gt;
&lt;td&gt;80.8&lt;/td&gt;
&lt;td&gt;80.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HMMT 2026 Feb (Pass@1)&lt;/td&gt;
&lt;td&gt;94.8&lt;/td&gt;
&lt;td&gt;95.2&lt;/td&gt;
&lt;td&gt;96.2&lt;/td&gt;
&lt;td&gt;94.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MRCR 1M (MMR)&lt;/td&gt;
&lt;td&gt;78.7&lt;/td&gt;
&lt;td&gt;83.5&lt;/td&gt;
&lt;td&gt;92.9&lt;/td&gt;
&lt;td&gt;76.3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Last verified: 2026-04-27. Claude Opus 4.6 Max and Gemini 3.1 Pro High figures sourced from the &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash" rel="noopener noreferrer"&gt;DeepSeek-V4 technical report&lt;/a&gt; (V4-Pro frontier comparison table). These scores were not measured head-to-head against V4-Flash in that report.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Notably, V4-Flash Think Max beats Gemini 3.1 Pro High on MRCR 1M (78.7 vs 76.3), the long-context retrieval benchmark that maps most directly to 1M-context use cases, although it still trails V4-Pro Max (83.5) and Claude Opus 4.6 Max (92.9) there. On SWE Verified, all four models cluster between 79.0 and 80.8, making V4-Flash competitive for real-world coding agents at a fraction of the closed-model price.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use DeepSeek-V4-Flash via Novita AI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Playground (No Code)
&lt;/h3&gt;

&lt;p&gt;Test the model directly in your browser at the &lt;a href="https://novita.ai/models-console/model-detail/deepseek/deepseek-v4-flash" rel="noopener noreferrer"&gt;Novita AI model console&lt;/a&gt;. No API key required to start — switch between Non-think, Think, and Think Max modes via the chat interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: API (Python)
&lt;/h3&gt;

&lt;p&gt;Novita AI serves DeepSeek-V4-Flash through an OpenAI-compatible API, so the standard &lt;code&gt;openai&lt;/code&gt; Python SDK works unchanged. Use the model ID &lt;code&gt;deepseek/deepseek-v4-flash&lt;/code&gt; with the Novita base URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.novita.ai/v3/openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_NOVITA_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek/deepseek-v4-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your prompt here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To enable Think or Think Max mode, pass a &lt;code&gt;reasoning&lt;/code&gt; object in the request body through the SDK's &lt;code&gt;extra_body&lt;/code&gt; argument:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.novita.ai/v3/openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_NOVITA_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Think Max mode — maximum reasoning budget
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek/deepseek-v4-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Solve: x^4 - 5x^2 + 4 = 0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;  &lt;span class="c1"&gt;# "low" = Think, "high" = Think Max
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
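&lt;p&gt;If your application switches modes per request, it helps to keep the mode mapping in one place. This is a minimal sketch that only builds the keyword arguments for &lt;code&gt;client.chat.completions.create&lt;/code&gt;. It assumes the &lt;code&gt;reasoning.effort&lt;/code&gt; values described in this article ("low" for Think, "high" for Think Max, parameter omitted for Non-think); the helper name &lt;code&gt;build_request&lt;/code&gt; and the mode keys are hypothetical, so confirm the parameter names against Novita's API reference.&lt;/p&gt;

```python
from typing import Any

MODEL_ID = "deepseek/deepseek-v4-flash"

# Assumed mapping from this article's mode names to the reasoning effort value.
# Non-think sends no reasoning parameter at all.
EFFORT_BY_MODE = {"non-think": None, "think": "low", "think-max": "high"}

def build_request(prompt: str, mode: str = "non-think") -> dict[str, Any]:
    """Return kwargs for client.chat.completions.create(**kwargs)."""
    if mode not in EFFORT_BY_MODE:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(EFFORT_BY_MODE)}")
    kwargs: dict[str, Any] = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    effort = EFFORT_BY_MODE[mode]
    if effort is not None:
        # The OpenAI SDK merges extra_body into the JSON request body.
        kwargs["extra_body"] = {"reasoning": {"effort": effort}}
    return kwargs

if __name__ == "__main__":
    print(build_request("Summarize this support ticket", mode="non-think"))
    print(build_request("Solve: x^4 - 5x^2 + 4 = 0", mode="think-max"))
```

&lt;p&gt;Usage: &lt;code&gt;client.chat.completions.create(**build_request(prompt, mode="think"))&lt;/code&gt;, with &lt;code&gt;client&lt;/code&gt; configured as in the examples above. Rejecting unknown mode names early avoids silently falling back to Non-think because of a typo.&lt;/p&gt;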



&lt;p&gt;Get your API key at &lt;a href="https://novita.ai/settings" rel="noopener noreferrer"&gt;novita.ai/settings&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 3: Third-Party Tools
&lt;/h3&gt;

&lt;p&gt;Because Novita AI exposes an OpenAI-compatible endpoint, DeepSeek-V4-Flash works out of the box with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
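&lt;strong&gt;LangChain / LlamaIndex&lt;/strong&gt;: in LangChain, use &lt;code&gt;ChatOpenAI&lt;/code&gt; (from &lt;code&gt;langchain-openai&lt;/code&gt;) with &lt;code&gt;base_url="https://api.novita.ai/v3/openai"&lt;/code&gt;; in LlamaIndex, use the &lt;code&gt;OpenAILike&lt;/code&gt; LLM class with the same URL as &lt;code&gt;api_base&lt;/code&gt;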
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenWebUI&lt;/strong&gt; — add as a custom OpenAI-compatible endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continue.dev / Cursor&lt;/strong&gt; — configure as a custom model with the Novita base URL&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;DeepSeek-V4-Flash is priced identically across the major providers below. All figures are per million tokens, as of 2026-04-27:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Input ($/M)&lt;/th&gt;
&lt;th&gt;Output ($/M)&lt;/th&gt;
&lt;th&gt;Cache Read ($/M)&lt;/th&gt;
&lt;th&gt;Max Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Novita AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;$0.28&lt;/td&gt;
&lt;td&gt;$0.028&lt;/td&gt;
&lt;td&gt;1,048,576 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek Official&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;$0.28&lt;/td&gt;
&lt;td&gt;$0.028&lt;/td&gt;
&lt;td&gt;131,072 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SiliconFlow&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;$0.28&lt;/td&gt;
&lt;td&gt;$0.028&lt;/td&gt;
&lt;td&gt;65,536 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepInfra&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;$0.28&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;16,384 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The per-token rates match across these providers, but maximum context varies by a factor of 64: Novita AI offers the full 1M-token window, DeepSeek's official API 131,072 tokens, SiliconFlow 65,536, and DeepInfra 16,384. If your workload involves long documents, large codebases, or long-running multi-turn agents, Novita is the practical choice among these four.&lt;/p&gt;
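&lt;p&gt;Because price is identical, the practical filter is context length. A minimal sketch using the max-context figures from the table above (as of 2026-04-27; provider limits change, so re-check before relying on them): given an estimated prompt size plus the completion budget you plan to request, it lists the providers whose window can hold both. Treating prompt and completion as sharing one window is an assumption about how these limits are enforced.&lt;/p&gt;

```python
# Max context per provider, copied from the pricing table above.
MAX_CONTEXT = {
    "Novita AI": 1_048_576,
    "DeepSeek Official": 131_072,
    "SiliconFlow": 65_536,
    "DeepInfra": 16_384,
}

def providers_that_fit(prompt_tokens: int, completion_tokens: int) -> list[str]:
    """Providers whose window can hold prompt + completion (assumed to share one window)."""
    needed = prompt_tokens + completion_tokens
    return [name for name, limit in MAX_CONTEXT.items() if limit >= needed]

if __name__ == "__main__":
    print(providers_that_fit(12_000, 2_000))    # short chat: every provider fits
    print(providers_that_fit(200_000, 8_000))   # large repository dump: only one fits
```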

&lt;h2&gt;
  
  
  Recommended Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Autonomous Coding Agents
&lt;/h3&gt;

&lt;p&gt;V4-Flash's 1M context window means an agent can load a large codebase into context without chunking, as long as it fits within roughly 1M tokens. Combined with 79.0 on SWE Verified in Think Max mode, that makes it a strong candidate for multi-file refactors and debugging sessions where the agent needs the full code in view across turns.&lt;/p&gt;
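&lt;p&gt;Whether a given repository actually fits is worth checking before building on that assumption. The sketch below is a rough estimate only: it assumes about 4 characters per token, a common heuristic for English text and code, rather than DeepSeek's real tokenizer, and the file-extension filter and 32,000-token output reserve are illustrative choices.&lt;/p&gt;

```python
from pathlib import Path

CONTEXT_WINDOW = 1_048_576   # V4-Flash context window on Novita, per this article
CHARS_PER_TOKEN = 4          # rough heuristic, NOT DeepSeek's real tokenizer
SOURCE_SUFFIXES = {".py", ".js", ".ts", ".go", ".rs", ".java", ".md"}  # illustrative filter

def estimate_repo_tokens(root: str) -> int:
    """Approximate token count of matching source files under root."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            total_chars += len(path.read_text(encoding="utf-8", errors="ignore"))
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(root: str, reserve_for_output: int = 32_000) -> bool:
    """True if the estimate leaves reserve_for_output tokens free for the reply."""
    return CONTEXT_WINDOW - reserve_for_output >= estimate_repo_tokens(root)
```

&lt;p&gt;Call it as &lt;code&gt;fits_in_context("path/to/repo")&lt;/code&gt;. Treat a borderline result as "does not fit": a precise count requires the model's actual tokenizer, and code with many short symbols can tokenize well below 4 characters per token.&lt;/p&gt;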

&lt;h3&gt;
  
  
  Long-Document QA and RAG
&lt;/h3&gt;

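&lt;p&gt;MRCR 1M (Multi-Round Coreference Resolution) measures how accurately a model can locate and reproduce a specific item from a long, multi-turn conversation spread across a genuine 1M-token window; V4-Flash Think Max scores 78.7. For indexing legal documents, academic papers, or long technical specs, that makes V4-Flash a viable option at lengths where many models degrade. Note that Non-think mode scores only 37.5 on the same benchmark, so long-document retrieval is a case for enabling Think.&lt;/p&gt;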

&lt;h3&gt;
  
  
  Math and Science Reasoning
&lt;/h3&gt;

&lt;p&gt;V4-Flash scores 94.8 on HMMT 2026 February (competition math) in Think Max mode and 91.9 in Think mode. Because the mode is chosen per request, you can trade cost for accuracy: use Think for standard problems and reserve Think Max for the hardest ones, instead of paying for the maximum reasoning budget on every call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Production APIs with Caching
&lt;/h3&gt;

&lt;p&gt;At $0.028/M, cache reads cost one-fifth of the $0.14/M input rate, so repeated system prompts and tool schemas become a small share of the bill at scale. Chatbot products and API wrappers that re-inject the same context on every call benefit most from cache-read pricing.&lt;/p&gt;
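&lt;p&gt;To see what cache-read pricing is worth for your own traffic, here is a minimal cost sketch using the Novita rates listed above ($0.14/M input, $0.28/M output, $0.028/M cache read). It assumes cached prompt tokens are billed only at the cache-read rate and the remaining prompt tokens at the normal input rate; the request sizes in the example are hypothetical.&lt;/p&gt;

```python
# Novita AI rates for DeepSeek-V4-Flash, USD per million tokens (as of 2026-04-27).
INPUT_PER_M = 0.14
OUTPUT_PER_M = 0.28
CACHE_READ_PER_M = 0.028

def request_cost(prompt_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """USD cost of one request; cached_tokens is the part of the prompt served from cache."""
    if cached_tokens > prompt_tokens:
        raise ValueError("cached_tokens cannot exceed prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    return (uncached * INPUT_PER_M
            + cached_tokens * CACHE_READ_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

if __name__ == "__main__":
    # Hypothetical chatbot call: 10k-token prompt, 8k of it a repeated system prompt + tool schemas.
    no_cache = request_cost(10_000, 0, 1_000)
    with_cache = request_cost(10_000, 8_000, 1_000)
    print(f"${no_cache:.6f} vs ${with_cache:.6f} per request")
    print(f"1M requests: ${no_cache * 1e6:,.0f} vs ${with_cache * 1e6:,.0f}")
```

&lt;p&gt;In this example, caching cuts the prompt portion from $0.0014 to $0.000504 per request and the total from $0.00168 to $0.000784, about 53% lower. The output cost is unchanged, so the savings shrink for output-heavy workloads.&lt;/p&gt;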

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is DeepSeek-V4-Flash?
&lt;/h3&gt;

&lt;p&gt;DeepSeek-V4-Flash is a 284B-parameter Mixture-of-Experts language model developed by DeepSeek AI, released on 2026-04-23. It activates only 13B parameters per forward pass, making it significantly faster and cheaper than dense models of comparable capability. It supports a 1,048,576-token context window and three reasoning modes: Non-think (fast), Think (step-by-step), and Think Max (maximum reasoning budget).&lt;/p&gt;

&lt;h3&gt;
  
  
  How is DeepSeek-V4-Flash different from DeepSeek-V4-Pro?
&lt;/h3&gt;

&lt;p&gt;V4-Flash is the lighter, faster variant optimized for speed and cost. V4-Pro is the flagship model with higher peak benchmark scores (e.g., 93.5 vs 91.6 on LiveCodeBench, each in its maximum-reasoning mode). DeepSeek states that V4-Flash "achieves comparable reasoning performance to the Pro version when given a larger thinking budget"; in the comparison above, V4-Flash Think Max trails V4-Pro Max by 0.4 to 2 points on most benchmarks (the gap widens to 4.8 points on MRCR 1M), at lower per-token cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  What does "Flash" mean in the model name?
&lt;/h3&gt;

&lt;p&gt;The name signals a speed-optimized variant, similar to how Google uses "Flash" for its lighter Gemini models. DeepSeek-V4-Flash prioritizes lower latency and cost over raw maximum accuracy, with the thinking modes available when you need to close the performance gap.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Novita AI serve DeepSeek-V4-Flash with the full 1M context window?
&lt;/h3&gt;

&lt;p&gt;Yes. Novita AI exposes the full 1,048,576-token context window, the largest among the providers in the pricing table above (DeepSeek's official API offers 131,072 tokens and DeepInfra 16,384). Max completion tokens on Novita is 393,216.&lt;/p&gt;
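&lt;p&gt;With a 1,048,576-token window and a 393,216-token completion cap, a very long prompt leaves less room for output. A small sketch for choosing &lt;code&gt;max_tokens&lt;/code&gt;, assuming prompt and completion share the context window; the helper name &lt;code&gt;completion_budget&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
CONTEXT_WINDOW = 1_048_576   # Novita context window for V4-Flash
MAX_COMPLETION = 393_216     # Novita completion cap for V4-Flash

def completion_budget(prompt_tokens: int, desired: int = MAX_COMPLETION) -> int:
    """Largest max_tokens value that respects both limits (0 if the prompt alone is too long)."""
    room_left = max(CONTEXT_WINDOW - prompt_tokens, 0)
    return min(desired, MAX_COMPLETION, room_left)

if __name__ == "__main__":
    print(completion_budget(50_000))    # short prompt: capped by the completion limit
    print(completion_budget(900_000))   # long prompt: capped by the remaining context
```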

&lt;h3&gt;
  
  
  How do I switch reasoning modes via the API?
&lt;/h3&gt;

&lt;p&gt;Pass &lt;code&gt;extra_body={"reasoning": {"effort": "low"}}&lt;/code&gt; for Think mode, or &lt;code&gt;"effort": "high"&lt;/code&gt; for Think Max. Omit the parameter entirely for Non-think (fast) mode. Because the API is OpenAI-compatible, the standard SDK works without changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does DeepSeek-V4-Flash cost on Novita AI?
&lt;/h3&gt;

&lt;p&gt;As of 2026-04-27: $0.14/M input tokens, $0.28/M output tokens, $0.028/M cache read tokens. This matches DeepSeek's official pricing and is consistent across providers — the differentiator on Novita is the full 1M context window and reliable uptime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is DeepSeek-V4-Flash open source?
&lt;/h3&gt;

&lt;p&gt;Yes. The model weights are available on &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash" rel="noopener noreferrer"&gt;HuggingFace&lt;/a&gt; under the &lt;strong&gt;MIT License&lt;/strong&gt; — confirmed in the official DeepSeek-V4 repository. Self-hosting and commercial use are permitted under MIT terms. Using it via Novita AI's API requires no self-hosting at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  Start Using DeepSeek-V4-Flash Today
&lt;/h2&gt;

&lt;p&gt;DeepSeek-V4-Flash is now available via Novita AI with the full 1M context window, competitive pricing, and zero infrastructure overhead. You pick the reasoning mode; Novita handles the rest.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://novita.ai/models/model-detail/deepseek/deepseek-v4-flash" rel="noopener noreferrer"&gt;&lt;strong&gt;Try DeepSeek-V4-Flash backed by Novita AI&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://novita.ai/docs/api-reference/llm-openai-compatible" rel="noopener noreferrer"&gt;&lt;strong&gt;Novita AI LLM API documentation&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Recommended Articles
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blogs.novita.ai/choose-inference-provider-for-ai-agents/" rel="noopener noreferrer"&gt;Which Inference Provider Is Right for AI Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blogs.novita.ai/inference-api-providers-for-open-source-models/" rel="noopener noreferrer"&gt;Top Inference API Providers for Open-Source Models in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blogs.novita.ai/ling-2-6-1t-novita-ai/" rel="noopener noreferrer"&gt;Ling-2.6-1T: The 1T Model That Skips the Reasoning Tax&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>DeepSeek-V4-Pro on Novita AI: 1M Context, #1 LiveCodeBench Score</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Wed, 29 Apr 2026 14:58:41 +0000</pubDate>
      <link>https://dev.to/novita_ai/deepseek-v4-pro-on-novita-ai-1m-context-1-livecodebench-score-3i9g</link>
      <guid>https://dev.to/novita_ai/deepseek-v4-pro-on-novita-ai-1m-context-1-livecodebench-score-3i9g</guid>
      <description>&lt;h1&gt;
  
  
  DeepSeek-V4-Pro: 1M Context, #1 on LiveCodeBench, Open-Source Frontier
&lt;/h1&gt;

&lt;p&gt;You're evaluating open-source models for a production coding agent. You need something that handles large codebases—entire repos, not just single files—and actually resolves GitHub issues without hallucinating tool calls. Every model you try either falls apart beyond 128K tokens or lags behind GPT-4o on the benchmarks that matter for real engineering tasks.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: DeepSeek-V4-Pro is a 1.6T-parameter open-source MoE model with a 1M-token context window and the top LiveCodeBench score (93.5) in DeepSeek's published comparison. Available now via Novita AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;DeepSeek-V4-Pro changes this calculus. It's a 1.6-trillion-parameter MoE model with a true 1M-token context window, and in DeepSeek's published comparison it posts the highest LiveCodeBench score (93.5 Pass@1) and the highest Codeforces Rating (3206) of any evaluated model, closed frontier APIs included. DeepSeek positions it as the best open-source model available today for coding and knowledge tasks, and it is released under the MIT license. It is available now via Novita AI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models/model-detail/deepseek-deepseek-v4-pro" rel="noopener noreferrer"&gt;Try DeepSeek-V4-Pro Now →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is DeepSeek-V4-Pro?
&lt;/h2&gt;

&lt;p&gt;DeepSeek-V4-Pro is the flagship model in DeepSeek's V4 series, released April 24, 2026. It sits above the lightweight &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash" rel="noopener noreferrer"&gt;DeepSeek-V4-Flash&lt;/a&gt; (284B total / 13B active) and is positioned as a preview of DeepSeek's current frontier capabilities—what they describe as the "best open-source model available today" for knowledge and coding. The model is trained on over 32 trillion tokens and fine-tuned through a two-stage pipeline: domain-expert SFT + GRPO reinforcement learning, followed by on-policy distillation. The full technical details are in DeepSeek's paper &lt;em&gt;&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro" rel="noopener noreferrer"&gt;DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Key specs at a glance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Architecture:&lt;/strong&gt; Mixture-of-Experts (MoE) with Hybrid Attention — Compressed Sparse Attention (CSA) + Heavily Compressed Attention (HCA)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parameters:&lt;/strong&gt; 1.6T total / 49B activated per forward pass&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context window:&lt;/strong&gt; 1,048,576 tokens (1M)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precision:&lt;/strong&gt; FP4 (MoE experts) + FP8 mixed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning modes:&lt;/strong&gt; Non-think (fast), Think (standard CoT), Max (maximum reasoning budget)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capabilities:&lt;/strong&gt; Function calling, structured outputs, reasoning, 1M-context retrieval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Hybrid Attention for Efficient 1M-Token Context
&lt;/h3&gt;

&lt;p&gt;Most models claiming "long context" either truncate silently or degrade sharply beyond 128K tokens. DeepSeek-V4-Pro's Hybrid Attention Architecture—combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) alongside Manifold-Constrained Hyper-Connections (mHC)—is designed from the ground up for efficient million-token processing. In practice: MRCR 1M scores 83.5 (memory recall across 1M context) and CorpusQA 1M hits 62.0, both while maintaining coherent reasoning over the full window. For agents that need to ingest an entire codebase, a day's worth of logs, or a book-length document in a single call, this is the architecture that makes it viable without specialized infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  #1 on LiveCodeBench and Codeforces — The Coding Model That Actually Competes
&lt;/h3&gt;

&lt;p&gt;DeepSeek-V4-Pro scores &lt;strong&gt;93.5 on LiveCodeBench&lt;/strong&gt; (Pass@1) and &lt;strong&gt;3206 on Codeforces Rating&lt;/strong&gt;, the highest figures for both in DeepSeek's comparison table, ahead of Claude Opus 4.6 Max (88.8 / no rating reported), Gemini 3.1 Pro High (91.7 / 3052), and GPT-5.4 xHigh (no LiveCodeBench score reported / 3168). On SWE-Verified (real-world GitHub issue resolution), it scores 80.6, on par with Claude Opus 4.6 Max (80.8) and Gemini 3.1 Pro High (80.6). For teams building coding agents where "can it actually fix the bug" matters more than knowledge benchmarks like MMLU, V4-Pro is an open-source option that competes directly with closed frontier APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Reasoning Modes — Match Compute to the Task
&lt;/h3&gt;

&lt;p&gt;DeepSeek-V4-Pro exposes three inference modes through the same API endpoint:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Non-think:&lt;/strong&gt; No chain-of-thought. Fast, low latency—suitable for classification, extraction, structured output tasks where reasoning overhead is wasteful.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Think:&lt;/strong&gt; Standard CoT reasoning. The default for coding, math, and multi-step tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Max (V4-Pro Max):&lt;/strong&gt; Extended reasoning budget. Use when accuracy matters more than speed—complex proofs, hard competitive programming problems, deep debugging sessions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three modes are accessible via the &lt;code&gt;deepseek/deepseek-v4-pro&lt;/code&gt; model ID on Novita AI. Switching between them is a per-request setting on the same model and endpoint, so you can implement adaptive mode selection in your application without changing your API configuration.&lt;/p&gt;
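&lt;p&gt;One way to use per-request mode selection is escalation: answer cheaply first and retry with a larger reasoning budget only when a check fails. The sketch below is hypothetical throughout: &lt;code&gt;call_model&lt;/code&gt; stands in for your real API call (how it selects each mode in the request should be confirmed in Novita's API reference), and &lt;code&gt;is_acceptable&lt;/code&gt; stands in for whatever validation you have, such as unit tests, a JSON schema, or an answer checker.&lt;/p&gt;

```python
from typing import Callable

MODES = ("non-think", "think", "max")  # cheapest first

def answer_with_escalation(
    prompt: str,
    call_model: Callable[[str, str], str],   # (prompt, mode) -> answer text
    is_acceptable: Callable[[str], bool],    # your own validation
) -> tuple[str, str]:
    """Try each mode in order and return (mode, answer) for the first acceptable answer.

    If no mode passes, the Max-mode answer is returned so the caller can inspect it.
    """
    answer = ""
    for mode in MODES:
        answer = call_model(prompt, mode)
        if is_acceptable(answer):
            return mode, answer
    return MODES[-1], answer

if __name__ == "__main__":
    def fake_model(prompt: str, mode: str) -> str:
        # Stand-in for a real API call: only Max mode "gets it right" here.
        return "42" if mode == "max" else "not sure"

    print(answer_with_escalation("What is 6 * 7?", fake_model, lambda a: a == "42"))
```

&lt;p&gt;Escalating only on failure keeps the common case on the cheaper modes; the cost is extra latency on the requests that do escalate, so this fits best where a reliable automatic check exists.&lt;/p&gt;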

&lt;h3&gt;
  
  
  Agentic and Tool Use Performance
&lt;/h3&gt;

&lt;p&gt;Beyond coding benchmarks, V4-Pro holds its own on agentic evaluations. BrowseComp: 83.4 (vs Claude Opus 83.7, Gemini 85.9—within 2.5 points of the frontier). MCPAtlas Public: 73.6, second only to Claude Opus 4.6 (73.8). Toolathlon: 51.8, third overall. These aren't "leads all models" results, but they confirm that V4-Pro is a capable general-purpose agentic model, not just a benchmark-optimized coding specialist. Combined with native function calling support, it's a practical choice for agents that need to browse, call tools, and reason in a single session.&lt;/p&gt;
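&lt;p&gt;Assuming Novita follows the OpenAI &lt;code&gt;tools&lt;/code&gt; format for this model (as its OpenAI-compatible endpoint suggests), a tool is declared as a JSON schema, and the model's reply carries tool calls with a function name and a JSON-encoded &lt;code&gt;arguments&lt;/code&gt; string. The sketch below shows only the local half of that loop: a tool schema plus a dispatcher that runs the requested function. The &lt;code&gt;get_weather&lt;/code&gt; tool is hypothetical, and the tool call is a hand-written dict in the same shape rather than a live API response.&lt;/p&gt;

```python
import json

# Tool schema in the OpenAI "tools" format; pass it as tools=TOOLS in the request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> dict:
    """Stub implementation; a real tool would call a weather service."""
    return {"city": city, "temp_c": 21}

HANDLERS = {"get_weather": get_weather}

def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and build the tool message to send back to the model."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
    result = HANDLERS[fn["name"]](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": json.dumps(result)}

if __name__ == "__main__":
    # Shape of a tool call the model might return (hand-written example).
    call = {"id": "call_1", "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
    print(run_tool_call(call))
```

&lt;p&gt;With the OpenAI SDK, the same fields are attributes (&lt;code&gt;tool_call.id&lt;/code&gt;, &lt;code&gt;tool_call.function.name&lt;/code&gt;, &lt;code&gt;tool_call.function.arguments&lt;/code&gt;) on &lt;code&gt;response.choices[0].message.tool_calls&lt;/code&gt;. Append the assistant message and the returned tool message to the conversation, then call the model again so it can use the result.&lt;/p&gt;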

&lt;h2&gt;
  
  
  Benchmark Performance
&lt;/h2&gt;

&lt;p&gt;The table below covers the benchmarks from DeepSeek's official comparison. "V4-Pro" refers to the DeepSeek-V4-Pro Max (extended reasoning) mode—the same model accessible via the &lt;code&gt;deepseek/deepseek-v4-pro&lt;/code&gt; API ID on Novita.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faliu1ffjxoe5igfkj0e0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faliu1ffjxoe5igfkj0e0.png" alt="DeepSeek-V4-Pro benchmark performance comparison chart showing LiveCodeBench, Codeforces, SWE-Verified, BrowseComp scores vs Claude Opus, Gemini 3.1 Pro, GPT-5.4" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;DeepSeek-V4-Pro performance across coding, reasoning, and agentic benchmarks. [Source: &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro" rel="noopener noreferrer"&gt;DeepSeek HuggingFace&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;DeepSeek-V4-Pro&lt;/th&gt;
&lt;th&gt;Claude Opus 4.6&lt;/th&gt;
&lt;th&gt;Gemini 3.1 Pro&lt;/th&gt;
&lt;th&gt;GPT-5.4&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LiveCodeBench (Pass@1)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;93.5 ✓&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;88.8&lt;/td&gt;
&lt;td&gt;91.7&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Codeforces Rating&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3206 ✓&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;3052&lt;/td&gt;
&lt;td&gt;3168&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Verified&lt;/td&gt;
&lt;td&gt;80.6&lt;/td&gt;
&lt;td&gt;80.8&lt;/td&gt;
&lt;td&gt;80.6&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE Pro&lt;/td&gt;
&lt;td&gt;55.4&lt;/td&gt;
&lt;td&gt;57.3&lt;/td&gt;
&lt;td&gt;54.2&lt;/td&gt;
&lt;td&gt;57.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BrowseComp&lt;/td&gt;
&lt;td&gt;83.4&lt;/td&gt;
&lt;td&gt;83.7&lt;/td&gt;
&lt;td&gt;85.9&lt;/td&gt;
&lt;td&gt;82.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCPAtlas Public&lt;/td&gt;
&lt;td&gt;73.6&lt;/td&gt;
&lt;td&gt;73.8&lt;/td&gt;
&lt;td&gt;69.2&lt;/td&gt;
&lt;td&gt;67.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPQA Diamond&lt;/td&gt;
&lt;td&gt;90.1&lt;/td&gt;
&lt;td&gt;91.3&lt;/td&gt;
&lt;td&gt;94.3&lt;/td&gt;
&lt;td&gt;93.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HLE (Pass@1)&lt;/td&gt;
&lt;td&gt;37.7&lt;/td&gt;
&lt;td&gt;40.0&lt;/td&gt;
&lt;td&gt;44.4&lt;/td&gt;
&lt;td&gt;39.8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IMOAnswerBench&lt;/td&gt;
&lt;td&gt;89.8&lt;/td&gt;
&lt;td&gt;75.3&lt;/td&gt;
&lt;td&gt;81.0&lt;/td&gt;
&lt;td&gt;91.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HMMT 2026 Feb&lt;/td&gt;
&lt;td&gt;95.2&lt;/td&gt;
&lt;td&gt;96.2&lt;/td&gt;
&lt;td&gt;94.7&lt;/td&gt;
&lt;td&gt;97.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MRCR 1M (MMR)&lt;/td&gt;
&lt;td&gt;83.5&lt;/td&gt;
&lt;td&gt;92.9&lt;/td&gt;
&lt;td&gt;76.3&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CorpusQA 1M&lt;/td&gt;
&lt;td&gt;62.0&lt;/td&gt;
&lt;td&gt;71.7&lt;/td&gt;
&lt;td&gt;53.8&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terminal Bench 2.0&lt;/td&gt;
&lt;td&gt;67.9&lt;/td&gt;
&lt;td&gt;65.4&lt;/td&gt;
&lt;td&gt;68.5&lt;/td&gt;
&lt;td&gt;75.1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;✓ = highest published score in this comparison. Last verified: 2026-04-25. Scores reflect "Max" / extended reasoning mode where applicable. Source: &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro" rel="noopener noreferrer"&gt;DeepSeek HuggingFace model card&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

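&lt;p&gt;&lt;strong&gt;Honest read:&lt;/strong&gt; On knowledge benchmarks (GPQA Diamond, HLE), Gemini 3.1 Pro and GPT-5.4 are clearly ahead, and Claude Opus 4.6 also scores above V4-Pro.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Competitive-style coding:&lt;/strong&gt; This is V4-Pro's clearest edge. Its LiveCodeBench and Codeforces scores are the highest in this comparison.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Repository-level tasks:&lt;/strong&gt; It is close to the leaders but not ahead. It trails Claude Opus 4.6 by 0.2 points on SWE-Verified. On SWE Pro it trails Claude Opus 4.6 and GPT-5.4, and on Terminal Bench 2.0 it trails GPT-5.4 and Gemini 3.1 Pro.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Long-context retrieval:&lt;/strong&gt; It also leads other open-source models here.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Math reasoning:&lt;/strong&gt; Results are mixed. On IMOAnswerBench, V4-Pro beats Claude Opus 4.6 (75.3) and Gemini 3.1 Pro (81.0) but trails GPT-5.4 (89.8 vs 91.4). On HMMT 2026 Feb, it edges Gemini 3.1 Pro (95.2 vs 94.7) but trails Claude Opus 4.6 (96.2) and GPT-5.4 (97.7).&lt;/li&gt;
&lt;/ul&gt;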
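
&lt;p&gt;To check this read against the numbers, the short script below finds the top score in each row of the table and V4-Pro's distance from it. Scores are copied from the table. Cells with no reported score are skipped rather than counted as zero, and ties go to the model listed first.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;MODELS = ("DeepSeek-V4-Pro", "Claude Opus 4.6", "Gemini 3.1 Pro", "GPT-5.4")

# Scores copied from the table above; None marks a score that was not reported.
TABLE = {
    "LiveCodeBench (Pass@1)": (93.5, 88.8, 91.7, None),
    "Codeforces Rating": (3206, None, 3052, 3168),
    "SWE-Verified": (80.6, 80.8, 80.6, None),
    "SWE Pro": (55.4, 57.3, 54.2, 57.7),
    "BrowseComp": (83.4, 83.7, 85.9, 82.7),
    "MCPAtlas Public": (73.6, 73.8, 69.2, 67.2),
    "GPQA Diamond": (90.1, 91.3, 94.3, 93.0),
    "HLE (Pass@1)": (37.7, 40.0, 44.4, 39.8),
    "IMOAnswerBench": (89.8, 75.3, 81.0, 91.4),
    "HMMT 2026 Feb": (95.2, 96.2, 94.7, 97.7),
    "MRCR 1M (MMR)": (83.5, 92.9, 76.3, None),
    "CorpusQA 1M": (62.0, 71.7, 53.8, None),
    "Terminal Bench 2.0": (67.9, 65.4, 68.5, 75.1),
}


def leader_and_gap(row):
    """Return (top-scoring model, V4-Pro's distance below it) for one benchmark row."""
    reported = {model: score for model, score in zip(MODELS, row) if score is not None}
    leader = max(reported, key=reported.get)  # ties go to the model listed first
    return leader, round(reported[leader] - reported[MODELS[0]], 1)


for benchmark, row in TABLE.items():
    leader, gap = leader_and_gap(row)
    print(f"{benchmark}: leader={leader}, V4-Pro gap={gap}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;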

&lt;h2&gt;
  
  
  How to Use DeepSeek-V4-Pro backed by Novita AI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Playground (No Code)
&lt;/h3&gt;

&lt;p&gt;Test directly at &lt;a href="https://novita.ai/models/model-detail/deepseek-deepseek-v4-pro" rel="noopener noreferrer"&gt;novita.ai/models/model-detail/deepseek-deepseek-v4-pro&lt;/a&gt;. No API key required to explore. Set the system prompt to activate Think or Non-think mode.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: API (Python)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.novita.ai/v3/openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_NOVITA_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Default request (no explicit reasoning-mode instruction)
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek/deepseek-v4-pro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Implement a Rust async runtime from scratch.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get your API key at &lt;a href="https://novita.ai/settings" rel="noopener noreferrer"&gt;novita.ai/settings&lt;/a&gt;. The same model ID works for all three reasoning modes—pass mode instructions in the system prompt or use DeepSeek's documented mode-switching syntax.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 3: Third-Party Tools
&lt;/h3&gt;

&lt;p&gt;Because Novita AI exposes an OpenAI-compatible API, you can use &lt;code&gt;deepseek/deepseek-v4-pro&lt;/code&gt; as the model ID in &lt;strong&gt;Cursor&lt;/strong&gt; (custom OpenAI provider), &lt;strong&gt;Claude Code&lt;/strong&gt;-compatible setups, &lt;strong&gt;LangChain&lt;/strong&gt;, &lt;strong&gt;LlamaIndex&lt;/strong&gt;, or any framework built on the OpenAI SDK. Just point &lt;code&gt;base_url&lt;/code&gt; to &lt;code&gt;https://api.novita.ai/v3/openai&lt;/code&gt;. To test the endpoint without any SDK, call it directly with curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.novita.ai/v3/openai/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_NOVITA_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model":"deepseek/deepseek-v4-pro","messages":[{"role":"user","content":"Implement a Rust async runtime."}]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Use Cases
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Full-codebase analysis and refactoring:&lt;/strong&gt; With 1M-token context, you can pass an entire medium-sized repository in one call. Ask V4-Pro to find architectural issues, generate migration guides, or refactor patterns across 50+ files simultaneously—without chunking or retrieval hacks.&lt;/p&gt;
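
&lt;p&gt;Here is a rough sketch of packing a repository into one request. It assumes about four characters per token, a common heuristic rather than DeepSeek's tokenizer. It also assumes a hypothetical 32,768-token reserve for the reply. The file suffixes and the &lt;code&gt;### File:&lt;/code&gt; separator are arbitrary choices. Count tokens with the model's own tokenizer before relying on the budget.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import bisect
import itertools
import tempfile
from pathlib import Path

CONTEXT_TOKENS = 1_048_576   # V4-Pro context window, per this article
RESERVED_OUTPUT = 32_768     # assumed headroom left for the model's reply
CHARS_PER_TOKEN = 4          # rough heuristic, not DeepSeek's tokenizer


def estimate_tokens(text: str):
    return len(text) // CHARS_PER_TOKEN + 1


def pack_repo(root, suffixes=(".py", ".md")):
    """Concatenate matching files in path order until the estimated budget is spent.

    Returns (prompt_text, included_paths, skipped_paths).
    """
    root = Path(root)
    files = sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes)
    chunks = [
        f"### File: {p.relative_to(root)}\n{p.read_text(encoding='utf-8', errors='ignore')}\n"
        for p in files
    ]
    # Running token totals; bisect finds how many files fit inside the budget.
    cumulative = list(itertools.accumulate(estimate_tokens(c) for c in chunks))
    n_fit = bisect.bisect_right(cumulative, CONTEXT_TOKENS - RESERVED_OUTPUT)
    included = [str(p.relative_to(root)) for p in files[:n_fit]]
    skipped = [str(p.relative_to(root)) for p in files[n_fit:]]
    return "".join(chunks[:n_fit]), included, skipped


# Demo on a throwaway directory with two small files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "app.py").write_text("print('hi')\n")
    Path(tmp, "README.md").write_text("# Demo\n")
    prompt, included, skipped = pack_repo(tmp)

print(included, skipped)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Files that exceed the budget come back in &lt;code&gt;skipped&lt;/code&gt;, so you can decide whether to summarize or omit them.&lt;/p&gt;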

&lt;p&gt;&lt;strong&gt;Competitive programming and hard algorithm problems:&lt;/strong&gt; Codeforces Rating 3206 puts V4-Pro in the top tier for algorithmic problem solving. Use it for generating solutions to competitive programming challenges, verifying complexity proofs, or stress-testing edge cases in production algorithms.&lt;/p&gt;
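
&lt;p&gt;The stress-testing workflow can be sketched as follows. Run a fast candidate, for example a model-generated solution, against a slow but obviously correct reference on many small random inputs, and stop at the first mismatch. Maximum subarray sum is only an illustrative problem; swap in your own pair of functions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random


def brute_force_max_subarray(nums):
    """Reference: try every contiguous subarray (slow but obviously correct)."""
    return max(
        sum(nums[i:j]) for i in range(len(nums)) for j in range(i + 1, len(nums) + 1)
    )


def kadane_max_subarray(nums):
    """Candidate under test, e.g. a model-generated O(n) solution."""
    best = current = nums[0]
    for x in nums[1:]:
        current = max(x, current + x)
        best = max(best, current)
    return best


def stress_test(candidate, reference, trials=500, seed=0):
    """Return the first random input where the two disagree, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(-10, 10) for _ in range(rng.randint(1, 8))]
        if candidate(nums) != reference(nums):
            return nums
    return None


failure = stress_test(kadane_max_subarray, brute_force_max_subarray)
print("counterexample:", failure)  # counterexample: None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A returned list is a small failing input you can paste back into the conversation. &lt;code&gt;None&lt;/code&gt; only means no mismatch turned up in the sampled cases, which is evidence of correctness, not proof.&lt;/p&gt;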

&lt;p&gt;&lt;strong&gt;GitHub issue resolution agents:&lt;/strong&gt; SWE-Verified 80.6 puts V4-Pro on par with Claude Opus 4.6 (80.8) on real-world bug fixing. Combined with function calling and long context, it can read issue descriptions, browse code history, and generate patches without losing track across large repos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-document reasoning:&lt;/strong&gt; For legal contracts, research papers, technical specifications, and audit logs, V4-Pro's 1M context means you are not forced to summarize or chunk before analysis. Its CorpusQA 1M (62.0) and MRCR 1M (83.5) scores indicate usable retrieval at full context length. Claude Opus 4.6 scores higher on both (71.7 and 92.9).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Math and science tutoring / problem generation:&lt;/strong&gt; V4-Pro scores 89.8 on IMOAnswerBench. That is ahead of Claude Opus 4.6 and Gemini 3.1 Pro, and behind only GPT-5.4's 91.4 in this comparison. This makes V4-Pro a strong choice for generating competition-level math problems, verifying proofs, or building STEM education tools where mathematical reasoning is the bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input ($/M tokens)&lt;/th&gt;
&lt;th&gt;Cache Read ($/M tokens)&lt;/th&gt;
&lt;th&gt;Output ($/M tokens)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek-V4-Pro (Novita)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1.74&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.145&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$3.48&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-V4-Flash (Novita)&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.6 (Anthropic)&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;td&gt;$1.50&lt;/td&gt;
&lt;td&gt;$75.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3.1 Pro (Google)&lt;/td&gt;
&lt;td&gt;$1.25&lt;/td&gt;
&lt;td&gt;$0.31&lt;/td&gt;
&lt;td&gt;$10.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4 (OpenAI)&lt;/td&gt;
&lt;td&gt;$10.00&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;$40.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Last verified: 2026-04-25. Novita pricing from &lt;a href="https://novita.ai/pricing" rel="noopener noreferrer"&gt;novita.ai/pricing&lt;/a&gt;. Competitor pricing: Claude from anthropic.com (unverified), Gemini from ai.google.dev (unverified), GPT-5.4 from platform.openai.com (unverified).&lt;/em&gt;&lt;/p&gt;

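&lt;p&gt;Via Novita AI, V4-Pro compares as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;vs Claude Opus 4.6:&lt;/strong&gt; roughly 8× cheaper for input tokens and 21× cheaper for output.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;vs Gemini 3.1 Pro:&lt;/strong&gt; about 39% more expensive for input ($1.74 vs $1.25), but 2.9× cheaper for output.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;vs GPT-5.4:&lt;/strong&gt; about 5.7× cheaper for input and 11.5× cheaper for output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Coding agents generate long outputs over multi-turn sessions, so the output-price gap is where the savings compound.&lt;/p&gt;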
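
&lt;p&gt;To see how these rates translate into a single request, the sketch below prices one agent turn using the table above. The token counts are illustrative. The billing model is a simplifying assumption: cached prompt tokens are billed at the cache-read rate and the rest at list price. It ignores cache-write, long-context tier, and batch pricing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# USD per 1M tokens, copied from the pricing table above
# (competitor rows are marked unverified in the article).
PRICES = {
    "DeepSeek-V4-Pro (Novita)": {"input": 1.74, "cache_read": 0.145, "output": 3.48},
    "Claude Opus 4.6": {"input": 15.00, "cache_read": 1.50, "output": 75.00},
    "Gemini 3.1 Pro": {"input": 1.25, "cache_read": 0.31, "output": 10.00},
    "GPT-5.4": {"input": 10.00, "cache_read": 2.50, "output": 40.00},
}


def request_cost(prices, input_tokens, output_tokens, cached_tokens=0):
    """Estimated USD cost of one request.

    Simplifying assumption: cached prompt tokens are billed at the cache-read
    rate and the rest of the prompt at the input rate; cache-write, tiered,
    and batch pricing are ignored.
    """
    uncached = input_tokens - cached_tokens
    weighted = (
        uncached * prices["input"]
        + cached_tokens * prices["cache_read"]
        + output_tokens * prices["output"]
    )
    return weighted / 1_000_000  # prices are quoted per million tokens


# Illustrative agent turn: 100K-token prompt (80K served from cache), 5K-token reply.
for model, prices in PRICES.items():
    cost = request_cost(prices, 100_000, 5_000, cached_tokens=80_000)
    print(f"{model}: ${cost:.4f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these example numbers, V4-Pro comes to $0.0638 per turn at the listed prices. The same turn costs $0.7950 on Claude Opus 4.6, $0.6000 on GPT-5.4, and $0.0998 on Gemini 3.1 Pro.&lt;/p&gt;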

&lt;h2&gt;
  
  
  Migrating from DeepSeek-V3 or DeepSeek-R1
&lt;/h2&gt;

&lt;p&gt;If you're currently running DeepSeek-V3 or R1 on Novita, upgrading to V4-Pro is a one-line model ID change. The API is OpenAI-compatible, with the same endpoint and the same request format. V4-Pro's three reasoning modes let you approximate both V3-style direct answers (Non-think mode) and R1-style deep reasoning (Max mode) from a single model, without maintaining separate deployments. If you're migrating from another provider's model (GPT-4o, Claude 3.5, etc.), point your existing OpenAI SDK client to &lt;code&gt;base_url="https://api.novita.ai/v3/openai"&lt;/code&gt; and swap the model ID.&lt;/p&gt;
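
&lt;p&gt;One way to make that switch a configuration change rather than a code change is to read the endpoint and model ID from the environment. The variable names &lt;code&gt;LLM_BASE_URL&lt;/code&gt;, &lt;code&gt;LLM_MODEL&lt;/code&gt;, and &lt;code&gt;NOVITA_API_KEY&lt;/code&gt; are hypothetical choices for this sketch. Only the base URL and the V4-Pro model ID come from this article.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

NOVITA_BASE_URL = "https://api.novita.ai/v3/openai"
DEFAULT_MODEL = "deepseek/deepseek-v4-pro"


def client_settings(env=None):
    """Read endpoint, key, and model ID from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("LLM_BASE_URL", NOVITA_BASE_URL),  # hypothetical variable name
        "api_key": env.get("NOVITA_API_KEY", ""),              # hypothetical variable name
        "model": env.get("LLM_MODEL", DEFAULT_MODEL),          # hypothetical variable name
    }


# Demo with an explicit mapping instead of the real environment.
settings = client_settings({"NOVITA_API_KEY": "YOUR_NOVITA_API_KEY"})
print(settings["model"])  # deepseek/deepseek-v4-pro
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Build the client with &lt;code&gt;OpenAI(base_url=settings["base_url"], api_key=settings["api_key"])&lt;/code&gt; and pass &lt;code&gt;model=settings["model"]&lt;/code&gt; on each call. Rolling back to an older model then only requires changing &lt;code&gt;LLM_MODEL&lt;/code&gt;.&lt;/p&gt;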

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; DeepSeek-V4-Pro is among the strongest open-source models for coding tasks. It posts the top LiveCodeBench and Codeforces scores in DeepSeek's published comparison, and it supports a full 1M-token context window. It doesn't lead every benchmark: Gemini 3.1 Pro holds the edge on knowledge recall, and Claude Opus 4.6 leads on long-context retrieval. Still, for teams building coding agents, fixing GitHub issues at scale, or processing massive documents, V4-Pro delivers frontier-class performance at a fraction of closed-model API costs. It is now available through Novita AI, alongside 200+ model APIs on OpenAI-compatible infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models/model-detail/deepseek-deepseek-v4-pro" rel="noopener noreferrer"&gt;Try DeepSeek-V4-Pro via Novita AI →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is DeepSeek-V4-Pro?
&lt;/h3&gt;

&lt;p&gt;DeepSeek-V4-Pro is a 1.6-trillion-parameter Mixture-of-Experts language model from DeepSeek AI, released April 2026. It activates 49B parameters per token and supports 1,048,576 tokens of context. It posts the highest LiveCodeBench (93.5) and Codeforces Rating (3206) scores in DeepSeek's published comparison with Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4. It's available under the MIT license and via Novita AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I access DeepSeek-V4-Pro via API?
&lt;/h3&gt;

&lt;p&gt;Use model ID &lt;code&gt;deepseek/deepseek-v4-pro&lt;/code&gt; with &lt;code&gt;base_url="https://api.novita.ai/v3/openai"&lt;/code&gt; and your Novita API key from &lt;a href="https://novita.ai/settings" rel="noopener noreferrer"&gt;novita.ai/settings&lt;/a&gt;. The endpoint is OpenAI SDK-compatible—no custom SDK required.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does DeepSeek-V4-Pro compare to Claude Opus 4.6 and Gemini 3.1 Pro?
&lt;/h3&gt;

&lt;p&gt;V4-Pro leads on coding: LiveCodeBench 93.5 (vs Opus 4.6 88.8, Gemini 91.7) and Codeforces 3206 (vs GPT-5.4 3168, Gemini 3052). On knowledge benchmarks like GPQA Diamond and HLE, Gemini 3.1 Pro leads. On long-context retrieval (MRCR 1M), Claude Opus 4.6 leads. V4-Pro is the best open-source choice for coding-heavy and agentic workloads, while closed models keep an edge in raw factual recall.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is DeepSeek-V4-Pro's context window?
&lt;/h3&gt;

&lt;p&gt;1,048,576 tokens (1M). The model is specifically architected for long-context efficiency using Hybrid Attention (CSA + HCA). MRCR 1M scores 83.5 and CorpusQA 1M hits 62.0, confirming usable retrieval accuracy at full context length.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does DeepSeek-V4-Pro cost backed by Novita AI?
&lt;/h3&gt;

&lt;p&gt;$1.74/M input tokens, $3.48/M output tokens, $0.145/M cache read. This makes it approximately 8× cheaper than Claude Opus 4.6 for input and 21× cheaper for output. Last verified: 2026-04-25.&lt;/p&gt;




&lt;h2&gt;
  
  
  Recommended Articles
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blogs.novita.ai/deepseek-v3-0324-on-novita-ai/" rel="noopener noreferrer"&gt;DeepSeek-V3-0324: What Changed and How to Upgrade&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blogs.novita.ai/how-to-use-deepseek-r1-api/" rel="noopener noreferrer"&gt;How to Use DeepSeek R1 API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blogs.novita.ai/novita-ai-llm-api/" rel="noopener noreferrer"&gt;Novita AI LLM API: 200+ Models, One Endpoint&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Ling-2.6-1T: The 1T Model That Skips the Reasoning Tax</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Wed, 29 Apr 2026 14:49:19 +0000</pubDate>
      <link>https://dev.to/novita_ai/ling-26-1t-the-1t-model-that-skips-the-reasoning-tax-2347</link>
      <guid>https://dev.to/novita_ai/ling-26-1t-the-1t-model-that-skips-the-reasoning-tax-2347</guid>
      <description>&lt;p&gt;Most capable open-source models make you choose: raw intelligence or token efficiency. Thinking models burn 3–5× more tokens per request. Smaller non-reasoning models cut costs but cap capability. Ling-2.6-1T is built to break that tradeoff.&lt;/p&gt;

&lt;p&gt;Ling-2.6-1T is a trillion-scale comprehensive flagship model from Ant Group (inclusionAI), designed for immediate task execution. Built on &lt;strong&gt;MLA + Hybrid Linear Attention&lt;/strong&gt; architecture, it achieves a superior intelligence-to-token ratio: strong benchmark performance with minimal output token overhead. On AIME26, it significantly outperforms other non-thinking models. On agent execution benchmarks — SWE-bench Verified, BFCLv4, TAU2-Bench, Claw-Eval — it reaches open-source SOTA. Now exclusively backed by Novita AI as the inference provider.&lt;/p&gt;

&lt;p&gt;In short: Ling-2.6-1T delivers comprehensive frontier capability for agent workloads — complex reasoning, tool use, multi-step execution, and long-context instruction following — at a fraction of the token cost of thinking models.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: Ling-2.6-1T is a 1-trillion-parameter open-source MoE model from Ant Group, available via Novita AI API. It is best for agent workloads that need frontier capability (SWE-bench SOTA) without thinking-model token overhead. Its non-reasoning architecture keeps output tokens lean while staying competitive with reasoning models on benchmarks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models/model-detail/inclusionai-ling-2.6-1t" rel="noopener noreferrer"&gt;Try Ling-2.6-1T backed by Novita AI&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Ling-2.6-1T?
&lt;/h2&gt;

&lt;p&gt;Ling-2.6-1T is the latest flagship model from &lt;a href="https://huggingface.co/inclusionAI/Ling-1T" rel="noopener noreferrer"&gt;inclusionAI&lt;/a&gt;, the AI research arm of Ant Group (AntLingAGI). It’s a 1-trillion-parameter Mixture-of-Experts model — the largest FP8-trained foundation model released to date — trained on 20T+ high-quality tokens with over 40% reasoning-dense data in later stages.&lt;/p&gt;

&lt;p&gt;Unlike thinking models (DeepSeek-R1, QwQ) that output long chain-of-thought traces before answering, Ling-2.6-1T uses a “fast thinking” mechanism: it internalizes reasoning without externalizing verbose thought chains. This keeps token output lean while maintaining strong analytical depth. ~50B parameters activate per token, making inference practical at 1T scale.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Architecture:&lt;/strong&gt; MLA + Hybrid Linear Attention, 1T total parameters, ~50B active params per token&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context window:&lt;/strong&gt; 262,144 tokens (via YaRN rope scaling), max output 32,768 tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training:&lt;/strong&gt; FP8 mixed-precision, 20T+ tokens, &amp;gt;40% reasoning-dense data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paradigm:&lt;/strong&gt; Fast-thinking — internalized reasoning, no verbose chain-of-thought output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT — fully open weights&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Availability:&lt;/strong&gt; Novita AI is the exclusive inference provider, including as the provider on OpenRouter&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Features: Why Ling-2.6-1T Stands Out
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Superior Intelligence-to-Token Ratio
&lt;/h3&gt;

&lt;p&gt;Thinking models produce impressive results but inflate your token bill — hundreds of reasoning tokens before the actual answer. Ling-2.6-1T was trained with Evolutionary Chain-of-Thought (Evo-CoT) in mid-training, internalizing reasoning rather than externalizing it. The result: strong benchmark scores on AIME26 (outperforming other non-thinking models), LiveCodeBench, and Omni-MATH — without paying for the thought process. Per the official model card, it achieves intelligence-output efficiency on par with GPT-5.4 (Non-Reasoning), representing a major leap over its predecessor Ling-1T. For high-throughput production workloads, this directly reduces cost.&lt;/p&gt;
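
&lt;p&gt;The cost effect of skipping visible reasoning can be worked through with simple arithmetic. The sketch below treats thinking overhead as a multiplier on output tokens only. That is an assumption, since the article states its 3–5× figure per request. The prices and traffic are hypothetical, not Ling-2.6-1T's actual Novita rates.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def cost_per_request(input_tokens, answer_tokens, price_in, price_out, reasoning_multiplier=1.0):
    """Estimated USD cost when reasoning inflates the output token count.

    reasoning_multiplier=1.0 models a non-thinking model that emits only the answer;
    3.0 to 5.0 roughly models a 3-5x thinking overhead, applied to output only.
    Prices are USD per 1M tokens.
    """
    output_tokens = answer_tokens * reasoning_multiplier
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical prices and traffic, chosen only to illustrate the ratio.
PRICE_IN, PRICE_OUT = 0.50, 2.00
REQUESTS_PER_DAY = 100_000

for multiplier in (1.0, 3.0, 5.0):
    daily = REQUESTS_PER_DAY * cost_per_request(2_000, 500, PRICE_IN, PRICE_OUT, multiplier)
    print(f"{multiplier:.0f}x output tokens: ${daily:,.2f}/day")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these made-up numbers, a 3× output multiplier doubles daily spend ($200.00 to $400.00) rather than tripling it, because input cost is unchanged. The more output-heavy the workload, the closer the cost ratio gets to the token multiplier.&lt;/p&gt;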

&lt;h3&gt;
  
  
  Open-Source SOTA on Agent Execution
&lt;/h3&gt;

&lt;p&gt;Agent workloads require more than math and coding in isolation — they require tool use, multi-step execution, and reliable instruction following under real-world conditions. Ling-2.6-1T reaches open-source SOTA across the key agent benchmarks (per inclusionAI model card):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SWE-bench Verified&lt;/strong&gt; — real-world software engineering task resolution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BFCLv4&lt;/strong&gt; — Berkeley Function-Calling Leaderboard v4, complex tool-use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TAU2-Bench&lt;/strong&gt; — long-horizon agentic task completion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claw-Eval&lt;/strong&gt; — multi-turn command execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PinchBench&lt;/strong&gt; — composite agent capability evaluation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On LiveCodeBench (Aug 2024–May 2025), it scores 61.68. That outperforms DeepSeek-V3.1 (48.02), Kimi-K2-0905 (48.95), and GPT-5-main (48.57) by roughly 13 points (12.7 to 13.7). For front-end generation, its ArtifactsBench score is 59.31, second only to Gemini-2.5-Pro(lowthink) at 60.28 in this comparison group (per inclusionAI model card).&lt;/p&gt;

&lt;h3&gt;
  
  
  Long Context + Instruction Following
&lt;/h3&gt;

&lt;p&gt;With a 262,144-token context (YaRN rope scaling), Ling-2.6-1T can hold entire codebases, long documents, or extended multi-turn agent conversations in a single call. On the MRCR benchmark (16K–256K context range), it maintains retrieval accuracy consistently across lengths. That is a critical requirement for agent pipelines that process long tool outputs or document corpora. Its IFBench score of 56.9% reflects strong complex instruction following.&lt;/p&gt;
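
&lt;p&gt;Even with 262,144 tokens, long agent sessions eventually overflow. Here is a minimal sketch of keeping a conversation inside the window. It assumes a rough four-characters-per-token estimate and reserves the full 32,768-token output limit for the reply. It keeps the system prompt and drops the oldest turns first. Production code should count tokens with the model's tokenizer and may prefer summarizing old turns to dropping them.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import bisect
import itertools

CONTEXT_TOKENS = 262_144   # Ling-2.6-1T context window, per this article
MAX_OUTPUT = 32_768        # max output tokens, per this article
CHARS_PER_TOKEN = 4        # rough heuristic; use the model's tokenizer for exact counts


def estimate_tokens(message):
    # Content estimate plus a small assumed overhead for role and formatting.
    return len(message["content"]) // CHARS_PER_TOKEN + 4


def fit_history(messages, reserve_output=MAX_OUTPUT):
    """Keep the system prompt and the newest turns that fit; drop the oldest turns."""
    has_system = bool(messages) and messages[0]["role"] == "system"
    system = messages[:1] if has_system else []
    turns = messages[len(system):]
    budget = CONTEXT_TOKENS - reserve_output - sum(estimate_tokens(m) for m in system)
    # Running totals from newest to oldest; bisect counts how many recent turns fit.
    newest_first = list(itertools.accumulate(estimate_tokens(m) for m in reversed(turns)))
    n_keep = bisect.bisect_right(newest_first, budget)
    return system + turns[len(turns) - n_keep:]


history = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "x" * 1_000_000},  # e.g. a huge pasted tool output
    {"role": "assistant", "content": "ok"},
    {"role": "user", "content": "Summarize the last tool output."},
]
trimmed = fit_history(history)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;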

&lt;h2&gt;
  
  
  Benchmark Performance
&lt;/h2&gt;

&lt;p&gt;Independent measurements from &lt;a href="https://artificialanalysis.ai/models/ling-2-6-1t" rel="noopener noreferrer"&gt;Artificial Analysis&lt;/a&gt; place Ling-2.6-1T at an Intelligence Index of 33.6 — better than 73% of 495 measured models, and #2 in the open-weights large non-reasoning class. Below are self-reported scores from the inclusionAI model card (comparing against DeepSeek-V3.1-terminus, Kimi-K2-0905, GPT-5-main, and Gemini-2.5-Pro(lowthink)), followed by independently verified AA scores.&lt;/p&gt;

&lt;h3&gt;
  
  
  Math &amp;amp; Reasoning (per inclusionAI model card)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Ling-2.6-1T&lt;/th&gt;
&lt;th&gt;DeepSeek-V3.1&lt;/th&gt;
&lt;th&gt;Kimi-K2-0905&lt;/th&gt;
&lt;th&gt;GPT-5-main&lt;/th&gt;
&lt;th&gt;Gemini-2.5-Pro*&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AIME26&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;70.42&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;55.21&lt;/td&gt;
&lt;td&gt;50.16&lt;/td&gt;
&lt;td&gt;59.43&lt;/td&gt;
&lt;td&gt;70.10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Omni-MATH&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;74.46&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;64.77&lt;/td&gt;
&lt;td&gt;62.42&lt;/td&gt;
&lt;td&gt;61.09&lt;/td&gt;
&lt;td&gt;72.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OptMATH&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;57.68&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;35.99&lt;/td&gt;
&lt;td&gt;35.84&lt;/td&gt;
&lt;td&gt;39.16&lt;/td&gt;
&lt;td&gt;42.77&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FinanceReasoning&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;87.45&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;86.44&lt;/td&gt;
&lt;td&gt;84.83&lt;/td&gt;
&lt;td&gt;86.28&lt;/td&gt;
&lt;td&gt;86.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BBEH&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;47.34&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;42.86&lt;/td&gt;
&lt;td&gt;34.83&lt;/td&gt;
&lt;td&gt;39.75&lt;/td&gt;
&lt;td&gt;29.08&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KOR-Bench&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;76.00&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;73.76&lt;/td&gt;
&lt;td&gt;73.20&lt;/td&gt;
&lt;td&gt;70.56&lt;/td&gt;
&lt;td&gt;59.68&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ARC-AGI-1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;43.81&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;14.69&lt;/td&gt;
&lt;td&gt;22.19&lt;/td&gt;
&lt;td&gt;14.06&lt;/td&gt;
&lt;td&gt;18.94&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;*Gemini-2.5-Pro(lowthink). Source: &lt;a href="https://huggingface.co/inclusionAI/Ling-1T" rel="noopener noreferrer"&gt;inclusionAI model card&lt;/a&gt;. Last verified: 2026-04-24.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Performance (per inclusionAI model card)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Ling-2.6-1T&lt;/th&gt;
&lt;th&gt;DeepSeek-V3.1&lt;/th&gt;
&lt;th&gt;Kimi-K2-0905&lt;/th&gt;
&lt;th&gt;GPT-5-main&lt;/th&gt;
&lt;th&gt;Gemini-2.5-Pro*&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LiveCodeBench&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;61.68&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;48.02&lt;/td&gt;
&lt;td&gt;48.95&lt;/td&gt;
&lt;td&gt;48.57&lt;/td&gt;
&lt;td&gt;45.43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MultiPL-E&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;77.91&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;77.68&lt;/td&gt;
&lt;td&gt;73.54&lt;/td&gt;
&lt;td&gt;76.66&lt;/td&gt;
&lt;td&gt;71.48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CodeForces Rating&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1901&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1582&lt;/td&gt;
&lt;td&gt;1574&lt;/td&gt;
&lt;td&gt;1120&lt;/td&gt;
&lt;td&gt;1675&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FullStack Bench&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;56.55&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;55.48&lt;/td&gt;
&lt;td&gt;54.00&lt;/td&gt;
&lt;td&gt;50.92&lt;/td&gt;
&lt;td&gt;48.19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ArtifactsBench&lt;/td&gt;
&lt;td&gt;59.31&lt;/td&gt;
&lt;td&gt;43.29&lt;/td&gt;
&lt;td&gt;44.87&lt;/td&gt;
&lt;td&gt;41.04&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;60.28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider Code Editing&lt;/td&gt;
&lt;td&gt;83.65&lt;/td&gt;
&lt;td&gt;88.16&lt;/td&gt;
&lt;td&gt;85.34&lt;/td&gt;
&lt;td&gt;84.40&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;89.85&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;*Gemini-2.5-Pro(lowthink). Source: &lt;a href="https://huggingface.co/inclusionAI/Ling-1T" rel="noopener noreferrer"&gt;inclusionAI model card&lt;/a&gt;. Last verified: 2026-04-24. Note: model version names (e.g. "gpt-5-main", "DeepSeek-V3.1-terminus") are as reported by inclusionAI and may not correspond to publicly released versions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent Execution Benchmarks (per inclusionAI model card)
&lt;/h3&gt;

&lt;p&gt;Ling-2.6-1T reaches open-source SOTA across agent-specific evaluations. Exact competitor scores are not published for all benchmarks; results listed as reported in the official model card.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;What It Measures&lt;/th&gt;
&lt;th&gt;Ling-2.6-1T&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-bench Verified&lt;/td&gt;
&lt;td&gt;Real-world GitHub issue resolution&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Open-source SOTA&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BFCLv4&lt;/td&gt;
&lt;td&gt;Complex multi-step function/tool calling&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Open-source SOTA&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TAU2-Bench&lt;/td&gt;
&lt;td&gt;Long-horizon agent task completion&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Open-source SOTA&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claw-Eval&lt;/td&gt;
&lt;td&gt;Multi-turn command execution&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Open-source SOTA&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PinchBench&lt;/td&gt;
&lt;td&gt;Composite agent capability&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Open-source SOTA&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IFBench&lt;/td&gt;
&lt;td&gt;Complex instruction following&lt;/td&gt;
&lt;td&gt;56.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: &lt;a href="https://huggingface.co/inclusionAI/Ling-1T" rel="noopener noreferrer"&gt;inclusionAI model card&lt;/a&gt;. "Open-source SOTA" as claimed by inclusionAI; independent per-score data not yet available. Last verified: 2026-04-24.&lt;/p&gt;

&lt;h3&gt;
  
  
  Independent Benchmarks (Artificial Analysis)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Ling-2.6-1T&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AA Intelligence Index&lt;/td&gt;
&lt;td&gt;33.6&lt;/td&gt;
&lt;td&gt;Better than 73% of 495 models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AA Coding Index&lt;/td&gt;
&lt;td&gt;33.0&lt;/td&gt;
&lt;td&gt;Better than 78% of models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AA Agentic Index&lt;/td&gt;
&lt;td&gt;48.2&lt;/td&gt;
&lt;td&gt;Better than 80% of models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPQA Diamond&lt;/td&gt;
&lt;td&gt;75.2%&lt;/td&gt;
&lt;td&gt;Graduate-level scientific reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;τ²-Bench Telecom&lt;/td&gt;
&lt;td&gt;89.8%&lt;/td&gt;
&lt;td&gt;Conversational agent tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IFBench&lt;/td&gt;
&lt;td&gt;56.9%&lt;/td&gt;
&lt;td&gt;Instruction-following&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output Speed&lt;/td&gt;
&lt;td&gt;67.7 tok/s&lt;/td&gt;
&lt;td&gt;Via Novita AI on OpenRouter&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: &lt;a href="https://artificialanalysis.ai/models/ling-2-6-1t" rel="noopener noreferrer"&gt;Artificial Analysis&lt;/a&gt;. Last verified: 2026-04-24.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use Ling-2.6-1T backed by Novita AI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Playground (No Code)
&lt;/h3&gt;

&lt;p&gt;Try the model instantly at &lt;a href="https://novita.ai/models/model-detail/inclusionai-ling-2.6-1t" rel="noopener noreferrer"&gt;novita.ai/models/model-detail/inclusionai-ling-2.6-1t&lt;/a&gt; — no setup required. Useful for quickly testing prompts before integrating into your app.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: API (Python)
&lt;/h3&gt;

&lt;p&gt;Ling-2.6-1T is served through Novita's OpenAI-compatible API. Swap in your Novita API key and the model ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.novita.ai/v3/openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_NOVITA_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inclusionai/ling-2.6-1t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your prompt here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get your API key at &lt;a href="https://novita.ai/settings" rel="noopener noreferrer"&gt;novita.ai/settings&lt;/a&gt;. The model also supports streaming, function calling via the &lt;code&gt;tools&lt;/code&gt; parameter, and structured output.&lt;/p&gt;
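&lt;p&gt;Most chat UIs and agents consume responses in streaming mode. The sketch below shows one way to do that. It assumes the &lt;code&gt;openai&lt;/code&gt; Python package (v1 or later) and an API key stored in a &lt;code&gt;NOVITA_API_KEY&lt;/code&gt; environment variable. That variable name and the helpers &lt;code&gt;collect_stream&lt;/code&gt;, &lt;code&gt;fake_chunk&lt;/code&gt;, and &lt;code&gt;stream_completion&lt;/code&gt; are illustrative and are not part of any SDK. With &lt;code&gt;stream=True&lt;/code&gt;, an OpenAI-compatible endpoint returns chunks whose &lt;code&gt;choices[0].delta.content&lt;/code&gt; holds the next piece of text, or &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;

```python
import os
from types import SimpleNamespace


def collect_stream(chunks, on_token=None):
    """Join the text deltas of an OpenAI-style streaming response."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # e.g. a usage-only chunk with no choices
        text = chunk.choices[0].delta.content
        if text:  # role-only and final chunks carry None or ""
            parts.append(text)
            if on_token is not None:
                on_token(text)
    return "".join(parts)


def fake_chunk(text):
    """Offline stand-in shaped like a streaming chunk, for testing."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def stream_completion(prompt):
    """Stream a reply from Ling-2.6-1T, printing tokens as they arrive."""
    from openai import OpenAI  # lazy import: only the live call needs the SDK

    client = OpenAI(
        base_url="https://api.novita.ai/v3/openai",
        api_key=os.environ["NOVITA_API_KEY"],
    )
    stream = client.chat.completions.create(
        model="inclusionai/ling-2.6-1t",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(stream, on_token=lambda t: print(t, end="", flush=True))


if __name__ == "__main__" and os.environ.get("NOVITA_API_KEY"):
    stream_completion("Explain Mixture-of-Experts routing in two sentences.")
```

&lt;p&gt;The accumulation logic is kept separate from the network call. This lets you test it offline with &lt;code&gt;fake_chunk&lt;/code&gt; stand-ins before you point it at the live endpoint.&lt;/p&gt;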

&lt;h3&gt;
  
  
  Option 3: Third-Party Tools
&lt;/h3&gt;

&lt;p&gt;Since Novita AI is OpenAI-compatible, Ling-2.6-1T works with any tool that accepts a custom base URL — including &lt;strong&gt;Cursor&lt;/strong&gt;, &lt;strong&gt;Claude Code&lt;/strong&gt;, &lt;strong&gt;OpenWebUI&lt;/strong&gt;, &lt;strong&gt;LangChain&lt;/strong&gt;, and &lt;strong&gt;LlamaIndex&lt;/strong&gt;. Set base URL to &lt;code&gt;https://api.novita.ai/v3/openai&lt;/code&gt; and model to &lt;code&gt;inclusionai/ling-2.6-1t&lt;/code&gt;.&lt;/p&gt;
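&lt;p&gt;These tools share the same underlying request. The sketch below builds that request with Python’s standard library. It assumes the standard OpenAI convention: the client appends &lt;code&gt;/chat/completions&lt;/code&gt; to the configured base URL and sends the key as a Bearer token. The function names are hypothetical. &lt;code&gt;build_chat_request&lt;/code&gt; only constructs the request, and &lt;code&gt;send&lt;/code&gt; performs the network call.&lt;/p&gt;

```python
import json
import urllib.request

BASE_URL = "https://api.novita.ai/v3/openai"
MODEL = "inclusionai/ling-2.6-1t"


def build_chat_request(api_key, prompt, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) an OpenAI-style chat completions request."""
    # OpenAI-compatible clients append the route to the configured base URL.
    url = base_url.rstrip("/") + "/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Perform the network call and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```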

&lt;h2&gt;
  
  
  Use Cases
&lt;/h2&gt;

&lt;p&gt;Ling-2.6-1T’s combination of 1T-parameter capacity, fast-thinking paradigm, and 262K context makes it a strong fit for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coding Agents:&lt;/strong&gt; With a Codeforces rating of 1901 and strong LiveCodeBench scores, it handles competitive-programming-level tasks. Pair it with Novita’s Agent Sandbox for fully isolated code execution without managing infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Financial Analysis:&lt;/strong&gt; 87.45 on FinanceReasoning (#1 in its comparison group per inclusionAI model card) makes it suitable for automated report analysis, earnings summarization, and quantitative research workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Front-End Generation:&lt;/strong&gt; The Hybrid Syntax–Function–Aesthetics reward in training specifically targets UI code quality. Its ArtifactsBench score of 59.31 is the second-highest in its comparison group, only 0.97 points behind Gemini-2.5-Pro (lowthink).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-Document Processing:&lt;/strong&gt; 262,144-token context handles multi-hundred-page documents, full repository analysis, or extended legal/research corpora in a single call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-Volume Production APIs:&lt;/strong&gt; Non-reasoning paradigm means predictable token counts and lower latency variance — important when you’re running thousands of requests per day.&lt;/li&gt;
&lt;/ul&gt;
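&lt;p&gt;For long-document work, it helps to budget tokens before you send a request. The sketch below makes three assumptions. First, the 262,144-token window covers prompt plus completion. Second, it reserves the 32,768-token maximum output stated in the FAQ. Third, it estimates tokens with a rough heuristic of 4 characters per token for English text. The model’s own tokenizer gives the real count.&lt;/p&gt;

```python
import math

CONTEXT_WINDOW = 262_144  # total tokens for prompt + completion (assumed)
MAX_OUTPUT = 32_768       # maximum completion length stated in the FAQ
CHARS_PER_TOKEN = 4       # rough heuristic for English text; an assumption


def estimate_tokens(text):
    """Rough token estimate; use the model's tokenizer for exact counts."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def max_prompt_tokens(max_output=MAX_OUTPUT):
    """Tokens left for the prompt after reserving room for the completion."""
    if max_output > MAX_OUTPUT:
        raise ValueError("max_output exceeds the model's output limit")
    return CONTEXT_WINDOW - max_output


def fits_in_context(prompt_text, max_output=MAX_OUTPUT):
    """True if the estimated prompt plus the reserved output fits the window."""
    return max_prompt_tokens(max_output) >= estimate_tokens(prompt_text)


if __name__ == "__main__":
    document = "lorem ipsum " * 50_000  # 600,000 characters, ~150K tokens
    print(estimate_tokens(document), fits_in_context(document))
```

&lt;p&gt;With the full output budget reserved, 229,376 tokens remain for the prompt. Lowering &lt;code&gt;max_output&lt;/code&gt; frees up more room for input.&lt;/p&gt;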

&lt;h2&gt;
  
  
  Migrating From DeepSeek V3 or Kimi K2?
&lt;/h2&gt;

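&lt;p&gt;If you’re currently using DeepSeek V3 or Kimi K2 via another provider, switching to Ling-2.6-1T backed by Novita AI is a configuration change rather than a code change. Point your client at &lt;code&gt;https://api.novita.ai/v3/openai&lt;/code&gt;, use your Novita API key, and set the model ID to &lt;code&gt;inclusionai/ling-2.6-1t&lt;/code&gt;. The request format stays the same because the API is OpenAI-compatible.&lt;/p&gt;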

&lt;p&gt;On coding tasks, Ling-2.6-1T outperforms both DeepSeek-V3.1 and Kimi-K2-0905 on LiveCodeBench (61.68 vs 48.02 and 48.95), and on math reasoning it leads both on AIME26 and OptMATH. If your workloads are reasoning-heavy but you don’t want chain-of-thought verbosity, this is the cleaner upgrade path versus switching to a thinking model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input ($/1M tokens)&lt;/th&gt;
&lt;th&gt;Output ($/1M tokens)&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ling-2.6-1T (Novita AI)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.30&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2.50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;262,144&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V3.2&lt;/td&gt;
&lt;td&gt;$0.28&lt;/td&gt;
&lt;td&gt;$0.42&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3-235B-A22B&lt;/td&gt;
&lt;td&gt;$0.455&lt;/td&gt;
&lt;td&gt;$1.82&lt;/td&gt;
&lt;td&gt;131K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi K2 (OpenRouter)&lt;/td&gt;
&lt;td&gt;$0.57&lt;/td&gt;
&lt;td&gt;$2.30&lt;/td&gt;
&lt;td&gt;131K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Novita AI pricing via &lt;a href="https://novita.ai/models/model-detail/inclusionai-ling-2.6-1t" rel="noopener noreferrer"&gt;novita.ai&lt;/a&gt;. Competitor pricing via OpenRouter. Last verified: 2026-04-24.&lt;/p&gt;

&lt;p&gt;Ling-2.6-1T’s output pricing ($2.50/M) is the highest in this table, roughly 6× DeepSeek V3.2’s $0.42/M. The tradeoff is meaningfully stronger benchmark performance on reasoning and coding tasks. If token cost per call is the primary constraint, consider &lt;a href="https://novita.ai/models/model-detail/inclusionai-ling-2.6-flash" rel="noopener noreferrer"&gt;Ling-2.6-flash&lt;/a&gt; (104B params, 7.4B active), the cheaper sibling, which is also exclusively available via Novita AI.&lt;/p&gt;
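&lt;p&gt;A small calculator makes the input/output tradeoff concrete for your own traffic. The sketch below copies the prices from the table above. The workload sizes are hypothetical, and the helper names are illustrative.&lt;/p&gt;

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Ling-2.6-1T (Novita AI)": (0.30, 2.50),
    "DeepSeek V3.2": (0.28, 0.42),
    "Qwen3-235B-A22B": (0.455, 1.82),
    "Kimi K2 (OpenRouter)": (0.57, 2.30),
}


def request_cost(model, input_tokens, output_tokens):
    """USD cost of a job with the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def monthly_cost(model, requests, avg_in, avg_out):
    """USD cost of `requests` calls with the given average token counts."""
    return requests * request_cost(model, avg_in, avg_out)


if __name__ == "__main__":
    # Hypothetical job: 1M input tokens and 200K output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 1_000_000, 200_000):.2f}")
```

&lt;p&gt;For that hypothetical job, Ling-2.6-1T comes to about $0.80 and DeepSeek V3.2 to about $0.36. The gap widens as the output share of your traffic grows.&lt;/p&gt;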

&lt;p&gt;&lt;strong&gt;Free tier:&lt;/strong&gt; Ling-2.6-1T is available for free via the &lt;code&gt;inclusionai/ling-2.6-1t:free&lt;/code&gt; endpoint on OpenRouter, exclusively provided by Novita AI. This free window is time-limited — check current availability at &lt;a href="https://openrouter.ai/inclusionai/ling-2.6-1t:free" rel="noopener noreferrer"&gt;openrouter.ai/inclusionai/ling-2.6-1t:free&lt;/a&gt;.&lt;/p&gt;
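&lt;p&gt;The free variant may disappear without notice, so a client can try it first and fall back to the paid model ID. This is only a sketch, and it rests on three assumptions. It uses OpenRouter’s OpenAI-compatible base URL &lt;code&gt;https://openrouter.ai/api/v1&lt;/code&gt;. It reads the key from an &lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt; environment variable. It expects an expired free window to surface as an HTTP error, caught here as &lt;code&gt;openai.APIStatusError&lt;/code&gt;. The helper names are hypothetical.&lt;/p&gt;

```python
import os

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
FREE_MODEL = "inclusionai/ling-2.6-1t:free"
PAID_MODEL = "inclusionai/ling-2.6-1t"


def models_to_try(prefer_free=True):
    """Model IDs in the order they should be attempted."""
    return [FREE_MODEL, PAID_MODEL] if prefer_free else [PAID_MODEL]


def ask(prompt, prefer_free=True):
    """Send one prompt via OpenRouter, falling back to the paid model ID."""
    import openai  # lazy import: only the live call needs the SDK

    client = openai.OpenAI(
        base_url=OPENROUTER_BASE_URL,
        api_key=os.environ["OPENROUTER_API_KEY"],
    )
    last_error = None
    for model in models_to_try(prefer_free):
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except openai.APIStatusError as err:
            last_error = err  # e.g. the free variant is no longer served
    raise last_error


if __name__ == "__main__" and os.environ.get("OPENROUTER_API_KEY"):
    print(ask("Summarize YaRN rope scaling in one sentence."))
```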

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Ling-2.6-1T is currently the strongest open-weight non-reasoning model for competitive math and coding benchmarks, and the strongest open-source option if you need 262K context without paying for chain-of-thought verbosity. It’s not the cheapest option per token, but for complex reasoning tasks where thinking models would inflate your bill, it’s the most practical frontier open-source alternative available today.&lt;/p&gt;

&lt;p&gt;Exclusively backed by Novita AI — the only provider offering both Ling-2.6-1T and Ling-2.6-flash on OpenRouter — you get a stable inference endpoint, 99.9% uptime, and OpenAI-compatible API without managing the 32-GPU minimum deployment yourself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models/model-detail/inclusionai-ling-2.6-1t" rel="noopener noreferrer"&gt;Get Started with Ling-2.6-1T&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Ling-2.6-1T?
&lt;/h3&gt;

&lt;p&gt;Ling-2.6-1T is a 1-trillion-parameter Mixture-of-Experts language model developed by Ant Group (inclusionAI). It activates roughly 50B parameters per token, supports a 262,144-token context window, and is designed as a fast-thinking, non-reasoning model — strong benchmark performance without chain-of-thought overhead. MIT-licensed and fully open weights.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I access Ling-2.6-1T via API?
&lt;/h3&gt;

&lt;p&gt;Set &lt;code&gt;base_url="https://api.novita.ai/v3/openai"&lt;/code&gt; and &lt;code&gt;model="inclusionai/ling-2.6-1t"&lt;/code&gt; in any OpenAI-compatible client. Get your API key at &lt;a href="https://novita.ai/settings" rel="noopener noreferrer"&gt;novita.ai/settings&lt;/a&gt;. It’s also accessible via OpenRouter using the same model ID.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Ling-2.6-1T compare to DeepSeek V3?
&lt;/h3&gt;

&lt;p&gt;On self-reported benchmarks (inclusionAI model card), Ling-2.6-1T outperforms DeepSeek-V3.1 on AIME26 (70.42 vs 55.21), LiveCodeBench (61.68 vs 48.02), and ARC-AGI-1 (43.81 vs 14.69). DeepSeek V3.2 scores higher on the Artificial Analysis Intelligence Index (42 vs 34). Ling-2.6-1T offers a larger context window (262K vs 128K) at similar input pricing ($0.30/M vs $0.28/M), though its output tokens cost considerably more ($2.50/M vs $0.42/M).&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Ling-2.6-1T’s context window?
&lt;/h3&gt;

&lt;p&gt;262,144 tokens (extended from 128K native via YaRN rope scaling). Maximum output length is 32,768 tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Ling-2.6-1T free to use?
&lt;/h3&gt;

&lt;p&gt;Yes, temporarily. The &lt;code&gt;inclusionai/ling-2.6-1t:free&lt;/code&gt; endpoint on OpenRouter is provided exclusively by Novita AI. The free window is time-limited. The paid tier via Novita AI is $0.30/M input and $2.50/M output tokens.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Use Kimi-K2 in Claude Code on Windows and Mac</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Tue, 15 Jul 2025 09:46:32 +0000</pubDate>
      <link>https://dev.to/novita_ai/how-to-use-kimi-k2-in-claude-code-on-windows-and-mac-18p</link>
      <guid>https://dev.to/novita_ai/how-to-use-kimi-k2-in-claude-code-on-windows-and-mac-18p</guid>
      <description>&lt;p&gt;Claude Code offers more powerful agent capabilities than traditional code editors like Cursor. By integrating &lt;a href="https://novita.ai/models/llm/moonshotai-kimi-k2-instruct?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=use-kimi-k2-in-claude-code" rel="noopener noreferrer"&gt;Kimi-K2 through Novita AI’s platform&lt;/a&gt;, developers can access enterprise-grade AI functionality at a fraction of the cost. This guide covers setting up Kimi-K2 with Claude Code on both Windows and Mac systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is Claude Code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh2kxdclqmvzvc056t135.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh2kxdclqmvzvc056t135.png" width="793" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="http://www.anthropic.com/claude-code" rel="noopener noreferrer"&gt;http://www.anthropic.com/claude-code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude Code is Anthropic’s agentic command-line coding tool. Instead of suggesting edits inside an editor, it runs as an agent in your terminal: it reads your project, edits files, and runs commands on your behalf.&lt;/p&gt;

&lt;p&gt;This lets developers delegate complex, multi-step coding tasks directly from the terminal and turn natural-language descriptions into working code.&lt;/p&gt;

&lt;p&gt;The tool operates as an interactive session where developers can describe their requirements in plain English. Claude Code intelligently generates, modifies, and optimizes code accordingly. Its advanced understanding of context and project structure allows it to make informed decisions about code architecture, dependencies, and implementation patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Use Kimi-K2 in Claude Code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Kimi-K2 presents a compelling alternative to traditional Claude models, offering similar capabilities at significantly reduced costs. The economic advantages are substantial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Kimi-K2 on Novita AI: $0.57 per 1M input tokens and $2.30 per 1M output tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Claude Sonnet: $3 per 1M input tokens and $15 per 1M output tokens&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This represents an 81% cost reduction for input tokens and an 85% reduction for output tokens.&lt;/p&gt;
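&lt;p&gt;The percentages follow directly from the list prices above. For each token type, the saving is (baseline − alternative) / baseline. Here is a quick check in Python. The constant and function names are illustrative.&lt;/p&gt;

```python
# USD per 1M tokens, as listed above.
CLAUDE_SONNET = {"input": 3.00, "output": 15.00}
KIMI_K2_NOVITA = {"input": 0.57, "output": 2.30}


def reduction(baseline, alternative):
    """Fractional saving of the alternative price relative to the baseline."""
    return (baseline - alternative) / baseline


for kind in ("input", "output"):
    pct = 100 * reduction(CLAUDE_SONNET[kind], KIMI_K2_NOVITA[kind])
    print(f"{kind}: {pct:.1f}% cheaper")
```

&lt;p&gt;This prints 81.0% for input and 84.7% for output. The figures above round these to 81% and 85%.&lt;/p&gt;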

&lt;p&gt;Beyond cost savings, Kimi-K2 through Novita AI is served over an Anthropic-compatible API with higher rate limits than official channels. Because Claude Code already speaks the Anthropic API format, it can use this endpoint without modification, so existing Claude Code workflows carry over unchanged.&lt;/p&gt;

&lt;p&gt;The combination delivers enterprise-grade AI capabilities without the premium pricing. This makes advanced AI development accessible to a broader range of developers and organizations.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Getting Your API Key on Novita AI&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://novita.ai/user/register?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=use-kimi-k2-in-claude-code" rel="noopener noreferrer"&gt;Sign up for a Novita AI account&lt;/a&gt; to get started with free trial credits. Navigate to the &lt;a href="https://novita.ai/settings/key-management?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=use-kimi-k2-in-claude-code" rel="noopener noreferrer"&gt;Key Management page&lt;/a&gt; in your dashboard and click “Create New Key.”&lt;/p&gt;

&lt;p&gt;Copy the generated API key immediately and store it securely – it won’t be displayed again. You’ll need this key for the configuration steps below.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Installing Claude Code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before installing Claude Code, ensure your system meets the minimum requirements. Node.js 18 or higher must be installed on your local environment. You can verify your Node.js version by running &lt;code&gt;node --version&lt;/code&gt; in your terminal.&lt;/p&gt;
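&lt;p&gt;If you script your setup, you can automate the same check. Below is a minimal sketch. It assumes Python 3.7+ and that &lt;code&gt;node --version&lt;/code&gt; prints a string like &lt;code&gt;v18.19.0&lt;/code&gt;. The function names are illustrative.&lt;/p&gt;

```python
import shutil
import subprocess

MIN_NODE_MAJOR = 18  # minimum Node.js major version stated in this guide


def parse_node_major(version_output):
    """Parse the major version from output such as 'v18.19.0'."""
    return int(version_output.strip().lstrip("v").split(".")[0])


def node_is_supported():
    """True if a Node.js executable on PATH meets the minimum version."""
    node = shutil.which("node")
    if node is None:
        return False  # Node.js is not installed or not on PATH
    result = subprocess.run(
        [node, "--version"], capture_output=True, text=True, check=True
    )
    return parse_node_major(result.stdout) >= MIN_NODE_MAJOR


if __name__ == "__main__":
    print("Node.js OK" if node_is_supported() else "Install Node.js 18 or newer")
```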

&lt;h3&gt;
  
  
  &lt;strong&gt;For Windows&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Open Command Prompt and execute the following commands:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install -g @anthropic-ai/claude-code
npx win-claude-code@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The global installation ensures Claude Code is accessible from any directory on your system. The &lt;code&gt;npx win-claude-code@latest&lt;/code&gt; command downloads and runs the latest Windows-specific version.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;For Mac&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Open Terminal and run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm install -g @anthropic-ai/claude-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Mac users can proceed directly with the global installation; no platform-specific command is needed. npm installs the &lt;code&gt;claude&lt;/code&gt; command into its global bin directory, which is normally already on your PATH.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Setting Up Environment Variables&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Environment variables configure Claude Code to use Kimi-K2 through Novita AI’s API endpoints. These variables tell Claude Code where to send requests and how to authenticate.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;For Windows&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Open Command Prompt and set the following environment variables:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;set ANTHROPIC_BASE_URL=https://api.novita.ai/anthropic
set ANTHROPIC_AUTH_TOKEN=&amp;lt;Novita API Key&amp;gt;
set ANTHROPIC_MODEL=moonshotai/kimi-k2-instruct
set ANTHROPIC_SMALL_FAST_MODEL=moonshotai/kimi-k2-instruct
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Replace &lt;code&gt;&amp;lt;Novita API Key&amp;gt;&lt;/code&gt; with your actual API key obtained from the Novita AI platform. These variables remain active for the current session and must be reset if you close the Command Prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;For Mac&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Open Terminal and export the following environment variables:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;export ANTHROPIC_BASE_URL="https://api.novita.ai/anthropic"
export ANTHROPIC_AUTH_TOKEN="&amp;lt;Novita API Key&amp;gt;"
export ANTHROPIC_MODEL="moonshotai/kimi-k2-instruct"
export ANTHROPIC_SMALL_FAST_MODEL="moonshotai/kimi-k2-instruct"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
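&lt;p&gt;Before launching Claude Code, you can confirm that these variables are visible to processes started from the same terminal. The helper below is hypothetical and is not part of Claude Code or Novita AI’s tooling. It only compares the current environment against the values used in this guide and flags an unreplaced API key placeholder. On Windows, variables created with &lt;code&gt;set&lt;/code&gt; exist only in that Command Prompt window, so run the check from there.&lt;/p&gt;

```python
import os

# Values this guide sets before launching Claude Code.
REQUIRED = {
    "ANTHROPIC_BASE_URL": "https://api.novita.ai/anthropic",
    "ANTHROPIC_MODEL": "moonshotai/kimi-k2-instruct",
    "ANTHROPIC_SMALL_FAST_MODEL": "moonshotai/kimi-k2-instruct",
}


def check_env(env=None):
    """Return a list of human-readable problems with the Claude Code settings."""
    env = os.environ if env is None else env
    problems = []
    token = env.get("ANTHROPIC_AUTH_TOKEN", "")
    if not token or "Novita API Key" in token:
        problems.append("ANTHROPIC_AUTH_TOKEN is unset or still the placeholder")
    for name, expected in REQUIRED.items():
        value = env.get(name)
        if value is None:
            problems.append(f"{name} is not set")
        elif value != expected:
            problems.append(f"{name} is {value!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    issues = check_env()
    print("\n".join(issues) if issues else "Environment looks ready for Claude Code")
```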

&lt;h2&gt;
  
  
  &lt;strong&gt;Starting Claude Code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;With installation and configuration complete, you can now start Claude Code in your project directory. Navigate to your desired project location using the &lt;code&gt;cd&lt;/code&gt; command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cd &amp;lt;your-project-directory&amp;gt;
claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Claude Code works in the directory you launch it from, so start it from your project root. Once it starts, the Claude Code prompt appears in an interactive session.&lt;/p&gt;

&lt;p&gt;This indicates the tool is ready to receive your instructions. The interface provides a clean, intuitive environment for natural language programming interactions.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Building Your First Project&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Claude Code excels at transforming detailed project descriptions into functional applications. After entering your prompt, press Enter to begin the task. Claude Code will analyze your requirements, create the necessary files, implement the functionality, and provide a complete project structure with documentation.&lt;/p&gt;

&lt;p&gt;Here’s an example of how to create a Python Flask web app with an MBTI personality guessing game:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zrw8747f9a7a8xya22r.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zrw8747f9a7a8xya22r.gif" width="1280" height="576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Using Claude Code in VSCode or Cursor&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Claude Code integrates seamlessly with popular development environments. It enhances your existing workflow rather than replacing it.&lt;/p&gt;

&lt;p&gt;You can use Claude Code directly in the terminal within VSCode or Cursor. This maintains access to your familiar development tools while leveraging AI assistance.&lt;/p&gt;

&lt;p&gt;Additionally, Claude Code plugins are available for both VSCode and Cursor. These plugins provide deeper integration with these editors, offering inline AI assistance, code suggestions, and project management features directly within your IDE interface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsnsub22iatxlnjr7enl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsnsub22iatxlnjr7enl.png" alt="claude code in cursor" width="800" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The terminal integration allows you to run Claude Code commands without leaving your development environment. This creates a streamlined workflow for AI-assisted development.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Help and Documentation Resources&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Claude Code includes comprehensive help documentation accessible through the &lt;code&gt;/help&lt;/code&gt; command. This command displays available commands, usage examples, and troubleshooting information.&lt;/p&gt;

&lt;p&gt;The help system is context-aware, providing relevant information based on your current project and session state.&lt;/p&gt;

&lt;p&gt;For additional support, Novita AI provides &lt;a href="https://novita.ai/docs/guides/integration-claude-code?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=use-kimi-k2-in-claude-code" rel="noopener noreferrer"&gt;extensive documentation&lt;/a&gt;. This covers advanced configuration options, API usage patterns, and best practices.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://docs.anthropic.com/en/docs/claude-code/overview" rel="noopener noreferrer"&gt;Anthropic documentation&lt;/a&gt; offers detailed information about Claude Code’s capabilities and features.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://novita.ai/docs/guides/integration-claude-code?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=use-kimi-k2-in-claude-code" rel="noopener noreferrer"&gt;Kimi-K2 integration with Claude Code&lt;/a&gt; through Novita AI delivers enterprise-grade capabilities at significantly reduced costs. The combination transforms natural language descriptions into functional code, dramatically accelerating development workflows. Start your journey with Kimi-K2 and Claude Code today to experience the future of AI-assisted programming.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=use-kimi-k2-in-claude-code" rel="noopener noreferrer"&gt;&lt;em&gt;Novita AI&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>kimi</category>
      <category>lowcode</category>
    </item>
    <item>
      <title>Access Free DeepSeek R1 0528 API Now</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Thu, 29 May 2025 08:48:55 +0000</pubDate>
      <link>https://dev.to/novita_ai/access-free-deepseek-r1-0528-api-now-2l7j</link>
      <guid>https://dev.to/novita_ai/access-free-deepseek-r1-0528-api-now-2l7j</guid>
      <description>&lt;p&gt;We’re excited to announce that DeepSeek AI’s latest model, &lt;a href="https://novita.ai/models/llm/deepseek-deepseek-r1-0528?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-r1-0528" rel="noopener noreferrer"&gt;DeepSeek R1&lt;/a&gt; 0528, released today, is officially available in the Novita AI Model Library. We are also the official inference partner for &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-0528" rel="noopener noreferrer"&gt;DeepSeek R1 0528&lt;/a&gt; on Hugging Face, supporting the community in bringing advanced models to production.&lt;/p&gt;

&lt;p&gt;For a limited time, new users can claim &lt;a href="https://novita.ai/referral?invited_code=5W10UA&amp;amp;utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=homepage" rel="noopener noreferrer"&gt;&lt;strong&gt;$10 in free credits&lt;/strong&gt;&lt;/a&gt; to explore and build with DeepSeek-R1 0528’s advanced reasoning capabilities.&lt;/p&gt;

&lt;p&gt;Here’s the current DeepSeek-R1 0528 pricing on Novita AI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models/llm/deepseek-deepseek-r1-0528?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-r1-0528" rel="noopener noreferrer"&gt;&lt;strong&gt;DeepSeek-R1–0528&lt;/strong&gt;&lt;/a&gt;: $0.7 / M input tokens, $2.5 / M output tokens&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;How to Access DeepSeek R1 0528 on Novita AI&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Getting started with DeepSeek R1 0528 is fast, simple, and risk-free on Novita AI. Thanks to the Referral Program, you’ll receive &lt;strong&gt;$10 in free credits&lt;/strong&gt; — enough to fully explore DeepSeek R1 0528’s power, build prototypes, and even launch your first use case without any upfront cost.&lt;/p&gt;
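&lt;p&gt;Here is a rough budget calculation that puts the $10 in free credits in perspective, using the DeepSeek-R1-0528 prices listed above. The request size (1,000 input and 2,000 output tokens) is a hypothetical workload. Reasoning models typically bill their chain of thought as output tokens, so real output counts can be much larger. The helper names are illustrative.&lt;/p&gt;

```python
# DeepSeek-R1-0528 list prices on Novita AI, USD per 1M tokens (from above).
INPUT_PRICE = 0.70
OUTPUT_PRICE = 2.50


def cost_per_request(input_tokens, output_tokens):
    """USD cost of one request at the list prices above."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


def requests_covered(credit_usd, input_tokens, output_tokens):
    """Whole number of requests of a given size that a credit balance covers."""
    return int(credit_usd // cost_per_request(input_tokens, output_tokens))


if __name__ == "__main__":
    # Hypothetical workload: 1,000 input tokens and 2,000 output tokens per request.
    print(round(cost_per_request(1_000, 2_000), 6), requests_covered(10, 1_000, 2_000))
```

&lt;p&gt;At that size, each request costs about $0.0057, so $10 covers roughly 1,754 requests.&lt;/p&gt;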

&lt;h1&gt;
  
  
  &lt;strong&gt;Use the Playground (No Coding Required)&lt;/strong&gt;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instant Access&lt;/strong&gt;: &lt;a href="https://novita.ai/referral?invited_code=5W10UA&amp;amp;utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Sign up&lt;/a&gt;, claim your free credits, and start experimenting with DeepSeek R1 0528 and other top models in seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive UI&lt;/strong&gt;: Test prompts, chain-of-thought reasoning, and visualize results in real time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model Comparison&lt;/strong&gt;: Effortlessly switch between Qwen 3, Llama 4, DeepSeek, and more to find the perfect fit for your needs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models/llm/deepseek-deepseek-r1-0528?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-r1-0528" rel="noopener noreferrer"&gt;Explore DeepSeek R1 0528 Demo Now&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Integrate via API (For Developers)&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Seamlessly connect DeepSeek R1 0528 to your applications, workflows, or chatbots with Novita AI’s unified REST API — no need to manage model weights or infrastructure. Novita AI offers multi-language SDKs (Python, Node.js, cURL, and more) and advanced parameter controls for power users.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Option 1: Direct API Integration (Python Example)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To get started, simply use the code snippet below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.novita.ai/v3/openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_Ntg-O34ZOS-q5bNnkb3IcixmWnmxEQBxwKWMW3es3CD7KG4PEhFE1yRTRMGS3s8zZ52hrMdz14MmI4oalaDJTw==&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek/deepseek-r1-0528&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt; &lt;span class="c1"&gt;# or False
&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;
&lt;span class="n"&gt;system_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="n"&gt;Be&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;helpful&lt;/span&gt; &lt;span class="n"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;
&lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;top_p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;min_p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="n"&gt;presence_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;frequency_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;repetition_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;response_format&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;chat_completion_res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi there!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;presence_penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;presence_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;frequency_penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;frequency_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repetition_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;repetition_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;min_p&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chat_completion_res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat_completion_res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
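&lt;p&gt;The snippet above passes &lt;code&gt;top_k&lt;/code&gt;, &lt;code&gt;repetition_penalty&lt;/code&gt;, and &lt;code&gt;min_p&lt;/code&gt; through &lt;code&gt;extra_body&lt;/code&gt;. The OpenAI Python client only accepts the standard Chat Completions arguments as keyword parameters, and &lt;code&gt;extra_body&lt;/code&gt; merges any additional fields into the JSON request body. The helper below is a minimal sketch of that split, assuming the same parameter set as the example; the function name &lt;code&gt;build_request_kwargs&lt;/code&gt; and the &lt;code&gt;STANDARD_PARAMS&lt;/code&gt; set are hypothetical.&lt;/p&gt;

```python
# Hypothetical helper: sorts sampling settings into the keyword arguments the
# OpenAI Python client accepts directly and the extras sent via extra_body.
STANDARD_PARAMS = {
    "max_tokens", "temperature", "top_p", "presence_penalty",
    "frequency_penalty", "response_format", "stream",
}


def build_request_kwargs(model, messages, **params):
    """Return kwargs suitable for client.chat.completions.create(**kwargs)."""
    kwargs = {"model": model, "messages": messages}
    extra_body = {}
    for name, value in params.items():
        if name in STANDARD_PARAMS:
            kwargs[name] = value
        else:
            # Non-standard fields (e.g. top_k, min_p) go into the JSON body as-is.
            extra_body[name] = value
    if extra_body:
        kwargs["extra_body"] = extra_body
    return kwargs


kwargs = build_request_kwargs(
    "deepseek/deepseek-r1-0528",
    [{"role": "user", "content": "Hi there!"}],
    temperature=1, top_p=1, max_tokens=2048,
    top_k=50, repetition_penalty=1, min_p=0,
)
print(sorted(kwargs["extra_body"]))  # ['min_p', 'repetition_penalty', 'top_k']
```

&lt;p&gt;With a configured client, the result is used as &lt;code&gt;client.chat.completions.create(**kwargs)&lt;/code&gt;, so adding a new provider-specific parameter only means passing one more keyword to the helper.&lt;/p&gt;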



&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unified endpoint:&lt;/strong&gt; &lt;code&gt;/v3/openai&lt;/code&gt; supports OpenAI’s Chat Completions API format.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flexible controls:&lt;/strong&gt; Adjust temperature, top-p, penalties, and more for tailored results.&lt;/p&gt;&lt;/li&gt;
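&lt;li&gt;&lt;p&gt;&lt;strong&gt;Streaming or single response:&lt;/strong&gt; Set &lt;code&gt;stream=True&lt;/code&gt; to receive text as it is generated, or &lt;code&gt;stream=False&lt;/code&gt; to get the complete reply at once.&lt;/p&gt;&lt;/li&gt;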
&lt;/ul&gt;
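&lt;p&gt;With &lt;code&gt;stream=True&lt;/code&gt;, the client yields chunks whose &lt;code&gt;choices[0].delta.content&lt;/code&gt; holds the next piece of text (or &lt;code&gt;None&lt;/code&gt;), while &lt;code&gt;stream=False&lt;/code&gt; returns the full message at once. If you need the whole reply as a string even when streaming, a minimal sketch like the one below works. &lt;code&gt;collect_stream&lt;/code&gt; and &lt;code&gt;fake_chunk&lt;/code&gt; are hypothetical helpers, and the fake chunks only mimic the attribute layout used in the example above so the code can run offline.&lt;/p&gt;

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed chat completion into one string.

    Each chunk is expected to expose chunk.choices[0].delta.content, which
    may be None (e.g. role-only or final chunks).
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices (e.g. usage-only chunks)
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


def fake_chunk(text):
    # Stand-in for a real stream chunk, built only for offline experimentation.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


stream = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None), fake_chunk("!")]
print(collect_stream(stream))  # Hello!
```

&lt;p&gt;With a live client you would call &lt;code&gt;collect_stream(chat_completion_res)&lt;/code&gt; in place of the print loop.&lt;/p&gt;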

&lt;h2&gt;
  
  
  &lt;strong&gt;Option 2: Multi-Agent Workflows with OpenAI Agents SDK&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Build advanced multi-agent systems by integrating Novita AI with the &lt;a href="https://novita.ai/docs/guides/integration-openai-agents-sdk?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;OpenAI Agents SDK&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plug-and-play:&lt;/strong&gt; Use Novita AI’s LLMs in any OpenAI Agents workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Supports handoffs, routing, and tool use:&lt;/strong&gt; Design agents that can delegate, triage, or run functions, all powered by Novita AI’s models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Python integration:&lt;/strong&gt; Simply point the SDK to Novita’s endpoint (&lt;code&gt;https://api.novita.ai/v3/openai&lt;/code&gt;) and use your API key.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Connect DeepSeek R1 0528 API on Third-Party Platforms&lt;/strong&gt;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://novita.ai/docs/guides/huggingface?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;strong&gt;Hugging Face&lt;/strong&gt;&lt;/a&gt;: Use DeepSeek R1 0528 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent &amp;amp; Orchestration Frameworks:&lt;/strong&gt; Easily connect Novita AI with partner platforms like &lt;a href="https://novita.ai/docs/guides/continue?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Continue&lt;/a&gt;, &lt;a href="https://novita.ai/docs/guides/anythingllm?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;AnythingLLM&lt;/a&gt;, &lt;a href="https://novita.ai/docs/guides/langchain?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;, &lt;a href="https://novita.ai/docs/guides/dify?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Dify&lt;/a&gt;, and &lt;a href="https://novita.ai/docs/guides/langflow?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Langflow&lt;/a&gt; through official connectors and step-by-step integration guides.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI-Compatible API:&lt;/strong&gt; Enjoy hassle-free migration and integration with tools such as &lt;a href="https://blogs.novita.ai/how-to-integrate-novita-ai-llm-api-with-cline-in-vscode/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt; and &lt;a href="https://novita.ai/docs/guides/cursor?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, designed for the OpenAI API standard.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://novita.ai/?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-r1-0528" rel="noopener noreferrer"&gt;&lt;em&gt;Novita AI&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is an all-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, and GPU instances: the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>deepseek</category>
    </item>
    <item>
      <title>Access Free DeepSeek R1 0528 API Now</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Thu, 29 May 2025 08:42:14 +0000</pubDate>
      <link>https://dev.to/novita_ai/access-free-deepseek-r1-0528-api-now-54n7</link>
      <guid>https://dev.to/novita_ai/access-free-deepseek-r1-0528-api-now-54n7</guid>
      <description>&lt;p&gt;We’re excited to announce that DeepSeek AI’s latest model, DeepSeek R1 0528, released today, is officially available in the Novita AI Model Library. We are also the official inference partner for DeepSeek R1 0528 on Hugging Face, supporting the community in bringing advanced models to production.&lt;/p&gt;

&lt;p&gt;For a limited time, new users can claim $10 in free credits to explore and build with DeepSeek-R1 0528’s advanced reasoning capabilities.&lt;/p&gt;

&lt;p&gt;Here’s the current DeepSeek-R1 0528 pricing on Novita AI:&lt;/p&gt;

&lt;p&gt;DeepSeek-R1-0528: $0.70 / M input tokens, $2.50 / M output tokens&lt;/p&gt;
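&lt;p&gt;To turn per-million-token prices into a budget, multiply each token count by its rate and divide by one million. The sketch below assumes the prices listed above and ignores any other charges; &lt;code&gt;estimate_cost&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
# Prices quoted above for DeepSeek-R1-0528 on Novita AI, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.70
OUTPUT_PRICE_PER_M = 2.50


def estimate_cost(input_tokens, output_tokens):
    """Estimated USD cost of a workload at the listed per-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: 1M prompt tokens and 200K completion tokens.
print(round(estimate_cost(1_000_000, 200_000), 2))  # 1.2
```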

&lt;h1&gt;
  &lt;strong&gt;How to Access DeepSeek R1 0528 on Novita AI&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Getting started with DeepSeek R1 0528 is fast, simple, and risk-free on Novita AI. Thanks to the Referral Program, you’ll receive $10 in free credits: enough to fully explore DeepSeek R1 0528’s power, build prototypes, and even launch your first use case without any upfront cost.&lt;/p&gt;

&lt;h2&gt;
  &lt;strong&gt;Use the Playground (No Coding Required)&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instant Access:&lt;/strong&gt; Sign up, claim your free credits, and start experimenting with DeepSeek R1 0528 and other top models in seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive UI:&lt;/strong&gt; Test prompts, inspect chain-of-thought reasoning, and view results in real time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model Comparison:&lt;/strong&gt; Switch between Qwen 3, Llama 4, DeepSeek, and more to find the best fit for your needs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Explore the DeepSeek R1 0528 demo now.&lt;/p&gt;

&lt;h2&gt;
  &lt;strong&gt;Integrate via API (For Developers)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Connect DeepSeek R1 0528 to your applications, workflows, or chatbots with Novita AI’s unified REST API, with no model weights or infrastructure to manage. Novita AI provides client examples for several languages and tools (Python, Node.js, cURL, and more) plus advanced parameter controls for power users.&lt;/p&gt;
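&lt;p&gt;Because the API is plain REST, an SDK is optional. The sketch below builds (but does not send) a chat request with Python’s standard library. It assumes the OpenAI-compatible route &lt;code&gt;/chat/completions&lt;/code&gt; under the &lt;code&gt;https://api.novita.ai/v3/openai&lt;/code&gt; base URL and Bearer-token authentication, which is how OpenAI-compatible clients call such endpoints; &lt;code&gt;build_chat_request&lt;/code&gt; and the key placeholder are hypothetical.&lt;/p&gt;

```python
import json
import urllib.request

BASE_URL = "https://api.novita.ai/v3/openai"  # OpenAI-compatible base URL


def build_chat_request(api_key, model, user_message, max_tokens=2048):
    """Build (without sending) a POST request for the /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("YOUR_NOVITA_API_KEY", "deepseek/deepseek-r1-0528", "Hi there!")
print(req.full_url)  # https://api.novita.ai/v3/openai/chat/completions
```

&lt;p&gt;Passing &lt;code&gt;req&lt;/code&gt; to &lt;code&gt;urllib.request.urlopen&lt;/code&gt; with a real key would send it; the same URL, headers, and JSON body are what a cURL call would use.&lt;/p&gt;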

&lt;h2&gt;
  &lt;strong&gt;Option 1: Direct API Integration (Python Example)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To get started, simply use the code snippet below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="YOUR_NOVITA_API_KEY",  # replace with your own Novita AI API key
)
model = "deepseek/deepseek-r1-0528"
stream = True  # set to False to receive the full reply in one response
max_tokens = 2048
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": "Hi there!"},
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Parameters outside the standard OpenAI signature are sent via extra_body.
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

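&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unified endpoint:&lt;/strong&gt; &lt;code&gt;/v3/openai&lt;/code&gt; supports OpenAI’s Chat Completions API format.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flexible controls:&lt;/strong&gt; Adjust temperature, top-p, penalties, and more for tailored results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Streaming or single response:&lt;/strong&gt; Set &lt;code&gt;stream=True&lt;/code&gt; to receive text as it is generated, or &lt;code&gt;stream=False&lt;/code&gt; to get the complete reply at once.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  &lt;strong&gt;Option 2: Multi-Agent Workflows with OpenAI Agents SDK&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Build advanced multi-agent systems by integrating Novita AI with the OpenAI Agents SDK:&lt;/p&gt;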

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plug-and-play:&lt;/strong&gt; Use Novita AI’s LLMs in any OpenAI Agents workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Supports handoffs, routing, and tool use:&lt;/strong&gt; Design agents that can delegate, triage, or run functions, all powered by Novita AI’s models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Python integration:&lt;/strong&gt; Simply point the SDK to Novita’s endpoint (&lt;code&gt;https://api.novita.ai/v3/openai&lt;/code&gt;) and use your API key.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  &lt;strong&gt;Connect DeepSeek R1 0528 API on Third-Party Platforms&lt;/strong&gt;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hugging Face:&lt;/strong&gt; Use DeepSeek R1 0528 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent &amp;amp; Orchestration Frameworks:&lt;/strong&gt; Easily connect Novita AI with partner platforms like Continue, AnythingLLM, LangChain, Dify, and Langflow through official connectors and step-by-step integration guides.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI-Compatible API:&lt;/strong&gt; Enjoy hassle-free migration and integration with tools such as Cline and Cursor, designed for the OpenAI API standard.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  &lt;strong&gt;Showcase&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Prompt: Write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically.&lt;/p&gt;

&lt;p&gt;Prompt: Build a pilot game&lt;/p&gt;

&lt;p&gt;Prompt: Build a PDF summary web app + UI concept&lt;/p&gt;

&lt;p&gt;Novita AI is an all-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, and GPU instances: the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.&lt;/p&gt;

</description>
      <category>deepseek</category>
    </item>
    <item>
      <title>Qwen 3 Now Available on Novita AI — Claim Your $10 Free Credits</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Fri, 23 May 2025 03:30:00 +0000</pubDate>
      <link>https://dev.to/novita_ai/qwen-3-now-available-on-novita-ai-claim-your-10-free-credits-1946</link>
      <guid>https://dev.to/novita_ai/qwen-3-now-available-on-novita-ai-claim-your-10-free-credits-1946</guid>
      <description>&lt;p&gt;We’re excited to announce a strategic partnership with SGLang, a fast serving engine for large language models and vision language models. Through this collaboration, Novita AI will provide high-performance GPU cloud resources for SGLang’s ongoing research, benchmarking, and optimization efforts.&lt;/p&gt;

&lt;p&gt;SGLang is a leading inference engine that co-designs a structured generation language with a highly optimized runtime, enabling powerful performance gains such as efficient RadixAttention cache reuse and zero-overhead batch scheduling for large language and vision-language models. By aligning language-level control with backend optimizations, it empowers developers to build complex generation workflows, multi-modal applications, and parallel inference pipelines with reliability and scale. SGLang is supported by leading institutions including NVIDIA, AMD, xAI, Oracle Cloud, Google Cloud, LinkedIn, Cursor, alongside research groups at Stanford, University of California, Berkeley, and University of California, Los Angeles — evidence of strong community engagement and broad industry adoption.&lt;/p&gt;

&lt;p&gt;“SGLang’s integration of language-level primitives with runtime optimizations demonstrates the value of aligning software and hardware to unlock new performance levels,” said Junyu Huang, Co-Founder &amp;amp; COO at Novita AI. “By contributing our infrastructure and expertise, we’ve already supported the development of SGLang’s first end-to-end multi-turn reinforcement learning (RL) framework and the Prism multi-large language model serving system, and remain committed to fueling its ongoing innovations for developers everywhere.”&lt;/p&gt;

&lt;p&gt;“We’re thrilled to partner with the SGLang team,” added Junyu Huang. “Having supported their RL framework and multi-LLM serving system, we’re excited to see these achievements accelerate their work and bring powerful inference performance to applications across industries.”&lt;/p&gt;

&lt;p&gt;Novita AI is also collaborating on SGLang’s large-scale expert parallelism project, an open-source implementation designed to approach the throughput reported in the official DeepSeek blog.&lt;/p&gt;

&lt;p&gt;This collaboration reflects Novita AI’s ongoing commitment to advancing an open ecosystem of inference engines and supporting diverse research initiatives through shared infrastructure and joint development efforts.&lt;/p&gt;

&lt;p&gt;Through collaborations with pioneering open-source projects like SGLang, Novita AI continues to advance its mission of democratizing AI, making cutting-edge inference capabilities readily available to developers worldwide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About Novita AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Novita AI is an AI cloud platform that helps developers easily deploy AI models through a simple API, backed by affordable and reliable GPU cloud infrastructure. By supporting open-source libraries for LLM inference and serving, Novita AI is driving the future of AI and encouraging innovation across the industry.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>LLM Dedicated Endpoint on Novita AI: Custom Models, Usage-Based Pricing, and DevOps-Free Scaling</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Wed, 14 May 2025 04:00:00 +0000</pubDate>
      <link>https://dev.to/novita_ai/llm-dedicated-endpoint-on-novita-ai-custom-models-usage-based-pricing-and-devops-free-scaling-2bgk</link>
      <guid>https://dev.to/novita_ai/llm-dedicated-endpoint-on-novita-ai-custom-models-usage-based-pricing-and-devops-free-scaling-2bgk</guid>
      <description>&lt;p&gt;Want to ship your own fine-tuned LLMs, without babysitting GPUs or racking up idle costs?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/dedicated-endpoint?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;&lt;strong&gt;Novita AI’s LLM Dedicated Endpoint&lt;/strong&gt;&lt;/a&gt; gives you true flexibility: run your custom models, pay only for tokens used, and let Novita handle deployment and scaling.&lt;/p&gt;

&lt;p&gt;Compared to &lt;a href="https://novita.ai/models?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;LLM Public APIs&lt;/a&gt;, it’s your stack, your way. Compared to raw &lt;a href="https://novita.ai/gpus?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;GPU hosting&lt;/a&gt;, you get predictable pricing and a pro team to keep your models running smoothly.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is an LLM Dedicated Endpoint?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;LLM Dedicated Endpoint&lt;/strong&gt; is your own private API for running any model you want: fine-tuned, proprietary, or mainstream. No noisy neighbors, no shared resources. Novita AI handles all the infrastructure; you just send requests. &lt;a href="https://novita.ai/dedicated-endpoint?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;Learn more&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Key Features&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bring Your Own Model:&lt;/strong&gt; Deploy your fine-tuned or custom LLMs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No Idle GPU Bills:&lt;/strong&gt; Pay only for tokens used (usage-based, not hourly).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Auto-Scales Instantly:&lt;/strong&gt; Handles spikes, no manual scaling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Full Isolation:&lt;/strong&gt; Dedicated compute, your data only.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enterprise Uptime, Low Latency:&lt;/strong&gt; SLAs for mission-critical apps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zero-DevOps:&lt;/strong&gt; Monitoring, scaling, and patching done for you.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;LLM Public Endpoints vs LLM Dedicated Endpoint&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Novita AI offers two LLM API flavors—pick what fits your workflow:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1.&lt;/strong&gt; &lt;a href="https://novita.ai/models/llm?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;&lt;strong&gt;LLM Public Endpoints&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Plug-and-play APIs for open-source models like Llama, DeepSeek, Qwen, Gemma, and more.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Prototyping, hackathons, projects with standard LLMs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Fast to integrate&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No servers or infra&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scales to production&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.&lt;/strong&gt; &lt;a href="https://novita.ai/dedicated-endpoint?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;&lt;strong&gt;LLM Dedicated Endpoint&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Your own API for custom/fine-tuned models, including proprietary LLMs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
When you need control, privacy, or custom models (think: internal tools, production SaaS, unique data).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Private, dedicated resources&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Custom SLAs and scaling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Usage-based pricing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Expert deployment and monitoring&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Need standard models, fast? Go &lt;strong&gt;Public Endpoints&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Need your own model, full control, and pro support? Go &lt;a href="https://novita.ai/dedicated-endpoint?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;LLM Dedicated Endpoint&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Developers Love It&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Drop-in API:&lt;/strong&gt; Keep your code—just update the endpoint URL.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No Cloud Headaches:&lt;/strong&gt; No need for Dockerfiles, GPU quotas, or on-call alerts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Transparent Pricing:&lt;/strong&gt; No surprises. Billed for tokens, with optional daily minimums.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;24/7 Support:&lt;/strong&gt; Hit a snag? Ping &lt;a href="https://discord.gg/YyPRAzwp7P" rel="noopener noreferrer"&gt;Novita’s support team&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
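&lt;p&gt;“Just update the endpoint URL” is easiest when the URL lives in configuration rather than in code. The sketch below reads it from an environment variable and falls back to Novita AI’s public OpenAI-compatible endpoint. The variable name &lt;code&gt;NOVITA_BASE_URL&lt;/code&gt;, the helper &lt;code&gt;resolve_base_url&lt;/code&gt;, and the &lt;code&gt;example.invalid&lt;/code&gt; URL are hypothetical; the real Dedicated Endpoint URL is the one Novita AI provides.&lt;/p&gt;

```python
import os

# Novita AI's public, OpenAI-compatible endpoint.
PUBLIC_BASE_URL = "https://api.novita.ai/v3/openai"


def resolve_base_url(env=None):
    """Pick the API base URL from configuration, defaulting to the public endpoint.

    NOVITA_BASE_URL is a hypothetical variable name: point it at the URL you
    receive for a Dedicated Endpoint and the rest of the code stays unchanged.
    """
    env = os.environ if env is None else env
    return env.get("NOVITA_BASE_URL", PUBLIC_BASE_URL).rstrip("/")


print(resolve_base_url({}))  # https://api.novita.ai/v3/openai
print(resolve_base_url({"NOVITA_BASE_URL": "https://example.invalid/v1/"}))  # https://example.invalid/v1
```

&lt;p&gt;An OpenAI-style client would then be created with &lt;code&gt;base_url=resolve_base_url()&lt;/code&gt;, so switching between public and dedicated endpoints becomes a deployment setting.&lt;/p&gt;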

&lt;h2&gt;
  
  
  &lt;strong&gt;How to Get Started&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Ready to deploy?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://meet.brevo.com/novita-ai/contact-sales" rel="noopener noreferrer"&gt;Contact Novita AI Sales&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Share your requirements (QPS, latency, model type)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Novita sets up your endpoint—no DevOps needed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update your API URL and ship!&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LLM Dedicated Endpoint on Novita AI&lt;/strong&gt; is the dev-friendly way to run custom models with no ops, no idle GPU costs, and no guesswork. You focus on building; Novita keeps your models running securely, scalably, and fast.&lt;br&gt;&lt;br&gt;
Ready to launch your own LLM? &lt;a href="https://meet.brevo.com/novita-ai/contact-sales" rel="noopener noreferrer"&gt;Book a Demo&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Frequently Asked Questions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How does Novita handle scaling during traffic spikes?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Resources auto-scale based on real-time demand. You’re only billed for actual usage, not reserved capacity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I migrate from a Novita public API to a Dedicated Endpoint?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. Just update the endpoint URL: because the APIs are fully compatible, no other code changes are required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if I need guaranteed uptime and latency?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Novita offers custom SLAs for uptime, latency, and throughput, tailored to your needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is billing handled?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You pay only for tokens processed, with a minimum daily token commitment. No idle GPU bills.&lt;/p&gt;
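&lt;p&gt;The exact terms of the daily minimum are agreed with Novita AI’s sales team and are not spelled out here. Assuming the common arrangement in which a day’s charge covers the larger of actual usage and the committed volume, the arithmetic looks like the sketch below; the function name &lt;code&gt;daily_bill&lt;/code&gt; and all numbers are hypothetical, not Novita AI prices.&lt;/p&gt;

```python
def daily_bill(tokens_used, min_daily_tokens, price_per_m_tokens):
    """Hypothetical billing model: the day's charge covers at least the committed tokens."""
    billable_tokens = max(tokens_used, min_daily_tokens)
    return billable_tokens * price_per_m_tokens / 1_000_000


# Illustrative numbers only: 50M committed tokens per day at $0.50 per 1M tokens.
print(daily_bill(30_000_000, 50_000_000, 0.50))  # 25.0 (commitment applies)
print(daily_bill(80_000_000, 50_000_000, 0.50))  # 40.0 (actual usage applies)
```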

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://novita.ai/?utm_source=blog_llm&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-30b-a3b-vs-qwq-32b" rel="noopener noreferrer"&gt;&lt;em&gt;Novita AI&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>llm</category>
    </item>
    <item>
      <title>Qwen 3 Now Available on Novita AI - Claim Your $10 Free Credits</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Tue, 29 Apr 2025 10:09:20 +0000</pubDate>
      <link>https://dev.to/novita_ai/qwen-3-now-available-on-novita-ai-claim-your-10-free-credits-5d28</link>
      <guid>https://dev.to/novita_ai/qwen-3-now-available-on-novita-ai-claim-your-10-free-credits-5d28</guid>
      <description>&lt;p&gt;Alibaba’s cutting-edge Qwen 3 large language models are now live on Novita AI’s Model API platform! Instantly access the latest Qwen3–235B-A22B, Qwen3–30B-A3B, and Qwen3–32B models — all with a massive 128,000 context window and industry-leading performance.&lt;/p&gt;

&lt;p&gt;For a limited time, new users can claim &lt;a href="https://novita.ai/referral?invited_code=5W10UA&amp;amp;utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;strong&gt;$10 in free credits&lt;/strong&gt;&lt;/a&gt; to explore and build with Qwen 3.&lt;/p&gt;

&lt;p&gt;Here’s the current Qwen 3 lineup and pricing on Novita AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://novita.ai/models/llm/qwen-qwen3-235b-a22b-fp8?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;strong&gt;Qwen3-235B-A22B&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt; $0.20 / M input tokens, $0.80 / M output tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://novita.ai/models/llm/qwen-qwen3-30b-a3b-fp8?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;strong&gt;Qwen3-30B-A3B&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt; $0.10 / M input tokens, $0.45 / M output tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://novita.ai/models/llm/qwen-qwen3-32b-fp8?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;strong&gt;Qwen3-32B&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt; $0.10 / M input tokens, $0.45 / M output tokens&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
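&lt;p&gt;Comparing the three models for a given workload is a matter of applying each pair of rates. The sketch below assumes the prices listed above and a hypothetical workload of 10M input and 2M output tokens; &lt;code&gt;QWEN3_PRICES&lt;/code&gt; and &lt;code&gt;workload_cost&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
# Per-1M-token prices (USD) for the Qwen 3 models listed above: (input, output).
QWEN3_PRICES = {
    "Qwen3-235B-A22B": (0.20, 0.80),
    "Qwen3-30B-A3B": (0.10, 0.45),
    "Qwen3-32B": (0.10, 0.45),
}


def workload_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of a workload on one model at the listed prices."""
    input_price, output_price = QWEN3_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10M input tokens and 2M output tokens per model.
for name in QWEN3_PRICES:
    print(name, round(workload_cost(name, 10_000_000, 2_000_000), 2))
```

&lt;p&gt;At these rates the flagship model costs roughly twice as much as the two smaller ones for the same token volume, which is worth weighing against its benchmark gains.&lt;/p&gt;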

&lt;p&gt;Power your chatbots, apps, and workflows with state-of-the-art language models — Qwen 3 is just an API call away.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Qwen 3?
&lt;/h3&gt;

&lt;p&gt;Qwen 3 is the latest and most advanced family of large language models developed by Alibaba Cloud’s Qwen team. Building on the experience of QwQ and Qwen2.5, Qwen 3 sets a new standard for open-source AI with major improvements in reasoning, multilingualism, and agentic abilities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2AnhkYvIyB2RuXpHV4" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2AnhkYvIyB2RuXpHV4" width="1000" height="776"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features of Qwen 3
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dense and Mixture-of-Experts (MoE) models in various sizes:&lt;/strong&gt; Qwen 3 is available in both dense and MoE architectures, ranging from lightweight 0.6B and 1.7B models up to large-scale 32B (dense) and flagship 30B-A3B and 235B-A22B (MoE) variants.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hybrid thinking modes:&lt;/strong&gt; The model allows seamless switching between &lt;em&gt;thinking mode&lt;/em&gt; (for complex, step-by-step logical reasoning, math, and code generation) and &lt;em&gt;non-thinking mode&lt;/em&gt; (for fast, efficient, general-purpose chat).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Significantly enhanced reasoning:&lt;/strong&gt; Qwen 3 surpasses previous Qwen models in mathematics, code generation, and commonsense logical reasoning. It also offers more stable and controllable reasoning budgets for different tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Superior human preference alignment:&lt;/strong&gt; The model excels in creative writing, role-playing, multi-turn dialogues, and instruction following, resulting in more natural, engaging conversations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Advanced agentic capabilities:&lt;/strong&gt; Qwen 3 is designed for agent-based workflows, supporting seamless integration with external tools and precise function calling in both reasoning modes. The Qwen team reports leading performance among open-source models on complex, agent-driven tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Robust multilingual support:&lt;/strong&gt; Supporting 119 languages and dialects, Qwen 3 is capable of high-quality multilingual instruction following and translation, opening the door for truly global applications.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
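&lt;p&gt;To illustrate the hybrid thinking modes, here is a minimal sketch of Qwen 3’s “soft switch”: the Qwen team documents that adding &lt;code&gt;/think&lt;/code&gt; or &lt;code&gt;/no_think&lt;/code&gt; to a user or system message toggles reasoning turn by turn. The sketch assumes the hosted endpoint passes these tags to the model unchanged; the helper name &lt;code&gt;with_thinking_switch&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
# Sketch: Qwen 3 "soft switch" tags appended to a user turn.
# Assumption: the hosted endpoint forwards the tags to the model unchanged.


def with_thinking_switch(prompt: str, thinking: bool) -> dict:
    """Return an OpenAI-style user message ending in /think or /no_think.

    with_thinking_switch is a hypothetical helper, not part of any SDK.
    """
    tag = "/think" if thinking else "/no_think"
    return {"role": "user", "content": f"{prompt} {tag}"}


messages = [
    {"role": "system", "content": "Be a helpful assistant"},
    # A harder question: request step-by-step reasoning for this turn.
    with_thinking_switch("Is 221 a prime number?", thinking=True),
]
print(messages[-1]["content"])  # Is 221 a prime number? /think
```

&lt;p&gt;In multi-turn conversations the model follows the most recent switch, so a later &lt;code&gt;/no_think&lt;/code&gt; turn overrides an earlier &lt;code&gt;/think&lt;/code&gt;.&lt;/p&gt;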

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2AlMQ6xNFrKEPYuACW" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2AlMQ6xNFrKEPYuACW" width="1000" height="841"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmarks and Performance
&lt;/h3&gt;

&lt;p&gt;The Qwen 3 series demonstrates industry-leading performance across a comprehensive suite of AI benchmarks, excelling in coding, mathematics, general reasoning, and multilingual understanding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Flagship Model: Qwen3-235B-A22B
&lt;/h3&gt;

&lt;p&gt;The flagship model, &lt;strong&gt;Qwen3-235B-A22B&lt;/strong&gt;, consistently achieves top or near-top results when compared with the leading models available at its release, such as DeepSeek-R1, OpenAI-o1, OpenAI-o3-mini, Grok-3 Beta, and Gemini-2.5-Pro.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2Aoihni-KMiDNqZSLm" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2Aoihni-KMiDNqZSLm" width="1000" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Complex Reasoning:&lt;/strong&gt; A score of 95.6 on ArenaHard, matching or outperforming most competitors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mathematics:&lt;/strong&gt; 85.7 on AIME’24 and 81.5 on AIME’25, ahead of many commercial and open-source models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Coding:&lt;/strong&gt; 70.7 on LiveCodeBench and a CodeForces Elo rating of 2056, reflecting strong software and algorithmic skills.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multilingual &amp;amp; General Capabilities:&lt;/strong&gt; Qwen3-235B-A22B also achieves strong results on LiveBench and MultiIF, demonstrating robust real-world and multilingual understanding.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Model Efficiency and Scalability
&lt;/h3&gt;

&lt;p&gt;Qwen 3’s architectural innovations also translate to outstanding performance at smaller model sizes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2A1_BOKA6jqTRDO5CB" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2A1_BOKA6jqTRDO5CB" width="1000" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen3-32B (Dense):&lt;/strong&gt; Delivers results just behind the flagship while still outperforming most alternative models across categories.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen3-30B-A3B (MoE):&lt;/strong&gt; Outperforms QwQ-32B despite activating only about a tenth as many parameters (3B vs. 32B), showcasing Qwen’s efficient scaling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen3-4B (Dense):&lt;/strong&gt; Even this compact model can rival much larger models such as Qwen2.5-72B-Instruct, especially on reasoning and multilingual tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
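&lt;p&gt;The “tenth of the activated parameters” comparison follows from the naming scheme: in an MoE name such as Qwen3-30B-A3B, the first number is the total parameter count and the “A” number is the count activated per token, while a dense model such as QwQ-32B activates all of its parameters. The sketch below (the &lt;code&gt;param_counts&lt;/code&gt; helper is hypothetical) reads both numbers from a model name and computes the ratio.&lt;/p&gt;

```python
import re


def param_counts(name: str) -> tuple:
    """Return (total, active) parameters in billions from a Qwen-style name.

    MoE names such as "Qwen3-30B-A3B" encode the total size and the size
    activated per token ("A3B" = 3B). Dense names such as "QwQ-32B"
    activate every parameter, so total == active.
    """
    moe = re.search(r"(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B", name)
    if moe:
        return float(moe.group(1)), float(moe.group(2))
    dense = re.search(r"(\d+(?:\.\d+)?)B", name)
    if dense is None:
        raise ValueError(f"no parameter count in {name!r}")
    size = float(dense.group(1))
    return size, size


_, moe_active = param_counts("Qwen3-30B-A3B")
_, dense_active = param_counts("QwQ-32B")
print(f"{moe_active / dense_active:.1%}")  # 3 / 32 -> 9.4%
```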

&lt;h3&gt;
  
  
  How to Access Qwen 3 on Novita AI
&lt;/h3&gt;

&lt;p&gt;Getting started with Qwen 3 is fast, simple, and risk-free on Novita AI. Thanks to the Referral Program, you’ll receive &lt;strong&gt;$10 in free credits&lt;/strong&gt; — enough to fully explore Qwen 3’s power, build prototypes, and even launch your first use case without any upfront cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use the Playground (No Coding Required)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instant Access&lt;/strong&gt;: &lt;a href="https://novita.ai/referral?invited_code=5W10UA&amp;amp;utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Sign up&lt;/a&gt;, claim your free credits, and start experimenting with Qwen 3 and other top models in seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive UI&lt;/strong&gt;: Test prompts, inspect chain-of-thought reasoning, and view results in real time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model Comparison&lt;/strong&gt;: Effortlessly switch between Qwen 3, Llama 4, DeepSeek, and more to find the perfect fit for your needs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Integrate via API (For Developers)
&lt;/h3&gt;

&lt;p&gt;Connect Qwen 3 to your applications, workflows, or chatbots through Novita AI’s unified REST API, with no model weights or infrastructure to manage. Novita AI provides client examples for Python, Node.js, cURL, and more, plus advanced parameter controls for power users.&lt;/p&gt;

&lt;h4&gt;
  
  
  Option 1: Direct API Integration (Python Example)
&lt;/h4&gt;

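&lt;p&gt;The snippet below calls Qwen 3 through the official &lt;code&gt;openai&lt;/code&gt; Python package, pointed at Novita AI’s endpoint. Replace the placeholder with your own API key:&lt;br&gt;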
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.novita.ai/v3/openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;YOUR Novita AI API Key&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen3-235b-a22b-fp8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt; &lt;span class="c1"&gt;# or False
&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;
&lt;span class="n"&gt;system_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Be a helpful assistant&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;top_p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;min_p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="n"&gt;presence_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;frequency_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;repetition_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;response_format&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;chat_completion_res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi there!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;presence_penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;presence_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;frequency_penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;frequency_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repetition_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;repetition_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;min_p&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chat_completion_res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat_completion_res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
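&lt;p&gt;Where installing the &lt;code&gt;openai&lt;/code&gt; package is not an option, the same call can be made over plain HTTPS with Python’s standard library. This is a sketch under two assumptions the article does not state: the REST path is the base URL plus &lt;code&gt;/chat/completions&lt;/code&gt; (the layout the OpenAI client uses), and authentication uses a Bearer token. The &lt;code&gt;NOVITA_API_KEY&lt;/code&gt; environment variable name is hypothetical.&lt;/p&gt;

```python
import json
import os
import urllib.request

# Assumptions (not stated in the article): the REST path is the base URL
# plus "/chat/completions", and the key is sent as a Bearer token.
BASE_URL = "https://api.novita.ai/v3/openai"
MODEL = "qwen/qwen3-235b-a22b-fp8"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build, but do not send, a non-streaming chat completion request."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 2048,
        "stream": False,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# NOVITA_API_KEY is a hypothetical variable name; the request is only
# sent when it is set.
api_key = os.environ.get("NOVITA_API_KEY")
if api_key:
    with urllib.request.urlopen(build_chat_request(api_key, "Hi there!")) as resp:
        reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])
```

&lt;p&gt;Reading the key from the environment keeps it out of source code, and separating request construction from sending makes the payload easy to inspect before any network call.&lt;/p&gt;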



&lt;h4&gt;
  
  
  Option 2: Multi-Agent Workflows with OpenAI Agents SDK
&lt;/h4&gt;

&lt;p&gt;Build advanced multi-agent systems by integrating Novita AI with the &lt;a href="https://novita.ai/docs/guides/integration-openai-agents-sdk?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;OpenAI Agents SDK&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plug-and-play:&lt;/strong&gt; Use Novita AI’s LLMs in any OpenAI Agents workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Supports handoffs, routing, and tool use:&lt;/strong&gt; Design agents that can delegate, triage, or run functions, all powered by Novita AI’s models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Python integration:&lt;/strong&gt; Simply point the SDK to Novita’s endpoint (&lt;code&gt;https://api.novita.ai/v3/openai&lt;/code&gt;) and use your API key.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Connect the Qwen 3 API to Third-Party Platforms
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://novita.ai/docs/guides/huggingface?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;strong&gt;Hugging Face&lt;/strong&gt;&lt;/a&gt;: Use Qwen 3 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent &amp;amp; Orchestration Frameworks:&lt;/strong&gt; Easily connect Novita AI with partner platforms like &lt;a href="https://novita.ai/docs/guides/continue?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Continue&lt;/a&gt;, &lt;a href="https://novita.ai/docs/guides/anythingllm?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;AnythingLLM&lt;/a&gt;, &lt;a href="https://novita.ai/docs/guides/langchain?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;, &lt;a href="https://novita.ai/docs/guides/dify?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Dify&lt;/a&gt;, and &lt;a href="https://novita.ai/docs/guides/langflow?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Langflow&lt;/a&gt; through official connectors and step-by-step integration guides.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI-Compatible API:&lt;/strong&gt; Because the API follows the OpenAI standard, tools built for it, such as &lt;a href="https://blogs.novita.ai/how-to-integrate-novita-ai-llm-api-with-cline-in-vscode/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt; and &lt;a href="https://novita.ai/docs/guides/cursor?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, can be migrated or integrated with minimal effort.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best Practices for Optimal Qwen 3 Performance
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Sampling Parameter Settings&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thinking Mode&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;enable_thinking=True&lt;/code&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Temperature:&lt;/strong&gt; 0.6&lt;br&gt;&lt;br&gt;
&lt;strong&gt;TopP:&lt;/strong&gt; 0.95&lt;br&gt;&lt;br&gt;
&lt;strong&gt;TopK:&lt;/strong&gt; 20&lt;br&gt;&lt;br&gt;
&lt;strong&gt;MinP:&lt;/strong&gt; 0&lt;br&gt;&lt;br&gt;
&lt;em&gt;Tip:&lt;/em&gt; Avoid greedy decoding to prevent degraded performance or repetitive outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-Thinking Mode&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;enable_thinking=False&lt;/code&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Temperature:&lt;/strong&gt; 0.7&lt;br&gt;&lt;br&gt;
&lt;strong&gt;TopP:&lt;/strong&gt; 0.8&lt;br&gt;&lt;br&gt;
&lt;strong&gt;TopK:&lt;/strong&gt; 20&lt;br&gt;&lt;br&gt;
&lt;strong&gt;MinP:&lt;/strong&gt; 0&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repetition Control&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For supported frameworks, adjust &lt;code&gt;presence_penalty&lt;/code&gt; between &lt;strong&gt;0&lt;/strong&gt; and &lt;strong&gt;2&lt;/strong&gt; to reduce repetitions.&lt;br&gt;&lt;br&gt;
&lt;em&gt;Note:&lt;/em&gt; Higher values may cause some language mixing or a slight decrease in model performance.&lt;/p&gt;
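&lt;p&gt;The presets above can be packaged for the OpenAI Python client used earlier. Because &lt;code&gt;top_k&lt;/code&gt; and &lt;code&gt;min_p&lt;/code&gt; are not standard Chat Completions parameters, this sketch sends them in &lt;code&gt;extra_body&lt;/code&gt;, as the earlier example does. It leaves out &lt;code&gt;enable_thinking&lt;/code&gt;, which is a chat-template argument in Qwen’s own examples; whether Novita AI accepts it as a request field is an assumption to verify. The names &lt;code&gt;QWEN3_SAMPLING&lt;/code&gt; and &lt;code&gt;sampling_kwargs&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
# Recommended Qwen 3 sampling presets, copied from the list above.
QWEN3_SAMPLING = {
    "thinking": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
    "non_thinking": {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0},
}


def sampling_kwargs(thinking: bool, presence_penalty: float = 0.0) -> dict:
    """Build keyword arguments for client.chat.completions.create().

    temperature, top_p and presence_penalty are standard parameters;
    top_k and min_p go through extra_body because the OpenAI client
    does not accept them as named arguments.
    """
    # The guidance above suggests presence_penalty between 0 and 2.
    if not 2.0 >= presence_penalty >= 0.0:
        raise ValueError("presence_penalty should be between 0 and 2")
    preset = QWEN3_SAMPLING["thinking" if thinking else "non_thinking"]
    return {
        "temperature": preset["temperature"],  # non-zero: avoids greedy decoding
        "top_p": preset["top_p"],
        "presence_penalty": presence_penalty,
        "extra_body": {"top_k": preset["top_k"], "min_p": preset["min_p"]},
    }
```

&lt;p&gt;Usage: &lt;code&gt;client.chat.completions.create(model=model, messages=messages, **sampling_kwargs(thinking=True))&lt;/code&gt;.&lt;/p&gt;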

&lt;p&gt;&lt;strong&gt;2. Output Length Recommendations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For most queries, set the output length to &lt;strong&gt;32,768 tokens&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For complex benchmarking tasks (such as math or programming competitions), increase the max output length to &lt;strong&gt;38,912 tokens&lt;/strong&gt; for more comprehensive responses.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
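&lt;p&gt;These budgets are far larger than the &lt;code&gt;max_tokens = 2048&lt;/code&gt; used in the API example earlier, which may cut off long thinking-mode answers. A small helper can pick the budget per task; the &lt;code&gt;provider_cap&lt;/code&gt; argument is a hypothetical stand-in for whatever output limit the hosted endpoint enforces, which this article does not state.&lt;/p&gt;

```python
from typing import Optional

# Output-length recommendations from the list above, in tokens.
DEFAULT_MAX_TOKENS = 32_768
COMPETITION_MAX_TOKENS = 38_912


def recommended_max_tokens(competition_task: bool, provider_cap: Optional[int] = None) -> int:
    """Return a max_tokens value following the Qwen 3 guidance.

    provider_cap is a hypothetical per-endpoint output limit; when given,
    the recommendation is clamped so the request stays within it.
    """
    budget = COMPETITION_MAX_TOKENS if competition_task else DEFAULT_MAX_TOKENS
    return budget if provider_cap is None else min(budget, provider_cap)


print(recommended_max_tokens(competition_task=False))  # 32768
print(recommended_max_tokens(competition_task=True, provider_cap=16_384))  # 16384
```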

&lt;p&gt;&lt;strong&gt;3. Standardizing Output Format&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Math Problems:&lt;/strong&gt; Include this in your prompt: &lt;em&gt;“Please reason step by step, and put your final answer within \boxed{}.”&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multiple-Choice Questions:&lt;/strong&gt; Standardize responses using a JSON field: &lt;em&gt;“Please show your choice in the &lt;code&gt;answer&lt;/code&gt; field with only the choice letter, e.g., &lt;code&gt;"answer": "C"&lt;/code&gt;.”&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
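&lt;p&gt;Standardized formats pay off when answers are graded automatically. Below is a sketch of two hypothetical helpers that pull the final answer back out: one returns the contents of the last &lt;code&gt;\boxed{...}&lt;/code&gt; while respecting nested braces (so &lt;code&gt;\boxed{\frac{1}{2}}&lt;/code&gt; works), and one reads the letter from an &lt;code&gt;"answer": "C"&lt;/code&gt; field. Both return &lt;code&gt;None&lt;/code&gt; when the expected format is missing.&lt;/p&gt;

```python
import re
from typing import Optional

# Matches a JSON-style field such as "answer": "C".
ANSWER_FIELD = re.compile(r'"answer"\s*:\s*"([A-Z])"')


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces: treat as no answer


def extract_choice(text: str) -> Optional[str]:
    """Return the letter from an "answer": "X" field, if present."""
    match = ANSWER_FIELD.search(text)
    return match.group(1) if match else None


print(extract_boxed(r"so x = \boxed{\frac{1}{2}}"))  # \frac{1}{2}
print(extract_choice('Reasoning... {"answer": "C"}'))  # C
```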

&lt;p&gt;&lt;strong&gt;4. Conversation History Management&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In multi-turn conversations, include only the final output in the chat history. Omit any intermediate “thinking” content.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Qwen 3’s official Jinja2 chat template handles this automatically. For frameworks that do not use that template, apply this practice manually.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
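&lt;p&gt;For frameworks that do not use the official chat template, the history rule can be applied by hand. This sketch assumes the reasoning arrives inline in the reply, wrapped in &lt;code&gt;&amp;lt;think&amp;gt;…&amp;lt;/think&amp;gt;&lt;/code&gt; tags as in Qwen 3’s raw output; some OpenAI-compatible providers return it in a separate field instead, in which case storing only &lt;code&gt;message.content&lt;/code&gt; is enough. The helper names are hypothetical.&lt;/p&gt;

```python
import re

# Pattern for an inline reasoning block. In the regex, \x3c is an escaped
# less-than sign, so this matches the opening think tag, the reasoning
# text (across newlines, non-greedily), the closing tag and trailing space.
THINK_BLOCK = re.compile(r"\x3cthink>.*?\x3c/think>\s*", re.DOTALL)


def strip_thinking(reply: str) -> str:
    """Remove inline reasoning blocks, keeping only the final answer."""
    return THINK_BLOCK.sub("", reply).strip()


def append_assistant_turn(history: list, reply: str) -> None:
    """Add an assistant turn to chat history without its reasoning."""
    history.append({"role": "assistant", "content": strip_thinking(reply)})


history = [{"role": "user", "content": "What is 2 + 2?"}]
append_assistant_turn(history, "\x3cthink>\n2 plus 2 is 4.\n\x3c/think>\n\nThe answer is 4.")
print(history[-1]["content"])  # The answer is 4.
```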

&lt;p&gt;Following these recommendations helps Qwen 3 deliver accurate, high-quality results across use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Qwen 3 delivers strong performance on coding, reasoning, and multilingual tasks, with model sizes to fit projects of any scale. Ready to see it in action?&lt;/p&gt;

&lt;p&gt;Try the &lt;a href="https://novita.ai/models/llm/qwen-qwen3-235b-a22b-fp8?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Qwen 3 demo&lt;/a&gt; on Novita AI now and &lt;a href="https://novita.ai/referral?invited_code=5W10UA&amp;amp;utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;claim your free credits&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Originally published on &lt;a href="https://novita.ai/?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;em&gt;Novita AI&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;em&gt;Novita AI&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>qwen</category>
    </item>
    <item>
      <title>Earn $500 Free Credits: Build Faster with Deepseek, Llama &amp; Qwen on Novita AI</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Thu, 24 Apr 2025 05:46:08 +0000</pubDate>
      <link>https://dev.to/novita_ai/earn-500-free-credits-build-faster-with-deepseek-llama-qwen-on-novita-ai-553d</link>
      <guid>https://dev.to/novita_ai/earn-500-free-credits-build-faster-with-deepseek-llama-qwen-on-novita-ai-553d</guid>
      <description>&lt;p&gt;Novita AI is offering an exclusive, limited-time opportunity! With the Referral Program, you can earn up to $500 in LLM API credits by simply referring your friends. Here’s the best part: both you and your referral will receive $10 in credits, unlocking access to top-tier models like DeepSeek, Llama and Qwen.&lt;/p&gt;

&lt;p&gt;These credits can power your next big project, whether you’re working with Hugging Face, Anything LLM, Langflow, Continue, Helicone, Dify, Cursor, LobeChat, and many more.&lt;br&gt;
Don’t miss out on the opportunity to supercharge your AI applications.&lt;/p&gt;

&lt;p&gt;👉 Sign up for the Novita AI Referral Program and begin earning credits now.&lt;/p&gt;

&lt;h3&gt;Why Novita AI Is the Trusted Choice&lt;/h3&gt;

&lt;p&gt;Artificial Analysis, a leading AI model evaluation platform, ranks Novita AI alongside industry leaders such as Together AI and Fireworks AI, reinforcing Novita AI’s reputation as a trusted choice for developers worldwide.&lt;/p&gt;

&lt;p&gt;Additionally, OpenRouter recognizes Novita AI as one of the most cost-effective LLM API providers.&lt;/p&gt;

&lt;p&gt;Novita AI also serves as the official inference provider on Hugging Face.&lt;/p&gt;

&lt;h3&gt;4 Easy Steps to Claim $10 in API Credits&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visit the Referral Program Page:&lt;/strong&gt; Head to the official page to begin.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enter Your Invite Code:&lt;/strong&gt; Use either the official invite code 5W10UA or your personal one to get started.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create Your Novita AI Account:&lt;/strong&gt; Sign up using your email, Google, Hugging Face, or GitHub account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Verify Your GitHub Account:&lt;/strong&gt; Complete the verification process to unlock your credits.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;3 Ways to Share and Earn Up to $500 in Credits&lt;/h3&gt;

&lt;p&gt;Earn up to $500 in LLM API credits by referring others. Here’s how you can share and earn:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Copy Your Referral Link:&lt;/strong&gt; &lt;a href="https://novita.ai/referral?invited_code=xxx" rel="noopener noreferrer"&gt;https://novita.ai/referral?invited_code=xxx&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Copy Your Own Referral Code.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Share on Social Media:&lt;/strong&gt; Post your referral link on platforms like Twitter (X), LinkedIn, Facebook, or anywhere else developers are hanging out.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The more you share, the more you earn!&lt;/p&gt;

&lt;h3&gt;LLM API on Novita AI&lt;/h3&gt;

&lt;p&gt;You can use your credits across the entire range of LLM APIs available on Novita AI. Below is a comprehensive list of all supported LLM APIs on the platform.&lt;/p&gt;

&lt;h3&gt;Integrated Projects &amp;amp; SDKs&lt;/h3&gt;

&lt;p&gt;Novita AI supports seamless integration with many leading open-source projects and developer tools. Once you get the LLM API credits, you can call the API on the following platforms.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Novita AI &amp;amp; OpenAI Agents SDK&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; AnythingLLM&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Dify&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Helicone&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Hugging Face&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Langflow&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Continue&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Cursor&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; LangChain&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Skyvern&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; LobeChat&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; ai-gradio&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Langfuse&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Verba&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Portkey&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; DocsGPT&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; LlamaIndex&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; LoLLMS WebUI&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; CodeCompanion.nvim&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Page Assist&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; DeepSearcher&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Start Earning and Building with Novita AI Today!&lt;/h3&gt;

&lt;p&gt;Don’t miss out on the chance to earn up to $500 in credits, unlock powerful LLM API models, and supercharge your projects with Novita AI. Whether you’re building AI-powered tools, developing advanced agents, or creating the next big thing in AI, Novita AI is your trusted partner.&lt;/p&gt;

&lt;p&gt;👉 Sign up now, share your link, and start building!&lt;/p&gt;

&lt;p&gt;Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>llama</category>
      <category>deepseek</category>
      <category>qwen</category>
    </item>
  </channel>
</rss>
