<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Novita AI</title>
    <description>The latest articles on DEV Community by Novita AI (@novita_ai).</description>
    <link>https://dev.to/novita_ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1183161%2F844aefc6-6de4-4095-92b6-6cc3eb4d8d2d.png</url>
      <title>DEV Community: Novita AI</title>
      <link>https://dev.to/novita_ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/novita_ai"/>
    <language>en</language>
    <item>
      <title>How to Use Kimi-K2 in Claude Code on Windows and Mac</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Tue, 15 Jul 2025 09:46:32 +0000</pubDate>
      <link>https://dev.to/novita_ai/how-to-use-kimi-k2-in-claude-code-on-windows-and-mac-18p</link>
      <guid>https://dev.to/novita_ai/how-to-use-kimi-k2-in-claude-code-on-windows-and-mac-18p</guid>
      <description>&lt;p&gt;Claude Code offers more powerful agent capabilities than traditional code editors like Cursor. By integrating &lt;a href="https://novita.ai/models/llm/moonshotai-kimi-k2-instruct?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=use-kimi-k2-in-claude-code" rel="noopener noreferrer"&gt;Kimi-K2 through Novita AI’s platform&lt;/a&gt;, developers can access enterprise-grade AI functionality at a fraction of the cost. This guide covers setting up Kimi-K2 with Claude Code on both Windows and Mac systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is Claude Code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh2kxdclqmvzvc056t135.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh2kxdclqmvzvc056t135.png" width="793" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="http://www.anthropic.com/claude-code" rel="noopener noreferrer"&gt;http://www.anthropic.com/claude-code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude Code is an agentic command-line tool that changes the way developers interact with AI for coding tasks. Unlike traditional code editors such as Cursor, it offers more powerful agent capabilities.&lt;/p&gt;

&lt;p&gt;This innovative tool enables developers to delegate complex coding tasks directly from their terminal. It transforms natural language descriptions into fully functional code, making it an indispensable asset for modern development workflows.&lt;/p&gt;

&lt;p&gt;The tool operates as an interactive session where developers can describe their requirements in plain English. Claude Code intelligently generates, modifies, and optimizes code accordingly. Its advanced understanding of context and project structure allows it to make informed decisions about code architecture, dependencies, and implementation patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Use Kimi-K2 in Claude Code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Kimi-K2 presents a compelling alternative to traditional Claude models, offering similar capabilities at significantly reduced costs. The economic advantages are substantial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Kimi-K2 on Novita AI: $0.57 per 1M input tokens and $2.30 per 1M output tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Claude Sonnet: $3 per 1M input tokens and $15 per 1M output tokens&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This represents an 81% cost reduction for input tokens and an 85% reduction for output tokens.&lt;/p&gt;
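&lt;p&gt;As a quick sanity check, the savings can be estimated with a few lines of Python (the workload size below is hypothetical; the prices are the per-1M-token figures listed above):&lt;/p&gt;

```python
# Back-of-envelope cost comparison using the per-1M-token prices above.
KIMI_IN, KIMI_OUT = 0.57, 2.30       # Kimi-K2 on Novita AI
CLAUDE_IN, CLAUDE_OUT = 3.00, 15.00  # Claude Sonnet

def cost(tokens_in, tokens_out, price_in, price_out):
    """Total USD cost for a workload, given per-1M-token prices."""
    return tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out

# A hypothetical monthly workload: 50M input tokens, 10M output tokens.
print(f"Kimi-K2:       ${cost(50e6, 10e6, KIMI_IN, KIMI_OUT):.2f}")      # $51.50
print(f"Claude Sonnet: ${cost(50e6, 10e6, CLAUDE_IN, CLAUDE_OUT):.2f}")  # $300.00
```

&lt;p&gt;For this example workload, the Kimi-K2 bill comes to roughly a sixth of the Claude Sonnet figure, consistent with the percentages above.&lt;/p&gt;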

&lt;p&gt;Beyond cost savings, Kimi-K2 through Novita AI provides an Anthropic-compatible LLM API with higher rate limits than official channels. This compatibility ensures seamless integration with existing Claude Code workflows while offering improved performance and reliability.&lt;/p&gt;

&lt;p&gt;The combination delivers enterprise-grade AI capabilities without the premium pricing. This makes advanced AI development accessible to a broader range of developers and organizations.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Getting Your API Key on Novita AI&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://novita.ai/user/register?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=use-kimi-k2-in-claude-code" rel="noopener noreferrer"&gt;Sign up for a Novita AI account&lt;/a&gt; to get started with free trial credits. Navigate to the &lt;a href="https://novita.ai/settings/key-management?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=use-kimi-k2-in-claude-code" rel="noopener noreferrer"&gt;Key Management page&lt;/a&gt; in your dashboard and click “Create New Key.”&lt;/p&gt;

&lt;p&gt;Copy the generated API key immediately and store it securely – it won’t be displayed again. You’ll need this key for the configuration steps below.&lt;/p&gt;
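&lt;p&gt;One way to keep the key out of your source files is to read it from an environment variable. The helper below is a hypothetical sketch (the &lt;code&gt;NOVITA_API_KEY&lt;/code&gt; variable name is our own convention, not an official one):&lt;/p&gt;

```python
import os

# Hypothetical helper (not part of any SDK): read the key from an
# environment variable so it is never hardcoded in source files.
def load_api_key(env_var="NOVITA_API_KEY"):
    """Return the API key from the environment, or fail loudly."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} before running")
    return key
```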

&lt;h2&gt;
  
  
  &lt;strong&gt;Installing Claude Code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before installing Claude Code, ensure your system meets the minimum requirements. Node.js 18 or higher must be installed on your local environment. You can verify your Node.js version by running &lt;code&gt;node --version&lt;/code&gt; in your terminal.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;For Windows&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Open Command Prompt and execute the following commands:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;cmd&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;npm install -g @anthropic-ai/claude-code
npx win-claude-code@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The global installation ensures Claude Code is accessible from any directory on your system. The &lt;code&gt;npx win-claude-code@latest&lt;/code&gt; command downloads and runs the latest Windows-specific version.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;For Mac&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Open Terminal and run:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;bash&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;npm install -g @anthropic-ai/claude-code
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Mac users can proceed directly with the global installation without requiring additional platform-specific commands. The installation process automatically configures the necessary dependencies and PATH variables.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Setting Up Environment Variables&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Environment variables configure Claude Code to use Kimi-K2 through Novita AI’s API endpoints. These variables tell Claude Code where to send requests and how to authenticate.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;For Windows&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Open Command Prompt and set the following environment variables:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;cmd&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;set ANTHROPIC_BASE_URL=https://api.novita.ai/anthropic
set ANTHROPIC_AUTH_TOKEN=&amp;lt;Novita API Key&amp;gt;
set ANTHROPIC_MODEL=moonshotai/kimi-k2-instruct
set ANTHROPIC_SMALL_FAST_MODEL=moonshotai/kimi-k2-instruct
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Replace &lt;code&gt;&amp;lt;Novita API Key&amp;gt;&lt;/code&gt; with your actual API key obtained from the Novita AI platform. These variables remain active for the current session and must be reset if you close the Command Prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;For Mac&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Open Terminal and export the following environment variables:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;bash&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;export ANTHROPIC_BASE_URL="https://api.novita.ai/anthropic"
export ANTHROPIC_AUTH_TOKEN="&amp;lt;Novita API Key&amp;gt;"
export ANTHROPIC_MODEL="moonshotai/kimi-k2-instruct"
export ANTHROPIC_SMALL_FAST_MODEL="moonshotai/kimi-k2-instruct"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As on Windows, replace &lt;code&gt;&amp;lt;Novita API Key&amp;gt;&lt;/code&gt; with your actual key. These exports last only for the current Terminal session; add them to your shell profile to persist them.&lt;/p&gt;
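&lt;p&gt;If Claude Code fails to connect, a common cause is a missing or empty variable. The small Python sketch below (a hypothetical helper, not part of Claude Code) lists any of the four variables from this section that are unset:&lt;/p&gt;

```python
import os

# Hypothetical sanity check, not part of Claude Code: list any of the
# four variables from this section that are unset or empty.
REQUIRED = [
    "ANTHROPIC_BASE_URL",
    "ANTHROPIC_AUTH_TOKEN",
    "ANTHROPIC_MODEL",
    "ANTHROPIC_SMALL_FAST_MODEL",
]

def missing_vars(environ=os.environ):
    """Return the names of required variables that are not set."""
    return [name for name in REQUIRED if not environ.get(name)]

if __name__ == "__main__":
    print("Missing:", missing_vars() or "none")
```

&lt;p&gt;Run it in the same shell session where you set the variables, before launching Claude Code.&lt;/p&gt;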

&lt;h2&gt;
  
  
  &lt;strong&gt;Starting Claude Code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;With installation and configuration complete, you can now start Claude Code in your project directory. Navigate to your desired project location using the &lt;code&gt;cd&lt;/code&gt; command:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;cd &amp;lt;your-project-directory&amp;gt;
claude .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The dot (.) parameter instructs Claude Code to operate in the current directory. Upon startup, you’ll see the Claude Code prompt appear in an interactive session.&lt;/p&gt;

&lt;p&gt;This indicates the tool is ready to receive your instructions. The interface provides a clean, intuitive environment for natural language programming interactions.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Building Your First Project&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Claude Code excels at transforming detailed project descriptions into functional applications. After entering your prompt, press Enter to begin the task. Claude Code will analyze your requirements, create the necessary files, implement the functionality, and provide a complete project structure with documentation.&lt;/p&gt;

&lt;p&gt;Here’s an example of how to create a Python Flask web app with MBTI personality guessing game:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zrw8747f9a7a8xya22r.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zrw8747f9a7a8xya22r.gif" width="1280" height="576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Using Claude Code in VSCode or Cursor&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Claude Code integrates seamlessly with popular development environments. It enhances your existing workflow rather than replacing it.&lt;/p&gt;

&lt;p&gt;You can use Claude Code directly in the terminal within VSCode or Cursor. This maintains access to your familiar development tools while leveraging AI assistance.&lt;/p&gt;

&lt;p&gt;Additionally, Claude Code plugins are available for both VSCode and Cursor. These plugins provide deeper integration with these editors, offering inline AI assistance, code suggestions, and project management features directly within your IDE interface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsnsub22iatxlnjr7enl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsnsub22iatxlnjr7enl.png" alt="claude code in cursor" width="800" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The terminal integration allows you to run Claude Code commands without leaving your development environment. This creates a streamlined workflow for AI-assisted development.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Help and Documentation Resources&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Claude Code includes comprehensive help documentation accessible through the &lt;code&gt;/help&lt;/code&gt; command. This command displays available commands, usage examples, and troubleshooting information.&lt;/p&gt;

&lt;p&gt;The help system is context-aware, providing relevant information based on your current project and session state.&lt;/p&gt;

&lt;p&gt;For additional support, Novita AI provides &lt;a href="https://novita.ai/docs/guides/integration-claude-code?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=use-kimi-k2-in-claude-code" rel="noopener noreferrer"&gt;extensive documentation&lt;/a&gt;. This covers advanced configuration options, API usage patterns, and best practices.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://docs.anthropic.com/en/docs/claude-code/overview" rel="noopener noreferrer"&gt;Anthropic documentation&lt;/a&gt; offers detailed information about Claude Code’s capabilities and features.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://novita.ai/docs/guides/integration-claude-code?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=use-kimi-k2-in-claude-code" rel="noopener noreferrer"&gt;Kimi-K2 integration with Claude Code&lt;/a&gt; through Novita AI delivers enterprise-grade capabilities at significantly reduced costs. The combination transforms natural language descriptions into functional code, dramatically accelerating development workflows. Start your journey with Kimi-K2 and Claude Code today to experience the future of AI-assisted programming.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=use-kimi-k2-in-claude-code" rel="noopener noreferrer"&gt;&lt;em&gt;Novita AI&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>kimi</category>
      <category>lowcode</category>
    </item>
    <item>
      <title>Access Free DeepSeek R1 0528 API Now</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Thu, 29 May 2025 08:48:55 +0000</pubDate>
      <link>https://dev.to/novita_ai/access-free-deepseek-r1-0528-api-now-2l7j</link>
      <guid>https://dev.to/novita_ai/access-free-deepseek-r1-0528-api-now-2l7j</guid>
      <description>&lt;p&gt;We’re excited to announce that DeepSeek AI’s latest model, &lt;a href="https://novita.ai/models/llm/deepseek-deepseek-r1-0528?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-r1-0528" rel="noopener noreferrer"&gt;DeepSeek R1&lt;/a&gt; 0528, released today, is officially available in the Novita AI Model Library. We are also the official inference partner for &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-0528" rel="noopener noreferrer"&gt;DeepSeek R1 0528&lt;/a&gt; on Hugging Face, supporting the community in bringing advanced models to production.&lt;/p&gt;

&lt;p&gt;For a limited time, new users can claim &lt;a href="https://novita.ai/referral?invited_code=5W10UA&amp;amp;utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=homepage" rel="noopener noreferrer"&gt;&lt;strong&gt;$10 in free credits&lt;/strong&gt;&lt;/a&gt; to explore and build with DeepSeek-R1 0528’s advanced reasoning capabilities.&lt;/p&gt;

&lt;p&gt;Here’s the current DeepSeek-R1 0528 pricing on Novita AI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models/llm/deepseek-deepseek-r1-0528?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-r1-0528" rel="noopener noreferrer"&gt;&lt;strong&gt;DeepSeek-R1–0528&lt;/strong&gt;&lt;/a&gt;: $0.7 / M input tokens, $2.5 / M output tokens&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;How to Access DeepSeek R1 0528 on Novita AI&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Getting started with DeepSeek R1 0528 is fast, simple, and risk-free on Novita AI. Thanks to the Referral Program, you’ll receive &lt;strong&gt;$10 in free credits&lt;/strong&gt; — enough to fully explore DeepSeek R1 0528’s power, build prototypes, and even launch your first use case without any upfront cost.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Use the Playground (No Coding Required)&lt;/strong&gt;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instant Access&lt;/strong&gt;: &lt;a href="https://novita.ai/referral?invited_code=5W10UA&amp;amp;utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Sign up&lt;/a&gt;, claim your free credits, and start experimenting with DeepSeek R1 0528 and other top models in seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive UI&lt;/strong&gt;: Test prompts, chain-of-thought reasoning, and visualize results in real time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model Comparison&lt;/strong&gt;: Effortlessly switch between Qwen 3, Llama 4, DeepSeek, and more to find the perfect fit for your needs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models/llm/deepseek-deepseek-r1-0528?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-r1-0528" rel="noopener noreferrer"&gt;Explore DeepSeek R1 0528 Demo Now&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Integrate via API (For Developers)&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Seamlessly connect DeepSeek R1 0528 to your applications, workflows, or chatbots with Novita AI’s unified REST API — no need to manage model weights or infrastructure. Novita AI offers multi-language SDKs (Python, Node.js, cURL, and more) and advanced parameter controls for power users.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Option 1: Direct API Integration (Python Example)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To get started, simply use the code snippet below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.novita.ai/v3/openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_Ntg-O34ZOS-q5bNnkb3IcixmWnmxEQBxwKWMW3es3CD7KG4PEhFE1yRTRMGS3s8zZ52hrMdz14MmI4oalaDJTw==&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek/deepseek-r1-0528&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt; &lt;span class="c1"&gt;# or False
&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;
&lt;span class="n"&gt;system_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="n"&gt;Be&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;helpful&lt;/span&gt; &lt;span class="n"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;
&lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;top_p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;min_p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="n"&gt;presence_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;frequency_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;repetition_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;response_format&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;chat_completion_res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi there!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;presence_penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;presence_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;frequency_penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;frequency_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repetition_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;repetition_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;min_p&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chat_completion_res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat_completion_res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unified endpoint:&lt;/strong&gt; &lt;code&gt;/v3/openai&lt;/code&gt; supports OpenAI’s Chat Completions API format.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flexible controls:&lt;/strong&gt; Adjust temperature, top-p, penalties, and more for tailored results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Streaming &amp;amp; batching:&lt;/strong&gt; Choose your preferred response mode.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Option 2: Multi-Agent Workflows with OpenAI Agents SDK&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Build advanced multi-agent systems by integrating Novita AI with the &lt;a href="https://novita.ai/docs/guides/integration-openai-agents-sdk?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;OpenAI Agents SDK&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plug-and-play:&lt;/strong&gt; Use Novita AI’s LLMs in any OpenAI Agents workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Supports handoffs, routing, and tool use:&lt;/strong&gt; Design agents that can delegate, triage, or run functions, all powered by Novita AI’s models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Python integration:&lt;/strong&gt; Simply point the SDK to Novita’s endpoint (&lt;code&gt;https://api.novita.ai/v3/openai&lt;/code&gt;) and use your API key.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Connect DeepSeek R1 0528 API on Third-Party Platforms&lt;/strong&gt;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://novita.ai/docs/guides/huggingface?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;strong&gt;Hugging Face&lt;/strong&gt;&lt;/a&gt;: Use DeepSeek R1 0528 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent &amp;amp; Orchestration Frameworks:&lt;/strong&gt; Easily connect Novita AI with partner platforms like &lt;a href="https://novita.ai/docs/guides/continue?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Continue&lt;/a&gt;, &lt;a href="https://novita.ai/docs/guides/anythingllm?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;AnythingLLM,&lt;/a&gt; &lt;a href="https://novita.ai/docs/guides/langchain?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;, &lt;a href="https://novita.ai/docs/guides/dify?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Dify&lt;/a&gt; and &lt;a href="https://novita.ai/docs/guides/langflow?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Langflow&lt;/a&gt; through official connectors and step-by-step integration guides.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI-Compatible API:&lt;/strong&gt; Enjoy hassle-free migration and integration with tools such as &lt;a href="https://blogs.novita.ai/how-to-integrate-novita-ai-llm-api-with-cline-in-vscode/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt; and &lt;a href="https://novita.ai/docs/guides/cursor?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, designed for the OpenAI API standard.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
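&lt;p&gt;Because the endpoint follows the OpenAI API standard, migration usually amounts to swapping the base URL and key. The snippet below is a hypothetical sketch of that idea (the function name and provider labels are our own, not part of any SDK):&lt;/p&gt;

```python
# Hypothetical helper illustrating OpenAI-compatible migration: moving an
# existing OpenAI-based tool to Novita AI only changes base_url and api_key
# (model names change too, e.g. "deepseek/deepseek-r1-0528").
def client_config(provider, api_key):
    """Return the keyword arguments an OpenAI-style client constructor needs."""
    base_urls = {
        "openai": "https://api.openai.com/v1",
        "novita": "https://api.novita.ai/v3/openai",
    }
    return {"base_url": base_urls[provider], "api_key": api_key}
```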

&lt;p&gt;&lt;a href="https://novita.ai/?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-r1-0528" rel="noopener noreferrer"&gt;&lt;em&gt;Novita AI&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is the All-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, GPU Instance — the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>deepseek</category>
    </item>
    <item>
      <title>Access Free DeepSeek R1 0528 API Now</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Thu, 29 May 2025 08:42:14 +0000</pubDate>
      <link>https://dev.to/novita_ai/access-free-deepseek-r1-0528-api-now-54n7</link>
      <guid>https://dev.to/novita_ai/access-free-deepseek-r1-0528-api-now-54n7</guid>
      <description>&lt;p&gt;We’re excited to announce that DeepSeek AI’s latest model, DeepSeek R1 0528, released today, is officially available in the Novita AI Model Library. We are also the official inference partner for DeepSeek R1 0528 on Hugging Face, supporting the community in bringing advanced models to production.&lt;/p&gt;

&lt;p&gt;For a limited time, new users can claim $10 in free credits to explore and build with DeepSeek-R1 0528’s advanced reasoning capabilities.&lt;/p&gt;

&lt;p&gt;Here’s the current DeepSeek-R1 0528 pricing on Novita AI:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DeepSeek-R1-0528:&lt;/strong&gt; $0.70 / M input tokens, $2.50 / M output tokens&lt;/p&gt;
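&lt;p&gt;At these rates, per-request cost is simple arithmetic — a quick sketch with the listed prices hard-coded:&lt;/p&gt;

```python
# Quick cost estimate at the listed DeepSeek-R1-0528 rates.
INPUT_PRICE_PER_M = 0.7   # USD per million input tokens
OUTPUT_PRICE_PER_M = 2.5  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 2,000-token prompt with a 1,000-token completion:
print(round(request_cost(2_000, 1_000), 6))  # 0.0039
```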

&lt;h3&gt;
  
  
  How to Access DeepSeek R1 0528 on Novita AI
&lt;/h3&gt;

&lt;p&gt;Getting started with DeepSeek R1 0528 is fast, simple, and risk-free on Novita AI. Thanks to the Referral Program, you’ll receive &lt;strong&gt;$10 in free credits&lt;/strong&gt; — enough to fully explore DeepSeek R1 0528’s power, build prototypes, and even launch your first use case without any upfront cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use the Playground (No Coding Required)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instant Access:&lt;/strong&gt; Sign up, claim your free credits, and start experimenting with DeepSeek R1 0528 and other top models in seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive UI:&lt;/strong&gt; Test prompts, chain-of-thought reasoning, and visualize results in real time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model Comparison:&lt;/strong&gt; Effortlessly switch between Qwen 3, Llama 4, DeepSeek, and more to find the perfect fit for your needs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Explore DeepSeek R1 0528 Demo Now&lt;/p&gt;

&lt;h3&gt;
  
  
  Integrate via API (For Developers)
&lt;/h3&gt;

&lt;p&gt;Seamlessly connect DeepSeek R1 0528 to your applications, workflows, or chatbots with Novita AI’s unified REST API — no need to manage model weights or infrastructure. Novita AI offers multi-language SDKs (Python, Node.js, cURL, and more) and advanced parameter controls for power users.&lt;/p&gt;

&lt;h4&gt;
  
  
  Option 1: Direct API Integration (Python Example)
&lt;/h4&gt;

&lt;p&gt;To get started, simply use the code snippet below:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="&amp;lt;YOUR Novita AI API Key&amp;gt;",
)

model = "deepseek/deepseek-r1-0528"
stream = True  # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
      "top_k": top_k,
      "repetition_penalty": repetition_penalty,
      "min_p": min_p
    }
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unified endpoint:&lt;/strong&gt; /v3/openai supports OpenAI’s Chat Completions API format.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flexible controls:&lt;/strong&gt; Adjust temperature, top-p, penalties, and more for tailored results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Streaming &amp;amp; batching:&lt;/strong&gt; Choose your preferred response mode.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Option 2: Multi-Agent Workflows with OpenAI Agents SDK
&lt;/h4&gt;

&lt;p&gt;Build advanced multi-agent systems by integrating Novita AI with the OpenAI Agents SDK:&lt;/p&gt;
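&lt;p&gt;For example, a minimal wiring sketch — it assumes the openai-agents Python package and its Agent/OpenAIChatCompletionsModel interface; the helper name is illustrative:&lt;/p&gt;

```python
# Sketch: serving a Novita-hosted model inside the OpenAI Agents SDK.
# Assumes the `openai-agents` package; helper name is illustrative.
NOVITA_BASE_URL = "https://api.novita.ai/v3/openai"

def build_novita_agent(api_key: str, model: str = "deepseek/deepseek-r1-0528"):
    """Return an Agent whose model calls run against Novita's endpoint."""
    # Imported lazily so the constant above stands on its own.
    from openai import AsyncOpenAI
    from agents import Agent, OpenAIChatCompletionsModel

    client = AsyncOpenAI(base_url=NOVITA_BASE_URL, api_key=api_key)
    return Agent(
        name="novita-assistant",
        instructions="Be a helpful assistant.",
        model=OpenAIChatCompletionsModel(model=model, openai_client=client),
    )

# Usage (needs a real key):
#   from agents import Runner
#   agent = build_novita_agent("YOUR_NOVITA_API_KEY")
#   result = Runner.run_sync(agent, "Hi there!")
```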

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plug-and-play:&lt;/strong&gt; Use Novita AI’s LLMs in any OpenAI Agents workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Supports handoffs, routing, and tool use:&lt;/strong&gt; Design agents that can delegate, triage, or run functions, all powered by Novita AI’s models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Python integration:&lt;/strong&gt; Simply point the SDK to Novita’s endpoint (&lt;a href="https://api.novita.ai/v3/openai" rel="noopener noreferrer"&gt;https://api.novita.ai/v3/openai&lt;/a&gt;) and use your API key.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Connect DeepSeek R1 0528 API on Third-Party Platforms
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hugging Face:&lt;/strong&gt; Use DeepSeek R1 0528 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent &amp;amp; Orchestration Frameworks:&lt;/strong&gt; Easily connect Novita AI with partner platforms like Continue, AnythingLLM, LangChain, Dify and Langflow through official connectors and step-by-step integration guides.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI-Compatible API:&lt;/strong&gt; Enjoy hassle-free migration and integration with tools such as Cline and Cursor, designed for the OpenAI API standard.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Showcase
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt; Write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt; Build a pilot game&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt; Build a PDF summary web app + UI concept&lt;/p&gt;

&lt;p&gt;Novita AI is the all-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, GPU Instances — the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.&lt;/p&gt;

</description>
      <category>deepseek</category>
    </item>
    <item>
      <title>Qwen 3 Now Available on Novita AI — Claim Your $10 Free Credits</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Fri, 23 May 2025 03:30:00 +0000</pubDate>
      <link>https://dev.to/novita_ai/qwen-3-now-available-on-novita-ai-claim-your-10-free-credits-1946</link>
      <guid>https://dev.to/novita_ai/qwen-3-now-available-on-novita-ai-claim-your-10-free-credits-1946</guid>
      <description>&lt;p&gt;We’re excited to announce a strategic partnership with SGLang, a fast serving engine for large language models and vision language models. Through this collaboration, Novita AI will provide high-performance GPU cloud resources for SGLang’s ongoing research, benchmarking, and optimization efforts.&lt;/p&gt;

&lt;p&gt;SGLang is a leading inference engine that co-designs a structured generation language with a highly optimized runtime, enabling powerful performance gains such as efficient RadixAttention cache reuse and zero-overhead batch scheduling for large language and vision-language models. By aligning language-level control with backend optimizations, it empowers developers to build complex generation workflows, multi-modal applications, and parallel inference pipelines with reliability and scale. SGLang is supported by leading institutions including NVIDIA, AMD, xAI, Oracle Cloud, Google Cloud, LinkedIn, Cursor, alongside research groups at Stanford, University of California, Berkeley, and University of California, Los Angeles — evidence of strong community engagement and broad industry adoption.&lt;/p&gt;

&lt;p&gt;“SGLang’s integration of language-level primitives with runtime optimizations demonstrates the value of aligning software and hardware to unlock new performance levels,” said Junyu Huang, Co-Founder &amp;amp; COO at Novita AI. “By contributing our infrastructure and expertise, we’ve already supported the development of SGLang’s first end-to-end multi-turn reinforcement learning (RL) framework and the Prism multi-large language model serving system, and remain committed to fueling its ongoing innovations for developers everywhere.”&lt;/p&gt;

&lt;p&gt;“We’re thrilled to partner with the SGLang team,” he added. “Having supported their RL framework and multi-LLM serving system, we’re excited to see these achievements accelerate their work and bring powerful inference performance to applications across industries.”&lt;/p&gt;

&lt;p&gt;Novita AI is also collaborating on SGLang’s large-scale expert parallelism project, an open-source implementation designed to approach the throughput benchmarks detailed in the official DeepSeek blog, partnering to bring this milestone to fruition.&lt;/p&gt;

&lt;p&gt;This collaboration reflects Novita AI’s ongoing commitment to advancing an open ecosystem of inference engines and supporting diverse research initiatives through shared infrastructure and joint development efforts.&lt;/p&gt;

&lt;p&gt;Through collaborations with pioneering open-source projects like SGLang, Novita AI continues to advance its mission of democratizing AI, making cutting-edge inference capabilities readily available to developers worldwide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About Novita AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Novita AI is an AI cloud platform that helps developers easily deploy AI models through a simple API, backed by affordable and reliable GPU cloud infrastructure. By supporting open-source libraries for LLM inference and serving — Novita AI is driving the future of AI and encouraging innovation across the industry.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>LLM Dedicated Endpoint on Novita AI: Custom Models, Usage-Based Pricing, and DevOps-Free Scaling</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Wed, 14 May 2025 04:00:00 +0000</pubDate>
      <link>https://dev.to/novita_ai/llm-dedicated-endpoint-on-novita-ai-custom-models-usage-based-pricing-and-devops-free-scaling-2bgk</link>
      <guid>https://dev.to/novita_ai/llm-dedicated-endpoint-on-novita-ai-custom-models-usage-based-pricing-and-devops-free-scaling-2bgk</guid>
      <description>&lt;p&gt;Want to ship your own fine-tuned LLMs, without babysitting GPUs or racking up idle costs?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/dedicated-endpoint?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;&lt;strong&gt;Novita AI’s LLM Dedicated Endpoint&lt;/strong&gt;&lt;/a&gt; gives you true flexibility: run your custom models, pay only for tokens used, and let Novita handle deployment and scaling.&lt;/p&gt;

&lt;p&gt;Compared to &lt;a href="https://novita.ai/models?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;LLM Public APIs&lt;/a&gt;, it’s your stack, your way. Compared to raw &lt;a href="https://novita.ai/gpus?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;GPU hosting&lt;/a&gt;, you get predictable pricing and a pro team to keep your models running smoothly.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is an LLM Dedicated Endpoint?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;LLM Dedicated Endpoint&lt;/strong&gt; is your own private API for running any model you want — fine-tuned, proprietary, or mainstream. No noisy neighbors, no shared resources. Novita AI handles all the infra, you just send requests. &lt;a href="https://novita.ai/dedicated-endpoint?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;Learn more&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Key Features&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bring Your Own Model:&lt;/strong&gt; Deploy your fine-tuned or custom LLMs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No Idle GPU Bills:&lt;/strong&gt; Pay only for tokens used (usage-based, not hourly).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Auto-Scales Instantly:&lt;/strong&gt; Handles spikes, no manual scaling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Full Isolation:&lt;/strong&gt; Dedicated compute, your data only.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enterprise Uptime, Low Latency:&lt;/strong&gt; SLAs for mission-critical apps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zero-DevOps:&lt;/strong&gt; Monitoring, scaling, and patching done for you.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;LLM Public Endpoints vs LLM Dedicated Endpoint&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Novita AI offers two LLM API flavors—pick what fits your workflow:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1.&lt;/strong&gt; &lt;a href="https://novita.ai/models/llm?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;&lt;strong&gt;LLM Public Endpoints&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Plug-and-play APIs for open-source models like Llama, DeepSeek, Qwen, Gemma, and more.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Prototyping, hackathons, projects with standard LLMs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fast to integrate&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No servers or infra&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scale to production&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.&lt;/strong&gt; &lt;a href="https://novita.ai/dedicated-endpoint?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;&lt;strong&gt;LLM Dedicated Endpoint&lt;/strong&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Your own API for custom/fine-tuned models, including proprietary LLMs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
When you need control, privacy, or custom models (think: internal tools, production SaaS, unique data).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Private, dedicated resources&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Custom SLAs and scaling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Usage-based pricing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Expert deployment and monitoring&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Need standard models, fast? Go &lt;strong&gt;Public Endpoints&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Need your own model, full control, and pro support? Go &lt;a href="https://novita.ai/dedicated-endpoint?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=%3Fp%3D11236" rel="noopener noreferrer"&gt;LLM Dedicated Endpoint&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Developers Love It&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Drop-in API:&lt;/strong&gt; Keep your code—just update the endpoint URL.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No Cloud Headaches:&lt;/strong&gt; No need for Dockerfiles, GPU quotas, or on-call alerts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Transparent Pricing:&lt;/strong&gt; No surprises. Billed for tokens, with optional daily minimums.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;24/7 Support:&lt;/strong&gt; Hit a snag? Ping &lt;a href="https://discord.gg/YyPRAzwp7P" rel="noopener noreferrer"&gt;Novita’s support team&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to Get Started&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Ready to deploy?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://meet.brevo.com/novita-ai/contact-sales" rel="noopener noreferrer"&gt;Contact Novita AI Sales&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Share your requirements (QPS, latency, model type)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Novita sets up your endpoint—no DevOps needed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update your API URL and ship!&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LLM Dedicated Endpoint on Novita AI&lt;/strong&gt; is the dev-friendly way to run custom models with no ops, no idle GPU costs, and no guesswork. You focus on building, Novita keeps your models running—secure, scalable, and fast.&lt;br&gt;&lt;br&gt;
Ready to launch your own LLM? &lt;a href="https://meet.brevo.com/novita-ai/contact-sales" rel="noopener noreferrer"&gt;Book a Demo&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Frequently Asked Questions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How does Novita handle scaling during traffic spikes?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Resources auto-scale based on real-time demand. You’re only billed for actual usage, not reserved capacity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I migrate from a Novita public API to a Dedicated Endpoint?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes—just update the endpoint URL. 100% API compatibility means no code changes are required.&lt;/p&gt;
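&lt;p&gt;Concretely, the migration is a one-line configuration change: the same client code runs against either endpoint, only the base URL differs. A minimal sketch (the helper name is illustrative; Novita provisions the actual dedicated URL):&lt;/p&gt;

```python
# Sketch: swapping between the public API and a Dedicated Endpoint.
# Helper name is illustrative; Novita provisions the dedicated URL.
from typing import Optional

PUBLIC_BASE_URL = "https://api.novita.ai/v3/openai"

def client_config(api_key: str, dedicated_base_url: Optional[str] = None) -> dict:
    """Kwargs for an OpenAI-compatible client.

    Pass the base URL provisioned for your Dedicated Endpoint, or leave
    it None to use the shared public endpoint. Nothing else changes.
    """
    return {"base_url": dedicated_base_url or PUBLIC_BASE_URL, "api_key": api_key}

# client = OpenAI(**client_config("YOUR_API_KEY"))                   # public
# client = OpenAI(**client_config("YOUR_API_KEY", provisioned_url))  # dedicated
```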

&lt;p&gt;&lt;strong&gt;What if I need guaranteed uptime and latency?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Novita offers custom SLAs for uptime, latency, and throughput, tailored to your needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is billing handled?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You pay only for tokens processed, with a minimum daily token commitment. No idle GPU bills.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://novita.ai/?utm_source=blog_llm&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-30b-a3b-vs-qwq-32b" rel="noopener noreferrer"&gt;&lt;em&gt;Novita AI&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>llm</category>
    </item>
    <item>
      <title>Qwen 3 Now Available on Novita AI - Claim Your $10 Free Credits</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Tue, 29 Apr 2025 10:09:20 +0000</pubDate>
      <link>https://dev.to/novita_ai/qwen-3-now-available-on-novita-ai-claim-your-10-free-credits-5d28</link>
      <guid>https://dev.to/novita_ai/qwen-3-now-available-on-novita-ai-claim-your-10-free-credits-5d28</guid>
      <description>&lt;p&gt;Alibaba’s cutting-edge Qwen 3 large language models are now live on Novita AI’s Model API platform! Instantly access the latest Qwen3–235B-A22B, Qwen3–30B-A3B, and Qwen3–32B models — all with a massive 128,000 context window and industry-leading performance.&lt;/p&gt;

&lt;p&gt;For a limited time, new users can claim &lt;a href="https://novita.ai/referral?invited_code=5W10UA&amp;amp;utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;strong&gt;$10 in free credits&lt;/strong&gt;&lt;/a&gt; to explore and build with Qwen 3.&lt;/p&gt;

&lt;p&gt;Here’s the current Qwen 3 lineup and pricing on Novita AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://novita.ai/models/llm/qwen-qwen3-235b-a22b-fp8?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;strong&gt;Qwen3–235B-A22B&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt; $0.20 / M input tokens, $0.80 / M output tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://novita.ai/models/llm/qwen-qwen3-30b-a3b-fp8?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;strong&gt;Qwen3–30B-A3B&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt; $0.10 / M input tokens, $0.45 / M output tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://novita.ai/models/llm/qwen-qwen3-32b-fp8?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;strong&gt;Qwen3–32B&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt; $0.10 / M input tokens, $0.45 / M output tokens&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
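&lt;p&gt;To compare the lineup for a given workload, the listed rates reduce to a small lookup — a sketch (dictionary keys are shorthand model names, not the exact API model IDs such as qwen/qwen3-235b-a22b-fp8):&lt;/p&gt;

```python
# Side-by-side cost of one workload at the listed Qwen 3 rates.
# Keys are shorthand; API model IDs differ (e.g. "qwen/qwen3-235b-a22b-fp8").
PRICING = {  # (USD / M input tokens, USD / M output tokens)
    "Qwen3-235B-A22B": (0.20, 0.80),
    "Qwen3-30B-A3B": (0.10, 0.45),
    "Qwen3-32B": (0.10, 0.45),
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of processing the given token counts on one model."""
    price_in, price_out = PRICING[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# 1M input + 1M output tokens on each model:
for name in PRICING:
    print(name, round(workload_cost(name, 1_000_000, 1_000_000), 4))
```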

&lt;p&gt;Power your chatbots, apps, and workflows with state-of-the-art language models — Qwen 3 is just an API call away.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Qwen 3?
&lt;/h3&gt;

&lt;p&gt;Qwen 3 is the latest and most advanced family of large language models developed by Alibaba Cloud’s Qwen team. Building on the experience of QwQ and Qwen2.5, Qwen 3 sets a new standard for open-source AI with major improvements in reasoning, multilingualism, and agentic abilities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2AnhkYvIyB2RuXpHV4" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2AnhkYvIyB2RuXpHV4" width="1000" height="776"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features of Qwen 3
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dense and Mixture-of-Experts (MoE) models in various sizes:&lt;/strong&gt; Qwen 3 is available in both dense and MoE architectures, ranging from lightweight 0.6B and 1.7B models up to large-scale 32B (dense) and flagship 30B-A3B and 235B-A22B (MoE) variants.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hybrid thinking modes:&lt;/strong&gt; The model allows seamless switching between &lt;em&gt;thinking mode&lt;/em&gt; (for complex, step-by-step logical reasoning, math, and code generation) and &lt;em&gt;non-thinking mode&lt;/em&gt; (for fast, efficient, general-purpose chat).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Significantly enhanced reasoning:&lt;/strong&gt; Qwen 3 surpasses previous Qwen models in mathematics, code generation, and commonsense logical reasoning. It also offers more stable and controllable reasoning budgets for different tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Superior human preference alignment:&lt;/strong&gt; The model excels in creative writing, role-playing, multi-turn dialogues, and instruction following, resulting in more natural, engaging conversations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Advanced agentic capabilities:&lt;/strong&gt; Qwen 3 is designed for agent-based workflows, supporting seamless integration with external tools and precise function calling in both reasoning modes. This enables state-of-the-art performance in complex, agent-driven tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Robust multilingual support:&lt;/strong&gt; Supporting 119 languages and dialects, Qwen 3 is capable of high-quality multilingual instruction following and translation, opening the door for truly global applications.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
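&lt;p&gt;The hybrid thinking modes above can be switched per turn. Qwen 3’s chat template documents /think and /no_think soft switches appended to the user message — a minimal sketch, assuming those switches pass through the serving layer unchanged (some providers expose a dedicated parameter instead):&lt;/p&gt;

```python
def with_thinking(prompt: str, think: bool) -> str:
    """Append Qwen 3's documented soft switch for hybrid thinking modes.

    Assumes the /think and /no_think switches reach the model unchanged;
    some serving layers expose a dedicated parameter instead.
    """
    return f"{prompt} {'/think' if think else '/no_think'}"

# Step-by-step reasoning for a hard problem, fast mode for casual chat:
messages = [
    {"role": "user", "content": with_thinking("Prove that sqrt(2) is irrational.", think=True)},
]
```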

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2AlMQ6xNFrKEPYuACW" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2AlMQ6xNFrKEPYuACW" width="1000" height="841"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmarks and Performance
&lt;/h3&gt;

&lt;p&gt;The Qwen 3 series demonstrates industry-leading performance across a comprehensive suite of AI benchmarks, excelling in coding, mathematics, general reasoning, and multilingual understanding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Flagship Model: Qwen3–235B-A22B
&lt;/h3&gt;

&lt;p&gt;The flagship model, &lt;strong&gt;Qwen3-235B-A22B&lt;/strong&gt;, consistently achieves top or near-top results when compared with the most advanced models available today, such as DeepSeek-R1, OpenAI-o1, OpenAI-o3-mini, Grok-3 Beta, and Gemini-2.5-Pro.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2Aoihni-KMiDNqZSLm" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2Aoihni-KMiDNqZSLm" width="1000" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Complex Reasoning:&lt;/strong&gt; Highest scores on ArenaHard (95.6), outperforming or matching all competitors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mathematics:&lt;/strong&gt; Leading results on AIME’24 (85.7) and AIME’25 (81.5), well ahead of most commercial and open-source models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Coding:&lt;/strong&gt; Exceptional performance on LiveCodeBench (70.7) and CodeForces Elo (2056), confirming its strength in software and algorithmic tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multilingual &amp;amp; General Capabilities:&lt;/strong&gt; Qwen3-235B-A22B achieves strong results on LiveBench and MultiIF, demonstrating robust real-world and multilingual understanding.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Model Efficiency and Scalability
&lt;/h3&gt;

&lt;p&gt;Qwen 3’s architectural innovations also translate to outstanding performance at smaller model sizes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2A1_BOKA6jqTRDO5CB" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2A1_BOKA6jqTRDO5CB" width="1000" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen3–32B (Dense):&lt;/strong&gt; Delivers results just behind the flagship, still outperforming most alternative models across all categories.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen3–30B-A3B (MoE):&lt;/strong&gt; Outperforms QwQ-32B, despite using only a tenth of the activated parameters — showcasing Qwen’s efficiency and smart scaling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qwen3–4B (Dense):&lt;/strong&gt; Even this compact model can rival the performance of much larger models like Qwen2.5–72B-Instruct, especially on reasoning and multilingual tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to Access Qwen 3 on Novita AI
&lt;/h3&gt;

&lt;p&gt;Getting started with Qwen 3 is fast, simple, and risk-free on Novita AI. Thanks to the Referral Program, you’ll receive &lt;strong&gt;$10 in free credits&lt;/strong&gt; — enough to fully explore Qwen 3’s power, build prototypes, and even launch your first use case without any upfront cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use the Playground (No Coding Required)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instant Access&lt;/strong&gt;: &lt;a href="https://novita.ai/referral?invited_code=5W10UA&amp;amp;utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Sign up&lt;/a&gt;, claim your free credits, and start experimenting with Qwen 3 and other top models in seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive UI&lt;/strong&gt;: Test prompts, chain-of-thought reasoning, and visualize results in real time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model Comparison&lt;/strong&gt;: Effortlessly switch between Qwen 3, Llama 4, DeepSeek, and more to find the perfect fit for your needs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Integrate via API (For Developers)
&lt;/h3&gt;

&lt;p&gt;Seamlessly connect Qwen 3 to your applications, workflows, or chatbots with Novita AI’s unified REST API — no need to manage model weights or infrastructure. Novita AI offers multi-language SDKs (Python, Node.js, cURL, and more) and advanced parameter controls for power users.&lt;/p&gt;

&lt;h4&gt;
  
  
  Option 1: Direct API Integration (Python Example)
&lt;/h4&gt;

&lt;p&gt;To get started, simply use the code snippet below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.novita.ai/v3/openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;YOUR Novita AI API Key&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen3-235b-a22b-fp8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt; &lt;span class="c1"&gt;# or False
&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;
&lt;span class="n"&gt;system_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Be a helpful assistant&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;top_p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;min_p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="n"&gt;presence_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;frequency_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;repetition_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;response_format&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;chat_completion_res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi there!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;presence_penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;presence_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;frequency_penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;frequency_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repetition_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;repetition_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;min_p&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chat_completion_res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat_completion_res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Option 2: Multi-Agent Workflows with OpenAI Agents SDK
&lt;/h4&gt;

&lt;p&gt;Build advanced multi-agent systems by integrating Novita AI with the &lt;a href="https://novita.ai/docs/guides/integration-openai-agents-sdk?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;OpenAI Agents SDK&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plug-and-play:&lt;/strong&gt; Use Novita AI’s LLMs in any OpenAI Agents workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Supports handoffs, routing, and tool use:&lt;/strong&gt; Design agents that can delegate, triage, or run functions, all powered by Novita AI’s models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Python integration:&lt;/strong&gt; Simply point the SDK to Novita’s endpoint (&lt;code&gt;https://api.novita.ai/v3/openai&lt;/code&gt;) and use your API key.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
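&lt;p&gt;As a minimal sketch, that configuration fits in one small helper; the &lt;code&gt;NOVITA_API_KEY&lt;/code&gt; variable name and the helper itself are illustrative, not part of the SDK:&lt;/p&gt;

```python
import os

# Novita AI's OpenAI-compatible endpoint, as documented above.
NOVITA_BASE_URL = "https://api.novita.ai/v3/openai"


def novita_client_kwargs(api_key=None):
    """Build the kwargs for any OpenAI-compatible client (including one
    handed to an Agents workflow) so it talks to Novita AI's endpoint."""
    key = api_key or os.environ.get("NOVITA_API_KEY", "")
    if not key:
        raise ValueError("Pass api_key or set the NOVITA_API_KEY env var")
    return {"base_url": NOVITA_BASE_URL, "api_key": key}
```

&lt;p&gt;Hand these kwargs to your OpenAI-compatible client constructor and the rest of the agent workflow stays unchanged.&lt;/p&gt;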

&lt;h3&gt;
  
  
  Connect Qwen 3 API on Third-Party Platforms
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://novita.ai/docs/guides/huggingface?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;strong&gt;Hugging Face&lt;/strong&gt;&lt;/a&gt;: Use Qwen 3 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent &amp;amp; Orchestration Frameworks:&lt;/strong&gt; Easily connect Novita AI with partner platforms like &lt;a href="https://novita.ai/docs/guides/continue?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Continue&lt;/a&gt;, &lt;a href="https://novita.ai/docs/guides/anythingllm?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;AnythingLLM,&lt;/a&gt; &lt;a href="https://novita.ai/docs/guides/langchain?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;, &lt;a href="https://novita.ai/docs/guides/dify?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Dify&lt;/a&gt; and &lt;a href="https://novita.ai/docs/guides/langflow?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Langflow&lt;/a&gt; through official connectors and step-by-step integration guides.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI-Compatible API:&lt;/strong&gt; Enjoy hassle-free migration and integration with tools such as &lt;a href="https://blogs.novita.ai/how-to-integrate-novita-ai-llm-api-with-cline-in-vscode/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt; and &lt;a href="https://novita.ai/docs/guides/cursor?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, designed for the OpenAI API standard.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
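&lt;p&gt;Because the API follows the OpenAI standard, every tool above ultimately sends the same request shape. A sketch of the payload (the model ID here is an assumption based on the model page slug):&lt;/p&gt;

```python
import json

# Hypothetical model ID and prompt; any OpenAI-standard tool POSTs this
# same body to /chat/completions, differing only in base URL and API key.
payload = {
    "model": "qwen/qwen3-235b-a22b-fp8",
    "messages": [
        {"role": "system", "content": "Be a helpful assistant"},
        {"role": "user", "content": "Hi there!"},
    ],
    "max_tokens": 1024,
}

body = json.dumps(payload).encode("utf-8")
```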

&lt;h3&gt;
  
  
  Best Practices for Optimal Qwen 3 Performance
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Sampling Parameter Settings&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thinking Mode&lt;/strong&gt; (&lt;code&gt;enable_thinking=True&lt;/code&gt;)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Temperature:&lt;/strong&gt; 0.6&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TopP:&lt;/strong&gt; 0.95&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TopK:&lt;/strong&gt; 20&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MinP:&lt;/strong&gt; 0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Tip:&lt;/em&gt; Avoid greedy decoding to prevent degraded performance or repetitive outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-Thinking Mode&lt;/strong&gt; (&lt;code&gt;enable_thinking=False&lt;/code&gt;)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Temperature:&lt;/strong&gt; 0.7&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TopP:&lt;/strong&gt; 0.8&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TopK:&lt;/strong&gt; 20&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MinP:&lt;/strong&gt; 0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Repetition Control&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For supported frameworks, adjust &lt;code&gt;presence_penalty&lt;/code&gt; between &lt;strong&gt;0&lt;/strong&gt; and &lt;strong&gt;2&lt;/strong&gt; to reduce repetitions.&lt;br&gt;&lt;br&gt;
&lt;em&gt;Note:&lt;/em&gt; Higher values may cause some language mixing or a slight decrease in model performance.&lt;/p&gt;
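&lt;p&gt;The presets above can be captured in code once and reused. This sketch only encodes the numbers listed in this section; the helper name is illustrative:&lt;/p&gt;

```python
# Recommended Qwen 3 sampling presets, keyed by whether thinking mode is on.
QWEN3_SAMPLING = {
    True:  {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0},  # thinking
    False: {"temperature": 0.7, "top_p": 0.8,  "top_k": 20, "min_p": 0},  # non-thinking
}


def sampling_kwargs(enable_thinking, presence_penalty=0.0):
    """Merge the mode preset with an optional repetition control.

    presence_penalty stays within [0, 2] per the guidance above; top_k and
    min_p would go into extra_body when using the OpenAI client.
    """
    if not 0.0 <= presence_penalty <= 2.0:
        raise ValueError("presence_penalty should stay within [0, 2]")
    return {**QWEN3_SAMPLING[enable_thinking], "presence_penalty": presence_penalty}
```

&lt;p&gt;Calling &lt;code&gt;sampling_kwargs(True)&lt;/code&gt; returns the thinking-mode settings ready to spread into a request.&lt;/p&gt;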

&lt;p&gt;&lt;strong&gt;2. Output Length Recommendations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For most queries, set the output length to &lt;strong&gt;32,768 tokens&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For complex benchmarking tasks (such as math or programming competitions), increase the max output length to &lt;strong&gt;38,912 tokens&lt;/strong&gt; for more comprehensive responses.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Standardizing Output Format&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Math Problems:&lt;/strong&gt; Include this in your prompt: &lt;em&gt;“Please reason step by step, and put your final answer within \boxed{}.”&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multiple-Choice Questions:&lt;/strong&gt; Standardize responses using a JSON field: &lt;em&gt;“Please show your choice in the answer field with only the choice letter, e.g., "answer": "C".”&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
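&lt;p&gt;These instructions can be appended mechanically. A small sketch of prompt helpers (the helper names are illustrative):&lt;/p&gt;

```python
# Standardized output-format instructions from the section above.
MATH_INSTRUCTION = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)
MCQ_INSTRUCTION = (
    'Please show your choice in the answer field with only the choice letter, '
    'e.g., "answer": "C".'
)


def format_math_prompt(question):
    """Append the step-by-step / boxed-answer instruction to a math question."""
    return f"{question}\n{MATH_INSTRUCTION}"


def format_mcq_prompt(question):
    """Append the JSON answer-field instruction to a multiple-choice question."""
    return f"{question}\n{MCQ_INSTRUCTION}"
```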

&lt;p&gt;&lt;strong&gt;4. Conversation History Management&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In multi-turn conversations, include only the final output in the chat history. Omit any intermediate “thinking” content.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If using a Jinja2 chat template, this is handled automatically. For other frameworks, ensure this practice is followed manually.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
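&lt;p&gt;A minimal sketch of the manual approach, assuming the intermediate reasoning arrives wrapped in think tags (as Qwen 3 emits); strip it before appending the turn to history:&lt;/p&gt;

```python
import re

# Matches a whole think block, including newlines, plus trailing whitespace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_thinking(assistant_text):
    """Drop the intermediate reasoning so only the final output is stored."""
    return THINK_BLOCK.sub("", assistant_text).strip()


history = []
raw = "<think>The user greeted me.</think>Hello! How can I help?"
history.append({"role": "assistant", "content": strip_thinking(raw)})
```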

&lt;p&gt;Following these recommendations helps Qwen 3 deliver consistently accurate, high-quality results across your use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Qwen 3 delivers best-in-class performance for coding, reasoning, and multilingual tasks — no matter the project size. Ready to see it in action?&lt;/p&gt;

&lt;p&gt;Try the &lt;a href="https://novita.ai/models/llm/qwen-qwen3-235b-a22b-fp8?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;Qwen 3 demo&lt;/a&gt; on Novita AI now and &lt;a href="https://novita.ai/referral?invited_code=5W10UA&amp;amp;utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;claim your free credits&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Originally published on &lt;a href="https://novita.ai/?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;em&gt;Novita AI&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=qwen-3-now-available-on-novita-ai-claim-your-10-free-credits" rel="noopener noreferrer"&gt;&lt;em&gt;Novita AI&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>qwen</category>
    </item>
    <item>
      <title>Earn $500 Free Credits: Build Faster with Deepseek, Llama &amp; Qwen on Novita AI</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Thu, 24 Apr 2025 05:46:08 +0000</pubDate>
      <link>https://dev.to/novita_ai/earn-500-free-credits-build-faster-with-deepseek-llama-qwen-on-novita-ai-553d</link>
      <guid>https://dev.to/novita_ai/earn-500-free-credits-build-faster-with-deepseek-llama-qwen-on-novita-ai-553d</guid>
      <description>&lt;p&gt;Novita AI is offering an exclusive, limited-time opportunity! With the Referral Program, you can earn up to $500 in LLM API credits by simply referring your friends. Here’s the best part: both you and your referral will receive $10 in credits, unlocking access to top-tier models like DeepSeek, Llama and Qwen.&lt;/p&gt;

&lt;p&gt;These credits can power your next big project, whether you’re working with Hugging Face, AnythingLLM, Langflow, Continue, Helicone, Dify, Cursor, LobeChat, or other tools. Don’t miss out on the opportunity to supercharge your AI applications.&lt;/p&gt;

&lt;p&gt;👉 Sign up for the Novita AI Referral Program and begin earning credits now.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Novita AI is the Trusted Choice
&lt;/h3&gt;

&lt;p&gt;Artificial Analysis, a leading AI model evaluation platform, ranks Novita AI alongside industry leaders such as Together AI and Fireworks AI, reinforcing Novita AI’s reputation as a trusted choice for developers worldwide.&lt;/p&gt;

&lt;p&gt;Additionally, OpenRouter recognizes Novita AI as one of the most cost-effective LLM API providers.&lt;/p&gt;

&lt;p&gt;Novita AI also serves as an official inference provider on Hugging Face.&lt;/p&gt;

&lt;h3&gt;
  
  
  4 Easy Steps to Claim $10 in API Credits
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visit the Referral Program Page:&lt;/strong&gt; Head to the official page to begin.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enter Your Invite Code:&lt;/strong&gt; Use either the official invite code 5W10UA or your personal one to get started.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create Your Novita AI Account:&lt;/strong&gt; Sign up using your email, Google, Hugging Face, or GitHub account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Verify Your GitHub Account:&lt;/strong&gt; Complete the verification process to unlock your credits.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  3 Ways to Share and Earn Up to $500 in Credits
&lt;/h3&gt;

&lt;p&gt;Earn up to $500 in LLM API credits by referring others. Here’s how you can share and earn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Copy your referral link:&lt;/strong&gt; &lt;a href="https://novita.ai/referral?invited_code=xxx" rel="noopener noreferrer"&gt;https://novita.ai/referral?invited_code=xxx&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Copy your own referral code&lt;/strong&gt; and share it directly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Share on social media:&lt;/strong&gt; Post your referral link on platforms like Twitter (X), LinkedIn, Facebook, or anywhere else developers gather. The more you share, the more you earn!&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  LLM API on Novita AI
&lt;/h3&gt;

&lt;p&gt;You can use your credits across the entire range of LLM APIs available on Novita AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Integrated Projects &amp;amp; SDKs
&lt;/h3&gt;

&lt;p&gt;Novita AI supports seamless integration with many leading open-source projects and developer tools. Once you receive your LLM API credits, you can call the API from the following platforms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Novita AI &amp;amp; OpenAI Agents SDK&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; AnythingLLM&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Dify&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Helicone&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Hugging Face&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Langflow&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Continue&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Cursor&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; LangChain&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Skyvern&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; LobeChat&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; ai-gradio&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Langfuse&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Verba&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Portkey&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; DocsGPT&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; LlamaIndex&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; LoLLMS WebUI&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; CodeCompanion.nvim&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; Page Assist&lt;/li&gt;
&lt;li&gt;Novita AI &amp;amp; DeepSearcher&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Start Earning and Building with Novita AI Today!
&lt;/h3&gt;

&lt;p&gt;Don’t miss out on the chance to earn up to $500 in credits, unlock powerful LLM API models, and supercharge your projects with Novita AI. Whether you’re building AI-powered tools, developing advanced agents, or creating the next big thing in AI, Novita AI is your trusted partner.&lt;/p&gt;

&lt;p&gt;👉 Sign up now, share your link, and start building!&lt;/p&gt;

&lt;p&gt;Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>llama</category>
      <category>deepseek</category>
      <category>qwen</category>
    </item>
    <item>
      <title>What is Agent2Agent (A2A)? A New Era of AI Agent Interaction</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Sat, 12 Apr 2025 04:00:00 +0000</pubDate>
      <link>https://dev.to/novita_ai/what-is-agent2agent-a2a-a-new-era-of-ai-agent-interaction-402f</link>
      <guid>https://dev.to/novita_ai/what-is-agent2agent-a2a-a-new-era-of-ai-agent-interaction-402f</guid>
      <description>&lt;p&gt;In the world of artificial intelligence (AI), the ability for different AI agents to communicate and collaborate is essential for streamlining processes and maximizing productivity. Google has taken a major step forward in this area with the introduction of the &lt;strong&gt;Agent2Agent (A2A)&lt;/strong&gt; protocol. This open protocol allows AI agents to interact with each other across different platforms, breaking down silos and enabling more efficient collaboration. In this article, we’ll explore what A2A is, how it works, and the real-world applications of this groundbreaking protocol.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Agent2Agent (A2A)?
&lt;/h3&gt;

&lt;p&gt;Agent2Agent (A2A) is an open protocol developed by Google that allows AI agents to communicate and collaborate, regardless of their underlying frameworks or platforms. It enables seamless interaction between diverse AI agents, making it possible for them to perform complex tasks together without being constrained by system boundaries.&lt;/p&gt;

&lt;p&gt;Think of A2A as a universal translator for AI agents — it allows different agents to “speak” the same language, regardless of the technology they’re built on. With A2A, AI agents can now share information, update each other, and perform tasks collaboratively without requiring a complete overhaul of existing systems.&lt;/p&gt;

&lt;p&gt;This protocol is set to radically change how businesses operate by allowing AI agents to integrate more easily into everyday workflows, without needing to be customized for every different platform or framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Does A2A Work?
&lt;/h3&gt;

&lt;p&gt;A2A works by facilitating communication between two key types of agents: the &lt;strong&gt;client agent&lt;/strong&gt; and the &lt;strong&gt;remote agent&lt;/strong&gt;. Let’s break down the process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Discovery&lt;/strong&gt;: The client agent first discovers the capabilities of the remote agent by fetching its “Agent Card,” a JSON-based file that contains metadata about the remote agent, including its skills, authentication requirements, and endpoint URL.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Task Assignment&lt;/strong&gt;: Once a suitable agent has been identified, the client agent can assign a task to the remote agent. This task is represented by a unique Task ID, which helps both agents keep track of the progress and state of the task.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Communication&lt;/strong&gt;: After the task is initiated, the two agents communicate with each other by sending messages, which may include text, files, or other data. These messages are referred to as “Parts,” and each part can contain different types of content such as plain text, files, or structured data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Task Progress&lt;/strong&gt;: A2A supports long-running tasks and can provide real-time updates using &lt;strong&gt;Server-Sent Events (SSE)&lt;/strong&gt; or &lt;strong&gt;Push Notifications&lt;/strong&gt;. This allows agents to continuously update each other on the progress of the task.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Completion&lt;/strong&gt;: Once the task is completed, the results are referred to as &lt;strong&gt;Artifacts&lt;/strong&gt;. These artifacts represent the outputs or results generated by the remote agent in response to the client’s task.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
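&lt;p&gt;To make the discovery step concrete, here is an illustrative Agent Card. The field names mirror the description above (skills, authentication requirements, endpoint URL) but are assumptions, not the normative A2A schema:&lt;/p&gt;

```python
import json

# Hypothetical Agent Card for a remote logistics agent.
agent_card = {
    "name": "logistics-tracker",
    "url": "https://agents.example.com/a2a",          # endpoint URL
    "authentication": {"schemes": ["bearer"]},        # auth requirements
    "skills": [
        {"id": "track-shipment", "description": "Report shipment status"},
    ],
}

# A client agent would fetch and parse this JSON during discovery.
card_json = json.dumps(agent_card, indent=2)
```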

&lt;h3&gt;
  
  
  Key Features of Agent2Agent
&lt;/h3&gt;

&lt;h3&gt;
  
  
  1. Interoperability Across Platforms
&lt;/h3&gt;

&lt;p&gt;A2A is designed to be platform-agnostic, allowing AI agents built on different frameworks to work together seamlessly. This breaks down traditional system silos, enabling businesses to integrate their existing systems with AI agents without needing to overhaul their entire infrastructure.&lt;/p&gt;

&lt;p&gt;For example, a business might use &lt;strong&gt;Atlassian&lt;/strong&gt; for project management, &lt;strong&gt;Box&lt;/strong&gt; for file storage, and &lt;strong&gt;Salesforce&lt;/strong&gt; for customer relationship management. With A2A, these systems can now work together by allowing their respective agents to communicate and share data.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Task Management and Flexibility
&lt;/h3&gt;

&lt;p&gt;A2A provides a robust mechanism for managing tasks. Each task has a lifecycle, and agents can communicate with each other to ensure the task progresses smoothly. Whether the task is a simple query or a complex, multi-step process, A2A handles it efficiently.&lt;/p&gt;

&lt;p&gt;Additionally, the protocol is designed to support a wide range of tasks, from quick operations to long-running research projects. This flexibility is crucial in industries like healthcare or research, where tasks can vary dramatically in terms of time and complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Security and Authentication
&lt;/h3&gt;

&lt;p&gt;A2A supports enterprise-level authentication and authorization, ensuring that data exchanges between agents are secure and comply with regulations. This makes it ideal for businesses that need to ensure sensitive data is handled appropriately during agent interactions.&lt;/p&gt;

&lt;p&gt;By supporting existing authentication mechanisms, such as those defined in the OpenAPI specification, A2A lets enterprises adopt it within their current systems without reworking their security stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Multi-Modal Support
&lt;/h3&gt;

&lt;p&gt;Unlike traditional text-based AI agents, A2A supports multiple modes of interaction. This means agents can communicate not only through text, but also through images, videos, and audio. This opens up new possibilities for tasks that require richer forms of communication, such as training simulations, customer service interactions, and multimedia content generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Real-Time Updates and Notifications
&lt;/h3&gt;

&lt;p&gt;For tasks that take a long time to complete, A2A offers real-time updates via &lt;strong&gt;Server-Sent Events (SSE)&lt;/strong&gt; or &lt;strong&gt;Push Notifications&lt;/strong&gt;. These updates allow the client agent to track the status of the task and receive feedback about the task’s progress, artifacts, or issues in real-time. This feature is especially useful for long-running tasks in industries like pharmaceuticals, where real-time progress updates are crucial for timely decision-making.&lt;/p&gt;
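&lt;p&gt;To illustrate the SSE side, here is a minimal parser that extracts the data payloads from an event stream. This is a sketch of the wire format only; real clients should use a dedicated SSE library:&lt;/p&gt;

```python
def parse_sse(stream_text):
    """Split a Server-Sent Events stream into its data payloads.

    Handles only 'data:' fields, with events separated by blank lines;
    other SSE fields (event:, id:, retry:) are ignored in this sketch.
    """
    events, current = [], []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            current.append(line[len("data:"):].strip())
        elif not line and current:
            events.append("\n".join(current))
            current = []
    if current:  # stream ended without a trailing blank line
        events.append("\n".join(current))
    return events
```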

&lt;h3&gt;
  
  
  Real-World Applications of A2A
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2Ag1f6VxoTRn7-1kGv" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2Ag1f6VxoTRn7-1kGv" width="1000" height="614"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Enterprise Software Integration
&lt;/h3&gt;

&lt;p&gt;Many enterprises use a range of platforms for different functions — for example, Atlassian for project management, Box for file storage, and Salesforce for customer relationship management. Traditionally, these systems don’t communicate with each other. With A2A, however, these platforms can integrate seamlessly, allowing their respective agents to share data and automate tasks.&lt;/p&gt;

&lt;p&gt;For instance, an e-commerce company could use A2A to link its order management system with intelligent agents that provide real-time logistics updates. This integration would streamline operations without requiring the company to rebuild its existing infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Research and Development
&lt;/h3&gt;

&lt;p&gt;A2A is also useful in research environments, where tasks can range from simple data retrieval to complex simulations. A research organization working on drug development might use A2A to connect various agents, such as those responsible for data analysis, database querying, and simulations of molecular structures.&lt;/p&gt;

&lt;p&gt;In this context, A2A provides real-time progress updates, helping researchers stay informed about the status of long-running tasks, such as simulating the interaction of drug molecules with human cells.&lt;/p&gt;

&lt;h3&gt;
  
  
  Healthcare
&lt;/h3&gt;

&lt;p&gt;In the healthcare industry, A2A can be used to integrate AI agents that handle patient records, diagnosis, treatment recommendations, and follow-ups. By using A2A, healthcare systems can ensure that these various agents can work together seamlessly, improving patient care and operational efficiency.&lt;/p&gt;

&lt;p&gt;For example, an agent in a hospital’s scheduling system might use A2A to communicate with an AI-powered diagnostic agent to schedule and follow up on appointments related to specific health conditions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Agent2Agent (A2A)&lt;/strong&gt; protocol represents a major leap forward in AI interoperability, allowing AI agents to communicate and collaborate seamlessly across different platforms and frameworks. By breaking down traditional system silos, A2A enables more efficient, secure, and flexible workflows, making it easier for businesses and developers to integrate AI into their operations.&lt;/p&gt;

&lt;p&gt;As A2A continues to gain adoption across industries, it will likely lead to significant advancements in AI-powered automation, communication, and collaboration. Whether you’re a developer looking to integrate AI into your applications, or a business seeking to optimize your workflows, A2A has the potential to revolutionize the way we interact with AI agents.&lt;/p&gt;

&lt;p&gt;By embracing A2A, businesses can unlock new possibilities, improve operational efficiency, and pave the way for a more interconnected future in AI development.&lt;/p&gt;

&lt;h3&gt;
  
  
  Frequently Asked Questions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What distinguishes A2A from other interoperability protocols?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A2A focuses specifically on agent-to-agent communication and collaboration across different platforms and vendors, providing a networking layer for agents to discover, negotiate, and interact securely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can A2A work with agents built on any framework?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, A2A is framework-agnostic, enabling communication between agents regardless of their underlying technology or vendor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does A2A ensure security?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A2A uses token-based security for function calling, leverages DNS security for discovery, and specifies authentication requirements through Agent Cards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does A2A complement Anthropic’s Model Context Protocol (MCP)?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While &lt;a href="https://blogs.novita.ai/what-is-mcp-a-developers-guide-to-model-context-protocol/" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; enhances individual agent capabilities through plugins, A2A facilitates communication and collaboration between different agents, serving as a networking layer for multi-agent systems.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;About Novita AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/?utm_source=medium_llm&amp;amp;utm_medium=article&amp;amp;utm_campaign=a2a" rel="noopener noreferrer"&gt;Novita AI&lt;/a&gt; is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>llm</category>
      <category>agents</category>
      <category>ai</category>
    </item>
    <item>
      <title>Novita AI Evaluates FlashMLA on H100 and H200</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Fri, 11 Apr 2025 08:17:59 +0000</pubDate>
      <link>https://dev.to/novita_ai/novita-ai-evaluates-flashmla-on-h100-and-h200-5c1a</link>
      <guid>https://dev.to/novita_ai/novita-ai-evaluates-flashmla-on-h100-and-h200-5c1a</guid>
      <description>&lt;p&gt;DeepSeek has officially kicked off its five-day open source release initiative, with the first featured project being &lt;strong&gt;FlashMLA&lt;/strong&gt;. FlashMLA is an optimized, high-efficiency MLA decoding kernel specifically designed for NVIDIA Hopper GPUs (e.g., H800 SXM5). Its primary goal is to accelerate computations for large-scale models, particularly enhancing performance on NVIDIA's high-end GPUs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;As a leading provider of AI infrastructure, Novita AI was among the first to evaluate FlashMLA's performance across mainstream Hopper GPUs (H100, H200).&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is MLA?
&lt;/h2&gt;

&lt;p&gt;Before diving into the evaluation results, let’s take a moment to understand some relevant background concepts.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hopper GPU&lt;/strong&gt;: NVIDIA's next-generation high-performance GPU architecture, engineered for AI and high-performance computing (HPC). Built with advanced process technologies and an innovative architecture, Hopper GPUs deliver exceptional performance and energy efficiency for complex computational tasks. The mainstream Hopper GPUs include H100 and H200.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Decoding Kernel&lt;/strong&gt;: A hardware or software module specifically designed to accelerate decoding tasks. In AI inference, decoding kernels significantly enhance the speed and efficiency of model inference, particularly when processing sequential data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Key-Value (KV) Pairs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Key&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Represents a compressed version of the input data, used to compute attention weights (how much focus to place on different parts of the input).&lt;/li&gt;
&lt;li&gt;Example: In text generation, keys help the model identify which words in a sentence are most relevant to the current word being generated.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Value&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Contains the actual information associated with each input token, weighted by the attention scores.&lt;/li&gt;
&lt;li&gt;Example: Values store the semantic meaning of words, which are combined based on attention weights to produce the output.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;MLA (Multi-head Latent Attention)&lt;/strong&gt;: A novel attention mechanism that requires lighter KV (key-value) caching, making it more scalable for long-sequence processing. MLA outperforms traditional Multi-Head Attention (MHA) mechanisms in both scalability and performance.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  MHA vs. MQA vs. GQA vs. MLA
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Module&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Technical Logic&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Inference Speed&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Model Performance&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MHA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple heads independently generate keys and values with no sharing (full-dimensional computation).&lt;/td&gt;
&lt;td&gt;⭐️&lt;/td&gt;
&lt;td&gt;⭐️⭐️⭐️&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MQA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All query heads share a single key-value pair (single KV group).&lt;/td&gt;
&lt;td&gt;⭐️⭐️⭐️&lt;/td&gt;
&lt;td&gt;⭐️&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GQA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Query heads share key-value pairs in groups (multiple KV groups).&lt;/td&gt;
&lt;td&gt;⭐️⭐️&lt;/td&gt;
&lt;td&gt;⭐️⭐️&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MLA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Key-value pairs are compressed into low-dimensional latent vectors and decoded with decoupled RoPE to retain positional information.&lt;/td&gt;
&lt;td&gt;⭐️⭐️⭐️⭐️&lt;/td&gt;
&lt;td&gt;⭐️⭐️⭐️⭐️&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MQA/GQA: A "simplified version" of MHA&lt;/strong&gt;, focusing on efficiency at the cost of information loss.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MLA&lt;/strong&gt;: An "upgraded compressed version" that balances memory efficiency and information retention, even outperforming MHA.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Architectural Innovation&lt;/strong&gt;: MLA is not a mere optimization but a reimagining of the attention mechanism, using latent variables to mathematically reconstruct keys and values. It achieves the best of both worlds: efficiency and capability.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
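The trade-offs in the table can be made concrete with a back-of-the-envelope KV-cache calculation. The sketch below uses illustrative dimensions (not DeepSeek's actual configuration) to compare per-token cache footprints for MHA, GQA, MQA, and an MLA-style compressed latent cache:

```python
# Per-token KV-cache footprint (bytes) for one layer in fp16.
# All dimensions below are illustrative, not a real model's config.
BYTES = 2          # fp16
n_heads = 32       # query/attention heads
head_dim = 128
n_kv_groups = 8    # GQA group count (assumed)
latent_dim = 512   # MLA compressed latent size (assumed)

mha = 2 * n_heads * head_dim * BYTES      # full K and V for every head
gqa = 2 * n_kv_groups * head_dim * BYTES  # K/V shared within groups
mqa = 2 * 1 * head_dim * BYTES            # one K/V head shared by all queries
mla = latent_dim * BYTES                  # a single compressed latent vector

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa), ("MLA", mla)]:
    print(f"{name}: {size} bytes/token/layer")
```

With these numbers MLA caches 16x less than MHA per token while, unlike MQA, retaining enough information to be decoded back into per-head keys and values.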

&lt;h2&gt;
  
  
  FlashMLA Performance Evaluation by Novita AI
&lt;/h2&gt;

&lt;p&gt;DeepSeek has announced that FlashMLA achieves up to &lt;strong&gt;3000 GB/s&lt;/strong&gt; of memory bandwidth in memory-bound configurations and up to &lt;strong&gt;580 TFLOPS&lt;/strong&gt; in compute-bound configurations on the H800 SXM5 GPU. To validate these claims, &lt;strong&gt;Novita AI conducted a comprehensive evaluation&lt;/strong&gt;, testing FlashMLA under various parameter configurations.&lt;/p&gt;

&lt;p&gt;To present the results more intuitively, the horizontal axis in the performance charts represents the following parameter configurations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Batch Size&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sequence Length&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Number of Attention Heads&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;These results are based on the official test scripts. Without knowledge of the optimal parameter configurations, the data may not fully reflect theoretical maximums.&lt;/p&gt;
&lt;/blockquote&gt;
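For reference, headline figures like these are derived from kernel timings in the usual way: achieved bandwidth is total bytes moved divided by elapsed time, and achieved compute is total floating-point operations divided by elapsed time. A minimal sketch with hypothetical placeholder numbers (not our measured data):

```python
# How achieved bandwidth / compute fall out of a kernel timing.
# All three inputs below are hypothetical placeholders.
elapsed_s = 0.002    # measured kernel wall time
bytes_moved = 4.8e9  # KV cache read + output written
flops = 9.6e11       # total floating-point operations in the kernel

bandwidth_gbs = bytes_moved / elapsed_s / 1e9
tflops = flops / elapsed_s / 1e12
print(f"{bandwidth_gbs:.0f} GB/s, {tflops:.0f} TFLOPS")
```

Whether a given configuration lands near the bandwidth limit or the compute limit depends on batch size, sequence length, and head count, which is why the charts sweep those three parameters.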

&lt;h2&gt;
  
  
  What Impact Will FlashMLA Have?
&lt;/h2&gt;

&lt;p&gt;The release of FlashMLA has not only captured the interest of developers but also garnered positive responses from mainstream inference frameworks, &lt;strong&gt;vLLM&lt;/strong&gt; and &lt;strong&gt;SGLang&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;vLLM Integration&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
The vLLM team has announced plans to integrate FlashMLA soon. Technically, FlashMLA is built on &lt;strong&gt;PagedAttention&lt;/strong&gt;, making it highly compatible with vLLM's technology stack. Once integrated, FlashMLA is expected to further enhance vLLM's inference performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SGLang Adoption&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
SGLang will continue utilizing the already integrated &lt;strong&gt;FlashInferMLA&lt;/strong&gt;, which has been evaluated to deliver performance comparable to FlashMLA.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://novita.ai/?utm_source=hashnode_llm&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-flashmla" rel="noopener noreferrer"&gt;&lt;strong&gt;Novita AI&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.novita.ai%2Fwp-content%2Fuploads%2F2025%2F02%2Fr1-5-1024x530.png%2520align%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.novita.ai%2Fwp-content%2Fuploads%2F2025%2F02%2Fr1-5-1024x530.png%2520align%3D" alt="try deepseek r1" width="1023" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💡&lt;br&gt;
&lt;a rel="follow noopener noreferrer" href="https://novita.ai/?utm_source=hashnode_llm&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-flashmla"&gt;Get $20 credits and Try DeepSeek now!&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Recommended Reading&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://blogs.novita.ai/how-to-access-deepseek-v3/" rel="noopener noreferrer"&gt;A Guide to Accessing DeepSeek V3: Locally and via API&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://blogs.novita.ai/deepseek-v3-vs-deepseek-r1/" rel="noopener noreferrer"&gt;DeepSeek V3 vs R1: Staged Training vs Iterative SFT-RL Cycles&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://blogs.novita.ai/how-to-access-deepseek-v3-locally/" rel="noopener noreferrer"&gt;Running DeepSeek V3 Locally: A Developer’s Guide&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>deepseek</category>
    </item>
    <item>
      <title>Llama 4 Now Available on Novita AI: Unleashing Multimodal MoE Power</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Tue, 08 Apr 2025 06:33:39 +0000</pubDate>
      <link>https://dev.to/novita_ai/llama-4-now-available-on-novita-ai-unleashing-multimodal-moe-power-17a0</link>
      <guid>https://dev.to/novita_ai/llama-4-now-available-on-novita-ai-unleashing-multimodal-moe-power-17a0</guid>
      <description>&lt;p&gt;Meta has just unveiled its groundbreaking Llama 4 family of models, marking a significant leap in AI capabilities with native multimodality and mixture-of-experts (MoE) architecture.&lt;/p&gt;

&lt;p&gt;Today, we’re excited to announce that Llama 4 Scout and Llama 4 Maverick are now available on Novita AI, enabling businesses and developers to harness these powerful models through simple API integration.&lt;/p&gt;

&lt;p&gt;Novita AI is offering the first models of the Llama 4 herd at the following pricing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models/llm/meta-llama-llama-4-scout-17b-16e-instruct" rel="noopener noreferrer"&gt;&lt;strong&gt;Llama 4 Scout&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt; $0.1 / M input tokens and $0.5 / M output tokens&lt;br&gt;&lt;br&gt;
&lt;a href="https://novita.ai/models/llm/meta-llama-llama-4-maverick-17b-128e-instruct-fp8" rel="noopener noreferrer"&gt;&lt;strong&gt;Llama 4 Maverick&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt; $0.2 / M input tokens and $0.85 / M output tokens&lt;/p&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Understanding the Llama 4 Herd&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;The Llama 4 release introduces three distinct models, each designed for different needs and computational constraints:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa91tuvll53t0h1406kgr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa91tuvll53t0h1406kgr.png" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Llama 4 Scout&lt;/strong&gt; features 16 experts and delivers state-of-the-art performance for its class. It supports an industry-leading 10M token context length, making it ideal for processing large amounts of data, including entire codebases or extensive documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Llama 4 Maverick&lt;/strong&gt; is Meta’s product workhorse, incorporating 128 experts to deliver superior performance across a wide range of tasks. It excels at precise image understanding and creative writing while supporting up to 1M tokens in context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Llama 4 Behemoth&lt;/strong&gt; serves as the teacher model for the Llama 4 family with 16 experts. While not yet publicly released as it’s still in training, Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM-focused benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The context window for Llama 4 Scout on Novita AI is 131,072 tokens, while the context window for Llama 4 Maverick is 1,048,576 tokens.&lt;/p&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Key Features and Capabilities&lt;/strong&gt;
&lt;/h1&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Native Multimodality&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Llama 4 models incorporate early fusion to seamlessly integrate text and vision tokens into a unified model backbone. This enables joint pre-training with large amounts of unlabeled text, image, and video data.&lt;/p&gt;

&lt;p&gt;The enhanced vision encoder, based on MetaCLIP but further optimized for LLM integration, allows the models to process multiple images alongside text prompts without additional engineering.&lt;/p&gt;
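In practice, OpenAI-compatible endpoints typically accept multimodal input as a message whose content interleaves text and image parts. The exact shape below is the standard OpenAI convention, and whether this endpoint accepts it verbatim is an assumption worth verifying in the docs:

```python
# OpenAI-style multimodal message: one user turn containing both a
# text part and an image URL. The URL here is a placeholder.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this chart?"},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/chart.png"}},
    ],
}
print(len(message["content"]))  # 2 content parts
```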
&lt;h1&gt;
  
  
  &lt;strong&gt;Extended Context Length&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;One of the most significant advancements in Llama 4 is its support for extraordinarily long contexts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Llama 4 Scout: 10 million tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Llama 4 Maverick: 1 million tokens&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This leap in context length enables applications that were previously impractical, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Multi-document summarization and analysis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reasoning over extensive codebases&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parsing vast amounts of user activity for personalized experiences&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Processing entire research archives in a single prompt&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
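Before sending an entire codebase or archive in one prompt, it helps to estimate whether it fits the window at all. The sketch below uses a rough 4-bytes-per-token ratio, which is a common rule of thumb rather than an exact tokenizer count:

```python
# Rough estimate: does a corpus fit in a model's context window?
# 4 bytes/token is a rule of thumb, not a real tokenizer count.
BYTES_PER_TOKEN = 4

def fits_in_context(corpus_bytes: int, context_tokens: int,
                    reserve_tokens: int = 8192) -> bool:
    """Leave `reserve_tokens` of headroom for the prompt and the reply."""
    est_tokens = corpus_bytes / BYTES_PER_TOKEN
    return est_tokens + reserve_tokens <= context_tokens

# Against the 131,072-token Scout window served on Novita AI:
print(fits_in_context(400_000, 131_072))    # ~100k tokens -> True
print(fits_in_context(4_000_000, 131_072))  # ~1M tokens  -> False
```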
&lt;h1&gt;
  
  
  &lt;strong&gt;Multilingual and Reasoning Capabilities&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Llama 4 models have been pre-trained on 200 languages — with dedicated fine-tuning support for 12, including Arabic, Spanish, German, and Hindi. Over 100 of these languages have more than 1 billion training tokens each — offering 10 times more multilingual coverage than Llama 3.&lt;/p&gt;

&lt;p&gt;This extensive training enables superior performance across languages, making the models suitable for global applications.&lt;/p&gt;

&lt;p&gt;The models also demonstrate enhanced reasoning abilities thanks to specialized training recipes. For Maverick, this included a continuous online RL strategy with adaptive data filtering, focusing on medium-to-hard difficulty prompts.&lt;/p&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Performance Benchmarks and Use Cases&lt;/strong&gt;
&lt;/h1&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Benchmarks&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;According to Meta’s official benchmark data, Llama 4 models demonstrate exceptional performance across various tasks, as shown in the tables below:&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Llama 4 Scout Benchmarks&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1050%2F0%2AgVWy44iXMuicCBRr" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1050%2F0%2AgVWy44iXMuicCBRr" width="1050" height="743"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Llama 4 Maverick Benchmarks&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1050%2F0%2AgaXa_VR1nfiOUxYr" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1050%2F0%2AgaXa_VR1nfiOUxYr" width="1050" height="896"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Llama 4 Scout is best suited for long-context applications, while Llama 4 Maverick excels at complex reasoning and creative tasks that involve multimodal understanding. Here are the ideal use cases for each model based on their strengths:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Llama 4 Scout:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Multi-document summarization for legal or financial analysis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Personalized task automation using extensive user data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Efficient image processing for lightweight multimodal applications&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models/llm/meta-llama-llama-4-scout-17b-16e-instruct" rel="noopener noreferrer"&gt;Explore Llama 4 Scout Demo Now&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Llama 4 Maverick:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Multilingual customer support with visual context&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generating marketing content based on multimodal inputs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Advanced document intelligence combining text, diagrams, and tables&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Creative writing and content generation with precise image understanding&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models/llm/meta-llama-llama-4-maverick-17b-128e-instruct-fp8" rel="noopener noreferrer"&gt;Explore Llama 4 Maverick Demo Now&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both models excel in situations requiring multimodal understanding, reasoning over extensive context, and multilingual capabilities.&lt;/p&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Getting Started with Llama 4 on Novita AI&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Integrating Llama 4 models into your applications via &lt;a href="https://novita.ai/models?utm_source=blog_llm" rel="noopener noreferrer"&gt;Novita AI’s model library&lt;/a&gt; is straightforward, requiring just a few lines of code. Here’s how to get started:&lt;/p&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Setting Up Your Environment&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;First, ensure you &lt;a href="https://novita.ai/settings/key-management" rel="noopener noreferrer"&gt;have an API key&lt;/a&gt; from Novita AI. If you don’t have one yet, sign up and create an API key through the Novita AI dashboard.&lt;/p&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Integrating with Python&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Novita AI provides OpenAI-compatible endpoints for seamless integration. Here’s a simple example using the Python client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.novita.ai/v3/openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;YOUR Novita AI API Key&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/llama-4-maverick-17b-128e-instruct-fp8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt; &lt;span class="c1"&gt;# or False
&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;
&lt;span class="n"&gt;system_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Be a helpful assistant&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;top_p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;min_p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="n"&gt;presence_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;frequency_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;repetition_penalty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;response_format&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;chat_completion_res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi there!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;presence_penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;presence_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;frequency_penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;frequency_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repetition_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;repetition_penalty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;min_p&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chat_completion_res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat_completion_res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more detailed examples and comprehensive integration guides, visit our &lt;a href="https://novita.ai/docs/guides/llm-api" rel="noopener noreferrer"&gt;LLM API documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;The arrival of Llama 4 on Novita AI represents a significant milestone in the democratization of advanced AI capabilities.&lt;/p&gt;

&lt;p&gt;With native multimodality, extended context lengths, and an efficient MoE architecture, these models enable new classes of applications that were previously impractical or prohibitively expensive.&lt;/p&gt;

&lt;p&gt;Whether you’re building applications for document processing, multilingual communication, or creative content generation, Llama 4 provides the tools you need to create intelligent, responsive experiences.&lt;/p&gt;

&lt;p&gt;Get started today with &lt;a href="https://novita.ai/docs/guides/llm-api" rel="noopener noreferrer"&gt;Novita AI’s simple integration process&lt;/a&gt; and competitive pricing to bring the power of Llama 4 to your applications and users.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;About Novita AI&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://novita.ai/?utm_source=dev_llm&amp;amp;utm_medium=article&amp;amp;utm_campaign=llama-4" rel="noopener noreferrer"&gt;&lt;em&gt;Novita AI&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>llm</category>
      <category>llama4</category>
      <category>ai</category>
    </item>
    <item>
      <title>DeepSeek V3 0324 Available on Novita AI</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Thu, 27 Mar 2025 04:00:00 +0000</pubDate>
      <link>https://dev.to/novita_ai/deepseek-v3-0324-available-on-novita-ai-3289</link>
      <guid>https://dev.to/novita_ai/deepseek-v3-0324-available-on-novita-ai-3289</guid>
      <description>&lt;p&gt;DeepSeek V3 0324, the latest iteration of the powerful DeepSeek AI series, is now seamlessly accessible via &lt;a href="https://novita.ai/" rel="noopener noreferrer"&gt;Novita AI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This advanced version brings substantial enhancements in logical reasoning, mathematical problem-solving, function-calling accuracy, and specialized language proficiency, especially for Chinese.&lt;/p&gt;

&lt;p&gt;This technical guide comprehensively covers DeepSeek V3 0324’s upgrades, detailed benchmarks, comparative analyses, and practical integration methods.&lt;/p&gt;

&lt;p&gt;For quick hands-on experimentation, developers can use Novita AI’s interactive &lt;a href="https://novita.ai/models/llm/deepseek-deepseek-v3-0324" rel="noopener noreferrer"&gt;LLM Playground&lt;/a&gt; and follow the clear &lt;a href="https://novita.ai/docs/guides/llm-api" rel="noopener noreferrer"&gt;Quick Start Guide&lt;/a&gt;.&lt;/p&gt;
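For a quick call without any SDK, the OpenAI-compatible endpoint can be hit with the standard library alone. The sketch below gates the request on an environment variable; the model identifier is inferred from the model page URL, so verify the exact string in the playground before relying on it:

```python
import json
import os
import urllib.request

# OpenAI-compatible chat endpoint on Novita AI. The model id is an
# assumption inferred from the model page URL -- verify in the playground.
URL = "https://api.novita.ai/v3/openai/chat/completions"
MODEL = "deepseek/deepseek-v3-0324"

body = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "Be a helpful assistant"},
        {"role": "user", "content": "Prove that sqrt(2) is irrational."},
    ],
    "max_tokens": 1024,
}

api_key = os.environ.get("NOVITA_API_KEY")
if api_key:  # only send the request when a key is configured
    req = urllib.request.Request(
        URL,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The `openai` Python client shown in our other posts works equally well; just point its `base_url` at `https://api.novita.ai/v3/openai`.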

&lt;h3&gt;
  
  
  What’s New: DeepSeek V3 0324 vs. DeepSeek V3
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Enhanced Reasoning Capabilities
&lt;/h3&gt;

&lt;p&gt;DeepSeek V3 0324 significantly outperforms its predecessor in logical and mathematical reasoning benchmarks, while remaining available at the same price:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MMLU-Pro:&lt;/strong&gt; Improved from 75.9% to &lt;strong&gt;81.2%&lt;/strong&gt; (+5.3%).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GPQA Diamond:&lt;/strong&gt; Increased from 59.1% to &lt;strong&gt;68.4%&lt;/strong&gt; (+9.3%).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MATH-500:&lt;/strong&gt; Enhanced accuracy from 90.2% to &lt;strong&gt;94.0%&lt;/strong&gt;, excelling in mathematical reasoning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AIME 2024:&lt;/strong&gt; Improved considerably from 39.6% to &lt;strong&gt;59.4%&lt;/strong&gt; (+19.8%).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LiveCodeBench:&lt;/strong&gt; Improved coding performance from 39.2% to &lt;strong&gt;49.2%&lt;/strong&gt; (+10.0%).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These improvements ensure greater accuracy and reliability for complex reasoning tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/models/llm/deepseek-deepseek-v3-0324" rel="noopener noreferrer"&gt;Try DeepSeek V3 0324 now with $0.5 free credits&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Improved Front-End Web Development
&lt;/h3&gt;

&lt;p&gt;DeepSeek V3 0324 generates cleaner, executable, and professionally structured front-end code.&lt;/p&gt;

&lt;p&gt;This improvement allows developers to prototype interactive and visually appealing web solutions faster, reducing debugging and accelerating project timelines.&lt;/p&gt;

&lt;h3&gt;
  
  
  Refined Chinese Writing and Interaction
&lt;/h3&gt;

&lt;p&gt;DeepSeek V3 0324 aligns closely with DeepSeek-R1’s sophisticated writing style.&lt;/p&gt;

&lt;p&gt;The model produces superior-quality, contextually relevant medium-to-long-form Chinese content, ideal for chatbots, customer support, and content generation tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Accurate Function Calling
&lt;/h3&gt;

&lt;p&gt;Function-calling accuracy has significantly improved, resolving reliability issues observed in earlier versions.&lt;/p&gt;

&lt;p&gt;Developers can now confidently integrate structured outputs into API-driven applications and complex workflows.&lt;/p&gt;
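Structured outputs of this kind use the standard OpenAI-style `tools` schema. A minimal sketch follows; the `get_weather` tool and its fields are purely hypothetical, for illustration only:

```python
# OpenAI-style tool schema for function calling. The get_weather
# tool and its parameters are hypothetical, purely illustrative.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string",
                             "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

# Passed as `tools=tools` to an OpenAI-compatible chat.completions call,
# this lets the model return a structured tool_call instead of free text.
print(tools[0]["function"]["name"])
```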

&lt;h3&gt;
  
  
  In-Depth Benchmarking Analysis
&lt;/h3&gt;

&lt;p&gt;Below is a detailed benchmarking analysis comparing DeepSeek V3 0324 to other prominent models:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F1%2Ai7OC_5HAYgSaGrsETFKiPA.png%2520align%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F1%2Ai7OC_5HAYgSaGrsETFKiPA.png%2520align%3D" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://api-docs.deepseek.com/news/news250325" rel="noopener noreferrer"&gt;DeepSeek-V3-0324 Release&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mathematical Reasoning (MATH-500):&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
DeepSeek V3 0324 achieves the highest score (94.0%) in the MATH-500 benchmark, outperforming GPT-4.5 (90.7%), Qwen-Max (82.6%), and Claude-Sonnet-3.7 (82.2%), indicating superior math-solving capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;General Knowledge (MMLU-Pro):&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
DeepSeek V3 0324 (81.2%) exhibits significant improvement compared to DeepSeek V3 (75.9%), closely approaching GPT-4.5 (86.1%) and surpassing Qwen-Max (76.1%) and Claude-Sonnet-3.7 (80.7%).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Coding Performance (LiveCodeBench):&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
DeepSeek V3 0324 demonstrates notable improvement (49.2%) over its predecessor (39.2%) on LiveCodeBench, staying closely competitive with GPT-4.5 (44.4%) and surpassing Claude-Sonnet-3.7 (42.2%) and Qwen-Max (38.7%).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Complex Problem-Solving (AIME 2024):&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
With a remarkable score of 59.4%, DeepSeek V3 0324 significantly exceeds GPT-4.5 (36.7%), Claude-Sonnet-3.7 (23.3%), and Qwen-Max (26.7%), showcasing its strength in solving advanced problems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  DeepSeek V3 0324 vs. Claude 3.7 vs. GPT-4.5
&lt;/h3&gt;

&lt;p&gt;To better understand how DeepSeek V3 0324 compares in the broader AI ecosystem, let’s take a closer look at its performance and features alongside two other leading models — Claude 3.7 and GPT-4.5:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F1%2A71Vamjc28sT1DqagIwhx9A.png%2520align%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F1%2A71Vamjc28sT1DqagIwhx9A.png%2520align%3D" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost Efficiency:&lt;/strong&gt; DeepSeek V3 0324 offers significant cost savings compared to GPT-4.5 and Claude-Sonnet-3.7, making it highly accessible for startups and large-scale projects.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Specialization:&lt;/strong&gt; DeepSeek V3 0324 excels in scenarios requiring advanced mathematical reasoning, coding, and specialized Chinese language applications.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integration Simplicity:&lt;/strong&gt; Novita AI simplifies integration significantly by maintaining compatibility with the OpenAI API standard.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Getting Started with DeepSeek V3 0324 on Novita AI
&lt;/h3&gt;

&lt;p&gt;To quickly leverage DeepSeek V3 0324:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Go to &lt;a href="https://novita.ai/?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-r1-now-available-on-novita-ai-a-strong-competitor-to-openai-o1" rel="noopener noreferrer"&gt;Novita AI&lt;/a&gt; and log in with your Google account, GitHub account, or email address.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; Try the &lt;a href="https://novita.ai/models/llm/deepseek-deepseek-v3-0324" rel="noopener noreferrer"&gt;DeepSeek-V3-0324 Demo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt; Monitor the model’s performance in the &lt;a href="https://novita.ai/model-api/console/llm-metrics?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-v3-advancing-open-source-code-models-now-available-on-novita-ai" rel="noopener noreferrer"&gt;LLM Metrics Console&lt;/a&gt; on Novita AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4:&lt;/strong&gt; Get your API Key:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Navigate to “&lt;a href="https://novita.ai/settings/key-management" rel="noopener noreferrer"&gt;Key Management&lt;/a&gt;” in the settings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A default key is created upon your first login.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To generate additional keys, click on “+ Add New Key.”&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
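
&lt;p&gt;Rather than hard-coding the key, a common pattern is to export it as an environment variable and read it when constructing the client (a minimal sketch; &lt;code&gt;NOVITA_API_KEY&lt;/code&gt; is an assumed variable name, not one mandated by Novita AI):&lt;/p&gt;

```python
import os

# Assumed variable name; set it beforehand, e.g.: export NOVITA_API_KEY="..."
api_key = os.environ.get("NOVITA_API_KEY", "")
base_url = "https://api.novita.ai/v3/openai"

if not api_key:
    print("NOVITA_API_KEY is not set; create a key under Key Management first.")
else:
    from openai import OpenAI

    # The key never appears in source code or version control this way.
    client = OpenAI(base_url=base_url, api_key=api_key)
```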

&lt;p&gt;&lt;strong&gt;Step 5&lt;/strong&gt;: Set up your development environment and configure options such as message content, role, name, and prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Accessing DeepSeek V3 0324 API via Novita AI
&lt;/h3&gt;

&lt;p&gt;Novita AI simplifies DeepSeek V3 0324 integration via an intuitive, OpenAI-compatible API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Python users:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="&amp;lt;YOUR Novita AI API Key&amp;gt;",
)
model = "deepseek/deepseek-v3-0324"
stream = True # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }
chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
      "top_k": top_k,
      "repetition_penalty": repetition_penalty,
      "min_p": min_p
    }
  )
if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For JavaScript users:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import OpenAI from "openai";
const openai = new OpenAI({
  baseURL: "https://api.novita.ai/v3/openai",
  apiKey: "&amp;lt;YOUR Novita AI API Key&amp;gt;",
});
const stream = true; // or false
async function run() {
  const completion = await openai.chat.completions.create({
    messages: [
      {
        role: "system",
        content: "Be a helpful assistant",
      },
      {
        role: "user",
        content: "Hi there!",
      },
    ],
    model: "deepseek/deepseek-v3-0324",
    stream,
    response_format: { type: "text" },
    max_tokens: 2048,
    temperature: 1,
    top_p: 1,
    min_p: 0,
    top_k: 50,
    presence_penalty: 0,
    frequency_penalty: 0,
    repetition_penalty: 1
  });
  if (stream) {
    for await (const chunk of completion) {
      if (chunk.choices[0].finish_reason) {
        console.log(chunk.choices[0].finish_reason);
      } else {
        console.log(chunk.choices[0].delta.content);
      }
    }
  } else {
    console.log(JSON.stringify(completion));
  }
}
run();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For curl users:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl "https://api.novita.ai/v3/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer &amp;lt;YOUR Novita AI API Key&amp;gt;" \
  -d @- &amp;lt;&amp;lt; 'EOF'
{
    "model": "deepseek/deepseek-v3-0324",
    "messages": [
        {
            "role": "system",
            "content": "Be a helpful assistant"
        },
        {
            "role": "user",
            "content": "Hi there!"
        }
    ],
    "response_format": { "type": "text" },
    "max_tokens": 2048,
    "temperature": 1,
    "top_p": 1,
    "min_p": 0,
    "top_k": 50,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "repetition_penalty": 1
}
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For detailed instructions, refer to Novita AI’s comprehensive &lt;a href="https://novita.ai/docs/guides/llm-api" rel="noopener noreferrer"&gt;Quick Start Guide&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;DeepSeek V3 0324, now available on Novita AI, significantly upgrades reasoning, coding, and specialized language processing capabilities.&lt;/p&gt;

&lt;p&gt;Its competitive pricing, powerful features, intuitive API integration, and scalable infrastructure offer developers unmatched efficiency and cost-effectiveness.&lt;/p&gt;

&lt;p&gt;Start leveraging &lt;a href="https://novita.ai/models/llm/deepseek-deepseek-v3-0324" rel="noopener noreferrer"&gt;DeepSeek V3 0324 on Novita AI&lt;/a&gt; today and elevate your AI projects effectively and affordably.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;About Novita AI&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/?utm_source=blog_llm&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-r1" rel="noopener noreferrer"&gt;&lt;em&gt;Novita&lt;/em&gt;&lt;/a&gt; &lt;a href="https://novita.ai/?utm_source=blog_llm&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-v3" rel="noopener noreferrer"&gt;&lt;em&gt;AI&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>deepseek</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>High-Frequency CPU + RTX 4090: Optimizing AI Image Generation by 150%+</title>
      <dc:creator>Novita AI</dc:creator>
      <pubDate>Tue, 18 Mar 2025 04:00:00 +0000</pubDate>
      <link>https://dev.to/novita_ai/high-frequency-cpu-rtx-4090-optimizing-ai-image-generation-by-150-54nc</link>
      <guid>https://dev.to/novita_ai/high-frequency-cpu-rtx-4090-optimizing-ai-image-generation-by-150-54nc</guid>
      <description>&lt;p&gt;In AI image generation workloads, the relationship between CPU and GPU plays a crucial role in overall system performance. Our comprehensive testing reveals that CPU frequency is a more significant factor than core count when paired with an RTX 4090 GPU. This finding challenges conventional wisdom that favors multi-core enterprise processors for AI tasks and demonstrates how high-frequency consumer CPUs can dramatically improve generation speed while reducing costs.&lt;/p&gt;

&lt;p&gt;This article details our benchmarks showing how high-frequency CPUs dramatically reduce generation times, explores ComfyUI optimization techniques that further enhance performance, and provides a step-by-step guide to accessing these optimized configurations through Novita AI’s platform.&lt;/p&gt;

&lt;h3&gt;
  
  
  CPU Frequency Impact on Image Generation Performance
&lt;/h3&gt;

&lt;p&gt;For text-to-image generation tasks, the CPU prepares data for the GPU to process. Higher CPU frequencies enable faster preparation and transfer of these instruction sets, allowing the GPU to operate at maximum efficiency rather than waiting for data. Our testing demonstrates that consumer CPUs with higher frequencies can increase GPU utilization by over 150% compared to lower-frequency enterprise processors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing Framework:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ComfyUI running Stable Diffusion 1.8.0, tested on an NVIDIA RTX 4090 (24GB VRAM) across multiple CPU configurations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2A58zw3uNZx1wxd5lK" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2A58zw3uNZx1wxd5lK" width="800" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  CPU Configuration Performance: StableDiffusion Generation Speed
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2A1_tCFdmD3b0uE0_3" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2A1_tCFdmD3b0uE0_3" width="800" height="156"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2AtKjR3BQvlxlw5uLt" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2AtKjR3BQvlxlw5uLt" width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Finding&lt;/strong&gt;: The consumer-grade high-frequency CPU completes the same task in less than half the time required by the enterprise CPU, demonstrating over 150% performance improvement.&lt;/p&gt;

&lt;h3&gt;
  
  
  ComfyUI Optimization Modes Impact on Image Generation Speed
&lt;/h3&gt;

&lt;p&gt;After establishing the superior performance of high-frequency CPUs, we explored additional optimization methods to further enhance generation speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing Framework:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ComfyUI running the Flux1.dev fp8 model on a high-frequency CPU + RTX 4090 system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Average Image Generation Time (seconds)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1j4iubv2c25h8es8obu2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1j4iubv2c25h8es8obu2.jpg" width="800" height="104"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding ComfyUI Functions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fast Mode&lt;/strong&gt;: Accelerates resource loading through preloading and caching mechanisms. Reduces unnecessary checks and optimizes resource loading for faster startup and improved efficiency when generating multiple images.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;HighVRAM Mode&lt;/strong&gt;: Keeps more models and data in GPU VRAM, reducing data transfer overhead. Optimizes memory management by avoiding frequent allocation/release operations, improving generation efficiency. Can process multiple batches when sufficient VRAM is available.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Accelerator Node&lt;/strong&gt;: A custom plugin for ComfyUI that further enhances processing pipeline efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Finding&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;When using RTX 4090 + high-frequency CPU configurations, optimizing your workflow with specialized modes delivers significant performance gains while maintaining image quality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Basic Model Workflow&lt;/strong&gt;: Using flux.dev-fp8 with fast+highVRAM modes reduces generation time from 10.05s to 6.59s&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Single LoRA Workflow&lt;/strong&gt;: Implementing flux.dev-fp8 with fast+accelerator nodes cuts generation time from 12.63s to 9.68s&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Four LoRA Switching Workflow&lt;/strong&gt;: Applying flux.dev-fp8 with fast mode decreases generation time from 14.07s to 11.10s&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Across all scenarios, these optimizations save approximately 3 seconds per image while maintaining full quality and reliability.&lt;/p&gt;
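
&lt;p&gt;The per-workflow savings quoted above follow directly from the reported times:&lt;/p&gt;

```python
# (baseline seconds, optimized seconds) per workflow, as reported above.
workflows = {
    "Basic model (fast + highVRAM)": (10.05, 6.59),
    "Single LoRA (fast + accelerator)": (12.63, 9.68),
    "Four LoRA switching (fast)": (14.07, 11.10),
}

# Each workflow saves roughly 3 seconds per image on this configuration.
for name, (before, after) in workflows.items():
    saved = before - after
    pct = 100 * saved / before
    print(f"{name}: {saved:.2f}s saved ({pct:.0f}% faster)")
```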

&lt;h3&gt;
  
  
  How to Access the Image Generation Friendly GPU on Novita AI
&lt;/h3&gt;

&lt;p&gt;For those looking to implement these findings, Novita AI offers pre-configured instances with the optimal hardware combination:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Go to &lt;a href="https://novita.ai/?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=deepseek-r1-now-available-on-novita-ai-a-strong-competitor-to-openai-o1" rel="noopener noreferrer"&gt;Novita AI&lt;/a&gt; and log in with your Google account, GitHub account, or email address.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; Navigate to the &lt;a href="https://novita.ai/gpus-console?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=high-frequency-cpu-rtx-4090-optimizing-ai-image-generation-by-150" rel="noopener noreferrer"&gt;GPU Instances&lt;/a&gt; page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt; Select the appropriate GPU template:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;StableDiffusion:v1.8.0&lt;/strong&gt; for Stable Diffusion model optimization&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Comfyui:flux1-fp8&lt;/strong&gt; for Flux model optimization&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2A3_HhE0QbFyzg_4ch" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2A3_HhE0QbFyzg_4ch" width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4:&lt;/strong&gt; Select 24 vCPUs per GPU in the bottom right corner, then choose the ‘RTX 4090 (High-Freq CPU)’ configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2AstLqFY0p47FPiH8U" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1000%2F0%2AstLqFY0p47FPiH8U" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Specifications for RTX 4090 (High-Freq CPU)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GPU&lt;/strong&gt;: 1× NVIDIA RTX 4090 with 24GB VRAM&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CPU&lt;/strong&gt;: High-frequency CPU (13th Gen Intel Core i7-13790F)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;System Memory&lt;/strong&gt;: 58GB RAM&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Processing Cores&lt;/strong&gt;: 24 vCPUs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost-Efficiency&lt;/strong&gt;: $0.69/hour (on-demand pricing)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Our testing demonstrates that high-frequency consumer CPUs significantly outperform lower-frequency enterprise CPUs when paired with an RTX 4090 for AI image generation tasks. This combination delivers over 150% faster performance while potentially reducing hardware costs.&lt;/p&gt;

&lt;p&gt;By implementing the additional ComfyUI optimization techniques outlined in this article, users can further enhance their generation speed and throughput.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ready to transform your AI image generation workflow?&lt;/strong&gt; Consider upgrading to an &lt;a href="https://novita.ai/gpus-console/explore?utm_source=blogs&amp;amp;utm_medium=article&amp;amp;utm_campaign=high-frequency-cpu-rtx-4090-optimizing-ai-image-generation-by-150" rel="noopener noreferrer"&gt;RTX 4090 with a high-frequency CPU&lt;/a&gt; to immediately improve your generation speeds.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;About Novita AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://novita.ai/?utm_source=dev_gpu&amp;amp;utm_medium=article&amp;amp;utm_campaign=high-frequency-cpu-rtx-4090-optimizing-ai-image-generation-by-150" rel="noopener noreferrer"&gt;Novita AI&lt;/a&gt; is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Originally published at&lt;/em&gt;&lt;/strong&gt; &lt;a href="https://blogs.novita.ai/high-frequency-cpu-rtx-4090-optimizing-ai-image-generation-by-150/?utm_source=dev_gpu&amp;amp;utm_medium=article&amp;amp;utm_campaign=high-frequency-cpu-rtx-4090" rel="noopener noreferrer"&gt;&lt;em&gt;Novita AI&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>4090</category>
      <category>gpu</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
