<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SiliconFlow</title>
    <description>The latest articles on DEV Community by SiliconFlow (@siliconflow).</description>
    <link>https://dev.to/siliconflow</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3396294%2Fa4f85a4a-6ab3-47bf-a9ef-0be22f359864.png</url>
      <title>DEV Community: SiliconFlow</title>
      <link>https://dev.to/siliconflow</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/siliconflow"/>
    <language>en</language>
    <item>
      <title>GLM-4.6V Now on SiliconFlow: Native Multimodal Tool Use Meets SoTA Visual Intelligence</title>
      <dc:creator>SiliconFlow</dc:creator>
      <pubDate>Fri, 26 Dec 2025 14:10:00 +0000</pubDate>
      <link>https://dev.to/siliconflow/glm-46v-now-on-siliconflow-native-multimodal-tool-use-meets-sota-visual-intelligence-1del</link>
      <guid>https://dev.to/siliconflow/glm-46v-now-on-siliconflow-native-multimodal-tool-use-meets-sota-visual-intelligence-1del</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;TL;DR: &lt;strong&gt;GLM-4.6V&lt;/strong&gt;, Z.ai's latest multimodal large language model, is now &lt;strong&gt;available on SiliconFlow&lt;/strong&gt;. Featuring a &lt;strong&gt;131K&lt;/strong&gt; multimodal context window and native &lt;strong&gt;function calling&lt;/strong&gt; integration, it delivers &lt;strong&gt;SoTA&lt;/strong&gt; performance in &lt;strong&gt;visual understanding and reasoning&lt;/strong&gt; — seamlessly bridging the gap between &lt;strong&gt;visual perception&lt;/strong&gt; and &lt;strong&gt;executable action&lt;/strong&gt;. The GLM-4.6V series provides a unified technical foundation for multimodal agents in real-world business scenarios. Try &lt;strong&gt;GLM-4.6V&lt;/strong&gt; now and level up your &lt;em&gt;multimodal agents with SiliconFlow APIs&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We are thrilled to announce that &lt;strong&gt;&lt;a href="https://www.siliconflow.com/models/glm-4-6v" rel="noopener noreferrer"&gt;GLM-4.6V&lt;/a&gt;&lt;/strong&gt;, &lt;a href="https://huggingface.co/zai-org/GLM-4.6V" rel="noopener noreferrer"&gt;Z.ai&lt;/a&gt;'s latest multimodal foundation model designed for cloud and enterprise-grade scenarios, is now available on &lt;strong&gt;&lt;a href="https://www.siliconflow.com/models" rel="noopener noreferrer"&gt;SiliconFlow&lt;/a&gt;&lt;/strong&gt;. It integrates &lt;strong&gt;native multimodal function calling&lt;/strong&gt; and excels at &lt;strong&gt;long-context visual reasoning&lt;/strong&gt;, directly closing the loop from &lt;em&gt;perception to understanding to execution&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Now, through SiliconFlow's &lt;strong&gt;GLM-4.6V&lt;/strong&gt; API, you can expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Budget-friendly Pricing:&lt;/strong&gt; GLM-4.6V $0.30/M tokens (input) and $0.90/M tokens (output)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;131K Context Window:&lt;/strong&gt; Enables processing lengthy industry reports, extensive slide decks, or long-form video content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless Integration:&lt;/strong&gt; Instantly deploy via SiliconFlow's OpenAI-compatible API, or plug into your existing agentic frameworks, automation tools, or workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whether you are building agents, workflows, or tools for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rich-Text Content Creation:&lt;/strong&gt; Convert papers, reports, and slides into polished posts for social media and knowledge bases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design-to-Code Automation:&lt;/strong&gt; Upload screenshots/designs for pixel-level HTML/CSS/JS code generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business Document Processing:&lt;/strong&gt; Process reports to extract metrics and synthesize comparative tables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video Content Operations:&lt;/strong&gt; Summarize, tag, and extract insights at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Through SiliconFlow's production-ready API, you can leverage GLM-4.6V to power your multimodal agents in minutes — no cost concerns, no engineering overhead.&lt;/p&gt;

&lt;p&gt;Let's dive into the key capabilities with live demos from the SiliconFlow Platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features &amp;amp; Benchmark Performance
&lt;/h2&gt;

&lt;p&gt;In most LLM pipelines, tool calling is still text-only: even for image or document tasks, everything must first be converted into text, then back again. This round-tripping risks information loss and adds system complexity. GLM-4.6V changes this with &lt;strong&gt;native multimodal tool calling&lt;/strong&gt; capability (a request sketch follows the list below):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multimodal Input: Images, UI screenshots, and document pages can be passed directly as tool arguments, avoiding manual text conversion and preserving layout and visual cues.&lt;/li&gt;
&lt;li&gt;Multimodal Output: The model can directly interpret tool results such as search pages, charts, rendered web screenshots, or product images, and feed them back into its reasoning and final response.&lt;/li&gt;
&lt;/ul&gt;
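
&lt;p&gt;To make this concrete, here is a minimal sketch of a multimodal function-calling request against the OpenAI-compatible endpoint. The tool schema and the get_chart_value function are illustrative assumptions, not part of the GLM-4.6V API itself; the image message format matches the Get Started example below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

url = "https://api.siliconflow.com/v1/chat/completions"

# Illustrative tool definition: the model can decide to call it after
# inspecting the image (schema follows the OpenAI tools convention).
tools = [{
    "type": "function",
    "function": {
        "name": "get_chart_value",  # hypothetical tool, define your own
        "description": "Look up the exact value behind a chart data point",
        "parameters": {
            "type": "object",
            "properties": {
                "series": {"type": "string"},
                "x": {"type": "string"}
            },
            "required": ["series", "x"]
        }
    }
}]

payload = {
    "model": "zai-org/GLM-4.6V",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/report-chart.png"}},
            {"type": "text",
             "text": "Which quarter had the highest revenue? Verify with the tool."}
        ]
    }],
    "tools": tools
}
headers = {
    "Authorization": "Bearer &amp;lt;token&amp;gt;",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
# If the model opts to use the tool, the reply contains tool_calls whose
# results you execute and send back as "tool" role messages.
print(response.json()["choices"][0]["message"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;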

&lt;p&gt;By closing the loop from ​&lt;strong&gt;perception → understanding → execution&lt;/strong&gt;​, GLM-4.6V supports the following key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rich-Text Content Understanding and Creation:&lt;/strong&gt; Accurately understands complex text, charts, tables, and formulas, then autonomously invokes visual tools to crop key visuals during generation and audits image quality to compose publication-ready content for social media &amp;amp; knowledge bases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual Web Search:&lt;/strong&gt; Recognizes search intent and autonomously triggers the appropriate search tools, comprehends and aligns the mixed visual-textual results to identify relevant information, and finally reasons over them to deliver structured, visually rich answers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend Replication &amp;amp; Visual Interaction:&lt;/strong&gt; Achieves &lt;strong&gt;pixel-level&lt;/strong&gt; replication by identifying layouts, components, and color schemes from screenshots to generate high-fidelity &lt;strong&gt;HTML/CSS/JS code&lt;/strong&gt;, then lets you refine it interactively — just circle an element and tell it what you want, like "make this button bigger and change it to green."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-Context Understanding:&lt;/strong&gt; Processes ~150 pages of documents, 200 slides, or a one-hour video in a single pass with its 131K context window, enabling tasks like analyzing financial reports or summarizing an entire football match while pinpointing specific goal events and timestamps.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, when given two financial reports filled with numbers, tables, and charts, GLM-4.6V shows outstanding visual understanding and reasoning: it accurately parsed the tables and charts, reasoned over the figures, and surfaced actionable insights on revenue growth, profitability, and market positioning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsiliconflow.feishu.cn%2Fspace%2Fapi%2Fbox%2Fstream%2Fdownload%2Fasynccode%2F%3Fcode%3DZjg1ZDdiMjkxMThjOGYxOGY2MDQzNTUyMmEwYzJkNGNfVWpFcnB3UGNJbzJNUEtvTXczZndzTVc2RlQzRzh4UENfVG9rZW46TWxVNmJYNlRFb2R0SFB4N3JEQmN0WmdDbmtiXzE3NjY2MzIxMzI6MTc2NjYzNTczMl9WNA" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsiliconflow.feishu.cn%2Fspace%2Fapi%2Fbox%2Fstream%2Fdownload%2Fasynccode%2F%3Fcode%3DZjg1ZDdiMjkxMThjOGYxOGY2MDQzNTUyMmEwYzJkNGNfVWpFcnB3UGNJbzJNUEtvTXczZndzTVc2RlQzRzh4UENfVG9rZW46TWxVNmJYNlRFb2R0SFB4N3JEQmN0WmdDbmtiXzE3NjY2MzIxMzI6MTc2NjYzNTczMl9WNA" width="200" height="145"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The SiliconFlow Playground supports text &amp;amp; image inputs. Use the API service for other input types.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GLM-4.6V has also been evaluated across &lt;strong&gt;20+&lt;/strong&gt; mainstream multimodal benchmarks, including &lt;strong&gt;MMBench&lt;/strong&gt;, &lt;strong&gt;MathVista&lt;/strong&gt;, and &lt;strong&gt;OCRBench&lt;/strong&gt;, achieving SoTA performance among open-source models. It matches or outperforms comparable-scale models such as &lt;strong&gt;Qwen3-VL-235B&lt;/strong&gt;, &lt;strong&gt;Kimi-VL-A3B-Thinking-2506&lt;/strong&gt;, and &lt;strong&gt;Step3-321B&lt;/strong&gt; in key capabilities: multimodal understanding, multimodal agentic tasks, and long-context processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsiliconflow.feishu.cn%2Fspace%2Fapi%2Fbox%2Fstream%2Fdownload%2Fasynccode%2F%3Fcode%3DNjQyMmIzMjg1M2QwNzQyZTI1ODlkZDVhNWE1M2Y0N2ZfUUIyWWdReml4OU5SNGtNTmMzU2dEU3dGbHBaVkRoamxfVG9rZW46SlAwMmJJZDd3b3BpYVJ4aGoxeGMwOFR0bldkXzE3NjY2MzIxMzI6MTc2NjYzNTczMl9WNA" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsiliconflow.feishu.cn%2Fspace%2Fapi%2Fbox%2Fstream%2Fdownload%2Fasynccode%2F%3Fcode%3DNjQyMmIzMjg1M2QwNzQyZTI1ODlkZDVhNWE1M2Y0N2ZfUUIyWWdReml4OU5SNGtNTmMzU2dEU3dGbHBaVkRoamxfVG9rZW46SlAwMmJJZDd3b3BpYVJ4aGoxeGMwOFR0bldkXzE3NjY2MzIxMzI6MTc2NjYzNTczMl9WNA" width="600" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Techniques
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GLM-4.6V sets the technical foundation for multimodal agents in real-world business scenarios.&lt;/strong&gt; To achieve this performance, GLM-4.6V introduces a comprehensive suite of innovations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model architecture &amp;amp; long-sequence modeling:&lt;/strong&gt; GLM-4.6V is continually pre-trained on long-context image–text data, with visual–language compression alignment (inspired by Glyph) to better couple visual encoding with linguistic semantics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal world knowledge:&lt;/strong&gt; A &lt;strong&gt;billion-scale multimodal perception and world-knowledge corpus&lt;/strong&gt; was introduced to enhance both basic visual understanding and the accuracy and completeness of cross-modal QA.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic data &amp;amp; MCP extensions:&lt;/strong&gt; Through large-scale synthetic &lt;strong&gt;agentic training&lt;/strong&gt;, GLM-4.6V extends the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; with URL-based multimodal handling and end-to-end &lt;strong&gt;interleaved text–image output&lt;/strong&gt; using a “Draft → Image Selection → Final Polish” workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RL for multimodal agents:&lt;/strong&gt; Tool-calling behaviors are integrated into a unified &lt;strong&gt;RL objective&lt;/strong&gt;, and a &lt;strong&gt;visual feedback loop&lt;/strong&gt; (building on UI2Code^N) lets the model use rendered results to self-correct its code and actions, pushing toward self-improving multimodal agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get Started Immediately
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Explore:&lt;/strong&gt; Try &lt;a href="https://cloud.siliconflow.com/me/playground/chat/17885302910" rel="noopener noreferrer"&gt;GLM-4.6V&lt;/a&gt; in the SiliconFlow playground.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate:&lt;/strong&gt; Use our OpenAI-compatible API. Explore the full API specifications in the &lt;a href="https://docs.siliconflow.com/en/api-reference/chat-completions/chat-completions" rel="noopener noreferrer"&gt;SiliconFlow API documentation&lt;/a&gt;.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.siliconflow.com/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zai-org/GLM-4.6V&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://tse4.mm.bing.net/th/id/OIP.mDDGH4uc_a7tmLFLJvKXrQHaEo?rs=1&amp;amp;pid=ImgDetMain&amp;amp;o=7&amp;amp;rm=3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is in the picture?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &amp;lt;token&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;a href="https://siliconflow.com/contact" rel="noopener noreferrer"&gt;Business or Sales Inquiries →&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.com/invite/siliconflow" rel="noopener noreferrer"&gt;Join our Discord community now →&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/saborrolab" rel="noopener noreferrer"&gt;Follow us on X for the latest updates →&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.siliconflow.com/models" rel="noopener noreferrer"&gt;Explore all available models on SiliconFlow →&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>siliconflow</category>
      <category>opensource</category>
      <category>vlm</category>
      <category>ai</category>
    </item>
    <item>
      <title>GLM-4.7 Now on SiliconFlow: Advanced Coding, Reasoning &amp; Tool Use Capabilities</title>
      <dc:creator>SiliconFlow</dc:creator>
      <pubDate>Thu, 25 Dec 2025 13:30:00 +0000</pubDate>
      <link>https://dev.to/siliconflow/glm-47-now-on-siliconflow-advanced-coding-reasoning-tool-use-capabilities-41o8</link>
      <guid>https://dev.to/siliconflow/glm-47-now-on-siliconflow-advanced-coding-reasoning-tool-use-capabilities-41o8</guid>
<description>&lt;p&gt;We're excited to announce that &lt;strong&gt;&lt;a href="https://www.siliconflow.com/models/glm-4-7" rel="noopener noreferrer"&gt;GLM-4.7&lt;/a&gt;&lt;/strong&gt;, Z.ai's latest flagship model, is now available on SiliconFlow with Day 0 support. Compared with its predecessor &lt;a href="https://www.siliconflow.com/models/glm-4-6" rel="noopener noreferrer"&gt;GLM-4.6&lt;/a&gt;, this release brings significant advancements across coding, complex reasoning, and tool utilization — delivering performance that rivals or even outperforms industry leaders like &lt;a href="https://www.anthropic.com/news/claude-sonnet-4-5" rel="noopener noreferrer"&gt;Claude Sonnet 4.5&lt;/a&gt; and &lt;a href="https://openai.com/index/gpt-5-1/" rel="noopener noreferrer"&gt;GPT-5.1&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Currently, SiliconFlow supports the entire GLM model series, including &lt;a href="https://www.siliconflow.com/models/glm-4-5" rel="noopener noreferrer"&gt;GLM-4.5&lt;/a&gt;, &lt;a href="https://www.siliconflow.com/models/glm-4-5-air" rel="noopener noreferrer"&gt;GLM-4.5-Air&lt;/a&gt;, &lt;a href="https://www.siliconflow.com/models/glm-4-5v" rel="noopener noreferrer"&gt;GLM-4.5V&lt;/a&gt;, &lt;a href="https://www.siliconflow.com/models/glm-4-6" rel="noopener noreferrer"&gt;GLM-4.6&lt;/a&gt;, &lt;a href="https://www.siliconflow.com/models/glm-4-6v" rel="noopener noreferrer"&gt;GLM-4.6V&lt;/a&gt;, and now &lt;a href="https://www.siliconflow.com/models/glm-4-7" rel="noopener noreferrer"&gt;GLM-4.7&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  SiliconFlow Day 0 Support Includes:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Competitive Pricing:&lt;/strong&gt; GLM-4.7 $0.60/M tokens (input) and $2.20/M tokens (output)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;205K Context Window:&lt;/strong&gt; Tackle complex coding tasks, deep document analysis, and extended agentic workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic &amp;amp; OpenAI-Compatible APIs:&lt;/strong&gt; Deploy via SiliconFlow with seamless integration into &lt;a href="https://claude.com/product/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, &lt;a href="https://kilo.ai/" rel="noopener noreferrer"&gt;Kilo Code&lt;/a&gt;, &lt;a href="https://cline.bot/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt;, &lt;a href="https://roocode.com/" rel="noopener noreferrer"&gt;Roo Code&lt;/a&gt;, and other mainstream agent workflows, with significant improvements on complex tasks (a minimal connection sketch follows this list).&lt;/li&gt;
&lt;/ul&gt;
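
&lt;p&gt;As a minimal sketch of the Anthropic-compatible route using the official anthropic Python SDK: the base_url value below is an assumption, so take the exact endpoint from the SiliconFlow docs. Claude Code can typically be pointed at the same endpoint via its ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN environment variables.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch only: assumes SiliconFlow exposes an Anthropic-compatible
# Messages endpoint; confirm the base URL in the SiliconFlow docs.
from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.siliconflow.com",  # assumed endpoint, verify in docs
    api_key="&amp;lt;token&amp;gt;",                 # your SiliconFlow API key
)

message = client.messages.create(
    model="zai-org/GLM-4.7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Refactor this recursive parser to be iterative."}],
)
print(message.content[0].text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;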

&lt;h2&gt;
  
  
  What Makes GLM-4.7 Special
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GLM-4.7&lt;/strong&gt;, your new coding partner, ships with the following features:&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Coding Excellence
&lt;/h3&gt;

&lt;p&gt;GLM-4.7 sets a new standard for multilingual agentic coding and terminal-based tasks. Compared to its predecessor, the improvements are substantial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;73.8% (+5.8%)&lt;/strong&gt; on SWE-bench Verified&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;66.7% (+12.9%)&lt;/strong&gt; on SWE-bench Multilingual&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;41% (+16.5%)&lt;/strong&gt; on Terminal Bench 2.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model now supports "thinking before acting," enabling more reliable performance on complex tasks across mainstream agent frameworks, including Claude Code, Kilo Code, Cline, and Roo Code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vibe Coding
&lt;/h3&gt;

&lt;p&gt;GLM-4.7 takes a major leap forward in UI quality. It produces cleaner, more modern webpages and generates better-looking slides with more accurate layout and sizing. Whether you're prototyping interfaces or creating presentations, the visual output quality is noticeably enhanced.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advanced Tool Use
&lt;/h3&gt;

&lt;p&gt;Tool utilization has been significantly enhanced. On multi-step benchmarks like τ²-Bench and web browsing tasks via BrowseComp, GLM-4.7 surpasses both Claude Sonnet 4.5 and GPT-5.1 High, demonstrating superior capability for complex, real-world workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complex Reasoning Capabilities
&lt;/h3&gt;

&lt;p&gt;Mathematical and reasoning abilities see a substantial boost, with GLM-4.7 achieving &lt;strong&gt;42.8% (+12.4%)&lt;/strong&gt; on the HLE (Humanity's Last Exam) benchmark compared to GLM-4.6. You can also see significant improvements in many other scenarios, such as chat, creative writing, and role-play.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsiliconflow.feishu.cn%2Fspace%2Fapi%2Fbox%2Fstream%2Fdownload%2Fasynccode%2F%3Fcode%3DOGVjN2M5MTI1Y2YyOTM5NzY3Y2NjYjc1OWRlOTM1ZTBfaEh0c1BRbENJT25lbWtCUVNTaEtJYVV1MXRJWTNtRHBfVG9rZW46U0JlR2JuU3g2b3RCVkd4bVREZGN0aWo5bmxlXzE3NjY2MzE2MjY6MTc2NjYzNTIyNl9WNA" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsiliconflow.feishu.cn%2Fspace%2Fapi%2Fbox%2Fstream%2Fdownload%2Fasynccode%2F%3Fcode%3DOGVjN2M5MTI1Y2YyOTM5NzY3Y2NjYjc1OWRlOTM1ZTBfaEh0c1BRbENJT25lbWtCUVNTaEtJYVV1MXRJWTNtRHBfVG9rZW46U0JlR2JuU3g2b3RCVkd4bVREZGN0aWo5bmxlXzE3NjY2MzE2MjY6MTc2NjYzNTIyNl9WNA" width="800" height="565"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Whether it's coding, creativity, or complex reasoning — get started now to see what GLM-4.7 brings to your workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started Immediately
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Explore:&lt;/strong&gt; Try &lt;a href="https://www.siliconflow.com/models/glm-4-7" rel="noopener noreferrer"&gt;GLM-4.7&lt;/a&gt; in the SiliconFlow playground.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate:&lt;/strong&gt; Use our OpenAI/Anthropic-compatible API. Explore the full API specifications in the &lt;a href="https://docs.siliconflow.com/en/api-reference/chat-completions/chat-completions" rel="noopener noreferrer"&gt;SiliconFlow API documentation&lt;/a&gt;.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.siliconflow.com/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zai-org/GLM-4.7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather like in America?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enable_thinking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.95&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &amp;lt;token&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;a href="https://siliconflow.com/contact" rel="noopener noreferrer"&gt;Business or Sales Inquiries →&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.com/invite/siliconflow" rel="noopener noreferrer"&gt;Join our Discord community now →&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/saborrolab" rel="noopener noreferrer"&gt;Follow us on X for the latest updates →&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.siliconflow.com/models" rel="noopener noreferrer"&gt;Explore all available models on SiliconFlow →&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>glm</category>
      <category>siliconflow</category>
      <category>coding</category>
      <category>opensource</category>
    </item>
    <item>
      <title>OpenAI's gpt-oss Now Live on SiliconFlow: Designed for Agentic Workflows, Advanced Reasoning and Tool Use</title>
      <dc:creator>SiliconFlow</dc:creator>
      <pubDate>Thu, 09 Oct 2025 05:24:46 +0000</pubDate>
      <link>https://dev.to/siliconflow/openais-gpt-oss-now-live-on-siliconflow-designed-for-agentic-workflows-advanced-reasoning-and-4bo9</link>
      <guid>https://dev.to/siliconflow/openais-gpt-oss-now-live-on-siliconflow-designed-for-agentic-workflows-advanced-reasoning-and-4bo9</guid>
      <description>&lt;p&gt;SiliconFlow is excited to announce the launch of &lt;a href="https://www.siliconflow.com/models/openai-gpt-oss-120b?open_in_browser=true" rel="noopener noreferrer"&gt;gpt-oss-120B&lt;/a&gt; and &lt;a href="https://www.siliconflow.com/models/openai-gpt-oss-20b" rel="noopener noreferrer"&gt;gpt-oss-20B&lt;/a&gt; — state-of-the-art open-weight language models now available on our platform. Built on a MoE architecture, gpt-oss-120B has 117 billion parameters with 5.1 billion activated per token, while gpt-oss-20B has 21 billion parameters, activating 3.6 billion per token.&lt;/p&gt;

&lt;p&gt;Trained with reinforcement learning techniques inspired by &lt;a href="https://openai.com/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;'s advanced internal models (including &lt;a href="https://openai.com/index/introducing-o3-and-o4-mini/" rel="noopener noreferrer"&gt;o3&lt;/a&gt;), gpt-oss is built for agentic workflows with exceptional instruction following, tool use such as web search and Python code execution, and configurable reasoning effort, enabling both complex reasoning and lower-latency outputs.&lt;/p&gt;

&lt;p&gt;Whether you're building complex reasoning pipelines, enabling sophisticated tool use or deploying large-scale AI services, gpt-oss on SiliconFlow delivers the flexibility and power to accelerate innovation — backed by our fully optimized deployment and production-ready API service.&lt;/p&gt;

&lt;p&gt;With SiliconFlow's gpt-oss API, you can expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Cost-Effective Pricing: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;gpt-oss-120b $0.09/M tokens (input) and $0.45/M tokens (output); &lt;/li&gt;
&lt;li&gt;gpt-oss-20b $0.04/M tokens (input) and $0.18/M tokens (output).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Extended Context Window: 131K context window for complex tasks.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Capabilities &amp;amp; Benchmark Performance
&lt;/h2&gt;

&lt;p&gt;OpenAI's gpt-oss models on SiliconFlow offer versatile capabilities to adapt to a wide range of AI tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs (see the sketch after this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Full chain-of-thought: Provides complete access to the model's reasoning process, facilitating easier debugging and greater trust in outputs. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Agentic capabilities: Use the models' native capabilities for function calling, &lt;a href="https://github.com/openai/gpt-oss#browser" rel="noopener noreferrer"&gt;web browsing&lt;/a&gt;, &lt;a href="https://github.com/openai/gpt-oss#python" rel="noopener noreferrer"&gt;Python code execution&lt;/a&gt; and Structured Outputs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
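
&lt;p&gt;As a sketch of the reasoning-effort control mentioned above: a widely used convention for gpt-oss is a "Reasoning: low|medium|high" directive in the system prompt. Treat the exact mechanism on SiliconFlow as something to confirm in the API docs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

url = "https://api.siliconflow.com/v1/chat/completions"

# Sketch: gpt-oss commonly reads its reasoning effort from the system
# prompt; verify the exact convention in the SiliconFlow documentation.
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
    "max_tokens": 2048
}
headers = {
    "Authorization": "Bearer &amp;lt;token&amp;gt;",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print(response.json()["choices"][0]["message"]["content"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;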

&lt;p&gt;Also, gpt-oss-120b and gpt-oss-20b have been evaluated across standard academic benchmarks to measure their capabilities in coding, competition math, health, and agentic tool use, compared with other OpenAI reasoning models, including o3, o3‑mini, and o4-mini:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;gpt-oss-120b outperforms OpenAI o3‑mini and matches or exceeds OpenAI o4-mini on competition coding (Codeforces), general problem solving (MMLU and HLE), and tool calling (TauBench). It even surpasses o4-mini on health-related queries (HealthBench) and competition mathematics (AIME 2024 &amp;amp; 2025).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;gpt-oss-20b matches or exceeds OpenAI o3‑mini on these same evals, despite its small size, even outperforming it on competition mathematics and health.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx7mkqap5mdc76bm9qyny.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx7mkqap5mdc76bm9qyny.png" alt=" " width="800" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With these features and competitive benchmark performance, gpt-oss offers developers an optimal balance of capability and cost-effectiveness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Highlights of gpt-oss
&lt;/h2&gt;

&lt;p&gt;Building on these capabilities and benchmark results, the technical foundation of gpt-oss combines cutting-edge architecture with advanced training methodologies to deliver high performance:&lt;/p&gt;

&lt;p&gt;Advanced Training &amp;amp; Architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Trained using OpenAI's most advanced pre-training and post-training techniques, emphasizing reasoning, efficiency and real-world usability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Built on a Transformer backbone with mixture-of-experts (MoE), gpt-oss-120b activates 5.1B parameters per token (117B total), and gpt-oss-20b activates 3.6B (21B total). &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The models employ alternating dense and locally banded sparse attention, grouped multi-query attention (group size 8), and Rotary Positional Embedding (RoPE), supporting context lengths up to 128k tokens.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Training data focuses on English text in STEM, coding and general knowledge, tokenized with the open-sourced o200k_harmony tokenizer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Post-Training &amp;amp; Reasoning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Following pre-training, the models undergo supervised fine-tuning and a high-compute reinforcement learning stage to align with the OpenAI Model Spec.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This process enhances chain-of-thought (CoT) reasoning and tool use capabilities, supporting configurable reasoning efforts — low, medium, and high — allowing developers to balance latency and performance via system prompts.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get Started Immediately
&lt;/h2&gt;

&lt;p&gt;Explore: Try gpt-oss in the SiliconFlow playground.&lt;/p&gt;

&lt;p&gt;Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the SiliconFlow API documentation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

url = "https://api.siliconflow.com/v1/chat/completions"

payload = {
    "model": "openai/gpt-oss-20b",
    "max_tokens": 512,
    "enable_thinking": True,
    "thinking_budget": 4096,
    "min_p": 0.05,
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50,
    "frequency_penalty": 0.5,
    "n": 1,
    "messages": [
        {
            "content": "how are you today",
            "role": "user"
        }
    ]
}
headers = {
    "Authorization": "Bearer &amp;lt;token&amp;gt;",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start building with gpt-oss via SiliconFlow's high-performance API today!&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Wan 2.2 Series Now Available on SiliconFlow: Generate Stable, Realistic and Cinematic Videos</title>
      <dc:creator>SiliconFlow</dc:creator>
      <pubDate>Mon, 01 Sep 2025 07:30:33 +0000</pubDate>
      <link>https://dev.to/siliconflow/wan-22-series-now-available-on-siliconflowgenerate-stable-realistic-and-cinematic-videos-4e3h</link>
      <guid>https://dev.to/siliconflow/wan-22-series-now-available-on-siliconflowgenerate-stable-realistic-and-cinematic-videos-4e3h</guid>
      <description>&lt;p&gt;SiliconFlow is excited to bring the &lt;a href="https://cloud.siliconflow.com/models?types=to-video" rel="noopener noreferrer"&gt;Wan 2.2 series&lt;/a&gt; to our platform — a major upgrade to Wan's visual generative models, featuring significantly improved performance and superior visual quality. We provide two models from this series: &lt;a href="https://www.siliconflow.com/models/wan-ai-wan2-2-t2v-a14b" rel="noopener noreferrer"&gt;Wan2.2-T2V-A14B&lt;/a&gt; and &lt;a href="https://www.siliconflow.com/models/wan-ai-wan2-2-i2v-a14b" rel="noopener noreferrer"&gt;Wan2.2-I2V-A14B&lt;/a&gt;. Designed to power the next wave of creative AI applications, from concept art to commercial visuals, these models enable users to generate more detailed, realistic, and imaginative content than ever before.&lt;/p&gt;

&lt;p&gt;In addition to Wan 2.2, SiliconFlow also supports the previously released &lt;a href="https://cloud.siliconflow.com/models?types=to-video" rel="noopener noreferrer"&gt;Wan 2.1 series&lt;/a&gt;, giving users the flexibility to select the most suitable model for their specific needs.&lt;/p&gt;

&lt;p&gt;With SiliconFlow's Wan 2.2 API, you can expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cost-Effective Pricing: Wan2.2-T2V-A14B $0.29/video; Wan2.2-I2V-A14B $0.29/video.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Resolution Support: Wan2.2-T2V-A14B 480P &amp;amp; 720P (5s); Wan2.2-I2V-A14B 480P &amp;amp; 720P.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Model Highlights in WAN 2.2 Series
&lt;/h2&gt;

&lt;p&gt;SiliconFlow offers two powerful models from the WAN 2.2 series. You can select the appropriate model based on your specific task requirements and desired functionality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;T2V-A14B:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delivers high-quality video generation.&lt;/li&gt;
&lt;li&gt;Outperforms leading commercial models on Wan-Bench 2.0.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;I2V-A14B:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Produces more stable, coherent videos.&lt;/li&gt;
&lt;li&gt;Reduces unrealistic camera movements.&lt;/li&gt;
&lt;li&gt;Enhanced support for diverse stylized scenes.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;These improvements in WAN 2.2 are further validated through &lt;a href="https://github.com/Wan-Video/Wan2.2" rel="noopener noreferrer"&gt;benchmark performance&lt;/a&gt;. In comparative tests on WAN-Bench 2.0, WAN2.2-T2V-A14B outperforms leading closed-source commercial models such as &lt;a href="https://openai.com/sora/" rel="noopener noreferrer"&gt;Sora&lt;/a&gt; and &lt;a href="https://www.hailuo2.com/blog" rel="noopener noreferrer"&gt;Hailuo 02&lt;/a&gt; across several critical dimensions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjbkwuye4kqgynp2mqam.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjbkwuye4kqgynp2mqam.png" alt=" " width="800" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These superior results stem from WAN 2.2's key technical innovations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Effective MoE Architecture: WAN 2.2 introduces a Mixture-of-Experts (MoE) architecture into video diffusion models, separating the denoising process across timesteps with specialized expert models to enlarge overall model capacity while maintaining computational cost.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cinematic-level Aesthetics: The model incorporates meticulously curated aesthetic data with detailed labels for lighting, composition, contrast, and color tone, enabling precise and controllable cinematic style generation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Complex Motion Generation: Trained on significantly larger data with +65.6% more images and +83.2% more videos compared to WAN 2.1, enhancing generalization across motions, semantics, and aesthetics.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With Wan 2.2 series integrated into SiliconFlow, you can now benefit from these improvements in your own projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-world Performance On SiliconFlow
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Cinematic Vision Control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flfe1tjkiu2hv887xt8k3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flfe1tjkiu2hv887xt8k3.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;prompt: A purely visual and atmospheric video piece focusing on the interplay of light and shadow, with a corn train as the central motif. Imagine a stage bathed in dramatic, warm spotlights, where a corn train, rendered as a stark silhouette, moves slowly across space. The video explores the dynamic interplay of light and shadow cast by the train, creating abstract patterns, shapes, and illusions that dance across the stage. The soundtrack should be ambient and minimalist, enhancing the atmospheric and abstract nature of the piece.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sweeping Motion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68cfsjr8yuw0edxykq1w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68cfsjr8yuw0edxykq1w.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;prompt: A dynamic group of diverse hip-hop dancers explodes across a vast stage bathed in vibrant neon lights, casting side-lit halos around their silhouettes. Wide cinematic shot captures synchronized movements, vibrant energy, and youthful expressions. Fast-paced camera work mirrors the beat, showcasing intricate footwork, explosive power, and the unity of the ensemble.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Precise Prompt Following&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkz2e90enrtsf607ouok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkz2e90enrtsf607ouok.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;prompt: Watercolor style, the wet suminagashi inks slowly spread into the shape of an island on paper, with the edges continuously blending into delicate textural variations. A tiny paper boat floats in the direction of the water flows towards the still-wet areas, creating subtle ripples around it. Centered composition with soft natural light pouring in from the side, revealing subtle color gradations and a sense of movement.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started Immediately
&lt;/h2&gt;

&lt;p&gt;Explore: Try &lt;a href="https://cloud.siliconflow.com/models?types=to-video" rel="noopener noreferrer"&gt;Wan 2.2 series&lt;/a&gt; in the &lt;a href="https://cloud.siliconflow.com/playground/video/17885302862" rel="noopener noreferrer"&gt;SiliconFlow playground&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the &lt;a href="https://docs.siliconflow.com/en/api-reference/videos/videos_submit" rel="noopener noreferrer"&gt;SiliconFlow API documentation&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

url = "https://api.siliconflow.com/v1/video/submit"

payload = {
    "model": "Wan-AI/Wan2.2-T2V-A14B",
    "prompt": "an island near sea, with seagulls, moon shining over the sea, light house, boats in the background, fish flying over the sea",
    "image_size": "960x960"
}
headers = {
    "Authorization": "Bearer &amp;lt;token&amp;gt;",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
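
&lt;p&gt;Video generation is asynchronous: the submit call above returns a request ID rather than a finished video. The following polling sketch is a rough illustration; the status path, field names (requestId, status, results), and status values are assumptions to confirm in the SiliconFlow API documentation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time
import requests

# Continues the submit example above; endpoint and field names here are
# assumptions -- check the SiliconFlow video API docs for the real ones.
request_id = "&amp;lt;requestId from the submit response&amp;gt;"

status_url = "https://api.siliconflow.com/v1/video/status"  # assumed path
headers = {
    "Authorization": "Bearer &amp;lt;token&amp;gt;",
    "Content-Type": "application/json"
}

while True:
    status = requests.post(status_url, json={"requestId": request_id},
                           headers=headers).json()
    state = status.get("status")
    if state == "Succeed":               # assumed terminal status value
        print(status.get("results"))     # where the generated video lives
        break
    if state == "Failed":                # assumed failure status value
        raise RuntimeError(status.get("reason", "video generation failed"))
    time.sleep(10)                       # wait before polling again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;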



&lt;p&gt;Explore Wan2.2 on SiliconFlow today!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.siliconflow.com/contact" rel="noopener noreferrer"&gt;Business or Sales Inquiries →&lt;/a&gt;&lt;br&gt;
&lt;a href="https://discord.com/invite/7Ey3dVNFpT" rel="noopener noreferrer"&gt;Join our Discord community now →&lt;/a&gt;&lt;br&gt;
&lt;a href="https://x.com/SiliconFlowAI" rel="noopener noreferrer"&gt;Follow us on X for the latest updates →&lt;/a&gt;&lt;br&gt;
&lt;a href="https://cloud.siliconflow.com/models" rel="noopener noreferrer"&gt;Explore all available models on SiliconFlow →&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>GLM-4.5V: The World's Leading Open-Source Vision Reasoning Model Now on SiliconFlow</title>
      <dc:creator>SiliconFlow</dc:creator>
      <pubDate>Mon, 25 Aug 2025 04:13:56 +0000</pubDate>
      <link>https://dev.to/siliconflow/glm-45v-the-worlds-leading-open-source-vision-reasoning-model-now-on-siliconflow-pl8</link>
      <guid>https://dev.to/siliconflow/glm-45v-the-worlds-leading-open-source-vision-reasoning-model-now-on-siliconflow-pl8</guid>
      <description>&lt;p&gt;Today, we are excited to introduce that &lt;a href="https://www.siliconflow.com/models/zai-org-glm-4-5v" rel="noopener noreferrer"&gt;GLM-4.5V&lt;/a&gt; — the world’s best-performing open-source 100B-scale vision reasoning model — is now available on SiliconFlow. Built upon Z.ai's flagship text foundation model &lt;a href="https://www.siliconflow.com/models/zai-org-glm-4-5-air" rel="noopener noreferrer"&gt;GLM-4.5-Air&lt;/a&gt;, GLM-4.5V is designed to empower complex problem solving, long-context understanding and multimodal agents. Following the technical approach of GLM-4.1V-Thinking, it also emphasizes advancing multimodal reasoning and practical real-world applications.&lt;/p&gt;

&lt;p&gt;Whether it's accurately interpreting images and videos, extracting insights from complex documents, or autonomously interacting with graphical user interfaces through intelligent agents, GLM-4.5V delivers robust performance.&lt;/p&gt;

&lt;p&gt;With SiliconFlow's GLM-4.5V API, you can expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cost-Effective Pricing: GLM-4.5V $0.14/M tokens (input) and $0.86/M tokens (output).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Context Length: 66K-token multimodal context window.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Native support: Tool Use and Image Input.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Capabilities &amp;amp; Benchmark Performance
&lt;/h2&gt;

&lt;p&gt;Through efficient hybrid training, it can handle diverse types of visual content, enabling comprehensive vision reasoning, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Image Reasoning: Scene understanding, complex multi-image analysis, spatial recognition.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Video Understanding: Long video segmentation and event recognition.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GUI Tasks: Screen reading, icon recognition, desktop operation assistance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Complex Chart &amp;amp; Long Document Parsing: Research report analysis, information extraction.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Grounding: Precise visual element localization.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model also introduces a Thinking Mode switch, allowing users to balance between quick responses and deep reasoning.&lt;/p&gt;

&lt;p&gt;Demonstrating its strong capabilities, GLM-4.5V achieves state-of-the-art (SOTA) performance among models of the same scale across &lt;a href="https://github.com/zai-org/GLM-V/tree/main" rel="noopener noreferrer"&gt;42 public vision-language benchmarks&lt;/a&gt;, confirming its leading position in the field.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs7iauucjq25ud61yqhux.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs7iauucjq25ud61yqhux.png" alt=" " width="800" height="783"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Highlights
&lt;/h2&gt;

&lt;p&gt;This model features advanced multimodal long-context processing capabilities with multiple technical innovations to enhance image and video processing performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;66K multimodal long-context processing: Supports both image and video inputs and leverages 3D convolution to enhance video processing efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bicubic interpolation mechanism: Improves robustness and capability in handling high-resolution and extreme aspect ratio images.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;3D Rotated Positional Encoding (3D-RoPE): Strengthens the model's perception and reasoning of three-dimensional spatial relationships in multimodal information.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GLM-4.5V also follows a three-stage training strategy: pre-training, supervised fine-tuning (SFT) and reinforcement learning (RL):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Pre-training Stage: Large-scale interleaved multimodal corpora and long-context data are used to enhance the model's ability to process complex image–text and video content.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SFT Stage: Explicit chain-of-thought formatted training samples are introduced to improve GLM-4.5V's causal reasoning and multimodal understanding capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RL Stage: Multi-domain multimodal curriculum reinforcement learning is applied by building a multi-domain reward system that combines verifiable reward-based reinforcement learning (RLVR) and reinforcement learning from human feedback (RLHF), enabling comprehensive optimization in STEM problems, multimodal localization and agentic tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fch8vmiui1plkmgtezpcr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fch8vmiui1plkmgtezpcr.png" alt=" " width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-world Performance on SiliconFlow
&lt;/h2&gt;

&lt;p&gt;When provided with an e-commerce page displaying multiple products, GLM-4.5V can identify both discounted and original prices in the image, then accurately calculate discount rates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag0x07delftselzyxelm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag0x07delftselzyxelm.png" alt=" " width="800" height="547"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Developers' feedback on GLM-4.6V from our community has been very positive.&lt;/p&gt;

&lt;p&gt;Now join the &lt;a href="https://discord.com/invite/7Ey3dVNFpT" rel="noopener noreferrer"&gt;community&lt;/a&gt; to explore more use cases, share your results and get first-hand support!&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started Immediately
&lt;/h2&gt;

&lt;p&gt;Explore: Try &lt;a href="https://cloud.siliconflow.com/playground/chat/17885302860" rel="noopener noreferrer"&gt;GLM-4.6V&lt;/a&gt; in the SiliconFlow playground.&lt;/p&gt;

&lt;p&gt;Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the &lt;a href="https://docs.siliconflow.com/en/api-reference/chat-completions/chat-completions" rel="noopener noreferrer"&gt;SiliconFlow API documentation&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

url = "https://api.siliconflow.com/v1/chat/completions"

payload = {
    "model": "zai-org/GLM-4.6V",
    "max_tokens": 512,           # cap on generated tokens
    "enable_thinking": True,     # enable thinking mode for harder queries
    "thinking_budget": 4096,     # max tokens the model may spend reasoning
    "min_p": 0.05,               # sampling parameters
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50,
    "frequency_penalty": 0.5,
    "n": 1,                      # number of completions to return
    "messages": [
        {
            "role": "user",
            "content": "how are you"
        }
    ]
}
headers = {
    "Authorization": "Bearer &amp;lt;token&amp;gt;",  # your SiliconFlow API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ready to scale? &lt;a href="https://www.siliconflow.com/contact" rel="noopener noreferrer"&gt;Contact us&lt;/a&gt; for enterprise deployments, custom integrations and volume pricing for GLM-4.6V.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>productivity</category>
      <category>devops</category>
    </item>
    <item>
      <title>Step3 Now Live on SiliconFlow: The Leading Open-source Multimodal Reasoning Model</title>
      <dc:creator>SiliconFlow</dc:creator>
      <pubDate>Fri, 22 Aug 2025 08:11:31 +0000</pubDate>
      <link>https://dev.to/siliconflow/step3-now-live-on-siliconflow-the-leading-open-source-multimodal-reasoning-model-34fk</link>
      <guid>https://dev.to/siliconflow/step3-now-live-on-siliconflow-the-leading-open-source-multimodal-reasoning-model-34fk</guid>
      <description>&lt;p&gt;&lt;a href="https://www.siliconflow.com/models/stepfun-ai-step3" rel="noopener noreferrer"&gt;Step3&lt;/a&gt;, Stepfun's latest cutting-edge multimodal reasoning model is now available on &lt;a href="http://www.siliconflow.com/" rel="noopener noreferrer"&gt;SiliconFlow&lt;/a&gt;. Built on a large-scale MoE architecture with 321B total parameters and 38B active parameters, the model delivers exceptional performance in vision-language reasoning. It offers optimized decoding efficiency for enterprise and developer needs, enabling grounded multimodal reasoning with accurate visual interpretation and reduced hallucination.&lt;/p&gt;

&lt;p&gt;With SiliconFlow's Step3 API, you can expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cost-Effective Pricing: Step3 $0.57/M tokens (input) and $1.42/M tokens (output).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Context Length: Supports a 64K context window.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Native support for Tool Use / Function Calling (a hedged request sketch follows this list).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
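
&lt;p&gt;A tool-use request might look like the sketch below, assuming the standard OpenAI-compatible "tools" parameter; the get_weather function is a made-up example, so consult the SiliconFlow API documentation for the exact function-calling semantics.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

url = "https://api.siliconflow.com/v1/chat/completions"
payload = {
    "model": "stepfun-ai/step3",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]
            }
        }
    }]
}
headers = {
    "Authorization": "Bearer &amp;lt;token&amp;gt;",
    "Content-Type": "application/json"
}
response = requests.post(url, json=payload, headers=headers)
# If the model decides to call the tool, the call appears under
# message.tool_calls in the first choice.
print(response.json()["choices"][0]["message"].get("tool_calls"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;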

&lt;h2&gt;
  
  
  Key Capabilities &amp;amp; Benchmark Performance
&lt;/h2&gt;

&lt;p&gt;Step3 features powerful visual perception and advanced reasoning capabilities, enabling accurate cross-domain understanding, multimodal mathematical reasoning and real-world grounded visual understanding tasks.&lt;/p&gt;

&lt;p&gt;These capabilities are demonstrated through strong performance across industry-standard &lt;a href="https://github.com/stepfun-ai/Step3" rel="noopener noreferrer"&gt;benchmarks&lt;/a&gt;, highlighting its effectiveness in tasks requiring both visual understanding and reasoning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;VLM Benchmark Performance: Step3 achieves the highest MMMU score (74.2) among open-source VLMs, surpassing proprietary models like Gemini 2.5 Flash (73.2). It scores 64.2 on HallusionBench, outperforming leading proprietary models including Claude Opus 4 (59.9), Claude Sonnet 4 (57.0) and o3 (60.1), demonstrating superior performance in complex visual reasoning, factuality and cross-domain comprehension.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LLM Benchmark Performance: Step3 maintains competitive results with 82.9 on AIME25, 73.0 on GPQA-Diamond and 67.1 on LiveCodeBench, showcasing strong capabilities in mathematical reasoning, graduate-level scientific reasoning and code generation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition to its top-tier performance, Step3 also comes at a lower cost — making it a budget-friendly choice for your workload.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzu18m55l2grjiydf1w4a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzu18m55l2grjiydf1w4a.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Highlights
&lt;/h2&gt;

&lt;p&gt;Step3 addresses key challenges in multimodal alignment, decoding costs and inference efficiency through full-stack optimizations across model architecture design, training pipeline and deployment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Pretrain Model Architecture: Step3 employs a novel Multi-Matrix Factorization Attention (MFA) mechanism that reduces KV cache overhead and computational costs while maintaining model capabilities and inference efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Multimodal Capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step3 uses a 5B Vision Encoder with dual-layer 2D convolution downsampling, reducing visual tokens to 1/16 of the original count for improved efficiency (a toy sketch of this arithmetic follows the list);&lt;/li&gt;
&lt;li&gt;Training adopts a two-stage approach: first enhancing encoder perception, then freezing the vision encoder to optimize backbone and connector layers.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;AFD System Architecture: Step3 implements Attention-FFN Disaggregation (AFD) that decouples computational tasks into specialized subsystems with multi-stage pipeline scheduling, effectively improving overall throughput efficiency.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;
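
&lt;p&gt;The 1/16 token reduction from dual-layer 2D convolution downsampling is easy to verify with a tiny PyTorch sketch; the channel sizes below are placeholders, not Step3's real configuration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import torch
import torch.nn as nn

# Two stride-2 convolutions halve each spatial axis twice, so the token grid
# shrinks by 4x per dimension, i.e. to 1/16 of the original token count.
downsample = nn.Sequential(
    nn.Conv2d(1024, 1024, kernel_size=2, stride=2),
    nn.Conv2d(1024, 2048, kernel_size=2, stride=2),
)
feats = torch.randn(1, 1024, 64, 64)      # a 64 x 64 grid = 4096 visual tokens
out = downsample(feats)                   # shape: (1, 2048, 16, 16)
print(out.shape, (64 * 64) // (16 * 16))  # 4096 / 256 = 16x fewer tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;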

&lt;h2&gt;
  
  
  Real-world Performance on SiliconFlow
&lt;/h2&gt;

&lt;p&gt;Upload a restaurant receipt to Step3 on SiliconFlow to calculate the meal's calories. It accurately identifies food items, parses complex descriptions, categorizes dishes, matches them with calorie values and estimates total calories (e.g., 900-1330 kcal).&lt;/p&gt;

&lt;p&gt;This process forms a complete closed loop, from raw data to concept recognition, calculation and final explanation, with clear and consistent logic at every stage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10ofsyvf4o478uq1t2oc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10ofsyvf4o478uq1t2oc.png" alt=" " width="800" height="542"&gt;&lt;/a&gt;&lt;/p&gt;
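
&lt;p&gt;To reproduce this kind of receipt analysis, a local photo can be sent as a base64 data URL, a convention commonly accepted by OpenAI-compatible vision endpoints; confirm support in the SiliconFlow docs. The file path below is a placeholder.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import base64
import requests

with open("receipt.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "stepfun-ai/step3",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            {"type": "text",
             "text": "Identify each dish on this receipt and estimate "
                     "the total calories."}
        ]
    }]
}
headers = {
    "Authorization": "Bearer &amp;lt;token&amp;gt;",
    "Content-Type": "application/json"
}
print(requests.post("https://api.siliconflow.com/v1/chat/completions",
                    json=payload, headers=headers).json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;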

&lt;h2&gt;
  
  
  Get Started Immediately
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Explore: Try Step3 in the SiliconFlow playground.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the SiliconFlow API documentation.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

url = "https://api.siliconflow.com/v1/chat/completions"

payload = {
    "model": "stepfun-ai/step3",
    "max_tokens": 65536,    # cap on generated tokens
    "min_p": 0.05,          # sampling parameters
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50,
    "messages": [
        {
            "role": "user",
            "content": "tell me a story"
        }
    ]
}
headers = {
    "Authorization": "Bearer &amp;lt;token&amp;gt;",  # your SiliconFlow API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unlock Visual AI Power! Try Step3 now on SiliconFlow!&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>webdev</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Think Deeper, Act Faster: Qwen3-235B-A22B-Thinking-2507 Now Available on SiliconFlow</title>
      <dc:creator>SiliconFlow</dc:creator>
      <pubDate>Fri, 01 Aug 2025 05:39:36 +0000</pubDate>
      <link>https://dev.to/siliconflow/think-deeper-act-faster-qwen3-235b-a22b-thinking-2507-now-available-on-siliconflow-4f2l</link>
      <guid>https://dev.to/siliconflow/think-deeper-act-faster-qwen3-235b-a22b-thinking-2507-now-available-on-siliconflow-4f2l</guid>
      <description>&lt;p&gt;With &lt;a href="https://www.siliconflow.com/models/qwen-qwen3-235b-a22b-instruct-2507" rel="noopener noreferrer"&gt;&lt;strong&gt;Qwen3-235B-A22B-Instruct-2507 (Non-Thinking mode)&lt;/strong&gt;&lt;/a&gt; already demonstrating exceptional performance on SiliconFlow, today we're excited to bring the next breakthrough to our &lt;a href="https://www.siliconflow.com/models" rel="noopener noreferrer"&gt;model catalog&lt;/a&gt;: &lt;a href="https://www.siliconflow.com/models/qwen-qwen3-235b-a22b-thinking-2507" rel="noopener noreferrer"&gt;&lt;strong&gt;Qwen3-235B-A22B-Thinking-2507&lt;/strong&gt;&lt;/a&gt;. This newly open-source model delivers exceptional advances in both reasoning performance and general intelligence, matching the capabilities of leading proprietary models such as &lt;strong&gt;Gemini-2.5 Pro&lt;/strong&gt; and &lt;strong&gt;O4-mini&lt;/strong&gt; while establishing new performance benchmarks for open-source AI.&lt;/p&gt;

&lt;p&gt;From advanced research analysis to complex code generation, developers now have access to unprecedented reasoning performance for sophisticated problem-solving tasks.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;SiliconFlow's Qwen3-235B-A22B-Thinking-2507 API&lt;/strong&gt;, you can expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost-Effective Pricing&lt;/strong&gt;: $0.35/M tokens (input) and $1.42/M tokens (output).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extended Context Window&lt;/strong&gt;: 256K context window for complex tasks.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Key Capabilities &amp;amp; Benchmark Performance
&lt;/h1&gt;

&lt;p&gt;Compared to previous open-source models like &lt;a href="https://www.siliconflow.com/models/deepseek-ai-deepseek-r1" rel="noopener noreferrer"&gt;&lt;strong&gt;DeepSeek-R1-0528&lt;/strong&gt;&lt;/a&gt;, &lt;strong&gt;Qwen3-235B-A22B-Thinking-2507&lt;/strong&gt; demonstrates significant improvements in practical capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SOTA Reasoning Performance&lt;/strong&gt;: Significantly improved performance on logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise, achieving state-of-the-art results among open-source thinking models.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced General Capabilities&lt;/strong&gt;: Better instruction following, tool usage, text generation, and alignment with human preferences.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extended Long-context Understanding&lt;/strong&gt;: Enhanced &lt;strong&gt;256K long-context&lt;/strong&gt; understanding capabilities.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These capabilities are reflected in the model’s strong and balanced performance across &lt;a href="https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507" rel="noopener noreferrer"&gt;multiple industry-standard benchmarks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It ranks &lt;strong&gt;first among all compared models&lt;/strong&gt; on &lt;strong&gt;LiveCodeBench v6&lt;/strong&gt; and &lt;strong&gt;Arena-Hard v2&lt;/strong&gt;, demonstrating superior coding ability and alignment with human preferences. On &lt;strong&gt;AIME25&lt;/strong&gt;, it achieves &lt;strong&gt;92.3&lt;/strong&gt; — outperforming &lt;strong&gt;Gemini-2.5 Pro (88.0)&lt;/strong&gt; and matching &lt;strong&gt;O4-mini (92.7)&lt;/strong&gt;—showcasing advanced mathematical reasoning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Qwen3-235B-A22B-Thinking-2507&lt;/th&gt;
&lt;th&gt;Gemini-2.5 Pro&lt;/th&gt;
&lt;th&gt;O4-mini&lt;/th&gt;
&lt;th&gt;DeepSeek-R1-0528&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPQA (General Knowledge)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;81.1&lt;/td&gt;
&lt;td&gt;86.4&lt;/td&gt;
&lt;td&gt;81.4&lt;/td&gt;
&lt;td&gt;81&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AIME25 (Math Reasoning)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🥇 &lt;strong&gt;92.3&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;88.0&lt;/td&gt;
&lt;td&gt;92.7&lt;/td&gt;
&lt;td&gt;87.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LiveCodeBench v6 (Code Gen)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🥇 &lt;strong&gt;74.1&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;72.5&lt;/td&gt;
&lt;td&gt;71.8&lt;/td&gt;
&lt;td&gt;68.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HLE (Human Judgment Sim)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;18.2&lt;/td&gt;
&lt;td&gt;21.6&lt;/td&gt;
&lt;td&gt;18.1&lt;/td&gt;
&lt;td&gt;17.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Arena-Hard v2 (Alignment)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🥇 &lt;strong&gt;79.7&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;72.5&lt;/td&gt;
&lt;td&gt;59.3&lt;/td&gt;
&lt;td&gt;72.2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These results demonstrate that &lt;strong&gt;Qwen3-235B-A22B-Thinking-2507&lt;/strong&gt; is one of the most capable open-source models to date, with competitive performance even against leading proprietary systems.&lt;/p&gt;

&lt;h1&gt;
  
  
  Real Application Scenarios
&lt;/h1&gt;

&lt;p&gt;Available now on &lt;strong&gt;SiliconFlow&lt;/strong&gt;, &lt;a href="https://www.siliconflow.com/models/qwen-qwen3-235b-a22b-thinking-2507" rel="noopener noreferrer"&gt;&lt;strong&gt;Qwen3-235B-A22B-Thinking-2507&lt;/strong&gt;&lt;/a&gt; features enhanced thinking capabilities with long-context understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Healthcare Intelligence
&lt;/h2&gt;

&lt;p&gt;Medical literature analysis, clinical decision support, and precision medicine insights derived from patient records and research databases. Analysis of genetic variations, drug interactions, and treatment protocols.  &lt;strong&gt;Perfect for diagnostic assistance, research evidence synthesis, and personalized treatment planning.&lt;/strong&gt;  &lt;/p&gt;

&lt;h2&gt;
  
  
  Educational Enhancement
&lt;/h2&gt;

&lt;p&gt;Interactive tutoring in complex STEM subjects, programming instruction, and personalized learning design. Adapts explanations and step-by-step guidance to individual learning styles and cognitive needs. &lt;strong&gt;Ideal for advanced mathematics, coding bootcamps, and research methodology training.&lt;/strong&gt;  &lt;/p&gt;

&lt;h2&gt;
  
  
  Business Document Intelligence
&lt;/h2&gt;

&lt;p&gt;Document analysis across contracts, technical specifications, and regulatory filings with contextual cross-referencing. Extracts key insights, identifies compliance risks, and generates executive summaries. &lt;strong&gt;Suited for legal document review, due diligence, and knowledge management systems.&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Get Started Immediately
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Explore&lt;/strong&gt;: Try &lt;a href="https://cloud.siliconflow.com/playground/chat/17885302849" rel="noopener noreferrer"&gt;&lt;strong&gt;Qwen3-235B-A22B-Thinking-2507&lt;/strong&gt;&lt;/a&gt; in the SiliconFlow playground.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate&lt;/strong&gt;: Use our &lt;strong&gt;OpenAI-compatible API&lt;/strong&gt;. Explore the full API specifications in the &lt;a href="https://docs.siliconflow.com/en/api-reference/chat-completions/chat-completions" rel="noopener noreferrer"&gt;SiliconFlow API documentation&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

url = "https://api.siliconflow.com/v1/chat/completions"

# Minimal request; sampling parameters are left at server defaults.
payload = {
    "model": "Qwen/Qwen3-235B-A22B-Thinking-2507",
    "messages": [
        {
            "role": "user",
            "content": "Tell me a story"
        }
    ]
}
headers = {
    "Authorization": "Bearer &amp;lt;token&amp;gt;",  # your SiliconFlow API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
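
&lt;p&gt;For long chains of thought, it is often convenient to stream tokens as they are generated. The sketch below assumes the standard OpenAI-compatible "stream" option with SSE "data:" framing; confirm the details in the SiliconFlow API documentation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import requests

url = "https://api.siliconflow.com/v1/chat/completions"
payload = {
    "model": "Qwen/Qwen3-235B-A22B-Thinking-2507",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": True
}
headers = {"Authorization": "Bearer &amp;lt;token&amp;gt;"}

with requests.post(url, json=payload, headers=headers, stream=True) as r:
    for line in r.iter_lines():
        # Server-sent events arrive as lines of the form 'data: {json}'.
        if line and line.startswith(b"data: ") and line != b"data: [DONE]":
            chunk = json.loads(line[len(b"data: "):])
            delta = chunk["choices"][0]["delta"]
            print(delta.get("content") or "", end="", flush=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;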



&lt;p&gt;Ready to unlock advanced reasoning capabilities?&lt;/p&gt;

&lt;p&gt;Explore Qwen3-235B-A22B-Thinking-2507 on SiliconFlow today.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>siliconflow</category>
    </item>
    <item>
      <title>GLM-4.5 Now Available on SiliconFlow: Open-Source SOTA Model for Reasoning, Code, and Agentic Applications</title>
      <dc:creator>SiliconFlow</dc:creator>
      <pubDate>Wed, 30 Jul 2025 02:14:38 +0000</pubDate>
      <link>https://dev.to/siliconflow/glm-45-now-available-on-siliconflow-open-source-sota-model-for-reasoning-code-and-agentic-48lj</link>
      <guid>https://dev.to/siliconflow/glm-45-now-available-on-siliconflow-open-source-sota-model-for-reasoning-code-and-agentic-48lj</guid>
      <description>&lt;p&gt;Today, we're excited to integrate &lt;a href="https://www.siliconflow.com/models/zai-org-glm-4-5" rel="noopener noreferrer"&gt;GLM-4.5&lt;/a&gt; and &lt;a href="https://www.siliconflow.com/models/zai-org-glm-4-5-air" rel="noopener noreferrer"&gt;GLM-4.5-Air&lt;/a&gt;, Z.ai's latest flagship model serie, into SiliconFlow platform. This breakthrough model series represents a significant milestone in AGI development by natively unifying reasoning, coding, and agentic capabilities into a single model in order to satisfy more and more complicated requirements of fast-rising agentic applications.&lt;/p&gt;

&lt;p&gt;Whether you're tackling full-stack development projects, sophisticated code refactoring, or building autonomous agent systems, GLM-4.5 provides the advanced functionality and reliability that intelligent agentic applications demand. This powerful addition to our model catalog empowers developers to push the boundaries of what's possible in intelligent automation and complex problem-solving scenarios.&lt;/p&gt;

&lt;p&gt;With SiliconFlow's GLM-4.5 API, you can expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cost-Effective Pricing: GLM-4.5 $0.5/M tokens (input) and $2/M tokens (output); GLM-4.5-Air $0.14/M tokens (input) and $0.86/M tokens (output).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extended Context Window: 128K context window for complex tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Key Capabilities &amp;amp; Benchmark Performance
&lt;/h1&gt;

&lt;p&gt;The GLM-4.5 model series now available on SiliconFlow features the following key capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SOTA Performance&lt;/strong&gt;: Delivers state-of-the-art results among open-source models in reasoning, code generation and agentic capabilities, with industry-leading performance in real-world code agent evaluations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MoE Architecture&lt;/strong&gt;: GLM-4.5 has 355B total/32B active parameters, while GLM-4.5-Air adopts a compact design with 106B total/12B active parameters. Both leverage the Mixture of Experts design for optimal efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hybrid Inference&lt;/strong&gt;: Both provide a thinking mode for complex tasks and a non-thinking mode for immediate responses (a hedged request sketch follows this list).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
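
&lt;p&gt;As a hedged sketch of toggling hybrid inference, the request below assumes the "enable_thinking" parameter that SiliconFlow exposes for hybrid reasoning models; confirm its availability for GLM-4.5 in the API documentation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

url = "https://api.siliconflow.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer &amp;lt;token&amp;gt;",
    "Content-Type": "application/json"
}

def ask(prompt, think):
    payload = {
        "model": "zai-org/GLM-4.5",
        "enable_thinking": think,   # assumed flag: True for thinking mode,
                                    # False for an immediate response
        "messages": [{"role": "user", "content": prompt}]
    }
    return requests.post(url, json=payload, headers=headers).json()

print(ask("Summarize rotary position embeddings in one sentence.", think=False))
print(ask("Prove that there are infinitely many primes.", think=True))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;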

&lt;p&gt;To comprehensively evaluate GLM-4.5's general capabilities, Z.ai selected 12 representative benchmarks spanning three core domains: reasoning (MMLU Pro, AIME 24, MATH 500), coding (SciCode, GPQA, HLE, LiveCodeBench, SWE-Bench Verified), and agentic capabilities (Terminal-Bench, TAU-Bench, BFCL v3, BrowseComp).&lt;/p&gt;

&lt;p&gt;Across these comprehensive metrics, GLM-4.5 demonstrates outstanding performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Global Ranking&lt;/strong&gt;: Ranks 3rd globally across all models on the 12 comprehensive benchmarks, &lt;strong&gt;scoring 63.2 — just behind the leader Grok-4 (63.6) and surpassing Claude 4 Opus (60.9)&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open-Source Champion&lt;/strong&gt;: Top-performing model in the open-source category.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Technical Domains&lt;/strong&gt;: Demonstrates excellence across mathematical reasoning, scientific problem-solving, code generation, agent workflows, and complex task execution.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyxo7x2wjk0bug26ykdf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyxo7x2wjk0bug26ykdf.png" alt=" " width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fryqw8wjj3wiirvge6k2f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fryqw8wjj3wiirvge6k2f.png" alt=" " width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  What Makes GLM-4.5 So Powerful
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Advanced Training Pipeline
&lt;/h2&gt;

&lt;p&gt;Z.ai developed GLM-4.5 using a sophisticated three-stage process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Pre-training: 15 trillion tokens of general-purpose data for foundational capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Domain-specific training: 8 trillion tokens focused on code, reasoning, and agent tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reinforcement learning: Enhanced performance across reasoning, coding, and agent workflows.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Superior Parameter Efficiency
&lt;/h2&gt;

&lt;p&gt;Through Pareto Frontier analysis, GLM-4.5 demonstrates exceptional efficiency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Optimal scaling: Superior performance relative to models of comparable scale.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Efficiency leadership: Achieves optimal efficiency on the performance-scale trade-off boundary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Resource advantage: Half the parameters of DeepSeek-R1, one-third of Kimi-K2.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost benefits: Higher parameter efficiency translates to faster inference and lower operational costs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7s1jd9hqj83o72488hd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7s1jd9hqj83o72488hd.png" alt=" " width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Real-world Performance
&lt;/h1&gt;

&lt;p&gt;Beyond benchmark evaluations, GLM-4.5's practical capabilities have been rigorously tested in real-world coding scenarios:&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic Coding Evaluation
&lt;/h2&gt;

&lt;p&gt;Independent evaluation of GLM-4.5's agentic coding capabilities was conducted using Claude Code across 52 diverse coding tasks, including frontend development, tool creation, data analysis, testing, and algorithm implementation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Competitive Results:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;vs. Kimi K2: 53.9% win rate in head-to-head comparisons.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;vs. Qwen3-Coder: 80.8% success rate, demonstrating clear superiority.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;vs. Claude-4-Sonnet: Competitive performance, though further optimization remains possible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tool calling accuracy: Leading 90.6% success rate, surpassing Claude-4-Sonnet (89.5%), Kimi-K2 (86.2%), and Qwen3-Coder (77.1%).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmn7dyh4nvzrz37msk20v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmn7dyh4nvzrz37msk20v.png" alt=" " width="800" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgwd04lv5qd0kob2hxd3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgwd04lv5qd0kob2hxd3.png" alt=" " width="800" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Real Application Scenarios
&lt;/h1&gt;

&lt;p&gt;GLM-4.5's capabilities extend beyond benchmarks into practical development scenarios, demonstrating versatility across multiple domains through real-world implementations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Interactive Artifact Creation
&lt;/h2&gt;

&lt;p&gt;GLM-4.5 creates sophisticated standalone artifacts—from interactive mini-games to physics simulations—across HTML, SVG, Python and other formats, delivering superior user experiences for advanced agentic coding applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Slides Creation
&lt;/h2&gt;

&lt;p&gt;Leveraging GLM-4.5's powerful agentic tool usage and HTML coding capabilities, the model-native PPT/Poster agent autonomously searches the web, retrieves images, and creates slides from simple requests or uploaded documents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full-Stack Web Development
&lt;/h2&gt;

&lt;p&gt;GLM-4.5 excels in both frontend and backend development for modern web applications. Users can create entire websites with just a few words, then effortlessly add features through multi-turn dialogue, making the coding process smooth and enjoyable.&lt;/p&gt;

&lt;p&gt;These real-world scenarios demonstrate GLM-4.5's practical utility in professional development workflows, from rapid prototyping to complete application delivery.&lt;/p&gt;

&lt;h1&gt;
  
  
  Get Started Immediately
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Explore: Try GLM-4.5 &amp;amp; GLM-4.5-Air in the &lt;a href="https://cloud.siliconflow.com/playground/chat/17885302845" rel="noopener noreferrer"&gt;SiliconFlow playground&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Integrate: Use our OpenAI-compatible API. Explore the full API specifications in the &lt;a href="https://docs.siliconflow.com/en/api-reference/chat-completions/chat-completions" rel="noopener noreferrer"&gt;SiliconFlow API documentation&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

url = "https://api.siliconflow.com/v1/chat/completions"

payload = {
    "model": "zai-org/GLM-4.5",
    "messages": [
        {
            "role": "user",
            "content": "Tell me a story"
        }
    ],
    "top_p": 0.95,        # nucleus sampling cutoff
    "temperature": 0.6    # sampling temperature
}
headers = {
    "Authorization": "Bearer &amp;lt;token&amp;gt;",  # your SiliconFlow API key
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build with the GLM-4.5 and GLM-4.5-Air API on &lt;a href="https://cloud.siliconflow.com/playground/chat/17885302845" rel="noopener noreferrer"&gt;SiliconFlow&lt;/a&gt; today!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>programming</category>
      <category>siliconflow</category>
    </item>
  </channel>
</rss>
