Step 3.5 Flash Series New Release! Available to All Step Plan Users!

#llm #ai

StepFun's latest model, Step 3.5 Flash 2603, is now live and open to all Step Plan users. Give it a try!

This model is an optimized version of Step 3.5 Flash. Building on the Flash series' strengths of high response speed and low cost, it brings the following improvements:

A new low think mode, which further reduces token consumption and improves output efficiency for requests that don't need deep reasoning.
Training optimized for coding and agent frameworks, improving the user experience, stability, and token efficiency.

In our testing, Step 3.5 Flash 2603 in the default reasoning mode (high) achieves roughly the same reasoning scores while using 14% fewer tokens. Switching to low think mode causes a slight drop in reasoning scores, but cuts token consumption by 56%.
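To make the reported percentages concrete, here is a rough back-of-the-envelope sketch. The baseline of 10,000 tokens per task is a made-up figure for illustration, and the sketch assumes both reductions are measured against the same baseline, which the numbers above do not spell out.

```python
# Reductions quoted in the post. Applying both to one shared baseline
# is an assumption made for this illustration.
HIGH_MODE_REDUCTION = 0.14
LOW_MODE_REDUCTION = 0.56


def estimate_tokens(baseline_tokens: int, reduction: float) -> int:
    """Return the estimated token count after a fractional reduction."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return round(baseline_tokens * (1.0 - reduction))


baseline = 10_000  # hypothetical tokens per task, not a measured value
high_estimate = estimate_tokens(baseline, HIGH_MODE_REDUCTION)
low_estimate = estimate_tokens(baseline, LOW_MODE_REDUCTION)
print(f"high: {high_estimate} tokens, low: {low_estimate} tokens")
```

Plug in your own per-task token counts to see what each mode might save on your workload.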

We've noticed that OpenClaw users interact with AI differently, especially in Agent scenarios, where many tasks are high-frequency but not particularly complex. Savvy users are already "allocating on demand": using heavyweight models for complex steps and lightweight models for intermediate and high-frequency tasks. Not every request needs to "think deeply."
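The "allocating on demand" pattern can be sketched as a small routing function. This is only an illustration: the 1 to 5 complexity score, the thresholds, and the "heavyweight-model" name are hypothetical placeholders, and how you score a task is up to your own agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """A model name paired with a thinking intensity."""
    model: str
    reasoning_effort: str


# "heavyweight-model" is a placeholder, not a real model name.
HEAVY = Route(model="heavyweight-model", reasoning_effort="high")
FLASH_HIGH = Route(model="step-3.5-flash-2603", reasoning_effort="high")
FLASH_LOW = Route(model="step-3.5-flash-2603", reasoning_effort="low")


def pick_route(complexity: int, high_frequency: bool) -> Route:
    """Pick a route from a hypothetical 1-5 complexity score."""
    if not 1 <= complexity <= 5:
        raise ValueError("complexity must be between 1 and 5")
    if complexity >= 4:
        # Complex steps go to the heavyweight model.
        return HEAVY
    if high_frequency or complexity <= 2:
        # Simple or high-frequency steps use low think mode.
        return FLASH_LOW
    # Intermediate steps stay on the Flash model with default effort.
    return FLASH_HIGH
```

For example, pick_route(1, high_frequency=True) returns the low think route, while pick_route(5, high_frequency=False) returns the heavyweight one.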

Based on this observation, we've further improved the reasoning efficiency and flexibility of the already fast Step 3.5 Flash, making it "faster on top of fast" without sacrificing intelligence.

Feedback from early test users confirms this:

The intelligence actually improved — when handling complex tasks, it proactively fixes errors rather than just reporting them.

In a model evaluation conducted by an Agent ecosystem partner, Step 3.5 Flash showed a clear speed advantage in high-frequency Agent scenarios, taking just half the total time of competing models.

Get Started Now

Step 3.5 Flash 2603 is now available to all Step Plan users.
After subscribing to Step Plan, you can use this model in the usual way: simply switch the model to step-3.5-flash-2603. The examples below show how to set the thinking intensity, using reasoning_effort in the OpenAI Chat Completions format and output_config.effort in the Anthropic Messages format.

openai-style

{
  "model": "step-3.5-flash-2603",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. Be concise, accurate, and structured."
    },
    {
      "role": "user",
      "content": "Please explain why in enterprise-level AI applications, latency, stability, and cost are often more important than the peak capability of a single response."
    }
  ],
  "temperature": 0,
  "max_tokens": 250000,
  "reasoning_effort": "low"
}
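As a rough sketch, the OpenAI-style body above could be built and sent from Python using only the standard library. The API_URL below is a placeholder, not StepFun's real endpoint, so take the actual base URL and key from your Step Plan account. The helper accepts only the "low" and "high" values named in this post; whether other values exist is not covered here.

```python
import json
import urllib.request

# Placeholder endpoint: replace with the base URL from your Step Plan docs.
API_URL = "https://example.invalid/v1/chat/completions"


def build_chat_request(prompt: str, effort: str = "low") -> dict:
    """Build an OpenAI-style chat body with the given thinking intensity."""
    if effort not in ("low", "high"):
        raise ValueError("this sketch only knows 'low' and 'high'")
    return {
        "model": "step-3.5-flash-2603",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "reasoning_effort": effort,
    }


def send_chat_request(body: dict, api_key: str) -> dict:
    """POST the body as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling build_chat_request("Summarize this log.", effort="low") gives a body you can inspect or log before passing it to send_chat_request.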

anthropic-style

{
  "model": "step-3.5-flash-2603",
  "max_tokens": 250000,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  },
  "output_config": {
    "effort": "low"
  },
  "messages": [
    {
      "role": "user",
      "content": "Please give a brief introduction to StepFun."
    }
  ]
}
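The Anthropic-style body above can be assembled the same way. The sketch below only builds the payload. It also checks that budget_tokens is smaller than max_tokens, which is a rule of Anthropic's own Messages API; assuming StepFun's compatible endpoint follows the same rule is a guess, so treat the check as a precaution rather than documented behavior.

```python
def build_messages_request(
    prompt: str,
    effort: str = "low",
    max_tokens: int = 250000,
    budget_tokens: int = 10000,
) -> dict:
    """Build an Anthropic-style Messages body with thinking enabled."""
    # Anthropic's API requires the thinking budget to be below max_tokens.
    # Applying the same rule here is an assumption.
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "model": "step-3.5-flash-2603",
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }
```

With the defaults, build_messages_request("Please give a brief introduction to StepFun.") reproduces the JSON body shown above.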