## 🎯 Key Takeaways (TL;DR)
- Breakthrough Release: ByteDance releases Seed-OSS series open-source LLMs under Apache-2.0 license
- Technical Highlights: 36B parameters, native 512K context, controllable thinking budget, trained with only 12T tokens
- Exceptional Performance: Achieves open-source SOTA on multiple benchmarks, particularly excelling in reasoning, coding, and agent tasks
- Practical Value: Provides both base models and instruction-tuned versions for research and commercial applications
## Table of Contents

- [What is Seed-OSS Model](#what-is-seed-oss)
- [Core Technical Features](#key-features)
- [Model Architecture Deep Dive](#architecture)
- [Performance Benchmarks](#benchmarks)
- [Controllable Thinking Budget Mechanism](#thinking-budget)
- [Quick Start Guide](#quick-start)
- [Competitive Analysis](#comparison)
- [Frequently Asked Questions](#faq)
## What is Seed-OSS Model {#what-is-seed-oss}
Seed-OSS is an open-source large language model series developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent, and general capabilities. The series includes three versions:
- Seed-OSS-36B-Base: Base model (pre-trained version with synthetic instruction data)
- Seed-OSS-36B-Base-woSyn: Clean base model (without synthetic instruction data)
- Seed-OSS-36B-Instruct: Instruction-tuned model (suitable for various downstream tasks)
💡 Professional Tip
Seed-OSS is primarily optimized for international (i18n) use cases, with strong multilingual performance.
## Core Technical Features {#key-features}
### 🎯 Controllable Thinking Budget
- Users can flexibly control how many tokens the model spends thinking before answering
- Dynamic thinking-budget control improves inference efficiency
- Budgets are recommended to be multiples of 512 tokens (512, 1K, 2K, 4K, 8K, 16K)
### 🧠 Enhanced Reasoning Capability
- Specifically optimized for reasoning tasks
- Maintains balanced and excellent general capabilities
- Achieves 91.7 on AIME24 and 84.7 on AIME25
### 🤖 Agentic Intelligence
- Excels in agent tasks such as tool use and issue resolution
- TAU1-Retail: 70.4 (open-source SOTA)
- SWE-Bench Verified: 56.0 (open-source SOTA)
### 🔬 Research-Friendly
- Provides pre-trained models both with and without synthetic instruction data
- Offers more diverse options for the research community
### 📏 Native Long Context
- Supports up to 512K native long context
- RULER (128K) benchmark: 94.6 score
## Model Architecture Deep Dive {#architecture}
| Parameter | Specification |
|---|---|
| Parameters | 36B |
| Attention | GQA (Grouped Query Attention) |
| Activation Function | SwiGLU |
| Number of Layers | 64 |
| QKV Heads | 80 / 8 / 8 |
| Head Dimension | 128 |
| Hidden Size | 5120 |
| Vocabulary Size | 155K |
| Context Length | 512K |
| RoPE Base Frequency | 1e7 |
⚠️ Note
The 36B model needs roughly 18-20 GB of VRAM with 4-bit (Q4) quantization. Consider an inference framework that supports partial offloading.
## Performance Benchmarks {#benchmarks}
### Base Model Performance Comparison
| Benchmark | Qwen3-30B-A3B-Base | Qwen2.5-32B-Base | Seed-OSS-36B-Base | Seed-OSS-36B-Base-woSyn |
|---|---|---|---|---|
| MMLU-Pro | 59.8 | 58.5 | 65.1 | 60.4 |
| MMLU | 82.7 | 84.0 | 84.9 | 84.8 |
| BBH | 81.4 | 79.1 | 87.7 | 87.2 |
| GSM8K | 87.0 | 87.5 | 90.8 | 90.3 |
| MATH | 61.1 | 63.5 | 81.7 | 61.3 |
| HumanEval | 70.7 | 47.6 | 76.8 | 75.6 |
### Instruction-Tuned Model Performance
| Task Category | Benchmark | Qwen3-30B-A3B-Thinking | Qwen3-32B | Seed-OSS-36B-Instruct |
|---|---|---|---|---|
| Math Reasoning | AIME24 | 87.7 | 82.7 | 91.7 |
| Math Reasoning | AIME25 | 81.3 | 73.3 | 84.7 |
| Coding | LiveCodeBench v6 | 60.3 | 53.4 | 67.4 |
| Agent | TAU1-Retail | 58.7 | 40.9 | 70.4 |
| Agent | SWE-Bench Verified | 31.0 | 23.4 | 56.0 |
| Long Context | RULER (128K) | 94.5 | 77.5 | 94.6 |
✅ Best Practice
For best results, sample with `temperature=1.1` and `top_p=0.95`.
## Controllable Thinking Budget Mechanism {#thinking-budget}
### How It Works
The unique feature of Seed-OSS is its controllable thinking budget mechanism, allowing users to flexibly specify the model's thinking budget:
```
<seed:think>
Let me solve this problem step by step...
<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 tokens remaining for use.</seed:cot_budget_reflect>
Using the power rule...
<seed:cot_budget_reflect>I have used 258 tokens, and there are 254 tokens remaining for use.</seed:cot_budget_reflect>
Alternatively, remember that...
<seed:cot_budget_reflect>I have exhausted my token budget, and now I will start answering the question.</seed:cot_budget_reflect>
</seed:think>
```
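When post-processing raw generations, the final answer follows the closing `</seed:think>` tag. Below is a minimal sketch of stripping the thinking span with the standard library; the helper name and regex are my own, not part of any official Seed-OSS tooling:

```python
import re

# Matches the whole thinking span, including any nested budget-reflect tags.
THINK_RE = re.compile(r"<seed:think>.*?</seed:think>", re.DOTALL)

def strip_thinking(generation: str) -> str:
    """Remove the <seed:think>...</seed:think> block, keeping only the final answer."""
    return THINK_RE.sub("", generation).strip()

raw = (
    "<seed:think>Let me work through this..."
    "<seed:cot_budget_reflect>I have used 129 tokens...</seed:cot_budget_reflect>"
    "</seed:think>The answer is 42."
)
print(strip_thinking(raw))  # -> The answer is 42.
```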
### Budget Setting Guidelines
| Budget Value | Use Case | Performance |
|---|---|---|
| 0 | Direct answers needed | Fast response, no thinking process |
| 512 | Simple questions | Basic reasoning, moderate performance |
| 1K-2K | Medium complexity | Balance efficiency and quality |
| 4K-8K | Complex reasoning | Deep thinking, high-quality output |
| 16K+ | Extremely complex tasks | Maximum reasoning capability |
| -1 (default) | Unlimited | Auto-adjusts thinking length |
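Since budgets are recommended to be multiples of 512, one way to sanitize an arbitrary requested budget is to round it up to that granularity. This helper is purely illustrative and not part of the official API:

```python
def normalize_budget(requested: int) -> int:
    """Round a requested thinking budget to the recommended granularity.

    0 disables thinking and -1 leaves the budget unlimited (the default);
    any other value is rounded up to the nearest multiple of 512 tokens.
    """
    if requested in (0, -1):
        return requested
    return max(512, ((requested + 511) // 512) * 512)

print(normalize_budget(100))   # -> 512
print(normalize_budget(3000))  # -> 3072
print(normalize_budget(-1))    # -> -1
```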
## Quick Start Guide {#quick-start}
### Environment Setup
```bash
pip3 install -r requirements.txt
pip install git+ssh://git@github.com/Fazziekey/transformers.git@seed-oss
```
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "ByteDance-Seed/Seed-OSS-36B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto"  # shard across available GPUs automatically
)

messages = [
    {"role": "user", "content": "How to make pasta?"},
]

tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking_budget=512  # control the thinking budget (in tokens)
)

outputs = model.generate(
    tokenized_chat.to(model.device),
    max_new_tokens=2048
)

output_text = tokenizer.decode(outputs[0])
```
### vLLM Deployment
```bash
# Install vLLM version with Seed-OSS support
VLLM_USE_PRECOMPILED=1 VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL=1 \
  pip install git+ssh://git@github.com/FoolPlayer/vllm.git@seed-oss

# Start the OpenAI-compatible API server
python3 -m vllm.entrypoints.openai.api_server \
  --host localhost \
  --port 4321 \
  --model ./Seed-OSS-36B-Instruct \
  --tensor-parallel-size 8 \
  --dtype bfloat16
```
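Once the server is up, it speaks the standard OpenAI chat-completions protocol, so any compatible client works. A stdlib-only sketch, assuming the host, port, and model path match the launch flags above (the `chat` call obviously requires the server to be running):

```python
import json
import urllib.request

BASE_URL = "http://localhost:4321/v1/chat/completions"  # matches --host/--port above

def build_chat_request(prompt: str, temperature: float = 1.1, top_p: float = 0.95) -> dict:
    """Build an OpenAI-compatible chat-completions payload with the recommended sampling."""
    return {
        "model": "./Seed-OSS-36B-Instruct",  # must match the --model path used at launch
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }

def chat(prompt: str) -> str:
    """POST a prompt to the local server and return the assistant's reply."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```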
## Competitive Analysis {#comparison}
### Training Efficiency Comparison
| Model | Parameters | Training Tokens | Context Length | Special Capabilities |
|---|---|---|---|---|
| Seed-OSS-36B | 36B | 12T | 512K | Controllable thinking budget |
| Qwen3-30B-A3B | 30B | 32T | 1M (RoPE) | Reasoning optimization |
| Qwen2.5-32B | 32B | 18T | 128K | General capabilities |
| Gemma3-27B | 27B | Undisclosed | 8K | Google ecosystem |
💡 Professional Tip
Seed-OSS achieves excellent performance with only 12T tokens, demonstrating efficient training strategies and high data quality.
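The efficiency claim can be made concrete as a tokens-per-parameter ratio, computed directly from the comparison table above:

```python
def tokens_per_param(training_tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return training_tokens / params

# Figures from the comparison table above.
seed = tokens_per_param(12e12, 36e9)   # Seed-OSS-36B: ~333 tokens/param
qwen3 = tokens_per_param(32e12, 30e9)  # Qwen3-30B-A3B: ~1067 tokens/param
print(f"Seed-OSS-36B: {seed:.0f} tokens/param")
print(f"Qwen3-30B-A3B: {qwen3:.0f} tokens/param")
```

Reaching comparable benchmark scores on roughly a third of the tokens per parameter is what makes the data-quality claim notable.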
### Application Scenario Mapping
```mermaid
graph TD
    A[Seed-OSS-36B] --> B[Research Use]
    A --> C[Commercial Applications]
    A --> D[Agent Development]
    B --> B1[Base Model Research]
    B --> B2[Fine-tuning Experiments]
    C --> C1[Customer Service Systems]
    C --> C2[Content Generation]
    D --> D1[Code Assistants]
    D --> D2[Tool Calling]
```
## 🤔 Frequently Asked Questions {#faq}
Q: What's the relationship between Seed-OSS and GPT-OSS?
A: Seed-OSS is independently developed by ByteDance and has no direct relationship with OpenAI's GPT-OSS. Both adopt controllable thinking budget design concepts, but differ in architecture and training methods.
Q: How much VRAM does the 36B parameter model require?
A:
- FP16: ~72GB
- INT8: ~36GB
- INT4: ~18-20GB
- Recommend using inference frameworks that support partial offloading, such as vLLM or llama.cpp
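These figures follow from simple weight-storage arithmetic (parameters × bytes per parameter). Activations and the KV cache add overhead on top, so treat the result as a lower bound:

```python
def weight_vram_gb(params: float, bits_per_param: int) -> float:
    """Approximate VRAM needed just to hold the weights, in GB (lower bound)."""
    return params * bits_per_param / 8 / 1e9

for bits, label in [(16, "FP16"), (8, "INT8"), (4, "INT4")]:
    print(f"{label}: ~{weight_vram_gb(36e9, bits):.0f} GB")
# FP16: ~72 GB, INT8: ~36 GB, INT4: ~18 GB
```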
Q: How to choose the appropriate thinking budget?
A: Choose based on task complexity:
- Simple QA: 512 tokens
- Math reasoning: 2K-4K tokens
- Complex programming: 4K-8K tokens
- Research analysis: 8K+ tokens
Q: What's the difference between Base and woSyn versions?
A:
- Base version: pre-trained with synthetic instruction data included, yielding stronger benchmark scores out of the box
- woSyn version: a clean base model without synthetic instruction data, better suited for research and custom fine-tuning
Q: Which languages does the model support?
A: Seed-OSS is primarily optimized for international use cases, supporting multiple languages with a score of 78.4 on the MMMLU multilingual benchmark.
Q: Are there restrictions for commercial use?
A: Seed-OSS is released under the Apache-2.0 license, which permits commercial use; we still recommend reading the license terms carefully.
## Summary and Recommendations
Seed-OSS-36B represents significant progress in the open-source LLM field. Its unique controllable thinking budget mechanism and exceptional performance make it an ideal choice for research and applications.
### 🎯 Recommended Use Cases
- Research Institutions: Use woSyn version for fundamental research
- Enterprise Applications: Deploy Instruct version to build intelligent applications
- Developers: Use controllable thinking budget to optimize inference efficiency
- Education Sector: Serve as high-quality open-source resource for teaching and learning
### 🔮 Future Outlook
- Anticipate release of larger-scale versions (such as the rumored 200B MoE model)
- Continue monitoring community feedback and performance optimizations
- Explore more innovative reasoning control mechanisms
### ✅ Take Action Now
Visit Hugging Face to download the model, or check the GitHub repository for latest documentation and example code.