A community fine-tune just went viral on HuggingFace: Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled by Jackrong. It's racked up 218,000+ downloads and 1,465 likes. The idea: distill Claude Opus 4.6's chain-of-thought reasoning patterns into the open Qwen3.5-27B model via LoRA.
The problem? It's self-hosted only: no inference provider supports it yet. Here's how to run it via NexaAPI while waiting for official hosting.
What Is This Model?
- Base: Qwen3.5-27B (dense transformer, 72.4% SWE-bench, matches GPT-5 mini)
- Fine-tune: LoRA rank 64, trained on ~3,950 Claude Opus 4.6 reasoning traces
- Output format: <think> reasoning tags followed by the final answer
- License: Apache 2.0 (free for commercial use)
- Context: 8K tokens (a limitation vs the base model's 256K)
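The 8K window matters in practice: long prompts need trimming before they reach the model. Below is a minimal sketch of a prompt-length guard using a rough ~4-characters-per-token heuristic; that ratio is an assumption for English text, not the real tokenizer, so swap in the model's actual tokenizer for anything serious:

```python
def fit_to_context(prompt: str, max_tokens: int = 8192,
                   reserve_for_output: int = 4096,
                   chars_per_token: float = 4.0) -> str:
    """Truncate a prompt so prompt + completion fit a small context window.

    chars_per_token ~4 is a rough English-text heuristic, not the model's
    tokenizer; use the real tokenizer for accurate budgeting.
    """
    budget_tokens = max_tokens - reserve_for_output
    budget_chars = int(budget_tokens * chars_per_token)
    if len(prompt) <= budget_chars:
        return prompt
    # Keep the tail: in chat-style prompts the most recent text
    # usually matters most.
    return prompt[-budget_chars:]
```

Short prompts pass through unchanged; oversized ones are cut down to the character budget.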
Run Qwen3.5-27B via NexaAPI
While the distilled variant awaits inference provider support, you can run the base Qwen3.5-27B (which NexaAPI hosts) and get similar reasoning quality by prompting it correctly:
from openai import OpenAI

client = OpenAI(
    api_key="your-nexa-api-key",
    base_url="https://nexa-api.com/v1"
)

def qwen_reasoning(problem: str, show_thinking: bool = True) -> dict:
    """
    Run Qwen3.5-27B with chain-of-thought reasoning via NexaAPI.
    Mimics the Claude-distilled reasoning format.
    """
    system_prompt = """You are a precise reasoning assistant. For every problem:
1. First, think through the problem step-by-step inside <think> tags
2. Then provide your final answer after </think>

Format:
<think>
[Your detailed reasoning here]
</think>
[Your final answer here]"""

    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": problem}
        ],
        temperature=1.0,  # Higher temp for reasoning tasks
        max_tokens=4096,
        extra_body={"enable_thinking": True}  # NexaAPI thinking mode
    )
    content = response.choices[0].message.content

    # Parse thinking vs answer
    result = {"raw": content, "thinking": "", "answer": ""}
    if "<think>" in content and "</think>" in content:
        think_start = content.index("<think>") + len("<think>")
        think_end = content.index("</think>")
        result["thinking"] = content[think_start:think_end].strip()
        result["answer"] = content[think_end + len("</think>"):].strip()
    else:
        result["answer"] = content

    if show_thinking:
        thinking = result["thinking"]
        print("🧠 Reasoning process:")
        print(thinking[:500] + "..." if len(thinking) > 500 else thinking)
        print("\n✅ Final answer:")
        print(result["answer"])
    return result
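The parse above requires both tags to be present; if the model emits an opening <think> with no closing tag, everything gets dumped into the answer field. A more forgiving regex-based variant (a hypothetical helper, not part of the original post) could look like this:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def parse_reasoning(content: str) -> dict:
    """Split a <think>...</think> response into thinking and answer parts.

    Tolerates a missing closing tag by treating the remaining text
    as thinking and leaving the answer empty.
    """
    m = THINK_RE.search(content)
    if m:
        return {
            "thinking": m.group(1).strip(),
            "answer": content[m.end():].strip(),
        }
    if "<think>" in content:  # opening tag but no close
        start = content.index("<think>") + len("<think>")
        return {"thinking": content[start:].strip(), "answer": ""}
    return {"thinking": "", "answer": content.strip()}
```

Whether an unclosed tag should count as thinking or as answer is a judgment call; this version assumes a truncated generation stopped mid-reasoning.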
# Example: Complex coding problem
result = qwen_reasoning("""
Write a Python function that finds the longest palindromic subsequence in a string.
Explain your approach and provide the implementation with time/space complexity analysis.
""")
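Since the example prompt asks the model for a longest-palindromic-subsequence solution, a standard O(n²) DP reference implementation is handy for sanity-checking whatever the model returns (this is ordinary textbook code, not output from the model):

```python
def longest_palindromic_subsequence(s: str) -> int:
    """Length of the longest palindromic subsequence of s.

    dp[i][j] holds the LPS length of s[i..j]; O(n^2) time and space.
    """
    n = len(s)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        dp[i][i] = 1  # every single character is a palindrome
        for j in range(i + 1, n):
            if s[i] == s[j]:
                dp[i][j] = dp[i + 1][j - 1] + 2
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j - 1])
    return dp[0][n - 1]

# "bbbab" -> "bbbb", length 4
```

If the model's answer disagrees with this on a few test strings, its reasoning trace is the first place to look.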
Why This Model Matters
The Qwen3.5-27B-Claude distillation represents a trend: open models absorbing proprietary reasoning patterns. Key implications:
- Cost: Run Claude-style reasoning at roughly 1/5 the price via NexaAPI
- Privacy: Self-hostable, no data leaves your infrastructure
- Customization: Apache 2.0 means you can fine-tune further
The 8K context limitation is real — the base Qwen3.5-27B supports 256K tokens. For production use, the base model via NexaAPI is more practical.
Getting Started
pip install openai
export NEXA_API_KEY="your-key-here"

from openai import OpenAI

# Drop-in replacement for OpenAI
client = OpenAI(
    api_key="your-nexa-api-key",
    base_url="https://nexa-api.com/v1"
)

# Access Qwen3.5-27B, Claude, Gemini, and 100+ models
response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=[{"role": "user", "content": "Explain quantum entanglement simply"}]
)
print(response.choices[0].message.content)
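Hosted APIs occasionally return transient errors or rate limits, so production calls benefit from retries. A small generic backoff helper (a hypothetical utility, not part of NexaAPI's or OpenAI's SDK) that wraps any client call:

```python
import time

def with_retries(fn, *, attempts: int = 3, base_delay: float = 1.0,
                 retry_on: tuple = (Exception,), sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on the given exceptions.

    sleep is injectable so tests can skip real waiting.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            sleep(base_delay * (2 ** attempt))

# Usage (assumes the `client` defined above):
# result = with_retries(lambda: client.chat.completions.create(
#     model="Qwen/Qwen3.5-27B",
#     messages=[{"role": "user", "content": "hi"}],
# ))
```

In practice you would narrow `retry_on` to the SDK's rate-limit and timeout exception types rather than retrying every exception.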
Get started at nexa-api.com — free tier available. Enterprise pricing: frequency404@villaastro.com
Sources: HuggingFace model card (218K downloads, 1,465 likes as of March 2026), Qubrid AI Qwen3.5-27B specs