Jovan Chan

Posted on • Originally published at runaihome.com

macOS 0.30.6: /api/embed with qwen3-embedding:0.6b crashes | Fix 2026

This article was originally published on runaihome.com

macOS 0.30.6: /api/embed Crashes with qwen3-embedding:0.6b on Long Inputs

This regression was introduced between Ollama versions 0.24.0 and 0.30.6. On macOS with Metal GPU acceleration, the qwen3-embedding:0.6b model triggers a segmentation fault in llama-server when /api/embed processes long input strings. The same inputs do not crash version 0.24.0, which points to a regression rather than a model limitation.
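To check whether a given install is affected, a long synthetic input can be sent to /api/embed. The following is a minimal sketch under assumptions: `build_embed_body` is a hypothetical helper, the filler word and the default of 6000 words are arbitrary choices, and the input length that actually triggers the crash may differ on your machine.

```shell
# build_embed_body WORDS: print a JSON request body for /api/embed whose
# input is WORDS repetitions of "lorem " (plain ASCII, so no JSON escaping
# is needed). Hypothetical helper for illustration only.
build_embed_body() {
  words=${1:-6000}
  # Generate the filler text; awk prints "lorem " WORDS times.
  text=$(awk -v n="$words" 'BEGIN { for (i = 0; i < n; i++) printf "lorem " }')
  printf '{"model":"qwen3-embedding:0.6b","input":"%s"}\n' "$text"
}

# Usage (requires a running Ollama server on the default port):
#   build_embed_body 6000 |
#     curl -s -X POST http://localhost:11434/api/embed \
#       -H 'Content-Type: application/json' -d @-
```

If llama-server crashes, the request would be expected to fail instead of returning embeddings. Running the same command against 0.24.0 gives a point of comparison.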

Fix 1: Downgrade to 0.24.0

If you require the qwen3-embedding:0.6b model with long inputs immediately:

# Stop ollama service
brew services stop ollama

# Uninstall current version
brew uninstall ollama

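# Install specific older version
# (assumes an ollama@0.24.0 formula is available from a tap you have added;
# Homebrew core may not provide versioned Ollama formulae)
brew install ollama@0.24.0
brew services start ollama@0.24.0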

Downgrading removes the regression but loses features and security patches from newer releases.

Fix 2: Chunk Long Inputs

Until a patch is released for 0.30.6, split inputs exceeding approximately 4,000 tokens into smaller chunks and aggregate the resulting embedding vectors:

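# Example: split long text into 2000-character chunks, one chunk per line
CHUNK_SIZE=2000

# Request one embedding per chunk; average the returned vectors afterwards
printf '%s\n' "$LONG_TEXT" | tr '\n' ' ' | fold -w "$CHUNK_SIZE" |
while IFS= read -r chunk || [ -n "$chunk" ]; do
  # jq builds the JSON body so quotes and backslashes in the text are escaped
  jq -n --arg input "$chunk" '{model: "qwen3-embedding:0.6b", input: $input}' |
    curl -s -X POST http://localhost:11434/api/embed \
      -H 'Content-Type: application/json' -d @-
done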

Average the chunk vectors element-wise (mean pooling) to produce a single embedding for the full input.
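The averaging step can be sketched in shell with awk. This is a minimal sketch, not the author's method: `average_vectors` is a hypothetical helper name, and it assumes each chunk's vector has already been extracted as one JSON array per line (for example from the `embeddings` field of an /api/embed response) and that all vectors have the same length.

```shell
# average_vectors: read one JSON array of numbers per line on stdin
# (e.g. [0.1,0.2,0.3]) and print their element-wise mean as a JSON array.
# Hypothetical helper for illustration; not part of Ollama.
average_vectors() {
  awk '
    {
      gsub(/\[|\]| /, "")            # drop brackets and spaces
      n = split($0, v, ",")
      if (n == 0) next               # skip empty lines
      for (i = 1; i <= n; i++) sum[i] += v[i]
      dim = n
      rows++
    }
    END {
      if (rows == 0) exit 1          # no vectors were read
      printf "["
      for (i = 1; i <= dim; i++)
        printf "%s%.6f", (i > 1 ? "," : ""), sum[i] / rows
      print "]"
    }
  '
}

# Usage:
#   printf '[1,2,3]\n[3,4,5]\n' | average_vectors
#   prints [2.000000,3.000000,4.000000]
```

The mean of several unit-length vectors is generally not unit length itself, so re-normalize the result if downstream code assumes normalized embeddings.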
