This article was originally published on runaihome.com
macOS 0.30.6: /api/embed Crashes with qwen3-embedding:0.6b on Long Inputs
The regression was introduced somewhere between Ollama 0.24.0 and 0.30.6. On macOS with Metal GPU acceleration, the qwen3-embedding:0.6b model triggers a segmentation fault in llama-server when /api/embed is given a long input string. The same inputs do not crash 0.24.0, which points to a regression rather than a model limitation.
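To check whether a given install is affected, a request along the following lines may help. This is only a sketch: `make_payload` is a hypothetical helper, the filler text is arbitrary, and the word count needed to trigger the crash is a guess, not a measured threshold. It assumes the server is listening on the default http://localhost:11434.

```shell
# make_payload WORDS: print an /api/embed request body whose "input" is
# WORDS copies of the word "lorem" (hypothetical helper, filler text only).
make_payload() {
  # printf reuses its format once per argument; %.0s prints each argument as nothing
  text=$(printf 'lorem %.0s' $(seq "$1"))
  printf '{"model":"qwen3-embedding:0.6b","input":"%s"}' "$text"
}

# To send it (needs a running Ollama server; 6000 is an arbitrary guess):
# make_payload 6000 | curl -s -X POST http://localhost:11434/api/embed -d @-
```

If the server is affected, the request would be expected to fail instead of returning embeddings; running the same command against 0.24.0 gives a baseline for comparison.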
Fix 1: Downgrade to 0.24.0
If you require the qwen3-embedding:0.6b model with long inputs immediately:
# Stop ollama service
brew services stop ollama
# Uninstall current version
brew uninstall ollama
# Install a specific older version (assumes a versioned ollama@0.24.0 formula is available in your taps)
brew install ollama@0.24.0
brew services start ollama@0.24.0
Downgrading removes the regression but loses features and security patches from newer releases.
Fix 2: Chunk Long Inputs
Until a patch is released for 0.30.6, split inputs exceeding approximately 4,000 tokens into smaller chunks and aggregate the resulting embedding vectors:
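# Example: split long text into 2000-character chunks
CHUNK_SIZE=2000
# Generate one embedding per chunk (averaging is a separate step).
# Newlines are flattened to spaces so each line of fold output is one chunk,
# and jq builds the JSON body so quotes and backslashes are escaped correctly.
printf '%s' "$LONG_TEXT" | tr '\n' ' ' | fold -w "$CHUNK_SIZE" |
while IFS= read -r chunk || [ -n "$chunk" ]; do
  jq -n --arg prompt "$chunk" '{model: "qwen3-embedding:0.6b", prompt: $prompt}' |
    curl -s -X POST http://localhost:11434/api/embeddings -d @-
done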
Average the chunk vectors (mean pooling) to produce a single embedding for the full input.
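The averaging step can be sketched in shell as well. `mean_pool` below is a hypothetical helper, not part of Ollama: it assumes one JSON array of plain numbers per line on stdin, all of the same length, and prints their element-wise mean.

```shell
# mean_pool: read one JSON array of numbers per line on stdin and print
# the element-wise mean as a JSON array (hypothetical helper).
mean_pool() {
  tr -d '[] ' | awk -F, '
    NF == 0 { next }                        # skip blank lines
    {
      if (dim == 0) dim = NF                # vector length comes from the first line
      if (NF != dim) {                      # refuse to average mismatched vectors
        print "mean_pool: dimension mismatch" > "/dev/stderr"
        bad = 1
        exit
      }
      for (i = 1; i <= NF; i++) sum[i] += $i
      n++
    }
    END {
      if (bad || n == 0) exit 1
      out = "["
      for (i = 1; i <= dim; i++)
        out = out (i > 1 ? "," : "") sprintf("%.6g", sum[i] / n)
      print out "]"
    }'
}
```

For example, `printf '[1,2,3]\n[3,4,5]\n' | mean_pool` prints `[2,3,4]`. With the legacy /api/embeddings endpoint, each response carries its vector under the `embedding` key, so piping the per-chunk responses through `jq -c '.embedding' | mean_pool` should yield the pooled vector. Note that a plain mean weights every chunk equally, including a short final chunk.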