<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: oreocato</title>
    <description>The latest articles on DEV Community by oreocato (@oreocato).</description>
    <link>https://dev.to/oreocato</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3914767%2Ff2d31aec-d75e-43f7-9384-f6e91767e661.jpg</url>
      <title>DEV Community: oreocato</title>
      <link>https://dev.to/oreocato</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/oreocato"/>
    <language>en</language>
    <item>
      <title>How to Run Qwen3.6-35B on Your Mac at 77 tok/s</title>
      <dc:creator>oreocato</dc:creator>
      <pubDate>Tue, 05 May 2026 22:42:16 +0000</pubDate>
      <link>https://dev.to/oreocato/-how-to-run-qwen36-35b-on-your-mac-at-77-toks-49e8</link>
      <guid>https://dev.to/oreocato/-how-to-run-qwen36-35b-on-your-mac-at-77-toks-49e8</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Level:&lt;/strong&gt; intermediate&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Estimated time:&lt;/strong&gt; 20-40 minutes (most of it is the model download)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Minimum requirements:&lt;/strong&gt; Mac with Apple Silicon (M1/M2/M3/M4) and &lt;strong&gt;48 GB of unified RAM&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What are we setting up?
&lt;/h2&gt;

&lt;p&gt;A local server compatible with the OpenAI API that runs the &lt;strong&gt;Qwen3.6-35B-A3B&lt;/strong&gt; model (quantized to 4 bits) using &lt;a href="https://github.com/ml-explore/mlx" rel="noopener noreferrer"&gt;MLX&lt;/a&gt;, Apple's machine-learning framework for Apple Silicon. When you're done, you'll have an endpoint at &lt;code&gt;http://127.0.0.1:7979&lt;/code&gt; that you can point any OpenAI-compatible client at (OpenCode, Continue, Cursor, etc.).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Measured value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Generation throughput&lt;/td&gt;
&lt;td&gt;~77 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTFT (time-to-first-token)&lt;/td&gt;
&lt;td&gt;~0.25 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;65,536 – 131,072 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAM required&lt;/td&gt;
&lt;td&gt;~20 GB model + ~12 GB KV cache&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Hardware
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Mac with Apple Silicon chip (M1 Pro/Max/Ultra or M2/M3/M4 equivalents)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimum 48 GB of unified RAM&lt;/strong&gt; (the quantized model takes ~20 GB; the KV cache needs up to an additional ~12 GB); you can verify your chip and memory from the terminal, as shown below&lt;/li&gt;
&lt;/ul&gt;
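
&lt;p&gt;A quick sanity check (these &lt;code&gt;sysctl&lt;/code&gt; keys exist on Apple Silicon Macs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Chip model, e.g. "Apple M3 Max"
sysctl -n machdep.cpu.brand_string

# Unified RAM in bytes (divide by 1073741824 for GB)
sysctl -n hw.memsize
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;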

&lt;h3&gt;
  
  
  Software
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check Python version (you need 3.11+)&lt;/span&gt;
python3 &lt;span class="nt"&gt;--version&lt;/span&gt;

&lt;span class="c"&gt;# Check that you have git&lt;/span&gt;
git &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don't have Python 3.11, install it with &lt;a href="https://brew.sh" rel="noopener noreferrer"&gt;Homebrew&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;python@3.11
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 1 — Create the virtual environment
&lt;/h2&gt;

&lt;p&gt;From the folder where you want to install everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;mlx-server &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;mlx-server
python3.11 &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
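
&lt;p&gt;With the environment active, &lt;code&gt;python&lt;/code&gt; should resolve inside &lt;code&gt;.venv&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;which python      # → .../mlx-server/.venv/bin/python
python --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;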






&lt;h2&gt;
  
  
  Step 2 — Install dependencies
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; pip

&lt;span class="c"&gt;# MLX and the OpenAI API-compatible server&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;mlx-lm
pip &lt;span class="nb"&gt;install &lt;/span&gt;mlx-openai-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mlx-openai-server &lt;span class="nt"&gt;--help&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
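
&lt;p&gt;You can also confirm exactly which package versions pip resolved:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip show mlx-lm mlx-openai-server | grep -E '^(Name|Version)'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;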






&lt;h2&gt;
  
  
  Step 3 — Download the model
&lt;/h2&gt;

&lt;p&gt;The model is automatically downloaded from Hugging Face the first time you run it. It takes approximately &lt;strong&gt;20 GB&lt;/strong&gt; of disk space.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Optional pre-download (recommended to track progress)&lt;/span&gt;
python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
from mlx_lm import load
model, tokenizer = load('mlx-community/Qwen3.6-35B-A3B-4bit')
print('Model downloaded successfully')
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
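
&lt;p&gt;Alternatively, if you have the &lt;code&gt;huggingface_hub&lt;/code&gt; CLI installed (&lt;code&gt;pip install huggingface_hub&lt;/code&gt;), you can pre-download the same repository into the shared Hugging Face cache with a proper progress bar:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;huggingface-cli download mlx-community/Qwen3.6-35B-A3B-4bit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;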



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Some repositories on &lt;a href="https://huggingface.co" rel="noopener noreferrer"&gt;huggingface.co&lt;/a&gt; require an account and accepting the model's terms before downloading; this model does not.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Step 4 — Start the server
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option A — Direct command (simpler)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mlx-openai-server launch &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-path&lt;/span&gt; mlx-community/Qwen3.6-35B-A3B-4bit &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-type&lt;/span&gt; lm &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host&lt;/span&gt; 127.0.0.1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 7979 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tool-call-parser&lt;/span&gt; qwen3_coder &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reasoning-parser&lt;/span&gt; qwen3_5 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-auto-tool-choice&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--context-length&lt;/span&gt; 65536 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--temperature&lt;/span&gt; 0.7 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--top-p&lt;/span&gt; 0.8 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--top-k&lt;/span&gt; 20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--min-p&lt;/span&gt; 0.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repetition-penalty&lt;/span&gt; 1.05 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-bytes&lt;/span&gt; 12884901888 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--prompt-cache-size&lt;/span&gt; 3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--log-level&lt;/span&gt; INFO
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
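
&lt;p&gt;Once the server reports it is listening, you can confirm the model is loaded. Assuming the server exposes the standard OpenAI &lt;code&gt;/v1/models&lt;/code&gt; listing (it advertises API compatibility), this should return the model ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -s http://127.0.0.1:7979/v1/models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;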



&lt;h3&gt;
  
  
  Option B — Startup script (recommended)
&lt;/h3&gt;

&lt;p&gt;Save the following script as &lt;code&gt;start-mlx-server.sh&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;SCRIPT_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;dirname&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BASH_SOURCE&lt;/span&gt;&lt;span class="p"&gt;[0]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;VENV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$SCRIPT_DIR&lt;/span&gt;&lt;span class="s2"&gt;/.venv"&lt;/span&gt;

&lt;span class="c"&gt;# Default profile: high_context&lt;/span&gt;
&lt;span class="c"&gt;# Change with: MLX_PROFILE=baseline ./start-mlx-server.sh&lt;/span&gt;
&lt;span class="nv"&gt;PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;MLX_PROFILE&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;high_context&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nv"&gt;MODEL_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mlx-community/Qwen3.6-35B-A3B-4bit"&lt;/span&gt;
&lt;span class="nv"&gt;HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"127.0.0.1"&lt;/span&gt;
&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"7979"&lt;/span&gt;

&lt;span class="nv"&gt;TOOL_CALL_PARSER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"qwen3_coder"&lt;/span&gt;
&lt;span class="nv"&gt;REASONING_PARSER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"qwen3_5"&lt;/span&gt;

&lt;span class="nv"&gt;TEMPERATURE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"0.7"&lt;/span&gt;
&lt;span class="nv"&gt;TOP_P&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"0.8"&lt;/span&gt;
&lt;span class="nv"&gt;TOP_K&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"20"&lt;/span&gt;
&lt;span class="nv"&gt;MIN_P&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"0.0"&lt;/span&gt;
&lt;span class="nv"&gt;REPETITION_PENALTY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"1.05"&lt;/span&gt;
&lt;span class="nv"&gt;MAX_CACHE_BYTES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"12884901888"&lt;/span&gt;  &lt;span class="c"&gt;# 12 GB&lt;/span&gt;

&lt;span class="nv"&gt;DRAFT_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mlx-community/Qwen3.5-0.8B-MLX-4bit"&lt;/span&gt;
&lt;span class="nv"&gt;NUM_DRAFT_TOKENS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;MLX_NUM_DRAFT_TOKENS&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;4&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROFILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in
    &lt;/span&gt;baseline&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nv"&gt;CONTEXT_LENGTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"65536"&lt;/span&gt;
        &lt;span class="nv"&gt;PROMPT_CACHE_SIZE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"3"&lt;/span&gt;
        &lt;span class="nv"&gt;EXTRA_ARGS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;
        &lt;span class="p"&gt;;;&lt;/span&gt;
    high_context&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nv"&gt;CONTEXT_LENGTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"131072"&lt;/span&gt;
        &lt;span class="nv"&gt;PROMPT_CACHE_SIZE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"5"&lt;/span&gt;
        &lt;span class="nv"&gt;EXTRA_ARGS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;
        &lt;span class="p"&gt;;;&lt;/span&gt;
    speculative&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nv"&gt;CONTEXT_LENGTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"65536"&lt;/span&gt;
        &lt;span class="nv"&gt;PROMPT_CACHE_SIZE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"3"&lt;/span&gt;
        &lt;span class="nv"&gt;EXTRA_ARGS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--draft-model-path &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DRAFT_MODEL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; --num-draft-tokens &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NUM_DRAFT_TOKENS&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;;;&lt;/span&gt;
    speculative_high&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nv"&gt;CONTEXT_LENGTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"131072"&lt;/span&gt;
        &lt;span class="nv"&gt;PROMPT_CACHE_SIZE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"5"&lt;/span&gt;
        &lt;span class="nv"&gt;EXTRA_ARGS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--draft-model-path &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DRAFT_MODEL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; --num-draft-tokens &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NUM_DRAFT_TOKENS&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;;;&lt;/span&gt;
    &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Unknown PROFILE: &lt;/span&gt;&lt;span class="nv"&gt;$PROFILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Options: baseline, high_context, speculative, speculative_high"&lt;/span&gt;
        &lt;span class="nb"&gt;exit &lt;/span&gt;1
        &lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="k"&gt;esac&lt;/span&gt;

&lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$VENV&lt;/span&gt;&lt;span class="s2"&gt;/bin/mlx-openai-server"&lt;/span&gt; launch &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--model-path&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MODEL_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--model-type&lt;/span&gt; lm &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--host&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOST&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--port&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PORT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--tool-call-parser&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TOOL_CALL_PARSER&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--reasoning-parser&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$REASONING_PARSER&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--enable-auto-tool-choice&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--context-length&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CONTEXT_LENGTH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--temperature&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TEMPERATURE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--top-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TOP_P&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--top-k&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TOP_K&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--min-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MIN_P&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--repetition-penalty&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$REPETITION_PENALTY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--max-bytes&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MAX_CACHE_BYTES&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--prompt-cache-size&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT_CACHE_SIZE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--log-level&lt;/span&gt; INFO &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;$EXTRA_ARGS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x start-mlx-server.sh
./start-mlx-server.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Usage examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./start-mlx-server.sh                                      &lt;span class="c"&gt;# high_context (default)&lt;/span&gt;
&lt;span class="nv"&gt;MLX_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;baseline ./start-mlx-server.sh                &lt;span class="c"&gt;# maximum throughput&lt;/span&gt;
&lt;span class="nv"&gt;MLX_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;speculative ./start-mlx-server.sh             &lt;span class="c"&gt;# speculative decoding&lt;/span&gt;
&lt;span class="nv"&gt;MLX_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;speculative &lt;span class="nv"&gt;MLX_NUM_DRAFT_TOKENS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;6 ./start-mlx-server.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
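
&lt;p&gt;To keep the server running after you close the terminal, a minimal sketch (assuming you don't need a full &lt;code&gt;launchd&lt;/code&gt; service) is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Run in the background and log to a file
nohup ./start-mlx-server.sh &amp;gt; mlx-server.log 2&amp;gt;&amp;amp;1 &amp;amp;

# Follow the logs
tail -f mlx-server.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;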






&lt;h2&gt;
  
  
  Step 5 — Verify it works
&lt;/h2&gt;

&lt;p&gt;In another terminal, send a test request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://127.0.0.1:7979/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "mlx-community/Qwen3.6-35B-A3B-4bit",
    "messages": [{"role": "user", "content": "Hello, what is 2+2?"}],
    "max_tokens": 100
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a JSON response with the &lt;code&gt;choices[0].message.content&lt;/code&gt; field.&lt;/p&gt;
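
&lt;p&gt;To watch tokens arrive as they are generated (and eyeball the ~77 tok/s figure yourself), the same endpoint accepts the standard OpenAI streaming flag, assuming the server implements OpenAI-style SSE streaming:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -N http://127.0.0.1:7979/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen3.6-35B-A3B-4bit",
    "messages": [{"role": "user", "content": "Count to 20"}],
    "stream": true
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;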




&lt;h2&gt;
  
  
  Stopping the server
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pkill &lt;span class="nt"&gt;-f&lt;/span&gt; mlx-openai-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or save the following as &lt;code&gt;stop-mlx-server.sh&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
pkill &lt;span class="nt"&gt;-f&lt;/span&gt; mlx-openai-server &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Server stopped."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
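
&lt;p&gt;To confirm the server actually released the port:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lsof -i :7979   # no output means nothing is listening
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;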






&lt;h2&gt;
  
  
  Connect with your favorite client
&lt;/h2&gt;

&lt;p&gt;The server exposes an OpenAI-compatible API, so you just point each client's &lt;code&gt;base_url&lt;/code&gt; at the local server.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenCode
&lt;/h3&gt;

&lt;p&gt;Create or edit the &lt;code&gt;opencode.json&lt;/code&gt; file in the root of your project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://opencode.ai/config.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mlx-local"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"npm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@ai-sdk/openai-compatible"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MLX Local (Qwen3.6-35B)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"baseURL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://127.0.0.1:7979/v1"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"mlx-community/Qwen3.6-35B-A3B-4bit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Qwen3.6-35B-A3B-4bit (local MLX)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"limit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;65536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;32768&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Continue / Cursor
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Base URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://127.0.0.1:7979/v1&lt;/span&gt;
&lt;span class="na"&gt;API Key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;any-value  (the server does not validate it)&lt;/span&gt;
&lt;span class="na"&gt;Model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;mlx-community/Qwen3.6-35B-A3B-4bit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Python (openai SDK)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:7979/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mlx-community/Qwen3.6-35B-A3B-4bit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain what a transformer is&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Configuration profiles
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Profile&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Cache&lt;/th&gt;
&lt;th&gt;tok/s measured&lt;/th&gt;
&lt;th&gt;When to use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;baseline&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;65,536&lt;/td&gt;
&lt;td&gt;3 entries&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;77.4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Maximum throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;high_context&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;131,072&lt;/td&gt;
&lt;td&gt;5 entries&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;75.7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Long documents, extended contexts (default)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;The performance difference between the two profiles (~2%) is within measurement noise. Use &lt;code&gt;high_context&lt;/code&gt; if you work with large files or very long conversations.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Key parameters explained
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--max-bytes 12884901888&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;12 GB&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Critical.&lt;/strong&gt; Without this limit the model's KV cache (MoE architecture with &lt;code&gt;ArraysCache&lt;/code&gt;) grows unchecked until it exhausts RAM on contexts &amp;gt;30k tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--prompt-cache-size 3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3 LRU entries&lt;/td&gt;
&lt;td&gt;Limits how many conversations the prefix cache keeps in memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--context-length 65536&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;64k tokens&lt;/td&gt;
&lt;td&gt;Maximum context window per request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--temperature 0.7&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Balance between creativity and coherence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--repetition-penalty 1.05&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Reduces repetitions in long responses&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
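
&lt;p&gt;The &lt;code&gt;--max-bytes&lt;/code&gt; value is not magic: it is simply 12 GiB expressed in bytes. If you want a different cache budget, derive it the same way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 12 GiB = 12 * 1024^3 bytes
echo $((12 * 1024 * 1024 * 1024))   # 12884901888

# e.g. an 8 GiB budget for a tighter machine
echo $((8 * 1024 * 1024 * 1024))    # 8589934592
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;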




&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The server disconnects after 30,000 tokens
&lt;/h3&gt;

&lt;p&gt;This is a known issue with the Qwen3.6-35B-A3B model's hybrid MoE architecture: without a cache limit, memory grows until the process dies mid-generation. The fix is to always pass &lt;code&gt;--max-bytes 12884901888&lt;/code&gt;; with that limit in place, the server has been verified to work correctly past 60,000 tokens.&lt;/p&gt;
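
&lt;p&gt;If you suspect memory is the culprit, watch it while a long-context request runs; both commands below ship with macOS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# System-wide memory pressure level
memory_pressure | head -n 5

# One-shot snapshot of physical memory usage
top -l 1 | grep PhysMem
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;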




&lt;h2&gt;
  
  
  Architecture notes (for the curious)
&lt;/h2&gt;

&lt;p&gt;Qwen3.6-35B-A3B is a hybrid &lt;strong&gt;MoE (Mixture of Experts)&lt;/strong&gt; model. Instead of activating all parameters per token, it only activates a subset of "experts", making it efficient for its size. The &lt;code&gt;4bit&lt;/code&gt; version quantizes the weights to 4 bits, reducing RAM usage from ~70 GB to ~20 GB with minimal quality loss.&lt;/p&gt;
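
&lt;p&gt;The rough arithmetic behind those figures: weight size is approximately parameters × bits per weight ÷ 8, so 35B parameters drop from ~70 GB at 16-bit to ~17.5 GB at 4-bit (the ~20 GB on disk includes quantization scales and other overhead):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# bytes ≈ params * bits / 8
echo "35 * 10^9 * 16 / 8 / 10^9" | bc            # 70   (GB at bf16)
echo "scale=1; 35 * 10^9 * 4 / 8 / 10^9" | bc    # 17.5 (GB at 4-bit, before overhead)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;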

&lt;p&gt;MLX leverages Apple Silicon's &lt;strong&gt;unified memory&lt;/strong&gt;: the GPU and CPU share the same RAM pool, eliminating the transfer bottleneck that exists in systems with a dedicated GPU. That's why a Mac with 48 GB can run a model that on a PC would require a GPU with 80 GB of VRAM.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/ml-explore/mlx" rel="noopener noreferrer"&gt;MLX on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ml-explore/mlx-lm" rel="noopener noreferrer"&gt;mlx-lm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/mlx-community/Qwen3.6-35B-A3B-4bit" rel="noopener noreferrer"&gt;Model on Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pchavanne/mlx-openai-server" rel="noopener noreferrer"&gt;mlx-openai-server&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
