<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: JT</title>
    <description>The latest articles on DEV Community by JT (@uncsoft).</description>
    <link>https://dev.to/uncsoft</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3806704%2F0fe088ff-d35f-4018-8f2e-095a7abde38f.png</url>
      <title>DEV Community: JT</title>
      <link>https://dev.to/uncsoft</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/uncsoft"/>
    <language>en</language>
    <item>
      <title>Your Mac Is a Supercomputer. It's Time We Benchmarked It Like One.</title>
      <dc:creator>JT</dc:creator>
      <pubDate>Thu, 05 Mar 2026 02:30:59 +0000</pubDate>
      <link>https://dev.to/uncsoft/your-mac-is-a-supercomputer-its-time-we-benchmarked-it-like-one-3708</link>
      <guid>https://dev.to/uncsoft/your-mac-is-a-supercomputer-its-time-we-benchmarked-it-like-one-3708</guid>
      <description>&lt;p&gt;&lt;em&gt;Why open source local AI benchmarking on Apple Silicon matters - and why your benchmark submission is more valuable than you think.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffzz1xmxtya6k07wk6ju.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffzz1xmxtya6k07wk6ju.png" alt="Anubis OSS" width="800" height="520"&gt;&lt;/a&gt;&lt;br&gt;
The narrative around AI has been almost entirely cloud-centric. You send a prompt to a data center somewhere, tokens come back, and you pretend not to think about what that costs in latency, money, or privacy. For a long time, that was the only game in town.&lt;/p&gt;

&lt;p&gt;That's changing fast.&lt;/p&gt;

&lt;p&gt;Apple Silicon - from the M1 to the M4 Pro/Max shipping in machines today, with the M5 Max on the horizon - has quietly become one of the most capable local AI compute platforms on the planet. The unified memory architecture means an M4 Max with 128GB of RAM can run models that would require a dedicated GPU workstation in any other form factor. At laptop wattages. Silently. Offline. Without sending a single token to a third party.&lt;/p&gt;

&lt;p&gt;This isn't a niche enthusiast story anymore. It's a real shift in how developers, researchers, and privacy-conscious professionals are choosing to run AI workloads. And it comes with a problem we haven't solved yet: &lt;strong&gt;we don't have great, shared, community-driven data on how these machines actually perform in the wild.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's what I built &lt;a href="https://github.com/uncSoft/anubis-oss" rel="noopener noreferrer"&gt;Anubis OSS&lt;/a&gt; to help fix.&lt;br&gt;
&lt;a href="https://devpadapp.com/anubis-oss.html" rel="noopener noreferrer"&gt;App Page&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fragmented Local LLM Ecosystem
&lt;/h2&gt;

&lt;p&gt;If you've spent time running local models on macOS, you've felt this friction. The tooling is scattered and siloed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chat wrappers&lt;/strong&gt; like Ollama, LM Studio, and Jan are excellent at what they do (conversation), but they're not built for systematic performance testing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware monitors&lt;/strong&gt; like &lt;code&gt;asitop&lt;/code&gt;, &lt;code&gt;macmon&lt;/code&gt;, and &lt;code&gt;mactop&lt;/code&gt; give you a beautiful CLI view of GPU and CPU utilization, but they have no concept of what the LLM is doing, which model is loaded, or what the prompt context size is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eval frameworks&lt;/strong&gt; like &lt;code&gt;promptfoo&lt;/code&gt; require YAML configs and terminal fluency that puts them out of reach for a lot of practitioners.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these tools &lt;strong&gt;correlate hardware behavior with inference performance&lt;/strong&gt; in a meaningful, accessible way. You can watch your GPU spike during a generation pass, but you can't easily answer: &lt;em&gt;Is Gemma 3 12B Q4_K_M more watt-efficient than Mistral Small 3.1 on an M3 Pro? How does TTFT scale with context length on an M4 with 32GB vs. 64GB? Which quantization gives the best tokens-per-watt on the Neural Engine?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Anubis answers those questions. It's a native SwiftUI app — no Electron, no Python runtime, no external dependencies — that runs benchmark sessions against any OpenAI-compatible backend (Ollama, LM Studio, mlx-lm, vLLM, and more) while simultaneously pulling real hardware telemetry via IOReport: GPU utilization, CPU utilization, GPU/CPU/ANE/DRAM power in watts, GPU frequency, process memory including Metal allocations, and thermal state.&lt;/p&gt;
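&lt;p&gt;&lt;em&gt;As a back-of-envelope sketch (not Anubis source code): the core throughput numbers can be derived from the timing fields Ollama reports in its final &lt;code&gt;/api/generate&lt;/code&gt; response, where durations are in nanoseconds. The values below are made-up sample numbers.&lt;/em&gt;&lt;/p&gt;

```shell
# Sample values standing in for the fields of one Ollama benchmark run:
# eval_count (tokens generated), eval_duration and prompt_eval_duration (ns).
eval_count=256                   # tokens generated
eval_duration=4000000000         # generation time in ns (4 s)
prompt_eval_duration=500000000   # prompt processing time in ns (rough TTFT proxy)

# tokens/sec = tokens generated / generation seconds
tokens_per_sec=$(awk -v c="$eval_count" -v d="$eval_duration" \
  'BEGIN { printf "%.1f", c / (d / 1e9) }')
ttft_ms=$(awk -v d="$prompt_eval_duration" 'BEGIN { printf "%.0f", d / 1e6 }')

echo "tokens/sec: $tokens_per_sec"   # tokens/sec: 64.0
echo "TTFT: ${ttft_ms} ms"           # TTFT: 500 ms
```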

&lt;p&gt;Every run is logged, exportable as CSV or Markdown, and — optionally — submittable to the community leaderboard.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why the Open Dataset Is the Real Story
&lt;/h2&gt;

&lt;p&gt;The leaderboard submissions aren't just a scoreboard. They're the beginning of something more interesting: &lt;strong&gt;a real-world, community-sourced performance dataset across diverse Apple Silicon configurations, model families, quantizations, and backends.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think about what that dataset contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokens per second across M1, M2, M3, M4, M5+ chips with varying unified memory configurations&lt;/li&gt;
&lt;li&gt;Time to first token (TTFT) as a function of prompt length and model size&lt;/li&gt;
&lt;li&gt;Watts-per-token efficiency across quantization levels (Q4, Q5, Q8, fp16) for the same model family&lt;/li&gt;
&lt;li&gt;Backend-specific variance — the same model, same chip, different runner (Ollama vs. mlx-lm vs. LM Studio)&lt;/li&gt;
&lt;li&gt;Thermal throttling behavior under sustained inference loads&lt;/li&gt;
&lt;/ul&gt;
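&lt;p&gt;&lt;em&gt;For a sense of what a watts-per-token cell in that matrix means, here is the arithmetic with hypothetical sample numbers (not real submissions):&lt;/em&gt;&lt;/p&gt;

```shell
# Energy per token falls out of average power divided by throughput:
# watts are joules per second, so W / (tok/s) = J per token.
tokens_per_sec=42.5    # measured generation throughput (sample value)
avg_gpu_watts=18.2     # average GPU package power over the run (sample value)

joules_per_token=$(awk -v w="$avg_gpu_watts" -v t="$tokens_per_sec" \
  'BEGIN { printf "%.3f", w / t }')
echo "energy per token: ${joules_per_token} J"   # energy per token: 0.428 J
```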

&lt;p&gt;This data is hard to get any other way. Formal benchmarks from chipmakers are synthetic. Reviewer benchmarks cover a handful of models on a handful of chips. Nobody has the time or hardware budget to run a comprehensive cross-product matrix.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22bt2k4ydnw19d026y32.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22bt2k4ydnw19d026y32.jpg" alt="Leaderboard dataset" width="800" height="1691"&gt;&lt;/a&gt;&lt;br&gt;
But collectively, the community does.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Model Runner Developers Can Use This
&lt;/h3&gt;

&lt;p&gt;If you're building or maintaining a backend like Ollama, LM Studio, or mlx-lm, community benchmark data tells you things your internal testing can't:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Which chip/memory configurations are underperforming relative to their theoretical bandwidth?&lt;/strong&gt; If an M3 Pro with 36GB is consistently underperforming an M2 Max with a comparable memory configuration on a specific model class, that's a signal worth investigating in your memory management or Metal compute path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Where is TTFT worst?&lt;/strong&gt; Time to first token is often the user-perceived latency that matters most. If community data shows TTFT degrading sharply on longer contexts for certain quantizations, that's a tuning opportunity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What's the real-world power envelope?&lt;/strong&gt; Synthetic benchmarks don't capture sustained thermal behavior. Community submissions do.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How Model Tuners and Quantization Authors Can Use This
&lt;/h3&gt;

&lt;p&gt;The dataset is equally valuable if you're working on GGUF quantizations, MLX conversions, or fine-tuned adapters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quantization efficiency curves&lt;/strong&gt; across real hardware tell you where the quality/performance tradeoff actually lands for end users, not just on a benchmark server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ANE utilization patterns&lt;/strong&gt; — which quantization levels or architectures make better use of the Neural Engine — are nearly invisible without this kind of community telemetry at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory footprint data&lt;/strong&gt; (including Metal/GPU allocations tracked via &lt;code&gt;proc_pid_rusage&lt;/code&gt;) shows whether your quantization is actually reducing real-world memory pressure or just parameter count.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're shipping a new quantization of a popular model and want to understand how it performs across the installed base of Apple Silicon hardware your users actually own, &lt;strong&gt;this dataset is the closest thing to field telemetry you'll have access to.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Apple Silicon Trajectory Makes This Urgent
&lt;/h2&gt;

&lt;p&gt;M5 is coming. The M4 Ultra hasn't even shipped in the Mac Pro yet. The memory ceiling on Apple Silicon keeps rising — 128GB on the current M4 Max, and more still on Ultra-class configurations, means models that were cloud-only a year ago can run locally today.&lt;/p&gt;

&lt;p&gt;Each generation, the gap between "what you can run locally" and "what you need the cloud for" narrows. We're already past the inflection point for most 7B–13B models. We're approaching it for 30B–70B classes on high-end configs. The 100B+ frontier is a matter of time and memory density.&lt;/p&gt;

&lt;p&gt;The decisions that model runners and quantization authors make &lt;strong&gt;right now&lt;/strong&gt; about memory management, Metal optimization, and ANE scheduling will determine how well that new hardware gets utilized by the community. Having a community dataset that shows real-world performance gaps is one of the best early-warning systems we have.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why You Should Participate
&lt;/h2&gt;

&lt;p&gt;Running a benchmark in Anubis takes about two minutes. Submitting it to the leaderboard takes one click. But here's why it's worth your time beyond the leaderboard:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your hardware configuration is probably underrepresented.&lt;/strong&gt; The M4 Pro with 48GB, the M2 Max with 96GB, the M3 Ultra — the matrix of chip × memory × thermal environment × backend is enormous. Every submission fills in a cell of that matrix that nobody else may have covered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your workload patterns matter.&lt;/strong&gt; If you're running long-context inference, or using a backend that others aren't, or benchmarking a model that just dropped — that data is genuinely novel and useful to the ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The dataset is open.&lt;/strong&gt; This isn't data that disappears into a corporate analytics pipeline. It's a community resource, available for anyone building tools, writing research, or trying to optimize for the platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The project needs the stars.&lt;/strong&gt; Anubis OSS is working toward &lt;a href="https://github.com/uncSoft/anubis-oss" rel="noopener noreferrer"&gt;75 GitHub stars&lt;/a&gt; to qualify for Homebrew Cask distribution, which would make it dramatically easier for people to install and run. If you find value in what the project is doing, a star is a genuinely meaningful contribution to its reach.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Download Anubis OSS&lt;/strong&gt; from the &lt;a href="https://github.com/uncSoft/anubis-oss/releases/latest" rel="noopener noreferrer"&gt;latest GitHub release&lt;/a&gt; — it's a notarized macOS app, no build required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run a benchmark&lt;/strong&gt; — load any model in your preferred backend, pick a prompt preset, hit run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Submit your results&lt;/strong&gt; to the &lt;a href="https://devpadapp.com/leaderboard.html" rel="noopener noreferrer"&gt;community leaderboard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Star the repo&lt;/strong&gt; at &lt;a href="https://github.com/uncSoft/anubis-oss" rel="noopener noreferrer"&gt;github.com/uncSoft/anubis-oss&lt;/a&gt; to help us hit Homebrew distribution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The local AI era is here. Let's build the shared infrastructure to understand it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Anubis OSS is GPL-3.0 licensed. Built in Swift, no external dependencies, privacy-first — your benchmark data is submitted voluntarily and never includes anything beyond hardware specs and model performance metrics. A limited version is also available as part of &lt;a href="https://apps.apple.com/us/app-bundle/the-architects-toolkit/id1874965091?mt=12" rel="noopener noreferrer"&gt;The Architect's Toolkit&lt;/a&gt; bundle on the Mac App Store.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Questions or contributions? Open an issue or PR on &lt;a href="https://github.com/uncSoft/anubis-oss" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>macos</category>
      <category>opensource</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Slidepad is cool but..</title>
      <dc:creator>JT</dc:creator>
      <pubDate>Thu, 05 Mar 2026 02:15:38 +0000</pubDate>
      <link>https://dev.to/uncsoft/slidepad-is-cool-but-ob2</link>
      <guid>https://dev.to/uncsoft/slidepad-is-cool-but-ob2</guid>
      <description>&lt;p&gt;What if it was half the price, had double the features, and was available in the Apple App Store? I gotchu..&lt;br&gt;
&lt;a href="https://devpadapp.com"&gt;DevPad&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftj5l34k51ixyjwqnyvgl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftj5l34k51ixyjwqnyvgl.png" alt=" " width="800" height="799"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbleqi671d9glv492cg9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbleqi671d9glv492cg9.png" alt=" " width="800" height="799"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxq6x59vtddypi0x7tdgv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxq6x59vtddypi0x7tdgv.png" alt=" " width="800" height="799"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>slidepad</category>
      <category>devpad</category>
      <category>ai</category>
      <category>swift</category>
    </item>
    <item>
      <title>Open Source Local LLM Benchmarking and Leaderboards on macOS</title>
      <dc:creator>JT</dc:creator>
      <pubDate>Wed, 04 Mar 2026 23:11:18 +0000</pubDate>
      <link>https://dev.to/uncsoft/open-source-local-ll-benchmarking-and-leaderboards-on-macos-4a28</link>
      <guid>https://dev.to/uncsoft/open-source-local-ll-benchmarking-and-leaderboards-on-macos-4a28</guid>
      <description>&lt;h2&gt;
  
  
  Anubis &lt;img alt="anubis_icon (1)" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2F4369ce8d-8f3a-4502-9c49-6f3a82372e00" width="183" height="200"&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Local LLM Testing &amp;amp; Benchmarking for Apple Silicon&lt;/strong&gt; | &lt;a href="https://devpadapp.com/leaderboard.html" rel="noopener noreferrer"&gt;Community Leaderboard&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/uncSoft/anubis-oss" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anubis is a native macOS app for benchmarking, comparing, and managing local large language models through any OpenAI-compatible endpoint - Ollama, MLX, LM Studio Server, OpenWebUI, Docker Model Runner, etc. Built with SwiftUI for Apple Silicon, it provides real-time hardware telemetry correlated with inference performance, with full session history - something no CLI tool or chat wrapper offers. Export benchmark cards directly without screenshotting, or export the raw data as Markdown or CSV from the history. You can even &lt;code&gt;ollama pull&lt;/code&gt; models directly within the app. The binary is signed with an Apple Developer certificate. We are trying to reach 75 GitHub stars to distribute as a Cask via Homebrew. Please consider submitting your results to the leaderboard - the &lt;a href="https://devpadapp.com/explorer.html" rel="noopener noreferrer"&gt;dataset&lt;/a&gt; is also open source, and we are working with researchers to use the data to improve how models are trained and perform.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2Fc4b81dca-1a81-4b1e-8bbb-109a99a7e7bb" class="article-body-image-wrapper"&gt;&lt;img alt="image" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2Fc4b81dca-1a81-4b1e-8bbb-109a99a7e7bb" width="760" height="1092"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2F5da02ee4-ef48-4785-9c46-1803b07d553f" class="article-body-image-wrapper"&gt;&lt;img alt="image" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2F5da02ee4-ef48-4785-9c46-1803b07d553f" width="760" height="492"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Anubis?
&lt;/h2&gt;

&lt;p&gt;The local LLM ecosystem on macOS is fragmented:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chat wrappers&lt;/strong&gt; (Ollama, LM Studio, Jan) focus on conversation, not systematic testing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance monitors&lt;/strong&gt; (asitop, macmon, mactop) are CLI-only and lack LLM context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation frameworks&lt;/strong&gt; (promptfoo) require YAML configs and terminal expertise&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No tool&lt;/strong&gt; correlates hardware metrics (GPU / CPU / ANE / power / memory) with inference speed in real time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anubis fills that gap with three integrated modules - all in a native macOS app.&lt;/p&gt;


&lt;h2&gt;
  
  
  Leaderboard Submissions Now Available! Submit directly through the app
&lt;/h2&gt;
&lt;h3&gt;
  
  
  The dataset is robust and open source - &lt;a href="https://devpadapp.com/explorer.html" rel="noopener noreferrer"&gt;check it out here&lt;/a&gt;, please contribute!
&lt;/h3&gt;
&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Benchmark
&lt;/h3&gt;

&lt;p&gt;Real-time performance dashboard for single-model testing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select any model from any configured backend&lt;/li&gt;
&lt;li&gt;Stream responses with live metrics overlay&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8 metric cards&lt;/strong&gt;: Tokens/sec, GPU %, CPU %, Time to First Token, Process Memory, Model Memory, Thermal State, GPU Frequency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;7 live charts&lt;/strong&gt;: Tokens/sec, GPU utilization, CPU utilization, process memory, GPU/CPU/ANE/DRAM power, GPU frequency - all updating in real time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Power telemetry&lt;/strong&gt;: Real-time GPU, CPU, ANE, and DRAM power consumption in watts via IOReport&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process monitoring&lt;/strong&gt;: Auto-detects backend process by port (Ollama, LM Studio, mlx-lm, vLLM, etc.) with manual process picker&lt;/li&gt;
&lt;li&gt;Detailed session stats: peak tokens/sec, average token latency, model load time, context length, eval duration, power averages&lt;/li&gt;
&lt;li&gt;Configurable parameters: temperature, top-p, max tokens, system prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt presets&lt;/strong&gt; organized by category (Quick, Reasoning, Coding, Creative, Benchmarking)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session history&lt;/strong&gt; with full replay, CSV export, and Markdown reports&lt;/li&gt;
&lt;li&gt;Expanded full-screen metrics dashboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image export&lt;/strong&gt;: Copy to clipboard, save as PNG, or share - 2x retina rendering with watermark, respects light/dark mode&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Arena
&lt;/h3&gt;

&lt;p&gt;Side-by-side A/B model comparison with the same prompt.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dual model selectors with independent backend selection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sequential&lt;/strong&gt; mode (memory-safe, one at a time) or &lt;strong&gt;Parallel&lt;/strong&gt; mode (both simultaneously)&lt;/li&gt;
&lt;li&gt;Shared prompt, system prompt, and generation parameters&lt;/li&gt;
&lt;li&gt;Real-time streaming in both panels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voting system&lt;/strong&gt;: pick Model A, Model B, or Tie - votes are persisted&lt;/li&gt;
&lt;li&gt;Per-panel stats grid (9 metrics each)&lt;/li&gt;
&lt;li&gt;Model manager: view loaded models and unload to free memory&lt;/li&gt;
&lt;li&gt;Comparison history with voting records&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Leaderboard &lt;em&gt;(New in 2.1)&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;Upload your benchmark results to the &lt;a href="https://devpadapp.com/leaderboard.html" rel="noopener noreferrer"&gt;community leaderboard&lt;/a&gt; and see how your Mac stacks up against other Apple Silicon machines.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One-click upload&lt;/strong&gt; from the benchmark toolbar after a completed run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community rankings&lt;/strong&gt; sorted by tokens/sec with full drill-down into performance, power, and hardware details&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter by chip or model&lt;/strong&gt; to compare like-for-like (e.g. all M4 Max results, or all Llama 3.2 runs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://devpadapp.com/explorer.html" rel="noopener noreferrer"&gt;Data Explorer&lt;/a&gt;&lt;/strong&gt; — interactive pivot table and charting powered by FINOS Perspective&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy-first&lt;/strong&gt;: no accounts, no response text uploaded — just metrics and a display name&lt;/li&gt;
&lt;li&gt;HMAC-signed submissions with server-side rate limiting&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Auto-Update &lt;em&gt;(New in 2.3)&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;Anubis checks for updates automatically via &lt;a href="https://sparkle-project.org/" rel="noopener noreferrer"&gt;Sparkle&lt;/a&gt; and notifies you when a new version is available.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automatic checks&lt;/strong&gt; on launch with user-controlled frequency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual check&lt;/strong&gt; via the app menu (&lt;strong&gt;Anubis OSS &amp;gt; Check for Updates...&lt;/strong&gt;) or &lt;strong&gt;Settings &amp;gt; About&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Updates are code-signed, notarized, and verified with EdDSA before installation&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Vault
&lt;/h3&gt;

&lt;p&gt;Unified model management across all backends.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggregated model list with search and backend filter chips&lt;/li&gt;
&lt;li&gt;Running models section with live VRAM usage&lt;/li&gt;
&lt;li&gt;Model inspector: size, parameters, quantization, family, context window, architecture details, file path&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic metadata enrichment&lt;/strong&gt; for OpenAI-compatible models - parses model IDs for family and parameter count, scans &lt;code&gt;~/.lmstudio/models/&lt;/code&gt; and &lt;code&gt;~/.cache/huggingface/hub/&lt;/code&gt; for disk size, quantization, and path&lt;/li&gt;
&lt;li&gt;Pull new models, delete existing ones, unload from memory&lt;/li&gt;
&lt;li&gt;Popular model suggestions for quick setup&lt;/li&gt;
&lt;li&gt;Total disk usage display&lt;/li&gt;
&lt;/ul&gt;
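&lt;p&gt;&lt;em&gt;To illustrate the disk-size scan described above (the directory layout and file names here are stand-ins, not Anubis internals):&lt;/em&gt;&lt;/p&gt;

```shell
# Create a throwaway directory with two dummy GGUF files of known size,
# then total them the way a model vault's disk-usage display might.
models_dir=$(mktemp -d)
head -c 1048576 /dev/zero > "$models_dir/model-a.Q4_K_M.gguf"   # 1 MiB
head -c 2097152 /dev/zero > "$models_dir/model-b.Q8_0.gguf"     # 2 MiB

total_bytes=0
for f in "$models_dir"/*.gguf; do
  size=$(wc -c "$f" | awk '{ print $1 }')
  total_bytes=$((total_bytes + size))
done
echo "total: $total_bytes bytes"   # total: 3145728 bytes
rm -rf "$models_dir"
```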


&lt;h2&gt;
  
  
  Screenshots
&lt;/h2&gt;

&lt;p&gt;GPU Core detail&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2F7cf7d6f2-bcb5-4f96-b04b-19d96df29e87" class="article-body-image-wrapper"&gt;&lt;img alt="Screenshot 2026-02-25 at 4 08 44 PM" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2F7cf7d6f2-bcb5-4f96-b04b-19d96df29e87" width="760" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Arena Mode&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2Fc364bd43-4300-4565-8e6b-7fcae9e8dcd8" class="article-body-image-wrapper"&gt;&lt;img alt="Screenshot 2026-02-25 at 4 21 50 PM" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2Fc364bd43-4300-4565-8e6b-7fcae9e8dcd8" width="760" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Settings (add connections with quick presets)&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2Fff9bb9fa-aa6e-472a-a787-5583a3883105" class="article-body-image-wrapper"&gt;&lt;img alt="Screenshot 2026-02-25 at 4 24 00 PM" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2Fff9bb9fa-aa6e-472a-a787-5583a3883105" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vault - View model details, unload, and Pull models directly for Ollama&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2F795157b5-efe8-4895-b499-beef25de9683" class="article-body-image-wrapper"&gt;&lt;img alt="Screenshot 2026-02-25 at 4 14 57 PM" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fuser-attachments%2Fassets%2F795157b5-efe8-4895-b499-beef25de9683" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Supported Backends
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Backend&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Default Port&lt;/th&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ollama&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native support&lt;/td&gt;
&lt;td&gt;11434&lt;/td&gt;
&lt;td&gt;Install from &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;ollama.com&lt;/a&gt; - auto-detected on launch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LM Studio&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI-compatible&lt;/td&gt;
&lt;td&gt;1234&lt;/td&gt;
&lt;td&gt;Enable local server in LM Studio settings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;mlx-lm&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI-compatible&lt;/td&gt;
&lt;td&gt;8080&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pip install mlx-lm &amp;amp;&amp;amp; mlx_lm.server --model &amp;lt;model&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;vLLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI-compatible&lt;/td&gt;
&lt;td&gt;8000&lt;/td&gt;
&lt;td&gt;Add in Settings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LocalAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI-compatible&lt;/td&gt;
&lt;td&gt;8080&lt;/td&gt;
&lt;td&gt;Add in Settings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Docker ModelRunner&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI-compatible&lt;/td&gt;
&lt;td&gt;User-selected&lt;/td&gt;
&lt;td&gt;Add in Settings&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Any OpenAI-compatible server can be added through &lt;strong&gt;Settings &amp;gt; Add OpenAI-Compatible Server&lt;/strong&gt; with a name, URL, and optional API key.&lt;/p&gt;


&lt;h2&gt;
  
  
  Hardware Metrics
&lt;/h2&gt;

&lt;p&gt;Anubis captures Apple Silicon telemetry during inference via IOReport and system APIs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPU Utilization&lt;/td&gt;
&lt;td&gt;IOReport&lt;/td&gt;
&lt;td&gt;GPU active residency percentage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU Utilization&lt;/td&gt;
&lt;td&gt;&lt;code&gt;host_processor_info&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Usage across all cores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU Power&lt;/td&gt;
&lt;td&gt;IOReport Energy Model&lt;/td&gt;
&lt;td&gt;GPU power consumption in watts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU Power&lt;/td&gt;
&lt;td&gt;IOReport Energy Model&lt;/td&gt;
&lt;td&gt;CPU (E-cores + P-cores) power in watts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ANE Power&lt;/td&gt;
&lt;td&gt;IOReport Energy Model&lt;/td&gt;
&lt;td&gt;Neural Engine power consumption&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DRAM Power&lt;/td&gt;
&lt;td&gt;IOReport Energy Model&lt;/td&gt;
&lt;td&gt;Memory subsystem power&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU Frequency&lt;/td&gt;
&lt;td&gt;IOReport GPU Stats&lt;/td&gt;
&lt;td&gt;Weighted average from P-state residency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Process Memory&lt;/td&gt;
&lt;td&gt;&lt;code&gt;proc_pid_rusage&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Backend process &lt;code&gt;phys_footprint&lt;/code&gt; (includes Metal/GPU allocations)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Thermal State&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ProcessInfo.thermalState&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;System thermal pressure level&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Process Monitoring
&lt;/h3&gt;

&lt;p&gt;Anubis automatically detects which process is serving your model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Port-based detection&lt;/strong&gt;: Uses &lt;code&gt;lsof&lt;/code&gt; to find the PID listening on the inference port (called once per benchmark start)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend identification&lt;/strong&gt;: Matches process path and command-line args to identify Ollama, LM Studio, mlx-lm, vLLM, LocalAI, llama.cpp&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory accounting&lt;/strong&gt;: Uses &lt;code&gt;phys_footprint&lt;/code&gt; (same as Activity Monitor) which includes Metal/GPU buffer allocations - critical for MLX and other GPU-accelerated backends&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LM Studio support&lt;/strong&gt;: Walks Electron app bundle descendants to find the model-serving process&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual override&lt;/strong&gt;: Process picker lets you select any process by name, sorted by memory usage&lt;/li&gt;
&lt;/ul&gt;
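&lt;p&gt;To sketch the port-based step: with &lt;code&gt;-Fp&lt;/code&gt;, &lt;code&gt;lsof&lt;/code&gt; emits one field per line and prefixes the PID field with &lt;code&gt;p&lt;/code&gt;. A small, hypothetical parser for that output (not Anubis's actual code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;// Extract the PID from `lsof -nP -iTCP:&lt;port&gt; -sTCP:LISTEN -Fp` output.
func pidFromLsofFields(_ output: String) -&gt; Int? {
    for line in output.split(separator: "\n") where line.hasPrefix("p") {
        return Int(line.dropFirst())
    }
    return nil
}

let sample = "p51234\nf3\n"               // hypothetical lsof -Fp output
print(pidFromLsofFields(sample) ?? -1)    // 51234
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;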

&lt;p&gt;Metrics degrade gracefully - if IOReport access is unavailable (e.g., in a VM), Anubis still shows inference-derived metrics.&lt;/p&gt;


&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;macOS 15.0&lt;/strong&gt; (Sequoia) or later&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apple Silicon&lt;/strong&gt; (M1 / M2 / M3 / M4 / M5 or later) - Intel is not supported&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8 GB&lt;/strong&gt; unified memory minimum (16 GB+ recommended for larger models)&lt;/li&gt;
&lt;li&gt;At least one inference backend installed (Ollama recommended)&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Install Ollama (or another backend)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS - install Ollama&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama

&lt;span class="c"&gt;# Start the server&lt;/span&gt;
ollama serve

&lt;span class="c"&gt;# Pull a model&lt;/span&gt;
ollama pull llama3.2:3b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  2. Build &amp;amp; Run Anubis
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/uncSoft/anubis-oss.git
&lt;span class="nb"&gt;cd &lt;/span&gt;anubis-oss/anubis
open anubis.xcodeproj
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;In Xcode:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set your development team in &lt;strong&gt;Signing &amp;amp; Capabilities&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Build and run (&lt;code&gt;Cmd+R&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Anubis will auto-detect Ollama on launch. Other backends can be added in Settings.&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Run Your First Benchmark
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Select a model from the dropdown&lt;/li&gt;
&lt;li&gt;Type a prompt or pick one from &lt;strong&gt;Presets&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Run&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Watch the metrics light up in real time&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  4. Submit to the Leaderboard
&lt;/h3&gt;

&lt;p&gt;After a benchmark completes, click the &lt;strong&gt;Upload&lt;/strong&gt; button in the benchmark toolbar to submit your results to the &lt;a href="https://devpadapp.com/leaderboard.html" rel="noopener noreferrer"&gt;community leaderboard&lt;/a&gt;. Enter a display name and your run will appear in the rankings — no account required. Only performance metrics and hardware info are submitted; response text is never uploaded.&lt;/p&gt;


&lt;h2&gt;
  
  
  Building from Source
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone&lt;/span&gt;
git clone https://github.com/uncSoft/anubis-oss.git
&lt;span class="nb"&gt;cd &lt;/span&gt;anubis-oss/anubis

&lt;span class="c"&gt;# Build via command line&lt;/span&gt;
xcodebuild &lt;span class="nt"&gt;-scheme&lt;/span&gt; anubis-oss &lt;span class="nt"&gt;-configuration&lt;/span&gt; Debug build

&lt;span class="c"&gt;# Run tests&lt;/span&gt;
xcodebuild &lt;span class="nt"&gt;-scheme&lt;/span&gt; anubis-oss &lt;span class="nt"&gt;-configuration&lt;/span&gt; Debug &lt;span class="nb"&gt;test&lt;/span&gt;

&lt;span class="c"&gt;# Or just open in Xcode&lt;/span&gt;
open anubis.xcodeproj
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Dependencies
&lt;/h3&gt;

&lt;p&gt;Resolved automatically by Swift Package Manager on first build:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;License&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/groue/GRDB.swift" rel="noopener noreferrer"&gt;GRDB.swift&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;SQLite database&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/sparkle-project/Sparkle" rel="noopener noreferrer"&gt;Sparkle&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Auto-update framework&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swift Charts&lt;/td&gt;
&lt;td&gt;Data visualization&lt;/td&gt;
&lt;td&gt;Apple&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;Anubis follows MVVM with a layered service architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                    PRESENTATION LAYER                       │
│   BenchmarkView    ArenaView    VaultView    SettingsView   │
├─────────────────────────────────────────────────────────────┤
│                      SERVICE LAYER                          │
│   MetricsService   InferenceService   ModelService   Export │
├─────────────────────────────────────────────────────────────┤
│                    INTEGRATION LAYER                        │
│  OllamaClient   OpenAICompatibleClient   IOReportBridge     │
│  ProcessMonitor                                             │
├─────────────────────────────────────────────────────────────┤
│                    PERSISTENCE LAYER                        │
│   SQLite (GRDB)              File System                    │
└─────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Views&lt;/strong&gt; display data and delegate to &lt;strong&gt;ViewModels&lt;/strong&gt;. ViewModels coordinate &lt;strong&gt;Services&lt;/strong&gt;. Services are stateless and use async/await. &lt;strong&gt;Integrations&lt;/strong&gt; are thin adapters wrapping external systems (Ollama API, IOReport, etc.).&lt;/p&gt;
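&lt;p&gt;In miniature, that layering looks like this - the type names below are illustrative stand-ins, not Anubis's actual classes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;// A stateless service exposing an async API...
struct SummaryService {
    func summarize(tokens: Int, seconds: Double) async -&gt; Double {
        Double(tokens) / seconds   // tokens per second
    }
}

// ...and a view model that coordinates it and holds state for its view.
final class RunViewModel {
    private let service = SummaryService()
    private(set) var tokensPerSecond: Double?

    func finishRun(tokens: Int, seconds: Double) async {
        tokensPerSecond = await service.summarize(tokens: tokens, seconds: seconds)
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;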

&lt;h3&gt;
  
  
  Project Structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;anubis/
├── App/                    # Entry point, app state, navigation
├── Features/
│   ├── Benchmark/          # Performance dashboard
│   ├── Arena/              # A/B model comparison
│   ├── Vault/              # Model management
│   └── Settings/           # Backend config, about, help, contact
├── Services/               # MetricsService, InferenceService, ExportService
├── Integrations/           # OllamaClient, OpenAICompatibleClient, IOReportBridge, ProcessMonitor
├── Models/                 # Data models (BenchmarkSession, ModelInfo, etc.)
├── Database/               # GRDB setup &amp;amp; migrations
├── DesignSystem/           # Theme, colors, reusable components
├── Demo/                   # Demo mode for App Store review
└── Utilities/              # Formatters, constants, logger
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Backend Abstraction
&lt;/h3&gt;

&lt;p&gt;All inference backends implement a shared protocol, making it straightforward to add new ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;protocol&lt;/span&gt; &lt;span class="kt"&gt;InferenceBackend&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;displayName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;isAvailable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;listModels&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;throws&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;ModelInfo&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;GenerationParameters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;AsyncThrowingStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;InferenceChunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Data Storage
&lt;/h2&gt;

&lt;p&gt;All data is stored locally - nothing leaves your machine unless you explicitly upload a run to the leaderboard.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data&lt;/th&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/Library/Application Support/Anubis/anubis.db&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exports&lt;/td&gt;
&lt;td&gt;Generated on demand (CSV, Markdown)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Preferences&lt;/td&gt;
&lt;td&gt;UserDefaults&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Ollama shows "Disconnected"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Make sure Ollama is running&lt;/span&gt;
ollama serve

&lt;span class="c"&gt;# Verify it's accessible&lt;/span&gt;
curl http://localhost:11434/api/tags
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  No GPU metrics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GPU metrics require IOReport access via IOKit&lt;/li&gt;
&lt;li&gt;Some configurations or VMs may not expose these APIs&lt;/li&gt;
&lt;li&gt;Anubis will still show inference-derived metrics (tokens/sec, TTFT, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  High memory usage
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Sequential&lt;/strong&gt; mode in Arena to run one model at a time&lt;/li&gt;
&lt;li&gt;Unload unused models via Arena &amp;gt; Models &amp;gt; Unload All&lt;/li&gt;
&lt;li&gt;Choose smaller quantized models (Q4_K_M over Q8_0)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Model not appearing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Click &lt;strong&gt;Refresh Models&lt;/strong&gt; in Settings&lt;/li&gt;
&lt;li&gt;Ensure the model is pulled: &lt;code&gt;ollama pull &amp;lt;model-name&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;For OpenAI-compatible backends, verify the server is running and the URL is correct&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Contributing
&lt;/h2&gt;

&lt;p&gt;Contributions are welcome. A few guidelines:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Follow the existing patterns&lt;/strong&gt; - MVVM, async/await, guard-let over force-unwrap&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep files under 300 lines&lt;/strong&gt; - split if larger&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One feature per PR&lt;/strong&gt; - small, focused changes are easier to review&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test services and integrations&lt;/strong&gt; - views are harder to unit test, but services should have coverage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handle errors gracefully&lt;/strong&gt; - always provide &lt;code&gt;errorDescription&lt;/code&gt; and &lt;code&gt;recoverySuggestion&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
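&lt;p&gt;Guideline 5 in practice, using Foundation's &lt;code&gt;LocalizedError&lt;/code&gt; - the error case here is invented for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;import Foundation

enum BackendError: LocalizedError {
    case serverUnreachable(port: Int)

    // What went wrong, shown to the user.
    var errorDescription: String? {
        switch self {
        case .serverUnreachable(let port):
            return "Could not reach the inference server on port \(port)."
        }
    }

    // What the user can do about it.
    var recoverySuggestion: String? {
        switch self {
        case .serverUnreachable:
            return "Check that the backend is running (e.g. `ollama serve`) and that the port is correct."
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;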

&lt;h3&gt;
  
  
  Adding a New Backend
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create a new file in &lt;code&gt;Integrations/&lt;/code&gt; implementing &lt;code&gt;InferenceBackend&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Register it in &lt;code&gt;InferenceService&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add configuration UI in &lt;code&gt;Settings/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;That's it - the rest of the app works through the protocol&lt;/li&gt;
&lt;/ol&gt;
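&lt;p&gt;A skeleton of step 1 - &lt;code&gt;ModelInfo&lt;/code&gt;, &lt;code&gt;InferenceChunk&lt;/code&gt;, and &lt;code&gt;GenerationParameters&lt;/code&gt; are stubbed with simplified stand-ins so the sketch is self-contained:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;// Simplified stand-ins for the real types.
struct ModelInfo { let name: String }
struct InferenceChunk { let text: String }
struct GenerationParameters {}

protocol InferenceBackend {
    var id: String { get }
    var displayName: String { get }
    var isAvailable: Bool { get async }
    func listModels() async throws -&gt; [ModelInfo]
    func generate(prompt: String, parameters: GenerationParameters)
        -&gt; AsyncThrowingStream&lt;InferenceChunk, Error&gt;
}

// A toy backend that streams the prompt back one word at a time.
struct EchoBackend: InferenceBackend {
    let id = "echo"
    let displayName = "Echo"
    var isAvailable: Bool {
        get async { true }
    }

    func listModels() async throws -&gt; [ModelInfo] {
        [ModelInfo(name: "echo-1")]
    }

    func generate(prompt: String, parameters: GenerationParameters)
        -&gt; AsyncThrowingStream&lt;InferenceChunk, Error&gt; {
        AsyncThrowingStream { continuation in
            for word in prompt.split(separator: " ") {
                continuation.yield(InferenceChunk(text: String(word)))
            }
            continuation.finish()
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;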




&lt;h2&gt;
  
  
  Support the Project
&lt;/h2&gt;

&lt;p&gt;If Anubis is useful to you, consider &lt;a href="https://ko-fi.com/jtatuncsoft/tip" rel="noopener noreferrer"&gt;buying me a coffee on Ko-fi&lt;/a&gt; or &lt;a href="https://github.com/sponsors/uncSoft" rel="noopener noreferrer"&gt;sponsoring on GitHub&lt;/a&gt;. It helps fund continued development and new features.&lt;/p&gt;

&lt;p&gt;A sandboxed, less feature-rich version is also available on the &lt;a href="https://apps.apple.com/us/app-bundle/the-architects-toolkit/id1874965091?mt=12" rel="noopener noreferrer"&gt;Mac App Store&lt;/a&gt; if you prefer a managed install.&lt;/p&gt;




&lt;h2&gt;
  
  
  License
&lt;/h2&gt;

&lt;p&gt;GPL-3.0 License - see the LICENSE file in the repository for details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Other projects:&lt;/strong&gt; &lt;a href="https://www.devpadapp.com" rel="noopener noreferrer"&gt;DevPad&lt;/a&gt; · &lt;a href="https://www.devpadapp.com/nabu.html" rel="noopener noreferrer"&gt;Nabu&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>swift</category>
      <category>apple</category>
    </item>
  </channel>
</rss>
