<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Julien Simon</title>
    <description>The latest articles on DEV Community by Julien Simon (@juliensimon).</description>
    <link>https://dev.to/juliensimon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F88510%2F02d26c9a-2784-4b54-8a95-c48526734f3f.jpg</url>
      <title>DEV Community: Julien Simon</title>
      <link>https://dev.to/juliensimon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/juliensimon"/>
    <language>en</language>
    <item>
      <title>Judging AFM-4.5B with DeepSeek-R1 670B</title>
      <dc:creator>Julien Simon</dc:creator>
      <pubDate>Thu, 03 Jul 2025 17:40:49 +0000</pubDate>
      <link>https://dev.to/juliensimon/judging-afm-45b-with-deepseek-r1-670b-1mkm</link>
      <guid>https://dev.to/juliensimon/judging-afm-45b-with-deepseek-r1-670b-1mkm</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjua4a911jz8ieujj4y71.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjua4a911jz8ieujj4y71.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this fun demonstration, you can witness the impressive capabilities of Arcee AI’s AFM-4.5B-Preview, Arcee’s first foundation model, across diverse domains. The demo showcases the model tackling complex knowledge questions, creating sophisticated creative writing, and addressing specialized domain-specific applications in healthcare, finance, technology, and education.&lt;/p&gt;

&lt;p&gt;Each response is independently evaluated by DeepSeek-R1 670B, providing expert assessment of answer quality. Will this small preview model be up to the task? Hey, what do you think? 😃&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/08rJPMYlTGg"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>slm</category>
      <category>datascience</category>
      <category>llm</category>
    </item>
    <item>
      <title>3 production-ready models released by Arcee AI on Hugging Face</title>
      <dc:creator>Julien Simon</dc:creator>
      <pubDate>Thu, 03 Jul 2025 12:59:45 +0000</pubDate>
      <link>https://dev.to/juliensimon/3-production-ready-models-released-by-arcee-ai-on-hugging-face-2bgn</link>
      <guid>https://dev.to/juliensimon/3-production-ready-models-released-by-arcee-ai-on-hugging-face-2bgn</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffktae0b27n0plsh70vz8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffktae0b27n0plsh70vz8.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this video, I introduce and demonstrate three production-grade models that Arcee AI recently opened and released on Hugging Face.&lt;/p&gt;

&lt;p&gt;Arcee-SuperNova-v1 (70B) is a merged model built from multiple advanced training approaches. At its core is Llama-3.1-405B-Instruct distilled into Llama-3.1-70B-Instruct, using our DistillKit to preserve instruction-following strengths while reducing size.&lt;/p&gt;

&lt;p&gt;Virtuoso-Large (72B) is our most powerful and versatile general-purpose model, designed to excel at handling complex and varied tasks across domains. With state-of-the-art performance, it offers unparalleled capability for nuanced understanding, contextual adaptability, and high accuracy.&lt;/p&gt;

&lt;p&gt;Caller (32B) is a robust model engineered for seamless integrations and optimized for managing complex tool-based interactions and API function calls. Its strength lies in precise execution, intelligent orchestration, and effective communication between systems, making it indispensable for sophisticated automation pipelines.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/bjrWdcspsBA"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>slm</category>
      <category>llm</category>
      <category>ai</category>
      <category>enterprise</category>
    </item>
    <item>
      <title>Homunculus 12B and GLM-4–32B-Base-32K: 2 new Arcee AI research-oriented models</title>
      <dc:creator>Julien Simon</dc:creator>
      <pubDate>Thu, 03 Jul 2025 11:33:08 +0000</pubDate>
      <link>https://dev.to/juliensimon/homunculus-12b-and-glm-4-32b-base-32k-2-new-arcee-ai-research-oriented-models-4o1e</link>
      <guid>https://dev.to/juliensimon/homunculus-12b-and-glm-4-32b-base-32k-2-new-arcee-ai-research-oriented-models-4o1e</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjoooq2b5ksfnmkz3roy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjoooq2b5ksfnmkz3roy.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this new video, I introduce two new research-oriented models that Arcee AI recently released on Hugging Face.&lt;/p&gt;

&lt;p&gt;Homunculus is a 12-billion-parameter instruction model distilled from Qwen3-235B onto the Mistral AI Nemo backbone. It was purpose-built to preserve Qwen’s two-mode interaction style — /think (deliberate chain-of-thought) and /nothink (concise answers) — while running on a single consumer GPU, and even on CPU as demonstrated in the video.&lt;/p&gt;

&lt;p&gt;GLM-4-32B-Base-32K is an enhanced version of GLM-4-32B-Base-0414 from THUDM at Tsinghua University, specifically engineered to offer robust performance over an extended context window. While the original model’s capabilities degraded after 8,192 tokens, this version maintains strong performance up to a 32,000-token context, making it ideal for tasks requiring long-context understanding and processing.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/mhrcPviW-MU"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>slm</category>
      <category>huggingface</category>
      <category>llm</category>
    </item>
    <item>
      <title>Introducing AFM, the Arcee Foundation Model</title>
      <dc:creator>Julien Simon</dc:creator>
      <pubDate>Wed, 18 Jun 2025 17:25:50 +0000</pubDate>
      <link>https://dev.to/juliensimon/introducing-afm-the-arcee-foundation-model-2778</link>
      <guid>https://dev.to/juliensimon/introducing-afm-the-arcee-foundation-model-2778</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3pidqxpbt19msh1siew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3pidqxpbt19msh1siew.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Today, we’re thrilled to unveil the &lt;a href="https://www.arcee.ai/blog/announcing-the-arcee-foundation-model-family" rel="noopener noreferrer"&gt;Arcee Foundation Model&lt;/a&gt;, a new family of GenAI models built from the ground up for enterprise reality.&lt;/p&gt;

&lt;p&gt;The first release — AFM-4.5B — is a 4.5-billion-parameter frontier model that delivers excellent accuracy, strict compliance, and very high cost-efficiency.&lt;/p&gt;

&lt;p&gt;In short: enterprise-grade intelligence that can run anywhere — on a smartphone, at the edge, or in the cloud.&lt;/p&gt;

&lt;p&gt;For a quick taste, you can test AFM-4.5B in our &lt;a href="https://afm.arcee.ai" rel="noopener noreferrer"&gt;playground&lt;/a&gt; and on &lt;a href="https://api.together.ai/models/arcee-ai/AFM-4.5B-Preview" rel="noopener noreferrer"&gt;Together.ai&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For a deeper dive into the model’s training pipeline and benchmarks, details are available in our &lt;a href="https://www.arcee.ai/blog/deep-dive-afm-4-5b-the-first-arcee-foundational-model" rel="noopener noreferrer"&gt;technical blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;PS: As we’ll now focus on our AFM foundation models, and because we love open-source, we’re opening up access to our previously closed-source language models. Details in the tech blog post.&lt;/p&gt;

</description>
      <category>enterprise</category>
      <category>llm</category>
      <category>ai</category>
      <category>slm</category>
    </item>
    <item>
      <title>Introducing AFM, the Arcee Foundation Model — June 18</title>
      <dc:creator>Julien Simon</dc:creator>
      <pubDate>Wed, 04 Jun 2025 09:45:34 +0000</pubDate>
      <link>https://dev.to/juliensimon/introducing-afm-the-arcee-foundation-model-june-18-244f</link>
      <guid>https://dev.to/juliensimon/introducing-afm-the-arcee-foundation-model-june-18-244f</guid>
      <description>&lt;h3&gt;
  
  
  Introducing AFM, the Arcee Foundation Model — June 18
&lt;/h3&gt;

&lt;p&gt;Join us live on June 18 for the launch of AFM, the &lt;a href="http://www.arcee.ai" rel="noopener noreferrer"&gt;Arcee&lt;/a&gt; Foundation Model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5uoghwsxb7xywsjydrw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5uoghwsxb7xywsjydrw.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Over the last two years, &lt;a href="http://www.arcee.ai" rel="noopener noreferrer"&gt;Arcee&lt;/a&gt; has proven the value of its post-training stack (MergeKit, DistillKit, EvolKit, Spectrum). Every open-source model we released on Hugging Face took the top spot on the OpenLLM Leaderboard in its size category.&lt;/p&gt;

&lt;p&gt;Now, &lt;a href="http://www.arcee.ai" rel="noopener noreferrer"&gt;Arcee&lt;/a&gt; is moving into building foundation models, and AFM, our first release, is about to redefine SLM performance. It outperforms all models in its size range, surpassing those from all top model builders. It also outperforms much larger models.&lt;/p&gt;

&lt;p&gt;Join us on June 18, we’ll tell you all about it.&lt;/p&gt;

&lt;p&gt;⭐️⭐️⭐️ YouTube stream: &lt;a href="https://youtube.com/live/QQv5P7jsc_E" rel="noopener noreferrer"&gt;https://youtube.com/live/QQv5P7jsc_E&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⭐️⭐️⭐️ LinkedIn stream: &lt;a href="https://www.linkedin.com/events/7335944792935161856/" rel="noopener noreferrer"&gt;https://www.linkedin.com/events/7335944792935161856/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⭐️⭐️⭐️ X stream: &lt;a href="https://x.com/julsimon" rel="noopener noreferrer"&gt;https://x.com/julsimon&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>enterprise</category>
      <category>datascience</category>
    </item>
    <item>
      <title>AI at the edge — June 10, 2025 — live from Cisco Live in San Diego, CA!</title>
      <dc:creator>Julien Simon</dc:creator>
      <pubDate>Thu, 29 May 2025 10:31:57 +0000</pubDate>
      <link>https://dev.to/juliensimon/ai-at-the-edge-june-10-2025-live-from-cisco-live-in-san-diego-ca-24c4</link>
      <guid>https://dev.to/juliensimon/ai-at-the-edge-june-10-2025-live-from-cisco-live-in-san-diego-ca-24c4</guid>
      <description>&lt;h3&gt;
  
  
  AI at the edge — June 10, 2025 — live from Cisco Live in San Diego, CA!
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffatng3m2efxwr16mergp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffatng3m2efxwr16mergp.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/events/aiattheedge-livefromciscolivein7333796911532949504" rel="noopener noreferrer"&gt;Join us&lt;/a&gt; on June 10 for a fun session on AI at the edge, live from the floor of &lt;a href="https://www.linkedin.com/company/cisco/" rel="noopener noreferrer"&gt;Cisco&lt;/a&gt; Live in San Diego, CA!&lt;/p&gt;

&lt;p&gt;We’ll demonstrate and discuss an SLM-powered retail assistant running on &lt;a href="http://intel.com" rel="noopener noreferrer"&gt;Intel&lt;/a&gt; Xeon CPUs in a &lt;a href="http://cisco.com" rel="noopener noreferrer"&gt;Cisco&lt;/a&gt; UCS server.&lt;/p&gt;

&lt;p&gt;Thanks to a chatbot interface powered by open-source small language models and real-time data analytics, store associates can interact naturally through voice or text, receiving immediate information about product availability from &lt;a href="http://www.chooch.com" rel="noopener noreferrer"&gt;Chooch&lt;/a&gt;’s inventory system or crowd density from &lt;a href="http://www.thewaittimes.com" rel="noopener noreferrer"&gt;WaitTime&lt;/a&gt;’s analytics platform.&lt;/p&gt;

&lt;p&gt;AI at the edge is real, and you’ll see it with your own eyes. &lt;a href="https://www.linkedin.com/events/7333841334245494785/about/" rel="noopener noreferrer"&gt;Sign up now&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;You can also watch us live on YouTube:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/iNbL3kIjMTI"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>enterprise</category>
      <category>retail</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Enriching Inventory Data with Arcee Conductor</title>
      <dc:creator>Julien Simon</dc:creator>
      <pubDate>Mon, 12 May 2025 14:09:33 +0000</pubDate>
      <link>https://dev.to/juliensimon/enriching-inventory-data-with-arcee-conductor-2amm</link>
      <guid>https://dev.to/juliensimon/enriching-inventory-data-with-arcee-conductor-2amm</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was initially published on the Arcee AI &lt;em&gt;[_blog&lt;/em&gt;](&lt;a href="https://www.arcee.ai/blog/enriching-inventory-data-with-arcee-conductor" rel="noopener noreferrer"&gt;https://www.arcee.ai/blog/enriching-inventory-data-with-arcee-conductor&lt;/a&gt;)&lt;/em&gt;._&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbuyc1yvwmbrjwne2ad1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbuyc1yvwmbrjwne2ad1.gif" width="600" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the world of inventory management, the accuracy and richness of data are paramount to help users and customers quickly and easily locate the right item every time. In mission-critical domains like healthcare, these items are not commodities; they are critical tools that can mean the difference between life and death. Ensuring that every item in the inventory is accurately described, categorized, and up-to-date is not just a best practice — it’s a necessity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inventory Data is Often Hard to Understand
&lt;/h3&gt;

&lt;p&gt;Unfortunately, many inventory management systems, particularly legacy systems, suffer from incomplete and inconsistent data. They often use heavily abbreviated descriptions due to application and database constraints. As a result, descriptions are hard for users to understand. Here’s an example you could find in a hospital inventory management system:&lt;/p&gt;

&lt;p&gt;"Item": "IV START KIT W/CHG SKIN PREP CENTRAL LINE"&lt;/p&gt;

&lt;p&gt;If you’re an experienced doctor or nurse, you may figure it out. However, junior staff and non-medical staff would certainly be confused. Abbreviated descriptions often lack the necessary detail to distinguish between similar items. This ambiguity can lead to errors, especially in critical environments like hospitals, where the wrong tool can have severe consequences.&lt;/p&gt;

&lt;p&gt;The lack of detail also makes it difficult to implement user-friendly features in IT applications and severely limits search functionality. Users may have to sift through multiple results to find the exact item they need, which wastes valuable time and increases the risk of errors. Personalized recommendations are also difficult to implement without additional data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Language Models Can Make Inventory Data Human-Readable
&lt;/h3&gt;

&lt;p&gt;Thanks to data enrichment, we can add detailed descriptions and additional fields that significantly improve the user experience of inventory systems. Here’s how the example above can be improved, with a human-readable description and information on applications and risks.&lt;/p&gt;

&lt;p&gt;"Item": "IV START KIT W/CHG SKIN PREP CENTRAL LINE"&lt;br&gt;&lt;br&gt;
"Description": "An IV start kit that includes a chlorhexidine skin prep solution, designed for easy insertion and secure maintenance of central lines."&lt;br&gt;&lt;br&gt;
"Applications": "Insertion of central venous catheters", "Preparing the insertion site for IV access", "Infection control during line placement"&lt;br&gt;&lt;br&gt;
"Risks": "Potential for skin irritation from chlorhexidine", "Risk of infection if aseptic technique is not followed", "Allergic reactions to components of the kit"&lt;/p&gt;

&lt;p&gt;As we know by now, language models excel at understanding complex data. With the right prompt, they can easily generate the rich data we need to build better user experiences in inventory systems.&lt;/p&gt;

&lt;p&gt;One way to get started with an inventory system project would be to pick the “best” large language model (LLM) available today, and it could certainly do an excellent job. However, LLMs are notoriously slow and expensive, making it difficult to scale and customize the data enrichment process. Hospitals routinely manage thousands, sometimes tens of thousands, of unique inventory items. In other industries like construction, food production, and of course e-commerce, you could be looking at 10x that number or even more.&lt;/p&gt;

&lt;p&gt;This is where &lt;a href="https://www.arcee.ai/product/arcee-conductor" rel="noopener noreferrer"&gt;Arcee Conductor&lt;/a&gt; can help.&lt;/p&gt;

&lt;h3&gt;
  
  
  Arcee Conductor Picks the Best Model for Each Prompt
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.arcee.ai/product/arcee-conductor" rel="noopener noreferrer"&gt;Arcee Conductor&lt;/a&gt; is a powerful inference platform based on a collection of high-quality small and large language models (LLMs). The beauty and efficiency of Arcee Conductor lie in its ability to select in real-time the most appropriate model for each query, ensuring that the output is both high-quality and cost-effective.&lt;/p&gt;

&lt;p&gt;Simple queries will automatically go to small language models (SLMs), delivering faster and more cost-effective inference than with an LLM. Only the more complex queries will go to LLMs, and you’ll only encounter their slower generation and higher cost when it’s truly necessary.&lt;/p&gt;

&lt;p&gt;Let’s see Conductor in action.&lt;/p&gt;

&lt;h3&gt;
  
  
  Arcee Conductor Can Efficiently Enrich Inventory Data
&lt;/h3&gt;

&lt;p&gt;We built a small demonstration to show how to enrich hospital inventory data with &lt;a href="https://www.arcee.ai/product/arcee-conductor" rel="noopener noreferrer"&gt;Arcee Conductor&lt;/a&gt;. The step-by-step Jupyter notebook is available on &lt;a href="https://gitlab.com/juliensimon/arcee-demos/-/tree/main/data-enrichment" rel="noopener noreferrer"&gt;GitLab&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First, we generate 100 item descriptions similar to what you would find in a hospital inventory system. The file containing these descriptions is also available on &lt;a href="https://gitlab.com/juliensimon/arcee-demos/-/tree/main/data-enrichment" rel="noopener noreferrer"&gt;GitLab&lt;/a&gt;. Here are a few examples.&lt;/p&gt;

&lt;p&gt;“MASK SURG 3PLY ELASTIC EAR LOOP PLEATED DISP BFE98% ASTM2”&lt;br&gt;&lt;br&gt;
“GLOVE EXAM NITR PWD-FREE SML NONSTER TXTRD FINGRTIP”&lt;br&gt;&lt;br&gt;
“SYRINGE 3ML LUER-LOK TIP STERILE LATEX-FREE DISP”&lt;/p&gt;

&lt;p&gt;Then, we send each item description to the &lt;a href="https://www.arcee.ai/product/arcee-conductor" rel="noopener noreferrer"&gt;Arcee Conductor&lt;/a&gt; API configured in auto mode, letting it select the best SLM or LLM for each query. As the API is compatible with the OpenAI API, we can send our queries with the popular OpenAI client.&lt;/p&gt;

&lt;p&gt;Based on the abbreviated item description, we ask the model to write:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A human-readable description, in 1–2 sentences,&lt;/li&gt;
&lt;li&gt;A list of applications,&lt;/li&gt;
&lt;li&gt;A list of risks,&lt;/li&gt;
&lt;/ul&gt;
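As a rough sketch of this step, here is how the call could look with the OpenAI Python client. The prompt wording, the Conductor base URL, and the "auto" model id shown below are illustrative assumptions; check the Conductor documentation and the GitLab notebook for the exact values.

```python
import json

def build_enrichment_messages(item: str) -> list:
    """Chat prompt asking for a description, applications, and risks."""
    system = (
        "You are a medical supply expert. Given an abbreviated hospital "
        "inventory item description, return JSON with three fields: "
        "'Description' (1-2 sentences), 'Applications' (a list), and "
        "'Risks' (a list)."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": json.dumps({"Item": item})},
    ]

def enrich_item(client, item: str) -> dict:
    """Send one item through Conductor's OpenAI-compatible API in auto mode."""
    response = client.chat.completions.create(
        model="auto",  # assumed id for Conductor's auto-routing mode
        messages=build_enrichment_messages(item),
    )
    return json.loads(response.choices[0].message.content)

# Usage (requires the openai package and a Conductor API key):
# from openai import OpenAI
# client = OpenAI(base_url="https://conductor.arcee.ai/v1", api_key="...")
# print(enrich_item(client, "IV START KIT W/CHG SKIN PREP CENTRAL LINE"))
```

Because the endpoint is OpenAI-compatible, the same loop works unchanged if you later swap the base URL for another provider.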

&lt;p&gt;Once the process is over, we have successfully enriched the data! The enriched file is available on &lt;a href="https://gitlab.com/juliensimon/arcee-demos/-/tree/main/data-enrichment" rel="noopener noreferrer"&gt;GitLab&lt;/a&gt;, and the examples above now look like this.&lt;/p&gt;

&lt;p&gt;"Item": "MASK SURG 3PLY ELASTIC EAR LOOP PLEATED DISP BFE98% ASTM2"&lt;br&gt;&lt;br&gt;
"Description": "A surgical mask made of three layers of material, featuring elastic ear loops and pleated design, with a bacterial filtration efficiency of 98% as tested by ASTM2 standards."&lt;br&gt;&lt;br&gt;
"Applications": ["Protection against airborne particles during surgical procedures", "Prevention of cross-contamination in healthcare settings", "Use in environments requiring high-level respiratory protection"]&lt;br&gt;&lt;br&gt;
"Risks": ["Potential skin irritation from prolonged wear", "Risk of decreased breathability if wet or soiled", "Inadequate protection if not worn correctly or if damaged"]&lt;/p&gt;

&lt;p&gt;"Item": "GLOVE EXAM NITR PWD-FREE SML NONSTER TXTRD FINGRTIP"&lt;br&gt;&lt;br&gt;
"Description": "A powder-free, textured, nitrile exam glove designed for sensitive skin, providing protection during medical examinations."&lt;br&gt;&lt;br&gt;
"Applications": ["Use in clinical settings for patient examinations", "Procedures requiring tactile sensitivity"]&lt;br&gt;&lt;br&gt;
"Risks": ["Risk of allergic reactions to nitrile", "Potential for tears if not handled carefully"]&lt;/p&gt;

&lt;p&gt;"Item": "SYRINGE 3ML LUER-LOK TIP STERILE LATEX-FREE DISP"&lt;br&gt;&lt;br&gt;
"Description": "A sterile, latex-free syringe with a 3ml capacity and a Luer-Lok tip, designed for precise and secure liquid delivery."&lt;br&gt;&lt;br&gt;
"Applications": ["Injection of medications", "Aspiration of fluids", "Administration of vaccines"]&lt;br&gt;&lt;br&gt;
"Risks": ["Risk of contamination if not properly sterilized", "Potential for air embolism if not used correctly", "Risk of needle stick injury if not handled properly"]&lt;/p&gt;

&lt;p&gt;You can see that it’s much simpler to understand what the items are. The additional high-quality data also makes it easier to build efficient search or recommendation features.&lt;/p&gt;

&lt;p&gt;Now, let’s look at how cost-efficient this process is.&lt;/p&gt;

&lt;h3&gt;
  
  
  You Can Save up to 75% with Arcee Conductor
&lt;/h3&gt;

&lt;p&gt;For our 100 examples, here’s the breakdown of models selected by &lt;a href="https://www.arcee.ai/product/arcee-conductor" rel="noopener noreferrer"&gt;Arcee Conductor&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Arcee Virtuoso-Large, our 72-billion-parameter general-purpose model: 86 times&lt;/li&gt;
&lt;li&gt;GPT-4.1: 7 times&lt;/li&gt;
&lt;li&gt;Claude Sonnet 3.7: 6 times&lt;/li&gt;
&lt;li&gt;Arcee Virtuoso-Medium, our 14-billion parameter general-purpose model: once&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Conductor was able to get the job done with Arcee SLMs 87% of the time. Not only does this reduce processing time by about 50%, but it also translates into significant cost savings. With a total cost of $0.0507, &lt;a href="https://www.arcee.ai/product/arcee-conductor" rel="noopener noreferrer"&gt;Arcee Conductor&lt;/a&gt; is respectively 75% and 56% more cost-effective than using only Sonnet 3.7 ($0.201) or GPT-4.1 ($0.1157).&lt;/p&gt;
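For reference, the savings percentages follow directly from the batch totals quoted above:

```python
# Total cost in dollars for the same 100-item batch, as reported above.
conductor = 0.0507
sonnet_37 = 0.201
gpt_41 = 0.1157

def savings_pct(baseline: float, cost: float) -> int:
    """Percentage saved relative to a single-model baseline, rounded."""
    return round(100 * (baseline - cost) / baseline)

print(savings_pct(sonnet_37, conductor))  # 75
print(savings_pct(gpt_41, conductor))     # 56
```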

&lt;p&gt;You may wonder, how did Arcee SLMs perform? Did they produce high-quality data similar to what the LLMs would have generated? Of course, we recommend that you run your own evaluation, but here’s a side-by-side comparison that is representative of the performance of our SLMs for this demo. We used the same prompt for the two models on this item description: “MASK SURG 3PLY ELASTIC EAR LOOP PLEATED DISP BFE98% ASTM2”&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Description&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Virtuoso-Large: &lt;em&gt;A disposable surgical mask with elastic ear loops, pleated design, and a bacterial filtration efficiency of 98%, meeting ASTM level 2 standards.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Sonnet 3.7: &lt;em&gt;A disposable three-ply surgical mask with elastic ear loops, pleated design, and 98% bacterial filtration efficiency that meets ASTM Level 2 standards.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use cases&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Virtuoso-Large:
&lt;em&gt;“Protecting healthcare workers during medical procedures”&lt;br&gt;
“Preventing the spread of respiratory infections in clinical settings”&lt;br&gt;
“Providing a barrier against large droplets and splashes”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Sonnet 3.7:
&lt;em&gt;“General medical procedures”&lt;br&gt;
“Healthcare settings”&lt;br&gt;
“Patient care areas”&lt;br&gt;
“Protection against respiratory droplets”&lt;br&gt;
“Surgical environments”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risks&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Virtuoso-Large:
&lt;em&gt;“Potential for reduced effectiveness if not properly fitted or worn”&lt;br&gt;
“Risk of contamination if reused or not disposed of properly”&lt;br&gt;
“May cause skin irritation or allergic reactions in some users.”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Sonnet 3.7:
&lt;em&gt;“Not sufficient for aerosol-generating procedures”&lt;br&gt;
“May not provide adequate protection against airborne pathogens”&lt;br&gt;
“Single-use only”&lt;br&gt;
“Improper fit may reduce effectiveness”&lt;br&gt;
“Does not create a complete seal around face”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Virtuoso-Large: $0.00026685
&lt;/li&gt;
&lt;li&gt;Sonnet 3.7: $0.002595&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Virtuoso-Large: 1.87 seconds
&lt;/li&gt;
&lt;li&gt;Sonnet 3.7: 3.2 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you can see, the results are extremely close, and it’s hard to justify the 10x higher cost of Sonnet and its 70% slower generation time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Using &lt;a href="https://www.arcee.ai/product/arcee-conductor" rel="noopener noreferrer"&gt;Arcee Conductor&lt;/a&gt; for data enrichment can transform your inventory management systems, and generally, any system that could benefit from better, richer data and metadata. The automatic model selection ensures that you are always using the most appropriate model, and the cost savings are very significant. We encourage you to try it!&lt;/p&gt;

&lt;p&gt;We’d love to hear from you and see how we can help. Don’t hesitate to contact &lt;a href="mailto:sales@arcee.ai"&gt;sales@arcee.ai&lt;/a&gt; or request a meeting through our &lt;a href="https://www.arcee.ai/book-a-demo" rel="noopener noreferrer"&gt;book a demo&lt;/a&gt; form.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Learn more about &lt;a href="https://www.arcee.ai/product/arcee-conductor" rel="noopener noreferrer"&gt;Arcee Conductor&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="http://conductor.arcee.ai" rel="noopener noreferrer"&gt;Sign up&lt;/a&gt; for Arcee Conductor and get $20 of free inference credits.&lt;/li&gt;
&lt;li&gt;Read the Arcee Conductor &lt;a href="https://docs.arcee.ai/arcee-conductor/introduction-to-arcee-conductor" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Grab the notebook and sample files on &lt;a href="https://gitlab.com/juliensimon/arcee-demos/-/tree/main/data-enrichment" rel="noopener noreferrer"&gt;GitLab&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>enterprise</category>
      <category>llm</category>
    </item>
    <item>
      <title>Announcing Arcee AI AnyMCP</title>
      <dc:creator>Julien Simon</dc:creator>
      <pubDate>Thu, 08 May 2025 16:42:53 +0000</pubDate>
      <link>https://dev.to/juliensimon/announcing-arcee-ai-anymcp-4nl9</link>
      <guid>https://dev.to/juliensimon/announcing-arcee-ai-anymcp-4nl9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcunto8vo51js1f02fygg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcunto8vo51js1f02fygg.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We’re thrilled to announce the first release of Arcee AI &lt;a href="https://mcp.arcee.ai" rel="noopener noreferrer"&gt;&lt;strong&gt;AnyMCP&lt;/strong&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;At Arcee, we believe in making powerful technology accessible and easy to use. That’s why we’ve developed Arcee AI &lt;a href="https://mcp.arcee.ai" rel="noopener noreferrer"&gt;AnyMCP&lt;/a&gt;, a groundbreaking platform that allows you to remotely deploy and manage thousands of MCP (Model Context Protocol) servers in seconds. Whether you’re a developer, a gamer, or a tech enthusiast, Arcee AI &lt;a href="https://mcp.arcee.ai" rel="noopener noreferrer"&gt;AnyMCP&lt;/a&gt; is designed to streamline your workflow and enhance your experience.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why Arcee AI AnyMCP?
&lt;/h4&gt;

&lt;p&gt;In the words of the King, “🎶 a little less conversation, a little more action 🎶.” We couldn’t agree more. The world of MCP can be complex, but it doesn’t have to be. With Arcee AI &lt;a href="https://mcp.arcee.ai" rel="noopener noreferrer"&gt;AnyMCP&lt;/a&gt;, you can focus on what matters most — action and innovation.&lt;/p&gt;

&lt;p&gt;Here are the key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One-Click Deployment&lt;/strong&gt;: Deploy any MCP server with a single click and start using it immediately with any MCP-compatible client.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless Integration&lt;/strong&gt;: Integrate seamlessly with your existing tools and workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remote Deployment &amp;amp; Management&lt;/strong&gt;: Deploy and manage thousands of MCP servers with just a few clicks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compatibility&lt;/strong&gt;: Use Arcee AI &lt;a href="https://mcp.arcee.ai" rel="noopener noreferrer"&gt;AnyMCP&lt;/a&gt; with Claude Desktop or any other MCP-compatible client.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Supports thousands of MCP servers. If your server isn’t listed, just request it!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100% Free&lt;/strong&gt;: Yes, you read that right! Arcee AI &lt;a href="https://mcp.arcee.ai" rel="noopener noreferrer"&gt;AnyMCP&lt;/a&gt; is completely free to use right now.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/cAjG5xILUcY"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h4&gt;
  
  
  Get Started Today!
&lt;/h4&gt;

&lt;p&gt;Ready to experience the power of Arcee AI &lt;a href="https://mcp.arcee.ai" rel="noopener noreferrer"&gt;AnyMCP&lt;/a&gt;? Head over to &lt;a href="https://mcp.arcee.ai/" rel="noopener noreferrer"&gt;https://mcp.arcee.ai/&lt;/a&gt; to get started.&lt;/p&gt;

&lt;p&gt;Let’s make the world of MCP more accessible and exciting together! 🤘&lt;/p&gt;

&lt;p&gt;We can’t wait to see what you create with Arcee AI AnyMCP. Try it now and share your thoughts with us. Your feedback is invaluable as we continue to improve and expand our platform.&lt;/p&gt;

&lt;p&gt;Happy deploying! 🚀&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>llm</category>
    </item>
    <item>
      <title>Arcee AI webinar: routing your function calling and reasoning queries with Arcee Conductor</title>
      <dc:creator>Julien Simon</dc:creator>
      <pubDate>Wed, 07 May 2025 08:31:48 +0000</pubDate>
      <link>https://dev.to/juliensimon/arcee-ai-webinar-routing-your-function-calling-and-reasoning-queries-with-arcee-conductor-7n</link>
      <guid>https://dev.to/juliensimon/arcee-ai-webinar-routing-your-function-calling-and-reasoning-queries-with-arcee-conductor-7n</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy4gkhvgnwsoze1qqm86.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy4gkhvgnwsoze1qqm86.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this video, we show you how &lt;a href="https://www.arcee.ai/product/arcee-conductor" rel="noopener noreferrer"&gt;Arcee Conductor&lt;/a&gt; can now automatically route each function calling or reasoning query to the best SLM/LLM, efficiently delivering precise and cost-effective results for any task.&lt;/p&gt;

&lt;p&gt;Sign up for Conductor and get $20 of free inference credits!&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/JfyvMxMPfJY"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>enterprise</category>
      <category>datascience</category>
      <category>agentic</category>
    </item>
    <item>
      <title>Model routing with reasoning queries in Arcee Conductor</title>
      <dc:creator>Julien Simon</dc:creator>
      <pubDate>Tue, 06 May 2025 10:21:43 +0000</pubDate>
      <link>https://dev.to/juliensimon/model-routing-with-reasoning-queries-in-arcee-conductor-1i0m</link>
      <guid>https://dev.to/juliensimon/model-routing-with-reasoning-queries-in-arcee-conductor-1i0m</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffe2s9jsbzh52y3gkxdxz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffe2s9jsbzh52y3gkxdxz.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.arcee.ai/product/arcee-conductor" rel="noopener noreferrer"&gt;Arcee Conductor&lt;/a&gt; now supports routing for reasoning models. You can automatically route each of your reasoning queries to the best and most effective SLM/LLM 😊&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/KTEgZW9GNeU"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>llm</category>
      <category>enterprise</category>
      <category>ai</category>
    </item>
    <item>
      <title>Arm podcast</title>
      <dc:creator>Julien Simon</dc:creator>
      <pubDate>Mon, 21 Apr 2025 18:18:23 +0000</pubDate>
      <link>https://dev.to/juliensimon/arm-podcast-5jd</link>
      <guid>https://dev.to/juliensimon/arm-podcast-5jd</guid>
      <description>&lt;p&gt;The podcast I recently recorded with &lt;a href="http://www.arm.com" rel="noopener noreferrer"&gt;Arm&lt;/a&gt; is now available on all major platforms. Tune in if you want to learn about the evolution of small language models, the significance of CPU-based AI inference, and what &lt;a href="http://www.arcee.ai" rel="noopener noreferrer"&gt;Arcee AI&lt;/a&gt; is doing in that space.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://podcasts.apple.com/gb/podcast/arm-viewpoints-small-language-models-big-ambitions/id1528983070?i=1000703527091" rel="noopener noreferrer"&gt;Arm Viewpoints: Small language models, big ambitions&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Model routing for function calling with Arcee Conductor</title>
      <dc:creator>Julien Simon</dc:creator>
      <pubDate>Thu, 10 Apr 2025 10:16:39 +0000</pubDate>
      <link>https://dev.to/juliensimon/model-routing-for-function-calling-with-arcee-conductor-5462</link>
      <guid>https://dev.to/juliensimon/model-routing-for-function-calling-with-arcee-conductor-5462</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq415c8zovz617nj9hye3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq415c8zovz617nj9hye3.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this video, we show you how to run function calling with &lt;a href="https://www.arcee.ai/product/arcee-conductor" rel="noopener noreferrer"&gt;Arcee Conductor&lt;/a&gt;. This allows you to automatically invoke external APIs and tools, and use their output for text generation while making sure you use the best and most cost-effective SLM/LLM for each query. In this particular example, I run API calls on Yahoo Finance.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/XAwi_8CoNE8"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>enterprise</category>
      <category>ai</category>
      <category>automation</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
