Andrey Alekseev
🚀 Why Your ML Service Needs Rust + CatBoost: A Setup Guide That Actually Works

Let’s talk about a problem that’s been bothering ML teams for a while now. When we’re working with batch processing, Python does the job just fine. But real-time serving? That’s where things get interesting.


I ran into this myself when building a service that needed to stay under 50 ms latency. Even without strict latency requirements, everything might seem fine at first, but as the service grows, Python starts showing its limitations. Variables would randomly turn into None, and without type checks, tracking down these issues becomes a real headache.

Right now, we don’t have a go-to solution for real-time model serving. Teams often turn to tools like KServe or BentoML, but this means dealing with more moving parts — more pods to watch, extra network calls slowing things down.

Possible ways of serving the model

What about other languages? C++ is fast and works with pretty much every ML library out there, but let’s be real — building and maintaining a C++ backend service is not something most teams want to take on.

It would be great if we could build models in Python but serve them in a faster language. ONNX tries to solve this, and it works reasonably well for neural networks. But when I tried using it with CatBoost, handling categorical features turned into a challenge: the support just isn't there yet.

This brings us to Rust, which offers an interesting middle ground:

  1. It’s just as fast as C++, but, to my taste, easier to work with
  2. The type system keeps your business logic clean and predictable
  3. The compiler actually helps you write better code instead of just pointing out errors
  4. The ML ecosystem is growing, with support from big names like Microsoft
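
To make the second point concrete, here's a minimal sketch of the kind of bug Rust's type system rules out at compile time. A possibly-missing value is an explicit `Option`, so it can't silently turn into None the way an untyped Python variable can (the function names are illustrative):

```rust
// A score that may be missing is an Option<f64>, not a value that might
// silently become None: the compiler forces every caller to handle both cases.
fn lookup_score(user_id: u64) -> Option<f64> {
    // Illustrative stand-in for a feature-store or cache lookup
    if user_id % 2 == 0 { Some(0.87) } else { None }
}

fn score_or_default(user_id: u64) -> f64 {
    // The fallback for a missing score is explicit, not an accident
    lookup_score(user_id).unwrap_or(0.0)
}

fn main() {
    println!("{}", score_or_default(2)); // score present
    println!("{}", score_or_default(3)); // score missing, falls back to 0.0
}
```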

Working with the Official CatBoost Crate in Rust

Good news: there’s actually an official CatBoost crate for Rust! But before you get too excited, let me tell you about a few quirks I discovered along the way.

The tricky part isn’t the Rust code itself; it’s getting the underlying C++ libraries in place. You’ll need to compile CatBoost from source, and getting the environment right for this is the most difficult part.

The CatBoost team provides their own Ubuntu-based image for building it from source, which sounds great. But what if you’re planning to run your service on Debian to keep things light? Then you’d better build CatBoost on the same version of Debian you’ll use for serving; otherwise you might run into compatibility issues.

Let’s talk about why this matters in practice. The Ubuntu build image needs a hefty 4+ GB of memory to work with. But if you set up a custom Debian build correctly, you can bring that down to just 1 GB. And when you’re running lots of services in the cloud, that extra memory usage starts adding up in your monthly bill.

Setting Up Your Rust + CatBoost Build Environment

Let me walk you through setting up a Debian-based build environment for CatBoost. I’ll explain not just what to do, but why each step matters.

Possible (and recommended) file structure

Installing CatBoost

On the Rust side, it’s as simple as adding any other crate:

[package]
name = "MLApp"
version = "0.1.0"
edition = "2021"

[dependencies]
catboost = { git = "https://github.com/catboost/catboost", rev = "0bfdc35" }

However, the CatBoost crate does not ship precompiled C/C++ bindings; during installation (cargo build) it will compile them from source specifically for your environment. So let’s set up that environment.
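
For context, here is roughly what using the crate looks like once everything builds. This is a minimal sketch in the spirit of the crate's README; the model path and feature values are illustrative, and it only compiles after the C++ build environment described below is in place:

```rust
use catboost::Model;

fn main() {
    // Illustrative path: a model exported from Python via save_model(...)
    let model = Model::load("model.cbm").expect("failed to load model");

    // One object: numeric features plus categorical features as strings
    let prediction = model
        .calc_model_prediction(
            vec![vec![1.0_f32, 2.0, 3.0]],
            vec![vec![String::from("moscow"), String::from("android")]],
        )
        .expect("prediction failed");

    println!("raw prediction: {:?}", prediction);
}
```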

Starting with the Right Base Image

First, we’re going with debian:bookworm-slim as our base image. Why? It comes with CMake 3.24+, which we need for our build process. The ‘slim’ variant keeps our image size down, which is always nice.

Setting Up the C++ Build Environment

We need a bunch of C++ packages, and while I’m using version 16 in our setup, you actually have flexibility here. Any version that supports -mno-outline-atomics will work fine.

Let’s break down our package installation into logical groups.

Setting Up Package Sources

First, we need to get our package sources in order. This part is crucial for getting the right LLVM tools:

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
 # will use it to download packages
 wget \
 # cryptographic package to verify LLVM sources
 gnupg \
 # check Debian version to get correct LLVM package
 lsb-release \
 # package management helper
 software-properties-common

Then we need to add LLVM’s repository.

RUN wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - \
 && echo "deb http://apt.llvm.org/$(lsb_release -sc)/ llvm-toolchain-$(lsb_release -sc)-16 main" \
 >> /etc/apt/sources.list.d/llvm.list

The main step: installing the packages

We need quite a few packages, and I recommend organizing them by purpose; it makes maintenance so much easier:

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
 # Basic build essentials
 build-essential \
 pkg-config \

 # Core development packages
 libssl-dev \
 cmake \
 ninja-build \
 python3-pip \

 # LLVM toolchain - version 16 works great, but any version with 
 # -mno-outline-atomics support will do
 clang-16 \
 libc++-16-dev \
 libc++abi-16-dev \
 lld-16 \

 # Don't forget git!
 git

The next step cost me some time to figure out. CatBoost expects to find clang in /usr/bin/clang, but our installation puts it in /usr/bin/clang-16. That’s why we have this bit:

RUN ln -sf /usr/bin/clang-16 /usr/bin/clang && \
 ln -sf /usr/bin/clang++-16 /usr/bin/clang++ && \
 ln -sf /usr/bin/lld-16 /usr/bin/lld

And don’t forget to set the environment variables:

ENV CC=/usr/bin/clang
ENV CXX=/usr/bin/clang++
ENV LIBCLANG_PATH=/usr/lib/llvm-16/lib
ENV LLVM_CONFIG_PATH=/usr/bin/llvm-config-16

Managing Dependencies

We need Conan (version 2.4.1+) for handling C++ dependencies. A word of caution about the installation:

RUN pip3 install --break-system-packages "conan==2.11.0"

That --break-system-packages flag might look scary, but it’s actually the easiest way I found to install Python packages system-wide in newer Debian versions. Besides, we won’t be using much Python in our build image anyway.
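
If you'd rather not touch the system site-packages at all, a virtual environment works too. A sketch (the /opt/conan path is just a choice, not a requirement):

```dockerfile
# Keep Conan isolated in a virtualenv instead of the system site-packages.
# Note: on Debian this needs the python3-venv package installed.
RUN python3 -m venv /opt/conan \
    && /opt/conan/bin/pip install "conan==2.11.0"
ENV PATH="/opt/conan/bin:${PATH}"
```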

Smart Build Strategy

Here’s a trick that’ll save you tons of build time during the active development stage. Split your build into two steps:

  1. First, build just the dependencies:
COPY ./Cargo.* ./
RUN mkdir src && \
 echo "fn main() {}" > src/main.rs && \
 RUSTFLAGS="-C codegen-units=1" cargo build --release

An important note here: you need that RUSTFLAGS="-C codegen-units=1" flag because it ensures the C++ and Rust parts link together correctly.

  2. Then build your actual application:
COPY ./src src
RUN cargo build --release

This way, Docker caches the dependency build, and you only rebuild your app code when it changes. Much faster!

Build flow

A Critical Warning About Memory

This is important: during the C++ build steps, you’ll need a machine with 20+ GB of memory (I used 32 GB). And here’s the part that cost me almost a day of debugging: if you don’t have enough memory, you won’t get a clear error message (or any message at all, to be honest). Instead, your build will mysteriously time out, leaving you wondering what went wrong. I learned this one the hard way!
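
One mitigation worth trying if a bigger machine isn't an option: CMake's `cmake --build` honors the CMAKE_BUILD_PARALLEL_LEVEL variable, so if the crate's build script drives the C++ build through it (an assumption on my part), capping parallelism trades build time for lower peak memory:

```dockerfile
# Assumption: the crate's build script invokes `cmake --build`, which
# honors this variable. Fewer parallel jobs -> lower peak memory, slower build.
ENV CMAKE_BUILD_PARALLEL_LEVEL=2
```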

Wrapping It Up

Now we have a working Rust environment with CatBoost that can handle all the good stuff: categorical features, text data, embeddings. Getting here wasn’t exactly easy.

Next time we’ll cover:

  • Building an Axum web service
  • Smart model loading patterns
  • Real-world performance tricks I learned along the way

So we’ll turn this foundation into something that can actually serve models in production! I ran into some interesting problems while figuring this out, like accidentally loading the same model multiple times for each handler call.

And if you’ve tried this out and hit any weird issues, let me know. It’s always interesting to hear what problems other people run into.

Full Dockerfile
