Today, we are excited to announce that we have released the long-awaited Pruna 0.3.0.
We’ve restructured our internal framework to make algorithm management more flexible and scalable, setting the stage for even more powerful algorithm support going forward.
Why the Refactor
In previous versions, certain algorithm groups — such as cachers or quantizers — were tightly coupled to the package’s structure. This rigid grouping made it difficult to introduce new types of algorithms or to combine them in flexible ways.
Starting with Pruna 0.3.0, we’ve reworked this system so that such classifications are no longer hard constraints. Instead, they now serve as supplementary metadata, enabling a more modular, composable, and future-proof design. This refactor lays the groundwork for integrating new optimization techniques and custom pipelines without structural limitations.
This ground-up refactor enables two things:
- Instead of being applied in a fixed sequence defined by their group, compression algorithms can now be applied in flexible orders, regardless of their group.
- Instead of being limited to one algorithm per group in the `SmashConfig`, multiple algorithms from the same group can now be combined, as long as they are marked as compatible.
What This Means for You
You don’t need to do anything special — just upgrade to the new version and you’ll be ready to go!
```bash
pip install --upgrade pruna
```
Once upgraded, everything works out of the box. While we’ve slightly refined how configuration is defined (for the better!), the old interface remains valid. You can find all the details in the next section.
What Changed
A More Flexible Algorithm Interface
This release introduces a more flexible configuration interface for algorithms.
Before, you had to define your `SmashConfig` step by step:

```python
from pruna import SmashConfig

config = SmashConfig()
config["compiler"] = "torch_compile"
config["cacher"] = "deepcache"
```
Now, with this release, you can do it all in one line with a list of the algorithm names — faster and simpler:

```python
from pruna import SmashConfig

config = SmashConfig(["torch_compile", "deepcache"])
```
A More Flexible Hyperparameters Interface
This release introduces a more flexible configuration interface for hyperparameters.
Before, if you needed to specify algorithm parameters, you had to go through the tedious process of setting each one individually:

```python
from pruna import SmashConfig

config = SmashConfig()
config["compiler"] = "torch_compile"
config["torch_compile_fullgraph"] = True
config["torch_compile_mode"] = "max-autotune"
config["quantizer"] = "hqq"
config["hqq_weight_bits"] = 4
config["hqq_compute_dtype"] = "torch.bfloat16"
```
Now, you can use a dictionary-style configuration to define detailed, per-algorithm parameters all at once:

```python
from pruna import SmashConfig

config = SmashConfig({
    "hqq": {
        "weight_bits": 4,
        "compute_dtype": "torch.bfloat16",
    },
    "torch_compile": {
        "fullgraph": True,
        "mode": "max-autotune",
    },
})
```
More Flexible Algorithm Ordering and Compatibility
Another major change is how the algorithm application order is determined.
Previously, the execution sequence was dictated by the hierarchy of algorithm groups and a global ordering based on these groups. In 0.3.0, this has been replaced by a more atomic and declarative system: each algorithm now specifies its own compatibility rules and ordering constraints. If an algorithm is compatible with another, it also specifies the order in which the two can be executed.
This makes the algorithm pipeline more self-organizing, robust to new extensions, and capable of resolving valid combinations dynamically.
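To make the idea concrete, here is a minimal sketch of such a declarative system — not Pruna’s actual internals, and all names here (`Algorithm`, `resolve_order`, the compatibility fields) are hypothetical. Each algorithm carries its own compatibility and ordering metadata, and a valid execution order is derived with a topological sort:

```python
# Hypothetical sketch of declarative ordering/compatibility; not Pruna's real API.
from graphlib import TopologicalSorter


class Algorithm:
    def __init__(self, name, compatible_with=(), run_after=()):
        self.name = name
        # Compatibility metadata: algorithms this one may be combined with.
        self.compatible_with = set(compatible_with)
        # Ordering constraints: algorithms that must execute before this one.
        self.run_after = set(run_after)


def resolve_order(selected):
    """Check pairwise compatibility, then topologically sort by constraints."""
    names = {a.name for a in selected}
    for a in selected:
        for b in selected:
            if a is not b and b.name not in a.compatible_with:
                raise ValueError(f"{a.name} is not compatible with {b.name}")
    # Only keep constraints that refer to algorithms actually selected.
    graph = {a.name: a.run_after & names for a in selected}
    return list(TopologicalSorter(graph).static_order())


# Example: a quantizer that must run before compilation.
quantizer = Algorithm("hqq", compatible_with={"torch_compile"})
compiler = Algorithm("torch_compile", compatible_with={"hqq"}, run_after={"hqq"})
print(resolve_order([quantizer, compiler]))  # ['hqq', 'torch_compile']
```

Because each algorithm declares its constraints locally, adding a new algorithm never requires touching a global ordering table — the sort resolves a valid sequence from the per-algorithm metadata alone.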
New Documentation
To make sure you have everything you need, we’ve also updated our documentation. You can now easily find the latest guides and tutorials under the “Open Source” tab.
Get Started Now
Enjoy the Quality and Efficiency!
- Compress your own models with Pruna and give us a ⭐ to show your support!
- Try our endpoints on Replicate, Wiro, or Segmind with just one click.
- Stay up to date with the latest AI efficiency research on our blog, explore our materials collection, or dive into our courses.
- Join the conversation and stay updated in our Discord community.