<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pruna AI</title>
    <description>The latest articles on DEV Community by Pruna AI (@pruna-ai).</description>
    <link>https://dev.to/pruna-ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F10301%2F3b7e98bd-7f51-44f5-935f-c0c097a2311d.png</url>
      <title>DEV Community: Pruna AI</title>
      <link>https://dev.to/pruna-ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pruna-ai"/>
    <language>en</language>
    <item>
      <title>Pruna 0.3.2: More OSS Algos, More Ways to Optimize</title>
      <dc:creator>Sara Han</dc:creator>
      <pubDate>Wed, 11 Mar 2026 09:00:00 +0000</pubDate>
      <link>https://dev.to/pruna-ai/pruna-032-more-oss-algos-more-ways-to-optimize-5dp3</link>
      <guid>https://dev.to/pruna-ai/pruna-032-more-oss-algos-more-ways-to-optimize-5dp3</guid>
      <description>&lt;p&gt;It’s been almost a year since we open-sourced. Over that time, Pruna has grown quickly: more contributors, algorithms, families, tutorials, and optimized models. With &lt;strong&gt;v0.3.2&lt;/strong&gt;, open-sourcing many more of these algorithms is the natural next step.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What Landed in 0.3.2&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This release expands the ecosystem with support for a broad set of new algorithms and new algorithm families, improved compatibility across them, and a set of fixes that make the whole framework stronger.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;New algorithms and families:&lt;/strong&gt; Pruna 0.3.2 adds a broad new set of optimization building blocks to the OSS stack. This includes new &lt;strong&gt;compilers, kernels, pruners,&lt;/strong&gt; and entire new algorithm families such as &lt;strong&gt;Decoders&lt;/strong&gt;, &lt;strong&gt;Distillers&lt;/strong&gt;, &lt;strong&gt;Enhancers&lt;/strong&gt;, and &lt;strong&gt;Recoverers&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More than just new algos:&lt;/strong&gt; The most important part of this release is not only the number of new algorithms, but how they fit into Pruna. 0.3.2 increases composability by allowing otherwise incompatible algorithms to be treated as compatible when they are applied to disjoint parts of a model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More tutorials:&lt;/strong&gt; The release also ships new tutorials to help you make your models more efficient, making it easier to discover what each method does, understand when to use it, and start composing methods in practice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bug fixes and maintenance:&lt;/strong&gt; Beyond new features, this release includes pruning-related bug fixes, maintenance work across the codebase, and general improvements that make the new algorithms easier to use and more reliable in practice.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;For more information, check the GitHub release &lt;a href="https://github.com/PrunaAI/pruna/releases/tag/v0.3.2" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Meet the New Algorithms and Families&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;One of the biggest updates in 0.3.2 is the &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html" rel="noopener noreferrer"&gt;expansion of Pruna’s optimization core&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Expanding Existing Families
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Compilers&lt;/strong&gt;: &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#ipex-llm" rel="noopener noreferrer"&gt;ipex_llm&lt;/a&gt; and &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#x-fast" rel="noopener noreferrer"&gt;x_fast&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These new compiler integrations expand the set of execution-level optimizations. You can use ipex-llm for PyTorch-based LLM inference on Intel CPUs and x-fast to speed up inference for any model using a combination of xformers, triton, cudnn, and torch tracing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Kernels&lt;/strong&gt;: &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#ring-attn" rel="noopener noreferrer"&gt;ring_attn&lt;/a&gt; and &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#sage-attn" rel="noopener noreferrer"&gt;sage_attn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This release introduces two important kernel-level additions. Ring attention brings distributed attention capabilities that help scale workloads across multiple devices, while sage attention adds a fast, memory-efficient attention kernel to the toolbox.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pruner&lt;/strong&gt;: &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#padding-pruning" rel="noopener noreferrer"&gt;padding_pruning&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Padding pruning allows you to remove unnecessary padded computation. This is a targeted optimization that, while simple, still delivers efficiency gains.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Usage example
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pruna&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SmashConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;smash&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the SmashConfig and configure the algorithms
&lt;/span&gt;&lt;span class="n"&gt;smash_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SmashConfig&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ring_attn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_compile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# Configure the hyperparameters
&lt;/span&gt;&lt;span class="n"&gt;smash_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_compile_target&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;module_list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;# Optionally, add further compatible algorithms
&lt;/span&gt;&lt;span class="n"&gt;smash_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qkv_diffusers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;padding_pruning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Introducing New Families
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Decoders&lt;/strong&gt;: &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#zipar" rel="noopener noreferrer"&gt;zipar&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pruna now supports decoders, which speed up autoregressive generation by changing the decoding strategy itself to make it more parallelizable.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Distillers:&lt;/strong&gt; &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#text-to-image-distillation-inplace-perp" rel="noopener noreferrer"&gt;text_to_image_distillation_inplace_perp&lt;/a&gt;, &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#text-to-image-distillation-lora" rel="noopener noreferrer"&gt;text_to_image_distillation_lora&lt;/a&gt;, &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#text-to-image-distillation-perp" rel="noopener noreferrer"&gt;text_to_image_distillation_perp&lt;/a&gt;, &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#hyper" rel="noopener noreferrer"&gt;hyper&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Distillers make it easier to reduce inference costs by transferring behavior into smaller, more efficient variants. &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Enhancers:&lt;/strong&gt; &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#img2img-denoise" rel="noopener noreferrer"&gt;img2img_denoise&lt;/a&gt;, &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#realesrgan-upscale" rel="noopener noreferrer"&gt;realesrgan_upscale&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enhancers improve output quality after or alongside optimization. These methods are especially useful when the goal is not only faster inference, but also better final outputs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Recoverers:&lt;/strong&gt; &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#id1" rel="noopener noreferrer"&gt;text_to_image_distillation_inplace_perp&lt;/a&gt;, &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#id2" rel="noopener noreferrer"&gt;text_to_image_distillation_lora&lt;/a&gt;, &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#id3" rel="noopener noreferrer"&gt;text_to_image_distillation_perp&lt;/a&gt;, &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#text-to-image-inplace-perp" rel="noopener noreferrer"&gt;text_to_image_inplace_perp&lt;/a&gt;, &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#text-to-image-lora" rel="noopener noreferrer"&gt;text_to_image_lora&lt;/a&gt;, &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#text-to-image-perp" rel="noopener noreferrer"&gt;text_to_image_perp&lt;/a&gt;, &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#text-to-text-inplace-perp" rel="noopener noreferrer"&gt;text_to_text_inplace_perp&lt;/a&gt;, &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#text-to-text-lora" rel="noopener noreferrer"&gt;text_to_text_lora&lt;/a&gt;, &lt;a href="https://docs.pruna.ai/en/v0.3.2/compression.html#text-to-text-perp" rel="noopener noreferrer"&gt;text_to_text_perp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recoverers make it possible to push compression more aggressively and then restore part of the lost quality afterward. This gives you a much more flexible optimization workflow, especially when combining quantization, pruning, or distillation with quality recovery steps.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Usage example
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pruna&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SmashConfig&lt;/span&gt;

&lt;span class="n"&gt;smash_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SmashConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="c1"&gt;# Quantize the model to 4-bits
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diffusers_int8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weight_bits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;# Recover, allowing you to push quantization to lower bit rates without compromising quality
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text_to_image_perp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;# you can increase or reduce 'batch_size' depending on your GPU, or use 'gradient_accumulation_steps' with it
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batch_size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_epochs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validate_every_n_epoch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="c1"&gt;# run validation every half epoch
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;# Attach a text-to-image dataset, used for recovery
&lt;/span&gt;&lt;span class="n"&gt;smash_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COCO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;smash_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;limit_datasets&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  More Efficient Strategies
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kzlxu7j1cj1fahkdta6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kzlxu7j1cj1fahkdta6.png" alt=" " width="800" height="643"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Diagram showcasing the current algorithm families supported by Pruna (10-03-2026)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, instead of only asking “how do I make this model faster?”, you can now think in terms of more advanced strategies, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compress first, then recover quality&lt;/li&gt;
&lt;li&gt;parallelize decoding instead of just reducing precision&lt;/li&gt;
&lt;li&gt;distribute attention across devices&lt;/li&gt;
&lt;li&gt;add post-processing quality enhancers&lt;/li&gt;
&lt;li&gt;swap in better attention kernels&lt;/li&gt;
&lt;li&gt;combine multiple compatible algorithms into a single pipeline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes Pruna more flexible not just as a collection of optimizations, but also as a system for easily combining them.&lt;/p&gt;

&lt;p&gt;Try out &lt;a href="https://github.com/PrunaAI/pruna" rel="noopener noreferrer"&gt;Pruna 0.3.2&lt;/a&gt;, smash your model, and show us what combinations you come up with.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enjoy the Quality and Efficiency!
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Compress your own models with &lt;a href="https://github.com/PrunaAI/pruna" rel="noopener noreferrer"&gt;Pruna&lt;/a&gt; and give us a ⭐️ to bring you many more algos!&lt;/li&gt;
&lt;li&gt;Stay up to date with the latest AI efficiency research on our &lt;a href="https://www.pruna.ai/blog" rel="noopener noreferrer"&gt;blog&lt;/a&gt;, explore our &lt;a href="https://github.com/PrunaAI/awesome-ai-efficiency" rel="noopener noreferrer"&gt;materials collection&lt;/a&gt;, or dive into our &lt;a href="https://github.com/PrunaAI/courses" rel="noopener noreferrer"&gt;courses&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation and stay updated in our &lt;a href="https://discord.com/invite/JFQmtFKCjd" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>news</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>LLM Architectures Explained: What Powers Today’s Top Models</title>
      <dc:creator>Sara Han</dc:creator>
      <pubDate>Wed, 04 Mar 2026 11:22:59 +0000</pubDate>
      <link>https://dev.to/pruna-ai/an-introduction-to-the-architectures-powering-the-current-llms-41n3</link>
      <guid>https://dev.to/pruna-ai/an-introduction-to-the-architectures-powering-the-current-llms-41n3</guid>
      <description>&lt;p&gt;Large Language Models (LLMs) have rapidly taken the spotlight in a wide range of fields over the past few years. At Pruna, the focus has been clear: make these models smaller, faster, cheaper, and greener. To make this possible, the team has explored and provided different optimization techniques, from caching and model compilation to advanced quantization and beyond.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For an overview of AI model optimization techniques, see this &lt;a href="https://dev.to/pruna-ai/making-ai-models-faster-cheaper-and-greener-heres-how-58le"&gt;blog&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;However, these individual optimizations are just pieces of a much larger machine. To understand how it works, we must lift the hood and examine the engine. This blog post gives an overview of the key architectures powering today’s language models: Autoregressive Models, State-Space Models, Diffusion-based Models, and Liquid Neural Networks. Rather than covering every mathematical detail, it focuses on the main intuition behind each.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where It All Begins: Tokenizers and Embeddings
&lt;/h2&gt;

&lt;p&gt;Before we dive into the intricate inner workings, it’s worth remembering that an LLM can’t “think” until it first “reads” your request, something it does through tokenization and embedding.&lt;/p&gt;

&lt;p&gt;For example, if you ask, "How do I optimize a model?", the model doesn’t receive that sentence as you wrote it. Instead, first it's tokenized, i.e., the text is broken into smaller, more frequent chunks known as tokens. The process involves the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Text normalization&lt;/strong&gt;, standardizing case and punctuation to ensure consistency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-tokenization,&lt;/strong&gt; which breaks the text into rough chunks such as words or subwords. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tokenization itself.&lt;/strong&gt; This step can vary slightly between models depending on design choices: the tokenization method (most commonly Byte Pair Encoding, or BPE, and its variants), the vocabulary and special tokens that define the model’s “dictionary,” and the training data that shapes the patterns the tokenizer learns for splitting the input.&lt;/li&gt;
&lt;/ol&gt;
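&lt;p&gt;The merge idea at the heart of BPE can be sketched in a few lines of plain Python (a toy illustration with made-up input, not any production tokenizer): repeatedly fuse the most frequent adjacent pair of symbols into a single token.&lt;/p&gt;

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count every adjacent pair of symbols and return the most common one
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def bpe_merge(tokens, num_merges):
    # Toy BPE: repeatedly merge the most frequent adjacent pair into one symbol
    for _ in range(num_merges):
        a, b = most_frequent_pair(tokens)
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                merged.append(a + b)  # fuse the pair into a single token
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

# Start from characters; frequent fragments like "low" emerge as tokens
print(bpe_merge(list("low lower lowest".replace(" ", "_")), 2))
```

&lt;p&gt;Real tokenizers learn these merges once over a large corpus and then apply them deterministically at inference time.&lt;/p&gt;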

&lt;p&gt;When it’s time to generate text, the model maps each token’s ID back to its original text fragment. But tokens alone aren’t enough — the model needs to understand their meaning and relationships, and work with numerical representations. That’s where embeddings come in. Each token ID is transformed into a high-dimensional vector that captures the meaning of the word based on how it was used in the training set. This is what allows LLMs to grasp intent, subtlety, and meaning far beyond basic definitions.&lt;/p&gt;
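&lt;p&gt;A minimal sketch of the embedding step (the three-dimensional vectors and tiny vocabulary here are made up; real models learn vectors with thousands of dimensions):&lt;/p&gt;

```python
import math

# Toy embedding table: token ID -> vector (learned during training in a real model)
embeddings = {
    0: [0.9, 0.1, 0.0],  # "optimize"
    1: [0.8, 0.2, 0.1],  # "compress" (similar meaning, nearby vector)
    2: [0.0, 0.1, 0.9],  # "banana"   (unrelated, distant vector)
}

def cosine(u, v):
    # Cosine similarity: higher means the tokens appear in similar contexts
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

token_ids = [0, 1, 2]                         # output of the tokenizer
vectors = [embeddings[t] for t in token_ids]  # the embedding lookup itself
print(cosine(vectors[0], vectors[1]))  # high: related words sit close together
print(cosine(vectors[0], vectors[2]))  # low: unrelated words sit far apart
```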

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxv8d46ojyfggawhf6vq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxv8d46ojyfggawhf6vq.png" alt="Tokenization" width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Token-by-Token: The Autoregressive Way&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Many LLMs are autoregressive, i.e., they generate text by predicting the next token in a sequence one by one. The Transformer architecture powers most of today’s leading models.&lt;/p&gt;

&lt;p&gt;Once we step into a transformer, we find a stack of transformer blocks. Each block processes the incoming tokens and passes the results to the next. At each block's heart, two operations occur: self-attention and a feedforward network.&lt;/p&gt;

&lt;p&gt;The self-attention mechanism determines how important each token is relative to all others in the sequence. This process involves the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model &lt;strong&gt;computes attention scores&lt;/strong&gt; by multiplying the query vector of the current token with the key vectors of all other tokens.&lt;/li&gt;
&lt;li&gt;After normalization, &lt;strong&gt;each score is used to weigh the corresponding value vector&lt;/strong&gt;. The weighted sum of these values becomes the output of the attention layer.&lt;/li&gt;
&lt;li&gt;When &lt;strong&gt;a query and key are a strong match&lt;/strong&gt; — meaning they produce a high attention score — the associated value has a stronger influence on the final output.&lt;/li&gt;
&lt;li&gt;Transformers use &lt;strong&gt;multi-head attention&lt;/strong&gt;, i.e., multiple attention mechanisms ("heads") are run in parallel to increase the model's ability to capture different types of relationships. Each head focuses on different aspects of the input, combining their outputs to form a richer representation.&lt;/li&gt;
&lt;/ul&gt;
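&lt;p&gt;The steps above condense into a short numpy sketch of single-head scaled dot-product attention (random matrices stand in for real query/key/value projections):&lt;/p&gt;

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # 1. Attention scores: query · key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # 2. Normalize scores so each query's weights sum to 1
    weights = softmax(scores, axis=-1)
    # 3. Output: weighted sum of the value vectors
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # 4 tokens, d_k = 8
out, weights = attention(Q, K, V)
print(out.shape)  # one 8-dimensional output per token
```

&lt;p&gt;Multi-head attention simply runs several such computations in parallel on smaller slices of the embedding and concatenates the results.&lt;/p&gt;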

&lt;p&gt;After the self-attention step, the output at each position is passed through a feedforward neural network, a stack of dense layers with non-linear activation functions like ReLU or GeLU. This helps the model detect complex patterns that attention alone might miss.&lt;/p&gt;

&lt;p&gt;Finally, each sub-layer (self-attention and feedforward) is wrapped with residual connections and layer normalization, which helps stabilize training and allows for deeper networks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2udxljra3b8fw7pxj3d6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2udxljra3b8fw7pxj3d6.png" alt="Transformers" width="398" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Source: &lt;a href="https://arxiv.org/abs/1706.03762" rel="noopener noreferrer"&gt;https://arxiv.org/abs/1706.03762&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To make the Transformer more efficient, several optimizations are often applied to different parts of the transformer block:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Since the attention mechanism is typically the main computational bottleneck, various strategies have been focused on reducing its load:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;KV caching&lt;/strong&gt; stores previously computed keys and values to speed up text generation significantly by avoiding redundant computations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sparse Attention&lt;/strong&gt; limits focus to a subset of tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sliding Window Attention&lt;/strong&gt; restricts attention to the most recent tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flash Attention&lt;/strong&gt; improves GPU memory usage and throughput.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paged Attention&lt;/strong&gt; manages KV caches more effectively for long sequences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Query Attention (MQA)&lt;/strong&gt; lowers computational cost by sharing keys and values across all attention heads.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;The feedforward block can be improved with another powerful approach, the Mixture of Experts (MoE). It replaces the traditional single feedforward block with multiple expert networks specialized in different patterns or topics, selectively activated through a gating mechanism. Only a subset of experts runs for each token, allowing the model to scale capacity efficiently.&lt;/li&gt;

&lt;/ul&gt;
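&lt;p&gt;Of the optimizations above, KV caching is the simplest to sketch. In this toy numpy example (random projection matrices, no real model), each decoding step computes keys and values only for the new token and reuses everything already cached:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # toy projections
cache_k, cache_v = [], []  # KV cache: one entry per token generated so far

def generate_step(x):
    # Compute K/V only for the new token; past tokens come from the cache
    cache_k.append(x @ Wk)
    cache_v.append(x @ Wv)
    K, V = np.stack(cache_k), np.stack(cache_v)
    q = x @ Wq
    scores = q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()                       # softmax over all cached positions
    return w @ V                       # attention output for the new token only

for _ in range(5):  # five decoding steps, each adding one cache entry
    out = generate_step(rng.normal(size=d))
print(len(cache_k))  # the cache now holds one key (and value) per step
```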

&lt;h2&gt;
  
  
  &lt;strong&gt;Thinking in States:&lt;/strong&gt; A Different Way to Think About Sequences
&lt;/h2&gt;

&lt;p&gt;While autoregressive models like Transformers generate text by predicting the next token based on all previously seen tokens, State Space Models (SSMs) take inspiration from physics: at each time step, they map the input sequence into a latent state representation and use that state to predict the output sequence.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckgd6pnqxyhlpblx2wmy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckgd6pnqxyhlpblx2wmy.png" alt="State Space Models" width="800" height="308"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Source: &lt;a href="https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state" rel="noopener noreferrer"&gt;https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;SSMs capture only the most relevant information about the sequence, and the relationship between input, state, and output can be expressed in three different representations. Depending on the task, the stage of the process, or the type of data, it is possible to switch between these representations (though this requires some advanced methods) and use whichever is most efficient for the problem at hand.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Aspect&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Continuous Representation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Recurrent Representation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Convolutional Representation&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core idea&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Describes how the state changes smoothly over time.&lt;/td&gt;
&lt;td&gt;Breaks time into steps, updating the current state based on the previous state and new input.&lt;/td&gt;
&lt;td&gt;Updates the current state using a weighted history of previous states.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Advantages&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ideal for data with irregular or time-shifted sampling · Amenable to mathematical analysis&lt;/td&gt;
&lt;td&gt;Natural fit for sequences · Efficient inference&lt;/td&gt;
&lt;td&gt;Local, interpretable features · Parallelizable training&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Disadvantages&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very slow training and inference&lt;/td&gt;
&lt;td&gt;Slow training · Gradient issues on very long sequences&lt;/td&gt;
&lt;td&gt;Inefficient in online/autoregressive use · Fixed context size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Suitability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Handling continuous data&lt;/td&gt;
&lt;td&gt;Efficient inference&lt;/td&gt;
&lt;td&gt;Fast training via parallelization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To handle the complexity of natural language, deep SSMs stack multiple state space layers and add non-linear transformations. In this setup, the SSM blocks handle dependencies across tokens in the sequence, while the non-linear layers capture dependencies across embedding dimensions. This division of labor allows the model to represent intricate language patterns while still benefiting from the efficiency of state-tracking mechanisms.&lt;/p&gt;
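&lt;p&gt;The recurrent representation from the table above can be sketched with scalar parameters (toy values; real SSMs such as Mamba learn structured matrices instead):&lt;/p&gt;

```python
# Discrete state-space recurrence: h[t] = A*h[t-1] + B*x[t],  y[t] = C*h[t]
A, B, C = 0.5, 1.0, 2.0  # toy scalar parameters (learned in a real model)

def ssm(xs):
    h, ys = 0.0, []
    for x in xs:              # one constant-size state update per input step
        h = A * h + B * x     # new state: decayed old state plus new input
        ys.append(C * h)      # output reads the current state
    return ys

# Impulse response: an input's influence decays geometrically through the state
print(ssm([1.0, 0.0, 0.0]))  # [2.0, 1.0, 0.5]
```

&lt;p&gt;This constant-size state is what gives SSMs their efficient inference: memory does not grow with sequence length, unlike a Transformer’s KV cache.&lt;/p&gt;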

&lt;h2&gt;
  
  
  Removing the Noise: Diffusion LLMs
&lt;/h2&gt;

&lt;p&gt;In the world of computer vision, one of the most groundbreaking advances in recent years has been diffusion models. The core idea is quite intuitive: start with an image and gradually add random noise over many steps until it turns into pure noise — resembling TV static or white noise. Then, train a model to reverse this process — step by step — by learning how to remove the noise and recover the original image (or generate a completely new one). Through this iterative denoising, the model learns the underlying patterns and structures of visual data, encoding that knowledge into a latent space, i.e., a map of all the possible images the model could generate, where each point represents a unique combination of learned features.&lt;/p&gt;

&lt;p&gt;Similar principles have recently been explored in the context of language modeling, where researchers are adapting diffusion-based approaches to generate text. In this case, the process begins with a random noise representation, which is then gradually refined and “denoised” into a coherent sequence of tokens.&lt;/p&gt;

&lt;p&gt;Unlike traditional autoregressive models that generate one token at a time, diffusion-based language models produce the entire sequence simultaneously (although they can also operate in a semi-autoregressive fashion, predicting one block of tokens after another). This makes the process inherently parallelizable and potentially more efficient, especially during inference. In addition, because they consider the whole text structure at once, they may be naturally better at logical reasoning and at generating well-structured responses. Their ability to continuously refine output also holds promise for reducing hallucinations and minimizing errors.&lt;/p&gt;
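&lt;p&gt;The iterative-refinement idea can be illustrated with a toy unmasking loop (the “confidence” here is just left-to-right order; a real diffusion LLM predicts every masked token in parallel with a trained denoiser):&lt;/p&gt;

```python
def denoise(target, steps=3):
    # Start fully masked, then reveal a block of positions per step
    seq = ["[MASK]"] * len(target)
    per_step = -(-len(target) // steps)  # ceil division
    for _ in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == "[MASK]"]
        for i in masked[:per_step]:   # unmask several positions at once
            seq[i] = target[i]
        print(" ".join(seq))          # watch the sequence sharpen step by step
    return seq

denoise("the model denoises text in parallel".split())
```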

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flx6fbuker69gflwuklgt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flx6fbuker69gflwuklgt.png" alt="LLaDa Overview" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Source: &lt;a href="https://arxiv.org/abs/2502.09992" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2502.09992&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview at a Glance
&lt;/h2&gt;

&lt;p&gt;Now that we’ve walked through the main architectures, it’s time to recap!&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Autoregressive LLMs&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;State-Space LLMs&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Diffusion LLMs&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core Idea&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sequential token prediction via conditional probabilities.&lt;/td&gt;
&lt;td&gt;Sequence modeling via state-space equations.&lt;/td&gt;
&lt;td&gt;Iterative noise reduction.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Computational Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inference Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Slow-Medium&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Medium-Fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Long-context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited by memory.&lt;/td&gt;
&lt;td&gt;Designed for long sequences.&lt;/td&gt;
&lt;td&gt;Limited by memory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Interpretability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Examples&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GPT, LLaMA, Mistral&lt;/td&gt;
&lt;td&gt;Mamba&lt;/td&gt;
&lt;td&gt;LLaDA, Mercury Coder&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;While we’ve covered the core ideas behind these architectures, keep in mind that each admits further variations depending on how encoding and decoding are designed for specific tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;In this blog post, we gave an overview of the main architectures behind today’s cutting-edge LLMs. Understanding these foundations is key to optimizing performance and choosing where to focus your efforts.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Enjoy the Quality and Efficiency!&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Want to take it further?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compress your own models with &lt;a href="https://github.com/PrunaAI/pruna" rel="noopener noreferrer"&gt;Pruna&lt;/a&gt; and give us a ⭐ to show your support!&lt;/li&gt;
&lt;li&gt;Try &lt;a href="https://www.pruna.ai/" rel="noopener noreferrer"&gt;our image and video models&lt;/a&gt; with just one click.&lt;/li&gt;
&lt;li&gt;Stay up to date with the latest AI efficiency research on our &lt;a href="https://www.pruna.ai/blog" rel="noopener noreferrer"&gt;blog&lt;/a&gt;, explore our &lt;a href="https://github.com/PrunaAI/awesome-ai-efficiency" rel="noopener noreferrer"&gt;materials collection&lt;/a&gt;, or dive into our &lt;a href="https://github.com/PrunaAI/courses" rel="noopener noreferrer"&gt;courses&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation and stay updated in our &lt;a href="https://discord.com/invite/JFQmtFKCjd" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>FLUX.2 [flex] Challenge</title>
      <dc:creator>Sara Han</dc:creator>
      <pubDate>Fri, 30 Jan 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/pruna-ai/flux2-flex-challenge-4b63</link>
      <guid>https://dev.to/pruna-ai/flux2-flex-challenge-4b63</guid>
<description>&lt;p&gt;To celebrate the release of FLUX.2 [flex] by Pruna AI, in collaboration with Black Forest Labs, we’re launching the FLUX.2 [flex] Design Challenge! 🎨&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Theme&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make an infographic of how to grow a black forest of plums&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to participate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;1️⃣ Reply with a single creative infographic made with FLUX.2 [flex] and a screenshot showcasing it. You can try it here: &lt;a href="https://bfl.ai/models/flux-2" rel="noopener noreferrer"&gt;https://bfl.ai/models/flux-2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2️⃣ Follow the @PrunaAI and @bfl_ml accounts on X.&lt;/p&gt;

&lt;p&gt;3️⃣ Mention us and add the hashtag #flux2flex.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prize&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Popularity + Judge’s evaluation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;🥇 €150 🥈 €100 🥉 €50&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Until February 6th (23:59 CET)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Join the challenge, vote for your favorites, and inspire the community!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Read the rules in detail: &lt;a href="https://www.pruna.ai/blog/flux2flex-challenge" rel="noopener noreferrer"&gt;https://www.pruna.ai/blog/flux2flex-challenge&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>challenge</category>
    </item>
    <item>
      <title>Slashing torch.compile Warmup &amp; LoRA Swapping Times with Pruna</title>
      <dc:creator>Sara Han</dc:creator>
      <pubDate>Wed, 28 Jan 2026 15:07:53 +0000</pubDate>
      <link>https://dev.to/pruna-ai/slashing-torchcompile-warmup-lora-swapping-times-with-pruna-1gei</link>
      <guid>https://dev.to/pruna-ai/slashing-torchcompile-warmup-lora-swapping-times-with-pruna-1gei</guid>
<description>&lt;p&gt;PyTorch introduced &lt;code&gt;torch.compile&lt;/code&gt;, a powerful feature that significantly boosts performance by compiling models. However, it comes with a catch: the first run is very slow. That warmup delay can be a drag on development iteration and can lead to slower cold starts in production. If you’ve ever swapped a LoRA or made a small model change, you’ve probably noticed that frustrating pause before things get moving again. But what if you could dramatically reduce, or even eliminate, these warmup delays?&lt;/p&gt;

&lt;p&gt;In this post, we'll dive into two practical techniques, powered by Pruna, to mitigate warmup times. We'll show you how to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Eliminate the initial model warmup&lt;/strong&gt; when deploying or reloading a model on a new machine (with identical hardware), using Pruna's portable compilation feature.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Achieve zero warmup when switching LoRAs&lt;/strong&gt; (Low-Rank Adaptations) on an already optimized model.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Get ready to reclaim those precious seconds (or even minutes!) and make your &lt;code&gt;torch.compile&lt;/code&gt; experience smoother than ever.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge: Understanding &lt;code&gt;torch.compile&lt;/code&gt; Warmup
&lt;/h2&gt;

&lt;p&gt;Before we dive into the solutions, let's briefly touch upon why &lt;code&gt;torch.compile&lt;/code&gt; has a warmup phase. When you first invoke a model compiled with &lt;code&gt;torch.compile&lt;/code&gt;, several things happen under the hood. PyTorch needs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capture the computational graph&lt;/strong&gt;: It traces the execution of your model to understand its structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perform graph optimizations&lt;/strong&gt;: The captured graph is then optimized for better performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detect and fuse operators&lt;/strong&gt;: The backend (such as Inductor) identifies which operations can be combined for faster execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate code&lt;/strong&gt;: Optimized code (often CUDA kernels for GPUs or efficient CPU code) is generated by the chosen backend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compile the code&lt;/strong&gt;: This generated code is then compiled into executable machine instructions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This entire process, especially the code generation and compilation steps, can take a noticeable amount of time, ranging from seconds to minutes, depending on the model's complexity and the hardware. While this is a one-time cost for a given model shape and hardware (as the compiled artifacts are cached), it can be disruptive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start/Stop instances&lt;/strong&gt;: When a new instance of an application starts (e.g., a serverless function or a new pod in Kubernetes), the first request might experience this long warmup, leading to poor user experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch instances&lt;/strong&gt;: If you compile a model on one machine and then try to run it on another (even with identical hardware), the cache might not be directly usable, leading to another full warmup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch model adapters&lt;/strong&gt;: Swapping LoRAs or other adapters can alter the model graph, triggering recompilation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Development iteration&lt;/strong&gt;: Waiting for recompilation after minor code changes or restarting a kernel slows the development cycle.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pruna offers elegant ways to mitigate these issues, as we'll see next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Case 1: Eliminating Initial Warmup with Pruna's Portable Compilation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Traditionally, running a compiled model on a new machine triggers a full compilation warmup, even if the hardware is identical. This can slow down processes, especially when deploying models to production or sharing them with others.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Core Idea&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Pruna makes compilation portable. It saves the required artifacts so they can be easily packaged with your model and reused on another machine (with the same hardware architecture and CUDA drivers) without needing to recompile from scratch. That way, the model will run fast right from the first inference.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faster deployment&lt;/strong&gt;: Skip the first-run delay when deploying pre-compiled models to production servers, especially serverless instances.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easier collaboration&lt;/strong&gt;: Share ready-to-run models with your team.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smoother pipelines&lt;/strong&gt;: Speed up CI/CD by avoiding repeated compilation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How-to Use Pruna’s Portable Compilation
&lt;/h3&gt;

&lt;p&gt;Let's walk through how to use this feature:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Load your model as usual&lt;/strong&gt;: In our example, we use a Stable Diffusion pipeline from Diffusers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure Pruna for Portable Compilation&lt;/strong&gt;: This is where the magic happens. Create a &lt;code&gt;SmashConfig&lt;/code&gt; object and configure &lt;code&gt;torch_compile&lt;/code&gt; to be portable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smash the Model&lt;/strong&gt;: Apply the configuration using &lt;code&gt;smash()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run and Save the Model&lt;/strong&gt;: Run your model once to trigger the compilation process, including the warmup. After that, just save your Pruna-smashed model, and it’ll be ready to use on any other machine with the same hardware.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;diffusers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StableDiffusionPipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pruna&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SmashConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;smash&lt;/span&gt;

&lt;span class="c1"&gt;# Load the model
&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;StableDiffusionPipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CompVis/stable-diffusion-v1-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float16&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Configure torch.compile and combine it with other Pruna features, as caching
&lt;/span&gt;&lt;span class="n"&gt;smash_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SmashConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepcache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_compile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_compile_make_portable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Smash the model
&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;smash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;smash_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;smash_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Run the model for the first time
&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a photo of an astronaut riding a horse on mars&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save the smashed model, including its portable compilation artifacts
&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;smashed_sd_portable_model/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Use Case 2: Zero Warmup for LoRA Switching with Diffusers Hotswap and Pruna (&lt;code&gt;torch.compile&lt;/code&gt;) Compatibility
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Low-Rank Adaptation (LoRA) is a game-changer for efficiently fine-tuning large models. It allows for quick adaptation by training only a small set of parameters.&lt;/p&gt;

&lt;p&gt;A powerful workflow involves dynamically switching between different LoRAs on a base model to change its output on the fly, for instance to alter image styles in a generative model. However, a challenge arises when you combine this with compilation: every LoRA swap can look like a graph change, triggering a long recompilation and wiping out the speed advantage.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Core Idea
&lt;/h3&gt;

&lt;p&gt;While Diffusers handles the mechanics of LoRA hotswapping, using Pruna with &lt;code&gt;torch.compile&lt;/code&gt; and leveraging one of its cachers ensures that these Diffusers-driven LoRA swaps are efficient and don't cause recompilation warmups after the initial model compilation.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Benefits
&lt;/h3&gt;

&lt;p&gt;With Pruna and Diffusers together, you get flexible LoRA adaptation and high-performance execution with no warmup delays.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instant LoRA swaps&lt;/strong&gt;: Serve models that adapt to diverse user inputs by loading different LoRAs or applications requiring rapid switching between LoRA-defined styles or functionalities (e.g., in an image generation UI), without the latency of recompilation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient experimentation&lt;/strong&gt;: Test multiple LoRAs quickly without waiting for recompiles.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How-to &lt;strong&gt;Leverage Diffusers Hotswap with Pruna for Zero Warmup&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let's walk through how this works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Load the Base Model and Enable Diffusers LoRA Hotswapping.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure Pruna&lt;/strong&gt;: Configure &lt;code&gt;torch.compile&lt;/code&gt; and enable a cacher. In this example, we use the &lt;code&gt;fora&lt;/code&gt; cacher, but other cachers are compatible too.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smash the Model&lt;/strong&gt;: Apply the configuration using &lt;code&gt;smash()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run the Model&lt;/strong&gt;: Run the model once to trigger the &lt;code&gt;torch.compile&lt;/code&gt; warmup for the base model and the current LoRA. After that, you’ll be ready to hotswap to a new LoRA.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;diffusers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FluxPipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pruna&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SmashConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;smash&lt;/span&gt;

&lt;span class="c1"&gt;# Load the base model and enable LoRA hotswapping
&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FluxPipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;black-forest-labs/FLUX.1-dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enable_lora_hotswap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_rank&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# target_rank is an example
&lt;/span&gt;
&lt;span class="c1"&gt;# Load an initial LoRA
&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_lora_weights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alvdansen/frosting_lane_flux&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Example LoRA
&lt;/span&gt;
&lt;span class="c1"&gt;# Configure Pruna's `torch.compile` and `fora`
&lt;/span&gt;&lt;span class="n"&gt;smash_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SmashConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fora&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fora_interval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fora_start_step&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_compile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;smash_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_prepare_saving&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt; &lt;span class="c1"&gt;# `False`for experimentation
&lt;/span&gt;
&lt;span class="c1"&gt;# Smash the model
&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;smash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;smash_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;smash_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Run the model for the first time
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a cat jumping in the air to catch a bird&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;generator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;manual_seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_inference_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Comparing the Solutions: Portable Compilation vs. Pruna Cacher Compatibility
&lt;/h2&gt;

&lt;p&gt;While we separately presented these use cases, they can be easily combined:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use &lt;strong&gt;portable compilation&lt;/strong&gt; to create a Pruna-optimized base smashed model (perhaps with a default LoRA already applied) that loads quickly on new instances.&lt;/li&gt;
&lt;li&gt;Once loaded, Pruna’s compatibility with hotswapping ensures that any subsequent LoRA hot swaps (managed by Diffusers) on that instance are also free of &lt;code&gt;torch.compile&lt;/code&gt; warmup delays.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This combined approach gives you a fast cold start &lt;em&gt;and&lt;/em&gt; instant adapter switching.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions: Reclaim Your Time with Pruna
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;torch.compile&lt;/code&gt; warmup can slow down production workflows for cold starts and adapter switching. Pruna addresses these challenges with two key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Portable compilation&lt;/strong&gt; (&lt;code&gt;torch_compile_make_portable=True&lt;/code&gt;) removes first-run warmup when deploying to identical hardware, enabling immediate peak performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diffusers' LoRA hotswapping&lt;/strong&gt; with &lt;code&gt;torch.compile&lt;/code&gt; and a &lt;strong&gt;Pruna cacher&lt;/strong&gt; enables instant LoRA switching without recompilation delays.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;For background on PyTorch's compilation and caching mechanisms, you might find the official &lt;a href="https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html" rel="noopener noreferrer"&gt;PyTorch &lt;code&gt;torch.compile&lt;/code&gt; Caching Tutorial&lt;/a&gt; insightful.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We hope this guide helps you optimize your &lt;code&gt;torch.compile&lt;/code&gt; workflows. Happy coding!&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Enjoy the Quality and Efficiency!&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Want to take it further?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compress your own models with &lt;a href="https://github.com/PrunaAI/pruna" rel="noopener noreferrer"&gt;Pruna&lt;/a&gt; and give us a ⭐ to show your support!&lt;/li&gt;
&lt;li&gt;Stay up to date with the latest AI efficiency research on our &lt;a href="https://www.pruna.ai/blog" rel="noopener noreferrer"&gt;blog&lt;/a&gt;, explore our &lt;a href="https://github.com/PrunaAI/awesome-ai-efficiency" rel="noopener noreferrer"&gt;materials collection&lt;/a&gt;, or dive into our &lt;a href="https://github.com/PrunaAI/courses" rel="noopener noreferrer"&gt;courses&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation and stay updated in our &lt;a href="https://discord.com/invite/JFQmtFKCjd" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>deeplearning</category>
      <category>machinelearning</category>
      <category>performance</category>
      <category>python</category>
    </item>
    <item>
      <title>Measuring What Matters: Objective Metrics for Image Generation Assessment</title>
      <dc:creator>Sara Han</dc:creator>
      <pubDate>Wed, 03 Dec 2025 18:05:47 +0000</pubDate>
      <link>https://dev.to/pruna-ai/measuring-what-matters-objective-metrics-for-image-generation-assessment-4a69</link>
      <guid>https://dev.to/pruna-ai/measuring-what-matters-objective-metrics-for-image-generation-assessment-4a69</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;🎊 Announcement&lt;/strong&gt;: Try our performance models for free, &lt;strong&gt;P-image&lt;/strong&gt; and &lt;strong&gt;P-image-Edit&lt;/strong&gt;, &lt;a href="https://www.pruna.ai/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Images in just one second, without compromising quality!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Generating high-quality visuals with state-of-the-art models is becoming increasingly accessible. Open-source models run on laptops, and cloud services turn text into images in seconds. These models are already reshaping industries like advertising, gaming, fashion, and science.&lt;/p&gt;

&lt;p&gt;But creating images is the easy part. Judging their quality is much harder. Human feedback is slow, expensive, biased, and often inconsistent. Plus, quality has many faces: creativity, realism, and style don’t always align. Improving one can hurt another.&lt;/p&gt;

&lt;p&gt;That’s why we need clear, objective metrics that capture quality, coherence, and originality. We’ll explore methods for evaluating image quality and comparing models with &lt;a href="https://github.com/PrunaAI/pruna" rel="noopener noreferrer"&gt;Pruna&lt;/a&gt;, beyond simply asking "does it look cool?"&lt;/p&gt;

&lt;h2&gt;
  
  
  Metrics Overview
&lt;/h2&gt;

&lt;p&gt;There is no single correct way to categorize evaluation metrics, as a metric can belong to multiple categories depending on its usage and the data it evaluates. In our repository, all quality metrics can be computed in two modes: &lt;em&gt;single&lt;/em&gt; and &lt;em&gt;pairwise&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single mode&lt;/strong&gt; evaluates a model by comparing the generated images to input references or ground truth images, producing one score per model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pairwise mode&lt;/strong&gt; compares two models by directly evaluating the generated images from each model together, producing a single comparative score for these two models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This flexibility enables both absolute evaluations (assessing each model individually) and relative evaluations (direct comparisons between models).&lt;/p&gt;
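&lt;p&gt;Conceptually, the two modes differ only in what the metric is applied to. The sketch below uses hypothetical helper names and toy one-number "images" to show the distinction; it is not Pruna’s actual evaluation API.&lt;/p&gt;

```python
def single_mode_score(metric, generated, references):
    """Single mode: score one model against ground-truth references."""
    return sum(metric(g, r) for g, r in zip(generated, references)) / len(generated)

def pairwise_mode_score(metric, generated_a, generated_b):
    """Pairwise mode: compare two models' outputs for the same prompts."""
    return sum(metric(a, b) for a, b in zip(generated_a, generated_b)) / len(generated_a)

# Toy "images" as single numbers, with absolute difference as the metric
metric = lambda x, y: abs(x - y)
refs = [1.0, 2.0, 3.0]
model_a = [1.1, 2.1, 3.1]
model_b = [1.5, 2.5, 3.5]

print(round(single_mode_score(metric, model_a, refs), 3))       # 0.1
print(round(pairwise_mode_score(metric, model_a, model_b), 3))  # 0.4
```

&lt;p&gt;The first call yields an absolute score per model; the second yields one comparative score for the pair.&lt;/p&gt;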

&lt;p&gt;On top of the evaluation modes, it also makes sense to think about metrics in terms of their evaluation criteria to provide structure and clarity. Our metrics fall into two overarching categories: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency Metrics&lt;/strong&gt;: Measure models' speed, memory usage, energy consumption, carbon emissions, etc., during inference. At Pruna, we focus on making your models smaller, faster, cheaper, and greener, so evaluating your models with these efficiency metrics is a natural fit. However, because efficiency metrics are not specific to image generation tasks, we won't discuss them in detail in this blog post. If you'd like to learn more about these metrics, please refer to our &lt;a href="https://docs.pruna.ai/en/stable/index.html" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality Metrics&lt;/strong&gt;: Measure generated images' intrinsic quality and alignment to intended prompts or references. These include:

&lt;ul&gt;
&lt;li&gt;Distribution Alignment: How closely generated images resemble real-world distributions.&lt;/li&gt;
&lt;li&gt;Prompt Alignment: Semantic similarity between generated images and their intended prompts.&lt;/li&gt;
&lt;li&gt;Perceptual Alignment: Pixel-level or perceptual similarity between generated and reference images.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The table below summarizes the most common quality metrics available at Pruna, their categories, score ranges, and key limitations to help guide metric selection.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Metric&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Measures&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Category&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Range (↑ higher is better/↓ lower is better)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Limitations&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FID&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Distributional similarity to real images&lt;/td&gt;
&lt;td&gt;Distribution Alignment&lt;/td&gt;
&lt;td&gt;0 to ∞ (↓)&lt;/td&gt;
&lt;td&gt;Assumes Gaussianity, requires a large dataset, depends on a surrogate model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CMMD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CLIP-space distributional similarity&lt;/td&gt;
&lt;td&gt;Distribution Alignment&lt;/td&gt;
&lt;td&gt;0 to ∞ (↓)&lt;/td&gt;
&lt;td&gt;Kernel choice affects results, depends on a surrogate model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CLIPScore&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Image-text alignment&lt;/td&gt;
&lt;td&gt;Prompt Alignment&lt;/td&gt;
&lt;td&gt;0 to 100 (↑)&lt;/td&gt;
&lt;td&gt;Insensitive to image quality, depends on a surrogate model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PSNR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pixel-wise similarity&lt;/td&gt;
&lt;td&gt;Perceptual Alignment&lt;/td&gt;
&lt;td&gt;0 to ∞ (↑)&lt;/td&gt;
&lt;td&gt;Correlates poorly with human perception&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSIM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Structural similarity&lt;/td&gt;
&lt;td&gt;Perceptual Alignment&lt;/td&gt;
&lt;td&gt;-1 to 1 (↑)&lt;/td&gt;
&lt;td&gt;Can be unstable for small input variations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LPIPS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Perceptual similarity&lt;/td&gt;
&lt;td&gt;Perceptual Alignment&lt;/td&gt;
&lt;td&gt;0 to 1 (↓)&lt;/td&gt;
&lt;td&gt;Depends on a surrogate model&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Distribution Alignment Metrics&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Distribution alignment metrics measure how closely generated images resemble real-world data distributions, comparing both low- and high-dimensional features. In pairwise mode, they compare outputs from different models to produce a single score that reflects relative image quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc47ais7gf699p13j52v9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc47ais7gf699p13j52v9.png" alt="comparison-outputs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The generated image closely resembles the real one, and the distributions are well aligned, suggesting good quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnexq3j519tqadzdqynjk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnexq3j519tqadzdqynjk.png" alt="comparison-outputs1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The generated image is noticeably off, and the distributions differ significantly, which the metric captures as a mismatch.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fréchet Inception Distance (FID):&lt;/strong&gt; FID (&lt;a href="https://arxiv.org/abs/1706.08500" rel="noopener noreferrer"&gt;introduced here&lt;/a&gt;) is one of the most popular metrics for evaluating how realistic AI-generated images are. It works by comparing the &lt;em&gt;feature distribution&lt;/em&gt; of the reference images (e.g., real images) with that of the images generated by the model being evaluated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s how it works in a nutshell:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We take a &lt;strong&gt;pretrained surrogate model&lt;/strong&gt; and pass both real and generated images through it. The surrogate is usually &lt;strong&gt;Inception v3&lt;/strong&gt;, which explains the metric's name.&lt;/li&gt;
&lt;li&gt;The model turns each image into a &lt;strong&gt;feature embedding&lt;/strong&gt; (a numerical summary of the image). We assume the embeddings from each set form a &lt;strong&gt;Gaussian distribution&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;FID then measures the &lt;em&gt;distance&lt;/em&gt; between the two distributions — the closer they are, the better.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A lower FID score indicates that the generated images are more similar to real ones, meaning better image quality.&lt;/p&gt;
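&lt;p&gt;To make the three steps concrete, here is a minimal NumPy sketch of the Fréchet distance itself, using random vectors as stand-ins for the Inception v3 embeddings. (Only the trace of the matrix square root is needed, so an explicit matrix square root can be avoided.)&lt;/p&gt;

```python
import numpy as np

def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    """Fréchet distance between two Gaussians — the quantity behind FID."""
    diff = mu_r - mu_g
    # Tr((Σr Σg)^{1/2}) equals the sum of square roots of the eigenvalues
    # of Σr Σg (real and non-negative for covariance matrices; clip guards
    # against tiny negative values from numerical error).
    tr_covmean = np.sqrt(np.linalg.eigvals(sigma_r @ sigma_g).real.clip(min=0)).sum()
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_covmean)

# Stand-ins for Inception v3 embeddings of real and generated images.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))
gen = rng.normal(0.5, 1.0, size=(500, 8))  # a slightly shifted distribution

fid = frechet_distance(real.mean(0), np.cov(real, rowvar=False),
                       gen.mean(0), np.cov(gen, rowvar=False))
```

&lt;p&gt;Identical distributions score (numerically) zero, while the shifted set scores higher. In practice you would compute the statistics from actual Inception v3 features, e.g. via a library such as &lt;code&gt;torchmetrics&lt;/code&gt;, rather than from random vectors.&lt;/p&gt;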

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Want the math?&lt;/strong&gt;&lt;br&gt;
FID is calculated as the Fréchet distance between two multivariate Gaussians:&lt;br&gt;


&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;FID=∣μr−μg∣2+Tr(Σr+Σg−2(ΣrΣg)1/2)
\text{FID} = |\mu_r - \mu_g|^2 + \text{Tr}(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2})
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;FID&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;∣&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;μ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;μ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;g&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;∣&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span 
class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;Tr&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;Σ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;Σ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;g&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span 
class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;2&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;Σ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;Σ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;g&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;1/2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
&lt;br&gt;
where:

&lt;ul&gt;
&lt;li&gt;
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;(μr,Σr)(\mu_r, \Sigma_r)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;μ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;Σ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 are the mean and covariance of real image features,&lt;/li&gt;
&lt;li&gt;
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;(μg,Σg)(\mu_g, \Sigma_g)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;μ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;g&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;Σ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;g&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 are the mean and covariance of generated image features, and Tr(·) denotes the trace of a matrix,&lt;/li&gt;
&lt;li&gt;
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;(ΣrΣg)1/2(\Sigma_r \Sigma_g)^{1/2}&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;Σ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;Σ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;g&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;1/2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the geometric mean of the covariance matrices.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CLIP Maximum Mean Discrepancy (CMMD):&lt;/strong&gt; CMMD (&lt;a href="https://arxiv.org/abs/2401.09603" rel="noopener noreferrer"&gt;introduced here&lt;/a&gt;) is another way to measure how close your generated images are to real ones. Like FID, it compares feature distributions, but instead of using Inception features, it uses embeddings from a pretrained CLIP model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s how it works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We take a &lt;strong&gt;pretrained surrogate model&lt;/strong&gt; and pass both real and generated images through it. The surrogate is usually &lt;strong&gt;CLIP&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The model turns each image into a &lt;strong&gt;feature embedding&lt;/strong&gt; (a numerical summary of the image). Unlike FID, we &lt;strong&gt;do not&lt;/strong&gt; assume these embeddings follow a &lt;strong&gt;Gaussian distribution&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;We use a kernel function (usually an RBF kernel) to measure how the two sets of embeddings differ.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A lower CMMD score indicates that the feature distributions of generated images are more similar to those of real images, meaning better image quality.&lt;/p&gt;
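&lt;p&gt;As a rough illustration of the kernel-based comparison, here is a minimal NumPy sketch of the (biased) squared MMD with an RBF kernel, again using random vectors as stand-ins for CLIP embeddings; the actual CMMD implementation uses real CLIP features and a specific kernel bandwidth.&lt;/p&gt;

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # Pairwise squared Euclidean distances -> Gaussian (RBF) kernel values.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between two sets of embeddings."""
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

# Stand-ins for CLIP embeddings of real and generated images.
rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, size=(200, 16))
gen = rng.normal(0.8, 1.0, size=(200, 16))
```

&lt;p&gt;Comparing a set of embeddings with itself yields zero, while a shifted set yields a positive score, mirroring how CMMD penalizes distribution mismatch.&lt;/p&gt;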

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Want the math?&lt;/strong&gt;&lt;br&gt;
CMMD is based on the Maximum Mean Discrepancy (MMD) and is computed as:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;CMMD=E![k(ϕ(xr),ϕ(xr′))]+E![k(ϕ(xg),ϕ(xg′))]−2 E![k(ϕ(xr),ϕ(xg))]
\text{CMMD} = \mathbb{E}!\left[ k(\phi(x_r), \phi(x_r')) \right]+ \mathbb{E}!\left[ k(\phi(x_g), \phi(x_g')) \right]- 2\,\mathbb{E}!\left[ k(\phi(x_r), \phi(x_g)) \right]
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;CMMD&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathbb"&gt;E&lt;/span&gt;&lt;span class="mclose"&gt;!&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="minner"&gt;&lt;span class="mopen delimcenter"&gt;[&lt;/span&gt;&lt;span class="mord mathnormal"&gt;k&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span 
class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;))&lt;/span&gt;&lt;span class="mclose delimcenter"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathbb"&gt;E&lt;/span&gt;&lt;span class="mclose"&gt;!&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="minner"&gt;&lt;span class="mopen delimcenter"&gt;&lt;span class="delimsizing size1"&gt;[&lt;/span&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;k&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;g&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span 
class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;g&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;))&lt;/span&gt;&lt;span class="mclose delimcenter"&gt;&lt;span class="delimsizing size1"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;2&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathbb"&gt;E&lt;/span&gt;&lt;span class="mclose"&gt;!&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="minner"&gt;&lt;span class="mopen delimcenter"&gt;[&lt;/span&gt;&lt;span class="mord mathnormal"&gt;k&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal 
mtight"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;g&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;))&lt;/span&gt;&lt;span class="mclose delimcenter"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
&lt;br&gt;
where:

&lt;ul&gt;
&lt;li&gt;
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;ϕ(xr),ϕ(xr′)\phi(x_r), \phi(x_r')&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 are two independent real image embeddings extracted from CLIP.&lt;/li&gt;
&lt;li&gt;
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;ϕ(xg),ϕ(xg′)\phi(x_g), \phi(x_g')&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;g&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;g&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 are two independent generated image embeddings extracted from CLIP.&lt;/li&gt;
&lt;li&gt;
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;k(x,y)k(x, y)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;k&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is a positive definite kernel function that measures similarity between embeddings.&lt;/li&gt;
&lt;li&gt;The expectations 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;E[⋅]\mathbb{E}[\cdot]&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathbb"&gt;E&lt;/span&gt;&lt;span class="mopen"&gt;[&lt;/span&gt;&lt;span class="mord"&gt;⋅&lt;/span&gt;&lt;span class="mclose"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 are computed over multiple sample pairs.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Prompt Alignment Metrics&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhto81d7ru1t0fzrchpi3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhto81d7ru1t0fzrchpi3.png" alt="example-prompt-alignments"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prompt alignment metrics evaluate how well generated images match their input prompts, especially in text-to-image tasks. In pairwise mode, they instead measure semantic similarity between outputs from different models, shifting focus from prompt alignment to model agreement.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CLIPScore:&lt;/strong&gt;  CLIPScore (&lt;a href="https://arxiv.org/abs/2104.08718" rel="noopener noreferrer"&gt;introduced here&lt;/a&gt;) tells you how well a generated image matches the text prompt that produced it. It uses a pretrained CLIP model, which maps both text and images into the same embedding space.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s the idea:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pass the image and its prompt through the &lt;strong&gt;surrogate CLIP model&lt;/strong&gt; to get their embeddings.&lt;/li&gt;
&lt;li&gt;Measure how close these two embeddings are. The closer they are, the better the alignment between the image and the prompt.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;CLIPScore ranges from 0 to 100. A higher score means the image is more semantically aligned with the prompt. Note that this metric doesn’t assess visual quality, but rather the match in meaning.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Want the math?&lt;/strong&gt;&lt;br&gt;
Given an image $x$ and its corresponding text prompt $t$, CLIPScore is computed as:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;CLIPScore=max⁡!(100×ϕI(x)⋅ϕT(t)∣ϕI(x)∣ ∣ϕT(t)∣,  0)
\text{CLIPScore} = \max!\left(100 \times 
\frac{\phi_I(x) \cdot \phi_T(t)}
{|\phi_I(x)|\,|\phi_T(t)|},\; 0\right)
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;CLIPScore&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop"&gt;max&lt;/span&gt;&lt;span class="mclose"&gt;!&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="minner"&gt;&lt;span class="mopen delimcenter"&gt;&lt;span class="delimsizing size3"&gt;(&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;100&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;∣&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;I&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mord"&gt;∣&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;∣&lt;/span&gt;&lt;span class="mord"&gt;&lt;span 
class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;T&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;t&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mord"&gt;∣&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;I&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;⋅&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span 
class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;T&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;t&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;0&lt;/span&gt;&lt;span class="mclose delimcenter"&gt;&lt;span class="delimsizing size3"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
&lt;br&gt;
where:

&lt;ul&gt;
&lt;li&gt;
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;ϕI(x)\phi_I(x)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;ϕ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;I&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the CLIP image embedding of the generated image.&lt;/li&gt;
&lt;li&gt;$\phi_T(t)$ is the CLIP text embedding of the associated prompt.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
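&lt;p&gt;To make the formula concrete, here is a minimal NumPy sketch that scores a pair of &lt;em&gt;precomputed&lt;/em&gt; embeddings. The toy vectors and the &lt;code&gt;clip_score&lt;/code&gt; helper are ours for illustration, not a library API; in practice the embeddings come from a pretrained CLIP model:&lt;/p&gt;

```python
import numpy as np

def clip_score(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    # Scaled cosine similarity between the two embeddings, clipped below at 0.
    cos = image_emb @ text_emb / (np.linalg.norm(image_emb) * np.linalg.norm(text_emb))
    return max(100.0 * cos, 0.0)

# Toy 2-D embeddings standing in for CLIP image/text features.
print(round(clip_score(np.array([0.6, 0.8]), np.array([1.0, 0.0])), 2))  # cosine 0.6 -> 60.0
```

&lt;p&gt;For real evaluations you would use a CLIP checkpoint to embed the image and prompt (for example through the &lt;code&gt;torchmetrics&lt;/code&gt; CLIPScore metric) rather than hand-rolling the vectors.&lt;/p&gt;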

&lt;h3&gt;
  
  
  Perceptual Alignment Metrics
&lt;/h3&gt;

&lt;p&gt;Perceptual alignment metrics evaluate the perceptual quality and internal consistency of generated images. They compare pixel-level or feature-level differences between images. These metrics are often pairwise by nature, as comparing generated images with other generated images is more appropriate in certain cases, such as pixel-by-pixel comparisons.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Peak Signal-to-Noise Ratio (PSNR)&lt;/strong&gt;: PSNR measures the pixel-level similarity between a generated image and its reference (ground truth) image. It is widely used for evaluating image compression and restoration models.&lt;/p&gt;

&lt;p&gt;A higher PSNR value indicates better image quality, but PSNR does not always correlate well with human perception.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Want the math?&lt;/strong&gt;&lt;br&gt;
PSNR is computed as:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;PSNR=10×log⁡10!(L2MSE)
\text{PSNR} = 10 \times \log_{10}!\left(\frac{L^2}{\text{MSE}}\right)
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;PSNR&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;10&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop"&gt;&lt;span class="mop"&gt;lo&lt;span&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;10&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;!&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="minner"&gt;&lt;span class="mopen delimcenter"&gt;&lt;span class="delimsizing size3"&gt;(&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;MSE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span 
class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;L&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose delimcenter"&gt;&lt;span class="delimsizing size3"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
&lt;br&gt;
where:

&lt;ul&gt;
&lt;li&gt;
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;LL&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;L&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the maximum possible pixel value (e.g., 255 for an 8-bit image).&lt;/li&gt;
&lt;li&gt;$\text{MSE}$ (Mean Squared Error) is the average squared difference between pixel values.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
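&lt;p&gt;The PSNR formula above is a one-liner in NumPy. This sketch assumes 8-bit images (peak value 255) and defines a hypothetical &lt;code&gt;psnr&lt;/code&gt; helper purely for illustration:&lt;/p&gt;

```python
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray, peak: float = 255.0) -> float:
    # Mean squared pixel error, then the log-ratio against the squared peak value.
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 10.0)
print(round(psnr(a, b), 2))  # MSE = 100 -> about 28.13 dB
```

&lt;p&gt;Note the identical-images case: MSE is 0, so PSNR is unbounded, which is why libraries report it as infinity.&lt;/p&gt;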

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Structural Similarity Index (SSIM)&lt;/strong&gt;: SSIM improves upon PSNR by comparing local patterns of pixel intensities instead of just raw pixel differences. It models human visual perception by considering luminance, contrast, and structure in small image patches.&lt;/p&gt;

&lt;p&gt;SSIM ranges from -1 to 1, where 1 indicates perfect similarity.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Want the math?&lt;/strong&gt;&lt;br&gt;
SSIM is often computed as:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;SSIM(x,y)=(2μxμy+C1)(2σxy+C2)(μx2+μy2+C1)(σx2+σy2+C2)
\text{SSIM}(x, y) =
\frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}
     {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;SSIM&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;μ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;μ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span 
class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;C&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;σ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;σ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;C&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span 
class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;2&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;μ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;μ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;C&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span 
class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;2&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;σ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;x&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;C&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose 
nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
&lt;br&gt;
where:

&lt;ul&gt;
&lt;li&gt;$\mu_x, \mu_y$ are the mean intensities of images $x$ and $y$.&lt;/li&gt;
&lt;li&gt;$\sigma_x^2, \sigma_y^2$ are the variances.&lt;/li&gt;
&lt;li&gt;$\sigma_{xy}$ is the covariance between the images.&lt;/li&gt;
&lt;li&gt;$C_1, C_2$ are small constants for numerical stability.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
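&lt;p&gt;Here is a deliberately simplified, single-window version of that formula, with the conventional constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$. Real SSIM implementations slide a (usually Gaussian) window over the images and average the local scores; this global sketch only illustrates the arithmetic:&lt;/p&gt;

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, L: float = 255.0) -> float:
    # Single-window SSIM over the whole image (no sliding window).
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2  # standard stability constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

img = np.arange(64, dtype=np.float64).reshape(8, 8)
print(round(ssim_global(img, img), 6))  # identical images -> 1.0
```

&lt;p&gt;For production use, prefer a tested implementation such as &lt;code&gt;skimage.metrics.structural_similarity&lt;/code&gt;, which handles windowing and data ranges for you.&lt;/p&gt;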

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Learned Perceptual Image Patch Similarity (LPIPS):&lt;/strong&gt; LPIPS is a deep-learning-based metric that measures perceptual similarity between images using features from a pre-trained neural network (e.g., VGG, AlexNet). Unlike PSNR and SSIM, LPIPS captures high-level perceptual differences rather than pixel-wise differences.&lt;/li&gt;
&lt;/ul&gt;
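&lt;p&gt;Schematically, LPIPS is a weighted sum of squared distances between per-layer feature maps. The sketch below uses placeholder arrays as "layer activations" and a made-up &lt;code&gt;lpips_like&lt;/code&gt; helper; real LPIPS extracts the features with a pretrained VGG or AlexNet (e.g., via the &lt;code&gt;lpips&lt;/code&gt; package):&lt;/p&gt;

```python
import numpy as np

def lpips_like(feats_x, feats_y, weights):
    # Weighted sum of squared L2 distances between matching layer features.
    return sum(w * float(np.sum((fx - fy) ** 2))
               for w, fx, fy in zip(weights, feats_x, feats_y))

# Placeholder "activations" from two layers; real LPIPS uses a pretrained net.
fx = [np.array([1.0, 2.0]), np.array([0.5])]
fy = [np.array([0.0, 0.0]), np.array([0.5])]
print(lpips_like(fx, fy, weights=[0.5, 1.0]))  # 0.5 * (1 + 4) + 0 = 2.5
```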

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Want the math?&lt;/strong&gt;&lt;br&gt;
LPIPS is computed as:&lt;br&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;LPIPS(x,y)=∑lwl ∣Fl(x)−Fl(y)∣22
\text{LPIPS}(x, y) = \sum_l w_l \,|F_l(x) - F_l(y)|_2^2
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;LPIPS&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol large-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span 
class="mord"&gt;∣&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;F&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;F&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;l&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;∣&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord 
mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
&lt;br&gt;
where:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;F_l(x)&lt;/code&gt; and &lt;code&gt;F_l(y)&lt;/code&gt; are deep feature representations of images &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; from layer &lt;code&gt;l&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;w_l&lt;/code&gt; are learned weights that adjust the importance of each feature layer.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
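&lt;p&gt;The weighted feature-difference sum above can be illustrated with a toy NumPy sketch. The "feature extractors" and weights here are stand-ins invented for illustration (real LPIPS uses a pre-trained VGG or AlexNet with learned per-layer weights); this is not Pruna's implementation.&lt;/p&gt;

```python
import numpy as np

def toy_lpips(x, y, feature_layers, layer_weights):
    """Weighted sum of squared L2 distances between per-layer features,
    mirroring the LPIPS formula (toy version, illustrative only)."""
    score = 0.0
    for extract, w in zip(feature_layers, layer_weights):
        fx, fy = extract(x), extract(y)
        score += w * np.sum((fx - fy) ** 2)
    return score

# Stand-in "layers": a coarse pooled view and the raw pixels.
# A real LPIPS would use activations from a pre-trained network.
layers = [
    lambda img: img.reshape(4, 4, -1).mean(axis=2),  # coarse 4x4 summary
    lambda img: img,                                  # identity "layer"
]
weights = [0.7, 0.3]  # learned in real LPIPS; fixed here for the demo

rng = np.random.default_rng(0)
a = rng.random((16, 16))
b = a + 0.05 * rng.standard_normal((16, 16))

print(toy_lpips(a, a, layers, weights))  # identical images score 0.0
print(toy_lpips(a, b, layers, weights))  # distorted copy scores higher
```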

&lt;p&gt;To illustrate how different distortions impact metric scores, let's look at the following example. The image below showcases various distortions applied to an original image and how metrics like SSIM, PSNR, and LPIPS react to these changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcp9jlk6c76fdaor2a1ev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcp9jlk6c76fdaor2a1ev.png" alt="example-distortions"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The results in the image illustrate how different types of distortions affect the scores given by these metrics. Notably:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Blurred images&lt;/strong&gt; tend to score higher in SSIM than in PSNR. This suggests that while fine details are lost, the overall structure and patterns of the image remain intact, which aligns with SSIM’s focus on structural consistency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pixelated images&lt;/strong&gt;, on the other hand, maintain relatively high PSNR values but drop in SSIM ranking. This indicates that while pixel intensity differences remain small, the structural coherence of the image is significantly degraded—highlighting SSIM’s sensitivity to spatial relationships rather than just pixel-level accuracy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These observations demonstrate why selecting the right metric is crucial. Each of the metrics captures different aspects of image quality, making them useful in different scenarios depending on the type of distortion and the perceptual quality being assessed.&lt;/p&gt;
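&lt;p&gt;PSNR's behavior in these examples follows directly from its definition as a function of mean squared error: it only sees pixel-level differences, not structure. A minimal NumPy sketch of the standard textbook formula (not tied to Pruna's implementation):&lt;/p&gt;

```python
import numpy as np

def psnr(reference, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means smaller pixel-wise error."""
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(42)
img = rng.random((32, 32))
noisy = np.clip(img + 0.01 * rng.standard_normal(img.shape), 0.0, 1.0)
very_noisy = np.clip(img + 0.1 * rng.standard_normal(img.shape), 0.0, 1.0)

print(psnr(img, noisy))       # high PSNR: small pixel differences
print(psnr(img, very_noisy))  # lower PSNR: larger pixel differences
```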

&lt;h2&gt;
  
  
  Confidently evaluate AI models with the Evaluation Agent!
&lt;/h2&gt;

&lt;p&gt;The evaluation framework in &lt;a href="https://github.com/PrunaAI/pruna" rel="noopener noreferrer"&gt;pruna&lt;/a&gt; consists of several key components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1:&lt;/strong&gt; Define what you want to measure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use the &lt;code&gt;Task&lt;/code&gt; object to specify which quality metrics you'd like to compute. You can provide the metrics in three different ways depending on how much control you need.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pruna.evaluation.task&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pruna.data.pruna_datamodule&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PrunaDataModule&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pruna.evaluation.metrics.metric_torch&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TorchMetricWrapper&lt;/span&gt;

&lt;span class="c1"&gt;# Method 1: plain text from predefined options
&lt;/span&gt;&lt;span class="n"&gt;evaluate_image_generation_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_generation_quality&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;datamodule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PrunaDataModule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;LAION256&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Method 2: list of metric names
&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;clip_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;psnr&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;evaluate_image_generation_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;datamodule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PrunaDataModule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;LAION256&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Method 3: list of metric instances
&lt;/span&gt;&lt;span class="n"&gt;clip_score_metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TorchMetricWrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clip_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name_or_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/clip-vit-base-patch32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;psnr_metric&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TorchMetricWrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;psnr&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;clip_score_metric&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psnr_metric&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;evaluate_image_generation_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;datamodule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PrunaDataModule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;LAION256&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; Run the Evaluation Agent&lt;/p&gt;

&lt;p&gt;Pass your model to the &lt;code&gt;EvaluationAgent&lt;/code&gt; and let it handle everything: running inference, computing metrics, and returning the final scores.&lt;br&gt;
&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pruna.evaluation.evaluation_agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;EvaluationAgent&lt;/span&gt;

&lt;span class="n"&gt;eval_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;EvaluationAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evaluate_image_generation_task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;eval_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;your_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As AI-generated images become more prevalent, evaluating their quality effectively is more important than ever. Whether you're optimizing for realism, accuracy, or perceptual similarity, selecting the right evaluation metric is key. With Pruna now open-source, you have the freedom to explore, customize, and even contribute new evaluation metrics to the community.&lt;/p&gt;

&lt;p&gt;Our documentation and tutorials (&lt;a href="https://docs.pruna.ai/en/stable/index.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;) provide a step-by-step guide on how to add your own metrics, making it easier than ever to tailor evaluations to your needs. Try it out today, contribute, and help shape the future of AI image evaluation!&lt;/p&gt;
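&lt;p&gt;As a rough sketch of the accumulate-then-compute pattern such metrics typically follow, here is a toy stateful metric in plain Python. The class and method names are hypothetical illustrations, not Pruna's actual extension interface; see the documentation above for the real one.&lt;/p&gt;

```python
import numpy as np

class MeanAbsoluteErrorMetric:
    """Toy stateful metric following a torchmetrics-style update/compute
    pattern. Hypothetical interface for illustration only."""

    def __init__(self):
        self.total_error = 0.0
        self.count = 0

    def update(self, outputs, targets):
        # Accumulate per batch instead of keeping every image in memory.
        self.total_error += float(np.abs(outputs - targets).sum())
        self.count += outputs.size

    def compute(self):
        # Final score over everything seen so far.
        return self.total_error / max(self.count, 1)

metric = MeanAbsoluteErrorMetric()
metric.update(np.array([0.0, 0.5]), np.array([0.0, 1.0]))
metric.update(np.array([1.0, 1.0]), np.array([1.0, 0.0]))
print(metric.compute())  # 0.375
```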

&lt;h2&gt;
  
  
  &lt;strong&gt;Enjoy the Quality and Efficiency!&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Compress your own models with &lt;a href="https://github.com/PrunaAI/pruna" rel="noopener noreferrer"&gt;Pruna&lt;/a&gt; and give us a ⭐ to show your support!&lt;/li&gt;
&lt;li&gt;Try our models and endpoints in &lt;a href="https://replicate.com/prunaai" rel="noopener noreferrer"&gt;Replicate&lt;/a&gt; with just one click.&lt;/li&gt;
&lt;li&gt;Stay up to date with the latest AI efficiency research on our &lt;a href="https://www.pruna.ai/blog" rel="noopener noreferrer"&gt;blog&lt;/a&gt;, explore our &lt;a href="https://github.com/PrunaAI/awesome-ai-efficiency" rel="noopener noreferrer"&gt;materials collection&lt;/a&gt;, or dive into our &lt;a href="https://github.com/PrunaAI/courses" rel="noopener noreferrer"&gt;courses&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation and stay updated in our &lt;a href="https://discord.com/invite/JFQmtFKCjd" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>analytics</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Introducing Pruna 0.3.0 - The Upgrade You’ve Been Waiting For</title>
      <dc:creator>Sara Han</dc:creator>
      <pubDate>Mon, 17 Nov 2025 15:59:11 +0000</pubDate>
      <link>https://dev.to/pruna-ai/introducing-pruna-030-the-upgrade-youve-been-waiting-for-16hj</link>
      <guid>https://dev.to/pruna-ai/introducing-pruna-030-the-upgrade-youve-been-waiting-for-16hj</guid>
      <description>&lt;p&gt;Today, we are excited to announce that we have released the long-awaited &lt;a href="https://github.com/PrunaAI/pruna/releases/tag/v0.3.0" rel="noopener noreferrer"&gt;Pruna 0.3.0&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We’ve restructured our internal framework to make algorithm management more flexible and scalable, setting the stage for even more powerful algorithm support going forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why the Refactor&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In previous versions, certain algorithm groups — such as cachers or quantizers — were tightly coupled to the package’s structure. This rigid grouping made it difficult to introduce new types of algorithms or to combine them in flexible ways.&lt;/p&gt;

&lt;p&gt;Starting with Pruna 0.3.0, we’ve reworked this system so that such classifications are no longer hard constraints. Instead, they now serve as supplementary metadata, enabling a more modular, composable, and future-proof design. This refactor lays the groundwork for integrating new optimization techniques and custom pipelines without structural limitations.&lt;/p&gt;

&lt;p&gt;This is a ground-up refactor that enables two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instead of applying algorithms in a fixed way defined by their group, &lt;strong&gt;compression algorithms can be applied in flexible orders regardless of their group&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Instead of constraining one algorithm per group in the &lt;code&gt;SmashConfig&lt;/code&gt;, &lt;strong&gt;multiple algorithms from the same group can be combined&lt;/strong&gt; as long as they are marked as compatible.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What This Means for You
&lt;/h2&gt;

&lt;p&gt;You don’t need to do anything special — just upgrade to the new version and you’ll be ready to go!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;pruna&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once upgraded, everything will work out of the box. While we’ve &lt;strong&gt;slightly refined how configuration is defined&lt;/strong&gt; (for the better!), the old interface remains valid. You can find all the details in the next section.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A More Flexible Algorithm Interface&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This release introduces a more flexible configuration interface for algorithms.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Before&lt;/em&gt;&lt;/strong&gt;, you had to define your &lt;code&gt;SmashConfig&lt;/code&gt; step by step.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pruna&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SmashConfig&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SmashConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compiler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_compile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cacher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepcache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Now&lt;/em&gt;&lt;/strong&gt;, with this release, you can do it &lt;strong&gt;all in one line with a list of the algorithm names — faster and simpler&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pruna&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SmashConfig&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SmashConfig&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_compile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepcache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A More Flexible Hyperparameters Interface&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This release introduces a more flexible configuration interface for hyperparameters.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Before&lt;/em&gt;&lt;/strong&gt;, if you needed to specify algorithm parameters, you had to go through the tedious process of setting each one individually.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pruna&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SmashConfig&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SmashConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compiler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_compile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_compile_fullgraph&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_compile_mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max-autotune&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hqq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hqq_weight_bits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hqq_compute_dtype&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch.bfloat16&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Now&lt;/em&gt;&lt;/strong&gt;, you can use a &lt;strong&gt;dictionary-style configuration&lt;/strong&gt; to define detailed, per-algorithm parameters all at once.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pruna&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SmashConfig&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SmashConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hqq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weight_bits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compute_dtype&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch.bfloat16&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_compile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fullgraph&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max-autotune&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A More Flexible Algorithm Ordering and Compatibility&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Another major change is how the algorithm application order is determined.&lt;/p&gt;

&lt;p&gt;Previously, the execution sequence was dictated by the hierarchy of algorithm groups and a global ordering based on these groups. In 0.3.0, this has been replaced by a more atomic and declarative system: each algorithm now specifies its own compatibility rules and ordering constraints. When two algorithms are compatible, they also explicitly declare the order in which they can be executed.&lt;/p&gt;

&lt;p&gt;This makes the algorithm pipeline more self-organizing, robust to new extensions, and capable of resolving valid combinations dynamically.&lt;/p&gt;
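&lt;p&gt;To make the idea concrete, here is a hedged sketch of how pairwise ordering constraints can be resolved into a single execution order with a topological sort. The algorithm names and the &lt;code&gt;runs_before&lt;/code&gt; table are illustrative assumptions, not Pruna's internal data structures.&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical per-algorithm ordering constraints ("run X before Y"),
# mirroring the declarative rules described above. Illustrative only.
runs_before = {
    "hqq": ["torch_compile"],        # quantize before compiling
    "deepcache": ["torch_compile"],  # set up caching before compiling
}

def resolve_order(selected):
    """Turn pairwise constraints among the selected algorithms
    into one valid execution order."""
    # graph maps each node to the set of nodes that must run before it
    graph = {algo: set() for algo in selected}
    for algo, successors in runs_before.items():
        for succ in successors:
            if algo in graph and succ in graph:
                graph[succ].add(algo)
    return list(TopologicalSorter(graph).static_order())

print(resolve_order(["torch_compile", "hqq", "deepcache"]))
```

<p>Because each constraint is local to a pair of algorithms, adding a new algorithm only requires declaring its own rules; the pipeline order falls out of the sort.</p>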

&lt;h3&gt;
  
  
  New documentation
&lt;/h3&gt;

&lt;p&gt;To make sure you have everything you need, we’ve also updated our &lt;a href="https://docs.pruna.ai/en/stable/setup/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;. You can now easily find the latest guides and tutorials under the “Open Source” tab.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started Now
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Enjoy the Quality and Efficiency!&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Compress your own models with &lt;a href="https://github.com/PrunaAI/pruna" rel="noopener noreferrer"&gt;Pruna&lt;/a&gt; and give us a ⭐ to show your support!&lt;/li&gt;
&lt;li&gt;Try our endpoints in &lt;a href="https://replicate.com/prunaai" rel="noopener noreferrer"&gt;Replicate&lt;/a&gt;, &lt;a href="https://wiro.ai/models/wan-ai/wan2-2-ti2v-5b-text-to-video-fast" rel="noopener noreferrer"&gt;Wiro&lt;/a&gt; or &lt;a href="https://www.segmind.com/models/qwen-image-fast" rel="noopener noreferrer"&gt;Segmind&lt;/a&gt; with just one click.&lt;/li&gt;
&lt;li&gt;Stay up to date with the latest AI efficiency research on our &lt;a href="https://www.pruna.ai/blog" rel="noopener noreferrer"&gt;blog&lt;/a&gt;, explore our &lt;a href="https://github.com/PrunaAI/awesome-ai-efficiency" rel="noopener noreferrer"&gt;materials collection&lt;/a&gt;, or dive into our &lt;a href="https://github.com/PrunaAI/courses" rel="noopener noreferrer"&gt;courses&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation and stay updated in our &lt;a href="https://discord.com/invite/JFQmtFKCjd" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>news</category>
    </item>
    <item>
      <title>Effective Prompting for Generative Vision Models</title>
      <dc:creator>Sara Han</dc:creator>
      <pubDate>Mon, 10 Nov 2025 18:14:43 +0000</pubDate>
      <link>https://dev.to/pruna-ai/effective-prompting-for-generative-vision-models-1bpc</link>
      <guid>https://dev.to/pruna-ai/effective-prompting-for-generative-vision-models-1bpc</guid>
      <description>&lt;p&gt;It’s likely that you’ve used a vision model to generate an image recently, but ended up with somewhat questionable results. You might have blamed this on the model not working correctly (and maybe that’s true), but it could also be because you didn’t give it the proper instructions.&lt;/p&gt;

&lt;p&gt;A vision model will only create what it’s asked to, and how you ask matters. Prompting isn’t just about describing what you see; it’s about guiding the model so it interprets your request correctly. Sometimes changing a single word is enough to transform the result.&lt;/p&gt;

&lt;p&gt;In this blog, we’ll cover the key principles for prompting your vision models more effectively, from good practices to the nuances of different use cases. Whether you’re a developer, designer, marketer, or beginner, this guide will help you achieve the results you’re looking for.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Where to Test Your Prompts&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before diving into how vision prompting works, let’s first look at where we can put it to the test. In this case, we’ll be using several endpoints available on &lt;a href="https://replicate.com/" rel="noopener noreferrer"&gt;Replicate&lt;/a&gt;, which we’ve optimized with Pruna to make them cheaper, faster, and more efficient. All of Pruna’s models are available &lt;a href="https://replicate.com/prunaai" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
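
&lt;p&gt;If you prefer code over the web UI, the official &lt;code&gt;replicate&lt;/code&gt; Python client (&lt;code&gt;pip install replicate&lt;/code&gt;) can call these endpoints. The snippet below is only a minimal sketch; the model slug and parameter names are illustrative, so check each model’s page for its actual inputs:&lt;/p&gt;

```python
def build_input(prompt, aspect_ratio="1:1", seed=None):
    """Assemble the request payload; parameter names vary per model."""
    payload = {"prompt": prompt, "aspect_ratio": aspect_ratio}
    if seed is not None:
        payload["seed"] = seed
    return payload

# To actually run a model (network call, needs REPLICATE_API_TOKEN):
#   import replicate
#   output = replicate.run(
#       "prunaai/some-model",  # pick a real slug from replicate.com/prunaai
#       input=build_input("a watercolor fox in a misty forest", seed=42),
#   )
```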

&lt;h2&gt;
  
  
  &lt;strong&gt;Prompting Good Practices&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;While there are nuances that can be applied to each use case, there are also several key principles that should always be kept in mind when prompting a model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Give direction:&lt;/strong&gt; State the goal, task, context, or desired style.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Be clear:&lt;/strong&gt; Use precise, unambiguous language. You don’t need to describe every detail, just select the key words that matter most.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Split the work:&lt;/strong&gt; If the goal is complex, break the prompt down into several chained steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provide examples:&lt;/strong&gt; If possible, include an example and reference it in your prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tune your prompts:&lt;/strong&gt; Always review the output and refine your prompts based on the results to get better responses. Using a grid can be helpful.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Know your model:&lt;/strong&gt; Review the model’s documentation or description. Some models support tags, parameters, or specific input formats that can significantly improve performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frlzll8x1nitbwncyxkcb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frlzll8x1nitbwncyxkcb.png" alt="comparison-good-bad-prompt" width="800" height="614"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompting in Practice
&lt;/h2&gt;

&lt;h3&gt;
  
  
  From Words to Pictures
&lt;/h3&gt;

&lt;p&gt;For image generation, you can craft the perfect prompt following a default structure: &lt;strong&gt;Subject + Subject’s Action + Style + Context.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subject:&lt;/strong&gt; What is the focus of your image? It should be the main element (person, object, animal, or scene).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subject’s Action:&lt;/strong&gt; What’s the subject doing? It should describe what the subject is doing or how it interacts with the environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Style:&lt;/strong&gt; How is the image presented? It should specify the artistic direction or medium.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context:&lt;/strong&gt; How and where is it happening? It should include the background, lighting, atmosphere, mood, point of view, or colors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When writing the prompt, make sure each element is descriptive and focused only on the specific element you want to generate, avoiding contradictions. If it's abstract or vague, it can lead to unpredictable results. For example, a prompt like “The best thing you can draw” is too ambiguous and might not produce anything appealing or coherent. Similarly, simply copying and pasting random text from the internet won’t work well — the model will struggle to extract a clear meaning or visual direction from it.&lt;/p&gt;
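
&lt;p&gt;If you build prompts programmatically, the structure above maps naturally onto a tiny helper. This is just an illustrative sketch; the same pattern extends with extra slots for the video and editing structures covered below:&lt;/p&gt;

```python
def build_image_prompt(subject, action, style, context):
    """Compose a text-to-image prompt as Subject + Action + Style + Context,
    skipping any slot left empty."""
    parts = [subject, action, style, context]
    return ", ".join(part.strip() for part in parts if part)

prompt = build_image_prompt(
    "a red fox",
    "leaping over a frozen stream",
    "watercolor illustration",
    "soft morning light, muted colors, low angle",
)
```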

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5j2cadhhbcjztdfh9etr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5j2cadhhbcjztdfh9etr.png" alt="text-to-image" width="800" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  From Text or Image to Video
&lt;/h3&gt;

&lt;p&gt;For video generation, we can use a structure similar to the one for image generation. However, some extra aspects should be considered: &lt;strong&gt;Subject + Subject’s Action + Environment + Shot Type + Style + Context.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subject:&lt;/strong&gt; Who or what is the main focus of your video? It should be the main element of your scene (person, object, animal).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subject’s Action:&lt;/strong&gt; What’s the subject doing? It should describe what the subject is doing or how it interacts with the environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment:&lt;/strong&gt; Where is it happening? It should include the scene details surrounding the subject.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shot Type&lt;/strong&gt;: What’s the camera’s perspective or movement? It should describe the angle, trajectory, movement, and speed of the camera.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Style:&lt;/strong&gt; How is the video presented? It should specify the artistic direction or medium.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context:&lt;/strong&gt; How is it happening? It should include the background, lighting, atmosphere, mood, point of view, or colors.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Editing Images
&lt;/h3&gt;

&lt;p&gt;For image editing, we should introduce a new prompt structure: &lt;strong&gt;Task + Target + Edit Type + Preservation.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Task:&lt;/strong&gt; What do you want to accomplish? It should define the main goal of the edit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Target:&lt;/strong&gt; What specific element should be edited? It should identify the subject or area to modify.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edit Type:&lt;/strong&gt; How should the change be applied? It should describe the method, intensity, or style of the edit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preservation:&lt;/strong&gt; What should remain unchanged? It should specify which parts of the image mustn’t change.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77frqfo6g7zfgrjr67g1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77frqfo6g7zfgrjr67g1.png" alt="image-editing" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  More Considerations
&lt;/h2&gt;

&lt;p&gt;On one hand, even though most vision models have recently improved — with greater care taken in training data and design — different biases can persist. That’s why, when prompting, it’s important not to reinforce them. You can mitigate this by evaluating the outputs to ensure diversity and representation, and by providing more context and detail.&lt;/p&gt;

&lt;p&gt;On the other hand, prompting in vision models raises a range of ethical questions that go beyond bias. Therefore, it’s essential to consider factors such as consent, authorship, data protection, and manipulation when using them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Next
&lt;/h2&gt;

&lt;p&gt;This blog post gave you a structured and straightforward guide to prompting vision models, so you can generate an image or video, or edit an existing one to suit your needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Enjoy the Quality and Efficiency!&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Want to take it further?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compress your own models with &lt;a href="https://github.com/PrunaAI/pruna" rel="noopener noreferrer"&gt;Pruna&lt;/a&gt; and give us a ⭐ to show your support!&lt;/li&gt;
&lt;li&gt;Stay up to date with the latest AI efficiency research on our &lt;a href="https://www.pruna.ai/blog" rel="noopener noreferrer"&gt;blog&lt;/a&gt;, explore our &lt;a href="https://github.com/PrunaAI/awesome-ai-efficiency" rel="noopener noreferrer"&gt;materials collection&lt;/a&gt;, or dive into our &lt;a href="https://github.com/PrunaAI/courses" rel="noopener noreferrer"&gt;courses&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation and stay updated in our &lt;a href="https://discord.com/invite/JFQmtFKCjd" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>promptengineering</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Making AI Models Faster, Cheaper, and Greener — Here’s How</title>
      <dc:creator>Sara Han</dc:creator>
      <pubDate>Mon, 03 Nov 2025 13:11:27 +0000</pubDate>
      <link>https://dev.to/pruna-ai/making-ai-models-faster-cheaper-and-greener-heres-how-58le</link>
      <guid>https://dev.to/pruna-ai/making-ai-models-faster-cheaper-and-greener-heres-how-58le</guid>
      <description>&lt;p&gt;In this blog, we present the key techniques to gain AI efficiency, meaning models that are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faster&lt;/strong&gt;: Accelerate inference times through advanced optimization techniques&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smaller&lt;/strong&gt;: Reduce model size while maintaining quality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cheaper&lt;/strong&gt;: Lower computational costs and resource requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Greener&lt;/strong&gt;: Decrease energy consumption and environmental impact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this, Pruna provides an open-source toolkit that simplifies scalable inference, requiring just a few lines of code to optimize your models in each of the mentioned aspects.&lt;/p&gt;

&lt;p&gt;So first, let’s take a quick look at an overview of these techniques, and then we’ll dive deeper into each one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization Techniques
&lt;/h2&gt;

&lt;p&gt;To get started, we created a high-level overview of the different techniques implemented in Pruna. This list can be further enriched; however, it provides a solid basis for your understanding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9fiuidkbpq8up9hsva0i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9fiuidkbpq8up9hsva0i.png" alt="Diagram with optimization techniques" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Impacts&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Batching&lt;/td&gt;
&lt;td&gt;Groups multiple inputs together to be processed simultaneously, improving computational efficiency and reducing overall processing time.&lt;/td&gt;
&lt;td&gt;Speed (✅), Memory (❌), Accuracy (~)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caching&lt;/td&gt;
&lt;td&gt;Stores intermediate results of computations to speed up subsequent operations, reducing inference time by reusing previously computed results.&lt;/td&gt;
&lt;td&gt;Speed (✅), Memory (⚠️), Accuracy (~)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speculative Decoding&lt;/td&gt;
&lt;td&gt;Speculative decoding speeds up AI text generation by having a small, fast model predict several tokens at once, which a larger model then verifies, creating an efficient parallel workflow.&lt;/td&gt;
&lt;td&gt;Speed (✅), Memory (❌), Accuracy (⚠️)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compilation&lt;/td&gt;
&lt;td&gt;Compilation optimizes the model with instructions for specific hardware.&lt;/td&gt;
&lt;td&gt;Speed (✅), Memory (➖), Accuracy (~)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distillation&lt;/td&gt;
&lt;td&gt;Trains a smaller, simpler model to mimic a larger, more complex model.&lt;/td&gt;
&lt;td&gt;Speed (✅), Memory (✅), Accuracy (❌)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quantization&lt;/td&gt;
&lt;td&gt;Reduces the precision of weights and activations, lowering memory requirements.&lt;/td&gt;
&lt;td&gt;Speed (✅), Memory (✅), Accuracy (❌)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pruning&lt;/td&gt;
&lt;td&gt;Removes less important or redundant connections and neurons, resulting in a sparser, more efficient network.&lt;/td&gt;
&lt;td&gt;Speed (✅), Memory (✅), Accuracy (❌)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recovering&lt;/td&gt;
&lt;td&gt;Restores the performance of a model after compression.&lt;/td&gt;
&lt;td&gt;Speed (⚠️), Memory (⚠️), Accuracy (🟢)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;✅/🟢(improves), ➖(stays the same), ~/⚠️(could worsen), ❌(worsens)&lt;/p&gt;

&lt;h3&gt;
  
  
  Technique requirements and constraints
&lt;/h3&gt;

&lt;p&gt;Before we continue, note that each of these techniques and their underlying algorithms has specific requirements and constraints. Some can only be applied on specific hardware, like GPUs, or to specific model types, like LLMs or image generation models. Others might require a tokenizer, processor, or dataset to function. Lastly, not all techniques can be combined with one another, so there are compatibility limitations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Optimization Techniques
&lt;/h2&gt;

&lt;p&gt;We will now dive a bit deeper into each optimization technique and its underlying algorithms. We’ll keep it high level rather than go into the nitty-gritty details, and for each technique we’ll highlight one fundamental algorithm that has been implemented in the Pruna library.&lt;/p&gt;

&lt;h3&gt;
  
  
  Batching AI model inference
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhqey6iqnxjtf2crwzw5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhqey6iqnxjtf2crwzw5.png" alt="Comparison of individual requests, dynamic batching and continuous batching" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Batching groups multiple inputs together to be processed simultaneously, improving computational efficiency and reducing overall processing time. Instead of processing one prompt at a time, the GPU processes multiple prompts in parallel, maximizing hardware utilization. This significantly increases throughput since modern GPUs are designed for parallel computation. Batching reduces the per-example computational overhead and allows for better distribution of fixed costs across multiple inputs, thus often increasing the throughput.&lt;/p&gt;

&lt;p&gt;For batching, &lt;a href="https://docs.pruna.ai/en/stable/compression.html#whisper-s2t" rel="noopener noreferrer"&gt;we implemented WhisperS2T&lt;/a&gt;, which works on top of Whisper models. It intelligently batches smaller speech segments and is designed to be faster than other implementations, boasting a &lt;strong&gt;2.3X speed improvement over &lt;a href="https://github.com/m-bain/whisperX/tree/main" rel="noopener noreferrer"&gt;WhisperX&lt;/a&gt; and a 3X speed boost compared to &lt;a href="https://huggingface.co/openai/whisper-large-v2" rel="noopener noreferrer"&gt;HuggingFace Pipeline&lt;/a&gt; with FlashAttention 2 (&lt;a href="https://github.com/Vaibhavs10/insanely-fast-whisper" rel="noopener noreferrer"&gt;Insanely Fast Whisper&lt;/a&gt;)&lt;/strong&gt;.&lt;/p&gt;
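
&lt;p&gt;The core idea behind batching can be sketched in a few lines of plain Python, assuming a hypothetical &lt;code&gt;model_fn&lt;/code&gt; that accepts a list of inputs:&lt;/p&gt;

```python
def chunked(items, batch_size):
    """Split items into consecutive batches of at most batch_size."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def run_batched(model_fn, inputs, batch_size=8):
    """Run model_fn once per batch instead of once per input,
    amortizing the fixed per-call overhead across the whole batch."""
    outputs = []
    for batch in chunked(inputs, batch_size):
        outputs.extend(model_fn(batch))
    return outputs
```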

&lt;h3&gt;
  
  
  Caching intermediate results
&lt;/h3&gt;

&lt;p&gt;Caching stores intermediate results of computations to speed up subsequent operations, reducing inference time by reusing previously computed results. For transformer-based LLMs, this typically involves storing key-value pairs from previous tokens to avoid redundant computation. When generating text token by token, each new token can reuse cached computations from previous tokens rather than recomputing the entire sequence. This dramatically improves inference efficiency, especially for long-context applications. However, caching goes beyond only saving KV computations and can be used in multiple places for LLMs and image generation models.&lt;/p&gt;

&lt;p&gt;For caching, &lt;a href="https://docs.pruna.ai/en/stable/compression.html#deepcache" rel="noopener noreferrer"&gt;we implemented DeepCache&lt;/a&gt;, which works on top of diffuser models. DeepCache accelerates inference by leveraging the U-Net blocks of diffusion pipelines to reuse cached high-level features. The nice thing is that it is training-free and almost lossless, while accelerating models 2X to 5X.&lt;/p&gt;
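
&lt;p&gt;Here’s a deliberately simplified illustration of the principle: compute each token’s expensive intermediate representation once, then reuse it. Real KV caching stores per-token key/value tensors inside the attention layers, but the bookkeeping looks similar:&lt;/p&gt;

```python
class StepCache:
    """Toy cache of intermediate results: each token's 'expensive'
    representation is computed once and reused on later lookups."""
    def __init__(self, compute_fn):
        self.compute_fn = compute_fn
        self.store = {}
        self.misses = 0

    def get(self, token):
        if token not in self.store:
            self.misses += 1  # only a miss triggers real computation
            self.store[token] = self.compute_fn(token)
        return self.store[token]

def encode_sequence(cache, tokens):
    """Encode a sequence, reusing cached work for repeated tokens."""
    return [cache.get(t) for t in tokens]
```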

&lt;h3&gt;
  
  
  Speculative decoding with parallelizing generation
&lt;/h3&gt;

&lt;p&gt;Speculative decoding improves the efficiency of language model inference by parallelizing parts of the generation process. Instead of generating one token at a time, a smaller, faster draft model generates multiple candidate tokens in a single forward pass. The larger, more accurate model then verifies or corrects these tokens in parallel, allowing for faster token generation without significantly sacrificing output quality. This approach reduces the number of sequential steps required, thereby lowering overall latency and accelerating inference. It’s essential to note that the effectiveness of speculative decoding depends on the alignment between the draft and target models, as well as the chosen parameters, such as batch size and verification strategy.&lt;/p&gt;

&lt;p&gt;For speculative decoding, we have not implemented any algorithms. Yet! Stay tuned to discover our future speculative decoding algorithms.&lt;/p&gt;
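
&lt;p&gt;In the meantime, the general draft-then-verify loop is easy to sketch with toy “models” (plain functions that return the next token). Note that a real implementation verifies all drafted tokens in a single parallel forward pass rather than one by one:&lt;/p&gt;

```python
def speculative_decode(draft_next, target_next, prompt, num_tokens, k=4):
    """Toy draft-then-verify loop: a fast draft model proposes up to k
    tokens, and the target model keeps the longest agreeing prefix."""
    seq = list(prompt)
    target_len = len(prompt) + num_tokens
    while len(seq) != target_len:
        step = min(k, target_len - len(seq))
        # 1) the cheap draft model speculates several tokens ahead
        proposal = []
        for _ in range(step):
            proposal.append(draft_next(seq + proposal))
        # 2) the target model verifies; on the first disagreement it
        #    substitutes its own token, so progress is always made
        accepted = []
        for token in proposal:
            expected = target_next(seq + accepted)
            if token == expected:
                accepted.append(token)
            else:
                accepted.append(expected)
                break
        seq.extend(accepted)
    return seq
```

&lt;p&gt;When the draft model agrees often, several tokens are accepted per target-model step, which is where the speedup comes from.&lt;/p&gt;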

&lt;h3&gt;
  
  
  Compilation for specific hardware
&lt;/h3&gt;

&lt;p&gt;Compilation optimizes the model for specific hardware by translating the high-level model operations into low-level hardware instructions. Compilers like NVIDIA TensorRT, Apache TVM, or Google XLA analyze the computational graph, fuse operations where possible, and generate optimized code for the target hardware. This process eliminates redundant operations, reduces memory transfers, and leverages hardware-specific acceleration features, resulting in faster inference times and lower latency. It is essential to note that each combination of model/hardware will have a different optimal compilation setup.&lt;/p&gt;

&lt;p&gt;For compilation, &lt;a href="https://docs.pruna.ai/en/stable/compression.html#stable-fast" rel="noopener noreferrer"&gt;we implemented Stable-fast&lt;/a&gt;, which works on top of diffuser models. Stable-fast is an optimization framework for Image-Gen models. It accelerates inference by fusing key operations into optimized kernels and converting diffusion pipelines into efficient TorchScript graphs.&lt;/p&gt;
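
&lt;p&gt;One key trick compilers use, operation fusion, can be illustrated in plain Python: instead of materializing an intermediate buffer after every elementwise op, a fused kernel applies the whole chain in one pass over the data. This is only a toy model of what TensorRT or &lt;code&gt;torch.compile&lt;/code&gt; do on real computation graphs:&lt;/p&gt;

```python
def fuse_elementwise(ops):
    """Toy 'compiler' pass: fuse a chain of elementwise ops into one
    function, so intermediate buffers are never materialized."""
    def fused(x):
        out = []
        for value in x:          # one loop over the data ...
            for op in ops:       # ... applying every op "in registers"
                value = op(value)
            out.append(value)
        return out
    return fused

# Unfused baseline: each op writes a full intermediate list,
# costing extra memory traffic.
def run_unfused(ops, x):
    for op in ops:
        x = [op(v) for v in x]
    return x
```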

&lt;h3&gt;
  
  
  Distillation for smaller models
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5r8eto8sq076c78no4f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5r8eto8sq076c78no4f.png" alt="Diagram showing the distillation process" width="800" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Distillation trains a smaller, simpler model to mimic a larger, more complex model. The larger “teacher” model produces outputs that the smaller “student” model learns to replicate, effectively transferring knowledge while reducing computational requirements. This technique preserves much of the performance and capabilities of larger models while significantly reducing parameter count, memory usage, and inference time. Distillation can target specific capabilities of interest rather than general performance.&lt;/p&gt;

&lt;p&gt;For distillation, &lt;a href="https://docs.pruna.ai/en/stable/compression.html#hyper-pro" rel="noopener noreferrer"&gt;we implemented Hyper-SD&lt;/a&gt;, which works on top of diffusion models. Hyper-SD is a distillation framework that segments the diffusion process into time-step groups to preserve and reformulate the ODE trajectory. By integrating human feedback and score distillation, it enables near-lossless performance with drastically fewer inference steps.&lt;/p&gt;
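
&lt;p&gt;At the heart of classic distillation is a soft-target loss: the student is trained to match the teacher’s temperature-softened output distribution. Here is a minimal sketch of that loss (unrelated to Hyper-SD’s specific trajectory-preserving formulation):&lt;/p&gt;

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened
    distribution; the softening exposes the teacher's 'dark knowledge'
    about relative class similarities."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))
```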

&lt;h3&gt;
  
  
  Quantization for lower precision
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhz7o372a644ny1dsuta.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhz7o372a644ny1dsuta.png" alt="Representation of the quantization process" width="500" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Quantization reduces the precision of weights and activations, lowering memory requirements by converting high-precision floating-point numbers (FP32/FP16) to lower-precision formats (INT8/INT4). It reduces model size, memory bandwidth requirements, and computational complexity. Modern quantization techniques, such as Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), minimize accuracy loss while achieving substantial efficiency gains. Hardware accelerators often have specialized support for low-precision arithmetic, further enhancing performance.&lt;/p&gt;

&lt;p&gt;For quantization, &lt;a href="https://docs.pruna.ai/en/stable/compression.html#hqq" rel="noopener noreferrer"&gt;we implemented Half-Quadratic Quantization (HQQ)&lt;/a&gt;, which works on top of any model. HQQ utilizes fast and robust optimization techniques for on-the-fly quantization, eliminating the need for calibration data and making it applicable to any model. In Pruna, the algorithm has additionally been adapted for diffuser models.&lt;/p&gt;
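
&lt;p&gt;The basic mechanics of quantization are easy to demonstrate. Here is a toy symmetric INT8 scheme, much simpler than HQQ’s optimization-based approach, and assuming at least one nonzero weight:&lt;/p&gt;

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in
    [-127, 127] using a single scale factor (assumes a nonzero weight)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; the error is at most scale / 2."""
    return [qi * scale for qi in q]
```

&lt;p&gt;Storing the integers plus one scale uses roughly a quarter of the memory of FP32 weights, at the cost of a small rounding error.&lt;/p&gt;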

&lt;h3&gt;
  
  
  Pruning away redundant neurons
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd67yn8rgfycyzplzg8sh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd67yn8rgfycyzplzg8sh.png" alt="Representation of the pruning process" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pruning removes less important or redundant connections and neurons, resulting in a sparser, more efficient network. Various pruning strategies exist, including magnitude-based pruning (removing the smallest weights) and lottery ticket hypothesis approaches (finding sparse subnetworks). Key design choices typically involve deciding which structure to prune (e.g., weight, neuron, blocks) and determining how to score structures (e.g., using weight magnitude, first-order, or second-order information). Pruning can significantly reduce model size (often by 80-90%) with minimal performance degradation when done carefully. Sparse models require specialized hardware or software support to realize computational gains.&lt;/p&gt;

&lt;p&gt;For pruning, &lt;a href="https://docs.pruna.ai/en/stable/compression.html#torch-structured" rel="noopener noreferrer"&gt;we implemented structured pruning&lt;/a&gt;, which works on top of any model. Structured pruning removes entire units like neurons, channels, or filters from a network, leading to a more compact and computationally efficient model while preserving a regular structure that standard hardware can easily optimize. &lt;/p&gt;
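
&lt;p&gt;The simplest scoring strategy, magnitude-based pruning, fits in a few lines. This unstructured toy version zeroes individual weights, whereas the structured pruner described above removes whole neurons or channels:&lt;/p&gt;

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with the smallest
    magnitude, keeping the rest untouched."""
    k = int(len(weights) * sparsity)           # how many weights to drop
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(ranked[:k])                  # indices of smallest weights
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]
```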

&lt;h3&gt;
  
  
  Recovering performance with training
&lt;/h3&gt;

&lt;p&gt;Recovering is special in that it improves, rather than reduces, a compressed model’s performance: after compression, it restores performance through techniques like finetuning or retraining. After aggressive pruning, models typically experience some degradation, which additional training steps can mitigate. This recovery phase allows the remaining parameters to adapt and compensate for the compression. Approaches for efficient recovery include learning rate rewinding, weight rewinding, and gradual pruning with recovery steps between pruning iterations. The recovery process helps achieve optimal trade-offs between model size and performance.&lt;/p&gt;

&lt;p&gt;For recovering, &lt;a href="https://docs.pruna.ai/en/stable/compression.html#text-to-text-perp-pro" rel="noopener noreferrer"&gt;we implemented text-to-text PERP&lt;/a&gt;, which works on top of text generation models. This recoverer is a general-purpose &lt;a href="https://arxiv.org/pdf/2312.15230" rel="noopener noreferrer"&gt;PERP recoverer&lt;/a&gt; for text-to-text models using norm, head, and bias finetuning and optionally HuggingFace’s LoRA. Similarly, we support text-to-image PERP for other image generation models.&lt;/p&gt;
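
&lt;p&gt;A toy example shows why recovery works: in the model below, two redundant parameters share the work, so after one of them is pruned to zero, a few gradient steps let the surviving parameter compensate. This is a deliberately tiny stand-in for finetuning methods like PERP:&lt;/p&gt;

```python
def loss(a, b, data):
    """Mean squared error of the two-parameter model y = a*x + b*x."""
    return sum((a * x + b * x - y) ** 2 for x, y in data) / len(data)

def recover(a, data, lr=0.05, steps=100):
    """Finetune the surviving parameter `a` (pruned `b` stays frozen
    at 0) with plain gradient descent, restoring the lost accuracy."""
    for _ in range(steps):
        grad = sum(2 * (a * x - y) * x for x, y in data) / len(data)
        a = a - lr * grad
    return a
```

&lt;p&gt;With data generated as &lt;code&gt;y = 2x&lt;/code&gt; and an initial split of &lt;code&gt;a = 1.5, b = 0.5&lt;/code&gt;, pruning &lt;code&gt;b&lt;/code&gt; hurts the loss, but recovery drives &lt;code&gt;a&lt;/code&gt; back toward 2 and the loss back toward zero.&lt;/p&gt;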

&lt;h2&gt;
  
  
  What’s next?
&lt;/h2&gt;

&lt;p&gt;This blog provided a brief introduction to each of these categories, but there are many more nuances, techniques, and implementations that we will highlight in upcoming blogs. The cool thing is that each of these techniques has been implemented in the open-source Pruna library and is ready for you to experiment with! &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Enjoy the Quality and Efficiency!&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Want to take it further?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compress your own models with &lt;a href="https://github.com/PrunaAI/pruna" rel="noopener noreferrer"&gt;Pruna&lt;/a&gt; and give us a ⭐ to show your support!&lt;/li&gt;
&lt;li&gt;Explore our &lt;a href="https://github.com/PrunaAI/awesome-ai-efficiency" rel="noopener noreferrer"&gt;materials collection&lt;/a&gt;, or dive into our &lt;a href="https://github.com/PrunaAI/courses" rel="noopener noreferrer"&gt;courses&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation and stay updated in our &lt;a href="https://discord.com/invite/JFQmtFKCjd" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; community.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>efficiency</category>
      <category>opensource</category>
    </item>
    <item>
      <title>The AI efficiency framework from Pruna AI is now open-source</title>
      <dc:creator>Bertrand Charpentier</dc:creator>
      <pubDate>Thu, 20 Mar 2025 12:24:56 +0000</pubDate>
      <link>https://dev.to/pruna-ai/the-ai-efficiency-framework-from-pruna-ai-is-now-open-source-46nc</link>
      <guid>https://dev.to/pruna-ai/the-ai-efficiency-framework-from-pruna-ai-is-now-open-source-46nc</guid>
<description>&lt;p&gt;I am Bertrand from Pruna AI. Together with John, Rayan, and Stephan, I created Pruna AI to tackle challenges in AI model optimization. We’re a group of researchers in AI efficiency and reliability, originally from TUM.&lt;/p&gt;

&lt;p&gt;Since we received so many questions about how the compression of AI models works under the hood, we decided to open-source the &lt;a href="https://github.com/PrunaAI/pruna" rel="noopener noreferrer"&gt;&lt;code&gt;pruna&lt;/code&gt; package&lt;/a&gt; with the help of the whole Pruna AI team. As a whole, the &lt;code&gt;pruna&lt;/code&gt; package is an AI efficiency framework that can be installed with &lt;code&gt;pip install pruna&lt;/code&gt; to compress models, and thus save memory and compute when running AI models for inference.&lt;/p&gt;

&lt;p&gt;With open-sourcing, people can now inspect and contribute to the code. Beyond the code, we provide a detailed readme, tutorials, benchmarks, and &lt;a href="https://docs.pruna.ai/en/stable/index.html" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; to make the compression, evaluation, and saving/loading/serving of AI models transparent.&lt;/p&gt;

&lt;p&gt;Beyond the open-source package, we commercially offer &lt;code&gt;pruna_pro&lt;/code&gt; with advanced compression methods, recovery methods, and an optimization agent to unlock greater efficiency and productivity gains.&lt;/p&gt;

&lt;p&gt;We are pleased to share this with you all. We would be glad to hear your thoughts and questions in the comments :)&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
