Overview

Shreevari SP

xiph / rav1e

The fastest and safest AV1 encoder.

Overview

rav1e is an AV1 video encoder. It is designed to eventually cover all use cases, though in its current form it is most suitable for cases where libaom (the reference encoder) is too slow.

Features

  • Intra, inter, and switch frames
  • 64x64 superblocks
  • 4x4 to 64x64 RDO-selected square and 2:1/1:2 rectangular blocks
  • DC, H, V, Paeth, smooth, and all directional prediction modes
  • DCT, (FLIP-)ADST and identity transforms (up to 64x64, 16x16 and 32x32 respectively)
  • 8-, 10- and 12-bit depth color
  • 4:2:0 (full support), 4:2:2 and 4:4:4 (limited) chroma sampling
  • 11 speed settings (0-10)
  • Near real-time encoding at high speed levels
  • Constant quantizer and target bitrate (single- and multi-pass) encoding modes
  • Still picture mode

If you want to know what rav1e is, the project README above is a good place to start.

What is?

  • Rate: The number of bits the encoder spends on the bitstream output.
  • Distortion: Deviation of the reconstructed image from the original.
  • RD cost: Rate-distortion cost, a single number that combines bitrate and quality loss. This is the trade-off we want to optimize.
  • Lambda: Lagrangian multiplier that weights rate against distortion in the RD cost.
  • tx: Transform
  • CDEF: Constrained Directional Enhancement Filter
  • LRF: Loop Restoration Filter

Why Rate-Distortion Optimization?

rav1e uses rate-distortion optimization (RDO) to find the best trade-off between compression and quality: for each decision, it picks the option with the lowest RD cost.


Mechanism

An exhaustive search for the best RD cost across all tools and their combinations is infeasible, because the search space grows exponentially with every dimension that a tool adds.
Many tools are pruned based on the speed settings and some tools like motion search are approximate by default.
But at the slowest speed setting, there is a search among nearly all available tools in rav1e.
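
To make the growth concrete, here is a minimal sketch in Rust. It assumes that every tool contributes an independent list of options, so the number of combinations is the product of the option counts. The function name `search_space` and all of the counts are made up for illustration and do not come from rav1e.

```rust
/// Number of combinations an exhaustive search would have to evaluate,
/// assuming every dimension (tool) is searched independently.
/// Hypothetical helper, not a rav1e function.
fn search_space(options_per_dimension: &[u64]) -> u64 {
    options_per_dimension.iter().product()
}

fn main() {
    // Illustrative counts only: three tools with 4, 3 and 5 options.
    let full = search_space(&[4, 3, 5]); // 4 * 3 * 5 = 60
    // Adding one more tool with 4 options multiplies the work by 4.
    let with_extra_tool = search_space(&[4, 3, 5, 4]); // 240
    // Pruning some options per tool (as a faster speed setting might)
    // shrinks the product again.
    let pruned = search_space(&[2, 3, 2, 4]); // 48

    println!("full: {}", full);
    println!("with extra tool: {}", with_extra_tool);
    println!("pruned: {}", pruned);
}
```

Every new dimension multiplies the total rather than adding to it, which is why pruning even a few options per tool pays off quickly.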


How to calculate RD cost?

During the encoding process, we measure RD cost using distortion.

cost = distortion + lambda * rate

During the post-processing filter stage, we measure RD cost using error.

cost = error + lambda * rate
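
As a worked example of the formula, the sketch below computes the cost for two hypothetical candidates and shows how lambda shifts the decision. The function `rd_cost`, the use of `f64`, and all the numbers are assumptions for illustration; rav1e's internal types and scaling may differ.

```rust
/// cost = distortion + lambda * rate
/// (the same shape applies with `error` in the filter stage).
/// Hypothetical helper, not rav1e's actual signature.
fn rd_cost(distortion: f64, lambda: f64, rate_bits: f64) -> f64 {
    distortion + lambda * rate_bits
}

fn main() {
    // Candidate A: low distortion, but expensive in bits.
    // Candidate B: higher distortion, but cheap in bits.
    let (a_dist, a_rate) = (100.0, 400.0);
    let (b_dist, b_rate) = (180.0, 200.0);

    // With a larger lambda, bits are expensive, so B wins.
    let lambda = 0.5;
    let a = rd_cost(a_dist, lambda, a_rate); // 100 + 0.5 * 400 = 300
    let b = rd_cost(b_dist, lambda, b_rate); // 180 + 0.5 * 200 = 280
    println!("lambda = {}: A = {}, B = {}", lambda, a, b);

    // With a smaller lambda, quality matters more, so A wins.
    let lambda = 0.25;
    let a = rd_cost(a_dist, lambda, a_rate); // 100 + 0.25 * 400 = 200
    let b = rd_cost(b_dist, lambda, b_rate); // 180 + 0.25 * 200 = 230
    println!("lambda = {}: A = {}, B = {}", lambda, a, b);
}
```

Lambda acts as the exchange rate between bits and distortion: raising it favors cheaper candidates, lowering it favors more accurate ones.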

What do we optimize for the lowest RD cost?

rdo_mode_decision:

  • Inter modes
  • Inter Compound modes
  • Tx type
  • Tx size
  • Intra modes

luma_chroma_mode_rdo:

  • Chroma mode RDO for predicted luma

rdo_partition_decision:

  • Partition decision

rdo_loop_decision:

  • CDEF + LRF combination
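
All of these decisions share the same shape: evaluate a set of candidates and keep the one with the lowest RD cost. The sketch below shows that pattern in isolation. The `Candidate` struct, `best_candidate`, and the distortion and rate numbers are hypothetical and are not taken from rav1e's source; the real functions work on much richer state.

```rust
/// One option under consideration (for example, an intra mode).
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: &'static str,
    distortion: f64,
    rate_bits: f64,
}

impl Candidate {
    /// cost = distortion + lambda * rate
    fn rd_cost(&self, lambda: f64) -> f64 {
        self.distortion + lambda * self.rate_bits
    }
}

/// Returns the candidate with the lowest RD cost, or None if there are none.
fn best_candidate(candidates: &[Candidate], lambda: f64) -> Option<&Candidate> {
    candidates
        .iter()
        .min_by(|a, b| a.rd_cost(lambda).total_cmp(&b.rd_cost(lambda)))
}

/// Made-up measurements for three intra modes.
fn example_candidates() -> Vec<Candidate> {
    vec![
        Candidate { name: "DC", distortion: 500.0, rate_bits: 20.0 },
        Candidate { name: "V", distortion: 300.0, rate_bits: 60.0 },
        Candidate { name: "Paeth", distortion: 280.0, rate_bits: 120.0 },
    ]
}

fn main() {
    let candidates = example_candidates();
    // With lambda = 2.0 the costs are DC = 540, V = 420, Paeth = 520.
    if let Some(best) = best_candidate(&candidates, 2.0) {
        println!("best mode: {} (cost {})", best.name, best.rd_cost(2.0));
    }
}
```

In this example "V" wins at lambda = 2.0 even though "Paeth" has the lowest distortion, because the extra bits that "Paeth" needs outweigh its quality gain.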

Overview of a tile encode

Description

Each tile is encoded independently of the other tiles. RD optimization appears throughout the encoding process: even the post-processing filters are chosen with RD costs taken into consideration.

Steps

1. Build coarse motion vectors of the tile for INTER blocks only.
2. For each superblock in tile:
    1. Motion estimation: Using either Diamond search or Full search on subsampled blocks.
    2. encode_partition_bottomup / encode_partition_topdown. 
    3. rdo_loop_decision.
3. CDEF and LRF.
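
The order of the steps above can be sketched as a small control-flow outline. This is not rav1e code: the `Stage` enum and `encode_tile_outline` are hypothetical, and the `bottom_up` flag is an assumed switch between `encode_partition_bottomup` and `encode_partition_topdown`. The sketch only records which stage runs when.

```rust
/// Hypothetical record of the stages visited while encoding one tile.
#[derive(Debug, PartialEq)]
enum Stage {
    CoarseMotionVectors,
    MotionEstimation { sb: usize },
    EncodePartition { sb: usize, bottom_up: bool },
    RdoLoopDecision { sb: usize },
    CdefAndLrf,
}

/// Lists the stages for a tile with `superblocks` superblocks, in order.
/// `bottom_up` is an assumption: how the real encoder picks between the
/// two partition functions is not covered here.
fn encode_tile_outline(superblocks: usize, bottom_up: bool) -> Vec<Stage> {
    // Step 1: coarse motion vectors for the whole tile.
    let mut stages = vec![Stage::CoarseMotionVectors];
    // Step 2: per-superblock work.
    for sb in 0..superblocks {
        stages.push(Stage::MotionEstimation { sb });
        stages.push(Stage::EncodePartition { sb, bottom_up });
        stages.push(Stage::RdoLoopDecision { sb });
    }
    // Step 3: CDEF and LRF.
    stages.push(Stage::CdefAndLrf);
    stages
}

fn main() {
    for stage in encode_tile_outline(2, true) {
        println!("{:?}", stage);
    }
}
```

For a tile with two superblocks this yields eight stages: one tile-level step, three steps per superblock, and the final filter step.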

Illustration

Figure: An overview of how a tile is encoded in rav1e.

Coming up next...

  • RDO for modes
  • RDO during partitioning
