The Versatile Video Coding (VVC / H.266) standard was released in July 2020 as the successor to High Efficiency Video Coding (HEVC / H.265). Tests show that VVC saves up to 30% of the bitrate, but this comes at the cost of increased algorithmic complexity. Some of the innovations appeared in the transform part of VVC.
What do we know about the transform?
A transform is a tool used in data compression; its primary goal is to map images or their residual data into the transform domain.
Good data decorrelation, low computational complexity, reversibility, and so on are the criteria for choosing the best transform type. All transform methods fall into one of two broad categories: image-based transforms and block-based transforms. Here we are interested in block-based transforms.
Block-based transform coding is applied to the prediction residual block. After the transform, the low-frequency components of the transform coefficients are concentrated in the upper left corner of the block, and the high-frequency components are in the lower right corner. In this article, we will look at the new features of the block-based transform in VVC: Multi Transform Selection (MTS), Low-Frequency Non-Separable Transform (LFNST), and Subblock Transform (SBT).
Multi Transform Selection (MTS)
Multi Transform Selection (MTS) is a new VVC tool that lets the encoder choose among several sine/cosine transforms. HEVC relied almost exclusively on DCT-II, but in VVC, thanks to the MTS creators, new sine/cosine transform types were added: DST-7 and DCT-8. Definitions of these transform types are shown below:
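As a rough illustration, the textbook floating-point basis functions of DCT-II, DST-7, and DCT-8 can be sketched in Python as follows. This is only a sketch: the helper names are ours, and VVC itself uses scaled integer approximations of these matrices rather than floating-point values.

```python
import math

def dct2_matrix(n):
    """DCT-II: T[i][j] = w0 * sqrt(2/n) * cos(pi * i * (2j + 1) / (2n))."""
    rows = []
    for i in range(n):
        w0 = math.sqrt(0.5) if i == 0 else 1.0
        rows.append([w0 * math.sqrt(2.0 / n)
                     * math.cos(math.pi * i * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows

def dst7_matrix(n):
    """DST-7: T[i][j] = sqrt(4/(2n+1)) * sin(pi * (2i + 1) * (j + 1) / (2n + 1))."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]

def dct8_matrix(n):
    """DCT-8: T[i][j] = sqrt(4/(2n+1)) * cos(pi * (2i + 1) * (2j + 1) / (4n + 2))."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.cos(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n + 2))
             for j in range(n)] for i in range(n)]

def max_orthonormality_error(t):
    """Largest deviation of T * T^T from the identity matrix."""
    n = len(t)
    err = 0.0
    for a in range(n):
        for b in range(n):
            dot = sum(t[a][k] * t[b][k] for k in range(n))
            err = max(err, abs(dot - (1.0 if a == b else 0.0)))
    return err

if __name__ == "__main__":
    for name, build in (("DCT-II", dct2_matrix),
                        ("DST-7", dst7_matrix),
                        ("DCT-8", dct8_matrix)):
        print(name, "orthonormality error:", max_orthonormality_error(build(8)))
```

Each matrix holds one basis function per row, so a 1-D forward transform is a matrix-vector product and the inverse is the product with the transposed matrix.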
Compared with HEVC, VVC quantizes the transform matrices more accurately in order to preserve their orthogonality. To keep the intermediate values of the transform coefficients within the 16-bit range after the horizontal and after the vertical transform, all coefficients are 10-bit.
To control MTS, it is specified separately for intra and inter modes. To turn on MTS, it must be enabled at both the SPS and CU levels, and the following conditions must be met:
- Both the width and the height of the CU are less than or equal to 32,
- The block has non-zero coefficients (CBF is equal to 1),
- The last significant coefficient is not the DC coefficient (that is, the block is not DC-only),
- There are not too many coefficients (the last significant coefficient is not located inside the MTS zero-out region).
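The conditions above can be sketched as a single predicate. This is a minimal illustration, not decoder code: the function and parameter names are hypothetical, one flag stands in for the intra/inter SPS controls, and the zero-out region is assumed to be everything outside the top-left 16×16 coefficients.

```python
def mts_allowed(sps_mts_enabled, cu_width, cu_height, cbf,
                last_sig_x, last_sig_y, kept_size=16):
    """Return True if MTS may be signalled for this CU (illustrative only).

    last_sig_x / last_sig_y: position of the last significant coefficient.
    kept_size: assumed side of the low-frequency area that survives zero-out.
    """
    if not sps_mts_enabled:
        return False
    if cu_width > 32 or cu_height > 32:
        return False                      # CU too large for MTS
    if not cbf:
        return False                      # no coded coefficients at all
    if last_sig_x == 0 and last_sig_y == 0:
        return False                      # DC-only block
    if last_sig_x >= kept_size or last_sig_y >= kept_size:
        return False                      # last coefficient in zero-out region
    return True

if __name__ == "__main__":
    print(mts_allowed(True, 16, 16, True, 3, 2))   # small block with AC energy
    print(mts_allowed(True, 64, 16, True, 3, 2))   # width above 32
```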
If the MTS flag at CU level equals 0, DCT-II is applied in both directions. The maximum size of this transform is increased to 64 for both width and height.
Here you can see mode combinations for different blocks:
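As a hedged sketch, the mapping from the signalled MTS index to the horizontal and vertical transforms can be written as a lookup table. The pairs below follow our reading of the mts_idx table in the VVC specification and should be verified against it; the dictionary and function names are hypothetical.

```python
# Assumed mapping: mts_idx -> (horizontal transform, vertical transform).
# Verify against the VVC specification before relying on it.
MTS_IDX_TO_TRANSFORMS = {
    0: ("DCT-II", "DCT-II"),
    1: ("DST-7", "DST-7"),
    2: ("DCT-8", "DST-7"),
    3: ("DST-7", "DCT-8"),
    4: ("DCT-8", "DCT-8"),
}

def transforms_for(mts_idx):
    """Return the (horizontal, vertical) transform names for an MTS index."""
    try:
        return MTS_IDX_TO_TRANSFORMS[mts_idx]
    except KeyError:
        raise ValueError("mts_idx must be in the range 0..4") from None

if __name__ == "__main__":
    for idx in range(5):
        hor, ver = transforms_for(idx)
        print(f"mts_idx={idx}: horizontal={hor}, vertical={ver}")
```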
To reduce the computational complexity of DST-7 and DCT-8 on large blocks, the high-frequency coefficients are zeroed out when the width, the height, or both of a transform block using DST-7 or DCT-8 reach 32. Only the 16×16 low-frequency coefficients in the upper left corner are retained.
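The zero-out step can be illustrated with a short sketch. The function name is ours, and the block is assumed to be a list of rows of coefficients with a width and height of at most 32.

```python
def mts_zero_out(coeffs, keep=16):
    """Keep only the top-left keep x keep coefficients; zero the rest.

    coeffs: list of rows (height x width). For blocks whose sides are
    below 32, nothing lies outside the kept area, so the block is
    returned unchanged (as a copy).
    """
    return [[value if (x < keep and y < keep) else 0
             for x, value in enumerate(row)]
            for y, row in enumerate(coeffs)]

if __name__ == "__main__":
    block = [[1] * 32 for _ in range(32)]        # 32x32 block of ones
    kept = mts_zero_out(block)
    print(sum(map(sum, kept)))                   # number of surviving coefficients
```

For a 32×8 block, the same rule keeps a 16×8 area, because only the horizontal direction exceeds the retained size.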
Here we can see which MTS modes were used for transformation in every block of our demo stream. To visualize it, we will use VQ Analyzer ver. 6.0.
And here are all possible variants of MTS:
An example of MTS in operation is shown in pic 5.
Low-Frequency Non-Separable Transform (LFNST)
Another new feature of the VVC transform part is the Low-Frequency Non-Separable Transform (LFNST). LFNST can be applied only to intra-coded blocks, for both luma and chroma components.
So what is it? It is a stage applied between the forward primary transform and quantization on the encoder side, and between de-quantization and the inverse primary transform on the decoder side. The main goal of LFNST is to further reduce the redundancy among the low-frequency primary transform coefficients, which are the transform coefficients produced by conventional directional intra prediction. LFNST also helps to concentrate coefficients more tightly in the top left corner for elongated rectangular blocks.
LFNST has 2 modes: 4×4 LFNST (for blocks where the smaller of width and height is less than 8) and 8×8 LFNST (for blocks where it is 8 or more). So let’s consider an example. Here is a block (let’s call it X):
And this block can be presented as a one-dimensional vector:
After that, the non-separable transform can be calculated as F = T ⋅ X (F and X are vectors), where T is the 16×16 transform matrix and F is the 16×1 vector of calculated transform coefficients. These coefficients can be regrouped into a 4×4 block using a raster scan order.
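The flatten, multiply, and regroup steps can be sketched as follows. Note the assumption: the real LFNST kernels are trained tables stored in the codec and are not reproduced here, so the 16×16 matrix T below is only a stand-in built as a Kronecker product of two 4-point DCT-II matrices (a genuine LFNST kernel cannot be factored this way). All function names are ours.

```python
import math

def dct2_matrix(n):
    """Orthonormal n-point DCT-II matrix (used only to build a stand-in T)."""
    return [[(math.sqrt(0.5) if i == 0 else 1.0) * math.sqrt(2.0 / n)
             * math.cos(math.pi * i * (2 * j + 1) / (2 * n))
             for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def nonseparable_forward(block, t):
    """Compute F = T * X for a square block and regroup F in raster order."""
    width = len(block[0])
    x = [value for row in block for value in row]          # 2-D block -> vector
    f = [sum(tij * xj for tij, xj in zip(row, x)) for row in t]
    return [f[i:i + width] for i in range(0, len(f), width)]

if __name__ == "__main__":
    X = [[10, 9, 8, 7],
         [9, 8, 7, 6],
         [8, 7, 6, 5],
         [7, 6, 5, 4]]
    T = kron(dct2_matrix(4), dct2_matrix(4))               # placeholder 16x16 matrix
    for row in nonseparable_forward(X, T):
        print([round(v, 3) for v in row])
```

The point of the sketch is the data flow: one 16×16 matrix acts on all 16 samples at once, instead of separate row and column passes.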
LFNST is implemented as a direct matrix multiplication, so the matrix dimensions should be kept as small as possible to limit the computational complexity and the memory needed to store the matrix coefficients. Thus, the idea is to map an N-dimensional vector to an R-dimensional vector, where N > R and N/R is the reduction factor. For 8×8 LFNST, the reduction factor is 4, so the transform matrix is 16×64. In later stages of VVC development, it was further reduced to 16×48.
This reduction in matrix dimensions decreased the memory needed to store the matrices from 10 KB to 8 KB without much performance degradation.
On the decoder side, the inverse process of LFNST uses the transposed matrix of the forward transform matrix.
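A minimal sketch of the reduced transform idea: an R×N matrix maps N inputs to R coefficients, and the inverse multiplies by the transposed (N×R) matrix. As an assumption for illustration, the matrix here is simply the first R rows of an orthonormal N-point DCT-II matrix, not an actual LFNST kernel, and the function names are hypothetical.

```python
import math

def dct2_matrix(n):
    """Orthonormal n-point DCT-II matrix (source of stand-in rows)."""
    return [[(math.sqrt(0.5) if i == 0 else 1.0) * math.sqrt(2.0 / n)
             * math.cos(math.pi * i * (2 * j + 1) / (2 * n))
             for j in range(n)] for i in range(n)]

def reduced_matrix(n, r):
    """Stand-in R x N matrix: the first r rows of an n-point DCT-II."""
    return dct2_matrix(n)[:r]

def reduced_forward(t, x):
    """F = T * x, mapping N inputs to R coefficients."""
    return [sum(tij * xj for tij, xj in zip(row, x)) for row in t]

def reduced_inverse(t, f):
    """x' = T^T * f, mapping R coefficients back to N samples."""
    n = len(t[0])
    return [sum(t[r][j] * f[r] for r in range(len(t))) for j in range(n)]

if __name__ == "__main__":
    N, R = 64, 16                        # reduction factor N / R = 4
    T = reduced_matrix(N, R)
    x = [float(64 - i) for i in range(N)]
    f = reduced_forward(T, x)            # 16 coefficients
    x_hat = reduced_inverse(T, f)        # 64 reconstructed samples
    print(len(f), len(x_hat))
```

Because the rows are orthonormal, applying the forward transform to an inverse-transformed vector returns the same R coefficients; the N-R discarded dimensions are what the reduction gives up.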
These LFNST transform matrices T can be represented as follows:
- 8 different 16×16 matrices for 4×4 LFNST (the Cartesian product of 4 lfnstTrSetIdx values and 2 lfnst_idx values);
- 8 different 16×48 matrices for 8×8 LFNST.
All these matrices are precalculated and stored on the encoder/decoder side.
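The memory figures quoted earlier can be checked with simple arithmetic from the matrix counts above. The sketch assumes one byte per stored matrix coefficient; that storage size is our assumption, and the function name is hypothetical.

```python
def lfnst_table_bytes(cols_8x8, n_matrices=8, bytes_per_coeff=1):
    """Total storage for the 4x4 and 8x8 LFNST matrix sets.

    cols_8x8: number of columns of each 16-row 8x8 LFNST matrix (64 or 48).
    bytes_per_coeff: assumed storage per coefficient (1 byte here).
    """
    small = n_matrices * 16 * 16 * bytes_per_coeff        # 16x16 matrices
    large = n_matrices * 16 * cols_8x8 * bytes_per_coeff  # 16x64 or 16x48
    return small + large

if __name__ == "__main__":
    print(lfnst_table_bytes(64) / 1024, "KB with 16x64 matrices")
    print(lfnst_table_bytes(48) / 1024, "KB with 16x48 matrices")
```

Under this assumption the totals are 10240 and 8192 bytes, which matches the reduction from 10 KB to 8 KB.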
To choose the appropriate LFNST matrix, you need to know two values: lfnstTrSetIdx, which depends on IntraPredMode, and lfnst_idx, which is transmitted in the bitstream. Here is the table showing how lfnstTrSetIdx depends on IntraPredMode:
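The same dependence can be written as a small lookup function. The mode ranges below follow our reading of the transform set selection table in the VVC draft and should be checked against the specification; the function name is hypothetical.

```python
def lfnst_tr_set_idx(intra_pred_mode):
    """Map IntraPredMode to lfnstTrSetIdx (ranges assumed from the VVC draft).

    Negative values are wide-angle modes; 81..83 are the cross-component
    chroma modes.
    """
    if intra_pred_mode < 0:
        return 1
    if intra_pred_mode <= 1:      # planar (0) and DC (1)
        return 0
    if intra_pred_mode <= 12:
        return 1
    if intra_pred_mode <= 23:
        return 2
    if intra_pred_mode <= 44:
        return 3
    if intra_pred_mode <= 55:
        return 2
    if intra_pred_mode <= 80:
        return 1
    if intra_pred_mode <= 83:
        return 0
    raise ValueError("IntraPredMode out of range")

if __name__ == "__main__":
    for mode in (-5, 0, 10, 18, 34, 50, 66, 81):
        print(mode, "->", lfnst_tr_set_idx(mode))
```

Together with the signalled lfnst_idx, this set index selects one of the precalculated matrices.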
However, LFNST cannot be used in every case. It is disabled under the following conditions:
- the block is larger than 64×64, or the CU width or height exceeds the maximum transform block size;
- ISP is applied and the TU width or height is less than 4;
- the block is inter-coded;
- transform skip is used;
- a 4×4 or 8×8 block has significant coefficients beyond the first eight positions;
- there are significant coefficients outside the first subblock;
- the only non-zero coefficient is the DC coefficient.
It is important to note that LFNST and MTS are connected: if LFNST is enabled, only DCT-II mode (MTS index = 0) can be applied.
Finally, let’s see how LFNST works on actual blocks. Here we use VQ Analyzer 6.0 to visualize this process.
Subblock Transform (SBT)
Subblock Transform (SBT) is another new transform feature in VVC. Its distinctive property is that only a part of the CU's residual block is coded. SBT is used only in inter-predicted CUs. If SBT is disabled at CU level (cu_sbt_flag = 0), the whole residual block is coded according to the MTS mode described above. If SBT is enabled at CU level, only a part of the residual block is coded, with an inferred adaptive transform, and the other part is zeroed out.
An SBT mode is specified by the following choices:
- it can have vertical or horizontal split;
- it can be split 2:2 (as binary tree) or as 1:3 / 3:1 (as asymmetric binary tree);
- it can choose the position left/top(0) or right/bottom(1) (depending on the partition).
So, 8 SBT cases are possible (the Cartesian product of the choices specified above).
It is important to note that if the partition is 1:3 or 3:1 (QUAD modes), the non-zero residual is located only in the smaller region.
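The eight cases and the position of the coded sub-block can be enumerated with a short sketch. The function and parameter names are hypothetical, and the sketch covers only the geometry, not the inferred transform types.

```python
from itertools import product

def sbt_coded_region(cu_w, cu_h, vertical, quad, pos):
    """Return (x, y, width, height) of the sub-block that carries residual.

    vertical: True for a vertical split, False for a horizontal split.
    quad:     True for the 1:3 / 3:1 split, False for the 2:2 split.
    pos:      0 for left/top, 1 for right/bottom.
    """
    divisor = 4 if quad else 2            # quad mode codes the smaller quarter
    if vertical:
        w = cu_w // divisor
        x = 0 if pos == 0 else cu_w - w
        return (x, 0, w, cu_h)
    h = cu_h // divisor
    y = 0 if pos == 0 else cu_h - h
    return (0, y, cu_w, h)

# All combinations of split direction, split ratio, and position.
SBT_MODES = list(product((True, False), (False, True), (0, 1)))

if __name__ == "__main__":
    for vertical, quad, pos in SBT_MODES:
        region = sbt_coded_region(32, 16, vertical, quad, pos)
        print("vertical" if vertical else "horizontal",
              "quad" if quad else "half", pos, region)
```

Everything outside the returned rectangle is treated as zero residual.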
Here you can see the example of SBT position, type, and transform type:
An important constraint is that SBT cannot be applied to a CU that is coded with combined inter-intra mode.
That is all about the new transform features that we wanted to describe in this article. Hopefully, you found it helpful and enjoyed reading it.
- Versatile Video Coding Editorial Refinements on Draft 10
- Algorithm description for Versatile Video Coding and Test Model 12 (VTM 12)
- Low-Frequency Non-Separable Transform (LFNST) (https://ieeexplore.ieee.org/document/8954507)