MPEG-5 Part-1, also known as EVC or Essential Video Coding, is an MPEG standard backed by Samsung, Huawei, Qualcomm, and Divideon. It consists of two profiles: a Baseline Profile that is royalty-free because it uses only technologies whose patents have expired (or that were declared royalty-free), and a Main Profile that requires royalties and uses new and innovative coding tools.
As most of you are probably aware, MPEG announced three new video codecs: Versatile Video Coding (VVC), Essential Video Coding (EVC), and Low Complexity Enhancement Video Coding (LCEVC). OTTVerse covered these in our introduction to the VVC, EVC, and LCEVC codecs, and also discussed the architecture and performance of LCEVC (a 28% gain over AVC)!
An exciting time for people involved in video compression, right? Three new codecs to study, implement, and optimize!
In this article, let’s look at MPEG-5 Part-1 EVC (Essential Video Coding): its requirements, its development, and a few of its coding tools.
What is the Essential Video Coding (EVC) Standard?
According to the MPEG website, the goal of MPEG-5 Part 1 (Essential Video Coding) is as follows –
The goal of MPEG-5 EVC is to provide a standardized video coding solution to address business needs in some use cases, such as video streaming, where existing ISO video coding standards have not been as widely adopted as might be expected from their purely technical characteristics. (from the MPEG website)
Note the wording: “existing ISO video coding standards have not been as widely adopted as might be expected from their purely technical characteristics“. Interesting use of words 🙂
Next, let’s look at a few of the requirements posted on MPEG’s website as part of the w17928 document –
- … however, coding efficiency is not the only factor that determines the industry choice of video coding technology for products and services. …
- Video coding technologies should address the needs of existing and emerging real-world use cases.
- Video coding technology should also be easy to adopt from both technological and business perspectives.
- At MPEG meeting 122 in April 2018, some industry representatives identified the need for a “licensing-friendly” video codec that would facilitate the timely availability of clear and transparent Type 2 licensing terms.
- Definition of a test model, test sequences, and test conditions
- The test model should consist of two toolsets: a base and an enhanced toolset.
- The base toolset should be configured with tools that were made public more than 20 years ago or for which a Type 1 declaration is received.
- There should be additional tools in the enhanced toolset, each of which shall provide a significant improvement in coding efficiency and be capable of being cleanly switched off on an individual basis.
I’ll pause here so that we can let these words sink in. The MPEG committee is making the following pretty clear in its “requirements” –
- HEVC’s patent-pool issues stymied the adoption of a formidable and feature-rich video codec (I know – because I worked on it for 4 years!);
- companies sunk millions of dollars in HEVC R&D and are stuck with that investment;
- the time has now come to address the patent issue head-on and up front, at the beginning rather than at the end.
This is why subtle (and not-so-subtle) hints at patents and licensing pop up everywhere in the requirements for MPEG-5 Part-1 EVC.
Anyway, the fundamental requirement is that EVC will contain two separate profiles: a “Baseline profile” and a “Main profile”.
- The Baseline profile will only contain video coding technologies that were made public more than 20 years ago or for which a Type 1 (royalty-free) declaration has been received, so that they are freely available for use in the standard.
- The Main profile includes several coding tools that are designed to improve the compression efficiency of the codec. These tools are not royalty-free, and each one can be turned on or off individually.
- Importantly, all contributors to the standard are expected to publish the applicable licensing terms in a timely manner, within two years of the FDIS stage, either individually or as part of a patent pool.
So, companies that are looking to switch from their existing codecs to EVC can choose to use the royalty-free Baseline profile or pay royalties and use the Main Profile, which will have a higher compression efficiency than the Baseline profile.
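To make the idea of individually switchable tools concrete, here is a minimal Python sketch of how an encoder front-end might model the two toolsets. The tool names come from this article, but the `ToolConfig` class, its methods, and the `disabled` parameter are hypothetical illustrations, not the configuration interface of any real EVC encoder.

```python
from dataclasses import dataclass, field

# Enhanced (Main profile) tools named in this article. These strings are
# illustrative labels only; they are not real encoder option names.
MAIN_TOOLS = ("BTT", "SUCO", "EIPD", "IBC", "AMIS", "MMVD", "ADMVP",
              "AMVR", "AFFINE", "DMVR", "ADDB", "HTDF", "ALF", "CMI")


@dataclass
class ToolConfig:
    """Hypothetical toolset model: every enhanced tool is an independent switch."""
    enabled: dict = field(default_factory=lambda: {t: False for t in MAIN_TOOLS})

    @classmethod
    def baseline(cls) -> "ToolConfig":
        # Baseline: only the base toolset, so every enhanced tool stays off.
        return cls()

    @classmethod
    def main(cls, disabled=()) -> "ToolConfig":
        # Main: start with every enhanced tool on, then switch tools off one by
        # one (for example, a tool whose licensing terms are not acceptable).
        cfg = cls({t: True for t in MAIN_TOOLS})
        for tool in disabled:
            if tool not in cfg.enabled:
                raise ValueError(f"unknown tool: {tool}")
            cfg.enabled[tool] = False
        return cfg

    def active_tools(self) -> list:
        return [t for t, on in self.enabled.items() if on]


print(ToolConfig.baseline().active_tools())   # []
print(ToolConfig.main(disabled=("SUCO", "ALF")).active_tools())
```

Starting from "everything on" and switching tools off one at a time mirrors the requirement that each enhanced tool can be cleanly switched off on an individual basis.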
With that introduction to EVC, let’s take a look at the architecture and at a few of the coding tools in EVC.
Architecture of EVC
Here is a high-level architectural block diagram of MPEG-5 EVC presented at IBC 2019 by Jonatan Samuelsson (Divideon), Kiho Choi (Samsung), Jianle Chen (Futurewei), and Dmytro Rusanovskyy (Qualcomm). It shows the coding tools in both the Baseline and Main profiles. The gray boxes indicate the various coding tools present in the Main profile.
The EVC codec is based on the tools & technologies proposed by Qualcomm, Samsung, and Huawei and the reference picture management and other high-level syntax aspects proposed by Divideon.
Here is another block diagram of the EVC standard, from the IEEE paper titled “An Overview of the MPEG-5 Essential Video Coding Standard”. The Baseline profile is shown by the green boxes and the Main profile by the blue boxes. It gives a very clear picture of the Baseline and Main profile coding tools at each stage of the compression pipeline.
Let’s take a look at a few of the coding blocks in the next few sections. We won’t do deep-dives into each of the blocks as that is beyond the scope of this article.
High-Level Syntax
An EVC bitstream consists of NAL (Network Abstraction Layer) units, each with a small NAL unit header indicating the temporal ID and certain NAL unit properties. Some of the NAL unit types in EVC that we already know from AVC, HEVC, etc. are the –
- SPS (Sequence Parameter Set) – parameters that apply to an entire Coded Video Sequence (CVS)
- PPS (Picture Parameter Set) – data that applies to one or more pictures of a CVS
- APS (Adaptation Parameter Set) – data that applies to one or more parts of one or more pictures of a CVS.
In addition, the Main profile allows for flexibility in picture identifiers, Picture Order Count Signalling (POCS), and Reference Picture Lists (RPL) signalled at the picture level.
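Because every NAL unit starts with that small header, a parser can read the temporal ID (useful for temporal scalability) before touching the payload. The Python sketch that follows assumes a 16-bit header layout of `forbidden_zero_bit` (1 bit), `nal_unit_type_plus1` (6), `nuh_temporal_id` (3), `nuh_reserved_zero_5bits` (5), and `nuh_extension_flag` (1), which is our reading of ISO/IEC 23094-1; check the field widths against the spec before relying on them. The `EvcNalHeader` and `parse_evc_nal_header` names are hypothetical, and the mapping of type numbers to SPS/PPS/APS is deliberately not reproduced.

```python
from typing import NamedTuple


class EvcNalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_unit_type: int
    temporal_id: int
    reserved_zero_5bits: int
    extension_flag: int


def parse_evc_nal_header(data: bytes) -> EvcNalHeader:
    """Parse a 2-byte EVC NAL unit header.

    Assumed field layout, most significant bit first (verify against the spec):
      forbidden_zero_bit (1) | nal_unit_type_plus1 (6) | nuh_temporal_id (3)
      | nuh_reserved_zero_5bits (5) | nuh_extension_flag (1)
    """
    if len(data) < 2:
        raise ValueError("an EVC NAL unit header is 2 bytes long")
    word = (data[0] << 8) | data[1]      # the 16 header bits as one integer
    return EvcNalHeader(
        forbidden_zero_bit=(word >> 15) & 0x1,
        nal_unit_type=((word >> 9) & 0x3F) - 1,   # coded as type + 1
        temporal_id=(word >> 6) & 0x7,
        reserved_zero_5bits=(word >> 1) & 0x1F,
        extension_flag=word & 0x1,
    )


# nal_unit_type_plus1 = 25 and temporal ID 2 give the bytes 0x32 0x80.
hdr = parse_evc_nal_header(bytes([0x32, 0x80]))
print(hdr.nal_unit_type, hdr.temporal_id)   # 24 2
```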
Entropy Coding
Entropy coding is the process of losslessly compressing a sequence of symbols using a well-defined code. Examples include Huffman coding and binary arithmetic coding.
MPEG-5 EVC’s Baseline profile uses the same binary arithmetic coding algorithm as Annex D of the JPEG standard. This includes a binarization step and a fixed context model implemented via a look-up table.
The Main profile improves on the Baseline profile using a Context Modelling and Initialization (CMI) process for more efficient probability modelling. Similar to context modelling in codecs such as AVC and HEVC, the CMI block uses the syntax elements of neighboring blocks to model the probabilities that are fed to the binary arithmetic encoder.
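Why does neighbour-based context modelling pay off? An ideal arithmetic coder spends about −log2(p) bits on a symbol it predicted with probability p, so better probabilities mean fewer bits. The Python sketch that follows only computes those ideal code lengths; it is not EVC's arithmetic coder or its CMI process. The Laplace-smoothed counter is a stand-in for EVC's probability estimation, and using the neighbour's flag as the context index is a hypothetical simplification.

```python
import math


def fixed_bits(bins):
    """Ideal cost with a fixed probability of 0.5: exactly one bit per bin."""
    return float(len(bins))


def adaptive_bits(bins, ctx_ids):
    """Ideal cost (sum of -log2 p) when each context adapts its own estimate.

    ctx_ids[i] selects the context for bins[i]; here it stands in for
    information taken from neighbouring blocks.
    """
    counts = {}                                   # ctx -> (zeros, ones)
    total = 0.0
    for b, ctx in zip(bins, ctx_ids):
        zeros, ones = counts.get(ctx, (0, 0))
        p_one = (ones + 1) / (zeros + ones + 2)   # Laplace-smoothed estimate
        total += -math.log2(p_one if b else 1.0 - p_one)
        counts[ctx] = (zeros + (b == 0), ones + (b == 1))
    return total


# A flag that always equals the same flag in the neighbouring block.
neighbour_flags = [0] * 50 + [1] * 50
current_flags = list(neighbour_flags)
print(fixed_bits(current_flags))                        # 100.0
print(adaptive_bits(current_flags, [0] * 100))          # one shared context
print(adaptive_bits(current_flags, neighbour_flags))    # neighbour-selected context
```

With a single shared context the estimate is wrong-footed when the statistics flip halfway through and costs roughly 103 bits, slightly worse than the fixed model. Selecting the context from the neighbour's flag brings the cost down to roughly 11 bits, which is the intuition behind using neighbouring syntax elements.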
Block Partitioning
EVC uses a quad-tree coding structure (block sizes from 64×64 down to 4×4) that can very efficiently partition a block of pixels into smaller blocks adapted to the characteristics of its content. We saw this in HEVC, which used quadtree decomposition to encode large video resolutions (above 1080p) efficiently. The Main profile adds the following partitioning tools –
- Binary Ternary Tree (BTT), which allows for non-square coding units
- Split Unit Coding Order (SUCO): using this tool, the split units can be processed in either left-to-right or right-to-left order.
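The quad-tree idea can be sketched as a simple recursion: keep splitting a 64×64 block into four quadrants while it looks "busy", down to 4×4. In this Python sketch the variance threshold is a hypothetical stand-in for the rate-distortion decision a real encoder makes, and BTT and SUCO are not modelled.

```python
from typing import List, Sequence, Tuple

Block = Tuple[int, int, int]   # (x, y, size) of a leaf block


def variance(pixels: Sequence[Sequence[int]], x: int, y: int, size: int) -> float:
    vals = [pixels[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def quadtree_split(pixels, x=0, y=0, size=64, min_size=4,
                   threshold=100.0) -> List[Block]:
    """Recursively split a square block into four quadrants while it is 'busy'.

    Real encoders decide splits by rate-distortion cost; the variance test
    used here is a hypothetical stand-in for that decision.
    """
    if size > min_size and variance(pixels, x, y, size) > threshold:
        half = size // 2
        leaves: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_split(pixels, x + dx, y + dy, half,
                                         min_size, threshold)
        return leaves
    return [(x, y, size)]


# A flat 64x64 block with a small checkerboard texture in its top-left 8x8 corner.
img = [[255 if r < 8 and c < 8 and (r + c) % 2 else 0 for c in range(64)]
       for r in range(64)]
blocks = quadtree_split(img)
print(len(blocks), sorted({s for _, _, s in blocks}))   # 13 [4, 8, 16, 32]
```

Only the region around the texture is refined down to 4×4, while the flat areas stay as large 32×32 and 16×16 blocks, which is exactly what makes the quad-tree cheap to signal for smooth content.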
Intra Prediction
In the Baseline profile, the following tools and restrictions apply to intra prediction –
- all coding units are square-shaped
- there are 5 intra prediction modes: DC (average value of the neighbours), horizontal, vertical, diagonal left, and diagonal right.
- depending on the prediction modes of the upper and left neighbors, a code is adaptively generated and assigned for the prediction mode of the current block.
In the Main profile, the following tools are added –
- rectangular coding units are allowed (in addition to square-shaped CUs).
- Enhanced Intra Prediction Directions (EIPD) introduces 28 additional directional modes
- Intra Block Copy (IBC) allows you to reference a block of previously coded samples in the same picture.
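Three of the Baseline intra modes are simple enough to sketch directly: DC fills the block with the average of its neighbours, vertical copies the row above downwards, and horizontal copies the left column across. This Python sketch uses a rounded integer average for DC as an assumption, omits the two diagonal modes, reference-sample filtering, and unavailable-neighbour handling, and adds a hypothetical SAD-based `best_mode` helper to show how an encoder might choose between them.

```python
from typing import List, Sequence

Block = List[List[int]]


def predict_intra(mode: str, top: Sequence[int], left: Sequence[int]) -> Block:
    """Predict an N x N block from reconstructed neighbouring samples.

    top:  the N samples directly above the block
    left: the N samples directly to the left of the block
    Only DC, horizontal and vertical are sketched; the diagonal modes,
    reference-sample filtering and unavailable-neighbour handling are omitted.
    """
    n = len(top)
    if len(left) != n:
        raise ValueError("top and left must have the same length")
    if mode == "DC":
        dc = (sum(top) + sum(left) + n) // (2 * n)    # rounded average
        return [[dc] * n for _ in range(n)]
    if mode == "VERTICAL":
        return [list(top) for _ in range(n)]          # copy the row above downwards
    if mode == "HORIZONTAL":
        return [[left[r]] * n for r in range(n)]      # copy each left sample across
    raise ValueError(f"unsupported mode: {mode}")


def best_mode(block: Block, top, left) -> str:
    """Encoder-side choice: the mode with the smallest sum of absolute differences."""
    def sad(pred: Block) -> int:
        return sum(abs(a - b) for row_a, row_b in zip(block, pred)
                   for a, b in zip(row_a, row_b))
    return min(("DC", "HORIZONTAL", "VERTICAL"),
               key=lambda m: sad(predict_intra(m, top, left)))


top, left = [10, 20, 30, 40], [5, 5, 5, 5]
print(predict_intra("DC", top, left)[0])      # [15, 15, 15, 15]
```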
Inter Prediction
In the Baseline profile, the following tools are available for inter prediction –
- Unidirectional and bi-directional motion estimation and compensation
- Temporal Direct Mode in the baseline profile references the motion vector of the temporally co-located block (as in H.263)
- Five motion vector predictor (MVP) candidates are used, in the following order –
  - 3 spatial MVPs: left (MVP0), up (MVP1), up-right (MVP2)
  - temporal MVP (MVP3)
  - zero MVP (MVP4)
- Half- and quarter-pel motion estimation is allowed
- The Baseline profile supports multiple reference pictures.
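The point of an ordered predictor list is that the encoder only has to send an index plus a small motion vector difference (MVD) instead of the full vector. This Python sketch builds the five candidates in the order listed above and picks the one closest to the actual vector. Substituting the zero vector for an unavailable neighbour, and using |MVD| as a proxy for bit cost, are assumptions of the sketch rather than EVC's normative rules.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]   # (dx, dy) in quarter-sample units


def build_mvp_list(left: Optional[MV], up: Optional[MV],
                   up_right: Optional[MV], temporal: Optional[MV]) -> List[MV]:
    """Five candidates in the order left, up, up-right, temporal, zero.

    Replacing an unavailable candidate (None) with the zero vector is an
    assumption of this sketch, not the spec's rule.
    """
    cands = [left, up, up_right, temporal]
    return [c if c is not None else (0, 0) for c in cands] + [(0, 0)]


def encode_mv(mv: MV, mvp_list: List[MV]) -> Tuple[int, MV]:
    """Choose the predictor closest to mv; return (index, MV difference)."""
    def cost(p: MV) -> int:
        return abs(mv[0] - p[0]) + abs(mv[1] - p[1])   # proxy for MVD bits
    idx = min(range(len(mvp_list)), key=lambda i: cost(mvp_list[i]))
    px, py = mvp_list[idx]
    return idx, (mv[0] - px, mv[1] - py)


def decode_mv(idx: int, mvd: MV, mvp_list: List[MV]) -> MV:
    px, py = mvp_list[idx]
    return (px + mvd[0], py + mvd[1])


mvps = build_mvp_list(left=(4, 0), up=(8, 4), up_right=None, temporal=(2, 2))
idx, mvd = encode_mv((9, 4), mvps)
print(idx, mvd, decode_mv(idx, mvd, mvps))   # 1 (1, 0) (9, 4)
```

Because the decoder builds the identical list from already-decoded data, the index and the tiny MVD are enough to recover the full vector.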
For inter prediction, the Main profile adds many new coding tools –
- Advanced Motion Interpolation and Signalling (AMIS): allows neighboring blocks to be merged to indicate that they use the same motion, and uses a more advanced scheme for building the list of candidate predictors than the one in the Baseline profile.
- Merge with Motion Vector Difference (MMVD) tool uses a process similar to the conceptual merging of neighboring blocks but additionally allows the signaling of a motion vector using an expression that includes a starting point, a motion magnitude, and a motion direction.
- Advanced Motion Vector Prediction (ADMVP): this tool allows for a larger number of candidate motion vector predictors for a block by referring to neighboring blocks from the same picture and the co-located block from a reference picture.
- Adaptive Motion Vector Resolution (AMVR) is a very interesting tool that provides a way to reduce the precision of motion vectors from quarter-sample to half-sample, full-sample, double-sample, or quadruple-sample! Because coarser vectors take fewer bits to signal, AMVR can improve compression efficiency considerably.
- The Affine Prediction Mode tool allows one to represent motion that is not purely translational, such as rotation and zooming.
- Decoder-side Motion Vector Refinement (DMVR) uses a bilateral template matching process to refine motion vectors in bi-prediction mode.
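MMVD and AMVR both change how a motion vector is expressed. This Python sketch shows MMVD's "starting point + magnitude × direction" construction and AMVR-style rounding of an MVD to a coarser step. The MMVD distance and direction tables are hypothetical (EVC's actual tables may differ), and the signed Exp-Golomb length `se_bits` is used only as a rough bit-cost proxy, since EVC codes these values with arithmetic coding.

```python
from typing import Tuple

MV = Tuple[int, int]   # quarter-sample units

# Hypothetical MMVD tables for illustration; EVC's actual tables may differ.
MMVD_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # +x, -x, +y, -y
MMVD_DISTANCES = [1, 2, 4, 8, 16, 32, 64, 128]          # in quarter samples


def mmvd_mv(base: MV, distance_idx: int, direction_idx: int) -> MV:
    """MMVD: starting point + motion magnitude * motion direction."""
    sx, sy = MMVD_DIRECTIONS[direction_idx]
    d = MMVD_DISTANCES[distance_idx]
    return (base[0] + sx * d, base[1] + sy * d)


def amvr_round(mvd: MV, resolution_idx: int) -> Tuple[MV, MV]:
    """AMVR: resolution_idx 0..4 = quarter, half, full, double, quadruple sample.

    Returns (signalled, reconstructed). Only `signalled` is coded, in units
    of the chosen step, so coarser steps produce smaller values.
    """
    step = 1 << resolution_idx                 # step size in quarter samples

    def rnd(v: int) -> int:
        q = (abs(v) + (step >> 1)) >> resolution_idx   # round half away from zero
        return q if v >= 0 else -q

    sig = (rnd(mvd[0]), rnd(mvd[1]))
    return sig, (sig[0] * step, sig[1] * step)


def se_bits(v: int) -> int:
    """Length of a signed Exp-Golomb code, used here only as a bit-cost proxy."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


print(mmvd_mv((4, -8), distance_idx=2, direction_idx=1))   # (0, -8)
for r in (0, 2, 4):
    sig, rec = amvr_round((37, -10), r)
    print(r, sig, rec, se_bits(sig[0]) + se_bits(sig[1]))
```

For the MVD (37, −10), moving from quarter-sample to full-sample to quadruple-sample precision drops the proxy cost from 22 to 14 to 8 bits, while the reconstructed MVD drifts to (36, −12) and then (32, −16). An encoder would weigh that bit saving against the extra prediction error for each block.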
Loop Filtering
- The Baseline profile uses the deblocking filter defined in H.263 Annex J.
The Main profile defines several filtering tools –
- Advanced Deblocking Filter (ADDB)
- Hadamard Transform Domain Filter (HTDF): applied to luma samples before deblocking and only if the quantization parameter is larger than 17.
- Adaptive Loop Filter (ALF) allows the signaling of up to 25 different filters for the luma component, and the best filter for each 4×4 block is selected through a classification process.
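The Baseline deblocking filter works on four samples A, B | C, D that straddle a block edge: small steps across the edge (likely blocking artifacts) are smoothed, while large steps (likely real image edges) are left untouched. This Python sketch follows our reading of the H.263 Annex J equations. In H.263 the STRENGTH value is looked up from the quantizer; here it is simply a parameter, and the decision of which edges to filter, as well as any EVC-specific adaptations, is not modelled.

```python
def div_trunc(a: int, b: int) -> int:
    """Integer division truncating toward zero (C-style); Python's // floors."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def clip_pixel(x: int) -> int:
    return max(0, min(255, x))                # 8-bit sample range


def up_down_ramp(x: int, strength: int) -> int:
    sign = (x > 0) - (x < 0)
    return sign * max(0, abs(x) - max(0, 2 * (abs(x) - strength)))


def clip_d1(x: int, lim: int) -> int:
    lim = abs(lim)
    return max(-lim, min(lim, x))


def deblock_edge(a: int, b: int, c: int, d: int, strength: int):
    """Filter four samples A B | C D that straddle a block edge.

    Small steps are smoothed; large steps are left alone because
    up_down_ramp returns 0 once |delta| exceeds twice the strength.
    """
    delta = div_trunc(a - 4 * b + 4 * c - d, 8)
    d1 = up_down_ramp(delta, strength)
    d2 = clip_d1(div_trunc(a - d, 4), div_trunc(d1, 2))
    return a - d2, clip_pixel(b + d1), clip_pixel(c - d1), d + d2


print(deblock_edge(100, 100, 108, 108, strength=6))   # (101, 103, 105, 107)
print(deblock_edge(0, 0, 200, 200, strength=6))       # (0, 0, 200, 200)
```

The ramp shape is the key design choice: the correction grows with the step size up to STRENGTH and then falls back to zero, so the filter cannot blur genuine edges.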
These are just a few of the coding tools in the Baseline and Main profiles. For a detailed explanation of the tools, it’s best to refer to the spec or the various papers published on EVC.
SMPTE 2019 Video on MPEG-5 EVC
Before we end this article, here is an interesting talk from Jonatan Samuelsson (Divideon) on MPEG-5 Part-1 EVC. It’s a good overview of the coding standard!
Well, it is an exciting time to be a video compression enthusiast. So many codecs, tons of optimizations and use cases to satisfy, and innumerable possibilities.
But, what do you do as an organization?
- Do you stick to H.264/AVC and squeeze every drop out of it?
- Or, are you looking at advanced codecs like AV1 and VVC to get better performance for 4K and UHD compression?
- Or, will you switch over to the new and innovative LCEVC (read about the amazing 28% gains that OTTVerse got with LCEVC)?
- Or, will you drop H.264/AVC and go royalty-free with EVC’s Baseline Profile?
Do let me know in the comments! Thank you and have a great day!