Comprehensive Guide to LCEVC (MPEG-5 Part 2) - Low Complexity Enhancement Video Coding


The LCEVC codec (MPEG-5 Part 2), or “Low Complexity Enhancement Video Coding”, is one of three new codecs being introduced by MPEG, the others being Versatile Video Coding (VVC) and Essential Video Coding (EVC). It aims to increase the compression efficiency of existing codecs with little to no increase in coding complexity by combining a base bitstream with an enhancement bitstream.

In this excellent presentation at the ITU Workshop on the Future of Media in Geneva, Guido Meardi, CEO & Co-Founder of V-Nova, gave a detailed introduction to the LCEVC codec. He referred to LCEVC (MPEG-5 Part 2) as “a codec to improve other codecs”, which is quite accurate. For those who don’t know, V-Nova has been instrumental in driving the LCEVC standard through its research and work on the Perseus codec. More information on that here.

In this article, let’s take a look at

  • the “why” and “what” behind the LCEVC codec standardization
  • how the encoder and decoder work
  • theoretical complexity considerations
  • possible applications of the LCEVC codec

What is the LCEVC Codec (MPEG-5 Part 2)?

The LCEVC codec (Low Complexity Enhancement Video Coding) aims at being “a codec to improve other codecs” at a low complexity overhead. The LCEVC codec’s output is a combination of a “base bitstream” produced by an existing video codec such as AVC, HEVC, VP9, AV1, etc. along with enhancement layers that can be used conditionally to improve the quality of the video.

If the decoder/end-device supports LCEVC, the enhancement layers are decoded; otherwise, the base codec alone is used to decode the bitstream and the video is rendered to the user. This ensures backward compatibility and encourages roll-out of the LCEVC codec without the fear of breaking the end-user’s experience.

This concept is nicely captured in the figure below (taken from Guido Meardi’s Presentation at Geneva).

Architecture Diagram from Guido Meardi's Presentation at Geneva

In the next few articles of our series on the LCEVC codec, let’s learn how to produce an LCEVC bitstream using FFmpeg, and dive more into the details of the filters used in LCEVC.

Without further ado, let’s get started, shall we?

Key Requirements of the LCEVC Codec

The key requirements for the LCEVC (MPEG-5 Part 2) project were specified by MPEG. Here is a summary of the goals that were set for the project.

  • when enhancing an n-th generation MPEG codec (e.g., AVC), the compression efficiency for the aggregate stream is appreciably higher than that of the n-th generation MPEG codec used at full resolution and as close as possible to that of the (n+1)-th generation MPEG codec (e.g., HEVC) used at full resolution, at bandwidths and operating conditions relevant to mass market distribution;
  • encoding and decoding complexity for the aggregate full resolution video (i.e., base plus enhancement) shall be comparable with that of the base encoder or decoder, respectively, when used alone at full resolution.

In simpler terms, the aims of the LCEVC codec are -

  • when the base codec is AVC for example, then the compression efficiency when using LCEVC (base and enhancement layers) should be higher than just using AVC to encode the full resolution video (else, what is the point of a new codec, right?)
  • when the base codec is AVC for example, then the complexity (both encoding and decoding) of the LCEVC codec should be comparable with the base codec’s complexity for encoding the full resolution video. In other words, LCEVC should not dramatically increase the complexity of the encoder/decoder for compression gains - if it did, it wouldn’t be low complexity, now would it?

Furthermore, the MPEG document also lists the “key implementation and non-technical requirements”:

  • the video stream should be decodable without specific firmware or OS support by all devices capable to decode the base codec, with substantially same resource utilization (e.g., processing power, battery consumption, etc.) as the base decoder at full resolution decoded in hardware;
  • all web browsers should be able to decode high resolution video without plug-ins and/or browser upgrade, e.g. via HTML5 javascript;
  • the additional data stream should be compatible with the existing ecosystem, e.g. ad insertion, metadata management, CDNs, DRM/CA and network protocols such as DASH, HLS, MMT and SS;
  • the overall processing power requirement to encode a video stream should be comparable with that of the base codec when used alone at full resolution.

All good! Ultimately, the aims of the LCEVC codec are to

  • improve the compression efficiency of any other codec with little or no increase in coding complexity.
  • be backward compatible so that legacy devices and software can decode the bitstreams from the base codec if they do not have support for the LCEVC codec’s enhancement layers.

With this understanding, let’s now dive into the architectural details of the LCEVC codec’s Encoder and Decoder.

Architecture of the LCEVC (MPEG-5 Part 2) Codec

Encoder

To understand the inner-workings of the LCEVC Encoder, let’s take a look at the block diagram from a paper published in the ITU Journal (please take some time and read this - it has a good explanation of the LCEVC codec.)

LCEVC Encoder Block Diagram (ITU Journal Paper on LCEVC [Link in the References])

The block diagram above gives a very clear picture of the encoding process of the LCEVC codec and here is how it works -

  1. Downsampling: There are two cascaded downsampler blocks. The first receives the full-resolution image as its input and the second receives the output of the first, so the encoder ends up with two downsampled images (one at the output of each stage).
  2. Compressing using the base codec: The base encoder takes the output of the second-stage downsampler and compresses it using a “base codec” which can be anything you choose (AVC, HEVC, AV1, VP9, etc.).
  3. Upsample and Compress at Level L1
    1. Next, the base-image (downsampled two times) is upsampled once.
    2. A difference image is computed using the upsampled image and the output of the first stage downsampler.
    3. The difference image goes through transform, quantization, and entropy coding and the final output is transmitted as the “L1-Coefficient Layer”.
  4. Prepare the Input for the L2 Stage: The encoded output of the L1 stage is reconstructed and then upsampled to produce a reconstructed image at the original resolution (because it has been upsampled two times now).
  5. Compression at Level L2
    1. At level L2, you have the (1) original image and (2) the reconstructed image (that started its life at the base layer).
    2. The difference of these two images is computed and then compressed to produce the L2 Coefficient Layers.
    3. Optionally, temporal prediction can be performed and the resultant prediction coefficients can be compressed and transmitted to the end-device.
  6. Encoding sparsely populated images: this is a huge technical challenge because the Discrete Cosine Transform (DCT) was designed to take advantage of spatial correlation between pixels. The LCEVC codec has worked around this problem by introducing small transform kernels (2x2 and 4x4) to avoid trying to compress large blocks of information.
  7. Entropy Coding of the Enhancement Layer Coefficients: Considering that the information is quite sparse to begin with and the use of small 2x2 and 4x4 kernels, the authors of LCEVC decided to use a Run Length Encoder (RLE) and a Prefix Coding Encoder. A Run-Length Encoder is a very simple method of entropy coding and has been used successfully in the past as the basis of CAVLC (H.264/AVC).
  8. Temporal prediction: this optional step operates on the difference between the original full-resolution image and the reconstructed full-resolution image. The output is the Layer 2 Enhancement Coefficients and the encoded Prediction Vectors.
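
The data flow in the steps above can be sketched as a toy model. This is only an illustration under loud assumptions: the 2x2 averaging downsampler, the pixel-repetition upsampler, the quantizer standing in for the base codec, and every function name are hypothetical, and the transform, quantization, and temporal steps of the enhancement layers are skipped entirely. The run-length function is a generic stand-in for the entropy coding stage, not the actual LCEVC coder.

```python
# Toy model of the encoder data flow on a tiny grayscale image.
# All names are hypothetical; none of this is a real LCEVC API.

def downsample(img):
    """Halve width and height by averaging each 2x2 block (assumed filter)."""
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
         for x in range(0, len(img[0]), 2)]
        for y in range(0, len(img), 2)
    ]

def upsample(img):
    """Double width and height by pixel repetition (assumed filter)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def lossy_base_codec(img, step=16):
    """Stand-in for AVC/HEVC/etc.: coarse quantization mimics coding loss."""
    return [[(p // step) * step for p in row] for row in img]

def difference(a, b):
    """Per-pixel a - b for two images of the same size."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def add(a, b):
    """Per-pixel a + b for two images of the same size."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def run_length_encode(values):
    """Collapse a flat sequence into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def encode(frame):
    """Return (base picture, L1 residuals, L2 residuals) for one frame."""
    half = downsample(frame)                # first-stage downsampler
    quarter = downsample(half)              # second-stage downsampler
    base = lossy_base_codec(quarter)        # what a base decoder would output
    l1 = difference(half, upsample(base))   # L1 residuals at half resolution
    # Residuals are not transformed or quantized here, so the L1
    # reconstruction is exact in this toy model.
    l1_recon = add(upsample(base), l1)
    l2 = difference(frame, upsample(l1_recon))  # L2 residuals at full resolution
    return base, l1, l2

# Width and height must be multiples of 4 for two 2x downsampling stages.
frame = [[(16 * x + 4 * y) % 256 for x in range(8)] for y in range(8)]
base, l1, l2 = encode(frame)
print(len(base), len(l1), len(l2))
print(run_length_encode([0, 0, 0, 5, 0, 0]))
```

Because the residuals are left unquantized in this sketch, adding L1 and L2 back to the upsampled base recovers the input exactly; a real encoder trades some of that accuracy for bitrate.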

Note: Using large transform kernels to compress sparse information is a bad idea. For example, if you apply a 32x32 DCT to a 32x32 block that is mostly black with 20 white pixels randomly distributed (about 2% white), it's a safe bet that those white pixels will get lost during the DCT and quantization process. This might be fine for normal image compression, but when you are trying to compress “difference” images, preserving sparse regions is very important.
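
To make the note concrete, here is a small experiment under stated assumptions. It uses a textbook orthonormal DCT-II for both block sizes, a single white pixel instead of 20, and a uniform quantizer with an arbitrary step of 40. None of these choices come from the LCEVC specification, and LCEVC's own small kernels are not reproduced here. The function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (row k = frequency k)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def dct2(block):
    """Separable 2D DCT of a square block: C * block * C^T."""
    n = len(block)
    c = dct_matrix(n)
    tmp = [[sum(c[u][y] * block[y][x] for y in range(n)) for x in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][x] * c[v][x] for x in range(n)) for v in range(n)]
            for u in range(n)]

def surviving_coefficients(n, amplitude=255, step=40):
    """Count coefficients that stay non-zero after uniform quantization.

    The block is black except for one white pixel (an assumed test pattern).
    """
    block = [[0] * n for _ in range(n)]
    block[n // 2][n // 2] = amplitude
    coeffs = dct2(block)
    return sum(1 for row in coeffs for c in row if round(c / step) != 0)

print("32x32 block:", surviving_coefficients(32))
print("2x2 block:  ", surviving_coefficients(2))
```

With these assumed settings, the single pixel's energy in the 32x32 case is spread so thinly that every coefficient quantizes to zero, while in the 2x2 case all four coefficients survive. Real quantizers and real residual statistics will behave differently, but the direction of the effect is the same.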

Note: The details of the transform kernels, upsampling and downsampling filters are too complex for this introductory article on LCEVC and will be covered in a future article in this series. Subscribe to get notified of these articles directly in your inbox.

Decoder

Here is the block diagram of the LCEVC codec’s decoder from the ITU Journal publication. We won’t dive into it in detail because the decoder does the opposite of what the encoder does and the block diagram explains this process very well.

LCEVC Decoder Block Diagram (ITU Journal Paper on LCEVC [Link in the References])


Rather, let’s take a look at the image below that shows the LCEVC decoding process (taken from this presentation). It provides an intuitive understanding of how the LCEVC decoder works.

A Great Visual Representation of How LCEVC Works

Here’s what is happening -

  • The decoder gets a small image as its initial input and upsamples it to produce the “Preliminary Intermediate Picture”.
  • Then, the first Enhancement SubLayer is added to the Preliminary Intermediate Picture to produce the Combined Intermediate Picture. So, now, we have completed the first stage of decoding.
  • The “Combined Intermediate Picture” is upsampled to produce the “Preliminary Output Picture” - now this image is at full resolution.
  • Finally, the second Enhancement SubLayer is added to the Preliminary Output Picture to produce the Combined Output Picture. Optionally, if any temporal prediction exists in the Enhancement Layers, then that is combined to produce the final output picture.

What is important to note, however, is that the output of the base codec is sent through two upsampling stages to produce the full-resolution image. In the absence of LCEVC Enhancement Layer support, the device can directly render the output of the base codec, and this is important for maintaining backward compatibility.
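
The decoding steps and the backward-compatible fallback can be sketched as follows. This is a minimal illustration, assuming pixel-repetition upsampling, residuals that arrive already decoded as plain 2D lists, and no temporal step. The `decode` function and its `lcevc_supported` flag are hypothetical names, not a real decoder API.

```python
def upsample(img):
    """Double width and height by pixel repetition (assumed filter)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def add(a, b):
    """Per-pixel a + b for two images of the same size."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def decode(base, l1=None, l2=None, lcevc_supported=True):
    """Return the picture a device would render.

    base: output of the base decoder (quarter resolution in each dimension)
    l1, l2: decoded enhancement residuals, or None if unavailable
    """
    if not lcevc_supported or l1 is None or l2 is None:
        return base  # legacy path: render the base codec output as-is

    preliminary_intermediate = upsample(base)
    combined_intermediate = add(preliminary_intermediate, l1)
    preliminary_output = upsample(combined_intermediate)
    return add(preliminary_output, l2)  # combined output picture

base = [[10, 20], [30, 40]]
l1 = [[1] * 4 for _ in range(4)]
l2 = [[2] * 8 for _ in range(8)]

enhanced = decode(base, l1, l2)
legacy = decode(base, l1, l2, lcevc_supported=False)
print(len(enhanced), "x", len(enhanced[0]), "vs", len(legacy), "x", len(legacy[0]))
```

The variable names inside `decode` mirror the picture names used in the bullet list, so each line maps onto one decoding stage.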

Complexity of the LCEVC Codec

With a good understanding of how the LCEVC codec works, let’s move on to the coding complexity. In other words, how much effort is needed to squeeze compression gains out of the LCEVC codec?

A few things are quite apparent to me at the outset.

  • The complexity of the LCEVC codec is tied to the complexity of the encoder being used for the base layer (i.e., AVC, HEVC, AV1, or upcoming codecs like VVC, etc.).
  • The complexity of the LCEVC codec is tied to the resolution of the base layer and consequently to the choice of the downsampling step-sizes. Why? Let’s take an example to get a better picture. If the full resolution video is 1920x1080p, and there are two downsampling stages (each downsampling by a factor of 2 horizontally and vertically), then the base layer is 480x270p. Now, compare two encoding scenarios:
    1. LCEVC codec that uses AVC as its base codec; input = 1080p; base-layer = 270p (this is the image size upon which the AVC encoder acts)
    2. standalone AVC whose input = 1080p

In any conventional video codec (e.g., AVC or HEVC), the coding complexity reduces with a reduction in the resolution (with other parameters unchanged). In other words, you need to do more work to compress 1080p than 270p. So, as far as the base codec's workload is concerned, LCEVC comes out on top!
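
The arithmetic behind this comparison is easy to check. The snippet below is a small sketch with a hypothetical helper, `base_resolution`, that applies the two 2x downsampling stages from the example and compares pixel counts. Pixel count is only a rough proxy for encoder workload, so treat the ratio as indicative rather than as a measured speed-up.

```python
def base_resolution(width, height, stages=2, factor=2):
    """Resolution after `stages` downsampling steps of `factor` per dimension."""
    scale = factor ** stages
    return width // scale, height // scale

full = (1920, 1080)
base = base_resolution(*full)

full_pixels = full[0] * full[1]
base_pixels = base[0] * base[1]
ratio = full_pixels / base_pixels

print("base layer:", base)
print("pixels per frame:", full_pixels, "vs", base_pixels)
print("the base codec processes 1/%d of the pixels" % ratio)
```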

One question does arise though.

What about the coding complexity of the Enhancement Stages?

From my understanding and tests, they do not contribute much to the coding complexity and one of the primary reasons is the simplicity of those stages.

If you look at the Encoder block diagram, you will see that there is no conventional inter-frame prediction structure at the enhancement layers, i.e., no P or B pictures or Hierarchical Prediction (the temporal prediction at L2 is optional). Temporal prediction, and the motion search behind it, is generally considered to account for the bulk of the encoding complexity of modern encoders like AVC, HEVC, and AV1.

The enhancement layer compression is very similar to compressing standalone images (like what JPEG does), and this can reduce the coding complexity a lot, even before advanced techniques like block-level parallelism are considered!

Note: In a future article in the series on the LCEVC Codec, we will perform tests using FFmpeg and quantify the results using different base encoders such as H.264/AVC, HEVC, AV1, etc.

Applications of the LCEVC (MPEG-5 Part 2) Codec

What is going to be interesting are the applications of LCEVC. From my perspective, it has three massive advantages (among others) -

  • reduction in the time needed to encode and decode (i.e., reduced coding complexity)
  • bitrate reduction through the use of base and enhancement layers.
  • backwards compatibility and the ability to use only the base layer.

Application of LCEVC in Sports Broadcasting

But if we talk about a very high-value sector where LCEVC can shine, then it would have to be sports broadcasting. Guido Meardi says in an interview with SVGEurope,

With LCEVC it is possible to reduce the amount of data being delivered without compromising on quality, easing the effective management of vast concurrent traffic – a need that is more acute with lock-down preventing groups from gathering to watch televised sport.

I couldn’t agree more. And the reason is a unique confluence of factors (at the time of writing this article, i.e., July 20, 2020), which are -

  • the world is reeling under the COVID-19 pandemic
  • there is no vaccine in sight and reports suggest that large-scale manufacture and administration of vaccines could stretch well into 2021.
  • it’s safe to assume that no government will allow stadiums to get filled with 30-40,000 people in the near future due to the fear of community transmission of COVID-19.
  • BUT, I am confident that governments will make an exception for sporting teams and allow them to play games behind closed doors in a bio-secure bubble (i.e., the players won’t be allowed to visit their families for a week before and after the match, only a few officials and ground-staff will be allowed in the stadium, etc.).

And these restrictions will bring sports broadcasting front and center.

The “crowd” that was going to attend the match in the stadium is now going to tune in to the game on their connected/smart TVs and they are going to expect nothing but the best possible quality transmission.

This is where LCEVC can make a mark by reducing glass-to-glass latency, improving coding efficiency, cutting bandwidth requirements, and improving video quality.

Application of LCEVC in E-Learning and Online Education

Apart from this, I can see LCEVC being used in the Education sector quite a lot. Online education has finally come of age (unfortunately, due to COVID-19) and it is definitely going to add to the burden on already stressed home connections. This is where LCEVC can contribute by reducing the bandwidth consumption. Beyond the layered video coding approach, most E-Learning videos are typical “head-and-shoulder” videos or graphics, which can be compressed quite easily by modern video codecs.

Final Thoughts

I think that the LCEVC codec, along with EVC and VVC, is going to make 2020 and 2021 very interesting in the world of video compression. With competition from AV1 and the patent/licensing sword perennially hanging over the codec world, I think LCEVC has an interesting role to play for existing codec licensees.

In the next few articles in our series on LCEVC, we will

  • cover details of the upsampling and downsampling filters.
  • learn how to use FFmpeg to produce LCEVC bitstreams
  • quantify the performance of LCEVC vis-a-vis the base codecs used (AVC, HEVC, AV1)

Until next time, take care and good luck. Please subscribe to get notified of future articles directly in your inbox.

Acknowledgment and References

This article would not have been possible without the help of the material published by the folks at V-Nova, ITU, MPEG, and other websites. A brief list of interesting resources is linked below.