👋 This is part of a series of articles titled “The Hitchhiker’s Guide to Video Compression” – a gentle and opinionated introduction to the fascinating world of video compression. In this installment, we take a look at the video pre-processing block that is critical to every encoding/transcoding workflow.
Video Pre-Processing is a very important step in any commercial encoder. Though it is not part of any video codec or video coding standard, it is worth understanding what happens in a pre-processor because of its impact on video compression efficiency.
So, let’s take a look at some important video pre-processing steps, shall we?
Interlaced to Progressive Conversion (De-Interlacing)
De-interlacing covers the common scenario where the input is interlaced video and the output needs to be in progressive format.
Interlaced video was developed for formats such as NTSC and PAL where alternating lines are displayed that are taken from two separate fields which were captured at slightly different times. So, you display the odd-numbered lines and then the even-numbered lines. This is done so fast that it gives the impression of a complete image.
But, if you are given an interlaced video and asked to produce progressive output, you need to do some work. In this case, you’ll need to interleave (or “weave”) the top and bottom fields of the interlaced source, apply some cleanup filtering to remove any distortions, and then send the result to the encoding pipeline.
Or, you could simply duplicate the rows present in the field (also known as “bobbing”).
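The two approaches above – weaving fields together and bobbing a single field – can be sketched in a few lines of numpy. This is a minimal illustration, not a production de-interlacer; the function names are my own, and real de-interlacers add motion-adaptive filtering on top of this.

```python
import numpy as np

def weave(top_field, bottom_field):
    """Interleave two fields into one progressive frame.

    The top field supplies the even-numbered rows, the bottom
    field the odd-numbered rows.
    """
    h, w = top_field.shape
    frame = np.empty((2 * h, w), dtype=top_field.dtype)
    frame[0::2] = top_field     # even rows from the top field
    frame[1::2] = bottom_field  # odd rows from the bottom field
    return frame

def bob(field):
    """Naive 'bob': duplicate every field row to reach full frame height."""
    return np.repeat(field, 2, axis=0)
```

Weaving preserves full vertical detail but produces combing on motion (the two fields were captured at different times); bobbing avoids combing at the cost of vertical resolution.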
De-interlacing has been studied extensively for the past couple of decades (or more), and there are several good products and algorithms out there to choose from. Whichever algorithm you use for de-interlacing, you are bound to see some combing artifacts, so it is always worth the money investing in a good de-interlacer.
Video Resizing
Resizing is another common pre-processing step in video encoders. For example, if your input video is 1920x1080p @ 60 fps, and you want the output to be 640x480p @ 60 fps, then you need to resize the frames before sending them to the codec pipeline.
Image resizing is exceedingly common in OTT compression workflows, where you have several different resolutions in your bitrate ladder.
How do you resize images, though? The most naive way is to simply throw away unwanted pixels when downscaling, or duplicate pixels when upscaling, but this can lead to very annoying visual artifacts.
Modern encoders and video pre-processors use well-researched filters such as bicubic, bilateral, trilateral, Gaussian, or Lanczos filters in the image-resizing process.
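To see why filtered resizing beats pixel-dropping, here is a sketch of bilinear interpolation (the simplest of the filtered approaches) on a single luma plane, written in plain numpy. The function name and parameters are my own for illustration; production scalers use higher-order filters like the ones listed above.

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D luma plane with bilinear interpolation.

    Each output pixel is a weighted average of its four nearest
    input pixels, instead of a single copied/dropped pixel.
    """
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)   # source row coordinates
    xs = np.linspace(0, in_w - 1, out_w)   # source column coordinates
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                # fractional row weights
    wx = (xs - x0)[None, :]                # fractional column weights
    img = img.astype(np.float64)
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```

Bicubic and Lanczos filters follow the same idea but weigh a larger neighborhood of pixels, which preserves more detail at the cost of more computation.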
Frame-Rate Conversion
Let’s assume your input video’s resolution is 1920x1080 pixels at 60 fps, and you want a 30 fps output; then you’ll have to use an algorithm to convert the frame rate as requested.
Frame-rate conversion works both ways – you might need to discard every nth frame if you are going from a higher frame rate to a lower one, or you might have to add frames if you want to go from a lower frame rate to a higher one.
When you attempt to increase the frame rate by either frame-stuffing or frame-doubling, you need to take a lot of care not to introduce video artifacts – the goal is to make the motion look natural, not cartoonish. Frame-rate conversion is a rich and wonderful area of research, actually!
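The two basic directions of frame-rate conversion can be sketched as follows. This is the naive version only (decimation and frame duplication, with hypothetical function names); real converters use motion-compensated interpolation to synthesize the in-between frames.

```python
def drop_frames(frames, src_fps, dst_fps):
    """Downconvert by keeping one of every (src_fps / dst_fps) frames.

    Assumes src_fps is an integer multiple of dst_fps, e.g. 60 -> 30.
    """
    step = src_fps // dst_fps
    return frames[::step]

def duplicate_frames(frames, factor):
    """Naive upconversion: repeat each frame `factor` times."""
    return [f for f in frames for _ in range(factor)]
```

Non-integer conversions (e.g. 24 fps to 60 fps) need a repeating cadence such as 3:2 pulldown rather than a fixed step, which is where things start to get interesting.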
Noise Removal
It is common for encoders to have their own proprietary noise-removal algorithms to clean up the video before compressing it. Generally, these noise-removal processes result in softer images due to the Gaussian filters used, but this sometimes helps with compression efficiency.
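As a concrete (and deliberately simple) example of the Gaussian filtering mentioned above, here is a separable Gaussian blur on a luma plane in numpy. The function names and default parameters are illustrative; commercial denoisers are far more sophisticated (often temporal and motion-adaptive), but the softening effect is the same in spirit.

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    """1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x * x) / (2.0 * sigma * sigma))
    return k / k.sum()

def denoise(plane, radius=2, sigma=1.0):
    """Separable Gaussian blur: filter rows first, then columns."""
    k = gaussian_kernel(radius, sigma)
    padded = np.pad(plane.astype(np.float64), radius, mode="edge")
    # horizontal pass over each row
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    # vertical pass over each column
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
```

Because the 2-D Gaussian is separable, two 1-D passes give the same result as one 2-D convolution at a fraction of the cost.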
Note: In a future article, we’ll tackle two important concepts in video compression (Transform and Quantization) and the effect of filtering on compression efficiency will begin to make sense.
Scene Change Detection
For efficient video compression, it is important to know when the scene changes in the video you are trying to compress.
If you know what prediction is, you’ll realise that it is useless to predict or find commonalities between two very different images. It is like searching for something in common between a black image and a white image – you won’t find anything.
Hence, the need for detecting where the scene changes in a movie – so that you don’t try and predict across such a scene change.
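A very simple scene-change detector compares each frame against the previous one and flags a cut when they differ too much. Here is a sketch based on the mean absolute pixel difference; the function name and threshold are my own illustrative choices, and production detectors typically use histogram comparisons or motion-estimation cost instead of raw pixel differences.

```python
import numpy as np

def is_scene_change(prev, curr, threshold=30.0):
    """Flag a scene cut when the mean absolute difference between
    consecutive frames exceeds a tuned threshold (0-255 scale)."""
    mad = np.mean(np.abs(curr.astype(np.float64) - prev.astype(np.float64)))
    return mad > threshold
```

The black-vs-white example from the text makes the idea obvious: the mean absolute difference between an all-black and an all-white frame is 255, far above any sensible threshold, so the encoder knows to start fresh rather than predict across the cut.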
Note: If you haven’t understood this concept, don’t worry for now. After you get through the articles on Prediction and Motion Estimation (I, P, B pictures), everything will start making sense.
There are obviously more algorithms and functions that fit the video pre-processing bill, but I’ll stop here.
The reason I wanted to talk about video pre-processing is to show you how important pre-processing is and how much innovation can take place here.
Many people assume that only the codec matters, but that is wrong.
Any one of you reading this article could come up with a superior scene-change detection algorithm, a noise-removal filter, or a frame-rate converter and take the industry by storm by contributing it back to open-source codecs.
- De-interlacing picture taken from IBM