Video Encoding is the science of reducing a video’s size or bitrate without adversely impacting its quality, as perceived by a human. It is both an art and a science to encode a video to reduce its size while retaining its quality. As we shall learn in this article, it’s a complex and exciting field that has a huge impact on video streaming and delivery.
Here is what we shall cover in this article on video encoding –
- What is Video Encoding?
- What is Video Transcoding and Difference from Video Encoding?
- Quality vs. Bitrate (or Size) in Video Encoding
- Important Factors in Video Encoding – Codec, Bitrate, Resolution, Time, GOP, Frame Type
- Is Video Encoding an Art or a Science?
So, let’s get started!
What is Video Encoding?
Video Encoding is the science of reducing a video’s size or bitrate without adversely impacting its quality, as perceived by a human.
Reducing the file’s size is called compression, and videos are compressed using a well-defined and documented set of mathematical tools and algorithms called video codecs. After a video is compressed, it is generally in a format (known as a bitstream) that can only be understood by a decoder built for the same codec. For example, a video encoded using the H.264/AVC video codec cannot be decoded by an HEVC decoder, and vice versa.
After encoding a video, its quality can be judged either objectively or subjectively –
- Objective measures include PSNR, SSIM, and VMAF. These are metrics, computed by software, that use math to judge a video’s quality. Read more about using PSNR, VMAF, and SSIM here and check out easyVMAF, a handy tool for VMAF calculations.
- Subjective measures like MOS (Mean Opinion Score) involve a strong human element where groups of people score a video on a scale (typically 1 to 5) to indicate its quality. This is also referred to as golden-eye viewing.
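To make the two kinds of measures concrete, here is a minimal sketch in Python. It computes PSNR from its textbook definition (10·log10(MAX² / MSE)) on a tiny row of 8-bit pixel values, and a MOS as the plain average of viewer scores. The pixel values and function names are made up for illustration; real tools work on full frames and handle many more details.

```python
import math

def psnr(reference, distorted, max_value=255):
    """Return the PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the original and the compressed pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no distortion at all
    return 10 * math.log10(max_value ** 2 / mse)

def mean_opinion_score(scores):
    """Return the average of the scores given by a panel of viewers."""
    return sum(scores) / len(scores)

# Hypothetical pixel rows: the original and two compressed versions.
original = [52, 55, 61, 66, 70, 61, 64, 73]
lightly_compressed = [52, 56, 61, 65, 70, 62, 64, 73]
heavily_compressed = [48, 60, 55, 70, 64, 68, 60, 80]

print("PSNR (light):", round(psnr(original, lightly_compressed), 2), "dB")
print("PSNR (heavy):", round(psnr(original, heavily_compressed), 2), "dB")
print("MOS:", mean_opinion_score([4, 5, 3, 4]))
```

A higher PSNR means the compressed pixels are numerically closer to the original, which is not always the same thing as looking better to a human.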
What is Video Transcoding & Difference from Video Encoding
You might hear people using the terms Video Encoding and Video Transcoding interchangeably. While this is generally not frowned upon, there is a subtle difference between the two.
Video encoding generally refers to the process of compressing raw, uncompressed video. Video transcoding, on the other hand, refers to re-encoding files that are already compressed, so it involves an additional step: decoding the incoming video before encoding it.
But, don’t be confused – both processes encode the video, at the final step.
Typically transcoders have a lot of capabilities, such as the ability to
- decode different container formats (mp4, ts)
- decode bitstreams using different video codecs such as H.264/AVC, HEVC, AV1, VP9, etc.
- change the resolution of the video to produce outputs of different resolutions (critical to ABR stream production)
In addition to transcoding, there are two more terms used sparingly in the industry.
- Transrating is the act of changing the bitrate of the video
- Transmuxing is the act of changing the container format (e.g. mp4 to avi or ts).
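Here is a rough sketch of how these three operations might look as FFmpeg command lines, built as Python argument lists so they can be inspected without running anything. The file names, the 2M bitrate, and the choice of libx264 are assumptions for illustration; `-c copy`, `-c:v`, `-b:v`, and the `scale` filter are standard FFmpeg options.

```python
import shlex

def transmux_args(src, dst):
    """Change only the container: the streams are copied, not re-encoded."""
    return ["ffmpeg", "-i", src, "-c", "copy", dst]

def transrate_args(src, dst, video_bitrate):
    """Re-encode the video at a new bitrate; the audio is copied untouched."""
    return ["ffmpeg", "-i", src, "-c:v", "libx264", "-b:v", video_bitrate,
            "-c:a", "copy", dst]

def transcode_args(src, dst, height, video_bitrate):
    """Decode, resize to the given height (the width follows the aspect ratio), re-encode."""
    return ["ffmpeg", "-i", src, "-vf", f"scale=-2:{height}",
            "-c:v", "libx264", "-b:v", video_bitrate, "-c:a", "copy", dst]

# Print the commands instead of running them.
print(shlex.join(transmux_args("input.mp4", "output.ts")))
print(shlex.join(transrate_args("input.mp4", "output.mp4", "2M")))
print(shlex.join(transcode_args("input.mp4", "output_720p.mp4", 720, "2M")))
```

Notice that transmuxing never touches the compressed bitstream, which is why it is so much faster than the other two.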
With this introduction to video encoding and transcoding, let us now understand the all-important rate-distortion trade-off in video encoding in the next section.
Video Quality vs. Bitrate (Size) in Video Encoding/Transcoding
To understand the trade-off between bitrate (or size of the video) and video quality, it is important to understand how video compression works. You don’t have to go deep to get an intuition about the quality-bitrate trade-off, but, you need to understand a couple of things.
A video compression algorithm does two important things (in codecs like H.264/AVC, HEVC, VP9, AV1, etc.) –
- Converts the video from the “pixel domain” to the “frequency domain” using the Discrete Cosine Transform (DCT). If you don’t know how the DCT works, check out our explanation to a 5-year-old.
- Discards much of the frequency-domain data (known as “coefficients”) using a technique called Quantization, while trying to ensure that the human eye cannot perceive this loss of data.
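The two steps above can be sketched on a single row of eight pixels. This is a simplified, one-dimensional illustration written from the textbook DCT-II definition, with a single uniform quantization step; real codecs apply two-dimensional integer transforms to prediction residuals and use far more elaborate quantization. The pixel values and the step sizes are assumptions.

```python
import math

def _scale(k, n):
    """Normalization factor that makes the transform orthonormal."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def dct(block):
    """Orthonormal DCT-II: pixel domain -> frequency domain."""
    n = len(block)
    return [
        _scale(k, n) * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                           for i, x in enumerate(block))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct(): frequency domain -> pixel domain."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]

def quantize(coeffs, step):
    """Divide by the step and round: small coefficients collapse to zero."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    """Scale the integer levels back up; the rounding loss is permanent."""
    return [level * step for level in levels]

def compress(block, step):
    """Return (number of non-zero levels, mean squared error) for a step size."""
    levels = quantize(dct(block), step)
    restored = idct(dequantize(levels, step))
    nonzero = sum(1 for level in levels if level != 0)
    mse = sum((a - b) ** 2 for a, b in zip(block, restored)) / len(block)
    return nonzero, mse

row = [52, 55, 61, 66, 70, 61, 64, 73]
for step in (2, 10, 40):
    nonzero, mse = compress(row, step)
    print(f"step={step:>2}  non-zero coefficients={nonzero}  MSE={mse:.2f}")
```

A larger step leaves fewer non-zero coefficients to store (fewer bits), at the cost of a reconstruction that drifts further from the original.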
In essence, when you compress a video, you are throwing away some information while ensuring that the video quality is not destroyed.
- If you compress the video heavily, you lose a lot of information, and the effect of compression is visually perceivable (bad video quality).
- If you don’t compress the video a lot, then the file size is big, and the video quality is excellent.
This is called the rate-distortion tradeoff in video compression. The smaller the bitrate, the poorer the video quality, assuming that everything else (resolution, bit-depth, chroma subsampling, encoding time, hardware, etc.) is fixed.
This fact is illustrated in the screengrabs from the CrowdRun sequence encoded using H.264/AVC at two different bitrates, with all other parameters unchanged between the encodes.
The stream with the lower bitrate has poorer video quality than the video with the higher bitrate. This illustrates the point that we made earlier regarding the rate-distortion tradeoff.
However, it would be wrong to make a blanket statement that lower bitrate implies poorer quality. As demonstrated in this article, depending on the nature of the content (cartoon, head-and-shoulder), it’s quite possible that lowering the bitrate does not appreciably lower the video quality.
Now, let’s learn about some important factors that affect video encoding that you should be aware and wary of before you encode/transcode.
Important Factors that Affect Video Encoding
There are so many factors that go into video encoding and transcoding, and most of them affect each other. For example, you can’t arbitrarily change the video’s resolution while transcoding and ignore adjusting the bitrate, right?
In this section, let’s take a look at some of the most important factors that go into producing a well-compressed, high-quality bitstream.
Nature of the Content
Not all videos are created equal! Every video is unique and has a flavor to it and this is what makes video compression an art. The settings or tuning that goes into compressing high-action NFL video is different from what one would use to compress cartoon video like the Simpsons – agree?
So many factors go into classifying a video such as,
- Does it contain high-action sports?
- Is there a lot of grass or water (tough to compress!)?
- Is it a high-action movie (war or a high-speed car chase)?
- Is it head-and-shoulders content like news and talk shows?
- Is it a cartoon or anime?
- Does the content have ticker text that moves in one direction while the movie moves in another direction?
Videos contain many different kinds of scenes, and knowing which kind you are dealing with helps engineers compress them better. If you know that the movie contains only flat regions like a cartoon, you can assign a lower bitrate than you would to a high-action sports clip like the one shown below.
Video Codec
Here is how I define a video codec – “A video codec is a set of tools and algorithms designed to compress video to achieve a pre-determined rate-distortion tradeoff.” Video codecs are typically built by consensus and involve committees of engineers and scientists from academia and industry (software and hardware companies).
Popular examples of video codecs are H.264/AVC, HEVC, AV1, VP9, EVC, VVC, LCEVC.
Every codec has a specific goal when it is designed. It might be to achieve higher performance than the previous generation, or be royalty-free, or introduce a new method of compression (like LCEVC has done).
And, because every codec is differently designed and optimized, their performance is bound to be different too. For example, AV1 achieves a much greater compression efficiency than H.264/AVC but uses more time and resources than AVC. And, this is okay! It just comes down to your needs and resources.
To learn more about video compression and codecs, please take a look at the following articles –
- Architectural deep-dive of LCEVC and LCEVC vs. H.264/AVC – a 28% gain at 3x the speed
- Introduction to EVC (Essential Video Coding)
- Interview with the architects of VVC (Versatile Video Coding)
For a simple explanation of video codecs, please read our intuitive guide to video codecs.
Rate Control Mode (CBR, VBR, Capped VBR)
Every encoder has a “rate control” algorithm that determines how a particular bitrate-budget is spent over a period of time (or GOP). The rate control technique used in a codec has a tremendous impact on its compression efficiency, video quality, and speed.
Below are three popular rate control algorithms. A deep discussion of the three is impossible in this overview article, but stay tuned for a deep dive into rate control.
- CBR or Constant Bitrate: the bitrate is kept (nearly) constant, so the video quality is sacrificed in scenes that are hard to compress.
- VBR or Variable Bitrate: the video quality is kept constant while allowing the bitrate to fluctuate.
- Capped VBR or Capped Variable Bitrate: the video quality is kept constant while allowing the bitrate to fluctuate within a limit or cap.
Depending on the mode that you choose for video encoding/transcoding, the encoder will optimize the tradeoff between quality and bitrate or file size.
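As a rough illustration, the three modes could be mapped to FFmpeg options for the libx264 encoder like this. The mapping is my assumption and not the only way to do it: here VBR is expressed as constant-quality encoding with `-crf`, CBR is approximated by pinning `-minrate` and `-maxrate` to the target, and the buffer size of twice the bitrate is an arbitrary choice.

```python
def rate_control_args(mode, bitrate_kbps=None, crf=23):
    """Return illustrative FFmpeg/libx264 options for a rate control mode."""
    if mode == "vbr":
        # Constant quality: the encoder spends as many bits as each scene needs.
        return ["-crf", str(crf)]
    if bitrate_kbps is None:
        raise ValueError(f"mode {mode!r} needs a bitrate")
    rate = f"{bitrate_kbps}k"
    buffer_size = f"{2 * bitrate_kbps}k"  # assumed buffer: two seconds of video
    if mode == "cbr":
        # Pin the minimum and maximum rate to the target bitrate.
        return ["-b:v", rate, "-minrate", rate, "-maxrate", rate,
                "-bufsize", buffer_size]
    if mode == "capped_vbr":
        # Constant quality, but never above the cap.
        return ["-crf", str(crf), "-maxrate", rate, "-bufsize", buffer_size]
    raise ValueError(f"unknown mode: {mode!r}")

for mode in ("cbr", "vbr", "capped_vbr"):
    print(mode, rate_control_args(mode, bitrate_kbps=3000))
```

These options would be placed after `-c:v libx264` in a full command line.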
Video Bitrate
Video Bitrate, or simply bitrate, is the number of bits of video information transmitted per second. The units of bitrate are typically
- kbps or kilobits per second
- Mbps or megabits per second
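Since bitrate is just bits per second, a quick back-of-the-envelope calculation tells you how big a file will be. The sketch below assumes decimal units (1 kbps = 1000 bits per second, 1 MB = 1,000,000 bytes) and ignores audio and container overhead; the function names are mine.

```python
def kbps_to_mbps(bitrate_kbps):
    """Convert kilobits per second to megabits per second."""
    return bitrate_kbps / 1000

def file_size_megabytes(bitrate_kbps, duration_seconds):
    """Approximate size of the video stream in megabytes."""
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8 / 1_000_000  # 8 bits per byte, 10^6 bytes per MB

# A one-hour video encoded at an average of 5000 kbps (5 Mbps).
print(kbps_to_mbps(5000), "Mbps")
print(file_size_megabytes(5000, 3600), "MB")
```

So one hour at 5 Mbps works out to 2250 MB, or about 2.25 GB, before audio is added.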
When you start encoding, you typically need to provide a bitrate to the encoder. Depending on the encoding mode (CBR, VBR, 2-pass VBR, capped VBR), the encoder will use the bitrate value as a guide to compress the video.
As a rule of thumb, the higher the bitrate, the higher the video quality. But, there are always exceptions, and depending on the resolution and the content (slow, fast, head-shoulders, etc.), increasing the bitrate beyond a certain point may not visibly increase the quality of the video.
For a deeper discussion of Bitrates in Video Encoding, go here.
Video Resolution
Video Resolution, or the resolution of a video, is the width by the height of the video, measured in pixels. Video Resolution is generally indicated in a couple of ways –
- using the height of the video like 1080p or 720p.
- or, by mentioning the values of width and height – 1920×1080, or 1280×720.
The resolution of a video plays an important role in encoding for obvious reasons.
- a high-resolution video (4K, for example) will require more bits and time to compress.
- a low-resolution video (360p, for example) will require fewer bits and will be faster to compress.
So, when it is time to choose an encoding bitrate, do keep in mind the video’s resolution. For a deeper discussion of Video Resolutions in Video Encoding, go here.
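One simple way to reason about bitrate and resolution together is to look at how many bits the encoder gets to spend on each pixel. The sketch below computes this "bits per pixel" figure; it is only a rough heuristic, and the bitrates, resolutions, and frame rate used are assumptions for illustration.

```python
def bits_per_pixel(bitrate_kbps, width, height, fps):
    """Average number of bits available per pixel per frame."""
    pixels_per_second = width * height * fps
    return bitrate_kbps * 1000 / pixels_per_second

# The same 3000 kbps budget spread over very different resolutions.
for name, (width, height) in {"360p": (640, 360),
                              "1080p": (1920, 1080),
                              "4K": (3840, 2160)}.items():
    print(name, round(bits_per_pixel(3000, width, height, 30), 4))
```

The same bitrate that is generous for 360p leaves very little for each pixel at 4K, which is why the bitrate usually needs to go up along with the resolution.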
Encoding Time
The time needed to compress/encode/transcode a video is a critical factor in video encoding. Different circumstances require different solutions, right?
- If you are live streaming video, then you cannot afford to encode 1 frame of video every minute! It will be catastrophic! You need to compress your video in real time, at least as fast as the frame rate of the source (24 fps, 50 fps, or 60 fps, for example). In such situations, encoders typically sacrifice quality in order to gain speed.
- However, if you are only streaming video-on-demand, then you have the luxury of taking several hours to compress each hour of your video. In such situations, encoders are tuned to compress slowly and use several complex tools to get higher compression efficiency and better video quality.
For example, in encoders such as x264 (as used through FFmpeg), you have predefined settings (presets) such as ultrafast, veryfast, faster, fast, medium, slow, slower, and veryslow, and these settings indicate the tradeoff in the encoder between speed, quality, and compression efficiency.
Depending on your use-case (live or on-demand), you should choose the encoding setting (and speed).
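Here is a small sketch of that decision. The real-time check simply compares the encoder's measured speed with the source frame rate, and the preset chosen for each use-case (veryfast for live, slow for on-demand) is only an assumed starting point that you would tune for your own hardware and content.

```python
def can_encode_live(encoded_frames, elapsed_seconds, source_fps):
    """True if the encoder keeps up with the source frame rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return encoded_frames / elapsed_seconds >= source_fps

def preset_args(use_case):
    """Return libx264 preset options for a use-case (illustrative mapping)."""
    presets = {"live": "veryfast", "on-demand": "slow"}
    if use_case not in presets:
        raise ValueError(f"unknown use-case: {use_case!r}")
    return ["-c:v", "libx264", "-preset", presets[use_case]]

# An encoder that produced 1500 frames in 60 seconds runs at 25 fps:
# fast enough for a 24 fps source, too slow for a 50 fps one.
print(can_encode_live(1500, 60, 24), can_encode_live(1500, 60, 50))
print(preset_args("live"))
print(preset_args("on-demand"))
```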
GOP (Group of Pictures) and GOP Length
A Group of Pictures (GOP) is a collection of video frames with a well-defined order in which they are to be encoded/decoded, and displayed. The length of a GOP is referred to as “GOP Length” and profoundly impacts video compression efficiency, stream resilience, and video quality.
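In short, a very long GOP will typically give high compression efficiency, because it contains fewer expensive I-frames, but it can hurt video quality and resilience, because an error can affect the frames that follow it until the next I-frame. A short GOP gives the opposite tradeoff.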
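GOP length is usually configured in frames, while people tend to think about it in seconds. The sketch below converts between the two and counts how many GOPs (and therefore how many I-frames, assuming one per GOP) a clip ends up with. The `-g`, `-keyint_min`, and `-sc_threshold` options are FFmpeg options commonly used with libx264 to get a fixed GOP length; the durations and frame rates are assumptions.

```python
import math

def gop_length_frames(gop_seconds, fps):
    """Convert a GOP duration in seconds to a length in frames."""
    return round(gop_seconds * fps)

def gop_count(duration_seconds, fps, gop_frames):
    """Number of GOPs in a clip, assuming a fixed GOP length."""
    total_frames = round(duration_seconds * fps)
    return math.ceil(total_frames / gop_frames)

def fixed_gop_args(gop_frames):
    """FFmpeg options that ask libx264 for a fixed GOP length."""
    return ["-g", str(gop_frames), "-keyint_min", str(gop_frames),
            "-sc_threshold", "0"]

# A 60-second clip at 30 fps with 2-second and 10-second GOPs.
for seconds in (2, 10):
    frames = gop_length_frames(seconds, 30)
    print(f"{seconds}s GOP = {frames} frames,",
          gop_count(60, 30, frames), "GOPs,", fixed_gop_args(frames))
```

The longer GOP needs five times fewer I-frames for the same clip, which is where the compression gain comes from.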
Closed GOP and Open GOP
Closed GOPs and Open GOPs are common in video streams and impact the compression efficiency, stream’s error resilience, and switchability in ABR streaming.
- As the name suggests, a Closed GOP is closed to frames outside the GOP. A frame belonging to a Closed GOP can only refer to frames within its own GOP.
- An Open GOP is the opposite of a Closed GOP and it allows frames from an Open GOP to refer to frames from another GOP.
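The difference between the two can be expressed as a simple check: a GOP is closed if every frame it contains only refers to frames inside the same GOP. The sketch below uses a made-up reference map (frame number to the frames it refers to); the frame numbers and the reference pattern are purely illustrative.

```python
def is_closed_gop(gop_frames, references):
    """True if no frame in the GOP refers to a frame outside the GOP."""
    members = set(gop_frames)
    return all(ref in members
               for frame in gop_frames
               for ref in references.get(frame, []))

# Hypothetical stream: frame number -> frames it refers to.
references = {
    0: [],        # I-frame
    1: [0, 3],    # B-frame
    2: [0, 3],    # B-frame
    3: [0],       # P-frame
    4: [3, 6],    # B-frame that reaches back into the previous GOP
    5: [3, 6],    # B-frame that reaches back into the previous GOP
    6: [],        # I-frame
    7: [6],       # P-frame
}

print(is_closed_gop([0, 1, 2, 3], references))   # first GOP
print(is_closed_gop([4, 5, 6, 7], references))   # second GOP
```

In this example the second GOP is open, because frames 4 and 5 refer to frame 3, which belongs to the first GOP.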
Please read this article for a deep-dive into Closed GOPs and Open GOPs, their impact on video compression and ABR streaming.
Frame Types (I, P, B frames)
The concept of I-frames (IDR or Key Frames), P-frames, and B-frames is fundamental to video encoding and they are used to improve the stream’s compression efficiency, video quality, and resilience.
- An I-frame or a Key-Frame, or an Intra-frame consists ONLY of macroblocks that use Intra-prediction.
- A P-frame stands for Predicted Frame and allows macroblocks to be compressed using temporal prediction in addition to spatial prediction. For motion estimation, P-frames use frames that have been previously encoded.
- A B-frame is a frame that can refer to frames that occur both before and after it. The B stands for Bi-Directional for this reason.
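One consequence of B-frames is that frames are not encoded/decoded in the order in which they are displayed: a B-frame can only be decoded after the later frame it refers to. The sketch below reorders a display-order sequence into a possible decode order, under the simplifying assumption that every B-frame refers to the nearest I- or P-frame before and after it. Real encoders support more flexible reference structures.

```python
def decode_order(display_order):
    """Reorder frames so that every B-frame follows the frames it refers to."""
    ordered, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            # Hold the B-frame until the next I- or P-frame has been emitted.
            pending_b.append(frame)
        else:
            ordered.append(frame)
            ordered.extend(pending_b)
            pending_b = []
    # Trailing B-frames with no later reference are emitted last (assumption).
    ordered.extend(pending_b)
    return ordered

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print("display order:", display)
print("decode order: ", decode_order(display))
```

Here P3 is decoded before B1 and B2, even though it is displayed after them.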
Please read this article for a deep-dive into I, P, B-frames, their impact on video compression and ABR streaming.
Upscaling and Downsampling tools & algorithms
While we won’t be doing a deep dive here into up/downsampling algorithms, one should keep in mind that these tools affect the video’s quality quite dramatically. Suppose the input video’s resolution is 540p and you want to produce a 720p output, then you need a tool that upsamples the video to 720p, right? And by upsampling the video, this tool has a profound impact on the video’s quality – if it does a good job, then the output video will look good. Otherwise, you are in big trouble.
Typically, upsampling and downsampling tools use well-researched image processing filters like Lanczos or Bicubic filters to create videos of new resolutions. However, it would be worth testing these tools before choosing to buy or commit to an encoder/transcoder.
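To give a feel for what such a filter does, here is a minimal one-dimensional sketch of Lanczos resampling, written from the textbook kernel definition L(x) = sinc(x)·sinc(x/a) for |x| < a. It resizes a single row of pixel values; real scalers work in two dimensions, are heavily optimized, and differ in details such as edge handling. The clamped edges, the kernel size a = 3, and the sample values are my assumptions.

```python
import math

def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x / a) inside (-a, a), zero outside."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def resample_row(row, new_length, a=3):
    """Resize a row of samples to new_length using Lanczos interpolation."""
    old_length = len(row)
    scale = old_length / new_length
    # When downsampling, widen the kernel so it also acts as a low-pass filter.
    support = max(scale, 1.0)
    out = []
    for j in range(new_length):
        center = (j + 0.5) * scale - 0.5  # position in source coordinates
        first = math.floor(center - a * support) + 1
        last = math.floor(center + a * support)
        total = weight_sum = 0.0
        for i in range(first, last + 1):
            weight = lanczos_kernel((i - center) / support, a)
            sample = row[min(max(i, 0), old_length - 1)]  # clamp at the edges
            total += weight * sample
            weight_sum += weight
        out.append(total / weight_sum)  # normalize so flat areas stay flat
    return out

row = [10, 10, 10, 200, 200, 200]
print([round(v, 1) for v in resample_row(row, 9)])   # upsampled
print([round(v, 1) for v in resample_row(row, 3)])   # downsampled
```

A bicubic filter would follow the same structure with a different kernel function.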
With that, we come to the end of our list of important factors that impact video encoding. While not exhaustive, it provides guidance on what to look out for. Let’s summarize this in the next section with our video encoding checklist.
Video Encoding Checklist
Here are some points to keep in mind before encoding a video. This is not an exhaustive list, but, will help you narrow down on the right parameters for video compression.
- Understand your audience – what is the typical bandwidth that your audience has? Are you streaming to an urban population with high-speed Internet? Or to a rural population with low Internet speeds? Do they use a fixed connection (cable internet)? Or, are they always on the move and are using their mobile data plans?
- What devices are your end-users using? Are they typically watching on large screen SmartTVs, or are they primarily using hand-held smartphones?
- What is the nature of the content you’re streaming? Is it high-action movies, sports or low-action content like news, or simple-to-compress content like cartoons?
- How much time do you have on your hands for compressing the video? Live-streaming or On-demand?
- Which video codec do you use? Will your target audience’s devices be able to decode H.264/AVC, HEVC, AV1, or VP9?
These are only some of the factors that come into play before you encode/transcode your videos. I am sure there are many more, but I hope this checklist serves as a good starting point.
Now, let’s look at a few tools and services for video encoding and transcoding.
Tools for Video Encoding and Transcoding
In this section, let’s take a look at popular tools and services for video compression/encoding/transcoding. Some are paid and some are open-source – each with different capabilities and features.
Let’s take a look at the open-source video encoding options first.
FFmpeg (open source)
FFmpeg is by far the most popular tool for video encoding/transcoding. It supports video decoding, demuxing & muxing to/from most container formats, changing video resolution, and can encode in the most popular video codecs such as H.264/AVC, HEVC, AV1, VP9, etc., making it an indispensable tool for video compression.
Handbrake (open source)
Handbrake is an open-source video transcoder that’s widely regarded as the best GUI-based tool for video encoding and transcoding. It’s effortless to use, multi-platform, and it covers a vast range of presets and devices as well. This means you will find it easier than ever to compress videos quickly without spending money on a conversion or transcoding tool or having to spend hours crafting command-line arguments. Check out OTTVerse’s “Introduction to Handbrake” article for more information.
Paid Tools and Services for Video Encoding
There are several (literally, hundreds of!) paid services for cloud-based video compression. These services typically support multiple container types, codecs, and ingest/egress formats like RTMP, HLS, DASH, etc.
Some of the more popular services are listed below, but this in no way is an exhaustive list.
Do let me know if any other services should be added to this list of cloud-based video transcoding services.
So, Is Video Encoding an Art or a Science?
Having learned so much about video encoding and compression, here is an interesting philosophical question. Is video encoding/transcoding a science or an art? What do you think?
Honestly, if you ask me, I’d say that the jury is split on this question.
Video Encoding is a science because it has exact formulations and algorithms invented in labs worldwide and continuously tested and improved upon.
On the flip side, video encoding is an art because the perception of a video’s quality is subjective and different people will perceive video quality differently. So, when you set up an encoder or an encoder’s compression tools, you cannot simply go by what the quality measures and bitrate values say in some table that is handed out to you!
A serious engineer will encode their videos, watch them several times in different settings, and decide which one is most likely to satisfy a large percentage of their viewers. People-pleasing is an art, you see 🙂
So, what do you think? Is encoding an art or a science? Let me know what you think in the comments section!
I hope this article on video encoding and video transcoding has been educational and has given you a taste of the art & science of video compression. Do check out OTTVerse’s Compression series of articles for more information.
Until next time, take care, and happy streaming!
Krishna Rao Vijayanagar
I’m Dr. Krishna Rao Vijayanagar, founder of OTTVerse. I have a Ph.D. in Video Compression from the Illinois Institute of Technology, and I have worked on Video Compression (AVC, HEVC, MultiView Plus Depth), ABR streaming, and Video Analytics (QoE, Content & Audience, and Ad) for several years.
I hope to use my experience and love for video streaming to bring you information and insights into the OTT universe.