Video compression is one of the most important aspects of video production. The need for high quality visuals and sound in order to capture a viewer’s attention can often be hindered by the data limitations faced by streaming services like YouTube, Facebook, and Vimeo. This blog post explains the importance of video compression through a practical exercise to drive home the point!
Why compress video?
Video Compression has been a widely researched topic for decades and rightfully so.
Each passing year brings innovations in video capture, rendering, and display technologies. Along with this, companies face consumer expectations of better-looking (higher quality) videos at the same price they already pay, or lower.
In other words, content providers like Netflix, HBO, etc. need to show you better quality videos while not passing on the cost to you! Unfortunately, this is easier said than done.
Every day, millions of people around the world watch videos in the form of movies, shows, news, and sports broadcasts, TikTok clips, or the ubiquitous cat or “fail” videos on Facebook.
Yet almost everyone is unaware of the complexity that goes into producing, storing, and delivering the video each time the “Play” button is pressed. People stand in front of the Eiffel Tower and marvel at the hard work that went into building it. But hardly anyone is astonished when they press play and the season finale of LOTR magically appears on their screens without a hitch! Right?
The fact of the matter is that building a video delivery pipeline which ensures that your audience can watch their cat videos on-demand anytime and anywhere is not as easy as it sounds!
And not cheap, either 🙂
A critical component of the video capture and delivery pipeline is the video compression piece (formally called the encoder/transcoder). Companies spend a lot of time, energy, and money on choosing and fine-tuning their encoders (literally in the high 6-7 figures).
But has it ever occurred to you why we need to compress video before transmitting it? What is the trade-off that is being made here? What are we gaining by compressing video? Is the gain negligible, and could we do away with compression altogether?
These are good questions because once the “why” is understood, then everything else will fall into place!
In this article, we will try to make the case for the existence of video compression and, hopefully, set the tone for a series of articles that will take you into the depths of the art and science of video compression.
So, let’s get started.
What is the cost of a single Pixel?
Okay – so, to store videos (and deliver them, of course), you need to budget for storage devices. For now, let’s ignore the delivery side of things and focus purely on the storage problem.
The storage devices (SSDs or HDDs) that hold your videos represent your cost. Our goal is to estimate how much it would cost to store a video that is in 1080p format, 24 fps, color (RGB), and 90 minutes long.
However, in order to understand the cost, let us first begin by dissecting a video into its constituent parts.
So, what is a video made up of? The answer is surprisingly simple – a movie is made up of a series of images that are shown to you at a set speed that tricks your mind into thinking that there is motion!
Each of these images is called a “frame” and so, a movie is basically a series of frames in a particular order.
Going one level deeper, a frame is made up of pixels.
Several pixels arranged in a particular order make up a frame. For example, a 1080p frame has 1920 x 1080 pixels arranged in 1080 rows and 1920 columns.
Taking this understanding further, a frame isn’t a simple 2D array of numbers. In order to produce a sense of color, in our example, we assume a frame of video has 3 planes: one each for Red, Green, and Blue. These primary colors can be combined in different proportions to create practically any color that a display can show. [Here is another explanation of this concept].
For example, if you mix Red, Green, and Blue in equal proportions, you can produce shades of grey – from black to pure white! This property is so useful that these three colors are used to produce all the different colors in digital displays (TV, or your phone).
Cool – so now we know that a frame is a 3D array of pixels and the pixels are in fact just numbers that represent the intensity of a color (R, G, B in our example).
If you use 8 bits of memory to represent a pixel's intensity in each color plane, then you need 24 bits of memory to represent the full RGB triplet.
In other words, we need 24 bits of space to store 1 pixel (R, G, B).
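To make the 24-bit figure concrete, here is a minimal sketch of how one RGB pixel could be packed into a single 24-bit number, assuming 8 bits per color plane as in our example. The helper names `pack_rgb` and `unpack_rgb` are made up for this illustration and do not come from any particular library.

```python
# Assumption from the article: 8 bits per color plane, three planes (R, G, B).
BITS_PER_CHANNEL = 8
CHANNELS = 3
BITS_PER_PIXEL = BITS_PER_CHANNEL * CHANNELS


def pack_rgb(r, g, b):
    """Pack three 8-bit intensities into a single 24-bit integer."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("each channel must fit in 8 bits (0-255)")
    return (r << 16) | (g << 8) | b


def unpack_rgb(pixel):
    """Split a 24-bit integer back into its (R, G, B) intensities."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


white = pack_rgb(255, 255, 255)  # equal, maximum intensities
grey = pack_rgb(128, 128, 128)   # equal, mid-level intensities

print(BITS_PER_PIXEL)      # 24
print(white.bit_length())  # 24
print(unpack_rgb(grey))    # (128, 128, 128)
```

Equal values in all three planes give a shade of grey, and the brightest possible pixel uses all 24 bits.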
Note: Readers who are aware of video technologies will recognize that videos are typically not stored in RGB format and that they are usually stored in YUV (420, 422, or 444) formats. Additionally, you’ll be aware that bit depths can vary (8, 10, or 12 bits per color sample). However, for this article, let us assume that our video is being stored in RGB and 8-bit format. We will cover the different color formats, subsampling, and bit depths in future posts in the video compression series. Thank you for bearing with this simple example!
How much does a 90 min, 1080p movie cost?
Now that we know that it costs 24 bits to store a single pixel of a video (R, G, B), let us compute the cost of an entire frame of video.
If you are watching HD video or 1080p, we are talking about 1920 x 1080 pixels per frame.
Hence, the cost of a frame is
(1920 * 1080 pixels * 24 bits/pixel) = 49766400 bits.
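The same calculation as a small Python sketch, in case you want to try other resolutions. The function name `frame_bits` is just an illustrative choice, and the numbers assume the 24-bit RGB pixel from our example.

```python
def frame_bits(width, height, bits_per_pixel):
    """Raw (uncompressed) size of one frame, in bits."""
    return width * height * bits_per_pixel


# Assumptions from the article: 1080p frame, 24 bits per RGB pixel.
bits = frame_bits(1920, 1080, 24)
mebibytes = bits / 8 / 1024 / 1024  # bits -> bytes -> KB -> MB (1024-based)

print(bits)                # 49766400
print(f"{mebibytes:.2f}")  # 5.93
```

So a single uncompressed 1080p frame already takes up close to 6 megabytes.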
However, we aren’t here to watch images. We want to see the entire movie!
In order to maintain a sense of motion (or to trick your eyes into thinking that there is “motion”), your TV needs to show 24 frames every second (fps), or else your brain will quickly realize that you are not watching a video. This phenomenon is called the “persistence of vision” or the “illusion of motion”.
Just so we are clear, 24 fps is considered the bare minimum frame-rate (frame-rate is the number of frames in every second of video). The industry feels that 60 fps is a good standard frame-rate and I agree.
Back to our math
So, assuming our movie is produced in a way that it contains 24 frames each second, what is the storage space needed for 1 second of video?
49766400 bits/frame * 24 fps = 1194393600 bits/second.
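Here is a rough sketch of that per-second calculation, run for both the 24 fps used in our example and the 60 fps mentioned earlier. The function name `raw_bits_per_second` is hypothetical, and the frame size assumes 1080p with 24 bits per pixel.

```python
# Assumption from the article: one 1080p RGB frame at 24 bits per pixel.
BITS_PER_FRAME = 1920 * 1080 * 24  # 49,766,400 bits


def raw_bits_per_second(bits_per_frame, fps):
    """Raw (uncompressed) data produced by one second of video, in bits."""
    return bits_per_frame * fps


for fps in (24, 60):
    bps = raw_bits_per_second(BITS_PER_FRAME, fps)
    # Also show the figure in gigabits (10**9 bits) for readability.
    print(fps, bps, round(bps / 1e9, 2))
```

At 24 fps this prints 1194393600 bits (about 1.19 gigabits) for every second of video, and at 60 fps it climbs to 2985984000 bits (about 2.99 gigabits).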
This is for 1 second of a movie. We are talking about a 90 min video here. So how much space do we need to store 90 minutes of video?
1194393600 bits for 1 second * 60 seconds in a minute * 90 minutes = 6449725440000 bits
Wow – let’s make that number manageable.
size = 6449725440000 bits
= 806215680000 bytes
= 787320000 kilobytes (assuming 1024 bytes in a kilobyte)
= 768867.1875 megabytes (assuming 1024 kilobytes in a megabyte)
= approx. 750 gigabytes (assuming 1024 megabytes in a gigabyte)
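If you would like to check the whole chain yourself, the sketch below goes from pixels to gigabytes in one go. It assumes the same inputs as our example (1080p, 24 bits per pixel, 24 fps, 90 minutes) and 1024-based units; `raw_video_bits` is an illustrative name, not a function from a real library.

```python
def raw_video_bits(width, height, bits_per_pixel, fps, minutes):
    """Raw (uncompressed) size of a video, in bits."""
    seconds = minutes * 60
    return width * height * bits_per_pixel * fps * seconds


# Assumptions from the article: 1080p, RGB at 24 bits/pixel, 24 fps, 90 minutes.
total_bits = raw_video_bits(1920, 1080, 24, 24, 90)
total_bytes = total_bits // 8
kilobytes = total_bytes / 1024
megabytes = kilobytes / 1024
gigabytes = megabytes / 1024

print(total_bits)           # 6449725440000
print(total_bytes)          # 806215680000
print(megabytes)            # 768867.1875
print(round(gigabytes, 2))  # 750.85
```

Changing the frame rate to 60 or the resolution to 4K in the call above shows how quickly the raw size grows.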
750 GB to store a movie?? That’s freaking crazy!
Putting 750 GB into perspective
If you want to store 750 GB worth of data on a really good SSD, then let’s assume that you buy a Samsung Evo SSD (costing about $250).
Will anyone spend $250 to store a single movie? It’s an absurd thought, right?
Now you might argue that we could use a cheaper storage technology, but, that isn’t the point we are trying to make here.
In reality, video storage is expensive, and that’s why there is continuous innovation in the industry to come up with newer algorithms and techniques that shrink movies into more manageable sizes without noticeably affecting the video quality.
Just think of the Netflix app on your phone. Netflix gives you the option to download movies onto your phone so that you can watch them if you go offline. You can download an entire movie (many movies, as a matter of fact) and still not blow up your phone’s memory.
You don’t download several gigabytes or terabytes of data – do you? You are downloading 1 or 2 GB at the most.
This massive reduction in the filesize is due to the art and science of video compression.
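To get a feel for how massive that reduction is, here is a back-of-the-envelope sketch comparing our raw estimate with a typical download. It assumes roughly 750 GB for the raw movie and 1 to 2 GB for the downloaded file, as in the text; the function names are made up for this illustration.

```python
RAW_GB = 750  # approximate raw size of our 90-minute 1080p movie


def compression_ratio(raw_size, compressed_size):
    """How many times smaller the compressed file is than the raw one."""
    return raw_size / compressed_size


def size_reduction_percent(raw_size, compressed_size):
    """Share of the raw data that compression removes, as a percentage."""
    return (1 - compressed_size / raw_size) * 100


for download_gb in (1, 2):
    ratio = compression_ratio(RAW_GB, download_gb)
    saved = size_reduction_percent(RAW_GB, download_gb)
    print(download_gb, round(ratio), round(saved, 1))
```

Under these assumptions, the downloaded movie is roughly 375 to 750 times smaller than the raw video, which means that more than 99% of the original data is gone.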
Is Video Compression an Art or a Science?
Video Compression is a science because it has very precise formulations and algorithms that are invented in labs around the world and continuously tested and improved upon.
On the flip side, video compression is also an art because the perception of the quality of a video is subjective and different people will perceive video quality differently.
As we shall see in future posts in the Video Compression series, when you try to reduce the size of a video file, the quality degrades. To one person, the image might look better when it is sharper, whereas the person standing next to them might think that softening the image would make it more tolerable. This need to satisfy complex “visual tastes” makes video compression an art as much as a science.