In this article, we take a look at what a video codec is, what video coding standards are, and how those standards relate to the process of compressing and decompressing data.
In a previous article on data compression, we established that doing some sort of data prediction is critical in order to reduce the size of the data at hand.
During that prediction process, we ended up designing a “language” in order to convey the meaning and syntax of the compressed information to the person at the other end.
The language’s design might have been subtle, so in this article, let’s bring it to the fore, shall we?
👋 This is part of a series of articles titled “The Hitchhiker’s Guide to Video Compression” – a gentle and opinionated introduction to the fascinating world of video compression.
Data Compression and The Creation of a Language
To kick off our understanding of a video codec, I am going to ask you to convey or transmit the following sequence of English letters to your friend. Now, here are the constraints: we have a primitive system at hand, and it takes 1 byte of space to store each letter or number that you use. Okay?
<data> A A A A A A A A A A A A A A A A A A A A B </data>
Note: In case you find it hard to read, there are 20 A’s and 1 B.
What would you do?
Answer: I would say that there are two obvious techniques (apart from all the advanced entropy coding techniques that we’ll pretend you haven’t heard of):
Idea #1: write out all the data using 1 byte for each “A” and 1 byte for the “B”. This takes up 21 bytes of storage space.
Idea #2: Represent the data as follows, using
1. 1 byte to represent the first character “A”
2. 1 byte to represent the number “20”
3. 1 byte to represent the letter “B”
4. 1 byte to represent the number “1”
That gives us something like this:
A 20 B 1
Just 4 bytes – amazing, right?
Okay, let’s call “A 20 B 1” a bitstream and send it to someone.
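As a rough sketch, Idea #2 could look like this in Python. The function name `compressit_encode` is made up for illustration, and the sketch assumes that the input is plain ASCII text and that no run is longer than 255, since each count has to fit in one byte.

```python
def compressit_encode(text: str) -> bytes:
    """Run-length encode ASCII text as (letter, count) byte pairs."""
    out = bytearray()
    i = 0
    while i < len(text):
        letter = text[i]
        run = 1
        # Count how many times the same letter repeats consecutively.
        while i + run < len(text) and text[i + run] == letter:
            run += 1
        if run > 255:
            # Assumption of this sketch: a count must fit in one byte.
            raise ValueError("run too long for a one-byte count")
        out.append(ord(letter))  # 1 byte for the letter
        out.append(run)          # 1 byte for the count
        i += run
    return bytes(out)


bitstream = compressit_encode("A" * 20 + "B")
print(list(bitstream))  # [65, 20, 66, 1]  i.e. A 20 B 1
print(len(bitstream))   # 4
```

The 21 input bytes shrink to 4, which matches the hand-written result above.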
But, wait! How is your friend supposed to understand this bitstream? It won’t make sense to anyone other than you and me, right?
Yes! And, this is precisely why we need to define a “language” to go along with our data. This “language” needs to have clear rules that govern the “decoding or decompressing” of the “coded or compressed” data.
Let’s do just that and call our “language” CompressIt.
CompressIt has very simple rules and says that –
- odd-numbered bytes represent the letter to be written down, and
- even-numbered bytes represent the number of times that letter is written down consecutively.
Having designed our language, I can now take a copy of CompressIt’s specifications and publish it online for other engineers to read and possibly find loopholes or bugs.
Bugs in a 2-line document – how’s that possible?
To that, I say – why not?
For example, someone might come forward and say, “Hey guys, what happens if a letter is repeated 300 times? I ask this because 300 exceeds the largest value that a single byte can hold (255).”
Oops, that’s a major blooper on our part and this shows the importance of peer review!
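The reviewer’s worry is easy to check: Python’s `bytes` refuses any value above 255, so a larger count simply cannot be stored in a single byte. The sketch below also shows one possible fix, splitting a long run into several (letter, count) pairs. That fix is my own assumption for illustration, not something the two-line spec above says, and `encode_run` is a hypothetical helper.

```python
MAX_COUNT = 255  # largest value a single (unsigned) byte can hold


def encode_run(letter: str, count: int) -> bytes:
    """Encode one run, splitting it into chunks that each fit in a byte."""
    out = bytearray()
    while count > 0:
        chunk = min(count, MAX_COUNT)
        out.append(ord(letter))
        out.append(chunk)
        count -= chunk
    return bytes(out)


# A count of 300 cannot be written as one byte.
try:
    bytes([ord("A"), 300])
except ValueError as err:
    print("cannot store 300 in one byte:", err)

# Splitting the run keeps every count within range.
print(list(encode_run("A", 300)))  # [65, 255, 65, 45]
```

A decoder that follows the odd/even rules reads this as 255 A’s followed by 45 more A’s, so the original rules do not need to change for this particular fix.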
So, now the language creators (you and I) go back to the drawing board and come back with a better spec and send it out for review, again.
This process continues till all the kinks are ironed out and the world (of codec engineers) agrees upon the rules of this “language”.
Welcome To The Birth Of A Codec
I know I have been using the word “language” in this example, but, what you and I have actually done is to define and implement a “codec” – a scheme for compressing and decompressing data.
In the real world, here is how a codec is born (in most cases, that is).
A group of individuals known as a “codec committee” get together and,
- tell the world that there is a need for a new codec and specify the requirements (example: 50% better compression than the previous codec, ability to compress 8K video, new color formats, etc.)
- ask for contributions, proposals, and suggestions and evaluate their feasibility.
- coordinate all the documentation, testing, and experiments that are necessary to decide on the rules and tools for the codec.
- and, finally, publish the codec’s specifications for the world to use.
This is broadly the same birthing process that video codecs such as H.264/AVC, HEVC, AV1, and VP9 have gone through. And now, you know how it’s done!
But is there a need for a video codec to be explained in great detail? Can’t you just release the software? Is this a waste of time and money?
How Does Specifying a Codec Help the Industry?
Great question! Let’s go back to our CompressIt example to understand better.
If some engineer in Russia wants to write a decoder for the CompressIt codec, all he needs to do is get a copy of the codec’s specification and understand it.
From studying CompressIt, he realizes that he needs to
- take an incoming stream of data or read a file that was compressed using CompressIt,
- take each odd-numbered byte and use it as the letter to write (let’s say: X), and
- read the even-numbered byte that follows (= N) and use it to repeat the letter “X” N times.
And, repeat this process till the end of the file is reached.
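The steps above can be sketched in a few lines of Python. This is only an illustration: `compressit_decode` is a made-up name, and the sketch assumes that the letters are ASCII and that the bitstream always contains whole (letter, count) pairs.

```python
def compressit_decode(bitstream: bytes) -> str:
    """Expand (letter, count) byte pairs back into the original text."""
    if len(bitstream) % 2 != 0:
        raise ValueError("bitstream must contain whole (letter, count) pairs")
    pieces = []
    for i in range(0, len(bitstream), 2):
        letter = chr(bitstream[i])  # odd-numbered byte (1st, 3rd, ...): X
        count = bitstream[i + 1]    # even-numbered byte (2nd, 4th, ...): N
        pieces.append(letter * count)
    return "".join(pieces)


decoded = compressit_decode(bytes([65, 20, 66, 1]))  # A 20 B 1
print(decoded == "A" * 20 + "B")  # True
```

Notice that the decoder was written purely from the rules of the spec, without ever looking at the program that produced the bitstream.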
Well, this is precisely what a codec specification enables – it specifies how the bitstream should look and behave so that anyone in the world can write a decoder for it.
It doesn’t tell you how to create that bitstream; it only tells you how it should look.
Here’s another real-world example of the importance of a video codec and its specification.
Let’s say you read online that “Apple has added support for HEVC in their next release”. What this means is that an engineering team at Apple
- downloaded a copy of the HEVC spec,
- read it (again and again and again and again – open the HEVC spec and you’ll see why)
- wrote a program that can decode a bitstream created using the HEVC codec,
- and are planning to release it in their next software update.
And why were they able to do this?
‘Cause a bunch of engineers got together to create the HEVC video compression standard and published a document spelling out every step of the process and the bitstream – so that anyone in the world can write a decoder for it.
Do Codec Specifications Define the Encoders?
No, but there is a bit of word-jugglery going on here. Let me explain.
A codec specification tells you what tools are present in the language, what the output of an encoder should look like, and how a decoder will parse it.
But what this also means is that you can program the encoder any way you want, as long as the bitstream that is produced by the encoder conforms to the codec’s specification. That is the most critical point I am trying to make here.
In other words,
- if you are using CompressIt to compress data, you cannot switch the meaning of the even and odd-numbered bytes because all the decoders in the world will get confused.
- but, you can use any technique to count and record consecutive letters and the number of times they were repeated. Heck, you can use a quantum computer if you’d like to – as long as you ensure that the bitstream out of the encoder conforms to the spec.
You’ll probably get fired for using a quantum computer, but you’ll still have written a valid encoder. Silver lining, eh?
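To make that concrete, here is a small sketch with two very different CompressIt encoders. Both function names are hypothetical, and both encoders assume ASCII input with runs no longer than 255. One encoder groups repeated letters and the other lazily writes every letter with a count of 1. Both outputs follow the odd/even rules, so the same decoder handles them.

```python
from itertools import groupby


def decode(bitstream: bytes) -> str:
    """Decoder that follows the spec: letter byte, then count byte."""
    return "".join(
        chr(bitstream[i]) * bitstream[i + 1]
        for i in range(0, len(bitstream), 2)
    )


def greedy_encoder(text: str) -> bytes:
    """Groups each run of identical letters into one (letter, count) pair."""
    out = bytearray()
    for letter, group in groupby(text):
        out += bytes([ord(letter), len(list(group))])
    return bytes(out)


def lazy_encoder(text: str) -> bytes:
    """Writes every letter with a count of 1: valid, but a poor compressor."""
    out = bytearray()
    for letter in text:
        out += bytes([ord(letter), 1])
    return bytes(out)


data = "A" * 20 + "B"
print(len(greedy_encoder(data)))  # 4
print(len(lazy_encoder(data)))    # 42
print(decode(greedy_encoder(data)) == decode(lazy_encoder(data)) == data)  # True
```

The lazy encoder actually doubles the size of the data, yet its bitstream is still valid. The spec only guarantees that a conforming bitstream can be decoded; how well an encoder compresses is up to whoever writes it.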
Good. Now, here is a question for you.
When you program an encoder or a decoder, how do you know that it works correctly? In other words, where is the gold standard or the reference encoder / decoder for you to compare your code against?
Test Models and Reference Encoders & Decoders
The “need” for a gold standard or a reference is why codec committees and working groups release “test models” and reference encoders and decoders.
These implementations are not optimised for speed or performance. They only contain an implementation of the encoder and decoder as per the specification and you can use them to verify the bitstream that your encoder implementation produced.
In addition, committees often release encoded bitstreams that adhere to the standard. Decoder manufacturers can use these bitstreams to test their software or hardware for codec compliance.
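For the toy CompressIt codec, such a compliance check could be sketched as follows. Everything here is hypothetical: the test vectors, the function names, and the deliberately buggy decoder exist only to illustrate the idea, and the sketch assumes ASCII letters.

```python
def reference_decode(bitstream: bytes) -> str:
    """Reference decoder: letter byte first, count byte second."""
    return "".join(
        chr(bitstream[i]) * bitstream[i + 1]
        for i in range(0, len(bitstream), 2)
    )


def swapped_decode(bitstream: bytes) -> str:
    """A buggy decoder that mixes up the odd and even bytes."""
    return "".join(
        chr(bitstream[i + 1]) * bitstream[i]
        for i in range(0, len(bitstream), 2)
    )


# Hypothetical conformance bitstreams paired with their expected output.
CONFORMANCE_VECTORS = [
    (bytes([65, 20, 66, 1]), "A" * 20 + "B"),
    (bytes([88, 3]), "XXX"),
    (b"", ""),
]


def is_conformant(decoder) -> bool:
    """A decoder passes only if it reproduces every expected output."""
    return all(
        decoder(bitstream) == expected
        for bitstream, expected in CONFORMANCE_VECTORS
    )


print(is_conformant(reference_decode))  # True
print(is_conformant(swapped_decode))    # False
```

Real conformance suites are far larger, but the principle is the same: decode the published bitstreams and compare the result against the known-good output.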
I hope that by now you understand what a video codec is, what video coding standards are, and how they pertain to encoder and decoder development. In future articles, let’s understand how the various tools in video codecs work.