EME, CDM, AES, CENC, and Keys - The Essential Building Blocks of DRM

Anyone trying to understand DRM (Digital Rights Management) will be confronted with acronyms such as AES, CDM, CENC, and EME. They can be confusing for a newcomer, but understanding them is essential to understanding DRM. In this article, let’s take a gentle tour of the building blocks of DRM: EME, CDM, AES, CENC, and the use of Keys & Key Servers.

OTTVerse is happy to share that this article won 2nd place at the Mile-High Video Blog Awards 2021.

The Hitchhiker’s Guide to DRM


Simplified Architecture of a DRM System

As we saw in the previous article, DRM is a combination of encryption and business rules to control access and consumption of digital content.

Simply put, DRM is a system that,

provides the tools and infrastructure to enable a content provider to encrypt their content, and
builds an ecosystem around the encrypted content so the provider can control who/what can decrypt and consume their content.

In the previous article of the series, we saw Ram and Shyam sending coded messages to each other. At the same time, Hari maintained the codebooks and decided who got to read/write the notes – remember?

Now, let’s replace this simple system with the technology needed to secure and distribute video. What do we get?

Let’s describe what we have here. There is a movie that we want to send to an authenticated user securely.

So,

we ask a DRM company’s server for a codebook to encrypt our video,
we encrypt the video using that codebook,
we send the movie to the user,
the user asks the DRM company’s server for the codebook to unlock (decrypt) the video,
and then he watches the movie!

Fantastic!

Is this all there is to know about DRM for video?

Nope! What we have here is a toy example of how to transfer movies securely using DRM. It captures the essence of DRM perfectly but wouldn’t work well in the real world.

In the rest of this article, let’s take each piece of this simple system, re-think it, re-design it, and see how it fits within the world of video delivery and DRM, shall we?

Step 0: Let’s Move to Adaptive Bitrate Streaming

Before discussing encryption, let’s modify our example to suit the ABR (Adaptive BitRate) video delivery model.

ABR Refresher: in ABR, a movie is encoded into different bitrate-resolution combinations (a.k.a. a bitrate ladder) and then split into chunks or segments. Each chunk represents a few seconds of video and is independently decodable.

“Packaging” refers to chunking or breaking up a movie into small pieces and describing them in a manifest or playlist document. To play the movie, the player first fetches and reads this manifest.

Depending on the available bandwidth, the player requests a chunk/segment of a particular bitrate (rendition, or rung of the ladder), and a CDN (Content Delivery Network) responds with the requested chunk.
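To make the rendition choice concrete, here is a minimal Python sketch of a throughput-based selector. The ladder values, the 0.8 safety factor, and the names `Rendition` and `pick_rendition` are hypothetical. Real players such as dash.js and hls.js use more elaborate throughput and buffer heuristics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    height: int        # vertical resolution, e.g. 1080
    bitrate_kbps: int  # average encoded bitrate of this rung


def pick_rendition(ladder: list[Rendition], bandwidth_kbps: float,
                   safety: float = 0.8) -> Rendition:
    """Pick the highest-bitrate rendition that fits in a fraction of the measured bandwidth."""
    budget = bandwidth_kbps * safety
    affordable = [r for r in ladder if r.bitrate_kbps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest rung instead of stalling.
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(affordable, key=lambda r: r.bitrate_kbps)


# Hypothetical three-rung ladder.
LADDER = [Rendition(360, 800), Rendition(720, 3000), Rendition(1080, 6000)]

print(pick_rendition(LADDER, bandwidth_kbps=5000))  # budget 4000 kbps -> the 720p rung
```

The safety factor leaves headroom so that a small drop in throughput does not immediately cause a stall.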

Popular methods of video delivery using ABR are MPEG DASH and HLS. For a deeper understanding, please refer to our OTT and ABR video streaming articles.

Let’s change our block diagram to reflect ABR video delivery.

The only changes here are the packaging and CDN-based delivery steps. That’s all.

Okay, let’s move on and start with the encryption process.

Step 1: Video Encryption

The whole idea of encryption is to ensure that if someone intercepts our data, they cannot read it (or, in the case of video, watch it).

Encryption refresher: encryption is a technique used to keep data confidential and prevent unauthorized people from reading it. Encryption uses a “key” to convert input data (plaintext) into an alternate form called ciphertext. Without the key, converting the ciphertext back to plaintext is practically impossible.

Strictly speaking, decryption without the key is possible in principle (for example, by trying every possible key). Encryption algorithms are designed to make such attacks extremely expensive in terms of the time, money, and computing resources needed.

One of the most popular encryption techniques is the “Advanced Encryption Standard,” or “AES”. It is based on the Rijndael cipher (named after its inventors, Vincent Rijmen and Joan Daemen) and was established by the U.S. National Institute of Standards and Technology (NIST) in 2001 to encrypt electronic data.

Some important points to remember about AES:

It’s a symmetric-key algorithm: encryption and decryption are performed using the same key.
It has three variants based on the key length: 128, 192, and 256 bits. The longer the key, the harder it is to crack.
Cracking AES-128 by brute force, without the key, would take a supercomputer about a “billion times a billion years” (source).
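To get a feel for these numbers, here is a back-of-the-envelope calculation in Python. The attack speed of 10^12 keys per second is an assumption chosen for illustration. On average, a brute-force search succeeds after trying half of the 2^128 (about 3.4 × 10^38) possible keys.

```python
KEY_BITS = 128
KEYS_PER_SECOND = 10**12               # assumed attacker speed (hypothetical)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

keyspace = 2 ** KEY_BITS               # number of possible AES-128 keys
expected_tries = keyspace // 2         # on average, half the keys are tried before a hit
years = expected_tries / KEYS_PER_SECOND / SECONDS_PER_YEAR

print(f"{keyspace:.2e} possible keys; about {years:.1e} years on average")
```

At the assumed rate, this comes to roughly 5 × 10^18 years, the same order of magnitude as the “billion times a billion years” figure above.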

If you want to go deeper into the AES standard, take a look at the Wikipedia page on AES. I am not an expert in cryptography and won’t be able to do justice to AES.

Note: In the video space, encryption is not encoding, and decryption is not decoding. For videos, encoding and decoding refer to compression and decompression, respectively. To learn more about encoding, decoding, and video codecs, please read our articles on the need for compression and a simple introduction to video codecs.

Is AES-128 The Only Encryption Technique?

No, it isn’t. Let’s think about what that implies for a minute.

Suppose a content provider works with three different DRM companies, and all three use different encryption techniques. The content provider then needs to encrypt their videos three times, wasting storage space and other resources.

That is why the CENC specification came into being – to reduce this encryption-driven fragmentation of the market and to reduce storage requirements.

Let’s learn about this next.

CENC or Common Encryption

Before we dive into CENC, let’s step back and look at the state of OTT streaming protocols and CMAF in particular.

There are primarily two protocols in use today: MPEG-DASH and HLS. There are others, such as MSS (Microsoft Smooth Streaming) and HDS (Adobe HTTP Dynamic Streaming), but we’ll leave them aside for this discussion.

MPEG-DASH typically uses the mp4 container format for its videos, while HLS has traditionally used the MPEG-TS (ts) container. If a content provider uses both MPEG-DASH and HLS, they need to store a copy of their videos in both mp4 and ts file formats.

Now, let’s add the DRM encryption problem. If our three hypothetical DRM providers use three different encryption standards, then the content provider needs to store 2 × 3 = 6 copies of each video! What a waste of storage space!!

To solve the first problem, posed by the streaming protocols, the CMAF (Common Media Application Format) specification was created. It allows videos to be stored in the fragmented mp4 container format (fmp4). Because both MPEG-DASH and HLS support CMAF, you can now create a single set of videos, store it in fmp4 format, and use that common set of files for both protocols.

Just make sure you create two manifests (sigh!).

How About Unifying the Encryption?

We still need to store multiple copies of each file if different DRM technologies use different encryption standards, right?

For this purpose, MPEG developed the CENC or Common Encryption specification (ISO/IEC 23001-7). The two encryption schemes in widespread use are cenc (AES-128 CTR) and cbcs (AES-128 CBC). CTR stands for Counter, and CBC stands for Cipher Block Chaining.

The implication of CENC is that a content provider needs to encrypt their videos only once, and any decryption module that supports the chosen scheme can decrypt them. Note: Exposing the encryption algorithm is not a problem as long as the keys are strongly protected.
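To show what the CTR in cenc means, here is a minimal Python sketch of counter mode. Python’s standard library has no AES, so an HMAC-SHA256 stand-in (truncated to 16 bytes) plays the role of AES-128 block encryption. As a result, the sketch is for illustration only and cannot decrypt real CENC content. The counter-block layout (an 8-byte IV followed by a 64-bit big-endian block counter) follows our reading of the 8-byte-IV case in ISO/IEC 23001-7. The names `toy_block_encrypt` and `ctr_crypt` are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # AES block size in bytes


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # STAND-IN for AES-128 block encryption (the stdlib has no AES).
    # Illustration only: not compatible with real CENC content.
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def ctr_crypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Counter mode: XOR data with a keystream. The same call encrypts and decrypts."""
    if len(iv) != 8:
        raise ValueError("this sketch expects an 8-byte IV")
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        # Counter block = 8-byte IV || 64-bit big-endian block counter starting at 0.
        counter_block = iv + (offset // BLOCK).to_bytes(8, "big")
        keystream = toy_block_encrypt(key, counter_block)
        chunk = data[offset:offset + BLOCK]
        out += bytes(a ^ b for a, b in zip(chunk, keystream))  # last block may be short
    return bytes(out)


key = bytes(range(16))                  # example key; never hard-code real keys
iv = b"\x00\x01\x02\x03\x04\x05\x06\x07"
sample = b"one encrypted video sample, any length"
ciphertext = ctr_crypt(key, iv, sample)
print(ctr_crypt(key, iv, ciphertext) == sample)  # True: same key + IV decrypts
```

Because CTR XORs the data with a keystream, the same function with the same key and IV both encrypts and decrypts. This is the symmetric-key property described earlier. The ciphertext is also exactly as long as the input, so no padding is needed.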

CENC might sound like a magic wand for DRM unification, but it is not.

The market has three primary DRM technologies – Apple FairPlay, Google Widevine, and Microsoft PlayReady.

Apple FairPlay supports only AES-128 CBC cbcs mode.
HLS supports only AES-128 CBC cbcs mode (irrespective of CMAF).
Widevine and PlayReady support both AES-128 CTR cenc and AES-128 CBC cbcs modes.
MPEG-DASH with CMAF supports both AES-128 CTR cenc and AES-128 CBC cbcs modes.
MPEG-DASH without CMAF supports only AES-128 CTR cenc mode.

As you can see, the CMAF and CENC specs have led to confusion and fragmentation in the streaming space.
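The list above can be encoded as a small lookup table to answer a practical question: which single scheme can a provider encrypt with once and serve to all of its targets? This Python sketch simply restates the list, and real device support varies further. `SUPPORTED` and `common_schemes` are hypothetical names.

```python
# Encryption schemes supported by each target, restating the list above.
SUPPORTED = {
    "FairPlay": {"cbcs"},
    "HLS": {"cbcs"},
    "Widevine": {"cenc", "cbcs"},
    "PlayReady": {"cenc", "cbcs"},
    "DASH with CMAF": {"cenc", "cbcs"},
    "DASH without CMAF": {"cenc"},
}


def common_schemes(targets: list[str]) -> set[str]:
    """Schemes every target can handle; an empty set means one encryption pass is not enough."""
    if not targets:
        return set()
    return set.intersection(*(SUPPORTED[t] for t in targets))


print(common_schemes(["FairPlay", "Widevine", "PlayReady"]))  # {'cbcs'}
print(common_schemes(["HLS", "DASH without CMAF"]))          # set()
```

For FairPlay, Widevine, and PlayReady together, the only common scheme is cbcs. HLS plus non-CMAF DASH has no common scheme at all, so that content would have to be encrypted twice.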

A possible convergence point is the universal use of CMAF and AES-128 CBC cbcs mode, but how would that affect legacy devices that support only CTR mode or only MPEG-TS?

That’s a discussion for another time.

Step 2: Key, KeyID, and the License Server

We have established that we will encrypt videos using AES-128 encryption. At this stage, a few questions come up:

Where do we get the AES-128 Encryption Keys?
How do we associate an Encryption Key with a movie?
Where do we store the Encryption Keys?

Let’s answer them one at a time.

Where do we get the AES-128 encryption keys?

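Any content provider can generate the encryption keys itself using specialized software (an AES-128 key is simply 16 bytes of output from a cryptographically secure random number generator). Alternatively, several DRM vendors provide the necessary tools and software to generate these keys.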

How do we associate an encryption key with a movie?

Let’s understand the “why” first. When you go to a hotel, you ask the receptionist for the keys to a particular room by mentioning the room number – right? You’re providing the association here between a key and a room by telling her the room number.

Similarly, when we encrypt a movie with a particular key, we must create that association and provide that to the DRM license server (our receptionist, if you will).

In DRM, a “KeyID” is associated with an encryption key and a movie. It is a unique identifier generated when creating an encryption key for a particular movie. In CENC, a KeyID is a 128-bit value that is usually written as a UUID.
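Generating a key and a KeyID takes only a few lines. This Python sketch assumes a random version-4 UUID as the KeyID, which is one common choice. The function name `new_content_key` is hypothetical, and a production system would generate and store keys inside its key-management infrastructure rather than in application code.

```python
import secrets
import uuid


def new_content_key() -> tuple[uuid.UUID, bytes]:
    """Return (KeyID, key) for one piece of content."""
    key = secrets.token_bytes(16)  # 16 random bytes = one AES-128 key, from a CSPRNG
    key_id = uuid.uuid4()          # random UUID (128 bits) used as the public KeyID
    return key_id, key


kid, key = new_content_key()
print("KeyID:", kid)  # public: can go into the manifest
# The key itself is secret and would be sent only to the key store.
```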

And finally,

Where do we store the Encryption Key & its KeyID?

The Encryption Key and the KeyID are stored in a secure server (Key Store) that works alongside a DRM license server.

When a client needs to play an encrypted movie, it asks the DRM license server for the decryption key by providing that particular movie’s KeyID. If the DRM license server is satisfied that the request is authentic, it asks the Key Store for the decryption key associated with that KeyID.
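The lookup flow above can be sketched in Python as follows. `KeyStore`, `LicenseServer`, and the token check are hypothetical stand-ins. A real license server authenticates the client and the CDM far more thoroughly, and it encrypts the key inside the license for the requesting CDM instead of returning it in the clear.

```python
from typing import Optional


class KeyStore:
    """Maps KeyID -> content key. An in-memory dict stands in for a hardened key store."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def put(self, key_id: str, key: bytes) -> None:
        self._keys[key_id] = key

    def get(self, key_id: str) -> Optional[bytes]:
        return self._keys.get(key_id)


class LicenseServer:
    """Authenticates a request, then looks up the key for the requested KeyID."""

    def __init__(self, key_store: KeyStore, valid_tokens: set[str]) -> None:
        self._store = key_store
        self._valid_tokens = valid_tokens  # hypothetical stand-in for real authentication

    def request_license(self, key_id: str, token: str) -> dict:
        if token not in self._valid_tokens:
            raise PermissionError("request is not authentic")
        key = self._store.get(key_id)
        if key is None:
            raise KeyError(f"unknown KeyID {key_id}")
        # A real license would carry the key encrypted for the requesting CDM.
        return {"key_id": key_id, "key": key}


store = KeyStore()
store.put("kid-1", bytes(16))
server = LicenseServer(store, valid_tokens={"token-abc"})
print(server.request_license("kid-1", "token-abc")["key_id"])  # kid-1
```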

Bonus Question: How is the KeyID transmitted to the player?

Rationale: without the KeyID, the license server can’t look up a movie’s decryption key.

Answer: the KeyID is sent to the video player with the DASH or HLS manifest. The player parses the manifest, finds the KeyID, and asks the DRM License Server for the decryption key associated with that KeyID.
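For DASH, one common place for the KeyID is the `cenc:default_KID` attribute on a `ContentProtection` element in the MPD. This Python sketch extracts it with the standard library’s XML parser. The MPD is a trimmed-down, hand-written example, the KeyID value in it is made up, and `key_ids_from_mpd` is a hypothetical name.

```python
import uuid
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
CENC_NS = "urn:mpeg:cenc:2013"


def key_ids_from_mpd(mpd_xml: str) -> set[uuid.UUID]:
    """Collect every cenc:default_KID found on ContentProtection elements."""
    root = ET.fromstring(mpd_xml)
    kids = set()
    for cp in root.iter(f"{{{MPD_NS}}}ContentProtection"):
        value = cp.get(f"{{{CENC_NS}}}default_KID")
        if value:
            kids.add(uuid.UUID(value))
    return kids


# Trimmed-down example MPD; the KeyID is made up.
MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:cenc="urn:mpeg:cenc:2013">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"
                         cenc:default_KID="9eb4050d-e44b-4802-932e-27d75083e266"/>
    </AdaptationSet>
  </Period>
</MPD>"""

print(key_ids_from_mpd(MPD))
```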

To summarize the discussion around Encryption Keys, KeyIDs, and the License Server –

The Encryption Key is “private” and needs to be stored in a secure key store along with its associated KeyID.
The KeyID can be made “public”.
Anyone with the KeyID can ask the License Server for the private key (decryption key). It is up to the DRM provider to authenticate the person asking and then supply or deny the decryption key.

Here’s a block diagram of what we learned about Keys, Encryption, and License Servers.

Step 3: Decrypting Video At The Player and the Key Server

At the client side (player application), the user presses play on the movie he wants to watch. The video player needs a way to recognize whether the movie is encrypted.

Otherwise, it will try to play back an encrypted movie, crash, and cause a horrible user experience.

Signaling that a movie is encrypted can be accomplished in several ways.

You could note in the manifest that the movie is encrypted and provide its KeyID.
Another way is to insert a few bytes of unique information into the video bitstream. When the player examines the bitstream before playback, it can detect this information and realize that the movie is encrypted.
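In the fragmented MP4 files that CENC targets, this bitstream-level signal is carried in MP4 boxes. The `tenc` box records the default KeyID of an encrypted track, and a `pssh` (Protection System Specific Header) box carries a DRM system’s ID, optionally the KeyIDs, and opaque DRM-specific data. This Python sketch builds and parses a version-1 `pssh` box following our reading of the ISO/IEC 23001-7 layout. The payload bytes are a placeholder, and `build_pssh` and `parse_pssh` are hypothetical names.

```python
import struct
import uuid

# Widevine's registered DRM system ID.
WIDEVINE_SYSTEM_ID = uuid.UUID("edef8ba9-79d6-4ace-a3c8-27dcd51d21ed")


def build_pssh(system_id: uuid.UUID, kids: list[uuid.UUID], data: bytes) -> bytes:
    """Build a version-1 'pssh' full box: header, SystemID, KeyIDs, then opaque data."""
    body = bytes([1, 0, 0, 0])                         # version = 1, flags = 0
    body += system_id.bytes
    body += struct.pack(">I", len(kids)) + b"".join(k.bytes for k in kids)
    body += struct.pack(">I", len(data)) + data
    return struct.pack(">I", 8 + len(body)) + b"pssh" + body


def parse_pssh(box: bytes) -> tuple[uuid.UUID, list[uuid.UUID], bytes]:
    """Parse a single 'pssh' box and return (SystemID, KeyIDs, data)."""
    size, box_type = struct.unpack(">I4s", box[:8])
    if box_type != b"pssh" or size != len(box):
        raise ValueError("expected exactly one pssh box")
    version = box[8]
    pos = 12                                           # skip size, type, version, flags
    system_id = uuid.UUID(bytes=box[pos:pos + 16])
    pos += 16
    kids = []
    if version > 0:                                    # only v1+ boxes list KeyIDs
        (count,) = struct.unpack(">I", box[pos:pos + 4])
        pos += 4
        for _ in range(count):
            kids.append(uuid.UUID(bytes=box[pos:pos + 16]))
            pos += 16
    (data_size,) = struct.unpack(">I", box[pos:pos + 4])
    pos += 4
    return system_id, kids, box[pos:pos + data_size]


kid = uuid.UUID("9eb4050d-e44b-4802-932e-27d75083e266")  # made-up KeyID
box = build_pssh(WIDEVINE_SYSTEM_ID, [kid], b"placeholder")
print(parse_pssh(box))
```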

The next few steps for the player are straightforward.

The player finds the KeyID and asks the license server for the decryption key.
The license server uses pre-defined mechanisms to verify that the player making the request is authentic.
Once the license server is satisfied with the player’s authenticity, it responds with a license containing the decryption key.

We’ve described a simple scheme, but there are many problems (technical and commercial) with our scheme. Here are some problems right off the bat.

We’ve described a prototypical “player” that sends a request for the decryption keys to the DRM License Server. But,
- How does the license server know if the player is trustworthy?
- And, what if the decryption software in the player exposes the key and the decrypted content?
Also, if you are a video player developer, do you have to develop decryption modules for every DRM technology? And do you have to update them each time the DRM providers change their interfaces?

Furthermore, the sequence of events at the player (client-side) looks something like this –

obtain the movie & its manifest from the CDN
extract the KeyID from the manifest
create the license request
send the license request to the license server
wait, listen, and receive the response from the license server.
use the decryption key from the server to decrypt the content
decode the decrypted content
display the decoded movie

A single program or entity should NOT do all of the above.

It will result in a tightly coupled architecture and prevent attempts at openness and a plug-and-play ecosystem. Let’s see what can be done about it.

Player-Side Architecture

At the player level, the responsibilities described earlier are divided across different modules as follows –

The player takes care of obtaining the movie, parsing the manifest, extracting the KeyID, making the requests to the DRM License Server, etc.
A separate module (CDM or Content Decryption Module) creates the license request and decrypts & decodes the content.

Now, let’s look at the CDM.

CDM or Content Decryption Module

Every DRM provider defines its own

mechanism to create a license request (using the KeyID, device identifier, signing the request, etc.)
mechanism to understand the license response received from the DRM License Server (the response is encrypted too) and extract the decryption key.
rules around storing the license locally on the client, license renewal, expiry, etc.

Using those details, modules called CDMs (Content Decryption Modules) can be built into browsers such as Chrome, Firefox, Microsoft Edge, Safari, etc.
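To illustrate the first of those mechanisms, here is a Python sketch of how a CDM might form and sign a license request. The JSON format, the field names, and the shared HMAC secret are all hypothetical. Real DRM systems use their own proprietary (and encrypted) message formats and typically rely on per-device certificates and public-key signatures rather than a shared secret.

```python
import hashlib
import hmac
import json
import secrets


def build_license_request(key_ids: list[str], device_id: str, device_secret: bytes) -> dict:
    """Hypothetical license request: a JSON body plus an HMAC-SHA256 signature."""
    payload = {
        "key_ids": sorted(key_ids),
        "device_id": device_id,
        "nonce": secrets.token_hex(8),  # lets a server that tracks nonces reject replays
    }
    body = json.dumps(payload, sort_keys=True)
    signature = hmac.new(device_secret, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_license_request(request: dict, device_secret: bytes) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    expected = hmac.new(device_secret, request["body"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, request["signature"])


secret = b"per-device secret (hypothetical)"
request = build_license_request(["kid-1"], "device-42", secret)
print(verify_license_request(request, secret))  # True
```

Any change to the signed body, such as swapping in another device ID, makes verification fail.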

DRM vendors test and certify these CDMs to ensure that

the license requests are formed correctly and as per specifications.
they do not leak the decryption keys
they do not leak the decrypted and decoded movies
they securely store the decryption keys based on the license specifications (store the key for X days, for example)
they safely transport the video to the screen without leaking it

For the above reasons, CDMs in browsers are closed-source, which is a point of contention in the industry and among the public. Many people distrust them because the public cannot inspect the CDM’s source code.

Note: Several browsers give you the option to turn off the CDM. But if you do so, you won’t be able to watch any DRM-protected content. That’s the industry’s trade-off.

Here’s a screenshot of the Widevine plugin on Firefox’s plugin page (on my Ubuntu 20.04 machine).


Oh wait, there is another layer of abstraction that we haven’t discussed yet.

EME or Encrypted Media Extensions

We saw in the previous section that the player applications need to talk to the CDM in the browser and with the License Servers to exchange license information, right?

This is both a technical and a business problem. Why?

player vendors need to integrate with all the different license servers & CDMs and keep track of the changes to their interfaces to stay up-to-date
if a player company says that they don’t support some popular “XYZ” platform because of XYZ’s frequently changing interfaces, it’s highly likely that nobody will buy their players. Not good!

That gave rise to a layer between the players and the CDMs called the EME or Encrypted Media Extensions. The EME provides a standardized set of APIs for players (apps) to communicate with the CDMs.

Let’s now understand how EME and CDMs work together –

Encrypted Media Extensions (EME) is a JavaScript API.
Content Decryption Module (CDM) is software that decrypts and, optionally, decodes and displays the video.
The video player is a JavaScript program that uses the EME APIs to transmit messages between the CDM and the License Server.
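In the browser, this flow uses real EME calls: `navigator.requestMediaKeySystemAccess()`, `MediaKeySystemAccess.createMediaKeys()`, `HTMLMediaElement.setMediaKeys()`, `MediaKeys.createSession()`, `MediaKeySession.generateRequest()`, the session’s `message` event, and `MediaKeySession.update()`. The Python below is only a toy model of the division of labor, not EME itself. `FakeCDM`, `fake_license_server`, and `player_play` are hypothetical names. The point is that the player relays opaque messages without ever seeing the key.

```python
class FakeCDM:
    """Toy stand-in for a browser CDM: it alone holds keys; the player never sees them."""

    def __init__(self) -> None:
        self._keys: dict[bytes, bytes] = {}

    def generate_request(self, init_data: bytes) -> bytes:
        # A real CDM emits an opaque, DRM-specific license request here.
        return b"REQ:" + init_data

    def update(self, license_response: bytes) -> None:
        key_id, key = license_response.split(b":", 1)
        self._keys[key_id] = key

    def has_key(self, key_id: bytes) -> bool:
        return key_id in self._keys


def fake_license_server(request: bytes) -> bytes:
    # Toy server: echoes the KeyID with a dummy all-zero key.
    key_id = request[len(b"REQ:"):]
    return key_id + b":" + bytes(16)


def player_play(cdm: FakeCDM, init_data: bytes, send) -> None:
    """Mirrors generateRequest() -> 'message' event -> POST to server -> update()."""
    message = cdm.generate_request(init_data)  # opaque to the player
    response = send(message)                   # the player only relays bytes
    cdm.update(response)                       # the key goes straight into the CDM


cdm = FakeCDM()
player_play(cdm, b"kid-1", fake_license_server)
print(cdm.has_key(b"kid-1"))  # True
```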

An advantage of EME is interoperability: content providers and player vendors can now build streaming services that work across different browsers. You can develop an app that uses the EME spec to talk to the license server and the CDM, irrespective of the DRM platform or browser (CDM) used.

For more information, see the EME specification.

Video Decoding and Display

After a video is decrypted, it must be decoded and displayed to the user without exposing the decrypted, decoded, or raw frames. The CDM (Content Decryption Module) is vital in preventing data leaks because it is the first component to handle the decrypted data.

When it comes to video playback, a CDM can either

decrypt the movie and hand over the bitstream to the application (not very secure because someone can hack the app to dump the video)
decrypt, decode, and pass on the decoded frames of video to the platform’s display engine.
decrypt, decode, and display the video by itself (most secure)

These steps can run in software or in the device’s hardware (more secure).

Putting everything together on the player/client side, we get the following block diagram.

Our prototype DRM system is ready.

But it is missing a few critical features that make it attractive to content providers.

Step 4: Authentication, License Rotation, and Supporting Offline Playback

At this stage, I want to distinguish between core DRM technology providers (such as Apple, Google, and Microsoft) and DRM vendors that provide services around those technologies. In this section, let’s look at a few business rules the industry expects from DRM. These could be offered directly by the DRM technology provider or by a DRM vendor.

User Authentication

DRM technologies such as FairPlay, Widevine, and PlayReady do not offer User Authentication services. However, DRM vendors can! When the user hits “Play,” a separate server authenticates the user’s credentials (e.g., customerID) and checks whether the user can play that content based on subscription level, promo codes, etc. After this server authenticates the user, the app can make a license request to the license server. Note: This is a gross simplification of the workflow, and professional DRM vendors have more sophisticated authentication workflows.

Geo-blocking

Geo-blocking is used when content providers want to block playback of a movie in certain countries. Like User Authentication, this is an add-on service offered by most DRM vendors. When the user hits “Play” on a particular movie, the DRM vendor’s servers can check whether the movie can be watched in the user’s location. Based on the content provider’s rules, the license and encryption key are either sent to or withheld from the client.
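The two add-on checks above boil down to a policy decision made before a license is issued. This Python sketch assumes hypothetical subscription tiers, and it assumes that the user’s country has already been determined (for example, by IP geolocation). `ContentRules` and `may_issue_license` are illustrative names, not a vendor API.

```python
from dataclasses import dataclass

# Hypothetical subscription tiers, ordered from lowest to highest.
TIER_RANK = {"free": 0, "basic": 1, "premium": 2}


@dataclass(frozen=True)
class ContentRules:
    required_tier: str                 # lowest tier allowed to watch
    blocked_countries: frozenset[str]  # ISO 3166-1 alpha-2 country codes


def may_issue_license(user_tier: str, country: str, rules: ContentRules) -> bool:
    """Run the geo-blocking and entitlement checks before a license is requested."""
    if country in rules.blocked_countries:
        return False  # geo-blocked
    return TIER_RANK[user_tier] >= TIER_RANK[rules.required_tier]


rules = ContentRules(required_tier="basic", blocked_countries=frozenset({"FR"}))
print(may_issue_license("premium", "US", rules))  # True
print(may_issue_license("premium", "FR", rules))  # False: blocked country
```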

Persistent and Non-Persistent Licenses

As the name suggests, a Persistent License can be stored on the client device after being received from the license server. It can be used to play back the movie(s) until the expiration time specified in the license. Before the license expires, the CDM needs to make a license renewal request.

Non-persistent licenses are used for immediate playback of a movie and are not meant to be stored for extended periods. They are generally discarded when the current playback session ends, or in the middle of a session if policies set short expiry times.
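The difference between the two license types can be modeled with an expiry time and a flag. This Python sketch is hypothetical: `License`, `needs_renewal`, and the 24-hour renewal margin are assumptions. Real license policies are defined by each DRM system and enforced inside the CDM.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class License:
    key_id: str
    persistent: bool      # may be stored on the device and reused across sessions
    expires_at: datetime

    def is_valid(self, now: datetime) -> bool:
        return now < self.expires_at

    def needs_renewal(self, now: datetime,
                      margin: timedelta = timedelta(hours=24)) -> bool:
        # Only stored (persistent) licenses are renewed, shortly before they expire.
        return self.persistent and self.is_valid(now) and self.expires_at - now <= margin


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
offline = License("kid-1", persistent=True, expires_at=now + timedelta(days=30))
streaming = License("kid-2", persistent=False, expires_at=now + timedelta(hours=2))
print(offline.needs_renewal(now + timedelta(days=29, hours=12)))  # True
```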

Key Rotation

Key Rotation involves encrypting different sections (or segments) of a movie with different keys to mitigate attacks. Suppose a hacker obtains the key to a movie. In that case, it might allow him to watch only a small section of the movie if the following sections use different keys. Additionally, you can associate different licensing rules for different sections of the content by using multiple keys. For example, a movie’s “exclusive behind-the-scenes” section can be shown to premium subscribers only, while all free subscribers can watch the rest of the movie.
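Here is a minimal Python sketch of both ideas: mapping segments to keys in fixed-length crypto periods, and granting different keys to different subscription tiers. The period length, KeyIDs, and tier names are hypothetical. Real systems also have to signal key changes to the player in the manifest or the media segments.

```python
# Hypothetical crypto period: every 10 consecutive segments share one key.
SEGMENTS_PER_KEY = 10


def key_for_segment(segment_index: int, key_ids: list[str]) -> str:
    """Key rotation: segment n uses the key of crypto period n // SEGMENTS_PER_KEY."""
    period = segment_index // SEGMENTS_PER_KEY
    if period >= len(key_ids):
        raise IndexError(f"no key provisioned for segment {segment_index}")
    return key_ids[period]


# Per-section rules: the bonus section has its own key, licensed to premium users only.
KEY_POLICY = {"kid-main": {"free", "premium"}, "kid-bonus": {"premium"}}


def licensed_key_ids(tier: str) -> set[str]:
    return {kid for kid, tiers in KEY_POLICY.items() if tier in tiers}


rotation = ["kid-a", "kid-b", "kid-c"]
print(key_for_segment(15, rotation))  # kid-b
print(licensed_key_ids("free"))       # {'kid-main'}
```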

Offline Playback

Some services let you play back videos when an internet connection is unavailable, dubbed “offline playback.” I’ve downloaded several movies on Netflix onto my phone when I know I am getting onto a long-haul flight. In such situations, the player can’t contact the license server to get the DRM keys.

So the DRM provider needs to provide an option to store the keys securely on the device so that the content can be unlocked and played back even when an internet connection is unavailable. A highly secure CDM implementation is needed to prevent the keys from leaking.

Optimized Encryption of Video

Encrypting & decrypting movies can get expensive, especially in UHD and 4K movies, and there is a need for optimized encryption. One such optimization is to encrypt only the Intra Frames (keyframes / I-frames / IDR frames) of every video segment. This optimization has several advantages –

Encryption is faster because Intra-frames make up a tiny proportion of a movie’s total frames.
Only after decoding the Intra-frame can its dependent frames (i.e., frames that depend on the I-frame) be decoded. Hence, the movie is rendered useless without decodable Intra-frames.

An example of this is SAMPLE-AES in Apple FairPlay, which encrypts only parts of the media samples in each segment. We’ll read about it in the article on Apple’s FairPlay Streaming DRM.
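To see what I-frame-only encryption saves, here is a Python sketch over a hypothetical 30-frame GOP with made-up frame sizes. It also shows a caveat. I-frames are usually the largest frames, so the share of bytes encrypted is noticeably higher than the share of frames. `frames_to_encrypt` and `encrypted_share` are hypothetical names.

```python
# Hypothetical 30-frame GOP as (frame type, size in bytes); sizes are made up.
GOP = [("I", 60_000)] + [("P", 8_000)] * 14 + [("B", 3_000)] * 15


def frames_to_encrypt(gop: list[tuple[str, int]]) -> list[int]:
    """Indices of frames selected under an I-frame-only encryption policy."""
    return [i for i, (frame_type, _) in enumerate(gop) if frame_type == "I"]


def encrypted_share(gop: list[tuple[str, int]]) -> tuple[float, float]:
    """Return (share of frames encrypted, share of bytes encrypted)."""
    selected = set(frames_to_encrypt(gop))
    total_bytes = sum(size for _, size in gop)
    encrypted_bytes = sum(size for i, (_, size) in enumerate(gop) if i in selected)
    return len(selected) / len(gop), encrypted_bytes / total_bytes


frame_share, byte_share = encrypted_share(GOP)
print(f"{frame_share:.1%} of frames, {byte_share:.1%} of bytes")
```

With these made-up sizes, about 3% of the frames but roughly 28% of the bytes get encrypted.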

Security Levels and Blocking Playback of Certain Resolutions

Content Decryption can take place in software or hardware, and generally, hardware decryption is considered more secure because the operations occur in the Trusted Execution Environment or TEE. The TEE is defined in Wikipedia as “a secure area of a main processor that guarantees code and data loaded inside to be protected with respect to confidentiality and integrity.”

However, some devices (low-end, typically) cannot perform hardware decryption and decoding.

Content providers require a mechanism to conditionally allow/block playback on various devices. One straightforward way is to generate DRM licenses that specify which devices are allowed to play specific resolutions of the movie’s bitrate ladder.

E.g., Google’s Widevine defines three security levels – L1 (highest), L2, and L3 (lowest). Typically, devices with L3 security levels are blocked from playing HD resolutions. We’ll read more about this in a separate article on Google’s Widevine DRM.
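A license policy like this can be sketched as a filter over the bitrate ladder. The cap values are assumptions, since the text above says only that L3 devices are typically blocked from HD. `MAX_HEIGHT` and `allowed_renditions` are hypothetical names. In practice, the limits come from the content provider’s license policy.

```python
# Hypothetical policy: tallest rendition a license allows per Widevine security level.
MAX_HEIGHT = {"L1": 2160, "L2": 576, "L3": 576}  # L2/L3 capped at SD (assumption)


def allowed_renditions(ladder_heights: list[int], security_level: str) -> list[int]:
    """Keep only the rungs of the ladder that this security level may play."""
    cap = MAX_HEIGHT[security_level]
    return [h for h in ladder_heights if h <= cap]


ladder = [360, 540, 720, 1080, 2160]
print(allowed_renditions(ladder, "L3"))  # [360, 540]
```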

Conclusion

I hope you now understand how AES, EME, CDM, CENC, Keys & Key Servers form the building blocks of DRM.

Here are the rest of the articles in the Hitchhiker’s Guide to DRM series.

Thank you, and see you next time!


Krishna Rao Vijayanagar

Founder at OTTVerse

Krishna Rao Vijayanagar, Ph.D., is the Editor-in-Chief of OTTVerse, a news portal covering tech and business news in the OTT industry.

With extensive experience in video encoding, streaming, analytics, monetization, end-to-end streaming, and more, Krishna has held multiple leadership roles in R&D, Engineering, and Product at companies such as Harmonic Inc., MediaMelon, Airtel Digital, and Visionular Inc. Krishna has published numerous articles and research papers and speaks at industry events to share his insights and perspectives on the fundamentals and the future of OTT streaming.
