EME, CDM, AES, CENC, and Keys – The Essential Building Blocks of DRM


Anyone trying to understand DRM (Digital Rights Management) quickly runs into acronyms such as AES, CDM, CENC, and EME. They can be confusing for a newcomer, but you need them to understand how DRM works. In this article, let’s take a gentle tour of the building blocks of DRM: EME, CDM, AES, CENC, and the use of Keys & Key Servers.

Simplified Architecture of a DRM System

As we saw in the previous article, DRM is a combination of encryption and business rules to control access to and consumption of digital content.

Simply put, DRM is a system that

  • provides the tools and infrastructure to enable a content provider to encrypt their content, and
  • builds an ecosystem around the encrypted content so that the content provider can control who/what can decrypt and consume their content.

In the previous article of the series, we saw Ram and Shyam sending coded messages to each other. At the same time, Hari maintained the codebooks and decided who got to read/write the notes – remember?


Now, let’s take this simple system and replace it with the technology needed to secure and distribute video. What do we get?


Let’s describe what we have here. There is a movie that we want to send to an authenticated user securely.

So,

  1. we ask a DRM company’s server for a codebook to encrypt our video,
  2. we encrypt the video using that codebook,
  3. we send the movie to the user,
  4. the user asks the DRM company’s server for the codebook to unlock (decrypt) the video,
  5. and then the user watches the movie!

Fantastic!

Is this all there is to know about DRM for video?

Nope! What we have here is a simple, toy-example of how to transfer movies securely using DRM. It captures the essence of DRM perfectly but wouldn’t work well in the real world.

In the rest of this article, let’s take each piece of this simple system, re-think it, re-design it, and see how it fits within the world of video delivery and DRM, shall we?

Step 0: Let’s Move to Adaptive Bitrate Streaming

Before we go any further, let’s modify our example to suit the ABR (Adaptive BitRate) model of video delivery.

ABR Refresher: in ABR, a movie is encoded into different bitrate-resolution combinations (a.k.a. a ladder) and then split into chunks or segments. Each chunk represents a few seconds of video and is independently decodable.

“Packaging” refers to chunking or breaking up a movie into small pieces and describing it in a manifest or playlist document. When the user wants to play the movie, he needs to refer to this manifest.

Depending on the available bandwidth, the player requests a chunk/segment of a particular bitrate (rendition, or rung of the ladder) and a CDN (Content Delivery Network) responds with the requested chunk.
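The rendition choice described above can be sketched in a few lines of Python. The ladder values and the `pick_rendition` helper are hypothetical and chosen only for illustration; real players typically weigh other signals (such as buffer level) as well.

```python
# Hypothetical bitrate ladder: (label, bitrate in kbit/s), sorted low to high.
LADDER = [("360p", 800), ("720p", 2500), ("1080p", 5000)]

def pick_rendition(bandwidth_kbps, ladder=LADDER):
    """Return the highest rung whose bitrate fits the measured bandwidth.

    Falls back to the lowest rung when even that one does not fit.
    """
    fitting = [rung for rung in ladder if rung[1] <= bandwidth_kbps]
    return fitting[-1] if fitting else ladder[0]

print(pick_rendition(3000))  # ('720p', 2500)
print(pick_rendition(500))   # ('360p', 800)
```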

Popular methods of video delivery using ABR are MPEG DASH and HLS. For a deeper understanding, please refer to our articles on OTT and ABR video streaming.

Let’s change our block diagram to reflect ABR video delivery.


The only changes here are the packaging and CDN-based delivery steps. That’s all.

Okay, let’s move on and start with the encryption process.

Step 1: Video Encryption

The whole idea of encryption is to ensure that anyone who intercepts our data cannot read it or, in the case of video, watch it.

Encryption refresher: encryption is a technique used to keep data confidential and prevent unauthorized people from reading it. Encryption uses a “key” to convert input data (plaintext) into an alternate form called ciphertext. It is almost impossible to convert the ciphertext back to plaintext without the key.

Strictly speaking, decryption without the key is possible (for example, by trying every possible key), but encryption algorithms are designed to make such attacks extremely expensive in terms of time, money, and computing resources.

One of the most popular encryption techniques is the “Advanced Encryption Standard” or “AES” for short. It is based on the Rijndael cipher (named after its inventors, Joan Daemen and Vincent Rijmen) and was established by the U.S. National Institute of Standards and Technology (NIST) in 2001 to encrypt electronic data.

Some important points to remember about AES:

  • It’s a symmetric-key algorithm: encryption and decryption are performed using the same key.
  • It has three variants based on the key length: 128, 192, and 256 bits. The longer the key, the harder it is to crack.
  • Cracking AES-128 by brute force, without the key, would require a “billion times a billion years” and a supercomputer (source).
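To get a feel for the “billion times a billion years” figure, here is a back-of-the-envelope calculation. The guessing rate of one trillion keys per second is an assumption made purely for illustration; it is not a measured figure for any real machine.

```python
KEY_BITS = 128
GUESSES_PER_SECOND = 10**12        # assumed attacker speed, for illustration only
SECONDS_PER_YEAR = 365 * 24 * 3600

keyspace = 2**KEY_BITS             # number of possible AES-128 keys
# On average, a brute-force search finds the key after trying half the keyspace.
average_years = keyspace / 2 / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"{keyspace:.2e} possible keys")
print(f"about {average_years:.1e} years on average")
```

Under that assumption, the average search time comes out at a few billion billion years, which is the same order of magnitude as the figure quoted above.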

If you are interested in going deep into the AES standard, look at AES’s Wikipedia page. I am not an expert in cryptography and won’t be able to do justice to AES here.

Note: Please remember that encryption is not encoding, and decryption is not decoding in the video space. For videos, encoding and decoding are words used to refer to compression and decompression, respectively. To learn more about encoding, decoding, and video codecs, please read our articles on the need for compression and a simple introduction to video codecs.

Is AES-128 The Only Encryption Technique?

No, it isn’t, and let’s think about the implication of this for a minute.

If a content provider decides to engage with three different DRM companies, and all three use different encryption techniques, then it means that the content provider needs to encrypt their videos three times, resulting in a waste of storage space and other resources.

That is why the CENC specification came into being – to reduce this encryption-driven fragmentation of the market and to reduce storage requirements.

Let’s learn about this next.

CENC or Common Encryption

Actually, before we dive into CENC, let’s step back and take a look at the state of OTT streaming protocols and CMAF in particular.

There are primarily two protocols in use today – MPEG-DASH and HLS. There are others, such as MSS (Microsoft Smooth Streaming) and HDS, but we’ll leave them aside for this discussion.

MPEG-DASH uses the mp4 container format for its videos and HLS uses the MPEG-TS (ts) container for its files. If a content provider uses both MPEG-DASH and HLS, then they need to store a copy of their videos in both mp4 and ts file formats.

Now, let’s add the DRM encryption problem to it. If our three hypothetical DRM providers use three different encryption standards, then a content provider needs to store 2 × 3 = 6 copies of each video! What a waste of storage space!

To combat the first problem posed by video streaming protocols, the CMAF specification was created. It says that videos can be stored in the fragmented mp4 container format (fmp4). Because both MPEG-DASH and HLS support fmp4, you can now create only one set of videos, store it in fmp4 format, and use a common set of files for both protocols.

Just make sure you create two manifests (sigh!).
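The storage arithmetic above is simple enough to write down. In this sketch, the `copies_needed` helper and the three scheme names are hypothetical stand-ins for the three incompatible encryption standards in our example.

```python
def copies_needed(container_formats, encryption_schemes):
    """Each (container, encryption) pair needs its own stored copy."""
    return len(container_formats) * len(encryption_schemes)

# Hypothetical names standing in for three incompatible DRM encryptions.
three_schemes = ["scheme-a", "scheme-b", "scheme-c"]

before_cmaf = copies_needed(["mp4", "ts"], three_schemes)
after_cmaf = copies_needed(["fmp4"], three_schemes)
print(before_cmaf, after_cmaf)  # 6 3
```

CMAF halves the number of copies, but the encryption schemes still multiply the storage.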

How About Unifying the Encryption?

We still need to store multiple copies of each file if different DRM technologies use different encryption standards, right?

For this purpose, MPEG developed the CENC or Common Encryption specification, which specifies that videos can be encrypted using either cenc (AES-128 CTR) or cbcs (AES-128 CBC). CTR stands for Counter, and CBC stands for Cipher Block Chaining.
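To make “Counter” mode a little more concrete, here is a toy sketch of how a counter-based keystream works. Please note the big assumption: Python’s standard library has no AES, so SHA-256 stands in for the AES block cipher here. This is NOT AES, it is not secure, and it is not how any DRM system is implemented; it only shows the shape of the mode, where the same XOR operation both encrypts and decrypts.

```python
import hashlib

def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Stand-in for AES: a real implementation would encrypt nonce||counter
    # with the AES block cipher. SHA-256 is used only to stay stdlib-only.
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()[:16]

def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a counter-generated keystream, 16 bytes at a time."""
    out = bytearray()
    for offset in range(0, len(data), 16):
        block = data[offset:offset + 16]
        ks = keystream_block(key, nonce, offset // 16)
        out.extend(b ^ k for b, k in zip(block, ks))
    return bytes(out)

key, nonce = b"k" * 16, b"n" * 8          # toy values, never use fixed keys
plaintext = b"a few seconds of video"
ciphertext = ctr_xor(key, nonce, plaintext)
assert ctr_xor(key, nonce, ciphertext) == plaintext   # same call decrypts
```

Because each keystream block depends only on the key, the nonce, and a counter, any block can be decrypted without touching the ones before it.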

The implication of CENC is that a content provider needs to encrypt his videos only once and any decryption module can decrypt it. Note: Exposing the encryption algorithm is not a problem as long as the keys are strongly protected.

Well, CENC might sound like a magic wand for DRM-unification, but it is not.

There are three primary DRM technologies in the market – Apple FairPlay, Google Widevine, and Microsoft PlayReady.

  • Apple FairPlay supports only the AES-128 CBC cbcs mode.
  • HLS supports only the AES-128 CBC cbcs mode (irrespective of CMAF).
  • Widevine and PlayReady support both the AES-128 CTR cenc and AES-128 CBC cbcs modes.
  • MPEG-DASH with CMAF supports both the AES-128 CTR cenc and AES-128 CBC cbcs modes.
  • MPEG-DASH without CMAF supports only the AES-128 CTR cenc mode.
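The list above can be turned into a small lookup to see which mode a single encrypted copy would need to use. The `DRM_MODES` table simply restates the bullets; the `common_modes` helper is a hypothetical name used for illustration.

```python
# Supported encryption modes per DRM, as listed above.
DRM_MODES = {
    "FairPlay": {"cbcs"},
    "Widevine": {"cenc", "cbcs"},
    "PlayReady": {"cenc", "cbcs"},
}

def common_modes(drm_names):
    """Modes that every one of the named DRM systems can decrypt."""
    return set.intersection(*(DRM_MODES[name] for name in drm_names))

print(common_modes(["FairPlay", "Widevine", "PlayReady"]))  # {'cbcs'}
print(common_modes(["Widevine", "PlayReady"]))
```

With all three DRMs in play, only cbcs survives the intersection.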

As you can see, the CMAF and CENC specs have led to confusion and fragmentation in the streaming space.

A possible convergence point is the universal use of CMAF and the AES-128 CBC cbcs mode, but how will this impact legacy devices that support only CTR or only MPEG-TS?

That’s a discussion for another time.

Step 2: Key, KeyID, and the License Server

By now, we have established that we will be encrypting our videos using AES-128 encryption. At this stage, a few questions come up –

  1. Where do we get the AES-128 Encryption Keys?
  2. How do we associate an Encryption Key with a movie?
  3. Where do we store the Encryption Keys?

Let’s answer them one at a time.

Where do we get the AES-128 bit encryption keys?

Any content provider can generate the encryption keys manually using specialized software. Alternatively, several DRM vendors provide the necessary tools and software to generate these keys.

How do we associate an encryption key with a movie?

Let’s understand the “why” first. When you go to a hotel, you ask the receptionist for the keys to a particular room by mentioning the room number – right? You’re providing the association here between a key and a room by telling her the room number.

Similarly, when we encrypt a movie with a particular key, we need to create that association and provide that to the DRM license server (our receptionist, if you will).

In DRM, a “KeyID” provides the association between an encryption key and a movie. It is a unique string of characters generated at the time of creating an encryption key for a particular movie.
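As a minimal sketch of what generating a key and its KeyID could look like, the snippet below uses Python’s standard library. The `new_content_key` helper is hypothetical, and representing the KeyID as a random UUID is an assumption made for this example; DRM vendors’ tools have their own interfaces.

```python
import secrets
import uuid

def new_content_key():
    """Generate a random 128-bit key and a KeyID to look it up by."""
    key = secrets.token_bytes(16)   # 16 bytes = 128 bits
    key_id = uuid.uuid4()           # assumed KeyID format: a random UUID
    return key_id, key

key_id, key = new_content_key()
print(key_id, key.hex())
```

The key itself must stay secret; only the KeyID travels with the movie.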

And finally,

Where do we store the Encryption Key & its KeyID?

The Encryption Key and the KeyID are stored in a secure server (Key Store) that works alongside a DRM license server.

When a client needs to play an encrypted movie, it asks the DRM license server for the decryption key by providing that particular movie’s KeyID. If the DRM license server is satisfied that the request is authentic, it will ask the Key Store to provide the decryption key associated with that KeyID.

Bonus Question: How is the KeyID transmitted to the player?

Rationale: without the KeyID, the license server can’t look up a movie’s decryption key.

Answer: the KeyID is sent along with the DASH or HLS manifest to the video player. The player parses the manifest, finds the KeyID, and asks the DRM License Server for the decryption key associated with that KeyID.
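For DASH, the KeyID can appear in the manifest as a `cenc:default_KID` attribute on a `ContentProtection` element. Here is a sketch of how a player might pull it out. The MPD fragment is hypothetical and heavily trimmed, the KeyID value is made up, and `find_key_ids` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed MPD fragment for illustration.
MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:cenc="urn:mpeg:cenc:2013">
  <Period><AdaptationSet>
    <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011"
        value="cenc" cenc:default_KID="9eb4050d-e44b-4802-932e-27d75083e266"/>
  </AdaptationSet></Period>
</MPD>"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}
KID_ATTR = "{urn:mpeg:cenc:2013}default_KID"   # namespaced attribute name

def find_key_ids(mpd_text):
    """Return every default_KID found on ContentProtection elements."""
    root = ET.fromstring(mpd_text)
    elements = root.findall(".//mpd:ContentProtection", NS)
    return [el.get(KID_ATTR) for el in elements if el.get(KID_ATTR)]

print(find_key_ids(MPD))
```

The player would then include this KeyID in its license request.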

To summarize the discussion around Encryption Keys, KeyIDs, and the License Server –

  • The Encryption Key is “private”: it must be kept secret and stored in a secure key store along with its associated KeyID.
  • The KeyID can be made “public”.
  • Anyone with the KeyID can ask the License Server for the decryption key. Because AES is symmetric, this is the same key that was used for encryption. It is up to the DRM provider to authenticate the requester and then supply or deny the key.
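The summary above can be sketched as a tiny in-memory model. Everything here is hypothetical (the `KEY_STORE` dictionary, the `AUTHORIZED` set, and both functions); a real key store and license server are separate, hardened services, and the key is never returned in the clear like this.

```python
import secrets
import uuid

KEY_STORE = {}                      # KeyID -> content key (kept server-side)
AUTHORIZED = set()                  # (user, KeyID) pairs allowed to play

def register_movie():
    """Create a key for a new movie and return its public KeyID."""
    key_id, key = uuid.uuid4(), secrets.token_bytes(16)
    KEY_STORE[key_id] = key
    return key_id

def request_license(user, key_id):
    """Return the key only if this user is entitled to this KeyID."""
    if (user, key_id) not in AUTHORIZED:
        return None                 # request denied
    return KEY_STORE.get(key_id)

movie_kid = register_movie()
AUTHORIZED.add(("ram", movie_kid))
print(request_license("ram", movie_kid) is not None)    # True
print(request_license("shyam", movie_kid) is not None)  # False
```

Knowing the KeyID alone is not enough: Shyam knows it too, but the server refuses him.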

Here’s a block diagram of what we just learned on Keys, Encryption, and License Servers.


Step 3: Decrypting Video At The Player and the Key Server

At the client-side (player application), the user presses play on the movie he wants to watch. Now, the video player needs a way to recognize if the movie is encrypted or not.

Otherwise, it will try to play back an encrypted movie, crash, and cause a horrible user experience.

Signaling that a movie is encrypted can be accomplished in several ways.

  • You could add a note in the manifest that the movie is encrypted and also provide its KeyID.
  • Another way is to insert a few bytes of unique information into the video bitstream. When the player examines the bitstream before playing back, it can catch this unique information and realize that it is encrypted.

The next few steps at the player are straightforward.

  1. The player finds the KeyID and requests the license server for the decryption key.
  2. The license server uses pre-defined mechanisms to recognize if the player making the request is authentic or not.
  3. After the license server is satisfied with the player’s authenticity, it responds with the license & decryption key.

We’ve described a simple scheme, but it has many problems (technical and commercial). Here are some right off the bat.

  1. We’ve described a prototypical “player” that sends a request for the decryption keys to the DRM License Server. But,
    • How does the license server know if the player is trustworthy?
    • And, what if the decryption software in the player exposes the key and the decrypted content?
  2. Also, if you are a video player developer, do you have to develop decryption modules for every DRM technology? And, do you have to update it each time they make a change to their interfaces?

Furthermore, the sequence of events at the player (client-side) looks something like this –

  1. obtain the movie & its manifest from the CDN
  2. extract the KeyID from the manifest
  3. create the license request
  4. send the license request to the license server
  5. wait, listen, and receive the response from the license server.
  6. use the decryption key from the server to decrypt the content
  7. decode the decrypted content
  8. display the decoded movie

A single program or entity should NOT do all of the above.

It will result in a tightly coupled architecture and will prevent any attempts at open-ness and a plug-and-play ecosystem. Let’s see what can be done about it.

Player-Side Architecture

At the player-level, the responsibilities described earlier are divided across different modules as follows –

  1. The player takes care of obtaining the movie, parsing the manifest, extracting the KeyID, making the requests to the DRM License Server, etc.
  2. A separate module (called the CDM or Content Decryption Module) takes care of creating the license request, decrypting & decoding the content.

Now, let’s look at the CDM.

CDM or Content Decryption Module

Every DRM provider provides its own

  1. mechanism to create a license request (using the KeyID, device identifier, signing the request, etc.)
  2. mechanism to understand the license response received from the DRM License Server (the response is encrypted too) and extract the decryption key.
  3. rules around storing the license locally on the client, license renewal, expiry, etc.

Using those details, modules called CDMs (Content Decryption Modules) can be built into browsers such as Chrome, Firefox, Microsoft Edge, Safari, etc.

DRM vendors test and certify these CDMs to ensure that

  1. the license requests are formed correctly and as per specifications,
  2. they do not leak the decryption keys,
  3. they do not leak the decrypted and decoded movies,
  4. they securely store the decryption keys based on the license specifications (store the key for X days, for example), and
  5. they safely transport the video to the screen without leaking it.

For the above reasons, CDMs in browsers are closed-source, and this is a source of contention in the industry and among the public. Critics do not trust them because the public cannot inspect the CDM’s source code.

Note: Several browsers give you the option to turn off the CDM. But, if you do so, you won’t be able to watch any DRM-protected content. That’s the industry’s trade-off.

Here’s a screenshot of the Widevine plugin in Firefox’s plugin page (on my Ubuntu 20.04 machine).


Oh wait, there is another layer of abstraction that we haven’t discussed yet.

EME or Encrypted Media Extensions

We saw in the previous section that the player applications need to talk to the CDM in the browser and with the License Servers to exchange license information, right?

This is both a technical and a business problem. Why?

  • player vendors need to integrate with all the different license servers & CDMs and keep track of the changes to their interfaces to stay up-to-date.
  • if a player company says that it doesn’t support some popular “XYZ” platform because of XYZ’s frequently-changing interfaces, then it’s highly likely that nobody will buy its players. Not good!

That gave rise to a layer that sits between the players and the CDMs called the EME or Encrypted Media Extensions. The EME provides a standardized set of APIs for players (apps) to communicate with the CDMs.


Let’s now understand how EME and CDMs work together –

  • Encrypted Media Extensions (EME) is a JavaScript API.
  • Content Decryption Module (CDM) is software that decrypts and, optionally, decodes and displays the video.
  • The video player is a JavaScript program that uses the EME APIs to transmit messages between the CDM and the License Server.
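The message relay described above can be modelled in a few lines. EME itself is a JavaScript API, so this Python sketch is only a model of the flow, not EME code: `ToyCDM`, `toy_license_server`, and `play` are hypothetical, and the two CDM methods are loosely named after EME’s `generateRequest()` and `update()` calls on a key session.

```python
class ToyCDM:
    """Stand-in for a CDM: builds requests and keeps keys to itself."""
    def __init__(self):
        self._keys = {}                       # never exposed to the player

    def generate_request(self, key_id):       # cf. EME's generateRequest()
        return {"type": "license-request", "key_id": key_id}

    def update(self, key_id, response):       # cf. EME's update()
        self._keys[key_id] = response["key"]

    def can_decrypt(self, key_id):
        return key_id in self._keys

def toy_license_server(request):
    # Hypothetical server that grants every request with a dummy key.
    return {"key": b"\x00" * 16}

def play(cdm, key_id, send_to_server):
    """The player only relays messages between the CDM and the server."""
    request = cdm.generate_request(key_id)
    response = send_to_server(request)
    cdm.update(key_id, response)
    return cdm.can_decrypt(key_id)

print(play(ToyCDM(), "kid-1", toy_license_server))  # True
```

Notice that the `play` function never looks inside the request or the response. In a real system those messages are opaque to the player, and only the CDM and the license server can read them.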

An advantage of using EME is that content providers and player vendors can now develop streaming services that can be viewed on different browsers, thanks to the interoperability that EME introduces. You can develop an app that uses the EME spec to talk to the license server and the CDM – irrespective of the DRM platform or browser (CDM) being used.

For more information, see the EME specification.

Video Decoding and Display

After decrypting a video, it needs to be decoded and displayed to the user without exposing the decrypted, decoded, or the raw frames. The CDM (Content Decryption Module) plays a vital role in preventing data leaks because it is the first point of contact for/with the decrypted data.

When it comes to video playback, a CDM can either

  • decrypt the movie and hand over the bitstream to the application (not very secure because someone can hack the app to dump the video)
  • decrypt, decode, and pass on the decoded frames of video to the platform’s display engine.
  • decrypt, decode, and display the video by itself (most secure)

The process can take place either in software or in the device’s hardware (more secure).

Putting everything together on the player/client-side, we get the following block diagram.


Our prototype DRM system is ready.

But it is missing a few critical features that make it attractive to content providers.

Step 4: Authentication, License Rotation, and Supporting Offline Playback

At this stage, I want to distinguish between core DRM technology providers (such as Apple, Google, and Microsoft) and DRM vendors that provide services around those technologies. In this section, let’s look at a few of the business rules expected in the industry when it comes to DRM – these could be offered directly by the DRM technology provider or by a DRM vendor.

User Authentication

DRM technologies such as FairPlay, Widevine, and PlayReady do not offer User Authentication services. However, DRM vendors can! When the user hits “Play,” a separate server authenticates the user’s credentials (e.g., customerID). It checks whether the user is authorized to play that content based on subscription levels, promo codes, etc. After this server authenticates the user, the app can make a license request to the license server. Note: This is a gross simplification of the workflow, and professional DRM vendors have more sophisticated workflows for authentication.

Geo-blocking

Geo-blocking is used when content providers want to block the playback of a movie in certain countries. Similar to User Authentication, this is an add-on service offered by most DRM vendors. When the user hits “Play” on a particular movie, the DRM vendor’s servers can check if the movie can be watched in the user’s location. Based on the content provider’s rules, the license and decryption key are either sent or denied to the client.
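The authentication and geo-blocking checks can be pictured as a gate in front of the license server. This is a simplified sketch: the subscriber table, the blocked-country table, the placeholder country code “XX”, and the `may_issue_license` function are all hypothetical.

```python
# Hypothetical entitlement data a DRM vendor's backend might hold.
SUBSCRIBERS = {"ram": "premium", "shyam": "free"}
BLOCKED_COUNTRIES = {"movie-1": {"XX"}}      # "XX" is a placeholder country code

def may_issue_license(user, movie, country):
    """Apply the business rules before any key leaves the license server."""
    if user not in SUBSCRIBERS:
        return False                          # unknown user
    if country in BLOCKED_COUNTRIES.get(movie, set()):
        return False                          # geo-blocked
    return True

print(may_issue_license("ram", "movie-1", "IN"))   # True
print(may_issue_license("ram", "movie-1", "XX"))   # False
print(may_issue_license("hari", "movie-1", "IN"))  # False
```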

Persistent and Non-Persistent Licenses

As the name suggests, a Persistent License can be stored on the client device after being received from the license server. It can be used to play back the movie(s) until the expiration time mentioned in the license has been reached. To keep playing beyond that time, the CDM needs to make a license renewal request before the license expires.

Non-persistent licenses are used for immediate playback of a movie. They are not to be stored for extended periods. They are generally discarded once the current playback session ends, or in the middle of a session if the license policy has a short expiry time.
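Here is a small sketch of how a client might decide whether a stored license is still usable. The `License` fields and the `usable` function are hypothetical; each DRM system defines its own license format and rules.

```python
from dataclasses import dataclass

@dataclass
class License:
    key_id: str
    expires_at: float      # seconds since the epoch
    persistent: bool       # may the client keep it across sessions?

def usable(lic, now, new_session):
    """A license works until it expires; only persistent ones survive a new session."""
    if now >= lic.expires_at:
        return False
    return lic.persistent or not new_session

rental = License("kid-1", expires_at=1000.0, persistent=True)
stream = License("kid-2", expires_at=1000.0, persistent=False)
print(usable(rental, now=500.0, new_session=True))   # True
print(usable(stream, now=500.0, new_session=True))   # False
print(usable(rental, now=1500.0, new_session=True))  # False
```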

Key Rotation

Key Rotation involves encrypting different sections (or segments) of a movie with different keys in order to mitigate attacks. Suppose a hacker obtains the key for a movie. In that case, it might allow him to watch only a small section of the movie if the following sections use different keys. Additionally, you can associate different licensing rules for different sections of the content by using multiple keys. For example, an “exclusive behind-the-scenes” section of a movie can be shown to premium subscribers only, while all free subscribers can watch the rest of the movie.
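To see how rotation limits the damage of a leaked key, consider this sketch. The rotation period of ten segments and both helper functions are assumptions made for illustration.

```python
SEGMENTS_PER_KEY = 10    # assumed rotation period, chosen for illustration

def key_index_for_segment(segment_number):
    """Segments 0-9 use key 0, segments 10-19 use key 1, and so on."""
    return segment_number // SEGMENTS_PER_KEY

def segments_exposed(leaked_key_index, total_segments):
    """List the segments an attacker can decrypt with one leaked key."""
    return [s for s in range(total_segments)
            if key_index_for_segment(s) == leaked_key_index]

print(segments_exposed(1, 100))  # segments 10 through 19 only
```

With a single key for the whole movie, the same leak would expose all 100 segments.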

Offline Playback

Some services let you play back videos when an internet connection is unavailable – dubbed “offline playback.” I’ve downloaded several movies on Netflix onto my phone when I knew I was getting onto a long-haul flight. In such situations, the player can’t contact the license server to get the DRM keys.

So, the DRM provider needs to provide an option to store the keys securely on the device so that the content can be unlocked and played back even when an internet connection is unavailable. A highly secure CDM implementation is needed to prevent the keys from leaking out.

Optimized Encryption of Video

Encrypting & decrypting movies can get expensive, especially for UHD (4K) movies, and there is a need for optimized encryption. One such optimization is to encrypt only the Intra Frames (keyframes / I-frames / IDR frames) of every video segment. This optimization has several advantages –

  • Encryption is faster because Intra-frames make up a tiny proportion of the total number of frames in a movie.
  • Only after decoding the Intra-frame can its dependent frames (i.e., frames that depend on the I-frame) be decoded. Hence, the movie is rendered useless without decodable Intra-frames.
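A quick sketch shows how few frames this optimization touches. The frame pattern below is a hypothetical segment invented for this example, and `frames_to_encrypt` is an illustrative name.

```python
# Hypothetical segment: one I-frame followed by P- and B-frames.
FRAME_TYPES = ["I"] + ["P", "B", "B"] * 16   # 49 frames in total

def frames_to_encrypt(frame_types):
    """Return the indices of intra frames, the only ones we encrypt."""
    return [i for i, t in enumerate(frame_types) if t == "I"]

selected = frames_to_encrypt(FRAME_TYPES)
print(f"{len(selected)} of {len(FRAME_TYPES)} frames encrypted")
```

Keep in mind that I-frames are usually much larger than other frames, so the saving measured in bytes is smaller than the frame count suggests.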

An example of this kind of partial encryption is SAMPLE-AES in Apple FairPlay, which encrypts only selected samples of every media segment instead of the entire segment. We’ll read about it in the article on Apple’s FairPlay DRM.

Security Levels and Blocking Playback of Certain Resolutions

Content decryption can take place in software or hardware. Hardware decryption is generally considered more secure because the operations take place in the Trusted Execution Environment or TEE. Wikipedia defines the TEE as “a secure area of a main processor that guarantees code and data loaded inside to be protected with respect to confidentiality and integrity”.

However, some devices (low end, typically) cannot perform hardware decryption and decoding.

Content providers require a mechanism to conditionally allow/block playback on a wide variety of devices. One straightforward way to do so is to generate DRM licenses that specify which devices are allowed to play certain resolutions of the movie’s bitrate ladder.

For example, Google’s Widevine defines three security levels – L1 (highest), L2, and L3 (lowest). Typically, devices with the L3 security level are blocked from playing HD resolutions. We’ll read more about this in a separate article on Google’s Widevine DRM.
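Such a policy boils down to filtering the bitrate ladder by security level. In the sketch below, the height caps in `MAX_HEIGHT` are illustrative assumptions (content providers set their own rules), and `allowed_renditions` is a hypothetical helper.

```python
# Assumed policy: maximum playable height per Widevine security level.
MAX_HEIGHT = {"L1": 2160, "L2": 480, "L3": 480}   # illustrative values only

def allowed_renditions(security_level, ladder_heights):
    """Filter the ladder down to the rungs this device may play."""
    cap = MAX_HEIGHT[security_level]
    return [h for h in ladder_heights if h <= cap]

print(allowed_renditions("L3", [360, 480, 720, 1080]))  # [360, 480]
print(allowed_renditions("L1", [360, 480, 720, 1080]))  # [360, 480, 720, 1080]
```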

Conclusion

I hope you understood how AES, EME, CDM, CENC, Keys & Key Servers form the building blocks of DRM.

Here are the rest of the articles in the Hitchhiker’s Guide to DRM series.

Thank you, and see you next time!


The Hitchhiker’s Guide to DRM is sponsored by BuyDRM™


About The Author

I’m Dr. Krishna Rao Vijayanagar, and I am the Founder and Editor of OTTVerse.com. I've spent several years working hands-on with Video Codecs (AVC, HEVC, MultiView Plus Depth), ABR streaming, and Video Analytics (QoE, Content & Audience, and Ad). I hope to use my experience and love for video streaming to bring you information and insights into the OTT universe. Please use the Contact Page to get in touch with me.
