Video Engineering for OTT - A 10K Foot View

If you search Google for ‘what is video encoding’, the top result is from Mux, which explains it as “Video encoding is a process for making video files smaller through compression.” That is probably the most unarguable definition of the term “encoding.”

However, look at the technology mind-map of any OTT company, and you will see a node named “Encoding” with at least 5-6 sub-nodes, most of which are not apparent from the above definition.


For example, here’s a list of technologies covered under the ‘Encoding’ umbrella in a typical OTT company.

Source Video QC
Thumbnail Extraction
Video Compression
Packaging to ABR formats
Packaging for Encryption and DRM
Subtitle Packaging
QoS and QoE
Content Management System

The point is that it is an industry norm to refer to:

everything done with the video files as Encoding,
the system responsible for this as the Encoder,
and the team owning this system as the Encoder Team.

I am not a great fan of this practice. In the organizations where I have worked, I have tried to introduce a better nomenclature: Video Engineering, Video Engineering System, and Video Engineering Team.

Isn’t that more intuitive?

And then there is the Content Management System. The CMS and the Video System share most of their responsibilities, so this article would be incomplete without a brief discussion of the CMS. But let’s save the detailed deep-dive into the CMS for a dedicated article.

For now, let us discuss each of the above-listed topics and understand them better – how they are implemented and the key challenges and tradeoffs in a typical OTT context.


Source Video QC

The business model shapes how and where the video content for an OTT service comes from. The complexities and challenges involved in Source Video Quality Control (QC) depend on how much control the organization has over the mezzanine video files ingested into the system. It is a widespread practice to have a contract between the Video Engineering team and the Content Providers that specifies the source video’s properties and quality constraints.

Usually, it is a good idea to reject content that does not comply with these constraints, which obligates the Content Providers to deliver contract-compliant source videos. However, in rare (and unfortunate) cases where the OTT service lacks negotiating power with the content owners, we try to fix the issues ourselves. We are not always successful, though.

Let us discuss a few artifacts that are common in source videos.

Letter Box and Pillar Box

Description: Letter Box – Black bars on top and bottom of the Video. Pillar Box – Black bars on the left and right side of the Video.
Root Cause: Usually seen in old library content that was processed to adjust its aspect ratio and adapt it to different screen dimensions.

Low-Resolution Video

Description: The source video’s resolution is below the minimum accepted for the title. That minimum can vary from content to content: we might want to ingest only Full HD for some titles while accepting 720p for others.

Upscaled Video

Description: The actual video was 720p, but it was upscaled to 1080p to comply with the contract. Upscaling adds pixels but not detail, so the whole purpose of having a contract is defeated.
Root Cause: This has to be intentional: an attempt by the Content Provider to sell non-compliant content to the OTT service provider as quality-compliant.

Broken Upload

Description: The source video is 1.4 GB, but the OTT service received only 1.1 GB.
Root Cause: An incomplete copy operation or a network interruption during transfer left the received file truncated.
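A truncated file like this can be caught cheaply, before any decoding, by comparing the received file against a byte count and checksum declared by the Content Provider. The sketch below is a minimal illustration that assumes the provider sends the expected size and a SHA-256 digest alongside each file; that delivery arrangement and the function names are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-GB mezzanine files never load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_upload(path: str, expected_size: int, expected_sha256: str) -> list[str]:
    """Return a list of problems; an empty list means the upload looks complete."""
    problems = []
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        # A short file is the classic symptom of an interrupted copy or transfer.
        problems.append(f"size mismatch: got {actual_size} bytes, expected {expected_size}")
    if sha256_of(path) != expected_sha256.lower():
        problems.append("checksum mismatch")
    return problems


if __name__ == "__main__":
    # Simulate a truncated upload: only the first 1000 of 1500 bytes arrive.
    full = b"mezzanine-bytes" * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(full[:1000])
    print(verify_upload(tmp.name, len(full), hashlib.sha256(full).hexdigest()))
    os.remove(tmp.name)
```

Checking the size first is nearly free; the checksum also catches corruption that leaves the size intact.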

Unsupported Codecs or Containers

Description: The spectrum of codecs and containers is vast. Ideally, we should support all standard-compliant formats, but we will always have some limitations.

There are many more such artifacts, and they merit a dedicated article. I am sharing below the list of parameters that I compiled for one such QC activity.

Detecting source video QC issues ranges from very simple to very complex and resource-, CPU-, and time-intensive. Some issues can be detected by parsing the headers, while others require complex analysis that includes decoding the video.
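For the header-parsing category, ffprobe (part of FFmpeg) can dump a file’s container and stream properties as JSON. The sketch below is a minimal, hypothetical QC step built on that output: it checks for the low-resolution and unsupported codec/container artifacts described above. The accepted codec and container sets are placeholders, not a recommendation; in practice they come from the Content Provider contract.

```python
import json
import subprocess

# Hypothetical ingest policy; real values come from the Content Provider contract.
SUPPORTED_VIDEO_CODECS = {"h264", "hevc", "prores", "mpeg2video"}
SUPPORTED_CONTAINERS = {"mov", "mp4", "matroska", "mxf"}


def probe(path: str) -> dict:
    """Run ffprobe and return its JSON description of the container and streams."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def header_qc(info: dict, min_height: int) -> list[str]:
    """Header-only checks: no decoding, so they are cheap enough to run on every ingest."""
    issues = []
    fmt = info.get("format", {}).get("format_name", "")
    # ffprobe lists every matching demuxer name, e.g. "mov,mp4,m4a,3gp,3g2,mj2".
    if not set(fmt.split(",")) & SUPPORTED_CONTAINERS:
        issues.append(f"unsupported container: {fmt}")
    videos = [s for s in info.get("streams", []) if s.get("codec_type") == "video"]
    if not videos:
        return issues + ["no video stream"]
    v = videos[0]
    if v.get("codec_name") not in SUPPORTED_VIDEO_CODECS:
        issues.append(f"unsupported video codec: {v.get('codec_name')}")
    if v.get("height", 0) < min_height:
        issues.append(f"low resolution: {v.get('width')}x{v.get('height')} < {min_height}p")
    return issues


if __name__ == "__main__":
    # Hand-written sample in ffprobe's JSON shape (in production: header_qc(probe(path), 1080)).
    sample = {
        "format": {"format_name": "mov,mp4,m4a,3gp,3g2,mj2"},
        "streams": [{"codec_type": "video", "codec_name": "h264", "width": 1280, "height": 720}],
    }
    print(header_qc(sample, min_height=1080))  # ['low resolution: 1280x720 < 1080p']
```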

For some issues (e.g., upscaling), it is more efficient to perform QC manually (subjective QC). Some issues only affect the consumer’s QoE, while others (e.g., a stream that cannot be decoded) are blockers that prevent us from encoding and publishing the video.

The same applies to fixing these issues. For example, low resolution cannot be fixed, as there is no simple or viable way to increase a video’s true resolution. In such cases, we have to choose between going ahead with compromised quality and declining the business. In contrast, issues like pillarboxing and letterboxing can be fixed by cropping out the black bars with open-source tools like FFmpeg.
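The detection half of that fix can be sketched as follows. This is a simplified, single-frame illustration that assumes the frame has already been decoded into rows of 8-bit luma values; the function names and the black threshold of 24 are hypothetical. FFmpeg’s own `cropdetect` filter performs this analysis across many frames, which is more robust because a single dark scene can look like a black bar.

```python
def content_box(frame: list[list[int]], black_threshold: int = 24) -> tuple[int, int, int, int]:
    """Return (x, y, w, h) of the picture area inside any black bars of a luma frame."""
    height, width = len(frame), len(frame[0])

    def row_is_black(r: int) -> bool:
        return sum(frame[r]) / width <= black_threshold

    # Letterbox: walk inwards from the top and bottom edges while rows are black.
    top, bottom = 0, height - 1
    while top < bottom and row_is_black(top):
        top += 1
    while bottom > top and row_is_black(bottom):
        bottom -= 1

    def col_is_black(c: int) -> bool:
        # Only average rows inside the letterbox so the bars don't skew the result.
        rows = range(top, bottom + 1)
        return sum(frame[r][c] for r in rows) / len(rows) <= black_threshold

    # Pillarbox: the same walk from the left and right edges.
    left, right = 0, width - 1
    while left < right and col_is_black(left):
        left += 1
    while right > left and col_is_black(right):
        right -= 1
    return left, top, right - left + 1, bottom - top + 1


def crop_filter(x: int, y: int, w: int, h: int) -> str:
    """Build an FFmpeg crop filter string, snapped to even values for 4:2:0 video."""
    x_even, y_even = x + x % 2, y + y % 2   # round offsets up so we stay inside the picture
    w_even = (w - (x_even - x)) // 2 * 2    # then shrink width/height to even values
    h_even = (h - (y_even - y)) // 2 * 2
    return f"crop={w_even}:{h_even}:{x_even}:{y_even}"


if __name__ == "__main__":
    # Synthetic 16x10 frame with 2 black rows at the top and bottom (letterbox).
    frame = [[0] * 16] * 2 + [[200] * 16] * 6 + [[0] * 16] * 2
    print(crop_filter(*content_box(frame)))  # crop=16:6:0:2
```

The returned string is in the `crop=w:h:x:y` form accepted by FFmpeg’s `crop` video filter, so it can be passed as `-vf crop=16:6:0:2`.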

There are various tools – some open source like FFmpeg, ffprobe, MediaInfo, HandBrake, etc., and other proprietary tools such as Baton, Rhozet, Telestream, etc. However, a detailed discussion of these tools is out of the scope of this article.

Thumbnail Extraction

For obvious reasons, OTT videos require thumbnails.

There are two types of thumbnails. The creative teams generate the first set, and the second set is auto-generated from the video’s content.

For example, consider a web series. We need one set of thumbnail(s) that will represent the entire web series. And then, we require a thumbnail for each episode of the web series.

The first set is created manually by the creative teams and involves special photography and image editing, while the second set is generated by algorithms that capture the best frames from the episode.

Let’s focus on the auto-generated thumbnails. The algorithms used range from simple ones like face detection to very complex ones based on Deep Neural Networks.

Below are some example conditions that the frames should meet to qualify for use as thumbnails.

Faces should be at the center of the frame.
There should be no text in the frame.
Frames should have clear edges, and blurry frames cannot be used.
Frames should have less than 50% black area.

The algorithm should extract a set of frames matching the above criteria. The Programming Team then chooses one of these frames as the thumbnail.
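Two of these criteria, the black-area limit and the no-blur rule, can be computed directly from pixel values. The sketch below is a minimal illustration assuming frames are decoded to 8-bit grayscale rows; it uses the variance of the Laplacian as a sharpness score, which is one common heuristic rather than the method any particular team uses. The sharpness threshold and the luma level treated as black are hypothetical, and the face and text criteria are left out because they need trained detectors.

```python
def black_fraction(frame: list[list[int]], black_threshold: int = 24) -> float:
    """Share of pixels dark enough to count as black."""
    pixels = [p for row in frame for p in row]
    return sum(p <= black_threshold for p in pixels) / len(pixels)


def laplacian_variance(frame: list[list[int]]) -> float:
    """Variance of the 4-neighbour Laplacian; blurry frames have few edges and score low."""
    h, w = len(frame), len(frame[0])
    responses = [
        frame[r - 1][c] + frame[r + 1][c] + frame[r][c - 1] + frame[r][c + 1] - 4 * frame[r][c]
        for r in range(1, h - 1)
        for c in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((v - mean) ** 2 for v in responses) / len(responses)


def is_thumbnail_candidate(frame: list[list[int]], max_black: float = 0.5,
                           min_sharpness: float = 100.0) -> bool:
    """Apply the 'less than 50% black' and 'no blurry frames' rules."""
    return black_fraction(frame) < max_black and laplacian_variance(frame) >= min_sharpness


if __name__ == "__main__":
    flat = [[128] * 8 for _ in range(8)]  # no edges at all, like a badly blurred frame
    checker = [[60 if (r + c) % 2 else 200 for c in range(8)] for r in range(8)]
    print(is_thumbnail_candidate(flat), is_thumbnail_candidate(checker))  # False True
```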

Refer to this discussion for useful insights into thumbnail extraction using simple tools and algorithms.

Video Compression

Video compression is at the core of the video system. It is the component with the highest ROI and the most scope for building your own IP, which makes it the most critical one for any OTT company to get right.

Build it or Buy it: just make sure you are on par with your competitors, if not better.

Let’s talk about the ‘goals’ and ‘expectations’ of the video engineering team in an OTT company related to video compression.

Use a codec supported on all platforms – iOS, Android, Chromium, Firefox, Safari, Edge. Among the codecs out there, H.264/AVC wins hands down in terms of versatility, and it is expected to stay so for at least five more years.
Compress the stream to as low a bitrate as possible while maintaining acceptable quality. The bitrates will depend on the codec used, content complexity, target screen size, and resolution. For example: H.264/AVC at a target bitrate of 3.5 Mbps for high-motion pop songs watched at 1080p on a mobile device with a 6-inch screen.
Multiple renditions with different resolutions must be generated to support varying network conditions. It is an industry practice to create resolutions ranging from 144p to 1080p (and 4K if required).
H.264/AVC is a must. We might want to generate VP9 and/or HEVC as well.
These codecs offer the following benefits:
- Lower CDN egress costs, by up to 25-30%.
- Delivery of HD content even under (relatively) weaker network conditions.
- Lower network costs for consumers, by 25-30%.
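The rendition ladder from 144p to 1080p mentioned above can be derived from the target heights alone. The sketch below is a minimal, hypothetical helper: it computes 16:9 widths rounded to even numbers and builds an FFmpeg/libx264 command for one rendition. The bitrate is left to the caller (the 3.5 Mbps 1080p figure above is just one data point), and the 48-frame GOP, i.e. 2 seconds at 24 fps, is a placeholder.

```python
def ladder_resolutions(heights=(144, 240, 360, 480, 720, 1080), aspect=(16, 9)):
    """(width, height) per rung, width rounded to the nearest even number for 4:2:0 chroma."""
    num, den = aspect
    return [(2 * round(h * num / (2 * den)), h) for h in heights]


def x264_command(src: str, out: str, width: int, height: int,
                 bitrate_kbps: int, gop_frames: int = 48) -> list[str]:
    """FFmpeg arguments for one H.264/AVC rendition of the ladder."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale={width}:{height}",
        "-c:v", "libx264",
        "-b:v", f"{bitrate_kbps}k",
        "-maxrate", f"{bitrate_kbps}k",
        "-bufsize", f"{2 * bitrate_kbps}k",
        # Fixed keyframe interval and no scene-cut keyframes, so segment boundaries
        # line up across renditions and a player can switch cleanly between them.
        "-g", str(gop_frames), "-keyint_min", str(gop_frames), "-sc_threshold", "0",
        "-c:a", "aac",
        out,
    ]


if __name__ == "__main__":
    print(ladder_resolutions())
    # [(256, 144), (426, 240), (640, 360), (854, 480), (1280, 720), (1920, 1080)]
    print(" ".join(x264_command("master.mov", "out_1080p.mp4", 1920, 1080, 3500)))
```

Running one such command per rung produces the renditions; VP9 or HEVC renditions would follow the same pattern with a different encoder.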

However, since you are generating multiple formats and renditions, there will be additional costs involved.

CPU costs for encoding the second format
Storage and CDN end-point caching costs.
Uncertainties involved in the HEVC Patent Pool and associated royalties.

Platform support for the HEVC and VP9 codecs is also uneven, so you need to understand your users and their devices before adopting these codecs.

Apart from this, there’s a lot of buzz in the market regarding the next-generation codecs.

AV1: Backed by big names like Microsoft, Apple, Google, etc. It is a royalty-free codec and claims bitrate savings of up to 50% over the HEVC and VP9 codecs. Netflix is already streaming in this format.
VVC: The new kid in the MPEG family. Claims 10-12% bitrate savings compared to AV1, but is still in the early stages of adoption. Click here for an interview with the HHI team who are developing the VVC codec.
LCEVC: A low-complexity enhancement codec that promises high gains; a very novel idea, still in its nascent stages. Click here to read an overview of LCEVC and a comparison of LCEVC with H.264/AVC.

As mentioned, there is a lot of buzz in the encoding community, and no one is sure about the future of these codecs. As an OTT company, you would want to invest in them now only if:

You have enough consumption to recover your expenses (engineering, compute, and storage) from these codecs’ bitrate savings in the short term, keeping in mind that the engineering effort would be wasted if the codec fails to gain industry adoption in the long term.
You are keen to be a differentiator and build IP around these technologies.
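The first condition is easy to overestimate, because a codec’s bitrate saving only applies to the share of playback that actually uses it. The back-of-the-envelope sketch below makes that explicit; every figure in it (egress bill, playback share, saving, extra monthly cost, engineering cost) is a hypothetical input, not data from any real service.

```python
import math
from typing import Optional


def monthly_net_savings(egress_cost: float, share_on_new_codec: float,
                        bitrate_saving: float, extra_monthly_cost: float) -> float:
    """Egress saved on playback served in the new codec, minus its running costs
    (extra encoding CPU, storage, and CDN caching for the additional format)."""
    return egress_cost * share_on_new_codec * bitrate_saving - extra_monthly_cost


def months_to_break_even(engineering_cost: float, monthly_net: float) -> Optional[int]:
    """Months until the one-time engineering effort is paid back; None if it never is."""
    if monthly_net <= 0:
        return None
    return math.ceil(engineering_cost / monthly_net)


if __name__ == "__main__":
    # 60% of playback on a codec that is 30% leaner saves 18% of egress, not 30%.
    net = monthly_net_savings(100_000, 0.6, 0.3, 5_000)
    print(round(net), months_to_break_even(120_000, net))  # 13000 10
```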

For an OTT service targeting developing markets like SEA (Southeast Asia) and MENA (Middle East and North Africa), I would recommend the following codec strategy. Since these are predominantly Android markets, we should be happy to use just the H.264/AVC and VP9 codecs. With iOS users at less than 10-15% (most of them in markets with better bandwidth and less cost-sensitive consumers), HEVC is not a worthwhile investment once you have VP9.

You should start exploring AV1, but watch the industry trends and weigh the above points seriously before jumping in.

ABR Streaming

ABR stands for Adaptive Bitrate Streaming, the technology that enables video players to adapt the streaming quality to bandwidth fluctuations. As discussed in the previous section, we generate multiple encodings at different bitrates and resolutions. To use ABR, we chunk each encoding into short segments and build an index (the manifest) that maps each resolution to its corresponding bitrate and segments; this is the most crucial part of the packaging process.
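On the player side, the simplest form of adaptation is throughput-based: pick the highest-bitrate rendition that fits inside the bandwidth the player has recently measured, with some headroom. The sketch below is a minimal illustration of that idea; the 0.8 safety factor and the bitrates in the usage example are hypothetical, and production players typically also weigh buffer level and switching history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    height: int
    bandwidth_bps: int  # peak bitrate advertised for this rendition in the manifest


def pick_rendition(renditions: list[Rendition], measured_bps: float,
                   safety: float = 0.8) -> Rendition:
    """Highest rendition that fits within a safety margin of the measured throughput."""
    budget = measured_bps * safety
    fitting = [r for r in renditions if r.bandwidth_bps <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest rung rather than stalling.
        return min(renditions, key=lambda r: r.bandwidth_bps)
    return max(fitting, key=lambda r: r.bandwidth_bps)


if __name__ == "__main__":
    ladder = [Rendition(144, 200_000), Rendition(360, 800_000),
              Rendition(720, 2_000_000), Rendition(1080, 3_500_000)]
    for throughput in (5_000_000, 3_000_000, 100_000):
        print(throughput, pick_rendition(ladder, throughput).height)  # 1080, 720, 144
```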

There are a few common ABR streaming protocols –

HLS – HTTP Live Streaming – A standard by Apple. Open and adopted by all platforms and players that support ABR.
DASH – Dynamic Adaptive Streaming over HTTP – a standard by MPEG adopted and popularised by Google and Android. All platforms have adopted it except Apple.
Smooth Streaming – By Microsoft – not very popular.

ABR is the backbone of streaming over the internet, and a great example of simple technologies making a big difference. You can learn more about it here.
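For HLS, the indexing mentioned above takes the form of a master playlist that lists each rendition with its bandwidth, resolution, and codecs, and points to that rendition’s own media playlist of segments. The sketch below generates one; the URIs and the codec strings (`avc1.640028` is H.264 High profile, level 4.0, and `mp4a.40.2` is AAC-LC) are illustrative values. A DASH MPD carries the same information in XML.

```python
def master_playlist(variants: list[tuple[int, int, int, str, str]]) -> str:
    """Build an HLS master playlist.

    Each variant is (bandwidth_bps, width, height, codecs, uri). BANDWIDTH should be
    the peak bitrate of the whole variant, audio included.
    """
    lines = ["#EXTM3U"]
    for bandwidth, width, height, codecs, uri in variants:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},"
            f'RESOLUTION={width}x{height},CODECS="{codecs}"'
        )
        lines.append(uri)  # relative URI of this rendition's media playlist
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist([
        (800_000, 640, 360, "avc1.64001e,mp4a.40.2", "360p/index.m3u8"),
        (3_500_000, 1920, 1080, "avc1.640028,mp4a.40.2", "1080p/index.m3u8"),
    ]))
```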

Encryption and DRM

The OTT content needs to be secured. DRM is all about protecting the content against piracy. You can refer to this article on OTTVerse explaining the basics of DRM. However, just like most technologies – putting things into practice is a different ball game. Here is an excellent write-up on the difficulties of DRM by a veteran of the DRM industry, a gentleman I have worked with personally.

In a follow-up article, I will share my experiences working on some advanced DRM use cases, such as:

Restricting HD content to premium consumers only.
Restricting playback over HDMI ports and screen mirroring.
Applying usage rules to concurrent views of content.
Applying usage rules for the expiry of downloads.
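While the details belong in that follow-up, the shape of such rules can be sketched. The example below is a hypothetical policy function of the kind a license service might run before issuing a DRM license; the tier names, the two-stream limit, the 30-day download window, and the 480p cap for non-premium users are all assumptions. Real DRM systems (Widevine, FairPlay, PlayReady) each express these rules in their own license formats, which this sketch does not model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_CONCURRENT_STREAMS = 2              # hypothetical account limit
DOWNLOAD_VALIDITY = timedelta(days=30)  # hypothetical expiry for offline licenses
NON_PREMIUM_MAX_HEIGHT = 480            # hypothetical SD cap for non-premium users
HD_HEIGHT = 720


@dataclass
class LicenseRequest:
    user_tier: str          # "premium" or "free" (hypothetical tiers)
    requested_height: int
    active_streams: int     # sessions already playing on this account
    offline: bool = False   # True when requesting a download license


@dataclass
class LicenseDecision:
    allowed: bool
    max_height: int = 0
    require_hdcp: bool = False  # output protection: blocks unprotected HDMI and mirroring
    expires_at: Optional[datetime] = None
    reason: str = ""


def decide(req: LicenseRequest, now: datetime) -> LicenseDecision:
    if req.active_streams >= MAX_CONCURRENT_STREAMS:
        return LicenseDecision(False, reason="concurrent stream limit reached")
    max_height = req.requested_height
    if req.user_tier != "premium":
        max_height = min(max_height, NON_PREMIUM_MAX_HEIGHT)  # HD only for premium users
    return LicenseDecision(
        allowed=True,
        max_height=max_height,
        require_hdcp=max_height >= HD_HEIGHT,
        expires_at=now + DOWNLOAD_VALIDITY if req.offline else None,
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    print(decide(LicenseRequest("free", 1080, 0), now))
    print(decide(LicenseRequest("premium", 1080, 2), now).reason)  # concurrent stream limit reached
```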

Hopefully, these articles will be out very soon. So, subscribe to OTTVerse and stay tuned.

Subtitle and Closed Captions

Subtitle text is created separately, usually by hand by subtitling vendors. Speech-to-text technologies exist, but they are not yet widely adopted in the OTT space.

There are two popular formats for representing subtitles: SRT and WebVTT.
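The two formats are close enough that SRT can be converted to WebVTT mechanically: a WebVTT file starts with a `WEBVTT` header line, and its cue timestamps use a period rather than a comma before the milliseconds. The sketch below handles only those differences (plus Windows line endings and a byte-order mark); it is a minimal illustration, and any SRT styling or positioning extensions are passed through untouched rather than translated.

```python
import re

# SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500"
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt: str) -> str:
    """Convert SRT text to WebVTT: add the header and fix the millisecond separator."""
    text = srt.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    out = ["WEBVTT", ""]
    for line in text.split("\n"):
        # Only timing lines change; commas inside the subtitle text are left alone.
        out.append(TIMING.sub(r"\1.\2 --> \3.\4", line))
    return "\n".join(out).rstrip("\n") + "\n"


if __name__ == "__main__":
    srt = "1\r\n00:00:01,000 --> 00:00:03,500\r\nHello, world\r\n"
    print(srt_to_webvtt(srt))
```

SRT cue numbers survive as WebVTT cue identifiers, which the WebVTT format allows.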

There’s a very subtle difference between Subtitles and Closed Captions.

The purpose of having Subtitles is to help audiences who do not understand the content’s language.
Closed Captions are used to help audiences who are hard of hearing, or when the content is consumed with the audio muted.

More details here.

During the ‘packaging’ stage, references to the WebVTT files are added to the HLS and DASH manifest files. The following metadata is added to the manifest files during packaging to enable the subtitle feature:

Metadata that lets the player detect that subtitles are available and recognize their language. This is reflected in the consumer’s app under the subtitle options.
Metadata that lets the player fetch the subtitles if the consumer opts for them. This is a CDN URL, or a path relative to the manifest file’s location.

Every piece of technology, however simple it may seem, offers an opportunity to differentiate if you adopt it the right way. Consider a case where you go live with a title on your OTT platform without subtitles and later want to add them. You cannot do this efficiently unless your APIs are designed for this use case.
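For HLS, one way to support that use case is for the subtitle service to touch only the master playlist: add an `EXT-X-MEDIA` subtitle rendition and reference its group from every variant, leaving the video and audio segments untouched. The sketch below is a hypothetical helper doing that; the `subs` group ID and the URIs are assumptions, and each URI points to a subtitle media playlist that lists the WebVTT segments. The updated playlist would still need to be republished and purged from the CDN cache.

```python
def add_subtitle_track(master: str, language: str, name: str, uri: str,
                       group_id: str = "subs") -> str:
    """Return a copy of an HLS master playlist with one more subtitle rendition."""
    media = (
        f'#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="{group_id}",NAME="{name}",'
        f'LANGUAGE="{language}",AUTOSELECT=YES,DEFAULT=NO,URI="{uri}"'
    )
    out, inserted = [], False
    for line in master.splitlines():
        if line.startswith("#EXT-X-STREAM-INF:"):
            if not inserted:
                out.append(media)  # conventionally placed before the variant streams
                inserted = True
            if "SUBTITLES=" not in line:
                line += f',SUBTITLES="{group_id}"'  # link the variant to the subtitle group
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    master = (
        "#EXTM3U\n"
        "#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360\n360p/index.m3u8\n"
        "#EXT-X-STREAM-INF:BANDWIDTH=3500000,RESOLUTION=1920x1080\n1080p/index.m3u8\n"
    )
    print(add_subtitle_track(master, "ar", "Arabic", "subs/ar/index.m3u8"))
```

Calling it once per language adds each new track to the same group.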

Ideally, Subtitle Packaging should be a separate microservice. This was a very ‘mainstream’ use case at one of the organizations I worked at: we operated in multiple SEA and Middle East markets, with subtitles in more than 10 languages. The Subtitle Service that we developed is one of the most popular microservices in our Video System.

Content Management (Asset Management and Storage)

The source videos, encoded videos, thumbnails, subtitles, and metadata are the OTT company’s assets. You have an inventory of these assets, and your business is all about monetizing them. So, where are you going to store them? There are multiple options, varying in cost and latency. Naming conventions are also going to be super important. Depending on the ‘aging’ of the assets, it is popular practice to archive content to ‘cold’ storage options.
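Both ideas, a predictable naming scheme and age-based tiering, can be sketched in a few lines. The key layout, the tier names, and the 30- and 180-day thresholds below are hypothetical; actual tiers and thresholds depend on your storage provider’s pricing and retrieval latency.

```python
from datetime import date

# Hypothetical policy: assets untouched for longer move to cheaper, slower storage.
TIER_THRESHOLDS = [(30, "hot"), (180, "warm")]  # (max days since last access, tier)


def asset_key(content_id: str, asset_type: str, variant: str, ext: str) -> str:
    """Hypothetical naming convention: all assets of a title share one predictable prefix."""
    return f"{content_id}/{asset_type}/{variant}.{ext}"


def storage_tier(last_accessed: date, today: date) -> str:
    age_days = (today - last_accessed).days
    for max_age, tier in TIER_THRESHOLDS:
        if age_days <= max_age:
            return tier
    return "cold"  # archive: cheapest to keep, slowest to restore


if __name__ == "__main__":
    print(asset_key("title-123", "renditions", "h264_1080p", "mp4"))
    print(storage_tier(date(2024, 3, 1), date(2024, 6, 30)))  # warm (121 days)
```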

Another aspect of asset management is the ‘metadata’, ‘relationships’, and ‘hierarchy’ of these assets. This is one of the primary concerns of the CMS (Content Management System). Another concern is the ‘Rights’ associated with these assets, with respect to geography, screen types, license duration, etc.
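At its simplest, a rights check answers whether a title may play in a given country, on a given screen type, on a given date. The sketch below models only those three dimensions; the field names and the inclusive license window are assumptions, and real rights data is usually considerably richer.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rights:
    countries: frozenset[str]  # ISO 3166-1 alpha-2 codes where the title is licensed
    screens: frozenset[str]    # screen types covered, e.g. "mobile", "web", "tv"
    start: date                # first day of the license window
    end: date                  # last day of the license window (inclusive)


def is_playable(rights: Rights, country: str, screen: str, on: date) -> bool:
    """A title plays only if geography, screen type, and license window all allow it."""
    return (country in rights.countries
            and screen in rights.screens
            and rights.start <= on <= rights.end)


if __name__ == "__main__":
    r = Rights(frozenset({"SG", "MY"}), frozenset({"mobile", "web"}),
               date(2024, 1, 1), date(2024, 12, 31))
    print(is_playable(r, "SG", "mobile", date(2024, 6, 1)))  # True
    print(is_playable(r, "SG", "tv", date(2024, 6, 1)))      # False
```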

And then there is the whole workflow associated with asset management, for example:

Approvals – Scanning for explicit content (adult scenes, violence, etc.)
Scheduling to subtitling vendors (generating tickets, assigning, and closing)
Thumbnail generation – manual or auto

Frankly speaking, content management is a domain in itself, and we have only scratched the surface here. More about the CMS in a dedicated article later.

Conclusion

Acquiring video content from the Content Provider and processing it for consumption on an OTT platform is called Video Engineering. As discussed, it is a complex process. It is also arguably the most important part of the OTT tech stack, so getting it right is critical.

I am glad that the Indian OTT Players have started to realize this.

Until recently, most of these players used turnkey solutions with only one goal: “make it work.” But as their businesses grow by the day, they are realizing that these technologies are at the core of their business. It is good to see these companies ramping up teams and hiring domain experts.

Wrapping up the discussion, I would like to reiterate that I have barely managed to scratch the surface, even for the topics I touched. Topics like Live Streaming, Low Latency, Analytics, Objective/Subjective Quality Metrics, etc., are equally important in today’s OTT Landscape.

Hopefully, I will be back on OTTVerse soon and will talk about these topics. Until then, take care and keep streaming!

Uday Shankar Ammanagi

Uday has spent his entire professional career of 15 years as a Software Engineer in the Multimedia, Broadcast & Streaming Industry. He started off with Assembly Programming to port Audio-Video codecs to Embedded chips and presently serves as the Principal Engineer (Video Engineering) at one of the leading OTT companies in SEA & MENA. A servant-leader with deep experience and interest in all aspects - be it Technology, Business, or People.
