For any OTT streaming platform, the biggest challenge is balancing good video quality across many types of devices against CDN cost. CDN cost is often one of the largest components of an OTT streaming platform's operating cost.
Optimising CDN Costs at Zee5
At Zee5, we are churning out high-quality videos, with a lot of 4K content being released every day. Our focus on video quality is driven by
- the growth of screen sizes in India and people watching 4K content on large Smart TVs
- the increase in average internet speeds (with the introduction of 5G in India and its fast growth), and
- a growing demand for an immersive viewing experience!
High quality content is absolutely needed to stay ahead of the competition!
But, delivering it comes at a steep cost!
The increase in cost for delivering high quality video comes primarily from two factors –
- CDN delivery costs (largest factor)
- Transcoding costs (smaller impact)
As the number of subscribers and their screen sizes grow, people watch more high-resolution content, and this sharply increases CDN usage (and its cost)!
This is a nightmare scenario for a streaming service!
One way to mitigate high costs is to increase subscription prices, but, in price-sensitive markets, increasing subscription prices generally results in a loss of subscribers (and bad publicity on social media!).
The only option left to us was to tackle this problem technologically. That is, to reduce the CDN costs while improving the quality.
While this goes against traditional wisdom, it is exactly what we have achieved, and we are continuing our journey!
How Did We Optimise CDN Costs while Improving Video Quality?
In Zee5’s OTT platform, typically, we serve two types of ABR-packaged content
- HLS (HTTP Live Streaming), mainly played on Apple devices.
- DASH, played mainly on Android devices, HTML5 TVs, Roku devices, Set top boxes, etc.
For VOD, we use AWS MediaConvert to transcode and package all premium content and most of the regular content.
Over a period of six months, we worked on improving quality while reducing the size of the generated segments, using two main techniques:
- Using HLS fMP4 instead of HLS TS for Optimising CDN Costs, and
- Optimising Transcoding Presets
The rest of this white-paper details the steps we took to achieve our goals.
Using HLS fMP4 instead of HLS TS for Optimising CDN Costs
MPEG-2 TS is the de-facto standard for broadcasting over satellite and cable, and most households consume AV content through it. So when Apple came up with HLS, they piggybacked on MPEG-2 TS to encapsulate AV data.
So, for the first three versions of HLS, TS was the only supported container format. But it soon became clear that MPEG-2 TS is not an efficient container format for adaptive bit rate content. MPEG-2 TS was designed to deliver data reliably over environments prone to burst errors, so it has the following characteristics, which enable the receiver to detect errors and reproduce data reliably.
- Each TS packet is short with a size of 188 bytes.
- A packet starts with a four byte mandatory header, with the first byte being 0x47 (sync byte).
- After the header, there can be an optional adaptation field.
- The adaptation field contains PCR (among multiple other things), which is 42 bits in length.
- PCR is not used on OTT ABR playback.
- Typically, each video frame/group of audio samples starts with a PES (packetised elementary stream) header, which begins with a 3-byte start code prefix (0x00 0x00 0x01) and a 1-byte stream ID, and carries the presentation timestamp and an optional decode timestamp, each 33 bits long.
- PES always starts with a new TS packet, so the end of the previous PES and the beginning of the new PES can never coincide on the same TS packet.
- Thus, if we have a video frame of 1000 bytes, it would take a minimum of 6 TS packets (i.e. 188 X 6 = 1128 bytes).
- As can be seen, the 6th packet must be filled out with a large number of stuffing bytes (0xFF, carried in the adaptation field) to complete the packet.
- Downloading these padding bytes is a poor use of bandwidth, which is at a premium for ABR streaming.
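The padding arithmetic above can be sketched in a few lines of Python. This is a minimal illustration rather than our production tooling: the function name `ts_cost` and the `pes_header_bytes` parameter are hypothetical, and by default the sketch ignores the PES header and any adaptation-field contents, just as the 1000-byte example does.

```python
import math

TS_PACKET_BYTES = 188   # fixed TS packet size
TS_HEADER_BYTES = 4     # mandatory header, starting with sync byte 0x47
TS_PAYLOAD_BYTES = TS_PACKET_BYTES - TS_HEADER_BYTES  # 184 bytes per packet


def ts_cost(frame_bytes: int, pes_header_bytes: int = 0) -> dict:
    """Packets and bytes needed to carry one frame that starts on a new TS packet."""
    payload = frame_bytes + pes_header_bytes
    packets = math.ceil(payload / TS_PAYLOAD_BYTES)
    wire_bytes = packets * TS_PACKET_BYTES
    return {
        "packets": packets,
        "wire_bytes": wire_bytes,
        # bytes needed to fill the last packet up to 188 bytes
        "padding_bytes": packets * TS_PAYLOAD_BYTES - payload,
        # everything on the wire that is not frame data
        "overhead_bytes": wire_bytes - frame_bytes,
    }


print(ts_cost(1000))
```

For the 1000-byte frame this reports 6 packets and 1128 bytes on the wire, of which 128 bytes are headers and padding.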
The following table compares the cumulative segment sizes of HLS TS and HLS fMP4 for a daily TV show episode around 20 minutes long. Please note that the encoding parameters used for both are exactly the same.
The table clearly shows the achievable gains in file size reduction.
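One way such a comparison could be reproduced is to sum the segment files produced by each packaging run. The sketch below assumes the outputs sit in local directories; the helper names, the directory layout, and the file extensions (`.ts`, `.m4s`, `.mp4`) are assumptions for illustration, not a description of our pipeline.

```python
from pathlib import Path
from typing import Iterable


def cumulative_size(directory: str, suffixes: Iterable[str]) -> int:
    """Total size in bytes of all files under `directory` with one of the given suffixes."""
    wanted = {s.lower() for s in suffixes}
    return sum(
        p.stat().st_size
        for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def saving_percent(baseline_bytes: int, candidate_bytes: int) -> float:
    """Relative size reduction of the candidate versus the baseline, in percent."""
    if baseline_bytes <= 0:
        raise ValueError("baseline size must be positive")
    return 100.0 * (baseline_bytes - candidate_bytes) / baseline_bytes
```

With hypothetical output folders, usage would look like `saving_percent(cumulative_size("out/hls_ts", [".ts"]), cumulative_size("out/hls_fmp4", [".m4s", ".mp4"]))`. The fMP4 total should include the initialization segments so that the comparison is fair.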
The fMP4 format achieves these results primarily in two ways:
- Isolating encoding data (encoding type, profile, and level, for example) and DRM-specific data in a separate initialization segment.
  - Typically, there is one initialization segment per track (video/audio).
  - This avoids repeating the same codec and encryption-related metadata in every segment of that track.
- The fMP4 specification does not impose any constraint on how frames/samples are packed, so no bits are wasted on packaging.
  - Frames and samples are simply placed one after another, tightly packed, in the mdat box.
  - The trun box records the size of each frame/group of samples, which gives the player the position and size of each one.
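To make the box layout concrete, here is a minimal sketch that walks the top-level boxes of an fMP4 segment. Each box starts with a 4-byte big-endian size and a 4-byte type. The helper names are hypothetical, and the demo segment is synthetic: its moof payload is placeholder bytes, not a valid movie fragment.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level box in `data`."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the data
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # truncated or malformed box
        yield box_type.decode("ascii", "replace"), offset, size
        offset += size


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (demo helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic segment: placeholder moof, then two frames (1000 and 15 bytes)
# packed back to back in a single mdat box.
segment = make_box(b"moof", b"\x00" * 16) + make_box(b"mdat", b"\xAA" * 1000 + b"\xBB" * 15)
print(list(iter_boxes(segment)))
```

In this sketch the two frames occupy exactly 1015 bytes of the mdat box plus one 8-byte box header, with no per-frame alignment padding.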
Because of these reasons, moving to fMP4 reduces bandwidth consumption vs. using MPEG-TS.
Additionally, from the results, we see that the savings are higher at lower resolutions.
This is because the lower the resolution, the smaller the frames, so the bytes wasted on maintaining 188-byte alignment make up a much larger share of the data. For example, carrying a B frame of 15 bytes still requires a full TS packet of 188 bytes.
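The effect of frame size on TS overhead can be illustrated with a short calculation. This is a simplified sketch that ignores PES headers; the function name is hypothetical, and the frame sizes other than 15 and 1000 bytes are made-up examples.

```python
import math

TS_PACKET_BYTES = 188
TS_PAYLOAD_BYTES = 184  # 188 minus the 4-byte TS header


def ts_overhead_ratio(frame_bytes: int) -> float:
    """Fraction of the transmitted bytes that is not frame data."""
    packets = math.ceil(frame_bytes / TS_PAYLOAD_BYTES)
    wire_bytes = packets * TS_PACKET_BYTES
    return (wire_bytes - frame_bytes) / wire_bytes


for size in (15, 200, 1000, 20000):
    print(f"{size:>6} B frame -> {ts_overhead_ratio(size):.1%} overhead")
```

The ratio falls quickly as frames get larger, which is consistent with lower-resolution renditions benefiting most from the move to fMP4.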
Tweaking Transcoding Presets to Optimise CDN Cost
While moving to fMP4 from MPEG-TS gave us good filesize reductions, we saw that more than 90% of Zee5 users consume DASH content, while HLS is primarily consumed on Apple devices. So, we needed to further fine-tune our encoding presets to get significant savings across our content library.
In AWS MediaConvert, transcoding presets are encapsulated in a JSON document called a “Template”. Any change to a Template is validated as follows:
- By watching content on various sized displays (i.e. TVs, phones, laptop screens).
- By generating and comparing VMAF scores before and after changing the transcoding settings.
According to Netflix, a change of 6 VMAF points would be noticeable to a viewer. In our experience, viewers start noticing differences once the VMAF change exceeds 3 points. Also, when the VMAF score exceeds 95, the scope for further quality improvement is fairly limited.
A combination of the above-mentioned practices helps us ensure that
- at any stage of optimisation, we are not degrading the user’s perceptual AV experience,
- and any video quality improvements can be quantified by VMAF scores.
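The VMAF rules of thumb above could be turned into a simple before/after check. The sketch below is an illustration only: the function name, the labels it returns, and the sample scores are hypothetical, while the 3-point and 95-point thresholds come from the text above.

```python
NOTICEABLE_DELTA = 3.0   # VMAF points beyond which viewers notice a difference
SATURATION_SCORE = 95.0  # above this, further gains are fairly limited


def assess_change(vmaf_before: float, vmaf_after: float) -> str:
    """Classify a preset change from its VMAF scores before and after."""
    delta = vmaf_after - vmaf_before
    if delta < -NOTICEABLE_DELTA:
        return "regression"      # a visible quality drop; reject the change
    if delta > NOTICEABLE_DELTA:
        return "improvement"     # a visible quality gain
    if vmaf_before >= SATURATION_SCORE and vmaf_after >= SATURATION_SCORE:
        return "saturated"       # both already near the top of the scale
    return "neutral"             # difference unlikely to be noticed


print(assess_change(88.0, 92.5))
```

A check like this complements, but does not replace, watching the content on real displays.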
We created a test suite of around 16-18 source assets, which broadly represents the following as available on the Zee5 platform:
- all types of content (TV shows/Originals/Movies),
- resolutions (up to 4K and up to 1080p),
- bit depths (8 bits and 10 bits),
- encoding types (AVC and HEVC), and
- genres (Drama and Action).
Such a diverse dataset was chosen to ensure that improvements seen on the test platform translate to actual, real-world content on Zee5. Additionally, each time we change a preset, we run it through this test suite to understand the changes or improvements brought about by the new settings.
So, What Did We Tweak on AWS MediaConvert?
Currently, we are using AWS MediaConvert’s QVBR mode for all renditions using AVC and HEVC. QVBR ensures that the encoder tries to reach the specified quality level for a given rendition, while staying within the specified bit rate.
One recommendation from AWS to improve video quality (and reduce file size) was to use multi-pass transcoding.
However, using multiple passes makes transcoding painfully slow (almost unusable for HEVC), and transcoding time is extremely important for a lot of our time-sensitive content (in terms of time-to-publish).
We did not consider using the accelerated mode combined with multi-pass transcoding. If we did so, all our transcoding would go onto the on-demand queue, which is quite expensive and would mean that we are not utilising all 25 of the reserved transcoding slots (RTS) we have reserved.
Prior to the start of optimisation, our transcoding system used the following settings –
- AVC: Single pass HQ for all renditions
- HEVC: Single pass for most renditions
For the above-mentioned reasons (the transcoding time constraint and effective utilisation of our RTS slots), we did not use the quality tuning level as a lever.
Though this limited the number of levers at our disposal, the remaining levers were still very effective (as seen in the results). The table below shows the settings that we tweaked –
|Setting|Impact|
|---|---|
|QVBR Level|This has a very big impact on quality and size, we carefully tweaked this for each rendition to improve quality and reduce size.|
|Maximum Bitrate|We have tweaked this, but with QVBR mode this setting has really limited benefit.|
|GOP Settings|Improves encoding efficiency.|
|Segment Size|It is often used to balance encoding efficiency and rebuffering at the player. A segment size of 6 seconds provided a very good balance for us.|
|Scaling|Has a good impact on both size and quality of transcoded content.|
|Noise Reducer|Has a good impact on both size and quality of transcoded content.|
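To show roughly where some of these levers live in a MediaConvert job, here is a hedged sketch that builds the relevant JSON fragments as Python dictionaries. The field names follow the MediaConvert job settings schema as we understand it and should be checked against the current API reference. Every numeric value apart from the 6-second segment length (QVBR level, maximum bitrate, GOP length, fragment length) is a made-up example, not one of our production settings, and the function name is hypothetical.

```python
import json


def avc_qvbr_video(width: int, height: int, qvbr_level: int, max_bitrate: int) -> dict:
    """Video description for one AVC rendition in QVBR mode (illustrative values)."""
    if not 1 <= qvbr_level <= 10:
        raise ValueError("QVBR quality level must be between 1 and 10")
    return {
        "Width": width,
        "Height": height,
        "CodecSettings": {
            "Codec": "H_264",
            "H264Settings": {
                "RateControlMode": "QVBR",
                "QvbrSettings": {"QvbrQualityLevel": qvbr_level},
                "MaxBitrate": max_bitrate,  # bits per second
                "QualityTuningLevel": "SINGLE_PASS_HQ",
                "GopSize": 2.0,  # hypothetical GOP length
                "GopSizeUnits": "SECONDS",
            },
        },
    }


# Hypothetical DASH output group using 6-second segments.
dash_group = {
    "Type": "DASH_ISO_GROUP_SETTINGS",
    "DashIsoGroupSettings": {"SegmentLength": 6, "FragmentLength": 2},
}

rendition = avc_qvbr_video(1920, 1080, qvbr_level=8, max_bitrate=5_000_000)
print(json.dumps({"OutputGroupSettings": dash_group, "VideoDescription": rendition}, indent=2))
```

Keeping the per-rendition values as function arguments makes it easy to sweep the QVBR level for each rendition and compare the resulting size and VMAF score.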
Codec profile and level are often a good lever for improving encoding efficiency. However, we refrained from using them, as some low-cost/low-capability devices cannot decode certain combinations of type/profile/level. For example, on certain low-end phones, decoding AVC High Profile at Level 4.1 causes black dots on the screen.
The following table depicts the quality improvement (in terms of VMAF score) and bandwidth saving when using MPEG-DASH with one of the initial AVC preset improvements, for a piece of original content around 41 minutes long.
The following table depicts the quality improvement (in terms of VMAF score) and bandwidth saving when using MPEG-DASH with one of the initial HEVC preset improvements, for a 5-minute movie clip containing fast motion and action sequences.
We have continued to improve the presets, and thus the results, since then.
Results of our CDN Cost Optimisation Efforts
We started deploying our preset optimisations in the beginning of Sept. 2022, and the last set of improvements went into production in the middle of Nov. 2022 for both AVC and HEVC.
In the results, we would like to depict two things.
- CDN bandwidth usage
- How the cumulative segment size reduced, while quality has improved (using a popular movie as an example)
Note: the following sections focus on results gathered from Aug. 2022 till the end of Nov. 2022 to showcase the improvement.
CDN Bandwidth Usage
As discussed at the start of this whitepaper, the consumption of 4K content has increased on Zee5. So, it is expected that
- the total play duration (the total time users spent watching content) would remain similar, but,
- the bandwidth usage from CDN (and cost) would increase because 4K content is of much higher bit rate than Full-HD or 720p content.
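To put rough numbers on that expectation, CDN egress scales linearly with bit rate for a fixed play duration. The sketch below is simple arithmetic; the function name and the bit rates (15 Mbps for 4K, 5 Mbps for Full HD) are hypothetical round numbers, not our actual ladder.

```python
def egress_gb(bitrate_mbps: float, play_hours: float) -> float:
    """Approximate data delivered (decimal GB) for a given bit rate and play time."""
    bits = bitrate_mbps * 1_000_000 * play_hours * 3600
    return bits / 8 / 1_000_000_000


for label, mbps in (("Full HD (assumed 5 Mbps)", 5.0), ("4K (assumed 15 Mbps)", 15.0)):
    print(f"{label}: {egress_gb(mbps, play_hours=1.0):.2f} GB per hour watched")
```

Under these assumptions, an hour of 4K playback delivers three times the data of an hour of Full HD, so a shift in viewing towards 4K raises CDN usage even when total play duration stays flat.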
However, as the following table depicts, CDN bandwidth usage remained similar (in fact, it has reduced proportionally) after the optimisations were introduced, while play duration remained largely unaffected (as expected).
Cumulative Segment Size reduction
The following table depicts how the cumulative HLS segment size changed for a very popular movie on Zee5 from June to Nov. 2022 (after our last optimisation), showing the savings from both the encoding optimisation and the change of container format:
We have taken feedback from various stakeholders (internal and external), and all of them agreed that the AV quality of this movie, and of our content in general, has improved significantly over the last 3-4 months.
Super Premium Preset
While devising the preset, we pushed the encoder to generate the best quality without worrying about bit rate while tweaking all the levers/settings available through AWS MediaConvert.
For each super-premium movie, we take 10-minute clips and tune the settings until we arrive at the best quality that we can achieve.
The following tables give the quality improvement seen with our Super Premium settings for a couple of clips from the movies mentioned earlier.
Clip 1 – HEVC
Clip 1 – AVC
Clip 2 – HEVC
Clip 2 – AVC
Next Steps for Optimising our CDN Costs
Despite achieving the goals we set for ourselves at the start of this exercise, we understand that we must keep making significant progress in order to preserve our competitive edge and advance in the field of video quality.
Here are a few steps that we intend to take in this direction:
Move towards HEVC
As more and more devices become capable of decoding HEVC, we want to publish more HEVC-encoded content to provide better quality. HEVC shall be available by the new year for HD movies and by the end of 2022 for original content.
Exploring Dolby Vision
We are looking at leveraging more wide colour gamut technologies to deliver a better video experience. We are currently targeting Dolby Vision as our preferred solution. We intend to publish some premium movies with Dolby Vision by the first quarter of 2023.
Evaluate VVC and AV1
AV1 seems to be particularly interesting because of Google’s backing and availability of decode support on the Android platform. We are talking to various partners and evaluating their preparedness.
Evaluate Different Encoding Solution Providers
This will be done to improve existing AV offerings (in terms of video quality and bandwidth) and to build a truly cloud agnostic encoding solution.