This article will show how Content Adaptive Encoding (CAE) ensures excellent video quality while reducing the bitrate for H.264/AVC transcoding. We demonstrate this with bitrate and VMAF results for videos of varying complexities.
When choosing the best bitrate for each video, Content Adaptive Encoding takes advantage of the fact that compression effectiveness depends on the content’s complexity. This results in:
- bitrate/bandwidth savings for easy to moderate complexity videos
- quality improvement for complex video content segments.
Definition of Content Adaptive Encoding and Per-Title Encoding
“Content Adaptive Encoding” and “Per-Title Encoding” are used interchangeably on the internet. We asked ChatGPT for their definitions, and here is what we got:
Per-Title Encoding
This is a method of video encoding in which the encoder adjusts the bitrate of the video on a scene-by-scene or shot-by-shot basis based on the complexity of the content in each segment. The goal of per-title encoding is to maintain a consistent level of visual quality while minimizing the file size of the encoded video.
Content Adaptive Encoding
Content Adaptive Encoding considers the complexity of the video content and the context in which it will be viewed. For example, a video that will be viewed on a large screen with a fast internet connection may be encoded at a higher bitrate than a video that will be viewed on a small screen with a slower internet connection. Content Adaptive Encoding aims to optimize the viewing experience for the specific context in which the video will be played.
Note: Read this article on the basics of per-title encoding to learn more.
What Contributes to Video Complexity?
Before discussing video complexity, let us first understand how an encoder works.
A video encoder exploits temporal and spatial redundancies so that fewer bits can be used to represent the video frames. Complex video footage has fewer opportunities for redundancy reduction, which lowers compression efficiency.
Below, we list a few crucial factors that contribute to video content complexity:
- Texture Information
- Detailed information in a video: Typically, trees, water bodies, crowds, and outdoor scenes are more complex.
- Sporting events with stadium views, large audiences, grass, gardens, and distant camera views add to high-complexity scenes.
- In general, frames with many edges and variations constitute high-complexity videos.
- High Motion and Multiple Motion
- Fast camera movement
- Pan, zoom in, zoom out
- Fast object movement
- Change in view or perspective
- Lighting changes and similar variations, all of which add to the complexity of the video.
- Frequent Scene Changes
- New scenes, changes in camera view, and objects being added to or removed from the background make the footage more complex.
However, these scenes can tolerate more noise than smooth/low-motion video segments.
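The texture and motion factors listed above can be approximated with simple pixel statistics. The following is a minimal sketch, not the complexity measure used by any particular encoder: it assumes frames are available as 2-D lists of luma values, and the function names `spatial_complexity` and `temporal_complexity` are hypothetical. Neighbouring-pixel differences stand in for texture and edges, and frame-to-frame differences stand in for motion and scene changes.

```python
from statistics import mean


def spatial_complexity(frame):
    """Mean absolute difference between neighbouring pixels in one frame.

    `frame` is a 2-D list of luma values; more edges and texture give a higher score.
    """
    rows, cols = len(frame), len(frame[0])
    diffs = []
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:  # horizontal neighbour
                diffs.append(abs(frame[y][x + 1] - frame[y][x]))
            if y + 1 < rows:  # vertical neighbour
                diffs.append(abs(frame[y + 1][x] - frame[y][x]))
    return float(mean(diffs)) if diffs else 0.0


def temporal_complexity(prev_frame, frame):
    """Mean absolute pixel difference between two consecutive frames."""
    diffs = [
        abs(cur - prev)
        for prev_row, cur_row in zip(prev_frame, frame)
        for prev, cur in zip(prev_row, cur_row)
    ]
    return float(mean(diffs))


# Usage with synthetic data: a flat frame versus a checkerboard.
flat = [[128] * 8 for _ in range(8)]
checker = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
inverted = [[255 - p for p in row] for row in checker]

print(spatial_complexity(flat), spatial_complexity(checker))
print(temporal_complexity(flat, flat), temporal_complexity(checker, inverted))
```

A flat, static clip scores zero on both measures, while the checkerboard that flips every frame scores the maximum on both. Real footage falls somewhere in between.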
Bitrate comparison of ABR (Average Bitrate Encoder) & CAE (Content Adaptive Encoder)
Video encoding/transcoding for adaptive video streaming traditionally uses fixed bitrate ladders. These ladders are typically generated by encoding a few sample videos and setting a bit rate that works well for them.
The approach of using a fixed bitrate ladder is fundamentally flawed: because the complexity of video content varies so widely, it is very difficult to design a single ladder that works optimally for all videos.
Thousands of video test vectors were classified by complexity, from very low to very high. The duration of these videos ranges from 30 sec to 6 min.
For high-complexity video content, it has been observed that a bitrate of around 6000 kbps is required to ensure acceptable video quality. Below is the bar graph of bitrate (in kbps) vs. different complexity groups of video test sequences. We compare the output of an ABR encoder (@ 6000 kbps) to a Content Adaptive Encoder.
The blue bars represent ABR mode, which encodes at 6000 kbps irrespective of the video content’s category. The orange bars are for the Content Adaptive Encoder, where lower-complexity content uses lower bitrates, and the bitrates increase with increasing content complexity.
When using Content Adaptive Encoding, we see significant bitrate savings with lower and moderate complexity content.
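To make the comparison concrete, here is a minimal sketch of how a per-group target bitrate and the resulting savings against the fixed 6000 kbps ABR target could be computed. The percentages in `CAE_PERCENT_OF_ABR` are hypothetical placeholders chosen only for illustration. They are not the measured values behind the bar graph, and the group names and function names are assumptions as well.

```python
ABR_BITRATE_KBPS = 6000  # fixed ABR target used in the comparison above

# HYPOTHETICAL share of the ABR target spent per complexity group.
# These numbers are illustrative only; they are not measured results.
CAE_PERCENT_OF_ABR = {
    "very_low": 30,
    "low": 45,
    "medium": 65,
    "high": 85,
    "very_high": 100,
}


def cae_target_kbps(group):
    """Target bitrate for a complexity group, never above the ABR target."""
    return ABR_BITRATE_KBPS * CAE_PERCENT_OF_ABR[group] // 100


def savings_percent(group):
    """Bitrate saved relative to encoding everything at the ABR target."""
    return 100 - CAE_PERCENT_OF_ABR[group]


for group in CAE_PERCENT_OF_ABR:
    print(group, cae_target_kbps(group), "kbps,", savings_percent(group), "% saved vs ABR")
```

The shape is what matters here: the target rises with complexity and reaches the ABR bitrate only for the most demanding content, so the savings are concentrated in the low and moderate groups.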
VMAF Comparison of ABR vs. Content Adaptive Encoding
The VMAF bar graph for the ABR and CAE encoding modes is shown below. The orange bars are VMAF for the CAE mode of encoding, and the blue bars indicate ABR (average bitrate) transcoding.
- For ABR mode, it can be seen that coding low and moderate-complexity video content at high bitrates does not provide any significant additional quality improvement.
- According to Fig. 1.0, bit consumption is lower for low and moderate-complexity content, yet the VMAF values are comparable to ABR @ 6000 kbps (shown in Fig 2.0).
- In some cases, CAE uses slightly lower bitrates for high-complexity content, but it still improves overall visual quality and provides better VMAF.
This CAE algorithm is designed to have low complexity, so it does not add significant encoding complexity or processing delay. The CAE can work in single-pass mode, making it suitable for live encoding use cases. It provides uniform, high-quality video output. The bandwidth savings of CAE help reduce storage and CDN costs significantly.
The target bitrate can be reduced for mobile and tablet screen sizes, and increased for living-room and larger displays, to achieve the desired video quality.
Advantages of Content Adaptive Encoding: Illustration with a CDN Cost
In a broadcast use case, where one-to-many video streaming is done, bandwidth is a significant factor in the cost of the service offering.
Example: consider a small media company
- generating 500 hours of content every month
- with an average video delivery bandwidth (with adaptive encoding) of 2 Mbps
- with an average of 10K views of this content
- with a CDN cost ranging from 0.001 USD per GB to 0.01 USD per GB.
Annual hours of content = 500 * 12 = 6000 hours
Size of 1 hour of content (@ 2 Mbps) = 2 * 1024 * 1024 * 60 (sec) * 60 (min) / (8 * 1024^3) = 0.88 GB
Total CDN usage per year = 0.88 * 6000 (hours) * 10000 (views) = 52,800,000 GB (about 51,562 TB)
CDN cost range = 0.001 * 52,800,000 to 0.01 * 52,800,000 USD = $52,800 to $528,000
In the above example, even a 10% saving in bandwidth would save $5,280 to $52,800 per year. CDN cost savings of up to 40% for OTT and up to 70% for Ed-Tech can be achieved while delivering good quality content for a large audience.
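The arithmetic of this example can be reproduced with a short script, which also makes it easy to plug in other volumes or prices. This is a minimal sketch: the function names are hypothetical, and it uses the same 1024-based unit conventions as the example. Because it does not round the per-hour size to 0.88 GB, its totals come out slightly lower than the rounded figures in the text.

```python
def gb_per_hour(bitrate_mbps):
    """Size in GB of one hour of video at the given bitrate (1024-based units)."""
    bits = bitrate_mbps * 1024 * 1024 * 60 * 60  # bits streamed in one hour
    return bits / (8 * 1024 ** 3)                # bits -> bytes -> GB


def annual_cdn_gb(hours_per_month, bitrate_mbps, views):
    """Total GB delivered per year when every hour of content gets `views` views."""
    return gb_per_hour(bitrate_mbps) * hours_per_month * 12 * views


def annual_cdn_cost(hours_per_month, bitrate_mbps, views, usd_per_gb):
    """Yearly CDN cost in USD for the given delivery volume and price per GB."""
    return annual_cdn_gb(hours_per_month, bitrate_mbps, views) * usd_per_gb


# Inputs taken from the example above.
low = annual_cdn_cost(500, 2, 10_000, 0.001)
high = annual_cdn_cost(500, 2, 10_000, 0.01)
print(f"1 hour @ 2 Mbps: {gb_per_hour(2):.4f} GB")
print(f"Annual CDN cost: ${low:,.0f} to ${high:,.0f}")
print(f"A 10% bandwidth saving: ${0.1 * low:,.0f} to ${0.1 * high:,.0f}")
```

Since the cost scales linearly with bitrate, any percentage reduction in average bitrate translates directly into the same percentage reduction in CDN spend.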
Content Adaptive Encoding improves video quality while significantly reducing bitrate (or file size), with the size of the reduction depending on the content’s complexity.
Low-complexity, low-delay Content Adaptive Encoding can help transcode VOD and Live Video. Different presets for different screen sizes can help achieve good quality video and bit rate savings across various display sizes.
Content Adaptive Encoding provides stable, high-quality, low-complexity, and bandwidth-optimized solutions for VOD and live applications in E-Shopping, OTT, Gaming, EdTech, and other video streaming applications.
Ashok Magadum is the Founder and CEO of Vidarka Technologies. He has a Master's from Manipal University and 21+ years of experience in the design and development of video codecs (including MPEG-2, AVC, SVC, HEVC, Dolby Vision, and content-adaptive encoding), video transcoding, cloud media streaming, computer vision, ADAS, DSP, embedded systems, and other multimedia technologies.
He has worked for various startups such as Ittiam System, Zenverge Inc, Saranyu Technologies, and MNCs like NXP, Freescale, and Samsung for the design and development of many solutions.
For any queries or more details, please reach us at [email protected].