Creating the Perfect Encoding Ladder

I’m in the finishing stages of a report comparing cloud-based per-title encoding features. As part of that project, I computed the optimal ladder for 23 test clips when encoding with the x265 codec (medium and slow presets) and x264 codec (slow). This process made three things crystal clear.

First, if your HEVC ladder is anything like your H.264 ladder, you’re not optimizing your HEVC encodes. Second, if you’re using either the H.264 or HEVC ladder from the Apple HLS Authoring Specifications, it’s probably time for a rethink. Third, if you deliver diverse types of content and don’t use some form of per-title or per-category encoding, implementing either one should be your highest encoding-related priority.

Jan Ozer

Jan Ozer develops training courses for streaming media professionals; provides encoding-related testing services to encoder developers; helps video producers perfect their encoding ladders and deploy new codecs. Jan blogs primarily at the Streaming Learning Center.

To be fair to Apple, the Authoring Specifications say this about the ladders: “The above bit rates are initial encoding targets for typical content delivered via HLS. Apple recommends that you evaluate them against your specific content and encoding workflow, then adjust accordingly.”

This article will hopefully serve as the basis for such a comparison.

Finding the Perfect Ladder

I detailed how to create the perfect ladder in an article entitled Creating the Perfect Bitrate Ladder for Video Encoding. The core of the analysis is Netflix’s brute-force convex hull methodology, where you encode the same clip at multiple resolutions and data rates. You pick your top rung based on the desired quality for that rung, and the lower rungs based on Apple’s guideline of a 1.5x to 2x jump from each rung to the next higher rung.
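As a rough illustration of the convex hull step, the sketch below takes VMAF scores for every resolution/bitrate encode and keeps, at each tested bitrate, the resolution that scored highest. This is a simplified reading of the methodology. The function name, data layout and scores are hypothetical, not values from the report.

```python
from typing import Dict, Tuple

# Hypothetical layout: VMAF score keyed by (vertical resolution, bitrate in kbps).
Scores = Dict[Tuple[int, int], float]


def best_resolution_per_bitrate(scores: Scores) -> Dict[int, Tuple[int, float]]:
    """Return {bitrate: (resolution, vmaf)}, keeping the top-scoring resolution per bitrate.

    Taking the upper envelope of the per-resolution rate/quality curves at each
    sampled bitrate approximates the convex hull.
    """
    best: Dict[int, Tuple[int, float]] = {}
    for (height, kbps), vmaf in scores.items():
        # Keep this encode if it is the first at this bitrate or beats the current best.
        if kbps not in best or vmaf > best[kbps][1]:
            best[kbps] = (height, vmaf)
    # Sort by bitrate so the result reads like a ladder from bottom to top.
    return dict(sorted(best.items()))


if __name__ == "__main__":
    # Made-up scores for illustration only.
    example: Scores = {
        (360, 400): 62.0, (540, 400): 58.0,
        (720, 1200): 84.0, (1080, 1200): 81.0,
        (720, 2500): 91.0, (1080, 2500): 95.0,
    }
    for kbps, (height, vmaf) in best_resolution_per_bitrate(example).items():
        print(f"{kbps} kbps -> {height}p (VMAF {vmaf})")
```

With these toy numbers, 360p wins at 400 kbps, 720p at 1200 kbps, and 1080p at 2500 kbps.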

My target for the cloud comparison was 95 VMAF points, which I hit in Figure 1 at 2500 kbps. I multiplied this by 0.66 to identify the next lower rung. That came to 1650 kbps, which I rounded down to 1600 kbps. Divide 2500 by 1600 and you get 1.56, within the 1.5x to 2x range.
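The sketch below shows the arithmetic for stepping down from the top rung. The 0.66 multiplier and the rounding down to a multiple of 100 kbps mirror the 2500-to-1600 kbps step described above. Applying the same rule to other rungs is an assumption for illustration, and the function names are hypothetical.

```python
def next_lower_rung(upper_kbps: int, factor_pct: int = 66, round_to: int = 100) -> int:
    """Scale the upper rung by factor_pct percent, then round down to a multiple of round_to."""
    scaled = upper_kbps * factor_pct // 100   # integer math avoids float drift: 2500 -> 1650
    return scaled // round_to * round_to      # round down: 1650 -> 1600


def step_ratio(upper_kbps: int, lower_kbps: int) -> float:
    """Jump between adjacent rungs, expressed as higher / lower."""
    return upper_kbps / lower_kbps


def within_step_range(ratio: float, low: float = 1.5, high: float = 2.0) -> bool:
    """True when the jump between rungs falls inside the 1.5x to 2x guideline."""
    return low <= ratio <= high


if __name__ == "__main__":
    top = 2500
    lower = next_lower_rung(top)
    ratio = step_ratio(top, lower)
    print(lower, ratio, within_step_range(ratio))  # 1600 1.5625 True
```

Rounding down widens the gap slightly. The unrounded 1650 kbps rung would sit about 1.52x below the top rung, while the rounded 1600 kbps rung sits 1.5625x below it.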


Figure 1. Netflix’s convex hull analysis to identify the perfect ladder.

I modified this analysis in three ways. As discussed in the aforementioned article, I considered 99th percentile scores as a measure of quality variability. Specifically, I chose the lowest bitrate where the overall VMAF score was 95 or higher and 99% of all frames had a score of 89 or higher.

I also capped the bitrate for all files at 6 Mbps. Files that couldn’t achieve the required quality levels at 6 Mbps used 6 Mbps as the top rung, at whatever quality that rate delivered. This happened only five times with x264, five times with x265 medium, and once with x265 slow.
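The selection rule and the 6 Mbps cap can be sketched as follows. The sketch assumes you have per-frame VMAF scores for each candidate bitrate. The function name and data layout are hypothetical, and so is treating the "overall" VMAF score as the arithmetic mean of the frame scores. This illustrates the rule and does not reproduce the tooling used for the report.

```python
from statistics import mean
from typing import Dict, List


def select_top_rung(
    frame_scores: Dict[int, List[float]],
    target_mean: float = 95.0,
    floor_score: float = 89.0,
    floor_share: float = 0.99,
    cap_kbps: int = 6000,
) -> int:
    """Return the lowest bitrate that meets both quality targets, or the cap if none does.

    frame_scores maps a candidate bitrate (kbps) to the per-frame VMAF scores of
    that encode. The overall score is assumed to be the mean of the frame scores.
    """
    for kbps in sorted(frame_scores):
        if kbps > cap_kbps:
            break  # nothing above the cap is eligible
        scores = frame_scores[kbps]
        if not scores:
            continue
        # Share of frames at or above the floor (e.g., 99% of frames >= 89).
        share = sum(score >= floor_score for score in scores) / len(scores)
        if mean(scores) >= target_mean and share >= floor_share:
            return kbps
    # No candidate met the targets within the cap: use the cap as the top rung.
    return cap_kbps


if __name__ == "__main__":
    # Made-up per-frame scores (100 frames per encode) for illustration only.
    candidates = {
        2000: [94.0] * 100,              # mean below 95
        2500: [96.0] * 98 + [88.0] * 2,  # mean passes, but only 98% of frames >= 89
        3000: [96.0] * 99 + [88.0],      # both targets met
    }
    print(select_top_rung(candidates))  # 3000
```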

The final modification relates to the resolution of the lowest rung. Particularly with HEVC, the highest-quality resolution for the lowest rung was often 720p or higher. However, as you can read about here, digital rights management and other considerations dictated that the lowest rung be smaller than 720p, so I chose 640×360. Accordingly, even if the highest-quality resolution at 200 kbps was 720p, I used 360p.
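Here is a minimal sketch of that override. It assumes each rung is stored as a (bitrate, width, height) tuple. The function name and the sample ladder are hypothetical.

```python
from typing import List, Tuple

# Hypothetical rung layout: (bitrate in kbps, width, height).
Rung = Tuple[int, int, int]


def force_bottom_rung(ladder: List[Rung], width: int = 640, height: int = 360) -> List[Rung]:
    """Return the ladder sorted top rung first, with the lowest rung set to width x height.

    The bitrate of the lowest rung is kept; only its resolution is overridden,
    even if a higher resolution scored better at that bitrate.
    """
    if not ladder:
        return []
    rungs = sorted(ladder, reverse=True)  # highest bitrate first
    lowest_kbps = rungs[-1][0]
    rungs[-1] = (lowest_kbps, width, height)
    return rungs


if __name__ == "__main__":
    hull_ladder = [(200, 1280, 720), (2500, 1920, 1080), (800, 1280, 720)]
    print(force_bottom_rung(hull_ladder))
    # [(2500, 1920, 1080), (800, 1280, 720), (200, 640, 360)]
```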

I performed this analysis for all 23 test files using the slow and medium x265 presets, then again with x264 using the slow preset. Once complete, I had the “perfect ladder” to compare to the various services. Though a bit off-topic for this article, I wanted to share one of the fruits of this effort in Figure 2.

Briefly, Figure 2 plots the vertical resolution and data rate of the average HEVC encoding ladders produced by the services that I analyzed. The purple ladder is the convex hull, or the theoretical perfect ladder; it reaches close to 1080p by about 1200 kbps. One ladder, in green, is even more aggressive, reaching 1080p at around 600 kbps. The light blue ladder tracks the convex hull through about 800 kbps and then gets slightly more aggressive. The other three ladders are much more conservative, reaching 1080p at between 2800 kbps and 3700 kbps.

All ladders use different codecs and/or different settings for the same codecs (like x265). But in general, the closer the ladder was to the convex hull, the better the service performed in overall quality and other measured criteria.


Figure 2. Overall HEVC ladders compared to the purple convex hull (x265/Slow).

Encoding Ladders for Various Content Types

OK, let’s get to the focus of this article: how the ideal ladders compared across codecs and settings, and how they compared to the Apple recommendations. I’ll walk through summaries of the four categories covered in the report (entertainment, animation, sports, and office), then show the overall ladder comparison. Table 1 identifies the test clips in each category. The clips ranged in duration from one to five minutes.


Table 1. Test clips.

Entertainment

Figure 3 shows the encoding ladders for the entertainment clips, which are relatively easy to encode. Note that both x265 ladders reached 1000 vertical pixels, very close to 1080p, at 1 Mbps. In contrast, Apple’s HEVC recommendations don’t reach 1000 pixels until 4100 kbps, a much more conservative approach.

The difference between x264 and either flavor of x265 is huge; x264 doesn’t get to 1000 pixels until about 2900 kbps. Again, the difference between the x264 convex hull ladder and the Apple H.264 recommendations is very significant for most of the ladder.


Figure 3. Average entertainment ladders plus Apple recommendations.

Table 2 contains the average ladders in table format. Some observations:

  • Remember that in most cases the top rung quality for all convex hull ladders was 95 VMAF overall, with 99% of all frames at 89 or higher. That should be more than adequate for your top rung. Apple’s H.264 ladder had one additional rung at 7800 kbps that’s not shown in the table because it was clearly overkill for all content types.
  • Note the top rung data rate differential between x265 medium and slow. You’re going to increase your encoding time and cost significantly, but you’re cutting the top rung bitrate by close to 19%. With even modest distribution numbers, you should recoup the additional cost fairly quickly (see this article for more on the economics of choosing an x265 preset).
  • The x265 slow preset saves about 31% of the top rung bitrate over x264 slow.

Table 2. Average encoding ladders for entertainment content.
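The percentages above are top-rung savings relative to a baseline, and the point about recouping the cost is a break-even calculation. The sketch below shows both. It assumes every view is delivered at the top rung and that you pay a flat CDN rate per gigabyte. The function names and all inputs in the usage example are hypothetical, not figures from the report.

```python
def percent_savings(baseline_kbps: float, candidate_kbps: float) -> float:
    """Top-rung savings of the candidate relative to the baseline, in percent."""
    return 100.0 * (baseline_kbps - candidate_kbps) / baseline_kbps


def gb_saved_per_view(saved_kbps: float, seconds_watched: float) -> float:
    """Data saved per view: kilobits -> kilobytes (/8) -> gigabytes (/1e6, decimal GB)."""
    return saved_kbps * seconds_watched / 8 / 1_000_000


def views_to_break_even(
    extra_encode_cost: float, saved_kbps: float, seconds_watched: float, cost_per_gb: float
) -> float:
    """Number of views at which the bandwidth savings cover the extra encoding cost."""
    return extra_encode_cost / (gb_saved_per_view(saved_kbps, seconds_watched) * cost_per_gb)


if __name__ == "__main__":
    # Hypothetical inputs: a 4000 kbps top rung cut to 3000 kbps, one-hour views,
    # $9 of extra encoding cost, and $0.01 per GB delivered.
    print(percent_savings(4000, 3000))                        # 25.0
    print(round(views_to_break_even(9.0, 1000, 3600, 0.01)))  # 2000
```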

Animated Content

Animated content typically encodes more efficiently than entertainment content and benefits from higher resolutions lower in the encoding ladder. The bitrates are much lower than for the entertainment clips, which confirms the first point, but the results were mixed on the second. The x264 convex hull reached 1080p much sooner than with the entertainment clips (2100 kbps as opposed to 4875 kbps), but both x265 ladders used lower-resolution rungs higher in the ladder.


Figure 4. Average animation ladders plus Apple recommendations.

Table 3 shows the average encoding ladders for the animated clips plus the Apple recommendations, which are clearly higher than necessary for this type of content.

Table 3. Average encoding ladders for animated content.

Sports Content

Sports content is the most challenging, increasing the top-rung bitrates almost to Apple levels. Still, both x265 encodes benefit from higher resolutions low in the encoding ladder, much more so than x264. Apple’s HEVC ladder tracks the x264 convex hull ladder reasonably closely, but is still much more conservative than either x265 ladder. Even for this challenging content, the Apple H.264 ladder looks very suboptimal compared to x264.


Figure 5. Average sports ladders plus Apple recommendations.

Table 4 contains the individual ladders. Observations include:

  • The top rung of the x265 slow ladder saves 24% over the x265 medium ladder.
  • x265 medium doesn’t save that much over x264 because several sports clips were capped by the 6 Mbps limitation.
  • While Apple’s ladders are still too conservative, the data rates are close to what’s needed for this type of content, particularly considering that several clips were capped at 6 Mbps for both x264 and x265 medium.


Table 4. Average encoding ladders for sports content.

Office Content

Office content tends to be the easiest to encode. The category includes a screencam, a PowerPoint-based video, a simple talking head, and other office-related content. The irregularity in the ladders comes from the fact that several clips had only three rungs. Since the bottom rung was always 640×360, the average resolution of rung 3 was actually lower than that of rung 4.
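The sketch below shows how rungs can fall out of order when ladders of different depths are averaged. It averages heights by rung position, numbering rungs from the top and averaging each position only over the clips that have that rung. The report doesn’t spell out its averaging method, so this is an assumption, and the two sample ladders are made up.

```python
from typing import Dict, List


def average_by_rung(ladders: List[List[int]]) -> Dict[int, float]:
    """Average resolution (height) per rung position; rung 1 is the top rung.

    Each position is averaged only over the clips whose ladders are deep enough
    to have that rung, so short ladders drop out of the lower positions.
    """
    if not ladders:
        return {}
    deepest = max(len(ladder) for ladder in ladders)
    averages: Dict[int, float] = {}
    for pos in range(deepest):
        heights = [ladder[pos] for ladder in ladders if len(ladder) > pos]
        averages[pos + 1] = sum(heights) / len(heights)
    return averages


if __name__ == "__main__":
    three_rungs = [1080, 720, 360]  # bottom rung is always 360p
    five_rungs = [1080, 1080, 720, 720, 360]
    print(average_by_rung([three_rungs, five_rungs]))
    # {1: 1080.0, 2: 900.0, 3: 540.0, 4: 720.0, 5: 360.0}
```

Rung 3 averages 540 lines because the three-rung clip contributes its 360p bottom rung at that position. Rung 4 includes only the deeper ladder and averages 720.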

Obviously, if you’re encoding training or similar office content, the Apple ladders are not a good starting point.


Figure 6. Average office ladders plus Apple recommendations.

Table 5 shows the various encoding ladders. Looking at the top rungs, using x265 slow saves only about 16%, while x265 medium saves 37% over x264 slow.


Table 5. Average encoding ladders for office content.

Overall Results

Figure 7 shows the overall results for all 23 test clips.

Figure 7. Average overall results plus Apple recommendations.

Table 6 shows the overall results in table format. Overall, you achieve a 21% savings in top rung bandwidth by encoding with x265 slow as compared to x265 medium, and a 32% top rung savings by switching from x264 slow to x265 slow. Again, these savings are understated somewhat because we capped the data rate at 6 Mbps for all encodes.
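Taken together, the two overall percentages also imply how x265 medium compares to x264 slow at the top rung. The sketch below does that arithmetic, assuming both percentages are computed on the same average top-rung bitrates. Because the inputs are rounded, treat the result as approximate. The function name is hypothetical.

```python
def implied_savings(slow_vs_medium_pct: float, slow_vs_x264_pct: float) -> float:
    """Implied top-rung savings of x265 medium over x264 slow, in percent.

    slow/medium = 1 - slow_vs_medium_pct/100 and slow/x264 = 1 - slow_vs_x264_pct/100,
    so medium/x264 = (slow/x264) / (slow/medium).
    """
    slow_over_medium = 1 - slow_vs_medium_pct / 100
    slow_over_x264 = 1 - slow_vs_x264_pct / 100
    medium_over_x264 = slow_over_x264 / slow_over_medium
    return 100 * (1 - medium_over_x264)


if __name__ == "__main__":
    # 21% (slow vs. medium) and 32% (slow vs. x264) come from the overall results above.
    print(f"{implied_savings(21, 32):.1f}%")  # 13.9%
```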


Table 6. Average encoding ladders for all content.

How to Use This Data

Historically, no organization has done more for encoding ladders than Apple, which published a ladder in Tech Note TN2224 that served as the basis for almost all early encoding ladders. Since Netflix announced per-title encoding back in 2015, however, it’s become increasingly clear that no single ladder fits all content, which Apple freely admits.

There are numerous per-title approaches; I’ll be releasing a report comparing AWS Elemental, Bitmovin, Microsoft Azure, Tencent, and Zencoder by the end of April. If you can’t implement per-title for some reason, consider per-category encoding, where you create unique ladders for different types of content distributed by your organization. I’ve seen this work very effectively with consulting clients that broadcast talk shows and action series, as well as training companies that deliver both screencam-based and real-world videos.
