Every time you encode a video file with a distribution-oriented codec like H.264, HEVC, VP9, or AV1, you choose a rate control mode that governs bitrate, overall quality, transient quality, and encoding cost. Examples of common rate control modes include CBR, VBR, CRF, and Capped CRF. This article discusses how these options work, their strengths and weaknesses, and how and when to implement them.
The first two modes discussed, constant bitrate encoding and variable bitrate encoding, are available in virtually every encoder for every distribution-oriented codec. The second two, Constant Rate Factor and Capped Constant Rate Factor, are available in FFmpeg for x264, x265, libvpx-VP9, and libaom-AV1, though I’ll exclusively discuss x264 in this article.
I’m going to refer to three files during this discussion.
- Test – This two-minute clip consists of 30 seconds of talking head video followed by 30 seconds of ballet, repeated twice.
- Football – This is a two-minute section of the high-motion Harmonic Football test clip which contains regions of both high and low motion.
- Talkingheads – This is a two-minute segment of a low-motion talking head clip.
By way of background, whenever you encode a video file for distribution (as opposed to archiving, or uploading for transcoding) you should consider five factors: compatibility, overall quality, transient quality, deliverability, and encoding cost. Here’s a brief description of each.
- Compatibility – can the player you’re delivering the video to decode and play the file? Here we’re discussing H.264 so compatibility is near universal. With HEVC, VP9, and AV1, compatibility may be an issue.
- Overall quality – this is the overall quality of the file, in this discussion measured with VMAF computed using harmonic mean averaging.
- Transient quality – this is the likelihood that the file will display transient quality issues, in this discussion measured by low-frame VMAF, or the lowest VMAF score for any frame in the file.
- Deliverability – this is your ability to deliver the file to the remote viewer without interruption. This is typically not a concern for viewers on high-bandwidth connections but definitely an issue with files delivered over 3G or similar connections.
- Encoding cost – using a technique that involves more than one pass significantly boosts encoding time; if you’re paying for your own encoding farm, or paying a cloud encoding service (like AWS Elemental MediaConvert), two-pass encoding boosts cost significantly.
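To make the two quality metrics concrete, here is a minimal sketch of how they could be computed from a list of per-frame VMAF scores. The function name `vmaf_summary` is hypothetical, and the sketch uses the textbook harmonic mean; actual VMAF tools may compute their harmonic mean slightly differently (for example, by offsetting scores to avoid dividing by zero), so treat the output as illustrative. It assumes every score is greater than zero.

```shell
# Summarize per-frame VMAF scores read from stdin, one score per line.
# Prints "<harmonic mean> <lowest frame score>", both to two decimals.
# Assumption: all scores are greater than zero.
vmaf_summary() {
  awk '
    NF == 0 { next }                      # skip blank lines
    {
      v = $1 + 0                          # force a numeric value
      n++
      inv += 1 / v                        # harmonic mean sums reciprocals
      if (n == 1 || v < low) low = v      # track the worst frame
    }
    END { if (n > 0) printf "%.2f %.2f\n", n / inv, low }
  '
}

# Two frames, one good and one poor: the harmonic mean is pulled
# toward the poor frame more than a simple average would be.
printf '100\n50\n' | vmaf_summary
```

With real data you might run something like `vmaf_summary < scores.txt`, where `scores.txt` is a placeholder for a file holding one per-frame score per line. The harmonic mean penalizes low-scoring frames, which is why it works as an overall quality figure, while the low-frame score isolates the single worst frame.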
Finally, in FFmpeg, and most encoding tools that deploy the x264 and x265 codecs, there are three switches that control the bitrate. These are:
- b:v – This sets the overall (target) bitrate.
- maxrate – This sets the maximum bitrate.
- bufsize – This sets the size of the Video Buffering Verifier (VBV) buffer – see here.
Let’s start with Constant Bitrate Encoding, or CBR.
Constant Bitrate Encoding (CBR)
As the name suggests, when you encode with constant bitrate encoding, you use a constant bitrate over the entire file, irrespective of the complexity of the scenes in the video file. When encoding with FFmpeg, you implement CBR by using the same data rate for b:v, maxrate, and bufsize, as follows.
-b:v 5000k -maxrate 5000k -bufsize 5000k
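As a rough illustration, the CBR switches above could be generated by a small helper so that all three values stay in sync. This is only a sketch: `cbr_args`, `input.mp4`, and `cbr.mp4` are hypothetical names, and the final line prints the command rather than running it.

```shell
# Print the three rate-control switches for a CBR encode.
# $1 = target bitrate in kbps (for example, 5000)
cbr_args() {
  echo "-b:v ${1}k -maxrate ${1}k -bufsize ${1}k"
}

# Dry run: print the full command instead of executing it.
# input.mp4 and cbr.mp4 are placeholder file names.
echo ffmpeg -i input.mp4 -c:v libx264 $(cbr_args 5000) cbr.mp4
```

Removing the leading `echo` would run the encode, assuming FFmpeg was built with libx264.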
In the test file, which alternates 30 seconds of talking head and 30 seconds of ballet, the CBR encoded file looks like Figure 1 (in Bitrate Viewer). If you look hard enough, you can just see the wavy blue line that tracks the average bitrate hovering right around the 5000k bitrate line.
On the right, you see that the average bitrate is 4938 kbps and the peak bitrate 6013 kbps, about 20% higher. With most software encoders, CBR isn’t a flat line, but it’s certainly less variable than the other control techniques shown below.
We’ll review the quality implications of CBR encoding in a moment.
From a deliverability perspective, the advantage of CBR is clear.
If you’re delivering live video into the cloud over a fixed bitrate connection or video to a constrained connection to a remote viewer, the lack of variability in the stream helps ensure against interruptions. CBR is also a single-pass technique, which means it’s cheaper than variable bitrate encoding discussed next.
Variable Bitrate Encoding (VBR)
Variable Bitrate (VBR) encoding attempts to hit the bitrate target but adjusts the bitrate over the duration of the video according to the complexity of the content. VBR typically requires two passes; one to scan the video and identify the complexity of the different regions, the other to actually encode.
VBR is often further refined by describing the extent to which the maximum rate can vary over the target. You would call the first example below 200% constrained VBR because the maximum rate is 2x the target. You’d call the second example 150% constrained VBR because the maximum is 150% of the target (1.5x). The third example would be 110% constrained VBR.
-b:v 5000k -maxrate 10000k -bufsize 10000k
-b:v 5000k -maxrate 7500k -bufsize 7500k
-b:v 5000k -maxrate 5500k -bufsize 5500k
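The relationship between the target, the constraint percentage, and the resulting switches can be sketched as follows. The helper name `vbr_args` and the file names are hypothetical. Setting bufsize equal to maxrate simply mirrors the three examples above, and the two-pass commands are printed as a dry run rather than executed.

```shell
# Print rate-control switches for constrained VBR.
# $1 = target bitrate in kbps
# $2 = constraint as a percentage of the target (200, 150, 110, ...)
vbr_args() {
  vbr_max=$(( $1 * $2 / 100 ))
  echo "-b:v ${1}k -maxrate ${vbr_max}k -bufsize ${vbr_max}k"
}

vbr_args 5000 200
vbr_args 5000 150
vbr_args 5000 110

# Dry run of a two-pass encode: the first pass analyzes the video and
# discards its output, the second pass writes the actual file.
# input.mp4 and vbr200.mp4 are placeholder file names.
echo ffmpeg -y -i input.mp4 -c:v libx264 $(vbr_args 5000 200) -pass 1 -an -f null /dev/null
echo ffmpeg -i input.mp4 -c:v libx264 $(vbr_args 5000 200) -pass 2 vbr200.mp4
```

The three `vbr_args` calls reproduce the three switch sets shown above.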
Figure 2 shows the bitrate profile of the test file encoded using 200% constrained VBR. The data rate clearly fluctuates between the alternating low-motion talking head sequence and the higher-motion ballet. Though the average bitrate is similar to CBR (5041 kbps compared to 4938 kbps), the maximum bitrate is significantly higher (11137 kbps compared to 6013 kbps). The 150% constrained VBR clip has a similar average (5036 kbps) and a peak bitrate that is about 18% lower (9090 kbps).
Obviously, from a deliverability perspective, VBR is more challenging, but this only matters with constrained connections close to the streaming bitrate. If you’re delivering 5000 kbps 1080p video to viewers in the US, Europe, and Scandinavia with 50 Mbps and higher connection speeds, you probably won’t experience any issues. If it’s 40 Mbps 8K video to the same regions, 200% constrained VBR starts to feel a bit scary. Of course, if it’s 500 kbps 200% constrained VBR video over a 3G connection, CBR (or 110% constrained VBR) sounds a lot better.
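One way to compare these encodes is to express each measured peak as a percentage of the measured average. The sketch below does that arithmetic with the figures quoted in this section; `peak_pct` is a hypothetical helper name.

```shell
# Print the peak bitrate as a percentage of the average bitrate.
# $1 = average bitrate in kbps, $2 = peak bitrate in kbps
peak_pct() {
  awk -v avg="$1" -v peak="$2" 'BEGIN { printf "%.1f\n", 100 * peak / avg }'
}

peak_pct 4938 6013     # CBR:                    121.8
peak_pct 5041 11137    # 200% constrained VBR:   220.9
peak_pct 5036 9090     # 150% constrained VBR:   180.5
```

Note that the measured peaks for both VBR encodes sit somewhat above their nominal constraints, just as the CBR peak sits above its target, so the constraint is better read as a guide to the encoder than as a hard ceiling on every measured peak.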
What are the quality implications of all this?
Table 1 shows the scores of the real-world Football clip encoded using CBR and the three constrained VBR settings. The average bitrate is very similar with significant deltas in peak bitrate. The overall VMAF score is very close; less than 0.7 VMAF points differentiate CBR and the highest VBR value.
| Encoding Mode | Average Bitrate | Peak Bitrate | VMAF | Low-frame VMAF |
The big difference is in the low-frame score, the indicator for transient quality issues, where CBR is about 5.5 points lower than 200% constrained VBR. This represents a transient issue that some viewers might notice. Interestingly, there’s only about a one-point difference in low-frame VMAF between 200% and 150% constrained VBR, and another two-point difference between 150% and 110% constrained VBR.
To explore further, I compared the CBR and 200% constrained VBR files in the Moscow State University Video Quality Measurement Tool (Figure 3).
- The top graph is the VMAF score for both files over the duration of the entire file, with CBR in red, 200% Constrained VBR in green.
- The bottom graph zooms in on the highlighted region of the top graph, which runs roughly from frame 2100 to 3400. The red stalactite-looking formations are frames where CBR quality is significantly worse than VBR.
In the figure, you see the Show frame button on the lower right. In this clip, which is encoded using fairly conservative encoding parameters, the difference between the CBR and VBR frames was almost unnoticeable, particularly since the most significant deltas were only one or two frames in duration.
With other clips, encoded with a lower bitrate, the transient issues might be more noticeable. It’s the potential for these transient issues that convinced most VOD producers to use VBR rather than CBR, particularly for 1080p video distributed to high-bandwidth viewers.
Interestingly, Apple endorses 200% constrained VBR in their HLS Authoring Specifications, which state “1.30. For VOD content, the peak bit rate SHOULD be no more than 200% of the average bit rate.” That said, whether 200% constrained VBR is appropriate for high-frame rate 8K content, which might require 40 Mbps to achieve acceptable quality, remains to be seen.
To summarize so far, CBR wins for cost and deliverability while VBR edges CBR in overall quality. However, the risk of transient quality issues is very real with CBR.
Constant Rate Factor (CRF) Encoding
With CBR and VBR you choose a target bitrate and the encoder adjusts quality to meet that bitrate. The problem with this approach is that if you’re using the same encoding ladder for all of your video clips, you waste a lot of bandwidth on easy-to-encode clips like our talking head clip.
Figure 4 shows the talking head clip encoded at 200% constrained VBR with a 5 Mbps target, same as our football clips. The average and peak bitrates are in line with the football clip above, but the VMAF score is 97.61.
Studies show that viewers don’t perceive quality improvements above a VMAF score of about 93, which is why I recommend that producers target a VMAF score of 95 for the top of the ladder clip. As you’ll see below, with this clip, you could reduce the bitrate by at least 60% and still hit that 95 target.
So, again, when encoding with CBR and VBR the encoder adjusts the quality as needed to hit the target bitrate. In contrast, with CRF encoding, a single-pass encoding mode, you choose a quality target and the encoder adjusts the bitrate to achieve that quality level. CRF values range from 0 to 51, with lower numbers delivering higher quality scores. Encoding with CRF and FFmpeg looks like this:
ffmpeg -i input_file -crf 23 output_file
CRF encoding works well for archiving or for producing mezzanine files for upload and transcoding. However, it’s suboptimal from a deliverability perspective because you don’t know the data rate that you’ll produce until you encode the file.
- With the talking head clip, a CRF value of 22 resulted in a file with an average bitrate of 1878 kbps and a VMAF score of 96.26, shaving more than 60% off the data rate of the VBR encode with no perceivable impact on quality.
- With the football clip, however, CRF 22 produced an average bitrate of 10650 kbps, which is too high for most 1080p encoding ladders.
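The arithmetic behind these two results can be sketched as follows. The exact average bitrate of the VBR talking head encode isn’t quoted here, so this sketch assumes the nominal 5000 kbps target as the baseline; `savings_pct` is a hypothetical helper name.

```shell
# Print the percentage of bitrate saved when moving from one rate to another.
# $1 = baseline bitrate in kbps, $2 = new bitrate in kbps
# A negative result means the new encode uses more data than the baseline.
savings_pct() {
  awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", 100 * (old - cur) / old }'
}

# Baseline of 5000 kbps is an assumption (the nominal VBR target).
savings_pct 5000 1878     # talking head at CRF 22:  62.4
savings_pct 5000 10650    # football at CRF 22:     -113.0
```

The same CRF value that saves well over half the bandwidth on the talking head clip more than doubles the data rate on the football clip, which is the unpredictability that makes uncapped CRF hard to use for delivery.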
How do you harvest available bandwidth savings while ensuring a reasonable data rate limit? By combining CRF with a data rate cap, or Capped CRF.
Capped CRF
As the name suggests, with capped CRF, you combine a CRF value with a data rate cap. The relevant portion of the command string looks like this.
-crf 22 -maxrate 5000k -bufsize 10000k
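A small helper can assemble these switches and reject CRF values outside the 0 to 51 range mentioned earlier. This is a sketch under stated assumptions: `capped_crf_args` and the file names are hypothetical, and the default buffer of twice the cap simply mirrors the example above rather than reflecting a required setting.

```shell
# Print the switches for a capped CRF encode.
# $1 = CRF value (0-51), $2 = bitrate cap in kbps
# $3 = optional buffer size in kbps; defaults to 2x the cap to match
#      the example above (an assumption, not a rule)
capped_crf_args() {
  if [ "$1" -lt 0 ] || [ "$1" -gt 51 ]; then
    echo "CRF must be between 0 and 51" >&2
    return 1
  fi
  echo "-crf $1 -maxrate ${2}k -bufsize ${3:-$(( $2 * 2 ))}k"
}

# Dry run: print the full command instead of executing it.
# input.mp4 and capped.mp4 are placeholder file names.
echo ffmpeg -i input.mp4 -c:v libx264 $(capped_crf_args 22 5000) capped.mp4
```

Because this is a single-pass mode, there is no first-pass analysis step as there is with two-pass VBR.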
With the alternating talking head and ballet test clip, this command string produced the result shown in Figure 5. Again, while the max rate isn’t a flat line, the ballet GOPs are very closely aligned to the 5000 kbps line and the peak bitrate is 6302 kbps. In operation, the encoder used the CRF value to encode the talking head region and applied the cap in the ballet regions.
How does this compare to 200% constrained VBR?
The 200% constrained VBR encode produced a mean VMAF of 97.30 (and a data rate of 5041 kbps). So, the capped CRF encode saved about 30% of the bandwidth and produced a VMAF of 96.55, which would be visually indistinguishable. However, as you see, there is significant bitrate variability, which could hinder deliverability over constrained connections.
In a high motion clip, like the football test clip, there are many regions in the clip where the CRF value produces a data rate higher than the cap. In these regions, the cap controls the bitrate, not the CRF value. In these cases, capped CRF won’t save much bandwidth because there are few regions where the encoder can produce the specified quality without exceeding the cap.
You see this in Table 2 which shows bitrate data and VMAF scores for the Football clip encoded using 200% constrained VBR and Capped CRF (CRF 22/5 Mbps cap). The average bitrate is about the same, though the capped CRF clip has a much lower peak. Average VMAF scores are also very similar.
| Encoding Mode | Average Bitrate (kbps) | Peak Bitrate (kbps) | VMAF | Low-Frame VMAF |
As with CBR, the major delta is in the low-frame VMAF, the indicator of transient quality issues. Figure 6 shows the comparison Result Plot from VQMT; again, when looking at the frames at the sites of the major stalactites, I saw no observable difference.
However, where CBR only enhances deliverability, capped CRF does this and saves bandwidth on easier-to-encode files. In essence, this makes capped CRF a per-title encoding technology that you can implement with almost all encoding tools, live and VOD, that are based on FFmpeg.
Capped CRF isn’t a slam dunk; you should run your own tests and determine if the transient issues are more evident in your clips than I saw in the football clip. If transient issues are minimal and you are considering capped CRF, you should experiment with different CRF levels (see here).
Again, CRF and capped CRF aren’t available for all encoders and all codecs; so if you’re using a third-party encoder not based upon FFmpeg and not using the x264, x265, libvpx-VP9, or libaom-AV1 codecs, they may not be available.
Table 3 summarizes the strengths and weaknesses of the four encoding methods discussed.
| Mode | How it works | Strengths | Weaknesses | Best for |
|---|---|---|---|---|
| CBR | Adjusts quality to achieve bitrate; same bitrate for the entire file | Consistent bitrate; single pass | Overall quality; transient quality | Live; VOD with constrained bandwidth |
| Constrained VBR | Adjusts quality to achieve bitrate; adjusts bitrate to scene complexity | Overall quality; transient quality | Bitrate variability; cost (2 or more passes) | Most other VOD |
| CRF | Adjusts data rate to achieve quality | Single pass; delivers set quality level | No bitrate control | Archiving; mezz file creation |
| Capped CRF | CRF with data rate maximum | Per-title method; single pass | Transient quality; bitrate variability | |