In this article, we compare and analyze the performance of the SVT-AV1 presets by varying the CRF values and showing how the objective quality (PSNR, VMAF, SSIM), file size, and encoding speed vary.
The Alliance for Open Media (AOMedia), whose members include industry giants such as Google, Microsoft, Mozilla, and Netflix, developed AV1 as an open, royalty-free video codec. The primary objective was to create a codec with compression capabilities good enough to potentially replace H.264 and H.265/HEVC, keeping in mind that the world was moving towards higher resolutions.
Since the proposal of the AV1 standard, there have been several implementations, such as
- aomenc / libaom
- SVT-AV1 (developed by Intel and Netflix and adopted by AOMedia in 2020)
- rav1e by Xiph
- and other commercial implementations
I’ve chosen SVT-AV1 for this article and will evaluate its “preset” performance across a range of CRF values. The idea is to understand how each preset performs w.r.t quality, size, bitrate, speed, etc.
This exploratory post will not tackle CBR, row/column tiling, multi-threading, etc. Please subscribe for follow-up articles on SVT-AV1, rav1e, other video codecs, rate-control modes, and visual analysis.
I ran all the tests on an AWS EC2 c5.9xlarge instance. Here are its specs –
- Intel Xeon Platinum 8124M with support for the AVX-512 instruction set.
- 36 vCPUs
- 72GB RAM
- Operating system: Ubuntu 22.04 LTS
I used FFmpeg with the SVT-AV1 encoder that I compiled locally on the EC2 instance. I chose this approach because this is what I use in my setup and what most companies in the video streaming industry use.
Here are FFmpeg and SVT-AV1 specs:-
Finally, I used the Parkjoy (1080p, 50fps) test sequence with moderate to high motion and lots of grass/trees/textural details.
As I mentioned in the introduction, this experiment aims to observe the performance of the SVT-AV1 presets as you vary the CRF (Constant Rate Factor) value used for encoding in FFmpeg.
If you are familiar with FFmpeg, you’ll know that the “preset” parameter controls the encoding tools used to compress a video by turning them on/off entirely or changing their settings.
- A “slower” preset will take lots of time to encode and produce an excellent-quality video! This is because the encoder will enable the most extensive possible set of tools and the most generous parameters (e.g. search ranges for motion estimation, quarter-pel motion estimation, OBMC, etc.).
- Conversely, a “fast” preset will force the encoder to drop specific coding tools to encode quickly and, in exchange, sacrifice video quality.
It’s a tradeoff at the end of the day – speed vs. quality!
SVT-AV1 presets go from 0 – 13, and the official SVT-AV1 documentation describes the presets and the coding tools they enable/disable. I avoided “13” because, according to the official docs, it is only used for debugging. I also avoided presets 0 & 1 because they were too slow in my earlier tests!
So, I decided to go with presets=2 – 12 for this analysis.
Finally, here is the procedure I followed –
- fix the preset value to N (min = 2, max = 12).
- vary the CRF from 20 – 63 in steps of 6
- for each encode, capture the PSNR, VMAF, SSIM, file size, encoding time, encoding FPS, and bitrate.
- After evaluating all possible CRF values, change the preset to N+1.
Command Line and Disclaimers
The FFmpeg command line I used is a simple 1-pass AV1 encode –
ffmpeg -benchmark -i <input> -c:v libsvtav1 -crf <value> -preset N output.mkv
This is wrapped into a script that cycles through all combinations of CRF and Preset, extracts the metrics, and generates the charts.
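The script itself is not reproduced here, but the sweep it describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the script used for this article: the input name `parkjoy.y4m`, the output naming scheme, and the helper names `build_encode_command` and `build_sweep` are all hypothetical. It also assumes the CRF sweep starts at 20, in which case a step of 6 ends at 62 rather than 63. The sketch only builds the command lines; it does not launch FFmpeg.

```python
from itertools import product

# Presets 2..12 inclusive, as chosen in the article.
PRESETS = range(2, 13)
# Assumption: the sweep starts at CRF 20. A step of 6 then gives
# 20, 26, 32, 38, 44, 50, 56, 62 and never lands exactly on 63.
CRFS = range(20, 64, 6)


def build_encode_command(source, preset, crf):
    """Return the 1-pass libsvtav1 command from the article as an argument list."""
    # Hypothetical output naming scheme, one file per (preset, CRF) pair.
    output = f"parkjoy_p{preset}_crf{crf}.mkv"
    return [
        "ffmpeg", "-benchmark",
        "-i", source,
        "-c:v", "libsvtav1",
        "-crf", str(crf),
        "-preset", str(preset),
        output,
    ]


def build_sweep(source):
    """Fix the preset, walk through every CRF, then move on to the next preset."""
    return [build_encode_command(source, p, c) for p, c in product(PRESETS, CRFS)]


if __name__ == "__main__":
    for command in build_sweep("parkjoy.y4m"):  # hypothetical input file name
        print(" ".join(command))
```

Each argument list could then be handed to `subprocess.run` one at a time, so that encodes do not compete for CPU and distort the timing measurements.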
I’d also like to mention that these experiments are not aimed at producing CBR, capped VBR, or testing how tiling or multi-threading works in SVT-AV1 – these are topics for future articles.
This article is an exercise in showing the variation of performance of SVT-AV1 with presets and CRF values.
Typically, when you look at a video codec’s performance, you need to evaluate –
- objective quality metrics such as PSNR, SSIM, and VMAF
- output file size & bitrate produced
- encoding speed (time and FPS)
These factors will reveal if the codec and the parameters you chose suit your use case.
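One common way to obtain these objective metrics is through FFmpeg's `psnr`, `ssim`, and `libvmaf` filters, which compare an encoded file against its reference. The sketch below only builds the command lines and is not necessarily how the numbers in this article were produced. It assumes an FFmpeg build with libvmaf enabled, and the helper name `build_metric_command` and the file names are hypothetical.

```python
# Map the metric names used in this article to FFmpeg filter names.
METRIC_FILTERS = {"psnr": "psnr", "ssim": "ssim", "vmaf": "libvmaf"}


def build_metric_command(encoded, reference, metric):
    """Build an FFmpeg command that scores `encoded` against `reference`.

    The first input is the distorted (encoded) video and the second is the
    reference. Both are assumed to share the same resolution and frame rate.
    """
    if metric not in METRIC_FILTERS:
        raise ValueError(f"unsupported metric: {metric}")
    filter_graph = f"[0:v][1:v]{METRIC_FILTERS[metric]}"
    # "-f null -" discards the decoded output; only the filter's log matters.
    return ["ffmpeg", "-i", encoded, "-i", reference,
            "-lavfi", filter_graph, "-f", "null", "-"]


if __name__ == "__main__":
    for name in ("psnr", "ssim", "vmaf"):
        # Hypothetical file names, for illustration only.
        print(" ".join(build_metric_command("encoded.mkv", "parkjoy.y4m", name)))
```

The scores are written to FFmpeg's log output at the end of each run, so a wrapper script would need to capture and parse that text.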
For example, a 24×7 news studio might want to encode and publish videos quickly. The studio might be willing to compromise on video quality (because of the short-lived nature of its content) in return for a fast turn-around time. I’ve seen this multiple times while working with news agencies in India.
On the other hand, a premium content provider like Netflix or SonyLIV will be willing to invest more time in encoding its content because it knows that its end-users expect premium quality and that its content’s lifespan is quite long!
There is always a tradeoff between video quality, speed, and file size. Hence, extensive testing and evaluation are needed to choose the correct codec & its associated parameters.
With this in mind, let’s look at the results starting with SVT-AV1’s encoding speed.
Encoding Speed of the SVT-AV1 Presets
I will start the analysis of the presets from the encoding speed and use this to substantiate a few points as we go along. Please bear in mind that I am using a c5.9xlarge machine for this test.
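For readers who want to reproduce the speed numbers, encoding time can be read from the timing line that FFmpeg prints when `-benchmark` is set, and the encoding FPS follows by dividing the frame count by the wall-clock time. The sketch below is one possible approach and not necessarily how the figures in this article were extracted. It assumes the `bench: utime=...s stime=...s rtime=...s` line format, and the sample line and frame count are illustrative values, not measurements.

```python
import re

# Assumed format of the timing line printed by "ffmpeg -benchmark".
BENCH_RE = re.compile(r"bench: utime=([\d.]+)s stime=([\d.]+)s rtime=([\d.]+)s")


def parse_bench(log_text):
    """Return user, system, and real (wall-clock) seconds, or None if absent."""
    match = BENCH_RE.search(log_text)
    if match is None:
        return None
    utime, stime, rtime = (float(value) for value in match.groups())
    return {"utime": utime, "stime": stime, "rtime": rtime}


def encoding_fps(frame_count, rtime_seconds):
    """Encoding speed in frames per second of wall-clock time."""
    return frame_count / rtime_seconds


if __name__ == "__main__":
    # Illustrative log line and frame count, NOT taken from the article's runs.
    sample = "bench: utime=120.500s stime=1.250s rtime=10.000s"
    timings = parse_bench(sample)
    print(timings, encoding_fps(500, timings["rtime"]))
```

An encode keeps up with real time when this FPS value is at least the frame rate of the source, which is 50 fps for the Parkjoy sequence.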
Side note: you’ll notice that I have plotted the same data in two different ways for each metric – this will carry on throughout this article.
- In the left subplot, I show the variation of the metric w.r.t the SVT-AV1 presets, keeping the CRF constant in each plot line.
- In the right subplot, I show the variation of the metric w.r.t. the CRF, keeping the preset constant in each plot line.
It’s the same data – just viewed in two different ways.
Here are the encoding FPS plots for all the SVT-AV1 presets with CRF varying from 20 – 63.
Figure 1: Encoding FPS results for Parkjoy using SVT-AV1 v1.6.0
The results are self-explanatory –
- preset=12 is the fastest, while preset=2 is the slowest. This is expected.
- for each preset, the FPS increases as you increase the CRF from 20 towards 63. This is because more coding tools are turned off, and the encoder’s focus on video quality reduces.
While performance depends on the kind of machine you are using to encode, it is pretty safe to observe that presets 8 – 12 are suitable for fast encoding (near real-time, real-time, or faster-than-realtime).
Presets 2 – 4 are pretty slow, and this is okay, as they are designed to produce high-quality video at the lowest possible bitrate and file size. It makes much more sense when presets 2 – 4’s speed is seen in conjunction with the objective quality metrics and file size! We’ll see that later.
Next, let’s move on to the objective quality metrics.
PSNR, SSIM, VMAF Scores
Here are the objective quality scores (PSNR, VMAF, SSIM) for SVT-AV1 by varying the presets and CRF values.
Looking at the objective metrics –
- For very low CRF values (around 24 and below), the VMAF scores are similar across all presets with very little variation and certainly within the JND rule-of-thumb of 6 VMAF points.
- However, there are noticeable differences in the same CRF ranges in the SSIM and PSNR data. For example, at CRF=24, preset 2 gets a 40dB score, while preset 12 scores 37.5 dB.
- The VMAF graphs tell us something interesting and odd – if you want a VMAF score greater than 95, you can use CRF values up to 38, irrespective of the preset.
So, does this mean you can use the highest speed preset, “12”, set the CRF value to 38, and get a video comparable to preset=2 (the slowest mode)?
The VMAF scores seem to agree with this assumption!
However, the answer is not that simple. Apart from just the video quality, you also need to look at the size and bitrate of the files produced to get a complete picture of what is happening.
Here is a snippet of the data in the table below –
|Preset||CRF||Filesize (MB)||Bitrate (kbps)||PSNR (dB)||SSIM||VMAF||Time (sec)||FPS|
Here is what the data says when the CRF is fixed at 38,
- preset=2 produces a 32 MB file at a bitrate of 26 Mbps
- preset=12 produces a 39 MB file at a bitrate of 32 Mbps
- preset=12 is 135x faster than preset=2.
So, if you use preset=12, you will get the same video quality (VMAF) as preset=2 at a much higher encoding speed, but the output is ~19% bigger with an ~18% higher bitrate. This will increase your storage costs, CDN delivery costs, on-device download storage, and chances of buffering! Something to think about!
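A size or bitrate gap like this can be expressed against either file, and the two baselines give different percentages. The sketch below works through both using the rounded CRF=38 figures quoted above (32 MB vs. 39 MB, 26 Mbps vs. 32 Mbps). Because those inputs are rounded, the results will not exactly match the percentages in the text, which presumably come from the unrounded measurements. The helper names are hypothetical.

```python
def pct_larger(smaller_value, larger_value):
    """How much bigger the larger value is, relative to the smaller one."""
    return (larger_value - smaller_value) / smaller_value * 100


def pct_smaller(smaller_value, larger_value):
    """How much smaller the smaller value is, relative to the larger one."""
    return (larger_value - smaller_value) / larger_value * 100


if __name__ == "__main__":
    # Rounded CRF=38 figures quoted in the text (preset=2 vs. preset=12).
    size_p2_mb, size_p12_mb = 32, 39
    rate_p2_mbps, rate_p12_mbps = 26, 32

    print(f"preset=12 file is {pct_larger(size_p2_mb, size_p12_mb):.1f}% bigger")
    print(f"preset=2 file is {pct_smaller(size_p2_mb, size_p12_mb):.1f}% smaller")
    print(f"preset=12 bitrate is {pct_larger(rate_p2_mbps, rate_p12_mbps):.1f}% higher")
    print(f"preset=2 bitrate is {pct_smaller(rate_p2_mbps, rate_p12_mbps):.1f}% lower")
```

With the rounded inputs, the preset=12 file comes out roughly 22% bigger than the preset=2 file, while the preset=2 file is roughly 18% smaller than the preset=12 file. It is worth stating the baseline whenever such a percentage feeds into a storage or CDN cost estimate.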
Visually (subjectively), there is a difference between the two sequences (preset=2 and preset=12): the higher preset’s output appears grainier than preset=2’s. Structurally, one would not find a lot of problems with the grass, but one will surely see a lot of grain and noise around the people running (PSNR seems to have picked this up).
Note: please open the images in separate tabs for a closer look.
To understand this better, here is another slice of the data with even-numbered presets, CRF fixed at 26, and the corresponding metrics and data points.
|Preset||CRF||Filesize (MB)||Bitrate (kbps)||PSNR (dB)||SSIM||VMAF||Time (sec)||FPS|
From the table above, we can see that
- preset=12 is 124x faster than preset=2, yet their VMAF scores differ by only 0.420. The same goes for SSIM (a 0.013 gap), but PSNR is ~2.5 dB higher for preset=2.
- However, if you look at the file size and bitrate produced, preset=2’s output is ~ 18.5% smaller, which can significantly impact the overall business metrics!
- On the other hand, if you are not worried about file size and need a quick turn-around time, perhaps presets 10 & 12 are worth looking at.
Now look at preset=6: it produces a file whose size, bitrate, and objective metrics are similar to preset=2. And it runs 20x faster than preset=2.
With this, one can start building an argument that if you want a preset that provides a good trade-off between quality and speed, you might want to consider SVT-AV1 presets in the range of [6, 8].
SVT-AV1 Preset Usage
Judging by the results, one can safely use a preset between 9 – 12 for fast/real-time/faster-than-realtime encoding. Again, this depends on the server capacity that you have. I observed that an AWS EC2 instance like the c5.9xlarge could provide high-speed encoding, nearing or exceeding real-time, depending on your source/destination frame rates.
But, if you have more time to spare, a preset value of 6 can give you a good trade-off between quality, file size, and speed. Playing with SVT-AV1 presets 6, 7, and 8 will help you find the right balance for your use case.
Finally, an SVT-AV1 preset of less than six will give you a high-quality video but will take up much computing time and resources.
This concludes our evaluation of how SVT-AV1’s performance varies with the preset and CRF values. The SVT-AV1 codec is a good candidate for teams willing to experiment with AV1 deployments.
With the right preset, encoder settings, and computing hardware, you can quickly reduce your video library’s size, storage, and CDN delivery costs with SVT-AV1.
In future articles, we will evaluate SVT-AV1 vs. other popular video codecs and look into the results for specific encoder settings and video genres. In addition, we will also evaluate its CBR encoding performance and compare it with other codecs and AV1 implementations.
Until next time, happy streaming!
Krishna Rao Vijayanagar
Krishna Rao Vijayanagar, Ph.D., is the Editor-in-Chief of OTTVerse, a news portal covering tech and business news in the OTT industry.
With extensive experience in video encoding, streaming, analytics, monetization, end-to-end streaming, and more, Krishna has held multiple leadership roles in R&D, Engineering, and Product at companies such as Harmonic Inc., MediaMelon, and Airtel Digital. Krishna has published numerous articles and research papers and speaks at industry events to share his insights and perspectives on the fundamentals and the future of OTT streaming.