OTTVerse interviews Zoe Liu, Chief Technology Officer & Co-Founder, Visionular – NAB 2024

In this exclusive interview with OTTVerse, Zoe Liu, the CTO of Visionular, talks about the AV1 codec, how adoption of and support for AV1 have grown over the last few years, the use of AV1 in VOD and live transcoding, and the future of codecs.

Zoe Liu is the Co-Founder & CTO of Visionular, a leading on-prem and cloud API solution provider in video transcoding, intelligent processing, and video streaming. Before co-founding Visionular, Zoe Liu was a Staff Software Engineer with the Google Chrome Media team for five years.

Visionular provides AI-driven video compression (H.264/AVC, HEVC, AV1) for Live, VOD, and RTC use cases. Visionular’s AI-driven video and image compression helps video companies reduce storage, delivery, and transcoding costs.

You can meet Zoe and the Visionular team at NAB 2024 in Booth W1666.

Watch this exclusive interview here –

[00:00:00.000] – Jan Ozer
Hi, I’m Jan Ozer. I’m sitting here with Zoe Liu from Visionular, and we’re here to talk about codecs. Hi, Zoe.

[00:00:10.960] – Zoe Liu
Hi, Jan. Thanks for having me down here. It’s always a great pleasure to discuss codecs, especially with you.

[00:00:18.710] – Jan Ozer
It’s always a lot of fun. So you at Visionular support three codecs: H.264/AVC, H.265/HEVC, and AV1. What’s new with AV1?

[00:00:29.340] – Zoe Liu
Well, AV1 has actually been here for five, almost six years, because it was finalized in June 2018, so we’re now in its sixth year. I think what’s new is that the ecosystem is picking up. Everybody knows that last year Apple rolled out the iPhone 15 Pro, whose A17 Pro chip has an AV1 hardware decoder on board. At the same time, at least from our side, we see AV1 being talked about quite a bit. And for us, of course, we have more use cases, deploying our AV1 encoder product, Aurora1, in different scenarios. So we can see AV1 picking up, and recently the adoption has actually been accelerating.

[00:01:20.050] – Jan Ozer
Is that live, VOD?

[00:01:22.520] – Zoe Liu
Well, that’s actually a good question. It’s both. The VOD case always benefits from a new generation of codecs, because a new codec has more coding tools in it. Usually it’s more complex, so you can afford more CPU resources and more turnaround time, and in exchange you get higher quality at a lower bitrate. So AV1 is definitely picking up on the VOD side; we have received quite a few more requests there. And for the very first time, AV1 has film grain coding in its main profile. In the living room, we see more and more TV sets that not only support AV1 but support AV1 film grain synthesis, which adds another layer of support for AV1. At the same time, you asked about live, and there we have observed the opposite trend: extremely low-latency scenarios for AV1. Two things are picking up: one is AV1 for low-latency live, the other is AV1 for real-time communications, which is RTC. For example, we recently had a live use case.
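
As a concrete illustration of the film grain synthesis Zoe mentions, here is a minimal sketch of how it can be requested from the open-source SVT-AV1 encoder via ffmpeg. The file names are hypothetical and the flags assume a recent ffmpeg build with libsvtav1; Visionular’s Aurora1 exposes its own interface, which is not shown here.

```python
import subprocess

# Hypothetical file names; requires a recent ffmpeg build with libsvtav1.
# film-grain=8 asks the encoder to estimate the source grain and signal it
# as AV1 film-grain metadata instead of spending bits coding the noise;
# the decoder re-synthesizes the grain at playback time.
# film-grain-denoise=1 encodes the denoised frames, relying on synthesis
# to restore the grainy look.
subprocess.run([
    "ffmpeg", "-i", "grainy_source.mp4",
    "-c:v", "libsvtav1", "-crf", "30", "-preset", "6",
    "-svtav1-params", "film-grain=8:film-grain-denoise=1",
    "grainy_av1.mp4",
], check=True)
```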

[00:02:55.380] – Zoe Liu
The use case I mentioned is aerial capture, Jan. They need to capture video in the air and send it back to the ground with really low delay, because they want to know on the ground what is going on in the air. So the delay has to be low, and on top of that, the bandwidth from the air down to the ground is very limited. Originally they installed an AVC/H.264 encoder, but with the limited bandwidth the quality that came out was not ideal. When they switched from AVC to AV1, even with very limited bandwidth, I would say less than 100 or even 50 kilobits per second, they observed very good quality delivered by AV1. So that’s one use case. The other use case is the very low-delay, interactive one we just mentioned. For example, we have meetings, and AV1 has a unique set of features, the screen content coding (SCC) tools. This is the first time a standard has included this set of features in its main profile, so every AV1 decoder has to support them. And screen content is very different from natural content. We all know that in a natural video, even if the background is orange, because of lighting noise the orange is not the same from one dot to the next.

[00:04:22.580] – Zoe Liu
If you have a screen, a slide, and the background is orange, it will be orange everywhere, exactly the same pixel value. So these two kinds of content are definitely different, and if you have dedicated screen content coding, you can save a lot of bits on that content. We recently had another use case, this one for security. Someone at a bank may have a key task, a login from either a browser or a dedicated app, and that is the moment it could be hacked. So what our customer did is move all the sensitive information to the back end: the back end generates the user-side interface as a video and sends that video back to the user, so when the user types in the password, what they see on the screen is really a decoded version that has been sent from the server side. For this you need very low-delay video processing on the back end, and also very fast, low-delay decoding. All of this can leverage AV1’s SCC, the screen content coding tools. First, it’s high quality, because you want the user to feel that everything is just happening locally.

[00:05:44.410] – Zoe Liu
When actually there is communication, video transmission, and encoding all happening behind the scenes. On the other side, you get the low delay, so it feels like everything is local, but you gain a very high level of security. Those are use cases where you can leverage AV1’s unique features. And of course, you need an encoder optimized to be fast enough while still maintaining the advantages of the AV1 standard.
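
As an illustration of the screen content coding path Zoe describes, here is a minimal sketch using the open-source libaom encoder through ffmpeg, tuned for low-delay screen content. The file names and bitrate are hypothetical, the flags assume a recent ffmpeg built with libaom, and a real remote-UI deployment like the banking example would use an RTC transport rather than a file.

```python
import subprocess

# Hypothetical names and numbers; requires a recent ffmpeg with libaom-av1.
# tune-content=screen enables AV1's screen-content tools (palette mode,
# intra block copy), which exploit the exactly repeated pixel values of
# rendered UI. usage=realtime with lag-in-frames=0 removes encoder-side
# lookahead delay, trading some compression for low latency.
subprocess.run([
    "ffmpeg", "-i", "ui_capture.y4m",
    "-c:v", "libaom-av1",
    "-usage", "realtime", "-cpu-used", "8", "-lag-in-frames", "0",
    "-aom-params", "tune-content=screen",
    "-b:v", "300k",
    "remote_ui_av1.webm",
], check=True)
```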

[00:06:10.590] – Jan Ozer
What about AV1 in broadcast TV, including HDR? Out to television sets and out to mobile?

[00:06:17.800] – Zoe Liu
Yes. For broadcasting, as we mentioned, AV1 has more sophisticated tools, and you can optimize further because in broadcast-style live you can usually tolerate a little longer delay. For example, even for live sports streaming, you can usually tolerate, say, 5 or 10 seconds, and sometimes people even talk about 30 seconds. At that latency level, not ultra-low delay but a few seconds, you can use encoding optimizations that leverage AV1’s new coding tools to further save bitrate and get higher quality. In this scenario we also observe AV1 picking up: the new set of coding tools boosts quality while still cutting delivery cost by pushing the bitrate a little lower.
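
To make that latency-budget trade-off concrete, here is a minimal sketch of a broadcast-style live encode where a few seconds of delay are acceptable, so a slower preset can buy extra compression from AV1’s tools. The input and output names are hypothetical, the flags assume an ffmpeg build with libsvtav1, and a real deployment would package the output into HLS/DASH segments rather than a single file.

```python
import subprocess

# Hypothetical source and output; requires ffmpeg with libsvtav1.
# -re reads the input at its native frame rate to simulate a live feed.
# With several seconds of latency budget, a slower preset (lower number)
# lets the encoder use more of AV1's coding tools for better quality per
# bit than an ultra-low-delay configuration would allow.
subprocess.run([
    "ffmpeg", "-re", "-i", "stadium_feed.mp4",
    "-c:v", "libsvtav1", "-preset", "8", "-b:v", "3000k",
    "-g", "120",  # ~2 s keyframe interval at 60 fps for segmented delivery
    "live_av1.mp4",
], check=True)
```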

[00:07:21.700] – Jan Ozer
When AV1 first shipped, I guess it was launched five or six years ago now.

[00:07:27.980] – Zoe Liu
Yes, that’s right. It’s June 2018 when it was finalized.

[00:07:32.520] – Jan Ozer
When I tested it for Streaming Media magazine the first time, I think it ran at about 2,000 times slower than real-time. How did you get it down to not only real-time, but multiple outputs on a single computer in real-time? How did you get it so efficient on the encode side?

[00:07:48.420] – Zoe Liu
Well, every codec standard experiences that when it is first deployed. When AV1 had just rolled out, we still remember that MSU, Moscow State University, published results in the early part of 2018, even before the standard was finalized, saying that AV1 was about 2,000x slower. That first impression gave everybody the opinion that AV1 is good, but slow. Along the way, as with many other standards, there are always coding tools that can be further optimized, and underlying assembly code that can make things faster. So there is always a way to get faster, and you can tailor the optimization to all kinds of scenarios and use cases. By today, and not only us, we always mention that we deliver commercial encoders, but we also leverage a lot of open source and knowledge from the community. We like to say that we climb further up by standing on the shoulders of giants from the open-source community. There are two main open-source AV1 encoders, libaom and SVT-AV1, and there is a great open-source AV1 software decoder, dav1d. All of this community work has contributed a lot to making everyone faster and faster.
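
To see that speed/quality dial in practice, here is a minimal sketch that sweeps open-source SVT-AV1 presets on a hypothetical clip and times each encode. Absolute timings depend entirely on the machine and build; the point is only that one standard spans orders of magnitude of encode speed depending on tuning.

```python
import subprocess
import time

# Hypothetical clip; requires ffmpeg with libsvtav1. Higher presets are
# faster but less efficient; lower presets are slower but compress better.
for preset in (12, 8, 4):
    start = time.time()
    subprocess.run([
        "ffmpeg", "-y", "-i", "clip.mp4",
        "-c:v", "libsvtav1", "-preset", str(preset), "-crf", "35",
        f"clip_preset{preset}.mp4",
    ], check=True)
    print(f"preset {preset}: {time.time() - start:.1f} s")
```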

[00:09:27.110] – Zoe Liu
I just want to give one example, though it is not AV1; it’s because we are demoing MV-HEVC here. MV-HEVC has been especially triggered by the Vision Pro, because the Vision Pro uses MV-HEVC to encode spatial videos. When we built our own MV-HEVC encoder, we tried to benchmark it. Right now the benchmark is HTM, the reference software for the standard, and it is really slow: to encode one minute of Big Buck Bunny spatial video, HTM needs 36 hours. If I stopped there, I would say, oh, this standard is so slow. But using our Aurora5 MV-HEVC encoder, and I think others could potentially do this too, though their encoders are not available to us yet, we encode an even smaller file at better quality, and we need only 42 seconds. So is MV-HEVC slow as a standard, or is it just that it has not been optimized? It is the same story with AV1: slow in 2018, and nowadays everybody enjoys not only VOD and live but also low latency, where the requirement is sometimes only tens of milliseconds, even lower than 100 milliseconds.

[00:11:00.020] – Zoe Liu
With all of this, our encoder engineers dedicated themselves, and together we made the encoder better and better, so it really becomes more flexible and more feasible to deploy.

[00:11:11.030] – Jan Ozer
What are you hearing about VVC at this point?

[00:11:14.900] – Zoe Liu
Yeah, I think VVC is also a great standard. VVC was finalized in 2020, so it is actually two years younger, and we already see a lot of people developing and deploying VVC. It is a little challenging right now; for example, people are waiting for browsers to start supporting VVC. And people always ask about even newer standards beyond VVC, and there are newer efforts on both sides: on the MPEG side they talk about ECM, and on the AOM side they talk about AV2. New tools are getting into the newer standards because you still want better quality at even lower bitrate. All of this is ongoing. I have to say that AI is also being considered. There are new AI-based tools, but they are very complicated. In this way, encoders also benefit from the whole set of technologies, from faster hardware like GPUs, not just on the server side but on the mobile side; on Apple devices, we all know NPUs have already really taken off. We can tell that AI-based, even large-model-based, coding tools will potentially be included in the newer standards, with further optimization supported by the newer generation of hardware: CPUs, GPUs, and dedicated chipsets.

[00:12:58.120] – Zoe Liu
I think we can all enjoy new technologies with even better quality, lower bitrate, and even lower latency. Everybody should be able to enjoy not only higher quality and higher resolution, but truly immersive video experiences.

[00:13:22.560] – Jan Ozer
So what about the AR/VR revolution that, I guess, a lot of people hoped was going to drive the next generation of codec adoption? Apple came out with MV-HEVC in their Vision Pro headset. What’s the next big application that’s going to drive demand for a new codec? What are you seeing?

[00:13:43.740] – Zoe Liu
Well, this is a little bit of a challenging question. Take the Vision Pro: we definitely hear from the market that people see it as a long-term play, even though it has its challenges; people complain about the battery, and it is a little heavy to wear for more than an hour or two. However, the immersive experience you get from the Vision Pro is AR, because you see the real world and the virtual world together, and their technology detects finger gestures and eye movement, which is very intriguing and innovative. So immersive AR is one of the experiences. On the other side, everybody now talks about autonomous cars. There, people are moving around, and you can see that a lot of video is being collected, and those videos can be a large volume. So now we are talking about encoders not just for videos serving human eyes; some of the videos are not serving human eyes at all. They are serving machines that learn new things.

[00:15:04.350] – Zoe Liu
There is also, as everybody knows, OpenAI with generative AI, creating very realistic generated videos. We actually analyzed GenAI video, and we found it very interesting. First, those videos are still very different from natural videos: they do not have as much noise, and they have much higher contrast and sharpening. The edges in those videos are usually much sharper than in natural videos, closer to computer-generated content. Just from the codec point of view, you can tell these are not natural videos; they must have been generated by machines. Also, the camera motion is a little more regular, slightly different from the real world. So now you have a new generation of videos. And down the road, we know GenAI could create more and more natural-looking video, but it will also create a lot more video. Right now those videos are generated with a big model on the server side, and then you need compression, because in the end you need to share them and distribute them to end devices.

[00:16:19.400] – Zoe Liu
So with all of this, we have GenAI videos, we have video in all these different scenarios, and we also know that Starlink serves video from outer space back to Earth. We actually have a team member here who once joined the Artemis project back at Webex; Webex got at least the first AV1 video transmission from the Moon down to the Earth, though it was not really a video call, it was mainly a downlink with about a 20-second delay. With all of this, technology is driving these experiences, with AI, with human beings exploring the world not only on the Earth but also in outer space. And at the same time, technology drives new use cases, and the new use cases in return pose new challenges for these technologies. So we are all happy to be part of this very innovative and fast-paced world. I would say new things will come a lot faster than before.

[00:17:25.020] – Jan Ozer
Can you talk real briefly about H.267 and AV2?

[00:17:30.870] – Zoe Liu
Okay, as I just mentioned, H.267 corresponds to ECM, the work ongoing on the MPEG side, while AV2, which now goes by a new name, AVM, is ongoing on the AOM side. Right now these two organizations are the big drivers behind the new codec standards. I would say the work is still mainly about boosting quality further at lower bitrate, but at the same time, as you mentioned, they are considering how to leverage AI. For example, using an AI-based deblocking filter, it is amazing that sometimes they get even 7% to 10% bitrate savings from that one coding tool alone. That is a lot: from one generation of a standard to the next, 25% to 30% savings under standard test conditions already counts as a generational difference, and now this one AI tool alone obtains close to 10%. However, the complexity is huge. So as I mentioned, AI adoption goes with new hardware, and not only on the server side, because the standard needs to think about the decoder.

[00:18:52.810] – Zoe Liu
A codec standard mainly standardizes the bitstream and the decoder behavior; you do not want to increase the decoder’s complexity that much. But as end devices get more and more computational power, not only CPU but AI hardware like the NPUs we just mentioned, it becomes possible to use more AI-based tools to bring the file size further down at the same quality. We believe both standards bodies are considering what they can do, because a standard serves the future, the next five years or even longer, so they need to consider what the new technologies are. Everybody has to think about AI, and as we mentioned, they all do. They also look into the hardware, the chipsets, the other technologies that will drive the deployment of those tools in the new standard. It is everything together. And when we talk about computation, even quantum computing, when computational power develops exponentially, I would say new coding tools will be included in the new standards beyond our imagination.

[00:20:12.230] – Jan Ozer
Well, that’s a great place to stop. Thanks for taking the time today.

[00:20:15.390] – Zoe Liu
Okay. Thank you for having me. I hope everybody pays attention to video compression, because everybody deserves to enjoy the best video experiences.

Disclaimer: The above transcript has been lightly edited for clarity and conciseness.
