High-speed transcoding refers to processing video content quickly and efficiently without sacrificing video quality.
The video processing workflow, which includes transcoding, packaging, and CDN transfer, is one of the most time-consuming parts of many applications. High-speed transcoding addresses this bottleneck and makes the whole workflow faster.
Some example applications where high-speed transcoding is desired include:
- The preparation of recorded VOD content for a live-streaming event where the recorded content is to be made available as soon as the live session ends
- News applications where edited video content needs to be distributed as quickly as possible.
- Short-video transcoding for applications such as e-commerce and advertising, where the user who uploads the content wants it ready for distribution as soon as possible.
The most time-consuming module in the entire video processing chain is video transcoding. This article discusses the challenges, limitations, and approaches to reducing video transcoding time. It also presents a chunk-based content-adaptive encoding technique to reduce video transcoding time.
Read my article on Content Adaptive Encoding to learn how it can reduce your CDN costs by 40 – 70%.
Why Speed Matters for Video Delivery
In many video streaming applications, it is essential to make recorded live video available to viewers watching on different devices and under different bandwidth conditions. News, e-commerce, interviews, and recorded live events should reach viewers as soon as possible.
For news broadcasters, fast transcoding is critical to get breaking stories out to audiences in near real-time. News events unfold quickly, and broadcasters need to edit and transcode their footage into various formats for distribution to different platforms. Fast transcoding lets them do this, ensuring their content reaches viewers as soon as possible.
Interviews are a popular form of video content in advertising because they allow brands to showcase their products, services, and expertise through the voice of an industry expert or a satisfied customer. However, to make these interviews effective, advertisers need to be able to transcode and edit the footage quickly so that they can distribute it across various advertising channels as soon as possible.
Similarly, for OTT video streaming platforms, fast transcoding is desirable for content companies to provide new content quickly to users. These platforms need to be able to transcode video content into different bitrates and resolutions to accommodate different internet connection speeds and playback devices. With fast transcoding technology, they can quickly transcode large volumes of video content and make it available to viewers in the best possible quality.
Restrictions on High-Speed Transcoding and Video Processing
Video encoding involves a complex process of compressing video data to reduce its size with minimum compromises on visual quality. Parallel processing is one way to speed up this process, but certain restrictions must be considered.
One of the main restrictions in parallel processing for video encoding is that certain parts of the encoding process cannot be parallelized.
For example, the motion estimation process, a crucial step in video encoding, requires access to previously encoded frames and cannot be parallelized across multiple frames.
Within a frame, intra prediction limits the parallel encoding of macroblocks because of data dependencies: each block references data from previously coded blocks of the same frame, so these parts of the compression process cannot run in parallel.
Entropy coding is another crucial step in video coding that involves encoding the quantized coefficients obtained from the transform stage. One limitation is that entropy coding relies on context-based coding, which means that the encoding of a particular symbol depends on the symbols that have been previously encoded.
This creates a data dependency that limits the extent to which entropy coding can be parallelized. In particular, if different processing units try to encode symbols in parallel without communicating, the coding efficiency may be reduced due to the lack of shared context information.
Approaches for High Speed Transcoding
Below are a few approaches that can help improve video encoding speed despite the limitations due to the nature of video encoder/transcoder algorithms.
Higher Clock Frequency
Systems with higher clock frequencies can speed up the encoding process, since the encoding instructions execute faster. However, the scalability of this approach is limited: there is a ceiling on how fast a core can be clocked, so only a limited speed-up of transcoding can be achieved this way.
Multicore Processing Approach
Some processors offer multiple advanced cores with higher combined processing power. With an N-core processor, compute capability theoretically scales up by a factor of N, but achieving this scaling requires N independent processes/threads/tasks. Since video transcoding is still largely sequential in nature, this degree of parallelism is hard to reach, and a transcoding process can exploit the multicore advantage only to a limited extent.
The table below shows the impact of an increasing number of cores on processing time.

1080P Video Transcode to 4 Resolutions

| Number of Cores | Time to Process a 60 min Long Video |
| --- | --- |
| 4 | 270 min |
| 12 | 120 min |

Theoretically, going from 4 cores to 12 should have reduced 270 min to 90 min, but the observed processing time was 120 min.
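A rough way to reason about these diminishing returns is Amdahl's law. The sketch below is plain illustrative Python, not encoder code; the parallel fraction is backed out of the 270 min / 120 min figures above and is an assumption, not a measured value:

```python
def amdahl_time(t1: float, p: float, cores: int) -> float:
    """Amdahl's law: run time on `cores` cores when a fraction p of a
    t1-minute job can be parallelized."""
    return t1 * ((1 - p) + p / cores)

# Solving (1-p) + p/4 = 2.25 * ((1-p) + p/12) for the 270/120 ratio above
# gives p = 20/21, i.e. roughly 95% of the work parallelizes.
p = 20 / 21
t1 = 270 / ((1 - p) + p / 4)           # implied single-core time, ~945 min

print(round(amdahl_time(t1, p, 4)))    # -> 270
print(round(amdahl_time(t1, p, 12)))   # -> 120
print(round(amdahl_time(t1, p, 1e9)))  # even with unlimited cores: ~45 min floor
```

The serial ~5% of the pipeline caps the achievable speed-up no matter how many cores are added, which matches the gap between the theoretical 90 min and the observed 120 min.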
Slice-Based Parallel Processing
A video consists of pixels, and encoders process them in blocks; groups of these blocks form macroblocks (MBs). A group of macroblocks is called a slice, usually a few rows of MBs or a rectangular area of the frame. Slices can be encoded independently, which creates multiple parallel processing entities that can take advantage of parallel hardware.
One limitation of this approach is that MBs at slice boundaries cannot use MBs from neighboring slices for prediction, which reduces compression efficiency. This approach also requires extensive code changes inside the encoder, and offering it as a generic solution for processors with different core counts is quite challenging.
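As a minimal illustration (plain Python, not encoder code; the function name and the even-split policy are assumptions), slicing amounts to partitioning the frame's macroblock rows among workers:

```python
# Hypothetical sketch: partition a frame's macroblock rows into N slices
# so each slice can be handed to its own worker thread or process.
def slice_rows(mb_rows: int, num_slices: int) -> list[range]:
    base, extra = divmod(mb_rows, num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        count = base + (1 if i < extra else 0)   # spread the remainder
        slices.append(range(start, start + count))
        start += count
    return slices

# A 1080p frame has 68 rows of 16x16 macroblocks (height padded to 1088).
print(slice_rows(68, 4))   # four slices of 17 MB rows each
```

In a real encoder each worker would also carry its own entropy-coding context, since slices reset prediction and context state at their boundaries.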
Multi-Resolution Parallel Processing
In the case of adaptive streaming, where the output consists of multiple resolutions, multicore processors can be well utilized by allocating resolutions to different cores. As different resolutions have different complexities, it is important to estimate the load and balance it across the available cores. For example, our Content Adaptive H.264/AVC encoder can generate six resolutions for a 1080P input in 15 minutes on a 32-core machine, almost 4x real-time processing speed.
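One simple balancing policy, sketched below in plain Python (the ladder, the pixel-count proxy for complexity, and the function name are all illustrative assumptions, not our production scheduler), is to split the cores in proportion to each resolution's pixel count:

```python
# Hypothetical ABR ladder: rung name -> pixel count (a rough complexity proxy).
LADDER = {"1080p": 1920*1080, "720p": 1280*720, "576p": 1024*576,
          "480p": 854*480, "360p": 640*360, "234p": 416*234}

def allocate_cores(total_cores: int) -> dict[str, int]:
    """Give each rung cores proportional to its pixel count, at least one."""
    total_px = sum(LADDER.values())
    return {name: max(1, round(total_cores * px / total_px))
            for name, px in LADDER.items()}

print(allocate_cores(32))   # the 1080p rung gets roughly half the machine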
Chunk-Based Encoding
Another approach to faster video processing divides a complete video into smaller-duration segments that can be processed in parallel. In the single-file case, encoders/transcoders can carry bitrate savings from one scene over to another; when the video is split into chunks that are transcoded independently, this can affect overall quality. However, if the segment size is large enough, the observed impact on quality can be minimized.
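A hypothetical sketch of the chunking step (plain Python; a real pipeline would cut on keyframes via the encoder or a tool such as ffmpeg): compute chunk boundaries snapped to the GOP length so each chunk starts on a keyframe and can be encoded independently:

```python
# Hypothetical sketch: compute chunk boundaries for parallel transcoding,
# snapping each cut to the GOP length so every chunk starts on a keyframe.
def chunk_boundaries(duration_s, chunk_s, gop_s):
    """Return (start, end) pairs covering the whole video."""
    cuts, t = [0.0], 0.0
    while t + chunk_s < duration_s:
        t += chunk_s
        t = round(t / gop_s) * gop_s   # snap the cut to a keyframe boundary
        cuts.append(t)
    cuts.append(duration_s)
    return list(zip(cuts, cuts[1:]))

# A 60-minute video with 5-minute chunks and 2-second GOPs -> 12 chunks
print(len(chunk_boundaries(3600, 300, 2)))   # -> 12
```

Keyframe-aligned cuts matter because they let the chunks be stitched back into one output without re-encoding across the boundaries.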
Quality Impact Study for the H.264/AVC Content Adaptive Encoder with Chunk-Based Processing
In our experiment with Content Adaptive Encoding (CAE) chunk-based processing, there was around 10% variation in bitrate compared with the same configuration applied to the full-length video. Chunk-based processing is more constrained because savings from past segments cannot be carried forward. Based on this observation, when chunk-based H.264/AVC CAE is operated with a target bitrate around 10% higher than before, it achieves bitrate and quality similar to full-length processing.
We have observed across multiple contents that chunk-based processing can be used with minimal impact on quality, given a small adjustment to the target bitrate.
Below is one example of data for a video clip run in CAE economical mode with 3000 Kbps as the target bitrate.
CAE provides almost a 29% bitrate reduction for this video. When the input is processed in parallel with smaller chunks, the output bitrate drops by almost 5% with a 5-minute segment size and by around 12% with a 1-minute segment size, with a corresponding drop in VMAF.
With the target bitrate raised to 3400 Kbps, the 1-minute segment mode produces comparable output: 2169 Kbps (about 3% more bitrate than the unchunked encode) with VMAF improved by around 0.7.
| Configuration | Actual Bitrate (Kbps) | VMAF |
| --- | --- | --- |
| Single segment (no chunking) | 2107 | 92.59 |
| 5 min segments | 2001 | 92.22 |
| 1 min segments | 1837 | 91.27 |
| 1 min segments, ~10% higher target bitrate | 2169 | 93.29 |
Hence, with appropriate settings, chunk-based processing can be used with minimal impact on quality.
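The adjustment described above can be captured in a few lines (an illustrative sketch; the 10% factor comes from the variation observed in our experiment, and the helper name is hypothetical):

```python
# Assumed compensation factor, from the ~10% bitrate variation observed
# between chunked and full-length CAE encodes.
CHUNK_COMPENSATION = 1.10

def chunked_target_bitrate(full_video_target_kbps: int) -> int:
    """Raise the CAE target for independent-chunk encoding to recover
    the rate-decision context lost at chunk boundaries."""
    return round(full_video_target_kbps * CHUNK_COMPENSATION)

print(chunked_target_bitrate(3000))   # -> 3300 Kbps for a 3000 Kbps target
```

The right factor is content-dependent; the 10% figure is what we measured for this encoder and configuration, not a universal constant.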
Speed Improvement with Chunk-Based Processing
The advantages of multicore and multi-resolution parallel processing can be further enhanced with chunk-based processing.
One of our use cases involves multi-resolution parallel processing using 32 cores; transcoding a one-hour content takes around 15 minutes.
In theory, if the segment size is set as 15 minutes and segments are distributed across multiple machines, the processing of each segment is expected to take 3.75 minutes.
In practical cases, this process involves chunking input files and transcoding, and all the segments need to be stitched into a single output file.
With these overheads, we can expect around 5 minutes for processing 60 minutes of 1080p30 input files to 6 output resolutions of 1080P, 720P, 576P, 480P, 360P, and 234P.
This timing can be further reduced with more powerful multicore processors and smaller chunk sizes.
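The timing estimate above can be reproduced with simple arithmetic (an illustrative sketch; the ~1.25 min overhead figure is an assumption backed out of the ~5 min total, and the function name is hypothetical):

```python
import math

def chunked_schedule(content_min: float, speed_vs_realtime: float,
                     chunk_min: float, overhead_min: float) -> tuple[int, float]:
    """Return (machines needed, expected wall-clock minutes) when every
    chunk runs on its own machine and results are stitched at the end."""
    machines = math.ceil(content_min / chunk_min)
    wall_clock = chunk_min / speed_vs_realtime + overhead_min
    return machines, wall_clock

# 60 min of content, 4x real-time encoders, 15-min chunks, ~1.25 min overhead
print(chunked_schedule(60, 4, 15, 1.25))   # -> (4, 5.0)
```

Note the trade-off: smaller chunks cut the wall-clock time but raise the machine count and the fixed chunking/stitching overhead per job.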
Summary and Conclusion
A high-speed transcoding setup can significantly reduce the time it takes to process and output a finished video, helping improve workflow efficiency and ensuring faster turnaround times for clients or customers.
Various methods and approaches are available to speed up the transcoding process, each with its own advantages and limitations.
Chunk-based CAE transcoding not only speeds up the transcoding process but also retains the benefits of Content Adaptive transcoding, with video bandwidth savings of up to 70%.
With Chunk-based CAE, the processing speed for H.264/AVC coding for 1080P video is 10x real-time, which can be achieved without compromising quality. With smaller segments, the processing time can be further reduced.
Ashok Magadum is the Founder and CEO of Vidarka Technologies. He holds a Master's degree from Manipal University and has 21+ years of experience in the design and development of video codecs (MPEG-2/AVC/SVC/HEVC/Dolby Vision/Content Adaptive), video transcoding, cloud media streaming, computer vision, ADAS, DSP, embedded systems, and various multimedia technologies.
He has worked at startups such as Ittiam Systems, Zenverge Inc, and Saranyu Technologies, and at MNCs like NXP, Freescale, and Samsung on the design and development of many solutions.
For any queries or more details, please reach us at [email protected].