MPEG Issues Call for Evidence for Video Coding for Machines

The 139th MPEG meeting was held online, 18–22 July 2022

MPEG Issues Call for Evidence for Video Coding for Machines

At the 139th MPEG meeting, MPEG Technical Requirements (WG 2) issued a Call for Evidence (CfE) for technologies and solutions enabling efficient feature coding for machine vision tasks.

MPEG’s exploration work on Video Coding for Machines aims at compressing features for machine-performed tasks such as video object detection and event analysis. As neural networks increase in complexity, architectures such as collaborative intelligence, whereby a network is distributed across an edge device and the cloud, become advantageous. As newer network architectures are deployed across a heterogeneous population of edge devices, such architectures bring flexibility to system implementers. They also create a need to efficiently compress intermediate feature information for transport over wide area networks (WANs). Because feature information differs substantially from conventional image or video data, coding technologies and solutions for machine usage could differ from those for conventional human-viewing-oriented applications to achieve optimized performance.

With the rise of machine learning technologies and machine vision applications, the amount of video and images consumed by machines has grown rapidly. Typical use cases include intelligent transportation, smart city technology, and intelligent content management, which incorporate machine vision tasks such as object detection, instance segmentation, and object tracking. Due to the large volume of video data, extracting and compressing features from video is essential for efficient transmission and storage. Feature compression technology solicited in this CfE can also be helpful in other regards, such as computational offloading and privacy protection.
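As a rough illustration of what compressing intermediate feature information can mean in a split edge/cloud deployment, the following sketch applies plain min-max 8-bit quantization to a small feature vector before it would be sent over the network. This is only an assumption-laden toy: the function names and sample values are hypothetical, and it is not the VCM anchor or any technology solicited by the CfE.

```python
# Illustrative only: min-max 8-bit quantization of an intermediate feature vector.
# All names and values are hypothetical; this is not an MPEG-defined method.

def quantize_features(features, levels=256):
    """Map floats to integer codes in [0, levels - 1]; return (codes, lo, step)."""
    lo, hi = min(features), max(features)
    # Fall back to a step of 1.0 when every value is identical (zero range).
    step = (hi - lo) / (levels - 1) or 1.0
    codes = [round((x - lo) / step) for x in features]
    return codes, lo, step


def dequantize_features(codes, lo, step):
    """Approximate the original floats from the integer codes."""
    return [lo + c * step for c in codes]


# Edge side: quantize the features produced by the first part of the network.
features = [0.0, 0.13, -1.7, 2.4, 0.9]
codes, lo, step = quantize_features(features)

# Cloud side: reconstruct approximate features for the rest of the network.
restored = dequantize_features(codes, lo, step)
max_error = max(abs(a - b) for a, b in zip(features, restored))
```

The reconstruction error per value is bounded by half a quantization step, which is the basic rate-versus-fidelity trade-off that more elaborate feature codecs try to improve on.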

Over the last three years, MPEG has investigated potential technologies for efficiently compressing feature data for machine vision tasks and established an evaluation mechanism that includes feature anchors, rate-distortion-based metrics, and evaluation pipelines.

This CfE welcomes responses from companies and other organizations. Registration is required by 26 August 2022, and proponent documentation is due by 14 October 2022. The submissions received in response to the CfE will be discussed at the 140th MPEG meeting in October 2022.

Companies and organizations that have developed VCM technologies are invited to bring such information in response to this CfE by contacting Dr. Igor Curcio, MPEG Technical Requirements Convenor, at [email protected].

At the 139th MPEG meeting, MPEG Technical Requirements (WG 2) also issued an updated version of the Call for Proposals (CfP) on Video Coding for Machines that it had first issued at the 138th MPEG meeting. The changes include a clarification that responses to this call need to support the coding of video data for machine tasks, and the addition of a questionnaire that summarizes the requirements each proposal fulfills. Furthermore, the registration deadline has been extended: proponents are now welcome to register until 22 August 2022. No changes have been made to the test data or evaluation methods.

MPEG Ratifies the Third Edition of Green Metadata, a Standard for Energy-Efficient Media Consumption

At the 139th MPEG meeting, MPEG Systems (WG 3) issued the Final Draft International Standard (FDIS) of the third edition of ISO/IEC 23001-11 Energy-Efficient Media Consumption (Green Metadata). FDIS is the final milestone of standard development.

MPEG Systems has been working on Green Metadata for the last ten years to enable the adaptation of the client’s power consumption according to the complexity of the bitstream. Many modern video decoder implementations can change their operating voltage or clock speed to match their power consumption to the required computational power. Thus, if the decoder implementation knows how the complexity of the incoming bitstream varies, it can adjust its power consumption level to that complexity. This allows lower energy use in general and longer video playback on battery-powered devices.
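The idea of matching clock speed to bitstream complexity can be sketched as follows. This is a minimal illustration under stated assumptions: the complexity hint is modeled as a number of processor cycles per period, and the list of operating points, the function name, and all numbers are hypothetical. It does not model the actual Green Metadata syntax.

```python
# Hypothetical sketch: pick the lowest clock frequency that still decodes in time.
# The operating points and complexity hints below are made-up values.

AVAILABLE_MHZ = [600, 1000, 1400, 1800]  # assumed decoder operating points


def pick_frequency_mhz(cycles_needed, period_s, available_mhz=AVAILABLE_MHZ):
    """Return the lowest frequency that can run cycles_needed within period_s."""
    required_mhz = cycles_needed / period_s / 1e6
    for mhz in sorted(available_mhz):
        if mhz >= required_mhz:
            return mhz
    # The signaled complexity exceeds capacity, so run at the highest frequency.
    return max(available_mhz)


# Complexity hints (in cycles) for three upcoming one-second periods.
hints = [4.0e8, 1.2e9, 9.0e8]
plan = [pick_frequency_mhz(cycles, 1.0) for cycles in hints]
print(plan)  # [600, 1400, 1000]
```

Without advance knowledge of the complexity, a decoder in this toy model would have to stay at a high frequency to be safe, which is the energy cost the metadata is meant to avoid.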

The third edition enables support for Versatile Video Coding (VVC, ISO/IEC 23090-3, a.k.a. ITU-T H.266) encoded bitstreams and enhances the capability of this standard for real-time communication applications and services. While finalizing the support of VVC, MPEG Systems has also started the development of a new amendment to the Green Metadata standard, adding the support of Essential Video Coding (EVC, ISO/IEC 23094-1) encoded bitstreams.

MPEG Completes the Third Edition of the Common Media Application Format by adding Support for 8K and High Frame Rate for High Efficiency Video Coding

At the 139th MPEG meeting, MPEG Systems (WG 3) issued the Final Draft International Standard (FDIS) of the third edition of the ISO/IEC 23000-19 Common Media Application Format (CMAF) for segmented media. FDIS is the final milestone of standard development.

The third edition of CMAF adds two new media profiles for High Efficiency Video Coding (HEVC, ISO/IEC 23008-2, a.k.a. ITU-T H.265), namely for (i) 8K and (ii) High Frame Rate (HFR). Regarding the former, a media profile supporting 8K resolution video encoded with HEVC (Main 10 profile, Main Tier with 10 bits per colour component) has been added to the list of CMAF media profiles for HEVC. The profile will be branded as ‘c8k0’ and will support videos with up to 7680×4320 pixels (8K) and up to 60 frames per second. Regarding the latter, another media profile, branded as ‘c8k1’, has been added to the list of CMAF media profiles; it supports HEVC-encoded video with up to 8K resolution and up to 120 frames per second. Finally, chroma location indication support has been added to the third edition of CMAF.
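The resolution and frame-rate limits of the two brands can be written down as a small lookup. This is only a sketch of the limits stated above: the function name is hypothetical, "up to 8K" for ‘c8k1’ is assumed to mean 7680×4320, and real conformance to a CMAF media profile involves further constraints (HEVC profile, tier, level, and so on) that are not checked here.

```python
# Limits as stated in the text; other conformance constraints are not modeled.
# Each entry: (brand, max width, max height, max frames per second).
PROFILES = [
    ("c8k0", 7680, 4320, 60),
    ("c8k1", 7680, 4320, 120),  # 7680x4320 assumed for "up to 8K"
]


def candidate_brand(width, height, fps):
    """Return the first 8K brand whose stated limits cover the video, else None."""
    for brand, max_w, max_h, max_fps in PROFILES:
        if width <= max_w and height <= max_h and fps <= max_fps:
            return brand
    return None


print(candidate_brand(7680, 4320, 60))   # c8k0
print(candidate_brand(7680, 4320, 120))  # c8k1
print(candidate_brand(7680, 4320, 240))  # None
```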

MPEG Scene Descriptions adds Support for Immersive Media Codecs

At the 139th MPEG meeting, MPEG Systems (WG 3) issued the Committee Draft Amendment (CDAM) for ISO/IEC 23090-14 Scene Description support for immersive media codecs. CDAM is the first formal milestone of the amendment development approval process.

ISO/IEC 23090-14 specifies the coded representation of scene descriptions, and this new amendment facilitates the integration of Video-based Point Cloud Compression (V-PCC, specified in ISO/IEC 23090-5) and MPEG Immersive Video (MIV, ISO/IEC 23090-12) into a scene. Immersive media codecs such as V-PCC and MIV encode immersive media data with multiple conventional 2D video codecs. The outputs of the individual video decoders are combined to reconstruct the immersive media presentation. The scene description of ISO/IEC 23090-14 is based on the GL Transmission Format (glTF), which lacks support for integrating multiple video bitstreams into a single 3D object. This amendment enables the association of multiple buffers with a single mesh object. Additionally, it enables these buffers to deliver decoded components of V-PCC or MIV objects to the rendering engine so that the rendering engine can reconstruct the 3D point cloud or immersive video content. As the components of Visual Volumetric Video-based Coding (V3C) objects are encoded as 2D videos, support for YCbCr formats is also added. The amendment is planned to be completed, i.e., to reach the status of Final Draft Amendment (FDAM), by the end of 2023.

MPEG Starts New Amendment of VSEI containing Technology for Neural Network-based Post Filtering

At the 139th MPEG meeting, the MPEG Joint Video Experts Team with ITU-T SG 16 (WG 5; JVET) issued a Committee Draft Amendment (CDAM) text for the Versatile Supplemental Enhancement Information (VSEI) standard (ISO/IEC 23002-7, a.k.a. ITU-T H.274). Beyond the SEI message for shutter interval indication, which is already known from its specification in Advanced Video Coding (AVC, ISO/IEC 14496-10, a.k.a. ITU-T H.264) and High Efficiency Video Coding (HEVC, ISO/IEC 23008-2, a.k.a. ITU-T H.265), and a new indicator for subsampling phase indication, which is relevant for variable-resolution video streaming, this new amendment contains two Supplemental Enhancement Information (SEI) messages for describing and activating post filters using neural network technology in video bitstreams. Such filters could be used for reducing coding noise, upsampling, colour improvement, or denoising. The description of the neural network architecture itself is based on MPEG’s neural network coding standard (ISO/IEC 15938-17). Results from an exploration experiment have shown that neural network-based post filters can deliver better performance than conventional filtering methods. Processes for invoking these new post-processing filters have already been tested in a software framework and will be made available in an upcoming version of the Versatile Video Coding (VVC, ISO/IEC 23090-3, a.k.a. ITU-T H.266) reference software (ISO/IEC 23090-16, a.k.a. ITU-T H.266.2).

MPEG Starts New Edition of Video Coding-Independent Code Points Standard

At the 139th MPEG meeting, the MPEG Joint Video Experts Team with ITU-T SG 16 (WG 5; JVET) issued a Committee Draft (CD) text for the third edition of the video part of the Coding-Independent Code Points (CICP) standard (ISO/IEC 23091-2, a.k.a. ITU-T H.273). This new edition will include two new YCgCo-R colour type identifiers enabling efficient luma-chroma representations with improved support of lossless conversion to and from RGB colour spaces without requiring support of different bit depths for luma and chroma.
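To illustrate what a lossless conversion between RGB and a luma-chroma representation looks like, here is a sketch of the widely known lifting-based reversible transform usually written YCoCg-R or YCgCo-R. It is not taken from the CICP text, the two new identifiers may define variants of it, and the function names are illustrative. In this classic form the chroma components need one more bit than the RGB input, which is the kind of bit-depth mismatch between luma and chroma that the announcement refers to.

```python
# Classic lifting-based reversible RGB <-> YCgCo transform (integer arithmetic).
# Sketch only; not the normative definition of the new CICP identifiers.

def rgb_to_ycgco_r(r, g, b):
    """Forward transform; every step is an integer lifting step."""
    co = r - b
    t = b + (co >> 1)   # >> on Python ints is a floor shift, also for negatives
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_r_to_rgb(y, cg, co):
    """Inverse transform; undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round-trip check over a coarse grid of 8-bit RGB values.
samples = range(0, 256, 15)
lossless = all(
    ycgco_r_to_rgb(*rgb_to_ycgco_r(r, g, b)) == (r, g, b)
    for r in samples for g in samples for b in samples
)
print(lossless)  # True
```

For 8-bit input, Y stays within 0 to 255 while Cg and Co range from -255 to 255, so the chroma components need 9 bits.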

MPEG White Paper on the Third Edition of the Common Media Application Format

At the 138th MPEG meeting, MPEG Liaison and Communication (AG 3) approved a white paper on the third edition of the Common Media Application Format (CMAF).

The Common Media Application Format (CMAF), as defined in ISO/IEC 23000-19, has found significant market adoption as the media streaming format used by industry, in particular for convergence between ISO/IEC 23009 MPEG Dynamic Adaptive Streaming over HTTP (DASH) and other streaming technology. Based on these developments and new market needs, the CMAF standard has been continuously developed, and bug fixes and maintenance have been addressed taking into account feedback from deployments. The most recent result of these efforts is MPEG’s completion of the third edition of CMAF, which incorporates the changes introduced by three approved amendments.

The third edition addresses the following updates, along with general bug fixes, clarifications, and improvements:

definition of new logical structures, namely the CMAF random access chunk and the CMAF Principal Header;
addition of timed metadata tracks;
addition of new media profiles for HEVC for High Frame Rate, interlaced video content, and 8K-UHD;
listing of common source formats based on display resolution operating point sets and frame rate operating point sets; and
addition of new media profiles for MPEG-H audio, VVC, and EVC.
