You can’t get very far into video streaming without running into packages and packagers. Streaming media requires two different types of packages, adding to the confusion.
This article looks into the kinds of packages and reviews some tools available to create them. We also take a look at ABR packaging for Live/VOD streaming for both the HLS and MPEG-DASH streaming protocols.
With that, let’s get started.
What Is Packaging?
Before we get into the details of streaming video packaging, a quick review of how digital media works is in order. Let’s look at an HD video.
Compression and Elementary Streams
An HD video contains 1920 x 1080 = 2,073,600 pixels for each frame. With 256 possible values each for Red, Green, and Blue, we need 24 bits to define each pixel, giving us 49,766,400 bits per frame. At 30 frames each second, we are looking at roughly 1.5 Gbps. If we wanted to store a 2-hour movie in this “raw” format, the file would be over 1.3 TB. 4K video, with its higher bit depths and frame rates, is about 10x higher: 15 Gbps, and more than 13 TB for the whole movie.
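To check this arithmetic yourself, here is a minimal Python sketch. The function names are ours, and the 4K figures assume 10 bits per color channel at 60 frames per second, which is one plausible reading of "higher bit depths and frame rates" rather than a fixed definition.

```python
def raw_video_bitrate(width, height, bits_per_pixel, fps):
    """Return the uncompressed video bitrate in bits per second."""
    return width * height * bits_per_pixel * fps


def raw_file_size_bytes(bitrate_bps, duration_seconds):
    """Return the size in bytes of raw video at the given bitrate."""
    return bitrate_bps * duration_seconds // 8


TWO_HOURS = 2 * 60 * 60  # seconds

# HD: 8 bits each for Red, Green, and Blue, 30 frames per second.
hd_bps = raw_video_bitrate(1920, 1080, 24, 30)
hd_bytes = raw_file_size_bytes(hd_bps, TWO_HOURS)

# 4K: assumed 10 bits per channel (30 bits per pixel) at 60 frames per second.
uhd_bps = raw_video_bitrate(3840, 2160, 30, 60)
uhd_bytes = raw_file_size_bytes(uhd_bps, TWO_HOURS)

print(f"HD: {hd_bps / 1e9:.2f} Gbps, {hd_bytes / 1e12:.2f} TB for 2 hours")
print(f"4K: {uhd_bps / 1e9:.2f} Gbps, {uhd_bytes / 1e12:.2f} TB for 2 hours")
```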
Video and audio codecs compress this raw data down to a manageable size. The H.264 video codec can compress the raw HD video at 1.5 Gbps down to around 6 Mbps, just 0.4% of the uncompressed size, with little visible impact on video quality. The output of the video encoder is known as a video Elementary Stream. The audio is also compressed with a codec, resulting in an audio Elementary Stream. For more on audio codecs, read this introduction to audio compression and the various audio codecs in use.
During playback, a decoder, often dedicated hardware in the Graphics Processing Unit (GPU), decodes the compressed stream and sends the correct Red, Green, and Blue values of each pixel to the display, one frame at a time.
Note: read OTTVerse’s Introduction to Video Encoding for a deeper look at video compression.
The video and audio codecs took care of the size of the data, but Elementary Streams aren’t convenient from a user perspective. For one thing, they do not contain any timing information. An H.264 Elementary Stream for a 120fps video is indistinguishable from that for a 25fps video.
The more fundamental problem is that each stream must be decoded in order, from the beginning, to make any sense. The only way to move to a particular point in the stream is to process it from the beginning until you reach the desired location in the video.
When you add audio and captioning data, the player now needs to process two or three different files in parallel, one for each media type. The player would have to rely on the user to provide information about the frame rate and other essential parameters required to decode and render the media.
This is where the media package comes in. The media package combines the video elementary stream, audio elementary stream, captioning data, timing information, and an index in a single file. While there are various video containers in use, including AVI, WMV, MKV, and Ogg, the most commonly used container for OTT streaming is MPEG-4 Part 14 (MP4).
The MP4 specification uses objects called atoms. These are also called boxes. The two critical atoms in an MP4 file are moov (movie) and mdat (media data).
The moov atom contains the movie metadata: data about the movie, not the media itself. It includes a collection of trak atoms, one for each of the tracks in the movie. A trak atom contains information about the track, including an index to the track media data located in the mdat atom.
The mdat atom contains the actual movie data, the audio elementary streams, the video elementary streams, and any associated captioning data. The size of these atoms is different for each file, so the usual way for a program to get the information is to start reading the file to find the size of the moov atom, read the entire moov atom, then read each mdat atom as needed.
Accessing a file this way is trivial on any platform that supports random file access but quickly becomes tedious and inefficient over HTTP.
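The read-the-size, then-read-the-atom pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not a full MP4 parser. It relies on the standard box header (a 32-bit big-endian size followed by a four-character type, with a size of 1 signalling a 64-bit size and a size of 0 meaning "to the end of the file"). The helper names iter_boxes and make_box, and the tiny synthetic file, are ours and exist only for the example.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, offset, size) for each box found in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if offset + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box runs to the end of the data.
            size = end - offset
        if size < header:
            break  # malformed box, stop rather than loop forever
        yield box_type.decode("ascii", errors="replace"), offset, size
        offset += size


def make_box(box_type, payload=b""):
    """Build a synthetic box for the demo (hypothetical helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# A toy file: the payloads are placeholders, not valid media.
sample = (
    make_box(b"ftyp", b"isom")
    + make_box(b"moov", make_box(b"trak"))
    + make_box(b"mdat", b"\x00" * 16)
)

boxes = list(iter_boxes(sample))
for box_type, box_offset, box_size in boxes:
    print(box_type, box_offset, box_size)

# Child boxes are walked the same way, starting after the parent's header.
moov_type, moov_offset, moov_size = boxes[1]
children = list(iter_boxes(sample, moov_offset + 8, moov_offset + moov_size))
```

On a local file every step is a cheap seek. Over HTTP, each jump to a new offset can turn into a separate range request, which is why this access pattern gets expensive.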
The encoder performs the media packaging as the final step of the encoding process.
Streaming media is delivered over the internet using HTTP. Users access the same content from smart TVs with a broadband connection and mobile devices with congested cellular service. Adaptive Bitrate (ABR) streaming accommodates this mix of devices and networks.
A streaming media player requires information beyond that contained in an individual MP4 file.
- The available track resolutions/bitrates (for ABR support)
- The available audio and video codecs (for player compatibility)
- The available audio/captioning languages
- Ad insertion points (for advertising support)
HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH) provide all of this information in a single manifest file. In the snippets below, we will see how the manifest files store this information.
HLS lists all of the available audio and video tracks in the Master Playlist using the EXT-X-STREAM-INF tag. Each tag contains the bandwidth, resolution, and codecs. The URL of the Media Playlist for that track immediately follows the tag. Players select the appropriate tracks based on their capabilities.
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=150000,RESOLUTION=416x234,CODECS="avc1.42e00a"
http://example.com/low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=240000,RESOLUTION=416x234,CODECS="avc1.42e00a"
http://example.com/lo_mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=440000,RESOLUTION=416x234,CODECS="avc1.42e00a"
http://example.com/hi_mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=640000,RESOLUTION=640x360,CODECS="avc1.42e00a"
http://example.com/high/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=64000,CODECS="mp4a.40.5"
http://example.com/audio/index.m3u8
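To make the tag-then-URL pairing concrete, here is a rough Python sketch of how a player might read the Master Playlist above and choose a video track. It is a simplification under stated assumptions: it handles only the EXT-X-STREAM-INF tag, the function names parse_master_playlist and pick_variant are ours, and the selection rule (highest bandwidth that fits the measured throughput) is just one simple strategy, not what any particular player does.

```python
import re

# Matches KEY=value or KEY="quoted,value" pairs in an attribute list.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')

MASTER_PLAYLIST = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=150000,RESOLUTION=416x234,CODECS="avc1.42e00a"
http://example.com/low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=240000,RESOLUTION=416x234,CODECS="avc1.42e00a"
http://example.com/lo_mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=440000,RESOLUTION=416x234,CODECS="avc1.42e00a"
http://example.com/hi_mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=640000,RESOLUTION=640x360,CODECS="avc1.42e00a"
http://example.com/high/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=64000,CODECS="mp4a.40.5"
http://example.com/audio/index.m3u8
"""


def parse_master_playlist(text):
    """Return one dict per EXT-X-STREAM-INF tag, with its URI attached."""
    variants = []
    pending = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attr_list = line.split(":", 1)[1]
            pending = {k: v.strip('"') for k, v in ATTR_RE.findall(attr_list)}
        elif line and not line.startswith("#") and pending is not None:
            # The first non-tag line after the tag is the Media Playlist URL.
            pending["URI"] = line
            variants.append(pending)
            pending = None
    return variants


def pick_variant(variants, throughput_bps):
    """Pick the highest-bandwidth video variant that fits the throughput."""
    video = [v for v in variants if "RESOLUTION" in v]
    affordable = [v for v in video if int(v["BANDWIDTH"]) <= throughput_bps]
    if affordable:
        return max(affordable, key=lambda v: int(v["BANDWIDTH"]))
    # Nothing fits: fall back to the lowest-bandwidth video variant.
    return min(video, key=lambda v: int(v["BANDWIDTH"]))


variants = parse_master_playlist(MASTER_PLAYLIST)
choice = pick_variant(variants, throughput_bps=500_000)
print(choice["URI"])
```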
DASH Media Presentation Descriptions provide the information with a little more structure, grouping the media tracks into Adaptation Sets. If the video is available in both H.264 and H.265, there will be separate H.264 and H.265 Adaptation Sets, each containing the appropriate tracks. Within each Adaptation Set, the manifest includes a Representation for each track. As with the HLS #EXT-X-STREAM-INF tag, the Representation has the bandwidth, resolution, and codecs. Players select the appropriate Adaptation Sets based on their capabilities.
<AdaptationSet id="1" segmentAlignment="true" maxWidth="640" maxHeight="360" maxFrameRate="25" par="16:9" lang="eng" startWithSAP="1">
  <Representation id="1" mimeType="video/mp4" codecs="avc1.42e00a" width="416" height="234" frameRate="25" sar="1:1" startWithSAP="1" bandwidth="150000"> … </Representation>
  <Representation id="2" mimeType="video/mp4" codecs="avc1.42e00a" width="416" height="234" frameRate="25" sar="1:1" startWithSAP="1" bandwidth="240000"> … </Representation>
  <Representation id="3" mimeType="video/mp4" codecs="avc1.42e00a" width="416" height="234" frameRate="25" sar="1:1" startWithSAP="1" bandwidth="440000"> … </Representation>
  <Representation id="4" mimeType="video/mp4" codecs="avc1.42e00a" width="640" height="360" frameRate="25" sar="1:1" startWithSAP="1" bandwidth="640000"> … </Representation>
</AdaptationSet>
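Reading the Representations out of an Adaptation Set is straightforward with Python's standard XML parser. The sketch below is illustrative only. It works on a trimmed-down, standalone Adaptation Set with empty Representation elements, and the function name list_representations is ours. A complete MPD declares a default XML namespace (urn:mpeg:dash:schema:mpd:2011), so the tag lookups would need that namespace when parsing a real manifest.

```python
import xml.etree.ElementTree as ET

# A cut-down Adaptation Set based on the snippet above (segment details omitted).
ADAPTATION_SET = """
<AdaptationSet id="1" segmentAlignment="true" maxWidth="640" maxHeight="360" lang="eng">
  <Representation id="1" mimeType="video/mp4" codecs="avc1.42e00a" width="416" height="234" bandwidth="150000"/>
  <Representation id="2" mimeType="video/mp4" codecs="avc1.42e00a" width="416" height="234" bandwidth="240000"/>
  <Representation id="3" mimeType="video/mp4" codecs="avc1.42e00a" width="416" height="234" bandwidth="440000"/>
  <Representation id="4" mimeType="video/mp4" codecs="avc1.42e00a" width="640" height="360" bandwidth="640000"/>
</AdaptationSet>
"""


def list_representations(xml_text):
    """Return the Representations in an Adaptation Set, lowest bandwidth first."""
    root = ET.fromstring(xml_text.strip())
    representations = []
    for rep in root.findall("Representation"):
        representations.append(
            {
                "id": rep.get("id"),
                "codecs": rep.get("codecs"),
                "width": int(rep.get("width")),
                "height": int(rep.get("height")),
                "bandwidth": int(rep.get("bandwidth")),
            }
        )
    return sorted(representations, key=lambda r: r["bandwidth"])


representations = list_representations(ADAPTATION_SET)
for rep in representations:
    print(rep["id"], f'{rep["width"]}x{rep["height"]}', rep["bandwidth"])
```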
Note: read OTTVerse’s Structure of a MPEG-DASH MPD for a deeper look at the DASH Media Presentation Description file format.
Streaming packagers perform two essential services: Content fragmentation and manifest preparation.
A standard MP4 file puts all of the movie metadata in the moov atom and the media data in the mdat atom as we saw previously.
For ABR streaming, where the player can switch tracks at every segment, this information is required for each segment. Fragmented MP4 puts each segment into its own mdat atom and adds a sidx (segment index) and moof (movie fragment) atom for each mdat atom. If all of the track segments are stored in a single file, the file contains the moov atom followed by a repeating sidx/moof/mdat sequence, one for each segment.
If you stored the segments as individual files, there would be an init.mp4 file containing the moov atom, and each of the segment (.m4s) files will have the sidx, moof, and mdat atoms for that segment.
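One way to picture the two layouts is to take the ordered list of top-level atoms in a single-file fragmented MP4 and split it into the part that would become init.mp4 and the parts that would become the .m4s segments. The Python sketch below does only that bookkeeping on atom names. It assumes every segment begins with a sidx atom, as described above, and the function name split_fragments is ours. The ftyp atom in the sample is the file-type atom that normally precedes moov in an MP4 file.

```python
def split_fragments(box_types):
    """Split an ordered list of atom names into (init, segments).

    Assumes each segment starts with a sidx atom, as in the layout above.
    """
    init, segments = [], []
    for box_type in box_types:
        if box_type == "sidx":
            segments.append([box_type])  # a new segment begins here
        elif segments:
            segments[-1].append(box_type)  # moof/mdat belong to the open segment
        else:
            init.append(box_type)  # everything before the first sidx
    return init, segments


single_file_layout = [
    "ftyp", "moov",
    "sidx", "moof", "mdat",
    "sidx", "moof", "mdat",
    "sidx", "moof", "mdat",
]

init, segments = split_fragments(single_file_layout)
print("init.mp4:", init)
for number, segment in enumerate(segments, start=1):
    print(f"segment {number}:", segment)
```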
Some packaging tools, like Bento4, provide separate tools for fragmenting the video and preparing the manifest. Others, like GPAC and Shaka, use a single tool to do the fragmenting and manifest preparation.
The final step is creating the content manifests. For HLS, this includes the Master Playlist and the individual track Media Playlists. For DASH, the information is all contained in the Media Presentation Description (MPD). HLS and DASH manifests collect all the information the player needs to select an appropriate set of tracks and find the segments that make up the track.
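To give a feel for what manifest preparation involves, here is a minimal Python sketch that writes an HLS Master Playlist from a list of renditions. It is not how any particular packager works: the function name build_master_playlist and the dictionary keys are ours, and real packagers emit additional tags and also generate the per-track Media Playlists.

```python
def build_master_playlist(renditions):
    """Return Master Playlist text with one EXT-X-STREAM-INF entry per rendition."""
    lines = ["#EXTM3U"]
    for rendition in renditions:
        attrs = [f"BANDWIDTH={rendition['bandwidth']}"]
        if "resolution" in rendition:  # audio-only renditions have no resolution
            attrs.append(f"RESOLUTION={rendition['resolution']}")
        attrs.append(f'CODECS="{rendition["codecs"]}"')
        lines.append("#EXT-X-STREAM-INF:" + ",".join(attrs))
        lines.append(rendition["uri"])  # the Media Playlist URL follows the tag
    return "\n".join(lines) + "\n"


renditions = [
    {"bandwidth": 150000, "resolution": "416x234", "codecs": "avc1.42e00a",
     "uri": "http://example.com/low/index.m3u8"},
    {"bandwidth": 640000, "resolution": "640x360", "codecs": "avc1.42e00a",
     "uri": "http://example.com/high/index.m3u8"},
    {"bandwidth": 64000, "codecs": "mp4a.40.5",
     "uri": "http://example.com/audio/index.m3u8"},
]

playlist = build_master_playlist(renditions)
print(playlist)
```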
Popular Streaming Packagers
There are several open-source packagers that can help you create compliant MPEG-DASH and HLS streams. OTTVerse has detailed tutorials on the following packagers to make your job easier.
| Packager | DASH VOD | HLS VOD | DASH LIVE | HLS LIVE |
Packages and Packagers are a vital part of the live and VOD video streaming pipeline and a good understanding of packaging will go a long way in helping you create and serve compliant streams – be it HLS or DASH. Until next time, thank you and happy streaming.