MPEG-DASH is one of the most popular video-streaming protocols. It is widely used to deliver both Video on Demand (VOD) and live streams to a range of end-user devices, including smartphones, tablets, smart TVs, and gaming consoles.
Fundamental to the MPEG-DASH protocol is the manifest or MPD (Media Presentation Description) that is created when the media is packaged and prepared for transmission via DASH.
In this edition of the Hitchhiker’s Guide to MPEG-DASH, we dig into the DASH Media Presentation Description (MPD), as defined in ISO/IEC 23009-1. The goal is to give you a better understanding of the different parts of a DASH MPD, what they’re used for, and how they work.
So, let’s get started!
What is the advantage of DASH in video streaming?
MPEG-DASH is a platform- and browser-agnostic video streaming protocol. There are plenty of freely available players, and most modern browsers can play DASH without external plugins, typically through a JavaScript player built on Media Source Extensions. So long as the web server reports the DASH manifest MIME type correctly and the page hands the MPD to such a player, the user doesn’t need to ask how to play the MPD or hunt for a player; the player downloads the MPD, and the video loads and plays.
What is a DASH Manifest?
You will recall that a browser or DASH player uses the MPD file, commonly known as a DASH Manifest, to determine which resources to request from an HTTP server and when to request them. To quote the specification:
The Media Presentation Description (MPD) is a document that contains metadata required by a DASH Client to construct appropriate HTTP-URLs to access Segments and to provide the streaming service to the user.
DASH MPD Format
DASH MPD files are XML documents. XML schemas like MPD can be quite complex, and it is the packager’s responsibility to create a valid MPD. The ISO specification is a technical document written by and for developers. This article aims to give an overview of the MPD structure so you can have a better understanding of the packager input parameters.
DASH MPD Encoding
DASH MPD files use UTF-8 encoding. Remember that the first 128 code points (0 through 127) are encoded in UTF-8 exactly as in traditional 7-bit ASCII, so an ASCII document is automatically a UTF-8 document. For code points 128 and above, it is the packager’s responsibility to do the encoding correctly.
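As a quick illustration of the ASCII/UTF-8 relationship, here is a minimal Python sketch. The sample strings are made up for the demonstration and are not taken from any real MPD.

```python
# ASCII-only text produces identical bytes under both encodings.
ascii_text = '<MPD type="static">'
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# A character outside the ASCII range needs more than one byte in UTF-8.
title = "Café"  # hypothetical title containing one non-ASCII character
encoded = title.encode("utf-8")

print(same_bytes)
print(len(title), len(encoded))
```

The title has four characters but encodes to five bytes, because "é" takes two bytes in UTF-8.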
DASH MPD MIME Type
Browsers use the MIME type/subtype reported in the HTTP Content-Type header to figure out how to process a file, similar to how a computer uses the file extension to select a suitable application. You must configure your web server to deliver DASH MPD files with the IANA-registered content type application/dash+xml. There are no required parameters.
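How you set the content type depends on your web server. As one possible sketch, assuming you are testing locally with Python's built-in `http.server` module (the class name `DashRequestHandler` and the `serve` helper are our own, not part of any DASH tooling), you could map the `.mpd` extension like this:

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class DashRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that reports .mpd files as application/dash+xml."""

    # Copy the default extension table and add the DASH manifest type.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".mpd": "application/dash+xml",
    }


def serve(port: int = 8000) -> None:
    """Serve the current directory until interrupted (local testing only)."""
    ThreadingHTTPServer(("", port), DashRequestHandler).serve_forever()
```

Calling `serve()` from the directory that holds your packaged content starts a test server on port 8000. Production servers have their own configuration mechanisms for the same mapping.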
What are the parts of an MPD?
As the name says, an MPD contains a Media Presentation with a clear, consistent hierarchical organization. From top to bottom, the individual elements of the MPD hierarchy are:
- The Media Presentation contains a sequence of one or more Periods.
- A Period contains one or more Adaptation Sets.
- An Adaptation Set contains one or more Representations.
- A Representation contains one or more Segments.
- Segments carry the actual media data and associated metadata.
Each element consists of a set of attributes. Individual attributes are either Mandatory, Conditionally Mandatory, Optional, or Optional with Default Values.
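To make the hierarchy concrete, the sketch below walks a small MPD with Python's standard `xml.etree.ElementTree` module. The `SAMPLE_MPD` string is a cut-down, hypothetical manifest (its ids and bandwidths are borrowed from the examples later in this article), and `outline` is a helper name chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by DASH MPD documents.
NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical, heavily trimmed manifest used only for this demonstration.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period id="0">
    <AdaptationSet id="1" lang="eng">
      <Representation id="1" mimeType="video/mp4" bandwidth="2029827"/>
      <Representation id="2" mimeType="video/mp4" bandwidth="1016035"/>
    </AdaptationSet>
    <AdaptationSet id="3" lang="eng">
      <Representation id="5" mimeType="audio/mp4" bandwidth="34189"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def outline(mpd_xml: str) -> list[str]:
    """Return an indented outline: Period > AdaptationSet > Representation."""
    root = ET.fromstring(mpd_xml)
    lines = []
    for period in root.findall("dash:Period", NS):
        lines.append(f"Period {period.get('id')}")
        for aset in period.findall("dash:AdaptationSet", NS):
            lines.append(f"  AdaptationSet {aset.get('id')}")
            for rep in aset.findall("dash:Representation", NS):
                lines.append(
                    f"    Representation {rep.get('id')} ({rep.get('bandwidth')} bps)"
                )
    return lines


print("\n".join(outline(SAMPLE_MPD)))
```

Each level of indentation in the output corresponds to one level of the MPD hierarchy.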
An excellent description of an MPD’s structure can be found in the popular and definitive guide published by Iraj Sodagar in IEEE MultiMedia titled “The MPEG-DASH Standard for Multimedia Streaming Over the Internet”. It clearly depicts the hierarchical structure used in a DASH MPD.
In the next few sections, we’ll tackle each of these parts of an MPD separately and understand how they are used and how they work with each other.
The Media Presentation contains information about all of the different media types in the content. The most common media types are video, audio, and closed captioning data. At the top level, the MPD contains information including the MPD Profile, minimum buffer time, presentation duration, maximum segment duration, and title.
Example of the top level of an MPD
<?xml version="1.0"?>
<!-- MPD file Generated with GPAC version 0.5.2-DEV-rev710-g713274f-master at 2015-10-15T08:08:21.937Z-->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     minBufferTime="PT1.500S"
     type="static"
     mediaPresentationDuration="PT0H12M14.167S"
     maxSegmentDuration="PT0H0M2.005S"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011,http://dashif.org/guidelines/dash-if-main">
  <ProgramInformation moreInformationURL="http://gpac.sourceforge.net">
    <Title>/var/www/Dashcontent/5b_hev/tos_720_Multirate_HEVC_ondemand.mpd generated by GPAC</Title>
  </ProgramInformation>
  …
</MPD>
DASH lets you divide a video into multiple Periods. By default, playback continues seamlessly from one Period to the next, providing an easy way to add seek points for individual chapters. SCTE-35 markers between Periods can trigger the player to perform dynamic ad insertion.
Example of MPD with 3 Periods
<Period id="0" duration="PT59.52S">
  …
</Period>
<Period id="1" duration="PT0H1M8.678S">
  …
</Period>
<Period id="2" duration="PT0H11M14.647S">
  …
</Period>
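The duration attributes use the XML duration notation (for example, PT0H1M8.678S is 1 minute and 8.678 seconds). As a rough sketch of how a player could turn the three Period durations above into start times, the Python below parses the time-only form. It assumes durations never include days, months, or years, and the helper names `parse_duration` and `period_starts` are our own.

```python
import re

# Time-only durations such as PT59.52S or PT0H11M14.647S.
# Assumption: no date part (days, months, years) is ever present.
_DURATION = re.compile(
    r"^PT(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?$"
)


def parse_duration(text: str) -> float:
    """Convert a time-only duration string to seconds."""
    match = _DURATION.match(text)
    if match is None or text == "PT":
        raise ValueError(f"unsupported duration: {text!r}")
    hours = int(match.group("hours") or 0)
    minutes = int(match.group("minutes") or 0)
    seconds = float(match.group("seconds") or 0)
    return hours * 3600 + minutes * 60 + seconds


def period_starts(durations: list[str]) -> list[float]:
    """Start time of each Period when Periods play back to back."""
    starts, clock = [], 0.0
    for duration in durations:
        starts.append(clock)
        clock += parse_duration(duration)
    return starts


starts = period_starts(["PT59.52S", "PT0H1M8.678S", "PT0H11M14.647S"])
print([round(s, 3) for s in starts])
```

With these values, the second Period starts 59.52 seconds in and the third starts after 59.52 + 68.678 seconds.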
An Adaptation Set catalogs the different representations of the media. Within each Adaptation Set, the player will only select one Representation for a particular Segment, but the Representation can change from one Segment to the next.
Video Adaptation Sets typically contain multiple Representations, one for each resolution/bitrate, allowing the media player to select the best available quality without buffering. If the video is available in more than one codec, each codec will be in a different Adaptation Set.
If the audio is available in both stereo and multi-channel surround, these will be in separate adaptation sets to prevent the player from switching back and forth during playback. Each language will have its own Adaptation Set.
For each Adaptation Set, the player will select one of the available Representations at a time, but the selection can change from one segment to the next.
After downloading the MPD, the player selects the Adaptation Sets to play based on a combination of device capabilities and user preferences. Consider an MPD that contains video, audio, and closed captioning. Video is available in AVC and HEVC (two Adaptation Sets). Audio is available in either stereo or 6-channel audio in English, French, and Spanish (six Adaptation Sets). Closed captioning is available in English, French, Portuguese, and Spanish (four Adaptation Sets). The player might select:
- HEVC video (because the player supports it).
- English language stereo audio (because the player only has two audio channels and the user has selected English as their default language).
- No closed captioning (because the user has turned off captioning).
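The selection described above can be sketched in a few lines of Python. This is only an illustration of the idea: the `AdaptationSet` dataclass, the `CATALOG` list, and the `select_sets` function are hypothetical simplifications, and real players weigh many more signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptationSet:
    kind: str          # "video", "audio" or "text"
    codec: str = ""    # video only: "avc" or "hevc"
    lang: str = ""     # audio and text only
    channels: int = 0  # audio only


# The twelve Adaptation Sets from the example: 2 video, 6 audio, 4 captions.
CATALOG = (
    [AdaptationSet("video", codec=c) for c in ("avc", "hevc")]
    + [AdaptationSet("audio", lang=l, channels=n)
       for l in ("en", "fr", "es") for n in (2, 6)]
    + [AdaptationSet("text", lang=l) for l in ("en", "fr", "pt", "es")]
)


def select_sets(sets, codec_preference, max_channels, language, captions_on):
    """Pick one video, one audio and (optionally) one caption Adaptation Set."""
    # Video: first codec in the player's preference order that is available.
    video = next(
        (s for codec in codec_preference for s in sets
         if s.kind == "video" and s.codec == codec),
        None,
    )
    # Audio: the user's language, with as many channels as the device can play.
    candidates = [
        s for s in sets
        if s.kind == "audio" and s.lang == language and s.channels <= max_channels
    ]
    audio = max(candidates, key=lambda s: s.channels, default=None)
    # Captions: only if the user has them switched on.
    captions = None
    if captions_on:
        captions = next(
            (s for s in sets if s.kind == "text" and s.lang == language), None
        )
    return video, audio, captions


video, audio, captions = select_sets(
    CATALOG, codec_preference=["hevc", "avc"], max_channels=2,
    language="en", captions_on=False,
)
print(video, audio, captions)
```

For the player described in the list, this picks the HEVC video set, the English stereo audio set, and no captions.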
Example of two video adaptation sets and one audio adaptation set
<AdaptationSet id="1" segmentAlignment="true" maxWidth="1920" maxHeight="800" maxFrameRate="24" par="1920:800" lang="eng" startWithSAP="1">
  <Representation id="1" mimeType="video/mp4" codecs="avc1.640028" width="1920" height="800" frameRate="24" sar="1:1" bandwidth="2029827">
    …
  </Representation>
  <Representation id="2" mimeType="video/mp4" codecs="avc1.640028" width="1920" height="800" frameRate="24" sar="1:1" bandwidth="1016035">
    …
  </Representation>
</AdaptationSet>
<AdaptationSet id="2" segmentAlignment="true" group="1" maxWidth="1920" maxHeight="1080" maxFrameRate="24" par="16:9" lang="eng">
  <Representation id="3" mimeType="video/mp4" codecs="hev1.1.6.L120.90" width="1920" height="800" frameRate="24" sar="1:1" bandwidth="1980081">
    …
  </Representation>
  <Representation id="4" mimeType="video/mp4" codecs="hev1.1.6.L120.90" width="1920" height="800" frameRate="24" sar="1:1" bandwidth="991395">
    …
  </Representation>
</AdaptationSet>
<AdaptationSet id="3" segmentAlignment="true" lang="eng">
  <Representation id="5" mimeType="audio/mp4" codecs="mp4a.40.2" audioSamplingRate="48000" startWithSAP="1" bandwidth="34189">
    …
  </Representation>
</AdaptationSet>
Within an Adaptation Set, a Representation describes one of the versions of the content. The most common use of multiple Representations is to describe the various video bitrates available for Adaptive Bitrate (ABR) streaming. Every Representation carries a mandatory bandwidth attribute, expressed in bits per second, which tells the player roughly how much throughput it needs to stream that Representation without stalling.
Example of four video Representations
<Representation id="1" mimeType="video/mp4" codecs="hev1.2.4.L63.90" width="512" height="288" frameRate="24" sar="1:1" startWithSAP="1" bandwidth="500646">
  …
</Representation>
<Representation id="2" mimeType="video/mp4" codecs="hev1.2.4.L90.90" width="768" height="432" frameRate="24" sar="1:1" startWithSAP="1" bandwidth="999733">
  …
</Representation>
<Representation id="3" mimeType="video/mp4" codecs="hev1.2.4.L93.90" width="1280" height="720" frameRate="24" sar="1:1" startWithSAP="1" bandwidth="1993305">
  …
</Representation>
<Representation id="4" mimeType="video/mp4" codecs="hev1.2.4.L120.90" width="1920" height="1080" frameRate="24" sar="1:1" startWithSAP="1" bandwidth="3960054">
  …
</Representation>
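To show how a player might use the bandwidth values of these four Representations, here is a deliberately simple throughput-based selection sketch in Python. DASH does not mandate any particular ABR algorithm; the `pick_representation` function and the 0.8 safety factor are assumptions made for this illustration.

```python
# The four Representations from the example above (bandwidth in bits per second).
REPRESENTATIONS = [
    {"id": "1", "height": 288, "bandwidth": 500646},
    {"id": "2", "height": 432, "bandwidth": 999733},
    {"id": "3", "height": 720, "bandwidth": 1993305},
    {"id": "4", "height": 1080, "bandwidth": 3960054},
]


def pick_representation(representations, throughput_bps, safety_factor=0.8):
    """Pick the highest-bandwidth Representation that fits the measured throughput.

    safety_factor is a hypothetical tuning knob that leaves headroom for
    throughput fluctuations. If nothing fits, fall back to the lowest bitrate.
    """
    budget = throughput_bps * safety_factor
    ordered = sorted(representations, key=lambda r: r["bandwidth"])
    affordable = [r for r in ordered if r["bandwidth"] <= budget]
    return affordable[-1] if affordable else ordered[0]


choice = pick_representation(REPRESENTATIONS, throughput_bps=2_500_000)
print(choice["id"], choice["height"])
```

A player would typically repeat this decision before each Segment request, which is how the selected Representation can change from one Segment to the next.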
Segments contain the information required to construct the actual URLs used to download the content. The MPD can either provide a list of the segment URLs or a template that the player uses to build the URLs dynamically.
Example of a Segment list
<SegmentList duration="10">
  <SegmentURL media="seg-m1-C2view-1.mp4"/>
  <SegmentURL media="seg-m1-C2view-2.mp4"/>
  <SegmentURL media="seg-m1-C2view-3.mp4"/>
</SegmentList>
Example of a Segment template
<SegmentTemplate media="$Bandwidth$/$Number%04d$.mp4v">
</SegmentTemplate>
How are URLs described in an MPD?
Given the vast amount of content available from streaming providers, the URLs can get rather long, sometimes more than 100 characters. Video is often available in 6-10 bitrates to support good quality over a range of network conditions. Individual segments are usually 2-10 seconds long, allowing the player to respond rapidly to changing network conditions.
For our hypothetical video with 12 adaptation sets, a 2-hour movie could require 40,000 segments. A list of those URLs would be over 4MB, and the player needs to download the complete MPD before it can start playing the video. Our hypothetical user would use less than 9% of those URLs. There has to be a better way, and sure enough, there is.
The Base URL
While the actual URLs might be over 100 characters each, most of those characters will be the same for all URLs. This is where the Base URL comes in. Each MPD can carry an optional BaseURL element. The Base URL is used as the prefix for all of the Segment URLs and typically contains the protocol, domain name, and most of the server’s directory structure for the video. When it is stored at the MPD level, it is shared by all Periods, Adaptation Sets, and Representations. Additional Base URLs can be added at the Period, Adaptation Set, and Representation levels; a relative Base URL at a lower level is resolved against the one above it, further increasing the storage efficiency.
If the content is available from multiple sources, a separate BaseURL can be included for each source.
Example of nested Base URLs
<MPD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns="urn:mpeg:dash:schema:mpd:2011"
     xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 DASH-MPD.xsd"
     type="dynamic"
     minimumUpdatePeriod="PT2S"
     timeShiftBufferDepth="PT30M"
     availabilityStartTime="2011-12-25T12:30:00"
     minBufferTime="PT4S"
     profiles="urn:mpeg:dash:profile:isoff-live:2011">
  <BaseURL>http://cdn1.example.com/</BaseURL>
  <BaseURL>http://cdn2.example.com/</BaseURL>
  <Period id="1">
    <!-- Video -->
    <AdaptationSet mimeType="video/mp4" codecs="avc1.4D401F" frameRate="30000/1001" segmentAlignment="true" startWithSAP="1">
      <BaseURL>video/</BaseURL>
      …
    </AdaptationSet>
  </Period>
</MPD>
In this example, the video segment files would be available under both http://cdn1.example.com/video/ and http://cdn2.example.com/video/.
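Nested Base URLs combine the same way relative links do on a web page. A small Python sketch using the standard `urllib.parse.urljoin` function shows the resolution for the two CDNs in the example. The `resolve` helper and the segment file name `seg-1.mp4` are hypothetical.

```python
from urllib.parse import urljoin


def resolve(*base_urls: str) -> str:
    """Resolve Base URLs from the MPD level down to the innermost level."""
    resolved = ""
    for base in base_urls:
        # Each lower-level Base URL is resolved against the result so far.
        resolved = urljoin(resolved, base)
    return resolved


for cdn in ("http://cdn1.example.com/", "http://cdn2.example.com/"):
    video_base = resolve(cdn, "video/")
    # "seg-1.mp4" is a made-up segment name used only for illustration.
    print(urljoin(video_base, "seg-1.mp4"))
```

An absolute Base URL at a lower level would simply replace the one above it, which is how `urljoin` treats absolute references.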
The simplest way to store the URLs, at least from a human readability perspective, is to list them all in each Representation. If each Representation has its own directory on the server and you use Base URLs, this is nothing more than the Segment file names. For shorter content with a small number of Adaptation Sets, this is a common practice.
For longer content, with hundreds or thousands of segments per Representation, listing all of the file names still takes a lot of space. To further reduce the MPD size, DASH allows Segments to be described using a SegmentTemplate. URL templates consist of ordinary characters combined with one or more Identifiers. Identifiers are delimited by a “$” character at the beginning and end. The player uses the template to construct the URL when it is ready to request the Segment.
The packager can insert Segment Templates at the Period level or lower, using the defined Identifiers shown in the table below.
|Identifier|Substitution|
|---|---|
|$$|An escape sequence, i.e., “$$” is replaced with a single “$”.|
|$RepresentationID$|The player substitutes this identifier with the value of the attribute Representation@id of the containing Representation.|
|$Number$|The player substitutes this identifier with the number of the corresponding Segment.|
|$Bandwidth$|The player substitutes this identifier with the value of the Representation@bandwidth attribute.|
|$Time$|The player substitutes this identifier with the value of the SegmentTimeline@t attribute for the Segment. You can use either $Number$ or $Time$, but not both at the same time.|
You can suffix any identifier, within the enclosing ‘$’ characters, with an additional format tag following this prototype: %0[width]d
The width parameter is an unsigned integer that provides the minimum number of characters to print. If the value to print is shorter than this, it will pad the result with zeros. The value isn’t truncated if the number is larger than the tag format.
- Segment Templates let you stream live content without the need to periodically update the MPD with new Segments as they become available.
- Segment Templates require careful, consistent construction of the individual file names, but your packager takes care of this for you.
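The substitution rules for Segment Templates can be sketched in Python as follows. This is a simplified illustration rather than a complete implementation of the specification: the `expand_template` function is our own, it handles only the identifiers discussed in this article, and it applies the zero-padding width only to values that can be read as integers.

```python
import re

# Matches "$$" or "$Name$", optionally with a zero-padding tag such as %04d.
_TOKEN = re.compile(
    r"\$(?:(RepresentationID|Number|Bandwidth|Time)(?:%0(\d+)d)?)?\$"
)


def expand_template(template, *, representation_id, bandwidth,
                    number=None, time=None):
    """Build a Segment URL from a template and the values for one Segment."""
    values = {
        "RepresentationID": representation_id,
        "Bandwidth": bandwidth,
        "Number": number,
        "Time": time,
    }

    def substitute(match):
        name, width = match.group(1), match.group(2)
        if name is None:
            return "$"  # "$$" is the escape sequence for a literal "$"
        value = values[name]
        if value is None:
            raise ValueError(f"template needs a value for ${name}$")
        if width is not None:
            # Pad with zeros up to the minimum width; longer values are kept whole.
            return f"{int(value):0{int(width)}d}"
        return str(value)

    return _TOKEN.sub(substitute, template)


url = expand_template(
    "$Bandwidth$/$Number%04d$.mp4v",
    representation_id="3", bandwidth=1993305, number=42,
)
print(url)
```

With a bandwidth of 1993305 and Segment number 42, the template expands to `1993305/0042.mp4v`. Because the next URL is produced just by incrementing the number, the player never needs a stored list of Segment names.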
How can MPDs be updated dynamically?
DASH supports both static and dynamic MPDs. Dynamic MPDs are typically used for live streaming, adding new segments or periods as content becomes available, and dropping older segments and periods when the content is no longer available.
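As a rough sketch of how a player could decide whether it needs to re-fetch the manifest, the Python below reads the type and minimumUpdatePeriod attributes from the MPD root element. The `refresh_interval` function and the two tiny sample manifests are hypothetical, and the duration parsing is simplified to seconds-only values such as PT2S.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical one-element manifests; real MPDs contain Periods and more.
LIVE_MPD = (
    '<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic" '
    'minimumUpdatePeriod="PT2S" minBufferTime="PT4S"/>'
)
STATIC_MPD = (
    '<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static" '
    'minBufferTime="PT1.500S"/>'
)


def refresh_interval(mpd_xml: str):
    """Return seconds to wait before re-fetching the MPD, or None if not needed."""
    root = ET.fromstring(mpd_xml)
    # The type attribute defaults to "static" when it is absent.
    if root.get("type", "static") != "dynamic":
        return None
    period = root.get("minimumUpdatePeriod")
    if period is None:
        return None  # no update period signalled
    # Simplifying assumption: only seconds-only durations are handled here.
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", period)
    if match is None:
        raise ValueError(f"unsupported duration: {period!r}")
    return float(match.group(1))


print(refresh_interval(LIVE_MPD), refresh_interval(STATIC_MPD))
```

A live player would schedule its next manifest request no sooner than the returned number of seconds, while a static MPD is fetched only once.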
What are MPD Profiles?
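DASH Profiles tell the player which of the many DASH features the MPD contains, mainly for compatibility reasons. Your packager will take care of this for you, so you don’t need to worry about it. Two profiles appear in the examples in this article: the ISO Base Media File Format On Demand profile (urn:mpeg:dash:profile:isoff-on-demand:2011) and the ISO Base Media File Format Live profile (urn:mpeg:dash:profile:isoff-live:2011).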
How does DASH support DRM?
DASH uses the ContentProtection element to identify the content protection used. The contents of this element are unique to each DRM provider. If you are using DRM, you will need to work with your DRM provider to make sure that your packager is properly integrated with their system.
Example of Content Protection in an MPD
<ContentProtection schemeIdUri="http://example.net/052011/drm">
  <drm:License>http://MoviesSP.example.com/protect?license=kljklsdfiowek</drm:License>
  <drm:Content>http://MoviesSP.example.com/protect?content=oyfYvpo8yFyvyo8f</drm:Content>
</ContentProtection>
That’s it, folks, for this introduction to MPEG-DASH MPDs. I hope you now understand the different components that make up an MPD and how they fit together. In future articles of this series, we’ll learn about packaging videos into MPEG-DASH format using popular packagers such as Bento4, Shaka, MP4Box, etc.
Do check out this article on OTTVerse.com for a list of freely available DASH MPDs covering a variety of streaming use cases.
Until next time, happy streaming!