What is 4K HDR Dynamic Metadata?
It's the next stage in the roll-out of HDR, but do we really need it?
You may have read a lot of discussion about dynamic metadata recently and found yourself wondering what it is, how it works and whether it's important.
Why do we need metadata?
The use of metadata is part of the development of High Dynamic Range (HDR), and it is needed because the world of video has recently become a lot more complicated. In the old days video content was graded using a CRT monitor and thus matched the limitations of that particular technology, so we had a set of standards based around the Rec. 709 colour gamut and a peak brightness of about 100 nits. As a result the content created was mapped to the display itself on a one-to-one basis and, assuming the display had been set up correctly, what you saw on the screen was exactly what the content creators intended.
In the last couple of years this situation has fundamentally changed with the advent of HDR, and now the content being created uses a colour gamut and a peak brightness that are wider and higher than the display's capabilities. This means the content has to include additional information, or metadata, that tells the display how to match the content to the limit of its own colour gamut and peak brightness. If the metadata wasn't present the display wouldn't know how to correctly show the HDR content it is receiving in terms of its own colour volume. You'll probably hear the term 'colour volume' used a lot, but it just means the combination of the display's colour gamut and its peak brightness.
Why do we need tone mapping?
The problem is that the media container used for HDR (Rec. 2020) can carry more saturated colours than can actually be shown on currently available wide colour gamut displays. Many of these displays can show colours up to the DCI-P3 colour gamut, although not all, but even that is still smaller than the Rec. 2020 colour gamut. So the media container is bigger than the display's native colour gamut, which means that the content has to be precisely mapped to the display's capabilities. And that's only half the problem, because there's still the issue of peak brightness.
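The gamut mismatch can be illustrated with the published CIE 1931 xy chromaticities of the two gamuts. The following is a minimal sketch (a simple point-in-triangle test, not a production colour pipeline) showing that the Rec. 2020 green primary sits outside the DCI-P3 triangle, so a P3-limited display has to map it inwards:

```python
# Published CIE 1931 xy primaries for each gamut, in R, G, B order.
REC2020 = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]
DCI_P3 = [(0.680, 0.320), (0.265, 0.690), (0.150, 0.060)]

def _sign(p, a, b):
    # Which side of the edge a->b the point p falls on.
    return (p[0] - b[0]) * (a[1] - b[1]) - (a[0] - b[0]) * (p[1] - b[1])

def in_gamut(xy, primaries):
    """True if chromaticity xy lies inside the triangle of the primaries."""
    r, g, b = primaries
    d1, d2, d3 = _sign(xy, r, g), _sign(xy, g, b), _sign(xy, b, r)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

print(in_gamut((0.170, 0.797), DCI_P3))    # False: Rec. 2020 green is outside P3
print(in_gamut((0.3127, 0.3290), DCI_P3))  # True: D65 white is inside both gamuts
```

Every colour in the content that fails this kind of containment test has to be remapped to something the display can actually reproduce.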
The PQ EOTF used for HDR10 and Dolby Vision content can go up to 10,000 nits of peak brightness and, although the maximum peak brightness currently used for content grading is 4,000 nits, the brightest domestic displays can only reach between 1,000 and 2,000 nits. As a result the peak brightness of the content far exceeds the capabilities of the display and, when combined with the colour gamut, the resulting colour volume can be many times larger than that of the display. Thus a process is required that can reduce the colour volume of the content down to the capabilities of each display whilst still retaining as much of the creator's original intentions as possible – this is called tone mapping.
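The 10,000-nit ceiling comes directly from the PQ curve itself, which is defined in SMPTE ST 2084. Here is the decoding side of that curve, using the constants from the standard, which turns a normalised 0–1 signal value into absolute luminance in nits:

```python
# SMPTE ST 2084 (PQ) constants.
m1 = 2610 / 16384        # 0.1593017578125
m2 = 2523 / 4096 * 128   # 78.84375
c1 = 3424 / 4096         # 0.8359375
c2 = 2413 / 4096 * 32    # 18.8515625
c3 = 2392 / 4096 * 32    # 18.6875

def pq_eotf(signal: float) -> float:
    """Decode a PQ-encoded signal value (0..1) to display luminance in nits."""
    e = signal ** (1 / m2)
    return 10000 * (max(e - c1, 0) / (c2 - c3 * e)) ** (1 / m1)

print(pq_eotf(1.0))  # 10000.0 -- the absolute ceiling of the PQ curve
print(pq_eotf(0.0))  # 0.0
```

Note how a code value of 1.0 decodes to exactly 10,000 nits: the signal describes absolute luminance levels that no current consumer display can fully reproduce, which is precisely why tone mapping is unavoidable.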
What is static metadata?
A key part of this tone mapping is the metadata that is included with the content: the more information included, the more accurate the tone mapping can be. When it comes to HDR there are two types of metadata, static and dynamic, with the former being standardised by SMPTE ST 2086 (Mastering Display Colour Volume). This defines the static metadata that is supported by HDMI 2.0a, and is included with mastered HDR content to convey the colour volume of the mastering display and the luminance (brightness) of the content. This is described using the red, green and blue primary colours and white point of the mastering display, plus its black level and peak luminance level. Static metadata also conveys the following luminance attributes of the mastered content:
MaxCLL (Maximum Content Light Level) – The luminance of the brightest pixel in the content, expressed in nits;
MaxFALL (Maximum Frame-Average Light Level) – The average luminance of all the pixels in each frame is first calculated, and the MaxFALL is then the maximum of those frame averages across all the frames in the content, again in nits.
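The two definitions above translate directly into code. This is an illustrative sketch that models each frame as a flat list of per-pixel luminance values in nits (a real implementation would derive luminance from the decoded RGB image):

```python
def max_cll(frames):
    """MaxCLL: luminance of the single brightest pixel anywhere in the content."""
    return max(max(frame) for frame in frames)

def max_fall(frames):
    """MaxFALL: highest frame-average luminance across the whole content."""
    return max(sum(frame) / len(frame) for frame in frames)

# Two toy frames: one dark scene with a single 1,000-nit specular highlight,
# and one uniformly bright scene at 400 nits.
frames = [
    [1000, 10, 10, 10],    # frame average: 257.5 nits
    [400, 400, 400, 400],  # frame average: 400 nits
]
print(max_cll(frames))   # 1000
print(max_fall(frames))  # 400.0
```

Notice that a single specular highlight sets MaxCLL for the entire programme, even though most frames may be far darker, which is exactly the limitation of static metadata discussed below.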
The problem with static metadata is that if the tone mapping is performed without scene-by-scene content information, the mapping will be based only on the brightest scene and the widest gamut scene in the entire content. As a result the majority of the content will have greater compression of dynamic range and colour gamut than should really be necessary. So the less capable an HDR display is, the more important it is that the content is correctly tone mapped. That brings us on to dynamic metadata, which includes more information to help less capable displays tone map the content correctly.
What is dynamic metadata?
Dynamic metadata allows a compatible display to tone map the HDR content to a smaller colour volume only as needed, when the content exceeds the capability of the playback display itself. The metadata can change dynamically, based on the minimum, maximum and average luminance and gamut requirements for each scene. Tone mapping is more important the greater the difference between the mastering display and the playback display, and will be an essential part of the future-proofing of HDR technology. It ensures that playback displays that can do accurate tone mapping will still show content well, even when mastering displays are ultimately at or near the Rec. 2020 colour gamut limits and use a peak brightness of 10,000 nits.
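The benefit of per-scene metadata can be sketched with a simple soft-clip tone map. The shape below (a 75% knee with a linear roll-off) is an arbitrary illustration, not any standard's actual curve; the point is that the mapping parameters come from metadata, so a scene whose own peak fits the display can pass through untouched:

```python
DISPLAY_PEAK = 1000.0  # nits the playback display can reach (illustrative)

def tone_map(nits, content_peak, display_peak=DISPLAY_PEAK):
    """Compress content luminance into the display's range; identity if it fits."""
    if content_peak <= display_peak:
        return min(nits, display_peak)  # scene fits: no mapping needed
    knee = display_peak * 0.75          # start rolling off at 75% of display peak
    if nits <= knee:
        return nits
    # Roll the range [knee, content_peak] off into [knee, display_peak].
    return knee + (nits - knee) * (display_peak - knee) / (content_peak - knee)

# A 900-nit highlight in a scene whose own peak is 900 nits:
print(tone_map(900, content_peak=900))         # 900 -- dynamic: scene fits, untouched
# The same highlight mapped against a whole-film MaxCLL of 4,000 nits:
print(round(tone_map(900, content_peak=4000))) # 762 -- static metadata compresses it
```

With only static metadata every scene is mapped against the whole-film peak, so highlights the display could actually reproduce get dimmed anyway; per-scene metadata avoids exactly that.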
Dynamic metadata has been standardised by SMPTE ST 2094, which defines content-dependent metadata that will be supported in version 2.1 of the HDMI standard. Dynamic metadata will convey frame-by-frame or scene-by-scene tone mapping information that will enable the display to vary the image throughout the entire content. The dynamic metadata determines what parts of the colour volume (both colour gamut and peak brightness) do not need to be used from the original mastered image, while still making sure the image produced is a good reproduction of the original. In practice dynamic metadata will primarily concentrate on scene-by-scene mapping information, but it could also use frame-by-frame information when a scene changes.
Where is dynamic metadata used?
The most common form of High Dynamic Range is HDR10, which currently only uses static metadata as standardised by SMPTE ST 2086. However, with the finalisation of the SMPTE ST 2094 standard and the release of HDMI 2.1, which is also a requirement, we now see dynamic metadata being added to HDR10 in the form of HDR10+ (SMPTE ST 2094-40). Samsung has been instrumental in the development of HDR10+, which is open source and licence-free. Recently Panasonic, Philips, Hisense and TCL have also adopted HDR10+, whilst content-provider support comes from Amazon. At present HDR10+ can be used for streaming but it's currently not part of the Ultra HD Blu-ray specs, although there are plans to add it in the near future.
There is another form of HDR called Dolby Vision (ST 2094-10), and this already supports dynamic metadata. In fact it is one of the main benefits that Dolby has been championing when promoting its format as superior to HDR10 with static metadata. Dolby can certainly lay claim to developing much of the groundwork of the HDR ecosystem. The PQ (Perceptual Quantiser) EOTF on which HDR is based was developed by Dolby and standardised as SMPTE ST 2084. The same goes for both static and dynamic metadata, the basis of which was originally developed by Dolby. Their HDR ecosystem is definitely appealing, with its closed approach to both the mastering display and the playback display, as well as the use of dynamic metadata to ensure an optimal image.
Dolby's dynamic metadata solution enables accurate tone mapping from any arbitrary mastering colour volume to any arbitrary consumer device colour volume, providing future-proof scalability. Dolby Vision is certainly gaining a foothold thanks to support from Netflix and Amazon, who use the format on their streaming services. All the major studios except 20th Century Fox use Dolby Vision for their streamed content, and Sony, Universal, Paramount, Lionsgate and Warners have already released Ultra HD Blu-rays with Dolby Vision included. LG, Sony, Loewe, B&O, Funai, Vizio and Toshiba offer TVs that support Dolby Vision, and LG, Funai, Cambridge Audio and Oppo also offer Ultra HD Blu-ray players that are capable of playing back the new Dolby Vision discs. Perhaps most importantly the Apple TV 4K also supports Dolby Vision, as does iTunes, providing even more content in the format.
Although Dolby Vision and HDR10+ are the two versions of dynamic metadata that have been receiving the most attention recently, in the interests of completeness it's worth mentioning that there are two others. There is one developed by Philips (ST 2094-20) and one developed by Technicolor (ST 2094-30), although the two companies merged their respective HDR roadmaps in 2016 and are now working together.
MORE: What is Dolby Vision?
Is there a difference between static and dynamic metadata?
Although there has been a lot of discussion recently about dynamic metadata, as well as the release of HDMI 2.1 and the perceived benefits of Dolby Vision, the real question is how much difference dynamic metadata will actually make. In our testing to date there's no doubt that dynamic metadata does deliver a better HDR experience, especially on less capable displays. For example, there will probably be scenes within a film that are completely within a display's capabilities and possibly scenes that are completely outside those capabilities, a situation that would be problematic using just static metadata with a single minimum, maximum and average for the entire film. Dynamic metadata ensures that the tone mapping preserves more contrast in both the dark and bright parts of the image.
We have managed to compare the HDR10 and Dolby Vision versions of the same film using two Ultra HD Blu-ray players and an LG B7 OLED TV, and the Dolby Vision experience was superior. Of course the B7 is limited to a peak brightness of about 700 nits, so the addition of dynamic metadata undoubtedly helped. This is where dynamic metadata adds value, by allowing less capable displays to handle the huge colour volume of HDR content, especially when graded at 4,000 nits. By capturing content characteristics scene by scene, the image can be delivered effectively and consistently across a range of different consumer displays. Dynamic metadata provides the best mechanism to do this, by optimising tone mapping when HDR content exceeds the capabilities of a display and minimising mapping when the image is already within those capabilities.
In addition we have also seen demonstrations of the same content encoded with HDR10 and HDR10+, and the HDR10+ images were certainly more detailed in the highlights when compared to the HDR10 footage that used only static metadata. However, once again, where dynamic metadata really adds value is in terms of the tone mapping, so the better the TV, the less important dynamic metadata becomes. A TV that can accurately deliver more than 1,500 nits and 100% of DCI-P3 could produce a detailed and accurate picture even without dynamic metadata, if the content is graded at 1,000 nits and DCI-P3. Of course, once you play content graded at 4,000 nits, the current limit of professional grading monitors, then even a more capable TV has to tone map.
There's no doubt that dynamic metadata, whether delivered via HDR10+ or Dolby Vision, will increase in importance as High Dynamic Range develops. It will certainly play a key role in delivering HDR over video streaming services, and in allowing less capable TVs to accurately tone map content created with a much larger colour volume. The ultimate goal is to deliver optimised images that represent, as closely as possible, the content creator's original intentions – and that's always been what really matters.
All images courtesy of Dolby.