Few things cause as much confusion among Home Theater enthusiasts as the myriad details surrounding Digital Video formats. It is typical to run into settings choices, for example, which come with no useful explanation, nor even advice as to when or why you might prefer one over another. It is also typical to run into non-intuitive limitations: You can't do THIS because you are also trying to do THAT!
In this post, I will attempt to survey the entire topic of Digital Video formats as applied to Home Theater systems. There's way too much material here to cover everything in one post, but I will try to show you how the pieces fit together, and introduce the jargon you will see repeatedly in future posts as I get into more details.
So if you've ever wondered just what "HDMI 4K/24 YCbCr 4:2:2 12-bit HDR10 BT.2020 with HDCP 2.2" actually MEANS (and why the heck you'd need to KNOW that), this post is for you!
In my previous post on Digital Audio I discussed what I called the "life cycle" of Analog and Digital Audio -- i.e., the points where conversions might be necessary between them. The same sort of thing applies to Digital Video, but not to the same degree. The reason is, the industry has worked hard in recent years to eliminate Analog Video cabling from all Consumer Electronics and Home Theater gear: The so called, "Analog Sunset".
This has not been done out of any sort of altruism, nor in the belief that Digital Video is inherently higher quality. Rather, it's been done because keeping video in Digital form allows the industry to impose Copy Protection on it! Copy Protection has been raised to the level of religion among the folks who own the rights to home media content (i.e., the studios). And they will dote on just about any cunning scheme which offers to add more protection against copying their content.
So these days, it is basically impossible to buy a new Blu-ray disc player, for example, which includes Analog Video outputs.
The highest quality form of Analog video connection in OLDER gear (and still used in some professional gear) was called Component Video -- easily recognizable by the fact that it took 3 separate cables to carry the video signal between devices. That number "3" is going to come up a lot in the discussion that follows.
Today's Home Theater equipment uses a type of Digital cabling and signal protocol which rejoices in the name, HDMI (High-Definition Multimedia Interface). And the Copy Protection protocol tacked on top of HDMI is known as HDCP (High-bandwidth Digital Content Protection).
HDMI, which carries both Digital Video and Digital Audio (and, oh my!, many other oddball things as well such as, for example, device remote control signals!) derives from an older cabling standard called DVI which was invented as a simple way to attach computers to their display monitors. Many of the issues with modern HDMI date back to this ancestry.
For example, DVI was a video-only cabling scheme. HDMI adds audio by the trick of embedding the digital audio content INSIDE the digital video content. Which means there is no such thing as an audio-only signal on HDMI cabling! Ever. There must ALWAYS be a video signal so that there's a place to stash the audio. And the audio signal may even be LIMITED by the space available to stash it, inside that video.
The video (and audio) carried on HDMI cabling is ALWAYS Digital -- never Analog. But there are many different flavors of Digital Video which HDMI can carry. Part of the difficulty in playing modern, home media content is getting devices to agree on just what flavor of video (and audio) will be in use at the moment. The source device and the destination device can have very different capabilities in this regard. And the user may have some preferences too, established in the settings in each device.
And so a sort of negotiation has to happen between the devices to determine exactly what will be put on the cable. This negotiation is called an "HDMI Handshake" -- part of which also includes making sure HDCP copy protection is happy with the result. Since copy protection is involved, just about ANY change in the configuration will trigger a new HDMI Handshake. So for example if you change the video Resolution being used, or if the content switches to a different audio format, there's a pause while a new Handshake happens!

HDMI (along with its copy protection) is also an "end to end" format. So ALL of the devices in the HDMI signal chain for video (and audio) participate in each and every Handshake. If you turn a device Off or On, for example, Voila! You trigger a new Handshake because the HDMI chain has now changed.
HDMI handshakes, by design, take a few seconds to perform -- time enough for all the devices involved to get their act together. And during that time, your audio and video content gets Muted. So you WILL notice each time a new HDMI Handshake happens!
The video carried on HDMI is defined by several different characteristics:
- Resolution: The number of lines of video and the number of pixels on each line.
- Interlacing: How the lines of the image are ordered (Progressive or Interlaced).
- Frame Rate: The number of images per second (24 for film, or 50 or 60 for TV).
- Aspect Ratio: The shape of the image.
- Video Encoding Format: A flavor of "RGB" or "YCbCr" encoding.
- Color Depth: The number of bits used to record the data for each pixel.
- Dynamic Range: The brightness range authored into that encoding (Standard or High).
- Color Gamut: The range of color saturations also authored in (SD, HD, or Wide).
You've probably already run into video Resolution terms such as 480i or 1080p or UHD (misleadingly also called 4K). The numbers in 480i and 1080p refer to the number of lines in a complete, single image from the video stream. The small "i" and "p" describe whether a given image is transmitted in Interlaced or Progressive format respectively.
Progressive format is what you might expect to happen: Each line of a given image, from top to bottom, is transmitted in order until you get to the end. Then the next image starts. Interlaced is more confusing: The odd-numbered lines are sent first, followed by the even-numbered lines. Interlaced format reduces the bandwidth (data rate) needed to transmit the video, at the expense of taking twice as long to send each complete image.
Interlaced is further complicated by the fact that the odd and even numbered lines may reflect the same instant in time (as for example if you scanned the image from a still photograph), or some small amount of time may have passed between when the two sets of lines were captured (as when using an older video camera that first captures one set of lines and then goes back to capture the other set of lines). Whatever you are recording may, of course, have moved a little bit during that time, which affects both how the video looks and how it gets processed.
TECHNICAL NOTE: The individual images of the video are called Frames, and the interlaced half-Frames that combine together to make an image are called Fields.
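If it helps to see that in something concrete, here's a little sketch (in Python -- the function name is mine, nothing standard) of "weaving" an odd Field and an even Field back together into one Frame. Which Field comes first in time actually varies by standard, so treat the ordering here as purely illustrative.

```python
def weave_fields(odd_field, even_field):
    """Recombine two interlaced Fields into one full Frame.
    Each Field is a list of lines; each line is a list of pixel values."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)    # lines 1, 3, 5, ...
        frame.append(even_line)   # lines 2, 4, 6, ...
    return frame

# A toy 4-line image split into two 2-line Fields:
odd  = [["line1"], ["line3"]]
even = [["line2"], ["line4"]]
print(weave_fields(odd, even))    # [['line1'], ['line2'], ['line3'], ['line4']]
```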
UHD is a Marketing term for video which is 2160 lines of 3840 pixels each. (Contrast with 1080p video which is 1080 lines of 1920 pixels each.) You can almost hear the Marketing guys thinking that "UHD" would surely sound more impressive than calling their new video merely 2160p. Indeed it didn't take them long to begin conflating UHD with the even MORE misleading term, "4K" to describe this new video! (UHD video is always Progressive.)
Frame Rate, as stated above, refers to the number of images per second. For years, movies have been shot at 24 frames per second. When TV was introduced, the cheapest way to make TVs was to tie their electronics to the local power line frequency: 60 Hz in the US, for example, and 50 Hz in Europe. Thus Frame Rates of 24, 50, and 60 frames per second are all common. The combination of a Resolution and a Frame Rate is often stated together like this: 4K/24 or 1080i/60.
When the video is Interlaced, it takes two such cycles to transmit all the lines representing a complete image. So traditional, Standard Definition TV in the US for example -- which was 480i/60 -- actually transmitted a complete image only 30 times per second.
Content on regular Blu-ray discs might be 1080p/24 or 1080i/60. The full-frame Frame Rate of 1080i/60 content is thus 30 frames per second, so its data rate is more, but only modestly more, than the data rate of 1080p/24 content. Thus both formats take a similar amount of space on disc and a similar data rate when reading the disc.
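As a quick back-of-the-envelope check on that claim (raw pixel counts only -- this ignores blanking intervals and disc compression entirely):

```python
# Raw pixels per second, with no blanking or compression considered.
p24 = 1920 * 1080 * 24      # 1080p/24: 24 full Frames per second
i60 = 1920 * 540 * 60       # 1080i/60: 60 Fields of 540 lines each
print(p24, i60, i60 / p24)  # 49766400 62208000 1.25
```

So 1080i/60 carries roughly 25% more raw pixel data than 1080p/24 -- more, but not dramatically more.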
TECHNICAL NOTE: These Frame Rate figures frequently gloss over the difference between Nominal and True Frame Rates for video. For technical reasons involving how audio and video were combined (modulated) into a single, radio frequency, TV transmission channel without interfering with each other, it turned out to be necessary to LOWER the frame rate of broadcast TV a tiny amount. The amount turned out to be 1/10 of 1%. So a 480i/60 TV broadcast was actually slowed down -- producing 29.97 full Frames each second instead of 30. This slowdown amounts to 1 less Frame every 33 1/3 seconds -- far too small a difference for the eye to see, and well within the abilities of low cost TVs to "re-sync" with each new Frame of video as it arrived slightly late.

If you've ever seen an older TV that had horizontal interference lines slowly rising up the screen, this is exactly what you were seeing. The horizontal interference lines come from power line interference -- at a true 60 Hz -- but the TV frame rate was slightly slower than that! So the interference lines appeared slightly closer to the top of each successive Frame. Meanwhile, movies prepped for TV broadcast also had to be slowed down to match. Again by 1/10 of 1%. So the True Frame Rate for films on TV was 23.976 frames per second instead of 24. These days, films on home media might be 23.976, or 24.000, or even 25.000 (from Europe), but we typically gloss over that and refer to film rate as just /24.
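For the curious, that "1/10 of 1%" is really the factor 1000/1001 (a hair under 0.1%). The arithmetic, as a quick sketch:

```python
factor = 1000 / 1001            # the NTSC slowdown factor

print(30 * factor)              # 29.97002997... full Frames/sec for 480i/60
print(24 * factor)              # 23.97602397... for film prepped for TV
print(1 / (30 - 30 * factor))   # ~33.4 seconds to fall one full Frame behind
```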
If you read my previous post on Standard Definition Video Aspect Ratio, you'll recall that SD Video might be either 4:3 in shape or 16:9. Standard Definition refers to any video format with a Resolution under 720p. So 480i and 480p in the US and 576i and 576p in Europe.
720p, 1080i, and 1080p video are always 16:9 in Aspect Ratio (although the content inside the video may be a different shape -- padded out to 16:9 with Letter Box Bars or Pillar Box Bars embedded in that content).
UHD video, at present, is also always 16:9. However the UHD Blu-ray specifications include a feature -- not currently used by the studios -- to author "Scope" aspect ratio content, at an Aspect Ratio of 21:9. Expect this to become a point of confusion in the future.
So now we've got the image Geometry and Frame Rate nailed down. Which means it's time to get into the REALLY complicated stuff!
Are you sitting comfortably? Good. Then let's begin.
Video for Home Theater can be authored, and transmitted, in several different "Encoding Formats". Conversion BETWEEN these formats is common. For example, the format used when storing a movie on a Blu-ray disc may be different from the format used to transmit that movie to your TV. And THAT may be different from the format used inside the TV while it is doing its video processing.
The simplest Encoding Format to describe is called "RGB". In RGB, each pixel of an image is defined by a Red, Green, and Blue value -- called Components. If there's no Red or Green or Blue you get Black. If there are equal amounts of Red, Green and Blue you get a gray somewhere in the range from Black to White.
The 3 Components for each pixel share a Color Depth -- which is simply the number of bits used to represent each Component. Color Depth choices are 8-bits, 10-bits, and 12-bits.
So an 8-bit Color Depth means that each pixel is represented by 24-bits in total. A 10-bit Color Depth is 30 bits per pixel, and a 12-bit Color Depth is 36 bits per pixel.
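Or, in quick code form (raw, uncompressed arithmetic only -- no subsampling or compression considered yet):

```python
# Uncompressed size of a single Frame at each Color Depth (RGB or YCbCr 4:4:4).
for depth in (8, 10, 12):
    bits_per_pixel = depth * 3                       # 3 Components per pixel
    for name, w, h in (("1080p", 1920, 1080), ("UHD", 3840, 2160)):
        megabytes = w * h * bits_per_pixel / 8 / 1_000_000
        print(f"{name} at {depth}-bit: {bits_per_pixel} bits/pixel, ~{megabytes:.0f} MB per Frame")
```

(For scale: a single uncompressed UHD Frame at 8-bit is already around 25 MB. Video compression earns its keep.)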
The vast bulk of home media content out there -- whether broadcast on TV, or streamed over the Internet, or recorded on a movie disc -- is authored at 8-bit Color Depth. 24 bits per pixel. This is the world of Standard Dynamic Range (SDR) content.
However, the brave new worlds of UHD TV, UHD Internet streaming, and UHD Blu-ray discs have introduced High Dynamic Range (HDR) content. There are several competing formats for HDR content in the market right now. The most common, and the baseline format for UHD Blu-ray discs, is known as "HDR10". As the name implies, HDR10 uses 10-bit Color Depth.
A competing HDR format from Dolby Labs called "Dolby Vision" uses 12-bit Color Depth.
But since content of 8-bit Color Depth is so ubiquitous, I'm going to focus on that for the discussion of Encoding Formats here. Just keep in mind that everything I'm about to say has corresponding results for content of 10 and 12-bit Color Depth.
In 8-bit Color Depth, each Component of the RGB for a given pixel can take on just 256 values -- 0 through 255. That's the most you can represent in an 8-bit number.
You might naturally expect that Black would be defined as Red, Green and Blue values of 0 each. HOWEVER, content for Home Theater is NOT authored that way! RGB for Home Theater is authored so that Black is represented by a value of 16 for each of the Red, Green and Blue Components.
The same thing happens at the White end. Reference White is defined as a value of 235 for each Component.
The values from 1 to 15 are called the "Blacker Than Black" values. The values from 236 to 254 are called the "Peak White" values. (The values 0 and 255 are both reserved for special purposes.)
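Laid out in code (a sketch -- the function name is mine, not anything official):

```python
def classify_8bit_code(value):
    """Label an 8-bit Studio-level code value."""
    if value in (0, 255):
        return "reserved"
    if value < 16:
        return "Blacker Than Black (foot room)"
    if value == 16:
        return "Black"
    if value < 235:
        return "normal picture range"
    if value == 235:
        return "Reference White"
    return "Peak White (head room)"      # 236 through 254

for v in (0, 5, 16, 120, 235, 245, 255):
    print(v, "->", classify_8bit_code(v))
```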
WHY do this? Well it turns out that Digital Signal Processing of video has problems if the imaging data has a sharp cutoff. You get "artifacts" in the video unless things are allowed to float around a bit either side of Black and Reference White.
So the Video Encoding is designed to include both "foot room" -- the Blacker Than Black Values -- and "head room" -- the Peak White values. And these ranges are preserved through all the stages of the video chain starting with when the content is first digitized (perhaps in the camera), through all of the editing, through delivery via broadcast or streaming or on physical media, through the processing in your playback device, and through the processing in your TV.
Even though there is real pixel content in the Blacker Than Black range, it is NOT intended to be seen! If your TV is set up correctly, any pixel with a Blacker Than Black value will be indistinguishable from just Black. All of that stuff will merge into a single, uniform Black.
The White end is a bit different. Every TV has a limit as to how much light it can put out. If you try to push beyond that limit the pixels won't get any brighter. The brightness will just top out at some point. And indeed that limit may be different for different colors -- meaning really bright pixels may be tinged with color when they are supposed to be white.
On the other hand, you want ENOUGH light output from the TV to produce a pleasing, comfortably bright level for White.
So when setting up a TV, you start by adjusting its light output for Reference White. Then you look at what happens when the TV is fed pixels with values in the Peak White range. If those clip -- that is, if they are not distinguishable -- you could lower the TV's light output for Reference White to give it more range to display the Peak Whites. But you don't want to lower it so much that Reference White looks dingy gray. You want Reference White to be a clean White. And you definitely want Reference White to be truly White -- not tinged with color because one or more of the colors is ALREADY clipping.
From an authoring point of view, the content editors are supposed to ensure that all of the critical content of their movie or TV show is in the range from Black to Reference White. However, they are also free to include interesting content -- sparks, glints, and cloud highlights for example -- that falls in the Peak White range. They just need to stay aware that it may not be possible for some TVs to render the Peak Whites.
TECHNICAL NOTE: There are actually two flavors of RGB you will encounter. The one I've just described is most commonly called either "Studio RGB" or "RGB Video Level", and it is the one you want to use for Home Theater unless your gear can't handle that properly. The other flavor of RGB is called a variety of names, but "Extended RGB", "Enhanced RGB", and "RGB PC Level" are the most common. This flavor of RGB defines Black as 0 and Reference White as 255. Note that since no other values are available, there is no way in Extended RGB to encode pixels with values in the Blacker Than Black range or the Peak White range.

Extended RGB is most commonly used when the source device is creating the video on the fly, and is DIRECTLY connected to its display monitor. This would be the case for computers and for video game consoles, for example. They can get away with the lack of foot room and head room in the video because there is essentially no video processing going on between the creation of the video and the pixels lighting up. The Extended and Enhanced names were created by Marketing folks who were trying to make the most out of the fact that the full range from 0-255 was being used. In reality, this is NOT a format you want to use for Home Media content or for Home Theater setup -- again, unless you have some piece of equipment which simply won't work well for any other video encoding format.

Note also that the HDMI Handshake does not negotiate whether Studio RGB or Enhanced RGB will be used whenever RGB itself is in use. This is a MANUAL choice you have to make in the source and destination devices. Look for a setting which has to do with "Black Levels" and which offers only two choices. Alas, the setting and choices may be called just about anything. But if you use RGB and get this setting confused, your Black and White levels will be noticeably wrong. This is a major point of confusion for folks who try to use RGB video.
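To see why getting that Black Level setting wrong is so visible -- and why the conversion is best avoided in the first place -- here's a sketch of the level stretch a device has to perform when converting Studio RGB (16-235) to PC Level RGB (0-255). Note how the foot room and head room simply get clipped away, and the rounding involved is exactly where banding can creep in. (These function names are mine, purely for illustration.)

```python
def studio_to_pc(value):
    """Stretch an 8-bit Studio-level Component (Black=16, White=235)
    to PC level (Black=0, White=255). Foot room and head room are clipped."""
    stretched = (value - 16) * 255 / (235 - 16)
    return max(0, min(255, round(stretched)))

def pc_to_studio(value):
    """The reverse: squeeze 0-255 down into 16-235."""
    return round(value * (235 - 16) / 255) + 16

print(studio_to_pc(16), studio_to_pc(235))   # 0 255 -- Black and Reference White
print(studio_to_pc(10), studio_to_pc(240))   # 0 255 -- Blacker Than Black and Peak White are lost
print(pc_to_studio(0), pc_to_studio(255))    # 16 235
```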
As simple as it is to describe, it turns out RGB Video Encoding is NOT what gets used for Home Theater content authoring. Instead what gets used is something called "YCbCr"!
Just as with RGB, YCbCr uses three Components to describe a pixel. The first Component describes a gray scale brightness, known as Luminance. The other two Components carry "color difference" values telling how much Blue and Red to either add or subtract from that gray to get the desired color. For example, subtract ALL the Blue and Red from the gray and you end up with a Green of the given Luminance brightness.
The Luminance Component is confusingly labeled "Y". There's an historical reason for that -- "L" was already in use -- but it's not really necessary to get into that. The Cb and Cr Components are the Blue color difference and Red color difference respectively. I've attached a picture showing how Cb and Cr translate into the range of available colors for a fixed (halfway) value of Y.
TECHNICAL NOTE: You may also encounter the name YPbPr. This is basically the same thing, but technically YPbPr refers to an Analog video signal -- as carried on those 3 Component Video cables I mentioned up top -- whereas YCbCr refers to a Digital video signal.
YCbCr is the SAME as Studio RGB in terms of preserving data ranges for Blacker Than Black and Peak White values. That is, in 8-bit Color Depth, Black is represented by a Y value of 16 and Reference White is represented by a Y value of 235. (For technical reasons the Reference Cb and Cr values go up to 240.)
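For the mathematically curious, here's a sketch of how a normalized RGB pixel gets turned into 8-bit Studio-level YCbCr, using the BT.709 Luminance weights that apply to HD content. (The weights differ for BT.601 and BT.2020 material, and real converters are fussier about rounding and clipping -- this just shows the shape of the math.)

```python
def rgb_to_ycbcr_709_8bit(r, g, b):
    """Convert R, G, B in the range 0.0-1.0 to 8-bit Studio-level Y, Cb, Cr
    using the BT.709 (HD) Luminance coefficients."""
    kr, kb = 0.2126, 0.0722           # BT.709 red and blue weights
    kg = 1.0 - kr - kb                # green weight, 0.7152
    y  = kr * r + kg * g + kb * b     # Luminance, 0.0 to 1.0
    cb = (b - y) / (2 * (1 - kb))     # Blue color difference, -0.5 to +0.5
    cr = (r - y) / (2 * (1 - kr))     # Red color difference,  -0.5 to +0.5
    # Quantize: Y spans 16-235; Cb and Cr span 16-240, centered on 128.
    return (round(16 + 219 * y), round(128 + 224 * cb), round(128 + 224 * cr))

print(rgb_to_ycbcr_709_8bit(0, 0, 0))   # (16, 128, 128)  -- Black
print(rgb_to_ycbcr_709_8bit(1, 1, 1))   # (235, 128, 128) -- Reference White
print(rgb_to_ycbcr_709_8bit(0, 1, 0))   # pure Green: Cb and Cr both well below 128
```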
As it turns out the YCbCr encoding and the RGB encoding do have different characteristics when doing the Digital Signal Processing to process your Digital Video. But the MAIN reason YCbCr is used is entirely different.
It has been known for a long time that the human eye is less sensitive to fine spatial detail in color than in gray scale: it is simply easier for the eye to pick out fine detail in brightness than in color.
This physiological fact has been used since the dawn of color TV to reduce the bandwidth (data rate) needed to transmit and store color video. How? By only including color information half as often as gray scale information!
Think of it as drawing the fine detail in pencil and then washing color over that using a thicker brush.
The Video Encoding Format jargon gets PARTICULARLY obscure for this. To wit:
- "YCbCr 4:4:4" means that all three components are present for EVERY pixel.
- "YCbCr 4:2:2" means that color is present only HALF as often horizontally.
- "YCbCr 4:2:0" means that color is present only HALF as often BOTH horizontally and vertically!
So if you think of a stream of Digital Video Component values coming along to represent a line of video, YCbCr 4:4:4 would look like this:
Y, Cb, Cr, Y, Cb, Cr, Y, Cb, Cr....
In comparison, YCbCr 4:2:2 would look like this:
Y, Cb, Y, Cr, Y, Cb, Y, Cr.....
Meanwhile YCbCr 4:2:0 extends the same behavior of dropping color Components to successive lines of video. That doesn't represent well in text, so I'll just gloss over that.
Again, this only works because the gray scale brightness of each pixel -- its Luminance -- is separately recorded in this format. You can't DO this with RGB encoding! Each one of the 3 Components of RGB is partly responsible for the brightness of the pixel. And indeed RGB encoding is always, "like YCbCr 4:4:4", in that all three Components are transmitted for each and every pixel.
The jargon here -- 4:4:4, 4:2:2, and 4:2:0 -- is truly something man is not meant to wot of, so take my advice and don't even try. Just remember the descriptions for each I just gave.
It should be evident the storage space for a frame of video, and the bandwidth (data rate) for transmitting that frame of video, goes DOWN if you don't transmit the color Components for every pixel.
Indeed, for a given Color Depth RGB video and YCbCr 4:4:4 consume the same bandwidth. YCbCr 4:2:2, on the other hand, consumes less bandwidth, and YCbCr 4:2:0 consumes less bandwidth still.
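Here's roughly what that works out to for 4K/60 at 8-bit Color Depth (raw picture data only -- no blanking intervals or HDMI signaling overhead included):

```python
# Average Components per pixel for each Encoding Format:
#   RGB or YCbCr 4:4:4 -> 3    (every Component for every pixel)
#   YCbCr 4:2:2        -> 2    (Cb/Cr shared between each horizontal pair)
#   YCbCr 4:2:0        -> 1.5  (Cb/Cr shared by each 2x2 block of pixels)
formats = {"RGB / YCbCr 4:4:4": 3.0, "YCbCr 4:2:2": 2.0, "YCbCr 4:2:0": 1.5}

width, height, frame_rate, depth = 3840, 2160, 60, 8     # 4K/60 at 8-bit
for name, components in formats.items():
    gbps = width * height * frame_rate * components * depth / 1e9
    print(f"{name}: ~{gbps:.1f} Gbps of raw picture data")
# ~11.9 Gbps, ~8.0 Gbps, and ~6.0 Gbps respectively
```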
But the pixels on your TV can not light up until a color is assigned to every pixel!
That means YCbCr 4:2:0 and YCbCr 4:2:2 have to be reconstituted -- SOMEPLACE -- back to YCbCr 4:4:4 before the pixels can light up!
This process -- called "Color Upsampling" -- is a type of scaling from lower resolution to higher resolution -- but specifically for the color data.
Ignoring, for the moment, the High Dynamic Range formats I alluded to further up, ALL of the content for Home Theater Media is authored at 8-bit Color Depth, and is stored on physical media using YCbCr 4:2:0 encoding. Think of it as a kind of first step of data compression -- reducing the necessary storage space and also reducing the data rate needed to read content off that media.
TECHNICAL NOTE: The actual storage formats for Home Theater Media include additional, highly sophisticated compression techniques which I'll gloss over in this post. For example, video might be recorded using MPEG-2, AVC, or HEVC compression. These further reduce -- quite dramatically -- the necessary storage space and data read rates: Something that's absolutely essential to get the huge amount of data represented by a movie or TV show compact enough to fit onto reasonably priced storage media like a Blu-ray disc. In the process of reading the movie off the disc, these storage formats get expanded by, for example, your Blu-ray player, into the real video stream which can be sent to your TV over the HDMI cable. But the result of that expansion is, as just mentioned, YCbCr 4:2:0. That is, the studios author YCbCr 4:2:0 and then compress THAT for storage onto the disc.
The Color Upsampling of YCbCr 4:2:0 involves two steps. Color detail is interpolated vertically, resulting in YCbCr 4:2:2, and then also horizontally resulting in YCbCr 4:4:4.
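Here's a minimal sketch of what that two-step Color Upsampling might look like, using simple averaging of neighboring chroma samples. (Real devices use much fancier interpolation filters, and the exact sample positions depend on the standard -- this is only meant to show where the vertical and horizontal steps fit.)

```python
def upsample_chroma_vertically(chroma):
    """4:2:0 -> 4:2:2: double the number of chroma LINES by averaging
    each line with the one below it. 'chroma' is a list of lists of samples."""
    out = []
    for i, line in enumerate(chroma):
        below = chroma[min(i + 1, len(chroma) - 1)]
        out.append(line)
        out.append([(a + b) / 2 for a, b in zip(line, below)])
    return out

def upsample_chroma_horizontally(chroma):
    """4:2:2 -> 4:4:4: double the number of chroma SAMPLES on each line."""
    out = []
    for line in chroma:
        new_line = []
        for i, sample in enumerate(line):
            new_line.append(sample)
            new_line.append((sample + line[min(i + 1, len(line) - 1)]) / 2)
        out.append(new_line)
    return out

cb_420 = [[100, 120], [140, 160]]              # a tiny 4:2:0 Cb plane
cb_444 = upsample_chroma_horizontally(upsample_chroma_vertically(cb_420))
print(cb_444)                                  # now 4 lines of 4 samples each
```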
As it turns out, all THREE of those flavors of YCbCr can be sent over the HDMI cable! If the source device sends out YCbCr 4:4:4, that means it has done all the work of Color Upsampling -- something you might prefer if you have faith in that device. But YCbCr 4:4:4 is also the highest bandwidth signal to go out on the HDMI cable (for a given Resolution, Frame Rate, and Color Depth). And that might be a problem if your HDMI cabling is not able to handle that bandwidth reliably -- for example if you have a long cable run to a projector.
Sending YCbCr 4:2:2 means the source does half the work of Color Upsampling. The rest is left to your TV. But the bandwidth of the signal is less, which may make your HDMI cabling more reliable.
Well then, since the content is only YCbCr 4:2:0 to begin with, and since the work of Color Upsampling is going to get done regardless before the pixels light up on the TV, why not send YCbCr 4:2:0 over the HDMI cable and thus use only the smallest amount of bandwidth?
The problem is that Color Upsampling starting from 4:2:0 requires that you buffer multiple lines of video. That is, you need data from adjacent lines to establish the color values of the pixels in THIS line.
That raises the cost and complexity of the TVs.
So broadcast video is sent as 4:2:2 instead of 4:2:0, and until fairly recently video over HDMI cabling was also limited to 4:2:2 or 4:4:4.
What happened fairly recently? UHD (4K) video is what!
Sending 4K video over HDMI cabling put a distinct strain on the design specs for the HDMI cabling. This was particularly true for 4K/60 video.
There's no point having a 4K TV if the cabling can't get the video into the TV! So an exception was made -- bought into by the TV manufacturers -- that new TVs would be designed to accept and properly process 4K/60 YCbCr 4:2:0 and 4K/50 YCbCr 4:2:0 video -- both limited to 8-bit Color Depth. These were the first generation of 4K TVs: BEFORE High Dynamic Range and Wide Color Gamut got added into the mix!
Which probably reminds you that from my original list of Digital Video characteristics, up above, I've neglected to discuss Color Gamut!
Color Gamut refers to the range of colors the video can represent.
The human eye is a remarkable instrument for perceiving color. Indeed the ability of the human eye in this regard EXCEEDS what can be reproduced as color on TVs, on movie film, on photographs, and on the printed page.
The wish, of course, for any type of media, is to come as close as possible to reproducing the entire range of light and color the eye can see. But practical matters prevent that. The technology simply does not exist -- even if you disregard the cost!
This was most sorely felt when Color TV was first invented. Both the TV cameras and the actual home TVs were simply not very good at color!
The standards around this have evolved both in substance and in name over the years, but by the time Standard Definition Digital Video came about -- as, for example, what's found on SD-DVD discs -- the governing standard had come to be known as BT.601. This is the Standard Definition Video Color Gamut, dating all the way back to 1982.
BT.601 substantially limits the maximum saturations of colors -- way less than the limits the human eye can see. You can show Red, but you can't show a really PURE Red.
When HDTVs were introduced, one of the biggest changes -- perhaps even more impressive on first glance than the increase in image Resolution -- was the introduction of the new, High Definition Video Color Gamut, named BT.709! Although HD Color Gamut was a big improvement over the SD version, it was still dramatically less than what the human eye could see. Again the limiting factor was the technology available. Even the new HDTVs could not reproduce much beyond what BT.709 called for.
With the advent of UHD (4K) video, it was time for another change. This time the standards bodies decided to go BEYOND the limits of current technology. The new Color Gamut for UHD video -- called BT.2020 -- would allow a range of colors WAY bigger than BT.709. And indeed, way bigger than any video cameras or displays -- at ANY price -- could handle! (But STILL smaller than the full gamut the human eye can perceive!)
The agreement was that studios and TV makers would, for now, target a subset of BT.2020 -- called "P3" -- which was closer to what current technology could handle -- and similar to what was being used in the new, Digital movie theaters.
But that P3 gamut is always stored and delivered in a BT.2020 container. And so as technology evolves, and both content creation and display technologies extend further and further into the realm of increased color saturation, the BT.2020 standard will ALREADY be there. Which, with any luck, will make it easier to introduce such new technologies as time goes by!
And thus we get to the next generation of UHD (4K) TVs: Those that were engineered to handle the higher light output requirements of High Dynamic Range (HDR) content, while also coming closer to the ability to reproduce at least the P3 subset of the new, BT.2020 Color Gamut -- now rejoicing in the Marketing name of Wide Color Gamut (WCG).
And that meant putting MORE data on the discs and transmitting MORE data over the HDMI cable (or broadcast TV channel, or Internet stream)!
In the case of HDMI, that required new hardware designs to handle the higher bandwidth signals. You'll recall I mentioned that the HDR10 style of High Dynamic Range requires 10-bit Color Depth and Dolby Vision requires 12-bit.
And some of this content would be 4K/60! So the old workaround of having TVs handle 4K/60 with YCbCr 4:2:0 at only 8-bit Color Depth would not fly.
So the HDMI spec was expanded to include higher bandwidth formats -- in new model source devices and TVs, of course.
The new standard allows 4:2:0, 4:2:2, and 4:4:4 all the way up to a Color Depth of 12 bits, but with some gotchas:
- YCbCr 4:2:0 would STILL remain limited to either 4K/60 or 4K/50 video. Trying to send 4K/24 4:2:0, or even 1080p/60 4:2:0, at ANY bit depth, would not be a legal HDMI format.
- The newly-legal combo of 4K/60 YCbCr 4:4:4 would be limited to 8-bit Color Depth, as anything higher would exceed the bandwidth the HDMI cables could carry. Thus if you were trying to send HDR10 or Dolby Vision content (which require 10-bit and 12-bit respectively) the YCbCr format would also have to be reduced -- typically to 4:2:2.
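To see where those gotchas come from, here's a rough bandwidth sketch. The 18 Gbps figure is the approximate ceiling of the expanded (HDMI 2.0 era) spec, and the 1.5x multiplier is just a crude stand-in for blanking intervals plus signaling overhead -- so treat these as ballpark numbers, not exact spec values:

```python
LIMIT_GBPS = 18.0     # approximate ceiling of the expanded HDMI spec
OVERHEAD   = 1.5      # crude allowance for blanking intervals + signaling overhead

def rough_hdmi_gbps(width, height, frame_rate, depth, components_avg):
    raw_bits = width * height * frame_rate * depth * components_avg
    return raw_bits * OVERHEAD / 1e9

combos = [("4K/60 4:4:4  8-bit",  8, 3.0),
          ("4K/60 4:4:4 10-bit", 10, 3.0),
          ("4K/60 4:2:2 12-bit", 12, 2.0),
          ("4K/60 4:2:0 12-bit", 12, 1.5)]

for name, depth, comps in combos:
    gbps = rough_hdmi_gbps(3840, 2160, 60, depth, comps)
    verdict = "fits" if gbps <= LIMIT_GBPS else "over the limit"
    print(f"{name}: ~{gbps:.1f} Gbps -- {verdict}")
```

The pattern that falls out matches the gotchas above: 4K/60 4:4:4 squeaks through only at 8-bit, while 4:2:2 and 4:2:0 leave enough headroom for 10-bit and 12-bit Color Depth.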
And while they were at it, the industry ALSO tossed in a new, even more finicky, version of HDCP copy protection -- called HDCP 2.2. So you can't even GET this content into a device unless it has also been updated to be compatible with HDCP 2.2!
So there you have it: The complete lay of the land for Digital Video formats!
Which only leaves the question, which one should I USE?
The first recommendation is don't use the "Extended"/"Enhanced"/"PC Level" version of RGB format unless you have a device that really needs that. This might be the case if you were trying to use a Computer Monitor as your TV screen, for example. Some older game consoles are also designed to work best with this form of output.
Using that form of RGB for normal, Home Media content means that the video levels in the content need to be stretched to fit. On top of which, any Blacker Than Black pixels get clipped to Black and any Peak White pixels get clipped to Reference White. This massaging of the content can result in rounding errors which you will likely see as banding.
The next recommendation is to use a higher Color Depth if your cabling can tolerate it. Now don't expect any substantial change in your video from raising the output from 8-bit to 10-bit or even 12-bit. Why? Because the CONTENT is authored at 8-bit. That means the extra bits you transmit over the HDMI cable for 10-bit or 12-bit can only represent ROUNDING results coming out of the video processing in your source device. There will likely be a difference, but at best it should be quite subtle. Indeed, depending on your TV, if you send 10-bit or 12-bit the TV may strip that back to 8-bit as the first step upon input! This is less likely to happen in the newest UHD TVs, since they need to handle the higher Color Depth for HDR content.
There's a proviso here: This assumes there are no video processing bugs in either your source device or the TV! TVs are complicated beasts and there certainly exist cases of TVs that handle some input formats worse than other formats. Since this would represent a design flaw in the TV -- a "bug" -- there's no rhyme or reason to it. You just try the various format choices of interest to you, and if you discover your TV is doing a noticeably poorer job with some format combo, simply discard that one from consideration.
TECHNICAL NOTE: Due to the change of HDMI specs to allow for the brave new world of UHD High Dynamic Range and Wide Color Gamut, there's substantial concern among TV makers that their newfangled HDMI Inputs will turn out to be incompatible with older HDMI source devices! For example, when the source asks the TV what it can accept as input, the source may get befuddled when all these new, unexpected possibilities are returned! To prevent this, there will typically be a setting in the TV to make the HDMI input work in the OLDfangled way -- and indeed, oldfangled will likely be the factory default. So if you don't get 4K with HDR working into your TV, odds are you simply have to find that setting and change the HDMI input on the TV to newfangled. This is just the sort of stuff Marketing people love to invent names for, so the setting could be called just about anything. For example, on recent UHD TVs from LG Electronics, the setting is called "HDMI Ultra HD Deep Color", and it defaults to Off for each HDMI Input.
That leaves the question of Studio RGB vs. one of the 3 flavors of YCbCr (within the HDMI Resolution and Frame Rate combos for which they are legal).
The theoretical answer is IT SHOULD NOT MAKE A DIFFERENCE! That is, given the way the content is authored, and assuming the TV is doing its video processing correctly, you should get the SAME result with any of these!
The practical answer is you have to be prepared for possible bugs in the TV, just as with Color Depth. So you TRY the various formats of interest to you, and if you see any that are being handled poorly you just discard those from consideration. But keep in mind, the CORRECT answer here is that you WON'T see any difference!
So if you have a set of formats that all seem to work equally well, how do you choose?
Well in my case I reasoned like this: I want the smallest possible change in video format going into my TV in the face of different forms of content. Why? Because unless you take the time to check, you don't really know whether your TV has some sort of calibration problem with each format. Checking takes time. So the fewer formats you have to check, the better.
And for me, that means I use YCbCr 4:2:2 at 12-bit Color Depth into my TV.
This is legal for both 4K/24 4:2:2 12b and 4K/60 4:2:2 12b.
If I used 4:4:4, for example, 4K/60 4:4:4 would be limited to 8-bit due to the HDMI bandwidth limitation I mentioned above.
Meanwhile, some folks have a challenge getting their HDMI cabling to work. It ALMOST works but they have problems with the highest bandwidth signals -- e.g., when they try to send 4K/60.
For those folks, the combo 4K/60 4:2:0 -- even at 12-bit -- may be just the ticket, lowering the bandwidth on the HDMI cable just enough to make their 4K/60 reliable.
Hopefully this post has given you enough background that you can now consider such choices with reasonable confidence you are not clobbering your video Picture Quality!
As I said in another post, we've covered A LOT of ground here. Don't expect to have it all sink in at once. Just keep in mind you can refer back to this post as questions arise or jargon rears its head!
--Bob