Understanding Audio Downmix and Surround Sound Processing, OR Wait! I've Got the Wrong Number of Speakers?

It used to be so SIMPLE, back in 1898, when Francis Barraud painted his brother's dog, Nipper, staring intently into the brass horn of a wind up phonograph and hearing, His Master's Voice!

One audio channel.  One speaker.  One dog.  And one enduring trademark!

One has to wonder how Nipper might react to today's Dolby Atmos installations for Home Theater (or any of its competitors); a technology designed to support up to 24 speakers at ear level, a special bass audio channel, and 10 ADDITIONAL "height" speakers overhead!  Would Nipper still stare in wonder?  Or would he dive under the sofa?

The answer to that question likely revolves around how INTELLIGENTLY those speakers were used!  In particular, what if the content you are playing has a different number of audio channels from the number of speakers you have installed?  This is the realm of audio Downmix and Surround Sound Processing, and that's our topic for today.

Downmixing is what happens when you need to render MORE content channels into FEWER speakers.

Surround Sound Processing is what happens when you want to render FEWER content channels into MORE speakers.

Downmixing is essential.  If you don't have speakers to match all the channels in the content, and fail to downmix the audio, some content channels simply get lost.  You won't hear what's in those channels.

Surround Sound Processing is optional.  It's purely a matter of taste whether you want to use some or all of the extra speakers in your configuration (i.e., beyond the ones which match the content channels).

From a user perspective, Downmixing is simple:  There are standard methods for doing it, and so there are no real choices you need to make other than making sure it is enabled.  Surround Sound Processing, on the other hand, may present a variety of choices you need to make as to exactly HOW it is done -- i.e., how your system figures out which audio to send to your extra speakers.  Usually there is no "right" answer for those choices.  Just as when deciding whether to use Surround Sound Processing in the first place, deciding HOW you want it to work is also purely a matter of taste.

Today, all of this falls into the realm of Digital Audio processing, and may be taking place in ANY of your gear which handles Digital Audio.  But for the most part, Downmix and Surround Sound Processing are the job of the main, Digital Audio processor in your setup -- typically an Audio Video Receiver (AVR).

(Before we go further, now might be a good time to review my post on Digital Audio 101, as the discussion below is based on concepts detailed there.)

It was the introduction of Stereo audio which started us down this path.

Suppose you want to play one of those newfangled, Stereo records but only have one speaker?  You could just play the Left or Right channel, of course, but better would be to mix the two channels together to send to your Monaural speaker, so you hear ALL the content captured in those two channels.  That's an example of Downmix, and initially it would have been done using Analog audio circuitry.
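As a rough illustration of that Mono Downmix, here is a minimal Python sketch.  It assumes each channel is a list of samples between -1.0 and 1.0.  The function name is made up for this example, and the gear of the day did this with Analog circuitry rather than code.

```python
def downmix_stereo_to_mono(left, right):
    """Mix two equal-length channels into a single Mono channel."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    # Halve each channel before summing so two loud channels
    # cannot add up to more than the maximum sample value.
    return [0.5 * l + 0.5 * r for l, r in zip(left, right)]


mono = downmix_stereo_to_mono([1.0, 0.0, -0.5], [1.0, 0.5, 0.5])
```

Content present in only one channel still makes it into the Mono result, just at reduced level, which is the whole point of mixing rather than dropping a channel.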

Or suppose you've invested in a nifty, new Stereo system, but still have older, Mono records?  Well, you want that Mono audio to come out on BOTH speakers, identically timed and in proper volume balance.  If done right, the two speakers will combine to produce the desired, Monaural effect.  This is the simplest example of Surround Sound Processing (although it would not have been called that at the time), and again, it would have been done using Analog audio circuitry.

Fancier Analog audio circuitry might, optionally, add some "de-correlation" to the audio going to those two speakers -- basically shifting the timing subtly.  This would cause you to hear the sound as more "spread out".  And the maker could market this as Simulated Stereo, and charge more money for it!  This would be the simplest example of the type of processing CHOICES you might have to make when using Surround Sound Processing.
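To make the idea concrete, here is a small Python sketch of both cases: plain Mono sent identically to two speakers, and a crude "Simulated Stereo" made by delaying one side by a few samples.  The function name and the delay parameter are hypothetical, and real de-correlation circuits were likely more elaborate than a simple delay.

```python
def mono_to_simulated_stereo(mono, delay_samples=0):
    """Feed Mono to two speakers, optionally delaying the right side.

    delay_samples=0 gives identical Left and Right (plain Mono playback).
    A small positive delay shifts the timing of one side, which is the
    crude de-correlation trick sketched here.
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must not be negative")
    left = list(mono)
    # Pad the front with silence, then trim to the original length.
    right = ([0.0] * delay_samples + list(mono))[:len(mono)]
    return left, right


plain_left, plain_right = mono_to_simulated_stereo([1.0, 2.0, 3.0, 4.0])
wide_left, wide_right = mono_to_simulated_stereo([1.0, 2.0, 3.0, 4.0], delay_samples=2)
```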

All of that was pretty exciting for its time, but elementary compared to what was ABOUT to happen.  And THAT came about because movie studios and theater operators started panicking about television -- worrying folks would stay home and watch TV instead of coming to the movies!  Multi-channel audio (along with widescreen presentations) was seen as key to getting paying customers back into the seats!

See my post on Checking Your Speaker Distances and Polarities for some of the history.  Other, more harebrained, enticements were tried as well.  For example, it's no coincidence a new wave of 3D films was produced in the early 50s.  And don't forget Smell-o-Vision!

To begin with, the "ambient" Surround sound for these movies was extracted on-the-fly from the Stereo audio track for the feature.  The single Surround channel would be sent to speakers on the back and side walls of the theater.

All of this also coincided with the development of the first Digital Audio processing devices -- targeted at movie studios and theater operators.  The first such devices allowed studios to "Matrix" a Surround track into the feature Stereo audio.  When combined with matching electronics in the theater, this resulted in a more accurate rendition of the desired Surround ambience audio.  This pretty rapidly evolved into systems for creating and playing true, discrete, multi-channel movie tracks.

The next trick was to get all this into people's homes!  To do that you had to convince people to buy additional Surround speakers -- along with amps to drive them.  And you also had to convince them to buy digital sound processing electronics which could steer audio to those speakers.

Digital Audio processing technology was coming down in price at a pretty rapid clip, but it was still quite spendy for the average homeowner, compared to just buying a TV for example.  Folks who were into high end music systems were more likely customers -- used to the high prices of such exotic equipment.  But there was no tradition of surround sound in music.  Nothing like what the movie studios were touting in the theaters.  So it was a bit of a slog.

But The Power of Marketing Compels You!  And slowly but surely, Surround Sound systems took root in homes.

Now keep in mind the CONTENT people were playing in their homes was still just Stereo -- or even Mono!  There was no way to get those movie theater tracks at home, for example.  So the new Surround Sound systems were competing to find more interesting ways to expand Stereo into multiple speakers -- for example 5 speakers at ear level plus a subwoofer -- a 5.1 speaker system.

All sorts of competing algorithms were marketed.  You like organ music?  How 'bout we play it for you as if you were sitting inside a Cathedral?  Echoes, reverb, etc. etc.  Your old Stereo system can't do THAT!

And then came DVD discs.  And Dolby Digital and DTS 5.1 audio tracks!  And THAT'S when Surround Sound REALLY took off in Consumer Electronics!

Studios could now repurpose the Surround Sound audio mixes they were already producing for movies as DVD audio tracks.  And DVD players could send that audio as Digital Bitstreams (over Optical or Coax S/PDIF cabling) to Surround Sound processors.

The video output of the disc players was still Analog at this point, but the electronics could be designed to switch that to the TV while also processing the digital audio.  So now you had Surround Sound "Audio Video Receivers".  Many of the niche companies who made the original Stereo to Multi-channel Surround Sound Processors got into this game, but they were largely drowned out by the major electronics companies rushing to get their share of this new, cash windfall.

(The niche players focussed on their proprietary algorithms for ambience processing, along with trying to outdo the original AVRs by going a step further:  Processing 5.1 audio tracks for output to 7.1 Surround speaker setups.)

And THIS is where the question finally hit home users:  What do you do if you've got the wrong number of speakers?

Many buyers were not prepared to purchase a full complement of speakers up front.  They might have only Stereo speakers.  Or they might have a pair of Stereo speakers and a pair of Surround speakers, but no Center speaker and no Subwoofer.

This was a major concern for the electronics companies, because if folks could only use new AVRs with a full complement of speakers they might very well put off BUYING that new AVR.  And so the industry came to agreement pretty quickly on standards for doing Downmixing.

For example, if the user did not have a Center speaker, then Center channel content would be "steered" to the Left Front and Right Front speakers -- mixing together with the Stereo audio already intended for those two speakers -- and thus produce the EFFECT of having a Center speaker, in addition to the two, existing Front speakers.  Thus you now had a so-called, "Phantom" Center speaker.

Or if the user had no Surround speakers, the audio intended for those speakers would be steered to the Front speaker on the same side, but also with a standardized reduction in volume.  Thus the Surround content would be heard, but in a way which would not distract from the more important content already intended for those Front speakers.  This was called "preserving the sound stage".  I.e., the difference in volume would clue in the listener this "steered" Surround sound was not really coming from something in front of them.
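Here is a minimal Python sketch of that steering, working on one sample per channel.  The function names are hypothetical, and the -3dB levels for Center and Surround are assumptions for illustration only.  The actual levels come from the Dolby and DTS rules and can vary with the track.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear multiplier."""
    return 10 ** (db / 20)


def steer_to_front_pair(fl, fr, c, sl, sr, center_db=-3.0, surround_db=-3.0):
    """Fold Center and Surround samples into the two Front speakers.

    center_db and surround_db are illustrative assumptions, not the
    values any particular AVR is known to use.
    """
    center_gain = db_to_gain(center_db)
    surround_gain = db_to_gain(surround_db)
    # Center goes equally to both Fronts (the "Phantom" Center);
    # each Surround goes only to the Front on its own side.
    left = fl + center_gain * c + surround_gain * sl
    right = fr + center_gain * c + surround_gain * sr
    return left, right


# Dialog only in the Center channel lands equally in both Fronts.
phantom_left, phantom_right = steer_to_front_pair(0.0, 0.0, 1.0, 0.0, 0.0)
# Audio only in Left Surround lands only in Left Front, reduced in volume.
sur_left, sur_right = steer_to_front_pair(0.0, 0.0, 0.0, 1.0, 0.0)
```

Note this sketch does nothing about clipping when several loud channels are summed.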

As I mentioned, the niche Surround Sound electronics makers typically had their own proprietary algorithms for Digital Audio processing, but when it came to processing these movie tracks from DVD discs what was important was what the MOVIE studios had used.  And that was licensed technology from Dolby Laboratories and DTS.  And the rules for Downmixing in these new Home Theater Surround Sound systems were defined by Dolby and DTS.

So that makes it simple for users:  You live with the way your equipment manufacturer implemented the Dolby and DTS rules for Downmixing.  There are no decisions you have to make -- except of course whether now is the time to add additional speakers to your system!

These rules, for example, detail "Downmix Attenuation".  A Digital Audio stream has a maximum volume it can represent -- called the Full Scale signal.  But if you are mixing two or more streams together as part of Downmixing, what if they are each ALREADY at Full Scale?  You'll end up "clipping" the result -- meaning the audio gets distorted.  So Downmix Attenuation is applied.  That is, the individual signals are each reduced in volume before they are summed together, to keep the result from clipping.  The AMOUNT of Downmix Attenuation you need is based on how many channels are being mixed together -- plus how much attenuation might already have been applied to "preserve the sound stage" as described above.  Again, this is all standardized.  From the user's perspective, this shows up as a difference in volume according to the content you are playing.  For example if you play a Stereo track into Stereo speakers at a given Volume setting, it will sound LOUDER than if you play a 5.1 track into those same Stereo speakers at the same Volume setting. The 5.1 track has to be Downmixed, and Downmix Attenuation has to be applied.
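The arithmetic behind Downmix Attenuation can be sketched in a few lines of Python.  This is one simple way to pick the attenuation (scale so the worst case lands exactly at Full Scale), and the -3dB mix levels are assumptions for illustration.  The standardized rules may arrive at different numbers.

```python
import math


def normalization_gain(mix_levels):
    """Gain that keeps the sum at Full Scale even if every input is at Full Scale."""
    return 1.0 / sum(abs(level) for level in mix_levels)


def gain_to_db(gain):
    """Convert a linear multiplier to a level change in dB."""
    return 20 * math.log10(gain)


minus_3db = 10 ** (-3 / 20)

# Stereo into Stereo speakers: one channel per speaker, nothing to attenuate.
stereo_gain = normalization_gain([1.0])

# 5.1 into Stereo speakers: each Front also receives Center and one Surround.
downmix_gain = normalization_gain([1.0, minus_3db, minus_3db])

# Worst case: all three contributing channels at Full Scale (1.0) at once.
worst_case = downmix_gain * (1.0 + minus_3db + minus_3db)
downmix_attenuation_db = gain_to_db(downmix_gain)
```

With these assumed levels the Downmix comes out somewhere between 7dB and 8dB quieter than straight Stereo, which is the volume difference described above.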

(Alternatively, the AVR could process the Digital Audio stream using more bits per sample than were present in the original audio tracks.  That means Full Scale for the processed audio is larger than Full Scale for the content channels.  So there's more room to sum things together without clipping.  Thus no need for Downmix Attenuation, but at the cost of using more expensive electronics in the AVR.)
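A small Python sketch shows why the extra bits help.  It assumes 16-bit content processed in a 24-bit path, which is just an example pairing and not a claim about any particular AVR.

```python
def full_scale(bits):
    """Largest positive value a signed sample of this width can hold."""
    return 2 ** (bits - 1) - 1


def clip(value, bits):
    """Limit a summed value to what a signed sample of this width can hold."""
    limit = full_scale(bits)
    return max(-limit - 1, min(limit, value))


# Two 16-bit content channels, both already at Full Scale.
a = full_scale(16)
b = full_scale(16)
total = a + b

summed_in_16_bits = clip(total, 16)   # result is clipped: distortion
summed_in_24_bits = clip(total, 24)   # result fits with room to spare
```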

Downmixing the LFE channel -- the ".1" of "5.1" or "7.1" tracks -- is particularly thorny.  As I've discussed in previous posts, the LFE channel exists as a place to hold LOUD bass.  This works because the content in the LFE channel is recorded 10dB down from the normal speaker channels.  On playback, Sub Boost is applied to get that channel back up where it is supposed to be to match the Subwoofer volume with the volume for the main speakers.

But suppose you do not HAVE a Subwoofer?  Should you Downmix the LFE channel into some or all of the other speakers?

If you do, the LFE content and the main speaker content will have to be matched in volume BEFORE they get summed together.  (Since once they are summed together, there's no way to separate them any more and provide Sub Boost to just the LFE content.)

So that means the main speaker content has to be attenuated by 10dB.  And then BOTH the main speaker and Sub content have to be attenuated according to the Downmix Attenuation.  And only then can they be summed together.
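Here is that sequence as a Python sketch, working on one sample from each channel.  The function name is hypothetical, and the 6dB of Downmix Attenuation is a made-up figure used only to show how the two attenuations stack up.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear multiplier."""
    return 10 ** (db / 20)


def downmix_lfe_into_main(main, lfe, lfe_offset_db=10.0, downmix_attenuation_db=6.0):
    """Sum LFE into a main channel BEFORE any Sub Boost has been applied.

    downmix_attenuation_db is an illustrative assumption.
    """
    # Step 1: bring the main content down to the level the LFE was recorded at.
    # Step 2: apply the Downmix Attenuation to both.  In dB these simply add.
    main_gain = db_to_gain(-(lfe_offset_db + downmix_attenuation_db))
    lfe_gain = db_to_gain(-downmix_attenuation_db)
    # Step 3: only now sum the two together.
    return main_gain * main + lfe_gain * lfe


main_only = downmix_lfe_into_main(1.0, 0.0)
lfe_only = downmix_lfe_into_main(0.0, 1.0)
```

With these example numbers the main speaker content ends up 16dB down before Volume is raised to compensate.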

Now, that's a lot of Attenuation.  You can compensate by raising Volume during playback, of course, except attenuating the signal like that lowers it closer to the "Noise Floor" in your audio electronics.  When you raise Volume to get back to a normal listening level, the Noise Floor also gets raised in Volume.  And thus the Noise Floor may be raised to a level that's audible!
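The effect on the Noise Floor is simple dB arithmetic, sketched below in Python.  Both numbers are hypothetical: a Noise Floor at -96dB relative to Full Scale, and 16dB of total attenuation being undone with the Volume control.

```python
def noise_floor_after_volume_makeup(noise_floor_db, attenuation_db):
    """Noise Floor level once Volume is raised to undo the attenuation.

    The signal was attenuated but the Noise Floor was not, so raising
    Volume by attenuation_db lifts the Noise Floor by that same amount.
    """
    return noise_floor_db + attenuation_db


raised_noise_floor_db = noise_floor_after_volume_makeup(-96.0, 16.0)
```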

The same sort of thing happens when Sub Boost is applied to the LFE channel output, but the Noise Floor presents itself as "Hiss", and that's made up of higher frequencies the Subwoofer will not reproduce.  So no problem.

However, if you Downmix the LFE channel into the main speaker channel(s) -- because you HAVE NO Subwoofer -- you are now risking hearing the Noise Floor in those main speaker channels, where it CAN be heard.

The bottom line is you should not be Downmixing the LFE channel into main speaker channels UNLESS you can arrange for that to happen AFTER the necessary Sub Boost has already been applied to it!

If the Sub Boost is being applied in the AVR, it is safe for the AVR to do LFE Downmixing (after that point).  But it is NOT safe for some source device to do LFE Downmixing and pass the result to the AVR.

So if you have a multi-channel Analog source device, and have no Subwoofer, it is not wise to set it to mix the LFE content into, say, the Left Front and Right Front Analog audio output channels and pass that result along to the rest of your audio system.  Instead you should just discard the LFE channel.  (Easily done by lying to the device and telling it a Subwoofer actually exists, even though one is not connected.)  You can do that because the rules for mixing multi-channel audio tracks say the audio mixing engineer cannot assume the user's playback configuration will include a Subwoofer.  I.e., all CRITICAL bass must also be present in the main speaker channels -- each of which can go as low in frequency as needed.  So what's lost is not all bass, but rather just the LOUD bass component carried in the LFE channel.

So much for Downmixing.  What about Surround Sound Processing?

Multi-channel AVRs also provide THAT!  Here we begin with the existing schemes from the prior, Stereo to Multi-channel Sound Processors.  I.e., if you are playing Stereo (or Mono) content, and have, say, a 5.1 speaker configuration, what if any audio should be sent to the Center and Surround speakers?

(The SUBWOOFER is easy.  It gets bass steered from the content audio channels according to Crossover processing.  See my post on Choosing a Crossover Frequency for more.)

Now think about this for a moment.  The Stereo (or Mono) audio you are playing was never intended to be played in more than a Stereo speaker system.  There's nothing authored into it saying, "I'm audio for the Center speaker," or, "I'm audio for the Left Surround speaker."

That means the AVR has to analyze that audio and CREATE appropriate NEW audio to go to the Center speaker and the Surrounds.  This is basically math -- a form of Digital Signal Processing.

And rather sophisticated math at that!  A metric TON of research has gone into the development of Surround Sound algorithms, resulting in multiple approaches.  And don't forget that EXTRA processing the Surround Sound Processors were marketing.  You remember -- adding echo and reverb because your listening room is supposed to sound like the inside of a Cathedral?

Let's consider one piece of this puzzle:  Dialog.

Multi-channel movie mixes typically put (almost) all the dialog into the Center speaker.  This makes sense of course, since usually the people talking on screen are where the camera happens to be pointing at the moment.  I.e., they are directly in front of the viewer.  You do get SOME dialog that gets put into other speakers, but that immediately draws attention away from the screen in front of the viewer.  So it is done for EFFECT, because the filmmakers want to make the POINT that this speaker is off camera.

Meanwhile, music cues generally underplay the Center speaker.  They are primarily in the Left Front and Right Front speakers -- with audio fill provided by the Surround speakers.  This is natural as well.  You want the good Stereo separation provided by the Front speakers, along with the ability to preserve room ambience via the Surrounds.

So what should the AVR do with the Center speaker if the movie track you happen to be playing is in Stereo?  The usual answer is that it should attempt to DETECT dialog in that Stereo track, and send that to the Center (with perhaps, also, some attenuated version of it in the Fronts).

OK, but what if the original dialog is louder in the Left Front channel than in Right Front?   Should it still go to Center?  And what if someone is SINGING?  Is that still dialog?  Or should it be treated more like music -- maintaining the separation of the two Front speakers?

And how on earth do you tell a sound is dialog in the first place?  Is a dog barking the same thing as dialog?  What about wind rustling through the grasses?

As I said above, this is research level stuff!

And then there are the Surround speakers.  Presumably the audio you want to send to those (from your Stereo movie track) will be audio which enhances the "ambience" of the track -- the feeling you are in a real space with appropriate, secondary sound coming from around you instead of all from the front.
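To give a feel for the kind of math involved, here is about the simplest possible starting point, sketched in Python: treat whatever is COMMON to Left and Right as Center material, and whatever DIFFERS between them as ambience for the Surrounds.  This is a naive illustration with made-up function names.  It is not how any licensed algorithm is known to work, and it makes no attempt to answer the hard questions above about what counts as dialog.

```python
def naive_upmix(left, right):
    """Split Stereo into a 'common' part and a 'difference' part.

    Returns (center, surround) sample lists.  Purely illustrative.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    # Sound that is identical in both channels shows up in the sum.
    center = [0.5 * (l + r) for l, r in zip(left, right)]
    # Sound that differs between the channels shows up in the difference.
    surround = [0.5 * (l - r) for l, r in zip(left, right)]
    return center, surround


# Identical in both channels (like centered dialog): all of it goes to Center.
dialog_center, dialog_surround = naive_upmix([0.25, -0.5], [0.25, -0.5])

# Opposite in the two channels: nothing goes to Center.
wide_center, wide_surround = naive_upmix([0.5], [-0.5])
```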

Again, this involves analysis of the original audio.  And of course all this analysis has to happen in "real time" -- while the audio is playing -- and using electronics inexpensive enough to attract Home Theater buyers!

As I said earlier, this all started with proprietary algorithms developed by the niche, Surround Sound processor companies.  And some of that carries forward even today with proprietary solutions offered in major brand AVRs.  But just as with Downmixing, the MAIN players here today are Dolby Laboratories and DTS.

For example, Dolby licenses an algorithm known as PLII (or PLIIx for 7.1 speaker systems), and you are likely to find it included in most AVRs.

Some of these algorithms, in some AVRs, will even offer adjustments.  So for example, if you are raising Stereo content to 5.1 speakers, how "wide" should the Center sound?  This has to do with how much the steered audio (like dialog) is focussed in Center or shared between Center and either of the Front speakers.  

So how do you choose?  Which algorithm should you use, and, if it offers settings, how do you select the RIGHT settings?

The answer is:  There's no, one RIGHT choice here!  It is purely a matter of taste!

For example, you may decide to forego Surround Sound processing altogether and play your Stereo content as just Stereo -- likely with the addition of the Subwoofer for Crossover processing.  That is, you will play 2.0 content using only the 2.1 subset of your speakers.

This is a popular choice among serious music listeners, for example, as these folks tend to prefer their music to be as little "processed" as possible.

Or, if you decide you'd like your AVR to light up the Center and Surround speakers, you may discover the different Surround Sound algorithms it includes have characteristics that appeal to you.  For example, one algorithm may make more use of the Surround speakers than another.  And it is perfectly OK to pick based solely on personal preference!

If you happen to have a 7.1 speaker configuration, you will likely find your AVR includes choices which differ primarily in how they raise 5.1 content to 7.1 speaker output.  That is, these choices may sound pretty similar when playing Stereo content, but may be different in a way you find appealing when playing 5.1 content.

There are many things in your Home Theater setup which you should want to get "RIGHT".  Configuring your speakers so they produce the best output -- both individually and when playing together -- is an example.  Configuring your Subwoofer and dealing with problems of bass response in your listening room is an example.  But once you've got your equipment set up properly, choosing whether or not to use a Surround Sound algorithm, and picking which ONE to use when you do, is ENTIRELY up to what you happen to like.

Just like selecting a playback Volume.

There's a REASON multiple Surround Sound Processing algorithms have survived in competition with each other.  Personal tastes vary!

Do take the time to experiment, of course, so you can get a feel for which choice(s) you like best.  And don't be surprised if you find you like a DIFFERENT choice depending on the content you are playing!  And also compare no processing at all.  I.e., 2.0 tracks played as 2.1 speakers (with the Subwoofer), or 5.1 tracks played in 5.1 speakers even though you have 7.1 speakers installed.

The latest thing in Home Theater audio is the new, "Immersive" audio tracks intended for playback in speaker configurations which include "Height" speakers.

There are several competing versions of these, including Dolby Atmos, DTS:X, and Auro-3D.  A movie disc you buy may include, for example, a Dolby Atmos track, or a DTS:X track, but is unlikely to include both.  These tracks are also designed to be compatible with Home Theater setups which do NOT include Height speakers -- or the fancier electronics which know how to decode their Immersive audio components.  For example, a Dolby Atmos track on disc is actually a Dolby TrueHD 7.1 track along with additional information.  If played in a system which does not understand Dolby Atmos, it will play as a TrueHD 7.1 track.  Similarly, a DTS:X track is actually a DTS-HD MA 7.1 track along with additional information.  The "additional information" does not interfere with playing these Immersive tracks as their underlying TrueHD 7.1 or DTS-HD MA 7.1 formats.

Height speakers are speakers mounted high on a wall or mounted on the ceiling itself over the listening area.  (Dolby is also promoting indirect speakers which sit on top of your regular, floor speakers and BOUNCE the sound off the ceiling at you.)   The idea is that folks authoring Immersive tracks can include audio which passes above the listeners, thus providing another whole dimension to the sound stage.

But there's more than just additional speakers involved here.  Dolby Atmos, for example, is an "Object" audio system.  That is, the authoring allows for the specification of particular sounds as multiple "objects" along with the current position of each object in the sound field.  In addition to the discrete audio going to the normal, ear level speakers, the system will then play the audio for each "object" using the combination of speakers closest to its specified location.  As the object moves -- perhaps across the top of the room over the listeners -- its audio will shift, automatically, into different speakers.  One of the niftiest things in this approach is the playback system places the object in speakers according to the actual speakers you have installed!  So whether you have just 2 Height speakers installed, or 4, or 6, or whatever, each object will be rendered according to the actual location of your actual speakers.
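The general idea of placing an object in "the combination of speakers closest to its specified location" can be sketched in Python as below.  To be clear, this is NOT Dolby's renderer.  The speaker names, the room coordinates, and the inverse-distance weighting are all assumptions made up for illustration.

```python
import math


def render_object_gains(object_position, speakers, nearest=2):
    """Share one object's audio among the speakers nearest to it.

    speakers maps a speaker name to its (x, y, z) position.
    Returns a gain per chosen speaker, scaled so total power stays constant.
    """
    distances = {name: math.dist(object_position, pos) for name, pos in speakers.items()}
    chosen = sorted(distances, key=distances.get)[:nearest]
    # Closer speakers get more of the signal (tiny offset avoids dividing by zero).
    weights = {name: 1.0 / (distances[name] + 1e-9) for name in chosen}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {name: w / norm for name, w in weights.items()}


# Hypothetical layout: two Front speakers at ear level, two Height speakers above.
speakers = {
    "front_left": (-1.0, 1.0, 0.0),
    "front_right": (1.0, 1.0, 0.0),
    "top_left": (-1.0, 0.0, 1.0),
    "top_right": (1.0, 0.0, 1.0),
}

# An object directly overhead, midway between the two Height speakers.
gains = render_object_gains((0.0, 0.0, 1.0), speakers)
```

Hand the same function a different dictionary of speakers and the same object position produces a different set of gains, which is the essence of rendering to the speakers you actually have.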

This is an entirely different take on the idea of Downmixing or Surround Sound processing!

And yet, the more things change, the more they remain the same.  If you don't have a full complement of ear level speakers, Downmixing will still happen there.  And what's more, each of these companies has licensed NEW Surround Sound algorithms to the equipment makers.

Why?  So you can play a traditional 7.1 or 5.1 track -- or even Stereo -- and end up with some audio "steered" to your Height speakers!

Just as with prior Surround Sound algorithms (like Dolby's PLIIx) it is entirely up to you -- your taste -- whether you want to enable this new Surround Sound steering for Height speakers or not.

I will warn you of one pitfall in this regard:  You remember I said a Dolby Atmos track is actually a Dolby TrueHD 7.1 track along with extra info?  Well it's that extra info that tells your AVR it is actually being handed an Atmos track.

And it's possible to screw that up.  To get an Atmos track into your AVR you *MUST* send it over HDMI as a Bitstream, and you must make sure there is no processing happening in your source device which interferes with the "extra information" -- what I like to call, the Sprinkles -- which differentiates the track as an Atmos track.  The most common mistake here is leaving Secondary Audio Mixing enabled in your Blu-ray disc player.

If you screw this up, your AVR should make it clear, because it will not IDENTIFY the incoming audio as a Dolby Atmos track.  But if you don't notice that, and if you happen to have the Height speaker Surround Sound Processing enabled, you will *STILL* get audio out of your Height speakers!  I.e., you will get the audio "steered" to your Height speakers from the regular -- non-Atmos -- audio you are actually sending to the AVR.

I can't tell you how many times I've had folks tell me they love these new Atmos tracks (for example) when they weren't actually playing them!  The audio steered to the Height speakers had fooled them completely!

But such steered, Height audio is not at all the same as playing the real, Immersive track CORRECTLY.  Any more than playing a Stereo track in 7.1 speakers is the same as playing a true 7.1 track.

So use Surround Sound processing to taste to raise fewer channels of content to more speakers of output, but be sure to double-check your multi-channel tracks -- and in particular, these nifty, new Immersive tracks -- are being played CORRECTLY!