Digital Audio 101 (...01010...)

The world we live in is inherently Analog.  All the things we see and hear, for example, are the result of physical events which are fundamentally continuous in nature.  Some portions of your Home Theater are also Analog.  For example, your speakers are Analog devices, as are the audio power amps that drive them.  And the electrical signals carried on the cabling between those amps and speakers are also Analog.

But modern PROCESSING of both audio and video is done Digitally.  And the newest forms of storage and transmission of audio and video are also Digital.  In this post we will explore the basics of Digital Audio, with particular emphasis on the LPCM and Bitstream audio formats.

First it is useful to understand the life cycle, so to speak, of Analog and Digital audio.  Although these days it is more and more common to find some audio (and video for that matter) which has been created, from scratch, Digitally, audio USUALLY starts out as Analog -- for example a live performance recorded by a microphone.  Such Analog audio can be recorded in Analog form -- for example on a magnetic tape or vinyl record master -- but these days it is most usually recorded in Digital form.

The process of converting Analog audio to Digital form is called, quite simply, digitizing.  But there's nothing simple about the process!  The art of representing continuous events (such as audio) in a set of discrete, Digital samples, is fraught with problems.  And the techniques for doing this well are deep subjects in Information Theory and Signal Processing.  I won't attempt to get into any of that now.

So take it as a black box:  An Analog signal goes into one side of a Digitizer, and a Digital audio "stream" comes out the other side.  The stream is made up of discrete samples -- numbers -- representing characteristics of the audio at successive points in time.
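
To make the black box a little less opaque, here is a minimal Python sketch of the idea.  It is NOT how a real Digitizer works internally.  The function name `digitize`, the 1 kHz test tone, and the 16-bit / 48 kHz settings are all assumptions picked purely for illustration.

```python
import math

def digitize(signal, num_samples, sample_rate_hz, bits):
    """Turn a continuous function signal(t), valued in [-1.0, 1.0], into integers."""
    full_scale = 2 ** (bits - 1) - 1  # largest positive value a signed sample can hold
    return [round(signal(i / sample_rate_hz) * full_scale) for i in range(num_samples)]

def tone(t):
    """Stand-in for the Analog input: a 1 kHz sine wave."""
    return math.sin(2 * math.pi * 1000 * t)

# One millisecond of audio at 48,000 samples per second is 48 numbers.
samples = digitize(tone, num_samples=48, sample_rate_hz=48000, bits=16)
print(samples[:6])
```

What comes out is nothing more than a list of whole numbers, one per instant in time.  That list is the "stream".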

The nice thing about a Digital audio stream is it is just the ticket if you want to PROCESS the audio.  Specialized computers, called Digital Signal Processors, do their work on just such Digital streams.  It's all just math when you get down to it.  Very sophisticated math.

For example, suppose you have an audio track intended for a single speaker.  But that speaker is limited in its ability to reproduce bass frequencies.  So you want to help that speaker by pairing it with a subwoofer (a powered speaker specifically designed to render bass frequencies).  That means you need to "steer" some of the audio to the subwoofer.  Indeed you need to do this in a gradual -- blended -- way:  As the frequencies get lower you want more of the audio to go to the Subwoofer.  Over a range of frequencies, both the original speaker and the subwoofer will share in producing the desired audio output.  The process of doing this gradual transition of bass audio between the original speaker and the subwoofer is called Crossover signal processing.

And it is done Digitally in modern audio gear.

So you take the Analog audio, digitize it, and pass the resulting Digital stream through the Crossover, and you end up with TWO Digital streams:  One for the original speaker with bass extracted, and the other carrying the bass intended for the subwoofer.
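
As a rough sketch of "one stream in, two streams out", here is a toy Crossover in Python.  It uses the simplest filter I can think of (a one-pole low-pass, with the speaker getting whatever is left over).  Real gear uses much steeper and more carefully designed filters, so treat the function names, the 80 Hz crossover point, and the test tones as assumptions for illustration only.

```python
import math

def crossover(samples, sample_rate_hz, crossover_hz):
    """Split one stream into (speaker, subwoofer) streams with a one-pole filter."""
    # Smoothing coefficient of a one-pole low-pass filter at the crossover frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate_hz)
    low = 0.0
    speaker, subwoofer = [], []
    for x in samples:
        low += alpha * (x - low)   # slowly-varying (bass) part of the signal
        subwoofer.append(low)
        speaker.append(x - low)    # everything that is not bass
    return speaker, subwoofer

def tone(freq_hz, sample_rate_hz=48000, num_samples=48000):
    """One second of a sine wave, as floating point samples."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(num_samples)]

def peak(samples):
    return max(abs(s) for s in samples)

for freq_hz in (30, 3000):
    speaker, subwoofer = crossover(tone(freq_hz), 48000, 80)
    print(freq_hz, "Hz -> speaker", round(peak(speaker), 2), "subwoofer", round(peak(subwoofer), 2))
```

A 30 Hz tone ends up mostly in the subwoofer stream, a 3 kHz tone ends up almost entirely in the speaker stream, and frequencies near the crossover point are shared between the two.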

But the speaker and subwoofer are Analog devices!  And so there's another step here.  The Digital streams must be converted BACK into Analog!  This is the job of the DACs (Digital to Analog Converters).  And just like the digitizer, these are sophisticated, signal processing devices.

So let's take an example where you want to play a traditional, vinyl record and have its audio come out on a pair of stereo speakers and a Subwoofer.  The audio recording on the record itself is Analog -- continuous curves cut into the surface of the record.  The electrical signal from the record player to your electronics is also Analog.  That signal now has to be Digitized so it can be processed.  The resulting Digital audio is then passed through the Crossover to steer bass to the Subwoofer -- bass audio that would otherwise have gone to the Left and Right speakers.  Now you have THREE Digital audio streams (one for each speaker and one for the Subwoofer), all of which need to be converted BACK to Analog for output.  So these streams are passed through the DACs to do that job.

Or let's take a more complicated example where you decide you want to play a music "media file" you happen to have in storage in your system.  The contents of the media file are Digital audio.  Let's assume the device you have for playing that media file has particularly wonderful Analog audio output, so you cable things to use that.  But you ALSO have electronics in your system which you use to help correct for problems in the audio response characteristics of your listening room -- caused, e.g., by reflections and such.  Such "Room Correction" processing is done Digitally.  So you play your media file, and the Digital audio extracted from it is passed through the DACs in your player to produce the wonderful Analog audio output you are looking for.  That audio -- now an Analog signal -- traverses the cabling to your Room Correction electronics.  There, however, it must be re-Digitized, because the Room Correction processing is done digitally.  After the Digital audio is modified by the Room Correction processing, it must now be converted BACK to Analog again for output, because of course that's what your speakers need.

The point of the above is to highlight when and where conversions happen between Analog and Digital -- in either direction -- as the audio travels through your system.  Because EACH such conversion is a potential problem point:  A place where quality can be lost depending on the sophistication of your gear.

So keep this in the back of your mind as you are setting things up to play your audio:  Where are conversions going to happen between Analog and Digital, and what might be done to minimize the number of conversions or improve the quality of conversions when they are essential?

With that in mind, let's look at the Digital audio itself.

Digital audio might exist in storage files or in transmission formats -- carried along a cable between devices, or passed along internally within a given device.  Audio and Video media file formats are a rather complicated topic in their own right -- there are LOTS and LOTS of such formats, with very different characteristics.  But I'm going to gloss over that topic right now except to say that some media file formats are "containers" which hold the REAL digital audio or video format inside them.  Some containers can be filled with multiple, different types of audio or video formats (called "codecs"), and the main purpose of the container is to present its combined content -- whatever that happens to be -- in proper relationship.  So for example, a video clip and its associated audio (held in separate codecs inside a single container file) need to get played in proper sync with each other.

The simplest form of Digital audio used in Home Theater equipment is the LPCM stream.  LPCM is short for Linear Pulse Code Modulation.  Remember when I said earlier that the math analysis involved in digitizing Analog audio is complicated?  Well "Pulse Code Modulation" is one of the complex results of such analysis, and LINEAR Pulse Code Modulation is the most commonly used flavor of that.  You don't really need to understand the math (or the jargon for that matter) to follow along, so I'll gloss over that, too.

One of the things that makes an LPCM stream simple is that it represents a single "channel" of audio -- audio intended for just one speaker for example.  So if you are playing a multi-channel movie track with audio for 7 main speakers and a subwoofer -- commonly denoted as a "7.1" track -- this would be made up of 8, separate, LPCM streams.

An LPCM stream is made up of individual samples that share a "sample size", and which are presented in the stream at a fixed "sample rate".  The samples, then, are just numbers.  The sample size is described in bits.  Typical sample sizes would be 16-bits or 24-bits per sample.   The more bits per sample, the bigger the number that can fit in any given sample.

Sample size reflects the "dynamic range" of the particular LPCM stream.  That is, the difference between the loudest (biggest number) and softest (smallest number) sounds which the stream can represent.  It is the job of the people feeding the Analog audio into the digitizer to make sure the loudest and softest sounds are representable in the LPCM stream being created.  If, for example, you try to digitize sounds that are TOO loud, you get what's called "clipping" -- a type of audible distortion which sounds like distinct harshness in the loudest passages.
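
Here is a small Python sketch of what sample size buys you.  It assumes signed integer samples (which is how 16-bit CD audio is stored), and it uses the usual textbook approximation of about 6 dB of dynamic range per bit.  The function names are my own, and real-world gear never quite reaches these idealized figures.

```python
import math

def sample_range(bits):
    """Smallest and largest values a signed sample of this size can hold."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def dynamic_range_db(bits):
    """Idealized dynamic range: ratio of the largest to the smallest step, in dB."""
    return 20 * math.log10(2 ** bits)

def clip(value, bits):
    """What happens to a value that is TOO loud: it gets pinned at the limit."""
    lowest, highest = sample_range(bits)
    return max(lowest, min(highest, value))

for bits in (16, 24):
    print(bits, "bits:", sample_range(bits), round(dynamic_range_db(bits), 1), "dB")

print(clip(40000, 16))  # too big for 16 bits, so it is flattened to the maximum
```

Every value beyond the limit comes out as the SAME number.  The tops of the waveform get flattened, and that flattening is the harshness you hear as clipping.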

Sample rate, on the other hand, reflects the "frequency range" the LPCM stream can represent.  This comes from one of those theoretical results of Information Theory and Signal Processing I've alluded to -- in this case, something called the Nyquist Limit.  This says that a Digital stream can only accurately represent Analog frequencies up to 1/2 the frequency of its sampling rate.

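So if you want to digitize audio up to 20 kHz -- a typical number thrown around for the highest frequencies the human ear can detect -- you need a sampling rate of at least 40 kHz.  (In practice you want somewhat more than that, to leave a little headroom above the highest frequency you care about.)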

Note that this Limit has NOTHING to do with the quality of the recording gear used or the care taken during the recording process.  It comes out of the math itself; which means there's no way around it!

And what's worse, if you try to record frequencies that are too high, it's not just THOSE frequencies that get handled incorrectly.  Rather, frequencies LOWER DOWN -- frequencies you SHOULD have been able to record cleanly -- also get damaged!
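
You can see this damage (the technical name is "aliasing") with a few lines of Python.  This is only a sketch:  The function names are my own, and the 30 kHz tone and 44.1 kHz sample rate are just convenient example numbers.  It samples a tone that is too high for the sample rate, and then shows that the resulting numbers are indistinguishable from those of a LOWER tone (apart from being flipped upside down).

```python
import math

def sample_tone(freq_hz, sample_rate_hz, num_samples):
    """Sample a sine wave of the given frequency at the given sample rate."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(num_samples)]

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency that a sampled tone will appear to have after digitizing."""
    folded = freq_hz % sample_rate_hz
    return sample_rate_hz - folded if folded > sample_rate_hz / 2 else folded

too_high = sample_tone(30000, 44100, 100)   # above the 22.05 kHz limit
impostor = sample_tone(alias_frequency(30000, 44100), 44100, 100)

# The two sets of samples are mirror images, so adding them gives (almost) zero.
worst_mismatch = max(abs(a + b) for a, b in zip(too_high, impostor))
print(alias_frequency(30000, 44100), "Hz, mismatch", worst_mismatch)
```

Once the samples are taken, there is no way to tell the 30 kHz tone from a 14.1 kHz tone.  So the too-high frequency shows up as a bogus tone right in the middle of the range you wanted to record cleanly.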

So when folks digitize Analog audio, they are careful to first FILTER the audio to eliminate frequencies which are too high to handle in the intended LPCM stream.

The very simplicity of LPCM streams also makes them ideal for Digital audio processing.  So audio tracks are often created using LPCM.  That is, LPCM audio elements are mixed and processed together to produce LPCM "Masters" -- the desired, final result.

The stereo audio found on a CD music disc is recorded on disc in LPCM -- a separate stream for the Left and Right channels.  Each stream on the CD is recorded with a 16-bit sample size and a 44.1 kHz sample rate.

Modern movie tracks might have 5.1, 7.1 or even more audio channels -- created by mixing numerous, individual dialog, ambience and music sound elements together according to the Director's intent, and the skills of the Audio Editor and the Audio Mixer.  Each such resulting channel is mastered in LPCM -- typically with a 24-bit sample size and a 48 kHz sample rate.

Although both simple and ubiquitous, LPCM streams come with problems.

First and foremost, they are not COMPACT.  They take up a LOT of space in storage media and require high data rates both when reading tracks off of such media, and when transmitting audio internally or over cables to successive devices.
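
To put numbers on that, here is a quick Python calculation using the CD and movie-track formats mentioned above.  The function names are my own.  The figures cover the raw LPCM samples only, and ignore any overhead added by the disc or cable format.

```python
def lpcm_bits_per_second(channels, bits_per_sample, sample_rate_hz):
    """Raw data rate of a bundle of LPCM streams."""
    return channels * bits_per_sample * sample_rate_hz

def megabytes_per_hour(bits_per_second):
    """Storage needed for one hour of audio, in millions of bytes."""
    return bits_per_second / 8 * 3600 / 1_000_000

cd_stereo = lpcm_bits_per_second(channels=2, bits_per_sample=16, sample_rate_hz=44_100)
movie_7_1 = lpcm_bits_per_second(channels=8, bits_per_sample=24, sample_rate_hz=48_000)

for name, rate in (("CD stereo", cd_stereo), ("7.1 movie track", movie_7_1)):
    print(name, rate, "bits/sec,", round(megabytes_per_hour(rate)), "MB per hour")
```

That works out to about 1.4 million bits per second for CD stereo, and over 9 million bits per second for a 24-bit, 48 kHz, 7.1 track.  And that's for just ONE language!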

This became a real problem when the industry was trying to launch the original, Standard Definition, DVD movie discs!  SD-DVD is a format that is particularly resource challenged -- both in terms of storage capacity on the disc, and in terms of the maximum data rate that can be read from the disc.

The video for the movie takes up the bulk of both of those, of course, but audio really needed to be more compact.  PARTICULARLY for discs with more than one language track.

In addition, for multi-channel audio tracks, as found in modern movies, the LPCM version of the audio consists of multiple, separate, LPCM streams, which must be handled as a bundle together, and presented in proper sync to each other.

The solution to both problems was the Bitstream audio track.

A Bitstream track is simply a way of combining multiple LPCM streams into a single stream carrying all the audio channels at the same time.  In the process, you can use characteristics of the audio to reduce both the storage size AND the data rate for the Bitstream.  This "compression" works because real world audio is "correlated", and also limited in both frequency and dynamic range.  That is, any given moment in audio is often very similar to the moments on either side of it, and the max range of frequencies and dynamic range are not needed all the time.

The original Bitstream formats -- licensed separately by Dolby Labs and DTS -- go further.  They take advantage of the fact that the human ear can't really hear everything it is theoretically CAPABLE of hearing.  And the Dolby Digital (DD) and DTS Bitstream formats for SD-DVD were designed to take advantage of this by implementing what's called "Lossy" compression of the audio.

Lossy simply means that when you process the Bitstream through a Decoder for playback, the set of LPCM streams that come out of the Decoder are NOT identical to the LPCM streams that went into the corresponding Encoder in the studio.  The parts of the audio that get lost are cleverly chosen to be hard to hear.  And this works very well indeed!

Meanwhile, when Blu-ray discs were introduced, the constraints of the original SD-DVDs were largely lifted.  A Blu-ray disc has vastly more storage capacity than an SD-DVD, and the maximum data rate that can be read from a Blu-ray disc during playback is also much higher than from an SD-DVD.

So to take advantage of this, new, "Lossless" Bitstream formats were introduced.  Dolby's version is called Dolby TrueHD.  The version from DTS is called DTS-HD MA (for High Definition, Master Audio).

Lossless simply means that when you decode the Bitstream for playback, the LPCM streams that come out are bit for bit identical to the LPCM streams sent into the Encoder in the studio.  NOTHING is "lost".

Mind you, a Lossless Bitstream is STILL compressed.  That's the whole point of using the Bitstream format.  It's just not compressed AS MUCH as a Lossy Bitstream derived from the same set of LPCM Masters.
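
Here is a toy Python illustration of "compressed, but still Lossless".  To be clear, this is NOT how TrueHD or DTS-HD MA work inside.  As a stand-in, it stores the DIFFERENCE between neighboring samples (taking advantage of the fact that adjacent moments of audio are similar) and then runs the result through the general purpose zlib compressor.  The function names and the 480 Hz test tone are assumptions for illustration, and the sketch expects a non-empty list of 16-bit samples.

```python
import math
import struct
import zlib

def encode(samples):
    """Pack 16-bit samples as sample-to-sample differences, then compress."""
    deltas = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]
    wrapped = [d & 0xFFFF for d in deltas]            # keep each delta in 16 bits
    return zlib.compress(struct.pack(f"<{len(wrapped)}H", *wrapped))

def decode(blob):
    """Exactly undo encode(): decompress, then add the differences back up."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 2}H", raw)
    samples, total = [], 0
    for d in deltas:
        total = (total + d) & 0xFFFF
        samples.append(total - 0x10000 if total >= 0x8000 else total)
    return samples

# A tenth of a second of a 480 Hz tone at 48 kHz, as 16-bit samples.
original = [round(20000 * math.sin(2 * math.pi * 480 * n / 48000)) for n in range(4800)]
blob = encode(original)

print("raw bytes:", 2 * len(original), "compressed bytes:", len(blob))
print("bit for bit identical:", decode(blob) == original)
```

The compressed version is smaller, yet what comes back out of the decoder is exactly what went in.  A Lossy format gets much smaller still, but gives up that guarantee.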

Also note carefully that "Lossless", in this context, is *NOT* the same thing as "High Quality".  If the studio Masters for the audio track were crappy to begin with, what comes out of the Lossless Bitstream decoding for those during playback will be IDENTICALLY crappy!

Bitstream tracks are indeed the solution to more compact audio -- taking up both less space in storage and less data rate to read from storage.

But they have their OWN problems.

First and foremost, Bitstream tracks can NOT be PROCESSED!

The specialized Digital Signal Processing electronics which process LPCM tracks have no clue what to do with a Bitstream track.  So to process a Bitstream track, it must first be "Decoded" back into a set of LPCM streams.

This is equally true for producing Analog output from a Bitstream.  The DACs also have no idea what to do with a Bitstream.  So the steps for producing Analog audio from a Bitstream are:

  1. "Decode" the Bitstream into a set of LPCM streams, and
  2. Use the DACs to "Convert" each LPCM stream to Analog audio

Of course devices that play Bitstream tracks do this for you automatically.  You just need to keep aware of what's going on under the covers.

Let's take an example.  Blu-ray discs introduced a new feature, called Secondary Audio, which is not found on SD-DVD discs.  The Secondary Audio track is a separate, stereo track which is on disc in parallel with the normal audio.  Studios use Secondary Audio to implement things like Menu Sound Effects (click on a Menu button and you hear a sound) and also certain Picture-In-Picture Commentary tracks.

If you select a disc feature which wants to play Secondary Audio, that audio does not go out from the player as some sort of a separate signal.  To hear the Secondary Audio it must be "mixed" into the normal audio track, by the player, as part of playing that normal audio track.

But mixing is audio "processing", and so it is done in LPCM!

Which means, if you play a Bitstream track while Secondary Audio is active, the player must first decode the Bitstream track into LPCM and then mix the stereo, Secondary audio into that -- producing modified LPCM.
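
At its heart, mixing in LPCM really is just arithmetic on the samples, as this minimal Python sketch shows.  The function name, the 0.5 gain applied to the Secondary Audio, and the 24-bit sample size are all assumptions for illustration.  What an actual player does will be more involved.

```python
def mix(primary, secondary, secondary_gain=0.5, bits=24):
    """Add a scaled secondary stream into a primary stream, sample by sample."""
    lowest, highest = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    mixed = []
    for p, s in zip(primary, secondary):
        value = round(p + secondary_gain * s)
        mixed.append(max(lowest, min(highest, value)))  # pin anything too loud
    return mixed

front_left = [1000, -2000, 8388000]      # a few 24-bit samples from the main track
menu_click = [200, 400, 1000000]         # the matching Secondary Audio samples

print(mix(front_left, menu_click))
```

There is no equivalent way to "add" two Bitstreams together.  The numbers have to be unpacked into LPCM before this kind of math can be applied.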

But suppose you have set the player to output Bitstream audio to your Audio/Video Receiver?  Well to do that the player must then RE-Encode the now-mixed LPCM back into a Bitstream.

However, no consumer electronics has the horsepower to produce a Lossless Bitstream, in realtime, in such fashion.  So if you started off listening to a Lossless TrueHD 7.1 track, for example, that will be Decoded into a 7.1 set of LPCM streams, the Secondary Audio will be mixed into the Left Front and Right Front speaker channels of that LPCM set, and then the result will be re-encoded back into a LOSSY Bitstream.  Typically DD 5.1.  Which means you've not only gone from Lossless to Lossy, but your Rear speaker channels have now been "Down-mixed" into your Side speaker channels!

Or, to use the technical term:  BLECH!

So the upshot is, unless you really REALLY want to listen to the stuff that's on disc as Secondary Audio, you should leave Secondary Audio Mixing turned OFF in your Blu-ray disc player whenever you are using Bitstream audio output.

What about LPCM audio output?  Well that depends on the details of your player.  Some players (like those sold by OPPO Digital) were able to do Secondary Audio Mixing at full quality for LPCM output.  That is, the full Lossless Bitstream was decoded into LPCM, and the mixing was done in a way that preserved the original channel count, sample rate, and sample size of that LPCM.  Which meant you could use Secondary Audio Mixing without loss of quality so long as you used LPCM output from the player.  Not all players do this.

One item in that last paragraph has probably confused you:  The part about decoding the "full Lossless Bitstream".  Here's what's up with that.  Dolby TrueHD and DTS-HD MA Bitstream tracks can be transmitted without problem over HDMI connections, but they can not be transmitted over Optical or Coax Digital audio output connections (for example, to older gear that does not have HDMI Input available).  Why?  Partly for technical reasons on signal limits for that style of cabling but MAINLY because that style of cabling does not include the "Copy Protection" the studios demand for these high quality audio tracks.  So every Blu-ray disc which features a Dolby TrueHD or DTS-HD MA track ALSO includes a "compatibility" track for just this contingency.  Typically this is a traditional, Lossy DD 5.1 or DTS 5.1 track.  The compatibility track may not be listed as one of the audio choices for the disc, but it is there nonetheless.  And when you send Bitstream over Optical or Coax connections, the compatibility track is what you get.

And referring back now to the prior paragraph, some players only decode the compatibility track when asked to do Secondary Audio Mixing!  So even if you are using HDMI LPCM output, you may not get the full quality you were expecting from your TrueHD or DTS-HD MA track if you leave Secondary Audio Mixing enabled.  Check the details for your player.

We are not quite done with Bitstream tracks, because the industry has now moved beyond even 5.1 and 7.1 tracks to "immersive" tracks which also include audio for additional "Height" speakers.

Dolby has its version of this called Dolby Atmos, and DTS has its version called DTS:X.

There's lots to talk about with regard to these, but I'll only touch on one piece of that in this post. To wit:  These tracks are built ON TOP OF the Lossless Bitstream tracks just described.

That is, a Dolby Atmos track is, in reality, a Dolby TrueHD 7.1 track along with additional data (which I jokingly refer to as "sprinkles") instructing what audio should go to your Height speakers, assuming you have any configured.  Similarly a DTS:X track is, in reality, a DTS-HD MA 7.1 track -- plus "sprinkles".

This is important to know for three reasons.  First, these tracks are compatible.  That is, if you don't have a receiver that knows about Atmos or DTS:X, you can still play these tracks, and they will play as TrueHD 7.1 and DTS-HD MA 7.1 respectively.

Second, only the device which is "Decoding" the Bitstream is capable of telling whether or not the "sprinkles" are there -- and only if that Decoding device has been designed to handle Atmos and DTS:X.  At this point, only Audio/Video Receivers do this.  That means your player will identify the track you are playing as TrueHD 7.1 or DTS-HD MA 7.1, even while your AVR reports that the Bitstream coming in from that player is Atmos or DTS:X.  This confusing state of affairs is considered "completely normal" (if not exactly "sane").

And Third, since the Bitstream Decoders in players do not speak either Atmos or DTS:X, if you have the player do your Decoding -- for example if you want to use HDMI LPCM output instead of HDMI Bitstream output -- the "sprinkles" will be lost.  The only way you can get an Atmos or DTS:X track to play *AS* Atmos or DTS:X is to use HDMI Bitstream output (to an AVR that speaks Atmos and DTS:X).

This third point ties into the discussion above about Secondary Audio Mixing.  Why?  Because you'll recall that Secondary Audio Mixing obliges the player to first Decode the Bitstream track, prior to the mixing.  And as just described, that means the "sprinkles" get lost!

So if you want Atmos or DTS:X output to work as intended, you must:

  1. Use HDMI Bitstream output to an AVR that understands Atmos and DTS:X, and
  2. Turn Secondary Audio Mixing OFF in your player!
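
If it helps to see those rules as logic, here is the same checklist as a tiny Python function.  The function and parameter names are hypothetical (they are not settings names from any real player).  It simply restates the conditions described above.

```python
def immersive_audio_reaches_avr(hdmi_output, secondary_audio_mixing, avr_decodes_immersive):
    """True only if the Atmos / DTS:X "sprinkles" can survive all the way to playback."""
    player_leaves_bitstream_alone = (hdmi_output == "bitstream") and not secondary_audio_mixing
    return player_leaves_bitstream_alone and avr_decodes_immersive

# Bitstream output, mixing OFF, capable AVR: the sprinkles get through.
print(immersive_audio_reaches_avr("bitstream", False, True))

# Same setup, but with Secondary Audio Mixing left ON: the player decodes first.
print(immersive_audio_reaches_avr("bitstream", True, True))

# LPCM output: the player does the Decoding, so the sprinkles are lost.
print(immersive_audio_reaches_avr("lpcm", False, True))
```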

One final point I'll touch on regarding Bitstream tracks is the question of which device should do the Decoding of those tracks?

If you send HDMI Bitstream to your AVR the AVR will be obliged to do the Decoding.  In the case of Atmos and DTS:X tracks, this is essential.  But for other Bitstream tracks, it is not.

The alternative is that you send HDMI LPCM to your AVR -- or even listen to the Analog audio outputs of your player.  In this case the player will do the Decoding.

So except for the special case of Atmos and DTS:X "immersive" tracks, which device SHOULD you choose to do the Decoding?

The answer is, assuming no bugs in either the player or the AVR, it usually makes no difference.  That is, unlike the sophisticated math used in the Digital Signal Processor functions of digitizing Analog audio to Digital, or converting Digital audio to Analog, the Decoding of Bitstream tracks to LPCM follows a fixed recipe regardless of whether the player or the AVR is doing the job.  That is, the result of Decoding the Bitstream to LPCM is either "right", or there is a bug in the device's software and the result is flat out "wrong".  It is NOT the case that one device which gets it "right" might do a BETTER job of this than another device which also gets it "right".

So you are free to have the player or the AVR do the Decoding of Bitstreams.

(I said "Usually" above because there are rare Bitstream format combos which one device might handle and the other device does not handle.  But this is an added complication you can pretty much ignore.)

To finish, I'll touch on another type of Digital audio format:  DSD (Direct Stream Digital) is an alternative to LPCM.  You'll recall that an LPCM stream was made up of samples that shared a fixed "sample size", typically 16-bits or 24-bits, streamed at a "sample rate" typically in the 10s of thousands of samples per second.

DSD uses an entirely different scheme where each sample consists of just ONE bit, and the sample rates are up in the MILLIONS of samples per second!  We needn't go into the details of how this actually works.
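
But for the curious, here is a bare-bones Python sketch of how a stream of single bits can carry a signal level at all.  It is a first-order "sigma-delta" modulator, which is the general family of technique DSD belongs to.  Real DSD encoders are considerably more sophisticated than this, so take it as an illustration of the principle and nothing more.  The function names are my own.

```python
def one_bit_encode(samples):
    """Turn samples in the range -1.0 to 1.0 into a stream of single bits."""
    running_error = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        running_error += x - feedback          # how far the output has fallen behind the input
        bit = 1 if running_error >= 0 else 0
        feedback = 1.0 if bit else -1.0        # a 1 bit means "+1", a 0 bit means "-1"
        bits.append(bit)
    return bits

def average_level(bits):
    """Recover the signal level by averaging the bits (as +1 / -1 values)."""
    return sum(1.0 if b else -1.0 for b in bits) / len(bits)

DSD64_RATE_HZ = 64 * 44_100    # the rate used on SACD: 64 times the CD sample rate

bits = one_bit_encode([0.5] * 1000)            # a steady input level of 0.5
print(bits[:8], average_level(bits), DSD64_RATE_HZ)
```

No single bit tells you anything about loudness.  The level is carried in the DENSITY of 1s over time, which is why the sample rate has to be so enormously high.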

DSD was originally invented as a format for Digitizing audio from Analog magnetic tapes.  It was designed as an "Archive" format which had the special property that you could, in a pretty straightforward fashion, derive from it any other desired Digital audio format, whenever you wanted to extract some music from your archive.

Sony and Philips decided to try turning DSD into a commercial music distribution format, and the result was the Super Audio CD, or SACD, disc.

SACD discs are still being produced commercially, and various companies also sell Digital music media files recorded in DSD.

From the standpoint of our discussion above, DSD acts something like a Bitstream in that -- and this is a key point -- DSD can not be PROCESSED.  If you want to do any Digital audio processing on a DSD stream it must first be converted to LPCM.

Unlike a Bitstream however, it IS possible to find hardware which includes the special DACs which can do Direct-to-Analog Conversion of DSD Digital audio to Analog audio output.  That is, they can take DSD Digital audio input and produce Analog audio output directly.  Devices which don't have such special DACs need to convert DSD to LPCM first.  Then that LPCM can be put through their traditional DACs for conversion to Analog audio output.

In theory, you should not be able to hear a difference between DSD played *AS* DSD and DSD played after first having been converted to LPCM (presuming that conversion is done correctly).  But there are plenty of fans of DSD audio who believe they CAN hear superior results if the audio is left as DSD until it can be directly converted to Analog.

(There are also those who mistakenly believe DSD must be inherently superior because its sample rate is so high -- without having recognized that in the DSD encoding scheme, the high sample rate of mere 1-bit samples is also used to encode dynamic range (volume), not just frequency response.)

The point to keep in mind, though, is what I just touched on above.  DSD audio can not be PROCESSED.  So if you want to use ANY sort of Digital audio processing -- such as Crossover, Down-Mixing, Surround Sound Processing, Speaker Distance Adjustment, or Room Correction -- you MUST allow your electronics to first convert your DSD audio to LPCM.  Indeed, hardware that offers the special option of DSD Direct-to-Analog Conversion invariably has a user setting (typically defaulted to OFF) which you must enable in order to do that -- because doing so means you are also bypassing all the Digital audio processing in that device.  All you are left with is Volume control.

We've covered a LOT of ground in this post, and it may take several read-throughs for it all to sink in.  That's normal.

Just keep this post in mind for reference when you encounter these topics, and need to brush up on what's really going on!