Understanding DSD to LPCM Conversion, OR SACD "Noise Shaping" for Fun & Profit!


In my prior post on Digital Audio, I introduced two, "simple", Digital Audio formats:  LPCM (Linear Pulse Code Modulation) and DSD (Direct Stream Digital).  These are "simple" in the sense that each stream of LPCM or DSD contains the audio for just one speaker channel -- as compared to the more complex, Bitstream formats which combine multiple channels into a single stream.

However, there is one huge, practical difference between them.  DSD Digital Audio can not be "processed"!  If you have DSD content, and want to convert it directly to Analog audio for your speakers, without any other format fiddling in between, you must forego all types of Digital Audio processing.  So, no Crossover (bass steering).  No Down-mixing.  No Surround Sound processing.  No Speaker Distance adjustments.  No Room Correction.  NOTHING, except for Volume control.

If you WANT any such processing, you must first convert the DSD Digital Audio into a different Digital Audio format which CAN be processed.  I.e., into LPCM.

Which of course raises the question, "Is that SAFE?"  Can you DO that without screwing up the quality of the DSD original?  Or must you give up quality to gain access to that processing?

The short answer is, Yes, it is safe, given properly engineered gear.  Let's take a deeper look at what's going on!

In the latter half of the last century, Sony had a problem.  They had, quite literally, miles and miles of recording tape representing their catalog of valuable music recordings, dating back several decades.  For practical reasons, not least of all storage space, Sony wanted to digitize all of that -- archive it as Digital Audio files.  The technology for converting Analog Audio on magnetic tape to Digital Audio had finally evolved to the point they felt they could do this without compromising quality.  That is, they could capture, digitally, every nuance of what was on the tapes.

But the PROBLEM was, WHICH digital format to use?  Digital Audio formats were still in their infancy.  Who knew what formats might simply become obsolete in just the next few years -- much less well into the future?  If they picked a format capable of very high bandwidth (very high quality), they could be sure of capturing all of the quality of the tapes, but at the expense of files which might be too large or too expensive to process for practical playback.  If they picked a more compact format -- also cheaper to process for playback -- the marketplace might decide they had sacrificed quality.

In short, if they picked the wrong format, it could be a commercial disaster.  

Then, they had a bright idea:  They realized there was a stage in the digitizer electronics where the Analog Audio was now in Digital form, but BEFORE it had been committed to any particular, Digital Audio output format.  Their cunning plan, then, was why not record THAT as your Digital Audio archive?  You see, the cool thing about this idea is you can later convert THAT Digital archive file to WHATEVER Digital Audio format might be most sellable, at any time in the future!  Ka-ching!

This being Sony, no good idea is really finished until it has been anointed with a clever Marketing name.  In this case:  Direct Stream Digital, or DSD for short.

The digitizing electronics used something called Delta-Sigma Digitization (or Modulation).  This was a result of the rapidly-evolving, joint sciences of Information Theory and Digital Signal Processing -- basically the formal study of how the math actually works for this stuff.  The Delta portion of this meant the digitizer recorded the CHANGE in the Analog signal at each point in time -- rather than its absolute value.  In the simplest sense, this would be a stream of bits -- 1's and 0's.  If the signal was constant max amplitude you would get a set of 1s.  If it was constant minimum amplitude you would get a set of 0's.  Equally alternating 1's and 0's represented the mid-point of zero amplitude.  An imbalance in the density of 1's and 0's showed the signal moving between those points.

The Sigma portion worked to reduce the error inherent in that -- i.e., that a given 1 bit, by itself, doesn't record how MUCH the signal has increased, nor what might have happened in between this step in time and the prior step.  The scheme was to take the stream of 1's and 0's and feed them into a 1-bit Digital to Analog Converter (DAC), to convert them BACK to Analog.  Then feed THAT back into the Analog signal going into the digitizer in the first place (strictly speaking, it gets subtracted from the input, and the running sum of the difference is what decides the next bit).  This continuous feedback countered the errors introduced by the original digitizing.  Put the two parts together and you have Delta-Sigma Digitization.

The result was a stream of bits -- 1's and 0's -- representing the Digital version of the original Analog audio.  The technical term for this type of encoding is "Pulse Density Modulation".
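
For the curious, here is a minimal Python sketch of that feedback loop.  It is a first-order, software-only toy and not Sony's actual digitizer (the function names are made up for illustration), but it shows the behavior described above:  constant maximum input gives all 1's, constant minimum gives all 0's, and zero input gives alternating 1's and 0's.

```python
def delta_sigma_1bit(samples):
    """Toy first-order delta-sigma modulator: floats in [-1, 1] -> list of 0/1 bits."""
    integrator = 0.0   # the "Sigma": running sum of (input - feedback)
    feedback = 0.0     # 1-bit DAC output for the previous bit (+1.0 or -1.0)
    bits = []
    for x in samples:
        integrator += x - feedback          # the "Delta": input minus what was last output
        bit = 1 if integrator >= 0 else 0   # 1-bit quantizer
        feedback = 1.0 if bit else -1.0     # convert the bit back to an "analog" level
        bits.append(bit)
    return bits


def density_of_ones(bits):
    """Fraction of bits that are 1; this is what carries the amplitude."""
    return sum(bits) / len(bits)


full_high = delta_sigma_1bit([1.0] * 1000)    # constant maximum amplitude
full_low = delta_sigma_1bit([-1.0] * 1000)    # constant minimum amplitude
silence = delta_sigma_1bit([0.0] * 1000)      # mid-point, zero amplitude
half_up = delta_sigma_1bit([0.5] * 1000)      # halfway between zero and maximum

for name, bits in [("max", full_high), ("min", full_low),
                   ("zero", silence), ("+0.5", half_up)]:
    print(name, density_of_ones(bits), bits[:8])
```

The densities come out as 1.0, 0.0, 0.5 and 0.75 respectively, i.e., the imbalance between 1's and 0's tracks the input level.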

And it works REALLY well so long as you have a metric ton of bits flowing along in the stream!  That is, the rate of 1's and 0's per second had to be REALLY high compared to the frequencies found in the Analog Audio to capture that audio cleanly.

In a normal digitizer, this Pulse Density Modulated stream of 1's and 0's would then be passed through a "decimation filter" to produce the desired Digital Audio output format.  For example, a Linear Pulse Code Modulation (LPCM) file containing 16-bit Volume samples, captured 44,100 times every second, for each speaker channel -- which happens to be the form of LPCM Digital Audio found on CD music discs -- one LPCM stream each, for the Left and Right speaker channels.
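
To make the "decimation filter" idea concrete, here is a deliberately crude Python sketch.  It averages each block of 64 bits into one 16-bit sample, which turns a DSD64-rate stream into 44.1 kHz LPCM.  Real decimation filters are carefully designed multi-stage low-pass filters, so treat the block averaging, the stand-in modulator, and all the names here as illustrative assumptions rather than how any actual gear does it.

```python
import math


def modulate(samples):
    """Stand-in 1-bit source: toy first-order delta-sigma, floats in [-1, 1] -> 0/1 bits."""
    integrator, feedback, bits = 0.0, 0.0, []
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def decimate_to_lpcm(bits, ratio=64, bit_depth=16):
    """Average each block of `ratio` bits and scale it to a signed integer sample."""
    full_scale = 2 ** (bit_depth - 1) - 1
    pcm = []
    for start in range(0, len(bits) - ratio + 1, ratio):
        block = bits[start:start + ratio]
        level = 2.0 * sum(block) / ratio - 1.0   # density of 1's mapped onto -1..+1
        pcm.append(round(level * full_scale))
    return pcm


DSD64_RATE = 2_822_400
TONE_HZ = 1_000
n_bits = DSD64_RATE // 100    # 10 milliseconds of audio

# A half-amplitude 1 kHz test tone standing in for the Analog input.
analog = [0.5 * math.sin(2 * math.pi * TONE_HZ * i / DSD64_RATE) for i in range(n_bits)]
pcm = decimate_to_lpcm(modulate(analog))

print(len(pcm), "LPCM samples from", n_bits, "bits")
print("peak sample:", max(pcm), "of", 2 ** 15 - 1)
```

The peak sample lands near half of full scale, as you'd hope for a half-amplitude tone, but only approximately:  plain averaging over 64 bits is far too coarse for real use, which is exactly why proper filters are used instead.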

Sony's DSD scheme, on the other hand, simply archived that raw stream of Pulse Density Modulated 1's and 0's.  Two million, eight hundred twenty two thousand, four hundred of them every second!

Like I said, REALLY high.

If you do the math, you'll discover that 2,822,400 is 64 times the 44,100 per second sampling rate used for the LPCM on CDs.  So this first DSD format has become known as DSD64.

TECHNICAL NOTE:  Over time, people have extended this concept to even higher bit-rate forms of DSD, known as DSD128, DSD256, and even DSD512.  SACD discs only use DSD64.  The higher bit-rate formats can be purchased from outfits that sell Digital Audio music files, if you have the hardware to play them.  The discussion that follows will be focussed on DSD64, but the concepts involved carry over to the higher bit-rate formats just as well.
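
If you'd like to check the arithmetic, this short Python snippet works out the bit rates, assuming each DSD flavor is simply that multiple of the 44,100 per second CD sampling rate, as described above.

```python
CD_RATE_HZ = 44_100   # sampling rate of the LPCM on CDs

# DSDn is taken to mean n times the CD sampling rate, per channel.
dsd_rates = {f"DSD{m}": m * CD_RATE_HZ for m in (64, 128, 256, 512)}

for name, rate in dsd_rates.items():
    print(f"{name}: {rate:,} bits per second per channel")
```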

MARKETING NOTE:  It might have occurred to you "Delta-Sigma Digitization" ALSO has the initials "DSD".  This is, surely, just a coincidence.

DSD is an Archive Format DESIGNED to be Convertible into LPCM at High Quality

It's important to step back a moment and discuss just what DSD is, and isn't, designed to accomplish.  DSD is a Digital Archive format.  Its whole reason for being is to provide a way to archive Analog recordings, digitally, in a way that's not beholden to any given, commercial, Digital Audio format.  The idea is you can take a DSD archive file and readily convert it into ANY, commercial Digital Audio format you like -- such as some flavor of LPCM.  I.e., whichever format might have the most commercial value at the moment.

DSD can ALSO be converted back to Analog!  That is you can take your Digital DSD archive file of a magnetic tape and produce Analog audio from it to record on another magnetic tape -- or to press a vinyl record.  So whether your commercial needs involve digital or analog audio formats, the DSD archives will suffice.

DSD, from its inception, was *NOT* intended to be a studio editing format nor a commercial distribution format!

This goes back to the idea that DSD can not be "processed".  Take editing as the simplest example.  You can't simply cut a portion out of a track, or splice a portion of one take into a poorer quality portion of another take.  Why?  Because DSD's Pulse Density Modulation produces a volume at any given point in time based on the audio that has come PRIOR to that.  Remember?  What's being recorded is the CHANGES from moment to moment.  So you have to know where you were to figure out where you end up next!

Whatever type of Digital Audio processing you might want to do, you run into this same problem.  And that's also what limits DSD as a commercial distribution format:  Because most audio playback setups will need SOME processing, beyond mere Volume control, to produce best results.

Now think about this for a moment.  That SACD disc you just played, or that DSD256 music file you just downloaded -- umm, how did they MAKE that?  How did they mix, edit and process the audio to produce the final result?

The answer is, they didn't use DSD.

Indeed, the only "pure" DSD files out there are ones copied from those old magnetic tapes:  Digital Archives of, e.g., music of the 50s.  And I do mean copied -- no additional studio re-editing or re-mixing allowed!

The workflow for MODERN recordings typically involves multiple different Digital Audio formats between the performance and the final product you can play at home.  DSD typically has only a small role to play in that because each conversion of another Digital Audio format INTO DSD is a source of potential quality loss.  So other Digital Audio formats are used for the editing and mixing.  The final result delivered to you may be DSD -- and even High Quality DSD if the studio knows what it is doing -- but it was never "pure" DSD from the get go.

Indeed the same non-DSD masters which sourced your DSD file (or SACD disc) were likely used to produce other, high bit-rate, audiophile quality, Digital Audio formats of the same music.  Such as high bit-rate, lossless FLAC files.

The point being, there's no "purist" rationale for sending DSD content to your Digital to Analog Converters (DACs) as DSD.  There may be other reasons you want to do that, but forget about the idea that this was DSD from the get go, and so I want to keep it that way.

So why on earth do we have SACD discs and DSD music files, in the first place?

Well back to Sony again -- and Philips Electronics.  After the wild success of their Compact Disc (CD) format -- launched in 1982 -- the two of them were looking towards their next trick.

The music distribution portion of Sony eyed the huge library of DSD archives Sony had amassed and smelled ready money.  Sony, and again, Philips, launched their Super Audio CD, or SACD format in 1999, as the way to cash in.

But wait!  What about all those problems just mentioned?  Well the idea was these discs would be played in specialized players which side-stepped the whole issue of "processing"!  The first SACD player was the Sony SCD-1, built like a tank and costing $5,000 in 1990's dollars, when that was real money.

It played existing, regular CDs and the newfangled SACD discs -- in Stereo only.  But the SACD audio could only be heard on the player's Stereo Analog outputs.  (It also had Optical and Coax S/PDIF Digital Audio outputs, but only CDs would produce audio on those.)

That is, the DSD content on the SACD discs would be converted into Analog in the player; thus eliminating any possibility of Digital processing.  Indeed, the whole idea was SACD players *COULDN'T* produce Digital Audio output.  The argument was there was no Copy Protection available on the high bit-rate Digital Audio outputs of the day to protect these crown jewels of the Sony music archives.  (Keep in mind this was before HDMI and its HDCP Copy Protection protocol.)  But really this just kept the whole concept of "processing" DSD in abeyance -- for a while.  The DSD was stuck inside the player -- no way to get it out to anything else!

The first real problem came with the advent of multi-channel SACD music -- 5.1 tracks on SACD discs -- containing audio for 5 regular speakers and a subwoofer.  Well what if you don't have a full complement of 5.1 speakers?  And what if your regular speakers need the support of the subwoofer for handling bass authored in their channels?  DSD can not be processed for down-mix if you don't have the full set of speakers.  And DSD can not be processed for Crossover if you want to steer bass from a regular speaker channel to your subwoofer.

The generation of newer SACD players which could play 5.1 tracks side-stepped the down-mixing problem.  If you don't have 5.1 speakers then play the 2.0 music track on your SACD discs, instead!

But the subwoofer issue was a quandary.  Sony's solution was to define the ".1" channel as a "Subwoofer" channel rather than as the Low Frequency Effects (LFE) channel which had already been defined for use with DD 5.1 and DTS 5.1 movie tracks for SD-DVDs.  What's the difference?  In the movie tracks, each speaker channel carries its own bass, and the ".1" channel carries additional LOUD bass -- the Low Frequency Effects.  If you want to do Crossover processing to steer bass from the main speakers to the subwoofer, that has to be mixed into the LFE bass -- and level matched accordingly.  This gets into the basic issue of Subwoofer Boost which I've mentioned in previous posts.  I.e., dealing with the headroom built into the LFE channel so that it can safely carry LOUD bass.  LFE is recorded 10dB down from the regular speaker channels and that has to be corrected on playback so that the LFE content is matched in level with those other speakers.  And if you are steering bass from the main speakers into that LFE output, that steered bass has to be REDUCED in volume to match the LFE content, which then gets BOOSTED after output as part of playback.
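
As a quick illustration of that level matching, here is the decibel arithmetic in Python.  The 10 dB figure comes from the discussion above; the variable names and the "arbitrary units" signal are just assumptions for the sake of the example.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


LFE_BOOST_DB = 10.0   # boost applied to the ".1" output on playback

steered_bass = 1.0                                   # bass taken from a main channel (arbitrary units)
pre_cut = steered_bass * db_to_gain(-LFE_BOOST_DB)   # REDUCED before mixing into the LFE output
after_boost = pre_cut * db_to_gain(+LFE_BOOST_DB)    # BOOSTED again as part of playback

print(f"-10 dB as a multiplier: {db_to_gain(-10):.3f}")
print(f"+10 dB as a multiplier: {db_to_gain(+10):.3f}")
print(f"steered bass after cut, then boost: {after_boost:.3f}")
```

The cut and the boost cancel, so the steered bass ends up back at its original level alongside the (now corrected) LFE content.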

But in Sony's view, the ".1" channel on a 5.1 SACD disc would be a PRE-MIXED Subwoofer channel -- already carrying the bass which would otherwise be in the regular speaker channels.  And so it would be recorded at the same level as those regular speaker channels!

This has caused no end of confusion ever since, as people tried to accommodate BOTH ways of handling the ".1" content channels in just ONE audio system setup!  It got so bad some studios producing licensed SACD discs simply ignored Sony and recorded their ".1" channel 10dB down just as if it were a movie LFE channel.  (That REALLY caused confusion!)

Other studios simply ignored the ".1" channel altogether when making SACD discs.  That is, the 5.1 track on their SACD discs is actually 5.0 content.  ALL of the bass is in the regular speaker channels and the ".1" channel is silent!  Of course this REALLY puts the burden on the user to have full range speakers, or to address how to do Crossover processing with this SACD / DSD content!

Then, with the advent of HDMI cabling -- along with HDCP Copy Protection -- SACD playback got licensed for DIGITAL Audio output.  Now you had the issue of what would be sent out on that HDMI cable.  Would it be DSD, or LPCM?

In addition, you were now getting SACD playback licensed for movie disc players, many of which also offered their own forms of Digital Audio processing to better enable their multi-channel Analog outputs.

Now ALL of the Digital to Analog Converters (DACs) -- both the ones found in such players and in Audio Video Receivers (AVRs) -- could handle conversion of LPCM Digital Audio to Analog Audio output.

SOME players and SOME AVRs included fancier DACs which could also convert DSD Digital Audio directly to Analog Audio output.  But to USE that feature -- to use DSD-Direct-to-Analog Conversion -- meant foregoing any forms of Digital Audio processing!  So a player or AVR which offered that option also had to include a user setting to enable DSD-Direct-to-Analog Conversion.  I.e., the user had to make a choice to give up Digital Audio processing to get that.

Meanwhile some AVRs which accepted DSD on their HDMI Inputs, did NOT have the special DACs to allow DSD-Direct-to-Analog Conversion.  Those AVRs simply converted HDMI DSD Input to LPCM as the first step upon input.  They didn't need the user setting to enable DSD-Direct-to-Analog Conversion, because they weren't actually capable of that.  But including HDMI DSD as an allowed input format this way made for an easy "check off" item for folks looking for the possibility of sending DSD, but not looking hard enough to learn what the AVR actually did with it!  As I said above -- Ka-ching!

SACDs never achieved the huge market success CDs had enjoyed, but they are still out there in quantity, and new SACD discs are still being made -- primarily for the audiophile market.

In addition, the advent of Internet downloading and streaming of Digital music files has given new impetus to DSD as a file format, separate from SACD discs.  Most of the services that sell high quality music files will include one or more bit-rates of DSD as a download option.

So the issue remains:  Should you leave your DSD content as DSD or should you convert it to LPCM to enable processing?  This is a thornier problem for folks wanting to play 5.1 DSD tracks (as from SACD discs), since if they forego processing -- i.e., if they use DSD-Direct-to-Analog Conversion -- they have to handle their playback configuration issues some other way.  So you have to have at least a 5.1 speaker setup -- since you can't do down-mixing to make up for a missing speaker.  And the speakers and subwoofer should be equidistant since you can't do speaker distance adjustment.  And your speakers should be capable of full frequency range playback since you can't do crossover processing to steer bass to a subwoofer.  And the acoustics in your room have to be good enough you don't need to employ Room Correction processing.

Folks playing Stereo DSD content into a Stereo speaker setup have an easier time of it since there's no issue of down-mixing, nor of crossover (since there's no subwoofer included), and speaker distance adjustment is usually not a requirement.

But still, if you are GOING to convert your DSD to LPCM to enable processing, does that mean you'll forego quality?

Again I refer you back to the historical discussion above:  DSD was DESIGNED to be converted to other Digital Audio formats -- particularly LPCM!

So let's look at the particulars.

DSD64, the original DSD format and the only one found on SACD discs, has an effective dynamic range of 20-bits of LPCM.  So as long as you are converting to LPCM of 24-bits per sample or higher, you are fine.  The same holds true for the higher bit-rate DSD formats.  Simply put, 24-bits of dynamic range is A LOT.  You will be hard pressed to find real world music that pushes that.

TECHNICAL NOTE:  The best DACs operate at HIGHER than 24-bits, but this primarily ensures they can implement Volume control -- as part of their conversion to Analog audio -- without compromising the dynamic range of the output.

The commonly accepted limit for perceivable dynamic range -- i.e., the limits of human hearing when trying to distinguish between the softest and loudest sounds -- is also around 20-bits.
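
To put numbers on those bit depths, here is the usual rule of thumb in Python:  each bit of LPCM is worth roughly 6 dB of dynamic range.  This is the idealized textbook figure (full scale versus one quantization step), not a measurement of any particular gear.

```python
import math


def dynamic_range_db(bits):
    """Full scale relative to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 20, 24):
    print(f"{bits}-bit LPCM: about {dynamic_range_db(bits):.1f} dB")
```

So 24-bit LPCM leaves roughly 24 dB of margin above the 20-bit figure quoted for DSD64.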

What about frequency range?

From my prior posts you may recall discussion of the Nyquist Limit, which is a math result from the sciences of Information Theory and Signal Processing stating that a Digital Audio stream can only accurately capture Analog Audio frequencies up to 1/2 of its sampling rate.

CDs sample at 44.1 kHz, meaning they can accurately render frequencies up to a little over 22kHz.  The commonly accepted upper limit for human hearing is about 20kHz (and lower as you get older).  So CDs should be fine for that.

Nevertheless, numerous audiophiles report higher sampling rates sound better!  But this is extraordinarily hard to pin down.  In favor of higher sampling rates is the fact that digital audio processing involved in the production or playback of CDs may be more audible at only 44.1kHz.  In addition, there's some evidence folks can hear colorization of sounds caused by higher frequency harmonics which are not separately audible.

Arguing against it is the reality that higher bit-rate music files are often simply recorded better to begin with!  A better performance, or better equipment, or more highly skilled recording engineers.  I.e., they sound better, but that doesn't really have anything to do with their higher sampling rate.

In any event, DSD64 is capable of recording audio up above 100kHz!  So your conversion to LPCM would have to be able to handle that, right?  Say LPCM 192kHz 24-bit for example?  Or even THAT might not be enough!

Turns out this is WRONG.

The reason is, the higher frequencies in DSD64 are full of noise!  Oodles of it.

The culprit is what's called "Quantization Error".  The 1-bit per sample digitization which underlies DSD is massively subject to Quantization Error.  Simply put, the fact that each sample can only change the current result by 1 bit or not, combined with the fact that there's no way to record what's happening in the Analog audio BETWEEN those 1-bit samples, means the digital conversion has built-in errors.

I want to emphasize that this is IN THE MATH.  That is, it has nothing to do with the quality of the recording gear used or the skill of the recording engineers.  Quantization Error is simply going to happen with DSD.  And what's worse, it happens in the audible frequencies!  That is, the Noise Floor becomes too high -- meaning the usable dynamic range of the recording becomes too low -- in the range from 20Hz to 20kHz.  Which is just the audible audio you want to get right!

So, umm, if that's true, how come SACD music sounds so darn good?

It's "Noise Shaping" to the rescue!  I already alluded to this in my discussion of Delta-Sigma Digitization above.  By carefully adding current error BACK into the Analog signal going into the digitizer, you ADD Noise to the digitized result.  But -- and this is the clever bit -- you end up moving ALL the Noise much higher in frequency!

In DSD64, this "feedback loop Noise Shaping" takes the bulk of the Quantization Error Noise out of the audible range below 20kHz and moves it up mostly above 50 kHz!

The Noise Floor in the audible frequencies (below 20kHz) is dramatically reduced -- meaning the dynamic range gets up to that 20-bit value I mentioned above.  But the Noise gets dumped into those higher frequencies.  It also gets much worse.  That is the Noise you've moved up there is now higher in volume than the Noise you removed from the Digital Audio below 20kHz.  You really have added additional Noise, but you've also moved it out of the way!
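
Here is a small Python experiment that shows the effect.  It runs a test tone through a toy FIRST-order feedback loop at the DSD64 bit rate and compares the average noise in the audible band against a same-sized band up near 1 MHz.  Production DSD encoders use more aggressive, higher-order loops, so the exact numbers (and the particular bands I picked) are illustrative assumptions only.  The direction of the result is the point.

```python
import cmath
import math

FS = 2_822_400      # DSD64 bit rate
N = 4096            # number of 1-bit samples analysed
BIN_HZ = FS / N     # about 689 Hz per DFT bin
TONE_BIN = 2        # test tone placed exactly on a bin (about 1.4 kHz)


def modulate(samples):
    """Toy first-order delta-sigma; returns the 1-bit DAC levels (+1.0 / -1.0)."""
    integrator, feedback, out = 0.0, 0.0, []
    for x in samples:
        integrator += x - feedback
        feedback = 1.0 if integrator >= 0 else -1.0
        out.append(feedback)
    return out


def bin_power(signal, k):
    """Power in DFT bin k, computed directly (slow, but needs no extra libraries)."""
    acc = sum(s * cmath.exp(-2j * math.pi * k * n / N) for n, s in enumerate(signal))
    return abs(acc) ** 2 / N


def band_power(signal, lo_hz, hi_hz):
    """Average per-bin power between two frequencies, skipping the test tone itself."""
    bins = [k for k in range(1, N // 2)
            if lo_hz <= k * BIN_HZ <= hi_hz and k != TONE_BIN]
    return sum(bin_power(signal, k) for k in bins) / len(bins)


tone = [0.5 * math.sin(2 * math.pi * TONE_BIN * n / N) for n in range(N)]
stream = modulate(tone)

audible = band_power(stream, 20, 20_000)
ultrasonic = band_power(stream, 1_000_000, 1_020_000)
print(f"average noise per bin, 20 Hz - 20 kHz : {audible:.5f}")
print(f"average noise per bin, 1.00 - 1.02 MHz: {ultrasonic:.5f}")
```

Even this simple loop leaves far less noise in the audible band than it piles up at high frequencies.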

At least, out of the way as far as human ears are concerned.  However, your audio electronics and high frequency speaker elements -- the Tweeters -- are NOT human ears.  And they really don't want to be subjected to that kind of Noise!

So, the PRACTICAL limit of frequency for DSD64 is something below 50kHz.  You don't WANT DSD64 to retain frequencies above that because, "Here There be Exaggerated Noise!"

The DACs which support DSD-Direct-to-Analog Conversion handle this as part of their design.  That is they include a 50kHz filter in the Conversion.  Whether that direct conversion is being done in your player (for output on its Analog outs) or in your AVR (based on HDMI DSD input from your player), you are covered.

But what if you convert DSD64 to LPCM?  That LPCM is going to go to your AVR to be converted to Analog.  Even if your AVR happens to have the fancier style of DACs which include knowledge of how to convert DSD to Analog directly, the AVR has no way of knowing the LPCM coming in on its HDMI Input originated as a DSD64 Digital Audio stream!

And so the AVR can't know to include the 50 kHz protection filter.

In this case the protection has to be included in the player, as part of its conversion of DSD64 to LPCM for output to the AVR.

And the most convenient way to do this is to take advantage of the Nyquist Limit mentioned above!  So for example, if you convert DSD64 to LPCM at an 88.2 kHz sample rate, the frequencies captured in that LPCM stream will be limited, automatically, to no higher than 44.1 kHz -- 1/2 the sample rate.

Well, hmmm, can't we get a little bit closer?  What if we use a 96 kHz sample rate for the LPCM?  That would get us up to 48 kHz in frequency -- but still under 50 kHz.  Wouldn't that be better?

Turns out the answer is, No.

Music for home theater is recorded at sample rates which are either multiples of 44.1 kHz (as in CDs) or 48 kHz (as in most movies).  Even multiples of those base sampling rates also work just fine.  So 44.1 kHz, or 88.2 kHz, or 176.4 kHz, etc.  Or 48 kHz, or 96 kHz, or 192 kHz, etc.

But ODD multiples, or switching from one of these sets to the other, produces problems.  Think of them as rounding errors.  (It's really just another type of Quantization Error.)

DSD64, you'll recall, has a sampling rate which is 64 times 44.1 kHz.  And so its conversion to LPCM should be to one of the values in the 44.1 kHz list.

So the sweet spot for conversion of DSD64 to LPCM is to use LPCM 88.2 kHz 24-bit.
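
The reasoning behind that sweet spot can be checked with a few lines of Python.  For each candidate LPCM rate it works out the Nyquist Limit and how many DSD64 bits go into each LPCM sample, then picks the highest rate that both divides evenly and stays under the 50 kHz practical limit discussed above.  (The selection rule is my own shorthand for the argument in the text, not anything a player is guaranteed to implement this way.)

```python
DSD64_RATE = 2_822_400
NOISE_CEILING_HZ = 50_000   # practical frequency limit for DSD64, per the text
LPCM_RATES = (44_100, 88_200, 96_000, 176_400, 192_000)

for lpcm_rate in LPCM_RATES:
    nyquist = lpcm_rate / 2             # highest frequency this LPCM rate can carry
    ratio = DSD64_RATE / lpcm_rate      # DSD64 bits consumed per LPCM sample
    print(f"{lpcm_rate:>7} Hz: Nyquist {nyquist:>6.0f} Hz, "
          f"DSD64 bits per sample {ratio:g}, "
          f"whole number: {ratio.is_integer()}, "
          f"under 50 kHz: {nyquist < NOISE_CEILING_HZ}")

# Highest rate that divides evenly AND keeps the Nyquist Limit below the ceiling.
best = max(r for r in LPCM_RATES
           if (DSD64_RATE / r).is_integer() and r / 2 < NOISE_CEILING_HZ)
print("sweet spot:", best, "Hz")
```

At 88.2 kHz each LPCM sample covers exactly 32 DSD64 bits, while 96 kHz works out to an awkward 29.4.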

The higher bit-rate versions of DSD also have Quantization Error, and also use Noise Shaping to tame it.  But the exaggerated Noise that produces ends up even higher in frequency, and so they can be paired with higher sample-rate versions of LPCM.  If your player (or AVR) plays these higher bit-rate versions of DSD, it should handle this automatically for you whenever you select to have that DSD converted to LPCM.

So which way is better?

If you are fortunate enough to have a speaker setup and playback environment which does not NEED Digital Audio Processing, and if you have taken the time to think through what that really means, should you stick with DSD-Direct-to-Analog Conversion when playing your DSD content?

The answer is, with well engineered gear it SHOULDN'T make a difference.  That is, the audible results should be the same.  Of course you can try it both ways and see for yourself!

If you DO hear a difference there are several possible explanations.  First your gear may have a problem in how it handles LPCM audio or, for that matter, DSD audio.  Second, you may actually still have some Digital Audio Processing enabled.  Perhaps you've forgotten about it, or perhaps your gear does it under the covers.  Thus the LPCM version -- which uses that processing -- may sound different (either better or worse).  Third the way your gear converts LPCM or DSD to Analog may result in a small Volume difference between them.  As mentioned in one of my prior posts, the ear and brain have a bias towards hearing "louder" as "better".  

The tougher situation is if your setup really DOES need the benefits of Digital Audio Processing.  For example if your speakers really need Crossover processing so that their low frequency output can be supported by your Subwoofer.  The natural answer in that case would be, of course, to convert DSD to LPCM so you can access that Processing.  But what if DSD-Direct-to-Analog Conversion sounds better, anyway, when you try it?  That suggests there's a problem in either the setup or implementation of the Processing you are using with LPCM.

The bottom line of course is to use the playback method which sounds best to YOU!  But don't feel you HAVE TO keep DSD content as DSD to preserve quality.  In a properly engineered and configured playback environment, conversion of DSD to LPCM will work just fine for you!