Lip Sync, OR Why Are Her Lips Still Moving After She's Stopped Talking?

Certain flaws in your Home Theater viewing are bound to take you out of the moment -- flaws just too annoying to ignore.  And Audio out of sync with Video is certainly one of them!

We've all seen examples of poorly "dubbed" movies, where the lip-sync error is so humongous it's comical.  But errors even more subtle will still leave you with the irritating feeling that something is just, OFF.  And once that sync error is corrected, there's that Ahhh! moment as you settle in and realize this is finally RIGHT!

In this post we will talk about Lip Sync errors -- where they come from, and what you can do about them.

The first thing you need to know is that it is commonplace for there to be "Inherent Sync Errors" in movies and TV shows.

There are many ways such errors can occur in the content you watch -- some of them likely dating back to the original Theatrical Release.  The errors may have crept in during the original filming, during the editing, during RE-recording of audio elements (because the original audio turned out to be unusable), etc., etc.  For example, the imagery from one take might be combined (in editing) with the audio from ANOTHER take for some reason.  As careful as the actors might be, the lip sync will not match.

And the thing about such Inherent Sync Errors is that they usually vary from scene to scene!  So it is pointless trying to correct for them during viewing.  You'd spend all your time correcting!

FORTUNATELY, the vast majority of such Inherent Sync Errors are SMALL.  Whether Audio is early or late compared to Video, the difference is small enough for the brain to "not see" the problem.  Indeed the human brain is really quite good at "not seeing" sync errors.  It WANTS things to be in sync.  As it turns out, this will be one of the things you have to watch out for when adjusting the sync in your system (as I discuss below).  You need to focus hard to spot there's correction still needed, because your brain is going to try to tell you the sync is ALREADY good enough.

So, you do NOT try to adjust your system to correct for the Inherent Sync Errors in whatever you happen to be watching at the moment.  Of course every now and again you'll run across a movie that has gross sync error in some scenes -- perhaps even to the point of that comical dubbing I mentioned above.  My advice to you is to just live with that.  Relax, and realize this is just the way the film was made.  Perhaps remind yourself not to buy such poorly made films in the future!


But if you are not trying to adjust for Inherent Errors in the content, what ARE you trying to adjust for?

The answer is, you are trying to set the correct amount of "delay" for your Audio to compensate for the amount of time it takes your equipment to process the VIDEO you are playing.

Video processing is complicated.  The more you ask your video gear to do, the more complicated it gets!  And the newest, high resolution video -- such as with UHD (4K) content -- takes extra time to process simply because there's so MUCH data to contend with in each and every frame.

The processing horsepower in new TVs is dramatically greater than in older TVs.  But what those processors are being asked to do has also grown dramatically.

Audio, too, has gotten more complex.  Think of the new, "immersive" audio formats such as Dolby Atmos or DTS:X.  But there's simply no comparison with the amount of DATA that needs to be processed in Audio vs. Video.  

So from the standpoint of A/V Sync, you can think of Audio processing as instantaneous -- REGARDLESS of the audio format you are playing, and regardless of what you are asking your gear to do in the way of value added processing on top of that Audio format.

Crossover processing?  Piece of cake.  Down-mixing?  No sweat.  Surround Sound processing to raise fewer channels of content to more speakers of output?  Not tough enough.  Even full blown digital Room Response Correction?  Yep, even that.  Audio processing does take some time, of course, but the amount of time is so SMALL compared to video processing that there's simply no point even considering it.

Now, the amount of time it takes to process VIDEO can vary significantly depending on what you've asked your gear to do.  Consider what happens inside your TV, for example.  There's a lot more data crunching needed if you ask the TV to smooth out motion, or to adjust the image dynamically by analyzing the contents in each frame.

Rather than have every little nuance of your combo of requested Video features changing the processing time, TV makers, for example, typically block out CHUNKS of time.  As long as all the processing fits within the allocated chunk, you can depend on that known, reserved chunk of time being the only processing time you need to consider.

And the most useful time chunk for Video processing is the time it takes for a single frame of Video.  That is, a TV (for example) can buffer a frame or two of Video -- holding them in memory while they are processed -- and then release them in time to capture the next frame in that memory buffer.

If you consider Film Frame Rate content -- 24 frames per second -- you end up with a time per frame of 41 2/3 ms -- milliseconds (thousandths of a second).  Let's round that down to 40 ms.

Thus it is typical for a TV to try to get *ALL* its Video processing done in 40 ms -- or 80 ms.  Suppose the particular combo of processing you've asked for only takes 65 ms?  Well the TV will likely still hold things for 80 ms just to keep things "simple".  But if the processing takes 85 ms?  The TV will likely bump things up by another Frame time -- to 120 ms.  (OR it will cut corners on that combo of Video processing requests!)
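To make that rounding concrete, here is a minimal Python sketch of the "reserve whole Frame times" idea.  The function name reserved_delay_ms and the 30 ms sample value are invented for illustration, and it uses the rounded 40 ms Frame time from above.  Real TVs may well handle this differently.

```python
import math

def reserved_delay_ms(processing_ms, frame_ms=40.0):
    """Round a processing time UP to a whole number of Frame times."""
    # Assumption: the TV always reserves at least one full Frame time.
    frames = max(1, math.ceil(processing_ms / frame_ms))
    return frames * frame_ms

for needed in (30, 65, 85):
    print(f"{needed} ms of processing -> {reserved_delay_ms(needed):.0f} ms reserved")
```

With these inputs the sketch reserves 40, 80 and 120 ms, matching the 65 ms and 85 ms cases described above.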

For Video Frame Rate content -- 60 frames per second -- the per Frame time is 16 2/3 ms.  The TV MIGHT shorten its buffering time to match, but with so little time per Frame it may need to reserve extra Frame times to get its processing done.

Note that the way the math works here, 2 Frame times of 24 fps content equals 5 Frame times of 60 fps content -- both are 83 1/3 ms.  So if you use THAT as your reserved Chunk of time for Video processing you can have a set number of full Frames for both frame rates!  Of course that means you need to be able to get all the processing done for FIVE Video Frame Rate frames in the same amount of time you've allowed for processing only TWO Film Frame Rate frames.

(Which is why you may discover TVs only offer some of their processing when fed the slower, Film Frame Rate content!)
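If you want to check the Frame time math for yourself, here is a small Python sketch using exact fractions so nothing gets lost to rounding.  The helper names (frame_time_ms, shared_chunk) are made up for this example.  It simply searches for the smallest number of 24 fps Frames that also works out to a whole number of 60 fps Frames.

```python
from fractions import Fraction

def frame_time_ms(fps):
    """Exact time per Frame in milliseconds."""
    return Fraction(1000, fps)

def shared_chunk(fps_a=24, fps_b=60, max_frames=10):
    """Smallest chunk of time that is a whole number of Frames at both rates."""
    a, b = frame_time_ms(fps_a), frame_time_ms(fps_b)
    for frames_a in range(1, max_frames + 1):
        chunk = frames_a * a
        frames_b = chunk / b
        if frames_b.denominator == 1:  # whole number of fps_b Frames
            return frames_a, int(frames_b), chunk
    return None  # nothing found within max_frames

film_frames, video_frames, chunk = shared_chunk()
print(film_frames, video_frames, float(chunk))
```

This finds 2 Film Frame Rate frames and 5 Video Frame Rate frames, a chunk of 250/3 ms, which is the 83 1/3 ms figure above.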

Whatever the case, your goal is to discover how much you need to delay your Audio so that it matches up with the processing time your gear has reserved for Video.  Note that this can get complicated because your source device (perhaps a disc player or a cable or satellite TV box) may be doing some Video processing of its own, your Audio Video Receiver (AVR) may be doing some video processing, too, and of course your TV is going to be doing a pile of video processing.

And each piece of gear may also be contributing some Audio delay!  For example your source may or may not be delaying its audio output to be in correct sync with its video output.  And your AVR may or may not be doing the same.

The big culprit in Video processing time -- your TV -- can NOT help in this regard.  Why?  Because in a typical Home Theater setup the Audio is not going through the TV!  The Audio is instead being handled by your AVR.  So your TV isn't even in the signal path for the Audio and can't delay it.
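One way to picture the bookkeeping is to add up the Video processing time along the chain, add up whatever Audio delay is already being applied along the chain, and take the difference.  The Python sketch below does just that.  Every number in it is hypothetical, made up purely to show the arithmetic, and the Device class and function name are inventions for this example.  You will NOT find these figures on a spec sheet; the way to get the real answer is still to measure with a test chart.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    video_ms: float  # time this device adds to the Video path
    audio_ms: float  # delay this device already adds to the Audio path

def extra_audio_delay_ms(chain):
    """Audio delay still needed so Audio lines up with Video."""
    video = sum(d.video_ms for d in chain)
    audio = sum(d.audio_ms for d in chain)
    return video - audio

# Hypothetical setup: the TV is not in the Audio path, so its audio_ms is 0.
chain = [
    Device("disc player", video_ms=20, audio_ms=20),  # player self-compensates
    Device("AVR", video_ms=10, audio_ms=0),
    Device("TV", video_ms=80, audio_ms=0),
]
print(extra_audio_delay_ms(chain), "ms of Audio delay still needed")
```

A positive result means Audio is early and needs that much more delay.  A negative result would mean Audio is already LATE, which no amount of added delay can fix.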

Of course if you are playing content INTERNAL to the TV -- such as from its built-in Channel tuner or a built-in Internet streaming app like Netflix -- then the TV *IS* the source for both the Video and the Audio.  And the TV can then delay the Audio present on its own outputs -- whether for its built-in speakers or for audio going to your AVR -- to keep things in sync.  This is both a blessing -- for its simplest use, the TV can produce its own, correct, lip sync -- and a problem, because if it is sending audio through your AVR you don't want the AVR adding ADDITIONAL delay over and above what that TV has already done!


Let's pause for a moment and talk about humans.

Typical adjustments for Audio/Video Sync will be in 10 ms steps -- or perhaps 5 ms.  Some devices go way overboard and allow adjustment in single millisecond steps!

Now think about that.  As stated above, one Frame time for Film Frame Rate content is a little over 40 ms.  That means a 10 ms adjustment amounts to roughly ONE QUARTER of one Frame time.

There are no pictures in between.  You get this Frame, and then, 40 ms later, you get the next Frame.  Nothing in between.

It *IS* possible to interpolate Frames in your mind.  You may truly see that an event -- such as one thing striking another thing -- has not quite happened by THIS Frame but has already happened by the NEXT Frame.  So the actual striking has happened someplace in between.  And thus the 10 ms adjustments.

However, truly, it is a rare human who can accurately "see" sync error within 10 ms.  And quite candidly, NOBODY can see it within 1 ms.  You'd need electronic gear to check sync that precisely, and there's no point in doing so in terms of practical adjustment for a Home Theater setup.


And let's pause for another moment to talk about DELAY vs. SPEEDING UP of Audio or Video.

It may be something of an Ah Hah! moment for some readers, but if you think about it, it should be obvious that you CAN'T Speed Up either Audio or Video.  You can't make the signal come out your outputs any sooner than it arrives on your inputs.

Now some devices -- such as the Blu-ray and UHD Disc players sold by OPPO Digital -- offer the ability to set Negative A/V Sync adjustment values:  Sync delay LESS THAN 0.  But what's really going on in such devices is that they have a built-in, baseline amount of Audio delay to compensate for their own, internal video processing.  All the Negative adjustment values are doing is reducing that -- which makes Audio come out a bit sooner than video on the player's outputs simply because the internal Video processing time in the player has no longer been fully compensated.

So in reality, all you have to work with is Delay.  Fortunately Delaying (buffering) Audio is fairly inexpensive in terms of the needed memory and control circuits.  Far less expensive than what TVs have to do to buffer Frames of Video as described above!
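Here is a tiny Python sketch of how a "Negative" sync setting can exist when delay is all there is to work with.  The 50 ms baseline is a made-up number, not a figure for any real player, and the function name is invented for this example.  The assumption is simply that the user's setting gets added to a built-in baseline delay, and the total can never drop below zero.

```python
def effective_audio_delay_ms(baseline_ms, user_offset_ms):
    """Total Audio delay actually applied; it can never be less than zero."""
    return max(0.0, baseline_ms + user_offset_ms)

BASELINE_MS = 50.0  # hypothetical built-in compensation for the player's own Video processing

for offset in (-80, -30, 0, 40):
    total = effective_audio_delay_ms(BASELINE_MS, offset)
    print(f"user setting {offset:+d} ms -> {total:.0f} ms of actual Audio delay")
```

A setting of -30 ms here still leaves 20 ms of real delay.  Audio has not been sped up at all; it is just being held back less than before.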


Dealing With "The Helpful AVR Problem"!

From the discussion above it should be obvious that the USUAL Lip Sync Error you will see is that Audio is ahead of Video.  She stopped talking, but her lips are still moving!  That's because Audio has not yet been delayed sufficiently to compensate for the combined Video processing time in your gear.

But every now and again someone will complain that their Video is ahead of their Audio!  They need to slow up the Video -- or speed up the Audio!  And as just mentioned, that's not something you'll be able to do except in very limited ways such as in those OPPO players.

So what the heck's going on?

The USUAL explanation turns out to be what I call, "The Helpful AVR Problem".  These people are typically using TWO HDMI cabling paths:  One for Video and one for Audio.  For example, they have one HDMI cable going from their source device directly to their TV for Video, and another HDMI cable going from that same source device to the AVR for Audio.

It is becoming more and more common to find source devices -- such as disc players -- which offer dual HDMI outputs like this.  The Marketing guys like to try implying this is so you can get improved Audio on the "Audio-only" HDMI output.  But in reality the REASON for including dual outputs is, quite simply, the technology is moving too fast for the owners of (pretty expensive) AVRs to keep up!

So for example, they may have a TV which can accept 3D video, but their AVR will not pass 3D video.  Or they may have a TV which can accept UHD (4K) Video, but again, their AVR will not pass that.  Or they may have an AVR which WILL pass UHD Video, but not the latest *FLAVORS* of UHD Video -- such as with one of the competing schemes for encoding High Dynamic Range into the Video!

Now the MAKERS of the AVRs have an easy answer for this.  Buy a new AVR!

But the owners sometimes balk at replacing their current AVR when it is only, perhaps, a year old and working just FINE, thank you.

So dual HDMI cabling lets them keep using their current AVR while also sending the latest, whizbang Video formats directly to their new TV!

And here's the problem:  As mentioned in my previous posts on Digital Audio and Digital Video, there IS NO SUCH THING as an Audio-only HDMI signal.  HDMI Audio is only ever found EMBEDDED inside an HDMI Video signal.  Always!

So the HDMI connection carrying Audio to your AVR is ALSO carrying Video to that AVR.  The Video may be particularly simple -- perhaps a constant black frame -- but it is there nonetheless.

Which means, if you think about it, that the AVR does not *KNOW* it is being bypassed for Video!

And so -- to "Be Helpful" -- the AVR may be adding a chunk of Audio delay on its own, EVEN IF you have it set to add NO delay, in order to compensate for its own, internal, Video processing time.  And THIS is what makes your Audio late -- too MUCH Audio delay -- Video shows ahead of Audio.

If you get bitten by "The Helpful AVR Problem", there are a couple things to try.  Basically, what you are trying to do is convince the AVR that it has no Video Processing to do (and thus has no need to add that extra, unwanted Audio delay).  Some AVRs will have a setting that lets you shut off their HDMI output -- which of course means there is no Video going out from the AVR.

But the more common workaround is to set the AVR to "HDMI Video Pass-through".  This is a setting which basically tells the AVR to bypass any of its internal Video processing -- to send video, unaltered, from its inputs to its HDMI output.


Understanding Automatic Lip Sync (A/V Sync) Adjustment

This is another major source of confusion.  Some years ago, the powers that be added Automatic Lip Sync as an optional feature in the HDMI specifications, and it was promptly oversold by those equipment makers who decided to implement it.

First of all, there is nothing fancy going on here.  There is no analysis of the Audio and Video streams, for example, to constantly correct for any errors.  In particular, there is nothing in this feature to address the Inherent Sync Errors in movies and shows which I mentioned at the top of this post.

So what's going on?  Well, what's going on is that the HDMI spec was modified to allow the AVR to inquire of the TV what amount of Audio delay the TV thought would be a good idea.  The AVR (which otherwise has no knowledge of what's going on inside the TV for Video processing) could then blithely implement that amount of Audio delay and consider it a job well done!

The PROBLEM is that the values being sent out by the TVs were often WRONG!  Why?  Because the TVs were typically implementing this as a fixed number, without regard either to the feature settings the user had enabled in the TV or to the differences in the TV's own processing time based on the different Video formats which might be sent to the TV.

In addition, this Automatic Adjustment did nothing to correct for sync problems that already existed in the content when it arrived at the inputs of the AVR -- perhaps due to settings made in the source device.

And finally, there's the problem if people are using dual HDMI cabling (as described above).  The AVR is getting its answer from an entirely different HDMI Input of the TV, which is not receiving the real Video stream, and might be set to entirely different Video processing settings!

Although Automatic Lip Sync Adjustment may work in some cases, my recommendation is that you are usually better off setting your Audio delay manually -- after checking for yourself what amount of delay works best.


OK, now we are getting to the meat of the matter, which is how to adjust your A/V Sync settings correctly!

As mentioned in my prior post on Calibration Discs, you do NOT want to try doing this using real world movie or TV show content.  The reason is that "Inherent Sync Error" I discussed at the top of this post.

You may have movie A and movie B, both of which have a small amount of Inherent Sync Error -- for the most part, small enough to "not see".  But either of them may have a scene where the Inherent Error just creeps up to a large enough amount that you CAN see it if you focus hard enough and fight your brain's desire to "see" stuff as in sync.

And so you reach for your Remote and start fiddling with the A/V Sync adjustment to make that error go away.  And indeed, even this larger error may be small enough, and in the same direction as the errors in the rest of that movie, that you can find a setting which makes the whole movie look good!  There's still residual error, but you've reduced all of it to be within the range that's small enough to "not see".  So, time to declare Victory, right?

But now you go play the OTHER movie and there are sync errors all over the place!  It looks way worse than before you started fiddling with the adjustment for the first movie!  What's going on?

Simple:  The errors in the first movie are in the OPPOSITE DIRECTION to the errors in the second movie!  So by adjusting sync for the first movie, you've now made the errors in the second movie WORSE.  You've raised all of those Inherent Errors in the second movie ABOVE the point where your brain can successfully "not see" them!
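A few made-up numbers show how this plays out.  In the Python sketch below, the per-scene errors and the 45 ms "can see it" threshold are all hypothetical, chosen only to illustrate the point (nobody's eyes come with a fixed threshold).  Positive numbers mean Audio is late, negative numbers mean Audio is early, and a tweak of 0 stands for the correctly calibrated setting.

```python
VISIBLE_MS = 45  # hypothetical "can see it" threshold, not a measured figure

movie_a = [10, 20, 50]     # made-up per-scene Inherent Errors, Audio late
movie_b = [-10, -20, -40]  # made-up errors leaning the OPPOSITE way

def worst_error_ms(scene_errors, tweak_ms):
    """Largest sync error (ignoring direction) after adding a user tweak."""
    return max(abs(e + tweak_ms) for e in scene_errors)

for tweak in (0, -20):
    for name, scenes in (("A", movie_a), ("B", movie_b)):
        worst = worst_error_ms(scenes, tweak)
        verdict = "visible" if worst > VISIBLE_MS else "not seen"
        print(f"tweak {tweak:+d} ms, movie {name}: worst {worst} ms ({verdict})")
```

With no tweak, movie A has one visible scene (50 ms) and movie B has none (40 ms worst).  Tweaking by -20 ms pulls movie A down to 30 ms worst, but pushes movie B up to 60 ms worst.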

Instead you want to adjust A/V Sync using content of known correctness for sync.  And that means using a Calibration Disc.  In my post on Calibration Discs, I recommended "Spears and Munsil, 2nd Edition", Blu-ray, as a disc that had an effective test chart for checking A/V Sync.

It actually has several charts.  For example you can use it to check for correct sync both for Film Frame Rate content (24 frames per second) and Video Frame Rate content (60 frames per second), and you can select the audio output format as well.

But whichever chart you are using, what you are looking to do is adjust Audio Delay to compensate for the true Video processing time in your gear.

If you get this right, you will STILL not have zero sync error on most ANY of the content you watch.  There will still be the Inherent Sync Error in each movie or show -- varying scene by scene as I described up above.

But, and this is the crucial point, you won't be adding any ADDITIONAL error to that due to improper compensation for Video processing time!  And so, if the Inherent Errors in your content are small enough to "not see" to begin with, they will REMAIN small enough to "not see".

AND, this will be true WHICHEVER direction those Inherent Errors happen to take!

Now having done this, you may still spot the occasional scene which has an Inherent Error large enough to catch your eye.  So be it.  That's just the way that film was made.  But for the vast majority of your content, things will look correct.


The actual details of using the A/V Sync calibration chart to adjust your Audio delay setting are important, too.

First of all, remember my admonition above that the brain WANTS to see sync as "good enough". So you need to focus, to make sure you are giving your brain a chance to tell you what's REALLY going on -- not what it would like to believe is going on.

Do this by approaching the correct A/V Sync setting from BOTH sides:  Too much Audio delay and too little Audio delay.  And back off occasionally to a setting which is sufficiently WRONG that your brain will absolutely show you the sync is incorrect.

The more you play with your A/V Sync calibration chart (whichever disc you use for that purpose) the easier this will become.  You will sense when your brain is fooling you, and back off to an incorrect setting to get it back in the game!
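One possible way to turn "approach it from BOTH sides" into a number is to note the lowest and the highest Audio delay settings that still look in sync on the chart, and then split the difference.  That midpoint rule is an assumption for this sketch, not something any calibration disc dictates.  The "viewer" below is simulated too, with a made-up true delay of 90 ms and a made-up tolerance of 25 ms, standing in for you and your eyes.

```python
def bracket_midpoint(looks_in_sync, settings):
    """Midpoint between the lowest and highest settings that look in sync."""
    good = [s for s in settings if looks_in_sync(s)]
    if not good:
        return None  # nothing looked right; widen the range and try again
    return (min(good) + max(good)) / 2

def simulated_viewer(setting_ms, true_delay_ms=90, tolerance_ms=25):
    """Stand-in for a human: 'looks fine' anywhere within the tolerance."""
    return abs(setting_ms - true_delay_ms) <= tolerance_ms

settings = range(0, 201, 10)  # 0 to 200 ms in 10 ms steps
print(bracket_midpoint(simulated_viewer, settings))
```

Here the settings from 70 to 110 ms all "look fine" to the simulated viewer, and the midpoint lands on 90 ms.  Starting the sweep from clearly wrong settings at both ends is what keeps your brain honest.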

The second thing to consider is my admonition above that TVs, in particular, may take DIFFERENT amounts of processing time according to the settings you have enabled in them.

The only way to know for sure is to TRY the different settings and see what happens.

For example, I use a TV from LG Electronics which offers a form of "motion smoothing" they call TruMotion.  I'm not a fan of it, and would prefer to leave it off in my TV.

But with all of the various "enhancement" video features also turned off in the TV that leaves me with a quandary!  This particular TV takes MORE time to process /24 video (Film Frame Rate) than /60 video (Video Frame Rate)!

After griping about this for a while, I found a solution:  I enable the TruMotion processing but leave it set to DO NOTHING!  (There's a selection which allows the user to control how much processing it does, and I use that selection with the sliders set all the way off.)  And although this does not alter the video in the way I don't like, it has the beneficial side effect of causing the TV to take the SAME amount of Video processing time for both /24 and /60 content!

Now this is just one case example, and the situation in YOUR TV or projector may be very different.  The point is, you need to check the combos of video formats you are sending to the TV and the processing settings you want to enable in the TV to see what actually happens in terms of Video processing time -- and thus the Audio delay you need to apply to get correct Lip Sync.

Yes, it takes a little time to check this stuff, but it is WELL worth it to not feel like everything you are watching is one of those badly dubbed foreign films!

--Bob