Monday, July 6, 2009

Anime Studio 6 Auto Lip Sync

We'll be posting a full video tutorial on how we do the lip sync for our series (it's not hard, but it does take some time to get right). In the meantime, I thought I'd comment on the "automatic" lip sync function in Anime Studio 6.

The bottom line: it's practically worthless. Other than perhaps using it for a word or two it won't work for anything you have any pride in (and even then you're better off just using the respond to .WAV function).

It was clear this was the case from even a casual test of the feature, but in order to be fair I did some scientific tests and here are the results for those interested in the details (for those of you who don't understand lip sync, just understand it's not worth using).

I took a .WAV clip from the web (from the movie "As Good As It Gets") as being pretty typical of the sound files a hobbyist might work with. Better audio might yield better results, but since the feature is based on typed text I kind of doubt it. That is to say, if you type the text in, the auto sync should be generating phonemes from that text and matching them up to the audio, not generating phonemes from the audio. But perhaps it's doing both or, even worse, using the audio alone to generate the phonemes and then the text to try to line things up (rather than the other way around, as Papagayo does, although Papagayo asks you to line things up manually).

For comparison I put the text in Papagayo (PG) and manually lined up the generated phonemes. PG does an excellent job of generating the appropriate phonemes from the typed text. You can (and indeed, at times, must) change what you type to match the audio you are hearing -- a character might say "all" instead of "oil" -- but as long as what you type is what it sounds like, you are in good shape. In this case the lines are simple and clear enough to understand ("How do you write women so well?" "I think of a man and take away reason and accountability"), so I did not have to change anything I typed.

To be fair, I treated this as one character. One thing the auto lip sync in Anime Studio (AS) doesn't do is allow for multiple characters with one audio file, unlike Papagayo, but it would be unfair to penalize it for that since that's not what it's intended to do. Then I took the XSheet from PG and matched it up with the keys generated by AS (read from the saved .anme file). Here are the actual stats.

PG generated 56 phonemes and AS generated 89. It might be thought that AS generated a lot of dupes but that was not really the case. However, what was very obvious was that AS misread the track on the majority of words.

Example: "How". PG broke it down into two phonemes, "etc" and "O", and this is visibly accurate. AS took the same word and made it into four phonemes -- "etc", "rest", "MBP" and "F" -- which isn't even close (and looks very odd, as do most of these misreads). "do" was "etc" and "U" in PG, "E" and "O" in AS (not awful, but nowhere near as convincing). They agreed upon "you" as "U", but "write" came out as "etc", "AI", "etc" in PG (looks fine) and the very odd "U", "WQ", "U", "L" in AS. One more in detail: "women" is (in PG) "WQ", "AI", "MBP", "AI" and "etc" (as always, spot on), while AS didn't even come close with the bizarre "E", "L", "F", "MBP" and "F" (well, it did get ONE right).
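For anyone who wants to tally this kind of comparison themselves, here is a rough Python sketch. It is not how I did the original comparison (that was done by hand, reading the XSheet against the keys); it just takes the five words listed above, typed in as plain lists, and uses the standard library's difflib to count how many phonemes line up in order between the two. The names PAPAGAYO, ANIME_STUDIO, shared_phonemes and compare are made up for this example.

```python
from difflib import SequenceMatcher

# Phoneme sequences per word, typed in by hand from the comparison above.
PAPAGAYO = {
    "How": ["etc", "O"],
    "do": ["etc", "U"],
    "you": ["U"],
    "write": ["etc", "AI", "etc"],
    "women": ["WQ", "AI", "MBP", "AI", "etc"],
}
ANIME_STUDIO = {
    "How": ["etc", "rest", "MBP", "F"],
    "do": ["E", "O"],
    "you": ["U"],
    "write": ["U", "WQ", "U", "L"],
    "women": ["E", "L", "F", "MBP", "F"],
}


def shared_phonemes(reference, candidate):
    """Count the phonemes difflib manages to align, in order, between two lists."""
    matcher = SequenceMatcher(None, reference, candidate, autojunk=False)
    # The final matching block is a zero-length sentinel, so it adds nothing.
    return sum(block.size for block in matcher.get_matching_blocks())


def compare(reference_by_word, candidate_by_word):
    """Return (word, reference count, candidate count, aligned count) per word."""
    rows = []
    for word, reference in reference_by_word.items():
        candidate = candidate_by_word[word]
        rows.append(
            (word, len(reference), len(candidate), shared_phonemes(reference, candidate))
        )
    return rows


if __name__ == "__main__":
    for word, n_ref, n_cand, n_shared in compare(PAPAGAYO, ANIME_STUDIO):
        print(f"{word:>6}: PG={n_ref} AS={n_cand} aligned={n_shared}")
```

This only counts matching phoneme names in sequence; it says nothing about timing, which would need the frame numbers from both files.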

And so it goes, on and on. The AS auto sync has the mouth opening and closing correctly most of the time (but not even that can be taken for granted) but the phonemes it generates are so bizarre and off that I have no idea how it is coming up with them.

I have tried the auto sync with much better audio files -- such as the ones we use in our weekly series, which are digital audio recorded at better than CD quality -- and the results were equally off. So I don't think the quality of the audio is the problem here.

To be fair, there isn't anything on the market that can auto sync properly, even though some programs purport to do just that. One, indeed, costs a lot of money and does an even worse job than AS. I don't think this is Mike's fault, although I do wish there were an option to force AS to use the phonemes as generated by PG and match those up with the audio. I suspect that would be much better, but that's just a guess. If I have time someday I may try programming this.

I'd be glad to supply the files for anyone who wants to compare and/or try this experiment for themselves, but for me the evidence is compelling -- AS auto lip sync isn't ready for prime time or much of anything else.

That shouldn't deter most folks seeking excellent lip sync -- Papagayo (the freeware that Mike wrote to work in conjunction with Anime Studio) is easy to use and does a superb job, as noted above. Just like in all other forms of animation, you can't do it automagically, but if everyone could do this they wouldn't pay us the big bucks.

1 comment:

  1. Hi Mike,

    I'd like to take a look at your test files if I could. Could you email the audio and sync results to lipsync(at)

    I'll let you know what I find out after checking out your files.


    (A different) Mike