lipsync2: Talking Face Generation with Most Accurate Lip Synchronization


research and development work conducted at BHuman AI

by Taneem Ullah Jan


model architecture of lipsync2

The Bigger Question!

Why do we call it lipsync2? Where is the first one if this is the second? The answer is that this is the second model in our series of lip synchronization models. The first one was a great success, but it had limitations, which we have overcome in this version. We have made it more accurate, more robust, and more efficient.

The Result

lipsync2 sets a new standard for generating talking-face videos with accurate lip movements. It is our answer to the challenges left unsolved in previous work. The first big win is getting the synchronization of lips and speech just right. This is a game-changer because synchronization is the key to making the characters in a video convincing: if the lip movements are off, the whole video feels fake. That mismatch is what we have managed to reduce. However, the magic of lipsync2 does not end with lip-syncing. What makes it extra special is its ability to understand and replicate the full range of facial movements from the original video. It is not just about getting the words right; it is about preserving the speaker's complete expressions and personality. This attention to detail means the end product is not just a lifeless avatar but a true reflection of the individual's unique way of speaking and expressing themselves.

What's more!

lipsync2 also performs well on long audio sequences. Previous models, ours included, would struggle when the audio track was too long, losing sync and disrupting the realism of the video. That problem is now behind us: lipsync2 handles long speech signals with ease, keeping everything in sync and maintaining a smooth, natural feel throughout. But there is even more to it. This project was not just about fixing bugs or making an incremental update to the previous version; it was about taking the whole work to a new level. We looked at all the feedback from the first version and used it to make something not just better, but innovative. We are talking about a tool that transforms the way we produce and perceive digital, personalized communication, making it more vivid and enjoyable.

Some Results

BibTeX

@article{lipsync2-bhumanai,
  title={lipsync2: Talking Face Generation with Most Accurate Lip Synchronization},
  author={Taneem Ullah Jan},
  year={2023}
}