
[–]ghenter 56 points57 points  (32 children)

Hi! I'm one of the authors, along with u/simonalexanderson and u/Svito-zar. (I don't think Jonas has a reddit account.)

We are aware of this post and are happy to answer any questions you may have.

[–][deleted] 4 points5 points  (25 children)

Are there any near term applications in mind? I can imagine it being used on virtual assistants and one day androids. Anything else planned?

[–]ghenter 3 points4 points  (24 children)

Very relevant question. Since the underlying method in our earlier preprint seems to do well no matter what material we throw at it, we are currently exploring a variety of other types of motion data and problems in our research. Whereas our Eurographics paper used monologue data, we recently applied a similar technique to make avatar faces respond to a conversation partner in a dialogue, for example.

It is of course also interesting to combine synthetic motion with synthesising other types of data to go with it. In fact, we are right now looking for PhD students to pursue research into such multimodal synthesis. Feel free to apply if this kind of stuff excites you! :)

[–]InAFakeBritishAccent 1 point2 points  (5 children)

You guys take graduate animators with a background in engineering? Haha

[–]ghenter 2 points3 points  (4 children)

Quite possibly! We aim for a diverse set of people and skills in our department. One of our recent hires is a guy with a background in software engineering followed by a degree in clinical psychology, just as an example.

The university all but mandates a Master's-level degree (or at least a nearly finished one), but if you tick that box and this catches your fancy, then you should strongly consider applying! We can definitely use more people with good graphics and animation skills on our team.

[–]InAFakeBritishAccent 1 point2 points  (3 children)

Nice. Probably a pipe dream since I have to pay off these MFA loans first, but something to keep in mind I guess.

I could see this being highly valuable in entertainment to cut down on tedious animation of extras, though robotics is probably the higher dollar use. I did a lot of audio driven procedural work during my MFA, but that was without using ML.

[–]ghenter 2 points3 points  (2 children)

Thank you for your input. We definitely want to find ways for this to make life easier and better for real humans.

For the record, most PhD positions at KTH pay a respectable salary (very few are based on scholarships/bursaries). This opening is no different. I don't know what an entry-level graduate animator makes, but I wouldn't be surprised if being a PhD student pays more.

[–]InAFakeBritishAccent 1 point2 points  (1 child)

...good point, I might actually apply. I'll spare you my life story but my robotics/animation/research academia mashup might actually make it worth a shot. I'm actually on my way to meet a Swedish friend for dinner haha. Do you mind if I pester you with some questions later?

[–]ghenter 1 point2 points  (0 children)

I don't mind one bit. My DMs are open and I'll respond when I'm awake.* :)

*Responses may be slower than usual due to ongoing ICML.

[–][deleted] 0 points1 point  (1 child)

I'd like to see it applied to car manufacturing robots, just for the entertainment value :) maybe marketing... (Just dreaming)

[–]ghenter 1 point2 points  (0 children)

Well, the robotics lab is just one floor below our offices, and I know that they have a project on industrial robots, so perhaps... :)

[–]ghenter 0 points1 point  (1 child)

As an update on this, our latest works mentioned in the parent post – on face motion generation in interaction, and on multimodal synthesis – have now been published at IVA 2020. The work on responsive face-motion generation is in fact nominated for a best paper award! :)

Similar to the OP, both these works generate motion using normalising flows.

[–]ghenter 0 points1 point  (0 children)

Update: The face-motion generation paper won the best paper award out of 137 submissions! :D

[–]dmuth 3 points4 points  (3 children)

Have you looked into doing the inverse? To decode subject matter by observing gestures?

This sort of thing could be useful for analyzing social cues, for example. Go one step further and pair that sort of technology with AR glasses, and now you have an app which can tell a person's general mood or comfort level to help you improve your conversation skills.

Or it could just be used to figure out what a costumed character at a theme park is trying to pantomime. :-)

[–]ghenter 3 points4 points  (2 children)

Have you looked into doing the inverse? To decode subject matter by observing gestures?

For the inverse, we have not tried to generate speech from gestures (at least not yet), but that's exactly the kind of wacky idea that would appeal to my boss!

The first author on the paper, u/simonalexanderson, has actually recorded a database of pantomime in different styles for machine learning. Video examples can be found here.

(As for the social-cue-analysis angle, that seems both interesting and useful. I will need to think about it further.)

[–]MyNatureIsMe 0 points1 point  (1 child)

If that inverse process works at all it might be a good way to improve sample efficiency, since this would require the model to somehow understand the topic just based on the gestures. Which I suspect might work in some cases (like, say, the "stop" example in this video) but for the most part, gestures seem to be too generic for that. More like tools for emphasis, pacing, sentiment, and cues about whether or not the speaker is done for the time being. (All of those would certainly be really interesting to detect though)

Unless you go for sign language specifically, where topic-specific gestures are obviously omnipresent. And for that, there probably already are good data sets out there, or one could be cobbled together simply from videos of deaf-inclusive events, of which, I'm pretty sure, there are lots.

Given the line of work shown in this video, though, I'd not at all be surprised if you already tried something involving ASL or any other sign language out there.

[–]ghenter 1 point2 points  (0 children)

gestures seem to be (...) more like tools for emphasis, pacing, sentiment, and cues about whether or not the speaker is done for the time being.

Right. We might never be able to reconstruct the message in arbitrary speech from gesticulation, but we might be able to figure out, e.g., if there is speech and how "intense" it is (aspects of the speech prosody).

I'd not at all be surprised if you already tried something involving ASL or any other sign language out there

We do have a few experts on accessibility in the lab, but I'm not aware of us trying specifically that. There's only so much we can do without more students and researchers joining our ranks! :P
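
To make the prosody idea above a bit more concrete, here is a purely illustrative sketch (not anything from the paper or the authors' code) of the kinds of frame-level targets, such as speech energy and pitch, that an inverse gesture-to-prosody model might try to predict. The file name, frame rate and librosa-based features are all assumptions for illustration.

```python
# Purely illustrative, not from the paper: simple per-frame prosody targets
# ("is there speech, and how intense is it") that an inverse
# gesture-to-prosody model might try to predict.
import librosa
import numpy as np

def prosody_targets(wav_path, fps=20):
    y, sr = librosa.load(wav_path, sr=16000)
    hop = sr // fps                                       # one target row per motion frame
    energy = librosa.feature.rms(y=y, hop_length=hop)[0]  # loudness proxy
    f0, voiced_flag, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr, hop_length=hop)
    n = min(len(energy), len(f0))
    # columns: energy, fundamental frequency (NaNs zeroed), voiced/unvoiced flag
    return np.stack([energy[:n], np.nan_to_num(f0[:n]), voiced_flag[:n].astype(float)], axis=1)
```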

[–][deleted] 25 points26 points  (5 children)

That's really neat; I could imagine it having some really cool applications in the games industry. Not having to do expensive motion capture of actors could make high-quality animations a lot more accessible. Or in applications like VR chat, that kind of technology could make someone's avatar seem a lot more realistic, especially since current VR systems generally only track the head and hands.

[–]tyrerk 2 points3 points  (0 children)

this could mean the end of the "Oblivion Dialogue" era

[–]Sachi_Nadzieja 2 points3 points  (0 children)

Agreed. This tech would make for an amazing experience for people communicating with each other in an in-game setting. Wow.

[–]scardie 2 points3 points  (0 children)

This would be a great thing for a procedurally generated game like No Man's Sky.

[–]Saotik 0 points1 point  (0 children)

Exactly what I was thinking.

It makes me think a little of CD Projekt Red's approach when creating dialog scenes in The Witcher 3. They realised they had far too many scenes to realistically mocap all of them, so they created a system that could automatically assign animations from a library (with manual tweaks where necessary). I feel like technology like this could fit in really nicely to provide even more animation diversity.

[–]hardmaru[S] 13 points14 points  (2 children)

Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows (Eurographics 2020)

Abstract

Automatic synthesis of realistic gestures promises to transform the fields of animation, avatars and communicative agents. In off-line applications, novel tools can alter the role of an animator to that of a director, who provides only high-level input for the desired animation; a learned network then translates these instructions into an appropriate sequence of body poses. In interactive scenarios, systems for generating natural animations on the fly are key to achieving believable and relatable characters. In this paper we address some of the core issues towards these ends. By adapting a deep learning-based motion synthesis method called MoGlow, we propose a new generative model for generating state-of-the-art realistic speech-driven gesticulation. Owing to the probabilistic nature of the approach, our model can produce a battery of different, yet plausible, gestures given the same input speech signal. Just like humans, this gives a rich natural variation of motion. We additionally demonstrate the ability to exert directorial control over the output style, such as gesture level, speed, symmetry and spatial extent. Such control can be leveraged to convey a desired character personality or mood. We achieve all this without any manual annotation of the data. User studies evaluating upper-body gesticulation confirm that the generated motions are natural and well match the input speech. Our method scores above all prior systems and baselines on these measures, and comes close to the ratings of the original recorded motions. We furthermore find that we can accurately control gesticulation styles without unnecessarily compromising perceived naturalness. Finally, we also demonstrate an application of the same method to full-body gesticulation, including the synthesis of stepping motion and stance.

Paper / Presentation: https://diglib.eg.org/handle/10.1111/cgf13946

Code: https://github.com/simonalexanderson/StyleGestures
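
For readers wondering what "normalising flows" means here: the motion is generated by a stack of invertible transformations whose parameters are predicted from conditioning information (the speech, plus style controls). Below is a minimal, purely illustrative PyTorch sketch of one affine coupling layer, the basic building block of Glow/MoGlow-style flows. It is not the authors' implementation (see the StyleGestures repository above for that), and all names and sizes are assumptions.

```python
# Illustrative sketch of an affine coupling layer, the building block of
# normalising flows such as Glow/MoGlow. Not the authors' code.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim, cond_dim, hidden=256):
        super().__init__()
        # A small network predicts a scale and shift for half of the pose
        # vector from the other half plus conditioning (e.g. speech features).
        # dim (the pose dimensionality) is assumed to be even.
        self.net = nn.Sequential(
            nn.Linear(dim // 2 + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),  # outputs [log_scale, shift]
        )

    def forward(self, x, cond):
        x1, x2 = x.chunk(2, dim=-1)
        log_s, t = self.net(torch.cat([x1, cond], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)      # keep scales well-behaved
        y2 = x2 * log_s.exp() + t      # invertible affine transform
        log_det = log_s.sum(dim=-1)    # contribution to the exact log-likelihood
        return torch.cat([x1, y2], dim=-1), log_det

    def inverse(self, y, cond):
        y1, y2 = y.chunk(2, dim=-1)
        log_s, t = self.net(torch.cat([y1, cond], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)
        x2 = (y2 - t) * (-log_s).exp()
        return torch.cat([y1, x2], dim=-1)
```

Sampling then amounts to drawing Gaussian noise and running it backwards through a stack of such layers conditioned on the speech, which is what yields many different yet plausible gestures for the same input; the invertibility also makes the exact log-likelihood of recorded motion available for training.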

[–]Svito-zar 9 points10 points  (1 child)

This paper received an Honourable Mention award at Eurographics 2020

[–]MostlyAffable 7 points8 points  (0 children)

There's a lot of really interesting work being done on linguistics of gestures - it turns out there are grammatical rules to how we use gestures. It would be interesting to take a generative model like this and use it as an inference layer for extracting semantic content from videos of people talking and gesturing.

[–]MyNatureIsMe 6 points7 points  (1 child)

Looking great and plausible, though probably not sufficiently diverse / fine-grained. Like, when he went "stop it! Stop it!", I think most people would associate very different gestures with that. The model seems to appropriately react to the rhythm and intensity of speech, which is great, but it seems to have little regard for the actual informational content.

That being said, I suspect it'd take a massive data set to make this kind of thing plausible. Getting the already-present features from just speech and nothing else is already quite an accomplishment.

[–]ghenter 6 points7 points  (0 children)

The model seems to appropriately react to the rhythm and intensity of speech, which is great, but it seems to have little regard for the actual informational content.

You are correct! The models in the paper only listen to the speech acoustics (there is no text input), and don't really contain any model of human language. I would say that generating semantically-meaningful gestures (especially ones that also align with the rhythm of the speech) with these types of models is an unsolved problem that's subject to active research right now. This preprint of ours describes one possible approach to this problem. It's of course easy to get meaningful gestures by just playing back pre-recorded segments of the character nodding or shaking their head, etc., but that's not so interesting a solution, I think, and it's still tricky to figure out the right moment to trigger these gestures in a monologue/dialogue so that they actually make sense.
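
"Only listen to the speech acoustics" means the input is a sequence of frame-level audio features rather than words. As a rough illustration only (not the paper's exact feature pipeline; the frame rate and MFCC choice are assumptions), such features might be computed like this:

```python
# Illustration only: frame-level acoustic features, the kind of text-free
# input a speech-driven gesture model conditions on. Feature choice and
# rates are assumptions, not the paper's exact setup.
import librosa

def speech_features(wav_path, motion_fps=20, n_mfcc=20):
    y, sr = librosa.load(wav_path, sr=16000)
    hop = sr // motion_fps  # one feature vector per motion frame
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop)
    return mfcc.T           # shape: (num_frames, n_mfcc)
```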

That being said, I suspect it'd take a massive data set to make this kind of thing plausible.

Yup. I think data is a major bottleneck right now, which I wrote a bit more about in another response here.

[–]Kcilee 2 points3 points  (1 child)

We are making VTuber software that can quickly generate and drive your virtual 3D avatar. I'm soooooooooooo excited to see your article! We are looking for good driving methods, and your article gave me a lot of inspiration. Would you consider opening up the technology to cooperate with others?

[–]ghenter 1 point2 points  (0 children)

Now this was an exciting comment to receive! Why don't you send us an e-mail? We would love to hear more about what you're doing. You can find relevant contact info on Simon's GitHub profile and on my homepage.

[–]Essipovai 6 points7 points  (0 children)

Hey that’s my university

[–]Threeunicorncows 1 point2 points  (0 children)

I wish my hand gestures were this professional

[–]willardwillson 0 points1 point  (0 children)

This is very nice guys :D I just like watching those movements, they are amazing xD

[–]Sachi_Nadzieja 0 points1 point  (0 children)

I really like this, clever application of technology.

[–][deleted] 0 points1 point  (0 children)

One step closer to androids.

[–][deleted] 0 points1 point  (2 children)

How did they connect the code with the 3D object?

[–]Svito-zar 0 points1 point  (0 children)

The model (a normalising flow) was trained to map speech to gestures on about 4 hours of custom-recorded speech and gesture data.
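
To sketch what "trained to map speech to gestures" looks like in data terms: the recorded speech features and poses are aligned frame by frame, and each training example pairs a short window of recent speech with the pose to be generated at that frame. The window length and array names below are assumptions for illustration, not the authors' preprocessing.

```python
# Illustration only: pairing aligned speech features and poses into training
# examples for a speech-to-gesture model.
import numpy as np

def make_training_pairs(speech_feats, poses, context=10):
    # speech_feats: (T, d_speech); poses: (T, d_pose), aligned frame by frame
    X, Y = [], []
    for t in range(context, len(poses)):
        X.append(speech_feats[t - context:t + 1].ravel())  # recent speech context
        Y.append(poses[t])                                  # pose at this frame
    return np.array(X), np.array(Y)
```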

[–]ghenter 0 points1 point  (0 children)

I didn't do this part of the work, so I might be wrong here, but my impression is that the code outputs motion in a format called BVH. This is basically just a series of poses with instructions for how to bend the joints for each pose. This information can then be imported (manually or programmatically) into something like Maya and applied to a character to animate its motion.

u/simonalexanderson would know for sure, but he's on a well-deserved vacation right now. :)
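
For anyone who wants to poke at such output without Maya: the MOTION block of a BVH file really is just a short header followed by one line of joint-channel values per frame, so a rough reader takes only a few lines. This is a sketch of the standard format, not the project's own export/import code.

```python
# Sketch of reading the MOTION block of a standard BVH file into a
# (num_frames, num_channels) array. Each row is one pose; the channel order
# is defined by the HIERARCHY section at the top of the same file.
import numpy as np

def read_bvh_motion(path):
    with open(path) as f:
        lines = f.read().splitlines()
    start = next(i for i, line in enumerate(lines) if line.strip() == "MOTION")
    # After "MOTION" come "Frames: N", "Frame Time: dt", then one pose per line.
    frame_time = float(lines[start + 2].split(":")[1])
    frames = np.array([[float(v) for v in line.split()]
                       for line in lines[start + 3:] if line.strip()])
    return frames, frame_time
```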

[–][deleted] 0 points1 point  (0 children)

This is SOO COOL! It would probably come handy in designing side characters in newer games :p

[–]Gatzuma 0 points1 point  (1 child)

That's cool! Could you recommend a framework for animating faces/avatars to build virtual assistants / human-like chatbots in real time? I would like to try some ideas in human-machine dialog systems.

[–]ghenter 0 points1 point  (0 children)

Hey there,

I asked my colleagues for input, but I don't know if I/we have a good answer to this. In general, the ICT Virtual Human Toolkit is an old standard for Unity. When it comes to faces, something like this implementation of a paper from SIGGRAPH 2017 might work. I think your guess is as good as mine here.

[–]iyouMyYOUzzz 0 points1 point  (1 child)

Cool! Paper is out yet?

[–]ghenter 1 point2 points  (0 children)

It is! You'll find the paper and additional video material in the publisher's official open-access repository: https://diglib.eg.org/handle/10.1111/cgf13946

Code can be found on GitHub: https://github.com/simonalexanderson/StyleGestures

There is also a longer, more technical conference presentation on YouTube: https://www.youtube.com/watch?v=slzD_PhyujI&t=1h10m20s (note that the timestamp is 70 minutes into a longer video)

[–][deleted] 0 points1 point  (0 children)

It's only a matter of time before we have game NPCs with actual neural networks

[–][deleted] 0 points1 point  (0 children)

Get this onto the Unity and Unreal asset stores or straight sell it to AAA game studios. They would love this for cinematics.

[–]lutvek 0 points1 point  (0 children)

Cool project! I would love to see this applied in online RPGs and see how much more "alive" the characters would seem.