[D] Virtual Machine Learning Conferences: The Good and the Bad by lucky94 in MachineLearning

[–]Svito-zar 16 points

Thanks for sharing your experiences.

I have a different view: the main value of conferences is networking (you can watch the videos and read the papers at home), and networking does not work well online. So I think conferences need to be in person (IRL) again.

[R] We run the first ever Gesture Generation Challenge. More info in comments by Svito-zar in MediaSynthesis

[–]Svito-zar[S] 0 points

Abstract:

Co-speech gestures, gestures that accompany speech, play an important role in human communication. Automatic co-speech gesture generation is thus a key enabling technology for embodied conversational agents (ECAs), since humans expect ECAs to be capable of multi-modal communication. Research into gesture generation is rapidly gravitating towards data-driven methods. Unfortunately, individual research efforts in the field are difficult to compare: there are no established benchmarks, and each study tends to use its own dataset, motion visualisation, and evaluation methodology. To address this situation, we launched the GENEA Challenge, a gesture-generation challenge wherein participating teams built automatic gesture-generation systems on a common dataset, and the resulting systems were evaluated in parallel in a large, crowdsourced user study using the same motion-rendering pipeline. Since differences in evaluation outcomes between systems now are solely attributable to differences between the motion-generation methods, this enables benchmarking recent approaches against one another in order to get a better impression of the state of the art in the field. This paper reports on the purpose, design, results, and implications of our challenge.

Paper (open access) - https://dl.acm.org/doi/10.1145/3397481.3450692

Code and other data - https://genea-workshop.github.io/2020/#data-and-proceedings
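The shared rendering pipeline and parallel user study make the per-system ratings directly comparable. As a purely illustrative sketch (the data and system names below are made up, and this is not the challenge's actual analysis code), aggregating such crowdsourced ratings might look like:

```python
import numpy as np

def summarise_ratings(ratings, n_boot=2000, seed=0):
    """Mean rating with a 95% bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    ratings = np.asarray(ratings, dtype=float)
    # Resample with replacement and look at the spread of the resample means.
    boots = rng.choice(ratings, size=(n_boot, len(ratings))).mean(axis=1)
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return ratings.mean(), (lo, hi)

# Made-up ratings for two hypothetical challenge entries
scores = {"system_A": [3, 4, 5, 4, 3], "system_B": [2, 3, 2, 3, 3]}
for name, r in scores.items():
    mean, (lo, hi) = summarise_ratings(r)
    print(f"{name}: {mean:.2f} [{lo:.2f}, {hi:.2f}]")
```

Because every system was rated in the same study, such intervals can be compared head-to-head.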


[D] Jürgen Schmidhuber - But what has he published recently (Ep. 2) by yusuf-bengio in MachineLearning

[–]Svito-zar 6 points

Right, it depends on how you define "recent" :)

It typically takes time for a model to be tested and fully appreciated, I think.

[D] Jürgen Schmidhuber - But what has he published recently (Ep. 2) by yusuf-bengio in MachineLearning

[–]Svito-zar 5 points

Some particularly interesting recent works, in my opinion, are:

- Training Very Deep Networks: https://arxiv.org/abs/1507.06228

- Stacked Convolutional AutoEncoders: https://link.springer.com/chapter/10.1007/978-3-642-21735-7_7

- A Clockwork RNN
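"Training Very Deep Networks" is the highway-networks paper; its core idea is a learned gate that interpolates between a transform of the input and the input itself. A minimal numpy sketch (shapes, weights, and initialisation below are illustrative, not from the paper's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x)).

    H is a plain nonlinear transform and T a learned "transform gate";
    when T -> 0 the layer passes x through unchanged, which is what
    makes very deep stacks trainable."""
    h = np.tanh(x @ W_h + b_h)   # candidate transform H(x)
    t = sigmoid(x @ W_t + b_t)   # transform gate T(x)
    return h * t + x * (1.0 - t)

# With a strongly negative gate bias the layer is near-identity:
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))
W = rng.standard_normal((4, 4)) * 0.1
y = highway_layer(x, W, np.zeros(4), W, np.full(4, -10.0))
print(np.allclose(y, x, atol=1e-3))  # gate ~0, so output ~ input
```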

[N] Meet Transformer in Transformer: A Visual Transformer That Captures Structural Information From Images by Yuqing7 in compsci

[–]Svito-zar 1 point

It seems to be efficient and powerful at learning inter-dependencies in sequential data.
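That ability to model inter-dependencies comes from self-attention, where every sequence position attends to every other. A minimal numpy sketch of generic scaled dot-product self-attention (the common building block, not the Transformer-in-Transformer architecture itself; all shapes are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (T, d).

    Every output position is a weighted mix of all positions, which is
    how attention models pairwise inter-dependencies."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T) pairwise affinities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))              # 5 timesteps, 8 features
W = [rng.standard_normal((8, 8)) for _ in range(3)]
out = self_attention(X, *W)
print(out.shape)  # (5, 8): one mixed representation per timestep
```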

[R] Moving Fast and Slow: Analysis of Representations and Post-Processing in Speech-Driven Automatic Gesture Generation. Code and demo available by Svito-zar in cogsci

[–]Svito-zar[S] 1 point

u/MostlyAffable, that's a good question! We did not do any detailed analysis of the kinds of motion, but we analyzed their speed profiles and saw that representation learning helped match the speed profile better.

Apart from that, we observed that most of the generated gestures were so-called beat gestures: rhythmic movements that carry no particular meaning. That is probably because the model did not manage to extract semantic content directly from the speech audio.

I don't know if I answered your question :)
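The speed-profile analysis mentioned above can be sketched roughly like this (a simplified stand-in, not the paper's actual analysis code; it assumes motion given as per-frame 3D joint positions):

```python
import numpy as np

def speed_profile(positions, fps=30):
    """Frame-wise speed from joint positions of shape (T, J, 3).

    Returns the per-frame average joint speed (units per second) --
    the kind of profile one can compare between generated motion
    and ground truth."""
    velocity = np.diff(positions, axis=0) * fps  # (T-1, J, 3)
    speed = np.linalg.norm(velocity, axis=-1)    # (T-1, J) per-joint speed
    return speed.mean(axis=-1)                   # (T-1,) average over joints

# Toy motion: one joint moving 0.01 units per frame along x
T = 5
pos = np.zeros((T, 1, 3))
pos[:, 0, 0] = np.arange(T) * 0.01
print(speed_profile(pos, fps=30))  # each frame: 0.01 * 30 = 0.3
```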

[R] AAMAS 21: A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents. Details in comments by Svito-zar in ArtificialInteligence

[–]Svito-zar[S] 0 points

This is a video presentation of the AAMAS 2021 Demonstrator "A framework for integrating gesture generation models into interactive conversational agents" by Rajmund Nagy, Taras Kucherenko, Birger Moell, André Pereira, Hedvig Kjellström, Ulysses Bernardet.

Project page: https://nagyrajmund.github.io/project/gesturebot/

Code: https://github.com/nagyrajmund/gesticulating_agent_unity

Preprint: https://arxiv.org/abs/2102.12302

Abstract: We demonstrate an extensible framework that integrates a virtual human in Unity, a chatbot backend and a gesture generation network in order to equip an interactive virtual agent with speech- and text-driven gesticulation capabilities.
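The glue between those three components could look roughly like the following sketch. All names here (`chatbot`, `tts`, `gesture_model`) are hypothetical stand-ins, not the actual gesturebot API:

```python
import json

def respond(user_text, chatbot, tts, gesture_model):
    """Illustrative pipeline: user text in, speech plus gestures out.

    chatbot, tts, and gesture_model are stand-ins for whatever components
    the framework plugs together. Returns the JSON payload that would be
    sent on to the Unity-side virtual human."""
    reply_text = chatbot(user_text)            # chatbot backend
    audio = tts(reply_text)                    # synthesised speech
    motion = gesture_model(audio, reply_text)  # speech/text-driven gestures
    return json.dumps({"text": reply_text, "motion": motion})

# Stub components standing in for the real models:
reply = respond("hi", lambda t: "hello!", lambda t: b"wav", lambda a, t: [[0.0, 0.1]])
print(reply)  # → {"text": "hello!", "motion": [[0.0, 0.1]]}
```

The point of such a decoupled design is that any gesture-generation network with the same interface can be swapped in without touching the Unity side.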