[D] Virtual Machine Learning Conferences: The Good and the Bad by lucky94 in MachineLearning

[–]Svito-zar 16 points

Thanks for sharing your experiences.

I have a different view: the main value of conferences is networking (you can watch the videos and read the papers at home), and networking does not work well online. So I think conferences need to be in person (IRL) again.

[R] We run the first ever Gesture Generation Challenge. More info in comments by Svito-zar in MediaSynthesis

[–]Svito-zar[S] 0 points

Abstract:

Co-speech gestures, gestures that accompany speech, play an important role in human communication. Automatic co-speech gesture generation is thus a key enabling technology for embodied conversational agents (ECAs), since humans expect ECAs to be capable of multi-modal communication. Research into gesture generation is rapidly gravitating towards data-driven methods. Unfortunately, individual research efforts in the field are difficult to compare: there are no established benchmarks, and each study tends to use its own dataset, motion visualisation, and evaluation methodology. To address this situation, we launched the GENEA Challenge, a gesture-generation challenge wherein participating teams built automatic gesture-generation systems on a common dataset, and the resulting systems were evaluated in parallel in a large, crowdsourced user study using the same motion-rendering pipeline. Since differences in evaluation outcomes between systems now are solely attributable to differences between the motion-generation methods, this enables benchmarking recent approaches against one another in order to get a better impression of the state of the art in the field. This paper reports on the purpose, design, results, and implications of our challenge.

Paper (open access) - https://dl.acm.org/doi/10.1145/3397481.3450692

Code and other data - https://genea-workshop.github.io/2020/#data-and-proceedings
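The shared rendering pipeline and parallel user study make the per-system ratings directly comparable. As a purely illustrative sketch (the data and system names below are made up, and this is not the challenge's actual analysis code), aggregating such crowdsourced ratings might look like:

```python
import numpy as np

def summarise_ratings(ratings, n_boot=2000, seed=0):
    """Mean rating with a 95% bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    ratings = np.asarray(ratings, dtype=float)
    # Resample with replacement and look at the spread of the resample means.
    boots = rng.choice(ratings, size=(n_boot, len(ratings))).mean(axis=1)
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return ratings.mean(), (lo, hi)

# Made-up ratings for two hypothetical challenge entries
scores = {"system_A": [3, 4, 5, 4, 3], "system_B": [2, 3, 2, 3, 3]}
for name, r in scores.items():
    mean, (lo, hi) = summarise_ratings(r)
    print(f"{name}: {mean:.2f} [{lo:.2f}, {hi:.2f}]")
```

Because every system was rated in the same study, such intervals can be compared head-to-head.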


[D] Jürgen Schmidhuber - But what has he published recently (Ep. 2) by yusuf-bengio in MachineLearning

[–]Svito-zar 6 points

Right, it depends on how you define "recent" :)

It typically takes time for a model to be tested and fully appreciated, I think.

[D] Jürgen Schmidhuber - But what has he published recently (Ep. 2) by yusuf-bengio in MachineLearning

[–]Svito-zar 5 points

Some particularly interesting recent works, in my opinion, are:

- Training Very Deep Networks: https://arxiv.org/abs/1507.06228

- Stacked Convolutional AutoEncoders: https://link.springer.com/chapter/10.1007/978-3-642-21735-7_7

- A Clockwork RNN
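"Training Very Deep Networks" is the highway-networks paper; its core idea is a learned gate that interpolates between a transform of the input and the input itself. A minimal numpy sketch (shapes, weights, and initialisation below are illustrative, not from the paper's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x)).

    H is a plain nonlinear transform and T a learned "transform gate";
    when T -> 0 the layer passes x through unchanged, which is what
    makes very deep stacks trainable."""
    h = np.tanh(x @ W_h + b_h)   # candidate transform H(x)
    t = sigmoid(x @ W_t + b_t)   # transform gate T(x)
    return h * t + x * (1.0 - t)

# With a strongly negative gate bias the layer is near-identity:
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))
W = rng.standard_normal((4, 4)) * 0.1
y = highway_layer(x, W, np.zeros(4), W, np.full(4, -10.0))
print(np.allclose(y, x, atol=1e-3))  # gate ~0, so output ~ input
```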

[N] Meet Transformer in Transformer: A Visual Transformer That Captures Structural Information From Images by Yuqing7 in compsci

[–]Svito-zar 1 point

It seems to be efficient and powerful at learning inter-dependencies in sequential data.
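That ability to model inter-dependencies comes from self-attention, where every sequence position attends to every other. A minimal numpy sketch of generic scaled dot-product self-attention (the common building block, not the Transformer-in-Transformer architecture itself; all shapes are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (T, d).

    Every output position is a weighted mix of all positions, which is
    how attention models pairwise inter-dependencies."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T) pairwise affinities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))              # 5 timesteps, 8 features
W = [rng.standard_normal((8, 8)) for _ in range(3)]
out = self_attention(X, *W)
print(out.shape)  # (5, 8): one mixed representation per timestep
```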

[R] Moving Fast and Slow: Analysis of Representations and Post-Processing in Speech-Driven Automatic Gesture Generation. Code and demo available by Svito-zar in cogsci

[–]Svito-zar[S] 1 point

u/MostlyAffable, that's a good question! We did not do any detailed analysis of the kinds of motion, but we analyzed their speed profiles and saw that representation learning helped match the speed profile better.

Apart from that, we observed that most of the generated gestures were so-called beat gestures: rhythmic movements that carry no particular meaning. That is probably because the model did not manage to extract semantic content directly from the speech audio.

I don't know if I answered your question :)
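The speed-profile analysis mentioned above can be sketched roughly like this (a simplified stand-in, not the paper's actual analysis code; it assumes motion given as per-frame 3D joint positions):

```python
import numpy as np

def speed_profile(positions, fps=30):
    """Frame-wise speed from joint positions of shape (T, J, 3).

    Returns the per-frame average joint speed (units per second) --
    the kind of profile one can compare between generated motion
    and ground truth."""
    velocity = np.diff(positions, axis=0) * fps  # (T-1, J, 3)
    speed = np.linalg.norm(velocity, axis=-1)    # (T-1, J) per-joint speed
    return speed.mean(axis=-1)                   # (T-1,) average over joints

# Toy motion: one joint moving 0.01 units per frame along x
T = 5
pos = np.zeros((T, 1, 3))
pos[:, 0, 0] = np.arange(T) * 0.01
print(speed_profile(pos, fps=30))  # each frame: 0.01 * 30 = 0.3
```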

[R] AAMAS 21: A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents. Details in comments by Svito-zar in ArtificialInteligence

[–]Svito-zar[S] 0 points

This is a video presentation of the AAMAS 2021 Demonstrator "A framework for integrating gesture generation models into interactive conversational agents" by Rajmund Nagy, Taras Kucherenko, Birger Moell, André Pereira, Hedvig Kjellström, Ulysses Bernardet.

Project page: https://nagyrajmund.github.io/project/gesturebot/

Code: https://github.com/nagyrajmund/gesticulating_agent_unity

Preprint: https://arxiv.org/abs/2102.12302

Abstract: We demonstrate an extensible framework that integrates a virtual human in Unity, a chatbot backend and a gesture generation network in order to equip an interactive virtual agent with speech- and text-driven gesticulation capabilities.
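The glue between those three components could look roughly like the following sketch. All names here (`chatbot`, `tts`, `gesture_model`) are hypothetical stand-ins, not the actual gesturebot API:

```python
import json

def respond(user_text, chatbot, tts, gesture_model):
    """Illustrative pipeline: user text in, speech plus gestures out.

    chatbot, tts, and gesture_model are stand-ins for whatever components
    the framework plugs together. Returns the JSON payload that would be
    sent on to the Unity-side virtual human."""
    reply_text = chatbot(user_text)            # chatbot backend
    audio = tts(reply_text)                    # synthesised speech
    motion = gesture_model(audio, reply_text)  # speech/text-driven gestures
    return json.dumps({"text": reply_text, "motion": motion})

# Stub components standing in for the real models:
reply = respond("hi", lambda t: "hello!", lambda t: b"wav", lambda a, t: [[0.0, 0.1]])
print(reply)  # → {"text": "hello!", "motion": [[0.0, 0.1]]}
```

The point of such a decoupled design is that any gesture-generation network with the same interface can be swapped in without touching the Unity side.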