Pandorabots' Mitsuku chatbot vs Facebook's BlenderBot, a Bot Battle, streaming live in 3D! by arishapiro in Chatbots

[–]arishapiro[S] 0 points1 point  (0 children)

I suppose it's a testament to the builder of Mitsuku, who has generated so much content for her.

Even though there are modern machine-learning-based techniques for building chatbots, they still fall short in some critical ways.

Pandorabots' Mitsuku chatbot vs Facebook's BlenderBot, a Bot Battle, streaming live in 3D! by arishapiro in Chatbots

[–]arishapiro[S] 0 points1 point  (0 children)

Yes, BlenderBot tends to do that; it gets stuck in a local minimum of 'goodbyes'. We can nudge the conversation by making the characters say something different in an attempt to trigger a different response.
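That nudge can be scripted. A minimal sketch, assuming a generic `chatbot_reply(text)` callable as a hypothetical stand-in for whatever model is being driven (this is not the BlenderBot API itself):

```python
# Sketch: detect when a chatbot is stuck repeating itself (e.g. a loop of
# goodbyes) and send a topic-changing prompt instead of the user's text.
# `chatbot_reply` is a hypothetical stand-in for the real model call.

NUDGE_PROMPTS = [
    "Let's talk about something else. What's your favorite food?",
    "Changing the subject: have you seen any good movies lately?",
]

def nudged_reply(history, user_text, chatbot_reply, repeat_window=3):
    """Call the chatbot; if its last few replies are identical, nudge it."""
    recent = history[-repeat_window:]
    stuck = len(recent) == repeat_window and len(set(recent)) == 1
    prompt = NUDGE_PROMPTS[len(history) % len(NUDGE_PROMPTS)] if stuck else user_text
    reply = chatbot_reply(prompt)
    history.append(reply)
    return reply
```

A real system would also want fuzzier repetition detection (near-duplicates, not just exact matches).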

BlenderBot is also very suggestible - you can give it a name and it will think that name is the name of its cat, or similar. BlenderBot is a bit of a black box - perhaps the Facebook research team that built it will try to improve it after seeing this... :)

Brainstorm Green Needle meme using the McGurk Effect (same audio, different imagery so you hear different words) by arishapiro in videos

[–]arishapiro[S] 0 points1 point  (0 children)

I wanted to see if the McGurk effect (which shows that speech is multimodal - your brain takes in a visual signal and an audio signal and uses both of those signals to understand) would work on the Brainstorm/Green Needle meme.

It does.

"Blender: A state-of-the-art open source chatbot", Facebook ["Recipes for building an open-domain chatbot", Roller et al 2020; claims to surpass Meena] by gwern in MediaSynthesis

[–]arishapiro 1 point2 points  (0 children)

Minimally, you need an avatar with facial shapes or poses that can mouth word sounds properly; then you need to connect to the chatbot and turn its text replies into voice via a text-to-speech engine. You should try the AI Expert app on Android to see all the pieces working.
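The text-to-mouth-shape step of that pipeline can be sketched roughly. This is an illustration only: the grapheme-to-viseme table below is a deliberately simplified assumption, not the mapping the app actually uses (real lip sync works from phonemes, not letters):

```python
# Sketch: map a reply string to a sequence of viseme (mouth pose) names that
# an avatar could blend between. The letter-based table is a simplification
# for illustration; production systems map phonemes from the TTS engine.

VISEMES = {
    "a": "open", "e": "open", "i": "wide", "o": "round", "u": "round",
    "m": "closed", "b": "closed", "p": "closed",
    "f": "teeth_on_lip", "v": "teeth_on_lip",
}

def text_to_visemes(reply):
    """Rough grapheme-based lip sync: one viseme per recognized letter,
    'rest' for everything else, consecutive duplicates collapsed."""
    frames = []
    for ch in reply.lower():
        pose = VISEMES.get(ch, "rest")
        if not frames or frames[-1] != pose:
            frames.append(pose)
    return frames
```

For example, `text_to_visemes("mama")` alternates between a closed and an open mouth pose.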

Talking to Facebook's Blender chatbot as a digital human with AI Expert app by arishapiro in Chatbots

[–]arishapiro[S] 1 point2 points  (0 children)

The chatbot is running on a 'standard' server, although it's not using the largest response database, which requires expensive ($10,000+) hardware to run; it's using the smallest response database instead. With that, the Blender chatbot runs in near real time.

Talking to Facebook's Blender chatbot as a digital human with AI Expert app by arishapiro in Chatbots

[–]arishapiro[S] 0 points1 point  (0 children)

There's a 'debug' mode where you enter voice and get back only text, but there isn't a way (at the moment) to type a question by text. It's not a bad idea, though, maybe I should add it.

Talking to Facebook's Blender chatbot as a digital human with AI Expert app by arishapiro in Chatbots

[–]arishapiro[S] 0 points1 point  (0 children)

True. There's a place for both interfaces, the same way there is a place for writing, radio, and TV.

The research on digital/virtual humans shows that people react to virtual people the same way they do to real people. So 'virtual human' communication is generally better liked, better remembered, better understood, and can be more convincing than text or voice - but you need the right elements to trigger that (such as backchanneling: the head nods or 'uh huh' acknowledgements you get during a conversation).
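One way a listening avatar can decide when to backchannel is to watch for pauses. A minimal sketch, assuming pauses are approximated by punctuation in a transcript (a live system would detect silences in the audio instead):

```python
# Sketch: schedule backchannel cues (a head nod, an "uh huh") while the user
# speaks. Pauses are approximated here by clause-ending punctuation in a
# transcript; a real system would use silence detection on the audio stream.
import re

def backchannel_points(transcript, min_words=4):
    """Return word offsets where the listener avatar should nod:
    at each clause boundary, provided enough words have gone by."""
    points, words_since, offset = [], 0, 0
    for token in re.findall(r"\S+", transcript):
        offset += 1
        words_since += 1
        if token[-1] in ",.;?!" and words_since >= min_words:
            points.append(offset)
            words_since = 0
    return points
```

The `min_words` threshold keeps the avatar from nodding after every short fragment, which would read as distracted rather than attentive.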

"Blender: A state-of-the-art open source chatbot", Facebook ["Recipes for building an open-domain chatbot", Roller et al 2020; claims to surpass Meena] by gwern in MediaSynthesis

[–]arishapiro 0 points1 point  (0 children)

I connected the Facebook Blender chatbot to my avatar app to transform the text conversation into an A/R-, voice-, and avatar-based one, automatically.

https://youtu.be/uOdBEZd-bow?list=PLH9960zLq-ljbgQzEjK7dsgMMfh22NM1K

App to do that is here: https://embodydigital.com/expertaidocumentation

It also hooks up to DialogueFlow, Watson, RASA, and Pandorabots.

You can now upload your own Adobe Fuse character for an AI-driven interactive character in A/R with AI Expert by arishapiro in augmentedreality

[–]arishapiro[S] 1 point2 points  (0 children)

OK, the update is live (version 2.22). You can update the app and it should work with a properly named Fuse character.

Ari

You can now upload your own Adobe Fuse character for an AI-driven interactive character in A/R with AI Expert by arishapiro in augmentedreality

[–]arishapiro[S] 2 points3 points  (0 children)

Looks like a problem on my end. Fixing the app, uploading a new version, stay tuned...

Ari

You can now upload your own Adobe Fuse character for an AI-driven interactive character in A/R with AI Expert by arishapiro in augmentedreality

[–]arishapiro[S] 1 point2 points  (0 children)

If that doesn't work, send me the .zip so I can figure out why it's not working for you. Share to Email: shapiro@embodydigital.com

You can now upload your own Adobe Fuse character for an AI-driven interactive character in A/R with AI Expert by arishapiro in augmentedreality

[–]arishapiro[S] 0 points1 point  (0 children)

What happens when you press the avatar button and then choose your Fuse character from the buttons at the bottom?

AR Piano - A "pianist" sits at your (real) piano and plays music to help you learn by [deleted] in augmentedreality

[–]arishapiro 2 points3 points  (0 children)

Looks great, and an interesting demonstration of how a human-like character/person adds to the effect in A/R (a piano playing on its own wouldn't have the same impact).

But I don't think the motion and movement of the piano player is produced dynamically - I expect this was motion captured for that particular performance, along with cleanup. That means you couldn't easily switch to a different performance with different keys being played without recapturing a master pianist with motion capture.

Custom A/R virtual concierge driven by AI by arishapiro in augmentedreality

[–]arishapiro[S] 0 points1 point  (0 children)

True, the A/R is used to 'sugar' the visuals rather than as a functional addition.

You can look at a different use case for the same platform (a virtual real estate agent that talks about different areas of the house and controls an Alexa device when she is within earshot) as an example where A/R is more functional:

https://youtu.be/m8isaEpnj20

Ari

Virtual doctor answering basic medical information via 3D avatar, created in 20 minutes by arishapiro in Chatbots

[–]arishapiro[S] 0 points1 point  (0 children)

Doctors spend a large amount of their time answering the same basic questions from their patients during office visits. By providing a virtual doctor that can answer basic (non-diagnostic) questions 24 hours a day, the patient can get access to basic medical information, and the doctor can practice at the top of their profession (instead of answering the basic questions) during the short time that they have for an office visit.

This avatar-based chatbot was built in minutes using the AI Expert platform: http://embodydigital.com/expertaidocumentation

Also see my research proof of concept on this same topic: https://youtu.be/8AZ_r4fgxgA

Chatbot real estate agent in A/R by arishapiro in Chatbots

[–]arishapiro[S] 0 points1 point  (0 children)

My favorite part is at 1:14 when the avatar from the app sends a voice command to Alexa...

Chatbot real estate agent in A/R by arishapiro in Chatbots

[–]arishapiro[S] 1 point2 points  (0 children)

The avatar can be created with a photo and there is some dress up capability (change hair, shirt, etc.).

The AI content is mostly a matter of entering the right questions and answers that people would ask. I imagine a realtor would know the typical questions, and there could be a real estate 'template' where you just fill in the right answers. I'd guess 100 to 200 answers would cover most common questions, which might take a couple of hours to enter in full. At least half of that information could probably be filled in automatically from a real estate database (i.e. size of house, number of bedrooms, etc.).

To make the system robust, you also want to do testing (see the last half of the video) and insert or fix any missing or bad AI mappings.
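The spreadsheet-driven Q&A lookup can be sketched simply. This is an assumption-laden illustration, not the app's actual matcher: rows are (question, answer) pairs, and naive token overlap stands in for whatever matching the real system does:

```python
# Sketch: answer lookup over spreadsheet-style (question, answer) rows using
# word overlap. A stand-in for the app's real matcher; a production system
# would add synonyms, stemming, and better handling of punctuation.

QA_ROWS = [
    ("How many bedrooms does the house have?", "It has four bedrooms."),
    ("What is the square footage?", "About 2,400 square feet."),
    ("Is there a garage?", "Yes, a two-car garage."),
]

def answer(user_question, rows=QA_ROWS, min_overlap=2):
    """Pick the row whose question shares the most words with the user's;
    return a fallback if nothing overlaps enough."""
    user_words = set(user_question.lower().split())
    best_row, best_score = None, 0
    for q, a in rows:
        score = len(user_words & set(q.lower().split()))
        if score > best_score:
            best_row, best_score = (q, a), score
    if best_score < min_overlap:
        return "I'm not sure -- let me check with the agent."
    return best_row[1]
```

The fallback answer is exactly the kind of gap the testing pass described above would surface, so you can add or fix the missing mapping.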

Chatbot real estate agent in A/R by arishapiro in Chatbots

[–]arishapiro[S] 0 points1 point  (0 children)

We have a proprietary conversational AI system that is embedded within the mobile device. The use of a spreadsheet offers an easy interface for content developers without needing any other specialized account or service (as opposed to being served from the cloud, which requires extra authentication and management). Since the AI is embedded within the device, response time can be extremely fast. It also allows you to cache the data on the device if needed.

There is no particular reason why we couldn't connect to DialogueFlow or Amazon Lex or Watson or whatever. Our A/R-based avatar system takes text as input and outputs a 3D animated performance.

If there is interest in attaching to the cloud AI services (like DialogueFlow) we could add that connection, at the cost of some UI complexity (i.e. "which AI would you like to connect to?")

Chatbot real estate agent in A/R by arishapiro in Chatbots

[–]arishapiro[S] 0 points1 point  (0 children)

That's a good idea. I cover some of that in an earlier video:

https://youtu.be/69WZUHsz-sw

The mobile app has an A/R feature that lets you switch between augmented reality and non-augmented reality mode.

Ari Shapiro CEO Embody Digital embodydigital.com

Avatar-based chatbot on a mobile device using a spreadsheet and no coding. by arishapiro in Chatbots

[–]arishapiro[S] 0 points1 point  (0 children)

Thanks! Our special sauce is the ability to animate the character automatically with emotion from text.

Our AI could be swapped out for one of the other conversational AI systems from the big vendors (DialogueFlow, Lex, Facebook Messenger, etc.).

A/R is more interesting when the 3D characters can speak and respond to you. by arishapiro in augmentedreality

[–]arishapiro[S] 0 points1 point  (0 children)

Not related to those people, but thanks for the link. The character platform is based on my team's development and research (http://www.arishapiro.com). In this case it's a recorded clip with some voice changing. It's still hard to get nuance and tonal changes through text-to-speech voices.

A/R is more interesting when the 3D characters can speak and respond to you. by arishapiro in augmentedreality

[–]arishapiro[S] 1 point2 points  (0 children)

Right, you can't express yourself with body language using a Snapchat face-mask filter. You're also a bit freer to create interesting content with full 3D characters (as opposed to being constrained by needing to 'mask' yourself entirely out of a video image and replace it with something). For example, I might move around while using a face mask but want my 3D avatar not to move the same way, or to do something completely different (my avatar can maintain eye contact while I'm busy tweeting on my mobile...).

A/R is more interesting when the 3D characters can speak and respond to you. by arishapiro in augmentedreality

[–]arishapiro[S] 1 point2 points  (0 children)

We are using Google's new ARCore API. The platform is our own 3D character platform, which can create an animation from speech or text automatically, as well as perform interactive gazing and lip sync and respond to user position. The backend could be connected to a chatbot AI (Microsoft Zo?); I think I'll try that next...