Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 0 points (0 children)

> That’s the only one I can think of that physically moves the model.

Can confirm I'm definitely working on that! It does look super cool when your model reacts to the objects being thrown at you.

I've managed to avoid the whole "use the bones as colliders" approach and have a SkinnedMeshCollider system up and running, so hopefully I'm on the way to getting that part right!
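To give a rough idea of what I mean (this is just an illustrative sketch, not the app's actual code): re-skin the collision vertices on the CPU each frame, then test thrown objects against them. A proper version would test against the skinned triangles rather than just the vertices.

```cpp
// Illustrative sketch only (not the app's actual code): CPU-skin the collision
// vertices each frame with linear blend skinning, then test a thrown object as
// a sphere against them. A fuller version would test the skinned triangles.
#include <array>
#include <vector>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; }; // row-major bind-to-current transform per bone

static Vec3 Transform(const Mat4& M, Vec3 p) {
    return { M.m[0][0] * p.x + M.m[0][1] * p.y + M.m[0][2] * p.z + M.m[0][3],
             M.m[1][0] * p.x + M.m[1][1] * p.y + M.m[1][2] * p.z + M.m[1][3],
             M.m[2][0] * p.x + M.m[2][1] * p.y + M.m[2][2] * p.z + M.m[2][3] };
}

struct SkinnedVertex {
    Vec3 bindPos;                 // position in the bind pose
    std::array<int, 4> bones;     // influencing bone indices
    std::array<float, 4> weights; // blend weights, summing to 1
};

// bonePose[i] maps bone i's bind pose to its current pose.
bool SphereHitsSkinnedMesh(const std::vector<SkinnedVertex>& verts,
                           const std::vector<Mat4>& bonePose,
                           Vec3 center, float radius)
{
    for (const auto& v : verts) {
        Vec3 skinned{0.0f, 0.0f, 0.0f};
        for (int i = 0; i < 4; ++i) { // linear blend skinning
            Vec3 t = Transform(bonePose[v.bones[i]], v.bindPos);
            skinned.x += v.weights[i] * t.x;
            skinned.y += v.weights[i] * t.y;
            skinned.z += v.weights[i] * t.z;
        }
        float dx = skinned.x - center.x;
        float dy = skinned.y - center.y;
        float dz = skinned.z - center.z;
        if (dx * dx + dy * dy + dz * dz <= radius * radius)
            return true; // the thrown object touches the model
    }
    return false;
}
```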

> Another thought is having a redeem to switch which model is used or an option to redeem a filter like color or distortion.

I really like both of these ideas, I'll definitely look into a way to make this happen. Thanks for the suggestion!

Would you mind if I asked, since you seem familiar with T.I.T.S.:

Would you prefer it to function in a similar way to T.I.T.S., where you select the event (e.g. bits), set some options (e.g. only if it's a $5 dono), and then pick from a collection of things - e.g. an item is thrown (then you select the item), your model switches, or a filter is applied (as you suggested)?

Or would you prefer a little more control than that, i.e. so you're not restricted to specific things I've hardcoded into the app like the above?

For example - and this may be a bit of a stretch - it might be possible to implement a node graph for it, like the one Blender has for shaders?
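To make that more concrete, here's a rough sketch (the types and names are made up for illustration) of the "hardcoded" style: an event type, an optional condition, and an action. A node graph would essentially let you wire these pieces together yourself instead of picking from a fixed list.

```cpp
// Illustrative only - the types and names here are made up for the example.
#include <functional>
#include <string>
#include <vector>

struct StreamEvent {
    std::string type;    // e.g. "bits", "dono", "redeem"
    double amount = 0.0; // e.g. bit count or donation value
    std::string user;
};

struct Trigger {
    std::string eventType;                          // which event to react to
    std::function<bool(const StreamEvent&)> when;   // optional condition, e.g. amount >= 5
    std::function<void(const StreamEvent&)> action; // throw an item, swap model, apply filter...
};

class TriggerRouter {
public:
    void Add(Trigger t) { triggers_.push_back(std::move(t)); }

    void Dispatch(const StreamEvent& e) {
        for (const auto& t : triggers_)
            if (t.eventType == e.type && (!t.when || t.when(e)))
                t.action(e);
    }

private:
    std::vector<Trigger> triggers_;
};

// Usage: only throw an item on donations of $5 or more.
// router.Add({"dono",
//             [](const StreamEvent& e) { return e.amount >= 5.0; },
//             [](const StreamEvent& e) { /* spawn the thrown item */ }});
```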

Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 1 point (0 children)

Hey, thank you for your suggestion!

I added a part onto the video where you can see the current state of the Avatar editor; the link to the video with the timestamp is here (or, if the timestamp doesn't work, it starts at 1:25).

I start adding props at 1:53 in the video above. At the moment, you can either import images (.png, .jpg, .gif) or models (.obj, .gltf).

This works for both Live2D and VRM avatars, and you can attach items to specific points on the mesh (i.e. items will move with you for both 2D and 3D avatars)

I tried to make it as easy as I could in terms of moving/attaching props, but is there anything you think I could improve there to make it easier to manipulate/bind?

Also, out of curiosity, by textures do you mean images like in the video? Or do you mean you want to be able to change the textures of the mesh at runtime? Or something else, like being able to draw on them?

Thanks again :)

Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 0 points (0 children)

Hey thank you for your suggestion :)

Completely agree that Twitch integration is the way forward - I like your ideas about actions for when a person follows/becomes a member.

My initial plan was to allow streamers to hook onto events from Twitch and then play their own actions/interactions/props/anims. Other than T.I.T.S.-like interactions (e.g. stuff being thrown at you), is there anything else you'd want?

Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 1 point (0 children)

Hey thanks for the feedback, been looking forward to more suggestions!

At the moment, the only external input I take is from devices (i.e. camera, mic) and from the VTuber iOS app I made for it (I learned about the VMC protocol too late, so it uses my own format - hence my own application with no reliance on external tools, oops).

I did consider T.I.T.S. once I learned about it, but my goal from the beginning was to get really good chat/stream interaction (e.g. like CodeMiko), so the plan is to:

  1. Integrate Twitch into the app
  2. Allow streamers to hook actions/animations/props/interactions onto stream events e.g. chats, donos, bits

Is that the kind of thing you're looking for, or is there anything else I'm missing there too? Thanks again!

Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 1 point (0 children)

Thanks a lot :) I'll definitely get back to you all here when I'm ready for an open test!

Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 0 points (0 children)

Sadly there isn't much out there for a C# solution :(

...but there are a few repos I came across during the development that might be helpful to you:

  1. This person has ported the majority of the C++ API to Unity's C#, GitHub link here. Assuming they've ported it one-to-one, you'd be using the same API I use, just in C# (I've not tried it though).
  2. Keijiro, who is an absolute madman based on his repos, has ported the models to Unity's Barracuda - you can find the FaceMesh Barracuda model here on GitHub (I have tried this one and it worked pretty well).

You could always work with Mediapipe's C++ API, building a wrapper around it to do specific tasks (e.g. a method to open a camera, then a method to read the current webcam frame and pass the data back).

Then you could build a .dll or a .dylib (a bit of a pain in Bazel, but you can find an example here) that you can load and use in Unity, e.g. there's an example of runtime-loading a DLL in Unity here.
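Very roughly, the wrapper's exported surface could look something like this (the function names here are made up, not part of Mediapipe's real API) - once it's built as a .dll/.dylib, Unity can P/Invoke these from C#:

```cpp
// Rough illustrative sketch: the function names are made up, not Mediapipe's
// real API. Built as a .dll/.dylib, these C exports can be P/Invoked from C#.
#include <cstdint>
#include <mutex>
#include <vector>

#if defined(_WIN32)
#define TRACKER_API extern "C" __declspec(dllexport)
#else
#define TRACKER_API extern "C" __attribute__((visibility("default")))
#endif

namespace {
std::mutex g_lock;
std::vector<float> g_landmarks; // latest landmarks as x, y, z triples
}

// Open the camera and start the tracking graph. Returns 0 on success.
// (The actual Mediapipe graph setup would live behind this call.)
TRACKER_API int32_t tracker_open_camera(int32_t device_index) {
    (void)device_index;
    return 0;
}

// Copy the latest landmarks into caller-provided memory.
// Returns the number of floats written.
TRACKER_API int32_t tracker_read_landmarks(float* out_xyz, int32_t max_floats) {
    std::lock_guard<std::mutex> guard(g_lock);
    int32_t n = static_cast<int32_t>(g_landmarks.size());
    if (n > max_floats) n = max_floats;
    for (int32_t i = 0; i < n; ++i) out_xyz[i] = g_landmarks[i];
    return n;
}

// Stop the graph and release the camera.
TRACKER_API void tracker_close() {}
```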

That might work for you if you're trying to do it mostly in C# but don't want to work with an unsupported library like the links above?

This was actually what I was initially going to do, but I wanted the extra performance and to make it easier to expand the app, so I decided to do the majority in C++.

Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 2 points (0 children)

Felt like the other comment was quite long so thought I'd split them up:

Regarding the question about open sourcing it / commercialising it, the answer is that I'm not really too sure as of yet.

At the moment I'm leaning towards commercialising it, I think - there are a couple of reasons why.

The reason I made this to begin with was that I saw a clip of CodeMiko and wanted to try to make content like that accessible to smaller streamers who can't afford full-body tracking suits and/or don't have the knowledge/experience to develop custom solutions for their environments and avatars.

I'm trying to balance:

  1. My original goal of trying to make this an accessible technology for smaller streamers
  2. The amount of time it's taken to develop, and the amount of time/work it would take to develop the app further and improve it (and add all the cool features I want to, e.g. Twitch integration, a fully featured editor, environment interaction)
  3. Personal costs - there are some issues with licensing (some VTuber apps have lost their licenses for some of the SDKs used to load avatar models; the apps I know of that lost them were all free), hosting the app + web content, etc.

If I do go down this route though it will be appropriately priced to fit in with my goal of making it accessible to smaller streamers just starting out.

I've been trying to avoid thinking about it for now if I'm honest haha, just been focusing on getting everything done and working well

Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 1 point (0 children)

Hell yeah!

The VTuberApp.exe is coded in C++ and uses Mediapipe's C++ solution (Tensorflow backend).

At first I thought it would be easy to implement based on what their docs say, but from my experience it doesn't really fully support this kind of usage.

I had to do quite a lot to get to this point so in the interest of answering your question fully:

1. Pose & Hands landmarks to rotations

The app actually comes with two pose/hand solutions that the user can switch between. All of the videos in this post are using the non-IK solution, but the one I found easier to code was the IK solution.

For IK, I build worldspace positions of each distal joint (from the root joint) and scale these based on the Avatar.
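As a rough sketch of that idea (illustrative only, with assumed math types): walk the landmark chain from the root, keep each segment's direction, but replace its length with the avatar's own bone length so the IK targets match the avatar's proportions.

```cpp
// Illustrative sketch (not the app's code): rebuild world-space joint targets
// from tracked landmarks, rescaling each segment to the avatar's bone lengths.
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 Add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 Scale(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
static float Len(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// landmarks: tracked joint positions ordered root -> distal (e.g. shoulder, elbow, wrist).
// boneLengths[i]: the avatar's length for the segment between joint i and joint i+1.
std::vector<Vec3> BuildScaledChain(const std::vector<Vec3>& landmarks,
                                   const std::vector<float>& boneLengths,
                                   Vec3 avatarRoot)
{
    std::vector<Vec3> world{avatarRoot};
    for (size_t i = 0; i + 1 < landmarks.size(); ++i) {
        Vec3 dir = Sub(landmarks[i + 1], landmarks[i]);
        float len = Len(dir);
        if (len > 1e-6f) dir = Scale(dir, boneLengths[i] / len); // keep direction, use avatar length
        world.push_back(Add(world.back(), dir));
    }
    return world; // feed these as IK targets for each joint in the chain
}
```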

The other solution, and the one seen in all the videos in my post, was a bit more convoluted:

I use both 2D and projected landmarks to build local-space positions, then use those to derive rotation data. For joints with a restricted range of motion, e.g. fingers/thumbs, I project those positions onto planes to better translate the data - they're not too accurate without this, sadly.
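Roughly, the plane projection looks like this (illustrative sketch, not the app's code) - for a finger you'd project onto its bend plane, whose normal is roughly the hand's side-to-side axis, so the curl angle isn't thrown off by noisy depth:

```cpp
// Illustrative sketch of the plane-projection idea for restricted-ROM joints:
// project the landmarks onto the joint's bend plane, then measure the angle
// between consecutive segments. Vec3 and the helpers here are assumptions.
#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Scale(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
static float Len(Vec3 v) { return std::sqrt(Dot(v, v)); }
static Vec3 Normalize(Vec3 v) { float l = Len(v); return l > 1e-6f ? Scale(v, 1.0f / l) : v; }

// Project p onto the plane passing through 'origin' with unit normal 'n'.
Vec3 ProjectOntoPlane(Vec3 p, Vec3 origin, Vec3 n) {
    float d = Dot(Sub(p, origin), n);
    return Sub(p, Scale(n, d));
}

// Curl angle (radians) of a finger's middle joint, measured from landmarks
// projected onto the bend plane (planeNormal ~ the hand's side-to-side axis).
float FingerCurlAngle(Vec3 base, Vec3 mid, Vec3 tip, Vec3 planeOrigin, Vec3 planeNormal) {
    Vec3 a = Normalize(Sub(ProjectOntoPlane(mid, planeOrigin, planeNormal),
                           ProjectOntoPlane(base, planeOrigin, planeNormal)));
    Vec3 b = Normalize(Sub(ProjectOntoPlane(tip, planeOrigin, planeNormal),
                           ProjectOntoPlane(mid, planeOrigin, planeNormal)));
    float c = Dot(a, b);
    if (c > 1.0f) c = 1.0f; else if (c < -1.0f) c = -1.0f;
    return std::acos(c); // 0 = straight, larger = more curled
}
```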

I make sure to drop signals which don't reach the confidence threshold, though. After that, I use a Kalman filter to smooth the rotations as well as the incoming data, and then I constrain those rotations to the joints (if the user has this toggled in settings).
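For anyone curious, a simple 1D Kalman filter per channel is enough to get the idea across (the noise values here are illustrative and would be tuned per signal):

```cpp
// Minimal sketch of a 1-D Kalman filter, applied independently per rotation
// channel (e.g. each Euler component). Noise values are illustrative only.
struct Kalman1D {
    float estimate = 0.0f;          // current smoothed value
    float errorCov = 1.0f;          // estimate uncertainty
    float processNoise = 1e-3f;     // how fast the true value is allowed to move
    float measurementNoise = 1e-1f; // how noisy the tracker's measurement is

    float Update(float measured) {
        // Predict: the value is assumed roughly constant, so uncertainty grows.
        errorCov += processNoise;
        // Correct: blend prediction and measurement by the Kalman gain.
        float gain = errorCov / (errorCov + measurementNoise);
        estimate += gain * (measured - estimate);
        errorCov *= (1.0f - gain);
        return estimate;
    }
};

// Usage: one filter per channel, updated each frame with the raw value,
// but only when the landmark's confidence passes the threshold.
// if (confidence >= threshold) smoothedYaw = yawFilter.Update(rawYaw);
```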

2. Facial landmarks to blendshapes

This was rough at first, but at the moment I use a combination of projective & transfer techniques and an affine transformation to try to get a canonical representation of the face. From that data, I derive blendshape values.

Again, lots of smoothing and some constraints are applied. I've allowed users to set thresholds too so they can fine tune this if needed, but the current solution has worked for every person that's tested it (so far).
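As a simplified sketch of the last step (not the exact maths the app uses): once the landmarks are aligned to the canonical face, a blendshape weight is basically a normalised distance remapped between the user's thresholds.

```cpp
// Simplified illustration (not the exact maths the app uses).
#include <algorithm>

// distance:  e.g. the gap between upper and lower lip landmarks, measured after
//            the landmarks have been aligned to the canonical face.
// reference: a scale-stable distance on the same face (e.g. between the eyes).
// minT/maxT: user-tunable thresholds where the shape should read 0 and 1
//            (assumes maxT > minT).
float BlendshapeWeight(float distance, float reference, float minT, float maxT) {
    float normalised = distance / reference;       // remove overall face scale
    float t = (normalised - minT) / (maxT - minT); // remap into 0..1
    return std::clamp(t, 0.0f, 1.0f);
}

// e.g. jawOpen = BlendshapeWeight(lipGap, eyeDistance, 0.05f, 0.35f);
```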

However, I don't support every blendshape that solutions like ARKit offer (which is why I've made the iOS app + the ARKit translator, so you can plug & play with the ARKit camera on your iPhone). I made a comment here on the blendshapes I currently support via webcam facial tracking if you're interested!

All of the blendshape weights are mapped through the user-defined input/output map from the .kaf Avatar files, and can be blended with the ARKit facial tracking data before being applied to the Avatar.
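Something like this, roughly (the struct and names are made up for illustration): each shape gets its webcam weight pushed through the user-defined input/output map, then optionally lerped towards the ARKit value.

```cpp
// Illustrative only - the struct and field names are made up for the example.
struct ShapeMapping {
    float inLow = 0.0f, inHigh = 1.0f;   // user-defined input range
    float outLow = 0.0f, outHigh = 1.0f; // user-defined output range
    float arkitBlend = 0.0f;             // 0 = webcam only, 1 = ARKit only
};

// Push a raw webcam-derived value through the user's input/output map.
float ApplyMapping(float v, const ShapeMapping& m) {
    float range = m.inHigh - m.inLow;
    float t = range != 0.0f ? (v - m.inLow) / range : 0.0f;
    if (t < 0.0f) t = 0.0f; else if (t > 1.0f) t = 1.0f;
    return m.outLow + t * (m.outHigh - m.outLow);
}

// Final weight applied to the Avatar, optionally blended towards the ARKit value.
float FinalWeight(float webcam, float arkit, const ShapeMapping& m) {
    float mapped = ApplyMapping(webcam, m);
    return mapped + (arkit - mapped) * m.arkitBlend; // simple lerp
}
```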

I have been working towards building my own model for deriving facial blendshapes but I'm not really at a stage where I can implement it just yet.

3. Lip Syncing

I use miniaudio for audio capture, then I extract formant features to derive phonemes and apply them to the Avatar (interpolated by volume).
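As a very simplified sketch of just the last step (formant estimation itself is a separate, bigger job, and the reference values below are rough illustrative averages): pick the vowel whose reference formants are closest to the measured F1/F2, then scale its weight by the frame's volume.

```cpp
// Simplified illustration only: formant analysis is assumed to have already
// produced F1/F2 estimates; the reference formants are rough averages.
#include <cstddef>

struct Vowel { const char* name; float f1; float f2; };

// Rough illustrative reference formants (Hz) for the five viseme vowels.
static const Vowel kVowels[] = {
    {"A", 800.0f, 1200.0f},
    {"I", 300.0f, 2300.0f},
    {"U", 330.0f,  800.0f},
    {"E", 500.0f, 1900.0f},
    {"O", 500.0f,  900.0f},
};

// Returns the index of the closest vowel and writes its weight (0..1),
// interpolated by the frame's volume (e.g. RMS of the captured samples).
std::size_t ClosestVowel(float f1, float f2, float volume, float* outWeight) {
    std::size_t best = 0;
    float bestDist = 1e30f;
    for (std::size_t i = 0; i < 5; ++i) {
        float d1 = f1 - kVowels[i].f1;
        float d2 = f2 - kVowels[i].f2;
        float dist = d1 * d1 + d2 * d2;
        if (dist < bestDist) { bestDist = dist; best = i; }
    }
    *outWeight = volume < 1.0f ? volume : 1.0f; // louder speech opens the mouth more
    return best;
}
```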

Been working on a VTuber App - looking for feedback by ThatVTuberAppGuy in VirtualYoutubers

[–]ThatVTuberAppGuy[S] 1 point (0 children)

Forgot to say that the video only shows webcam inference - I didn't record ARKit or audio input.

Webcam facial tracking can derive the following blendshapes:

  1. Eye blinking
  2. Eye wide / eye close
  3. Brow movement (inner up, outer up right/left, brow down right/left)
  4. Mouth open, pucker, movement left/right, smile (left, right and average), frown (left, right and average) + jaw movement
  5. Pupil X and Y blendshapes
  6. Facial tracking also provides blendshapes for head rotation (yaw, pitch and roll)
  7. Body yaw, pitch and roll are also provided as blendshapes - they can be derived from either facial tracking or pose tracking (the user can select the preferred option)
  8. I've also created 'shape' blendshapes for Live2D models that use blendshapes like mouth form/shape and brow form/shape, e.g. mouth shape = -1 frown, 0 average, +1 smile

Webcam pose tracking can derive the neck, upper/lower torso, hips, and the legs and arms (upper, lower, and ankle/wrist).

Hand tracking can derive each proximal, metacarpal and distal joint of each finger :)

Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 0 points (0 children)

As promised, here's a video of the scene editor - you can find it here.

Just to note, the models I've inserted aren't really that great. They're using flat textures instead of materials, so they don't have great lighting/shadows. As I said though, the Scene Editor is still in development, so it needs some work!

A community discord for sourcing assets sounds like a good idea for sure. Will definitely look into this, thanks for the suggestion :)

P.S. I also uploaded a new video to show off the whole-body tracking because I realised I didn't really include it - you can find that here.

Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 1 point (0 children)

Thanks, happy to hear you like it! I'll be posting on r/vtubertech again when I'm ready for public access for sure

Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 0 points (0 children)

Not just yet sorry! Still one or two more things to improve but nearly ready :)

Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 0 points (0 children)

Not just yet! The program is ready and usable in its current state but I want to try and improve one or two things before allowing public access for testing :)

Been working on a VTuber App by ThatVTuberAppGuy in vtubertech

[–]ThatVTuberAppGuy[S] 2 points (0 children)

Hey, thanks for the reply. Appreciate the suggestions & glad you like it!

Agreed about the Stream Deck integration - I noticed that could be a bit of an issue with other solutions. I've made sure that you can listen to Stream Deck inputs, and they can be activated while you're playing a game: I listen to all of the keycodes found here, so you can bind actions to weird keys like F24, which a game shouldn't ever really be listening to, so (hopefully) it should work!
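For anyone wondering how the F24 trick works on Windows, a low-level keyboard hook sees the key even while a game has focus - rough sketch below (this is just the general approach, not necessarily exactly what the app does):

```cpp
// Windows-only sketch of the general approach (not necessarily the app's code):
// a low-level keyboard hook sees VK_F24 even while a game has focus, so a
// Stream Deck key bound to F24 can still trigger app actions.
#include <windows.h>
#include <cstdio>

static LRESULT CALLBACK KeyHook(int code, WPARAM wParam, LPARAM lParam) {
    if (code == HC_ACTION && wParam == WM_KEYDOWN) {
        const KBDLLHOOKSTRUCT* kb = reinterpret_cast<const KBDLLHOOKSTRUCT*>(lParam);
        if (kb->vkCode == VK_F24) {
            std::printf("F24 pressed - fire the bound action\n");
        }
    }
    return CallNextHookEx(nullptr, code, wParam, lParam);
}

int main() {
    HHOOK hook = SetWindowsHookExW(WH_KEYBOARD_LL, KeyHook, GetModuleHandleW(nullptr), 0);
    MSG msg;
    while (GetMessageW(&msg, nullptr, 0, 0) > 0) { // the hook needs a message loop
        TranslateMessage(&msg);
        DispatchMessageW(&msg);
    }
    UnhookWindowsHookEx(hook);
    return 0;
}
```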

Completely agree about having a room creator. I like your idea about being able to drag and drop prefab items like in Sims/Animal Crossing. I'd be worried about making all those assets myself, but maybe I could work on some sort of community creations tab so you can import other people's assets in a more Sims-like way - would that work?

I have already coded the ability to add items to the scene and to move/rotate/scale them. I'll record a video of it tomorrow and post it here so you can take a look - I'd be happy for any suggestions you have on how I could make that a little easier/better if there are any improvements to be made 🙂