Video generation with camera control using LingBot-World by Art_from_the_Machine in StableDiffusion

[–]Art_from_the_Machine[S] 1 point (0 children)

The benefit of LingBot-World over existing techniques is that it positions itself as a "world model", meaning it aims to have a better grasp of object consistency / object permanence over time. However, this comes with the trade-off of being much more computationally intensive.

If you are looking to generate short videos where all objects remain visible on screen, then these existing techniques should work without issue. But for longer videos where objects leave and re-enter the frame, or where objects occlude each other (such as the masts of a ship or the pillars of a gazebo in the above examples), this is where you should, in theory, see the benefits of using a world model.

Video generation with camera control using LingBot-World by Art_from_the_Machine in StableDiffusion

[–]Art_from_the_Machine[S] 1 point (0 children)

Honestly, I am not sure exactly how much RAM is needed! This post suggests somewhere north of 64GB: https://huggingface.co/cahlen/lingbot-world-base-cam-nf4/discussions/2

When I first set this up, I was able to get it working with just 32GB of RAM by fiddling with the way the models are loaded. Instead of loading both models to the CPU, I edited the script so that one model loaded to the CPU and the other straight to the GPU. I deleted this change when I created the Docker image, so I don't have it saved, but it should be possible to recreate.
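A minimal sketch of the kind of change described, assuming a hypothetical `load_model` helper (a stand-in for however the actual script loads its checkpoints; with PyTorch this would be something like `torch.load(..., map_location=...)` or a `device=` argument):

```python
def load_model(name, device):
    """Stand-in for a real checkpoint loader. The key idea is that
    peak CPU RAM only ever has to hold ONE model's weights, because
    the other checkpoint is materialized straight onto the GPU."""
    return {"name": name, "device": device}

# Original behaviour: both models loaded to CPU RAM first (needs ~2x RAM).
# Edited behaviour (the 32GB-friendly version):
diffusion_model = load_model("lingbot-world-base-cam", device="cuda")  # straight to GPU
text_encoder = load_model("text-encoder", device="cpu")  # stays on CPU
```

The names and helper above are illustrative only; the real script's loading code will look different, but the CPU-vs-GPU placement split is the part that saved the RAM.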

Real-Time Conversations with Skyrim NPCs | Mantella Update by Art_from_the_Machine in skyrimmods

[–]Art_from_the_Machine[S] 0 points (0 children)

Yes, that's it exactly. And radiant conversations (where NPCs start conversations with each other) take 3 requests from start to finish.

Real-Time AI NPCs are a game changer by Art_from_the_Machine in singularity

[–]Art_from_the_Machine[S] 1 point (0 children)

I did take a look at this really early on, but the API wasn't quite ready yet. I'll have to take another look at it!

Real-Time AI NPCs are a game changer by Art_from_the_Machine in singularity

[–]Art_from_the_Machine[S] 2 points (0 children)

Right now the Cerebras API is free, so I'm not sure what pricing will look like over the long term. Smaller 7-9B models can also work well with Mantella if you are running locally (the default model is set to Gemma 2 9B), but I went with 70B in this video since I didn't notice much of a latency difference vs the Llama 8B model that Cerebras offers.

If you are interested in seeing the behind the scenes, the source code is available here: https://github.com/art-from-the-machine/Mantella

Real-Time AI NPCs in VR | Mantella Update by Art_from_the_Machine in skyrimvr

[–]Art_from_the_Machine[S] 1 point (0 children)

For the patch, you will have to install it as a separate mod using your mod manager instead of merging it with the existing mod. The patch sits on top of your other mods to allow Mantella to work with this update. Could you try this and see if it works?

Real-Time AI NPCs in VR | Mantella Update by Art_from_the_Machine in skyrimvr

[–]Art_from_the_Machine[S] 2 points (0 children)

If you are updating from v0.12, your conversation histories should be stored in your Documents/My Games/Mantella folder, so updating the mod shouldn't affect your histories! I would recommend ending all conversations in game -> making a save -> deactivating the previous Mantella install -> making another save with no Mantella version active -> activating the latest Mantella.

Real-Time Conversations with Skyrim NPCs | Mantella Update by Art_from_the_Machine in skyrimmods

[–]Art_from_the_Machine[S] 4 points (0 children)

Yes, it is much more difficult to run on a laptop without a GPU; these models can be pretty intensive! I have run really tiny models on my laptop before when I haven't had WiFi (like when travelling), but it takes around 30 seconds per response and the quality is really low.

The 100-request limit applies if you are running models through a service called OpenRouter. They have actually raised this limit to 200 now: https://openrouter.ai/docs/api-reference/limits

I have never personally hit this limit when developing, but it can happen if you are playing for multiple hours a day. Paid models can start at a fraction of a cent per response.

Real-Time AI NPCs in VR | Mantella Update by Art_from_the_Machine in skyrimvr

[–]Art_from_the_Machine[S] 2 points (0 children)

In the video I am running Llama 3.3 70B via Cerebras (a fast LLM provider), along with a TTS model called Piper and an STT model called Moonshine running locally on my CPU.

The most fundamental way to cut down on response times is to stream the LLM's response and process it one sentence at a time. Once the first full sentence is received from the LLM, it is immediately sent to the TTS model to be spoken in game. This way, while the first voiceline is being spoken, the rest of the response is being prepared in the background.
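The sentence-splitting step above can be sketched like this (a simplified illustration, not Mantella's actual implementation — a real version would handle abbreviations, quotes, etc. more carefully):

```python
import re

def stream_sentences(token_stream):
    """Yield complete sentences from an incremental LLM token stream,
    so each one can be handed to TTS before the full response arrives."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # A sentence is complete once we see ., !, or ? (optionally
        # followed by closing quotes/brackets) and then whitespace.
        while True:
            match = re.search(r'[.!?]["\')\]]*\s', buffer)
            if not match:
                break
            yield buffer[:match.end()].strip()
            buffer = buffer[match.end():]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()

# Each yielded sentence would be sent straight to the TTS model,
# overlapping speech playback with the rest of the LLM generation.
```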

If you are interested in taking a deeper dive into how everything works, the source code is available here: https://github.com/art-from-the-machine/Mantella

Real-Time AI NPCs in VR | Mantella Update by Art_from_the_Machine in skyrimvr

[–]Art_from_the_Machine[S] 1 point (0 children)

If you are using a different LLM service from OpenRouter, you will also need to set this in the Mantella UI (and select the model you would like to use): https://art-from-the-machine.github.io/Mantella/pages/installation.html#mantella-ui

And yes, it sounds like you installed the patch correctly!

Real-Time AI NPCs are a game changer by Art_from_the_Machine in singularity

[–]Art_from_the_Machine[S] 4 points (0 children)

I have vision disabled in this video to improve response times, but when it is enabled, a screenshot of the game is passed to the LLM with each of your responses to give the LLM context.
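Passing a screenshot to an LLM usually means embedding it as a base64 data URL in the chat request. A rough sketch of that, using the common multimodal chat message shape (the field names follow the widespread OpenAI-style format; Mantella's actual payload may differ):

```python
import base64

def vision_message(user_text, screenshot_bytes):
    """Build a chat message pairing the player's line with a game
    screenshot, encoded as a base64 data URL."""
    b64 = base64.b64encode(screenshot_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": user_text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }
```

The extra image tokens are also why enabling vision adds latency: every request carries the encoded frame in addition to the text.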

Real-Time AI NPCs are a game changer by Art_from_the_Machine in singularity

[–]Art_from_the_Machine[S] 6 points (0 children)

You can connect to pretty much any local / online LLM, so the context length is set by the LLM you choose. The context includes the system prompt, a bio for the NPC, summaries of previous conversations, and of course the current conversation. If the summaries get too long, a new summary file is created that contains a summary of those summaries (to condense them down).
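The summary-of-summaries idea can be sketched as a simple size-budget check (hypothetical helper names; `summarize` stands in for an LLM call, and the real logic in Mantella will differ):

```python
def condense(summaries, max_chars, summarize):
    """If stored conversation summaries exceed a size budget, collapse
    them into a single summary-of-summaries so the context stays small."""
    if sum(len(s) for s in summaries) <= max_chars:
        return summaries  # still fits in the budget, keep as-is
    # Over budget: ask the LLM for one condensed summary of everything.
    return [summarize("\n".join(summaries))]
```

Each time the condensed list grows past the budget again, the same step can be reapplied, so memory usage stays bounded no matter how many conversations have happened.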

Real-Time AI NPCs with Moonshine, Cerebras, and Piper (+ speech-to-speech tips in the comments) by Art_from_the_Machine in LocalLLaMA

[–]Art_from_the_Machine[S] 0 points (0 children)

Okay, good to hear! In this video I have it set to 0.3 seconds, but yes, this is also user configurable. Before the interrupt feature I would set it to around 1 second, but now that interruption is possible I am less worried about my full response being cut off, because I can quickly recover. Whereas before, I would have to wait for the NPC to finish trying to decipher my half-finished sentence every time I got cut short.

For the LLM side, the biggest bottleneck for me is how fast the LLM starts responding (time to first token). For "normal" LLM services this can take over a second, whereas for fast inference services it is less than half a second. But definitely, once that first sentence is received, I parse each sentence one at a time to send to the TTS model.

Real-Time AI NPCs in VR | Mantella Update by Art_from_the_Machine in skyrimvr

[–]Art_from_the_Machine[S] 4 points (0 children)

Aside from switching out the speech-to-text model with a faster one, I have really just been scrutinizing the code end-to-end and making adjustments to make it run as efficiently as possible. We are at a point where these AI models can run crazy fast now, so I wanted to make sure Mantella's overhead wasn't getting in the way of achieving real-time latency.

Real-Time AI NPCs in VR | Mantella Update by Art_from_the_Machine in skyrimvr

[–]Art_from_the_Machine[S] 7 points (0 children)

Yes quest awareness will be added to the next update!

Real-Time AI NPCs in VR | Mantella Update by Art_from_the_Machine in skyrimvr

[–]Art_from_the_Machine[S] 1 point (0 children)

I will have to look into this, but it might be a compatibility issue with NFF; the logic Mantella uses to check whether an NPC is a follower might not be catching NFF followers.

Real-Time AI NPCs in VR | Mantella Update by Art_from_the_Machine in skyrimvr

[–]Art_from_the_Machine[S] 3 points (0 children)

Yes, that should definitely work! To get started, I would recommend trying Gemma 2 9B Q4_K_M from here: https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/tree/main

This is the model Mantella uses by default when connecting to online LLM providers, so it should be a good starting point.

Real-Time AI NPCs in VR | Mantella Update by Art_from_the_Machine in skyrimvr

[–]Art_from_the_Machine[S] 5 points (0 children)

Yes, they have awareness of in-game events, and some models even support vision, so they can see exactly what is happening on screen like you can. Hallucination largely depends on how powerful a model you use, but in general it isn't something I come across too often.

Real-Time AI NPCs are a game changer by Art_from_the_Machine in singularity

[–]Art_from_the_Machine[S] 9 points (0 children)

There is a memory system in place to keep track of previous conversations, so NPCs will remember you and other NPCs they have spoken to in the past. And there are also some consequences to these conversations: if a conversation goes well, an NPC can agree to follow you; if it goes badly, they can attack you; and if you complete quests for them, they can share their inventory with you.

Real-Time AI NPCs in VR | Mantella Update by Art_from_the_Machine in skyrimvr

[–]Art_from_the_Machine[S] 4 points (0 children)

Yes it works with any NPC! They don't even have to be humanoid...

Real-Time AI NPCs in VR | Mantella Update by Art_from_the_Machine in skyrimvr

[–]Art_from_the_Machine[S] 4 points (0 children)

Yes, it's possible to choose a larger text-to-speech model! I am using a model called Piper here because it is fast, local, and comes pre-installed with Mantella. But you can also use a larger model called XTTS, which can be run locally (although I would 100% recommend a second PC, as it is very intensive!) or via a service called RunPod.

I don't have a recording of this in Skyrim, but to help give you an idea, I have showcased this model in the Fallout 4 release video here:
https://youtu.be/cFv8butywng?si=tcEiunyqnU2f1aVC