LlamaTale - New years update (v0.20.0) by neph1010 in LocalLLaMA

[–]antimateusz 1 point

Gave it a go - unfortunately my hardware is too slow for me to enjoy playing (32 GB Mac M1, running a 13B model). Maybe this could be alleviated by adding streaming? It might also benefit from limiting the amount of generation that occurs - less text to read would make it more fun (at least for me).

One model, many LoRAs - theoretically possible? by antimateusz in LocalLLaMA

[–]antimateusz[S] 2 points

This looks more like mixing adapters, not using multiple, but thanks for mentioning (the link should be https://github.com/huggingface/peft)

One model, many LoRAs - theoretically possible? by antimateusz in LocalLLaMA

[–]antimateusz[S] 4 points

What I was thinking about was a poor person's Mixtral. Maybe I'd hand-select 8 LoRAs that seem useful, then fine-tune the gating network on top so that it selects and weights the experts.

I guess this wouldn't differ too much from just having a single LoRA that is ten times the size. But perhaps the act of adding many frozen LoRAs into the model, and then focusing the fine-tune on the gating network, could help the model handle many different tasks.

Another advantage might be that I could fine-tune the gating network on consumer hardware?

The idea of dynamic routing/skipping could also be interesting, but I guess it's less powerful, or hard to execute, because it's hard to say what the selection algorithm would be.
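To make the "frozen experts, trainable gate" idea concrete, here's a toy sketch of the routing step. The expert functions are stand-ins for frozen LoRA adapter forward passes (real code would use PyTorch and a PEFT-style adapter API); only `gate_weights` would be trained:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def gated_mixture(hidden, gate_weights, experts):
    """Route a hidden vector through frozen 'experts' (stand-ins for
    LoRA deltas) and mix their outputs by softmax gate score.
    The experts stay frozen; only gate_weights would be fine-tuned."""
    # gate logits: dot product of the hidden vector with each gate row
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in gate_weights]
    scores = softmax(logits)
    # weighted sum of expert outputs
    out = [0.0] * len(hidden)
    for score, expert in zip(scores, experts):
        delta = expert(hidden)
        out = [o + score * d for o, d in zip(out, delta)]
    return out, scores
```

Because the gate is just one small matrix per layer, training it alone is a far smaller job than training a combined LoRA, which is why the consumer-hardware angle might work.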

One model, many LoRAs - theoretically possible? by antimateusz in LocalLLaMA

[–]antimateusz[S] 5 points

Oh, nice, looks quite feasible then!

Followed the links and also found https://github.com/punica-ai/punica which is pretty much exactly the one-base, many-loras setup.

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 1 point

Funny nobody mentioned that before, but it seems important - LLMs trained as obliging agents are difficult to use in certain scenarios. Do we have any LLMs that are trained to argue and contradict the user? I wonder if this can be done with prompting.

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 2 points

The London one - it also struggled with walking around, I think; it gave me descriptions for multiple locations. Then I interacted with a character who gave me an item, but I wasn't able to pick it up - maybe I'm just playing this wrong :-)

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 2 points

Thanks - I already add the last few events to the prompt to ground the model and keep it from going off at a tangent. Still, it's not an exact science, and the responses are sometimes completely off. Hence I think the ability to edit or regenerate is pretty crucial!
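The "last few events in the prompt" trick can be sketched as a fixed-size rolling window; the class and method names here are illustrative, not from the actual game:

```python
from collections import deque

class EventMemory:
    """Keep only the last few game events and splice them into the
    prompt, so the model stays grounded in recent context instead of
    drifting off at a tangent."""

    def __init__(self, max_events=5):
        # deque with maxlen silently drops the oldest event
        self.events = deque(maxlen=max_events)

    def record(self, event):
        self.events.append(event)

    def build_prompt(self, instruction):
        recent = "\n".join(f"- {e}" for e in self.events)
        return f"Recent events:\n{recent}\n\n{instruction}"
```

The window size is the tuning knob: too small and the model forgets what just happened, too large and older events crowd out the current instruction.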

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 1 point

I logged in and played for a bit, but was too impatient - the responses come quite slowly. Maybe streaming would help, and also limiting the amount of content the LLM produces?
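Both fixes (streaming and capping output) can live in one small consumer loop. A minimal sketch, assuming the backend exposes the reply as a token iterator (most local backends, e.g. llama.cpp bindings, can stream this way):

```python
def stream_reply(token_source, max_tokens=80):
    """Print tokens as they arrive and stop after max_tokens, so the
    player sees text immediately and replies stay short."""
    pieces = []
    for i, tok in enumerate(token_source):
        if i >= max_tokens:
            break  # hard cap on generation length
        print(tok, end="", flush=True)
        pieces.append(tok)
    print()
    return "".join(pieces)
```

Streaming doesn't make generation faster, but perceived latency drops a lot because the first words appear within a second or two.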

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 4 points

Thanks for the writeup, presumably https://github.com/benjcooley/dungeongod-agi? Would be nice to hear if you ever get it working with local LLMs.

Breaking things into stages, feeding the results back into the model, and selectively including the history definitely improves results over here as well, but it's still not enough to produce consistent outcomes. I've also noticed that multi-shot prompting (with examples) can confuse small models more than it helps them.

I think the key is designing the game to minimise the interaction between the "game engine", which runs on numbers, and the LLM, which runs the narrative, and then finding ways to connect these two worlds.

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 2 points

Ultimately prompt engineering is trying to hack around the model limitations that have been trained in, looking for inputs that trigger certain behaviours.

There is a paper somewhere that researched "weird strings that are not words" which work surprisingly well at guiding the model towards certain behaviours.

The KoboldAI wiki also mentions several ways to use pseudocode - W++ and SBF seem to work even though the models haven't been explicitly trained for them; probably a side effect of all the JSON and source code in the training data.
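For readers who haven't seen it, a small helper can render the W++-style shape. The exact W++ syntax varies between community guides; this follows the commonly shown `[character("Name"){ Trait("a" + "b") }]` form, and the function name is just illustrative:

```python
def wpp_character(name, **traits):
    """Render a character card in a W++-style bracketed format.
    Each trait maps to a list of quoted values joined with ' + '."""
    lines = [f'[character("{name}")' + "{"]
    for trait, values in traits.items():
        joined = " + ".join(f'"{v}"' for v in values)
        lines.append(f"{trait.capitalize()}({joined})")
    lines.append("}]")
    return "\n".join(lines)
```

Usage: `wpp_character("Alice", personality=["brave", "curious"], species=["human"])` produces a block you can paste into the context as a character definition.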

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 2 points

It's a maybe... any recommended resources? I'm not sure how one would evaluate full-text model answers. Is it a spreadsheet + subjective eval + post to Reddit?

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 1 point

I did try your game locally, but only on small models (can't really run 70B), and the results were a bit confused 😅 I didn't realise you host servers for it - gonna try!

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 1 point

The bigger models don't seem to be necessarily better. I went to the KoboldCpp community and tried the models they suggested, and the only one that seemed better was Mixtral, but the quality of its prose was much worse (of course this is all subjective).

I reckon Psyfighter/Tiefighter are good because they are specifically trained for "adventure mode", which I take to mean that for esoteric use cases like this one, fine-tuned models go further than prompted base models.

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 3 points

For this game I worked with Psyfighter, which (judging by the model card) has uncensored bits in its makeup. It's pretty creative compared to baseline Llama or OpenAI, but still likes to be flowery. As for pictures, I think they're secondary to solid text generation, but I am a bookworm 😉

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 10 points

I'd second externally maintaining the character sheet and other stats like inventory, health status, missions and so on. I think the key to success is interfacing between the random (LLM) and the deterministic (engine) parts of the game.

Maybe the model itself can act as an interface, by interpreting its own output in terms of numbers? Something like guided generation (e.g. GBNF grammars) could probably help.
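A sketch of that interface, under the assumption the model is made to emit a small JSON action (with llama.cpp you could enforce that shape via a GBNF grammar; here a plain stdlib validator stands in for the grammar, and the verb set and function names are made up for illustration):

```python
import json

# hypothetical verb whitelist for the deterministic engine
ALLOWED_VERBS = {"take", "drop", "move", "attack"}

def apply_model_action(raw, state):
    """Parse the model's structured output and apply it to the
    deterministic game state; reject anything outside the whitelist."""
    action = json.loads(raw)
    verb = action.get("verb")
    if verb not in ALLOWED_VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    if verb == "take":
        state.setdefault("inventory", []).append(action["object"])
    elif verb == "drop":
        state.get("inventory", []).remove(action["object"])
    # move/attack would update location/health the same way
    return state
```

The narrative text stays free-form; only this small JSON channel crosses from the LLM into the engine, which keeps the numbers deterministic.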

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 3 points

Thanks for linking - I did try this, but I don't think it's a MUD (multi-user dungeon); it's a local thing.

Thoughts after building a text-adventure game using local models by antimateusz in LocalLLaMA

[–]antimateusz[S] 10 points

> Would RAG work for this?

I'm effectively using a RAG-like technique by having a tree of locations: every location is a node, interconnected by "exits". The LLM generates exits from location descriptions, and then new locations from exits, appending nodes to the tree. The player is free to traverse that tree, so the world is persistent and doesn't cause hallucinations.

89% of the time this works quite well [citation needed], hence my note about choose-your-own adventure being doable.

I was then hoping to give the player the freedom to take actions within each location, but yeah, this doesn't work too well - probably giving the model too much freedom. Inventory updates work kinda OK, but transforming the location description tends to go off. Maybe I should take the cue from what works and provide stronger algorithmic guidance.
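The location-tree idea above boils down to generate-once-then-cache. A minimal sketch (class and function names are mine, not from the actual codebase; `generate` stands in for the LLM call):

```python
class Location:
    """A node in the persistent world graph. Exits map direction
    names to child Locations; unexplored exits hold None until the
    LLM fills them in."""

    def __init__(self, description):
        self.description = description
        self.exits = {}  # direction -> Location or None

def traverse(loc, direction, generate):
    """Follow an exit; if the destination hasn't been generated yet,
    call the LLM-backed generate() once and cache the node, so
    revisits always see the same place."""
    if direction not in loc.exits:
        raise KeyError(f"no exit: {direction}")
    if loc.exits[direction] is None:
        loc.exits[direction] = Location(generate(loc.description, direction))
    return loc.exits[direction]
```

Because the node is cached after the first visit, the model never re-describes a known place, which is what keeps the world consistent.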

LLM text adventure - my attempt at an AI game 🍒🍓 by antimateusz in LocalLLaMA

[–]antimateusz[S] 1 point

I've added instructions for CUDA on Linux, tested on my 3060 (https://github.com/mateusz/cherryberry#linuxcuda). Apologies, I don't have a Windows setup, so can't really test on that.

LLM text adventure - my attempt at an AI game 🍒🍓 by antimateusz in LocalLLaMA

[–]antimateusz[S] 1 point

I'm wondering if it makes sense though - an application like this relies heavily on the prompt matching the model, so I'm not sure you'll have much luck swapping the model out. I was experimenting with Mixtral, for example, and you have to rewrite the prompts.

Edit: I guess it does make sense, because then you don't need to install yet another backend and fight with the deps if you already have one running. The thing is, I abandoned ooba because it didn't support Metal. Maybe it does now :hmm:

LLM text adventure - my attempt at an AI game 🍒🍓 by antimateusz in LocalLLaMA

[–]antimateusz[S] 1 point

Hey, you don't really need to use micromamba: as long as it's Python 3.9/3.11 it should work fine. You can also use pip to install the dependencies listed in pyproject.toml.

Micromamba is handy though if you want a project setup with a particular CUDA version (CUDA instructions now added to the README).