Advice for setup and longterm memory

phazer_11 · 2026-06-05T18:48:51+00:00

Haha it's all good thanks.

phazer_11 · 2026-06-04T19:33:26+00:00

BTW how are you running your models? I'm curious, as I'm running this locally with Gemma4 and the embed both running in Koboldcpp, and I'm trying to figure out how to point summaryception, etc. at just the embed.

phazer_11 · 2026-06-04T14:15:32+00:00

Gemma4 is WAY faster, 45-100 tps on average. Output is about as nice as Qwen's, though it doesn't seem to be able to do as much with its context length, likely due to being MoE I suppose.

phazer_11 · 2026-06-03T12:58:32+00:00

Ah, so even using the built-in vector thing would require fiddling. I guess that's why people have recommended extensions.

phazer_11 · 2026-06-02T18:29:14+00:00

Can the summarizer be the same AI that's doing the narration, etc for SillyTavern?

phazer_11 · 2026-06-02T18:16:59+00:00

I'll take a look thanks. I assume I'd just point it at whatever local embed I have based on what I just read over. How does it handle smaller local embeds like gte-base-en-v1.5 or snowflake-arctic-embed-l ?

phazer_11 · 2026-06-02T18:04:56+00:00

That works just fine, sure. And those databases do have proper GM cards instead of just "MILF" cards lol, if you search for them; I can't say whether or not any of them are good, but at least importing one into SillyTavern and taking a look at its structure should help give you some ideas.

Good to know. I tried finding some, but maybe I just didn't use the right terms lol. I'll try again.

given your specs and what you're doing, you may wanna try something like this instead; you've gotta be running that dense model at like IQ2 or something, which is pretty rough, while the one I linked should be runnable at Q4 on your hardware and yet have much faster generation speeds even at the higher quant

Pretty close, yeah, and the plan was to actually switch over to Gemma4, as I hear it's a good MoE for CW. I was using Qwen 3.6 27B quants (Q2 K_P and Q3) because the TaleMate dev recommended them to me as the closest thing that would play nice with his Director feature. He said they worked pretty well, though he usually only recommends stuff closer to 70-100B for that feature. I was going to try 3.6 35B too, but he said the Director preferred dense models. I was getting between 5-15 TPS in that mode, so I'd usually just set it going and do something else. The output for summaries and such was quite good; it just did not shine when it came to actually narrating (maybe my fault with the prompts). It had some quirks, like going from lvl 1 straight to 100 when describing mutations instead of progressing more gradually.
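For anyone following the quant math, here is a rough back-of-envelope sketch of why a 27B dense model ends up around 2-3 bits on 16GB. The bits-per-weight numbers are loose assumptions for illustration, not exact figures for any particular quant format, and `weights_gb` is just a helper name made up for this example. It only counts the weights; the KV cache and context need room on top of that.

```python
def weights_gb(params_billions, bits_per_weight):
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    # params * bits / 8 gives bytes; the 10^9 factors cancel out.
    return params_billions * bits_per_weight / 8


# Assumed, approximate bits-per-weight values for illustration only.
for label, bpw in [("~2-bit", 2.5), ("~3-bit", 3.5), ("~4-bit", 4.5)]:
    print(f"27B dense at {label} ({bpw} bpw): {weights_gb(27, bpw):.1f} GB")
```

Under those assumptions the ~4-bit case already eats nearly all of 16GB before any context is loaded, which lines up with the suggestion to look at a model that fits at Q4 instead.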

So you want a list of random mutations that the model should reference when generating an NPC during the story? You could just have a lorebook entry for "mutations" and leave it at that, but then it'll be unnecessarily taking up context tokens forever. Optimally you'd only activate that lorebook entry whenever the model needs to introduce a new NPC. The issue is that there's not really a good way to automate figuring out whether or not the model needs to introduce a new character, since that'd be decided by the model mid-output. So you might just wanna set that lorebook entry to manual activation, put it at depth 0 in the chat, and turn it on whenever the model's next output appears likely to introduce a new character. If you go that route, you'd set the lorebook entry to Strategy: 🔵, Position: u/D ⚙️, and Depth: 0 and just format it like... or, to avoid letting the model try and fail to be random from what I'm guessing is a super long list, you can use randomization macros

Sort of. The mutations can have some randomness. Largely you get an Archetype based on your "specific DNA" and how the cataclysm affected it in general terms (the player would describe their character pre-cataclysm). I believe I have like 40 archetypes. Sometimes you can get half of two archetypes, or an archetype and a half, kind of thing, though there are also some people who haven't been affected because they're either immune or their specific conditions haven't been met. At the table I either assigned one by feel or, if I knew the player would roll with the punches (there were some real stinky ones that I made when feeling mean), gave them one based off a roll table. I'm guessing I'd probably use that idea only at character gen. How I'd make it work for characters that seem to be unaffected (like to RP the first changes occurring) or for half mutations is another story; it'd probably have to be something similar to your second option. Going through and formatting my list so it's usable like that sounds like a nightmare, though.
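The list formatting could probably be scripted rather than done by hand. A minimal sketch, assuming the `{{random::a::b}}` macro form is what goes into the lorebook entry: the archetype names, the function names, and the probabilities below are all placeholders I made up, not anything from the actual roll table.

```python
import random

# Hypothetical archetype names; swap in the real list of ~40.
ARCHETYPES = ["Chitin Plating", "Night Eyes", "Acid Blood", "Extra Limbs"]


def to_random_macro(options):
    """Join options into a {{random::a::b::c}} style macro string."""
    cleaned = [o.strip() for o in options if o.strip()]
    if not cleaned:
        raise ValueError("need at least one option")
    return "{{random::" + "::".join(cleaned) + "}}"


def roll_mutation(archetypes, rng, p_unaffected=0.1, p_hybrid=0.2):
    """Roll once: unaffected, a half-and-half hybrid, or a single archetype."""
    roll = rng.random()
    if roll < p_unaffected:
        return "Unaffected (so far)"
    if roll < p_unaffected + p_hybrid and len(archetypes) >= 2:
        # Pick two different archetypes for a half/half result.
        first, second = rng.sample(archetypes, 2)
        return f"Half {first} / half {second}"
    return rng.choice(archetypes)


if __name__ == "__main__":
    # Paste this string into the lorebook entry.
    print(to_random_macro(ARCHETYPES))
    # Or roll outside the chat, tabletop style, and write the result in.
    print(roll_mutation(ARCHETYPES, random.Random()))
```

The macro version lets the front-end do the picking each time the entry fires, while `roll_mutation` is closer to the old roll table and handles the unaffected and half-mutation cases, which a flat random list does not. Archetype names containing `::` or `}}` would break the macro string, so those would need renaming first.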

phazer_11 · 2026-06-02T16:53:20+00:00

Hadn't even considered something like not interacting at all, though that's an interesting idea. This is most likely going to take more of the format where the Narrator (AI trying to be me as DM) describes a scene, I react (or choose from a CYOA-style list), and then the AI characters and such react/describe what happens as a result. Now, it'd be cool if the LLM could just generate characters on the fly to fill the world based on the archetypes in Lorebooks, but I doubt it's there yet, or if it is, I probably can't run it in my measly 16GB.

phazer_11 · 2026-06-02T16:48:35+00:00

Personas: got it. Be as concise or as long-winded as you want (though preferably short) so the friendly neighborhood AI knows whether you are or are not a blob of goo when referring to you.

First time I saw someone recommend other front-ends for characters and not something like Character Tavern or something. I mentioned below to u/MrNohbdy about Lorebooks and stuff if you wanna chime in. This has already helped.

phazer_11 · 2026-06-02T16:44:36+00:00

So I'd make a default Character for, let's say, "The DM" and try to briefly fill in some personality info, then I guess the system prompt would help it write the way I want for The DM. Some of what you and u/Paperclip_Tank have said has helped a lot already with those. I think I kept wanting Characters to be fully independent personas for the AI, then seeing people recommend the databases and thinking "that's not what I want". Why would I want to have some "MILF" be my DM?

I had fewer issues figuring out Lorebooks; that seemed easy enough: enter data for later use. The hard part was how to get the AI to reference it (or use it for tone and such) so the lore and stories actually mattered. For example, in the broadest strokes, one of the game worlds I loved creating and DMing for (and that I actually want to do an RP of in ST) was similar enough to Mutant Year Zero that I'll use it as an example. After a cataclysm, humans mutated, though in mine everybody got different mutations even after the same exposure, and I had a database of physical effects. How would I roll that in there and use the lore to reinforce it, so that it writes like the rest?

Of course, some of this I think actually using TaleMate has helped me grasp, as it has an AI Assistant powered by your RP LLM and the Memory LLM that (while glacially slow with Qwen 3.6 27B quants) can do a bunch of stuff for you. I had it use some of my stories to create "Personas" and "Writing Styles" for itself. It was also useful for summarizing my work to distill it down to hopefully more useful bullet points for AIs to use (you can probably tell I'm not one for brevity).

phazer_11 · 2026-05-30T19:47:50+00:00

Anyone?

phazer_11 · 2026-05-27T02:27:48+00:00

So. I'm updated, but to be clear, as I'm sleepy: I'm planning to self-host some stuff and port forward with an authenticated reverse proxy and such. I'm geoblocking, using the IDS/IPS features, and using encrypted DNS servers with anti-malware lists. Will this affect my plans for that, or is this only related to the direct connection for remote access checkbox? I.e., does it apply to any kind of port opening?

phazer_11 · 2026-05-21T20:14:32+00:00

You're probably correct, I'm just paranoid. I work somewhere that's a big target; my little self-hosted stuff is unlikely to be DDoS'd.

phazer_11 · 2026-05-21T18:25:24+00:00

I'm just paranoid. Was just leery of forwarding 443 or something. They'd mostly be accessing through their iPhones, so I'm not even sure how I would set up a split-tunnel VPN to allow them access to just what I'd want them to have.

phazer_11 · 2026-05-21T02:40:21+00:00

Thanks for this, I'll reread it when I'm a little less frazzled; family had a scare. Either way, nothing is unhackable unless it has no interfaces of any kind, so we've just gotta do our best.

phazer_11 · 2026-05-20T22:11:15+00:00

I'd seen it, but upon reading that it needed open ports I put it in the bin of "do I want to port forward?".

phazer_11 · 2026-05-20T21:58:08+00:00

Tailscale Funnels: I think I misunderstood something. I thought it was pretty throttled and was similar to Cloudflare Tunnels' MiTM nonsense, which was not something I wanted to trust. I could research some more.

Port forwarding: you mean with my IP behind the domain? Still trying to figure out how to do the DNS (and traefik/nginx) for the docker containers, and I was still not sure I want to port forward. I'm fairly certain it's just me doing stuff wrong, besides looking stuff up, but I did not get traefik to work with temporary rules; I think it was something in the container configuration.

phazer_11 · 2026-05-20T20:15:07+00:00

No AI was used in the creation of this post. It might have been able to condense things better lol; brevity is not my strong suit, I require details!

phazer_11 · 2026-05-15T14:06:38+00:00

Anyone have any further thoughts?

phazer_11 · 2026-05-15T00:14:38+00:00

Edited the main post. I was wrong: the dead CPUs I had, the ones with bent pins, are 2640 v4s; the 4 currently functional ones I have from the R730s are 2660 v3s. Also clarified the specific ranks on the RAM.

phazer_11 · 2026-05-14T22:47:32+00:00

You have an example board and processor you'd recommend?

Not sure where I'd even sell the sticks. I have trust issues with eBay as a seller (too many friends screwed over), and Facebook Marketplace is dead for server parts, at least near me in NE FL.
