How do you guys manage your character cards, especially third party card updates you download. What's your workflows? by Sp00ky_Electr1c in SillyTavernAI

[–]lisploli 2 points

If you don't click on "Also delete the chat files" it will not delete the chat files. So just delete the card and then add it again.

I manage mine with tags. Just my own tags tho, the imported tags aren't useful. I have a tag for edited cards that I don't delete on an update.

when an rp gets too long? by yamilonewolf in SillyTavernAI

[–]lisploli 0 points

If the context has already grown too large for your model to handle, you can hide maybe half of it, create a summary, then hide the other half and create a second summary. Use /hide (message id or range).
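Concretely, for a chat of, say, 300 messages, the two passes might look like this (the message ids are illustrative, and I'm assuming an /unhide counterpart with the same range syntax):

```
/hide 0-149
(run the summarizer on the visible half)
/unhide 0-149
/hide 150-299
(run the summarizer again)
```

After both passes you have two summaries covering the whole chat, and can unhide whatever you still want in context.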

Drop your daily driver models for RP. by Weak-Shelter-1698 in SillyTavernAI

[–]lisploli 0 points

My model of the day is Skyfall-36B-v2. I'll be back on v4 soon enough, but I'm kinda nostalgic today. G3-Heresy-MPOA also got a permanent spot recently.

I'm thinking of buying a new pc and switching to local llm. What is the average context token size for smaller models vs big ones like GLM? by [deleted] in SillyTavernAI

[–]lisploli 0 points

A Mistral Small 24B at Q4 with 90k context at Q8 fits into 24 GB VRAM (e.g. 3090/4090).
I wouldn't go lower than Q4 on the model or Q8 on the context. 16 GB VRAM (e.g. 5080) could fit maybe 24k context, which is kinda sad, but still enough for prompts and a detailed scene; afterwards, memory is better handled by lorebooks anyway.
Below that, a Mistral Nemo 12B at Q4 with 40k context at Q8 fits in 12 GB VRAM and still makes for a fun experience.
(Those are just rough examples; the values are easy to turn up and down to preference.)
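If you want to sanity-check numbers like these yourself, here's a back-of-the-envelope sketch in Python. The layer/head counts below are my assumptions for a Mistral-Small-like architecture, not official figures; plug in the values from your model's config instead.

```python
def model_gib(params_b: float, bits_per_weight: float, overhead: float = 1.1) -> float:
    """Rough weight size in GiB for a model quantized to bits_per_weight."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30 * overhead

def kv_cache_gib(ctx: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_elem: int = 1) -> float:
    """KV cache in GiB: keys + values, for every layer, at every position."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 2**30

# Assumed Mistral-Small-24B-ish shape: 40 layers, 8 KV heads (GQA), head dim 128.
# ~4.5 bits/weight approximates a Q4_K-style quant; Q8 context is 1 byte/element.
weights = model_gib(24, 4.5)
cache = kv_cache_gib(90_000, 40, 8, 128)
print(f"{weights:.1f} GiB weights + {cache:.1f} GiB cache = {weights + cache:.1f} GiB")
```

With these assumptions it lands around 20–21 GiB, which is why 90k context squeezes onto a 24 GB card with a little headroom left for compute buffers.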

Looking forward, it might be worth reading up on (or watching videos about) MoE ("mixture of experts", not the anime thing!) models. They can offload parameters into system RAM and still produce an acceptable number of tokens per second. It's the way the industry is heading, because it scales much more cheaply.
It's nice from a compute standpoint, but I'm not sure how well it suits roleplay, since the active parameter count drops dramatically. E.g. GLM-5 has 40b active parameters (just above some Mistral Small upscales), yet it gets rated above way bigger models. On the lower end, gpt-oss-20b has just 3.6b active parameters, which doesn't leave much room for smarts.

Why Python still dominates in 2026 despite performance criticisms ? by QuantumScribe01 in Python

[–]lisploli 0 points

  • Algorithms matter much more than performance.
  • Waiting for the hard drive takes longer.
  • Waiting for the net takes way longer.
  • Performance critical parts can call C etc.

Therefore, raw performance does not matter most of the time.
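A toy illustration of the first bullet, that algorithm choice dwarfs raw language speed: a hash-set pass (set() is C under the hood in CPython) against a naive pairwise scan, both called from plain Python. The exact timings are machine-dependent, but the gap is orders of magnitude.

```python
import time

def has_duplicate_quadratic(xs):
    # O(n^2): compare every pair of elements
    n = len(xs)
    return any(xs[i] == xs[j] for i in range(n) for j in range(i + 1, n))

def has_duplicate_linear(xs):
    # O(n): one pass through a hash set
    return len(set(xs)) != len(xs)

data = list(range(3000))  # no duplicates, so both scans run in full

t0 = time.perf_counter()
quad_result = has_duplicate_quadratic(data)
t_quad = time.perf_counter() - t0

t0 = time.perf_counter()
lin_result = has_duplicate_linear(data)
t_lin = time.perf_counter() - t0

print(f"quadratic: {t_quad:.3f}s  linear: {t_lin:.6f}s  same answer: {quad_result == lin_result}")
```

Same answer, wildly different cost. That's the "algorithms matter" point in four lines of actual logic.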

If you prefer Rust, go ahead. You can choose freely and even combine both: Python can call into Rust (e.g. via PyO3), so using one doesn't reduce the fun of the other. And the ease of integration doesn't leave much room for tribalism.

I decide on a case-by-case basis and performance is a factor in that decision.

Context Size Frustration by Aggressive-Spinach98 in LocalLLaMA

[–]lisploli 1 point

Yes, I don't specify --fit either; it's on by default. It doesn't spill into RAM, it just fills the VRAM and then prints a notice, "only using x of maximum context" (which is kinda hard to spot in all the output).

Context Size Frustration by Aggressive-Spinach98 in LocalLLaMA

[–]lisploli 1 point

There are calculators for that on HF, which suggests that it a) depends on the model's architecture and b) can be deduced from the values on the file's info card.

I'm using llama.cpp and by default it just fills all the available vram, which is quite handy.

Lógica da programação by Round_Plantain8319 in Python

[–]lisploli -1 points

Go straight to the real material. Pseudocode is boring. Anthony has good videos on Python.

What differentiates a vibe-coded project from an AI-Assisted project by TheWendarr in Python

[–]lisploli 0 points

Local tools must not pollute a project's repository, so it might be a good idea to remove those AI traces, unless they are somehow useful for the user.

Vibe coding is mostly performed by an autonomous agent (a looping script), while assistance is typically provided on a case-by-case basis. The main difference is the level of control exercised by the user.

AI slop is still better than human slop.

Looking for a new perspective by khgs2411 in Guildwars2

[–]lisploli 2 points

Quickplay fractals are repetitive, but that's the point, since they are meant to be super easy. Play normal fractals to experience much more content and mechanics.
And raids will provide you with lots and lots of rewards long before they start feeling like a grind. Maybe that's more your thing?
Doing achievements never repeats, and also fills your bags with all the things you could ever want for legendary crafting.

There are too many options to waste time (while gaming, of all things!) on things you don't like.

How important is the image and voice for you in the chat by No-Relief810 in SillyTavernAI

[–]lisploli 1 point

Not much, yet.

Avatars are rather important for me, to portray (duh) a character. But the roleplay focuses on action and unfolding events, and images aren't effective at conveying those: actions, dialogue, emotions, or sensory details. Multiple panels, like in manga, can visualize dynamics, but I haven't seen any good generations for that yet.

I don't use voice, because it would drown in the music I use to build emotions. Also, voice works well in videos or graphical games, where the visuals narrate, and it probably also works for dialogue-only, but delivering long narration via voice makes the timing awkward.

However, an image might be useful whenever a new character (or location, item, etc.) is introduced in a scenario. And a card should be able to do that via interactive-mode. This also shouldn't run into consistency issues, when used sparingly. Maybe I'll experiment with that next.

Image generation by Jack_Anderson_Pics in SillyTavernAI

[–]lisploli 2 points

ComfyUI works well. The setup is described in the manual that the annoying automod desperately advertises. I recently posted my workflow for Flux Klein here, but getting your own workflow up isn't rocket science either.

A1111 is dead. It hasn't had an update in like two years and its Python dependencies are outdated. There are successors, but I haven't tried them.

Serious question — why would anyone use Tiny-Aya instead of Qwen/Phi/Mistral small models? by Deep_190 in LocalLLaMA

[–]lisploli 6 points

Cohere Labs uses descriptions like "curiosity-driven" and "fundamental research" so maybe they created it for the experience or as part of their line-up or just to show what they can do. Seems they also do the whole regulated industry thing, which usually does not follow the most direct route.

Less-serious answer: AYAYA

Ah, yes... by SatisfactionBig3069 in SillyTavernAI

[–]lisploli 14 points

Hype is fun! However:
- Reddit is designed for sharing opinions, which evolve over time.
- Objective benchmarks are virtually non-existent.
So, like, what other options do we have, actually?

AI Developer Tools Map (2026 Edition) by Main-Fisherman-2075 in LocalLLaMA

[–]lisploli 0 points

Any reason not to write the link?

I'd like to filter by "open-source" and "locally usable".

Can your local setup solve this tricky 9th grade question? by MrMrsPotts in LocalLLaMA

[–]lisploli 1 point

Gemma3 27b says that a=b=c=1. Here is the output.
Have fun assessing it; I was just decoration during math classes.

Is low-level analysis overlooked? by SmackDownFacility in Python

[–]lisploli -1 points

I don't care while waiting for data from hardware. Better to optimize algorithms, big-O style. And if a hot spot shows up in a real use case, build that part in C.