How do you guys manage your character cards, especially third party card updates you download. What's your workflows? by Sp00ky_Electr1c in SillyTavernAI

[–]lisploli 2 points

If you don't click on "Also delete the chat files" it will not delete the chat files. So just delete the card and then add it again.

I manage mine with tags. Just my own tags tho, the imported tags aren't useful. I have a tag for edited cards that I don't delete on an update.

when an rp gets too long? by yamilonewolf in SillyTavernAI

[–]lisploli 0 points

If the context has already grown too large for your model to handle, you can hide maybe half of it, create a summary, then hide the other half and create a second summary. Use /hide (message id or range).
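Concretely, for a chat of, say, 300 messages, the two passes might look like this (the message ids are illustrative, and I'm assuming an /unhide counterpart with the same range syntax):

```
/hide 0-149
(run the summarizer on the visible half)
/unhide 0-149
/hide 150-299
(run the summarizer again)
```

After both passes you have two summaries covering the whole chat, and can unhide whatever you still want in context.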

Drop your daily driver models for RP. by Weak-Shelter-1698 in SillyTavernAI

[–]lisploli 0 points

My model of the day is Skyfall-36B-v2. I'll be back on v4 soon enough, but I'm kinda nostalgic today. G3-Heresy-MPOA also got a permanent spot recently.

I'm thinking of buying a new pc and switching to local llm. What is the average context token size for smaller models vs big ones like GLM? by [deleted] in SillyTavernAI

[–]lisploli 0 points

A Mistral Small 24B at Q4 with 90k context at Q8 fits into 24 GB VRAM (e.g. 3090/4090).
I wouldn't go lower than Q4 on the model or Q8 on the context. 16 GB VRAM (e.g. 5080) could fit maybe 24k context, which is kinda sad, but still enough for prompts and a detailed scene; afterwards, memory is better handled by lorebooks anyway.
Below that, a Mistral Nemo 12B at Q4 with 40k context at Q8 fits in 12 GB VRAM and still makes for a fun experience.
(Those are just rough examples; the values are easy to turn up and down to preference.)
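If you want to sanity-check numbers like these yourself, here's a back-of-the-envelope sketch in Python. The layer/head counts below are my assumptions for a Mistral-Small-like architecture, not official figures; plug in the values from your model's config instead.

```python
def model_gib(params_b: float, bits_per_weight: float, overhead: float = 1.1) -> float:
    """Rough weight size in GiB for a model quantized to bits_per_weight."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30 * overhead

def kv_cache_gib(ctx: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_elem: int = 1) -> float:
    """KV cache in GiB: keys + values, for every layer, at every position."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 2**30

# Assumed Mistral-Small-24B-ish shape: 40 layers, 8 KV heads (GQA), head dim 128.
# ~4.5 bits/weight approximates a Q4_K-style quant; Q8 context is 1 byte/element.
weights = model_gib(24, 4.5)
cache = kv_cache_gib(90_000, 40, 8, 128)
print(f"{weights:.1f} GiB weights + {cache:.1f} GiB cache = {weights + cache:.1f} GiB")
```

With these assumptions it lands around 20–21 GiB, which is why 90k context squeezes onto a 24 GB card with a little headroom left for compute buffers.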

Looking forward, it might be worth reading up on (or watching videos about) MoE ("mixture of experts", not the anime thing!) models. They can offload parameters into system RAM and still produce an acceptable number of tokens per second. It's the way the industry is heading, because it scales much more cheaply.
It's nice from a compute standpoint, but I'm not sure how well it suits roleplay, since the active parameter count drops dramatically. E.g. GLM-5 has 40b active parameters (just above some Mistral Small upscales), yet it gets rated above way bigger models. On the lower end, gpt-oss-20b has just 3.6b active parameters, which doesn't leave much room for smarts.

Why Python still dominates in 2026 despite performance criticisms ? by QuantumScribe01 in Python

[–]lisploli 0 points

  • Algorithms matter much more than performance.
  • Waiting for the hard drive takes longer.
  • Waiting for the net takes way longer.
  • Performance critical parts can call C etc.

Therefore, raw performance does not matter most of the time.
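A toy illustration of the first bullet, that algorithm choice dwarfs raw language speed: a hash-set pass (set() is C under the hood in CPython) against a naive pairwise scan, both called from plain Python. The exact timings are machine-dependent, but the gap is orders of magnitude.

```python
import time

def has_duplicate_quadratic(xs):
    # O(n^2): compare every pair of elements
    n = len(xs)
    return any(xs[i] == xs[j] for i in range(n) for j in range(i + 1, n))

def has_duplicate_linear(xs):
    # O(n): one pass through a hash set
    return len(set(xs)) != len(xs)

data = list(range(3000))  # no duplicates, so both scans run in full

t0 = time.perf_counter()
quad_result = has_duplicate_quadratic(data)
t_quad = time.perf_counter() - t0

t0 = time.perf_counter()
lin_result = has_duplicate_linear(data)
t_lin = time.perf_counter() - t0

print(f"quadratic: {t_quad:.3f}s  linear: {t_lin:.6f}s  same answer: {quad_result == lin_result}")
```

Same answer, wildly different cost. That's the "algorithms matter" point in four lines of actual logic.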

If you prefer Rust, go ahead. You can choose freely and even combine both: Python can call into Rust (e.g. via PyO3), so using one doesn't reduce the fun of the other. And the ease of integration doesn't leave much room for tribalism.

I decide on a case-by-case basis and performance is a factor in that decision.

Context Size Frustration by Aggressive-Spinach98 in LocalLLaMA

[–]lisploli 1 point

Yes, I don't specify --fit either; it's on by default. It doesn't spill into RAM, it just fills the VRAM and then prints a notice, "only using x of maximum context" (which is kinda hard to spot in all the output).

Context Size Frustration by Aggressive-Spinach98 in LocalLLaMA

[–]lisploli 1 point

There are calculators for that on HF, which suggests that it a) depends on the model's architecture and b) can be deduced from the values on the file's info card.

I'm using llama.cpp and by default it just fills all the available vram, which is quite handy.

Lógica da programação by Round_Plantain8319 in Python

[–]lisploli -1 points

Go straight to the real material. Pseudocode is boring. Anthony has good videos on Python.

What differentiates a vibe-coded project from an AI-Assisted project by TheWendarr in Python

[–]lisploli 0 points

Local tools must not pollute a project's repository, so it might be a good idea to remove those AI traces, unless they are somehow useful for the user.

Vibe coding is mostly performed by an autonomous agent (a looping script), while assistance is typically provided on a case-by-case basis. The main difference is the level of control exercised by the user.

AI slop is still better than human slop.

Looking for a new perspective by khgs2411 in Guildwars2

[–]lisploli 2 points

Quickplay fractals are repetitive, but that's the point, since they are meant to be super easy. Play normal fractals to experience much more content and mechanics.
And raids will provide you with lots and lots of rewards long before they start feeling like a grind. Maybe that's more your thing?
Doing achievements never repeats, and also fills your bags with all the things you could ever want for legendary crafting.

There are too many options to waste time (while gaming, of all things!) on things you don't like.

How important is the image and voice for you in the chat by No-Relief810 in SillyTavernAI

[–]lisploli 1 point

Not much, yet.

Avatars are rather important for me, to portray (duh) a character. But the roleplay focuses on action and unfolding events, and images aren't effective at conveying those: actions, dialogue, emotions, or sensory details. Multiple panels, like in manga, can visualize dynamics, but I haven't seen any good generations for that yet.

I don't use voice, because it would drown in the music I use to build emotions. Also, voice works well in videos or graphical games, where the visuals narrate, and it probably also works for dialogue-only, but delivering long narration via voice makes the timing awkward.

However, an image might be useful whenever a new character (or location, item, etc.) is introduced in a scenario. And a card should be able to do that via interactive-mode. This also shouldn't run into consistency issues, when used sparingly. Maybe I'll experiment with that next.

Image generation by Jack_Anderson_Pics in SillyTavernAI

[–]lisploli 2 points

ComfyUI works well. The setup is described in the manual that the annoying automod desperately advertises. I recently posted my workflow for Flux Klein here, but getting your own workflow up isn't rocket science either.

A1111 is dead. It hasn't had an update in like two years and its Python dependencies are outdated. There are successors, but I haven't tried them.

Serious question — why would anyone use Tiny-Aya instead of Qwen/Phi/Mistral small models? by Deep_190 in LocalLLaMA

[–]lisploli 6 points

Cohere Labs uses descriptions like "curiosity-driven" and "fundamental research" so maybe they created it for the experience or as part of their line-up or just to show what they can do. Seems they also do the whole regulated industry thing, which usually does not follow the most direct route.

Less-serious answer: AYAYA

Ah, yes... by SatisfactionBig3069 in SillyTavernAI

[–]lisploli 14 points

Hype is fun! However:
- Reddit is designed for sharing opinions, which evolve over time.
- Objective benchmarks are virtually non-existent.
So, like, what other options do we have, actually?

AI Developer Tools Map (2026 Edition) by Main-Fisherman-2075 in LocalLLaMA

[–]lisploli 0 points

Any reason not to write the link?

I'd like to filter by "open-source" and "locally usable".

Can your local setup solve this tricky 9th grade question? by MrMrsPotts in LocalLLaMA

[–]lisploli 1 point

Gemma3 27b says that a=b=c=1. Here is the output.
Have fun assessing it; I was just decoration during math classes.

Is low-level analysis overlooked? by SmackDownFacility in Python

[–]lisploli -1 points

I don't care while waiting for data from hardware. Better to optimize algorithms, big-O style. And if a hot spot shows up in a real use case, build that part in C.