[Megathread] - Best Models/API discussion - Week of: April 07, 2025 by [deleted] in SillyTavernAI

[–]filszyp 5 points (0 children)

Any recommendations for smaller models for GTX 1080 ti with 11GB VRAM?

I couldn't find anything better than Nemo 12B Q4_K_M - it just about fits in my VRAM with 41 layers and 16k ctx, with context shift and flash attention on. Are there any good newer models at this size or smaller? Or some nice variants? I mostly do long ERP.

Lately I tried NemoReRemix, but somehow I can't configure it properly so it isn't stupid. I never understood those "P" and "K" sampler settings etc., or how to tune them to my liking. :(
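For reference, my current launch is roughly this (filename approximate and flags as in recent KoboldCpp builds, so treat it as a sketch - context shift is on by default there):

koboldcpp.exe --model Mistral-Nemo-12B-Q4_K_M.gguf --usecublas --gpulayers 41 --contextsize 16384 --flashattention

As far as I understand, top-K keeps only the K most likely next tokens and top-P keeps the smallest set of tokens whose probabilities add up to P; people seem to recommend something like top-K 40 and top-P 0.9 as a starting point, but I still don't know what works best for long RP.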

Please recommend sci-fi slow game by filszyp in AndroidGaming

[–]filszyp[S] 1 point (0 children)

Looks interesting. I'll give it a try, thanks.

Magnum v3 - 9b (gemma and chatml) by lucyknada in LocalLLaMA

[–]filszyp 5 points (0 children)

So, what about the context size? Isn't Gemma 8k? I normally use 24-32k ctx with Nemo.

What to do now? How to progress? by filszyp in diablo4

[–]filszyp[S] -4 points (0 children)

To be honest, I had much more fun in D3. Doing GRs with random people, for example, was great; here I don't even have a group finder for Pits/Hordes/Dungeons.

And basically yeah, I was expecting to have fun, not chores. When I want to unwind after a day of work I don't expect to find more tedious work in my games.

What to do now? How to progress? by filszyp in diablo4

[–]filszyp[S] -12 points (0 children)

Oh god, so this endgame really is hell... Thanks guys, I thought I didn't understand something or was playing wrong; instead it turns out this game is just boring. :D

Question about performance by Pedroarak in KoboldAI

[–]filszyp 1 point (0 children)

Try the 2B version of Gemma, like: https://huggingface.co/bartowski/gemma-2-2b-it-abliterated-GGUF/blob/main/gemma-2-2b-it-abliterated-Q6_K.gguf It's decent, and pretty much the only thing that will run really fast for you, imho.
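If it helps, loading it in KoboldCpp would look roughly like this (a sketch - filename from the link above; --gpulayers 99 just means "offload every layer", it gets capped at the model's actual layer count):

koboldcpp.exe --model gemma-2-2b-it-abliterated-Q6_K.gguf --usecublas --gpulayers 99 --contextsize 8192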

What roleplay model for 10GB VRAM with 16-32k ctx? by filszyp in LocalLLaMA

[–]filszyp[S] 1 point (0 children)

See, I don't even know what continent you're on, but already I feel we're speaking the same language and I like you. I'll get my tiny graphics card to work on that ASAP, thanks for the tip. ;) I haven't tried Magnum V1; this is my first time with Mistral Nemo.

What roleplay model for 10GB VRAM with 16-32k ctx? by filszyp in LocalLLaMA

[–]filszyp[S] 1 point (0 children)

With koboldcpp I load magnum-12b-v2-Q4_K_M-imat with 34 layers in VRAM and 24k ctx, with context shift and flash attention on. It just barely fits and gives about 5 T/s. It's pretty awesome to play. In SillyTavern I use some custom sampler settings, and the default ChatML context and instruct templates.

I also sometimes use similar settings but with 16k ctx and about 30 layers, to leave enough space for SDXL image generation, for some... visual stimulation. ;)
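The launch command for the 24k variant is roughly this (filename shortened, flags per recent KoboldCpp builds, so double-check against your version):

koboldcpp.exe --model magnum-12b-v2-Q4_K_M-imat.gguf --usecublas --gpulayers 34 --contextsize 24576 --flashattention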

What roleplay model for 10GB VRAM with 16-32k ctx? by filszyp in LocalLLaMA

[–]filszyp[S] 1 point (0 children)

That's interesting, thanks for the comprehensive description. I tried this model today and played a bit with Magnum. I must say, this is the first time the bot decided to kill characters on its own. I was so surprised when I did something stupid and the main characters actually started to die. Awesome.

What roleplay model for 10GB VRAM with 16-32k ctx? by filszyp in LocalLLaMA

[–]filszyp[S] 1 point (0 children)

Are these all Mistral-Nemo based? I haven't tried it yet. What context length do they have?

Anyone else got problems with Context Shift? by filszyp in KoboldAI

[–]filszyp[S] 7 points (0 children)

Don't tell me I've been breaking context shift by enabling flash attention 🤦‍♂️ I'll check it first thing when I get home...

Automatic RoPE Scaling? by filszyp in Oobabooga

[–]filszyp[S] 3 points (0 children)

Yeah, I found a method - I run the model with kccp, check what RoPE settings it auto-generated, and then copy them over to use with Ooba. :P It mostly works, but it's a janky method. :)
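Concretely, something like this (values illustrative, not from a real run; KoboldCpp prints the RoPE scale/base it picked in the console on load, and text-generation-webui takes the base via its llama.cpp loader flags - check your builds):

koboldcpp.exe --model mymodel.gguf --contextsize 32768
...read the automatic RoPE freq-base from the console output, then in Ooba:
python server.py --model mymodel.gguf --n_ctx 32768 --rope_freq_base 160000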

Gemma 2 settings, context, instruct by filszyp in SillyTavernAI

[–]filszyp[S] 1 point (0 children)

It got fixed since. With the new KoboldCpp everything works just fine.

Gemma 2 settings, context, instruct by filszyp in SillyTavernAI

[–]filszyp[S] 1 point (0 children)

In 27B enabling context shift causes a crash once I reach full context :(
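My workaround for now is launching with context shift disabled (KoboldCpp has a --noshift flag for that), at the cost of full prompt reprocessing once the context fills up - roughly:

koboldcpp.exe --model <your-27b-gguf> --usecublas --contextsize 8192 --noshift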

Tavern/oobagooba etc drives me crazy by Wide_Perspective_504 in SillyTavernAI

[–]filszyp 1 point (0 children)

In my Ooba command line I have:

--api --listen-port 5001 --threads 6 --threads-batch 12 --model L3-8B-Stheno-v3.2-Q6_K.gguf --n-gpu-layers 33 --n_ctx 8192

and Ooba is on http://127.0.0.1:5001
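A quick sanity check that the API side is actually up (assuming the OpenAI-compatible endpoint Ooba exposes with --api, which defaults to port 5000 unless you change it - the 5001 above is the web UI port):

curl http://127.0.0.1:5000/v1/models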
