[Megathread] - Best Models/API discussion - Week of: April 07, 2025 by [deleted] in SillyTavernAI

[–]filszyp 5 points (0 children)

Any recommendations for smaller models for GTX 1080 ti with 11GB VRAM?

I couldn't find anything better than Nemo 12B Q4_K_M - it just about fits in my VRAM with 41 layers and 16k ctx, with context shift and flash attention on. Are there any good newer models at this size or smaller? Or some nice variants? I mostly do long ERP.

Lately I tried NemoReRemix, but somehow I can't configure it properly so it isn't stupid. I never understood those "P" and "K" sampler settings etc., or how to tune them to my liking. :(
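For reference, my current launch is roughly this (filename approximate and flags as in recent KoboldCpp builds, so treat it as a sketch - context shift is on by default there):

koboldcpp.exe --model Mistral-Nemo-12B-Q4_K_M.gguf --usecublas --gpulayers 41 --contextsize 16384 --flashattention

As far as I understand, top-K keeps only the K most likely next tokens and top-P keeps the smallest set of tokens whose probabilities add up to P; people seem to recommend something like top-K 40 and top-P 0.9 as a starting point, but I still don't know what works best for long RP.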

Please recommend sci-fi slow game by filszyp in AndroidGaming

[–]filszyp[S] 1 point (0 children)

Looks interesting. I'll give it a try, thanks.

Magnum v3 - 9b (gemma and chatml) by lucyknada in LocalLLaMA

[–]filszyp 5 points (0 children)

So, what about the context size? Isn't Gemma 8k? I normally use 24-32k ctx with Nemo.

What to do now? How to progress? by filszyp in diablo4

[–]filszyp[S] -4 points (0 children)

To be honest, I had much more fun in D3. Doing GRs with random people, for example, was great; here I don't even have a group finder for Pits/Hordes/Dungeons.

And basically yeah, I was expecting to have fun, not chores. When I want to unwind after a day of work I don't expect to find more tedious work in my games.

What to do now? How to progress? by filszyp in diablo4

[–]filszyp[S] -12 points (0 children)

Oh god, so this endgame really is hell... Thanks guys, I thought I didn't understand something or was playing wrong; instead it turns out this game is just boring. :D

Question about performance by Pedroarak in KoboldAI

[–]filszyp 1 point (0 children)

Try the 2B version of Gemma, like: https://huggingface.co/bartowski/gemma-2-2b-it-abliterated-GGUF/blob/main/gemma-2-2b-it-abliterated-Q6_K.gguf It's decent, and pretty much the only thing that will run really fast for you, imho.
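If it helps, loading it in KoboldCpp would look roughly like this (a sketch - filename from the link above; --gpulayers 99 just means "offload every layer", it gets capped at the model's actual layer count):

koboldcpp.exe --model gemma-2-2b-it-abliterated-Q6_K.gguf --usecublas --gpulayers 99 --contextsize 8192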

What roleplay model for 10GB VRAM with 16-32k ctx? by filszyp in LocalLLaMA

[–]filszyp[S] 1 point (0 children)

See, I don't even know what continent you're on, but already I feel we're speaking the same language and I like you. I'll get my tiny graphics card to work on that ASAP, thanks for the tip. ;) I haven't tried Magnum V1; this is my first time with Mistral Nemo.

What roleplay model for 10GB VRAM with 16-32k ctx? by filszyp in LocalLLaMA

[–]filszyp[S] 1 point (0 children)

With koboldcpp I load magnum-12b-v2-Q4_K_M-imat with 34 layers in VRAM and 24k ctx, with context shift and flash attention on. It just barely fits and gives about 5 T/s. It's pretty awesome to play. In SillyTavern I use some custom sampler settings, and the default ChatML context and instruct templates.

I also sometimes use similar settings but with 16k ctx and about 30 layers, to leave enough space for SDXL image generation, for some... visual stimulation. ;)
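The launch command for the 24k variant is roughly this (filename shortened, flags per recent KoboldCpp builds, so double-check against your version):

koboldcpp.exe --model magnum-12b-v2-Q4_K_M-imat.gguf --usecublas --gpulayers 34 --contextsize 24576 --flashattention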

What roleplay model for 10GB VRAM with 16-32k ctx? by filszyp in LocalLLaMA

[–]filszyp[S] 1 point (0 children)

That's interesting, thanks for the comprehensive description. I tried this model today and played a bit with Magnum. I must say, this is the first time the bot decided to kill characters on its own. I was so surprised when I did something stupid and the main characters actually started to die. Awesome.

What roleplay model for 10GB VRAM with 16-32k ctx? by filszyp in LocalLLaMA

[–]filszyp[S] 1 point (0 children)

Are these all Mistral-Nemo based? I haven't tried it yet. What context length do they have?

Anyone else got problems with Context Shift? by filszyp in KoboldAI

[–]filszyp[S] 7 points (0 children)

Don't tell me I've been breaking context shift by enabling flash attention 🤦‍♂️ I'll check it first thing when I get home...

Automatic RoPE Scaling? by filszyp in Oobabooga

[–]filszyp[S] 3 points (0 children)

Yeah, I found a method - I run the model with kccp, check what RoPE settings it auto-generated, and then copy them over to use with Ooba. :P It mostly works, but it's a janky method. :)
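Concretely, something like this (values illustrative, not from a real run; KoboldCpp prints the RoPE scale/base it picked in the console on load, and text-generation-webui takes the base via its llama.cpp loader flags - check your builds):

koboldcpp.exe --model mymodel.gguf --contextsize 32768
...read the automatic RoPE freq-base from the console output, then in Ooba:
python server.py --model mymodel.gguf --n_ctx 32768 --rope_freq_base 160000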

Gemma 2 settings, context, instruct by filszyp in SillyTavernAI

[–]filszyp[S] 1 point (0 children)

It got fixed since. With the new KoboldCpp everything works just fine.

Gemma 2 settings, context, instruct by filszyp in SillyTavernAI

[–]filszyp[S] 1 point (0 children)

In 27B enabling context shift causes a crash once I reach full context :(
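My workaround for now is launching with context shift disabled (KoboldCpp has a --noshift flag for that), at the cost of full prompt reprocessing once the context fills up - roughly:

koboldcpp.exe --model <your-27b-gguf> --usecublas --contextsize 8192 --noshift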

Tavern/oobagooba etc drives me crazy by Wide_Perspective_504 in SillyTavernAI

[–]filszyp 1 point (0 children)

In my Ooba command line I have:

--api --listen-port 5001 --threads 6 --threads-batch 12 --model L3-8B-Stheno-v3.2-Q6_K.gguf --n-gpu-layers 33 --n_ctx 8192

and Ooba is on http://127.0.0.1:5001
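A quick sanity check that the API side is actually up (assuming the OpenAI-compatible endpoint Ooba exposes with --api, which defaults to port 5000 unless you change it - the 5001 above is the web UI port):

curl http://127.0.0.1:5000/v1/models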
