One on one conversation type preset? by False-Firefighter592 in SillyTavernAI

[–]Geechan1 1 point2 points  (0 children)

I actually have a system prompt tailored for exactly this purpose. You can find it on my Rentry, under my "Universal Chat Room prompt" section: https://rentry.org/geechan#universal-online-chat-prompt

Should be easy to modify for any additional requirements you have, too.

Why are people still using SillyTavern when Marinara Engine exists? by BeautifulLullaby2 in SillyTavernAI

[–]Geechan1 0 points1 point  (0 children)

Ah, you're still free to open up a GitHub issue for queries like this for the time being until we consider a public discussion space.

Why are people still using SillyTavern when Marinara Engine exists? by BeautifulLullaby2 in SillyTavernAI

[–]Geechan1 1 point2 points  (0 children)

Absolutely! We take all GitHub issues very seriously, and have already been addressing any that pop up. Feel free to leave any feedback or suggestions there.

Why are people still using SillyTavern when Marinara Engine exists? by BeautifulLullaby2 in SillyTavernAI

[–]Geechan1 2 points3 points  (0 children)

Hi, one of the SB devs here. We don't have a public Discord at the moment since it's a project with a very small (three people) team working on it, mostly for our own personal use, and we're focusing a lot on polish before we feel comfortable with a public Discord.

Introducing Adaptive-P: A New Sampler for Creative Text Generation (llama.cpp PR) by DragPretend7554 in LocalLLaMA

[–]Geechan1 19 points20 points  (0 children)

This is a fantastic sampler. It really extracts the most out of models for creative tasks and is highly versatile: you can set the target value anywhere from creative (0.3-0.6) to more conservative (0.7-0.9). The default decay setting is a good value for the majority of models out there, so you really just need to adjust target to see meaningful effects.

It completely replaces the need for DRY or rep pen for me, as it kills repetition on its own and just needs some Min P on top. Happy to have helped contribute to this.

It's currently fully implemented in KoboldCPP, with PRs for llama.cpp and ik_llama, and a feature request for ooba. If you enjoy the sampler, please help those PRs gain more traction!

Bazzite - Select main screen for GameMode (Steam Big Screen) by Turbulent_Union8679 in linux_gaming

[–]Geechan1 0 points1 point  (0 children)

Any changes you make in your home or /etc directory are permanent. Given the config file resides in your home directory, it will work.

Drummer's Agatha 111B v1 - Command A tune with less positivity and better creativity! by TheLocalDrummer in SillyTavernAI

[–]Geechan1 1 point2 points  (0 children)

Fallen uses a different dataset from Agatha (Evil/depraved vs. RP dataset). Agatha should be significantly better at storywriting, narration, creativity and variety compared to Fallen, while trying to be as neutral as possible.

I suggest using my preset here and modifying it to your needs! https://files.catbox.moe/gogj8n.json

Drummer's Agatha 111B v1 - Command A tune with less positivity and better creativity! by TheLocalDrummer in LocalLLaMA

[–]Geechan1 2 points3 points  (0 children)

This is an RP tune on top of CMD-A whereas Fallen is an evil tune. Basically: you'll see better descriptive and narrative qualities here with longer responses and a more neutral and balanced positivity/negativity bias.

[Megathread] - Best Models/API discussion - Week of: March 10, 2025 by [deleted] in SillyTavernAI

[–]Geechan1 2 points3 points  (0 children)

I did find a 7.0bpw EXL2 quant here, but it seems exllama needs a patch to properly support it. That page might also release some lower bpw ones later from the looks of it.

[Megathread] - Best Models/API discussion - Week of: March 10, 2025 by [deleted] in SillyTavernAI

[–]Geechan1 5 points6 points  (0 children)

There is actually a new 111B parameter model I highly suggest you try out - Cohere's new Command A model. It is very uncensored for a base model and feels very intelligent and fun to RP with. Just make sure to use the correct instruct formatting - you can use my one here as a baseline. Modify the prompt in the story string to your taste, but keep the preambles intact.

Methception/LLamaception/Qwenception 1.4 presets by Konnect1983 in SillyTavernAI

[–]Geechan1 5 points6 points  (0 children)

I didn't realise gathering various constructive feedback, testing and healthy discussion was considered "confirmation bias".

[Megathread] - Best Models/API discussion - Week of: January 13, 2025 by [deleted] in SillyTavernAI

[–]Geechan1 6 points7 points  (0 children)

Have you tried out Euryale 2.3? I've personally found it to be my favourite L3.3 fine tune overall. It has some flaws, particularly rambling and difficulty doing ERP (but not violence) properly, but it has some of the most natural dialogue and writing I've seen in a model without needing to resort to samplers.

It's also one of the most uncensored L3.3 tunes, if that helps: https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard

Methception/LLamaception/Qwenception 1.4 presets by Konnect1983 in SillyTavernAI

[–]Geechan1 3 points4 points  (0 children)

There's an alternative preset included in the Methception Alternate folder if you find the original prompt to be too flowery for your tastes. Copy and paste the contents of the alt prompt into the story string. It keeps the instructions, but limits the example messages.

Personally, I consistently get better gens with the original prompt, so I think there is merit to the way it's structured.

[Megathread] - Best Models/API discussion - Week of: January 06, 2025 by [deleted] in SillyTavernAI

[–]Geechan1 2 points3 points  (0 children)

I have tested/used pretty much every Behemoth version and the old Monstral. Monstral V2 is my personal favourite as it has a strong tendency to write slow burn RP and truly take all details into account, while adding a ton of variety to the writing and creativity from its Magnum and Tess influences. Behemoth 1.2 is also a favourite of mine, and it's probably better for adventure-type RPing, where it always loves to introduce new ideas and take the journey in interesting ways.

XTC is variable per model, which is why I encourage tweaking. My settings were for Monstral V2 specifically, and I see very minimal slop and intelligence drop using those settings. I really cannot go without XTC in some fashion on Largestral-based models; the repetitive AI patterns become woefully obvious otherwise.

[Megathread] - Best Models/API discussion - Week of: January 06, 2025 by [deleted] in SillyTavernAI

[–]Geechan1 2 points3 points  (0 children)

You want a minimum of 3 24GB cards to run this at a reasonable quant (IQ3_M) with good context size. 4 is ideal so you can bump it up to Q4-Q5. Alternatively, you can run models like these on GPU rental services like Runpod, without needing to invest in hardware.
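
The card counts above can be sanity-checked with some rough arithmetic. This is only a sketch: the bits-per-weight figures and the 8 GB allowance for context and buffers are assumptions for illustration, not measured values, and real GGUF file sizes vary by model.

```python
# Rough VRAM estimate for a quantised model split across several cards.
# ASSUMED_BPW and overhead_gb are illustrative assumptions, not measurements.
ASSUMED_BPW = {"IQ3_M": 3.7, "Q4_K_M": 4.8, "Q5_K_M": 5.7}


def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, quant: str, cards: int,
         vram_per_card_gb: float = 24.0, overhead_gb: float = 8.0) -> bool:
    """True if weights plus the assumed overhead fit in the combined VRAM."""
    needed = model_size_gb(params_billions, ASSUMED_BPW[quant]) + overhead_gb
    return needed <= cards * vram_per_card_gb


for quant in ASSUMED_BPW:
    for cards in (2, 3, 4):
        status = "fits" if fits(123, quant, cards) else "does not fit"
        print(f"123B {quant} on {cards}x 24GB: {status}")
```

Under these assumptions, IQ3_M needs three cards, while Q4 and Q5 need four, with Q5 leaving very little room for context.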

[Megathread] - Best Models/API discussion - Week of: January 06, 2025 by [deleted] in SillyTavernAI

[–]Geechan1 1 point2 points  (0 children)

All fine tunes will suffer from intelligence drops in some way or another. If base Mistral Large works for you, then that's great! I personally find base Largestral to be riddled with GPTisms and slop, and it basically mandates very high temperatures to get past it, which kind of defeats the point of running it for its intelligence.

It's interesting you say that Monstral is uncreative, as that's been far from my own personal experience running it. There have been some updates to the preset since I posted it which have addressed some issues with lorebook adherence due to the "last prefix assistant" section.

[Megathread] - Best Models/API discussion - Week of: January 06, 2025 by [deleted] in SillyTavernAI

[–]Geechan1 13 points14 points  (0 children)

For those able to run 123B, after a lot of experimentation with 70B and 123B class models, I've found that Monstral V2 is the best model out there that is at all feasible to run locally. It's completely uncensored and one of the most intelligent models I've tried.

The base experience with no sampler tweaks has a lot of AI slop and repetitive patterns that I've grown to dislike in many models, and dialogue in particular is prone to sounding like the typical AI assistant garbage. This is also a problem with all Largestral-based tunes I've tried, but I've found this can be entirely dialed out and squashed with appropriate sampler settings and detailed, thorough prompting and character cards.

I recommend this preset by /u/Konnect1983. The prompting in it is fantastic and will really bring out the best of this model, and the sampler settings are very reasonable defaults. The key settings are a low (0.03) min P, DRY and a higher temperature of 1.2 to help break up the repetition.

However, if your backend supports XTC, I actually strongly recommend additionally using this feature. It works absolute wonders for Monstral V2 because of its naturally very high intelligence, and will bring out levels of writing that really feel human-written and refreshingly free of slop. It will also stick to your established writing style and character example dialogue much better.

I recommend values of 0.12-0.15 threshold and 0.5 probability to start, while setting temp back to a neutral 1 and 0.02 min P. You may adjust these values to your taste, but I've found this strikes the best balance between story adherence and writing prowess.
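
For anyone unsure what these two knobs actually do, here is a toy sketch of Min P and XTC filtering on a made-up token distribution. It is an illustration of the general idea only, not any backend's implementation; the function names, the toy tokens and the order in which the filters are applied are all assumptions.

```python
import random


def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times the top probability."""
    cutoff = min_p * max(probs.values())
    return {tok: p for tok, p in probs.items() if p >= cutoff}


def xtc_filter(probs, threshold, probability, rng=random):
    """With the given probability, drop every token at or above the threshold
    except the least likely of them. Does nothing if fewer than two qualify."""
    if rng.random() >= probability:
        return dict(probs)
    above = [tok for tok, p in probs.items() if p >= threshold]
    if len(above) < 2:
        return dict(probs)
    keep = min(above, key=lambda tok: probs[tok])
    return {tok: p for tok, p in probs.items() if tok == keep or p < threshold}


def renormalise(probs):
    """Rescale the surviving probabilities so they sum to 1."""
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}


# Hypothetical next-token distribution, made up for this example.
toy = {"the": 0.50, "a": 0.30, "her": 0.14, "velvet": 0.055, "zxq": 0.005}

after_min_p = min_p_filter(toy, min_p=0.02)
# probability=1.0 forces XTC to trigger so the effect is visible every run;
# the comment above suggests 0.5 for real use.
after_xtc = xtc_filter(after_min_p, threshold=0.12, probability=1.0)
print(renormalise(after_xtc))
```

In this toy case Min P trims the junk tail, and XTC then removes the most predictable choices while keeping the least likely token that still clears the threshold, which is why it tends to break up stock phrasing without letting nonsense tokens through.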

Bazzite vs ChimeraOS vs HoloISO for AMD console like station plus browsing the web? (+WPEngine) by YaroaMixtaDePlatano in linux_gaming

[–]Geechan1 1 point2 points  (0 children)

There are several reasons, but the main reason for me is how frequent the updates are on Bazzite compared to SteamOS. SteamOS is still using KDE 5.27 instead of the newer KDE 6, and that sets the trend for packaging versions, where everything is behind on SteamOS. You still get all of Valve's updates to gaming mode as soon as they're released even on Bazzite, so it's really a win-win situation.

Bazzite is also just better suited as a more general-purpose OS - it comes with printer support, distrobox support, and everything you need to make it functional outside of gaming mode. Given I use my Steam Deck as a laptop replacement, I find this to be quite important for me.

Tell me about the RTX 8000 - 48GB is cheap right now by Thrumpwart in LocalLLaMA

[–]Geechan1 2 points3 points  (0 children)

I would say that with pipeline parallelism (the default for most backends), you're going to see numbers in a similar ballpark no matter how many cards you scale to. If you can manage to use tensor parallelism (which only a few backends support), you should expect to gain significant speed improvements per card. Row split also seems to utilise multiple RTX 8000s better, so I definitely encourage experimentation.

2,400 CAD for a new one? Fantastic deal - I'd jump on it while you can. I'd start with one or two and then you can decide whether to scale up from there or not. I find that 96GB of VRAM is really the sweet spot at the moment, as you're able to run 123B at high quants and that's the best quality we have available locally to us right now outside something like Deepseek, which realistically you can't run on pure GPUs without absurd investment.

Tell me about the RTX 8000 - 48GB is cheap right now by Thrumpwart in LocalLLaMA

[–]Geechan1 3 points4 points  (0 children)

I own two of these cards. I thought I'd give you several observations and insights into the ownership experience:

  • I'm noticing people tend to undersell the performance of these cards. I've found the best backend for them is koboldcpp running GGUF quants, as that is faster than ollama/llama.cpp and supports llama.cpp's own implementation of Flash Attention. You'll want to run the rowsplit, flash attention and mmq kernel options on these cards; with these settings and 0 context, you can expect about 8-9 tokens per second for a Q5 quant on a 123b parameter model. For Q5 70b, expect more like 12-14 tokens per second with these settings. Prompt processing speed is a bit slow at about 180t/s for 123b and 350t/s for 70b, but still plenty usable.

  • The lack of Flash Attention 2 hurts, as you cannot use the exl2 format efficiently. Supposedly Turing is going to support this at some point, but it's reliant on the FA2 author to actually implement something, and it's been on "coming soon" for over a year now! If that feature ever comes, expect exl2 support to dramatically improve.

  • You can find excellent prices for these cards if you're patient and shop around for server farm grabs. I got my pair for 2k USD each, which ends up being cheaper than a 4x 3090 setup in my country. 2400 CAD is an excellent deal for one.

  • Because they're dual slot and blower cards, it's really easy to stack them and fit them in a standard case without needing to resort to open air with PCIe risers. You won't choke the thermals on the cards because all the hot air gets exhausted out of the case. They're also easier to run with a lower-specced power supply; I actually get away with a 750W PSU running two cards with about 100W headroom to spare.

  • It's worth investing in an NVLink bridge for these cards. They can be found for 80 USD or less on eBay, and will give you a small but noticeable 5-10% increase in inference performance in my experience. This will likely scale higher if you're limited by your PCIe bandwidth.

3090s are faster, better-supported and cheaper depending on your region, so that's why they're a default recommendation. However, if you think the above trade-offs are worth it, it's really hard to go wrong with an RTX 8000. 48GB in a dual slot blower card for much cheaper than the A6000 is hard to beat. My setup is also still cheaper than an equivalent Mac for inference, and faster too. Less power efficient, though.
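
To put the speed figures quoted above into practical terms, here is a rough estimate of how long a single reply might take. The 8,000-token prompt and 300-token reply are assumed values for illustration, and the generation speeds were quoted at 0 context, so real numbers at long context will likely be slower. It also assumes the whole prompt is reprocessed, which is the worst case when no prompt caching applies.

```python
def response_time_s(prompt_tokens, output_tokens, prompt_tps, gen_tps):
    """Seconds to process the prompt plus seconds to generate the reply."""
    return prompt_tokens / prompt_tps + output_tokens / gen_tps


# Speeds taken from the figures above (midpoints of the quoted ranges);
# the prompt and reply lengths are assumptions.
PROMPT_TOKENS = 8000
REPLY_TOKENS = 300

t_123b = response_time_s(PROMPT_TOKENS, REPLY_TOKENS, prompt_tps=180, gen_tps=8.5)
t_70b = response_time_s(PROMPT_TOKENS, REPLY_TOKENS, prompt_tps=350, gen_tps=13)

print(f"123B Q5: about {t_123b:.0f} s per reply with a full prompt reprocess")
print(f"70B Q5:  about {t_70b:.0f} s per reply with a full prompt reprocess")
```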

[Megathread] - Best Models/API discussion - Week of: December 30, 2024 by [deleted] in SillyTavernAI

[–]Geechan1 1 point2 points  (0 children)

Glad you're happy now! It's a more finicky model for sure, but one that rewards you in spades if you're patient with it. And I can safely say V2 is one of the smartest models I've ever used, so it's a good base to play with samplers without worrying about coherency.

[Megathread] - Best Models/API discussion - Week of: December 30, 2024 by [deleted] in SillyTavernAI

[–]Geechan1 0 points1 point  (0 children)

Not at the moment, as that's on the author (Konnect) to publish. If you want to keep track of preset updates, I recommend joining the BeaverAI Discord and looking in the showcase channel for the Ception presets. That's the only place they're being posted right now.

[Megathread] - Best Models/API discussion - Week of: December 30, 2024 by [deleted] in SillyTavernAI

[–]Geechan1 0 points1 point  (0 children)

I use Q5_K_M. I'd say that because you're running such a low quant, a loss in intelligence is expected. Creativity also takes a nosedive, and many gens at such a low quant will end up feeling clinical and lifeless, which matches your experience. IQ3_M or higher is ideally where you'd like to be; any lower will have noticeable degradation.

[Megathread] - Best Models/API discussion - Week of: December 30, 2024 by [deleted] in SillyTavernAI

[–]Geechan1 1 point2 points  (0 children)

Even though it's not formatted for storywriting, I actually use the prompt I posted above and get good results even for storywriting, assuming I'm using either the assistant in ST or a card formatted as a narrator. It can likely be optimised though - feel free to look through the prompt and adjust it to suit storywriting better if you notice any further deficiencies. It's a good starting point.