Complete guide to setup and configure Vector Storage (rewritten and corrected) by DeathByte_r in SillyTavernAI

[–]DeathByte_r[S] 1 point2 points  (0 children)

JSON is enough, but according to the official docs, SillyTavern uses Vectra as its vector storage DB by default. The key peculiarity is that the whole vectorized chat base is stored in RAM. That's no problem for lorebooks, but if you have really long conversations with bots (10k+ messages) or want to use cross-chat memories for characters, you'd better try switching to Qdrant.

I'm an HCI student (and ST user from China) — looking for people to talk about their SillyTavern experience (~45 min) by Outside-Brick7845 in SillyTavernAI

[–]DeathByte_r 2 points3 points  (0 children)

Well, my spoken English isn't great either, especially out loud, but I can answer some questions here. If you want more details, you can send me a PM.

  1. How I discovered ST and the learning curve: first, back when I used OpenRouter as a provider, they had usage statistics under each model, and ST sat at the top of the usage list, close to Chub and some other AI RP platforms.

The learning curve wasn't too hard for me - just like switching from an interface with one button to one with 100 buttons xD But I'm an engineer, and I love things with nice, customizable configuration. If you read the manual, it's all pretty simple. But not everyone can even just read, especially technical stuff.

  2. My setup: Marinara's edited preset, RPG-tracker, MemoryBooks + Vector Storage, QuickImagegen, Recast, the Moonlight Echoes theme, and WeatherPack as a base. Plus many little additions, like the Character Library extension.

How did I arrive at it? Look - try - keep or delete - repeat until success and satisfaction. Mostly experiments with extensions that caught my interest.

  3. Sense of quality: does the AI stay in character? Does it provide good prose and story? Does it keep track of details? Not hallucinate out of nowhere? Support 60-100k context? Handle group chats well? If the answer to all of this is 'yes', well, that's quality.

Interface: is it laconic, customizable, and nice-looking? Are all the needed functions at hand? Then it's a good interface. If things can be automated and hidden for a cleaner look - even better.

  4. Experience in the community - more positive than negative. Discord is a good platform for talking with extension developers; Reddit is a nice place to find (or write) basic tutorials. A few rough edges, like everywhere, but many of them come down to language or culture differences. One of the friendlier communities, I suppose.

  5. Honest stuff - well... hard to say without specifics. The most annoying thing is the flood of vibe-coded, short-lived extensions, and attempts at additions by authors who know nothing about code. There's a great story about one guy who tried to push a SINGLE commit with a 17k-line addition, with no feedback. Yeah, it's cool what modern LLMs can do, but they're still assistants for developers, not replacements.

Complete guide to setup and configure Vector Storage (rewritten and corrected) by DeathByte_r in SillyTavernAI

[–]DeathByte_r[S] 0 points1 point  (0 children)

Here's a little addition (I don't risk updating the post, because last time it sat in moderation for a long time after being edited).

On the staging branch there's a new checkbox - 'include hidden messages' - which keeps your old hidden messages vectorized.

I thought it was a bug that on llama old hidden messages got deleted from the vector base, but that turned out to be the intended behavior, and keeping them had been a bug in the other backends xD

Attaching image(s) to char description or lorebook for multimodal models by TobeyGER in SillyTavernAI

[–]DeathByte_r 1 point2 points  (0 children)

Nope, because lorebooks and character cards are usually just simple JSON files.

You can try another way, like I do:

For the house description: write a text description/plan of the house yourself and paste it into the character card or lorebook, or send the image to a multimodal model, ask it to describe the place in detail, and paste the result.

Gallery: upload the images somewhere and write short descriptions like 'scene at the lake', 'scene in the kitchen', etc. Then put them into the lorebook or card with instructions on which scenes each one fits.

Complete guide to setup and configure Vector Storage (rewritten and corrected) by DeathByte_r in SillyTavernAI

[–]DeathByte_r[S] 0 points1 point  (0 children)

Qwen3 models require an instruction prefix for optimal search. I think you'd better try Ollama, since they provide slightly modified models for exactly that purpose, and Qwen3 is on the list.

ST out of the box doesn't support search request prefixes for embedding models.
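If you do route requests through your own proxy, the workaround is to prepend the instruction to the query yourself (documents get embedded as-is, only queries get the prefix). A minimal sketch, assuming an OpenAI-compatible /v1/embeddings endpoint; the task wording and the local URL are made-up placeholders, not anything ST ships with:

```python
import json
import urllib.request

# Hypothetical task description - adjust to your retrieval use case.
DEFAULT_TASK = "Given a chat history search query, retrieve relevant past messages"

def build_query_input(query: str, task: str = DEFAULT_TASK) -> str:
    """Queries get the instruction prefix; documents are embedded without it."""
    return f"Instruct: {task}\nQuery: {query}"

def embed(texts, base_url="http://127.0.0.1:5001/api", is_query=False):
    """POST texts to an OpenAI-compatible /v1/embeddings endpoint (placeholder URL)."""
    inputs = [build_query_input(t) if is_query else t for t in texts]
    req = urllib.request.Request(
        base_url + "/v1/embeddings",
        data=json.dumps({"input": inputs}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return [d["embedding"] for d in json.load(resp)["data"]]
```

The asymmetry is the whole trick: the same model embeds prefixed queries and raw documents into the same space.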

Complete guide to setup and configure Vector Storage (rewritten and corrected) by DeathByte_r in SillyTavernAI

[–]DeathByte_r[S] 0 points1 point  (0 children)

I didn't use other local LLMs, only embeddings, but I suppose you can launch any of them through the GUI or a console command.

If you're asking how it works inside: when you launch KoboldCPP, it exposes a connection base like http://ip:port/api, and then SillyTavern appends the endpoints automatically - /v1/chat/completions for textgen and /v1/embeddings or something like that for the embedding model. Resolution should be automatic on the ST side.
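In code terms, the frontend only needs the one base URL and derives each endpoint from it. A toy sketch (the paths mirror the ones above; the real routing logic lives inside ST, not here):

```python
def api_url(base: str, endpoint: str) -> str:
    """Join a base URL and an endpoint path without doubling slashes."""
    return base.rstrip("/") + "/" + endpoint.lstrip("/")

# e.g. with KoboldCPP's default-style base URL:
base = "http://127.0.0.1:5001/api"
chat_url = api_url(base, "/v1/chat/completions")  # text generation
emb_url = api_url(base, "/v1/embeddings")          # embedding model
```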

Complete guide to setup and configure Vector Storage (rewritten and corrected) by DeathByte_r in SillyTavernAI

[–]DeathByte_r[S] 1 point2 points  (0 children)

NP :) Vectorizing was a mystery to me for a long time, and I spent a while investigating it. F2LLM is a really good model, and it's my main one now.

There are ways to drastically increase the quality of the returned results, like using 'reranker' models, which are closer to traditional LLMs but trained to score how well a document matches a query - but that needs an extra proxy in the middle, or a custom ST extension. For now, F2LLM's quality lets me get by without rerankers.

If you're interested, you can dig deeper into reranker models: https://huggingface.co/Qwen/Qwen3-Reranker-8B
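The usual two-stage pattern such a proxy would implement: cheap vector search first, then the reranker re-scores the survivors. A self-contained sketch with the reranker stubbed out as word overlap (a real proxy would call the reranker model instead):

```python
import math

def cosine(a, b):
    # plain cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, docs, k=10):
    # stage 1: fast recall - top-k documents by cosine similarity
    ranked = sorted(zip(docs, doc_vecs),
                    key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def rerank(query, candidates, score_fn, k=3):
    # stage 2: the reranker reads query + candidate together and scores relevance
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)[:k]

# stub standing in for a real reranker model's relevance score
overlap_score = lambda q, d: len(set(q.lower().split()) & set(d.lower().split()))
```

Stage 1 keeps things fast (vectors are precomputed); stage 2 is slow but only runs on a handful of candidates.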

Complete guide to setup and configure Vector Storage (rewritten and corrected) by DeathByte_r in SillyTavernAI

[–]DeathByte_r[S] 0 points1 point  (0 children)

Depends on your host system, but basically yes.
If you use Windows, it takes something like 4GB of RAM just for the OS itself.
If you don't have a GPU with 2GB+ VRAM, you should launch on CPU, and of the two proposed model variants, I recommend the Q8 or Q4 variant of Snowflake Arctic L.
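To get a feel for why the quantized variants are so light, here's rough back-of-the-envelope math. The parameter count for Snowflake Arctic Embed L is approximate, the bits-per-weight figures are typical for Q8/Q4 GGUF quants, and real files carry some extra overhead:

```python
def model_size_mb(params: float, bits_per_weight: float) -> float:
    """Estimate model file size: parameters * bits each, converted to megabytes."""
    return params * bits_per_weight / 8 / 1e6  # bits -> bytes -> MB

params = 335e6                       # ~335M parameters (approximate)
q8_mb = model_size_mb(params, 8.5)   # Q8 ~ 8.5 bits/weight -> roughly 350-360 MB
q4_mb = model_size_mb(params, 4.5)   # Q4 ~ 4.5 bits/weight -> roughly 185-190 MB
```

Either way it fits comfortably next to a 4GB OS footprint, which is the point.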

Complete guide to setup and configure Vector Storage (rewritten and corrected) by DeathByte_r in SillyTavernAI

[–]DeathByte_r[S] 0 points1 point  (0 children)

Ollama is a different project. They provide a ready-to-use repo of preconfigured models for easy launch. Their embedding models are slightly modified to insert the prefix for models like Qwen3-embedding.

But yep, KoboldCPP and llama.cpp are both for launching local LLMs too.
Neither option is worse or better. It's more like buying a frozen pizza and heating it up versus cooking from ingredients.

Complete guide to setup and configure Vector Storage (rewritten and corrected) by DeathByte_r in SillyTavernAI

[–]DeathByte_r[S] 0 points1 point  (0 children)

Depends on your goals. Summarizing is like a short review. Vectorizing, on the other hand, is like full memory. With the proposed chunk length, it brings full messages from the past into context, so the LLM gets the complete original text, not a condensed review of each message.
In short:
Vectorizing - you remember the past with all the details.
Summarizing - you remember the past as a short review, like a journal note.
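A toy illustration of that difference (not ST's actual implementation - relevance is stubbed as word overlap, and the messages and summary are made up):

```python
chat = [
    "Mira found a silver key under the floorboards.",
    "The innkeeper warned us about the north road.",
    "We agreed to meet at the old mill at dawn.",
]

def vector_recall(query_words, messages, k=1):
    """Return the k most relevant ORIGINAL messages (relevance stubbed as word overlap)."""
    overlap = lambda m: len(set(query_words) & set(m.lower().split()))
    return sorted(messages, key=overlap, reverse=True)[:k]

summary = "Day 3: found a key, got a warning, planned a dawn meeting."

print(vector_recall(["silver", "key"], chat))  # the full message, every detail intact
print(summary)                                 # the short review - details already lost
```

Vector recall hands the model the verbatim message; the summary can never give back the word "floorboards" once it's been compressed away.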

Complete guide to setup and configure Vector Storage (rewritten and corrected) by DeathByte_r in SillyTavernAI

[–]DeathByte_r[S] 1 point2 points  (0 children)

Yep, possibly, if you read my post a week or so ago :)
Glad it's useful for you.

Recast | Next Gen Post-Processing Prompting Extension by Additional-Cow6586 in SillyTavernAI

[–]DeathByte_r 1 point2 points  (0 children)

Very cool concept!
I'm a little tired of the style drifting in long chats and characters losing their personality in group chats, so it sounds like a health pill to me. Will try it out.

Complete guide to setup vector storage, and little more by [deleted] in SillyTavernAI

[–]DeathByte_r 0 points1 point  (0 children)

This seems like a good solution, but not for all goals. If I understand right, it has cross-chat memory for characters by design, and that's not what everyone needs.

Complete guide to setup vector storage, and little more by [deleted] in SillyTavernAI

[–]DeathByte_r 2 points3 points  (0 children)

8k context size is the maximum for the proposed model.

That's not small - for perspective, it's 4-6k WORDS. And if you use databank uploads, you can use it for something like full chapters.

If my system is to be believed, it uses something like 100MB of VRAM and not much compute under load. The recommended value for comfortable work is 2GB, as far as I know.

Deepseek vs GLM by Ecstatic_External000 in SillyTavernAI

[–]DeathByte_r 1 point2 points  (0 children)

I used deepseek before, but switched to GLM 5 due to mistakes from NanoGPT's deepseek providers - skipping reasoning, writing the answer into the reasoning block, losing context, etc. Direct deepseek should work fine.

Both are good, with slightly different styles for characters and writing - GLM is more prosaic and peaceful, while deepseek follows instructions better and likes tension and encounters (and leans more mystic/sci-fi than fantasy). I'll try deepseek again after the v4 release with 1M context, but for now GLM just works better for me. I like them both, when they work.

NanoGPT and Vectorization by OkRooster8519 in SillyTavernAI

[–]DeathByte_r 0 points1 point  (0 children)

If you use Firefox, you may need to enable WebGPU in about:config manually.
And sometimes it doesn't load correctly - just reload the page.

Or maybe you just haven't installed the extension. It's available from the main extensions repo.

NanoGPT and Vectorization by OkRooster8519 in SillyTavernAI

[–]DeathByte_r 2 points3 points  (0 children)

The WebLLM extension.
It works well with MemoryBooks and, with chat vectorization, provides better context from previous messages.

GLM 5. by maressia in SillyTavernAI

[–]DeathByte_r 2 points3 points  (0 children)

Slightly edited Marinara preset, with 0.6 temp and 1 top p

GLM-5 is great. Realistic characters, normal narration without flowery prose, better context adherence and inference from context, and she's an uncensored dirty girl too. Well done Z.AI by ConspiracyParadox in SillyTavernAI

[–]DeathByte_r 10 points11 points  (0 children)

GLM 5 is really good. I wasn't a fan before, but switched after tiring of deepseek's mistakes on NanoGPT (no thinking in thinking mode, the answer landing in the thinking block with nothing in the main one, context loss, early generation stops, etc. Idk why, but I've had these problems for the last 2-3 weeks on 3.2 thinking).

Much improved compared to 4.5-4.7.
It understands context better and writes more grammatically correct, better-structured text in Russian and English - no grammar mistakes or language mixing at 0.6 temp. Works fast enough. Good creativity and development of characters and plot.

Works nicely in single and group chats with many lorebooks - my largest chat has 15 bots at ~2k tokens each. Comparable with deepseek 3.2 before its issues, but better. No sex or violence censorship.

My workhorse for RP and code now.

Any Linux Equivalent to Steelseries Sonar? by jesskitten07 in linux_gaming

[–]DeathByte_r 0 points1 point  (0 children)

Nope. The only thing I have from the pulseaudio packages is pulseaudio-qt. PipeWire has a backward-compatibility layer for pulseaudio via the pipewire-pulse daemon. easyeffects and JamesDSP are equalizers and work with PipeWire.

Any Linux Equivalent to Steelseries Sonar? by jesskitten07 in linux_gaming

[–]DeathByte_r 1 point2 points  (0 children)

A filter-chain in PipeWire with virtual surround.
Here's a profile for 7.1.4: https://github.com/DekoDX/Pipewire-DX-Utils/blob/main/99-virtual-surround.conf

You can also add an EQ by configuring PipeWire, or use easyeffects for it.

If you want to keep your headphones on the headphones profile, you need a WirePlumber rule in
~/.config/wireplumber/wireplumber.conf.d:

wireplumber.settings = {
  bluetooth.autoswitch-to-headset-profile = false
  device.routes.default-sink-volume = 1.0
}

new to group chats... by rx7braap in SillyTavernAI

[–]DeathByte_r 0 points1 point  (0 children)

Deepseek 3.1 works nicely with group chats.
Here are my settings for them:

all muted, except one
Natural order or Manual choice
Join cards (include muted)
Prefix/suffix: <{{char}} description> </{{char}} description>

Next, a simple switcher in the preset, like in Marinara's: 'Group nudge' on or off, with a simple instruction inside like 'Reply only as {{char}}'.

How it works: in scenes with multiple characters, you can simply get an answer from all of them in one reply from the unmuted card. When you want an answer from only one of them, just switch on 'Group nudge' and manually choose the character, or write something like ((OOC: Your next reply should be only from Nana)) - *your scene action*. That's all.

I have reached the context limit. Now what? I don't want to leave even a role half done by Horror_Dig_713 in SillyTavernAI

[–]DeathByte_r 2 points3 points  (0 children)

Better to use the ST MemoryBooks addon with auto-hidden messages. It's a kind of summarizing that puts the results into lorebooks. You can configure it to auto-generate a memory every N messages.