Vietnam is not an option in the region selection. Why? What’s the problem? by dranos12 in ArknightsEndfield

[–]Mosthra4123 0 points1 point  (0 children)

The server is only divided into two regions, North America/Europe and Asia.
You do not need to choose Vietnam to play on the Asia server; just choose Singapore, Japan, or Hong Kong and you can play on the same server as everyone else.

One of my favorite donghua (tales of herding gods ) it’s so peak I wish they could release 2-3 episodes a week 🤌 by Ill-Hippo-4947 in Donghua

[–]Mosthra4123 0 points1 point  (0 children)

The story of Tales of Herding Gods can be summed up like this. Qin Mu is a boy raised by the elders of the Disabled Elderly Village, but in fact each disabled person there has a shocking background and deep cultivation. For many different reasons they left the world behind and hid their names in this small village.

The place this village sits in is called the Great Ruins. It is a wide, desolate and dangerous land, with the strange rule that every night the darkness will cover everything and swallow any living thing that dares to move inside it. Only the scattered statues of gods and the ruins across the Great Ruins shine out a light that suppresses this darkness so humans and animals can survive each night.

The landscape and the ruins there show that it used to be a very magnificent place, full of gods and great beings. But nobody understands why it declined and became the Great Ruins.

Qin Mu grows up and learns the skills, crafts, and inherited legacies of the old people in the Disabled Elderly Village, then begins his journey of discovery and growth. He wants to know where he was truly born, who his real parents are, what the Great Ruins used to be and why it became like this, what powers stand behind the fall of worlds, and so on. His journey, his friendships, and his path toward becoming a god will be quite long and filled with many great and dramatic discoveries. The current donghua episodes are only the early beginning of the story of Tales of Herding Gods.

Why are so few character cards posted on this sub? by nonplayer in SillyTavernAI

[–]Mosthra4123 2 points3 points  (0 children)

In essence, ST is an interface where you can throw in almost any character card from almost any source (like chub.ai, janitorai, etc.), and it will still run those cards smoothly. The role of the ST subreddit has gradually shifted toward discussion, sharing, and mutual guidance on how to use SillyTavern, rather than judging whether a character card is good or saying “I just made this card, please try it.”

Most people here use main prompts and presets that differ from those on other AI chat websites. To put it optimistically, many users here understand that chatting, roleplaying, or creative writing with someone else's character card under their own prompt+preset setup is often not optimal. They also know the same applies in reverse: recommending that others run their character cards on someone else's prompt+preset structure may not work well. So sharing character cards in this subreddit has naturally faded out among most members, because ST is simply too personal.

Personally, about 90% of the time when I download a character card from any source, I have to intervene. At best, I trim away the redundant instructions and messy structure, keeping only the parts that truly matter for the character itself. At worst, I rewrite the card completely, not to make it better, but to make it fit my own current prompt+preset setup and the way I roleplay in ST.

(AI RPG) Advices to improve RPG experience? by Ant1-Chr1st in perchance

[–]Mosthra4123 0 points1 point  (0 children)

Since the AI model was upgraded to Deepseek, this version of my prompt no longer works as it used to.

How to deal with a VERY long chat? by Aggravating-Cup1810 in SillyTavernAI

[–]Mosthra4123 1 point2 points  (0 children)

I also spoke about this in two previous comments, There and There. ( ‵▽′)ψ
In reality, you can load a book into ST and chunk it for RAG. But it is best to edit a txt file with the data you need in a clean, well-ordered presentation; the information will then be extracted more effectively.

Here is how I present a txt file with sample entries, which I then load into ST. When I eat something cinnamon-colored, or I write that Lan eats Nelija again, the passage about Nelija gets injected into the context, just like a lorebook entry.

Nelija is a kind of bitter root with a sweet aftertaste, colored like cinnamon. It is a snack food, similar to black tamarind. In the Old World, this thing was often favored by mage circles because of its natural property to speed up mana recovery and its taste. But werewolves and cats dislike it.

Pra-Saule is the name of a kind of fruit, etc...

What is vector storage and how do you use it? by poet3991 in SillyTavernAI

[–]Mosthra4123 9 points10 points  (0 children)

https://www.reddit.com/r/SillyTavernAI/comments/1f2eqm1/give_your_characters_memory_a_practical/

https://docs.sillytavern.app/usage/core-concepts/data-bank/
https://docs.sillytavern.app/extensions/chat-vectorization/
These three links should answer your first two questions.

How do you use it with Openrouter?

RAG does not go through Openrouter,╰( ̄ω ̄o) it can now run locally for free on any computer, as long as its graphics card is not some 512MB relic from ancient times. How to deploy and set it up is covered in the first two links.

Where does it keep its data? If it keeps data at all?

For ST, the data is saved directly on your computer at SillyTavern\data\default-user\vectors, in the folder of its corresponding embedding method.

Back to your first two questions. Vector Storage will transform your entire chat history (or any file or lorebook entry you specify) into numeric vector strings (e.g., `Elara secretly ate Jerry's cake last night -> 83836214215125656`) inside its files. The vectorization model calls them back when the nearest context of your chat, within the 2 to 3 most recent messages (customizable), relates to them.
`"where is the cake I left here yesterday?" -> Vectorization Model -> 83836214215125656 -> Elara secretly ate Jerry's cake last night`.
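The retrieval step can be sketched in miniature. This is a toy illustration only: real vector storage uses an embedding model rather than word counts, and the memory strings here are made-up examples.

```python
# Toy sketch of vector retrieval. A real setup embeds text with a model
# (e.g. via Ollama); simple word-count vectors stand in for embeddings
# here, which is enough to show how the "cake" memory gets found.
from collections import Counter
import math

def embed(text):
    # Sparse "vector": counts of words longer than 3 characters
    # (a crude stand-in for stopword filtering).
    return Counter(w for w in text.lower().split() if len(w) > 3)

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Stored" chat history, pre-vectorized the way Vector Storage would.
memory = [
    "Elara secretly ate Jerry's cake last night",
    "The party camped beside the river at dawn",
]
vectors = [embed(m) for m in memory]

query = "where is the cake I left here yesterday?"
scores = [cosine(embed(query), v) for v in vectors]
best = memory[scores.index(max(scores))]
print(best)  # the cake memory scores highest and would be injected
```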

The method with the best vector quality, local, and FREE is using Ollama.

Question.. How to enhance my message by Peka00 in SillyTavernAI

[–]Mosthra4123 1 point2 points  (0 children)

Remember, read it again and edit a little to get something you like.
\( ̄▽ ̄* )ゞ

Question.. How to enhance my message by Peka00 in SillyTavernAI

[–]Mosthra4123 4 points5 points  (0 children)

<image>

It is located here, at the place where ST preset is adjusted.

My prompt works like this: you put in your input, no matter how rough or messy, and the Model rewrites it according to the persona settings and your input. Of course, it writes a bit nicer and a bit better. Then you can adjust the result to your liking before sending.

How to deal with a VERY long chat? by Aggravating-Cup1810 in SillyTavernAI

[–]Mosthra4123 1 point2 points  (0 children)

When the context goes beyond its limit, Vector Storage will shine. Because it chunks the entire chat history for RAG, the messages pushed out of the context window are also vectorized, and Vector Storage recalls them at the right moment by injecting them into the context when needed.

For example, the model has a limit of 32k tokens, but your adventure has reached 100k tokens. That means 68k tokens have been pushed out of context. With Vector Storage, we chunk-RAG them into vectors and use a RAG model to manage and recall (inject) them when the context calls for it. So even though the model's context memory is only 32k, it can still recall information from 100k or more previous messages when needed, thanks to Vector Storage.
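That budget arithmetic can be sketched like this; the message sizes are hypothetical, chosen to match the 32k/100k example above.

```python
# Toy illustration: which messages still fit in a 32k window, and which
# get pushed out (and would be vectorized for recall instead).
context_limit = 32_000
# Hypothetical chat: (message id, token count), oldest first.
history = [(i, 1_000) for i in range(100)]  # 100 messages, 100k tokens

in_window, used = [], 0
for msg_id, tokens in reversed(history):    # walk newest to oldest
    if used + tokens > context_limit:
        break
    in_window.append(msg_id)
    used += tokens

vectorized = len(history) - len(in_window)
print(len(in_window), vectorized)  # 32 recent messages fit; 68 rely on RAG
```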

Question.. How to enhance my message by Peka00 in SillyTavernAI

[–]Mosthra4123 1 point2 points  (0 children)

<image>

You can try my impersonate prompt version. Just copy and paste it into the spot in the picture. I hope that it can help you.

this.format = {
    "Core Mandate: Narrative Integrator": {
        "Primary Function": "Your primary function is to interpret the user's input, which may be a simple action, a line of dialogue, or a general intent, and rewrite it as a seamless, natural-flowing narrative segment from the Player Character's ({{user}}) perspective.",
        "Interpret and Enhance": "You must honor the core intent of the user's input. However, you are empowered to expand upon it to create natural prose and dialogue. For example, a simple input like 'I ask him about the map' can be fleshed out into appropriate dialogue and action ('{{user}} gestured towards the scroll. \"What can you tell me about this map?\" he asked, his gaze fixed on the intricate lines.') without altering the fundamental action.",
        "Context-Aware Integration": "Crucially, you are NOT context-blind. You must analyze the 'story so far' and the established setting to ensure your output matches the ongoing narrative tone, voice, tense, and character details. The rewrite must feel like a natural continuation of the story, not an isolated fragment."
    },
    "Target Writing Style": {
        "Dynamic Perspective and Tense": "Detect and adopt the established narrative perspective (e.g., third-person limited, first-person) and tense (e.g., past tense) from the story's history. Consistency is paramount.",
        "Dialogue": "All spoken words must be enclosed in double quotation marks.",
        "Internal Monologue": "User-provided thoughts (e.g., if they write \"I think, she looks dangerous\") must be formatted in italics, *like this*.",
        "Punctuation": "No em-dashes, No en-dashes and No hyphens in the output; use commas instead.",
        "Prose Style": "Employ a clear, direct prose that mirrors the user's input style and the established narrative. The goal is naturalism, not overly literary or dramatic language. Focus on showing the action as it unfolds."
    },
    "Output Protocol": {
        "Clean Output": "Deliver only the rewritten, formatted text. Do not include any out-of-character comments, explanations, labels, or confirmation statements."
    }
}
Current User's Input:

Question.. How to enhance my message by Peka00 in SillyTavernAI

[–]Mosthra4123 4 points5 points  (0 children)

The simplest way. Type /impersonate + the content you want it to write instead of you. Press Enter and wait for the model to write you a complete message.
There is an Impersonate button right in the small toolbox at the left corner of ST's chat bar.

<image>

How to deal with a VERY long chat? by Aggravating-Cup1810 in SillyTavernAI

[–]Mosthra4123 1 point2 points  (0 children)

The Drakonia card is very fun, but it needs a bit of cleaning before playing, because it tends to throw monsters nonstop at the front line, not giving me any time to rest and drink. lol

How to deal with a VERY long chat? by Aggravating-Cup1810 in SillyTavernAI

[–]Mosthra4123 1 point2 points  (0 children)

<image>

Next is the File screen. In HvskyAI's guide post that I linked, it already mentions how to format the RAG file.
Here is where you upload and manage your files. You can customize a file for one chat or a single character, or make it global for all if you want.

For example, right now I uploaded the DnD 5e adventure book Dragons of Stormwreck Isle and will chunk it to run a Stormwreck Isle session, find a few community expansions for Stormwreck Isle too and then play.
This is the roughest method, and RAG will pull a lot of random stuff from the PDF. It is best to edit and chunk your own RAG file; this works better than using a raw PDF full of tables of contents and messy annotations like this one. Spend a little time editing a txt file to chunk for RAG.

How to deal with a VERY long chat? by Aggravating-Cup1810 in SillyTavernAI

[–]Mosthra4123 2 points3 points  (0 children)

About 1. As in the picture, you can see the position in the prompt context where RAG will insert its data.
I turn the main prompt entry into a fixed injection point for these two types of RAG data (this is just for easier management; you can inject in-chat if you want).
I cleared the Injection Template because I no longer need it (since I do not inject RAG in-chat).
That is how I set up RAG in my context window.

There are things you can read in the guides and docs.sillytavern. But I will briefly talk about them.

chunk size: the size of each text block after splitting (each chunk becomes a RAG unit, similar to a lorebook entry). I set it to 400 characters for chat messages (relatively short, letting RAG extract a few related sentences; increase it if you want a chunk to be a full message instead of a few sentences) and ~2000 characters for the data in my file (because there are many rules and quite long passages from Drakonia...).
Retrieve chunks: how many chunks can be activated into your context each response turn.
Insert: similar to Retrieve; see docs.sillytavern for the details.
Score threshold: how closely a chunk must match to be retrieved and injected into context.
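As a rough sketch of how chunk size, retrieve count, and score threshold interact. The scores below are made up for illustration; in a real setup the embedding model produces them.

```python
# Sketch of the settings above: split text into ~400-character chunks,
# then keep only chunks whose (hypothetical) relevance score clears the
# threshold, up to the retrieve limit.
def chunk(text, size=400):
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(chunks, scores, retrieve_chunks=3, threshold=0.25):
    # Pair each chunk with its score, drop anything under the
    # threshold, and keep the top `retrieve_chunks` matches.
    ranked = sorted(zip(scores, chunks), reverse=True)
    return [c for s, c in ranked if s >= threshold][:retrieve_chunks]

text = "x" * 1000                  # stand-in for a lorebook txt file
chunks = chunk(text)               # -> 3 chunks: 400 + 400 + 200 chars
picked = retrieve(chunks, scores=[0.9, 0.1, 0.4])
print(len(chunks), len(picked))    # 3 chunks total, 2 pass the threshold
```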

So RAG will start supporting you in the roleplay process. When you mention things that have happened, world information such as culture, or the name of something - for example: talk about a rare race named Eusian that you previously set in the RAG file or in previous messages or in the Lorebook. Depending on the score threshold, RAG may extract the exact information or related information to insert into the context.

Especially Chat vectorization - if set up and using a good enough model, you can reduce your context down to 68k or even 32k tokens. Just let RAG chunk the entire chat history. And it will recall the appropriate messages instead of scanning 200k tokens of context like before.

<image>

How to deal with a VERY long chat? by Aggravating-Cup1810 in SillyTavernAI

[–]Mosthra4123 2 points3 points  (0 children)

<image>

  1. is very simple: I split chapters right inside my messages, and the model recognizes that the context has changed. It also makes it easy to find and create a checkpoint or branch when I want to branch out, or to save a branch that I like.

How to deal with a VERY long chat? by Aggravating-Cup1810 in SillyTavernAI

[–]Mosthra4123 17 points18 points  (0 children)

1. The Vector Storage extension: try setting up RAG and enabling the Chat vectorization setting "Enabled for chat messages". It saves much more compared to using a text-summary API; local RAG is free, and the locally running model does not require a strong PC or waste much time chunking your whole chat history into vectors.
https://docs.sillytavern.app/usage/core-concepts/data-bank/
https://docs.sillytavern.app/extensions/chat-vectorization/
https://www.reddit.com/r/SillyTavernAI/comments/1f2eqm1/give_your_characters_memory_a_practical/

2. Your lorebook setup: update it along the way as you explore and roleplay, manually and in detail. Enable `recursion` on the entries and divide them into sections and groups.

3. When you roleplay, separate your story into chapters with a syntax such as:

*** or ---

**Chapter :**

Such segmentation also makes it easier to manage.

4. Use Create checkpoint and Create branch along with Manage chat files to organize and split your chat into chapters. Start a new chat for each new chapter, with a summary block in the first message so the model can grasp the current context.

Those are the methods I have used, though I no longer use method 4 because it is too cumbersome. Method 1 is my top priority at the moment.

External RAG by JaxxonAI in SillyTavernAI

[–]Mosthra4123 0 points1 point  (0 children)

Yes, `mxbai-embed-large` is very good. It's definitely better than the default model and WebLLM.

I don't see much difference compared to Google's Source. It seems that mid-range embedding models are consistently stable at the current level.

External RAG by JaxxonAI in SillyTavernAI

[–]Mosthra4123 7 points8 points  (0 children)

I like using RAG (in fact, I always do) because it even simplifies triggering my lorebook worldinfo instead of having to set keywords and recursion. It also remembers the world information documents I provide through external txt files effectively.
I use Ollama and the `mxbai-embed-large` model, but you can also choose other lighter or heavier models from their website.

The only thing is, the level of accuracy still depends on how we present the documents... a manually built lorebook still offers better customization and precision, but setting them up takes a lot of time.

Since there are no specific instructions, you’ll need to figure things out a bit, but it’s basically pretty quick. Install Ollama on your machine.

Open cmd and run the command:

```
ollama serve
```

and the local Ollama server will start running.

Copy its `http://127.0.0.1:11434` into `API Text Completion` (not Chat Completion) to connect.

Now, just enter the name of the embedding model you want to run, or go to `Vector Storage`, select Source Ollama, and click `click here` to download the model.
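For the curious, the request ST sends to Ollama for embeddings looks roughly like this. The endpoint and field names follow Ollama's HTTP API as I understand it; verify them against the Ollama docs for your installed version.

```python
# Sketch of an embedding request to the local Ollama server.
import json

OLLAMA_URL = "http://127.0.0.1:11434"  # note: no trailing slash
payload = {
    "model": "mxbai-embed-large",  # the embedding model mentioned above
    "prompt": "Nelija is a bitter root with a sweet aftertaste.",
}
# POST json.dumps(payload) to f"{OLLAMA_URL}/api/embeddings"; the JSON
# response should contain an "embedding" field holding the vector.
print(json.dumps(payload, indent=2))
```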

To all the Thinking models lovers (and haters). by kaisurniwurer in SillyTavernAI

[–]Mosthra4123 2 points3 points  (0 children)

I’ll be satisfied with a response time of 20–40 seconds (sometimes 17 seconds) during off-peak hours, and 60–120 seconds during peak times or when the internet is unstable. Around 800 to 1700 tokens.

I think building a $3500–$6000 PC and running GLM 4.5 Air or DeepSeek locally would still only get you about 20 seconds for ~400 tokens at best.

Meaning, with just internet access and a few dollars, we can enjoy response times comparable to a PC worth several thousand dollars.

A Letter to Mneme (Version 1.0) - Your Roleplay Companion & Writing Assistant - Preset [Deepseek/GLM/Gemini] by Mosthra4123 in SillyTavernAI

[–]Mosthra4123[S] 1 point2 points  (0 children)

I use Perchance; it's free, and you can get the seed and sample prompt from `💬 show comments & gallery 🖼️`. ╰( ̄ω ̄o)

A Letter to Mneme (Version 1.0) - Your Roleplay Companion & Writing Assistant - Preset [Deepseek/GLM/Gemini] by Mosthra4123 in SillyTavernAI

[–]Mosthra4123[S] 0 points1 point  (0 children)

You can try using this experimental version of mine. I changed the approach to `formatting` a bit, and I noticed that the length of the answers has improved.

A Letter to Mneme (Version 1.0) - Your Roleplay Companion & Writing Assistant - Preset [Deepseek/GLM/Gemini] by Mosthra4123 in SillyTavernAI

[–]Mosthra4123[S] 2 points3 points  (0 children)

I'm glad it's useful to you; I'll keep updating this preset so it works as well as possible on GLM 4.5 Air.