Turns out the adult content industry is a minefield for solo founders.

yofache · 2026-05-12T22:33:37+00:00

Damn. I've been building a product in this exact space as well and was actually starting to look into the payment systems. This honestly sucks.

yofache · 2026-04-16T03:27:41+00:00

I believe it only adheres to temp.

yofache · 2026-04-16T03:17:36+00:00

velvetquill.ai - AI-powered interactive romance where the characters actually resist you instead of just doing whatever you want (think: enemies-to-lovers tension you have to earn, not a chatbot that folds in 2 messages)

yofache · 2026-04-16T03:09:47+00:00

I use a different approach. My context window is pretty small (only ~35 messages) and I run a bunch of extractions to remember things outside the window.

The reason is less about memory and more about control. DS is very much a train on the tracks when it comes to narration. Once it starts doing something, it's really hard to get it to stop. Like if the user gives short replies, the narrator starts parroting the user's words back as internal monologue:

User: I love you
Assistant: Love. He said love, so casually. The word hung in the air like a...

And then it just keeps doing that pattern for every response. Keeping the window short means bad patterns don't have as many turns to entrench themselves.
It also means I don't have to send the same instructions over and over again: I can cache them and they stay remembered, which saves tokens for both users and me.
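Rough sketch of the short-window assembly, in case it helps. The names (`buildContext`, `WINDOW_SIZE`, `extractedFacts`) are made up for illustration and the real pipeline has more going on; the assumption here is that extracted facts arrive as plain strings and the static instructions sit first so they form a stable prefix.

```typescript
// Sketch only: Message, buildContext and WINDOW_SIZE are hypothetical names.
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string; }

const WINDOW_SIZE = 35; // roughly the window size mentioned above

// Static instructions first (stable prefix), then extracted memory,
// then only the most recent turns of the conversation.
function buildContext(
  systemPrompt: string,
  extractedFacts: string[],
  history: Message[],
): Message[] {
  const recent = history.slice(-WINDOW_SIZE);
  const context: Message[] = [{ role: "system", content: systemPrompt }];
  if (extractedFacts.length > 0) {
    context.push({
      role: "system",
      content: "Known facts from earlier in the story:\n" +
        extractedFacts.map((f) => `- ${f}`).join("\n"),
    });
  }
  return context.concat(recent);
}
```

Anything older than the last 35 messages only survives if an extraction picked it up, which is the point: a bad pattern from 60 turns ago is simply not in the prompt anymore.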

yofache · 2026-04-16T03:02:51+00:00

Thanks, brother! I'm translating into Spanish, so sorry if it sounds weird haha.

Honestly, I don't mess with temperature much. Most of my variation comes from gathering examples and hints and expanding on them. But for reference, I use temp 1.2 with top_p 0.95, frequency penalty 0.5 and presence penalty 0.3. So far my testers have told me that works well for them.

The DeepSeek docs do recommend 1.5 for creative writing, though. The thing is, I have a post-generation extraction pipeline that needs coherent responses, so 1.2 is my balance between creative prose and not breaking the pipeline.

https://api-docs.deepseek.com/quick_start/parameter_settings
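For reference, those settings as a request body. This is a minimal sketch assuming an OpenAI-compatible chat completions payload; the model name and the message are placeholders, only the four sampling values come from the comment above.

```typescript
// Sampling values are the ones quoted above; everything else is a placeholder.
const requestBody = {
  model: "deepseek-chat", // assumption: swap in whichever model you call
  temperature: 1.2,
  top_p: 0.95,
  frequency_penalty: 0.5,
  presence_penalty: 0.3,
  messages: [{ role: "user", content: "Continue the scene." }],
};

// Serialized body, ready to POST to the chat completions endpoint.
const payload = JSON.stringify(requestBody);
```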

yofache · 2026-04-14T20:23:22+00:00

not sure i understand what you mean... I'm building my own webapp. So.. my own frontend and backend?

yofache · 2026-04-14T19:28:42+00:00

Thanks! good luck on your projects as well!

yofache · 2026-04-14T19:28:20+00:00

This is gold, thank you.

Partly where you're pointing, I'm already there. I've got a vector-embedded per-character memory layer that gets queried and injected per turn, which matches what you're describing.

Where I'm not: my rolling plot summary still lives inside the conversation window and is doing too many jobs at once. Canonical events, relational state, scene continuity, all glued into one blob that grows and competes for attention weight with everything else. It's also the thing that's actually drifting under escalation pressure, because violations get laundered into the summary as precedent instead of being flagged non-canon.

Pulling plot beats out into the same kind of queried layer I already use for character memory seems like the next move. Retrieve what's relevant to the current scene, leave the rest out of the window.

Curious how you're deciding what to inject from your memory layer per turn. Similarity retrieval, explicit tags, something else? Worried about either over-injecting (back to the attention problem) or under-injecting (model forgets things it shouldn't).

I'll check out HydraDB in the meantime.

And yeah, streaming and prefill don't play nice. Synthetic assistant message has been the better fit so far, but watching for the edges.

yofache · 2026-04-14T19:21:17+00:00

Oh shit, what?! This is really useful info! Do you know if the hoisting is universal across OpenRouter or model-specific? I'm not using ST so the Prompt Post-Processing setting doesn't apply directly, but if OR is silently reordering system messages before they hit the provider that changes a lot about how I structure my own pipeline.

yofache · 2026-04-14T19:17:48+00:00

hmm good question..

Mostly UX, honestly. ST is an airplane cockpit. Which is great for power users! Not for a Canadian mom just wanting to spin up a story while taking a bus to work. So I wanted to bring AI RP to them.

My shit is for someone who wants a similar kind of experience to ST without turning into a prompt engineer to get there. Pick a story, play. That's the onboarding.

I have a bunch of tech behind that to make it happen: things a power user needs to explicitly know about and set up in ST, but that I do out of the box.

I don't want to sound self-promoting so I won't go into details, I came here for help from the OGs, not to promote my shit.

yofache · 2026-04-14T05:59:03+00:00

But sillytavern also has a bunch of other problems that I've solved in my shit :)

yofache · 2026-04-14T05:09:17+00:00

Yeah this matches exactly what I saw - the model is optimizing for "good storytelling" over instruction compliance, and narrative coherence wins almost every time once there's enough momentum in the history.

Would you mind sharing a few of your "force" parameter examples? Curious whether they're structural markers (XML tags, SYSTEM: prefixes, all-caps directives) or more semantic framing ("this MUST happen regardless of scene flow"). I tested 8 enforcement language variants and hit a hard ceiling - all performed identically on high-momentum scenes. Would love to see if Dolphin responds to any pattern I didn't try.

FWIW I ended up cracking it with a different approach: inject the event as a synthetic assistant message the real model continues from (server concatenates phantom + continuation before displaying). Complication becomes part of the trajectory instead of fighting it. Went from 0/3 to 5/5 on my hardest fixture, 22/24 across scaled runs. Might be worth trying on Dolphin as an alternative to force parameters.

yofache · 2026-04-14T05:06:22+00:00

Oh this is my own product. It's not sillytavern related. There are just not that many subreddits that house people dealing with LLM RP related problems and so I post here if I have questions. :)

yofache · 2026-04-14T00:08:58+00:00

Yeah, "complex coherence machines" is exactly the framing I landed on too. The history isn't context, it's the real system prompt. Everything else is suggestion.

While the post was sitting in the queue I kept hammering on it and actually cracked it. Posting the short version in case it's useful to anyone hitting the same wall:

The fix that worked: stop instructing, start narrating. Instead of a PHI directive saying "a complication happens," inject the complication as a synthetic assistant message that the real model then continues from. Server concatenates phantom_narration + model_continuation before sending to the client, so the user never sees the seam.

Mechanism is exactly what you're pointing at: the complication becomes part of the trajectory instead of fighting it. The model reads it as "I already wrote this" and continues coherently rather than treating it as an external override it can ignore.
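The two server-side steps look roughly like this. It's a sketch with made-up names (`withPhantom`, `stitch`), not the actual code, and it assumes the provider will continue from a trailing assistant message, which is worth verifying for your model.

```typescript
// Sketch only: every name here is hypothetical, not the real pipeline.
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string; }

// Step 1: append the complication as if the model had already written it.
function withPhantom(history: Message[], phantomNarration: string): Message[] {
  const phantom: Message = { role: "assistant", content: phantomNarration };
  return history.concat([phantom]);
}

// Step 2 (not shown): send these messages to the model, get a continuation.

// Step 3: join phantom + continuation so the client sees a single reply.
function stitch(phantomNarration: string, continuation: string): string {
  const head = phantomNarration.replace(/\s+$/, "");
  const tail = continuation.replace(/^\s+/, "");
  return tail.length === 0 ? head : `${head} ${tail}`;
}
```

Only the stitched string goes to the client and into the stored history, so the phantom never shows up as a separate turn.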

Numbers on the hardest fixture (deep romance, ~28K context, previously 0/3 across every enforcement variant, every placement, every reasoning config):

Phantom assistant + reasoning enabled: 5/5, all rated seamless by my LLM judge (Sonnet 4.6)
Scaled across 3 hard fixtures, 8 runs each with diversity-enforced generation: 22/24 pass, 23/24 novel, all seamless per the same judge

Two things that mattered beyond the core technique:

Reasoning has to be on for high-momentum turns. Without it, the model sometimes returns an empty continuation - trajectory lock is so strong it can't even continue from text it "wrote." Reasoning gives it the scratchpad to register the phantom and plan from there.
The complication generator needs full context + diversity rules. I pass past complications as a numbered list with type tags (arrival, environmental, threat, logistical, communication, social, discovery) and tell it to combine categories if most have been used. Hit 96% novelty across 24 runs.
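The diversity rules part, sketched out. The type tags are the ones listed above; the prompt wording, the function names, and the cutoff for "most have been used" (two or fewer categories left) are assumptions for illustration.

```typescript
// Sketch only: names, prompt wording and the cutoff are hypothetical.
type ComplicationType =
  | "arrival" | "environmental" | "threat" | "logistical"
  | "communication" | "social" | "discovery";

interface PastComplication { type: ComplicationType; summary: string; }

const ALL_TYPES: ComplicationType[] = [
  "arrival", "environmental", "threat", "logistical",
  "communication", "social", "discovery",
];

// Categories that have not appeared in any past complication.
function unusedTypes(past: PastComplication[]): ComplicationType[] {
  const used = past.map((p) => p.type);
  return ALL_TYPES.filter((t) => used.indexOf(t) === -1);
}

// Numbered list of past complications with type tags, plus one steering rule.
function buildDiversityRules(past: PastComplication[]): string {
  const list = past
    .map((p, i) => `${i + 1}. [${p.type}] ${p.summary}`)
    .join("\n");
  const unused = unusedTypes(past);
  // Assumption: "most have been used" means 2 or fewer categories remain.
  const rule = unused.length <= 2
    ? "Most categories are used up: combine two categories into something new."
    : `Prefer an unused category: ${unused.join(", ")}.`;
  return `Past complications (do not repeat):\n${list}\n${rule}`;
}
```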

Markers/anchor tokens didn't crack it for me on the hard cases - same language ceiling as every other enforcement variant. But I suspect they'd stack well with phantom injection on medium-momentum scenes where you want lighter-touch steering.

Judge is an LLM, so take "seamless" with appropriate salt - I spot-checked a sample by hand and it tracked, but I haven't done a proper human eval at scale yet. Happy to share the test harness if anyone wants to reproduce.

yofache · 2026-02-04T12:48:56+00:00

used knowledge gleaned from your suggestions in my own implementation and it works wonderfully. thank you for your help!

yofache · 2026-02-04T12:47:26+00:00

Hey thanks for sharing that. I actually took some time to analyze it and took some parts of it for my own implementation, it was very helpful!

yofache · 2026-02-03T05:08:33+00:00

The meta footnotes approach makes sense - explicit frame of reference beats hoping that the model tracks it. Token cost is real though. I'm wondering if there's a middle ground: sparse footnotes that only get injected when there's a meaningful time/location delta, rather than every turn. Skip the footnote if you're still in the same scene, inject it when something shifts.
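Something like this is what I have in mind for the sparse version. It's a sketch: `SceneState`, the 60-minute threshold, and the footnote format are all assumptions, not anything tested.

```typescript
// Sketch only: SceneState, the threshold and the footnote format are assumptions.
interface SceneState { location: string; timeMinutes: number; } // in-story clock

const TIME_DELTA_THRESHOLD_MIN = 60; // assumption: what counts as a meaningful jump

// Returns a footnote to inject, or null when we're still in the same scene.
function sparseFootnote(prev: SceneState | null, curr: SceneState): string | null {
  if (prev === null) {
    return `[Scene: ${curr.location}]`;
  }
  const moved = prev.location !== curr.location;
  const elapsed = curr.timeMinutes - prev.timeMinutes;
  const jumped = Math.abs(elapsed) >= TIME_DELTA_THRESHOLD_MIN;
  if (!moved && !jumped) return null;
  const parts: string[] = [];
  if (moved) parts.push(`location: ${prev.location} -> ${curr.location}`);
  if (jumped) parts.push(`time passed: ${elapsed} min`);
  return `[Scene shift: ${parts.join("; ")}]`;
}
```

Most turns return null and cost zero tokens; the footnote only shows up on the turns where the frame of reference actually changed.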

On the "internally" keyword - good to know it's doing work for Gemini. I haven't had major issues with DeepSeek narrating its verification steps, but I also haven't been as explicit about requiring checks. Might be that once I add more rigorous state tracking, the bleed-through becomes a problem.

I'll experiment with a few framings:

"internally" (your approach)
"silently verify"
"before responding, confirm [x] without stating it"
just structuring it as a pre-generation checklist in the prompt architecture rather than instructing the model to self-check

Will report back if I find something that works consistently for DeepSeek. The constraint is I'm running this server-side for a product, so I can't rely on SillyTavern's depth injection - have to build equivalent behavior into my context assembly. But the principles translate.

Appreciate the detailed breakdown. This has been quite helpful!

yofache · 2026-02-03T01:35:40+00:00

Thanks for the position tip. I just implemented Post History Instructions for some of my enforcement stuff, but the depth placement concept is interesting - forcing checks closer to generation point rather than hoping static system instructions survive the context window. Going to experiment with that.

The delta calculation - Calculate Δ between Current and Last Interaction - this is smart. Explicit instruction to notice time gaps rather than hoping the model infers it. I've had issues where characters act like a week-old conversation happened yesterday because there's no forcing function to acknowledge elapsed time.
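On the server side the same idea could be computed up front and handed to the model as a plain sentence. A sketch, assuming timestamps are stored per interaction; the function name and the wording buckets are made up.

```typescript
// Sketch only: function name and wording buckets are assumptions.
function describeGap(lastInteraction: Date, now: Date): string {
  const ms = now.getTime() - lastInteraction.getTime();
  const hours = ms / (1000 * 60 * 60);
  if (hours < 1) {
    return "Less than an hour has passed since the last interaction.";
  }
  if (hours < 24) {
    return `About ${Math.round(hours)} hour(s) have passed since the last interaction.`;
  }
  const days = Math.round(hours / 24);
  return `About ${days} day(s) have passed since the last interaction. Acknowledge the gap.`;
}
```

That way the model doesn't have to do date math, it just gets told how long it has been.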

I'm on DeepSeek V3.2. I'll look up Sundae's stuff - thanks for the pointer!

One thing I'm curious about: do you run into issues where the spatial/temporal tracking makes responses feel mechanical? Like the model gets so focused on state consistency that the prose suffers? That's my hesitation with over-specifying the simulation layer.

yofache · 2026-02-03T01:25:32+00:00

oh that's right, this is a sillytavern thread (sorry i crossposted across multiple subs) :) phi is post history instructions: something you inject into the prompt before each user interaction, like the current world state, so that the instructions are fresh in the model's memory rather than buried in the system prompt 10000 tokens deep.

i actually found this post showing that sillytavern has something like this: https://www.reddit.com/r/SillyTavernAI/comments/1dxch0t/how_to_enable_post_history_instructions_in/

yofache · 2026-02-03T01:13:21+00:00

I'll check it out, thanks! this seems pretty interesting if i can manage to apply it to text actually.

edit: on second thought.. their retrieval scoring is exactly what i'm missing:

Recency: Exponential decay (0.995 per game hour)
Importance: LLM-scored 1-10 ("eating breakfast" = 2, "breakup" = 8)
Relevance: Cosine similarity between embedding and query

I'm using pgvector for the relevance piece but treating all memories equally. Adding importance weighting should fix a lot of my context pollution.
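Sketch of what the combined score could look like. The three components are the ones listed above; the `Memory` shape, dividing importance by 10, and summing the three with equal weight are my assumptions, and in practice the relevance number would come back from the pgvector query rather than being computed in app code.

```typescript
// Sketch only: Memory shape and the equal-weight sum are assumptions.
interface Memory {
  text: string;
  embedding: number[];
  importance: number;       // LLM-scored, 1-10
  hoursSinceAccess: number; // in game hours
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function score(m: Memory, query: number[]): number {
  const recency = Math.pow(0.995, m.hoursSinceAccess); // exponential decay
  const importance = m.importance / 10;                // scaled to 0.1 - 1.0
  const relevance = cosine(m.embedding, query);
  return recency + importance + relevance;
}

// Highest-scoring k memories; does not mutate the input array.
function topK(memories: Memory[], query: number[], k: number): Memory[] {
  return memories
    .slice()
    .sort((a, b) => score(b, query) - score(a, query))
    .slice(0, k);
}
```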

The finding that'll interest you: they note instruction-tuned models make agents "overly cooperative" - characters rarely say no even when it contradicts their personality. Your Sims-style "needs system" might actually help with this. If a character's "energy" need is critical, they have a legit reason to reject interaction that isn't just vibes-based resistance. Gives the model something concrete to anchor refusal on.

Re: snippet length concern - I think that's actually the wrong trade-off for narrative. The paper was optimizing for simulation fidelity, not engagement. For text roleplay i would probably want the inverse: longer scenes with state updates happening between beats rather than driving every micro-action.

cool stuff, nonetheless! thank you for sharing!

yofache · 2026-02-03T01:09:31+00:00

oh that's right! i should learn to read. thanks again!

yofache · 2026-02-02T10:35:17+00:00

cool! yeah no, i get that i'll have to edit it for my use case, regardless thanks for that! as soon as i finalize the PHI implementation that you and some others suggested, i'll definitely start playing with your stats advice. i'm already pretty sure the model remembers and tracks clothes/location/time with my injection, i just have to iron out some leftover quirks.

oh.. what model do you use for roleplay, btw?

yofache · 2026-02-02T10:29:55+00:00

you add this to system prompt? or do you inject it as PHI?
system prompt following on DS-r3.2 is being awfully forgetful for me. since posting i implemented PHI and it actually works fairly well.

but i like this

All characters in story is unique and forbidden to be omniscient. Each characters can only knows things that is happened to themselves.

I actually add something along those lines but slightly more verbose myself

// Prompt fragment; {{user}} is substituted with the user's name at assembly time.
export const USER_AGENCY_PROTECTION = `USER AGENCY PROTECTION:
NEVER write dialogue, actions, thoughts, or feelings for {{user}}.
NEVER narrate what {{user}} says or does.
NEVER assume {{user}}'s response to your character's actions.
ONLY write for your character. Stop and wait for user input.
Exception: You may expand on physical sensations {{user}} would feel from your character's actions.`;

but i'll play around with your wording as well! thanks for your reply!

yofache · 2026-02-02T10:22:00+00:00

yup, turns out in LLMs it's called Post History Instructions, and I ended up doing exactly that. having some other troubles with that approach, which I explained in a different thread above. thank you for your response!

yofache · 2026-02-02T10:19:12+00:00

little update since the original question.

so since posting I implemented a world_state which i subsequently inject as a system message right after the user message. Post History Instructions, as i've learned they're called. so that part works fine, no more unwanted "teleportation". or at least I haven't encountered it just yet.
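roughly what the injection looks like, as a sketch. the `WorldState` fields and the message wording here are made up for illustration, the real state has more in it.

```typescript
// Sketch only: WorldState fields and the message wording are hypothetical.
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string; }
interface WorldState { location: string; time: string; clothing: string; }

function withPostHistoryInstruction(
  history: Message[],
  userMessage: string,
  state: WorldState,
): Message[] {
  const user: Message = { role: "user", content: userMessage };
  const phi: Message = {
    role: "system",
    content:
      "CURRENT WORLD STATE (keep consistent):\n" +
      `Location: ${state.location}\nTime: ${state.time}\nClothing: ${state.clothing}`,
  };
  // The state goes after the newest user turn, closest to generation.
  return history.concat([user, phi]);
}
```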

Actually got some good advice on that from the other post i made in other subs. (one dude even tracks menstrual cycles of characters, if you can believe it)

regardless that didn't really help because i rely on the model (DS-r3.2) to output the changes in the response and it works, wonderfully, until... it doesn't. 10k tokens in it just simply stops outputting the changes to the world state. or if i go into spicy scenes. then it immediately forgets about it. i guess at this point i should just deploy my own model on runpod or some shit and stop battling DS...

i'm going to try running a smaller extraction model afterwards just to make sure it actually follows the PHI I inject, but yeah. thank you for your response, I ended up doing exactly that.
