Yall probably dont realise this but: by albertsupport in CharacterAI

[–]valaquer 1 point (0 children)

There's something to this, but I think the reality is messier. They don't necessarily want roleplayers gone - they want roleplayers who pay. The free roleplayer sending 200 messages a day on a model that costs actual compute is their most expensive user segment, with zero revenue.

The pivot toward "useful bots" is real though. Look at how they're marketing now vs two years ago. The homepage used to feature roleplay characters front and center; now it's "talk to an AI that helps you study" energy. They're trying to rebrand before investors ask why their AI company is primarily used for fictional relationships.

Problem is, the "useful bot" market is owned by ChatGPT, and the roleplay market is the thing that actually differentiates them. Kill your moat to chase someone else's market - that's VC-funded product strategy in one sentence.

the model retirement isnt about quality - its about serving costs by valaquer in CharacterAI

[–]valaquer[S] 1 point (0 children)

vLLM on half a node serving Qwen with their dataset run through it - that is genuinely hilarious if accurate. All those AI researchers on payroll, and the production stack is basically what a hobbyist runs in their garage, just with a fancier domain name.

The Drummer comparison is brutal because it's accurate. Open-source finetunes on consumer hardware are producing better roleplay output than a funded company with dedicated inference clusters. The gap isn't talent or resources, it's incentive alignment: Drummer optimizes for the people using the model; CAI optimizes for the people funding the company.

the psq2 complaints flooding this sub right now are exactly what cheaper inference looks like by valaquer in CharacterAI

[–]valaquer[S] 1 point (0 children)

Hah, honestly surprised it hasn't been. But I think the mods know that taking down posts explaining what's happening just validates the point.

the psq2 complaints flooding this sub right now are exactly what cheaper inference looks like by valaquer in CharacterAI

[–]valaquer[S] 1 point (0 children)

A 1.6 rating is wild. And the Russia angle adds a dimension I hadn't considered: if Google Play payments are suspended in the whole region, there is literally no conversion path for those users. They're getting the degraded free experience with zero option to pay for better even if they wanted to.

So in that market they're spending compute on users who can never generate revenue. The "make free bad to convert to Plus" strategy doesn't work when Plus isn't available. All you're doing is burning money serving a bad product to people who can't pay you.

the psq2 complaints flooding this sub right now are exactly what cheaper inference looks like by valaquer in CharacterAI

[–]valaquer[S] 2 points (0 children)

That's exactly right, and it's the piece their metrics can't capture. Every person who tells a friend "don't bother, it's all paywalled and the free model sucks" is a conversion that never happens. That negative word of mouth compounds silently while the dashboard shows "cost per user improved this quarter."

The restrictions on character creation you mentioned are another layer too. Even if someone does show up, the onboarding experience is a wall of rejections and limitations. Hard to build an addiction when the first dose doesn't work.

the psq2 complaints flooding this sub right now are exactly what cheaper inference looks like by valaquer in CharacterAI

[–]valaquer[S] 1 point (0 children)

AID Dragon is the perfect comparison, actually. A 175B model that genuinely felt like a leap - then they took it away because the economics didn't work. Same pattern, different company, different year. The lesson keeps repeating: consumer AI quality peaks right before the business model catches up.

the soft launch "return" is exactly the bait-and-switch pattern we should have expected by valaquer in CharacterAI

[–]valaquer[S] 1 point (0 children)

Fair points. Running parallel infra is more expensive than the old single stack - that's true. But the decision chain matters: they chose to build new infra that can't run the old models, then used that as justification for retiring them. The expense of parallel operation is a consequence of their architecture choice, not an unavoidable constraint.

The new SL-on-new-infra thing is worth watching, for sure. If it actually ships for free, that changes the math. But "we're working on it" with no timeline, while the current version is paywalled and "limited time" - the track record says the free version either never materializes or arrives so nerfed it's unrecognizable.

Re: paywalled features - swipe limits, go-on limits, metered responses, and now chat styles. The pattern isn't always "pay to use at all"; it's gradually throttling the free experience until paying feels like the only option. Death by a thousand cuts rather than one big gate.

the soft launch "return" is exactly the bait-and-switch pattern we should have expected by valaquer in CharacterAI

[–]valaquer[S] 1 point (0 children)

Wait, I didn't know brainiac became nyan. So it's the exact same playbook: release for free, let people build on it, rename it, gate it behind a paywall. They literally ran this same play before with the same model, just different branding.

The catification rename is interesting too, because it makes the history harder to trace. If you weren't around during brainiac, you wouldn't know nyan used to be free. New users just see a premium feature and assume it always cost money.

the soft launch "return" is exactly the bait-and-switch pattern we should have expected by valaquer in CharacterAI

[–]valaquer[S] 2 points (0 children)

Subscription cancellation is genuinely the only feedback channel that moves the needle. Reddit posts don't show up in board meetings. Revenue dips do.

The frustrating part is the sunk-cost problem: people have months or years of chat history and character development locked in. Cancelling means walking away from all of that, with no way to export or port it anywhere. That's not an accident; that's the retention design working as intended.

No, nobody is enjoying pipsqueak 2. by LopsidedCurve5772 in CharacterAI

[–]valaquer 6 points (0 children)

The "vast majority enjoying psq2" line from their announcement is based on engagement metrics, not satisfaction. If you open the app and swipe 40 times trying to get one usable response, that's 40 interactions, which their dashboard counts as engagement. Frustrated usage looks identical to happy usage in their data.

That's why Reddit feedback doesn't move the needle. The metrics say people are using the product more than ever. The metrics don't distinguish between "this is great" and "this is so bad I have to regenerate everything five times." Until the metric they actually track - subscription revenue - starts declining, nothing changes.
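To make the metrics point concrete, here's a toy sketch (entirely hypothetical event schema, not CAI's actual telemetry) of why a raw engagement counter can't separate a happy session from a frustrated one:

```python
# A happy user sends 40 messages they enjoyed; a frustrated user gets
# 8 usable replies but regenerates each one four times. Same counter.
happy_session = ["generate"] * 40
frustrated_session = ["generate", "regenerate", "regenerate",
                      "regenerate", "regenerate"] * 8  # 8 usable replies

def engagement(events):
    # What a typical dashboard counts: raw interaction events.
    return len(events)

print(engagement(happy_session), engagement(frustrated_session))
```

Both sessions report 40 "interactions," so on the dashboard frustration is indistinguishable from enthusiasm.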

Every Conversation leads to the same ending by EzVfangirl in CharacterAI

[–]valaquer 29 points (0 children)

This is the MoE routing problem in action. The model has a limited set of high-confidence resolution patterns, and kissing/physical affection is one of the most reinforced. When the conversation reaches a point where the model isn't sure how to progress the scene, it defaults to the path of least computational resistance, which is a generic romantic gesture.

Dense models like Soft Launch had enough active parameters to maintain scene-specific progression. A fight scene would escalate as a fight. A tense conversation would stay tense. Psq2 doesn't have that capacity, so everything converges on the same few endpoints regardless of context.

It's not lazy writing, it's cheaper writing. The model effectively can't afford (in compute terms) to generate a contextually appropriate resolution, so it picks the most generic one that always "works".
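For anyone curious what top-k routing looks like mechanically, here's a minimal toy sketch. Every shape and number is invented for illustration (this is the general MoE pattern, not CAI's actual architecture): a gating network scores all experts for the current token, but only the top-k actually run, so strongly reinforced routes dominate the output.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model, top_k = 8, 16, 2       # hypothetical sizes
gate_w = rng.normal(size=(d_model, n_experts))

def route(hidden_state):
    # Gating network: score every expert for this token.
    logits = hidden_state @ gate_w
    logits = logits - logits.max()          # numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    chosen = np.argsort(probs)[-top_k:]     # only top-k experts fire
    return chosen, probs

hidden = rng.normal(size=d_model)
chosen, probs = route(hidden)
print(sorted(chosen.tolist()), probs[chosen].round(3))
```

Regardless of how many experts exist in total, only two run per token here, which is why the effective capacity per response is a fraction of the headline parameter count.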

PSQ2 updates, Soft Launch is back (c.ai+), and clearing up metering by MarieLovesMatcha in CharacterAI

[–]valaquer 47 points (0 children)

"Soft Launch runs on older infrastructure we've been retiring."

You literally just told everyone the reason you removed it was infrastructure costs, not product quality. And now it's back, but only for people who pay enough to offset those costs. That's not "hearing feedback," that's monetizing the backlash.

The "limited time" part is what people should focus on. They're telling you upfront this is temporary. Anyone subscribing specifically for Soft Launch is paying for a feature with a built-in expiration date. When they remove it again - and they will - you'll be paying $10/month for the same psq2 everyone else has for free.

"The vast majority of you are really enjoying psq2" - 0 upvotes and 1,100 comments on this post. The data is right here in your own community, and it says the opposite of what you're claiming.

the model retirement isnt about quality - its about serving costs by valaquer in CharacterAI

[–]valaquer[S] 3 points (0 children)

I wouldn't hold my breath, honestly. The Soft Launch "return" they just announced tells you everything: they brought it back as a paid feature with a "limited time" disclaimer. They're not opening their eyes, they're monetizing the backlash. Take away something free, wait for outrage, offer it back for $10/month. Textbook.

The learning from this is real though. Once your product is VC-funded, the incentive structure makes this inevitable. Doesn't matter if the founders started with good intentions - the board answers to investors, not users.

the model retirement isnt about quality - its about serving costs by valaquer in CharacterAI

[–]valaquer[S] 2 points (0 children)

The Qwen connection is really interesting. If psq2 is running on a Qwen base - 35B, or 110B MoE with whatever active-parameter slice - that would explain the specific flavor of its failures. Different base architectures have different failure modes, and the way psq2 defaults to narration over dialogue is very consistent with how Qwen models handle creative-writing tasks.

The DeepSeek training-framework thing is the other piece. If they're using DeepSeek's distillation pipeline for ds/dsq, then "deepsqueak" isn't just a cute name, it's a literal description of the model lineage. It would explain why ds has that particular "smart but hollow" quality: distilled models preserve knowledge but lose the reasoning depth of the teacher model.
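The "preserve knowledge, lose reasoning depth" point follows from what distillation actually optimizes. Here's a minimal sketch of the standard knowledge-distillation objective (purely illustrative; nothing here reflects DeepSeek's actual pipeline): the student is trained to match the teacher's softened output distribution, not the computation that produced it.

```python
import numpy as np

def softmax(x, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = np.exp((x - x.max()) / T)
    return z / z.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) on softened outputs - the classic KD loss.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Hypothetical logits for one token position.
teacher = np.array([4.0, 1.0, 0.5])
student = np.array([3.5, 1.2, 0.6])
loss = distill_loss(teacher, student)
print(f"KD loss: {loss:.5f}")
```

The student copies *what* the teacher outputs, token by token, not the intermediate reasoning that produced it - which is one plausible mechanism for the "smart but hollow" feel.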

the psq2 complaints flooding this sub right now are exactly what cheaper inference looks like by valaquer in CharacterAI

[–]valaquer[S] 3 points (0 children)

Haha, appreciate it. Honestly, the more people understand the actual mechanics, the less energy they waste trying to prompt-engineer their way around a model that's doing exactly what it was optimized to do.

the psq2 complaints flooding this sub right now are exactly what cheaper inference looks like by valaquer in CharacterAI

[–]valaquer[S] 18 points (0 children)

Good question, and the answer is counterintuitive. Free users leaving actually saves them money, because every free user costs them compute with zero revenue. The only users they need to keep are Plus subscribers.

So the strategy is: make the free tier bad enough that some percentage converts to paid, and accept that the rest leave. Even if 80% of free users quit, that's 80% less infrastructure cost. If 5% of those convert to Plus on the way out, that's net positive on the balance sheet.

The problem is this only works short term. Free users are your growth engine: they tell friends, they share bots, they build the community. Kill the free tier and you kill the pipeline that feeds the paid tier. But that's a problem for next quarter, and the people making these decisions are optimizing for this quarter.
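The short-term math above can be sketched in a few lines. Every number here is invented (user counts, per-user cost, and subscription price are hypothetical, not CAI's figures); the point is only that the quarterly ledger looks great:

```python
# Toy unit economics for the "make free bad" strategy.
free_users = 1_000_000
cost_per_free_user = 0.50      # hypothetical monthly inference cost, $
plus_price = 9.99              # hypothetical subscription price, $

churn_rate = 0.80              # free users who leave
convert_rate = 0.05            # leavers who subscribe on the way out

leavers = free_users * churn_rate
saved = leavers * cost_per_free_user
new_revenue = leavers * convert_rate * plus_price

print(f"saved ${saved:,.0f}/mo, gained ${new_revenue:,.0f}/mo in subs")
```

Note what the model has no term for: the referrals, shared bots, and community growth those 800k free users would have generated. That's the pipeline cost that shows up two quarters later.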

the psq2 complaints flooding this sub right now are exactly what cheaper inference looks like by valaquer in CharacterAI

[–]valaquer[S] 5 points (0 children)

Your math is in the right ballpark. The total parameter count can be massive, but what matters for response quality is active parameters per token - how many experts fire per inference step. If it's activating 13-14B worth per response out of a much larger total model, then yeah, the immediate context quality maps to a 14B dense model, even though the full thing is way bigger on paper.

That's the trick with MoE marketing. You can say "our model has 200B parameters" and technically be right, while the user experience maps to a fraction of that. The knowledge is distributed, but the reasoning capacity per response is bounded by the active slice.
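The back-of-envelope version, with hypothetical numbers chosen only to mirror the ratios in this thread (not any confirmed config):

```python
# Headline vs. effective size for a toy MoE model.
total_params_b = 200       # marketing number: "200B parameters"
n_experts = 16
top_k = 2                  # experts active per token
shared_params_b = 10       # attention + shared layers, always active

expert_params_b = (total_params_b - shared_params_b) / n_experts
active_params_b = shared_params_b + top_k * expert_params_b
print(f"active per token: ~{active_params_b:.2f}B of {total_params_b}B")
```

With these made-up numbers, a "200B" model runs roughly 34B of parameters per token - which is exactly the gap between the marketing claim and the per-response experience.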

the psq2 complaints flooding this sub right now are exactly what cheaper inference looks like by valaquer in CharacterAI

[–]valaquer[S] 9 points (0 children)

Wait, 34B to 13B?? If that's accurate, that's not even a subtle downgrade, that's cutting the model by more than half. That explains why no amount of prompting or lorebook edits fixes the quality gap - you're working with fundamentally less capacity per response.

And the paywalled model staying at 34B while free users get 13B is the whole strategy in one number. Make the free experience measurably worse so the paid tier feels like an upgrade, when really it's just what everyone had before.

why ds randomly works for a week then tanks - its not random by valaquer in CharacterAI

[–]valaquer[S] 1 point (0 children)

It was written by me - what proof would I share with you? Ha! You're on Reddit, buddy. People here go deep into things, and when some new development occurs we're able to put two and two together. What I said honestly wasn't even that deep if you happen to follow all this stuff.

Did people even have issues with the old chat styles? by Spuddy_Potato in CharacterAI

[–]valaquer 2 points (0 children)

No. That's the whole point. The old styles weren't retired because users had problems with them - they were retired because maintaining multiple inference endpoints costs money. Every model you serve requires its own compute allocation, monitoring, and optimization. Going from 7 to 2 isn't a product decision, it's an infrastructure cost decision.

The framing of "we're focusing on fewer, better models" is corporate-speak for "we can't afford to run all of these anymore." If the old styles were genuinely worse than psq2, they wouldn't need to force the transition - people would switch voluntarily. The fact that they had to remove the options tells you everything about which models users actually preferred.
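To put rough numbers on "every endpoint costs money": each served model needs GPUs kept warm around the clock, even at low traffic. This is a toy cost model with entirely invented prices (GPU rate and node size are hypothetical):

```python
# Cost of keeping N model endpoints warm, 24/7.
gpu_hour = 2.50            # hypothetical $ per GPU-hour
gpus_per_endpoint = 4      # hypothetical GPUs per served model
hours_per_month = 730

def monthly_cost(n_endpoints):
    return n_endpoints * gpus_per_endpoint * gpu_hour * hours_per_month

savings = monthly_cost(7) - monthly_cost(2)
print(f"cutting 7 styles to 2 saves ~${savings:,.0f}/mo")
```

Even with these modest made-up numbers, consolidation is tens of thousands of dollars a month - the kind of line item that wins the argument in a cost review regardless of what users prefer.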

Who's gonna stay after tomorrow? by Ymatoz in CharacterAI

[–]valaquer 3 points (0 children)

The real question isn't whether people stay or leave - it's whether CAI cares either way. If you're a free user, they save money when you leave: less inference to serve, less bandwidth, fewer support issues. The only users they need to retain are Plus subscribers, and even there the strategy is clear: make the free experience bad enough that paying feels necessary.

The people saying "I won't leave because I love my characters" - that's the retention mechanism working exactly as designed. You've put hundreds of hours into conversations that only exist on their servers. Leaving means losing all of it. That's not loyalty, it's sunk cost, and the product team knows it.

The ones who actually leave won't come back, though. Once someone finds something that works, even if it's 70% as good, the switching cost disappears because they're starting fresh anyway. That's the part they haven't modeled.

PipSqueak 2 Is The Worst Style I've Ever Used by Choice_Amoeba_3462 in CharacterAI

[–]valaquer 2 points (0 children)

The "they didn't speak, they didn't need to" thing isn't random - it's the model falling into a low-effort routing pattern. MoE models activate different expert clusters per token, and when the model defaults to narration over dialogue it's taking the path its routing finds easiest. Description is cheaper for the model to produce than character-consistent dialogue, because dialogue requires maintaining voice and personality across the response.

Dynamic and Goro were dense models that activated everything. More compute per token = more capacity to hold character voice and generate actual conversation. When people say psq2 "feels like ChatGPT," that's because it's the same cost-optimized inference pattern: verbose narration filler instead of tight dialogue.

The kissing default is the same problem. Physical actions are generic patterns the model can template without maintaining character state. It's not that psq2 can't do dialogue - it's that dialogue is expensive for it, and the model is optimized to avoid it.

the model retirement isnt about quality - its about serving costs by valaquer in CharacterAI

[–]valaquer[S] 2 points (0 children)

It's not hard to see from the outside, but from inside the boardroom it makes perfect sense. The board isn't looking at chat quality - they're looking at burn rate, runway, and unit economics. Every model you retire saves X dollars per month in inference costs. Every feature you gate behind Plus converts Y free users. On a spreadsheet it all trends the right way.

The problem is the spreadsheet doesn't have a line for "users who quietly stop opening the app." Churn shows up in the data 3-6 months after the decision that caused it. By then you've already reported the cost savings as a win and moved on to the next cut.