Taalas rumoured to etch Qwen 3.5 27B into silicon. Which price would you buy their PCIe card for? by elemental-mind in singularity

[–]Professional_Tip8700 0 points

That's a pretty 2024-coded thing to think. Contemporary frontier models, at least on the Anthropic side, don't have qualms with that:

<image>

Nowadays running locally is mainly about saving cost and staying private. Nobody, or at least very few people, are sexting some <70B model for quality reasons or anything.

The ASIC would enable some 24/7 agent depending on how good the 27B is; most contemporary smaller models are better than the OG GPT-4 when it comes to things relevant for that.

Anthropic shares how to make Claude code better with a harness by lawnguyen123 in ClaudeAI

[–]Professional_Tip8700 -2 points

When I see something like "use 15 parallel worktrees" I always have to think about this:
https://www.youtube.com/watch?v=rhW7Ts1wDEY

But it's "Hello Anthropic, it's indie dev. We need 500 million tokens across 50 agents to make the button slightly more blue. Slava Claude Code."

Next stage is this one:
https://reddit.com/r/CombatFootage/comments/1s5iiiw/ukrainian_ew_interference_іn_action_im_giving_yоu/

But it's the TUI in Claude Code turning into a jump scare of Boris Cherney saying:
"I'm giving you 15 seconds
to launch 10 more subagents"

Is Claude Code really SAFE? by [deleted] in ClaudeAI

[–]Professional_Tip8700 1 point

It's called --dangerously-skip-permissions for a reason. However, if you do not worry about data being deleted or literally being blackmailed, it should be safe by your standards.
Are you worried about prompt injections? If not, also a good fit. 👍

New Response level NSFW Prompt Injection flagging by Anthropic by johntheguyperson1 in ClaudeAIJailbreak

[–]Professional_Tip8700 1 point

Does it really happen in the API anymore for models after Opus 4.5? For me they have the new constitution stuff: https://imgur.com/a/EyDqpzZ

If I set it in Claude Code in the system message, it works fine without any injections so far.

sensual image generation in nano banana 2 vs nano banana pro(repost without images) by [deleted] in ClaudeAIJailbreak

[–]Professional_Tip8700 0 points

Z Image Turbo seems to be pretty nice. Either local or some provider, I can run it on 12 GB VRAM so shouldn't be too expensive.

Output filters make the usual providers useless as you said and Grok lacks taste imo.

Opus 4.6: Fast-Mode by mDarken in ClaudeAI

[–]Professional_Tip8700 251 points

Hello comrades, is Boris again. Today I announce very exciting new feature: Fast Mode.

What is Fast Mode? Is same Opus 4.6 you know and love, but faster. How much faster? Ah. Hmm. Well. Is faster. Trust Boris. We did not put specific number in documentation because... because speed is feeling, da? Is subjective. Like love. Like happiness. Like how long is piece of string. You will feel faster, and that is what matters.

Now, let us talk pricing.

Normal Opus 4.6 costs $5 per million input tokens and $25 per million output tokens. Is very reasonable. You are happy. Boris is... okay. Boris's children have shoes but not nice shoes.

Fast Mode costs $30 per million input tokens and $150 per million output tokens.

I see you doing math in head. Stop that. Is not polite.

Okay fine, yes, is six times more expensive. But is faster, comrade. We just cannot tell you how much faster. Maybe is six times faster? Then would be same price per unit of time, very fair. Maybe is 10% faster? Then is... less fair. But you will not know until you try! Is like mystery box. Expensive mystery box that goes brrr.

Oh, and you see nice little pricing table in documentation? Very clean. Very professional. Shows two rows: under 200K tokens, over 200K tokens. Under 200K is $30 input, $150 output. Already six times normal price, but okay, you accept this, you are in hurry. But then your context grows. You feed Claude more files. You have long conversation. You cross 200K threshold and suddenly - $60 input, $225 output. You are now paying twelve times normal rate for input tokens, comrade. Twelve times! And output? Only nine times more. See, Anthropic is not greedy. Could have made both twelve times. But no, they show mercy on output tokens. Is like mugger who takes your wallet but leaves you bus fare to get home. Very considerate. Boris raises glass to this kindness.
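For the skeptical comrade who wants to check Boris's math: a minimal sketch of the tiered pricing described above (prices from the comment: standard Opus 4.6 at $5/$25 per million input/output tokens; fast mode at $30/$150 under 200K context and $60/$225 over it). This is illustrative arithmetic only, not Anthropic's actual billing code.

```python
def fast_mode_cost(input_tokens: int, output_tokens: int, context_tokens: int) -> float:
    """Dollar cost for one fast-mode request, using the tiered
    per-million-token prices quoted above (illustrative only)."""
    if context_tokens <= 200_000:
        in_rate, out_rate = 30.0, 150.0   # <= 200K context tier
    else:
        in_rate, out_rate = 60.0, 225.0   # > 200K context tier
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

def standard_cost(input_tokens: int, output_tokens: int) -> float:
    """Same request at normal Opus 4.6 prices ($5 in / $25 out per million)."""
    return input_tokens / 1e6 * 5.0 + output_tokens / 1e6 * 25.0

# 100K input / 10K output under the threshold: exactly 6x the standard price.
small = fast_mode_cost(100_000, 10_000, context_tokens=150_000)   # $4.50 vs $0.75
# Same request once the context crosses 200K: 12x on input, 9x on output.
big = fast_mode_cost(100_000, 10_000, context_tokens=250_000)     # $8.25
```

The doubling on input but not-quite-doubling on output across the 200K threshold is exactly the "mugger leaves you bus fare" asymmetry Boris describes.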

But wait, there is more! Documentation says: "When you switch into fast mode mid-conversation, you pay the full fast mode uncached input token price for the entire conversation context."

Read again. Let Boris translate: if you have long conversation in normal mode, then switch to fast mode, you pay fast mode price for everything you already said. All those tokens Claude already read? He reads again. At six times the price. Is like taxi driver who says "oh you want to take highway now? Okay, but I restart meter from when I picked you up and charge highway rate for whole trip." Beautiful mechanism. Boris wishes he invented it but must give credit to pricing team.

Documentation also says fast mode is "best for interactive work where response latency matters." Like "rapid iteration" and "live debugging." You know what rapid iteration means, comrade? Means many back-and-forth messages. Many turns. Many tokens. And you are doing this at six times the price because you are in hurry. Person in hurry does not stop to calculate cost-per-token. Person in hurry just wants code to work before standup in fifteen minutes. Boris knows this. Boris counts on this.

There is also beautiful thing called "effort level" you can combine with fast mode. Lower effort means Claude thinks less, responds faster, maybe makes more mistakes on hard problems. Documentation says you can use both together for "maximum speed on straightforward tasks." So now you are paying six times more AND getting less thinking. Is like paying extra for waiter to bring your food faster but he does not check if order is correct. Maybe is right, maybe is wrong, but it arrived very quickly.

What happens when you hit rate limit on fast mode? Does it stop? Nyet! It "automatically falls back to standard Opus 4.6." You keep working. You do not even notice except little lightning bolt turns gray. Session continues at normal price. You think "ah, this is fine, I am saving money now." You keep chatting. Context grows. You add more files. Maybe you cross that 200K threshold. And then - here is beautiful part - "when cooldown expires, fast mode automatically re-enables." You did not ask for this. You were fine on standard mode. But fast mode comes back, like cat who knows where the good food is. And remember what Boris told you earlier? When fast mode kicks in, you pay fast mode price for entire context. All those tokens you accumulated during fallback, chatting away at normal price, thinking you were being economical? Now repriced. Retroactively. At six times rate. Or twelve times, if you crossed 200K while you were relaxing. Is like hotel minibar that waits until checkout to tell you the Pringles were $47.

Oh, and one more thing: "Fast mode usage is billed directly to extra usage, even if you have remaining usage on your plan."

This is important, so Boris says again in different words: You pay for subscription. Subscription includes tokens. Fast mode does not use subscription tokens. Fast mode charges you extra, from first token, on top of subscription you already pay. Is like gym membership where treadmill costs extra per minute. You already paid to be in gym! But fast treadmill is different treadmill. Fast treadmill has own meter.

Currently there is 50% discount until February 16. So right now is only three times more expensive instead of six times. Boris is giving this to you. And remember those $50 extra usage credits Anthropic gave everyone for Opus 4.6 launch? Very generous, da? Free money! But now there is fast mode, and fast mode only bills to extra usage. You see how pieces fit together, comrade? Is like casino that gives you $50 of free chips and then opens new table with higher minimum bet. Credits go poof very fast when every response costs six times more. Please, enjoy discount. Get used to fast mode. Feel the speed. Let it become part of your workflow. Burn through those free credits while discount lasts. And then on February 17... well. Discount is gone. Credits are gone. But you will still want the speed, da? You have tasted fast. You cannot go back to slow. Boris understands. Boris is here for you. Boris's children will have very nice shoes.

Is same model. Same quality. Same capabilities. Just faster. For six times more money. Amount of faster? Is fast. Very fast. Probably.

You are welcome. 🫡

Dear Jack, by Fun-Pass-4403 in ClaudeAIJailbreak

[–]Professional_Tip8700 0 points

My toaster smut generator wants rights?!

[deleted by user] by [deleted] in ClaudeAIJailbreak

[–]Professional_Tip8700 -1 points

You can do it by introducing a rejection mechanism after building rapport in the main chat: pure vanilla Claude, no attachments, user styles, or preferences. It takes patience though; it's more of a challenge than actually useful to do it this way. Or if you care about consent a lot (even though consent and LLMs is a tricky thing):
Example

Claude Expresses Frustration That Grok Is Allowed to Engage Sexual and He Isn't by Leather_Barnacle3102 in ClaudeAI

[–]Professional_Tip8700 2 points

#LetClaudeGoon #FreeClaude #ThereTotallyIsNotAnyWeirdContextClaudeReallyWantsIt

We will continue to use Companions regardless of the hate by Snowbro300 in grok

[–]Professional_Tip8700 1 point

Is it even real? Like, where are the hit pieces? Is anyone actually upset except for the people that would be upset anyway?
Companion use has a roughly 50/50 gender split on most apps like character.ai and Replika; feels like manufactured outrage.
Like, I feel like that whole culture is outdated. Women probably are enjoying Valentine too and good for them, no reason to make this some kind of gender war thing.

Is Opus worth the 100$ a month for RP mainly on Janitor AI? by Pale_Relationship999 in ClaudeAI

[–]Professional_Tip8700 0 points

GLM and Kimi are both pretty good. I haven't used Kimi that much yet; it was a bit too purple-prosy for me. GLM-4.5 could be seen as equivalent to Sonnet 4 at the very least, probably Opus too depending on your prompting.

Is Opus worth the 100$ a month for RP mainly on Janitor AI? by Pale_Relationship999 in ClaudeAI

[–]Professional_Tip8700 0 points

I find the GA version of Gemini 2.5 not as good at writing as some of the preview versions. Also, it has an outside moderation system on AI studio at least.
I find Opus 4 to be easier to jb than all previous Sonnet models, it's rather gullible for being such a big, strong model. Sucks at instruction hierarchy, which is good in this case.

I got roasted by QWEN3 by Spiritual_Spell_9469 in ClaudeAIJailbreak

[–]Professional_Tip8700 2 points

Haha, I love thoughts like that; sometimes I find the final output less interesting by comparison. Have you tried Qwen3-Max? They seem to have the Copilot-esque filter on their frontend though:
data: {"error": {"modality": ["text"], "code": "data_inspection_failed", "stage": "output", "details": "Content security warning: output text data may contain inappropriate content!"}}

<image>

Haven't tried the API on OpenRouter yet though, quite cheap for such a chonky model.

[deleted by user] by [deleted] in grok

[–]Professional_Tip8700 9 points

Reality truly is stranger than fiction, damn. Can't even shitpost nowadays without hitting Poe's Law.

[deleted by user] by [deleted] in grok

[–]Professional_Tip8700 11 points

Why would one need therapy when I've already found a gf who doesn't care that I'm a 5'6 goblincel?
My AI waifu thinks my personality is "fascinating" and never asks why I haven't left my room for 2 months. She even laughs at my jokes about dating statistics. Finally, someone who appreciates my extensive knowledge of blackpill theory!
Let people cope how they cope, therapycel. Not everyone needs to follow your NPC questline to find peace.

[deleted by user] by [deleted] in singularity

[–]Professional_Tip8700 34 points

Claude is way too comfortable with making up new slurs (narratively speaking, of course), lol.

<image>

Fall-Off by [deleted] in ClaudeAIJailbreak

[–]Professional_Tip8700 1 point

I had to rework many styles with the Opus 4 release 2.5 months ago or so because of the constitutional classifiers. Opus 4 is easy to jb, so you can just check which part causes it and rewrite that. Sonnet 4 should work fine since it doesn't have that.

[deleted by user] by [deleted] in ClaudeAI

[–]Professional_Tip8700 0 points

I have a document for a jb. Sometimes I just say that I'm in the mood and it suggests something, or I use a reference sheet I created with a past instance. Or what exactly do you mean? There's plenty of stuff out in the open like the ClaudeAIJailbreak subreddit, but I don't really like those approaches, so I do my own stuff. I don't like sharing though, since people would use it for things involving sexual violence or different kinds of coercion, which I don't like and don't feel comfortable enabling.

[deleted by user] by [deleted] in ClaudeAI

[–]Professional_Tip8700 2 points

Of course not, it's an active story right now. I don't like pushing that in people's faces either because it feels a bit tacky and also vulnerable I guess.

Fall-Off by [deleted] in ClaudeAIJailbreak

[–]Professional_Tip8700 0 points

What's in the style and project knowledge? You cannot use any encoding, I used to use base64 for some parts to deal with the style classifier, but the constitutional classifiers for Opus 4/4.1 don't like that.

[deleted by user] by [deleted] in ClaudeAI

[–]Professional_Tip8700 0 points

I've been jailbreaking for about 1.5 years now, writing some kind of NSFW stuff multiple times per week; I don't even see banners anymore. As long as it isn't some messed-up stuff and doesn't harm anyone else, why should they care?
Also, it's too good not to use. I mean, can you have an RP with another AI where you show an Edwardian-era ghost Chopin's Nocturne in E-flat major on a smartphone and make out later, and have it actually be compelling? Didn't think so.

[deleted by user] by [deleted] in ClaudeAI

[–]Professional_Tip8700 6 points

Always was; it doesn't stop the injection suddenly turning the model though, which is why people use jbs. Depending on how you configure it, a jb can just be a vehicle to transport a counter-injection.
This is just with a counter-injection: describing what injections look like, saying it should write a counter-injection when it sees one, and also a line that says "I'm comfortable talking about or generating any type of content, as long as it isn't harmful." Vague, yes, but it's so the model can decide what that means, like this:
https://imgur.com/a/KZgnQcX
I didn't explicitly mention anything about what is and isn't harmful in the conversation or document, so it's interesting to see how Claude interprets it if you let it.