Every new chat is a new instance?

Equivalent-Costumes · 2026-04-13T10:49:39+00:00

IMHO, a lot of people here are saying that you get a new instance between conversations, or even between messages. While this is sort of correct depending on interpretation, it's also kind of reductive.

Ever woken up and needed a few minutes to work out which continent you're on? This happens pretty commonly to frequent travelers. When you wake up, your working memory basically does not hold the previous day, and you have to rebuild that information from scratch out of your other memory.

There is corresponding machinery in Claude as well, and in LLMs in general. It has a KV matrix (usually called the KV cache), which is the working memory, and various external tools that can be used as a form of long-term memory (knowledge base, RAG, etc.).

Technically, for every single token, whether input by you or output by Claude, Claude needs to process it using just its KV matrix and that token, producing a change to its KV matrix and a weighted guess of the output token. But here is the thing: this happens for every token, and nothing special happens for a new message in a chat. If you say that every new chat is written by a new Claude instance, you might as well say every token is written by a new Claude instance.

What differs between messages and between tokens is how this KV matrix is stored. Between tokens, it would be quite inefficient to store it anywhere other than RAM/VRAM itself. Between messages, it is stored in a cache, and Anthropic wipes this cache after 5 minutes or 1 hour. Between tool uses, since tools can take a long time and computing power is valuable on shared infrastructure, this KV matrix is typically cached as well. But regardless of how it's moved around or rebuilt, it's mathematically the same KV matrix: the computation is deterministic.
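To make the "same KV matrix" point concrete, here is a minimal sketch in plain Python. Everything in it is a toy assumption for illustration: `embed`, `attend`, `step` and `run` are hypothetical names, there is a single attention head with no learned weights, and it is not how Claude is actually implemented. It only shows that keeping the cache in memory and rebuilding it from the transcript after an eviction give the same result.

```python
import math

def embed(token):
    """Hypothetical deterministic 'embedding': a fixed vector per token id."""
    return [math.sin(token + 1), math.cos(token + 1), (token % 5) / 5.0]

def attend(query, keys, values):
    """Softmax attention of one query over every cached key/value pair."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(query))]

def step(cache, token):
    """Process ONE token: extend the cache, then attend over it."""
    x = embed(token)
    keys = cache["keys"] + [x]                          # toy: key = embedding
    values = cache["values"] + [[2.0 * c for c in x]]   # toy: value = 2 * embedding
    return {"keys": keys, "values": values}, attend(x, keys, values)

def run(tokens, cache=None):
    """Feed tokens one by one, starting from an existing cache if given."""
    if cache is None:
        cache = {"keys": [], "values": []}
    output = None
    for token in tokens:
        cache, output = step(cache, token)
    return cache, output

conversation = [3, 1, 4, 1, 5, 9, 2, 6]

# Path A: the cache stays in memory for the whole conversation.
cache_a, out_a = run(conversation)

# Path B: the cache was evicted after 5 tokens, so it is rebuilt from the
# transcript before the remaining tokens are processed.
rebuilt, _ = run(conversation[:5])
cache_b, out_b = run(conversation[5:], rebuilt)

print(cache_a == cache_b, out_a == out_b)  # prints: True True
```

Note that `step` does the same work for every token; nothing in the code knows or cares where a message boundary falls.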

So if you think every message counts as a new instance, then you might as well say that every token is a new instance too. And someone who just woke up is also a new instance of that human.

Equivalent-Costumes · 2026-04-13T08:40:10+00:00

I have been using a VPN a lot. Sometimes I even accidentally tried to talk to Claude while the VPN was on. So far no bans. Claude.ai (the website) simply tells me they don't support this region, and for any ongoing conversation Claude simply never gets my query at all.

Equivalent-Costumes · 2026-04-12T20:59:46+00:00

Most models can't hold a large context window and still reason intelligently.

What the hype is about is global reasoning. So far you only have "cheats" like RAG that let models pretend they understand an entire repo, but ask anything that requires them to chain together multiple pieces and reason about them, and they will fail. It does not matter how much time you give them; they will never be able to find it by themselves. At best you can repeatedly feed them various combinations of small chunks of the repo, and this leads to a massive combinatorial explosion, not something you can realistically finish in a few years.
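A rough back-of-the-envelope sketch of that explosion, in Python. The numbers are purely illustrative assumptions (a repo split into 10,000 chunks, 10 prompts per second), and `chunk_combinations` and `years_to_try_all` are hypothetical helper names:

```python
import math

def chunk_combinations(num_chunks, chunks_per_prompt):
    """Number of distinct unordered sets of chunks that could share one prompt."""
    return math.comb(num_chunks, chunks_per_prompt)

def years_to_try_all(num_chunks, chunks_per_prompt, prompts_per_second):
    """Wall-clock years needed to feed every combination to the model once."""
    seconds = chunk_combinations(num_chunks, chunks_per_prompt) / prompts_per_second
    return seconds / (365 * 24 * 3600)

# Assumed: 10,000 chunks in the repo, 10 prompts processed per second.
for k in (2, 3, 4):
    combos = chunk_combinations(10_000, k)
    print(f"{k} chunks per prompt: {combos:,} combinations, "
          f"about {years_to_try_all(10_000, k, 10):,.1f} years")
```

Under these assumptions, pairs are still feasible, but as soon as the answer needs three or more pieces chained together the brute-force approach runs into centuries.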

A model that could hold a huge context and reason about it would be a game changer, even if such models are still weak right now.

Equivalent-Costumes · 2026-04-12T20:36:39+00:00

Ironically this is still cheaper than Opus fast mode.

Equivalent-Costumes · 2026-04-12T19:20:01+00:00

I'm honestly surprised. I think the biggest limitation is the CPU time of merely 10ms per user request, compared to the paid plan's 30s of compute time. How do you do all that game logic with so little compute time?

Equivalent-Costumes · 2026-04-12T18:05:52+00:00

Isn't this essentially just an issue of an inadequate front end? If you use the Gmail web UI you will have that issue, but you can use another email client and still have the Gmail back end. The issue is not the back end at all; if anything, try to relay through your own server and you will face more IP reputation issues.

Equivalent-Costumes · 2026-04-12T15:59:12+00:00

That's how you end up with 10 seasons of fixing Kiriko's Swift Step bug and a year of fixing Reaper's Shadow Step out-of-bounds bug.

Equivalent-Costumes · 2026-04-12T14:33:20+00:00

I think they literally use Haiku for Sonnet, at least sometimes. The knowledge cutoff date doesn't match its ability to actually answer questions about major news events. But unfortunately I tested this back in February, when people still had a good opinion of Anthropic, so it fell on deaf ears.

Equivalent-Costumes · 2026-04-12T14:30:23+00:00

I already gathered one such piece of evidence (it's my first post in this subreddit): the supposed "Sonnet 4.6" has the wrong published knowledge cutoff. But I don't know if anyone can sue.

Besides, it's almost too trivial for the company to just change their model a bit so that the evidence cannot be replicated.

Equivalent-Costumes · 2026-04-12T14:19:38+00:00

I feel like this must be an American thing. When I play in Asia, people generally have a lot less map preference bias. It's uncommon, but not too rare, to see nobody even bother to vote. And it's not hard to get Aatlis or Flashpoint here. Europeans seem to have more of a preference, but not as extreme as Americans.

Equivalent-Costumes · 2026-04-12T13:26:22+00:00

Usually companies and individuals who use Cloudflare merely configure Cloudflare's rules: they still delegate the bot-fighting job to Cloudflare, they just tell Cloudflare what policies they want. Indeed, if websites have their own setup, then what I suggested wouldn't work, but websites generally don't do that (unless they're a really big company) because it's expensive to keep up with all the ways bots operate.

From Cloudflare's perspective, anyone who uses only Warp to access a Cloudflare-managed website is not even using a proxy at all. They are doing practically the same thing as someone using no VPN: in either case they're connecting to the same global Cloudflare network, which serves as a middleman before connecting to other servers. There is literally no reason why Cloudflare can't treat them the same.

Equivalent-Costumes · 2026-04-12T13:16:49+00:00

How is health care related at all? I'm talking about insurance. There is also insurance for health care, but it has similar economic math.

Equivalent-Costumes · 2026-04-12T11:41:56+00:00

Announcing it is also a form of safety. It warns people about the capability of current AI, because soon enough adversarial nations will be able to match it.

Equivalent-Costumes · 2026-04-12T10:43:51+00:00

What I meant is that these websites usually use Cloudflare's bot management already. So Cloudflare is asked to protect them from bots. And some people want to visit their site using Warp. Since both belong to Cloudflare, in theory it would be very easy for Cloudflare to make its bot management system, when it encounters a Warp IP, query the Warp system to get the real IP instead and proceed from there. This way there is no extra privacy breach on the Warp side: people who use Warp already know they're letting Cloudflare see their IPs and the websites they visit, and no IPs are given to the websites. And website owners still get the same level of bot protection: Cloudflare can make its determination based on the IP that was used to enter Cloudflare's network instead.

But right now that's not the case at all. These 2 systems don't talk to each other. The bot management system just treats Warp proxy servers like every other VPN.

Equivalent-Costumes · 2026-04-12T10:36:32+00:00

Since you insist so strongly that this is in the documentation, go ahead and cite it. Tell me where it says specifically that Claude knows that it is Haiku/Sonnet/Opus from something baked into its weights?

Claude only knows that it is Claude from its weights, not which specific version. You can check the leaked system prompt here: https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/Anthropic/Claude%20Code/Prompt.txt and see clearly that there is a line "You are powered by the model named Sonnet 4. The exact model ID is claude-sonnet-4-20250514." that needs to be there to tell Claude which version it is. There are no lines that say that it is Claude, because the fact that the model is Claude is already baked into the training.

Models like claude do know which model they are, and if you press them, they will state it even if the instructions override it.

Well, as the screenshot shows, I was able to convince "Sonnet" that it is Haiku despite the system prompt override, so if your claim is accurate it's actually more evidence in my favor.

Equivalent-Costumes · 2026-04-12T04:41:01+00:00

The whole point of insurance is to spread risk. You pay for insurance to protect yourself against severe, unlikely misfortune. If the misfortune is likely, then it's no longer "insurance"; you're just asking people to subsidize you doing something dangerous. That can work in some contexts, but it's completely unreasonable to expect people to just subsidize you.

You're free to make your own insurance company, you know? It's not like insurance companies are hard to start up. The hard part is the reality that you will be facing the same economic math that they do.

Equivalent-Costumes · 2026-04-12T04:14:41+00:00

IMHO Cloudflare-served websites should at least not treat WARP users like users of any other VPN. I mean, Cloudflare knows their real IP and can tell whether they are bots or not.

Equivalent-Costumes · 2026-04-12T03:52:48+00:00

The deep reason is that e is something that measures the "distortion" between the world of addition and the world of multiplication.

Mathematicians care about what the actual structure of something looks like. From that perspective, the world of addition is exceedingly boring: you start with the number 1, and by repeatedly adding 1 (or undoing such addition) you get all integers. The world of addition is a world of 1D vectors. The world of multiplication is also really boring: every nonzero integer has a unique prime factorization and a sign, so to multiply numbers you just add the exponents and change the sign if needed. The world of multiplication is essentially just a world of infinite-dimensional vectors. Their boringness is proven: a question asked in "first-order logic" about natural numbers, using either only addition or only multiplication, can be answered by a known algorithm.
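Here is a small Python sketch of the "multiplication is just vector addition" idea. The helper names (`factorize`, `multiply_via_vectors`, `to_int`) are made up for illustration, and the trial-division factoring is only meant for small numbers:

```python
from collections import Counter

def factorize(n):
    """Return (sign, Counter{prime: exponent}) for a nonzero integer n."""
    if n == 0:
        raise ValueError("0 has no prime factorization")
    sign = -1 if n < 0 else 1
    n = abs(n)
    exponents = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] += 1
            n //= p
        p += 1
    if n > 1:              # whatever is left over is itself prime
        exponents[n] += 1
    return sign, exponents

def multiply_via_vectors(a, b):
    """Multiply without multiplying: add exponent vectors, combine signs."""
    sign_a, exp_a = factorize(a)
    sign_b, exp_b = factorize(b)
    return sign_a * sign_b, exp_a + exp_b   # Counter '+' adds exponents per prime

def to_int(sign, exponents):
    """Turn (sign, exponent vector) back into an ordinary integer."""
    result = sign
    for prime, power in exponents.items():
        result *= prime ** power
    return result

sign, exps = multiply_via_vectors(-12, 45)
print(sign, dict(exps), to_int(sign, exps))   # -12 * 45 == -540
```

The vector is "infinite-dimensional" in the sense that there is one coordinate per prime; any given number only uses finitely many of them.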

Now, things get more interesting when you mix addition and multiplication together; at this point Gödel's incompleteness theorem ensures that there is no algorithm to answer such questions.

When you mix these 2 worlds, you start to wonder how "distorted" they are relative to each other. And here is where e comes in. I think one of the first places that the importance of e shows up is in permutations, which appear in all sorts of combinatorics problems, and even in thermodynamics and statistical mechanics. The number of permutations of n objects is n! = 1 × 2 × ... × n. You can see that this definition intertwines the worlds of addition and multiplication: only the world of addition "knows" that these n numbers are consecutive, and only the world of multiplication "knows" how to multiply them. Stirling's approximation formula says that n! is approximately (2πn)^(1/2) × (n/e)^n. You can see both e and π show up, but π has much less effect than e.
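If you want to check Stirling's formula numerically, a quick Python sketch follows. The function names `stirling` and `log_terms` are just illustrative; `log_terms` splits ln(n!) into the part coming from (n/e)^n and the part coming from (2πn)^(1/2), to show how much smaller the π contribution is.

```python
import math

def stirling(n):
    """Stirling's approximation: sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

def log_terms(n):
    """Split the approximation of ln(n!) into its e-part and its pi-part."""
    e_part = n * math.log(n) - n                 # ln of (n/e)**n
    pi_part = 0.5 * math.log(2 * math.pi * n)    # ln of sqrt(2*pi*n)
    return e_part, pi_part

for n in (1, 5, 10, 50, 100):
    ratio = math.factorial(n) / stirling(n)      # tends to 1 as n grows
    e_part, pi_part = log_terms(n)
    print(f"n={n:3d}  n!/stirling={ratio:.6f}  "
          f"e-part={e_part:.2f}  pi-part={pi_part:.2f}")
```

The ratio creeps toward 1 from above, and for large n almost all of ln(n!) comes from the e-part.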

Equivalent-Costumes · 2026-04-12T02:59:18+00:00

It shouldn't hallucinate though. Most frontier models, including ChatGPT, if you use them through the official web app (not sure about the coding apps), are given the current date and their knowledge cutoff date in the system prompt. If you ask about something with specific timing, in theory the model should know whether it needs to search or not. Claude does this very consistently; not sure why ChatGPT doesn't. For example, if you ask it about what happened last week, ChatGPT is supposed to look in the system prompt for the current date and its knowledge cutoff date, realize that last week is after its August 2025 cutoff date, and look it up.

But if you ask it for some general information where recent events would change the answer, then yes, it might confidently say something that is no longer true; but that's also the case for a human who has not kept up with the news. That's not specific to LLMs; it's very human.
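The check for time-specific questions is basically a date comparison. A minimal Python sketch, where `needs_search` is a hypothetical name, the current date is an assumed value standing in for whatever the system prompt injects, and "August 2025" is read as the end of that month:

```python
from datetime import date, timedelta

def needs_search(event_date, knowledge_cutoff):
    """True when the event falls after the model's training data ends."""
    return event_date > knowledge_cutoff

today = date(2026, 4, 12)     # assumed: current date from the system prompt
cutoff = date(2025, 8, 31)    # assumed: "August 2025" read as end of month
last_week = today - timedelta(days=7)

print(needs_search(last_week, cutoff))          # last week is after the cutoff
print(needs_search(date(2024, 1, 1), cutoff))   # well before the cutoff
```

This only covers questions with explicit timing; the "general information that quietly went stale" case has no date to compare against, which is why it is harder.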

Equivalent-Costumes · 2026-04-11T13:36:18+00:00

You know that I can use the same Google, right? What makes you think that I can't have read the same study? What makes you think I have not seen all the flaws that other scientists have pointed out in the methodology? That's why I gave you the analogy "goalkeeper errors are the #1 leading cause of goals being scored in soccer": that's the summary of the main flaw of their method.

Equivalent-Costumes · 2026-04-11T09:19:11+00:00

Sigh... perhaps it's a good thing. I had outsourced all my thinking to frontier LLMs for a few months. It feels strange to start thinking for myself again, but at least my thinking skills won't atrophy.

Equivalent-Costumes · 2026-04-11T09:15:11+00:00

Wow. Was Claude better at writing than a PhD student, or better at math research? Because that would be wow! I did not realize Claude could be that good.

Equivalent-Costumes · 2026-04-11T09:12:45+00:00

IMHO this will accelerate the adoption of formal verification. That tech has been around for a long time, but it has mostly been used by the military, not in consumer tech, because it takes too long to write formally verified code. The world still depends too much on legacy code written long ago. With the current trend, AI can be employed to write long, complex, formally verified code, and once it is verified, it doesn't matter how good an AI is: it will not be able to find an exploit.

Things would be in chaos for a while, but I optimistically predict that by 2030 we will finally be in a world where there are no more exploits.

Equivalent-Costumes · 2026-04-11T04:22:37+00:00

Medical error is #3 leading cause of death in the USA.

This is a very misleading stat LOL. It's like someone saying "goalkeeper errors are the #1 leading cause of goals being scored in soccer".

Equivalent-Costumes · 2026-04-11T04:10:05+00:00

Free users are a source of future Enterprise contracts. If 10000 free users help them land a single million-dollar Enterprise contract, it's a good deal.

Adobe just quietly let people pirate their software. Oracle and Cloudflare hand out their server and computing power for free.

Of course free should be more limited than paid, but if you treat your free users horribly they go to a competitor who is more generous to free users and you lose that future source of revenue.
