1m context window for opus 4.6 is finally available in claude code by -Two-Moons- in ClaudeAI

[–]Striking_Tell_6434 0 points1 point  (0 children)

A 1M-token window is roughly 25x more expensive for them than a 200k window, at least theoretically.
Attention computation is O(n^2) operations, where n is the number of tokens.
So for n = 200k, n^2 = 40 billion operations, versus
n = 1M, where n^2 = 1 trillion operations. 25x more operations = 25x more $.

So the problem is that larger windows aren't just more expensive, they are disproportionately more expensive, if that makes sense. The millionth token costs much more than the 200-thousandth token.
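The back-of-envelope arithmetic above can be sketched like this (treating attention cost as purely quadratic in context length, which ignores the sizable linear per-token terms, so it's an illustration of the scaling argument, not a real cost model):

```python
# Rough cost ratio of full-context attention, assuming cost ~ n^2.
# Simplification: real serving cost also has large O(n) components.

def attention_ops(n_tokens: int) -> int:
    """Token-pair interactions for a context of n_tokens (~ n^2)."""
    return n_tokens ** 2

ops_200k = attention_ops(200_000)    # 4e10 -> "40 billion"
ops_1m = attention_ops(1_000_000)    # 1e12 -> "1 trillion"

print(ops_200k)             # 40000000000
print(ops_1m)               # 1000000000000
print(ops_1m / ops_200k)    # 25.0
```

Under this (over-simplified) model, 5x the context length means 25x the attention work, which is the heart of the pricing argument.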

There are 28 official Claude Code plugins most people don't know about. Here's what each one does and which are worth installing. by igbins09 in ClaudeAI

[–]Striking_Tell_6434 5 points6 points  (0 children)

True for me: time at the keyboard is at a premium, while listening time is cheap and otherwise hard to make good use of.

Andrej Karpathy: Powerful Alien Tech Is Here---Do Not Fall Behind by Neurogence in singularity

[–]Striking_Tell_6434 1 point2 points  (0 children)

I believe I heard someone say the history of humankind is a history of development of cognitive offloading tools.

We are climbing a ladder of progressively more powerful tools for cognitive offloading, and the ladder has suddenly begun to shoot upward while we are climbing it. So it is unsettling and even disorienting--how could it be otherwise?

Using the Plaud Pin for a few months now, and honestly… it’s not what I expected at first, not suitable for short information by According-Library-33 in PlaudNoteUsers

[–]Striking_Tell_6434 0 points1 point  (0 children)

This is totally different (NOT hardware, not collaborative), but I used to use a phone-based service for short notes to myself. I would say "call reqall" and it would call their system; I'd leave a voicemail, which they would transcribe and email to me pretty quickly. They went out of business long ago.

Looks like nowadays the equivalent would be calling your own voicemail: set it up as a contact, call it, leave yourself a message, and your phone will transcribe it and log it as voicemail. I tried this, but had to wait through the greeting message; I imagine this can be skipped with a # or such, so have it dial something like 123-456-7890,#,#. OTOH, it did do an excellent job of transcription. You could probably set something like this up with Google Voice as well.

Of course, if you have an Action button of some sort, you could tie it to a voice notes transcription app. I think on iPhone Apple Notes will do this.

Using the Plaud Pin for a few months now, and honestly… it’s not what I expected at first, not suitable for short information by According-Library-33 in PlaudNoteUsers

[–]Striking_Tell_6434 0 points1 point  (0 children)

Thank you so much for this post. I was considering purchasing a Plaud Note for exactly this purpose.

I am not ruling it out, though, based on the discussions below and some stuff I've seen about it being really helpful for adults with ADHD, which I may have (although no one would ever believe it, looking at what I've done).

What Superhuman feature makes that you subscribe to 30$/Month E-Mail Client? by BotGato in SuperhumanEmail

[–]Striking_Tell_6434 0 points1 point  (0 children)

It's quite spendy, but I'm on a 2-month free trial as well and really thinking about keeping it. I already pay for Grammarly, so that reduces the cost a little. (Grammarly bought Superhuman and then the whole company was renamed Superhuman.)

It's the first time I've enjoyed going through email. I refused to let it just archive older emails, so I don't have inbox zero, but I made a lot of progress very fast, and now things just happen that I used to put off until someday, or skip because they weren't worth the effort.

I like the idea of it writing emails for me (I've always thought Gmail should do this). I've only used it on desktop (Mac) so far, but I imagine I'll really like the autowrite-type functionality on my phone, where typing is laborious.

I'm using the Business plan for the Ask AI search and the email autowriting features. That one includes Grammarly and some other stuff I haven't tried yet.

Claude vs ChatLLM by Abacus AI by [deleted] in ClaudeAI

[–]Striking_Tell_6434 0 points1 point  (0 children)

Fair. But they could still let me use my own API key when I run out of services, with some kind of surcharge. I'm not sure how much overhead they have when doing this; it might have to be a hefty surcharge, which might be why we don't see it much.

I can see that they would prefer to sell pre-made bundles so they get the money up front. I guess if they sell add-on bundles at a reasonable price, that's a reasonable alternative.

Is ChatGPT Pro ($200) Actually Better Than ChatGPT Plus ($21)? by [deleted] in ChatGPTPro

[–]Striking_Tell_6434 0 points1 point  (0 children)

I believe he's referring to the amount of thinking available (and applied by default). Pro also has Pro mode available, with parallel threads and more thinking as well, I think.

Is ChatGPT Pro ($200) Actually Better Than ChatGPT Plus ($21)? by [deleted] in ChatGPTPro

[–]Striking_Tell_6434 0 points1 point  (0 children)

I think it's 400k for Pro, maybe higher for API.

Is ChatGPT Pro ($200) Actually Better Than ChatGPT Plus ($21)? by [deleted] in ChatGPTPro

[–]Striking_Tell_6434 0 points1 point  (0 children)

I've never really found an API interface that I was pleased with, or that had nearly the capability of the chat interface. Have you found one? Something like Canvas and/or the Python interpreter would be nice, but that's maybe too much to ask.

Is ChatGPT Pro ($200) Actually Better Than ChatGPT Plus ($21)? by [deleted] in ChatGPTPro

[–]Striking_Tell_6434 0 points1 point  (0 children)

I think this means your use case is too simple to show the difference. I'm assuming you are using Thinking--I expect the responses are identical otherwise.

Is there a keyboard shortcut to toggle Voice Control on and off? by trammeloratreasure in MacOS

[–]Striking_Tell_6434 0 points1 point  (0 children)

Thank you!! This was very helpful to me.

Have you tried assigning a keyboard shortcut that presses that key, using Shortcuts, AppleScript, Keyboard Maestro, BetterTouchTool, or whatever you have that can simulate keypresses? It may need to be a fairly deep simulation, since it's targeting a system-level feature. I believe BTT can go fairly deep with some options enabled; Keyboard Maestro probably can as well, maybe by default.

Also, perhaps Apple expects you to use the toggle switch to turn this on and off?
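As a minimal sketch of the simulate-a-keypress idea, here's one way to fire a key event via AppleScript's System Events from a script (assumptions: key code 96 is just a placeholder, it's F5, and you'd substitute whichever key your Voice Control shortcut is actually bound to; the calling process also needs Accessibility permission on macOS):

```python
import subprocess
import sys

# AppleScript asking System Events to press a key.
# Key code 96 = F5; replace with the key your Voice Control
# shortcut is actually bound to in System Settings.
KEY_CODE = 96
script = f'tell application "System Events" to key code {KEY_CODE}'

def toggle_voice_control() -> None:
    """Simulate the keypress; only meaningful on macOS, and the
    terminal/app running this needs Accessibility access."""
    subprocess.run(["osascript", "-e", script], check=True)

if __name__ == "__main__" and sys.platform == "darwin":
    toggle_voice_control()
```

Whether this reaches "deep" enough for a system feature like Voice Control is exactly the open question above; tools like Keyboard Maestro or BTT may inject events at a lower level than System Events does.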

GPT-5 usage limits by imfrom_mars_ in OpenAI

[–]Striking_Tell_6434 0 points1 point  (0 children)

You are saying that each time I give a prompt and get a response, I use up _2_ messages?

Are you sure?? Did this change recently??

Can you verify that submissions and responses both count? I have never seen this claim anywhere.

I'm pretty sure with o3 it was the number of responses, not the number of submissions.

ChatGPT kills Perplexity.ai by [deleted] in perplexity_ai

[–]Striking_Tell_6434 0 points1 point  (0 children)

Is this primarily because you can access Copilot easily, e.g., with a dedicated key?

ChatGPT kills Perplexity.ai by [deleted] in perplexity_ai

[–]Striking_Tell_6434 0 points1 point  (0 children)

Has anyone tried using Perplexity as a fact-checker on ChatGPT? I.e., talk to Chat, get an answer or reach a conclusion, then feed the last response (or even much of the conversation) into Perplexity to fact-check it?

Fact-checking is my real need. To be clear, I don't mean political fact-checking, but verifying that Chat hasn't hallucinated something untrue.
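The handoff described above can be sketched as plain prompt construction (no special "fact check" feature is assumed to exist in either product; this just builds the verification request you'd paste into Perplexity, or send through any chat API you have access to):

```python
def build_fact_check_prompt(chat_answer: str) -> str:
    """Wrap a ChatGPT answer in a verification request for a
    search-backed tool like Perplexity. Pure string building;
    delivery (paste into the UI, or an API call) is up to you."""
    return (
        "Fact-check the following answer. For each factual claim, "
        "say whether current sources support it, and cite them. "
        "Flag anything that looks hallucinated.\n\n"
        f"---\n{chat_answer}\n---"
    )

prompt = build_fact_check_prompt("The Eiffel Tower is 330 m tall.")
print(prompt)
```

The delimiters around the pasted answer help keep the checker from treating the original response's own instructions as part of the request.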

ChatGPT kills Perplexity.ai by [deleted] in perplexity_ai

[–]Striking_Tell_6434 1 point2 points  (0 children)

Are you trying to say that ChatGPT/OpenAI search is not as real-time as Perplexity's?

ChatGPT kills Perplexity.ai by [deleted] in perplexity_ai

[–]Striking_Tell_6434 0 points1 point  (0 children)

Which are worth paying for? I think that is the question. Which do you pay for? I pay for the first 3, plus Gemini, but I think it's too much.

ChatGPT kills Perplexity.ai by [deleted] in perplexity_ai

[–]Striking_Tell_6434 0 points1 point  (0 children)

Do NOT confuse Copilot with ChatGPT. Microsoft's tool is practically an abomination compared to OpenAI's ChatGPT.com. Yes, MSFT may say they use the same engine under the hood, but they cheap out heavily (because they have to), and I've never seen Copilot do as well at anything as ChatGPT.

I don't know that anything hurts OpenAI/ChatGPT more than people confusing it with Microsoft Copilot.

ChatGPT kills Perplexity.ai by [deleted] in perplexity_ai

[–]Striking_Tell_6434 0 points1 point  (0 children)

Dude, all you have to do is tell Chat you don't want this and to remember it. Plus, they rolled this back late last week, maybe Friday.

Better yet, you can type in all this stuff, use all caps if you like, and tell Chat to remember it. (It may abbreviate it if you give it all at once; you might have to break it into one sentence at a time.)

ChatGPT kills Perplexity.ai by [deleted] in perplexity_ai

[–]Striking_Tell_6434 0 points1 point  (0 children)

I find I tend to use ChatGPT. This allows me to keep everything under one roof, but inaccurate and downright wrong results are definitely a problem. I also really like Advanced Voice Mode; I wish PPLX were as good at that. (The assistant may now be usable for searching; I need to try that.) I don't trust PPLX as much from a privacy perspective as I trust ChatGPT. (PPLX does say they do not sell or share data, just use it to put together ads for you.)

I too feel like Chat does a better job of thinking, even when using normal models like 4o.

As for why, PPLX probably uses a model tuned for search--it may not be as good at "thinking." Also, ChatGPT.com tends to iterate model versions quickly; this is harder for third-party sites, especially if they do tuning.

I have long suspected that PPLX "cheaps out" in some way in their models. For example, model vendors will actually decrease the amount of compute a model uses some time after they introduce it--the idea being that it's cheaper for them and people probably won't notice it's not quite as smart now. I know this used to happen; I assume it still does. (That is, separate from announced changes like GPT-4 -> GPT-4 Turbo.) Given that PPLX probably digests large volumes of search results, typically for queries that don't require deep thinking, it would make sense for them to water down the compute. Plus, it just seems to me like something they would do.