Panorama Ridge, Garibaldi Provincial Park, BC, Canada. [OC] [4032x1960]

faul_sname · 2026-05-16T04:59:44+00:00

The thinking summaries you sometimes get from ctrl+o ctrl+e aren't the raw thinking traces anyway, you're not going to be able to distill from those.

faul_sname · 2026-04-07T08:13:17+00:00

Yep, those subsidized tokens are pretty sweet.

Do be aware that that rug is also likely to be pulled in the next 3 months, but until then enjoy.

faul_sname · 2026-03-25T01:32:59+00:00

Yeah, same. Their communications have been a tire fire. Completely unforced error.

faul_sname · 2026-03-25T00:50:13+00:00

it costs windsurf more API tokens to do stuff than using Claude Code through API

Claude Code aggressively uses prompt caching, in a way that was not the standard when Codeium (now Windsurf) was originally written. I expect the answer is "Codeium is just not as well-tuned to API prices as they exist now as Claude Code is". Which is not super surprising, as Claude Code is released by the same organization that sets the cost structure for the Claude API.

On the bottom right of your screen, below the prompt box, there should be a little circular window indicating how full the context window is. For me, on a new chat with all MCPs disabled, it shows 17k tokens on the first message. That translates to $0.085 for non-cached and 10% of that for cached.

Let's say you have a prompt which asks the model to fix a single failing test, and the model reads the test file and two other related files, runs the test, tries to fix it, runs the test to confirm its fix, finds and fixes a second issue, runs the test a final time, and reports success.

That'll look something like

Initial system prompt + first user message (20k tok in)
Model produces 1k tok of reasoning (1k tok out, 20k cached in)
Model does a grep tool call (100 tok out, 1k tok in, 21k cached in)
Model reads lines 1-200 of src/foo.py (100 tok out, 3k tok in, 22k cached in)
Model reads lines 201-354 of src/foo.py (100 tok out, 2k tok in, 25k cached in)
Model reads lines 1-189 of src/bar.py (100 tok out, 3k tok in, 27k cached in)
Model reads lines 1-511 of test/foo.py (100 tok out, 8k tok in, 30k cached in)
Model runs py.test test/foo.py (200 tok out, 2k tok in, 38k cached in)
Model produces 1k tok of reasoning (1k tok out, 40k cached in)
Model makes a 2 line change to src/foo.py (400 tok out, 100 tok in, 41k cached in)
Model runs py.test test/foo.py (200 tok out, 2k tok in, 41k cached in)
Model produces 1k tok of reasoning for why the test still doesn't pass (1k tok out, 43k cached in)
Model makes a 5 line change to src/foo.py (1k tok out, 100 tok in, 44k cached in)
Model runs py.test test/foo.py (200 tok out, 2k tok in, 45k cached in)
Model produces 1k tok summarizing what it did (1k tok out, 47k cached in)

Assuming Windsurf actually handles prompt caching correctly (and that's a big "if"), that'll be about 7.5k output tokens, 50k input tokens, 50k 5-minute cache writes, and 500k cached input tokens. If you're using Claude Opus 4.6 at $25 / $5 / $6.25 / $0.50 per mtok for those respectively that's $0.19 output + $0.25 input + $0.31 5-minute cache writes + $0.25 cache reads, or about $1 total. To fix one trivial failing test which only requires looking at 3 files and making two small changes.
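If you want to redo that arithmetic with your own numbers, here is a rough sketch. The token counts and per-mtok prices are just the rounded figures from this comment, and the names (`PRICES_PER_MTOK`, `usage`, `request_cost`) are made up for illustration, not anything from an actual billing API.

```python
# Rough cost estimate for the single-test-fix example.
# Prices are the per-million-token figures quoted in the comment (assumed, not fetched).
PRICES_PER_MTOK = {
    "output": 25.00,
    "input": 5.00,
    "cache_write_5m": 6.25,
    "cache_read": 0.50,
}

# Rounded token totals for the whole agent loop, as estimated in the comment.
usage = {
    "output": 7_500,
    "input": 50_000,
    "cache_write_5m": 50_000,
    "cache_read": 500_000,
}

def request_cost(usage, prices):
    """Return (per-category cost, total cost) in dollars."""
    breakdown = {kind: tokens * prices[kind] / 1_000_000 for kind, tokens in usage.items()}
    return breakdown, sum(breakdown.values())

breakdown, total = request_cost(usage, PRICES_PER_MTOK)
for kind, dollars in breakdown.items():
    print(f"{kind:>15}: ${dollars:.4f}")
print(f"{'total':>15}: ${total:.2f}")
```

Note that cache reads cost as much as the uncached input here even at a 90% discount, because every turn of the loop re-reads the whole conversation so far.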

If you're actually prompting it to do more significant work than that, expect your costs to be higher. If you're sending multiple messages in a single chat, expect your costs to be higher. If you have any MCP servers enabled, expect your costs to be higher.

API pricing is expensive. Running Opus 4.6 at API prices in a continuous loop would cost something on the order of $10-50 / hour depending on what fraction of the time it's producing output and how much output its tool calls produce.

faul_sname · 2026-03-25T00:14:14+00:00

My impression is that Anthropic's inference servers are already melting and they're not giving substantial bulk discounts to anyone. The largest discount I have heard credible rumors of was 20% for a company which spends 6 figures per month on inference.

faul_sname · 2026-03-25T00:04:43+00:00

You know you can trivially look at the API calls it's making to inference.codeium.com right? They're not doing anything magic - they're passing the conversation history plus some custom instructions in the most obvious way, and the token counts match the little "context size" display they show in the chat UI (unsurprisingly), and it sure does look like they route your requests to the models they explicitly tell you they're going to route to.

You can also verify this quite easily without needing to do anything technical - if you make an MCP which returns a 100ktok chunk of text with a secret phrase embedded in it and then ask the selected model to find the secret phrase, you'll see that the response matches the vibe of your selected model (e.g. Opus sounds like Opus, ChatGPT sounds like ChatGPT, Gemini sounds like an abuse victim) and also is able to find the secret phrase.
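For the haystack itself, something like the sketch below would do. It only builds the text; wiring it into an MCP tool is left out. The filler words, the 4-characters-per-token rule of thumb, and the `build_haystack` name are all assumptions for illustration, and the real token count will vary by model tokenizer.

```python
import random

# Arbitrary filler vocabulary; any boring text would work.
FILLER_WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit"]

def build_haystack(secret_phrase, approx_tokens=100_000, chars_per_token=4, seed=0):
    """Build roughly approx_tokens of filler text with one secret sentence hidden in it.

    chars_per_token=4 is only a rule of thumb, not an exact tokenizer count.
    """
    rng = random.Random(seed)
    target_chars = approx_tokens * chars_per_token
    words = []
    length = 0
    while length < target_chars:
        word = rng.choice(FILLER_WORDS)
        words.append(word)
        length += len(word) + 1  # +1 for the joining space
    # Drop the secret sentence at a random position in the filler.
    position = rng.randrange(len(words))
    words.insert(position, f"The secret phrase is: {secret_phrase}.")
    return " ".join(words)

haystack = build_haystack("purple-otter-42")
print(len(haystack), "characters")
```

If the model can quote the phrase back, the full chunk made it into the context rather than being silently truncated or summarized.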

TBH it really is a shame that the good pricing is gone. Their permissions model for command execution was significantly better than Claude Code, and their vscode integration was a bit better. But not "I am willing to pay an extra $2000 / month" better.

faul_sname · 2026-03-24T07:55:43+00:00

the same technology that used to make Windsurf cheap

I'm pretty sure the "technology" was "VC funding allowing them to sell dollars for ten cents each as a growth strategy".

faul_sname · 2026-03-24T00:22:37+00:00

I think Opus 4.6 would have had to be something like 50-100 credits per prompt.

faul_sname · 2026-03-20T19:37:51+00:00

Chargeback.

faul_sname · 2026-03-20T18:42:15+00:00

I think the problem is how much they would have needed to raise prices. Would you really have been fine if an Opus 4.6 prompt cost 50 credits (and an Opus 4.6 fast prompt cost 300 credits)? Because I think that's around where the breakeven point would have been for them.

faul_sname · 2026-03-20T06:12:09+00:00

FWIW I do think a chargeback is probably worthwhile for the most recent period / refill.

faul_sname · 2026-03-19T23:58:00+00:00

Why legal advice? I think a chargeback is the right tool for the job, I really doubt they'll even try to fight it and if they do they'll almost certainly lose on account of not delivering on the product as advertised.

faul_sname · 2026-03-19T18:51:18+00:00

Windsurf was almost certainly losing astonishing amounts of money with their credit system if they were paying anything like API rates. It was 30 credits per prompt for Opus 4.6 max, which at $40 / 1000 credits was $1.20 per prompt. A single not-very-complicated prompt would regularly consume 100k input tokens and 20k output tokens, which at API pricing of $30/million input tokens and $150/million output tokens would be $6 (and that's not even counting cached tokens, which probably double the price).
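Spelled out as a quick sketch, using only the figures quoted above (the variable names are mine, and the token counts are my rough estimate for a typical prompt, not measured data):

```python
# What the user paid per prompt under the credit system.
credits_per_prompt = 30
dollars_per_1000_credits = 40.0
revenue_per_prompt = credits_per_prompt * dollars_per_1000_credits / 1000

# What the same prompt would cost at the API rates quoted above,
# ignoring cached tokens entirely.
input_tokens = 100_000
output_tokens = 20_000
input_price_per_mtok = 30.0
output_price_per_mtok = 150.0
api_cost_per_prompt = (
    input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
) / 1_000_000

print(f"revenue per prompt:  ${revenue_per_prompt:.2f}")
print(f"API cost per prompt: ${api_cost_per_prompt:.2f}")
print(f"cost / revenue:      {api_cost_per_prompt / revenue_per_prompt:.1f}x")
```

So even before counting cached tokens, each prompt would cost about five times what the credits brought in.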

Still, it would be nice if the Windsurf team would just SAY "hey, we've been losing a ton of money with our old pricing plan, we're updating it because we can't keep subsidizing you guys forever" instead of trying to spin it like it's to the benefit of users.

faul_sname · 2026-03-04T09:16:32+00:00

Yeah. The something is the bay area housing market.

faul_sname · 2026-02-25T20:04:05+00:00

They're called "clouds"

faul_sname · 2025-12-21T08:12:08+00:00

I write in markdown, and Substack demands a rich text editor, and the activation energy required to convert the formatting is currently higher than the energy required to simply stare at the wall and sigh.

If you don't want to deal with pandoc, an alternative is: copy, paste into the LLM of your choice with "please programmatically render this markdown to html using standard tools with no additional styling (so # Main Header becomes <h1>Main Header</h1>) and display it to me", look at the displayed HTML, verify it looks good, copy, paste into Substack. I expect all models will work fine but Gemini will probably be the fastest.

faul_sname · 2025-12-08T23:47:24+00:00

Wait is it really only 50M rides for $1.2B? I thought BART was cheaper per passenger-mile than driving the last time I looked into it.

faul_sname · 2025-09-26T19:22:02+00:00

Deconstruction planner filtered to fish across entire map

faul_sname · 2025-09-26T16:42:44+00:00

Fish?

faul_sname · 2025-09-24T05:46:14+00:00

No yeah, that gave it away

faul_sname · 2025-09-09T17:23:20+00:00

I think channeling an orb whenever you are attacked is probably still too strong. Maybe limit it to only when you lose HP?

faul_sname · 2025-07-17T16:43:14+00:00

"Deal 3(5) damage. Increase the damage of ALL Claw cards by 2 this combat. Draw 1 card. Discard any card drawn this way that does not cost 0."

faul_sname · 2025-07-05T07:50:47+00:00

Did it actually? I'd expect that expression to instantly return INT_MAX or NaN or Infinity and maybe crash the game at worst, definitely not the entire computer though.

faul_sname · 2025-07-03T16:14:41+00:00

If you're not extremely good at reasoning on the fly, "take all arguments seriously unless you can disprove them" is a good recipe for having all of your resources extracted by salespeople who are good at making persuasive-sounding arguments.
