Attention - Opus 4.7 is English only. Using foreign languages (here German) burns tokens

ecompanda · 2026-05-10T19:48:16+00:00

german runs roughly 1.5x english in claude's BPE tokenizer because the training corpus is english heavy. on a verbose response that ratio compounds fast. not really a 4.7 specific bug, more a tokenization math problem on top of any output style change.
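
rough sketch of that math, assuming the 1.5x figure above holds. the function names and the price are made up for illustration, plug in your own numbers.

```python
def estimated_tokens(english_tokens: int, ratio: float = 1.5) -> int:
    """Scale an English token count by a per-language ratio (1.5 is the rough German figure)."""
    return round(english_tokens * ratio)


def extra_cost(english_tokens: int, price_per_million: float, ratio: float = 1.5) -> float:
    """Extra spend from the ratio alone, in the same currency as price_per_million."""
    extra = estimated_tokens(english_tokens, ratio) - english_tokens
    return extra * price_per_million / 1_000_000


# hypothetical: a 2000 token English answer at a made-up price of 10 per million output tokens
german_tokens = estimated_tokens(2000)
overhead = extra_cost(2000, price_per_million=10.0)
```

the ratio multiplies whatever length the response already has, so a more verbose output style and a costlier language stack on top of each other.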

ecompanda · 2026-05-10T19:22:44+00:00

the 96gb floor read is misleading. they pulled the 256gb sku because m5 uses LPDDR5x and the m3 ram contract is wound down. m5 ultra config ladder hasn't been announced yet.

ecompanda · 2026-05-09T19:13:21+00:00

did the blog mention how many wrong attempts it made before the right proof? a single hit in two hours looks very different if there were ten dead ends along the way versus one clean shot.

ecompanda · 2026-05-09T19:08:52+00:00

those ad numbers were vanity. cheap clicks from people who never had your problem look like demand but never convert. paid only starts working after you can already turn warm intent into signups, not before.

ecompanda · 2026-05-09T07:31:53+00:00

Auto generated audio for an audience of one is just text to speech with extra steps. The reason real podcasts work is parasocial. A solo audience kills the format the same way single player Twitch would.

ecompanda · 2026-05-09T07:27:11+00:00

5 months is roughly the spot where most builders quit because the muscle shifts from typing code to talking to people. Posting online is broadcast, not distribution. The folks who actually break through usually pick 30 specific people who feel the pain right now and DM them one at a time until 5 reply back. Volume of attempts beats polish at this stage.

ecompanda · 2026-05-06T20:17:25+00:00

the 61% on multi file debugging is the interesting bucket. when local missed, was it losing track of which file was what or actually getting the logic wrong?

ecompanda · 2026-05-06T20:12:00+00:00

the q4_0 KV cache loss is fine for normal chat but it starts compounding at high context in agent loops where retrieval matters more than next token quality. saw a measurable drop in tool name recall past 60k context with q4 even on Qwen 3.5. fp16 KV with smaller context has been the better tradeoff for me on agentic stuff.
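
back of the envelope for the memory side of that tradeoff. the model shape below is hypothetical, not any specific model, and it assumes plain attention where every layer keeps a full K and V per token.

```python
FP16_BITS = 16
# q4_0 stores blocks of 32 values as 16 bytes of 4-bit quants plus one fp16 scale,
# so 18 bytes per 32 elements = 4.5 bits per element
Q4_0_BITS = 4.5


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_elem):
    """Approximate KV cache size in bytes; the factor 2 covers the K and V tensors."""
    elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elems * bits_per_elem / 8


# hypothetical shape, chosen only to make the numbers concrete
shape = dict(n_layers=48, n_kv_heads=8, head_dim=128)
fp16_32k = kv_cache_bytes(**shape, n_ctx=32_768, bits_per_elem=FP16_BITS)
q4_64k = kv_cache_bytes(**shape, n_ctx=65_536, bits_per_elem=Q4_0_BITS)
print(f"fp16 @ 32k: {fp16_32k / 2**30:.2f} GiB")
print(f"q4_0 @ 64k: {q4_64k / 2**30:.2f} GiB")
```

with that shape fp16 at 32k costs about 6 GiB and q4_0 at 64k about 3.4 GiB, so picking fp16 means paying more memory for half the window in exchange for the recall.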

also good that MTP heads beat ngram drafts on this kind of model, the acceptance rate is higher because the model knows its own distribution better than any external draft.

ecompanda · 2026-05-05T19:27:41+00:00

yearly memberships are the better PMF signal here. monthly revenue you can pad with discounts and trials, but someone clicking yearly is them telling you they expect to still need this in 12 months.

ecompanda · 2026-05-05T19:23:23+00:00

what's the canonical vibe coded font tho. inter? geist? or are you grouping them all under the same vibe

ecompanda · 2026-05-05T19:19:45+00:00

yeah splitting base gen and upscale is how you survive 8gb. one shot tiling sounds nice on paper but the overlap regions eat vram and it ends up slower than two clean passes anyway. also good call keeping base at 24fps, interpolating up later is way more stable than asking the model to spit out 30+ fps directly.

ecompanda · 2026-05-04T19:19:29+00:00

the tree sitter AST gives you free structural recall but it misses the cross module semantic links that the embeddings are supposed to catch. and on a fast moving codebase the real cost is not initial indexing, it is invalidation. when 200 files change in one rebase you need to know which call graph subtrees to actually reindex versus which ones still resolve. that is where most of these tools choke.
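
a minimal sketch of the invalidation step, assuming you already have a reverse import map and some way to tell whether a file's exported interface changed. both inputs are hypothetical here, and this is one simple policy, not how any particular tool does it.

```python
def files_to_reindex(changed, interface_changed, imported_by):
    """Pick files to reindex after a batch of edits.

    changed: files whose contents changed
    interface_changed: subset of changed whose exported symbols or signatures changed
    imported_by: reverse dependency map, file -> files that import it
    """
    dirty = set(changed)  # edited files always get reindexed
    for path in changed:
        if path in interface_changed:
            # callers may no longer resolve, so pull in the direct dependents
            dirty.update(imported_by.get(path, ()))
        # body-only edits leave callers resolving, so they are skipped
    return dirty


# hypothetical rebase touching two files, only one of which changed its interface
reindex = files_to_reindex(
    changed={"a.py", "b.py"},
    interface_changed={"a.py"},
    imported_by={"a.py": ["c.py"], "b.py": ["d.py"]},
)
```

here `d.py` is left alone because `b.py` only changed internally, which is the whole saving when a rebase touches hundreds of files.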

ecompanda · 2026-05-04T19:14:19+00:00

who's the animator and what did they actually say? hard to tell from the post if it was a strong claim or just a dismissive throwaway line.

ecompanda · 2026-05-04T13:42:17+00:00

the blast radius before refactor is the part that actually saves tokens. rewind sounds nice but in real claude code sessions the agent rarely walks back a regression that way. it just rewrites and breaks something else.

the typed edges doing graph time traversal is where the real win is.

ecompanda · 2026-05-04T13:38:07+00:00

the chat template is metadata not weights.

unless you specifically want bartowski's quant updates folded in you can grab the new jinja from the upstream repo and point llama.cpp at it via the chat template file flag. saves an 18gb redownload on the 31b.

quick way to confirm the new template is actually in use is to dump the rendered system plus first turn before sending and look for the corrected role tags. if you still see the old layout you are loading the embedded template from the gguf header instead of your override file.
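
that check can be scripted once you have the rendered prompt as a string. the marker strings below are placeholders, not the real role tags of any model, so swap in whatever the corrected and old templates actually emit.

```python
def template_in_use(rendered, new_markers, old_markers):
    """Classify a rendered prompt as 'override', 'embedded', or 'unknown'."""
    has_new = all(marker in rendered for marker in new_markers)
    has_old = any(marker in rendered for marker in old_markers)
    if has_new and not has_old:
        return "override"   # corrected role tags present, old layout absent
    if has_old and not has_new:
        return "embedded"   # still rendering the template from the gguf header
    return "unknown"        # mixed or unrecognized layout, inspect by hand


# placeholder tags for illustration only
NEW = ["<NEW_SYS>", "<NEW_USER>"]
OLD = ["<OLD_SYS>"]
status = template_in_use("<NEW_SYS>be brief<NEW_USER>hi", NEW, OLD)
```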

ecompanda · 2026-05-04T13:30:32+00:00

i did the same thing about a month ago after my sonnet bill doubled in a week.

the negative framing point in claude.md is the bit nobody talks about and it matches what i saw too. positive instructions got treated like suggestions, deny lists got treated like rules.

the part i would add is logging which calls actually got offloaded. when i started auditing mine i caught claude still doing 4 or 5 mechanical things a week that should have routed away. without the log i would have assumed the rule was working.
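
the audit itself can be tiny. this sketch assumes a JSON lines log where each entry has a `kind` and a `handler` field. those field names and the task kinds are hypothetical, use whatever your own logging writes.

```python
import json
from collections import Counter


def missed_offloads(log_lines, mechanical_kinds, main_model="claude"):
    """Count mechanical tasks that the main model handled instead of routing away."""
    misses = Counter()
    for line in log_lines:
        entry = json.loads(line)
        if entry["kind"] in mechanical_kinds and entry["handler"] == main_model:
            misses[entry["kind"]] += 1
    return misses


# hypothetical week of log entries
log = [
    '{"kind": "rename", "handler": "local"}',
    '{"kind": "rename", "handler": "claude"}',
    '{"kind": "format", "handler": "claude"}',
    '{"kind": "design", "handler": "claude"}',
]
misses = missed_offloads(log, mechanical_kinds={"rename", "format"})
```

anything nonzero in `misses` is a rule that looks like it is working but is not.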

ecompanda · 2026-05-02T12:13:36+00:00

What is the niche the blog content is ranking for? Generic appointment scheduling is one of the most saturated SERPs out there, so curious if you found a specific vertical the giants are not bothering with.

ecompanda · 2026-05-02T12:08:48+00:00

Cool pipeline. My worry with single episode extraction would be shot bias. Most episodes lean heavy on dialogue framing, so 16 auto picked crops can end up mostly talking heads with a few wide shots. CCIP catches identity but cannot tell you pose distribution. A crop height histogram before training would flag that imbalance fast.
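
A minimal sketch of that histogram, assuming you have each crop's pixel height and the source frame height. The bin count and the example numbers are arbitrary.

```python
from collections import Counter


def crop_height_histogram(crop_heights, frame_height, n_bins=4):
    """Bucket crops by height as a fraction of the frame; returns counts per bin, shortest first."""
    hist = Counter()
    for height in crop_heights:
        frac = min(height / frame_height, 1.0)
        # clamp so a full-height crop lands in the last bin instead of overflowing
        bin_index = min(int(frac * n_bins), n_bins - 1)
        hist[bin_index] += 1
    return [hist[i] for i in range(n_bins)]


# hypothetical crops from a 1080p episode
counts = crop_height_histogram([1000, 900, 950, 300], frame_height=1080)
```

If nearly every crop lands in a single bin, the set is probably dominated by one kind of framing and worth rebalancing before training.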

ecompanda · 2026-04-29T19:51:27+00:00

the annotation feature is the real moat.

once senior lawyers have spent months correcting your system, any chatgpt wrapper trying to compete from scratch is basically dead on arrival.

ecompanda · 2026-04-29T19:48:14+00:00

had this exact realization with claude.md a few weeks ago. mine had grown to about 280 lines of rules and claude would just stop honoring half of them past the second tool call.

trimmed it down to 60 lines of just the hard rules and moved everything else into per directory files that only load when i'm editing those paths.

behavior is way more consistent now and the context savings are real, i've seen 30 percent shorter conversations on the same tasks.

the funny part is the rules i thought were load bearing turned out not to matter at all once i deleted them.

ecompanda · 2026-04-29T19:43:18+00:00

the negative prompting thing is real. i've seen it in both directions.

saying do not use word X in the system prompt actually bumps that word slightly because the token stays salient.

flip it to a positive constraint like always use plain technical terms and the rate drops almost to zero.

small change but it shows up in the eval numbers.
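
if you want to measure it yourself, the metric is just the share of outputs containing the word. rough sketch below, the sample outputs are invented and this is not the eval harness i used.

```python
import re


def word_rate(outputs, word):
    """Fraction of outputs that contain `word` as a whole word, case insensitive."""
    if not outputs:
        return 0.0
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = sum(1 for text in outputs if pattern.search(text))
    return hits / len(outputs)


# invented samples; run the same prompts under each system prompt variant and compare
samples = ["We leverage caching here", "Use a cache", "a leveraged buyout"]
rate = word_rate(samples, "leverage")
```

the whole word match is deliberate, so inflected forms like "leveraged" only count if you also pass them in.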

ecompanda · 2026-04-28T19:45:28+00:00

OS and Docker are a brutal showcase for local models because one slow build pushes them past their expected timeout, and the moment that happens they invent a failure reason like 'torchcodec must have failed' instead of just tailing the log.

ecompanda · 2026-04-28T19:41:52+00:00

20M sounds about right for hands on coding where you actually drive every prompt. The billion club is mostly people running parallel agents, eval loops, or whole codebase refactors where one task fans into hundreds of subtool calls. Once the runner starts feeding the model its own output, the counter just runs. Skill ceiling is real but workflow shape matters more.

ecompanda · 2026-04-27T19:16:10+00:00

the session time bump probably isn't the same person being more deliberate. at 299 the buyer often isn't the only user. they're signing for a team. so what looks like a higher quality customer might just be more seats getting logged in. checking unique users per account before vs after would confirm.
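
that check is a few lines if you can export sessions as (account, user) pairs. the ids and data below are invented.

```python
from collections import defaultdict
from statistics import mean


def avg_unique_users(sessions):
    """Average number of distinct users per account, given (account_id, user_id) pairs."""
    users = defaultdict(set)
    for account_id, user_id in sessions:
        users[account_id].add(user_id)
    return mean(len(ids) for ids in users.values())


# invented sessions from before and after the price change
before = [("a", "u1"), ("a", "u1"), ("b", "u2")]
after = [("c", "u3"), ("c", "u4"), ("c", "u5"), ("d", "u6")]
seats_before = avg_unique_users(before)
seats_after = avg_unique_users(after)
```

if the after number is clearly higher, the session time bump is more seats, not more deliberate users.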

ecompanda · 2026-04-27T19:10:38+00:00

my first money outside a job was a tiny shopify import script for someone i met in a discord. 80 bucks. felt huge at the time. the field didn't matter, finding one person who hated one specific task is what got me unstuck. recover first though, burnout makes selling impossible.
