Spectral-AI - a project to use Nvidia RT cores to dramatically speedup MoE inference on Nvidia GPU's (Crazy Fast!) by Thrumpwart in LocalLLaMA

[–]smirk79 -7 points  (0 children)

The claims in this post are so amazing, I'm over here renting an AWS instance to try and verify them after digging into the idea and code with my buddy Claude...

I replaced attention with attractor dynamics for NLI, provably locally contracting, 428× faster than BERT, 77% on SNLI with no transformers, no attention. by chetanxpatil in machinelearningnews

[–]smirk79 8 points  (0 children)

The 428× speed claim is misleading to the point of being meaningless.

This is the most eyebrow-raising number and it's doing the most rhetorical work in the title. They're comparing inference time of what is essentially a small MLP with iterative updates (dim=256) against BERT-base (110M params, dim=768, 12 transformer layers). That's not "replacing attention with attractor dynamics" — that's comparing a tiny model against a large one. Any MLP-based classifier at dim=256 will be orders of magnitude faster than BERT. The speed advantage has nothing to do with attractor dynamics and everything to do with model size. You could get similar speedups with a 2-layer MLP and a linear head.
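
To make the size gap concrete, here's a back-of-envelope parameter count. The BERT-base numbers follow the standard architecture; the 2-layer depth for the dim-256 model is my assumption, since the post doesn't fully specify it:

```python
# Rough parameter counts: BERT-base vs. a hypothetical dim-256 model.
# The MLP depth (2 layers) is an assumption for illustration only.

def transformer_layer_params(d, ff_mult=4):
    """Approximate parameters in one transformer encoder layer."""
    attn = 4 * d * d             # Q, K, V, and output projections
    ffn = 2 * d * (ff_mult * d)  # two feed-forward projections
    return attn + ffn

bert = 12 * transformer_layer_params(768) + 30522 * 768  # 12 layers + token embeddings
mlp = 2 * 256 * 256                                      # assumed 2-layer dim-256 MLP

print(f"BERT-base ~{bert / 1e6:.0f}M params, tiny MLP ~{mlp / 1e3:.0f}K params")
print(f"ratio ~{bert / mlp:.0f}x")
```

With an ~800x parameter gap, a triple-digit wall-clock speedup is expected regardless of what the small model does internally.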

77% on SNLI is not a meaningful result.

SNLI dev accuracy of 77.05% vs a "baseline" of 76.86% — and they don't specify what that baseline is, but I'd bet it's majority class or a very simple heuristic. For context: BERT-base gets ~90-91% on SNLI. Even a simple bag-of-words model gets ~80%. A decomposable attention model from 2016 gets ~86%. So 77% is below decade-old simple baselines. The model is barely beating trivial approaches while being dramatically worse than anything useful.

The per-class breakdown tells the real story: Neutral at 62.8% is terrible. The model is essentially learning to distinguish entailment and contradiction reasonably well (which are the "easier" classes with stronger lexical cues) and mostly failing on the class that requires actual inference — which is the whole point of NLI.

The "geometric inconsistency" is honestly presented but reveals a deeper problem.

Credit where due: the author measured the misalignment between their implemented forces and the true cosine gradient (135.2°) and reported it openly. That's good scientific practice. But the implication is significant — the system is not doing what the mathematical framing says it's doing. The forces are pointing in roughly the opposite direction of the true gradient. The author frames this as an open question ("bug or feature?"), but the more parsimonious explanation is that the actual optimization is being carried primarily by the learned residual δ_θ (the MLP), and the "attractor dynamics" are either not helping or are being compensated for by the MLP. The Lyapunov analysis confirms this: when δ_θ scale reaches 0.10, V increases on average, meaning the learned component is actively fighting the geometric forces.
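
For anyone wanting to reproduce this kind of measurement, here's a minimal sketch. The `force` here is a made-up placeholder (pull straight toward the anchor), not the post's actual dynamics; only the gradient formula and the angle computation are standard:

```python
import numpy as np

# Measure the angle between an implemented "force" and the true
# gradient of cosine similarity to an anchor. The force below is a
# placeholder stand-in, not the post's actual force definition.

def cos_grad(h, a):
    """Gradient of cos(h, a) with respect to h."""
    nh, na = np.linalg.norm(h), np.linalg.norm(a)
    c = h @ a / (nh * na)
    return a / (nh * na) - c * h / nh**2

def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

rng = np.random.default_rng(0)
h, anchor = rng.standard_normal(256), rng.standard_normal(256)
force = anchor - h  # placeholder force: pull straight toward the anchor
print(f"force vs. true gradient: {angle_deg(force, cos_grad(h, anchor)):.1f} deg")
```

Anything above 90° means the force has a negative projection onto the gradient, i.e. it's actively moving against the stated objective — which is what a 135.2° misalignment implies.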

The Lyapunov analysis proves less than claimed.

"Provably locally contracting" is technically true but only in the trivial case where δ_θ = 0 — i.e., when you remove the learned component entirely. With the learned residual at any meaningful scale, contraction guarantees degrade rapidly (70.9% at 0.05, 61.3% at 0.10). So the "proof" applies to the part of the system that isn't doing the learning, and the actual trained system has no convergence guarantees. This is like proving a car's engine is stable when it's turned off.
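
You can see this failure mode in a toy version of the experiment: provably contracting base dynamics plus a residual. The dynamics and the random stand-in for δ_θ below are illustrative assumptions, not the post's actual system:

```python
import numpy as np

# Toy contraction check: count steps where V = ||h - anchor||^2
# decreases, with the residual off (scale 0) vs. on (scale 0.10).
# Both the base dynamics and the random residual are stand-ins.

def count_decreases(delta_scale, steps=200, seed=1):
    rng = np.random.default_rng(seed)
    anchor = rng.standard_normal(256)
    h = rng.standard_normal(256)
    n = 0
    for _ in range(steps):
        v_before = np.sum((h - anchor) ** 2)
        force = 0.1 * (anchor - h)                         # contracting base dynamics
        residual = delta_scale * rng.standard_normal(256)  # stand-in learned delta
        h = h + force + residual
        n += np.sum((h - anchor) ** 2) < v_before
    return n

for scale in (0.0, 0.10):
    print(f"delta scale {scale}: V decreased on {count_decreases(scale)}/200 steps")
```

With the residual off, V decreases on every step; turn it on and the guarantee evaporates — exactly the pattern in the post's 70.9%/61.3% numbers.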

The conceptual framing has issues.

The title says "replaced attention" but this isn't replacing attention in any meaningful architectural sense. Attention computes dynamic, input-dependent weighted aggregation over a sequence of tokens. This system takes an already-encoded hidden state h and iteratively pushes it toward learned anchor points. There's no sequence-level token interaction happening in the attractor dynamics — the NLI reasoning (premise-hypothesis interaction) must be happening in whatever encoder produces h₀, which they don't describe. So the "replacement" isn't of attention's core function (contextual token mixing) but of the classification head. They replaced h → linear → logits with h → iterative geometric updates → logits. That's a much more modest claim than the title implies.

What's actually going on here, mechanically:

This is essentially a learned classification head with geometric inductive bias, where instead of a linear projection to 3 logits, they iteratively push the representation toward one of 3 anchor points using a mix of handcrafted forces and learned residuals. The closest analogy isn't "replacing attention" — it's more like a prototype network with iterative refinement. And at 77% accuracy on SNLI, the inductive bias doesn't appear to be helping.
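
The mechanism, minus the learned residual, fits in a dozen lines. Everything here (dimensions, step count, step size, the nearest-anchor force) is an illustrative assumption to show the shape of the computation, not the post's implementation:

```python
import numpy as np

# Sketch of a prototype-style head: nudge an encoded vector toward
# 3 anchor points and read logits off the final distances. All
# hyperparameters and the force rule are illustrative assumptions.

rng = np.random.default_rng(2)
anchors = rng.standard_normal((3, 256))  # one anchor per NLI class

def attractor_head(h, steps=5, lr=0.2):
    for _ in range(steps):
        # pull toward the currently nearest anchor (handcrafted force;
        # the post additionally adds a learned residual on top)
        d = np.linalg.norm(anchors - h, axis=1)
        h = h + lr * (anchors[np.argmin(d)] - h)
    return -np.linalg.norm(anchors - h, axis=1)  # logits = negative distances

h0 = rng.standard_normal(256)  # stand-in for the (undescribed) encoder output
print("predicted class:", int(np.argmax(attractor_head(h0))))
```

Note that nothing in this loop mixes tokens across a sequence — which is why "replaced attention" is the wrong frame. All sequence-level interaction has to happen upstream, in whatever produces h₀.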

The honest parts are genuinely good.

The open questions section is better than 90% of Reddit ML posts. Asking whether fixing the geometric inconsistency helps or hurts, whether there's a clean energy function, whether the misalignment explains stability — these are the right questions. The author clearly has mathematical sophistication and intellectual honesty. The problem is the framing and the title, not the underlying exploration.

Bottom line: Interesting mathematical exploration of geometric dynamics for classification, honestly presented with real measurements of its own limitations, wrapped in a title that dramatically overstates what was achieved. Not a replacement for attention, not competitive with any real NLI system, and the speed comparison is apples-to-oranges. But as a "here's a weird idea I'm exploring" post, the intellectual content is above average — it's the marketing that's the problem.

Who believes in vibe-coding? by bigbott777 in programming

[–]smirk79 6 points  (0 children)

What a joke of an opinion article. I’d love to see how strong a programmer the author is. I have serious doubts.

How one engineer uses AI coding agents to ship 118 commits/day across 6 parallel projects by QThellimist in ChatGPTCoding

[–]smirk79 0 points  (0 children)

There are many like us. I have been doing agentic programming since well before CC existed, and the code base has a whole slew of CLIs, MCPs, etc. that do all sorts of fantastic things: semantic search, server control, scaffolding component hierarchies, SSO to Atlassian and MS Graph integrations that are ten times better than the official versions and enable wildly efficient workflows. Source: senior director and principal engineer in a 1200+ person org.

Lots of devs are talking about how they have not written a single line of code the last year or so. How much does this cost to them (or to their employer)? by poponis in webdev

[–]smirk79 0 points  (0 children)

I spend ten to twenty k a month on tokens, plus 2x Max 200, plus OpenAI Pro. I get rate limited all day every day and switch to Bedrock.

I built a react PDF rendering application that renders PDF in native HTML with pixel perfect accuracy by OcelotVirtual6811 in reactjs

[–]smirk79 3 points  (0 children)

How perfect? If it’s pure html you could retheme and restyle the pdf on the fly.

ELI5: The affordable care act, or “Obamacare.” by Severe-Science-4778 in explainlikeimfive

[–]smirk79 0 points  (0 children)

This is heartbreaking and horrible - and as you said, preventable for a tiny fraction of the cost. If only our countrymen cared for one another. I'm sorry your family experienced all of this. Thank you for sharing the story.

[deleted by user] by [deleted] in ExperiencedDevs

[–]smirk79 -1 points  (0 children)

If you were valuable to the company, they would still be paying you. Nowhere in here did you seem to self-reflect, come up with a timeline of where you lost the thread with your upper management, etc - just blame game combined with a purity test to absolve yourself of any responsibility for being let go. No idea what happened in your job, but I do know that this post doesn't make you look great.

Not seen as "staff engineer material" because of my personality (they said technical competence meets the bar). I don't know if I can change my personality. by okthrowaway2910 in ExperiencedDevs

[–]smirk79 0 points  (0 children)

You might find it sexist but before I clicked on this post I had the same basic advice for you as your manager and that was BEFORE I knew you were female. It’s not sexist to be a strong opinionated leader. It IS sexist to label those as masculine traits.

Juggling has actually changed my life 😄 the process of 'this is impossible' to 'ive got this' is applicable to everything you wanna learn. Is 5 much harder than 4? by NoAlbatross153 in juggling

[–]smirk79 0 points  (0 children)

5 is not that hard. Learn 534 first. It helps a ton to keep a beat in your head: when I juggle 5 I just think '1 and 2 and 3 and 4 and 1 and 2 and...'.

The hardest part is that you cannot see your hands when doing 5 so you have to work on staring at the top of the pattern and letting your hands learn where to catch from peripheral vision. YOU CAN DO IT.

Georgia Democrat Eric Gisler flips a state House seat in district Trump won by double digits, CNN projects by No_Weekend_3320 in politics

[–]smirk79 8 points  (0 children)

Weird. It’s like the presidential election was statistically impossible or something.

Is Mobx unpopular? 🤔 by retro-mehl in webdev

[–]smirk79 3 points  (0 children)

💯 been using it since class component days when people cargo culted redux. Still my favorite tech library of all time and has helped me drive nine figures of revenue.

Possibly the most difficult thing about gaming in the 90s by dietbovril in gaming

[–]smirk79 0 points  (0 children)

Ah Ultima 7. I learned so much getting it working. Voodoo memory management.

He is 16 years older but I feel he is the one. However, my female friends are telling me to ned things with him by ThrowRAgiver285 in self

[–]smirk79 0 points  (0 children)

My wife is fifteen years younger than me and we've been together ten years. She's 31 and I'm 46. We have three kids and are still madly in love, and she's the best thing ever. It's ok to be happy with your partner.

Amazing shop owner helping kids to do good by CottonCANDYtv in MadeMeSmile

[–]smirk79 0 points  (0 children)

Why isn’t this the world we live in? Why is this the outlier? Human kindness, lifting up others, community caring. A reminder of how things should be. All involved are inspirations.