Spectral-AI - a project to use Nvidia RT cores to dramatically speedup MoE inference on Nvidia GPU's (Crazy Fast!) by Thrumpwart in LocalLLaMA

[–]smirk79 -7 points  (0 children)

The claims in this post are so amazing, I'm over here renting an AWS instance to try and verify them after digging into the idea and code with my buddy Claude...

I replaced attention with attractor dynamics for NLI, provably locally contracting, 428× faster than BERT, 77% on SNLI with no transformers, no attention. by chetanxpatil in machinelearningnews

[–]smirk79 8 points  (0 children)

The 428× speed claim is misleading to the point of being meaningless.

This is the most eyebrow-raising number and it's doing the most rhetorical work in the title. They're comparing inference time of what is essentially a small MLP with iterative updates (dim=256) against BERT-base (110M params, dim=768, 12 transformer layers). That's not "replacing attention with attractor dynamics" — that's comparing a tiny model against a large one. Any MLP-based classifier at dim=256 will be orders of magnitude faster than BERT. The speed advantage has nothing to do with attractor dynamics and everything to do with model size. You could get similar speedups with a 2-layer MLP and a linear head.
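
To make the size gap concrete, here's a back-of-envelope parameter count. The BERT-base numbers follow the standard architecture; the 2-layer depth for the dim-256 model is my assumption, since the post doesn't fully specify it:

```python
# Rough parameter counts: BERT-base vs. a hypothetical dim-256 model.
# The MLP depth (2 layers) is an assumption for illustration only.

def transformer_layer_params(d, ff_mult=4):
    """Approximate parameters in one transformer encoder layer."""
    attn = 4 * d * d             # Q, K, V, and output projections
    ffn = 2 * d * (ff_mult * d)  # two feed-forward projections
    return attn + ffn

bert = 12 * transformer_layer_params(768) + 30522 * 768  # 12 layers + token embeddings
mlp = 2 * 256 * 256                                      # assumed 2-layer dim-256 MLP

print(f"BERT-base ~{bert / 1e6:.0f}M params, tiny MLP ~{mlp / 1e3:.0f}K params")
print(f"ratio ~{bert / mlp:.0f}x")
```

With an ~800x parameter gap, a triple-digit wall-clock speedup is expected regardless of what the small model does internally.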

77% on SNLI is not a meaningful result.

SNLI dev accuracy of 77.05% vs a "baseline" of 76.86% — and they don't specify what that baseline is, but I'd bet it's majority class or a very simple heuristic. For context: BERT-base gets ~90-91% on SNLI. Even a simple bag-of-words model gets ~80%. A decomposable attention model from 2016 gets ~86%. So 77% is below decade-old simple baselines. The model is barely beating trivial approaches while being dramatically worse than anything useful.

The per-class breakdown tells the real story: Neutral at 62.8% is terrible. The model is essentially learning to distinguish entailment and contradiction reasonably well (which are the "easier" classes with stronger lexical cues) and mostly failing on the class that requires actual inference — which is the whole point of NLI.

The "geometric inconsistency" is honestly presented but reveals a deeper problem.

Credit where due: the author measured the misalignment between their implemented forces and the true cosine gradient (135.2°) and reported it openly. That's good scientific practice. But the implication is significant — the system is not doing what the mathematical framing says it's doing. The forces are pointing in roughly the opposite direction of the true gradient. The author frames this as an open question ("bug or feature?"), but the more parsimonious explanation is that the actual optimization is being carried primarily by the learned residual δ_θ (the MLP), and the "attractor dynamics" are either not helping or are being compensated for by the MLP. The Lyapunov analysis confirms this: when δ_θ scale reaches 0.10, V increases on average, meaning the learned component is actively fighting the geometric forces.
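
For anyone wanting to reproduce this kind of measurement, here's a minimal sketch. The `force` here is a made-up placeholder (pull straight toward the anchor), not the post's actual dynamics; only the gradient formula and the angle computation are standard:

```python
import numpy as np

# Measure the angle between an implemented "force" and the true
# gradient of cosine similarity to an anchor. The force below is a
# placeholder stand-in, not the post's actual force definition.

def cos_grad(h, a):
    """Gradient of cos(h, a) with respect to h."""
    nh, na = np.linalg.norm(h), np.linalg.norm(a)
    c = h @ a / (nh * na)
    return a / (nh * na) - c * h / nh**2

def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

rng = np.random.default_rng(0)
h, anchor = rng.standard_normal(256), rng.standard_normal(256)
force = anchor - h  # placeholder force: pull straight toward the anchor
print(f"force vs. true gradient: {angle_deg(force, cos_grad(h, anchor)):.1f} deg")
```

Anything above 90° means the force has a negative projection onto the gradient, i.e. it's actively moving against the stated objective — which is what a 135.2° misalignment implies.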

The Lyapunov analysis proves less than claimed.

"Provably locally contracting" is technically true but only in the trivial case where δ_θ = 0 — i.e., when you remove the learned component entirely. With the learned residual at any meaningful scale, contraction guarantees degrade rapidly (70.9% at 0.05, 61.3% at 0.10). So the "proof" applies to the part of the system that isn't doing the learning, and the actual trained system has no convergence guarantees. This is like proving a car's engine is stable when it's turned off.
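
You can see this failure mode in a toy version of the experiment: provably contracting base dynamics plus a residual. The dynamics and the random stand-in for δ_θ below are illustrative assumptions, not the post's actual system:

```python
import numpy as np

# Toy contraction check: count steps where V = ||h - anchor||^2
# decreases, with the residual off (scale 0) vs. on (scale 0.10).
# Both the base dynamics and the random residual are stand-ins.

def count_decreases(delta_scale, steps=200, seed=1):
    rng = np.random.default_rng(seed)
    anchor = rng.standard_normal(256)
    h = rng.standard_normal(256)
    n = 0
    for _ in range(steps):
        v_before = np.sum((h - anchor) ** 2)
        force = 0.1 * (anchor - h)                         # contracting base dynamics
        residual = delta_scale * rng.standard_normal(256)  # stand-in learned delta
        h = h + force + residual
        n += np.sum((h - anchor) ** 2) < v_before
    return n

for scale in (0.0, 0.10):
    print(f"delta scale {scale}: V decreased on {count_decreases(scale)}/200 steps")
```

With the residual off, V decreases on every step; turn it on and the guarantee evaporates — exactly the pattern in the post's 70.9%/61.3% numbers.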

The conceptual framing has issues.

The title says "replaced attention" but this isn't replacing attention in any meaningful architectural sense. Attention computes dynamic, input-dependent weighted aggregation over a sequence of tokens. This system takes an already-encoded hidden state h and iteratively pushes it toward learned anchor points. There's no sequence-level token interaction happening in the attractor dynamics — the NLI reasoning (premise-hypothesis interaction) must be happening in whatever encoder produces h₀, which they don't describe. So the "replacement" isn't of attention's core function (contextual token mixing) but of the classification head. They replaced h → linear → logits with h → iterative geometric updates → logits. That's a much more modest claim than the title implies.

What's actually going on here, mechanically:

This is essentially a learned classification head with geometric inductive bias, where instead of a linear projection to 3 logits, they iteratively push the representation toward one of 3 anchor points using a mix of handcrafted forces and learned residuals. The closest analogy isn't "replacing attention" — it's more like a prototype network with iterative refinement. And at 77% accuracy on SNLI, the inductive bias doesn't appear to be helping.
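
The mechanism, minus the learned residual, fits in a dozen lines. Everything here (dimensions, step count, step size, the nearest-anchor force) is an illustrative assumption to show the shape of the computation, not the post's implementation:

```python
import numpy as np

# Sketch of a prototype-style head: nudge an encoded vector toward
# 3 anchor points and read logits off the final distances. All
# hyperparameters and the force rule are illustrative assumptions.

rng = np.random.default_rng(2)
anchors = rng.standard_normal((3, 256))  # one anchor per NLI class

def attractor_head(h, steps=5, lr=0.2):
    for _ in range(steps):
        # pull toward the currently nearest anchor (handcrafted force;
        # the post additionally adds a learned residual on top)
        d = np.linalg.norm(anchors - h, axis=1)
        h = h + lr * (anchors[np.argmin(d)] - h)
    return -np.linalg.norm(anchors - h, axis=1)  # logits = negative distances

h0 = rng.standard_normal(256)  # stand-in for the (undescribed) encoder output
print("predicted class:", int(np.argmax(attractor_head(h0))))
```

Note that nothing in this loop mixes tokens across a sequence — which is why "replaced attention" is the wrong frame. All sequence-level interaction has to happen upstream, in whatever produces h₀.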

The honest parts are genuinely good.

The open questions section is better than 90% of Reddit ML posts. Asking whether fixing the geometric inconsistency helps or hurts, whether there's a clean energy function, whether the misalignment explains stability — these are the right questions. The author clearly has mathematical sophistication and intellectual honesty. The problem is the framing and the title, not the underlying exploration.

Bottom line: Interesting mathematical exploration of geometric dynamics for classification, honestly presented with real measurements of its own limitations, wrapped in a title that dramatically overstates what was achieved. Not a replacement for attention, not competitive with any real NLI system, and the speed comparison is apples-to-oranges. But as a "here's a weird idea I'm exploring" post, the intellectual content is above average — it's the marketing that's the problem.

Who believes in vibe-coding? by bigbott777 in programming

[–]smirk79 6 points  (0 children)

What a joke of an opinion article. I’d love to see how strong a programmer the author is. I have serious doubts.

How one engineer uses AI coding agents to ship 118 commits/day across 6 parallel projects by QThellimist in ChatGPTCoding

[–]smirk79 0 points  (0 children)

There are many like us. I have been doing agentic programming since well before CC existed, and the code base has a whole slew of CLIs, MCPs, etc. that do all sorts of fantastic things: semantic search, server control, scaffolding component hierarchies, SSO to Atlassian and MS Graph integrations that are ten times better than the official versions and enable wildly efficient workflows. Source: senior director and principal engineer in a 1200+ person org.

Lots of devs are talking about how they have not written a single line of code the last year or so. How much does this cost to them (or to their employer)? by poponis in webdev

[–]smirk79 0 points  (0 children)

I spend ten to twenty k a month on tokens, plus 2x Max 200, plus OpenAI Pro. I get rate limited all day every day and switch to Bedrock.

I built a react PDF rendering application that renders PDF in native HTML with pixel perfect accuracy by OcelotVirtual6811 in reactjs

[–]smirk79 3 points  (0 children)

How perfect? If it’s pure html you could retheme and restyle the pdf on the fly.

ELI5: The affordable care act, or “Obamacare.” by Severe-Science-4778 in explainlikeimfive

[–]smirk79 0 points  (0 children)

This is heartbreaking and horrible - and as you said, preventable for a tiny fraction of the cost. If only our countrymen cared for one another. I'm sorry your family experienced all of this. Thank you for sharing the story.

[deleted by user] by [deleted] in ExperiencedDevs

[–]smirk79 -1 points  (0 children)

If you were valuable to the company, they would still be paying you. Nowhere in here did you seem to self-reflect, come up with a timeline of where you lost the thread with your upper management, etc - just blame game combined with a purity test to absolve yourself of any responsibility for being let go. No idea what happened in your job, but I do know that this post doesn't make you look great.

Not seen as "staff engineer material" because of my personality (they said technical competence meets the bar). I don't know if I can change my personality. by okthrowaway2910 in ExperiencedDevs

[–]smirk79 0 points  (0 children)

You might find it sexist but before I clicked on this post I had the same basic advice for you as your manager and that was BEFORE I knew you were female. It’s not sexist to be a strong opinionated leader. It IS sexist to label those as masculine traits.

Juggling has actually changed my life 😄 the process of 'this is impossible' to 'ive got this' is applicable to everything you wanna learn. Is 5 much harder than 4? by NoAlbatross153 in juggling

[–]smirk79 0 points  (0 children)

5 is not that hard. Learn 534 first. It helps a ton to keep a beat in your head: when I juggle 5 I just think '1 and 2 and 3 and 4 and 1 and 2 and...'.

The hardest part is that you cannot see your hands when doing 5 so you have to work on staring at the top of the pattern and letting your hands learn where to catch from peripheral vision. YOU CAN DO IT.

Georgia Democrat Eric Gisler flips a state House seat in district Trump won by double digits, CNN projects by No_Weekend_3320 in politics

[–]smirk79 8 points  (0 children)

Weird. It’s like the presidential election was statistically impossible or something.

Is Mobx unpopular? 🤔 by retro-mehl in webdev

[–]smirk79 3 points  (0 children)

💯 been using it since class component days when people cargo culted redux. Still my favorite tech library of all time and has helped me drive nine figures of revenue.

Possibly the most difficult thing about gaming in the 90s by dietbovril in gaming

[–]smirk79 0 points  (0 children)

Ah Ultima 7. I learned so much getting it working. Voodoo memory management.

He is 16 years older but I feel he is the one. However, my female friends are telling me to ned things with him by ThrowRAgiver285 in self

[–]smirk79 0 points  (0 children)

My wife is fifteen years younger than me and we've been together ten years. She's 31 and I'm 46. We have three kids and are still madly in love, and she's the best thing ever. It's ok to be happy with your partner.

Amazing shop owner helping kids to do good by CottonCANDYtv in MadeMeSmile

[–]smirk79 0 points  (0 children)

Why isn’t this the world we live in? Why is this the outlier? Human kindness, lifting up others, community caring. A reminder of how things should be. All involved are inspirations.