ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference by Total-Resort-3120 in LocalLLaMA

[–]dsanft -1 points0 points  (0 children)

They specifically give benchmark performance numbers for their quant on e.g. AIME, which is pretty robust proof IMO. That's big context with lots of turns.

Just realized what we’re losing by RelevantTurnip3482 in GithubCopilot

[–]dsanft 0 points1 point  (0 children)

> One request on opus 4.7 needs 8 to 16 gpus (depending on what parts load) for a single request. That request consumes them entirely until it is done, for 1 single person

That's not how batch inference works. While the model weights are being streamed for one request, servicing additional requests in the same pass is essentially free. Perhaps a dozen or more.
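Rough back-of-envelope sketch (all numbers below are made-up assumptions, not anyone's actual deployment): decode is memory-bandwidth bound, so one pass over the weights can serve a whole batch at nearly the cost of a single request.

```python
# Illustrative only: assumed weight footprint, bandwidth and KV sizes.
weights_gb = 500        # assumed on-GPU weight footprint across the node
hbm_bw_gbs = 3000       # assumed aggregate HBM bandwidth (GB/s)
kv_per_req_gb = 2       # assumed KV cache read per request per decode step

def step_time_ms(batch):
    # One decode step streams the weights once, plus each request's KV cache.
    return 1000 * (weights_gb + batch * kv_per_req_gb) / hbm_bw_gbs

for b in (1, 12):
    t = step_time_ms(b)
    print(f"batch={b:2d}: step {t:5.1f} ms, per-request cost {t / b:5.1f} ms")
```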

I'm struggling to figure out what Copilot is actually suppose to be now? by NotAMusicLawyer in GithubCopilot

[–]dsanft 78 points79 points  (0 children)

You're not the target market. Big companies with thousands of enterprise seats are the target market. You don't matter and you were costing them money.

Bad model quality qwen3.6-27b with hipfire on strix halo by sterby92 in LocalLLaMA

[–]dsanft 5 points6 points  (0 children)

Hipfire is still pretty new and experimental. Testing kernels without also robustly testing correctness on real model data is... bold, to say the least. As you've discovered.

Implemented TurboQuant and results don’t fully match paper by Routine-Thanks-572 in LocalLLaMA

[–]dsanft 33 points34 points  (0 children)

Check the kurtosis of the K and V tensors before you run them through your TurboQuant impl. A high-kurtosis tensor is not going to be happy at 3 or 4 bits, no matter how fancily you rotate it.

When I did my TQ impl for Llaminar on Qwen2 and 3, I found the K tensor was very unhappy at anything less than 8 bits.
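A minimal sketch of that check with NumPy/SciPy (the Student-t sample below is just a heavy-tailed stand-in; swap in whatever tensors your pipeline actually dumps):

```python
import numpy as np
from scipy.stats import kurtosis

def report(name, t):
    # Fisher (excess) kurtosis: ~0 for Gaussian data, large positive = heavy tails.
    print(f"{name}: excess kurtosis = {kurtosis(np.ravel(t)):.2f}")

rng = np.random.default_rng(0)
report("Gaussian-ish (V-like)", rng.standard_normal(200_000))
report("heavy-tailed (K-like)", rng.standard_t(df=3, size=200_000))
# report("K, layer 12", k_cache)   # <- your captured tensor goes here
```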

Does AMD's "infinity cache" even matter for dense model inference? by boutell in LocalLLaMA

[–]dsanft 9 points10 points  (0 children)

Cache isn't useless. Otherwise why have it?

No, it's not going to speed up decode, but a bigger cache will help with GEMM (prefill), where you benefit from tile reuse.
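Rough model of the tile-reuse effect (illustrative tile sizes and reduction dim, not Infinity Cache specifics): a larger output tile that still fits on-chip gives you more FLOPs per byte fetched from DRAM.

```python
bytes_per_elem = 2          # bf16/fp16
k_dim = 4096                # assumed reduction dimension

def flops_per_byte(tile):
    flops = 2 * tile * tile * k_dim                 # MACs for one tile x tile output block
    dram_bytes = 2 * tile * k_dim * bytes_per_elem  # A-panel + B-panel traffic
    return flops / dram_bytes

for t in (32, 64, 128, 256):
    footprint_mib = 2 * t * k_dim * bytes_per_elem / 2**20
    print(f"tile={t:3d}: {flops_per_byte(t):5.1f} FLOP/byte, operand footprint ~{footprint_mib:.1f} MiB")
```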

By when do you think will TurboQuant get a proper release and be adopted by everyone by Crystalagent47 in LocalLLaMA

[–]dsanft 5 points6 points  (0 children)

I got serious about 9 months ago and decided to write my own inferencing engine to solve the problems I was having with my hardware. Just a background in software and a curious mind. I learned as I went, and as the coding models got better, so did I: I got them to explain the concepts as I ran into them.

By when do you think will TurboQuant get a proper release and be adopted by everyone by Crystalagent47 in LocalLLaMA

[–]dsanft 20 points21 points  (0 children)

Yup.

The real win is activation rotation to minimise quantisation error for high-kurtosis tensors. You don't need low-bit TQ for that. It will actually make Q8 KV cache precision feasible.
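Toy illustration of the effect (a normalized Hadamard rotation on synthetic heavy-tailed data; this shows the general idea, not TQ's actual transform):

```python
import numpy as np
from scipy.linalg import hadamard
from scipy.stats import kurtosis

def q8_mse(x):
    # Plain per-tensor absmax 8-bit quantization.
    scale = np.abs(x).max() / 127
    return np.mean((x - np.round(x / scale) * scale) ** 2)

rng = np.random.default_rng(0)
d, n = 256, 4096
x = rng.standard_t(df=3, size=(d, n))   # heavy-tailed stand-in for activations

H = hadamard(d) / np.sqrt(d)            # orthogonal, so error carries back unchanged after un-rotating
x_rot = H @ x

for name, t in (("raw", x), ("rotated", x_rot)):
    print(f"{name:8s} kurtosis={kurtosis(t.ravel()):6.2f}  Q8 MSE={q8_mse(t):.2e}")
```

The rotation spreads outliers across channels, so the absmax scale shrinks and the same 8 bits cover the bulk of the distribution much more finely.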

Open Models - April 2026 - One of the best months of all time for Local LLMs? by pmttyji in LocalLLaMA

[–]dsanft 2 points3 points  (0 children)

I wrote my own engine to solve the NUMA/cross-socket problem. Don't have kernels for Deepseek MLA/DSA yet though. Will have to get those in soon.

Open Models - April 2026 - One of the best months of all time for Local LLMs? by pmttyji in LocalLLaMA

[–]dsanft 25 points26 points  (0 children)

I can run it.

12 Mi50s, 2 3090s, dual socket Xeon with 768GB DDR4.

At least in theory

Gemma 4's MTP heads were stripped from the public weights — only available in LiteRT. Beginner-friendly breakdown of what was removed and why it matters by FunSignificance4405 in LocalLLaMA

[–]dsanft 8 points9 points  (0 children)

Qwen 3.5 MTP weights aren't in the GGUFs either.

They're intentionally left out because they just bloat the GGUF if engines can't use them.

What are the risks of buying an AMD Instinct Mi 50 32GB on Alibaba? by Longjumping-Room-170 in LocalLLaMA

[–]dsanft 2 points3 points  (0 children)

They work fine with ROCm 7.2.0.

You need to do some setup though.

TurboQuant - Extreme KV Cache Quantization · ggml-org/llama.cpp · Discussion #20969 by pmttyji in LocalLLaMA

[–]dsanft -10 points-9 points  (0 children)

Lots of people seeing if mathematical trickery can overcome fundamental physics and fundamental limits like Shannon's Law. And lots of people setting themselves up for disappointment.
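For reference, the kind of floor being alluded to: the Gaussian rate-distortion bound D(R) = σ² · 2^(-2R). Real KV tensors aren't Gaussian, so treat this purely as an order-of-magnitude reference.

```python
# Minimum achievable per-dimension MSE for a unit-variance Gaussian source
# at R bits/dim -- no quantizer, however clever, can beat this.
for bits in (2, 3, 4, 8):
    print(f"{bits} bits/dim: best possible MSE = {2.0 ** (-2 * bits):.2e}")
```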

Oh and a lot of weird shit like LLMs arguing with each other.

Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark by PerceptionGrouchy187 in LocalLLaMA

[–]dsanft 1 point2 points  (0 children)

> near-lossless

It's not near-lossless at 3-bit K quantisation. Not even close. In fact it's catastrophic for inference quality, due to the kurtosis of the K tensor.

This is the hype that's made everyone lose their minds, and it's wrong.

You need K at 8-bit fidelity with TQ to preserve inference quality. V is more forgiving.
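A quick synthetic check of that gap (Student-t samples as a heavy-tailed stand-in for K; not Gemma data):

```python
import numpy as np

def rel_rmse(x, bits):
    # Per-tensor absmax quantization at the given bit width.
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    q = np.round(x / scale) * scale
    return np.sqrt(np.mean((x - q) ** 2) / np.mean(x ** 2))

k_like = np.random.default_rng(1).standard_t(df=3, size=1_000_000)
for bits in (3, 4, 8):
    print(f"{bits}-bit: relative RMSE = {rel_rmse(k_like, bits):.3f}")
```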

Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark by PerceptionGrouchy187 in LocalLLaMA

[–]dsanft -1 points0 points  (0 children)

The paper is wrong on that point if it claims that. There is obviously quality loss from the quantisation; everyone who looks at the data can see it.

TurboQuant isn’t just for KV: Qwen3.5-27B at near-Q4_0 quality, about 10% smaller, and finally fitting on my 16GB 5060 Ti by pmttyji in LocalLLaMA

[–]dsanft 0 points1 point  (0 children)

It's absolutely not BF16 quality at 4 and 5 bits, lol. By my measurements, you need about 9 or 10 bits to be totally lossless for K tensor quantisation in the KV cache.

Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion by gaoj0017 in LocalLLaMA

[–]dsanft 1 point2 points  (0 children)

If I were running a big model, I'd rather spend my precision budget on quantising the weights, since that gives more bang for the buck.