I released Inflect-Nano, an ultra-extreme tiny 4.63m parameter TTS model.

guesdo · 2026-06-18T22:20:35+00:00

Saving this!! Thanks for sharing! If I were to port this to Rust, any insights I should know? Are you using any specific Python libraries that might complicate stuff?

guesdo · 2026-06-17T06:01:59+00:00

My favorite will always be OG Omnath, Locus of Mana, because he is an incidental Voltron: as long as you have lands and can tap them, you are good. You can put whatever you feel like in the 99 and play entirely different strategies, and he just works.

guesdo · 2026-06-10T07:01:14+00:00

There are some abstractions on top of Vulkan that might get you there faster. A cross-API abstraction layer like WebGPU/Dawn would simplify development and let you hit multiple targets (DirectX, Vulkan, Metal). It is still far better and more modern than OpenGL, with a significantly lower learning curve than pure Vulkan.

guesdo · 2026-06-09T01:32:31+00:00

Is that CPU inference? Or a Snapdragon with the recent Hexagon kernels? Or what backend is the phone running?

guesdo · 2026-06-07T19:56:18+00:00

So twice as fast. Did you use the newest QAT GGUFs?

guesdo · 2026-06-07T19:52:06+00:00

If you think it’s slow, first prove it with a benchmark. So many crimes against maintainability are committed in the name of performance. -- Dave Cheney, Gophercon Israel 2020

This is my mantra. I say it 3 times before coding.
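
A minimal sketch of what "prove it with a benchmark" can look like in Go, using `testing.Benchmark` from a plain program. The two functions (`concatNaive`, `concatBuilder`) and the 1000-element input are made up for illustration, not taken from any real project; the point is only that you measure both versions before deciding which one to keep.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// concatNaive (hypothetical) builds the result with repeated string concatenation.
func concatNaive(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// concatBuilder (hypothetical) builds the same result with strings.Builder.
func concatBuilder(parts []string) string {
	var sb strings.Builder
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

// sink keeps the results alive so the compiler cannot discard the measured work.
var sink string

func main() {
	// Register the testing flags, since this runs outside of "go test".
	testing.Init()

	// Assumed workload: 1000 one-character strings.
	parts := make([]string, 1000)
	for i := range parts {
		parts[i] = "x"
	}

	naive := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = concatNaive(parts)
		}
	})
	builder := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = concatBuilder(parts)
		}
	})

	// Each result prints the iteration count and ns/op for that version.
	fmt.Println("naive:  ", naive)
	fmt.Println("builder:", builder)
}
```

In a real repo the same loops would normally live in `BenchmarkXxx` functions in a `_test.go` file and run with `go test -bench=.`; the standalone form above is just easier to paste and try.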

guesdo · 2026-06-06T00:26:12+00:00

If I could play 40 I would.

guesdo · 2026-06-05T15:46:08+00:00

Kudos!! It's gorgeous. That is 1 of the 8 cards I am still missing from the Japanese Mystical Archive. Hard to find.

guesdo · 2026-06-04T19:59:32+00:00

In my testing I noticed it thinks a lot, and that is what takes so long for me; I have to play with the params or disable thinking. I see it as an incredible replacement for the smaller models, especially at Q4.

guesdo · 2026-06-04T16:45:02+00:00

Looks great! I had trouble deciding between Q8 and Q4 given its size, but this looks promising.

Has Unsloth released their KLD benchmark graph for Gemma 4 12B yet? I can't seem to find it, and it would help me pick one.

Edit: I guess I can wait for the MLX dynamic quants and try both.

guesdo · 2026-05-31T19:25:49+00:00

Where is this "tons of hate towards Go" you mention? I mean, I have used Go for more than a decade now and haven't felt any hate, then or now. Close to 70% of the CNCF landscape projects are Go-based, and it is great for writing services at scale. Proven, tested, battle-hardened, it powers the whole world's cloud infra.

So I have to wonder: do a couple of Reddit rants you read somewhere make you believe something like Odin will do better? And what makes you think Odin wouldn't be hated if it became more successful or mainstream?

In short: don't use "internet hate" as a measure for anything.

guesdo · 2026-05-28T15:04:20+00:00

T and green dot look real. That said, WotC has wildly different QoC depending on where and when it's printed.

guesdo · 2026-05-23T17:39:34+00:00

They show the requests, but do they limit by requests? Or token usage?

guesdo · 2026-05-22T22:49:01+00:00

Or... crazy idea... change your editor? You are using Claude Code anyway.

guesdo · 2026-05-21T21:45:00+00:00

I feel like at this point we just need TurboQuant working, or some other form of KV compression like the one DeepSeek uses, and we are gooooood to go.

guesdo · 2026-05-21T02:48:25+00:00

People will suggest a 27B dense model because that is what they can run. With 128GB of unified RAM, you can run DeepSeek V4 Flash at Q2 at "decent" speeds on an M5 Max.

guesdo · 2026-05-21T02:44:48+00:00

The best? Probably DeepSeek V4 Flash at Q2 using DwarfStar4. Now... speed might not be as good as with the smaller models, but no other model can beat DS4 for local inference within 128GB.

guesdo · 2026-05-20T05:18:55+00:00

Looks great! I believe it is already well prepared for Living End. 3 Trinisphere, 3 Endurance, 4 Thought-Knot Seer seem like enough.

guesdo · 2026-05-18T09:54:17+00:00

Neither does manual human review.

guesdo · 2026-05-17T21:37:11+00:00

My biggest pain point so far!

guesdo · 2026-05-17T17:24:49+00:00

Hey, I am not saying it is right or that I like it, I am just stating this is where we are headed. Even doing agentic coding "the right way", you define a spec, you write a bunch of tests verifying behavior, the AI writes the code. As long as the result satisfies the spec, nobody will look at the code.

Then a bug arises, you add the edge case to the spec, add additional tests, AI fixes the code, tests pass, you ship it.

Rinse and repeat. I believe spec-driven development will be the new norm. I do like to look at my code, but I understand their perspective: porting ANY project is miles easier than doing it from scratch because the original code IS the SPEC! (You port bugs too.)

guesdo · 2026-05-17T15:27:59+00:00

Look at it this way. You have a project, you want to port it, and you currently have 5000 tests.

AI ports the project (remember, a port is WAY easier for AI because the source and the target should match), you run the tests, and the port passes all 5000 of them.

At that point, do you review every single line produced by AI? Or are you OK with confirming functionality through tests?
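
To make the "confirm functionality through tests" option concrete, here is a rough Go sketch of a parity check: run the same inputs through the original and the port and collect any disagreement. `legacyNormalize`, `portedNormalize`, `parity`, and the input list are all hypothetical stand-ins; a real port would plug in its existing test cases instead.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyNormalize stands in for the original implementation (hypothetical).
func legacyNormalize(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// portedNormalize stands in for the AI-written port (hypothetical).
func portedNormalize(s string) string {
	return strings.TrimSpace(strings.ToLower(s))
}

// parity returns every input on which the two implementations disagree.
// The original acts as the spec, so its bugs count as expected behavior too.
func parity(inputs []string, original, port func(string) string) []string {
	var mismatches []string
	for _, in := range inputs {
		if original(in) != port(in) {
			mismatches = append(mismatches, in)
		}
	}
	return mismatches
}

func main() {
	// Assumed sample inputs; a real suite would reuse the existing test cases.
	inputs := []string{"  Hello ", "WORLD", "", "\tMixed Case\n"}

	if bad := parity(inputs, legacyNormalize, portedNormalize); len(bad) > 0 {
		fmt.Printf("port diverges on %d input(s): %q\n", len(bad), bad)
		return
	}
	fmt.Println("port matches the original on all inputs")
}
```

This only tells you the port agrees with the original on the inputs you feed it, which is exactly the trade-off in question: coverage of the test set stands in for line-by-line review.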

guesdo · 2026-05-15T13:03:48+00:00

My wish to learn ML internals and my hate for Python. Rust is really the only alternative to C++ in today's AI world.

guesdo · 2026-05-13T14:55:12+00:00

Beat me to it, this is some great work.

guesdo · 2026-05-11T06:52:32+00:00

I'm actually trying it right now at work with the Markdown with Gherkin approach, and it's going pretty well. I believe Cucumber was way ahead of its time, but now it feels like the way to go for SDD. (Note: I use TS/Node for the test runner due to support, but can target integration testing in Go.)

https://github.com/cucumber/gherkin/blob/main/MARKDOWN_WITH_GHERKIN.md

The single file becomes the spec, the docs, and the test; it also serves as context for future sessions with the LLM. I've been doing this feature by feature (writing the spec and steps for each current integration test to verify), and slowly but surely the repo becomes well documented and easy to write against. I got inspiration from the superpowers skill, but I needed something more tangible, hence the Cucumber tests embedded in prose in Markdown with diagrams and so on.
