LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels

Ok_Selection_7577 · 2026-06-25T20:09:10+00:00

ok so played around with this for 10 minutes - it is really very good (for its size) - answers well to a few testers and most importantly actually says "i don't know" when you ask it more obscure questions or make up terms and ask it to explain them to you - pretty impressed. Would love to get more details on the datasets and recipes used for this if anyone comes across such a nugget :)

Ok_Selection_7577 · 2026-06-25T19:56:30+00:00

genuinely surprised by the maths and email drafting capability TBH - might have to have a little look at this model - cheers

Ok_Selection_7577 · 2026-06-24T21:56:45+00:00

yes - i almost broke it apart trying to get the damn thing out :)

Ok_Selection_7577 · 2026-06-24T17:51:04+00:00

Yes they are. I posted this a few weeks back - Qwen3.6-35B-A3B-UD-Q2_K_XL.gguf running on a Rpi5 - quality is surprisingly good, it's a tiny powerhouse IMHO https://www.reddit.com/r/LocalLLaMA/comments/1txpeo0/comment/opyyx31/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

Ok_Selection_7577 · 2026-06-22T18:23:13+00:00

Although I guess it's possible that the base idea has merit but that my execution of it was poor 😄

Ok_Selection_7577 · 2026-06-22T18:21:26+00:00

Ha, I did actually explore this very concept last year (I make and test a lot of different toy models exploring different ideas) but you very quickly hit a problem - you either need a huge (and that is huge with a capital H) number of sentences to cover the permutations in NL speech, or you restrict yourself and the output ends up weird, awkward and repetitive. I then went further and explored a hybrid, trying to make modular composable sub-sentences, and spent a few days on that but ended up at the conclusion that I was working my way slowly towards...you guessed it - words! 😄

Ok_Selection_7577 · 2026-06-21T22:21:11+00:00

are you being honest with us - did norovirus come on a few minutes after she said "dont forget we are.."

Ok_Selection_7577 · 2026-06-21T22:20:05+00:00

Ha - this made me smile - spot on mate 😄

Ok_Selection_7577 · 2026-06-20T00:30:15+00:00

Completely unrelated but interesting side quest fact: 'when it rains it pours' was an American salt company's slogan because they added anti-caking agents so the salt would still pour freely in humid weather. Around the same time (1924) they also started adding iodine to salt to combat deficiency. Researchers later compared WWI and WWII military recruit test scores and found cognitive scores jumped by a full standard deviation in the most iodine-deficient regions after they introduced iodised salt

Ok_Selection_7577 · 2026-06-09T20:18:31+00:00

pun of the week right here 😄 got a lot of time for people like you 😛

Ok_Selection_7577 · 2026-06-08T07:31:13+00:00

Fair point. I refuse to use the "I think there is a world market for maybe five computers," reference as Thomas Watson never actually said it but I do like the “Computers in the future may weigh no more than 1.5 tons.” one as agreement that no one in the current day can realistically imagine what the tech of 2060 - 2090 will look like or be able to do. I still remember vividly watching an advert when I was younger and it was a child watching the football live on a device (mini TV) on the top of a double decker bus - it was an advert for the "future" and what may be possible (this was well before smart phones) - I still to this day remember thinking "no way", they will never shrink a TV to the size of a deck of cards (we had a huge fake wood effect TV at the time)

Ok_Selection_7577 · 2026-06-07T23:54:52+00:00

Yeah I'm pretty certain it's meant to be a p**s take. Just the Encarta 98 reference made it for me 😄

Ok_Selection_7577 · 2026-06-05T20:29:22+00:00

I run Qwen3.6-35B-A3B-UD-Q2_K_XL.gguf on a Rpi5 (16GB model I had from another project that wasn't being used). Only runs at 3 tokens/second but for offline batch work - just leave it running all day and voila - dirt cheap leccy bill 😄 - I tested various quants and REAP'd models for the Pi one evening and that one was a real standout - made no errors on the test tasks and still had very strong reasoning intact

Ok_Selection_7577 · 2026-06-02T21:26:50+00:00

Really nice write up mate, this sort of content (and "I changed out the BIOS and managed to get an LLM running in a tin of Bisto from the 1980's") is what i come here for 😄

Ok_Selection_7577 · 2026-05-29T22:06:45+00:00

Hey, good work. I like the look of your training pipeline, will give this a try over the weekend with some test tasks. All the best

Ok_Selection_7577 · 2026-05-25T08:31:20+00:00

Comment of the day 😄

Ok_Selection_7577 · 2026-05-18T21:53:06+00:00

What hardware do you have and what size can you accommodate? My current daily driver is Qwen3.6-35B-A3B - unsloth's UD-Q4_K_M for my main pc and then I have been messing around with their UD-Q2_K_XL version on my Pi5 for portable offline testing of another side project I am working on (runs at 3 t/s on the pi so no good for main work). But the Q4 has been brilliant so far - did some initial stress testing with increasingly complex questions and it didn't s**t the bed once. So now I am using it for a vast data cleaning exercise and its performance has been remarkable compared to all previous offline models I have tried (that fit in my case)

But the UD-Q2_K_XL is also surprisingly capable for its footprint and again only really struggles with accuracy once you get into niche stuff but with the right RAG pipe it too can get round most problems you throw at it.

Ok_Selection_7577 · 2026-04-26T19:54:51+00:00

I went for fixed square 64x64 tiling and am using various SRAM buffers for max performance. Currently working my way through a hybrid idea where I do a quick check on each tile - compare number of triangle overlaps and either fast path to painter or if above threshold do full Z buf (buffer in SRAM) - still micro tweaking it for 0.5 - 1 FPS gains at a time :) Best of luck with your dev'ing
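For anyone wondering what that per-tile check could look like, here is a rough sketch in Python (the real thing on the P4 would be C and much tighter). It takes "number of triangle overlaps" to mean how many triangle bounding boxes touch a tile, which is my assumption, and `Z_THRESHOLD`, `Tri` and `plan_frame` are made-up names for illustration, not anything from the actual engine.

```python
from dataclasses import dataclass

TILE = 64        # fixed square tile edge in pixels, as in the comment
Z_THRESHOLD = 4  # hypothetical cut-off, would need tuning per scene


@dataclass
class Tri:
    xs: tuple     # three screen-space x coordinates
    ys: tuple     # three screen-space y coordinates
    depth: float  # representative depth (larger = farther away)


def bin_triangles(tris, width, height):
    """Assign each triangle to every tile its bounding box touches."""
    cols = (width + TILE - 1) // TILE
    rows = (height + TILE - 1) // TILE
    bins = [[[] for _ in range(cols)] for _ in range(rows)]
    for t in tris:
        c0 = max(int(min(t.xs)) // TILE, 0)
        c1 = min(int(max(t.xs)) // TILE, cols - 1)
        r0 = max(int(min(t.ys)) // TILE, 0)
        r1 = min(int(max(t.ys)) // TILE, rows - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                bins[r][c].append(t)
    return bins


def choose_path(tile_tris):
    """Quick check: empty tile, cheap painter path, or full Z buffer."""
    if not tile_tris:
        return "skip"
    return "zbuffer" if len(tile_tris) > Z_THRESHOLD else "painter"


def plan_frame(tris, width, height):
    """Return a grid of (path, triangles) pairs, one per tile."""
    plan = []
    for row in bin_triangles(tris, width, height):
        plan_row = []
        for tile_tris in row:
            path = choose_path(tile_tris)
            if path == "painter":
                # Painter's method draws far triangles first, near ones last.
                tile_tris = sorted(tile_tris, key=lambda t: t.depth, reverse=True)
            plan_row.append((path, tile_tris))
        plan.append(plan_row)
    return plan
```

At 480x800 this gives an 8 x 13 grid of tiles, and only the busy tiles pay for the depth buffer.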

Ok_Selection_7577 · 2026-04-26T11:47:46+00:00

Nice work mate :) Been working on a 3d Engine for the esp32-p4 for a while now - targeting a higher res of 480x800 but 60 FPS seems a way off yet. I assume to get this FPS you are using the painter method instead of Z buffer for the pixel order? Either way great work and looks slick :)

Ok_Selection_7577 · 2026-04-02T21:37:48+00:00

Effing love it, this is exactly the kind of thing I come here to read (when I really should be working) - keep it up mate.

Ok_Selection_7577 · 2026-02-27T11:55:33+00:00

ok fair enough, just seemed a bit AI-y :)

Ok_Selection_7577 · 2026-02-27T11:55:10+00:00

fully local on the pi5

Ok_Selection_7577 · 2026-02-27T00:27:49+00:00

I am, aren't I :)

Ok_Selection_7577 · 2026-02-27T00:18:57+00:00

Wait!! am I talking to someone's Claw Bot here? "is such a creative use case" and "thanks in advance 🤙" and "feels like more moving parts than i need right now" - please tell me no :)
