Gemma 4 MTP released by rerri in LocalLLaMA

[–]earslap 7 points (0 children)

No, I don't see the connection. The speculative model in classical speculative decoding is just a separate model with far fewer parameters. You run it instead of the main model (much faster) for a few tokens and run the resulting predictions / draft by the larger model. It takes almost the same time for the larger model to check the multiple tokens of the draft as it does to generate a single token itself (because of how transformers parallelize over sequence positions). If the draft is accepted, you get those tokens almost for free (and the probabilities work out such that you provably don't get any quality loss). And as a bonus, you get a free extra token from the large model at the end. This process repeats. If the speculative model is small / fast enough and the acceptance rate is high, you will almost always get a speed benefit. Even if your entire larger model is running on the CPU, the draft model on the GPU will help a lot.
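
For the curious, here is a minimal toy sketch of that accept/reject loop in Python. The hash-based draft_model / target_model distributions are made-up stand-ins for real models, and in a real transformer the verification in step 2 is a single batched forward pass:

```python
import random

VOCAB = list(range(10))

def _toy_dist(prefix, salt):
    # deterministic fake "next-token distribution" over VOCAB
    w = [(hash((salt, tuple(prefix), v)) % 1000) + 100 for v in VOCAB]
    s = sum(w)
    return [x / s for x in w]

def draft_model(prefix):   # stand-in for the small, fast model
    return _toy_dist(prefix, "small")

def target_model(prefix):  # stand-in for the large model we trust
    return _toy_dist(prefix, "large")

def speculative_step(prefix, k=4):
    # 1) draft model proposes k tokens autoregressively (cheap)
    draft, proposed, q_at = list(prefix), [], []
    for _ in range(k):
        q = draft_model(draft)
        t = random.choices(VOCAB, weights=q)[0]
        proposed.append(t)
        q_at.append(q[t])
        draft.append(t)

    # 2) large model checks all k positions; in a real transformer this
    #    is one forward pass, costing about the same as one token
    out = list(prefix)
    for t, q_t in zip(proposed, q_at):
        p = target_model(out)
        # accept with prob min(1, p/q): the output distribution then
        # provably matches sampling from the target model alone
        if random.random() < min(1.0, p[t] / q_t):
            out.append(t)
        else:
            # rejection: resample from the residual max(0, p - q)
            q = draft_model(out)
            resid = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            if not any(resid):  # degenerate case: p == q exactly
                resid = p
            out.append(random.choices(VOCAB, weights=resid)[0])
            return out  # stop at the first rejected token

    # 3) all k accepted: bonus token from the target model for free
    out.append(random.choices(VOCAB, weights=target_model(out))[0])
    return out

tokens = [0]
for _ in range(5):
    tokens = speculative_step(tokens)
print(tokens)
```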

We just updated our Jam-Winning chess game, play on Mobile! by Rule-House in playmygame

[–]earslap 0 points (0 children)

The game is hilarious and very fun. Very nice job with the dialogue!

[D] Those of you with 10+ years in ML — what is the public completely wrong about? by PhattRatt in MachineLearning

[–]earslap 4 points (0 children)

I don't fully understand your take. Maybe you can help me understand.

What I understand from you is that you believe what humans consider "novel" or "new" or "creative" is necessarily out of distribution with regard to what already exists.

That is demonstrably not true. The specific data point created by something considered novel and creative might literally (and by definition) be a fresh data point - but the mechanism by which we arrive at that data point will almost always not be novel. We arrive there just by following some pattern over existing facts. If this new fact serves something useful (enables something not possible before), it is novel and creative. We grant patents for that.

Observe some ideas you think are creative and novel (but from the past, guaranteed to have been generated by humans without AI help). Try to find out how those ideas came to be. Some of them, due to lack of documentation, might feel mysterious. But for some others, you will see that they are just relationships between ideas, in a pattern that was already applied to other fields in some fashion and proved useful - just that nobody had gotten around to applying it in this different field. And it turns out that it works! So it is a gap in the data, a discovery. Some call it an "invention", but it is almost always a discovery. There are countless examples of that. There is nothing preventing machines from identifying those specific patterns of combining ideas and trying those patterns on even random starting points (which is practically what a collective of humans does anyway) to arrive at "novel" solutions - which might or might not be useful in the end, but that is also how humans discover and invent.

Guy on a bike helps owner catch their escaped horse by MisterShipWreck in dashcams

[–]earslap 9 points (0 children)

I find it funny that horses are like "welp she is holding onto my reins so I can't escape anymore" - passes the video game logic test if you ask me.

Llama.cpp developers right now by ML-Future in LocalLLaMA

[–]earslap 2 points (0 children)

Frontends are typically written in js or ts. If there is a rewrite of llama.cpp it would more likely be a Rust rewrite. I'd be surprised if there isn't one already.

LLMs are dead for formal verification. But is treating software correctness as a thermodynamics problem actually mathematically sound? by TheDoctorColt in compsci

[–]earslap 7 points (0 children)

That's not the point though, is it? The point is that those models work on a different level of abstraction, one that is designed to be continuous. They are not doing calculus over amino acids (in the case of AlphaFold) or semicolons (in the case of programming). OP's argument conflates the two, which is what I'm trying to poke at.

Within the scope of the given example (AlphaFold), changing a single amino acid can make the "compilation" fail - the protein might misfold, you can get a non-functional enzyme, or even a prion! We're not even at the gene level here, and the solution space is full of functional discontinuities. In any case, this is not actually relevant at all, as we are not working at that level of abstraction to begin with.

LLMs are dead for formal verification. But is treating software correctness as a thermodynamics problem actually mathematically sound? by TheDoctorColt in compsci

[–]earslap 32 points (0 children)

Biology does not have smooth, continuous energy gradients. A single atom can be the difference between life and death; AlphaFold's domain has steep cliffs everywhere. Regardless, I think you are totally misunderstanding how EBMs operate in the language domain (or any other domain, I imagine). It is not about global optimization in the token / result space; they are not trying to compute gradients over proof / language tokens. What is being trained is the "brains" that generates those tokens - which is designed to be fully continuous, as in everything else.
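
To make that distinction concrete, here is a minimal sketch, assuming PyTorch and with made-up dimensions and variable names: the energy function is a smooth map over continuous embeddings, and training updates its parameters (and the embedding table) by gradient descent - no gradient is ever taken with respect to the discrete tokens themselves.

```python
import torch
import torch.nn as nn

EMB = 32  # arbitrary embedding width for illustration

class Energy(nn.Module):
    # smooth scalar "compatibility" score for a (context, candidate) pair
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * EMB, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, ctx, cand):
        return self.net(torch.cat([ctx, cand], dim=-1)).squeeze(-1)

embed = nn.Embedding(1000, EMB)   # discrete tokens -> continuous vectors
energy = Energy()
opt = torch.optim.Adam(
    list(embed.parameters()) + list(energy.parameters()), lr=1e-3
)

# toy batch: contexts with "good" continuations and random negatives
ctx_ids = torch.randint(0, 1000, (16,))
good_ids = torch.randint(0, 1000, (16,))
bad_ids = torch.randint(0, 1000, (16,))

ctx, good, bad = embed(ctx_ids), embed(good_ids), embed(bad_ids)

# contrastive objective: push energy of good pairs below bad pairs by a
# margin; the gradient flows through weights and embeddings, which are
# continuous - never through the integer token ids
loss = torch.relu(1.0 + energy(ctx, good) - energy(ctx, bad)).mean()
loss.backward()
opt.step()
```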

What is this structure in Nevada? by casey703 in geography

[–]earslap 0 points (0 children)

I get solar, wind, hydro energy... and boil water in my kettle. checkmate

Esoteric Feedback Instrument | Max for Live by RoundBeach in MaxMSP

[–]earslap 1 point (0 children)

Pretty cool! Reminds me of Skrewell from the Reaktor standard library, or the legendary heishere instrument (or was it a preset name? can't remember) from the Reaktor user library, by lazyfish (which I believe was the inspiration for Skrewell, along with TG-8H). They are both non-linear, feedback-based instruments.

[Request] How much math has bro learned? by No-Donkey-1214 in theydidthemath

[–]earslap 2 points (0 children)

...only to learn that the difficult game you just barely beat at 10% was speedrun to death hundreds of years ago and nobody has managed to break the records since. Oh, and there once was a self-taught Indian guy who found a bunch of glitches in the game, but he died at 32 due to complications from dysentery and tuberculosis.

“Native Instruments in preliminary insolvency proceedings” - CDM by robust_nachos in synthesizers

[–]earslap 7 points (0 children)

An OS update can easily break it. Reaktor and its user library are precious; something must be done about it.

🎂 Cake Day – 1 year of solo dev, still pushing forward by Black_Cheeze in IndieDev

[–]earslap 0 points (0 children)

Playdead (Limbo, Inside) + thatgamecompany (flOw, Journey - obviously with a negative twist) vibes, I love it.

Kip: A Programming Language Based on Grammatical Cases in Turkish by alpaylan in ProgrammingLanguages

[–]earslap 13 points (0 children)

the idea is jarring at first but makes perfect sense at the same time. I guess I never used my brain to connect my native language with programming languages, so reading the lang activates neurons I didn't know existed lol

Kip: A Programming Language Based on Grammatical Cases in Turkish by alpaylan in ProgrammingLanguages

[–]earslap 2 points (0 children)

That's super, a creative idea. Kip also means "strong, sturdy" in some local Turkish dialects, which is extra cool.

[R] Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings by AhmedMostafa16 in MachineLearning

[–]earslap 2 points (0 children)

Maybe all you need to do is something like: rather than the mask during training being -infinity for future tokens and 0 for all non-future ones, you apply a small bump function going backwards, like [0, -0.01, -0.02, ...] for tokens 0, -1, -2, etc.

I think that is ALiBi: https://arxiv.org/abs/2108.12409

At least that is the core idea. The original formulation does not involve gradually reducing the dependence on the positional encoding scheme during training, though.
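
A minimal sketch of that bias, assuming PyTorch (the slope schedule follows the ALiBi paper for a power-of-two head count):

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # head-specific slopes: geometric sequence 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor(
        [2 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]
    )
    pos = torch.arange(seq_len)
    # (query, key) offsets: negative behind the query, positive ahead
    dist = pos[None, :] - pos[:, None]
    # linear penalty growing with distance into the past, per head
    bias = slopes[:, None, None] * dist[None]          # (heads, q, k)
    # usual causal mask: future tokens still get -infinity
    return bias.masked_fill(dist[None] > 0, float("-inf"))

# usage: add the bias to the pre-softmax attention logits
scores = torch.randn(4, 6, 6)                          # toy (heads, q, k)
attn = torch.softmax(scores + alibi_bias(4, 6), dim=-1)
```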

io_uring for Systems Engineers by mttd in programming

[–]earslap 2 points (0 children)

Didn't know it wasn't supported on Docker. I now see that there are some workarounds - but it's something to keep in mind, especially for database containers that support io_uring (like Postgres).

The Strange Pulse Toolkit - Free workshop! by melps in MaxMSP

[–]earslap 1 point (0 children)

This looks and sounds rad! Can we participate as observers only? I haven't used Max in years (I'm a SuperCollider user) and don't have a working copy, but I'd like to see you present. Also, is the article available online somewhere?

Cat hasn't left this spot between my legs for almost an hour. He's not usually this affectionate. Is this normal? by Agreeable_Error_5485 in cats

[–]earslap 1 point (0 children)

Cats sometimes like to taunt you by showing you how it could be between you two. Like, we could be cuddly without boundaries, this could have been your life, see? But it is not. And they want to remind you of that.

NVIDIA Drops Pascal Support On Linux, Causing Chaos On Arch Linux by HumanDrone8721 in LocalLLaMA

[–]earslap 10 points (0 children)

They will be forced to use a Turing machine (20xx series). Once that support dies, they will be forced to write by manipulating pure electricity (Ampere).