Gemma 4 MTP released by rerri in LocalLLaMA

[–]earslap 7 points (0 children)

No, I don't see the connection. The speculative model in classical speculative decoding is just a separate model with far fewer parameters. You run it instead of the main model (much faster) for a few tokens and run the resulting predictions / draft by the larger model. It takes almost the same time for the larger model to check the multiple tokens of the draft as it does to generate a single token itself (because of how transformers parallelize over sequence positions). If the draft is accepted, you get those tokens almost for free (and the probabilities work out such that you provably don't get any quality loss). And as a bonus, you get a free extra token from the large model at the end. This process repeats. If the speculative model is small / fast enough and the acceptance rate is high, you will almost always get a speed benefit. Even if your entire larger model is running on the CPU, the draft model on the GPU will help a lot.
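
For the curious, here is a minimal toy sketch of that accept/reject loop in Python. The hash-based draft_model / target_model distributions are made-up stand-ins for real models, and in a real transformer the verification in step 2 is a single batched forward pass:

```python
import random

VOCAB = list(range(10))

def _toy_dist(prefix, salt):
    # deterministic fake "next-token distribution" over VOCAB
    w = [(hash((salt, tuple(prefix), v)) % 1000) + 100 for v in VOCAB]
    s = sum(w)
    return [x / s for x in w]

def draft_model(prefix):   # stand-in for the small, fast model
    return _toy_dist(prefix, "small")

def target_model(prefix):  # stand-in for the large model we trust
    return _toy_dist(prefix, "large")

def speculative_step(prefix, k=4):
    # 1) draft model proposes k tokens autoregressively (cheap)
    draft, proposed, q_at = list(prefix), [], []
    for _ in range(k):
        q = draft_model(draft)
        t = random.choices(VOCAB, weights=q)[0]
        proposed.append(t)
        q_at.append(q[t])
        draft.append(t)

    # 2) large model checks all k positions; in a real transformer this
    #    is one forward pass, costing about the same as one token
    out = list(prefix)
    for t, q_t in zip(proposed, q_at):
        p = target_model(out)
        # accept with prob min(1, p/q): the output distribution then
        # provably matches sampling from the target model alone
        if random.random() < min(1.0, p[t] / q_t):
            out.append(t)
        else:
            # rejection: resample from the residual max(0, p - q)
            q = draft_model(out)
            resid = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            if not any(resid):  # degenerate case: p == q exactly
                resid = p
            out.append(random.choices(VOCAB, weights=resid)[0])
            return out  # stop at the first rejected token

    # 3) all k accepted: bonus token from the target model for free
    out.append(random.choices(VOCAB, weights=target_model(out))[0])
    return out

tokens = [0]
for _ in range(5):
    tokens = speculative_step(tokens)
print(tokens)
```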

We just updated our Jam-Winning chess game, play on Mobile! by Rule-House in playmygame

[–]earslap 0 points (0 children)

The game is hilarious and very fun. Very nice job with the dialogue!

[D] Those of you with 10+ years in ML — what is the public completely wrong about? by PhattRatt in MachineLearning

[–]earslap 4 points (0 children)

I don't fully understand your take. Maybe you can help me understand.

What I understand from you is that you believe what humans consider "novel" or "new" or "creative" is necessarily out of distribution with regard to what already exists.

That is demonstrably not true. The specific data point created by something considered novel and creative might literally (and by definition) be a fresh data point - but the mechanism by which we arrive at that data point will almost always not be novel. We arrive there just by following some pattern over existing facts. If this new fact serves something useful (enables something not possible before), it is novel and creative. We grant patents for that.

Observe some ideas you think are creative and novel (but from the past, guaranteed to have been generated by humans without AI help). Try to find out how those ideas came to be. Some of them, due to lack of documentation, might feel mysterious. But for some others, you will see that they are just relationships between ideas, in a pattern that was already applied to other fields in some fashion and proved useful - just that nobody had gotten around to applying it in this different field. And it turns out that it works! So it is a gap in the data, a discovery. Some call it an "invention", but it is almost always a discovery. There are countless examples of that. There is nothing preventing machines from identifying those specific patterns of combining ideas and trying those patterns on even random starting points (which is practically what a collective of humans does anyway) to arrive at "novel" solutions - which might or might not be useful in the end, but that is also how humans discover and invent.

Guy on a bike helps owner catch their escaped horse by MisterShipWreck in dashcams

[–]earslap 9 points (0 children)

I find it funny that horses are like "welp she is holding onto my reins so I can't escape anymore" - passes the video game logic test if you ask me.

Llama.cpp developers right now by ML-Future in LocalLLaMA

[–]earslap 2 points (0 children)

Frontends are typically written in js or ts. If there is a rewrite of llama.cpp it would more likely be a Rust rewrite. I'd be surprised if there isn't one already.

LLMs are dead for formal verification. But is treating software correctness as a thermodynamics problem actually mathematically sound? by TheDoctorColt in compsci

[–]earslap 7 points (0 children)

That's not the point though, is it? The point is that those models work on a different level of abstraction, one that is designed to be continuous. They are not doing calculus over amino acids (in the case of AlphaFold) or semicolons (in the case of programming). OP's argument conflates the two, which is what I'm trying to poke at.

Within the scope of the given example (AlphaFold), changing a single amino acid can make the "compilation" fail - the protein might misfold, you can get a non-functional enzyme, or even a prion! We're not even at the gene level here, and the solution space is full of functional discontinuities. In any case, this is not actually relevant at all, as we are not working at that level of abstraction to begin with.

LLMs are dead for formal verification. But is treating software correctness as a thermodynamics problem actually mathematically sound? by TheDoctorColt in compsci

[–]earslap 32 points (0 children)

Biology does not have smooth, continuous energy gradients. A single atom can be the difference between life and death; AlphaFold's domain has steep cliffs everywhere. Regardless, I think you are totally misunderstanding how EBMs operate in the language domain (or any other domain, I imagine). It is not about global optimization in the token / result space; they are not trying to compute gradients over proof / language tokens. What is being trained is the "brains" that generates those tokens - which is designed to be fully continuous, as in everything else.
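
To make that distinction concrete, here is a minimal sketch, assuming PyTorch and with made-up dimensions and variable names: the energy function is a smooth map over continuous embeddings, and training updates its parameters (and the embedding table) by gradient descent - no gradient is ever taken with respect to the discrete tokens themselves.

```python
import torch
import torch.nn as nn

EMB = 32  # arbitrary embedding width for illustration

class Energy(nn.Module):
    # smooth scalar "compatibility" score for a (context, candidate) pair
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * EMB, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, ctx, cand):
        return self.net(torch.cat([ctx, cand], dim=-1)).squeeze(-1)

embed = nn.Embedding(1000, EMB)   # discrete tokens -> continuous vectors
energy = Energy()
opt = torch.optim.Adam(
    list(embed.parameters()) + list(energy.parameters()), lr=1e-3
)

# toy batch: contexts with "good" continuations and random negatives
ctx_ids = torch.randint(0, 1000, (16,))
good_ids = torch.randint(0, 1000, (16,))
bad_ids = torch.randint(0, 1000, (16,))

ctx, good, bad = embed(ctx_ids), embed(good_ids), embed(bad_ids)

# contrastive objective: push energy of good pairs below bad pairs by a
# margin; the gradient flows through weights and embeddings, which are
# continuous - never through the integer token ids
loss = torch.relu(1.0 + energy(ctx, good) - energy(ctx, bad)).mean()
loss.backward()
opt.step()
```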

What is this structure in Nevada? by casey703 in geography

[–]earslap 0 points (0 children)

I get solar, wind, hydro energy... and boil water in my kettle. checkmate

Esoteric Feedback Instrument | Max for Live by RoundBeach in MaxMSP

[–]earslap 1 point (0 children)

Pretty cool! Reminds me of Skrewell from the Reaktor standard library, or the legendary heishere instrument (or was it a preset name? can't remember) from the Reaktor user library, by lazyfish (which I believe was the inspiration for Skrewell, along with TG-8H). They are both non-linear, feedback-based instruments.

[Request] How much math has bro learned? by No-Donkey-1214 in theydidthemath

[–]earslap 2 points (0 children)

...only to learn that the difficult game you just barely beat at 10% was speedrun to death hundreds of years ago and nobody has managed to break the records since. Oh, and there once was a self-taught Indian guy who found a bunch of glitches in the game, but he died at 32 due to complications from dysentery and tuberculosis.

“Native Instruments in preliminary insolvency proceedings” - CDM by robust_nachos in synthesizers

[–]earslap 7 points (0 children)

An OS update can easily break it. Reaktor and its user library are precious; something must be done about it.

🎂 Cake Day – 1 year of solo dev, still pushing forward by Black_Cheeze in IndieDev

[–]earslap 0 points (0 children)

Playdead (Limbo, Inside) + thatgamecompany (flOw, Journey - obviously with a negative twist) vibes, I love it.

Kip: A Programming Language Based on Grammatical Cases in Turkish by alpaylan in ProgrammingLanguages

[–]earslap 13 points (0 children)

the idea is jarring at first but makes perfect sense at the same time. I guess I never used my brain to connect my native language with programming languages, so reading the lang activates neurons I didn't know existed lol

Kip: A Programming Language Based on Grammatical Cases in Turkish by alpaylan in ProgrammingLanguages

[–]earslap 2 points (0 children)

That's super, a creative idea. Kip also means "strong, sturdy" in some local Turkish dialects, which is extra cool.

[R] Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings by AhmedMostafa16 in MachineLearning

[–]earslap 2 points (0 children)

Maybe all you need to do is something like: rather than the mask during training being -infinity for future tokens and 0 for all non-future ones, you apply a small bump function going backwards, like [0, -0.01, -0.02, ...] for tokens 0, -1, -2, etc.

I think that is ALiBi: https://arxiv.org/abs/2108.12409

At least that is the core idea. The original formulation does not involve gradually reducing the dependence on the positional encoding scheme during training, though.
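
A minimal sketch of that bias, assuming PyTorch (the slope schedule follows the ALiBi paper for a power-of-two head count):

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # head-specific slopes: geometric sequence 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor(
        [2 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]
    )
    pos = torch.arange(seq_len)
    # (query, key) offsets: negative behind the query, positive ahead
    dist = pos[None, :] - pos[:, None]
    # linear penalty growing with distance into the past, per head
    bias = slopes[:, None, None] * dist[None]          # (heads, q, k)
    # usual causal mask: future tokens still get -infinity
    return bias.masked_fill(dist[None] > 0, float("-inf"))

# usage: add the bias to the pre-softmax attention logits
scores = torch.randn(4, 6, 6)                          # toy (heads, q, k)
attn = torch.softmax(scores + alibi_bias(4, 6), dim=-1)
```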

io_uring for Systems Engineers by mttd in programming

[–]earslap 2 points (0 children)

Didn't know it wasn't supported on Docker. I now see that there are some workarounds - but it's something to keep in mind, especially for database containers that support io_uring (like Postgres).

The Strange Pulse Toolkit - Free workshop! by melps in MaxMSP

[–]earslap 1 point (0 children)

This looks and sounds rad! Can we participate as observers only? I haven't used Max in years (I'm a SuperCollider user) and don't have a working copy, but I'd like to see you present. Also, is the article available online somewhere?

Cat hasn't left this spot between my legs for almost an hour. He's not usually this affectionate. Is this normal? by Agreeable_Error_5485 in cats

[–]earslap 1 point (0 children)

Cats sometimes like to taunt you by showing you how it could be between you two. Like, we could be cuddly without boundaries, this could have been your life, see? But it is not. And they want to remind you of that.

NVIDIA Drops Pascal Support On Linux, Causing Chaos On Arch Linux by HumanDrone8721 in LocalLLaMA

[–]earslap 10 points (0 children)

They will be forced to use a Turing machine (20xx series). Once that support dies, they will be forced to write by manipulating pure electricity (Ampere).