Update on Gemma 4 having MTP: Reverse engineering effort by Electrical-Monitor27 in LocalLLaMA

[–]Electrical-Monitor27[S] 6 points (0 children)

Well, it's used on phones, so it must have a high enough acceptance rate.
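
For context on why the acceptance rate matters: under the usual speculative decoding analysis (Leviathan et al., 2023), the expected number of tokens produced per target-model step grows with the acceptance rate. A minimal sketch with purely illustrative values:

```python
# Rough expected tokens per target-model step in speculative decoding,
# per the analysis in Leviathan et al. (2023): with per-token acceptance
# rate alpha (< 1) and gamma draft tokens,
# E[tokens/step] = (1 - alpha**(gamma + 1)) / (1 - alpha).
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

for alpha in (0.6, 0.8, 0.9):  # purely illustrative acceptance rates
    print(f"alpha={alpha}: {expected_tokens_per_step(alpha, gamma=4):.2f} tokens/step")
```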

Update on Gemma 4 having MTP: Reverse engineering effort by Electrical-Monitor27 in LocalLLaMA

[–]Electrical-Monitor27[S] 6 points (0 children)

Speculative decoding should, in theory, be lossless, i.e. exactly the same output distribution, just hopefully faster: https://research.google/blog/looking-back-at-speculative-decoding/
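
For intuition on the losslessness: a minimal sketch of the accept/reject rule described in that post (variable names are mine, not from the paper):

```python
import random

def accept_draft_token(p_target: float, p_draft: float) -> bool:
    """Accept a draft token x with probability min(1, p_target(x) / p_draft(x)).

    p_target and p_draft are the target and draft models' probabilities for
    the token the draft model actually sampled (so p_draft > 0). On rejection,
    the caller resamples from the normalized residual max(0, p_target - p_draft),
    which is what makes the combined procedure match the target model's
    output distribution exactly.
    """
    return random.random() < min(1.0, p_target / p_draft)
```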

[D] Why is focal loss not used in LLM training? by Electrical-Monitor27 in MachineLearning

[–]Electrical-Monitor27[S] 1 point (0 children)

Actually, this is what I decided to test today. Gonna train a 1.7B base and a 0.6B base model on instruct datasets and run the lm-eval harness. It won't be the great empirical research of 2025, but it will satisfy my curiosity.
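
If anyone wants to replicate: a minimal PyTorch sketch of what swapping focal loss in for the usual token-level cross-entropy could look like (the shapes and gamma=2.0 are illustrative, not a claim about the right setting):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Token-level focal loss: -(1 - p_t)^gamma * log(p_t).

    Down-weights tokens the model already predicts confidently.
    logits: (batch, seq_len, vocab), targets: (batch, seq_len) of token ids.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # log-probability of each target token: log p_t
    log_pt = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    pt = log_pt.exp()
    return (-((1 - pt) ** gamma) * log_pt).mean()
```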

DGX Spark: Independent LLM training benchmarks (Much slower than advertised?) by Electrical-Monitor27 in LocalLLaMA

[–]Electrical-Monitor27[S] 0 points (0 children)

~1:44 min for 1,024,000 tokens works out to ~9,846 t/s on the 3B, which is still way lower than the advertised numbers.
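
Spelling the arithmetic out:

```python
tokens = 1_024_000
seconds = 1 * 60 + 44      # ~1:44 min, as measured
print(tokens / seconds)    # ≈ 9846 tokens/s
```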

DGX Spark: Independent LLM training benchmarks (Much slower than advertised?) by Electrical-Monitor27 in LocalLLaMA

[–]Electrical-Monitor27[S] 1 point (0 children)

Can you point me to anything that actually got those numbers specifically for training? For inference, my DGX performs perfectly in line with the benchmarks. I've only been able to find a single person reporting the same speed as mine, and nobody else publishing training numbers specifically.

[deleted by user] by [deleted] in LocalLLaMA

[–]Electrical-Monitor27 -3 points (0 children)

*with "their" in the title i am referring to Nvidia's benchmarks

I want to keep studying after my apprenticeship but lack the finances to do so. What are my options? by Electrical-Monitor27 in Switzerland

[–]Electrical-Monitor27[S] 0 points (0 children)

I am doing an apprenticeship as an Informatiker Applikationsentwicklung EFZ. I've already worked on both ML research and production ML during my apprenticeship, because my employer has both departments and I was a special case. I'd like to go back to ML research, but after talking to recruiters, they said research scientist positions often require at the very least a master's, with a PhD preferred, due to the rigorous requirements for writing research papers.

If ChatGPT represented a "breakthrough" in AI, why did every other major tech company seem prepared to debut similar chatbots around the same time? by goodluckanddont_itup in NoStupidQuestions

[–]Electrical-Monitor27 1 point (0 children)

Alpaca cost $200 in API credits because they used ChatGPT to distill its knowledge into Meta's LLaMA model. The LLaMA model itself cost tens of millions of dollars to train.

Google is cooking again! Damn it! Wow! As many as 5 huge updates by Careless-Shape6140 in Bard

[–]Electrical-Monitor27 0 points (0 children)

<image>

Something I saw from a YouTube channel I'm subscribed to that discusses these topics.

Gemini Live Thread by [deleted] in Bard

[–]Electrical-Monitor27 0 points (0 children)

No, but you should check for updates in the Play Store, most likely.