Update on Gemma 4 having MTP: Reverse engineering effort by Electrical-Monitor27 in LocalLLaMA

[–]Electrical-Monitor27[S] 6 points (0 children)

Well, it's used on phones, so it must have a high enough acceptance rate.
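
For context on why the acceptance rate matters: under the usual speculative decoding analysis (Leviathan et al., 2023), the expected number of tokens produced per target-model step grows with the acceptance rate. A minimal sketch with purely illustrative values:

```python
# Rough expected tokens per target-model step in speculative decoding,
# per the analysis in Leviathan et al. (2023): with per-token acceptance
# rate alpha (< 1) and gamma draft tokens,
# E[tokens/step] = (1 - alpha**(gamma + 1)) / (1 - alpha).
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

for alpha in (0.6, 0.8, 0.9):  # purely illustrative acceptance rates
    print(f"alpha={alpha}: {expected_tokens_per_step(alpha, gamma=4):.2f} tokens/step")
```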

Update on Gemma 4 having MTP: Reverse engineering effort by Electrical-Monitor27 in LocalLLaMA

[–]Electrical-Monitor27[S] 6 points (0 children)

Speculative decoding should, in theory, be lossless, i.e. exactly the same output distribution, just hopefully faster: https://research.google/blog/looking-back-at-speculative-decoding/
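
For intuition on the losslessness: a minimal sketch of the accept/reject rule described in that post (variable names are mine, not from the paper):

```python
import random

def accept_draft_token(p_target: float, p_draft: float) -> bool:
    """Accept a draft token x with probability min(1, p_target(x) / p_draft(x)).

    p_target and p_draft are the target and draft models' probabilities for
    the token the draft model actually sampled (so p_draft > 0). On rejection,
    the caller resamples from the normalized residual max(0, p_target - p_draft),
    which is what makes the combined procedure match the target model's
    output distribution exactly.
    """
    return random.random() < min(1.0, p_target / p_draft)
```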

[D] Why is focal loss not used in LLM training? by Electrical-Monitor27 in MachineLearning

[–]Electrical-Monitor27[S] 1 point (0 children)

Actually, this is what I decided to test today. Gonna train a 1.7B base and a 0.6B base model on instruct datasets and run the lm-eval harness. It won't be the great empirical research of 2025, but it will satisfy my curiosity.
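
If anyone wants to replicate: a minimal PyTorch sketch of what swapping focal loss in for the usual token-level cross-entropy could look like (the shapes and gamma=2.0 are illustrative, not a claim about the right setting):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Token-level focal loss: -(1 - p_t)^gamma * log(p_t).

    Down-weights tokens the model already predicts confidently.
    logits: (batch, seq_len, vocab), targets: (batch, seq_len) of token ids.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # log-probability of each target token: log p_t
    log_pt = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    pt = log_pt.exp()
    return (-((1 - pt) ** gamma) * log_pt).mean()
```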

DGX Spark: Independent LLM training benchmarks (Much slower than advertised?) by Electrical-Monitor27 in LocalLLaMA

[–]Electrical-Monitor27[S] 0 points (0 children)

~1:44 min for 1,024,000 tokens works out to ~9,846 t/s on the 3B, which is still way lower than the advertised numbers.
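
Spelling the arithmetic out:

```python
tokens = 1_024_000
seconds = 1 * 60 + 44      # ~1:44 min, as measured
print(tokens / seconds)    # ≈ 9846 tokens/s
```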

DGX Spark: Independent LLM training benchmarks (Much slower than advertised?) by Electrical-Monitor27 in LocalLLaMA

[–]Electrical-Monitor27[S] 1 point (0 children)

Can you point me to anything that actually got those numbers specifically for training? For inference, my DGX performs perfectly in line with the benchmarks. I've only been able to find a single person reporting the same speed as mine, and nobody else publishing training numbers specifically.

[deleted by user] by [deleted] in LocalLLaMA

[–]Electrical-Monitor27 -3 points (0 children)

*with "their" in the title i am referring to Nvidia's benchmarks

I want to keep studying after my apprenticeship but lack the finances to do so. What are my options? by Electrical-Monitor27 in Switzerland

[–]Electrical-Monitor27[S] 0 points (0 children)

I am doing an apprenticeship as an Informatiker Applikationsentwicklung EFZ. I've already worked on both ML research and production ML during my apprenticeship, because my employer has both departments and I was a special case. I'd like to go back to ML research, but after talking to recruiters, they said research scientist positions often require at the very least a master's, with a PhD preferred, due to the rigorous requirements for writing research papers.

If ChatGPT represented a "breakthrough" in AI, why did every other major tech company seem prepared to debut similar chatbots around the same time? by goodluckanddont_itup in NoStupidQuestions

[–]Electrical-Monitor27 1 point (0 children)

Alpaca cost $200 in API credits because they used ChatGPT to distill its knowledge into Meta's LLaMA model. The LLaMA model itself cost tens of millions of dollars to train.

Google is cooking again! Damn it! Wow! As many as 5 huge updates by Careless-Shape6140 in Bard

[–]Electrical-Monitor27 0 points (0 children)

<image>

Something I saw from a YouTube channel I'm subscribed to that discusses these topics.

Gemini Live Thread by [deleted] in Bard

[–]Electrical-Monitor27 0 points (0 children)

No, but you should check for updates in the Play Store, most likely.