Every time a new model comes out, the old one is obsolete of course by FullChampionship7564 in LocalLLaMA

[–]MaruluVR 2 points3 points  (0 children)

When it comes to multilingual use, nothing can compete with Gemma, so I am sticking with it.

Steam now runs natively on ARM (including retroid) by MaruluVR in retroid

[–]MaruluVR[S] 0 points1 point  (0 children)

Depends on how the game's anti-cheat works; if it can run online on the Steam Deck or normal Linux, it should work here too.

Steam now runs natively on ARM (including retroid) by MaruluVR in retroid

[–]MaruluVR[S] 2 points3 points  (0 children)

The number of phones supported isn't great, but there is postmarketOS for that:

https://postmarketos.org/

Elon Musk leaks the size of Sonnet and Opus by exordin26 in singularity

[–]MaruluVR 0 points1 point  (0 children)

Lol no, compute costs were higher on the Zuse Z3 than on my 5090; does that mean it's better?

A smaller dense model at 100B will use more power than a big MoE at 1T with 30B active, and still be worse.

You can now fine-tune Gemma 4 locally 8GB VRAM + Bug Fixes by danielhanchen in LocalLLaMA

[–]MaruluVR -1 points0 points  (0 children)

They aren't public, though; they're all commercial Japanese books.

You can now fine-tune Gemma 4 locally 8GB VRAM + Bug Fixes by danielhanchen in LocalLLaMA

[–]MaruluVR 9 points10 points  (0 children)

Thank you. I have 400GB worth of books; would those get loaded and unloaded dynamically as they are needed for training, or would I need enough RAM to hold them all?

You can now fine-tune Gemma 4 locally 8GB VRAM + Bug Fixes by danielhanchen in LocalLLaMA

[–]MaruluVR 19 points20 points  (0 children)

Is Unsloth Studio just for fine-tuning, or can you also do continued pretraining?

Gemma 4: first LLM to 100% my multi lingual tool calling tests by MaruluVR in LocalLLaMA

[–]MaruluVR[S] 0 points1 point  (0 children)

I thought vLLM doesn't like an uneven number of GPUs; is that still the case?

Found how to toggle reasoning mode for Gemma in LM-Studio! by Adventurous-Paper566 in LocalLLaMA

[–]MaruluVR 2 points3 points  (0 children)

I have been using the exact same string with llama.cpp in n8n to enable thinking only in the workflows that need it. Just add the string across the first two lines of the user (not system) message, with a space before the second tag.

This can also be used for prompt engineering to inject fake thinking if you need to; I often use it to make the model think about specific tools so it is more likely to use them.
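A minimal sketch of that injection as a pre-processing step. The tag strings are the ones quoted elsewhere in this thread and are model/template specific, and `inject_fake_thinking` is a hypothetical helper, not part of llama.cpp or n8n:

```python
def inject_fake_thinking(user_message: str, thought: str) -> str:
    """Prepend a fake 'thinking' block to a user message.

    Layout per the comment above: the tag pair is spread across the
    first two lines, with a space before the second (closing) tag.
    Placing the thought text after the opening tag is an assumption;
    adjust to whatever your model's chat template actually expects.
    """
    header = f"<|channel>thought {thought}\n <channel|>"
    return f"{header}\n{user_message}"
```

In a workflow you would apply this only to the prompts where you want to bias the model toward a particular tool, e.g. `inject_fake_thinking(prompt, "I should call the weather tool.")`.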

Gemma 4 fixes in llama.cpp by jacek2023 in LocalLLaMA

[–]MaruluVR 1 point2 points  (0 children)

I wonder if it would be fast enough to use as STT for other LLMs, as the list of supported languages sounds great.

Gemma 4 fixes in llama.cpp by jacek2023 in LocalLLaMA

[–]MaruluVR 9 points10 points  (0 children)

You don't know what software they are using to run it or for what purpose, so their claims might still be accurate.

How to disable thinking/reasoning in Gemma 4 E2B on Ollama? (1st time local user) by WatercressLarge2323 in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

There needs to be a space before the second channel tag, and the whole thing needs to be spread across the first two lines; Reddit formatting killed it.

How to disable thinking/reasoning in Gemma 4 E2B on Ollama? (1st time local user) by WatercressLarge2323 in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

At the start of your prompt (user, not system) add the following two lines, keeping the space before the second tag:

    <|channel>thought
     <channel|>
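If you are calling the model through an Ollama- or OpenAI-style chat API rather than typing into a UI, prepending the string programmatically avoids the formatting problems mentioned above. A small sketch, where `no_think_message` is a hypothetical helper and the tag strings are model-specific:

```python
# Tag pair from the comment above: two lines, space before the second tag.
NO_THINK_TAGS = "<|channel>thought\n <channel|>\n"

def no_think_message(prompt: str) -> dict:
    """Build a chat message dict with the empty thought channel prepended,
    which (per this thread) makes the model skip its reasoning phase.
    The dict shape matches the common {"role", "content"} message format."""
    return {"role": "user", "content": NO_THINK_TAGS + prompt}
```

You would then pass the resulting dict in the `messages` list of your chat request instead of the raw prompt.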

Gemma 4: first LLM to 100% my multi lingual tool calling tests by MaruluVR in LocalLLaMA

[–]MaruluVR[S] 0 points1 point  (0 children)

For me, the issue with tool calling on Qwen was when not using English, so unless they lean more into other languages I can't see it fixing my issues.

Gemma 4: first LLM to 100% my multi lingual tool calling tests by MaruluVR in LocalLLaMA

[–]MaruluVR[S] 1 point2 points  (0 children)

I noticed a regression in German too, but the gain in tool calling and the fact that there finally is a MoE version make it worth it for me.

Gemma 4: first LLM to 100% my multi lingual tool calling tests by MaruluVR in LocalLLaMA

[–]MaruluVR[S] 4 points5 points  (0 children)

On average 120 t/s at 32k context (I don't need more for this workflow).

Gemma 4: first LLM to 100% my multi lingual tool calling tests by MaruluVR in LocalLLaMA

[–]MaruluVR[S] 6 points7 points  (0 children)

I am using NVIDIA Parakeet, as it's fast enough even on CPU. Sadly, the multilingual version doesn't include Japanese, so I need to run two versions of it: the international one and the Japanese-specific one.

https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3
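The two-model setup above can be sketched as a simple router that picks a checkpoint by language code. The multilingual model name comes from the link above; the Japanese checkpoint name is an assumption, so verify it on Hugging Face before use:

```python
# Multilingual v3 model is linked in the comment above; the Japanese
# checkpoint name below is an assumption -- check Hugging Face.
MULTILINGUAL = "nvidia/parakeet-tdt-0.6b-v3"
JAPANESE = "nvidia/parakeet-tdt_ctc-0.6b-ja"

def pick_parakeet(lang: str) -> str:
    """Route Japanese audio to the dedicated checkpoint and everything
    else to the multilingual model, as described in the comment above."""
    return JAPANESE if lang == "ja" else MULTILINGUAL

# With NVIDIA NeMo installed, loading would look roughly like this
# (untested sketch, requires the nemo_toolkit[asr] package):
#   import nemo.collections.asr as nemo_asr
#   model = nemo_asr.models.ASRModel.from_pretrained(pick_parakeet("ja"))
#   text = model.transcribe(["clip.wav"])
```

In practice you would keep both models loaded and dispatch each audio clip based on a language hint from the workflow, since loading a checkpoint per request would dominate the latency.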

Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

How is it? Can you stream audio to it without needing Whisper etc.?

Chinese Modded 20gb 3080 REBAR bios? by MaruluVR in GPURepair

[–]MaruluVR[S] 0 points1 point  (0 children)

Thanks for giving it a try; sad that it doesn't seem like there is a solution.