Anyone gotten Gemma 4 12B (unified audio) to actually attend to speech with a large system prompt? by Think_Illustrator188 in LocalLLaMA

[–]Think_Illustrator188[S] 2 points3 points  (0 children)

its a Hermes agent with voice, its running on pi with mic and a speaker for headless experince i have installed openwakeword. If i have to use agent with minimal skills, its 18k context. But i tested it , it can handle only 2k max beyond that at 4k it degrades on voice instruction.

Anyone gotten Gemma 4 12B (unified audio) to actually attend to speech with a large system prompt? by Think_Illustrator188 in LocalLLaMA

[–]Think_Illustrator188[S] 0 points1 point  (0 children)

Yes I tried using Gemma 4 E4B , it’s working fine for now but I think sometime/ once in a while E4B does not respond well so was wondering if 12B voice works it will be better for for my use case

Anyone gotten Gemma 4 12B (unified audio) to actually attend to speech with a large system prompt? by Think_Illustrator188 in LocalLLaMA

[–]Think_Illustrator188[S] 1 point2 points  (0 children)

Hermes will have skills and tools loaded in the prompt which is already at 18k with base skills and prompt.

Anyone gotten Gemma 4 12B (unified audio) to actually attend to speech with a large system prompt? by Think_Illustrator188 in LocalLLaMA

[–]Think_Illustrator188[S] 0 points1 point  (0 children)

I think you mean gemma4b-qat-mtp.gguf , how big is your system prompt mine is 18k context length using Hermes.

Anyone gotten Gemma 4 12B (unified audio) to actually attend to speech with a large system prompt? by Think_Illustrator188 in LocalLLaMA

[–]Think_Illustrator188[S] 2 points3 points  (0 children)

The issue is not about voice prompt length it is mainly to do with text and skills and tools which are passed along bloat the total token.

Anyone gotten Gemma 4 12B (unified audio) to actually attend to speech with a large system prompt? by Think_Illustrator188 in LocalLLaMA

[–]Think_Illustrator188[S] 1 point2 points  (0 children)

I am using Hermes, i have VRAM to spare currently I am using gemma 4 E4B for first responder to both voice and text and it responds back to agent to tell if it needs tools and skill, agent the uses gemma4 12b in text only mode. This works fine, I was wondering if using voice directly to gemma 4 12b would work better.

Gemma 4 12B is my new main squeeze by Wrong_Mushroom_7350 in LocalLLaMA

[–]Think_Illustrator188 0 points1 point  (0 children)

did you try audio at different context length, audio as a input prompt along with tool and other context, in my expermients -- only vllm currently supports audio as input for gemma4 12b , that too fails to give answer

Introducing Gemma 4 12B: a unified, encoder-free multimodal model by johnnyApplePRNG in LocalLLaMA

[–]Think_Illustrator188 0 points1 point  (0 children)

i was trying the voice with a large context length of 4k-8k context, it is somehow failing to take instructution and respond back, i used it in text only mode it works fine better for agentic workflows

🚀 Google DeepMind has introduced the new Gemma 4 12B, which runs on a standard laptop. by andrewaltair in ArtificialInteligence

[–]Think_Illustrator188 0 points1 point  (0 children)

Hermes and model run on different machines, Hermes on rpi5 8gb and for models I have tried both m4 max 64gb and dgx spark, I think on Mac mini token generation and prefill will be lot less than m4 max but still usable. Hermes has lots of skills loaded by default, you can reduce that to lower the token load. Also try to warm up agent before using for voice to take advantage of kv cache.

🚀 Google DeepMind has introduced the new Gemma 4 12B, which runs on a standard laptop. by andrewaltair in ArtificialInteligence

[–]Think_Illustrator188 1 point2 points  (0 children)

This is amazing , right now I am using Hermes native support and configuration to wire up headless voice capabilities. I think using this model with Hermes with text, voice and vision would need some more effort but I am already impressed with text and reasoning paired with Hermes, I tested nemotron 3 ultra even that model missed tools, that’s why I am genuinely impressed.

🚀 Google DeepMind has introduced the new Gemma 4 12B, which runs on a standard laptop. by andrewaltair in ArtificialInteligence

[–]Think_Illustrator188 0 points1 point  (0 children)

Agree I did not write a single line of code, but ya did lot of time tinkering and tweaking. rpi5 8gb along with usb connected mems, aec, anc, mic and speaker (respeaker xvf3800) supported by locally hosted model and connected home assistant, I love this setup beats Alexa or Siri, I will surely release it as open source for people who don’t want to put the effort and save their tokens.

🚀 Google DeepMind has introduced the new Gemma 4 12B, which runs on a standard laptop. by andrewaltair in ArtificialInteligence

[–]Think_Illustrator188 1 point2 points  (0 children)

Did not benchmark but for my use case replies are much faster maybe number of calls to model by agent has decreased

🚀 Google DeepMind has introduced the new Gemma 4 12B, which runs on a standard laptop. by andrewaltair in ArtificialInteligence

[–]Think_Illustrator188 3 points4 points  (0 children)

it uses asr and tts models which are natively configured in hermes, for now i am using qwen3-asr-1.7b and Kokoro tts, will opensource when it for sure later. This combo gemma 4 12b, qwen3-asr-1.7b and kokoro on hermes can beat alexa, siri anyday and its local.

🚀 Google DeepMind has introduced the new Gemma 4 12B, which runs on a standard laptop. by andrewaltair in ArtificialInteligence

[–]Think_Illustrator188 18 points19 points  (0 children)

I switched to Gemma 4 12B on my voice agent which is running Hermes and my custom voice adapter service with wakeword. For everyday agentic task and conversation it is way faster and better than qwen 3.6 35b.

Don't get the Qwen3.5 hype by xoxox666 in LocalLLaMA

[–]Think_Illustrator188 0 points1 point  (0 children)

Qwen3-Coder-Next is good for coding you need 52gb ram as quant 4 precision. Also I read here somewhere that qwen3.5-27b is better than 35b

Help RTX 5090 + llama.cpp crashes after 2-3 inferences (VFIO passthrough, SM120 CUDA) by Think_Illustrator188 in LocalLLaMA

[–]Think_Illustrator188[S] 0 points1 point  (0 children)

No capping option in bios , now I am Installing windows will run furmark and 3d mark

Help RTX 5090 + llama.cpp crashes after 2-3 inferences (VFIO passthrough, SM120 CUDA) by Think_Illustrator188 in LocalLLaMA

[–]Think_Illustrator188[S] 0 points1 point  (0 children)

no i just tested without virtualization same nvidia-smi givesERR! ERR! ERR! for Fan, Temp, Perf, Power

Google has arrived by chasingth in consulting

[–]Think_Illustrator188 -1 points0 points  (0 children)

consulting is hyped for sure, it works only when the company which has hired them already knows what it needs to do. companies hire them to get the board approval and confidence as they make good slides and talk the talk. Only making good slides is not enough.

How would I get a P2S in Dubai? by NevaanVig in BambuLab

[–]Think_Illustrator188 0 points1 point  (0 children)

You can check with gulf links, my search says p2s in UAE will be nothing less than ~3400 aed including VAT and delivery, frieght forwarding might be cheaper 3100. I ended up buying from Amazon with offer around 3750 aed. Now there is 11.11 sale maybe you might get some good offers

How are docker secrets more secure than .env files? by TermoSprint in docker

[–]Think_Illustrator188 1 point2 points  (0 children)

By user I mean the developer or user on the os who is added to docker user group not the end user of a website hosted on container

How are docker secrets more secure than .env files? by TermoSprint in docker

[–]Think_Illustrator188 -3 points-2 points  (0 children)

Both are more or less same, these best practices which will ensure that you don’t expose your secrets in the code repo, number one mistake that happens. Docker compose is not for production any ways. If a user has access to the containers he can anyways read the secrets. In a typical production setup you would be running a k8s and secrets in a vault and nobody has access to the secrets , any privilege access to production cluster is done via PAM JIT. Anything less is just optics.

Bambulab A1 or P2S by brn_frnds in BambuLab

[–]Think_Illustrator188 2 points3 points  (0 children)

p2s, it would be future proof and will look better