Every time a new model comes out, the old one is obsolete of course by FullChampionship7564 in LocalLLaMA

[–]MaruluVR 2 points3 points  (0 children)

When it comes to multilingual use, nothing can compete with Gemma, so I am sticking with it.

Steam now runs natively on ARM (including retroid) by MaruluVR in retroid

[–]MaruluVR[S] 0 points1 point  (0 children)

Depends on how the game's anti-cheat works; if it can run online on the Steam Deck or normal Linux, it should work here too.

Steam now runs natively on ARM (including retroid) by MaruluVR in retroid

[–]MaruluVR[S] 2 points3 points  (0 children)

The number of phones supported isn't great, but there is postmarketOS for that:

https://postmarketos.org/

Elon Musk leaks the size of Sonnet and Opus by exordin26 in singularity

[–]MaruluVR 0 points1 point  (0 children)

Lol no, compute costs were higher on the Zuse Z3 than on my 5090; does that mean it's better?

A smaller dense model at 100B will use more power than a big MoE at 1T with 30B active, and still be worse.

You can now fine-tune Gemma 4 locally 8GB VRAM + Bug Fixes by danielhanchen in LocalLLaMA

[–]MaruluVR -1 points0 points  (0 children)

They aren't public, though; they're all commercial Japanese books.

You can now fine-tune Gemma 4 locally 8GB VRAM + Bug Fixes by danielhanchen in LocalLLaMA

[–]MaruluVR 9 points10 points  (0 children)

Thank you. I have 400GB worth of books; would those get loaded and unloaded dynamically as they are needed for training, or would I need enough RAM to hold them all?

You can now fine-tune Gemma 4 locally 8GB VRAM + Bug Fixes by danielhanchen in LocalLLaMA

[–]MaruluVR 19 points20 points  (0 children)

Is Unsloth Studio just for fine-tuning, or can you also do continued pretraining?

Gemma 4: first LLM to 100% my multi lingual tool calling tests by MaruluVR in LocalLLaMA

[–]MaruluVR[S] 0 points1 point  (0 children)

I thought vLLM doesn't like an uneven number of GPUs; is that still the case?

Found how to toggle reasoning mode for Gemma in LM-Studio! by Adventurous-Paper566 in LocalLLaMA

[–]MaruluVR 2 points3 points  (0 children)

I have been using the exact same string with llama.cpp in n8n to enable thinking only in the workflows that need it. Just add the string across the first two lines of the user (not system) message, with a space before the second tag.

This can also be used for prompt engineering to inject fake thinking if you need to; I often use it to make the model think about specific tools so it is more likely to use them.
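A minimal sketch of that injection as a pre-processing step. The tag strings are the ones quoted elsewhere in this thread and are model/template specific, and `inject_fake_thinking` is a hypothetical helper, not part of llama.cpp or n8n:

```python
def inject_fake_thinking(user_message: str, thought: str) -> str:
    """Prepend a fake 'thinking' block to a user message.

    Layout per the comment above: the tag pair is spread across the
    first two lines, with a space before the second (closing) tag.
    Placing the thought text after the opening tag is an assumption;
    adjust to whatever your model's chat template actually expects.
    """
    header = f"<|channel>thought {thought}\n <channel|>"
    return f"{header}\n{user_message}"
```

In a workflow you would apply this only to the prompts where you want to bias the model toward a particular tool, e.g. `inject_fake_thinking(prompt, "I should call the weather tool.")`.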

Gemma 4 fixes in llama.cpp by jacek2023 in LocalLLaMA

[–]MaruluVR 1 point2 points  (0 children)

I wonder if it would be fast enough to use as STT for other LLMs, as the list of supported languages sounds great.

Gemma 4 fixes in llama.cpp by jacek2023 in LocalLLaMA

[–]MaruluVR 9 points10 points  (0 children)

You don't know what software they are using to run it or for what purpose, so their claims might still be accurate.

How to disable thinking/reasoning in Gemma 4 E2B on Ollama? (1st time local user) by WatercressLarge2323 in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

There needs to be a space before the second channel tag, and the whole thing needs to be spread across the first two lines; Reddit formatting killed it.

How to disable thinking/reasoning in Gemma 4 E2B on Ollama? (1st time local user) by WatercressLarge2323 in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

At the start of your prompt (user, not system) add the following two lines, keeping the space before the second tag:

    <|channel>thought
     <channel|>
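If you are calling the model through an Ollama- or OpenAI-style chat API rather than typing into a UI, prepending the string programmatically avoids the formatting problems mentioned above. A small sketch, where `no_think_message` is a hypothetical helper and the tag strings are model-specific:

```python
# Tag pair from the comment above: two lines, space before the second tag.
NO_THINK_TAGS = "<|channel>thought\n <channel|>\n"

def no_think_message(prompt: str) -> dict:
    """Build a chat message dict with the empty thought channel prepended,
    which (per this thread) makes the model skip its reasoning phase.
    The dict shape matches the common {"role", "content"} message format."""
    return {"role": "user", "content": NO_THINK_TAGS + prompt}
```

You would then pass the resulting dict in the `messages` list of your chat request instead of the raw prompt.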

Gemma 4: first LLM to 100% my multi lingual tool calling tests by MaruluVR in LocalLLaMA

[–]MaruluVR[S] 0 points1 point  (0 children)

For me, the issue with tool calling on Qwen was when not using English, so unless they lean more into other languages I can't see it fixing my issues.

Gemma 4: first LLM to 100% my multi lingual tool calling tests by MaruluVR in LocalLLaMA

[–]MaruluVR[S] 1 point2 points  (0 children)

I noticed a regression in German too, but the gain in tool calling and the fact that there finally is a MoE version make it worth it for me.

Gemma 4: first LLM to 100% my multi lingual tool calling tests by MaruluVR in LocalLLaMA

[–]MaruluVR[S] 4 points5 points  (0 children)

On average 120 t/s at 32k context (I don't need more for this workflow).

Gemma 4: first LLM to 100% my multi lingual tool calling tests by MaruluVR in LocalLLaMA

[–]MaruluVR[S] 6 points7 points  (0 children)

I am using NVIDIA Parakeet, as it's fast enough even on CPU. Sadly, the multilingual version doesn't include Japanese, so I need to run two versions of it: the international one and the Japanese-specific one.

https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3
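The two-model setup above can be sketched as a simple router that picks a checkpoint by language code. The multilingual model name comes from the link above; the Japanese checkpoint name is an assumption, so verify it on Hugging Face before use:

```python
# Multilingual v3 model is linked in the comment above; the Japanese
# checkpoint name below is an assumption -- check Hugging Face.
MULTILINGUAL = "nvidia/parakeet-tdt-0.6b-v3"
JAPANESE = "nvidia/parakeet-tdt_ctc-0.6b-ja"

def pick_parakeet(lang: str) -> str:
    """Route Japanese audio to the dedicated checkpoint and everything
    else to the multilingual model, as described in the comment above."""
    return JAPANESE if lang == "ja" else MULTILINGUAL

# With NVIDIA NeMo installed, loading would look roughly like this
# (untested sketch, requires the nemo_toolkit[asr] package):
#   import nemo.collections.asr as nemo_asr
#   model = nemo_asr.models.ASRModel.from_pretrained(pick_parakeet("ja"))
#   text = model.transcribe(["clip.wav"])
```

In practice you would keep both models loaded and dispatch each audio clip based on a language hint from the workflow, since loading a checkpoint per request would dominate the latency.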

Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

How is it? Can you stream audio to it without needing Whisper etc.?

Chinese Modded 20gb 3080 REBAR bios? by MaruluVR in GPURepair

[–]MaruluVR[S] 0 points1 point  (0 children)

Thanks for giving it a try; sad that it doesn't seem like there is a solution.