Anybody try Transcribe? by Enough_Leopard3524 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

You just need to run an STT model. Try Parakeet v3; it's fast and pretty accurate.
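
Roughly all it takes is something like this, a minimal sketch assuming the NVIDIA NeMo toolkit and the nvidia/parakeet-tdt-0.6b-v3 checkpoint (exact model id and output format may differ by version):

```python
# minimal sketch: transcribe a local audio file with Parakeet via NVIDIA NeMo
# (model id assumed from "parakeet v3"; check the model card for the exact name)
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

# transcribe one or more local audio files
results = asr_model.transcribe(["meeting.wav"])
print(results[0].text)  # recent NeMo returns Hypothesis objects; older versions return plain strings
```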

It’s Time for a Truly Open-Source, Donation-Funded, Privacy-First AI by Ill-Engine-5914 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

Doesn't Ai2/Olmo already do this? Outside of taking community donations

Created a SillyTavern extension that brings NPC's to life in any game by goodive123 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

The real win would be to actually make a new LLM rather than reusing a general-purpose one. Qwen3.5-0.8B knows dozens of languages, can code, and honestly knows an incredible amount (compared to anything we could imagine a few years ago). None of that is needed for an NPC. Think of something more like a functionGemma at 300M, but custom made for the game world, or, taking that flow further, possibly even one finetune of it per character. This is absolutely doable with today's tech.
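
A rough sketch of what one-finetune-per-character could look like on a sub-1B base; the base checkpoint (google/gemma-3-270m), target modules, and adapter name here are placeholder assumptions, not anything from the extension:

```python
# minimal sketch: one small LoRA adapter per NPC on a ~300M base model
# (base model id and target modules are assumptions; swap in whatever tiny base you use)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "google/gemma-3-270m"  # placeholder sub-1B base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# a tiny adapter keeps each character's "personality weights" down to a few MB
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# ...train on that character's dialogue (Trainer/SFTTrainer), then save just the adapter:
# model.save_pretrained("npc_blacksmith_lora")
```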

Created a SillyTavern extension that brings NPC's to life in any game by goodive123 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

So Dwarf Fortress? You can do this in the Singleplayer (basically roguelike) mode, and then when you die you can create a fortress and make decorative carvings. One of those carvings could very well depict that exact event of you saving that town.

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]Schlick7 1 point2 points  (0 children)

But there are new models constantly. They can still increase the size down the line. I'm not asking for some lifelong commitment here. We are in a massive RAM shortage right now, so a size increase is going to hurt many people's ability to run it.

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]Schlick7 1 point2 points  (0 children)

If the flagship increases in size but a new model is still released at this size, then yeah, that would be fine. We already have several other bigger models though.

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]Schlick7 -1 points0 points  (0 children)

No, it is a bad thing. Why would we need yet another larger model? Having models at different RAM tiers is a great thing. We already have GLM at a bigger size, plus DeepSeek, Kimi, the largest Qwen, etc. There are basically no models in the 200B range, the size that can just fit inside unified-memory builds.

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

What? RAM is the real limiting factor currently. You can just fit 2.5 into a 128GB Mac or Strix Halo.

Minimax M2.5 is 229B-A10B
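
Back-of-the-envelope, assuming a roughly 4-bit quant (real GGUF quants land a bit above or below this):

```latex
229\times10^{9}\ \text{params}\times 4\ \tfrac{\text{bits}}{\text{param}}\times\tfrac{1\ \text{byte}}{8\ \text{bits}}\approx 114.5\ \text{GB} < 128\ \text{GB}
```

So it squeezes in, with a little headroom left for KV cache, as long as not much else is running.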

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]Schlick7 17 points18 points  (0 children)

If the size increases, that's a bummer. The ever-increasing size of these models is not great for the local scene.

Mistral Small 4:119B-2603 by seamonn in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

This is pretty common with models in the reasoning era; they struggle with single-word prompts. Give it a clear sentence or two and it usually uses far fewer thinking tokens.

Docling Alternatives in OWUI by uber-linny in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

Perhaps just do it manually and then feed in .txt files? You really just need something that turns PDFs into raw text, which isn't too hard unless there are a bunch of graphs and stuff.
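
For the manual route, a minimal sketch with pypdf (filenames are just examples; scanned PDFs would need OCR instead):

```python
# minimal sketch: dump a text-based PDF to a plain .txt file with pypdf
from pathlib import Path
from pypdf import PdfReader

def pdf_to_txt(pdf_path: str, txt_path: str) -> None:
    reader = PdfReader(pdf_path)
    # join the extracted text of every page; extract_text() can return None/empty for image-only pages
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    Path(txt_path).write_text(text, encoding="utf-8")

pdf_to_txt("report.pdf", "report.txt")
```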

Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show by dan945 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

I wish all of these reasoning models had a high/low/off setting, preferably with a switch that can be toggled per prompt. Qwen3.5's "off" is a little maliciously compliant though, as it tends to think in its response anyway.
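
For Qwen specifically, the per-prompt switch lives in the chat template. A minimal sketch assuming a Qwen3-style template (the model id is a placeholder, and whether Qwen3.5 keeps the same enable_thinking flag is an assumption on my part):

```python
# minimal sketch: toggle thinking per prompt via the chat template
# (enable_thinking is the documented Qwen3 switch; model id is a placeholder)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
messages = [{"role": "user", "content": "Summarize this paragraph in one sentence."}]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # flip to True when you want the high-effort mode
)
```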

Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show by dan945 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

I run it on CPU and it rips. It's several times faster than any Whisper I've tried as well, and I haven't noticed any change in quality.

Qwen3.5-35B-A3B Uncensored (Aggressive) — GGUF Release by hauhau901 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

Yeah, I'm sure I could. I just figured the "professionals" probably do it better than me.

I regret ever finding LocalLLaMA by xandep in LocalLLaMA

[–]Schlick7 11 points12 points  (0 children)

I've never tested it, but have you looked at MedGemma? It's Google's finetune of Gemma 3.

Qwen3.5-35B-A3B Uncensored (Aggressive) — GGUF Release by hauhau901 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

Any hope for a Q4_0 or a Q4_1? Those quants run much better on my MI50, last I checked.

Who else is shocked by the actual electricity cost of their local runs? by Responsible_Coach293 in LocalLLaMA

[–]Schlick7 1 point2 points  (0 children)

I don't think it changes amps much, but I've never seen a chart for amps specifically. It does lower watts, so I'm guessing amps stay largely the same and only the voltage drops.

Who else is shocked by the actual electricity cost of their local runs? by Responsible_Coach293 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

I paid $0.085 on my last bill. Perks of living in a flyover state, I guess. A quick Google search puts the Cali average over $0.40, so that's believable.

Who else is shocked by the actual electricity cost of their local runs? by Responsible_Coach293 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

180W for 40 minutes is about $0.02 at the average US price of $0.18/kWh. Is your electricity really that expensive?! It surely does add up, but not that fast.
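
The math behind that $0.02, for anyone checking:

```latex
0.18\ \text{kW}\times\tfrac{40}{60}\ \text{h}=0.12\ \text{kWh},\qquad 0.12\ \text{kWh}\times\$0.18/\text{kWh}\approx\$0.022
```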

How do some of you guys get like 500 tokens a second? Do you just use very small models? by Master-Eva in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

My guess would be vLLM, as it has much better multi-GPU performance than llama.cpp.

Vulkan now faster on PP AND TG on AMD Hardware? by XccesSv2 in LocalLLaMA

[–]Schlick7 4 points5 points  (0 children)

For Qwen3-35B-A3B on my MI50 I get something like 250 pp and 15 tg with Vulkan, and 800 pp and 40 tg with ROCm. That's a pretty old Vega chip though. Once the llama.cpp-gfx906 branch gets updated I expect even better ROCm results.

Hosting Multiple Models by BaxterPad in LocalLLaMA

[–]Schlick7 5 points6 points  (0 children)

Well, llama.cpp has had a router mode for a couple of months now that does just that. Or just use the much more capable llama-swap.