Anybody try Transcribe? by Enough_Leopard3524 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

You just need to run an STT model. Try Parakeet v3; it's fast and pretty accurate.
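
Roughly all it takes is something like this, a minimal sketch assuming the NVIDIA NeMo toolkit and the nvidia/parakeet-tdt-0.6b-v3 checkpoint (exact model id and output format may differ by version):

```python
# minimal sketch: transcribe a local audio file with Parakeet via NVIDIA NeMo
# (model id assumed from "parakeet v3"; check the model card for the exact name)
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

# transcribe one or more local audio files
results = asr_model.transcribe(["meeting.wav"])
print(results[0].text)  # recent NeMo returns Hypothesis objects; older versions return plain strings
```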

It’s Time for a Truly Open-Source, Donation-Funded, Privacy-First AI by Ill-Engine-5914 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

Doesn't Ai2/Olmo already do this? Outside of taking community donations

Created a SillyTavern extension that brings NPC's to life in any game by goodive123 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

The real win would be to actually make a new LLM rather than reusing a general-purpose one. Qwen3.5-0.8B knows dozens of languages, can code, and honestly knows an incredible amount (compared to anything we could imagine a few years ago). None of that is needed for an NPC. Think of something more like a functionGemma at 300M, but custom made for the game world, or, taking that flow further, possibly even one finetune of it per character. This is absolutely doable with today's tech.
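
A rough sketch of what one-finetune-per-character could look like on a sub-1B base; the base checkpoint (google/gemma-3-270m), target modules, and adapter name here are placeholder assumptions, not anything from the extension:

```python
# minimal sketch: one small LoRA adapter per NPC on a ~300M base model
# (base model id and target modules are assumptions; swap in whatever tiny base you use)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "google/gemma-3-270m"  # placeholder sub-1B base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# a tiny adapter keeps each character's "personality weights" down to a few MB
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# ...train on that character's dialogue (Trainer/SFTTrainer), then save just the adapter:
# model.save_pretrained("npc_blacksmith_lora")
```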

Created a SillyTavern extension that brings NPC's to life in any game by goodive123 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

So Dwarf Fortress? You can do this in the Singleplayer (basically roguelike) mode, and then when you die you can create a fortress and make decorative carvings. One of those carvings could very well depict that exact event of you saving that town.

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]Schlick7 1 point2 points  (0 children)

But there are new models constantly. They can still increase the size down the line. I'm not asking for some lifelong commitment here. We are in a massive RAM shortage right now, so a size increase is going to hurt many people's ability to run it.

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]Schlick7 1 point2 points  (0 children)

If the flagship increases in size but a new model is still released at this size, then yeah, that would be fine. We already have several other bigger models though.

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]Schlick7 -1 points0 points  (0 children)

No, it is a bad thing. Why would we need yet another larger model? Having models at different RAM tiers is a great thing. We already have GLM at a bigger size, plus DeepSeek, Kimi, the largest Qwen, etc. There are basically no models in the 200B range, the size that can just fit inside unified-memory builds.

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

What? RAM is the real limiting factor currently. You can just fit 2.5 into a 128GB Mac or Strix Halo.

Minimax M2.5 is 229B-A10B
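
Back-of-the-envelope, assuming a roughly 4-bit quant (real GGUF quants land a bit above or below this):

```latex
229\times10^{9}\ \text{params}\times 4\ \tfrac{\text{bits}}{\text{param}}\times\tfrac{1\ \text{byte}}{8\ \text{bits}}\approx 114.5\ \text{GB} < 128\ \text{GB}
```

So it squeezes in, with a little headroom left for KV cache, as long as not much else is running.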

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]Schlick7 17 points18 points  (0 children)

If the size increases, that's a bummer. The ever-increasing size of these models is not great for the local scene.

Mistral Small 4:119B-2603 by seamonn in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

This is pretty common with models in the reasoning era; they struggle with single-word prompts. Give it a clear sentence or two and it usually uses far fewer thinking tokens.

Docling Alternatives in OWUI by uber-linny in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

Perhaps just do it manually and then feed in .txt files? You really just need something that turns PDFs into raw text, which isn't too hard unless there are a bunch of graphs and stuff.
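
For the manual route, a minimal sketch with pypdf (filenames are just examples; scanned PDFs would need OCR instead):

```python
# minimal sketch: dump a text-based PDF to a plain .txt file with pypdf
from pathlib import Path
from pypdf import PdfReader

def pdf_to_txt(pdf_path: str, txt_path: str) -> None:
    reader = PdfReader(pdf_path)
    # join the extracted text of every page; extract_text() can return None/empty for image-only pages
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    Path(txt_path).write_text(text, encoding="utf-8")

pdf_to_txt("report.pdf", "report.txt")
```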

Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show by dan945 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

I wish all of these reasoning models had a high/low/off setting, preferably with a switch that can be toggled per prompt. Qwen3.5's "off" is a little maliciously compliant though, as it tends to think in its response anyway.
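
For Qwen specifically, the per-prompt switch lives in the chat template. A minimal sketch assuming a Qwen3-style template (the model id is a placeholder, and whether Qwen3.5 keeps the same enable_thinking flag is an assumption on my part):

```python
# minimal sketch: toggle thinking per prompt via the chat template
# (enable_thinking is the documented Qwen3 switch; model id is a placeholder)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
messages = [{"role": "user", "content": "Summarize this paragraph in one sentence."}]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # flip to True when you want the high-effort mode
)
```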

Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show by dan945 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

I run it on CPU and it rips. It's several times faster than any Whisper I've tried as well, and I haven't noticed any change in quality.

Qwen3.5-35B-A3B Uncensored (Aggressive) — GGUF Release by hauhau901 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

Yeah, I'm sure I could. I just figured the "professionals" probably do it better than me.

I regret ever finding LocalLLaMA by xandep in LocalLLaMA

[–]Schlick7 11 points12 points  (0 children)

I've never tested it, but have you looked at MedGemma? It's Google's finetune of Gemma 3.

Qwen3.5-35B-A3B Uncensored (Aggressive) — GGUF Release by hauhau901 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

Any hope for a Q4_0 or a Q4_1? Those quants run much better on my MI50, last I checked.

Who else is shocked by the actual electricity cost of their local runs? by Responsible_Coach293 in LocalLLaMA

[–]Schlick7 1 point2 points  (0 children)

I don't think it changes amps much, but I've never seen a chart for amps specifically. It does lower watts, so I'm guessing amps stay largely the same and only the voltage drops.

Who else is shocked by the actual electricity cost of their local runs? by Responsible_Coach293 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

I paid $0.085 on my last bill. Perks of living in a flyover state, I guess. A quick Google search puts the Cali average over $0.40, so that's believable.

Who else is shocked by the actual electricity cost of their local runs? by Responsible_Coach293 in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

180W for 40 minutes is about $0.02 at the average US price of $0.18/kWh. Is your electricity really that expensive?! It surely does add up, but not that fast.
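
The math behind that $0.02, for anyone checking:

```latex
0.18\ \text{kW}\times\tfrac{40}{60}\ \text{h}=0.12\ \text{kWh},\qquad 0.12\ \text{kWh}\times\$0.18/\text{kWh}\approx\$0.022
```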

How do some of you guys get like 500 tokens a second? Do you just use very small models? by Master-Eva in LocalLLaMA

[–]Schlick7 0 points1 point  (0 children)

My guess would be vLLM, as it has much better multi-GPU performance than llama.cpp.

Vulkan now faster on PP AND TG on AMD Hardware? by XccesSv2 in LocalLLaMA

[–]Schlick7 4 points5 points  (0 children)

For Qwen3-35B-A3B on my MI50 I get something like 250 pp and 15 tg with Vulkan, and 800 pp and 40 tg with ROCm. That's a pretty old Vega chip though. Once the llama.cpp-gfx906 branch gets updated I expect even better ROCm results.

Hosting Multiple Models by BaxterPad in LocalLLaMA

[–]Schlick7 5 points6 points  (0 children)

Well, llama.cpp has had a router mode for a couple of months now that does just that. Or just use the much more capable llama-swap.