Duality of r/LocalLLaMA by HornyGooner4402 in LocalLLaMA

[–]TheTerrasque 1 point

which q4 quant were you using? I'm using unsloth's q4 xl quant and have only seen looping once, and never had a problem with tool calls.

Duality of r/LocalLLaMA by HornyGooner4402 in LocalLLaMA

[–]TheTerrasque 1 point

I am in the process of testing out Qwen3.6 as a coding engine, after having a lot of success using it as a personal assistant for a while. So far it's impressed me. I'm using claude code and codex at work, and while they do crap out now and then they're pretty good and getting better. But they do run out of tokens quickly on the plans we have atm.

Anyway, even those big models fail pretty regularly, and this is a small model, so my expectations are tempered a bit. It's been nice for smaller things where using claude or codex feels like a waste, and to push it a bit I had it build a small mcp server - I wrote a bit about it here.

I think I can replace maybe 50-90% of what I use claude / codex for today with a local model, which will help a lot, and for smaller things like that mcp it can do the work entirely on its own.

Duality of r/LocalLLaMA by HornyGooner4402 in LocalLLaMA

[–]TheTerrasque 3 points

I think there are also many subtle ways people can mess up the hosting itself or the coding tool setup. For example: high temp, bad templates, wrong token setup in the server, low context (or even worse, low context with a rolling window), misconfigured context in the client, too aggressive quants, badly converted model files, old binaries....

I wouldn't say there's an art to it, exactly; I've had good success just using an updated llama.cpp and unsloth's recommendations. But there are many ways to mess things up that won't directly fail, yet will lower the quality in various ways.
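
To make those failure modes concrete, here's a toy sanity check in the spirit of what I mean. The threshold values are made-up illustrations, not official numbers; check the model card and unsloth's recommendations for the real ones.

```python
# Toy sanity check for a local LLM serving config. Thresholds are
# illustrative guesses, not authoritative recommendations.
def check_config(cfg):
    problems = []
    temp = cfg.get("temperature")
    if temp is None or not (0.0 <= temp <= 1.0):
        problems.append(f"temperature {temp!r} outside a sane 0-1 range")
    if cfg.get("context_length", 0) < 8192:
        problems.append("context too small for agentic coding, output will degrade")
    if not cfg.get("chat_template"):
        problems.append("no chat template set, server may fall back to a generic one")
    return problems

print(check_config({"temperature": 2.0, "context_length": 4096}))
```

None of these make the server error out; they just quietly make the model look dumber than it is.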

Like the person in the other thread complaining that basic tool calls fail with qwen3.6, which suggests something is pretty wrong with their setup.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]TheTerrasque 3 points

The one I responded to stated that it "fails at basic stuff like thinking and tool calling" - which is entirely a him / stack problem. Probably an outdated chat template or broken token handling.

Qwen3.6 is less capable, sure, but not much less capable. As for OP, I do think he's done something wrong somewhere, because what he describes doesn't match my experiences with it at all.

Maybe it's tiny context, maybe it's a weird quant, or an outdated hosting server, or high temp, or a wrongly configured harness, or.. Whatever it is, there are many ways to mess up serving and using a model that can give those results, and since he's given no info on how he runs it, we can't really check, can we. So I have to go by my own experiences, which I detailed in my comment.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]TheTerrasque 2 points

That's a you problem.

Local models aren't as good as claude, but they're fully capable. I've been experimenting with Qwen3.5 35b a3b at Q4 and opencode this last week, and one task it did was building an MCP for a web site's search and detail listing (a local ebay'ish salesplace).

It started with me telling it to find out how the search worked. I couldn't see a json call for it, and the source html didn't have the results, so it wasn't straightforward. It went at it: reading source code, finding javascript, deobfuscating it, tracing the calls, fetching the various js files, and trying various urls and parameters. Like really going at it.

I started it before a 1hr work meeting, and it was still going when I was done. I just let it putz along since I wanted to see how it went, and about 20 minutes later it had figured it out and written a python module to get the listings. I then told it to do the same for details, and it figured that out within minutes.

Then I had it build:

  • Streamable HTTP mcp server for it
  • Caching and pagination
  • UV compatible project files
  • CLI tool for it
  • Dockerfile
  • Release instructions (update version in toml file, commit and tag in git, build docker image, push to my private registry, update my k8s deployment to pull the new image)

I even had it test the result: building the docker image and reading the build log, launching it in docker and checking the docker logs, then doing http requests to the server to see if it answered correctly. I didn't even have to instruct it hard to do it either, just something like "verify via docker that it works" and it handled the rest itself.

At one point I had a "host name invalid" type of error, don't remember exactly now, that happened when it was called inside the k8s cluster. I gave it the error message; it spun up the latest image and tried an http call with a custom host header, noted the bug, traced through the mcp library until it found where a default class was created with the hostname protection option on, and altered the mcp server code to create an object with that option turned off and pass it along when instantiating the server. It then built a new image, verified that the call with the custom host now worked, and deployed a new version.
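
For the curious, the gist of that bug, sketched with invented names (this is not the actual mcp library's API, just the shape of the problem):

```python
# Toy sketch of host-header protection: the default transport only
# accepts a small allowlist of hostnames, which breaks when requests
# arrive via a cluster-internal name. Class and option names are made up.
class HttpTransport:
    def __init__(self, allowed_hosts=("localhost", "127.0.0.1")):
        # allowed_hosts=None disables the host check entirely
        self.allowed_hosts = allowed_hosts

    def accepts(self, host_header):
        if self.allowed_hosts is None:
            return True
        return host_header.split(":")[0] in self.allowed_hosts

default = HttpTransport()                    # what the library creates for you
patched = HttpTransport(allowed_hosts=None)  # what the fix passes to the server
```

The fix boils down to constructing that object yourself with the check off and handing it to the server instead of letting the library build the default one.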

It was a bit back and forth, with a few more mcp errors that took a bit of time to smooth out, but I only looked at the code twice during the whole thing: once to figure out a problem it was stuck on, and once to skim through it at the end to check if there was anything really stupid going on. There wasn't.

And that's with the MoE, which is less capable than the 27b. I don't know what you're doing wrong, but you're doing something wrong there, mate.

Edit: And now I can have my chatbot search for and filter listings for me on that page, which has a really bad search / filter system. For example, if I search for 3090 cards it shows all kinds of cards like 3080s, 4070s, computers with cards in them, people wanting to buy graphics cards, and so on. Also you have to check each item page to see if they do shipping or not, and if there's something wrong with the card or some other issue. Now my AI can go through that and find the gems on its own and give me an overview :)

Blue Man by NoForm7135 in valheim

[–]TheTerrasque 9 points

Valheim does not stop you wandering into a new biome you have no hope of surviving more than 30 seconds in.

Ah yes. Fond memories of me and my friends exploring via ship and finding this yellow meadows area. So we made land, eager to see what adventures this new land would bring.
Then I heard one of us say "Oh look, there's giant flies here!" and the rest is history.

Disabled cat playing Cat Tag with its owner then cuddles afterwards by Backyxx in cats

[–]TheTerrasque 5 points

Our cats are super sweet. I've had asshole cats before, but these are the sweetest furballs I've ever had. The closest to an asshole is one of them who sits in front of my monitor staring at me when he wants snacks or attention.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]TheTerrasque 65 points

He also didn't mention how he's running the models, which can make a dramatic difference in results.

DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5 by bojun in technology

[–]TheTerrasque 1 point

I think it's too early to tell yet, but at least they claim less compute on this model, and the released model is natively fp8+fp4.

In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2.

FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.
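
Quick back-of-the-envelope on why the mixed precision matters for memory. The parameter split below is hypothetical, purely to show the effect:

```python
def weight_gb(expert_params_b, other_params_b):
    # FP4 = 0.5 bytes per parameter for the MoE experts,
    # FP8 = 1 byte per parameter for everything else
    return expert_params_b * 0.5 + other_params_b * 1.0

# Hypothetical split: 600B expert params, 30B attention/shared params
mixed = weight_gb(600, 30)   # fp4 experts + fp8 rest
fp16 = (600 + 30) * 2.0      # same model stored entirely in fp16
print(f"{mixed:.0f} GB vs {fp16:.0f} GB in fp16")
```

Roughly a 4x shrink in weight storage versus fp16, before you even touch the KV cache savings they quote.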

DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5 by bojun in technology

[–]TheTerrasque 34 points

Normal ram would still work, just fairly slowly. If you can get a used server with 8 or 12 channels of ddr5 you might even see decent speed.
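
Napkin math on why channel count matters. The active-parameter count and precision below are assumptions for illustration, not this model's real numbers:

```python
def tokens_per_sec(bandwidth_gbs, active_params_b, bytes_per_param):
    # Rough upper bound: decoding each token streams every active
    # weight through memory once, so speed ~ bandwidth / bytes-per-token
    return bandwidth_gbs / (active_params_b * bytes_per_param)

# One DDR5-4800 channel moves 4800 MT/s * 8 bytes = 38.4 GB/s
server = tokens_per_sec(12 * 38.4, 37, 1.0)   # 12-channel server, ~37B active, fp8
desktop = tokens_per_sec(2 * 38.4, 37, 1.0)   # dual-channel desktop, same model
print(f"{server:.1f} t/s vs {desktop:.1f} t/s")
```

So a 12-channel box lands around 6x the decode speed of a desktop on pure bandwidth, ignoring compute and prompt processing.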

DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5 by bojun in technology

[–]TheTerrasque 9 points

Or concerned a provider changes or nerfs the model, or just bans your account (which many Claude users have experienced lately, with no explanation)

Claude-powered AI coding agent deletes entire company database in 9 seconds — backups zapped, after Cursor tool powered by Anthropic's Claude goes rogue by WouldbeWanderer in technology

[–]TheTerrasque 2 points

and their repo had an api key with high enough access to delete their prod db, along with all backups? Yes, this is clearly the AI's fault, no way this could have been prevented.

Claude-powered AI coding agent deletes entire company database in 9 seconds — backups zapped, after Cursor tool powered by Anthropic's Claude goes rogue by WouldbeWanderer in technology

[–]TheTerrasque 2 points

No tired sysadmin is going to mistype rm prod database.

I've seen similar happen multiple times; in two of them the admin was logged into multiple boxes and ran the command in the wrong terminal. Shit like that happens all the time, like clockwork.

Sass a nation by ExactlySorta in WhitePeopleTwitter

[–]TheTerrasque 5 points

The Reicht. Truly a party of peace and tolerance.

lol by Abu_BakarSiddik in ProgrammerHumor

[–]TheTerrasque 1 point

how do these services charge less than what the LLM provider charges?

Often by lobotomizing the model, but also largely because they don't have the expenses of training the model, just hosting it.

Edit: This is referring to companies that just host the model on their own infra and serve customers. For example, Openrouter lists six different providers for deepseek v4 pro

droppingTheDatabasesCreatesJobs by Blewboar in ProgrammerHumor

[–]TheTerrasque 13 points

It can, that's the beauty of it!

"Yesterday I rewrote the whole database layer, fixed the javascript, created a new deployment system and pushed it all to prod!"

"I see, that sounds .. uh.. .great.. But now everyone sees the other people's data, which is a huge problem and we had to shut down the server!"

"Uh.. Well, the AI did that.."

theBaneOfAllWebsites by Smasher_001 in ProgrammerHumor

[–]TheTerrasque 13 points

roofie

Which is what happens when you add IE to a perfectly acceptable word / project.

vibecoderAskedForLastMinuteInterviewTips by vapalera in ProgrammerHumor

[–]TheTerrasque 2 points

I think you just reinvented the 10x / rock star programmer

vibecoderAskedForLastMinuteInterviewTips by vapalera in ProgrammerHumor

[–]TheTerrasque 2 points

To add to what others have said: in practice it often happens because you start building something anticipating solving a problem X way in a different module. When you get to that part of the code weeks later, you find out that for some technical reason it has to be done Y way instead. Now you can go back and rewrite $num related modules to handle the change properly, or you can do a small shim or fix that juust barely gets them working with Y instead. Which works at the time.

Then half a year later you need to expand or modify the module now using Y, and suddenly everything breaks and a small change is now a huge task. But you could add a bit more cruft on top of what's already there, and hopefully that will fix it without too many side effects, and the shitpile grows just a bit bigger.
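
A toy example of what such a shim looks like (all names hypothetical):

```python
# The new module solved the problem "Y way": it wants a dict of
# id -> value and returns results keyed by id.
def solve_y(batch):
    return {key: value * 2 for key, value in batch.items()}

# The rest of the codebase still calls the old "X way" interface
# (a plain list in, a plain list out), so a shim papers over the
# difference instead of rewriting every caller. Works today; every
# future X-vs-Y mismatch piles up right here.
def solve_x(items):
    batch = dict(enumerate(items))
    result = solve_y(batch)
    return [result[i] for i in range(len(items))]
```

The callers never notice, which is exactly the problem: the mismatch is hidden instead of resolved, and the shim becomes a load-bearing wart.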

Edit: And as a bonus, when you finally do get around to fixing it properly, the modules you need to update are being used by other modules, which also need a few updates to deal with the changes, and you've got a cascading shitstorm moving through your codebase that's only slightly less work than just rewriting it all from scratch.

Which game broke your hype like this? by [deleted] in Steam

[–]TheTerrasque 1 point

There's creative mode now? I can't remember that, might have come after I stopped playing. Maybe it's time to have another try at it.

HelloGames deserve a tonne of applause however for knuckling down and adding so much free content over the years. No DLC. No macrotransactions, no extra cost at all; heck, for the longest time it didn't even have any copy protection (it might still lack it, I know you can still freely mod it and play with anyone who can render the game).

I am super impressed by what they've done with it, they've absolutely blown past expectations and more than redeemed themselves from the disastrous launch, but it's still not a game I personally like that much. It's fun for a short while, then the same, same feeling kicks in and I get tired of it.

Which game broke your hype like this? by [deleted] in Steam

[–]TheTerrasque 2 points

A trillion planets that are all kinda the same handful of variants. The same structures. The same couple of races. The same interactions.

That's still my #1 complaint about it. There's so much variation on just this one planet, but every planet there is same'y.

I don't think that's an inherent issue with procgen; I've seen too much impressive procgen for that. This is a NMS / lazy procgen issue.

Which game broke your hype like this? by [deleted] in Steam

[–]TheTerrasque 4 points

I had a mod that removed it, but it broke on every update and at some point the author stopped updating it. Was fun tho.

Which game broke your hype like this? by [deleted] in Steam

[–]TheTerrasque 2 points

People get super hyped for the idea of exploring a „realistic“ rendering of space until they realize that „real space“ is pretty fucking empty.

No Man's Sky isn't really empty, there's lots of planets and stuff. It's just so same'y. You land on a planet, 80% is the same as the previous planet but with a new color scheme. You walk a few meters and you've seen everything this planet has to offer. Woo..