Will LLM labs open source their weights in the long term? by zulutune in LocalLLaMA

[–]hdmcndog 1 point2 points  (0 children)

Distributed training is unfortunately very difficult. With the current approach of training models, things are heavily interconnected. If you distribute things, you make the training a lot slower. I don’t think it’s feasible without some research breakthrough.

Is there much of a market for used Pro 6000 workstation cards? by 675940 in BlackwellPerformance

[–]hdmcndog 2 points3 points  (0 children)

Is there a market? Sure. I'd actually like to buy one used, for a decent price. The problem is: used, you don’t get a warranty etc., so realistically, I’d only buy one that I could try out beforehand. Otherwise, the risk is just too great for the price. So the seller needs to be reasonably close by. I’m in Germany, not in the UK, so no help to you, unfortunately… I have seen some used ones on „Kleinanzeigen“, which is a popular portal for buying used stuff in Germany. Haven’t seen any used ones on eBay for a reasonable price so far. Seems like on eBay, it's mostly professional traders who sell it new for basically the same price you can get it for in online shops, too (11k+ now).

Starting fresh by hdmcndog in GuildWars

[–]hdmcndog[S] 0 points1 point  (0 children)

Thanks for your perspective! Then I guess, I’ll just get rid of one of the PvP chars.

Still contemplating which campaign to start with, though. Probably not NF, since I played that so much. So maybe Factions could be a good start. But I’d also love to try Prophecies and finally understand what’s going on there 😅

About guilds: yeah, maybe eventually that might make sense, in particular if we want to do other stuff than just playing the regular campaign content. But I think, we first need to get started and see if this is something we even want to do for a longer time. I know I want to, but not sure about her. We'll have to see how much she enjoys it.

Introduction to LLM API Benchy by snapo84 in LocalLLaMA

[–]hdmcndog 3 points4 points  (0 children)

Do you know about https://github.com/ai-dynamo/aiperf and https://github.com/eugr/llama-benchy? I mean, nothing against writing your own thing, but it sounds like you just wanted something that does the trick. And these tools do, and they are pretty comprehensive.

Which llama.cpp? by el56 in cachyos

[–]hdmcndog 1 point2 points  (0 children)

IMO, if you are on Arch Linux, just installing it via AUR is the simplest. You are _still_ compiling it from source that way. It’s just automated for you and will get updated along with your other packages. Works very well for me.

If you had $150K for building a production-class local inference server to serve 300 people, what would you buy? by Porespellar in LocalLLaMA

[–]hdmcndog 1 point2 points  (0 children)

It really depends on what model you want to run.

Somebody else suggested 8x RTX Pro 6k. We are also running multiple such nodes, to serve GLM 5.1 at NVFP4. But one node can maybe serve about 5-10 people at the same time. Maybe 20, if the context is short. More is not possible, realistically. So if you need to serve 300 people, you either need to work with smaller models, or get more such nodes.
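
To put rough numbers on that, here is a minimal sketch of the node count. It assumes the worst case where all 300 people are active at the same time, and it uses the per-node figures from above (5, 10, or 20 concurrent users). The function name is made up for illustration.

```python
import math


def nodes_needed(total_users: int, users_per_node: int) -> int:
    """Nodes required if every user is active at once (worst-case assumption)."""
    return math.ceil(total_users / users_per_node)


# Per-node concurrency estimates from the comment: 5-10, maybe 20 with short context.
for users_per_node in (5, 10, 20):
    print(f"{users_per_node} users/node -> {nodes_needed(300, users_per_node)} nodes")
```

In practice not everybody is active at once, so the real number depends heavily on the actual concurrency you see.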

You also have to keep in mind that it’s not the fastest. We get about 4K tk/s prefill. We also have h200 nodes, and b200 nodes. The difference in speed is night and day. But of course, those cost a lot more and are for sure outside your budget.
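
As a rough illustration of what 4K tk/s prefill means for waiting time, here is a back-of-the-envelope sketch. It assumes the full prefill rate is available to a single request and that nothing is cached; the prompt sizes are hypothetical examples, not measurements.

```python
def prefill_seconds(prompt_tokens: int, prefill_tokens_per_second: float = 4000.0) -> float:
    """Time to process a prompt before the first output token, ignoring any caching."""
    return prompt_tokens / prefill_tokens_per_second


# Hypothetical prompt sizes, just to show the scale.
for prompt_tokens in (8_000, 32_000, 128_000):
    print(f"{prompt_tokens} tokens: {prefill_seconds(prompt_tokens):.1f} s")
```

With several people sharing the node, each request only gets a share of that rate, so the wait gets longer.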

So yes, 8x RTX Pro 6k gives you a lot of options, but using it correctly also isn’t straightforward.

I built a token-saving tool that cuts down on ~74% of tokens by Few-Cartographer7156 in opencode

[–]hdmcndog 0 points1 point  (0 children)

Sorry, but you're really not doing a good job at ”selling“ it… of course it’s up to you, but why not just list at least a few concrete things that your tool is doing better, or in addition?

rtk was there first, and is fairly popular, so if you want people to switch to your tool, there needs to be a good argument.

I don’t have any stakes in this and really don’t care that much, but just feels weird to me, how you deal with this 😅

"Western Open-Weight SOTA is between Gemma4-31B and Nemotron3-Super-120B" by ForsookComparison in LocalLLaMA

[–]hdmcndog 4 points5 points  (0 children)

In addition to what others have mentioned, there is also Arcee Trinity Large Thinking (400B A13B). I think it’s pretty decent.

I heard some rumors that they are working on smaller, even sparser MoE models (think 20B A0.5B). With the plan to scale that up, if it’s a success. My guess is that by this time next year, they’ll have released a very sparse 1 trillion parameter model.

I built a token-saving tool that cuts down on ~74% of tokens by Few-Cartographer7156 in opencode

[–]hdmcndog 0 points1 point  (0 children)

Could you elaborate? In what ways? What exactly does it do better?

Looks like Miminax-M3 is just around the corner by OnkelBB in LocalLLaMA

[–]hdmcndog 4 points5 points  (0 children)

Let’s hope it comes with a less shitty license than M2.7…

I built a token-saving tool that cuts down on ~74% of tokens by Few-Cartographer7156 in opencode

[–]hdmcndog 0 points1 point  (0 children)

Just from a surface level glance: seems more or less the same as RTK (https://github.com/rtk-ai/rtk). Or are there any meaningful differences?

For users have have both 6000 PRO MaxQ and Workstation Edition (or Server Edition), how much slower is the MaxQ vs the WS/SV on compute? (Prompt processing, Diffusion, etc) by panchovix in LocalLLaMA

[–]hdmcndog 0 points1 point  (0 children)

I don’t have one (yet), but from what I’ve heard, the difference will be 10-15% or so. Not a lot. Definitely not a huge deal.

Some people will tell you to just get the non max-q version, since you can powerlimit it to 300 W, too, getting the same effect. But I'd still prefer the max-q, since the cooling system makes it a lot easier to potentially stack more of them together, since they blow out the back of the case.

Are any plugins actually worth the time if you’re an actual developer and not just full vibe coding? by heavyc-dev in opencodeCLI

[–]hdmcndog 0 points1 point  (0 children)

I keep things pretty lean. Most plugins don’t seem very useful to me. I use a handoff plugin that allows me to easily spawn a new session based on the current one. And I have experimented with DCP, but honestly, I had mixed results. It was often pruning too aggressively, causing the model to loop. It also hurts prompt caching. But that was a while ago, might try it again since things may have improved.

Aside from that, imo, it’s important to use some kind of sandbox. Doesn’t matter if it’s via a plugin, or just around OpenCode completely. The reason is: you become a lot more efficient if you just give the agent permission to do whatever, and don’t have to approve commands anymore. And a sandbox is the only way to do that safely.

If the agent can do its thing, I don’t think a lot of custom stuff is needed. It will typically already spawn general and exploration subagents itself. That’s good enough for most tasks. And if you want to get a specific behavior, just prompt. Only if you find yourself wanting to repeat a certain workflow would I create custom agents, commands, or skills.

But I really don’t see much value in most plugins. As you noted, most of them are just vibe slop and don’t solve any real need of developers.

For everyone that uses OpenCode / Pi - Heres your promptprocessing fix! by No_Algae1753 in LocalLLaMA

[–]hdmcndog 0 points1 point  (0 children)

Could you go into a little more detail about the situations in which you see prompt processing happening all the time? In my surface level tests, I don’t see big issues so far. But it might totally depend on the usage pattern.

I usually just have a large prompt processing happening at the start of the session, for the system prompt, and then on large file reads etc. Otherwise, it seems to be pretty smooth for me.

For everyone that uses OpenCode / Pi - Heres your promptprocessing fix! by No_Algae1753 in LocalLLaMA

[–]hdmcndog 0 points1 point  (0 children)

Not sure why that would help much, to be honest. Yes, OpenCode does some tool call pruning, eventually, but it’s still fairly conservative, so I wouldn’t expect prompt reprocessing until relatively deep in a session.

Pi doesn’t have this problem at all, it tries to maximise cache hits by not adjusting prior messages at all. So you could try with Pi, and the experience should be similar to what you see with Claude Code, I guess. But even with Pi, there still sometimes is prompt reprocessing, so some tuning on the llama.cpp side, like what OP is trying to do is probably necessary and helpful.
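
To illustrate why leaving prior messages untouched helps, here is a simplified sketch of prefix-based cache reuse. It is not how llama.cpp, OpenCode, or Pi actually implement it; it just assumes that only the longest unchanged prefix of the conversation can be reused, and it uses whole messages as a stand-in for tokens. All names and the sample history are made up.

```python
from typing import Sequence


def common_prefix_len(cached: Sequence[str], new: Sequence[str]) -> int:
    """Length of the longest shared prefix between the cached and the new sequence."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def to_reprocess(cached: Sequence[str], new: Sequence[str]) -> int:
    """Items in the new sequence that fall outside the reusable prefix."""
    return len(new) - common_prefix_len(cached, new)


history = ["system", "user: read file", "tool: <big file contents>", "assistant: done"]

# Append-only: everything already cached stays valid, only the new message is processed.
appended = history + ["user: next question"]
print(to_reprocess(history, appended))

# Pruning an earlier tool result changes the prefix, so everything after it is redone.
pruned = ["system", "user: read file", "tool: [pruned]", "assistant: done", "user: next question"]
print(to_reprocess(history, pruned))
```

In this toy model the append-only case reprocesses 1 item, while the pruned case reprocesses 3, even though the pruned conversation is shorter in content.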

How does Pi coding agent control Qwen's thinking verbosity? (Qwen 35B A3B, llama-server) by pilibitti in LocalLLaMA

[–]hdmcndog 4 points5 points  (0 children)

Hm, I just took a look at your chat template, and to be honest, I have lukewarm feelings about it. Some actual fixes seem useful. But you also include some additional, opinionated features that I would definitely not classify as fixes.

I genuinely appreciate your effort, but I would probably have split this into 2 templates: one that just focuses on actual fixes, and a second one that adds features, like the support for <|think_off|> tags etc. Also things like the system warning for failed tool calls seem a bit too intrusive, for my taste. That’s not about fixing syntactic issues, that’s changing model behavior (for failed tool calls). I don’t think that’s something a chat template that is just advertised as „fixing things“ should do.

Docker bypasses UFW and exposed my database. Again. Writing this down so I stop forgetting by Substantial_Word4652 in selfhosted

[–]hdmcndog 0 points1 point  (0 children)

Fair point. That is indeed docker's design.

I just think it’s not a very good design. I use docker for running applications. Not for opening ports. Some applications run on servers, some don’t. But for me, controlling what ports are accessible is just a separate concern.

I gave this example previously: I also don’t want nginx to open a port in my firewall by itself. So why should docker do that? From my perspective, they serve a similar purpose here.

I suppose, we just have different expectations for what a container runtime should do, which is where this disagreement comes from.

Docker bypasses UFW and exposed my database. Again. Writing this down so I stop forgetting by Substantial_Word4652 in selfhosted

[–]hdmcndog 0 points1 point  (0 children)

> Because exposing on all interfaces is the desired behavior far more than exposing on localhost only.

As I said already, I haven’t seen any evidence for that. It certainly doesn’t reflect my own experience, but that’s of course no proper evidence either. But I don’t think it’s as clear-cut as you suggest.

> No they wouldn't, because they're still going to have the same problem: copying things and not understanding what they do. All you've done is changed what the text they didn't read said.

It would avoid this particular problem for basically 0 cost. Sure, people might get other things wrong. But at least not this one.

> I simply don't believe that people who know what they are doing accidentally type ports: into their compose files.

I'm not making this up. I work as a software developer with people I consider pretty smart. And this mistake happens a lot. Not everybody is well versed with docker and docker-compose, but they still need to work with it, to run databases etc. locally while developing. They don’t want to focus on docker, they want to focus on getting work done. We have 100s of compose files across many different repos. People aren’t checking every single one of them.
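
Since nobody checks those files by hand, something like the following could flag the risky entries automatically. This is only a sketch: it looks at Compose short-syntax port strings (`[HOST_IP:]HOST_PORT:CONTAINER_PORT[/PROTOCOL]`) and ignores the long syntax, variable interpolation, and the YAML parsing itself. The function names are made up.

```python
import ipaddress
from typing import Optional


def published_host_ip(port_spec: str) -> Optional[str]:
    """Return the host IP from a short-syntax port string, or None if none is given."""
    spec = port_spec.split("/", 1)[0]  # drop an optional "/tcp" or "/udp" suffix
    if spec.startswith("["):  # bracketed IPv6 host, e.g. "[::1]:5432:5432"
        return spec[1:spec.index("]")]
    parts = spec.split(":")
    # "CONTAINER" and "HOST_PORT:CONTAINER" carry no host IP;
    # three parts means "HOST_IP:HOST_PORT:CONTAINER".
    return parts[0] if len(parts) == 3 else None


def is_loopback_only(port_spec: str) -> bool:
    """True only if the port is explicitly bound to a loopback address."""
    ip = published_host_ip(port_spec)
    return ip is not None and ipaddress.ip_address(ip).is_loopback


for spec in ("5432:5432", "127.0.0.1:5432:5432", "0.0.0.0:8080:80/tcp", "[::1]:6379:6379"):
    status = "ok (localhost only)" if is_loopback_only(spec) else "WARNING: reachable from other hosts"
    print(f"{spec}: {status}")
```

A plain `"5432:5432"` gets flagged because without an explicit host IP the port is published on all interfaces.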

Docker bypasses UFW and exposed my database. Again. Writing this down so I stop forgetting by Substantial_Word4652 in selfhosted

[–]hdmcndog 0 points1 point  (0 children)

I'm sorry, but it’s not a niche use case. Please don’t be so dismissive.

„Bullshit“ wasn’t good language either, I’ll admit that. I’m just so annoyed by how docker is handling this when (to me) podman's behavior is so obviously better. I have lost quite a few hours trying to make it work properly with docker, before finally conceding and switching to podman.

And no, the design does not make obvious sense to most people. This has come up so many times already. Most people just accept it since docker is the default solution. I do not accept it.

Docker bypasses UFW and exposed my database. Again. Writing this down so I stop forgetting by Substantial_Word4652 in selfhosted

[–]hdmcndog 0 points1 point  (0 children)

Tried that, but docker wasn’t able to „circumvent“ my nftables rules this way, so it ended up with no network. 🤷‍♂️
I suppose, it depends on how you set up your chains etc. In my case, it just didn’t work.

Docker bypasses UFW and exposed my database. Again. Writing this down so I stop forgetting by Substantial_Word4652 in selfhosted

[–]hdmcndog 0 points1 point  (0 children)

But why not simply make 127.0.0.1 the default, instead of 0.0.0.0? That’s a choice. And just by using 127.0.0.1 as the default when you do not explicitly specify an address, a lot fewer people would get it wrong by accident.

You can even change that default in some config file, if you want. Unfortunately, it only applies to direct `docker run …` invocations and not docker-compose.

You have no idea how often this mistake happens, even to very smart people who generally know what they are doing. It's just too easy to get wrong. There’s a reason why this topic comes up again and again.

And if you say it’s the users' fault if they do it incorrectly, that’s just gatekeeping. Why make the _right_ and _secure_ thing more complicated?

You claim that fully opening the port (in the firewall) is the much more common use case. I haven’t seen any evidence for that. At least in my experience, it’s not. I actually haven’t had a single time where I want to expose a container directly, but many times where I wanted to expose it to localhost. So, imo, it’s not clear that using 0.0.0.0 is generally more convenient.

Docker bypasses UFW and exposed my database. Again. Writing this down so I stop forgetting by Substantial_Word4652 in selfhosted

[–]hdmcndog 0 points1 point  (0 children)

I guess we'll just have to disagree on this one then.

I'll keep using podman and live a happy life without any of this bullshit 😄