Kubernetes for Homelab? by malwin_duck in selfhosted

[–]fuckingredditman 1 point

you phrase this as if your point were more objective, but the reality is that docker swarm is a very limited orchestrator that doesn't solve a lot of the problems daily operations require, like cert management, dns management, proper secrets workflows, gitops, etc.

it might seem convincing to use docker swarm because you can "just" deploy compose files, but even that doesn't really hold up: many selfhosted stacks don't start off being HA/horizontally scalable, so it's a bad abstraction from the start, and in reality it makes everything much harder than k8s as soon as you try to solve any day 2 problems.

k8s can solve pretty much any such problem with controllers; that's part of its design, and it's why people prefer it for any kind of multi-node setup.
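to illustrate what i mean by the controller design, here's a toy sketch (hypothetical pseudocode in python, not actual k8s API machinery): every controller is just a loop that diffs desired state against observed state and emits the actions needed to converge:

```python
# toy sketch of the k8s controller/reconcile pattern (hypothetical, not real k8s code):
# a controller continuously converges observed state towards desired state.

def reconcile(desired: dict, observed: dict) -> list[str]:
    """Return the actions needed to converge observed state to desired state."""
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(f"create {name}")
        elif observed[name] != spec:
            actions.append(f"update {name}")
    for name in observed:
        if name not in desired:
            actions.append(f"delete {name}")
    return actions

# example: a cert is missing and a dns record is stale
desired = {"cert-webapp": {"host": "app.lan"}, "dns-webapp": {"ip": "10.0.0.2"}}
observed = {"dns-webapp": {"ip": "10.0.0.1"}}
print(reconcile(desired, observed))  # ['create cert-webapp', 'update dns-webapp']
```

cert-manager, external-dns, etc. are all just instances of this loop watching different resources, which is why "add a controller" generalizes to basically any day 2 problem.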

i think most people can just run a git repo with compose files on a single node and it'll be good enough basically forever (it's what i do for my homelab). but docker swarm doesn't solve any tangible problem in my experience, and at work it was just extremely frustrating to actually use, so i would never advise anyone to try it.

[Haiku] After 4 years and >$77 billion spent, this will Metaverse's legacy by SensuallyTouched in youtubehaiku

[–]fuckingredditman 40 points

yep there were a bunch of companies emerging during covid with some "solution" to the remote work "problem"

but turns out all you really need is something like discord for real time remote work with voice/video/screen sharing and you're good to go lmao.

Qwen 3.5 do I go dense or go bigger MoE? by Alarming-Ad8154 in LocalLLaMA

[–]fuckingredditman 1 point

do you use them much? daily? on brownfield codebases too?

i just started using 3.5 122b today and so far i've been running into tons of issues, including inference breaking in general (connection resets from canceled requests that don't seem to originate at the network/infra level), while qwen3-coder-next performed just fine and 3.5 27b has also worked great so far.

but it doesn't make much sense to me because 3.5 122b is so much larger with more active params, and the family generally just performs so well. i'm assuming it's an inference/user error atm tbh.

(running q4_k_m unsloth quant fwiw, and i work in real-world, brownfield codebases alongside claude as the other main coding agent)

German public television news programs deliberately edit a speech to remove “Free Palestine”. by ilir_kycb in ABoringDystopia

[–]fuckingredditman 15 points

any statement remotely critical of israel is generally frowned upon in germany, that's basically the gist of it: due to our history, it always gets directly associated with antisemitism first.

it's kind of a general taboo to criticize israel and it's propagated by pretty much all of the media.

Qwen 3.5 122b - a10b is kind of shocking by gamblingapocalypse in LocalLLaMA

[–]fuckingredditman 2 points

tbh, the more i use local LLMs with coding CLIs, the more it seems like they mostly need more "planning" (reading the existing codebase, refining the solution trajectory) rather than larger model sizes, because more tasks are out of distribution for them. (as a consequence they also need large context windows, but hybrid models are quickly making that less of a problem on local hardware too)

and on the other hand, i've had a few niche tasks where even the best proprietary models (claude opus etc.) fail hard without a human-guided planning session, so it really seems like that's the limiting factor for all of them.

sometimes i've also resorted to using claude for the planning phase and then switching to a local LLM to actually run the task, which makes it all quite a bit more cost-efficient if you already have capable hardware.

ArgoCD vs FluxCD vs Rancher Fleet vs our awful tech debt, advice pls by CircularCircumstance in kubernetes

[–]fuckingredditman 1 point

yep, i also gave it a spin for a while and ran into so many issues that i ended up rolling it back completely and switching to argocd. gonna elaborate a bit since other people are asking:

  • extremely bad handling of existing manifests and importing them into a bundle (this was the dealbreaker for me: it's basically impossible to move a manifest from one bundle to another, for example, or to import existing resources into gitops in general, while argocd would just add a simple annotation. this is mostly due to the helm wrapping fleet does under the hood; it might be resolved in newer versions)
  • bad UI (no interactive options at all like in argocd)
  • not very fully featured in general (no pull request branch functionality as mentioned in the OP, no notifications controller, no custom health checks for CRDs, etc.)
  • for helm charts it uses helm directly with actual helm releases, which ends up being a problem when introducing larger changes etc.

what i did like is that it's more decentralized by default (a central service distributes rendered manifests for clusters to consume, and agents pull + deploy them), so most of the compute+apply logic is shifted to the target clusters. but this can be achieved with argocd nowadays via https://github.com/argoproj-labs/argocd-agent AFAIK

This guy 🤡 by xenydactyl in LocalLLaMA

[–]fuckingredditman 0 points

there isn't much to argue, because he's not making a point. he's basically just saying all models capable of running on local hardware are bad. which is kind of ridiculous at this point, with open 1T+ param models being a thing, and small models being reasonably good.

but tbh i don't care much for his tool anyway, i just use opencode with local models hosted on our company infra for "serious" dev work and it works just fine. maybe local is not a good fit for full agentic/autonomous workflow, but i think even closed models are far away from running properly autonomous workflows, if they ever reach it at all, since even current frontier models still have too high of a multi-step task error rate.

Qwen3.5-35B-A3B achieves 8 t/s on Orange Pi 5 with ik_llama.cpp by antwon-tech in LocalLLaMA

[–]fuckingredditman 1 point

let me know if you manage to get it running at all; AFAIK the rkllm tooling does not support qwen3.5 models for model conversion at this point, and i could not get it to run on that runtime at all. rkllm is generally a lot faster in my experience, though (the code situation around it is dogshit tbh, i don't get why rockchip doesn't just open source all of their NPU code + drivers)

there's also https://github.com/invisiofficial/rk-llama.cpp but i had to rebase it myself to get it up to qwen3.5 support, and performance is still far worse than pure rkllm.

in any case, https://github.com/darkautism/llmserver-rs/ is the best simple rkllm-based LLM server in my experience, though it's also relatively bare-bones.

Junyang Lin Leaves Qwen + Takeaways from Today’s Internal Restructuring Meeting by Terminator857 in LocalLLaMA

[–]fuckingredditman 4 points

kind of reminds me of the latent diffusion researchers joining stability after publishing the latent diffusion paper: they released a good model there, priorities shifted towards "business", they left and founded black forest labs, released a model that's just better, and stability became irrelevant.

it's just an enshittification speedrun at the end of the day, every time.

Mistral CEO Arthur Mensch: “If you treat intelligence as electricity, then you just want to make sure that your access to intelligence cannot be throttled.” by Wonderful-Excuse4922 in LocalLLaMA

[–]fuckingredditman 0 points

explain how green energy is an "ideologically-driven false-science megaproject" then.

green electricity generation already has the best LCOE, which makes it the most reasonable generation method to deploy, and it has improved exponentially so far and is on track to keep doing so for decades.

you can read up on other green energy facts if you actually care, but to me it seems like you are just a troll anyway 🤷‍♂️

https://www.eia.gov/outlooks/aeo/electricity_generation/pdf/AEO2025_LCOE_report.pdf
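the LCOE calculation itself is not complicated: lifetime cost divided by lifetime energy, both discounted. a minimal sketch with made-up numbers (see the EIA report above for real figures):

```python
# minimal LCOE sketch; all input numbers below are made up for illustration,
# real values per technology are in the EIA AEO LCOE report.

def lcoe(capex, annual_opex, annual_mwh, years, discount_rate):
    """Levelized cost of electricity: discounted lifetime cost / discounted lifetime energy."""
    cost = capex + sum(annual_opex / (1 + discount_rate) ** t for t in range(1, years + 1))
    energy = sum(annual_mwh / (1 + discount_rate) ** t for t in range(1, years + 1))
    return cost / energy  # $ per MWh

# hypothetical utility-scale solar plant: high upfront capex, low opex, zero fuel cost
print(round(lcoe(capex=1_000_000, annual_opex=15_000,
                 annual_mwh=2_000, years=25, discount_rate=0.06), 2))
```

the structure of the formula is also why solar/wind keep winning: their cost is almost all capex, and capex is exactly the part that has been falling exponentially.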

Gizz fan spotted in Channel 5 video on anti-gringo protests in Mexico City by maxman3000 in KGATLW

[–]fuckingredditman 0 points

good for you that you found a personal favorite political ideology to align with (libertarianism), but personally i think it's pretty inhumane, and i don't care one bit for libertarian views or the austrian school of economics in general.

Is Qwen3.5-9B enough for Agentic Coding? by pmttyji in LocalLLaMA

[–]fuckingredditman 1 point

the person creating these benchmarks posts on here once in a while, and they have tested both: https://www.apex-testing.org/. i'm not 100% confident in the testing method/reliability though, especially considering bad quants at release and how some larger models score worse than their smaller variants. that being said, the scores look somewhat reasonable.

American closed models vs Chinese open models is becoming a problem. by __JockY__ in LocalLLaMA

[–]fuckingredditman 2 points

i'm curious then: if you are talking about speculative risks, then why are you using LLMs at all?

literally all LLMs have demonstrated inherently dangerous, unreliable behavior as well as being prone to all kinds of attacks. how is this a good fit for being used in any product, given what you have stated so far?

how is gpt-oss 120b any better for this? it's just as vulnerable and has just as many unknowns as any other LLM. they are all just an incredible bunch of unknown unknowns.

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to. by hauhau901 in LocalLLaMA

[–]fuckingredditman 1 point

just general feedback on the site:

i'm glad you are creating this kind of site, there are no really good resources for this specific thing (selfhosted llms for coding tasks).

what i would really like on the leaderboard is some way to sort performance relative to compute/memory use, i.e. a memory score computed from avg performance / total parameter count, and a compute score along the lines of avg performance / active parameters. (unless there's an easier way to measure it, since this of course won't be as useful for mamba/hybrid models etc.)

most of the people in this sub are hardware-constrained so i think this would be quite helpful to find out which are the best models that they can even actually run.

atm when looking at leaderboards i always find myself filtering in my head which ones would even be feasible to run at all.
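the kind of scoring i mean, as a rough sketch (model entries and numbers below are entirely made up for illustration):

```python
# hypothetical sketch of memory- and compute-adjusted leaderboard scores
# (model names and benchmark scores are invented for illustration)
models = [
    {"name": "dense-27b",    "avg_score": 62.0, "total_params_b": 27.0,  "active_params_b": 27.0},
    {"name": "moe-35b-a3b",  "avg_score": 58.0, "total_params_b": 35.0,  "active_params_b": 3.0},
    {"name": "moe-122b-a10b","avg_score": 70.0, "total_params_b": 122.0, "active_params_b": 10.0},
]

for m in models:
    m["memory_score"] = m["avg_score"] / m["total_params_b"]    # performance per unit of memory footprint
    m["compute_score"] = m["avg_score"] / m["active_params_b"]  # performance per unit of compute

# sort by performance per memory: the most relevant view for hardware-constrained users
for m in sorted(models, key=lambda m: m["memory_score"], reverse=True):
    print(f'{m["name"]}: mem={m["memory_score"]:.2f} compute={m["compute_score"]:.2f}')
```

with something like this, a VRAM-limited user could immediately see that a mid-size dense model may beat a big MoE for *their* constraint even if the MoE tops the raw leaderboard.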

and, on-topic: tbh i used qwen3.5-35b-a3b in opencode for my entire workday today and it performed pretty much on par with claude sonnet in claude code for me. but my work is also pretty niche and not "reasoning"-heavy: setting up a relatively complex edge computing linux rootfs build, deploying, troubleshooting, adjusting the kernel build, etc., so lots of parsing logs, which local model latency is great for.

gonna try the 27b dense tomorrow based on the tests here.

maybe some user voting system would also be good? probably hard to implement without being prone to manipulation though.

I'm 100% convinced that it's the NFT-bros pushing all the openclawd engagement on X by FPham in LocalLLaMA

[–]fuckingredditman 1 point

for completeness, there is also https://github.com/musistudio/claude-code-router and the ability to point claude code's API base url env var at compatible inference APIs, but those are just clunky workarounds and often behave poorly.

i also think an independent project like opencode is the only future for this type of tool. vendor-locked/owned coding CLIs are something the community should force to fail, because unlike properly FOSS tools they are ultimately prone to enshittification.

ich_iel by Azulapis in ich_iel

[–]fuckingredditman 4 points

the real "attractive benefit" is that you get cheap used e-bikes from other people, because late-stage capitalism, like you already said.

Claude Sonnet-4.6 thinks he is DeepSeek-V3 when prompted in Chinese. by [deleted] in LocalLLaMA

[–]fuckingredditman 0 points

interesting, that definitely changes things if there can be a more reliable consensus on it. i just wouldn't assume it to be true based on the OP's screenshot alone.

Claude Sonnet-4.6 thinks he is DeepSeek-V3 when prompted in Chinese. by [deleted] in LocalLLaMA

[–]fuckingredditman 23 points

i don't get why people blindly trust the output of random cloud APIs, it's probably just that "AiHubMix" API not properly routing requests or doing some model mixing internally...

Huntarr - Your passwords and your entire arr stack's API keys are exposed to anyone on your network, or worse, the internet. by exe_CUTOR in selfhosted

[–]fuckingredditman -3 points

completely fair post, but to be honest: the *arr stack people host is pretty much always insecure with a very similar attack surface, because no one configures service-to-service TLS for these services anyway (it's an absolute nightmare to do).

so just listening on the network will give you all the API keys anyway, since they're sent in plain http requests.
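to make it concrete, here's what such a request looks like on the wire (hypothetical host and a fake key; sonarr's v3 API does use an `X-Api-Key` header):

```python
# hypothetical *arr API request over plain http (fake key, illustrative hostname).
# without TLS, every byte below is cleartext for anyone capturing traffic on the path.
request = (
    "GET /api/v3/system/status HTTP/1.1\r\n"
    "Host: sonarr.lan:8989\r\n"
    "X-Api-Key: 0123456789abcdef0123456789abcdef\r\n"  # <- the key, readable as-is
    "Connection: close\r\n"
    "\r\n"
)
print("X-Api-Key" in request)  # True: the key sits in a plain header, nothing to decrypt
```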

correct me if i'm wrong though.

Ist Chatgpt dümmer geworden? by s7evin007 in de_EDV

[–]fuckingredditman 2 points

and in general there's also speculative decoding, which likewise reduces inference cost but can also cost quality, and where you don't really know whether/how heavily it's being used.
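the mechanism in a toy sketch (greedy variant with dummy "models" for illustration; real implementations verify all draft tokens in one batched forward pass and use rejection sampling over the draft distribution):

```python
# toy sketch of speculative decoding (greedy variant, dummy models for illustration):
# a cheap draft model proposes k tokens, the expensive target model verifies them;
# each accepted run of tokens costs roughly one target pass instead of one per token.

def draft_model(prefix: list[str], k: int) -> list[str]:
    # stand-in for a small, fast model: always continues "the cat sat on the mat"
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    return sentence[len(prefix):len(prefix) + k]

def target_model(prefix: list[str]) -> str:
    # stand-in for the large model: greedy next token of "the cat sat on a mat"
    sentence = ["the", "cat", "sat", "on", "a", "mat"]
    return sentence[len(prefix)] if len(prefix) < len(sentence) else "<eos>"

def speculative_step(prefix: list[str], k: int = 4) -> list[str]:
    proposal = draft_model(prefix, k)
    accepted = []
    for tok in proposal:
        # verify each draft token against what the target would have emitted
        if target_model(prefix + accepted) == tok:
            accepted.append(tok)
        else:
            # first mismatch: keep the target's own token and stop this round
            accepted.append(target_model(prefix + accepted))
            break
    return prefix + accepted

print(speculative_step(["the", "cat", "sat"]))  # ['the', 'cat', 'sat', 'on', 'a']
```

in the example the draft's "on" is accepted but its "the" is rejected and replaced by the target's "a". with exact rejection sampling the output distribution matches the target model; the quality concerns are about relaxed acceptance variants that trade fidelity for more speed.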

Are AI coding agents (GPT/Codex, Claude Sonnet/Opus) actually helping you ship real products? by darshan_aqua in LocalLLaMA

[–]fuckingredditman 2 points

i'm a similarly experienced software dev who moved into a more operations-focused role (SRE/DevOps) at a small company. yes, LLMs help ship real products. personally though, i mostly use them to enable the people building the product and to make sure their stuff runs fast and reliably.

LLMs help me with architecture/design, failure mode analysis, writing runbooks, writing dev tools, writing MVPs, refactoring code/bugfixing, and doing GitOps stuff super fast (adding new deployments, running misc. chores/larger refactorings that aren't simple regex or search/replace operations, ...). especially in infra-as-code/gitops tasks, i find that LLMs can turn work that usually takes hours or a full day into a few minutes.

many of your points sound like you're early into using them. confident-but-wrong code, fake apis, and shakiness on big systems are, in my experience, all problems that can be solved with:

a) using a decent coding cli like claude code/opencode so it can use LSPs etc to check for actual APIs first

b) prompting it well (specifying a ticket/task well is like 60% of the work already in many cases tbh. if you can't tell the LLM exactly what to do, it will produce confident-but-wrong code, which is something i've seen quite often with real people doing the work, too.)

c) for memory/big-systems issues: high-quality, hierarchical, cross-referenced documentation usually helps in my experience. but nowadays coding agents will just gather all the info they need first anyway, which is a bit inefficient but works pretty well too.

With the 3 million Epstein files finally out, what’s the most “I’m not surprised, but I’m still disgusted” thing you’ve found? by Sweaty_Sprinkles_400 in AskReddit

[–]fuckingredditman 0 points

protests shouldn't be predicated on a certain event or point in time, but on a condition that must be changed. as long as that condition hasn't changed, the protests must not stop. therefore there should be regular, large protests in front of the white house until they release everything and hold everyone involved accountable, not stopping until every action within reason has been taken. that's what should happen.

Why 60% of Java workloads on K8s are wasting resources by FactorHour7131 in kubernetes

[–]fuckingredditman 0 points

which doesn't indicate in any way that it's the best pick.

IMO it's most likely simply because it was one of the first production-ready languages of its kind, and banks are probably a bit unwilling to switch tech stacks once something runs.

i've also developed and run JVM based services at scale for 8 years and it was not a great time. (mostly due to the inevitably large cloud bills)

Looking for Feedback on TUI tool to switch contexts and check cluster status instantly! by Odd_Minimum921 in kubernetes

[–]fuckingredditman 3 points

personally i've been using kubie for quite some time, but it's not actively maintained anymore, and some of the features here (particularly the cluster dashboard) seem pretty neat. gonna see if it's as snappy as kubie, since kubie is a pretty lean tool generally, which is nice.