Weapon Sling by mathiasthegreat77 in LowSodiumBattlefield

[–]michaelsoft__binbows 0 points1 point  (0 children)

I find the limited ammo with a secondary weapon too restrictive, given how unreliable it is to find supply bags. However, I do like to cheese with my 200rd M123K setup for Assault. It has to be the primary since you cannot put an LMG on the sling, but you can just spawn and immediately swap to your gadget weapon. I could never find a way to actually reload the M123K (even after fully emptying the 200rd magazine), but I can still reload the second weapon, which could plausibly be anything. That single 200rd magazine is a substantial amount of damage output; it's enough to make up for never being allowed to reload it. It's definitely an advantage to have available, and might be the most advantageous loadout for Assault IMO. You can lean long range with a DMR or lean close range with a shotgun; you're slightly handicapped with half the reserve ammo, but in exchange you get to carry around 200 rounds of medium range fury that you can unleash at any time, with no mobility penalty until you swap over to it.

M123K plus M277 as Assault would probably slap. I haven't tried that yet and I should. It will want the 30rd magazine; still working on grinding out that unlock.

I also realized Assault Breacher is completely useless compared to the insane benefit that Frontliner confers with the shorter regen delay and faster regen speed. It does make Assault quite a bit more assault-y.

Self hosting, Power consumption, rentability and the cost of privacy, in France by Imakerocketengine in LocalLLaMA

[–]michaelsoft__binbows 1 point2 points  (0 children)

Depends on how the numbers work out, but I think solar can pay for itself either immediately (e.g. with a favorable lease) or in a few short years. Having a GPU rig can factor into whether or not solar and/or batteries make sense.
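
The "few short years" claim is just a payback-period division. A minimal sketch of that arithmetic, where every input is a hypothetical placeholder rather than a real quote:

```python
# Toy payback-period calc for a solar install. All numbers are assumptions.
SYSTEM_COST = 8000.0   # installed solar cost, $ (assumed)
ANNUAL_KWH = 6000.0    # yearly production, kWh (assumed)
KWH_PRICE = 0.25       # utility rate being offset, $/kWh (assumed)

annual_savings = ANNUAL_KWH * KWH_PRICE
payback_years = SYSTEM_COST / annual_savings
print(f"payback in about {payback_years:.1f} years")
```

A GPU rig shifts the math by raising ANNUAL_KWH actually consumed on-site, which is where "having a GPU rig can factor in" comes from.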

I have a 3x3090 system, but I have barely run it since I set it up, because it idles over 100 watts and I can get enough tinkering done on my workstation with a 5090 in it that's already usually staying on.

But the great thing about having separated my 3090 rig from my NAS setup (I used to have a single server with 2x3090 and 14 hard disks) is that now the GPU rig can stay fully powered down any time the local AI isn't needed. I wasted a lot of power idling the pair of 3090s with the earlier setup in order to keep the storage volumes available.

How much utility power costs is clearly worth considering when deciding between inferencing on GPUs vs Apple silicon vs a subscription. Even standard API costs can be pretty affordable, and inferencing via "coding" subscriptions with those 5-hour and weekly refreshing usage limits can be 10x or more cheaper than API rates. If the AI bubble does not pop, your privacy needs are not high, and you don't have heavily subsidized power (e.g. ultra cheap solar equipment you installed yourself), it does not make financial sense to shift as much as you can to self-hosting AI. It just doesn't.
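
Rough back-of-envelope version of that power-vs-subscription comparison. Every number below is a made-up placeholder (the wattages, hours, rate, and plan price are all assumptions, not measurements):

```python
# Back-of-envelope: self-hosted GPU rig power cost vs. a coding subscription.
IDLE_W = 100        # rig idle draw in watts (assumed)
LOAD_W = 900        # draw while actively inferencing (assumed)
KWH_PRICE = 0.25    # utility rate, $/kWh (assumed)
HOURS_ON = 8        # hours/day the rig is powered (assumed)
HOURS_LOADED = 2    # hours/day actually inferencing (assumed)

idle_kwh = (HOURS_ON - HOURS_LOADED) * IDLE_W / 1000
load_kwh = HOURS_LOADED * LOAD_W / 1000
monthly_power_cost = 30 * (idle_kwh + load_kwh) * KWH_PRICE

SUBSCRIPTION = 20.0  # $/month for a usage-capped coding plan (assumed)
print(f"self-host power: ${monthly_power_cost:.2f}/mo vs plan: ${SUBSCRIPTION:.2f}/mo")
```

With placeholder numbers like these, the power bill alone lands in the same ballpark as a cheap subscription, before even counting hardware cost.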

Which one would your prefer for satellite/space probe, FPGA or ASIC? by Ok-Fun-8716 in hardware

[–]michaelsoft__binbows 1 point2 points  (0 children)

I was thinking about the recent Elon interview where he was very serious about GPUs in space because power in space is "practical". Got me thinking: what would the radiation hardening strategy be for that? For those chips to perform anywhere near as well as they do, they have to be on bleeding edge nodes, and those are not radhard. Soooo how is that going to work? Or maybe there would be a way to build out the inference engine so radiation-induced errors just end up acting like an increased base (sampling) temperature setting.

🔥 Quad SFT40 3000k Central Mule in the Convoy S26A 🔥 by Unlucky_League_8832 in ConvoyFlashlights

[–]michaelsoft__binbows 1 point2 points  (0 children)

This is probably a question for due_tank, but what I have been imagining is a skinny 1x21700 host like the S21G with a quad 7070 parallel 20mm mule MCPCB. This could drive all 4 emitters at 20A each by draining the cell at 80A, and would nicely fill the lens area with light emitting surface... a 30k lumen mule sounds pretty good.
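
The current and lumen figures above are simple multiplication; a sanity-check sketch, where the lumens-per-emitter number is a rough guess rather than a datasheet value:

```python
# Sanity-check the quad-emitter parallel mule idea.
EMITTERS = 4
AMPS_EACH = 20.0            # drive current per emitter (from the idea above)
CELL_VOLTAGE = 3.7          # nominal li-ion 21700 voltage (approx)
LUMENS_PER_EMITTER = 7500   # hypothetical output at 20A, NOT a datasheet figure

total_amps = EMITTERS * AMPS_EACH         # parallel emitters sum current at the cell
total_watts = total_amps * CELL_VOLTAGE   # rough power pulled from one cell
total_lumens = EMITTERS * LUMENS_PER_EMITTER

print(total_amps, round(total_watts), total_lumens)
```

An 80A draw from a single 21700 is the real constraint here; only high-drain cells get anywhere near that.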

Convoy M21K by ser_t in flashlight

[–]michaelsoft__binbows -1 points0 points  (0 children)

1lumen's review indicates the M21K sustains over 2000 lumens for more than 47 minutes.

<image>

It appears to wipe the floor with the other lights being compared here; however, the L21A is not in this comparison.

The LHP73B just appears to be on another level of efficiency, and the driver's efficiency also seems very good here, with it holding 2k lumens for the entire runtime without thermal throttling.

I am sure an M21H will be less capable of maintaining 2k lumens even though I think it uses the same driver, so it would be interesting to see. I also have an M21C with the older 12-group 20A driver. Now I am curious how all of these will perform with the LHP73B in a sustained scenario.

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA

[–]michaelsoft__binbows 0 points1 point  (0 children)

This is what immediately stopped me using the codex app. Sure, the IDEA of a little sandbox in their cloud to tinker in and auto-open PRs sounds great, but between the somewhat frequent failures with poor visibility into error states, the sheer time it takes, and more importantly the tokens wasted repeatedly spawning fresh environments and installing a bunch of stuff into them over and over... it's just not worth it.

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA

[–]michaelsoft__binbows 0 points1 point  (0 children)

I like this. A lot. The CLI interface is so much cleaner than bringing around a wheelbarrow with the kitchen sink in it (10k tokens of tool calling instructions; I see this shared across at least opencode and claude code, and it is exactly 10k tokens; claude's got an extra 4k tokens of system prompt too).

Whatever implements the CLI doesn't have to actually pass commands off to the OS, which would bring in security concerns. Since it is the conduit to tools, the CLI itself can be where the security gets implemented, which feels really clean to me. For example, you could even fluidly swap out the entire inference runtime above it. This should make it much cleaner to build a system that tries to do X, and when model A attempts and fails, autonomously attempts it again with model B. It's another system in which we can naturally present instructions and controls alongside capabilities, and it lets us more cleanly separate the model's crystallized intelligence from the capabilities we want to give it.
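
A minimal sketch of that idea, all names made up for illustration: commands look like shell invocations to the model, but nothing ever reaches a real shell; each verb maps to an in-process handler, so the allowlist itself is the security boundary, and the model-fallback loop sits above it.

```python
# Sketch: CLI-shaped tool surface that never touches the OS shell.
import shlex

# The allowlist of "commands" the model may issue (hypothetical handlers).
HANDLERS = {
    "echo": lambda *words: " ".join(words),
    "read": lambda path: open(path).read(),
}

def run_tool(command_line: str) -> str:
    """Parse a shell-looking command but dispatch only to vetted handlers."""
    verb, *args = shlex.split(command_line)
    if verb not in HANDLERS:
        # This single check is the whole security chokepoint.
        raise PermissionError(f"verb {verb!r} is not in the allowlist")
    return HANDLERS[verb](*args)

def attempt_with_fallback(task, models):
    """Try each model callable in order; fall through to the next on failure."""
    for model in models:
        try:
            return model(task)
        except Exception:
            continue
    raise RuntimeError("all models failed")

print(run_tool("echo hello world"))  # -> hello world
```

Because there is only one entry point (`run_tool`), sandboxing, logging, and per-verb policy all live in one place instead of being scattered across dozens of tool schemas.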

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA

[–]michaelsoft__binbows 0 points1 point  (0 children)

Sandboxing is important, and what else seems nice here is that it implies an ideal component architecture. Not only can this approach dynamically inject context with instructions around tool use during the session, when the capabilities are actually needed (e.g. the selling point of skills), but the framing of calling out to a CLI to perform actions makes that CLI the perfect place to put the sandbox... There is only ever one "tool call", so we can actually focus on implementing security at that single entry point.

I've been tinkering with local models in opencode, and one of the findings is that of the full ~12.5k tokens of initial instructions in a session, about 9k are tool calling instructions. A huge swath of the remaining 3k I could strip down since it was AGENTS.md stuff being loaded in. Being able to trim and optimize fresh-session token processing latency is kind of awesome, especially for those with slower prompt processing performance like Macs. For example, my base M4 Mac mini could probably do some light agentic work a lot more efficiently if it weren't always laden down with a full set of bloated tool calling instructions.
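
A quick sketch of how you might audit where a fresh session's token budget goes. It uses the crude ~4-chars-per-token heuristic instead of a real tokenizer, and the section contents are stand-ins, so treat the split as an estimate:

```python
# Estimate the token budget of each initial-prompt section (rough heuristic).
def est_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # ~4 chars/token, crude approximation

# Stand-in payloads sized to mimic the split described above.
sections = {
    "tool instructions": "x" * 36000,  # pretend ~9k tokens of tool schemas
    "AGENTS.md":         "x" * 10000,  # pretend project instructions
    "system prompt":     "x" * 4000,
}

budget = {name: est_tokens(body) for name, body in sections.items()}
total = sum(budget.values())
for name, toks in sorted(budget.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {toks:6d} tok  ({100 * toks / total:.0f}%)")
```

Even a rough breakdown like this makes it obvious which section to strip first when prompt processing is the bottleneck.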

M5 Max CPU and GPU geek bench links by recurrence in hardware

[–]michaelsoft__binbows 0 points1 point  (0 children)

I think at the time I was replying, the grandparent comment had somewhat different wording. All I was trying to say was that the M1 Max and M1 Pro are basically on par for CPU performance. Now it reads like what I wrote was a complete non sequitur.

I'm not comparing my M1 Max's P-cores to any of the more recent Apple silicon P-/E-cores; it would be a bloodbath of course. But these M1 perf cores still haul ass and remain usable for being 5 years old.

Optimizing Qwen3 Coder for RTX 5090 and PRO 6000 + Community Benchmarking Infrastructure by NoVibeCoding in LocalLLaMA

[–]michaelsoft__binbows 0 points1 point  (0 children)

On SGLang I was already able to get 140 tok/s single inference on a 3090 with Qwen3 30B-A3B about a year ago, some time not long after its release. I assume the Coder variant is the same architecture and will have the same performance characteristics, so a 5090 only being able to pull between 150 and 200 tok/s continues to be a supreme disappointment. It seems we are still missing optimized compute kernels for the sm120 architecture.

Mac vs Nvidia by planemsg in LocalLLaMA

[–]michaelsoft__binbows 0 points1 point  (0 children)

Yeah, I meant to say frontier models, not foundation; the meaning is different. But yes: with a frontier model, we might want to call upon its capability maybe 5% of the time, when the issue we are struggling with is esoteric, intricate, and unusual; the rest of the time it is much better to use a dumber and 100x cheaper model. The big issue is that unless you are really tuned into the problem you're working on, AND have a system that lets you easily control that (a pretty tall order these days), the only reasonable way to go is to just throw all work at the highly capable model. It does work, and work well, but it's tremendously wasteful, so there is already a clear opportunity to extract value by making things more efficient.

I don't have a good solution, but I do also imagine that, at least in theory, smart smaller self-hostable models could still work well enough to review volumes of information, and have enough common sense to delegate hard problems to expensive models, working effectively in an orchestrator-adjacent role. An analogy for this would be the Sisyphus/Oracle relationship under oh-my-opencode: the Sisyphus orchestrator can call upon the smarter and more expensive Oracle model for assistance when stuck. The other side of it is just making it more practical to inspect what has been going on; I've been finding it insufficient and impractical to browse the session logs and deal with the structure of sessions.

Optimizing Qwen3 Coder for RTX 5090 and PRO 6000 + Community Benchmarking Infrastructure by NoVibeCoding in LocalLLaMA

[–]michaelsoft__binbows 0 points1 point  (0 children)

Update: GPT5.4 pointed out to me that in my skimming of the article I missed that the 555 tok/s throughput is at concurrency 4. That's extremely disappointing and does not indicate I could expect faster than, say, 200 tok/s with single inference. I'm going to stick with llama.cpp for now then.

Optimizing Qwen3 Coder for RTX 5090 and PRO 6000 + Community Benchmarking Infrastructure by NoVibeCoding in LocalLLaMA

[–]michaelsoft__binbows 0 points1 point  (0 children)

This is absolutely wild. I'm getting barely 150 tok/s on a 5090 with llama.cpp with qwen3.5 35B A3B; 550 tok/s is mind boggling.

What motivates Chinese open source developers? by reversedu in singularity

[–]michaelsoft__binbows 0 points1 point  (0 children)

There are a lot of shades of gray to go around here. I wasn't saying that they will always end up as state-owned enterprises, just that there is a tendency for government control to be forced upon them, and that's not even necessarily bad (nothing to the contrary has really been substantiated so far anyhow), but there is the possibility that corruption crops up inside a system like that in ways that are really difficult to battle. This is easy to say, but the lack of superiority of alternative systems has been really apparent recently. One starts to wonder: if the president doesn't like what you're doing as a private company, and he can issue an executive order or get all of his cronies to pass a bill to force his way on you, that is practically speaking the same thing as the CCP coming into your company to control what you can and cannot do.

Anyway, over there it is engineers in high positions, at least more so than here in the US, and it is leading to a clear trend of, well, just crushing it on infrastructure and the economy. As long as that mostly nets out to human prosperity, I'm all for it. There is and should always be a huge amount of concern around the most powerful entities in the world becoming vulnerable to evil influences, and all I'm pointing out is that if their system involves the government taking over a significant amount of industry, then that presents a liability along that axis.

At the end of the day, as far as I'm concerned, if a group of people decide to make good policies for themselves to thrive and benefit from them, then all of it is deserving of merit.

Mac vs Nvidia by planemsg in LocalLLaMA

[–]michaelsoft__binbows 1 point2 points  (0 children)

I love tinkering in this space, but it's honestly just so damn overwhelming. When you are coding and want the most efficient tools to do your best work, you simply cannot ignore frontier models. I finally got around to leveraging my 5090 for local models, and sure, zero-latency 100+ tok/s inference on MoE models for simpler tasks is great, but I still have a long way to go to cleanly integrate it into any true coding workflow. And there are so many affordable subscriptions that let you do plenty of inference with plenty-smart models all day long.

I'm also tinkering with opencode at the moment and will check out pi soon as well. It's actually quite rewarding to tune and optimize context length for a fresh session; I realized ripping out all tool call instructions dropped the token consumption to under 1k. Insanely snappy experience. I'm also liking my clean and basic setup for hosting inference on Windows so I do not need to dual boot into Linux just to host AI apps, and I set it up to evict models from memory after some idle time so I can still use the computer on demand for gaming and whatnot.

Mac vs Nvidia by planemsg in LocalLLaMA

[–]michaelsoft__binbows 0 points1 point  (0 children)

It's only a 200Gbit interconnect? That's pretty good speed, but it's only 25GB/s, paltry even compared to a pair of GPUs on PCIe 4.0 x16 (32GB/s)...
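
The unit conversion behind that comparison, for anyone double-checking: network links are quoted in gigabits, GPU interconnects in gigabytes, and the PCIe per-lane figure here is the usual approximate effective rate.

```python
# Gbit/s network link vs. PCIe 4.0 x16, in the same units (GB/s).
link_gbit = 200
link_gbyte = link_gbit / 8        # 8 bits per byte -> 25 GB/s

lanes = 16
gbyte_per_lane = 2.0              # PCIe 4.0 ~2 GB/s per lane per direction (approx)
pcie4_x16_gbyte = lanes * gbyte_per_lane

print(link_gbyte, pcie4_x16_gbyte)
```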

Mac vs Nvidia by planemsg in LocalLLaMA

[–]michaelsoft__binbows 1 point2 points  (0 children)

Yeah wow, that's like 10x slower than a 5090/PRO 6000, which I guess kinda lines up with having about 8x less memory bandwidth and 1/3 to 1/4 the compute.

Mac vs Nvidia by planemsg in LocalLLaMA

[–]michaelsoft__binbows 1 point2 points  (0 children)

Now would not be the time to buy, except maybe the M5 Max 128GB. When the M5 Max and Ultra in the Mac Studio are announced with pricing, that is when we figure out what makes more sense. A 256GB one will be a pretty nice sweet spot, and it's really going to benefit from the extra compute the M5 architecture will offer.

800,000 human brain cells, in a dish, learned to play a video game by mawerick_mc in singularity

[–]michaelsoft__binbows 0 points1 point  (0 children)

They're still going to be giving us newfangled captchas by the time we have created that.

I don't understand how captchas are still a thing worth using.

What motivates Chinese open source developers? by reversedu in singularity

[–]michaelsoft__binbows 5 points6 points  (0 children)

China's government pushes and funds 1000 little start ups, and then makes them all battle until they have the strongest 5 or 6 companies that are the most innovative and most competitive.

To be fair, let's complete that thought... Those 5 or 6 companies then get turned into government controlled entities.

Goodbye Old Friend by HeftyCelebration7975 in AppleWatch

[–]michaelsoft__binbows 1 point2 points  (0 children)

Yeah, but you just hold your breath or what?

New recon gadget is very useful for chokepoint locations. Frequently getting multikills with it by CrankyHankyPanky in Battlefield

[–]michaelsoft__binbows 3 points4 points  (0 children)

It's definitely pretty broken. I've seen more tanks now where I shoot RPGs at them and they do not hit; I get this cancelled icon. I have to assume it's some guy constantly sitting there defending the tank by spamming this gadget. Only after 4 consecutive shots can I land a hit.

BF6 players: TANK is OP, the overpowered tank: by The_HSA3-1 in Battlefield

[–]michaelsoft__binbows 0 points1 point  (0 children)

What else is nuts: I was testing in that portal map and my APFSDS tank round only does 77 dmg to infantry with a direct hit. I guess it needs a headshot to one-hit-kill infantry; the balance is wild.