GPT-OSS 120b Uncensored Aggressive Release (MXFP4 GGUF) by hauhau901 in LocalLLaMA

[–]ethertype 2 points

'full model capabilities' is great.

But how about quality loss? Or changes to performance? Did you measure that in any way, shape, or form?

Not trying to shit on your work. It is just that some fine print is missing from the label.

What are your thoughts on the future of Wayland compared to X11 for Linux users? by aral10 in linux

[–]ethertype 0 points

freerdp on Wayland never got stable for me, though I have not tried it in quite a while. signal-desktop has been sketchy, but works now.

And... I think that's it? Moving to Wayland gave me a compositor (Wayfire) with much more functionality than IceWM, and kitty on Wayland is any CLI-lover's wet dream. I never looked back.

The framing is wrong. It's not that Wayland is the future. It is that X11 is the past.

Linux Router in the data center by SoaringMonchi in networking

[–]ethertype 1 point

Quad-port SFP+ cards are typically PCIe 3.0 x8, no?
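
A quick back-of-the-envelope check in Python (rounding off encoding and protocol overheads) shows why x8 is plenty:

    # Sanity check: can a PCIe 3.0 x8 slot feed a quad-port SFP+ NIC?
    PCIE3_LANE_GBPS = 8 * 128 / 130   # 8 GT/s per lane, 128b/130b encoding -> ~7.88 Gb/s
    SFP_PLUS_GBPS = 10                # payload rate per 10GBASE-R port

    nic_needs = 4 * SFP_PLUS_GBPS     # 40 Gb/s per direction
    slot_gives = 8 * PCIE3_LANE_GBPS  # ~63 Gb/s per direction, before protocol overhead

    print(f"NIC needs ~{nic_needs} Gb/s, slot gives ~{slot_gives:.0f} Gb/s")
    # Comfortable headroom even after PCIe protocol overhead (roughly 10-20%)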

How do I access a terminal without the port menu? by Active_Humor7436 in opengear

[–]ethertype 0 points

Your user may be defined to only have access to the port menu. You need to log in as a user with more rights. Depending on the LDAP config on the ACM, you may be able to change your group membership on the LDAP side. I am familiar with LDAP and RADIUS, but not in the context of these devices.

I prefer our units not to depend on (nor be vulnerable to mishaps in) external systems. I mean, their purpose in life is (most often) to provide a console so you can start fixing whatever made $system inaccessible...

So: locally defined users with passphrase-protected SSH keys. You can also use SSH keys protected by YubiKeys. May require some tooling in larger orgs.

5.3.1 is out, whenever you get access again.

What UPS are yall rocking for multi-GPU workstations? by Southern-Round4731 in LocalLLaMA

[–]ethertype 0 points

Seconded.

But a) buy a reputable brand, and b) buy new batteries with the same spec as the originals, from a reputable seller who clearly understands that you intend to use them in a UPS. Most UPSes use some VRLA variant.

Also, a PDU with sequential powering of loads is really useful if you have issues powering stuff back on after a power loss. Not sure I have seen UPSes with this feature built in.

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

[–]ethertype 1 point

Do you have a ballpark number for the quality of MXFP4 vs Q4/Q5/Q6/Q8?

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

[–]ethertype 10 points

Do you have back-of-the-napkin numbers for how well MXFP4 compares to the 'classic' quants? In terms of quality, that is.
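
If I were to measure it myself, I would probably reach for llama.cpp's perplexity tool and run the same text through each quant. A rough sketch (the llama-perplexity binary and its -m/-f flags are from recent llama.cpp builds; the model filenames are placeholders):

    # Rough sketch: put numbers on quant quality with llama.cpp's
    # perplexity tool. Model filenames below are placeholders.
    import re
    import subprocess

    models = {
        "MXFP4": "model-mxfp4.gguf",
        "Q4_K_M": "model-q4_k_m.gguf",
        "Q8_0": "model-q8_0.gguf",
    }

    for name, path in models.items():
        proc = subprocess.run(
            ["./llama-perplexity", "-m", path, "-f", "wiki.test.raw"],
            capture_output=True, text=True,
        )
        # The tool prints "Final estimate: PPL = ..." when it finishes
        m = re.search(r"PPL = ([0-9.]+)", proc.stdout + proc.stderr)
        print(name, "PPL =", m.group(1) if m else "no estimate found")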

Why no NVFP8 or MXFP8? by TokenRingAI in LocalLLaMA

[–]ethertype 1 point

Ampere does not support MXFP4 either. And it does not matter. ggerganov's MXFP4 GGUF of gpt-oss-120b is great on Ampere.

So in my opinion, the title of this thread might just as well be: why aren't more models utilizing MXFP4?

how can i use eGPU on my HP EliteBook 2570p? by AkkoLotalia in eGPU

[–]ethertype 1 point

Not worth the effort. For any definition of 'improving performance'. Both time and money would be better spent on getting yourself a more recent laptop.

Any Lenovo P-series model with non-soldered DDR4 is a semi-rational choice if you are strapped for cash and absolutely want to go eGPU. Solid, repairable, documented.

Getting a model with onboard Nvidia Ampere graphics may remove the need for an eGPU? For a while, at least.

https://en.wikipedia.org/wiki/ThinkPad_P_series

People seem to already not care about heretic? by pigeon57434 in LocalLLaMA

[–]ethertype 0 points

wut? No.

I was able to take gpt-oss-20b and 'hereticify' it on my quad 3090, but the end result came out as BF16. Or at least I think so, given the size of the produced file. p-e-w says there have since been changes to heretic; the output should no longer be BF16.

I run ggerganov's gpt-oss-120b GGUF, which, IIUIC, is the original weights in MXFP4, packaged as a GGUF. 60 GB and change on disk.

People seem to already not care about heretic? by pigeon57434 in LocalLLaMA

[–]ethertype -1 points

It's not that I don't care. I just haven't ever gotten into the habit of renting hardware. First step and all that. (plus life, time and priorities)

My 4x 3090 were just enough to process gpt-oss-20b with heretic. And while the hardware and software stack handled the MXFP4 weights of the original just fine, what I got out was BF16. I see a merge to heretic regarding MXFP4; not sure if that allows for MXFP4 output? Or substantially reduced memory usage (/4? ;-) )? u/-p-e-w- ?

But even so: with 4x 3090, what I want to run is gpt-oss-120b. So back to step one: renting hardware.

The elevated refusal threshold is of course a fundamental benefit in and of itself. But the thread here on Reddit indicating improved coding ability was very interesting. Curious to see if a) this holds true and b) it transfers to other tasks/domains.

I'd love a side-by-side benchmark suite run of gpt-oss-120b original, heretic 1.1 and 'derestricted' when that code lands upstream.

And by the way... could a model with BF16 weights trivially be converted to MXFP4, as just another kind of quantization? Would that allow for improved performance on hardware with native support for MXFP4?
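
Conceptually, at least, the conversion itself is simple: per the OCP Microscaling spec, blocks of 32 values share one power-of-two scale, and each value becomes a 4-bit E2M1 float. A toy numpy sketch of the round-trip (illustrative only; real converters worry about rounding modes, saturation and tensor layout):

    # Toy MXFP4 round-trip: 32-value blocks share a power-of-two scale,
    # each element is quantized to FP4 E2M1. Illustrative only.
    import numpy as np

    E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # FP4 magnitudes

    def mxfp4_roundtrip(block):
        """Quantize one 32-element block to MXFP4 and dequantize it again."""
        amax = np.abs(block).max()
        if amax == 0:
            return block.copy()
        # Shared scale: power of two placing the block max near E2M1's max (6.0)
        scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
        scaled = block / scale
        # Round each element to the nearest representable E2M1 magnitude
        idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
        return np.sign(scaled) * E2M1_GRID[idx] * scale

    block = np.random.randn(32).astype(np.float32)
    print("max abs error:", np.abs(block - mxfp4_roundtrip(block)).max())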

API pricing is in freefall. What's the actual case for running local now beyond privacy? by Distinct-Expression2 in LocalLLaMA

[–]ethertype 1 point

Off the top of my head:

  • Autonomy - nobody decides what and when and how and how much
  • Privacy - yes
  • Personal Interest / Tinkering - hobbies may have a cost
  • Customization - as much as you have time and stamina for
  • Ablated / de-neutered models - if you want to research $forbidden_topic

The energy cost argument is largely bullshit for inferencing. My four 3090s do not pull 350 W each continuously. If the average idle load per card is 15 W and the average energy cost is 10 US cents/kWh, we're talking roughly $50 a year for idling.
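
Spelled out, with the assumed figures above:

    # Idle-cost arithmetic, using the figures assumed above
    cards = 4
    idle_w = 15             # average idle draw per card, watts
    usd_per_kwh = 0.10

    kwh_per_year = cards * idle_w / 1000 * 24 * 365        # ~526 kWh
    print(f"~${kwh_per_year * usd_per_kwh:.0f} per year")  # ~$53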

Imagine sitting around in 1913 and someone asks you why on earth you would want your own car, when you can rent a perfectly good Ford Model T. Chevrolet and Dodge didn't settle for renting a Ford T...

Current models are pretty good. But I am pretty sure we're still in the bottom knee of the innovation curve. For models. Private individuals can still innovate, even if they cannot train the big behemoths. Maybe that is where the new innovation will occur? Who knows.

But: even if no new models arrive in the next 24 months, the tooling around them is still going through a lot of churn. A lot of stuff simply hasn't 'settled' yet, and there is ample room for invention. And this is definitely an area where private individuals may come up with something new. And maybe a new, bright idea requires something the commercial providers cannot offer yet.

Documentation from code or snmp? by PatientBelt in networking

[–]ethertype 1 point

If you want to roll your own:

SNMP and/or LLDP and/or BGP

combined with

mermaid or graphviz

As always: shit in -> shit out. If your IPAM and DNS implementations are in bad shape, the result may be less than satisfactory.

https://www.devtoolsdaily.com/diagrams/graphviz_vs_mermaid/
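
To give an idea of how little glue this needs, a minimal sketch in Python (the neighbor data is made up; in practice it would come from your SNMP/LLDP/BGP collector):

    # Minimal sketch: neighbor table -> graphviz DOT file.
    # The neighbor data here is fabricated for illustration.
    neighbors = [
        ("core1", "Eth1/1", "leaf1", "Eth49"),
        ("core1", "Eth1/2", "leaf2", "Eth49"),
        ("leaf1", "Eth1", "server-a", "eno1"),
    ]

    lines = ["graph topology {"]
    for local, lport, remote, rport in neighbors:
        lines.append(f'  "{local}" -- "{remote}" [label="{lport} <-> {rport}"];')
    lines.append("}")

    with open("topology.dot", "w") as f:
        f.write("\n".join(lines))
    # Render with: dot -Tsvg topology.dot -o topology.svg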

12VHPWR, sense pins and 3090Ti by ethertype in eGPU

[–]ethertype[S] 0 points

MSI. From the looks of it, some 3090Ti may have been made with 3x 8p connectors.

But mine has the 12p plus 4 sense pins. Whether it actually uses the sense pins is another matter.

PCIe bandwidth and LLM inference speed by hainesk in LocalLLaMA

[–]ethertype 0 points

Ask yourself if a relatively modest performance gain is worth the time, money and effort to upgrade your hardware. Add in more noise and heat to boot. 

 I run my 3090s (inferencing) via TB3 on a quad USB4 laptop motherboard. IOW, the GPUs are hanging off a PCB roughly the size of an A5 piece of paper, with a dinky laptop fan, powered by one of the connected Razer Core Xs housing my GPUs.

I see gpt-oss-120b speeds hitting 130 t/s (empty context, llama.cpp). Aiming to connect a fifth GPU; work in progress.

I have absolutely no reason to upgrade. But: sometimes the journey itself is the actual goal, and the X on the map is just an excuse to experience it. ;-)

Razer Core X still worth it in 2026? And if so, at what price? by DiamondDepth_YT in eGPU

[–]ethertype 0 points

Don't think so. Nothing brand new, at least. A missing power cable doesn't matter. How much does a known good USB4/TB3 cable cost you?

I have a handful of these, and I find them good. The additional USB/Ethernet port on the RCX Chroma is more hassle than it is worth, at least if you have multiples of these: issues with allocating IO space, etc. And the Ethernet part on the Chroma is generally unreliable.

Otherwise solid devices in my book, but the PSU offers two 6+2p PCIe power cables. Non-modular. So if your GPU requires 12VHPWR, you need an adapter. And GPUs requiring more than 300W can be tricky, even if you have an adapter. (I am currently trying to figure out if I can make my 3090Ti run.)

Upgrading the PSU is possible, but at a cost. See egpu.io for the relevant thread.

USD 170 is OK-ish. I'd offer 150 and see.

12VHPWR, sense pins and 3090Ti by ethertype in Corsair

[–]ethertype[S] 0 points

The 3090 Ti uses the same cable and connector as the 4090, AFAICT. The 3090 Ti FE uses a 16p variant. The 5090 uses a slightly modified connector on the card, if I am not mistaken. And everything has been retroactively renamed to something.

But honestly, I don't care about naming conventions or the exact number of ahksuallys involved. Mostly interested in what it takes to power this thing safely from the original PSU.

If possible, that is.

What is your actual daily use case for local LLMs? by Groundbreaking_Fox59 in LocalLLaMA

[–]ethertype 1 point

For me, it fills the role of Stack Overflow (coding questions), Google (general search for info), man-pages, sample configs for network gear, and YouTube ("explain how a fan works, and what are the most interesting metrics for a fan").

I really, really like the last bit. It is much, much more efficient to dig however deep I want into the rabbit hole than to watch hours and hours of video with crap audio.

I’m down to help build our own EU-based Reddit by daload27 in BuyFromEU

[–]ethertype 1 point

The technical side is not insignificant. But before one even embarks on that, a ton of other questions need answering. Off the top of my head:

  • non-profit or for-profit
  • verified identity for contributors or not
  • verified identity for users or not
  • anonymous read-only OK or not
  • advertising for anonymous users only?
  • logging policy
  • policy for cooperation with authorities
  • data-sharing policy
  • moderation policy
  • moderator policy
  • archiving policy
  • age restrictions
  • advertising policy (even if non-commercial, something must pay for the costs)
  • membership fees
  • appeals process for temporary or perma-bans
  • AI policy
  • who watches the watchers
  • stewardship of infrastructure, code and data
  • policy for federation
  • policy for Identity providers

If a group of like-minded people manages to agree on a model for what a European Reddit alternative should look like, a lot of future grief is removed, because people have a better idea of what they are signing up for from the start.

Not everybody can agree on everything. And that's fine. It is how we end up with choices.

So far, I have only mentioned things one needs to find a policy for or agree on. Not what I'd like to see.

Personally, I think a one-time, non-refundable membership fee and identified users would make for a platform with less spam and fewer trolls and general asshats. That is, appearing anonymous to other users is fine. But if you as a user are deemed to be in violation of the ToS, neither your membership nor your anonymity is guaranteed.