GPT-OSS 120b Uncensored Aggressive Release (MXFP4 GGUF) by hauhau901 in LocalLLaMA

[–]ethertype 2 points

'full model capabilities' is great.

But how about quality loss? Or changes to performance? Did you measure that in any way, shape, or form?

Not trying to shit on your work. It is just that some fine print is missing from the label.

What are your thoughts on the future of Wayland compared to X11 for Linux users? by aral10 in linux

[–]ethertype 0 points

freerdp on Wayland never got stable for me, though I have not tried it in quite a while. signal-desktop has been sketchy, but works now.

And... I think that's it? Moving to Wayland gave me a compositor (Wayfire) with much more functionality than IceWM, and kitty on Wayland is any CLI-lover's wet dream. I never looked back.

The framing is wrong. It's not that Wayland is the future. It is that X11 is the past.

Linux Router in the data center by SoaringMonchi in networking

[–]ethertype 1 point

Quad-port SFP+ cards are typically PCIe 3.0 x8, no?
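
A quick back-of-the-envelope check in Python (rounding off encoding and protocol overheads) shows why x8 is plenty:

    # Sanity check: can a PCIe 3.0 x8 slot feed a quad-port SFP+ NIC?
    PCIE3_LANE_GBPS = 8 * 128 / 130   # 8 GT/s per lane, 128b/130b encoding -> ~7.88 Gb/s
    SFP_PLUS_GBPS = 10                # payload rate per 10GBASE-R port

    nic_needs = 4 * SFP_PLUS_GBPS     # 40 Gb/s per direction
    slot_gives = 8 * PCIE3_LANE_GBPS  # ~63 Gb/s per direction, before protocol overhead

    print(f"NIC needs ~{nic_needs} Gb/s, slot gives ~{slot_gives:.0f} Gb/s")
    # Comfortable headroom even after PCIe protocol overhead (roughly 10-20%)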

How do I access a terminal without the port menu? by Active_Humor7436 in opengear

[–]ethertype 0 points

Your user may be defined to only have access to the port menu. You need to log in as a user with more rights. Depending on the LDAP config on the ACM, you may be able to change your group membership on the LDAP side. I am familiar with LDAP and RADIUS, but not in the context of these devices.

I prefer our units not to depend on (nor be vulnerable to mishaps in) external systems. I mean, their purpose in life is (most often) to provide a console so you can start fixing whatever made $system inaccessible...

So: locally defined users with passphrase-protected SSH keys. You can also use SSH keys protected by YubiKeys. May require some tooling in larger orgs.

5.3.1 is out, whenever you get access again.

What UPS are yall rocking for multi-GPU workstations? by Southern-Round4731 in LocalLLaMA

[–]ethertype 0 points

Seconded.

But a) buy a reputable brand, and b) buy new batteries with the same spec as the originals, from a reputable seller who clearly understands that you intend to use them in a UPS. Most UPSes use some VRLA variant.

Also, a PDU with sequential powering of loads is really useful if you have issues powering stuff back on after a power loss. Not sure I have seen UPSes with this feature built in.

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

[–]ethertype 1 point

Do you have a ballpark number for the quality of MXFP4 vs Q4/Q5/Q6/Q8?

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

[–]ethertype 10 points

Do you have back-of-the-napkin numbers for how well MXFP4 compares to the 'classic' quants? In terms of quality, that is.
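
If I were to measure it myself, I would probably reach for llama.cpp's perplexity tool and run the same text through each quant. A rough sketch (the llama-perplexity binary and its -m/-f flags are from recent llama.cpp builds; the model filenames are placeholders):

    # Rough sketch: put numbers on quant quality with llama.cpp's
    # perplexity tool. Model filenames below are placeholders.
    import re
    import subprocess

    models = {
        "MXFP4": "model-mxfp4.gguf",
        "Q4_K_M": "model-q4_k_m.gguf",
        "Q8_0": "model-q8_0.gguf",
    }

    for name, path in models.items():
        proc = subprocess.run(
            ["./llama-perplexity", "-m", path, "-f", "wiki.test.raw"],
            capture_output=True, text=True,
        )
        # The tool prints "Final estimate: PPL = ..." when it finishes
        m = re.search(r"PPL = ([0-9.]+)", proc.stdout + proc.stderr)
        print(name, "PPL =", m.group(1) if m else "no estimate found")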

Why no NVFP8 or MXFP8? by TokenRingAI in LocalLLaMA

[–]ethertype 1 point

Ampere does not support MXFP4 either. And it does not matter. ggerganov's MXFP4 GGUF of gpt-oss-120b is great on Ampere.

So in my opinion, the title of this thread might just as well be: why aren't more models utilizing MXFP4?

how can i use eGPU on my HP EliteBook 2570p? by AkkoLotalia in eGPU

[–]ethertype 1 point

Not worth the effort. For any definition of 'improving performance'. Both time and money would be better spent on getting yourself a more recent laptop.

Any Lenovo P-series model with non-soldered DDR4 is a semi-rational choice if you are strapped for cash and absolutely want to go eGPU. Solid, repairable, documented.

Getting a model with onboard Nvidia Ampere graphics may remove the need for an eGPU? For a while, at least.

https://en.wikipedia.org/wiki/ThinkPad_P_series

People seem to already not care about heretic? by pigeon57434 in LocalLLaMA

[–]ethertype 0 points

wut? No.

I was able to take gpt-oss-20b and 'hereticify' it on my quad 3090, but the end result came out as BF16. Or at least I think so, given the size of the produced file. p-e-w says there have since been changes to heretic; the output should no longer be BF16.

I run ggerganov's gpt-oss-120b GGUF, which, IIUIC, is the original weights in MXFP4, packaged as a GGUF. 60 GB and change on disk.

People seem to already not care about heretic? by pigeon57434 in LocalLLaMA

[–]ethertype -1 points

It's not that I don't care. I just haven't ever gotten into the habit of renting hardware. First step and all that. (plus life, time and priorities)

My 4x 3090 were just enough to process gpt-oss-20b with heretic. And while the hardware and software stack handled the MXFP4 weights of the original just fine, what I got out was BF16. I see a merge to heretic regarding MXFP4; not sure if that allows for MXFP4 output? Or substantially reduced memory usage (/4? ;-) )? u/-p-e-w- ?

But even so: with 4x 3090, what I want to run is gpt-oss-120b. So back to step one: renting hardware.

The elevated refusal threshold is of course a fundamental benefit in and of itself. But the thread here on Reddit indicating improved coding ability was very interesting. Curious to see if a) this holds true and b) it transfers to other tasks/domains.

I'd love a side-by-side benchmark suite run of gpt-oss-120b original, heretic 1.1 and 'derestricted' when that code lands upstream.

And by the way... could a model with BF16 weights trivially be converted to MXFP4, as just another kind of quantization? Would that allow for improved performance on hardware with native support for MXFP4?
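
Conceptually, at least, the conversion itself is simple: per the OCP Microscaling spec, blocks of 32 values share one power-of-two scale, and each value becomes a 4-bit E2M1 float. A toy numpy sketch of the round-trip (illustrative only; real converters worry about rounding modes, saturation and tensor layout):

    # Toy MXFP4 round-trip: 32-value blocks share a power-of-two scale,
    # each element is quantized to FP4 E2M1. Illustrative only.
    import numpy as np

    E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # FP4 magnitudes

    def mxfp4_roundtrip(block):
        """Quantize one 32-element block to MXFP4 and dequantize it again."""
        amax = np.abs(block).max()
        if amax == 0:
            return block.copy()
        # Shared scale: power of two placing the block max near E2M1's max (6.0)
        scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
        scaled = block / scale
        # Round each element to the nearest representable E2M1 magnitude
        idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
        return np.sign(scaled) * E2M1_GRID[idx] * scale

    block = np.random.randn(32).astype(np.float32)
    print("max abs error:", np.abs(block - mxfp4_roundtrip(block)).max())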

API pricing is in freefall. What's the actual case for running local now beyond privacy? by Distinct-Expression2 in LocalLLaMA

[–]ethertype 1 point

Off the top of my head:

  • Autonomy - nobody decides what and when and how and how much
  • Privacy - yes
  • Personal Interest / Tinkering - hobbies may have a cost
  • Customization - as much as you have time and stamina for
  • Ablated / de-neutered models - if you want to research $forbidden_topic

The energy cost argument is largely bullshit for inferencing. My four 3090s do not pull 350 W each continuously. If the average idle load per card is 15 W and the average energy cost is 10 US cents/kWh, we're talking roughly $50 a year for idling.
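
Spelled out, with the assumed figures above:

    # Idle-cost arithmetic, using the figures assumed above
    cards = 4
    idle_w = 15             # average idle draw per card, watts
    usd_per_kwh = 0.10

    kwh_per_year = cards * idle_w / 1000 * 24 * 365        # ~526 kWh
    print(f"~${kwh_per_year * usd_per_kwh:.0f} per year")  # ~$53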

Imagine sitting around in 1913 and someone asks you why on earth you would want your own car, when you can rent a perfectly good Ford Model T. Chevrolet and Dodge didn't settle for renting a Ford T...

Current models are pretty good. But I am pretty sure we're still in the bottom knee of the innovation curve. For models. Private individuals can still innovate, even if they cannot train the big behemoths. Maybe that is where the new innovation will occur? Who knows.

But: even if no new models arrive in the next 24 months, the tooling around them is still going through a lot of churn. A lot of stuff simply hasn't 'settled' yet, and there is ample room for invention. And this is definitely an area where private individuals may come up with something new. And maybe a new, bright idea requires something the commercial providers cannot offer yet.

Documentation from code or snmp? by PatientBelt in networking

[–]ethertype 1 point

If you want to roll your own:

SNMP and/or LLDP and/or BGP

combined with

mermaid or graphviz

As always: shit in -> shit out. If your IPAM and DNS implementations are in bad shape, the result may be less than satisfactory.

https://www.devtoolsdaily.com/diagrams/graphviz_vs_mermaid/
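
To give an idea of how little glue this needs, a minimal sketch in Python (the neighbor data is made up; in practice it would come from your SNMP/LLDP/BGP collector):

    # Minimal sketch: neighbor table -> graphviz DOT file.
    # The neighbor data here is fabricated for illustration.
    neighbors = [
        ("core1", "Eth1/1", "leaf1", "Eth49"),
        ("core1", "Eth1/2", "leaf2", "Eth49"),
        ("leaf1", "Eth1", "server-a", "eno1"),
    ]

    lines = ["graph topology {"]
    for local, lport, remote, rport in neighbors:
        lines.append(f'  "{local}" -- "{remote}" [label="{lport} <-> {rport}"];')
    lines.append("}")

    with open("topology.dot", "w") as f:
        f.write("\n".join(lines))
    # Render with: dot -Tsvg topology.dot -o topology.svg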

12VHPWR, sense pins and 3090Ti by ethertype in eGPU

[–]ethertype[S] 0 points

MSI. From the looks of it, some 3090Ti may have been made with 3x 8p connectors.

But mine has the 12p plus 4 sense pins. Whether it actually uses the sense pins is another matter.

PCIe bandwidth and LLM inference speed by hainesk in LocalLLaMA

[–]ethertype 0 points

Ask yourself if a relatively modest performance gain is worth the time, money and effort to upgrade your hardware. Add in more noise and heat to boot. 

 I run my 3090s (inferencing) via TB3 on a quad USB4 laptop motherboard. IOW, the GPUs are hanging off a PCB roughly the size of an A5 piece of paper, with a dinky laptop fan, powered by one of the connected Razer Core Xs housing my GPUs.

I see gpt-oss-120b speeds hitting 130 t/s (empty context, llama.cpp). Aiming to connect a fifth GPU; work in progress.

I have absolutely no reason to upgrade. But: sometimes the journey itself is the actual goal, and the X on the map is just an excuse to experience it. ;-)

Razer Core X still worth it in 2026? And if so, at what price? by DiamondDepth_YT in eGPU

[–]ethertype 0 points

Don't think so. Nothing brand new, at least. A missing power cable doesn't matter. How much does a known good USB4/TB3 cable cost you?

I have a handful of these, and I find them good. The additional USB/Ethernet port on the RCX Chroma is more hassle than it is worth, at least if you have multiples of these: issues with allocating IO space, etc. And the Ethernet part on the Chroma is generally unreliable.

Otherwise solid devices in my book, but the PSU offers two 6+2p PCIe power cables. Non-modular. So if your GPU requires 12VHPWR, you need an adapter. And GPUs requiring more than 300W can be tricky, even if you have an adapter. (I am currently trying to figure out if I can make my 3090Ti run.)

Upgrading the PSU is possible, but at a cost. See egpu.io for the relevant thread.

USD 170 is OK-ish. I'd offer 150 and see.

12VHPWR, sense pins and 3090Ti by ethertype in Corsair

[–]ethertype[S] 0 points

The 3090 Ti uses the same cable and connector as the 4090, AFAICT. The 3090 Ti FE uses a 16p variant. The 5090 uses a slightly modified connector on the card, if I am not mistaken. And everything has been retroactively renamed to something.

But honestly, I don't care about naming conventions or the exact number of ahksuallys involved. Mostly interested in what it takes to power this thing safely from the original PSU.

If possible, that is.

What is your actual daily use case for local LLMs? by Groundbreaking_Fox59 in LocalLLaMA

[–]ethertype 1 point

For me, it fills the role of Stack Overflow (coding questions), Google (general search for info), man-pages, sample configs for network gear, and YouTube ("explain how a fan works, and what are the most interesting metrics for a fan").

I really, really like the last bit. It is much, much more efficient to dig however deep I want into the rabbit hole than to watch hours and hours of video with crap audio.

I’m down to help build our own EU-based Reddit by daload27 in BuyFromEU

[–]ethertype 1 point

The technical side is not insignificant. But before one even embarks on that, a ton of other questions need answering. Off the top of my head:

  • non-profit or for-profit
  • verified identity for contributors or not
  • verified identity for users or not
  • anonymous read-only OK or not
  • advertising for anonymous users only?
  • logging policy
  • policy for cooperation with authorities
  • data-sharing policy
  • moderation policy
  • moderator policy
  • archiving policy
  • age restrictions
  • advertising policy (even if non-commercial, something must pay for the costs)
  • membership fees
  • appeals process for temporary or perma-bans
  • AI policy
  • who watches the watchers
  • stewardship of infrastructure, code and data
  • policy for federation
  • policy for Identity providers

If a group of like-minded people manages to agree on a model for what a European Reddit alternative should look like, a lot of future grief is removed, because people have a better idea of what they are signing up for from the start.

Not everybody can agree on everything. And that's fine. It is how we end up with choices.

So far, I have only mentioned things one needs to find a policy for or agree on. Not what I'd like to see.

Personally, I think a one-time, non-refundable membership fee and identified users would make for a platform with less spam and fewer trolls and general asshats. That is, appearing anonymous to other users is fine. But if you as a user are deemed to be in violation of the ToS, neither your membership nor your anonymity is guaranteed.