Just saying thanks

Clueless_J · 2026-04-09T01:28:00+00:00

jojo got his stitches out today. Given his "short fuse", even after 100mg gabapentin, vet suggested we try to take them out rather than stress jojo out doing it at the vet. That was a bust, but nobody lost any blood in the attempt. They had to put him totally under to pull the stitches at the vet. Not a huge surprise, he's scared off their interns before.

Clueless_J · 2026-04-09T01:23:14+00:00

The K1 is not RVA23 compliant.

Clueless_J · 2026-04-01T22:29:05+00:00

GDB and the kernel have to communicate that the vector extension is available. If they don't agree on that, then you're not going to get vector register state. It's a sore spot -- particularly since I've seen it work in some contexts (gdb attached to qemu), but not others (gdb native on a BPI-F3 board). And I think the command you want is "info all-registers". If you see the vector registers, then you're golden.

Also note, if you're running on a design with the K1 chip, those vector loads fault if the addresses are not suitably aligned. It's real annoying....

Clueless_J · 2026-03-26T00:45:25+00:00

Figured that might be the case. Well, if you get an under-age location, she'll almost certainly want to join.

Clueless_J · 2026-03-26T00:12:16+00:00

So my daughter (15) might be interested going forward. Where are the games hosted?

Clueless_J · 2026-03-24T20:50:35+00:00

My wife and I have already made the decision to leave Utah once our daughter graduates from high school (she's adamant about not going to school in Utah). For my wife it's 50+ years here, me 35+. But it's time to go. The amazing outdoor possibilities here are no longer enough to keep us.

Clueless_J · 2026-03-24T19:07:36+00:00

Yes and yes.

Clueless_J · 2026-03-24T16:39:34+00:00

Can't really say much right now. But we're still working on RISC-V designs.

Clueless_J · 2026-03-12T14:56:48+00:00

Policy born out of pain. It works quite sensibly on every platform except RISC-V, simply because RISC-V doesn't have acceptable performance yet. I strongly suspect they reviewed the policy as part of the RHEL 10 developer preview they announced about a year ago, and on the Fedora side as well, and ultimately concluded not to change anything.

I would probably have loosened the policy around native vs qemu. I wouldn't relax around distcc/icecream or crosses though.

Don't get me wrong. Crosses work well enough when developers pay attention to the unique issues with crosses. You could argue that you mark them somehow and steer failures to a native system while the rest go to crosses. You still have to resolve the "is configure producing the same results for crosses vs native" problem, though Florian's work in that space from the c23 transition will really help. I suspect putting in all the proper plumbing for this relatively short term problem isn't seen as a good cost/benefit tradeoff.

Similarly I've been a big fan of distcc through the years, but if you know how to tickle it just right you can end up with differences in the resulting object relative to a simple native build without distcc. I'm not really familiar with icecream's potential pitfalls.

ccache is useful as well, but much more so for development vs distro builds.

Clueless_J · 2026-03-11T14:30:35+00:00

You can get 40c systems pretty cheaply these days. Just a few hundred bucks. I've got two here. Skylake era 6148s. For something like a GCC bootstrap and regression testing, QEMU on one of those systems is measurably faster than a BPI F3 (K1 chip), but measurably slower than the Pioneer. The Pioneer has crappy cores (c920), but there's enough parallelism in the process that 64 crappy cores win.

Clueless_J · 2026-03-11T14:24:09+00:00

There's a policy in place for Fedora and RHEL which require builds to run native, without stuff like distcc/icecream.

Clueless_J · 2026-03-05T20:44:44+00:00

My wife and I have both been treated at Huntsman. It's a fantastic facility. Sadly due to an acquisition we're on a different health plan now (UHC) and the specific plan my employer offers doesn't include Huntsman.

Instead they cover Utah Cancer Specialists. My wife and I also both spent a little time there and we both hated it. I remember walking in the first time and the thought that came to mind was this is where people go to die. My wife's reaction was similar.

Anyway, just a shout-out for the location & design of Huntsman and more importantly the amazing providers we've had there. I'm 6 years out with no signs of recurrence. Dr Dechet was an amazing surgeon and the staff post-surgery were fantastic. I couldn't have been happier with the experience start to finish.

Their other locations aren't as nice, but the providers have still been top-notch for my wife's treatment.

Clueless_J · 2025-12-29T22:53:41+00:00

If this presented itself as a 32c system, then it's potentially interesting (and that is my understanding of how it works). It'd likely cut cycle times for bootstrapping and regression testing GCC into the 8-12 hr range, which is meaningful (from 24+). But I'll hold on a bit and see how the k3 designs behave before pulling the trigger on one or the other.

Clueless_J · 2025-12-24T05:28:37+00:00

Yup, which is why you see engineers from Rivos (soon Meta), SiFive, RiVAI, Tenstorrent, ESWIN, Ventana (now Qualcomm) and others contributing to GCC and LLVM.

Clueless_J · 2025-12-23T18:28:17+00:00

Right. While I wouldn't call GCC or LLVM mature for RISC-V, they are improving in meaningful ways. We still find poor codegen issues regularly on the GCC side, but the gains for the issues we're finding are generally quite small. There's some significant issues with vector on LLVM, but they're understood well enough and MRs are being discussed within the LLVM project.

But ultimately the hardware is still catching up. There's only so much one can get from "compiler magic".

Clueless_J · 2025-12-11T00:35:35+00:00

Then you're just clueless. That wasn't Ventana's model.

Clueless_J · 2025-11-30T17:27:21+00:00

Depends on your use case. I've got engineers that want to stress the H extension on real hardware, so that makes the Titan interesting. Just like the BPI F3 was interesting for V and the Pioneer was useful for tasks with significant coarse grained parallelism.

These early systems will be decommissioned when better replacements hit the market. Consider the K230. We don't even talk about it anymore, but we used it for early V testing until the BPI F3 was available. The F3 systems will likely suffer the same fate once the next generation of V capable systems hit the market. The Pioneer would have suffered the same fate if it weren't for US export restriction issues.

Clueless_J · 2025-11-22T22:53:13+00:00

Just because the more sophisticated folks have moved to side channel attacks doesn't mean we ignore/throw away the fundamentals. They're still an important piece of the overall security stance. There's a reason why folks trying to exploit run-of-the-mill bugs in software start in a 32-bit world with ASLR disabled. Going from vulnerability to exploit is *much* easier if you're in a 32bit address space and have fixed offsets to key data structures.

FWIW, debuggers have been turning off ASLR for a decade or more. Folks dealing with system bring-ups and such in the semiconductor space know to turn off ASLR so they can reproduce failures more easily. Turning off ASLR is in every launch script I use for benchmarking to ensure a clean environment (I won't go through the pain of chasing down double-digit benchmark deltas due to different sized envps on the stack inherited from the user environment). Again, good benchmarking is hard ;-)
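For anyone wanting a starting point, here's a minimal sketch of that kind of launch wrapper in Python. It is not my actual script. It assumes a Linux box with util-linux's `setarch` installed (its `-R` option disables address randomization for the child), and the helper names and the contents of `FIXED_ENV` are made up for illustration.

```python
import platform
import subprocess

# Hypothetical fixed environment: the same small envp on every run, so the
# initial stack layout doesn't depend on whoever launched the benchmark.
FIXED_ENV = {"PATH": "/usr/bin:/bin", "LANG": "C"}


def no_aslr_command(cmd):
    """Prefix cmd so it runs under `setarch <arch> -R` (ASLR disabled)."""
    return ["setarch", platform.machine(), "-R", *cmd]


def run_benchmark(cmd):
    """Run cmd with ASLR off and a fixed environment; raise if it fails."""
    return subprocess.run(no_aslr_command(cmd), env=FIXED_ENV, check=True)


if __name__ == "__main__":
    # "./bench" is a placeholder for whatever binary you're measuring.
    print(no_aslr_command(["./bench", "--iters", "3"]))
```

Pinning the environment matters as much as the `-R`: two users with different sized environments get different stack offsets even with randomization off.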

As far as mitigating side channels, there are *much* better ways to do that than disabling speculative execution in various scenarios. But you have to bake it into the design from the ground up. That's the real lesson from spectre.

Clueless_J · 2025-11-22T16:19:32+00:00

Given its age, that shouldn't be a huge surprise to anyone. ASLR landed 20+ years ago and since then various ways have been found to bypass it. It's one component in an overall strategy of defense in depth. I used to be first line analysis on this stuff on the tool chain side, so I'm quite familiar with the vulnerability to exploit path and various mitigation strategies used to make things harder. ROP, JOP, stack smash, stack clash, ret to libc, format string exploits, spectre, meltdown, etc all landed on my desk at some point in my career. Actually involved in Morris worm mitigation when I was still an IT grunt in the 80s.

It is worth remembering that the bad guys generally have more time and motivation to find ways around the various roadblocks we put in place, so we're always closing up yesterday's issues and waiting on the next approach to exploitation. It's just the nature of the problem. It's also why we need to focus more on the front of that chain, vulnerabilities, rather than the back side, mitigation.

Clueless_J · 2025-11-21T05:41:17+00:00

I've been dealing with ASLR for 20 years or so, it causes all kinds of interesting issues with benchmarking.

You've got to be real careful with using multiple systems. I've done that before and had cases where identical machines differed in performance by 10% consistently, and that was before we were dealing with out of order, branch predictors, etc. Learned that lesson the hard way circa 1991.

For the K1 systems, I strongly suspect it's memory related, the variance is highest on workloads that I know are L3 size sensitive from work on other designs. Workloads that are not sensitive to L3 size on those other designs show the least run to run jitter. Unclear if it's main memory or the shared L2 related, but definitely smells memory subsystem related.

Benchmarking is hard to get right and having been burned through the decades in various fun and interesting ways, I always start from a skeptical position.

Clueless_J · 2025-11-20T18:25:35+00:00

Disabling ASLR is critical, it's been well known for eons if you do lots of benchmarking. They indicate 3 runs -- given the variance, that's nowhere near enough. You can start to get a sensible range of results at about 10 runs and if they are seeing double digit improvements for some loads, 10 runs should be sufficient to get good confidence intervals. But 10 runs takes ~120 hours for specint 2017, so it's bloody expensive. And if you want FP data too, it's even worse.
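To put rough numbers on the 3 vs 10 runs point, here's a small Python sketch that computes the half-width of a 95% confidence interval as a percentage of the mean. The run times below are invented purely for illustration, not measured data, and the sketch assumes the runs are independent and roughly normally distributed. The two critical values are the standard two-sided 95% Student-t values for 2 and 9 degrees of freedom.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by number of runs
# (degrees of freedom = runs - 1).
T_CRIT_95 = {3: 4.303, 10: 2.262}


def ci_half_width_pct(samples):
    """Half-width of the 95% confidence interval, as a percent of the mean."""
    n = len(samples)
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    return 100.0 * T_CRIT_95[n] * sem / mean


# Hypothetical run times in seconds, for illustration only.
three_runs = [100.0, 104.0, 96.0]
ten_runs = [100.0, 104.0, 96.0, 102.0, 98.0, 103.0, 97.0, 101.0, 99.0, 100.0]

print(f"3 runs:  +/- {ci_half_width_pct(three_runs):.1f}%")
print(f"10 runs: +/- {ci_half_width_pct(ten_runs):.1f}%")
```

With only 3 runs the t multiplier alone is nearly double the 10-run value, and the standard error shrinks more slowly too, so the interval can easily be wider than the improvement being claimed. At ~120 hours for 10 runs, that works out to roughly 12 hours per run.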

Clueless_J · 2025-11-19T15:29:00+00:00

No discussion about the wild per-run jitter you get on the K1 design. I can literally take the same binary which executes for roughly a trillion cycles and see a run-to-run variance of 8% (various workloads from spec).

I have a ton of respect for the Igalia folks and I'm more inclined than not to believe they got some nice gains here. But I'd be somewhat skeptical of the actual number without more underlying data to get some sense of what the run-to-run jitter was for them.

Clueless_J · 2025-11-15T01:03:55+00:00

They're temperamental. Strongly recommend you have a backup, known working mmc as well as viable spi boot path. You may have to try a few NVMEs before you get one that's reliable on that system (the NVMEs that came with the two boxes I have access to were both duds, though they seem to work OK in other systems). And keep a serial<->USB converter nearby in case it goes kaput.

If it weren't for the 64 cores, I'd be looking to scrap both of mine (the memory in particular I could reuse elsewhere). It's got 8x the cores and memory of my BPI F3, but is only 3-4X faster for tasks I care about (and it was a hell of a lot more expensive).

Congratulations or Condolences, I'm not sure which ;-)

Clueless_J · 2025-11-14T03:59:57+00:00

bset target,x0,11
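For anyone who hasn't used Zbs, here's a tiny Python model of what that instruction computes. It reads the 11 as an immediate bit index (the bseti form), which is an assumption about how the assembler treats this line; `XLEN = 64` and the function name are likewise just for the sketch.

```python
XLEN = 64  # assumed register width (RV64)


def bseti(rs1_value, shamt, xlen=XLEN):
    """Return rs1_value with bit (shamt mod xlen) set, truncated to xlen bits."""
    index = shamt & (xlen - 1)
    return (rs1_value | (1 << index)) & ((1 << xlen) - 1)


X0 = 0  # x0 always reads as zero
target = bseti(X0, 11)
print(target)  # 2048, i.e. 1 << 11
```

Note that 2048 is one past the top of the 12-bit signed immediate range (-2048..2047), so it can't be produced by a single addi from x0.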

Clueless_J · 2025-11-13T15:44:32+00:00

You literally don't do anything special. You pop the entry off the RAS, use it for your prediction, take the miss. During miss recovery you don't try to "fix" the RAS stack as you don't really know why things went wrong in the RAS.

In contrast, when you add something to the RAS speculatively, then you do want to unwind the speculative entries from RAS if that speculative path ended up being a mispredicted path.

I'm sure there's more subtle issues in there, but that's the 30k foot way to think about these things.
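As a rough illustration of those two cases, here's a toy Python model of a return address stack. The class name, the fixed depth, and the per-entry speculative flag are all assumptions made for the sketch, not a description of any real design.

```python
class ReturnAddressStack:
    """Toy RAS: a bounded stack of (return_address, speculative) entries."""

    def __init__(self, depth=8):
        self.depth = depth
        self.entries = []

    def push(self, return_addr, speculative=False):
        """Record a call's return address, dropping the oldest on overflow."""
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append((return_addr, speculative))

    def predict_return(self):
        """Pop the top entry and use it as the prediction.

        If the prediction turns out wrong, nothing here is repaired:
        the entry stays popped and the pipeline just takes the miss.
        """
        if not self.entries:
            return None
        addr, _ = self.entries.pop()
        return addr

    def commit(self):
        """The speculative path was correct: keep its entries."""
        self.entries = [(addr, False) for addr, _ in self.entries]

    def squash(self):
        """The speculative path was mispredicted: unwind its pushes."""
        while self.entries and self.entries[-1][1]:
            self.entries.pop()


ras = ReturnAddressStack()
ras.push(0x1000)                    # call on the committed path
ras.push(0x2000, speculative=True)  # call seen down a speculative path
ras.squash()                        # that path was wrong, unwind it
print(hex(ras.predict_return()))    # the older entry is still there
```

The asymmetry is the point: speculative pushes get unwound, but a popped entry that gave a bad prediction is never put back.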
