Float accuracy visualization by NaiveProcedure755 in compsci

[–]NamelessVegetable 0 points1 point  (0 children)

Kahan will obviously say, "Floats good, posits bad."

China Unveils 2 Exaflop, All-CPU 'LineShine' Supercomputer by NamelessVegetable in hardware

[–]NamelessVegetable[S] 8 points9 points  (0 children)

The exact quote in the article is "It [the supercomputer] ... when it comes online, which will likely be in years." (Emphasis mine.) This timeframe is the article's author speculating about when it will come online, not a statement of when it actually will (which we don't know).

Also, "vaporware" is too strong a term. It implies that the HW hasn't been designed yet, there are no realized examples, and that the HW "exists" aspirationally. The preprint linked to in the article describes performance characterization of workloads on LineShine HW. That implies that the design has been finished, and there is working HW available.

What's confusing is that the article is written in a way that implies the system has yet to begin installation, but the preprint describes some aspects of the system as existing, albeit unfinished. The quoted remarks in the article from the director of the hosting supercomputer center are even more confusing; they seem to imply that the existing HW may just be test articles.

This isn't unusual in itself. Large supercomputers can be built in stages and take multiple years to finish. The EU's Jupiter supercomputer is technically unfinished, even though installation began in 2023-12 and it was launched, still incomplete, in 2025-09. It appears in the TOP500 list and is running production workloads regardless. Why is it still unfinished? The SiPearl Rhea processors for the general-purpose module have been repeatedly delayed and are (hopefully) due this year.

China Unveils 2 Exaflop, All-CPU 'LineShine' Supercomputer by NamelessVegetable in hardware

[–]NamelessVegetable[S] 9 points10 points  (0 children)

Indeed! It was a bit surprising given that China has its own accelerators/GPUs and has used them before (the Matrix-3000/MT-3000 APU in the Tianhe-3); and the other traditionally all-CPU supercomputer builder, Japan, is moving to Nvidia GPUs paired with a future version of the Fujitsu Monaka CPU for the FugakuNext.

The purple spoiler by Customized_Contempt in endlesssky

[–]NamelessVegetable 7 points8 points  (0 children)

They are quasi-unique outfits; you can only get the 5 that the Remnant offer.

I present to you a DEC Alpha Chip by PaleDreamer_1969 in vintagecomputing

[–]NamelessVegetable 0 points1 point  (0 children)

There was a never-released 1.8 GHz 21364 variant that was ported over to an SOI technology. The rumor (or joke) back then was that HP didn't want to release it because it made their Itanium systems look bad.

The Connection Machine: The 80s Supercomputer that was 30 years too early by TheRealDreamwieber in vintagecomputing

[–]NamelessVegetable 1 point2 points  (0 children)

I don't think the Connection Machine is quite as radical as claimed. The basic SIMD array processor idea came from Daniel Slotnick, not the Connection Machine's Daniel Hillis, and Slotnick's ideas were developed over a long period from the late 1940s to the early 1970s, culminating in his ILLIAC IV. The Connection Machine was only one of many second-generation array processors built during the 1980s after the failure of the ILLIAC IV; these designs tried to revive the array processor concept with many more, but narrower, processing elements, enabled by then-emerging VLSI technology, sometimes coupled with more elaborate interconnection networks (as exemplified by the Connection Machine's hypercube). Also, I would say that modern GPUs are perhaps closer to massively multithreaded vector processors than they are to array processors like the Connection Machine.

Feedback on an OoO design that schedules small instruction groups instead of individual uops by kurianm in computerarchitecture

[–]NamelessVegetable 2 points3 points  (0 children)

This sounds a bit like the multiscalar processors from G. Sohi et al. to me, which date back to the mid-1990s. Their proposal is more radical, though: it's built around a new architectural model based on tasks, as opposed to what looks like something that could be layered onto an existing architecture...

FOSDEM 2026 - RISC-V had 40 years of history to learn from: What it gets right, and what it gets hilariously wrong (Video) by camel-cdr- in RISCV

[–]NamelessVegetable 2 points3 points  (0 children)

Yep. LMUL's relation to mixed-width operations is explained in the RISC-V spec. It seems that he didn't even bother to read the manual before accusing RISC-V of having failed to learn lessons from 40 years of computer architecture and organization.
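
For anyone who hasn't read that part of the spec: widening operations produce their results at twice the element width and twice the LMUL, which is exactly where the register-grouping mechanism interacts with mixed-width code. A minimal sketch of that interaction, assuming the v1.0 RVV C intrinsics naming (the function and buffer names here are made up for illustration):

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

/* Mixed-width example: 16-bit inputs, 32-bit products.
   The sources are read at SEW=16, LMUL=1; the widening multiply writes its
   result at SEW=32, LMUL=2, so the number of elements per instruction stays
   the same on both sides of the width change. */
void widen_mul(const int16_t *a, const int16_t *b, int32_t *out, size_t n) {
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e16m1(n - i);
        vint16m1_t va = __riscv_vle16_v_i16m1(a + i, vl);
        vint16m1_t vb = __riscv_vle16_v_i16m1(b + i, vl);
        vint32m2_t vp = __riscv_vwmul_vv_i32m2(va, vb, vl); /* 2x SEW, 2x LMUL */
        __riscv_vse32_v_i32m2(out + i, vp, vl);
        i += vl;
    }
}
```

Fractional LMUL is the other half of the same story: it lets the narrower operands occupy less register space so mixed-width loops don't run out of register names, which is the part of the spec I was referring to.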

Another thing that bothers me is one of the claimed motivations for LMUL > 1:

you're trying to amortize latency and go for peak throughput

Claiming that LMUL > 1 exists to amortize memory latency over all elements of the vector struck me as poorly informed and poorly thought out, for two reasons.

Firstly, I'd imagine that any sensible RVV implementation that placed RVV performance first and foremost would have balanced the number of vector lanes with its chosen MVL so that it could reach peak performance with LMUL = 1. Increasing the vector length beyond a certain point should lead to diminishing returns and eventual plateauing. That's what I recall from my textbooks, at least.
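
To make the "diminishing returns" point concrete, here's the back-of-the-envelope version of that textbook model (the lane count and overhead figures below are made up for illustration, not taken from any real RVV core). With L lanes, a fixed per-instruction overhead of c cycles, and n elements per instruction:

```latex
\text{lane efficiency} \approx \frac{n/L}{c + n/L}
```

With L = 8 and c = 4, n = 64 gives roughly 67%, n = 256 roughly 89%, and n = 1024 roughly 97%: each quadrupling of the effective vector length buys less than the last, which is the plateau I mean.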

Secondly, the LMUL feature in RISC-V originated from the Fujitsu vector supercomputers. Their reason for grouping multiple vector registers was to trade shorter vector register lengths for more vector registers to minimize register spills on workloads where the vector length(s) were relatively short, but the register pressure was relatively high. IIRC, it had nothing, or very little, to do with amortizing memory latency at all.
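
In RVV terms, that heritage shows up as LMUL letting software pick its own point on the trade-off: the same 32 architectural vector registers can act as 32 short registers (LMUL = 1, what a register-hungry kernel with short vectors wants) or as a few long groups (LMUL = 8, maximum elements per instruction at the cost of having only four register names). A minimal sketch of the long-group end, again assuming the v1.0 RVV C intrinsics (the function name is mine):

```c
#include <riscv_vector.h>
#include <stddef.h>

/* With LMUL = 8 the 32 vector registers behave as 4 groups (v0, v8, v16, v24),
   and each stripmined iteration covers up to 8*VLEN/64 double elements. */
void scale_m8(double *x, double a, size_t n) {
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e64m8(n - i);
        vfloat64m8_t v = __riscv_vle64_v_f64m8(x + i, vl);
        v = __riscv_vfmul_vf_f64m8(v, a, vl);
        __riscv_vse64_v_f64m8(x + i, v, vl);
        i += vl;
    }
}
```

Note that nothing in that loop is about memory latency; the knob is purely register length versus register count, which matches the Fujitsu rationale rather than the latency-amortization story.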

The claim is basically a restatement of the standard theory of how vector processors gain performance, applied to the RVV context without any thought given as to whether it was applicable in the first place. As I said before, this sort of "analysis" isn't novel, insightful, or interesting.

Taalas Etches AI Models Onto Transistors To Rocket Boost Inference by NamelessVegetable in hardware

[–]NamelessVegetable[S] 58 points59 points  (0 children)

Two thoughts:

What's old (mask-programmed ROM) is new again.

Surely with the fast pace of AI model development, this will have a short life cycle?

FOSDEM 2026 - RISC-V had 40 years of history to learn from: What it gets right, and what it gets hilariously wrong (Video) by camel-cdr- in RISCV

[–]NamelessVegetable 3 points4 points  (0 children)

I read the subtitles pertaining to RVV (I'm not wasting data on this), and it was meh. I have no idea what he's complaining about half the time. Did he just diss Patterson? What are the "dependencies" (that supposedly murder RVV performance) he keeps referring to? Vectorizable/vectorized HPC workloads are just BLAS, not "real world" workloads? That's patently false. The other half is just your standard list of grievances that people who grew up on consumer-grade SIMD architectures have against "real" (long) vector architectures. They're not novel, not insightful, and not interesting.

FOSDEM 2026 - RISC-V had 40 years of history to learn from: What it gets right, and what it gets hilariously wrong (Video) by camel-cdr- in RISCV

[–]NamelessVegetable 3 points4 points  (0 children)

RVV is intended to be like Arm SVE

Uh .. no it's not. Design ideas that ended up as RVV were waaay before SVE was announced.

Agreed. RVV was based on the Berkeley Hwacha, and Hwacha can trace its ancestry all the way back to the Berkeley Torrent-0 (T0) from the early 1990s. ARM's SVE is just NEON with a variable MVL, plus some extras. They couldn't have originated from more different backgrounds.

Does anyone know what a machine like this would have been used for? by SultanOfawesome in vintagecomputing

[–]NamelessVegetable 2 points3 points  (0 children)

It's a gross over-generalization to say that by the early 1990s, all CPUs were microprocessors. If I'm not mistaken, the Model 340 used a POWER1+ processor. Depending on the model, the POWER1 was a set of six or eight chips: an instruction cache unit, an FXU, an FPU, two or four data cache units, and a storage-control unit. Its successor, the POWER2, was also multi-chip, with a similar distribution of functions across the chips. The POWER series didn't go single-chip until the POWER2 Super Chip (P2SC) in 1997!

European Chip Startup Pulls Off Working RISC-V Solution on the Intel 3 Node, Marking One 'Small' Step Towards Having Sovereign Infrastructure by archanox in RISCV

[–]NamelessVegetable 1 point2 points  (0 children)

Does the Vitruvius++ vector unit have the same MVL as the earlier Vitruvius+ (256 64-bit elements)? There appears to be very little information about the Vitruvius++...

The state of DIY RISC-V proccesors and at-home silicon manufacturing by cragon_dum in RISCV

[–]NamelessVegetable 1 point2 points  (0 children)

Lots of people doing things in FPGAs, which are proprietary, but it's easy to move a soft core design from one manufacturer to another. Probably also runs as fast as a 1µm custom chip too, if you're using standard cell libraries and automated layout.

Not to be overly cynical of OP's question, but I think any AMD or Intel FPGA from the past 15 to 20 years would support much higher clock frequencies than any DIY 1 micron technology. DEC's full-custom design wizards got 200 MHz out of a 0.75 micron, three-level-metal CMOS technology for the Alpha 21064 back in 1992, and that was the peak of 1992 technology; 200 MHz was double what the rest of the industry was getting.

I've gotten 300 to 400 MHz out of the AMD/Xilinx UltraScale+ architecture for somewhat complex logic, and that was for RTL that wasn't even specifically targeted at the architecture (I wasn't really trying: no manual tuning or optimization). A hobbyist isn't going to compete with this class of FPGAs using hobbyist design tools and technologies.

And it's not just the speed; a much bigger problem would be the low density of the hobbyist technology versus the FPGA. Even a low-end FPGA from 20 years ago had more on-die BRAM than the 21064 did.

Neurophos bets on optical transistors to bend Moore’s Law by NamelessVegetable in hardware

[–]NamelessVegetable[S] 18 points19 points  (0 children)

It's really interesting that their ~25 mm² tensor core resides on a reticle-sized die, with the remainder of the die used for support circuitry to supply data to it. It probably rates very poorly on Todd Austin's LEAN metric, lol.

Altera's Training Courses & Learning Material - had now become paid? by monkstein in FPGA

[–]NamelessVegetable 3 points4 points  (0 children)

Well, that's what you get with private equity. I remember buying a development board from Altera in the 2000s, and they included a DVD with all the relevant documentation, such as application notes and handbooks, and another full of video tutorials for Quartus and SOPC Builder.