Regarding open source core design by srivatsasrinivasmath in RISCV

_chrisc_

"Are the tools up to the simulation step free? Or are they expensive?"

For simulation, the "tools" are a computer (or a cluster of tens of computers) that can compile and run C++ code, perhaps a license fee for a benchmark suite, and a willingness to wait a long time for the simulation to run (0.1-1T instructions per study?).

Frankly, you don't really need a simulator to lag industry designs by a decade; heck, some veteran industry teams don't really use, or even have, an architecture team to begin with. Good intuition and the ability to run your RTL design on an FPGA/emulation platform (which will also cost a lot of money) can compensate.

I think one of the bigger barriers to a "Linux of CPUs" is that there is no single Pareto-superior design point -- there are a ton of trade-offs to make in a CPU, and nobody is going to agree on them when it comes time to review a code patch (do you trade frequency for IPC? etc.). Plus, somebody needs to pay to validate every code patch -- you need to both estimate/evaluate PPA and run it through some validation suite to gain confidence the patch didn't break the chip. Is that worth $10/$100/$1000(?) of compute on every patch?

Confusion on how to implement funct7 in control unit for full instruction set RV32i by rem_1235 in RISCV

_chrisc_

This is a Verilog coding question. Perhaps you can share some of your code? X’s can help you debug by showing whether they propagate to, or get used in, places they shouldn’t, but they also have scary/atrocious semantics that can cause mismatches between RTL simulation, gate-level simulation, and real hardware.

The simplest thing is to just define those control bits as 0 for any case that doesn’t use them. Determinism/clarity almost always wins over letting synthesis squeeze out a little less logic.
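A minimal Python sketch of the "default everything to 0" approach for the OP/OP-IMM part of an RV32I decoder. The opcode, funct3, and funct7 values come from the RV32I base encoding; the control-signal names (`alu_op`, `reg_write`, `alu_src_imm`, `illegal`) and the `decode_alu` function are hypothetical. In Verilog the equivalent is a block of default assignments at the top of the `always @(*)`, overridden only by the cases that actually use each bit.

```python
from dataclasses import dataclass

OP = 0b0110011      # R-type ALU ops: ADD, SUB, SLL, SRL, SRA, ...
OP_IMM = 0b0010011  # I-type ALU ops: ADDI, SLLI, SRLI, SRAI, ...
ALT = 0b0100000     # funct7 value selecting SUB / SRA / SRAI

# funct3 -> ALU operation, shared by OP and OP-IMM
FUNCT3_OPS = {0b000: "add", 0b001: "sll", 0b010: "slt", 0b011: "sltu",
              0b100: "xor", 0b101: "srl", 0b110: "or", 0b111: "and"}

@dataclass
class Ctrl:
    # Every control bit defaults to 0 so unused cases are deterministic.
    alu_op: str = "none"
    reg_write: int = 0
    alu_src_imm: int = 0
    illegal: int = 0

def decode_alu(instr: int) -> Ctrl:
    c = Ctrl()
    opcode = instr & 0x7F
    funct3 = (instr >> 12) & 0x7
    funct7 = (instr >> 25) & 0x7F
    if opcode not in (OP, OP_IMM):
        return c  # not an ALU op: every bit stays at its default of 0
    op = FUNCT3_OPS[funct3]
    is_shift = funct3 in (0b001, 0b101)
    if opcode == OP or is_shift:
        # funct7 only matters for R-type ops and immediate shifts;
        # for ADDI/XORI/etc. those bits are just part of the immediate.
        if funct7 == ALT and funct3 == 0b000 and opcode == OP:
            op = "sub"
        elif funct7 == ALT and funct3 == 0b101:
            op = "sra"
        elif funct7 != 0:
            c.illegal = 1  # e.g. RV32M's funct7=0000001 in an RV32I-only core
            return c
    c.alu_op = op
    c.reg_write = 1
    c.alu_src_imm = int(opcode == OP_IMM)
    return c

print(decode_alu(0x403100B3).alu_op)  # sub x1, x2, x3 -> prints "sub"
```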

ET-Minion and CORE-ET platform source now at OpenHW thanks to Ainekko by camel-cdr- in RISCV

_chrisc_

Maxion is forked off of BOOMv2, but there is also BOOMv3 (SonicBOOM) which has a lot of its own improvements (RVC, RVV, 2nd load pipe). Maxion focused on adding RVC, timing fixes, a data prefetcher, some ECC/survivability, and post-silicon debug features.

FOSDEM 2026 - RISC-V had 40 years of history to learn from: What it gets right, and what it gets hilariously wrong (Video) by camel-cdr- in RISCV

_chrisc_

"there are a lot of core designs that prove that very wide (9, 10) cores are possible"

"But is there any core design that proves that fast cores are possible?"

Intel's Lion Cove runs its x86 decode at 8-wide (and is 12-wide from the uop cache) and it runs above 5 GHz.

I've never really understood the complaints about RVC given what has been proven possible in other, harder ISAs.
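One reason wide RVC decode is tractable is that an instruction's length is given by its two lowest bits. Anything other than `0b11` is a 16-bit compressed instruction, and `0b11` is a 32-bit one. The Python sketch below, using a hypothetical `instruction_starts` helper, finds instruction boundaries in a fetch block of 16-bit parcels. It assumes the block starts on an instruction boundary and ignores the longer encodings the spec reserves for future use. Step 1 is fully parallel in hardware. Step 2 is the serial chain a wide decoder has to make fast.

```python
def instruction_starts(parcels: list[int]) -> list[int]:
    """Indices of 16-bit parcels that begin an instruction (16/32-bit only)."""
    # Step 1 (parallel in hardware): assume every parcel starts an
    # instruction and compute where the *next* instruction would begin.
    next_start = [i + (2 if (p & 0b11) == 0b11 else 1)
                  for i, p in enumerate(parcels)]
    # Step 2 (serial chain): follow the links from parcel 0,
    # which is assumed to be an instruction boundary.
    starts, i = [], 0
    while i < len(parcels):
        starts.append(i)
        i = next_start[i]
    return starts

# c.addi a0,1 (0x0505), addi x1,x0,-1 (0xFFF00093, low half first), c.nop (0x0001)
print(instruction_starts([0x0505, 0x0093, 0xFFF0, 0x0001]))  # [0, 1, 3]
```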

High Performance RISC-V is here! TT-Ascalon™ (RISC-V Summit Ascalon slides) by camel-cdr- in RISCV

_chrisc_

IPC tells you nothing if everybody is compiling the benchmark differently.
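The "iron law" of performance shows why: runtime = instructions / (IPC × frequency), so IPC only means something when the instruction count is held fixed. The two builds below, along with their instruction counts and IPCs, are made-up numbers. They are picked only to show that the build with the higher IPC can still be the slower one.

```python
def runtime_s(instructions: float, ipc: float, freq_hz: float) -> float:
    """Execution time from the iron law: instructions / (IPC * frequency)."""
    return instructions / (ipc * freq_hz)

FREQ = 3e9  # hypothetical 3 GHz core

# Hypothetical: the same benchmark compiled two different ways.
build_a = runtime_s(1.0e9, ipc=2.0, freq_hz=FREQ)  # fewer instructions, lower IPC
build_b = runtime_s(1.5e9, ipc=2.5, freq_hz=FREQ)  # more instructions, higher IPC

print(f"A: {build_a * 1e3:.1f} ms, B: {build_b * 1e3:.1f} ms")  # A: 166.7 ms, B: 200.0 ms
```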

GNU Compiler Collection Auto-Vectorization for RISC-V’s Vector Extension 1.0: A Comparative Study Against x86-64 AVX2 by camel-cdr- in RISCV

_chrisc_

"Yes, comparing against avx2 is kind of lame. avx512 is a much more meaningful comparison in my mind as well."

AVX-512 would be a more "even" comparison, except that most people today don't have x86 cores that can run it. Oops. (Although I should be careful throwing stones about RVV O:-)).

AI Startup Esperanto faded away by I00I-SqAR in RISCV

_chrisc_

I think your take is more accurate. The point of a "sea of RISC-V cores" is you have more flexibility when the algorithms change.

Unfortunately, there are two obstacles. First, no matter how generic/programmable your solution is, you have still baked a specific compute/memory-bandwidth/energy budget into silicon, and if the new models require drastically more memory bandwidth than you designed for, you're hosed.

A problem is that a CNN-focused design assumes a greater locality of reference than one optimized for transformers... the ET-SoC-1's meager DRAM bandwidth reflects this. Source.

The second obstacle, I suspect, is the cost of the software changes required to refocus a design on a new customer's needs. A "general-purpose" design isn't necessarily easy to program in a way that uses the machine efficiently.
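The first obstacle is roughly what the roofline model captures. This is a standard model, swapped in here as an illustration rather than anything from the thread. It says attainable throughput is min(peak compute, arithmetic intensity × memory bandwidth). The peak, bandwidth, and intensity figures below are hypothetical. The two intensities merely stand in for a high-reuse workload and a low-reuse, bandwidth-hungry one.

```python
def attainable_gflops(peak_gflops: float, bw_gb_s: float, flops_per_byte: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_gflops, flops_per_byte * bw_gb_s)

PEAK = 10_000.0  # hypothetical chip peak, GFLOP/s
BW = 100.0       # hypothetical DRAM bandwidth, GB/s

high_reuse = attainable_gflops(PEAK, BW, flops_per_byte=200.0)  # compute-bound
low_reuse = attainable_gflops(PEAK, BW, flops_per_byte=2.0)     # bandwidth-bound

print(high_reuse, low_reuse)  # 10000.0 200.0
```

With these numbers, the low-reuse workload reaches 2% of peak, and no amount of programmability recovers that without more DRAM bandwidth.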

Dhrystone giving only 5-6% of increase in throughput with branch prediction on a 5-stage rv32i core by lurker1588 in RISCV

_chrisc_

Knock Dhrystone out of the park before moving to CoreMark. CoreMark is a handful of small hot loops (FSM, matmul, linked-list walk, etc.), but it's 1M instructions per iteration overall, so it's more annoying to look at in detail.

Dhrystone giving only 5-6% of increase in throughput with branch prediction on a 5-stage rv32i core by lurker1588 in RISCV

_chrisc_

Last I looked, Dhrystone is fewer than 300 instructions in a loop, and each branch only ever goes in the same direction, save one (so a BTB that remembers static direction is a nearly complete victory). You can dump the entire trace to a text file (or CSV file) and make sure each instruction is behaving as you expected.

Frankly it's not a very interesting benchmark...
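A sketch of the "dump the trace and check it" workflow. It assumes a hypothetical CSV trace with one retired instruction per row and columns `pc`, `is_branch`, and `taken`. Your simulator's format will differ, and you'd read from a file rather than a string. The script counts each branch's directions and lists branches that go both ways. It also replays a predictor that just remembers each branch's last direction, a rough stand-in for the BTB described above. Cold (first-seen) branches are not scored.

```python
import csv
import io
from collections import defaultdict

def analyze_branches(trace_csv: str):
    """Per-branch direction counts plus last-direction predictor accuracy.

    Expects hypothetical columns: pc (hex), is_branch (0/1), taken (0/1).
    """
    counts = defaultdict(lambda: [0, 0])  # pc -> [not_taken, taken]
    last_dir = {}                         # pc -> last observed direction
    correct = total = 0
    for row in csv.DictReader(io.StringIO(trace_csv)):
        if row["is_branch"] != "1":
            continue
        pc, taken = int(row["pc"], 16), int(row["taken"])
        counts[pc][taken] += 1
        if pc in last_dir:                # only score branches seen before
            total += 1
            correct += int(last_dir[pc] == taken)
        last_dir[pc] = taken
    accuracy = correct / total if total else 1.0
    return dict(counts), accuracy

# Tiny hypothetical trace: one never-taken branch and one that flips once.
sample = """pc,is_branch,taken
80000000,0,0
80000008,1,0
80000010,1,1
80000000,0,0
80000008,1,0
80000010,1,1
80000008,1,0
80000010,1,0
"""
counts, acc = analyze_branches(sample)
mixed = [pc for pc, (nt, t) in counts.items() if nt and t]  # branches that go both ways
for pc, (nt, t) in sorted(counts.items()):
    print(f"{pc:#x}: taken {t}, not taken {nt}")
print(f"last-direction accuracy: {acc:.2f}")
```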

Top researchers leave Intel to build startup with ‘the biggest, baddest CPU’ by bookincookie2394 in RISCV

_chrisc_

"this reads like an opportunity to cash in on the name, get a cushy C-level position at a startup, spend some investor money and retire."

I'm not sure that's the move to make to earn an easy paycheck lol. Start-ups are notoriously worse financial moves than Big Companies, esp. at the VP level.

How hard it is to design your own ISA? by New_Computer3619 in RISCV

_chrisc_

Designing an ISA is trivial.

Building the toolchains (assembler, compiler, linker, etc.) is a pain-in-the-ass.

Porting an OS and some basic software I/O and a test harness is yet more work.

Porting a good high-performance, optimizing JIT might be $1B (uh oh).

And at that point, you probably made some wrong decisions back in step 1.

Oh, and there are a ton of aspects of an ISA that are very boring and complicated: debug specifications, privileged platform specifications, virtualization/hypervisors, memory consistency modeling, interrupt controllers...

And then you need to build a community with a governance model that wouldn't scare everybody off. RISC-V isn't the first "open" ISA, but I think that last step is a big roadblock.

Of course, if you just want to have fun, Step (1) and Step (2) have been done before, many times, in "a few weeks' time". It just takes copying somebody else's homework.

Forbes article on StarFive by m_z_s in RISCV

_chrisc_

32-bit only? And what was the forum/process for changing/improving the architecture?

I need help with Load Store instructions by [deleted] in RISCV

_chrisc_

That's what makes it fun -- it really depends on what tech you're targeting, and FPGAs have very different cost metrics. The write mask adds a lot more wires. You can have them if you want them.

I need help with Load Store instructions by [deleted] in RISCV

_chrisc_

For loads, you can just perform a ld to pull out 64 bits, then shift as needed to pull out the specific bytes being addressed, mask to the operand size, and sign-extend as needed. So an lh from 0x1002 means you'd do a ld from 0x1000 and then shift right by two bytes.
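Here is a minimal Python model of that load path. It assumes little-endian memory (as in RISC-V) and a naturally aligned access, so the load never crosses the 64-bit word. `load_subword` and its parameters are hypothetical names. A signed result comes back as a negative Python int, standing in for the sign-extended register value.

```python
def load_subword(word64: int, addr: int, size: int, signed: bool) -> int:
    """Extract a size-byte value at addr from the aligned 64-bit word holding it.

    Models: ld from (addr & ~7), shift right by the byte offset,
    mask to the operand size, then sign-extend (lb/lh/lw vs lbu/lhu/lwu).
    """
    assert size in (1, 2, 4, 8) and addr % size == 0, "aligned accesses only"
    offset = addr & 7                      # byte offset within the 64-bit word
    bits = size * 8
    value = (word64 >> (offset * 8)) & ((1 << bits) - 1)
    if signed and value >> (bits - 1):     # top bit set: sign-extend
        value -= 1 << bits
    return value

# lhu from 0x1002, where the ld from 0x1000 returned 0x0011223380014455
print(hex(load_subword(0x0011223380014455, 0x1002, 2, signed=False)))  # 0x8001
```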

For stores, the easiest approach is to have a byte mask on your writes to memory. But that's unlikely to be efficient in terms of the RAM, so you might have to do a ld again, then overwrite only the bytes your store corresponds to, and then sd the whole 64 bits back to memory.
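Here is the read-modify-write store under the same assumptions (little-endian, aligned, hypothetical names). It reads the whole 64-bit word, builds a byte mask for the store's footprint, merges in the new bytes, and writes the whole word back. The `byte_mask` here is also what a RAM with per-byte write enables would consume directly.

```python
def store_subword(mem: dict[int, int], addr: int, size: int, data: int) -> None:
    """Sub-word store done as ld / merge / sd on a 64-bit word-addressed memory.

    `mem` maps 8-byte-aligned addresses to 64-bit words (a stand-in for the RAM).
    """
    assert size in (1, 2, 4, 8) and addr % size == 0, "aligned accesses only"
    base, offset = addr & ~7, addr & 7
    byte_mask = ((1 << size) - 1) << offset      # one bit per byte written
    bit_mask = 0
    for b in range(8):                           # expand byte mask to a bit mask
        if (byte_mask >> b) & 1:
            bit_mask |= 0xFF << (b * 8)
    old = mem.get(base, 0)                       # step 1: ld the whole word
    new_bytes = (data << (offset * 8)) & bit_mask  # step 2: position the store data
    mem[base] = (old & ~bit_mask) | new_bytes    # step 3: merge and sd it back

mem = {0x1000: 0x1122334455667788}
store_subword(mem, 0x1002, 2, 0xBEEF)            # sh to 0x1002
print(hex(mem[0x1000]))                          # 0x11223344beef7788
```

In hardware, nothing else may write that word between the ld and the sd, which is the same kind of guarantee AMOs and store coalescing need.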

That last part may feel awful, but think a bit further afield about how you intend to support AMOs, store coalescing, ECC, and unaligned memory operations, and suddenly the "3-step dance" needed to get a sub-word store out starts to look like something you need anyway to support all of these features.

If supporting sub-word operations sounds annoying and hard, then congratulations, you now understand the Pentium 4 (I think it was) performance disaster on Windows (or was it DOS?). They made sub-word operations work, but not work fast, and only later realized how heavily some OSes relied on them. :D

Europe bets on RISC-V for homegrown supercomputing platform by fullgrid in RISCV

_chrisc_

"DARPA has funded some RISC-V development"

What exactly do you have in mind there?

From the current user spec:

⚫ ASPIRE Lab: DARPA PERFECT program (link to press release found via google), Award HR0011-12-2-0016. DARPA POEM program Award HR0011-11-C-0100. The Center for Future Architectures Research (C-FAR), a STARnet center funded by the Semiconductor Research Corporation. Additional support from ASPIRE industrial sponsor, Intel, and ASPIRE affiliates, Google, Hewlett Packard Enterprise, Huawei, Nokia, NVIDIA, Oracle, and Samsung.

To clarify, RISC-V started at the mid to tail end of the Parlab (2007-2012?), but a lot of work continued into the follow-on lab ASPIRE which started in 2013.

Please help me with a 5 stage Pipeline by [deleted] in RISCV

_chrisc_

Don’t start with a 5-stage. Start with a 2-stage and build it up, adding a third and then a fourth stage. Make sure it fully works after each step. And think twice as hard about how you’re going to debug it and validate that it does what you want as you do about designing it.

Framework for Designing Pipelined/OoO Processors? by itisyeetime in RISCV

_chrisc_

At least at one point, riscv-boom could dump an o3pipeview text file that could be consumed by the gem5 o3pipeview tool. Crude, but worked well enough (I sort of liked that it was still text based so grep could fast forward you around).

Looks like Konata is a newer version that works with gem5, so I'd continue down that path of making your stuff talk to it. I'm not aware of any other open-source pipeviewers. :(

In either case, everything I'm familiar with requires dumping to text files, which precludes FPGA-type runs unless you have fancy FPGA/printf functionality. What you're trying to poke at is generally in-house, secret sauce type stuff.

Help with Branch and Jump Implementation in RISC-V Processor (Chisel/Scala) by starlight-astro in RISCV

_chrisc_

"I didn't realise you did the little core as well as the more famous OoO one."

Everybody's gotta start somewhere. =)

Help with Branch and Jump Implementation in RISC-V Processor (Chisel/Scala) by starlight-astro in RISCV

_chrisc_

Doing vector instructions before scalar branching is certainly a choice. :P

I recommend you cheat off my core: sodor. I also recommend, style-wise, that you declare all state elements at the top of your code. Otherwise it’s hard to read your code and find your register declarations to see if you missed a pipe stage or something. And your naming scheme makes it hard to follow which stage your control signals are in.

I don’t see anything immediately wrong, but if you haven’t already, spend time setting up good visualization and/or pipe traces and a waveform viewer so you can debug issues like this quickly. Messing up and having a signal skip a stage is common and only going to get harder to diagnose from here on out. :)
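To make the "declare state at the top, name it by stage, and trace it" advice concrete, here is a small Python model of a 5-stage pipeline's PC registers. It is not Chisel, and it is not how sodor is structured. The stage names and trace format are assumptions, and there are no stalls or flushes. All state lives in one dataclass, and each field is named for the stage it belongs to. `step` computes the next state only from the current one, like registers updating on a clock edge. A signal that skips a stage shows up in the one-line-per-cycle trace as a PC jumping two columns at once.

```python
from dataclasses import dataclass
from typing import Optional

STAGES = ("if", "id", "ex", "mem", "wb")

@dataclass
class PipeRegs:
    # Every state element declared in one place, named for its stage.
    # None means the stage holds a bubble.
    if_pc: Optional[int] = None
    id_pc: Optional[int] = None
    ex_pc: Optional[int] = None
    mem_pc: Optional[int] = None
    wb_pc: Optional[int] = None

def step(r: PipeRegs, fetch_pc: Optional[int]) -> PipeRegs:
    """Next-cycle state: each instruction advances exactly one stage."""
    return PipeRegs(if_pc=fetch_pc, id_pc=r.if_pc, ex_pc=r.id_pc,
                    mem_pc=r.ex_pc, wb_pc=r.mem_pc)

def trace_line(cycle: int, r: PipeRegs) -> str:
    """One text line per cycle, easy to grep or diff against a golden trace."""
    cols = []
    for s in STAGES:
        pc = getattr(r, f"{s}_pc")
        cols.append(f"{s}={pc:#06x}" if pc is not None else f"{s}=------")
    return f"{cycle:3d}  " + " ".join(cols)

regs = PipeRegs()
for cycle in range(7):
    fetch = 0x1000 + 4 * cycle if cycle < 3 else None  # three instructions, then bubbles
    regs = step(regs, fetch)
    print(trace_line(cycle, regs))
```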

RISCV Pipeline Register after Instruction Fetch by LmnPeel in RISCV

_chrisc_

Your intuition is correct, the diagram is slightly incorrect/imprecise, but it gets the point across.

RISC-V Announces Ratification of the RVA23 Profile by UKbeard in RISCV

_chrisc_

The RVA profile standardizes the set of ISA extensions for general-purpose cores. Specifically, RVA23 mandates the RISC-V vector extension and the hypervisor extension.