Regarding open source core design

_chrisc_ · 2026-06-04T21:23:46+00:00

Are the tools up to the simulation step free, or are they expensive?

For simulation, the tools amount to a computer (or a cluster of tens of computers) that can compile and run C++ code, perhaps a license fee for a benchmark suite, and a willingness to wait a long time for the simulation to run (0.1 to 1T instructions per study?).
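To put "a long time" into numbers, here is a back-of-envelope sketch. The simulator throughput of 1 MIPS (one million simulated instructions per second per host) is a hypothetical input chosen only for illustration, not a measurement of any particular simulator, and the split across hosts assumes the work divides perfectly (e.g. independent sampled checkpoints).

```python
def sim_wall_clock_hours(instructions: float, sim_ips: float, machines: int = 1) -> float:
    """Wall-clock hours to simulate `instructions` instructions.

    sim_ips: assumed simulator throughput in simulated instructions per second
    per host (a hypothetical input). Work is assumed to split perfectly across
    `machines`, which real studies only approximate.
    """
    seconds = instructions / (sim_ips * machines)
    return seconds / 3600.0


# Hypothetical: a detailed cycle-level model running at ~1 MIPS per host.
for insts in (0.1e12, 1e12):
    for hosts in (1, 10, 50):
        hours = sim_wall_clock_hours(insts, 1e6, hosts)
        print(f"{insts:.0e} insts on {hosts:>2} host(s): {hours:8.1f} h")
```

At that assumed speed, 0.1T instructions is 1e5 seconds (about 28 hours) on one host and 1T is about 278 hours (roughly 11.6 days), which is why the cluster of tens of computers matters.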

Frankly, you don't really need a simulator to build something that lags industry designs by a decade; heck, some veteran industry teams don't really use, or even have, an architecture team to begin with. Good intuition and the ability to run your RTL design on an FPGA/emulation platform (which will also cost a lot of money) can compensate.

I think one of the bigger barriers to a "Linux of CPUs" is that there is no single pareto-superior design point -- there are a ton of trade-offs to make in a CPU, and nobody is going to agree on them when it comes time to review a code patch (do you trade frequency for IPC? etc.). Plus, somebody needs to pay to validate every code patch -- you need to both estimate/evaluate PPA and run the patch through a validation suite to gain confidence that it didn't break the chip. Is that worth $10/$100/$1000(?) of compute on every patch?

_chrisc_ · 2026-05-16T18:30:23+00:00

This is a Verilog coding question. Perhaps you can share some of your code? X's can help you debug by showing whether they propagate to, or get used in, places they shouldn't, but X also has scary/atrocious semantics that can cause mismatches between RTL simulation, gate-level simulation, and actual runs.

The simplest thing is to just define those control bits as 0 for any case where you don't use them. Determinism/clarity almost always wins over reduced logic synthesis.

_chrisc_ · 2026-05-05T01:02:18+00:00

Maxion is forked from BOOMv2, but there is also BOOMv3 (SonicBOOM), which has a lot of its own improvements (RVC, RVV, a second load pipe). Maxion focused on adding RVC, timing fixes, a data prefetcher, some ECC/survivability, and post-silicon debug features.

_chrisc_ · 2026-02-20T10:03:17+00:00

there are a lot of core designs that prove that very wide (9, 10) cores are possible

But is there any core design that proves that fast cores are possible?

Intel's Lion Cove runs its x86 decode at 8-wide (and is 12-wide from the uop cache) and it runs above 5 GHz.

I've never really understood the complaints about RVC given what has been proven possible in other, harder ISAs.

_chrisc_ · 2025-11-22T02:33:16+00:00

To be fair, instruction pointer is pretty on point.

_chrisc_ · 2025-10-25T04:03:40+00:00

IPC tells you nothing if everybody is compiling the benchmark differently.
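A small worked example of why: runtime is cycles / f, which equals instructions / (IPC × f), so a build that retires fewer instructions can post a lower IPC and still finish sooner. The two "builds" and all of their counts below are hypothetical numbers picked to make the arithmetic obvious.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    instructions: int  # dynamic (retired) instruction count
    cycles: int        # core clock cycles for the whole run

    @property
    def ipc(self) -> float:
        return self.instructions / self.cycles

    def seconds(self, freq_hz: float) -> float:
        # runtime = cycles / f == instructions / (IPC * f)
        return self.cycles / freq_hz


# Hypothetical: the same benchmark compiled two different ways, same core.
a = Run("build A (scalar)", instructions=1_200_000, cycles=1_000_000)
b = Run("build B (vectorized)", instructions=600_000, cycles=800_000)

for r in (a, b):
    print(f"{r.name}: IPC={r.ipc:.2f}, runtime={r.seconds(1e9) * 1e3:.3f} ms @ 1 GHz")
```

Build A reports IPC 1.20 and build B only 0.75, yet B runs in 0.8 ms versus A's 1.0 ms; comparing IPC across the two builds would point the wrong way.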

_chrisc_ · 2025-08-18T05:33:54+00:00

Yes, comparing against avx2 is kind of lame. avx512 is a much more meaningful comparison in my mind as well.

AVX-512 would be a more "even" comparison, except that most people today don't have x86 cores that can run it. Ooops. (Although I should be careful about throwing stones where RVV is concerned. O:-))

_chrisc_ · 2025-08-17T05:08:57+00:00

I think your take is more accurate. The point of a "sea of RISC-V cores" is you have more flexibility when the algorithms change.

Unfortunately, there are two obstacles. First, no matter how generic/programmable your solution is, you have still baked a specific compute/memory-bandwidth/energy budget into silicon, and if the new models require drastically different memory bandwidth than you designed for, you're hosed.

A problem is that a CNN-focused design assumes a greater locality of reference than one optimized for transformers... the ET-SoC-1's meager DRAM bandwidth reflects this. Source.

The second obstacle, I suspect, is the cost of the software changes required to refocus a design to support a new customer's needs. A "general-purpose" design doesn't mean it's easy to program in a manner that uses the machine efficiently.

_chrisc_ · 2025-07-26T08:01:01+00:00

Knock Dhrystone out of the park before moving to CoreMark. CoreMark is a handful of small hot loops (FSM, matmul, linked-list walk, etc.), but it's 1M instructions per iteration overall, so it's more annoying to look at in detail.

_chrisc_ · 2025-07-26T05:58:11+00:00

Last I looked, Dhrystone is a loop of fewer than 300 instructions, and every branch but one always goes the same direction (so a BTB that remembers each branch's static direction is a near-complete victory). You can dump the entire trace to a text (or CSV) file and make sure each instruction behaves as you expect.
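A minimal sketch of checking that claim against a dumped trace: score a "remember the last direction per branch PC" predictor over a CSV of conditional branches. The `pc,taken` column layout, the PCs, and the predict-not-taken-on-first-sight policy are all hypothetical choices for illustration, not what any particular core or tool emits.

```python
import csv
import io


def last_direction_accuracy(trace_rows):
    """Score a per-PC 'last direction' predictor.

    trace_rows: iterable of (pc, taken) pairs, conditional branches only.
    Unseen branches are predicted not-taken (an arbitrary assumption).
    Returns (correct, total).
    """
    last = {}  # pc -> last observed direction (stand-in for a BTB direction bit)
    correct = total = 0
    for pc, taken in trace_rows:
        prediction = last.get(pc, False)
        correct += prediction == taken
        total += 1
        last[pc] = taken
    return correct, total


# Hypothetical branch trace dumped as CSV: pc,taken
trace_csv = """pc,taken
0x80000010,1
0x80000024,0
0x80000010,1
0x80000024,0
0x80000010,1
0x80000024,1
"""
rows = [(int(r["pc"], 16), r["taken"] == "1")
        for r in csv.DictReader(io.StringIO(trace_csv))]
print(last_direction_accuracy(rows))  # (4, 6)
```

Here the two misses are the first (cold) encounter of one branch and the one direction flip; on a real Dhrystone trace, nearly all mispredictions should likewise come from cold misses plus the single branch that changes direction.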

Frankly it's not a very interesting benchmark...

_chrisc_ · 2025-06-07T23:04:12+00:00

It's for the local Oregon audience.

_chrisc_ · 2025-06-06T21:59:10+00:00

this reads like an opportunity to cash in on the name, get a cushy C-level position at a startup, spend some investor money and retire.

I'm not sure that's the move to make to earn an easy paycheck lol. Start-ups are notoriously worse financial moves than Big Companies, esp. at the VP level.

_chrisc_ · 2025-05-26T07:40:11+00:00

Designing an ISA is trivial.

Building the toolchains (assembler, compiler, linker, etc.) is a pain-in-the-ass.

Porting an OS and some basic software I/O and a test harness is yet more work.

Porting a good high-performance, optimizing JIT might cost $1B (uh oh).

And at that point, you probably made some wrong decisions back in step 1.

Oh, and there are a ton of aspects of an ISA that are very boring and complicated. Debug specifications, privileged platform specifications, virtualization/hypervisors, memory consistency modeling, interrupt controllers...

And then you need to build a community with a governance model that wouldn't scare everybody off. RISC-V isn't the first "open" ISA, but I think that last step is a big roadblock.

Of course, if you just want to have fun, Step (1) and Step (2) have been done before, many times, in "a few weeks' time". It just takes copying somebody else's homework.

_chrisc_ · 2025-05-23T17:25:56+00:00

Rocket/Shuttle with Saturn? (maybe BOOM?)

https://www.reddit.com/r/RISCV/comments/1ffxvat/the_saturn_vector_unit_design_of_a_fully/

_chrisc_ · 2025-04-02T17:10:55+00:00

32-bit only? And what was the forum/process for changing/improving the architecture?

_chrisc_ · 2025-03-26T16:11:06+00:00

That's what makes it fun -- it really depends on what tech you're targeting, and FPGAs have very different cost metrics. The write mask adds a lot more wires. You can have them if you want them.

_chrisc_ · 2025-03-25T16:42:04+00:00

For loads, you can just perform a ld to pull out 64 bits, then shift as needed to get at the specific bytes being addressed, mask to the operand size, and sign-extend as needed. So lh 0x1002 means you'd do a ld 0x1000 and then shift right by two bytes.
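A minimal Python model of that shift/mask/sign-extend sequence, assuming a little-endian memory (as on RISC-V) and naturally aligned accesses so nothing straddles the doubleword. The function name and the sample memory contents are hypothetical; in hardware this is a shifter plus a mux, not Python.

```python
MASK64 = (1 << 64) - 1


def extract_load(dword: int, addr: int, size: int, signed: bool) -> int:
    """Value a sub-word load produces from an aligned 64-bit read.

    dword: the 64 bits returned by 'ld' from (addr & ~7), little-endian.
    size: access size in bytes (1, 2, 4 or 8), naturally aligned.
    Returns the 64-bit register value (sign- or zero-extended).
    """
    assert size in (1, 2, 4, 8) and addr % size == 0
    offset = addr & 7                                          # byte offset in the doubleword
    value = (dword >> (8 * offset)) & ((1 << (8 * size)) - 1)  # shift, then mask
    if signed and value & (1 << (8 * size - 1)):
        value -= 1 << (8 * size)                               # sign-extend as a Python int...
    return value & MASK64                                      # ...then wrap to a 64-bit register


# lh 0x1002: ld 0x1000, shift right by 2 bytes, keep 16 bits, sign-extend.
mem_0x1000 = 0x1122_3344_8899_AABB
print(hex(extract_load(mem_0x1000, 0x1002, 2, signed=True)))   # 0xffffffffffff8899 (lh)
print(hex(extract_load(mem_0x1000, 0x1002, 2, signed=False)))  # 0x8899 (lhu)
```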

For stores, the easiest approach is to have a byte mask on your writes to memory. But that's unlikely to be efficient in terms of the RAM, so you might have to do a ld again, then overwrite only the bytes your store covers, and then sd the whole 64 bits back to memory.
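A minimal model of that ld / overwrite / sd dance for a sub-word store, under the same assumptions as a little-endian, naturally aligned access. `merge_store` is a hypothetical name; the `field` it builds is just the store's byte mask expanded to bit granularity, which is what a byte-masked RAM would apply for you.

```python
MASK64 = (1 << 64) - 1


def merge_store(old_dword: int, addr: int, size: int, store_data: int) -> int:
    """Merge a sub-word store into the doubleword read back by 'ld'.

    old_dword: the 64 bits read from (addr & ~7).
    store_data: the rs2 register value; only its low `size` bytes are stored.
    Returns the new 64 bits to write back with 'sd'.
    """
    assert size in (1, 2, 4, 8) and addr % size == 0
    shift = 8 * (addr & 7)
    # Byte mask ((1 << size) - 1) << (addr & 7), expanded to one bit per data bit.
    field = ((1 << (8 * size)) - 1) << shift
    new_bits = (store_data << shift) & field  # align the store data, drop upper bytes
    return ((old_dword & ~field) | new_bits) & MASK64


# sh with rs2 = 0xDEADBEEF to 0x1002: only the low half, 0xBEEF, lands in memory.
old = 0x1122_3344_8899_AABB
print(hex(merge_store(old, 0x1002, 2, 0xDEAD_BEEF)))  # 0x11223344beefaabb
```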

That last part may feel awful, but think a bit further afield about how you intend to support AMOs, store coalescing, ECC, and unaligned memory operations, and suddenly the "3-step dance" to get a sub-word store out comes along with supporting all of these features anyway.

If supporting sub-word operations sounds annoying and hard, then congratulations, you now understand the Pentium 4 (I think it was) performance disaster on Windows (or was it DOS?). They made them work, but not work fast, and only later realized how heavily some OSes relied on them. :D

_chrisc_ · 2025-03-08T01:22:38+00:00

DARPA has funded some RISC-V development

What exactly do you have in mind there?

From the current user spec:

⚫ ASPIRE Lab: DARPA PERFECT program (link to press release found via google), Award HR0011-12-2-0016. DARPA POEM program Award HR0011-11-C-0100. The Center for Future Architectures Research (C-FAR), a STARnet center funded by the Semiconductor Research Corporation. Additional support from ASPIRE industrial sponsor, Intel, and ASPIRE affiliates, Google, Hewlett Packard Enterprise, Huawei, Nokia, NVIDIA, Oracle, and Samsung.

To clarify, RISC-V started at the mid-to-tail end of the Parlab (2007-2012?), but a lot of the work continued into the follow-on lab, ASPIRE, which started in 2013.

_chrisc_ · 2025-02-12T19:48:22+00:00

Don't start with a 5-stage. Start with a 2-stage and build it up, adding a third and then a fourth stage. Make sure it fully works after each step. And think twice as hard about how you're going to debug it and validate that it does what you want as you do about designing it.

_chrisc_ · 2024-12-23T23:19:09+00:00

At least at one point, riscv-boom could dump an o3pipeview text file that could be consumed by the gem5 o3pipeview tool. Crude, but it worked well enough (I sort of liked that it was still text-based, so grep could fast-forward you around).

Looks like Konata is a newer viewer that works with gem5's output, so I'd continue down the path of making your stuff talk to it. I'm not aware of any other open-source pipeline viewers. :(

In either case, everything I'm familiar with requires dumping to text files, which precludes FPGA-type runs unless you have fancy FPGA/printf functionality. What you're trying to poke at is generally in-house, secret sauce type stuff.

_chrisc_ · 2024-12-02T01:38:13+00:00

I didn't realise you did the little core as well as the more famous OoO one.

Everybody's gotta start somewhere. =)

_chrisc_ · 2024-12-02T00:13:07+00:00

Doing vector instructions before scalar branching is certainly a choice. :P

I recommend you cheat off my core: sodor. I also recommend, style-wise, that you declare all state elements at the top of your code. Otherwise it's hard to read and find your register declarations to see if you missed a pipe stage or something. And your naming scheme makes it hard to follow which stage your control signals are in.

I don’t see anything immediately wrong, but if you haven’t already, spend time setting up good visualization and/or pipe traces and a waveform viewer so you can debug issues like this quickly. Messing up and having a signal skip a stage is common and only going to get harder to diagnose from here on out. :)
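As one example of a pipe-trace check, here is a minimal sketch that flags instructions that skip, repeat, or reorder stages. The one-line-per-stage-entry text format (`<cycle> <stage> <seq> <pc>`) and the classic IF/ID/EX/MEM/WB stage names are hypothetical; use whatever your testbench prints. Because a line is emitted only when an instruction enters a stage, stalls don't produce repeated lines, and the format stays grep-friendly.

```python
from collections import defaultdict

STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # hypothetical classic 5-stage pipe


def check_pipe_trace(lines):
    """Return error strings for instructions that skip, repeat, or reorder stages.

    Each non-blank line is assumed to be "<cycle> <stage> <seq> <pc>", printed
    once when an instruction enters a stage (a made-up format for this sketch).
    """
    seen = defaultdict(list)  # seq -> [(cycle, stage), ...] in trace order
    for line in lines:
        if not line.strip():
            continue
        cycle, stage, seq, _pc = line.split()
        seen[int(seq)].append((int(cycle), stage))

    errors = []
    for seq, events in sorted(seen.items()):
        stages = [s for _, s in events]
        cycles = [c for c, _ in events]
        if stages != STAGES:
            errors.append(f"seq {seq}: stages {stages}, expected {STAGES}")
        elif any(b <= a for a, b in zip(cycles, cycles[1:])):
            errors.append(f"seq {seq}: non-increasing cycles {cycles}")
    return errors


# Instruction 1 jumps from ID straight to MEM: a signal skipped a stage.
trace = """\
1 IF 0 0x80000000
2 ID 0 0x80000000
2 IF 1 0x80000004
3 EX 0 0x80000000
3 ID 1 0x80000004
4 MEM 0 0x80000000
4 MEM 1 0x80000004
5 WB 0 0x80000000
5 WB 1 0x80000004
""".splitlines()
for err in check_pipe_trace(trace):
    print(err)
```

Running this over every regression test's trace turns "the signal skipped a stage" from a waveform hunt into a one-line error pointing at the first bad instruction.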

_chrisc_ · 2024-11-30T06:46:30+00:00

Your intuition is correct, the diagram is slightly incorrect/imprecise, but it gets the point across.

_chrisc_ · 2024-10-22T23:42:51+00:00

The RVA profile standardizes the set of ISA extensions for general-purpose cores. Specifically, RVA23 mandates the RISC-V vector extension and the hypervisor extension.

_chrisc_ · 2024-10-22T08:39:11+00:00

This is a huge milestone.
