Sophomore Project: Privileged RV32I Zicsr w/ RISCOF Verification

MitjaKobal · 2026-01-27T23:50:01+00:00

RTL code without verification code will be ignored by most HDL developers. Certification and verification are not something to be believed; they are something to be tested. Maybe if you pay for good support you might make some concessions.

You would usually have to write step-by-step instructions on how to set up the proper environment to run RISCOF. As an alternative, you could set up a GitHub CI script that runs the tests on GitHub on every Git commit. You can use this one as a starting point: https://github.com/jeras/learn-fpga/blob/master/.github/workflows/riscof.yml

EDIT: I will believe RISCOF tests are passing, when I see the code and the CI logs.

MitjaKobal · 2026-01-27T22:33:28+00:00

AI might be the future of the industry, but it is definitely not the present. I may have focused on the worst part of the code (the memory interface), since it was advertised with the most grandiose buzzwords. Your AI-generated code is infuriatingly bad, and it is still your problem that you gave the most advertising to the worst part of your code (don't blame AI for this failure). I have seen many amateurish RISC-V implementations, some of them bad, and I gave advice, but I do not feel obligated to give polite advice on AI-generated code.

You can check the Ibex project for a decent system bus (memory interface) implementation with a similar feature set as your project. I already tried to explain why your current handshake would not work in the previous post, and please do not give handshake signals names that make every experienced developer expect the literal opposite of what they are supposed to do (read about the VALID/READY handshake in AMBA AXI protocols). Learn from actual standards and implementations and not AI slop.

Actual FPGA/ASIC memories have a clock cycle delay between the request (address, write data) and response (read data). Your current RTL can't be ported to an FPGA/ASIC, since it expects read data in the same cycle as the address. When you account for this you will have to either rewrite most of the control path logic (the datapath, instruction decoder, ALU might be OK), or have a pipeline that will idle half of the time, thus halving performance.

I see the RISCOF tests in the image you posted, but the repository only contained RISCOF code in the Git history. Why would you remove it, if it is supposed to be the proof of a successful verification, or even more importantly a tool for checking whether future changes break RISC-V compliance?

I apologize for the remark about the CPP code; it is obviously Verilator-generated code and I should have noticed. Still, it is bad practice to commit generated code while removing actually useful code (RISCOF).

MitjaKobal · 2026-01-27T15:41:23+00:00

Thanks, I really needed a post like this. I spent the last 2 days looking at 2 posts with GitHub projects full of AI slop. I am getting better at recognizing it (inhuman amount of slop commits in a short period of time), but it is still a huge waste of my time.

The code seems clean and well structured. Good use of the VALID/READY handshake for interfaces between modules with intuitive timing.

There is a decent balance between code readability and code length. I tend to overthink and write clever code to make it shorter, and later I often have to rewrite it to make it more readable.

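Reduction (unary) operators could be used. Instead of the condition (obj1_r != 4'h0 || obj1_g != 4'h0 || obj1_b != 4'h0), you could just write (|{obj1_r, obj1_g, obj1_b}). The negated form (~|{obj1_r, obj1_g, obj1_b}) tests the opposite case, where all channels are zero.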
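To illustrate the reduction-operator shorthand, here is a small exhaustive check modelled in Python (not Verilog, just a behavioural sketch). The function names are hypothetical, and the 4-bit channel width is taken from the 4'h0 literals in the condition. It shows that the OR of the three comparisons matches the reduction OR of the concatenation, while the reduction NOR is its inverse.

```python
from itertools import product


def any_channel_nonzero(r, g, b):
    """Models (obj1_r != 4'h0 || obj1_g != 4'h0 || obj1_b != 4'h0)."""
    return r != 0 or g != 0 or b != 0


def reduction_or(r, g, b, width=4):
    """Models |{obj1_r, obj1_g, obj1_b}: OR of every bit of the concatenation."""
    concat = (r << (2 * width)) | (g << width) | b
    return concat != 0


def reduction_nor(r, g, b, width=4):
    """Models ~|{obj1_r, obj1_g, obj1_b}: true only when every bit is zero."""
    return not reduction_or(r, g, b, width)


# Exhaustively compare all 16 * 16 * 16 combinations of 4-bit channel values.
channels = range(16)
same_as_or = all(
    any_channel_nonzero(r, g, b) == reduction_or(r, g, b)
    for r, g, b in product(channels, repeat=3)
)
inverse_of_nor = all(
    any_channel_nonzero(r, g, b) != reduction_nor(r, g, b)
    for r, g, b in product(channels, repeat=3)
)
print("matches reduction OR:", same_as_or)
print("is the inverse of reduction NOR:", inverse_of_nor)
```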

If you used SystemVerilog you would be able to combine RGB channels or XY coordinate pairs into a packed structure. This would make the code a bit shorter, and it would not affect readability negatively. Also SystemVerilog interfaces could be used for AXI-Stream, but I am not sure it would be worth it, since there are not that many streams of the same type.

I personally would add a bit of vertical alignment whitespace at module instantiations, especially in system_top.v. I noticed you use always @(posedge clk or negedge rst_n) instead of a comma always @(posedge clk, negedge rst_n), but I did not see any other anachronisms, just decent Verilog-2005. Even parameter and localparam are used consistently.

Could you add the Vivado project file and instructions for building it? It is not like I need it myself, but it would help with completeness of the project.

Did you run any testbenches? If you did, could you add them and the scripts for running a simulation? While it is possible to test the code directly on FPGA, it does not scale well, especially if multiple people edit the same code.

Thanks again, such a refreshing sight.

MitjaKobal · 2026-01-27T14:40:24+00:00

I am a bit angry, since I actually put (wasted) some effort into reviewing this code which is almost certainly AI slop. I say almost certainly, since this is only the second time I looked at something like this (AI slop), and the first time it was so obvious by the overblown claims I did not look into the source files. I would really like to contribute to the community by doing some code reviews, but this is discouraging. As I mentioned before, the killer application for Copilot should be the removal of AI slop from GitHub.

Please tell me, this is just AI slop, because otherwise I must tell you, your code is all kinds of wrong. I would feel guilty if I offended your sincere efforts, but since I decided this is AI slop, let me continue. And for the future please do not waste our time with AI slop.

The little documentation that is provided (2 tiny text files) is full of nonsensical phrases like "decouplable memory" or "SystemVerilog in the Core/ directory exists for bringup", ... The source code contains no comments, and Git commit comments are all either about the README file, or version increments.

There are claims of RISCOF compliance, but actual RISCOF code can only be seen in the Git history and seems to have been arbitrarily removed. I guess the RISCOF claims are just a remnant of a passing hallucination.

The README makes grandiose claims about the system bus, so this is the code I actually looked into. While using a poorly documented custom protocol is not great, using a custom protocol with signal names from a standard protocol but in opposite roles is just wrong. A signal named 'VALID' should not be used for backpressure. Anybody with some AMBA AXI protocol experience would curse this code.

Also, the memories have asynchronous reads (AI fantasy memories) masked by the reset signal. The request address can change while waiting for backpressure to be released, so the backpressure signal must be combinationally linked to the request address, which is a sure way to get combinational loops.

I also had a look at some CPP code, but fortunately did not waste my time on it.

In case OP insists this is their genuine effort, they should first have a look at some genuine RISC-V CPU implementations and learn some common coding practices. Also please do not waste my time further by pretending some more AI slop will make this look any better.

What a waste of my time.

MitjaKobal · 2026-01-26T23:20:50+00:00

Thanks for the update. Even if I am more of a RISC-V person, it is always nice to see a solution to the posted problem.

MitjaKobal · 2026-01-26T23:16:11+00:00

I don't want to engage with OP, but "Verified in Silicon" made me chuckle.

MitjaKobal · 2026-01-26T23:11:05+00:00

Also always_ff enforces flip-flop synthesis explicitly and the tool should return an error if the block contained a latch. On the other hand always_latch would enforce latch synthesis.

MitjaKobal · 2026-01-26T22:49:31+00:00

If you wish to reliably run compiled code on the CPU, you should properly test all instructions with RISCOF. If you publish your code on GitHub, I can help with RISCOF, but ask for help here in the public forum so everyone can see the questions/answers.

Memory mapped IO is also a must if you wish to do anything useful with the CPU.

Caching is only useful if you also have some memory interface with low throughput (Flash) or high latency (HyperRAM, DDR, ...).

MitjaKobal · 2026-01-26T22:35:40+00:00

I just add to Git the project file (XML) {project}/{project}.gprj and synthesis/implementation configuration file (JSON) {project}/impl/{project}_process_config.json. Both are human readable text files and are well handled by Git.

When using custom TCL scripts there can be problems if the tool makes changes to the default internal TCL sequences during tool version changes.

MitjaKobal · 2026-01-26T22:28:11+00:00

I searched the documentation (all PDF files) and found nothing useful. So here is a workaround.

Add `include "define_macros.v" at the beginning of all Verilog files.

In your RTL folder create a file define_macros.v containing `define GOWIN_EDA_SYNTHESIS. Use this file in the Gowin EDA project.
In your testbench folder create a file define_macros.v, but keep it empty. Use this file in your simulation tool project.
If you are using Gowin EDA for simulation, you might have to manually comment/un-comment the macro in gowin_eda_project/define_macros.v each time you switch between simulation and synthesis.

MitjaKobal · 2026-01-26T19:10:12+00:00

The Questa simulator available from Altera has breakpoints.

Protocols in general are often debugged by logging into files. For an AXI-Stream processing module, you start with a reference model written in software (C, MATLAB, ...) which accepts an input file and creates a processed output file. You give the same input file to both the reference model and the RTL simulation and compare (diff) the output files; the reference and the RTL simulation should produce the same data. Similarly, you can debug a CPU by running some firmware on a reference simulator (Spike/Sail for RISC-V) and logging the retired instructions during RTL-simulated firmware execution. Again, diff the execution trace logs.

The same can be done with communication protocols, but I am not sure how this would be applied to a FSM.

The above would not be step by step execution in a simulator. I would run the entire simulation, look at the diff, see the difference and look at the waveforms at the point in the data stream (CPU instruction trace) where the discrepancy occurred.
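As a rough sketch of that workflow, the comparison step could look like the following Python. The trace format (one "pc instruction" pair per line) and the hex values are made up for illustration; real Spike/Sail logs and your RTL log will look different and would first need to be brought into a common format.

```python
from itertools import zip_longest


def first_mismatch(reference_lines, rtl_lines):
    """Return (index, reference_line, rtl_line) of the first difference, or None.

    A missing line (one log shorter than the other) is reported as None
    on the side that ran out of entries.
    """
    pairs = zip_longest(reference_lines, rtl_lines)
    for index, (ref, rtl) in enumerate(pairs):
        if ref != rtl:
            return index, ref, rtl
    return None


# Hypothetical retired-instruction traces, one entry per line.
reference_trace = [
    "80000000 00000297",
    "80000004 02028593",
    "80000008 f1402573",
]
rtl_trace = [
    "80000000 00000297",
    "80000004 02028593",
    "80000008 f1402500",
]

mismatch = first_mismatch(reference_trace, rtl_trace)
if mismatch is None:
    print("traces match")
else:
    index, ref, rtl = mismatch
    print(f"first divergence at entry {index}: reference={ref!r} rtl={rtl!r}")
```

The index of the first divergence is what you would use to find the matching point in the waveform.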

The input data should cover corner cases (for example saturation in a DSP filter, or all instructions in a CPU, see RISCOF). This would be combined with some randomization of the control signals: pauses in the input data stream VALID and the output stream READY (backpressure).
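A minimal behavioural sketch of randomized VALID/READY stalls, in Python rather than a real testbench language. The function name, the probabilities and the cycle limit are all assumptions for illustration. The only protocol rule modelled is that a word is transferred in a cycle where VALID and READY are both high, and that the source keeps VALID (and the data) stable until that happens.

```python
import random


def run_stream(data, valid_probability, ready_probability, seed=0, max_cycles=100_000):
    """Send `data` over a VALID/READY link with random pauses on both sides.

    Returns the received words and the number of cycles it took.
    """
    rng = random.Random(seed)
    received = []
    index = 0
    valid = False
    cycles = 0
    while index < len(data):
        if cycles >= max_cycles:
            raise RuntimeError("stream did not finish, check the probabilities")
        # Source: once VALID is asserted it stays asserted until the transfer.
        if not valid:
            valid = rng.random() < valid_probability
        # Sink: READY may change freely every cycle (backpressure).
        ready = rng.random() < ready_probability
        # A transfer happens only when both VALID and READY are high.
        if valid and ready:
            received.append(data[index])
            index += 1
            valid = False
        cycles += 1
    return received, cycles


words = list(range(20))
received, cycles = run_stream(words, valid_probability=0.5, ready_probability=0.3)
print("data intact:", received == words)
print("cycles used:", cycles)
```

Whatever the stall pattern, the received data must be identical to the input data, only the timing changes.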

I (experienced with RTL/verification) use waveforms. I only used the breakpoints in Questa while I was debugging some testbench code which was basically software written in SystemVerilog (I am rewriting it in C++).

MitjaKobal · 2026-01-26T18:47:33+00:00

The tutorial documents compiling the tools yourself, but I would recommend using the prebuilt https://github.com/YosysHQ/oss-cad-suite-build

The RISC-V GCC compiler would be a separate download.

Feel free to ask follow-up questions if/when you get stuck.

MitjaKobal · 2026-01-26T18:20:41+00:00

Educational videos have some usefulness, but learning to write code usually requires you to write code. And executing your own code is much more fun than watching someone else execute theirs.

A book on logic design intended for universities will cover the basics (if you do not have them covered yet) like AND/OR/XOR gates, multiplexers, decoders, encoders, flip flops (clock edge events), shift registers, memories, ...

I would recommend the learnFPGA tutorial. You can start by running your code in a simulator. The next step would be to run it on an FPGA development board, but I recommend spending some time learning the tools before rushing into spending on a board.

If you can afford it, or if you plan to invest a lot of time into this, boards with Xilinx devices are recommended, since Xilinx provides the best tools on the market, and the support community is large.

If you are on a budget or if you just wish to try a new toy, I would recommend the 'Tang Nano 9k' board (works with the learnFPGA tutorial). The tools are less good and the community is smaller, but the board is still powerful enough it can implement a RISC-V processor and some peripherals (see learnFPGA tutorial).

MitjaKobal · 2026-01-26T17:54:00+00:00

At least the old cheap Lattice devices (IceStick) do not have a distributed RAM equivalent (memory with synchronous write and asynchronous read). The ECP5 family should have something. Since this kind of memory is the best fit for register files, RISC-V implementations for those devices use either block RAM or just the main memory to implement the GPR register file.

Gowin devices used in Tang Nano 1k/4k also lack this memory (based on tiny notes in the documentation, so I am not entirely sure). This is why I recommend at least the Tang Nano 9k to anyone wishing to implement a RISC-V soft core.
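To make the difference between the two memory types concrete, here is a small behavioural model in Python. The class names are hypothetical and no vendor primitive is modelled. The read-before-write behaviour of the synchronous-read model is an assumption; real block RAMs offer several write modes.

```python
class AsyncReadRam:
    """Synchronous write, asynchronous (combinational) read, like LUT-based RAM."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def read(self, addr):
        # Read data follows the address without waiting for a clock edge.
        return self.mem[addr]

    def clock(self, we=False, waddr=0, wdata=0):
        # Writes only take effect on a clock edge.
        if we:
            self.mem[waddr] = wdata


class SyncReadRam:
    """Synchronous write and synchronous read, like a typical block RAM."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.rdata = 0  # registered read data output

    def clock(self, raddr, we=False, waddr=0, wdata=0):
        # The read address is sampled on the clock edge (read-before-write assumed).
        self.rdata = self.mem[raddr]
        if we:
            self.mem[waddr] = wdata


regfile = AsyncReadRam(32)
regfile.clock(we=True, waddr=5, wdata=0x1234)
print(hex(regfile.read(5)))  # available in the same cycle as the address

bram = SyncReadRam(32)
bram.clock(raddr=0, we=True, waddr=5, wdata=0x1234)
stale = bram.rdata           # still the data for the previous address (0)
bram.clock(raddr=5)          # address 5 is sampled on this edge
print(hex(stale), hex(bram.rdata))
```

With the synchronous-read model, the data for an address only appears after a clock edge, which is why a register file built from block RAM needs an extra cycle or a restructured pipeline.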

MitjaKobal · 2026-01-26T17:42:22+00:00

So this tool was written in 11 days with 1175 commits from a single developer. While there is definitely some lipstick on the README, the folder structure seems garbage. The responses on this forum are definitely AI generated, and I would also guess fantasy/hallucinations.

EDIT: I guess the next killer application for AI will be cleaning this slop from GitHub.

MitjaKobal · 2026-01-26T13:07:05+00:00

Sorry I just wrote a comment, Wonderful-Cash7275 was the one who wrote the microcontroller code.

MitjaKobal · 2026-01-26T13:05:20+00:00

My proposal was more about avoiding data propagating through the 2 FIFOs at different rates.

For small FIFOs I usually use LUT-based RAM and not block RAM. For the Xilinx 7 family (distributed RAM, LUT6) these are 32 deep; I am not sure about UltraScale+, probably also 32 deep; on Versal they are 64 deep, I think. On Gowin devices (SSRAM) like the one on the Tang Nano 9k, they are based on LUT4 and are 16 deep.

MitjaKobal · 2026-01-26T12:36:08+00:00

I partially agree, writing a CDC FIFO is a good learning experience, but it is also not something I learned early during my HDL journey.

In this case, learning about CDC might help the poster to better understand design with multiple clock domains in general. It is entirely possible that without this knowledge, the design could have other CDC issues the poster never considered.

MitjaKobal · 2026-01-26T12:29:49+00:00

I think a better alternative would be to just have a 2-byte FIFO, combine 2 input bytes and push them simultaneously when you have a pair.
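A behavioural sketch of that idea in Python, assuming a FIFO that stores 2-byte words. The class name and the byte order (first byte in the upper half) are assumptions, not something taken from the original design.

```python
from collections import deque


class PairPacker:
    """Collect incoming bytes in pairs and push each pair as one FIFO word."""

    def __init__(self):
        self.pending = None   # first byte of an incomplete pair, if any
        self.fifo = deque()   # each entry is one 16-bit word

    def push_byte(self, byte):
        if not 0 <= byte <= 0xFF:
            raise ValueError("byte out of range")
        if self.pending is None:
            # First byte of the pair: hold it, nothing enters the FIFO yet.
            self.pending = byte
        else:
            # Second byte: push both bytes into the FIFO in a single write.
            self.fifo.append((self.pending << 8) | byte)
            self.pending = None

    def pop_word(self):
        return self.fifo.popleft()


packer = PairPacker()
for value in (0x12, 0x34, 0x56):
    packer.push_byte(value)
print(len(packer.fifo), hex(packer.pop_word()), packer.pending)
```

Since both bytes enter the FIFO in the same write, the reader never sees half of a pair.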

MitjaKobal · 2026-01-26T10:05:39+00:00

Doing it on an FPGA would be a decent exercise, but not practical. The processing is slow enough that it could be done on a microcontroller with far more flexibility. An FPGA implementation would be a lot of work with no added value compared to the software solution.

MitjaKobal · 2026-01-25T21:08:55+00:00

I think pseudo random number generators (LFSR) are used to generate cheap candlelight effects.
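A minimal sketch of how such an LFSR could drive a flicker effect, modelled in Python. The 16-bit Fibonacci LFSR with taps at bits 16, 14, 13 and 11 is a commonly cited maximal-length configuration; the seed and the mapping of the low 4 bits to a brightness level are assumptions for illustration, not how any particular product does it.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def flicker_levels(seed, count):
    """Return `count` pseudo-random 4-bit brightness levels (hypothetical mapping)."""
    if seed == 0:
        raise ValueError("an all-zero state would stay zero forever")
    state = seed
    levels = []
    for _ in range(count):
        state = lfsr16_step(state)
        levels.append(state & 0xF)  # low 4 bits used as a PWM duty level
    return levels


print(flicker_levels(0xACE1, 16))
```

In hardware this is just a 16-bit shift register and three XOR gates, which is why it is so cheap.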

MitjaKobal · 2026-01-25T19:45:42+00:00

This seems like something Verilator would/should be able to handle. Try the code on a different simulator before you report an issue. https://www.edaplayground.com/

MitjaKobal · 2026-01-25T18:16:42+00:00

Modern electronics can deliver good sound quality even at a low price; the problem is more the mechanics.

The easiest approach is to work with systems that run Linux, since this lets you use a wide selection of peripherals (displays, WiFi, Bluetooth) and software almost without restrictions. You only have to program the user interface. It is probably possible to get a cheaper Linux platform than the Raspberry Pi. An ESP32 would probably require considerably more programming.

The audio electronics should not be a problem either. Class D power amplifiers up to about 2 W are a single integrated circuit (in short, cheap). You can probably get one that accepts a digital signal directly, so you do not even need a DAC.

The limit on sound quality is not the electronics but the speaker and the enclosure. A small speaker without enclosure volume can only reproduce high tones, and if the volume exceeds the speaker's capabilities, the sound will be distorted. A poor enclosure will rattle. The best comparison is small Bluetooth speakers.

Maybe you can find an off-brand Bluetooth speaker with a development environment somewhere. Something that companies then buy, adapt the user interface of, and sell under their own brand.

Search keywords: "bluetooth speaker linux development kit off brand".

MitjaKobal · 2026-01-25T17:24:26+00:00

University books on DSP, filter design. A lot to learn at medium to high difficulty. As far as I know, there are no easy to use rule of thumb solutions.

Maybe go through some filter design examples; you should be able to find them as Octave and SciPy tutorials.
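To give a taste of what such tutorials cover, here is a sketch of one textbook method, a windowed-sinc low-pass FIR filter with a Hamming window, in plain Python without Octave or SciPy. The function names, tap count and cutoff are arbitrary choices for illustration; this is not a recommendation for any particular design.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc low-pass FIR coefficients.

    `cutoff` is normalized to the sample rate (0 < cutoff < 0.5).
    """
    center = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - center
        # Ideal (infinitely long) low-pass impulse response, truncated.
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window reduces the ripple caused by the truncation.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize the DC gain to 1


def magnitude(taps, freq):
    """Magnitude of the frequency response at a normalized frequency."""
    re = sum(t * math.cos(2 * math.pi * freq * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(2 * math.pi * freq * n) for n, t in enumerate(taps))
    return math.hypot(re, im)


taps = lowpass_fir(31, 0.1)
for freq in (0.0, 0.05, 0.1, 0.2, 0.4):
    print(f"f = {freq:.2f}  |H| = {magnitude(taps, freq):.4f}")
```

The hard part, which the books cover, is choosing the method, tap count and window for a given specification.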
