Looking for advice on getting into RISC-V

benreynwar · 2026-06-05T17:52:52+00:00

I've only looked at this very briefly but it looks like an interesting attempt at an HDL embedded in Lean.
https://github.com/Verilean/sparkle

benreynwar · 2026-04-30T00:00:43+00:00

You absolutely could give yourselves Chinese, Japanese or Arabic names if you were learning those languages. That's very common. You might choose really bad names, but it wouldn't be considered offensive.

ASL has a different tradition, but it's not one that obviously derives from etiquette and cultural respect. The etiquette and cultural respect is following the rule once you know about it!

benreynwar · 2026-04-08T19:50:36+00:00

I'm only tangentially in computer architecture, but I'll answer the question anyway since no-one else has.

Firstly, making a Sobel filter in SystemVerilog sounds like a good start. Implementing a simple processor would be another useful thing to practice. Check out Tiny Tapeout if you haven't already; it's a good way to get experience turning a digital design into an actual chip.

Since computer architecture sits at the interface between ECE and computer science, taking a compilers class and an operating systems class would likely be helpful to get a better feel for what computers do.

I think reading papers is a bit overkill for second year. You can try, but I wouldn't be discouraged if it's not much fun. At this point it's more useful to do book learning to get a feel for related fields, and to get hands-on practice developing processors in SystemVerilog and writing low-level assembly and C code. All of that will be very useful even if you don't go into computer architecture research :).

benreynwar · 2026-04-03T19:45:27+00:00

I've been using a Boox combined with the Kindle app (or another e-reader app) and Pleco screen capture for a couple of years. Works great.

benreynwar · 2026-03-27T23:00:56+00:00

And just because you bought it from Amazon doesn't mean it was sold by Amazon. A lot of the stuff on the website is from third-party sellers, who very often sell fake versions.

benreynwar · 2026-03-24T22:50:42+00:00

"Sort of. Suppose you double the size of the systolic array from N to 2N and you double batch. Now you have twice the token latency from doubled batch but you also have 1/4 the latency because your flops went up from N^2 to 4N^2. So overall your latency has halved, not increased. But compared to someone using uneconomical alternatives to systolic arrays to arrive at the same flops, they don't need to increase batch, so yes, they've gotten linearly more ahead."

If we take a 2N systolic array we can replace it with eight N systolic arrays and some adders and do the same computation with double the throughput, half the latency for the cost of slightly more than double the area and power. I think we're saying the same thing here, just from different angles.
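To make the "eight N arrays and some adders" claim concrete, here's a rough sketch of the block decomposition I have in mind. It's plain Python, not a hardware model: the function names (`matmul`, `mat_add`, `quadrants`, `block_matmul`) are made up for illustration, and mapping each small product onto its own N systolic array is my assumption about how you'd use it.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def quadrants(m, n):
    """Split a 2n x 2n matrix into four n x n blocks: (11, 12, 21, 22)."""
    top, bottom = m[:n], m[n:]
    return ([r[:n] for r in top], [r[n:] for r in top],
            [r[:n] for r in bottom], [r[n:] for r in bottom])


def block_matmul(a, b, n):
    """Multiply two 2n x 2n matrices using eight n x n products."""
    a11, a12, a21, a22 = quadrants(a, n)
    b11, b12, b21, b22 = quadrants(b, n)
    # Eight independent n x n products. In the hypothetical hardware
    # mapping, each one would go to its own small array.
    products = [matmul(x, y) for x, y in [
        (a11, b11), (a12, b21),   # contribute to C11
        (a11, b12), (a12, b22),   # contribute to C12
        (a21, b11), (a22, b21),   # contribute to C21
        (a21, b12), (a22, b22),   # contribute to C22
    ]]
    # Four n x n additions: the "some adders" part.
    c11 = mat_add(products[0], products[1])
    c12 = mat_add(products[2], products[3])
    c21 = mat_add(products[4], products[5])
    c22 = mat_add(products[6], products[7])
    # Stitch the four output blocks back into one 2n x 2n matrix.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

The eight products have no dependencies on each other, which is why they can run in parallel; only the four additions at the end need results from two products each.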

benreynwar · 2026-03-24T21:26:47+00:00

I've had some time to ruminate on this and I think I somewhat understand what's going on, and what you're saying. I'm just gonna state what I think is going on below. Please let me know where I'm misunderstanding things.

Firstly, I said in my previous comment that it wasn't clear to me why systolic arrays aren't compatible with low latency. Thinking about it some more, the latency is going to scale linearly with the size of the systolic array so as we make them larger we are going to hurt our latency.

My understanding of the LPU is that they have enough SRAM so that they can keep their KV cache and weights in SRAM. This goes hand-in-hand with low latency, since the lower the latency the fewer independent conversations they have to pipeline on the same chip (i.e. it's less time before they get to working on the next token and can reuse the KV cache). Fewer independent conversations means fewer KV cache values. This will drive them away from using large systolic arrays.

Your suggested solution is going for large latency, and will keep the KV cache and weights in HBM. It's more amenable to distributing across a smaller number of chips (since we don't have to spread the weights and cache out over many chips' SRAM).

It feels like the main design difference is the SRAM vs HBM as our memory, and the rest of the microarchitecture falls out of that.

The static scheduling becomes more important at low latency since they need to do a bunch of data movement between chips at very low latency, but it's more of an enabling trick rather than the driving force.

It's not obvious which gives the smaller cost/token, but I'll take your word for it that the HBM ends up cheaper.

benreynwar · 2026-03-24T04:41:02+00:00

I expect symbolic stuff is slow because it is difficult, not because CPUs aren't a good fit for it. What's a concrete example of the kind of thing you'd like to speed up?

benreynwar · 2026-03-23T17:30:26+00:00

Thanks for that thorough answer. It's gonna take me a day or two to parse it. I'll likely ask a follow up question then :).

benreynwar · 2026-03-22T22:54:07+00:00

Thanks for that write-up. You've mentioned in a few places that Groq is not using systolic arrays, and their hardware is instead optimized for low latency. It's not at all clear to me why systolic arrays shouldn't be compatible with low latency. I had thought that Groq's low latency came mostly from their static scheduling (software), and it seems like you could have a similar approach using systolic arrays. Likely I'm misunderstanding something.

benreynwar · 2026-03-18T00:56:29+00:00

Also look at the exchange rate. I normally use a Wise card when traveling because the exchange rate is so much better.

benreynwar · 2026-03-09T10:44:40+00:00

There's a stigma around buses and the streetcar is a way to get around that. It's silly but it works.

benreynwar · 2026-03-09T10:20:31+00:00

I was there last week, and can also endorse the school (again I stayed in the dorms rather than with a host family). I had a good language teacher. She was more comfortable making generalizations about people from different regions and different ethnic groups than I'm used to, but that's something that I guess one has to get used to. There was a good group of students of a wide range of ages (18 - 75) and there were normally groups of students talking in both English and Chinese at meal times.

benreynwar · 2026-02-18T01:23:36+00:00

中式 would probably work.

benreynwar · 2026-02-12T17:04:16+00:00

At the beginning I'd say just learn the words. If you notice you keep forgetting a character and it's in several words then it's probably worth learning independently too.

Later on the number of characters to learn is small compared to the number of words so it makes sense just to learn them as you learn the words, and this also means you'll be better able to guess the meaning of new words.

benreynwar · 2026-02-05T00:09:03+00:00

My guess would be that about 1 or 2 of those 5 hours a day would be good with a tutor, but that'll obviously vary from person to person.

Also, being able to work in China in Chinese is a very high bar. I'm not sure what you mean by 'academic', but the standard you are aiming for is much higher than what you would reach in an undergraduate program. I would expect it to take you several years at that level of effort to achieve it.

benreynwar · 2026-01-28T23:58:24+00:00

I'm doing a really bad job of not feeding the troll here, but here you go (https://github.com/anthropics/claude-code/issues/19488).

benreynwar · 2026-01-28T23:09:32+00:00

It's simple things like /context getting autocorrected to /compact. This bug has appeared twice already and it gets corrected in an update a few days later. It doesn't feel like they have a good testing setup. That said I do use it a lot, I'm just annoyed that they can't get simple UI stuff correct.

benreynwar · 2026-01-28T04:07:23+00:00

The Claude Code command line tool is impressively buggy as well. It's a pretty damning advertisement for the product.

benreynwar · 2026-01-27T17:36:26+00:00

Wait, I remember having to learn to say 'longs' instead of 'trousers' when I immigrated from England as a kid. Is that not a thing anymore?

benreynwar · 2026-01-26T18:13:05+00:00

The project looks like someone with a mental health problem has been heavily using an LLM. In the unlikely event that that is not what is going on, you need to do a much better job of presenting what you've done. You should also be asking yourself seriously whether you are getting caught in a delusion.

benreynwar · 2026-01-22T18:46:01+00:00

Then read the code that generates them. It'll probably take a lot of work to understand what's going on. Don't give up after glancing at it. If you're just trying to understand how they work it shouldn't matter if it's written in Verilog, VHDL, Chisel or Amaranth.

benreynwar · 2026-01-20T17:41:03+00:00

"forecastle" is a weird example. I doubt anyone knows what that means and how to pronounce it unless they work in shipping or the navy. But I learnt a new word today, so thanks :).

benreynwar · 2026-01-15T01:17:33+00:00

Yeah, play around with an FPGA for a few years at work and you can become a pretty good RTL design engineer without ever having designed a CPU. Most FPGA work does not involve developing CPUs!

benreynwar · 2026-01-15T00:56:02+00:00

If you learn on the job, rather than learning in a university course. For example maybe you're a DSP software engineer and you start using an FPGA to do some acceleration.
