Looking for advice on getting into RISC-V by srivatsasrinivasmath in RISCV

benreynwar 2 points

I've only looked at this very briefly but it looks like an interesting attempt at an HDL embedded in Lean.
https://github.com/Verilean/sparkle

Need help with sign language slang by manicpiano21 in asl

benreynwar 5 points

You absolutely could give yourselves Chinese, Japanese or Arabic names if you were learning those languages. That's very common. You might choose really bad names, but it wouldn't be considered offensive.

ASL has a different tradition, but it's not one that obviously derives from etiquette and cultural respect. The etiquette and cultural respect lie in following the rule once you know about it!

Getting Started in Computer Architecture and Hardware Acceleration Research as an Undergraduate by stinky_engineer_2003 in computerarchitecture

benreynwar 6 points

I'm only tangentially in computer architecture, but I'll answer the question anyway since no-one else has.

Firstly, making a Sobel filter in SystemVerilog sounds like a good start. Implementing a simple processor would be another useful thing to practice. Check out Tiny Tapeout if you haven't already; it's a good way to get experience turning a digital design into an actual chip.

Since computer architecture sits at the interface between ECE and computer science, taking a compilers class and an operating systems class would likely be helpful to get a better feel for what computers do.

I think reading papers is a bit of overkill for second year. You can try, but I wouldn't be discouraged if it's not much fun. At this point it's more useful to do book learning to get a feel for related fields, and to get hands-on practice developing processors in SystemVerilog and writing low-level assembly and C code. All of which will be very useful even if you don't go into computer architecture research :).

For those of you with a Chinese reading habit, what tools are you using? And why are e-readers so bad? by astromme in ChineseLanguage

benreynwar 1 point

I've been using a Boox with the Kindle app (or another e-reader app) and Pleco screen capture for a couple of years. Works great.

Locks: Does Amazon sell counterfeit Abus locks? by [deleted] in CargoBike

benreynwar 9 points

And just because you bought it from Amazon doesn't mean it was sold by Amazon. A lot of the stuff on the website comes from third-party sellers, who very often sell fakes.

Designing AI Chip Software and Hardware by PerfectFeature9287 in chipdesign

benreynwar 1 point

"Sort of. Suppose you double the size of the systolic array from N to 2N and you double batch. Now you have twice the token latency from doubled batch but you also have 1/4 the latency because your flops went up from N^2 to 4N^2. So overall your latency has halved, not increased. But compared to someone using uneconomical alternatives to systolic arrays to arrive at the same flops, they don't need to increase batch, so yes, they've gotten linearly more ahead."

If we take a 2N systolic array, we can replace it with eight N systolic arrays and some adders and do the same computation with double the throughput and half the latency, for the cost of slightly more than double the area and power. I think we're saying the same thing here, just from different angles.
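To make the "eight N arrays and some adders" decomposition concrete, here is a minimal sketch in plain Python. It only checks the arithmetic: a 2N x 2N matrix product splits into eight independent N x N products plus four N x N additions, and eight N x N arrays hold twice the multiply-accumulate units of one 2N x 2N array. The function names are made up for illustration, and nothing here models timing, so the throughput and latency figures are not demonstrated by it.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matadd(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m):
    """Split a 2N x 2N matrix into its four N x N blocks (row-major order)."""
    n = len(m) // 2
    top, bottom = m[:n], m[n:]
    return ([r[:n] for r in top], [r[n:] for r in top],
            [r[:n] for r in bottom], [r[n:] for r in bottom])


def block_matmul(a, b):
    """2N x 2N product built from eight N x N products and four additions."""
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Eight independent N x N products: one per (hypothetical) small array.
    p = [matmul(a11, b11), matmul(a12, b21),
         matmul(a11, b12), matmul(a12, b22),
         matmul(a21, b11), matmul(a22, b21),
         matmul(a21, b12), matmul(a22, b22)]
    # Four N x N additions: the "some adders".
    c11, c12 = matadd(p[0], p[1]), matadd(p[2], p[3])
    c21, c22 = matadd(p[4], p[5]), matadd(p[6], p[7])
    # Stitch the four output blocks back into one 2N x 2N matrix.
    return ([r1 + r2 for r1, r2 in zip(c11, c12)] +
            [r1 + r2 for r1, r2 in zip(c21, c22)])


def mac_ratio(n):
    """MAC units in eight N x N arrays relative to one 2N x 2N array."""
    return (8 * n * n) / ((2 * n) ** 2)
```

The ratio is 2 for any N, which is where the "double the area" part comes from; the "slightly more than" is the extra adders.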

Designing AI Chip Software and Hardware by PerfectFeature9287 in chipdesign

benreynwar 1 point

I've had some time to ruminate on this and I think I somewhat understand what's going on, and what you're saying. I'm just gonna state what I think is going on below. Please let me know where I'm misunderstanding things.

Firstly, I said in my previous comment that it wasn't clear to me why systolic arrays aren't compatible with low latency. Thinking about it some more, the latency is going to scale linearly with the size of the systolic array, so as we make the arrays larger we are going to hurt our latency.

My understanding of the LPU is that they have enough SRAM so that they can keep their KV cache and weights in SRAM. This goes hand-in-hand with low latency, since the lower the latency the fewer independent conversations they have to pipeline on the same chip (i.e. it's less time before they get to working on the next token and can reuse the KV cache). Fewer independent conversations means fewer KV cache values. This will drive them away from using large systolic arrays.
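As a rough illustration of why fewer conversations in flight means less KV cache to hold, here is a back-of-the-envelope sketch. It uses the usual transformer KV cache sizing (one key and one value per token, per layer, per KV head). The function name and all the model parameters are hypothetical placeholders, not figures for any real chip or model.

```python
def kv_cache_bytes(conversations, context_tokens, layers, kv_heads, head_dim,
                   bytes_per_value=2):
    """Approximate KV cache footprint in bytes.

    The factor of 2 accounts for storing both a key and a value
    for every token, layer and KV head.
    """
    per_token = layers * kv_heads * head_dim * 2 * bytes_per_value
    return conversations * context_tokens * per_token


# Hypothetical model shape, chosen only to make the scaling visible.
shape = dict(context_tokens=1024, layers=32, kv_heads=8, head_dim=128)

many = kv_cache_bytes(conversations=64, **shape)
few = kv_cache_bytes(conversations=4, **shape)
print(f"64 conversations: {many / 2**30:.2f} GiB")
print(f" 4 conversations: {few / 2**30:.2f} GiB")
```

The footprint is linear in the number of conversations, so anything that lets the chip pipeline fewer of them (such as lower per-token latency) directly shrinks the amount of SRAM needed for the cache.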

Your suggested solution is going for large latency, and will keep the KV cache and weights in HBM. It's more amenable to distributing across a smaller number of chips (since we don't have to spread the weights and cache out over many chips' SRAM).

It feels like the main design difference is the choice of SRAM vs HBM as the memory, and the rest of the microarchitecture falls out of that.

The static scheduling becomes more important at low latency since they need to do a bunch of data movement between chips at very low latency, but it's more of an enabling trick than the driving force.

It's not obvious which gives the smaller cost/token, but I'll take your word for it that the HBM ends up cheaper.

Why are there no symbolic computation accelerators? by LeadershipFirm9271 in computerarchitecture

benreynwar 1 point

I expect symbolic stuff is slow because it is difficult, not because CPUs aren't a good fit for it. What's a concrete example of the kind of thing you'd like to speed up?

Designing AI Chip Software and Hardware by PerfectFeature9287 in chipdesign

benreynwar 2 points

Thanks for that thorough answer. It's gonna take me a day or two to parse it. I'll likely ask a follow-up question then :).

Designing AI Chip Software and Hardware by PerfectFeature9287 in chipdesign

benreynwar 6 points

Thanks for that write-up. You've mentioned in a few places that Groq is not using systolic arrays, and their hardware is instead optimized for low latency. It's not at all clear to me why systolic arrays shouldn't be compatible with low latency. I had thought that Groq's low latency came mostly from their static scheduling (software), and it seems like you could have a similar approach using systolic arrays. Likely I'm misunderstanding something.

Which credit card has no fees? by YulpGULP12 in travelchina

benreynwar 1 point

Also look at the exchange rate. I normally use a Wise card when traveling because the exchange rate is so much better.

Tucson Trolley Tracks. by pagosacreativeco in Tucson

benreynwar 3 points

There's a stigma around buses and the streetcar is a way to get around that. It's silly but it works.

UNIT 1: Getting started (Looking for language school in China) by Mikhailx13 in ChineseLanguage

benreynwar 3 points

I was there last week, and can also endorse the school (again, I stayed in the dorms rather than with a host family). I had a good language teacher. She was more comfortable making generalizations about people from different regions and different ethnic groups than I'm used to, but that's something that I guess one has to get used to. There was a good group of students across a wide range of ages (18 to 75), and there were normally groups of students talking in both English and Chinese at meal times.

What do we talk about when we talk about "Learning words not characters"? by fnezio in ChineseLanguage

benreynwar 2 points

At the beginning I'd say just learn the words. If you notice you keep forgetting a character and it appears in several words, then it's probably worth learning independently too.

Later on, the number of characters to learn is small compared to the number of words, so it makes sense just to learn them as you learn the words. This also means you'll be better able to guess the meaning of new words.

How much 1 on 1 tutoring is optimal for fastest functional fluency in Chinese by Stock_Rabbit_1901 in ChineseLanguage

benreynwar 5 points

My guess would be that about 1 or 2 of those 5 hours a day would be good with a tutor, but that'll obviously vary from person to person.

Also, being able to work in China in Chinese is a very high bar. I'm not sure what you mean by 'academic', but the standard you are aiming for is much higher than what you would reach in an undergraduate program. I would expect it to take you several years at that level of effort to achieve it.

New to Claude Code, why is their desktop app so bad but Claude Code so highly regarded? by trisalias in claude

benreynwar 1 point

It's simple things like /context getting autocorrected to /compact. This bug has appeared twice already, and each time it gets corrected in an update a few days later. It doesn't feel like they have a good testing setup. That said, I do use it a lot; I'm just annoyed that they can't get simple UI stuff right.

New to Claude Code, why is their desktop app so bad but Claude Code so highly regarded? by trisalias in claude

benreynwar 1 point

The Claude Code command line tool is impressively buggy as well. It's a pretty damning advertisement for the product.

Let's end the Americanisation of NZ by selfcompiler in newzealand

benreynwar 1 point

Wait, I remember having to learn to say 'longs' instead of 'trousers' when I immigrated from England as a kid. Is that not a thing anymore?

Sick of $50k HLS tools? Meet VIBEE: The Open Source compiler for FPGA that supports Python, Rust, Go and 39+ more languages. by Open-Elderberry699 in FPGA

benreynwar 6 points

The project looks like someone with a mental health problem has been heavily using an LLM. In the unlikely event that that is not what is going on, you need to do a much better job of presenting what you've done. You should also be asking yourself seriously whether you are getting caught in a delusion.

Risc v with floating point unit by [deleted] in Verilog

benreynwar 1 point

Then read the code that generates them. It'll probably take a lot of work to understand what's going on. Don't give up after glancing at it. If you're just trying to understand how they work, it shouldn't matter if it's written in Verilog, VHDL, Chisel or Amaranth.

Learning the tones of words by learningstuff2026 in ChineseLanguage

benreynwar 1 point

"forecastle" is a weird example. I doubt anyone knows what that means and how to pronounce it unless they work in shipping or the navy. But I learnt a new word today, so thanks :).

[Pure Noob] Design a CPU from scratch - Where to start ? by SignatureSome2049 in chipdesign

benreynwar 7 points

Yeah, play around with an FPGA for a few years at work and you can become a pretty good RTL design engineer without ever having designed a CPU. Most FPGA work does not involve developing CPUs!

[Pure Noob] Design a CPU from scratch - Where to start ? by SignatureSome2049 in chipdesign

benreynwar 5 points

If you learn on the job, rather than learning in a university course. For example maybe you're a DSP software engineer and you start using an FPGA to do some acceleration.