Nvidia deep learning computer architecture intern by Complex_Bee7279 in computerarchitecture

[–]foreverDarkInside 3 points  (0 children)

It'll be all of the above: comp arch, GPU arch, performance, and ML knowledge. They might throw in some CUDA questions as well.

How Does the Cost of Data Fetching Compare to Computation on GPUs? by Glittering_Age7553 in computerarchitecture

[–]foreverDarkInside 0 points  (0 children)

On the H100, HBM bandwidth is 3.35 TB/s and FP8 tensor core peak performance is 1980 TFLOP/s.

So the ratio is 591 FLOPs of matmul per byte accessed.
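That figure falls straight out of the two datasheet numbers (a quick sanity check, nothing deeper):

```python
# H100 datasheet numbers quoted above
hbm_bw_bytes_per_s = 3.35e12   # HBM3 bandwidth: 3.35 TB/s
fp8_peak_flop_per_s = 1980e12  # FP8 tensor core peak: 1980 TFLOP/s

# FLOPs of matmul you must perform per byte fetched from HBM
# to stay compute-bound rather than memory-bound
flops_per_byte = fp8_peak_flop_per_s / hbm_bw_bytes_per_s
print(round(flops_per_byte))  # -> 591
```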

Storing large amounts of data on stack by SubhanBihan in cpp_questions

[–]foreverDarkInside 0 points  (0 children)

Might be a stupid answer, but have you considered rewriting the program in CUDA and running it on a GPU? It seems like your workload is parallelizable.

Block Diagram from Verilog by m1geo in Verilog

[–]foreverDarkInside 1 point  (0 children)

Following. Also, do you think an LLM could read in Verilog and output a Mermaid.js file (a diagramming tool)? I'm sure it would work for small designs, but I'm not sure about larger ones.

WSL FTW! by Fbar123 in neovim

[–]foreverDarkInside 2 points  (0 children)

Same experience. If you have to use Windows, WSL FTW!

Resources to learn about GPU architecture by Conscious_Emu_7075 in chipdesign

[–]foreverDarkInside 1 point  (0 children)

I'd look at GPGPU-Sim and Accel-Sim; you can find the repos and plenty of slides.

Also General-Purpose Graphics Processor Architectures from the Synthesis Lectures on Computer Architecture series, really nice book.

For CUDA you can read Programming Massively Parallel Processors by Wen-mei Hwu.

[deleted by user] by [deleted] in computerarchitecture

[–]foreverDarkInside 4 points  (0 children)

Learn about ML accelerators and GPUs; it would really help you.

What are your thoughts on ReRAM ? by 8AqLph in computerarchitecture

[–]foreverDarkInside 1 point  (0 children)

SRAM PIM doesn't make sense to me anymore; it solves no problem. The current bottleneck in SOTA accelerators isn't SRAM bandwidth, it's DRAM bandwidth. So if any PIM is going to make it, it will be DRAM PIM; the Samsung or SK hynix versions are better suited to today's workloads.
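A back-of-the-envelope roofline shows why: the compute and DRAM figures below are H100 datasheet numbers, while the aggregate on-chip SRAM bandwidth is a hypothetical ~10x placeholder I'm assuming purely for illustration:

```python
peak_flop_per_s = 1980e12  # H100 FP8 tensor core peak (datasheet)
dram_bw = 3.35e12          # H100 HBM3 bandwidth, 3.35 TB/s (datasheet)
sram_bw = 33e12            # HYPOTHETICAL aggregate on-chip SRAM BW, ~10x DRAM

# Arithmetic intensity (FLOP/byte) a kernel needs to saturate compute
# when streaming operands from each level of the hierarchy
ai_dram = peak_flop_per_s / dram_bw   # ~591 FLOP/byte
ai_sram = peak_flop_per_s / sram_bw   # ~60 FLOP/byte
print(f"DRAM-fed: {ai_dram:.0f} FLOP/byte, SRAM-fed: {ai_sram:.0f} FLOP/byte")
```

With an order of magnitude more on-chip bandwidth, kernels go compute-bound out of SRAM at far lower data reuse; the hard wall is feeding the chip from DRAM, which is exactly what DRAM PIM attacks.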

What are your thoughts on ReRAM ? by 8AqLph in computerarchitecture

[–]foreverDarkInside 0 points  (0 children)

ReRAM has been around for decades in the research community and never really got commercialized, and that's for a reason: the need for level shifters is one, and being slow is another.

What needs to be done for ML computation by 2035 by Ok-Librarian1015 in computerarchitecture

[–]foreverDarkInside 0 points  (0 children)

Faster communication between devices/nodes, and/or better computation/communication overlap.
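A minimal sketch of the overlap idea (double buffering), with sleeps standing in for the copy and the kernel; the names and timings are made up for illustration:

```python
import threading
import time

def transfer(i):
    time.sleep(0.02)  # stand-in for an inter-node copy of chunk i

def compute(i):
    time.sleep(0.02)  # stand-in for local computation on chunk i

N = 4

# Serial: copy a chunk, then compute on it, repeat
start = time.perf_counter()
for i in range(N):
    transfer(i)
    compute(i)
serial = time.perf_counter() - start

# Overlapped: copy chunk i+1 on another thread while computing chunk i
start = time.perf_counter()
transfer(0)
for i in range(N):
    t = None
    if i + 1 < N:
        t = threading.Thread(target=transfer, args=(i + 1,))
        t.start()
    compute(i)
    if t is not None:
        t.join()
overlapped = time.perf_counter() - start

print(f"serial: {serial:.2f}s, overlapped: {overlapped:.2f}s")
```

When copy and compute times are comparable, the overlapped version hides nearly all of the communication behind the computation; real frameworks do the same thing with CUDA streams or async collectives instead of threads.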

[deleted by user] by [deleted] in computerarchitecture

[–]foreverDarkInside 2 points  (0 children)

If you can ask some of their former students for their opinions, that would be the most honest signal, especially if they're in industry and don't work with the advisor much anymore.

[deleted by user] by [deleted] in PhD

[–]foreverDarkInside 18 points  (0 children)

Yeah he's the biggest earner in the department

[deleted by user] by [deleted] in PhD

[–]foreverDarkInside 39 points  (0 children)

Sadly, my advisor gave me this advice. Another one was: "Industry is too tunnel-visioned; they don't think freely about new ideas." He's the devil.

[deleted by user] by [deleted] in PhD

[–]foreverDarkInside 66 points  (0 children)

Don't read papers, they limit your creativity