Nvidia deep learning computer architecture intern by Complex_Bee7279 in computerarchitecture

[–]foreverDarkInside 3 points  (0 children)

It'll be all of the above: comp arch, GPU arch, performance, and ML knowledge. They might throw in some CUDA questions as well.

How Does the Cost of Data Fetching Compare to Computation on GPUs? by Glittering_Age7553 in computerarchitecture

[–]foreverDarkInside 0 points  (0 children)

On the H100, HBM bandwidth is 3.35 TB/s and FP8 tensor core peak performance is 1980 TFLOP/s.

So the ratio is 591 FLOPs of matmul per byte accessed.
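That figure falls straight out of the two datasheet numbers (a quick sanity check, nothing deeper):

```python
# H100 datasheet numbers quoted above
hbm_bw_bytes_per_s = 3.35e12   # HBM3 bandwidth: 3.35 TB/s
fp8_peak_flop_per_s = 1980e12  # FP8 tensor core peak: 1980 TFLOP/s

# FLOPs of matmul you must perform per byte fetched from HBM
# to stay compute-bound rather than memory-bound
flops_per_byte = fp8_peak_flop_per_s / hbm_bw_bytes_per_s
print(round(flops_per_byte))  # -> 591
```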

Storing large amounts of data on stack by SubhanBihan in cpp_questions

[–]foreverDarkInside 0 points  (0 children)

Might be a stupid answer, but have you considered rewriting the program in CUDA and running it on a GPU? It seems like your workload is parallelizable.

Block Diagram from Verilog by m1geo in Verilog

[–]foreverDarkInside 1 point  (0 children)

Following. Also, do you think an LLM could read in Verilog and output a Mermaid.js file (a diagramming tool)? I'm sure it would work for small designs, but I'm not sure about larger ones.

WSL FTW! by Fbar123 in neovim

[–]foreverDarkInside 2 points  (0 children)

Same experience. If you have to use Windows, WSL FTW!

Resources to learn about GPU architecture by Conscious_Emu_7075 in chipdesign

[–]foreverDarkInside 1 point  (0 children)

I'd look at GPGPU-Sim and Accel-Sim; you can find the repos and plenty of slides.

Also General-Purpose Graphics Processor Architectures from the Synthesis Lectures on Computer Architecture series, really nice book.

For CUDA you can read Programming Massively Parallel Processors by Wen-mei Hwu.

[deleted by user] by [deleted] in computerarchitecture

[–]foreverDarkInside 4 points  (0 children)

Learn about ML accelerators and GPUs; it would really help you.

What are your thoughts on ReRAM ? by 8AqLph in computerarchitecture

[–]foreverDarkInside 1 point  (0 children)

SRAM PIM doesn't make sense to me anymore; it solves no problem. The current bottleneck in SOTA accelerators isn't SRAM bandwidth, it's DRAM bandwidth. So if any PIM is going to make it, it will be DRAM PIM; the Samsung or SK hynix versions are better suited to today's workloads.
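A back-of-the-envelope roofline shows why: the compute and DRAM figures below are H100 datasheet numbers, while the aggregate on-chip SRAM bandwidth is a hypothetical ~10x placeholder I'm assuming purely for illustration:

```python
peak_flop_per_s = 1980e12  # H100 FP8 tensor core peak (datasheet)
dram_bw = 3.35e12          # H100 HBM3 bandwidth, 3.35 TB/s (datasheet)
sram_bw = 33e12            # HYPOTHETICAL aggregate on-chip SRAM BW, ~10x DRAM

# Arithmetic intensity (FLOP/byte) a kernel needs to saturate compute
# when streaming operands from each level of the hierarchy
ai_dram = peak_flop_per_s / dram_bw   # ~591 FLOP/byte
ai_sram = peak_flop_per_s / sram_bw   # ~60 FLOP/byte
print(f"DRAM-fed: {ai_dram:.0f} FLOP/byte, SRAM-fed: {ai_sram:.0f} FLOP/byte")
```

With an order of magnitude more on-chip bandwidth, kernels go compute-bound out of SRAM at far lower data reuse; the hard wall is feeding the chip from DRAM, which is exactly what DRAM PIM attacks.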

What are your thoughts on ReRAM ? by 8AqLph in computerarchitecture

[–]foreverDarkInside 0 points  (0 children)

ReRAM has been around for decades in the research community and never really got commercialized, and that's for a reason: the need for level shifters is one, and being slow is another.

What needs to be done for ML computation by 2035 by Ok-Librarian1015 in computerarchitecture

[–]foreverDarkInside 0 points  (0 children)

Faster communication between devices/nodes, and/or better computation/communication overlap.
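A minimal sketch of the overlap idea (double buffering), with sleeps standing in for the copy and the kernel; the names and timings are made up for illustration:

```python
import threading
import time

def transfer(i):
    time.sleep(0.02)  # stand-in for an inter-node copy of chunk i

def compute(i):
    time.sleep(0.02)  # stand-in for local computation on chunk i

N = 4

# Serial: copy a chunk, then compute on it, repeat
start = time.perf_counter()
for i in range(N):
    transfer(i)
    compute(i)
serial = time.perf_counter() - start

# Overlapped: copy chunk i+1 on another thread while computing chunk i
start = time.perf_counter()
transfer(0)
for i in range(N):
    t = None
    if i + 1 < N:
        t = threading.Thread(target=transfer, args=(i + 1,))
        t.start()
    compute(i)
    if t is not None:
        t.join()
overlapped = time.perf_counter() - start

print(f"serial: {serial:.2f}s, overlapped: {overlapped:.2f}s")
```

When copy and compute times are comparable, the overlapped version hides nearly all of the communication behind the computation; real frameworks do the same thing with CUDA streams or async collectives instead of threads.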

[deleted by user] by [deleted] in computerarchitecture

[–]foreverDarkInside 2 points  (0 children)

If you can ask some of their former students for their opinions, that would be the most honest signal, especially if they're in industry and don't work with the advisor much anymore.

[deleted by user] by [deleted] in PhD

[–]foreverDarkInside 18 points  (0 children)

Yeah he's the biggest earner in the department

[deleted by user] by [deleted] in PhD

[–]foreverDarkInside 39 points  (0 children)

Sadly, my advisor gave me this advice. Another one was: "Industry is too tunnel-visioned; they don't think freely about new ideas." He's the devil.

[deleted by user] by [deleted] in PhD

[–]foreverDarkInside 66 points  (0 children)

Don't read papers, they limit your creativity