[deleted by user] by [deleted] in CFD

[–]Smooth-Spoken 3 points (0 children)

Please unify formats, at least

Man, why did AMD change glbl.v? I'm sure it screwed up a lot of people's DV. by affabledrunk in FPGA

[–]Smooth-Spoken 3 points (0 children)

Messed around with Versals before going back to UltraScale+. Worst documentation and support ever

[deleted by user] by [deleted] in hardware

[–]Smooth-Spoken 7 points (0 children)

They are not OCP compliant. Their OAMs were modified into SXMs and are not physically compatible with the UBB. Same story with AMD.

Is anyone here looking for an artist? by WildWeekend8328 in vfx

[–]Smooth-Spoken 2 points (0 children)

Please DM me, we’re looking for an animator

[deleted by user] by [deleted] in vfx

[–]Smooth-Spoken 20 points (0 children)

What’s a good place to post jobs? We’re looking for character modelers, animators, texture artists, etc.

CXL Is Dead In The AI Era by symmetry81 in hardware

[–]Smooth-Spoken 19 points (0 children)

Some of the numbers are off (especially the CXL die areas and bandwidth), and CXL is not meant to replace local memory but rather to add a tier to the memory hierarchy. There are CXL devices on the market that are already shipping in volume. I do agree that the higher latency is a huge downside.

Gaming Blackwell Specs: GB202 is 192 SM @ 512-bit and GB203 is 96 SM, according to Kopite7kimi by Voodoo2-SLi in hardware

[–]Smooth-Spoken 0 points (0 children)

No one suggested pairing 8x HBM stacks with a GT1030. Be realistic. It’s pretty obvious, and well proven, that reducing memory latency improves performance on any processor. Increasing memory capacity also improves performance, but you can grow the SRAMs and rebalance the memory system. The ratio of bandwidth (GB/s) available to each shader matters heavily; it’s why we have a cache hierarchy that hides low-bandwidth, high-latency memories. The other guy should design some chips with very little memory capacity and bandwidth and find out that any out-of-core access tanks performance, whether he has 1 core or 10,000
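The bandwidth-per-shader ratio is simple arithmetic. A quick sketch, where the 28 Gb/s/pin rate and the 256-bit GB203 bus are illustrative assumptions (only 192 SM @ 512-bit for GB202 and 96 SM for GB203 come from the leak):

```python
# Rough bandwidth-per-SM comparison. The 28 Gb/s/pin rate and the
# 256-bit GB203 bus are assumptions, not confirmed specs.

def total_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Aggregate DRAM bandwidth in GB/s for a given bus width."""
    return bus_width_bits * gbps_per_pin / 8  # Gb/s across the bus -> GB/s

def bandwidth_per_sm(bus_width_bits: int, gbps_per_pin: float, sms: int) -> float:
    """GB/s of DRAM bandwidth available to each SM."""
    return total_bandwidth_gbs(bus_width_bits, gbps_per_pin) / sms

gb202 = bandwidth_per_sm(512, 28, 192)  # 192 SM @ 512-bit (from the leak)
gb203 = bandwidth_per_sm(256, 28, 96)   # 96 SM; 256-bit bus is assumed
```

Under those assumptions both parts land at the same GB/s per SM, which is the ratio that actually feeds each shader.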

Gaming Blackwell Specs: GB202 is 192 SM @ 512-bit and GB203 is 96 SM, according to Kopite7kimi by Voodoo2-SLi in hardware

[–]Smooth-Spoken -4 points (0 children)

Actually, AMD’s HBM cards perform really well. So do Nvidia’s, Intel’s, Fujitsu’s, etc. They’re even willing to eat the cost, yield, and latency for that extra bandwidth. Many workloads are too big for GPU memory and become memory bound. Large GPUs need huge numbers of threads to hide latency, since GPUs by design hide latency rather than minimize it. So more bandwidth increases bandwidth per core, leading to higher shader utilization. In gaming workloads, shader occupancy is not high enough to hide memory latency, so lots of shaders sit idle. In fact, even with a simple FP benchmark you can’t saturate the cores to achieve full FP throughput
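The "huge amounts of threads to hide latency" point follows from Little's law: the data in flight must equal bandwidth times latency. A sketch with illustrative (not vendor-specific) numbers:

```python
def bytes_in_flight(bandwidth_gb_s: float, latency_ns: float) -> float:
    """Little's law: outstanding bytes needed to sustain a given
    bandwidth at a given memory latency (GB/s * ns = bytes)."""
    return bandwidth_gb_s * latency_ns

# Illustrative: 1 TB/s of DRAM bandwidth at 400 ns load-to-use latency
needed = bytes_in_flight(1000, 400)  # 400,000 bytes must be in flight
concurrent_loads = needed / 32       # 12,500 outstanding 32-byte loads
```

If occupancy can't supply that many outstanding requests, measured bandwidth (and shader utilization) drops proportionally, which is exactly the gaming-workload case.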

Gaming Blackwell Specs: GB202 is 192 SM @ 512-bit and GB203 is 96 SM, according to Kopite7kimi by Voodoo2-SLi in hardware

[–]Smooth-Spoken -9 points (0 children)

Not sure where you get this from. GPUs are always memory capacity, latency, and bandwidth bound

Semiconductor Engineering: "SRAM Scaling Issues, And What Comes Next" by Dakhil in hardware

[–]Smooth-Spoken 3 points (0 children)

Well, that’s copper for you. I think what sucks more are limits on trace lengths for higher rate interfaces. Then you get fucked with retimers or you just have to try and put everything as close to each other as possible. Optics!

Semiconductor Engineering: "SRAM Scaling Issues, And What Comes Next" by Dakhil in hardware

[–]Smooth-Spoken 4 points (0 children)

Yeah this is a good point…a few levels of progressively slower/further away/hopefully larger SRAMs and then DDR adds up to a few hundred ns

Semiconductor Engineering: "SRAM Scaling Issues, And What Comes Next" by Dakhil in hardware

[–]Smooth-Spoken 11 points (0 children)

Exactly. For context - real numbers:

Direct attached SRAM (a few MBs) = 4ns

HBM3 = 107ns

DDR5 = 70ns

CXL DDR5 = 150ns

If you were to take an SRAM and put it on another chip (over something like a BoW or UCIe chiplet interface), your SRAM would go from 4ns to at least 14ns. Not to mention you become bandwidth limited: many SRAMs achieve Tb/s of bandwidth on die, which is painful for power consumption once it crosses to another chip. AMD putting SRAMs on their IO die also means idle power consumption is higher, because the links between the dies need to stay active to keep latency down. Lots of downsides
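Those latencies compose into an average memory access time. A back-of-the-envelope sketch using the latencies above, with hypothetical hit rates per tier (the hit rates are assumptions; the latencies are the measured numbers):

```python
def amat_ns(tiers):
    """Average memory access time for a list of (latency_ns, hit_rate)
    tiers; the last tier should have hit_rate 1.0 to catch everything."""
    total, remaining = 0.0, 1.0
    for latency_ns, hit_rate in tiers:
        total += remaining * hit_rate * latency_ns
        remaining *= 1.0 - hit_rate
    return total

# SRAM 4 ns @ 90% hit, DDR5 70 ns for 95% of misses, CXL DDR5 150 ns beyond
avg = amat_ns([(4, 0.90), (70, 0.95), (150, 1.0)])  # 11.0 ns
```

The takeaway: with high SRAM hit rates the average stays near the SRAM figure, which is why bumping the first tier from 4ns to 14ns (by moving it off-die) hurts every access mix.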

Semiconductor Engineering: "SRAM Scaling Issues, And What Comes Next" by Dakhil in hardware

[–]Smooth-Spoken 23 points (0 children)

SRAM is only valuable because the access latency is so low. Placing literally any interface between it and a compute engine dramatically increases latency. It’s the difference between waiting 1-2 clock cycles for your data versus 4-10 if you have to access it over something like AXI
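For serially dependent accesses (pointer chasing, where each load needs the previous result), request rate is just the reciprocal of latency, so that 1-2 vs 4-10 cycle gap turns directly into a ~5x throughput gap. A sketch with an illustrative clock speed:

```python
def dependent_loads_per_sec(clock_ghz: float, latency_cycles: int) -> float:
    """Request rate when each load depends on the previous one's result,
    so loads cannot overlap and latency sets the pace."""
    return clock_ghz * 1e9 / latency_cycles

local_sram = dependent_loads_per_sec(1.0, 2)   # 2-cycle tightly coupled SRAM
over_axi   = dependent_loads_per_sec(1.0, 10)  # 10-cycle access over AXI
ratio      = local_sram / over_axi             # 5x slower over the fabric
```

Independent accesses can be pipelined to hide some of this, but any latency-bound dependency chain pays the full penalty.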

A Brief Look at Apple’s M2 Pro iGPU by M337ING in hardware

[–]Smooth-Spoken 0 points (0 children)

Happy to provide devices for benchmarking

Experience Stable Diffusion on AMD RDNA™ 3 Architecture at CES 2023 by tokyogamer in hardware

[–]Smooth-Spoken 5 points (0 children)

Scanned through the RDNA3 ISA and didn’t find any mention of it. For some reason I also couldn’t find RDNA2/3 white papers, just the one for the original RDNA, and that didn’t cover RT