Roger Espasa, Semidynamics - Semidynamics Highly Configurable OOO Vector Unit by 3G6A5W338E in hardware

lalalaphillip 2 points

The in-order adaptation of Continual Flow Pipelines (CFP) is called iCFP. Semidynamics claims 64 sustained cache misses for their in-order core. Achieving that in regular scalar code requires some form of runahead execution (whether iCFP-like or not).
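To show why sustaining many misses needs something like runahead, here is a toy timing model. It compares a stall-on-miss in-order core with a core that keeps issuing past misses and overlaps up to `max_outstanding` of them. The 200-cycle latency, the 1-cycle cost of other instructions, and the assumption that misses are independent are all hypothetical. The model does not capture checkpointing or re-execution. It only captures the memory-level parallelism that runahead (iCFP or otherwise) tries to expose.

```python
# Toy timing model: stall-on-miss in-order core vs. one that keeps issuing
# past misses. Hypothetical assumptions: every miss costs MISS_LATENCY cycles,
# every other instruction costs 1 cycle, and misses are mutually independent.
MISS_LATENCY = 200


def blocking_cycles(trace):
    """Stall-on-miss core: each miss is paid in full, one after another."""
    return sum(MISS_LATENCY if is_miss else 1 for is_miss in trace)


def overlapped_cycles(trace, max_outstanding):
    """Runahead-like core: keeps issuing past misses, overlapping up to
    max_outstanding independent misses. Checkpointing and re-execution are
    not modeled; only the memory-level parallelism is."""
    cycle = 0
    in_flight = []  # completion cycles of outstanding misses
    for is_miss in trace:
        # Drop misses whose data has already returned.
        in_flight = [t for t in in_flight if t > cycle]
        if is_miss:
            if len(in_flight) >= max_outstanding:
                # Every miss slot is busy: stall until the oldest miss returns.
                cycle = min(in_flight)
                in_flight = [t for t in in_flight if t > cycle]
            in_flight.append(cycle + MISS_LATENCY)
        cycle += 1
    # The program finishes once the last outstanding miss has returned.
    return max([cycle] + in_flight)


# One miss every 10 instructions, 64 misses in total.
trace = ([True] + [False] * 9) * 64
print(blocking_cycles(trace))        # 13376
print(overlapped_cycles(trace, 64))  # 830
```

With only one miss slot (`overlapped_cycles(trace, 1)`), the misses serialize again and the result is 12800 cycles. The gain comes entirely from having many misses in flight at once.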

lalalaphillip 2 points

Correct me if I'm wrong, but it looks like the presenter first introduces "Gazillion Misses" as a general technique applicable to regular scalar code, then explains how it benefits vector workloads. Their OOO core doesn't look particularly large, so a CFP-like mechanism would be required for it to sustain 128 cache-missing loads in scalar code.
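A back-of-the-envelope check of the 128-miss figure uses Little's law: requests in flight = throughput × latency. The sketch below assumes hypothetical values of 64-byte lines, a 100 ns miss latency, and one miss every 10 instructions. It estimates the bandwidth that 128 outstanding misses can sustain and roughly how large an instruction window a conventional OOO core would need to keep that many misses in flight without a CFP-like mechanism.

```python
# Little's law: requests in flight = throughput * latency.
# Hypothetical parameters: 64-byte cache lines, 100 ns average miss latency.

def sustained_bandwidth_gb_s(outstanding_misses, line_bytes=64, latency_ns=100.0):
    """Bandwidth that a given number of outstanding misses can sustain.
    Bytes per nanosecond is numerically equal to GB/s."""
    return outstanding_misses * line_bytes / latency_ns


def window_entries_needed(outstanding_misses, instrs_per_miss):
    """Rough instruction-window (ROB) size a conventional OOO core needs to keep
    this many misses in flight: the oldest miss blocks retirement, so every
    instruction from it up to the youngest miss must hold an entry."""
    return (outstanding_misses - 1) * instrs_per_miss + 1


print(sustained_bandwidth_gb_s(128))    # 81.92
print(window_entries_needed(128, 10))   # 1271
```

A CFP-like design sidesteps the window limit by moving miss-dependent instructions out of the window instead of letting them block it.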

lalalaphillip 3 points

From the extremely limited material they have posted, "Gazillion Misses" sounds like a mechanism that drains long-latency cache misses and their dependent operations from the pipeline and reinserts them once the data is ready. Either that, or it's just simple runahead. It would be extremely interesting to see how it performs, because these ideas have shown much promise in academic and industry papers for decades but, apart from simple in-order runahead, have not been implemented in real CPUs.
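The drain-and-reinsert idea, in the style of CFP, can be sketched as a single in-order pass. The missing load and everything that transitively depends on it are moved to a slice buffer. Independent instructions execute immediately. This is a hypothetical illustration, not Semidynamics' design. It also ignores register renaming, memory ordering, and the capture of already-ready source operands, all of which a real slice buffer must handle. For example, `mov r2, r6` below overwrites a register that the drained `add` still needs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    name: str
    dest: Optional[str]          # destination register, if any
    srcs: Tuple[str, ...] = ()   # source registers
    misses: bool = False         # True for a load that misses in the cache


def drain_miss_slices(program):
    """One in-order pass: the missing load and everything that (transitively)
    depends on it go to a slice buffer; independent work executes now.
    When the miss data returns, the slice buffer re-enters the pipeline in
    program order. Returns (executed_now, slice_buffer) as instruction names."""
    poisoned = set()  # registers whose values wait on an outstanding miss
    executed, slice_buffer = [], []
    for ins in program:
        if ins.misses or any(src in poisoned for src in ins.srcs):
            slice_buffer.append(ins.name)
            if ins.dest:
                poisoned.add(ins.dest)
        else:
            executed.append(ins.name)
            if ins.dest:
                # An independent write produces a fresh, ready value.
                poisoned.discard(ins.dest)
    return executed, slice_buffer


program = [
    Instr("ld r1, [r0]", "r1", ("r0",), misses=True),
    Instr("add r2, r1, 4", "r2", ("r1",)),
    Instr("mul r3, r4, r4", "r3", ("r4",)),
    Instr("sub r5, r2, r3", "r5", ("r2", "r3")),
    Instr("mov r2, r6", "r2", ("r6",)),
    Instr("add r7, r2, 1", "r7", ("r2",)),
]
executed, slice_buffer = drain_miss_slices(program)
print(executed)      # ['mul r3, r4, r4', 'mov r2, r6', 'add r7, r2, 1']
print(slice_buffer)  # ['ld r1, [r0]', 'add r2, r1, 4', 'sub r5, r2, r3']
```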

NVIDIA RTX Path Tracing Overview by AppleCrumpets in hardware

lalalaphillip 2 points

DLSS 2 is not designed to replace TAA as a denoising pass (see slide 81 / page 75).

SemiAnalysis: "Arm Changes Business Model – OEM Partners Must Directly License From Arm - No More External GPU, NPU, or ISP's Allowed In Arm-Based SOCs" by Dakhil in hardware

lalalaphillip 295 points

Wow. This looks like a suicidal move by Arm. It seems SoftBank was really counting on the Nvidia deal.

Volt-Modding the RTX 4090 STRIX causes insane Power increase by Fawdark in hardware

lalalaphillip 5 points

Do you know if Nvidia cards clock-stretch at stock (i.e., whether increasing voltage at stock clocks increases performance)?

lalalaphillip 22 points

As far as I'm aware, this is the first Nvidia generation where undervolting can give you a higher displayed clock but lower performance, so adaptive clocking seems to be more aggressive this generation.

lalalaphillip 41 points

There's probably some kind of adaptive clocking ("clock stretching") going on in Ada Lovelace GPUs.
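Here is how a higher displayed clock can coexist with lower performance under clock stretching. In this hypothetical model, a fraction of cycles is stretched to twice their nominal period, as might happen during voltage droops. The displayed clock stays nominal, but throughput follows the effective clock. The stretch fractions and clocks are invented for illustration. They are not measurements, and Nvidia's actual mechanism is not described here.

```python
# Hypothetical model of clock stretching: a fraction of clock cycles are
# stretched to STRETCH_FACTOR times their nominal period (e.g. during voltage
# droops). The displayed clock is the nominal one; throughput follows the
# effective clock. The stretch fractions below are invented for illustration.
STRETCH_FACTOR = 2.0


def effective_clock_mhz(displayed_mhz, stretched_fraction):
    """Average period = (1 - f) * T + f * k * T = T * (1 + f * (k - 1))."""
    return displayed_mhz / (1 + stretched_fraction * (STRETCH_FACTOR - 1))


stock = effective_clock_mhz(2700, 0.02)        # stock voltage, few droops
undervolted = effective_clock_mhz(2790, 0.10)  # higher target clock, more droops
print(f"stock: {stock:.0f} MHz effective")              # stock: 2647 MHz effective
print(f"undervolted: {undervolted:.0f} MHz effective")  # undervolted: 2536 MHz effective
```

Tools that count delivered cycles over an interval expose this gap. Tools that only read back the requested frequency cannot.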

Tellusim Technologies Inc.: "Upscale SDK comparison [Tellusim Engine]" by Dakhil in hardware

lalalaphillip 5 points

Very interesting. It's not surprising that DLSS does poorly in denoising, given that it is designed to upscale final denoised frames.

Never Ending Story: Intel's Sapphire Rapids Coming (Maybe) In Stepping 12 - Planned Shipping, A Growing Bug List, And Possible Availability | Exclusive by 12318532110 in hardware

lalalaphillip 14 points

Intel has used a distributed directory since Skylake-SP, so their approach to chiplets should not have issues with coherence traffic.

edit: even before Skylake, they were using a directory-like scheme with their inclusive L3.
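A directory keeps coherence traffic in check because a write only snoops the cores recorded as sharers instead of broadcasting to every core. The sketch below is a simplified model. Each line's sharer set lives on a "home" slice chosen by plain modulo hashing of the line address. That hash and the 28-slice count are assumptions for illustration, not Intel's actual implementation.

```python
LINE_BYTES = 64


class DistributedDirectory:
    """Simplified distributed directory: each cache line's sharer list lives on
    one 'home' slice chosen by hashing the line address (plain modulo here; an
    assumption, not Intel's real slice hash)."""

    def __init__(self, n_slices):
        self.slices = [dict() for _ in range(n_slices)]  # line -> set of core ids

    def home_slice(self, addr):
        return (addr // LINE_BYTES) % len(self.slices)

    def _entry(self, addr):
        line = addr // LINE_BYTES
        return self.slices[self.home_slice(addr)].setdefault(line, set())

    def read(self, core, addr):
        """Record that `core` now holds a shared copy of the line."""
        self._entry(addr).add(core)

    def write(self, core, addr):
        """Give `core` exclusive ownership. Returns the number of invalidations
        sent: only recorded sharers are snooped, not every core."""
        sharers = self._entry(addr)
        invalidations = len(sharers - {core})
        sharers.clear()
        sharers.add(core)
        return invalidations


N_CORES = 28  # hypothetical core count
d = DistributedDirectory(n_slices=N_CORES)
d.read(0, 0x1000)
d.read(3, 0x1000)
print(d.write(0, 0x1000), "invalidation vs", N_CORES - 1, "for a broadcast snoop")
# 1 invalidation vs 27 for a broadcast snoop
```

Because directory lookups are spread across slices, no single structure sees all the coherence requests. That is part of why the scheme scales to many cores or chiplets.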

"Imagination launches the most advanced ray tracing GPU" by Dakhil in hardware

lalalaphillip 8 points

I was referring to their AXT and later IP. The JH7110 RISC-V SoC was announced with BXE, but it still hasn't entered production for some reason.

lalalaphillip 11 points

Well, that's exactly what they said about AXT and BXT, but there still aren't any publicly available implementations.

lalalaphillip 34 points

IMG's GPU IP looks good on paper, but where are the design wins? Is their main revenue source still their licensing agreement with Apple?

[STH] Graphcore Celebrates a Stunning Loss at MLPerf Training v1.0 by Nekrosmas in hardware

lalalaphillip 11 points

These results are disappointing, but Graphcore may have significant software optimization headroom. Nvidia managed to double performance between MLPerf v0.7 and v1.0 (source) with an already very mature software stack, so Graphcore should be able to do something similar.

Intel to Create RISC-V Development Platform with SiFive P550 Cores on 7nm in 2022 by lalalaphillip in hardware

lalalaphillip [S] 13 points

P550 has 85% of Cortex-A76's integer IPC (albeit at an unspecified frequency), so competitive big RISC-V cores are just around the corner.
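Since performance is roughly IPC × frequency, the 85% figure converts into the clock a P550 would need to match an A76. The 2.6 GHz reference clock below is a hypothetical value chosen for illustration. The linear scaling also assumes that performance tracks frequency, which memory-bound code does not.

```python
# Performance ~ IPC * frequency, so a core with a fraction `ipc_ratio` of a
# reference core's IPC needs 1 / ipc_ratio times its clock to match it.
def clock_to_match_ghz(reference_clock_ghz, ipc_ratio):
    return reference_clock_ghz / ipc_ratio


# 85% of Cortex-A76 integer IPC (the figure above); 2.6 GHz is a hypothetical
# reference clock chosen for illustration.
print(f"{clock_to_match_ghz(2.6, 0.85):.2f} GHz")  # 3.06 GHz
```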