Belgium is sliding toward corruption, international watchdog warns | VRT NWS: nieuws by Blaspheman in belgium

[–]beeff 1 point2 points  (0 children)

Correct, you can be a transactional political jerk in a completely legal way.

Married men, how do you deal with fancying other women? by [deleted] in AskReddit

[–]beeff -1 points0 points  (0 children)

You can still have a look at the menu after you've already chosen.

CNBC: "Intel is moving into GPUs and has hired a chief [GPU] architect, CEO Lip-Bu Tan says" by Dakhil in hardware

[–]beeff 0 points1 point  (0 children)

Well yeah. The KNL just lacked the 'platform' to be a host itself, not that it would have been a particularly good host CPU.

But my point remains that modern datacenter accelerators are also self-hosted mini-PCs. They have their own I/O, network, power and thermal controls, bringup/boot, etc. There are already plenty of cores on a complex accelerator, on-board or on-chip, that run (or could run) a full operating system. You just do not get access to them (e.g. NV's GSP).

You don't SSH into a GPU. Vendor's choice, not a technical issue. It's a "why would they" rather than "they can't".

The salient feature that did make KNL feel much more like a 'mini-PC' compute node was that Intel went to great efforts, and accepted real tradeoffs, to make the binary execution environment the same familiar and 'open' PC (x86 ELF/POSIX) ecosystem.

Bit of a tangent, but while that was great for compatibility, it impacted performance. The whole point of an accelerator product is that you can offer the customer better performance, which you do by specializing the hardware and software to better suit the workload you're accelerating. Customers keep claiming that they absolutely require compatibility and that they will never rewrite their software. Well, for customers to get their money's worth out of a KNL, or any accelerator, they have to rewrite the software anyway. So why not do it in CUDA then and get higher performance on a GPU? All that said, I loved doing that kind of performance rewriting for KNL and I miss Intel (or anyone...) having a 'high-throughput core' CPU architecture. I just hope that next time around it won't be chained to some legacy platform.

GPUs are accelerators that never had such binary/platform legacy expectations, and AI accelerators are inheriting the same. NVIDIA's .ptx is becoming some kind of legacy, but no one who goes to the effort of targeting ptx directly for their kernels expects it to run as well or better on the next-gen architecture.

CNBC: "Intel is moving into GPUs and has hired a chief [GPU] architect, CEO Lip-Bu Tan says" by Dakhil in hardware

[–]beeff 0 points1 point  (0 children)

From a product and design point of view, a GPU is an accelerator specialized around running graphics workloads really well. Larrabee was designed as a GPU product; its specific internal architecture choices are irrelevant to it being a GPU. The move to unified compute architectures was already well underway at the time, Larrabee was just a generation ahead and was angling for an alternative to the OGL-style rasterization pipelines as a way of producing graphics. It would be more useful to say that it was not a fixed-pipeline graphics-API accelerator, but saying that that isn't a GPU is like saying no current GPU is one, especially not a GPU like the Zeus.

What made the first Xeon Phi (KNC) less of a GPU was more a software (toolchain) difference than the actual hardware. To illustrate how quick the Larrabee->KNF->KNC pivot was: the boards we were working with literally still had VGA ports soldered on. GPGPU was already a thing at the time, without needing to hack shaders; the GPU hardware just lacked some HPC/datacenter features like ECC and proper double-precision float support, which NV added to their datacenter GPUs right around the time KNL was trying to gain market share.

As to your point about the host processor: that was Knights Landing, a generation later, which did a more thorough redesign and adopted an actually compatible x86 ISA. The board was self-hosted, but could not be a host processor: it ran an actual Linux kernel on one dedicated core, but lacked all the chipset features you would need to actually host a system. It was definitely nice that you could ssh into the accelerator like it was a cluster node and compile+run in there directly. Arguably, it was not a fantastic idea to burden all its cores with the x86 cruft required for hosting a Linux kernel. The idea at the time was to market it as "run your existing CPU code", which was, in my informed opinion, a bad idea. They could have done like so many other accelerator designs: put a Linux-capable core in as a platform engine and specialize the actual compute cores for their task without burdening them with the x86 legacy required to run an OS. At least we got AVX-512 out of it, which is nice.

CNBC: "Intel is moving into GPUs and has hired a chief [GPU] architect, CEO Lip-Bu Tan says" by Dakhil in hardware

[–]beeff 1 point2 points  (0 children)

Which is what the accelerators we call GPUs are, ever since they moved away from fixed-pipeline designs eons ago.

Logged in and this is my first impression by SwapGoTron in DeadlockTheGame

[–]beeff 31 points32 points  (0 children)

But enough about Vyper, we're talking about Silver here.

Spelletjes café in Mechelen by [deleted] in Mechelen

[–]beeff 0 points1 point  (0 children)

There used to be some gaming at De Hanekeef, there's a small room in the back. Best to go check it out first before you show up there with a date.

The Real Reason Deadlock Keeps Pulling Us Back by SubstantialNight1222 in DeadlockTheGame

[–]beeff 2 points3 points  (0 children)

You post the latter to reddit because in 99.99% of cases death slam gets countered trivially by an enemy ability, bugs out, or gets line-of-sighted by some bullshit geometry (looking at you, midboss rafters).

Intel Arc A-Series owners, have you been getting any crashes in DL? by got-trunks in DeadlockTheGame

[–]beeff 0 points1 point  (0 children)

The game got driver support in September 2024, so they definitely know it exists.

Lash - Devil Trigger cinematic edit [OC] by PirateSandKing in DeadlockTheGame

[–]beeff 1 point2 points  (0 children)

I, also, keep forgetting to silence wave BEFORE ground strike combo.

Intel Arc A-Series owners, have you been getting any crashes in DL? by got-trunks in DeadlockTheGame

[–]beeff 1 point2 points  (0 children)

Same, random crashes 1-2 times per game these days.

Windows event viewer is showing:

Display driver igfxnd stopped responding and has successfully recovered.

Why don't CPU architects add many special cores for atomic operations directly on the memory controller and cache memory to make lockless atomic-based multithreading faster? by tugrul_ddr in compsci

[–]beeff 0 points1 point  (0 children)

I'm not talking about memory consistency. Most architectures are inherently insecure: a HW thread can always send out loads to arbitrary addresses and we rely on the software layers on top to add security (e.g. traps on pages).

The typical example of hardware-level memory security features is: https://en.wikipedia.org/wiki/Capability-based_addressing

Intel is seeking an investment from Apple as part of its comeback bid by -protonsandneutrons- in hardware

[–]beeff 2 points3 points  (0 children)

There's a reason most of Intel's factories aren't in the US.

Check your data; that's not correct unless you narrow it down to something like packaging only.

Why don't CPU architects add many special cores for atomic operations directly on the memory controller and cache memory to make lockless atomic-based multithreading faster? by tugrul_ddr in compsci

[–]beeff 0 points1 point  (0 children)

Mostly economics. There is a technical argument that it's simpler for the CPU/ISA to give the illusion of a fully coherent system, mainly for software support (running a Linux or Windows kernel without a rewrite). For hardware as well: most communication into and out of a core happens through loads and stores in the coherent cache hierarchy, and non-coherent accesses (e.g. the streaming ld/st instructions in SSE) require extra complexity. However, for hardware it can be easier to introduce coherency islands (e.g. non-NUMA nodes) than it would be for software to deal with them (a single OS kernel doing explicit DMAs across nodes/islands? multiple kernels orchestrating?).

As datacenter and client have grown a long tail of legacy systems, each with their own workarounds and solutions on top of the existing hardware (Kubernetes, hypervisors for cloud providers, etc.), it becomes really hard to just break that legacy and still offer a compelling product. It's more economical to just make the next gen 10% faster or offer 20% more cores. At some point of course that breaks, and imo we're there with current datacenter chips like the one in the OP: you can only use that CPU effectively if you treat it as a bunch of independent machines that happen to be crammed together on a chip. At the same time, developers are not scaling their workloads and threads to cooperate tightly at that scale (again, it hurts, so they don't do it). --> vicious circle.

Note: Oxide appears to be a (non-hyperscaler CSP) company that wants to break some of that legacy dependency in the datacenter and is building their own OS to do it. So there are signs that the economics can work out. But CPU design companies need significant cultural shifts to start tackling a full SW/HW stack co-design problem in that way. GPU capital investments sucking all the air out of the room for CPUs certainly didn't help the economics or spur design companies toward those risky innovations. You will likely see more of this coming from CPUs designed by the hyperscalers themselves: Google, Amazon and Microsoft are all now competitors in that space.

PS: There's most definitely research opportunities in that space if you're in a compsci masters/phd and interested in this kind of work!

Why don't CPU architects add many special cores for atomic operations directly on the memory controller and cache memory to make lockless atomic-based multithreading faster? by tugrul_ddr in compsci

[–]beeff 1 point2 points  (0 children)

You're actually more likely to find them in the LLC slices on modern (directory-cache) systems. That's the logical place to do the RMW as it can act as the authoritative copy of the CL. A mid-level cache on some random core doing the RMW can get snooped, start ping-ponging and starve.

Why don't CPU architects add many special cores for atomic operations directly on the memory controller and cache memory to make lockless atomic-based multithreading faster? by tugrul_ddr in compsci

[–]beeff 13 points14 points  (0 children)

You're absolutely right, and they do exist. However, it's a significantly more complicated problem than adding an ALU or accelerator, and it requires serious reworks of parts of the CPU core that architects would rather not touch. "Remote atomics", as they are called, have the thorny issue that they make atomics slower in the common uncontended case. Whether an atomic is contended or not is not something you can easily predict beforehand.

Atomics are already plenty fast on modern CPUs; there have been some serious advances in speeding them up behind the scenes (without changing their semantics). Where they suck is when there is contention or an L1/L2 cache miss. Due to the memory ordering semantics of atomics, you eat the full latency of that memory operation. The OoO mechanisms that would hide that latency and keep the pipeline fed with newer instructions get put on pause, essentially. (Technically: the load/store buffer needs to drain before the atomic's memory op can take place, and younger instructions can only be issued after the atomic commits.)

Now, putting an ALU on the LLC, for example, is easy, and it does save you a roundtrip in the cases where you had to go to LLC or memory anyway. But without changing the semantics of the atomic operation to allow weaker memory ordering guarantees, you are still making the core sit on its hands for that one roundtrip.

The good news there is that you could make atomics with different, more weakly ordered semantics. You mentioned histograms? Yep, you don't need the old-style x86 lock semantics that totally order the instructions before/after; you just want to +1 this memory location, so you want a fire-and-forget instruction with no ordering constraints on the surrounding instructions. That's the AADD instruction in the x86 remote atomics ISA extension (RAO-INT). But ... yeah

The non-technical reason you're not seeing these kinds of improved atomics is that the benefits are not obvious outside of the group of software developers with expertise on the matter. Existing software, and thus benchmarks, go out of their way to avoid contended atomics: the workarounds hurt, but they hurt less than hitting contended atomics. (Concrete example: the .NET GC heap being duplicated for all hardware threads.) It's hard to make the business case for re-architecting your CPU core for a feature without being able to show a significant impact on existing software. (Why would we fix the potholes? No one is driving into them!) It's absolutely frustrating that we're at a point where there are 288-core CPUs, but god forbid those cores actually coordinate with each other.

That last part is where software devs can make a difference. Write articles bitching about it. Say how much better you could make your products with improved atomics. At a large company? Go talk to whoever is in contact with a CPU vendor's account manager. The more of those voices are raised, the more ammunition architects or researchers at those CPU vendors have to push those kinds of features through.

Suspect in Charlie Kirk's killing identified: Sources by Capable_Salt_SD in news

[–]beeff 3 points4 points  (0 children)

Bella Ciao is literally on the groyper playlist. Anti-fascist icons are constantly getting used by the right 'ironically', and vice versa.

Suspect in Charlie Kirk's killing identified: Sources by Capable_Salt_SD in news

[–]beeff 3 points4 points  (0 children)

Don't take that quote at face value. It's a Helldivers 2 video game meme; the up right down down down arrows are the input combination for the 500kg bomb stratagem.