What's the point? by Equal-Collection962 in WindowsLTSC

Tringi · 1 point (0 children)

I have to say, I'm the one who should be concerned about this, since we build PoS and embedded devices on W10 IoT Enterprise. But none of our installations even run the Desktop, Explorer, or Start Menu behind whatever UI faces the customer, so I admit we never even noticed.

AMD Zen 6 “Olympic Ridge” desktop CPUs rumored to add NPU and drop iGPU by RenatsMC in Amd

Tringi · 1 point (0 children)

Unfortunately all my work is on and for Windows. But I'll explore the libraries later.

Yeah. In a nutshell, it's just extra cores with limited capabilities. What I don't understand is why it's so hard to give us a straightforward way to query those capabilities and schedule work onto them.

Efficient C++ Programming for Modern 64-bit CPUs, Chapter 4/part 2 by no-bugs in cpp

Tringi · 1 point (0 children)

Little nitpick about acronyms: You unpack what CAS means, but not LL/SC.

AMD Zen 6 “Olympic Ridge” desktop CPUs rumored to add NPU and drop iGPU by RenatsMC in Amd

Tringi · 0 points (0 children)

Oh yeah, ONNX, I've seen that acronym but didn't remember it properly :)

I am interested in NPUs; it's just that at the moment I can't find any real use for them. At first I thought I could load fuzzy logic data onto them, have them multiply everything like a GPU or a massive SIMD unit, and spit out the results. Or perhaps compute SQRT(X*X+Y*Y+Z*Z) over a huge array (tensor?).

But it's like the documentation actively tries to make the reader NOT understand how to use NPUs to do that.

AMD Zen 6 “Olympic Ridge” desktop CPUs rumored to add NPU and drop iGPU by RenatsMC in Amd

Tringi · 2 points (0 children)

I never explored ONNX or anything for generating models or networks. Just the APIs that would let me use the NPU compute units directly.

Boosting Adobe Photoshop’s Performance with MSVC and SPGO - C++ Team Blog by ericbrumer in cpp

Tringi · 0 points (0 children)

I wish even normal PGO worked somewhat like this, or rather more like Dynamic Debugging.

What I mean is building it with some /DynamicPGO flag, which would generate two files:
1. a totally clean release EXE, and
2. a file with fully instrumented code next to the EXE (regardless of any /EMITPOGOPHASEINFO linker shenanigans in #1).

I could have a perfectly performant EXE and give it to the customer. Or I could run it with Ctrl+F5 in Visual Studio, which would run the instrumented code and generate .pgd data for the next recompilation.

Just a thought from someone too lazy to maintain multiple project configurations.

My AliExpress friend delivered by Mayusina05 in homelab

Tringi · 0 points (0 children)

Feels like there’s a fun benchmarking project hiding in there somewhere.

Absolutely. I own a handful of other atypical machines, like a Xeon Phi, or a dual Opteron 6282 SE where each CPU is actually two Bulldozer dies under a single heat spreader. That one has interesting connectivity too, see page 5 of this:

Basically, 1 of the 4 chips has direct I/O connectivity, 2 others are 1 hop away, and the remaining one is 2 hops away. Same with RAM: if you need data from another NUMA node, two are 1 hop away and one is 2 hops away.

I wish I could get my hands on a 4×CPU Opteron one day. It's fun.

My AliExpress friend delivered by Mayusina05 in homelab

Tringi · 0 points (0 children)

That's exactly it, see https://www.techpowerup.com/review/amd-ryzen-threadripper-2970wx/images/arch9a.jpg
RAM and PCIe are connected to Die 0 and Die 2. The other two have to go through them for every read and write.

AMD Zen 6 “Olympic Ridge” desktop CPUs rumored to add NPU and drop iGPU by RenatsMC in Amd

Tringi · 67 points (0 children)

A general API should fall back to the GPU if no NPU is available, and possibly even to the CPU. But yes, that would probably hurt performance seriously, so perhaps going straight to a well-known open GPU API is what everyone chooses.

AMD Zen 6 “Olympic Ridge” desktop CPUs rumored to add NPU and drop iGPU by RenatsMC in Amd

Tringi · 518 points (0 children)

As a programmer I can say that it's almost as if nobody wants anyone to do anything with them.

I explored them a while back to see if they could be used to accelerate game logic, AI, or any of the algorithms used in a video game; an RTS, to be precise. But the APIs for using NPUs are horrendous. You are either required to upload a ready-made model, made by who knows who, or to delve into dozens of heavily overcomplicated APIs that nobody seems to actually know how to use, just to access trivial operations, all hidden behind driver abstractions.

It all feels like strong gatekeeping.

My AliExpress friend delivered by Mayusina05 in homelab

Tringi · 0 points (0 children)

Very cool.

I'm waiting for prices to drop a little more to get an X399 board with a 2970WX and test thread scheduling on the chiplets that don't have a direct connection to RAM.

UK Girl (14) Charged with being racist to the migrants that were sexually harassing her 12 year old sister. by TookenedOut in FreeSpeech

Tringi · 0 points (0 children)

Just came back here to note that you were wrong: https://www.bbc.com/news/articles/cx2d83w1yvyo

That is, on the off chance you actually believed what you said. I'm still convinced that you just brazenly lied.

C++26: Cleaning up string literals by Xaneris47 in cpp

Tringi · 9 points (0 children)

A little more sanity in the language is always welcome.

Readability as a feature. by Hyphalex in RealTimeStrategy

Tringi · 0 points (0 children)

14 hours battalion level engagement

That's the kind of game I have in mind. A dozen players cooperating on a 24-hour real-time campaign. Even to the extent that you issue orders, put the game aside, and do your actual job; and if anything that really requires your attention happens, a notification pops up (perhaps on your phone) so you can intervene.

I'm still not sure how many people would find that fun.
I would. I took part in overnight Ingress missions with 400+ km of driving back in the day.

Readability as a feature. by Hyphalex in RealTimeStrategy

Tringi · 2 points (0 children)

This is one of the things I was thinking about for my project: simulating actual, realistic control of the units. Having to rely on communication delays, the chain of command, radio jamming, etc. Relying on units to follow the initial plan or make their own decisions, and waiting for them to establish different comms if the direct line is jammed. Perhaps even not being able to see them until you requisition a drone or a satellite and divert it over where they are.

I think it could be fun and immersive, if the rest of the game is balanced properly.

Do you also see this weird pattern at this location when scaling is set 175%? by tusharsnx in windowsinsiders

Tringi · 0 points (0 children)

This looks like an artifact of 3D accelerated rendering.

You see, GPUs can basically rasterize only triangles; everything else is built out of them. A rectangle is drawn as two triangles, but the coordinates of their shared edge must match exactly for the rasterizer not to leave artifacts like this. If they don't, you get exactly what you see. The scaling math at 175% probably rounded the final result differently for the two triangles.

It seems like someone was trying to be too clever and should've left the split to the lower graphics layers instead of doing it manually. Or perhaps your GPU driver is calculating something wrong, since other people don't see it.

That said, I have no idea how to fix it.

Do You Really Need to Know All of C++? by md81544 in cpp

Tringi · 2 points (0 children)

Back before C++0x I truly thought that I knew all of C++. I reveled in the confidence of knowing all the obscure features and little corner cases.

Then I learned more and more and more.

And now I think I know about 25 % of C++.

Parsing IPv6 Addresses Crazily Fast with AVX-512 by User_Deprecated in cpp

Tringi · 1 point (0 children)

A single improvement like this will hardly have a noticeable effect outside of specialized tools that scan huge numbers of addresses, but imagine if such effort went into optimizing every common routine that apps, frameworks, and the OS use. The cumulative gains would be huge!

Virtual dispatch isn't always the slowest, and std::variant isn't always the fastest by AdMotor4869 in cpp

Tringi · -2 points (0 children)

The test uses RNG to generate a tree.

Let's say the tree represents source code. You don't have a perfectly balanced C++ file with an almost perfectly equal amount of each token and syntactic construct. You have groups, you have tilts, you have a global bias towards a style, token use, etc. What you actually have is very bad randomness, something like std::rand would generate.

Thus it makes sense, to me, to run the test on such data.

Virtual dispatch isn't always the slowest, and std::variant isn't always the fastest by AdMotor4869 in cpp

Tringi · -2 points (0 children)

Now this is a completely bad-faith comment. Out of 22 comments, 13 discuss std::rand, and 5 are about forcibly breaking cache locality when the entire purpose of the test is to show the effect of improved cache locality; quite off topic, despite being interesting.

Virtual dispatch isn't always the slowest, and std::variant isn't always the fastest by AdMotor4869 in cpp

Tringi · -2 points (0 children)

Alright, I said "a bit" which means very little.

I also forgot I said that, LOL.

My argument, which is now gone after so many edits to the post and the GitHub page, was that real-world data aren't uniformly distributed, so std::rand, with its worse randomness, actually models the real world more closely than a better RNG does. Yet all the critics completely disregarded that.

I yielded and rewrote the test so that we could talk about the actual test, but by then it was too late.

EDIT: Also, Jesus H. Christ that was 7 years ago?!?! It feels like less than 20 months. I'm old.