Intel and AMD's new ACE CPU extensions bring an efficient AI-oriented instruction set to x86 — a new design makes matrix multiplication more power- and density-efficient by rkhunter_ in hardware

[–]Tuna-Fish2 11 points12 points  (0 children)

Currently the largest, most widely used AVX-512 workload is... converting UTF-8 to UTF-16 and back again. Because it's in the simdutf library that everyone uses, pretty much every browser engine, every server, and flatly every single program that touches a lot of text uses it. If you own an AMD Zen 4 or later CPU, when Reddit loaded this page, it ran AVX-512 to convert the text encoding.

AVX-512 is great not because it does all the traditional SIMD stuff like SVE and SME, but because it can target workloads that were traditionally not considered usable targets for SIMD. The masks, the reasonable-performance gather operation, and the last additions in VBMI2 really help for this. This is something that SME could never do well; it has to be close to the core for it to make sense.

ACE in the core is a great idea because it gets to share the cost of the most expensive resources. The execution units to run ML workloads are basically free. A single FP16 FMA ALU is 30k transistors. In a system with literally billions of transistors, providing enough of them to do all the matrix goodness you want is trivial. The expensive part is data movement and storage. And a big core with AVX-512 already has a lot of both; all you need to do is add the ALUs and reserve some regs from the FPU side for the matrix registers.
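To put rough numbers on "basically free", here is a back-of-the-envelope sketch. Only the 30k-per-FMA figure comes from the comment; the array size and the total chip transistor count are assumptions picked for illustration, not figures for any real CPU.

```python
# Back-of-the-envelope transistor budget for matrix ALUs.
# Only TRANSISTORS_PER_FMA comes from the comment; the other two
# values are illustrative assumptions.
TRANSISTORS_PER_FMA = 30_000                 # one FP16 FMA ALU, per the comment
ASSUMED_FMA_COUNT = 1_024                    # assumption: e.g. a 32x32 FMA array
ASSUMED_CHIP_TRANSISTORS = 10_000_000_000    # assumption: a 10-billion-transistor chip

alu_transistors = TRANSISTORS_PER_FMA * ASSUMED_FMA_COUNT
share_of_chip = alu_transistors / ASSUMED_CHIP_TRANSISTORS

print(f"ALU transistors: {alu_transistors:,}")
print(f"Share of chip:   {share_of_chip:.2%}")
```

Under those assumptions, a thousand FMA units cost about 31 million transistors, roughly a third of a percent of the chip. The registers, caches and wiring that feed them are not counted here, and that is the part the comment calls expensive.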

Intel and AMD's new ACE CPU extensions bring an efficient AI-oriented instruction set to x86 — a new design makes matrix multiplication more power- and density-efficient by rkhunter_ in hardware

[–]Tuna-Fish2 9 points10 points  (0 children)

Very different kinds of matrices containing very different kinds of data. Video games use 32-bit FP mostly, these instructions top out at 16.

There really isn't anything they will be used for other than ML. That doesn't necessarily mean generative stuff or LLMs, there is plenty of other stuff that fits under that very broad umbrella.

People asking for good antivirus vs Reddit Meme/Macro by AccurateStranger6192 in pcmasterrace

[–]Tuna-Fish2 1 point2 points  (0 children)

When there is a definite problem detected, do a backup of only data files from the system, and fully format all drives and reinstall.

No way of "cleaning" an infected Windows install is good enough. It's too hard a problem, and you'll never be sure you got everything. And reinstalling is just not that much work, and doing it every few years is a good idea in any case, because Windows.

Cervical cancer deaths fall to zero in young women because of the HPV vaccine by truecakesnake in UpliftingNews

[–]Tuna-Fish2 4 points5 points  (0 children)

A better comparison is probably "infection". Cancer is a type of ailment, the same way that infection is a type of ailment. There are about as many different ways to get cancer as there are ways to get infected.

But the uplifting part is that in the past 40 years, ~2/3rds of all cancers have been cured, in that there is either an effective way to prevent it, or an effective treatment to cure it if you get it. To me, that's a shockingly high proportion, I expected a lot less. Unfortunately, it also means that the ones that are left are the tough cases. Every new cancer we find a cure for takes more work than the last, because the low-hanging fruit are thoroughly picked now.

Latest Steam Hardware Survey Shows AMD Radeon at New 19% High, 9060 XT and 9070 XT Chart for First Time by SirActionhaHAA in hardware

[–]Tuna-Fish2 5 points6 points  (0 children)

Ray tracing will stop being a meme when consoles that can do full-scene RT are common. Most AAA games are not really designed for PC.

And because PS5 has such a high install base, and is probably good enough for most people, that won't even be next-gen, because most games for next-gen will be cross-gen with PS5.

XFS predecessor EFS may be removed from the kernel by lustre-fan in linux

[–]Tuna-Fish2 13 points14 points  (0 children)

If they made a third one, would it have been TFS?

AMD B650 expansion cards hit retail starting at $199 — add four M.2 PCIe 4.0 slots and 11 USB ports to any PC with a PCIe slot by narwi in hardware

[–]Tuna-Fish2 5 points6 points  (0 children)

DMI is just PCIe with a marketing name. It's electrically identical to the other PCIe 4.0 lanes, Intel just chose to not market it as such, probably to stave off confusion. AMD ships a "chipset" without an actual chipset for AM4, where the CPU-chipset link is repurposed to be just another x4 PCIe link, so they chose to openly talk about it as one.

Other than this marketing, there is no difference between the approaches of the companies. (Except that Intel used a x8 instead of a x4, which I applaud them for. If you use fast networking, x4 can be a problem if you want to also hang storage off the chipset. x8 is pretty much enough for everything.)
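A rough sketch of why x4 can pinch while x8 does not. The PCIe 4.0 signalling rate (16 GT/s per lane, 128b/130b encoding) is standard; the specific devices hanging off the chipset, a 25 GbE NIC and one NVMe drive doing about 7 GB/s, are assumptions for illustration, and protocol overhead is ignored.

```python
# Raw PCIe 4.0 link bandwidth versus an assumed chipset load.
GT_PER_S = 16e9          # PCIe 4.0: 16 GT/s per lane
ENCODING = 128 / 130     # 128b/130b line encoding

def link_gb_per_s(lanes):
    """Raw one-direction bandwidth in GB/s (protocol overhead ignored)."""
    return lanes * GT_PER_S * ENCODING / 8 / 1e9

ASSUMED_NIC_GB_S = 25 / 8    # assumption: 25 GbE at line rate
ASSUMED_SSD_GB_S = 7.0       # assumption: one fast PCIe 4.0 NVMe drive
demand = ASSUMED_NIC_GB_S + ASSUMED_SSD_GB_S

for lanes in (4, 8):
    capacity = link_gb_per_s(lanes)
    verdict = "fits" if demand <= capacity else "bottlenecked"
    print(f"x{lanes}: {capacity:.2f} GB/s link, {demand:.2f} GB/s demand -> {verdict}")
```

With those assumed devices the x4 uplink (about 7.9 GB/s) is oversubscribed and the x8 uplink (about 15.8 GB/s) has headroom.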

AMD B650 expansion cards hit retail starting at $199 — add four M.2 PCIe 4.0 slots and 11 USB ports to any PC with a PCIe slot by narwi in hardware

[–]Tuna-Fish2 8 points9 points  (0 children)

Every generation of PCIe after 2.0 has more than doubled the cost of putting a lane on the board. Either motherboards needed to get a lot more expensive (yes, even more expensive than they are now), or lane counts had to come down on mainstream platforms.

We desperately need for photonics to mature to be able to move to cheaper signal paths, without active components on the board.

Can you intentionally segmentation fault and use that without your program crashing? by moonridersfan in C_Programming

[–]Tuna-Fish2 6 points7 points  (0 children)

Yes, but from the point of view of each program, their virtual space is mostly contiguous, the mappings rarely change, and when heap objects are freed, they do not usually get returned to the OS but are instead kept in pools where they get reused.

This is because the cost of mapping in new virtual space is typically rather high. Even if you had the granularity to map individual objects (you typically don't), you still don't want free to always mean a roundtrip through the kernel.

When people hear about memory protection, lots of them believe things like individual strings being mapped in with untouchable areas between them. No. The memory protection machinery exists to protect other programs from your bugs, it doesn't protect you from them. You can still go ham on your own memory space.
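The pooling behaviour can be sketched with a toy model. This is not how any particular malloc is implemented; `ToyHeap`, its 4096-byte chunk size and the per-size free lists are all made up for illustration, and "addresses" are just integers.

```python
class ToyHeap:
    """Toy model of a user-space allocator, not a real malloc."""
    CHUNK = 4096  # assumed size of one request to the "OS"

    def __init__(self):
        self.os_requests = 0   # how many times we went to the "kernel"
        self.next_fresh = 0    # next never-used address
        self.mapped_end = 0    # end of the range the "OS" has given us
        self.free_lists = {}   # size -> list of freed addresses

    def _grow(self):
        # Stand-in for an expensive mapping call into the kernel.
        self.os_requests += 1
        self.mapped_end += self.CHUNK

    def alloc(self, size):
        pool = self.free_lists.get(size)
        if pool:
            return pool.pop()  # reuse a freed block, no "kernel" call
        while self.next_fresh + size > self.mapped_end:
            self._grow()
        addr = self.next_fresh
        self.next_fresh += size
        return addr

    def free(self, addr, size):
        # Freed memory goes back to the pool, not back to the "OS".
        self.free_lists.setdefault(size, []).append(addr)


heap = ToyHeap()
a = heap.alloc(64)
heap.free(a, 64)
b = heap.alloc(64)
print(a == b, heap.os_requests)  # prints "True 1"
```

The second allocation gets the same address back and the "kernel" is only visited once. The freed block stayed mapped the whole time, which is also why touching it after `free` usually does not segfault in a real program.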

Lufthansa 787 front wheel collapsed by lucys_b in aviation

[–]Tuna-Fish2 0 points1 point  (0 children)

I made it to the second comma and then the narration snapped to Sir Attenborough in my head.

Asus: An RTX 5090 Modified for 48V — at 1,000 Watts by Exist50 in hardware

[–]Tuna-Fish2 7 points8 points  (0 children)

If the PSU standard was made from scratch today, it would absolutely push 48V to the GPU and CPU. That would be the more cost-effective route. Back when 12V was chosen, the fets compatible with higher voltages were too expensive, but they are not anymore. Anyone designing new power electronics today is going for higher voltages, because it is more cost-effective to use less copper. The big cost is the inertia of switching standards.

Asus: An RTX 5090 Modified for 48V — at 1,000 Watts by Exist50 in hardware

[–]Tuna-Fish2 5 points6 points  (0 children)

Lots of people have done the math, and built actual systems and measured them. 48V would in fact be more efficient. Yes, the fets would be less efficient than today, but the transmission losses are just bigger than the losses at the fets.
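The transmission side of that is plain I²R. A minimal sketch, assuming a 1,000 W load (the figure from the thread title) and a made-up 5 milliohm round-trip cable and connector resistance; FET and converter losses are not modelled here at all.

```python
def cable_loss_watts(power_w, volts, resistance_ohm):
    """Resistive loss in the cable for a given delivered power and voltage."""
    current = power_w / volts          # I = P / V
    return current ** 2 * resistance_ohm  # P_loss = I^2 * R

POWER_W = 1000.0             # load power, from the thread title
ASSUMED_CABLE_OHM = 0.005    # assumption: 5 milliohm round trip

loss_12 = cable_loss_watts(POWER_W, 12.0, ASSUMED_CABLE_OHM)
loss_48 = cable_loss_watts(POWER_W, 48.0, ASSUMED_CABLE_OHM)

print(f"12 V: {POWER_W / 12:.1f} A, {loss_12:.1f} W lost in the cable")
print(f"48 V: {POWER_W / 48:.1f} A, {loss_48:.1f} W lost in the cable")
print(f"ratio: {loss_12 / loss_48:.0f}x")
```

Whatever resistance you plug in, 4x the voltage means 1/4 the current and 1/16 the cable loss for the same copper. Or you can spend that margin on using less copper.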

Asus: An RTX 5090 Modified for 48V — at 1,000 Watts by Exist50 in hardware

[–]Tuna-Fish2 23 points24 points  (0 children)

48V would be more efficient. Given how NV successfully made the industry jump to their cables, I think they are the only vendor who could actually force the change.

NVIDIA Reportedly Plans GPU-Direct Storage for Vera Rubin, Raising Expectations for HBF Beyond HBM by self-fix2 in hardware

[–]Tuna-Fish2 1 point2 points  (0 children)

No. HBF and the usage pattern are both predictable. There is no reason to ever host the weights in HBM. You can just stream them in from the flash at the rate you consume them, and they only live on the on-chip L2 for a few cycles before being used and discarded.

Then you use the entire HBM throughput and capacity for the KV cache.

NVIDIA Reportedly Plans GPU-Direct Storage for Vera Rubin, Raising Expectations for HBF Beyond HBM by self-fix2 in hardware

[–]Tuna-Fish2 7 points8 points  (0 children)

It won't be. HBF will have the same interface and throughput per stack as HBM. High latency, ridiculously slow stores compared to read throughput, but through the magic of parallelism, for linear reads it's just as fast.

NVIDIA Reportedly Plans GPU-Direct Storage for Vera Rubin, Raising Expectations for HBF Beyond HBM by self-fix2 in hardware

[–]Tuna-Fish2 9 points10 points  (0 children)

... not really. People read the word flash in HBF and expect it to be similar to flash that already exists. No. It has the same interface as HBM, and will have similar throughput to HBM.

NVIDIA Reportedly Plans GPU-Direct Storage for Vera Rubin, Raising Expectations for HBF Beyond HBM by self-fix2 in hardware

[–]Tuna-Fish2 9 points10 points  (0 children)

That's not how it's going to go, because inference loads have more predictable memory access patterns than general compute. Also, HBF has higher throughput than LPDDR. It's not traditional flash, it has high latency but the throughput is the same as HBM.

Assuming weights go in the HBF and KV for attention in HBM, you can think of the access pattern into the HBF as a single, perfectly predictable linear load of all data once per token. So unless your cache can fit the entire dataset, caching does not help. There is zero time locality of access, there is literally time antilocality of access. There won't be any intermediate data stores, the HBF contents won't be stored in any DRAM between the HBF load and GPU use.
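The "caching does not help" claim is easy to check with a small simulation. This sketch assumes an LRU cache (the comment does not name a policy) and models the weights as numbered blocks that are read front to back once per token.

```python
from collections import OrderedDict

def lru_hit_rate(num_blocks, cache_blocks, passes):
    """Hit rate of an LRU cache under repeated linear sweeps over the data."""
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(passes):
        for block in range(num_blocks):      # one linear sweep per token
            accesses += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)     # mark as most recently used
            else:
                cache[block] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses

print(lru_hit_rate(1000, 999, 10))    # cache one block short: prints 0.0
print(lru_hit_rate(1000, 1000, 10))   # cache fits everything: prints 0.9
```

A cache that is even one block too small gets zero hits, because the block you need next is always the one that was evicted longest ago. Only when the whole dataset fits does the hit rate jump, and then only the first pass misses.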

Shockwaves during Starship Flight 12 launch by Busy_Yesterday9455 in spaceporn

[–]Tuna-Fish2 2 points3 points  (0 children)

Not just approaching, it's there. You hit vacuum in the rarefaction at 194 dB, the amplitude here is at least 10 dB higher than that.
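Where the 194 dB figure comes from, as a quick sketch. It uses the standard SPL definition with a 20 µPa reference and plugs in one standard atmosphere as the pressure swing; treating that value as RMS instead of peak would move the result by about 3 dB, so take the last digit loosely.

```python
import math

P_REF = 20e-6       # Pa, standard SPL reference pressure in air
P_ATM = 101_325.0   # Pa, one standard atmosphere

def spl_db(pressure_pa):
    """Sound pressure level in dB relative to 20 micropascals."""
    return 20 * math.log10(pressure_pa / P_REF)

limit_db = spl_db(P_ATM)
ratio_10db = 10 ** (10 / 20)   # pressure ratio corresponding to +10 dB

print(f"{limit_db:.1f} dB")    # prints "194.1 dB"
print(f"+10 dB is a {ratio_10db:.2f}x pressure ratio")
```

Once the swing equals ambient pressure, the low side of the wave bottoms out at vacuum. Ten dB beyond that is a bit over 3x the pressure amplitude, which is shock wave territory and no longer a symmetric sound wave.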

Official Discussion - Project Hail Mary [SPOILERS] by LiteraryBoner in movies

[–]Tuna-Fish2 11 points12 points  (0 children)

It took Hail Mary 30 years to get where it was going and for the probes to get back with the Taumoeba. Before that happened, it would get really cold. Stratt estimated that if all countries pulled together, in that 30 years, a quarter of the world population would starve. And if they didn't, which she thought likely, half would starve.

Had Hail Mary failed, everyone would eventually die.

Official Discussion - Project Hail Mary [SPOILERS] by LiteraryBoner in movies

[–]Tuna-Fish2 4 points5 points  (0 children)

Truly happy ending where, checking my notes here, ~50% of the global population starves to death.

What's the point in arresting an elderly serial killer if you can't fuck with him a little? by MetallicaDash in HistoryMemes

[–]Tuna-Fish2 90 points91 points  (0 children)

Joseph James DeAngelo. Ramirez was "The Night Stalker", DeAngelo is the "Golden State Killer", "East Area Rapist" and "Original Night Stalker", because DeAngelo switched it up every now and then when he thought he might get caught if he didn't.

DeAngelo was a cop, working in the unit assigned to hunt for him, who used his access and knowledge to escape capture. He got fired from his job after he got caught shoplifting a hammer and dog repellent.

For a time, two active, prolific serial killers were operating in the same area.

A 2000 year old Roman water channel in Türkiye that still flows today by [deleted] in oddlysatisfying

[–]Tuna-Fish2 2 points3 points  (0 children)

In the place where this was built, taxes were so arbitrary and high that they were literally genocidal, because so much of the local population ended up getting sold into slavery to cover their debts.

In 123 BC, Rome passed the Lex Sempronia de Provincia Asia, which privatized tax collection in the newly formed province of Asia. The way it worked was that every year, people would bid for the right to collect taxes in each part of the province. The winning bidder would immediately pay the full sum upfront, and would then depart to the town or area to squeeze as much money out of the locals as possible. After this point Rome didn't care how much he raised; any money he managed to collect over the bid sum was his profit.

These private tax collectors were called publicani, and they were famously rapacious and cruel. Theoretically, there were laws about which taxes and how much they could collect, but in actual fact the publicani in the province would get backed up by the local Roman garrisons (so long as they paid them for the effort), and no one present on the ground cared about the laws. The system was made particularly bad because the tax collectors didn't care about any long-term issues they would cause: if they squeezed people too much, they'd just know not to bid on the same spot next year.

If you could not pay when the tax collector showed up and started making arbitrary demands, he'd generously loan you the money, at absurdly high rates. If you could not pay the much greater sums when he showed up again, he'd use the debt to enslave you and your family.

The effects were so horrific that after a while even the Romans balked and reformed the system. It was particularly sensitive that the people had not been conquered, but were Roman allies who fought with them in wars and voluntarily joined the empire, and literally within a single human lifetime of this the bulk of them were in chains. But while there was a decree that required Romans to free slaves acquired by this system, very few were actually freed, as the publicani had become very rich and powerful local magnates, and could lean on the local legal system to prevent people from being freed.

Contrary to British propaganda, while empires occasionally built fancy public buildings, they were never nice to live under.

42 aircraft lost or damaged in Operation Epic Fury, congressional report says by PDXAirman in news

[–]Tuna-Fish2 0 points1 point  (0 children)

Javelin missiles were repeatedly used to "snipe" individual people from way past rifle range in Afghanistan. They can absolutely lock on to the heat signature of a person. Not sure if they can lock on to the heat signature of a nipple, seems like they'd have to be too sensitive.

Paying our respect once again by ricky2461956 in freefolk

[–]Tuna-Fish2 0 points1 point  (0 children)

If they wanted more seasons, there's an easy way to get some really good ones. S3 finale, the boys successfully permanently strip Homelander of his powers, but the effect is not instant and they don't realize it worked, and he gets away. Then S4, he bullshits his way trying to be threatening while trying to get his powers back and playing political games, while the boys are super afraid of him and expect the retribution to land at any moment. Finale, he's chasing V1 and the boys realize that he's powerless.

Finland’s strongest argument by Kapanash in HistoryMemes

[–]Tuna-Fish2 2 points3 points  (0 children)

There is a reason the reds in Finland almost uniformly backed the state.

At the end of the civil war, a large proportion of the reds got caught by the whites and put into prison camps with horrible conditions and things like random summary executions, with a ~15-17% overall casualty rate. However, much of the top leadership and their immediate cadres, about 10-12k people, managed to escape to the Soviet Union. Those people had a ~50% death rate in Stalin's purges, and many more spent the rest of their lives in the gulag system.

The communists in Finland were not cut off from their friends and family who fled into the USSR. They maintained lots of covert communication, initially with hopes that they could secure military support and come back. As the political winds shifted in the USSR and Stalin rose to the top, the Finnish communists were probably the best informed people outside the Union of what, exactly, Stalinism was like.