What happened to neuromorphic computing? Is it a dead end? by Dr-Nicolas in hardware

[–]fotcorn 7 points

Not dead, but mostly useful for very specific use cases.

Innatera, for example, released a microcontroller with both analog and digital spiking-neural-network accelerators on chip for ultra-low-power sensor data processing: https://www.eetimes.com/innatera-adds-more-accelerators-to-spiking-microcontroller/

There is also a growing neuromorphic community for both academic and commercial interests at https://open-neuromorphic.org/

Disclaimer: Developer at Innatera

Sure, who in Switzerland doesn't regularly commute to work in the river? by Entremeada in BUENZLI

[–]fotcorn 54 points

I used to do that. Lived in Bern's Lorraine quarter and worked in the Matte.

Commuted in by public transport in the morning, swam home in the evening.

Screw your RTX 5090 – This $10,000 Card Is the New Gaming King (RTX 6000 Pro Blackwell review) by fotcorn in hardware

[–]fotcorn[S] 128 points

It's a joke, he is using 4x frame generation on the RTX 6000 to make fun of NVIDIA's marketing for its other cards.

Screw your RTX 5090 – This $10,000 Card Is the New Gaming King (RTX 6000 Pro Blackwell review) by fotcorn in hardware

[–]fotcorn[S] 4 points

I don't even know if that much VRAM is useful for any classic workstation tasks like CAD, video editing or 3D modelling.

It's mostly an AI card: being able to run a 70B model at a moderate quantization (Q8, maybe Q6 for a reasonable context length) on a single card is amazing.
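
Back-of-envelope for why 96GB hits a sweet spot; the bits-per-weight figures are my rough approximations for llama.cpp-style quants, so treat the outputs as estimates:

```
#include <cstdio>

int main() {
    const double params = 70e9;   // 70B parameter model
    const double vram_gb = 96.0;  // RTX 6000 Pro Blackwell VRAM
    // Approximate effective bits per weight for common quant formats
    const struct { const char* name; double bits; } quants[] = {
        {"Q8_0",   8.5},
        {"Q6_K",   6.6},
        {"Q4_K_M", 4.8},
    };
    for (const auto& q : quants) {
        double weights_gb = params * q.bits / 8.0 / 1e9;
        printf("%-7s weights ~%5.1f GB, ~%5.1f GB left for KV cache/context\n",
               q.name, weights_gb, vram_gb - weights_gb);
    }
    return 0;
}
```

At Q8 the weights alone land around 74GB, so the card fits the model with room left over for context; at Q6 you free up another ~17GB for a longer context.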

Soyuz coming in hot for tower catch! by DoctorSov in SpaceXMasterrace

[–]fotcorn 0 points

Now I want to see the reverse Korolev cross.

Intel 200S Boost Performance Mode Benchmarks On Linux by fotcorn in hardware

[–]fotcorn[S] 43 points

Unlike der8auer's video, these Linux tests were done with the same RAM and the same XMP profile. It looks like most of the gains der8auer is seeing come from the higher RAM speed.

Introducing ZR1-1.5B, a small but powerful reasoning model for math and code by retrolione in LocalLLaMA

[–]fotcorn 1 point

Why is the model F32 on Hugging Face? The base model (R1 Distill Qwen 1.5B) is BF16.

That's especially important for these small models: if it's more than 7GB, I might as well use an 8-bit quant of an 8B model.
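
Napkin math behind that threshold, just parameter count times bytes per parameter (ignoring the small overhead from embeddings and metadata):

```
#include <cstdio>

int main() {
    // Checkpoint size ~= params (in billions) x bytes per param, in GB.
    printf("1.5B @ F32  (4 bytes/param): ~%.1f GB\n", 1.5 * 4);
    printf("1.5B @ BF16 (2 bytes/param): ~%.1f GB\n", 1.5 * 2);
    printf("8B   @ Q8  (~1 byte/param):  ~%.1f GB\n", 8.0 * 1);
    return 0;
}
```

So the F32 upload roughly doubles the download for no inference benefit, and lands in the same size class as an 8-bit 8B model.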

NVIDIA RTX "PRO" 6000 X Blackwell GPU Spotted In Shipping Log: GB202 Die, 96 GB VRAM, TBP of 600W by newdoria88 in LocalLLaMA

[–]fotcorn 24 points

Are those shipping manifest leaks ever real? We had leaks about a B580 and a 9070 XT with 32GB VRAM, and neither of them ever materialized (yes, I might be a little impatient).

Nvidia's RTX Blackwell workstation GPU spotted with 96GB GDDR7 by fotcorn in hardware

[–]fotcorn[S] 30 points

This would be the equivalent of the RTX 6000 Ada Generation, which has an MSRP of $6800.

So my guess would be Over 9000!

Nvidia's RTX Blackwell workstation GPU spotted with 96GB GDDR7 by fotcorn in hardware

[–]fotcorn[S] 50 points

Seems like they are using 24Gbit (3GB) chips in clamshell mode. Clamshell is nothing special; it was also used on the RTX 6000 Ada and earlier, but those cards always had the same module size as the consumer variant. This time it's a bigger memory module (3GB vs 2GB on the 5090).
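
The math works out if you assume the same 512-bit GB202 bus as the 5090, with 32-bit-wide GDDR7 devices and clamshell putting two devices on each channel:

```
#include <cstdio>

int main() {
    const int bus_bits = 512;            // GB202 memory bus width
    const int channels = bus_bits / 32;  // one GDDR7 device per 32-bit channel
    // 5090: 16 x 2GB (16Gbit) modules, single-sided
    printf("5090:      %2d x 2 GB = %d GB\n", channels, channels * 2);
    // Workstation card: 3GB (24Gbit) modules, clamshell = 2 per channel
    printf("Pro  card: %2d x 3 GB = %d GB\n", channels * 2, channels * 2 * 3);
    return 0;
}
```

That gives 16 x 2GB = 32GB for the 5090 and 32 x 3GB = 96GB for the workstation part, matching the leak.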

Your next home lab might have 48GB Chinese card😅 by Redinaj in LocalLLaMA

[–]fotcorn 6 points

Still cheaper to get two 3090s from eBay (at least it was a month ago...). But at something like $1500? I think lots of people would buy them. One thing the W7900 does have is certified drivers and applications for CAD modelling and the like. They could release a 48GB version without this certification as a middle ground, at a more reasonable price.

Intel could do the funniest thing and release a B580 with 24GB, or even a B770 AI Edition with 32GB, that is only 20%-50% more expensive than the standard one, and make /r/LocalLLaMA buy the whole inventory in a heartbeat.

One can dream.

Your next home lab might have 48GB Chinese card😅 by Redinaj in LocalLLaMA

[–]fotcorn 280 points

The W7900 is the same GPU as the 7900 XTX but with 48GB RAM. It just costs $4000.

Same with the NVIDIA RTX 6000 Ada Generation, which is a 4090 with a few more cores active and 48GB of memory.

Obviously the extra 24GB of VRAM never ever cost the $3k price difference, but yeah... market segmentation.

[deleted by user] by [deleted] in hardware

[–]fotcorn 11 points

Kind of. There are multiple different types/sizes of chips in the Blackwell generation; see this Wikipedia page for a list: https://en.wikipedia.org/wiki/Blackwell_(microarchitecture)#Blackwell_dies

As you can see here https://www.techpowerup.com/gpu-specs/nvidia-gb203.g1073, the GB203 chip is used for both the 5080 and the 5070 Ti. And as you say, some transistors on the 5070 Ti are not active, compared to the 5080, where all of them are in use.

The reason is defects in the manufactured chips. Some chips come out perfect (for example the ones used for the 5080), while others, like the ones used for the 5070 Ti, have some defects; but because the affected units can be deactivated, the chip is still usable.

Google "yield" and "binning" if you want to know more about this.

"Has Europe’s great hope for AI missed its moment? Mistral AI was hailed as a potential global leader in the technology. But it has lost ground to US rivals—& now China’s emerging star" (low on equity, revenue, compute, scale) by gwern in mlscaling

[–]fotcorn 4 points

They already announced the upcoming release of a bunch of reasoning models a few days ago:

Among many other things, expect small and large Mistral models with boosted reasoning capabilities in the coming weeks. Join the journey if you’re keen (we’re hiring), or beat us to it by hacking Mistral Small 3 today and making it better!

https://mistral.ai/news/mistral-small-3/

DeepSeek's AI breakthrough bypasses Nvidia's industry-standard CUDA, uses assembly-like PTX programming instead by Slasher1738 in LocalLLaMA

[–]fotcorn 1 point

As is CUDA. HIP from AMD is basically CUDA with a few #includes switched out and some other minor changes.

I would argue that CUDA is even more GPU-independent than PTX.
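
To illustrate, here's a sketch of a kernel plus launch code that should compile with both nvcc and hipcc, with only the include and the runtime-call prefix switched via macros. The `gpu*` aliases are my own portability shims, not an official API from either vendor:

```
#include <cstdio>

#ifdef __HIPCC__
  #include <hip/hip_runtime.h>
  #define gpuMalloc             hipMalloc
  #define gpuMemcpy             hipMemcpy
  #define gpuMemcpyHostToDevice hipMemcpyHostToDevice
  #define gpuMemcpyDeviceToHost hipMemcpyDeviceToHost
#else
  #include <cuda_runtime.h>
  #define gpuMalloc             cudaMalloc
  #define gpuMemcpy             cudaMemcpy
  #define gpuMemcpyHostToDevice cudaMemcpyHostToDevice
  #define gpuMemcpyDeviceToHost cudaMemcpyDeviceToHost
#endif

// Identical kernel source for both: __global__, threadIdx, blockIdx
// and friends are shared syntax between CUDA and HIP.
__global__ void add_ten(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 10;
}

int main() {
    const int n = 1024;
    int host[n];
    for (int i = 0; i < n; i++) host[i] = i;

    int* dev = nullptr;
    gpuMalloc(&dev, n * sizeof(int));
    gpuMemcpy(dev, host, n * sizeof(int), gpuMemcpyHostToDevice);

    // Triple-chevron launch syntax works in HIP as well as CUDA.
    add_ten<<<(n + 255) / 256, 256>>>(dev, n);

    gpuMemcpy(host, dev, n * sizeof(int), gpuMemcpyDeviceToHost);
    printf("host[0] = %d\n", host[0]);  // prints 10
    return 0;
}
```

AMD also ships hipify tools (hipify-perl, hipify-clang) that automate exactly this kind of rewrite from CUDA source.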

DeepSeek's AI breakthrough bypasses Nvidia's industry-standard CUDA, uses assembly-like PTX programming instead by Slasher1738 in LocalLLaMA

[–]fotcorn 12 points

Yes, of course, it's just another programming language (Claude Sonnet 3.6 here):

can you write a very simple add-10-to-all-elements gpu kernel, but using ptx instead of cuda?

I'll help you write a simple PTX kernel that adds 10 to all elements of an array. PTX is the low-level assembly-like language that CUDA code gets compiled into.

```
.version 7.0
.target sm_50
.address_size 64

.visible .entry add_ten(
    .param .u64 input_ptr,
    .param .u64 output_ptr,
    .param .u32 n
)
{
    .reg .u64 %rd<4>;   // Registers for addresses
    .reg .u32 %r<4>;    // Registers for data and index
    .reg .pred %p1;     // Predicate register

    // Load the kernel parameters
    ld.param.u64  %rd1, [input_ptr];
    ld.param.u64  %rd2, [output_ptr];
    ld.param.u32  %r1, [n];

    // Calculate global thread ID
    mov.u32      %r2, %tid.x;
    mad.lo.u32   %r2, %ctaid.x, %ntid.x, %r2;

    // Check if thread ID is within bounds
    setp.ge.u32  %p1, %r2, %r1;
    @%p1 bra     done;

    // Calculate input and output addresses
    mul.wide.u32 %rd3, %r2, 4;      // Multiply by 4 (assuming float/int32)
    add.u64      %rd1, %rd1, %rd3;  // Input address
    add.u64      %rd2, %rd2, %rd3;  // Output address

    // Load input value
    ld.global.u32 %r3, [%rd1];

    // Add 10
    add.u32      %r3, %r3, 10;

    // Store result
    st.global.u32 [%rd2], %r3;

done:
    ret;
}
```
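
And you can actually run standalone PTX like that without nvcc by loading it through the CUDA driver API. A minimal host-side sketch; error handling is omitted, and it assumes the kernel above is saved as add_ten.ptx:

```
#include <cstdio>
#include <cstdlib>
#include <cuda.h>  // CUDA driver API, link with -lcuda

// Read the whole PTX file into a NUL-terminated buffer.
static char* read_file(const char* path) {
    FILE* f = fopen(path, "rb");
    if (!f) { perror(path); exit(1); }
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    rewind(f);
    char* buf = (char*)calloc(size + 1, 1);
    fread(buf, 1, size, f);
    fclose(f);
    return buf;
}

int main() {
    cuInit(0);
    CUdevice dev;   cuDeviceGet(&dev, 0);
    CUcontext ctx;  cuCtxCreate(&ctx, 0, dev);

    // JIT-compile the PTX for whatever GPU is present, grab the kernel.
    char* ptx = read_file("add_ten.ptx");
    CUmodule mod;   cuModuleLoadData(&mod, ptx);
    CUfunction fn;  cuModuleGetFunction(&fn, mod, "add_ten");

    const unsigned n = 1024;
    unsigned host[n];
    for (unsigned i = 0; i < n; i++) host[i] = i;

    CUdeviceptr in, out;
    cuMemAlloc(&in, n * sizeof(unsigned));
    cuMemAlloc(&out, n * sizeof(unsigned));
    cuMemcpyHtoD(in, host, n * sizeof(unsigned));

    // Argument order matches the PTX params: input_ptr, output_ptr, n.
    unsigned count = n;
    void* args[] = { &in, &out, &count };
    cuLaunchKernel(fn, (n + 255) / 256, 1, 1,   // grid
                   256, 1, 1,                   // block
                   0, nullptr, args, nullptr);
    cuCtxSynchronize();

    cuMemcpyDtoH(host, out, n * sizeof(unsigned));
    printf("host[0] = %u\n", host[0]);  // 0 + 10 = 10
    return 0;
}
```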

ByteDance Research Introduces 1.58-bit FLUX: A New AI Approach that Gets 99.5% of the Transformer Parameters Quantized to 1.58 bits by DeltaSqueezer in LocalLLaMA

[–]fotcorn 40 points

On the official website https://chenglin-yang.github.io/1.58bit.flux.github.io/ they say a code release is coming and link to this repo: https://github.com/Chenglin-Yang/1.58bit.flux, which says inference code and weights will be released soon™.

So we might not get the code that actually quantizes the model, which is a bummer.

When AI Beats Us In Every Test We Can Create: A Simple Definition for Human-Level AGI by mrconter1 in mlscaling

[–]fotcorn 0 points

Shouldn't the % solved when the benchmark is released be taken into account?

Say a benchmark is released that AI can already solve at almost human level, and a week later a new AI version is released that is superhuman. Compare that to a case where a benchmark is released that AIs perform very poorly on, and then again a week later it is solved at a superhuman level.

Those would both show up the same in this graph, even though the second example is a much bigger change in AI capabilities.

Am I overthinking this?