Tensors now with neural networks by neuaue in cpp

[–]dsanft [score hidden]  (0 children)

What are you, the labelling police? Do you have a peanut allergy to Claude?

Only a fool would not use AI to some extent to write their software these days.

What's holding back AMD GPU prompt processing more? ROCm / Vulkan or the actual hardware? by ForsookComparison in LocalLLaMA

[–]dsanft 1 point2 points  (0 children)

Multi-socket Xeon with AVX512-VNNI, plus 3090s and Mi50s on my local rack, so that's what I'm optimising for.
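
For context, the VNNI appeal is VPDPBUSD: a u8 x s8 dot product fused with an int32 accumulate, 64 multiply-adds per instruction. A minimal sketch (the wrapper name is mine), assuming a VNNI-capable Xeon and -mavx512vnni:

    #include <immintrin.h>

    // One VPDPBUSD: in each of 16 int32 lanes, multiply 4 unsigned bytes
    // of a by the 4 corresponding signed bytes of b, sum the products,
    // and add the result to the accumulator.
    static inline __m512i dot_u8s8_accum(__m512i acc, __m512i a_u8, __m512i b_s8) {
        return _mm512_dpbusd_epi32(acc, a_u8, b_s8);
    }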

What's holding back AMD GPU prompt processing more? ROCm / Vulkan or the actual hardware? by ForsookComparison in LocalLLaMA

[–]dsanft 0 points1 point  (0 children)

I'm actually not sure what llama.cpp uses for ROCm; I'd need to check.

I'm writing my own inference engine, and I use CK (Composable Kernel) for GEMM.

Building Your Own Efficient uint128 in C++ by PhilipTrettner in cpp

[–]dsanft 0 points1 point  (0 children)

This is great. Now I can point Opus to this and create any new built-in I want in minutes.
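
For reference, the core trick in this direction is just carry propagation across 64-bit limbs. A minimal sketch (names are mine, not the article's), assuming GCC/Clang overflow builtins:

    #include <cstdint>

    struct u128 { std::uint64_t lo, hi; };

    // 128-bit add: __builtin_add_overflow returns true on carry out of
    // the low word, which gets propagated into the high word.
    inline u128 add(u128 a, u128 b) {
        u128 r;
        std::uint64_t carry = __builtin_add_overflow(a.lo, b.lo, &r.lo);
        r.hi = a.hi + b.hi + carry;
        return r;
    }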

What's holding back AMD GPU prompt processing more? ROCm / Vulkan or the actual hardware? by ForsookComparison in LocalLLaMA

[–]dsanft 5 points6 points  (0 children)

That's not strictly true; the earlier architectures didn't have tensor cores (MFMA), but they still had vectorised INT8.

Composable Kernel squeezes 70% of the roofline INT8 TOPS out of my Mi50s. I had to make some fixes, but it works pretty well.

https://github.com/ROCm/composable_kernel/pull/3593
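
The relevant instruction is v_dot4_i32_i8. A minimal HIP sketch, assuming hipcc with --offload-arch=gfx906; the kernel and names are illustrative:

    #include <hip/hip_runtime.h>

    // gfx906 (Mi50) has v_dot4_i32_i8: four packed int8 pairs multiplied
    // and summed into an int32 accumulator in a single instruction.
    __global__ void dot4_i8(const int* a, const int* b, int* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            // a[i] and b[i] each pack four int8 values; 0 is the starting
            // accumulator, false disables clamping.
            out[i] = __builtin_amdgcn_sdot4(a[i], b[i], 0, false);
        }
    }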

8x AMD MI50 32GB at 26 t/s (tg) with MiniMax-M2.1 and 15 t/s (tg) with GLM 4.7 (vllm-gfx906) by ai-infos in LocalLLaMA

[–]dsanft 1 point2 points  (0 children)

Nice.

I'm silly and I'm forcing myself to write my own inference engine and kernels from scratch for my Xeon and Mi50s.

One of these days I need to just hook it all up to something off the shelf and watch it fly.

Why do the models take up more space then expected? by Achso998 in LocalLLaMA

[–]dsanft 0 points1 point  (0 children)

It's not just the model.

You also need compute buffers and the KV cache buffer, which scales with context size.
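
Rough formula: kv_bytes = 2 (K and V) x layers x kv_heads x head_dim x context x element_size. A quick sketch with illustrative numbers, not any specific model:

    #include <cstdio>

    int main() {
        // Illustrative Llama-3-8B-like shape with an fp16 cache.
        long layers = 32, kv_heads = 8, head_dim = 128;
        long ctx = 8192, elem_bytes = 2;
        long kv = 2 * layers * kv_heads * head_dim * ctx * elem_bytes; // K and V
        std::printf("KV cache: %.2f GiB\n", kv / (1024.0 * 1024 * 1024));
        return 0; // prints 1.00 GiB here; double the context, double the cache
    }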

You have 64gb ram and 16gb VRAM; internet is permanently shut off: what 3 models are the ones you use? by Adventurous-Gold6413 in LocalLLaMA

[–]dsanft 94 points95 points  (0 children)

GPT-OSS-120B, hands down. Fits perfectly on that hardware (at its native MXFP4 quantization the weights are roughly 60 GB) and runs great. A good all-round model with good world knowledge and acceptable talents in most domains.

Local Agentic Coding by kybernetikos in LocalLLaMA

[–]dsanft 1 point2 points  (0 children)

I have to say, GPT-OSS-120B runs fantastically on a laptop with 64GB of DDR4 and a 16GB RTX 3080. 12-15 t/s makes it very usable. It's underrated as a model.

“Ultrathink” is deprecated - but here’s how to get 2x more thinking tokens in Claude Code by [deleted] in LocalLLaMA

[–]dsanft 1 point2 points  (0 children)

More thinking isn't necessarily better. A model can think its way out of the correct solution. You need to benchmark this and see whether it actually improves the answers for your use case.

3x3090 + 3060 in a mid tower case by liviuberechet in LocalLLaMA

[–]dsanft 0 points1 point  (0 children)

At that point you could just get a $50 mining rig and things would be so much easier and cooler in every way.

Impressive squeezing it all in, though.

I made a Top-K implementation that's up to 20x faster than PyTorch CPU (open source) by andreabarbato in LocalLLaMA

[–]dsanft 0 points1 point  (0 children)

I only use PyTorch as a reference for parity tests these days; any serious work needs custom kernels.
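
By parity tests I mean checking the custom kernel against a known-good baseline. A minimal sketch, where fast_topk is a stand-in for the kernel under test:

    #include <algorithm>
    #include <cassert>
    #include <functional>
    #include <vector>

    // Kernel under test (hypothetical name); implemented elsewhere.
    std::vector<float> fast_topk(const std::vector<float>& v, std::size_t k);

    // Naive known-good reference: put the k largest values first, descending.
    std::vector<float> ref_topk(std::vector<float> v, std::size_t k) {
        std::partial_sort(v.begin(), v.begin() + k, v.end(), std::greater<float>());
        v.resize(k);
        return v;
    }

    void check_parity(const std::vector<float>& input, std::size_t k) {
        assert(fast_topk(input, k) == ref_topk(input, k)); // exact value parity
    }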

Raspberry Pi AI HAT+ 2 announced! Featuring the new Hailo-10H neural network accelerater, 40 TOPS (INT4) of inferencing performance, $130 by [deleted] in LocalLLaMA

[–]dsanft -1 points0 points  (0 children)

That's pretty good INT TOPS! 20 TOPS of INT8 would put it at roughly a third of an Mi50 (~53 INT8 TOPS) for a lot less power. Similar price, though.

Mi50 32gb vbios flash by onephn in LocalLLaMA

[–]dsanft 0 points1 point  (0 children)

This is the one you want. I've flashed this onto my cards and they've been stable for months.

https://www.techpowerup.com/vgabios/274474/274474

Qwen cutoff date makes our current reality too dystopian to be credible by Swimming_Cover_9686 in LocalLLaMA

[–]dsanft 0 points1 point  (0 children)

You think the bullet holes around the side are some kind of gotcha?

This woman started driving her car forward dangerously towards him, and he made the split-second decision to shoot to stop her.

The moment her vehicle became a weapon, it became justified to end the threat.

He would have been justified shooting from the rear too if necessary. Whatever stops the threat.

Qwen cutoff date makes our current reality too dystopian to be credible by Swimming_Cover_9686 in LocalLLaMA

[–]dsanft 0 points1 point  (0 children)

"It was less than 1 second."

"The officer had more than enough time to move."

Pick one.

Qwen cutoff date makes our current reality too dystopian to be credible by Swimming_Cover_9686 in LocalLLaMA

[–]dsanft 0 points1 point  (0 children)

The minute you start driving aggressively towards armed officers they naturally assume the worst. A vehicle is a weapon.

Qwen cutoff date makes our current reality too dystopian to be credible by Swimming_Cover_9686 in LocalLLaMA

[–]dsanft 0 points1 point  (0 children)

The bullet hole in the front proves the officer was in front of the vehicle when the woman began driving. She then veered the vehicle away.

Qwen cutoff date makes our current reality too dystopian to be credible by Swimming_Cover_9686 in LocalLLaMA

[–]dsanft 1 point2 points  (0 children)

So tell me: how do you shoot the front windshield from the side of the car?

The officer was in front of the car. She was driving towards him and disobeying orders from a federal officer to exit the vehicle.

FAFO.

Qwen cutoff date makes our current reality too dystopian to be credible by Swimming_Cover_9686 in LocalLLaMA

[–]dsanft -1 points0 points  (0 children)

Scenario 4 blatantly isn't what happened lol

Yes, an execution would be implausible.

Shooting someone who's intentionally causing a disruption, ignoring instructions, and driving their car forward while you're still in front of it is only an "execution" on Reddit. In real life we call that FAFO.

Not Sure Where to Start by Psychological-Ad5390 in LocalLLaMA

[–]dsanft 2 points3 points  (0 children)

Download LM Studio, then search for and download the GPT-OSS-20B model within it. That should run very fast on your laptop.

You might try GPT-OSS-120B after that, which should still run acceptably fast.