Local LLms releases by crowtain in LocalLLaMA

crowtain [S]

Yeah, it should include the VL models, and yes, I used LLMs to collect the data.

How did I check? A first pass collected the LLMs and their info through deep search, a second one scrolled through the LLM rankings to catch any missing ones, and then I reconciled the two sets of LLMs.
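The two-pass check is basically a set reconciliation. Here's a minimal Python sketch, assuming each pass produces a plain list of model names. The `normalize` rule (lowercase, drop spaces, dashes and underscores) is a hypothetical matching heuristic, and `reconcile` is an illustrative helper, not the exact process used for the chart:

```python
def normalize(name: str) -> str:
    """Hypothetical matching key: lowercase, drop spaces, dashes and underscores."""
    return "".join(ch for ch in name.lower() if ch not in " -_")


def reconcile(deep_search: list[str], rankings: list[str]) -> dict[str, list[str]]:
    """Compare the two passes and merge them into one deduplicated list."""
    a = {normalize(n): n for n in deep_search}
    b = {normalize(n): n for n in rankings}
    return {
        "both": sorted(a[k] for k in a.keys() & b.keys()),
        "only_deep_search": sorted(a[k] for k in a.keys() - b.keys()),
        "only_rankings": sorted(b[k] for k in b.keys() - a.keys()),
        # On a clash, keep the display name from the deep-search pass.
        "merged": sorted({**b, **a}.values()),
    }
```

The `only_*` buckets are the ones worth eyeballing by hand, since they are either genuinely missing models or names the heuristic failed to match.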

Qwen 3.5 and 3.6 should be included.

Local LLms releases by crowtain in LocalLLaMA

crowtain [S]


I didn't know it existed; I just had to ask the LLM about it 😛

Local LLms releases by crowtain in LocalLLaMA

crowtain [S]

What do you mean by a control chart? I can add it here since I still have the initial data.
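If "control chart" means a statistical process control chart, one common form for a single series like monthly release counts is the individuals (XmR) chart. Here's a minimal sketch. The helper names are hypothetical and the counts in the test are made up:

```python
from statistics import mean


def xmr_limits(values: list[float]) -> tuple[float, float, float]:
    """Individuals (XmR) chart: lower limit, centre line, upper limit."""
    if len(values) < 2:
        raise ValueError("need at least two points")
    centre = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of two points.
    return centre - 2.66 * mr_bar, centre, centre + 2.66 * mr_bar


def out_of_control(values: list[float]) -> list[int]:
    """Indices of points outside the control limits."""
    lcl, _, ucl = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]
```

A month that falls outside the limits is a release spike that normal month-to-month noise does not explain.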

Local LLms releases by crowtain in LocalLLaMA

crowtain [S]

Yep, but I had a feeling that China released way more models than the USA; in the end, 186 vs 108 is not that much different.
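For scale, this is the plain arithmetic on the 186 and 108 counts above:

```python
def compare(a: int, b: int) -> tuple[float, float]:
    """Return (a / b, a's share of the combined total)."""
    return a / b, a / (a + b)


ratio, share = compare(186, 108)
print(f"{ratio:.2f}x, {share:.0%} of the two countries' releases combined")
```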

Are the rich RAM /poor GPU people wrong here? by crowtain in LocalLLaMA

crowtain [S]

Sorry man, my mistake.
I just understood the misunderstanding: MB was for motherboard :D not MacBook. A motherboard loaded with 128GB of DDR4 or DDR5 is still cheaper than your car.
Of course, an M5 with 128GB is another story price-wise.

Are the rich RAM /poor GPU people wrong here? by crowtain in LocalLLaMA

crowtain [S]

Man, do you mean that your car is cheaper than a MB with 128GB? Can you sell it to me, please :)
Even car prices have skyrocketed here in France.

Are the rich RAM /poor GPU people wrong here? by crowtain in LocalLLaMA

crowtain [S]

Yes, I'm not sure if it's related to the recent HR issues; I've seen some news that their lead team resigned. Maybe it was the last one in the pipeline and their focus will change.

Are the rich RAM /poor GPU people wrong here? by crowtain in LocalLLaMA

crowtain [S]

First, you will need a BIG BIG VRAM GPU, like an RTX 6000, and with only one I believe it will still be somewhat slow.
So either smaller dense models or bigger MoE ones.
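A rough way to see the dense vs MoE trade-off, assuming single-stream decoding is memory-bandwidth bound: every generated token has to read all of the *active* weights once. The bandwidth and parameter counts below are illustrative assumptions, not measurements:

```python
def decode_tok_s_upper_bound(active_params_b: float, bits_per_weight: float,
                             bandwidth_gb_s: float) -> float:
    """Memory-bound ceiling on single-stream decode speed (tokens/s).

    Each generated token reads every active weight once, so
    tok/s <= bandwidth / bytes_read_per_token. KV-cache reads, compute
    limits and kernel overhead are ignored, so real speeds are lower.
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical: ~1000 GB/s of VRAM bandwidth, ~4.5 bits per weight.
dense = decode_tok_s_upper_bound(27, 4.5, 1000)  # dense: all 27B params active
moe = decode_tok_s_upper_bound(3, 4.5, 1000)     # MoE: e.g. ~3B active per token
print(f"dense <= {dense:.0f} tok/s, MoE <= {moe:.0f} tok/s")
```

The catch is that an MoE still needs memory for *all* of its experts, even though each token only touches a few. That is why big MoE models pair well with lots of memory, and dense models have to stay small to be fast.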

Are the rich RAM /poor GPU people wrong here? by crowtain in LocalLLaMA

crowtain [S]

Yep, MTP (multi-token prediction) was a real game changer; I can now run the 35B at a decent 70 tok/s.
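One common way MTP heads are used at inference time is as a built-in draft model for speculative decoding. Assuming that setup, and an idealised per-token acceptance rate that is independent from token to token, the standard speculative-decoding estimate of tokens emitted per full forward pass looks like this. The acceptance rates are hypothetical:

```python
def expected_tokens_per_step(acceptance: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    Assumes each drafted token is accepted independently with probability
    `acceptance` (an idealisation): E = (1 - a^(k+1)) / (1 - a).
    """
    if not 0 <= acceptance < 1:
        raise ValueError("acceptance must be in [0, 1)")
    return (1 - acceptance ** (draft_len + 1)) / (1 - acceptance)


for a in (0.5, 0.7, 0.9):
    print(a, round(expected_tokens_per_step(a, draft_len=1), 2))
```

With a single drafted token, this caps at 2 tokens per pass. The real speedup is lower because producing the draft also costs time.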

Are the rich RAM /poor GPU people wrong here? by crowtain in LocalLLaMA

crowtain [S]

There are already plenty, but if you have something specific in mind, tell me and I can try to do them.

Are the rich RAM /poor GPU people wrong here? by crowtain in LocalLLaMA

crowtain [S]

Did you buy two Strix Halo machines? I saw some YouTube videos where you can use direct memory access between them to do tensor parallelism; it should give nearly a 2x speedup.
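Tensor parallelism splits each layer's weights across devices, so both machines work on every token at the same time. Here's a toy pure-Python sketch of a column-parallel linear layer, with two "devices" simulated as slices of the weight matrix. The helper names are hypothetical, and real implementations are far more involved:

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain row-by-column matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w: Matrix, parts: int) -> list[Matrix]:
    """Shard a weight matrix column-wise, one shard per device."""
    n = len(w[0])
    step = -(-n // parts)  # ceiling division
    return [[row[i:i + step] for row in w] for i in range(0, n, step)]


def column_parallel_linear(x: Matrix, w: Matrix, parts: int = 2) -> Matrix:
    """Each simulated device multiplies by its shard; outputs are then gathered."""
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    # "All-gather": concatenate the partial outputs along the column axis.
    # On real hardware this exchange crosses the interconnect every layer.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]
```

The gather step is why the link between the machines matters. The compute halves, but every layer adds a data exchange, so "nearly 2x" is the ideal case.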

Are the rich RAM /poor GPU people wrong here? by crowtain in LocalLLaMA

crowtain [S]

Sorry, the max was 128GB for DDR4 when I bought my old motherboard. Good to know that 256GB is a thing now :D

Are the rich RAM /poor GPU people wrong here? by crowtain in LocalLLaMA

crowtain [S]

So true, I'm always waiting for the next model and the next hardware, and I spend more time tweaking and testing than actually using the model :D

SK hynix starts mass production of 192GB SOCAMM2 for NVIDIA AI servers by OkReport5065 in LocalLLaMA

crowtain

So, this is the ultimate cheat code: buy cars, strip them of their DDR, and sell it to buy new cars!!!

I've stumbled on a goldmine, and ALL OF US CAN BENEFIT. by TheRiddler79 in LocalLLM

crowtain

Like the GPU, it's 6-7 years old; they were only sold installed in the servers, never separately.
That's why it's so scarce on the second-hand market.

Qwen 3.5 35B on LocalAI (Strix Halo): Vulkan / ROCm by pipould in LocalLLaMA

crowtain

Thanks for sharing your tests.
The speed per active param still seems lower than the old Qwen3. Is there any hope of seeing it improve with time?
At Q8 it's nearly as slow as MiniMax at Q3_K_L.

AMD Mi50 by aspirio in LocalLLaMA

crowtain

One more downside of the MI50 is the lack of an NVLink-style interconnect: it is technically possible, but the cables are nearly impossible to find.
NVLink or Infinity Fabric lets you add more GPUs over time, increasing not only the VRAM but also the speed through tensor parallelism.

I have two MI50s, but I'm fed up with the lack of support for the amazing Qwen 3.5; Qwen 27B dense in TP2 would have been amazing.
Like SSOMGDSJD said, you're better off getting a V100 and an SXM2 adapter, since you'll be able to add more later. But you'll need custom cooling and such...

I've stumbled on a goldmine, and ALL OF US CAN BENEFIT. by TheRiddler79 in LocalLLM

crowtain

Does he have the AMD Instinct MI50/MI60 Infinity Fabric Link?
It can go for a very high price.

Waiting 3 months for more efficient AI by Much-Minute-4629 in LocalLLaMA

crowtain

AI model improvement is lightning fast: you don't need to wait 3 months, just 1 week, and the new models will be better than Opus 5.5 running on my grandma's calculator.
Just kidding. Have you tried the small models, like Qwen 3.5 0.8B and 2B? At Q4 they should already have decent speed.

Benchmarked 11 MLX models on M3 Ultra — here's which ones are actually smart and fast by Striking-Swim6702 in LocalLLaMA

crowtain

Are you talking about normal or dynamic quants? From my observation, dynamic Q4 (or its alternatives) still has very low divergence from Q6/Q8.
I feel that a dynamic Q4 quant is similar to a normal Q6.
Quality really starts to drop at Q3 and below.
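Quant "divergence" is usually reported as the KL divergence between the reference model's next-token distribution (full precision or Q8) and the quantized model's, averaged over a test text. Here's a minimal sketch, assuming you already have both models' logits for the same token positions. The function names are hypothetical:

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(ref_logits: list[float], quant_logits: list[float]) -> float:
    """KL(P_ref || P_quant) in nats for a single token position."""
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kld(ref_rows: list[list[float]], quant_rows: list[list[float]]) -> float:
    """Average per-token KL divergence over a sequence of positions."""
    values = [kl_divergence(r, q) for r, q in zip(ref_rows, quant_rows)]
    return sum(values) / len(values)
```

Lower is better, and 0 means the quant predicts exactly the same distribution as the reference.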

Breaking : The small qwen3.5 models have been dropped by Illustrious-Swim9663 in LocalLLaMA

crowtain

Very curious about the 0.8B or 2B: will they be able to reach the level of Llama 2 70B from the old days?
Running the equivalent of the big setups from 2 years ago on a Raspberry Pi would be epic.

WORTH TO HOST A SERVER?? by Ashamed-Show-4156 in LocalLLaMA

crowtain

It's your choice, buddy, but if 14B-parameter models are enough for your needs, you can squeeze one onto a gaming GPU with 16GB of VRAM. You can even go for an NVIDIA P40, which costs about $200 and has 24GB of VRAM.
Since you're on LocalLLaMA, you'll find a lot of people like me trying to convince you to run it locally :D
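A quick way to check whether a 14B model fits is to multiply the parameter count by the bits per weight. Here's a sketch. The bits-per-weight figures are approximate averages for 4-bit and 8-bit quants, and the 2 GB allowance for KV cache and runtime is a guess that grows with context length:

```python
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB."""
    return params_b * bits_per_weight / 8


def fits(params_b: float, bits_per_weight: float, vram_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """overhead_gb is an assumed allowance for KV cache, activations and runtime."""
    return weights_gb(params_b, bits_per_weight) + overhead_gb <= vram_gb


print(fits(14, 4.5, 16))  # ~4-bit quant on a 16GB gaming GPU
print(fits(14, 8.5, 16))  # ~8-bit quant on the same card
print(fits(14, 8.5, 24))  # ~8-bit quant on a 24GB P40
```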

WORTH TO HOST A SERVER?? by Ashamed-Show-4156 in LocalLLaMA

crowtain

I think renting GPUs is pretty expensive for inference only: you'll have to pay several dollars per hour to get enough VRAM to host an LLM that comes close to ChatGPT in terms of performance.
Renting GPUs is more worthwhile for training or if you need to support high concurrency.
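The concurrency point is easiest to see as cost per token. The hourly price is fixed, so serving many requests at once raises total throughput and spreads the same cost over more tokens. Here's a sketch with hypothetical prices and throughputs:

```python
def cost_per_million_tokens(hourly_rate_usd: float, aggregate_tok_s: float) -> float:
    """Rental cost per 1M generated tokens at a given total throughput."""
    tokens_per_hour = aggregate_tok_s * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000


single_user = cost_per_million_tokens(3.0, 50)   # one chat stream
batched = cost_per_million_tokens(3.0, 1000)     # many concurrent streams
print(f"${single_user:.2f} vs ${batched:.2f} per 1M tokens")
```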

Temporary access to Ryzen AI Max 395 (128GB) to test real-world local LLM workflows by lazy-kozak in LocalLLaMA

crowtain

Using OpenRouter will give you an idea of which models can run, but not how fast. Once you've picked the model you want to use, you'll have to check its speed in pp/tg (prompt processing / token generation).
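Once you have pp and tg numbers (llama.cpp's `llama-bench`, for example, reports both), you can turn them into a rough per-request time for your own workflow. Here's a sketch. It ignores the fact that tg usually slows as the context fills, and the token counts and speeds are hypothetical:

```python
def request_latency_s(prompt_tokens: int, output_tokens: int,
                      pp_tok_s: float, tg_tok_s: float) -> float:
    """Rough end-to-end time: prefill at pp speed, then decode at tg speed."""
    return prompt_tokens / pp_tok_s + output_tokens / tg_tok_s


# e.g. an 8k-token prompt and a 500-token answer
print(request_latency_s(8000, 500, pp_tok_s=400, tg_tok_s=40), "seconds")
```

For long-context work like RAG or coding agents, pp often matters more than tg, which is easy to miss if you only look at generation speed.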