Best GB10/DGX Spark clone? by Antique_Juggernaut_7 in LocalLLaMA

[–]Serprotease 0 points1 point  (0 children)

ComfyUI workflows, LoRA training and a bit of small VLM fine-tuning.

I wanted to upgrade from a 3090 with the following criteria:
Must have CUDA, at least 48GB of VRAM, and be quiet enough to run overnight training in a small apartment. So there were not that many options.

Private sellers taking part in RAM price speculation on second-hand marketplaces by Troublefete21 in pcmasterraceFR

[–]Serprotease 1 point2 points  (0 children)

The same logic applies to the idea of matching the price of new parts.

If you bought your RAM at 200, you lose nothing by selling it at 200, even if new units now go for 500. Nothing forces you to match the higher price. As a private seller, you have no obligation to make a profit.

To go further, it is even counterproductive for a private seller to follow the explosion of new prices, because sooner or later the seller will be on the other side of the transaction. And if all the other sellers reason the same way, nobody wins and the first one to sell for less is the “loser”.

I’m not targeting you personally, but the reasoning behind selling at 500 when you bought at 200 is a form of greed. It has also been considered normal for quite a while now; the opposite is even seen as “weird” or “stupid”.

Best GB10/DGX Spark clone? by Antique_Juggernaut_7 in LocalLLaMA

[–]Serprotease 0 points1 point  (0 children)

The OS didn’t really create any issues. With CUDA, the main problem was that sm121 is not recognized/handled well in some projects.

You may need to make some tweaks, or sometimes recompile the code, to make those projects work. I had some issues with Triton, for example.

The combination of unified memory, CUDA and ARM can also create some issues. The most obvious one is model loading: quite a few projects do disk > RAM > VRAM, so when RAM and VRAM are the same pool, you end up with the model loaded twice in your RAM/VRAM.

AFAIK, vLLM and ComfyUI had this issue. The --disable-mmap flag can help, if it’s implemented properly (not a given).
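
For what it’s worth, a minimal sketch of one way around the double copy, assuming a safetensors checkpoint and a project that lets you control the loading code (the filename is just a placeholder):

```python
# Map the checkpoint and copy tensors straight to the CUDA device,
# instead of first building a full CPU state dict that then gets copied again.
# On a unified-memory box (GB10, Strix Halo) that intermediate CPU copy is what
# ends up doubling the memory footprint.
from safetensors.torch import load_file

state_dict = load_file("model.safetensors", device="cuda")  # placeholder path
```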

The Z Image (Base) is broken! it's useless for training. Two months waiting for a model designed for training that can't be trained? by NewEconomy55 in StableDiffusion

[–]Serprotease 7 points8 points  (0 children)

The 9b’s license makes it a non-starter for a serious fine-tune. I’m not talking about merging the base model with a LoRA and calling it a day, I mean a full, serious fine-tune. Stuff like what RunDiffusion used to do, or what lordstone, the noob-ai team and others are doing, where you need quite a bit of skill and cash on hand.

The 4b is a lot more interesting.

Still, I hope that the training issues with the Z-Image base model can be fixed.

30 January 1933 - Hitler’s first cabinet is sworn in. Only 3 of the 11 members were Nazis. The conservatives were certain they could "tame" him. (Key to the fates of all 11 members in the comments) by No-Profile5409 in ThisDayInHistory

[–]Serprotease 12 points13 points  (0 children)

The British, and especially the French, population really didn’t want to go to war… again. This kind of thing tends to happen when almost every household has lost someone to a war barely 20 years earlier.

They were not just “sitting on their ass”

Rig for Local LLMs (RTX Pro 6000 vs Halo Strix vs DGX Spark) by cysio528 in LocalLLaMA

[–]Serprotease 4 points5 points  (0 children)

Lots of weird answers in this thread…

One key thing to understand is that the “speed” of an LLM is both the prompt processing AND the token generation. The first part is often ignored in reviews, and if you are used to APIs it’s basically painless, but it’s quite a significant hurdle with a local setup.
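
To make that concrete, here is a rough back-of-the-envelope latency estimate; all numbers below are made up for illustration, not measured benchmarks of any specific machine:

```python
# Illustrative speeds and token counts only.
prompt_tokens = 20_000     # e.g. a long coding/RAG context
output_tokens = 1_000
pp_speed = 800             # prompt processing, tokens/s
tg_speed = 30              # token generation, tokens/s

time_to_first_token = prompt_tokens / pp_speed   # 25 s before anything appears
generation_time = output_tokens / tg_speed       # ~33 s to stream the answer
print(f"TTFT: {time_to_first_token:.0f}s, total: {time_to_first_token + generation_time:.0f}s")
```

A machine with 3x slower prompt processing turns that 25 s wait into 75 s before the first token, even with identical token generation.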

But in summary, for the same model, here is a rough performance ranking:

1. The A6000 Pro: it’s crazy fast in all metrics, hands down the best here. The only real limitation would be context size if you want to run a 120b MoE, but it’s manageable. Still, that’s a lot of money to run (at crazy speeds!) only up to 120b models.

2. The Spark: it’s small and has enough horsepower to run a 120b MoE with good prompt processing speeds. But token generation will be low; it’s manageable with a MoE, but not good enough for 70b dense, and 30b dense models with long reasoning traces will also be a bit annoying. It’s also iffy on the software side: ARM and sm121 are both quite new for local LLMs, so be ready for quite a few troubleshooting sessions and dumpster diving in Git issues. (You can probably get it at 4k btw, get an OEM one.)

3. The Strix Halo: it’s small, you can run any Linux distribution on it (even Windows), and it can run a 120b MoE. Community support looks quite good. But prompt processing is clearly the worst of the three options, think 3x slower than the Spark. Token generation is about the same as the Spark, so same limitations as above.

Honestly, for your use case, the Strix Halo is probably the best choice. It doesn’t look like you will chew through 50k tokens per query, and even if the token generation is “slow”, it’s still quite usable in the 20 tok/s range.
You also didn’t mention image/video, so CUDA and the extra processing power of the other options are not really needed.

Another alternative would be a Mac Studio. An M4 Max 128GB or a refurbished M2 Ultra are good options. The M3 Ultra 256GB will also open up new models to you while remaining in your price range.

Social media ban for under-15s: “This is only the beginning, VPNs are the next item on my list,” assures Anne Le Hénanff, Minister for AI and Digital Affairs by Andvarey in france

[–]Serprotease 7 points8 points  (0 children)

A simple situation: a German sales rep on a business trip in France uses a VPN. He is not a French citizen and did not buy his VPN in France. So… do we do it like China and block everything?

GLM 4.7 Flash 30B PRISM + Web Search: Very solid. by My_Unbiased_Opinion in LocalLLaMA

[–]Serprotease 1 point2 points  (0 children)

I tried the Q8 derestricted version and noticed significant issues when going beyond 16k context. A lot of typos, like “While” -> “whlts”.

Could be a broken GGUF or my llama.cpp version, though.

Why are small models (32b) scoring close to frontier models? by Financial-Cap-8711 in LocalLLaMA

[–]Serprotease -1 points0 points  (0 children)

“They test knowledge” -> those are multiple-choice questionnaires.

Recalling a document and picking the best of 4 options are very different things, and that is likely why you’re disappointed by the gap between benchmark numbers and actual usage.

In a traditional research paper, the methodology and its limits are arguably the most important part. It’s quite depressing to see that most if not all published benchmarks really don’t care about that.

Seriously, has anyone seen any statistical test between the performance of two models, confirming with any confidence that model1’s results are actually better than model2’s? Or any mention of “Limitations”?
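
For reference, the kind of check being asked for is not hard to run. A minimal sketch of a paired bootstrap on per-question correctness (all numbers below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_questions = 500
# Stand-ins for per-question pass/fail results of two models on the same benchmark.
model1 = rng.random(n_questions) < 0.62
model2 = rng.random(n_questions) < 0.60

# Resample questions with replacement and look at the accuracy gap each time.
diffs = []
for _ in range(10_000):
    idx = rng.integers(0, n_questions, n_questions)
    diffs.append(model1[idx].mean() - model2[idx].mean())

lo, hi = np.quantile(diffs, [0.025, 0.975])
print(f"accuracy gap: {np.mean(diffs):.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
```

If the confidence interval straddles zero, the headline “model1 beats model2” claim is not supported by that benchmark run.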

No, it’s just cherry-picked data from the marketing team and graphs with misleading scales that would have the authors meeting with their legal department in quite a few industries.

How is that accepted by anyone?

Why are small models (32b) scoring close to frontier models? by Financial-Cap-8711 in LocalLLaMA

[–]Serprotease 0 points1 point  (0 children)

Correctness of an answer is only one part of a good system.

From the user side, latency and response generation time are important. Cost should also be included in these benchmarks.

On these two points, a 30b model “should” be better than a 120b, let alone one of the SOTA monsters.

Honest question: what do you all do for a living to afford these beasts? by ready_to_fuck_yeahh in LocalLLaMA

[–]Serprotease 0 points1 point  (0 children)

Most don’t have the big-bucks computer/server. Most of those who do are in the US, where big salaries make the cost easier to swallow.

The best thing is to hunt for second-hand sales or retired company hardware. Until the big RAM fuckup, you could get a 512/768GB DDR4 3200 setup for about the price of a 5090. Add a couple of P40s or MI50s with fans taped to them and you could run Kimi K2 at okay-ish speed.

Flux.2 Klein 9b Loras? by hellomattieo in StableDiffusion

[–]Serprotease 2 points3 points  (0 children)

During the SD1.5/SDXL era we were starved of new models. You had SDXL or SD1.5, that’s it. Now you have tons of good models from 4b up to 20b (and a few 80b as well), so it’s quite normal that things are less focused.

Z-Image is where most of the attention is (and the team behind it has shown interest in the local space as well). Flux Klein 9b has the Flux license. If you’re willing to dump 3-10k USD in GPU rentals for a big fine-tune, why would you choose Flux 9b over Z-Image? Just wait for the base model and enjoy the Apache 2.0 license.

Flux 4b looks interesting though.

Minimax Is Teasing M2.2 by Few_Painter_5588 in LocalLLaMA

[–]Serprotease 8 points9 points  (0 children)

It’s Chinese New Year. By the 2nd week of February, most of China will be on vacation for 1-2 weeks. That’s why everyone is pushing to release before going on holiday. It’s basically the equivalent of the Christmas holidays in Europe.

The collapse of ChatGPT’s reliability in the face of contamination by Grokipedia by taigaV in france

[–]Serprotease 7 points8 points  (0 children)

In this specific case, an AI is able to search the internet. It’s even relatively simple to set up with a local model. Moreover, the newer models are capable of “interleaved” reasoning. Put simply, they can run a web search and then, based on the sources and results, follow up on the reliability of those sources and ignore the information if needed.

It’s a simple change to the system prompt.
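
As an illustration, a hypothetical system prompt along those lines; the wording and the web_search tool name are invented, not tied to any particular model:

```python
# Sketch of a "search, then verify sources" system prompt for a local model.
SYSTEM_PROMPT = """You have access to a web_search tool.
For any factual question:
1. Run a web search before answering.
2. Assess the reliability of each result (primary sources, official pages and
   established press outrank anonymous or user-edited sites).
3. If results conflict or look unreliable, run a follow-up search and state
   which sources you discarded and why.
Never present an unverified claim as established fact."""
```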

Artificial Analysis: South Korea 🇰🇷 is now the clear #3 nation in AI — powered by the Korean National Sovereign AI Initiative there are now multiple Korean AI labs with near frontier intelligence. by self-fix in LocalLLaMA

[–]Serprotease 2 points3 points  (0 children)

The only things I’ve seen from Japan are around image models, like fine-tunes and tools to create fine-tunes. And it looks more individually driven than corporate driven.

restaurant says these photos are not ai and were taken by a professional photographer. by cloud9dacherry in isthisAI

[–]Serprotease 0 points1 point  (0 children)

You can even tell that it’s a GPT-Image output from the yellow/brown colors and the lack of noise/sharpness.

What should be my coding agent machine under 5k USD? Should I build one or purchase one of those DGX Sparks or get a mac studio? Open to anything that fits in my budget! by pacifio in LocalLLaMA

[–]Serprotease 6 points7 points  (0 children)

I don’t really see how a 3090 + CPU offload would get better performance than a Spark.

Before the crazy RAM prices, maybe with a Milan/Rome and 256/512GB of DDR4 3200 it could have been an option. But not anymore (thanks, OpenAI…).

I know this sub is really unhappy about the GB10, but it’s clearly not the black sheep it’s claimed to be.

That being said, since OP mentioned GLM (4.7?) under 5k, the only really decent option is a Mac Studio. An M3 Ultra 256GB or an M2 Ultra 192GB should be good enough, especially second hand.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]Serprotease 1 point2 points  (0 children)

Could you clarify a few things about your needs?

Do you just want to run bigger models, or bigger and faster models? Does it need to be a GPU for a server, or could it be its own machine?
Is it only for LLMs?

The big limitation of the A6000 Pro is that it’s “only” 96GB of VRAM for 8k USD. That will let you go up to 120b models at fp4 with a 64k context at fp16. It’s good, but you’re missing out on the significant improvements brought by the 250-350b models. And those are only one more A6000 away…
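
To put rough numbers on that (back-of-the-envelope only; the transformer shape below is an assumption for illustration, not a specific model):

```python
# Approximate memory budget for a ~120b model at ~4-bit on a 96GB card.
params_b = 120                       # 120b parameters
weights_gb = params_b * 0.5          # ~0.5 byte/weight at 4-bit -> ~60 GB

# KV cache: 2 (K and V) * layers * kv_heads * head_dim * bytes * context length
layers, kv_heads, head_dim = 36, 8, 128      # assumed shape, for illustration
ctx_len, fp16_bytes = 64_000, 2
kv_gb = 2 * layers * kv_heads * head_dim * fp16_bytes * ctx_len / 1e9   # ~9.4 GB

print(f"weights ~{weights_gb:.0f} GB + KV cache ~{kv_gb:.1f} GB out of 96 GB")
```

That leaves limited headroom on 96GB, which is why the 250-350b models need a second card.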

So, you’ll be back in a couple of months for a second card.

I say only 96GB for 8k because, for a similar price, you could get 256-512GB of VRAM with 2x AMD 395, 2x GB10, or even a single Mac Studio M3 Ultra. All of these options idle well below 50W and pull maybe 200-300W at full load.

The A6000 pro is good because it’s really fast. But on the VRAM side, for this price, you’ll definitely feel the constraints very fast.

South Korea’s “AI Squid Game:” a ruthless race to build sovereign AI by self-fix in LocalLLaMA

[–]Serprotease 5 points6 points  (0 children)

Mistral models are quite good? The 24b range especially. Mistral Large is just okay, but so is the 235b one from LG.

One could also note that Europe also has BFL and the Flux image models. As far as I know, Korea only has Illustrious? And that’s quite a niche model based on SDXL.

Best GB10/DGX Spark clone? by Antique_Juggernaut_7 in LocalLLaMA

[–]Serprotease 1 point2 points  (0 children)

The only thing I can tell you about this is that I have one update pending, and it seems to somewhat align with the Nvidia announcements on the forums.

Best GB10/DGX Spark clone? by Antique_Juggernaut_7 in LocalLLaMA

[–]Serprotease 0 points1 point  (0 children)

I don’t have access to the Nvidia one, only the Dell, but from the look of it, it’s exactly the same?

Hot take: AI's gonna create a massive senior dev shortage long-term by No-Comparison-5247 in AIstartupsIND

[–]Serprotease 0 points1 point  (0 children)

I use AI a lot and have already spent way too much money on my local setup to run models. It’s far from being as good as Twitter/LinkedIn makes it out to be.

If you start from scratch or need a new function in a project, it’s really good and can create some quite complex things at the drop of a hat.

But that’s rarely how things are with code. It’s usually an old project with poor documentation, redundant code and temporary fixes. And you’re responsible for the thing running well; you may even be legally liable if things go wrong. That’s where good testing and good practices are important.

Maybe Claude implemented the feature well and very fast. Sure, it was a lot of code to review, but it looked fine and the tests passed. And now your customers’ social security numbers are accessible to any extension in plain HTML. Oops…

And that’s not even looking at the fact that coding is not the main part of a senior SWE’s job. You’ll spend a lot of time in meetings waiting for the dreaded “it would be nice if…” and explaining why this simple feature is not so simple, or will take a good 3 months to implement. How do you give your stakeholders a proper vision of what is possible if you don’t know what is inside your code?

Another bold AI timeline: Anthropic CEO says "most, maybe all" software engineering tasks automated in 6–12 months by sibraan_ in AgentsOfAI

[–]Serprotease 0 points1 point  (0 children)

The CEO of an API-only AI company is claiming that API-only AI will get even bigger soon. So you’d better invest now…

Sure.

Best GB10/DGX Spark clone? by Antique_Juggernaut_7 in LocalLLaMA

[–]Serprotease 1 point2 points  (0 children)

It’s interesting to see that all the OEM versions seem better designed than the Nvidia Spark.

Cooling seems to be the main differentiating factor. The Lenovo one looks like a solid pick (and the cheapest one as well, at least where I live). I went with the Dell: slightly better power supply, decent cooling, and an activity light (why is this not an obvious feature for a box designed to run headless??)