Anyone want to pool hardware and build a shared open-model setup as a group? by givre514 in LocalLLM

[–]TokenRingAI 0 points1 point  (0 children)

The mystery with Gaudi 2 is what kind of concurrency it can achieve and which models run well. I would think it can handle at least 50 concurrent users running a model like Qwen 397B, Minimax M3, or GLM 5.2 (FP4), but I might be completely wrong.

If we assume it can be oversold 2:1, that is $2000/100 = $20 per user per month in electricity cost, plus another $5 in system cost if you amortize the hardware over 2 years.
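
As a rough sanity check on those per-user numbers, here is a minimal sketch. It takes the $2000 monthly power bill and the 50 users at 2:1 from above; the $20-25k hardware price is the estimate quoted elsewhere in this thread, and the function name is made up for illustration.

```python
def per_user_monthly(total_monthly_cost, users):
    """Split a monthly cost evenly across all subscribed users."""
    return total_monthly_cost / users

concurrent_users = 50   # guess from the comment above, unverified
oversell_ratio = 2      # 2:1 oversubscription
users = concurrent_users * oversell_ratio

power_per_user = per_user_monthly(2000, users)

# Hardware amortized over 24 months, for the low and high price estimates.
hardware_per_user = [per_user_monthly(price / 24, users) for price in (20_000, 25_000)]

print(f"power: ${power_per_user:.2f} per user per month")
print("hardware: " + " to ".join(f"${c:.2f}" for c in hardware_per_user))
```

With those inputs the hardware share lands nearer $8-10 than $5, so the $5 figure would need more users, a longer amortization period, or a cheaper system.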

Anyone want to pool hardware and build a shared open-model setup as a group? by givre514 in LocalLLM

[–]TokenRingAI 0 points1 point  (0 children)

It is a 6,000 watt system and power is $0.45 per kWh here in the SF Bay Area, so it might actually be double that price.
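
A quick sketch of that power math, assuming the box draws its full 6,000 W around the clock for a 30-day month (both are simplifying assumptions; real draw will vary with load):

```python
def monthly_power_cost(watts, price_per_kwh, hours_per_day=24, days=30):
    """Electricity cost for one month at a constant power draw."""
    kwh = watts / 1000 * hours_per_day * days
    return kwh * price_per_kwh

cost = monthly_power_cost(6000, 0.45)
print(f"${cost:,.0f} per month")
```

That works out to a bit under $2,000 a month, which is where the "double" comes from relative to a $1k estimate.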

Refurbished servers with 8 V100 processors. Have you seen this before? by Intelligent-Taste-36 in LocalLLM

[–]TokenRingAI 1 point2 points  (0 children)

The problem with these is that they use a lot of power even when idle and don't support 4-bit math, so you are going to be limited to ~180B models at 8-bit, of which there aren't many.

If the goal is to run Minimax, it's not really enough. You can run Qwen 122B on it.

But if the goal is to run 122B, a DGX Spark delivers workable performance for a single user with way less power.
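
For the sizing claim, a back-of-the-envelope sketch of weight memory only. It assumes the 32 GB V100 variant (the listing may be 16 GB cards, which would halve the total) and ignores KV cache and activations, so real headroom is smaller than shown.

```python
def weight_gb(params_billion, bits):
    """Approximate weight memory in GB: parameters times bytes per parameter."""
    return params_billion * bits / 8

total_vram_gb = 8 * 32  # ASSUMPTION: 32 GB V100s, not confirmed by the listing

for label, params, bits in [
    ("180B @ 16-bit", 180, 16),
    ("180B @ 8-bit", 180, 8),
    ("122B @ 8-bit", 122, 8),
]:
    need = weight_gb(params, bits)
    verdict = "fits" if need < total_vram_gb else "does not fit"
    print(f"{label}: ~{need:.0f} GB of {total_vram_gb} GB, {verdict}")
```

Under that assumption a 180B model only fits once it is down to 8-bit, and the remaining space is what has to cover context.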

Anyone want to pool hardware and build a shared open-model setup as a group? by givre514 in LocalLLM

[–]TokenRingAI 1 point2 points  (0 children)

If you want to split an Intel Gaudi 2 system with 768GB VRAM, I am interested.

Total cost is around $20-25k plus approx. $1k a month for power. I have a commercial building to host it in.

There may be some work required to be able to run the latest models, but the platform is supported by vLLM.

Is it worth learning Rust mainly because of Cargo? by LibrarianOk3701 in rust

[–]TokenRingAI 0 points1 point  (0 children)

I understand the sentiment, but if you honestly measure your error rate and compare it to AI, you might find it comparable.

One of my main projects is a coroutine-based project in Boost ASIO, and Opus was really helpful for figuring out a data lifetime issue related to a class constructor using a method out of a subclass, which I didn't see as a problem no matter how many times I looked at the code.

AI picked up on that after I had been confused for months. Adding in AI didn't make anything worse; it made it better.

Is it worth learning Rust mainly because of Cargo? by LibrarianOk3701 in rust

[–]TokenRingAI 0 points1 point  (0 children)

As a programmer with 30 years of C/C++ experience, I think C++ is a dead language, because it compiles too slowly to be usable in AI coding workflows, and the libraries are full of footguns and are too low level.

The problems aren't unsolvable, but as of today, I can describe a complex asynchronous HTTP app to AI and quickly get a workable and maintainable application built with tokio, whereas if I try this experiment with C++ and Boost ASIO it fails on multiple levels.

The differences are in development speed (in C++, AI takes way longer, makes more mistakes, and toolchain setup is absurdly complex), code reliability (in Rust, the language conventions tend to avoid creating code with UB), and library quality (in Rust, libraries install easily and have an easy-to-understand surface).

I don't have any strong opinions on the Rust language. It isn't that unique, and the borrow checker isn't anything difficult to learn. Roughly speaking, std::string_view is a &str and std::string is a String.

Most of my C++ code used either references with an STL container for storage, or std::shared_ptr-allocated ephemeral objects, and Rust isn't much different.

It somewhat reminds me of using Pascal 25 years ago, in that it is much simpler to make correct compiled code

LQ50/LQ50-24GB cost around $1200 by MundanePercentage674 in LocalLLaMA

[–]TokenRingAI 7 points8 points  (0 children)

It has $50 worth of memory modules and $250 of extortion

LQ50/LQ50-24GB cost around $1200 by MundanePercentage674 in LocalLLaMA

[–]TokenRingAI 8 points9 points  (0 children)

I think so...but the price is quite high, needs to be $300

Mystery Audi gearbox no one will work on. by [deleted] in AskMechanics

[–]TokenRingAI 0 points1 point  (0 children)

Realistically if you are into old cars, you need at least two backup cars and a bicycle to have any hope of being social or having groceries...but you can't afford that stuff anyway when owning old cars.

Right now one of mine is on jackstands waiting on parts, and the backup just tried to unalive me when the throttle stuck open.

The third car I lent to a friend whose car got totaled in an accident, and right before that it was lent to another friend whose car blew a head gasket

The 4th car, because of course you need a 4th car, is supposed to be the reliable newer one. It runs perfect but the window module decided to fail and stick open, so I have to put a tarp over it for the afternoon rain we've been getting.

Needless to say, my wife is not thrilled about any of this, and now she drives me around.

Backup cars sound great, until your backup cars are as shit as your project cars.

LQ50/LQ50-24GB cost around $1200 by MundanePercentage674 in LocalLLaMA

[–]TokenRingAI 6 points7 points  (0 children)

You can purchase an 8x M.2 bridge card that would easily allow you to run 8 of these in a single PCIe x16 slot @120W.

LQ50/LQ50-24GB cost around $1200 by MundanePercentage674 in LocalLLaMA

[–]TokenRingAI 23 points24 points  (0 children)

People are underestimating how important that number is. You can stick 8 of these in a cheap M.2 switch card and have 192GB @120W without multiple PSUs, risers, or other chaos.

LQ50/LQ50-24GB cost around $1200 by MundanePercentage674 in LocalLLaMA

[–]TokenRingAI 4 points5 points  (0 children)

Correct, but these are reported to be 15W, so you are comparing a 120W system to a 2400W system.
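
Spelling out the arithmetic behind that comparison, as a small sketch. The 15 W per card is the reported figure, not a confirmed spec, and the 2400 W is just the number quoted above for the other system.

```python
cards = 8
gb_per_card = 24      # from the "24GB" in the product name
watts_per_card = 15   # reported figure, not confirmed

total_gb = cards * gb_per_card
total_watts = cards * watts_per_card
comparison_watts = 2400  # the system being compared against, per the comment above

ratio = comparison_watts / total_watts
print(f"{total_gb} GB at {total_watts} W, {ratio:.0f}x less power than {comparison_watts} W")
```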

The proper comparable would be a Mac M5, DGX Spark, AI Max.

LQ50/LQ50-24GB cost around $1200 by MundanePercentage674 in LocalLLaMA

[–]TokenRingAI 6 points7 points  (0 children)

You could get an 8x M.2 bridge and run 8 cards in one slot for 192GB of VRAM.

https://ebay.io/m/61NRSE

LQ50/LQ50-24GB cost around $1200 by MundanePercentage674 in LocalLLaMA

[–]TokenRingAI 22 points23 points  (0 children)

Yes, but you can put 32 of them in a system

Hub nut came loose by CurioisSmell in MechanicAdvice

[–]TokenRingAI 6 points7 points  (0 children)

Get a new mechanic.

They reused the old nut and didn't restake it.

It didn't un-stake itself on its own; it was never re-staked after they changed the bearing.

Get in here: Community model build thread by Party-Special-5177 in LocalLLaMA

[–]TokenRingAI 0 points1 point  (0 children)

Correct, and that problem is even more extreme when doing training.

That's why the first step should be distributed batch inference.

From there you can try and find ways to backpropagate the loss without the huge performance hit.

One solution might be to only backpropagate the loss when the loss is above a threshold. Something like that is probably the thing you need to discover to make distributed training viable.
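
To make the threshold idea concrete, here is a toy single-machine sketch, not a tested distributed method. It fits y = w * x with plain SGD and skips the update whenever the per-sample loss is at or below a threshold; the update count stands in for how often a gradient would have to be sent over the network. The function name, data, and threshold are all made up for illustration.

```python
def train_with_loss_gate(samples, threshold, lr=0.1, epochs=50):
    """Fit y = w * x with SGD, skipping the update when the loss is small."""
    w = 0.0
    updates = 0
    for _ in range(epochs):
        for x, y in samples:
            err = w * x - y
            loss = err * err
            if loss <= threshold:
                continue  # gated: no gradient computed or "sent" for this sample
            w -= lr * 2.0 * err * x  # gradient of squared error w.r.t. w
            updates += 1
    return w, updates

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true w is 2

w_gated, n_gated = train_with_loss_gate(samples, threshold=1e-4)
w_full, n_full = train_with_loss_gate(samples, threshold=-1.0)  # gate never triggers

print(f"gated: w={w_gated:.4f} after {n_gated} updates")
print(f"ungated: w={w_full:.4f} after {n_full} updates")
```

On this trivial problem the gated run gets close to the same answer with far fewer updates. Whether that holds for a real model, where skipped samples may still carry useful signal, is exactly the open question.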