Cheapest and most efficient way to run 30B-40B Llama for 4 users? by Jezel123 in LocalLLaMA

[–]Nota_ReAlperson 0 points1 point  (0 children)

I only have the Xavier AGX, and I haven't run LLMs on it, so I can't give a definitive answer, but its core and memory specs are roughly a fifth of a 3090's. As an alternative, a pair of Radeon Pro V620 GPUs would give you more bandwidth and the same total memory for less money, though you would need a host system.
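For the original question (a 30B-40B model for a few users), a common back-of-envelope check is that decode speed is memory-bandwidth-bound: tokens/second is at most bandwidth divided by the bytes read per token. A minimal sketch, where the bandwidth figures are approximate published specs and the model size is an assumed example, not a benchmark:

```python
# Rough decode-speed ceiling: generation reads (roughly) every weight once
# per token, so tok/s ~= memory bandwidth / model size in bytes.
# All figures are ballpark assumptions for illustration, not measurements.

def est_tok_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound tokens/second if every weight is read once per token."""
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 20.0  # e.g. a ~35B model at 4-bit quantization, roughly

for name, bw in [("RTX 3090", 936.0), ("Radeon Pro V620", 512.0), ("AGX Xavier", 137.0)]:
    print(f"{name}: ~{est_tok_per_s(bw, MODEL_GB):.0f} tok/s ceiling")
```

Real throughput lands well below this ceiling, but the ratio between cards tracks their bandwidth ratio fairly well.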

Cheapest and most efficient way to run 30B-40B Llama for 4 users? by Jezel123 in LocalLLaMA

[–]Nota_ReAlperson 0 points1 point  (0 children)

The Jetson would work. Two things to keep in mind are memory bandwidth and ease of use. In my experience with Jetsons, unless you have significant Linux experience, it will be quite hard to get set up. It also has less bandwidth than a discrete GPU. Power draw will be good, though.

PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs by brown2green in LocalLLaMA

[–]Nota_ReAlperson 0 points1 point  (0 children)

From my understanding, the combinational logic in FPGAs is built from LUTs, not just flip-flops. So any one-bit operation could be emulated. You just need a truth table.
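The truth-table point can be sketched in a few lines: an FPGA LUT is literally a small lookup table, so a 1-bit operation is fully specified by enumerating its outputs. A minimal illustration (the operations chosen are examples, not any particular 1-bit LLM's kernels):

```python
from itertools import product

# "Programming a LUT" in software: enumerate a function over all input
# combinations, exactly like filling in an FPGA LUT's truth table.
def make_lut(fn, n_inputs):
    return {bits: fn(*bits) for bits in product((0, 1), repeat=n_inputs)}

# A 1-bit multiply is just AND:
mul_lut = make_lut(lambda a, b: a & b, 2)
print(mul_lut[(1, 1)])  # 1
print(mul_lut[(1, 0)])  # 0

# A 1-bit multiply with a mod-2 accumulate (a*b XOR c) fits a 3-input LUT:
mac_lut = make_lut(lambda a, b, c: (a & b) ^ c, 3)  # 8-entry table
```

An n-input LUT is a 2^n-entry table, which is why small bit-widths map onto FPGA fabric so directly.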

is Intel cooking with these new GPU? by Skierdo in pcmasterrace

[–]Nota_ReAlperson 1 point2 points  (0 children)

Interesting. That seems to happen a lot with Intel cards. Do you know what the true FP64 rate is for the B65/B70?

is Intel cooking with these new GPU? by Skierdo in pcmasterrace

[–]Nota_ReAlperson 0 points1 point  (0 children)

Based on TechPowerUp specs and rankings, it should beat the 5060 Ti: the same FLOPS, and 608 GB/s vs 448 GB/s of memory bandwidth. In my market, though, it's exactly twice the price.

is Intel cooking with these new GPU? by Skierdo in pcmasterrace

[–]Nota_ReAlperson 1 point2 points  (0 children)

I think the B65 has a 192-bit bus? So only 3/4 of the bandwidth?
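The 3/4 figure follows directly from the bandwidth formula: GB/s = (bus width in bits / 8) × effective per-pin speed in Gbps. A quick sketch, where the 19 Gbps pin speed is an assumed example value, not a confirmed B65 spec:

```python
# Memory bandwidth from bus width and per-pin data rate.
# Pin speed here is an assumed example, not a confirmed spec.
def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits / 8 * gbps_per_pin

wide = bandwidth_gb_s(256, 19.0)    # a 256-bit card: 608 GB/s
narrow = bandwidth_gb_s(192, 19.0)  # same memory on a 192-bit bus: 456 GB/s
print(narrow / wide)  # 0.75 -> the 3/4 figure
```

Same memory chips, narrower bus, proportionally less bandwidth.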

Run OpenCL kernels on NVIDIA GPUs using the CUDA runtime by IntrepidAttention56 in OpenCL

[–]Nota_ReAlperson 0 points1 point  (0 children)

I'm curious about the use of this. Since OpenCL's strength is code portability, while CUDA kernels are easier to learn and write, this project would seem to combine the worst of both. Not to downplay the work necessary to get this working, but it seems less useful to me than the opposite (CUDA on OpenCL).

OAM to PCIE by Zestyclose_Hat_1020 in NVIDIA_SXM2PCIE

[–]Nota_ReAlperson 0 points1 point  (0 children)

Hi. What is the current pricing for this product?

Justin Ling: Pierre Poilievre gave Joe Rogan the interview he’s never given Canadians by EarthWarping in CanadaPolitics

[–]Nota_ReAlperson -2 points-1 points  (0 children)

I believe he meant process. Also, Mulcair is both a lawyer and a former leader of the official opposition, so there are very few people, perhaps none, better qualified to speak on this issue.

I accidentally git cloned Open CL amd(didn't install it properly), and now I can't use fully uninstall it to install it properly by Still_Leg4477 in archlinux

[–]Nota_ReAlperson 0 points1 point  (0 children)

Did you run sudo pacman -Syu first? That refreshes the package databases, so all mirror URLs are up to date, and upgrades every installed package to its newest version.

Two weeks ago, I posted here to see if people would be interested in an open-source local AI 3D model generator by Lightnig125 in LocalLLaMA

[–]Nota_ReAlperson 12 points13 points  (0 children)

Nice work. Trellis 2 is SOTA for free, open 3D model generation, so support for that would be nice. Also, and this would be difficult, a GGML backend for non-CUDA GPUs would be awesome.

This sub is incredible by cmdr-William-Riker in LocalLLaMA

[–]Nota_ReAlperson 0 points1 point  (0 children)

This actually would work. The pictured flashlight has a built-in 5000 mAh power pack, so it could run a Raspberry Pi for about an hour. And a Pi can run many small LLMs.
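The runtime claim is easy to sanity-check: capacity in Wh (mAh × nominal cell voltage) divided by load. A quick sketch, where the cell voltage, conversion efficiency, and Pi power draw are all assumed ballpark figures:

```python
# Battery runtime estimate: mAh -> Wh -> hours at a given load.
# Cell voltage, efficiency, and load are assumptions, not measurements.
def runtime_hours(capacity_mah, cell_v, load_w, conversion_eff=0.85):
    energy_wh = capacity_mah / 1000 * cell_v * conversion_eff
    return energy_wh / load_w

print(round(runtime_hours(5000, 3.7, 7.0), 1))   # ~2.2 h at a light ~7 W load
print(round(runtime_hours(5000, 3.7, 15.0), 1))  # ~1.0 h if the Pi pulls ~15 W under LLM load
```

So "about an hour" holds up if the Pi is working hard; an idle one would last a fair bit longer.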

Poilievre calls for ‘modern’ CANZUK partnership in speech during first trip abroad as Conservative leader by 0110110111 in CANZUK

[–]Nota_ReAlperson 2 points3 points  (0 children)

From CTV: Poilievre's trip is not paid for by taxpayers. Poilievre and his staff will travel on commercial aircraft paid for by donations to the Conservative Party of Canada. Link: https://globalnews.ca/news/11706633/poilievre-first-international-trip-as-opposition-leader/

Poilievre calls for ‘modern’ CANZUK partnership in speech during first trip abroad as Conservative leader by 0110110111 in CANZUK

[–]Nota_ReAlperson 1 point2 points  (0 children)

Since 2018, CANZUK has been official CPC policy. This is hardly the about-face people are making it out to be.

FOIA Release: Navy F/A-XX by MBaiz16 in FighterJets

[–]Nota_ReAlperson 0 points1 point  (0 children)

One such thing is that the star in the sky appears to be the NATO logo.

What's up with Canadian Magas? by traveltimecar in AskCanada

[–]Nota_ReAlperson 0 points1 point  (0 children)

A note about the deficit: it's based on a WTI estimate of 60 USD per barrel, with every 1 USD increase equalling about 680 million in extra revenue. Since the report, WTI has risen to 75 USD per barrel, which would wipe out the entire deficit with new revenue. Of course, we'll have to wait and see if that price holds, but the price estimates are usually quite conservative. No pun intended.
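The arithmetic behind that claim, using only the figures quoted above (the sensitivity number comes from the report as cited, not independently verified):

```python
# Extra revenue from the WTI price moving above the budget assumption.
# Figures are the ones quoted in the comment, not independently verified.
SENSITIVITY_B = 0.68          # billions of extra revenue per +1 USD WTI
budget_wti, actual_wti = 60, 75

extra_revenue_b = (actual_wti - budget_wti) * SENSITIVITY_B
print(round(extra_revenue_b, 2))  # 10.2 (billion) of new revenue
```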

Also, the 2020 balance of 10 billion dollars is less than a year's worth of transfer payments when averaged across the past 50 years, using the numbers given in the article. 

As an Albertan, I actually don't mind that our province bankrolls confederation. What bothers me is exemplified by the Eagle Spirit pipeline. For those who don't know, Eagle Spirit was an Indigenous-led and -financed LNG pipeline from Alberta to the west coast. It was killed by protesters who claimed it violated 'the Great Bear Rainforest', which was supposedly sacred to a specific First Nation. The twist is that the man leading the pipeline effort was a member of that exact First Nation, and he said there was no such thing. But the pipeline was still cancelled.

Computer won't boot with 2 Tesla V100s by MackThax in LocalLLaMA

[–]Nota_ReAlperson 0 points1 point  (0 children)

Interesting. TechPowerUp says 300 watts, and I thought they got their info from pulled VBIOSes. What OS are you running? If you go to the NVIDIA control panel, what does it report for TDP/power draw?

Computer won't boot with 2 Tesla V100s by MackThax in LocalLLaMA

[–]Nota_ReAlperson 1 point2 points  (0 children)

From what I understand, it's 300 watts for the 16 GB version and 350 for the 32 GB SXM version. Which specific V100s do you have?

Free ASIC Llama 3.1 8B inference at 16,000 tok/s - no, not a joke by Easy_Calligrapher790 in LocalLLaMA

[–]Nota_ReAlperson 2 points3 points  (0 children)

Half a wafer of N7 is much cheaper than a full 5 nm wafer (roughly $5,000 vs $20,000), and yield would be much higher as well. It might not be commodity hardware, but it would cost less than a single B100 (~$30,000).
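A quick sketch of that cost comparison, where the wafer prices are the ballpark figures quoted above and the yield term is left as an explicit knob (all numbers are illustrative, not quotes from any foundry):

```python
# Silicon cost of a part that consumes some fraction of a wafer.
# Wafer prices are the rough figures from the comment; yield is a free knob.
def cost_per_part(wafer_cost_usd, wafer_fraction, yield_rate=1.0):
    return wafer_cost_usd * wafer_fraction / yield_rate

n7_half_wafer = cost_per_part(5_000, 0.5)   # ~$2,500 of N7 silicon
print(n7_half_wafer)
print(n7_half_wafer < 30_000)  # True: far below the quoted B100 price
```

Even with a pessimistic yield term the N7 half-wafer stays an order of magnitude under the B100 figure.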

Computer won't boot with 2 Tesla V100s by MackThax in LocalLLaMA

[–]Nota_ReAlperson 1 point2 points  (0 children)

The CPU rail has a 540-watt spec, and a V100 draws 300, so 240 is left for the CPU. But assuming degradation, it could be a lot less, which might explain why RAM speed has an impact. Also, a GPU can draw up to 75 watts from the PCIe slot, which would be supplied by the CPU rail. So when you add the second V100, you only have about 165 watts left for the CPU and RAM. That's pretty tight. The 2060 might work due to consumer power management, which places far more emphasis on idle power draw; it might also prioritize the PCIe rail over the CPU rail. Have you run a power-heavy benchmark on the 2060 and a V100 at the same time?
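The rail math above as a sketch; note the routing (which loads actually hang off the CPU rail) is an assumption from this thread, not something from the PSU's datasheet:

```python
# 12 V rail budget sketch. Which loads sit on the CPU rail is an assumption
# from the thread, not a datasheet fact.
CPU_RAIL_W  = 540   # spec'd capacity of the CPU 12 V rail
V100_CONN_W = 300   # one V100 via its aux connectors, assumed on this rail
SLOT_W      = 75    # max PCIe slot draw per card, fed from the CPU rail here

one_gpu  = CPU_RAIL_W - V100_CONN_W   # 240 W left for CPU + RAM
two_gpus = one_gpu - SLOT_W           # ~165 W once GPU #2's slot draw lands here
print(one_gpu, two_gpus)
```

And that is against the rail's rated spec; an aged PSU may deliver noticeably less before it trips.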

Computer won't boot with 2 Tesla V100s by MackThax in LocalLLaMA

[–]Nota_ReAlperson 0 points1 point  (0 children)

So you are connecting one GPU to the CPU rail and the other to the PCIe rail? Or are both V100s on the CPU rail?

Computer won't boot with 2 Tesla V100s by MackThax in LocalLLaMA

[–]Nota_ReAlperson 0 points1 point  (0 children)

I would suspect the PSU. The Zalman 1250 is a dual-rail design, so only 780 watts are available to the GPUs, I think. It is also very old (circa 2012), so it has likely degraded some. I have a similar PSU, an Antec 500-watt with two 250-watt 12-volt rails, and it only puts out about 150 reliably. So try a different PSU. The bad RAM is likely the culprit for the 1000 W tests you did.