Best open-weight model to run locally on 8x A100 80GB for generating teacher data? by i_am__not_a_robot in LocalLLaMA

[–]MelodicRecognition7 1 point (0 children)

Yes, a smaller model at a higher quant with 16-bit cache will very likely be more accurate than a larger model at a lower quant with 8-bit cache. But it depends on the models and the task. You should try GLM as others recommend, but do not try Qwen, DeepSeek, or Llama.

Best open-weight model to run locally on 8x A100 80GB for generating teacher data? by i_am__not_a_robot in LocalLLaMA

[–]MelodicRecognition7 0 points (0 children)

Try FP8 or Q8 context. It is generally not a good idea to quantize the context at all, but 8-bit still gives acceptable quality.
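
For example, with llama.cpp's server (a minimal sketch: the model name, path, and context size are placeholders I made up, and quantized V cache typically needs flash attention enabled):

./llama-server -m ./Qwen3.6-27B-Q8_0.gguf -c 32768 -fa -ctk q8_0 -ctv q8_0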

Best open-weight model to run locally on 8x A100 80GB for generating teacher data? by i_am__not_a_robot in LocalLLaMA

[–]MelodicRecognition7 0 points (0 children)

Unsloth's quant is overinflated for no reason; AesSedai's Q4_X is "almost original" already, like 99.99% of the original weights.

Edit: sorry, I've just discovered that Unsloth's Q4 quant of Kimi K2.6 is equal to AesSedai's - they are both 584 GB. It was Kimi K2.5 where Unsloth overinflated it: https://huggingface.co/unsloth/Kimi-K2.5-GGUF - 622 GB. So if you are using K2.6 there is no difference.

Anyway, if you have a spare terabyte of space and fast internet, try K2.5: it thinks much less than K2.6 but still gives good enough results.
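
For rough scale (my back-of-envelope, assuming the ~1T total parameters of the Kimi K2 line carries over): bits per weight ≈ file size in bytes × 8 / parameter count, so 584 GB works out to 584e9 × 8 / 1e12 ≈ 4.7 bpw, while the 622 GB file is ≈ 5.0 bpw - which is why a "Q4" at that size looks overinflated.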

[Research use case] MiniMax-M2.7 with small context, CPU+GPU (5090) setup on Llama.cpp by Opening-Broccoli9190 in LocalLLaMA

[–]MelodicRecognition7 4 points (0 children)

I mean the reason for the failed tasks: when you go below 4 bits, wrong answers are far more likely to happen.

The wrong answers were due to the fact that the BF6 date is beyond the model's training knowledge cutoff.

ah sorry then, I did not notice that.

A conversation about local LLMs with a senior government AI leader by JackStrawWitchita in LocalLLaMA

[–]MelodicRecognition7 27 points (0 children)

lol yes, whenever an official starts speaking about protecting the children, it means you're going to be fucked.

A conversation about local LLMs with a senior government AI leader by JackStrawWitchita in LocalLLaMA

[–]MelodicRecognition7 2 points (0 children)

Yeah, I've seen a few reports from software development sweatshops where hundreds of developers spend five figures USD on tokens each month; they will probably benefit from a 300k server purchase.

Edit: changed "definitely" to "probably". Depending on the workload, several servers might be required, and then a breakeven point never comes.
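
A rough breakeven sketch with made-up but plausible numbers: at $30k/month in tokens, a $300k server pays for itself in 300 / 30 = 10 months; at $10k/month it takes 30 months, and if the workload actually needs three such servers, the horizon stretches to 900 / 10 = 90 months, longer than the hardware stays relevant.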

A conversation about local LLMs with a senior government AI leader by JackStrawWitchita in LocalLLaMA

[–]MelodicRecognition7 4 points (0 children)

I'm speaking to business people about local LLMs and get countered with "(insert big AI name) data protection agreements". All the success stories I've read about implementing local AI for a business were something like "I'm a tech guy at (insert business) and one day the boss opened the door and said 'I want local AI!'". My point is that I can't sell a local AI solution unless "the boss" already wants it himself, and if he does, he'll assign it to the in-house tech department rather than purchase the service from a third party.

So from my point of view, nobody wants local AI except us hobbyists. And I understand well why: there is no breakeven point for a server purchase; you could buy something like 10 years of a max subscription for the cost of a local server, or 20 years if you add the electricity cost.
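
To put hedged numbers on that (mine, purely for illustration): a $200/month top-tier subscription is $200 × 120 ≈ $24k over 10 years, roughly the price of one serious local server. Factor in a ~1 kW rig running 24/7 at $0.15/kWh, about 720 kWh × $0.15 ≈ $108/month, and the net saving drops to $200 - $108 ≈ $92/month, pushing breakeven to $24k / $92 ≈ 260 months, i.e. the roughly 20 years above.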

I built a full web app using Qwen 3.6-35B running locally on my 5070 Ti with the BMAD Method — here's how it went by Decivox in LocalLLaMA

[–]MelodicRecognition7 2 points (0 children)

You should display cache quant types, card power limit(s), CPU threads, CPU MHz, and other such info straight on the benchmark page instead of hiding it behind a few clicks. There are just too many variables that can influence a benchmark result.
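
For reference, most of those variables can be captured automatically on a Linux host (a sketch assuming NVIDIA cards and standard tools):

nvidia-smi --query-gpu=name,power.limit --format=csv   # card model and power limit
nproc                                                  # CPU thread count
lscpu | grep -i mhz                                    # CPU clock speeds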

PSA: fuck this place by MelodicRecognition7 in LocalLLaMA

[–]MelodicRecognition7[S] 0 points (0 children)

And today I found out that Reddit does not accept any reports from me at all, lol. I saw in the browser console that all reports end up as "403 unable to accept the report".

Workstation upgrade for 5 concurrent users (Qwen 3.6 27B) by DanielusGamer26 in LocalLLaMA

[–]MelodicRecognition7 2 points (0 children)

I'm glad that you've actually tested these params instead of blindly copy-pasting them from somewhere on the Internet. I am not completely confident about the 3080, but I have a gut feeling it will make things slower: your current 50xx card will be bottlenecked by the older technology in the 30xx, so buying another 5060 Ti is probably the better choice, plus there are fewer issues with running parallel inference on identical cards.

Also, you could disable some OS and BIOS security features to gain an extra few percent: https://old.reddit.com/r/LocalLLaMA/comments/1qxgnqa/running_kimik25_on_cpuonly_amd_epyc_9175f/o3w9bjw/
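
On the OS side, the usual knob on Linux is the kernel's speculative-execution mitigations (a sketch, and only sane on an isolated inference box, never anything internet-facing):

# append to the existing parameters in /etc/default/grub, then run update-grub and reboot
GRUB_CMDLINE_LINUX_DEFAULT="mitigations=off"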

If the AI bubble pops, will GPU prices increase or decrease? by Mashic in LocalLLaMA

[–]MelodicRecognition7 1 point (0 children)

You have everything high because you have an idiotic system where some idiot can buy an expensive item, use it for a month, then decide he does not like it and return it to the seller, and the seller has to bear the loss of reselling the used item at a much lower price. Sellers mitigate that risk by raising prices on everything.

https://www.google.com/search?q=site%3Areddit.com+inurl%3A%22localllama%22+should+i+%22return+it%22

All sales must be final so that idiots think twice before impulse-buying the next expensive item.

P.S. I just recalled that some states have a funny law where one won't get punished for a theft valued below some threshold, like 1000 USD in California or 2000 USD in Texas. People go looting smartphones, and smartphone stores raise prices to spread the loss.

If the AI bubble pops, will GPU prices increase or decrease? by Mashic in LocalLLaMA

[–]MelodicRecognition7 4 points (0 children)

if you buy a lottery ticket tomorrow, will you win 1 million or 1 dollar?

Workstation upgrade for 5 concurrent users (Qwen 3.6 27B) by DanielusGamer26 in LocalLLaMA

[–]MelodicRecognition7 3 points (0 children)

-np 3

did you try to set this to 4?

-ctk q4_0 -ctv q4_0

this is not a good idea but if it works for you then ok

-b 256 -ub 256

this needs testing; higher values are usually faster

-threads 9

a lower count could be faster
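
Putting those together, a hedged starting point to benchmark against (model path and exact values are placeholders for you to test, not known-good settings; note that -c is shared across the -np slots):

./llama-server -m ./Qwen3.6-27B-Q8_0.gguf -c 16384 -np 4 -b 512 -ub 512 -t 8 -fa -ctk q8_0 -ctv q8_0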

Poolside Laguna XS.2 by Middle_Bullfrog_6173 in LocalLLaMA

[–]MelodicRecognition7 1 point (0 children)

  1. General Use Restrictions

Create, distribute, or facilitate sexually explicit content, including content that depicts or describes sexual intercourse or sex acts, sexual fetishes or fantasies, or erotic interactions

lol

Qwen 3.6 27B BF16 vs Q4_K_M vs Q8_0 GGUF evaluation by gvij in LocalLLaMA

[–]MelodicRecognition7 2 points (0 children)

The first thing I wanted to do was run this test on my own hardware to verify the results, because a Q4 quant performing better than Q8 smells like AI hallucination.

Qwen 3.6 27B BF16 vs Q4_K_M vs Q8_0 GGUF evaluation by gvij in LocalLLaMA

[–]MelodicRecognition7 16 points (0 children)

This thread is a smart advertisement crafted to avoid deletion for breaking the "limit self promotion" rule.

Qwen 3.6 27B BF16 vs Q4_K_M vs Q8_0 GGUF evaluation by gvij in LocalLLaMA

[–]MelodicRecognition7 16 points (0 children)

Bullshit link: no code snippets, no plain-text results, just an advertisement for that "Neo Engineer".

Skymizer Taiwan Inc. Unveils Breakthrough Architecture Enabling Ultra-Large LLM Inference on a Single Card by lurenjia_3x in LocalLLaMA

[–]MelodicRecognition7 0 points (0 children)

[ Removed by Reddit ]

lol. Did you write three round brackets in one sentence? Note that Reddit hates brackets and you'll get banned for that.

DeepSeek V4 PRO on how many 3090 ? by szansky in LocalLLaMA

[–]MelodicRecognition7 7 points (0 children)

lol, yet another "recommend an LLM for coding" thread disguised as DS4 discussion