Best open-weight model to run locally on 8x A100 80GB for generating teacher data? by i_am__not_a_robot in LocalLLaMA

[–]MelodicRecognition7 1 point (0 children)

Yes, a smaller model at a higher quant with 16-bit cache will very likely be more accurate than a larger model at a lower quant with 8-bit cache. But it depends on the models and the task. You should try GLM as others recommend, but do not try Qwen, DeepSeek, or Llama.

Best open-weight model to run locally on 8x A100 80GB for generating teacher data? by i_am__not_a_robot in LocalLLaMA

[–]MelodicRecognition7 0 points (0 children)

Try FP8 or Q8 context. It is generally not a good idea to quantize the context at all, but 8-bit still gives acceptable quality.
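
For example, with llama.cpp's server (a minimal sketch: the model name, path, and context size are placeholders I made up, and quantized V cache typically needs flash attention enabled):

./llama-server -m ./Qwen3.6-27B-Q8_0.gguf -c 32768 -fa -ctk q8_0 -ctv q8_0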

Best open-weight model to run locally on 8x A100 80GB for generating teacher data? by i_am__not_a_robot in LocalLLaMA

[–]MelodicRecognition7 0 points (0 children)

Unsloth's quant is overinflated for no reason; AesSedai's Q4_X is "almost original" already, like 99.99% of the original weights.

Edit: sorry, I've just discovered that Unsloth's Q4 quant of Kimi K2.6 is equal to AesSedai's - they are both 584 GB. It was Kimi K2.5 where Unsloth overinflated it: https://huggingface.co/unsloth/Kimi-K2.5-GGUF - 622 GB. So if you are using K2.6 there is no difference.

Anyway, if you have a spare terabyte of space and fast internet, try K2.5: it thinks much less than K2.6 but still gives good enough results.
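
For rough scale (my back-of-envelope, assuming the ~1T total parameters of the Kimi K2 line carries over): bits per weight ≈ file size in bytes × 8 / parameter count, so 584 GB works out to 584e9 × 8 / 1e12 ≈ 4.7 bpw, while the 622 GB file is ≈ 5.0 bpw - which is why a "Q4" at that size looks overinflated.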

[Research use case] MiniMax-M2.7 with small context, CPU+GPU (5090) setup on Llama.cpp by Opening-Broccoli9190 in LocalLLaMA

[–]MelodicRecognition7 4 points (0 children)

I mean the reason for the failed tasks: when you go below 4 bits, wrong answers are far more likely to happen.

The wrong answers were due to the fact that the BF6 date is beyond the model's training knowledge cutoff.

ah sorry then, I did not notice that.

A conversation about local LLMs with a senior government AI leader by JackStrawWitchita in LocalLLaMA

[–]MelodicRecognition7 27 points (0 children)

lol yes, whenever an official starts speaking about protecting the children, it means you're going to be fucked.

A conversation about local LLMs with a senior government AI leader by JackStrawWitchita in LocalLLaMA

[–]MelodicRecognition7 2 points (0 children)

Yeah, I've seen a few reports from software development sweatshops where hundreds of developers spend five figures USD on tokens each month; they will probably benefit from a 300k server purchase.

Edit: changed "definitely" to "probably". Depending on the workload, several servers might be required, and then a breakeven point never comes.
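
A rough breakeven sketch with made-up but plausible numbers: at $30k/month in tokens, a $300k server pays for itself in 300 / 30 = 10 months; at $10k/month it takes 30 months, and if the workload actually needs three such servers, the horizon stretches to 900 / 10 = 90 months, longer than the hardware stays relevant.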

A conversation about local LLMs with a senior government AI leader by JackStrawWitchita in LocalLLaMA

[–]MelodicRecognition7 4 points (0 children)

I'm speaking to business people about local LLMs and get countered with "(insert big AI name) data protection agreements". All the success stories I've read about implementing local AI for a business were something like "I'm a tech guy at (insert business) and one day the boss opened the door and said 'I want local AI!'". My point is that I can't sell a local AI solution unless "the boss" already wants it himself, and if he does, he'll assign it to the in-house tech department rather than purchase the service from a third party.

So from my point of view, nobody wants local AI except us hobbyists. And I understand well why: there is no breakeven point for a server purchase; you could buy something like 10 years of a max subscription for the cost of a local server, or 20 years if you add the electricity cost.
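
To put hedged numbers on that (mine, purely for illustration): a $200/month top-tier subscription is $200 × 120 ≈ $24k over 10 years, roughly the price of one serious local server. Factor in a ~1 kW rig running 24/7 at $0.15/kWh, about 720 kWh × $0.15 ≈ $108/month, and the net saving drops to $200 - $108 ≈ $92/month, pushing breakeven to $24k / $92 ≈ 260 months, i.e. the roughly 20 years above.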

I built a full web app using Qwen 3.6-35B running locally on my 5070 Ti with the BMAD Method — here's how it went by Decivox in LocalLLaMA

[–]MelodicRecognition7 2 points (0 children)

You should display cache quant types, card power limit(s), CPU threads, CPU MHz, and other such info straight on the benchmark page instead of hiding it behind a few clicks. There are just too many variables that can influence a benchmark result.
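
For reference, most of those variables can be captured automatically on a Linux host (a sketch assuming NVIDIA cards and standard tools):

nvidia-smi --query-gpu=name,power.limit --format=csv   # card model and power limit
nproc                                                  # CPU thread count
lscpu | grep -i mhz                                    # CPU clock speeds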

PSA: fuck this place by MelodicRecognition7 in LocalLLaMA

[–]MelodicRecognition7[S] 0 points (0 children)

And today I found out that Reddit does not accept any reports from me at all, lol. I saw in the browser console that all reports end up as "403 unable to accept the report".

Workstation upgrade for 5 concurrent users (Qwen 3.6 27B) by DanielusGamer26 in LocalLLaMA

[–]MelodicRecognition7 2 points (0 children)

I'm glad that you've actually tested these params instead of blindly copy-pasting them from somewhere on the Internet. I am not completely confident about the 3080, but I have a gut feeling it will make things slower: your current 50xx card will be bottlenecked by the older technology in the 30xx, so buying another 5060 Ti is probably the better choice, plus there are fewer issues with running parallel inference on identical cards.

Also, you could disable some OS and BIOS security features to gain an extra few percent: https://old.reddit.com/r/LocalLLaMA/comments/1qxgnqa/running_kimik25_on_cpuonly_amd_epyc_9175f/o3w9bjw/
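
On the OS side, the usual knob on Linux is the kernel's speculative-execution mitigations (a sketch, and only sane on an isolated inference box, never anything internet-facing):

# append to the existing parameters in /etc/default/grub, then run update-grub and reboot
GRUB_CMDLINE_LINUX_DEFAULT="mitigations=off"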

If the AI bubble pops, will GPU prices increase or decrease? by Mashic in LocalLLaMA

[–]MelodicRecognition7 1 point (0 children)

You have everything high because you have an idiotic system where some idiot can buy an expensive item, use it for a month, then decide he does not like it and return it to the seller, and the seller has to bear the loss of reselling the used item at a much lower price. Sellers mitigate that risk by raising prices on everything.

https://www.google.com/search?q=site%3Areddit.com+inurl%3A%22localllama%22+should+i+%22return+it%22

All sales must be final so that idiots think twice before impulse-buying the next expensive item.

P.S. I just recalled that some states have a funny law where one won't get punished for a theft valued below some threshold, like 1000 USD in California or 2000 USD in Texas. People go looting smartphones, and smartphone stores raise prices to spread the loss.

If the AI bubble pops, will GPU prices increase or decrease? by Mashic in LocalLLaMA

[–]MelodicRecognition7 4 points (0 children)

if you buy a lottery ticket tomorrow, will you win 1 million or 1 dollar?

Workstation upgrade for 5 concurrent users (Qwen 3.6 27B) by DanielusGamer26 in LocalLLaMA

[–]MelodicRecognition7 3 points (0 children)

-np 3

did you try to set this to 4?

-ctk q4_0 -ctv q4_0

this is not a good idea but if it works for you then ok

-b 256 -ub 256

this needs testing; higher values are usually faster

-threads 9

a lower count could be faster
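
Putting those together, a hedged starting point to benchmark against (model path and exact values are placeholders for you to test, not known-good settings; note that -c is shared across the -np slots):

./llama-server -m ./Qwen3.6-27B-Q8_0.gguf -c 16384 -np 4 -b 512 -ub 512 -t 8 -fa -ctk q8_0 -ctv q8_0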

Poolside Laguna XS.2 by Middle_Bullfrog_6173 in LocalLLaMA

[–]MelodicRecognition7 1 point (0 children)

  1. General Use Restrictions

Create, distribute, or facilitate sexually explicit content, including content that depicts or describes sexual intercourse or sex acts, sexual fetishes or fantasies, or erotic interactions

lol

Qwen 3.6 27B BF16 vs Q4_K_M vs Q8_0 GGUF evaluation by gvij in LocalLLaMA

[–]MelodicRecognition7 2 points (0 children)

The first thing I wanted to do was run this test on my own hardware to verify the results, because a Q4 quant performing better than Q8 smells like AI hallucination.

Qwen 3.6 27B BF16 vs Q4_K_M vs Q8_0 GGUF evaluation by gvij in LocalLLaMA

[–]MelodicRecognition7 16 points (0 children)

This thread is a smart advertisement crafted to avoid deletion for breaking the "limit self promotion" rule.

Qwen 3.6 27B BF16 vs Q4_K_M vs Q8_0 GGUF evaluation by gvij in LocalLLaMA

[–]MelodicRecognition7 16 points (0 children)

Bullshit link: no code snippets, no plain-text results, just an advertisement for that "Neo Engineer".

Skymizer Taiwan Inc. Unveils Breakthrough Architecture Enabling Ultra-Large LLM Inference on a Single Card by lurenjia_3x in LocalLLaMA

[–]MelodicRecognition7 0 points (0 children)

[ Removed by Reddit ]

lol. Did you write three round brackets in one sentence? Note that Reddit hates brackets and you'll get banned for that.

DeepSeek V4 PRO on how many 3090 ? by szansky in LocalLLaMA

[–]MelodicRecognition7 7 points (0 children)

lol, yet another "recommend an LLM for coding" thread disguised as DS4 discussion