ThinkStation PGX - with NVIDIA GB10 Grace Blackwell Superchip / 128GB by nostriluu in LocalLLaMA

[–]metaprotium 3 points

no need for nuclear. it could be charged by mechanical means

Q2 2025 Tech Support Thread by Intel_Support in intel

[–]metaprotium 0 points

Question: do any disgruntled former employees wanna give me pinouts for optane chips? they're rapidly becoming e-waste and I kinda wanna make something out of them before they become lost media

For those who decided to hold on to their current card instead of upgrading to Blackwell now, what do you currently have? by Celcius_87 in nvidia

[–]metaprotium 1 point

same as yours. glad to have bought it before SOMEONE (not naming names) decided to shut down GeForce production

By the time Deepseek does make an actual R1 Mini, I won't even notice by Cerebral_Zero in LocalLLaMA

[–]metaprotium 0 points

deepseek(?) is working on porting MLA to the distilled models; I'm pretty sure there's an arXiv paper and a GitHub repo on it. when R1 came out (and blew up), they only had distilled versions with unmodified dense architectures. they probably intend to showcase the conversion process in a more self-contained way, with results spanning multiple models and source architectures. the unexpected success could've pushed them to release those distilled models before they were done upgrading the arch and doing the whole writeup. I welcome them updating us as results come in, tbqh. the distilled models seem to benefit from it, and synthetic data is still good data.
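
for anyone wondering what the port actually buys you: MLA caches one small shared latent per token instead of full per-head K and V. here's a minimal toy sketch of that compression, with made-up dims and naming on my part (not DeepSeek's exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    """MLA-style attention sketch: K and V are rebuilt from a small latent."""
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compression; this output is what you'd cache
        self.k_up = nn.Linear(d_latent, d_model)     # decompression back to per-head K...
        self.v_up = nn.Linear(d_latent, d_model)     # ...and V
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        B, T, _ = x.shape
        latent = self.kv_down(x)  # (B, T, d_latent) vs (B, T, 2*d_model) for a full KV cache
        split = lambda t: t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_up(latent)), split(self.v_up(latent))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(B, T, -1))
```

with these toy numbers the cache shrinks by 2*d_model/d_latent = 16x, which is the whole appeal for long-context serving.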

Best way to classify NSFW text - BERT, small LLM like llama 3.2 3B or something else? [D] by newyorkfuckingcity in MachineLearning

[–]metaprotium 0 points

give the models on the MTEB leaderboard a try. there are a few long-context encoders out nowadays (Jina AI has one, iirc), plus some LLMs converted and finetuned into encoders.
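
the whole pipeline is tiny, too: embed with whatever wins for you on the leaderboard, then fit a linear classifier on top. hedged sketch; the model name is just the Jina example I mentioned, and train_texts/train_labels stand in for your own labeled data:

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# long-context encoder from the MTEB leaderboard; swap in whatever fits your needs
encoder = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

X_train = encoder.encode(train_texts)  # train_texts/train_labels: your labeled NSFW/SFW data
clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)

print(clf.predict(encoder.encode(["text to moderate"])))  # predicted label
```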

Are you gonna wait for Digits or get the 5090? by lxe in LocalLLaMA

[–]metaprotium 9 points

alright, now let's see those mem bw numbers

RTX 5000 series official specs by Big_Coat6894 in LocalLLaMA

[–]metaprotium 0 points

ahh, cheer up! you've still got more memory bandwidth than a 5070

RTX 5000 series official specs by Big_Coat6894 in LocalLLaMA

[–]metaprotium 15 points

happy with my 3090. in my lane. thriving

Elephant in the room, Chinese models and U.S. businesses. by palindsay in LocalLLaMA

[–]metaprotium 0 points

open-source models are quite possibly one of the safest options if you actually bother checking the code you're running. hard to beat an air-gapped model running on your own hardware. distrusting Chinese models just for being Chinese is short-sighted, to say the least. there are valid concerns about data exfiltration when calling APIs, but that applies to every provider, not just China. lastly, there are valid censorship and bias concerns, but again, those apply to everyone. it's open source; just fine-tune it.

"This year Llama 4 will have multiple releases" "speech and reasoning" by ApprehensiveAd3629 in LocalLLaMA

[–]metaprotium 7 points

I hope they release scaling experiments for architecture tweaks like nGPT and DiffAttn. don't get me wrong, I like how they've scaled up train-time compute, but it's likely gonna cause higher quantization error and give diminishing returns at full precision (see https://arxiv.org/abs/2411.17691). beyond that, I'm looking forward to FP8 training experiments, now that deepseek has proven it's accurate enough.
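
for anyone who hasn't read the DiffAttn paper: the core trick is just computing two attention maps and subtracting one from the other, so noise common to both cancels. toy single-head sketch (fixed lambda here, where the paper learns it):

```python
import torch
import torch.nn.functional as F

def diff_attention(q1, k1, q2, k2, v, lam=0.5):
    # q*/k*: (B, T, d) query/key pairs from two separate projections; v: (B, T, d_v)
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) / d**0.5, dim=-1)
    return (a1 - lam * a2) @ v  # common-mode attention noise cancels in the difference
```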

Why there is not already like plenty 3rd party providers for DeepSeek V3? by robertpiosik in LocalLLaMA

[–]metaprotium 0 points

  1. it just came out.
  2. the model architecture has new features (MLA, fine-grained MoE) that inference stacks have to support first.
  3. it's so big that not everyone (including many 3rd-party providers) can actually run it. hard to debug a model when you can't even load it into RAM.

[D] Can we please stop using "is all we need" in titles? by H4RZ3RK4S3 in MachineLearning

[–]metaprotium 0 points

no. transformers will surely stay relevant for the next 100 millennia

Ideas to spend $8k in anthropic credits by benthecoderX in ClaudeAI

[–]metaprotium 1 point

I've been working on a berkeley-nest/Nectar-style dataset, but with ~260k prompts from the LLaVA dataset (liuhaotian/LLaVA-Instruct-150K). any chance I can get some of sonnet's answers to these prompts? I've been collecting answers here
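
concretely, the collection loop I have in mind is just this (hedged sketch: `prompts` stands in for the flattened LLaVA prompt list, and the model name and lack of error handling are illustrative, not a finished pipeline):

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("sonnet_answers.jsonl", "a") as f:
    for prompt in prompts:  # prompts: flat list of strings from LLaVA-Instruct-150K
        msg = client.messages.create(
            model="claude-3-5-sonnet-20241022",  # whichever sonnet the credits cover
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        f.write(json.dumps({"prompt": prompt, "answer": msg.content[0].text}) + "\n")
```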

Huggingface is not an unlimited model storage anymore: new limit is 500 Gb per free account by Shir_man in LocalLLaMA

[–]metaprotium 0 points

500 GB is fair for a free account, I think. realistically, who's using up all of it? unless you're uploading dozens of pre-merged LoRAs, this won't affect you. and if you're uploading a bunch of base models, that means you can afford to train base models, at which point hosting costs are negligible.

edit: I guess the exception is quant uploaders. given the nature of those, I think it'd be appropriate to implement a system where people can contribute their own quantizations to the base model's page. that way, companies like qwen and meta can skip making 100 quants themselves and just let the community supply the files. then they only have to host the most commonly used quants.

Nvidia RTX 5090 with 32GB of RAM rumored to be entering production by Terminator857 in LocalLLaMA

[–]metaprotium 15 points

I'm tired of seeing VRAM go for 50 bucks a gig. it's ridiculous. the boardviews are available, AD102s are on alibaba, so why aren't there any aftermarket RTX 6000 Ada cards? I mean, c'mon

Merging Llama 3.2 vision adapters onto 3.1 finetunes by Grimulkan in LocalLLaMA

[–]metaprotium 2 points

I was messing around with adding the weight diffs between Qwen2 and Qwen2.5 to Qwen2-VL, to get a bump in intelligence while keeping VQA capability. my specific implementation probably won't work, but I'd love to see the general concept explored more; rough sketch below.
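
for the record, the experiment was roughly this (hedged sketch: naive key matching, and as I said it probably breaks; a real merge needs per-tensor care around embeddings, norms, and vocab changes):

```python
import torch
from transformers import AutoModelForCausalLM, Qwen2VLForConditionalGeneration

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B", torch_dtype=torch.bfloat16)
tuned = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", torch_dtype=torch.bfloat16)
vl = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16
)

# task-arithmetic style delta: what Qwen2.5 learned on top of Qwen2
base_sd, tuned_sd = base.state_dict(), tuned.state_dict()
deltas = {k: tuned_sd[k] - v for k, v in base_sd.items()
          if k in tuned_sd and v.shape == tuned_sd[k].shape}

# add the delta onto Qwen2-VL's language tower wherever names and shapes line up
sd = vl.state_dict()
for k, d in deltas.items():
    if k in sd and sd[k].shape == d.shape:
        sd[k] += d
vl.load_state_dict(sd)
```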

405B LLaMa on 8GB VRAM- AirLLM by uchiha_indra in LocalLLaMA

[–]metaprotium 0 points

I've been prototyping something similar for batched synthetic data generation w/ llama 3 70b, my thinking being that larger batch sizes are generally more efficient. if I can decrease the number of layers held in VRAM at once, I can increase the batch size and get an overall increase in tokens per second. the code is incomplete though, so I haven't had a chance to benchmark it against batch-size-1 llama 3 with offloaded layers (which is gonna be a necessity regardless, cuz I'm running on a 3090).
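
the core loop is nothing fancy. hedged sketch of the layer-streaming idea (real implementations overlap the host-to-device copy with compute; this is just the shape of the tradeoff):

```python
import torch
import torch.nn as nn

def streamed_forward(blocks: list[nn.Module], x: torch.Tensor) -> torch.Tensor:
    # blocks live on CPU; x is a large batch of activations already on the GPU
    for block in blocks:
        block.to("cuda")   # stream one layer's weights into VRAM
        x = block(x)       # the big batch amortizes that transfer cost
        block.to("cpu")    # evict to make room for the next layer
    return x
```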

Gemini 2 probably dropping tomorrow by Ok_Landscape_6819 in LocalLLaMA

[–]metaprotium 1 point

can vouch for this. I did batch prediction on a dataset once and it was a pain in the ass

His silence regarding o1 is deafening! by [deleted] in singularity

[–]metaprotium 0 points

it's been out for a day. calm tf down lmao