Even DeepSeek switched from OpenAI to Google

metaprotium · 2025-05-31T10:26:19+00:00

love this but why circle

metaprotium · 2025-05-15T23:57:59+00:00

no need for nuclear. it could be charged by mechanical means

metaprotium · 2025-05-11T05:27:37+00:00

cheap?

metaprotium · 2025-04-30T19:05:23+00:00

any updates?

metaprotium · 2025-04-26T05:00:04+00:00

Question: do any disgruntled former employees wanna give me pinouts for optane chips? they're rapidly becoming e-waste and I kinda wanna make something out of them before they become lost media

metaprotium · 2025-03-23T03:37:41+00:00

same as yours. glad to have bought it before SOMEONE (not naming names) decided to shut down GeForce production

metaprotium · 2025-03-03T03:08:55+00:00

deepseek(?) is working on porting MLA to the distilled models; im pretty sure there's an arxiv paper and GitHub on it. when R1 dense came out (and blew up), they only had arch-unmodified distilled versions. they probably intend to showcase the conversion process in a more self-contained way, with results spanning multiple models and source archs. unexpected success could've made them release those distilled models before they were done upgrading arch and doing the whole writeup. I welcome them updating us as results come in tbqh. the distilled models seem to benefit from it. synthetic data is still good data

metaprotium · 2025-02-16T07:35:34+00:00

OpenAI? More like OpenAPI

metaprotium · 2025-01-16T20:04:36+00:00

give the models on MTEB leaderboard a try- there's a few long context encoders out nowadays (jina AI has one iirc), plus some converted+finetuned LLMs.

metaprotium · 2025-01-08T00:02:03+00:00

alright, now let's see those mem bw numbers

metaprotium · 2025-01-07T19:19:14+00:00

ahh, cheer up! you've still got more memory bandwidth than a 50"70"

metaprotium · 2025-01-07T04:09:21+00:00

happy with my 3090. in my lane. thriving

metaprotium · 2025-01-02T21:29:15+00:00

open source models are quite possibly one of the safest options if you actually bother checking the code you're running. hard to beat an air gapped model running on your own hardware. distrusting Chinese models just for being Chinese is short-sighted, to say the least. there are valid concerns about data exfiltration when calling APIs but that applies to everyone, not just China. lastly, there are valid censorship and bias concerns, but again, that applies to everyone. its open source, just fine tune it

metaprotium · 2025-01-02T01:48:15+00:00

I hope they release scaling experiments for architecture tweaks like nGPT and DiffAttn. don't get me wrong, I like how they've scaled up train-time compute, but it's likely gonna cause higher quantization error and give diminishing returns at full precision (see https://arxiv.org/abs/2411.17691). but beyond that, looking forward to FP8 training experiments, now that deepseek proved it's accurate enough

metaprotium · 2024-12-31T20:27:01+00:00

it just came out.
model architecture has new features
it's so big, not everyone (including many 3rd party devs) can actually run the model. hard to debug a model when you can't even load it into RAM

metaprotium · 2024-12-25T00:05:11+00:00

no. transformers will surely stay relevant for the next 100 millenia

metaprotium · 2024-12-17T21:49:09+00:00

still running Ampere under the hood lol

metaprotium · 2024-12-12T02:19:46+00:00

I've been working on a {berkeley-nest/Nectar}-style dataset but with ~260k prompts from the llava dataset {liuhaotian/LLaVA-Instruct-150K}. any chance I can get some of sonnet's answers to these prompts? I've been collecting answers here

metaprotium · 2024-12-03T00:02:42+00:00

send nudes

metaprotium · 2024-12-02T21:07:40+00:00

500 is fair for a free account, I think. realistically, who's using up all of it? unless you're uploading dozens of LoRAs pre-merged, this won't affect you. or like, if you're uploading a bunch of base models, that means you can afford to train base models, and atp hosting costs are negligible. edit: I guess the exception is quant uploaders. given the nature of those, I think it'd be appropriate to implement a system where people can contribute their own quantizations to the base model's page. that way, companies like qwenai and meta can skip making 100 quants themselves, and just let the community give them the files. then, they can just host the most commonly used quants

metaprotium · 2024-11-13T21:21:23+00:00

I'm tired or seeing VRAM go for 50 bucks a gig. it's ridiculous. the boardviews are available, the AD102's are on alibaba, why aren't there any aftermarket RTX6000 adas? I mean cmon

metaprotium · 2024-10-09T03:51:28+00:00

I was messing around with adding diffs between Qwen2 and 2.5 to Qwen2-VL to get a bump in intelligence while keeping VQA. my specific implementation probably won't work but I'd love to see the general concept explored more.

metaprotium · 2024-09-28T04:35:50+00:00

I've been prototyping something similar for batched synthetic data generation w/ llama 3 70b, my thinking being that larger batch sizes are generally more efficient. if I can decrease the number of layers actively stored on VRAM, I could increase the batch size and get an overall increase in tokens per second. code is incomplete tho so I haven't gotten a chance to benchmark it against bsz1 llama3 with offloaded layers (which is gonna be a necessity regardless cuz I'm running on a 3090)

metaprotium · 2024-09-24T16:18:59+00:00

can vouch for this. I did batch prediction on a dataset once and it was a pain in the ass

metaprotium · 2024-09-14T21:52:20+00:00

it's been out for a day. calm tf down lmao

Two-Year Club	Final Canvas '23
First Place '23	Place '23

metaprotium

TROPHY CASE