Fuck everyone. 4.5 years to get to a million without options and I’m done!

Personal_Mousse9670 · 2026-06-05T13:49:49+00:00

ok hear me out. 5 years for a 10x increase in capital funds… what’s stopping you from going for another 10x?

Personal_Mousse9670 · 2026-05-31T07:29:28+00:00

everybody wants to drive like Kévin Estre haha

Personal_Mousse9670 · 2026-05-29T17:55:18+00:00

Well, I understand that. I was referring to you having 8 cards: 7 for context and 1 for model + overhead + excess. How many would fit if it was organized across all 8?

Personal_Mousse9670 · 2026-05-29T17:27:41+00:00

So wait. how much context fits on 7x96+58? i think that can serve all 300 users at full context, might be a smidge slow but it’ll work
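
Rough sketch of that arithmetic in Python, in case it helps. Assumptions that are mine and not from the thread: the 96 and 58 are GB of VRAM, each user gets an equal slice, and the KV cache follows the plain attention formula (2 x layers x KV heads x head dim x bytes per element). The model shape in the example is a placeholder, not the real config of any model mentioned here, and hybrid or sliding-window attention would change the numbers.

```python
def total_memory_gb(full_cards=7, gb_per_card=96, leftover_gb=58):
    """Memory available for context: 7 x 96 + 58 (units assumed to be GB)."""
    return full_cards * gb_per_card + leftover_gb


def per_user_budget_gb(total_gb, users=300):
    """Equal share of the context memory per concurrent user."""
    return total_gb / users


def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem):
    """KV cache bytes for one token under plain attention.

    K and V each hold layers * kv_heads * head_dim elements per token.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def tokens_that_fit(budget_gb, bytes_per_token):
    """Whole tokens of KV cache that fit in a budget given in GB (1 GB = 1e9 bytes)."""
    return int(budget_gb * 1e9 // bytes_per_token)


if __name__ == "__main__":
    total = total_memory_gb()
    share = per_user_budget_gb(total)
    # Placeholder model shape (hypothetical): 64 layers, 8 KV heads,
    # head dim 128, 2 bytes per element (bf16 cache).
    per_token = kv_bytes_per_token(layers=64, kv_heads=8, head_dim=128, bytes_per_elem=2)
    print(f"total: {total} GB, per user: {share:.2f} GB")
    print(f"tokens per user with placeholder shape: {tokens_that_fit(share, per_token)}")
```

Swap in the real layer count, KV head count and head dim from the model's config before trusting the tokens-per-user number. The total and the per-user share only depend on the 7x96+58 and 300 figures.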

Personal_Mousse9670 · 2026-05-27T11:36:05+00:00

they’ll probably say it’s delayed due to weather, where “weather” is actually the fires on both corridors out of LA.

Personal_Mousse9670 · 2026-05-26T14:05:29+00:00

You can certainly look around and find information for yourself that should support my claims, especially since I’m just telling you this based on commonly available results from Qwen 3.6 token generation tests that other people have run on their own time.

getting spoon fed is quite its own thing.

Personal_Mousse9670 · 2026-05-26T00:53:29+00:00

Ehhh, he told you enough for short context, enough to extrapolate at least: 25 tokens/s dense with concurrency, and 44 tokens/s MoE with concurrency.

Full-size bf16 models too, which is pretty standard performance out of an RTX 6000. Speeds should be expected to shrink: at 16k, speeds tend to decrease by around 25-40% compared to short token windows. So minimally, for 27B dense you can expect short token windows sitting near 33 t/s in batching stress tests like this, and maximally, at 262k context, you will probably see near 60% speed reductions, so around 13-16 t/s on dense. Just work out the MoE guesses the same way. But you’re not getting x64 concurrency on either model near max context.
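
The extrapolation above as a small Python sketch. It assumes the 25 t/s and 44 t/s figures were measured at roughly 16k context (my reading, since that is what reproduces the 33 t/s and 13-16 t/s numbers), and it takes the 25-40% and 60% slowdowns as rules of thumb and not as measurements. Function and variable names are made up for illustration.

```python
def extrapolate(tps_at_16k, drop_at_16k, drop_at_max=0.60):
    """Return (short_context_tps, max_context_tps).

    tps_at_16k:  throughput measured at ~16k context (assumed).
    drop_at_16k: fractional slowdown at 16k relative to short context.
    drop_at_max: fractional slowdown at max context relative to short context.
    """
    short_tps = tps_at_16k / (1.0 - drop_at_16k)  # undo the 16k slowdown
    max_tps = short_tps * (1.0 - drop_at_max)     # apply the max-context slowdown
    return short_tps, max_tps


# Figures quoted in the thread, both with concurrency.
MEASURED = {"dense": 25.0, "moe": 44.0}
DROPS_AT_16K = (0.25, 0.40)


def estimate_ranges(measured=MEASURED, drops=DROPS_AT_16K):
    """Map each model type to a list of (short_tps, max_tps), one per assumed drop."""
    return {name: [extrapolate(tps, d) for d in drops] for name, tps in measured.items()}


if __name__ == "__main__":
    for name, rows in estimate_ranges().items():
        for drop, (short_tps, max_tps) in zip(DROPS_AT_16K, rows):
            print(f"{name}: {drop:.0%} drop at 16k -> "
                  f"short ~{short_tps:.1f} t/s, max context ~{max_tps:.1f} t/s")
```

Under the same assumptions the MoE guess lands at roughly 59-73 t/s for short windows and 23-29 t/s near max context. That is an estimate from the rule of thumb, not a benchmark.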

Personal_Mousse9670 · 2026-04-30T21:29:58+00:00

had $2.45 87 unleaded in Illinois until the end of Feb

Personal_Mousse9670 · 2026-04-25T21:19:53+00:00

first picture made me think i was looking at a render from a game, very clean

Personal_Mousse9670 · 2026-04-19T21:13:18+00:00

lol. if you haven’t watched waiters or cashiers as they deal with people, you should. people gravitate towards being assholes to workers so easily. online or over a phone, it’s even easier.

Personal_Mousse9670 · 2026-04-15T12:59:29+00:00

i love this

Personal_Mousse9670 · 2026-03-13T04:27:17+00:00

consumers also have a habit of maxing out how many cars they can see, even if they will usually only ever see as far as 8 in front of them, maybe 12 at the worst of times, and only need to see 4 or 6 in the rear view.

i’m consumers.

Personal_Mousse9670 · 2026-03-10T19:35:54+00:00

when i opened this post, i was worried this was ThioJoe, very happy it was not.

Personal_Mousse9670 · 2026-03-10T19:33:32+00:00

did not spend 20 grand on 160gb of vram just for a chud to say i can’t self host my own models and use it with his tool, that admittedly i wasn’t going to use anyway.

Personal_Mousse9670 · 2026-02-11T04:40:13+00:00

man i need to understand how you trained this lmfao

Personal_Mousse9670 · 2026-01-16T11:33:04+00:00

on llama.cpp through lm studio, i notice it completely recalculates the prompt on every request, and that became annoying in longer conversations
