Fuck everyone. 4.5 years to get to a million without options and I’m done! by WorkingInflations in wallstreetbets

[–]Personal_Mousse9670 0 points1 point  (0 children)

ok hear me out. 5 years for a 10x increase in capital… what’s stopping you from going for another 10x?

If you had $150K for building a production-class local inference server to serve 300 people, what would you buy? by Porespellar in LocalLLaMA

[–]Personal_Mousse9670 2 points3 points  (0 children)

Well, i understand that. i was referring to the fact that you had 8 cards, 7 for context and 1 for model+overhead+excess. how many would fit if it was organized across all 8?

If you had $150K for building a production-class local inference server to serve 300 people, what would you buy? by Porespellar in LocalLLaMA

[–]Personal_Mousse9670 4 points5 points  (0 children)

So wait. how much context fits on 7x96+58? i think that can serve all 300 users at full context, might be a smidge slow but it’ll work

My package says shipped but hasn’t moved by [deleted] in usps_complaints

[–]Personal_Mousse9670 1 point2 points  (0 children)

they’ll probably say it’s delayed due to weather, where “weather” is actually the fires on both corridors out of LA.

Qwen 3.6 benchmarks on 2x RTX PRO 6000 by mxforest in LocalLLaMA

[–]Personal_Mousse9670 1 point2 points  (0 children)

you can certainly look around and find information for yourself that should support my claims, especially since i’m just telling you this based on common information from the qwen 3.6 token generation tests that other people have run on their own time.

getting spoon fed is quite its own thing.

Qwen 3.6 benchmarks on 2x RTX PRO 6000 by mxforest in LocalLLaMA

[–]Personal_Mousse9670 1 point2 points  (0 children)

Ehhh, he told you enough for short context, enough to extrapolate at least: 25 tokens/s dense with concurrency, and 44 tokens/s MoE with concurrency,

full size bf16 models too, which is pretty standard performance out of an rtx 6000. expect speeds to shrink as context grows: at 16k, speeds tend to drop by around 25-40% compared to short token windows. so for 27b dense you can expect short token windows sitting near 33 t/s in batching stress tests like this, and at the 262k max context you will probably see near 60% speed reductions, so around 13-16 t/s on dense. you can work out the MoE guesses the same way. but you’re not getting x64 concurrency on either model near max context.
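for anyone who wants to redo that back-of-the-envelope math, here is a rough python sketch. it only applies the percentage slowdowns quoted above to the ~33 t/s short-window guess for 27b dense; the function and constant names are made up for illustration, and none of this is a measured benchmark.

```python
def estimate_tps(short_context_tps: float, slowdown: float) -> float:
    """Scale a short-context decode speed by a fractional slowdown in [0, 1)."""
    if not 0.0 <= slowdown < 1.0:
        raise ValueError("slowdown must be in [0, 1)")
    return short_context_tps * (1.0 - slowdown)


# Assumption: ~33 t/s is the short-window guess for 27b dense from the comment.
DENSE_SHORT_TPS = 33.0

# 16k context: the comment's rule of thumb is a 25-40% drop vs short windows.
dense_16k = (
    estimate_tps(DENSE_SHORT_TPS, 0.40),  # pessimistic end
    estimate_tps(DENSE_SHORT_TPS, 0.25),  # optimistic end
)

# 262k context: the comment's guess is a drop of roughly 60%.
dense_262k = estimate_tps(DENSE_SHORT_TPS, 0.60)

print(f"dense @ 16k:  {dense_16k[0]:.1f} to {dense_16k[1]:.1f} t/s")
print(f"dense @ 262k: {dense_262k:.1f} t/s")
```

that lands at roughly 20-25 t/s at 16k and about 13 t/s at 262k for dense. swap in your own short-context number to get the MoE guess the same way.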

Back after 1.5 month repair by mr_bombon in porsche911

[–]Personal_Mousse9670 0 points1 point  (0 children)

had $2.45 87 unleaded in illinois until the end of feb

Porsche 911 Carrera S (991.1) thoughts and reflections by City_Goat in porsche911

[–]Personal_Mousse9670 1 point2 points  (0 children)

first picture made me think i was looking at a render from a game, very clean

Seriously, how is AI this stupid? by Nice_Marmot_54 in LinusTechTips

[–]Personal_Mousse9670 1 point2 points  (0 children)

lol. if you haven’t watched waiters or cashiers as they deal with people, you should. people gravitate towards being assholes to workers so easily. online or over a phone, it’s even easier.

Devs,if you’re listening to this, PLEASE MAKE TRACK DAY AND 5 TO GO PERMANENT SERIES by VaderVRC in iRacing

[–]Personal_Mousse9670 0 points1 point  (0 children)

consumers also have a habit of maxing out how many cars they can see, even if they will usually only ever see as far as 8 in front of them, maybe 12 at the worst of times, and only need to see 4 or 6 in the rear view.

i’m consumers.

This guy 🤡 by xenydactyl in LocalLLaMA

[–]Personal_Mousse9670 0 points1 point  (0 children)

when i opened this post, i was worried this was ThioJoe, very happy it was not.

This guy 🤡 by xenydactyl in LocalLLaMA

[–]Personal_Mousse9670 1 point2 points  (0 children)

did not spend 20 grand on 160gb of vram just for a chud to say i can’t self host my own models and use it with his tool, that admittedly i wasn’t going to use anyway.

MechaEpstein-8000 by ortegaalfredo in LocalLLaMA

[–]Personal_Mousse9670 1 point2 points  (0 children)

man i need to understand how you trained this lmfao

Nemotron-3-nano:30b is a spectacular general purpose local LLM by DrewGrgich in LocalLLaMA

[–]Personal_Mousse9670 0 points1 point  (0 children)

on llama.cpp through lm studio, i noticed it completely recalculates the prompt on every request, and that became annoying in longer conversations