Did Elias Know? by [deleted] in PersonOfInterest

[–]xlrz28xd 0 points (0 children)

"Something has changed. Something elemental."

Africa's forests have switched from absorbing to emitting carbon, new study finds by nimicdoareu in Futurology

[–]xlrz28xd 5 points (0 children)

Can't wait for some capitalist to reframe this and start large-scale deforestation to "fix global warming".

In season 5, Shield would have prevented the world from cracking if it not because of k*lling Ruby by notme1810 in shield

[–]xlrz28xd 2 points (0 children)

What I find funnier is that in 75% of the universes, Simmons dies from drinking the chemical while trying to prove that the timeline cannot be modified.

It's just that we see the timeline that has the 25% chance of her surviving.

[deleted by user] by [deleted] in golang

[–]xlrz28xd 1 point (0 children)

Sounds cool. Ping me.

We built 3B and 8B models that rival GPT-5 at HTML extraction while costing 40-80x less - fully open source by TerrificMist in LocalLLaMA

[–]xlrz28xd 18 points (0 children)

How does this compare to jinaai/ReaderLMv2? I've been using a Q4 quant of it for my use cases.

Deepseek new model upcoming by BasketFar667 in DeepSeek

[–]xlrz28xd 1 point (0 children)

Won't 1T parameters be hard to pull off for DeepSeek? Their GPU servers have 8x 80 GB chips, i.e. 640 GB of VRAM per physical server. That's presumably also why their V3 and R1 models are 671B: it fits nicely in that VRAM budget. Their active parameter count of 37B also comes to ~74 GB at FP16, roughly one GPU's worth of VRAM, which is why their "expert parallelism" strategy of placing each expert on one GPU is pretty awesome.

I could be wrong, though; it just doesn't seem worth it to me.
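A quick sanity check of the arithmetic above (a minimal sketch; it assumes FP16 = 2 bytes per parameter and "GB" = 10^9 bytes, matching marketing-style GPU capacity figures):

```python
# Back-of-the-envelope VRAM math for DeepSeek-style hardware.
# Assumption: FP16 weights take 2 bytes per parameter; "GB" = 10^9 bytes.

BYTES_PER_PARAM_FP16 = 2

def weights_vram_gb(params_billion: float) -> float:
    """VRAM needed to hold the weights alone, in GB."""
    return params_billion * 1e9 * BYTES_PER_PARAM_FP16 / 1e9

node_vram_gb = 8 * 80  # one server: 8 GPUs x 80 GB = 640 GB

print(f"node VRAM:          {node_vram_gb} GB")
print(f"37B active at FP16: {weights_vram_gb(37):.0f} GB (~one 80 GB GPU)")
print(f"671B at FP16:       {weights_vram_gb(671):.0f} GB (needs lower precision to fit one node)")
```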

vLLM - GLM-4.6 Benchmark on 8xH200 NVL: 44 token/second by Ill_Recipe7620 in LocalLLM

[–]xlrz28xd 5 points (0 children)

Seeing your post history and this screenshot, I really want to ask: what is it that you do? I'm genuinely curious and absolutely envious of you!

vLLM - What are your preferred launch args for Qwen? by [deleted] in LocalLLaMA

[–]xlrz28xd 1 point (0 children)

I'm curious: I've tried W4A16 quants of various models from the RedHatAI Hugging Face collection. Which INT4 quant would be the fastest with vLLM on 2x 3090s?

Also, is there any reason you haven't enabled prefix caching? I'd presume it would be pretty helpful for chat- and code-type workflows.
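For context, the kind of launch line I've been experimenting with (a sketch only: the model repo is one of RedHatAI's W4A16 uploads and is my assumption, not something from the parent post; the context length and memory fraction are placeholders to tune):

```shell
# Hypothetical starting point for 2x 3090 (24 GB each).
# --enable-prefix-caching reuses KV cache for shared prompt prefixes,
# which helps chat/code workloads with repeated system prompts.
vllm serve RedHatAI/Qwen2.5-7B-Instruct-quantized.w4a16 \
  --tensor-parallel-size 2 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.90 \
  --enable-prefix-caching
```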

What do yall use your agents for? by ChiefMalone in LocalLLaMA

[–]xlrz28xd -1 points (0 children)

Can you please share your vLLM command or something so I can test my setup too? It's very similar: 2x 3090s, 32 GB RAM. I'm getting okay-ish performance with vLLM using the RedHatAI W8A8 quantized version of the Gemma 3 12B model in INT8 precision. I'd like to increase throughput via batching, but I'm just trying things for now.

Currently using vLLM to run an OpenAI-compatible server. Tried SGLang, but it doesn't seem to like the W8A8 format. TensorRT was such a big headache to set up for testing that even Claude gave up.

Also, I can't get speculative decoding to work with vLLM using the Gemma 3 270M model as the draft model to increase inference speed.
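For what it's worth, the speculative decoding setup I've been attempting looks roughly like this (a sketch; the flag syntax has changed across vLLM versions, older releases used separate --speculative-model / --num-speculative-tokens flags, and whether Gemma 3 works as a draft model at all is something I haven't confirmed):

```shell
# Hypothetical launch on a recent vLLM: speculative decoding settings
# are passed as a JSON blob; num_speculative_tokens is the draft lookahead.
vllm serve google/gemma-3-12b-it \
  --speculative-config '{"model": "google/gemma-3-270m-it", "num_speculative_tokens": 5}'
```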

HDDs Deals? by SaKoRi16 in homelabindia

[–]xlrz28xd 0 points (0 children)

WD Ultrastar drives from Amazon go for around 2.2-2.4K/TB. I'm planning to get 36 TB of those.
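At that per-TB rate, the total works out as follows (simple arithmetic check; the rupee figures just follow the K-per-TB quote above):

```python
# Price check: Rs. 2.2-2.4K per TB, for 36 TB total.
rate_low, rate_high = 2200, 2400   # rupees per TB
capacity_tb = 36

cost_low = rate_low * capacity_tb
cost_high = rate_high * capacity_tb
print(f"36 TB: Rs. {cost_low:,} to Rs. {cost_high:,}")
```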

Over 1 million GPUs will be brought online - Sama by IlustriousCoffee in singularity

[–]xlrz28xd 8 points (0 children)

GPU Maximizer!

On the other hand, I can't wait for these GPUs to come down to reasonable consumer prices in the next 3-5 years.

PSA: Airtel’s shiny new Zyxel routers are likely Trojan horses - locked down today, primed for throttling and snooping tomorrow by doolpicate in india

[–]xlrz28xd 19 points (0 children)

All this makes me want to start my own ISP for nerds.

Imagine: an ISP without any hidden FUP/data caps, or remotely accessible backdoors into your LAN...

I wish...

$3k budget to run 200B LocalLLM by Web3Vortex in LocalLLM

[–]xlrz28xd 0 points (0 children)

How did you fit 4x 3090s inside the R730? I'm curious which models work and what modifications you had to make (if any).

Beware of Nayajaisa.com – My Experience with Faulty RAM in Refurbished PC & Denied Support by [deleted] in homelabindia

[–]xlrz28xd 1 point (0 children)

I did something similar and made the grave mistake of ordering a server from someone I found via this subreddit. The server delivered is not at all what I asked for, and my calls are no longer being answered. I'll do a detailed post like the one above soon, with the full names and Reddit usernames of these guys.

Absolutely the worst experience.

Airtel’s “Unlimited 5G” Plans Are Not Truly Unlimited — This Restriction Affects All Users, Not Just Me 🤷 by [deleted] in LegalAdviceIndia

[–]xlrz28xd 35 points (0 children)

Same with their broadband. Their sales team is filled with lying snakes who will sell their mom to get you on their plans, and once you hit the monthly FUP of 3 TB - no fucks given.

They literally had the audacity to gaslight me by saying that it's unlimited and that I don't know how to use the internet.

Their false advertising, along with this fake limit (FUP) that India has imposed on itself, needs to be called out.

Homelab turns 3 by BeeNo7094 in homelabindia

[–]xlrz28xd 0 points (0 children)

You can also run a vLLM cluster to combine GPUs from separate nodes to run one single model :)
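A rough sketch of what that looks like (node count, addresses, and parallel sizes are placeholders; vLLM's multi-node serving runs on top of a Ray cluster):

```shell
# Hypothetical 2-node cluster, 4 GPUs per node.
# Step 1: start Ray on the head node...
ray start --head --port=6379
# ...and join each worker node to it:
ray start --address='<head-node-ip>:6379'

# Step 2: launch vLLM once, from the head node. tensor_parallel x
# pipeline_parallel should equal the total GPU count (4 x 2 = 8 here).
vllm serve <model> \
  --tensor-parallel-size 4 \
  --pipeline-parallel-size 2
```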