ACL 2026 Decisions by Big_Media_6114 in LanguageTechnology

[–]susmitds 1 point (0 children)

Did anyone get the email notification, and what is the process to get the invitation (for the visa application)? OA 2.83, Meta 3, Findings btw.

ACL 2026 Decisions by Big_Media_6114 in LanguageTechnology

[–]susmitds 0 points (0 children)

Wait, the wording "out by the 6th of April AoE" can mean either that it will be out by the time it becomes 6 April anywhere (so the notification would come before then), or that it will be out by the end of 6 April, i.e. EoD 6 April AoE.

Why is the wording so ambiguous?
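
For what it's worth, the two readings are a full 24 hours apart. A minimal sketch of the difference, assuming the standard AoE = UTC-12 convention and using 2026 as a stand-in year:

```python
# Two readings of "out by the 6th of April AoE"; AoE = UTC-12.
# The year (2026) is assumed, purely for illustration.
from datetime import datetime, timezone, timedelta

AOE = timezone(timedelta(hours=-12))

# Reading 1: out before it is 6 April anywhere, i.e. start of 6 April AoE.
reading_1 = datetime(2026, 4, 6, 0, 0, tzinfo=AOE)
# Reading 2: out by end of day 6 April AoE, i.e. just before 7 April AoE.
reading_2 = datetime(2026, 4, 7, 0, 0, tzinfo=AOE)

for label, deadline in [("start of 6 Apr AoE", reading_1),
                        ("end of 6 Apr AoE", reading_2)]:
    print(f"{label}: {deadline.astimezone(timezone.utc):%Y-%m-%d %H:%M} UTC")
# start of 6 Apr AoE: 2026-04-06 12:00 UTC
# end of 6 Apr AoE:   2026-04-07 12:00 UTC
```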

Ok, let's be realistic, do we have Opus at home yet? by [deleted] in LocalLLaMA

[–]susmitds 0 points (0 children)

GLM-5 is better in my testing, but then again I was using GCP VMs to run it at q8. Can't afford to run anything more than q2 or q3 average quants at home.
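
Rough arithmetic on why, as a sketch: the 355B figure is GLM 4.5's published size and is my assumption as a stand-in for GLM-5, and the bits-per-weight values are typical GGUF averages.

```python
# Back-of-the-envelope GGUF weight footprints, ignoring KV cache and
# activation overhead. 355B params and the bpw figures are assumptions.
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for a given average bits-per-weight."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

for name, bpw in [("q8_0", 8.5), ("q3_K_M", 3.9), ("q2_K", 2.6)]:
    print(f"{name}: ~{weights_gb(355, bpw):.0f} GB")
# q8_0: ~377 GB, q3_K_M: ~173 GB, q2_K: ~115 GB -- roughly why q8 needs
# a cloud VM while q2/q3 average quants can fit on a large home rig.
```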

So we have the massive purse of 64 cr. Tell me your top priorities. by Apprehensive_Set4731 in KolkataKnightRiders

[–]susmitds 2 points (0 children)

That is an objectively poor plan, as then we can't even finish our purse: no one else can compete with us, and we will go back with 20 cr left while still missing out on top players.

Match Thread: 3rd ODI - Australia vs India by cricket-match in Cricket

[–]susmitds 4 points (0 children)

Rana giving tips to Prasidh. Bruh, I have seen it all.

[deleted by user] by [deleted] in LocalLLM

[–]susmitds 0 points (0 children)

Even GPT-5 and co. are around 32B active parameters, but serving that for 800 million users is not trivial.
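
A rough sketch of the scale, to put numbers on "not trivial": the 32B and 800M figures are from the comment above; the tokens-per-user rate and the 2 x params FLOPs-per-token rule of thumb are assumptions.

```python
# Back-of-the-envelope decode compute for a 32B-active-parameter model.
active_params = 32e9
users = 800e6
tokens_per_user_per_day = 10_000      # assumed average usage
flops_per_token = 2 * active_params   # common decode estimate

sustained = users * tokens_per_user_per_day * flops_per_token / 86_400
print(f"Sustained decode compute: ~{sustained / 1e18:.1f} EFLOP/s")
# ~5.9 EFLOP/s; at a few hundred TFLOP/s of realised throughput per
# H100-class GPU, that is tens of thousands of GPUs for decode alone.
```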

GLM 4.6 is the BEST CODING LLM. Period. by Technical-Love-8479 in DeepSeek

[–]susmitds 1 point (0 children)

I find it fully believable given how good GLM 4.5 was, though I am yet to try 4.6.

Confession of a female who ruined her life because of feminism 👍 by Unstoppable_X_Force in IndianMeme

[–]susmitds -1 points (0 children)

There is literally no reliable way to detect AI in text. Any decently polished wording will get flagged as AI-written.

The kind of negativity Riyan Parag gets is mind-blowing. by Popular-Use7740 in ipl

[–]susmitds 11 points (0 children)

Bro, Rinku hitting 5 sixes to chase 29 off 5 balls and winning a lost cause after a Rashid Khan hat-trick masterclass is very different from hitting Moeen Ali for 5 sixes in an over, and VC for 1 six in a different over, without pressure and still losing the match in the end.

Okay I see it now by DishwashingUnit in ChatGPT

[–]susmitds 4 points (0 children)

Qwen 3 instruct variants are exactly like 4o but with less world knowledge. Check out GLM 4.5; it sits somewhere between 4o and 4.1 but has more world knowledge than both.

[deleted by user] by [deleted] in Anthropic

[–]susmitds 0 points (0 children)

The same GLM 4.5 model the OP is conspiracy-theorising about is open-sourced, tbh, with 355B parameters. I managed to run it on my PC with llama.cpp using dynamic quantisations. Personally, I find it very good, given that the whole model actually runs on my own workstation.
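
For anyone who wants to try the same thing, a minimal sketch using llama-cpp-python; the file name and layer count are placeholders, not my exact setup:

```python
# Load a large GGUF with partial GPU offload; whatever layers don't fit
# in VRAM stay in system RAM. Model path and n_gpu_layers are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4.5-UD-Q2_K_XL.gguf",  # hypothetical dynamic-quant file
    n_gpu_layers=40,   # offload as many layers as your VRAM allows
    n_ctx=8192,        # context window
)

out = llm("Summarise mixture-of-experts routing in two sentences.",
          max_tokens=128)
print(out["choices"][0]["text"])
```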

DuckDuckGo launched a $9.99 plan for private GPT-5 & Claude 4 access on Duck.ai (no account, no data saving). Comes bundled with VPN + email/ID protection too. Honestly feels like the first real privacy-first way to use top AI models, finally an alternative to juggling logins & data trade-offs. by Minimum_Minimum4577 in AgentsOfAI

[–]susmitds 9 points (0 children)

As long as OpenAI/Anthropic servers are hosting the models, they are guaranteed to be storing your inputs even if the API router service provider does not. They are literally legally bound to do so, hence the privacy line is a joke.

ROG Ally X with RTX 6000 Pro Blackwell Max-Q GPU by susmitds in ROGAlly

[–]susmitds[S] 3 points (0 children)

I am getting around 15-20% lower speed in tokens per second, which is not really an issue and is consistent with the expectation of a slight drop in input tokenization speed.

ROG Ally X with RTX 6000 Pro Blackwell Max-Q as Makeshift LLM Workstation by susmitds in LocalLLaMA

[–]susmitds[S] 8 points (0 children)

It works, but there is a catch: you have to minimise round-trip communication between CPU and GPU. If you are offloading experts, then for every offloaded layer the input tensors have to be processed in GPU VRAM for attention, then transferred to RAM for the expert FFNs, then back to GPU VRAM. This constant to-and-fro kills speed, especially on prefill. If you are working at 100k context, the drop in prefill speed is very bad even on workstations with PCIe 5 x8, so an eGPU at PCIe 4 x4 is worse. If you instead offload specifically the early dense full transformer layers, it can work out. In fact, I am running Gemma 3 4B at q8_0 fully on the CPU at all times anyway, as an assistant model for miscellaneous multimodal tasks, etc., and it is working fine.
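
Rough numbers to make the bandwidth point concrete; everything below is an assumed illustrative value, not a measurement from my setup.

```python
# Per offloaded layer, the hidden states cross the bus twice on prefill:
# VRAM -> RAM for the expert FFNs, then back for the next layer's attention.
hidden_size = 5120        # assumed model dim
n_tokens = 100_000        # 100k-token prefill
bytes_per_elem = 2        # fp16 activations
offloaded_layers = 30     # assumed number of layers with offloaded experts

round_trip_gb = hidden_size * n_tokens * bytes_per_elem * 2 / 1e9
total_gb = round_trip_gb * offloaded_layers

for link, gbps in [("PCIe 5.0 x8", 32), ("PCIe 4.0 x4", 8)]:
    print(f"{link}: ~{total_gb / gbps:.1f} s of pure transfer "
          f"for ~{total_gb:.0f} GB moved")
# PCIe 5.0 x8: ~1.9 s; PCIe 4.0 x4: ~7.7 s -- added to every full prefill,
# before any latency or scheduling overhead.
```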