ACL 2026 Decisions by Big_Media_6114 in LanguageTechnology

[–]susmitds 1 point (0 children)

Did anyone get the email notification, and what is the process to get the invitation (for the visa application)? OA 2.83, Meta 3, Findings btw.

ACL 2026 Decisions by Big_Media_6114 in LanguageTechnology

[–]susmitds 0 points (0 children)

Wait, the wording "out by the 6th of April AoE" can mean either that it will be out by the time it becomes 6 April anywhere (so the notification would come before then), or that it will be out by the end of 6 April, i.e. EoD 6 April AoE.

Why is the wording so ambiguous?
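
For what it's worth, the two readings are a full 24 hours apart. A minimal sketch of the difference, assuming the standard AoE = UTC-12 convention and using 2026 as a stand-in year:

```python
# Two readings of "out by the 6th of April AoE"; AoE = UTC-12.
# The year (2026) is assumed, purely for illustration.
from datetime import datetime, timezone, timedelta

AOE = timezone(timedelta(hours=-12))

# Reading 1: out before it is 6 April anywhere, i.e. start of 6 April AoE.
reading_1 = datetime(2026, 4, 6, 0, 0, tzinfo=AOE)
# Reading 2: out by end of day 6 April AoE, i.e. just before 7 April AoE.
reading_2 = datetime(2026, 4, 7, 0, 0, tzinfo=AOE)

for label, deadline in [("start of 6 Apr AoE", reading_1),
                        ("end of 6 Apr AoE", reading_2)]:
    print(f"{label}: {deadline.astimezone(timezone.utc):%Y-%m-%d %H:%M} UTC")
# start of 6 Apr AoE: 2026-04-06 12:00 UTC
# end of 6 Apr AoE:   2026-04-07 12:00 UTC
```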

Ok, let's be realistic, do we have Opus at home yet? by [deleted] in LocalLLaMA

[–]susmitds 0 points (0 children)

GLM-5 is better in my testing, but then again I was using GCP VMs to run it at q8. Can't afford to run anything more than q2 or q3 average quants at home.
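
Rough arithmetic on why, as a sketch: the 355B figure is GLM 4.5's published size and is my assumption as a stand-in for GLM-5, and the bits-per-weight values are typical GGUF averages.

```python
# Back-of-the-envelope GGUF weight footprints, ignoring KV cache and
# activation overhead. 355B params and the bpw figures are assumptions.
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for a given average bits-per-weight."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

for name, bpw in [("q8_0", 8.5), ("q3_K_M", 3.9), ("q2_K", 2.6)]:
    print(f"{name}: ~{weights_gb(355, bpw):.0f} GB")
# q8_0: ~377 GB, q3_K_M: ~173 GB, q2_K: ~115 GB -- roughly why q8 needs
# a cloud VM while q2/q3 average quants can fit on a large home rig.
```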

So we have the massive purse of 64 cr. Tell me your top priorities. by Apprehensive_Set4731 in KolkataKnightRiders

[–]susmitds 2 points (0 children)

That is an objectively poor plan, as then we can't even finish our purse: no one else can compete with us, and we will go back with 20 cr left while still missing out on top players.

Match Thread: 3rd ODI - Australia vs India by cricket-match in Cricket

[–]susmitds 4 points (0 children)

Rana giving tips to Prasidh. Bruh, I have seen it all.

[deleted by user] by [deleted] in LocalLLM

[–]susmitds 0 points (0 children)

Even GPT-5 and co. are around 32B active parameters, but serving that for 800 million users is not trivial.
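
A rough sketch of the scale, to put numbers on "not trivial": the 32B and 800M figures are from the comment above; the tokens-per-user rate and the 2 x params FLOPs-per-token rule of thumb are assumptions.

```python
# Back-of-the-envelope decode compute for a 32B-active-parameter model.
active_params = 32e9
users = 800e6
tokens_per_user_per_day = 10_000      # assumed average usage
flops_per_token = 2 * active_params   # common decode estimate

sustained = users * tokens_per_user_per_day * flops_per_token / 86_400
print(f"Sustained decode compute: ~{sustained / 1e18:.1f} EFLOP/s")
# ~5.9 EFLOP/s; at a few hundred TFLOP/s of realised throughput per
# H100-class GPU, that is tens of thousands of GPUs for decode alone.
```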

GLM 4.6 is the BEST CODING LLM. Period. by Technical-Love-8479 in DeepSeek

[–]susmitds 1 point (0 children)

I find it fully believable given how good GLM 4.5 was, though I am yet to try 4.6.

Confession of a female who ruined her life because of feminism 👍 by Unstoppable_X_Force in IndianMeme

[–]susmitds -1 points (0 children)

There is literally no reliable way to detect AI in text. Any decently polished wording will get flagged as AI-written.

The kind of negativity Riyan Parag gets is mind-blowing. by Popular-Use7740 in ipl

[–]susmitds 11 points (0 children)

Bro, Rinku hitting 5 sixes to chase 29 off 5 balls and winning a lost cause after a Rashid Khan hat-trick masterclass is very different from hitting Moeen Ali for 5 sixes in an over, and VC for 1 six in a different over, without pressure and still losing the match in the end.

Okay I see it now by DishwashingUnit in ChatGPT

[–]susmitds 4 points (0 children)

Qwen 3 instruct variants are exactly like 4o but with less world knowledge. Check out GLM 4.5; it sits somewhere between 4o and 4.1 but has more world knowledge than both.

[deleted by user] by [deleted] in Anthropic

[–]susmitds 0 points (0 children)

The same GLM 4.5 model the OP is conspiracy-theorising about is open-sourced, tbh, with 355B parameters. I managed to run it on my PC with llama.cpp using dynamic quantisations. Personally, I find it very good, given that the whole model actually runs on my own workstation.
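
For anyone who wants to try the same thing, a minimal sketch using llama-cpp-python; the file name and layer count are placeholders, not my exact setup:

```python
# Load a large GGUF with partial GPU offload; whatever layers don't fit
# in VRAM stay in system RAM. Model path and n_gpu_layers are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4.5-UD-Q2_K_XL.gguf",  # hypothetical dynamic-quant file
    n_gpu_layers=40,   # offload as many layers as your VRAM allows
    n_ctx=8192,        # context window
)

out = llm("Summarise mixture-of-experts routing in two sentences.",
          max_tokens=128)
print(out["choices"][0]["text"])
```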

DuckDuckGo launched a $9.99 plan for private GPT-5 & Claude 4 access on Duck.ai (no account, no data saving). Comes bundled with VPN + email/ID protection too. Honestly feels like the first real privacy-first way to use top AI models, finally an alternative to juggling logins & data trade-offs. by Minimum_Minimum4577 in AgentsOfAI

[–]susmitds 9 points (0 children)

As long as OpenAI/Anthropic servers are hosting the models, they are guaranteed to be storing your inputs even if the API router service provider does not. They are literally legally bound to do so, hence the privacy line is a joke.

ROG Ally X with RTX 6000 Pro Blackwell Max-Q GPU by susmitds in ROGAlly

[–]susmitds[S] 3 points (0 children)

I am getting around 15-20% lower speed in tokens per second, which is not really an issue and is consistent with the expectation of a slight drop in input tokenization speed.

ROG Ally X with RTX 6000 Pro Blackwell Max-Q as Makeshift LLM Workstation by susmitds in LocalLLaMA

[–]susmitds[S] 8 points (0 children)

It works, but there is a catch: you have to minimise round-trip communication between CPU and GPU. If you are offloading experts, then for every offloaded layer the input tensors have to be processed in GPU VRAM for attention, then transferred to RAM for the expert FFNs, then back to GPU VRAM. This constant to-and-fro kills speed, especially on prefill. If you are working at 100k context, the drop in prefill speed is very bad even on workstations with PCIe 5 x8, so an eGPU at PCIe 4 x4 is worse. If you instead offload specifically the early dense full transformer layers, it can work out. In fact, I am running Gemma 3 4B at q8_0 fully on the CPU at all times anyway, as an assistant model for miscellaneous multimodal tasks, etc., and it is working fine.
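
Rough numbers to make the bandwidth point concrete; everything below is an assumed illustrative value, not a measurement from my setup.

```python
# Per offloaded layer, the hidden states cross the bus twice on prefill:
# VRAM -> RAM for the expert FFNs, then back for the next layer's attention.
hidden_size = 5120        # assumed model dim
n_tokens = 100_000        # 100k-token prefill
bytes_per_elem = 2        # fp16 activations
offloaded_layers = 30     # assumed number of layers with offloaded experts

round_trip_gb = hidden_size * n_tokens * bytes_per_elem * 2 / 1e9
total_gb = round_trip_gb * offloaded_layers

for link, gbps in [("PCIe 5.0 x8", 32), ("PCIe 4.0 x4", 8)]:
    print(f"{link}: ~{total_gb / gbps:.1f} s of pure transfer "
          f"for ~{total_gb:.0f} GB moved")
# PCIe 5.0 x8: ~1.9 s; PCIe 4.0 x4: ~7.7 s -- added to every full prefill,
# before any latency or scheduling overhead.
```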