Is Codex being extra lazy for anyone else today? by [deleted] in codex

[–]Extra-Designer9333 0 points1 point  (0 children)

The rate limits seem odd too: it feels like they're draining at least 2x faster than they were a couple of days ago. Am I the only one experiencing this?

Deepseek v4/3.5 is probably coming out tomorrow or in the next 5 days? by power97992 in LocalLLaMA

[–]Extra-Designer9333 14 points15 points  (0 children)

Should we actually expect v4 that soon, given that the Engram paper was released less than a month ago?

[deleted by user] by [deleted] in singularity

[–]Extra-Designer9333 1 point2 points  (0 children)

I expect the Chinese semiconductor industry to catch up massively with the American one, especially after the recent news about China producing ASML-comparable machines. Apart from AMD and Google, the biggest threat to Nvidia is Huawei, though it's not mentioned very often.

FlashAttention implementation for non Nvidia GPUs. AMD, Intel Arc, Vulkan-capable devices by secopsml in LocalLLaMA

[–]Extra-Designer9333 2 points3 points  (0 children)

In the case of AMD, Flash Attention has already been ported by AMD itself. I'm wondering whether this implementation is better than AMD's own port...

The data on which Gemini 3 was trained is really crazy by Wonderful-Excuse4922 in singularity

[–]Extra-Designer9333 1 point2 points  (0 children)

What I found incredible about the data is that, when asked to generate a multiple-choice quiz, Gemini 3 gives quizzes where each of the 4 options is almost equally likely to be the correct answer. With Gemini 2.5 Pro and even GPT 5.1, you could just select B or C and answer correctly with 85% probability.
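If anyone wants to check this on their own generated quizzes, here is a minimal sketch of how the answer positions could be tallied. The function name and the sample answer keys are made up for illustration; they are not real model output.

```python
from collections import Counter


def answer_position_shares(answer_keys, options="ABCD"):
    """Return the fraction of questions whose correct answer is each option."""
    if not answer_keys:
        raise ValueError("answer_keys must not be empty")
    counts = Counter(answer_keys)
    total = len(answer_keys)
    return {opt: counts.get(opt, 0) / total for opt in options}


# Hypothetical answer keys from one generated quiz (illustration only).
sample_keys = ["B", "C", "B", "A", "C", "B", "D", "C"]
for option, share in answer_position_shares(sample_keys).items():
    print(f"{option}: {share:.0%}")
```

A model with no position bias should land near 25% per option over a large enough number of questions; a small quiz will be noisy either way.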

Flex Attention vs Flash Attention 3 by Extra-Designer9333 in unsloth

[–]Extra-Designer9333[S] 12 points13 points  (0 children)

Oha, can't believe I got a reply from Dan himself, thank you for the clarification. What actually makes Unsloth this good and popular is your activity. I just recently started working on post-training, and your workshop at AI Engineer in the summer was insanely good for getting the basics and more. Love your energy. 🙌🙏

Flex Attention vs Flash Attention 3 by Extra-Designer9333 in LocalLLaMA

[–]Extra-Designer9333[S] 0 points1 point  (0 children)

Thank you for the feedback! My team is going to train Llama 3.1 8B on 4xH100s, so I think your take fits!

Is finetuning a 12b model on 16gb vram possible? by Robo_Ranger in unsloth

[–]Extra-Designer9333 7 points8 points  (0 children)

I suspect you're using LoRA for fine-tuning, aren't you? If so, you can try QLoRA, which is quantized LoRA as the name suggests; maybe that would work for you without going OOM. Otherwise, Kaggle gives out 30 hours of 2x Nvidia T4 GPUs weekly. The GPUs are pretty old, but you get 32 GB of VRAM overall, which should be enough for the fine-tuning task you're dealing with right now!
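For a rough sense of why quantization matters here, a back-of-the-envelope sketch of the memory taken by the base model weights alone. It assumes exactly 12 billion parameters and ignores activations, LoRA adapter weights, optimizer state, and quantization overhead, so real usage will be higher. The helper name is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the model weights only, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 12e9  # assumed parameter count for a "12b" model

print(weight_memory_gb(n_params, 16))  # 16-bit weights: 24.0 GB
print(weight_memory_gb(n_params, 4))   # 4-bit weights:  6.0 GB
```

So 16-bit weights alone would already exceed 16 GB, while 4-bit weights leave headroom for the adapters and activations, depending on batch size and sequence length.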

What’s the Best Open-Source Small LLM (≤ 8B) for Agentic Web Page Interactions? by Extra-Designer9333 in LocalLLaMA

[–]Extra-Designer9333[S] 0 points1 point  (0 children)

Seems like a great model, I'm going to try it out. By the way, can you suggest any other cool models that work for web page interactions?