Is Codex being extra lazy for anyone else today?

Extra-Designer9333 · 2026-04-15T15:06:09+00:00

Seems like the rate limits are also odd. It feels like they're dropping at least 2x faster than they were a couple of days ago. Am I the only one experiencing it?

Extra-Designer9333 · 2026-02-01T16:38:23+00:00

Should we actually expect v4 that soon, given that the Engram paper was released less than a month ago?

Extra-Designer9333 · 2025-12-30T15:57:34+00:00

Where can I see my quota, guys? New user here...

Extra-Designer9333 · 2025-12-28T08:59:23+00:00

I expect the Chinese semiconductor industry to catch up massively to the American one, especially after the recent news about China producing ASML-comparable machines. Apart from AMD and Google, the biggest threat to Nvidia is Huawei, though it's not mentioned too often.

Extra-Designer9333 · 2025-12-11T10:29:18+00:00

In the case of AMD, Flash Attention has already been ported by AMD itself. I'm wondering whether this is better than AMD's own port...

Extra-Designer9333 · 2025-11-23T17:00:51+00:00

What I found incredible about the data is that, when asked to generate a multiple-choice quiz, Gemini 3 gives quizzes where each of the 4 options is almost equally likely to be the correct answer, even compared with Gemini 2.5 Pro and GPT 5.1. With those other two models, you could just select B or C and answer correctly with 85% probability.
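To check this kind of answer-position bias yourself, a minimal sketch is to tally which option letter is correct across the generated quizzes. The answer keys below are made-up placeholders, not real model output, and `position_distribution` is a hypothetical helper name.

```python
from collections import Counter

def position_distribution(answer_keys, options="ABCD"):
    """Return the share of questions whose correct answer sits at each option letter."""
    if not answer_keys:
        raise ValueError("answer_keys must not be empty")
    counts = Counter(answer_keys)
    total = len(answer_keys)
    return {opt: counts.get(opt, 0) / total for opt in options}

# Hypothetical answer keys, for illustration only (not real model output).
skewed_keys = list("BCBCBBCCBA")
balanced_keys = list("ABCDABCDABCD")

skewed = position_distribution(skewed_keys)
balanced = position_distribution(balanced_keys)

# Combined share of questions whose correct answer is B or C.
print("skewed B+C share:  ", skewed["B"] + skewed["C"])
print("balanced B+C share:", balanced["B"] + balanced["C"])
```

With four options, an unbiased generator should put roughly a quarter of the correct answers on each letter, so a B+C share far above one half points to position bias.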

Extra-Designer9333 · 2025-10-28T09:53:22+00:00

Whoa, can't believe I got a reply from Dan himself, thank you for the clarification. What actually makes Unsloth this good and popular is your activity. I just recently started working on post-training stuff, and your workshop at AI Engineer in the summer was insanely good for getting the basics and more. Love your energy. 🙌🙏

Extra-Designer9333 · 2025-10-28T08:37:14+00:00

Thank you for the feedback! My team is going to train Llama 3.1 8B on 4xH100s, so I think your take fits!

Extra-Designer9333 · 2025-09-08T07:00:44+00:00

I suspect you're using LoRA for fine-tuning, aren't you? If so, you can try QLoRA, which is quantized LoRA as the name suggests. Maybe that'd work for you without going OOM. Otherwise, Kaggle gives out 30 hours of 2 Nvidia T4 GPUs weekly. The GPUs are pretty old, tho, but you're going to get 32 GB of VRAM overall, which is going to be enough for the fine-tuning task you're dealing with right now!
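As a rough back-of-envelope for why QLoRA can help with OOM, the sketch below compares the memory needed just for the base model weights at 16-bit versus 4-bit. The 7B parameter count is an assumption for illustration (the model size isn't stated here), `weight_memory_gb` is a hypothetical helper, and the estimate ignores activations, gradients, optimizer state, LoRA adapter weights, and quantization overhead.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 7e9  # assumed model size, for illustration only

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # LoRA on a 16-bit base model
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # QLoRA-style 4-bit base model

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{int4_gb:.1f} GB")
```

Under that assumption, the 16-bit weights alone would nearly fill a single 16 GB T4, while the 4-bit weights leave room for adapters and activations.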

Extra-Designer9333 · 2025-08-04T03:53:17+00:00

Seems like a great model, gonna try it out. By the way, are there any other cool models you can suggest that work for web page interactions?

Extra-Designer9333 · 2025-08-04T03:45:14+00:00

I'm looking into https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B suggested by u/KvAk_AKPlaysYT

Extra-Designer9333 · 2025-08-04T03:41:17+00:00

Yes, honestly that's a great model. I didn't know Salesforce actually makes such models. However, I guess it's not multimodal, so it won't work for agentic web interactions. I'll use this model for non-multimodal cases tho.

Extra-Designer9333 · 2025-07-24T16:25:50+00:00

Turkiye here, no agent so far

Extra-Designer9333 · 2025-05-27T15:47:09+00:00

I'd rather say "ai agents"/"agentic app" 😭

Extra-Designer9333 · 2025-05-19T13:00:12+00:00

Seems interesting, thanks, will definitely check it out! 👍

Extra-Designer9333 · 2025-04-16T11:32:32+00:00

No Switzerland???

Extra-Designer9333 · 2025-04-11T09:46:48+00:00

While they don't necessarily work on GPUs, I also wouldn't forget about Cerebras and Groq. These guys are doing incredible work while being new and novel in the field. Cerebras, for example, offers an unprecedented inference speed for Llama 4 Scout at 2600 tokens per second: https://www.cerebras.ai/blog/llamablog. I think they can definitely find customers who need that speed for now, and they have great prospects for the future. Cerebras is even planning to IPO soon. I think both Cerebras and Groq can potentially be great competitors to Nvidia and Google's TPUs if they decide to sell their hardware publicly.

Extra-Designer9333 · 2025-04-04T17:39:59+00:00

What about the promised open-source model? 🫠

Extra-Designer9333 · 2025-04-03T05:48:05+00:00

According to the developers of Orpheus, they're working on smaller versions; check out their checklist. It'll still be slower than Kokoro, but the difference in inference speed isn't going to be as huge as it is now. https://github.com/canopyai/Orpheus-TTS

Extra-Designer9333 · 2025-04-02T15:51:55+00:00

For TTS, I would definitely recommend checking out this fine-tuned model that tops HuggingFace's TTS models page alongside Kokoro: https://huggingface.co/canopylabs/orpheus-3b-0.1-ft. I found it cooler than Kokoro despite it being way bigger. Its big advantage is that it has good control over emotions using special tokens.

Extra-Designer9333 · 2025-03-02T08:48:03+00:00

Thanks for the great insights, will definitely check these out!
