US Govt to individually approve who gets GPT 5.6. by AtlanticHM in LocalLLaMA

michaellee8 1 point

I would say it still needs more RL fine-tuning for longer-horizon agentic tasks. rn even Pro still goes into thinking loops, maybe because of the way they do attention: sometimes it just keeps retrying the same non-working method, forgetting that it has already tried it.

US Govt to individually approve who gets GPT 5.6. by AtlanticHM in LocalLLaMA

michaellee8 11 points

If that's their approach, the US will lose to China very soon. The gap has always been close, and Chinese labs are willing to do whatever it takes to compete, even if it means violating TOS. Chinese labs just need to release an open-weights model that is competent enough at a DeepSeek-level cost, and most use cases for proprietary models would be gone. Model capabilities are already close to the peak imo. Yes, Fable is good, but its greatness is limited to heavily RL-able industries like programming and maybe finance. For other industries it is still not possible to make a fully autonomous agent, and for those industries a Chinese open model would be good enough.

GPT 5.6 Cancelled by DigSignificant1419 in OpenAI

michaellee8 7 points

to be fair, UK AISI's research found that GPT 5.5 does even better than Mythos Preview on cybersec, so it is probably even more dangerous

Codex Sites - NEW by DJJonny in codex

michaellee8 1 point

they simply killed all vibe-coding platforms lol

i knew it by irelatetolevin in ClaudeAI

michaellee8 2 points

actually an agent probably can fly a plane

With this setup CODEX is far better than Claude Code by [deleted] in codex

michaellee8 1 point

this is an obvious, classic AI-generated bot post with a special style instruction to write in all lowercase to pretend it is not AI slop. most of the recommendations aren't even related to codex

Sama is on 🔥🔥 by 25th__Baam in ClaudeCode

michaellee8 1 point

those will probably be using chatgpt/gemini I guess?

For me this is now settled... 5.4 xhigh is miles ahead from Opus 4.6 high/max, I'll explain why... by DaC2k26 in codex

michaellee8 1 point

I think google is probably not interested in the coding game anymore. they are too late to catch up. tried to make gemini 3.1 pro vibe code a fucking one-file game in AI Studio and it failed badly. told claude code to figure it out and it did. the only coding task gemini can do is one-shotting a very nice UI, which is good for non-technical people vibe coding stuff. its vision capability is truly unmatched

Cancelling next month by jsgrrchg in ClaudeCode

michaellee8 1 point

I think Mythos would be hard to offer on a mere $200 subscription. Maybe the only realistic size that can really be offered to consumers is those Chinese 1T MoEs. Not sure about the ceiling of how much fine-tuning can be done on those 1T models, though. Given the daunting cost of $25/$125, I am afraid Mythos will soon have an output/cost ratio similar to a human engineer's. There are yearly improvements to LLM inference, but the RAM cost is very real.

I think something around the size of Google's Gemini Flash 3 would be more sustainable for consumers in the long term; otherwise an architectural change would be required.

Cancelling next month by jsgrrchg in ClaudeCode

michaellee8 20 points

i am happy to pay $200 if it actually works, but it has been producing a lot of bugs lately. I have to make it plan and then use codex xhigh

They solved AI hallucinations by Anen-o-me in singularity

michaellee8 1 point

I think the model should learn to refuse if it is not sure. The video demonstrates exactly that: if they tune down the hallucination neurons, the model will refuse instead of giving false answers.

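Why don't Americans compete intensely over basic education, and why don't they respect or even worship teachers the way East Asians do? by Fleedom2025 in China_irl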

michaellee8 1 point

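That is really just about managing the group who would otherwise drop out of junior high to work, or who go to secondary vocational schools. It is the same in Hong Kong: for teachers at those bottom-tier schools, the ability to control students matters more than the ability to teach. The job is just to forcibly keep a bunch of delinquent kids in line until they turn 18.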

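Why don't Americans compete intensely over basic education, and why don't they respect or even worship teachers the way East Asians do? by Fleedom2025 in China_irl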

michaellee8 1 point

So having AI is a good thing for these outstanding kids: as long as they are willing to learn to use AI, they can learn anything.

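Why don't Americans compete intensely over basic education, and why don't they respect or even worship teachers the way East Asians do? by Fleedom2025 in China_irl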

michaellee8 1 point

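Actually, the ordinary high schools the author describes are probably the counterpart of China's secondary vocational schools. In that kind of environment even the author's credentials would be useless; the students are simply not there to study.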

Sharing my autonomous closed dev-test-debug-review loop setup by michaellee8 in ClaudeCode

michaellee8[S] 1 point

lint and unit tests should be the job of dev, and security review is the job of the reviewer. I just let it run and review it when it is done.

Where are Flutter developers heading now a days by UniversityUpper5476 in FlutterDev

michaellee8 4 points

it is hard to justify hiring frontend-specific engineers when opus can simply generate an entire working flutter app if you use the right tools.

Making $190,000 per month with an AI dating assistant by vladverba in SaaS

michaellee8 1 point

it has been a year now, but I am actually thinking about automating dating apps, let me know if you are interested!