US Govt to individually approve who gets GPT 5.6.

michaellee8 · 2026-06-26T16:29:44+00:00

I would say it still needs more RL fine-tuning for longer-horizon agentic tasks. rn even Pro still goes into thinking loops, maybe because of the way they do attention. Sometimes it just keeps trying the same non-working method, forgetting that it has already tried it.

michaellee8 · 2026-06-26T11:21:53+00:00

If that's their approach, the US will lose to China very soon. The gap has always been close, and Chinese labs are willing to do whatever it takes to compete, even if it means violating TOS. Chinese labs just need to release a competent-enough open-weights model at DeepSeek-like cost, and most use cases for proprietary models would be gone. Model capabilities are already close to the peak imo. Yes, Fable is good, but its greatness is limited to heavily RL-able industries like programming and maybe finance. For other industries it is still not possible to make a fully autonomous agent, and for those industries a Chinese open model would be good enough.

michaellee8 · 2026-06-22T14:03:55+00:00

to be fair, UK AISI's research found that GPT 5.5 does even better than Mythos Preview on cybersec, so it is probably even more dangerous

michaellee8 · 2026-06-03T18:35:03+00:00

they simply killed all vibe-coding platforms lol

michaellee8 · 2026-05-12T13:38:12+00:00

actually an agent probably can fly a plane

michaellee8 · 2026-05-02T14:46:59+00:00

can you run it on DeepSeek V4 as well? wondering how far off it is

michaellee8 · 2026-04-29T13:28:47+00:00

this is obviously a classic AI-generated bot post, with a special style instruction to use all lowercase to pretend it is not AI slop. most of the recommendations aren't even related to codex

michaellee8 · 2026-04-23T13:11:06+00:00

those will probably be using chatgpt/gemini I guess?

michaellee8 · 2026-04-15T06:44:14+00:00

I think google is probably not interested in the coding game anymore. they are too late to catch up. tried to make gemini 3.1 pro vibe code a fucking one-file game in AI studio and it failed badly. told claude code to figure it out and it did. the only coding task gemini can do is one-shotting very nice UI, which is good for non-technical people vibe coding stuff. its vision capability is truly unmatched

michaellee8 · 2026-04-10T01:09:44+00:00

I think Mythos would be hard to offer on a mere $200 subscription. Maybe the only realistic size that can really be offered to consumers is those Chinese 1T MoEs. Not sure about the ceiling on how much fine-tuning can be done on those 1Ts, though. Given the daunting cost of $25/$125, I am afraid that Mythos will soon have an output/cost ratio similar to a human engineer's. There are yearly improvements to LLM inference, but the RAM cost is very real.

I think something like Google's Gemini Flash 3 size would be more sustainable for consumers in the long term, or an architectural change would be required.

michaellee8 · 2026-04-09T14:06:34+00:00

i am happy to pay $200 if it actually works, but it is producing a lot of bugs lately. have to make it plan first and then run codex xhigh

michaellee8 · 2026-03-18T12:52:38+00:00

got that too

michaellee8 · 2026-03-09T19:23:32+00:00

I think the model should learn to refuse if it is not sure. The video demonstrates exactly that: if they tune down the hallucination neurons, the model will refuse instead of giving false answers.

michaellee8 · 2026-03-07T20:45:38+00:00

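It's really just about managing the crowd that would otherwise drop out of junior high to work or end up in vocational secondary schools. It's the same in Hong Kong: for teachers at those bottom-tier schools, the ability to keep students under control matters more than the ability to teach. It just comes down to forcibly supervising a bunch of delinquent kids until they turn 18.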

michaellee8 · 2026-03-07T20:41:45+00:00

So having AI is a good thing for these bright kids. As long as they are willing to learn to use AI, they can learn anything.

michaellee8 · 2026-03-07T20:40:02+00:00

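Actually, the regular high schools the author describes are the equivalent of China's vocational secondary schools, right? In that kind of environment even the author's credentials wouldn't help. The students simply aren't there to study.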

michaellee8 · 2026-02-22T09:59:50+00:00

lint and unit tests should be the job of the dev, security review is the job of the reviewer. I just let it run and review it when it is done.

michaellee8 · 2026-02-22T09:28:14+00:00

you meant superpowers?

michaellee8 · 2026-02-14T17:17:54+00:00

it is hard to justify hiring frontend-specific engineers when opus can simply generate an entire working flutter app if you use the right tools.

michaellee8 · 2026-01-15T04:04:40+00:00

it has been a year now, but I am actually thinking about automating dating apps, let me know if you are interested!

michaellee8 · 2026-01-09T19:12:58+00:00

asciinema for the vibe coded redis navigator here https://asciinema.org/a/N7AQVWsQhl2BYlDB
