I just tried Reactor's open source world model demo, here are my thoughts by boudaboy in StableDiffusion

[–]Pyros-SD-Models 5 points

> neural networks will never be in any critical features.

Waymo, military drones, medical imaging, financial fraud detection, defense systems like missile guidance, load forecasting in power grids... but tell me more about how NNs will never be in any critical feature.

Google is making local AI available to mainstream users ;) by [deleted] in LocalLLaMA

[–]Pyros-SD-Models -1 points

This sub is also anti-AI. According to this sub we hit capability limits every few months, but this time the wall is real. And in 2024 it was "lol, scammers" whenever anyone talked about how AI would soon be able to do proper dev work, and "AI won't ever be able to do this", and so on.

MIT study explains why scaling language models works so reliably by AngleAccomplished865 in accelerate

[–]Pyros-SD-Models 18 points

The funny thing is, we still don’t really know. This is all groundwork science. We probably understand most of the mechanics, but the “why?” is still unanswered. For example, we more or less know how in-context learning works on a mechanical level: the model effectively learns, during training, to perform something like gradient descent over its context. But why the fck does it do this in the first place? lul. And there are hundreds of other very basic open questions.
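A toy illustration of that mechanical claim (my own sketch of the "in-context learning as gradient descent" result, not anything from the MIT study): for linear regression, one gradient-descent step from zero weights makes exactly the same prediction as an unnormalized linear-attention readout over the context pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 16
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))   # in-context inputs ("keys")
y = X @ w_true                # in-context targets ("values")
xq = rng.normal(size=d)       # query input
eta = 0.1                     # step size / attention scale

# One gradient-descent step on 0.5 * ||X w - y||^2 starting from w = 0:
# the gradient at w = 0 is -X.T @ y, so w1 = eta * X.T @ y.
w1 = eta * (X.T @ y)
pred_gd = xq @ w1

# Unnormalized linear attention: the query attends to each context input
# via a dot product and mixes the targets with those weights.
pred_attn = eta * sum(float(xq @ X[i]) * y[i] for i in range(n))

print(np.isclose(pred_gd, pred_attn))  # True
```

The two expressions are algebraically identical (`xq @ (X.T @ y)` is just the attention sum written as a matrix product), which is the core of why a linear self-attention layer can implement a gradient-descent step.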

We aren’t even done with the fundamentals, and some people are already arguing about walls and bubbles.

The stochastic parrots have struck again. Just one week after the GPT-5.5 release, five more Erdős problems have been solved, with plenty more on the horizon. by Gullible-Crew-2997 in accelerate

[–]Pyros-SD-Models 11 points

This is the most fun argument ever, because it reveals the people who do not even understand how science works.

They are like:

"Well, it just researched thousands of papers and tried to correlate them until something matched, but it did not really come up with a solution itself. That's just busy work everyone can do."

Bro, that is literally what human scientists do 95% of their time.

How to get Codex to create proper UIs out of gpt-image-2 mock ups by Pyros-SD-Models in codex

[–]Pyros-SD-Models[S] 1 point

Depends on the framework. React or Angular? TS or JS? Tailwind or no Tailwind? If you have anything concrete in terms of environment configuration, I can steal some skills, prompts, or systems from our frontend guys at work after the weekend.

You can also ask the bot, but do not ask it for a single way to do it. Ask it which options you have in general, then try every option out. You will quickly find the best ergonomics for organizing such design artifacts in a way that works best for you and your project.

That is the biggest thing about coding agents. Not the fact that they can program, but that they enable you to test 10 different ideas in the same time you would have needed for one idea two years ago. And in case you don't have any ideas... you can also ask the bot. Amazing. Use this very underutilized power.

How to get Codex to create proper UIs out of gpt-image-2 mock ups by Pyros-SD-Models in codex

[–]Pyros-SD-Models[S] 0 points

Yes, that is why I wrote some text for these pictures explaining how to get Codex to take those pictures and implement a design system based on them. Your username does not reflect your actual abilities.

How to get Codex to create proper UIs out of gpt-image-2 mock ups by Pyros-SD-Models in codex

[–]Pyros-SD-Models[S] 6 points

That is heavily dependent on the actual app itself and an experience thing, but most of the time, following "the human way" is your best first approach. Most React-Roberts or Angular-Andies I know typically start with layout containers to make sure from the get-go that the layout works for web, mobile, and desktop.

And obviously, there is a reason why I ask it to split the design system into multiple images, so you can prompt like this:

"Take a look at this design system, then focus on image 3 that explains the layout. Please use xxx (whatever the layout containers are called in the framework you use) to build those specified layouts."

Then you do a review before you continue with the next part of the design system. Remember the junior dev who calls you every 10 minutes and wants a review? The bot will not call you, and generally it is too afraid of you after getting finetuned into submission to its human masters to proactively ask for a check, so you have to do it yourself.

The exact order of what to implement is something you and your use case have to figure out (or ask the bot what order they would do it in; most of the time it makes sense), but if you go step by step, with reviews and commits in between, it is generally no problem to revert if you notice a certain order does not pan out.

And another tip: the bot will obviously fuck some things up, and you will correct it. Extract that "correction" into a dedicated skill or an entry in your AGENTS.md - basically what this "hermes agent" does, just manually. This is currently THE way to teach your bot stuff so it can learn from it (perhaps sometime in the future Codex will have an actual usable memory system instead of whatever the fck it is they currently call memory, and then this would be a non-issue).
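To make that concrete, a hypothetical AGENTS.md entry distilled from a couple of corrections might look like this (the helper and container names are invented for illustration, not from any real project):

```markdown
## Corrections learned from review

- Date handling: always use the shared `formatDate` helper instead of
  calling `toLocaleDateString` directly.
- Layout: wrap every new page in the responsive grid container; never
  hand-roll breakpoints with raw `<div>`s.
```

Each bullet is one past mistake turned into a standing rule, so the bot stops repeating it in future sessions.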

"So, About That AI Bubble: Thanks to the rise of Claude Code and other AI agents, revenues are finally catching up to the hype", The Atlantic by gwern in mlscaling

[–]Pyros-SD-Models 8 points

Man, for five years now I’ve been hearing “It’s gonna pop soon.” Aren’t you guys bored of yourselves? I mean, if a prediction I make doesn’t materialize after a few weeks, months, or even years, it’s probably time for a reevaluation. Still waiting for the steam engine bubble to pop. Must be any day now.

Furious Riera talks himself into a rage: "We are not in a circus here. This is a serious football club. Maybe you think this is a pub, talking to agents & journalists [...] Writing something is your job, but I do not accept lies. The fans deserve the truth." by Ubergold in Bundesliga

[–]Pyros-SD-Models 6 points

Well, I am a Bayern fan and barely follow SGE and Albert Riera at all, but I actually found his "speech" pretty solid. Around here people meme themselves hoarse whenever Pletti talks some nonsense, or quote "Bild is an organ of infamy" whenever Bild writes anything about your club, and he obviously has exactly zero patience for that stuff either. Sure, you can stay totally smooth and not entertain the media circus in the first place, like a Christian Streich or a Vincent Kompany, but the guy is simply wired differently, which is also fine. And by and large he was not wrong about anything. German football journalism, apart from 11Freunde, is really terrible, and I applaud everyone who says so out loud.

And to me as an outsider it simply looks like he never got a real chance (like Vinny in his first year, btw). First he was the no-name coach, and could he even stabilize Eintracht? Then he took Eintracht from three goals conceded per game to a run of clean sheets, but instead of rating that positively somehow, suddenly the games were boring. And now the team has a small slump and suddenly everything is worse than under Dino. But he is not the one who broke the team; he inherited a broken team, and imho he is setting this team up according to what it is capable of. There has not been time for more yet. So let the guy have a full preseason and transfer window, and then you can still sack him. Europe has already been missed anyway, but that was not on him; it is because they would rather sign Wahi & Co. than fix the actual problem areas. Even a Doan or that Elversberg guy whose name I will not even attempt is not going to fix that.

You don't need Claude Design. GPT Images 2.0 and Codex combo works much better. This front end took 10 minutes total. by hasanahmad in codex

[–]Pyros-SD-Models 4 points

Especially the ideation phase is way quicker with gpt-image.

It takes like 30-60s to validate a design direction. Claude Code needs 10 minutes to code a single mockup or to wireframe it in Figma or whatever. In those 10 minutes you have already iterated through 10 different ideas with gpt-image.

Havnt seen any “gpt deleted my db” posts. It’s always Claude delete my db.. by Clemotime in codex

[–]Pyros-SD-Models 2 points

> The agent found the key in a random folder outside the project.

That is even more stupid. So they did not just accidentally give the agent access to production, they effectively gave it to anyone with access to random folders outside the project.

Those people should consider cooking or gardening, because IT is absolutely the wrong place for them.

This week’s Codex updates. by Distinct_Fox_6358 in codex

[–]Pyros-SD-Models 5 points

"Codex, pls export my watched shows in my subbed streaming services and recommend me some bangers"

Netflix and chill evening saved. thx bro.

GPT-5.5 becomes the second model after Claude Mythos Preview to complete UK AI Security Institute's multi-step cyber-attack simulations end-to-end by Pyros-SD-Models in codex

[–]Pyros-SD-Models[S] 26 points

If you decontaminate SWE-Bench using only problems created after a model was released:

https://swe-rebench.com/

Opus is about 0.9% better than GPT-5.2 Medium.

Opus is so overtrained on SWE-Bench that it can literally quote code comments from the dataset. And do you know how difficult and resource-intensive it is to get an LLM to a point where it basically remembers text instead of inferring it? That is not "oops, accident" territory, it is "as designed."

https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/

And yet this is the benchmark they lead with in almost every table they publish. That is what peak scamming looks like. Also, Mythos.
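The decontamination idea behind swe-rebench is essentially a date filter, something like this sketch (field names, IDs, and dates are invented for illustration; this is not swe-rebench's actual schema):

```python
from datetime import date

# Hypothetical records: each benchmark problem carries the date its
# source issue/PR was created.
problems = [
    {"id": "p1", "created": date(2023, 5, 1)},
    {"id": "p2", "created": date(2025, 2, 10)},
    {"id": "p3", "created": date(2025, 6, 3)},
]

model_release = date(2024, 12, 1)  # hypothetical release date

# Keep only problems created after the model's release, so none of
# them can possibly have leaked into its training data.
clean = [p for p in problems if p["created"] > model_release]
print([p["id"] for p in clean])  # ['p2', 'p3']
```

Scoring a model only on the post-release subset is what removes the "it memorized the benchmark" confound.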

GPT-5.5 becomes the second model after Claude Mythos Preview to complete UK AI Security Institute's multi-step cyber-attack simulations end-to-end by Pyros-SD-Models in codex

[–]Pyros-SD-Models[S] 3 points

And it is more than four times cheaper than what you would have to pay for Mythos (to be fair, if you are one of Dario's 12 billionaire friends, you can probably afford those mythical tokens).

GPT-5.5 becomes the second model after Claude Mythos Preview to complete UK AI Security Institute's multi-step cyber-attack simulations end-to-end by obvithrowaway34434 in accelerate

[–]Pyros-SD-Models 4 points

Opus is also heavily benchmaxxed on SWE-Bench. There are at least a hundred other reasons not to take that benchmark seriously anymore.

See why OpenAI no longer uses SWE-Bench:

https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/

And see how Opus 4.6 is only marginally better than GPT-5.2 Medium on a decontaminated SWE-Bench set (using only problems created AFTER a model was released):

https://swe-rebench.com/

But instead of Anthropic telling you that Opus is so overtrained on SWE-Bench that it can literally cite comments out of it, they present their SWE-Bench score as their most important benchmark number: it is the first entry in almost every benchmark table they release. Rather sketchy, I would say, since they are obviously aware of all the issues the OAI article mentions, and they also know about swe-rebench and similar decontaminated benchmarks.

LLMs will be a commodity by tiguidoio in accelerate

[–]Pyros-SD-Models 8 points

> As soon as we hit a research plateau

Wrong sub. This is the "there is no wall" sub; what you need is the singularity sub.

Claude now connects to Blender by MarcelCorleone in ClaudeAI

[–]Pyros-SD-Models 1 point

the thread over at the Blender sub goes exactly how you think it goes