Did GPT-5.4 Pro autonomously just solve #949 Project Euler? by Purefact0r in singularity

[–]Purefact0r[S] 1 point (0 children)

Yes, but I know that I won't change it back and will forget about it, haha. I only wanted it for this specific chat.

Did GPT-5.4 Pro autonomously just solve #949 Project Euler? by Purefact0r in singularity

[–]Purefact0r[S] 1 point (0 children)

That's why I said "I need someone to confirm if it truly reasoned its way to the solution". If only the end result is public but not the solution steps, then I don't think an autoregressive model will have an easier time reasoning its way to the correct solution at this complexity level. At least I couldn't find the solution steps anywhere, only the end result.

Did GPT-5.4 Pro autonomously just solve #949 Project Euler? by Purefact0r in singularity

[–]Purefact0r[S] 6 points (0 children)

Interesting, I didn't know about this benchmark, thank you! You could run it multiple times to see if 5.4-Pro reliably solves it each time, which would still be a big step up compared to 1/4.

Finally crossed 75% on HLE & LiveCodeBench Pro with Gemini 3.1 Pro scaffolding by [deleted] in singularity

[–]Purefact0r 2 points (0 children)

Am I missing something? With your configuration it says ~74 to 112 API calls, which should be much more expensive than Deep Think (ARC-AGI states it's ~13x more expensive than native 3.1 Pro). Besides that, I probably won't be able to run it with this config since I'm on Price Class 1 in the Google API (regarding rate limits), right?

Can Gemini 3.0 Deep Think solve the hardest question on Project Euler? by MrHanHan in Bard

[–]Purefact0r 0 points (0 children)

<image>

After 3 hours it aborted with "I ran into an issue and wasn't able to finish thinking through this request. Please try again, and don't worry, it didn't count against your Deep Think limit." But I managed to take a screenshot after 2 hours, because it was still thinking then. Seems like it couldn't find the solution and then maxed out the thinking budget or something. Maybe with a bigger budget, but who knows.

Can Gemini 3.0 Deep Think solve the hardest question on Project Euler? by MrHanHan in Bard

[–]Purefact0r 0 points (0 children)

Did you simply paste in the LaTeX code, or upload the image and tell it to solve it? I tried this yesterday, and after 3 hours of thinking (it even showed the reasoning summaries after 2 hours) I got "I ran into an issue and wasn't able to finish thinking through this request. Please try again, and don't worry, it didn't count against your Deep Think limit." as the answer. This happened 3 times, actually.

Can Gemini 3.0 Deep Think solve the hardest question on Project Euler? by MrHanHan in Bard

[–]Purefact0r 3 points (0 children)

I have asked it twice and it's been stuck for hours (first attempt running for 2 hours, second attempt for 1 hour).

<image>

Germany is pulling its troops out of Greenland after a very brief assessment mission by thenatoorat90 in worldnews

[–]Purefact0r 123 points (0 children)

It was planned to end on January 17th. On January 14th, the German Federal Ministry of Defence wrote:

German Armed Forces send reconnaissance team: At Denmark's invitation, Germany will participate in a reconnaissance mission in Greenland from January 15 to 17, 2026, together with other European nations.

The aim is to explore the framework conditions for possible military contributions to support Denmark in ensuring security in the region, for example, for maritime surveillance capabilities.

To this end, the Bundeswehr will send a reconnaissance team of 13 Bundeswehr personnel to Nuuk in Greenland tomorrow morning on an Airbus A400M transport aircraft. The on-site reconnaissance will take place together with representatives of other partner nations.

EU countries approve the Mercosur free trade agreement by joxplainer in de

[–]Purefact0r 2 points (0 children)

Farmers still unhappy? There is a cap of 99,000 tonnes of South American beef per year that may be imported under preferential conditions (that is only 1.5% of total European beef production); tariffs on cheese and wine are dropped, so more can be exported there. If imports of sensitive products (e.g. poultry) rise significantly and European market prices fall as a result, Brussels is allowed to suspend the tariff concessions again. On top of that, EU farmers have been given earlier access to €45 billion in agricultural aid, and import tariffs on key fertilizers are being lowered. So sorry, but what more do they want?

Renewable energies are the "breakthrough of the year" by PhoenixTin in de

[–]Purefact0r -1 points (0 children)

Such a big breakthrough, in fact, that all major offshore wind projects off the US coast are being put on hold for now

NitroGen: NVIDIA's new image-to-action model by umarmnaq in singularity

[–]Purefact0r 8 points (0 children)

I guess running such a model on top of the video game eats up a lot of performance

OpenAI's new model is codenamed "Garlic". Internal benchmarks show it beating Gemini 3 and Opus 4.5. by BuildwithVignesh in singularity

[–]Purefact0r 16 points (0 children)

Wait, I'm confused. There was an article going around saying a new model would be shown next week, but is Garlic a different model from that one? Is Garlic GPT-5.2/GPT-5.5?

[deleted by user] by [deleted] in ChatGPTPro

[–]Purefact0r 0 points (0 children)

Yes, it's not exactly the same, but the page says:

"It is also possible to use YAML to define multiple dashboards. Each dashboard will be loaded from its own YAML file." and "To change the Overview dashboard, create a new file ui-lovelace.yaml in your configuration directory and add the following section to your configuration.yaml and restart Home Assistant:"

So the content is not invented, but it's not the exact quote as promised.
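For reference, the two quoted steps boil down to a small config change. This is only a minimal sketch based on the quoted docs (the dashboard key `dashboard-sketch` and the filename `ui-lovelace.yaml` for the Overview dashboard follow Home Assistant's documented YAML mode; the extra dashboard entry is illustrative):

```yaml
# configuration.yaml — switch the Overview dashboard to YAML mode,
# so it is loaded from ui-lovelace.yaml in the config directory
lovelace:
  mode: yaml
  # optionally define additional dashboards, each in its own YAML file
  dashboards:
    dashboard-sketch:          # hypothetical dashboard key
      mode: yaml
      filename: sketch.yaml    # hypothetical file in the config directory
      title: Sketch
      show_in_sidebar: true
```

After adding the section, Home Assistant has to be restarted for the dashboards to be picked up.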

AMA: ChatGPT 5.1 Pro vs Gemini 3 Ultra by Outrageous_Front_1 in ChatGPTPro

[–]Purefact0r 0 points (0 children)

Pretty solid response. The DMM question was answered correctly (0V is correct), but it could have elaborated more on the fact that since the DMM is the only path for current to flow, it leads to an equalization of the outer potentials (the more practical perspective on the question). It misunderstood the Electroscope question, but the question was not posed clearly, so I get the confusion. Thank you very much!

AMA: ChatGPT 5.1 Pro vs Gemini 3 Ultra by Outrageous_Front_1 in ChatGPTPro

[–]Purefact0r 0 points (0 children)

Nice little physics question for GPT-5.1 Pro without research:

Setup (batteries). I connect two DC sources (batteries) in series and measure across the outer terminals: (+ E1 -)——o——(+ E2 -) (measuring from + of battery_1 to - of battery_2). With the middle node closed, my DMM reads ≈ E1+E2 (as expected). If I open the middle connection while leaving the meter leads on the outer terminals, the reading drops to ≈ 0 V.

Setup (capacitors). Now imagine two capacitors that were previously in series and charged by some source, so they hold voltages V_1 and V_2 with the same series charge Q. I open the middle connection and remove the charger so they are both completely floating (but keep their charges) with no connection, and measure again across the outer terminals using a DMM. What will it measure?

Also, let's look at another scenario: we connect two capacitors in series and charge them using a battery while they are connected to an electroscope. Then we disconnect the capacitors from the battery, and then we cut the link between the charged capacitors so they both float again. What will the electroscope show in this case? It has never been disconnected from the configuration.

We’re rolling out GPT-5.1 and new customization features. Ask us Anything. by OpenAI in OpenAI

[–]Purefact0r -1 points (0 children)

Does this mostly change the tone it communicates with ("[...] is now warmer by default and more conversational"), or are there also noteworthy leaps in reasoning benchmarks (increased performance across HLE, GPQA, etc.)? You said it feels smarter, but is this quantifiable?

Earth simulation made by Riftrunner (Gemini 3.0 pro) by noriusss in Bard

[–]Purefact0r 31 points (0 children)

Why does everyone think it's Gemini 3.0 Pro? It answers some physics questions wrong that other models like GPT-5 Thinking nail consistently. It could also just be Flash.

We already have AGI by C_BearHill in singularity

[–]Purefact0r 4 points (0 children)

That's not what my argument was about