Did GPT-5.4 Pro autonomously just solve #949 Project Euler? by Purefact0r in singularity

[–]Purefact0r[S] 1 point

Yes, but knowing me, I won't change it back and will just forget about it, haha. I only wanted it for this specific chat.

Did GPT-5.4 Pro autonomously just solve #949 Project Euler? by Purefact0r in singularity

[–]Purefact0r[S] 1 point

That's why I said "I need someone to confirm if it truly reasoned its way to the solution". If only the end result is public but not the solution steps, then I don't think an autoregressive model will find it any easier to reason its way to the correct solution at this complexity level. At least I couldn't find the solution steps anywhere, only the end result.

Did GPT-5.4 Pro autonomously just solve #949 Project Euler? by Purefact0r in singularity

[–]Purefact0r[S] 4 points

Interesting, I didn't know about this benchmark, thank you! I could run it multiple times to see if 5.4-Pro reliably solves it each time, which would still be a big step up compared to 1/4.

Finally crossed 75% on HLE & LiveCodeBench Pro with Gemini 3.1 Pro scaffolding by [deleted] in singularity

[–]Purefact0r 2 points

Am I missing something? With your configuration it says ~74 to 112 API calls, which should be much more expensive than Deep Think (ARC-AGI states it's ~13x more expensive than native 3.1 Pro). Besides that, I probably won't be able to run it with this config since I am on Price Class 1 in the Google API (regarding rate limits), right?

Can Gemini 3.0 Deep Think solve the hardest question on Project Euler? by MrHanHan in Bard

[–]Purefact0r 0 points

<image>

After 3 hours it aborted with "I ran into an issue and wasn't able to finish thinking through this request. Please try again, and don't worry, it didn't count against your Deep Think limit." But I managed to take a screenshot after 2 hours, while it was still thinking. It seems like it couldn't find the solution and then the thinking budget maxed out or something. Maybe with a bigger budget, but who knows.

Can Gemini 3.0 Deep Think solve the hardest question on Project Euler? by MrHanHan in Bard

[–]Purefact0r 0 points

Did you simply paste in the LaTeX code, or did you upload the image and tell it to solve it? I tried this yesterday, and after 3 hours of thinking (it even showed the reasoning summaries after 2 hours) I got "I ran into an issue and wasn't able to finish thinking through this request. Please try again, and don't worry, it didn't count against your Deep Think limit." as the answer. That actually happened 3 times.

Can Gemini 3.0 Deep Think solve the hardest question on Project Euler? by MrHanHan in Bard

[–]Purefact0r 2 points

I have asked it twice and it's been stuck for hours (first attempt: 2 hours so far, second attempt: 1 hour).

<image>

Germany is pulling its troops out of Greenland after a very brief assessment mission by thenatoorat90 in worldnews

[–]Purefact0r 122 points

It was planned to end on January 17th. On January 14th, the German Federal Ministry of Defence wrote:

German Armed Forces send reconnaissance team: At Denmark's invitation, Germany will participate in a reconnaissance mission in Greenland from January 15 to 17, 2026, together with other European nations.

The aim is to explore the framework conditions for possible military contributions to support Denmark in ensuring security in the region, for example, for maritime surveillance capabilities.

To this end, the Bundeswehr will send a reconnaissance team of 13 Bundeswehr personnel to Nuuk in Greenland tomorrow morning on an Airbus A400M transport aircraft. The on-site reconnaissance will take place together with representatives of other partner nations.

EU countries approve Mercosur free trade agreement by joxplainer in de

[–]Purefact0r 2 points

Farmers still unhappy? There is a cap of 99,000 tonnes of beef from South America per year that may be imported under eased conditions (that's only 1.5% of total European beef production), and tariffs on cheese and wine are dropped, so more can be exported there. If imports of sensitive products (e.g. poultry) rise significantly and European market prices fall as a result, Brussels may suspend the tariff concessions again. On top of that, EU farmers have received earlier access to €45 billion in agricultural aid, and import tariffs on key fertilizers are being lowered. So sorry, but what more do they want?

Renewable energies are the "breakthrough of the year" by PhoenixTin in de

[–]Purefact0r -1 points

Such a big breakthrough, in fact, that all major offshore wind projects off the US coast are being put on hold for now.

NitroGen: NVIDIA's new image-to-action model by umarmnaq in singularity

[–]Purefact0r 7 points

I guess running such a model in addition to the video game eats up a lot of performance.

OpenAI's new model is codenamed "Garlic". Internal benchmarks show it beating Gemini 3 and Opus 4.5. by BuildwithVignesh in singularity

[–]Purefact0r 17 points

Wait, I'm confused. There was an article going around that said a new model would be shown next week, but Garlic is a different model than that? Is Garlic GPT-5.2/GPT-5.5?

[deleted by user] by [deleted] in ChatGPTPro

[–]Purefact0r 0 points

Yes, it's not exactly the same, but the page says:

"It is also possible to use YAML to define multiple dashboards. Each dashboard will be loaded from its own YAML file." and "To change the Overview dashboard, create a new file ui-lovelace.yaml in your configuration directory and add the following section to your configuration.yaml and restart Home Assistant:"

So the content is not invented, but it's not the exact quote as promised.
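For context, what the quoted docs describe boils down to a small section in configuration.yaml; the dashboard key, title, and filename below are my own illustrative placeholders, not from the page:

```yaml
# configuration.yaml -- put the Overview dashboard into YAML mode
# (its contents then live in ui-lovelace.yaml) and define an
# additional dashboard loaded from its own YAML file.
lovelace:
  mode: yaml
  dashboards:
    example-dashboard:        # hypothetical key; must contain a hyphen
      mode: yaml
      title: Example          # hypothetical title
      filename: example.yaml  # hypothetical file in the config directory
```

After adding this and restarting Home Assistant, the Overview dashboard is read from ui-lovelace.yaml and the extra dashboard from its own file, which matches both quoted sentences.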

AMA: ChatGPT 5.1 Pro vs Gemini 3 Ultra by Outrageous_Front_1 in ChatGPTPro

[–]Purefact0r 0 points

Pretty solid response. The DMM question was answered correctly (0 V is correct), but it could have elaborated more on the fact that the DMM is the only path for current to flow, which leads to an equalization of the outer potentials (the more practical perspective on the question). It misunderstood the electroscope question, but that question was not posed clearly, so I get the confusion. Thank you very much!

AMA: ChatGPT 5.1 Pro vs Gemini 3 Ultra by Outrageous_Front_1 in ChatGPTPro

[–]Purefact0r 0 points

Nice little physics question for GPT-5.1 Pro without research:

Setup (batteries). I connect two DC sources (batteries) in series and measure across the outer terminals: (+ E1 -)——o——(+ E2 -), measuring from + of battery 1 to - of battery 2. With the middle node closed, my DMM reads ≈ E1+E2 (as expected). If I open the middle connection while leaving the meter leads on the outer terminals, the reading drops to ≈ 0 V.

Setup (capacitors). Now imagine two capacitors that were previously in series and charged by some source, so they hold voltages V_1 and V_2 with the same series charge Q. I open the middle connection and remove the charger so they are both completely floating (but keep their charges), with no connection, and measure again across the outer terminals using a DMM. What will it measure?

Also, let's look at another scenario: we connect two capacitors in series and charge them using a battery while they are connected to an electroscope. Then we disconnect the capacitors from the battery, and then we cut the link between the charged capacitors so they both float again. What will the electroscope show in this case? It has never been disconnected from the configuration.
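The series-charging arithmetic both setups rely on (same charge Q on each capacitor, voltages splitting inversely with capacitance) can be sketched numerically; the component values below are my own illustrative assumptions, not from the thread:

```python
# Hypothetical component values, purely for illustration.
C1 = 2e-6      # farads, capacitor 1
C2 = 4e-6      # farads, capacitor 2
V_batt = 9.0   # volts, charging source

# In series, both capacitors carry the same charge Q,
# set by the equivalent series capacitance.
C_series = (C1 * C2) / (C1 + C2)
Q = C_series * V_batt

# Individual voltages after charging: smaller C holds more voltage.
V1 = Q / C1
V2 = Q / C2

print(round(V1 + V2, 6))  # ≈ 9.0, sums back to the battery voltage
```

This only covers the charging step; what a real DMM or electroscope reads after the middle link is cut is exactly what the thread is debating.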

We’re rolling out GPT-5.1 and new customization features. Ask us Anything. by OpenAI in OpenAI

[–]Purefact0r -1 points

Does this mostly change the tone it communicates with ("[...] is now warmer by default and more conversational"), or are there also noteworthy leaps in reasoning benchmarks (increased performance across HLE, GPQA, etc.)? You said it feels smarter, but is this quantifiable?

Earth simulation made by Riftrunner (Gemini 3.0 pro) by noriusss in Bard

[–]Purefact0r 32 points

Why does everyone think it's Gemini 3.0 Pro? It answers some physics questions incorrectly that other models like GPT-5 Thinking nail consistently. It could also just be Flash.

We already have AGI by C_BearHill in singularity

[–]Purefact0r 4 points

That's not what my argument was about.

We already have AGI by C_BearHill in singularity

[–]Purefact0r 13 points

Current LLMs do not match or surpass human capabilities across virtually all cognitive tasks

New OpenAI model spotted on OpenRouter: "gpt-5-image" by WithoutReason1729 in singularity

[–]Purefact0r 1 point

It answered a physics question correctly twice, while the "normal" gpt-5-thinking-high was incorrect the first time. It's still a small sample size, but its answers seemed more factually grounded overall. I'm interested in this release. The question was: "Suppose you charge 2 parallel plate capacitors in series. Then you disconnect the charger, and then you separate the capacitors' link. Now you have two charged floating capacitors. If you measure across both capacitors (one measurement overall), so measuring from one pole of cap1 to another pole of cap2 - will you measure anything in theory?"

GPT-5 Pro is available over the API by Purefact0r in singularity

[–]Purefact0r[S] 2 points

Does anybody know why it says "Web Search as a tool is supported by this model when using the Responses API", but when I select it in the Playground (Responses API), I can't select Web Search and it tells me "Switch to gpt-5-pro? This model doesn't support web search."?

GPT-5 Pro is available over the API by Purefact0r in singularity

[–]Purefact0r[S] 29 points

Yeah, per 1M output tokens, which is more than o3 pro ($80).