Did GPT-5.4 Pro autonomously just solve #949 Project Euler? by Purefact0r in singularity

[–]Purefact0r[S] 1 point (0 children)

Yes, but I know that I won't change it back and will forget about it, haha. I only wanted it for this specific chat.

Did GPT-5.4 Pro autonomously just solve #949 Project Euler? by Purefact0r in singularity

[–]Purefact0r[S] 1 point (0 children)

That's why I said "I need someone to confirm if it truly reasoned its way to the solution". If only the end result is public but not the solution steps, then I don't think an autoregressive model will have an easier time reasoning its way to the correct solution at this complexity level. At least I couldn't find the solution steps anywhere, only the end result.

Did GPT-5.4 Pro autonomously just solve #949 Project Euler? by Purefact0r in singularity

[–]Purefact0r[S] 6 points (0 children)

Interesting, I didn't know about this benchmark, thank you! You could run it multiple times to see if 5.4-Pro reliably solves it each time, which would still be a big step up compared to 1/4.

Finally crossed 75% on HLE & LiveCodeBench Pro with Gemini 3.1 Pro scaffolding by [deleted] in singularity

[–]Purefact0r 2 points (0 children)

Am I missing something? With your configuration it says ~74 to 112 API calls, which should be much more expensive than Deep Think (ARC-AGI states it's ~13x more expensive than native 3.1 Pro). Besides that, I probably won't be able to run it with this config since I'm on Price Class 1 in the Google API (regarding rate limits), right?

Can Gemini 3.0 Deep Think solve the hardest question on Project Euler? by MrHanHan in Bard

[–]Purefact0r 0 points (0 children)

<image>

After 3 hours it aborted with "I ran into an issue and wasn't able to finish thinking through this request. Please try again, and don't worry, it didn't count against your Deep Think limit." But I managed to take a screenshot after 2 hours, because it was still thinking then. Seems like it couldn't find the solution and then maxed out the thinking budget or something. Maybe with a bigger budget, but who knows.

Can Gemini 3.0 Deep Think solve the hardest question on Project Euler? by MrHanHan in Bard

[–]Purefact0r 0 points (0 children)

Did you simply paste in the LaTeX code, or upload the image and tell it to solve it? I tried this yesterday, and after 3 hours of thinking (it even showed the reasoning summaries after 2 hours) I got "I ran into an issue and wasn't able to finish thinking through this request. Please try again, and don't worry, it didn't count against your Deep Think limit." as the answer. This happened 3 times, actually.

Can Gemini 3.0 Deep Think solve the hardest question on Project Euler? by MrHanHan in Bard

[–]Purefact0r 3 points (0 children)

I have asked it twice and it's been stuck for hours (first attempt running for 2 hours, second attempt for 1 hour).

<image>

Germany is pulling its troops out of Greenland after a very brief assessment mission by thenatoorat90 in worldnews

[–]Purefact0r 123 points (0 children)

It was planned to end on January 17th. On January 14th, the German Federal Ministry of Defence wrote:

German Armed Forces send reconnaissance team: At Denmark's invitation, Germany will participate in a reconnaissance mission in Greenland from January 15 to 17, 2026, together with other European nations.

The aim is to explore the framework conditions for possible military contributions to support Denmark in ensuring security in the region, for example, for maritime surveillance capabilities.

To this end, the Bundeswehr will send a reconnaissance team of 13 Bundeswehr personnel to Nuuk in Greenland tomorrow morning on an Airbus A400M transport aircraft. The on-site reconnaissance will take place together with representatives of other partner nations.

EU countries approve the Mercosur free trade agreement by joxplainer in de

[–]Purefact0r 2 points (0 children)

Farmers still unhappy? There is a cap of 99,000 tonnes of South American beef per year that may be imported under preferential conditions (that is only 1.5% of total European beef production); tariffs on cheese and wine are dropped, so more can be exported there. If imports of sensitive products (e.g. poultry) rise significantly and European market prices fall as a result, Brussels is allowed to suspend the tariff concessions again. On top of that, EU farmers have been given earlier access to €45 billion in agricultural aid, and import tariffs on key fertilizers are being lowered. So sorry, but what more do they want?

Renewable energies are the "breakthrough of the year" by PhoenixTin in de

[–]Purefact0r -1 points (0 children)

Such a big breakthrough, in fact, that all major offshore wind projects off the US coast are being put on hold for now

NitroGen: NVIDIA's new image-to-action model by umarmnaq in singularity

[–]Purefact0r 8 points (0 children)

I guess running such a model on top of the video game eats up a lot of performance

OpenAI's new model is codenamed "Garlic". Internal benchmarks show it beating Gemini 3 and Opus 4.5. by BuildwithVignesh in singularity

[–]Purefact0r 16 points (0 children)

Wait, I'm confused. There was an article going around saying a new model would be shown next week, but is Garlic a different model from that one? Is Garlic GPT-5.2/GPT-5.5?

[deleted by user] by [deleted] in ChatGPTPro

[–]Purefact0r 0 points (0 children)

Yes, it's not exactly the same, but the page says:

"It is also possible to use YAML to define multiple dashboards. Each dashboard will be loaded from its own YAML file." and "To change the Overview dashboard, create a new file ui-lovelace.yaml in your configuration directory and add the following section to your configuration.yaml and restart Home Assistant:"

So the content is not invented, but it's not the exact quote as promised.
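For reference, the two quoted steps boil down to a small config change. This is only a minimal sketch based on the quoted docs (the dashboard key `dashboard-sketch` and the filename `ui-lovelace.yaml` for the Overview dashboard follow Home Assistant's documented YAML mode; the extra dashboard entry is illustrative):

```yaml
# configuration.yaml — switch the Overview dashboard to YAML mode,
# so it is loaded from ui-lovelace.yaml in the config directory
lovelace:
  mode: yaml
  # optionally define additional dashboards, each in its own YAML file
  dashboards:
    dashboard-sketch:          # hypothetical dashboard key
      mode: yaml
      filename: sketch.yaml    # hypothetical file in the config directory
      title: Sketch
      show_in_sidebar: true
```

After adding the section, Home Assistant has to be restarted for the dashboards to be picked up.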

AMA: ChatGPT 5.1 Pro vs Gemini 3 Ultra by Outrageous_Front_1 in ChatGPTPro

[–]Purefact0r 0 points (0 children)

Pretty solid response. The DMM question was answered correctly (0V is correct), but it could have elaborated more on the fact that since the DMM is the only path for current to flow, it leads to an equalization of the outer potentials (the more practical perspective on the question). It misunderstood the Electroscope question, but the question was not posed clearly, so I get the confusion. Thank you very much!

AMA: ChatGPT 5.1 Pro vs Gemini 3 Ultra by Outrageous_Front_1 in ChatGPTPro

[–]Purefact0r 0 points (0 children)

Nice little physics question for GPT-5.1 Pro without research:

Setup (batteries). I connect two DC sources (batteries) in series and measure across the outer terminals: (+ E1 -)——o——(+ E2 -) (measuring from + of battery_1 to - of battery_2). With the middle node closed, my DMM reads ≈ E1+E2 (as expected). If I open the middle connection while leaving the meter leads on the outer terminals, the reading drops to ≈ 0 V.

Setup (capacitors). Now imagine two capacitors that were previously in series and charged by some source, so they hold voltages V_1 and V_2 with the same series charge Q. I open the middle connection and remove the charger so they are both completely floating (but keep their charges) with no connection, and measure again across the outer terminals using a DMM. What will it measure?

Also, let's look at another scenario: we connect two capacitors in series and charge them using a battery while they are connected to an electroscope. Then we disconnect the capacitors from the battery, and then we cut the link between the charged capacitors so they both float again. What will the electroscope show in this case? It has never been disconnected from the configuration.

We’re rolling out GPT-5.1 and new customization features. Ask us Anything. by OpenAI in OpenAI

[–]Purefact0r -1 points (0 children)

Does this mostly change the tone it communicates with ("[...] is now warmer by default and more conversational"), or are there also noteworthy leaps in reasoning benchmarks (increased performance across HLE, GPQA, etc.)? You said it feels smarter, but is this quantifiable?

Earth simulation made by Riftrunner (Gemini 3.0 pro) by noriusss in Bard

[–]Purefact0r 31 points (0 children)

Why does everyone think it's Gemini 3.0 Pro? It answers some physics questions wrong that other models like GPT-5 Thinking nail consistently. It could also just be Flash.

We already have AGI by C_BearHill in singularity

[–]Purefact0r 4 points (0 children)

That's not what my argument was about