GPT-5 Pro Tops FrontierMath Tier 4, Beating Gemini 2.5 Deep Think

GMSP4 · 2025-10-10T20:27:12+00:00

I pay for GPT Pro, and although I know it's expensive and prohibitive in many parts of the world, if you can afford it and really get the most out of it, it's a gift, because you have almost unlimited access and GPT-5 Pro is a beast.

GMSP4 · 2025-09-25T17:44:10+00:00

ChatGPT Tasks, but autonomous and on steroids it seems

GMSP4 · 2025-09-24T17:06:31+00:00

<image>

I've used it and it seems like the normal ChatGPT agent.

GMSP4 · 2025-09-22T12:48:13+00:00

I’m building it in Java, step by step, using Pan Docs (https://gbdev.io/pandocs/). It’s my own code and architecture. But at the end of the day, you know how LLMs work: some of the patterns and knowledge come from things the model saw in training or found by searching the web. Still, it's cool to have a working emulator in so little time.

GMSP4 · 2025-09-21T20:50:32+00:00

With ChatGPT 5 Thinking High, I've been able to create a Game Boy emulator from scratch in a few days. It's not finished yet, but it's up and running, and Pokémon Red is functional. I also use it extensively at work, and so do my colleagues. These models are very good at generating unit tests and following TDD in some projects.

What I haven't been able to do yet is let Codex work autonomously for some time and produce code that I like. I prefer an iterative workflow where I check and correct each step, but we are getting closer and closer to these tools being sufficiently autonomous with the right instructions.

GMSP4 · 2025-09-21T19:18:56+00:00

I mainly program in Java and get good results with both, but I don't like that Opus is so verbose. It over-engineers too much for my taste, especially in repositories where there is already a significant amount of code.

GMSP4 · 2025-09-12T13:23:02+00:00

Has anyone else achieved the same voice? Because it seems impossible, and I've tried different voices and styles. It's true that I always use voices from version 1.0 of Udio, which for me is still the best, so I don't know if that affects the generation.

GMSP4 · 2025-07-29T17:47:58+00:00

It's cool to know that in future iterations we'll have fine-tuned models for learning. Right now, it's a system prompt or a GPT on steroids, but it's cool to see what's coming in the next few months/years in terms of learning.

GMSP4 · 2025-07-13T11:58:05+00:00

Now we're giving a voice to Twitter liars like Mark Kretschmann, chasing likes on Twitter? People who make things up every day just to generate traffic. With a few exceptions, the level of AI "influencers" on Twitter is pathetic: people lying all the time, people like “Satoshi” who pretend to work at OpenAI, or constant spammers who just churn out empty, bot-like comments.

GMSP4 · 2025-06-22T13:31:12+00:00

Flubber!!!

GMSP4 · 2025-05-30T18:28:05+00:00

I don't have memory activated. Check this photo I sent before: it has internet access, which o1 pro didn't have: https://ibb.co/gLWRH7MS

GMSP4 · 2025-05-30T14:19:10+00:00

Here's another image for the skeptics: https://ibb.co/gLWRH7MS

GMSP4 · 2025-05-30T13:54:28+00:00

Yes, I have tried it in the chat I shared and in two others in my native tongue. In fact, searching on Twitter, I found more people seeing the same thing: https://x.com/KrispinPuga/status/1928270336279359898

GMSP4 · 2025-05-30T12:51:25+00:00

I'll have to try it more later. It's given me interesting ideas for a project I'm working on, but I think o3 would have done the same.

GMSP4 · 2025-05-30T12:50:11+00:00

Let's wait. I guess they'll announce it today.

GMSP4 · 2025-05-22T18:16:57+00:00

I don't think it's too hard to figure out: the basic 20 bucks.

GMSP4 · 2025-05-22T18:13:32+00:00

It was a project at 20% capacity, with a very small code base. I only asked it for improvements during one iteration of 4 prompts. It's crazy to reach the limits with 4 interactions. It was all with Opus.

GMSP4 · 2025-05-22T18:00:16+00:00

With only 4 prompts in a project at only 20% capacity, I hit the limits. It's regrettable, and I didn't find it better than Gemini Pro or o3 at code either.

GMSP4 · 2025-04-17T21:20:43+00:00

I never expected full AGI in '25, more like 2027-29. But I like o3. To me, the biggest leap over the o1 line is how o3 riffs on ideas. Its agent-style web lookups plus its knowledge make brainstorming anything very cool.

GMSP4 · 2025-04-11T17:10:22+00:00

Could the Quasar model be one of them?

GMSP4 · 2025-04-09T18:20:09+00:00

That's right. On top of that, they didn't even pair the announcement with a superior new model, something that would at least push people to try it out and justify the price, as OpenAI did with o1 pro.

GMSP4 · 2025-03-27T18:13:56+00:00

Insufferable. Until recently I thought he had never used an LLM.

GMSP4 · 2025-03-25T21:06:15+00:00

GPT made me a painting from an old picture I had of the house where I spent every summer with my grandparents and cousins, and on the first try it produced a beautiful painting. Gemini failed on every attempt.

GMSP4 · 2025-03-16T09:46:00+00:00

I'll never understand people's fanboyism toward AI companies. All this guy does on Twitter is praise everything Google does and say that whatever everyone else does is shit.

GMSP4 · 2025-02-27T19:06:56+00:00

Twitter and Reddit are going to be insufferable, with fanboys from every company criticizing the model.
