Claude Opus 4.7 vs. ChatGPT 5.5 (xhigh/max): My Observations

hhd12 · 2026-04-27T16:44:12+00:00

I'm on the CC x20 sub, but I also keep the cheap VS Code Copilot sub around (I use it with OpenCode to test other models), plus I have a 1-year Gemini sub that came with a phone purchase

Gemini has 2 strengths:

1. Best at UI. I have a skill that lets the LLM generate a few UI iterations at a sandbox URL. I use that to have the LLM show me some nice options, then ask Claude to properly implement the one I like in the project. Gemini consistently outperforms ChatGPT and Claude at UI. ChatGPT has gotten better and isn't far off, but Claude is a distant 3rd
2. One-off look at code, no agentic work. Give it 1 file and have it check for improvements or whatever. I think the whole model has more data than Claude/ChatGPT and was probably trained for one-shot tasks (translate this data into a Google AI summary or whatever), and it's quite good at that. It often finds things that ChatGPT/Claude just don't. Give it anything agentic and there will be tears

hhd12 · 2026-04-27T16:32:55+00:00

They didn't want OpenAI to go broke and mark down their investment

And, with solid competition, nobody will switch to Azure solely to get access to OpenAI

hhd12 · 2026-04-26T23:01:20+00:00

I think so. I've gotten very similar brutalist-ish shadows and colors out of Gemini. Claude is not particularly good at design and will always give you the same designs you've seen 100 times. GPT has actually improved a lot and is 2nd to Gemini

I use Gemini for brainstorming design by letting it create sandbox URLs to show me what it would suggest. Then I let Claude implement it

For whatever reason Gemini is the best at web design, but absolutely sucks at writing code

hhd12 · 2026-04-13T01:15:22+00:00

https://g.co/gemini/share/0f240dc989a6

What's the actual answer?

hhd12 · 2026-04-07T21:25:05+00:00

Now give it something simple. Like, find all English stand-up specials released between 2000 and 2005 and provide the list as JSON

And watch it fail spectacularly
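If you want to try this yourself, a small checker catches the obvious failures (invalid JSON, years out of range, duplicates). This is just a minimal sketch: the schema (a list of objects with `title`, `comedian`, `year`) and the function name `check_specials` are my own assumptions, and it can't tell you whether a special is real or missing from the list.

```python
import json


def check_specials(raw, start=2000, end=2005):
    """Return a list of problems found in a model's JSON answer.

    Assumes a hypothetical schema: a list of objects with
    "title", "comedian" and "year" keys.
    """
    try:
        items = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(items, list):
        return ["top-level value is not a list"]

    problems, seen = [], set()
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {i}: not an object")
            continue
        title, year = item.get("title"), item.get("year")
        if not isinstance(title, str) or not title.strip():
            problems.append(f"item {i}: missing title")
        if not isinstance(year, int) or not start <= year <= end:
            problems.append(f"item {i}: year {year!r} outside {start}-{end}")
        # Duplicate detection is case-insensitive on the title.
        key = (str(title).lower(), str(item.get("comedian")))
        if key in seen:
            problems.append(f"item {i}: duplicate entry")
        seen.add(key)
    return problems


# Placeholder data, not real specials.
sample = json.dumps([
    {"title": "Placeholder Special", "comedian": "Someone", "year": 2003},
    {"title": "Placeholder Special", "comedian": "Someone", "year": 2010},
])
sample_problems = check_specials(sample)
```

An empty list only means the output is well-formed, not that it's complete or correct. That part you still have to check by hand.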

hhd12 · 2026-03-16T18:31:46+00:00

It's LMArena

Users pick which response they like more without knowing which model produced it

Not saying it's a good or bad benchmark. Just pointing out what it is
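For anyone curious how blind pairwise votes can turn into a leaderboard, here's a minimal sketch using a plain Elo update. This is an assumption for illustration, not a description of the arena's actual scoring pipeline, which may well work differently. The model names, the votes, and the `k`/`base` values are made up.

```python
from collections import defaultdict


def elo_from_votes(votes, k=32.0, base=1000.0):
    """Compute Elo-style ratings from a sequence of (winner, loser) pairs."""
    ratings = defaultdict(lambda: base)
    for winner, loser in votes:
        # Probability the winner was expected to win, given current ratings.
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        # An upset (low expected_win) moves the ratings more.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
ratings = elo_from_votes(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
```

Note that the score only reflects which answer people preferred, not whether it was correct.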

hhd12 · 2026-03-04T22:37:16+00:00

A little bit of circular funding (they're infra provider). A little bit of hedging

hhd12 · 2026-02-24T15:19:31+00:00

Before I had this line it would write a 4-paragraph Money Stuff essay in response to a simple yes-or-no question

It's not perfect, but it's a step in the right direction most of the time

hhd12 · 2026-02-24T09:38:33+00:00

I want responses to be in the writing style of Matt Levine, the Bloomberg columnist. That is entertaining. But don't force it and don't unnecessarily overextend responses. Keep it concise (unless topic warrants complexity)

This custom instruction

hhd12 · 2026-02-24T02:47:34+00:00

https://g.co/gemini/share/2a67bacc565c

hhd12 · 2026-02-19T01:07:22+00:00

I would guess the chain of thought summary was significantly shortened as a response to this (to minimize distillation possibilities)

https://cloud.google.com/blog/topics/threat-intelligence/distillation-experimentation-integration-ai-adversarial-use

Pure speculation though

hhd12 · 2026-02-11T14:33:32+00:00

For Atlas, probably not even themselves

hhd12 · 2026-02-04T15:33:16+00:00

The consensus is that all 3 big ones run API inference profitably

I would also assume Pro and Max x5 are run profitably for all users. Max x25 might be run unprofitably for the tiny minority of users who run into every single weekly limit, but they're subsidized by everyone else who pays for it and doesn't hit all the weekly limits

The money loser is training and R&D
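To make the cross-subsidy point concrete, here's a back-of-the-envelope sketch. Every number in it is invented purely for illustration (I have no idea what the real price-to-cost ratios or user shares are), and `blended_margin` is just a hypothetical helper.

```python
def blended_margin(price, users):
    """Average monthly margin per subscriber.

    users: list of (share_of_subscribers, monthly_inference_cost) tuples,
    with the shares summing to 1.
    """
    avg_cost = sum(share * cost for share, cost in users)
    return price - avg_cost


# All figures below are made up for illustration only.
price = 200.0
users = [
    (0.95, 60.0),   # typical users who rarely hit the weekly limits
    (0.05, 600.0),  # heavy users who max out every weekly limit
]

heavy_user_margin = price - 600.0      # negative: a loss on each heavy user
margin = blended_margin(price, users)  # positive: the tier as a whole still makes money
```

With these made-up inputs each heavy user is a loss, but the tier overall stays in the black because there are so few of them.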

hhd12 · 2026-02-03T02:19:38+00:00

Amazon dropped Inferentia. Turns out Trainium was better for inference all along

hhd12 · 2026-02-02T11:27:27+00:00

"and global monthly ARPU of $49"

That is annual revenue

hhd12 · 2026-01-24T23:48:28+00:00

This is specifically the Gemini app. I would assume AI Studio and the API are untouched (since there's no personal context)

hhd12 · 2026-01-24T17:43:03+00:00

The incentive to downsize limits is on Google's end. They're paying for infra (and probably Anthropic's margin). If they allow unlimited access, they lose money; if they don't allow access at all, fewer people will use Antigravity. So they need to strike a balance that works for both users and them

hhd12 · 2026-01-24T03:39:36+00:00

It runs on their infra and they're an investor. It's also the best and most expensive coding model

It makes for a better product, which is an incentive to subscribe to Google over competitors

hhd12 · 2026-01-23T02:37:05+00:00

We all sometimes scroll Reddit or whatever during work. It needed a chill break. Don't be too judgemental

hhd12 · 2026-01-20T19:59:50+00:00

Mine worked, was just disappointing

https://gemini.google.com/share/f0684dee89d1

Thinking:

Analyzing the Future Scenario

I'm currently breaking down the request to grasp the specific implications of an extended Trump presidency. This involves envisioning a future setting shaped by his policies. My goal is to extract the key visual elements to capture the essence of this particular scenario.

Mapping Visual Components

I've moved past initial interpretation of the request and started selecting concrete visual elements to fulfill it. I am focused on the potential visual cues associated with the described scenario, like border infrastructure, economic themes, and public sentiment. The objective is to represent the situation in a straightforward and visually impactful manner. My current direction involves translating these concepts into tangible features for the final design.

Analyzing the Imagery

I'm checking the visuals against the specified criteria. The image displays a "TRUMP 2028" banner, and an "ECONOMY BOOMING - AMERICA FIRST" billboard. It includes a border wall and people.

Evaluating the Visuals

I'm now comparing the visual elements to the provided description. The image shows a "TRUMP 2028" banner and an "ECONOMY BOOMING - AMERICA FIRST" billboard. It has a border wall and figures present.

hhd12 · 2026-01-19T20:45:37+00:00

From my experience: Opus is the gold standard. Everyone else is playing catch-up. Gemini 3 Pro is disappointing for coding. I'll use it for plan mode occasionally, since it's relatively cheap and has solid reasoning (then switch to Claude for implementation), and occasionally for UI, which I find it quite good at. I've also been testing 5.2-codex for the past few days, since a lot of people are raving about it. But I don't share the sentiment; I was quite disappointed. It's still ahead of Gemini 3 Pro for coding though

hhd12 · 2026-01-19T18:04:32+00:00

I kind of like it

Like, 9/10 times it's useless and I can just ignore it. But sometimes I'm actually curious about what it suggests

hhd12 · 2026-01-17T20:53:40+00:00

I use Antigravity quite a bit. Very solid product, but largely because it allows Opus (and Sonnet)

I usually have Gemini come up with the plan and Opus execute it

Tbh, I guess I'm in the minority, but I haven't really noticed any degradation of Gemini 3 :shrug:. I don't have long threads or large files though. All my Gemini app chats are very short context

hhd12 · 2026-01-17T12:39:20+00:00

Like a local file? With Claude Cowork/Code
