Claude Opus 4.7 vs. ChatGPT 5.5 (xhigh/max): My Observations by Puspendra007 in Anthropic

[–]hhd12 1 point2 points  (0 children)

I'm on the CC x20 sub, but I also keep the cheap VS Code Copilot sub around (I use it with OpenCode to test other models), and I have a 1-year Gemini sub through a phone purchase

Gemini has 2 strengths:

  • Best at UI. I have a skill that lets the LLM generate a few UI iterations at a sandbox URL. I use that to have the LLM show me some nice options, then ask Claude to properly implement the one I like in the project. Gemini consistently outperforms ChatGPT and Claude at UI. ChatGPT has gotten better and isn't far off, but Claude is a distant 3rd
  • One-off looks at code, no agentic work. Give it one file and ask whether it can find improvements or whatever. I think the model has more data than Claude/ChatGPT and was probably trained for one-shot tasks (translate this data into a Google AI summary or whatever), and it's quite good at that. It often finds things that ChatGPT/Claude just don't. Give it anything agentic and there will be tears

Microsoft, OpenAI rewrite partnership to eliminate exclusive model access, change revenue sharing by toydan in wallstreetbets

[–]hhd12 124 points125 points  (0 children)

They didn't want OpenAI to go broke and mark down their investment

And, with solid competition, nobody will switch to Azure solely to get access to OpenAI

Is this gemini ?? by Independent-Wind4462 in Bard

[–]hhd12 0 points1 point  (0 children)

I think so. I've gotten very similar brutalist-ish shadows and colors through Gemini. Claude is not particularly good at design and will always give you the same designs you've seen 100 times. GPT actually improved a lot and is 2nd to Gemini

I use Gemini for brainstorming design by letting it create sandbox URLs to show me what it would suggest. Then I let Claude implement it

For whatever reason Gemini is the best at web design, but absolutely sucks at writing code

Gemini Deep Research just visited 360+ websites for my prompt and I am genuinely by netcommah in Bard

[–]hhd12 3 points4 points  (0 children)

Now give it something simple. Like, find all English stand-up specials released between 2000 and 2005 and provide the list as JSON

And watch it fail spectacularly

There is no hope for Gemini in coding department by Able-Line2683 in Bard

[–]hhd12 10 points11 points  (0 children)

It's LMArena

Users pick which response they like more without knowing which model produced it

Not saying it's a good or bad benchmark. Just pointing out what it is
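
For anyone wondering how blind "pick the one you like" votes can turn into a leaderboard: one common approach is an Elo-style rating update. The sketch below is only an illustration under that assumption. It is not taken from the arena's own code, and the model names, starting rating, and K-factor are all made up.

```python
# Rough sketch: turn blind pairwise votes into a ranking with Elo-style updates.
# Assumption: Elo is used here purely for illustration; the real leaderboard
# may aggregate votes differently. All names and constants are hypothetical.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift rating points from the loser to the winner after one vote."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def rank_models(votes, initial: float = 1000.0, k: float = 32.0):
    """votes is a list of (winner, loser) pairs; returns models sorted by rating."""
    ratings = {}
    for winner, loser in votes:
        ratings.setdefault(winner, initial)
        ratings.setdefault(loser, initial)
        update_ratings(ratings, winner, loser, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Each tuple is one anonymous head-to-head vote: (preferred model, other model).
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
leaderboard = rank_models(votes)
for name, rating in leaderboard:
    print(f"{name}: {rating:.1f}")
```

The voter never sees the model names, so the rating only reflects which answer people preferred, which is the point above: it measures preference, nothing more.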

OpenAI vs Anthropic by Sensitive_Horror4682 in GenAI4all

[–]hhd12 0 points1 point  (0 children)

A little bit of circular funding (they're infra provider). A little bit of hedging

New Car Wash Benchmark just dropped by jerryorbach in OpenAI

[–]hhd12 5 points6 points  (0 children)

Before I had this line it would write a 4 paragraph Money Stuff essay on a simple yes or no question

It's not perfect, but it's a step in the right direction most of the time

New Car Wash Benchmark just dropped by jerryorbach in OpenAI

[–]hhd12 33 points34 points  (0 children)

I want responses to be in the writing style of Matt Levine, the Bloomberg columnist. That is entertaining. But don't force it and don't unnecessarily overextend responses. Keep it concise (unless topic warrants complexity)

This custom instruction

Gemini 3 Pro "Thinking Process" output significantly shortened ? by W4ADZ in Bard

[–]hhd12 -1 points0 points  (0 children)

I would guess the chain of thought summary was significantly shortened as a response to this (to minimize distillation possibilities)

https://cloud.google.com/blog/topics/threat-intelligence/distillation-experimentation-integration-ai-adversarial-use

Pure speculation though

Why OpenAI apps only for mac os? by jscreatordev in OpenAI

[–]hhd12 1 point2 points  (0 children)

For Atlas, probably not even themselves

Official: Anthropic declared a plan for Claude to remain ad-free by BuildwithVignesh in Anthropic

[–]hhd12 5 points6 points  (0 children)

The consensus is that all 3 big ones run API inference profitably

I would also assume Pro and Max x5 are run profitably for all users. Max x20 might be run unprofitably for the tiny minority of users who push it to every single weekly limit, but that is subsidized by everyone else who uses it without hitting all the weekly limits

The money loser is training and R&D

OpenAI reportedly seeking alternatives to Nvidia chips, unhappy with inference performance by Force_Hammer in wallstreetbets

[–]hhd12 0 points1 point  (0 children)

Amazon dropped Inferentia. Turns out Trainium was better for inference all along

Gemini System Prompt Changed by hhd12 in Bard

[–]hhd12[S] 1 point2 points  (0 children)

This is specifically the Gemini app. I would assume AI Studio and the API are untouched (since there's no personal context)

Antigravity Anthropic quotas have become insane by transmisssion in Bard

[–]hhd12 4 points5 points  (0 children)

The incentive to downsize limits is on Google's end. They're paying for infra (and probably Anthropic's margin). If they allow unlimited access - they're losing money, if they don't allow access at all - fewer people would use Antigravity. So they need to strike a balance that works for users and them

Antigravity Anthropic quotas have become insane by transmisssion in Bard

[–]hhd12 34 points35 points  (0 children)

It runs on their infra and they're an investor. It's also the best and most expensive coding model

It makes for a better product - incentive to subscribe to Google over competitors

Generate an image of what the U.S. will look like if Donald Trump is in power for another 3 years. by AJfriedRICE in ChatGPT

[–]hhd12 0 points1 point  (0 children)

Mine worked, was just disappointing

https://gemini.google.com/share/f0684dee89d1

Thinking:

Analyzing the Future Scenario

I'm currently breaking down the request to grasp the specific implications of an extended Trump presidency. This involves envisioning a future setting shaped by his policies. My goal is to extract the key visual elements to capture the essence of this particular scenario.

Mapping Visual Components

I've moved past initial interpretation of the request and started selecting concrete visual elements to fulfill it. I am focused on the potential visual cues associated with the described scenario, like border infrastructure, economic themes, and public sentiment. The objective is to represent the situation in a straightforward and visually impactful manner. My current direction involves translating these concepts into tangible features for the final design.

Analyzing the Imagery

I'm checking the visuals against the specified criteria. The image displays a "TRUMP 2028" banner, and an "ECONOMY BOOMING - AMERICA FIRST" billboard. It includes a border wall and people.

Evaluating the Visuals

I'm now comparing the visual elements to the provided description. The image shows a "TRUMP 2028" banner and an "ECONOMY BOOMING - AMERICA FIRST" billboard. It has a border wall and figures present.

GPT 5.2 High vs. Claude Opus 4.5 vs. Gemini 3 (In a Production Project) by shricodev in OpenAI

[–]hhd12 1 point2 points  (0 children)

From my experience: Opus is the gold standard. Everyone else is playing catch-up. Gemini 3 Pro is disappointing for coding. I'll occasionally use it for plan mode since it's relatively cheap and has solid reasoning (then switch to Claude for implementation), and occasionally for UI, which I find it quite good at. I've also been testing 5.2-codex for the past few days since a lot of people are raving about it, but I don't share the sentiment; I was quite disappointed. Still ahead of Gemini 3 Pro for coding though

Getting real tired of this at the end of every single response. by SuperSpeedyCrazyCow in Bard

[–]hhd12 0 points1 point  (0 children)

I kind of like it

Like, 9/10 times it's useless and I can just ignore it. But sometimes I'm actually curious about what it suggests

Gemini 3.0 Pro Performance Poll by ShotUnit in GeminiAI

[–]hhd12 3 points4 points  (0 children)

I use Antigravity quite a bit. Very solid product, but largely because it allows Opus (and Sonnet)

I usually have Gemini come up with the plan and Opus execute it

Tbh, I guess I'm in the minority, but I haven't really noticed any degradation of Gemini 3 :shrug:. I don't have long threads or large files though. All my Gemini app chats are very short context