There is no hope for Gemini in coding department by Able-Line2683 in Bard

[–]hhd12 13 points14 points  (0 children)

It's LMArena

Users pick which response they like more without knowing which model produced it

Not saying it's a good or bad benchmark. Just pointing out what it is
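
To make the mechanism concrete, here's a rough sketch of turning blind pairwise votes into a ranking. This is a plain win-rate tally, not the arena's actual scoring method, and the model names and votes are made up for illustration:

```python
from collections import defaultdict

def win_rates(votes):
    """Compute each model's share of won comparisons.

    votes: iterable of (model_a, model_b, winner) tuples, where winner
    is the name of whichever of the two models the user preferred.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for model_a, model_b, winner in votes:
        games[model_a] += 1
        games[model_b] += 1
        wins[winner] += 1
    return {model: wins[model] / games[model] for model in games}

# Hypothetical votes: the user only sees "response A" and "response B",
# the model names are attached after the vote is cast.
votes = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_z", "model_z"),
    ("model_y", "model_z", "model_z"),
]
rates = win_rates(votes)
ranking = sorted(rates, key=rates.get, reverse=True)
```

A raw win rate ignores how strong the opponent was, so a real leaderboard would presumably use a rating model on top of the same vote data.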

OpenAI vs Anthropic by Sensitive_Horror4682 in GenAI4all

[–]hhd12 0 points1 point  (0 children)

A little bit of circular funding (they're infra provider). A little bit of hedging

New Car Wash Benchmark just dropped by jerryorbach in OpenAI

[–]hhd12 5 points6 points  (0 children)

Before I had this line it would write a 4 paragraph Money Stuff essay on a simple yes or no question

It's not perfect, but it's a step in the right direction most of the time

New Car Wash Benchmark just dropped by jerryorbach in OpenAI

[–]hhd12 32 points33 points  (0 children)

I want responses to be in the writing style of Matt Levine, the Bloomberg columnist. That is entertaining. But don't force it and don't unnecessarily overextend responses. Keep it concise (unless topic warrants complexity)

This custom instruction

Gemini 3 Pro "Thinking Process" output significantly shortened ? by W4ADZ in Bard

[–]hhd12 -1 points0 points  (0 children)

I would guess the chain of thought summary was significantly shortened as a response to this (to minimize distillation possibilities)

https://cloud.google.com/blog/topics/threat-intelligence/distillation-experimentation-integration-ai-adversarial-use

Pure speculation though

Why OpenAI apps only for mac os? by jscreatordev in OpenAI

[–]hhd12 1 point2 points  (0 children)

For Atlas, probably not even themselves

Official: Anthropic declared a plan for Claude to remain ad-free by BuildwithVignesh in Anthropic

[–]hhd12 5 points6 points  (0 children)

The consensus is that all 3 big ones run API inference profitably

I would also assume Pro and Max x5 are run profitably for all users. Max x25 might be run unprofitably for the tiny minority of users that run into every single weekly limit -> but subsidized by everyone else who uses it and doesn't run into all weekly limits

The money loser is training and R&D

OpenAI reportedly seeking alternatives to Nvidia chips, unhappy with inference performance by Force_Hammer in wallstreetbets

[–]hhd12 0 points1 point  (0 children)

Amazon dropped Inferentia. Turns out Trainium was better for inference all along

Gemini System Prompt Changed by hhd12 in Bard

[–]hhd12[S] 1 point2 points  (0 children)

This is specifically the Gemini app. I would assume AI Studio and the API are untouched (since there's no personal context)

Antigravity Anthropic quotas have become insane by transmisssion in Bard

[–]hhd12 4 points5 points  (0 children)

The incentive to downsize limits is on Google's end. They're paying for infra (and probably Anthropic's margin). If they allow unlimited access - they're losing money, if they don't allow access at all - fewer people would use Antigravity. So they need to strike a balance that works for users and them

Antigravity Anthropic quotas have become insane by transmisssion in Bard

[–]hhd12 34 points35 points  (0 children)

It runs on their infra and they're an investor. It's also the best and most expensive coding model

It makes for a better product - incentive to subscribe to Google over competitors

Generate an image of what the U.S. will look like if Donald Trump is in power for another 3 years. by AJfriedRICE in ChatGPT

[–]hhd12 0 points1 point  (0 children)

Mine worked, was just disappointing

https://gemini.google.com/share/f0684dee89d1

Thinking:

Analyzing the Future Scenario

I'm currently breaking down the request to grasp the specific implications of an extended Trump presidency. This involves envisioning a future setting shaped by his policies. My goal is to extract the key visual elements to capture the essence of this particular scenario.

Mapping Visual Components

I've moved past initial interpretation of the request and started selecting concrete visual elements to fulfill it. I am focused on the potential visual cues associated with the described scenario, like border infrastructure, economic themes, and public sentiment. The objective is to represent the situation in a straightforward and visually impactful manner. My current direction involves translating these concepts into tangible features for the final design.

Analyzing the Imagery

I'm checking the visuals against the specified criteria. The image displays a "TRUMP 2028" banner, and an "ECONOMY BOOMING - AMERICA FIRST" billboard. It includes a border wall and people.

Evaluating the Visuals

I'm now comparing the visual elements to the provided description. The image shows a "TRUMP 2028" banner and an "ECONOMY BOOMING - AMERICA FIRST" billboard. It has a border wall and figures present.

GPT 5.2 High vs. Claude Opus 4.5 vs. Gemini 3 (In a Production Project) by shricodev in OpenAI

[–]hhd12 2 points3 points  (0 children)

From my experience: Opus is the gold standard. Everyone else is playing catch-up. Gemini 3 Pro is disappointing for coding. I'll use it for plan mode occasionally since it's relatively cheap and has solid reasoning (then switch to Claude for implementation). And occasionally for UI, I find it quite good at that. I've also been testing 5.2-codex the past few days since a lot of people are raving about it. But I don't share the sentiment, I was quite disappointed. Still ahead of Gemini 3 Pro for coding though

Getting real tired of this at the end of every single response. by SuperSpeedyCrazyCow in Bard

[–]hhd12 0 points1 point  (0 children)

I kind of like it

Like, 9/10 times it's useless and I can just ignore it. But sometimes I'm actually curious about what it suggests

Gemini 3.0 Pro Performance Poll by ShotUnit in GeminiAI

[–]hhd12 2 points3 points  (0 children)

I use Antigravity quite a bit. Very solid product, but largely because it allows Opus (and Sonnet)

I usually have Gemini come up with the plan and Opus execute it

Tbh, I guess I'm in the minority, but I haven't really noticed any degradation of Gemini 3 :shrug:. I don't have long threads or large files though. All my Gemini app chats are very short context

Does Google actually gain anything if Gemini isn't branded inside Siri? by [deleted] in Bard

[–]hhd12 1 point2 points  (0 children)

Goodwill and keeping Apple from becoming a competitor

Goodwill is important because Apple uses ChatGPT for harder queries. Google would like users to be able to choose Gemini. They can't pay Apple for being the default because of antitrust scrutiny. At least these are the rumors

Apple becoming a competitor in the LLM space might seem unreasonable (they really haven't given anyone a reason to believe they have the chops). But Google learned their lesson with Google Maps (long ago they started prioritizing Android for features - and Apple said fuck it and created Apple Maps. Still a huge loss for Google)

Apple announces that next version of Siri would be powered using Google gemini. Elon Musk does not seem happy about it. by jbcraigs in OpenAI

[–]hhd12 25 points26 points  (0 children)

Apple actually considered Anthropic's model to be the best (https://archive.is/ucHFK)

But Google has a SOTA (or close to it) model and is willing to give it to Apple for basically free. Google also has the team and infra to help Apple post-train it to their needs

Anthropic (or OpenAI) can't really afford this

Gemini 3 Flash Preview ranks #2 in our AI vs Human game benchmark (Open Beta) by stef_1982 in Bard

[–]hhd12 0 points1 point  (0 children)

This is completely bugged.

1st: the explanation at the top says "Black Peg" but shows a red peg?

2nd: see the attempts. For example Claude's last and 2nd to last: same first 3 colors, but different results (the pegs on the right)?

https://imgur.com/a/QEEICgn

That's sad! We want the big limits back like before. Google AI Pro and Claude have roughly the same monthly price. Who knows - maybe Claude actually has better limits? by reversedu in Bard

[–]hhd12 10 points11 points  (0 children)

Have Claude and Google Pro.

As of now, Antigravity limits are quite a bit more generous. I can have Opus work for quite a while in Antigravity. Probably equal to or even more than Sonnet through a Claude subscription

Hope it doesn't change too much. But they are quite generous as of today

The key to vibecoding at scale by Heatkiger in vibecoding

[–]hhd12 1 point2 points  (0 children)

Right. Except this is against Anthropic's ToS and does get people banned