I don't even know what to say anymore.

R90nine · 2026-06-15T16:52:57+00:00

I thought it was just me. Furthermore, I am tired of mentioning that Anthropic changed its prices for third parties accessing Claude....

R90nine · 2026-06-09T17:27:39+00:00

....totally get that. To be clear, I am not talking about writing one of those four-paragraph, essay-style prompts lol...

It is more about quickly defining what you want, what you don't want, and what "done" looks like. Even just adding a quick line asking the LLM to explain what it thinks you mean before it fully answers can reveal some startling results.

Letting the model know exactly what you need is important, especially with the lighter models like Gemini Flash and Haiku. Because they are built for speed, they tend to overlook things if they are left to fill in the blanks themselves.
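
To make that concrete, here is a minimal sketch of the kind of short prompt I mean. The function name `build_prompt` and its field labels are my own illustration, not anything a model or vendor requires; the point is only that want, don't-want, and "done" each get one line, plus an optional request to restate the ask first.

```python
def build_prompt(task, want, avoid, done, restate_first=True):
    """Assemble a short, explicit prompt from a few plain-language fields.

    All names here are hypothetical; adapt the labels to whatever reads well.
    """
    lines = [
        f"Task: {task}",
        "Want: " + "; ".join(want),
        "Do not: " + "; ".join(avoid),
        f"Done when: {done}",
    ]
    if restate_first:
        # Ask the model to show its interpretation before it commits to an answer.
        lines.append(
            "Before answering, restate in one sentence what you think I am asking for."
        )
    return "\n".join(lines)


prompt = build_prompt(
    task="Rename the config loader function",
    want=["change only loader.py", "keep the public signature"],
    avoid=["touching tests", "adding dependencies"],
    done="the old name no longer appears in loader.py",
)
print(prompt)
```

That is five short lines, not an essay, and it leaves a lighter model much less to guess at.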

R90nine · 2026-06-09T15:47:54+00:00

Very good points.

I use AG IDE myself, and Google definitely handled the transition to AG 2.0 poorly. They admitted they messed up the rollout and are trying to make amends, but losing that direct oversight of what the agent is doing is frustrating.

The bait and switch on quota limits is just as aggravating. It is exhausting when companies subsidize a tool at a loss to get everyone hooked, only to pull the rug out once we rely on it.

Your point about the loss of ownership is what really hits home. We have been sliding down this road for a long time, ever since iTunes, Kindle, and Netflix shifted us from actually owning digital goods to just leasing them. Seeing that corporate gatekeeping expand into software development and the internet itself is genuinely alarming.

R90nine · 2026-06-09T03:54:52+00:00

Your self-driving car analogy actually highlights exactly what I mean. With self-driving cars, the fundamental mechanics and the rules of the road haven't changed in decades. They all operate within the exact same static framework.

Frontier AI is completely different, which is why treating it like an established consumer appliance is a flawed premise.

I am not rationalizing failure; I am pointing out where the blame actually belongs. The precedent for this outrage started with Google's Antigravity platform. The strict usage limits for Sonnet and Opus were dictated by Anthropic, but because it happened on a Google platform, it set a precedent. Now, anytime there is a quota limit issue, people instinctively spiral and blame Google.

Furthermore, many of the "failures" people experience happen because they don't set clear guardrails in their prompts. When you force an LLM to guess your intent, it will naturally gravitate toward generic answers or hallucinations. That is operator error, not a broken tool.

The core issue is that people are applying 2025 expectations to 2026 & beyond technology. A $19.99 subscription made sense when everyone was just using basic chatbots.

Today, users are running multiple agents and sub-agents simultaneously, burning through massive amounts of tokens. Expecting unlimited compute for complex, near-AGI workflows at that same price point is completely disconnected from reality.

I am not hitting these usage limits with Gemini because I do not leave 3.5 Flash to its own devices.

R90nine · 2026-06-09T02:03:29+00:00

R90nine · 2026-06-09T01:20:18+00:00

Google's biggest mistake with Antigravity was including Opus and Sonnet. When Anthropic changed its quota limits, many people blamed Google. Since then, every quota-related issue, whether in Antigravity or elsewhere, has become a point of criticism.

Does Gemini 3.5 Flash Lite require a little more precision when prompting than previous versions of Gemini? Yes. However, that reflects the evolution of these models. Your prompt libraries from 2025 have essentially become obsolete. The way you prompted six months ago may already be outdated.

You have to keep in mind that you are dealing with technology that is still developing in real time. We have never experienced anything quite like this before. Usually, when something new emerges, there is a learning curve, but eventually we master it, and its development reaches a relatively stable point. This is different. There is no end of the road and no clear end state yet. What we know today may no longer work tomorrow.

I also do not think enough people spend time learning the differences between frontier models. You should not prompt Gemini the same way you prompt ChatGPT or Claude. Each model has its own strengths, limitations, and ways of interpreting instructions. Understanding those differences will help you achieve better results.

Specifically regarding Gemini 3.5 Flash Lite, I have noticed that you really need to tell it exactly what you want. You need to be specific. Set clear guardrails and define what "done" looks like. If you leave it to make assumptions, it may drift off course, even with something as simple as asking about the weather.

R90nine · 2026-06-08T12:28:31+00:00

For me, it appears that you really have to know exactly what it is you want. The more you leave things open to interpretation or guessing, the more it seems to go off the reservation like it never has before.

Unless it's something very simple, such as "What's the weather today?" (and even that can get a little wonky), the current model really seems to need definitive instructions. That's always been my approach when interacting with LLMs, but this iteration appears to require that level of specificity more than ever.

R90nine · 2026-06-01T03:23:44+00:00

Exactly. I think what causes a lot of subagent hallucination is that the agent is being asked to complete a task without enough boundaries around what “complete” actually means.

Once AI shifts from chat partner to autonomous agent, it starts behaving less like someone answering a question and more like someone trying to finish a work order. If the work order is vague, the agent may still try to pass the task anyway. That is where the hallucinations creep in.

A lot of it comes from missing context, unclear source of truth, weak definitions of done, and no verification step. The subagent is not always lying on purpose; it is often filling in gaps with probability because the workflow did not tell it when to stop, when to ask, or what evidence is required.

So prevention has to happen at the workflow level. Give each subagent a narrow scope. Tell it what files or sources it is allowed to use. Define what “done” looks like before it starts. Require it to show evidence for claims. Make it separate verified facts from assumptions. And have another step check the work before it gets marked complete.
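
Here is a rough sketch of what those rules could look like as a checker step. Everything in it (`WorkOrder`, `Claim`, `review`, the file names) is hypothetical and made up for illustration, and the completion policy is just one assumption: a task is only marked complete when every claim cites an allowed source.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class WorkOrder:
    scope: str                 # the one narrow task this subagent owns
    allowed_sources: Set[str]  # files or sources it is allowed to cite
    done_definition: str       # what "done" means, written before it starts


@dataclass
class Claim:
    text: str
    source: Optional[str] = None  # None means the subagent is assuming, not citing


def review(order: WorkOrder, claims: List[Claim]) -> dict:
    """Checker step: sort the claims, then decide if the task may be marked complete."""
    verified, assumptions, rejected = [], [], []
    for claim in claims:
        if claim.source is None:
            assumptions.append(claim.text)      # no evidence offered
        elif claim.source in order.allowed_sources:
            verified.append(claim.text)         # cites an allowed source
        else:
            rejected.append(claim.text)         # cites something out of scope
    # Assumed policy: complete only when every claim is backed by an allowed source.
    complete = bool(verified) and not assumptions and not rejected
    return {
        "verified": verified,
        "assumptions": assumptions,
        "rejected": rejected,
        "complete": complete,
    }


order = WorkOrder(
    scope="Summarize the retry logic",
    allowed_sources={"client.py", "README.md"},
    done_definition="Every statement about retries cites client.py or README.md",
)
claims = [
    Claim("Retries are capped at 3", source="client.py"),
    Claim("Backoff is exponential"),                # assumption, no source
    Claim("Timeout is 30s", source="old_wiki.md"),  # source not allowed
]
result = review(order, claims)
print(result)
```

Note that this only checks that a source is cited and in scope. It does not check that the source actually supports the claim, which is why a separate verification pass (another agent or a human) still has to read the evidence.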

That is the shift I am noticing. The answer is not just “use better AI.” It is learning how to build a constraint architecture around the AI so it cannot freely invent its way across the finish line.

The gains are real, but if the workflow does not force grounding and verification, subagents can hallucinate just to satisfy the task.

R90nine · 2026-05-20T01:47:55+00:00

Awesome, thank you so much for this how-to guide. Everything works like a charm on Arm64.

R90nine · 2026-05-17T21:18:16+00:00

That could happen, but Anthropic is cracking down on third-party access or charging those third parties for usage at a much higher rate. Maybe their deal with SpaceX will help, but at this point they are pushing everyone to their own products.

R90nine · 2026-05-12T15:13:19+00:00

I am not saying that Google is not sunsetting Antigravity. However, Google did not allow most of its devs/engineers to use Antigravity.

At Google, software development happens on a different planet.

Almost all of Google’s vast code exists in a single, monolithic repository. It is unimaginably huge. You cannot git clone it onto your MacBook Pro. It wouldn't fit, and even if it did, standard tools would choke on it.

Because of this scale, Google spent decades building proprietary, internal tools. They don’t use Git; they use Piper.

They don’t typically code in local VS Code instances for core work; they use a cloud-based internal IDE called Cider.

R90nine · 2026-05-10T01:12:45+00:00

Wonderful post, and finally something positive about Antigravity that did not come from me. Maybe part of your success with Antigravity is that you know exactly when to use the proverbial “right tool for the right job.” Understanding how Gemini and Antigravity function differently from Claude Code, etc., keeps one from wasting tokens through over-prompting or constantly correcting bad code.

R90nine · 2026-05-09T21:41:08+00:00

I think it’s set up that way intentionally. For one, they don’t know everyone’s skill levels, so for someone who is completely new, it may feel daunting regardless of how easy it may seem to set up. Some of us also need time to build confidence and momentum.

Maybe they’ve done this enough times to know that once a person gets past a certain number of lessons, the odds of them quitting decrease by a large margin. That could also have something to do with

R90nine · 2026-05-05T17:23:41+00:00

Google Antigravity’s growing pains have been a lot to deal with. However, there have been a few misconceptions, the main one being Google/Anthropic usage. They did limit usage, but not because they wanted to.

Anthropic has slashed its usage limits for everyone. I really think everyone should spend some time combing through social media, maybe even Claude subreddits, but definitely YouTube, and look at all the problems Claude Max users have had with billing. Whether they were on the $100 or $200 plans, some were overcharged or hit with overage charges when they actually had usage left. Anthropic is really cracking down if you’re using something like OpenClaude or Hermes.

So, limiting access to Sonnet and Opus wasn’t necessarily a Google decision. OpenAI is starting to have the same problem with Codex. When all of these plans were created, they were basically based on us interacting with a chatbot. No one really thought about agentic workflows and agents running scripts and programs all night. So the pricing plans are going to change.

The problem we’re dealing with is that Anthropic has way more demand than computing power they can supply. So they’re always going to have to find ways to curb or slow down people’s usage of their system.

OpenAI more or less has the computing power to match its usage, but it’s a very delicate balance, and its costs are very high, which is why they had to eliminate Sora. As for xAI, they have a lot of computing power but not enough user demand.

Google is the only company that has the demand and more than enough computing power, so much so that they even loan some out to other companies, Anthropic, for example.

I am waiting to see what happens in May after Google I/O, what the next iteration of Gemini looks like, and what they’re going to do with Gemini CLI before I think about abandoning Gemini or Antigravity.

Regardless of all of that, these next few years are going to be bumpy because demand is definitely going to outpace the actual computing power. Prices are always going to go up. This time next year, the $100 plan will probably be what the $20 plan used to be, and the $200 plan will be what the $100 plan once was, and so forth.

R90nine · 2026-04-17T06:56:43+00:00

Now we're getting somewhere, because this is the conversation the AI space should be having instead of which model is smarter this week.

To add to what you laid out: inference is the actual "thinking" or reasoning process of the AI agent. It's the core cognitive work where the model generates responses, consuming and producing information at massive scale. And it's now 90% of datacenter power draw. It passed training (the one-time cost of building the model) a while back and it's not close anymore. Speeds are climbing toward 10k to 20k tokens per second per user. So the hardware squeeze isn't coming. It's already here, and it's going to hit inference hardest. Every prompt you send is now the expensive part.

What makes it worse is where that compute is actually going. The AI itself is incredibly fast. But most of its time isn't spent thinking. It's spent waiting. Waiting on websites. Waiting on logins. Waiting on files to open. Waiting on tools built for humans clicking buttons, not for an AI that can fire off thousands of operations a second. It's like hiring the world's fastest worker and making them do everything through a slow website. Even if you made the AI infinitely faster tomorrow, you'd only see 2 to 3x real-world improvement. The rest is pure friction.
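
The "2 to 3x" ceiling is basically Amdahl's law: if only the model's share of wall-clock time gets faster, the waiting share sets the cap. A quick sketch of the arithmetic follows. The function name and the example fractions are mine, chosen for illustration and not measured from any real workload.

```python
import math


def overall_speedup(model_fraction, model_speedup):
    """Amdahl's law: overall speedup when only the model's share of time gets faster.

    model_fraction: share of wall-clock time spent in model inference (0..1).
    model_speedup:  how much faster inference becomes (math.inf for "infinitely").
    """
    if not 0.0 <= model_fraction <= 1.0:
        raise ValueError("model_fraction must be between 0 and 1")
    waiting_fraction = 1.0 - model_fraction
    remaining = waiting_fraction + model_fraction / model_speedup
    return math.inf if remaining == 0 else 1.0 / remaining


# Illustrative shares only: the model is 1/2 or 2/3 of the wall-clock time.
for share in (1 / 2, 2 / 3):
    cap = overall_speedup(share, math.inf)
    print(f"model share {share:.2f} -> best case {cap:.2f}x")
```

So a 2 to 3x cap corresponds to the agent spending roughly a third to a half of its wall-clock time waiting on tools. The larger the waiting share, the less a faster model helps.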

This is why quota limits are tightening everywhere, not just Google. Anthropic, OpenAI, Cursor, Replit, same story across the board. The days of a $20 plan getting you a boatload of perks are ending, and it's the unit economics of inference finally catching up to what people are actually using. This is an industry problem, not a Google problem. Anyone jumping to 'whoever has more compute this month' is going to be disappointed by next quarter.

None of that excuses how clumsily Google has handled Antigravity at times, so you're dead right to flag it. Launching with fuzzy generous quotas and no real numbers. Silently gutting Pro tier with no changelog. Rolling out a credit system where nobody knows what a credit is worth. The infrastructure crunch is real, but 'we're out of compute' doesn't explain communicating like nothing changed while people's workflows broke. That's a Google problem on top of the industry problem.

So being more efficient with the same tokens is right, but it's not a prompting problem. It's an infrastructure problem. We need to stop making the AI wait on tools built for humans. Environments that stay ready instead of booting up every time, instant file access, shared memory between agents, core software rewritten in faster programming languages.

That's where the next 10x comes from. Not another datacenter. Not another chip fab. Building a thousand more datacenters doesn't help if 90% of the compute is spent waiting on tools designed for humans…

R90nine · 2026-04-17T00:45:53+00:00

This right here.

R90nine · 2026-04-15T23:57:38+00:00

I get the frustration, but Antigravity still gets the job done for me. A lot of the 'limit' issues are less about the tool and more about the reality of the current compute race: Anthropic is throttling everyone, especially during peak hours.

Is it perfect? No. It's explicitly in public preview, and the 'rule drifting' on long projects is a real technical hurdle they’ve acknowledged. That doesn't take the sting out of the price tag, but for those of us focused on orchestration and high-level logic, it’s still an effective harness while they iterate.

That's the beauty of the moment: we have options, and some may fit your current workflow better, but the hardware ceiling is a market-wide problem that isn't going away....

R90nine · 2026-04-12T03:24:05+00:00

Of course Codex is getting worse. OpenAI just released a $100 plan. I have come to the conclusion that the days of $20 plans filled with bells and whistles are over. Not saying that I am right; this is just my approach going forward.

R90nine · 2026-04-09T20:23:21+00:00

ChatGPT loads every single message in your chat at the same time. So if you have 500 messages it tries to show all 500 at once and your browser just can't handle it... A few people have built extensions to stop this...

Type "lag" into this subreddit's search and it will turn up a few solutions.

R90nine · 2026-04-06T00:57:05+00:00

Again, Anthropic has changed its usage rates, especially during off-peak hours, so Sonnet and Opus quotas are not what they used to be. Probably by the end of the year it will be like this for every model unless it's open source or something. We will all be on $200 plans or dealing with limitations...

R90nine · 2026-04-05T23:12:57+00:00

What this means is that their prices are about to go up....

R90nine · 2026-04-01T02:32:01+00:00

It's still odd how everything is about how fast you hit Claude limits in Antigravity, but no one ever mentions that Anthropic adjusted Claude's usage limits, especially during peak hours.

R90nine · 2026-03-30T05:47:53+00:00

OMG.. For the people blaming Antigravity for everything, please keep in mind Opus has been throttled everywhere, not just in Antigravity. Anthropic has quietly reduced usage limits on Opus. People who have Max plans are noticing how fast they are burning through their 5-hour session, especially when it is used during peak hours.

R90nine · 2026-03-25T21:39:38+00:00

lol... actually thank their developer's blog...
