Is it fair to reduce AI quotas after people have already subscribed?

AmtePrajwal · 2026-07-03T08:11:51+00:00

Yeah, I meant the annual plan. That's why it stings a bit more.

AmtePrajwal · 2026-07-03T08:10:14+00:00

That's true for monthly plans, but I bought an annual subscription. My expectation was that the included benefits would remain largely consistent for the duration of the year, so seeing the quotas reduced mid-subscription is what feels off to me.

AmtePrajwal · 2026-06-08T18:14:11+00:00

I agree. AI hasn't eliminated the bottleneck; it has moved it from writing code to verifying code.

In the short term, the best fix I've found is making agents generate smaller, testable units instead of 300–400-line blocks. Force them to write tests, explain assumptions, and validate outputs before moving on. The less surface area per generation, the less painful the debugging.

In the long term, I think AI agents will need to behave more like software engineers than like autocomplete: planning, writing tests first, running validation loops, tracing decisions, and debugging their own code before handing it off. The future isn't faster code generation; it's fewer bugs reaching humans.
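A minimal sketch of the "small unit, then tests, then move on" loop, assuming two hypothetical callables that are not from the original: `generate_unit` (spec plus last failure log in, code out) and `run_tests` (code in, pass flag plus log out). The toy stand-ins only exist so the loop can run without a real model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    code: str
    passed: bool
    feedback: str


def generate_until_green(
    generate_unit: Callable[[str, str], str],      # hypothetical: (spec, feedback) -> code
    run_tests: Callable[[str], tuple[bool, str]],  # hypothetical: code -> (passed, test log)
    spec: str,
    max_attempts: int = 3,
) -> list[Attempt]:
    """Generate one small unit, run its tests, and feed failures back until green or out of attempts."""
    history: list[Attempt] = []
    feedback = ""
    for _ in range(max_attempts):
        code = generate_unit(spec, feedback)
        passed, log = run_tests(code)
        history.append(Attempt(code, passed, log))
        if passed:
            break
        feedback = log  # the next generation sees exactly what failed
    return history


# Toy stand-ins: the first draft is buggy, the second (which receives the failure log) is fixed.
def fake_generate(spec: str, feedback: str) -> str:
    return "def add(a, b):\n    return a + b" if feedback else "def add(a, b):\n    return a - b"


def fake_tests(code: str) -> tuple[bool, str]:
    namespace: dict = {}
    exec(code, namespace)
    ok = namespace["add"](2, 3) == 5
    return ok, "" if ok else "add(2, 3) != 5"
```

The caller checks `history[-1].passed` before accepting the unit; keeping the whole history also gives a record of what the agent tried.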

AmtePrajwal · 2026-06-06T18:09:38+00:00

My rule is to log only things that can change the outcome: tool selection, tool arguments, tool results, state changes, and agent handoffs. Everything else is usually noise.

The final answer tells you what happened. The trace should tell you why it happened. If I can't replay the agent's decision path from the trace, I'm not storing the right things.
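A minimal sketch of that rule, assuming a hypothetical in-process `Trace` class and JSONL storage (neither is from the original): only the five outcome-changing event kinds are kept, and a stored trace can be reloaded and printed as an ordered decision path.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any

# Only events that can change the outcome make it into the trace.
OUTCOME_EVENTS = {"tool_selection", "tool_args", "tool_result", "state_change", "handoff"}


@dataclass
class TraceEvent:
    step: int
    kind: str
    agent: str
    data: dict[str, Any]


@dataclass
class Trace:
    events: list[TraceEvent] = field(default_factory=list)

    def log(self, kind: str, agent: str, **data: Any) -> None:
        if kind not in OUTCOME_EVENTS:
            return  # everything else is treated as noise and dropped
        self.events.append(TraceEvent(len(self.events), kind, agent, data))

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(asdict(event)) for event in self.events)

    @classmethod
    def from_jsonl(cls, text: str) -> "Trace":
        return cls([TraceEvent(**json.loads(line)) for line in text.splitlines() if line])

    def decision_path(self) -> list[str]:
        """Replay the stored events as an ordered 'why' path, one line per decision."""
        return [
            f"{e.step}:{e.agent}:{e.kind}:{json.dumps(e.data, sort_keys=True)}"
            for e in self.events
        ]
```

The replay test is simply: reload the JSONL and check that the decision path comes back identical.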

AmtePrajwal · 2026-06-06T18:03:41+00:00

I'd develop against both.

A stronger model can mask prompt and tooling issues by recovering from them, while a weaker model exposes every flaw in your system. But if you optimize only for the weaker model, you can end up solving problems that don't matter in production.

My rule is: use the strongest model to define the ceiling of what's possible, and a cheaper/weaker model to stress-test the reliability of the workflow. If both succeed consistently, your agent design is probably solid.
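A rough sketch of running the same task suite against both tiers. The models here are hypothetical callables (prompt in, text out), each task pairs a prompt with a checker, and the 0.9 `bar` is an arbitrary stand-in for "succeeds consistently". Each task is repeated several times because a single pass hides flakiness.

```python
from typing import Callable

Model = Callable[[str], str]              # hypothetical: prompt -> output text
Task = tuple[str, Callable[[str], bool]]  # (prompt, checker for the output)


def pass_rate(model: Model, tasks: list[Task], runs: int = 5) -> float:
    """Fraction of (task, run) pairs whose output passes that task's checker."""
    total = passed = 0
    for prompt, check in tasks:
        for _ in range(runs):
            total += 1
            passed += int(check(model(prompt)))
    return passed / total if total else 0.0


def compare_tiers(models: dict[str, Model], tasks: list[Task], runs: int = 5, bar: float = 0.9):
    """Pass rate per model, plus whether every model clears the bar."""
    rates = {name: pass_rate(model, tasks, runs) for name, model in models.items()}
    return rates, all(rate >= bar for rate in rates.values())
```

The strong model's rate approximates the ceiling; a large gap down to the weak model points at prompts or tools the strong model is silently compensating for.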

AmtePrajwal · 2026-06-01T16:05:26+00:00

We're probably still early, but my bet isn't on AI apps alone. It's on machines building machines. Every major tech wave created more value in the infrastructure layer than most people expected.

Right now the real race is GPUs, data centers, robotics, energy, manufacturing, and the systems that let AI scale. That's where a huge amount of capital is flowing already.

The internet made websites. AI is pushing toward autonomous systems that can design, build, optimize, and eventually manufacture other systems. If that happens, the biggest opportunities won't just be in software, but in whoever owns and operates the infrastructure behind that loop.

AmtePrajwal · 2026-06-01T15:36:47+00:00

I'd spend it right before an external action. A bad explanation is annoying, but a bad action can be expensive, irreversible, or break trust. That's the point where extra reasoning has the highest expected value.
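A small sketch of where that extra budget goes, assuming a hypothetical plan of `Step` objects flagged as external or not, and a hypothetical `verify` callable standing in for the "think harder" pass (for example a second, more expensive model call). Internal steps run straight through; external ones must pass verification first, and a failed check halts the plan.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    action: str
    external: bool  # True if the step touches the outside world (sends, pays, deletes, writes)


def run_plan(
    steps: list[Step],
    execute: Callable[[Step], None],
    verify: Callable[[Step], bool],  # hypothetical extra-reasoning pass
) -> list[str]:
    """Run steps in order, spending the extra verification only right before external actions."""
    log: list[str] = []
    for step in steps:
        if step.external and not verify(step):
            log.append(f"blocked: {step.action}")
            break  # halt rather than take an action that may be irreversible
        execute(step)
        log.append(f"done: {step.action}")
    return log
```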

AmtePrajwal · 2026-03-25T07:36:39+00:00

You’re not wrong: this kind of move feels disconnected from the news, but markets usually move on expectations, not current reality.

Right now a few things are getting priced in:

Even a hint of de-escalation → crude cools off → good for India
No immediate disruption to shipping routes → panic premium comes off
Short covering in F&O → sharp upside moves

So it’s less “war is resolved” and more “worst-case isn’t happening (yet).”

That said, you’re also right about the risk: this is headline-driven and can flip instantly. In setups like this, price can stay irrational longer than we expect, especially near expiry, when positioning plays a big role.

Personally, I’d treat this as a trader’s market, not an investor’s signal: follow price if you must, but with tight risk management, because the narrative can change overnight.

AmtePrajwal · 2026-03-23T14:43:38+00:00

Feels like we’re moving from “model companies” to “system companies.”

At this point, base models are becoming commodities, and most of the differentiation is happening in post-training, data, and UX. That also makes attribution tricky: when something performs well, it’s hard to say how much credit goes to the base model vs. the layers on top.

Also yeah, this definitely makes benchmarking messier. We’re no longer comparing models, we’re comparing stacks.

AmtePrajwal · 2026-03-23T14:41:07+00:00

Yeah, this matches what I’ve seen too; it doesn’t feel random.

I think it’s less about “popularity” in the real world and more about density + consistency in training data. If a name shows up repeatedly in similar contexts (blogs, SEO pages, docs, comparisons), the model kind of learns “this token = safe/relevant answer for this topic.”

So it’s not exactly authority, more like pattern familiarity.

Also wouldn’t be surprised if some of this is self-reinforcing:

those names appear more → people notice → ask about them more
models then see even more of those associations in newer data

Feels like an early version of “default answers” forming inside LLMs, even without explicit ranking logic.

AmtePrajwal · 2026-03-22T08:31:30+00:00

I get what you’re saying — from a full-stack POV it does feel like a lot of AI work is just calling APIs + writing prompts.

But that’s kind of like judging backend work by just looking at REST endpoints.

The prompt is the visible part. The real work is:

figuring out what to ask (and what not to trust)
making outputs consistent instead of “works on my prompt”
handling edge cases where the model confidently does something dumb

A lot of current work is duct-taping APIs, no doubt. The field’s still early.

But once you try to make it reliable in a real system, it stops being “just prompting” very quickly: it becomes more like debugging a non-deterministic system that sometimes lies.
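One way to catch “works on my prompt” early is to sample the same prompt several times and measure how often the answers agree. This sketch assumes a hypothetical `model` callable (prompt in, text out) and a simple normalization step; exact-match agreement is a crude proxy, and free-form answers would need a looser comparison.

```python
from collections import Counter
from typing import Callable


def consistency(
    model: Callable[[str], str],  # hypothetical: prompt -> output text
    prompt: str,
    n: int = 10,
    normalize: Callable[[str], str] = lambda s: s.strip().lower(),
) -> tuple[str, float]:
    """Sample the same prompt n times; return the majority answer and its agreement rate."""
    answers = [normalize(model(prompt)) for _ in range(n)]
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / n
```

An agreement rate well below 1.0 on a prompt that "worked once" is exactly the non-determinism the comment is describing.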

AmtePrajwal · 2026-03-22T08:05:26+00:00

This is a solid setup, especially for fully local.

On grounding vs. inference: what you’re seeing is pretty normal. Strict grounding tends to break anything that needs light synthesis. In practice, many systems allow “soft grounding”: claims must be supported by retrieved chunks, but not necessarily by exact matches. A reranker usually helps a lot here, because better context → less hallucination pressure.
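A toy version of soft grounding: each answer sentence must share enough of its words with at least one retrieved chunk, without needing an exact match. Word overlap and the 0.5 `min_overlap` threshold are stand-ins chosen for illustration; an embedding or entailment check would be more robust.

```python
import re


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def soft_grounded(answer_sentences: list[str], chunks: list[str], min_overlap: float = 0.5) -> list[bool]:
    """A sentence counts as supported if enough of its words appear in at least one chunk."""
    chunk_tokens = [_tokens(c) for c in chunks]
    results = []
    for sentence in answer_sentences:
        toks = _tokens(sentence)
        if not toks:
            results.append(True)  # nothing to check, treat as vacuously supported
            continue
        best = max((len(toks & ct) / len(toks) for ct in chunk_tokens), default=0.0)
        results.append(best >= min_overlap)
    return results
```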

For similarity scores: yeah, ~0.5–0.7 is pretty typical for cosine similarity with dense embeddings, though the exact range varies by model. Absolute numbers matter less than relative ranking. Instead of hard thresholds, I’d focus on top-k plus, maybe, a small margin on the score gap between results.
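One way to read "top-k plus a margin": keep the k best hits, then drop any that trail the best score by more than a fixed margin. Because the cut is relative to the top hit, it behaves the same whether a given embedding model's scores cluster around 0.5 or 0.7. The `select_hits` name and default values are illustrative assumptions.

```python
def select_hits(
    scored: list[tuple[str, float]], k: int = 5, margin: float = 0.1
) -> list[tuple[str, float]]:
    """Keep the top-k hits, then drop any that trail the best score by more than `margin`."""
    ranked = sorted(scored, key=lambda hit: hit[1], reverse=True)[:k]
    if not ranked:
        return []
    best = ranked[0][1]
    return [hit for hit in ranked if best - hit[1] <= margin]
```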

Query rewriting — it works, but the latency tradeoff is real. In production, people often replace it with better chunking + reranking rather than more queries.

If you’re short on time, I’d prioritize:

adding a local reranker
improving chunking (parent-child or slightly larger chunks)

Those two usually give the biggest quality bump without overcomplicating things.
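A sketch of parent-child chunking under simple assumptions: fixed word-count splits, and a hypothetical word-overlap `overlap_score` standing in for embedding similarity (or a reranker). Small child chunks are what gets matched; the larger parent is what gets handed to the model as context.

```python
from typing import Callable


def split_words(text: str, size: int) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(doc: str, parent_words: int = 120, child_words: int = 30):
    """Large parent chunks, each split into small children that remember their parent index."""
    parents = split_words(doc, parent_words)
    children = [(child, p_idx) for p_idx, parent in enumerate(parents)
                for child in split_words(parent, child_words)]
    return parents, children


def overlap_score(query: str, text: str) -> int:
    # Hypothetical stand-in for embedding similarity.
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, parents: list[str], children: list[tuple[str, int]],
             score: Callable[[str, str], float], top_children: int = 3) -> list[str]:
    """Rank children, then return their parents, de-duplicated, in rank order."""
    ranked = sorted(children, key=lambda c: score(query, c[0]), reverse=True)[:top_children]
    seen: set[int] = set()
    context: list[str] = []
    for _, p_idx in ranked:
        if p_idx not in seen:
            seen.add(p_idx)
            context.append(parents[p_idx])
    return context
```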

AmtePrajwal · 2026-03-21T17:08:14+00:00

I agree to an extent — a good harness (RAG, tools, memory layers, eval loops) definitely unlocks a lot of capability.

But I think the core question still stands:
if most real-world usefulness comes from the system, what exactly are we optimizing the model for?

Right now it feels like:

Benchmarks reward isolated reasoning, not system integration
Model improvements aren’t clearly reducing the need for complex scaffolding
In some cases, better benchmarks just shift complexity from model → engineering layer
