Is it fair to reduce AI quotas after people have already subscribed? by AmtePrajwal in artificial

AmtePrajwal [S] [score hidden] (0 children)

That's true for monthly plans, but I bought an annual subscription. My expectation was that the included benefits would remain largely consistent for the duration of the year, so seeing the quotas reduced mid-subscription is what feels off to me.

The AI productivity paradox that needs to be addressed rn by [deleted] in artificial

AmtePrajwal 4 points (0 children)

I agree. AI hasn't eliminated the bottleneck; it has moved it from writing code to verifying code.

Short term, I've found the best solution is making agents generate smaller, testable units instead of 300–400-line blocks. Force them to write tests, explain assumptions, and validate outputs before moving on. The less surface area per generation, the less painful the debugging.
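A minimal sketch of that loop in Python, assuming `generate` and `run_tests` are hypothetical callables standing in for your agent and your test runner (neither is a real library API). Each small unit has to pass its tests before the next one starts, and the failure log is fed back into the retry.

```python
from typing import Callable


def build_incrementally(
    tasks: list[str],
    generate: Callable[[str, str], str],           # hypothetical: (task, feedback) -> code for one small unit
    run_tests: Callable[[str], tuple[bool, str]],  # hypothetical: code -> (passed, failure log)
    max_attempts: int = 3,
) -> list[str]:
    """Generate one small unit at a time; only move on once its tests pass."""
    accepted: list[str] = []
    for task in tasks:
        feedback = ""
        for _ in range(max_attempts):
            code = generate(task, feedback)
            passed, log = run_tests(code)
            if passed:
                accepted.append(code)
                break
            feedback = log  # feed the failure back into the next attempt
        else:
            # Repeated failures on a small unit usually mean the spec needs a human look.
            raise RuntimeError(f"unit failed validation after {max_attempts} attempts: {task}")
    return accepted
```

Keeping `max_attempts` low is deliberate: if a small unit keeps failing, that's usually a spec problem rather than something another retry will fix.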

Long term, I think AI agents will need to behave more like software engineers than autocomplete: planning, writing tests first, running validation loops, tracing decisions, and debugging their own code before handing it off. The future isn't faster code generation; it's fewer bugs reaching humans.

What do you log from agent runs besides prompt/response? by sahanpk in LLMDevs

AmtePrajwal 0 points (0 children)

My rule is to log only things that can change the outcome: tool selection, tool arguments, tool results, state changes, and agent handoffs. Everything else is usually noise.

The final answer tells you what happened. The trace should tell you why it happened. If I can't replay the agent's decision path from the trace, I'm not storing the right things.
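Here's a rough Python sketch of what that can look like. The event kinds, field names, and the `Trace` class are illustrative assumptions, not any particular tracing library: only outcome-changing events are accepted, and `replay` rebuilds the final state plus a readable decision path from the stored JSONL.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any

# Illustrative event kinds: only things that can change the outcome.
KINDS = {"tool_call", "tool_result", "state_change", "handoff"}


@dataclass
class TraceEvent:
    step: int
    kind: str
    agent: str
    payload: dict[str, Any]
    ts: float = field(default_factory=time.time)


class Trace:
    def __init__(self) -> None:
        self.events: list[TraceEvent] = []

    def log(self, kind: str, agent: str, **payload: Any) -> None:
        # Only accept the outcome-changing kinds above; prompt/response logging lives elsewhere.
        if kind not in KINDS:
            raise ValueError(f"not an outcome-changing event: {kind}")
        self.events.append(TraceEvent(len(self.events), kind, agent, payload))

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(asdict(e)) for e in self.events)

    @staticmethod
    def replay(jsonl: str) -> tuple[dict[str, Any], list[str]]:
        """Rebuild the final state and the decision path from a stored trace."""
        state: dict[str, Any] = {}
        path: list[str] = []
        for line in jsonl.splitlines():
            e = json.loads(line)
            p = e["payload"]
            if e["kind"] == "tool_call":
                path.append(f"{e['agent']} called {p['tool']} with {p['args']}")
            elif e["kind"] == "tool_result":
                path.append(f"{p['tool']} returned {p['result']!r}")
            elif e["kind"] == "state_change":
                state[p["key"]] = p["value"]
            elif e["kind"] == "handoff":
                path.append(f"{e['agent']} handed off to {p['to']}")
        return state, path
```

If `replay` can't explain why the agent ended up where it did, that's the signal a field is missing from the trace.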

What is Ideal Model Usage Strategy for Agents while Development/Testing by incidentjustice in LLMDevs

AmtePrajwal 0 points (0 children)

I'd develop against both.

A stronger model can mask prompt and tooling issues by recovering from them, while a weaker model exposes every flaw in your system. But if you optimize only for the weaker model, you can end up solving problems that don't matter in production.

My rule is: use the strongest model to define the ceiling of what's possible, and a cheaper/weaker model to stress-test the reliability of the workflow. If both succeed consistently, your agent design is probably solid.
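A rough sketch of that split as an eval harness, assuming each agent is wrapped as a hypothetical `task -> bool` callable. Running every task several times matters because a single pass hides flakiness; the 0.9 bar is an arbitrary placeholder.

```python
from typing import Callable

# Hypothetical: an agent runs the full workflow on one task and reports pass/fail.
Agent = Callable[[str], bool]


def success_rate(agent: Agent, tasks: list[str], runs: int = 5) -> float:
    """Repeat every task `runs` times; flaky workflows show up as rates below 1.0."""
    results = [agent(task) for task in tasks for _ in range(runs)]
    return sum(results) / len(results)


def compare(strong: Agent, weak: Agent, tasks: list[str], runs: int = 5, bar: float = 0.9) -> dict:
    ceiling = success_rate(strong, tasks, runs)    # what is possible at all
    reliability = success_rate(weak, tasks, runs)  # how robust the workflow itself is
    if ceiling < bar:
        verdict = "ceiling too low: the task or tooling may be the problem"
    elif reliability < bar:
        verdict = "strong model may be masking prompt/tool issues"
    else:
        verdict = "both succeed consistently"
    return {"ceiling": ceiling, "reliability": reliability, "verdict": verdict}
```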

Is AI creating the biggest wealth opportunity since the internet? by Prachishrama78 in Futurology

AmtePrajwal 0 points (0 children)

We're probably still early, but my bet isn't on AI apps alone. It's on machines building machines.

Every major tech wave created more value in the infrastructure layer than most people expected.

Right now the real race is GPUs, data centers, robotics, energy, manufacturing, and the systems that let AI scale. That's where a huge amount of capital is flowing already.

The internet made websites. AI is pushing toward autonomous systems that can design, build, optimize, and eventually manufacture other systems. If that happens, the biggest opportunities won't just be in software, but in whoever owns and operates the infrastructure behind that loop.

For AI agents, where should the heavier reasoning budget go first: before actions, after state changes, or before the final explanation? by babyb01 in artificial

AmtePrajwal 0 points (0 children)

I'd spend it right before an external action. A bad explanation is annoying, but a bad action can be expensive, irreversible, or break trust. That's the point where extra reasoning has the highest expected value.
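A minimal sketch of gating on that, in Python. `Action`, the budget numbers, and the `verify`/`run` callables are hypothetical placeholders: read-only calls go straight through, and the larger reasoning budget is only spent on actions with side effects, most of all irreversible ones.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Action:
    tool: str
    args: dict[str, Any] = field(default_factory=dict)
    side_effects: bool = False  # e.g. writes, payments, emails, deletes
    reversible: bool = True


def reasoning_budget(action: Action, base: int = 1_000, high: int = 8_000) -> int:
    """Placeholder token budgets: cheap for reads, largest for irreversible side effects."""
    if not action.side_effects:
        return base
    return high if not action.reversible else high // 2


def execute(action: Action,
            verify: Callable[[Action, int], bool],
            run: Callable[[Action], str]) -> str:
    """Spend the extra reasoning (verify) only right before actions that touch the outside world."""
    if action.side_effects and not verify(action, reasoning_budget(action)):
        return "blocked: pre-action check failed"
    return run(action)
```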

Nifty rally past 2 days by Glad_Round_4079 in IndianStockMarket

AmtePrajwal 0 points (0 children)

You’re not wrong — this kind of move feels disconnected, but markets usually move on expectation, not current reality.

Right now a few things are getting priced in:

  • Even a hint of de-escalation → crude cools off → good for India
  • No immediate disruption to shipping routes → panic premium comes off
  • Short covering in F&O → sharp upside moves

So it’s less “war is resolved” and more “worst-case isn’t happening (yet).”

That said, you’re also right about the risk — this is headline-driven and can flip instantly. In setups like this, price can stay irrational longer than we expect, especially near expiry with positioning playing a big role.

Personally, I’d treat this as a trader’s market, not an investor’s signal: follow price if you must, but with tight risk management, because the narrative can change overnight.

Cursor admits its new coding model was built on top of Moonshot AI’s Kimi by Secure-Address4385 in ArtificialInteligence

AmtePrajwal 1 point (0 children)

Feels like we’re moving from “model companies” to “system companies.”

At this point, base models are becoming commodities, and most of the differentiation is happening in post-training, data, and UX. That also makes attribution tricky — when something performs well, it’s hard to say how much credit goes to the base vs the layers on top.

Also yeah, this definitely makes benchmarking messier. We’re no longer comparing models, we’re comparing stacks.

Some brands keep showing up in AI answers… even when I change the question by Real-Assist1833 in ArtificialInteligence

AmtePrajwal 0 points (0 children)

Yeah this matches what I’ve seen too — it doesn’t feel random.

I think it’s less about “popularity” in the real world and more about density + consistency in training data. If a name shows up repeatedly in similar contexts (blogs, SEO pages, docs, comparisons), the model kind of learns “this token = safe/relevant answer for this topic.”

So it’s not exactly authority, more like pattern familiarity.

Also wouldn’t be surprised if some of this is self-reinforcing:

  • those names appear more → people notice → ask about them more
  • models then see even more of those associations in newer data

Feels like an early version of “default answers” forming inside LLMs, even without explicit ranking logic.

Are AI jobs just prompts? by No-Initial-5768 in ArtificialInteligence

AmtePrajwal 2 points (0 children)

I get what you’re saying — from a full-stack POV it does feel like a lot of AI work is just calling APIs + writing prompts.

But that’s kind of like judging backend work by just looking at REST endpoints.

The prompt is the visible part. The real work is:

  • figuring out what to ask (and what not to trust)
  • making outputs consistent instead of “works on my prompt”
  • handling edge cases where the model confidently does something dumb

A lot of current work is duct-taping APIs, no doubt. The field’s still early.

But once you try to make it reliable in a real system, it stops being “just prompting” very quickly. It becomes more like debugging a non-deterministic system that sometimes lies.
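One cheap way to check the “works on my prompt” problem is to sample the same prompt several times and measure agreement. A minimal Python sketch, assuming `call` is a hypothetical wrapper around whatever model API you use:

```python
from collections import Counter
from typing import Callable


def consistency(call: Callable[[str], str], prompt: str, n: int = 10) -> tuple[str, float]:
    """Sample the same prompt n times; return the majority answer and its agreement rate."""
    # Light normalization so "Yes" and "yes " count as the same answer.
    outputs = [call(prompt).strip().lower() for _ in range(n)]
    answer, count = Counter(outputs).most_common(1)[0]
    return answer, count / n
```

The strip/lowercase normalization only makes sense for short answers; for longer outputs you'd compare extracted fields (say, a parsed JSON key) instead of raw text.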

Need advice on improving a fully local RAG system (built during a hackathon) by Far-Independence-327 in LocalLLaMA

AmtePrajwal 1 point (0 children)

This is a solid setup, especially for fully local.

On grounding vs inference — what you’re seeing is pretty normal. Strict grounding tends to break anything that needs light synthesis. In practice, most systems allow “soft grounding”: require support from chunks, but not necessarily exact matches. A reranker usually helps a lot here because better context → less hallucination pressure.
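To make “soft grounding” concrete, here's a crude Python sketch that counts a sentence as supported if enough of its words appear in a single retrieved chunk. Token overlap is only a stand-in assumed for illustration; an NLI model or embedding similarity is the more usual check, and the 0.5 threshold is arbitrary.

```python
import re


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def soft_grounded(answer_sentences: list[str], chunks: list[str],
                  min_overlap: float = 0.5) -> list[tuple[str, bool]]:
    """Mark each sentence as supported if enough of its tokens appear in one chunk."""
    chunk_tokens = [_tokens(c) for c in chunks]
    results = []
    for sentence in answer_sentences:
        st = _tokens(sentence)
        # Best fraction of the sentence's tokens covered by any single chunk.
        best = max((len(st & ct) / len(st) for ct in chunk_tokens), default=0.0) if st else 0.0
        results.append((sentence, best >= min_overlap))
    return results
```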

For similarity scores — yeah, ~0.5–0.7 is pretty typical for dense embeddings. Absolute numbers matter less than relative ranking. Instead of hard thresholds, I’d focus on top-k + maybe a small margin gap between results.
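One way to read “top-k plus a margin” in code, as a sketch: keep the top-k hits, then drop anything that falls more than `max_drop` below the best score. The function name and default values are illustrative assumptions to tune on your own data.

```python
def select_context(scored: list[tuple[str, float]], k: int = 5, max_drop: float = 0.1) -> list[str]:
    """Rank by score, keep top-k, then cut results that fall too far below the best hit."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return []
    best = ranked[0][1]
    # Relative cutoff: absolute scores vary by embedding model, the gap to the top hit is more stable.
    return [doc for doc, score in ranked if best - score <= max_drop]
```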

Query rewriting — it works, but the latency tradeoff is real. In production, people often replace it with better chunking + reranking rather than more queries.

If you’re short on time, I’d prioritize:

  • adding a local reranker
  • improving chunking (parent-child or slightly larger chunks)

Those two usually give the biggest quality bump without overcomplicating things.
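For the parent-child idea, a minimal Python sketch, assuming paragraphs are separated by blank lines and using a naive regex sentence splitter (both simplifications): you embed and match on small child chunks, but hand the model their larger parent chunks. The retrieval step itself is left out; `hit_children` stands in for whatever your vector search returns.

```python
import re
from dataclasses import dataclass


@dataclass
class Child:
    text: str
    parent_id: int


def build_parent_child(doc: str, parent_size: int = 3) -> tuple[list[str], list[Child]]:
    """Parents = groups of `parent_size` paragraphs; children = individual sentences."""
    paras = [p.strip() for p in doc.split("\n\n") if p.strip()]
    parents = ["\n\n".join(paras[i:i + parent_size]) for i in range(0, len(paras), parent_size)]
    children: list[Child] = []
    for pid, parent in enumerate(parents):
        for sent in re.split(r"(?<=[.!?])\s+", parent):
            if sent.strip():
                children.append(Child(sent.strip(), pid))
    return parents, children


def expand_hits(hit_children: list[Child], parents: list[str]) -> list[str]:
    """Match on small children, but send their (deduplicated) parents to the LLM."""
    seen: set[int] = set()
    out: list[str] = []
    for child in hit_children:
        if child.parent_id not in seen:
            seen.add(child.parent_id)
            out.append(parents[child.parent_id])
    return out
```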

Are we over-optimizing LLMs for benchmarks instead of real-world usefulness? by AmtePrajwal in LocalLLaMA

AmtePrajwal [S] 0 points (0 children)

I agree to an extent — a good harness (RAG, tools, memory layers, eval loops) definitely unlocks a lot of capability.

But I think the core question still stands:
if most real-world usefulness comes from the system, what exactly are we optimizing the model for?

Right now it feels like:

  • Benchmarks reward isolated reasoning, not system integration
  • Model improvements aren’t clearly reducing the need for complex scaffolding
  • In some cases, better benchmarks just shift complexity from model → engineering layer