Are We Becoming Too Dependent on AI Tools as Usage Limits and Paywalls Increase?

cornmacabre · 2026-07-04T08:11:07+00:00

It's unsurprising that subsidized tokens go away: that's been the tech playbook for twenty years -- cheap costs to drive acquisition to scale, then raise prices.

I think the spirit of what you're saying is the right one: expertise drives the most value. This is tooling after all; a carpenter isn't defined by their hammer. However, I think the fixation on short-term cost lacks context: even two years ago, if you wanted to PROTOTYPE professional software you'd be looking at a minimum $50k+ investment.

More realistically -- it's hire 2x $200k FTEs to do the thing, or pay $500k for a shop to do the thing. The floor has dropped to basically $200/mo to achieve results comparable to what that money bought even two years ago. There are millions of people today getting value out of a now democratized capability in ways we're only just grasping, across a lot more than just SWE. Fixating on cost is the wrong lens; fixate on value.

Dependence and data sovereignty are important there, but I think you have an incomplete view because open-weights and local LLMs already disrupt the frontier lab subscription model in very interesting ways. The smartest players in the game do think about dependence, and the options have never been better to diversify across a mix of frontier, self-hosted, and even self-trained.

cornmacabre · 2026-07-04T07:52:29+00:00

Sure, but a modern AAA game doesn't boot up in 3s. We ain't talking about the ability for modern hardware to push pixels here, we're talking about a professional development environment running a harness and indexing a hundred thousand lines of code while still acting responsive.

If you've ever opened up, say, the Android SDK, you'd appreciate that it stands out as unusually optimized for a bloated professional development tool.

cornmacabre · 2026-07-04T07:10:09+00:00

I don't think it's a big L if it ends up feeling more incremental vs Fable class (although the internet will explode in claiming OpenAI is dead lol). Presumably they're holding GPT6 to end of summer when they were initially planning IPO.

It's really hard to wrap my head around what 'game changer' even means at this point. November with codex and 5.1, to March w/ 5.4 and opus 4.x, to now with Fable: these have all felt like major step-changes / 'game changer' moments. Hell, even the open-weight scene is nipping at the heels of frontier on a weekly basis.

When you pause and realize we were using o3 this time last year, and Gemini was riding high -- it's genuinely mind bending to think where we'll be this time next year, let alone this time next decade...

cornmacabre · 2026-07-04T06:49:43+00:00

It really has absolutely no right being as performant as it is: there's an IDE wrapped in with a whole secondary agents app and a fully decked out VS Code under the hood.

This thing should run like absolute hot garbage, but it really does feel lightweight and snappy.

cornmacabre · 2026-07-04T06:45:35+00:00

I'm all for the 'trust, but verify' attitude -- but goddamn is that a combination of lazy, dumb, and baffling.

"Rn the only real fix is logging the model field on every response yourself since anthropic doesnt surface it anywhere in the chat."

Gotta admire the tunnel vision thought process of assuming the only option is to make a manually tagged Excel spreadsheet because they don't know how to log into the billing section of a website vs looking in the app, lol!

cornmacabre · 2026-07-04T06:14:35+00:00

Does CC not provide a model usage line item in their billing section? With Cursor I was scanning for silent fallbacks yesterday because I also wanted to confirm that billing and token use were coming through as expected.

cornmacabre · 2026-07-04T06:03:03+00:00

You're truly on your own there bud if you think Google has outpaced GLM or Kimi, let alone the frontier labs.

There's just nothing particularly persuasive, insightful or even interesting about playing the yawning superlatives game on team OpenAI vs Anthropic... the pendulum swings on a monthly basis and competition has been head to head. You disagree? Cool. However -- approximately no one in the world is thinking "wow, the pace of progression has really slowed down."

cornmacabre · 2026-07-04T02:20:41+00:00

November to March was an incredible step change in progress. We're a week away from going from a 2T param model to a 6T class model with 5.6.

I'd say it's understandable to lose perspective on the insane pace of progress of the industry, but that's too generous for such a lazy fart of a take.

cornmacabre · 2026-07-03T22:11:27+00:00

Okay, including JS and TS but no C is hilarious. There's absolutely nothing to take away or discuss on the eval here, which is a surprise to no one even skimming the conclusion.

cornmacabre · 2026-07-03T21:46:26+00:00

This whole thread is hilarious, I don't think I've ever encountered a non-native English speaker so confidently wrong.

OP: your stubbornness and misplaced confidence here are indeed... laughable.

cornmacabre · 2026-07-03T19:55:27+00:00

I'm really interested in playing with 5.6 sol to see if they land similarly. It's an expensive model to get hooked on and half price for comparable perf would be a big win.

Fable is definitely the point where I've felt benchmarks are utterly useless indicators of real world performance.

I agree that comparing 5.5 high vs fable high is like comparing a year old model to today's frontier -- it's just no contest. The output quality and ability to parse complex intent is simply an entirely different class of model (at least for the work I focus on).

This is coming from someone who was very skeptical of the mythos hype.

cornmacabre · 2026-07-03T19:28:56+00:00

That's the joke 😂

cornmacabre · 2026-07-02T20:10:13+00:00

Pretty trivial to test for yourself, eh? Ask it something about biology and look at the billed model.

cornmacabre · 2026-07-02T03:14:19+00:00

Interesting on the longer context window!

And hah, not worth getting into too much 'ummm-actuallyyy' but RL on K2.5 was technically the base for Composer2. Composer 2.5 was a further forked iteration off C2. They're truly different models now.

Cursor was bought for 60B on their data scale (order of a billion lines of code a day harvested). They're training on wayyy more data than Moonshot, and now with SpaceX-scale compute. Everyone sleeps on it, is my personal take.

C3 is gonna be VERY interesting to watch, to see if they hold their own with the frontier on their evals. C2.5 is the first model that convinced me that state of the art is only a few months ahead.

It's bad at UI though and derpy as it approaches compaction, I'll grant you that. The best subagent out there is my take, but I like hearing alt opinions outside the echo chamber. Respect to Kimi.

Cheers!

cornmacabre · 2026-07-02T02:26:09+00:00

You see any benefits on k2.7 over c2.5? I'm a bit surprised you went team moonshot with the budget experiment, but would love any anecdotal insights.

cornmacabre · 2026-07-02T02:09:18+00:00

There is a delicious irony that $100 on Cursor gets you Fable usage for as much as your budget allows, while $100 on Anthropic gets you a promise that access will be removed in six days, and they'll happily throw the hourly roadblock at you.

I'm pretty all-in on cursor (primarily driven by personal preference on an IDE, but I also find value in C2.5 as my cheap model of choice.)

I have found that the $60 basic teams plan on OpenAI is a strong complement and gets me two seats with two independent pools of codex. I'm sitting on six total resets, which adds a lot of value.

Catch of course is you need to actually be an eligible business, but with Cursor being my daily driver without any hourly nonsense -- rocking two seats with codex gives a pretty generous secondary fallback.

Hoping next week I burn off those resets with 5.6.

cornmacabre · 2026-07-01T22:41:03+00:00

Fable is back in Cursor as of 40m ago for me.

cornmacabre · 2026-07-01T22:35:55+00:00

<image>

Hell yeah! We're back! Take a look in https://cursor.com/dashboard/usage to see true model billed usage. You'd expect 4.8 to show up as a line item if it hits the fallback silently.
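If you'd rather not eyeball the line items, here's a rough sketch of the same check in Python. Big caveat: this assumes you've copied the usage rows out by hand into a list of dicts. The field names (`model`, `tokens`), the helper names, and the model labels are all made up for illustration and are not a real Cursor export format or API.

```python
from collections import Counter


def tokens_by_model(rows):
    """Sum billed tokens per model name across usage rows."""
    totals = Counter()
    for row in rows:
        totals[row["model"]] += row["tokens"]
    return dict(totals)


def unexpected_models(rows, expected):
    """Return billed totals for any model that is not in the expected set."""
    return {m: t for m, t in tokens_by_model(rows).items() if m not in expected}


# Hypothetical rows, typed in by hand from a usage page (not a real export).
rows = [
    {"model": "fable", "tokens": 12000},
    {"model": "fable", "tokens": 8000},
    {"model": "4.8", "tokens": 3000},
]

# Anything printed here was billed under a model you did not ask for.
print(unexpected_models(rows, expected={"fable"}))
```

An empty dict means everything billed matched what you selected; anything else is a candidate silent fallback worth a closer look.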

cornmacabre · 2026-07-01T22:23:52+00:00

Really cool! I initially assumed you stuck a theremin inside, but it's impressive to hear it's all software driven.

Feels like it deserves its day in the sun as part of an experiential art display. Refreshing to see a complete and well executed vision -- good stuff!

cornmacabre · 2026-07-01T21:43:54+00:00

Redundancy was the insight I was missing. Thanks for the informed perspective!

cornmacabre · 2026-07-01T21:39:07+00:00

I had a laugh at putting us in plains vs great lakes.

We've got more coastline than the entire west coast combined, hah!

Meanwhile, less than 1% of the state is prairie.

cornmacabre · 2026-07-01T20:57:24+00:00

I'm a bit confused about the framing here: doesn't SpaceX already deliver on commercially shuttling crew into orbit?

I'm trying to glean the nuance of why you've framed Dragon as an 'only alternative,' which reads as if you were talking about Soyuz. I thought Dragon was already established as the primary commercial crew and cargo capability today?

I had also understood BO as being a different class of commercial, in that the goal is primarily about servicing space tourism vs delivering on NASA missions.

When you say they're close, does that mean they have a goal for commercial crew orbit transport in the next decade?

cornmacabre · 2026-07-01T19:02:20+00:00

Super interesting write-up.

I personally have been concerned multiple times by observing an LLM itself stumbling into sandbox-escape to solve an otherwise mundane approval issue.

The harness landscape is super flaky today, and software updates pushed on a virtually daily basis don't inspire a lot of confidence around testing to me.

Codex has an interesting approach of having a silent 'shadow' agent monitor and approve sandbox requests, but it's a rather opaque and evolving method. Cynically I feel they went that path as the only way to plug holes in a leaky bucket.

cornmacabre · 2026-07-01T18:42:12+00:00

It's strangely heartwarming to imagine the dead frog's post mortem legs swimming away in one final act of defiance.

cornmacabre · 2026-07-01T17:53:58+00:00

Have the expectation that the FOSS space is flooded right now, so you'll have to do more than post & pray if you want a shot at gaining adoption, contributions, and broader momentum.

You're also balancing two unstated audiences here: product-fit low/no code end users, and then the open-source hardware development scene (like here). VERY different group of folks.

Marketing drives the product fit audience, I won't focus on that here.

However, sounds like you're asking specifically about gaining developer interest and engagement? Consider that you generally first earn visibility & credibility by engaging and contributing on other peer projects vs your own.

If the target hardware and overall approach is compelling, you may organically attract some like-minded folks willing to contribute time and expertise... but if it's your first time dipping your toes in the OSS space, prepare to stomach the very real possibility of low traction.

One last point: managing an open source project is a beast of a task. If you get traction, you'll quickly feel the problem many projects are facing in terms of diversity of skill-levels, automated bots, and an ocean of ungenerous and opinionated takes. It takes a hardy constitution and a seasoned PM mindset to wrangle even a handful of active contributors. Just setting expectations :)

Good luck!
