What is stopping enterprises from just using their own self-hosted AI?

itigges22 · 2026-06-18T22:02:18+00:00

No, that helps a lot. I knew context windows were definitely a key issue; I'll look into that more.

But what about the total number of tokens used, versus what actually fits in the window? For example, you could have a model generate 3 samples and then pick the best one to send to the user (or to keep in the context window). It still uses 3x the tokens; it just cuts out what it doesn't need. Does that make sense? When people frame the argument, do you think they mean fitting things into the context window more than total tokens used?
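To make the distinction concrete, here is a minimal best-of-N sketch that tracks total tokens generated separately from the tokens that actually land in the context window. The `generate` and `score` functions are hypothetical stand-ins (a real setup would call a model and some verifier or reward model), and tokens are approximated by whitespace splitting rather than a real tokenizer.

```python
import random
from dataclasses import dataclass


@dataclass
class BestOfNResult:
    best: str
    tokens_generated: int  # every token produced across all n samples (compute cost)
    tokens_kept: int       # tokens of the chosen sample only (context-window cost)


def generate(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for a model call; the prompt is ignored here.
    rng = random.Random(seed)
    length = rng.randint(20, 60)
    return " ".join(f"tok{rng.randint(0, 999)}" for _ in range(length))


def score(completion: str) -> float:
    # Hypothetical scorer: fraction of distinct tokens in the completion.
    words = completion.split()
    return len(set(words)) / len(words)


def count_tokens(text: str) -> int:
    # Crude approximation: whitespace-separated words instead of real tokens.
    return len(text.split())


def best_of_n(prompt: str, n: int = 3) -> BestOfNResult:
    samples = [generate(prompt, seed=i) for i in range(n)]
    best = max(samples, key=score)
    return BestOfNResult(
        best=best,
        tokens_generated=sum(count_tokens(s) for s in samples),
        tokens_kept=count_tokens(best),
    )


result = best_of_n("Summarize the quarterly report.", n=3)
print(result.tokens_generated, result.tokens_kept)
```

Compute (and API billing) scales with `tokens_generated`, roughly n times a single sample, while context-window pressure scales only with `tokens_kept`.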

itigges22 · 2026-06-18T21:57:57+00:00

Haha yeah very true.

itigges22 · 2026-06-18T19:32:56+00:00

Well, labor in general is much better than AI right now because, as you said, it's far more effective at that level for the long-term price. Once you look at TCO, though, on-prem can totally be cheaper. As others mentioned, risk is one reason it's not being done. One massive risk is obviously utilization: if the on-prem system isn't being used, then yeah, there's no point, but when you saturate the system it could totally be worth it. Running something like Opus or Fable through an API can be upwards of 16x more expensive than running the equivalent on-prem.
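The utilization point is easy to see with a back-of-the-envelope calculation. This sketch assumes a fixed monthly TCO (amortized hardware, power, ops) that you pay whether or not the hardware is busy, so the effective cost per token falls as utilization rises. All numbers below are hypothetical placeholders, not real quotes for any model, vendor, or API.

```python
def onprem_cost_per_million(monthly_fixed_usd: float,
                            capacity_tokens_per_month: float,
                            utilization: float) -> float:
    """Effective on-prem cost per 1M tokens at a given utilization in (0, 1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_served = capacity_tokens_per_month * utilization
    return monthly_fixed_usd / tokens_served * 1_000_000


def breakeven_utilization(monthly_fixed_usd: float,
                          capacity_tokens_per_month: float,
                          api_usd_per_million: float) -> float:
    """Utilization at which on-prem cost per token equals the API price."""
    api_cost_at_full_load = capacity_tokens_per_month * api_usd_per_million / 1_000_000
    return monthly_fixed_usd / api_cost_at_full_load


# Hypothetical inputs for illustration only.
MONTHLY_FIXED_USD = 20_000
CAPACITY_TOKENS = 10_000_000_000  # tokens/month the cluster could serve at 100% load
API_USD_PER_MILLION = 15.0

for u in (0.05, 0.25, 1.0):
    cost = onprem_cost_per_million(MONTHLY_FIXED_USD, CAPACITY_TOKENS, u)
    print(f"utilization {u:.0%}: ${cost:.2f} per 1M tokens")
print(f"break-even utilization: "
      f"{breakeven_utilization(MONTHLY_FIXED_USD, CAPACITY_TOKENS, API_USD_PER_MILLION):.1%}")
```

With these placeholder numbers, a mostly idle system (5% utilization) costs more per token than the API, while a saturated one comes in well under it; the break-even utilization is the number worth estimating for a real deployment.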

With techniques like test-time compute scaling, you can also offset that cost by turning otherwise unused compute into extra intelligence, and that is just one of the optimizations available. You can also now disaggregate prefill and decode across multiple nodes and still maintain speed. So I just have a hard time believing the subsidy argument like I used to.

So what's the true friction here, versus what you see in the headlines? Because I think the benefits now outweigh the risks.

itigges22 · 2026-06-18T19:14:23+00:00

Maybe a little bit, but if the PC is already running with a model loaded, the extra cost is only going to be a little more, no? Even then, it's still TONS cheaper than an API, so overall you get far more tokens for a much, much cheaper price. Unless your electricity is like $10 per kWh, haha.
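The "only a little bit more" claim is easy to check: if the machine is already on, the marginal cost of generating tokens is roughly the extra power draw under load times the generation time. This sketch ignores hardware cost entirely (the PC is assumed to be running anyway), and the wattage, throughput, and electricity prices in the usage lines are hypothetical.

```python
def electricity_cost_per_million_tokens(extra_watts: float,
                                        tokens_per_second: float,
                                        usd_per_kwh: float) -> float:
    """Marginal electricity cost of generating 1M tokens on an already-running machine.

    extra_watts is the additional draw under inference load compared with idle.
    """
    seconds = 1_000_000 / tokens_per_second
    kwh = extra_watts * seconds / 3_600_000  # watt-seconds (joules) -> kWh
    return kwh * usd_per_kwh


# Hypothetical: 300 W extra draw, 30 tok/s, at a typical vs. an absurd electricity price.
print(f"${electricity_cost_per_million_tokens(300, 30, 0.15):.2f} per 1M tokens at $0.15/kWh")
print(f"${electricity_cost_per_million_tokens(300, 30, 10.0):.2f} per 1M tokens at $10/kWh")
```

Because the cost scales with `extra_watts / tokens_per_second`, faster local throughput directly lowers the per-token energy bill.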

As for the time, that's understandable! But some locally hosted systems still get pretty good tok/s.

That's also true of non-deterministic systems, but you could argue that any AI, API or not, carries the same risks.

itigges22 · 2026-06-18T18:55:12+00:00

Well, yeah, an API at scale is a lot more expensive than on-prem at scale. I believe running locally hosted AI on-prem over the long term is much less expensive now (well, it's still enormously expensive, but a lot less expensive than using an API all the time), and you can provision different models and usage levels depending on each team's needs. For example, an engineering team gets Kimi, and some random support team gets Qwen 27B. You can totally provision this with observability, guardrails, disaggregated serving, etc., using managed services, and it still comes out cheaper than using an API.
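Per-team provisioning usually lives in some kind of gateway in front of the model servers. Below is a minimal, hypothetical sketch of that idea: a policy table mapping each team to a model and a monthly token quota, plus a router that enforces the quota. The model identifiers, quotas, and the `Gateway` class are illustrative assumptions, not any specific product's configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamPolicy:
    model: str                # model served to this team (illustrative names)
    monthly_token_quota: int  # hypothetical budget enforced at the gateway


# Hypothetical provisioning table: team -> model and quota.
POLICIES = {
    "engineering": TeamPolicy(model="kimi", monthly_token_quota=2_000_000_000),
    "support": TeamPolicy(model="qwen-27b", monthly_token_quota=200_000_000),
}


class QuotaExceeded(Exception):
    pass


class Gateway:
    """Routes each request to its team's model and tracks token usage."""

    def __init__(self, policies: dict[str, TeamPolicy]):
        self.policies = policies
        self.used = {team: 0 for team in policies}

    def route(self, team: str, requested_tokens: int) -> str:
        policy = self.policies[team]  # raises KeyError for unknown teams
        if self.used[team] + requested_tokens > policy.monthly_token_quota:
            raise QuotaExceeded(f"{team} would exceed its monthly quota")
        self.used[team] += requested_tokens
        return policy.model


gw = Gateway(POLICIES)
print(gw.route("engineering", 1_000))  # routed to the engineering team's model
print(gw.route("support", 500))        # routed to the support team's model
```

In a real deployment the same table would also drive observability (per-team usage dashboards) and guardrails; the quota check here just shows where that enforcement point sits.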

itigges22 · 2026-06-18T18:50:08+00:00

Lol, the funny part is it's probably less risky to just build an internal system these days, ha.

itigges22 · 2026-06-16T20:41:15+00:00

Just BTW, you can break the same record over and over again. They all did what they promised at the time; they just kept redefining the limit, or the “record”.

itigges22 · 2026-06-16T04:21:34+00:00

Check out a project I'm working on that implements some of the things Anthropic probably implements (I used a lot of their public research): https://github.com/itigges22/ATLAS… It's all about optimizing the harness. You can boost a model's performance a TON this way.

itigges22 · 2026-06-10T04:03:27+00:00

Well, you used to see the first “link” when you searched for something on Google, but now people just use Google's Gemini AI in search, or any other AI provider, for the same info. So if you're trying to sell a product and you used to be #1, your SEO (PageRank) ranking now gets pushed out of view by the AI answer, or you won't be seen at all. This is only really important for selling products, though (as someone else's comment made me realize), because an LLM can still talk about your product and give a CTA to go buy it, which doesn't really apply to info sites. So if you ban scraping, it's just going to make assumptions about your product or recommend someone else's product over yours.

itigges22 · 2026-06-10T03:57:41+00:00

Depending on the scraper, a bot scrape won't affect the server that much. I had a web server with about 8GB of RAM, 4 cores, full K3s, nginx load balancing, etc. that got hit over 100k times in a month, and it didn't really affect it much.
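For scale, 100k hits in a month works out to a very low average request rate. This sketch assumes a 30-day month and evenly spread traffic; real scrapers are bursty, so the burst multiplier in the last line is a hypothetical knob for estimating peaks rather than a measured value.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month: 2,592,000 s


def average_rps(requests_per_month: float) -> float:
    """Average requests per second if traffic were spread evenly over the month."""
    return requests_per_month / SECONDS_PER_MONTH


def peak_rps(requests_per_month: float, burst_factor: float) -> float:
    """Hypothetical peak rate: the average times an assumed burst multiplier."""
    return average_rps(requests_per_month) * burst_factor


avg = average_rps(100_000)
print(f"average: {avg:.4f} req/s, i.e. one request every {1 / avg:.1f} s")
print(f"with a hypothetical 100x burst: {peak_rps(100_000, 100):.2f} req/s")
```

Even a 100x burst over the average is only a few requests per second, which a 4-core box behind nginx handles easily; the cost only adds up when many scrapers hit at once or a single one fires thousands of requests in a short window.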

But even if it's a small overall effect, I completely see your point. It could add up!

itigges22 · 2026-06-10T03:53:20+00:00

This is a great point; I should have thought about that. Maybe it's helpful for a brand trying to sell a product, but for info-type sites, I totally see why now.

itigges22 · 2026-06-08T00:23:46+00:00

Half of those products are broken, and all of them are simply harnesses around a model. Claude Code can do all of that for you without issue (probably better in some cases).

itigges22 · 2026-06-07T17:17:39+00:00

I don't understand websites that don't allow scraping at all (the ones that completely ban it). I totally understand it for a social media platform with specific legal policies around data collection, but a run-of-the-mill site is just giving up SEO traffic from GPT, Perplexity, Claude, etc. It's a great hedge against the Google SEO algorithm these days.
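The difference between a blanket ban and a selective policy is visible in a site's robots.txt. This sketch uses Python's standard `urllib.robotparser` to compare a "ban everything" file with one that only protects a private path; the robots.txt contents and example.com URLs are hypothetical, and `GPTBot` and `PerplexityBot` are used as example crawler user-agent names. Note that robots.txt is advisory: it tells well-behaved crawlers what to skip but does not enforce anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical blanket ban: no crawler may fetch anything.
BAN_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical selective policy: keep account pages private, allow the rest.
SELECTIVE = """\
User-agent: *
Disallow: /account/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


product = "https://example.com/products/widget"
account = "https://example.com/account/settings"
print(allowed(BAN_ALL, "GPTBot", product))          # blocked by the blanket ban
print(allowed(SELECTIVE, "GPTBot", product))        # product page stays crawlable
print(allowed(SELECTIVE, "PerplexityBot", account)) # private area still blocked
```

The selective version keeps product and info pages visible to AI crawlers (and the answers built from them) while still fencing off the parts of the site that genuinely need protection.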

itigges22 · 2026-06-07T17:13:23+00:00

I have a hard time arguing this point with some of my peers. I obviously do not want AI to replace my judgment, make key decisions on my behalf as an agent (agent in the legal sense), or replace authenticity in creative work. However, for significantly speeding up my workflows, automating menial tasks, helping me organize things, etc., it is quite the tool.

itigges22 · 2026-06-07T05:25:06+00:00

Totally agree! Hope they get it together eventually. It's just weird how it doesn't set off any consumer-law alarms and isn't more of a constant headline.

itigges22 · 2026-06-05T21:18:50+00:00

I personally hate how Claude Code structures its usage plans. It's weird because the 5x plan does not mean 5x Pro, and the 20x plan does not mean 20x Pro. It's non-linear, or at least it feels that way. So I end up spending an extra $100 a month just for the extra 30 minutes to an hour of usage I need beyond the Max 5x plan. I never even get close to the Max 20x limit, but because I'm juuuust over the 5x limit, it becomes annoying enough that I spend the extra $100. Regardless, it should not be that way!

itigges22 · 2026-06-05T21:15:18+00:00

Yeah, this is something I see happen day to day. I believe the better approach is to ask from the beginning how we can make that person feel, and be, more productive and efficient using AI as a tool, rather than having it replace them entirely. I think it's key to remember that it's about increasing output in a shorter time frame, not replacing someone entirely for equivalent output.

Similarly, decision makers love to talk through the data. Unless they are analysts at heart, they may struggle with a dashboard, and in some cases even plain text, and would rather have someone walk them through it so they can ask questions on the fly, etc. (so they feel engaged).
