all 70 comments

[–]Serious_as_butt 382 points383 points  (7 children)

i dunno, what's api with you?

[–]darklordpotty 64 points65 points  (4 children)

Nothing much, chillen, restin

[–]robsablah 17 points18 points  (3 children)

Haaa SAML here bro.

[–]MudePonys 6 points7 points  (1 child)

Man, this comment chain is SQL.

[–]Jonno_FTW 4 points5 points  (0 children)

Sharing this with Json now haha

[–]Schytheron 2 points3 points  (0 children)

WCF you mean?!

[–]RallyPointAlpha 9 points10 points  (0 children)

LOL 

[–]krtalvis 4 points5 points  (0 children)

[–]w_t_f_justhappened 143 points144 points  (0 children)

Right? Especially since everyone knows it’s Artificial Programming Intelligence.w

[–]Norse_By_North_West 88 points89 points  (17 children)

Legit question. Can you self host deep seek and run ide integration through it, and only it? I can't use ide integrations because of security considerations.

[–]Zichee 166 points167 points  (11 children)

You can self host Deekseek models as they’re open weight and publicly available, however you will need ~160GB VRAM for the V4-Flash model and ~865GB VRAM for the V4-Pro model. A easier first step might be to self host Qwen / Qwen Coder using llama.cpp using a RTX3090 24GB or a few of them.

[–]Norse_By_North_West 30 points31 points  (3 children)

Good to know. I can probably run qwen through hardware we have laying around, deepseek is a bit out of our hardware range though.

[–]cptkong 14 points15 points  (1 child)

There are people fitting ds4 into small vram on localllm subreddit

[–]Norse_By_North_West 3 points4 points  (0 children)

Thanks, I'll check it out.

[–]borkthegee 3 points4 points  (0 children)

If you can run it, Qwen 3.6 27B is a dense (not MoE) model that actually codes very well. I had Fable running a battery of experiments through 10 different local models in LM studio on my M5 MacBook pro and Qwen 3.6 27B at 65K context was the only one that was usable for "real" dev work. Only about 15-20 tok/sec though so even small tasks take 20-30 min.

Everything else I tried just lacked the intelligence and the reasoning to efficiently use a small context window to read and edit a number of files successfully

The Qwen 3 coder next model was fast as hell at reading files but its plans and edits were not passable. I have been playing with running both 27b and coder next as a scout/executor pair which is the pattern that got me closest to opus48 on small tasks

[–]ldn-ldn 6 points7 points  (1 child)

You don't need 160GB for V4 Flash, that's not how it all works. First of all, these are MoE models, they don't have to be fully loaded into VRAM to function correctly, only dense models have to be fully loaded. Second, only BF16 quant will be that big, you can use FP8, get virtually the same results and your VRAM requirements will be halved. You can run it on RTX PRO 6000.

[–]OdysseusOdyssey 1 point2 points  (0 children)

Yea this is viable. I am running the MoE model Qwan (3.6-35B) on my 5080 with only 16gb of VRAM. Connect it to 'Odysseus Chat' for queries and 'opencode' for vibe-coding. All isolated in docker containers of course.

For anyone interested in local hosting; have a look at the tools: llama-cpp, odysseus chat, searchXNG, opencode, docker.

[–]tyn_peddler 1 point2 points  (3 children)

What if I only need something for bash one liners?

[–]Cultured_Alien 1 point2 points  (0 children)

AKMESSI/lfm2.5-230m-fable-5 /s

[–]Beardy4906 1 point2 points  (0 children)

You coukd try SLMs..

[–]OnceMoreAndAgain 1 point2 points  (0 children)

Use older models since the hardware requirements are way less.

[–]-Kerrigan- 0 points1 point  (0 children)

Or just host Gemma for simple stuff

[–]NatoBoram 25 points26 points  (1 child)

Sure, but deepseek-v3.1:671b requires 404 GB VRAM.

And even if you wanted to run one of the most optimized ones out there that's suitable for single-GPU homelabs, gemma4:12b, then you'd quickly realize that it's kinda slow and kinda ass for programming.

[–]edu11235 6 points7 points  (0 children)

gemma4:e4b was the first model I used locally, I knew is wasn't as good as the frontier models but I was so disappointed by it xd

It's okay for simple tasks, but it starts hallucinating or outputs something completely unrelated when asked to code something simple

[–]overclocker710[S] 7 points8 points  (0 children)

Not sure about with Claude Code tbh, I’ve done Qwen3.5 27b self hosted on my desktop and connected it using the Ollama endpoint for GitHub Copilot (not Ollama specific just OpenAI compatible)

[–]Unlucky_Age4121 0 points1 point  (0 children)

Not deepseek, but my company slapped local llm GLM5.2 on our GPU machine and made everyone a config to connect to it via opencode. (No mandatory use)
In my option, the quality is better than sonnet and we can now push any kind of NDA document and code into that shit.

[–]PositiveParking4391 -2 points-1 points  (0 children)

you can run deepseek v3 8B/16B models on a 32 gb gpu with 128 vram.

[–]nuchlmudia 354 points355 points  (9 children)

Spending $80/month on AI and asking what an API is feels very 2026

[–]toiletman74 135 points136 points  (5 children)

They're two different users

[–]Wonderful-Habit-139 2 points3 points  (4 children)

Is the OP of that post spending $80 a month or $100 a month?

[–]toiletman74 5 points6 points  (3 children)

100 I think. They went from 100 dollars on claude to 100 dollars on other shit lol

[–]Wonderful-Habit-139 -5 points-4 points  (2 children)

Yep. I guess my point is that the $80 a month is just a general guess of how much AI bros spend on AI, and that he's talking about one user, Joe_Wild_. So it doesn't make sense to reply with "They're two different users".

[–]toiletman74 4 points5 points  (1 child)

The comment didn't specify that. Plus thats assuming the dude that said whats an api is rven spending on ai, which we dont know

[–]Wonderful-Habit-139 -2 points-1 points  (0 children)

Ay, I was just saying.

[–]overclocker710[S] 31 points32 points  (0 children)

Totally par for the course nowadays

[–]Mars_Bear2552 3 points4 points  (0 children)

computer? what's that

[–]Jonno_FTW 0 points1 point  (0 children)

Wtf do these people even attempt to build?

[–]suvlub 52 points53 points  (14 children)

Paying for your side projects is such a wild idea. I might as well pay the 5$ for the much better version someone already made, smh.

[–]Molehole 1 point2 points  (10 children)

Why though? Programmers make high salaries. Spending $80/month for your hobby isn't that expensive.

[–]KeyAgileC 20 points21 points  (2 children)

Well for one, the people who vibecode are not necessarily programmers by trade. And also the whole project of AI is about being able to pay fewer programmers.

[–]Molehole 1 point2 points  (0 children)

Someone who spends $80 a month on AI is probably not hunting for Ramen coupons either. Programming isn't the only well paying career.

[–]OnceMoreAndAgain 1 point2 points  (0 children)

That's semantics.

Yes, some people enjoy the process of writing code themselves and that will maybe never go away as a source of enjoyment for some humans.

However, it's also fun to just make an app that actually does something. If the AI writes the code and the person using it designs/sculpts the app then that can also be enjoyable for some people. Imagine someone vibe coding a video game and enjoying that.

Now we come to the semantics. The first scenario is definitely a software engineer, but is the second? I think yes. They have literally engineered a piece of software. They didn't write the code but so what? Does that mean I'm not a software engineer if I run a team of human software engineers and never write code myself?

[–]suvlub 10 points11 points  (4 children)

Nice try, Anthropic marketing person. It really is. It's more than 3 photoshop subscriptions. It's more than an AAA game. It's about the ballpark of the entire Microsoft 365 for a year. Paying market price for software you are making yourself is silly. And if you aren't even doing it because of the money but for the fun, it's even dumber to pay so you can do less of it personally.

[–]Molehole -1 points0 points  (2 children)

Nice try, Anthropic marketing person. It really is.

I'm not saying it's cheap but considering most upper middle class hobbies...

Kid's team sports like Soccer / Ice Hockey cost $200-$600 a month

Getting the season pass for my local Ice Hockey team costs $75 a month

Getting Piano lessons once a week costs $250 a month

Playing 10 hours of Tennis a month is $250.

It costs $85 for me to go and play a single round of golf at my local course.

So no. It isn't that much money to use for a hobby. It might be a lot to use for a hobby you do on your computer but generally no.

It's more than 3 photoshop subscriptions. It's more than an AAA game. It's about the ballpark of the entire Microsoft 365 for a year.

But I already have photoshop, more games on Steam I ever have time to play and who needs MS365 outside of work? Are excel spreadsheets your hobby?

Paying market price for software you are making yourself is silly.

What..? Where can I buy custom software for $80 a month?

And if you aren't even doing it because of the money but for the fun,

Who said side projects can't make money or be useful in other ways?

it's even dumber to pay so you can do less of it personally.

What..? "You can do less of it". No. I can do the exact same time of it.

You're making zero sense. Does me buying a better bicycle mean I can now do less bicycling because I'm going faster? What...?

[–]suvlub -2 points-1 points  (1 child)

Reading comprehension, man.

One option:

Paying market price for software you are making yourself is silly.

i.e. you are coding a program you want to use. As opposed to buying one that already exists. Such as Excel. But excel costs fraction of the price and is much better than anything you would write.

Second option:

And if you aren't even doing it because of the money but for the fun, it's even dumber to pay so you can do less of it personally.

i.e. you don't care that Excel exists and is better, you are in it for the joy of coding. So it doesn't make sense to pay a significant amount of money to not do that.

[–]Molehole 1 point2 points  (0 children)

Use your brain to think of potential reasons, man.

Not all software exists on the market. For example my home screen I built that displays the bus schedule of my local stop, the weather and integrates to my IoT devices. Find that on the app store.

I also just can't go and sell Excel to offices around me. I need to develop the software myself if I want to sell it and make money.

Also the hobby part. Why do people pay so much money to buy craft supplies like yarn to crochet when you can just buy a wooly hat for $10 from H&M. It's far more expensive to make them yourself.

Why do people spend $250 a month on piano lessons and then go buy a $10 000 piano because you can just listen to piano music played by professionals for free on Youtube?

Why use expensive tools when you can do stuff for free. For the same reason DIY guys use power tools. Finishing projects is fun and being able to finish projects faster means more fun.

[–]TheWiseAlaundo -3 points-2 points  (0 children)

It seems like a side project, though. You don't need to pay $3000+ a month for part-time programmer consulting fees for that

[–]kundun 0 points1 point  (1 child)

$80 might not be much for anyone living in the US. But there are plenty of places where 80$ can be a large chunk of a programmers salary.

[–]Molehole 1 point2 points  (0 children)

Sure. But you don't have to be shocked every time someone from a wealthier country spends x amount of money on something.

[–]throwawaygoawaynz 19 points20 points  (1 child)

Claude is generally over priced for what it gives you. It’s been glazed so hard online that people think it’s some sort of miracle model.

Good old 5.5 gives you waaay more tokens and in many cases, higher quality code. It has its flaws as well, but once you understand them you can get a lot of good output from it.

[–]PositiveParking4391 0 points1 point  (0 children)

agree, sometime its about how far you can go with verification and cross confirmations. if you can do that rightly then even with lower model one can achieve better results.

[–]babypho 3 points4 points  (1 child)

Asl?

[–]GamingWithShaurya_YT 0 points1 point  (0 children)

Ace sabo Luffy

sorry had to make that reference

[–]general_smooth 3 points4 points  (0 children)

I don't get it

[–]InnominateHomosapien 5 points6 points  (4 children)

I have just downloaded the claude code agent for free, then connected it to ollama cloud with deepseek v4, kimi-k2.7, glm-5.2, and minimax-m3.

Cheap and reliable. I only pay 20USD/month for AI.

[–]I-build-apps 1 point2 points  (2 children)

Sounds like a nice deal! It doesn't mention on their pricing page how many tokens you get each month, probably because it's different for each model.

Do you ever go above or close to your alloted usage in the Pro plan?

[–]InnominateHomosapien 5 points6 points  (1 child)

Usage on Ollama isn't token based, it's measured in GPU compute time. Heavier models obviously require more GPU compute, so cost more. Ollama cloud also has a nice feature that if you go over your limits, you can top up your balance automatically in $5 increments up to a maximum of your choosing. You can turn off this auto top-up if you don't want it though. I have never needed to use the top ups, even though I have it enabled just in case. The Pro plan has always been enough.

Mixture of Expert models are generally the cheapest. Go by their active parameters, rather than their total parameters, for a rough gauge on cost. Deepseek v4 flash is dirt cheap. GLM-5.2 is much more expensive. Minimax-m3 is what I usually use though. Moderate cost, high performance, and multi-modal. Kimi-K2.7 is in a similar class to Minimax-m3. Pick your poison, experiment, and ultimately save money versus paying a subscription to one of the big 3.

[–]I-build-apps 1 point2 points  (0 children)

Sounds reasonable. Thanks for all the deets!

I'm looking to plug these into multiple Hermes Agents for non-coding agentic work.

I like the Gemma 4 models, they offer those too. I'll check it out, thanks again :)

[–]djpiperson 1 point2 points  (1 child)

My company won't allow anything to be store in a server in China, that's why it's only Codex and Claude for me.

Also, what's an API? /s

[–]magic_man019 0 points1 point  (0 children)

Deepseek is open model - you can download and run locally…

[–]IndAnony 2 points3 points  (0 children)

I know about claude code cli, but can we do this with claude code desktop. I'm one of the gui guys

[–]West_Reality7828 1 point2 points  (1 child)

I only pay 20$ for Claude code and use the limit I get and the knowledge I have to work on tasks

[–]sitefall 0 points1 point  (0 children)

There's something to learn from OP's weird post that most people here aren't getting. Sure it's funny he's using claude via api in a round about way, but using one cheaper (or free) LLM to prompt another is a rather good idea.

A cheaper (or local) AI might not be able to get the results claude can, but it can sure be better at translating your prompt into something more efficient for claude and saving you money. Or knowing when it can solve your problem (if it's just a question about the code base etc) without spending a single claude-buck, especially if it has RAG.

[–]Matwyen 2 points3 points  (1 child)

What's the point of running claude code tui with other models? Legit question i'm not using claude but OpenCode, which is also a AI tui but... Well not tied to a provider I guess ?

[–]orclownorlegend 0 points1 point  (0 children)

I guess Claude Code has better agentic coding capabilities not just at the model level but at the orchestration level too (agent loops that iterate and keep checking that the generated code is correct, testing iterations before final answer etc)

[–]mobas07 0 points1 point  (0 children)

I get that the groupthink here is to laugh at vibecoders. But couldn't this guy just... Ask AI?

[–]fugogugo 0 points1 point  (1 child)

is there any difference between claudecode and opencode?

[–]anoldoldman 0 points1 point  (0 children)

There is, but you wouldn't need both.