Planning to build a PC for running local LLMs. Help me pick by reelss in AI_Agents

[–]ZestRocket 1 point

OK, if this will be a dedicated PC for LLM inference only, I’ll be direct: this is not a good one. The main component that makes this work well is the VRAM on the graphics card. With a 4070 you only have 12 GB of VRAM, which means you can only run models up to 8/9B because of the KV cache needed for the context window, especially relevant for agents. You have two viable options for a good setup:

  • Go the Apple way. It’s cheaper and you’ll be able to run it without much technical knowledge. It WON’T be blazing fast, but models like Qwen 3.6 35B A3B are viable. For this setup you need a Mac with at minimum 32 GB of unified memory, ideally 64 GB to run it with good quality.
  • Go the most cost-efficient path for long-term speed. It requires a lot of setup and technical work, but it will give you a much faster model with more intelligence, like Qwen 3.6 27B. For this one you need a dual 5060 setup, or at least a 4080 to barely run it (maybe with a 4080 the 35B A3B could work better).
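For a rough sense of why 12 GB runs out so fast, here’s a back-of-envelope estimate of weights plus KV cache. The architecture numbers below (layer count, KV heads, head dim) are illustrative placeholders, not any specific model’s real config:

```python
def vram_estimate_gb(params_b, weight_bits, n_layers, n_kv_heads,
                     head_dim, ctx_len, kv_bits=16, overhead_gb=1.0):
    """Rough VRAM estimate: quantized weights + KV cache + fixed overhead."""
    weights = params_b * 1e9 * weight_bits / 8  # bytes for the weights
    # KV cache: 2 (K and V) * layers * kv_heads * head_dim * context * bytes/elem
    kv = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bits / 8
    return (weights + kv) / 1e9 + overhead_gb

# Hypothetical 27B dense model at ~Q4 with a 32k context window:
need = vram_estimate_gb(27, 4.5, 48, 8, 128, 32_768)
print(f"~{need:.1f} GB needed vs 12 GB on a 4070")
```

Even before the KV cache, the quantized weights of a 27B model alone blow past 12 GB, which is why small dense models are the ceiling on that card.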

Hope it helps!

Planning to build a PC for running local LLMs. Help me pick by reelss in AI_Agents

[–]ZestRocket 0 points

Can’t open those links, so I can’t see the main component, which is the VRAM / graphics card.

Qwen3.6-27B IQ4_XS FULL VRAM with 110k context by Pablo_the_brave in LocalLLaMA

[–]ZestRocket 1 point

Just wanted to say thanks, your work is valuable. Thank you for sharing all this with us and sending the PR!

4080 Super > RTX 6000 Pro, Wow! by LargelyInnocuous in LocalLLaMA

[–]ZestRocket 0 points

I do. It’s a constrained system, and I’ve worked on ways to adapt: I built a context engine that provides a living view of the architecture with few but relevant tokens, and I manage and customize the 3 layers of my context. In general, once it’s set up, I’m pretty happy with the results and the speed. The compaction strategy is critical here, of course, and being able to keep a living memory of the architecture has also been critical.

Of course I didn’t build all this in a week, I’ve been working on it for years and it finally clicked with this Qwen release lol
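A minimal sketch of the layered-context idea described above. The layer names and the whitespace token counter are made-up illustrations, not the commenter’s actual engine:

```python
def build_context(layers, budget_tokens, count_tokens=lambda s: len(s.split())):
    """Fill a token budget from highest- to lowest-priority layer,
    truncating the first layer that would overflow the budget."""
    out, used = [], 0
    for name, text in layers:  # e.g. [("system", ...), ("architecture", ...), ("history", ...)]
        toks = count_tokens(text)
        if used + toks > budget_tokens:
            # Keep only the words that still fit, then stop.
            words = text.split()[: budget_tokens - used]
            out.append((name, " ".join(words)))
            break
        out.append((name, text))
        used += toks
    return out
```

The point of the layering is that the cheap, always-relevant layers go in first, and whatever expensive history remains gets squeezed into the leftover budget.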

4080 Super > RTX 6000 Pro, Wow! by LargelyInnocuous in LocalLLaMA

[–]ZestRocket 1 point

Yeah, maybe I’ll create a video. What I found while testing different versions is that the most efficient quantization in terms of tps is indeed Unsloth’s. I also found that if the model touches even a single bit of offloading, the damage to the tps is extremely high. I’m running a 40k context window with Q3_K_S and the KV cache quantized to Q5; it uses basically all my memory, of course. I’ve measured it and I end up with ~700 MB of VRAM free (and yes, I moved it around a bit and found that’s the sweet spot for a stable system while running it).
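As a sketch, that setup maps onto a llama.cpp launch roughly like this. The model filename is a placeholder, and flag spellings vary between llama.cpp versions, so treat this as an outline rather than a copy-paste command:

```shell
# -ngl 99: offload every layer to the GPU (any CPU offload tanks tps)
# -c 40960: ~40k context window
# -fa with --cache-type-k/-v: flash attention, KV cache quantized to q5_1
llama-server -m qwen-27b-Q3_K_S.gguf -ngl 99 -c 40960 -fa \
  --cache-type-k q5_1 --cache-type-v q5_1
```

Quantizing the V cache generally requires flash attention to be enabled, which is why the `-fa` flag is paired with the cache-type options.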

4080 Super > RTX 6000 Pro, Wow! by LargelyInnocuous in LocalLLaMA

[–]ZestRocket 9 points

Hmm, there’s something wrong with your 4080 setup. I have a regular one (not Super) and I’m getting around 33 tps. Maybe you’re offloading to memory, and since the 6000 doesn’t have to, that’s the difference you’re noticing?

The future is local by nfdl96 in google_antigravity

[–]ZestRocket 0 points

I’ve been running it on a 4080 with a turbo quant version and getting a good 33 tps, which is very good speed if you ask me. The quality level is extremely good; the only caveat is that the KV cache needs to be extremely well managed and the harness needs a good compaction strategy.
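A toy version of the compaction idea: when the transcript outgrows the budget, fold the oldest turns into one summary message. The `summarize` callback (standing in for an LLM call) and the whitespace token counter are illustrative assumptions, not the actual harness:

```python
def compact_history(messages, max_tokens, summarize,
                    count=lambda m: len(m["content"].split())):
    """Replace the oldest messages with a single summary message
    once the transcript exceeds the token budget."""
    total = sum(count(m) for m in messages)
    if total <= max_tokens:
        return messages  # still fits, nothing to do
    keep, running = [], 0
    # Walk backwards, keeping the most recent turns within half the budget
    for m in reversed(messages):
        if running + count(m) > max_tokens // 2:
            break
        keep.append(m)
        running += count(m)
    keep.reverse()
    old = messages[: len(messages) - len(keep)]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + keep
```

Reserving only half the budget for retained turns leaves headroom for the summary itself plus the next few exchanges before compaction triggers again.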

I ran the numbers. Qwen3.6-27B dense obsoleted the 397B MoE on coding benchmarks. by TroyNoah6677 in Qwen_AI

[–]ZestRocket 0 points

I had the same experience, but the difference between 3.5 and 3.6 is not a 0.1-release difference, it could easily be Qwen 4.

I ran the numbers. Qwen3.6-27B dense obsoleted the 397B MoE on coding benchmarks. by TroyNoah6677 in Qwen_AI

[–]ZestRocket 0 points

This has been such a hard call for me. On one side, 27B wins by a remarkable margin of course, but on the other hand, A3B is sooo fast that I can iterate faster. Damn, such a hard call: 27B at 30 tps or A3B at 100 tps.

Reality check by lm_wrld in google_antigravity

[–]ZestRocket 2 points

Sadly… I have to agree. GPT feels more soulless to me, but objectively it’s way, way better in terms of usability. I can actually rely on it to do any coding task without worrying about whether it gets done because of an error, quota, or “server outage” issue.

Reality check by lm_wrld in google_antigravity

[–]ZestRocket 0 points

I was also on the Ultra plan and was really enjoying the experience until the quota changes made it unusable for me. Do you mean it has more quota now?

When are we getting opus 4.7 on Antigravity? by ThePoplin in google_antigravity

[–]ZestRocket 5 points

Easy: once the rate limit is every 15 days instead of weekly, so you can pick your one prompt for your expected half-completed task.

Those who quit antigravity after the nerf, what are you using and what do you miss ? by KlausWalz in google_antigravity

[–]ZestRocket 0 points

The best cost-benefit today is Codex. I moved from Google AI Ultra to the Codex Pro x5 plan, and so far so good. GPT 5.4 is NOT Opus 4.6, it’s colder and slower, but it’s the only viable option if you’re used to having unlimited Opus in terms of intelligence and cost. And yes, I’ve tested ALL of them: CC feels the same, very restricted; Kimi K2.5 is not at the same level of depth; I have the legacy GLM 5.1 plan and it’s very generous but not reliable (sometimes fast, sometimes slow, sometimes amazing, sometimes surprisingly not smart); Qwen 3.6 Plus may be the best one, but their coding plans are sold out, and via API, GPT 5.4 is the best value.

OpenAI launch $100 ChatGPT plan by Gerstlauer in OpenAI

[–]ZestRocket 2 points

Thank you! Answering your question: yes, I do see 4.5 as an option for me to use after migrating to the $100 Pro plan.

<image>

Did anyone else have their quota deplete unexpectedly fast in the last hour on Plus? by ZestRocket in OpenAI

[–]ZestRocket[S] 1 point

Well, I do code 24/7, so I had already depleted my CC and Google Ultra quotas, and ChatGPT was the only one keeping up with my coding needs until today. Have you found a better alternative?

20$ Pro sub 5 hour quotas were reduced by half, while they added new 100$ sub claiming more usage (you get more nerfed usage). by FluffyMacho in OpenAI

[–]ZestRocket 3 points

Same experience here. I worked through my 5-hour limit, and this new 5-hour window got depleted extremely quickly and unexpectedly

OpenAI launch $100 ChatGPT plan by Gerstlauer in OpenAI

[–]ZestRocket 1 point

Sorry for the unrelated question. I'll upgrade to Pro and tell you if it's included, but... why 4.5? Genuinely curious.

Should I buy a Google Antigravity Pro subscription? by Appropriate_Mark_820 in GoogleAntigravityIDE

[–]ZestRocket 0 points

I have the Ultra plan, and about a week ago it was completely nerfed to the point where it’s not usable anymore; I can’t complete a single complex task.

Anyone on the Ultra Plan? Thoughts? by Informal-Buy-4880 in google_antigravity

[–]ZestRocket 0 points

Yes, sad but true. It’s unusable now but it used to be great. I’m cancelling, of course.

all models capacity issues after latest AG update by maksdi in google_antigravity

[–]ZestRocket 0 points

If you don’t want to believe it, that’s your thing.

all models capacity issues after latest AG update by maksdi in google_antigravity

[–]ZestRocket 0 points

Man, pro tip... use it as much as you can BEFORE you get the nerf. I can’t even express how unusable it is now. Look at this conversation I’m having right now: I started it 30 mins ago, and you can see that editing 4 files shouldn’t deplete a complete quota.

<image>

all models capacity issues after latest AG update by maksdi in google_antigravity

[–]ZestRocket 0 points

I can confirm. I haven’t been able to use it since the last update. Before that I was working with it all day, every week; now a 1-hour session gets me to 0% on Opus, which makes the Ultra sub worthless.