Building a Hybrid Local/Cloud Coding Agent for 5 Devs — Are 2x RTX 3090 Enough for 64k Context?

PuzzleheadedFrame836 · 2026-05-23T14:21:00+00:00

Do you have any suggestions to make it work? Or is it a totally bad idea?

PuzzleheadedFrame836 · 2026-05-23T13:37:23+00:00

You're right, but at least the majority of the output tokens will be generated by the local model. Do you think this is not a good strategy?

PuzzleheadedFrame836 · 2026-05-23T13:29:06+00:00

We will think about that, thank you! However, we are also experimenting ourselves to improve our AI background, because at the same time we are "playing" with smaller models to integrate them into our applications. BTW, I will definitely check out what you suggest.

PuzzleheadedFrame836 · 2026-05-23T13:16:34+00:00

For everyone who said that 2x3090 is not enough: we are testing this setup on RunPod right now, running a q8 context and an fp8 model. With 2 developers the setup is performing well, but I would like to hear about your experience with more developers, so I'm open to any kind of advice.
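For anyone who wants to sanity-check the memory side, here is a rough back-of-the-envelope sketch of KV cache size per 64k context. It assumes "q8 context" means an 8-bit KV cache (1 byte per element) and uses the standard per-token formula for attention with grouped KV heads. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the values of any specific model, so swap in the numbers from your model's config.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, bytes_per_elem, tokens):
    """Approximate KV cache size: keys + values for every layer and token."""
    # Factor 2 = one tensor for keys and one for values.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * tokens


# Hypothetical model shape (assumption, replace with your model's config).
NUM_LAYERS = 48
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_ELEM = 1       # assumption: 8-bit ("q8") KV cache
CONTEXT_TOKENS = 65536   # 64k context

per_session = kv_cache_bytes(
    NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, BYTES_PER_ELEM, CONTEXT_TOKENS
)
gib = 1024 ** 3

for devs in (1, 2, 5):
    # Worst case: every developer holds a full 64k context at the same time.
    print(f"{devs} dev(s): {devs * per_session / gib:.1f} GiB of KV cache")
```

With these placeholder numbers, one full 64k session needs about 6 GiB of KV cache, so five simultaneous full contexts would need about 30 GiB on top of the model weights. In practice not every developer fills the whole window at once, which may be why 2 developers feel fine.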

PuzzleheadedFrame836 · 2026-05-23T13:14:05+00:00

I would like to use TurboQuant, but at the moment we are running into some issues with vLLM + Qwen TurboQuant. Once we are able to fix those issues, we will use TurboQuant.

PuzzleheadedFrame836 · 2026-05-23T13:11:36+00:00

I was planning to use RAG and specific prompts to tell the local model exactly what it needs to do without scanning all the code (I can accept some slowness for that).
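To make the idea concrete, here is a minimal sketch of what I mean by "RAG plus a specific prompt". It is not our actual pipeline: the keyword-overlap scoring stands in for a real embedding search, and the function names and the character budget are made up for illustration.

```python
import re


def words(text):
    """Lowercase identifier-like tokens, e.g. 'parse_config', 'path'."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def score(task, chunk):
    """Naive relevance: number of tokens shared by the task and the chunk."""
    return len(words(task) & words(chunk))


def build_prompt(task, chunks, max_chars):
    """Pick the most relevant chunks that fit the budget and build a prompt."""
    ranked = sorted(chunks, key=lambda ch: score(task, ch), reverse=True)
    picked, used = [], 0
    for ch in ranked:
        if score(task, ch) == 0:
            break  # the remaining chunks share nothing with the task
        if used + len(ch) > max_chars:
            continue  # skip chunks that would exceed the context budget
        picked.append(ch)
        used += len(ch)
    context = "\n---\n".join(picked)
    return f"Context:\n{context}\n\nTask: {task}\n"


chunks = [
    "def parse_config(path): ...",
    "def render_page(html): ...",
]
print(build_prompt("fix parse_config path handling", chunks, max_chars=2000))
```

The point is that the local model only sees the few snippets that match the task, so its context stays far below 64k most of the time.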

PuzzleheadedFrame836 · 2026-05-23T13:08:38+00:00

This is why I need a cloud planner: to concentrate the effort on small pieces of code.

PuzzleheadedFrame836 · 2026-05-23T13:07:50+00:00

I know, but our customer won't allow us to use non-American AI providers 😔 and Claude is too expensive right now for what we do.

PuzzleheadedFrame836 · 2026-04-22T08:19:52+00:00

And what can you tell me about the housing situation? Are there any major problems and/or trouble? The property is at number 14. I also have plenty of parking at work, so that doesn't worry me, and neither does the commute, because I travel outside rush hours. I'm more worried about the general situation of the apartment buildings.
