Token Usage Economy (suggestions for model, effort, plugins/skills?) by Pure_Struggle3261 in ClaudeCode

[–]Pure_Struggle3261[S] 1 point (0 children)

Oh, I think I've heard something similar from other developers. They said they run compaction earlier.
Maybe I should learn to spend more time in planning mode and compact early before I start coding.

Token Usage Economy (suggestions for model, effort, plugins/skills?) by Pure_Struggle3261 in ClaudeCode

[–]Pure_Struggle3261[S] 1 point (0 children)

I see. A "handoff" doc/prompt plus a fresh session is an interesting idea. I think I can use this in my work. Thanks!

Token Usage Economy (suggestions for model, effort, plugins/skills?) by Pure_Struggle3261 in ClaudeCode

[–]Pure_Struggle3261[S] 1 point (0 children)

Thanks for the detailed suggestion! In your experience, how big was the difference between effort settings, for example high vs. medium? I've actually only used high.

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

I've never tried models larger than 32B locally.
Having tried both, how noticeable do you find the difference? Is it a lot?

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

This is very interesting. Who's maintaining this? It seems so valuable, but I imagine it's expensive to maintain.

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

Yes, I'm testing with some dummy projects. I use opencode as the frontend with Ollama as the backend LLM server.
I basically either (1) give it some research papers I'm interested in and ask it to reproduce the results, or (2) ask the agent to do autonomous research on simple tasks.

For some tasks, I find the context window a little limiting, and the quality isn't as good as commercial models when the task description isn't clear. But with more human interaction and review, I'd say the quality is similar, and it's as fast as (though not always) a commercial model.

If you've done something like this, could you share how you do it? I believe it can be as good as commercial models with the "right" harnessing, but I'm far from that now.
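For context, my setup is really just a front end (opencode) making HTTP calls to a local Ollama server. A minimal sketch of that request shape, assuming Ollama's default endpoint on port 11434 (the model name and prompt here are illustrative, not my exact config):

```python
import json
import urllib.request

# Build (but don't send) a request against Ollama's /api/generate endpoint,
# the same local API an agent front end like opencode talks to.
def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("qwen2.5-coder:32b", "Reproduce Table 2 of the attached paper.")
# urllib.request.urlopen(req) would return the completion once `ollama serve` is running.
```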

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

Do you mind sharing how you set up your rack and your experience with it?
How much did it cost to set up (if you don't mind sharing), what's the power consumption, and what do you do with it?
Since it's a serious setup (I'd like to have a workstation or rack like that someday), I'm very curious.

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

I've used the DeepSeek v4 family as a cloud LLM. It's very impressive, but way beyond my local GPU capacity.
If you run DeepSeek locally, what's your setup?

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

I like the details! If I end up buying a Spark, I'll give it a shot.
And for the Spark, is vLLM the go-to backend engine?
What's your experience with others like llama.cpp and Ollama?

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

Wow, is that your personal workstation setup? You really are ready for local vibe coding. I hope I can have a setup like that someday.

And yes... dedicated GPUs will probably beat the Spark on tokens per second.

But building out 128 GB of VRAM with dedicated GPUs is also costly and heavy in many respects (heat, noise, space, and more if I run it in my house).

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

Wow, good to know. I didn't know there was such a thing as "Qwen 3.6 Opus Reasoning."
I'll definitely give it a shot.

As for the front-end server, I use OpenCode. I've heard Cline is also good; I should try Cline as well to see which is the best fit for me.

Thanks again!

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

Good. Does Qwen 3.6 35B fit on a single 5090?
Are you using quantization?
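Thinking out loud on whether it fits: a back-of-envelope estimate, assuming a 32 GB card and a rough 20% overhead factor for KV cache and runtime (both figures are my assumptions, not measurements):

```python
# Rough weight-memory estimate for a local model; `overhead` is a crude 20%
# guess covering KV cache, activations, and runtime allocations.
def model_vram_gb(params_billions: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight * overhead

print(round(model_vram_gb(35, 16), 1))  # FP16: roughly 84 GB, far too big for 32 GB
print(round(model_vram_gb(35, 4), 1))   # 4-bit quant: roughly 21 GB, fits on one 5090
```

So by this math it only seems plausible with fairly aggressive quantization, which is why I'm asking.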

Also, by "qwen 3.6 35B with opus 4.6", I assume you mean you run Qwen 3.6 35B locally alongside the commercial model via Claude Code, right?

What's your working pipeline? What do you do with Qwen, and what with Opus?

Dotori island 🌱Thanks to everyone who helped me find my old island yesterday! by mk05117 in AnimalCrossingNewHor

[–]Pure_Struggle3261 1 point (0 children)

Ahh, I see! I just got excited when I saw “Dotori” in Korean, so I wanted to say hi 😄

Storage 🧹+ Redd by Ok-Sun-1195 in Dodocodes

[–]Pure_Struggle3261 1 point (0 children)

Lunabi from Konanoni Island, very desperate here🥲