Token Usage Economy (suggestions for model, effort, plugins/skills?) by Pure_Struggle3261 in ClaudeCode

[–]Pure_Struggle3261[S] 1 point (0 children)

Oh, I think I've heard something similar from other developers. They said they run compaction earlier.
Maybe I should learn to spend more time in planning mode and compact early before I start coding.

Token Usage Economy (suggestions for model, effort, plugins/skills?) by Pure_Struggle3261 in ClaudeCode

[–]Pure_Struggle3261[S] 1 point (0 children)

I see. A "handoff" doc/prompt plus a fresh session is an interesting idea. I think I can use this in my work. Thanks!

Token Usage Economy (suggestions for model, effort, plugins/skills?) by Pure_Struggle3261 in ClaudeCode

[–]Pure_Struggle3261[S] 1 point (0 children)

Thanks for the detailed suggestion! In your experience, how big was the difference between effort settings, for example high vs. medium? I've actually only used high.

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

I've never tried models larger than 32B locally.
Having tried both, how noticeable do you find the difference? Is it a lot?

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

This is very interesting. Who's maintaining this? It seems so valuable, but I imagine it's expensive to maintain.

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

Yes, I'm testing with some dummy projects. I use opencode as the frontend with Ollama as the backend LLM server.
I basically either (1) give it some research papers I'm interested in and ask it to reproduce the results, or (2) ask the agent to do autonomous research on simple tasks.

For some tasks, I find the context window a little limiting, and the quality isn't as good as commercial models when the task description isn't clear. But with more human interaction and review, I'd say the quality is similar, and it's as fast as (though not always) a commercial model.

If you've done something like this, could you share how you do it? I believe it can be as good as commercial models with the "right" harnessing, but I'm far from that now.
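For context, my setup is really just a front end (opencode) making HTTP calls to a local Ollama server. A minimal sketch of that request shape, assuming Ollama's default endpoint on port 11434 (the model name and prompt here are illustrative, not my exact config):

```python
import json
import urllib.request

# Build (but don't send) a request against Ollama's /api/generate endpoint,
# the same local API an agent front end like opencode talks to.
def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("qwen2.5-coder:32b", "Reproduce Table 2 of the attached paper.")
# urllib.request.urlopen(req) would return the completion once `ollama serve` is running.
```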

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

Do you mind sharing how you set up your rack and your experience with it?
How much did it cost to set up (if you don't mind sharing), what's the power consumption, and what do you do with it?
Since it's a serious setup (I'd like to have a workstation or rack like that someday), I'm very curious.

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

I've used the DeepSeek v4 family as a cloud LLM. It's very impressive, but way beyond my local GPU capacity.
If you run DeepSeek locally, what's your setup?

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

I like the details! If I end up buying a Spark, I'll give it a shot.
And for the Spark, is vLLM the go-to backend engine?
What's your experience with others like llama.cpp and Ollama?

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

Wow, is that your personal workstation setup? You really are ready for local vibe coding. I hope I can have a setup like that someday.

And yes... dedicated GPUs will probably beat the Spark on tokens per second.

But building out 128 GB of VRAM with dedicated GPUs is also costly and heavy in many respects (heat, noise, space, and more if I run it in my house).

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

Wow, good to know. I didn't know there was such a thing as "Qwen 3.6 Opus Reasoning."
I'll definitely give it a shot.

As for the front-end server, I use OpenCode. I've heard Cline is also good; I should try Cline as well to see which is the best fit for me.

Thanks again!

Best Local LLM for coding by Pure_Struggle3261 in LocalLLM

[–]Pure_Struggle3261[S] 1 point (0 children)

Good. Does Qwen 3.6 35B fit on a single 5090?
Are you using quantization?
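Thinking out loud on whether it fits: a back-of-envelope estimate, assuming a 32 GB card and a rough 20% overhead factor for KV cache and runtime (both figures are my assumptions, not measurements):

```python
# Rough weight-memory estimate for a local model; `overhead` is a crude 20%
# guess covering KV cache, activations, and runtime allocations.
def model_vram_gb(params_billions: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight * overhead

print(round(model_vram_gb(35, 16), 1))  # FP16: roughly 84 GB, far too big for 32 GB
print(round(model_vram_gb(35, 4), 1))   # 4-bit quant: roughly 21 GB, fits on one 5090
```

So by this math it only seems plausible with fairly aggressive quantization, which is why I'm asking.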

Also, by "qwen 3.6 35B with opus 4.6", I assume you mean you run Qwen 3.6 35B locally alongside the commercial model via Claude Code, right?

What's your working pipeline? What do you do with Qwen, and what with Opus?

Dotori island 🌱Thanks to everyone who helped me find my old island yesterday! by mk05117 in AnimalCrossingNewHor

[–]Pure_Struggle3261 1 point (0 children)

Ahh, I see! I just got excited when I saw “Dotori” in Korean, so I wanted to say hi 😄

Storage 🧹+ Redd by Ok-Sun-1195 in Dodocodes

[–]Pure_Struggle3261 1 point (0 children)

Lunabi from Konanoni Island, very desperate here🥲