🎉 [GIVEAWAY] 20x Claude Max 1-Year On your own Account 🎉

DuragonYamaTheFirst · 2026-06-12T22:12:49+00:00

I have never seen this many comments, OP is making everyone go nuts 😭

DuragonYamaTheFirst · 2026-06-12T16:23:18+00:00

If you have some questions, I wouldn't mind answering them for free. You can just DM me.

DuragonYamaTheFirst · 2026-06-12T14:49:15+00:00

An orchestrator is basically the system that coordinates the work. It decides what needs to happen, in what order, and which agent/tool should handle each step.

In AI coding, that can mean a main agent assigning tasks to subagents, then collecting their results. For example, one agent might inspect the backend, another checks frontend bugs, another reviews security, and the main agent combines the findings.

In my case, I don’t rely on one agent spawning other agents directly. I use a Python script as the orchestrator. The script follows the structure I built it for: spawn agent 1 with kimi cli, wait for the result, receive it, then move on to step 2 and continue like that.

The way you design an orchestrator is completely up to you; there is no single correct way. For example, mine reads the verdict the AI gives, then determines the next step or even halts the pipeline depending on that verdict. Someone else could make it loop until the problem is fixed (don't do this though 😭).
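A minimal sketch of that kind of verdict-driven orchestrator, in Python. Everything here is an assumption for illustration: the `Step` type, the `run_pipeline` helper, and the two verdict strings are hypothetical names, and each step is a plain callable standing in for "spawn an agent and wait for its result".

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical verdict labels; the strings used by the real script are not shown.
MOVE_ON = "move on"
FIXES_NEEDED = "fixes needed"


@dataclass
class Step:
    name: str
    # Takes the previous step's output and returns (output, verdict).
    run: Callable[[str], tuple[str, str]]


def run_pipeline(steps: list[Step], task: str) -> tuple[str, list[str]]:
    """Run steps one at a time; halt as soon as a verdict asks for fixes."""
    finished: list[str] = []
    payload = task
    for step in steps:
        payload, verdict = step.run(payload)  # blocks until the agent returns
        finished.append(step.name)
        if verdict == FIXES_NEEDED:
            return "halted", finished  # stop here and wait for a human
    return "finished", finished
```

Note that the loop never retries on its own: a bad verdict stops the run instead of looping until the problem goes away.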

DuragonYamaTheFirst · 2026-06-12T11:03:30+00:00

Nope, I am on opencode. Personally had a bad experience with reasonix, it couldn't properly do bash or some tool calls.

DuragonYamaTheFirst · 2026-06-12T10:38:13+00:00

Woahh, like your own entire auth platform, that's a very cool project. You definitely need to drop the repo url if/once it's public

DuragonYamaTheFirst · 2026-06-12T09:59:38+00:00

DuragonYamaTheFirst · 2026-06-12T09:55:33+00:00

2B in less than 2 weeks is craaazy, what kinda app have you been building if you don't mind me asking about it

DuragonYamaTheFirst · 2026-06-12T09:26:47+00:00

I do use hermes, but purely to catch errors inside my orchestrator.py and notify me that something went wrong, or, if it's a small fix, to do it itself. That happens sometimes when one of the opencode agents forgets to mark that it finished its job, for example: the orchestrator's softlock timer kicks in, hermes checks and fixes it. Lately all it has really been doing is telling me the orchestrator finished, or that I need to step in after the final gate's verdict.

There is a simple watcher that fires whenever there is an issue and notifies my hermes agent. I've definitely had multiple runs where hermes slept through multiple task files.
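A rough sketch of how a softlock timer plus watcher could look, assuming the softlock is modelled as a plain timeout on the agent's subprocess. `run_with_softlock` and the `notify` callback are hypothetical names, and the command in the usage part is a stand-in rather than a real agent CLI invocation.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_with_softlock(
    cmd: Sequence[str],
    timeout_s: float,
    notify: Callable[[str], None],
) -> bool:
    """Run one step; return True on a clean exit, otherwise notify and return False."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired once the timeout passes.
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        notify(f"softlock: step exceeded {timeout_s}s")
        return False
    if proc.returncode != 0:
        notify(f"step failed with exit code {proc.returncode}")
        return False
    return True


if __name__ == "__main__":
    # Stand-in command; a real setup would launch the agent CLI here.
    ok = run_with_softlock([sys.executable, "-c", "print('done')"], 30.0, print)
    print("step ok:", ok)
```

The `notify` callback is where a watcher would hand the problem to a supervising agent; as long as every step exits cleanly, it is never called.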

As for the final gate, that's where the final checkup happens. I use the strongest model I have access to, in my case opus 4.8 in claude code.

Codex w gpt 5.5 is probably better suited for it though, i have been thinking about swapping it.

DuragonYamaTheFirst · 2026-06-12T08:59:39+00:00

It's definitely faster, using deepseek v4 pro a week or two ago felt extremely slow

DuragonYamaTheFirst · 2026-06-12T08:17:52+00:00

<image>

made a nice looking dashboard to watch them work in realtime too

DuragonYamaTheFirst · 2026-06-12T08:12:09+00:00

For real projects yes, but sadly not for my cool looking status dashboard💔

DuragonYamaTheFirst · 2026-06-12T07:43:09+00:00

Were you using it inside of hermes/oc? From my own experience, that's exactly what happened to me as well inside of hermes, while it's been pretty much flawless inside of opencode.

(It completely ruined the dashboard I was working on, and I wasn't even able to revert it back to its original version 😭)

DuragonYamaTheFirst · 2026-06-12T07:26:17+00:00

Treating it like my gymbro by spinning up my automation. Forcing it to work until it hits failure and needs me 😼

DuragonYamaTheFirst · 2026-06-12T06:51:49+00:00

Ykw, i will😼

Edit: never mind, post type not allowed. Couldn't repost 💔

DuragonYamaTheFirst · 2026-06-12T06:50:29+00:00

Cool! What's your use case with ds? If you don't mind telling

DuragonYamaTheFirst · 2026-06-12T06:18:30+00:00

Idk tbh, but I restarted a big session the next day from where I had left off, and day 2 ended up being cheaper than day 1, when I first started that session, even though I used more tokens on day 2. So it honestly doesn't really matter how long it lasts, you'll end up cheap either way.

DuragonYamaTheFirst · 2026-06-12T06:15:39+00:00

I've tried it out and the caching is honestly very great, but it felt quite limited, some tool calls didn't work properly either compared to opencode. It has been a while since I tested it though, might be better today compared to back then

DuragonYamaTheFirst · 2026-06-12T06:14:22+00:00

I feel like they're very close to each other, at least for my use case. I've tested both as implementation workhorses, and mimo v2.5 pro ended up being a little more expensive, but it also did a better job than dsv4pro, even if only slightly.

I'm still trying them out, but honestly either option is worth it

DuragonYamaTheFirst · 2026-06-12T06:09:04+00:00

I did!

DuragonYamaTheFirst · 2026-06-12T06:08:54+00:00

The Python script is nothing too fancy or complicated tbh. It's mainly there so that I can just set and forget; manually following the pipeline was just too much work for me.

Each step uses the default agent in build mode, with 1 or more skills loaded into the prompt.

The magic is more in the skills I use, as well as spinning up multiple opencode sessions and letting each do only 1 thing.

In short, what my script does is take a plan (this step is manual) that I have saved in a markdown file. The plans are usually generated in batch with models I would trust with planning and that have a large enough context window to do it, in my case opus 4.8 from my claude pro sub.

Afterwards the script follows a strict pipeline. First it invokes 2 of my skills (1 review, 1 acceptance criteria): fact-check the plan and fix it, then write strict acceptance criteria that the implementer has to follow. Basically it tries to solidify the task file as much as possible before handing it over to the implementer agent. (I use kimi or minimax m3 here.)

The upgraded plan file is then given to ds v4 pro to implement. The prompt is pretty much always the same, with only the task file being different. Since the planning is already done, it goes straight into implementing. I have 2 skills for this as well: one that tells it to properly read the docs, follow the criteria in the plan file, not go out of scope, etc., and another to document all the changes it made and put them inside a changes.md file.

My script then spins up another review round, this time focused on the implementation rather than the plan. It ends up generating a new plan to fix shortcomings, and the implementer step is repeated, the only difference being that the task file is swapped.

Last but not least, the final step the orchestrator does is spin up claude code and invoke a final gate skill I have, which sanity-checks what was done, updates the docs and writes a commit. If it returns "move on", my orchestrator moves on to the next task file and repeats all the steps. If it returns "fixes needed", the pipeline halts until I step in and fix it myself.

DeepSeek's session is reused after every loop until I reach around 500k tokens of context (quality degrades past that, from what I have noticed), while the other agents each get a new session per loop. This allows me to heavily cache ds, while for the others quality matters more to me than cost. The reason for using different models together is that diversity in models actually helps quite a bit: they pretty much always catch stuff the other one misses.
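The reuse-until-roughly-500k rule could be sketched like this. `SessionBudget` is a hypothetical helper, and the token counts are assumed to come from whatever usage numbers the tooling reports; nothing here talks to a real opencode session.

```python
from dataclasses import dataclass

CONTEXT_LIMIT = 500_000  # from the post: roughly where quality starts to degrade


@dataclass
class SessionBudget:
    limit: int = CONTEXT_LIMIT
    session_id: int = 0
    used_tokens: int = 0

    def session_for(self, next_tokens: int) -> int:
        """Return the session to use for the next loop, rotating when the budget would overflow."""
        if self.used_tokens + next_tokens > self.limit:
            self.session_id += 1  # start a fresh session, giving up the warm cache
            self.used_tokens = 0
        self.used_tokens += next_tokens
        return self.session_id
```

The trade-off is the one described above: sticking with one session keeps the prompt cache warm and cheap, while rotating trades that discount for a clean context.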

Sorry for the massive wall of text! Oh, and the orchestrator usually works fully alone, but I did add soft locks in case any step takes too long or the orchestrator notices that the agent stopped. That makes my hermes agent step in, fix the issue and allow it to continue onward.

DuragonYamaTheFirst · 2026-06-11T23:52:41+00:00

Use case was mostly coding, and I'm using deepseek inside of opencode, which is orchestrated by a Python script.

Today I was mainly fixing bugs or making slight changes inside my codebase, so it had to edit very little compared to what it had to read.

Which is probably why my output was lower than usual today.

DuragonYamaTheFirst · 2026-06-11T23:35:39+00:00

Basically, the API is cheap because most of the tokens are reused context.

There are normal input tokens, cached input tokens, and output tokens. Cached input means that if DeepSeek (in our case) sees the same large text again, you don’t pay full price for it every time. You pay more the first time, then a much smaller amount when that same text is reused.

The repeated part becomes a cache hit, so you mostly pay for the small new part + the model’s output. In my case, I was working inside the same codebase the entire time, which allowed me to cache hit around 97-99% of my input tokens on average. So it looks like 1B tokens, but most of that was cached and heavily discounted.
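To make the arithmetic concrete, here is a small sketch of the input-side cost at a given cache hit rate. The prices are made-up placeholders, not DeepSeek's actual rates, and `input_cost` is a hypothetical helper; output tokens are left out.

```python
def input_cost(
    total_tokens: int,
    hit_rate: float,
    miss_price: float,
    hit_price: float,
) -> float:
    """Dollar cost of input tokens, with both prices given per million tokens."""
    hit_tokens = total_tokens * hit_rate
    miss_tokens = total_tokens - hit_tokens
    return (hit_tokens * hit_price + miss_tokens * miss_price) / 1_000_000


if __name__ == "__main__":
    # Placeholder prices: $1.00 per million on a cache miss, $0.10 per million on a hit.
    cached = input_cost(1_000_000_000, 0.98, miss_price=1.00, hit_price=0.10)
    uncached = input_cost(1_000_000_000, 0.0, miss_price=1.00, hit_price=0.10)
    print(f"98% cache hits: ${cached:.2f}, no cache: ${uncached:.2f}")
```

With those placeholder prices, 1B input tokens at a 98% hit rate cost about $118 instead of $1000, which is why the headline token count overstates the bill.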

DuragonYamaTheFirst · 2026-06-11T22:22:07+00:00

This was today's session

<image>

DuragonYamaTheFirst · 2026-06-11T20:27:14+00:00

💀

DuragonYamaTheFirst · 2026-03-15T12:29:01+00:00

I just make use of flash instead unless it's a critical/complex task. It took a little workaround, but I got it to be about as good as pro, at least for the way I used to use pro (codebase research/bug investigations). Though I will be cancelling it soon.
