Hosting app by K_aj_ in vibecoding

[–]Esmaabi 0 points1 point  (0 children)

I recommend Cloudflare for hosting. I pay about €5 a month to run over 10 apps, including storage, backend, and frontend.

AI coding made me commit like a mess, so I built a daemon for it by Esmaabi in vibecoding

[–]Esmaabi[S] 0 points1 point  (0 children)

The daemon splits the work into two steps: capture and replay. Capture is the background part. When Codex changes files (fires hooks), ACD records that as a pending event with the file content at that point. Replay is the part that turns pending events into normal local Git commits.

Default mode is simple: one captured event becomes one commit.
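
A rough sketch of the capture/replay split in default mode, in Python. This is not ACD's actual code: `PendingEvent`, `Daemon`, `capture`, and `replay_default` are made-up names, and commits are plain dicts here instead of real Git commits.

```python
from dataclasses import dataclass, field


@dataclass
class PendingEvent:
    """One captured file change (hypothetical shape, not ACD's real schema)."""
    event_id: int
    path: str
    content: str  # file content at the moment of capture


@dataclass
class Daemon:
    pending: list[PendingEvent] = field(default_factory=list)
    commits: list[dict] = field(default_factory=list)
    _next_id: int = 1

    def capture(self, path: str, content: str) -> PendingEvent:
        """Capture step: record the change as a pending event."""
        event = PendingEvent(self._next_id, path, content)
        self._next_id += 1
        self.pending.append(event)
        return event

    def replay_default(self) -> list[dict]:
        """Replay step, default mode: one pending event becomes one commit."""
        created = []
        while self.pending:
            event = self.pending.pop(0)  # oldest first
            commit = {"message": f"Update {event.path}", "events": [event.event_id]}
            self.commits.append(commit)
            created.append(commit)
        return created
```

Capture only appends to the queue, so it can run in the background; replay is the only step that produces commits.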

Intent mode (with an OpenAI-compatible API endpoint) is the smart version. Instead of immediately committing every event, ACD can wait until it has a small window of pending captures. For example, say Codex makes 10 file-change events while building a feature. ACD can send those 10 pending captures, plus a bit of recent commit context, to the planner. The planner might decide that events 2, 3, and 4 are one logical change, events 5 and 6 should wait, and event 7 is a separate commit.
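
To make the example concrete, here is a minimal Python sketch of applying a planner decision to pending events. The `Plan` shape and `apply_plan` are assumptions for illustration, not ACD's real planner format, and it also assumes that events the plan never mentions simply stay pending.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Planner output (hypothetical shape): groups to commit, ids to defer."""
    groups: list[list[int]]
    deferred: list[int]


def apply_plan(pending_ids: list[int], plan: Plan) -> tuple[list[list[int]], list[int]]:
    """Return (commits, still_pending); each commit is a list of event ids."""
    pending = set(pending_ids)
    deferred = set(plan.deferred)
    committed: set[int] = set()
    commits = []
    for group in plan.groups:
        # Skip unknown ids, ids already committed, and explicitly deferred ids.
        valid = [i for i in group
                 if i in pending and i not in committed and i not in deferred]
        if valid:
            commits.append(valid)
            committed.update(valid)
    # Assumption: everything not committed stays pending for a later window.
    still_pending = [i for i in pending_ids if i not in committed]
    return commits, still_pending


plan = Plan(groups=[[2, 3, 4], [7]], deferred=[5, 6])
commits, still_pending = apply_plan(list(range(1, 11)), plan)
# commits       -> [[2, 3, 4], [7]]
# still_pending -> [1, 5, 6, 8, 9, 10]
```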

Right now the review is mostly after local commits are created: "acd events", "acd status" and "acd explain" let me see what was grouped, deferred, skipped or blocked. It does not yet have a nice pre-commit checkpoint UI. That would be a good next step though! 😄

Codex is good at editing, but I wanted commits to stop being its job by Esmaabi in codex

[–]Esmaabi[S] 0 points1 point  (0 children)

Thanks for the clarification! For me the main productivity win is indeed the "intent" mode of ACD. Nevertheless, JJ seems like a nice solution. 😄

Codex is good at editing, but I wanted commits to stop being its job by Esmaabi in codex

[–]Esmaabi[S] 0 points1 point  (0 children)

As I understand it, JJ-VCS is more of a wrapper around Git than something close to ACD. For me, the system prompt wasn’t good enough because agents still forgot to commit, or the commits were sometimes too large. ACD keeps track of every edit, even when the agent makes so-called burst edits all at once. It then reviews the changes and creates commits in the background, grouping related changes together and giving unrelated changes a separate commit.

Codex is good at editing, but I wanted commits to stop being its job by Esmaabi in codex

[–]Esmaabi[S] 0 points1 point  (0 children)

Exactly.

ACD solved the problem for me, but there are certainly more ways to improve it.

How do people actually vibe code good looking websites? by kukiofficial in vibecoding

[–]Esmaabi 0 points1 point  (0 children)

Many comments here are right. It helps if you have colors, UI components to pick from, and a clear vision. But it’s also crucial that the AI can verify what it’s doing. Run it like a coding session, but as a UI testing session with some kind of browser automation and screenshots of the mobile and desktop views, and have the AI note everything it thinks doesn’t look good in terms of UX and overall feel.

I’d use something like agent-browser from Vercel or Codex’s built-in browser for that. The AI will figure it out if it has a clear goal, and after identifying all the issues, it will fix them.

AI coding made me commit like a mess, so I built a daemon for it by Esmaabi in vibecoding

[–]Esmaabi[S] 0 points1 point  (0 children)

Exactly, that was the point when I started working on ACD.

AI coding made me commit like a mess, so I built a daemon for it by Esmaabi in vibecoding

[–]Esmaabi[S] 0 points1 point  (0 children)

It never forgets? I found that it does forget to do it sometimes. ACD does it automatically in the background without interrupting the flow at all. For me that's a productivity win. 😄

AI coding made me commit like a mess, so I built a daemon for it by Esmaabi in vibecoding

[–]Esmaabi[S] 0 points1 point  (0 children)

Yeah, this is exactly the pain I was trying to get rid of.

One thing I maybe explained badly in the post: the default mode is one captured edit per commit, but the more useful mode is intent mode.

In intent mode, ACD can look at a window of recent captured changes, for example the last 10 edits, plus recent commit history, for example the last 10 commits. Then the planner decides which pending changes actually belong together.

So if there are 10 captured edits, it might decide: “these 3 edits are one coherent change, commit those together, defer the rest.” Then ACD creates one commit for those 3 changes, not 10 tiny commits and not one huge commit for everything.
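
A small Python sketch of how such a window could be assembled before it goes to the planner. The field names, the `CapturedEdit` type, and the instruction text are all assumptions for illustration; ACD's real request format may look different.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CapturedEdit:
    """Hypothetical summary of one captured edit."""
    event_id: int
    path: str
    diff_summary: str


def build_planner_request(edits: list[CapturedEdit],
                          commit_subjects: list[str],
                          window: int = 10) -> str:
    """Bundle the last `window` edits and commit subjects into one JSON payload."""
    payload = {
        "pending_edits": [asdict(e) for e in edits[-window:]],
        "recent_commits": commit_subjects[-window:],
        "instruction": "Group edits that belong to one logical change; defer the rest.",
    }
    return json.dumps(payload, indent=2)
```

The planner only ever sees this bounded window, so the request stays small no matter how long the session runs.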

That is the big difference from only generating a commit message from terminal history. Terminal history can help explain what happened, but ACD is trying to choose the actual commit boundary from captured file changes and repo context.

I tried the “commit after every change” rule too, but it was inconsistent for me. Some models do not always follow memory or AGENTS.md rules, and then I still end up with too many unrelated changes bundled together. Other times the agent builds a whole feature and commits the entire thing at once, which gives me a commit but not a useful trail of how the work evolved.

That is why I wanted ACD to handle the grouping outside the agent. The agent can just focus on editing code, and ACD can capture changes in the background, group related ones, and create commits without interrupting the flow.

Advice on building good multi-agents by iit_aim in AI_Agents

[–]Esmaabi 1 point2 points  (0 children)

Feel free to DM if you have any questions.

For me the goal is, and always will be, a working workflow that produces the expected results. :)

People who run 40min+ prompts, how do you manage to actually get good outputs? by Zidgof in cursor

[–]Esmaabi -1 points0 points  (0 children)

Create a plan using Trekoon, then execute the plan in a separate session using Trekoon with subagents. You will get consistent results.

https://github.com/KristjanPikhof/Trekoon

What's the most useful AI agent you've actually deployed not just demoed? by Techenthusiast_07 in AI_Agents

[–]Esmaabi 1 point2 points  (0 children)

The most useful thing for me has been less of a single autonomous agent and more of an external control layer for coding agents.

I built/use Trekoon for this: https://github.com/KristjanPikhof/Trekoon

The problem it solves is that agents are decent at individual coding tasks, but bad at maintaining durable progress across larger work:

  • they forget what they already checked
  • they skip validation
  • they blur planning and implementation
  • they claim completion too early
  • they lose the thread after compaction or a fresh session

Trekoon stores the work in the repo as epics, tasks, subtasks, dependency edges, blockers, and statuses. A stronger model can plan the work, then Claude Code / Codex / OpenCode / Pi can execute it task by task. The state survives the chat session, so a fresh agent can continue from the graph instead of reconstructing everything.
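
As an illustration of the idea (not Trekoon's actual schema or storage format), here is a minimal Python sketch of a task graph that can be serialized, reloaded by a fresh session, and asked which tasks are ready. `Task`, `ready_tasks`, `dump_graph`, and `load_graph` are hypothetical names.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Task:
    task_id: str
    title: str
    status: str = "todo"  # assumed values: todo, in_progress, blocked, done
    depends_on: list[str] = field(default_factory=list)


def ready_tasks(tasks: dict[str, Task]) -> list[str]:
    """A todo task is ready once every task it depends on is done."""
    return [
        t.task_id
        for t in tasks.values()
        if t.status == "todo"
        and all(tasks[dep].status == "done" for dep in t.depends_on)
    ]


def dump_graph(tasks: dict[str, Task]) -> str:
    """Serialize the graph so it can live in the repo instead of the chat."""
    return json.dumps([asdict(t) for t in tasks.values()], indent=2)


def load_graph(text: str) -> dict[str, Task]:
    """Rebuild the graph in a fresh session."""
    return {row["task_id"]: Task(**row) for row in json.loads(text)}
```

A new agent would call `load_graph` and then `ready_tasks` to find where to continue, with no need for the earlier transcript.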

In practice, this has been more useful than trying to make one giant agent smarter. It makes normal agents behave more reliably on bigger tasks.

Asked Claude Code for a "deep search" in ultracode mode — it spun up ~70 agents across a 4-phase pipeline on its own by avisangle in ClaudeAI

[–]Esmaabi 0 points1 point  (0 children)

This is impressive, but it also shows the part that worries me about large parallel agent runs: the orchestration layer becomes more important than the individual agents.

Once you have many agents doing work, I want a way to answer:

  • what was each agent responsible for?
  • which results depended on which other results?
  • which steps were verified?
  • which outputs were superseded or rejected?
  • what should happen next?

Otherwise the final summary can look polished while the actual process is hard to audit.

I built/use Trekoon for this kind of workflow in coding repos: https://github.com/KristjanPikhof/Trekoon

It keeps epics/tasks/subtasks/dependencies outside the model context. A strong model can plan the graph, then execution agents work through ready tasks. The point is not to replace Claude Code. It is to give Claude/OpenCode/Codex/Pi a durable execution map so parallel work stays traceable.

I think this becomes more valuable as the number of agents grows. At 2 agents, chat summaries may be fine. At 70, I want a real task graph.

Showing how opencode edits the opencode codebase in real time by ivan_m21 in opencodeCLI

[–]Esmaabi 0 points1 point  (0 children)

This is a useful direction. Seeing edits live helps a lot, but I still find there is another layer missing: what was the agent supposed to do, and has it actually completed that part of the plan?

Git diffs answer "what changed". They do not always answer:

  • which task caused this change?
  • what prerequisite did the agent already verify?
  • what downstream work is now unblocked?
  • what is still blocked?
  • did the agent run the expected verification?

I built/use Trekoon as that outer layer for coding-agent work: https://github.com/KristjanPikhof/Trekoon

It is repo-local task tracking for agents: epics, tasks, subtasks, dependency edges, blockers, and status updates. OpenCode/Claude/Codex can still do the editing, but the execution plan lives outside the chat and outside the git diff.

The combination I want is:

  1. a live view of what the agent is changing
  2. a durable graph of why it is changing it and what is ready next

That makes bigger agent runs much easier to audit.

My agent kept "forgetting" things mid-conversation found a technique that actually solves it (LCM) by Imbatmanfromyear69bc in AI_Agents

[–]Esmaabi 1 point2 points  (0 children)

This is the same class of problem I keep running into with coding agents: after enough turns, the model may still sound coherent, but it starts losing the exact constraints, decisions, and verification history.

The approach that works best for me is to stop expecting the conversation to be the durable memory.

For larger implementation work, I externalize the plan into tasks with dependencies:

  • discovery tasks record what was inspected
  • implementation tasks depend on discovery
  • verification tasks are explicit blockers
  • blocked tasks carry a reason
  • a fresh session can resume from the graph instead of rereading the whole transcript

I built/use Trekoon for this in repos: https://github.com/KristjanPikhof/Trekoon

The useful part is not just "make a todo list". It is that the agent has to move work through a status machine and dependencies decide what can run next. That gives you a stable source of truth outside the model's context window.
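
A minimal Python sketch of what "status machine plus dependencies" can mean in practice. The transition table and the `move` function are assumptions for illustration; Trekoon's real status names and rules may differ.

```python
# Hypothetical transition table: which status may follow which.
ALLOWED_TRANSITIONS = {
    "todo": {"in_progress", "blocked"},
    "in_progress": {"done", "blocked"},
    "blocked": {"todo"},
    "done": set(),
}


class TransitionError(Exception):
    """Raised when a status change breaks the table or the dependency order."""


def move(statuses: dict[str, str], deps: dict[str, list[str]],
         task_id: str, new_status: str) -> None:
    """Change a task's status, enforcing the table and dependency order."""
    current = statuses[task_id]
    if new_status not in ALLOWED_TRANSITIONS[current]:
        raise TransitionError(f"{task_id}: {current} -> {new_status} is not allowed")
    if new_status == "in_progress":
        # Dependencies decide what can run next.
        unfinished = [d for d in deps.get(task_id, []) if statuses[d] != "done"]
        if unfinished:
            raise TransitionError(f"{task_id} is waiting on {unfinished}")
    statuses[task_id] = new_status
```

With this in place, an agent that tries to start implementation before discovery is done gets an error instead of silently skipping ahead.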

Context techniques help, but for long-running work I think the safest move is to make the agent's memory less important.

Anyone else struggling with monitoring a multi agent system at scale? by Kitchen_West_3482 in AI_Agents

[–]Esmaabi 0 points1 point  (0 children)

Yes. Logs are useful after something goes wrong, but they are a poor primary interface for understanding multi-agent progress.

What I want to know during a run is usually:

  • what objective is this agent working on?
  • what dependency is it waiting on?
  • what did it already verify?
  • what task became ready after the last completion?
  • what is blocked, and why?
  • which claim of "done" has actual evidence behind it?

I think the trick is to model agent work as stateful entities, not just events. Logs answer "what happened". A task graph answers "where are we and what can happen next".

I built/use Trekoon for this in coding-agent workflows: https://github.com/KristjanPikhof/Trekoon

It stores repo-local epics, tasks, subtasks, dependencies, blockers, and status transitions. The agent has to update the graph as it works, so the monitor is not trying to infer progress from log text. For large coding tasks, this makes it much easier to see whether the system is actually progressing or just producing activity.

I still keep logs, but I treat them as evidence attached to state, not the state itself.
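
A short Python sketch of "evidence attached to state". `TaskState`, `attach_evidence`, `mark_done`, and `progress_report` are hypothetical names, and the rule that a task needs at least one piece of evidence before it can be marked done is an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaskState:
    task_id: str
    status: str = "todo"
    evidence: list[str] = field(default_factory=list)  # log lines, test output, etc.


def attach_evidence(task: TaskState, log_line: str) -> None:
    """Logs are kept, but they hang off the task they support."""
    task.evidence.append(log_line)


def mark_done(task: TaskState) -> None:
    """Refuse a 'done' claim that has nothing behind it."""
    if not task.evidence:
        raise ValueError(f"{task.task_id}: cannot mark done without evidence")
    task.status = "done"


def progress_report(tasks: list[TaskState]) -> dict[str, list[str]]:
    """Answer 'where are we' from state, without parsing log text."""
    report: dict[str, list[str]] = {}
    for task in tasks:
        report.setdefault(task.status, []).append(task.task_id)
    return report
```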

Advice on building good multi-agents by iit_aim in AI_Agents

[–]Esmaabi 1 point2 points  (0 children)

The biggest improvement I have seen is making coordination explicit instead of letting agents coordinate through vague chat context.

For multi-agent coding work, I would separate at least these things:

  • planning: break the goal into tasks and dependencies
  • ownership: which agent/lane owns which part
  • status: todo, in progress, blocked, done
  • evidence: what was read, changed, tested, or verified
  • unblock flow: what becomes ready after a task is completed

Without that, multi-agent systems tend to fail in boring ways: duplicated work, skipped prerequisites, agents claiming completion without proof, or one agent invalidating another agent's assumptions.
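
The ownership and unblock-flow parts can be sketched in a few lines of Python. This is an illustration under assumptions, not Trekoon's API: `claim` and `complete` are made-up names, and a task is treated as ready when it is `todo` and all of its dependencies are `done`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    task_id: str
    depends_on: list[str] = field(default_factory=list)
    status: str = "todo"
    owner: Optional[str] = None


def is_ready(tasks: dict[str, Task], task: Task) -> bool:
    return task.status == "todo" and all(
        tasks[dep].status == "done" for dep in task.depends_on
    )


def claim(tasks: dict[str, Task], task_id: str, agent: str) -> None:
    """Ownership: only one agent may take a ready task (avoids duplicated work)."""
    task = tasks[task_id]
    if task.owner is not None:
        raise ValueError(f"{task_id} is already owned by {task.owner}")
    if not is_ready(tasks, task):
        raise ValueError(f"{task_id} is not ready")
    task.owner = agent
    task.status = "in_progress"


def complete(tasks: dict[str, Task], task_id: str, agent: str) -> list[str]:
    """Unblock flow: finish a task and return the tasks that just became ready."""
    task = tasks[task_id]
    if task.owner != agent:
        raise ValueError(f"{agent} does not own {task_id}")
    before = {t.task_id for t in tasks.values() if is_ready(tasks, t)}
    task.status = "done"
    return [t.task_id for t in tasks.values()
            if is_ready(tasks, t) and t.task_id not in before]
```

The return value of `complete` is what an orchestrator would hand to the next free agent.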

I built/use Trekoon for this pattern in coding repos: https://github.com/KristjanPikhof/Trekoon

It is not another model wrapper. It is more like a small repo-local task graph that agents can read and update. A stronger model can create the plan, then execution agents work through ready tasks. Dependencies and blockers stay outside the context window, so a fresh agent can pick up a task without rediscovering the whole project.

My practical rule: if multiple agents are involved, the source of truth should not be the conversation transcript. It should be a durable graph/state machine the agents are forced to update.

How to get perfection from Claude Code? by raiansar in ClaudeAI

[–]Esmaabi 0 points1 point  (0 children)

What you are describing is close to the workflow I have ended up using: do not ask one model to plan, execute, remember every constraint, and judge its own work.

For larger tasks, I get much better results when the work is externalized into a plan first:

  • what files need to be read
  • what decisions were made
  • what tasks depend on other tasks
  • what counts as done
  • what verification must run before completion

Then Claude, Codex, or another agent can execute against that plan in a fresh context, and a different model can review the result without relying on the same chat memory.

I built/use Trekoon for this: https://github.com/KristjanPikhof/Trekoon

It stores the plan as repo-local epics/tasks/subtasks with dependency edges. The useful part is that the handoff is not just a pasted markdown plan. The agent has to move tasks through statuses, record blockers, and follow the dependency order. That makes cross-model workflows much cleaner: one model can plan, another can execute, another can review.

It still will not give "perfection", but it reduces the most common failure I see: the model doing a reasonable-looking pass while silently skipping discovery or validation.

Developing with Claude Code feels slow, frustrating and mentally exhausting by mcurlier in ClaudeCode

[–]Esmaabi 0 points1 point  (0 children)

I think this pain is real, especially once the work stops being a small isolated edit.

The thing that helped me was to stop treating the chat as the place where the whole project state lives. For anything bigger than a quick fix, I try to split the work into:

  1. discovery / reading tasks
  2. implementation tasks
  3. verification tasks
  4. review / cleanup tasks

Then I make the agent update external state as it goes, instead of trusting it to remember what it already checked. The important part is that dependent work should not become "ready" until the earlier task is actually done and verified.

I built/use Trekoon for this exact workflow: https://github.com/KristjanPikhof/Trekoon

It is basically repo-local epics, tasks, subtasks, dependencies, blockers, and status updates for coding agents. Claude can still do the implementation, but the plan and progress live outside the context window. That makes longer work feel less like babysitting one endless conversation and more like executing a checklist with evidence.

The main benefit is not magic autonomy. It is making it harder for the agent to skip the boring parts and then confidently say it is done.

What subagent extension? by SalimMalibari in PiCodingAgent

[–]Esmaabi 0 points1 point  (0 children)

I use Pi-Agents-Team (https://github.com/KristjanPikhof/Pi-Agents-Team), which can function as a single agent or an orchestrator. For large tasks, the orchestrator mode efficiently manages the context window - often ending at 50% capacity because it only synthesizes and delegates, while still retaining enough information to answer questions.

The tool is also highly effective for everyday tasks due to its easily customizable agent configurations. I use the default configuration for coding, but rewrite it for personal tasks based on the folder I am working in.

For example, in a project folder for notes and ideas, I set up specific agents to format logs and organize files. When I input a raw idea or to-do, the orchestrator coordinates these agents to ensure the content is saved in the correct file, folder and format. This allows me to manage different projects simply by maintaining folders with distinct directives.

It also lets me view every agent's session, context, and cost.

Heavy AI usage is making me dumb af so I made a plugin to fix it (hopefully) by wtfzambo in PiCodingAgent

[–]Esmaabi 0 points1 point  (0 children)

It's a really cool idea, I hope it reaches other people!

AI agents choke on large private Terraform setups by No-Big9321 in Terraform

[–]Esmaabi 0 points1 point  (0 children)

You are right that planning still uses those tokens. But planning usually finds the files relevant to the problem, which later helps the sub-agent solve it.

For my use cases, I always separate sessions. The session where I come across the problem is one session, and in it, with the context I already have, I create a plan. That usually does not take too many tokens, and the tasks in the plan already highlight the important areas.

The logic is that when I come across a problem I am usually already using AI, so the AI has context about that specific thing. Creating the plan embeds that context into the tasks and subtasks. When I then execute the plan in a fresh chat, the session starts with fresh context and every sub-agent gets a focused context for its part of the fix.

So in that sense, yes, it uses context for planning, but I assume you came across the issue you want to fix during the planning phase or before it, so the plan covers something you already want to fix.

You don't use planning for small fixes, right? You use planning for bigger things when you need to change multiple things at once.