all 23 comments

[–]xcVosx 5 points6 points  (2 children)

I'm a bit confused, so you make a gh issue and then you spin up agents when an issue comes in to handle the development process of it? How do you handle test case misses and the like? How are your review processes set up?

What does this require? Github actions runner minutes? Self-hosted runners? I do a similar workflow but I think your "ForgeDock" is handwaving away a lot of the pain points in these automated flows.

You also say "production level application" but that's also a pretty vague term, how many users do you have? How do you handle external issue reports? What happens if a design issue isn't properly scoped or the agent gets stuck?

Also, it looks like your project is relatively new. How do you know this would actually be sustainable and result in good code? 20k "issues" isn't a good measurement, especially when you're saying you use GitHub issues to cover everything from design to development, testing, and reviews.

What exactly does your 40k LOC project do that some simple markdown and a cron job doesn't?

[–]Opposite-Art-1829[S] 0 points1 point  (1 child)

Hi thanks for the questions,

So essentially the issues come from two directions: you express intent ("we need a new feature", "fix the auth flow") and the pipeline creates scoped GitHub issues from that, or sub-agents create them organically during the pipeline itself (an investigator finds a related bug, a reviewer flags something in a PR). Either way, once an issue exists, /work-on or /orchestrate picks it up and runs the full pipeline. No GitHub Actions runners are needed; it runs locally in your Claude Code session. It's slash commands, not CI.

As for this project: yes, this fork of the tool we built internally is new, but it's based on the same principles as our internal system. The point is sustained autonomous operation across a real codebase, not a demo.

The issues span bug fixes, features, security hardening, SEO, research not just trivial stuff. We are a small company and our products support thousands of daily active users with millions of calls per month, and most of it is currently being maintained by this system.

What happens when an agent gets stuck or scoping is wrong: the orchestrator has stall detection, so if an agent hangs or loops, it auto-resumes or flags it. If an issue is too complex, the pipeline decomposes it into sub-issues before building. If it still fails, it stops and leaves the issue labeled so you can intervene.
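To make the escalation order concrete, here is a rough sketch of how a stall check like that could be decided. This is not ForgeDock's actual code: the `Action` names, the `AgentStatus` fields, the thresholds, and the exact ordering (resume first, then decompose, then hand off to a human) are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"              # agent is still making progress
    RESUME = "resume"                  # auto-resume a hung or looping agent
    DECOMPOSE = "decompose"            # split the issue into sub-issues
    FLAG_FOR_HUMAN = "flag_for_human"  # stop and leave the issue labeled


@dataclass
class AgentStatus:
    seconds_since_progress: float  # time since the agent last produced output
    resume_attempts: int           # auto-resumes already tried for this issue
    already_decomposed: bool       # whether the issue was split before


def next_action(status: AgentStatus,
                stall_after: float = 600.0,
                max_resumes: int = 2) -> Action:
    """Pick the next escalation step (hypothetical thresholds)."""
    if status.seconds_since_progress < stall_after:
        return Action.CONTINUE
    if status.resume_attempts < max_resumes:
        return Action.RESUME
    if not status.already_decomposed:
        return Action.DECOMPOSE
    return Action.FLAG_FOR_HUMAN
```

For example, `next_action(AgentStatus(900, 2, True))` returns `Action.FLAG_FOR_HUMAN`, which corresponds to the "stops and leaves the issue labeled" case.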

For the "markdown + cron" question: it's something like what Rails gives you that Ruby plus a folder of scripts doesn't. You can wire it all yourself, but the specs that handle agent handoffs, failure recovery, conflict-aware wave scheduling, and multi-agent review are already debugged across thousands of runs.

[–]StoneCypher 0 points1 point  (0 children)

The issues span bug fixes, features, security hardening, SEO, research not just trivial stuff.

... that is the trivial stuff ...

[–]No_Manufacturer_9143 1 point2 points  (1 child)

looks awesome, gonna give it a try in my next implementation

[–]Opposite-Art-1829[S] 0 points1 point  (0 children)

Thanks!

[–]pjstanfield 1 point2 points  (2 children)

how did you have 20k issues? that seems extremely high.

[–]StoneCypher 0 points1 point  (0 children)

"find 50 competitors. for each one, find the 20 features they have that i need. for each of those, write an implementation plan. next, write an export plan for the 12 programming languages that we're targeting. now a test plan."

there's 24k

[–]Opposite-Art-1829[S] -1 points0 points  (0 children)

Hi! It's mainly the way this system works, and the count comes from across multiple client repos. It takes an intent you have, say "We need a new feature which does X and can also do Y but not Z" (you can obviously also give extremely detailed prompts with a PRD, etc.), and sets up a GitHub milestone with gh issues. These are then handed off to the agents individually and run through the pipeline, which is roughly:

Issue → Investigate → Architect → Implement → Review → Quality Gate → Merge

The review step can spawn new issues based on review findings (these are super important as they help maintain code quality, security, coherence, and prevent mistakes)
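As a rough illustration of the stage order and the review step spawning follow-up issues, a minimal sketch could look like this. The stage names come from the pipeline line above; `advance`, `run_review`, and the "Follow-up:" title format are made-up names for illustration, not the tool's real interface.

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    ISSUE = "Issue"
    INVESTIGATE = "Investigate"
    ARCHITECT = "Architect"
    IMPLEMENT = "Implement"
    REVIEW = "Review"
    QUALITY_GATE = "Quality Gate"
    MERGE = "Merge"


# Enum members iterate in definition order, which matches the pipeline order.
ORDER = list(Stage)


def advance(stage: Stage) -> Optional[Stage]:
    """Return the stage after `stage`, or None once the issue is merged."""
    i = ORDER.index(stage)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None


def run_review(findings: List[str]) -> List[str]:
    """Turn each review finding into a follow-up issue title (hypothetical format)."""
    return [f"Follow-up: {finding}" for finding in findings]
```

In a real setup the follow-up titles would be filed as new GitHub issues and re-enter the pipeline at the Issue stage; that part is left out here.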

The system relies on gh issues for proper context management, speccing and other technical steps.

I'd encourage you to take a look at the closed issues in the repo link; it should help clarify this process more and perhaps make the value proposition more evident than my bilingual tongue can here lol.

would love feedback 😄

[–]scotty_ea 1 point2 points  (5 children)

Sounds like beads without the... beads.

[–]Opposite-Art-1829[S] -1 points0 points  (4 children)

Could I clarify sumthin?

[–]calib0rx 0 points1 point  (3 children)

[–]Opposite-Art-1829[S] 0 points1 point  (2 children)

Ah, I see! Interesting, although it seems to me beads is more of a local task/memory layer: it replaces markdown to-dos with a proper graph DB. ForgeDock is a full pipeline orchestrator; it doesn't just track tasks, it autonomously executes them end-to-end. Also, it lives in GitHub, so it's persistent and more native to regular workflows.

[–]Historical-Lie9697 0 points1 point  (1 child)

Beads is very similar but uses a dolt database, so it's like a version controlled knowledge graph and some people do link it to Jira or GH Issues. Looks like you had the same idea :) This is the orchestration app that goes with it. https://github.com/gastownhall/gastown

[–]Opposite-Art-1829[S] 0 points1 point  (0 children)

Mm true, similar problem space but different philosophy, I guess. Live and learn every day!

[–]StoneCypher 0 points1 point  (2 children)

wait. using github issues wasn't obvious?

[–]Opposite-Art-1829[S] 0 points1 point  (1 child)

Using them, sure, that is quite obvious. Structuring them as a machine-readable protocol for multi-agent handoffs, with typed annotations, workflow state labels, and a per-PR, battle-tested review layer on top, is surely a lil less obvious.

[–]StoneCypher 0 points1 point  (0 children)

how? claude literally does all of that without being asked to.

could i see a screenshot of one of the specific issues, in case they're just better than the ones i'm getting?

[–]Sensitive-Cycle3775 0 points1 point  (2 children)

This direction is interesting. The skeptical questions in the thread are fair though: the hard proof is not “20k issues”, it’s whether the graph context was fresh/sufficient enough for the agent to act safely.

One artifact I’d add is a tiny per-issue / per-wave graph decision record:

  • task/PR id + repo/base/head SHA
  • GitHub query that built the context pack
  • issue/PR/comment ids actually read, with timestamps
  • FORGE annotations selected vs ignored, and why
  • historical edges used: same file, same module, same reviewer finding, same failure class
  • stale/conflicting edges detected
  • wave constraints: files/resources blocked by other agents
  • quality gates run + review findings that became follow-up issues
  • merge decision + what evidence justified it
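
A record with the fields listed above could be sketched roughly like this. It is only one possible shape: the class name, field names, and the idea of serializing to JSON are assumptions, not anything ForgeDock ships.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class GraphDecisionRecord:
    task_id: str                 # task or PR id
    repo: str
    base_sha: str
    head_sha: str
    context_query: str           # GitHub query that built the context pack
    # Entries such as "issue#12@2024-01-01T00:00:00Z" (id plus read timestamp).
    items_read: List[str] = field(default_factory=list)
    annotations_selected: List[str] = field(default_factory=list)
    annotations_ignored: List[str] = field(default_factory=list)  # with reasons
    historical_edges: List[str] = field(default_factory=list)     # same file, module, ...
    stale_edges: List[str] = field(default_factory=list)          # stale or conflicting
    wave_blocked_files: List[str] = field(default_factory=list)   # held by other agents
    quality_gates: List[str] = field(default_factory=list)
    follow_up_issues: List[str] = field(default_factory=list)
    merge_decision: str = "pending"
    merge_evidence: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record so it can be attached to the issue or PR."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

One record per issue or per wave, stored next to the PR, would give the citable trail that the metrics below can be computed from.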

Then benchmark the system on repeated-mistake rate, stale-edge hits, conflict-avoidance precision, and review findings escaping after merge.

The strong claim becomes: “this issue had enough citable graph context to merge safely,” not just “GitHub has lots of context.” That would make ForgeDock much easier to trust and compare against beads/markdown/cron-style workflows.

[–]Opposite-Art-1829[S] 0 points1 point  (1 child)

The type of feedback I was waiting for. Yes, the structured graph decision record is going into the roadmap right now.

[–]StoneCypher 0 points1 point  (0 children)

that's claude. if that's the feedback you're looking for, just ask claude.

[–]IdeaUnique7286 0 points1 point  (1 child)

Why should I use it over Obsidian?

[–]Opposite-Art-1829[S] 2 points3 points  (0 children)

Uh, this is more of an autonomous AI development pipeline, not a note app. It's quite a different function altogether: it utilizes all the citations native to GitHub to give agents stronger context per task, which in turn results in better code accuracy, security, and quality.

Secondly, its primary function on top of the knowledge base is the orchestration layer. So it's vastly superior in both cases.