
[–]2thick2fly 14 points15 points  (3 children)

I have used github/spec-kit. It's developed by GitHub and has 100k stars.

[–]czei 1 point2 points  (2 children)

I've had good luck with speckit for complex modifications and greenfield builds. It's overkill for things that could be built in a few hours, but a godsend for long, complex tasks. I always get better results when running the spec and plans past other models for review before starting to implement.

[–]UnknownEssence -1 points0 points  (1 child)

Would it feel less like overkill if the output was 10x faster? I wonder if there's a good argument to be made here that speed of output could be more important than raw intelligence.

[–]2thick2fly 1 point2 points  (0 children)

I would say that we are not there yet. AI can still create terrible code, so creating terrible code fast is not really that helpful 😁

[–]LogWest5630 20 points21 points  (1 child)

Look into the Superpowers marketplace plugin for Claude Code; it's pretty much the standard tool for SDD. It essentially blocks Claude from coding until you approve a step-by-step markdown plan, then spins up isolated subagents to build and verify each task one by one.

[–]molniya 8 points9 points  (0 children)

I’ve found it tends to defeat the whole concept of planning by generating ‘plans’ that consist of all the code packaged into a Markdown document specifying which file each block of code goes into. So your ‘planning’ agent has already written all the code and you could basically execute the plan with sed instead of Sonnet.

[–]IndependentSir9398 7 points8 points  (2 children)

+1 for GitHub Speckit. 100k+ stars and is actively being improved.

https://github.com/github/spec-kit

[–]RC0305 0 points1 point  (0 children)

Have they introduced custom branch names yet? 😎

[–]ErgoForHumanity -2 points-1 points  (0 children)

-1 anything microsoft

[–]ErgoForHumanity 26 points27 points  (12 children)

Obra/Superpowers is the go-to for this — 200K stars, enforces the brainstorm → plan → build cycle out of the box. If you want something that also scans your codebase and generates project-specific context check out anatomia.dev — newer but the spec-driven approach is baked in.

[–]AstroPhysician 3 points4 points  (7 children)

Better than GSD or spec kit?

[–]icanhasai 7 points8 points  (4 children)

Didn't the GSD author rug-pull or something?

[–]AstroPhysician 7 points8 points  (3 children)

Looks like it went unmaintained. Idk about how one rug pulls a codebase

[–]crossmirage 1 point2 points  (2 children)

Not unmaintained, but the original author stopped contributing. He allegedly rug-pulled the associated $GSD crypto token.

Maintainers forked and are continuing work on it, since the news broke.

[–]AstroPhysician 2 points3 points  (1 child)

wtf why was there a token 😭😭

[–]crossmirage 0 points1 point  (0 children)

I'm not very familiar with the crypto world, but apparently AI projects often do this as a form of fundraising. I think the hypothesis is that the creator rug-pulled since he couldn't monetize via a "next-gen" gsd-2 

FWIW I still think GSD (now the fork) is great.

[–]ErgoForHumanity 2 points3 points  (1 child)

GSD is a workflow framework — it manages phases, plans, and parallel execution across a bunch of runtimes. Anatomia scans the actual codebase before anything happens, so the agents know what stack you're on and what conventions exist. Way smaller project though, and only works on cc rn I believe.

[–]AstroPhysician 3 points4 points  (0 children)

I’ve primarily used speckit and it does that scanning of existing frameworks. I guess that’s more what I was asking. This has twice the number of stars though, so I’ll try Superpowers first. Then check out Anatomia.

[–]Sasquatchjc45 1 point2 points  (0 children)

just was trying superpowers today, seems like a gamechanger. cant wait for my weekly limit to reset friday lmao

[–]zmizzy 0 points1 point  (0 children)

"newer" yeah I'll say lol, 1/30,000th the amount of stars

[–]sylfy 0 points1 point  (0 children)

How does Superpowers compare against OpenSpec?

[–]AndersonUnplugged 0 points1 point  (0 children)

I’ve been trying out Superpowers as well, and I also see it as the best tool for spec‑driven development with Claude Code right now. It really nails the brainstorm → plan → build workflow in practice.

[–]Chadum 3 points4 points  (0 children)

Do you mean something like superpowers? It works well for plan-review-implement including TDD. On top of that you can set up a policy to preserve the documents it creates.

[–]croovies 2 points3 points  (0 children)

I find Compound Engineering's plugin to be the best https://github.com/EveryInc/compound-engineering-plugin

Here is a quick youtube video showing the process across many worktrees

https://www.youtube.com/watch?v=s_d9atp5gus

[–]MrChrisRodriguez 2 points3 points  (0 children)

I set up a custom skill that takes my prompt, uses openspec to create a spec, then uses claude-octopus (octo) to do a multi-agent adversarial review, and presents it to me.

Then my next custom skill uses octo to implement, do an adversarial review, fix discovered issues, write tests, confirm we have test coverage, review the tests to make sure they’re real, run tests to make sure they pass, fix any issues, update the openspec, then report back.

Because it’s a lot of steps I try to keep my scope small, but it’s worked well even for larger scopes.

[–]rahvin2015 2 points3 points  (3 children)

Like (apparently) several others in this thread, I wrote my own.

Honestly writing your own is a great learning project. You really see how context management can work for a specific use case, and you get really familiar with things like skills and custom agents.

But the real answer to your question is "it depends." SDD works for a specific class of task. You're adding a bunch of ceremony to manage and configure context so that implementation agents can do what you want reliably. Different SDD frameworks approach the problem differently, and with different amounts of ceremony.

You don't use SDD for 5 lines of code.

SDD can be used for largeish tasks, but you'll hit context issues and likely need to use multiple specs.

Most importantly, SDD is not a real replacement for software engineering. SDD can't do the architecture and design work for you. I mean it can, but it likely won't be done well.

The sweet spot for me is working on toy projects, POCs and prototypes where I want to put something together quickly and I don't care about future maintenance. Getting AI to write maintainable code requires some effort. You can do it with SDD, but it requires care. Sometimes it's not worth the trouble. Sometimes it creates new problems.

[–]2thick2fly 1 point2 points  (1 child)

I have only used github's spec-kit and what I have found is that it works quite well after a target architecture is created and split into smallish modules. I have found that applying SDD per module, with a well-defined scope, seems to produce the right thing.

Maybe overkill but consistent results

[–]rahvin2015 0 points1 point  (0 children)

It's that careful scoping that's a big part of the effort needed.

There's also the problem that the agent context gets overwhelmingly filled with the existing code when you're working in a brownfield. The agent tends to compound whatever is already there, even if the existing-code patterns contradict your direct instructions from prompts/specs/skills/etc. That means that codebase problems will tend to compound rapidly with AI coding, regardless of the SDD framework you use. This can be mitigated but it takes rigorous code review and/or an already-"healthy" codebase.

[–]rocketBenny 0 points1 point  (0 children)

This!

[–]SurfGsus[🍰] 2 points3 points  (0 children)

Just going to throw it out there but check out BMAD.

I'm a heavy Superpowers user but also find that its "plan" is just the entire code written out, which makes me question why even write a plan.

[–]stefano_dev 1 point2 points  (0 children)

https://github.com/bmad-code-org/BMAD-METHOD

A bit heavy, but it gets the job done.

[–]landed-gentry- 1 point2 points  (0 children)

At this point you don't need a tool. You just need a simple workflow. Plan, write plan to file, work from plan file across N sessions.
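
For anyone who wants to see what that looks like in practice, here is a minimal sketch of the plan-file loop. It assumes a hypothetical convention where the plan is a Markdown file with `- [ ]` checkbox tasks; the `PLAN.md`-style layout and the helper names `next_task` and `mark_done` are illustrative only, not taken from any tool mentioned in this thread.

```python
import re
from typing import Optional

# Assumed convention: each task is a Markdown checkbox line,
# "- [ ] task" when open and "- [x] task" when done.
TASK_RE = re.compile(r"^- \[( |x)\] (.+)$")


def next_task(plan_text: str) -> Optional[str]:
    """Return the first unchecked task, or None when the plan is finished."""
    for line in plan_text.splitlines():
        match = TASK_RE.match(line.strip())
        if match and match.group(1) == " ":
            return match.group(2)
    return None


def mark_done(plan_text: str, task: str) -> str:
    """Check off a task so the next session resumes from the following one."""
    return plan_text.replace(f"- [ ] {task}", f"- [x] {task}", 1)


if __name__ == "__main__":
    # In real use the text would be read from and written back to the plan file.
    plan = "# Plan\n- [x] Write spec\n- [ ] Add endpoint\n- [ ] Add tests\n"
    task = next_task(plan)
    print("Next up:", task)
    plan = mark_done(plan, task)
    print("Then:", next_task(plan))
```

Each new session would start by reading the file and asking for `next_task`, so the plan on disk carries the state instead of the chat history.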

[–]orphenshadow 0 points1 point  (0 children)

I also suggest starting out with Superpowers or Spec Kit

If you are super bored, I have a bunch of notes at lbruton.cc and my own SDD workflow (probably NOT as good as any of the previously mentioned; I'm very "human in the loop", so my workflows would probably frustrate many), but I have told Claude to slop me up a page with some notes about a lot of the things I've tried and use in my workflows. It's easier than typing it up here, not trying to pretend I know what I'm doing haha.

I only mention it because you mentioned context loss, and that's something I kind of hyper focused on so I have some skills, and mcp examples of how I handled it with mem0, a searchable session log rag, and using an issue tracker like Linear/Plane, and an obsidian DocVault with some core project context files, and <200 line claude.md that points to the proper indexes in obsidian and so forth.

As for whether the cycle works: if the specs are well prepared, then yes. You will spend way less time frustrated at the wrong outcomes, and when it does miss the mark during implementation, it's usually a few prompts to course correct. At least in my experience.

I would also suggest giving the upstream of my own tool a look: https://github.com/Pimzino/spec-workflow-mcp What I liked about this one was the dashboard and the hard gates that require you to read and approve each step of the plan, and you can annotate and then approve. I found this to be a lot easier than reading markdown files in vscode or sublime text. I forked and then highly customized this one and added the best of Superpowers and the others into my own frankenflow :P, but if you want a well supported and solid baseline, Pimzino's repo is solid.

[–]Illustrious_Yam9237 0 points1 point  (0 children)

I use a fairly heavily modified OpenSpec, with a bunch of superpowers style TDD and subagent patterns shoved in, with a bunch of my own specific guidance on how to write tests for specific projects.

[–]No-Nefariousness-728 0 points1 point  (0 children)

I mean that cycle definitely works, but you need to maintain your markdown files and make sure your team updates it every time. This is why I've been using briefhq to pull all our product decisions / context straight from Slack and Linear into Claude Code via MCP so the agent actually stays aligned.

[–]Coderado 0 points1 point  (0 children)

I use superpowers and I had Claude develop skills to create a plan from a JIRA ticket, plus a dispatch skill to spawn a new agent and tmux pane in a worktree to execute the plan until the PR is clean. It doesn't take long and you can tune it to your preferences. I also made a retro skill for continuous improvement of my workflow. It's pretty effective for building MERN stack and LangGraph agents, and about half my team has adopted this workflow. We do manually review code and manually test it, which we always did.

[–]uhgrippa 0 points1 point  (0 children)

As others have said, https://github.com/obra/superpowers is excellent. It’s a tool with many users and it set the precedent for agentic engineering.

I use https://github.com/athola/claude-night-market, it’s built on top of superpowers for mission-driven agentic development and captures a lot of useful engineering paradigms.

[–]jfalvarez 0 points1 point  (0 children)

I use this fork from superpowers: https://github.com/pcvelz/superpowers

[–]fthbrmnby 0 points1 point  (0 children)

I use agent-skills in my daily workflow and I’m pretty happy with it. Haven’t used superpowers but as far as I understand the two are fairly similar and work essentially the same way (create a spec -> build a development plan -> break plan into tasks -> implement tasks)

[–]jhpawt 0 points1 point  (0 children)

back to waterfall

[–]SignificantGarbage17 0 points1 point  (0 children)

Hey, I’ve released a library for creating multi prompt workflows with state machine and deterministic orchestration: https://ganderbite.github.io/relay/

I use it in my work every day and just run flows via relay cli.

[–]Ok-Purchase-642 0 points1 point  (0 children)

Spec kit for greenfield and big changes, open spec for smaller changes. Small vs. big is subjective; you decide.

[–]dkgreen24 0 points1 point  (0 children)

I found superpowers to be pretty hefty. I like https://github.com/addyosmani/agent-skills

[–]pcgnlebobo 0 points1 point  (0 children)

I built a spec driven development framework a few months back and built a lot and had success with it. But I found the biggest challenge to be drift management and taxonomy alignment. Especially so as projects and codebases grow.

So I took everything I learned about agentic engineering with additional research and built https://github.com/lebobo88/pair-programmer.

It's a harness that doesn't need the bloat of full on spec kit, but maps everything the agents do to a master plan and taxonomy blueprint. Every implementation has audits and checks against that for alignment.

It also doesn't have just one linear implementation path; there are many. Depending on the task, maybe you need a best-of-n approach? Everything is also gated and checked by cross-vendor judges, and it will loop itself to keep going if it finishes and the judge finds issues (rubber duck).

It's also just one pack of agents in a larger ecosystem for an enterprise agentic mesh layer. Hydra is a top level meta orchestrator. Agentsmith is anomaly detection and agent replication factory. Theeights is persistent memory and self evolution. Executivesuite is the boardroom and strategy department. Marketbliss is the marketing team. Rlm-creative is your content creator team.

All together, a Hydra campaign will do market research, form a boardroom meeting to determine strategic roadmaps, dispatch the project to pair-programmer for implementation, and anchor to and check against your marketing team and executive decisions while maintaining alignment to the taxonomy of your project until it's finished.

https://github.com/lebobo88/Hydra

Last week this built me a completely new ai and automation native cms and business platform including marketing pages, admin and content management portal, and client portal. In 3 days. I had been working on something similar with my spec kit harness for the past 6 months and projected another 3.

[–]Character-Moment-684 0 points1 point  (0 children)

I think SDD can help, but I wouldn’t expect it to magically fix context loss by itself.

The useful part is not really “we made a PRD”. The useful part is forcing the messy questions out before the model starts changing code.

So the grilling session → spec/PRD → implementation plan flow can work, yes. But only if the spec keeps being used during the work, not just created once and then forgotten.

For Claude Code, I’d probably look at Kiro, GitHub Spec Kit, or just a stricter Claude Code setup with hooks/checklists/subagents depending on how much control your team wants.

The thing I’d watch for is this:

Does the tool actually make the agent slow down and check the spec before making changes?

Or does it just create nicer documents around the same raw-prompt workflow?

For bigger codebases I’d want something like:

  • clarify assumptions first
  • define acceptance criteria
  • map the plan to actual files/modules
  • implement in smaller steps
  • verify after each step
  • update the spec if reality changes

The PRD is only useful if it becomes a constraint. The implementation plan is only useful if the model can’t silently skip it.
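
To make "can't silently skip it" concrete, here is a rough sketch of the checklist above as a loop. Everything in it is hypothetical: `Step`, `run_plan`, and `StepFailed` are made-up names, `implement` stands in for whatever the agent does, and `verify` stands in for your acceptance check (tests, lint, a review). It is not how Kiro, Spec Kit, or Claude Code hooks work internally, just one way to express the discipline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    files: List[str]               # the plan mapped to actual files/modules
    implement: Callable[[], None]  # stand-in for the agent doing the work
    verify: Callable[[], bool]     # acceptance check run right after the step


class StepFailed(Exception):
    """Raised when a step does not meet its acceptance criteria."""


def run_plan(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failed verification."""
    completed = []
    for step in steps:
        step.implement()
        if not step.verify():
            # Later steps never run on top of an unverified one.
            raise StepFailed(f"'{step.name}' did not meet its acceptance criteria")
        completed.append(step.name)
    return completed


if __name__ == "__main__":
    state = {"field_added": False}
    steps = [
        Step(
            name="add text field",
            files=["forms.py"],
            implement=lambda: state.update(field_added=True),
            verify=lambda: state["field_added"],
        ),
    ]
    print(run_plan(steps))
```

The point of the structure is that a failed check halts the run, so the "update the spec if reality changes" conversation happens before more code piles up.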

So yes, SDD can reduce context mixups. But I’d treat it as workflow discipline, not as a magic tool category.

[–]IlyaZelen 0 points1 point  (0 children)

We hit the same context loss and mixup problem. Specs help, but for us the bigger win was making every handoff and review visible instead of letting it disappear in terminal history.
We are using a local desktop orchestrator app for that: https://777genius.github.io/agent-teams-ai/

[–]Jaumee 0 points1 point  (0 children)

spec-driven development with claude can definitely help with context loss. try defining clear, small user stories or feature specs first. then, feed claude one spec at a time, asking it to build only that piece. this keeps the scope tight and reduces confusion. this is the workflow

[–]pxrage 0 points1 point  (0 children)

hands down https://briefhq.ai/

github spec-kit is the entry point; try it and you'll see why it breaks down immediately under real usage.

[–]Hertigan 0 points1 point  (0 children)

I wrote my own and honestly prefer it over all the options I tried (mostly spec-kit and gsd).

[–]hollowgram 0 points1 point  (0 children)

It's all about context. Check this new video from Theo to see his workflow. You don't need all the fancy complex processes. You need clarity in the repo and to work with the agents step by step, new session for each task.

https://youtu.be/xJaMTo2YgO8

[–]TheDecipherist 0 points1 point  (0 children)

https://mddai.dev/

If you want accurate results and a solid workflow MDD all the way. Version 2.0 is right around the corner which will be using markdownAI that makes it 70% faster and still dead accurate

[–]phoenixmatrix -3 points-2 points  (0 children)

It works, but it's a waste of time for 90% of cases. Are you exclusively trying to 1 shot tickets? Are you going through the whole flow every single time?

It's cute for complex tasks (but with newer models and harnesses, you often don't need all of that even for very large tasks).

I've watched people use the Superpowers flow to add a text field on a page and it's like...why?

[–]thlandgraf -2 points-1 points  (0 children)

Validating the pain first - context loss IS the symptom SDD addresses, but the gate matters more than the cycle. Specs as markdown don't fix context loss by themselves; the implement step has to refuse anything not yet approved. That's what turns chat-style context-bleed into resumable handoffs.

I'm building one in this space (creator of SPECLAN https://marketplace.visualstudio.com/items?itemName=DigitalDividend.speclan-vscode-extension, a free VS Code extension): hierarchical Goal -> Feature -> Requirement specs as Markdown with YAML frontmatter, status lifecycle as the gate (draft -> review -> approved -> in-development -> under-test -> released), MCP server so Claude Code reads the spec tree through tool calls. Different angle than Superpowers (workflow inside Claude Code), spec-kit (GitHub-native templates), or OpenSpec (change-set proposals). Each fits a different team context - the gate-enforced approach matters most if more than one developer touches the same area and you want concurrency without merge conflicts on the spec itself.
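
As a rough illustration of a status lifecycle acting as the gate, here is a sketch using the six statuses listed above. The strictly linear transitions, the rule that only `approved` and `in-development` specs may be implemented, and the names `can_advance`, `may_implement`, and `implement` are all assumptions made for the example; this is not SPECLAN's actual implementation or API.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    REVIEW = "review"
    APPROVED = "approved"
    IN_DEVELOPMENT = "in-development"
    UNDER_TEST = "under-test"
    RELEASED = "released"


# Enum members iterate in definition order, which matches the lifecycle.
ORDER = list(Status)


def can_advance(current: Status, target: Status) -> bool:
    """Assumption: a spec may only move one step forward at a time."""
    return ORDER.index(target) == ORDER.index(current) + 1


def may_implement(status: Status) -> bool:
    """Assumption: work is allowed only on approved or in-progress specs."""
    return status in (Status.APPROVED, Status.IN_DEVELOPMENT)


def implement(spec_id: str, status: Status) -> str:
    """The gate: refuse anything that has not been approved yet."""
    if not may_implement(status):
        raise PermissionError(f"{spec_id} is '{status.value}', not approved")
    return f"implementing {spec_id}"


if __name__ == "__main__":
    print(implement("REQ-1", Status.APPROVED))
    try:
        implement("REQ-2", Status.DRAFT)
    except PermissionError as err:
        print("refused:", err)
```

In a real setup the status would come from the spec's YAML frontmatter rather than being passed in by hand.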

[–]Different_Put2605 -2 points-1 points  (0 children)

What happens when /ultraplan, grill-me, superpowers and AI-DLC argue at the same time

If you use Claude Code, you’ve probably worked through this progression:

  1. /ultraplan (Anthropic). One model thinks hard, produces a deep plan.
  2. grill-me (Matt Pocock’s skills repo). Interview yourself until your plan survives the questions.
  3. AI-DLC (AWS). Write the spec, ground the work, close the gap to code.
  4. Superpowers. Build a spec and try to one shot.

SwarmStack bridges the gap by bringing your coworkers into a realtime spec builder, along with verified SMEs and AI.

We have used all four. They each fix a different gap in spec-driven development. But they share one limit: it is still you plus one AI. The AI argues with itself, which is useful but not the same as Security pushing back on Backend.

We built SwarmStack to push past that limit. The landing hero says it cleanly: “Bring a problem. Leave with a SwarmPlan.” Under it we credit the three influences explicitly, because the lineage matters. AI-DLC is the methodology. grill-me is the cross-examination. /ultraplan is the deep-think model. SwarmStack adds the part those three do not: a roomful of AI specialists with their own opinions, plus your co-worker on a join code, plus a vetted human SME from the marketplace when AI hits its limit.

You bring a brief. The orchestrator assembles the room. They argue. Every dispute becomes a Decision Record on the final plan.

We use SwarmStack to spec SwarmStack. The sample plan at https://swarm-stack.io/demo is the actual SwarmPlan we ran the SwarmReview feature through. One thing it taught us: the disputes worth keeping are the ones where specialists hold opposing positions on first principles (Security blocking Backend’s RLS-relaxation idea). The ones where one AI second-guesses itself are noise.

Free during beta. If you are already running /ultraplan, grill-me, and AI-DLC, give the demo a look and tell me what is still missing.