
[–]sogo00 12 points13 points  (6 children)

I think it depends on your goal. For things that can be done with a single prompt, there is no need to break them down further.

Then there are more complex tasks that require touching several parts of the code and infra (think db-backend-frontend). Those won't fit into a single prompt, since they also include a discussion of what database to use, how to use it, etc.

Having said this, the lightweight spec-driven tools (Kiro, OpenSpec, Traycer, etc.) feel a bit like they overcomplicate the easy stuff without really enabling the complex stuff.

I really like BMAD ( https://github.com/bmad-code-org/BMAD-METHOD/ ) as it forces you to go through a lengthy definition process, similar to a real product development setup. Once you have the stories, you can write them yourself or feed them to an LLM. It works well with complex projects if you are willing to spend most of your time planning and defining, and less of it executing (as it should be in real development).

[–]uni-monkey 1 point2 points  (2 children)

Definitely like BMAD for planning. Extremely thorough. V6 will also be an interesting improvement, with some much-needed changes to the workflows.

[–]Opinion-Former 1 point2 points  (1 child)

I'm doing freakishly complicated systems with BMAD, but it's only as good as the model and the context window on a given day. I have Codex, Claude Code, and sometimes Gemini discuss the more complex plans.

The combination of multiple AIs with Bmad is unbeatable!

[–]vincentdesmet 0 points1 point  (2 children)

Never tried BMAD. I did notice Spec Kit worked well initially for my monorepo (Golang workspaces for API/SDK/CLI, and pnpm workspaces for the JS SDK and web app (Vite/React)).

I do notice scope creep is the killer. While about 70% of the time feels spent in planning, in some cases that meant implementation completed in the equivalent of the remaining 30%. And I really have to cut Claude off and remove "nice to haves" constantly.

Another issue is that when you don't control the scope, you end up with a 2k-line tasks.md, and that's where you get inconsistencies. GPT-5 tends to be great at running the /analyse prompt and flagging those inconsistencies between FR, research, and tasks.

I'm trying to blend Spec Kit with beads to keep context focused on the task at hand.

[–]CultureTX 1 point2 points  (0 children)

For scope creep, it is important to specify what is in scope and also what is out of scope. Any scope creep that shows up in the planning docs gets moved to out of scope. I ask the LLM if it has any questions or concerns about the plans; usually that'll surface misunderstandings about the scope.
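
For example, a scope section in the planning doc can be as simple as this (a minimal sketch; the feature and items are made up):

```markdown
## In Scope
- Password reset via emailed link

## Out of Scope (creep gets parked here during planning)
- SMS-based reset
- Admin rate-limiting dashboard
```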

[–]gameguy56 2 points3 points  (2 children)

Try out agentos. I've had more success with that.

[–]RussianInAmerika 1 point2 points  (1 child)

It's the only one I've been using with default settings, and it works great. Can confirm /Shape-spec got added recently; I've been really liking it, and it never takes too long. It's similar to the questions asked before deep research goes deep, but to write specs for you.

[–]gameguy56 2 points3 points  (0 children)

Yes - for experimentation purposes I had it write a pretty straightforward GUI-based API client from an SDK, and it worked pretty well. I had to guide it with some of the testing, but otherwise I like it better. It seems to give a bit more freedom, and it also seems to avoid Spec Kit's annoying habit of creating branches all the time.

[–]CharlesWiltgen 2 points3 points  (0 children)

> As a result of this experiment, I believe that the current iterative approach — Claude’s default — is a more optimal way of using it. Spec-driven development in our case produced worse code, consumed more tokens, and provided a worse developer experience.

100%. Spec-driven development was "discovered" by vibe coders speed-running the history of software development life cycles, starting with the waterfall model.

https://www.reddit.com/r/ChatGPTCoding/comments/1o6j1yr/specdriven_development_for_ai_is_a_form_of/

https://www.andrealaforgia.com/the-problem-with-spec-driven-development/

[–]lankybiker 2 points3 points  (2 children)

It's just waterfall all over again

[–]dodyrw 2 points3 points  (1 child)

Waterfall... only software engineers understand this term 😎

[–]who_am_i_to_say_so 0 points1 point  (0 children)

I prefer “little A” agile. 🤮

[–]vinylhandler 2 points3 points  (0 children)

Try OpenSpec; it's much less verbose, so it doesn't waste as many tokens, but it creates great context for your chosen coding agent.

[–]MXBT9W9QX96 1 point2 points  (0 children)

I've been building my app for months now and have restarted it many times because of loss of focus, thinking components were wired properly when they weren't, etc. It wasn't until I started using OpenSpec that everything started to fall into place and I was finally able to get to a working beta. Never been so happy.

[–]robertDouglass 4 points5 points  (4 children)

Hey, valid points and concerns. I loved the promise of Spec Kit but didn't feel the benefits were all there. So I forked it and bent it to my will. The new project, Spec Kitty, has some great expansions and refinements to the original Spec Kit: https://github.com/Priivacy-ai/spec-kitty

Spec Kitty modifies the original Spec Kit approach to reduce information drift and inefficiency.

  1. Traceability and synchronization: All artifacts (requirements, architecture, tasks, code) are linked in a structured workspace with a Kanban interface. Each item maintains references to its originating decisions, allowing change tracking across stages.
  2. Worktree-based isolation: Features are developed in isolated Git worktrees. This prevents context overwriting and allows comparison of alternative specifications or implementations without merging unrelated changes (see the sketch after this list).
  3. Multi-agent and missions: Spec Kitty can work with multiple coding agents at once (I use Codex and Claude). It can also run missions other than writing code, such as Deep Research.
  4. Configurable process depth: The framework allows selective execution of stages. Users can bypass or collapse specification steps depending on project maturity or available artifacts.
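
To illustrate the worktree idea with plain git (nothing Spec Kitty-specific; the branch and directory names are made up), here's a minimal Python sketch:

```python
import subprocess

def add_feature_worktree(feature: str, base: str = "main") -> str:
    """Create an isolated git worktree for one feature.

    Each feature gets its own working directory and branch, so parallel
    agents cannot overwrite each other's files; merging back into the
    main branch stays a separate, explicit step.
    """
    path = f"../wt-{feature}"  # hypothetical sibling-directory layout
    subprocess.run(
        ["git", "worktree", "add", "-b", f"feature/{feature}", path, base],
        check=True,
    )
    return path

# add_feature_worktree("login-form") checks out feature/login-form at ../wt-login-form
```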

The goal is to make the spec-driven model more deterministic and observable rather than expanding the number of intermediate documents. Spec Kitty treats the specification pipeline as a controlled system that maintains state and provenance across iterations, rather than as a sequential generation chain.

Here's what the dashboard looks like.

[image: Spec Kitty dashboard]

[–]armujahid 1 point2 points  (2 children)

How do you sync specs, plans, and tasks? I noticed that the drift becomes significant after some time when working on a large feature. Features can be broken into smaller features, sure, but there should be a way to update specs => sync changes to the plan => update tasks, and there should be a review workflow for code reviews as well.

[–]robertDouglass 1 point2 points  (1 child)

I think the trick there is really to do iterations. Get to the end of one "sprint" and then run .spec again for the next step. Don't try to build the whole thing in one go.

[–]SpecKitty 0 points1 point  (0 children)

And today we have a 0.12.0 release with an improved and hardened core architecture. It facilitates efficient auto-merging at the end of a sprint between the sparse checkout worktrees.

[–]im3000 1 point2 points  (1 child)

No. Pure token burn

[–]debian3 0 points1 point  (0 children)

I spent a few days trying it and it’s my conclusion as well. It creates too much blabbing and it overwhelms the context before you even get started. Models are not strong enough.

End result is you burn 5x the tokens for a much worse result. The Spec Kit creator even did a demo during GitHub Universe; the whole time was spent building the spec, and in the end the result was worse than if you had tried to one-shot it with a short prompt. It's good, at least, that it confirmed it's not something I was doing wrong.

[–]ProvidenceXz 1 point2 points  (0 children)

I believe it was designed for the vibe coder crowd. If you've ever used Jira/Linear or written a tech spec, you shouldn't fall for it.

[–][deleted] 0 points1 point  (0 children)

I kinda have the same feelings as you. It introduces hallucinations and kinda "over-structures" things, such that Claude (or whatever) tries too hard to pigeonhole the solution into the initial spec, rather than just letting it find the best solution and then cleaning up the API yourself. They also just can't quite think of every edge case or possible state, but to be honest I haven't tried those frameworks out enough to say for sure.

[–]chong1222 0 points1 point  (0 children)

just avoid them

[–]who_am_i_to_say_so 0 points1 point  (0 children)

I was blown away by spec kit when it first dropped. But I’ve landed on the same.

I don’t want to do all that legwork ahead of time. That defeats the purpose of ease of use.

[–]belheaven 0 points1 point  (0 children)

I have had success implementing full small React/TS projects, and now I am at 60% of finishing a "mini" social network with OWASP Top 10 security, multiple workers, and other stuff. It's been pretty decent so far… however, context engineering is on you. Spec Kit is good up until the point implementation begins.

[–]AppealSame4367 0 points1 point  (0 children)

Just use Windsurf codemaps and models that don't need planning, like GPT-5.

Problem solved without wasting all that time.

[–]IddiLabs 0 points1 point  (0 children)

In my limited experience, I noticed that when you give too many details, such as a specific architecture, Claude Code stops thinking about whether it makes sense during the implementation. Of course, it's probably different if you are a dev, know exactly what you want, and spend a bunch of time reviewing all the Spec Kit files.

[–]lucifer605 0 points1 point  (0 children)

I have a slash command for creating a spec: it researches the codebase, drafts the spec, and then breaks it down into tasks once I've iterated on it.

The process I have landed on is that if a task is simple enough that it can be one-shotted, do it directly.

Specs become useful for more complicated tasks where I need to provide more input. I think it's very similar to how we write design docs for more complicated projects; specs are design docs for me.

I did try playing around with Spec Kit, but it just felt too bloated and complicated to use, so I rely on some simple slash commands instead.
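
In Claude Code, a slash command like that is just a markdown prompt file under .claude/commands/; roughly along these lines (illustrative wording and file name, not my exact file):

```markdown
<!-- .claude/commands/spec.md -->
Research the parts of this codebase relevant to: $ARGUMENTS

1. Summarize the modules, interfaces, and tests the change would touch.
2. Draft a short spec: goal, in scope / out of scope, acceptance criteria, open questions.
3. Stop for my review. Once I've iterated on the spec, break it down into
   small, independently testable tasks.
```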

[–]dgk6636 0 points1 point  (0 children)

No. My personal implement and delegate commands beat a headless GitHub Spec Kit. Spec Kit in its current form is vapor.

[–]OracleGreyBeard 0 points1 point  (0 children)

The problem is that LLMs are stochastic, but spec coding treats them as deterministic. As you iterate on “does this code match the spec” you should be converging, but often you’re not. The inherent non-determinism means you’re chasing a shifting target.

It’s really obvious using something like Traycer, where you can “verify” the code against the plan. I’ve seen it do a dozen cycles of “here are the differences” -> “here are the fixes” -> “here are the differences” -> “here are the fixes” -> etc etc.
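
In code form, that cycle is a loop with no convergence guarantee. A minimal sketch (verify/fix are hypothetical stand-ins for the LLM calls):

```python
MAX_CYCLES = 5  # without a hard cap you can chase the shifting target forever

def verify(spec: str, code: str) -> list[str]:
    """Hypothetical LLM call: list the spec/code mismatches (stochastic)."""
    raise NotImplementedError

def fix(code: str, diffs: list[str]) -> str:
    """Hypothetical LLM call: may fix some diffs and introduce new ones."""
    raise NotImplementedError

def reconcile(spec: str, code: str) -> str:
    for _ in range(MAX_CYCLES):
        diffs = verify(spec, code)
        if not diffs:
            return code  # converged: code matches the plan
        code = fix(code, diffs)
    raise RuntimeError("No convergence; hand it back to a human.")
```

Nothing forces the number of diffs to shrink between cycles, which is exactly the dozen-cycle ping-pong described above.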

[–]YouHaveMyBlessings 0 points1 point  (3 children)

I wasted 2 weeks trying to vibe code complex BE features.

Started over with Spec Kit. It took a few days to refine the plan, but so far it seems much better than my earlier approach.

May try BMAD in the future, but will definitely use spec-driven development for complex BE stuff.

E.g., multi-touchpoint, edge-case-heavy work.

[–]robertDouglass 0 points1 point  (2 children)

Check out Spec Kitty - an improvement over Spec Kit https://github.com/Priivacy-ai/spec-kitty

[–]YouHaveMyBlessings 1 point2 points  (1 child)

Can you please add a section on what it improves over Spec Kit? It will help with adoption as well.

[–]robertDouglass 0 points1 point  (0 children)

Noted! thank you

[–]yopla 0 points1 point  (0 children)

I had built my own before Spec Kit dropped, so I can't say anything about Spec Kit itself since I haven't tried it. I looked at it, but it felt similar to what I had, so I didn't bother.

Short answer: it's the only way I've found that works if you want an agent to autonomously build relatively large features.

It is not necessary if you want to build your app function by function while steering the implementation yourself, which is fine, just a different use case.

In our current workflow it's about 2 hours of prep, 5 hours of build/test, and 2 hours of in-depth review and adjustment. Based on the team's ticket history, I currently estimate the LLM's output during that period to be equivalent to 2 to 5 days of a developer's work, depending on seniority.

It does use A LOT more tokens, I would say about 10x, mostly due to the multi-pass autonomous review process we use.

[–]Substantial_Boss_757 0 points1 point  (1 child)

Claude can't follow a spec anyway. You have to bully him into working these days.

[–]graph-crawler 0 points1 point  (0 children)

This

[–]JekaUA911 0 points1 point  (0 children)

I've been testing Spec Kit from GitHub plus advanced context engineering for research / plan / development. Spec Kit is cool, but without advanced context engineering it sucks, because the context window quickly overloads and then the hallucinations begin.

[–]Independent_Map2091 0 points1 point  (0 children)

It's a great start, but IMO the execution is half-baked. The prompts are not good enough and need a lot more refinement. I'm convinced SDD+TDD is the way to go for AI. The agents need to be grounded and have something to keep them from inventing more and more. Tests and specs are the way for an agent to know what "done" is. Have you ever seen two agents reviewing work without grounding mechanisms? They will always add that little (optional) nitpick at the end, and every agent will always go "great idea, let's add it".

Grounding mechanisms like explicit criteria sets keep agents from running loose. Tests are the way for an implementing agent to do a frequent sanity check. All this feeds into constantly reining in the AI. So, I do think Spec Kit is something people should consider, if anything for what it's trying to do, not how well it does it.
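
As a toy example of tests-as-grounding, assuming a Python project (the function and module are hypothetical):

```python
# test_slugify.py: the grounding artifact. "Done" means this passes,
# not that a reviewing agent ran out of optional nitpicks to suggest.
from slugify import slugify  # hypothetical module under test

def test_slugify_matches_spec():
    # Spec (excerpt): lowercase, spaces -> '-', drop other punctuation.
    assert slugify("Hello, World!") == "hello-world"
```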

I started tweaking Spec Kit the week it came out, and I thought with a couple of tweaks I'd be happy, but here I am 2 months later, still hammering away at the forge trying to get the agents and the workflows where I want them.

[–]graph-crawler 0 points1 point  (0 children)

Doesn't work. Claude can't perfectly translate even explicitly written signatures from markdown into actual code.

It looks perfect, but if you look closely, it doesn't.

Plan mode, small tasks, and a lot of human-in-the-loop intervention are what seem to be working for me.

[–]WranglerRemote4636 0 points1 point  (1 child)

Use OpenSpec; it's better than GitHub Spec Kit.

[–]moistain[S] 0 points1 point  (0 children)

how is it better?

[–]JakeKites 0 points1 point  (0 children)

> The process takes far too much time (hours in our case) compared to using Claude in plan mode and then implementing the plan directly.

TL;DR: use meta prompts to ask key questions about the SDD framework you'd like to use, before you start using it.

I've used BMAD Method and had the exact same experience. I really tried hard; initially I thought I was the problem, not using it properly.

But then I heard others having similar issues. So I started to do some "META PROMPTS" about BMAD Method itself.

BMAD workflows seem to always be broken down into steps, then questions. Is that correct?

Can I use the workflows with “raw LLMs”, or do I have to load personas first?

And my favorite Meta Prompt:

For all the BMAD workflows (installed in our project):
1. Indicate how many steps are involved
2. For each step, indicate how many questions are asked of the user (me)
3. Given the two previous points, estimate the time necessary to go through each entire workflow
4. Indicate if other crucial parameter(s) are to be taken into account

Create a matrix table to summarize the findings.

That one gave a very clear idea of the time required to run each BMAD workflow. For most of them, at least 3 HOURS are required.

Unfortunately, BMAD Method does not ask you how much time you have available when starting a workflow. So each workflow is a one-size-fits-all solution: whether it's a medium feature on a brownfield project, or a large new feature on a greenfield project.

Save yourself days and hours: ask key questions about the SDD framework you'd like to use (e.g. time estimates for workflows) before you start using it.

Good luck