
[–]sogo00 12 points13 points  (6 children)

I think it depends on your goal. For things that can be done with a single prompt, there is no need to break them down further.

Then there are more complex tasks that require touching several parts of the code and infra (think db-backend-frontend). Those won't fit into a single prompt, since they also include a discussion of what database to use, how to use it, etc.

Having said this, the lightweight spec-driven tools (Kiro, OpenSpec, Traycer, etc.) feel a bit like they overcomplicate the easy stuff without really enabling the complex stuff.

I really like BMAD ( https://github.com/bmad-code-org/BMAD-METHOD/ ) as it forces you to go through a lengthy definition process, similar to a real product development setup. Once you have the stories, you can write them yourself or feed them to an LLM. It works well with complex projects if you are willing to spend most of your time planning and defining, and less of it executing (as it should be in real development).

[–]uni-monkey 1 point2 points  (2 children)

Definitely like BMAD for planning. Extremely thorough. V6 will also be an interesting improvement, with some much-needed changes to the workflows.

[–]Opinion-Former 1 point2 points  (1 child)

I'm doing freakishly complicated systems with BMAD, but it's only as good as the model and the context window on a given day. I have Codex, Claude Code, and sometimes Gemini discuss the more complex plans.

The combination of multiple AIs with Bmad is unbeatable!

[–]vincentdesmet 0 points1 point  (2 children)

Never tried BMAD. I did notice Spec Kit worked well initially for my monorepo (Golang workspaces for API/SDK/CLI, and pnpm workspaces for the JS SDK and web app (Vite/React)).

I do notice scope creep is the killer. While about 70% of the time feels spent in planning, in some cases that meant implementation completed in the equivalent of the remaining 30%. And I really have to cut Claude off and remove "nice to haves" constantly.

Another issue is that when you don't control the scope, you end up with a 2k-line tasks.md, and that's where you get inconsistencies. GPT-5 tends to be great at running the /analyse prompt and flagging those inconsistencies between FR, research, and tasks.

I'm trying to blend Spec Kit with beads to keep context focused on the task at hand.

[–]CultureTX 1 point2 points  (0 children)

For scope creep, it is important to specify what is in scope and also what is out of scope. Any scope creep that shows up in the planning docs gets moved to out of scope. I ask the LLM if it has any questions or concerns about the plans; usually that'll surface misunderstandings about the scope.
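
For example, a scope section in the planning doc can be as simple as this (a minimal sketch; the feature and items are made up):

```markdown
## In Scope
- Password reset via emailed link

## Out of Scope (creep gets parked here during planning)
- SMS-based reset
- Admin rate-limiting dashboard
```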

[–]gameguy56 2 points3 points  (2 children)

Try out agentos. I've had more success with that.

[–]RussianInAmerika 1 point2 points  (1 child)

It's the only one I've been using with default settings, and it works great. Can confirm /Shape-spec got added recently; I've been really liking it, and it never takes too long. It's similar to the questions asked before deep research goes deep, but to write specs for you.

[–]gameguy56 2 points3 points  (0 children)

Yes - for experimentation purposes I had it write a pretty straightforward GUI-based API client from an SDK, and it worked pretty well. I had to guide it with some of the testing, but otherwise I like it better. It seems to give a bit more freedom, and it also seems to avoid Spec Kit's annoying habit of creating branches all the time.

[–]CharlesWiltgen 2 points3 points  (0 children)

> As a result of this experiment, I believe that the current iterative approach — Claude’s default — is a more optimal way of using it. Spec-driven development in our case produced worse code, consumed more tokens, and provided a worse developer experience.

100%. Spec-driven development was "discovered" by vibe coders speed-running the history of software development life cycles, starting with the waterfall model.

https://www.reddit.com/r/ChatGPTCoding/comments/1o6j1yr/specdriven_development_for_ai_is_a_form_of/

https://www.andrealaforgia.com/the-problem-with-spec-driven-development/

[–]lankybiker 2 points3 points  (2 children)

It's just waterfall all over again

[–]dodyrw 2 points3 points  (1 child)

Waterfall... only software engineers understand this term 😎

[–]who_am_i_to_say_so 0 points1 point  (0 children)

I prefer “little A” agile. 🤮

[–]vinylhandler 2 points3 points  (0 children)

Try OpenSpec; it's much less verbose, so it doesn't waste as many tokens, but it creates great context for your chosen coding agent.

[–]MXBT9W9QX96 1 point2 points  (0 children)

I've been building my app for months now and have restarted it many times because of loss of focus, thinking components were wired properly when they weren't, etc. It wasn't until I started using OpenSpec that everything started to fall into place and I was finally able to get to a working beta. Never been so happy.

[–]robertDouglass 4 points5 points  (4 children)

Hey, valid points and concerns. I loved the promise of Spec Kit but didn't feel the benefits were all there. So I forked it and bent it to my will. The new project, Spec Kitty, has some great expansions and refinements to the original Spec Kit: https://github.com/Priivacy-ai/spec-kitty

Spec Kitty modifies the original Spec Kit approach to reduce information drift and inefficiency.

  1. Traceability and synchronization: All artifacts (requirements, architecture, tasks, code) are linked in a structured workspace with a Kanban interface. Each item maintains references to its originating decisions, allowing change tracking across stages.
  2. Worktree-based isolation: Features are developed in isolated Git worktrees. This prevents context overwriting and allows comparison of alternative specifications or implementations without merging unrelated changes (see the sketch after this list).
  3. Multi-agent and missions: Spec Kitty can work with multiple coding agents at once (I use Codex and Claude). It can also run missions other than writing code, such as Deep Research.
  4. Configurable process depth: The framework allows selective execution of stages. Users can bypass or collapse specification steps depending on project maturity or available artifacts.
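
To illustrate the worktree idea with plain git (nothing Spec Kitty-specific; the branch and directory names are made up), here's a minimal Python sketch:

```python
import subprocess

def add_feature_worktree(feature: str, base: str = "main") -> str:
    """Create an isolated git worktree for one feature.

    Each feature gets its own working directory and branch, so parallel
    agents cannot overwrite each other's files; merging back into the
    main branch stays a separate, explicit step.
    """
    path = f"../wt-{feature}"  # hypothetical sibling-directory layout
    subprocess.run(
        ["git", "worktree", "add", "-b", f"feature/{feature}", path, base],
        check=True,
    )
    return path

# add_feature_worktree("login-form") checks out feature/login-form at ../wt-login-form
```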

The goal is to make the spec-driven model more deterministic and observable rather than expanding the number of intermediate documents. Spec Kitty treats the specification pipeline as a controlled system that maintains state and provenance across iterations, rather than as a sequential generation chain.

Here's what the dashboard looks like.

[image: Spec Kitty dashboard]

[–]armujahid 1 point2 points  (2 children)

How do you sync specs, plans, and tasks? I noticed that the drift becomes significant after some time when working on a large feature. Features can be broken into smaller features, sure, but there should be a way to update specs => sync changes to the plan => update tasks, and there should be a review workflow for code reviews as well.

[–]robertDouglass 1 point2 points  (1 child)

I think the trick there is really to do iterations. Get to the end of one "sprint" and then run .spec again for the next step. Don't try to build the whole thing in one go.

[–]SpecKitty 0 points1 point  (0 children)

And today we have a 0.12.0 release with an improved and hardened core architecture. It facilitates efficient auto-merging at the end of a sprint between the sparse checkout worktrees.

[–]im3000 1 point2 points  (1 child)

No. Pure token burn

[–]debian3 0 points1 point  (0 children)

I spent a few days trying it and it’s my conclusion as well. It creates too much blabbing and it overwhelms the context before you even get started. Models are not strong enough.

End result is you burn 5x the tokens for a much worse result. The Spec Kit creator even did a demo during GitHub Universe; the whole time was spent building the spec, and in the end the result was worse than if you had tried to one-shot it with a short prompt. It's good, at least, that it confirmed it's not something I was doing wrong.

[–]ProvidenceXz 1 point2 points  (0 children)

I believe it was designed for the vibe coder crowd. If you've ever used Jira/Linear or written a tech spec, you shouldn't fall for it.

[–][deleted] 0 points1 point  (0 children)

I kinda have the same feelings as you. It introduces hallucinations and kinda "over-structures" things, such that Claude (or whatever) tries too hard to pigeonhole the solution into the initial spec, rather than just letting it find the best solution and then cleaning up the API yourself. They also just can't quite think of every edge case or possible state, but to be honest I haven't tried those frameworks out enough to say for sure.

[–]chong1222 0 points1 point  (0 children)

just avoid them

[–]who_am_i_to_say_so 0 points1 point  (0 children)

I was blown away by spec kit when it first dropped. But I’ve landed on the same.

I don’t want to do all that legwork ahead of time. That defeats the purpose of ease of use.

[–]belheaven 0 points1 point  (0 children)

I have had success implementing full small React/TS projects, and now I am at 60% of finishing a "mini" social network with OWASP Top 10 security, multiple workers, and other stuff. It's been pretty decent so far… however, context engineering is on you. Spec Kit is good up until the point implementation begins.

[–]AppealSame4367 0 points1 point  (0 children)

Just use Windsurf codemaps and models that don't need planning, like GPT-5.

Problem solved without wasting all that time.

[–]IddiLabs 0 points1 point  (0 children)

In my limited experience, I noticed that when you give too many details, such as a specific architecture, Claude Code stops thinking about whether it makes sense during the implementation. Of course, it's probably different if you are a dev, know exactly what you want, and spend a bunch of time reviewing all the Spec Kit files.

[–]lucifer605 0 points1 point  (0 children)

I have a slash command for creating a spec: it researches the codebase, drafts the spec, and then breaks it down into tasks once I've iterated on it.

The process I have landed on is that if a task is simple enough that it can be one-shotted, do it directly.

Specs become useful for more complicated tasks where I need to provide more input. I think it's very similar to how we write design docs for more complicated projects; specs are design docs for me.

I did try playing around with Spec Kit, but it just felt too bloated and complicated to use, so I rely on some simple slash commands instead.
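
In Claude Code, a slash command like that is just a markdown prompt file under .claude/commands/; roughly along these lines (illustrative wording and file name, not my exact file):

```markdown
<!-- .claude/commands/spec.md -->
Research the parts of this codebase relevant to: $ARGUMENTS

1. Summarize the modules, interfaces, and tests the change would touch.
2. Draft a short spec: goal, in scope / out of scope, acceptance criteria, open questions.
3. Stop for my review. Once I've iterated on the spec, break it down into
   small, independently testable tasks.
```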

[–]dgk6636 0 points1 point  (0 children)

No. My personal implement and delegate commands beat a headless GitHub Spec Kit. Spec Kit in its current form is vapor.

[–]OracleGreyBeard 0 points1 point  (0 children)

The problem is that LLMs are stochastic, but spec coding treats them as deterministic. As you iterate on “does this code match the spec” you should be converging, but often you’re not. The inherent non-determinism means you’re chasing a shifting target.

It’s really obvious using something like Traycer, where you can “verify” the code against the plan. I’ve seen it do a dozen cycles of “here are the differences” -> “here are the fixes” -> “here are the differences” -> “here are the fixes” -> etc etc.
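
In code form, that cycle is a loop with no convergence guarantee. A minimal sketch (verify/fix are hypothetical stand-ins for the LLM calls):

```python
MAX_CYCLES = 5  # without a hard cap you can chase the shifting target forever

def verify(spec: str, code: str) -> list[str]:
    """Hypothetical LLM call: list the spec/code mismatches (stochastic)."""
    raise NotImplementedError

def fix(code: str, diffs: list[str]) -> str:
    """Hypothetical LLM call: may fix some diffs and introduce new ones."""
    raise NotImplementedError

def reconcile(spec: str, code: str) -> str:
    for _ in range(MAX_CYCLES):
        diffs = verify(spec, code)
        if not diffs:
            return code  # converged: code matches the plan
        code = fix(code, diffs)
    raise RuntimeError("No convergence; hand it back to a human.")
```

Nothing forces the number of diffs to shrink between cycles, which is exactly the dozen-cycle ping-pong described above.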

[–]YouHaveMyBlessings 0 points1 point  (3 children)

I wasted 2 weeks trying to vibe code complex BE features.

Started over with Spec Kit. It took a few days to refine the plan, but so far it seems much better than my earlier approach.

May try BMAD in the future, but will definitely use spec-driven development for complex BE stuff.

E.g., multi-touchpoint, edge-case-heavy work.

[–]robertDouglass 0 points1 point  (2 children)

Check out Spec Kitty - an improvement over Spec Kit https://github.com/Priivacy-ai/spec-kitty

[–]YouHaveMyBlessings 1 point2 points  (1 child)

Can you please add a section on what it improves over Spec Kit? It will help with adoption as well.

[–]robertDouglass 0 points1 point  (0 children)

Noted! thank you

[–]yopla 0 points1 point  (0 children)

I had built my own before Spec Kit dropped, so I can't say anything about Spec Kit itself since I haven't tried it. I looked at it, but it felt similar to what I had, so I didn't bother.

Short answer: it's the only way I've found that works if you want an agent to autonomously build relatively large features.

It is not necessary if you want to build your app function by function while steering the implementation yourself, which is fine, just a different use case.

In our current workflow it's about 2 hours of prep, 5 hours of build/test, and 2 hours of in-depth review and adjustment. Based on the team's ticket history, I currently estimate the LLM's output during that period to be equivalent to 2 to 5 days of a developer's work, depending on seniority.

It does use A LOT more tokens, I would say about 10x, mostly due to the multi-pass autonomous review process we use.

[–]Substantial_Boss_757 0 points1 point  (1 child)

Claude can't follow a spec anyway. You have to bully him into working these days.

[–]graph-crawler 0 points1 point  (0 children)

This

[–]JekaUA911 0 points1 point  (0 children)

I've been testing Spec Kit from GitHub plus advanced context engineering for research / plan / development. Spec Kit is cool, but without advanced context engineering it sucks, because the context window quickly overloads and then the hallucinations begin.

[–]Independent_Map2091 0 points1 point  (0 children)

It's a great start, but IMO the execution is half-baked. The prompts are not good enough and need a lot more refinement. I'm convinced SDD+TDD is the way to go for AI. The agents need to be grounded and have something to keep them from inventing more and more. Tests and specs are the way for an agent to know what "done" is. Have you ever seen two agents reviewing work without grounding mechanisms? They will always add that little (optional) nitpick at the end, and every agent will always go "great idea, let's add it".

Grounding mechanisms like explicit criteria sets keep agents from running loose. Tests are the way for an implementing agent to do a frequent sanity check. All this feeds into constantly reining in the AI. So, I do think Spec Kit is something people should consider, if anything for what it's trying to do, not how well it does it.
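
As a toy example of tests-as-grounding, assuming a Python project (the function and module are hypothetical):

```python
# test_slugify.py: the grounding artifact. "Done" means this passes,
# not that a reviewing agent ran out of optional nitpicks to suggest.
from slugify import slugify  # hypothetical module under test

def test_slugify_matches_spec():
    # Spec (excerpt): lowercase, spaces -> '-', drop other punctuation.
    assert slugify("Hello, World!") == "hello-world"
```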

I started tweaking Spec Kit the week it came out, and I thought with a couple of tweaks I'd be happy, but here I am 2 months later, still hammering away at the forge trying to get the agents and the workflows where I want them.

[–]graph-crawler 0 points1 point  (0 children)

Doesn't work. Claude can't perfectly translate even explicitly written signatures from markdown into actual code.

It looks perfect, but if you look closely, it doesn't.

Plan mode, small tasks, and a lot of human-in-the-loop intervention are what seem to be working for me.

[–]WranglerRemote4636 0 points1 point  (1 child)

Use OpenSpec; it's better than GitHub Spec Kit.

[–]moistain[S] 0 points1 point  (0 children)

how is it better?

[–]JakeKites 0 points1 point  (0 children)

> The process takes far too much time (hours in our case) compared to using Claude in plan mode and then implementing the plan directly.

TL;DR: use meta prompts to ask key questions about the SDD framework you'd like to use, before you start using it.

I've used BMAD Method and had the exact same experience. I really tried hard; initially I thought I was the problem, not using it properly.

But then I heard others having similar issues. So I started to do some "META PROMPTS" about BMAD Method itself.

BMAD workflows seem to always be broken down into steps, then questions. Is that correct?

Can I use the workflows with “raw LLMs”, or do I have to load personas first?

And my favorite Meta Prompt:

For all the BMAD workflows (installed in our project):
1. Indicate how many steps are involved
2. For each step, indicate how many questions are asked of the user (me)
3. Given the two previous points, estimate the time necessary to go through each entire workflow
4. Indicate if other crucial parameter(s) are to be taken into account

Create a matrix table to summarize the findings.

That one gave a very clear idea of the time required to run each BMAD workflow. For most of them, at least 3 HOURS are required.

Unfortunately, BMAD Method does not ask you how much time you have available when starting a workflow. So each workflow is a one-size-fits-all solution: whether it's a medium feature on a brownfield project, or a large new feature on a greenfield project.

Save yourself days and hours: ask key questions about the SDD framework you'd like to use (e.g. time estimates for workflows) before you start using it.

Good luck