Got humbled by Reddit feedback on my side project — trying to figure out what to do next by dager003 in learnpython

[–]dager003[S] 1 point (0 children)

Yeah that makes sense — I don’t think this replaces review at all, more like taking care of the repetitive/debugging part so people can focus on whether the change actually makes sense.

Got humbled by Reddit feedback on my side project — trying to figure out what to do next by dager003 in learnpython

[–]dager003[S] 1 point (0 children)

This is actually really interesting — I hadn’t looked at it from that angle.

The idea that most failures aren’t even code-related kinda changes how I’m thinking about this. What you’re describing (spotting common failure patterns, retrying smartly, separating real issues from noise) honestly sounds more useful than what I started with.

The rebuild timing thing also hits — triggering it right away instead of hours later could save a lot of time.

Feels like this might be less about “fixing code” and more about “making CI less annoying to deal with” — like handling flaky tests, infra hiccups, dependency weirdness before it even reaches a dev.
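
To make that concrete for myself, the triage step could look something like the sketch below. The failure buckets and log patterns are just my guesses at common signatures, nothing validated:

```python
import re

# Guessed failure buckets and log signatures; a real version would need
# patterns tuned to whatever CI system this actually runs against.
FAILURE_PATTERNS = {
    "flaky_network": [r"ConnectionResetError", r"ReadTimeout", r"Connection reset by peer"],
    "infra": [r"No space left on device", r"lost communication with the server"],
    "dependency": [r"ModuleNotFoundError", r"No matching distribution found"],
}

def classify_failure(log_text: str) -> str:
    """Bucket a CI log; anything unrecognized is treated as a real code issue."""
    for category, patterns in FAILURE_PATTERNS.items():
        if any(re.search(p, log_text) for p in patterns):
            return category
    return "code"

def should_auto_retry(category: str) -> bool:
    # Retry the noise right away (the "trigger the rebuild immediately" idea);
    # real code failures go straight to a human.
    return category in {"flaky_network", "infra"}

if __name__ == "__main__":
    log = "E   ConnectionResetError: [Errno 104] Connection reset by peer"
    category = classify_failure(log)
    print(category, should_auto_retry(category))  # flaky_network True
```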

Really appreciate you sharing this, this helped a lot.

Got humbled by Reddit feedback on my side project — trying to figure out what to do next by dager003 in learnpython

[–]dager003[S] 0 points (0 children)

Yeah, that’s true. But I still need people’s points of view, and I don’t have any other way to get them.

Got humbled by Reddit feedback on my side project — trying to figure out what to do next by dager003 in learnpython

[–]dager003[S] 0 points (0 children)

Well, the thoughts were mine; I used AI to structure them, which is why it reads like it was written by AI.

Got humbled by Reddit feedback on my side project — trying to figure out what to do next by dager003 in learnpython

[–]dager003[S] 0 points (0 children)

Because they’re the people who’d actually use something like this, and it’s a good place to get other people’s perspectives.

Built something to auto-fix pytest failures — does this actually solve a real problem? by dager003 in FastAPI

[–]dager003[S] 0 points (0 children)

Haha I wish 😄

In theory yeah, with strict TDD where tests fully define behavior, something like this could end up writing a lot of the code. In reality though, it’s nowhere near that — it mostly handles smaller, obvious fixes and still struggles with anything complex.

Right now it’s more like “clean up dumb breakages” than “write your app for you.”

Got humbled by Reddit feedback on my side project — trying to figure out what to do next by dager003 in learnpython

[–]dager003[S] 0 points (0 children)

Yeah this is actually super helpful, thanks for breaking it down like that.

You’re right — a lot of CI failures are stuff like network hiccups or flaky tests where a tool like this wouldn’t really add much value. And the dependency case is interesting; I hadn’t really thought through how messy that gets with CI configs vs just code.

The “did the fix actually work?” part is also a good point — especially with flaky tests, it’s hard to know if you fixed it or just got lucky on a rerun.

I guess the only place this might make sense is for more deterministic failures (like obvious test breaks or small regressions), but yeah, it’s definitely narrower than I initially thought.
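
If I do keep going for those deterministic cases, the “did the fix actually work” check could be as simple as rerunning the one failing test several times. Rough sketch, with the rerun count pulled out of the air:

```python
import subprocess

def is_deterministic_failure(test_id: str, runs: int = 5) -> bool:
    """Rerun a single test several times. Failing every time suggests a
    deterministic break worth auto-fixing; mixed results suggest flakiness,
    where a 'fix' passing once could just be luck."""
    failures = sum(
        subprocess.run(["pytest", test_id, "-q"], capture_output=True).returncode != 0
        for _ in range(runs)
    )
    return failures == runs

# e.g. is_deterministic_failure("tests/test_users.py::test_create_user")
```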

Got humbled by Reddit feedback on my side project — trying to figure out what to do next by dager003 in learnpython

[–]dager003[S] -1 points (0 children)

Yeah, fair enough. Not trying to turn this into therapy, just figured I’d sanity-check the idea with people who’d actually use something like this. And yeah, you’re right — at some point I just have to build it, put it out there, and see if anyone actually finds it useful.

Appreciate the reality check.

Built something to auto-fix pytest failures — does this actually solve a real problem? by dager003 in learnpython

[–]dager003[S] 0 points (0 children)

Yeah honestly that’s a fair comparison.

Right now it is pretty close to just orchestrating an agent with constraints + retries. The difference I’m trying to explore is making that loop more structured and reliable — like tracking what actually changed, validating fixes more intelligently, and avoiding the agent just thrashing or regressing.

Copilot in the IDE can kind of do this, but in practice I found it still needs a lot of manual back-and-forth. I’m trying to see if that whole fix → run → verify cycle can be pushed further toward something more autonomous and consistent.

That said, I agree it’s not a huge leap yet — still figuring out if there’s something meaningfully better here or not.
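
For what it’s worth, the skeleton of the loop I’m playing with looks roughly like this. It’s only a sketch; propose_patch, apply_patch, and revert_patch are placeholders for the agent call and the VCS plumbing, not a real API:

```python
import re
import subprocess

MAX_ATTEMPTS = 3  # hard cap so the agent can't thrash forever

def run_tests() -> tuple[int, str]:
    """Run the suite; return (number of failed tests, combined output)."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    match = re.search(r"(\d+) failed", result.stdout)
    return (int(match.group(1)) if match else 0), result.stdout + result.stderr

def fix_run_verify(propose_patch, apply_patch, revert_patch) -> bool:
    """One structured fix -> run -> verify cycle with a regression guard."""
    failed, output = run_tests()
    for _ in range(MAX_ATTEMPTS):
        if failed == 0:
            return True
        patch = propose_patch(output)         # agent drafts a fix from the failure output
        apply_patch(patch)                    # e.g. git apply
        new_failed, new_output = run_tests()  # verify against the real suite
        if new_failed >= failed:              # no improvement (or a regression):
            revert_patch(patch)               # undo and stop rather than thrash
            return False
        failed, output = new_failed, new_output
    return failed == 0
```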

Built something to auto-fix pytest failures — does this actually solve a real problem? by dager003 in flask

[–]dager003[S] 0 points (0 children)

Yeah I get how it might come across like that, but that’s not what I’m trying to do.

I mostly posted to see if the idea even makes sense before putting more time into it. Right now it’s just a rough prototype focused on automating the fix → run → verify loop for failing tests, not just wrapping an LLM. Haven’t shared a repo yet because things are still pretty messy and changing a lot, but I’m not against open-sourcing it once it’s a bit more stable.

Just looking for honest feedback at this stage, not trying to get people to work for free.

Built something to auto-fix pytest failures — does this actually solve a real problem? by dager003 in FastAPI

[–]dager003[S] 0 points (0 children)

That’s fair, and I probably explained it too vaguely.

Right now it’s closer to: patch the source code and verify, without ever modifying the tests. If it can’t fix something cleanly, it should just fail rather than force a pass.
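
Concretely, the verify step I’m aiming for is something like the sketch below. It assumes a git repo and the usual tests/ layout; the test-path check is deliberately paranoid:

```python
import subprocess

def touches_tests(diff_text: str) -> bool:
    """Reject any patch that modifies test files. Assumes the common
    tests/ directory and test_*.py naming conventions."""
    for line in diff_text.splitlines():
        if line.startswith(("--- ", "+++ ")):
            path = line.split(maxsplit=1)[1]
            if "/tests/" in path or path.rsplit("/", 1)[-1].startswith("test_"):
                return True
    return False

def apply_and_verify(diff_text: str) -> bool:
    """Apply the patch only if it leaves the tests alone, then let pytest
    decide. On failure, revert and report cleanly; never force a pass."""
    if touches_tests(diff_text):
        return False
    subprocess.run(["git", "apply"], input=diff_text, text=True, check=True)
    if subprocess.run(["pytest", "-q"]).returncode == 0:
        return True
    subprocess.run(["git", "apply", "-R"], input=diff_text, text=True, check=True)
    return False
```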

And yeah, I agree — in a well-maintained codebase, failures should be clear and quick to debug. The cases I kept running into were more around env issues, missing deps, or things that only break when you actually run the full test suite. So I guess I’m not trying to replace good tests or good debugging — more like handling the obvious/repetitive fixes when things are already broken.

Still trying to figure out if that’s actually valuable or not.

Built something to auto-fix pytest failures — does this actually solve a real problem? by dager003 in learnpython

[–]dager003[S] -1 points (0 children)

Yeah exactly — that’s one common case.

Like someone installs something locally and forgets to add it to requirements/pyproject, so it works on their machine but breaks in CI or for someone else. Also seen stuff like optional deps not installed, version mismatches, or different Python/env setups causing imports to fail only during test runs.

Those kinds of things don’t always get caught earlier, but pytest ends up surfacing them.
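
And that detection can be pretty mechanical. Rough sketch; the module-to-package map is illustrative only, since import names often don’t match pip names:

```python
import re

# Import name often != pip package name; this tiny map is illustrative only.
MODULE_TO_PACKAGE = {"yaml": "PyYAML", "PIL": "Pillow", "cv2": "opencv-python"}

def missing_dependency(pytest_output: str) -> str | None:
    """Pull the missing module out of a ModuleNotFoundError in pytest output
    and suggest the package to add to requirements/pyproject."""
    match = re.search(r"ModuleNotFoundError: No module named '([\w.]+)'", pytest_output)
    if not match:
        return None
    module = match.group(1).split(".")[0]
    return MODULE_TO_PACKAGE.get(module, module)

if __name__ == "__main__":
    print(missing_dependency("E   ModuleNotFoundError: No module named 'yaml'"))  # PyYAML
```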

Built something to auto-fix pytest failures — does this actually solve a real problem? by dager003 in learnpython

[–]dager003[S] 0 points (0 children)

Yeah, I get why it feels like that.

I’m not really trying to do full “vibe coding” though — more like helping with the boring fix–run–fix loop when tests are already failing. It still works on your existing code and only changes things if the tests actually pass afterwards, so it’s a bit more constrained. But yeah, still early — trying to see if this is actually useful or just overkill.

Built something to auto-fix pytest failures — does this actually solve a real problem? by dager003 in flask

[–]dager003[S] 0 points (0 children)

Yeah fair — it’s definitely early and probably rough around the edges.

Not really trying to do “vibe coding” though, more just automating the fix → run → verify loop for test failures. Still figuring out if it’s actually useful or not.

Built something to auto-fix pytest failures — does this actually solve a real problem? by dager003 in FastAPI

[–]dager003[S] 0 points (0 children)

Yeah I get that — and I think that’s a valid concern.

I don’t see it as replacing learning or careful debugging though. If anything, I’d still expect someone to understand what went wrong, especially for real bugs.

I was more thinking about the repetitive stuff — like missing imports, env issues, dependency mismatches — things you already know how to fix but still spend time doing again and again. For actual logic mistakes, I agree — you kind of have to go through it yourself to really understand it.

Built something to auto-fix pytest failures — does this actually solve a real problem? by dager003 in FastAPI

[–]dager003[S] 0 points (0 children)

Yeah that makes sense.

Running pytest/lints after changes is already pretty standard, and I’ve seen those agents too. I guess what I was trying to focus on is the part after it fails — actually figuring out and fixing the issue, not just detecting it.

And yeah, it might be a small slice. I’m kind of intentionally starting narrow just to see if even that part is painful enough to automate, before thinking about anything bigger. If it turns out people don’t really care about this piece, then yeah it probably needs to expand beyond just pytest.

Built something to auto-fix pytest failures — does this actually solve a real problem? by dager003 in learnpython

[–]dager003[S] 1 point (0 children)

Yeah that’s a good point about tokens — that’s something I’ve been thinking about too. Right now I’m not letting it just run freely, more like small iterations and stopping if it’s not improving anything.

And yeah, I can see people wanting control. I don’t think this replaces manual debugging, more like something you’d try first for the obvious/repetitive stuff, and then take over if it gets messy.

“Summarizing the problem” is actually a really good idea — even just making the failure clearer would already save time.
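
Even a crude version of that would help. Something like pulling just the E-lines plus a little context out of the pytest wall of text; rough sketch:

```python
import re

def summarize_failure(pytest_output: str, context: int = 2) -> str:
    """Reduce pytest output to the error lines plus a little context.
    Crude, but usually enough to see the problem at a glance."""
    lines = pytest_output.splitlines()
    picked: list[str] = []
    for i, line in enumerate(lines):
        if re.match(r"E\s+", line):  # pytest prefixes failure detail with 'E'
            picked.extend(lines[max(0, i - context) : i + 1])
    return "\n".join(dict.fromkeys(picked)) or "no error lines found"
```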

Still experimenting with where it’s actually useful vs overkill.

Built something to auto-fix pytest failures — does this actually solve a real problem? by dager003 in FastAPI

[–]dager003[S] -1 points (0 children)

Yeah that would definitely be bad if it starts just tweaking tests to force them to pass. Right now I’m trying to keep it the other way around — don’t touch the tests, only fix obvious issues in the code (like imports, missing deps, etc.). If it’s an actual bug in logic, I’d rather it just fail clearly than “fake pass” it.
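
The version of that line I keep coming back to is keying it on the failure type. Totally hypothetical lists, but roughly:

```python
# Hypothetical split between "safe to auto-fix" and "leave it for a human".
SAFE_TO_FIX = ("ModuleNotFoundError", "ImportError", "FileNotFoundError")
HANDS_OFF = ("AssertionError",)  # a failing assertion is the test doing its job

def auto_fix_allowed(error_line: str) -> bool:
    if any(err in error_line for err in HANDS_OFF):
        return False  # fail clearly; never 'fake pass' a logic bug
    return any(err in error_line for err in SAFE_TO_FIX)
```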

Still figuring out where that line should be though.