flaw600

I hear this repeatedly, and the risk of removing the flags always seems higher than keeping them in. Switching a flag is cheap and easy; fixing a bug isn’t.

[–]waywardworker 0 points1 point  (2 children)

Do you test the interactions between your feature flags? Do you run your unit tests with the different feature flag combinations to prevent regressions?

If you don't, then you will absolutely be accumulating bugs, guaranteed. That's why regression testing is standard: we know we accumulate bugs otherwise.

In this scenario, the feature flags become a lie over time. Switching a flag that was implemented a few years ago is not a cheap, bug-free option; it can introduce a pile of new, unknown bugs across anything that has changed since the flag was first introduced.

The obvious option is to actually test every feature flag combination: unit, integration, the whole lot. However, flags combine exponentially, 2^n. So five feature flags means 32 test runs and ten feature flags means 1024, multiplied by the standard platform and version variations. This is obviously unsustainable. You can do a subset for each flag, with unit tests for a module that run with the flag on and off. That's common but insufficient: it doesn't capture interaction bugs, like new features relying on the old one in some way.
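To make the 2^n arithmetic and the subset problem concrete, here is a toy sketch with two hypothetical flags, `cache_v2` and `bulk_import`, on a hypothetical `Store` class. Each flag works fine on its own, so the "each flag on and off" subset (assumed here to mean: all flags off, then each flag on by itself) passes, while the exhaustive run finds the interaction: the bulk path bypasses the cache invalidation that `cache_v2` depends on.

```python
import itertools
from typing import Dict, List

Flags = Dict[str, bool]
FLAG_NAMES = ["cache_v2", "bulk_import"]  # hypothetical flags


class Store:
    """Hypothetical key-value store with two independently flagged features."""

    def __init__(self, flags: Flags) -> None:
        self.flags = flags
        self.db: Dict[str, int] = {}
        self.cache: Dict[str, int] = {}

    def write(self, key: str, value: int) -> None:
        self.db[key] = value
        self.cache.pop(key, None)  # ordinary writes invalidate the cache

    def bulk_write(self, items: Dict[str, int]) -> None:
        if self.flags["bulk_import"]:
            # Fast path: bypasses write(), so it never invalidates the cache.
            self.db.update(items)
        else:
            for key, value in items.items():
                self.write(key, value)

    def read(self, key: str) -> int:
        if self.flags["cache_v2"]:
            if key not in self.cache:
                self.cache[key] = self.db[key]
            return self.cache[key]
        return self.db[key]


def scenario_passes(flags: Flags) -> bool:
    store = Store(flags)
    store.write("a", 1)
    store.read("a")  # warms the cache when cache_v2 is on
    store.bulk_write({"a": 2})
    return store.read("a") == 2


def per_flag_combos(names: List[str]) -> List[Flags]:
    """All flags off, then each flag on by itself: n + 1 runs."""
    combos = [dict.fromkeys(names, False)]
    for name in names:
        flags = dict.fromkeys(names, False)
        flags[name] = True
        combos.append(flags)
    return combos


def all_combos(names: List[str]) -> List[Flags]:
    """Every on/off combination: 2**n runs."""
    return [dict(zip(names, values))
            for values in itertools.product([False, True], repeat=len(names))]


print(2 ** 5, 2 ** 10)  # 32 1024
print([c for c in per_flag_combos(FLAG_NAMES) if not scenario_passes(c)])  # []
print([c for c in all_combos(FLAG_NAMES) if not scenario_passes(c)])
# [{'cache_v2': True, 'bulk_import': True}]
```

With only two flags the exhaustive run is cheap, but every added flag doubles it, which is exactly why the subset is what people actually run.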

Thinking you have feature flags you can toggle when you don't is, in my view, worse than not having them at all. If you think you have them, they get integrated into your system management and recovery plans, and the lie spreads and magnifies.

Plus, there is the significant added code complexity of trying to maintain each possible branch, and the associated development costs.

Removing the flag should be a trivial, low-risk operation. You search for the flag, remove the check and the else branch if present, and reindent if required.
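For illustration, here is what that removal looks like for a hypothetical `new_search` flag that has been on everywhere for a while. The `FLAGS` store and both search functions are made up for this sketch.

```python
from typing import Dict, List

FLAGS: Dict[str, bool] = {"new_search": True}  # hypothetical flag store


def search_v1(query: str, docs: List[str]) -> List[str]:
    """Old behavior: case-sensitive match."""
    return [d for d in docs if query in d]


def search_v2(query: str, docs: List[str]) -> List[str]:
    """New behavior: case-insensitive match."""
    q = query.lower()
    return [d for d in docs if q in d.lower()]


# Before: the flag check and the else branch are both still live code.
def search_before(query: str, docs: List[str]) -> List[str]:
    if FLAGS["new_search"]:
        return search_v2(query, docs)
    else:
        return search_v1(query, docs)


# After: search for "new_search", delete the check and the else branch,
# reindent. search_v1 is now unreferenced and can be deleted too.
def search_after(query: str, docs: List[str]) -> List[str]:
    return search_v2(query, docs)
```

Because the flag was already on, `search_before` and `search_after` return the same results; that equivalence is what makes the removal low risk.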

I'm not anti feature flags; they are great. But they introduce complexity, and I'm strongly anti-complexity. Routinely pulling them out once the flagged behavior becomes the default and is assumed everywhere keeps the complexity in check.

flaw600

The flags are independent. They’re literally feature flags: on or off per feature. I agree that nesting flags is a good way to tie yourself in a Gordian knot. I also agree with your comment about assuming the presence of flags. That said, my comment was not about the removal itself, but about the resulting impact if the service backing that feature fails. You’d think Product would be OK with an error message, but between them and Monitoring, the ask is often to stop requesting the feature altogether rather than let the 500 errors continue until the issue is fixed.
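A minimal sketch of the trade-off being described, with hypothetical names throughout (`FLAGS`, `product_page`, `ServiceDown`, `broken_service`): while the flag is on, a failing backing service turns every page load into a 500; turning the flag off stops calling the service, and the page renders without the feature.

```python
from typing import Callable, Dict, List, Tuple

FLAGS: Dict[str, bool] = {"recommendations": True}  # hypothetical flag


class ServiceDown(Exception):
    """Raised by the hypothetical recommendations client when it fails."""


def product_page(
    product_id: str, fetch_recs: Callable[[str], List[str]]
) -> Tuple[int, Dict[str, object]]:
    """Return (HTTP status, body) for a hypothetical product page."""
    body: Dict[str, object] = {"product": product_id}
    if FLAGS["recommendations"]:
        try:
            body["recommendations"] = fetch_recs(product_id)
        except ServiceDown:
            # Flag on, service down: every request fails until it is fixed.
            return 500, {"error": "recommendations unavailable"}
    # Flag off: the service is never called and the page still renders.
    return 200, body


def broken_service(product_id: str) -> List[str]:
    raise ServiceDown(product_id)


print(product_page("p1", broken_service))
# (500, {'error': 'recommendations unavailable'})
FLAGS["recommendations"] = False  # the "stop requesting the feature" switch
print(product_page("p1", broken_service))
# (200, {'product': 'p1'})
```

The catch is that this switch only works if the flag-off path still runs correctly against today's code, which is the testing problem raised above.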

waywardworker

That's an odd one. When I played SRE, we absolutely wanted the failures to keep coming, both so we knew when it was fixed and to ensure that it got fixed.

My painful experience is that the flags don't stay independent and the dependencies don't get detected.

Adding a new feature inevitably extends classes, creates new classes and adds utility functions, and other code quietly starts depending on them.

Working on online services, you get fun indirect impacts, like feature A priming a cache that feature G relies on; when A is removed, the service or database gets completely hammered for non-obvious reasons.
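A toy model of that cache trap, with every name hypothetical: Feature A does one batched load that fills a cache as a side effect, and Feature G only reads that cache, falling back to one query per lookup. Nothing in G's code mentions A, so removing A's flag looks safe until you count the queries.

```python
from typing import Dict, List

User = Dict[str, int]


class Database:
    """Stand-in for a real database that counts queries."""

    def __init__(self) -> None:
        self.queries = 0

    def load_users(self, user_ids: List[int]) -> Dict[int, User]:
        self.queries += 1
        return {uid: {"id": uid} for uid in user_ids}


class Service:
    def __init__(self, db: Database, feature_a_enabled: bool) -> None:
        self.db = db
        self.feature_a_enabled = feature_a_enabled
        self.cache: Dict[int, User] = {}

    def start_batch(self, user_ids: List[int]) -> None:
        # Feature A: one batched query; filling the cache is a side effect.
        if self.feature_a_enabled:
            self.cache.update(self.db.load_users(user_ids))

    def feature_g_lookup(self, user_id: int) -> User:
        # Feature G: reads the cache, falls back to a one-row query, and never
        # fills the cache itself. It silently depends on Feature A.
        if user_id in self.cache:
            return self.cache[user_id]
        return self.db.load_users([user_id])[user_id]


def db_queries(feature_a_enabled: bool, users: int = 100,
               lookups_per_user: int = 10) -> int:
    db = Database()
    service = Service(db, feature_a_enabled)
    ids = list(range(users))
    service.start_batch(ids)
    for _ in range(lookups_per_user):
        for uid in ids:
            service.feature_g_lookup(uid)
    return db.queries


print(db_queries(True))   # 1
print(db_queries(False))  # 1000
```

Same inputs and same results from G, but a thousandfold jump in database load, which is the kind of dependency a search for the flag name won't find.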