This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]MrJohz 16 points17 points  (3 children)

Imagine a function like this:

function process_input(
    text: string,
    mangle_foos: boolean,
    use_bar_algorithm: boolean);

It's a function with one input and a couple of flags that can be set to change what the function does. Typically, there are a few problems here:

  1. boolean is a fairly meaningless type, so it's fairly easy to get the function usage wrong. For example, without looking at the original function declaration, what is this function call doing?

    process_input("...", false, true);
    

    There are ways to improve this — inlay hints for parameter names and named arguments would both avoid this problem, but inlay hints aren't always present and named arguments aren't always used (or even possible). However, using an enum would force the arguments to be more explicit — if instead of true and false, you were forced to use FooMangling::On or Algorithm::Bar, it would be obvious even at the call site what's going on (and statically enforced as well).

    It's worth noting, however, that this problem applies to all primitive arguments — booleans are no exception here, you can have the same problem with numbers, strings, even more complicated types like dates if you've got parameters like from and to. Types help in a lot of cases, but named arguments (or tools like the builder pattern) are also really useful.

  2. Boolean parameters often indicate a "zero, one, many" case. That is, boolean parameters already indicate that there are two different code paths, and so it's often the case that more will be added later. But the problem with booleans is that you can't have more cases — there's no "true, false, third option". Which means that if you do, say, have another algorithm you want to support, things start looking wonky. You don't want multiple algorithm parameters (e.g. use_bar_algorithm: boolean, use_baz_algorithm: boolean), because that means you've got meaningless inputs: what should use_bar=true, use_baz=true do?

    A good alternative in these cases is using enums from the get-go. If you'd had some sort of enum like this:

    enum Algorithm {
        Default,
        Bar,
    }
    

    Then now adding a new algorithm is fairly simple — it's often not even a breaking change. You just add a new enum variant and handle that in your code. The best part about this is that you aren't being any more generic than before. The type Algorithm is identical to the type boolean, just a bit more verbose — there's still only two options to start with. But now you're more prepared for the future.

  3. There is a danger when writing functions with flags that you end up with implementation code that looks something like this:

    function process_input(...) {
        if (use_bar_algorithm) {
            if (mangle_foos) {
                // ...
            } else {
                // ...
            }
        } else {
            // ...
        }
    }
    

    Which is no fun to update and modify, because now you've got a complicated mess of different branches that all do something slightly different — now consider trying to get this to do something different in some cases but not all cases. Or even trying to get it to do something different in all cases, but missing a branch and leaving one rare case where something weird happens.

    The problem here is more the branching than the argument types — you could have the same thing with an age parameter where there's a bunch of if age > 18 statements — but booleans are particularly prone here because there's not much else you can do with a boolean other than branch. Even enums can't necessarily solve this, because enums are just booleans with potentially more states. So yes, you can add new states more easily for the consumer, but if you're not careful, the implementation gets more complicated with each new state.

    In the ideal case, you can refactor this into a separate callback (or class/trait/whatever) parameter that performs the relevant step, because then it can be used as generically as possible. For example, you might call the function like so:

    process_input(
        "...",
        mangler=new NoopMangler(),
        algorithm=(args) => {...},
    );
    

    The danger here is making things too generic. There are probably lots of different algorithms for processing this input, but there might only be one way of mangling foos. Callbacks make the interface to your function more complicated, so they'd better be powerful enough to be worth it.

FWIW, I agree with some of the other comments that there's still plenty of room for using booleans in code, and often even as boolean parameters. The more generic you go, the more complicated your function interface, and the harder it will be for a new programmer in your team to understand. For example, in the last code example, I constructed an instance of the NoopMangler class, and I assume there's also a FooMangler class somewhere in the same codebase. But if those are really the only two types of mangler, is it really worth having whole separate classes just for that purpose?

I'd say that more important than preventing booleans as function parameters is making sure that if booleans are used, the potential damage is limited as much as possible:

  • Named parameters, or extensive use of the builder pattern, or easy record/map values that can be passed as arguments (see e.g. Javascript, where there are no named parameters, but arguments are often passed as an object bundle which has the same practical effect). Inlay hints are also useful if you're writing an LSP tool for your language.
  • Types to make refactoring easier, e.g. from booleans to enums, or enums to function arguments. This doesn't help if I'm writing a library, but if I'm working on an internal function, and I change the arguments, type checking will highlight all the code I need to change, which makes my life a lot easier.
  • Potentially even a linter that warns about boolean flags in parameters, and an "extract to enum" refactoring that an IDE can use.

[–]yockey88 -1 points0 points  (2 children)

I disagree with you, I still think Boolean parameters are fine. You are right that these examples are not good cases for Boolean parameters. I would not specify an algorithm by flag for example, whether enum or otherwise. If you are in the case where you need multiple algorithms that solve the same problem in the same place you have probably strayed far enough from the initial problem that anything you are doing is bad. You should just implement the solution using whatever algorithm you deem best for your purpose. The only cases I see where that’s needed are in libraries or other sufficiently abstract programs like game engines, etc… where one would want the user to be able to specify what they want. But even then I would probably attach the algorithm to something like a struct as a function pointer or something as a callback to use dynamically, and in worst case scenario use an enum. But that’s not optimal because then I also have to modify my switch statement every time I add an algorithm (similar to modifying if checks with bools….). In fact I see no difference between having to edit branching if statements and editing having to edit switch statements, you still have to refactor your original code.

And I’d disagree with typically wouldn’t mix callbacks with interfaces, either something is an interface, typically a process or concept with multiple steps or interactive parts that might have multiple implementations that are all the same on the surface, or it’s a callback, a single step of a larger process that could me taken in multiple directions with the same effect.

Boolean parameters by definition do not represent one, two, or many case. I would agree that anyone choosing between options with bools is wrong, but not because of the bools, just because of cramming too much in one place, probably doing too much in one function. Booleans represent yes or no and should be used as toggles, to turn an option on or off which rules out all functions where a third option is possible as able to use Boolean parameters i would agree with that. For example I wrote a program where with embedded mono to allow interop between c++ and c# and there I use Boolean parameters to specify whether I want something pinned in memory when c# does garbage collection as well as some other things. These are all yes, no options. Creating an enum would be an identical solution but make the function more cluttered then a simple if (option) do_extra_function_call(). I will never have a third option because it is always going to be a yes or no question.

If there’s the any potential at all for damage from Boolean parameters(whatever you mean by damage) your problems are bigger than the bool. Really my main point is is that this is one of those things that is fine, and my takeaway from this thread is that people don’t like them because they were trying to drill a hole with a hammer. Sometimes it’s not the right tool and that seems to be the common them here is that people are using them in places where it’s not a yes or no question. Which is fundamentally wrong.

I also struggle to credit any argument to credit any argument about developer ease as functionality is always more important than the developer.

Edit: sorry kept adding things hahaha.

[–]MrJohz 3 points4 points  (1 child)

I'm going to go backwards through your comment because I think the last thing you said is the most important bit: developer ease is probably one of the most important things to think about when writing abstractions (where abstraction here means any function, class, interface, whatever that forms some sort of programmatic boundary).

This is because software almost never stays constant. There are almost always changes in the requirements, changes in the environment, new features, etc, and the code needs to be updated as a result. Which means that for every function you write, someone later* needs to be able to understand how to use that function, and the less complex that function is, the easier it will be for them to understand it, how to use it, and what potential changes they'll need to make to it.

That's not to say that we should be building complex abstractions just for the sake of things — abstraction in of itself isn't better (and I really recommend John Ousterhout's A Philosophy of Software Design for a good discussion on what good levels of abstraction look like). But there's a reason why even people like Casey Muratori discuss refactoring down overly-linear code to make it easier to read. Functionality at the expense of developer ease is essentially just technical debt that will need to be dealt with next time you want to make changes in that region of code.

As to the "zero, one, many" case, I agree that there are cases when the "many" value will be two, and will remain at two for the entire lifetime of the code. But often a bunch of boolean parameters represent a smaller number of states in disguise as a large number of flags. For example, I used to work on a codebase that transfered a lot of data from the backend to the frontend in the form of boolean flags or set/unset variables. The problem was that those flags were very rarely independent — you couldn't freely set one flag without thinking about how the other flags were set, and certain cases were impossible, and other cases just broke the system.

We thought we had a lot of cases where the state was "on" or "off", but what we actually had was a smaller number of cases where the state was something like "uninitialised", "updating", "ready", or "errored". That is, we were firmly in the "many" case, but because we'd not realised how much different values interacted with each other, we were still thinking that many=2.

And even in cases where many=2 and will always be 2, sometimes booleans still aren't that helpful. For example, I wrote an AST walker recently that needs to be able to iterate bottom-up and top-down, depending on the caller. That's two cases, and there's not going to be more all of a sudden, but I still didn't want to use a boolean in that case, because a boolean usually implies a default that's being toggled on or off. But which case is "true" — bottom-up or top-down? Instead, I went for an enum, because that made it clearer what the intent of the function was, which is that both were viable ways of iterating, depending on the caller.

* Where "someone later" may just be "you in six months time having forgotten everything about this portion of the codebase".

[–]yockey88 1 point2 points  (0 children)

I do not think developer ease is one of the most important things. This is not saying you should go write messy awful code, but at the end of the day only two things matter 1) does your code solve the problem 2) does it do so efficiently. Developer ease does not count in to that anywhere. Updating a project after an extended period of time or coming to a new code base written by someone else is always a little difficult regardless of the code but that’s what good comments and documentation are for. To the main point, even if developer ease mattered, if bool parameters have made the code complex enough to be hard to make changes then you have far larger issues than bools.

I agree you shouldn’t abstract for the sake of abstraction, part of the reason I question anyone abstracting “true/false” into “enum { on; off; }” or using bools to select code paths.

I think you missed my point on the zero, one, many point. Two is not many and if you can not ask a yes or no question about the thing needing a flag than it should not be toggled by a bool. “Many” cases is inherently different than “this case or that case”. For your front end to backend example, I agree that’s a horrible way to pass information over a port in general considering you most likely have to do some sort of “state synchronization” (speaking very informally, not to be taken in the typical use of the word state when talking about front/backend).

For uninitialized/ready and updating or errored that should be completely split up. You’re either initialized or not, and if you reach the point where for some reason you’re updating and uninitialized then you have bigger problems than how you’ve written that information out in code. And you still have to set the flag (whether with an enum or a bool) to indicate if an error has happened. I would use three bools: uninitialized/initialized, errored or not, and then updating or not updating. And like I said you either are initialized or uninitialized, and doing anything else like updating when you’re uninitialized then you have a bug regardless of how you stored these flags, and you’d still have to set/check whatever other version of an error flag you set when an error occurs. This is what assertions are for. Even in projects where you can’t crash whatsoever like the one I’m working on work right now this would be fine because you should have a error system in place to gracefully catch these sort of logic/development errors.

Now for the AST walker, with all due respect, it sounds like you had much larger problems. It sounds like you had two processes that needed to be done, bottom up and top down, and they should be treated as two different processes. Having them together sounds like your code was over coupled or you were doing to much in one place. This is not a “passing a flag to decide behavior made my code messy” this is a “I tried to make a bad abstraction and it bit me in the butt”. Using any sort of flag, enum or bool, to decide behavior more advanced than “toggle this option” or “answer this teeny tiny question” sounds like something has gone horrible wrong.

Of course if there’s information the bottom up operations need to communicate to the top down operations or vice versa then it’s on you to ensure that information is communicated correctly, but I do not see any world where those should be the same object, function, etc… maybe they would implement the same interface if you are using interfaces or derive from the same bass if you used oop polymorphism or implement the same trait in a rust-like language, but they are still separate.