Advertisements
Posted by: jsonmez | January 27, 2010

Refactoring Boolean Conditions Into Methods

Some people really, really don’t like to refactor boolean conditions into private methods.  One time someone told me that if they ever saw code like:

 public void DoStuff()
 {
    if (isProgramSpecial())
    {
       // do stuff;
    }
 }

 private static bool isProgramSpecial()
 {
    return ProgramCode == "Special Code";
 }

They would find me and kill me. I wonder if Bob Martin gets lots of death threats, because he advocates this practice in his book “Clean Code: A Handbook of Agile Software Craftsmanship“, which I am presently reading.

I’ve been doing this kind of refactoring ever since I read “Code Complete: A Practical Handbook of Software Construction“.  (By the way every time I mention this book, I feel obligated to tell you that if you have not read this book, go read it now.  Seriously.  Buy this book and read it.)

Now, the example I show above is a little “simple”, but it is still easier to read than leaving the ProgramCode == “Special Code” in the main method.  (Forget that”Special Code” should be a constant for now.)

Let’s look at a less trivial example:

if ((smellsGood && ( (isChicken || isSeafood) && !isCheapBrand && !iFeelRandomlyPicky)) || otherCatIsEatingIt)
{
    cat.eat();
}

In this example it is much more unclear what the logic is trying to do.  When you are reading this code, you have to mentally figure out what all these boolean statements are actually testing.

Tell me… Is this easier to understand?

public void blah()
{
   if ( decideToEatFood())
   {
     cat.eat();
   }
}

private boolean decideToEatFood()
{
   return (smellsGood && isFoodILikeAndImNotGoingToBePicky()) ||  otherCatIsEatingIt;
}

private boolean isFoodILikeAndImNotGoingToBePicky()
{
   return (isExpensiveChickenOrSeafood() && !iFeelRandomlyPicky);
}

private boolean isExpensiveChickenOrSeafood()
{
   return (isChicken || isSeafood) && !isCheapBrand;
}

If you think the first example is easier to understand, you’re much smarter than I am, because I just can’t wrap the first set of boolean conditions around my head, (and I wrote them).  It may seem silly to write little one line methods, but the value is in abstraction.  These one line methods, make it much easier to understand what the code is doing, and allow you to choose your level of “zoom” into the logic.

  • You can choose not to care about how the cat decides to eat, and just look at it from the level of if the cat decides to eat, then he eats.
  • Or you can look at it from the level of how does this cat decide to eat.
  • Or you can zoom down one more and look at the what kind of food does that cat like level.

Writing good code is about making code easy to read and understand, not about making it compact.

What do you think?  Do you agree with refactoring boolean conditions?

Advertisements

Responses

  1. Sounds pretty solid to me. You actually had someone argue against this based on some ‘Compactness Principle’?

    • Yeah, I have had people call that bad code. Basically saying that a one line method is pointless especially if it is not reused anywhere.

      I could almost understand this back in the C++ and slow processor days when having a bunch of little methods would create extra entries in the function pointer table and slow down performance, but any modern compiler will inline this statement automatically.

  2. I think it’s ridiculous, despite the fact that the compiler can inline those. I’m going to do you a favor and explain exactly why, and it will be long and clear. If you haven’t got the attention span to read it all, please, go back to watching reality shows.

    I can understand chopping up a large block of code into separate methods, if and only if those blocks are going to be repeatedly called from multiple places in your code, OR if your method is several pages long. But if these are going to be called from only one place, what you’ve done is to make your code HARDER to debug.

    You believe it’s easier to read because you’re the one writing it, and it’s true that it does look more like english, and it does looks simpler on the surface. But if you’re reading this for the very first time, it becomes much harder to understand. Sure, I can guess what each of those methods mean from their name, but if there’s a bug, I need to understand exactly what the methods do, and how they interact with each other to trigger the bug I’m trying to fix in your code.

    To do that with your “optimization”, I’d have to repeatedly scroll back and forth (or worse split my screen) and paste it back together in my head to see where it actually breaks. In doing so, I risk losing my context several times. If I don’t lose interest, I may have to paste your code back into a new window, or write it down on paper or whatever, and that will take me far longer.

    Remember, humans can only keep about 7 items in short term memory (think it as registers). If your code is all over the place, (and despite your short example, in real life it will be), it will tend to get exponentially harder to read, and thus, shallow bugs that could have been caught by a second set of eyes will now be glossed over because the reader loses interest or track of what they were trying to do.

    Instead, format your function so it looks cleaner, and add a comments about what you’re trying to do.

    It’ll be all in one place, cleanly visible and easy to understand. Use white spaces to format things properly. If they can’t understand your code, and you’ve used proper formatting and comments and names for things, it’s because they don’t understand either the language, or don’t know how to program and have no business reading it in the first place.

    Also, you seem to have parens around a subexpression that doesn’t need it:
    (smellsGood && _(_ (isChicken || isSeafood) && !isCheapBrand && !iFeelRandomlyPicky _)_ )

    The smellsGood is &&’ed with the next subexpression, and you can remove the extra parens, because its inner expression is a string of &&’s. This below, is much better:

    (smellsGood && (isChicken || isSeafood) && !isCheapBrand && !iFeelRandomlyPicky)

    The parens around the (isChicken || isSeafood) are of course required, but the rest of it, is simply a chain of && statements. They don’t need to be a subexpression.

    This in itself is a hint that you don’t yet understand boolean expressions as well as you should, and are compensating by architecture astronautism.

    It may sound like a put down, but it isn’t.

    As you’ll gain experience over the years with boolean expressions, you’ll come to understand why more experienced people are yelling at you. Don’t hide what you don’t understand behind abstractions that complicate things, instead, go deeper into it until you grok it well. When people complain, ask them to explain why they think you’re wrong and how they would have done better. Learn from them instead of taking the cop out that the compiler will fix it for you.

    When you have a chain of && expressions, if any of them is false the whole thing will short circuit immediately to false. When you have a chain of || expressions, if any of them is true, the whole thing will short circuit to true. Use this to your advantage. It will both produce far better code, and make things readable.

    I’ve moved the isOtherCatEatingIt to the front, because that shortcircuits everything else when it’s true, and it is simple expression (just a single variable) in the major outer expression. So it clearly stands out and says “If I’m true, this whole thing will be true.”

    Sure, your compiler has logic in it to optimize boolean expressions, but it may not always do as good a job as when you give it hints as to which expression is most likely to be first. That’s why I put those which I’ve assumed are most likeliest first. (In the real world, I’d know the code well, so I’d have a better idea than in an example, and yes, I do take the time to think about each expression I write. And yes, it does make a difference in real life.)

    // If the other cat is eating it, trust his judgment.
    // otherwise, if it’s either chicken or seafood, and
    // and it meets all my other criteria, I’ll eat it.
    if ( otherCatIsEatingIt ||
         (
              (isChicken || isSeafood) && smellsGood && !isCheapBrand && !iFeelRandomlyPicky
         )
       )
    {
       cat.eat();
    }

    (Sorry of the HTML form messed up the required indentation spaces, I’ve had to use option-space to make it work in the form, I’ve no idea what it will look like once I’ve hit submit. For some reason hitting space twice only produces a single space in the reply form. Weird that.)

    Notice, I’ve not made this as compact as possible as that would be harder to read. It’s not about violating compactness.

    You could also do && !(isCheapBrand || iFeelRandomlyPicky), however, that would make it less understandable because of the negation required, and would require another subexpression, which would break the short circuit chain.

    And now, as a reader, I don’t have to scroll out of your method to see what each of those three expressions means, then scroll back and see why your code doesn’t work the way it should, then scroll back again, etc. It’s all in one place, and it’s easy to see where there are bugs, and all three subexpressions are clearly denoted by the indentation as well as the comments.

    Normally I wouldn’t put in comments for something this trivial, but since you think it’s complex, comments are needed to clarify.

    • I almost didn’t approve your comment, since you seem to have the gusto to be condescending while lacking the courage to post it in any other way than using a fake email address and an anonymous name.

      There are a few reason why you are wrong. Although some of what you say makes sense.

      First what I agree with:

      You are right about not needing the extra parens in the first example. But, not everyone clearly understands order or precedence, and even when they do, parens can also be used to help separate logical areas of interest.

      You are also right about short-circuit evaluation. I know this, and you know this. But not everyone does, and relying on short-circuit evaluation can often lead to someone refactoring the code and not realizing that you were.

      Why I disagree with you:

      1. If you are using a modern compiler, it is plenty easy to “jump” around the code if you need to. If you need to see how the cat determines isFoodILikeAndImNotGoingToBePicky then you can go there easily and you just have to understand that one piece.

      2. I am trying to actually reduce the amount of things you have to think about in your head. The way I am reducing them is by saying isFoodILikeAndImNotGoingToBePicky <- You don't have to know how this works in order to know if the cat will decideToEatFood. BUT if you want to know how it works, then go ahead look inside, BUT when you are looking inside, you won't have to think about smellsGood and isOtherCatEatingIt. Instead you can focus on just the part of the logic you are interested in at that time.

      3. Although your comments clearly explain the conditions that follow, we all know that the code will change, and your comments won't. That is what happens in code that is being maintained. Also, what if you comments are wrong, so I just assume the code is doing what your comments say?

      In general though, if I saw the code the way you organized it above. I would leave it alone. It is pretty easy to understand. I like how you broke apart the or condition from all the ands, so that I can clearly see that there is a choice between this one condition and all these other ones combined. The auto-formatter will probably kill your code and make it hard to read, but assuming we had that turned off, it is not that big of a deal.

      My example is a bit contrived, but basically I am saying that breaking out a logical condition into a method with an appropriate name, in most circumstances, will make your code easier to read and understand.

  3. Yes, sorry that, I’m an old cypherpunk graybeard, who values both privacy and anonymity, it’s not about courage, I just value anonymity and privacy. I don’t believe identity has anything to do with the technical merits of this discussion (from either point of view). And yes, it is possible to build anonymous reputation.

    If anyone says “what have you got to hide if you’re not doing anything wrong?” try answering back with “Oh, OK, since you have nothing to hide, please write down your full name, social security number, date of birth, address, mother’s maiden name, credit card numbers and expirations, and all your bank account numbers on this piece of paper and I’ll post them on the web for you.” See what reaction you get. 🙂 Privacy is important, especially in this day and age.

    Forgive me if I sound condescending, that’s certainly not the intent.

    BTW, a modern compiler has nothing to do with jumping around or editing code. That’s what an editor or IDE is for. (Perhaps this is why I sound condescending, but please don’t confuse which job each of your tools does.)

    Regarding points 1,2 this is our core disagreement.

    If I have to jump around, you’ve already made my life exponentially harder by hiding things from me. I get it, I know that to you it sounds easy “just go scroll up 500 lines and read it and then come back”.

    But your expressions are part of other expressions which are used together as a whole. They’re not standalone. To figure out what you mean in your code, I have to mentally paste it back together. When I scroll up 500 lines, then scroll back down 500 lines three times (or more), I’ve incurred several context switches and no longer have a crisp view of your code.

    I have some fuzzy idea maybe, or if I’m not lazy and know better, I’ll take your code and paste it somewhere and rewrite it, not to replace it in the source control system, JUST TO BE ABLE READ IT and understand it. This is what I need you to understand. Look at it from the point of view of a debugger, not the original author. If I’m debugging your code, it’s never optional for me to skip over parts of it that are involved in a bug. I need to see all of them in one place so I can figure out where the bug is and in what conditions it gets triggered.

    I absolutely understand what you intend, but to me, you’re actually achieving the opposite.

    (BTW, the “I” and “you” in my posts aren’t intended to be personal, “I” is the debugger, and “you” is the original author.) I’m trying to debug your code, I need to understand it, and see all of it. (at least, all of it involved in the bug I’m trying to fix.)

    Solutions?

    Perhaps one solution, assuming the language allows for it, is use nested functions and define those functions right above your code, such that they fit on one page. That’s not necessarily an evil thing, and I’d be more accepting of it in that case because I could fit it all the same screenful.

    On second thought, that’s still ugly, and a much nicer option is to simply define three variables and use those instead. Like this:

    boolean decideToEatFood = (smellsGood && isFoodILikeAndImNotGoingToBePicky()) || otherCatIsEatingIt;
    boolean isFoodILikeAndImNotGoingToBePicky = (isExpensiveChickenOrSeafood() && !iFeelRandomlyPicky);
    boolean isExpensiveChickenOrSeafood = (isChicken || isSeafood) && !isCheapBrand;

    This would work for me, would you also find it acceptable?

    Now, you’re no longer hiding code, and I don’t have to jump around. I can see all of the involved code, and find the bugs. Each variable has your intended meaning as its name, and it’s all much nicer and cleaner when read by those who might get confused by large expressions.

    Honestly though, as for someone else reading the code, if they don’t understand boolean logic, precedence, or optimal use of parens, they have no business editing your code, never mind actually debugging it or updating it. If they’re incompetent, let them seek out something else to make a living with. If you let incompetents work on your code, you’ll only wind up cleaning the mess later, and it will cost you 10x the time it would have taken you to update the code in the first place. It’s fine to delegate, but it’s never acceptable to accept incompetence.

    Regarding 3, I agree that comments can fall behind the times, but that’s the responsibility of whomever updates it, just as it is their duty to rename variables that no longer match their names, or to rewrite or remove code that no longer does what it needs, or to update documentation. Everyone has to follow the coding standards of that group.

    If someone screws up, the source code repository will show you who checked in the updated code but left uncorrected out of date comments, the same way as if they’d checked in code that broke the build. This is why you have periodic code reviews and cleanup.

    If they make a mess, send’em back to fix it. If they’re incompetent, or repeatedly break the standards for coding that everyone else in your group follows, show them the door.

    I come from environments where it’s bad form to randomly reformat someone else’s code (either with an autoformatter, or by hand.) Granted modern repository systems don’t care about whitespace as much as the elder ones, but that’s how I learned.

    To me, white space is important and shouldn’t be tampered with frivolously. By all means use the white space to your advantage, to make the code easier to read. We all know the saw about clean code having less bugs.

    Tell your tribe to do the same. Also, everyone needs to agree to not check in code unless they’re fixing bugs or rewriting it – that way, white space is preserved, also code is preserved and unintentional typos aren’t checked back in. And if they fix or rewrite a piece of code, they now are responsible for it.

    Also, when you touch some code, you leave a marker in the comments describing the method. i.e.

    // 2010.03.30 rtm – added isChicken to food types http://127.0.0.1/track?featurereq=20202
    // 2010.03.31 tim c. may – fixed cat segfault http://127.0.0.1/track?bugid=102320
    // 2010.04.03 anony mouse – fixed foo bar baz condition for http://127.0.0.1/track?bugid=102343

    That’s how we roll. Sometimes the comments clutter up the source, so we remove them during the code reviews before starting new major releases. But they’re still in the source control and the bug db if we need to look back.

    (BTW, can you change your CSS template to not have a fixed width? I have a 1440×900 display, there’s no reason I should only see 1/3rd of the display being used. This is actually in the same spirit of my complaint with your original method. Let me make optimal use of my screen. A 2″x2″ comment box is also too small for meaningful discourse. I typed this up in a separate text editor and pasted it in.)

    • First off before anything else. I just want to make this point clear. Obviously, if you have read any of the other posts in my blog, I understand the difference between a compiler and an IDE. I said compiler, but I assumed you knew I meant IDE. Yes, my mistake, but I didn’t think I needed to be that specific.

      As far as your example, I can live with it, but how much nicer is it to just go ahead and hide away those booleans variables into methods, so that we don’t have to look at them? I used to write code like that, but I realized that since the actual compiler will inline the methods anyway, you might as well hide them so someone doesn’t have to look at them unless they want to.

      Speaking of which, it is so easy to jump to the definition of a method in any modern IDE by holding down the crtl button and left clicking. This is why it doesn’t matter where in the file the code is. You can just get there by clicking and going to definition. If not for that feature, I would probably agree about using variables instead of methods.

      One point that I disagree about very strongly, and I think that most of my readers would, is about putting comments into the code to indicate some change you made to a variable or method. That is what source control is for. The second I see a comment like that, I immediately delete it. That is an archaic practice and is no longer needed.

      I can see that you have wisdom, but that it is also mostly self-taught. Which is fine, but you must understand that the knowledge of the rest of the world will always supersede yours. I don’t claim to know everything, and I certainly know little in comparison to what there is to know about programming, but my source of authority is the empirical evidence of others who have learned more than I can in a lifetime.

      Can I change you? No. Of course not, but you bring to light an interesting contrast between the old guard and the new guard.

      As far as randomly reformatting code. Get used to it, because most software projects I have been on in the last 4 years have used and auto code formatter to maintain consistency. You code should be so clear that whitespace is not needed to understand it. I used to think that comments were good, but now I realize that comments are only representations of failures to properly communicate what our code is doing in our code. (Not my words, but the words of Uncle Bob Martin via Clean Code.)

      It is easy to claim the value of privacy by exaggerating the amount of information that is asked from you, but I think you will find that any man worth his salt will stand on his words, right or wrong.

      Nevertheless, I respect you and your opinion and I think there is a place for it. But you may find that many others do not, and one day you may find that you are wrong.

  4. I was reading the comments here and I just figured I’d add my opinion to the mix. I’m not exactly seasoned, but I have been thinking a lot lately about how I can improve the readability of my code. While I would say that I do not like blowing up my call stack, one liners I believe do not really do any harm there. I mean seriously, you have to look for one function and that’s in the context/scope that you know the function exists. As long as the file is kept to a reasonable line count, there is no problem with making these sorts of boolean expressions into functions. I would, however, caution nested expressions. While I like the idea of zooming into implementation from abstraction, I think that it can introduce unnecessary complexity. It really depends, but I think you ought to have a good reason to do so.

    I do agree though with this idea of one liners. I found it is especially helpful when your logic might be based off of some more concrete idea than what the code tells you.

    Random example:
    if (prisoners.Count > 250 && prison.SqFt < 5000)
    {}

    can be:
    if (overCrowded()) {}

    In some cases the logic can be more comprehensable (not just readable!) if you use this technique – and I think that those are the proper times to use it. (And they are more common than you think..)

    I would also go as far as to say that in some cases abstracting away logic one layer could help in restructuring or breaking changes happening, because the logic for a condition can possibly change over time, but the idea behind the logic, or the end of the logic, can stay the same.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Categories

%d bloggers like this: