refactoring

2 posts

Five Suggestions for Cleaner Code

Being involved in open source exposes you to a lot of different projects, individuals, coding styles... the works. It also means lots of code reviews. More reviews than I'd typically perform at work. In code reviews, I can be pretty opinionated. But I believe they stem from a strong desire to ensure that the code base is as clean and readable as possible.

With that said, these are all approaches that I would like to see all developers adopt. Nothing is hard to reason about and should be easy to start trying to implement from day one.

1. Avoid Flag Arguments

What exactly do I mean by a flag argument? I'll demonstrate with an example.

public void DoWork(bool log)  
{
  Execute();

  if(log)
  {
    // write to some log
  }
}

In the code above, we have a method that does something. When that something is done, it checks to see if we should log the event or not. If the passed in value is true, we go ahead and log the event. Otherwise, we bail out of the method.

This approach to writing methods is ultimately very confusing for the consumer. The implementation details may seem clear enough, but it's hard to reason about when used in the wild.

DoWork(true); // what could true possibly mean?  

I feel a much better approach is to actually split this behavior up into two individual methods. One method for each condition.

public void DoWork()  
{
  Execute();
}
public void DoWorkWithLogging()  
{
  DoWork();
  // log stuff
}

Here, we have a method DoWork that does just that. It only does the work and does not perform any logging. We've also introduced a method called DoWorkWithLogging that just calls DoWork, but additionally performs a logging operation. Hopefully, it's obvious given the name of this method, that it will not only do work, but it will log that it did it.

I feel this approach expresses a lot more intent throughout your codebase and reduces the cyclomatic complexity. We don't have any conditionals!

2. Do One Thing

Now, we hear this a lot, right? Do one thing and do it well. The single responsibility principle. However, it can be hard to actually reason about what does one thing actually mean? A lot of it is subjective, but I generally tend to reason about it in the form of abstractions.

An abstraction isn't a responsibility -- the caller doesn't know anything about the implementation details. The caller simply calls the abstraction and nothing more. Let's see an example.

public int CalculateCostForSize(int size)  
{
  if(size >= LARGE)
  {
    return size * LARGE_MODIFIER;
  }

  return size * DEFAULT_MODIFIER;
}

At first muster, this method is may seem completely acceptable. It's doing one thing, right? It's calculating the cost for a given size. However, there are a lot of implementation details that this method is concerning itself with.

The CalculateCostForSize method has to know what it means to be large. In this case, size is greater than or equal to LARGE. The method also has to know how to actually do the calculation. So one could argue that this method is actually doing three things.

  1. Figure out if the size is large.
  2. Calculate the cost when the size is large.
  3. Calculate the cost when the size is not large.

I feel a better approach would be:

public int CalculateCostForSize(int size)  
{
  if(SizeIsLarge(size))
  {
    return HandleLargeSize(size);
  }

  return HandleDefaultSize(size);
}

private bool SizeIsLarge(int size)  
{
  return size >= LARGE;
}

private bool HandleLargeSize(int size)  
{
  return size * LARGE_MODIFIER;
}

private bool HandleSmallSize(int size)  
{
  return size * DEFAULT_MODIFIER;
}

Now the method really only knows about the algorithm in which to calculate the size. It's not aware of any implementation details. Those are hidden behind abstractions.

3. Leverage Intention-Revealing Names

This is one that I see a lot and is an incredibly easy fix. Consider the following code snippet:

public void DoWork()  
{
  if(input % 2 == 0)
  {
    // when do I do this?
  }
}

Now as programmers, we probably all know what this code means. Using the modulus operator to figure out if a given number is even or not. But what if it's not clear? That's an easy fix then!

public void DoWork()  
{
  // check to see if the input is an even number
  if(input % 2 == 0)
  {
    // oh, why didn't you just say so?
  }
}

This is probably one of the things that grinds my gears the most when it comes to comments. I've ranted on comments before. But this just hurts me. Physically. Please avoid using comments to explain potentially confusing code. I humbly offer an alternative:

public void DoWork()  
{
  var inputIsEven = input % 2 == 0;
  if(inputIsEven)
  {
    // easy enough!
  }
}

Put another way, think instead if the example was some complicated business rule. We've all seen them..

if(input > 5 && input < 10 || (input >= 100))  

That really needs to use an intention-revealing name..

var isWithinAcceptableParameters = (input > 5 && input < 10 || (input >= 100))  
if(isWithinAcceptableParameters)  
{
}

No comments required!

4. Use Source Control.. Not Comments

I am a firm believer that code that is not required in any way to run the application, should not exist in the code base. While this does include dead code, unreachable code, etc (which may be hard to detect at times and sneak in). I mostly refer to code that has been commented out.

public void DoWork()  
{
  if(input > 0)
  {
  }
  /*
  * why didnt this work?
  * if(input > -1)
  * {
  * }
  */
}

It's an extreme example, but not uncommon in my travels. A more common example might look a little something like

var config = new Config();  
config.Title = "title";  
//config.MaxSize = 100;
config.MinimumSize = 1;  

Source control is an invaluable tool. Most say that regardless of what you're coding, you should always use source control. Even for the smallest of projects. There is no reason to muddle up your codebase with these type of comments when source control can retrieve it for you at a moments notice.

5. Avoid Long Parameter Lists

In the world of refactoring, there is a code smell called a data clump. I see this pretty frequently in code reviews. The developer introduced a slew of new methods that all have the same data set, but are all individual parameters.

Data clumps reveal themselves in a couple of ways. Through class members:

private int max;  
private int min;  
private DateTime date;

private string result;  

and through methods..

public string GetResult(int max, int min, int age, DateTime date)  
{
}

We even seem to instinctually clump these related datasets together without even noticing. Doesn't putting that newline between date and result just feel better?

The big problem is when we start copy-pasting these parameters into multiple locations. In order for the dataset to make sense, it requires a max, min, and date so why not encapsulate all three pieces of data into an object?

public class ResultContext  
{
  public ResultContext(int max, int min, DateTime date)
  {
    Max = max;
    Min = min;
    Date = date;
  }

  public int Max { get; }
  public int Min { get; }
  public DateTime Date { get; }
}

Now we can simply pass around the object, rather than a long list of parameters.

In the end, the true takeaway in all of this is.. please step away from // and /* */

A Look at Functional Refactoring

A Look at Functional Refactoring

Refactoring is a common task for developers.. hopefully anyway. It keeps code cleaner, more maintainable, and reduces code duplication. It also allows for code to be extended more easily, saving us time in the long run.

xkcd source: xkcd - The General Problem (https://xkcd.com/974/)

There are many approaches we can take to refactoring as outlined by Martin Fowler's refactoring catalog. I'd wager that one of the most common refactoring approach is taking code that exists in multiple spots, wrapping it up in a method, and then calling that new method. Otherwise known as Extract Method.

To give an example, we're probably all familiar with something like the following..

public void MyMethod()  
{
  using (var file = new System.IO.StreamWriter(@"C:\log.txt"))
  {
    file.WriteLine($"Called { nameof(MyMethod) }");
  }
}
public void AnotherMethod()  
{
  using (var file = new System.IO.StreamWriter(@"C:\log.txt"))
  {
    file.WriteLine($"Called { nameof(AnotherMethod) }");
  }
}

Now, there are a couple problems with this implementation. Where the log is being stored to is explicitly defined in both of the methods. This could be problematic in the future if we decide to change the location and/or the file name of the log file.

The other problem is how we are logging is present in both methods. If we ever decided to create distributed logs, we would have to go into each method that called our logging functionality and update the call to StreamWriter to a network call.

Fear not though, we can refactor this approach and make our code a lot better. Applying the Extract Method refactoring gives us:

public void MyMethod()  
{
  WriteToDefaultLog($"Called { nameof(MyMethod) }");
}

public void AnotherMethod()  
{
  WriteToDefaultLog($"Called { nameof(AnotherMethod) }");
}

public void WriteToDefaultLog(string message)  
{
  using (var file = new System.IO.StreamWriter(@"C:\log.txt"))
  {
    file.WriteLine(message);
  }
}

Now that all of the implementation details are put into a central location, and the dependent code snippets are calling the abstractions, it's a lot easier to introduce change into the system. This approach to refactoring is called Imperative Refactoring or "Inside out" refactoring.

imperative

Visualized, we take the code in orange (the code to refactor), drop it all into a new method, somewhere in isolation. We then have the code call that new method, that abstraction we just created, instead of being intimately familiar with all of the implementation details. Pretty neat.

However, there is another approach to refactoring called Functional Refactoring or "Outside in". It is useful for when you want to refactor the surrounding code. Think of try/catch blocks or using blocks.

functional

If that is not entirely clear, let's go ahead and look at some code.

Consider the following try/catch block that calls our default logger when a method throws an exception.

public void MyMethod()  
{
  try
  {
    SomeScaryStuff();
  }
  catch (Exception ex)
  {
    WriteToDefaultLog(ex);
  }
}

public void AnotherMethod()  
{
  try
  {
    LessScaryButStillPrettyScary();
  } 
  catch (Exception ex)
  {
    WriteToDefaultLog(ex);
  }
} 

So while the above may seem elegant and straightforward, we have some code duplication. We have explicitly written two try/catch blocks as well as called the same WriteToDefaultLog method. If we wanted to change that logging call, we'd have to update it in two spots.

This may not seem like such a big deal at first, but this could rapidly grow out of control if we wanted to call some logging functionality every time an error occurred in our application.

So how can we remove this duplication?

public static void Try(Action operation)  
{
  try
  {
    operation();
  }
  catch (Exception ex)
  {
    WriteToDefaultLog(ex);
  }
}

By refactoring out the surrounding try/catch block, and passing in the inside code via an Action, we're able to create a new method, Try.

In order to leverage this method, the caller would need to pass in the code that they would want executed within the try block through a lambda expression.

public void MyMethod()  
{
  Try(() =>
  {
    SomeScaryStuff();
  });
}

public void AnotherMethod()  
{
  Try(() =>
  {
    LessScaryStuffButStillPrettyScary();
  });
}

With this approach we're able to encapsulate all of the logic that our try/catch blocks contain into a single location. Need to change the logging approach when an exception is thrown? No problem, we just need to update our Try method and we're good to go.

But wait, there's more! Consider the need to connect to a database through ADO. First you need to establish a connection to your database, and then you need to create a command to execute against that connection.

using(var connection = new SqlConnection("myConnectionString"))  
{
  using(var command = new SqlCommand("dbo.myCommand", connection)
  {
    // do some work
  }
}

This is a pretty common setup, but has a couple drawbacks. First and foremost, it is verbose. The ceremony to call a procedure is multiple lines long. Secondly, if you ever needed to change how you retrieved your connection string... it'd be a sizable effort to update every procedure call. But there's hope! By applying some functional refactoring, we can abstract these implementation details away from the callers and make our procedure calls more succinct.

...
ExecuteCommand("dbo.myCommand", () => {  
  // do work
});
...

public void ExecuteCommand(string commandName, Action operation)  
{
  using(var connection = new SqlConnection("myConnectionString"))
  {
    using(var command = new SqlCommand("dbo.myCommand", connection)
    {
      operation();
    }
  }
}

As we can see, this approach can be applied to a lot of different scenarios. Logic and implementation details are hiding everywhere!, so don't always think Inside out... try Outside in!