coding style

The Importance of Naming

A story all too real...

We've all seen the joke, time and time again --

"There are only two hard things in Computer Science: off by one errors, cache invalidation, and naming things."

It's true. Naming things can be really hard. There seems to be all of these hidden rules around how we can name our properties and classes.

You: I'm just going to introduce this helper class, FileHelper.cs
The World: No! You can't use Helper in your class! That's just begging to violate the single responsibility principle.

But naming things correctly can save yourself and others a lot of time. I'll share with you an example from earlier this week. Consider this command:

webdriver-manager start

Assuming you had no knowledge of what this command did, I bet you have some guesses. Maybe you would have only one guess, and I'd honestly be Okay with that. You're probably thinking:

Well.. it starts the web driver manager..?

And you'd be right. Almost.

Unfortunately, webdriver-manager start also performs an update. Although the only way you'd be able to figure this out is if you read the documentation (which honestly seems a little buried to me) or if you ran into the same issue I ran into this week.

While not incredibly relevant to the story, if you want to learn more about what webdriver-manager is, you can read the projects npm page or the GitHub repository. The TL;DR is that it is a means to manage E2E tests for Angular projects.

It goes a little something like this...

For most of the things we develop at my company, we put them into Docker containers. This includes our end-to-end (E2E) tests. The Dockerfile for our tests is pretty straightforward, and honestly as to not add too many unnecessary details, the only thing that really matters is that we have the following command in the Dockerfile:

RUN webdriver-manager update

When the image is built, webdriver-manager update will be run, which will install the latest version of the webdriver manager. Docker images are immutable. That is, when they are created, they do not change. This means that the Docker image will be created with the latest version of the web-driver manager.

Now, to start the web-driver manager, we need to run the infamous webdriver-manager start command from within the Docker container.

Though depending on when you created your Docker image and when you started running your container, you're going to get one of two scenarios:

  1. The container will start up just fine and run your tests as you expect.
  2. The container will error when trying to run the webdriver-manager.

This is due to the fact that, unfortunately, webdriver-manager start not only starts, but attempts to start the latest version. Regardless of what version is installed. So it is possible that a new version has been released, and the Docker image is no longer relevant.

Luckily the solution isn't too bad. We just need to update the Dockerfile to update to a specific version. This forces our Docker image to always have version 3.0.0 installed.

RUN webdriver-manager update --versions.standalone 3.0.0

We then need to change the webdriver-manager start command to also include the same parameter:

webdriver-manager start --versions.standalone 3.0.0

Which in turn, forces the webdriver manager to start a specific version.

A simple solution, but a problem that took a decent time to figure out what was going wrong. Never would I have imagined that start did more than just that. Had the command been called startupdate or just left as start with an optional update parameter, the problem would've been much more apparent.

The biggest takeaway from all of this is that your naming should say exactly what it is doing and..

Have no side effects

Side effects are lies. Your function promises to do one thing, but it also does other hidden things. Sometimes it will have unexpected behavior. They are devious and damaging mistruths that often result in strange temporal couplings and order dependencies.              – Miguel Loureiro

Your code should do what it says it does, and nothing more.

Five Suggestions for Cleaner Code

Being involved in open source exposes you to a lot of different projects, individuals, coding styles... the works. It also means lots of code reviews. More reviews than I'd typically perform at work. In code reviews, I can be pretty opinionated. But I believe they stem from a strong desire to ensure that the code base is as clean and readable as possible.

With that said, these are all approaches that I would like to see all developers adopt. Nothing is hard to reason about and should be easy to start trying to implement from day one.

1. Avoid Flag Arguments

What exactly do I mean by a flag argument? I'll demonstrate with an example.

public void DoWork(bool log)  
{
  Execute();

  if(log)
  {
    // write to some log
  }
}

In the code above, we have a method that does something. When that something is done, it checks to see if we should log the event or not. If the passed in value is true, we go ahead and log the event. Otherwise, we bail out of the method.

This approach to writing methods is ultimately very confusing for the consumer. The implementation details may seem clear enough, but it's hard to reason about when used in the wild.

DoWork(true); // what could true possibly mean? 

I feel a much better approach is to actually split this behavior up into two individual methods. One method for each condition.

public void DoWork()  
{
  Execute();
}
public void DoWorkWithLogging()  
{
  DoWork();
  // log stuff
}

Here, we have a method DoWork that does just that. It only does the work and does not perform any logging. We've also introduced a method called DoWorkWithLogging that just calls DoWork, but additionally performs a logging operation. Hopefully, it's obvious given the name of this method, that it will not only do work, but it will log that it did it.

I feel this approach expresses a lot more intent throughout your codebase and reduces the cyclomatic complexity. We don't have any conditionals!

2. Do One Thing

Now, we hear this a lot, right? Do one thing and do it well. The single responsibility principle. However, it can be hard to actually reason about what does one thing actually mean? A lot of it is subjective, but I generally tend to reason about it in the form of abstractions.

An abstraction isn't a responsibility -- the caller doesn't know anything about the implementation details. The caller simply calls the abstraction and nothing more. Let's see an example.

public int CalculateCostForSize(int size)  
{
  if(size >= LARGE)
  {
    return size * LARGE_MODIFIER;
  }

  return size * DEFAULT_MODIFIER;
}

At first muster, this method is may seem completely acceptable. It's doing one thing, right? It's calculating the cost for a given size. However, there are a lot of implementation details that this method is concerning itself with.

The CalculateCostForSize method has to know what it means to be large. In this case, size is greater than or equal to LARGE. The method also has to know how to actually do the calculation. So one could argue that this method is actually doing three things.

  1. Figure out if the size is large.
  2. Calculate the cost when the size is large.
  3. Calculate the cost when the size is not large.

I feel a better approach would be:

public int CalculateCostForSize(int size)  
{
  if(SizeIsLarge(size))
  {
    return HandleLargeSize(size);
  }

  return HandleDefaultSize(size);
}

private bool SizeIsLarge(int size)  
{
  return size >= LARGE;
}

private bool HandleLargeSize(int size)  
{
  return size * LARGE_MODIFIER;
}

private bool HandleSmallSize(int size)  
{
  return size * DEFAULT_MODIFIER;
}

Now the method really only knows about the algorithm in which to calculate the size. It's not aware of any implementation details. Those are hidden behind abstractions.

3. Leverage Intention-Revealing Names

This is one that I see a lot and is an incredibly easy fix. Consider the following code snippet:

public void DoWork()  
{
  if(input % 2 == 0)
  {
    // when do I do this?
  }
}

Now as programmers, we probably all know what this code means. Using the modulus operator to figure out if a given number is even or not. But what if it's not clear? That's an easy fix then!

public void DoWork()  
{
  // check to see if the input is an even number
  if(input % 2 == 0)
  {
    // oh, why didn't you just say so?
  }
}

This is probably one of the things that grinds my gears the most when it comes to comments. I've ranted on comments before. But this just hurts me. Physically. Please avoid using comments to explain potentially confusing code. I humbly offer an alternative:

public void DoWork()  
{
  var inputIsEven = input % 2 == 0;
  if(inputIsEven)
  {
    // easy enough!
  }
}

Put another way, think instead if the example was some complicated business rule. We've all seen them..

if(input > 5 && input < 10 || (input >= 100))  

That really needs to use an intention-revealing name..

var isWithinAcceptableParameters = (input > 5 && input < 10 || (input >= 100))  
if(isWithinAcceptableParameters)  
{
}

No comments required!

4. Use Source Control.. Not Comments

I am a firm believer that code that is not required in any way to run the application, should not exist in the code base. While this does include dead code, unreachable code, etc (which may be hard to detect at times and sneak in). I mostly refer to code that has been commented out.

public void DoWork()  
{
  if(input > 0)
  {
  }
  /*
  * why didnt this work?
  * if(input > -1)
  * {
  * }
  */
}

It's an extreme example, but not uncommon in my travels. A more common example might look a little something like

var config = new Config();  
config.Title = "title";  
//config.MaxSize = 100;
config.MinimumSize = 1; 

Source control is an invaluable tool. Most say that regardless of what you're coding, you should always use source control. Even for the smallest of projects. There is no reason to muddle up your codebase with these type of comments when source control can retrieve it for you at a moments notice.

5. Avoid Long Parameter Lists

In the world of refactoring, there is a code smell called a data clump. I see this pretty frequently in code reviews. The developer introduced a slew of new methods that all have the same data set, but are all individual parameters.

Data clumps reveal themselves in a couple of ways. Through class members:

private int max;  
private int min;  
private DateTime date;

private string result; 

and through methods..

public string GetResult(int max, int min, int age, DateTime date)  
{
}

We even seem to instinctively clump these related data sets together without even noticing. Doesn't putting that newline between date and result just feel better?

The big problem is when we start copy-pasting these parameters into multiple locations. In order for the dataset to make sense, it requires a max, min, and date so why not encapsulate all three pieces of data into an object?

public class ResultContext  
{
  public ResultContext(int max, int min, DateTime date)
  {
    Max = max;
    Min = min;
    Date = date;
  }

  public int Max { get; }
  public int Min { get; }
  public DateTime Date { get; }
}

Now we can simply pass around the object, rather than a long list of parameters.

In the end, the true takeaway in all of this is.. please step away from // and /* */