opinion

4 posts

Code Coverage is Useless

Code Coverage is Useless

Not too long ago there were talks around the office regarding a new testing initiative. Now, by itself, this is fantastic news. Who wouldn't want to actually spend some time and get our testing story up to par?

The problem lies within the approach that was proposed, going so far as to say: "We need to ensure that we have at least 80% test coverage."

While the intention is a good one, code coverage is unfortunately useless.

cotton

Now, that is a pretty bold statement, so let me clarify a little bit. Code coverage goals are useless. You shouldn't strive for X% coverage on a given codebase. There are a few reasons for this, so let me explain.

It is possible to test enough

Not all code bases are created equal. One could be for an application that sees millions of hits in a day and is grossly complicated. Another could be for a tiny application that services a couple users a day, if that. I always like to envision these different kinds of applications on a risk plane.

risk plane

.. yeah I still know my way around MS Paint

Imagine if you will that each dot is an application in our system. The further top-right we go, the more likely that if something were to go wrong it'd be some bad news bears. Whereas the further bottom-left.. eh? Maybe someone would notice.

Now, it would be a silly little to say that every application should have at least 80% code coverage. Why? Opportunity cost. While I am a huge proponent of testing, I don't like to test just because. We should aim to test enough. Test enough so that we have enough confidence that our application will function as we expect it to.

In reality, maybe for our right-winged applications, 80% isn't enough. Maybe that actually should be higher and we should not stop at 80%. On the flipside, our smaller applications in the bottom left probably don't need such a high coverage percentage. The cycles spent adding tests would potentially bring us little to no value and end up just being a waste of time.

Note: I feel like at this point some individuals may be a little confused as to how adding tests would be invaluable. There's a whole development methodology called TDD that creates a high level of coverage just by following the red, green, refactor cycle.

The points I make here generally refer to going back and adding tests because someone dictated that the code bases coverage percentage was too low. If you're doing TDD to begin with, then setting a target really won't help. It's just a byproduct.

It's all about context. We can't generalize a percentage of coverage in our code base, because each code base is different.

Fun Fact: Did you know this sort of risk plane chart can be applicable to many different scenarios? Ever wondered what the risk plane for the security guy looks like?

security plane

Anyway...

In the same vein, not everything needs a test around it. Let's say we wanted to introduce a new public member into our codebase, something simple

public FirstName { get; set; }  

Introducing this line of code, if not called in any of our tests will drop code coverage. Maybe even below our beloved 80%. The fix?

[Fact]
FirstName_ByDefault_CanBeSet()  
{
  var myClass = MyClass();
  myClass.FirstName = "testname";
  Assert.AreEqual("testname", myClass.FirstName)
}

At this point, we're just testing .NET -- something we definitely want to avoid. I tend to only put tests around code that I know could actually have the potential to change in a way that I do not want it to. Logical code.

Code coverage is easy

Just because we have a lot of code coverage, does not necessarily mean that we can have a lot of confidence that our application works as we expect it to. Everything is always more clear with examples, so let's consider the following:

public class Flawless  
{
  public bool IsGuarenteedToWork()
  {
    // some code
  }
}

Now, methods usually have logic that we would normally want to test, right? Conditionals, mathematical operations, you name it. Though, for our example, it doesn't matter! We just want to increase code coverage. That's our goal.

[Fact]
public void IsGuarenteedToWork_ByDefault_Works()  
{
  var flawless = new Flawless();

  var actual = flawless.IsGuarenteedToWork();
}

And there you have it! 100% code coverage. By default, tests that do not have an Assert will be considered passing. Now you're probably thinking.. oh come on, who would actually do this?

People do silly things when incentivized. My go-to example is that of a scenario in which a company tells QA that for every bug they find at the end of the quarter, they will be given a bonus. Seems pretty reasonable right? The flip side of that is the same company tells development that they will receive a bonus based on how few bugs they introduce into the system.

This scenario incentivizes the failure of opposing groups. The development organization doesn't really want to write any code for fear of introducing a bug and wants QA to miss bugs in their analysis. Whereas the QA group wants development to introduce bugs into the system so that they can find them and be rewarded for doing so.

The other thing that we need to keep in mind is that...

Code coverage context matters

Let's consider that our developer wasn't just trying to game the system, and actually put forth an honest effort to obtaining his code coverage goal. Our implementation could be something like the following:

public class Flawless  
{
  public bool IsGuarenteedToWork()
  {
    for(var x = 0; x < int.MaxValue; x++) 
    {
      // Man, this is gonna work. I'll find that solution.. eventually.
    }
  }
}

.. and let's not forget the test.

[Fact]
public void IsGuarenteedToWork_ByDefault_Works()  
{
  var flawless = new Flawless();

  var actual = flawless.IsGuarenteedToWork();

  Assert.True(actual);
}

I hope it was obvious that the example above is far from performant. But in this case, we've reached 100% code coverage and we're actually asserting that the code is working as we intend it to. The implementation works. The test is correct. Everyone is happy. Almost...

When it comes to testing, there are different stakeholders.

Stakeholders are people whose lives you touch - Mark McNeil

This can be broken down further into the types of stakeholders.

  1. Primary Stakeholder (who I'm doing it for)
    Example: The customer who requested the feature.

  2. Secondary Stakeholder (others who are directly involved)
    Example: Your boss and/or other developers on the project.

  3. Indirect Stakeholder (those who are impacted otherwise)
    Example: The customers of your customer.

As programmers, we are writing code to solve problems for other people (sometimes ourselves if we can find the time). The same section of code matters differently to different people. Person A only cares that the answer is correct. Maybe they're notified when it's ready, but they're pretty indifferent to when they receive it. Person B needs the answer soon after requesting it. Our test only completely satisfies Person A.

There can be a lot of stakeholders when it comes to writing code. Unfortunately, we can't say with confidence, even at 100% code coverage, that our code is going to be compatible with everyone's needs.

After all of the harping on why code coverage is useless as a target. I need to wrap up by saying...

Code coverage can actually be useful

I prefer to leverage code coverage as a metric. Coverage is something that we're aware of, something that we can use to make informed decisions about each codebase.

If we notice that one codebase is consistently dropping in coverage, we can take that as a sign to look a little deeper into what's going on. Is the codebase incredibly hard to test? Are the developers just not putting forth the effort to test, even when it makes sense? Maybe it's actually what we would expect from that code base, so everything is gravy.

Coverage can also just let us know if we're doing an adequate amount of testing. If a mission-critical application only has 10% coverage, we should investigate the reasons for that and potentially start a quality initiative and gets some tests strapped on. It allows us to prioritize our testing initiatives without just randomly picking a codebase and start throwing tests at it.

The entire point of all of this is that setting coverage targets will just be counterproductive to your goals. We should be aware of coverage so that we can make informed decisions, but not let it impact the quality of our code just for the sake of coverage attainment.

Five Suggestions for Cleaner Code

Being involved in open source exposes you to a lot of different projects, individuals, coding styles... the works. It also means lots of code reviews. More reviews than I'd typically perform at work. In code reviews, I can be pretty opinionated. But I believe they stem from a strong desire to ensure that the code base is as clean and readable as possible.

With that said, these are all approaches that I would like to see all developers adopt. Nothing is hard to reason about and should be easy to start trying to implement from day one.

1. Avoid Flag Arguments

What exactly do I mean by a flag argument? I'll demonstrate with an example.

public void DoWork(bool log)  
{
  Execute();

  if(log)
  {
    // write to some log
  }
}

In the code above, we have a method that does something. When that something is done, it checks to see if we should log the event or not. If the passed in value is true, we go ahead and log the event. Otherwise, we bail out of the method.

This approach to writing methods is ultimately very confusing for the consumer. The implementation details may seem clear enough, but it's hard to reason about when used in the wild.

DoWork(true); // what could true possibly mean?  

I feel a much better approach is to actually split this behavior up into two individual methods. One method for each condition.

public void DoWork()  
{
  Execute();
}
public void DoWorkWithLogging()  
{
  DoWork();
  // log stuff
}

Here, we have a method DoWork that does just that. It only does the work and does not perform any logging. We've also introduced a method called DoWorkWithLogging that just calls DoWork, but additionally performs a logging operation. Hopefully, it's obvious given the name of this method, that it will not only do work, but it will log that it did it.

I feel this approach expresses a lot more intent throughout your codebase and reduces the cyclomatic complexity. We don't have any conditionals!

2. Do One Thing

Now, we hear this a lot, right? Do one thing and do it well. The single responsibility principle. However, it can be hard to actually reason about what does one thing actually mean? A lot of it is subjective, but I generally tend to reason about it in the form of abstractions.

An abstraction isn't a responsibility -- the caller doesn't know anything about the implementation details. The caller simply calls the abstraction and nothing more. Let's see an example.

public int CalculateCostForSize(int size)  
{
  if(size >= LARGE)
  {
    return size * LARGE_MODIFIER;
  }

  return size * DEFAULT_MODIFIER;
}

At first muster, this method is may seem completely acceptable. It's doing one thing, right? It's calculating the cost for a given size. However, there are a lot of implementation details that this method is concerning itself with.

The CalculateCostForSize method has to know what it means to be large. In this case, size is greater than or equal to LARGE. The method also has to know how to actually do the calculation. So one could argue that this method is actually doing three things.

  1. Figure out if the size is large.
  2. Calculate the cost when the size is large.
  3. Calculate the cost when the size is not large.

I feel a better approach would be:

public int CalculateCostForSize(int size)  
{
  if(SizeIsLarge(size))
  {
    return HandleLargeSize(size);
  }

  return HandleDefaultSize(size);
}

private bool SizeIsLarge(int size)  
{
  return size >= LARGE;
}

private bool HandleLargeSize(int size)  
{
  return size * LARGE_MODIFIER;
}

private bool HandleSmallSize(int size)  
{
  return size * DEFAULT_MODIFIER;
}

Now the method really only knows about the algorithm in which to calculate the size. It's not aware of any implementation details. Those are hidden behind abstractions.

3. Leverage Intention-Revealing Names

This is one that I see a lot and is an incredibly easy fix. Consider the following code snippet:

public void DoWork()  
{
  if(input % 2 == 0)
  {
    // when do I do this?
  }
}

Now as programmers, we probably all know what this code means. Using the modulus operator to figure out if a given number is even or not. But what if it's not clear? That's an easy fix then!

public void DoWork()  
{
  // check to see if the input is an even number
  if(input % 2 == 0)
  {
    // oh, why didn't you just say so?
  }
}

This is probably one of the things that grinds my gears the most when it comes to comments. I've ranted on comments before. But this just hurts me. Physically. Please avoid using comments to explain potentially confusing code. I humbly offer an alternative:

public void DoWork()  
{
  var inputIsEven = input % 2 == 0;
  if(inputIsEven)
  {
    // easy enough!
  }
}

Put another way, think instead if the example was some complicated business rule. We've all seen them..

if(input > 5 && input < 10 || (input >= 100))  

That really needs to use an intention-revealing name..

var isWithinAcceptableParameters = (input > 5 && input < 10 || (input >= 100))  
if(isWithinAcceptableParameters)  
{
}

No comments required!

4. Use Source Control.. Not Comments

I am a firm believer that code that is not required in any way to run the application, should not exist in the code base. While this does include dead code, unreachable code, etc (which may be hard to detect at times and sneak in). I mostly refer to code that has been commented out.

public void DoWork()  
{
  if(input > 0)
  {
  }
  /*
  * why didnt this work?
  * if(input > -1)
  * {
  * }
  */
}

It's an extreme example, but not uncommon in my travels. A more common example might look a little something like

var config = new Config();  
config.Title = "title";  
//config.MaxSize = 100;
config.MinimumSize = 1;  

Source control is an invaluable tool. Most say that regardless of what you're coding, you should always use source control. Even for the smallest of projects. There is no reason to muddle up your codebase with these type of comments when source control can retrieve it for you at a moments notice.

5. Avoid Long Parameter Lists

In the world of refactoring, there is a code smell called a data clump. I see this pretty frequently in code reviews. The developer introduced a slew of new methods that all have the same data set, but are all individual parameters.

Data clumps reveal themselves in a couple of ways. Through class members:

private int max;  
private int min;  
private DateTime date;

private string result;  

and through methods..

public string GetResult(int max, int min, int age, DateTime date)  
{
}

We even seem to instinctually clump these related datasets together without even noticing. Doesn't putting that newline between date and result just feel better?

The big problem is when we start copy-pasting these parameters into multiple locations. In order for the dataset to make sense, it requires a max, min, and date so why not encapsulate all three pieces of data into an object?

public class ResultContext  
{
  public ResultContext(int max, int min, DateTime date)
  {
    Max = max;
    Min = min;
    Date = date;
  }

  public int Max { get; }
  public int Min { get; }
  public DateTime Date { get; }
}

Now we can simply pass around the object, rather than a long list of parameters.

In the end, the true takeaway in all of this is.. please step away from // and /* */

Coding Standards Should Not Matter

Yeah that's right. That behemoth of a document that you call your standards document just does not matter. Now that's a pretty bold statement, so let me explain.

I believe that a standards document should be about how to style code. It should be about where to place curly braces, it should answer the great spaces vs. tabs debate (even though there is obviously only ever one clear choice), and similar concerns. What a standards document should not contain is substance, best practices. So what is a standard? A best practice?

It's very likely that the definition of a standard or a best practice vary from individual to individual, so to get everyone on the same page, these are how I define standards and best practices.

  • Standard: A rule put in place by an authority for the purpose of consistency.
  • Best Practice: A proven approach, generally the best way to solve a problem.

The key would in the standards definition is consistency. It should not matter at all how the code is formatted as long as the team agrees to an approach and follows it. Ensuring that the coding style of the repository remains consistent.

Consistency is extremely important. Code should look like a team wrote it, and not any one individual. Everyone on the team is going to be sharing the same code base and the style should not vary from developer to developer. You definitely do not want developers as their first order of business when starting a new task to be to reformat the code to their own personal preference.

A best practice is more along the lines of do not wrap your entire class in a try-catch and swallow the exception to avoid generating errors or keep your collection setters private. While as important as a standard, if not more so, these types of statements should not exist within a standards document.

So why shouldn't a standards document contain both?

First and foremost, they are different concerns. We generally like to split up concerns into easily digestible bits and not combine them. It is not going to matter all that much if your team puts curly braces on the same lines as your conditionals. It is going to matter if you start piling all of your concerns into a single class.

However, from my experience, a standards document is rarely just standards. A standards document will generally combine standards and best practices.

Though I do not believe the solution is to split up these two concerns into different documents. No, I believe a standards document should not exist. Emphasis on document. We should have standards. Unfortunately the medium we choose to present our standards is generally the incorrect one.

So whats the better approach? Let your code be your standards document.

I believe that coding standards are better enforced through tooling and the code base itself. Tools such as ReSharper and StyleCop allow teams to define their standards and then have them be automatically enforced through the development environment. No need to spend time writing out paragraphs in English, comparing and contrasting different styling approaches (Do vs. Don't). No need to worry about document formatting. Simply define your rules and push it to the teams who you intend to consume the styling that was agreed upon.

Like the old saying goes, the best standards document is no standards document at all.

A Comment on Comments

Full disclosure, I really don't like comments.

Like any feature of a language, I believe they have their purpose. After all, they were put in the language in the first place. I've just seen a lot of incorrect uses of comments, and the more I run into them, the more I feel like they don't add a whole lot of value.

Lets dive into some of the reasons why.

Comments bloat classes

I see auto generated comments everywhere. They plague a large portion of our code base. They have good intentions, but often end up making the file larger and harder to understand. Especially when the developer pays no mind to the content that was generated.

public class Cube  
{
    /**
    * The height of the cube.
    */
    public Height { get; set; }

    /**
    * The width of the cube.
    */
    public Width { get; set; }

    /**
    * The depth of the cube.
    */
    public Depth { get; set; }
}

We've turned a simple model of a cube which could be a few lines long, into a bloated model with more lines of comments than of actual useful code. Everything that the comments are describing are self evident. The comment headers just aren't needed. In all honesty, if I see something similar to the above, I just go ahead and delete the comments.

Comments are code duplication

One of my favorite light bulb moments. We as developers like to avoid code duplication, reuse is king. If you take a step back and think about comments in this regard, that's exactly what they are.

Code explains what is going to be done. Comments explain what is going to be done. They just do so in different languages. Just like with code duplication, if you change a line of code, you best remember to also change the comment lest you want to be left with:

/**
* ...
<summary>Returns true when the test succeeds; false otherwise.</summary>  
*/
public TestResult RunTest()  
{

You can probably guess that this method used to be a bool, and it was decided that the returned result needed to be a little more verbose. So the developer went ahead and created a TestResult object to store some more information about the result of the test. However, they forgot to update the comment, and now it's out of sync

Comments can be a sign of a bigger problem

Even if you believe you are using comments as intended, to explain potentially confusing code, a better solution is to just make the code less confusing.

if(!(x > 10 && y < 5 && z == -1)) // ensure the object is not out of bounds  
bool isOutOfBounds = (x > 10 && y < 5 && z == 0)  
if(isOutOfBounds) {  

You could even go a step further and assign each int its own const variable so the user isn't guessing what exactly 10, 5, and 0 even mean.

Finally.. I've seen my fair share of auto generated comments, and you know what?

Auto generated comments make me cry
/**
* Processes the async
*/
public async Task<Result> ProcessAsync()  

Oh.. I had no idea.

Anyway, after all this comment bashing, lets talk about a couple cases that I believe to be a strong candidates for comment use.

Public APIs are a great use of comments. You are going to have a large body of people consuming your code, calling your methods. It can be greatly helpful if everything is well documented, explaining exactly how everything is supposed to work.

Another valid use for comments is explaining why code is present. I like to think that code explains how it works, where comments explain why it works. Code can only make so much sense, regardless of how you name your variables, how well you format it, etc. Sometimes, there exists a scenario that just needs a little more explanation.

/* Reflection may seem like an odd choice here, but after testing the use case
* scenarios it proved to be much faster than splitting everything up into database
* calls */

Like I said in the beginning. Comments were put into languages for a reason, they have their place. Please, just use them sparingly.