Code Coverage is Useless

Not too long ago there were talks around the office regarding a new testing initiative. Now, by itself, this is fantastic news. Who wouldn't want to actually spend some time and get our testing story up to par?

The problem lies within the approach that was proposed, going so far as to say: "We need to ensure that we have at least 80% test coverage."

While the intention is a good one, code coverage is unfortunately useless.

Now, that is a pretty bold statement, so let me clarify a little bit. Code coverage goals are useless. You shouldn't strive for X% coverage on a given codebase. There are a few reasons for this, so let me explain.

It is possible to test enough

Not all code bases are created equal. One could be for an application that sees millions of hits in a day and is grossly complicated. Another could be for a tiny application that services a couple users a day, if that. I always like to envision these different kinds of applications on a risk plane.

.. yeah I still know my way around MS Paint

Imagine if you will that each dot is an application in our system. The further top-right we go, the more likely that if something were to go wrong it'd be some bad news bears. Whereas the further bottom-left.. eh? Maybe someone would notice.

Now, it would be a little silly to say that every application should have at least 80% code coverage. Why? Opportunity cost. While I am a huge proponent of testing, I don't like to test just because. We should aim to test enough. Test enough so that we have enough confidence that our application will function as we expect it to.

In reality, maybe for our top-right applications, 80% isn't enough. Maybe that actually should be higher and we should not stop at 80%. On the flip side, our smaller applications in the bottom left probably don't need such a high coverage percentage. The cycles spent adding tests would potentially bring us little to no value and end up just being a waste of time.

Note: I feel like at this point some individuals may be a little confused as to how adding tests could bring little value. There's a whole development methodology called TDD that creates a high level of coverage just by following the red, green, refactor cycle. The points I make here generally refer to going back and adding tests because someone dictated that the codebase's coverage percentage was too low. If you're doing TDD to begin with, then setting a target really won't help. It's just a byproduct.

It's all about context. We can't generalize a percentage of coverage in our code base, because each code base is different.

Fun Fact: Did you know this sort of risk plane chart can be applicable to many different scenarios? Ever wondered what the risk plane for the security guy looks like?

Anyway...

In the same vein, not everything needs a test around it. Let's say we wanted to introduce a new public member into our codebase, something simple

public string FirstName { get; set; } 

Introducing this line of code, if it's not called in any of our tests, will drop code coverage. Maybe even below our beloved 80%. The fix?

[Fact]
public void FirstName_ByDefault_CanBeSet()  
{
  var myClass = new MyClass();
  myClass.FirstName = "testname";
  Assert.Equal("testname", myClass.FirstName);
}

At this point, we're just testing .NET -- something we definitely want to avoid. I tend to only put tests around code that I know could actually have the potential to change in a way that I do not want it to. Logical code.
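For contrast, here's a sketch of the kind of logical code I would put a test around. The Person class and its formatting rule are hypothetical, purely for illustration:

```csharp
public class Person
{
  public string FirstName { get; set; }
  public string LastName { get; set; }

  // Logical code: there's an actual formatting decision here that could regress.
  public string FullName => $"{LastName}, {FirstName}";
}

[Fact]
public void FullName_ByDefault_FormatsAsLastNameFirstName()
{
  var person = new Person { FirstName = "Ada", LastName = "Lovelace" };

  Assert.Equal("Lovelace, Ada", person.FullName);
}
```

Unlike the auto-property test, this one will actually catch something if a developer changes the format.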

Code coverage is easy

Just because we have a lot of code coverage, does not necessarily mean that we can have a lot of confidence that our application works as we expect it to. Everything is always more clear with examples, so let's consider the following:

public class Flawless  
{
  public bool IsGuarenteedToWork()
  {
    // some code
  }
}

Now, methods usually have logic that we would normally want to test, right? Conditionals, mathematical operations, you name it. Though, for our example, it doesn't matter! We just want to increase code coverage. That's our goal.

[Fact]
public void IsGuarenteedToWork_ByDefault_Works()  
{
  var flawless = new Flawless();

  var actual = flawless.IsGuarenteedToWork();
}

And there you have it! 100% code coverage. By default, tests that do not have an Assert will be considered passing. Now you're probably thinking.. oh come on, who would actually do this?

People do silly things when incentivized. My go-to example is that of a scenario in which a company tells QA that for every bug they find at the end of the quarter, they will be given a bonus. Seems pretty reasonable right? The flip side of that is the same company tells development that they will receive a bonus based on how few bugs they introduce into the system.

This scenario incentivizes the failure of opposing groups. The development organization doesn't really want to write any code for fear of introducing a bug and wants QA to miss bugs in their analysis. Whereas the QA group wants development to introduce bugs into the system so that they can find them and be rewarded for doing so.

The other thing that we need to keep in mind is that...

Code coverage context matters

Let's consider that our developer wasn't just trying to game the system, and actually put forth an honest effort to obtaining his code coverage goal. Our implementation could be something like the following:

public class Flawless  
{
  public bool IsGuarenteedToWork()
  {
    for(var x = 0; x < int.MaxValue; x++) 
    {
      // Man, this is gonna work. I'll find that solution.. eventually.
    }

    return true;
  }
}

.. and let's not forget the test.

[Fact]
public void IsGuarenteedToWork_ByDefault_Works()  
{
  var flawless = new Flawless();

  var actual = flawless.IsGuarenteedToWork();

  Assert.True(actual);
}

I hope it was obvious that the example above is far from performant. But in this case, we've reached 100% code coverage and we're actually asserting that the code is working as we intend it to. The implementation works. The test is correct. Everyone is happy. Almost...

When it comes to testing, there are different stakeholders.

Stakeholders are people whose lives you touch - Mark McNeil

This can be broken down further into the types of stakeholders.

  1. Primary Stakeholder (who I'm doing it for) Example: The customer who requested the feature.
  2. Secondary Stakeholder (others who are directly involved) Example: Your boss and/or other developers on the project.
  3. Indirect Stakeholder (those who are impacted otherwise) Example: The customers of your customer.

As programmers, we are writing code to solve problems for other people (sometimes ourselves if we can find the time). The same section of code matters differently to different people. Person A only cares that the answer is correct. Maybe they're notified when it's ready, but they're pretty indifferent to when they receive it. Person B needs the answer soon after requesting it. Our test only completely satisfies Person A.
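If we wanted to satisfy Person B too, the test would have to pin down timing as well. A rough sketch, where the 100 ms budget and the Stopwatch approach are my own assumptions rather than anything from the requirements:

```csharp
using System.Diagnostics;

[Fact]
public void IsGuarenteedToWork_ByDefault_WorksQuickly()
{
  var flawless = new Flawless();
  var stopwatch = Stopwatch.StartNew();

  var actual = flawless.IsGuarenteedToWork();

  stopwatch.Stop();
  Assert.True(actual);                              // Person A: the answer is correct.
  Assert.True(stopwatch.ElapsedMilliseconds < 100); // Person B: it arrived quickly.
}
```

Against the loop-until-int.MaxValue implementation above, this test would fail on the timing assertion, which is exactly the feedback Person B needs.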

There can be a lot of stakeholders when it comes to writing code. Unfortunately, we can't say with confidence, even at 100% code coverage, that our code is going to be compatible with everyone's needs.

After all of this harping on why code coverage is useless as a target, I need to wrap up by saying...

Code coverage can actually be useful

I prefer to leverage code coverage as a metric. Coverage is something that we're aware of, something that we can use to make informed decisions about each codebase.

If we notice that one codebase is consistently dropping in coverage, we can take that as a sign to look a little deeper into what's going on. Is the codebase incredibly hard to test? Are the developers just not putting forth the effort to test, even when it makes sense? Maybe it's actually what we would expect from that code base, so everything is gravy.

Coverage can also just let us know if we're doing an adequate amount of testing. If a mission-critical application only has 10% coverage, we should investigate the reasons for that and potentially start a quality initiative and get some tests strapped on. It allows us to prioritize our testing initiatives without randomly picking a codebase and throwing tests at it.

The entire point of all of this is that setting coverage targets will just be counterproductive to your goals. We should be aware of coverage so that we can make informed decisions, but not let it impact the quality of our code just for the sake of coverage attainment.

A Best Practices Guide for Unit Testing

As more and more developers embrace the joys of unit testing, I'm starting to see a lot more tests in code reviews, which is great to see! I am, however, seeing a lot of the same mistakes pop up.

To help with this, I wanted to use this blog post as a means to showcase a document I've been working on that outlines some best practices when writing unit tests in C#.

UPDATE: This document has been added to the Microsoft docs website (https://docs.microsoft.com/en-us/dotnet/core/testing/unit-testing-best-practices)

If you would like to contribute, the GitHub repo can be found here.

Happy unit testing!

Fakes, Mocks, Stubs, and Spies.. Oh My..

As I do more and more work with the exercism.io community, I've noticed a recurring theme. There seems to be a lot of confusion around the various types of "test doubles" and what each of them actually does.

Now, if you've had some exposure to test driven development, you may already be familiar with test doubles such as fakes, stubs, spies, mocks, and dummies (did I miss any?). I feel like a lot of these terms are very similar and do not add much value, but rather just further complicate the discussion.

I prefer to simplify by classifying every test double as either a Fake, Stub, or Mock, with Fake serving as the umbrella term for the other two. But what are Fakes, Mocks, and Stubs? I've taken a leaf out of Roy Osherove's book The Art of Unit Testing to define them, as I believe his definitions are the easiest to understand.

Fake - A fake is a generic term which can be used to describe either a Stub or a Mock object. Whether it is a Stub or a Mock depends on the context in which it's used. So in other words, a Fake can be a Stub or a Mock.

Mock - A mock object is a fake object in the system that decides whether or not a unit test has passed or failed. A Mock starts out as a Fake until it is asserted against.

Stub - A stub is a controllable replacement for an existing dependency (or collaborator) in the system. By using a stub, you can test your code without dealing with the dependency directly. By default, a fake starts out as a stub.

Now that we have a formal definition of each, there's a few key points I want to highlight.

When you say mock, you probably mean stub.

This is probably the most common mistake I see when performing code reviews of test suites and discussing tests with other developers.

Consider the following code snippet:

var mockOrder = new MockOrder();  
var sut = new Purchase(mockOrder);

sut.ValidateOrders();

Assert.True(sut.CanBeShipped); 

This would be an example of Mock being used improperly. In this case, it is a stub. We're just passing in the Order as a means to be able to instantiate Purchase (the system under test). The name MockOrder is also very misleading because again, the order is not a mock.

A better approach would be

var fakeOrder = new FakeOrder();  
var sut = new Purchase(fakeOrder);

sut.ValidateOrders();

Assert.True(sut.CanBeShipped); 

By renaming the class to FakeOrder, we've made the class a lot more generic; it can be used as either a mock or a stub, whichever is better for the test case. In the above example, FakeOrder is used as a stub. We're not using the FakeOrder in any shape or form during the assert. We just passed it into the Purchase class to satisfy the requirements of the constructor.
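As an aside, the FakeOrder class itself is never shown. A minimal hand-rolled sketch might look like this, where the IOrder interface and the Validated flag are assumptions based on the surrounding snippets:

```csharp
public interface IOrder
{
    void Validate();
}

public class FakeOrder : IOrder
{
    // Recording the call is what lets this fake serve as a mock later:
    // a test can assert against Validated, or ignore it entirely (stub).
    public bool Validated { get; private set; }

    public void Validate()
    {
        Validated = true;
    }
}
```

The fake itself has no opinion about passing or failing; that decision belongs to the test that uses it.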

To use it as a Mock, we could do something like this

var fakeOrder = new FakeOrder();  
var sut = new Purchase(fakeOrder);

sut.ValidateOrders();

Assert.True(fakeOrder.Validated);  

In this case, we are checking a property on the Fake (asserting against it), so in the above code snippet the fakeOrder is a Mock.

It's important to get this terminology correct. If you call your stubs "mocks", other developers are going to make false assumptions about your intent.

The main thing to remember about mocks versus stubs is that mocks are just like stubs, but you assert against the mock object, whereas you do not assert against a stub. Which means that only mocks can break your tests, not stubs.

When you say mocking framework, you probably mean isolation framework

In the same vein as "stubs are not mocks" I wanted to touch a little bit on the idea of mocking frameworks.

Unfortunately, the term "mocking framework" is confusing and ultimately incorrect. Take any framework you may consider a mocking framework (Moq, NSubstitute, etc.): sure, they can mock, but they can do so much more (e.g. they're also capable of stubbing).

The goal of these frameworks is to isolate your code, so we should really be calling them isolation frameworks. This confused me for quite some time, even after having had some experience with Moq.

Consider the example

var mock = new Mock<IOrder>();  
var purchase = new Purchase(mock.Object);

Assert.True(purchase.Pending); 

Even though we called the variable mock and Moq provides a class called Mock, in this context, it's actually a stub.
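To make Moq's object act as an actual mock, we would assert against it, typically via Verify. The IOrder.Validate method here is an assumption for illustration:

```csharp
var mock = new Mock<IOrder>();
var purchase = new Purchase(mock.Object);

purchase.ValidateOrders();

// Asserting against the double is what makes it a mock.
mock.Verify(order => order.Validate(), Times.Once());
```

Same framework, same class, but now the double decides whether the test passes or fails.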

So unfortunately, mock is an overloaded word. It's confusing. Avoid it if at all possible, unless you actually are referring to mocking. Explaining to a developer that you can "isolate" something from its dependencies, rather than "mock" its dependencies, feels more natural.

Key Takeaways

1) If you must roll your own class for isolation purposes, consider naming it a Fake. This gives you the freedom of using it as either a Mock or a Stub and does not give off the wrong intent.

2) Mocks are not Stubs. If you do not assert against it, it's a Stub.

3) Prefer the terminology Isolation framework. Mocking framework can be confusing.

The Transformation Priority Premise

Recently I stumbled across a test driven development article that mentioned something I had not heard of before. It's a premise that Uncle Bob came up with as a means to order the priority of the transformations you should apply when practicing test driven development. He called it the Transformation Priority Premise.

I wrote a couple small programs using the premise, and really liked the concept he was trying to convey. Though in order to fully explain the premise, we should probably talk about test driven development itself.

So.. what is TDD?

Test Driven Development

TDD is a software development methodology that has three "laws" set forth by none other than Uncle Bob. They are as follows:

  1. You are not allowed to write any production code unless it is to make a failing unit test pass.
  2. You are not allowed to write any more of a unit test than is sufficient to fail, and compilation failures are failures.
  3. You are not allowed to write any more production code than is sufficient to pass the one failing unit test.

These three laws, if adhered to, force you into a cycle commonly referred to as red, green, refactor. Let's demonstrate the cycle by writing our own program. First, we'll need some requirements.

This program will return four different responses, and the response will be based on what kind of sentence is used as input.

  1. If the input ends with a period, return "Ok."
  2. If the input ends with a question mark, return "No."
  3. If the input ends with an exclamation mark, return "Quiet!"
  4. If no punctuation is found, return "What?"

This program is based on an exercism exercise called Bob, which is actually based off of another exercise, Deaf Grandma.

So where to start? The test.

Before we write any production code (the code that will ultimately end up into the compiled binary) we need to first stand up a unit test. To start, we'll need to create our System Under Test (SUT).

[TestMethod]
public void Input_has_no_punctuation_response_says_what()  
{
    var sut = new Responder();
}

And not all that surprising, the compiler is already yelling at us.

The type or namespace name 'Responder' could not be found (are you missing a using directive or an assembly reference?)  

But that's ok! We're already abiding by the first law since we started with a unit test. The compilation error is also expected; the second law states that we can't write any more of the unit test than is sufficient to fail (and compilation errors are failures).

So let's switch context a little bit and start writing some production code.

public class Responder  
{
}

We're done! The unit test compiles and passes. The third law forbids us from writing any more production code.

At this point of our development cycle we have gone through red (unit test compilation error), green (adding the Responder class to make the test pass), and now we're onto refactoring. Heh well, in this case, there's not really anything we can refactor, so we can move on.

With one cycle completed, we start from the beginning again with red. Just like last time, we need to write some more code in our test case so that it fails.

We'll want a method on the Responder that can take an input of type string, and we know our first requirement is that if no punctuation is found the result of the method is "What?"

[TestMethod]
public void Input_has_no_punctuation_response_says_what()  
{
    var _sut = new Responder();
    Assert.AreEqual("What?", _sut.Response("Hello"));
}

Now we can go ahead and compile that...

'Responder' does not contain a definition for 'Response' and no extension method 'Response' accepting a first argument of type 'Responder' could be found (are you missing a using directive or an assembly reference?)  

Another compiler error. Let's go ahead and fix that up.

We know the compiler error stems from the fact that we never implemented a Response method on the Responder class, so that's pretty easy to implement. But what do we write inside of the method body? The answer may seem a little surprising.

public string Response(string input)  
{
    return "What?";
}

That's right. A constant string value of "What?". Once again, this is because of the third law. We cannot write any more production code than is sufficient to pass the one failing unit test. It may seem a little silly at first, but bear with me, it'll hopefully make a little more sense as we continue writing our program.

Alright, so we've tested the case of no punctuation. Let's move onto a case that includes punctuation, the period. Testing for that gives us a unit test that looks like this:

[TestMethod]
public void Input_is_statement_response_says_ok()  
{
    Assert.AreEqual("Ok.", _sut.Response("Do it now."));
}

Continuing with the red, green, refactor cycle, we now have a failing test. Let's go ahead and write the bare minimum implementation.

public string Response(string input)  
{
    if(input.EndsWith("."))
    {
        return "Ok.";
    }

    return "What?";
}

Easy enough, time for another test.

[TestMethod]
public void Input_is_question_response_says_no()  
{
    Assert.AreEqual("No.", _sut.Response("Please?"));
}

Next up? You've got it. Let's make this test pass.

public string Response(string input)  
{
    if(input.EndsWith("."))
    {
        return "Ok.";
    }

    if (input.EndsWith("?"))
    {
        return "No.";
    }

    return "What?";
}

Now, when we make this test pass, we can see that there is some code duplication going on that we should probably refactor. After all, after making a test pass, we are given the opportunity to refactor the code. Unfortunately, it may not always be clear how to refactor it. There is hope, however!

The Transformation Priority Premise

As stated in the introduction, The Transformation Priority Premise (TPP) is a premise that was put together as a means to prioritize the transformations that occur when getting unit tests to pass.

When you're practicing TDD you may ask: "Doesn't all code produced by using TDD just result in code that is specifically tailored to pass the tests?"

You might notice that we're starting to see that a little in our current program. As it stands right now, we have one conditional per unit test. There's really nothing to stop this trend from occurring. There is, however, another little mantra that goes with TDD that pushes developers away from this practice.

“As the tests get more specific, the code gets more generic.”

Put another way: As we add more tests to our system (become more specific), our code becomes more generic (agnostic to the input).

With this in mind, it should be a little clearer to see that our current approach may not be the best one that we can take to solve this problem. We're just introducing more and more if statements to make the tests pass. Let's take a stab at refactoring our code and get away from our potential mountain of conditionals.

To start, the root of the TPP is its list of transformations and their priority. Here is the full list:

  1. ({}–>nil) no code at all->code that employs nil
  2. (nil->constant)
  3. (constant->constant+) a simple constant to a more complex constant
  4. (constant->scalar) replacing a constant with a variable or an argument
  5. (statement->statements) adding more unconditional statements.
  6. (unconditional->if) splitting the execution path
  7. (scalar->array)
  8. (array->container)
  9. (statement->recursion)
  10. (if->while)
  11. (expression->function) replacing an expression with a function or algorithm
  12. (variable->assignment) replacing the value of a variable.

.. and in case you've forgotten, this is the code we're trying to refactor.

public string Response(string input)  
{
    if(input.EndsWith("."))
    {
        return "Ok.";
    }

    if (input.EndsWith("?"))
    {
        return "No.";
    }

    return "What?";
}

Now we want to refactor this in order to get rid of the duplication. We started with a single constant, "What?", which was transformation #2 (nil->constant), and moved on to splitting the execution path, #6. It's time to consult the list and see what transformations we can make in order to clean up the if statements.

Being at #6 currently, the next logical step would be to take a look at #7, scalar to array. That could probably work, but given the context of this problem, we know it's a mapping issue. We're mapping punctuation to results. So let's take it one step further and leverage #8, array to container.

Note: The difference between an array and a container is that an array is generally going to be a primitive array (think int[], string[], etc). Whereas a container is going to be something like a List, Set, or Dictionary.
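To make the progression concrete, had we paused at the scalar-to-array step (#7) instead of jumping straight to a container, the method might have looked something like this sketch with parallel arrays:

```csharp
public string Response(string input)
{
    // Parallel arrays: punctuation[i] maps to responses[i].
    var punctuation = new[] { '.', '?' };
    var responses = new[] { "Ok.", "No." };

    for (var i = 0; i < punctuation.Length; i++)
    {
        if (input.Last() == punctuation[i])
        {
            return responses[i];
        }
    }

    return "What?";
}
```

The dictionary version is the same mapping with the pairing made explicit, which is why array-to-container (#8) is such a natural next move.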

Using scalar to array, and then array to container, we get a refactored method that looks like this:

public string Response(string input)  
{
    var inputResponses = new Dictionary<char, string>()
    {
        { '.', "Ok." },
        { '?', "No." }
    };

    if (inputResponses.ContainsKey(input.Last()))
    {
        return inputResponses[input.Last()];
    }

    return "What?";
}

That's pretty neat. No more repeating if statements. Recompile, ensure the tests still pass.. and they do! Now, there's only one punctuation mark that remains in our requirements, and that's the exclamation mark. We just finished refactoring, so we start again from red and introduce our last test:

[TestMethod]
public void Input_is_yelling_response_says_quiet()  
{
    Assert.AreEqual("Quiet!", _sut.Response("Woot!"));
}

Going back to our production code, it should be pretty straightforward as to how we can get this test to pass.

public string Response(string input)  
{
    var inputResponses = new Dictionary<char, string>()
    {
        { '.', "Ok." },
        { '?', "No." },
        { '!', "Quiet!" }
    };

    if (inputResponses.ContainsKey(input.Last()))
    {
        return inputResponses[input.Last()];
    }

    return "What?";
}

That's all there is to it! All of our tests pass, and we've met all of the requirements that we set out to meet.

The gain from leveraging the TPP is that it keeps us grounded, and forces us to continue to take baby steps when developing code. We generally do not want to take giant leaps forward. Start with repeating if statements over and over until something like a dictionary or a loop pops out at you.

If you're interested in learning more about all of the nuances of the Transformation Priority Premise, I highly recommend checking out Uncle Bob's initial blog post on the subject which can be found here.