Code Coverage is Useless

Not too long ago there were talks around the office regarding a new testing initiative. Now, by itself, this is fantastic news. Who wouldn't want to actually spend some time and get our testing story up to par?

The problem lies within the approach that was proposed, going so far as to say: "We need to ensure that we have at least 80% test coverage."

While the intention is a good one, code coverage is unfortunately useless.


Now, that is a pretty bold statement, so let me clarify a little bit. Code coverage goals are useless. You shouldn't strive for X% coverage on a given codebase. There are a few reasons for this, so let me explain.

It is possible to test enough

Not all code bases are created equal. One could be for an application that sees millions of hits in a day and is grossly complicated. Another could be for a tiny application that services a couple users a day, if that. I always like to envision these different kinds of applications on a risk plane.

risk plane

.. yeah I still know my way around MS Paint

Imagine if you will that each dot is an application in our system. The further top-right we go, the more likely that if something were to go wrong it'd be some bad news bears. Whereas the further bottom-left.. eh? Maybe someone would notice.

Now, it would be a little silly to say that every application should have at least 80% code coverage. Why? Opportunity cost. While I am a huge proponent of testing, I don't like to test just because. We should aim to test enough: enough that we have confidence our application will function as we expect it to.

In reality, maybe for our applications in the top right, 80% isn't enough. Maybe the bar should actually be higher, and we should not stop at 80%. On the flip side, our smaller applications in the bottom left probably don't need such a high coverage percentage. The cycles spent adding tests would potentially bring us little to no value and end up just being a waste of time.

Note: I feel like at this point some individuals may be a little confused as to how adding tests could bring so little value. There's a whole development methodology called TDD that creates a high level of coverage just by following the red, green, refactor cycle.

The points I make here generally refer to going back and adding tests because someone dictated that the codebase's coverage percentage was too low. If you're doing TDD to begin with, then setting a target really won't help. Coverage is just a byproduct.

It's all about context. We can't generalize a percentage of coverage across our codebases, because each codebase is different.

Fun Fact: Did you know this sort of risk plane chart can be applicable to many different scenarios? Ever wondered what the risk plane for the security guy looks like?

security plane

Anyway...

In the same vein, not everything needs a test around it. Let's say we wanted to introduce a new public member into our codebase, something simple:

public string FirstName { get; set; }  

Introducing this line of code, if it is not exercised by any of our tests, will drop code coverage. Maybe even below our beloved 80%. The fix?

[Fact]
public void FirstName_ByDefault_CanBeSet()  
{
  var myClass = new MyClass();
  myClass.FirstName = "testname";
  Assert.Equal("testname", myClass.FirstName);
}

At this point, we're just testing .NET -- something we definitely want to avoid. I tend to only put tests around code that I know could actually have the potential to change in a way that I do not want it to. Logical code.

Code coverage is easy

Just because we have a lot of code coverage does not necessarily mean that we can have a lot of confidence that our application works as we expect it to. Everything is always clearer with examples, so let's consider the following:

public class Flawless  
{
  public bool IsGuaranteedToWork()
  {
    // some code
  }
}

Now, methods usually have logic that we would normally want to test, right? Conditionals, mathematical operations, you name it. Though, for our example, it doesn't matter! We just want to increase code coverage. That's our goal.

[Fact]
public void IsGuaranteedToWork_ByDefault_Works()  
{
  var flawless = new Flawless();

  var actual = flawless.IsGuaranteedToWork();
}

And there you have it! 100% code coverage. By default, tests that do not have an Assert will be considered passing. Now you're probably thinking.. oh come on, who would actually do this?

People do silly things when incentivized. My go-to example is that of a scenario in which a company tells QA that for every bug they find at the end of the quarter, they will be given a bonus. Seems pretty reasonable right? The flip side of that is the same company tells development that they will receive a bonus based on how few bugs they introduce into the system.

This scenario incentivizes the failure of opposing groups. The development organization doesn't really want to write any code for fear of introducing a bug and wants QA to miss bugs in their analysis. Whereas the QA group wants development to introduce bugs into the system so that they can find them and be rewarded for doing so.

The other thing that we need to keep in mind is that...

Code coverage context matters

Let's consider that our developer wasn't just trying to game the system and actually put forth an honest effort toward obtaining his code coverage goal. Our implementation could be something like the following:

public class Flawless  
{
  public bool IsGuaranteedToWork()
  {
    for(var x = 0; x < int.MaxValue; x++) 
    {
      // Man, this is gonna work. I'll find that solution.. eventually.
    }
    return true;
  }
}

.. and let's not forget the test.

[Fact]
public void IsGuaranteedToWork_ByDefault_Works()  
{
  var flawless = new Flawless();

  var actual = flawless.IsGuaranteedToWork();

  Assert.True(actual);
}

I hope it was obvious that the example above is far from performant. But in this case, we've reached 100% code coverage and we're actually asserting that the code is working as we intend it to. The implementation works. The test is correct. Everyone is happy. Almost...

When it comes to testing, there are different stakeholders.

Stakeholders are people whose lives you touch - Mark McNeil

This can be broken down further into the types of stakeholders.

  1. Primary Stakeholder (who I'm doing it for)
    Example: The customer who requested the feature.

  2. Secondary Stakeholder (others who are directly involved)
    Example: Your boss and/or other developers on the project.

  3. Indirect Stakeholder (those who are impacted otherwise)
    Example: The customers of your customer.

As programmers, we are writing code to solve problems for other people (sometimes ourselves if we can find the time). The same section of code matters differently to different people. Person A only cares that the answer is correct. Maybe they're notified when it's ready, but they're pretty indifferent to when they receive it. Person B needs the answer soon after requesting it. Our test only completely satisfies Person A.

There can be a lot of stakeholders when it comes to writing code. Unfortunately, we can't say with confidence, even at 100% code coverage, that our code is going to be compatible with everyone's needs.
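To make the Person A versus Person B point concrete, here's a sketch (using xUnit, with a minimal stand-in for the `Flawless` class above, and an assumed 100 ms budget purely for illustration) of a test that represents both stakeholders at once:

```csharp
using System.Diagnostics;
using Xunit;

// Minimal stand-in for the article's Flawless class so the sketch compiles.
public class Flawless
{
    public bool IsGuaranteedToWork() => true;
}

public class FlawlessTests
{
    [Fact]
    public void IsGuaranteedToWork_ByDefault_WorksWithinBudget()
    {
        var flawless = new Flawless();
        var stopwatch = Stopwatch.StartNew();

        var actual = flawless.IsGuaranteedToWork();

        stopwatch.Stop();
        // Person A's concern: the answer is correct.
        Assert.True(actual);
        // Person B's concern: the answer arrives quickly.
        // The 100 ms budget is an assumed requirement, not a universal rule.
        Assert.True(stopwatch.ElapsedMilliseconds < 100,
            $"Expected under 100 ms, took {stopwatch.ElapsedMilliseconds} ms");
    }
}
```

Note that the well-behaved implementation and the `int.MaxValue` loop produce identical coverage numbers; only the second assertion distinguishes them.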

After all of the harping on why code coverage is useless as a target, I need to wrap up by saying...

Code coverage can actually be useful

I prefer to leverage code coverage as a metric. Coverage is something that we're aware of, something that we can use to make informed decisions about each codebase.

If we notice that one codebase is consistently dropping in coverage, we can take that as a sign to look a little deeper into what's going on. Is the codebase incredibly hard to test? Are the developers just not putting forth the effort to test, even when it makes sense? Maybe it's actually what we would expect from that code base, so everything is gravy.

Coverage can also just let us know if we're doing an adequate amount of testing. If a mission-critical application only has 10% coverage, we should investigate the reasons for that and potentially start a quality initiative and get some tests in place. It allows us to prioritize our testing initiatives without just randomly picking a codebase and throwing tests at it.

The entire point of all of this is that setting coverage targets will just be counterproductive to your goals. We should be aware of coverage so that we can make informed decisions, but not let it impact the quality of our code just for the sake of coverage attainment.

Best of GOTO Chicago 2018


speaking at GOTO Chicago's Open Source panel

Wow, what a week! I'll spare you from any May the fourth be with you puns (unless that in itself counts) and dive right into the best of GOTO Chicago 2018.

This past week I had the opportunity to attend my first GOTO conference, and overall I would say it was a good experience. I honestly was not blown away by the sponsor booths, and some of the talks were a little rough around the edges... but what conference doesn't have its fair share of flops? We're not here to talk about those though, I want to highlight the talks and experiences that really stuck out.

Let's start with...

Testing legacy code

My first exposure to GOTO was an all-day workshop hosted by Michael C. Feathers, the author of Working Effectively with Legacy Code. While the workshop consisted mostly of techniques for getting legacy code into a testable state, the content presented would take up an entire blog post in itself (which I will assuredly write at some point in the near future). So I am just going to focus on two key elements that really resonated with me.

public versus published
We've all heard of access modifiers (private, public, etc), but what about published? The easiest way to describe it is to give an example, comparing it to public.

Public should be familiar territory. If something is marked as public, it can be accessed by other classes, subclasses, and even assemblies. Within an organization, it should be relatively easy to change public signatures. Sure, it might be a lot of work to do so, depending on how many projects reference the public interface, but it's clear to see what needs to change and you can have a high level of confidence that you won't break anything. You may even have the ability to update all of the references yourself (which I would encourage you to do).

Published, on the other hand, you don't have any control over. A prime example would be an API that other companies leverage. You can't change the interface willy-nilly because you don't have the ability to update the references. Published is an abstract concept. You don't really know if something is marked as published, because there isn't an access modifier for it. It just isn't supported in the language.

While this makes complete sense, I was just happy to finally put some context around the concept. Furthermore, I've been officially converted to the camp of: if you're changing a public interface, do your due diligence and update the references yourself. No more obsoleting a method that will never actually be updated.
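To put some hypothetical code to that distinction (the `PriceCalculator` names are mine, not from the workshop): for a published interface you're more or less stuck deprecating and forwarding, while for a merely public one you can rename and fix every caller yourself.

```csharp
using System;

public class PriceCalculator
{
    // Published: external callers we can't update still compile,
    // and the old entry point just forwards to the new one.
    [Obsolete("Use CalculateTotal instead.")]
    public decimal CalcTotal(decimal price, decimal taxRate)
        => CalculateTotal(price, taxRate);

    // Public: within our own organization we can rename this freely,
    // as long as we do the due diligence of updating every reference.
    public decimal CalculateTotal(decimal price, decimal taxRate)
        => price + (price * taxRate);
}
```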

Best is the enemy of better

In reality, after some google-fu, the official quote is:

Perfect is the enemy of good - Voltaire

Regardless, to me this was a simple yet powerful quote that I had not heard before. It stems from the idea that so many times in our career we spin our wheels indefinitely while trying to seek out the perfect solution. Maybe we never implement the perfect solution, because it's just not possible right now. However, it may be possible to implement a good solution. We just fail to realize it sometimes.

A big part of the workshop was refactoring code to make it testable. To do so, we made a lot of new classes that the old classes just called into. Our old code essentially just became wrappers around the new code. Obviously, this is not a perfect solution, but it's a step in the right direction to get the code into a clean and testable state.

After wrapping up the testing workshop, it's time to head onto over to the first Keynote of the conference.

Drifting into failure

Presentation: https://www.youtube.com/watch?v=mFQRn_m2mP4

The first keynote of the conference was given by Adrian Cockcroft, the VP of Cloud Architecture Strategy for Amazon. It was a little surprising, as he wasn't on the original agenda. Apparently the original speaker came down with a cold, so he filled in. Honestly, I'm glad he did. It was a very enlightening talk. Adrian focused on a central theme, drifting into failure.

Drifting into failure stems from the idea that people in the workplace may not report issues to upper management for fear of inadequacy or even worse, losing their job. These issues, albeit small at the time, start to pile up eventually leading to a catastrophic event.

One example of this was the fact that airlines with more reported incidents actually had fewer fatalities. At first, this seems pretty counterintuitive. How can you have more incidents, but fewer catastrophic events? Does this mean that an airline with fewer reported incidents actually had more fatalities? Yes.

The takeaway here is that incidents are going to occur no matter what; it's just a matter of whether they're actually reported. Just because an incident isn't reported doesn't mean it didn't happen. We can't make our processes better if we are unaware of the problems that are occurring. We need to embrace blameless postmortem cultures so that we become aware of these incidents and do not eventually drift into failure. So how can we prevent this from happening? Create non-events.

Adrian spoke of dynamic non-events. It wasn't clear to me what the actual definition was, so I looked up the meaning in its entirety.

Reliability is a dynamic non-event. It is dynamic because the processes remain within acceptable limits due to moment-to-moment adjustments and compensations by the human operators. It is a non-event because safe outcomes claim little or no attention. - Karl Weick

Creating a non-event essentially means to report abnormal behavior, or at least be aware of it. It doesn't have to be catastrophic, and in all honesty it shouldn't be. If a metric is slightly incorrect or just under the minimum, it should be reported and discussed before it balloons into a much larger problem.

The example that he gave comes from a book, Drifting into Failure by Sidney Dekker. A component of an aircraft passed all safety measures, but barely. It was never reported, because everything was technically fine. The employees did everything they were supposed to do as laid out by the system, but in this case, the system was wrong. It led to a disastrous crash, killing everyone on board.

Non-events are a time for people to report and learn. No one should be afraid to report bad news, or the possibility of bad news. We should strive to actively create safety. His recommendation to create safety?

Break things on purpose!

GOTO had quite a bit of content on a concept that I wasn't completely aware of, chaos engineering. I knew it existed, but never really dove into the specifics of how it worked or how to practice it. This is another topic that I could easily dedicate a whole post to, so we'll just touch on some key takeaways.

Chaos engineering exists because production hates us. Leveraging DevOps is a step in the right direction, but unfortunately, production is a war zone. Nothing can guarantee that when you push the deploy button, everything is going to work exactly as you expect it to. In all honesty, production doesn't want your code to work. Chaos engineering is really about engineering ourselves out of chaos, not introducing it into our systems. That would just be silly.

It was noted that people actually cause the most chaos. One example that was given was a command line argument. If a certain command was passed with the letter g, everything was a-OK. However, if you were to forget that letter g, all hell would break loose. One might think this is just another case of user error, but the speaker presented it as a system error. Why is the line between success and failure so thin?

So how do we implement chaos engineering? Typically through learning loops. Chaos engineering is all about "you know what you don't know", so you should actively prod those possible failure cases. The implementation steps were broken down as follows:

  1. Prod the system
  2. Hypothesis of why it happened
  3. Communicate to your team
  4. Run experiments
  5. Analyze the results
  6. Increase the scope
  7. Automate the experiments

In the end, the goal with chaos engineering isn't to break the system intentionally. You would never want to intentionally take down production and impact your customers. If you know of a failure case and how it behaves, there really isn't much value in running a chaos experiment. Everything is already known!
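As a toy illustration of the "run experiments" step (entirely my own sketch, not from the talks), a fault-injection wrapper might fail an operation with a given probability, the way a flaky dependency would, so you can test a hypothesis about how the caller copes:

```csharp
using System;

public static class ChaosExperiment
{
    // Invokes the operation, but with the given probability injects a
    // simulated dependency timeout instead. Meant to run against a test
    // environment to verify the caller's retry/fallback behavior.
    public static void RunWithFaults(Action operation, double failureRate, Random random)
    {
        if (random.NextDouble() < failureRate)
        {
            throw new TimeoutException("Injected fault: simulated dependency timeout.");
        }

        operation();
    }
}
```

With a failure rate of zero this is a no-op wrapper; raising it gradually corresponds to the "increase the scope" step above.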

Old is the NEW new

Presentation: https://www.youtube.com/watch?v=AbgsfeGvg3E

I love Kevlin Henney as a speaker, he always does a fantastic job. I highly recommend giving the presentation itself a watch, but the CliffsNotes version is essentially that everything in computer science has already been done. Most of the new stuff we see today appeared in literature as early as the '50s. Software design principles, testing approaches, you name it.

The vast majority of the presentation was comparing new ideas to old ideas. Highlighting the fact that we've been thinking about and solving the same problems for quite some time now.

To me this is very similar to Uncle Bob's post, The Churn, which states that we as developers are so infatuated with learning all of the new and shiny frameworks and libraries that are released that we drift away from the basics. We forget core principles and never really advance because we're stuck in the churn of learning every new thing that comes out.

I agree with most of it. I couldn't imagine trying to pick up every little new thing that pops up, and frankly I just don't think it's possible.

In the end, it was a worthwhile conference. I may just have to add it to my list of yearly must GO-TO conferences...

Until next time!

Five Suggestions for Cleaner Code

Being involved in open source exposes you to a lot of different projects, individuals, coding styles... the works. It also means lots of code reviews. More reviews than I'd typically perform at work. In code reviews, I can be pretty opinionated, but I believe my opinions stem from a strong desire to ensure that the codebase is as clean and readable as possible.

With that said, these are all approaches that I would like to see all developers adopt. None of them are hard to reason about, and they should be easy to start implementing from day one.

1. Avoid Flag Arguments

What exactly do I mean by a flag argument? I'll demonstrate with an example.

public void DoWork(bool log)  
{
  Execute();

  if(log)
  {
    // write to some log
  }
}

In the code above, we have a method that does something. When that something is done, it checks to see if we should log the event or not. If the passed in value is true, we go ahead and log the event. Otherwise, we bail out of the method.

This approach to writing methods is ultimately very confusing for the consumer. The implementation details may seem clear enough, but it's hard to reason about when used in the wild.

DoWork(true); // what could true possibly mean?  

I feel a much better approach is to actually split this behavior up into two individual methods. One method for each condition.

public void DoWork()  
{
  Execute();
}
public void DoWorkWithLogging()  
{
  DoWork();
  // log stuff
}

Here, we have a method DoWork that does just that. It only does the work and does not perform any logging. We've also introduced a method called DoWorkWithLogging that just calls DoWork, but additionally performs a logging operation. Hopefully, it's obvious given the name of this method, that it will not only do work, but it will log that it did it.

I feel this approach expresses a lot more intent throughout your codebase and reduces the cyclomatic complexity. We don't have any conditionals!

2. Do One Thing

Now, we hear this a lot, right? Do one thing and do it well. The single responsibility principle. However, it can be hard to reason about what doing one thing actually means. A lot of it is subjective, but I generally tend to reason about it in terms of abstractions.

An abstraction isn't a responsibility -- the caller doesn't know anything about the implementation details. The caller simply calls the abstraction and nothing more. Let's see an example.

public int CalculateCostForSize(int size)  
{
  if(size >= LARGE)
  {
    return size * LARGE_MODIFIER;
  }

  return size * DEFAULT_MODIFIER;
}

At first glance, this method may seem completely acceptable. It's doing one thing, right? It's calculating the cost for a given size. However, there are a lot of implementation details that this method is concerning itself with.

The CalculateCostForSize method has to know what it means to be large. In this case, size is greater than or equal to LARGE. The method also has to know how to actually do the calculation. So one could argue that this method is actually doing three things.

  1. Figure out if the size is large.
  2. Calculate the cost when the size is large.
  3. Calculate the cost when the size is not large.

I feel a better approach would be:

public int CalculateCostForSize(int size)  
{
  if(SizeIsLarge(size))
  {
    return HandleLargeSize(size);
  }

  return HandleDefaultSize(size);
}

private bool SizeIsLarge(int size)  
{
  return size >= LARGE;
}

private int HandleLargeSize(int size)  
{
  return size * LARGE_MODIFIER;
}

private int HandleDefaultSize(int size)  
{
  return size * DEFAULT_MODIFIER;
}

Now the method really only knows about the algorithm used to calculate the cost. It's not aware of any implementation details; those are hidden behind abstractions.

3. Leverage Intention-Revealing Names

This is one that I see a lot and is an incredibly easy fix. Consider the following code snippet:

public void DoWork()  
{
  if(input % 2 == 0)
  {
    // when do I do this?
  }
}

Now as programmers, we probably all know what this code means. Using the modulus operator to figure out if a given number is even or not. But what if it's not clear? That's an easy fix then!

public void DoWork()  
{
  // check to see if the input is an even number
  if(input % 2 == 0)
  {
    // oh, why didn't you just say so?
  }
}

This is probably one of the things that grinds my gears the most when it comes to comments. I've ranted on comments before. But this just hurts me. Physically. Please avoid using comments to explain potentially confusing code. I humbly offer an alternative:

public void DoWork()  
{
  var inputIsEven = input % 2 == 0;
  if(inputIsEven)
  {
    // easy enough!
  }
}

Put another way, imagine instead that the example was some complicated business rule. We've all seen them..

if(input > 5 && input < 10 || (input >= 100))  

That really needs to use an intention-revealing name..

var isWithinAcceptableParameters = (input > 5 && input < 10 || (input >= 100));  
if(isWithinAcceptableParameters)  
{
}

No comments required!

4. Use Source Control.. Not Comments

I am a firm believer that code that is not required in any way to run the application should not exist in the codebase. While this includes dead code, unreachable code, etc. (which may be hard to detect at times and can sneak in), I mostly refer to code that has been commented out.

public void DoWork()  
{
  if(input > 0)
  {
  }
  /*
  * why didnt this work?
  * if(input > -1)
  * {
  * }
  */
}

It's an extreme example, but not uncommon in my travels. A more common example might look a little something like:

var config = new Config();  
config.Title = "title";  
//config.MaxSize = 100;
config.MinimumSize = 1;  

Source control is an invaluable tool. Most say that regardless of what you're coding, you should always use source control, even for the smallest of projects. There is no reason to muddle up your codebase with these types of comments when source control can retrieve the code for you at a moment's notice.
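Assuming the project is in git (and that the snippet above lives in a hypothetical Config.cs), the "retrieve it" part is only a couple of commands away:

```shell
# Find commits whose diffs add or remove the string "MaxSize",
# even if the line has since been deleted entirely (git's "pickaxe").
git log -S "MaxSize" --oneline -- Config.cs

# Inspect the file as it existed in one of those commits
# (replace abc1234 with a hash from the output above).
git show abc1234:Config.cs
```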

5. Avoid Long Parameter Lists

In the world of refactoring, there is a code smell called a data clump. I see this pretty frequently in code reviews: the developer introduces a slew of new methods that all take the same set of data, but as individual parameters.

Data clumps reveal themselves in a couple of ways. Through class members:

private int max;  
private int min;  
private DateTime date;

private string result;  

and through methods..

public string GetResult(int max, int min, int age, DateTime date)  
{
}

We even seem to instinctually clump these related datasets together without even noticing. Doesn't putting that newline between date and result just feel better?

The big problem is when we start copy-pasting these parameters into multiple locations. In order for the dataset to make sense, it requires a max, min, and date so why not encapsulate all three pieces of data into an object?

public class ResultContext  
{
  public ResultContext(int max, int min, DateTime date)
  {
    Max = max;
    Min = min;
    Date = date;
  }

  public int Max { get; }
  public int Min { get; }
  public DateTime Date { get; }
}

Now we can simply pass around the object, rather than a long list of parameters.
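For instance, the earlier four-parameter `GetResult` signature shrinks to two parameters (the `ReportService` wrapper here is hypothetical, and `age` stays separate because it isn't part of the clump):

```csharp
using System;

public class ReportService
{
    // Max, min, and date always travel together, so they arrive as one
    // ResultContext (the class defined above). Age is unrelated data
    // and remains its own parameter.
    public string GetResult(ResultContext context, int age)
    {
        return $"{context.Min}..{context.Max} on {context.Date:yyyy-MM-dd} (age {age})";
    }
}
```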

In the end, the true takeaway in all of this is.. please step away from // and /* */

A Look at Functional Refactoring

Refactoring is a common task for developers.. hopefully anyway. It keeps code cleaner, more maintainable, and reduces code duplication. It also allows for code to be extended more easily, saving us time in the long run.

xkcd source: xkcd - The General Problem (https://xkcd.com/974/)

There are many approaches we can take to refactoring, as outlined by Martin Fowler's refactoring catalog. I'd wager that one of the most common refactoring approaches is taking code that exists in multiple spots, wrapping it up in a method, and then calling that new method. This is otherwise known as Extract Method.

To give an example, we're probably all familiar with something like the following..

public void MyMethod()  
{
  using (var file = new System.IO.StreamWriter(@"C:\log.txt"))
  {
    file.WriteLine($"Called { nameof(MyMethod) }");
  }
}
public void AnotherMethod()  
{
  using (var file = new System.IO.StreamWriter(@"C:\log.txt"))
  {
    file.WriteLine($"Called { nameof(AnotherMethod) }");
  }
}

Now, there are a couple of problems with this implementation. The location where the log is stored is explicitly defined in both methods. This could be problematic in the future if we decide to change the location and/or the file name of the log file.

The other problem is that how we log is duplicated in both methods. If we ever decided to create distributed logs, we would have to go into each method that uses our logging functionality and change the StreamWriter call to a network call.

Fear not though, we can refactor this approach and make our code a lot better. Applying the Extract Method refactoring gives us:

public void MyMethod()  
{
  WriteToDefaultLog($"Called { nameof(MyMethod) }");
}

public void AnotherMethod()  
{
  WriteToDefaultLog($"Called { nameof(AnotherMethod) }");
}

public void WriteToDefaultLog(string message)  
{
  using (var file = new System.IO.StreamWriter(@"C:\log.txt"))
  {
    file.WriteLine(message);
  }
}

Now that all of the implementation details are put into a central location, and the dependent code snippets are calling the abstractions, it's a lot easier to introduce change into the system. This approach to refactoring is called Imperative Refactoring or "Inside out" refactoring.

imperative

Visualized, we take the code in orange (the code to refactor), drop it all into a new method, somewhere in isolation. We then have the code call that new method, that abstraction we just created, instead of being intimately familiar with all of the implementation details. Pretty neat.

However, there is another approach to refactoring called Functional Refactoring or "Outside in". It is useful for when you want to refactor the surrounding code. Think of try/catch blocks or using blocks.

functional

If that is not entirely clear, let's go ahead and look at some code.

Consider the following try/catch block that calls our default logger when a method throws an exception.

public void MyMethod()  
{
  try
  {
    SomeScaryStuff();
  }
  catch (Exception ex)
  {
    WriteToDefaultLog(ex);
  }
}

public void AnotherMethod()  
{
  try
  {
    LessScaryButStillPrettyScary();
  } 
  catch (Exception ex)
  {
    WriteToDefaultLog(ex);
  }
} 

So while the above may seem elegant and straightforward, we have some code duplication. We have explicitly written two try/catch blocks as well as called the same WriteToDefaultLog method. If we wanted to change that logging call, we'd have to update it in two spots.

This may not seem like such a big deal at first, but this could rapidly grow out of control if we wanted to call some logging functionality every time an error occurred in our application.

So how can we remove this duplication?

public static void Try(Action operation)  
{
  try
  {
    operation();
  }
  catch (Exception ex)
  {
    WriteToDefaultLog(ex);
  }
}

By refactoring out the surrounding try/catch block, and passing in the inside code via an Action, we're able to create a new method, Try.

In order to leverage this method, the caller would need to pass in the code that they would want executed within the try block through a lambda expression.

public void MyMethod()  
{
  Try(() =>
  {
    SomeScaryStuff();
  });
}

public void AnotherMethod()  
{
  Try(() =>
  {
    LessScaryButStillPrettyScary();
  });
}

With this approach we're able to encapsulate all of the logic that our try/catch blocks contain into a single location. Need to change the logging approach when an exception is thrown? No problem, we just need to update our Try method and we're good to go.

But wait, there's more! Consider the need to connect to a database through ADO.NET. First you need to establish a connection to your database, and then you need to create a command to execute against that connection.

using(var connection = new SqlConnection("myConnectionString"))  
{
  using(var command = new SqlCommand("dbo.myCommand", connection))
  {
    // do some work
  }
}

This is a pretty common setup, but it has a couple of drawbacks. First and foremost, it is verbose; the ceremony to call a procedure is multiple lines long. Secondly, if you ever needed to change how you retrieved your connection string, it'd be a sizable effort to update every procedure call. But there's hope! By applying some functional refactoring, we can abstract these implementation details away from the callers and make our procedure calls more succinct.

...
ExecuteCommand("dbo.myCommand", () => {  
  // do work
});
...

public void ExecuteCommand(string commandName, Action operation)  
{
  using(var connection = new SqlConnection("myConnectionString"))
  {
    using(var command = new SqlCommand(commandName, connection))
    {
      operation();
    }
  }
}

As we can see, this approach can be applied to a lot of different scenarios. Logic and implementation details are hiding everywhere, so don't always think Inside out... try Outside in!

A Best Practices Guide for Unit Testing

As more and more developers embrace the joys of unit testing, I'm starting to see a lot more tests in code reviews, which is great to see! I am, however, seeing a lot of the same mistakes pop up.

To help with this, I wanted to use this blog post as a means to showcase a document I've been working on that outlines some best practices when writing unit tests in C#.

UPDATE: This document has been added to the Microsoft docs website (https://docs.microsoft.com/en-us/dotnet/core/testing/unit-testing-best-practices)

If you would like to contribute, the GitHub repo can be found here.

Happy unit testing!

Fakes, Mocks, Stubs, and Spies.. Oh My..

As I do more and more work with the exercism.io community, I've noticed a recurring theme. There seems to be a lot of confusion around the various types of "test doubles" and what each of them actually does.

Now, if you've had some exposure to test driven development already, you may be familiar with test doubles such as fakes, stubs, spies, mocks, and dummies (did I miss any?). I feel like a lot of these terms are very similar and do not add much value, but rather just further complicate the discussion.

I prefer to simplify by classifying every test double as a Fake, a Stub, or a Mock. But what are Fakes, Mocks, and Stubs? I've taken a leaf out of Roy Osherove's book The Art of Unit Testing and defined them as he does, as I believe his definitions are the easiest to understand.

Fake - A fake is a generic term which can be used to describe either a Stub or a Mock object. Whether it is a Stub or a Mock depends on the context in which it's used. So in other words, a Fake can be a Stub or a Mock.

Mock - A mock object is a fake object in the system that decides whether or not a unit test has passed or failed. A Mock starts out as a Fake until it is asserted against.

Stub - A stub is a controllable replacement for an existing dependency (or collaborator) in the system. By using a stub, you can test your code without dealing with the dependency directly. By default, a fake starts out as a stub.

Now that we have a formal definition of each, there are a few key points I want to highlight.

When you say mock, you probably mean stub.

bride

This is probably the most common mistake I see when performing code reviews of test suites and discussing tests with other developers.

Consider the following code snippet:

var mockOrder = new MockOrder();  
var sut = new Purchase(mockOrder);

sut.ValidateOrders();

Assert.True(sut.CanBeShipped);  

This is an example of a Mock being used improperly. In this case, it is a stub: we're just passing in the Order as a means to instantiate Purchase (the system under test). The name MockOrder is also very misleading because, again, the order is not a mock.

A better approach would be

var fakeOrder = new FakeOrder();  
var sut = new Purchase(fakeOrder);

sut.ValidateOrders();

Assert.True(sut.CanBeShipped);  

By renaming the class to FakeOrder, we've made it a lot more generic: the class can be used as either a mock or a stub, whichever is better for the test case. In the above example, FakeOrder is used as a stub. We're not using the FakeOrder in any shape or form during the assert; we just passed it into the Purchase class to satisfy the requirements of the constructor.

To use it as a Mock, we could do something like this

var fakeOrder = new FakeOrder();  
var sut = new Purchase(fakeOrder);

sut.ValidateOrders();

Assert.True(fakeOrder.Validated);  

In this case, we are checking a property on the Fake (asserting against it), so in the above code snippet the fakeOrder is a Mock.
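For reference, here's a minimal sketch of what these hand-rolled classes might look like. The IOrder interface and the internals of Purchase are my assumptions for illustration; the snippets above don't show them:

```csharp
// Hypothetical interface; the snippets above never show it.
public interface IOrder
{
    void Validate();
}

// A Fake: generic enough to serve as either a Stub or a Mock.
public class FakeOrder : IOrder
{
    // Asserting against this property is what turns the Fake into a Mock.
    public bool Validated { get; private set; }

    public void Validate() => Validated = true;
}

// A guess at the system under test, just to make the example complete.
public class Purchase
{
    private readonly IOrder _order;

    public Purchase(IOrder order)
    {
        _order = order;
    }

    public bool CanBeShipped { get; private set; }

    public void ValidateOrders()
    {
        _order.Validate();
        CanBeShipped = true;
    }
}
```

Asserting on sut.CanBeShipped treats fakeOrder as a stub; asserting on fakeOrder.Validated treats it as a mock.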

It's important to get this terminology correct. If you call your stubs "mocks", other developers are going to make false assumptions about your intent.

The main thing to remember about mocks versus stubs is that mocks are just like stubs, except that you assert against the mock object, whereas you do not assert against a stub. This means that only mocks can break your tests, not stubs.

When you say mocking framework, you probably mean isolation framework

In the same vein as "stubs are not mocks" I wanted to touch a little bit on the idea of mocking frameworks.

Unfortunately, the term "mocking framework" is confusing and ultimately incorrect. Take any framework you may consider a mocking framework (Moq, NSubstitute, etc.): sure, they can mock, but they can do so much more (e.g. they're also capable of stubbing).

The goal of these frameworks is to isolate your code, so we should be calling them isolation frameworks. This confused even me for quite some time, despite having had some experience with Moq.

Consider the example

var mock = new Mock<IOrder>();  
var purchase = new Purchase(mock.Object);

Assert.True(purchase.Pending);  

Even though we called the variable mock and Moq provides a class called Mock, in this context, it's actually a stub.

So unfortunately, mock is an overloaded word. It's confusing. Avoid it if at all possible, unless you actually are referring to mocking. Explaining to a developer that you can "isolate" something from its dependencies, rather than "mock" its dependencies, feels more natural.

Key Takeaways

1) If you must roll your own class for isolation purposes, consider naming it a Fake. This gives you the freedom of using it as either a Mock or a Stub and does not give off the wrong intent.

2) Mocks are not Stubs. If you do not assert against it, it's a Stub.

3) Prefer the terminology Isolation framework. Mocking framework can be confusing.

DEVintersection 2017 Shenanigans

Well, that's a wrap! DEVintersection is over after a long week of sessions, technical difficulties, and an intense all-day workshop. I'm back in my hotel room at the MGM Grand, but we need to talk about everything that happened this week before I inevitably forget.

Docker all the things!

For starters, one thing is clear: Docker isn't going anywhere. Every speaker was leveraging Docker in some form in their sessions, even if the talk had nothing to do with it. Scott Hanselman gave one of the keynotes, Microsoft's open source journey, and he somehow managed to segue into demoing a Raspberry Pi cluster leveraging Kubernetes and Docker that hosted a .NET Core web application.

Coolness factor aside, it was pretty interesting to learn that .NET was actually "open sourced" a long time ago, in the early 2000s I believe he said. Though it was just shared as a zip file and was intended for academia. The source code was sanitized so as not to leak IP, and they refused to accept any contributions. Not that there were really any channels that would allow it. So in reality, it was more... "source opened".

So.. what exactly is DevOps?

After the keynote I attended a session called How DevOps Practices Can Make You a Better Developer, and while the subject matter was pretty straightforward, defining what CI (continuous integration), CD (continuous delivery), and the like actually are, the biggest takeaway for me was Microsoft's definition of DevOps.

"DevOps is the union of people, process, and products to enable continuous delivery of value to our end users." (Donovan Brown, Microsoft DevOps PM)

It made a lot of sense. DevOps is about so much more than just delivery. The above definition really highlights how so many companies can get DevOps "teams" wrong. It isn't a team! It's combining process and people together in order to deliver software to our users. That means that DevOps is not only delivery, but infrastructure, testing, monitoring, and so much more.

Another common theme? Microservices!

Yep, on top of containerization, microservices was another big discussion point of the conference. Which in all honesty, makes sense. The two go pretty well together.

I attended a few sessions relating to microservices, and though they were a little introductory, there were definitely some highlights I felt were worth jotting down. One of the sessions I went to was hosted by Dan Whalin, a pretty big name in the JavaScript world, and he outlined a quick list of four questions to ask yourself before taking the plunge into microservices.

  1. Are multiple teams working on different business areas of the application?
  2. Is it hard for new team members to understand the application and contribute quickly?
  3. Are there frequently changing business rules?
  4. Can subsystems within the application be changed with a minimal amount of impact?

I believe there's a lot of value in taking a step back and questioning whether microservices are right for you and your project. Like everything else in our field, there are no silver bullets. Microservices are a good solution to an array of problems, but they are not one size fits all. The additional complexity, especially in infrastructure, may not be worth the investment if you can't answer "yes" to at least some, if not all, of the above.

To top off my microservices immersion, I also attended a session on implementing a microservices messaging architecture, specifically with Azure Event Hubs. Now, the session was cut pretty short. The speaker's presentation apparently relied heavily on the Internet for some demos, which.. did not get up and running for probably a good 30 minutes. They even attempted to use their cell phone to tether a connection. A little unfortunate, but I think they made their point.

Their main points? When using messaging and events, events are facts. They happened! And, you shouldn't necessarily immediately jump into breaking up your database architecture into consistent and eventually consistent. Start with everything going to the same destination (consistent store), and if/when you need to scale, you can take an eventually consistent approach.

I promise there were other session topics..

As of late, I've been reading and writing more functional code. It's a completely different way of thinking, and there are a lot of great benefits to doing so. I saw a session entitled Functional Techniques for C# and had to give it a go.

There was a lot of defining of terms, which I am completely OK with. I seem to always forget the definitions of words, so getting refreshers on currying and higher-order functions was a nice introduction to the talk.
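For my own future reference, here's a quick sketch of those two terms in C# (illustrative, not code from the talk):

```csharp
using System;

public static class FunctionalDemo
{
    // Higher-order function: takes a function as input and returns a new
    // function that applies it twice.
    public static Func<int, int> Twice(Func<int, int> f) => x => f(f(x));

    // Currying: a two-argument addition expressed as a chain of
    // single-argument functions, which allows partial application.
    public static readonly Func<int, Func<int, int>> CurriedAdd = a => b => a + b;
}
```

Here, FunctionalDemo.Twice(x => x + 3)(1) applies the lambda twice, and FunctionalDemo.CurriedAdd(2) hands back a function that adds 2 to whatever it's given.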

The biggest takeaway I had from the session was the idea of Functional Refactoring. Now, we've all seen Imperative Refactoring. It's the type of refactoring that most of us are familiar with. It's where we pull out some code and shove it into a method, then call that method instead of the code.

Here's a quick refresher..

public void PrintResponse(int choice)  
{
  var output = string.Empty;
  switch(choice)
  {
    case 1:
      output = "Hello";
      break;
    case 2:
      output = "Goodbye";
      break;
  }
  Console.WriteLine(output);
}

Now in this .. case .. PrintResponse has two responsibilities: it has to figure out what the response is and then it has to print it. Applying some inside-out refactoring, we get the following solution:

public void PrintResponse(int choice)  
{
  var output = GetResponse(choice);
  Console.WriteLine(output);
}

private string GetResponse(int choice)  
{
  var output = string.Empty;
  switch(choice)
  {
    case 1:
      output = "Hello";
      break;
    case 2:
      output = "Goodbye";
      break;
  }

  return output;
}

Now PrintResponse only has one job, to print the response. The job of getting the response has been abstracted away into a new private method called GetResponse.

Functional refactoring is a little different, again, it's outside-in rather than inside-out as I just showed. Here's a simple example:

public void Save()  
{
  try
  {
    context.Save();
  }
  catch(Exception ex)
  {
    Logger.Log(ex);
  }
}

Applying functional refactoring..

public void Save()  
{
  Handling.TryWithLogging(() => context.Save());
}

public static void TryWithLogging(Action operation)  
{
  try
  {
    operation();
  }
  catch(Exception ex)
  {
    Logger.Log(ex);
  }
}

So as you can hopefully see, we've taken the outside code, the try/catch block, and replaced it with a TryWithLogging higher-order function. With this approach you can wrap using statements for calls to your database, or wrap try/catch blocks to automatically log on failure, instead of copy-pasting Logger.Log(ex); at the end of every catch.
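To sketch that idea, here's a higher-order function that owns a using block for any disposable resource. The names and shape here are mine, not from the session:

```csharp
using System;

public static class Disposing
{
    // Higher-order function that owns the using block: it creates the
    // disposable resource, hands it to the operation, and guarantees
    // disposal even if the operation throws.
    public static TResult WithDisposable<TDisposable, TResult>(
        Func<TDisposable> factory,
        Func<TDisposable, TResult> operation)
        where TDisposable : IDisposable
    {
        using (var resource = factory())
        {
            return operation(resource);
        }
    }
}

// A tiny disposable used purely to demonstrate the wrapper.
public class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose() => Disposed = true;
}
```

A caller like Disposing.WithDisposable(() => new SqlConnection(connectionString), conn => ...) no longer repeats the using ceremony at every call site.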

Lastly an all day grind.. with C#!

I was excited about this workshop. I have honestly never done a workshop before at a conference, so I didn't really have any idea what to expect. All I knew was that it was going to be an all-day event, in a room, with the one and only Kathleen Dollard.

There was so much information in this workshop, it might even need to be a blog post all by itself, but I wanted to highlight some of the key points that I felt stood out and that I resonate with.

The first part of the workshop really focused on teams and learning: improving yourself and others. She spoke about research showing that, within a team, the "who" didn't actually matter, that is, the team's diversity. Rather, teams performed better when everyone spoke around the same amount and there was psychological safety (no fear of belittlement during a review or just for speaking up). It's interesting to think about, as I feel most of us are conditioned to believe that diversity plays a big role in the success of a given team.

She also spoke about cognitive dissonance, which essentially means that you will take new input and associate it with your current perception. Seems obvious when it's put like that, but it makes a lot of sense and really shows why things such as code reviews are so important. We need to be challenging what we know, day to day, so we can take in new information and apply it correctly.

For example, I once had this preconceived notion a long time ago that several meant seven in mathematics. This perception caused me to get stuck on a particular mathematics problem for quite some time because I just couldn't reason about how the problem could even be solved. The question didn't make any sense in the context of several being seven. The world was much more clear to me when I was corrected. This is why code reviews are so beneficial. If our perception is wrong, and we stay in our bubbles, we will continue to be wrong.

Now, all of that really has little to do with C#, but I found it immensely helpful as a software professional. The tips and tricks were great but mostly focused on functional techniques that I discussed previously, just in greater detail.

All good things must come to an end...

And with that, the conference was over. I got to spend a good 30 minutes talking with Kathleen after everyone else had left the workshop room about her new position at Microsoft and what they were up to. Most others had to catch flights or had other arrangements. I for some reason thought the red eye was a good idea and wouldn't be leaving until 10:00 that night..

Until the next conference!

Hacktoberfest is Coming!

photo from hacktoberfest.digitalocean.com

It's that time again! For those who don't know, Hacktoberfest is put on by DigitalOcean and GitHub as a means to celebrate open source software and to get more people involved in contributing to open source projects.

Getting Involved

So how can you get involved? Well, I'm going to be a little biased and suggest the Exercism C# repository. For this event, we have gone through and added a hacktoberfest label to issues that we felt would be perfect for Hacktoberfest. If you don't feel like working on any of those issues, no problem! We're always welcoming new ideas and/or things that we've simply missed. Just create an issue, and let's have a discussion about it.

Not interested in programming? Contributions do not have to be programming related. There are quite a few open issues that relate to documentation or DevOps. Whatever fits your skill sets, we'll find something for you.

Of course, we are just one repository out of many. Exercism itself has a variety of repositories to choose from, but any project that you can find on GitHub is free game.

If you're struggling to find projects to work on, there's an application which you can find here that lists projects that are looking for assistance during Hacktoberfest.

Did I Mention Free Stuff?

That's right! If you submit four pull requests during the event, DigitalOcean will hook you up with a free T-shirt (the pull requests do not even need to be merged). As long as they were made between Oct 1 and Oct 31, they count.

So make sure you sign up at their website and start making some contributions!

Red, Green, Refactor and the Transformation Priority Premise

Recently I stumbled across a test driven development article that mentioned something I had not heard of before: a premise that Uncle Bob came up with as a means to order the priority of the transformations you should apply when practicing test driven development. He called it the Transformation Priority Premise.

I wrote a couple small programs using the premise, and really liked the concept he was trying to convey. Though in order to fully explain the premise, we should probably talk about test driven development itself.

So.. what is TDD?

Test Driven Development

TDD is a software development methodology that has three "laws" set forth by none other than Uncle Bob. They are as follows:

  1. You are not allowed to write any production code unless it is to make a failing unit test pass.
  2. You are not allowed to write any more of a unit test than is sufficient to fail, and compilation failures are failures.
  3. You are not allowed to write any more production code than is sufficient to pass the one failing unit test.

These three laws, if adhered to, force you into a cycle commonly referred to as red, green, refactor. Let's demonstrate the cycle by writing our own program. First, we'll need some requirements.

This program will return four different responses, and the response will be based on what kind of sentence is used as input.

  1. If the input ends with a period, return "Ok."
  2. If the input ends with a question mark, return "No."
  3. If the input ends with an exclamation mark, return "Quiet!"
  4. If no punctuation is found, return "What?"

This program is based on an exercism exercise called Bob, which is actually based off of another exercise, Deaf Grandma.

So where to start? The test.

Before we write any production code (the code that will ultimately end up into the compiled binary) we need to first stand up a unit test. To start, we'll need to create our System Under Test (SUT).

[TestMethod]
public void Input_has_no_punctuation_response_says_what()  
{
    var sut = new Responder();
}

And not all that surprising, the compiler is already yelling at us.

The type or namespace name 'Responder' could not be found (are you missing a using directive or an assembly reference?)  

But that's ok! We're already abiding by the first law since we started with a unit test. The compilation error is also expected; the second law states that we can't write any more of the unit test than is sufficient to fail (and compilation errors are failures).

So let's switch context a little bit and start writing some production code.

public class Responder  
{
}

We're done! The unit test compiles and passes. The third law forbids us from writing any more production code.

At this point of our development cycle we have gone through red (unit test compilation error), green (adding the Responder class to make the test pass), and now we're onto refactoring. Heh well, in this case, there's not really anything we can refactor, so we can move on.

With one cycle completed, we start from the beginning again with red. Just like last time, we need to write some more code in our test case so that it fails.

We'll want a method on the Responder that can take an input of type string, and we know from our requirements that if no punctuation is found, the result of the method is "What?"

[TestMethod]
public void Input_has_no_punctuation_response_says_what()  
{
    var _sut = new Responder();
    Assert.AreEqual("What?", _sut.Response("Hello"));
}

Now we can go ahead and compile that...

'Responder' does not contain a definition for 'Response' and no extension method 'Response' accepting a first argument of type 'Responder' could be found (are you missing a using directive or an assembly reference?)  

Another compiler error. Let's go ahead and fix that up.

We know the compiler error stems from the fact that we never implemented a Response method on the Responder class, so that's pretty easy to implement. But what do we write inside of the method body? The answer may seem a little surprising.

public string Response(string input)  
{
    return "What?";
}

That's right. A constant string value of "What?". Once again, this is because of the third law. We cannot write any more production code than is sufficient to pass the one failing unit test. It may seem a little silly at first, but bear with me, it'll hopefully make a little more sense as we continue writing our program.

Alright, so we've tested the case of no punctuation. Let's move onto a case that includes punctuation, the period. Testing for that gives us a unit test that looks like this:

[TestMethod]
public void Input_is_statement_response_says_ok()  
{
    Assert.AreEqual("Ok.", _sut.Response("Do it now."));
}

Continuing with the red, green, refactor cycle, we now have a failing test. Let's go ahead and write the bare minimum implementation.

public string Response(string input)  
{
    if(input.EndsWith("."))
    {
        return "Ok.";
    }

    return "What?";
}

Easy enough, time for another test.

[TestMethod]
public void Input_is_question_response_says_no()  
{
    Assert.AreEqual("No.", _sut.Response("Please?"));
}

Next up? You've got it. Let's make this test pass.

public string Response(string input)  
{
    if(input.EndsWith("."))
    {
        return "Ok.";
    }

    if (input.EndsWith("?"))
    {
        return "No.";
    }

    return "What?";
}

Now that we've made this test pass, we can see that there is some code duplication going on that we should probably refactor. After all, after making a test pass we are given the opportunity to refactor the code. Unfortunately, it may not always be clear how to refactor it. There is hope, however!

The Transformation Priority Premise

As stated in the introduction, The Transformation Priority Premise (TPP) is a premise that was put together as a means to prioritize the transformations that occur when getting unit tests to pass.

When you're practicing TDD you may ask: "Doesn't all code produced by using TDD just result in code that is specifically tailored to pass the tests?"

You might notice that we're starting to see that a little in our current program. As it stands right now, we have one conditional per unit test. There's really nothing to stop this trend from occurring. There is, however, another little mantra that goes with TDD that pushes developers away from this practice.

“As the tests get more specific, the code gets more generic.”

Put another way: As we add more tests to our system (become more specific), our code becomes more generic (agnostic to the input).

With this in mind, it should be a little clearer to see that our current approach may not be the best one that we can take to solve this problem. We're just introducing more and more if statements to make the tests pass. Let's take a stab at refactoring our code and get away from our potential mountain of conditionals.

To start, the root of the TPP is its list of transformations and their priority. Here is the full list:

  1. ({} -> nil) no code at all -> code that employs nil
  2. (nil -> constant)
  3. (constant -> constant+) a simple constant to a more complex constant
  4. (constant -> scalar) replacing a constant with a variable or an argument
  5. (statement -> statements) adding more unconditional statements
  6. (unconditional -> if) splitting the execution path
  7. (scalar -> array)
  8. (array -> container)
  9. (statement -> recursion)
  10. (if -> while)
  11. (expression -> function) replacing an expression with a function or algorithm
  12. (variable -> assignment) replacing the value of a variable
.. and in case you've forgotten, this is the code we're trying to refactor.

public string Response(string input)  
{
    if(input.EndsWith("."))
    {
        return "Ok.";
    }

    if (input.EndsWith("?"))
    {
        return "No.";
    }

    return "What?";
}

Now we want to refactor this in order to get rid of the duplication. We started with a single constant, "What?", which was transformation #2 (nil -> constant), and moved on to splitting the execution path, #6. It's time to consult the list and see what transformations we can make in order to clean up the if statements.

Being at #6 currently, the next logical step would be to take a look at #7, scalar to array. That could probably work, but given the context of this problem, we know it's a mapping issue. We're mapping punctuation to results. So let's take it one step further and leverage #8, array to container.

Note: The difference between an array and a container is that an array is generally going to be a primitive array (think int[], string[], etc). Whereas a container is going to be something like a List, Set, or Dictionary.
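For completeness, here's what the intermediate scalar-to-array step (#7) might have looked like before moving on to a container. This is my sketch, not code from the original walkthrough:

```csharp
public class ResponderWithArrays
{
    public string Response(string input)
    {
        // scalar -> array: the punctuation marks and their responses move
        // into parallel arrays instead of one if statement per case.
        char[] punctuation = { '.', '?' };
        string[] responses = { "Ok.", "No." };

        for (var i = 0; i < punctuation.Length; i++)
        {
            if (input.EndsWith(punctuation[i].ToString()))
            {
                return responses[i];
            }
        }

        return "What?";
    }
}
```

The array-to-container step (#8) then collapses those parallel arrays into a single Dictionary.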

Using scalar to array, and then array to container, we get a refactored method that looks like this:

public string Response(string input)  
{
    var inputResponses = new Dictionary<char, string>()
    {
        { '.', "Ok." },
        { '?', "No." }
    };

    if (inputResponses.ContainsKey(input.Last()))
    {
        return inputResponses[input.Last()];
    }

    return "What?";
}

That's pretty neat. No more repeating if statements. Recompile, ensure the tests still pass.. and they do! Now, there's only one punctuation mark that remains in our requirements, and that's the exclamation mark. We just finished refactoring, so we start again from red and introduce our last test:

[TestMethod]
public void Input_is_yelling_response_says_quiet()  
{
    Assert.AreEqual("Quiet!", _sut.Response("Woot!"));
}

Going back to our production code, it should be pretty straightforward how we can get this test to pass.

public string Response(string input)  
{
    var inputResponses = new Dictionary<char, string>()
    {
        { '.', "Ok." },
        { '?', "No." },
        { '!', "Quiet!" }
    };

    if (inputResponses.ContainsKey(input.Last()))
    {
        return inputResponses[input.Last()];
    }

    return "What?";
}

That's all there is to it! All of our tests pass, and we've met all of the requirements that we set out to meet.

The gain from leveraging the TPP is that it keeps us grounded, and forces us to continue to take baby steps when developing code. We generally do not want to take giant leaps forward. Start with repeating if statements over and over until something like a dictionary or a loop pops out at you.

If you're interested in learning more about all of the nuances of the Transformation Priority Premise, I highly recommend checking out Uncle Bob's initial blog post on the subject which can be found here.

Microsoft's June 2017 Security Rollup Woes

Let me set the scene for you.

It's the morning of June 14th, 6:30am to be exact, and we're going to make a standards deployment. You know how I know? Because it's Wednesday. And Wednesday morning is the morning that we usually make standards deployments. Monday is just too early in the week. Tuesday, well, we just don't like Tuesdays, but Wednesday morning we make sweet, standards deployments.

Business Time

On the docket that morning were some changes that impacted how our dialog boxes behaved as well as printing throughout the entire system.

All of the scheduled changes had already gone through internal and automated testing. The necessary test records were applied to the ticket, we were ready to rock and roll.

My team and I pushed the deployment button, and not 15 minutes later, we received a call from one of our customer care representatives that printing was failing for a single customer. Then another, and another. To play it safe, we decided to simply roll back the change and ask questions later. Unfortunately.. it did not fix the problem.

The Problem

We continued to investigate potential causes and eventually realized that the issue only occurred on Windows machines that had consumed the latest Windows updates, specifically the June 2017 security rollup.

After spinning up a VM with Windows 7 and the latest updates, we're in business. We notice that printing from an iframe results in a blank page. Fiddler and the developer tools reveal it's actually an HTTP status code of 404.

See, when printing an iframe (or from within an iframe), IE apparently takes the contents of the iframe and downloads it locally. That local copy is then used as the source for the iframe, and printing happens from that copy. After this security change, however, IE started requesting the locally generated file from the web server instead.

For example

<iframe src="myframe.html" />  

Became

<iframe src="ABC123.html" />  

Where ABC123.html is just some auto-generated name for the locally downloaded iframe. The problem with this is that obviously ABC123.html does not exist on our web server. So the iframe was loading a page that did not exist, and thus, printing a blank page.

The Solution

Luckily the community banded together. It gave us confidence that this was not a localized issue and that other 3rd parties were experiencing the same pains. Progress!

Everyone in the community was pretty sold on the same workaround: instead of using window.print(), use document.execCommand('print'). So we decided to implement this change in the core areas of the system that print from within an iframe (which covers about 99% of the system).

We deployed the change to our quality environment, where it was tested by our internal teams and customers alike. Everything passed testing, everyone seemed pretty happy with the results, and we pushed the change to our production environment the following week.

Everything is going smoothly, we're back up and running! Except for one area of the system... our check printing module.

Now, checks unfortunately fell within that 1%, the part of the system that wasn't covered by our change. We immediately thought it wasn't a big deal; we could just adjust how checks are printed using the same method. How wrong we were..

Checks are a special flower within our system and have their own design, completely separate from how other areas of the system (reports, labels, etc) print. I won't get into all of the technical details of how they were implemented, but it essentially meant that the only way to resolve the issue was to fix the underlying HTML.

After a couple days of tweaking HTML and CSS pixel by pixel, comparing old checks to the new implementation, everything looked good for launch. The finish line was right in front of us.

Conclusion

But of course, the morning of the final deployment.. Microsoft released a hotfix.

It's honestly great news. That's the best outcome we could have hoped for. A timely fix from the original offender. It just could've come a little sooner...

That's the game you have to play though. Software that you depend on breaks, which in turn, breaks your customers. You only have a couple options. You can do nothing and hope that the offender will release a hotfix in a timely manner, or you can make strides to develop a workaround.

We immediately took the latter approach, as it's the safest and quickest way to get our customers back up and running. Who knows when Microsoft would've released a fix. It could have never come for all we know.

As they say.. better late than never.

Coding Standards Should Not Matter

Yeah that's right. That behemoth of a document that you call your standards document just does not matter. Now that's a pretty bold statement, so let me explain.

I believe that a standards document should be about how to style code. It should be about where to place curly braces, it should answer the great spaces vs. tabs debate (even though there is obviously only ever one clear choice), and similar concerns. What a standards document should not contain is substance: best practices. So what is a standard? A best practice?

It's very likely that the definitions of a standard and a best practice vary from individual to individual, so to get everyone on the same page, here is how I define them:

  • Standard: A rule put in place by an authority for the purpose of consistency.
  • Best Practice: A proven approach, generally the best way to solve a problem.

The key word in the standards definition is consistency. It should not matter at all how the code is formatted, so long as the team agrees to an approach and follows it, ensuring that the coding style of the repository remains consistent.

Consistency is extremely important. Code should look like a team wrote it, not any one individual. Everyone on the team is going to be sharing the same code base, and the style should not vary from developer to developer. You definitely do not want a developer's first order of business when starting a new task to be reformatting the code to their own personal preference.

A best practice is more along the lines of "do not wrap your entire class in a try-catch and swallow the exception to avoid generating errors" or "keep your collection setters private." While just as important as a standard, if not more so, these types of statements should not exist within a standards document.

So why shouldn't a standards document contain both?

First and foremost, they are different concerns. We generally like to split up concerns into easily digestible bits and not combine them. It is not going to matter all that much if your team puts curly braces on the same line as your conditionals. It is going to matter if you start piling all of your concerns into a single class.

However, from my experience, a standards document is rarely just standards. A standards document will generally combine standards and best practices.

Though I do not believe the solution is to split these two concerns into different documents. No, I believe a standards document should not exist. Emphasis on document. We should have standards; unfortunately, the medium we generally choose to present them is the incorrect one.

So what's the better approach? Let your code be your standards document.

I believe that coding standards are better enforced through tooling and the code base itself. Tools such as ReSharper and StyleCop allow teams to define their standards and then have them automatically enforced through the development environment. No need to spend time writing out paragraphs in English, comparing and contrasting different styling approaches (Do vs. Don't). No need to worry about document formatting. Simply define your rules and push them out to every team that should consume the agreed-upon style.
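For teams on more recent .NET tooling, the same idea also works with an .editorconfig file checked into the repository root; here is a minimal sketch (the handful of rules shown are just examples, not a recommended ruleset):

```ini
# .editorconfig at the repository root; picked up by the IDE and `dotnet format`
root = true

[*.cs]
indent_style = space
indent_size = 4
# always put opening braces on their own line
csharp_new_line_before_open_brace = all
```

Because the rules live next to the code, they travel with every clone of the repository and are enforced on every save, no document required.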

Like the old saying goes, the best standards document is no standards document at all.

The Rules Pattern: How to Drop Your Guard

In your travels as a programmer, you will more than likely come across a body of code that looks a little something like the following:

public bool CheckSystem(Computer computer)  
{
    if (computer.Ghz < 3)
    {
        return false;
    }

    if (computer.Ram < 4)
    {
        return false;
    }

    if (computer.DiskSpace < 10)
    {
        return false;
    }

    return true;
}

Here we have a method called CheckSystem which tries to validate whether or not a model of a computer meets all of the system requirements. If the model fails to meet all of the specified requirements, the method will return false. It attempts to validate the model by using a number of different conditionals, one after the other. These types of conditionals are called Guard Clauses.

For a method that may only have a few conditions to check, guard clauses are completely acceptable. There's no need to over complicate things. However, if you find yourself with a large number of conditions to validate and/or you feel that the conditions will change over time, you may want to consider an alternative approach. Enter the Rules Pattern.

The Rules pattern is not a pattern that you'll see in the design patterns book by the Gang of Four, but could be considered an implementation of the Command pattern.

For the purpose of this blog post, we'll be implementing a set of rules to check if a computer meets all of the minimum requirements to run a given piece of software. Here's how it works.

First, create an interface that all of your rules will implement.

public interface ISystemRequirementsRule  
{
    bool CheckRequirements(Computer computer);
}

The interface will only expose one method which will be used to validate the condition for the rule.

You will then need to create all of your rules. As stated previously, each rule implements the same interface, and each rule will replace one of your guard clauses.

public class DiskSpaceRule : ISystemRequirementsRule  
{
    public bool CheckRequirements(Computer computer)
    {
        // mirrors the original guard clause: fail when disk space is below 10
        return computer.DiskSpace >= 10;
    }
}

This rule validates that the disk space of the computer is at least 10. Ten what? You can associate whatever unit you want; it doesn't matter in this case! Create as many of these rules as required to ensure all of your requirements are checked.

Next, you'll need to create a class whose responsibility is to run through and validate every rule that you have created. There are a couple approaches that I would recommend.

public class SystemRequirementsChecker  
{
    private readonly List<ISystemRequirementsRule> _rules = new List<ISystemRequirementsRule>();

    public SystemRequirementsChecker()
    {
        _rules.Add(new DiskSpaceRule());
        _rules.Add(new RamRule());
    }

    public bool CheckSystem(Computer computer)
    {
        foreach (var rule in _rules)
        {
            if (!rule.CheckRequirements(computer))
            {
                return false;
            }
        }

        return true;
    }
}

This approach simply holds all of the available rules in a collection. When you want to validate all of your rules, simply call the CheckSystem method. This method will then iterate through all of the rules that you have defined in the constructor.

If you want an approach that fully embraces the Open/Closed Principle, then I would recommend something similar to the following:

public class SystemRequirementsChecker  
{
    private readonly IEnumerable<ISystemRequirementsRule> _rules;

    public SystemRequirementsChecker()
    {
        _rules = GetRules();
    }

    public bool CheckSystem(Computer computer)
    {
        return _rules.All(r => r.CheckRequirements(computer));
    }

    private IEnumerable<ISystemRequirementsRule> GetRules()
    {
        // requires using System.Linq; and using System.Reflection;
        var currentAssembly = GetType().GetTypeInfo().Assembly;
        var requirementRules = currentAssembly
                .DefinedTypes
                .Where(type => type.IsClass && !type.IsAbstract)
                .Where(type => type.ImplementedInterfaces.Any(i => i == typeof(ISystemRequirementsRule)))
                .Select(type => (ISystemRequirementsRule)Activator.CreateInstance(type))
                .ToList();

        return requirementRules;
    }
}

This approach uses reflection to find all of the concrete classes that implement the ISystemRequirementsRule interface (that is, all of the rules that you will have written). It then instantiates each of them via CreateInstance and returns them as a list so that CheckSystem can iterate through and validate each and every rule.

The reflection approach allows you to create a new rule with the expected interface, rebuild the application, and that's it. Your new rule will be enforced inside of the SystemRequirementsChecker. No need to touch any of the pre-existing source code!

The former approach, which instantiates each rule inside the constructor, would require you to create your new rule class and then also modify the SystemRequirementsChecker before it knew of the rule's existence.

I'd consider either approach acceptable; it depends on your situation.
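Whichever variant you choose, the calling code stays the same. A quick sketch of usage, assuming the bool-returning CheckSystem and a simple Computer model with Ghz, Ram, and DiskSpace properties as in the opening example:

```csharp
var computer = new Computer { Ghz = 3.2, Ram = 8, DiskSpace = 256 };

var checker = new SystemRequirementsChecker();
bool meetsRequirements = checker.CheckSystem(computer);

Console.WriteLine(meetsRequirements
    ? "System meets the minimum requirements."
    : "System does not meet the minimum requirements.");
```

The caller never sees the individual rules; swapping rules in or out never ripples beyond the checker.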

So there you have it, the Rules Pattern. A useful pattern that you typically don't run across when studying design patterns. Hope it helps!

Who Moved My Cheese?!

Earlier this morning, I was reminded of an incident that I had with a customer a couple years ago. It went a little something like this:

I was working on a web application. It consisted of a tabular grid that displayed information based on search criteria. Very similar to anything you would see on any sort of e-commerce website that allowed you to filter products by price range, name, etc. The customer wanted to enhance this specific grid. They wanted a new filter that would give them the ability to hide all records that were considered inactive. A fairly reasonable request.

Okay great, requirements established. I began on my merry way.

Now, this sort of request is relatively trivial in our environment. We need to add a new UI control to the filter section and then wire it up to the backing stored procedure. The whole process probably took around two hours from opening the development environment to pushing it out into our production environment. Once deployed to production, we inform the customer that their feature request is now live, they sign off on the change, and are as happy as could be.


...I receive a notification that the customer is having difficulty filtering results within the grid, because a filter that used to be there no longer is.

Now, this struck me as odd, because I had actually added a filter. So I opened up the application myself to see if I could validate the customer's claim. I was not able to do so. Everything looked exactly as I would expect, and nothing had changed from when the customer previously signed off on it.

In an effort to get the potential miscommunication resolved swiftly, I hopped on a conference call with the customer to see if we could get to the bottom of this. We had already done some back and forth on the ticket itself, so the customer (let's call him Mark) and I had already established a little bit of a rapport. The conversation went something along the lines of:

John: Hey Mark, it's John. So show me the screen you're looking at. I'm seeing the filter on my end.
Mark: See here? The filter isn't on the top left.
John: Yeah we added a new filter yesterday, so the filter that you're looking for is right next to it.
Mark: Oh, I see it now. Yeah, that's not going to work. We would have to retrain all of our employees.
John: Why is that?
Mark: Our documentation on how to use this screen in our facility includes screenshots of the filters. It explicitly states which filters to use and where they are on the screen. We would have to print out entirely new documentation for our facility and re-train all of our employees.

This immediately got me thinking about a story I had read some time ago called Who Moved My Cheese? I'll let you read the Wikipedia entry if you are so inclined, but it essentially speaks to how people have a hard time adjusting to change. Now, for me, this came out of nowhere. Never would I have guessed that the customer had such rigid documentation of how to use our software.

The software that I develop and maintain changes many times per day. Sometimes we push over a hundred changes to the system in a given day. It's fluid, changing constantly. So to think that a user would assume our layout had no potential for change, and would revolve their whole training philosophy around something that could change at any time, is a little boggling. On top of all of this, it was the customer who requested the change, just a different department.

I ended up getting the situation resolved by getting the other department involved and explaining the situation. The change was going to stick, there really wasn't a way around it.

So here we had a feature request directly from the customer, the change approved in our quality environment, and then verified on our production environment. Only to be followed up with an urgent message from the same customer stating that their workforce was slowed because the training was no longer valid after the change was made.

The whole situation really made me take a step back and realize that you can never be completely confident in any change you make to an existing system.

A Comment on Comments

Full disclosure, I really don't like comments.

Like any feature of a language, I believe they have their purpose. After all, they were put in the language in the first place. I've just seen a lot of incorrect uses of comments, and the more I run into them, the more I feel like they don't add a whole lot of value.

Let's dive into some of the reasons why.

Comments bloat classes

I see auto generated comments everywhere. They plague a large portion of our code base. They have good intentions, but often end up making the file larger and harder to understand. Especially when the developer pays no mind to the content that was generated.

public class Cube  
{
    /// <summary>
    /// The height of the cube.
    /// </summary>
    public int Height { get; set; }

    /// <summary>
    /// The width of the cube.
    /// </summary>
    public int Width { get; set; }

    /// <summary>
    /// The depth of the cube.
    /// </summary>
    public int Depth { get; set; }
}

We've turned a simple model of a cube, which could be a few lines long, into a bloated model with more lines of comments than of actual useful code. Everything that the comments describe is self-evident. The comment headers just aren't needed. In all honesty, if I see something similar to the above, I just go ahead and delete the comments.

Comments are code duplication

One of my favorite light bulb moments. We as developers like to avoid code duplication; reuse is king. If you take a step back and think about comments in this regard, that's exactly what they are.

Code explains what is going to be done. Comments explain what is going to be done. They just do so in different languages. Just like with code duplication, if you change a line of code, you best remember to also change the comment, lest you be left with:

/// ...
/// <summary>Returns true when the test succeeds; false otherwise.</summary>
public TestResult RunTest()  
{

You can probably guess that this method used to return a bool, and it was decided that the returned result needed to be a little more verbose. So the developer went ahead and created a TestResult object to store some more information about the result of the test. However, they forgot to update the comment, and now it's out of sync.

Comments can be a sign of a bigger problem

Even if you believe you are using comments as intended, to explain potentially confusing code, a better solution is to just make the code less confusing.

// Before: the intent is buried in a comment
if (!(x > 10 && y < 5 && z == -1)) // ensure the object is not out of bounds  

// After: the intent lives in a well-named variable
bool isOutOfBounds = x > 10 && y < 5 && z == -1;  
if (!isOutOfBounds) {  

You could even go a step further and assign each magic number its own const variable, so the reader isn't left guessing what exactly 10, 5, and -1 even mean.
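That refactoring might look something like the following (the names are illustrative, not from the original code):

```csharp
const int MaxX = 10;
const int MinY = 5;
const int BoundaryZ = -1;

// the condition now reads like a sentence, no comment required
bool isOutOfBounds = x > MaxX && y < MinY && z == BoundaryZ;
if (!isOutOfBounds)
{
    // safe to proceed
}
```
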

Finally.. I've seen my fair share of auto generated comments, and you know what?

Auto generated comments make me cry

/// <summary>
/// Processes the async.
/// </summary>
public async Task<Result> ProcessAsync()  

Oh.. I had no idea.

Anyway, after all this comment bashing, let's talk about a couple of cases that I believe to be strong candidates for comment use.

Public APIs are a great use of comments. You are going to have a large body of people consuming your code, calling your methods. It can be greatly helpful if everything is well documented, explaining exactly how everything is supposed to work.

Another valid use for comments is explaining why code is present. I like to think that code explains how it works, where comments explain why it works. Code can only make so much sense, regardless of how you name your variables, how well you format it, etc. Sometimes, there exists a scenario that just needs a little more explanation.

/* Reflection may seem like an odd choice here, but after testing the use case
* scenarios it proved to be much faster than splitting everything up into database
* calls */

Like I said in the beginning. Comments were put into languages for a reason, they have their place. Please, just use them sparingly.

IIS Debugging Roulette

It's a typical day in the office. You've spent your day writing top notch unit tests and elegant code. You sit back, marvel at what you have accomplished and decide, being that good developer that you are, that maybe you should do some functional testing.

You navigate to the Debug pane, click your trusty "Attach to Process" and this familiar window pops up:

"Here we go again" you say to yourself, dreading what is to come next. You click on the first w3wp.exe process and attach. Nope.. doesn't look like that worked. You repeat this until you finally find the correct process to attach to so you can test your code. You hate having to do this every time you want to debug, but you accept it as a necessary evil of debugging an IIS process. It's just how it has to be done, right?

IIS Manager to the Rescue

You may be relieved to hear that this isn't how it has to be; there is hope! If you navigate to IIS Manager, you should see an option called Worker Processes underneath the IIS section.

After choosing this option, you will be presented with the Worker Process feature, which describes all of the w3wp.exe processes in greater detail than the debugger window. It is here that you can confidently link the application pool that you're trying to attach to with the correct process identifier. No more gambling!

Keep Your Collection Setters Private

When exposing properties of a class, you may find yourself immediately exposing a public getter and a public setter. This is a very common approach, especially in anemic models. However, this approach can be problematic when applied to collections. Allowing consumers of your collection to freely modify the entire collection is very destructive. In this post, I will go over two scenarios in which a public setter on your collection can get you into trouble.

You cannot iterate over a null collection

Let's compare two approaches to emptying a collection. One approach that we could take is to call the .Clear() method like so:

myCollection.Clear();  

Another approach, assuming we had a public setter, could be done by setting the collection to null:

myCollection = null;  

This is the first problem area of allowing public setters on your collections. By allowing your consumers to directly replace the collection, it's no longer possible to control how they will interact with it. Setting the entire collection to null, for example, can potentially cause problems later in your program's life cycle.

Iterating over your collection is one of these pain points. The following code will behave differently if the collection is set to null, or if it is simply empty.

foreach (var item in collection)  
{
    // do stuff with collection items
}

If the collection was set to null, you will run into a lovely NullReferenceException, like the one below.

An unhandled exception of type 'System.NullReferenceException' occurred in YourProgramHere.exe

This is because all collections expose a GetEnumerator() method, which foreach leverages to iterate over your collection. If the collection is null, its GetEnumerator() method cannot be called. However, if the collection was simply emptied via .Clear(), the runtime error would not occur.
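To see why, it helps to know that the compiler expands foreach into (roughly) the following shape, which makes the failure point obvious:

```csharp
// a rough sketch of what the compiler generates for a foreach loop
var enumerator = collection.GetEnumerator(); // NullReferenceException here when collection is null
while (enumerator.MoveNext())
{
    var item = enumerator.Current;
    // do stuff with item
}
```

An empty collection simply means MoveNext() returns false on the first call, so the loop body never runs and no exception is thrown.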

Events will not be triggered

Another problem that can arise is that it can be very hard, if not impossible, to know when your collection has changed. Consider the following MyCollection class.

public class MyCollection  
{
    public ObservableCollection<int> Numbers { get; set; }

    public MyCollection()
    {
        Numbers = new ObservableCollection<int>();
        Numbers.CollectionChanged += collectionChanged;
    }

    private void collectionChanged(object sender, NotifyCollectionChangedEventArgs args)
    {
        Console.WriteLine("Hey! Listen!");
    }
}

This class defines a collection that will output some text when its contents change. Now, while this example only wires up one subscriber, it is entirely possible for a collection like this to have many subscribers that wish to be notified when its contents change.

Using this collection, the only time the collectionChanged event will fire is when you leverage its .Add(), .Remove(), etc. methods. Setting the collection to null will indeed wipe out its contents, but its subscribers will be unaware that anything has happened.

In summary, when you expose a public setter on a collection, you are exposing much more behavior than you need to. It gives a lot of unnecessary control to your consumers, which could potentially put your collection into an undesirable state.
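One way to close off both problem areas is to keep the backing list private and expose only a read-only view. A minimal sketch (the Order/Items names here are hypothetical):

```csharp
using System.Collections.Generic;

public class Order
{
    private readonly List<string> _items = new List<string>();

    // consumers can enumerate the items but can never assign the collection,
    // so it can never become null out from under us
    public IReadOnlyCollection<string> Items => _items;

    public void AddItem(string item) => _items.Add(item);
    public void ClearItems() => _items.Clear();
}
```

Because every mutation funnels through the class's own methods, you also get a natural place to raise change notifications or enforce invariants.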

Prefer readonly to const

I was recently reading Effective C# by Bill Wagner. In the book, the author makes the statement that we should prefer readonly to constants. I wasn't immediately able to piece together why we should, I mean, what's even the difference between the two? To answer this, let's first discuss how they're the same.

Both are static

Now, obviously static readonly is static because it is decorated with the static keyword. One thing you may have overlooked, however, is that const is implicitly static as well.

Static, put simply, means that the member it decorates belongs to the class itself rather than to any object instance. A static member must be referenced through the type, not through an instance. For example, the following code would not compile:

public class TheClass  
{
    public static void TotallyStaticMethod() { ... }
}
var someClass = new TheClass();  
someClass.TotallyStaticMethod(); // cannot be accessed with an instance reference;  

The above would generate a compiler error, specifically: Member 'TheClass.TotallyStaticMethod()' cannot be accessed with an instance reference; qualify it with a type name instead.

Both are immutable

Once declared readonly or const, the value cannot change. Now, there is a slight difference as to when the values become truly immutable. The value of a const must be initialized when it is declared. No sooner (though that'd be impressive) and no later.

On the other hand, a readonly field may be initialized at its declaration or in the constructor of the class in which it was declared. This is useful for facilitating dependency injection through the constructor, or for values read from configuration.
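A small sketch of the difference (the RetryPolicy class and its configuration parameter are hypothetical):

```csharp
public class RetryPolicy
{
    // const: must be assigned right here, at declaration
    public const int DefaultMaxAttempts = 3;

    // readonly: may be assigned at declaration or in a constructor
    private readonly int _maxAttempts;

    public RetryPolicy(int maxAttemptsFromConfig)
    {
        // a value only known at runtime, e.g. read from configuration
        _maxAttempts = maxAttemptsFromConfig;
    }

    public int MaxAttempts => _maxAttempts;
}
```
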

The biggest difference between the two?

Const is evaluated at compile time

What does this mean exactly? When you set the value of a constant, the compiler will actually take your variable assignment and bake it directly into the IL.

To see this in action, let's say we have the following code snippet.

namespace AnotherAssembly  
{
    public class Library
    {
        public static readonly string ReadOnlyValue = "first readonly";
        public const string ConstValue = "first const"; 
    }
}

If you were to compile the above code, reference the DLL in another project, and inspect the DLL with ILSPY (or your disassembler of choice) you would see the following output:

This clearly shows that our readonly variable is accessed by reference: the IL loads the ReadOnlyValue field from the Library class, which lives in AnotherAssembly, at runtime. The const variable, on the other hand, is loaded directly with the literal value "first const".

This means that if you were to change the value of a const in an assembly, the assemblies that depend on that const will have the old value until they are rebuilt. This could cause a lot of headaches down the road, which is why it's always best to reserve const for values that you know will never change.
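In consumer code, the difference looks roughly like this (the IL mnemonics in the comments are an approximation of what a disassembler shows, not exact output):

```csharp
// compiled into the *consuming* assembly:
string a = Library.ReadOnlyValue; // IL: ldsfld - the field is read at runtime, so a rebuilt Library flows through
string b = Library.ConstValue;    // IL: ldstr "first const" - the literal is baked into this assembly at compile time
```

So if Library ships a new ConstValue, `b` keeps showing "first const" until the consumer itself is recompiled, while `a` picks up the new value immediately.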

So while it is true that const will be slightly more performant than readonly, it's going to be a negligible amount. Always keep in mind that premature optimization is the root of all evil!

Summary

  1. const is evaluated at compile time, readonly is evaluated at runtime.
  2. Prefer const for values that you know will never change, for any reason.
  3. When in doubt, use readonly (the performance gain is negligible).

Design Patterns are Solutions

I believe there comes a point in every budding software developer's career when they decide that they should learn design patterns. After all, there is more to developing than copying and pasting code from Stack Overflow, right? They want to be able to engineer their own solutions.

However, the majority seem to fall into a similar trap. They read book after book: Design Patterns Explained, Head First Design Patterns, and let's not forget what is considered the holy grail of design patterns, the Gang of Four.

They may then move on to routinely performing katas, such as the Greed Kata, to fully memorize and perfect their Strategy pattern implementation. Blissfully awaiting the time when they can actually utilize all of the knowledge they have gained.

Unfortunately, the developer will end up trying to fit the [insert design pattern here] design pattern into every problem, where it probably does not belong. It's a shiny new tool, and they just want to use it everywhere. This is a textbook case of starting with a solution and trying to find a problem, which should sound pretty backwards!

You could think of reading about design patterns as reading the answer key for a test. The answer key is going to give specific questions with specific answers. You will probably do really well on that test, but you will more than likely leave not knowing why those were the answers. After all, you didn't put any work into actually solving the problem. You saw a pattern, you knew the answer, and you implemented it without a second thought. What happens on the next test that doesn't include an answer key, but the context is relatively the same?

Design patterns are solutions. To appreciate a solution, you must first suffer from the problem. The issue with diving into patterns head first is that you will never truly understand why the solution exists in the first place. The importance of actually experiencing the pains first hand is almost completely ignored.

The point that I want to make most clear is that you should not try to find solutions to problems that you don't have.

Now, I'm obviously not saying that reading about design patterns is completely taboo. There are a lot of things that can be learned from reading. When learning design patterns however, I believe that the best way is to simply write code and make mistakes.

Go ahead and copy and paste your switch statement throughout your code base, what does it really matter? You'll soon find out when you want to introduce a new case in your statement.
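To make that pain concrete, here is an illustrative sketch (the shape/area example is mine, not from any of the books above): the same switch tends to get copied anywhere shapes are handled, so introducing a new case means hunting down every copy.

```csharp
using System;

public static class AreaCalculator
{
    public static double Area(string shape, double size)
    {
        switch (shape)
        {
            case "circle": return Math.PI * size * size;
            case "square": return size * size;
            default: throw new ArgumentException("Unknown shape: " + shape);
        }
    }
}

// Elsewhere, a PerimeterCalculator repeats the identical switch. Adding
// "triangle" now means editing every copy, and any copy you forget becomes
// a bug. Feeling that pain first hand is what makes the Strategy pattern
// finally click as a solution rather than a ritual.
```
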