work

2 posts

Microsoft's June 2017 Security Rollup Woes

Let me set the scene for you.

It's the morning of June 14th, 6:30am to be exact, and we're going to make a standards deployment. You know how I know? Because it's Wednesday. And Wednesday morning is the morning that we usually make standards deployments. Monday is just too early in the week. Tuesday, well, we just don't like Tuesdays, but Wednesday morning we make sweet, standards deployments.

Business Time

On the docket that morning were some changes that impacted how our dialog boxes behaved as well as printing throughout the entire system.

All of the scheduled changes had already gone through internal and automated testing. The necessary test records were applied to the ticket, we were ready to rock and roll.

My team and I pushed the deployment button, and not 15 minutes later, we received a call from one of our customer care representatives that printing was failing for a single customer. Then another, and another. To play it safe, we decided to simply roll back the change and ask questions later. Unfortunately.. it did not fix the problem.

The Problem

We continue to investigate potential causes and eventually made the realization that the issue only occurred on Windows machines that had consumed the latest Windows updates. Specifically the June 2017 security roll up.

After spinning up a VM with Windows 7 and the latest updates, we're in business. We notice that printing from an iframe results in a blank page. Fiddler and the developer tools reveal it's actually an HTTP status code of 404.

See, when printing an iframe (or from within an iframe) IE apparently takes the contents of the iframe and downloads it locally. This downloaded iframe is then used as the source for the iframe and then it is printed from that copy. However, it appears that this security change was now referencing the iframe on the web server.

For example

<iframe src="myframe.html" />  

Became

<iframe src="ABC123.html" />  

Where ABC123.html is just some auto-generated name for the locally downloaded iframe. The problem with this is that obviously ABC123.html does not exist on our web server. So the iframe was loading a page that did not exist, and thus, printing a blank page.

The Solution

Luckily the community banded together. It gave us confidence that this was not a localized issue and that other 3rd parties were experiencing the same pains. Progress!

Everyone in the community was pretty sold on the same workaround. Instead of using window.print() use document.execCommand('print'). So we decided to implement this change in the core areas of the system that print from within an iframe (which about covers 99% of the system).

We deploy the change to our quality environment where its tested by our internal teams and customers a like. Everything passes testing, everyone seems pretty happy with the results, and we push the change to our Production environment the following week.

Everything is going smoothly, we're back up and running! Except for one area of the system... our check printing module.

Now, checks unfortunately fell within that 1%. The part of the system that wasn't covered by our change. Which we immediately thought wasn't a big deal, we can just adjust how checks are printed using the same method. How wrong we were..

Checks are a special flower within our system and have their own design. Completely separate from how other areas of the system (reports, labels, etc) print. I won't get into all of the technical details of how they were implemented, but it essentially meant that the only way to resolve the issue was to fix the underlying HTML.

After a couple days of tweaking HTML and CSS, pixel by pixel. Comparing old checks to the new implementation, everything looked good for launch. The finish line was right in front of us.

Conclusion

But of course, the morning of the final deployment.. Microsoft released a hotfix.

It's honestly great news. That's the best outcome we could have hoped for. A timely fix from the original offender. It just could've come a little sooner...

That's the game you have to play though. Software that you depend on breaks, which in turn, breaks your customers. You only have a couple options. You can do nothing and hope that the offender will release a hotfix in a timely manner, or you can make strides to develop a workaround.

We immediately took the latter approach, as it's the safest and quickest way to get our customers back up and running. Who knows when Microsoft would've released a fix. It could have never come for all we know.

As they say.. better late than never.

Who Moved My Cheese?!

Earlier this morning, I was reminded of an incident that I had with a customer a couple years ago. It went a little something like this:

I was working on a web application. It consisted of a tabular grid that displayed information based on search criteria. Very similar to anything you would see on any sort of e-commerce website that allowed you to filter products by price range, name, etc. The customer wanted to enhance this specific grid. They wanted a new filter that would give them the ability to hide all records that were considered inactive. A fairly reasonable request.

Okay great, requirements established. I began on my merry way.

Now, this sort of request is relatively trivial in our environment. We need to add a new UI control to the filter section and then wire it up to the backing stored procedure. The whole process probably took around two hours from opening the development environment to pushing it out into our production environment. Once deployed to production, we inform the customer that their feature request is now live, they sign off on the change, and are as happy as could be.

Alt text

...I receive a notification that the customer is having difficulty filtering results within the grid, because a filter that used to be there no longer is.

Now, this struck me as odd, because I actually added a filter. So I opened up the application myself to see if I could validate the customers claim. I was not able to do so. Everything looked exactly as I would expect, and nothing changed from when the customer had previously signed off on it.

In an effort to get the potential miscommunication resolved swiftly, I hopped on a conference call with the customer to see if we could get to the bottom of this. We had already done some back and forth on the ticket itself, so the customer (lets call him Mark) and I had already established a little bit of a rapport. The conversation went something along the lines of:

John: Hey Mark, It's John. So show me the screen you're looking at. I'm seeing the filter on me end.
Mark: See here? The filter isn't on the top left.
John: Yeah we added a new filter yesterday, so the filter that you're looking for is right next to it.
Mark: Oh, I see it now. Yeah, that's not going to work. We would have to retrain all of our employees.
John: Why is that?
Mark: Our documentation on how to use this screen in our facility includes screenshots of the filters. It explicitly states which filters to use and where they are on the screen. We would have to print out entirely new documentation for our facility and re-train all of our employees.

This immediately got me thinking about a story I had read some time ago called Who Moved My Cheese? I'll let you read the Wikipedia entry if you are so inclined, but it essentially speaks to how people have a hard time adjusting to change. Now, for me, this came out of nowhere. Never would I have guessed that the customer had such rigid documentation of how to use our software.

The software that I develop and maintain changes many times per days. Sometimes we push over a hundred changes to the system in a given day. It's fluid, changing constantly. So to think that a user would assume our layout had no potential for change and would revolve their whole training philosophy around something that had the potential to change is a little boggling. On top of all of this, it was the customer who requested the change, just a different department.

I ended up getting the situation resolved by getting the other department involved and explaining the situation. The change was going to stick, there really wasn't a way around it.

So here we had a feature request directly from the customer, the change approved in our quality environment, and then verified on our production environment. Only to be followed up with an urgent message from the same customer stating that their workforce was slowed because the training was no longer valid after the change was made.

The whole situation really made me take a step back and realize that you can never be completely confident in any change you make to an existing system.