Trustwave Blog

When Patching Goes Wrong: Lessons Learned from The CrowdStrike Incident

Written by | Aug 16, 2024

Patches are a way of life for any network administrator and are the most efficient way to ensure systems run the most up-to-date and secure versions of their software.

For the most part, updates take place behind the scenes. The average person notices a patch only when asked to reboot the machine to finish installing it.

However, on July 19, CrowdStrike pushed out what organizations quickly discovered was a buggy patch for the Windows version of CrowdStrike Falcon. It crashed an estimated 8.5 million Windows computers worldwide, leaving them stuck on the infamous Blue Screen of Death. The crashes knocked hundreds of businesses offline and affected untold millions of people as operations in the airline, healthcare, financial, and dozens of other industries were disrupted.

Trustwave reacted quickly to the rapidly escalating and changing situation, assuring its clients that it was aware of the news while proactively assessing and monitoring clients who might have been impacted. Trustwave also offered assistance to any affected organization and to those looking to improve their security stack.

Trustwave SpiderLabs Senior Security Research Manager Karl Sigler recently sat down to discuss what happened that day and in the days that followed.


Q: Can you summarize what happened, and could it have been worse?

Sigler: The 8.5 million computers that crashed were all running CrowdStrike's software. The faulty patch was live for about 78 minutes before CrowdStrike pulled it, and machines that received it during that window crashed.

It was an incredible failure, like nothing I had seen before, and it reminded us of the dark predictions before the year 2000 that the millennium bug would crash computers worldwide. That didn't happen. In this case, too, the outage was confined to CrowdStrike's client computers rather than being the worldwide disaster some predicted with Y2K.

The effects seemed catastrophic, perhaps on the scale people feared from Y2K, because CrowdStrike has grown into a worldwide organization, so the impact was widely felt.

This patch was specifically for the Microsoft Windows version of CrowdStrike's Falcon software, so it affected only Windows systems with CrowdStrike installed. Most of the computers in the world continued to operate without serious consequences. Where problems spread further, it was because a crashed computer had other systems depending on it; those downstream systems didn't have CrowdStrike installed but still could not run. This is particularly true in an industry like healthcare, where many very different computer systems must communicate with each other.


Q: It's logical to assume that patching millions of computers simultaneously is challenging, but most updates roll out without any problems. What do you think happened with this particular patch?

Sigler: Yes, patches can be a challenge. When a patch must be put in place fast, we call it agile patching. CrowdStrike, like many vendors, doesn't want to leave it up to its customers to decide when a patch gets installed, so it automates the delivery. Back in March, CrowdStrike put out a series of patches in the same group as this one, and the company assumed that follow-up patches would be safe and didn't need more quality assurance.

However, this patch contained an error, and it crashed the computers that received it. It was simply a case of human error.


Q: What was the biggest challenge facing the recovery effort?

Sigler: That became a horrible process. Affected machines could not boot into their operating systems, so the bad patch had to be removed manually from each machine. That can take weeks for big organizations and perhaps days for smaller companies.


Q: Should this incident lead to patches being implemented in a different manner?

Sigler: It's popular to say that pushing CrowdStrike's patch out worldwide is to blame. I don't think that's the issue. When you reach that size, you simply have to take greater care to develop quality products. They should have done a better job of preventing this.


Q: Do clients need to have a system in place to check if a patch is dangerous?

Sigler: Layers of redundancy at the endpoint probably would not help. Two anti-malware systems on the same computer would effectively be competing software and would cause more problems than they solve. Layered protection also wouldn't have prevented the CrowdStrike software from crashing, because a bug was at fault, not a cyberattack.


Q: Should the government play a role in ensuring this and similar incidents do not occur?

Sigler: I'm a fan of regulations. Some security companies spend too much time worrying about their bottom line and introduce security risks in the process. The threat of stiffer government penalties for accidents or neglect should push companies to do more quality assurance in the future. The European Union, the UK, and countries like Japan and South Korea have implemented more security regulations than we have in the US. We ought to watch closely how those regulations pay off.


A version of this interview appeared in Crain's Chicago Business.