September 10, 2024

The Worst Case Scenario: What Happened in the CrowdStrike Outage?

By Anir Desai and Bill Kachersky

The world was recently shaken by a global outage caused by a faulty software update from CrowdStrike, a prominent player in the field of endpoint security and threat intelligence. The outage stemmed from an undetected error in a rapid response content configuration update to CrowdStrike’s Falcon platform, which caused Windows systems to crash and display the “blue screen of death.” The defect was specific to a Falcon content update for Windows hosts; Mac and Linux systems were unaffected.

The outage caused substantial disruption to IT operations worldwide. An unprecedented number of Windows systems, estimated at around 8.5 million, were directly affected, and many large public and private sector organizations across industries experienced service disruptions. Airlines were notably affected, with many flights grounded. Many health systems also faced significant operational disruptions, leaving providers and patients unable to access essential services such as patient records and electronic health record (EHR) systems.

This incident, though temporary, had significant repercussions across various industries relying on CrowdStrike’s services. Those repercussions also reached organizations and individuals who rely on services provided by CrowdStrike’s many customers, highlighting the need for robust vendor management and a comprehensive understanding of third- and fourth-party risks. This blog will delve into the effects of the outage, its implications for vendor management, and strategies organizations can adopt to evaluate the impact of their vendors on daily operations.

Implications of the Outage

To start, let’s break down the impact of this event and its implications as it relates to those businesses and organizations that were affected by the outage.

Immediate Impact

Many businesses experienced interruptions in their critical services. This included healthcare systems unable to access electronic health records, airlines grounding flights, and railways facing scheduling issues. Companies faced short-term financial losses from the downtime, such as a halt in productivity, loss of sales or customers, and the costs of remediation. Many companies had to redirect their IT resources to manage the fallout of the outage. This pulled IT personnel away from their regular duties, causing further disruptions to projects and regular operations. IT support teams faced a surge in requests, leading to additional workload and possibly slower response times for other support requests.

Long-term Impact

The outage also has long-term implications for how these organizations approach vendor management, particularly for critical service providers. For many organizations, trust and credibility were unavoidably undermined, and they may need to restore the trust of their customers, stakeholders, and the public. Depending on the sector and the severity of the impact, there may be revisions to regulatory requirements related to risk assessment and business continuity planning. This incident will also play a significant role in how organizations choose vendors in the future. There could be a shift in their vendor management strategy, focusing on diversification, backup solutions, and better contingency planning.

Strategies for Evaluating Vendor Impact on Daily Operations

Recent events are just the latest example of a major third-party vendor disrupting operations for organizations considered “too big to go down.” Last year’s attacks on several Las Vegas, Nevada hospitality properties, in which attackers reportedly targeted the victims’ Okta identity environments, were another striking example of the unexpected impact of third-party vulnerabilities.

To mitigate the risks highlighted by such incidents, organizations need to adopt robust strategies for evaluating the impact of their vendors on daily operations. As modern systems become increasingly interdependent, organizations are expanding their reliance on third-party service providers to support critical business functions and operations that are cost-prohibitive to manage internally. With that trade-off, capturing the value generated by cost savings and increased efficiency relies primarily on processes that provide effective oversight aligned with the organization’s formal risk tolerance and appetite metrics. Only then can organizations realize the benefits of outsourcing while still protecting their bottom line.

Due Diligence

Organizational due diligence is a critical gate in determining the feasibility of incorporating a new third party resource or product into your organization’s ecosystem. A comprehensive vendor assessment framework is crucial for understanding the potential impact of vendors on business operations. At a basic level, this involves evaluating vendors across several dimensions:

Service Criticality: Assess how critical the vendor’s services are to your core business operations. Categorize vendors based on the criticality and frequency of the services they provide.
Organizational Stability: Review the vendor’s historical performance, including uptime statistics, incident response times, and customer service quality. Assess the vendor’s financial health to gauge their ability to sustain operations and invest in infrastructure and innovation.
Security Posture: Ensure vendors comply with relevant regulations and standards (e.g., GDPR, ISO 27001). Request and review security certifications and audit reports. Request and review the vendor’s security policies, incident response plans, and SDLC processes documentation. Review the vendor’s plans for addressing any identified vulnerabilities and their timelines for remediation. Assess the vendor’s business continuity and disaster recovery plans to understand their preparedness for handling outages and other disruptions.
Secure Software Development Practices: Ensure the vendor follows secure coding practices and conducts regular code reviews and static code analysis. Where possible, request third-party code review reports to assess the security of the vendor’s code. Verify that developers at the vendor are trained in secure development practices. Ensure that security is integrated into the vendor’s DevOps processes (DevSecOps), including automated security testing as part of the CI/CD pipeline.
Include Specific Contractual Terms: Where possible, negotiate SLAs that include specific terms for update notifications, testing, rollback procedures, and support. Include clauses that address liability and indemnification in case updates cause disruptions or security breaches. If the vendor will not cooperate and the software in question is not an operational requirement, consider alternative products.
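As a rough illustration of how the dimensions above could feed a vendor tiering decision, here is a minimal Python sketch. The 1-to-5 rating scale, the weights, the tier thresholds, and the vendor names are all assumptions made for this example; they are not drawn from any standard, and each organization would calibrate them against its own risk appetite.

```python
from dataclasses import dataclass

# Hypothetical 1-5 ratings captured during due diligence (5 = highest concern).
@dataclass
class VendorAssessment:
    name: str
    service_criticality: int  # 5 = core operations stop without this vendor
    stability_risk: int       # 5 = poor uptime history or weak financial health
    security_risk: int        # 5 = no certifications or audit reports available
    sdlc_risk: int            # 5 = no evidence of secure development practices

# Illustrative weights (assumed); they sum to 1.0.
WEIGHTS = {
    "service_criticality": 0.4,
    "stability_risk": 0.2,
    "security_risk": 0.2,
    "sdlc_risk": 0.2,
}

def weighted_score(vendor: VendorAssessment) -> float:
    """Combine the individual ratings into a single 1-5 score."""
    return sum(getattr(vendor, field) * weight for field, weight in WEIGHTS.items())

def tier(vendor: VendorAssessment) -> str:
    """Map an assessment to a tier; thresholds are assumptions for this sketch."""
    # A vendor that core operations depend on is Tier 1 no matter how well it
    # scores elsewhere, because an outage there halts the business.
    if vendor.service_criticality == 5:
        return "Tier 1"
    score = weighted_score(vendor)
    if score >= 3.5:
        return "Tier 1"
    if score >= 2.5:
        return "Tier 2"
    return "Tier 3"

endpoint = VendorAssessment("endpoint-security", 5, 2, 1, 2)
payroll = VendorAssessment("payroll", 3, 3, 3, 3)
catering = VendorAssessment("office-catering", 1, 2, 2, 2)

for v in (endpoint, payroll, catering):
    print(v.name, tier(v))
```

The override for maximum criticality reflects the point made above: a well-run vendor can still take down core operations, so a good security posture should not lower the level of oversight a critical vendor receives.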

While it is often difficult to enforce provisions such as right-to-audit clauses or vendor due diligence activities with large vendors, interdependencies of critical operational systems underscore the need for additional due diligence to accurately determine the risk an organization is accepting when procuring SaaS products from large vendors. In these scenarios, it is crucial to validate that the vendor’s software development and change management practices are sound and designed to mitigate software supply chain risks.

That said, let’s go over additional measures organizations can take to exercise situational awareness when performing due diligence, especially when reviewing the capabilities of large vendors that provide critical services, to ensure we have the necessary information needed to maintain a risk-informed security posture.

Understand the Vendor’s Update Process: Request detailed documentation on the vendor’s update process, including how automatic updates are managed, tested, and deployed. Understand how frequently updates are released and what types of changes they typically include (e.g., security patches, feature enhancements, bug fixes).
Evaluate Testing and Quality Assurance: Verify that the vendor has robust testing and quality assurance processes in place. Ensure updates are thoroughly tested in environments that mimic real-world usage scenarios. Inquire if the vendor has a beta testing phase where updates are tested by a smaller group of users before general release.
Assess Rollback and Recovery Capabilities: Ensure the vendor has a clear and efficient rollback process if an update causes issues. This includes the ability to revert to a previous stable version without significant downtime. Check if the vendor performs regular backups before deploying updates to facilitate quick recovery if needed.
Communication and Transparency: Confirm that the vendor provides advance notifications about upcoming updates, including details about what the update entails and any potential impacts. Ensure the vendor offers real-time alerts when updates are applied, including any issues or disruptions that may arise.
Security and Compliance: Verify that the vendor prioritizes security in their update process, promptly addressing vulnerabilities and ensuring updates do not introduce new security risks. Ensure the vendor’s update process complies with relevant industry standards and regulatory requirements (e.g., GDPR, HIPAA, etc.).
Control and Customization Options: Show preference for vendors that provide an option to defer updates to a more convenient time, allowing for proper testing and scheduling. Check for the capability to selectively apply updates, focusing on critical patches first while deferring non-critical ones.
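Where a vendor does offer deferral controls, the staged approach described above might look something like the following Python sketch. The ring names, the soak periods, and the rule that security-critical updates use half the normal delay are assumptions chosen for illustration; this is not a description of how any particular vendor’s update mechanism works.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical deployment rings and how long an update must "soak" before
# each ring may install it. Names and durations are assumptions.
RING_DELAY = {
    "canary": timedelta(hours=0),   # small group of non-critical test hosts
    "early": timedelta(hours=24),   # a wider but still limited group
    "broad": timedelta(hours=72),   # everything else, including critical systems
}

@dataclass
class Update:
    update_id: str
    released_at: datetime
    security_critical: bool = False

def eligible_rings(update: Update, now: datetime, canary_healthy: bool) -> list[str]:
    """Return the rings allowed to install the update at time `now`."""
    # If the canary hosts reported problems, halt the rollout everywhere.
    if not canary_healthy:
        return []
    rings = []
    for ring, delay in RING_DELAY.items():
        # Critical patches go first: they wait only half the usual soak time.
        if update.security_critical:
            delay = delay / 2
        if now - update.released_at >= delay:
            rings.append(ring)
    return rings

released = datetime(2024, 1, 1, 0, 0)
routine = Update("content-update-001", released)
critical = Update("security-patch-002", released, security_critical=True)

print(eligible_rings(routine, released + timedelta(hours=30), canary_healthy=True))
print(eligible_rings(critical, released + timedelta(hours=40), canary_healthy=True))
```

The design choice worth noting is the health gate: deferral alone only delays a bad update, while a check on the canary ring gives the organization a chance to stop it before it reaches critical systems.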

The maxim of “trust, but verify” applies to questionnaires as well. Many standardized questionnaires are based on Yes/No responses rather than inviting detailed ones, which gives the organization little to verify. Good vendor questionnaires should be scoped according to the organization’s unique vendor tiering structure and vendor categories, and ask open-ended questions that encourage meaningful responses. Organizations must also have a strong understanding of their own IT systems and networks and ensure relevant personnel are involved in the vendor due diligence process. Criticality rankings and vendor tiering standards can help organizations determine when to involve IT and Information Security experts.

Due Care: The Internal Aspects of Vendor Risk Management

While due diligence traditionally refers to the precautions we take to identify both known and unknown risks, due care refers to the policies, procedures, and processes that organizations employ to avoid or mitigate known risks. Let’s review what best practices organizations can implement to demonstrate due care.

Governance and Documentation

Define Governance Structures: Create a governance structure that includes a dedicated vendor risk management team or committee to oversee and manage vendor relationships.
Develop and Centrally Manage Policies and Standards: Develop and document a comprehensive vendor risk management policy outlining roles, responsibilities, and procedures. Maintain a centralized repository for all vendor documentation, including contracts, risk assessments, audit reports, and performance evaluations. Regularly update documentation to reflect changes in vendor relationships, risk assessments, and compliance requirements.

Dependency Mapping and Impact Analysis

Identify Critical Services and Components: Create a thorough inventory of all services, applications, and components provided by the vendor that are critical to operations. Categorize these services and components based on their criticality, impact on business operations, and data sensitivity.
Map Dependencies: Identify and document all dependencies between different services and components. This includes understanding how data flows between systems and which services rely on others to function. Map both upstream (services the vendor relies on) and downstream (services that rely on the vendor’s services) dependencies to get a complete picture. Visual diagrams help illustrate the dependencies between services and components. Maintain detailed documentation of all identified dependencies, including descriptions, data flow details, and the criticality of each dependency.
Impact Analysis: Conduct a risk assessment to determine the potential impact of disruptions to each critical service or component. Consider factors such as downtime, data breaches, and regulatory compliance. Identify any single points of failure within the vendor’s services that could pose significant risks to operations.
Regular Reviews and Updates: Regularly review and update dependency maps and documentation to ensure they remain accurate and reflect any changes in the vendor’s services or organizational requirements. Implement a change management process to update dependency maps whenever there are changes to the services or components.
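To make the mapping and impact analysis steps above more concrete, here is a minimal Python sketch of a dependency map and a downstream “blast radius” calculation. The service names and the shape of the map are hypothetical; in practice this data would come from an asset inventory or CMDB rather than a hand-written dictionary.

```python
from collections import deque

# Hypothetical dependency map: each key relies on the services listed for it.
DEPENDS_ON = {
    "patient-portal": ["ehr-system", "identity-provider"],
    "ehr-system": ["endpoint-security-agent", "database-hosting"],
    "billing": ["ehr-system", "payment-processor"],
    "identity-provider": ["endpoint-security-agent"],
}

def downstream_of(service: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Every service that directly or indirectly relies on `service`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents: dict[str, set[str]] = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(svc)

    # Breadth-first walk; `seen` also protects against circular dependencies.
    seen: set[str] = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

def blast_radius(depends_on: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank every service by how many others would be affected if it failed."""
    all_services = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    ranked = [(svc, len(downstream_of(svc, depends_on))) for svc in all_services]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))

for svc, count in blast_radius(DEPENDS_ON):
    print(f"{svc}: {count} downstream service(s)")
```

In this made-up map, the endpoint security agent sits beneath four other services, which marks it as a candidate single point of failure and a priority for the impact analysis described above.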

Incident and Continuity Planning

Incident Response Plan: Establish procedures for managing and reporting vendor-related incidents in the organization’s incident response plan. Establish clear incident response protocols with your vendors. This includes defining roles and responsibilities, communication channels, and response timelines in the event of a disruption. Ensure a contact list is readily available for the incident response team, and collaborate with vendors whenever possible in developing an incident communication plan. Include vendor outage exercises in annual tabletops to test these procedures and communication plans.
Business Continuity Planning: Coordinate with vendors to involve them in business continuity and disaster recovery plans and testing. Impact analyses and dependency maps are indispensable here to help teams quickly identify the vendors that should be involved. Integrate contact lists with your organization’s plans. Develop redundancy plans for critical services to ensure continuity in case of vendor service disruptions. This might include alternative vendors, backup systems, or failover mechanisms.
Secondary Vendors: Engage secondary vendors for alternative options of critical services in case of a primary vendor outage.
Data Backup and Replication: Implement robust data backup and replication strategies to avoid critical data loss during vendor outages.
Cross-Training Staff: Train staff to handle critical functions in the absence of vendor support.

Adopting a strategy of high-availability systems and applications, redundant architecture, and failover capabilities is an effective safeguard against vendor issues. Even so, organizations should exercise due diligence and due care in determining where redundancy is needed. As the saying goes, “when everything is important, nothing is important.” Good dependency mapping supports effective prioritization.

Performing regular business impact analyses on your IT infrastructure and on bespoke and third-party applications and systems has many benefits. The outputs can be used as drivers for more robust asset management, risk management, system administration, change management, business continuity and disaster recovery, vulnerability management, and incident response. The benefits cannot be overstated, and the outcomes are well worth the effort involved.

Stay Vigilant: Vendor Continuous Monitoring and Review

In addition to due diligence and due care, organizations need to effectively manage existing vendors. With all that goes into the processes outlined above, this can be overwhelming for smaller teams and organizations. Automation can significantly reduce the effort involved in vendor monitoring and review cycles. Let’s review some best practices for vendor continuous monitoring and how automation can support them.

Vendor Performance and Security Monitoring: Implement tools and processes to continuously monitor vendor performance. This includes tracking service uptime, response times, security practices, and any changes in security posture. Use threat intelligence services to stay informed about emerging threats and vulnerabilities related to the vendor. Implement systems that automatically score vendors based on threat intelligence data and security posture. Use the risk scores to prioritize vendors that need immediate attention or additional scrutiny. These tools can also support due diligence by assessing a prospective vendor’s security posture and identifying recent incidents or breaches.
Key Performance Indicators (KPIs): Establish and monitor KPIs to track vendor performance and identify areas for improvement. Use APIs to integrate vendor management dashboards with your internal systems and vendor systems to automatically collect data related to KPIs. Use real-time dashboards to visualize KPIs. Tools like Tableau and Power BI can provide interactive and real-time views of vendor performance. Generate custom reports that can be automatically sent to stakeholders at regular intervals.
Feedback Mechanisms: Implement internal mechanisms for providing regular feedback to vendors and addressing any performance issues promptly.
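As a simple illustration of the automated KPI checks described above, the Python sketch below compares measured uptime against a contracted SLA and builds a review queue. The record fields, vendor names, and thresholds are assumptions for this example; real data would be collected from monitoring tools or vendor status APIs rather than entered by hand.

```python
from dataclasses import dataclass

# Hypothetical monthly record for one vendor.
@dataclass
class VendorPeriod:
    vendor: str
    minutes_in_period: int
    downtime_minutes: int
    sla_uptime_pct: float        # contracted uptime, e.g. 99.9
    open_security_findings: int  # unresolved findings from the last review

def uptime_pct(period: VendorPeriod) -> float:
    """Measured uptime as a percentage of the reporting period."""
    up = period.minutes_in_period - period.downtime_minutes
    return 100.0 * up / period.minutes_in_period

def review_queue(periods: list[VendorPeriod], max_open_findings: int = 0):
    """Return (vendor, reasons) pairs for vendors that need attention."""
    queue = []
    for p in periods:
        reasons = []
        if uptime_pct(p) < p.sla_uptime_pct:
            reasons.append("uptime below SLA")
        if p.open_security_findings > max_open_findings:
            reasons.append("open security findings")
        if reasons:
            queue.append((p.vendor, reasons))
    # Vendors with the most issues are listed first.
    queue.sort(key=lambda item: -len(item[1]))
    return queue

MINUTES_30_DAYS = 30 * 24 * 60  # 43,200 minutes

records = [
    VendorPeriod("ehr-vendor", MINUTES_30_DAYS, 10, 99.9, 0),
    VendorPeriod("payroll-saas", MINUTES_30_DAYS, 45, 99.9, 0),
    VendorPeriod("ticketing-saas", MINUTES_30_DAYS, 120, 99.9, 2),
]

for vendor, reasons in review_queue(records):
    print(vendor, "->", ", ".join(reasons))
```

Note how little downtime a 99.9% SLA allows: roughly 43 minutes in a 30-day month. A check like this makes that budget visible without anyone having to compute it manually each review cycle.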

For many organizations, this is the most difficult phase of the vendor risk management lifecycle to manage effectively. Manual performance monitoring and risk assessment are time- and resource-intensive processes and, when done inefficiently, can quickly swallow up any cost savings that outsourcing business processes to those vendors was meant to realize. Investing in automated solutions and platforms that facilitate continuous monitoring and risk management actions allows your vendor risk management personnel to remain agile and focused on strategic alignment, remediation escalation for vendor non-compliance or emergent risks, and continuous improvement.

The Fourth Dimension: Fourth-Party Risk

Up until this point, we’ve focused almost exclusively on best practices for third-party risk and strategies for managing that risk effectively. There’s another aspect of the CrowdStrike outage that still needs to be discussed: fourth-party risk.

Fourth-party risks extend the organization’s vulnerability beyond its direct control, making it susceptible to disruptions in its vendors’ supply chains. The intricate web of dependencies can make it difficult to identify and manage risks effectively. Since many organizations rely on a network of vendors and those vendors have their own suppliers, the impact of an outage can cascade through multiple layers of the supply chain. Let’s discuss some strategies organizations can adopt to proactively identify and manage these risks.

Strategies to Manage Fourth-Party Risks:

Enhanced Vendor Due Diligence: When assessing third-party vendors, include questions about their key suppliers and the risk management practices they have in place. Ensure that third-party contracts include obligations for vendors to disclose their critical fourth-party relationships.
Risk Mapping and Assessment: Include fourth-party relationships in dependency maps wherever possible to understand the full extent of potential vulnerabilities. Conduct regular risk assessments that consider the entire supply chain, including fourth-party vendors. Conduct scenario planning exercises that include potential fourth-party disruptions to test your organization’s preparedness. Regularly conduct drills and simulations with your vendors to ensure that everyone is ready to respond effectively to potential incidents.
Visibility and Monitoring: Use tools and technologies that provide better visibility into the entire supply chain, including fourth-party vendors. Implement continuous monitoring to detect potential issues in real-time and respond proactively.
Contractual Safeguards: Include clauses in vendor contracts that require third-party vendors to have robust risk management and business continuity plans for their own suppliers. Ensure that there are clear terms for communication, incident reporting, and remediation in the event of a disruption.
Regulatory Compliance and Standards: Stay informed about regulatory requirements related to third-party and fourth-party risk management. Ensure that your risk management practices align with industry standards and best practices.
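One way to put disclosed fourth-party relationships to use is to look for concentration: suppliers that sit behind more than one of your third-party vendors. The Python sketch below shows the idea. The vendor and supplier names are invented, and the input assumes your vendors have actually disclosed their critical suppliers, as the contractual obligations above are meant to ensure.

```python
from collections import defaultdict

# Hypothetical disclosures gathered from questionnaires and contracts:
# third-party vendor -> the critical suppliers (fourth parties) it relies on.
DISCLOSED_SUPPLIERS = {
    "payroll-saas": ["cloud-host-a", "email-relay"],
    "ehr-vendor": ["cloud-host-a", "endpoint-security-vendor"],
    "ticketing-saas": ["cloud-host-b", "endpoint-security-vendor"],
}

def fourth_party_concentration(disclosed: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return fourth parties shared by two or more third-party vendors."""
    exposure: dict[str, set[str]] = defaultdict(set)
    for third_party, suppliers in disclosed.items():
        for supplier in suppliers:
            exposure[supplier].add(third_party)
    # Keep only suppliers whose failure would hit several vendors at once.
    return {
        supplier: sorted(vendors)
        for supplier, vendors in exposure.items()
        if len(vendors) > 1
    }

for supplier, vendors in fourth_party_concentration(DISCLOSED_SUPPLIERS).items():
    print(f"{supplier} sits behind: {', '.join(vendors)}")
```

A shared fourth party means that two vendors you treated as independent could fail together, which is worth knowing before naming one of them as the backup for the other.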

Conclusion

In a world where digital disruptions can have far-reaching consequences, robust vendor management is not just a best practice but a necessity. As we’re continually reminded of our critical dependencies on third-party vendors, organizations should move toward more resilient and proactive approaches to managing their vendor ecosystems, ultimately safeguarding their operations and maintaining stakeholder trust.

By adopting comprehensive vendor assessment frameworks, continuous monitoring, and proactive risk management strategies, organizations can mitigate the risks associated with vendor dependencies and ensure the continuity of their operations. Building resilience through redundancy, fostering collaborative vendor relationships, and investing in technology are crucial steps in fortifying an organization’s vendor risk management capability.

How Tevora Can Help

Tevora’s balanced perspective of business insight and deep technical knowledge ties your business requirements to concrete risk management outcomes designed to give you and your organization clarity, confidence, and peace of mind. Our experienced consultants bring efficiency, specialized expertise, and adaptability, and operate as an extension of your team to rapidly develop your organization’s vendor and supply chain risk management capabilities.

About the Authors

Anir Desai and Bill Kachersky are part of Tevora’s Strategic Services, Third-Party Risk Management Team.
