Leaders are meant to strengthen their businesses to be ready for tough challenges, so tackling actual and potential disruption will be a critical part of leadership in 2025.
Some 91 per cent of UK respondents in our recent global survey felt that it’s not a matter of ‘if’ but ‘when’ a service disruption will happen, and 88 per cent of executives expect an incident as large as the worldwide IT outage of July 2024 to happen again within the next year.
At best, IT outages are frustrating inconveniences; at worst, they mean downtime that risks revenue and reputation. If a customer wants to buy an item online and the website of their go-to retailer is down, they may well switch to a competitor. If it happens multiple times, the customer starts to build a preference for the alternative and the retailer’s reputation suffers.
Losing that brand loyalty can kill a business over time, which is why it’s important for leaders to learn from each incident and respond. During incidents, the focus is on stabilising the situation and sharing a clear message.
Businesses can miss the opportunity incidents present: learning. If any benefit can be taken from the 2024 outage, it is that it highlighted the importance of robust testing of software updates and the cascading consequences of relying on a single point of failure in critical IT environments.
Leaders have seen that preparedness for a wide range of digital service shocks is required to keep business operations on track.
Build resilience
Most respondents (86%) felt that firms are not focused enough on service disruptions because they are too concerned with security. One of the best opportunities after an incident, or when preparing for potential risks, is to build more experts across all facets of the risk area.
Businesses can hold their own incident review for internal learning and improvement, but shouldn’t view this as a single point in time: there is a one-day perspective and a 30-day perspective. Reporting soon after the incident means everyone is likely still in ‘learning mode’, not yet ready for making strategic plans. Moreover, the experts who managed the incident might not be the right people to review it, as they may have tunnel vision.
Ensuring post-incident reviews take place builds shared understanding between business executives and the technical and customer care employees at the ‘sharp edge’. An incident analysis won’t provide great recommendations on ROI if it doesn’t see the whole picture, including where the full spectrum of pressures lies.
For more than a third (37%) of survey respondents, July’s global IT outage impacted their business and resulted in lost revenue or an inability to process sales transactions. 39% also faced delayed response times for customer or internal requests. Any future disaster that merely replicated the scale of July’s would be bad enough — with grounded flights, delayed medical appointments, retail and public sector service disruption — but worse is possible.
What comes next
Based on the survey findings, business and IT leaders have realised that they’ve been prioritising security at the expense of preparing to prevent service disruptions. Having counted the costs incurred, they are now considering making changes.
A majority (55%) have observed a mindset shift towards continually evaluating and improving preparedness, while the remainder (45%) describe a one-time investment in new systems or protocols that is now complete.
All respondents (100%) reported a heightened focus on preparing for future service disruptions. One key area of action has been increasing budgets for technology solutions (41%). In the UK, most (52%) executives favour using AI tools for proactive prevention of service disruptions. Beyond financial investment, many leaders are also improving communication about preparedness protocols.
SMEs, as much as any other business, must strengthen their operational resilience. This is their ability to predict, respond to, and prevent unplanned IT work to drive reliable customer experiences and protect revenue.
Resilience is measured in terms of reduced customer impact: downtime and service degradation. IT metrics include mean-time-to-acknowledge (MTTA), mean-time-to-resolve (MTTR) and service level objectives (SLOs), but the main issue at stake is how small an impact the customer feels and how much revenue is potentially lost.
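For readers who want to see how these metrics relate to one another, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than survey data or any vendor's method: the Incident record, the two sample incidents, the 30-day window and the 99.9% availability target are all hypothetical. It also assumes MTTR is measured from the moment an incident is opened (definitions vary), that incidents do not overlap, and that each one is a full outage for its whole duration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record with three timestamps."""
    opened: datetime
    acknowledged: datetime
    resolved: datetime


def mean_delta(deltas):
    """Average a collection of timedeltas."""
    deltas = list(deltas)
    return sum(deltas, timedelta()) / len(deltas)


def mtta(incidents):
    """Mean time from an incident opening to someone acknowledging it."""
    return mean_delta(i.acknowledged - i.opened for i in incidents)


def mttr(incidents):
    """Mean time from an incident opening to its resolution (assumed definition)."""
    return mean_delta(i.resolved - i.opened for i in incidents)


def availability(incidents, period):
    """Fraction of the period with no open incident (assumes no overlaps)."""
    downtime = sum((i.resolved - i.opened for i in incidents), timedelta())
    return 1 - downtime / period


# Made-up sample data: two incidents in a 30-day window.
incidents = [
    Incident(datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 9, 4),
             datetime(2025, 1, 6, 9, 50)),
    Incident(datetime(2025, 1, 20, 14, 0), datetime(2025, 1, 20, 14, 6),
             datetime(2025, 1, 20, 15, 10)),
]
window = timedelta(days=30)
slo_target = 0.999  # hypothetical availability objective

print("MTTA:", mtta(incidents))                                   # 0:05:00
print("MTTR:", mttr(incidents))                                   # 1:00:00
print("SLO met:", availability(incidents, window) >= slo_target)  # False
```

In this made-up example, two hours of downtime in a month is enough to miss a 99.9% target, which is why the customer-facing figure matters more than either response-time average on its own.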
Here are three recommendations to grow an organisation’s resilience:
So, with operational complexity and IT incidents rising together, leaders need to clearly articulate their priorities and empower their business teams with the solutions and the licence to deliver efficiently. They must also map out a vision that increasingly leverages AI and automation in ways that add value to their business and their customer goals.
Eduardo Crespo, VP EMEA, PagerDuty