After suffering a “network failure” that triggered a system outage lasting four agonising days (starting on the night of 21st August 2019) for travellers and airport staff alike, KL International Airport (KLIA) seems to have resumed full operations earlier this week.
KLIA Going Fully Manual
The outage affected pretty much all forms of operations – from check-in counters, autogates, flight information display systems (FIDS) and the baggage handling system to the MYairports mobile app and even credit card systems at retail and F&B stores – forcing KLIA to go into full “manual mode”, turning to whiteboards and marker pens to disseminate information. Passengers departing from the airport were advised to arrive at least four hours before their flight departure time.
You can just imagine the chaos that must have ensued both within the KLIA and KLIA2 airports and outside, as rumours started circulating that the outage was the direct result of a malicious cyberattack or ransomware, along with conspiracy theories claiming that foreigners, including three million China nationals, were being smuggled in illegally while systems were down.
Meanwhile, although not as dramatic as events at KLIA, CIMB, one of Malaysia's largest financial services providers, also suffered its own outage. For over two days starting Monday, customers were unable to use either the CIMB Clicks website or the mobile app due to unexplained “technical issues”. This meant that customers had no online access to their accounts and could not make any transactions during the outage.
Even at the time of writing, while the website seems to be up, customers are still complaining that the mobile app remains inaccessible.
Are these incidents indeed just cases of unfortunate technical issues involving software or hardware failure, or have these companies fallen victim to malicious attacks? The truth is, it doesn’t matter.
Today, we read a lot about the tough SLA requirements modern businesses are expected to fulfil, with RTO and RPO times inching ever closer to zero. With that in mind, what were the expected SLAs that the management at KLIA and CIMB were supposed to meet? We’re pretty sure that three to four days of downtime would be unacceptable, even for smaller businesses.
The costs of downtime may vary considerably across industries. However, according to (conservative) estimates by Gartner, companies stand to lose around $42,000 per hour when systems go down. How much have KLIA and CIMB lost due to their respective fiascos? What about indirect impacts such as negative press, damaged reputation and the loss of productivity, data and customers?
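To put the per-hour estimate quoted above in perspective, here is a rough back-of-envelope sketch. It assumes, purely for illustration, that the generic $42,000-per-hour figure applies to both organisations and that the outages ran around the clock; neither KLIA nor CIMB has reported such numbers, and the function name is our own.

```python
# Back-of-envelope only: the hourly figure is the generic estimate quoted above,
# not a number reported by KLIA or CIMB.
HOURLY_COST_USD = 42_000


def downtime_cost(days: float, hourly_cost: float = HOURLY_COST_USD) -> float:
    """Return the estimated direct cost of an outage lasting `days` full days."""
    return days * 24 * hourly_cost


klia_estimate = downtime_cost(4)  # four days of outage
cimb_estimate = downtime_cost(2)  # "over two days", so treat this as a floor

print(f"KLIA: ${klia_estimate:,.0f}")           # KLIA: $4,032,000
print(f"CIMB: at least ${cimb_estimate:,.0f}")  # CIMB: at least $2,016,000
```

Even under these simplified assumptions the direct losses run into the millions of dollars, before any of the indirect impacts are counted.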
A disaster recovery and business continuity plan should account for incidents that include natural, technical and human disasters as well as cybercrime or terrorism. If there were a malware or ransomware attack, for example, well-prepared businesses should be able to restore from the most recent clean backup fairly quickly with today’s technology. As for KLIA, if they really needed to “replace switches” as stated, failover to a disaster recovery site should have been automated and completed within minutes, allowing recovery and restoration of the original site to begin.
Obviously, neither KLIA nor CIMB took adequate DR/BC measures to prepare for unforeseen circumstances and digital emergencies.
For the management of both companies, heads should roll, penalties should be imposed, but in true Malaysian fashion, most likely they merely have to wait for the furore to die down before carrying on business as usual.