Very few people live day to day expecting a disaster. However, disasters do happen, and a critical component of business continuity is ensuring that when disaster strikes, our critical systems are returned to normal operations as quickly and efficiently as possible.
Last month’s Training Tips for Ships discussed a catastrophic fire in France at one of the world’s largest data centres. This fire took millions of websites offline, including the Learning Management System (LMS) for one of the world’s largest cruise lines. LMSs are business-critical because they are relied upon to ensure compliance and safety. This example should immediately cause us to reflect on our business-critical training systems, their importance to our operations, and whether we have a comprehensive and reliable disaster recovery process in place for those systems.
Proper disaster recovery planning requires organizational leadership and action. There should be a disaster recovery team that’s responsible for building and continually improving the Disaster Recovery Plan (DRP). That team should identify and assess risks, determine which applications and data are critical to operations, specify processes for backup and recovery, and continually test and update the DRP.
In this article we will focus on one of the core parts of a DRP for our LMS and other technical systems: how to protect our critical data from loss due to disaster.
Our historical learning data serves as both evidence of compliance and the basis on which all future training is planned. It’s therefore critical to operations. Typically, it resides in a combination of our LMS and our crewing or human resource systems. How do we protect this data from loss?
The first goal is to reduce the likelihood of loss wherever possible. Some of the highest risks are hardware failure, cyberattack, and human error. All enterprise-level servers should be configured with a redundant storage architecture so that if one or a small number of disks fail, the system continues to operate and no data is lost. Another critical component in reducing the possibility of data loss is strict data security processes. Cybersecurity is a growing concern because cyberattacks that either encrypt or divulge corporate data have become distressingly common. No organization is immune. This is a complex topic in itself, but the basic practices of two-factor authentication, unique and strong passwords, and system segregation are all important here. Finally, comprehensive training and data access policies help reduce the likelihood of data loss resulting from human error.
While attention to the above and other risks is critical, data loss can never be prevented absolutely. Therefore, every DRP needs to incorporate a process for the backup and recovery of data.
One of the core questions that must be asked is “how much data can you afford to lose?” The answer provides guidance on how often data should be backed up. In general, most organizations cannot afford to lose very much in the way of learning records. Therefore, backups should be frequent. At Marine Learning Systems our backup policy ensures that no more than 15 minutes of data could ever be lost. In practice, a failure would typically result in a much smaller loss window. To ensure this, automated backups are taken every 15 minutes, or more frequently, and are immediately transferred to a location that is geographically distant from the live system. Automated processes are used to test and report on the integrity of the backed-up data daily. We also use a similar automated process to restore this data daily on an operational “hot backup” server. This is a redundant, geographically distant server that is always online and running, ready to take over in the event that the primary server suffers a catastrophic failure. Not only does this provide us with a failover option when necessary, but it also serves the dual purpose of fully testing our backup and restore process from end to end every day.
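To make the idea of a loss window concrete, here is a minimal Python sketch of two checks a backup process might run: whether a series of backup timestamps stays within a 15-minute loss window, and whether restored data matches the original. This is an illustration only, not a description of how any particular vendor implements its backups. The function names (`worst_case_loss`, `meets_loss_window`, `backup_is_intact`) and the use of SHA-256 checksums are assumptions made for the example.

```python
from datetime import datetime, timedelta
import hashlib

# Assumed policy for this sketch: no more than 15 minutes of data may be lost.
MAX_LOSS_WINDOW = timedelta(minutes=15)


def worst_case_loss(backup_times, now):
    """Return the largest gap between consecutive backups,
    including the gap between the most recent backup and 'now'."""
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_loss_window(backup_times, now, limit=MAX_LOSS_WINDOW):
    """True if a failure at any moment up to 'now' would have lost
    no more than 'limit' worth of data. No backups at all means failure."""
    if not backup_times:
        return False
    return worst_case_loss(backup_times, now) <= limit


def backup_is_intact(original: bytes, restored: bytes) -> bool:
    """Compare SHA-256 checksums of the original and the restored data
    (a hypothetical stand-in for a fuller daily integrity test)."""
    return hashlib.sha256(original).digest() == hashlib.sha256(restored).digest()


if __name__ == "__main__":
    start = datetime(2021, 4, 1, 12, 0)
    backups = [start + timedelta(minutes=15 * i) for i in range(4)]
    check_time = start + timedelta(minutes=50)
    print("Worst-case loss:", worst_case_loss(backups, check_time))
    print("Within policy:", meets_loss_window(backups, check_time))
```

A checksum comparison only shows that the bytes survived the transfer. Restoring the data onto a running standby server, as described above, is the stronger end-to-end test.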
This policy and practice, while it takes time and resources to plan and maintain, served us extremely well when our emergency response team was alerted to the fire in France last month.
Disaster recovery planning is a large and complex topic, made more complex by the decentralized and (often) disconnected nature of maritime operations. However, the goal here is to raise awareness of the need for a proper DRP, and to highlight some of the critical issues that need to be considered for our training infrastructure.
Until next month, sail safely.