Business Continuity Strategy

From Supporting Role Wiki

Latest revision as of 16:45, 7 March 2014

This page documents all of the procedures and processes we have in place that can be used to ensure that your business can survive a disaster, at least from an IT perspective.

Data Backups

Forget About IT Ltd currently has three backup servers in two separate locations. Every night, at approximately 10pm, our servers start to copy all the changes that have happened to your data since the previous backup. Should this backup be interrupted, it will automatically try again after a short period of time, for a number of attempts. The process is repeated for at least one other of the three backup servers, so that your data is backed up onto at least two backup servers.

Once the backup has finished, the backup servers take a copy and date it. The daily backups are kept for a week. One backup a week is used as the weekly backup, and is kept for a month, and one backup a month is kept for a year. So at the end of a full year's rotation, we will have, on at least two different servers, at least 20 snapshots of your data (6 x daily, 4 or 5 x weekly, 12 x monthly).
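The rotation described above can be sketched in a few lines of Python. This is only an illustration, not our actual backup program: the choice of Sunday as the weekly backup, the 1st of the month as the monthly backup, and 31 days as "a month" are all assumptions made for the example.

```python
from datetime import date, timedelta

DAILY_DAYS = 7      # daily backups are kept for a week
WEEKLY_DAYS = 31    # weekly backups are kept for a month (assumed to be 31 days)
MONTHLY_DAYS = 365  # monthly backups are kept for a year


def is_retained(snapshot: date, today: date) -> bool:
    """Return True if a snapshot taken on `snapshot` would still be kept on `today`."""
    age = (today - snapshot).days
    if age < 0:
        return False
    if age < DAILY_DAYS:
        return True
    # Assumption: the Sunday snapshot is the one promoted to "weekly".
    if snapshot.weekday() == 6 and age <= WEEKLY_DAYS:
        return True
    # Assumption: the snapshot from the 1st is the one promoted to "monthly".
    if snapshot.day == 1 and age <= MONTHLY_DAYS:
        return True
    return False


def retained_snapshots(today: date) -> list[date]:
    """List every snapshot date from the past year that the rotation still holds."""
    candidates = [today - timedelta(days=n) for n in range(MONTHLY_DAYS + 1)]
    return [d for d in candidates if is_retained(d, today)]
```

Calling retained_snapshots(date(2014, 3, 7)) returns the dates that would still be held on each backup server on that day under these assumptions.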

Mission Critical data

Because we charge for the amount of data we store for you at our data centres, it is the client's choice as to what they consider to be mission critical data. However, unless a client specifies otherwise, we back up all the data on the server.

Archive data

As all businesses accumulate data, eventually there comes a time when some of it needs to be archived off. We can provide areas on the server that are not backed up to our off-site storage, but in order to make sure the data is safe, we recommend that the client purchases a couple of external hard drives, which the backup program will use to carry out a local backup of that archive area. The client then swaps those drives on a weekly basis, taking the other one off site somewhere, such as their home.

Recycle Bin

We can configure a network Recycle Bin, enabled on a per-share basis, which will store a copy of any deleted files for 32 days. This bridges the gap when you delete a file that was created after the most recent backup.

Large Files

Because it is possible for files to be too large to be uploaded overnight, files over 1GB in size are only backed up at the weekend, when we can leave the backups running long enough to complete the task before the next backup is due.

Files over 10GB are not currently backed up remotely for the same reason. However, our backup program logs which files it has not backed up, so it is very easy for us to determine if there are files on your server not being backed up due to their size.

As upload bandwidth improves over time, we review these sizes on a regular basis.
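As a rough illustration of the size thresholds above, the decision could be written as follows. This is a sketch rather than our backup program's real logic: the function name and the return labels are made up for the example, and 1GB is assumed to mean 1024^3 bytes.

```python
GIB = 1024 ** 3  # assumption: "GB" is treated as 2**30 bytes

WEEKEND_ONLY_OVER = 1 * GIB   # files over 1GB wait for the weekend
NOT_REMOTE_OVER = 10 * GIB    # files over 10GB are not backed up remotely


def remote_backup_schedule(size_bytes: int) -> str:
    """Classify a file by when (or whether) it is backed up off-site."""
    if size_bytes > NOT_REMOTE_OVER:
        return "skipped"   # logged by the backup program as not backed up
    if size_bytes > WEEKEND_ONLY_OVER:
        return "weekend"
    return "nightly"
```

For example, a 200MB spreadsheet archive would be classed as "nightly", a 4GB video as "weekend", and a 25GB disk image as "skipped".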

Bandwidth

We normally limit the amount of upload bandwidth we use on a client's broadband to 25% of its capacity. At the weekend we increase this to 50%. This means that when the backup program is running, it does not impact the normal traffic to and from the Internet.
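To see why large files have to wait for the weekend, it helps to estimate how long an upload takes under these limits. The sketch below is illustrative only: the function name is hypothetical, the uplink speed is whatever your broadband provides, and protocol overhead is ignored.

```python
WEEKDAY_SHARE = 0.25  # share of upload capacity used on weeknights
WEEKEND_SHARE = 0.50  # share of upload capacity used at the weekend


def upload_hours(size_bytes: int, uplink_mbps: float, weekend: bool = False) -> float:
    """Estimate the hours needed to upload a file under the bandwidth cap.

    uplink_mbps is the line's full upload speed in megabits per second.
    """
    share = WEEKEND_SHARE if weekend else WEEKDAY_SHARE
    usable_bits_per_second = uplink_mbps * 1_000_000 * share
    return size_bytes * 8 / usable_bits_per_second / 3600
```

On a hypothetical 1 Mbit/s uplink, upload_hours(10**9, 1.0) comes to just under 9 hours for a single 1GB file on a weeknight, and half that at the weekend.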

Virtual Machines

We have installed virtual computers on our servers for some of our clients. Because of the size of the image files, we do not back these up remotely. However, we set up special scripts in our backup program that can access the mission critical data, and add it to the backup.

Data Recovery

Thankfully, full bare-metal recoveries are rare. Much more usual are accidental deletions, or using "save" instead of "save as". We can either email you the file, so that you can re-save it yourself, or we can recover the file(s) to their original location, or to a new location. Obviously the larger the file, the longer it will take to restore.

Test Restores

We are happy to carry out test restores for you at any time. Should you wish to test our backup system, we suggest you create a file and save it somewhere on the server that also contains mission critical data, and a couple of days later, delete it and ask us to recover it.

Hardware

All the hardware we use, including the servers we install at clients' premises, follows a common format. This means that even when we buy servers from a different manufacturer, they remain compatible with each other, so in the event of a hardware failure, we can simply turn up with a spare server, swap the hard drives over, and the client is back up and running again.

If the hard drives have also been damaged, then depending on the client's preference, we can either download the backup data onto the spare server before we bring it over, or bring the server over with a base install, and then prioritise what data is downloaded first depending on the client's requirements.

We can also take the replacement server to alternative premises, along with a small network, or set it up at our office or our server farm, to be used remotely.

Mirrored Hard Drives

All of our servers use mirrored hard drives. That means that the system can tolerate a hard drive failure without any data being lost and, more importantly, without any interruption to the server. If this happens to your server, we would simply turn up with a new hard drive.

Spares

As previously mentioned, we hold a number of spare servers in stock. We normally keep a 10% ratio of spare servers to production servers.

Duty of Care

The only thing we require of our clients is a duty of care over the physical wellbeing of the server. We always use quiet servers, as they are usually located under a desk or in a corner of an office, since most of our clients have neither the space nor the need for a dedicated server room. However, this means the server can be exposed to spilt drinks, localised heat sources such as a fan heater, and similar hazards.

Should the failure of the server be linked to one of these sorts of issues, we would normally expect to charge the client for the repair. We also expect the client's insurance to cover the cost of replacement hardware and our time in the event of a major incident, such as a fire.

Email

Assuming you allow us to route your email via our servers, we have the following systems in place that are used automatically whenever your main email server is unavailable.

Relay Servers

These alternative servers are permanently configured in the email settings for your domain. Whenever your mail server becomes unavailable, these servers will accept the email on its behalf, and will queue it up until the main server becomes available again.

"Panic" Server

This is also permanently set up. Whenever an email is processed by one of our mail relay servers, a copy is taken and sent to what we refer to as our "Panic" email server. This server can be used to redirect email to an alternative destination, such as a private email address, and the mail on it can also be accessed via webmail. It keeps copies for 30 days.

Transport Map

A transport map is a way of delivering queued email to a different destination, so if the mail server is likely to be unavailable for some time, we can set up an alternative server and have the queued mail delivered there instead.
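Conceptually, a transport map is just a lookup from the recipient's domain to the server that should receive the mail. The Python sketch below models that idea only; it is not the configuration of our relay servers, and the host names and function name are hypothetical.

```python
# Hypothetical host name for the client's normal mail server.
DEFAULT_NEXT_HOP = "mail.example.com"


def next_hop(recipient: str, transport_map: dict[str, str]) -> str:
    """Return the server that mail for `recipient` should be delivered to.

    Domains listed in the transport map are redirected to the alternative
    server; everything else goes to the normal destination.
    """
    domain = recipient.rsplit("@", 1)[-1].lower()
    return transport_map.get(domain, DEFAULT_NEXT_HOP)


# During an outage, mail for the affected domain is pointed at a stand-in server.
outage_map = {"client.example": "standby.example.net"}
```

With this map in place, next_hop("jo@client.example", outage_map) gives the stand-in server, while mail for any other domain continues to its usual destination. Removing the entry restores normal delivery.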

Security

Everything we do, we do with one eye on security.

Traffic Encryption

All of the external traffic to and from the server can be encrypted. We can enforce encryption in many situations, so that none of your traffic is sent in clear text. This stops public wireless packet sniffing and a whole range of other snooping. You may think you have nothing worth stealing as far as data is concerned, but to the dark side, even an empty server on the other side of the world can be used as a jumping off point for other attacks, spam and denial of service attacks.

Data Encryption

As an option, we can encrypt the partitions your data is stored on. This would mean that if the server was stolen, the thieves would not be able to access the data when it was powered back up. However, we do not do this automatically as it has its downside: after a server reboot, your data is not available until the key is entered.

Cloud

Although your data is backed up off-site, we do not, and never will, use what people often refer to as "The Cloud". For many years we have been operating what is now described as a private cloud, which is basically our own physical servers. Nobody other than us has access to them. We know where they are, and whose jurisdiction they fall under. None of your off-site data ever leaves the UK, and we can guarantee that.

Remote Access

Although most often used on a day-to-day basis by remote workers, remote access to your data is a valuable tool when it comes to business continuity.

Email

Email is always available, and we configure all laptops and mobile devices to be able to use email both from inside and outside the network.

VPN

We have two different types of Virtual Private Networking, which is a way of being on the office network from anywhere in the world. You can access all the files you can access in the office, and even send print jobs to the office printer.

WebDAV

This is a method of accessing your files over the Internet, without the need for a VPN. Whereas a VPN is a single connection that can be used for almost any sort of traffic, WebDAV can only be used to access files. The traffic is encrypted, just like VPN traffic.

Web Interface

The lowest common denominator in terms of remote access, this is a way of accessing your email, calendar, tasks, files and address book, often in read-only mode. It requires no configuration on the client computer, so it can be used at a moment's notice.

Potential Scenarios and solutions

The scenarios below are to give some idea of what we would do when something happens. The biggest problem is not what to do, but when to do it, as there is a tendency to wait until a problem is resolved. Sometimes putting a solution in place can make the downtime longer than the length of the problem, so it is equally important not to over-react.

Undoubtedly the most important thing is to know how long the problem will last.

Loss of Internet Connectivity for 2 hours

If staff primarily work from the same office as the server, then they will be able to continue working on files without any downtime. However, depending on where the email server is, there are a couple of possible solutions.

  • If the email server is hosted locally, then a senior member of staff could access the panic email system and monitor incoming emails. To do this they would need an alternative internet connection, either using a mobile phone in tethered mode, or perhaps using the outage as an excuse to sit in the local coffee shop for a couple of hours.
    If the email server is internal, then after an hour of queuing, the relay mail servers will inform the sender that the email has not yet arrived. By default the mail is queued for 7 days, but we can change that. Once the internet connectivity is re-established, the queued emails will be sent down to the main server.
  • If the email server is remotely hosted, then emails could be monitored on mobile phones, or via laptops with mobile dongles.
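The queuing behaviour described above follows a simple timeline, sketched here in Python. The one-hour notice and seven-day default come from the text; the function name and status labels are made up for illustration, and what happens to a message once the queue lifetime runs out is not covered here, so it is simply labelled "expired".

```python
from datetime import timedelta

DELAY_NOTICE_AFTER = timedelta(hours=1)  # sender is told the mail has not yet arrived
QUEUE_LIFETIME = timedelta(days=7)       # default queue time, which can be changed


def queue_status(time_in_queue: timedelta) -> str:
    """Describe the state of a message held on a relay server."""
    if time_in_queue >= QUEUE_LIFETIME:
        return "expired"  # assumption: label only, the outcome is not specified
    if time_in_queue >= DELAY_NOTICE_AFTER:
        return "queued, sender notified of delay"
    return "queued"
```

For a two-hour outage, queue_status(timedelta(hours=2)) shows that mail is still safely queued, although senders will have been told about the delay.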

Loss of Power for 24 hours

If the email server is locally hosted, this generally means that we have to get the panic email server to forward emails on to alternative email addresses, or set up temporary mailboxes for the more important messages. As far as data is concerned, we would suggest copying the most important files from the backup onto an external hard drive, memory sticks, an FTP site or some other method of getting the data to the users who need it.

Restricted access to premises for 2 days

If you cannot get access to the building, but the server is working fine and can be accessed from outside via the broadband line, then staff can use one of the assorted remote access methods, for example webmail to get to their email and WebDAV to work on files. All of this could be done from a home computer or local Internet café.

Catastrophic fire

As soon as we were informed of the scale of the problem, we would start preparing a spare server, and liaise with you as to where the server should be located. Once the replacement server was set up, it could be accessed remotely in the same way as if you had no access to the premises. If you have managed to arrange temporary premises, we would bring the server and set up a small network for you.