What is RAID (Redundant Array of Independent Disks)?

RAID or redundant array of independent disks is a data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for data redundancy, performance improvement, or both.

It is a way of storing the same data in different places on multiple hard disks or solid-state drives to protect data in the case of a drive failure. A RAID system consists of two or more drives working in parallel. These can be hard disks, but solid-state drives (SSDs) are increasingly used.

RAID combines several independent and relatively small disks into a single large storage unit. The disks included in the array are called array members. The disks can be combined into the array in different ways, which are known as RAID levels. Each RAID level has its own characteristics:

Fault tolerance is the ability to survive one or several disk failures.
Performance is the change in the read and write speed of the entire array compared to a single disk.
Capacity is the amount of user data that can be written to the array. It depends on the RAID level and does not always match the sum of the member disks' sizes. To calculate the capacity for a particular RAID type and set of member disks, you can use a free online RAID calculator.
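The capacity rule can be sketched in a few lines of Python for the levels described later in this article. This is a minimal sketch, not a replacement for a real RAID calculator: the function name usable_capacity is hypothetical, and it assumes that every member contributes only as much space as the smallest disk, which is how most (but not all) implementations behave.

```python
def usable_capacity(level, disk_sizes):
    """Return the usable capacity of an array, in the same unit as disk_sizes.

    Assumption: each member contributes only as much space as the
    smallest disk in the array.
    """
    n = len(disk_sizes)
    smallest = min(disk_sizes)
    if level == 0 and n >= 2:
        return n * smallest            # striping only, no redundancy
    if level == 1 and n >= 2:
        return smallest                # every disk holds the same data
    if level == 5 and n >= 3:
        return (n - 1) * smallest      # one disk's worth of parity
    if level == 6 and n >= 4:
        return (n - 2) * smallest      # two disks' worth of parity
    if level == 10 and n >= 4 and n % 2 == 0:
        return (n // 2) * smallest     # half the disks hold mirror copies
    raise ValueError("unsupported level or too few disks")


if __name__ == "__main__":
    for level in (0, 1, 5, 6, 10):
        print("RAID", level, "with four 4 TB disks:",
              usable_capacity(level, [4, 4, 4, 4]), "TB")
```

With four 4 TB disks, the same hardware yields anywhere from 4 TB (RAID 1) to 16 TB (RAID 0), which is why capacity has to be considered together with fault tolerance.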

RAID systems can be used with several interfaces, including SATA, SCSI, IDE, or FC (Fibre Channel). Some systems use SATA disks internally but present a FireWire or SCSI interface to the host system.

Sometimes disks in a storage system are defined as JBOD, which stands for Just a Bunch of Disks. This means that those disks do not use a specific RAID level and act as stand-alone disks. This is often done for drives that contain swap files or spooling data.

How RAID Works

RAID works by placing data on multiple disks and allowing input/output operations to overlap in a balanced way, improving performance. Storing data redundantly across multiple disks also increases fault tolerance and the array's effective mean time between failures (MTBF).

RAID arrays appear to the operating system as a single logical drive. RAID employs the techniques of disk mirroring or disk striping.

Disk mirroring copies identical data onto more than one drive.
Disk striping partitions data and spreads it over multiple disk drives. Each drive's storage space is divided into units ranging from 512 bytes up to several megabytes. The stripes of all the disks are interleaved and addressed in order.
Disk mirroring and disk striping can also be combined in a RAID array.
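As a rough illustration of how interleaved stripes are "addressed in order", the following Python sketch maps a byte offset in the logical volume to a member disk and an offset on that disk. The function name locate_byte and the simple round-robin layout are assumptions made for illustration; real controllers may use different layouts.

```python
def locate_byte(offset, stripe_unit, num_disks):
    """Map a logical byte offset to (disk index, byte offset on that disk).

    Assumption: stripe units are laid out round-robin, so unit 0 is on
    disk 0, unit 1 on disk 1, and so on, wrapping after the last disk.
    """
    unit = offset // stripe_unit            # which stripe unit overall
    disk = unit % num_disks                 # round-robin across members
    row = unit // num_disks                 # how many units precede it on that disk
    return disk, row * stripe_unit + offset % stripe_unit


if __name__ == "__main__":
    # Four disks with 512-byte stripe units.
    for offset in (0, 512, 1024, 1536, 2048):
        print(offset, "->", locate_byte(offset, 512, 4))
```

Consecutive stripe units land on consecutive disks, so a large sequential transfer keeps all members busy at once.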

In a single-user system where large records are stored, the stripes are typically set up to be small (512 bytes) so that a single record spans all the disks and can be accessed quickly by reading all the disks at the same time.

In a multi-user system, better performance requires a stripe wide enough to hold the typical or maximum size record, allowing overlapped disk I/O across drives.
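The effect of stripe size can be estimated with a little arithmetic. The sketch below counts how many member disks a single record touches; the function name disks_touched and the sample sizes are hypothetical, chosen only to contrast the two cases described above.

```python
def disks_touched(record_size, stripe_unit, num_disks, start_offset=0):
    """Return how many member disks hold part of one record.

    Assumes round-robin striping and record_size > 0.
    """
    first_unit = start_offset // stripe_unit
    last_unit = (start_offset + record_size - 1) // stripe_unit
    units = last_unit - first_unit + 1
    return min(units, num_disks)            # cannot touch more disks than exist


if __name__ == "__main__":
    # Small stripes: one 2 KB record is spread over all four disks.
    print(disks_touched(2048, 512, 4))
    # Wide stripes: one 4 KB record stays on a single disk, leaving the
    # other disks free to serve other users' requests at the same time.
    print(disks_touched(4096, 65536, 4))
```

Small stripes make one request fast by using every disk for it, while wide stripes let several independent requests proceed in parallel on different disks.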

Levels of RAID

Many different ways of distributing data have been standardized into various RAID levels. Each RAID level offers a different trade-off between data protection, system performance, and storage space. The levels fall into three categories: standard, nested, and non-standard RAID levels.

Standard RAID Levels

The most popular standard RAID levels are described below.

1. RAID 0 (striped disks)

RAID 0 takes two or more disks and merges them into one large volume. It increases speed because you are reading from and writing to multiple disks at a time, and an individual file can use the speed and capacity of all the drives in the array. The downside of RAID 0 is that it is NOT redundant: the loss of any individual disk causes complete data loss. This makes a RAID 0 array considerably less reliable than a single disk.
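The reliability penalty can be made concrete with a short calculation. The sketch below assumes that disks fail independently and uses an illustrative per-disk survival probability of 0.97 over some period; both the figure and the function name raid0_survival are assumptions, not measured values.

```python
def raid0_survival(p_disk_survives, num_disks):
    """Probability that a RAID 0 array survives a given period.

    Assumption: disk failures are independent. The array survives only
    if every single member survives.
    """
    return p_disk_survives ** num_disks


if __name__ == "__main__":
    p = 0.97  # illustrative per-disk survival probability
    for n in (1, 2, 4, 8):
        print(n, "disk(s):", round(raid0_survival(p, n), 4))
```

Under these assumptions a four-disk RAID 0 array survives with a probability of roughly 0.885, compared with 0.97 for a single disk, and the gap widens with every disk added.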

There is rarely a situation where you should use RAID 0 in a server environment. You can use it for cache or other purposes where speed is essential and data loss does not matter at all.

2. RAID 1 (mirrored disks)

RAID 1 duplicates data across two disks in the array, providing full redundancy. Both disks store exactly the same data at all times. Data is not lost as long as one disk survives. The total capacity of the array equals the capacity of the smallest disk in the array.

RAID 1 is capable of much more complicated configurations, but its primary purpose is redundancy. If you completely lose a drive, you can stay up and running on the other drive.

If either drive fails, you can replace the broken drive with little to no downtime. RAID 1 also gives you the additional benefit of increased read performance, as data can be read from any of the drives in the array. The downsides are slightly higher write latency, since the data needs to be written to both drives in the array, and reduced capacity: you get only a single drive's capacity while needing two drives.
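The behavior described above can be modeled with a toy in-memory mirror. This is a simplified sketch: the Mirror class and its methods are hypothetical names, and each "disk" is just a Python dictionary from block number to data.

```python
class Mirror:
    """Toy RAID 1 model: every write goes to all healthy members."""

    def __init__(self, num_disks=2):
        self.disks = [dict() for _ in range(num_disks)]
        self.failed = set()

    def write(self, block, data):
        # A write must reach every healthy member, hence the extra latency.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def fail(self, index):
        # Simulate losing a drive together with its contents.
        self.failed.add(index)
        self.disks[index].clear()

    def read(self, block):
        # Any healthy member can serve the read.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise IOError("all mirror members have failed")


if __name__ == "__main__":
    mirror = Mirror()
    mirror.write(0, b"important data")
    mirror.fail(0)
    print(mirror.read(0))  # still readable from the surviving disk
```

The model also shows the capacity cost: two dictionaries are kept, but only one disk's worth of distinct data is ever stored.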

3. RAID 5 (striped disks with single parity)

RAID 5 requires at least three drives. It stripes data across multiple drives to increase performance, and it adds redundancy by distributing parity information across the disks. This protects data against the loss of any one disk, at the cost of one disk's worth of storage capacity.
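Single parity is commonly computed as the bytewise XOR of the data blocks in a stripe, which is what makes recovery from one lost disk possible. The following Python sketch demonstrates the idea on three tiny data blocks; the helper name xor_blocks and the sample data are illustrative, and the distribution of parity across disks is left out for brevity.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a four-disk array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1]. XOR-ing the surviving
# data blocks with the parity block reproduces the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])

if __name__ == "__main__":
    print(rebuilt == data[1])
```

Because XOR is its own inverse, the same operation is used both to create the parity and to rebuild any single missing block. Losing two blocks of the same stripe leaves too little information to recover either one.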

4. RAID 6 (striped disks with double parity)

RAID 6 is similar to RAID 5, but it writes two independent sets of parity data, using two drives' worth of capacity. The additional parity enables the array to continue to function even if two disks fail simultaneously. However, this extra protection comes at a cost: RAID 6 has slower write performance than RAID 5.

The chances that two drives break down at the same moment are small. However, when a drive in a RAID 5 system dies and is replaced by a new drive, rebuilding the replacement takes a long time. If another drive dies during that time, you lose all of your data. With RAID 6, the array survives even that second failure.
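To get a feel for how long that vulnerable window can be, the rebuild time can be approximated as the drive capacity divided by the sustained rebuild rate. The sketch below is a back-of-the-envelope estimate only: the function name rebuild_hours and the sample figures (a 10 TB drive rebuilt at 100 MB/s) are assumptions, and real rebuilds are usually slower because the array keeps serving normal I/O.

```python
def rebuild_hours(disk_capacity_tb, rebuild_rate_mb_per_s):
    """Estimate the hours needed to rewrite one whole replacement drive.

    Assumption: the rebuild runs at a constant rate and must write the
    full capacity of the drive. Uses decimal units (1 TB = 1,000,000 MB).
    """
    capacity_mb = disk_capacity_tb * 1_000_000
    seconds = capacity_mb / rebuild_rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 10 TB drive rebuilt at a sustained 100 MB/s.
    print(round(rebuild_hours(10, 100), 1), "hours")
```

Under these assumptions the array runs without its normal protection for more than a day, which is the window in which a second failure is fatal for RAID 5 but not for RAID 6.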

Nested RAID levels

Some RAID levels are referred to as nested RAID because they are based on a combination of RAID levels, such as:

1. RAID 10 (1+0)

This level combines RAID 1 and RAID 0 in a single system, which offers higher performance than RAID 1 but at a much higher cost.

This is a nested or hybrid RAID configuration. It protects data by mirroring everything onto secondary drives while striping across each set of drives to speed up data transfers.

2. RAID 01 (0+1)

RAID 0+1 is similar to RAID 1+0, except the data organization method is slightly different. Rather than creating a mirror and then striping the mirror, RAID 0+1 creates a stripe set and then mirrors the stripe set.
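The difference in organization matters most when disks fail. The Python sketch below enumerates every two-disk failure on a four-disk array and counts how many each layout survives. It is a simplified model under stated assumptions: disks 0-1 and 2-3 form the mirror pairs (RAID 1+0) or the stripe sets (RAID 0+1), a stripe set is treated as lost as soon as any of its members fails, and all function names are hypothetical.

```python
from itertools import combinations


def raid10_survives(failed, num_disks=4):
    """RAID 1+0: data is lost only if both disks of one mirror pair fail."""
    pairs = [(i, i + 1) for i in range(0, num_disks, 2)]
    return all(not (a in failed and b in failed) for a, b in pairs)


def raid01_survives(failed, num_disks=4):
    """RAID 0+1: data survives only if at least one stripe set is intact."""
    half = num_disks // 2
    set_a = set(range(half))
    set_b = set(range(half, num_disks))
    return not (failed & set_a) or not (failed & set_b)


def count_survivable(survives, num_disks=4, failures=2):
    """Count the failure combinations that the given layout survives."""
    return sum(
        survives(set(combo), num_disks)
        for combo in combinations(range(num_disks), failures)
    )


if __name__ == "__main__":
    print("RAID 1+0 survives", count_survivable(raid10_survives), "of 6")
    print("RAID 0+1 survives", count_survivable(raid01_survives), "of 6")
```

In this model RAID 1+0 survives four of the six possible two-disk failures, while RAID 0+1 survives only two, which is one reason RAID 1+0 is usually the preferred arrangement.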

3. RAID 03 (0+3, also known as RAID 53 or RAID 5+3)

This level uses striping similar to RAID 0 for RAID 3's virtual disk blocks. This offers higher performance than RAID 3 but at a higher cost.

4. RAID 50 (5+0)

This configuration combines RAID 5 distributed parity with RAID 0 striping to improve RAID 5 performance without reducing data protection.

Non-standard RAID levels

Non-standard RAID levels vary from standard RAID levels, and they are usually developed by companies or organizations for mainly proprietary use, such as:

1. RAID 7

A non-standard RAID level based on RAID 3 and RAID 4 that adds caching. It includes a real-time embedded OS as a controller, caching via a high-speed bus, and other characteristics of a stand-alone computer.

2. Adaptive RAID

This level enables the RAID controller to decide how to store the parity on disks. It will choose between RAID 3 and RAID 5, depending on which RAID set type will perform better with the kind of data being written to the disks.

3. Linux MD RAID 10

This level is provided by the Linux kernel's MD (multiple device) driver, which supports the creation of nested and non-standard RAID arrays. Linux software RAID can also support standard RAID 0, RAID 1, RAID 4, RAID 5, and RAID 6 configurations.

Implementation of RAID

The distribution of data across multiple drives can be managed either by computer hardware or by software.

1. Hardware-based

Hardware-based RAID requires a dedicated controller installed in the server. Hardware RAID controllers can be configured through the card BIOS or Option ROM before an operating system is booted. After the operating system is booted, proprietary configuration utilities are available from each controller's manufacturer.

Hardware RAID is created using separate hardware. There are two options:

An inexpensive RAID chip, possibly built into the motherboard.
A more expensive option with a complex stand-alone RAID controller. These controllers can be equipped with their own CPU and battery-backed cache memory, and they typically support hot-swapping.

A hardware-based RAID card does all the management of the RAID array(s), providing logical disks to the system with no overhead on the part of the system itself. Additionally, hardware RAID can offer many different types of RAID configurations to the system simultaneously. For example, it can provide a RAID 1 array for the boot and application drive and a RAID 5 array for the large storage array.

Some operating systems have implemented their own generic frameworks for interfacing with any RAID controller and provide tools for monitoring RAID volume status. Hardware RAID has some advantages over software RAID, such as:

It doesn't use the CPU of the host computer.
It allows users to create boot partitions.
It handles errors better since it communicates with the devices directly.
It supports hot-swapping.

2. Software RAID

Software RAID is an included option in all of Steadfast's dedicated servers. This means there is NO cost for software RAID 1, and it is highly recommended if you're using local storage on a system. It is also highly recommended that drives in a RAID array be of the same type and size. Software RAID is one of the cheapest RAID solutions.

Software-based RAID uses some of the system's computing power to manage the RAID configuration. If you're looking to maximize a system's performance, such as with a RAID 5 or RAID 6 configuration, it's best to use a hardware-based RAID card when you're using standard HDDs.

Many modern operating systems provide software RAID implementations. Software RAID can be implemented as:

A layer that abstracts multiple devices, thereby providing a single virtual device.
A layer that sits above any file system and provides parity protection to user data.

If a boot drive fails, the system must be sophisticated enough to boot from the remaining drive or drives.

There are certain limitations on using software RAID to boot the system. Only RAID 1 can contain the boot partition, while system boot is impossible with software RAID 5 and RAID 0.

In most cases, software RAID doesn't implement hot-swapping, so it cannot be used where continuous availability is required.

Benefits of RAID

Benefits of RAID include the following.

An improvement in cost-effectiveness because lower-priced disks are used in large numbers.
The use of multiple hard drives enables RAID to improve on the performance of a single hard drive.
Increased computer speed and reliability after a crash depending on the configuration.
There is increased availability and resiliency with RAID 5. With mirroring, RAID arrays can have two drives containing the same data. It ensures one will continue to work if the other fails.

Drawbacks of RAID

RAID has the following drawbacks or disadvantages:

Nested RAID levels are more expensive to implement than traditional RAID levels because they require many disks.
The cost per gigabyte of storage devices is higher for nested RAID because many of the drives are used for redundancy.
When a drive fails, the probability that another drive in the array will also soon fail rises, which would likely result in data loss. This is because all the drives in a RAID array are usually installed at the same time, so they are all subject to the same amount of wear.
Some RAID levels, such as RAID 1 and 5, can only sustain a single drive failure.
RAID arrays are in a vulnerable state until a failed drive is replaced and the new disk is populated with data.
Rebuilding a failed drive takes much longer now than when RAID was first introduced, because drives have much greater capacity.

However, nested RAID levels address these problems by providing a greater degree of redundancy, significantly decreasing the chances of an array-level failure due to simultaneous disk failures.
