Storage Subsystem Design

This chapter covers the issues and considerations involved in designing storage subsystems. Guidance on actually implementing the ideas set out here can be found in General AIX Storage Management, and Practical Examples.

Introduction

Designing a storage subsystem means evaluating the data access and availability requirements of the business processes that will run on the machine. Systems are generally used for more than one purpose (database applications may compete for resources with word processing and image-based applications, for example), and it is important to configure the environment so that each process can perform within required tolerances. These tolerances usually relate to performance and availability: user response time, and recovery in the event of error or failure, for example. As each process has differing requirements, the task necessarily involves some compromise. The design of the AIX storage management components, covered in Operating System Software Components, allows great flexibility in organization. The logical volume manager allows the physical disks to be partitioned, and the space thus created to be organized in different ways, so that, for example, performance requirements can be met in one logical volume and availability requirements in another.

The first task is therefore to evaluate the storage requirements of the application set that will be executed on the system in terms of:

  1. Performance requirements
  2. Availability requirements
  3. Recovery requirements
  4. Disk utilization

Each of these areas will now be examined in more detail.

Planning Disk Utilization

The design of the volume group and logical volume organization has a major impact upon performance, availability, and recovery. The first consideration in the process is volume group allocation.

Volume Groups

The most common hardware failure in a storage subsystem is disk failure, followed by failure of adapters and power supplies. When failures of this type occur, recovery will be easier if a sensible volume group design has been implemented. Multiple volume groups should generally be implemented for the following reasons:

The number of volume groups created should therefore be decided based upon consideration of these points.

Physical Volumes

The next consideration is the number of physical volumes per volume group, which affects quorum checking and mirroring. A volume group with two disks and quorum checking enabled will fail to vary on if the disk holding two VGDAs fails (see Quorum Checking for a description of this process). This matters particularly in a two-disk mirrored system: failure of the disk holding two VGDAs results in no access, even though a good copy of the data is still available. With more than two disks, 51% or more of the VGDAs must become unavailable before the vary on fails and data becomes inaccessible.
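
The effect of disk count on quorum can be illustrated with a small model. This is a sketch only, not LVM code: the function names are hypothetical, the VGDA allocation rule (two VGDAs on a single disk, two plus one on a two-disk group, one per disk for three or more disks) is an assumption stated here rather than taken from this chapter, and quorum is assumed to mean that a strict majority of VGDAs remains readable.

```python
def vgda_layout(num_disks):
    """Return the assumed number of VGDAs held by each disk in a volume group."""
    if num_disks < 1:
        raise ValueError("a volume group needs at least one disk")
    if num_disks == 1:
        return [2]
    if num_disks == 2:
        return [2, 1]            # assumption: the first disk carries two VGDAs
    return [1] * num_disks       # assumption: one VGDA per disk otherwise


def has_quorum(num_disks, failed_disks):
    """True if a strict majority of VGDAs is still available (assumed rule)."""
    layout = vgda_layout(num_disks)
    total = sum(layout)
    available = sum(count for index, count in enumerate(layout)
                    if index not in failed_disks)
    return 2 * available > total


# Two-disk group: losing the disk with two VGDAs loses quorum,
# losing the other disk does not.
print(has_quorum(2, {0}))
print(has_quorum(2, {1}))
# Three-disk group: any single disk failure leaves quorum intact.
print(has_quorum(3, {0}))
```

Under these assumptions, adding a third disk removes the single point of failure that the two-disk layout has.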

Enough physical disks must also be included to support the mirroring strategy required, both in terms of space for the mirrored copies, and number of disks for the policies. If mirroring is to be done across the maximum number of physical volumes possible, for availability purposes, then it makes sense to have at least enough space to ensure the copies are stored on separate physical volumes. A disk failure in this scenario will not impact access to the data.
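
A rough capacity check for a mirroring plan might look like the following. It is a simplified sketch: it assumes each mirror copy is placed entirely on one physical volume, and the function name, disk names, and sizes are hypothetical values chosen for illustration.

```python
def disks_for_copies(lv_size_mb, copies, free_mb_per_disk):
    """Pick one distinct disk per mirror copy, or return None if not possible.

    free_mb_per_disk maps a disk name to its free space in megabytes.
    Simplifying assumption: a copy is never spread over several disks.
    """
    ranked = sorted(free_mb_per_disk.items(),
                    key=lambda item: item[1], reverse=True)
    candidates = [name for name, free in ranked if free >= lv_size_mb]
    if len(candidates) < copies:
        return None
    return candidates[:copies]


free_space = {"hdisk0": 400, "hdisk1": 900, "hdisk2": 1200}  # hypothetical
print(disks_for_copies(500, 2, free_space))  # two disks can each hold a copy
print(disks_for_copies(500, 3, free_space))  # a third copy has nowhere to go
```

A result of None indicates that either more disks or more free space per disk is needed before the copies can be kept on separate physical volumes.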

Logical Volumes

The deciding factors for the number of logical volumes to create are performance and availability. As many logical volumes should be created as there are different performance and availability requirements. The design of the logical volumes themselves to satisfy these requirements is covered in Planning for Performance, and Planning for Availability. Within this, however, disk space utilization must also be considered. Depending upon the intended purpose of the file systems that will be created within the logical volumes, different fragment sizes may be required to make optimal use of the available disk space. As has been described in Fragmentation, choosing different fragment sizes can significantly improve disk space use. If there is a need for file systems containing many small files, then a logical volume should be created for each file system with different requirements.
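
The space effect of fragment size can be estimated with a simple calculation. The sketch below assumes that each file is rounded up to a whole number of fragments and ignores all other file system overhead; the fragment sizes and file sizes are illustrative values, not figures from this chapter.

```python
def allocated_bytes(file_size, fragment_size):
    """Space consumed when a file is rounded up to whole fragments."""
    if file_size == 0:
        return 0
    fragments = (file_size + fragment_size - 1) // fragment_size
    return fragments * fragment_size


def wasted_bytes(file_sizes, fragment_size):
    """Total unused space left in the last fragment of each file."""
    return sum(allocated_bytes(size, fragment_size) - size
               for size in file_sizes)


small_files = [300, 700, 1500, 2500]        # hypothetical file sizes in bytes
for fragment_size in (512, 1024, 2048, 4096):
    print(fragment_size, wasted_bytes(small_files, fragment_size))
```

For a population of small files like this one, the unused space grows quickly as the fragment size increases, which is why such file systems are candidates for their own logical volume.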

File Systems

The primary considerations when creating file systems are as follows:

Having created volume groups and added the required number of physical volumes, the logical volumes and file systems need to be created. There are two basic considerations: performance and availability. Generally, designing for high performance will impact availability, and vice versa. The next two sections look at design from these perspectives.

Planning for Performance

The performance of a disk subsystem is a combination of factors that includes:

Examples of using LVM and file system configuration commands to maximize performance are detailed in A Design Example for Improved Performance.

Planning for Availability

Designing a disk subsystem for availability also involves a number of considerations, including:

Planning Backup Strategies

The rest of this chapter looks at backup strategies, and the elements involved in planning them.

Backup Overview

As soon as the system has been set up and the operating environment configured as required, a backup strategy should be implemented. From this point on, valuable data will be created and stored within the storage subsystem; this data represents time and effort, and in most cases supports the business. The organization of the system (operating system data and applications) and the user information created (files and directories) are subject to misadventure, however carefully managed: files can be accidentally erased, and hardware or software faults can destroy some information or even the entire system. For these reasons, it is important to be able to restore the system to a point at which work can continue. Backing up the system involves making copies of all the information contained in it on some medium that can be stored separately. The copies can then be used to recreate the system after a failure has been repaired, or to restore information that was accidentally lost. The information in the system is usually highly dynamic, and therefore frequent copies, or updates to the copies (also known as incremental backups), should be taken. The frequency and content of the updates or full backups is unique to each business, and depends upon the rate of change of information and the relative importance of that information. Evaluating this is the process of developing a backup strategy. The following points should be considered:

Backup Planning

There are two main types of backup:

A complete system backup policy should be used when data does not change too often. The backups should be scheduled at a frequency that allows complete recovery of business critical information. For example, if database update runs are done weekly, then a backup after the run each week is sensible.

An incremental policy should be used when information is extremely dynamic. Full system backups are taken at a fixed interval, within which backups of changed information are taken at shorter intervals. The frequency of the full backups depends upon the criticality of the information, as in the complete system backup policy. The frequency of the incremental backups depends upon the volume of data that has changed. Because recovery with incremental backups requires reloading the last full backup and then applying each incremental backup up to the point of failure, the frequency of incremental backups should balance the criticality of the information against the number of incremental backups that will need to be applied.
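
The recovery sequence for an incremental policy can be sketched as follows. This is an illustration only: the Backup record, the day numbering, and the sample schedule are hypothetical, and each incremental backup is assumed to contain the changes made since the previous backup of either kind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int     # day on which the backup was taken
    kind: str    # "full" or "incremental"


def restore_sequence(backups, failure_day):
    """Return the backups to reload, in order, to recover up to failure_day."""
    usable = sorted((b for b in backups if b.day <= failure_day),
                    key=lambda b: b.day)
    last_full = None
    for index, backup in enumerate(usable):
        if backup.kind == "full":
            last_full = index
    if last_full is None:
        raise ValueError("no full backup was taken before the failure")
    # The last full backup first, then every later incremental in order.
    return [usable[last_full]] + [b for b in usable[last_full + 1:]
                                  if b.kind == "incremental"]


# Hypothetical schedule: weekly full backups with incrementals in between.
schedule = [Backup(0, "full"), Backup(1, "incremental"),
            Backup(2, "incremental"), Backup(3, "incremental"),
            Backup(7, "full"), Backup(8, "incremental"),
            Backup(9, "incremental")]

for backup in restore_sequence(schedule, failure_day=9):
    print(backup.day, backup.kind)
```

The length of the returned list is the number of restore passes needed, which is the cost that has to be weighed when the incremental interval is chosen.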

Backup Methods

There are several ways of backing up information:

The following commands can be used to implement the backup policy created:

backup
This command allows backup by file name or by file system.
mksysb
This command creates an installable image of the root volume group.
savevg
This command backs up a user volume group.
cpio
This command copies files into and out of archive storage. The cpio format is common across many platforms, and so can be used for exchange of information between systems.
dd
This command converts and copies information from one device to another. The dd command does not group multiple files in any particular format; it simply streams the data, performing any supported conversion from the source to the target device.
tar
This command manipulates archives of files and directories. The tar command will create an archive on the output device, write files and directories to it, and extract them when required.
rdump
This command backs up files by file system to a device on a remote machine.
pax
This POSIX-conformant command reads and writes tar-compliant and cpio-compliant archives.

These commands are described in detail in Storage Management Files and Commands Summary. Examples of backing up a system are detailed in Managing Backup and Restore.

Backup Media

So far, backup purposes, policies, and commands have been discussed. This leaves the important topic of the actual media that the backup will be stored on. Tape devices are the most common backup or long term archive medium, though there are still several considerations:

These considerations are discussed in relation to tape technology in Tape Storage.

Summary

This chapter has looked in detail at the planning and design requirements for storage subsystems. Design of the subsystem involves considering the following points:

  1. Disk Utilization

    How the physical subsystem will be organized from the perspective of:

  2. Performance

    For those applications that require high performance, designing the storage subsystem for maximum performance. This involves optimizing:

  3. Availability

    For those applications that require high availability, designing the storage subsystem for maximum availability. This involves optimizing:

  4. Backup

    Designing a strategy that allows for as full a recovery as necessary in the event of failure, for the business to continue. This entails the following: