Operating System Software Components

This chapter discusses the various software components within the operating system that are used to enable storage management. A brief overview of the higher level software components available for storage management is also included.

The Operating System

The operating system of a computer has been defined in many ways, but essentially it is a set of software interfaces and functions designed to provide an environment in which the hardware resources of the system can be utilized easily to do work. Within this definition there can be arbitrary levels of complexity, ranging from simple operating systems that allow a single process to execute at a time for a single user, to those that manage multiple processors, large arrays of disks, huge amounts of real memory, and many different devices on behalf of hundreds of users, each running several processes concurrently.

The operating system can be subdivided into a number of components, each of which performs essential tasks. This section will focus on those elements related to storage management.

Page Space

An application consists of four main elements:

  1. Library

    The library segment contains shared instructions that perform common functions that will be used by many processes.

  2. Text

    The text segment contains any static information such as text strings, tables, and instructions to the processor.

  3. Data

    The data segment contains variable information that the application will use and modify during the course of its operation.

  4. Files

    Files contain the information that the application will actually process to produce output that the user requires.

The text and data segments form an entity known as the executable, and are stored in a file system (file systems and the organization of disk space for their support are described in more detail in Logical Volume Manager, and File Systems), usually on disk. When the user wishes to run the application, the operating system must locate the executable in the file system and load it into real memory. Any shared libraries required must also be loaded, unless they are already in use by other applications and are therefore already in memory.

Memory under AIX is managed by the Virtual Memory Manager (VMM), which provides a 52-bit virtual address space (4 petabytes). This space is divided into segments, each of 256MB. Segments can be of several different types:

  1. Working Segments

    Working segments are those pages of memory that contain transient information, such as application data, and shared library code.

  2. Persistent Segments

    Persistent segments are those pages of memory that contain longer term information, such as data files, or application text.

  3. Client Segments

    Client segments are used for NFS files, or data from remote systems.

  4. Log Segments

    Log segments contain meta information used by the journaled file system (see File Systems for more information on logs).

When the application is loaded, the VMM loads the various elements into virtual memory segments (see Figure - Virtual Memory Manager Disk Usage). Application text and file data are loaded into persistent segments, whilst application data and shared libraries are loaded into working segments. If any of the libraries required have already been loaded, then they obviously need not be loaded again.

Virtual memory segments are themselves divided into pages of 4096 bytes each, and the VMM manages the mapping of these pages between real memory, paging space, and disk. The first element of the application loaded is the text segment, and the first few pages of this segment are mapped to real memory locations. This means that when the operating system loader issues instructions to load the text segment into the required virtual memory segment, the VMM translates the addresses of the first few pages so that they actually correspond to real memory page frames, and the corresponding pages are therefore loaded into real memory.
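
To make the segment and page arithmetic concrete, the following small C program is a sketch (not part of AIX) that decomposes a virtual address into its segment, page, and byte-offset components, assuming only the 256MB (2 to the power 28 byte) segments and 4096-byte pages described above; the example address is arbitrary.

    #include <stdio.h>

    #define SEGMENT_SHIFT 28   /* 2^28 bytes = 256MB per segment   */
    #define PAGE_SHIFT    12   /* 2^12 bytes = 4096 bytes per page */

    int main(void)
    {
        unsigned long long vaddr = 0x3A1B2C3D4ULL;   /* arbitrary example address */

        unsigned long long segment = vaddr >> SEGMENT_SHIFT;
        unsigned long long page    = (vaddr >> PAGE_SHIFT) &
                                     ((1ULL << (SEGMENT_SHIFT - PAGE_SHIFT)) - 1);
        unsigned long long offset  = vaddr & ((1ULL << PAGE_SHIFT) - 1);

        printf("segment %llu, page %llu, byte offset %llu\n", segment, page, offset);
        return 0;
    }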

As was described in General Concepts, real memory is also logically divided into page frames, where each frame is a fixed number of bytes of data (again 4096 bytes). When the application is loaded into memory, the virtual memory pages are placed in real memory page frames, which is where they must be for the processor to access information from them. The other pages in real memory will contain other applications, including those parts of the operating system currently in use, also mapped by the VMM from the virtual memory locations where they are actually addressed. The VMM maintains a free list of currently unused real memory page frames; frames from this list are used to hold incoming virtual memory pages.


Figure: Virtual Memory Manager Disk Usage

An entry in the process table is created containing essential information regarding the application, such as the address of the current instruction, locks held, and other application specific information. An entry is also put on a process dispatch queue which is used by the operating system scheduler to select the application which will run next.

The data segment and shared libraries (if necessary) are also loaded in the same manner.

The application eligible to run next is dispatched, which means that the address of the next instruction in the process table is loaded into the processor, and the application is now running. If at some point the application jumps to an instruction contained in a page not currently in real memory (or requests a piece of data on such a page, or calls a library function whose instructions are on a page not currently in real memory), a page fault occurs. In this case, the operating system must copy the required page into real memory for use.

Sooner or later, the application will need to access information from a data file. When the request to open the file is made, the physical file on disk is loaded into a persistent virtual memory segment in the same manner as has been described for the executable.

Each application also has a fixed time slice during which it can use the processor, and once this expires the process is put to the back of the dispatch queue and the next process scheduled. The next process may be part of the operating system loading a new application that a user has started, in which case the new executable is loaded into memory pages.

Relatively quickly, the real memory page frames will fill, and when the free list falls below a certain threshold, the VMM must act to replenish it. To do this, it uses a page replacement algorithm to determine which of the in-use pages are least likely to be required in the near future.

Pages selected in this way (enough to bring the free list back above the threshold) are paged out, and the page frames thus freed are added to the free list. The selection is designed to favor keeping in memory those types of segment most likely to be required in the near future, such as program text, working segments, and frequently accessed information.

The actions taken for paging out (or swapping) vary depending upon the type of segment from which the page comes, as well as its current state. If the page is from a working segment, then it is copied to an area of disk reserved for this purpose known as page space. The VMM changes the mapping to reflect that the page is no longer in real memory, so that if a jump to an address (or request for data at an address) on this page occurs, a page fault will result causing the page to be reloaded. This is the case for any working segment page out. The page is then added to the free list.

If the page is from a persistent segment, then the VMM checks whether the page has been altered since it was first loaded; if not, the page frame is simply added to the free list. If a change has been made, then the page is first written out to its original location on disk (usually in a file system). Again, the VMM changes the virtual memory mapping to point to the page's location back on disk, and then adds the freed real memory page frame to the free list. Utilization of disk in this fashion is known as single level storage.

Thus page space is used as an extension to real memory for situations where working segments require more real memory than is available, while persistent segments use their original locations on disk as overflow. In this way, the VMM is able to make effective use of more memory than is physically available on the system, as well as provide a huge address space.

Certain mechanisms for file access, such as the mmap() subroutine or access to files through shared memory segments, cause the file to be loaded into a working segment in virtual memory. In these cases, the file information is treated in the same way as application data or libraries, and paging activity may result in parts of it being paged out to page space. Applications using these mechanisms will therefore require much more page space, particularly if large files are to be accessed.
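
As an illustration of the first of these mechanisms, the following minimal C sketch maps an existing file into memory with mmap(); the file name is an arbitrary example, and error handling is kept to the bare minimum.

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat st;
        int fd = open("datafile", O_RDWR);      /* example file name */

        if (fd < 0 || fstat(fd, &st) < 0) {
            perror("open/fstat");
            return 1;
        }

        /* Map the whole file; its pages are now handled like application data. */
        char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        p[0] = 'X';                      /* first touch of a page causes a page fault */
        msync(p, st.st_size, MS_SYNC);   /* push modified pages back out */
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }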

The operating system monitors the number of free pages in paging space, and when this falls below a certain level, all applications currently running on the system are informed of the situation via a SIGDANGER signal. If the number of free pages should then fall further, below a second threshold, those processes using the most paging space will be sent the SIGKILL signal. This will continue until the number of free pages has risen above the danger level. Well behaved applications will trap the SIGDANGER signal, and upon receipt, free up as much page space as they can by releasing resources.
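
A minimal sketch of such a well behaved application is shown below; SIGDANGER is AIX-specific (hence the #ifdef), and release_resources() is a hypothetical stand-in for whatever buffers or caches the real application could free.

    #include <signal.h>
    #include <unistd.h>

    static void release_resources(void)
    {
        /* Hypothetical: free large in-memory buffers, shrink caches, and so on. */
    }

    static void danger_handler(int sig)
    {
        (void)sig;
        release_resources();              /* give paging space back to the system */
    }

    int main(void)
    {
        struct sigaction sa;

        sa.sa_handler = danger_handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;

    #ifdef SIGDANGER                      /* defined on AIX only */
        sigaction(SIGDANGER, &sa, NULL);
    #endif

        pause();                          /* the application's real work would go here */
        return 0;
    }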

As can be seen, the allocation of paging space is critical to the operation of the computer system. Furthermore, the design of the storage subsystems generally will have a big impact due to the concept of single level storage. These considerations are taken into account in General AIX Storage Management and Storage Subsystem Design, which cover design and management.

Device Drivers

Whenever an operating system component or an application needs to interact with a physical device, such as a disk or tape, it does so through services provided by another element of the operating system known as a device driver. Device drivers provide a common mechanism for accessing the devices that they support.

Device drivers are treated as though they are ordinary files; thus when a process needs to communicate with a device, it uses the open subroutine to initialize the interface, and can then use the read and write calls to access data. Control of the device is accomplished using the ioctl calls, and when the task requiring the device is complete, the interface is ended with the close subroutine.
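
The following C sketch illustrates this file-like interface by reading the first 512-byte block from a raw disk device; /dev/rhdisk0 is just an example device name, and the device-specific ioctl() calls are omitted because their request codes vary from device to device.

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char block[512];
        int  fd = open("/dev/rhdisk0", O_RDONLY);   /* example raw disk device */

        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* read() and write() move data; ioctl() would be used for device control. */
        if (read(fd, block, sizeof(block)) != (ssize_t)sizeof(block))
            perror("read");

        close(fd);
        return 0;
    }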

There are two main types of device driver: those designed for character-oriented devices, such as terminals and printers, and those designed for block devices, such as disks or tapes. Storage management devices are usually block devices, as block access is more efficient for the transfer of larger quantities of information.

Device drivers normally consist of two parts:

  1. The top half

    This half is responsible for interacting with requestors, blocking and buffering data, queuing up requests for service, and handling error recovery and logging.

  2. The bottom half

    This half is responsible for the actual I/O to and from the device, and can perform some preprocessing on requests, such as sorting reads from disks for maximum efficiency.

Device drivers provide the primary interface to devices. The logical volume manager, which is discussed in the next section, also provides a device driver interface to higher level software, and makes use of the services provided by device drivers to communicate with the physical devices themselves. A fuller understanding of device drivers is really only necessary for those readers who intend to develop applications that will interact directly with storage devices, such as tape or optical library managers. More information can be found in the "Device Driver Concepts Overview" in the InfoExplorer* online hypertext documentation.

Logical Volume Manager

The logical volume concept defines a higher level interface, transparent to applications and users, that allows the division, allocation, and management of fixed disk storage space. This concept is implemented as a set of operating system commands, subroutines, device drivers, and tools that are collectively known as the Logical Volume Manager (LVM). This section is relevant to both AIX Version 3 and AIX Version 4; any differences will be highlighted.

Logical Volume Manager Terminology

There are a number of specialized terms used to describe the various entities that comprise Logical Volume Management.

Physical Volumes

The physical disk drive itself forms the basis of Logical Volume Management. Before a disk can be used by the system, it must be defined. Each disk is assigned certain configuration and identification information that together define the disk as a Physical Volume (PV). This information is physically recorded on the disk and includes a Physical Volume Identifier (PVid) that uniquely identifies it. The disk is also assigned a physical volume name, typically hdiskx where x is a system unique number. This physical volume name is also used for the low level device driver interface to the disk (for example /dev/hdisk0).

Volume Groups

A collection of between 1 and 32 physical volumes is known as a Volume Group (VG). When physical volumes are created, they must be added to a volume group in order to be used. A physical volume can only be in one volume group on a system, though there can be multiple volume groups. Volume group information includes a unique Volume Group Identifier (VGid), and the PVids of all physical volumes in the volume group, as well as various status information. Each disk in the volume group has an area on disk known as the VGDA or Volume Group Descriptor Area, where this information is stored. The VGDA also contains information describing all of the logical volumes (discussed later in this section) that exist in the volume group.

If more than 32 physical volumes are attached to a system, then more than one volume group will certainly be required. Even with fewer disks, it is usually sensible to design the system such that different types of information are stored in different volume groups. For example, keeping operating system information in one volume group and user information in a separate one can assist in management, and in particular in recovery; should a disk fault occur in a physical volume from one volume group, then only information from that volume group will be affected.

Under AIX Version 3 and AIX Version 4, up to 255 volume groups can be defined.

Physical Partitions

When a physical volume is added into a volume group, the space on the physical volume is divided up into equal chunks known as Physical Partitions (PPs). The physical partition size is set when a volume group is created, and all physical volumes that are added to the volume group inherit the value. The physical partition size can range from 1 to 256MB, and must be a power of 2, the AIX default being 4MB. Up to 1016 physical partitions can be defined per physical volume under AIX Version 3 and AIX Version 4.

The physical partition is the smallest unit of disk space allocation in the logical volume paradigm. Smaller partition sizes increase allocation flexibility at the cost of increased management overhead.
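
The interaction between partition size and the 1016 partition limit can be illustrated with a small calculation; the sketch below (illustrative only, since the LVM performs this sizing itself) finds the smallest power-of-two physical partition size that keeps a disk of an arbitrarily chosen size within the limit.

    #include <stdio.h>

    int main(void)
    {
        unsigned long disk_mb = 9100;    /* example: a 9.1GB physical volume */
        unsigned long pp_mb;

        for (pp_mb = 1; pp_mb <= 256; pp_mb *= 2) {
            unsigned long pps = (disk_mb + pp_mb - 1) / pp_mb;   /* partitions needed */
            if (pps <= 1016) {
                printf("%luMB partitions -> %lu physical partitions\n", pp_mb, pps);
                break;       /* 16MB partitions here; the 4MB default would need 2275 */
            }
        }
        return 0;
    }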

Logical Partitions

A Logical Partition (LP) is effectively a pointer to between one and three physical partitions; this number is specified when a logical volume (see the next section) is created. Information written to a logical partition will be physically written to the physical partitions it points to. Thus the number of physical partitions mapped to a logical partition defines the number of copies of that partition, or the level of mirroring.

Up to 32,512 logical partitions can be defined per logical volume under AIX Version 3 and AIX Version 4.

Logical Volumes

Once a volume group has been created, and physical volumes added to it, logical volumes can be created. A Logical Volume (LV) defines a number of logical partitions, and therefore an area of disk that can be used to store information. With AIX Version 3, the maximum size of a logical volume was 2GB; with AIX Version 4, this limit has been raised to 256GB. The maximum number of user-definable logical volumes in a volume group is 256.

Logical volumes are used to store such things as file systems, log volumes, page space, boot data, and dump storage. The section on logical partitions explained that a logical partition can be mapped to up to three physical partitions, which means that up to two copies of the information contained in a logical volume can be maintained; this is called mirroring, and is explained in more detail in Logical Volume Manager Policies.

A logical volume can have its size changed by adding or removing logical partitions, the number of copies can be increased or reduced, and even the physical location of the logical volume on disk can be changed.
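
The size arithmetic follows directly from these definitions; the short sketch below uses arbitrary example values to show that the usable size of a logical volume is the number of logical partitions multiplied by the physical partition size, while the physical disk space consumed is that figure multiplied by the number of copies.

    #include <stdio.h>

    int main(void)
    {
        unsigned long pp_mb  = 4;     /* physical partition size (the AIX default) */
        unsigned long lps    = 300;   /* logical partitions in the logical volume  */
        unsigned long copies = 2;     /* each LP mapped to two physical partitions */

        printf("usable logical volume size : %lu MB\n", lps * pp_mb);            /* 1200MB */
        printf("physical disk space used   : %lu MB\n", lps * pp_mb * copies);   /* 2400MB */
        return 0;
    }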

Further information on creating and managing volume groups, physical volumes, logical volumes, physical partitions, and logical partitions can be found in General AIX Storage Management. The diagram in Figure - Components of the Logical Volume Manager, shows the relationship between these components.


Figure: Components of the Logical Volume Manager

The logical volume manager provides the tools to create and manage these entities. Structuring access to the physical disks in this manner provides a number of benefits in terms of flexibility, performance, and availability.

Logical Volume Manager Operation

General Operation

As has already been discussed, the logical volume manager consists of a set of operating system commands, library subroutines, and other tools that allow logical volumes to be established and controlled. The operating system commands are discussed in detail in General AIX Storage Management, and Storage Management Files and Commands Summary. These commands use the library subroutines to perform management and control tasks for the logical volumes, physical volumes, and volume groups in a system.

The interface to the logical volumes is called the Logical Volume Device Driver (LVDD); this is a pseudo device driver that manages and processes all I/O to logical volumes. The logical volume device driver is designed and utilized in the same way as any other device driver in the system, consisting of two halves. In this case, the lower half is responsible for mapping the logical addresses to actual physical disk addresses, for handling any mirroring, and for maintaining Mirror Write Consistency (MWC). Mirror Write Consistency uses a cache in the device driver where blocks that are mirrored are stored until all copies have been updated; this ensures data consistency between mirrors.

The lower half also manages bad block detection and relocation if necessary. If the physical disk is capable of this function, then the logical volume device driver will make use of the hardware support; otherwise it will be done in software. Both mirror write consistency and bad block relocation can be disabled on a logical volume basis.

The list of data blocks to be written (or read) is finally passed by the logical volume device driver to the physical disk device drivers, which interact directly with the disks. In order for a disk device driver to work with the logical volume manager, it must adhere to a number of criteria, the most significant of which is a fixed disk block size of 512 bytes.

The relationship between the various software layers involved in disk access with the logical volume manager is shown in Figure - Relationship Between the LVM and other Components.


Figure: Relationship Between the LVM and other Components

Quorum Checking

In order for a volume group to be accessible to the system, it must be varied on. The process of varying on a volume group is discussed in Varying On and Varying Off Volume Groups. During this process, the logical volume manager reads management information from the physical volumes in the volume group. This information includes the volume group descriptor area already mentioned in Logical Volume Manager Terminology, and another on-disk information repository known as the Volume Group Status Area (VGSA), which is also stored on all physical volumes in the volume group. The VGSA contains information regarding the state of physical partitions and volumes in the volume group, such as whether physical partitions are stale (used for mirroring, but not reflecting the latest information), and whether physical volumes are accessible or not. The VGDA is managed by the subroutine library, and the VGSA is maintained by the LVDD.

If the vary on process cannot access a physical volume in the volume group, it marks that volume as missing in the VGDA. For the vary on to succeed, a quorum of physical volumes must be available. A quorum is defined as a majority of VGDAs and VGSAs (more than half of the total number available). The only situation where this is slightly different is where there are one or two physical volumes in a volume group; in this case two VGDAs and VGSAs are written to one disk, and one set (or none, if there is only one disk) to the other. If the disk with two sets is inaccessible, then a quorum will not be achieved and the vary on will fail. For techniques to recover from quorum failure, see General Volume Group Recovery.
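
The quorum rule itself is simple arithmetic, as the following sketch shows for the two-disk case just described (two VGDA/VGSA copies on one disk, one on the other); this is an illustration of the rule only, not the LVM's actual implementation.

    #include <stdio.h>

    static int have_quorum(int total_vgdas, int accessible_vgdas)
    {
        return accessible_vgdas * 2 > total_vgdas;   /* strictly more than half */
    }

    int main(void)
    {
        /* Two-disk volume group: 2 VGDAs on disk A, 1 VGDA on disk B (3 in total). */
        printf("disk B lost: %s\n", have_quorum(3, 2) ? "quorum held, vary on succeeds"
                                                      : "no quorum, vary on fails");
        printf("disk A lost: %s\n", have_quorum(3, 1) ? "quorum held, vary on succeeds"
                                                      : "no quorum, vary on fails");
        return 0;
    }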

Logical Volume Manager Policies

When logical volumes are created, there are a number of attributes that can be defined for them that govern their subsequent operation in terms of performance and availability. These attributes are really policies that the logical volume manager enforces for the logical volume and include the following:

  1. Bad-Block Relocation Policy

    As was mentioned in General Operation, the logical volume manager will perform bad-block relocation if required. This is the process of redirecting read/write requests from a disk block that has become damaged to one that is functional, transparently to an application.

  2. Intra-Physical Volume Allocation Policy

    The logical volume manager defines five concentric areas on a disk where physical partitions can be located. These regions are shown in Figure - Physical Disk Partition Location, and are combined into the following three policy choices for data location:

    1. Edge and Inner Edge

      These regions generally have the longest seek times, resulting in the slowest average access times. Logical volumes containing relatively infrequently accessed data are best located here.

    2. Middle and Inner Middle

      These regions provide lower average seek times, and consequently lower average access times. Reasonably frequently accessed data should be positioned here.

    3. Center

      This region provides the lowest average seek times, and hence the best response times. Information which is accessed regularly, and needs high performance should be situated here.

    The different average seek times are based upon the supposition that there is a uniform distribution of disk I/O, meaning the disk head will spend more time crossing the center section of the disk than any of the other regions.


    Figure: Physical Disk Partition Location

    When a logical volume is created, the preferred location policy for the logical volume can be defined. The logical volume manager will then do its utmost to locate the volume as close to the required position as possible.

  3. Mirroring

    The logical volume manager allows each logical partition in a logical volume to be mapped to from one to three physical partitions. This means that up to two copies of a logical volume can be transparently maintained for performance and availability purposes. The scheduling policy explained below determines how information is actually written. Should a disk with one of the copies of the logical volume fail, or should some of the physical partitions in the copy become damaged, then another copy can be transparently used while repairs are effected. Furthermore, the copy that has the required partitions closest to a read/write head will be used for reading, improving performance. The benefits here are somewhat dependent upon the inter-physical volume allocation policy which is explained next.

  4. Inter-Physical Volume Allocation Policy

    When the logical volume manager allocates partitions for a logical volume, the partitions can be spread across multiple disks. The inter-physical volume allocation policy governs how this will actually be implemented in terms of numbers of physical volumes. There are two options:

    1. Minimum

      The minimum option indicates that if mirroring is being used, then the minimum number of physical volumes should be used per copy, and that each copy should use separate physical volumes. If mirroring is not being used, then just the minimum number of physical volumes necessary to hold all of the required physical partitions should be used.

    2. Maximum

      The maximum option, predictably enough, attempts to spread the required physical partitions over as many physical volumes as possible, thereby improving performance. If mirroring is not used, however, this approach is highly sensitive to physical volume failure.

  5. Scheduling Policy

    When mirroring is being used, there are two ways in which the logical volume manager can schedule I/O for the physical volumes (a short sketch contrasting the two follows this list):

    1. Sequential-write copy

      When this option is selected for a logical volume, write requests are performed to each copy successively, in the order primary, secondary, and tertiary. A write to a copy must complete before the next copy can be updated, thus ensuring maximum availability in the event of failure.

      Read requests will be initially directed to the primary copy, and if this fails, to the secondary, and then tertiary if necessary (and defined). While the data is being read from the next copy, the failing copy (or copies) is repaired by turning the read into a write with bad-block relocation switched on.

    2. Parallel-write copy

      In this case, write requests are scheduled for each of the copies simultaneously. The write request returns when the copy that takes the longest to update completes. This method provides the best performance.

      Read requests are scheduled to the copy that can be most rapidly accessed, thereby minimizing response time. If the read fails, repairs are accomplished using the same mechanism as for sequential-write copy.
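
The performance difference between the two scheduling policies can be sketched with some made-up per-copy write times: under the sequential policy the elapsed time is the sum of the copy times, while under the parallel policy it is the time of the slowest copy. This is purely illustrative; the real scheduling is performed inside the logical volume device driver.

    #include <stdio.h>

    int main(void)
    {
        int latency_ms[3] = { 8, 12, 10 };   /* hypothetical write times per copy */
        int sum = 0, max = 0;
        int c;

        for (c = 0; c < 3; c++) {
            sum += latency_ms[c];
            if (latency_ms[c] > max)
                max = latency_ms[c];
        }

        printf("sequential-write copy completes in %d ms\n", sum);   /* 30 ms */
        printf("parallel-write copy completes in %d ms\n", max);     /* 12 ms */
        return 0;
    }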

There is a great deal of additional information on all aspects of the logical volume manager in the InfoExplorer online hypertext documentation, if required. Further information on planning and managing the elements of the logical volume manager can be found in Storage Subsystem Design, and General AIX Storage Management.

File Systems

One further level of abstraction is provided at the operating system level, and this is the file system. A file system is essentially a hierarchical structure of directories, each directory containing files, or further directories (known as subdirectories). The diagram in Figure - Standard AIX Version 4.1 JFS Organization shows the standard AIX journaled file system structure as of AIX Version 4; the differences between this and AIX Version 3 structure are organizational. The main purpose of a file system is to provide for improved management of data by allowing different types of information to be organized and maintained separately. As will be shown later in this section however, file systems also provide many more facilities.


Figure: Standard AIX Version 4.1 JFS Organization

There are many different types of file system in existence, including the native journaled file system (JFS) and the Network File System (NFS), both of which are discussed later in this section.

However, there must be at least one base (or root) file system on the local machine, within which other file systems can be accessed.

Journaled File System

Journaled file systems are implemented through a set of operating system commands that allow creation, management, and deletion, and a set of subroutines that allow lower level access to files in the file system, such as open, read, write, and close. A JFS is created inside a logical volume and is organized as shown in Figure - JFS Physical Organization.


Figure: JFS Physical Organization

As can be seen, the JFS divides the logical volume into a number of fixed-size units, or logical blocks. The logical block size is the block size used for I/O at the file system interface; this means that the file system passes data to be written, or receives data that has been read, in blocks of 4096 bytes to/from the LVM. The block size was selected to be 4KB to match the memory page size for maximum transfer efficiency, and to minimize free space fragmentation on the disk. The logical blocks in the file system are organized as follows:

Logical Block 0

The first logical block in the file system is reserved and available for a bootstrap program or any other required information; this block is unused by the file system.

Superblock

Logical blocks 1 and 31 are reserved for the superblock (logical block 31 being a backup copy). The superblock contains information such as the overall size of the file system in 512 byte blocks, the file system name, the file system log device address (logs will be covered later in this section), the version number, and the file system state.

Allocation Groups

The rest of the logical blocks in the file system are divided into a number of allocation groups. An allocation group consists of data blocks and i-nodes to reference those data blocks when they are allocated to directories or files. The purpose behind this extra level of abstraction is as follows:

  1. Improve locality of reference

    Files created within a directory will be maintained in an allocation group with that directory. As allocation groups consist of contiguous logical blocks, this should assist in maintaining locality of reference for the disk head.

  2. Ease file system extension

    Extending a file system is easier, as a new allocation group of i-nodes and data blocks can be added, simply maintaining the relationship between the number of i-nodes and the file system size. Without allocation groups, the file system would either have to be reorganized to increase the number of i-nodes, or the extension could only increase the number of data blocks available, thereby conceivably limiting the number of files and directories in the file system.

I-nodes are explained in the next section. For a pictorial representation of this organization, please refer to Figure - JFS Physical Organization.

Disk i-nodes

When the file system is created, files and directories within the file system are located via i-nodes. An i-node is an on-disk structure that contains information regarding the file and its physical location on disk. Under AIX Version 3, an i-node is created for every 4KB of file system space, so for a 32MB file system, some 8000 i-nodes would be created, and these i-nodes would be divided between the allocation groups. This then defines the maximum number of files and directories that can be created in the file system. The structure of an i-node is depicted in Figure - Anatomy of an I-node. The first part contains information such as the owner and permissions for the directory or file; the second part contains an array of 8 pointers to the actual disk addresses of the 4KB logical blocks that make up the file or directory.


Figure: Anatomy of an I-node

For files that can fit within the array storage area, such as most links, the file is actually stored in the i-node itself, thus saving disk space.

For a file of size up to 32KB, each i-node pointer will directly reference a logical block on the disk. For example, if the file is 27KB in size, then the first seven pointers will be required, with the seventh referencing a 4KB logical block containing the last 3KB of the file.

For files larger than this, up to 4MB, the i-node points to a logical block that contains 1024 pointers to the logical blocks that will contain the file's data; this gives a file size of up to 1024 x 4096 bytes, or 4MB.

For files greater in size than this, the i-node points to a logical block that contains 512 pointers to logical blocks that each contain 1024 pointers to the logical blocks that will actually contain the file's data; this gives a maximum file size of 512 x 1024 x 4096 bytes, or 2GB.
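
The limits quoted above follow directly from the pointer arithmetic, as this small sketch confirms for 4096-byte logical blocks: 8 direct pointers, one indirect block of 1024 pointers, and one double indirect block of 512 pointers to further 1024-pointer blocks.

    #include <stdio.h>

    int main(void)
    {
        unsigned long long block = 4096;

        printf("direct (8 pointers)        : %llu KB\n", 8 * block / 1024);              /* 32KB */
        printf("single indirect (1024)     : %llu MB\n", 1024 * block / (1024 * 1024));  /* 4MB  */
        printf("double indirect (512x1024) : %llu GB\n",
               512ULL * 1024 * block / (1024ULL * 1024 * 1024));                         /* 2GB  */
        return 0;
    }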

The mapping of file names to i-node numbers is stored within directory files.

The maximum size of a file system under AIX Version 3 is limited to 2GB. This is due to limitations in the size of the internal pointer used by some system calls to navigate the file system; the pointer is defined as a signed 32-bit integer, which means there are 31 bits available for addressing. This gives a maximum range of 2 to the power 31 bytes, or 2GB.

AIX Version 4 Enhancements to the JFS

  1. Fragments

    While generally efficient from the point of view of loading into memory and preventing physical disk fragmentation, having a fixed logical block size can have drawbacks. If the majority of files stored in the file system are small (less than one logical block in size), then there will be a great deal of wasted disk space in the accumulation of those portions of the logical blocks that remain unused by the smaller files. If all files are less than half of a logical block in size, for example, then more than half of the total file system space will be wasted, even though the file system is full.

    In order to address this type of situation, AIX Version 4 introduces the concept of the fragment. A fragment is the smallest unit of file system disk space allocation, and can be 512, 1024, 2048, or 4096 bytes in size. The fragment size is defined at JFS creation time and is stored in the superblock. A worked example of the potential space savings appears after this list.

  2. Number of Bytes Per i-node

    Under AIX Version 3, the number of i-nodes created for a file system was fixed, as discussed in the section on i-nodes. With AIX Version 4, it is now possible to vary the number of i-nodes created within a file system, and therefore the amount of space required by the i-node structures can be tailored to maximize utilization of the file system. If only a few very large files are going to be created in a file system, then it is a waste of space to generate 8000 i-node structures, and therefore the value of the Number of Bytes Per i-node, or NBPI, should be increased. For example, if the NBPI is set to 16KB, then in a file system of size 32MB, 2000 i-nodes would be created rather than 8000 as in AIX Version 3.

  3. Compression

    Another new feature in AIX Version 4 is JFS compression. This facility provides for compression of regular files (as opposed to directories or links). The compression is implemented on a logical block basis, which means that when a logical block of file data is to be written, an entire logical block is allocated for it; the logical block is then compressed, and only the number of fragments required as a result of the compression are actually allocated. Thus, in contrast to a fragmented file system, which only allocates fragments for the final logical blocks of files less than 32KB in size, compressed file systems allocate fragments for every logical block in every file.

    Compression is done block by block in order to fulfill the requirements for efficient random I/O. The algorithm used by default is LZ1, although user-defined compression algorithms are also supported.

  4. File System Size

    As discussed previously, the maximum size of a file system under AIX Version 3 was 2GB. With AIX Version 4, this maximum has been increased to 256GB. This increase has been achieved by changing the file system pointers and data types to 64 bits; the limitations restricting the maximum size are now JFS data structure and algorithm related.
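
As a worked example of the fragment support described in item 1 above, the sketch below compares the space consumed by 1000 small files under a plain 4096-byte block allocation and under a 512-byte fragment size; the file count and size are arbitrary.

    #include <stdio.h>

    int main(void)
    {
        unsigned long files      = 1000;
        unsigned long file_bytes = 512;    /* each file fits in one fragment */
        unsigned long frag       = 512;

        unsigned long with_blocks    = files * 4096;   /* one whole 4KB logical block per file */
        unsigned long with_fragments = files * ((file_bytes + frag - 1) / frag) * frag;

        printf("4096-byte blocks   : %lu KB\n", with_blocks / 1024);      /* 4000 KB */
        printf("512-byte fragments : %lu KB\n", with_fragments / 1024);   /* 500 KB  */
        return 0;
    }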

These enhancements are described in more detail in AIX Version 4 Storage Management Enhancements.

Network File System

NFS allows files and directories located on other systems to be incorporated into a local file system and accessed as though they were a part of that file system. NFS provides its services on a client-server basis: server systems can make selected files and directories available for access by client systems.

NFS installation, configuration and management are covered in detail in the InfoExplorer online hypertext documentation.

NFS operation is stateless, which means that the server does not maintain any transaction information on behalf of clients. Each file operation is atomic, which means that once complete, no information on the operation is retained. Thus if a connection should fail, it is up to the client to maintain any synchronization or transaction logging to ensure consistency.

Other File Systems

As has been explained in this section, file systems provide an interface that simplifies management of, and access to, information. The native file system under AIX is the JFS, but other types are also available, such as NFS for remote files.

Accessing File Systems

Once a file system of whatever type has been created, it must be mounted in order to access the information within it. The process of mounting creates the connection between an existing and accessible local file system mount point and the root directory of the directory structure to be accessed. A mount point can be either a directory or a file in the local file system. If a new local file system, or remote directory structure (using NFS for example) is to be accessed, then it must be mounted over a directory. If only a single file is to be accessed then it must be mounted over a local file. As shown in Figure - Standard AIX Version 4.1 JFS Organization, the AIX operating system starts with a root file system into which the /usr, /var, /tmp, and /home file systems are mounted at boot time. Any other file systems required can then be mounted wherever required (assuming relevant permissions).

Higher Level Tools

So far in this chapter the basic operating system support providing access to information stored on physical media (such as disk) has been discussed. This next section will look briefly at some higher level functions provided either by the operating system, or by applications designed to enhance storage management capabilities.

Backup/Restore

Backup facilities, as provided by the operating system, enable all information (including both user and operating system data) to be copied (generally to removable media such as tape), so that in the event of a major problem the information can be easily restored.

There are three main areas to consider when designing a backup strategy:

  1. Which information should be backed up

    There is usually a large amount of information stored in a computer system. Copying this information can take time, and there will usually be information that is not important enough to warrant concern (such as temporary files, or data that has already been archived).

  2. What technology should be used for the backups

    This will depend on the quantity of information, length of time available for backup, length of time information needs to be stored on backup media, and the cost of the technology used. As has been discussed in previous sections, optical storage is generally faster and has longer potential shelf life, whilst tapes are cheaper and generally larger capacity.

  3. When and how often should backups occur

    There are several strategies that can be used, depending on the nature of the information produced by the business. For example, sites where there is a great deal of static reference information with little day to day change, and perhaps monthly updates to the static information, could benefit from an incremental policy. This would mean taking full backups on a monthly basis, and only backing up the changed information daily. Organizations processing large amounts of information on a daily basis may choose to back up all of their data daily.

    The strategy chosen should reflect the business information cycles, but will also be strongly tied to the criticality of the information. The decision is simply: how long can the business survive without key information? If the answer is one day, then backups of the information must be scheduled at least once a day.

Designing a backup strategy is an essential task, as, however well maintained a system is, the unexpected can, by definition, always happen. Backup and recovery planning and techniques are discussed in more detail in Planning Backup Strategies.

Hierarchical Storage Management

Thus far, storage subsystems have been discussed from the point of view of operating system level access. That is to say, covering the various storage devices available, their pros and cons, and the way in which they can be made available to higher level user applications. There are also many intermediate and higher level applications that themselves provide services, both for system administrators, and for higher level management of storage on behalf of users. These types of application fall into the category of Hierarchical Storage Management.

The premise behind hierarchical storage management is to categorize storage devices in terms of their basic properties and provide automatic mechanisms to utilize them most efficiently within this context, usually in a networked environment. For example, as has been discussed in How to Make the Decision, the basic property classifications shown in Figure - Summary of Device Attributes apply. This implies that frequently accessed information, or information that requires high performance access (such as databases), should be located on fast disk devices. Older, less frequently accessed information, or information with less restrictive performance requirements, can be stored on optical media. Backup, archive, or long term information that is rarely accessed can be stored on tape. The process of classifying information in this way is, however, really a dynamic one, and is therefore best done on a reasonably continuous basis. This is where hierarchical storage managers can be useful.

ADSTAR* Distributed Storage Manager (ADSM), for example, can manage disk, optical, and tape storage in just this way. ADSM provides backup/restore and archive/retrieve services to client systems in a distributed environment. Storage pools that ADSM uses to fulfill client requests are maintained on the server machine. These pools can be defined to form a hierarchy, and information can be automatically migrated between pools. For example, the first pool may be composed of fast disk devices to support rapid satisfaction of client requests. However, if this pool becomes full, then information can be automatically migrated to another storage pool. Generally, the information so migrated will be the less frequently accessed information, and so the lower level pool will be composed of lower cost, slower, higher capacity devices, such as optical. The hierarchy so defined is arbitrary, and can contain pools of tape devices lower in the hierarchy, to which backup information can be directly (and automatically) written on behalf of clients, or to which even less regularly accessed archive data can be migrated.

ADSM also has many other capabilities, and for a fuller description of these, please refer to Higher Level Storage Management Products.

Some tools manage disk space at the client machine as well. FSF/6000 is an example of such an application. In this case, the client's fast disk space is maintained merely as a cache (or window onto the real storage space). When information is created or changed in any way, it is stored in the cache at the local machine, but a copy is made and maintained at the server. When the cache approaches a predefined high water mark, or percentage utilization, data is automatically migrated to the server storage space, and a pointer to it is left in the cache. Subsequent requests for this information will result in the information being transparently moved back (a copy is still maintained at the server). In this way, the small client storage space can be made to seem much larger than it really is; in addition, the server storage pool can be managed by a tool such as ADSM (at the server) to provide automatic backup, archive, and migration.

Using such tools additionally enhances storage management by providing a centralized mechanism for information management, which is useful not only for managing resource utilization, but also for security, general availability, and ease of use.

General information on some of the applications available in this area is available in Higher Level Storage Management Products.

Media Management

Media management refers to the control of the attached storage devices. This chapter has already looked at the mechanisms supplied by the operating system to enable this, including device drivers, the LVM, and file systems. The previous two sections discussed backup and restore, and hierarchical storage managers, which provide higher level functions for storage. The capabilities of the devices are thus made available through the operating system components, and through higher level tools. Generally, the higher level applications make use of the operating system provided components to access the required devices. There are some device types that cannot currently be managed by the operating system alone, including tape and optical libraries. In these cases an alternative mechanism must be found to control the devices, if required; ADSM, for example, provides the necessary support to control a range of libraries (both tape and optical). The complete functionality of a higher level tool may not always be required, merely the ability to manage a tape library for a discrete system; in this case an alternative solution may be necessary to provide the library management component.

Summary

This chapter has discussed in detail the operating system software components that enable access and management of the physical storage devices, as well as those aspects that require storage space for system operation. Higher level functions provided by tools such as hierarchical storage managers, backup/restore operations, and management of the physical media were also briefly discussed.

  1. Operating system

    The components of the operating system discussed were the Virtual Memory Manager and its use of page space, device drivers, the Logical Volume Manager, and file systems.

  2. Higher level tools

    The higher level tools discussed were backup and restore facilities, hierarchical storage management products (such as ADSM and FSF/6000), and media management.