Module 7: Managing Fault Tolerance
Fault tolerance is the ability of a computer or OS to respond to a
catastrophic event, such as a power outage or a hardware failure, so that no data is lost
and work in progress is not corrupted.
RAID provides fault tolerance by implementing data redundancy. There are six standard levels, RAID 0 through RAID 5:
RAID 0 | Volume sets and disk striping without parity (just for study). NO FAULT TOLERANCE: both are "lose one, lose 'em all."
RAID 1 | Disk mirroring / duplexing.
RAID 2 | Disk striping with error-correction code (ECC).
RAID 3 | Disk striping with ECC stored as parity.
RAID 4 | Disk striping of large blocks; parity stored on one drive.
RAID 5 | Disk striping with parity distributed across multiple drives.
Windows NT Server supports only RAID 0, RAID 1, and RAID 5; Windows NT Workstation supports only RAID 0, so it has no fault tolerance.
RAID 1: Mirror Set and Disk Duplexing
Mirror Sets
- Mirror sets (RAID 1) use the NT Ftdisk.sys
(fault tolerance driver) to simultaneously write the same data to two physical drives.
- NT Server configures fault tolerance at the level of the logical drive letter, not at the
physical disk level. If you have two logical drives on one disk, you can choose to mirror just
one of them onto another physical disk.
- Mirror sets can enhance read performance because the fault tolerance driver can read from
both members of the set at once. There can be a slight decrease in write performance. When
one drive fails, performance returns to normal.
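A toy Python model can make the mirror-set behavior above concrete. It is not how Ftdisk.sys is implemented; the `MirrorSet` class and its methods are hypothetical. Every write goes to both members, and reads keep working from whichever member survives, so a single disk failure loses no data.

```python
from typing import List, Optional


class MirrorSet:
    """Toy RAID 1 model: two members hold identical copies of every block."""

    def __init__(self, num_blocks: int) -> None:
        # Each member is a list of blocks; None means "never written".
        self.members: List[List[Optional[bytes]]] = [
            [None] * num_blocks,
            [None] * num_blocks,
        ]
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to each healthy member, which is why writes can be
        # slightly slower than on a single disk.
        for i, member in enumerate(self.members):
            if not self.failed[i]:
                member[block] = data

    def read(self, block: int) -> Optional[bytes]:
        # This toy reads from the first healthy member. A real driver can spread
        # reads over both members, which is where the read-performance gain comes from.
        for i, member in enumerate(self.members):
            if not self.failed[i]:
                return member[block]
        raise IOError("both members of the mirror set have failed")

    def fail(self, index: int) -> None:
        # Mark one member (0 = primary, 1 = secondary) as failed.
        self.failed[index] = True


mirror = MirrorSet(num_blocks=4)
mirror.write(0, b"payroll")
mirror.fail(0)          # the primary member dies
print(mirror.read(0))   # b'payroll', served by the surviving member
```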
Disk Duplexing
- In disk duplexing, each disk in the mirror set has its own disk controller, so the set is
protected against the failure of a single controller. Disk duplexing is a hardware
enhancement to an NT Server mirror set; no additional software configuration is
necessary.
- Disk duplexing can reduce bus traffic because two controllers are involved.
- Keep in mind that only about 50% of the combined disk capacity is usable. That doesn't
make mirroring cheap, but it is still one of the cheapest fault-tolerance methods.
RAID 5: Stripe Sets with Parity
- In stripe sets with parity, you need at
least 3 disks. Up to 32 disks can be supported.
- Each stripe contains one parity block, and the parity block rotates across the disks from
one stripe to the next.
- Upon disk failure, the data on the new disk
can be regenerated using the data and parity information in each stripe on the remaining
disks.
- RAID 5 significantly improves read performance, but write performance is significantly
slower than a stripe set without parity. RAID 5 is more cost-effective than mirror sets:
parity overhead is 1/n of the space on n disks (33% with three disks, 25% with four),
compared with 50% for a mirror set.
- Neither the boot nor the system partition can be part of a stripe set! Do remember this;
they will try to trick you.
- To restore fault tolerance, replace the failed disk, then in Disk Administrator choose
Regenerate from the Fault Tolerance menu.
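RAID 5 parity is conventionally the bitwise XOR of the data blocks in a stripe, which is what makes regeneration possible: XOR-ing the surviving blocks of a stripe reproduces the missing one, whether it held data or parity. The Python sketch below assumes XOR parity and a simple rotation (parity for stripe s on disk s mod n); the function names are hypothetical, and NT's actual on-disk layout may rotate parity in a different order.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_stripes(data_blocks: List[bytes], num_disks: int) -> List[List[bytes]]:
    """Spread data blocks over num_disks disks with one parity block per stripe.

    Assumption: parity for stripe s is stored on disk s % num_disks.
    """
    if num_disks < 3:
        raise ValueError("a stripe set with parity needs at least 3 disks")
    per_stripe = num_disks - 1
    if len(data_blocks) % per_stripe:
        raise ValueError("pad the data to a whole number of stripes")
    disks: List[List[bytes]] = [[] for _ in range(num_disks)]
    for s in range(len(data_blocks) // per_stripe):
        stripe_data = data_blocks[s * per_stripe:(s + 1) * per_stripe]
        parity = xor_blocks(stripe_data)
        parity_disk = s % num_disks
        remaining = iter(stripe_data)
        for d in range(num_disks):
            disks[d].append(parity if d == parity_disk else next(remaining))
    return disks


def regenerate(disks: List[List[bytes]], failed: int) -> List[bytes]:
    """Rebuild every block of the failed disk by XOR-ing the survivors, stripe by stripe."""
    survivors = [disk for i, disk in enumerate(disks) if i != failed]
    return [xor_blocks(list(stripe)) for stripe in zip(*survivors)]


data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE", b"FFFF"]
disks = build_stripes(data, num_disks=3)     # 3 stripes: 2 data blocks + 1 parity block each
print(regenerate(disks, failed=1) == disks[1])  # True
```

Because the regeneration loop touches every surviving member for every stripe, it also hints at why a degraded stripe set runs so much slower than a healthy one.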
Breaking a Mirrored Set
When a member of a mirrored set
fails, the functional member will continue to operate. To replace the failed disk the
administrator must first break the mirror set by doing the following:
- In Disk Administrator, choose Break Mirror from the Fault Tolerance menu. The mirrored
(secondary) volume is automatically assigned the next available drive letter. This is the
first step!
- If the failed drive is the primary member
of the set, it may be necessary to assign the drive letter that was previously assigned to
the complete set. Think what happens to the shares on the set if you don't!
- Delete the failed partition.
- Use free space on another disk to create a
new mirror set relationship.
- The computer must be restarted!
- When making a mirror set for the boot
partition or system partition, a fault tolerant boot disk should be created in case of a
physical disk failure.
Remember: only mirror sets can provide fault tolerance for the boot and system partitions.
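The drive-letter bookkeeping in the steps above is easy to get wrong, so here is a toy Python model of it. Everything is hypothetical: drive letters are a dict from letter to a volume label, and "the next available drive letter" is modeled as the first unused letter from C: onward (A: and B: are assumed to be reserved for floppy drives).

```python
import string
from typing import Dict, Iterable


def next_free_letter(in_use: Iterable[str], start: str = "C") -> str:
    """Return the first drive letter from `start` onward that is not already in use."""
    used = set(in_use)
    letters = string.ascii_uppercase
    for letter in letters[letters.index(start):]:
        if letter not in used:
            return letter
    raise RuntimeError("no drive letters left")


def break_mirror(letters: Dict[str, str], set_letter: str, primary_failed: bool) -> Dict[str, str]:
    """Model Break Mirror plus the drive-letter fix-up for a failed primary.

    letters maps drive letter -> volume label (hypothetical labels).
    """
    result = dict(letters)
    # Break Mirror: the primary keeps the set's letter and the secondary
    # gets the next available letter.
    result[set_letter] = "primary"
    secondary_letter = next_free_letter(result)
    result[secondary_letter] = "secondary"
    if primary_failed:
        # The failed primary's partition is deleted; hand the set's old letter to the
        # survivor so shares and paths that use it keep working.
        del result[secondary_letter]
        result[set_letter] = "secondary"
    return result


print(break_mirror({"C": "mirror set", "D": "CD-ROM"}, "C", primary_failed=False))
print(break_mirror({"C": "mirror set", "D": "CD-ROM"}, "C", primary_failed=True))
```

The second call shows the case the shares warning is about: unless the survivor takes over C:, anything that referred to C: would point at nothing.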
Fault Tolerance Specifications
RAID 0 means NOT fault tolerant. Windows NT Server supports only RAID 0, 1, and 5. NT
Workstation natively supports only RAID 0; in other words, it has no built-in fault
tolerance.
RAID Level 2 - Disk striping with error-correction code (ECC)
RAID Level 3 - Disk striping with ECC stored as parity
RAID Level 4 - Disk striping of large blocks; parity stored on one drive
RAID Summary Chart
Feature | Volume Set | Mirror | Duplex | Disk Striping without Parity | Disk Striping with Parity
RAID level | 0 | 1 | 1 | 0 | 5
Disks required | 1 (2-32 areas per volume) | 2 (same controller) | 2 (not the same controller) | 2 | 3
Maximum disks | 32 areas per volume | - | - | 32 | 32
Can contain system/boot partition | No | Yes | Yes | No | No
Can be extended without data loss | Yes | - | - | No | -
Can be decreased without data loss | No | - | - | No | -
File systems | Must be the same on all volumes | - | - | FAT and/or NTFS; can put multiple file systems together | NTFS only
Different types of hard disks together | - | - | - | Yes | -
Advantage | Disk space; best method without fault tolerance | Potential read performance gain | Reduces bus traffic and potential read performance gain; also protects against controller failure | I/O speed gain; fastest read/write performance of all disk sets | I/O speed gain; second-fastest read/write performance of all disk sets
Disadvantage | No fault tolerance; no performance gain | Write performance; cost | Cost | No fault tolerance; space | Requires 3x more memory for parity calculations (a memory hog and a space hog!)
Supports removable media? | Can be done, but not recommended unless you plan to use the removable media as a fixed disk | Can be done, but not recommended unless you plan to use the removable media as a fixed disk | - | No | -
Paging file | Can be placed, but no performance gain | Can be placed, but no performance gain | Can be placed, but no performance gain | Can be placed, but no performance gain | Should not be placed here; causes poor performance
"Lose one, you lose 'em all" | Yup | - | - | Yup | -
Creating RAID 0, RAID 1, and RAID 5 Disks
Creating a Volume Set
- Select area of free space
(or select a formatted partition with a drive letter assigned to it)
- Holding down the CTRL key,
select a second area of free space (can be on same physical disk)
- Partition > Create Volume Set (for a formatted partition, Partition > Extend Volume Set)
- Select Total size of volume
set
- Volume set is created and
Disk Administrator automatically assigns a drive letter to it
- Commit Changes now
Extending an Existing Volume Set
- Select the Volume Set
- Holding down the CTRL key, select another area of free space (can be on the same physical disk)
- Partition > Extend
Volume Set
- Volume set is extended
- Commit Changes now
Creating a Stripe Set without Parity
- Select area of free space
- Holding down the CTRL key,
select a second area of free space (must be on separate physical disk)
- Partition > Create
Stripe Set
- Select Total size of Stripe
set
- Stripe set is created and
Disk Administrator automatically assigns a drive letter to it
- Commit Changes now
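The difference between a volume set and a stripe set without parity comes down to how logical blocks map onto the member areas. A volume set fills its areas one after another, so the areas can sit on the same disk and the set can be extended by adding an area. A stripe set deals consecutive stripe units round-robin across its disks, so it only speeds up I/O when the members are on separate physical disks. The Python sketch below is illustrative: the function names, the block units, and the `stripe_unit` parameter are assumptions, not Disk Administrator settings.

```python
from typing import List, Tuple


def volume_set_location(block: int, area_sizes: List[int]) -> Tuple[int, int]:
    """Map a logical block to (area index, offset) in a volume set.

    Areas are used in order: area 0 fills up completely before area 1 is touched.
    """
    for area, size in enumerate(area_sizes):
        if block < size:
            return area, block
        block -= size
    raise ValueError("block is past the end of the volume set")


def stripe_set_location(block: int, num_disks: int, stripe_unit: int = 1) -> Tuple[int, int]:
    """Map a logical block to (disk index, offset) in a stripe set without parity.

    Consecutive stripe units (stripe_unit blocks each) go to consecutive disks.
    """
    unit, within = divmod(block, stripe_unit)
    row, disk = divmod(unit, num_disks)
    return disk, row * stripe_unit + within


# Blocks 0-5 of a volume set built from a 4-block area and an 8-block area
print([volume_set_location(b, [4, 8]) for b in range(6)])
# -> [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1)]

# The same blocks on a three-disk stripe set
print([stripe_set_location(b, 3) for b in range(6)])
# -> [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

In both layouts every member holds part of the volume, which is why losing any one member loses the set.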
Creating a Disk Mirror / Duplex
- Select a formatted
partition with a drive letter assigned to it.
- Holding down the CTRL key, select an area of free space (must be equal to or larger in size) (must be on a separate physical disk)
- Fault Tolerance >
Establish Mirror
- Mirror / Duplex is created (it keeps the drive letter of the original formatted partition)
- Commit Changes now
Creating a Stripe Set with Parity
Calculate the size
- 1/3 of the total space is used to store parity in disk striping with parity on 3 disks;
1/4 of the total space is used on 4 disks, and so on.
- A stripe set with parity combines an equal amount of free space from each of three or more
drives. The space used on each drive equals the smallest of the selected free areas.
- Select area of free space
(on separate physical disk)
- Holding down the CTRL key,
select a second area of free space (must be on separate physical disk)
- Holding down the CTRL key,
select a third area of free space (must be on separate physical disk)
- Fault Tolerance > Create
Stripe Set with Parity
- Commit Changes now
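The sizing rules above reduce to a little arithmetic. This Python sketch (hypothetical helper names; sizes in MB) takes the free areas you plan to select, uses the smallest one on every member, and reports how much of the raw space goes to parity, alongside the 50% overhead of a mirror set for comparison.

```python
from typing import List, Tuple


def raid5_capacity(free_areas_mb: List[int]) -> Tuple[int, int, int]:
    """Return (raw, usable, parity) in MB for a stripe set with parity.

    Every member contributes the size of the smallest selected free area, and
    one member's worth of space (1/n of the total) holds parity.
    """
    n = len(free_areas_mb)
    if not 3 <= n <= 32:
        raise ValueError("a stripe set with parity needs 3 to 32 disks")
    per_disk = min(free_areas_mb)
    raw = per_disk * n
    return raw, raw - per_disk, per_disk


def mirror_capacity(partition_mb: int) -> Tuple[int, int, int]:
    """Return (raw, usable, redundant) in MB for a mirror set: half the raw space is usable."""
    return 2 * partition_mb, partition_mb, partition_mb


print(raid5_capacity([500, 300, 400]))  # (900, 600, 300): 1/3 goes to parity
print(raid5_capacity([400] * 4))        # (1600, 1200, 400): 1/4 goes to parity
print(mirror_capacity(400))             # (800, 400, 400): 1/2 is the copy
```

Note how the first example wastes 300 MB of the selected free space: the 500 MB and 400 MB areas each contribute only 300 MB.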
RAID 1 Failure: Disk Mirror and Duplex Failure
Note: It looks easy
enough, but recovery in RL (real life) is a whole different story. Disaster recovery is a
very, very complex undertaking and demands careful planning and testing both at the
hardware and software level -- do it -- before you actually have to!
Overview
- When the original member of a disk mirror or duplex set fails, NT "automatically" uses
the other member of the set to continue operation.
- Whenever a member of a
mirror or duplex set fails, you must replace the failed member and reestablish the mirror
or duplex, to continue to have data protection.
- NT Server will be unable to reboot at all if the original set member fails, because
BOOT.INI points to that member (you must hand-edit the ARC name in BOOT.INI to point to the
other member of the set before the system will restart). You will have to boot with your
fault tolerance boot disk to regain access to the system and run the Disk Administrator
utility to repair the set.
Fixing Broken Mirrors And Duplexes
- Assuming you are able to boot into NT, break the mirror: Disk Administrator > select a
drive that was part of the mirror > Fault Tolerance > Break Mirror
- Reassign the drive letter
to the remaining member of the (now broken) mirror/duplex set. For example, if Drives 0
and 1 are mirrored and assigned drive letter C, and Drive 0 fails, you must break the
mirrored set, and then assign drive letter C to Drive 1.
- Replace the failed drive and create a partition on the new drive equal in size to the
drive to be mirrored. Re-create the mirror by using the Establish Mirror option from the
Fault Tolerance drop-down menu. (Note: This effectively switches the roles of the previous
original and mirrored drives. Once NT reboots, the mirror set will be rebuilt using the new
member, but in this example the original (primary) drive will now be Drive 1.)
RAID 5 Failure: Stripe Set with Parity
Overview
- When a member of the set fails, the system rebuilds the missing data on the fly from the
data and parity information stored on the still-operational members. Because regeneration
is CPU-intensive, performance slows dramatically. However, the system continues to
operate, even with a failed set member.
Fixing Failed Members of a Stripe Set
- Fix the hardware by
replacing or repairing the failed drive.
- Disk Administrator >
create a new partition equal in size to the one that failed (if the original disk is
brought back online without losing its partition structure, skip this step).
- Select the stripe set and the partition that replaces the failed member (which could be
the original partition if it wasn't destroyed), then select the Regenerate command from the
Fault Tolerance drop-down menu. Once NT reboots, the stripe set is rebuilt by recalculating
the missing member's contents from the data and parity on the remaining members.
Windows NT Boot Disk (this is not the same disk as the Emergency Repair Disk)
Windows NT Boot Disk Fixes - An NT boot disk can access a drive that uses the NTFS or FAT
file system. A boot disk is useful for:
- Corrupted boot sector.
- Corrupted master boot
record (MBR).
- Virus infections.
- Missing or corrupt NTLDR or
NTDETECT.COM.
- Incorrect NTBOOTDD.SYS
driver.
- Booting from the shadow (secondary member) of a broken mirror, although you may need to
change BOOT.INI to do that.
Note
- Boot files reside on the
System Partition. (System partition is c:\ for DOS). NT system partition must be a primary
partition.
- System files reside on the
Boot Partition. (Boot partition is where your OS is located). Can be primary partition or
logical drive.
To create a fault tolerance boot disk:
- On a computer running Windows NT, format a floppy disk.
- Copy the following files from the primary partition. They are located in the root
directory of the system partition and may be hidden files:
Intel x86-based computers | RISC-based computers
Ntldr | Osloader.exe
Ntdetect.com | Hal.dll
Ntbootdd.sys (for small computer system interface (SCSI) disks not using a SCSI BIOS)* | *.pal (Alpha only)
Boot.ini | -
*The Ntbootdd.sys file appears
only on SCSI systems in which the SCSI BIOS is not used!
- On Intel x86-based computers, modify BOOT.INI on the floppy so that it points to the
mirrored partition.
- On RISC-based computers, modify the
firmware variables shown below (remember to update the disk in case of changes):
Variable | Value
Osloader | Multi(0)disk(0)fdisk(0)\Osloader.exe
Systempartition | Multi(0)disk(0)fdisk(0)
Osloadpartition | Path to the secondary mirrored partition
Osloadfilename | Path to the Windows NT Server root directory
- ARC stands for Advanced RISC Computing. ARC names use the conventions described below:
Multi(0)disk(0)rdisk(1)partition(2)
Convention | Description
multi or scsi | Identifies the controller type. It can be either SCSI (scsi) or some other type (multi).
(x) | Identifies the hardware adapter (starts with 0).
disk(y) | SCSI ID of the disk. For multi the value is always 0.
rdisk(z) | Ordinal number of the disk (ignored for SCSI controllers; starts with 0).
partition(a) | Ordinal number of the partition (starts with 1).
The SCSI ARC naming convention varies the disk() parameter
for successive disks on one controller; the multi controller format varies the rdisk()
parameter.
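To see how the fields of an ARC name line up with the table above, here is a Python sketch that splits a disk ARC path into its parts. The regular expression and the `parse_arc` name are assumptions for illustration; it only handles the multi()/scsi() disk form, not floppy paths such as fdisk(0).

```python
import re
from typing import Dict, Union

ARC_DISK_PATH = re.compile(
    r"^(multi|scsi)\((\d+)\)disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)(.*)$",
    re.IGNORECASE,
)


def parse_arc(name: str) -> Dict[str, Union[str, int]]:
    r"""Split an ARC path such as multi(0)disk(0)rdisk(1)partition(2)\WINNT."""
    match = ARC_DISK_PATH.match(name)
    if match is None:
        raise ValueError(f"not a disk ARC path: {name!r}")
    controller, adapter, disk, rdisk, partition, rest = match.groups()
    return {
        "controller": controller.lower(),  # multi or scsi
        "adapter": int(adapter),           # x: hardware adapter, counted from 0
        "disk": int(disk),                 # y: SCSI ID; always 0 for multi
        "rdisk": int(rdisk),               # z: disk ordinal for multi, counted from 0
        "partition": int(partition),       # a: partition number, counted from 1
        "path": rest,                      # directory after the partition, e.g. \WINNT
    }


print(parse_arc(r"Multi(0)disk(0)rdisk(1)partition(2)\WINNT"))
```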
Example Boot.ini
How do I work with Advanced RISC Computing (ARC) names?
What are the two sections labeled [boot loader] and [operating systems] in BOOT.INI (a
read-only, system, hidden file)?
1. The [boot loader] section supplies the timeout interval after which the default
operating system (defined in the default= line that follows timeout) loads automatically.
*Windows NT usually boots from the entry named in the [boot loader] section.
2. The [operating systems] section supplies the complete menu of operating system choices
that NTLDR displays after the program loads. You can stop the timer before it elapses by
pressing any arrow or letter key on the keyboard; you can then wait as long as you like to
make your menu selection. *Windows NT boots from the [operating systems] section when a
different OS is deliberately selected.
Note: You need to make changes in both sections, especially if you use disk mirroring or
disk duplexing and one member fails.
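When the primary member of a mirrored boot partition fails, the edit described in the note amounts to changing the rdisk() number in both sections. The Python sketch below does this for multi() paths only. The sample BOOT.INI is an illustrative NT 4.0-style file, not one taken from this document, and `point_at_mirror` is a hypothetical helper; on a real system you would make the edit on the fault tolerance boot floppy and check every line by hand.

```python
import re

# Illustrative BOOT.INI contents (assumed layout, mirror primary on rdisk(0))
SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINNT
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server Version 4.00"
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server Version 4.00 [VGA mode]" /basevideo /sos
"""


def point_at_mirror(boot_ini: str, old_rdisk: int, new_rdisk: int) -> str:
    """Rewrite every multi() ARC path that uses rdisk(old_rdisk) to use rdisk(new_rdisk).

    This covers both the default= line in [boot loader] and the entries in
    [operating systems]; scsi() paths are left alone.
    """
    pattern = re.compile(
        rf"(multi\(\d+\)disk\(\d+\)rdisk\(){old_rdisk}(\))", re.IGNORECASE
    )
    return pattern.sub(rf"\g<1>{new_rdisk}\g<2>", boot_ini)


print(point_at_mirror(SAMPLE_BOOT_INI, old_rdisk=0, new_rdisk=1))
```

Using a single regular expression for both sections keeps the default= line and the menu entries from drifting apart, which is exactly the mistake the note warns about.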
I. How do I determine whether to use the multi or scsi parameter?
1. "SCSI"
only applies to a SCSI drive whose onboard BIOS has been disabled (no BIOS translation capabilities).
2. If the NTBOOTDD.SYS
file is on your system, use "SCSI".
If neither is
the case, use "multi."
II. How do I determine the scsi or multi number?
What is a disk controller? A chip and associated circuitry responsible for controlling a
disk drive. Different interfaces need different controllers; for example, an IDE interface
requires an IDE controller and a SCSI interface requires a SCSI controller.
Which Disk
Controller is the drive we are looking for attached to?
A. If there is only one disk controller, multi (or scsi) = 0.
B. If there are two disk controllers and the hard drive we are looking for is attached to
the second one, multi (or scsi) = 1. If it is attached to the first disk controller,
multi (or scsi) = 0.
III. How do I determine the rdisk and disk numbers?
Part 1
1. If we are using
"multi", "disk" will always be 0
2. If we
are using "scsi", "rdisk" will always be 0
Part 2
A. disk = SCSI ID of the drive (usually 0 to 6) when "scsi" is chosen.
B. rdisk = position of the drive in the disk chain when "multi" is chosen (with "scsi",
rdisk is the SCSI logical unit number (LUN), normally 0).
Usually:
- The first hard disk = 0
- The second hard disk = 1
- The third hard disk = 2
- and so on, unless specifically mentioned otherwise.
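Sections I to III above can be folded into one small decision procedure. The Python sketch below uses hypothetical helpers (the parameter names are invented for illustration) that apply those rules: use scsi() when the SCSI BIOS is disabled or NTBOOTDD.SYS is present, otherwise multi(); with scsi the SCSI ID goes in disk() and rdisk() stays 0; with multi disk() stays 0 and the disk's position goes in rdisk().

```python
def uses_scsi_keyword(scsi_bios_disabled: bool, ntbootdd_present: bool) -> bool:
    """Rule I: use scsi() if the SCSI BIOS is disabled or NTBOOTDD.SYS is on the system."""
    return scsi_bios_disabled or ntbootdd_present


def build_arc(
    scsi: bool,
    controller_index: int,
    scsi_id: int = 0,
    disk_position: int = 0,
    partition: int = 1,
) -> str:
    """Assemble a disk ARC path from the rules in sections I to III.

    controller_index: 0 for the first disk controller, 1 for the second (rule II).
    scsi_id: the drive's SCSI ID, used only with scsi() (rule III).
    disk_position: 0 for the first hard disk, 1 for the second, used only with multi().
    partition: partition number, counted from 1.
    """
    if partition < 1:
        raise ValueError("partition numbers start at 1")
    if scsi:
        # scsi(): disk() carries the SCSI ID and rdisk() is always 0
        return f"scsi({controller_index})disk({scsi_id})rdisk(0)partition({partition})"
    # multi(): disk() is always 0 and rdisk() is the disk's position
    return f"multi({controller_index})disk(0)rdisk({disk_position})partition({partition})"


# Second hard disk on the first (non-SCSI-BIOS-disabled) controller, second partition
print(build_arc(uses_scsi_keyword(False, False), 0, disk_position=1, partition=2))
# SCSI ID 3 on the second controller with its SCSI BIOS disabled, first partition
print(build_arc(uses_scsi_keyword(True, False), 1, scsi_id=3))
```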
NTFS support:
- Maximum volume size = 16 EB
- Maximum file size = 16 EB theoretical; 4 GB-64 GB actual
- Maximum files in the root = No limit
- Maximum files in non-root folders = No limit
- File-level security = Yes
- LFN support = Yes
- Self repair = Yes
- Transaction log capabilities = Yes
- File-level compression = Yes
- Dual file fork support (Macintosh) = Yes
- POSIX support = Yes
When to use NTFS:
- Partitions of 400 MB or
larger
- NT is only OS in use
- Using Services for MAC for
file sharing
- File level security is
required
- Permissions must be
preserved while migrating directories and files from NetWare server
- File compression required
- Local security required
NTFS Notables
- Note: fdisk will not remove an NTFS logical drive in an extended partition; you will have
to use the NT Setup disks.
- NT is the only OS that can directly access NTFS partitions. NTFS information is readable
across the network by many operating systems (including DOS, Windows 3.x and 95, as well as
other operating systems).
- An NTFS partition larger than 4GB CANNOT be created during installation because of FAT
limitations. "...it can be done after an installation by using Disk Administrator. If you
are upgrading from NT 3.51, do this before you upgrade using Disk Administrator (remember,
you can't touch the system or boot partition) ..." Sooooo, how this would work is a
mystery to me. Basically, for a new install, what this means is: plug in a second hard
disk (to a machine already running NT), have Disk Administrator format it, and mark it as
active. Power down and take the newly formatted disk out before you reboot, then put it in
the PC you're doing the NT installation on. I have not tried this, but it seems like the
only logical solution. You probably want to back up the original configuration (on the
machine you are using to format the hard disk) before putting in the second disk, as Disk
Administrator is stubborn about "forgetting" that a second drive was added :-)
Comments and suggestions? E-mail me at grantwil@sk.sympatico.ca
Last Updated: Wednesday, March 10, 1999
Grant Wilson, Tisdale, SK. Canada