Windows NT Server 4.0 Notes


Module 7: Managing Fault Tolerance

Fault tolerance is the ability of a computer or OS to respond to a catastrophic event, such as a power outage or a hardware failure, so that no data is lost and work in progress is not corrupted.

RAID provides fault tolerance by implementing data redundancy. The levels, 0 through 5, are:

RAID 0 Volume sets & Disk striping without parity (just for study)
  • NO FAULT TOLERANCE
  • Both are "Lose one, lose 'em all."
RAID 1 Disk mirroring / duplexing.
RAID 2 Disk striping with Error-Correction-Code (ECC).
RAID 3 Disk striping with ECC stored as parity.
RAID 4 Disk striping large blocks; parity stored on one drive.
RAID 5 Disk striping with parity distributed on multiple drives.

Windows NT Server supports only RAID 0, RAID 1 and RAID 5. Windows NT Workstation supports only RAID 0.

RAID 1: Mirror Set and Disk Duplexing

Mirror Sets

  • Mirror sets (RAID 1) use the NT Ftdisk.sys (fault tolerance driver) to simultaneously write the same data to two physical drives.
  • NT Server configures fault tolerance at the level of the logical drive letter, not the physical disk level. If you have two drives on one disk, you can choose to mirror only one of them.
  • Mirror sets can enhance read performance because the fault tolerance driver reads from both members of a set at once. There can be a slight decrease in write performance. When one drive fails, performance returns to normal.
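
The mirroring behaviour described above can be pictured with a small toy model. This is only a sketch, not how Ftdisk.sys is implemented: the class name MirrorSet and its methods are made up for illustration, and each "disk" is just a Python dictionary.

```python
class MirrorSet:
    """Toy model of a two-member mirror set (RAID 1). Names are illustrative."""

    def __init__(self):
        # Each member is modelled as a dict of block number -> data.
        self.members = [{}, {}]
        self.failed = [False, False]

    def write(self, block, data):
        # Every write goes to each member that is still healthy.
        for index, member in enumerate(self.members):
            if not self.failed[index]:
                member[block] = data

    def read(self, block):
        # Any healthy member can satisfy a read.
        for index, member in enumerate(self.members):
            if not self.failed[index]:
                return member[block]
        raise IOError("both members of the mirror set have failed")

    def fail(self, index):
        # Simulate the loss of one physical disk.
        self.failed[index] = True


mirror = MirrorSet()
mirror.write(0, b"payroll")
mirror.fail(0)              # lose the first disk
print(mirror.read(0))       # the data is still readable from the second disk
```

The point of the sketch is that a read still succeeds after one member is lost, which is the whole purpose of the mirror.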

 Disk Duplexing

  • In disk duplexing, each disk in the mirror set has its own disk controller. In this way, it can protect against a single controller failure. Disk duplexing is a hardware enhancement to an NT Server mirror set. No additional software configuration is necessary.
  • Disk duplexing can reduce bus traffic because two controllers are involved.
  • Keep in mind that only about 50% of the total capacity of the two disks is usable. This doesn't make it cheap, but it is one of the cheapest methods.

RAID 5: Stripe Sets with Parity

  • In stripe sets with parity, you need at least 3 disks. Up to 32 disks can be supported.
  • Each stripe contains one parity block, and the parity block is placed on a different disk from stripe to stripe.
  • Upon disk failure, the data on the new disk can be regenerated using the data and parity information in each stripe on the remaining disks.
  • RAID 5 significantly improves read performance, but write performance is significantly slower than a stripe set without parity.

  • RAID 5 offers a cost advantage over mirror sets: the parity overhead is one disk's worth of space (about 33% with three disks, 25% with four), compared to 50% in a mirror set.
  • Neither the boot nor the system partition can be part of a stripe set! DO remember this. They will try to trick you.
  • To restore fault tolerance: replace the failed disk, then in Disk Administrator open the Fault Tolerance menu and choose Regenerate.
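
How a lost member can be regenerated from "the data and parity information in each stripe" is easiest to see with XOR parity, the usual RAID 5 scheme. The sketch below is an illustration under that assumption; the function names are invented here and nothing in it reflects NT's internal code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks, parity_disk):
    """Return one stripe as a list of blocks, with parity placed at index parity_disk."""
    stripe = list(data_blocks)
    stripe.insert(parity_disk, xor_blocks(data_blocks))
    return stripe


def regenerate(stripe, failed_disk):
    """Rebuild the block that lived on failed_disk from the surviving blocks."""
    survivors = [block for index, block in enumerate(stripe) if index != failed_disk]
    return xor_blocks(survivors)


# Three disks: two data blocks plus one parity block in this stripe.
stripe = make_stripe([b"AAAA", b"BBBB"], parity_disk=2)
print(regenerate(stripe, failed_disk=0))  # rebuilds the first data block
```

Because XOR is its own inverse, the same regenerate step works whether the lost block held data or parity.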

Breaking a Mirrored Set

When a member of a mirrored set fails, the functional member will continue to operate. To replace the failed disk the administrator must first break the mirror set by doing the following:

  1. In Disk Administrator, choose Break Mirror from the Fault Tolerance menu. The mirrored (secondary) volume is automatically assigned the next available drive letter. This is the first step!
  2. If the failed drive is the primary member of the set, it may be necessary to assign the drive letter that was previously assigned to the complete set. Think what happens to the shares on the set if you don't!
  3. Delete the failed partition.
  4. Use free space on another disk to create a new mirror set relationship.
  5. The computer must be restarted!
  • When making a mirror set for the boot partition or system partition, a fault tolerant boot disk should be created in case of a physical disk failure.
    Remember: only mirror sets can provide fault tolerance for the boot and system partition.

 

Fault Tolerance Specifications

RAID 0 is NOT fault tolerant. Windows NT Server supports only RAID 0, 1 and 5. NT Workstation natively supports only RAID 0; in other words, it has no built-in fault tolerance.

RAID Level 2 - Disk Striping with error correction code (ECC)
RAID Level 3 - Disk Striping with ECC stored as parity
RAID Level 4 - Disk Striping large blocks; parity stored on one drive
 

RAID Summary Chart

  | Volume Set | Mirror | Duplex | Disk Striping without Parity | Disk Striping with Parity
RAID level | 0 | 1 | 1 | 0 | 5
# Disks required | 1 (2-32 areas per volume) | 2, same controller | 2, not same controller | 2 | 3
Max # disks | 32 areas per vol. | | | 32 | 32
Contain system / boot partition | No | Yes | Yes | No | No
Can be extended without data loss | Yes | | | No |
Can be decreased without data loss | No | | | No |
File systems | Must be the same on all volumes | | | and/or: FAT, NTFS; can put multiple file systems together; NTFS only |
Different types of hard disks together | | | | Yes |
Advantage | disk space; best method w/out fault tolerance | potential read performance gain | reduce bus traffic and potential read performance gain; also protect against controller failure | I/O speed gain. Fastest read/write performance of all disk sets | I/O speed gain. 2nd fastest read/write performance of all disk sets
Disadvantage | no fault tolerance; no performance gain | write performance; cost | cost | no fault tolerance; space | requires 3x more memory for parity calcs. - AKA - memory hog and space hog!
Supports removable media? | Can be done but not recommended unless you plan to use removable media as fixed disk. | Can be done but not recommended unless you plan to use removable media as fixed disk. | | No |
Paging file | Can be placed but no performance gains | Can be placed but no performance gains | Can be placed but no performance gains | Can be placed but no performance gains | Should not be implemented on; causes poor performance
"Lose one you lose em all" | Yup | | | Yup |

 

Creating RAID 0, RAID 1, and RAID 5 Disks

Creating a Volume Set

  • Select area of free space (or select a formatted partition with a drive letter assigned to it)
  • Holding down the CTRL key, select a second area of free space (can be on same physical disk)
  • Partition > Create Volume Set (for a formatted partition, "Extend Volume Set")
  • Select Total size of volume set
  • Volume set is created and Disk Administrator automatically assigns a drive letter to it
  • Commit Changes now

Extending a Volume Set when a Volume Set is already created

  • Select the Volume Set
  • Holding down the CTRL key, select another area of free space (can be on same physical disk)
  • Partition > Extend Volume Set
  • Volume set is extended
  • Commit Changes now

Creating a Stripe Set without Parity

  • Select area of free space
  • Holding down the CTRL key, select a second area of free space (must be on separate physical disk)
  • Partition > Create Stripe Set
  • Select Total size of Stripe set
  • Stripe set is created and Disk Administrator automatically assigns a drive letter to it
  • Commit Changes now
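
To see why a stripe set needs separate physical disks and why losing one member loses everything, it helps to look at how a byte offset in the set maps onto the members. The following is a rough sketch: the function name locate is invented, and the 64 KB stripe unit is an assumption used only for the arithmetic.

```python
STRIPE_UNIT = 64 * 1024  # assumed stripe unit size in bytes (illustrative)


def locate(offset, num_disks, stripe_unit=STRIPE_UNIT):
    """Map a byte offset in the stripe set to (disk index, offset on that disk)."""
    unit = offset // stripe_unit          # which stripe unit the byte falls in
    disk = unit % num_disks               # units are dealt out round-robin
    row = unit // num_disks               # how many full rows precede this unit
    return disk, row * stripe_unit + offset % stripe_unit


# Consecutive 64 KB units land on alternating disks in a two-disk set.
for offset in (0, 64 * 1024, 128 * 1024):
    print(offset, locate(offset, num_disks=2))
```

Any file larger than one stripe unit is therefore spread over every member, which is where both the speed gain and the "lose one, lose 'em all" behaviour come from.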

Creating a Disk Mirror / Duplex

  • Select a formatted partition with a drive letter assigned to it.
  • Holding down the CTRL key, select an area of free space (must be equal in size) (must be on separate physical disk)
  • Fault Tolerance > Establish Mirror
  • Mirror / Duplex is created (it keeps the drive letter of the original formatted partition)
  • Commit Changes now

Creating a Stripe Set with Parity

  • Calculate the size
    • 1/3 of the total space is used to store parity in Disk Striping with Parity (3 disks); 1/4 of the total space is used to store parity in Disk Striping with Parity (4 disks), etc.
    • Disk Striping with Parity combines equal-sized areas of free space on three or more drives. The area used on each drive can be no larger than the smallest area of free space available on any of the drives.
  • Select area of free space (on separate physical disk)
  • Holding down the CTRL key, select a second area of free space (must be on separate physical disk)
  • Holding down the CTRL key, select a third area of free space (must be on separate physical disk)
  • Fault Tolerance > Create Stripe Set with Parity
  • Commit Changes now
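
The size calculation in the first step can be worked through in a few lines. This is a sketch of the arithmetic only; the function name and the sample free-space figures are made up for the example.

```python
def stripe_with_parity_capacity(free_areas_mb):
    """Return (total, usable, parity) in MB for a stripe set with parity.

    free_areas_mb holds the free space selected on each physical disk.
    """
    count = len(free_areas_mb)
    if count < 3 or count > 32:
        raise ValueError("a stripe set with parity needs 3 to 32 disks")
    member = min(free_areas_mb)     # every member is cut down to the smallest area
    total = member * count          # space consumed across all disks
    parity = member                 # one member's worth of space holds parity
    return total, total - parity, parity


# Hypothetical free areas of 500, 700 and 600 MB on three disks.
print(stripe_with_parity_capacity([500, 700, 600]))  # (1500, 1000, 500)
```

With three disks the parity share is 1/3 of the set, with four it is 1/4, and so on, which matches the fractions given above.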

RAID 1 Failure: Disk Mirrors And Duplex Failure

Note: It looks easy enough, but recovery in RL (real life) is a whole different story. Disaster recovery is a very, very complex undertaking and demands careful planning and testing both at the hardware and software level -- do it -- before you actually have to!

Overview

  • When original member of disk mirror or duplex set fails, NT -- "automatically" -- uses the other member of the set to continue operation.
  • Whenever a member of a mirror or duplex set fails, you must replace the failed member and reestablish the mirror or duplex, to continue to have data protection.
  • NT Server will be unable to reboot at all if the original set member fails, because the BOOT.INI file points to that member (it's necessary to hand-edit the ARC name in the boot file to point to the other member of the set to restart the system at all). You will have to boot with your Fault Tolerance boot disk to regain access to the system to run Disk Administrator utility to repair the set.

Fixing Broken Mirrors And Duplexes

  • Assuming you are able to boot into NT, break the mirror: Disk Administrator > select the drive that was part of the mirror > Fault Tolerance > Break Mirror
  • Reassign the drive letter to the remaining member of the (now broken) mirror/duplex set. For example, if Drives 0 and 1 are mirrored and assigned drive letter C, and Drive 0 fails, you must break the mirrored set, and then assign drive letter C to Drive 1.
  • Replace the failed drive and create a partition on the new drive equal in size to the drive to be mirrored.

  • Recreate the mirror by using the Establish Mirror option from the Fault Tolerance drop-down menu. (Note: This effectively switches the roles of the previous original and mirrored drives. Once NT reboots, the mirror set will be rebuilt using the new member, but the original drive now will be Drive 1 -- based on this example.)

RAID 5 Failure: Stripe Set with Parity

Overview

  • When a device fails, the system is able to rebuild data on the fly from parity information stored on the still operational devices. Because regeneration is CPU-intensive, performance slows dramatically. However, the system continues to operate, even with a failed set member.

Fixing Failed Members of a Stripe Set

  • Fix the hardware by replacing or repairing the failed drive.
  • Disk Administrator > create a new partition equal in size to the one that failed (if the original disk is brought back online without losing its partition structure, skip this step).
  • Select the stripe set and the partition that replaces the failed member (which could be the original partition if it wasn't destroyed), then select the Regenerate command from the Fault Tolerance drop-down menu. Once NT reboots, the new member will be rebuilt from the data and parity information on the remaining members.

Windows NT Boot Disk - this is not the same disk as the Emergency Repair Disk

Windows NT Boot Disk Fixes - The NT boot disk can access a drive that has the NTFS or FAT file system installed. The boot disk is useful for:

  • Corrupted boot sector.
  • Corrupted master boot record (MBR).
  • Virus infections.
  • Missing or corrupt NTLDR or NTDETECT.COM.
  • Incorrect NTBOOTDD.SYS driver.
  • Booting from the shadow of a broken mirror, although you may need to change the BOOT.INI to do that.

Note

  • Boot files reside on the System Partition. (System partition is c:\ for DOS). NT system partition must be a primary partition.
  • System files reside on the Boot Partition. (Boot partition is where your OS is located). Can be primary partition or logical drive.

To create a fault tolerance boot disk:

  • On a computer running Windows NT, format a floppy disk.
  • Copy the following files from the primary partition. They are located in the root directory of the system partition, but may be hidden files:
Intel x86-based computers                                                                      RISC-based computers
Ntldr                                                                                          Osloader.exe
Ntdetect.com                                                                                   Hal.dll
Ntbootdd.sys (for small computer system interface (SCSI) disks not using a SCSI BIOS)*         *.pal (Alpha only)
Boot.ini

*The Ntbootdd.sys file appears only on SCSI systems in which the SCSI BIOS is not used!

  • On Intel x86-based computers, modify the boot.ini.
  • On RISC-based computers, modify the firmware variables shown below (remember to update the disk in case of changes):
Variable           Value
Osloader           Multi(0)disk(0)fdisk(0)\Osloader.exe
Systempartition    Multi(0)disk(0)fdisk(0)
Osloadpartition    Path to the secondary mirrored partition.
Osloadfilename     Path to the Windows NT Server root directory.
  • ARC stands for Advanced RISC Computing. ARC names use the conventions described below:

Multi(0)disk(0)rdisk(1)partition(2)

Convention      Description
Multi | scsi    Identifies the controller type. It can either be SCSI or some other type (multi).
(x)             Identifies the hardware adapter (starts with 0).
Disk(y)         SCSI bus number. For multi, the value is always 0.
Rdisk(z)        Ordinal number of the disk, starting with 0 (ignored for SCSI controllers).
Partition(a)    Ordinal number of the partition, starting with 1.

 The SCSI ARC naming convention varies the disk() parameter for successive disks on one controller; the multi controller format varies the rdisk() parameter.
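
As a way of checking your reading of an ARC name, the fields in the table above can be pulled apart with a short script. This is just a study aid, assuming names of the exact form shown in these notes; the ArcName type and parse_arc function are invented for the example.

```python
import re
from typing import NamedTuple


class ArcName(NamedTuple):
    controller_type: str   # "multi" or "scsi"
    adapter: int           # (x): hardware adapter, starts with 0
    disk: int              # disk(y): SCSI bus number, always 0 for multi
    rdisk: int             # rdisk(z): ordinal of the disk, starts with 0
    partition: int         # partition(a): ordinal of the partition, starts with 1


_ARC_RE = re.compile(
    r"^(multi|scsi)\((\d+)\)disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)$",
    re.IGNORECASE,
)


def parse_arc(name):
    """Split an ARC name into its five fields, or raise ValueError."""
    match = _ARC_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised ARC name: {name!r}")
    kind, adapter, disk, rdisk, partition = match.groups()
    return ArcName(kind.lower(), int(adapter), int(disk), int(rdisk), int(partition))


print(parse_arc("Multi(0)disk(0)rdisk(1)partition(2)"))
```

For the sample name, the result reads as: first multi adapter, second disk on it, second partition on that disk.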

Example Boot.ini

How do I work with "Advanced RISC Computing (ARC) names"?

What are the two sections labeled [boot loader] and [operating systems] in the BOOT.INI (read only, system, hidden)?

1. The [boot loader] section supplies the timeout interval after which the default operating system (defined in the default= line that follows timeout) loads automatically. *Windows NT usually boots from the [boot loader] section.

2. The [operating systems] section supplies the complete menu of operating system choices that NTLDR displays after it loads. You can disable the timer before it elapses by pressing any arrow or letter key on the keyboard. Then, you can wait as long as you like to make your menu selection. *Windows NT boots from the [operating systems] section if a deliberate change of OS is made.

Note: You need to make changes in both especially if using disk mirroring or disk duplexing and one fails.
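
Since the same ARC path has to appear in both sections, one way to avoid a mismatch is to generate the text from a single value. The sketch below does that; the function name, the WINNT directory, the 30-second timeout and the menu label are all assumptions chosen for illustration, so substitute the values from your own system.

```python
def build_boot_ini(arc_partition, winnt_dir="WINNT", timeout=30,
                   label="Windows NT Server Version 4.00 (mirror)"):
    """Return BOOT.INI text whose default= line and menu entry share one ARC path."""
    arc_path = f"{arc_partition}\\{winnt_dir}"
    lines = [
        "[boot loader]",
        f"timeout={timeout}",
        f"default={arc_path}",          # must match an entry under [operating systems]
        "[operating systems]",
        f'{arc_path}="{label}"',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical case: boot from the second disk after the first mirror member failed.
print(build_boot_ini("multi(0)disk(0)rdisk(1)partition(1)"))
```

The output is only the text of the file. On a real fault tolerant boot disk you would still edit BOOT.INI by hand (it is read only, system and hidden) and test the disk before you need it.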

Remember

  • Boot files reside on the System Partition. (System partition is c:\ for DOS). NT system partition must be a primary partition.
  • System files reside on the Boot Partition. (Boot partition is where your OS is located). Can be primary partition or logical drive.
  • Very Microsoft of them :-)

I. How do I determine if I use the Multi or SCSI parameter?

1. "SCSI" only applies to a SCSI drive whose onboard BIOS has been disabled (no BIOS translation capabilities).
2. If the NTBOOTDD.SYS file is on your system, use "SCSI".

If  neither is the case, use "multi."

II. How do I determine the SCSI or Multi number?

What is a disk controller - a chip and associated circuitry that is responsible for controlling a disk drive. There are different controllers for different interfaces. For example, an IDE interface requires an IDE controller and a SCSI interface requires a SCSI controller.

Which Disk Controller is the drive we are looking for attached to?

A. If there is only one disk controller, the multi or SCSI number is 0.
B. If there are two disk controllers, the number is 1 when the hard drive we are looking for is attached to the second controller, and 0 when it is attached to the first.

III. How do I determine the rdisk and disk number?

Part 1

1. If we are using "multi", "disk" will always be 0
2. If we are using "scsi", "rdisk" will always be 0

Part 2

A. disk = SCSI bus ID (usually 0 to 6) when "SCSI" is chosen.
B. rdisk = LUN (SCSI logical unit number) or position in disk chain when "multi" is chosen. Usually
-The first hard disk = 0
-The second hard disk = 1
-The third hard disk = 2
-and so on unless specifically mentioned.
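
Rules I to III can be condensed into one small function, which is handy for drilling exam-style questions. It is a sketch of the rules exactly as stated in these notes; the function name and parameter names are invented here.

```python
def arc_name(use_scsi, controller, disk_position, partition):
    """Build an ARC name from the rules above.

    use_scsi:       True when the SCSI BIOS is disabled / NTBOOTDD.SYS is present
    controller:     0 for the first disk controller, 1 for the second, ...
    disk_position:  SCSI ID (scsi) or position of the disk on the controller (multi)
    partition:      partition number, counted from 1
    """
    if partition < 1:
        raise ValueError("partition numbers start with 1")
    if use_scsi:
        # With scsi, disk() varies and rdisk() is always 0.
        return f"scsi({controller})disk({disk_position})rdisk(0)partition({partition})"
    # With multi, disk() is always 0 and rdisk() varies.
    return f"multi({controller})disk(0)rdisk({disk_position})partition({partition})"


print(arc_name(False, 0, 1, 2))   # multi(0)disk(0)rdisk(1)partition(2)
print(arc_name(True, 1, 3, 1))    # scsi(1)disk(3)rdisk(0)partition(1)
```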

Odds and Ends

NTFS support:

  • Maximum Volume size = 16EB ;
  • Maximum File Size = 16EB theoretical ; 4GB-64GB actual
  • Max. Files in the Root = No limit
  • Max. Files in non-root = No limit
  • File level security = Yes
  • LFN support = Yes
  • Self repair = Yes
  • Transaction Log capabilities = Yes
  • File Level Compression = Yes
  • Dual File Fork Support (Macintosh) = Yes
  • POSIX support = Yes

When to use NTFS:

  • Partitions of 400 MB or larger
  • NT is only OS in use
  • Using Services for MAC for file sharing
  • File level security is required
  • Permissions must be preserved while migrating directories and files from NetWare server
  • File compression required
  • Local security required

NTFS Notables

  • Note: fdisk will not remove an NTFS logical drive in extended partition. You will have to use NT setup disks.
  • NT is the only OS that can directly access NTFS partitions. NTFS information is readable across the network by many operating systems (including DOS, Windows 3.x and 95, as well as other operating systems).
  • An NTFS partition larger than 4GB CANNOT be created during installation because of FAT limitations. "...it can be done after an installation by using Disk Administrator. If you are upgrading from NT 3.51, do this before you upgrade using Disk Administrator (remember, you can't touch the system or boot partition) ..." Sooooo, how this would work is a mystery to me. Basically, for a new install, what this means is: plug a second hard disk into a machine already running NT, have Disk Administrator format it, and mark it as active. Power down, take the newly formatted disk out before you reboot, and put it in the PC you're doing the NT installation on ... I have not tried this, but it seems like the only logical solution. You will probably want to back up the original configuration (on the machine you are using to format the hard disk) before putting in the second disk, as Disk Administrator is stubborn about "forgetting" that a second drive was added :-)

Comments and suggestions? E-mail me at grantwil@sk.sympatico.ca
Last Updated: Wednesday, March 10, 1999 Grant Wilson, Tisdale, SK. Canada