AIX Boot Process

A system boot is the process by which a machine powers on, tests the hardware, configures its devices, and then loads and executes the operating system. To boot, AIX machines require the following resources:

In AIX 3.2, there are three types of system boots:

Normal Boot
A standalone machine is started for normal operations. The system is booted from the hard disk with the key in normal mode.
Service Boot
The system is booted from a maintenance diskette, bootable tape, CD-ROM, hard disk or any other bootable media with the key in service mode.
Network Boot
A diskless or dataless workstation is booted over a network. The diskless client receives its boot image from the boot server. After successfully booting, the diskless client mounts the necessary file systems from appropriate servers.

Understanding the Normal and Service Boot Processes

The normal and service boot processes can be divided into the following phases:

ROS Kernel Init Phase

This phase is outlined in Figure - ROS Kernel Init Phase.

The chip sequencer checks the system motherboard for problems. If it finds one, the appropriate LED codes are displayed and the machine stops. This check is also referred to as the built-in self-test (BIST). During this test the LED displays values in the range of 100 to 199.

After the BIST, control passes to the Read Only Storage (ROS), which performs the power-on self-test (POST). During the POST, the LED displays values of 200 and above. Refer to the section "Three-Digit Display Numbers" in the Common Diagnostics and Service Guide for a listing of the BIST and POST indicators and their meanings.
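
The two LED ranges described above can be captured in a small helper for triaging a value read off the front panel. This is only a sketch: the function name led_phase is hypothetical, and it encodes nothing beyond the two ranges stated in this section (100 to 199 for BIST, 200 and above for POST or a later stage). It does not replace the table in the diagnostics guide.

```shell
# led_phase CODE
# Classify a three-digit LED value using only the ranges given in the text.
# (Hypothetical helper; not part of AIX.)
led_phase() {
    code=$1
    case $code in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;   # reject non-numeric input
    esac
    if [ "$code" -ge 100 ] && [ "$code" -le 199 ]; then
        echo "BIST"
    elif [ "$code" -ge 200 ]; then
        echo "POST or later"
    else
        echo "unknown"
    fi
}
```

For example, `led_phase 150` reports BIST, while `led_phase 552` reports POST or later, which is the cue to look the value up in the recovery sections of this document.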

If the POST passes successfully, then the ROS starts to search for the boot device. The ROS first checks the bootlist contained in the nvram. The nvram keeps two separate bootlists, one for normal mode and one for service mode. Under normal mode, usually the hard disk is the first boot device. Under service mode, usually the floppy drive is the first boot device.

If there is a valid list in the nvram, the ROS will start booting from the device pointed to by the bootlist. If the bootlist is invalid or absent, the ROS will begin searching for a bootable device from slot 8 to slot 0 and from SCSI ID 0 through SCSI ID 6. The nvram contents can be invalidated by unplugging the battery or by the bootlist command, as in the examples in the sections Procedure to View and Change Service Mode Bootlist and Procedure to View and Change Normal Mode Bootlist. Note that the bootlist is referred to as the ipl list in some documentation.


Figure: ROS Kernel Init Phase

Once ROS has identified a boot device, it reads the boot block from the disk. It uses the disk offset and length from the boot block to load the boot image into memory. Control is then passed on to the boot image and the kernel begins executing. Once the kernel is in control, it relocates to address 0 and executes init.

Procedure to View and Change Service Mode Bootlist

This section gives the commands to view and change service mode bootlist. A sample bootlist output is included with the explanation of values.
  1. To view the Service Bootlist enter this command:
    # bootlist -m service -r
    



    ------------------- SERVICE IPLIST at address 0xa00278 --------------
    57 52  2 47 46  2 47 54  2 47    W R - G F - G T - G
    43 19 56  0  0  1 87 31 ac 62    C - V - - - - 1 - b
    5f  0  0  0  0  0  0  0  0 53    - - - - - - - - - S
    20 49  7  7  1  0  0  2 47 4b    - I - - - - - - G K
     2 47 49  0  0  0  0  0  0  0    - G I - - - - - - -
     0  0  0  0  0  0  0  0  0  0    - - - - - - - - - -
     0  0  0  0  0  0  0  0  0  0    - - - - - - - - - -
     0  0  0  0  0  0  0  0  0  0    - - - - - - - - - -
     0  0  0  0                      - - - -

    Figure: Sample Service Mode bootlist

    An explanation of the output from Figure - Sample Service Mode bootlist follows:
    57 52               The start of the Service list
    2                   The number of bytes in the next field
    47 46               The group for diskette (fd)
    47 54               The group for tape (rmt)
    47 43               The group for CD-ROM (cd)
    47 4b               The group for Bus Attached Disk (badisk)
    47 49               The group for SCSI Disk (scdisk)
    0000018731ac625f    The physical volume ID of the hard disk
  2. To invalidate the Service bootlist, enter this command:
    # bootlist -m service -i
    
  3. To set the Service bootlist to tape only, enter this command:
    # bootlist -m service rmt
    
  4. To set the Service bootlist back to the default, enter this command:
    # bootlist -m service fd rmt cd badisk scdisk
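
The group codes in the sample service bootlist are two-byte ASCII tags (47 46 is "GF", 47 54 is "GT", and so on). As a reading aid, the following sketch maps the byte pairs listed in the explanation above to the device group names accepted by the bootlist command. The function name bootlist_group is hypothetical, and the sketch covers only the five codes shown in this section.

```shell
# bootlist_group BYTE1 BYTE2
# Translate a two-byte group code from the bootlist dump into the
# group name used on the bootlist command line.
# (Hypothetical helper; covers only the codes listed in this section.)
bootlist_group() {
    case "$1 $2" in
        "47 46") echo fd ;;        # G F  diskette
        "47 54") echo rmt ;;       # G T  tape
        "47 43") echo cd ;;        # G C  CD-ROM
        "47 4b") echo badisk ;;    # G K  bus attached disk
        "47 49") echo scdisk ;;    # G I  SCSI disk
        *) echo unknown; return 1 ;;
    esac
}
```

Reading the sample dump left to right, `bootlist_group 47 46`, `bootlist_group 47 54` and `bootlist_group 47 43` give fd, rmt and cd, which matches the start of the default service bootlist.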
    

Procedure to View and Change Normal Mode Bootlist

This section gives the commands to view and change the normal mode bootlist. A sample bootlist output is included with an explanation of the values.
  1. To view the Normal bootlist, enter this command:
    # bootlist -m normal -r
    



    ------------------- NORMAL IPLIST at address 0xa00224 ---------------
    4a 4d 19 56  0  0  1 87 31 ac    J M - V - - - - 1 -
    62 5f  0  0  0  0  0  0  0  0    b - - - - - - - - -
    53 20 49  7  7  1  0  0  0  0    S - I - - - - - - -
     0  0  0  0  0  0  0  0  0  0    - - - - - - - - - -
     0  0  0  0  0  0  0  0  0  0    - - - - - - - - - -
     0  0  0  0  0  0  0  0  0  0    - - - - - - - - - -
     0  0  0  0  0  0  0  0  0  0    - - - - - - - - - -
     0  0  0  0  0  0  0  0  0  0    - - - - - - - - - -
     0  0  0  0                      - - - -

    Figure: Sample Normal Mode bootlist

    An explanation of the output from Figure - Sample Normal Mode bootlist follows:
    4a 4d The start of the Normal Boot List
    0000018731ac625f The physical volume ID of hard disk
  2. To invalidate the Normal bootlist, enter this command:
    # bootlist -m normal -i
    
  3. To set the Normal bootlist to scdisk only, enter this command:
    # bootlist -m normal scdisk
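
The dump prints each byte without a leading zero, so the physical volume ID in the explanation has to be reassembled by padding every byte to two hex digits. The sketch below does that for the eight bytes 0 0 1 87 31 ac 62 5f from the sample. The function name pvid_from_bytes is hypothetical, and locating those eight bytes in the dump is left to the reader.

```shell
# pvid_from_bytes BYTE...
# Join hex bytes as printed in the bootlist dump into one string,
# padding single-digit bytes with a leading zero.
# (Hypothetical helper; not part of AIX.)
pvid_from_bytes() {
    out=
    for b in "$@"; do
        case ${#b} in
            1) out="${out}0${b}" ;;   # "1" becomes "01"
            *) out="${out}${b}" ;;    # already two digits
        esac
    done
    echo "$out"
}
```

Running `pvid_from_bytes 0 0 1 87 31 ac 62 5f` yields 0000018731ac625f, the physical volume ID given in the explanation of the sample output.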
    

Procedure to View and Change bootlist Using Diagnostic Aids

Diagnostic programs provide an easier way to view and alter the normal and service mode bootlists. Three ways to invoke the diagnostic programs in AIX V3.2 are given below:

In all the cases, after loading the diagnostic programs follow the procedure given below to view or alter the bootlist.


**** NOTE: **** Please note that the above procedure has been tested on an AIX 3.2.5 system. The procedure of displaying the bootlist from a standalone diagnostics program has been tested in earlier levels of AIX.

Do not change the bootlist unless it is necessary. When the system is installed, the bootlist is set to boot from the correct hard disk. The system administrator may view and record the bootlist values, but changing the bootlist indiscriminately, for either normal or service mode, could leave the system unable to boot.


Understanding the Network Boot Process

AIX diskless clients require a boot image and access to the AIX file systems, similar to a standalone machine. The network boot consists of three major parts, which are:

See Figure - ROS Kernel Init Phase for an outline of the kernel boot process. In a network boot, the bootlist will indicate that the boot device is a network device. The client machine ROS code broadcasts a boot request packet through a network boot device (network interface) to a boot server. The boot server uses the bootpd daemon to accept the request packet and find the client's boot code in the bootptab file. The server bootpd daemon sends a reply packet back to the client that contains the location of the client's boot file. The client ROS code then reads and loads the boot file from the server. Control is passed on to the kernel similar to a standalone boot.

IPL Phase 1 and IPL Phase 2

In all modes, ROS passes control to the kernel. In IPL phase 1 and phase 2, the ssh program (simple shell) is executed. This simple shell is designed to operate in the RAM file system and uses no shared libraries. The bosboot command places a copy of the /sbin/rc.boot script file in the boot logical volume.

IPL Phase 1

The first command in rc.boot is the restbase command. During restbase, the base customize configuration information is read from the boot image. The ODM database is populated with the base customize information and the configuration manager is invoked in phase 1. This locates the system bus and determines the type of adapters present on the machine. Any adapters that have ODM entries in the predefined configuration database are configured into the system. In the case of a disk boot, all disks in the system are now configured. The bootinfo command is executed to get the ipldevice location from the IPL control block, and a link is made to /dev/ipldevice.

IPL Phase 2

The ipl_varyon routine is executed to vary on the root volume group (rootvg). Once rootvg is varied on, paging is started. The fsck command is run on /dev/hd4 (root file system). Running fsck causes logredo to execute and replay the jfslog on the volume group. The root file system from the hard disk is then mounted over /mnt and the PATH variable is changed to include the executables on the hard disk as well. Using the /etc/filesystems information from the hard disk, the /usr file system is mounted. Logical Volume Manager (LVM) information is copied to the hard disk. The directory /dev is recovered from any maintenance work.

Then mergedev is run, which ensures that the major and minor numbers of the devices configured to this point match the devices in the /dev directory on the hard disk. This allows the devices to remain in use after the root is moved to the hard disk. The final merge step is to merge the current ODM database entries into the ODM on the hard disk. The temporary mounts are then undone, and / and /usr are mounted properly. Phase 2 is now complete.

Newroot

At the end of phase 2 of rc.boot, the simple shell that was working as init exits. Any time init dies, the kernel attempts to respawn it, since init is the parent of all processes in an AIX system. In the meantime, however, the root file system from the hard disk has been mounted over the / mount point. When the kernel respawns init, it runs init from the root file system on the hard disk. The first time init executes, it performs the newroot functions. The following events will occur:

When init begins executing, it processes the /etc/inittab file.

Description of /etc/inittab

The different stanzas of /etc/inittab are described below. Only the major functions of the contents are described. For a detailed discussion of the format of the /etc/inittab file, refer to InfoExplorer*.

stanza brc

The stanza brc starts execution at the beginning of /etc/inittab. If /etc/inittab is missing on AIX V3.2 then init will run brc directly. This causes /sbin/rc.boot 3 to be executed and the following tasks are performed:

The next step is to run configuration manager (cfgmgr) phase 2. At this time the full configuration database is available and cfgmgr completes the system configuration. This configures all attached devices and customizes all user specified devices. The command cfgcon is executed to configure the system console. All system configuration is complete at this time, and savebase is executed to save the base customize information to the boot image. The message Saving Base Customize Data to boot disk appears on the system console.

The stanza brc will perform the following steps and exit:

For further details of the service boot or the network boot please refer to the information in the /sbin/rc.boot script file.

stanza rc

At the end of brc, stanza rc runs /etc/rc. This performs the following functions:

At the end of the stanza rc, the message Multi-user Initialization completed is displayed on the console.

System Resource Controller srcmstr

The srcmstr daemon is the System Resource Controller, which is also called the source master. The srcmstr daemon starts and controls subsystems such as qdaemon, tcpip, nfs, sna and others. The stanza for srcmstr must precede any other stanzas in inittab that use srcmstr.

See Chapter 13, "System Resource Controller & Subsystems" in System Management Guide for AIX Version 3.2 for further details on srcmstr.

Network and Other Subsystem Stanzas

The tcpip and nfs subsystems are started through the System Resource Controller (startsrc) from the inittab file. The stanzas rctcpip and rcnfs run the /etc/rc.tcpip and /etc/rc.nfs scripts to start the corresponding subsystems. A separate stanza starts the sna subsystem using the script in the file /etc/rc.sna. Each of these script files can be modified to suit the user's requirements.

Console and Terminal Logins

The stanza console displays the login prompt on the system console. The stanza getty starts the getty process, which puts the login prompt on each serial attached terminal.
**** NOTE: **** If the console is a serial attached terminal (tty), the Enable LOGIN attribute for this tty in the ODM must be set to disable, not enable.
# lscons
/dev/tty0
# lsattr -E -l tty0 -a login
login enable Enable LOGIN True
# chdev -l tty0 -a login=disable
tty0 changed
# lsattr -E -l tty0 -a login
login disable Enable LOGIN True

These commands allow you to check the console device name, to show the current login attribute value for this device, to change it to the correct value, and to display it again.

The output of the lsattr command shows the ODM attribute name (login), its value (enable or disable in the example), its description (the same text that appears in the menu of the smit chgtty command), and whether the attribute can be changed by the user.

If the console tty is set to enable, then /etc/inittab will run two getty processes on it: one from the console stanza and another from the tty stanza. The result may be garbage on the screen or trouble when you log in to the system. It may look like a cable wiring problem. Once again, this problem only occurs on the console tty.
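
The check shown in the note can be scripted by parsing the lsattr output. The sketch below assumes the output layout shown in the example above (attribute name in the first field, value in the second). The function names login_value and console_login_ok are hypothetical.

```shell
# login_value: read `lsattr -E -l ttyN -a login` output on stdin and
# print the value of the login attribute (second field).
# (Hypothetical helper; assumes the output layout shown in this section.)
login_value() {
    awk '$1 == "login" { print $2 }'
}

# console_login_ok: succeed only when the login attribute is "disable",
# which is the setting required for a serial console.
console_login_ok() {
    [ "$(login_value)" = "disable" ]
}
```

On a system where lscons reports /dev/tty0, a possible use is `lsattr -E -l tty0 -a login | console_login_ok || echo "login must be disabled on the console tty"`.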


stanzas cron, qdaemon and writesrv

In the /etc/inittab file there are separate stanzas to start the cron daemon, qdaemon and the writesrv daemon. The cron daemon controls all cron jobs. The qdaemon handles all print spooler requests. The writesrv daemon allows users to exchange messages with remote users using the write command.

The process init will follow inittab until all processes as defined in each stanza are started and then the control is passed to the user. The login prompt is displayed and the users can start using the system.

User Defined Stanzas

You can add stanzas to the /etc/inittab file to start any program, application or shell script at boot time.

You can refer to InfoExplorer to see further details about /etc/inittab file format.
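
Because a malformed /etc/inittab can prevent the system from booting, it is worth checking the file after adding a stanza. The sketch below assumes the usual four colon-separated fields (identifier, run level, action, command) and follows this document's convention that comment lines begin with ":" and not with "#". The function name check_inittab is hypothetical, and the check is deliberately shallow: it does not validate run levels or action keywords.

```shell
# check_inittab FILE
# Print the line number of every entry that does not look like
# id:runlevel:action:command. Exit status is non-zero if any are found.
# (Hypothetical helper; a shallow syntax check only.)
check_inittab() {
    awk -F: '
        /^:/ || /^[ \t]*$/ { next }    # comment (leading ":") or blank line
        /^#/ || NF < 4 || $1 == "" {   # "#" comment, too few fields, or no id
            print NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

A possible use is to copy /etc/inittab to a scratch file, add the new stanza there, run `check_inittab` on the copy, and only then install it.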

Common Failure Modes and Recovery

In the following sections, the most common modes of failure during the boot process and possible recovery steps are discussed. Most boot failures on a previously working system are symptoms of some other problem. For example let us assume that rootvg is installed across multiple hard disks. When the system is rebooted, if one of the hard disks is not turned on, the varyon of rootvg will result in a 552 LED displayed. While applying PTFs, the command bosboot is run for some PTFs. If bosboot failed to complete and the warning message is ignored, then the system may not reboot. This problem may show up only when a reboot is attempted. Similarly if some changes are made in the /sbin/rc.boot file, that will be reflected only in a later bosboot and reboot.

To diagnose a system that hangs at boot time, you can add commands to the /sbin/rc.boot file from maintenance mode, after running the getrootfs command. This helps you determine the exact step in the script that fails in normal mode. For example, the command /usr/lib/methods/showled displays the number you specify on the LED display; the argument 0x777 displays 777. If you insert your test command inside the Phase 2 boot - disk paragraph, you must run the command bosboot -a -d /dev/hdiskn after your change so that it takes effect at the next boot. Replace hdiskn with the name of the bootable hard disk. If you change only the Phase 3 boot - disk paragraph, you do not need to run the bosboot command.

Procedure to Run getrootfs in Maintenance Mode

Given below is a procedure to boot the system in service mode and run getrootfs. This procedure must be run before attempting most of the recovery procedures that follow this section. When completed successfully, it attaches the / and /usr file systems and allows commands from these directories to be executed from the maintenance shell.


**** Caution **** In the examples in this section hdisk0 is used as the device for the commands getrootfs and bosboot. In most of the systems hdisk0 contains the boot logical volume and also holds the volume group descriptors for other hard disks in the rootvg. It is quite possible to boot from hard disks other than hdisk0 as well.

The maintenance boot may result in disks having different numbers as compared to those during normal boot, especially if you have external hard disks. During service boot, no base customized data is available and all devices are configured as they are encountered. The correct disk name must be used in the maintenance procedures for the suggested commands to function properly.



**** Warning **** Never run the command getrootfs if the system has been booted from the hard disk, in normal mode.
  1. Turn the key to the Service position.
  2. With bootable media of the same version and level as the system, reset the system.

    You may boot the system in maintenance mode from any one of the following:

    Follow the prompts to the installation/maintenance menu.
  3. If you get a LED 551 or 552 on this step, or if the maintenance menu is not coming up then any one of the following could be the cause:
  4. In the AIX Installation and Maintenance menu choose the ID number that corresponds to Start a limited function maintenance shell.
  5. Determine the value of n to be used in the command getrootfs hdiskn. If you have only one disk, then hdiskn=hdisk0 in the above command. If you have more than one disk, run the command:
    # lqueryvg -p hdiskn -At | grep hd5
    
    for each hdiskn (hdisk0, hdisk1, and so on) until you get output that looks like:
                      00005264feb3631c.2  hd5 1

    The exact output you get will be different. The logical volume identifier will correspond to that of your system in this output.

    You may find more than one disk has this output. These will be the disks in the rootvg volume group. If you get hdisk0 and hdisk1 then use hdisk0. This step is to identify the hard disk on which the boot logical volume is located.

  6. When the system is operational, you may run the following commands to record the placement of hd5. The command lspv lists the PV IDs of the hard disks. The command lslv -m hd5 lists the physical hard disk that holds the logical volume hd5. These commands run only after a normal mode boot, or in maintenance mode after getrootfs has completed successfully.
  7. Now access the rootvg volume group by running:
    # getrootfs hdiskn
    
    The command getrootfs is a shell script in the file /usr/sbin/getrootfs.

The command syntax is as follows:

# getrootfs hdisk<n> <command>

where:
<n> = The hard disk containing the boot logical volume
<command> = The command you specify here will be run after the root
volume group is imported, but before the root file system is mounted.
This feature could be used for maintenance purposes.

Two examples of invoking the command are given below. The command getrootfs hdiskn exit will stop getrootfs before mounting any file system.
The command getrootfs hdiskn sh opens another shell before the root file system is mounted. You can perform commands like fsck to check and repair the file systems before mounting them. When you exit the shell, the command getrootfs continues with the process of mounting the file systems.


**** NOTE: **** In maintenance mode, your terminal type is not set. If you want to use SMIT, the pg command or the vi editor, you must set the TERM variable to the correct value and export it.

See the chapter on Recovery Procedures in AIX Installation Guide for additional information.

Recovery from Different LED codes

The following sections discuss the recovery procedures to try when your system cannot boot in normal mode. Each section explains the usual causes of a particular LED code and describes how to repair the system so that it boots properly in normal mode.

Recovery from LED 201

The known causes of LED 201 during IPL on a RISC System/6000 are:

The boot logical volume can become corrupted if the / (root) or /tmp file system is full when the command bosboot is run. When you install PTFs, the command bosboot is run by the installation process.

Recovery Procedure for LED 201

To recover from LED 201, you need to boot into maintenance mode, check the /dev/hd3 and /dev/hd4 file systems for space, erase files if necessary and rebuild the boot image. Then you need to check the error log for check stop errors. Follow the steps given below:

  1. See Procedure to Run getrootfs in Maintenance Mode for booting a system under service mode and running the command getrootfs. The steps in this section allow you to mount the file systems /, /tmp and /usr and run the commands from these directories.
  2. Use the following command to check for free space in the / and /tmp file systems:
    # df /dev/hd4
    # df /dev/hd3
    
  3. If the output shows that either file system is out of space, erase unwanted files from that file system. The files you may want to erase from / are /smit.log, /smit.script, and /.sh_history. If you do not want to erase them, you may move them to another file system that has free space.
  4. Determine the disk containing the boot logical volume with the following command:
    # lslv -m hd5
    
    The boot disk will be shown in the PV1 column in the output.
  5. Recreate the boot image with the following command:
    # bosboot -a -d /dev/hdiskn
    

    **** NOTE: **** In this command replace hdiskn with the boot hard disk, for example hdisk0.
  6. Check the error log for hardware check stop errors with the command:
    # errpt -a | pg
    

    Look for the words check stop. If you find them, contact the IBM Support Center for assistance.

  7. With the key in Normal position, shutdown and reboot the system with the command:
    # shutdown -Fr
    
  8. If LED 201 still occurs when you reboot your system, contact the IBM Support Center.

Recovery from LED 223-229, 225-229, 233-235, 221-229 or LED 221

An alternating LED 223/229, alternating LED 225/229, alternating LED 233/235, alternating LED 221/229, or solid LED 221 occurs in AIX V3.2 on the RISC System/6000 when the system cannot locate the boot image.

To recover from this problem, rebuild the bootlist with the following steps:

  1. Turn the key to the Service position.
  2. While booting from the hard disk, watch for the light on the diskette drive. When you see the light, turn the key to the Normal position.
  3. Did the machine boot successfully?
    Yes
    Go to Step 4.
    No
    You need to boot the system in Service mode and run the command getrootfs. See Procedure to Run getrootfs in Maintenance Mode for this procedure. Then go to Step 5.
  4. Log in as root.
  5. Use the bootlist command to rebuild the bootlist:
    # bootlist -m normal scdisk badisk
    
  6. With the key in the Normal position, shutdown and reboot the system with the command:
    # shutdown -Fr
    
  7. Did the machine boot successfully?
    Yes
    You have completed the procedure.
    No
    If you still get LEDs 223-229, 225-229, 233-235, 221-229, or 221, you may have a more serious problem, such as a missing boot logical volume or a defective hard disk. Contact the IBM Support Center for assistance.

Recovery from LED 551

The known causes of LED 551 during IPL on a RISC System/6000 are:

Recovery Procedure

To recover from LED 551, you will need to boot the system in the maintenance mode and run logform on /dev/hd8. Then run fsck to check and correct the file systems. Follow the steps given below:

  1. See Procedure to Run getrootfs in Maintenance Mode for booting the system under Service mode and running the command getrootfs hdiskn sh. The steps in this section allow you to import the rootvg volume group and to open a shell before the root file system is mounted.
  2. Format the default jfslog for the rootvg jfs file systems. Run the following command to format jfslog.
    # logform /dev/hd8
    
    Answer yes when asked if you want to destroy the log.
  3. Next, run the following commands to check and repair file systems:
    # fsck -y /dev/hd1
    # fsck -y /dev/hd2
    # fsck -y /dev/hd3
    # fsck -y /dev/hd4
    # fsck -y /dev/hd9var
    
    The -y option in the above commands gives the permission to the command to repair file systems when necessary. Note that you have to use the command getrootfs hdiskn sh in the getrootfs procedure to run fsck before mounting the file systems. You cannot run fsck on mounted file systems.
  4. Type exit. The file systems will automatically mount.
  5. If you are running the Andrew File System** (AFS), use the following commands to save the AFS file system helper and replace it with the original file system helper:
    # cd /sbin/helpers
    # copy v3fshelper v3fshelper.afs
    # copy v3fshelper.orig v3fshelper
    
  6. Determine the boot disk with the following command:
    # lslv -m hd5
    

    The boot disk will be shown in the PV1 column of the output.

  7. Recreate the boot image with the following command:
    # bosboot -a -d /dev/hdiskn
    

    **** NOTE: **** In this command replace hdiskn with the boot hard disk, for example hdisk0.
  8. If you are running the Andrew File System (AFS), copy the AFS file system helper back, with the command:
    # copy v3fshelper.afs v3fshelper
    
  9. With the key in Normal position, shutdown and reboot the system with the command:
    # shutdown -Fr
    

    If you followed all of the above steps and the system still stops at an LED 551 during a reboot in Normal mode, please contact the IBM Support Center for further assistance.

Recovery from LED 552

The known causes of LED 552 during IPL on a RISC System/6000 are:

Recovery Procedure for LED 552

To recover from LED 552, you will need to boot the system in the maintenance mode. Then run fsck to check and correct the file systems. Follow the steps given below.

  1. See Procedure to Run getrootfs in Maintenance Mode to boot the system in service mode and run the command getrootfs hdiskn sh. You need to run the fsck command before the file systems are mounted. Replace hdiskn with the name of your bootable hard disk.
  2. Next, run the following commands to check and repair file systems:
    # fsck -y /dev/hd1
    # fsck -y /dev/hd2
    # fsck -y /dev/hd3
    # fsck -y /dev/hd4
    # fsck -y /dev/hd9var
    
    The -y option gives the permission to the command to repair file systems when necessary. Note that you have to use the command getrootfs hdiskn sh in the getrootfs procedure to run fsck before mounting the file systems. You cannot run fsck on mounted file systems.

    Examine the output of fsck. Check the error condition and take the following actions.

    Successful file system Check

    If fsck was successful on all file systems, go to Step 4 to reboot the system.

    Could not read block 8

    The file system is unrecoverable. The only way to fix an unrecoverable file system is to recreate it. This involves deleting it from the system and restoring it from a backup. Note that hd2 (/usr) and hd4 (/) cannot be recreated. If either of these file systems are unrecoverable, you must reinstall AIX. For assistance with unrecoverable file systems, contact the IBM Support Center. You have completed the procedure.

    Unrecognized file system

    Block 8 could be read, but you get an error that the file system is not a recognized AIX file system. Attempt to repair the file system with these commands:

    # dd count=1 bs=4k skip=31 seek=1 if=/dev/hd# of=/dev/hd#
    # fsck /dev/hd# 2>&1 | tee /tmp/fsck.errors
    

    In this command, /dev/hd# is the name of your bad logical volume. You may look at the file /tmp/fsck.errors to see what corrections fsck states need to be made.

    Problems in logredo

    fsck reports a file system with an unknown log record type, or fsck fails in the logredo process. Either symptom indicates that the jfslog logical volume is corrupted. Use the following command to reformat it:

    # logform /dev/hd8
    
  3. Repeat Step 2 to run fsck for all file systems that did not successfully complete fsck the first time. If fsck fails a second time, the file system is unrecoverable and has to be recreated. If the unrecoverable file system is either hd4 (/) or hd2 (/usr), then AIX needs to be reinstalled. If the file system check is successful, go to Step 4.
  4. With the key in Normal position, run the following commands to reboot the system:
    # exit
    # sync;sync;sync
    # shutdown -Fr
    

    Did the system reboot normally ?

    Yes
    You have completed the procedure.
    No
    Boot in maintenance mode again, run getrootfs hdiskn sh again, and continue with Step 5.
  5. For AIX V3.2 only, run the following commands, which will remove most of the system's configuration, saving it to a backup directory. You will follow this step, when all other steps have failed:
    # mount /dev/hd4 /mnt
    # mount /dev/hd2 /usr
    # mkdir /mnt/etc/objrepos/bak
    # cp /mnt/etc/objrepos/Cu* /mnt/etc/objrepos/bak
    # cp /etc/objrepos/Cu* /mnt/etc/objrepos
    # /etc/umount all
    # exit
    

    Determine the boot disk with the following command:

    # lslv -m hd5
    
    The boot disk will be shown in the PV1 column of the output.

    Save the clean ODM database to the boot logical volume with the command:

    # savebase -d /dev/hdisk0
    
    In the above command substitute hdisk0 with your boot disk if it is different. If you are running AFS, go to Step 6. Otherwise, go to Step 7.
  6. If you are running the Andrew File System (AFS), use the following commands to save the AFS file system helper, replace it with the original file system helper, rebuild the boot logical volume, and copy the AFS file system helper back:
    # cd /sbin/helpers
    # cp v3fshelper v3fshelper.afs
    # cp v3fshelper.orig v3fshelper
    # bosboot -a -d /dev/ipldevice
    # cp v3fshelper.afs v3fshelper
    
  7. Turn the key to Normal position and run the following commands to shutdown and reboot the system:
    # shutdown -Fr
    

If you followed all of the above steps and the system still stops at an LED 552 during a reboot in the Normal mode, please contact the IBM Support Center.

For reasons of time and the integrity of your AIX operating system, the best alternative at this point may be to reinstall AIX.

Recovery from LED 553

The known causes for LED 553 during IPL on RISC System/6000 are:

Recovery Procedure for LED 553

To recover from LED 553, you will need to boot the system in the maintenance mode. Then check /dev/hd3 and /dev/hd4 for space problems. Check /etc/inittab file for corruption. Then check the shell profiles, the /bin/bsh file, and some other files. Follow the steps given below.

  1. See Procedure to Run getrootfs in Maintenance Mode to boot the system in service mode and run the command getrootfs. The steps in that section allow you to mount the file systems /, /tmp and /usr and run commands from these directories.
  2. Check for free space in / and /tmp file systems with the following commands:
    # df  /dev/hd4
    # df  /dev/hd3
    
  3. If the output showed that either file system is out of space, erase some files from that file system. Three files you may want to erase from / are /smit.log, /smit.script, and /.sh_history (or move them to another file system).
  4. Next, check the /etc/inittab file for corruption. It may be empty or missing, or it may have an incorrect entry. For comparison, see Figure - Example of /etc/inittab for a sample file. Permissions on this file should be -rw-------, owned by root, group system.
  5. If the inittab file is corrupt, set your terminal type in preparation for editing the file.
    # TERM=xxx
    # export TERM
    
    In the above command xxx is a terminal type, such as hft, ibm3151, or vt100. Now use an editor to create the /etc/inittab file.
    You may refer to Figure - Example of /etc/inittab for this purpose.
    **** NOTE: **** All the stanzas in /etc/inittab that begin with ":" are comments. You don't have to create these entries in your new file. Comment lines must not begin with a "#".
  6. Check the following files for any modification or problems with permission.
    # ls -al /.profile /etc/environment /etc/profile
    -rw-r--r--   1 root      system     ...        /.profile
    -rw-rw-r--   1 root      system     ...        /etc/environment
    -rw-r--r--   1 root      system     ...        /etc/profile
    

    One of these files may contain a command that is valid only in the Korn shell. Change the command to something that is also valid in the Bourne shell. For example, change:

    # export PATH=/bin:/usr/bin/:/etc:/usr/ucb:.
    
    to:
    # PATH=/bin:/usr/bin/:/etc:/usr/ucb:.
    # export PATH
    

    **** NOTE: **** The /etc/environment file should contain only variable assignments.
  7. Make sure the following files and directory are not missing or moved to some other directories:
    /bin/sh
    /bin/bsh
    /bin
    If all of the above files are missing, check if the problem is a missing symbolic link for /bin. Use the following command to link /bin to /usr/bin:
    # ln -s /usr/bin /bin
    
  8. Make sure the following are not missing or corrupt:
    # ls -al /etc/fsck /usr/sbin/fsck /sbin/rc.boot
    lrwxrwxrwx   1 root     system   ...  /etc/fsck -> /usr/sbin/fsck
    -r-xr-xr-x   1 root     system   ...  /usr/sbin/fsck
    -rwxrwxr--   1 root     system   ...  /sbin/rc.boot
    

    Use the following command to check the integrity of bos.obj:

    # lppchk -f bos.obj
    

    The output of this command will indicate if any files are missing or corrupt.
  9. If you have not found any obvious problems, try substituting ksh for bsh with the following commands. The first command saves your bsh before you replace it:
    # cp /bin/bsh /bin/bsh.orig
    # cp /bin/ksh /bin/bsh
    

    If you can then reboot successfully, you know that one of the profiles was causing problems for bsh. Check the profiles again by running the following:

    # /bin/bsh.orig /.profile
    # /bin/bsh.orig /etc/profile
    # /bin/bsh.orig /etc/environment
    

    If you receive errors with any of the above commands, you know there is a command in that profile that bsh cannot handle.
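The export NAME=value construct shown in step 6 can also be located without running the profiles. The following is a minimal sketch, assuming that this one construct is the suspect; it does not detect any other Korn shell syntax. The function name find_ksh_exports is hypothetical.

```shell
# find_ksh_exports FILE...
# Prints "file:line: text" for every line of the form "export NAME=value",
# which the Korn shell accepts but the Bourne shell (bsh) does not.
# This catches only that one construct; other ksh-only syntax is not detected.
find_ksh_exports() {
    awk '/^[ \t]*export[ \t]+[A-Za-z_][A-Za-z0-9_]*=/ {
        printf "%s:%d: %s\n", FILENAME, FNR, $0
    }' "$@"
}
```

A possible invocation is find_ksh_exports /.profile /etc/profile /etc/environment. Each reported line can then be split into an assignment followed by a separate export, as shown in step 6.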

If you followed all of the above steps and the system still stops at LED 553 during a reboot in the Normal mode, please contact the IBM Support Center. For reasons of time and the integrity of your AIX operating system, the best alternative at this point may be to reinstall AIX.


: @(#)49  1.28  com/cfg/etc/inittab, bos, bos320 10/3/91 10:46:51
: COMPONENT_NAME: CFGETC
:
: ORIGINS: 3, 27
:
: (C) COPYRIGHT International Business Machines Corp. 1989, 1990
: All Rights Reserved
: Licensed Materials - Property of IBM
:
: US Government Users Restricted Rights - Use, duplication or
: disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
:
: Note - initdefault and sysinit should be the first and second entry.
:
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail >/dev/console 2>&1 # d51225
rc:2:wait:/etc/rc > /dev/console 2>&1 # Multi-User checks
fbcheck:2:wait:/usr/lib/dwm/fbcheck >/dev/console 2>&1 # run /etc/firstboot
srcmstr:2:respawn:/etc/srcmstr # System Resource Controller
rctcpip:2:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
rcnfs:2:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cons:0123456789:respawn:/etc/getty /dev/console
piobe:2:wait:/bin/rm -f /usr/lpd/pio/flags/* # Clean up printer flags files
cron:2:respawn:/etc/cron
qdaemon:2:wait:/bin/startsrc -sqdaemon
writesrv:2:wait:/bin/startsrc -swritesrv
uprintfd:2:respawn:/etc/uprintfd
rcncs:2:wait:sh /etc/rc.ncs
infod:2:once:startsrc -s infod
afs:2:wait:/etc/rc.afs > /dev/console 2>&1 # Start afs
tty0:2:off:/etc/getty /dev/tty0

Figure: Example of /etc/inittab
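A rebuilt /etc/inittab can be given a rough sanity check before rebooting. The following is a minimal sketch that tests only two properties visible in the figure: each entry has the form id:runlevels:action:command, and the first and second entries use the actions initdefault and sysinit. It does not validate run levels, actions, or commands. The function name check_inittab is hypothetical.

```shell
# check_inittab FILE
# Prints one message per problem found and returns non-zero if there is any.
# Checks only two things taken from the example figure:
#   - every non-comment entry has the form id:runlevels:action:command
#   - the first and second entries use the actions initdefault and sysinit
check_inittab() {
    awk -F: '
        /^:/ || /^[ \t]*$/ { next }          # skip comment stanzas and blank lines
        {
            n++
            if (NF < 4) {
                printf "line %d: expected id:runlevels:action:command\n", NR
                bad = 1
            }
            if (n == 1 && $3 != "initdefault") {
                print "first entry is not initdefault"; bad = 1
            }
            if (n == 2 && $3 != "sysinit") {
                print "second entry is not sysinit"; bad = 1
            }
        }
        END {
            if (n == 0) { print "no entries found"; bad = 1 }
            exit bad + 0
        }' "$1"
}
```

Running check_inittab /etc/inittab prints nothing and returns 0 when both checks pass. A clean result does not prove that the file is correct, only that it is not empty or obviously malformed.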

Recovery from LED 727

In AIX V3 on the RISC System/6000, the system may hang on LED 727 during a boot in Normal mode if devices are defined on a concentrator but are not attached. This problem is seen only on the 64-port async adapter card. LED 727 typically appears when the system is rebooted after an administrator has disconnected a concentrator from the async adapter, or after the adapter itself has been removed or moved to another location. In such cases, the tty and serial printer devices become unavailable. This is a known problem. Refer to the IBM Support Center for more information on APAR ix23405.

Recovery Procedure for LED 727

To recover from LED 727, list all the ttys and printers with the command lsdev after booting the system in the maintenance mode. Then, for each device listed as defined, remove the definition or attach the device.
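Picking out the devices that need attention can be automated. The following is a minimal sketch, assuming lsdev output in the layout used by the examples in this section: name, state, location, description. It treats any location beginning with 00-00- as a native port and skips it. The function name defined_devices is hypothetical.

```shell
# defined_devices
# Reads "lsdev -Cc tty" or "lsdev -Cc printer" output on stdin and prints the
# name and location of each device that is in the defined state and is not on
# a native port (location 00-00-...), following the examples in this section.
defined_devices() {
    awk 'tolower($2) == "defined" && $3 !~ /^00-00-/ { print $1, $3 }'
}
```

For example, lsdev -Cc tty | defined_devices prints one line per tty that still needs to be attached or removed. The state is compared without regard to case, because the capitalization may differ from the examples shown here.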

  1. See Procedure to Run getrootfs in Maintenance Mode to boot the system in Service mode and run the getrootfs command. The steps in that section allow you to mount the /, /tmp, and /usr file systems and run commands from those directories.
  2. Set the ODMDIR and TERM environment variables in preparation for using SMIT in later steps (xxx is a terminal type, such as hft, ibm3151, or vt100):
    # ODMDIR=/etc/objrepos
    # TERM=xxx
    # export ODMDIR
    # export TERM
    
  3. To list all ttys and printers, run the following commands:
    # lsdev -Cc tty
    # lsdev -Cc printer
    
  4. For each asynchronous concentrator device that is shown as defined, either remove its definition or attach it so that it becomes available.

    An example is given in the following screen:


    # lsdev -Cc tty
    tty0 available 00-00-S1-00 Asynchronous Terminal
    tty3 available 00-07-01-13 Asynchronous Terminal
    tty4 defined 00-07-04-15 Asynchronous Terminal

    In the above example, you don't need to do anything for tty0 because it is on the native serial port 1, or for tty3, because it is available. For tty4, you need to either remove the definition or attach a terminal device to concentrator port 00-07-04-15.

    Consider the following example for a serial printer:


    # lsdev -Cc printer
    lp0 available 00-00-0P-00
    lp3 available 00-07-01-13
    lp4 defined 00-07-04-15
    lp1 defined 00-07-02-11

    You do not have to do anything for lp0, because it is on the parallel port, or for lp3, because it is available. For lp4 and lp1, you need to either remove the definitions or attach printers to concentrator ports 00-07-04-15 and 00-07-02-11, respectively.

  5. If you decide to attach the devices, you can do so through the SMIT interface with the command smit device. When you are finished, reboot in the Normal mode.

    If you have successfully attached the devices and rebooted properly in the normal mode, then you have completed the procedure.

    If the system still hangs on LED 727, or if you do not have the devices to attach to the defined ports, then you must remove these device definitions from the system.

  6. You can remove the devices through the SMIT interface with smit device or by using the following commands:
    # rmdev -l lp# -d
    
    or
    # rmdev -l tty# -d
    

    Note: Replace # with the appropriate number in the above commands.

    If you have successfully removed the devices, go to Step 10.

    You may receive the following error when you try to remove the devices:

    0514-039 error unloading kernel extension
    Method error: /etc/methods/ucftty
    
  7. If you receive the above errors, create a file called del_dev consisting of the following lines:
    odmdelete -q "name=${1}" -o CuAt
    odmdelete -q "name=${1}" -o CuDv
    odmdelete -q "value3=${1}" -o CuDvDr
    

    Save the file and exit the editor.

  8. Run the following command to change the file type to executable:
    # chmod +x del_dev
    
  9. Run the command ./del_dev xxx from the directory in which you created the file, where xxx is the device definition you want to remove (for example, lp1).
  10. After you have removed all the unnecessary devices, be sure to enter the command: savebase.
  11. Now, switch the key to Normal, remove the diskette or tape from which you booted, and run the following command to shut down and reboot the system in the Normal mode:
    # shutdown -Fr
    

    **** NOTE: **** If all else fails, and you need to get your system running, you can turn the system off and remove the 64-port adapter card. Then reboot. For assistance with removing the card contact the IBM Support Center.
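Because the commands in the del_dev file from step 7 modify the ODM directly, it can help to preview them before running them. The following is a minimal sketch that only prints the three odmdelete commands for a given device name and refuses to continue if no name is given. It runs nothing itself. The function name show_del_dev is hypothetical.

```shell
# show_del_dev NAME
# Prints, without running them, the three odmdelete commands that the del_dev
# file in step 7 would run for the device NAME.
show_del_dev() {
    if [ -z "$1" ]; then
        echo "usage: show_del_dev device_name" >&2
        return 1
    fi
    for class in CuAt CuDv; do
        echo "odmdelete -q \"name=$1\" -o $class"
    done
    echo "odmdelete -q \"value3=$1\" -o CuDvDr"
}
```

For example, show_del_dev lp1 lists the commands that would remove lp1 from the CuAt, CuDv, and CuDvDr object classes, so that the device name can be verified first.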

Recovery from LED C31

The known causes of LED C31 during IPL on a RISC System/6000 are:

Recovery Procedure for LED C31

To recover from LED C31, first check the cable connections for the console. If that is not the problem, boot the system in the maintenance mode and change the console definition. Follow the steps given below.

  1. Check for a non-terminal device plugged into native serial port S1 or S2. These ports are at the back of the system unit on a RISC System/6000. If there is no such device, go to Step 2. If there is one, unplug it or turn it off, then reboot the machine with the key in Normal position. If the system reboots without stopping at LED C31, you do not need to continue with these steps.
  2. See Procedure to Run getrootfs in Maintenance Mode to boot the system in Service mode and run the getrootfs command. The steps in that section allow you to mount the /, /tmp, and /usr file systems and run commands from those directories.
  3. Change the system console with the following command. The change will take effect at the next startup of the system.

    If your console is an hft (6091), run the command:

    # chcons -a login=enable /dev/hft/0
    
    If your console is a tty (ibm3151), run the command:
    # chcons -a login=enable /dev/tty0
    
  4. With the key in Normal position, run:
    # shutdown -Fr
    
  5. If the system does not stop at LED C31 during reboot, then you have completed the procedures.

    If the system still stops at LED C31 during reboot, repeat Step 2, this time using the syntax getrootfs hdiskn sh, which allows a shell to fork before the file systems are mounted. Then proceed with Step 6.

  6. This step removes most of the system configuration after saving it to a backup directory. Be cautious with any command that modifies the ODM directly; this step is normally a last resort. Run the following commands (n is the number of the fixed disk, determined in Step 2):
    # getrootfs hdiskn sh
    # mount /dev/hd4 /mnt
    # mount /dev/hd2 /usr
    # mkdir /mnt/etc/objrepos/bak
    # cp /mnt/etc/objrepos/Cu* /mnt/etc/objrepos/bak
    # cp /etc/objrepos/Cu* /mnt/etc/objrepos
    # /etc/umount all
    # exit
    

    Determine which disk is the boot disk with the lslv command. The boot disk will be shown in the PV1 column of the lslv output.

    # lslv -m hd5
    
    Save the clean ODM database to the boot logical volume (n is the number of the fixed disk, determined with the previous command):
    # savebase -d /dev/hdiskn
    
  7. Turn the key to Normal position and run:
    # shutdown -Fr
    

    If you still get LED C31, contact the IBM Support Center for assistance.
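The boot disk lookup in step 6 can be scripted so that the disk name feeds directly into savebase. The following is a minimal sketch, assuming that lslv -m prints a title line, then a header line containing the column name PV1, then one line per logical partition. That layout is an assumption and should be checked against your system. The function name boot_disks is hypothetical.

```shell
# boot_disks
# Reads "lslv -m hd5" output on stdin and prints each distinct name found in
# the PV1 column. Assumes a title line, then a header line whose columns
# include PV1, then one line per logical partition.
boot_disks() {
    awk '
        !col {
            # look for the header line and remember where PV1 sits
            for (i = 1; i <= NF; i++) if ($i == "PV1") col = i
            next
        }
        $col != "" && !seen[$col]++ { print $col }'
}
```

For example, lslv -m hd5 | boot_disks would print a name such as hdisk0, which is then the disk to pass to savebase -d /dev/hdiskn.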

Recovery from LED C99

The known cause for LED C99 during IPL is as follows:

Recovery Procedure for LED C99

To recover from LED C99, you need to boot the system in the maintenance mode. Then check the /usr file system and the /usr/bin directory. Follow the steps given below.

  1. See Procedure to Run getrootfs in Maintenance Mode to boot the system in Service mode and run the getrootfs command. The steps in that section allow you to mount the /, /tmp, and /usr file systems and run commands from those directories.
  2. View the /etc/filesystems file to make sure it contains a stanza for the /usr file system, which may be similar to the following example.


    /usr:
    dev = /dev/hd2
    vfs = jfs
    log = /dev/hd8
    mount = automatic
    check = false
    type = bootfs
    vol = /usr
    free = false

    Figure: Example of /usr Stanza in /etc/filesystems

  3. Use the command:
    # df /usr
    
    to make sure that /usr is mounted.

    Did you find the /usr file system?

    Yes
    Go to Step 4.
    No
    Contact the IBM Support Center.
  4. Check for the existence of the /usr/bin directory by trying to change to that directory, with the command:
    # cd /usr/bin
    

    Was this command successful?

    Yes
    Go to Step 5.
    No
    The directory /usr/bin may need to be created. Contact the IBM Support Center for assistance.
  5. Check for the existence of the /usr/bin/odmget file with the following command:
    # ls -l /usr/bin/odmget
    
  6. If there is no odmget, then check to see if other files are missing from /usr/bin with the command:
    # lppchk -f bos.obj
    

    Look at the stderr output of the above command to check if any files are missing from the /usr/bin directory.

  7. Restore missing files if you can. If you cannot restore the files, you will need to reinstall AIX V3.
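The stanza check in step 2 can be done with a small script rather than by eye. The following is a minimal sketch, assuming the layout shown in the example figure: a stanza name followed by a colon on its own line, then attribute lines of the form name = value. The function name stanza_attr is hypothetical.

```shell
# stanza_attr FILE STANZA ATTR
# Prints the value of ATTR inside the stanza named STANZA (for example /usr)
# in a file laid out like /etc/filesystems; prints nothing if it is absent.
stanza_attr() {
    awk -v want="$2:" -v attr="$3" '
        NF == 1 && $1 ~ /:$/ { in_stanza = ($1 == want); next }   # stanza header
        in_stanza && $1 == attr && $2 == "=" { print $3; exit }' "$1"
}
```

For example, stanza_attr /etc/filesystems /usr dev should print /dev/hd2 if the stanza matches the figure. No output means that the /usr stanza or its dev attribute is missing.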