Linux on mixed SCSI/IDE system

I have been using Linux since 1997, starting with my Pentium 90MHz system. Over the years I have installed it on many different configurations, but never on a mixed SCSI/IDE system. After using Linux on my Dell Inspiron 5000 for a while, I decided it was time to put it on my desktop. Because of my current system's unusual configuration, the installation wasn't as smooth as I expected; however, I am sure many people with a similar configuration will find this article helpful.

On my desktop computer, I have one EIDE hard disk, a 2.1G Seagate ultra SCSI (wide) hard disk, a 9.1G Western Digital (WD) ultra2 SCSI hard disk, and one fast SCSI MO drive. The two SCSI hard disks are connected to a QLogic Ultra2 LVD SCSI adapter, while the MO drive is connected to an Adaptec 2940U SCSI adapter.

Other system info: SuSE Linux 7.2, Intel Pentium III, ATI Mach64 video card, Matrox G200 video card. Dual monitor setup.

To install SuSE Linux from the CD-ROM, I had to load the QLogic SCSI module from the installation menu before the installer would recognize my SCSI hard disks. I chose to install Linux on the Seagate SCSI disk (SCSI ID 1, whereas the Western Digital is on ID 0). The driver loaded successfully, and the kernel assigned /dev/sda to the WD drive and /dev/sdb to the Seagate drive.

After partitioning the drive, I marked the boot partition on the Seagate drive active and chose to install LILO in its boot sector. I proceeded with the installation, which went flawlessly. On the next reboot, however, instead of LILO I saw only “LI” and the system hung. I had run into something like this before and thought that adding the linear option to the lilo.conf file would fix the problem, but it didn’t.
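LILO's linear option makes it record linear sector addresses instead of cylinder/head/sector (CHS) triples, so the stored addresses do not depend on the disk geometry. The minimal Python sketch below uses the standard CHS-to-LBA formula to show why a geometry mismatch is dangerous: the same CHS triple lands on a different sector when the head and sector counts differ. Both geometries are illustrative assumptions, not values read from the disks in this system.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a linear block address (LBA).

    Cylinders and heads count from 0; sectors within a track count from 1.
    """
    if not 0 <= head < heads:
        raise ValueError("head out of range for this geometry")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range for this geometry")
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


# Hypothetical geometry in effect when the address was recorded...
written = chs_to_lba(10, 5, 1, heads=255, sectors_per_track=63)
# ...and a different hypothetical geometry reported at boot time.
read_back = chs_to_lba(10, 5, 1, heads=64, sectors_per_track=32)
print(written, read_back)  # 160965 20640
```

The two results differ, so a loader that stored CHS values under one geometry and reads them under another fetches the wrong sectors, which is the kind of failure the ‘LI’ family of errors points to.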

Using the bootable CD-ROM provided by SuSE, I was able to load my QLogic driver and boot from /dev/sdb1 (the Seagate drive where Linux is installed). Booting this way, I could get into the system successfully.

LI - what? The LILO booting process

Common LILO errors at booting

The following is mainly taken from section 5.2.1 of [Alm94].

When LILO loads itself, it displays the word “LILO”. Each letter is printed before or after a specific action. If LILO fails at some point, the letters printed so far can be used to identify the problem.

Nothing - No part of LILO has been loaded. Either LILO is not installed at all, or the partition on which its boot sector is located isn’t active.

‘L’ error ... The first stage boot loader has been loaded and started, but it can’t load the second stage boot loader (/boot/boot.b). The two-digit error codes indicate the type of problem. This condition usually indicates a media failure or a geometry mismatch.

‘LI’ The second stage has been invoked but could not be started. This can be caused either by a geometry mismatch or by moving /boot/boot.b without reinstalling LILO.

‘LIL’ The second-stage boot loader has been started, but it can’t load the descriptor table from the map file. This is typically due to a physical error on the boot device or a faulty disk geometry.

‘LIL?’ The second stage boot loader has been loaded at an incorrect address. This is typically caused by a subtle geometry mismatch or by moving /boot/boot.b without reinstalling LILO.

‘LIL-’ The descriptor table (in the map file) is corrupt. This can either be caused by a geometry mismatch or by moving /boot/boot.b without reinstalling LILO.

‘LILO’ All parts of LILO have been successfully loaded.
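To make the list above easier to apply, here is a small lookup sketch in Python that maps whatever LILO printed before stopping to the matching diagnosis. The diagnoses are paraphrased from the list; the table and the function name diagnose are hypothetical helpers, not part of LILO.

```python
# Diagnoses paraphrased from the list above (after [Alm94], section 5.2.1).
LILO_PROGRESS = {
    "": "No part of LILO was loaded: not installed, or boot partition not active.",
    "L": "First stage loaded, but the second stage could not be loaded (see error code).",
    "LI": "Second stage invoked, but it could not be started.",
    "LIL": "Second stage started, but the map file descriptor table could not be loaded.",
    "LIL?": "Second stage loaded at an incorrect address.",
    "LIL-": "Descriptor table in the map file is corrupt.",
    "LILO": "All parts of LILO were loaded successfully.",
}


def diagnose(printed: str) -> str:
    """Map the text LILO printed before stopping to a likely diagnosis."""
    text = printed.strip()
    # A lone 'L' may be followed by two-digit error codes; keep only the letter.
    if text.startswith("L") and not text.startswith("LI"):
        text = "L"
    return LILO_PROGRESS.get(text, f"Unrecognized LILO output: {printed!r}")


print(diagnose("LI"))  # Second stage invoked, but it could not be started.
```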

The most common causes for geometry errors are not physical defects or invalid partition tables but errors in LILO installation, including:

• disregarding the 1024-cylinder limit

• an unsuccessful attempt at starting LILO from a logical partition.
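For a feel of where the 1024-cylinder limit falls, the sketch below computes it for an assumed translated geometry of 255 heads and 63 sectors per track (the real geometry depends on the BIOS and the controller). With that geometry the limit is roughly 8.4 GB, so the 9.1G WD disk in this system extends past it. LILO's lba32 option, which appears in the configuration file below, makes LILO use 32-bit linear block addresses and so avoids this limit. The helper names are hypothetical.

```python
SECTOR_BYTES = 512


def cylinder_limit_bytes(cylinders: int = 1024, heads: int = 255,
                         sectors_per_track: int = 63) -> int:
    """Bytes addressable through BIOS CHS calls for the given geometry."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES


def reachable_without_lba(byte_offset: int, **geometry: int) -> bool:
    """True if data starting at byte_offset lies below the cylinder limit."""
    return byte_offset < cylinder_limit_bytes(**geometry)


limit = cylinder_limit_bytes()
print(limit, round(limit / 10**9, 2))  # 8422686720 8.42
```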

After reading this, I had an idea of what had gone wrong. However, the cylinder limit did not apply in this case, since LILO resides in the first 512 sectors of the Seagate disk, so I had to look for another cause. I booted from the CD-ROM, loaded the kernel from the Seagate disk, and took a look at my /etc/lilo.conf file:

disk = /dev/hda
bios = 0x80
disk = /dev/sda
bios = 0x81
disk = /dev/sdb
bios = 0x82
boot = /dev/sdb1
vga = normal
read-only
menu-scheme = Wg:kw:Wg:Wg
lba32
prompt
timeout = 80

[portions deleted]

Looking at the bios entries, I realized that the BIOS codes in lilo.conf were mapped incorrectly. During boot-up, my QLogic SCSI controller’s BIOS shows that the Seagate disk replaces the IDE disk as drive C:. The Seagate is assigned BIOS code 0x80 and the WD drive 0x81; all of this is visible on the screen while the QLogic BIOS initializes.
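The rule behind this fix is that the BIOS numbers hard disks in the order it presents them, starting at 0x80 for drive C:. A minimal Python sketch under that assumption: given the disk order the controller BIOS reports (here the order implied by the author's revised configuration: Seagate, WD, then the IDE disk), it prints matching disk/bios pairs in lilo.conf syntax. The helper names are hypothetical.

```python
def bios_codes(bios_order: list[str]) -> dict[str, int]:
    """Number disks 0x80, 0x81, ... in the order the BIOS presents them."""
    return {disk: 0x80 + index for index, disk in enumerate(bios_order)}


def lilo_disk_lines(codes: dict[str, int]) -> str:
    """Render disk/bios pairs in lilo.conf syntax, sorted by device name."""
    lines = []
    for disk in sorted(codes):
        lines.append(f"disk = {disk}")
        lines.append(f"bios = {codes[disk]:#04x}")
    return "\n".join(lines)


# Assumed BIOS order: Seagate (/dev/sdb here), then WD (/dev/sda), then IDE.
# The /dev names are those seen with only the QLogic module loaded.
order = ["/dev/sdb", "/dev/sda", "/dev/hda"]
print(lilo_disk_lines(bios_codes(order)))
```

Its output assigns 0x82 to /dev/hda, 0x81 to /dev/sda and 0x80 to /dev/sdb, the same mapping as the revised entries in the next listing.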

Realizing that I needed to change the BIOS codes in lilo.conf, I booted from the CD-ROM into the existing system again and modified the configuration file with the following entries:

disk = /dev/hda
bios = 0x82
disk = /dev/sda
bios = 0x81
disk = /dev/sdb
bios = 0x80
boot = /dev/sdb1

I expected this to correct the problem, so I re-installed LILO using the new configuration file:

lilo -C /etc/lilo.conf

I rebooted the system and watched the screen print LILO and continue with the boot process. I felt relieved, until I saw this message:

Kernel panic: No init found. Try the init= option

Well, at least I had solved the ‘LI’ problem, but now I had a new one. After tinkering with various configuration files and rebooting many times, I still couldn’t figure out what was wrong, except that the kernel was looking for something in the wrong place.

Frustrated, I decided to go through the boot log once more. I booted again from the CD-ROM, loading only the QLogic SCSI module, booted into the system on the hard disk and carefully examined the log file. Suddenly I found an entry for /dev/sdc, a third SCSI drive. For a few seconds I was puzzled; then I realized it was the MO drive connected to my Adaptec SCSI controller!

I rebooted the system from the CD-ROM. At the module screen, I loaded the QLogic module first and examined the kernel messages: the hard disks had been assigned just as in my LILO configuration file. I then loaded the second SCSI module, for the Adaptec controller, reviewed the kernel messages again and saw that an additional SCSI device entry (/dev/sdc) had been added.

My forgetful mind

What happened was this: after LILO first failed, I booted from the CD-ROM so that I could modify the lilo.conf file on my Seagate disk. I loaded only the QLogic SCSI module, which was all I needed to boot from the hard drive, and because I forgot to load its module, the Adaptec SCSI controller was never probed. The installed kernel, however (the one that got past LILO and ran until the “kernel panic” message), detects both the Adaptec and the QLogic controllers, so an extra /dev/sdX device shows up.
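The kernel hands out /dev/sdX names in the order it detects SCSI disks: controller by controller, and within a controller by SCSI ID. The simplified Python sketch below (one bus per controller, LUN 0 only) reproduces the effect. The MO drive's SCSI ID is not given in the article, so the value used is a placeholder, and the Adaptec-first order for the installed kernel is inferred from the final configuration below rather than stated by the author.

```python
import string


def assign_sd_names(controllers):
    """Assign /dev/sdX names in detection order.

    `controllers` is a list of (name, [(scsi_id, device), ...]) tuples in the
    order the drivers register. Within one controller, lower SCSI IDs come first.
    Simplified: one bus per controller, LUN 0 only, at most 26 devices.
    """
    names = {}
    letters = iter(string.ascii_lowercase)
    for _name, devices in controllers:
        for _scsi_id, device in sorted(devices):
            names[device] = "/dev/sd" + next(letters)
    return names


qlogic = ("QLogic", [(0, "WD 9.1G"), (1, "Seagate 2.1G")])
adaptec = ("Adaptec 2940U", [(4, "MO drive")])  # SCSI ID 4 is a placeholder

rescue = assign_sd_names([qlogic])              # CD-ROM boot, QLogic only
installed = assign_sd_names([adaptec, qlogic])  # installed kernel, Adaptec first
print(rescue["Seagate 2.1G"], installed["Seagate 2.1G"])  # /dev/sdb /dev/sdc
```

Loading the modules in the other order, as done from the CD-ROM, gives assign_sd_names([qlogic, adaptec]) and puts the MO drive at /dev/sdc, matching the kernel messages described above.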

Since the order of the SCSI devices had changed, I needed to modify lilo.conf and fstab as follows (/dev/sdc is now the Seagate drive where Linux resides, and /dev/sdb the WD drive):

disk = /dev/hda
bios = 0x82
disk = /dev/sda
bios = 0x83
disk = /dev/sdb
bios = 0x81
disk = /dev/sdc
bios = 0x80
boot = /dev/sdc
vga = normal
read-only
menu-scheme = Wg:kw:Wg:Wg
lba32
prompt
timeout = 80
message = /boot/message

image = /boot/vmlinuz
    label = linux
    root = /dev/sdc1
    initrd = /boot/initrd

image = /boot/vmlinuz.suse
    label = suse
    root = /dev/sdc1
    initrd = /boot/initrd.suse
    optional

other = /dev/hda1
    label = nt

image = /boot/memtest.bin
    label = memtest86
    root = /dev/sdc1
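Before re-running lilo, the disk/bios pairs can be sanity-checked. The Python sketch below parses them out of lilo.conf text and flags duplicate BIOS codes or a boot disk that is not mapped to 0x80. It assumes, as in this article, that the disk holding LILO is the one the BIOS boots as 0x80; the function names are hypothetical and the parser only understands simple "key = value" lines.

```python
import re


def parse_disk_bios(config_text: str) -> dict[str, int]:
    """Collect disk -> BIOS code pairs from 'disk =' / 'bios =' lines."""
    mapping = {}
    current_disk = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        match = re.fullmatch(r"([\w-]+)\s*=\s*(\S+)", line)
        if not match:
            continue
        key, value = match.groups()
        if key == "disk":
            current_disk = value
        elif key == "bios" and current_disk is not None:
            mapping[current_disk] = int(value, 16)  # accepts "0x80" or "80"
            current_disk = None
    return mapping


def check_mapping(mapping: dict[str, int], boot: str) -> list[str]:
    """List problems: duplicate BIOS codes, or boot disk not mapped to 0x80."""
    problems = []
    if len(set(mapping.values())) != len(mapping):
        problems.append("two disks share a BIOS code")
    boot_disk = re.sub(r"\d+$", "", boot)  # /dev/sdc1 -> /dev/sdc
    if mapping.get(boot_disk) != 0x80:
        problems.append(f"{boot_disk} is not mapped to 0x80")
    return problems


final = """
disk = /dev/hda
bios = 0x82
disk = /dev/sda
bios = 0x83
disk = /dev/sdb
bios = 0x81
disk = /dev/sdc
bios = 0x80
"""
print(check_mapping(parse_disk_bios(final), "/dev/sdc"))  # []
```

Run against the original installer mapping (hda as 0x80, sdb as 0x82, boot = /dev/sdb1), the same check reports that /dev/sdb is not mapped to 0x80.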

I re-installed LILO with the new configuration file and made sure the entries in fstab were also correct, so that the filesystems would mount properly:

fstab

/dev/sdc1        /                  ext2    defaults                  1 1
/dev/cdrecorder  /media/cdrecorder  auto    ro,noauto,user,exec       0 0
/dev/cdrom       /media/cdrom       auto    ro,noauto,user,exec       0 0
devpts           /dev/pts           devpts  defaults                  0 0
/dev/fd0         /media/floppy      auto    noauto,user,sync          0 0
proc             /proc              proc    defaults                  0 0
/dev/hda1        /windows/C         ntfs    ro,noauto,user,umask=022  0 0
/dev/sdb1        /ultra             vfat    noauto,user               0 0
/dev/sdc2        swap               swap    pri=42                    0 0
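A quick way to catch typos in fstab is to check the shape of each entry. fstab(5) defines six whitespace-separated fields (on Linux the last two may be omitted, but every entry above spells out the dump and pass fields), so this Python sketch insists on six and also flags device paths and mount points that are not absolute. It is a rough check under those assumptions, not a substitute for mount itself; the function name is hypothetical.

```python
def check_fstab(text: str) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for suspicious fstab entries."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) != 6:
            problems.append((number, f"expected 6 fields, found {len(fields)}"))
            continue
        device, mount_point = fields[0], fields[1]
        # Pseudo-filesystems such as proc or devpts have no '/' in the device field.
        if "/" in device and not device.startswith("/"):
            problems.append((number, f"device {device!r} is not an absolute path"))
        if mount_point not in ("swap", "none") and not mount_point.startswith("/"):
            problems.append((number, f"mount point {mount_point!r} is not absolute"))
    return problems


# A deliberately malformed sample.
sample = """\
dev/sdc1 / ext2 defaults 1 1
/dev/sdc2 swap swap pri=42 0 0
proc /procproc defaults 0 0
"""
for number, problem in check_fstab(sample):
    print(number, problem)
```

For the sample it reports line 1 (a relative device path) and line 3 (five fields, because two fields ran together).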

I rebooted the system. Now Linux boots without a glitch.