Linux on a mixed SCSI/IDE system
I have been using Linux since 1997, starting on a 90 MHz Pentium system. Over the years I have installed it on many different configurations, but never on a mixed SCSI/IDE system. After using Linux on my Dell Inspiron 5000 for a while, I decided it was time to put it on my desktop. Because of that machine's unusual configuration, the installation was not as smooth as I expected. I am sure, however, that anyone with a similar configuration will find this article helpful.
On my desktop computer, I have one EIDE hard disk, a 2.1G Seagate Ultra SCSI (wide) hard disk, a 9.1G Western Digital (WD) Ultra2 SCSI hard disk, and one Fast SCSI MO drive. The two SCSI hard disks are connected to a QLogic Ultra2 LVD SCSI adapter, while the MO drive is connected to an Adaptec 2940U SCSI adapter.
Other system info: SuSE Linux 7.2, Intel Pentium III, ATI Mach64 video card,
Matrox G200 video card. Dual monitor setup.
To install SuSE Linux from the CD-ROM, I had to load the QLogic SCSI module at the installation menu before the installer would recognize my SCSI hard disks. The driver loaded successfully. I chose to install to the Seagate SCSI disk (SCSI ID 1, whereas the Western Digital is on ID 0). The kernel assigned /dev/sda to the WD drive and /dev/sdb to the Seagate drive, so their first partitions appeared as /dev/sda1 and /dev/sdb1.
After partitioning the drive, I marked the boot partition on the Seagate drive as active and had LILO written to that partition's boot sector. The rest of the installation went flawlessly. On the next reboot, however, instead of “LILO” I saw only “LI”, and the system hung. I had run into something like this before and thought that adding the linear option to the lilo.conf file would fix the problem, but it didn’t.
I was able to use the bootable CD-ROM provided by SuSE to load the QLogic driver and boot from /dev/sdb1 (the Seagate drive where Linux is installed). Booting this way, I got into the system successfully.
LI - what? The LILO booting process
Common LILO errors at booting
This is mainly section 5.2.1 from [Alm94].
When LILO loads itself, it displays the word “LILO”. Each letter is printed before or after performing some specific action. If LILO fails at some point, the letters printed so far can be used to identify the problem.
Nothing - No part of LILO has been loaded. Either LILO is not installed at all or the partition on which its boot sector is located isn’t active.
‘L’ error ... The first stage boot loader has been loaded and started, but it can’t load the second stage boot loader (/boot/boot.b). The two-digit error codes indicate the type of problem. This condition usually indicates a media failure or a geometry mismatch.
‘LI’ The second stage has been invoked but could not be started. This can either be caused by a geometry mismatch or by moving /boot/boot.b without reinstalling LILO.
‘LIL’ The second stage boot loader has been started, but it can’t load the descriptor table from the map file. This is typically due to a physical error on the boot device or faulty disk geometry.
‘LIL?’ The second stage boot loader has been loaded at an incorrect address. This is typically caused by a subtle geometry mismatch or by moving /boot/boot.b without reinstalling LILO.
‘LIL-’ The descriptor table (in the map file) is corrupt. This can either be caused by a geometry mismatch or by moving /boot/boot.b without reinstalling LILO.
‘LILO’ All parts of LILO have been successfully loaded.
The most common causes for geometry errors are not physical defects or invalid partition tables but errors in LILO installation, including:
• disregarding the 1024 cylinders limit
• an unsuccessful attempt at starting LILO from a logical partition.
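The progress codes listed above amount to a lookup table. The following is only a minimal sketch of that table in Python; the function name `diagnose` and the shortened wording of each diagnosis are my own, not part of LILO or of [Alm94].

```python
# Lookup table for the LILO progress codes described above.
# The diagnoses are paraphrased; the names here are illustrative only.
LILO_STAGES = {
    "": "No part of LILO was loaded (not installed, or partition not active).",
    "L": "First stage started, but it cannot load the second stage (/boot/boot.b).",
    "LI": "Second stage was invoked but could not be started.",
    "LIL": "Second stage started, but cannot load the descriptor table.",
    "LIL?": "Second stage was loaded at an incorrect address.",
    "LIL-": "The descriptor table in the map file is corrupt.",
    "LILO": "All parts of LILO were loaded successfully.",
}


def diagnose(screen_text):
    """Return the diagnosis for whatever LILO printed before it stopped."""
    shown = screen_text.strip()
    # 'L' may be followed by two-digit error codes, e.g. 'L 01 01 01'.
    if shown.startswith("L ") or (shown[:1] == "L" and shown[1:2].isdigit()):
        shown = "L"
    return LILO_STAGES.get(shown, "Unrecognized LILO output: " + repr(shown))


if __name__ == "__main__":
    print(diagnose("LI"))
```

Several of these codes share the same two usual suspects, a geometry mismatch or a moved /boot/boot.b, so the table narrows down where LILO stopped rather than why.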
Well, after reading this, I had an idea of what might have gone wrong. The cylinder limit does not apply in this case, because LILO resides in the first 512 sectors of the Seagate disk, so I had to look for another cause. I booted from the CD-ROM, loaded the kernel from the Seagate disk, and took a look at my /etc/lilo.conf file:
disk = /dev/hda
    bios = 0x80
disk = /dev/sda
    bios = 0x81
disk = /dev/sdb
    bios = 0x82
boot = /dev/sdb1
vga = normal
read-only
menu-scheme = Wg:kw:Wg:Wg
lba32
prompt
timeout = 80
[portions deleted]
Looking at the bios entries, I realized that LILO had been given the wrong BIOS device codes. During boot-up, the QLogic SCSI controller’s BIOS reports that the Seagate has taken the place of the IDE disk as drive C:. The Seagate is assigned BIOS code 0x80 and the WD drive 0x81. All of this is visible on the screen while the QLogic BIOS initializes.
Realizing that I needed to change the bios codes in lilo.conf, I booted again from the CD-ROM into the installed system and modified the configuration file with the following entries:
disk = /dev/hda
    bios = 0x82
disk = /dev/sda
    bios = 0x81
disk = /dev/sdb
    bios = 0x80
boot = /dev/sdb1
I thought this would correct the problem. Once finished, I re-installed LILO using this new config file.
lilo -C /etc/lilo.conf
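A disk/bios mapping like the ones above can be sanity-checked mechanically before LILO is reinstalled. The sketch below is not a LILO tool, and its helper names are made up for illustration. It assumes, as on this system, that the BIOS boots from the disk it numbers 0x80, and it only understands simple key = value lines.

```python
import re


def parse_lilo_disks(conf_text):
    """Return ({disk: bios_code}, boot_device) read from lilo.conf text."""
    disks, boot, current = {}, None, None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()       # drop comments
        match = re.match(r"([\w-]+)\s*=\s*(\S+)$", line)
        if not match:
            continue                               # flags such as "prompt"
        key, value = match.group(1).lower(), match.group(2)
        if key == "disk":
            current = value
        elif key == "bios" and current is not None:
            disks[current] = int(value, 0)         # accepts 0x80 or 128
        elif key == "boot":
            boot = value
    return disks, boot


def mapping_problems(disks, boot):
    """List obvious inconsistencies (assumes the BIOS boots disk 0x80)."""
    problems = []
    codes = list(disks.values())
    if len(codes) != len(set(codes)):
        problems.append("two disks share the same BIOS code")
    if boot is None:
        problems.append("no boot= line found")
    else:
        boot_disk = re.sub(r"\d+$", "", boot)      # /dev/sdb1 -> /dev/sdb
        if disks.get(boot_disk) != 0x80:
            problems.append(boot_disk + " is not mapped to BIOS code 0x80")
    return problems


FIRST_TRY = """
disk = /dev/hda
bios = 0x80
disk = /dev/sda
bios = 0x81
disk = /dev/sdb
bios = 0x82
boot = /dev/sdb1
"""

if __name__ == "__main__":
    print(mapping_problems(*parse_lilo_disks(FIRST_TRY)))
```

Run against the installer's original mapping, the check flags /dev/sdb. It cannot tell whether the device names themselves are the ones the installed kernel will use.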
I rebooted the system and watched the screen: LILO appeared in full and the boot process continued. I felt relieved, until I saw this message:
Kernel panic: No init found. Try the init= option
Well, at least I had solved the ‘LI’ problem. Now I had a new problem to solve. After tinkering with various configuration files and many reboots, all I could figure out was that the kernel was looking for something in the wrong place.
Frustrated, I decided to look at the boot log a few more times. I booted again from the CD-ROM, loading only the QLogic SCSI module, booted into the system on the hard disk, and carefully examined the log file. Suddenly I found an entry for /dev/sdc, a third SCSI drive. For a few seconds I was puzzled. Then I realized it was the MO drive connected to my Adaptec SCSI controller!
I rebooted the system with the CD-ROM. At the module screen, I loaded the QLogic module first and examined the kernel messages. The kernel had assigned the hard disks as I had configured them in the LILO configuration file. I then loaded the second SCSI module, the one for the Adaptec, reviewed the kernel messages again, and saw that an additional SCSI device entry (/dev/sdc) had been added.
My forgetful mind
What had happened was this: after LILO first failed, I booted from the CD-ROM so that I could modify the lilo.conf file on my Seagate disk. I loaded only the QLogic SCSI module so that I could boot from the hard drive. The Adaptec SCSI controller was not probed because I forgot to load its module. The installed kernel, however (the one that runs once LILO succeeds and that continues until the “kernel panic” message), detects both the Adaptec and the QLogic controllers, so another /dev/sdX drive shows up.
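The effect of driver order on device names can be shown with a small simulation. It is only a sketch: the driver labels are placeholders rather than real module names, and the Adaptec-first order for the installed kernel is an assumption inferred from the final lilo.conf in this article, where the Seagate ends up as /dev/sdc.

```python
from string import ascii_lowercase


def assign_sd_names(controllers):
    """Name SCSI disks sda, sdb, ... in the order their drivers find them."""
    names = {}
    index = 0
    for _driver, disks in controllers:
        for disk in disks:
            names[disk] = "/dev/sd" + ascii_lowercase[index]
            index += 1
    return names


# Placeholder labels; disks are listed in SCSI ID order on each controller.
QLOGIC = ("qlogic", ["WD 9.1G", "Seagate 2.1G"])
ADAPTEC = ("adaptec", ["MO drive"])

# Rescue boot from the CD-ROM: only the QLogic module was loaded.
rescue = assign_sd_names([QLOGIC])
# Installed kernel: both controllers detected, Adaptec assumed to come first.
installed = assign_sd_names([ADAPTEC, QLOGIC])

if __name__ == "__main__":
    print(rescue["Seagate 2.1G"])     # /dev/sdb
    print(installed["Seagate 2.1G"])  # /dev/sdc
```

The same physical disk gets a different name depending on which drivers are present and in what order, which is why a configuration that looks right from the rescue system can still be wrong for the installed kernel.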
Since the order of the SCSI devices had changed, I needed to modify lilo.conf and fstab as follows (/dev/sdc is the Seagate drive where Linux resides):
disk = /dev/hda
    bios = 0x82
disk = /dev/sda
    bios = 0x83
disk = /dev/sdb
    bios = 0x81
disk = /dev/sdc
    bios = 0x80
boot = /dev/sdc
vga = normal
read-only
menu-scheme = Wg:kw:Wg:Wg
lba32
prompt
timeout = 80
message = /boot/message
image = /boot/vmlinuz
    label = linux
    root = /dev/sdc1
    initrd = /boot/initrd
image = /boot/vmlinuz.suse
    label = suse
    root = /dev/sdc1
    initrd = /boot/initrd.suse
    optional
other = /dev/hda1
    label = nt
image = /boot/memtest.bin
    label = memtest86
    root = /dev/sdc1
I re-installed LILO with the new configuration file and made sure the entries in fstab were also correct so that the filesystems would mount properly.
fstab
/dev/sdc1        /                  ext2    defaults                  1 1
/dev/cdrecorder  /media/cdrecorder  auto    ro,noauto,user,exec       0 0
/dev/cdrom       /media/cdrom       auto    ro,noauto,user,exec       0 0
devpts           /dev/pts           devpts  defaults                  0 0
/dev/fd0         /media/floppy      auto    noauto,user,sync          0 0
proc             /proc              proc    defaults                  0 0
/dev/hda1        /windows/C         ntfs    ro,noauto,user,umask=022  0 0
/dev/sdb1        /ultra             vfat    noauto,user               0 0
/dev/sdc2        swap               swap    pri=42                    0 0
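Because every fstab entry above uses six whitespace-separated fields, a simple field count is a cheap way to catch a line whose columns have run together after an edit. The sketch below is a stricter rule than fstab itself requires (the last two fields may legally be omitted), and the function name and sample text are hypothetical.

```python
def malformed_fstab_lines(fstab_text):
    """Return (line_number, line) pairs that do not have exactly six fields."""
    bad = []
    for number, raw in enumerate(fstab_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if len(line.split()) != 6:
            bad.append((number, line))
    return bad


# Hypothetical sample: the second line has two columns fused together.
SAMPLE = """/dev/sdc1 / ext2 defaults 1 1
proc /procproc defaults 0 0
/dev/sdc2 swap swap pri=42 0 0
"""

if __name__ == "__main__":
    for number, line in malformed_fstab_lines(SAMPLE):
        print("line", number, "looks malformed:", line)
```

A check like this says nothing about whether the device names are right. It only confirms that each line has the shape the mount tools expect.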
I rebooted the system. Now Linux boots without a glitch.