Those who use the hex Word, 55AAH, to represent the byte sequence "55 AA" on a hard disk, floppy or in the memory of an IBM PC-compatible are either ignorant of the little-endian[1] nature of the entire x86 family of Intel® processors, or do not care enough about the truth to correct their errors[2]. Some of the correct ways to represent the hex byte sequence "55 AA" on a PC (as a hexadecimal Word) would be: 0xAA55, 0AA55H, AA55h. Of course, employing a phrase like "55h followed by AAh" may be the best way to describe this sequence for those who will never use an x86 assembler.
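For readers without an x86 assembler at hand, the claim above can be cross-checked with a minimal Python sketch. Python is not used elsewhere on this page; we assume only its standard struct module, where the format "<H" means a 16-bit unsigned Word stored in little-endian order, as on an x86 PC. The variable names good and bad are ours.

```python
import struct

# Store each candidate Word the way an x86 PC would (little-endian, 16-bit).
good = struct.pack("<H", 0xAA55)
bad = struct.pack("<H", 0x55AA)

print(good.hex(" ").upper())  # 55 AA
print(bad.hex(" ").upper())   # AA 55
```

Only the Word 0xAA55 produces the byte sequence "55 AA"; the Word 0x55AA produces the bytes in the opposite order.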
A. From any IBM® or Microsoft® DOS prompt, you would simply type debug to enter this program.
B. Under a Windows OS, such as Windows XP, you can either open a CMD window first and then enter debug, or, if it's easier for you, click on the Start button, then click on "Run...", type debug in the box and click on OK. This will pop up a virtual DEBUG window.
Most other PC operating systems, including Linux, do not have a debugger as easy to use as DEBUG. However, it's still possible to prove the correct way to refer to the byte sequence "55 AA" by creating a small binary program with NASM or even as (or some other assembler), using assembly instructions equivalent to those above, and then dumping its contents with the command: hexdump -Cv <filename> to view the order of the bytes as stored in memory or on the hard disk. If any Linux users really need help doing this, please contact us. [We will try to add more about this in the future.]
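Until an assembler walk-through is added, here is a rough stand-in for that experiment, written as a Python sketch rather than NASM source. It assumes that writing the Word with struct.pack("<H", ...) matches what an x86 assembler emits for a 16-bit data Word; the helper dump() and the scratch file are our own inventions, and dump() only imitates the hex column of hexdump.

```python
import os
import struct
import tempfile

def dump(path):
    """Return the file's bytes as space-separated hex pairs, in stored order."""
    with open(path, "rb") as f:
        return f.read().hex(" ").upper()

# Write the 16-bit Word 0xAA55 to a scratch file in little-endian order.
fd, path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack("<H", 0xAA55))

stored = dump(path)
os.remove(path)
print(stored)  # 55 AA
```

Running hexdump -Cv on the scratch file before it is removed should show the same two bytes in the same order.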
When the first IBM® Personal Computer became available in 1981, it had no hard disk and no concept of a boot record signature in its operating system. It wasn't until the introduction of IBM® Personal Computer DOS 2.00 in 1983 that our identifier "55 AA" appeared in boot sectors[4] on floppy diskettes. We are still seeking the earliest reference within any IBM® or Microsoft® documents to this Signature ID[5].
We have found the incorrect hex word (55AAH) in the "First Edition (April 1987)" of IBM's "Technical Reference (Programming Family)" for the "Disk Operating System Version 3.30" only in "Chapter 9. Fixed Disk Information"; located on the following pages (quoting the sentences that include them for proper context):
"Signature: The last 2 bytes of the boot record (55AAH) are used as a signature to identify a valid boot record. Both this record and the partition boot records are required to contain the signature at offset 1FEH." (p. 9-9),
"Each extended volume contains an extended boot record in the first sector of the disk location assigned to it. This boot record contains the 55AAH signature id byte." (p. 9-12) and
"The last two bytes of the extended boot record (55AAH) are used as a signature to identify a valid boot record. Both this record and the logical drive boot records are required to contain the signature at offset 1FEH." (p. 9-14).
Although it's only a guess, we suspect this erroneous hex Word (55AAH) for the Signature ID became part of the IBM documentation simply because an employee assigned the task (and/or a manager who may have provided the final "corrections") was apparently too unfamiliar with the little-endian nature of the Intel processor used in the PC to realize their mistake. Sadly, this error continued to be perpetuated within IBM's documentation for so long that many today believe this must be the correct way to refer to these Signature ID bytes on a PC. If someone had to pass a class in x86 assembly, using MASM[6], NASM[7], TASM[8] or any other x86 assembler, they would quickly learn the truth.
Journalists and computer book authors must often rely upon the documentation from a manufacturer, so we're sure this is the main reason so many writers' comments about the signature ID contain this error, and yet another reason for their reluctance to correct it. Unfortunately, Microsoft® made things worse by apparently deciding to come up with an excuse for why this error is actually an acceptable usage! We have more to say about their very confusing notes[9] below.
If you search the Net for the terms "Sammes Jenkinson 55aah," you should find a link at Google Books to page 142 of the first edition of Forensic Computing, A Practitioner's Guide, by Tony Sammes and Brian Jenkinson. Although the comments here and in Table 5.25 on page 143 relate to loading additional BIOS code, such as that for a video card, that code still uses the same 2-byte identifier as the MBR sector, but in the first two bytes, as shown in this 64-byte dump of 0C0000h and following from a computer's memory:
Offset   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0C0000  55 AA 5A E9 CB AA 30 30-30 30 30 30 30 30 30 30  U.Z...0000000000
0C0010  30 30 C4 17 E9 DD 16 BD-40 00 B0 0A 30 30 49 42  00......@...00IB
0C0020  4D 20 56 47 41 20 43 6F-6D 70 61 74 69 62 6C 65  M VGA Compatible
0C0030  20 42 49 4F 53 2E 20 03-5B 00 6B 00 79 00 8B C0   BIOS. .[.k.y...
But twice on page 142, this book makes reference to these bytes using the incorrect hex Word, 55AAh. [The same errors were repeated in the book's second edition on pages 170-171.] Now click your way to page 145, where you will find, at the end of the 2nd paragraph, the comment: "the final two bytes of the partition table are always of value 55h and aah, as shown." Unfortunately, the second edition added a footnote here (#71 on page 174), which reads: "Many systems will refuse to boot if these two bytes are not set to 55aah." The publisher of this book (Springer) is a prestigious company, and its authors are quite knowledgeable. Since the authors discussed the topic of little-endianness at length in chapter 2 and elsewhere, including sections of chapter 5, why did they use an incorrect hexadecimal number when describing the two-byte identifier of 55h and AAh on a PC? We have been unable to contact the authors, so we don't know for sure, but they probably did so just because so many others do.
MORE EXAMPLES TO FOLLOW AT A LATER DATE.
No! AA55H is hexadecimal for 43,605, and 55AAH will always equal 21,930; whether these numbers are stored on a little-endian PC or a big-endian computer. A number is a number, period; whether it's binary, decimal or hex. And changing what endian-type of computer a number is stored on does not change one number into another, only the way its bytes are ordered. When we see the characters "0x" followed by a string of digits, we must logically conclude they represent one hexadecimal number; a number which we could also convert into equivalent decimal, octal or binary, and they must be equal every time we do so! If someone wishes to use different symbols to define some "little-endian hex representation on a PC" that's their choice. But prepending "0x" or appending "h" or "H" to one or more hexadecimal digits is already a well-defined standard.
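The point that a number stays the same number in every notation can be checked with a short Python sketch. It assumes nothing beyond the language's built-in conversions; the loop and the name forms are ours.

```python
for word in (0xAA55, 0x55AA):
    # One number written in decimal, hex, octal and binary notation.
    forms = (str(word), hex(word), oct(word), bin(word))
    # Converting each notation back must give the very same number.
    assert all(int(text, 0) == word for text in forms)
    print(*forms)

print(0xAA55, 0x55AA)  # 43605 21930
```

Nothing in this check involves byte order: the notation only names a number, and 0xAA55 and 0x55AA name two different ones.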
A signature of two or more bytes in length cannot be separated from the exact order of its bytes. If defined as such, they would need to keep the same order whether being stored on a big-endian or little-endian computer; yet this would mean they'd have to be represented by different hexadecimal numbers, since each type of computer stores multi-byte numbers differently.
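As a sketch of that argument, the same two signature bytes can be read back as a 16-bit Word under each byte order. We assume Python's struct module here, with "<H" for little-endian and ">H" for big-endian; the variable names are ours.

```python
import struct

signature = bytes([0x55, 0xAA])   # fixed byte order: 55 first, then AA

as_little = struct.unpack("<H", signature)[0]  # how an x86 PC reads the Word
as_big = struct.unpack(">H", signature)[0]     # how a big-endian CPU would

print(hex(as_little), hex(as_big))  # 0xaa55 0x55aa
```

The bytes never move; only the hexadecimal number needed to describe them as a Word depends on the machine.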
However, as we saw above, an Option ROM, Boot Record or any other kind of signature isn't required to be expressed as a number. Technically, it's just an indicator or an identification, and could just as well be handled as two separate data bytes; a 55h followed by AAh (or: db 55h, 0AAh in assembly code, where a hex value beginning with a letter needs a leading zero). The main reason these two bytes show up as a single Word (0xAA55 or 0AA55h) more often is simply because it's easier, and more logical, to compare two Words of data rather than four separate bytes. We'll show this by examining some assembly code below.
Students who pass a class in x86 assembly (or even you, if you studied the DEBUG proof above) know the truth, and so did the early programmers at Microsoft® who used this "Signature ID" in their code. For example, here's a bit of commented MASM source code which checks if a valid boot record signature ID exists; followed by two lines of code that define where and what it should be:
; The partition table is located at offset 1BEh in the sector.
; The signature is located at offset 1FEh (= 55h, AAh or word AA55h).

TestBootSignature:
        cmp     WORD PTR [BX + 510],0aa55h      ; Check for 55 AA sequence

#define BOOT_SIG_OFFSET 510
#define BOOT_SIG        0xaa55
The cmp instruction above compares the WORD pointed to (PTR) at offsets 510 and 511 decimal with the hex word 0aa55h; it could also have been written as cmp WORD PTR [BX + 510],BOOT_SIG, if BOOT_SIG had been previously defined as 0xAA55. Here's another code snippet that checks for an Option ROM:
        mov     ax,0C000h               ; look for 2nd graphics card installed
        mov     ds,ax
        cmp     word ptr DS:[0],0AA55h  ; DS:0 -- check for option ROM
You can clearly see we need to write the hex Word as "0AA55h" to check for the byte sequence "55 AA" of either an MBR's signature, or the beginning of a video card's BIOS (as we saw in Fig. 2 above). And one last example:
;---- Check for ROM signature ----
        cmp     es:[di],0xAA55          ; Is the ROM signature present?
        jne     NotOptionRom            ; If not, jump out.
If any early DOS programmers had used the erroneous 55AAh in their programs, they would not have functioned correctly! People who merely write about such things, rather than those who assemble working code, are the ones more likely to make mistakes that may never be corrected.
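To see why the erroneous value would have failed, here is a minimal Python sketch that mimics the boot-record comparison on a 512-byte buffer. The function has_boot_signature() and the all-zero test sector are hypothetical; the constants reuse the names from the #define lines quoted above, and "<H" is assumed to model how an x86 CPU reads a 16-bit Word.

```python
import struct

BOOT_SIG_OFFSET = 510   # 1FEh
BOOT_SIG = 0xAA55

def has_boot_signature(sector: bytes) -> bool:
    """Read the little-endian Word at offset 510 and compare it to 0xAA55."""
    (word,) = struct.unpack_from("<H", sector, BOOT_SIG_OFFSET)
    return word == BOOT_SIG

# Hypothetical test sector: all zeros except the final bytes 55 AA.
sector = bytes(510) + bytes([0x55, 0xAA])
(word,) = struct.unpack_from("<H", sector, BOOT_SIG_OFFSET)

print(has_boot_signature(sector))  # True
print(word == 0x55AA)              # False: the erroneous Word never matches
```

A program comparing against 55AAh would reject every valid boot sector, and would accept only sectors ending in the bytes "AA 55".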
[1] The original IBM® Personal Computer ("PC") or any IBM PC-compatible (whether its CPU is made by Intel®, AMD® or some other manufacturer) has what's known as a little-endian architecture; as opposed, for example, to the big-endian architecture of the Motorola® processors in a Macintosh or PowerPC system. (Note: Many Apple OS X computers today are being sold with Intel® processors having little-endian architecture.) Little-endian architecture refers to the order of the bytes found in memory (or on a storage medium) for numbers composed of more than a single byte, where the least-significant byte occurs at a lower (or preceding) location than the most-significant byte (or any more significant bytes in between). Thus, the hex number 38DA75C6h, as seen in a PC's memory, would occur as: C6 75 DA 38 (its bytes appear in reverse order); whereas a big-endian system would have: 38 DA 75 C6. The creators of the first 16-bit Intel® processor used this little-endian order so the CPU could start working on arithmetic problems as soon as the first byte of a large number was accessed; basically the same as we do when adding the least-significant digits of two large numbers first, carrying any overflow to the next column on the left, and finally adding together the most-significant digits, on the far left, last.
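The 38DA75C6h example in this note can be reproduced with a short Python sketch, assuming only the built-in int.to_bytes() method; the variable names are ours.

```python
value = 0x38DA75C6

little = value.to_bytes(4, "little")  # byte order in a PC's memory
big = value.to_bytes(4, "big")        # byte order on a big-endian system

print(little.hex(" ").upper())  # C6 75 DA 38
print(big.hex(" ").upper())     # 38 DA 75 C6
```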
[2] In all fairness, some authors may not have become aware of this truth until after their book was published. Thus there would be a sizeable cost involved in correcting this error, and perhaps the publisher might not agree to changing it in a second edition. But they should at least be honest enough to include it in an errata note somewhere on the Internet. As for those who never printed a book and have only published their works on the web or in some other digital form, hopefully they will be able to find enough time to correct this error.
Some authors, however, may make excuses and "shift the blame" onto some large company, stating they won't make any changes unless that "well-known company" changes their documentation first!
[3] If you like what you see here, we have a complete Guide to DEBUG here.
[4] We have much more information about various MBR and Volume boot sectors here.
The following shows how we described the Boot Record Signature on our pages about Partition Tables:
Table 1. This should remove any confusion over what constitutes a valid Boot Record signature; sometimes called its Magic number, and often expressed as the 16-bit hexadecimal Word, 0xAA55 (or: AA55h) for the little-endian[1] PC.
[5] Please contact us here if you have any documentation you'd like to share with us.
[6] Microsoft® Macro Assembler (MASM).
[7] The Netwide Assembler (NASM) was originally written by Simon Tatham (with assistance from Julian Hall), but is now maintained by a team led by H. Peter Anvin. It's available as free software under the GNU Lesser General Public License. See http://www.nasm.us/ for more information.
[8] Turbo Assembler (TASM) by Borland; no longer maintained. Here's a FAQ about TASM.
[9] Please read our note on Microsoft's very confusing way of referring to hex numbers in their current documentation and on their web sites by using a nonstandard definition for little-endian Hex numbers! Although we're happy to see at least one author at microsoft.com dropped the incorrect 0x usage and simply listed the hex bytes in order with spaces between them (as we've done on our own boot record pages), many continue to reference the same illogical note under their tables. Therefore, we're presenting another analogy here to illustrate how wrong that concept is.
The note is often worded as follows: "Numbers larger than one byte are stored in little endian format or reverse-byte ordering. Little endian format is a method of storing a number so that the least significant byte appears first in the hexadecimal number notation." (Searching microsoft.com or their related sites for the phrase "Troubleshooting Disks and File Systems" should turn up some links to pages with this note.)
First, the little-endianness of a PC has absolutely nothing to do with "hexadecimal number notation" as this note claims! If you want to enter the hex number 0x3F (63) as a double-word (4 bytes; quad-words are 8 bytes), you could simply write "dd 0x3F" and the assembler would know it needed to reserve 4 bytes for this data (i.e., it would store this as: "3F 00 00 00"). But you would never enter 0x3F000000, since that would be a completely different number! Yet some Microsoft employees think it's correct to write something like this: "The sample value for the Relative Sectors field in the previous table, 0x3F000000, is a little endian representation of 0x0000003F. The decimal equivalent of this little endian number is 63. The sample value for Total Sectors is 0x41D31200, which represents 0x0012D341. Therefore, in decimal, there are 1,233,729 sectors in the volume."
To show just how ridiculous this is, I propose a similar "definition" to theirs, but dealing with money: All currency values larger than $9 are stored on my PC in reverse-order. This format is a method of storing the least significant digit first in decimal dollar notation! So, the sample value for our Relative Taxes in the previous table, a full $1,056,964,608.00, is a little endian representation of $63.00. The hexadecimal equivalent of this little endian amount is 0x3F dollars. The sample value for a Microsoft employee's income is $26,292,224.00, which represents $233,729.00. Therefore, in hexadecimal, he earned 0x39101 dollars. That's really not much different than what I quoted from Microsoft above; saying that two different dollar amounts represent the same thing or two different hex numbers are the same is an equally insane way to approach this topic.
Now wouldn't it confuse you if various banks and lenders started using my "dollar notation" (which you can't even tell is any different than normal dollars by knowing that term!) for their assets and revenues, or your savings, on their computers with only a little footnote redefining what everyone normally takes for granted? Didn't something similar to that happen here in the USA?! That's why it's best to use a standard correctly.
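The byte-level facts behind the Microsoft sample values quoted in this note can be checked with a minimal Python sketch. It assumes the standard struct module, with "<I" standing in for what an assembler does with "dd" on an x86 target; the variable names are ours.

```python
import struct

# A double-word holding 0x3F is stored least-significant byte first.
stored = struct.pack("<I", 0x3F)
print(stored.hex(" ").upper())           # 3F 00 00 00

# Reading those four bytes back as a little-endian dword gives 63 again,
# while the hex number 0x3F000000 is a different value altogether.
print(int.from_bytes(stored, "little"))  # 63
print(0x3F000000)                        # 1056964608

# The Total Sectors bytes as they sit on disk: 41 D3 12 00.
total = int.from_bytes(bytes.fromhex("41D31200"), "little")
print(hex(total), total)                 # 0x12d341 1233729
```

Listing the stored bytes with spaces between them ("3F 00 00 00") describes the disk contents without claiming that 0x3F000000 and 0x3F are the same number.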
Created: February 25, 2009. (2009.02.25)
Updated: February 28, 2009. (2009.02.28)
Last Update: March 1, 2009. (2009.03.01)