This discussion assumes you've read and have access to our Introduction, Boot Sector, Disk Contents and Tech Notes concerning IBM® Personal Computer DOS 1.00.
Although the goal of true Computer Forensics is to discover evidence of any illegal activities in a manner that allows such evidence to be admissible in a court of law, the term forensics is often used today to simply refer to the methods employed by such experts. Since our goal is only to discover technical and historical facts of interest for digital historians and collectors of original DOS distributions, our examination need not comply with all the legal requirements for gathering such evidence.
However, there is much that can be gained from studying, or actually learning, the skills of a forensics expert! For example, they must always be concerned about how evidence is handled: Physical items, such as a diskette (and any electronic evidence contained on it), must be preserved as well as possible in the state in which they were originally found! This involves both a legal aspect called "chain of custody" and the technical skills to ensure nothing on the diskette is altered by the investigation. As we discussed on our Tech Notes page, you should never insert an original distribution diskette into a drive without first:
1) Making sure the drive won't physically damage the diskette (test it with unimportant media), and
2) Establishing that the drive's write-protect mechanism functions properly, so the diskette is indeed "write-protected" from any write that might be attempted by the OS or a utility program (old 5-1/4 inch diskettes must have their write notch covered; it's best if the material used to do so is both opaque to light and will not move).
Forensics experts often employ drives that are certified as being incapable of writing any data to a diskette.
Although we won't be looking for any evidence of a crime, we will employ forensic methods to discover any data on this diskette that was not intended to be seen by the general public. Such data might be found in the hexadecimal codes of any normally accessible file, the unused space between files, deleted files or the areas used only by file systems.
As an example of how studying computer forensics can be useful to others, especially collectors and historians, picture a friend who one day tells you that he discovered a famous saying embedded in all the diskettes of some old operating system. What should you ask him first? "Where did you get the diskette from?" and "How did you find those words?" would be my reaction. What if he said, 'Oh, from an image file I found on the Net.' Or, 'I found them in a deleted file.' How much confidence would you then have in his discovery?
We can learn from the meticulous records kept by forensics experts (think of the phrase, "chain of evidence"), that it's important for us to know as best as possible where all the physical items have been before we acquire them, and in what ways we may have altered any of them! You can think of this as establishing the Integrity of our data.
Accuracy speaks of the methods employed when making copies or image files of those diskettes, and the manner in which we discover any interesting data contained on them. As with several experiments we've conducted to arrive at conclusions about various details of recent operating systems, you must also think very logically when applying methods that might help you discover new facts and reach reasonable conclusions about any historic DOS records!
The data used in our investigations comes directly from original DOS diskettes (see the labels pictured on our 1.00 and 1.10 index pages). Although we cannot state with 100% certainty that our data came directly from IBM® (nor can almost anyone else, as that would require them to have watched it being produced and placed directly into their hands!), we're very confident that we've been working with every byte of the original data, and each confirmation* increases our trust in that assessment. Our examination of the diskettes revealed no signs of tampering or contamination. However, don't hesitate to write to us about any possible discrepancies in our facts or conclusions.
____________________
*Please send us confirmations. We'd especially like to receive any replies regarding our MD5 checksums for image files of whole diskettes! Once we've received a number of confirmations, this note will be replaced by a table indicating how confident we are in our data/conclusions.
| Image File | MD5 sum |
|---|---|
| IBM® Personal Computer DOS 1.00 Image File | 73c919cecadf002a7124b7e8bfe3b5ba |
| IBM® Personal Computer DOS 1.10 Image File | 47bfb4371d28cd9e45fb1197f2a70c00 |
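For readers who want to compare their own image files against the sums above, here is a minimal Python sketch. The function names and dictionary labels are our own illustrative choices, and it assumes your image is a plain sector-by-sector copy of the whole diskette with no added header; an image made any other way will not produce a comparable sum.

```python
import hashlib

# MD5 sums quoted above for the two whole-diskette image files.
PUBLISHED_MD5 = {
    "IBM Personal Computer DOS 1.00": "73c919cecadf002a7124b7e8bfe3b5ba",
    "IBM Personal Computer DOS 1.10": "47bfb4371d28cd9e45fb1197f2a70c00",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as image:
        # Read in chunks so the same code works for much larger images.
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, label):
    """True if the image's MD5 equals the sum published above for `label`."""
    return md5_of_file(path) == PUBLISHED_MD5[label]
```

A matching sum only shows that two images hold identical bytes; it says nothing about where either diskette has been, which is why the records discussed earlier still matter.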
As part of our examination, we've compiled a list of all the names and interesting textual phrases or data we discovered, and identified their locations as either a file, system object or slack space.
The very first name encountered on the diskette is: Robert O'Rear (see Boot Record for a detailed analysis of that sector's contents). The same name is found inside FORMAT.COM, but only because that file contains an exact copy of the Boot Record for making new bootable diskettes.
The following table lists all the names contained on the IBM® Personal Computer DOS 1.00 distribution diskette. Offsets are given in hexadecimal from the beginning of the whole diskette and Absolute sectors ("Sect") are also listed:
| Name | Offset | Sect | Object/File Name | Comments |
|---|---|---|---|---|
| Robert O'Rear | 00168 | 0 | Boot Record | See notes here; Microsoft employee* |
| " | 046E8 | 35 | FORMAT.COM | " |
| David Litton | 0528A | 41 | DISKCOPY.COM | Program's author; IBM employee* |
| " | 05893 | 44 | DISKCOMP.COM | " |
| " | 05E8F | 47 | COMP.COM | " |
| Mel Hallerman | 06A88 | 53 | MODE.COM | Program's co-author; IBM employee* |
| M. Hallerman | 246A2 | 291 | SPACE.BAS | " |
| Ron Heiney | 06A9A | 53 | MODE.COM | Program's co-author; IBM employee |
| R. Heiney | 24696 | 291 | SPACE.BAS | " |
| Glenn Stuart Dardick | 1A694 | 211 | ART.BAS | Program's author; IBM employee* |
| " | 1AE98 | 215 | SAMPLES.BAS | " |
| " | 1B899 | 220 | MORTGAGE.BAS | " |
| " | 1D299 | 233 | COLORBAR.BAS | " |
| M. C. Rojas | 25695 | 299 | COMM.BAS | Program's author; IBM employee |
*
You can search the Net for Glenn S. Dardick, Mel Hallerman, or
Robert O'Rear for further |
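The "Sect" column in the table above follows directly from the "Offset" column, since every sector on this diskette holds 512 bytes. The short Python sketch below shows that arithmetic; the function name and the bounds check against a 160 KiB (320-sector) image are our own additions for illustration.

```python
SECTOR_SIZE = 512                 # bytes per sector on this diskette
IMAGE_SIZE = 160 * 1024           # a 160 KiB image, i.e. 320 sectors


def locate(offset):
    """Return (absolute sector, byte position inside that sector)
    for an offset counted from the beginning of the diskette."""
    if not 0 <= offset < IMAGE_SIZE:
        raise ValueError("offset lies outside a 160 KiB diskette image")
    return divmod(offset, SECTOR_SIZE)


# Offsets taken from the table above.
for name, offset in [("Boot Record", 0x00168),
                     ("FORMAT.COM", 0x046E8),
                     ("COMM.BAS", 0x25695)]:
    sector, position = locate(offset)
    print(f"{name}: sector {sector}, byte {position:03X}h within the sector")
```

The same calculation applies to every offset listed in the other tables on this page.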
Except for some phrases that might be interesting only to programmers (which we didn't list here), the following table contains mostly program Version and Copyright strings (and Author names already listed above); offsets are given in hexadecimal from the beginning of the diskette, along with the Absolute sector ("Sect"):
| Offset | Sect | File Name | Text Strings |
|---|---|---|---|
| 00E22 | 7 | IBMBIO.COM | BIOS Version 1.00 |
| 033BA | 25 | COMMAND.COM | The IBM Personal Computer DOS Version 1.00 Copyright IBM Corp 1981 |
| 03403 | 26 | COMMAND.COM | Licensed Material - Program Property of IBM |
| 05203 | 41 | DISKCOPY.COM | "The IBM Personal Computer Diskette Copier |
| 05803 | 44 | DISKCOMP.COM | "The IBM Personal Computer Diskette Compare Utility Version 1.00 (C)Copyright IBM Corp 1981 Licensed Material - Program Property of IBM Author - David Litton " (All on one line!) |
| 06A03 | 53 | MODE.COM | "The IBM Personal Computer Mode command Version 1.00 (C)Copyright IBM Corp 1981 Licensed Material - Program Property of IBM Authors - Mel Hallerman and Ron Heiney " (All on a single line!) |
| 06E04 | 55 | EDLIN.COM | The IBM Personal Computer EDITOR Version 1.00 (C)Copyright IBM Corp 1981 |
| 08E05 | 71 | DEBUG.COM | "The IBM Personal Computer DEBUGVersion 1.00 (C)Copyright IBM Corp 1981 Licensed Material - Program Property of IBM" (No space between "DEBUG" and "Version"; all on one line!) |
| 0D5BF | 106 | LINK.EXE | MS PASCALFORTRAN 77 |
| 0D95F | 108 | LINK.EXE | IBM Personal Computer Linker Version 1.00 (C) Copyright IBM Corp 1981 |
| 16023 | 176 | BASIC.COM | "The IBM Personal Computer Basic", FFh, 0Dh "Version D1.00 Copyright IBM Corp. 1981" |
| 1A071 | 208 | BASICA.COM | "The IBM Personal Computer Basic", FFh, 0Dh "Version A1.00 Copyright IBM Corp. 1981" |
Note the use of the phrase "Version A1.00" in BASICA.COM. From some of the BASIC program files on this diskette, we learn this was Microsoft's "Advanced" BASIC program, so the letter "A" probably stands for that word. Since Microsoft also produced a BASIC version for ROM chips, we've concluded the letter "D" in the phrase "Version D1.00" of its plain BASIC.COM program most likely stands for the "D" in the phrase "Disk Basic", which was often used to refer to their standalone product. Although many of the BASIC program files (*.BAS) on the diskette have an author string, others do not, so we've decided that listing them here would make the paper unnecessarily long; our discoveries in the Slack Space are considered to be more important!
Slack Space is the unused area between the end of a file's actual contents and the beginning of another file at the start of the next cluster. Since each cluster on a 160 KiB diskette is only one sector, none of the data found in one of these areas can be longer than 511 bytes, and it will often be less than half that size. Data found within this diskette's slack space was, of course, never intended to be seen by the public or anyone else for that matter. Apparently no one at either IBM® or Microsoft® cared enough to copy all the final distribution files to a "newly formatted" diskette; if they had, all the slack space on the diskette would have contained nothing but "F6" bytes and this paper would be far less interesting! However, only those who learned how to use DEBUG (or later on, a disk editor) and spent time looking through these areas could possibly find anything of interest.
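The arithmetic behind those limits can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are ours, `file_start` and `file_size` are values you would read from the diskette's directory, and the file is assumed to occupy consecutive sectors so that its last byte sits at `file_start + file_size - 1`.

```python
SECTOR_SIZE = 512        # one cluster is one sector on a 160 KiB diskette
FORMAT_FILLER = 0xF6     # byte left behind in sectors by a fresh format


def slack_length(file_size):
    """Number of unused bytes in the file's last cluster (0 to 511)."""
    return -file_size % SECTOR_SIZE


def slack_region(file_start, file_size):
    """Return (start, end) offsets of the slack area following a file.
    Assumes the file occupies consecutive sectors from `file_start`."""
    start = file_start + file_size
    return start, start + slack_length(file_size)


def slack_is_clean(image, file_start, file_size):
    """True if the slack area holds nothing but F6 filler bytes."""
    start, end = slack_region(file_start, file_size)
    return all(byte == FORMAT_FILLER for byte in image[start:end])
```

Running `slack_is_clean` over every file of an image is a quick way to see which slack areas deserve a closer look in a hex editor; a `False` result simply means something other than filler is sitting there.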
This table shows portions of the two most interesting strings we found in the Slack Space:
| Offset | Sect | Interesting Text in the Slack Space |
|---|---|---|
| 0576B | 43 | HEXCOMError in HEX file--conversion aborted |
| 16496 | 178 | DEC-20 Downlink to Boca Raton [300-bps] |
In the second item listed above, the "DEC-20" term is not a calendar date. It no doubt refers to a Digital Equipment Corporation "digital DECSYSTEM 20" owned by Microsoft® at that time [1]. We know this because Bob O'Rear not only wrote about how he used that system to complete many tasks for this key software project, but also informed us that IBM® employees in Boca Raton would dial in to Microsoft's DEC-20 computer via voice modem (at only 300-BPS in this case) to read e-mail communications concerning development of the DOS code [2]. This fragment is likely a reference to one of those occurrences. [ Note: We've also found two other communications fragments in the slack space of the DOS 1.10 diskette: "DEC-20 +++ FAST +++ [Hal version] 11-Oct-81". Four months later, a similar "event" was captured: "Tops-20 Downlink to MS-DOS (Created: 24-Feb-82) [IBM Version] [1200-bps]". We believe that anyone working in forensics should be at least slightly fascinated that pieces of events which might have no other record of ever occurring ended up being preserved on all the distribution diskettes for these operating systems. Fortunately for both IBM® and Microsoft®, no company secrets appear to have ever been revealed in any of these bits and pieces of data. ]
Many of the first DOS executables existed as HEX files which were then converted to COM files by a "HEX to COM" program, and remnants of those HEX files can still be found in this diskette's slack space. As a matter of fact, part of a HEX to COM conversion tool, probably the same one used to convert each of these files, was left sitting in the slack area after DISKCOPY.COM. The first clue to its existence on the diskette is part of a text string from this program (for location, see first item in table above): "HEXCOMError in HEX file--conversion aborted." Though that string is very informative by itself, we also have at least 169 bytes of code from which to conclude this section was indeed part of a working "HEX to COM" executable. (This file has been identified as the HEX2BIN.COM program from Seattle Computer Products. The date, however, is uncertain; one copy is dated April 18, 1981 and another May 7, 1981, yet both are identical [3].)
Unless someone is familiar with the contents of a particular file format known as HEX (*.hex files) or has a trained eye for viewing ASCII strings (especially DOS line return pairs of 0D, 0A) in a hexadecimal (binary) file editor, they're likely to miss much of the information that's embedded in the Slack Space of this diskette! Once we recognized many strings of numbers as ASCII code bytes and converted them to characters, most of the Slack Space confirmed an important part of the story as to how this diskette's original (the actual diskette used to make all the distribution copies) had been used by its programmers!
The following is a brief explanation of how to interpret the HEX file records found in our diskette's slack space:
Here are two typical HEX Data Record lines (from Cluster 41; Abs. Sector 46):
:1A0534000D0A496E73756666696369656E74206D656D6F7279070D0A240DA7
:16054E000A496E76616C696420706172616D65746572070D0A24A3
Each Record begins with a colon (":"). The first byte ("1A") indicates the Record Length, that is, the number of data bytes. Almost every Record found had 1A hex (26 bytes) of data, and when one didn't (see "16" in the second record above), it had fewer bytes because it was the last! The next two bytes give the load offset within the program (0534, then 054E; 534h + 1Ah = 54Eh) if it's a Data Record (which is whenever the fourth byte is 00). We didn't find any records that weren't Data Records (type 00). The last byte of a HEX Record (A7 and A3 here) is always a checksum; it's not part of the data. Once the 26 data bytes of the first record and the 22 data bytes of the second are converted to ASCII, they become:
0D,0A,"Insufficient memory",07,0D,0A,"$",0D
0A,"Invalid parameter",07,0D,0A,"$"
These strings are almost certainly the input for a DOS Interrupt 21h Function 9 routine, which means that the 07 byte would sound a beep on the computer's speaker; these are Error messages for some program! When such an Interrupt is used, the "$" sign (24 hex) marks the end of the string to be displayed.
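The field layout just described can be checked mechanically. The Python sketch below is our own illustration, not the tool used for this study: it assumes the records follow the familiar Intel HEX convention in which every byte of a record, checksum included, adds up to zero modulo 256 (the two records quoted above do satisfy that rule), and the function names are hypothetical.

```python
def parse_hex_record(line):
    """Split one HEX record into (load_offset, record_type, data bytes)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("a HEX record must begin with ':'")
    raw = bytes.fromhex(line[1:])          # raises ValueError on bad digits
    if len(raw) < 5:
        raise ValueError("record is too short")
    length, record_type = raw[0], raw[3]   # 1st byte: length, 4th byte: type
    load_offset = int.from_bytes(raw[1:3], "big")  # 2nd and 3rd bytes
    data = raw[4:-1]                       # the last byte is the checksum
    if len(data) != length:
        raise ValueError("record length byte does not match the data")
    if sum(raw) % 256 != 0:
        raise ValueError("checksum mismatch")
    return load_offset, record_type, data


def render(data):
    """Show printable bytes as quoted text and all others as hex codes."""
    parts, run = [], ""
    for byte in data:
        if 0x20 <= byte <= 0x7E:
            run += chr(byte)
        else:
            if run:
                parts.append(f'"{run}"')
                run = ""
            parts.append(f"{byte:02X}")
    if run:
        parts.append(f'"{run}"')
    return ",".join(parts)


RECORDS = [
    ":1A0534000D0A496E73756666696369656E74206D656D6F7279070D0A240DA7",
    ":16054E000A496E76616C696420706172616D65746572070D0A24A3",
]

for record in RECORDS:
    offset, kind, data = parse_hex_record(record)
    print(f"{offset:04X}h type {kind:02X}: {render(data)}")
```

Because a damaged fragment fails the length or checksum test, the same function also helps separate intact records from partial ones at the edge of a slack area.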
Why did we take the time to describe the HEX format? Because it's very useful for any study involving fragments of a HEX file: The second and third bytes of each Record contain that line's "load offset" within its program. Therefore, if the same string of bytes within these records is found in the slack space of many files, its load offset provides further evidence that all of those bytes belong to a single program in which those lines have the same offsets! As more and more bytes "line up" in the same locations, the assurance of a "match" increases.
For example, the phrase "Insufficient memory" was found within the DISKCOPY, DISKCOMP, EDLIN and DEBUG programs of this diskette. However, taking all 48 of the bytes shown above, in that same order, should narrow it down to only one file, right? Well, it doesn't; both DISKCOPY and DISKCOMP contain all 48 bytes in the same exact order! This isn't too surprising since the same person (David Litton) wrote both programs at nearly the same time. But there's more evidence to consider: First, you need to know that just like .COM files, a .HEX file is normally loaded into a segment of memory at offset 0100 hex and following. Now, within the DISKCOPY program, these bytes begin at hex offset 490h (5690h - 5200h); whereas for the DISKCOMP program, they begin at offset 434h (5C34h - 5800h). Since we know from their HEX Records (see the "0534" that follows ":1A" in the first record above) that these bytes are supposed to load into memory at offset 0534h (434h + 100h = 534h), it means (at least on this diskette) they can only belong to the DISKCOMP.COM program!
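The comparison made in the paragraph above reduces to one subtraction and one addition per candidate file. Here is a small Python sketch of it, using only the offsets quoted in that paragraph; the names `load_offset`, `files_matching` and `CANDIDATES` are ours.

```python
COM_LOAD_BASE = 0x100   # .COM programs are loaded at offset 0100h


def load_offset(found_at, file_start):
    """In-memory offset of a byte found at diskette offset `found_at`
    inside a file that begins at diskette offset `file_start`."""
    return found_at - file_start + COM_LOAD_BASE


# name: (start of file on diskette, where the 48 bytes begin on diskette)
CANDIDATES = {
    "DISKCOPY.COM": (0x5200, 0x5690),
    "DISKCOMP.COM": (0x5800, 0x5C34),
}


def files_matching(record_offset, candidates):
    """Names of the files whose bytes sit at the record's load offset."""
    return [name for name, (file_start, found_at) in candidates.items()
            if load_offset(found_at, file_start) == record_offset]


print(files_matching(0x0534, CANDIDATES))   # load offset from the HEX record
```

DISKCOPY.COM works out to 590h rather than 534h, which is why it drops out even though it contains the same 48 bytes.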
We've compiled a separate page detailing the evidence from slack spaces for our conclusions here: See "Slack Space Evidence" (including a file containing .HEX examples from this diskette's slack space, along with a Batch program that automatically loads the data into DEBUG to reveal their text strings).
Many computer forensics experts have a wealth of information they can glean from the MAC (Modified, Accessed and Created) Times associated with any system files that have been purposely altered (during an Internet related break-in) or other time-stamps found in various server logs. Other investigators may need to pay particular attention to the dates of certain files deleted by a suspect, whether their contents are recoverable or not!
However, due to the nature of the very first DOS file system (only the Last Modified Date/Time was ever recorded) and because IBM® made a decision to alter the dates and times of most files on this diskette (which, unfortunately, was a decision they essentially carried over to every IBM® and Microsoft® OS distribution), we have almost no date related data left to analyze!
Fortunately, there are still some dates that were embedded in parts of the diskette normally unseen by most users; along with the dates of two hidden files. The following table shows each date string found on the diskette, listed chronologically:
| Date String | Offset | Sect | Object/File Name | Comments |
|---|---|---|---|---|
| 9-Apr-81 | 164C1 | 178 | Slack Space | From the phrase: "DEC-20 Downlink to Boca Raton 300-BPS 9-APR-81" |
| 7-May-81 | 00009 | 0 | Boot Record | Part of the Boot sector which was authored by Robert O'Rear. |
| 15-Jul-81 | 04577 | 34 | FORMAT.COM | This string isn't used by the program; we assume it was used to date the code. |
| 15-Jul-81 | 04E0E | 39 | SYS.COM | This string isn't used by the program; we assume it was used to date the code. |
| 22-Jul-81 | 00E34 | 7 | IBMBIO.COM | This string immediately follows the byte A0 after string: "BIOS Version 1.00" |
| 4-Aug-81 | 1606E | 176 | BASIC.COM | String appears immediately in front of the phrase: "Licensed Material - Program Property of IBM" Assume it dates code. |
| 4-Aug-81 | 1A0BC | 208 | BASICA.COM | String appears immediately in front of the phrase: "Licensed Material - Program Property of IBM" Assume it dates code. |
The earliest date comes from an interesting string already mentioned in our analysis of the slack space: "DEC-20 Downlink to Boca Raton 300-BPS 9-APR-81". Although this phrase isn't associated with any file on the diskette, it does confirm that something related to a project in Boca Raton, FL was underway by this date; and of course, from other historical records, we know for a fact that project was IBM's Personal Computer DOS. Apart from both BASIC.COM and BASICA.COM, which contain the same date (4-Aug-81) as all the files with visible attributes, we believe the other embedded dates are genuine indicators of the project's progress.
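Embedded date strings of the "day-Mon-yy" form listed above can be located in an image file with a simple pattern search. The following Python sketch is one possible way to do it, not the procedure we followed; the pattern and the function name are assumptions of ours, and strings in any other date style would need their own pattern.

```python
import re

SECTOR_SIZE = 512

# One or two digits, a three-letter month, two digits: "9-Apr-81", "15-Jul-81".
DATE_PATTERN = re.compile(
    rb"(?<![0-9])[0-9]{1,2}-"
    rb"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    rb"-[0-9]{2}(?![0-9])",
    re.IGNORECASE,
)


def find_date_strings(image):
    """Return (offset, absolute sector, text) for each date string found
    in `image`, the raw bytes of a whole-diskette image file."""
    return [(match.start(), match.start() // SECTOR_SIZE,
             match.group().decode("ascii"))
            for match in DATE_PATTERN.finditer(image)]
```

Note that a term such as "DEC-20" is not reported, because the pattern requires a day number in front of the month; each hit still has to be judged in context, as we did in the Comments column.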
The Hidden System Files

| Date | Time | Directory Entry |
|---|---|---|
| 07-23-81 | 12:00am | IBMBIO.COM |
| 08-13-81 | 12:00am | IBMDOS.COM |
Even though the times of these two files appear to have been altered (both showing exactly 12:00am; the same time as all the other files on this diskette), we believe their dates still reveal some useful information: For one thing, the IBMBIO.COM file is dated the day after the "22-Jul-81" string embedded inside it! This seems to imply its author, or perhaps another programmer, finished up some important change(s) in its code without bothering to set both the file date and this internal string to the same day.
The date for the IBMDOS.COM file is significant in that it falls on the day after August 12, 1981, when IBM® announced the release of its new Personal Computer, which could be controlled by IBM® Personal Computer DOS. Though it's still possible this file date was altered, the fact that all the visible file dates were uniformly changed to "08-04-81" leads us to believe this later date for such a critical file was most likely due to a necessary last-minute change in its code rather than a purposely altered date.
This late date (August 13, 1981) for such an important file may also reflect possible reasons why no one ever "cleaned up" the diskette before making the final master for all distribution copies: They either felt there wasn't enough time for what was considered a trivial matter, or just plain forgot to do so.
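For anyone wishing to check the directory dates discussed above in a hex editor, the sketch below decodes a date stored in the packed 16-bit form used by FAT directory entries (day in the low 5 bits, month in the next 4, years since 1980 in the top 7). We are assuming here that this diskette's directory uses that same layout, and we do not state where the word sits within an entry; the function names are ours.

```python
def decode_dos_date(word):
    """Turn a packed 16-bit directory date into an 'mm-dd-yy' string."""
    day = word & 0x1F              # bits 0-4
    month = (word >> 5) & 0x0F     # bits 5-8
    year = 1980 + (word >> 9)      # bits 9-15 count years from 1980
    return f"{month:02d}-{day:02d}-{year % 100:02d}"


def encode_dos_date(month, day, year):
    """Pack a calendar date into the same 16-bit form."""
    return ((year - 1980) << 9) | (month << 5) | day


# The three directory dates mentioned in this section.
for month, day in [(7, 23), (8, 13), (8, 4)]:
    word = encode_dos_date(month, day, 1981)
    print(f"{word:04X}h -> {decode_dos_date(word)}")
```

Under that assumption, the dates of the two hidden files and the uniform "08-04-81" of the visible files differ only in a few low-order bits, which is all it would take to alter them.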
The consistent discovery of HEX Records whose bytes match up only with those in the program physically preceding the slack space in which they were found leads us to conclude there's a high probability that all the .COM files on the original diskette were first copied there as HEX files and then converted to their present state by a "HEX to COM" conversion program. Furthermore, the remnants of just such a program were discovered in the diskette's slack space, with all the bytes matching such a program from that same time period. It's likely this program was used in converting all the DOS files.
We also found what we're calling a communications fragment (i.e., the string: "DEC-20 Downlink to Boca Raton 300-BPS 9-APR-81") in the diskette's slack space which predates the release of the operating system by four months. So how did this fragment end up on the release diskette? Was it merely a coincidence, or had the project manager(s) kept everything on the same diskette; which was then used as a master for making all the distribution diskettes?
In spite of the fact the date and time attributes for every visible file were purposely altered to "08-04-81 12:00am", we found a number of embedded dates that give us a broad picture of the project's progress during 1981 and perhaps even a sense of urgency to 'rush the product out the door' since the hidden system file IBMDOS.COM was dated the day after IBM's new Personal Computer had already been announced to the world.
This work will remain open for some time! We'd appreciate any relevant comments or corrections. You may contact the author here.
[1] Here's a full description of one such DECSYSTEM-20 at Columbia University; complete with photographs. Microsoft's installation may have been similar to this one.
[2] An e-mail dated August 11, 2005 from Robert O'Rear to Daniel B. Sedory.
[3] After searching the Internet for the filename HEX2BIN.COM, we found a copy of it dated 04-18-81 that matched perfectly with all the bytes found in the slack space of our IBM diskette. We are also expecting further confirmation from others!
Last Update: August 14, 2008 (14-08-2008).