A Forensic Examination
of the IBM® Personal Computer™ DOS
Version 1.00 (1981) Diskette


Copyright © 2005,2008 by Daniel B. Sedory

NOT to be reproduced in any form without Permission of the Author!

Dedicated to all true forensics experts (those dealing with crimes),
and everyone helping to preserve history for future generations!



This discussion assumes you've read, and have access to, our Introduction, Boot Sector, Disk Contents and Tech Notes concerning IBM® Personal Computer™ DOS 1.00.

Computer Forensics

Although the goal of true Computer Forensics is to discover evidence of any illegal activities in a manner that allows such evidence to be admissible in a court of law, the term forensics is often used today to simply refer to the methods employed by such experts. Since our goal is only to discover technical and historical facts of interest for digital historians and collectors of original DOS distributions, our examination need not comply with all the legal requirements for gathering such evidence.

However, there is much that can be gained from studying or actually learning the skills of a forensics expert! For example, they must always be concerned about how evidence is handled: Physical items, such as a diskette (and any electronic evidence contained on it), must be preserved as best as possible in the state they were originally found in! This would involve both a legal aspect called "chain of custody" and technical skills to ensure nothing on the diskette was altered by their investigation. As we discussed on our Tech Notes page, you should never insert an original distribution diskette into a drive without first:
1) Making sure it won't physically damage the diskette (test it with unimportant media) and
2) Establishing that the drive's write-protect mechanism functions properly, so the diskette is indeed "write-protected" from any write that might be done by the OS or a utility program (old 5-1/4 inch diskettes must have their write notch covered; it's best if the material used to do so is both opaque to light and will not move). Forensics experts often employ drives that are certified as being incapable of writing any data to a diskette.

Although we won't be looking for any evidence of a crime, we will employ forensic methods to discover any data on this diskette that was not intended to be seen by the general public. Such data might be found in the hexadecimal codes of any normally accessible file, the unused space between files, deleted files or the areas used only by file systems.

Accuracy and Integrity

As an example of how studying computer forensics can be useful to others, especially collectors and historians, picture a friend who one day tells you that he discovered a famous saying embedded in all the diskettes of some old operating system. What should you ask him first? "Where did you get the diskette from?" and "How did you find those words?" would be my reaction. What if he said, 'Oh, from an image file I found on the Net.' Or, 'I found them in a deleted file.' How much confidence would you then have in his discovery?

We can learn from the meticulous records kept by forensics experts (think of the phrase, "chain of evidence"), that it's important for us to know as best as possible where all the physical items have been before we acquire them, and in what ways we may have altered any of them! You can think of this as establishing the Integrity of our data.

Accuracy speaks of the methods employed when making copies or image files of those diskettes, and the manner in which we discover any interesting data contained on them. As with several experiments we've conducted to arrive at conclusions about various details of recent operating systems, you must also think very logically when applying methods that might help you discover new facts and reach reasonable conclusions about any historic DOS records!

The data used in our investigations comes directly from original DOS diskettes (see the labels pictured on our 1.00 and 1.10 index pages). Although we cannot state with 100% certainty that our data came directly from IBM® (nor can almost anyone else, as that would require them to have watched it being produced and placed directly into their hands!), we're very confident that we've been working with every byte of the original data, and each confirmation* increases our trust in that assessment. Our examination of the diskettes revealed no signs of tampering or contamination. However, don't hesitate to write to us about any possible discrepancies in our facts or conclusions.
____________________
*Please send us confirmations. We'd especially like to receive any replies regarding our MD5 checksums for image files of whole diskettes! Once we've received a number of confirmations, this note will be replaced by a table indicating how confident we are in our data/conclusions.

 
IBM® Personal Computer™ DOS 1.00 Image File
MD5 sum: 73c919cecadf002a7124b7e8bfe3b5ba
IBM® Personal Computer™ DOS 1.10 Image File
MD5 sum: 47bfb4371d28cd9e45fb1197f2a70c00
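For anyone wishing to send a confirmation, the following is a minimal sketch of how an image file's MD5 sum could be compared with the values above. It assumes you have already made a raw, whole-diskette image file; the function names and the dictionary keys are only illustrative and are not part of any tool we used.

```python
import hashlib

# MD5 sums published above for the whole-diskette image files.
PUBLISHED_MD5 = {
    "PC DOS 1.00": "73c919cecadf002a7124b7e8bfe3b5ba",
    "PC DOS 1.10": "47bfb4371d28cd9e45fb1197f2a70c00",
}

def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of bytes objects."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

def md5_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so the whole image need not fit in memory."""
    with open(path, "rb") as image:
        return md5_of_chunks(iter(lambda: image.read(chunk_size), b""))

def matches_published(path, version):
    """True if the image at 'path' (hypothetical file) has the sum published for 'version'."""
    return md5_of_file(path) == PUBLISHED_MD5[version]
```

A call such as matches_published("dos100.img", "PC DOS 1.00") would then report whether your image agrees with ours; the file name here is only a placeholder.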

Discoveries

As part of our examination, we've compiled a list of all the names and interesting textual phrases or data we discovered, and identified their locations as either a file, system object or slack space.

Names

The very first name encountered on the diskette is: Robert O'Rear (see Boot Record for a detailed analysis of that sector's contents). The same name is found inside FORMAT.COM, but only because that file contains an exact copy of the Boot Record for making new bootable diskettes.

The following table lists all the names contained on the IBM® Personal Computer™ DOS 1.00 distribution diskette. Offsets are given in hexadecimal from the beginning of the whole diskette and Absolute sectors ("Sect") are also listed:

Name                  Offset  Sect  Object/File Name  Comments
Robert O'Rear         00168   0     Boot Record       See notes here; Microsoft employee*
"                     046E8   35    FORMAT.COM        "
David Litton          0528A   41    DISKCOPY.COM      Program's author; IBM employee*
"                     05893   44    DISKCOMP.COM      "
"                     05E8F   47    COMP.COM          "
Mel Hallerman         06A88   53    MODE.COM          Program's co-author; IBM employee*
M. Hallerman          246A2   291   SPACE.BAS         "
Ron Heiney            06A9A   53    MODE.COM          Program's co-author; IBM employee
R. Heiney             24696   291   SPACE.BAS         "
Glenn Stuart Dardick  1A694   211   ART.BAS           Program's author; IBM employee*
"                     1AE98   215   SAMPLES.BAS       "
"                     1B899   220   MORTGAGE.BAS      "
"                     1D299   233   COLORBAR.BAS      "
M. C. Rojas           25695   299   COMM.BAS          Program's author; IBM employee

* You can search the Net for Glenn S. Dardick, Mel Hallerman, or Robert O'Rear for further information about these individuals.
* David Litton: His name also appears in the DOS 2.00 (1983) MBR code as its author, but we recently learned (Jan 2006) that Mr. David Litton had died some time around 1982/1983.
If you have any information you could share with us about Ron Heiney or M.C. ("Maggie") Rojas, or anything about the early DOS programmers, we'd appreciate hearing from you!
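The "Sect" column in the table above follows directly from the "Offset" column. As a small sketch, assuming 512-byte sectors (which is consistent with every Offset/Sect pair we list), dividing an offset by the sector size gives the Absolute sector, and the remainder gives the position within that sector. The function name is our own choice for illustration.

```python
SECTOR_SIZE = 512  # bytes per sector (assumed; consistent with the table above)

def locate(offset_hex):
    """Split a whole-diskette offset (hex string) into (absolute sector, byte within sector)."""
    return divmod(int(offset_hex, 16), SECTOR_SIZE)

# A few rows taken from the Names table.
ROWS = [
    ("Boot Record", "00168"),
    ("FORMAT.COM", "046E8"),
    ("MODE.COM", "06A88"),
    ("COMM.BAS", "25695"),
]

for name, offset in ROWS:
    sector, byte_in_sector = locate(offset)
    print(f"{name:12} offset {offset}h -> sector {sector}, byte {byte_in_sector:03X}h")
```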

Significant Text Strings

Except for some phrases that might be interesting only to programmers (which we didn't list here), the following table contains mostly program Version and Copyright strings (and Author names already listed above); offsets are given in hexadecimal from beginning of the diskette along with the Absolute sector ("Sect"):

Offset  Sect  File Name     Text Strings
00E22   7     IBMBIO.COM    BIOS Version 1.00
033BA   25    COMMAND.COM   The IBM Personal Computer DOS
                            Version 1.00
                            Copyright IBM Corp 1981
03403   26    COMMAND.COM   Licensed Material - Program Property of IBM
05203   41    DISKCOPY.COM  "The IBM Personal Computer Diskette Copier
                             Version 1.00 (C)Copyright IBM Corp 1981
                             Licensed Material - Program Property of IBM
                             Author - David Litton " (All on a single line!)
05803   44    DISKCOMP.COM  "The IBM Personal Computer Diskette Compare
                             Utility Version 1.00 (C)Copyright IBM Corp
                             1981 Licensed Material - Program Property of
                             IBM Author - David Litton " (All on one line!)
06A03   53    MODE.COM      "The IBM Personal Computer Mode command
                             Version 1.00 (C)Copyright IBM Corp 1981
                             Licensed Material - Program Property of IBM
                             Authors - Mel Hallerman and Ron Heiney "
                             (All on a single line!)
06E04   55    EDLIN.COM     The IBM Personal Computer EDITOR
                            Version 1.00 (C)Copyright IBM Corp 1981
08E05   71    DEBUG.COM     "The IBM Personal Computer DEBUGVersion
                             1.00 (C)Copyright IBM Corp 1981 Licensed
                             Material - Program Property of IBM" (No space
                             between "DEBUG" and "Version"; all on one line!)
0D5BF   106   LINK.EXE      MS PASCALFORTRAN 77
0D95F   108   LINK.EXE      IBM Personal Computer Linker
                            Version 1.00 (C) Copyright IBM Corp 1981
16023   176   BASIC.COM     "The IBM Personal Computer Basic", FFh, 0Dh
                            "Version D1.00 Copyright IBM Corp. 1981"
1A071   208   BASICA.COM    "The IBM Personal Computer Basic", FFh, 0Dh
                            "Version A1.00 Copyright IBM Corp. 1981"

Note the use of the phrase "Version A1.00" in BASICA.COM. From some of the BASIC program files on this diskette, we learn this was Microsoft's "Advanced" BASIC program, so the letter "A" probably stands for that word. Since Microsoft also produced a BASIC version for ROM chips, we've concluded that the letter "D" in the phrase "Version D1.00" of its plain BASIC.COM program most likely stands for the "D" in "Disk Basic", a phrase often used to refer to their standalone product. Although many of the BASIC program files (*.BAS) on the diskette have an author string, others do not, so we decided that listing them all here would make the paper unnecessarily long; our discoveries in the Slack Space are considered to be more important!

Slack Space

Slack Space is the unused area between the end of a file's actual contents and the beginning of another file at the start of the next cluster. Since each cluster on a 160 KiB diskette is only one sector (512 bytes), none of the data found in one of these areas can be longer than 511 bytes, and it will often be less than half that size. Data found within this diskette's slack space was, of course, never intended to be seen by the public or anyone else for that matter. Apparently no one at either IBM® or Microsoft® cared enough to copy all the final distribution files to a "newly formatted" diskette; if they had, all the slack space on the diskette would have contained nothing but "F6" bytes and this paper would be far less interesting! However, only those who learned how to use DEBUG (or later on, a disk editor) and spent time looking through these areas could possibly find anything of interest.
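To make the arithmetic concrete, here is a short sketch of how the size of a file's slack area could be computed, and how a slack region could be checked for anything other than the "F6" fill bytes. It assumes one 512-byte sector per cluster, as described above; the function names are hypothetical.

```python
CLUSTER_SIZE = 512  # one 512-byte sector per cluster (assumed from the text above)
FILL_BYTE = 0xF6    # value a newly formatted diskette would contain

def slack_length(file_size, cluster_size=CLUSTER_SIZE):
    """Bytes left over between the end of a file and the end of its last cluster."""
    return -file_size % cluster_size

def leftover_bytes(slack):
    """Return (position, value) pairs in a slack region that are not the fill byte."""
    return [(i, b) for i, b in enumerate(slack) if b != FILL_BYTE]

# A 1-byte file leaves the maximum of 511 slack bytes; an exact multiple leaves none.
print(slack_length(1), slack_length(512), slack_length(513))
```

An empty result from leftover_bytes would mean the slack area holds nothing but format filler; on this diskette, that is clearly not the case.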

This table shows portions of the two most interesting strings we found in the Slack Space:

Offset  Sect  Interesting Text in the Slack Space
0576B   43    HEXCOMError in HEX file--conversion aborted
16496   178   DEC-20 Downlink to Boca Raton [300-bps]

In the second item listed above, the "DEC-20" term is not a calendar date. It no doubt refers to a Digital Equipment Corporation "digital DECSYSTEM 20" owned by Microsoft® at that time (Note 1). We know this because Bob O'Rear not only wrote about how he used that system to complete many tasks for this key software project, but also informed us that IBM® employees in Boca Raton would dial in to Microsoft's DEC-20 computer via voice modem (at only 300-BPS in this case) to read e-mail communications concerning development of the DOS code (Note 2). This fragment is likely a reference to one of those occurrences.  [ Note: We've also found two other communications fragments in the slack space of the DOS 1.10 diskette: "DEC-20 +++ FAST +++ [Hal version] 11-Oct-81". Four months later, a similar "event" was captured: "Tops-20 Downlink to MS-DOS (Created: 24-Feb-82)  [IBM Version] [1200-bps]". We believe that anyone working in forensics should be at least slightly fascinated that pieces of events which might have no other record of ever occurring ended up being preserved on all the distribution diskettes for these operating systems. Fortunately for both IBM® and Microsoft®, no company secrets appear to have ever been revealed in any of these bits and pieces of data. ]

Many of the first DOS executables existed as HEX files which were then converted to COM files by a "HEX to COM" program, and remnants of those HEX files can still be found in this diskette's slack space. As a matter of fact, part of a HEX to COM conversion tool, probably the same one used to convert each of these files, was left sitting in the slack area after DISKCOPY.COM. The first clue to its existence on diskette is part of a text string from this program (for location, see first item in table above): "HEXCOMError in HEX file--conversion aborted." Though very informative by itself, we also have at least 169 bytes of code from which to conclude this section was indeed part of a working "HEX to COM" executable. (This file has been identified as the HEX2BIN.COM program from Seattle Computer Products. The date, however, is uncertain; one copy is dated April 18, 1981 and another May 7, 1981, yet both are identical (Note 3).)

Unless someone is familiar with the contents of a particular file format known as HEX (*.hex files) or has a trained eye for viewing ASCII strings (especially DOS line return pairs of 0D, 0A) in a hexadecimal (binary) file editor, they're likely to miss much of the information that's embedded in the Slack Space of this diskette! Once we recognized many strings of numbers as ASCII code bytes and converted them to characters, most of the Slack Space confirmed an important part of the story as to how this diskette's original (the actual diskette used to make all the distribution copies) had been used by its programmers!
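Software can do part of the "trained eye" work. The sketch below is only one possible approach, not the method we used: it scans a buffer of raw bytes for a colon followed by an even run of at least ten uppercase ASCII hexadecimal digits, which is how HEX-file fragments appear in this diskette's slack space. The pattern and the function name are our own assumptions, and the sample buffer is synthetic (filler bytes around one record quoted later on this page).

```python
import re

# A colon followed by five or more ASCII hex-digit pairs (uppercase assumed).
HEX_RECORD = re.compile(rb":(?:[0-9A-F]{2}){5,}")

def find_hex_records(raw):
    """Return (offset, text) for every HEX-record-like run in a bytes buffer."""
    return [(m.start(), m.group().decode("ascii")) for m in HEX_RECORD.finditer(raw)]

# Synthetic example: F6 filler around one record and its 0D, 0A line return.
sample = (
    b"\xF6" * 8
    + b":16054E000A496E76616C696420706172616D65746572070D0A24A3\r\n"
    + b"\xF6" * 8
)

for offset, text in find_hex_records(sample):
    print(f"{offset:05X}  {text}")
```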

The following is a brief explanation of how to interpret the HEX file records found in our diskette's slack space:

Here are two typical HEX Data Record lines (from Cluster 41; Abs. Sector 46):

:1A0534000D0A496E73756666696369656E74206D656D6F7279070D0A240DA7
:16054E000A496E76616C696420706172616D65746572070D0A24A3

Each Record begins with a colon (":") mark, and every byte that follows is written as two ASCII hexadecimal digits. The first byte ("1A") indicates the Record Length. Almost every Record found had 1A hex (26 bytes) of data, and when it didn't (see "16" in the second record above), it had fewer bytes because it was the last record! The next two bytes give the load offset within the program (0534, then 054E; 534h + 1Ah = 54Eh) if it's a Data Record, which is the case whenever the fourth byte is 00. We didn't find any records that weren't Data Records (type 00). The last byte of a HEX Record (A7 and A3 here) is always a checksum; it's not part of the data. Once the remaining 26 ASCII bytes (plus 22 for the last line) are converted, they become:

0D,0A,"Insufficient memory",07,0D,0A,"$",0D,0A,"Invalid parameter",07,0D,0A,"$"
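The conversion described above can be sketched in a few lines. This is a minimal illustration rather than the tool originally used: it assumes the checksum follows the common Intel HEX convention (all bytes of a record, checksum included, sum to zero modulo 256), which both of the records quoted above do satisfy. The function name is hypothetical.

```python
def parse_hex_record(line):
    """Decode one HEX record into (load_offset, record_type, data); raise on a bad record."""
    if not line.startswith(":"):
        raise ValueError("record must begin with a colon")
    raw = bytes.fromhex(line[1:].strip())
    length, record_type = raw[0], raw[3]
    load_offset = int.from_bytes(raw[1:3], "big")
    data = raw[4:-1]  # everything between the type byte and the checksum
    if len(data) != length:
        raise ValueError("length byte does not match the amount of data")
    if sum(raw) % 256 != 0:
        raise ValueError("checksum mismatch")
    return load_offset, record_type, data

# The two records quoted above (from Cluster 41; Abs. Sector 46).
RECORDS = [
    ":1A0534000D0A496E73756666696369656E74206D656D6F7279070D0A240DA7",
    ":16054E000A496E76616C696420706172616D65746572070D0A24A3",
]

for record in RECORDS:
    offset, rtype, data = parse_hex_record(record)
    print(f"{offset:04X}h type {rtype:02X} ({len(data)} bytes): {data!r}")
```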

These strings are almost certainly the input for a DOS Interrupt 21h Function 9 routine; when that Interrupt is used, the "$" sign (24 hex) marks the end of the string to be displayed, and the 07 byte would sound a beep on the computer's speaker. In other words, these are Error messages for some program!

Why did we take the time to describe the HEX format? Because it's very useful for any study involving fragments of a HEX file: The second and third bytes of each Record contain that line's "load offset" within its program. Therefore, if the same string of bytes within these records is found in the slack space of many files, its load offset provides further evidence that all of those bytes belong to a single program in which those lines have the same offsets! As more and more bytes "line up" in the same locations, the assurance of a "match" increases.

For example, the phrase "Insufficient memory" was found within the DISKCOPY, DISKCOMP, EDLIN and DEBUG programs of this diskette. However, taking all 48 of the bytes shown above, in that same order, should narrow it down to only one file, right? Well, it doesn't; both DISKCOPY and DISKCOMP contain all 48 bytes in the same exact order! This isn't too surprising since the same person (David Litton) wrote both programs at nearly the same time. But there's more evidence to consider: First, you need to know that the program in a .HEX file, just like a .COM file, is normally loaded into a segment of memory at offset 0100 hex and following. Now, within the DISKCOPY program, these bytes begin at hex offset 490h (5690h - 5200h); whereas for the DISKCOMP program, they begin at offset 434h (5C34h - 5800h). Since we know from their HEX Records (see the "0534" that follows ":1A" in the first record above) that these bytes are supposed to load into memory at offset 0534h (434h + 100h = 534h), it means (at least on this diskette) they can only belong to the DISKCOMP.COM program!
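The same comparison can be written out as a short sketch. The disk offsets are the ones given in the paragraph above; the function and variable names are ours, chosen only for illustration.

```python
COM_LOAD_BASE = 0x100  # .COM programs are loaded at offset 0100h

def load_offset(file_start, match_start):
    """In-memory offset of bytes found at disk offset 'match_start'
    inside a file that begins at disk offset 'file_start'."""
    return match_start - file_start + COM_LOAD_BASE

RECORD_OFFSET = 0x0534  # load offset stored in the first HEX record

# (file start on diskette, diskette offset where the 48 bytes begin)
CANDIDATES = {
    "DISKCOPY.COM": (0x5200, 0x5690),
    "DISKCOMP.COM": (0x5800, 0x5C34),
}

for name, (start, found) in CANDIDATES.items():
    print(f"{name}: bytes would load at {load_offset(start, found):04X}h")

matches = [
    name
    for name, (start, found) in CANDIDATES.items()
    if load_offset(start, found) == RECORD_OFFSET
]
print(matches)  # ['DISKCOMP.COM']
```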

We've compiled a separate page detailing the evidence from slack spaces for our conclusions here: See "Slack Space Evidence" (including a file containing .HEX examples from this diskette's slack space, along with a Batch program that automatically loads the data into DEBUG to reveal their text strings).

 

Date-trail Analysis

Many computer forensics experts have a wealth of information they can glean from the MAC (Modified, Accessed and Created) Times associated with any system files that have been purposely altered (during an Internet related break-in) or other time-stamps found in various server logs. Other investigators may need to pay particular attention to the dates of certain files deleted by a suspect, whether their contents are recoverable or not!

However, due to the nature of the very first DOS file system (only the Last Modified Date/Time was ever recorded) and because IBM® made a decision to alter the dates and times of most files on this diskette (which, unfortunately, was a decision they essentially carried over to every IBM® and Microsoft® OS distribution), we have almost no date related data left to analyze!

Fortunately, there are still some dates that were embedded in parts of the diskette normally unseen by most users, along with the dates of two hidden files. The following table shows each date string found on the diskette, listed chronologically:

Date String  Offset  Sect  Object/File Name  Comments
9-Apr-81     164C1   178   Slack Space       From the phrase: "DEC-20 Downlink to Boca Raton 300-BPS   9-APR-81"
7-May-81     00009   0     Boot Record       Part of the Boot sector which was authored by Robert O'Rear.
15-Jul-81    04577   34    FORMAT.COM        This string isn't used by the program; we assume it was used to date the code.
15-Jul-81    04E0E   39    SYS.COM           This string isn't used by the program; we assume it was used to date the code.
22-Jul-81    00E34   7     IBMBIO.COM        This string immediately follows the byte A0 after string: "BIOS Version 1.00"
4-Aug-81     1606E   176   BASIC.COM         String appears immediately in front of the phrase: "Licensed Material - Program Property of IBM". Assume it dates code.
4-Aug-81     1A0BC   208   BASICA.COM        String appears immediately in front of the phrase: "Licensed Material - Program Property of IBM". Assume it dates code.

The earliest date comes from an interesting string already mentioned in our analysis of the slack space: "DEC-20 Downlink to Boca Raton 300-BPS   9-APR-81". Although this phrase isn't associated with any file on the diskette, it does confirm that something related to a project in Boca Raton, FL was underway by this date; and of course, from other historical records, we know for a fact that project was IBM's Personal Computer™ DOS. Apart from both BASIC.COM and BASICA.COM, which contain the same date (4-Aug-81) as all the files with visible attributes, we believe the other embedded dates are genuine indicators of the project's progress.

The Hidden System Files
Date      Time     Directory Entry
07-23-81  12:00am  IBMBIO.COM
08-13-81  12:00am  IBMDOS.COM

Even though the times of these two files appear to have been altered (both showing exactly 12:00am; the same time as all the other files on this diskette), we believe their dates still reveal some useful information: For one thing, the IBMBIO.COM file is dated the day after the "22-Jul-81" string embedded inside it! This seems to imply its author, or perhaps another programmer, finished up some important change(s) in its code without bothering to set both the file date and this internal string to the same day.

The date for the IBMDOS.COM file is significant in that it is the day after August 12, 1981, when IBM® announced the release of its new Personal Computer™, which could be controlled by IBM® Personal Computer™ DOS. Though it's still possible this file date was altered, the fact that all the visible file dates were uniformly changed to "08-04-81" leads us to believe this later date for such a critical file was most likely due to a necessary last-minute change in its code rather than a purposely altered date.
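The two one-day gaps discussed above can be checked with a few lines of date arithmetic. This sketch simply parses the date strings as they appear on this page; the helper names are our own, and it assumes the default (English) month abbreviations.

```python
from datetime import date, datetime

def embedded(text):
    """Parse an embedded date string such as '22-Jul-81'."""
    return datetime.strptime(text, "%d-%b-%y").date()

def directory(text):
    """Parse a directory-entry date such as '07-23-81'."""
    return datetime.strptime(text, "%m-%d-%y").date()

ANNOUNCEMENT = date(1981, 8, 12)  # announcement date given in the text

# IBMBIO.COM: file date versus the string embedded inside it.
ibmbio_gap = (directory("07-23-81") - embedded("22-Jul-81")).days
# IBMDOS.COM: file date versus the announcement.
ibmdos_gap = (directory("08-13-81") - ANNOUNCEMENT).days

print(ibmbio_gap, ibmdos_gap)
```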

This late date (August 13, 1981) for such an important file may also reflect possible reasons why no one ever "cleaned up" the diskette before making the final master for all distribution copies: They either felt there wasn't enough time for what was considered a trivial matter, or just plain forgot to do so.

 

Summary

The consistent discovery of HEX Records whose bytes match up only with those of the program physically preceding the slack space in which they were found leads us to conclude there's a high probability all the .COM files on the original diskette were first copied there as HEX files and then converted to their present state by a "HEX to COM" conversion program. Furthermore, the remnants of just such a program were discovered in the diskette's slack space, with all the bytes matching such a program from that same time period. It's likely this program was used in converting all the DOS files.

We also found what we're calling a communications fragment (i.e., the string: "DEC-20 Downlink to Boca Raton 300-BPS   9-APR-81") in the diskette's slack space which predates the release of the operating system by four months. So how did this fragment end up on the release diskette? Was it merely a coincidence, or had the project manager(s) kept everything on the same diskette, which was then used as a master for making all the distribution diskettes?

In spite of the fact the date and time attributes for every visible file were purposely altered to "08-04-81 12:00am", we found a number of embedded dates that give us a broad picture of the project's progress during 1981 and perhaps even a sense of urgency to 'rush the product out the door' since the hidden system file IBMDOS.COM was dated the day after IBM's new Personal Computer had already been announced to the world.

 

This work will remain open for some time! We'd appreciate any relevant comments or corrections. You may contact the author here.

Notes

1 Here's a full description of one such DECSYSTEM-20 at Columbia University; complete with photographs. Microsoft's installation may have been similar to this one.

2 An e-mail dated August 11, 2005 from Robert O'Rear to Daniel B. Sedory.

3 After searching the Internet for the filename HEX2BIN.COM, we found a copy of it dated 04-18-81 that matched perfectly with all the bytes found in the slack space of our IBM diskette. We are also expecting further confirmation from others!

 

 


 

Last Update: August 14, 2008 (14-08-2008).
