What is Data Corruption?
Although data corruption comes in many forms, the most common is a data
file that contains "non-standard" characters. The presence of such
characters often prevents software from working with the data properly,
because the software cannot read the data, cannot overwrite the corrupted
data with new information, or both.
To understand what "non-standard" characters are, one must first understand
what "standard" characters are. Most files store text using ASCII
(American Standard Code for Information Interchange), a standard that
assigns a number from 0 to 127 to each letter, digit, punctuation mark, and
control code. All the keys on a standard keyboard are stored in files in
ASCII format (A, a, B, b, @, #, 1, 2, etc.). Thus the keys you see on the
keyboard represent "standard" characters.
Probably the most common form of corrupt data is non-ASCII "trash"
characters that invade a data file. You can often spot them as strange
symbols on screen, such as § ¨ © ª Æ ›.
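To make "non-standard" characters concrete, here is a minimal Python sketch that scans raw bytes and reports anything outside printable ASCII (codes 32 to 126) plus tab, line feed, and carriage return. It assumes the file or field being checked is meant to hold plain text only; many real data files (for example, database files with binary headers or numeric fields) legitimately contain other bytes, so treat any hits as leads to investigate rather than proof of corruption. The function names are hypothetical.

```python
from pathlib import Path

# Printable ASCII (space through tilde) plus tab, line feed, and carriage return.
# Assumption: the data being checked should contain plain text only.
ALLOWED_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def find_suspect_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte outside the allowed set."""
    return [(offset, value) for offset, value in enumerate(data)
            if value not in ALLOWED_BYTES]


def scan_file(path: str) -> list[tuple[int, int]]:
    """Read a file as raw bytes and report suspect ("trash") bytes."""
    return find_suspect_bytes(Path(path).read_bytes())
```

For example, `find_suspect_bytes(b"Smith\xa7John")` returns `[(5, 167)]`: byte 0xA7, which displays as § in the Latin-1 character set, sits where a letter should be.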
Other examples of corrupted data, in increasingly technical terms, are:
spaces in a file or record where there shouldn’t be a space; file headers
containing trash characters; accidentally "zapped" files; and memo damage,
which is often due to a corrupted "next free block" pointer or other
corrupted or altered pointers.
How serious is data corruption?
Data corruption is a fairly common and usually minor event. Many
programs have built-in routines that fix the most common sorts of
corruption. Most database programs, for example, contain a "reindex" or
"rebuild" routine specifically designed to fix corrupted or damaged index
files.
Occasionally, data corruption cannot be fixed by the tools built into the
software. When this happens, the software frequently cannot operate at
all, because it cannot work with the corrupted data it encounters. It may
display an error message or even freeze ("lock up"). When this happens,
there are usually two courses of action:
1) A technician can try to comb through the files manually, looking to
   delete the corruption "by hand".
   Pro: Can be effective.
   Con: 1. More tedious and boring than watching grass grow.
        2. Requires a lot of a technician’s time, which means it is usually
           quite expensive.
        3. It is easy to perform this kind of rescue only to realize that
           the problem has been made worse by the pitfalls inherent in
           working "by hand" (we won’t bore you with the details here).
2) Restore from a backup made before the data corruption occurred.
   Pro: 1. Relatively easy, so almost anyone can do it with a minimum of
           training (you must thoroughly understand the backup and restore
           process BEFORE trying to do this).
        2. In and of itself, it doesn’t cost anything.
   Con: You lose all data entered since the last "good" backup, which can
        also be quite expensive if an older backup must be used.
If the data is corrupt, doesn’t that mean the software is defective?
The answer, in a word, is no. To understand why this
is so, one must understand that the program and the data are two separate
things. Data is a collection of information, and is usually stored in a
number of separate files. A program, on the other hand, is a tool that
allows the data to be seen on screen in a familiar manner, and to be
manipulated (added to, changed, deleted, etc.). Think of the program as "a
set of screens", e.g. a name and address screen, a notes screen, and an
inventory screen.
This distinction is critically important to understanding the nature of
data corruption. Data is the information a user enters into the computer,
and is unique to that user. "John" in Penang has the same program as
"Betty" in Johor, but John’s data is totally different from Betty’s data.
This is why the same program can be used by hundreds of different users
and function properly and consistently for all, except those unfortunate
users who have corrupted data.
Another way to explain this is that most database
programs can work with multiple databases. For instance, your software
might have a "real" database, and a "training" database. But there is only
one program.
Now, if the program works properly with the training database, but fails
to work with the real database, the difference between the two situations
is not found in the program. Rather, it is found in the databases. In our
example, the training database is operating fine, whereas the real
database is not. So the key to fixing the problem rests in fixing the real
database.
Finally, programs themselves rarely corrupt databases. Database corruption
is almost always caused by factors external to the program, and those
factors are what we address in the next section.
What causes Corrupt Data?
The single most common root cause of data corruption (lost clusters,
cross-linked files, etc.) and hardware failure is bad electric power. A
recent study commissioned by IBM reported that 51% of all computer
problems were traced to bad electric power. Bad electric power includes
blackouts (no power), brownouts (reduced power), surges (increased power),
spikes (massive increases in power, e.g. lightning), electromagnetic
interference (EMI) (caused by, among other things, sunspots), and radio
frequency interference (RFI).
Something to bear in mind about computers and electric power: in computer
terms, an electrical "event" is anything lasting longer than about 45
microseconds (0.000045 seconds). The human senses, however, do not perceive
anything shorter than about 1/30th of a second (roughly 0.03 seconds). In
other words, the power must go out for more than 1/30th of a second for us
to notice it, yet to a computer the span between 0.000045 seconds and 0.03
seconds is an eternity.
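The gap between those two thresholds is easy to underestimate. This short Python calculation uses the figures quoted above (0.000045 seconds for a computer "event" and 1/30th of a second for human perception) to show how many times longer a barely noticeable outage is than the shortest event a computer cares about.

```python
# Durations quoted in the text above, in seconds.
COMPUTER_EVENT_S = 0.000045   # about 45 microseconds
HUMAN_PERCEPTION_S = 1 / 30   # about 0.033 seconds


def ratio(longer_s: float, shorter_s: float) -> float:
    """How many times the shorter duration fits into the longer one."""
    return longer_s / shorter_s


gap = ratio(HUMAN_PERCEPTION_S, COMPUTER_EVENT_S)
print(f"1/30 s is about {gap:.0f} times longer than 45 microseconds")
```

It prints a ratio of about 741: a power problem can last hundreds of times longer than a computer tolerates and still go unnoticed by the people in the room.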
Another very common cause of corrupt data is hard drive problems, such as
lost clusters, cross-linked files, and damaged file allocation tables
(FAT). The operating system (DOS or Windows) can often fix these, but the
repair frequently leaves damaged data files behind.
Other causes of corrupted data include: hardware failure (RAM,
motherboard, CMOS, BIOS, hard drive, floppy drive, etc.), electrical
failure (see above; the computer’s own power supply can also fail),
virus activity, system malfunctions, accidental erasure or re-format
(human error), and water, fire, and smoke damage.
How is Data Corruption Fixed?
Most software has built-in tools to fix data corruption. The Reindex (or
Rebuild) routine recreates the index files, is generally harmless to run,
and is usually the first step in fixing a problem. Depending on the
software, more sophisticated tools for fixing data corruption may or may
not be available: consult your User’s Manual or call the software
manufacturer’s Technical Support Department for more information.
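Why is a reindex generally harmless? An index is derived data: it can be recreated at any time by reading the data records in order and noting where each key appears, without changing the records themselves. The Python sketch below shows that idea on an in-memory list of records; the record layout, field name, and function name are hypothetical, and real programs keep their indexes in their own on-disk formats.

```python
from collections import defaultdict


def rebuild_index(records: list[dict], key_field: str) -> dict[str, list[int]]:
    """Recreate an index from scratch by scanning every data record.

    The old (possibly corrupted) index is never read; only the data is.
    Returns a mapping from each key value to the record numbers holding it.
    """
    index = defaultdict(list)
    for record_number, record in enumerate(records):
        index[record[key_field]].append(record_number)
    # Sort by key so the result is ordered, as an index normally is.
    return dict(sorted(index.items()))
```

For records `[{"name": "Smith"}, {"name": "Jones"}, {"name": "Smith"}]`, `rebuild_index(records, "name")` returns `{"Jones": [1], "Smith": [0, 2]}`. Because only the index is rewritten, running the routine again on intact data simply produces the same index.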
Usually the easiest and cheapest way to fix data corruption is to restore
from a recent backup. If you back up once a day, you will on average lose
about half a day of work.
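The "½ day" figure follows from assuming that a failure is equally likely at any moment between two backups: a failure just after a backup loses almost nothing, one just before the next backup loses almost a full interval, and the average is half the interval. This hypothetical Python helper checks that reasoning numerically by averaging the loss over evenly spaced failure times.

```python
def average_work_lost(backup_interval: float, samples: int = 1000) -> float:
    """Average work lost (in the same units as backup_interval), assuming a
    failure is equally likely at any moment between two backups."""
    step = backup_interval / samples
    # A failure at time t after the last backup loses t worth of work;
    # sample t at the midpoint of each of `samples` equal slices.
    losses = [(i + 0.5) * step for i in range(samples)]
    return sum(losses) / samples


print(average_work_lost(1.0))  # daily backups, measured in days
```

With daily backups this gives about 0.5 days; under the same assumption, backing up twice a day would halve the average loss.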
If the software’s built-in tools don’t fix the problem and a recent
backup is not available, the data may still be repairable by sending it
to the software manufacturer or to a third-party data repair service.
This can be quite expensive. It is best to start with the software
manufacturer, as it is likely to have the most expertise in fixing data
problems in its own software.
What is the best defense against data corruption?
1. Connect EVERY computer, monitor, printer, modem, and other peripheral
   device to an Uninterruptible Power Supply (UPS). A UPS protects the
   computer against blackouts, brownouts, surges, and spikes, and many
   models also filter out EMI and RFI. A UPS is a cheap insurance policy,
   and it can significantly extend the life of your computer equipment.
   (Note: do not connect laser printers to a UPS; they draw too much
   power for the UPS to handle.)
2. Back up every day, and rotate the media (floppy disks, Zip disks,
   tapes, etc.) on which you back up. How?
   a. Use a separate set of media for each day of the week that you are
      open, plus 2. For example, if you are open 6 days per week, you
      need 8 sets of disks, or 8 Zip disks, or 8 tapes.
   b. Label 6 sets of disks or tapes with the days of the week you are
      open. Label the 7th "Monthly Off Site Copy #1" and the 8th
      "Monthly Off Site Copy #2".
   c. Back up each and every day you are open. NEVER fail to do this.
   d. Once per week, do an extra backup onto the current Monthly Off Site
      Copy (#1 to start with). Keep this copy at home.
   e. Each month, switch from Monthly Off Site Copy #1 to Monthly Off
      Site Copy #2, and vice versa.
   f. Each time you switch, store the copy you have just stopped using
      in a safe deposit box.
3. Run anti-virus software in auto-protect mode. Some popular and
   effective anti-virus programs include McAfee Anti-Virus, Norton
   AntiVirus, and Dr. Solomon's Anti-Virus.
4. Regularly run the reindex (aka "rebuild") maintenance functions
   contained in your software.
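The rotation in item 2 above can be sketched as a small Python schedule that answers "which media do I use today?". It is a sketch under stated assumptions, not a prescribed procedure: the business is assumed to be open Monday to Saturday, the weekly off-site backup is assumed to happen on Saturday, and odd-numbered months are assumed to use Monthly Off Site Copy #1 while even-numbered months use #2 (which alternates every month, including from December to January). All names are hypothetical.

```python
from datetime import date

# Assumption: open Monday to Saturday (date.weekday(): Monday == 0).
OPEN_DAYS = {0: "Monday", 1: "Tuesday", 2: "Wednesday",
             3: "Thursday", 4: "Friday", 5: "Saturday"}
# Assumption: the extra weekly off-site backup is made on Saturday.
WEEKLY_OFFSITE_DAY = 5
# One set per open day, plus the two off-site sets.
TOTAL_SETS = len(OPEN_DAYS) + 2


def offsite_label(day: date) -> str:
    """Alternate the two off-site sets every calendar month."""
    # Assumption: odd months use copy #1, even months use copy #2.
    number = 1 if day.month % 2 == 1 else 2
    return f"Monthly Off Site Copy #{number}"


def media_for(day: date) -> list[str]:
    """Return the labels of the media to back up onto on the given day."""
    if day.weekday() not in OPEN_DAYS:
        return []  # closed: no backup scheduled
    labels = [OPEN_DAYS[day.weekday()]]  # the daily set for this weekday
    if day.weekday() == WEEKLY_OFFSITE_DAY:
        labels.append(offsite_label(day))  # the extra weekly off-site backup
    return labels
```

For example, `media_for(date(2024, 1, 6))`, a Saturday, returns `["Saturday", "Monthly Off Site Copy #1"]`, and `TOTAL_SETS` is 8, matching the 6-day example in step a.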
Postscript: What proof is there that Data Corruption is a Common Problem?
1. The inclusion of software data repair utilities in most software
   packages.
2. The existence, for years, of software designed specifically to help end
   users repair their own data and disk drive problems, such as Norton
   Utilities and PC Tools.
3. The existence of major data recovery companies, such as:
   a. Ontrack Data International, Inc. (NASDAQ: ONDI), a world leader in
      data recovery that specializes in software and services that help
      computer users protect their valuable data and recover lost data.
      Ontrack operates data recovery labs in Los Angeles, San Jose,
      Washington D.C., New York, Minneapolis, Tokyo, London, Paris, and
      Stuttgart.
   b. CBL Data Recovery Specialists, who recover mission-critical data
      when all other conventional methods and experts have failed. CBL
      recovers data lost due to file corruption, mechanical or electrical
      failure, virus activity, system malfunctions, accidental erasure or
      re-format, and water, fire, or smoke damage.
   c. Disktek Data Recovery, 590 Alden Road, Unit 105, Markham, Ontario,
      Canada, L3R 8N2. Their technicians salvage and extract lost data
      from hard disk drives and other storage media corrupted by hardware
      or software failure, natural disasters, or human error.
   d. Kleiber Enterprises, which has been actively involved in data
      recovery since 1984. Kleiber uses proprietary techniques on a wide
      range of storage devices and platforms: DOS, Win95, WinNT, and
      Novell.
Here are some suggestions for salvaging your precious data before it's
too late:
1) DON'T let the hard disk keep running if it makes any unusual noises
   (clicking, grinding, or metal scraping); turn it off immediately! This
   condition typically indicates a head crash and major media damage.
   Hard disks spin at high speed, anywhere from 3000 to 10,000
   revolutions per minute, so extensive damage can occur in a short
   period of time if a drive is left running, making the data
   irretrievable. In this situation it is best to send the drive
   directly to data recovery specialists so they can retrieve the data
   before it is lost forever.
2) DON'T use any file recovery programs (especially Norton Utilities) in
   the presence of mechanical damage, such as when strange noises are
   coming from the drive.
3) DO make undo disks whenever you run any utility program, and do not
   run the program more than once if it does not correct the problem the
   first time.
4) DON'T remove the cover of the hard drive to expose the media surface.
   Drives are meant to be opened only in clean-room environments, and
   opening one elsewhere, or touching the actuator arm, read-write heads,
   etc., will only make a bad situation worse.
5) DO shut down the drive and leave it off if the CMOS setup does not
   recognize the drive as present in the system. This indicates
   mechanical and/or electronic damage, and the drive is not accessible
   through normal means. Changing the CMOS to an incorrect setting can
   cause further data corruption.
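Tip 3 (make undo disks, and run a utility only once) can be illustrated with a small Python sketch that copies a data file to a separate undo location before a repair step runs, then runs that step exactly once. The `repair` callable stands in for whatever utility you would actually use and, like the function names and the `.undo` naming scheme, is hypothetical. Do not use this, or any software tool, on a drive showing the mechanical symptoms described in tips 1 and 2; ideally the undo copy should go to a different disk from the one being repaired.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Callable


def make_undo_copy(data_file: str, undo_dir: str) -> Path:
    """Copy a data file into an undo directory before anything modifies it."""
    source = Path(data_file)
    target_dir = Path(undo_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.name}.{stamp}.undo"
    shutil.copy2(source, target)  # copy2 also preserves the file's timestamps
    return target


def repair_once(data_file: str, undo_dir: str,
                repair: Callable[[Path], bool]) -> tuple[bool, Path]:
    """Make an undo copy, then run the (hypothetical) repair step exactly once.

    There is deliberately no retry loop: if the first run fails, stop and
    keep the undo copy rather than running the utility again.
    """
    undo_copy = make_undo_copy(data_file, undo_dir)
    succeeded = repair(Path(data_file))
    return succeeded, undo_copy
```

If the repair makes things worse, the undo copy can be copied back over the data file (again with `shutil.copy2`) before trying anything else.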