The encoding algorithm
Decoding: additional terms
The principle of decoding
At the start of decoding, the dictionary looks the same as at the start of encoding -- it contains all possible roots.
Let's consider a point in the process of decoding, when the dictionary contains some longer strings. The algorithm remembers the previous code word (pW) and then reads the current code word (cW) from the codestream. It outputs the string.cW, and adds the string.pW extended with the first character of the string.cW to the dictionary. This is the character that would have been explicitly read from the codestream in LZ78. Because of this principle, the decoding algorithm "lags" one step behind the encoding algorithm with the adding of new strings to the dictionary.
A special case occurrs if the cW denotes an empty entry in the dictionary. This can happen because of the explained "lagging" behind the encoding algorithm. It happens if the encoding algorithm reads the string that it has just added to the dictionary in the previous step. During the decoding this string is not yet present in the dictionary. A string can occurr twice in a row in the charstream only if its first and last character are equal, because the next string always starts with the last character of the previous one. This leads to the following decoding rule: the string.pW is extended with its own first character and the resulting string is added to the dictionary and output to the charstream.
The decoding algorithm
An example
The encoding process is presented in Table 1.
(1) | A | |
(2) | B | |
(3) | C |
Pos | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Char | A | B | B | A | B | A | B | A | C |
Step | Pos | Dictionary | Output |
---|---|---|---|
1. | 1 | (4) A B | (1) |
2. | 2 | (5) B B | (2) |
3. | 3 | (6) B A | (2) |
4. | 4 | (7) A B A | (4) |
5. | 6 | (8) A B A C | (7) |
6. | -- | -- | (3) |
Table 2. explains the decoding process. In each decoding step the algorithm reads one code word (Code), outputs the corresponding string (Output) and adds a string to the dictionary (Dictionary).
Step | Code | Output | Dictionary |
---|---|---|---|
1. | (1) | A | -- |
2. | (2) | B | (4) A B |
3. | (2) | B | (5) B B |
4. | (4) | A B | (6) B A |
5. | (7) | A B A | (7) A B A |
6. | (3) | C | (8) A B A C |
Let's analyze the step 4. The previous code word (2) is stored in pW, and cW is (4). The string.cW is output ("A B"). The string.pW ("B") is extended with the first character of the string.cW ("A") and the result ("B A") is added to the dictionary with the index (6).
We come to the step 5. The content of cW=(4) is copied to pW, and the new value for cW is read: (7). This entry in the dictionary is empty. Thus, the string.pW ("A B") is extended with its own first character ("A") and the result ("A B A") is stored in the dictionary with the index (7). Since cW is (7) as well, this string is also sent to the output.