The LZW algorithm


The encoding algorithm

  1. At the start, the dictionary contains all possible roots, and P is empty;
  2. C := next character in the charstream;
  3. Is the string P+C present in the dictionary?
    1. if it is, P := P+C (extend P with C);
    2. if not,
      1. output the code word which denotes P to the codestream;
      2. add the string P+C to the dictionary;
      3. P := C (P now contains only the character C);
    3. are there more characters in the charstream?
      • if yes, go back to step 2;
      • if not:
        1. output the code word which denotes P to the codestream;
        2. END.

    Decoding: additional terms

    The principle of decoding
    At the start of decoding, the dictionary looks the same as at the start of encoding -- it contains all possible roots.

    Let's consider a point in the process of decoding, when the dictionary contains some longer strings. The algorithm remembers the previous code word (pW) and then reads the current code word (cW) from the codestream. It outputs the string.cW, and adds the string.pW extended with the first character of the string.cW to the dictionary. This is the character that would have been explicitly read from the codestream in LZ78. Because of this principle, the decoding algorithm "lags" one step behind the encoding algorithm with the adding of new strings to the dictionary.

    A special case occurrs if the cW denotes an empty entry in the dictionary. This can happen because of the explained "lagging" behind the encoding algorithm. It happens if the encoding algorithm reads the string that it has just added to the dictionary in the previous step. During the decoding this string is not yet present in the dictionary. A string can occurr twice in a row in the charstream only if its first and last character are equal, because the next string always starts with the last character of the previous one. This leads to the following decoding rule: the string.pW is extended with its own first character and the resulting string is added to the dictionary and output to the charstream.

    The decoding algorithm

    1. At the start the dictionary contains all possible roots;
    2. cW := the first code word in the codestream (it denotes a root);
    3. output the string.cW to the charstream;
    4. pW := cW;
    5. cW := next code word in the codestream;
    6. Is the string.cW present in the dictionary?
      • if it is,
        1. output the string.cW to the charstream;
        2. P := string.pW;
        3. C := the first character of the string.cW;
        4. add the string P+C to the dictionary;
      • if not,
        1. P := string.pW;
        2. C := the first character of the string.pW;
        3. output the string P+C to the charstream and add it to the dictionary (now it corresponds to the cW);
    7. Are there more code words in the codestream?
      • if yes, go back to step 4;
      • if not, END.

    An example
    The encoding process is presented in Table 1.

    Contents of the dictionary at the beginning of encoding:
    (1) A
    (2) B
    (3) C
    Charstream to be encoded:
    Pos123456789
    Char A B B A B A B A C

    Table 1: The encoding process
    StepPos DictionaryOutput
    1.1(4) A B(1)
    2.2(5) B B(2)
    3.3(6) B A(2)
    4.4(7) A B A (4)
    5.6(8) A B A C (7)
    6.----(3)

    Table 2. explains the decoding process. In each decoding step the algorithm reads one code word (Code), outputs the corresponding string (Output) and adds a string to the dictionary (Dictionary).

    Table 2: The decoding process
    StepCode OutputDictionary
    1. (1)A--
    2. (2)B(4) A B
    3. (2)B(5) B B
    4. (4)A B(6) B A
    5. (7)A B A (7) A B A
    6. (3)C(8) A B A C

    Let's analyze the step 4. The previous code word (2) is stored in pW, and cW is (4). The string.cW is output ("A B"). The string.pW ("B") is extended with the first character of the string.cW ("A") and the result ("B A") is added to the dictionary with the index (6).

    We come to the step 5. The content of cW=(4) is copied to pW, and the new value for cW is read: (7). This entry in the dictionary is empty. Thus, the string.pW ("A B") is extended with its own first character ("A") and the result ("A B A") is stored in the dictionary with the index (7). Since cW is (7) as well, this string is also sent to the output.


    This page hosted by
    Get your own Free Home Page
    1