Data Compression Algorithms in Java

Two common data compression techniques, Limpel Ziv Welch (LZW), and Run Length Encoding (RLE), are explained and implemented as Java classes. No online demo applet here, just the sources, but they include test / demo code.

Be advised that these, and virtually all data compression techniques, have been granted patents.
This software uses only algorithms and coding techniques that are available to the public in the open literature. You are however, advised to make your own determination of legality, for your intended use.
See the comp.compression FAQ for more information, both on theory and application of data compression, and on patent coverage that you should be aware of. The patent information given below is taken from this FAQ.

Run Length Encoding (RLE)

The Run Length Encoding (RLE) algorithm compresses any sequence of byte data. The resulting compressed codes are larger than bytes, but can each represent a group of bytes, at least some of the time, for an overall saving.

RLE compression recognizes sequences of a repeating value, and replaces each sequence by a code, representing the value and the repeat count. The degree of compression depends on how much repetition is found in the data. RLE compression is effective with data having many and / or lengthy contiguous repeat values, such as graphics with areas of solid color, and sparsely filled binary data with runs of zero bytes.

The repeat count should allow for the expected amount of repetition, without wasting too much space allowing for unusually large counts, which can be handled when they do occur, by writing one or more additional codes. This version uses 4 bit counts for total 12 bit code width.

Note! Other versions of RLE may use other sizes of repeat count, and other coding conventions. They are not inter-operable with this algorithm.

Tsukiyama has two US patents on run length encoding: 4,586,027 and 4,872,009 granted in 1986 and 1989 respectively.

There is no need to construct an RLE object, the RLE.Compress () and RLE.Expand () methods are static.

Limpel, Ziv, Welch (LZW)

The Limpel, Ziv, Welch (LZW) algorithm compresses any sequence of byte data. The resulting compressed codes are larger than bytes, but can each represent a group of bytes, at least some of the time, for what is usually an overall saving.

LZW compression recognizes sequences that have already appeared in the data, and replaces each later appearance by a unique code. The degree of compression depends on how much repetition is found in the data. LZW compression is effective with data having many and / or lengthy repeat sequences, such as text, and graphics with areas of solid color or patterns.

The sequence history should allow for the expected amount of repetition, without wasting too much space in each code on a history index. This version is fixed at 12 bit codes, which can refer to nearly a 2^^12 item sequence history, less some internal overhead. This size is considered good for short message blocks, and is simple to implement.

Note! Other versions of LZW may use other sizes of sequence history, and other coding conventions. They are not inter-operable with this algorithm.

The original papers:

This and many similar algorithms have been patented: This Java was ported from C code in the copyrighted article, "LZW Data Compression" by Mark Nelson, in Dr. Dobb's Journal October, 1989.

There is no need to construct an LZW object, the LZW.Compress () and LZW.Expand () methods are static.

Sources

This software is provided free of charge with no support. Email me at morris_hirsch@brown.edu if you do have a problem, and I will try to help, but I cannot promise to.

Your comments, suggestions, bug reports, bug fixes, and enhancements, are welcome!

Was this Helpful?

Was this Helpful? Was it what you were looking for? How did you find this page? Search result, following a link, recommendation? Please let me know. Email me at morris_hirsch@brown.edu

Other java

Please also visit my home page.

Legalities

Except for all sections which are identified as the work, and copyright, of others, this work is Copyright (C) 1998 by Morris Hirsch. All rights reserved, except as granted here.

Redistribution and use in source and binary forms, with or without modification, for any purpose, are permitted, except as may be restricted by original copyright holders, provided that the following conditions are met:

DISCLAIMER

This software is provided free of charge with no support.
This software is provided in source format; You are advised to examine it and understand it, before putting it to use.
This software uses only algorithms and coding techniques that are available to the public in the open literature. You are however, advised to make your own determination of legality, for your intended use.
This software is provided by the Author and Contributors ``AS IS'' and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose, are disclaimed.
In no event shall the Author or Contributors, or their Agents, be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.
1