The Art of Lossless Data Compression vol. 19t

Here are the results of tests performed in September 2000 to compare lossless compression of English texts by all known programs that are good enough for this purpose, including RK, DC, YBS, Bzip2, IMP, RAR and 7-zip. See the Archive Comparison Test by J.Gilchrist for more details: http://act.by.net

If anybody wants to start or continue such tests, can suggest other sets of texts or other compression programs (executable programs only, not sources or algorithm descriptions), or knows that we have missed something important (some fantastic new technology, an algorithm, or even a program capable of lossless compression of up to 1000:1, etc.), please let us know immediately: artest@hotmail.ru

Thank you!

[[1]] COMPRESSION QUALITY

(see also [[2]] Speed [[3]] Details [[4]] Comments)

Fifth line shows results for the sum of four Canterbury Corpus Large Set files, tenth line - for the sum of all 556 files in five sets.

Original   ACE32    BEE      BIX      BOA      BA       BZip2    DC       ERI      IMP
length
Options, in column order: -m5-d4096 -m3-d3 -m1 -mdg -m15 -k50 -m -k -9 -b16300-mt5 (none) -2-s4

581.79%    138.67   108.95   129.00   106.46   109.61   121.55   104.85   112.32   119.84
411.40%    112.54   105.04   105.48   100.56   103.86   110.95   101.39   106.17   109.09
582.55%    139.98   106.19   130.78   106.37   106.98   120.52   102.53   109.57   118.23
657.05%    139.67   112.21   137.08   112.45   110.49   130.05   110.92   112.48   128.20
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
523.75%    128.40   106.29   120.77   104.15   106.01   117.43   102.85   108.43   115.51
485.12%    134.76   105.29   129.30   104.67   106.57   116.69   101.84   110.39   115.42
395.58%    130.60   104.45   124.51   102.76   105.56   113.01   100.95   109.19   112.70
432.57%    134.01   104.07   128.51   103.36   106.45   115.88   101.71   110.58   115.55
723.25%    147.93   112.09   143.07   110.68   118.26   135.44   109.89   118.12   143.21
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
448.75%    133.44   104.25   127.84   103.27   106.50   116.14   101.61   110.15   116.28

ArHanGel   PPMonstr   SBC        RAR        RK         SZip       777        7-zip      YBS        ZZip
-2-mm-mt   -o8-m58    -b19       -m5-mm-mde -mx3       -o10-b41   -m5-mu32   -mx        -m16mu     -b20-mx

115.91     103.48     111.74     138.73     *100%      111.26     114.79     159.77     105.39     109.54
100%       102.55     101.83     112.46     102.13     103.83     100.50     111.08     102.00     103.38
115.28     101.98     109.04     141.03     *100%      111.22     112.14     161.22     102.81     106.94
139.25     104.59     112.95     141.29     100%       115.21     127.33     184.90     109.73     110.23
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
111.81     102.04     106.61     128.87     100%       108.11     109.42     144.02     103.13     105.77
113.92     100.61     110.78     134.99     *100%      110.86     112.33     152.34     104.56     107.07
107.58     100%       109.23     134.61     100.57     109.27     107.97     142.02     103.44     106.12
110.45     ^100%      110.75     135.33     100.69     109.62     109.09     147.50     104.24     107.05
137.70     105.95     117.14     153.76     100%       117.12     116.00     178.32     115.11     118.63
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
111.38     100%       110.16     135.48     100.01     109.47     108.91     147.89     104.24     107.04

* RK -mx2 (not -mx3)
^ PPMonstr -o9 -m56 (not -o8 -m56)

[[2]] Speed

The Canterbury Corpus Large Set ( http://corpus.canterbury.ac.nz/ftp/large.zip ) was used for this test, on an AMD-K6-400 machine with 64M RAM and Windows98.

Programs, options         Overall  Average Users' Compress  Extract Compressed
                            score           score     time     time       size
                        sec     %       sec     %      sec      sec      bytes

777 a -m5 -mu32        1354  156%     1171  140%      203      222    3343996
777 a -mg -s           1880  217%     1262  150%      688      139    3793939
7zip a                 1307  151%     1232  147%       83        4    4393623
7zip a -mx             1358  156%     1240  148%      131        4    4401160
acb B                  2540  293%     1818  217%      803      808    3346915
acb b                  2997  346%     2059  246%     1042     1047    3267480
acb u                  3802  439%     2496  298%     1452     1456    3221349
ace32 a                1265  146%     1132  135%      148        7    3998222
ace32 a -d4096         1265  146%     1123  134%      158        7    3962314
ace32 a -d4096 -s-     1265  146%     1123  134%      159        7    3962374
ace32 a -d4096 -m1     1221  141%     1150  137%       80        7    4086782
ace32 a -d4096 -m5     1552  179%     1142  136%      456        7    3923686
arhangel a -2 -mm      1203  139%     1117  133%       96       94    3647060
arhangel a -mt         1173  135%     1069  127%      115      109    3417110
arhangel a -mtf        1177  136%     1071  128%      118      110    3418181
ba -k                  1057  122%      988  118%       78       26    3432541
ba -k -1               1170  135%     1122  134%       54       26    3927264
ba -k -50              1046  120%      954  114%      103       17    3337823
bee a -m1              1297  149%     1143  136%      171      178    3414048
bee a -m2              1371  158%     1177  140%      215      222    3361009
bee a -m3              1615  186%     1303  155%      347      353    3295506
bee a -m1 -d3          1247  144%     1114  133%      148      168    3353767
bee a -m2 -d3          1312  151%     1143  136%      188      210    3289365
bee a -m3 -d3          1534  177%     1268  151%      296      336    3248025
bee a -m3 -s           1846  213%     1430  171%      463      466    3303624
bee a -d3 -s           1363  157%     1176  140%      209      216    3378513
bix a                  1243  143%     1063  127%      201        3    3743319
bix a -mdg             1245  143%     1051  125%      215        4    3690815
bix a -m9              1246  144%     1064  127%      202        4    3743319
bix a -mdg -m9         1249  144%     1052  125%      219        5    3690815
bix a -mdg -s          1274  147%     1054  126%      244        5    3690984
boa -m1                1623  187%     1387  165%      263      281    3886856
boa -a                 1560  180%     1266  151%      327      340    3217347
boa -m15               1588  183%     1277  152%      346      358    3182732
bzip2 -k -1            1201  138%     1159  138%       47       13    4109767
bzip2 -k -5            1089  125%     1046  125%       48       14    3697142
bzip2 -k -9            1070  123%     1023  122%       53       15    3611558
dc e                    948  109%      917  109%       35       19    3218290
dc e -ft                954  110%      921  110%       37       20    3232273
dc e -b16300           1024  118%      872  104%      170       69    2826931
dc e -b16300 -mt5       995  115%      869  103%      141       70    2826931
dc e -b12000            867  100%      836  100%       35       18    2931168
dc e -b12000 -mt5       865  100%      836  100%       33       18    2931168
eri a -m1              1110  128%      982  117%      143       29    3378440
eri a -m2              1108  128%      975  116%      148       30    3346586
eri a -m3              1114  128%      970  116%      160       32    3318853
eri a                  1127  130%      971  116%      175       33    3313568
eri a -m5              1162  134%      975  116%      208       33    3313559
imp98 a -2             1043  120%     1002  119%       46       11    3547964
imp98 a -2 -s4         1040  120%      998  119%       48       11    3535351
imp_d a -2 -s4         1041  120%     1001  119%       45       11    3548156
pkzip -es              1659  191%     1655  197%        5        3    5945608
pkzip -a               1326  153%     1307  156%       22        2    4691477
pkzip -exx             1498  173%     1303  155%      217        2    4605928
ppmd e -o5              953  110%      934  111%       21       22    3276542
ppmd e -o7              967  111%      941  112%       29       32    3260462
ppmd e -o9             1027  118%      990  118%       42       45    3387445
ppmd e -o5 -m56         948  109%      931  111%       20       22    3266132
ppmd e -o6 -m56         927  107%      906  108%       24       26    3159004
ppmd e -o7 -m56         914  105%      890  106%       27       29    3090636
ppmd e -o8 -m56         917  106%      885  105%       36       36    3045769
ppmd e -o9 -m56         956  110%      919  109%       42       42    3142087
ppmonstr e -o5         1025  118%      975  116%       57       59    3276610
ppmonstr e -o7         1038  120%      975  116%       70       75    3214871
ppmonstr e -o9         1106  127%     1022  122%       93       98    3293262
ppmonstr e -o5 -m56    1018  117%      971  116%       53       58    3267452
ppmonstr e -o7 -m56     983  113%      924  110%       65       69    3055431
ppmonstr e -o9 -m56    1048  121%      955  114%      104       96    3051781
rar a                  1226  141%     1134  135%      103        4    4029077
rar a -m1              1247  144%     1205  144%       48        4    4304853
rar a -s -m5           1560  180%     1144  136%      463        4    3937052
rk -mf1                1134  131%     1096  131%       43       29    3826096
rk -mf2                1228  141%     1109  132%      133       81    3652520
rk -mf3                1347  155%     1121  134%      252       83    3645264
rk -mx1                1615  186%     1249  149%      407      352    3083632
rk -mx2                1735  200%     1320  157%      461      418    3080372
rk -mx2 -ft+ -fe+      1737  200%     1321  158%      463      419    3080372
rk -mx3                1768  204%     1336  159%      480      437    3064076
rk -mx3 -ft+ -fe+      1765  204%     1334  159%      479      435    3064076
sbc c                  1058  122%      993  118%       73       24    3459990
sbc c -b9              1052  121%      967  115%       95       26    3352214
sbc c -b19             1103  127%      958  114%      162       43    3233894
sbc c -b19 -e          1033  119%      941  112%      103       26    3257878
szip -v0 -b41          1019  117%      984  117%       39       34    3405120
szip -o8 -b41          1021  118%      974  116%       53       36    3356744
szip -o0 -b41          1055  121%      959  114%      107       24    3326271
ufa a -m5 -mu32        1378  159%     1185  141%      216      234    3343996
ufa a -m5 -mu10        1312  151%     1154  138%      177      195    3387619
ufa a -mg -s           1630  188%     1161  138%      522       28    3889878
uharc a                1381  159%     1183  141%      220       27    4081072
uharc a -m1            1354  156%     1244  148%      122       29    4333271
uharc a -m3            1514  175%     1125  134%      432       26    3801399
ybs_d -y                986  113%      932  111%       61       19    3265494
ybs_d -m2mu             986  113%      932  111%       61       19    3265494
ybs_d -m16mu            988  114%      925  110%       71       19    3236677
ybs_d -m16mu -r         992  114%      930  111%       70       18    3257713
zzip a                 1033  119%      975  116%       65       25    3396007
zzip a -mm             1615  186%     1555  186%       68       29    5468735
zzip a -mm -b20        1436  166%     1364  163%       81       28    4780656
zzip a -mm -mx         1030  119%      971  116%       66       26    3376260

Overall score is calculated by adding compression time, extraction time, and the time it would take to transfer the compressed file over a 28,800bps network: (compressed_size)/3600 , because 28800 bits_per_second is 3600 bytes_per_second.

Average Users' score is calculated by adding (compress_time/10) + extract_time + the time it would take to transfer the compressed file over a 28,800bps network. Compression time is divided by 10 here because more than 90% of people would never compress anything during their life (with compression programs), but they use compressed data almost _every_ time they use computers and/or the Internet. That's why compression time is not so important for them.
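The two score formulas can be checked against the rows of the speed table with a few lines of Python. This is only a minimal sketch: the function and constant names are made up for this illustration and are not part of the original test scripts, and rounding to whole seconds is an assumption based on how the table is printed.

```python
# Sketch of the two scores described in the text (names are illustrative only).

LINK_BYTES_PER_SECOND = 28800 / 8  # 28,800 bits per second = 3600 bytes per second


def transfer_time(compressed_size):
    """Seconds needed to send compressed_size bytes over the 28,800bps link."""
    return compressed_size / LINK_BYTES_PER_SECOND


def overall_score(compress_time, extract_time, compressed_size):
    """Compression time + extraction time + transfer time, in seconds."""
    return compress_time + extract_time + transfer_time(compressed_size)


def average_users_score(compress_time, extract_time, compressed_size):
    """Same as overall_score, but compression time counts only one tenth."""
    return compress_time / 10 + extract_time + transfer_time(compressed_size)


if __name__ == "__main__":
    # Row "dc e -b12000": compress 35 s, extract 18 s, 2931168 bytes.
    # Rounding to whole seconds is assumed here.
    print(round(overall_score(35, 18, 2931168)))        # 867
    print(round(average_users_score(35, 18, 2931168)))  # 836
```

The printed values agree with the "dc e -b12000" row (867 and 836 seconds); other rows can be checked the same way.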

[[3]] Details

Details are no longer included in this main text (738 lines reporting 22796 results on 556 files in 5 sets), but can be found in the FULL version, together with TEXTS.DAT and *.BAT, at
http://geocities.com/SiliconValley/Bay/1995/artest19.zip
or
http://artest1.tripod.com/artest19.zip

[[4]] Comments

Links to download programs:

7-Zip 2.11                :W  http://www.7-zip.com/dl/7zip211.exe  493K
BIX 1.00b7                :W  http://www.7-zip.com/dl/ufa/bix100b7.zip  89K
777 0.04b1                :W  http://www.7-zip.com/dl/ufa/777004b1.zip  72K
UFA 0.04b1                :W  http://www.7-zip.com/dl/ufa/ufa004b1.zip  64K
ArHanGeL 1.40             :a  http://geocities.com/SiliconValley/Lab/6606/arh140.zip  50K
ERI32 4.8fre              :e  http://geocities.com/eri32/eri48fre.zip  91K
Imp 1.1                   :e  http://www.winimp.com/imp110d.zip  266K
Imp-win 1.12              :W  http://www.winimp.com/imp112.exe  122K
PkZip 2.50                :a  ftp://ftp.simtel.net/pub/simtelnet/msdos/arcers/pk250dos.exe  202K
RK 1.03b1                 :e  http://malcolmt.tripod.com/downloads/rk103a1d.exe  478K
RK 1.03b1                 :W  http://malcolmt.tripod.com/downloads/rk103a1w.exe  380K
RAR32 2.71                :e  ftp://ftp.netlab.sk/public/rarsoft/rar/rarx271.exe  257K
WinRAR 2.71               :W  ftp://ftp.netlab.sk/public/rarsoft/rar/wrar271.exe  588K
PPMD var.F, PPmonstr v.F  :W  ftp://ftp.simtel.net/pub/simtelnet/win95/compress/ppmdf.zip  97K
ACB 2.00c                 :e  ftp://ftp.simtel.net/pub/simtelnet/msdos/compress/acb_200c.zip  42K
BOA 0.58b                 :e  ftp://ftp.cdrom.com/.3/sac/pack/boa058.zip  74K
DC 0.98b                  :W  ftp://ftp.cdrom.com/.3/sac/pack/dc124.zip  55K
BA 1.00 beta              :e  ftp://ftp.cdrom.com/.3/sac/pack/ba100b.zip  60K
Bzip2 1.0.1               :W  ftp://sourceware.cygnus.com/pub/bzip2/v100/bzip2-100-x86-win32.exe  68K
SZip 1.12a                :W  http://www.compressconsult.com/szip/szip_112a_win32.zip  71K
UHArc 0.2b                :e  ftp://ftp.cdrom.com/.3/sac/pack/uharc02.zip  101K
ZZip 0.35g                :W  http://www.via.ecp.fr/~damien/zzip/zzip-win32.zip  23K
ACE32 2.0b3               :W  ftp://ftp.forlangs.net/pub/windows/winace/ace20b3.exe  573K
YBS 0.03e                 :e  http://members.nbci.com/vycct/ybs003ed.zip  55K
YBS 0.03e                 :W  http://members.nbci.com/vycct/ybs003ew.zip  43K
SBC 0.305b                :e  http://geocities.com/sbcarchiver/sbc0305b.zip  158K
BEE 0.4.8                 :   mailto:Andrew.Filinsky@p11.f4.n452.z2.fidonet.org

:a - any DOS  - DOS programs, will run under pure DOS or in a DOS box
:e - extender - DOS programs using DOS extenders like DOS/4GW or CWSDPMI
:W - windoze  - Windows95/98/NT/etc programs

If a direct link doesn't work, most probably a newer version of the program has appeared at the same site: visit the web page, or read the whole directory from the ftp server (i.e. try the same URL, but without the filename).

Homepages:

Arhangel           : http://geocities.com/SiliconValley/Lab/6606
Eri32              : http://geocities.com/eri32
            mirror : http://artest1.tripod.com
RK                 : http://malcolmt.tripod.com
Imp,WinImp         : http://www.technelysium.com.au
            mirror : http://www.winimp.com
ACE32              : http://www.winace.com
PkZip              : http://www.pkware.com
RAR,WinRAR         : http://www.rarsoft.com
BZip2              : http://sources.redhat.com/bzip2
SZip               : http://www.compressconsult.com/szip
ZZip               : http://www.via.ecp.fr/~damien/zzip
YBS                : http://members.nbci.com/vycct
SBC                : http://geocities.com/sbcarchiver
Ufa,777, BIX,7-Zip : http://www.7-zip.com

PPMD, PPMonstr, ACB, BA, Bee, BOA, DC, UHArc - no homepage.

What's new:

7 new programs were tested: PPMD var.Gpre Sep29, PPMonstr var.Gpre Oct4, YBS 0.03e (DOS and Win32 versions), ZZip 0.35f, SBC 0.304b, ERI32 4.8fre. Newer versions of ZZip, SBC, ACE, UFA are ready, and will be tested next time.

The latest beta versions of BEE, DC, PPMonstr, UFA are available from their authors by e-mail request:

BEE:      Andrew.Filinsky@p11.f4.n452.z2.fidonet.org
DC:       EdgarBinder@t-online.de
PPMonstr: shkarin@arstel.ru , dmitry.shkarin@mtu-net.ru
UFA:      support@7-zip.com

ACB, UHArc and PKzip are no longer tested on all 556 text files; their results can be found in previous versions:

ACB   - ARTest17
UHArc - ARTest17
PKzip - ARTest17,18

Results of PPMD (an open source version of PPMonstr) are in the full version only, in the TEXTS.DAT file. UFA 0.04b1 performs on text files exactly as 777 0.04b1.

The following results will not be included in the latest versions of ARTest: results of old programs (not updated for more than 3 years, and with no homepage), of programs with low overall score, and of programs that have had known bugs (in compression/decompression functions) for more than half a year.

WARNINGS:

BA 1.00beta can't decompress any file compressed with -mf , and gives no message such as "CRC fails".

DC 0.99.158b failed to decompress 1DFRE10.dc , ANDES10.dc , and BTI0110.dc , saying "Corrupted block" (while the t(est) command writes "Test successful").

RK 1.03b1 was unable to correctly decompress 555 files (all except E.TXT) compressed with "-mx3 -ft-" , reporting ERROR 303: CRC check failed.

ERI32 4.8fre can't compress files larger than (free DPMI memory)/6, i.e. about 10Mb on a PC with 64Mb RAM. The largest 44Mb file was split into 5 chunks, each 9000000 bytes long (the last chunk was 8894190 bytes).

Bugs were found in the tested versions of SBC and ZZip, but they have been fixed in the latest versions, ZZip 0.35g and SBC 0.305b .

No problems were found in any of the other compressors.

The LATEST RELEASE, and all previous versions of these tests, can be found at
http://geocities.com/SiliconValley/Bay/1995/
and
http://artest1.tripod.com/

The FINAL PART

The section "[[5]] PLEASE read THIS before replying to this article" was removed from this text, but can be easily found at
http://geocities.com/SiliconValley/Bay/1995/artest10.html
http://artest1.tripod.com/artest10.html

Send your suggestions and comments to artest@hotmail.ru

With best kind regards,
RAO Inc.