The Data Compression News Blog

Your daily update on the most recent compression techniques, algorithms, patents, products, tools and events.

Recent Posts

  • Bijective BWT (2 Comments)

    David Scott has written a bijective BWT transform, which brings all the advantages of bijectiveness to BWT-based compressors. Among other things, this makes BWT more suitable for compression before encryption and also gives (slightly) better compression.

  • Asymmetric Binary System (107 Comments)

    Jarek Duda’s “Asymmetric Binary System” promises to be an alternative to arithmetic coding, having all of its advantages while being much simpler. Matt has coded a PAQ-based compressor using ABS for back-end encoding. Update: Andrew Polar has written an alternative implementation of ABS.

  • Precomp: More Compression for your Compressed Files

    So many of today’s files are already compressed (using old, outdated algorithms) that newer algorithms don’t even get a chance to touch them. Christian Schneider’s Precomp comes to the rescue by undoing the harm.

  • On2 Technologies is Hiring

    There aren’t too many companies working on cutting edge codecs, and of those few this one is hiring. Best of luck.

  • China’s AVS Specifications Available (2 Comments)

    It’s old news that China has developed its own Advanced Video Standard to avoid high licensing fees. An English translation of the standard is now available, along with the IPR policy. Finally, something technical that you can get your hands on to feed your appetite.

XML-WRT 3.0

Posted by Sachin Garg on 29th September 2006

Przemyslaw Skibinski recently released version 3.0 of XML-WRT. This version adds internal PPMVC and FastPAQ8 compression.

XML-WRT is a high-performance XML compressor (it actually works with all textual files). It transforms XML into a more compressible form and uses zlib (default), LZMA, PPMVC, or FastPAQ8 as the back-end compressor. The idea is based on the well-known XML compressor XMill. In addition, XML-WRT creates a semi-dynamic dictionary and replaces frequently used words with shorter codes. There are additional techniques to improve the compression ratio:

* word alphabet can consist of start tags (like "<tag>"), urls, e-mails
* special model for numbers encoding
* input XML file is split into containers
* there are special containers for dates, time, pages and fractional numbers
* end tags ("</tag>") are replaced with a single char
* end tags + EOL symbols can also be replaced with a single char
* spaceless words model
* very effective methods for white-space preserving
* quotes modeling ('="' and '">' replaced with a single char)
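
To make the dictionary idea concrete, here is a minimal Python sketch of a two-pass word replacement. It is not XML-WRT's actual code or file format: the function names, the `min_count` and `max_words` parameters, and the use of private-use Unicode code points as "shorter codes" are all assumptions made for illustration, and the input is assumed not to contain those code points already.

```python
import re
from collections import Counter

# Hypothetical definition of a "word": a run of ASCII letters.
WORD = re.compile(r"[A-Za-z]+")
# Hypothetical code space: private-use code points, assumed absent from the input.
CODE_BASE = 0xE000


def build_dictionary(text, min_count=2, max_words=128):
    """First pass: collect the most frequent words worth replacing."""
    counts = Counter(WORD.findall(text))
    return [w for w, c in counts.most_common(max_words)
            if c >= min_count and len(w) > 1]


def encode(text, dictionary):
    """Second pass: replace every dictionary word with a one-character code."""
    codes = {w: chr(CODE_BASE + i) for i, w in enumerate(dictionary)}
    return WORD.sub(lambda m: codes.get(m.group(0), m.group(0)), text)


def decode(encoded, dictionary):
    """Inverse transform: expand each code character back into its word."""
    words = {chr(CODE_BASE + i): w for i, w in enumerate(dictionary)}
    return "".join(words.get(ch, ch) for ch in encoded)


if __name__ == "__main__":
    sample = "<item><name>apple</name></item><item><name>pear</name></item>"
    dictionary = build_dictionary(sample)
    packed = encode(sample, dictionary)
    print(dictionary, len(sample), len(packed), decode(packed, dictionary) == sample)
```

The dictionary has to be stored alongside the transformed text so the decoder can rebuild the original, and the transformed text would then be handed to a back-end compressor such as zlib.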

Matt Mahoney compared results from version 2.0 to 3.0.

On enwik8 and enwik9 from the Large Text Compression Benchmark (compressed sizes in bytes):

Version and options                       enwik8        enwik9
xml-wrt 2.0 -l6 -b255 -m255 -s -f8        23,199,202    196,914,328
xml-wrt 3.0 -l11 -b255 -m255 -f24         19,663,305    165,274,422

On enwik8, as a preprocessor to ppmonstr:

ppmonstr J -m1700 -o16 = 19,055,092
xml-wrt 2.0 -l0 -w -s -c -b255 -m100 -e2300 | ppmonstr J -m1650 -o64 = 18,625,624
xml-wrt 3.0 -l0 -b255 -m255 -3 -s -e7000 | ppmonstr J -m1650 -o64 = 18,494,374

On enwik9, as a preprocessor to ppmonstr, the result goes from 150,651,873 (version 2.0) to 150,004,636 (version 3.0).
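
For a quick sense of scale, the figures above can be turned into relative size reductions. The short Python sketch below only does arithmetic on the numbers quoted in this post; the labels and the `reduction_percent` helper are made up for illustration, and it assumes the two enwik9 ppmonstr figures are the version 2.0 and version 3.0 results respectively.

```python
# (size with version 2.0, size with version 3.0), in bytes, as quoted in the post.
results = {
    "enwik8, xml-wrt alone": (23_199_202, 19_663_305),
    "enwik9, xml-wrt alone": (196_914_328, 165_274_422),
    "enwik8, xml-wrt | ppmonstr": (18_625_624, 18_494_374),
    # Assumption: these two are the 2.0 and 3.0 results, in that order.
    "enwik9, xml-wrt | ppmonstr": (150_651_873, 150_004_636),
}


def reduction_percent(old_size, new_size):
    """Relative size reduction going from old_size to new_size, in percent."""
    return 100.0 * (old_size - new_size) / old_size


if __name__ == "__main__":
    for label, (old_size, new_size) in results.items():
        print(f"{label}: {reduction_percent(old_size, new_size):.2f}% smaller")
```

Used standalone, version 3.0 comes out roughly 15 to 16 percent smaller than 2.0 on both files, while as a preprocessor to ppmonstr the gain is under one percent.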
