The Calgary Corpus Compression Challenge Update
Posted by Sachin Garg on 5th August 2006
Alexander Ratushnyak has updated his December 2005 entry (593620 bytes) to the Calgary corpus compression challenge. The new record is 589862 bytes.
Matt Mahoney provides a summary of the PAQAR-based compressor in a comp.compression post:
I looked at the decompressor. It is based on PAQAR with a tiny dictionary like the previous submission last December. But it appears there are two main differences. First, it uses bitwise contexts to model pic (in PicModel), as in PAQ7. Second, it adds an indirect context model (included in SparssModel), as in PAQ8F.
In other news, on May 21, 2006, the Calgary corpus compression challenge completed ten years since its inception.
August 6th, 2006 at 3:30 am
Does this 0.5% improvement matter to anyone?
You researchers must be excited about this, but all this effort is useless overkill.
And PAQ is sooooooo slow…
August 6th, 2006 at 7:09 am
Tom, you are correct that it doesn’t matter to normal users “yet”, but almost all the technology that normal users use today (or will use in the future) is the result of such state-of-the-art work by researchers.
August 7th, 2006 at 5:15 am
If you look at the 10-year history of the challenge, most entries were small gains, but over time they add up to a significant improvement.
Size Date Name
759881 09/1997 Malcolm Taylor
692154 08/2001 Maxim Smirnov
680558 09/2001 Maxim Smirnov
653720 11/2002 Serge Voskoboynikov
645667 01/2004 Matt Mahoney
637116 04/2004 Alexander Ratushnyak
608980 12/2004 Alexander Ratushnyak
603416 04/2005 Przemysław Skibiński
596314 10/2005 Alexander Ratushnyak
593620 12/2005 Alexander Ratushnyak
589863 05/2006 Alexander Ratushnyak
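To put numbers on "small gains that add up", here is a minimal Python sketch that computes the percentage gain of each entry over the previous one, and the total gain from the first entry to the last. The sizes are copied from the table above; the names `records`, `percent_gain`, `step_gains` and `total_gain` are made up for this illustration.

```python
# Record sizes in bytes, copied from the table above (oldest first).
records = [
    ("09/1997", 759881),
    ("08/2001", 692154),
    ("09/2001", 680558),
    ("11/2002", 653720),
    ("01/2004", 645667),
    ("04/2004", 637116),
    ("12/2004", 608980),
    ("04/2005", 603416),
    ("10/2005", 596314),
    ("12/2005", 593620),
    ("05/2006", 589863),
]


def percent_gain(old_size, new_size):
    """Size reduction from old_size to new_size, as a percentage of old_size."""
    return 100.0 * (old_size - new_size) / old_size


def step_gains(entries):
    """Gain of each entry over the one before it, as (date, percent) pairs."""
    return [
        (date, percent_gain(prev_size, size))
        for (_, prev_size), (date, size) in zip(entries, entries[1:])
    ]


def total_gain(entries):
    """Gain of the last entry over the first one, in percent."""
    return percent_gain(entries[0][1], entries[-1][1])


for date, gain in step_gains(records):
    print(f"{date}: {gain:.2f}% smaller than the previous record")
print(f"Total since {records[0][0]}: {total_gain(records):.2f}%")
```

With these figures the latest step comes out at well under 1%, while the total reduction since 1997 is a little over 22%.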
August 7th, 2006 at 5:27 am
I wish there was a record of how the speed and memory requirements changed over time.
August 11th, 2006 at 11:56 am
Probably quadratically inversely proportional. :-)