The Data Compression News Blog

All about the most recent compression techniques, algorithms, patents, products, tools and events.

Subscribe

Posts: RSS Feed
Comments: RSS Feed

Bijective BWT

Posted by Sachin Garg on 15th January 2008 | Permanent Link

David Scott has written a bijective BWT transform, which brings all the advantages of bijectiveness to BWT based compressors. Among other things, making BWT more suitable for compression-before-encryption and also give (slightly) better compression.

For those who are among the majority who don’t understand bijectiveness and its advantages, I will start with a brief introduction.

Bijectiveness, in data compression context, means that it should be possible to feed any random file to the decompressor and it should generate a valid ‘uncompressed’ file. Decompressors of non-bijective compressors usually crash (or get stuck in infinite loops) if they are made to decompress any data which was not created by their compressor. The bijective property is particularly important for compression done before encryption.

And as bijective compression result in a more complete one-to-one mapping, they should theoretically give better compression ratios when compared to their no-bijective counterparts. Practically however this gain is very small and the added runtime and code complexity makes it debatable if its ‘worth the effort’. But anyway, that doesn’t means that the gain doesn’t exists.

OK, enough about bijectiveness, lets get to David’s ‘Scottified’ BWT.

With BWT, we need to store the last column of rotated and sorted strings, along with a index of original string’s location. With normal BWT, its not possible to reverse the transform without knowing the index. But with David’s BWT, we don’t need to store that extra index. Without the need of index, any stream can be considered a BWTed string and its reverse transform can be computed.

It might sound a little confusing how the transform can still be reversible without that index. Its got something to do with the cycles in BWTed strings which are followed during reverse BWT transform but you will need to go through all the gory details at the pages linked below to understand that.

You can get the code and a brief description of the algorithm at David’s website. You can find a better description of the algorithm in the comp.compression discussion on this, look for posts by Klaus Stengel.