Compressed Index Web Site
Posted by Mark Nelson on 3rd January 2006 | Permanent Link
Paolo Ferragina of the University of Pisa and Gonzalo Navarro of the University of Chile have a web site dedicated to the exploration of compressed indices. Their mission statement explains it better than I can:
The new millennium has seen the born of a new class of full-text indexes which are structurally similar to Suffix Trees and Suffix Arrays, in that they support the powerful substring search operation, but are succinct in space, in that it is close to the empirical entropy of the indexed data. They are therefore called compressed Suffix Trees and compressed Suffix Arrays, or in general compressed indexes.
In general, old-school compression guys like me are more or less resigned to the notion that once you compress a block of data, you need to decompress the whole thing in order to get back even a little piece of it. However, Paolo and Gonzalo have posted links to quite a few papers on full text compressed indices, which expound the notion that you can pick and choose exactly what you want to decompress. The site has collections of texts, links to people and papers, and a proposed API for testing work in the future.
January 5th, 2006 at 6:37 pm
Definitely clever, definitely room for improvement. It would be nice if they would put up some free explanations of the algorithms on that page, with lots of nice pictures…static LZW with a special symbol to mark block beginnings can’t be optimal on BWTs.