The Data Compression News Blog

All about the most recent compression techniques, algorithms, patents, products, tools and events.

Subscribe

Posts: RSS Feed
Comments: RSS Feed

Sponsored Links

Quantized Indexing: Beyond Arithmetic Coding

Posted by Sachin Garg on 22nd November 2005 | Permanent Link

Ratko V. Tomic, Chief scientist & CTO, 1stWorks Corp. has been talking about a new algorithm for entropy coding. The promise held by this algorithm is that of being much faster and having better compression efficiency too.

It is coding based on enumerative combinatorics (walks on Pascal triangle, counting of integer lattice paths).

The new algorithm significantly improves on the previously known method “enumerative coding” (which is the regular un/ranking procedure in combinatorics), increasing its speed and reducing memory size by factor O(n), where n is number of input symbols). It is the algorithm Rissanen was looking for when he found a partial solution, arithmetic coding (in 1976).

I dropped him a mail with some questions, his answers are given below.

He feels that currently popular arithmetic/range coding scheme will eventually get replaced by quantized indexing algorithm. He said that “With the published well performing reference implementation, this process should accelerate, but I can’t predict when will QI replace AC altogether. Eventually it will”.

But, this algorithm has a “patent pending” tag attached. So even if you find it technically suitable to your requirements, consider this fact too. This also means that everything stated has a commercial motive behind it, which is not a bad thing, but just means that verify results before you believe anything.

Here are the questions, and his answers (copied here with permission):

[comp.compression excerpt]
The coding aspects in both cases are message enumerations — with EC in the exact lists of messages (performed using the virtual dictionary of combinations), with AC in the cummulative probablity space (which only asymptotically for large n approaches the exact EC coding, while being uniformly suboptimal for any finite n).

What do you mean by “virtual dictionary of combinations”?

This refers to the EC modeling patern (ECMP) described in ref
[23] pp 26-39. Virtual dictionary is on p 29.

[23] Quantized indexing: Background information
1stWorks TR05-0625, 1-39, 2005
http://www.1stworks.com/ref/TR/tr05-0625a.pdf

[24] Fast, optimal entropy coder
1stWorks TR04-0815, 1-52, 2004
http://www.1stworks.com/ref/TR/tr04-0815b.pdf

Virtual dictionary in enum coding is like dictionary of functions (or wave patterns) sin(kx), cos(kx) in transforms coding. These patterns form a virtual list (in abstract mathematical space, not in a literal RAM list) of waveforms that all other functions being encoded are decomposed into. With EC/QI, this virtual dictionary contains combinations of various symbol sequences, with constraints relevant for given domain, that all input sequences get decomposed into. You can check more on this in [23].

What is the input data for your coder? AC uses a list of cummulative frequency counts of the symbol.

The QI coder takes raw input data, nothing special is needed. It works internally with the plain symbol counts, not the cumulative frequency counts as AC. For multialphabet coder, something equivalent to AC cumulative counts comes out of the QI algorithm, but for a different reason (see [24] on on QI multialphabet scheme).

How much more computation is needed for preparing this input, compared to preparing frequency counts?

Only the plain counts are computed. In case of larger input data, when a coder finds that the array of counts contains sufficient redundancy (e.g. for very low entropy arrays, you get mostly 0’s) and the list is large enough the second QI pass is done on the list itself by a multialphabet QI coder, where counts are treated as symbols satisfying a priori binomial distribution. (I had tested also a more accurate hypergeometric distribution, but this doesn’t give a sufficient gain to be worth the trouble, exept for extremely skewed distributions, e.g very large sparse srrays with few dense small clusters.)

I suppose, one may need even a further similar passes if the data is large enough, but for the stuff tested (up to about 128 Mbit) the single second pass was sufficient.

When squeezeing the last bit isn’t the top priority (as it is in the benchmarks), a simple Huffman code constructed to binomial distribution is more than good enough to encode the counts quickly and simply. QI has a file based tables of such codes, which are very regular, thus the tables are extremely compact, e.g. taking only 67k to store all of the Huffman codes for every combination of n=1K (coding block size, up to 1K), k (the count of 1’s per block, 0..n) and (the average count of 1’s, 0..n), the total equivalent to 2^30 = 1 billion distinct Huffman codes.

[comp.compression excerpt]
EC has a much cleaner separation between the modeling parameters and the coding computations — the modeling engine of EC defines the enumerative classes (the subsets of “equiprobable messages”), while coder computations deal only with the universal combinatorial properties of sequences (belonging to a given class). The result is a much faster operation, since the ‘universal properties of sequences’ are independent of the particular source parameters, thus they can be precomputed and optimized away, outside of the coding loop, as it were. If AC were to code via precomputed values, it would need a separate table for each set of source robabilities, which is impractical even for a static binary source.

Can they be “precomputed and optimized away” in an adaptive model also?

Yes, but as indicated, with much larger tables (see [23], 20-22).

What kind of model does EC/QI needs? AC uses PPM models which are just lists of frequency counts of symbols in some context.

PPM is modeling approach tuned to AC model interface, where modeling engine has to translate all it knows about the data into the probability of the next single symbol. See [23] 30-31 for comparison of ECMP and ACMP. The two approaches are entirely different and belong to different schools of thought in information theory, EC to Kolmogorov, AC to Shannon (see also DCC preprint p. 9, [N7]).

QI can code to PPM in two ways (see [N7]). The faster, more native way is to split the output into separate streams, based on probability classes produced by the engine. E.g. if engine gives p(a) values with 1% uncertainty (which is highly optimistic), then you have up to 100 classes. These classes can be phased in, with fewer classes initially, since the natural uncertaintly of probabilities is sqrt(n), hence most classes are initially melded into one, and they split into subclasses as the n grows and the relative uncertainty of p’s, sqrt(n)/n drops. This is similar phasing in & splitting scheme to Context Tree Weighing or the older Context Tree scheme of Rissanen.

QI can also code literally the same way as AC, by doing lattice jumps to appropriate new probability. To avoid loss of the index content (since mantissa is different at different lattice points), the mantissa needs to be rescaled to the new positions, which involves multiplication, squandering all of the QI speed edge, with only slight accuracy gain at best vs AC doing the same thing.

Neither way is native QI modeling, both trying to interface to the AC modeling engine at the output end. The native EC/QI modeling is described in the [23]. It is basically a segementation/partition scheme which produces genuine enumerative classes, as input to QI.

BWT is a natural modeling engine for QI, since it produces the right kind of segmentation (in the output column of BWT, cf. [23] p38-39). And QI is a natural entropy coder fo BWT, instead of ad hoc methods used presently. This is still one of sub-projects to be done.


Although you claim EC/QI to be faster than AC, is their any know difference in computational complexity of how model is managed for AC vs how it is managed for EC/QI ? Is any one of them known to be better/worse than the other?

Check [23] 30-32 on comparison between the two modeling approaches. The QI modeling has great speed advantage due to better division of labor than AC modeling. Also, the universal (descriptive) way is inherently much more robust than the adaptive (predictive) way in the situations of non-idalized real life inputs, which have great uncertainties, such as frame difference data in video codecs.

The EC/QI modeling engine is BWT for real life data. For idealized Markov sources of order N, it is the output splitting and context tree growing method. Both modeling scheme +QI are significantly faster than PPM+AC.

Can it be used with higher-order adaptive models?

The BWT is equivalent to unlimited order PPM (which were constructed only after BWT came out, as a response to BWT, from the finite, fixed order PPM authors).


We know that huffman coding is computationally much more expensive when used with an adaptive or higher-order scheme. Does your coder suffers from a similar limitation?

No. QI/EC native modeling is not adaptive/predictive but universal/descriptive — instead of encoder pretending to predict what is coming (even though it has it in front of itself), it examins it and seeks to produce minimum description sufficient to reconstruct the data on the other end. That is Kolmogorv or MDL approach to modeling.

The ‘adaptive’/PPM style of modeling is a historical abberation, an artificial and harmful distortion of the whole modeling engine to the pecuiliarities of the AC coding requirements — the modeling engine is trying to funnel all it knows or can know about the data through the bottleneck of the ‘probability of next single symbol’. The strange problems of guessworking the Escape probahilities or zero frequencies are unique problems to this approach, they are artifacts of the mentioned distortions.

The key advantages of universal/descriptive modeling are:

a) Robustness — there is no surprises for the coder since it sees all that needs to be coded, it doesn’t conjecture what is coming, it just reads it. This is illustrated in the row Vary in the preprint, where QI suffers very little loss of compresion due to the sudden drastic change of frequency (at transition from -1 to 0 in the array) — all the harm was concentrated to the extra few bits in the length of Huffman codes used to transmit the counts, while the bulk of output, the enumerative index describing the sequence having the given counts, remained perfectly optimal. Even this harm would have been reduced further had the QI used its own multi-alpha coder to encode the counts the descriptive way, in which case the harm would shift to the next lower order quantities, counts of the counts (the benchmarking coder was quite primitive so it didn’t have multipass capability).

In contrast adaptive/predictive AC scheme failed catastrophically — predicting wrong, then operating based on it is very costly.

Hence even though both coders had faced data with much higher order regularity (it is really a nonprobabilistic grammar source, and not a finite order Markov source) than their order 0 modeling can understand, the descriptive way suffers very little. That was the whole point of Davisson’s minimax universal codes (for which the coder was exact EC): these are descriptive type codes which code always, even with no a priori knowledge of the source parameters, to within the O(log(n)/n) bits/symbol from the optimum that a specialized coder, fully aware of and tuned to the source parameters, would produce. (With knowledge of the source, they code fully optimally.)

b) Speed — the descriptive type coding allows for much better and cleaner division of labor between modeling engine and coder, since the modeler provides the boundary & initial condition for the coder (the selction of enumerative class), then lets the coder do the rest of work, without getting in the way symbol by symbol, as it does with AC adaptive/predictive scheme. See [23] 30-32 on this point.

Can it be used as a drop in replacement to AC in current compressors, thus getting performance gains? If not, what changes will be needed to replace the AC?

Until a reference implementation is available, it can’t be used at all :).

The release copy can be used as a drop in replacement for entropy coder, e.g. for JPEG or BWT or gzip. For anything that works via AC style modeling, and if no change in the modeler is allowed, it would require QI coder with the capability of output splitting, mentioned above. That feature is not planned in the first batch of coder sources to be released (next 2-3 months).

The BWT based general native to QI/EC modeling engine, that is planned for the second batch of releases (within 6 months), should be able to handle most other situations, probably better even on its generic autopilot than the hand tuned PPM+AC they may have had before. On our end, we’re also working on a video codec using the full power of QI, with its native modeling engine, designed & hand tuned for the differential video data coding, which we believe will be significantly better (our optimistic estimates at the moment are: a factor 2 smaller output and 5+ times faster coding/decoding) than the existent AC style modeling video codecs. That project is the highest priority among the QI related tasks.

So, this is a long term development and a research process that is unfolding. With the published well performing reference implementation, this process should accelerate, but I can’t predict when will QI replace AC altogether. Eventually it will — the edge at the fundamental algorithmic level is completely on the QI side (AC being merely an interim approximation of QI, which not only codes more poorly, but it also runs much more slowly) and with some practical sides of this edge, as illustrated by the little performance comparison chart, already showing through.

(Above answers have been mentioned as stated by Ratko V. Tomic)
END

8 Responses to “Quantized Indexing: Beyond Arithmetic Coding”

  1. c10n.info Says:

    December 28th, 2005
    Posted by Sachin Garg

    Quantized Indexing Source Code Available
    The Data Compression News Blog - c10n.info

  2. Quantized Indexing Update Says:

    David A. Scott (known for his work on Bijective compressors) has proposed a simple challenge for QI, pitting it against Arithmetic Coding. In the nut shell, its nothing but…

  3. Internet Banking Says:

    Glad to see that this site works well on my Blackberry Bold, everything I want to do is functional. Thanks for keeping it up to date with the latest.

  4. Earl Colby Pottinger Says:

    If I am reading this right, QI does not offer better compression than AC, just faster compression with fewer resources needed.

    That sounds like a win-win situation to me, but if it is a restricted patent it becomes a no-go option.

    I do very little compression coding, but what I do do is always release for the public to use whatever way they wish. Most patents have licences that prevent you from doing such.

  5. David Scott Says:

    I thought I posted code for the so called challenge a long time ago. I felt that in this case the bijective arithmetic compressor compressed better (closer to entropy) then the QI or what ever it was called. The simple contest if I remember right was just for compressing files generated by an I. I. D. source of 3 symbols each having probability of 1/3 I don’t think
    he actually ever coded anything. But I found my last post on it. I compressed better than what I think he claimed was optimal.
    Here is part of my last post of jan 26 2006

    start …

    QUOTE ON

    Hence the total decodable output for N=10^6 symbols, A=3 is:

    DECODABLE OUTPUT = 1584993 bits = 198124.125 bytes

    That size is fixed, the same for all inputs. You can check the whole
    code-decode-compare in the function radix_iter() in Tests.c file.

    QUOTE OFF

    Maybe my code that is in

    http://bijective.dogma.net/nitelite.zip

    is way off base and doesn’t seem to compress to
    1584992 bits or 198124 bytes. That is if I put it
    to a buffer and included in a poor way the length
    field. Actully the compressed file length was
    198121 bytes. Maybe this is all a dream of mine.
    If so I GIVE UP YOU WIN. I HAVE WASTED ENOUGH
    TIME TRYING TO ARGUE WITH YOU. DO YOU UNDERSTAND
    I GIVE YOU WIN I GOT TIRED BEFORE YOU DO THEREFOR
    YOU WIN. I WILL NOT POST IN THIS THREAD ANYMORE
    SO GO AHEAD BE THE LAST ONE TO POST.

    David A. Scott

    …end post

    I gave up arguing with the guy it got no where
    but my compressor for this special example compressed to a few bytes smaller then what he claimed his exact compress would. So why did you even bring up this again its several years ago? Did he ever write a program for the challange? Did he correct his method so that it would get a better answer than the simple bijective arthimtec or what?

  6. Earl Colby Pottinger Says:

    I read the thread pointed at the start of the article.

    I was not impressed with the OP who spent all his time quoting research papers, but seem unaware on any modern AC that was in real world use.

    Worse were his claims that he could always compress better AC or Huffman, even with my limited knowledge of compression I am aware that if model is a perfect match to the data, that you can’t compress better unless you increase the number of orders.

    But the very worse is I followed the link on his website. He posted in 2005 and it is now 2010 and he still does not have a demo program to prove his ideas.

    In the long run - working code is always king.

  7. Earl Colby Pottinger Says:

    Sorry, just read your last paragraph. Someone else posted here recently. So this was the first time I noticed this discussion. I started to read the article/Usenet thread and did not notice at first that all the posts were five(5) years ago

  8. Beau Penrod Says:

    Interesting blog dude Thanks

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>