The Data Compression News Blog

Your daily update on the most recent compression techniques, algorithms, patents, products, tools and events.

Subscribe

Posts: RSS Feed
Comments: RSS Feed

Sponsored Links

Recent Posts

  • Bijective BWT (2 Comments)

    David Scott has written a bijective BWT transform, which brings all the advantages of bijectiveness to BWT based compressors. Among other things, making BWT more suitable for compression-before-encryption and also give (slightly) better compression.

  • Asymmetric Binary System (107 Comments)

    Jarek Duda’s “Asymmetric Binary System” promises to be an alternate to arithmetic coding, having all the advantages, but being much simpler. Matt has coded a PAQ based compressor using ABS for back-end encoding. Update: Andrew Polar has written an alternate implementation of ABS.

  • Precomp: More Compression for your Compressed Files

    So many of today’s files are already compressed (using old, outdated algorithms) that newer algorithms don’t even get a chance to touch them. Christian Schneider’s Precomp comes to rescue by undoing the harm.

  • On2 Technologies is Hiring

    There aren’t too many companies working on cutting edge codecs, and of those few this one is hiring. Best of luck.

  • China’s AVS Specifications Available (2 Comments)

    Its old news that China has developed their own Advanced Video Standard to avoid high licensing fees. English translation of the standard is now available, along with the IPR policy. Finally something technical that you can get your hands on to feed your appetite.

Asymmetric Binary System

Posted by Sachin Garg on 17th December 2007 | Permanent Link

Update 16th Jan, 2008: Andrew Polar has written an alternate implementation of ABS. Check the comments below for details.

In an unpublished paper, Jarek Duda from Jagiellonian University in Poland has written about what he calls the “Asymmetric Binary System”.

It promises to be an alternate to arithmetic coding, having all the advantages, but being much simpler. Whether it holds up to that promise is yet to be seen but there have been a few interesting developments in last few weeks.

After a lot of hiccups the process, Matt Mahoney has finally managed to implement the asymmetric coder in an order 0 compressor, fpaqa, released under GPL. It is designed as a drop in replacement for arithmetic coders in other programs like PAQ, LPAQ, BBB, and SR2.

There might be a few confusions when you go through the paper, so here are some details that came up in the comp.compression discussion on this. I am not detailing all the maths, refer the the paper for that.

It might seem that algorithm is O(n^2) because as x grows larger, so does the complexity of the arithmetic. But this is not true because x can be normalized by outputting the low bits to keep it in a fixed range (e.g. under 2^32) with insignificant compression loss.

It has only one state variable instead of 2 (low and high range), which means a lookup table can be used for both coding and decoding to fairly high precision.

Unlike arithmetic coding, the decoder moves x through the sequence of states in the opposite order as the encoder. This means that the bits of d are decompressed in the reverse order that they were compressed. This almost rules out all the usual predictive models like PPM, PAQ, LZMA, etc. because the model sees different context during encoding and decoding.

You have to ensure decoding/encoding to be reverses of each other - if one first encode/decode than put/get lowest bits, than the second has to do it in the reverse order - get/put lowest bits such that after decoding/encoding we will stay in the selected range. To do it unambiguously, what we need to have the reverse, we should ensure constant bit-length of x - select the range [2^R,2^{R+1}-1].

To solve the asymmetry problem in fpaqa, what Matt has done is that the compressor models the input in the forward direction and saves q and d in a stack of size B=500K. When the stack is full or at end of input the saved q and d are encoded to a second stack of packed bits, and that stack is written to the output file in reverse order along with the first stack size and final state x. The decompressor does everything in the forward direction, periodically reading x to reset the state. The compressor does all the data reversal, so it is about 15% slower than the decompressor. B is large enough that the extra headers are negligible (5 byte overhead per block).

Overall, it looks a promising alternate to arithmetic coding but there aren’t any compelling reasons yet which will make you want to replace it. But as with all fresh ideas, it might get there with time. And, did I forgot to tell you that its free of any patents?

107 Responses to “Asymmetric Binary System”

  1. Ognyan Kulev Says:

    What about patents? Arithmetic coding has some. Is there such problem (for free software) with ABS?

  2. Sachin Garg Says:

    Jarek mentioned that there are no patents to it.

  3. Jarek Says:

    Jarek only observed, that if it would be known … it would be known :)
    I don’t know anything about patent law - I believe that it’s so simple that optimal algorithms should be created quickly, and it would be impossible to patent anything important… (?)

  4. Andrew Polar Says:

    My quick test of Matts fpaqa showed that compress/decompress time is identical to FastAC and my implementation of AE from ezcodesample.com (the difference is insignificant, about 15% for all 3 programs) but the compressed size is significantly smaller than entropy estimated size. I’ve done round trip for test.

    Any explanation? Is that expected result or something is wrong with my test?

    Regarding the patents: In USA the inventor can apply for a patent within a year since first publication. In order to do that inventor should not be a citizen of USA.

  5. Andrew Polar Says:

    I found what was wrong with my test, now things are clear. The compressed size is the same the compression time is the same. Are there any other reasons of choosing new technique, may be theoretical possiblity of future improvement?
    Thanks.

  6. Jarek Says:

    Arithmetic coding is evolving for 30 years, ABS a few weeks … fpaqb decompress a bit faster and codes practically optimally. And instead of arithmetic, it should stay patent free.
    I’ve just made general concept for versions which for fixed probability distribution decodes/encodes as fast as Huffman, with option that the output is encrypted(looks very safe)
    http://forums.devshed.com/security-and-cryptography-17/cryptosystem-based-on-asymmetric-numeral-systems-497071.html

  7. Andrew Polar Says:

    Thanks for quick answer but you did not make it clear. You say fpaqb is optimal, does that mean that you don’t expect further improvement? If no further improvement than no reason to switch.
    Arithmetic encoding is now patent free or soon will be. They all expired. I implemented AE (or some people call it range encoder) as it was published in 1979. I’m not sued yet for infringing a patent. And my code could be written in 1979 without evolution if author made his solution popular. My code is not very special, it has same peformace as other successful implementations such as FastAC and possibly other.

    I would like to ask again do you expect further significant improvement or not?
    Thanks.

  8. Jarek Says:

    Optimal in the sense that we get theoretical compression (-\sum_i p_i lg(p_i)), arithmetic is worse about 0.3%.
    And decompress in fpakb doesn’t uses memory (lookup tables).
    In theory - ABS should be faster - it uses one state instead of 2 (low and high range), fpaqa has now a few days.
    Another think is that in the version from the link above (for fixed probability distribution) we block a few bits - in one table lookup we decode whole symbol as in Huffman - it’s a few times faster!

  9. Jarek Says:

    I’m sorry for this 0.3%, I’ve only looked on Matt’s tests…
    Of course - it depends on the data and implementation - optimal for arithmetic shouldn’t have lookup tables - it would be much slower.

  10. Andrew Polar Says:

    After reading of the article and looking at the Matt’s code I can say a few things:

    1. It is not necessary to calculate LOW and HIGH in arithmetic encoding, most encoders do that but in my encoder only LOW is calculated, so in this way the asymmetric binary coder has no advantage.
    2. It is big limitation when decoding should be performed backwards.
    3. Known programmatic implementation does not show significant advantage in terms of time or compression ratio.
    4. Besides AE and range encoding there are other methods of entropy encoding that don’t provide significant advantage in general but may be preferred for some particular data, Golomb-Rice coding, for example. The suggested asymmetric binary coding may be considered as another alternative of that kind.
    5. Practically all technical articles about entropy encoding have one big problem. They tell you one story, when software implementation is completely different. The article and first implementation of ABC maintain that bad tradition established 30 years ago. The article of Mr. Jarek Duda don’t provide proper explanation of the algorithm and implementation of Matt Mahoney is so far from the article that nobody would figure out that they are at least remotely talking about same things without reference to each other. As it can be seen by google search there was long technical discussion between Matt and Jarek before the programmatic implementation. Isn’t that possible to explain things more clear? I think the author of the method should be kind of interested in that and not waiting when others explain things in a proper way.

  11. Jarek Says:

    Patient please :)
    This method is just rapidly evolving right now!
    In this moment I can tell You why it’s better :)
    The general version (from the link above) works as fast as Huffman: converts a few bits in one table check - it’s a few times faster!
    The limitation that the probability distribution is fixed can be weakened by using a few of them:
    For example we can use for each symbol(or a few previous) separate probability distribution of the following one - in this way we DOUBLE
    (or more) USED CORRELATION LENGTH (by cost of memory).

    It’s as precise as AC, as fast as Huffman!
    And we optionally can encrypt it well in the same time!
    The implementation is only a matter time now.
    In Matt’s implementation of the previous version backward is encoding, which slows it about 15%.

  12. Andrew Polar Says:

    What about computational stability and large alphabets. My encoder can handle data with alphabets from 2 up to 20,000 without any adjustment and makes compressed result matching entropy limit for all. Many encoders have issues with large alphabets (FastAC for example), some of them need adjustment where you have to change code (Witten’s coder), others have limitations that make processing large alphabets impossible. I tested Matt’s code on the nice data sample: alphabet size 256, distribution normal. What if I passed alphabet 10,000 with long seqence of zeros somewhere in the middle of data.
    To run as fast as Huffman is not a big deal, most range encoders are near that. Encryption may be interested for only few users. What I see so far is new but not revolutionary and what is completely wrong in preamble on the top is statement that it is much simpler than AE. Absolutely not, it is way more complicated. Give your article to read to random person with corresponded background and ask questions after that, you’ll see it right away that people don’t understand the concept easy. Not only a concept but implementation also shows the same complexity. Most coders are few lines only same as suggested method. The length is important when people make modifications for particular data types, that is what I did many times with different coders.

  13. Sachin Garg Says:

    I meant simple for hardware to execute, not for humans to understand :-)

    And yes its not there yet, but “promises to be”.

  14. Matt Mahoney Says:

    I implemented asymmetric coders in fpaqa, fpaqb, and lpaq1a all available at http://cs.fit.edu/~mmahoney/compression/
    For explanations of the coding system, see the comments in fpaqa and fpaqb. fpaqb is newer and simpler. All implementations are open source (GPL) C++.

    I initially thought that ABS would be faster because encoding is a table lookup with 10 bit state (x), 7 bit probability (q, using nonuniform quantization from 12 bits) and one bit to be coded (d), with new x as output. Decoding is also a table lookup from x,q to x,d. However my implementation in fpaqa turned out to be twice as slow as bitwise arithmetic coding. It might be faster on old hardware because no multiplication or division is required. However, a second table is needed to map q from 12 bits to 7 nonuniformly with smaller spacing near 0 and 1.

    In fpaqb I improved decompression speed and compression ratio and reduced memory usage by using 12 bits for x and q, and using a table only to save a division during compression. The algorithm is also simpler. Decompression is faster, but still 40% slower than arithmetic coding.

    fpaqa and fpaqb are order 0 models only intended to illustrate the coders. You can compile with -DARITH to select arithmetic coding and compare performance. fpaqa compresses about 0.3% larger. fpaqb compresses very close to arithmetic coding.

    As a practical example, I wrote lpaq1a using the model from lpaq1 and coder from fpaqb. This is a high end compressor (comparable to ppmonstr). lpaq1a is about 1% slower than lpaq1, compresses 0.02% larger, and uses an extra 1.3 MB memory for compression. ABS requires that compression and decompression be done in reverse order, so the compressor divides the input into 64KB blocks, models in the forward direction, codes in reverse to a second buffer, which is then written to the compressed file again in reverse order. Decompression does everything in the forward direction (although either the compressor or decompressor could reverse the data). The larger size is due to a 3 byte header that has to be added to each block to save x. You could improve compression approaching the same as arithmetic coding at the expense of more memory.

  15. Jarek Says:

    fpaqa is slower than arithmetic coding, because for each digit, it makes TWO TABLE CHECKS:
    ” r=qr[predictor.p()]; x=dec[r][x] ”
    Matt used 2 tables to make stretch (stretch(q) = ln(q/(1 - q) ) before decreasing its precision - thanks of this we have better approximation for extreme q (near 0 or 1), but worse for q near 1/2 - finally Matt writes that it makes the compression rate about 0.02-0.1% better.
    If we wont use it - just reduct the precision, we can do everything in ONE TABLE CHECK, for example
    r=dec[predictor shr K][x]
    where K is the number of bits to cut (2-4 or even zero but it would need a lot of memory).

    Another slow down is the while loop for very small q. We could store the number of bits to get, in some bits of dec/cod and decode/encode in separate codes.
    But the best solution would be to construct the predictor to work around 1/2 - if q>=1/4 we need at most 2 bits to get/put (one if q>=1/2).

    For example if there is Huffman based compressor, instead of standard encoding (approximate every binary choice with 1/2), we can do it exactly (get better compression rates) using created Huffman tree - just calculate the binary choice probabilities exactly (they are near 1/2, maybe larger than 1/4 (1/2?) ) and use ABS.

    I don’t believe it would be slower than arithmetic coding.
    Jarek

  16. Andrew Polar Says:

    I wish to thank Mr. Mahoney for first implementation of ABS, however, I have to add few critical remarks.
    For the moment ABS is patent free. In the event if new fast version is developed corporations start filing patents. They do not file published ABS, because it is prior art, but add some unpublished modification and file patent on a new version. This all already developed before our eyes just few years ago with AE. When filing patents corporations will use obscure terminology and fuzzy definitions, so when the patent is granted it will be written in such twisted language that any ABS implementation will have risk to infringe a patent, so scared researchers will try to avoid risk of corporate lawsuits. The advantage of being patent free will disappear as soon as ABS enters into serious competition with AE and Mr. Duda will not be capable to prevent that along with Mr. Garg and Mr. Mahoney.
    For the moment ABS is slower than many AE. When making comparison the best is to compare it to FastAC, it uses file in the same manner as Matt’s implementation. I would advice anyone build fpaqa and FastAC with same compiler option and pass same data file to both you will see the difference right away and it is not 40%, it is larger.

  17. Sachin Garg Says:

    Yep, that is a probable scenario. Not that their is anything wrong in it, but if someone else puts in the effort, their is the possibility of them going the patent route if they want to.

    If someone would really really care about the patent issues, they should make their own fast implementation and make it public :-)

  18. Andrew Polar Says:

    As I already mentioned the explanation of ABS continues bad traditions established by authors of arithmetic encoders when you need to read several lines of text for hours before the algorithm gets clear. I could not understand Mr. Duda’s explanation even after reading it multiple times and figured out how the method works only from Mr. Mahoney’s code. I wrote elementary demo myself and now I see that ABS encoder can beat AE significantly when written correctly. As I found in Mahoney’s implementation the operations are conducted for every bit. This is not even unoptimal, that is just wrong. It may kill the compression if symbols have near even number of positive and zero bits. But that could and have to be corrected. Let us consider the message AAABBBBBCCDDD. If we make binary array with positive bits where symbol A is met we have 1110000000000. If we then ignore A and make binary array for remained symbols with positive bit where B is met we have 1111100000. If we continue for remained part we have 11000. We can restore the original from binaries 11000, 1111100000, 1110000000000 if we know also alphabet as sequence CDBA. Our new binaries are total 28 bits. Original is 13 * 8 = 104. The difference is large. Even if we express A,B,C,D in 2 bits each, it is 26 bits per message, that is close. I already published on my site the proof that total compressed size of considered partial binary expressions is exactly matching Shannon formula, but then removed proof because it is elementary and probably known. Another important advantage of BINARY PARTITION (my terminology) that we shall have long fragments of 0 bits and, in case of block processing that may lead to better compression, some sort of adoptive approach.

  19. Andrew Polar Says:

    In my partitioning example if we start from the most frequent symbol we get total binary size of 26 bits for binaries

    11000, 11100000, 0001111100000 and alphabet saved in sequence C,D,A,B

    the exact number of 26 bits required for whole message if symbols are expressed via 2 bits. I don’t think it is a coincidence.

    I think people still have questions and not many of them understand the concept because I don’t see much activity. To help I made elementary demo for those who still have questions how the algorithm works. It is not compression tool, it is demo of the formula, people can find it on www.ezcodesample.com/abs/abs.cpp

  20. Jarek Says:

    I believe that ABS is simple enough to implement (multiplication/division and transfer of bits), so before they will be allowed to patent it (one year as You’ve said?), there will be optimal implementations generally available.
    I know - I should do it myself, but I’m only alone mathematician, who has programmed in Mathematica only for last five years…

    About Your representation - there are plenty equivalent of them.
    The trick of good compression is to find proper structure and representation of CORRELATIONS in given model.
    For example, I think that in image compression we should get better compression rates, using hexagonal structure - we could predict the probability distribution using values of 3 closest neighbors:
    http://arxiv.org/pdf/0712.1309

  21. Andrew Polar Says:

    Is that correct?

    Asymmetric Binary System is reversible bitwise conversion of one natural number into another wherein the latter has uniform distribution of bits and length according to entropy of original integer computed for bits. Direct conversion is achieved by logical and arithmetical operations on one variable that is called state, conducted in two possible ways depending on the bit occurred in original data. Reverse conversion is achieved by backward bitwise extraction of original data from state variable and updating state variable in two possible ways depending on extracted bit. In reverse conversion the state variable is taking same values as in direct conversion but in reverse order.
    If we denote state variable in i-th state as S(i) and P{b=0}, P{b=1} are probabilities for occurrence of zero and positive bit in the input stream than direct conversion is performed as

    S(i+1) =

    ceil [(S(i) + 1)/P{b=0}] – 1 if b = 0
    floor [S(i)/P{b=1}] if b = 1

    where initial state can be assigned as S(0) = 0. The reverse conversion is performed as

    b = ceil[(S(i) + 1)P{b=1}] – ceil[S(i)P{b=1}],
    where b is extracted bit

    and new state S(i+1) =

    floor [S(i)P{b=0}] if extracted bit b=0
    ceil [S(i)P{b=1}] if extracted bit b=1

  22. Andrew Polar Says:

    I changed demo to operations with integers

    http://www.ezcodesample.com/abs/abs2.cpp

    interesting that initial state can be assigned arbitrary to any integer and this integer can be used as end indicator in decoding.

  23. Andrew Polar Says:

    ABS appeared to be much simpler than it looked originally. One important property of transformation is that state variable can be subjected to any arithmetical operation in direct transformation and, if reverse operation is applied on the backward transformation, the result is correct. The backup demo can be found at
    www.ezcodesample.com/abs/abs3.cpp
    That means that when state variable grow large enough to approach buffer overflow the upper bits can be output and state variable can be shifted left in order to continue. This operation must be synchronized and reversed (shift right add bits from compressed buffer) in decompression process. It is, of course doable, but I think it even does not worth bothering. Since size of the state variable reach the size defined by Shannon entropy pretty quick the whole state variable can be saved to output buffer as soon as it reach certain length, let say 48 bits. So I simply suggest to continue operations while size of state variable is less or equal 48 bits, then output whole state variable and start from 0 again. The decoding in decoding process shall continue until state variable reach 0 (some more research required for defining stability in termination of decoding process). It can be adoptive, because it is performed by blocks, it may start from probability 0.5 and update it after every block. Look-up table may be not the best solution for 48 bit variable, but hardware implementation may completely solve problem. As you can see in demo there are only two critical functions process_bit, extract_bit that may have hardware implementation. In case of suggested above Binary Partitioning the whole process can be conducted as multi-threaded, that in case of multi-processor system, may run significantly faster than regular AE.

  24. Jarek Says:

    You have to be careful with using the initial state as end indicator-it can occurs many ti me.
    About dividing on blocks - in Your way we loose much of capacity. To do it wright, we should use constant bit length of the state - see paper or Matt’s implementation.
    In fact ABS is kind of similar to AE - instead of dividing the range into two subranges, we just spread them uniformly.

  25. Andrew Polar Says:

    I suggested one of the possible ways to manage growing size of state variable. Demo can be found at

    http://www.ezcodesample.com/abs/abs4.cpp

    In this example when state variable grew over 32 bits we output least 32 bits than shift them out

    state >>= 32;

    and continue with new state variable. In decoding we need to find right moment when we do the opposite operation

    state

  26. Andrew Polar Says:

    I have discovered instability in the above approach because small values very often stay the same after processing a bit. I will try later the opposite method - maitaining constant bit length state and output least significant bits.

  27. Matt Mahoney Says:

    That should work. That is the approach I used in fpaqa/b/c. In fpaqa the state, x, is always in the range 2^n to 2^(n+1) - 1, where n is the number of bits of precision in the bit probability, q (in the range 1 to 2^n - 1). For compression, you first check if encoding will make the state too large, and if so you output the least significant bit until x will be in the right range. This is stable as long as q has at most n bits. For decoding (when x gets smaller), if x is too small, you read into the least significant bit in the opposite order.

    In fpaqb/c, the coder reads and writes 8 bits at a time. x is in the range 2^n to 2^(n+8) - 1. You can have n

  28. Andrew Polar Says:

    To maintain the computational stability appeared to be a challenge in this algorithm. It may be fast enough to compete with AE when synchronization in saving bits in encoding and pulling them back in decoding is solved. In my latest version I have failure in every other data sample. The good news that every other sample pass for the long range.

    www.ezcodesample.com/abs/abs6.cpp

    I did not follow Matt’s recommendations on purpose but my version is close to his approach and it still have an instability.
    Any ideas?
    Thanks.

  29. Jarek Says:

    Matt has already optimized the version with the proposed formula (multiplication/division).
    Maybe try to use lookup tables - created to be reverses of each other.
    This version is slightly less accurate (like usual implementations of AC), but should be faster than AC - for one bit we need only one table check and bit transfer.
    There is huge amount of proper coding/decoding functions(x\to(x_d,d)), the essential is only that they fulfills:
    - that x_i is about x*p_i
    - is bijective
    http://forums.devshed.com/security-and-cryptography-17/cryptosystem-based-on-asymmetric-numeral-systems-497071.html
    (for example take a=phi)
    Unfortunately, I think that quickly to calculate proper coding/decoding formula can be find only in the binary case, but we don’t care about it when we use tables, which can be quickly initialized.

  30. Andrew Polar Says:

    There is numerical instability for small state values. There are some values when state don’t change on particular step that means it goes to infinite loop in decoding. For example,

    long long res = 8;
    for (int i=1; i

  31. Andrew Polar Says:

    I finally implemented ABS encoder that works not very far from the capacity of AE. The running example can be seen on www.ezcodesample/abs/abs7.cpp My concept is straight forward and simple. It encodes data in 48 bit fragments. Each 48 bit fragment of output data contains variable length sequence of data. All fragments can be processed independently in decoding. That means that highest disadvantage of ABS (backward decoding) is resolved. If implemented in multi-processor hardware it can be very fast. The compression ratio, however, is about 3 percent larger than theoretical limit, but it can be compensated by adoptive coding. We can predict probability of positive bit for every fragment based on previous fragments and use it for non-stationary binary stream. I recommend my program to those unfamiliar with subject and willing to understand the concept. The encoding and decoding needs about 20 lines each and the whole program including comments, entropy estimation and data sample generation is about 200 lines.

  32. Matt Mahoney Says:

    You can avoid the 3% size penalty by outputting the 8 low bits of the state when it grows too large, instead of outputting the entire state (48 bits) and resetting back to 1. This was done in fpaqc and lpaq1a. The problem with this approach is the compressor has to buffer the predictions and the output in order to reverse it. Of course, abs7 does this with a 48 bit buffer. My programs just use larger buffers.

  33. Andrew Polar Says:

    Thank you for reviewing my code. I’m experimenting so far and thinking of a different solution even set of possible solutions. I expedited functions by replacing possible multiplication and division by shifts and program now runs twice faster. If not using lookups there is one unavoidable division in direct transform and one unavoidable multiplication in reverse transform. I added short explanation of ABS to my site and now sample can be found through the link, so people can look to home page ezcodesample.com and follow the link.

  34. Andrew Polar Says:

    OK, with binary encoder thigs are more or less clear. It there a way to use ABS with larger alphabets directly without converting everything to dual alphabet?
    Thanks.

  35. Jarek Says:

    Yes! … :)
    We only need that x_i is about x*p_i
    This time, I think that there is no fast to calculate formula, but we can generate the lookup tables - encode x into symbol ‘i’ when something with probability p_i happen - there is huge amount of possibilities - we can use for it random generator.
    If we use the key to initialize generator - we by the way encrypt the data.
    This time we need better precision - we can use only a few probability distributions - for example: general and of successor of a few most probable elements.
    Ok - I’ll place it the third time… :)
    http://forums.devshed.com/security-and-cryptography-17/cryptosystem-based-on-asymmetric-numeral-systems-497071.html
    regards
    j

  36. Jarek Says:

    I thought about error correction, because one bit-flip makes that we lose everything after it.
    To do it, we have to add some redundancy - according to expected density of errors.
    For example we can set one bit in every byte to 0 in the data to encode (after prediction).
    Now if while decoding we get there 1, we know that there was an error in a bit in at most a few bytes before (with probability 1/2 in the last, 1/4 previous… )
    We should flip bit by bit and try to decode - if the density of errors isn’t too large, we should be able to localize and correct the error with large probability.

    Eventually if we lose larger fragment, we have to guess proper initial state - if we’ve put some redundancy, it’s rather doable state by state in not too large time.

  37. Matt Mahoney Says:

    I think it is best if data compression, encryption, and error correction are separate functions, independent of one another. First compress, then encrypt, then add error correction.

  38. Andrew Polar Says:

    Encryption in US is subject to export control. That means it even can not be taught or explained without license (www.access.gpo.gov/bis/ear/ear_data.html) to a foreign national leaving in US. Those techniques that are patented, such as RSA, are excused and can be discussed freely, but only the algorithms, software are still subject to licensing. I consider these regulations totally stupid unjustified and violating my constitutional right, but I follow and do not publish certain materials on my site, such as my solution for SSL Web server. I post this message for those who don’t know.
    Encryption in combination with AE or ABS is not a problem, everybody understands that changing one bit will affect all following message. There is infinity of opportunities for this, why you pay such attention to this? I don’t see it as serious technical problem.

  39. Jarek Says:

    Luckly I’m not from US :)
    The general concept, for example with some fixed probability distribution, is just coding algorithm.
    Encryption is just a natural modification.
    In AE new symbol is put into most significant position of state and encoding is made from the same side too - there would be correlations between symbol and the output.
    In ABS symbol is also put into the most significant position, but output takes the least significant - its behavior looks completely random - instead of in AE it practically doesn’t depend on the symbol.
    Symbol only determines which of random (using the generator) jump we use - in practice we jump in random way between very large number of states.
    Having worse random number generator, only can allow (I think) to try brute force attack without initializing whole table.

  40. Andrew Polar Says:

    Well, one way of encryption is change in final state, so the user without proper key will start from wrong state with correct frequencies and fail. But that does not look what you mention.
    It looks like you mention of changing tables that means probability that may also work but I don’t see big advantage here.
    The other way that you mentioned is possibly changing probabilites during encoding accoriding to certain rule. That may not lead to big advantage, because you may introduce some rule of changing state after every step.
    So, I still don’t see the technical problem. All is doable and will work.
    The strength of encryption not in rules because they will become known but in password. And again there is no differece with AE, the password can be applied as XOR operation for output bits and can be as long as you want it, so brute force attack will take forever.

  41. Jarek Says:

    Matt - sorry - I haven’t seen Your post…
    I agree that doing compression, encryption and data correction separately is conceptually simpler.
    But look at todays super object programming - computers are faster … but often works slower … and programs are counted in GBs…
    We cannot always just say that something will be done in another layer and just forget about it…
    To make error correction there is always redundancy needed, encryption can somehow show the structure of compressed file… the more layers we are fully aware of, the better should they combine with each other…

    ABS(ANS) is just a general concept of operating on nonintiger size data parts.
    The more functional layers would it support, the faster and safer will be software based on it.

  42. Jarek Says:

    I see quick responce… :)
    Andrew - this encryption’s main power is in the coding/decoding tables (generated using the key) - they defines the rule of jumping between states - state is hidden variable - slightest change of the tables and we are in completely different place … and the hacker sees only the least important bits of the states - they are useless, because he cannot track his current location…

  43. Matt Mahoney Says:

    Actually the U.S. has loosened export controls on encryption. For example, you can freely download AES source code from many sources linked by
    http://en.wikipedia.org/wiki/Advanced_Encryption_Standard
    Export controls had nothing to do with whether the method was patented. Also the patent on RSA has expired.

    When I say that compression, encryption, and error correction should be separate functions, that doesn’t mean that you can’t write applications that combine them. Many compression programs also encrypt and add checksums. What I mean is that the algorithms should be independent. Encryption is very hard to get right. Combining it with compression just makes it harder.

    Before you start designing your own encryption, first you need to tell me what is wrong with AES, SHA-256, RSA, etc. The only way that we know a system is secure is if a lot of people try to attack it and fail. Standard methods have been studied a lot more than anything you could design yourself.

    Just for example, you could “encrypt” by setting the ABS state to a secret key, then I could crack it by brute force search. I know you could “fix” it by adding more complications, and I could break it again by using more clever tricks, and back and forth, but who is going to be confident that you fixed the last security hole?

  44. Jarek Says:

    I personally believe that in the simplest version (with added random byte on the beginning of file) ANS is already safe - that means the only possible attack is brute force.
    But it has nice protection against this kind of attacks:

    In other encryptors we can just take next key and try to decrypt - analyzing the beginning of output we can decide if we’ve found the correct key.
    In ANS before starting decryption process we just have to initialize whole decoding table.
    Choosing the number of bits we are working on (R), we can make the time needed to check every key as long as we want.
    So even using short key should be safe with ANS.

    If it will occur that we can decode without initializing the whole table, we can just take better (less predictable) random number generator.

    RSA needs gigantic key, is extremely slow and in fact we don’t know if somebody hasn’t polynomial in time decomposition algorithm (3-4 years ago was officially shown polynomial algorithm that checks if number is prime).
    But RSA has large advantage - asymmetric key, so I will compare ANS with symmetric key encryptors only.

    Both AES and SHA just takes constant length block and mix it as long as it doesn’t look safe.
    Another advantage of ANS is that it uses variable block lengths - we cannot even decide how to block the file!

    And the QUICKNESS …
    After constant time for initialization, ANS encrypt/decrypt maaaaany times faster than AES or SHA … and compress by the way :)

    But of course - many should think about it, before it will be widely believed safe and used.
    Because of it and that I think, we’ve overstrain a bit hospitality of this blog (thanks :) ), I propose to move to cryptography forum with further discussion (I’m citing You there, If You mind - I will remove it).

  45. Matt Mahoney Says:

    Yes, take it to a crypto forum. Also read the sci.crypt mini-FAQ, especially about designing your own system.
    http://groups.google.com/group/sci.crypt/browse_thread/thread/bfad3d3a638a0722/05eca1a923fec8dc

  46. Andrew Polar Says:

    This is for Matt regarding disadvantage of other methods.
    RSA is slow.
    You forgot to mention BlowFish. I used it. It is fast and known to be secured. It passed test period as some other methods you mentioned.
    The reliability of your encryption must be tested by those people who are paid for breaking the ciphers. I’m not one of them although familiar with some methods.
    There are rules about patenting and export control, Matt you are not a lawyer. I tried to obtain legal consultation about my SSL Web server. Unfortunately, lawyer wanted $2000 for giving me an answer if publishing is allowed. I said that it is too much for a person with 4 years of education and refrain from a publishing until I find legal advice for cheaper price.

  47. Jarek Says:

    I’ve look through the FAQ and haven’t seen any reason why ANS could be not safe?
    Maybe (?) in the simplest version it could be vulnerable to adaptive chosen plaintext type attacks …
    To ultimately protect it, we can for example before putting to ANS, XOR every byte with byte generated by our generator (there are many possible options here).

    DES, AES, SSL show tendency that every few years we should just take larger block and make more XORs, permutations…
    Maybe we could change the way of thinking - observe that we have much more unpredictable processes then permutations, for example random number generators and use mathematical theory of chaos…

    And now even the simplest cryptosystem looks safe - just XOR the data with output from good random number generator, which was initialized using our key.

  48. Andrew Polar Says:

    If you did not spend years in crypto laboratory trying to break ciphers and did not succeed in this task you are not a specialist. Do you know that during the second world war all ciphers created by best mathematicians with special encryption experience were broken. Specialists specifically warn people do not use any unprofessional ciphers. Especially vulnerable is XOR operation and everything related to it for different type of attacks. Average cryptanalyst needs between 15 minuts and 3 days to break average amature cipher.

  49. Jarek Says:

    Making XOR with sequence generated by random number generator was only a simple example, vulnerable on any plaintext attacks.
    But it could be easy strengthen for example by using also previous output in the generator.

    I’m only suggesting that instead of expanding old methods, maybe we should try some new concept.
    ECC is becoming more and more popular, but it’s not the end of possible new concepts…
    For example if we use some family of chaotic, measure preserving bijections parametrized by the key, I think it should be provable that taking large iteration of it just cannot be found not using brute force - it should be perfect random number generator.

  50. Andrew Polar Says:

    The reliabitly of code will become known only when large number of commercial giants start using the cipher and breaking it become attractive to hackers. For now you may suggest something and successfully use it for years. New technique is always good, nobody ask you to stop doing that. My point that encryption is specific and very different from compression. In compression you see the speed and compression limit right away. In encryption you need attract analysts for the test, which is not easy. Contact Bruce Schneider and see what happens. Google his name if you don’t knwo who is he.

  51. Jarek Says:

    ANS can be imagined as a tube shaped mixer - we fill it to the top (to R bites) from upper side, turn on (use table) - it’s mixing up-down its contents (the number of state) and spit out something (symbol) from the bottom side.
    Random generator creates small, unpredictable disturbances in the mixing process, they grow while the data flow down and mix with other data … finally on the end is completely changed in random way.

    It’s quite universal way of thinking …
    Binary system is the base of computer science … something so simple as it’s generalization (this or complex base systems) should have many uses too…

    I’ve already mailed Bruce Schneider … and many others …
    I see that many people watch the cryptography forum and nobody respond … I can guess that it means that it’s not so easy to break it, but we will see …

  52. Andrew Polar Says:

    There is one more step that can be taken to the direction of entropy encoding that looks like more advanced compared to all known and final for binary encoders. The bit length of compressed expession computed based on entropy is larger than bit length of number of possible permutations computed as logarithm of [n!/a!/b!], where ‘n’ is size, ‘a’ and ‘b’ are number of positive and zero bits. The usage of look up tables does not make sense for ABS because instead of computed table values suggested by Jarek we can use computed (once in a life time) table values as positions in a sorted list of all permutations for given size and probabitity of positive bit. In this case we convert fragment by fragment back and forth.
    How about this Jarek?

  53. Jarek Says:

    I have speedup for bit transfer!
    For example it should make fpaqc a few (or more) percent faster.

    To ensure everything is reversible, we use constant length of digits (R+1 bits) buffer (state).
    But nowhere is written that this digits have to be binary … we can use higher base - for example 4 (2 bits) or larger.
    For base 4 (2 bits), we will use:
    Range: [2^{R-1},2^{R+1}-1]
    Decompressing: while(x

  54. Jarek Says:

    (something cut my post…? I see… :)
    Decompressing: while(x \less 2^{R-1})x=4x+2bits from input
    etc.
    Thanks of it we have to make twice less transfers in cost of table size(half more) and a bit of precision.
    We don’t feel these costs in precise version (fpaqc) - it will be faster (compression especially) practically for free.
    If the cost of multiplication doesn’t grow, maybe we could increase it even to 8bits ([2^R,2^{R+8}-1])- decreasing the time for bit transfer practically to zero.

    ps. Andrew - changing the representation automatically doesn’t give nothing for the compression … we need probabilities, correlations…

  55. Sachin Garg Says:

    I am sorry for that ‘cut’, must be something with the blogging software I am using.

  56. Andrew Polar Says:

    I’m speaking about final solution when all permutations are sorted and numbered for example if message is 4 bits and 2 of them are positive we have
    0011,0101,0110,1001,1010,1100
    and we number them as 0,1,10,11,100,101.
    When we output 11 that means original is 1001. Same thing applies to large array. This way of compression is better than entropy times size and Shannon was wrong but not very much wrong, because it is close to entropy times size. The number of particular permutation can be found without making all permutations and sorting them.

  57. Andrew Polar Says:

    In my previous example according to Shannon formula the bit length of any message of 4 bits with 2 positive bits must be 4 bits, but actually it is 2 bits the difference is bit. Unfortunately it is not that big when message is large. For 1000 bits the difference is only fraction of percent. But if you apply this to small fragments as look ups you can make compression significantly better. There is however one problem it is number of positive bits, it should be same or should be output and kills the compression. However it is something to think about, not yet solved.

  58. Jarek Says:

    Andrew - if I correctly understand - You are proposing some bijection between binary representations and numbers of permutation…
    It won’t change the number of possibilities!
    Encoding is choosing the number which denotes given element of our space - to do it we need for example lg(size of the space) bits…
    The only way to decrease this number is to use the probability distribution of elements!
    (for example that some sequences are forbidden - have (almost) zero probability)

  59. Andrew Polar Says:

    I thought I expressed my idea clearly. I speak about totally different way of entropy encoding. For the moment I don’t know how to implement it. If I knew I would do that but I see that it is possible. I can repeat again. If you are given the size of message and number of positive bits you can calculate the number of possible permutations. You also can make the list of possible permutations, sort it and replace your particular message by the position in the list. The entropy is not theoretical limit and Shannon made mathematical mistake and I’m surprised how people don’t see it. Theoretical limit is bit length of number of possible permutations which is slightly less (fraction of percent for real case) than entropy times size. We know that before Huffman encoding there were other methods of making binary trees and Shannon was using one of them and noticed that size of binary code for symbol approaching to -log2(p). Read his article again and you will see that there is no theory behind this statement. Fortunately to him this wrong limit appeared to be close to the right one so he avoided shame of publishing mathematical mistake. Read my article about range encoding on my site it clearly shows everything.

  60. Andrew Polar Says:

    The new, final and optimal way of statistical data compression is replacement of the message by the position in imaginary list of sorted permutations. Please confirm that you understand what I’m talking about, because when I say that theoretical limit is below the entropy people are reacting like catholic priests on heresy. Surprisingly Shannon mistaken statement made without proof over 50 years ago is taken as final truth and supported by yourself too. Do you agree, yes or no?

  61. Sachin Garg Says:

    Andrew, your method sounds fine but I fail to understand how it contradicts Shanon.

  62. Andrew Polar Says:

    It can compress below the entropy limit statistically. That means that every possible message and all possible messages will be compressed below entropy limit. However, not significantly below, fraction of percent for most of them. I can write implementation but it will require long time for encoding/decoding and will be impractical. I speak about theoretical mistake. For pactical compression fraction of percent is not important for theoretical article it is important. Why other people persistently don’t see this mistake that surprise me.

  63. Sachin Garg Says:

    Can you explain ‘how’ it will compress below the limit?

  64. Jarek Says:

    The equation made me feel the Shannon’s entropy (logarithm of number of possibilities) is (see my paper):
    binomial(n,pn)=~ 2^{nh(p)}
    where h(p)=-plg(p)-(1-p)lg(1-p)
    So if we already know the number of ‘1′ (probability of 1), to encode the number of combination we need about h(p) per bit.
    You would get practically the same (compression rate) as by finding p(1) and using only this to encode the file for example using ABS.

  65. Andrew Polar Says:

    Sachin, I thought I answered this question many times. I do it again because I want to see at least one person in the world who understand that. If you have, for example, file of 100 bits with 50 positive bits. According to Shannon this file can not be compressed. Entropy times size is exactly 100 bits. Now, I repeat it again, how many different messages you can make. The correct answer that can provide any math teacher with bachelor degree is 100!/50!/50!. The bit length of this number is 97 bits not 100 bits. If we sort all possible messages and assign sequential number to each of them starting from 1,2,3,… and so on we can use this sequential number as encoded message. The average size of encoded message will be even less than 97 bits, may be something near 92 bits. Huffman got his Ph.D. for improving algorithm of Fanno-Shannon on a fraction of percent here is much more. People persistently deny mathematical mistake of Shannon. That is not approximation that is MISTAKE again MISTAKE. Approximation is when author state that the formula is approximation and precise answer is different.

  66. Andrew Polar Says:

    Jarek,
    I don’t always understand what you say, but you must completely understand what I’m saying and I’m surprised that you did not confirm this yet. Please say clearly do you agree that Claude Shannon saw statistical compression from wrong perspective. He simply knew from experience of making binary trees that bit length of compressed symbol is about -log2(p) and took it as limit in statistical compression without even considering number of permutations. How message of 100 bits with 50 positive bits can be uncompressable? Total nonsense, we know statistics we can exclude giant number of messages where number of positive bits is not 50. 100 bits is necessary to represent all possible messages with any number of positive bits not only 50. Please stop playing misunderstanding we are scientists.

  67. Sachin Garg Says:

    Andrew, if it is already known that we have 50 1s in 100 bits, AND we follow Shannon’s formula, we can still get the encoded-size you mention.

    -log2(p) uses probability, and with the additional information that we know (50 1s) we can calculate probability p better and thus encode in less than 100 bits.

    This doesn’t makes Shannon’s formula wrong.

  68. Jarek Says:

    I’m sorry, You are wright: for very small files, the number of combinations is a bit smaller.
    We need 97 bits to encode 100 bits with 50 ‘1′, or 995 when 500/1000 or 9994 when 5000/10000 or 99992 when 50000/100000.
    But the problem is that we have to remember the number of bits too - ceiling(log2(50))= 6.
    Even without rounding up, we get more than 100.
    It’s because in fact we need a bit less for storing ‘50′, because it’s probability distribution isn’t uniform. To store it we need:
    -log2(p(50))=-log2(100!/50!/50!/2^100)
    So if we do it exactly - in sum we get … 100 bits! :)
    Miracle? no!
    As I’ve said - You are proposing bijection between representations.
    It doesn’t change the number of possibilities - so to choose one of them we still need the same amount of information!

    The equation I mentioned (p\in(0,1)):
    log2(n!/(pn)!/((1-p)n)!) =~ n*h(p)
    is asymptotic behavior (use Stirling’s formula)

  69. Andrew Polar Says:

    I’m glad Jarek understands me. I mentioned that mistake is theoretical Shannon’s entropy is close to realistic limit otherwise it would have been noticed. And I said that Shannon introduced limit from experience pretending that it was some sort of theory. However, you all speak about upper limit. If we number all permutations as 1,2,3,… and so on we can have particular case where our particular permutation is assigned number twice shorter than a maximum which is log2(100!/50!/50!), how about that?
    But all this is only encoder of order 0. What if I add other conditions. If I introduce sort of Markov matrix
    30 70
    20 80
    which says that in our 100 bit message with 50 positive bits probability of 0 after 0 is 0.3 and probability of 1 afer 0 ia 0.7 and so on. We need to output 4 additional numbers but the number of possible permutations will be reduced drammatically. I don’t know how to enumerate all possible permutations in this case. Do any of you know that?

  70. Jarek Says:

    To choose one element from set of size 2^n without probability (uniform), we need EXACTLY n bits - first chooses between first and second half, etc…

    About the probability - if we would store it exactly (like 50/100), than as I’ve said before - we would have no compression.
    Ok - You can say that probability is in fixed range, but again if calculating exactly - we finally would need again the same number of bits (100).
    You won’t cheat math :)

    The trick of compression is loose (statistical) approximation of these probabilities.
    In this case, wanting or not - we have to allow some variance (eg of number of ‘1′) - we don’t restrict possibilities to some set as before, but use natural statistical behavior of given model - with small variances around flat maximum of entropy.

    About the Markov’s chain …
    Having statistical conditions (correlations) it’s a bit weird wanting finite deterministic sequences fulfilling these conditions.
    On the other hand, if we want such asymptotic behavior or infinite sequence … just use ABS :)
    Of course we would get some small variances of this statistics, but THANKS OF IT we can get compression.

  71. Andrew Polar Says:

    Jarek,
    You and Sachin are showing political correctness. Why don’t you admitt that Shannon exposed scientific sloppiness in his article by taking limit from experience and making it looking like scientifically defined.
    Another part. You must easy see that compression based on permutation could be better than entropy encoder but you turn around and try to deny. What if we take data after ABS and split them into 16 bit fragments. We can make easy look up tables and convert every 16 bit fragment into sequential position order in permutation list. The length of every of such fragment is below entropy. In addition some of such fragments will take beginning indexes and be much shorter. We may have few particular cases where 16 bit fragments turn into 1 or 2 bits (just particular cases). Second and important part that you don’t want to see. We need to output nubmer of ‘1′ for every fragment. If data are after ABS what you expect. You know that distribution of mean values for groups is normal even when taked from uniformly distributed population (starting from group size 30). Same as average weight of passengers on board of aircraft. Number of ‘1’s may go to a different stream that is subject to a further compression. I’ll write this program and show you that the data after AE or ABS can be further compressed, probably not very much, but we speak about principal.

  72. Matt Mahoney Says:

    Andrew, enumerative coding was developed by R. V. Tomic in 2004. http://www.1stworks.com/ref/qi.htm
    It is claimed to be faster than arithmetic coding. The drawback is that it only works for independent bits with fixed probability. It was discussed at length in comp.compression (under the pseudonym nightlight).

    However it does not compress smaller than the Shannon limit. It is true that there are less than 2^n permutations of n/2 zeros and n/2 ones, so you can code it in less than n bits, but you also need to transmit the number of zeros and ones, which exactly makes up the difference. There is no mistake in Shannon’s math.

    If you don’t believe it, go ahead and write your recursive compressor. You can easily compress to n-1 bits because if you know the number of zeros and ones then you know what the last bit is and don’t have to send it.

  73. Jarek Says:

    Compressor is a program which makes some bijection between inputs and possible outputs.
    Andrew - if, as You suggest, we would enforce that the output has always the same length, than because we need the same number of possibilities as input has - the output has to have at least the same length as the input.
    The trick of compression is to choose some expected statistics (maximizing entropy) which will be converted into short output, but the cost is that pessimistic length of output has to be much larger than input’s length.
    For example if we encode 01 sequence which should has expected statistical behavior: p(0)=0.1, with nonzero probability we can get sequence of one hundred zeros, which will be converted into much longer sequence.
    The trick is that it occurs extremely rarely.

  74. Andrew Polar Says:

    Matt,
    thank you for link. I already made estimation program for enumerative coding. The result shows 3% better compression, however I need to check everything because it does not make round trip, only estimation.
    The discussion was about theoretical aspect. It was clear that improvement is tiny if exists. Shannon did not mention number of possible permutations because he had no idea how it is related to entropy and introduced entropy from experience. This is what people persistently don’t want to admit and shift subject to size from formula like you, Marek and Sachin. Size is the same but formula is different
    log2(n!/p!/q!)/n is close to log2(n^n/p^p/q^q)/n.
    (p and q number of positive and zero bits, n is size)
    The last is entropy and previous is correct one. The last is upper assimptotic limit for previous and difference is fraction of percent for sizes over 1000. This is how it should be explained.

  75. Andrew Polar Says:

    Another thing that you did not mention. What happens if we provide Markov matrix instead of simple probability of positive bit. How to calculate number of posible permutations for given Markov matrix? Did anyone try that?

  76. Jarek Says:

    The first equation has the same asymptotic behavior (use Stirilng formula: n!=~sqrt(2\pi n)*(n/e)^n).
    If You want the average entropy(logarithm of number of such sequences) for given Markov’s matrix (see my paper):
    -\sum_i p_i \sum_j S_ij log2(S_ij)
    where p_i is normalized eigenvector corresponding to dominant eigenvalue(1) - probability of i-th symbol in the sequence.

    Andrew - You are proposing some general idea of compression which doesn’t look into file structure…
    See that ‘typical’ file - the most probable we will get if we don’t know anything about its structure is uncorrelated sequence of 01 with p(1)=1/2 - we just cannot compress such files!
    If You want to construct a compressor, You just HAVE to assume that the file has some structure … and try to make as good as possible model of it…
    The simplest assumption is that near bits are somehow correlated - we get for example Markov’s model this way…

  77. Andrew Polar Says:

    One more reason of not to use ABS and use simple range encoder: I made simple routine that generates binary array with given probability of positive bit. Then I calculated entropy bitwise and bytewise. It appeared that entropy times size is the same number for both. That means that any binary data can be passed to regular range encoder and processed by bytes. As it is known range ecoder makes one multiplication/division per byte not per bit. If to use lookup tables as Marek suggested it is no longer ABS but lookup. Because lookup tables can be made by many different ways including simple manual design from experience with no theory behind. Marek, try to beat my range coder from ezcodesample.com in speed. But process data by my encoder bytewise and let me know what happened.

  78. Andrew Polar Says:

    One more thing. If we process only bytes I can optimize my range coder and make look up tables as well and in this way to replace multiplication and division by look ups. I did not do that on pupose because I wanted to keep it as generic as possible. It process integers within wide bit length, alphabets from size 2 to size 20,000 with no change in code.

  79. Jarek Says:

    Ok, let’s compare Range coding and ANS(ABS): both uses transformation:
    x(x_d,d), where x_d is about x*p_d
    The difference is that in ANS we are looking for d on the least important position, in RC on the most important.

    The advantage of ABS is that because symbols are placed uniformly, we can use the buffer of constant size, and put/get bits on the least important positions of it.
    In RC symbols are placed in ranges, so if we would try to do the same as in ABS, the higher ranges would be visited less frequent.

    So in RC we have to use various size of buffer and some times just put the remainings to output: we lose PRECISION, need more computation time …
    ANS thanks of constant buffer size is much more precise (better compression rate) and flexible.

    I think the only use in which RC could(?) concur with ANS in which we don’t have (yet?) general formula, is while encoding large symbols and the statistics varies very much through the whole file.
    But the cost is precision … and RC would use multiplications instead of table check, so I think just to initialize ANS a few time would usually be faster.

    And thanks of that the digits are placed uniform in ANS, but only in statistical way, we can make great cipher by choosing exact placing.

  80. Andrew Polar Says:

    When we speak about compression the only things that represent interest is speed and limit and limitation on alphabet. The limit is the same, the seed is faster in RC, the alphabet is virtually unlimited in RC. This may be grave for ABS. Make it faster at least for binary and publish your program. If you beat my RC in speed I will make it significantly faster by replacing multiplication by look up in the same way as you want to use it in ABS. The details that you mention may represent interest for some particular implementation in rare cases.

  81. Jarek Says:

    There is one more interest - precision - how exactly we approximate expected probability.
    The better precision, the closer to theoretical entropy - the better compression rate.
    And ABS/ANS is much better in this.

    With RC using large alphabets makes problem with gigantic multiplications and has large precision lost.
    ANS can use large alphabets too and thanks of constant buffer size, practically doesn’t have above problems.
    The disadvantage I’ve spoken about is that for every probability distribution, we should initialize new tables.
    Eventually, because the changes of probability distribution aren’t usually large, so we could just update the tables near some boundary - replace some symbols with another symbol.
    About implementing it - I’m mathematician currently working on a few completely different projects … and I completely doesn’t have experience with low-level fast coding.
    This should be written by compression expert and for specific predictor…

  82. Andrew Polar Says:

    I don’t know how to write ABS that runs faster than RC and don’t know how to handle large alphabets in ABS other than by binary partitioning that is very slow for large alphabets and I did not see other implementations where it is done by other people. I have no reasons to protect RC from being a history, I did not invent RC. ABS is interesting but for the moment can not compete with RC. I thought ABS is faster for binary data but the fact that binary data can be processed by bytes or even by groups of 12 bits and make same size compressed data changes everything into favour of RC.

  83. Jarek Says:

    I’ve told how to speedup Matt’s binary implementations - don’t use stretching in fpaqa (one table check instead of two) or use larger buffer in fpaqc…
    ABS for groups of bits is ANS which is practically implemented on the link I’ve placed her three times…
    Takes practically one table check for group of bits and a few binary operations.
    It’s similar to RC, but uses constant buffer size - it just has to be faster and more precise while processing!
    But requires initialization…

    As always - the time will show…
    Best regards
    j

  84. Andrew Polar Says:

    OK, I compared again fpaqc as it is for the moment to RandCode.cpp on the same data sample. Time for fpaqc is 0.26 time for rancode is 0.22. Compressed data buffer is 486,685 for fpaqc and 474,460 for rancode. Obviously, I altered rancode to read and write data to a file to reproduce same conditions. The advantage of rancode in performance is obvious although insignificant. In addition to that fpaqc is limited to binary data while rancode can process large alphabets. rancode also can be optimized for binary data in the same way as fpaqc by replacing multiplication and division by look-ups.
    Good luck in your further research.
    A.

  85. Jarek Duda Says:

    A few things…
    1) I’ve found that there is a problem with ABS - maybe it’s the instability Andrew talked about…
    The problem is that each symbol has to appear very precise number of times, and some times the formula for ABS don’t do it. It would occur very rarely, especially if the range is huge as in fpaqc.
    I’ve describe the problem on the cryptosystem page and I’ve just seen that there is problem with ABS formulas.
    I’m just rewriting the section of my paper in which I’ll show some solutions (sometimes change one value or formulas).
    2) I’ve checked fpaqc source - he’s already transferring 8 bits at once. I’ve finally also analyzed arithmetic coding - I’ve underestimated it - it also uses constant buffer size. ABS should decode as fast as AC and encode a bit slower.
    3) The real advantage of ANS is that it’s one-stated: in opposite to AC, it can be cheaply tabled (… and large freedom of random initialization for encryption purposes)

    I’ve also written a demonstration about ANS almost two weeks ago … it should show up in a day or two(?) here:
    http://demonstrations.wolfram.com/author.html?author=Jarek+Duda

    best
    j

  86. Jarek Duda Says:

    I’ve analized the problem - fpaqc isn’t affected!
    But fpaqb can be…

    If we work in [2^R,2^{R+l}-1] range (fpaqb l=1, fpaqc l=8),
    in ABS using the formulas the problem occures iff ‘Ceiling[r*2^{R+l}] is odd’.
    In fpaqc the ‘R+l’-th bit is out of precision of q - is automatically 0 - everything is ok.
    But in tabled versions (fpaqc) we uses smaller precision.
    Thanks of tabling -the problem can be easily corrected by changeing slightly initialization:
    - round q so that Ceiling[r*2^{R+l}] is even, or
    - change one value in the initialization tables - the last number should create opposite bit than it currently, or
    - when the ceiling is odd, use alternative formulas: with floors
    x_1=floor(xq)
    x_0=x-floor(xq)
    d=floor((x+1)q)-floor(xq)
    x=floor(x’/(1-q)) for d=0
    or = ceiling((x’+1)/q)-1 for d=1

  87. Jarek Duda Says:

    But while initializing fpaqb we uses all possible probabilities just below this precision - ‘R+1′-st bit of q is automatically 0 - everything is ok!
    No corrections are needed!

    I wanted finally write this section solidly and I’ve finally simulated the formulas … and I’ve panicked … but everything is fine :)

  88. Sachin Garg Says:

    Great, congrats :-)

  89. Andrew Polar Says:

    The biggest disadvantage of ABS is only binary capabilities compared to large alphabets in range encoder. Especially my version that can process alphabets of 10000 symbols easily. As I already mentioned you can switch from small alphabet to large one without change in compression ratio. I generated binary array with certain probability of zero bit, say 0.8, but processed array byte by byte and received same compression ratio as I would do it for bits. This is what you use when process it by bytes. The opposite is not true. You can not easily convert large alphabets into smaller alphabet without paying a price. For example if we have message of 100 symbols with alphabet A,B,C and frequencies 60, 30 and 10. The entropy is 1.3 per symbol the compressed array is 130 bits. If we try to replace A by 00, B by 01 and C by 11, we have 150 zeros and 50 ones. Entropy is 0.81 but the length is 200 bits so the length of compressed array is 162 bits. Other ways of changing alphabet size are complicated and may work only for cases of small alphabets not when alphabet is 1000 and more. So, you may approach RC capabilities in the future for case of binary data, which is rare case in real life.
    By the way, most patents on AC are expired. I showed that method that I use in my RC was mentioned in reference in fundamental article of Mr. Langdon in 1984, who was AC guru at that time and the article was published in fundamental journal. World known guru in data compression acknowledged the method, I use, in 1984 as previously published. I can not imagine better prior art reference.

  90. Jarek Says:

    But ANS doesn’t have this disadventage…
    Ok - there are no quick to calculate formulas, but it’s one-stated - unlike RC we can table it.
    So if the probability distribution doesn’t change too frequent, RC needs 2 multiplications while ANS needs for the same symbol one check of table…

    While correcting the section in paper (pages 10-16), I’ve realized how terrible it was … sorry … I’ve rewritten it completely. I hopes it it’s much better and I would be gratful for any comments…
    http://arxiv.org/PS_cache/arxiv/pdf…0710.3861v3.pdf

  91. Jarek Says:

    sorry for the link above - please exchange it to
    http://arxiv.org/PS_cache/arxiv/pdf/0710/0710.3861v3.pdf

  92. Jarek Says:

    Yeah … there were a few small things to correct…
    I belive it’s fine now (?) - the same link.

  93. Andrew Polar Says:

    Jarek,
    When you write the article it is your responsibility to explain everything clear. You introduced only simple algebraic formulas but explained it in such a way that it is not possible to understand what is what. Try to explain your concept of ABS to middle school student and ask him to write article for you he will do it better and people will understand what you are trying to say.
    What is ‘q’ above formula (20). Is it probability of 0 or 1. Does it matter if q > 0.5 or it should be always below or equal 0.5. Formula (23) is not decoding it is encoding because division by probability makes number growing. When we do encoding or decoding we are using single state variable that takes new value on every step, why don’t we use S[i] and S[i+1] in notations. Formula (22) defines which bit is next in decoding. Why not to denote it as b. Formulas (20) and (21) should show decoding process, why you use notations x1 and x0 when we are speaking about same state variable that denoted in (23) as D. Formulas (20) and (21) will not work if initial state is 0, it continue convert it to 0 on every next step.
    One more thing: Topology has nothing to do with data compression. Your article is set of unrelated to each other topological statements written in a wrong English, which are true when understood but don’t represent anything new or not trivial. Software engineers see topology as a science similar to the one used in middle ages when people argued how many angels can balance on the head of a pin, that means interesting for selected people and totally unrelated to real life.

  94. Jarek Says:

    Andrew,
    Thanks for the remarks.
    1). what q means? two lines above (20) is: denote q=q_1 and it doesn’t matter where it is in (0,1)
    2) The paper is about encoding data into symbols - opositely to compression - coding/decoding are switched, as is written in introduction to section 3.
    3) About notation… I don’t remember where it is from :) It’s just representation of the data in ANS… and I wanted to be consistent with previous versions… D is from decoding…
    4) ‘0′ represent choosing one of zero elements … we cannot encode here anything… while making shift right in binary system, we also fininsh in 0 after some time…
    5) While splitting Markov’s chain into topologically separated subchains - there are no correlations between elements in these subchains. In this paper I generalize it to higher dimensions - to find methods to handle with higher dimensional correlations. I’m sorry but as a physicist, I see statistical physics in data compression - there is plenty of topolgy here… in image compression You can divide image into topologically separated by eg background regions…
    6) I’m sorry for my English … it was my first paper and I coudn’t find anyone here to read it. I would be very greatful for any comments
    best
    j

  95. Andrew Polar Says:

    thanks for reply,
    I, probably, used some strong expressions in my previous mail, I appologise, but I still support the matter of my previous message. Your article is hard to understand and it could be easily rewritten in understandable way. About topology: it is possible to split image into topological regions but it will not become smaller after that. Those gurus of topology who wrote books never contribute to image compression and opposite engineers who contributed to image compression were not known to write articles about topology. Your algorithm is newly discovered property of fractions that shown in your article totally independent from other ideas which are expressed in topological terms that raises questions why that all sitting in one article, because it takes time to understand it and to see that it could not be used in image compression at least immediately. I’ll return to technical questions in the next mail.

  96. Andrew Polar Says:

    I compared article
    http://arxiv.org/PS_cache/arxiv/pdf/0710/0710.3861v3.pdf to the first one mentioned on the top and did not find big difference regarding compression technique.
    Do I understand things correctly:
    I start from decoding:
    Step 1. we identify the bit
    b = ceil[(s+1)*p1)] - ceil[s*p1], where p1 is probability of positve bit and s is integer called state. b is integer that takes values from (0,1). When we know the bit we can apply formulas for decoding
    s(i) = floor[s(i+1) * p0] if bit is 0
    s(i) = ceil [s(i+1) * p1] if bit is 1
    where s(i) is state and (i) shows iteration number.
    Encoding is done by formulas
    s(i+1) = ceil {[s(i) + 1]/p0} - 1 if bit = 0
    s(i+1) = floor{s(i)/p1} if bit = 1.
    Is that correct? and is there something new in your last article?

  97. Jarek Duda Says:

    The article main goal is written in its title :)
    On the way I needed good coding, Huffman was imprecise, I didn’t know arithmetic, so ABS comes up…
    I agree that topology isn’t generally too helpful in data compression, but can be useful to handle with models with some local constrains - it’s very special, idealistic case, but can be used for example to increase hard disk’s capacity.

    The problem I’ve talked about concerns stream versions - we have make I_s to be b-absorbing. Other new things is solid mathematical explanations and the figure that shows that this encoding looks just … natural - sequence of bits on one side, sequence of symbols on the second, and the state between them that contains some data and using it we change one sequence into the second.

    There are plenty of possible formulas - it’s important that x_s sums to x and that x_s is about x*ps.
    There should be my demonstration on Wolfram’s page … their editor has accepted it a few weeks ago (???)- it should be obvious then … I can mail it now if somebody wants.

  98. Jarek Says:

    The demonstration is finally available:
    http://demonstrations.wolfram.com/DataCompressionUsingAsymmetricNumeralSystems/
    It shows that even with tiny tables and simplest statistical initialization it works nice.
    I’ve checked correlation tables - the probability distribution of sequences has is asymptotically going to be consistent with central limit theorem for uniform probability distribution - it can be used as random number generator.
    I wanted to make it educational - simply written.
    I made remarks how to write it practically on the cryptosytem site.

  99. Andrew Polar Says:

    Thanks for demo. Your understanding of ‘simply written’ is different from mine. I did not find it ‘simply written’ but ‘better written’ than original article. You wrote original article as if you purposely tried to reduce number of people who might understand it to possible minimum. This demo is better and I understood more things but not all.
    The remark about Wolfram implementation: It looks that they also like release things in the most complicated way possibl