The Data Compression News Blog

All about the most recent compression techniques, algorithms, patents, products, tools and events.



Asymmetric Binary System

Posted by Sachin Garg on 17th December 2007 | Permanent Link

Update 16th Jan, 2008: Andrew Polar has written an alternate implementation of ABS. Check the comments below for details.

In an unpublished paper, Jarek Duda from Jagiellonian University in Poland has written about what he calls the “Asymmetric Binary System”.

It promises to be an alternative to arithmetic coding, having all the advantages but being much simpler. Whether it holds up to that promise remains to be seen, but there have been a few interesting developments in the last few weeks.

After a lot of hiccups in the process, Matt Mahoney has finally managed to implement the asymmetric coder in an order-0 compressor, fpaqa, released under the GPL. It is designed as a drop-in replacement for the arithmetic coders in other programs like PAQ, LPAQ, BBB, and SR2.

A few things might be confusing when you go through the paper, so here are some details that came up in the comp.compression discussion. I am not detailing all the maths; refer to the paper for that.

It might seem that the algorithm is O(n^2) because as x grows larger, so does the complexity of the arithmetic. But this is not true, because x can be normalized by outputting its low bits to keep it in a fixed range (e.g. under 2^32) with insignificant compression loss.

It has only one state variable instead of 2 (low and high range), which means a lookup table can be used for both coding and decoding to fairly high precision.

Unlike arithmetic coding, the decoder moves x through the sequence of states in the opposite order from the encoder. This means that the bits of d are decompressed in the reverse of the order in which they were compressed. This almost rules out all the usual predictive models like PPM, PAQ, LZMA, etc., because the model would see different contexts during encoding and decoding.

You have to ensure that encoding and decoding are exact reverses of each other: if one side first encodes/decodes and then puts/gets the lowest bits, the other has to do it in the reverse order - get/put the lowest bits first - so that after decoding/encoding we stay in the selected range. To make the reversal unambiguous, we should keep x at a constant bit-length by selecting the range [2^R, 2^{R+1}-1].

To solve the asymmetry problem in fpaqa, what Matt has done is that the compressor models the input in the forward direction and saves q and d in a stack of size B=500K. When the stack is full, or at the end of the input, the saved q and d are encoded to a second stack of packed bits, and that stack is written to the output file in reverse order along with the first stack's size and the final state x. The decompressor does everything in the forward direction, periodically reading x to reset the state. The compressor does all the data reversal, so it is about 15% slower than the decompressor. B is large enough that the extra headers are negligible (5 bytes of overhead per block).
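The constant-range bookkeeping and the data reversal described above can be sketched in a few lines. This is a minimal illustration of the idea, not fpaqa's actual code; the parameters R, PREC and c1 (a scaled bit probability) are assumptions chosen for the demo.

```python
PREC = 12                    # probability precision: P(bit=1) = c1 / 2**PREC
R = 16                       # state x is kept in [2**R, 2**(R+1)) between steps
c1 = 1229                    # scaled P(bit=1), about 0.3
c = [(1 << PREC) - c1, c1]   # scaled probabilities of bit 0 and bit 1

def encode(bits):
    x = 1 << R
    out = []                           # low bits pushed out by renormalization
    for b in reversed(bits):           # code in reverse so decoding runs forward
        while x >= c[b] << (R - PREC + 1):
            out.append(x & 1)          # emit a low bit to keep x in range
            x >>= 1
        if b:
            x = (x << PREC) // c1                  # floor(x / p1)
        else:
            x = -(-((x + 1) << PREC) // c[0]) - 1  # ceil((x+1) / p0) - 1
    return x, out                      # final state plus the emitted bits

def decode(x, out, n):
    bits = []
    for _ in range(n):
        xc = (x * c1 + (1 << PREC) - 1) >> PREC            # ceil(x * p1)
        b = (((x + 1) * c1 + (1 << PREC) - 1) >> PREC) - xc
        x = xc if b else x - xc        # exact inverse of the encoding step
        bits.append(b)
        while x < 1 << R:              # re-absorb low bits, last-emitted first
            x = (x << 1) | out.pop()
    return bits

msg = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
x, stream = encode(msg)
assert decode(x, list(stream), len(msg)) == msg
```

Because the encoder runs over the bits in reverse, the decoder produces them in the forward direction - roughly the arrangement fpaqa's stack-based block scheme achieves.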

Overall, it looks like a promising alternative to arithmetic coding, but there aren’t any compelling reasons yet that would make you want to switch. As with all fresh ideas, it might get there with time. And did I forget to tell you that it’s free of any patents?

172 Responses to “Asymmetric Binary System”

  1. Ognyan Kulev Says:

    What about patents? Arithmetic coding has some. Is there such a problem (for free software) with ABS?

  2. Sachin Garg Says:

    Jarek mentioned that there are no patents on it.

  3. Jarek Says:

    Jarek only observed that if it were known … it would be known :)
    I don’t know anything about patent law - I believe that it’s so simple that optimal algorithms should be created quickly, and it would be impossible to patent anything important… (?)

  4. Andrew Polar Says:

    My quick test of Matt’s fpaqa showed that compress/decompress time is identical to FastAC and to my implementation of AE (the difference is insignificant, about 15% across all 3 programs), but the compressed size was significantly smaller than the entropy-estimated size. I did a round-trip test.

    Any explanation? Is that the expected result, or is something wrong with my test?

    Regarding the patents: in the USA the inventor can apply for a patent within a year of first publication. In order to do that the inventor should not be a citizen of the USA.

  5. Andrew Polar Says:

    I found what was wrong with my test; now things are clear. The compressed size is the same and the compression time is the same. Are there any other reasons for choosing the new technique, maybe a theoretical possibility of future improvement?

  6. Jarek Says:

    Arithmetic coding has been evolving for 30 years, ABS for a few weeks … fpaqb decompresses a bit faster and codes practically optimally. And unlike arithmetic coding, it should stay patent-free.
    I’ve just written up a general concept for versions which, for a fixed probability distribution, decode/encode as fast as Huffman, with the option that the output is encrypted (it looks very safe).

  7. Andrew Polar Says:

    Thanks for the quick answer, but you did not make it clear. You say fpaqb is optimal; does that mean you don’t expect further improvement? If there is no further improvement, then there is no reason to switch.
    Arithmetic encoding is now patent-free or soon will be; the patents have all expired. I implemented AE (or, as some call it, a range encoder) as it was published in 1979. I have not been sued yet for infringing a patent. And my code could have been written in 1979, without any evolution, if the author had made his solution popular. My code is not very special; it has the same performance as other successful implementations such as FastAC and possibly others.

    I would like to ask again: do you expect further significant improvement or not?

  8. Jarek Says:

    Optimal in the sense that we reach the theoretical compression (-\sum_i p_i \lg(p_i)); arithmetic is about 0.3% worse.
    And decompression in fpaqb doesn’t use memory (lookup tables).
    In theory ABS should be faster - it uses one state variable instead of 2 (low and high range) - and fpaqa is only a few days old.
    Another thing is that in the version from the link above (for a fixed probability distribution) we block a few bits together - in one table lookup we decode a whole symbol, as in Huffman - it’s a few times faster!

  9. Jarek Says:

    I’m sorry about that 0.3% figure - I had only looked at Matt’s tests…
    Of course, it depends on the data and implementation - an optimal arithmetic coder shouldn’t use lookup tables, and it would be much slower.

  10. Andrew Polar Says:

    After reading the article and looking at Matt’s code I can say a few things:

    1. It is not necessary to calculate LOW and HIGH in arithmetic encoding; most encoders do that, but in my encoder only LOW is calculated, so in this respect the asymmetric binary coder has no advantage.
    2. It is a big limitation that decoding must be performed backwards.
    3. The known programmatic implementation does not show a significant advantage in terms of time or compression ratio.
    4. Besides AE and range encoding there are other methods of entropy encoding that provide no significant advantage in general but may be preferred for some particular data - Golomb-Rice coding, for example. The suggested asymmetric binary coding may be considered another alternative of that kind.
    5. Practically all technical articles about entropy encoding have one big problem: they tell you one story, while the software implementation is completely different. The article and first implementation of ABC maintain that bad tradition, established 30 years ago. Mr. Jarek Duda’s article does not provide a proper explanation of the algorithm, and Matt Mahoney’s implementation is so far from the article that nobody would figure out they are even remotely talking about the same thing without the references to each other. As can be seen from a Google search, there was a long technical discussion between Matt and Jarek before the programmatic implementation. Isn’t it possible to explain things more clearly? I think the author of the method should be interested in that, and not wait for others to explain things properly.

  11. Jarek Says:

    Patience please :)
    This method is just rapidly evolving right now!
    At this moment I can tell you why it’s better :)
    The general version (from the link above) works as fast as Huffman: it converts a few bits in one table check - it’s a few times faster!
    The limitation that the probability distribution is fixed can be weakened by using a few of them:
    for example, we can use, for each symbol (or a few previous ones), a separate probability distribution for the following one - in this way we DOUBLE
    (or more) the USED CORRELATION LENGTH (at the cost of memory).

    It’s as precise as AC, as fast as Huffman!
    And we can optionally encrypt it well at the same time!
    The implementation is only a matter of time now.
    In Matt’s implementation of the previous version it is the encoding that runs backward, which slows it down by about 15%.

  12. Andrew Polar Says:

    What about computational stability and large alphabets? My encoder can handle data with alphabets from 2 up to 20,000 without any adjustment and produces compressed results matching the entropy limit for all of them. Many encoders have issues with large alphabets (FastAC, for example); some need adjustments where you have to change code (Witten’s coder), others have limitations that make processing large alphabets impossible. I tested Matt’s code on a nice data sample: alphabet size 256, normal distribution. What if I passed an alphabet of 10,000 with a long sequence of zeros somewhere in the middle of the data?
    Running as fast as Huffman is not a big deal; most range encoders are near that. Encryption may interest only a few users. What I see so far is new but not revolutionary, and what is completely wrong in the preamble at the top is the statement that it is much simpler than AE. Absolutely not - it is far more complicated. Give your article to a random person with the corresponding background and ask questions afterwards; you will see right away that people do not grasp the concept easily. And not only the concept: the implementation shows the same complexity. Most coders are only a few lines, same as the suggested method. The length matters when people make modifications for particular data types, which is what I have done many times with different coders.

  13. Sachin Garg Says:

    I meant simple for hardware to execute, not for humans to understand :-)

    And yes, it’s not there yet, but it “promises to be”.

  14. Matt Mahoney Says:

    I implemented asymmetric coders in fpaqa, fpaqb, and lpaq1a, all available at
    For explanations of the coding system, see the comments in fpaqa and fpaqb. fpaqb is newer and simpler. All implementations are open source (GPL) C++.

    I initially thought that ABS would be faster because encoding is a table lookup with 10 bit state (x), 7 bit probability (q, using nonuniform quantization from 12 bits) and one bit to be coded (d), with new x as output. Decoding is also a table lookup from x,q to x,d. However my implementation in fpaqa turned out to be twice as slow as bitwise arithmetic coding. It might be faster on old hardware because no multiplication or division is required. However, a second table is needed to map q from 12 bits to 7 nonuniformly with smaller spacing near 0 and 1.

    In fpaqb I improved decompression speed and compression ratio and reduced memory usage by using 12 bits for x and q, and using a table only to save a division during compression. The algorithm is also simpler. Decompression is faster, but still 40% slower than arithmetic coding.

    fpaqa and fpaqb are order 0 models only intended to illustrate the coders. You can compile with -DARITH to select arithmetic coding and compare performance. fpaqa compresses about 0.3% larger. fpaqb compresses very close to arithmetic coding.

    As a practical example, I wrote lpaq1a using the model from lpaq1 and coder from fpaqb. This is a high end compressor (comparable to ppmonstr). lpaq1a is about 1% slower than lpaq1, compresses 0.02% larger, and uses an extra 1.3 MB memory for compression. ABS requires that compression and decompression be done in reverse order, so the compressor divides the input into 64KB blocks, models in the forward direction, codes in reverse to a second buffer, which is then written to the compressed file again in reverse order. Decompression does everything in the forward direction (although either the compressor or decompressor could reverse the data). The larger size is due to a 3 byte header that has to be added to each block to save x. You could improve compression approaching the same as arithmetic coding at the expense of more memory.

  15. Jarek Says:

    fpaqa is slower than arithmetic coding because for each digit it makes TWO TABLE CHECKS:
    ” r=qr[predictor.p()]; x=dec[r][x] ”
    Matt used 2 tables to apply the stretch (stretch(q) = ln(q/(1 - q))) before decreasing its precision - thanks to this we have a better approximation for extreme q (near 0 or 1), but a worse one for q near 1/2 - finally, Matt writes that it makes the compression rate about 0.02-0.1% better.
    If we don’t use it and just reduce the precision, we can do everything in ONE TABLE CHECK, for example
    r=dec[predictor shr K][x]
    where K is the number of bits to cut (2-4, or even zero, but that would need a lot of memory).

    Another slowdown is the while loop for very small q. We could store the number of bits to get in some bits of dec/cod and decode/encode them in separate code paths.
    But the best solution would be to construct the predictor to work around 1/2 - if q>=1/4 we need at most 2 bits to get/put (one if q>=1/2).

    For example, if there is a Huffman-based compressor, instead of the standard encoding (approximating every binary choice with 1/2), we can do it exactly (getting better compression rates) using the created Huffman tree - just calculate the binary choice probabilities exactly (they are near 1/2, maybe larger than 1/4 (1/2?)) and use ABS.

    I don’t believe it would be slower than arithmetic coding.

  16. Andrew Polar Says:

    I wish to thank Mr. Mahoney for the first implementation of ABS; however, I have to add a few critical remarks.
    For the moment ABS is patent-free. In the event that a new fast version is developed, corporations will start filing patents. They cannot file on the published ABS, because it is prior art, but they can add some unpublished modification and file a patent on the new version. This all played out before our eyes just a few years ago with AE. When filing patents, corporations will use obscure terminology and fuzzy definitions, so when the patent is granted it will be written in such twisted language that any ABS implementation will risk infringing it, and scared researchers will try to avoid the risk of corporate lawsuits. The advantage of being patent-free will disappear as soon as ABS enters into serious competition with AE, and Mr. Duda will not be capable of preventing that, nor will Mr. Garg or Mr. Mahoney.
    For the moment ABS is slower than many AE implementations. When making a comparison, it is best to compare it to FastAC, which uses files in the same manner as Matt’s implementation. I would advise anyone to build fpaqa and FastAC with the same compiler options and pass the same data file to both; you will see the difference right away, and it is not 40% - it is larger.

  17. Sachin Garg Says:

    Yep, that is a probable scenario. Not that there is anything wrong in it, but if someone else puts in the effort, there is the possibility of them going the patent route if they want to.

    If someone really, really cares about the patent issues, they should make their own fast implementation and make it public :-)

  18. Andrew Polar Says:

    As I already mentioned, the explanation of ABS continues the bad tradition established by the authors of arithmetic encoders, where you need to read a few lines of text for hours before the algorithm becomes clear. I could not understand Mr. Duda’s explanation even after reading it multiple times, and figured out how the method works only from Mr. Mahoney’s code. I wrote an elementary demo myself, and now I see that an ABS encoder can beat AE significantly when written correctly. As I found in Mahoney’s implementation, the operations are conducted for every bit. This is not merely suboptimal, it is simply wrong: it may kill the compression if symbols have a near-even number of one and zero bits. But that can and has to be corrected.

    Let us consider the message AAABBBBBCCDDD. If we make a binary array with a one bit wherever symbol A occurs, we have 1110000000000. If we then drop the A’s and make a binary array for the remaining symbols with a one bit wherever B occurs, we have 1111100000. Continuing with the remaining part, we have 11000. We can restore the original from the binaries 11000, 1111100000, 1110000000000 if we also know the alphabet as the sequence C,D,B,A. Our new binaries total 28 bits; the original is 13 * 8 = 104 bits. The difference is large. Even if we express A,B,C,D in 2 bits each, it is 26 bits per message, which is close.

    I already published on my site a proof that the total compressed size of the considered partial binary expressions exactly matches the Shannon formula, but then removed the proof because it is elementary and probably known. Another important advantage of BINARY PARTITION (my terminology) is that we get long runs of 0 bits, and in the case of block processing that may lead to better compression - some sort of adaptive approach.
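    The partitioning walk-through above can be written out as a tiny script (a sketch; the function names are made up for the demo):

```python
def partition(msg, order):
    # peel symbols off one at a time as bitmasks over the remaining message
    layers, rest = [], msg
    for sym in order:
        layers.append(''.join('1' if ch == sym else '0' for ch in rest))
        rest = ''.join(ch for ch in rest if ch != sym)
    return layers, rest                  # rest holds only the last symbol

def rebuild(layers, order, rest):
    # undo the partition from the innermost layer outwards
    for sym, layer in zip(reversed(order), reversed(layers)):
        it = iter(rest)
        rest = ''.join(sym if bit == '1' else next(it) for bit in layer)
    return rest

layers, rest = partition("AAABBBBBCCDDD", "ABC")
assert layers == ["1110000000000", "1111100000", "11000"]
assert sum(map(len, layers)) == 28       # the 28 bits from the example
assert rebuild(layers, "ABC", rest) == "AAABBBBBCCDDD"
```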

  19. Andrew Polar Says:

    In my partitioning example, if we start from the most frequent symbol, we get a total binary size of 26 bits for the binaries

    11000, 11100000, 0001111100000 and the alphabet saved as the sequence C,D,A,B

    - exactly the 26 bits required for the whole message if symbols are expressed via 2 bits each. I don’t think that is a coincidence.

    I think people still have questions, and not many of them understand the concept, because I don’t see much activity. To help, I made an elementary demo for those who still wonder how the algorithm works. It is not a compression tool, it is a demo of the formula; people can find it on

  20. Jarek Says:

    I believe that ABS is simple enough to implement (a multiplication/division and a transfer of bits), so before anyone would be allowed to patent it (one year, as you’ve said?), optimal implementations will be generally available.
    I know - I should do it myself, but I’m only a lone mathematician who has programmed only in Mathematica for the last five years…

    About your representation - there are plenty of equivalent ones.
    The trick of good compression is to find the proper structure and representation of the CORRELATIONS in a given model.
    For example, I think that in image compression we should get better compression rates using a hexagonal structure - we could predict the probability distribution using the values of the 3 closest neighbors:

  21. Andrew Polar Says:

    Is that correct?

    Asymmetric Binary System is a reversible bitwise conversion of one natural number into another, wherein the latter has a uniform distribution of bits and a length according to the entropy of the original integer computed over its bits. Direct conversion is achieved by logical and arithmetical operations on one variable, called the state, conducted in two possible ways depending on the bit occurring in the original data. Reverse conversion is achieved by backward bitwise extraction of the original data from the state variable, updating the state variable in two possible ways depending on the extracted bit. In the reverse conversion the state variable takes the same values as in the direct conversion, but in reverse order.
    If we denote the state variable in the i-th state as S(i), and P{b=0}, P{b=1} are the probabilities of occurrence of a zero and a one bit in the input stream, then the direct conversion is performed as

    S(i+1) =

    ceil [(S(i) + 1)/P{b=0}] – 1 if b = 0
    floor [S(i)/P{b=1}] if b = 1

    where initial state can be assigned as S(0) = 0. The reverse conversion is performed as

    b = ceil[(S(i) + 1)P{b=1}] – ceil[S(i)P{b=1}],
    where b is extracted bit

    and new state S(i+1) =

    floor [S(i)P{b=0}] if extracted bit b=0
    ceil [S(i)P{b=1}] if extracted bit b=1

  22. Andrew Polar Says:

    I changed the demo to operate on integers.

    It is interesting that the initial state can be assigned arbitrarily to any integer, and that integer can be used as an end indicator in decoding.

  23. Andrew Polar Says:

    ABS appeared to be much simpler than it looked originally. One important property of the transformation is that the state variable can be subjected to any arithmetical operation in the direct transformation and, if the reverse operation is applied in the backward transformation, the result is correct. The backup demo can be found at
    That means that when the state variable grows large enough to approach buffer overflow, the upper bits can be output and the state variable shifted left in order to continue. This operation must be synchronized and reversed (shift right, add bits back from the compressed buffer) in the decompression process. It is, of course, doable, but I don’t think it is even worth bothering with. Since the size of the state variable reaches the size defined by the Shannon entropy pretty quickly, the whole state variable can be saved to the output buffer as soon as it reaches a certain length, let’s say 48 bits. So I simply suggest continuing operations while the size of the state variable is less than or equal to 48 bits, then outputting the whole state variable and starting from 0 again. Decoding shall continue until the state variable reaches 0 (some more research is required to define stable termination of the decoding process). It can be adaptive: because processing is done in blocks, it may start from probability 0.5 and update it after every block. A lookup table may not be the best solution for a 48-bit variable, but a hardware implementation may completely solve the problem. As you can see in the demo, there are only two critical functions, process_bit and extract_bit, that could have hardware implementations. In the case of the Binary Partitioning suggested above, the whole process can be conducted multi-threaded, which, on a multi-processor system, may run significantly faster than regular AE.

  24. Jarek Says:

    You have to be careful with using the initial state as an end indicator - it can occur many times.
    About dividing into blocks - in your way we lose much capacity. To do it right, we should use a constant bit-length state - see the paper or Matt’s implementation.
    In fact ABS is kind of similar to AE - but instead of dividing the range into two subranges, we spread them uniformly.
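    The "spread uniformly" picture can be checked numerically: with the bit-extraction formula from comment 21, the states that decode to a one bit occur with density P{b=1} along the whole state line. A small sketch (the probability is an arbitrary example):

```python
from fractions import Fraction
from math import ceil

p1 = Fraction(3, 10)      # example P{b=1}
N = 10000

# bit decoded from state x, per the reverse conversion in comment 21
ones = sum(ceil((x + 1) * p1) - ceil(x * p1) for x in range(N))

# the sum telescopes to ceil(N * p1): exactly a 3/10 share of the states
assert ones == 3000
```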

  25. Andrew Polar Says:

    I suggested one of the possible ways to manage the growing size of the state variable. A demo can be found at

    In this example, when the state variable grows over 32 bits, we output the least significant 32 bits and then shift them out

    state >>= 32;

    and continue with the new state variable. In decoding we need to find the right moment to do the opposite operation.


  26. Andrew Polar Says:

    I have discovered an instability in the above approach, because small values very often stay the same after processing a bit. I will try the opposite method later - maintaining a constant bit-length state and outputting the least significant bits.

  27. Matt Mahoney Says:

    That should work. That is the approach I used in fpaqa/b/c. In fpaqa the state, x, is always in the range 2^n to 2^(n+1) - 1, where n is the number of bits of precision in the bit probability, q (in the range 1 to 2^n - 1). For compression, you first check if encoding will make the state too large, and if so you output the least significant bit until x will be in the right range. This is stable as long as q has at most n bits. For decoding (when x gets smaller), if x is too small, you read into the least significant bit in the opposite order.

    In fpaqb/c, the coder reads and writes 8 bits at a time. x is in the range 2^n to 2^(n+8) - 1. You can have n

  28. Andrew Polar Says:

    Maintaining computational stability appears to be a challenge in this algorithm. It may be fast enough to compete with AE once the synchronization between saving bits in encoding and pulling them back in decoding is solved. In my latest version I get a failure on every other data sample. The good news is that the other samples pass over the long range.

    I did not follow Matt’s recommendations, on purpose, but my version is close to his approach and it still has an instability.
    Any ideas?

  29. Jarek Says:

    Matt has already optimized the version with the proposed formula (multiplication/division).
    Maybe try to use lookup tables - created to be reverses of each other.
    This version is slightly less accurate (like usual implementations of AC), but should be faster than AC - for one bit we need only one table check and a bit transfer.
    There is a huge number of proper coding/decoding functions (x \to (x_d, d)); the essential requirements are only:
    - that x_d is about x*p_d
    - that it is bijective
    (for example take a=phi)
    Unfortunately, I think that a quick-to-calculate proper coding/decoding formula can be found only in the binary case, but we don’t care about that when we use tables, which can be initialized quickly.
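    The "lookup tables created to be reverses of each other" can be sketched by tabulating the reverse-conversion step from comment 21 over one constant-bit-length state range and inverting it. The toy sizes below are assumptions for the demo, far smaller than anything practical:

```python
P, c1 = 6, 19        # quantized probability: p1 = 19/64
R = 8                # tabulate the states x in [2**R, 2**(R+1))

dec = {}             # decode table: x -> (bit, smaller state)
enc = {}             # encode table: (smaller state, bit) -> x
for x in range(1 << R, 1 << (R + 1)):
    xc = (x * c1 + (1 << P) - 1) >> P                    # ceil(x * p1)
    b = (((x + 1) * c1 + (1 << P) - 1) >> P) - xc
    nx = xc if b else x - xc
    dec[x] = (b, nx)
    enc[(nx, b)] = x

# the decode step is injective, so the tables are exact reverses of each other
assert len(enc) == len(dec)
assert all(enc[(nx, b)] == x for x, (b, nx) in dec.items())
```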

  30. Andrew Polar Says:

    There is numerical instability for small state values: there are some values where the state doesn’t change on a particular step, which means decoding goes into an infinite loop. For example,

    long long res = 8;
    for (int i=1; i

  31. Andrew Polar Says:

    I finally implemented an ABS encoder that works not very far from the capacity of AE. The running example can be seen at http://www.ezcodesample.com/abs/abs7.cpp My concept is straightforward and simple. It encodes data in 48-bit fragments. Each 48-bit fragment of output contains a variable-length sequence of data. All fragments can be processed independently in decoding. That means the biggest disadvantage of ABS (backward decoding) is resolved. If implemented in multi-processor hardware it can be very fast. The compressed size, however, is about 3 percent above the theoretical limit, but that can be compensated by adaptive coding. We can predict the probability of a one bit for every fragment based on previous fragments and use it for a non-stationary binary stream. I recommend my program to those unfamiliar with the subject who are willing to understand the concept. The encoding and decoding need about 20 lines each, and the whole program, including comments, entropy estimation, and data sample generation, is about 200 lines.

  32. Matt Mahoney Says:

    You can avoid the 3% size penalty by outputting the 8 low bits of the state when it grows too large, instead of outputting the entire state (48 bits) and resetting back to 1. This was done in fpaqc and lpaq1a. The problem with this approach is the compressor has to buffer the predictions and the output in order to reverse it. Of course, abs7 does this with a 48 bit buffer. My programs just use larger buffers.

  33. Andrew Polar Says:

    Thank you for reviewing my code. I am still experimenting and thinking of a different solution, or even a set of possible solutions. I sped up the functions by replacing multiplication and division with shifts where possible, and the program now runs twice as fast. Without lookups there is one unavoidable division in the direct transform and one unavoidable multiplication in the reverse transform. I added a short explanation of ABS to my site, and the sample can now be found through a link there, so people can visit the home page and follow the link.

  34. Andrew Polar Says:

    OK, with the binary encoder things are more or less clear. Is there a way to use ABS with larger alphabets directly, without converting everything to a binary alphabet?

  35. Jarek Says:

    Yes! … :)
    We only need that x_i is about x*p_i.
    This time, I think there is no fast-to-calculate formula, but we can generate the lookup tables - encoding x into symbol ‘i’ when something with probability p_i happens - there is a huge number of possibilities - we can use a random generator for it.
    If we use a key to initialize the generator, we encrypt the data along the way.
    This time we need better precision, so we can use only a few probability distributions - for example: a general one, and one for the successor of each of a few of the most probable elements.
    Ok - I’ll place it here for the third time… :)

  36. Jarek Says:

    I thought about error correction, because one bit-flip makes us lose everything after it.
    To handle it, we have to add some redundancy - according to the expected density of errors.
    For example, we can set one bit in every byte to 0 in the data to encode (after prediction).
    Now if, while decoding, we get a 1 there, we know that there was an error in a bit at most a few bytes before (with probability 1/2 in the last, 1/4 in the previous…).
    We should flip bit by bit and try to decode - if the density of errors isn’t too large, we should be able to localize and correct the error with large probability.

    Eventually, if we lose a larger fragment, we have to guess the proper initial state - if we’ve put in some redundancy, it’s rather doable, state by state, in not too large a time.

  37. Matt Mahoney Says:

    I think it is best if data compression, encryption, and error correction are separate functions, independent of one another. First compress, then encrypt, then add error correction.

  38. Andrew Polar Says:

    Encryption in the US is subject to export control. That means it cannot even be taught or explained without a license to a foreign national living in the US. Those techniques that are patented, such as RSA, are excused and can be discussed freely, but only the algorithms; software is still subject to licensing. I consider these regulations totally stupid, unjustified, and a violation of my constitutional rights, but I follow them and do not publish certain materials on my site, such as my solution for an SSL web server. I post this message for those who don’t know.
    Encryption in combination with AE or ABS is not a problem; everybody understands that changing one bit will affect the entire following message. There is an infinity of opportunities here - why pay such attention to this? I don’t see it as a serious technical problem.

  39. Jarek Says:

    Luckily I’m not from the US :)
    The general concept, for example with some fixed probability distribution, is just a coding algorithm.
    Encryption is just a natural modification.
    In AE a new symbol is put into the most significant position of the state, and the output is taken from the same side too - so there are correlations between the symbol and the output.
    In ABS the symbol is also put into the most significant position, but the output is taken from the least significant side - its behavior looks completely random and, unlike in AE, it practically doesn’t depend on the symbol.
    The symbol only determines which of the random jumps (generated using the random generator) we use - in practice we jump in a random way between a very large number of states.
    A worse random number generator would only (I think) allow a brute-force attack without initializing the whole table.

  40. Andrew Polar Says:

    Well, one way of encrypting is a change in the final state, so a user without the proper key will start from the wrong state with the correct frequencies and fail. But that does not look like what you mention.
    It looks like you mean changing the tables, i.e. the probabilities - that may also work, but I don't see a big advantage here.
    The other way you mentioned is possibly changing the probabilities during encoding according to a certain rule. That may not be a big advantage either, because you could equally introduce a rule that changes the state after every step.
    So, I still don't see the technical problem. All of it is doable and will work.
    The strength of encryption is not in the rules, because they will become known, but in the password. And again there is no difference from AE: the password can be applied as an XOR over the output bits and can be as long as you want, so a brute-force attack will take forever.

  41. Jarek Says:

    Matt - sorry - I hadn't seen your post…
    I agree that doing compression, encryption, and error correction separately is conceptually simpler.
    But look at today's heavily object-oriented programming - computers are faster… but software often runs slower… and programs are measured in GBs…
    We cannot always just say that something will be done in another layer and forget about it…
    Error correction always needs redundancy, and encryption can reveal something of the structure of the compressed file… the more layers we are fully aware of, the better they should combine with each other…

    ABS (ANS) is just a general concept of operating on data parts of non-integer size.
    The more functional layers it supports, the faster and safer software based on it will be.

  42. Jarek Says:

    I see - a quick response… :)
    Andrew - this encryption's main power is in the coding/decoding tables (generated using the key) - they define the rule for jumping between states. The state is a hidden variable - the slightest change to the tables and we are in a completely different place… and the attacker sees only the least significant bits of the states - they are useless, because he cannot track his current location…

  43. Matt Mahoney Says:

    Actually the U.S. has loosened export controls on encryption. For example, you can freely download AES source code from many sources.
    Export controls had nothing to do with whether the method was patented. Also, the patent on RSA has expired.

    When I say that compression, encryption, and error correction should be separate functions, that doesn’t mean that you can’t write applications that combine them. Many compression programs also encrypt and add checksums. What I mean is that the algorithms should be independent. Encryption is very hard to get right. Combining it with compression just makes it harder.

    Before you start designing your own encryption, first you need to tell me what is wrong with AES, SHA-256, RSA, etc. The only way that we know a system is secure is if a lot of people try to attack it and fail. Standard methods have been studied a lot more than anything you could design yourself.

    Just for example, you could “encrypt” by setting the ABS state to a secret key, then I could crack it by brute force search. I know you could “fix” it by adding more complications, and I could break it again by using more clever tricks, and back and forth, but who is going to be confident that you fixed the last security hole?

  44. Jarek Says:

    I personally believe that in the simplest version (with a random byte added at the beginning of the file) ANS is already safe - meaning the only possible attack is brute force.
    And it has a nice protection against this kind of attack:

    With other encryptors we can just take the next key and try to decrypt - by analyzing the beginning of the output we can decide whether we've found the correct key.
    With ANS, before even starting the decryption process we have to initialize the whole decoding table.
    By choosing the number of bits we work on (R), we can make the time needed to check each key as long as we want.
    So even a short key should be safe with ANS.

    If it turns out that we can decode without initializing the whole table, we can just take a better (less predictable) random number generator.

    RSA needs a gigantic key, is extremely slow, and in fact we don't know that nobody has a polynomial-time factorization algorithm (3-4 years ago a polynomial-time primality test was officially shown).
    But RSA has one large advantage - an asymmetric key - so I will compare ANS with symmetric-key encryptors only.

    Both AES and SHA just take a constant-length block and mix it until it looks safe.
    Another advantage of ANS is that it uses variable block lengths - we cannot even decide how to split the file into blocks!

    And the SPEED…
    After a constant initialization time, ANS encrypts/decrypts many times faster than AES or SHA… and compresses along the way :)

    But of course - many people should study it before it will be widely believed safe and used.
    Because of that, and because I think we've strained the hospitality of this blog a bit (thanks :) ), I propose moving further discussion to a cryptography forum (I'm citing you there - if you mind, I will remove it).

  45. Matt Mahoney Says:

    Yes, take it to a crypto forum. Also read the sci.crypt mini-FAQ, especially about designing your own system.

  46. Andrew Polar Says:

    This is for Matt, regarding the disadvantages of other methods.
    RSA is slow.
    You forgot to mention Blowfish. I have used it. It is fast and known to be secure; it has passed its test period like the other methods you mentioned.
    The reliability of your encryption must be tested by people who are paid to break ciphers. I'm not one of them, although I'm familiar with some methods.
    There are rules about patenting and export control; Matt, you are not a lawyer. I tried to obtain a legal consultation about my SSL Web server. Unfortunately, the lawyer wanted $2000 to tell me whether publishing it is allowed. I said that is too much for a person with 4 years of education, and I will refrain from publishing until I find legal advice at a cheaper price.

  47. Jarek Says:

    I've looked through the FAQ and haven't seen any reason why ANS should not be safe.
    Maybe (?) in the simplest version it could be vulnerable to adaptive chosen-plaintext attacks…
    To ultimately protect against this, we can for example XOR every byte with a byte produced by our generator before feeding it to ANS (there are many possible options here).

    DES, AES, SSL show the tendency that every few years we just take a larger block and make more XORs and permutations…
    Maybe we could change the way of thinking - observe that we have processes much less predictable than permutations, for example random number generators, and use the mathematical theory of chaos…

    And now even the simplest cryptosystem looks safe - just XOR the data with the output of a good random number generator initialized with our key.
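    The "simplest cryptosystem" described here can be sketched in a few lines; random.Random stands in for the generator purely for illustration (it is NOT cryptographically secure, which is exactly the kind of weakness discussed in this thread):

```python
import random

def xor_stream(data: bytes, key: int) -> bytes:
    """XOR data with a keystream from a generator seeded by the key (toy sketch)."""
    rng = random.Random(key)  # NOT a cryptographic PRNG -- illustration only
    return bytes(b ^ rng.randrange(256) for b in data)

msg = b"asymmetric binary system"
ct = xor_stream(msg, key=12345)
assert xor_stream(ct, key=12345) == msg  # the same keystream decrypts
```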

  48. Andrew Polar Says:

    If you did not spend years in a crypto laboratory trying to break ciphers, and did not succeed at that task, you are not a specialist. Do you know that during the Second World War all the ciphers created by the best mathematicians with special encryption experience were broken? Specialists specifically warn people not to use any unprofessional ciphers. Especially vulnerable to various kinds of attacks is the XOR operation and everything related to it. An average cryptanalyst needs between 15 minutes and 3 days to break an average amateur cipher.

  49. Jarek Says:

    XORing with a sequence from a random number generator was only a simple example, vulnerable to any plaintext attack.
    But it could easily be strengthened, for example by also feeding previous output back into the generator.

    I'm only suggesting that instead of extending old methods, maybe we should try some new concepts.
    ECC is becoming more and more popular, but it's not the end of possible new concepts…
    For example, if we use some family of chaotic, measure-preserving bijections parametrized by the key, I think it should be provable that a large iterate of it cannot be recovered except by brute force - it should be a perfect random number generator.

  50. Andrew Polar Says:

    The reliability of a cipher becomes known only when a large number of commercial giants start using it and breaking it becomes attractive to hackers. For now you may suggest something and successfully use it for years. New techniques are always good; nobody is asking you to stop. My point is that encryption is specific and very different from compression. In compression you see the speed and the compression limit right away. In encryption you need to attract analysts for the test, which is not easy. Contact Bruce Schneier and see what happens. Google his name if you don't know who he is.

  51. Jarek Says:

    ANS can be imagined as a tube-shaped mixer - we fill it to the top (to R bits) from the upper side, turn it on (use the table) - it mixes its contents (the number of the state) up and down and spits out something (a symbol) from the bottom side.
    The random generator creates small, unpredictable disturbances in the mixing process; they grow as the data flow down and mix with other data… finally, at the end, everything has been changed in a completely random way.

    It's quite a universal way of thinking…
    The binary system is the base of computer science… something as simple as its generalization (this, or complex-base systems) should have many uses too…

    I've already mailed Bruce Schneier… and many others…
    I see that many people watch the cryptography forum and nobody responds… I'd guess that means it's not so easy to break, but we will see…

  52. Andrew Polar Says:

    There is one more step that can be taken in the direction of entropy encoding, which looks more advanced than all known binary encoders and final for them. The bit length of a compressed expression computed from the entropy is larger than the bit length computed from the number of possible permutations, i.e. the logarithm of n!/a!/b!, where 'n' is the size and 'a' and 'b' are the numbers of one and zero bits. Using lookup tables does not make sense for ABS, because instead of the computed table values Jarek suggests, we can use (computed once in a lifetime) table values that are positions in a sorted list of all permutations for a given size and probability of a one bit. In this case we convert fragment by fragment, back and forth.
    How about this, Jarek?

  53. Jarek Says:

    I have a speedup for the bit transfer!
    For example, it should make fpaqc a few (or more) percent faster.

    To ensure everything is reversible, we use a buffer (state) with a constant number of digits (R+1 bits).
    But nowhere is it written that these digits have to be binary… we can use a higher base - for example 4 (2 bits) or larger.
    For base 4 (2 bits), we will use:
    Range: [2^{R-1}, 2^{R+1}-1]
    Decompressing: while(x

  54. Jarek Says:

    (something cut my post…? I see… :)
    Decompressing: while(x < 2^{R-1}) x = 4x + 2 bits from input
    Thanks to this we have to make half as many transfers, at the cost of table size (half again larger) and a bit of precision.
    We won't feel these costs in the precise version (fpaqc) - it will be faster (compression especially) practically for free.
    If the cost of the multiplication doesn't grow, maybe we could increase it even to 8 bits ([2^R, 2^{R+8}-1]) - decreasing the time for bit transfer practically to zero.

    ps. Andrew - changing the representation by itself gives nothing for compression… we need probabilities, correlations…
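    The two-bit refill rule above can be sketched directly (the function name and the bit-list input are illustrative, not from fpaqc):

```python
def refill(x, R, bits, k=2):
    """Grow state x by reading k bits at a time from the front of `bits`,
    until x enters the decoding range (for k=2: [2^(R-1), 2^(R+1)-1])."""
    lo = 1 << (R - 1)
    while x < lo:
        chunk = 0
        for _ in range(k):
            chunk = (chunk << 1) | bits.pop(0)
        x = (x << k) | chunk
    return x

# Each iteration consumes k bits, so k=2 needs half as many refill steps
# as the bit-by-bit loop, at the cost of a somewhat larger table.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
assert refill(1, 8, bits) == 434  # (((1*4+2)*4+3)*4+0)*4+2
assert bits == []                 # all eight input bits were consumed
```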

  55. Sachin Garg Says:

    I am sorry for that ‘cut’, must be something with the blogging software I am using.

  56. Andrew Polar Says:

    I'm speaking about the final solution, where all permutations are sorted and numbered. For example, if the message is 4 bits and 2 of them are ones, we have 0011, 0101, 0110, 1001, 1010, 1100,
    and we number them as 0, 1, 10, 11, 100, 101.
    When we output 11, that means the original is 1001. The same applies to a large array. This way of compression is better than entropy times size, so Shannon was wrong - but not very wrong, because it is close to entropy times size. The index of a particular permutation can be found without generating and sorting all the permutations.
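    Andrew's 4-bit example can be checked directly: listing the C(4,2) = 6 patterns in increasing numeric order, index 3 (binary 11) is indeed 1001. A small sketch:

```python
def patterns(n, k):
    """All n-bit values with exactly k one bits, in increasing numeric order."""
    return [x for x in range(1 << n) if bin(x).count("1") == k]

codebook = patterns(4, 2)     # 0011, 0101, 0110, 1001, 1010, 1100
assert len(codebook) == 6
assert codebook[3] == 0b1001  # output '11' (index 3) decodes to 1001
```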

  57. Andrew Polar Says:

    In my previous example, according to Shannon's formula the bit length of any 4-bit message with 2 one bits must be 4 bits, but actually it is about 2 bits - a 2-bit difference. Unfortunately, the difference is not that big when the message is large; for 1000 bits it is only a fraction of a percent. But if you apply this to small fragments via lookups, you can make compression significantly better. There is, however, one problem: the number of one bits. It must either be the same for every fragment or be output, which kills the compression. Still, it is something to think about - not yet solved.

  58. Jarek Says:

    Andrew - if I understand correctly - you are proposing some bijection between binary representations and permutation indices…
    It won't change the number of possibilities!
    Encoding is choosing the number which denotes a given element of our space - to do that we need, for example, lg(size of the space) bits…
    The only way to decrease this number is to use the probability distribution of the elements!
    (for example, the fact that some sequences are forbidden - have (almost) zero probability)

  59. Andrew Polar Says:

    I thought I had expressed my idea clearly. I am speaking about a totally different way of entropy encoding. For the moment I don't know how to implement it; if I knew, I would do it, but I can see that it is possible. I can repeat it again. If you are given the size of a message and the number of one bits, you can calculate the number of possible permutations. You can also make the list of possible permutations, sort it, and replace your particular message by its position in the list. The entropy is not the theoretical limit; Shannon made a mathematical mistake and I'm surprised people don't see it. The theoretical limit is the bit length of the number of possible permutations, which is slightly less (a fraction of a percent in realistic cases) than entropy times size. We know that before Huffman coding there were other methods of building binary trees; Shannon was using one of them and noticed that the size of the binary code for a symbol approaches -log2(p). Read his article again and you will see that there is no theory behind this statement. Fortunately for him, this wrong limit turned out to be close to the right one, so he avoided the shame of publishing a mathematical mistake. Read my article about range encoding on my site; it clearly shows everything.

  60. Andrew Polar Says:

    The new, final, and optimal way of statistical data compression is replacing the message by its position in an imaginary sorted list of permutations. Please confirm that you understand what I'm talking about, because when I say that the theoretical limit is below the entropy, people react like Catholic priests to heresy. Surprisingly, Shannon's mistaken statement, made without proof over 50 years ago, is taken as final truth - and supported by you too. Do you agree, yes or no?

  61. Sachin Garg Says:

    Andrew, your method sounds fine but I fail to understand how it contradicts Shannon.

  62. Andrew Polar Says:

    It can compress below the entropy limit statistically. That means that every possible message will be compressed below the entropy limit - however, not significantly below; a fraction of a percent for most of them. I could write an implementation, but it would require a long time for encoding/decoding and would be impractical. I am speaking about a theoretical mistake. For practical compression a fraction of a percent is not important; for a theoretical article it is. Why other people persistently fail to see this mistake surprises me.

  63. Sachin Garg Says:

    Can you explain ‘how’ it will compress below the limit?

  64. Jarek Says:

    The equation that made me really feel what Shannon's entropy (the logarithm of the number of possibilities) is, is (see my paper):
    binomial(n, pn) =~ 2^{n·h(p)}
    where h(p) = -p·lg(p) - (1-p)·lg(1-p).
    So if we already know the number of '1's (the probability of a 1), to encode the index of the combination we need about h(p) bits per input bit.
    You would get practically the same compression rate by finding p(1) and using only that to encode the file, for example using ABS.

  65. Andrew Polar Says:

    Sachin, I thought I had answered this question many times. I'll do it again, because I want to see at least one person in the world who understands it. Suppose you have a file of 100 bits with 50 one bits. According to Shannon this file cannot be compressed: entropy times size is exactly 100 bits. Now, I repeat again: how many different such messages can you make? The correct answer, which any math teacher with a bachelor's degree can provide, is 100!/50!/50!. The bit length of this number is 97 bits, not 100 bits. If we sort all possible messages and assign a sequential number to each of them, starting from 1, 2, 3, … and so on, we can use this sequential number as the encoded message. The average size of an encoded message will be even less than 97 bits, maybe somewhere near 92 bits. Huffman got his Ph.D. for improving the Shannon-Fano algorithm by a fraction of a percent; here there is much more. People persistently deny Shannon's mathematical mistake. That is not an approximation, that is a MISTAKE, again: a MISTAKE. An approximation is when the author states that the formula is an approximation and that the precise answer is different.
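    The 97-bit figure is easy to verify (using Python's math.comb just for the check):

```python
from math import comb, log2

n_messages = comb(100, 50)            # 100!/50!/50!
assert n_messages.bit_length() == 97  # an index into the sorted list fits in 97 bits
assert 96 < log2(n_messages) < 97     # log2(100!/50!/50!) is about 96.35
```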

  66. Andrew Polar Says:

    I don't always understand what you say, but you must completely understand what I'm saying, and I'm surprised that you have not confirmed it yet. Please say clearly: do you agree that Claude Shannon saw statistical compression from the wrong perspective? He simply knew, from the experience of making binary trees, that the bit length of a compressed symbol is about -log2(p), and took it as the limit of statistical compression without even considering the number of permutations. How can a message of 100 bits with 50 one bits be incompressible? Total nonsense: knowing the statistics, we can exclude the giant number of messages where the number of one bits is not 50. 100 bits is necessary to represent all possible messages with any number of one bits, not only 50. Please stop playing at misunderstanding; we are scientists.

  67. Sachin Garg Says:

    Andrew, if it is already known that we have 50 1s in 100 bits, AND we follow Shannon’s formula, we can still get the encoded-size you mention.

    -log2(p) uses probability, and with the additional information that we know (50 1s) we can calculate probability p better and thus encode in less than 100 bits.

    This doesn't make Shannon's formula wrong.

  68. Jarek Says:

    I'm sorry, you are right: for very small files the number of combinations is a bit smaller.
    We need 97 bits to encode 100 bits with 50 '1's, or 995 for 500/1000, or 9994 for 5000/10000, or 99992 for 50000/100000.
    But the problem is that we have to store the number of '1's too - ceiling(log2(50)) = 6 bits.
    Even without rounding up, we get more than 100.
    In fact we need a bit less than that to store the '50', because its probability distribution isn't uniform.
    So if we count it exactly - in sum we get … 100 bits! :)
    A miracle? No!
    As I've said - you are proposing a bijection between representations.
    It doesn't change the number of possibilities - so to choose one of them we still need the same amount of information!

    The equation I mentioned (p ∈ (0,1)):
    log2(n!/(pn)!/((1-p)n)!) =~ n·h(p)
    is the asymptotic behavior (use Stirling's formula).
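    Both the exact bit counts quoted above and the asymptotic log2(n!/(pn)!/((1-p)n)!) =~ n·h(p) can be checked numerically:

```python
from math import comb, log2

def h(p):
    """Binary entropy in bits per symbol."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Bit counts from the comment: 97 for 50/100, 995 for 500/1000, 9994 for 5000/10000.
for n, expected_bits in [(100, 97), (1000, 995), (10000, 9994)]:
    count = comb(n, n // 2)
    assert count.bit_length() == expected_bits
    # log2(count) sits just below n*h(1/2) = n; the gap is about 0.5*log2(pi*n/2).
    assert log2(count) < n * h(0.5)
```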

  69. Andrew Polar Says:

    I'm glad Jarek understands me. I said the mistake is theoretical: Shannon's entropy is close to the realistic limit, otherwise it would have been noticed. And I said that Shannon introduced the limit from experience while pretending it was some sort of theory. However, you all speak about the upper limit. If we number all permutations as 1, 2, 3, … and so on, we can have a particular case where our particular permutation is assigned a number half as long as the maximum, which is log2(100!/50!/50!) - how about that?
    But all this is only an order-0 encoder. What if I add other conditions? Say I introduce a sort of Markov matrix
    30 70
    20 80
    which says that in our 100-bit message with 50 one bits the probability of 0 after 0 is 0.3, the probability of 1 after 0 is 0.7, and so on. We need to output 4 additional numbers, but the number of possible permutations will be reduced dramatically. I don't know how to enumerate all possible permutations in this case. Does any of you?

  70. Jarek Says:

    To choose one element from a set of size 2^n with no probability information (uniform), we need EXACTLY n bits - the first chooses between the first and second half, etc…

    About the probability - if we stored it exactly (like 50/100), then, as I've said before, we would have no compression.
    OK - you can say that the probability is in some fixed range, but again, calculating exactly, we would in the end need the same number of bits (100) again.
    You won't cheat math :)

    The trick of compression is a loose (statistical) approximation of these probabilities.
    In this case, whether we want it or not, we have to allow some variance (e.g. of the number of '1's) - we don't restrict the possibilities to some set as before, but use the natural statistical behavior of the given model - with small variances around a flat maximum of entropy.

    About the Markov chain…
    Given statistical conditions (correlations), it's a bit weird to want finite deterministic sequences fulfilling those conditions exactly.
    On the other hand, if we want such asymptotic behavior or an infinite sequence… just use ABS :)
    Of course we get some small variances in these statistics, but it is THANKS TO THEM that we can get compression.

  71. Andrew Polar Says:

    You and Sachin are showing political correctness. Why don’t you admitt that Shannon exposed scientific sloppiness in his article by taking limit from experience and making it looking like scientifically defined.
    Another part. You must easy see that compression based on permutation could be better than entropy encoder but you turn around and try to deny. What if we take data after ABS and split them into 16 bit fragments. We can make easy look up tables and convert every 16 bit fragment into sequential position order in permutation list. The length of every of such fragment is below entropy. In addition some of such fragments will take beginning indexes and be much shorter. We may have few particular cases where 16 bit fragments turn into 1 or 2 bits (just particular cases). Second and important part that you don’t want to see. We need to output nubmer of ‘1′ for every fragment. If data are after ABS what you expect. You know that distribution of mean values for groups is normal even when taked from uniformly distributed population (starting from group size 30). Same as average weight of passengers on board of aircraft. Number of ‘1’s may go to a different stream that is subject to a further compression. I’ll write this program and show you that the data after AE or ABS can be further compressed, probably not very much, but we speak about principal.

  72. Matt Mahoney Says:

    Andrew, enumerative coding was developed by R. V. Tomic in 2004.
    It is claimed to be faster than arithmetic coding. The drawback is that it only works for independent bits with a fixed probability. It was discussed at length in comp.compression (under the pseudonym nightlight).

    However, it does not compress smaller than the Shannon limit. It is true that there are fewer than 2^n permutations of n/2 zeros and n/2 ones, so you can code one in fewer than n bits, but you also need to transmit the number of zeros and ones, which exactly makes up the difference. There is no mistake in Shannon's math.

    If you don't believe it, go ahead and write your recursive compressor. You can easily compress to n-1 bits, because if you know the number of zeros and ones, then you know what the last bit is and don't have to send it.
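    Enumerative coding of the kind discussed here can be sketched with the standard combinatorial number system (a generic textbook construction, not Tomic's actual implementation):

```python
from math import comb

def rank(bits):
    """Index of `bits` among all strings of equal length and popcount,
    sorted in increasing numeric order."""
    n, k, r = len(bits), sum(bits), 0
    for i, b in enumerate(bits):
        if b:
            r += comb(n - 1 - i, k)  # count the strings with a 0 at position i
            k -= 1
    return r

def unrank(r, n, k):
    """Inverse of rank: the n-bit string with k ones at index r."""
    out = []
    for i in range(n):
        c = comb(n - 1 - i, k)  # strings with a 0 at this position
        if r >= c:
            out.append(1)
            r -= c
            k -= 1
        else:
            out.append(0)
    return out

assert rank([1, 0, 0, 1]) == 3  # Andrew's example: 1001 has index 3
msg = [1, 0, 0, 1, 1, 0, 1, 0]
assert unrank(rank(msg), len(msg), sum(msg)) == msg  # round trip
```

    The index fits in about log2(C(n, k)) bits, but n and k must still be transmitted - which is exactly the point about making up the difference.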

  73. Jarek Says:

    A compressor is a program which makes some bijection between inputs and possible outputs.
    Andrew - if, as you suggest, we enforced that the output always has the same length, then, because we need as many possibilities as the input has, the output would have to be at least as long as the input.
    The trick of compression is to choose some expected statistics (maximizing entropy) which will be converted into a short output, but the cost is that the pessimistic output length has to be much larger than the input's length.
    For example, if we encode a 0/1 sequence whose expected statistical behavior is p(0)=0.1, then with nonzero probability we can get a sequence of one hundred zeros, which will be converted into a much longer sequence.
    The trick is that this occurs extremely rarely.

  74. Andrew Polar Says:

    Thank you for the link. I have already made an estimation program for enumerative coding. The result shows 3% better compression; however, I need to check everything, because it does not make a round trip, only an estimation.
    The discussion was about the theoretical aspect. It was clear that the improvement is tiny, if it exists at all. Shannon did not mention the number of possible permutations because he had no idea how it is related to entropy, and introduced entropy from experience. This is what people persistently don't want to admit, shifting the subject from the formula to the size, like you, Jarek, and Sachin. The size is the same, but the formula is different:
    log2(n!/p!/q!)/n is close to log2(n^n/p^p/q^q)/n
    (p and q are the numbers of one and zero bits, n is the size).
    The latter is the entropy and the former is the correct one. The latter is the upper asymptotic limit for the former, and the difference is a fraction of a percent for sizes over 1000. This is how it should be explained.

  75. Andrew Polar Says:

    Another thing that you did not mention: what happens if we provide a Markov matrix instead of a simple probability of a one bit? How do we calculate the number of possible permutations for a given Markov matrix? Did anyone try that?

  76. Jarek Says:

    The first expression has the same asymptotic behavior (use Stirling's formula: n! =~ sqrt(2πn)·(n/e)^n).
    If you want the average entropy (the logarithm of the number of such sequences) for a given Markov matrix S (see my paper):
    -\sum_i p_i \sum_j S_ij log2(S_ij)
    where p_i is the normalized eigenvector corresponding to the dominant eigenvalue (1) - the probability of the i-th symbol in the sequence.
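    Evaluated for Andrew's example matrix from comment 69, the formula gives about 0.757 bits/symbol (the power iteration for the stationary vector is a shortcut added here, not from the paper):

```python
from math import log2

S = [[0.3, 0.7],
     [0.2, 0.8]]  # row i: transition probabilities from symbol i

# Stationary distribution p (left eigenvector for eigenvalue 1), by power iteration.
p = [0.5, 0.5]
for _ in range(200):
    p = [p[0] * S[0][j] + p[1] * S[1][j] for j in range(2)]

# Entropy rate: -sum_i p_i sum_j S_ij log2(S_ij), in bits per symbol.
H = -sum(p[i] * sum(S[i][j] * log2(S[i][j]) for j in range(2)) for i in range(2))
assert abs(p[0] - 2 / 9) < 1e-12  # exact stationary distribution is (2/9, 7/9)
assert abs(H - 0.757) < 1e-3      # well below the 1 bit/symbol of the order-0 model
```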

    Andrew - you are proposing some general idea of compression which doesn't look into the file structure…
    Note that the 'typical' file - the most probable one if we don't know anything about its structure - is an uncorrelated 0/1 sequence with p(1)=1/2 - we just cannot compress such files!
    If you want to construct a compressor, you just HAVE to assume that the file has some structure… and try to make as good a model of it as possible…
    The simplest assumption is that nearby bits are somehow correlated - this is how we get, for example, the Markov model…

  77. Andrew Polar Says:

    One more reason not to use ABS and to use a simple range encoder instead: I made a simple routine that generates a binary array with a given probability of one bits. Then I calculated the entropy bitwise and bytewise. It turned out that entropy times size is the same number for both. That means any binary data can be passed to a regular range encoder and processed by bytes. As is known, a range encoder makes one multiplication/division per byte, not per bit. And if we use lookup tables as Jarek suggested, it is no longer ABS but lookup - lookup tables can be made in many different ways, including simple manual design from experience, with no theory behind them. Jarek, try to beat my range coder in speed - but let my encoder process the data bytewise, and let me know what happens.

  78. Andrew Polar Says:

    One more thing: if we process only bytes, I can optimize my range coder and build lookup tables as well, replacing the multiplication and division by lookups. I did not do that on purpose, because I wanted to keep it as generic as possible. It processes integers over a wide range of bit lengths, and alphabets from size 2 to 20,000, with no change in the code.

  79. Jarek Says:

    OK, let's compare range coding (RC) and ANS (ABS): both use the transformation
    x → (x_d, d), where x_d is about x·p_d.
    The difference is that in ANS we look for d at the least significant position, in RC at the most significant.

    The advantage of ABS is that, because symbols are placed uniformly, we can use a buffer of constant size and put/get bits at its least significant positions.
    In RC, symbols are placed in ranges, so if we tried to do the same as in ABS, the higher ranges would be visited less frequently.

    So in RC we have to use a buffer of varying size and sometimes just flush the remainder to the output: we lose PRECISION and need more computation time…
    ANS, thanks to its constant buffer size, is much more precise (better compression rate) and flexible.

    I think the only use in which RC could (?) compete with ANS - one for which we don't (yet?) have a general formula - is encoding large symbols when the statistics vary greatly through the whole file.
    But the cost is precision… and RC would use multiplications instead of table lookups, so I think even initializing ANS a few times would usually be faster.

    And because the digits are placed uniformly in ANS, but only in a statistical way, we can make a great cipher by choosing the exact placement.
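    The x → (x_d, d) step can be made concrete with a minimal uABS coder in exact integer arithmetic. The formulas follow Duda's uABS construction; the stack-style interface and the unbounded state (no renormalization) are simplifications for illustration:

```python
def ceil_div(a, b):
    return -(-a // b)

def encode_bit(x, s, a, b):
    """Push bit s (with Pr[s=1] = a/b) onto state x; x grows by a factor ~1/Pr[s]."""
    if s:
        return (x * b) // a
    return ceil_div((x + 1) * b, b - a) - 1

def decode_bit(x, a, b):
    """Pop the most recently pushed bit, returning (bit, previous state)."""
    s = ceil_div((x + 1) * a, b) - ceil_div(x * a, b)  # 1 for ~a/b of the states
    if s:
        return 1, ceil_div(x * a, b)
    return 0, x - ceil_div(x * a, b)

def encode(bits, a, b):
    x = 1                     # fixed initial state marks the end of decoding
    for s in reversed(bits):  # LIFO: encode backwards so decoding reads forwards
        x = encode_bit(x, s, a, b)
    return x

def decode(x, a, b):
    out = []
    while x != 1:
        s, x = decode_bit(x, a, b)
        out.append(s)
    return out

msg = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
assert decode(encode(msg, 3, 10), 3, 10) == msg  # round trip at Pr[1] = 3/10
```

    In a real coder like fpaqc the state is kept in a fixed interval by streaming out its least significant bits; here it simply grows as a big integer.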

  80. Andrew Polar Says:

    When we speak about compression, the only things of interest are speed, the limit, and the limitation on the alphabet. The limit is the same, the speed is higher in RC, and the alphabet is virtually unlimited in RC. This may be grave for ABS. Make it faster, at least for binary data, and publish your program. If you beat my RC in speed, I will make mine significantly faster by replacing the multiplication with a lookup, in the same way you want to use lookups in ABS. The details you mention may be of interest for particular implementations in rare cases.

  81. Jarek Says:

    There is one more thing of interest - precision - how exactly we approximate the expected probability.
    The better the precision, the closer to the theoretical entropy - and the better the compression rate.
    And ABS/ANS is much better at this.

    With RC, using large alphabets creates problems with gigantic multiplications and a large loss of precision.
    ANS can use large alphabets too and, thanks to the constant buffer size, practically doesn't have these problems.
    The disadvantage I've spoken about is that for every probability distribution we should initialize new tables.
    Possibly, because the changes in the probability distribution aren't usually large, we could just update the tables near some boundary - replace some symbols with another symbol.
    About implementing it - I'm a mathematician, currently working on a few completely different projects… and I have no experience with low-level fast coding.
    This should be written by a compression expert, for a specific predictor…

  82. Andrew Polar Says:

    I don't know how to write an ABS that runs faster than RC, and I don't know how to handle large alphabets in ABS other than by binary partitioning, which is very slow for large alphabets; and I have not seen implementations where other people do it differently. I have no reason to protect RC from becoming history; I did not invent RC. ABS is interesting, but for the moment it cannot compete with RC. I thought ABS was faster for binary data, but the fact that binary data can be processed by bytes, or even by groups of 12 bits, and make compressed data of the same size, changes everything in favour of RC.

  83. Jarek Says:

    I’ve told how to speedup Matt’s binary implementations - don’t use stretching in fpaqa (one table check instead of two) or use larger buffer in fpaqc…
    ABS for groups of bits is ANS which is practically implemented on the link I’ve placed her three times…
    Takes practically one table check for group of bits and a few binary operations.
    It’s similar to RC, but uses constant buffer size - it just has to be faster and more precise while processing!
    But requires initialization…

    As always - the time will show…
    Best regards

  84. Andrew Polar Says:

    OK, I compared again fpaqc, as it is for the moment, to RandCode.cpp on the same data sample. Time for fpaqc is 0.26, time for rancode is 0.22. The compressed data size is 486,685 for fpaqc and 474,460 for rancode. Obviously, I altered rancode to read and write data to a file to reproduce the same conditions. The performance advantage of rancode is obvious although insignificant. In addition, fpaqc is limited to binary data while rancode can process large alphabets. rancode can also be optimized for binary data in the same way as fpaqc, by replacing multiplication and division with lookups.
    Good luck in your further research.

  85. Jarek Duda Says:

    A few things…
    1) I've found that there is a problem with ABS - maybe it's the instability Andrew talked about…
    The problem is that each symbol has to appear a very precise number of times, and sometimes the formulas for ABS don't achieve it. It occurs very rarely, especially if the range is huge as in fpaqc.
    I've described the problem on the cryptosystem page, and I've just seen that there is a problem with the ABS formulas.
    I'm rewriting the section of my paper in which I'll show some solutions (sometimes changing one value or the formulas).
    2) I've checked the fpaqc source - it's already transferring 8 bits at once. I've finally also analyzed arithmetic coding - I had underestimated it - it also uses a constant buffer size. ABS should decode as fast as AC and encode a bit slower.
    3) The real advantage of ANS is that it's one-stated: in contrast to AC, it can be cheaply tabled (… and there is large freedom of random initialization for encryption purposes).

    I've also written a demonstration of ANS almost two weeks ago … it should show up in a day or two(?) here:


  86. Jarek Duda Says:

    I’ve analized the problem - fpaqc isn’t affected!
    But fpaqb can be…

    If we work in [2^R,2^{R+l}-1] range (fpaqb l=1, fpaqc l=8),
    in ABS using the formulas the problem occures iff ‘Ceiling[r*2^{R+l}] is odd’.
    In fpaqc the ‘R+l’-th bit is out of precision of q - is automatically 0 - everything is ok.
    But in tabled versions (fpaqc) we uses smaller precision.
    Thanks of tabling -the problem can be easily corrected by changeing slightly initialization:
    - round q so that Ceiling[r*2^{R+l}] is even, or
    - change one value in the initialization tables - the last number should create opposite bit than it currently, or
    - when the ceiling is odd, use alternative formulas: with floors
    x=floor(x’/(1-q)) for d=0
    or = ceiling((x’+1)/q)-1 for d=1

  87. Jarek Duda Says:

    But while initializing fpaqb we use all possible probabilities just below this precision - the 'R+1'-st bit of q is automatically 0 - everything is ok!
    No corrections are needed!

    I finally wanted to write this section solidly and I've finally simulated the formulas … and I panicked … but everything is fine :)

  88. Sachin Garg Says:

    Great, congrats :-)

  89. Andrew Polar Says:

    The biggest disadvantage of ABS is its binary-only capability, compared to the large alphabets of a range encoder - especially my version, which can process alphabets of 10000 symbols easily. As I already mentioned, you can switch from a small alphabet to a large one without a change in compression ratio. I generated a binary array with a certain probability of a zero bit, say 0.8, but processed the array byte by byte and received the same compression ratio as I would for bits. This is what you use when you process it by bytes. The opposite is not true: you cannot easily convert a large alphabet into a smaller one without paying a price. For example, take a message of 100 symbols with alphabet A, B, C and frequencies 60, 30 and 10. The entropy is 1.3 bits per symbol, so the compressed array is 130 bits. If we replace A by 00, B by 01 and C by 11, we have 150 zeros and 50 ones. The entropy is 0.81 bits per bit, but the length is 200 bits, so the length of the compressed array is 162 bits. Other ways of changing alphabet size are complicated and may work only for small alphabets, not when the alphabet is 1000 symbols or more. So you may approach RC capabilities in the future for the case of binary data, which is a rare case in real life.
    By the way, most patents on AC have expired. I showed that the method I use in my RC was mentioned in a reference in the fundamental 1984 article of Mr. Langdon, who was the AC guru at that time, and the article was published in a fundamental journal. A world-known guru in data compression acknowledged the method I use, in 1984, as previously published. I cannot imagine a better prior-art reference.
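    [Editor's note] Andrew's three-symbol example above is easy to check numerically. A quick sketch, assuming ideal entropy coding and base-2 logarithms (the function and variable names are mine, for illustration only):

```python
import math

def entropy_bits(freqs):
    """Shannon entropy in bits per symbol for a list of symbol frequencies."""
    total = sum(freqs)
    return -sum(f / total * math.log2(f / total) for f in freqs if f)

# 100 symbols over {A, B, C} with frequencies 60, 30, 10:
bits_abc = 100 * entropy_bits([60, 30, 10])     # ~129.5, i.e. ~130 bits

# Recode A->00, B->01, C->11: zeros = 60*2 + 30 = 150, ones = 30 + 10*2 = 50.
# Now 200 bits with p(0) = 0.75, and an ideal binary coder still needs more room:
bits_binary = 200 * entropy_bits([150, 50])     # ~162.3 bits
```

    So the naive binary recoding costs about 32 extra bits on this message, exactly as claimed.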

  90. Jarek Says:

    But ANS doesn’t have this disadventage…
    Ok - there are no quick to calculate formulas, but it’s one-stated - unlike RC we can table it.
    So if the probability distribution doesn’t change too frequent, RC needs 2 multiplications while ANS needs for the same symbol one check of table…

    While correcting the section in paper (pages 10-16), I’ve realized how terrible it was … sorry … I’ve rewritten it completely. I hopes it it’s much better and I would be gratful for any comments……0710.3861v3.pdf

  91. Jarek Says:

    sorry for the link above - please replace it with

  92. Jarek Says:

    Yeah … there were a few small things to correct…
    I believe it's fine now (?) - same link.

  93. Andrew Polar Says:

    When you write an article, it is your responsibility to explain everything clearly. You introduced only simple algebraic formulas but explained them in such a way that it is not possible to understand what is what. Try to explain your concept of ABS to a middle school student and ask him to write the article for you - he will do it better and people will understand what you are trying to say.
    What is 'q' above formula (20)? Is it the probability of 0 or of 1? Does it matter if q > 0.5, or should it always be below or equal to 0.5? Formula (23) is not decoding, it is encoding, because division by a probability makes the number grow. When we do encoding or decoding we use a single state variable that takes a new value on every step, so why don't we use S[i] and S[i+1] in the notation? Formula (22) defines which bit is next in decoding - why not denote it as b? Formulas (20) and (21) should show the decoding process, so why do you use the notations x1 and x0 when we are speaking about the same state variable that is denoted in (23) as D? Formulas (20) and (21) will not work if the initial state is 0 - they continue to convert it to 0 on every next step.
    One more thing: topology has nothing to do with data compression. Your article is a set of topological statements unrelated to each other, written in wrong English, which are true when understood but don't represent anything new or non-trivial. Software engineers see topology as a science similar to the one practised in the middle ages when people argued how many angels can balance on the head of a pin - that is, interesting for selected people and totally unrelated to real life.

  94. Jarek Says:

    Thanks for the remarks.
    1) What does q mean? Two lines above (20) is: denote q=q_1, and it doesn't matter where it is in (0,1).
    2) The paper is about encoding data into symbols - the opposite of compression - so coding/decoding are switched, as is written in the introduction to section 3.
    3) About notation… I don't remember where it is from :) It's just a representation of the data in ANS… and I wanted to be consistent with previous versions… D is from decoding…
    4) '0' represents choosing one of zero elements … we cannot encode anything here… while making a shift right in the binary system, we also finish in 0 after some time…
    5) While splitting a Markov chain into topologically separated subchains, there are no correlations between elements in these subchains. In this paper I generalize it to higher dimensions - to find methods to handle higher-dimensional correlations. I'm sorry, but as a physicist I see statistical physics in data compression - there is plenty of topology here… in image compression you can divide an image into regions topologically separated by e.g. the background…
    6) I'm sorry for my English … it was my first paper and I couldn't find anyone here to read it. I would be very grateful for any comments

  95. Andrew Polar Says:

    Thanks for the reply.
    I probably used some strong expressions in my previous mail, I apologise, but I still stand by the substance of my previous message. Your article is hard to understand and it could easily be rewritten in an understandable way. About topology: it is possible to split an image into topological regions, but it will not become smaller after that. Those gurus of topology who wrote books never contributed to image compression, and conversely, the engineers who contributed to image compression were not known to write articles about topology. Your algorithm is a newly discovered property of fractions that is shown in your article totally independently from the other ideas, which are expressed in topological terms. That raises the question of why it is all sitting in one article, because it takes time to understand it and to see that it could not be used in image compression, at least not immediately. I'll return to technical questions in the next mail.

  96. Andrew Polar Says:

    I compared the article to the first one mentioned at the top and did not find a big difference regarding the compression technique.
    Do I understand things correctly?
    I start from decoding:
    Step 1. We identify the bit
    b = ceil[(s+1)*p1] - ceil[s*p1], where p1 is the probability of a positive bit and s is an integer called the state. b is an integer that takes the value 0 or 1. When we know the bit we can apply the formulas for decoding:
    s(i) = floor[s(i+1) * p0] if the bit is 0
    s(i) = ceil [s(i+1) * p1] if the bit is 1
    where s(i) is the state and (i) is the iteration number.
    Encoding is done by the formulas:
    s(i+1) = ceil {[s(i) + 1]/p0} - 1 if bit = 0
    s(i+1) = floor{s(i)/p1} if bit = 1.
    Is that correct? And is there something new in your last article?
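    [Editor's note] The formulas quoted above can be checked mechanically. Here is a minimal sketch under my own reading of them, using exact rational arithmetic to avoid floating-point edge cases; this is a toy with an unbounded integer state and no renormalization, not fpaqc's actual implementation:

```python
from fractions import Fraction
from math import ceil, floor

def encode_bit(s, b, p1):
    """Grow state s by encoding bit b; p1 is the (exact) probability of a 1."""
    if b == 1:
        return floor(s / p1)
    return ceil((s + 1) / (1 - p1)) - 1

def decode_bit(s, p1):
    """Recover (bit, previous state) from state s."""
    b = ceil((s + 1) * p1) - ceil(s * p1)
    if b == 1:
        return 1, ceil(s * p1)
    return 0, floor(s * (1 - p1))

p1 = Fraction(3, 10)              # P(1) = 0.3; keep p1 <= 1/2 in this toy
bits = [0, 1, 1, 0, 0, 0, 1, 0]   # message to encode
s = 1                             # initial state; encoding only grows it
for b in bits:
    s = encode_bit(s, b, p1)

out = []
while s > 1:                      # decoding retraces the states back to 1
    b, s = decode_bit(s, p1)
    out.append(b)
out.reverse()                     # decoding yields bits in reverse order
assert out == bits
```

    The roundtrip succeeds, so at least on this reading the encode/decode formulas are mutual inverses.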

  97. Jarek Duda Says:

    The article's main goal is written in its title :)
    Along the way I needed good coding; Huffman was imprecise, I didn't know arithmetic coding, so ABS came up…
    I agree that topology isn't generally too helpful in data compression, but it can be useful for handling models with some local constraints - it's a very special, idealistic case, but can be used for example to increase a hard disk's capacity.

    The problem I've talked about concerns the stream versions - we have to make I_s b-absorbing. The other new things are solid mathematical explanations and the figure showing that this encoding looks just … natural - a sequence of bits on one side, a sequence of symbols on the other, and the state between them that contains some data; using it we change one sequence into the other.

    There are plenty of possible formulas - it's important that the x_s sum to x and that x_s is about x*p_s.
    My demonstration should be on Wolfram's page … their editor accepted it a few weeks ago (???) - it should be obvious then … I can mail it now if somebody wants.

  98. Jarek Says:

    The demonstration is finally available:
    It shows that even with tiny tables and the simplest statistical initialization it works nicely.
    I've checked correlation tables - the probability distribution of sequences is asymptotically consistent with the central limit theorem for a uniform probability distribution - it can be used as a random number generator.
    I wanted to make it educational - simply written.
    I made remarks on how to implement it practically on the cryptosystem site.

  99. Andrew Polar Says:

    Thanks for the demo. Your understanding of 'simply written' is different from mine. I did not find it 'simply written' but 'better written' than the original article. You wrote the original article as if you purposely tried to reduce the number of people who might understand it to the possible minimum. This demo is better and I understood more things, but not all.
    A remark about the Wolfram implementation: it looks like they also like to release things in the most complicated way possible. Your demo includes a few GUI controls and real-time updated tables with numbers. All that can be simply written (in this case literally) in JavaScript, be browser and operating system independent, sit in one HTML file, and would not require this 85 Meg MathematicaPlayer.exe that many people would refuse to download. In that case your demo could also be saved, passed as an e-mail attachment and executed locally. But that is only a remark, because if people started developing software as simply as possible, 90% of programmers would go unemployed, and we don't want this to happen in America.

  100. Andrew Polar Says:

    Sorry for persistently adding critical remarks, but I want to do it again. In the description of the above-mentioned DEMO I again found a lot of meaningless, unimportant details while the critical part is missing. The decoding can easily be derived from the exposed numbers. For example, we start from 010100 because it is the default length and addresses the decoding table. We find there the number 010100 (which is 20) and can see the decoding result 1, the new expression 1010, and the number of bits to read. The missing part is how the decoding table is generated and how the encoding procedure works. I'd appreciate it if anybody could make this clear.

  101. Andrew Polar Says:

    I think in Jarek's DEMO and Matt's implementation the encoding/decoding is generalized to the level of manipulating lookup tables, moving away from the ABS concept. Their approach is the following: we have a processing buffer, changeable on every step and not necessarily of the same size. We conventionally call these buffer values NEXT and PREV. We have a processed SYMBOL on every step and a few bits that are read from the decoded stream or passed to the encoded stream, depending on the operation. We need to know only the length of this bit expression; we call it BITLEN. Now look at the single table row

    10011001 3 011001(11) 2

    If it is encoding and our processing buffer is 01100111 and the encoded symbol is 3, we output the two last bits 11 to the encoded buffer and turn the processing buffer into 10011001. Then we do the same operation for the next SYMBOL, and if everything is correct we should find a match in the table for the new SYMBOL and new PREV value. In decoding we have the NEXT value known. We should identify a unique row, extract symbol 3, get the PREV value as 011001 and read 2 bits from the outstanding encoded buffer. We can easily see that decoding works if for every PREV value and the bits read from the stream we can find a NEXT value that has a unique match with the new PREV value.
    Here we have an interesting mathematical problem that has nothing to do with topology and very little with number theory: how to create an optimal table for a given distribution of symbols (the alphabet is not necessarily binary). ABS covers only one way of making such tables. It does not prevent new ideas, which could be patented if different from ABS.
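    [Editor's note] The table mechanics described above can be made concrete with a toy. The following is a minimal tANS-style coder of my own devising, not the demo's actual tables: a hypothetical 4-state table over a binary alphabet {a, b}, with the transferred bits kept on a LIFO stack (a real coder would encode the message backwards so the decoder can stream forwards):

```python
# Toy tabled ANS: states live in [L, 2L); per-symbol counts sum to L.
L = 4
counts = {'a': 3, 'b': 1}         # i.e. P(a) = 3/4, P(b) = 1/4
spread = ['a', 'a', 'a', 'b']     # symbol assigned to states L .. 2L-1

# The k-th occurrence of symbol s in the spread (k = 0, 1, ...) pairs
# state L+i with the "intermediate" state counts[s] + k.
decode_tbl = {}                   # state -> (symbol, intermediate state)
encode_tbl = {}                   # (symbol, intermediate state) -> state
seen = {s: 0 for s in counts}
for i, s in enumerate(spread):
    inter = counts[s] + seen[s]
    decode_tbl[L + i] = (s, inter)
    encode_tbl[(s, inter)] = L + i
    seen[s] += 1

def encode(symbols, x=L):
    stack = []
    for s in symbols:
        while x >= 2 * counts[s]:     # shift out low bits until x fits
            stack.append(x & 1)
            x >>= 1
        x = encode_tbl[(s, x)]
    return x, stack                   # final state + transferred bits

def decode(x, stack, n):
    out = []
    for _ in range(n):
        s, x = decode_tbl[x]
        while x < L:                  # pull bits back until x renormalizes
            x = 2 * x + stack.pop()
        out.append(s)
    out.reverse()                     # symbols come out last-first
    return out

x, stream = encode(['a', 'b', 'a', 'a', 'b'])
assert decode(x, stream, 5) == ['a', 'b', 'a', 'a', 'b']
```

    Frequent symbols usually transfer no bits at all (the state just moves inside the table), while rare ones transfer several - exactly the NEXT/PREV/BITLEN behaviour in the row above.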

  102. Andrew Polar Says:

    In my previous text

    NEXT = 10011001
    SYMBOL = 3
    PREV = 011001(11)
    and BITLEN = 2

    The table did not pass through clearly.

  103. Andrew Polar Says:

    I implemented a second, more advanced version of the ABS coder, which can be found at
    It manages the growing size of the state variable differently. It is easy to understand and customize. Now things with the binary ABS coder are more or less clear. I'm still wondering if there is a way to handle the multi-alphabet case without binary partitioning of the array?

  104. Andrew Polar Says:

    I see that interest in entropy encoding is fading. I found one application for ABS. It can treat asymmetry in the bit distribution after Huffman encoding. I provided an example at
    In spite of all attempts to write a fast range or arithmetic encoder, a Huffman encoder is at least twice as fast as the fastest range encoder. It may, however, produce asymmetrically distributed bits, which can be corrected by an ABS coder, bringing the result after Huffman down to the length of an entropy encoder.

  105. Andrew Polar Says:

    I added one more section to my description of the ABS algorithm that shows a related concept of an entropy encoder implemented without the formulas introduced by Jarek:
    It can be generalized to a larger-than-binary alphabet; however, it is still limited to small alphabets due to computer resources. I believe that my explanation is generic, while Jarek's concept with formulas (no matter which ones) is a particular case.

  106. Andrew Polar Says:

    Regarding the usage of different table values in the ABS coder for encryption: now it is clear that it is totally the wrong way. It would take 15 minutes to break this code. You can look at my explanation of multi-alphabet ABS. The tables may be fully expressed as a sequence of bits. The tail of that sequence is the key to decoding the message starting from the end. A cryptanalyst can mount a brute force attack on the tail only, continue it as soon as something making sense appears, and then continue with the other part, so the message can be deciphered in parts.
    There is another point I wanted to mention. Now Jarek's concept is generalized to the case of multiple alphabets. The encoder that I described is purely logical - it uses only logical operations for both encoding and decoding - and it is so simple that it can be explained in 15 minutes to someone who intuitively understands the probability concept.

  107. Andrew Polar Says:

    I achieved computational stability for small alphabets in my version of the ANS coder, which I call the Logical Entropy Encoder, or Enlogen. It is faster than the best arithmetic encoders. I compared it to my best version of an arithmetic encoder, RanCode, which has the same speed as FastAC. RanCode made a round trip (encode/decode) with 8,000,000 small integers in 1.1 sec. and Enlogen needed only 0.8 sec. The alphabet in both cases was 25 symbols. The advantage is smaller than anticipated, but it is noticeable without a doubt.
    In the next upgrade I want to make Enlogen able to process large alphabets effectively.

  108. Jarek Duda Says:

    It was 'obvious' that the probability distribution of the states we are visiting is uniform…
    I thought about it lately … it's not true! Smaller states are more probable!
    We have a uniform distribution not for x but for lg x, as should be seen from the paper…
    So the probability that we are in state x is proportional to 1/x.

    I'm sorry for this mistake. Fortunately it shouldn't change anything.
    It looks like knowing that it's a bit more probable that we are in a smaller state should reduce a bit the safety of the cryptosystem, but in fact guessing the state isn't a practical way of attack - without knowing the coding/decoding function we cannot verify the state, and we would lose it in one step…

    I've also thought about using ABS/ANS for data correction - it has some nice properties I haven't met in other methods - resistance to fluctuations of the error density.
    Data correction is done by adding a usually constant density of redundancy. But the error density isn't constant - it can fluctuate - sometimes it is above average and can exceed the threshold we can correct - we lose the whole block; sometimes it is below, and we've placed there more redundancy than was really needed.
    In standard methods we lose these surpluses; using ANS we can transfer them to help with the pessimistic cases.

  109. Jarek Says:

    We can use the entropy coding property of ANS to make its data correction property quicker while distributing redundancy really uniformly:
    to create an easily recognizable pattern, instead of inserting a symbol regularly, we can just add a new symbol to the alphabet - a forbidden one.
    If it occurs, we know that something went wrong - the nearer, the more probable.
    The larger the probability we give to it, the larger the file will be and the easier it will be to correct.
    I've described it with examples at the link from the previous post.

  110. Jarek Says:

    I’m sorry for the link, I’ve just noticed it has changed, here is the correct one:

  111. Jarek Says:

    I’ve just realized that we can use huge freedom of choice for the functions for ANS to improve latency of data correction - we can make that if the forbidden symbol occurs, we are sure that if there was only single error, it was among bits used to decode this symbol.
    The trick is that the forbidden symbol usually dominate in the coding tables, so we can make that if for given transferred bits we would get allowed symbol, for each sequence differ on one bit (Hamming distance 1) we would get the forbidden symbol.
    I’m describing it on the forum I’ve linked.

  112. Jarek Says:

    I’ve just realized that Hamming and tripling bits are some special cases of ANS based data correction :)
    They are kind of degenerated cases because the intermediate state after bit transfer, has only one possibility, making the blocks independent.

    If it could have many values, we would transfer some redundancy - the ‘blocks’ would be somehow connected and if in one of them would occur more errors, we could use this connection to see that something is wrong and use some unused redundancy from succeeding blocks to correct it - we use the assumption that according to error distribution, the succeeding bits are with large probability correct.
    It correspond better to the real probability distribution of errors than Hamming, in which all 8 possibilities - 7bits were correct or one of bits were changed - are treated equally … and we are forgetting about the possibility of more errors.

    I apology for putting last posts on this blog (more information is on the linked forum), but I’ve realized that data compression and correction are highly connected.
    In the first we want to remove some statistical, difficult to practical use redundancy …
    in the second we want to distribute uniformly some deterministic, easy to use redundancy …
    And in both cases, the closer to the real statistics we are - of symbols or errors, the closer to the optimal coding we are - the less bit we need …

  113. Jarek Says:

    To visualize this connection, let's think about the theoretical limit on the bits of redundancy we have to add per bit of information, for an assumed statistical error distribution, to be able to fully correct the file.
    To find this threshold, let's consider a simpler-looking question: how much information is stored in such an uncertain bit?
    Take the simplest error distribution model - for each bit the probability that it's switched is equal to e (near zero), so if we see '1' we know that with probability 1-e it's really '1', and with probability e it's 0.
    Knowing which of these cases we have is worth h(e) = -e lg(e) - (1-e) lg(1-e) bits, and with it we would have a whole bit.
    So such an uncertain bit is worth 1-h(e) bits.
    So to transfer n real bits, we have to use at least n/(1-h(e)) of these uncertain bits - the theoretical limit to be able to read the message is (asymptotically)
    h(e)/(1-h(e)) additional bits of redundancy per bit of information.

    So a perfect data correction coder for an e=1/100 error probability would need only an additional 0.088 bits/bit to be able to restore the message.
    Hamming 4+3, instead, uses an additional 0.75 bits/bit and still loses 16 bits per kilobyte with the same error distribution
    … we see that there is a lot to do…
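    [Editor's note] The figures above are easy to reproduce; this limit is the same as Shannon's binary-symmetric-channel capacity 1-h(e), just expressed as redundancy per information bit:

```python
import math

def h(e):
    """Binary entropy h(e) in bits."""
    return -e * math.log2(e) - (1 - e) * math.log2(1 - e)

e = 0.01                             # bit-flip probability
redundancy = h(e) / (1 - h(e))       # minimal redundancy, bits per info bit
# h(0.01) ~ 0.0808, so redundancy ~ 0.088 bits/bit,
# versus 3/4 = 0.75 bits/bit spent by Hamming(7,4).
```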

  114. Jarek Says:

    I was thinking lately about some approaches to computation which could, for example, endanger the cryptosystems in use.
    It's difficult to say if they have a chance to work, but I thought about protecting against such eventualities. They would require a relatively short time for, e.g., verifying the validity of a key, so a protected cryptosystem should enforce a long initialization time, like cryptosystems based on ANS.

    So I thought about its safety again…
    The fact that the distribution among possible states isn't uniform but decreasing (approximately 1/x) means that '0' is a bit more likely to be produced than '1'.
    So a good statistical analysis of a huge amount of data could probably say something … (?)
    To protect against something like that, we can invert (NOT) the bits transferred while coding e.g. even symbols - one symbol encoded normally, the next with inversion… and so on.
    Of course the probability distribution of symbols shouldn't be uniform, because Fourier analysis could see something.
    Generally, if we need only a really good encryption, the best would be using two steps - encode into symbols (with randomly chosen probabilities) and then back into bits (with inversion).

  115. Andrew Polar Says:

    Jarek, I have a question.
    Encryption is not the topic of this forum; however, I see it is possible to use an ANS coder for encryption with a double run. People can look at my implementation of the ANS coder to get an idea. The probability table is the actual cipher: changing the probabilities changes the output. However, a single run does not provide protection from a brute force attack, because the probability table can be presumed and backward decoding can expose the tail of the message, so the attacker does not need to execute a brute force attack on the whole message and can therefore try many combinations in a short time. The situation changes when we perform a double run. That means we encrypt the message once and pass the encrypted data to another cipher. Since decoding is backwards, the attacker has to decode the complete message with one combination and get a successful first run, then try another cipher, which makes a brute force attack virtually impossible for a large message. Now the question: is that what you meant by using an ANS coder for encryption, or do you have a different algorithm in mind?

  116. Jarek Says:

    If a cryptosystem uses short blocks and has a relatively small number of internal states, it looks vulnerable to 'exposing the tail'. This kind of attack requires adaptive scenarios (wiki says that AES needs 7-9 such rounds to be broken). There is a simple protection - choose the initial state randomly, or add a few random bytes at the beginning/end of the file.
    (Brute force is something different - generating keys randomly.)

    An important advantage of ANS security is that different symbols have different probabilities and thus different block lengths. Because of this, even if there were some small correlations, they would be impossible to use.
    But if we want only to encrypt, we have a rather uniform distribution on the input. Using a PRNG initialized with the key, we can generate some probability distribution of symbols for the intermediate step. Now one step of coding is to use the decoding table to generate a symbol and instantly use the coding table with this symbol to generate the output bits - we have 2 separate states (the square of the number of possibilities).

    Encoding twice - forward and backward - is something different, but it also gives great protection and is advised if one would like to use ANS to construct hashing functions. We could also use symbols with a random probability distribution for the intermediate step.

  117. Andrew Polar Says:

    One year has passed since the first publications and running samples of computer programs for the ABS coder. If you have not filed a patent application in the USA yet, you cannot do it any more, and the concept of the ABS coder cannot be patented in the USA. I decided to write an archiver for DNA sequences and want to use a slightly modified ABS concept. I want to make it free and public; please confirm that you did not file a patent on ABS (the binary concept) during this year.
    Thanks, Andrew Polar.

  118. David A. Scott Says:

    I have a feeling that the patents in compression are so vague that even if no work is done in this area, some group with money will claim somewhere down the line that they have a patent that covers it, if it makes them more money.
    I have a question concerning ABS as a pass before encryption. Have any of you made it bijective? That is, if you use ABS and then encryption, will every test key lead to a file that could have been compressed with ABS?
    If you don't, then no matter how many passes of ABS you use, the attacker can tell immediately after one pass that the key, if wrong, will most likely fail.

    PS: The random data fix is not that easy - random is hard.

  119. Jarek Says:

    This whole intellectual property thing has become paranoid nowadays. Simple and fundamental ideas like ABS or maximal entropy random walk should have been found a few dozen years ago … I never wanted to patent them and I want to believe that nobody will manage to do it.

    If you want to make ABS+encryption, I believe it's better just to use ANS. Yes, I know - there is still much to do: choose the proper form and parameters and, most importantly, prove its safety. Generally I would gladly cooperate if someone wants to work on these ideas.


  120. Andrew Polar Says:

    Answer to David:
    I don’t do encryption but the ANS coder can be bijective. Look at the very bottom of my article
    section 11.3. The table represent key. In order to be bijective columns A,B,C must contain all sequential numbers from 2 to 31 without repetition. Each column may contain different number of digits but they have to be sorted in ascending order. That is condition of bijectivity. The number of combination become extremely large when you extend maximum table value to 65535 and number of columns to 256. Attacker needs presume whole table and decode one way, then presume the other table and start decoding other way. Well, brute force attach impossible, but I did not investigate other attacks. In spite of all evidence of the advantage of this method over the others it is ignored by cryptoanalysts. I saw huge number of sites and articles in internet published by so-called cryptoanalysts or individuals claiming the that encryption is part of their scientific interests. They all play deaf. I don’t have time now because I’m working on DNA archiver but, if this method will continue be ignored, I’ll make some demo with incentive. Something like encrypt message, publish cipher, give $10,000 in cash to my attorney with instruction to pay anyone who present correct deciphered text first. May be that will work with 3rd world. For American cryptoanalysts it is too small to move the finger over the keyboard.

  121. Carlos Eduardo Olivieri Says:

    Hi, all.

    People, help me please. Does anyone know if there exists any software able to compress RAR files? Even if not all, at least some files?

    Thank you.


  122. Carlos Eduardo Olivieri Says:

    Sorry for the mistakes in English. :(

  123. Andrew Polar Says:

    The short answer is no. After any compression the internal structure of the data is destroyed and the result has no correlation in the data, so it can be compressed by maybe a few percent. If the RAR compression ratio does not look impressive, you can decompress the files and apply another compression tool. The best known tools may give you a 30% improvement compared to RAR, but it may take a lot of time, such as 30 seconds for a 1 Meg file.

  124. Carlos Eduardo Olivieri Says:

    Thank you very much, Andrew.

  125. David A. Scott Says:

    Answer to Andrew

    If you could make your ABS bijective in the space of files composed only of ASCII ones and zeroes, mapping ones and zeroes to ones and zeroes, I could handle the modifications to make it fully bijective in the space of all byte files to byte files. I think it would be very easy, especially if you made it fully bijective in the space of ones and zeroes in the first place. But even if you only made it something like Huffman at the end, meaning the last thing written has a form like standard Huffman, it would be easy to modify. It would take a little more time, but even if the last state had a high and low like normal arithmetic coding, it would not be terribly hard to do.


  126. Andrew Polar Says:

    David, I don't understand your last message. It sounds like you do not understand how ABS and ANS work. They don't have a high and a low, and are completely different from arithmetic coding and Huffman. I can answer your questions only after you are able to reproduce the concept from section 11.3. And I can say again that I'm not doing research in encryption at the moment, I just see this opportunity to use ANS for encryption, no doubt.

  127. David A. Scott Says:

    Andrew I looked at the example in section 11. It appears each time a symbol occurs you always change state and sometimes output either 11, 0, or 00; this could not easily be made bijective, since on decompression you would never be able to decompress …010…

    However, assume you allow any string to appear: at each point you either change state and output nothing, or change state and output bits, in such a way that every string is possible in a long set of bits. The only thing necessary is to create free ends similar to what Matt Timmermans or I have done. You don't need the high and low numbers, that was only a comment. To get an idea of how the free-ends concept works check out Matt's site at

    at each time you process a symbol you create a temporary end symbol. I do it in the space of all strings and then use my bit I/O routines to handle it; you can see them used in

    There are similar concepts; once you get the feel of one, the others are not that hard.
    Take Care
    David Scott

  128. Jarek Says:

    Please look at my paper or the Wolfram demonstration - ANS is constructed to be bijective. We put/take bits until we are in a given range, which is chosen so that this is done unambiguously - each time we know how many bits we have to put/take.
    There also isn't a problem with the ending - the final state of coding is stored in the header of the file to be the initial state of decoding. Now we decode while we can - after some step we just reach the end of the file. Finally we can decode the information stored in the state.

    ps. If the correlations after inverting bits created using even symbols were still too large, we can expand this idea - for example use/generate some sequence and XOR it cyclically with the bits created by successive symbols.

    For data correction we need the number of internal states of the coder to be much larger - we can create a table of short bit sequences of the same length, and after each coding step switch the youngest bits of the state with a sequence from this table and cyclically increase the position in it. Now decoding: decrease the position, switch the bits, decode. In this way we can cheaply make the number of states as large as required.

  129. David A. Scott Says:

    I have a feeling your concept of bijective is entirely different from mine. When I compress to bytes, as in a simple arithmetic coder, I compress to the space of all byte files. Let x be any possible byte file. Then if c() is compression from a byte file to a byte file, and d() is the uncompression from a byte file to a byte file, then x = d( c(x) ) for every possible x, and x = c( d(x) ) for every possible x. That would be bijective compression. One could also use every bit string instead of bytes. One could stretch this if one space were, say, the space of all characters and the opposite the space of all bytes. But it's not clear to me how you can call yours bijective, especially with a header; it can be done, but it's usually much harder to do.
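    David's criterion can be tested by brute force on short strings. The harness below is only an illustration (check_bijective and the toy reversal transform are hypothetical stand-ins, not David's arb255): it exhaustively verifies x = d(c(x)) and x = c(d(x)) over all bit strings up to a given length.

```python
from itertools import product

def bit_strings(max_len):
    """All bit strings of length 0..max_len, the empty string included."""
    for n in range(max_len + 1):
        for bits in product('01', repeat=n):
            yield ''.join(bits)

def check_bijective(c, d, max_len=10):
    """David's criterion: x == d(c(x)) and x == c(d(x)) for every x."""
    return all(d(c(x)) == x and c(d(x)) == x for x in bit_strings(max_len))

# Toy stand-in for a real compressor: string reversal is a bijection
# on the set of all bit strings (and happens to be its own inverse).
rev = lambda s: s[::-1]
print(check_bijective(rev, rev))  # True
```

    A transform that pads, e.g. c(x) = x + '0', fails immediately: d(c(x)) equals x, but c(d(x)) misses every string not ending in 0, which is the "gap" David keeps pointing at.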

  130. Andrew Polar Says:

    It is clear what you mean by bijective. In my example I reproduced Jarek's concept from Wolfram, although it was not easy to understand the way he explains it. It is bijective.
    In encoding we have a single operating buffer we call STATE. On every step STATE grows and we output bits in a special tricky way that Jarek did not explain. But in decoding everything is simple. In every decoding step STATE loses bits (gets smaller) and we read new bits to maintain the same size. In my example this size is 5. At the first step we read 5 bits and process them. As a result of processing we extract a symbol and STATE becomes smaller (say 3 bits). We read another 2 bits, make it 5 bits long again, process it again, and so on. No matter what these 5 bits are, they can always be processed, but they give the correct result only for specific bits. In order to maintain bijectivity you have to build the table in such a way that every 5-bit expression can be processed. For that to happen every 5-bit number must be in the table; if you look at the table you can see all 5-bit numbers there. I used 5 bits for the table shown; in a real encoder this size may be 16 bits, which means all 16-bit numbers must be in the encoding/decoding table.

  131. Andrew Polar Says:

    In Jarek's Wolfram example the encoding part, which is the most complicated, was not explained. The encoding is tricky. We start from any 5-bit STATE, say 11111. As you can see, we cannot process this number 11111 because the table ends and we would move out of the table. In this case we output as many bits as necessary for STATE to be processable. We output them, make STATE smaller, and process it. And we follow this simple rule: every time we cannot process because it moves us out of the table, we output bits until we can process. After that simple encoding, the decoding works the way Jarek mentioned. We read bits after every step until we get the predefined size.
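    The grow/shrink mechanism described in the last two comments can be sketched as a toy table-based ANS coder. Everything here is my own illustrative reconstruction, not Andrew's or Jarek's actual code: it uses a naive sequential symbol spread, a 16-entry table instead of the 5-bit/32-entry one in the example, and transmits the final state and message length separately (the thread's EOF discussion is about avoiding exactly that).

```python
L = 16                              # table size; states live in [L, 2L)
counts = {'A': 7, 'B': 6, 'C': 3}   # symbol frequencies, must sum to L

# Spread the symbols over the states L..2L-1 (a naive sequential
# spread; a more uniform spread gives compression closer to entropy).
spread = [s for s, c in counts.items() for _ in range(c)]
symbol = {L + i: s for i, s in enumerate(spread)}

# Build the decode step (state -> symbol, reduced state) and its inverse.
nxt = dict(counts)                  # next reduced value, starts at L_s
newX, enc = {}, {s: {} for s in counts}
for x in range(L, 2 * L):
    s = symbol[x]
    newX[x] = nxt[s]                # reduced state falls in [L_s, 2*L_s)
    enc[s][nxt[s]] = x
    nxt[s] += 1

def encode(msg, x0=L):
    """Symbols are processed in reverse; the state is shrunk by pushing
    out low bits until a table step becomes possible."""
    x, bits = x0, []
    for s in reversed(msg):
        while x >= 2 * counts[s]:   # cannot step yet: output a bit
            bits.append(x & 1)
            x >>= 1
        x = enc[s][x]               # grow the state by one symbol
    return x, bits                  # final state is sent to the decoder

def decode(x, bits, n):
    """Each step makes the state lose bits; bits are read back (stack
    order) until the state is again in [L, 2L)."""
    bits, out = list(bits), []
    for _ in range(n):
        out.append(symbol[x])
        x = newX[x]
        while x < L:
            x = 2 * x + bits.pop()
    return ''.join(out), x          # x ends back at the initial state
```

    For example, encode('AB') gives final state 21 with bits [0, 1], and decode(21, [0, 1], 2) returns ('AB', 16): decoding is the exact inverse, and the state returning to its starting value is the termination condition Andrew exploits to avoid headers.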

  132. Jarek Says:

    I apologize, I hadn't understood.
    To make ANS/ABS bijective, there is no problem with making the steps.
    A special header is also not necessary - when choosing b=2, states are of the form 1*******, so to decode a bit sequence the first bits should be used as these stars. To encode a pure symbol sequence, we can start with state=0 and, until we get into the range, make encoding steps without bit transfers.
    But… there is a problem with the symbol which makes 0->0. Another problem is at the end of the file while decoding: for example we need 2 bits now, but there is only 1 bit and then EOF.
    To solve these we should store the length of the initial state of encoding in the first symbol (or a few symbols). Retrieving this information means that this symbol is changed into one of some proper subset of symbols … without the '0->0 symbol'.

  133. David A. Scott Says:

    Let me ask this question; pretend you're decompressing. Say you start with a known table, 100000, your choice; it's in the decompression program. Can you decompress (well, it's OK if they grow) 0 1 00 01 10 11 000 … as separate strings to output strings that a similar program could compress (maybe I should have said transform and reverse-transform) back to the original strings?

    In the paper you reference you assumed you could have an alphabet A,B,C with probabilities 0.45, 0.35, 0.2; say this is the current example, and the program could contain the starting state and tables. Could you decode
    the set of all strings 0 1 00 01 10 … to the set of all character strings A AB ABC … B … C … without gaps, and the reverse? If you can, then the example would be bijective in my sense for this example.

    Assume you can do the above. Could you do the more
    extreme example where you know you are compressing from
    the set of all streams of the 256 8-bit symbols to the set of all binary strings? Look at it from the decompress side: can you start with an internal state, with built-in changes in tables, so that from the start you could handle decompressing any string 0 1 00 … to the set of all strings of 256 symbols? Then it would be bijective in my sense.

    First of all, I call arb255 my bijective byte-file-to-byte-file coder, so it does not directly do what I am asking of your code. It does do it internally. What I mean is: take any string of any length of pure ones and zeros; you represent this as a string of ASCII ones and zeroes till EOF. Then run a012b01: this compresses such a string bijectively to bytes. Then you could run unarb255 to produce the desired output in the space of 256 symbols, where each symbol is exactly 8 bits. It would be child's play to modify the code to work with fewer or more symbols; they do not have to be powers of two.

  134. Andrew Polar Says:

    Jarek, it is not necessary in decoding to know the size or have an end (EOF) marker. I do it in my example without them; you can look at my latest adaptive example abicon0 (I call it the adaptive binary converter - abicon). It doesn't have headers, end-of-data markers or any other labels. Since we start encoding from a predefined expression like 11111, the last decoding step is a read-last-bit operation. When we decode a symbol and read the last bit we are done. The stopping condition is: no more bits to read and STATE equals 11111. We don't need headers or sizes. The problem with the 0-to-0 conversion does not exist for a non-binary alphabet.

  135. Andrew Polar Says:

    David, speaking about bijectivity you are trying to find out what happens if you pass random bits into the decoder. I think you are missing the point. If we are speaking about encryption, a brute-force attack is conducted another way. The attacker has the cipher, and he or she tries different tables in order to obtain the message. The attacker needs to presume different combinations of probabilities, for example 0.1, 0.8, 0.1 for A,B,C, build a new table and try it. The attacker does not pass random data to the decoder. The attacker passes a valid cipher into different decoders. In this case there are only 3 symbols and the size of the digits is limited. In a real case it will be 256 symbols with different probabilities assigned to them, so the number of combinations is large. In addition the attacker must decode the complete message, make another presumption, and start decoding backwards. Only at that moment can the attacker see if the first and second presumptions were right.
    Again: the attacker doesn't pass random bits to the decoder. The cipher is not random bits; it cannot contain all 0s or some other weird combination. The attacker makes a table according to the known, stable, published algorithm. What the attacker does not know is the set of 256 numbers for the first run and the set of another 256 numbers for the second run.
    Again: the decoder is bijective in the sense that you can make a different table, try it on the encoded result, and get some wrong message.

  136. David A. Scott Says:

    I am lost: how can it be bijective if it can't handle the binary string 0?
    I am lost when you say an attacker has to try many guesses like .1 .8 .1; if it's bijective, wouldn't any guess work smoothly? Sure, it would not decode to the correct ABC sequence, but no matter what guess he uses, shouldn't it decode to a sequence that, when coded with the same guesses, comes back to the same starting string?
    Sorry, I hope I am not asking too many questions.

  137. Andrew Polar Says:

    David, according to the law a U.S. citizen cannot teach a non-citizen an encryption technique unless it is patented and published. The patented encryption is RSA. That is what I can teach anyone, including Fidel Castro and Kim Jong Il. ANS/ABS is not part of it. No matter what your nationality, anything that is published on the Internet is presumed to be read by everyone in the world. I will answer your question, but it will be the last one about encryption. I think you know the basic terminology. The original text is called the message. The result of encryption is called the cipher. The secret combination of numbers used to convert the message into the cipher and back is called the key. The question about bijectivity was brought up in regard to the efficiency of a brute-force attack. The attacker has the cipher and wants the message. The attacker does not know the key. A brute-force attack is trying all possible keys. The attacker does not change the cipher. The attacker tries different keys. Again, there is a big difference between cipher and keys. The attacker can pass all zeros or all ones as a key, but for what? If the attacker is not drunk and is a good professional, he/she tries different keys. If the cipher starts crashing on different keys, the attacker can find out that a particular key is wrong and try another key. That means we need the capability for all keys to process the cipher. This capability exists in ANS/ABS. The key is very large and may include 65,000 numbers. The attacker needs to decode the whole message before he/she finds out if the presumed key is right. I think I explained everything here clearly enough for a high-school student.

  138. David A. Scott Says:

    Andrew forget about encryption.
    How do you make the code bijective, even for the simple example in Chap 11?
    Andrew, I looked at the example in section 11. It appears each time a symbol occurs you always change state and sometimes output either 11, 0, or 00; this could
    not easily be made bijective, since on decompression you would never be able to decompress …010…

    Would it be too hard to show, for this example alone, how 00100101100 would decompress to the sample set and recompress back to that string? I only ask it for the example you give in chapter 11.

    If it's too much of a problem or bother, sorry.

  139. David A. Scott Says:

    Andrew, your quote about US law is interesting. I have talked to legal people for free; they don't seem to think it's as clear-cut as you do. The problem with laws is that they are made to be changed by judges and the political winds. The last thing I remember about US law and crypto is that open-source free systems done in public are exempt. The sad thing is, I remember stories where people were challenged to break a system, and then when they did they got in trouble, because the challengers really didn't want it known that it was so easily broken. Remember when the Russian guy was arrested for telling how weak the electronic book method was?
    If the government is so concerned about crypto, why is AES open and free, since they claim it's good?
    That aside, I still would like a simple example of what your code does, without having to download and run your code. And we don't need to discuss crypto.

  140. Andrew Polar Says:

    David, if you want to test ABS/ANS, try 'Enlog' and 'Abicon' from the article at the top. This is not a download: you can copy the text from the browser to a compiler, compile and run. The rules about encryption are published at I have my own SSL web server but I cannot publish it. You can read the user manual at but there is no download. The government rules say that:
    1. Providing an encryption technique to a foreign national in the US is deemed export and requires a license.
    2. Teaching encryption to a foreign national outside the US is export and requires a license.
    3. Anything published on-line is presumed to be downloaded by users in all countries immediately after publishing.
    4. Patented algorithms are considered public and can be taught and discussed without limitation. Such an algorithm is RSA; however,
    5. Software made using RSA encryption is subject to license and cannot be published.
    6. GUIs and other tools that use SSL libraries need the same license as the SSL libraries, even when they are published without the libraries and contain only a link to the SSL site.
    7. For SSL encryption software there is a license exception for keys shorter than 512 bits, but I wrote a web server that uses SSL libraries providing key sizes up to 2048 bits, so I cannot publish it.

    These rules concern export. If you approach me and show me your American passport I can provide you this server, because that is not export; however, I should retain your contact information for the record and provide it to the export control administration when they ask.
    You and anybody else in America can download SSL servers from German sites and use them, because that does not constitute export.

  141. David A. Scott Says:

    I am not interested in ever using something like what you appear to be working on for anything involving encryption unless it could be made bijective. But it seems that that is not possible.
    Who knows what the current laws mean or how fast they change. I played the game when the rules first came out: I submitted by email the required links, and I got zero feedback from the feds on scott19u. Looking at the link you gave, there are currently lots of exceptions. I know a math man in Chicago had a test case; I think he won. That aside, at least for a few months the "Bill of Rights" is still in effect. It was last tested when the DC police wanted to follow a policy like what is currently happening in Mexico, where my relatives are scared to death because only the police and thugs have guns; people there are not allowed to protect themselves, and as a result are dying like flies. Likely there will soon be a new revolution in Mexico; it's what happens when a government no longer cares for or trusts its common citizens. It appears that at least in Texas (maybe not where you live) it's a constant fight: the politicians want to steal all our rights. That said, I work on compression since it's easy to test. And when push comes to shove, many compression programs are child's play to mod for encryption.
    Don't take this wrong: I find your method interesting and may give it a larger look a few years down the road, when I find BWTS bijective compression boring.

    Thank You
    David Scott

  142. Jarek Says:

    First of all, as I've already said, I don't see a problem with making ABS/ANS bijective:
    - decoding reads the initial state from the first bits, then makes some number of steps. We finally get to a situation where there are not enough bits left to make a step - we have some number of such possible pairs (state, a few bits) - and we can encode them uniquely using some sequence of symbols.
    - coding decodes the first few symbols into such a pair, makes steps, and finally just writes all but the oldest bit of the state.
    Please ask me by email if you don't understand something - I don't want to prolong this topic here.

    Secondly - I'm far from convinced that bijectivity is an important property in encryption. I can think of a huge number of bijections which are far from safe … starting with the identity. On the other hand, take a really secure PRNG, like CryptoMT - from a few bytes of seed it can produce gigabytes of information such that it's practically impossible to recover the seed from them.
    What is important is chaoticity - the smallest difference causes huge, unpredictable changes … like a butterfly causing a tornado. Knowing all the weather data, you couldn't determine which butterfly should have been caught to avoid a given tornado.
    ANS chaotically produces coding tables, which choose local chaotic behavior…

  143. Andrew Polar Says:

    I want to bring up a new topic in our discussion. It is the 1979 report of G.N.N. Martin, where he or she explained the principle of the range encoder. I found only one reference

    Glen G. Langdon, An Introduction to Arithmetic Coding. IBM J. Res.Develop. Vol. 28, No. 2, March 1984.

    to this report. The article mentions the report but does not provide details. Every link to the article of G.N.N. Martin shows the same ugly, illiterate explanation. There is a possibility that this article was fabricated in order to disable patents on arithmetic coding. I think the report of G.N.N. Martin took place, but where is the author??? Where is he or she??? And what was in the report??? If that article was written by the inventor of the method, why did he or she miss very important details, such as that the original range is from 0 to n^n, where n is the length of the message? And this is the equivalent of the [0,1) interval. The range is the product of all frequencies, and the only missing point is the LOW limit that has to be calculated for a range coder. The founders of a method usually explain everything clearly. Read Newton, Chebyshev, Lobachevsky and others: they explain their concepts better than the people who came after them.

  144. Andrew Polar Says:

    Sachin, would you like to start a new topic? A discussion about the authenticity of the article of G.N.N. Martin about the range encoder. The best way to dissipate the doubts is to show the author.

  145. Sachin Garg Says:

    Is this just a guess you are making based on lack of references or are there more reasons to believe that the article was fabricated?

  146. Andrew Polar Says:

    It is a guess. The grounds for this guess are as follows:
    1. All links contain the same badly formatted document.
    2. The only place I found that mentions G.N.N. Martin is Langdon's article of 1984, which does not say clearly what was presented at the conference. Langdon was an expert in the industry and he could recognize a more progressive technique compared to arithmetic encoding.
    3. The document that people call the report of G.N.N. Martin appeared after Mr. Langdon passed away, at a time when everyone was angry about the IBM, AT&T and other patents on arithmetic encoding.
    4. During all this time since 1979, G.N.N. Martin has not come forward as the founding father of this very advanced method in data compression.
    5. The report was in England, so it is not easy to find traces.
    6. All patents on arithmetic encoding obtained by experts start from the definition that AE maps a sequence of data onto the interval [0,1). The range coder does not do that, so their patents are useless against it. The range coder maps a sequence of data onto the interval [0, n^n), a big difference and grounds to avoid formal infringement. How could experts miss that and file patents that are so easy to get around? The average cost of filing a patent is $15,000 for a corporation, with attorney fees of $300 an hour.

  147. Jarek Says:

    I've just finished a large paper about ANS. It adds some deeper analysis and gathers rethought information I've placed on different forums…
    It also shows that the presented data correction approach can really allow reaching the theoretical Shannon limit, and it appears to run in expected linear time, so it should be much better than the only practical method of this kind in use - Low Density Parity Check (LDPC).

    ps. There are many different forums … please use this place to discuss ABS/ANS…

  148. Andrew Polar Says:

    Jarek, just for your information:
    Matt Mahoney implemented ABS (a binary coder). Some time later I implemented an ANS coder for a non-binary alphabet. The demo is at It is running without problems and shows faster speed for small alphabets. I also tried adaptive ABS at It is binary but adaptive, and different from Matt's versions. It makes several tables at the start and passes the state from one table to another in a special way that allows exact synchronization in decoding. So when you say that tables should be updated in an adaptive coder, that is not necessary. They can be calculated for several discrete cases up front. What is the reason for not having a reference to me? You don't like my critical remarks?

  149. Jarek Says:

    I apologize, but I referenced the first implementations of both coders that I knew about - probably there are more of them, but 2 links in the references are enough. Yours came a few months later, after the necessity of explaining every smallest detail and placing many unrelated posts here instead of just mailing me. I don't remember any constructive criticism from you, but I don't want more unrelated discussions here.

  150. Andrew Polar Says:

    My critical remarks concerned the way you explain your method. I think this is a technical issue, which means it is related to this discussion forum. If you scroll back to message 89 you can see that I asked clearly how to write a non-binary encoder. You can look at your answer 90, which does not explain anything; later (message 98) you showed the link to an example implementation, which was exactly what I was asking for. This implementation shows the concept but hides the most critical and important part: how to reduce the size of the state variable in encoding and recognize this reduction accordingly in decoding. Hiding important details in an explanation is not necessary, because people will find them out anyway. I also criticized the use of topology in the explanation, because it is not necessary for this particular topic and engineers don't understand it. Your new article is better than the original; however, it is missing the discussion of renormalization, that is, the reduction of the state variable by outputting part of its bits. This is critical. I wrote one implementation that actually works, but there is no guarantee that it is computationally stable. As you know, in arithmetic coding the stability of renormalization was proven theoretically by Rissanen.
    I also don't need a reference, because I have 30 unique visitors a day. I have to admit that I was not always right in the discussion, and I apologize, but I still stand by my remark that your explanation is not reader-friendly, while the method itself is very interesting and I'm interested in using it. I'm interested in using the best method no matter who invented it. I did not invent the range coder either.
    Regards, Andrew.

  151. Andrew Polar Says:

    The whole concept of the ANS coder requires 10 minutes of explanation and one illustrative table, like the one I showed at the bottom; it is the table in 11.3. It also tells you how to use it in adaptive coding. Since each step works for any state, and all sequential states are present in the first column, a program can simply reload the table at any time or pass the state to another table. That is all. Topic covered, nice and clear. But what is missing is how to reversibly reduce the state when outputting and reading bits. Although there are some techniques already in use, programmers need to know whether using one particular method guarantees reliable coding for any data. I call this topic reliable renormalization. This is another thing that I noticed in your papers: a lot of discussion of unimportant details, while ignoring the really critical things that prevent the ANS coder from being used. And by the way, your suggestion for renormalization in the Wolfram demo is not reliable. I can generate data that crashes your coder.

  152. David A. Scott Says:

    Andrew, let me ask again. In section 11.3 you mention
    that if there is an output it is either 11, 0, 0 or 00.
    This seems to me to be an error; is it an error or not?
    Thank You

  153. Andrew Polar Says:

    I changed the explanation. Try to look again. I think this time it will be clear.

  154. Andrew Polar Says:

    Regarding the article of G.N.N. Martin:
    I have contacted some people who might have known Martin. One gentleman confirmed that Nigel Martin was really working on this topic at that time; however, he could not confirm that the published document belonged to Martin. I am continuing my investigation and will let everyone know when I get a result. Maybe it is possible to obtain a good-quality copy of this article (from Southampton University) that holds the date and name and shows everything in a good format.

  155. David A. Scott Says:

    Looking at

    Of course, if you changed it somewhere else, it might
    be nice to actually have the current URL you are
    referring to.

    As far as I can tell nothing has changed in the table;
    look at the row of output bits:
    11 0 0 00 - there is not a single
    isolated 1, which to me means it is bad.

    Following is a cut and paste from the URL
    (it looks better at the URL):

    state before output of bits   31   16   22   24
    output bits                   11    0    0   00
    state after bit output         7    8   11    6
    processed symbol               A    B    A    C
    state after processing        16   22   24   30

  156. Andrew Polar Says:

    The table is correct; I don't understand your question. The example code is running and compressing in accordance with entropy and with theory. If you don't understand the method, read it again. The output is 11 0 0 00 11110. The last fragment is the last state, which is 30. Regarding bijectivity: you cannot pass just any junk as the last state 11110. It must be a 5-bit value from 10000 to 11111. Anything else can be replaced. I don't want another discussion about encryption, but you can see that the last value 11110 may be a key that doesn't go to the output. In this case any cipher can be processed.

  157. Jarek Says:

    I've been asked to make the correction algorithm more precise - there is a new version of the paper available
    … there were also a few small errors to correct (…especially in the error correction section…). I believe it's better now (?)

  158. kobi barak Says:

    Hello.
    My name is Kobi Barak; I am an electronics engineer and a pilot.
    A year and a half ago I developed an electronic unit that requires a unique compression.
    I decided to try to develop a compression algorithm and solve the problem by myself. About 2 months ago I finished developing a compression algorithm that is able to compress at a rate of thousands of percent.
    It can be used in audio and video lossless compression applications, archiving and more.
    My algorithm is able to shrink any file to the size of 4K (at the price of time) (1 GB takes 15 minutes).
    It sounds a little imaginary, but it is not; it is real and working.
    I will be happy to present what I am doing on this subject, and I know that it can be used for many applications.
    Sincerely yours,
    Kobi Barak.

  159. Andrew Polar Says:

    Can you show an example of encryption? It is obviously possible to use ANS for encryption but there are many ways of doing that. It is interesting to know your way.

  160. Jarek Says:

    Andrew - it's not enough that one person says it's safe … it should be discussed widely to be considered such … but I believe that if you follow the suggestions at the end of the 8th section, it should be so…

    To give a more specific answer I would have to know all the parameters for using it for all three purposes simultaneously … but if you want to use it for the simplest purpose, encryption only, I'll suggest for example using a PRNG initialized with the key to generate the probability distribution for the intermediate step, e.g. from [1/512, 1/256], so that we always transfer a byte and sometimes additionally one bit - a bit transfer is cheap. l should be about 2^17 .. 2^20 to fit into cache. If there is a chance of adaptive scenarios, the initial states should be chosen randomly.
    For the moment I believe that for such parameters there is no need to additionally remove correlations, but I think it's the most controversial part (?) …

    ps. There is a small problem with combining data correction with the method of enlarging the internal state in which we switch some of the youngest bits - these bits are immediately transferred and used in the next cycle - we would have to locate them…
    To make correction simpler, the order should be changed: a coding step is bit transfer, then switching the youngest bit (or a few) of this reduced intermediate state, then encoding the symbol.
    To make sure this bit switch doesn't lead outside I_s, we have to use l_s that are even numbers (or correspondingly higher powers of 2).

  161. alan Says:

    Andrew Polar writes: “Topology has nothing to do with data compression.”

    data: isolated bits that can assemble, in any order

    topology: isolated order, that can assemble in any bits

    data compression = topological expansion

    that is, if data is compressible, it contains inherent topological aspects.

    A “pin” is a cross-over (like “x”)

    so is like “a head”. A “head of a pin” is a double cross-over (or “a soft where” (i.e. you’re not quite sure where “the head” or “the pin” IS; “engineer’ (it’s got to be somewhere, so you need to de-code it)(otherwise known as “how many angles”, or “can dance”. To include both these concepts, requires “partial differentiation” (imaginary “pin-heads” otherwise known as “binary/ code”)

    To differentiate “binary” from “code”…..

    well, who wants to sign an NDA? Be my guest?! …

    (heard of “space calculus”?! …… geometrical diffraction….???)(micro-lensing…. or “gravity” !!!)

    rest of quote:

    “Your article is set of unrelated to each other topological statements written in a wrong English, which are true when understood but don’t represent anything new or not trivial. Software engineers see topology as a science similar to the one used in middle ages when people argued how many angels can balance on the head of a pin, that means interesting for selected people and totally unrelated to real life.”

  162. Dmitry Sychov Says:


    At first I must say that I’ve read Jarek’s doc and understood nothing from it -
    luckily for me, Andrew’s and Matt’s sources shed some light on the algorithm.

    Currently I see no advantages over optimised AC; the number of ops
    does not play a vital role - it’s the system bus that slows things down. I did not
    find any benchmarks, but from the code I’ve seen, a running ANS should be
    at most 15-25% faster than optimized AC, and there is the ANS-table setup time too.

    An interesting thing that I’ve noted is that ANS could somehow be used as a replacement
    for a block cipher.

    Let’s consider the popular theme of packet exchange. For every packet compressed+encoded with ANS
    we should still pass the probability distribution in _encoded_ form (using RSA as a cipher)

    .. or is there some magic number out of a rand generator which encrypts the probability distribution table itself?

    .. or should we use a block/stream cipher to encode the distribution table only? [but then we should prefer to
    encode the whole packet with it, I guess..]

    Regards, Dmitry

  163. Jarek Says:

    Alan, the heart of data compression is correlations … topology is rather a part of practical models for simulating them, like Markov chains.

    Dmitry, as an entropy coder, ANS can encode/decode a byte or more in one table check and a few small bit operations. In the modern world hardware implementations are more important - they would require some memory, but should be much faster.
    … and simultaneously we can encrypt the message well and add redundancy for really good error correction - three layers done in one, faster and better than the single layers used today.
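The "encrypt while encoding" idea can be pictured with a toy sketch: derive the ANS symbol spread from a secret key, so the coding tables themselves act as the cipher and states are meaningless without the key. The function name, signature, and use of `random.Random` below are all illustrative, not code from the thread:

```python
import random

def keyed_symbol_spread(counts, key):
    """Toy sketch of the 'ANS as cipher' idea: the symbol spread
    (and hence the whole coding table built from it) is derived
    from a secret key."""
    # lay out each symbol as many times as its frequency count
    slots = [s for s, c in counts.items() for _ in range(c)]
    # key-dependent permutation of the layout
    random.Random(key).shuffle(slots)
    return slots
```

The spread is deterministic for a given key, so both sides can rebuild the same tables from the shared secret plus the (public or encrypted) frequency counts.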

    About the correction purpose - the simulator has just been published on Wolfram’s page:
    It shows that we finally have a near-Shannon-limit method working in nearly linear time for any noise level.

    I’ve realized that for practical correction methods (not requiring exponential correction time), we rather need a bit more redundancy than the theoretical (Shannon’s) limit. Redundancy allows reducing the number of corrections to consider. In practical correction methods we have to elongate corrections, and so we have to assume that the expected number of corrections up to a given moment is finite, which requires more redundancy than Shannon’s limit (observe that block codes fulfill this assumption).
    This limit is calculated in the last version of the paper (0902.0271). The basic correction algorithm (as in the simulator) works at a slightly worse limit (it needs an encoded file larger by at most 13%), but it can probably be improved.
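For reference, the theoretical limit being discussed is Shannon's channel capacity; for a binary symmetric channel with bit-flip probability $p$ it is

```latex
C = 1 - h(p), \qquad h(p) = -p\log_2 p - (1-p)\log_2(1-p),
```

so reliable transmission requires the encoded file to be at least $1/C$ times the message size. The claim here is that practical (non-exponential-time) correction needs somewhat more than this minimum.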

    Error correction methods used today work practically only for very low noise. The presented approach works well for any noise.
    For small noise it needs an encoded file size practically at Shannon’s limit. The difference starts at large noise: there it needs a file size at most twice larger than the limit.
    A practical method for large noise gives a new way to increase the capacity of transmission lines and storage devices - for example, place two bits where we would normally place one; the cost is a large noise increase, but we can handle it now.

    For extremely large noise we can no longer use ANS. Fig. 3 of the paper shows how to handle it. For example, if we have to increase the size of the file 100 times, we can encode each bit in 100 bits - encode 1 as 11…111 XOR ‘hash value of the already encoded message’, and the same with 0. Now, while creating the tree, each split will have a different number of corrected bits - a different weight.
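A rough sketch of that "encode one bit in many" construction, with n = 16 standing in for the 100 bits and SHA-256 as a stand-in hash (both are assumptions, not choices from the paper):

```python
import hashlib

def spread_bit(bit, history, n=16):
    """Encode one bit as an n-bit block: all-ones (for 1) or
    all-zeros (for 0), XOR-ed with a hash of the message
    encoded so far ('history')."""
    digest = hashlib.sha256(history).digest()
    # take the first n bits of the digest as the XOR mask
    mask = [(digest[i // 8] >> (i % 8)) & 1 for i in range(n)]
    block = [bit] * n
    return [b ^ m for b, m in zip(block, mask)]
```

The blocks for 0 and 1 with the same history differ in every position, which is what lets the decoder weight candidate corrections by how many bit flips each would require.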

    Regards, Jarek

  164. alan Says:


    if I may quote….

    “the heart of data compression are correlations”

    reaction: old style! The data compression “revolution” is changing this

    Do you know how every DVD movie in New Zealand can occupy almost no space on a single DVD (I’m guessing potential here re: revolutionary new technology)?

    Shannon - I have heard of him; and Stephen Wolfram; “Shannon’s limit”…. I wonder….

    “we rather need a bit more redundancy than theoretical (Shannon’s) limit.” Very good thinking, if I may give an opinion !

    An infinite number of corrections = A SPACE CODE

    Guess what “dark matter” and “dark energy” is!!!! What “The Hubble Constant” really might be; what “speed of light”: actually represents, …!!!!!!!!

    Why correct errors?

    Convert “weight” into “electricity”……….

  165. Tanya Kitchens Says:

    I feel far more people need to read this; extremely good info.

  166. Jarek Duda Says:

    After a year of focusing on physics, I finally have the ANS PhD defense this month (a few small corrections: ) and while preparing the presentation, I wanted to write that the fact that encoding and decoding happen in reverse order complicates adaptive compression … and spotted that it’s not unavoidable - there is no problem with encoding in both directions:

    find the smallest natural y such that x+y corresponds to symbol s (D1[x+y]==s)
    put y to the output (in base b)

    There are a few ways to ‘find the smallest y’, for example:
    -just check x, x+1, x+2, .. - if q ∈ (1/3, 2/3) it needs at most 3 checks, and the average number of checks for ABS is about 1.5,
    -using ABS with the formula, we can immediately determine y by looking at the fractional part of q*x,
    -for ANS we could put y into a table of size n*l,
    -or, more cheaply, remember only the appearances of symbols and for a given x interpolate its position in such a table - we find the proper one in, on average, a bit more than 1 check.

    It seems that there could be a problem with x near the right end of the segment, but there is always some space left there, and often there should be appearances of all symbols there. If not, in such rare situations we can go cyclically to the beginning of the segment (instead of y, encode (bl-x)/b+(x’-l)).
    Alternatively we could reverse decoding, but that looks less practical and is more complicated.
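The formula-driven ABS step mentioned above (reading the symbol off the fractional part of q·x) can be sketched as the standard uABS pair from Duda's paper, written here with exact fractions for clarity; the function names are mine:

```python
from fractions import Fraction
from math import ceil, floor

def uabs_encode(x, s, p):
    """C(s, x): state after encoding bit s, where p = Pr(s = 1)."""
    if s:
        return floor(x / p)
    return ceil((x + 1) / (1 - p)) - 1

def uabs_decode(x, p):
    """D(x): recover (s, previous state). The symbol is determined
    by the behaviour of the fractional part of x*p."""
    s = ceil((x + 1) * p) - ceil(x * p)
    if s:
        return 1, ceil(x * p)
    return 0, x - ceil(x * p)
```

A quick round-trip check: for p = Fraction(3, 10) and any state x ≥ 1, uabs_decode(uabs_encode(x, s, p), p) returns (s, x).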

  167. Jarek Says:

    I forgot about increasing x - of course it should be:
    { while x is smaller than l, x = bx;
    find the smallest natural y such that D1[x+y]==s;
    put y to the output (in base b system); }
    Just before the last operation, x+y corresponds to x in the decoding step. Instead of reading bits from the input, this time we have to choose them so that we reach the nearest appearance of the given symbol.
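Put together, this corrected forward-encoding step could look like the minimal sketch below; D1 is the decoding table mapping states to symbols, l the table range, b the output base, and the toy table in the note is illustrative:

```python
def encode_symbol_forward(x, s, D1, l, b):
    """One forward encoding step as described above: renormalize x,
    then find the smallest y such that state x + y decodes to s.
    Returns (new_state, y); y is what gets written to the output."""
    while x < l:            # grow x into the working range [l, b*l)
        x = b * x
    y = 0
    while D1[x + y] != s:   # smallest natural y with D1[x+y] == s
        y += 1
    return x + y, y         # x + y corresponds to x in the decoding step
```

For example, with the toy table D1 = {4: 'a', 5: 'b', 6: 'a', 7: 'b'} (l = 4, b = 2), encoding 'b' from x = 1 renormalizes 1 → 2 → 4 and yields state 5 with y = 1.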

  168. Jarek Says:

    I have a tremendous information compression challenge :) - present the paper within 20 minutes. There are many new intuitive diagrams in the presentation, so I’ve decided to translate it in case someone would like to understand some concepts and had problems with the paper - it might help:
    (information about LDPC is taken from )

  169. Jarek Says:

    I should have read it one more time … here is the corrected version:

  170. Nannette Says:

    Hi! I could have sworn I’ve been to this website before, but after browsing through some of the posts
    I realized it’s new to me. Nonetheless, I’m definitely happy I found it and I’ll
    be book-marking and checking back often!

  171. melaka property expert Says:

    of course I like your website, however you have to check
    the spelling on several of your posts. A number of them
    are rife with spelling problems and I find it very
    troublesome to tell the truth; nevertheless I will definitely come
    back again.

    Here is my weblog … melaka property expert

  172. our cheap discount hot clothing ed hardy love kills slowly designer handbag in red loved wholesale Says:

    Access - This Cover Almost everything regarding nfl jersey