<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Precomp: More Compression for your Compressed Files</title>
	<atom:link href="http://www.c10n.info/archives/719/feed" rel="self" type="application/rss+xml" />
	<link>http://www.c10n.info/archives/719</link>
	<description>All about the most recent compression techniques, algorithms, patents, products, tools and events.</description>
	<pubDate>Tue, 07 Feb 2012 21:41:16 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
		<item>
		<title>By: Sachin Garg</title>
		<link>http://www.c10n.info/archives/719#comment-271810</link>
		<dc:creator>Sachin Garg</dc:creator>
		<pubDate>Sun, 18 Jan 2009 09:43:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.c10n.info/archives/719#comment-271810</guid>
		<description>Exact, bit-identical reconstruction of zip files matters when the zip-compressed (deflated) data is embedded inside another file (as in PNGs or PDFs). For other uses, I agree that it is not very important.

There has been some work on text filters like the one you mentioned. One of the tricks I recall is that they 'grouped' frequently used 2-3 character combinations and replaced each with a single 'unused' character (e.g., 'the' can be replaced by 0x01: 'the' occurs very frequently, in the, them, their, there, etc., and 0x01 is never used in text files). Another was about optimizing the encoding of new-line characters, with special handling for capitalization of the first character after a '.' period.

You might want to look up these papers. Let me know if you can't find them and I will try to see if I have them in my old archives.</description>
		<content:encoded><![CDATA[<p>Exact, bit-identical reconstruction of zip files matters when the zip-compressed (deflated) data is embedded inside another file (as in PNGs or PDFs). For other uses, I agree that it is not very important.</p>
<p>There has been some work on text filters like the one you mentioned. One of the tricks I recall is that they &#8216;grouped&#8217; frequently used 2-3 character combinations and replaced each with a single &#8216;unused&#8217; character (e.g., &#8216;the&#8217; can be replaced by 0x01: &#8216;the&#8217; occurs very frequently, in the, them, their, there, etc., and 0x01 is never used in text files). Another was about optimizing the encoding of new-line characters, with special handling for capitalization of the first character after a &#8216;.&#8217; period.</p>
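<p>A minimal sketch of the substitution trick, in Python. The table, the choice of reserved bytes, and the function names are assumptions made for illustration, not taken from any particular paper or tool. It only shows that the mapping is reversible as long as the reserved bytes never occur in the input.</p>

```python
# Hypothetical substitution table: frequent letter groups mapped to control
# bytes that are assumed never to occur in plain text.
SUBSTITUTIONS = {b"the": b"\x01", b"ing": b"\x02", b"and": b"\x03"}


def encode(text: bytes) -> bytes:
    """Replace each frequent group with its single-byte code."""
    for code in SUBSTITUTIONS.values():
        if code in text:
            # Without this check the transform would not be reversible.
            raise ValueError("input already contains a reserved byte")
    for group, code in SUBSTITUTIONS.items():
        text = text.replace(group, code)
    return text


def decode(data: bytes) -> bytes:
    """Undo encode() by expanding every reserved byte back to its group."""
    for group, code in SUBSTITUTIONS.items():
        data = data.replace(code, group)
    return data
```

<p>The filtered output would then be handed to an ordinary compressor, and decode() would run after decompression to restore the original bytes.</p>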
<p>You might want to look up these papers. Let me know if you can&#8217;t find them and I will try to see if I have them in my old archives.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David</title>
		<link>http://www.c10n.info/archives/719#comment-271796</link>
		<dc:creator>David</dc:creator>
		<pubDate>Sun, 18 Jan 2009 04:32:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.c10n.info/archives/719#comment-271796</guid>
		<description>I don't really care about recovering a particular .zip file.
As long as the underlying data (the original uncompressed files) can be recovered, one .zip file is about the same as another to me.

On the other hand ...

I've been playing with something distantly related to this -- recognizing "compression" in English text (acronyms, contractions, standard patterns for making words plural, other patterns for adding "ing" or "ed" suffixes, etc.).

So my decompressor has 2 stages:

First decompress to a "verbose" text analogous to the ".pcf file", which has text something like "cooky +s".

Then "compress" using standard English patterns:
"The plural of *y is *ies".

The "bit-for-bit identical" is important for this application, because I want the result -- after the second stage -- to be bit-for-bit identical with the original English text.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t really care about recovering a particular .zip file.<br />
As long as the underlying data (the original uncompressed files) can be recovered, one .zip file is about the same as another to me.</p>
<p>On the other hand &#8230;</p>
<p>I&#8217;ve been playing with something distantly related to this &#8212; recognizing &#8220;compression&#8221; in English text (acronyms, contractions, standard patterns for making words plural, other patterns for adding &#8220;ing&#8221; or &#8220;ed&#8221; suffixes, etc.).</p>
<p>So my decompressor has 2 stages:</p>
<p>First decompress to a &#8220;verbose&#8221; text analogous to the &#8220;.pcf file&#8221;, which has text something like &#8220;cooky +s&#8221;.</p>
<p>Then &#8220;compress&#8221; using standard English patterns:<br />
&#8220;The plural of *y is *ies&#8221;.</p>
<p>The &#8220;bit-for-bit identical&#8221; is important for this application, because I want the result &#8212; after the second stage &#8212; to be bit-for-bit identical with the original English text.</p>
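<p>A rough sketch of what the second stage could look like, in Python. The function names and the small rule set are assumptions for illustration only; the only parts taken from the description above are the "+s" marker and the *y to *ies pattern.</p>

```python
VOWELS = "aeiou"


def pluralize(stem: str) -> str:
    """Apply a few standard English plural patterns to a stem."""
    if stem.endswith("y") and len(stem) > 1 and stem[-2] not in VOWELS:
        return stem[:-1] + "ies"  # "The plural of *y is *ies"
    if stem.endswith(("s", "x", "ch", "sh")):
        return stem + "es"
    return stem + "s"


def render(verbose: str) -> str:
    """Second stage: turn verbose tokens such as 'cooky +s' into final text."""
    out = []
    for token in verbose.split(" "):
        if token == "+s" and out:
            # The marker modifies the word that came just before it.
            out[-1] = pluralize(out[-1])
        else:
            out.append(token)
    return " ".join(out)
```

<p>With these rules, render("cooky +s") gives "cookies". To keep the result bit-for-bit identical, the encoder would emit "+s" only where render() reproduces the original word exactly and would store the literal word otherwise. A literal "+s" in the original text would also need some form of escaping, which this sketch leaves out.</p>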
]]></content:encoded>
	</item>
	<item>
		<title>By: Benbelkacem</title>
		<link>http://www.c10n.info/archives/719#comment-260430</link>
		<dc:creator>Benbelkacem</dc:creator>
		<pubDate>Fri, 12 Sep 2008 14:20:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.c10n.info/archives/719#comment-260430</guid>
		<description>I am sorry that I wrote it in French; I can also write in English.
Could you please tell me how? I am really lost among everything there is on the internet.

Thanks</description>
		<content:encoded><![CDATA[<p>I am sorry that I wrote it in French; I can also write in English.<br />
Could you please tell me how? I am really lost among everything there is on the internet.</p>
<p>Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sachin Garg</title>
		<link>http://www.c10n.info/archives/719#comment-260426</link>
		<dc:creator>Sachin Garg</dc:creator>
		<pubDate>Fri, 12 Sep 2008 12:33:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.c10n.info/archives/719#comment-260426</guid>
		<description>I am using Google's translation of your comment, as I don't understand French.

The best way to protect your rights would be to get a patent on your method, but it will have to be something really new. Best of luck.</description>
		<content:encoded><![CDATA[<p>I am using Google&#8217;s translation of your comment, as I don&#8217;t understand French.</p>
<p>The best way to protect your rights would be to get a patent on your method, but it will have to be something really new. Best of luck.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Benbelkacem</title>
		<link>http://www.c10n.info/archives/719#comment-260424</link>
		<dc:creator>Benbelkacem</dc:creator>
		<pubDate>Fri, 12 Sep 2008 12:22:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.c10n.info/archives/719#comment-260424</guid>
		<description>[Translated from French] Hello,
I am also interested in data compression, but I don't understand how to go about it...
For example, if I find a new method, how can I share it with the whole world without losing my rights to it?</description>
		<content:encoded><![CDATA[<p>[Translated from French] Hello,<br />
I am also interested in data compression, but I don&#8217;t understand how to go about it&#8230;<br />
For example, if I find a new method, how can I share it with the whole world without losing my rights to it?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

