The Data Compression News Blog

All about the most recent compression techniques, algorithms, patents, products, tools and events.

Compressed XML-based file formats in Microsoft Office 12

Posted by Sachin Garg on 27th February 2006 | Permanent Link

Microsoft has been talking about the key benefits and functionality of the Office 12 XML formats. The first of the six items on its list says:

Compact file format. Documents are automatically compressed—up to 75 percent smaller in some cases.

Microsoft’s official FAQ on XML office formats has this to say:

Q. Why do these file formats produce smaller document sizes?
A. Internal data storage of the Office XML Formats is segmented into discrete components—for example, embedded code is stored separately from the document content inside the file. Each component within a single file is compressed using ZIP compression technology. In addition to the compression of each document segment, the entire document is also compressed using ZIP compression. As a result, the sizes of Microsoft Office Word documents, Microsoft Office Excel spreadsheets, and Microsoft Office PowerPoint presentations are reduced. The amount of compression and size savings varies per file.

Q. Does this mean I will have to unzip my files before I open them in Microsoft Office applications?
A. No. When opening the documents in Microsoft Office, the compression and de-compression happens automatically; users are not required to manually apply compression, and they are never asked to unzip a document.

Q. Will the file compression reduce the quality of images stored in my documents?
A. No. The ZIP compression technology will not compress images.
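
The behaviour the FAQ describes can be tried out with any ZIP library. The following is a minimal Python sketch, not Microsoft's implementation: it uses the standard zipfile module to build a small in-memory archive containing a text-heavy XML part and a stand-in for image data, then reports how much each part shrinks. The part names, the sample XML and the random bytes standing in for an already-compressed image are all assumptions made for illustration.

```python
import io
import os
import zipfile


def build_package(parts):
    """Write each named part into an in-memory ZIP archive using DEFLATE."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf.getvalue()


def part_sizes(package_bytes):
    """Return {part name: (original size, compressed size)} in bytes."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return {
            info.filename: (info.file_size, info.compress_size)
            for info in zf.infolist()
        }


# Hypothetical content: repetitive XML markup, plus random bytes that stand in
# for image data that has already been compressed (PNG, JPEG and so on).
xml_part = b"<w:p><w:r><w:t>Hello, Office 12</w:t></w:r></w:p>\n" * 500
image_part = os.urandom(20000)

# Illustrative part names, chosen for this sketch only.
parts = {
    "word/document.xml": xml_part,
    "word/media/image1.png": image_part,
}
package = build_package(parts)

for name, (original, compressed) in part_sizes(package).items():
    print(f"{name}: {original} -> {compressed} bytes")

# ZIP (DEFLATE) is lossless: the stored bytes come back unchanged.
with zipfile.ZipFile(io.BytesIO(package)) as zf:
    assert zf.read("word/media/image1.png") == image_part
```

Repetitive XML markup shrinks to a small fraction of its size, while data that is already compressed barely changes. Every part comes back byte for byte, which is why image quality is unaffected.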

There has been a recent wave of commercial products for compressing PowerPoint files. Is Microsoft planning to put every Office file compressor available today out of business? It seems so.

If you think the older Office formats will stay around for a while, think again. Here is what Microsoft has to say about it: “By installing a simple update, users of Microsoft Office 2000, Microsoft Office XP, and Office 2003 Editions can open, edit, and save documents in one of the Office XML Formats.”

Also, unlike in the old days, this format will have an open and royalty-free specification.

Further reading: A review of other Office 12 features. And, there is a post on Office XML formats at MSDN blogs.

4 Responses to “Compressed XML-based file formats in Microsoft Office 12”

  1. Sunil Says:

    If Microsoft is not going to compress images, then how do they intend to put companies like NX Power out of business? Assume that 70% of a PPT’s size is images and 30% is text. Microsoft stores these images as PNG. Assuming the new ZIP technology gives, say, a 50% improvement on the text, this translates to a 15% improvement in the overall size. I am sure NX Power Lite can do more on the 70%. Of course, it is a different matter if Microsoft is going to shift to JPEG/JPEG 2000 (Microsoft will have its own flavor of JPEG 2000/embedded coding called WMI, Windows Media Imaging) as a way of storing images.

    I am not a great fan of Microsoft or NX Power Lite. But I think an open XML based standard will only provide more control for office file optimizers because they have more control over the file format.

    -Sunil M


  2. Sachin Garg Says:

    Now they have solved the issue for non-image data. As for images, I think Microsoft cannot apply lossy compression automatically. That explains why they have chosen PNG. This is far better than older versions of Office, where BMP images were embedded as is.

    Together, it seems Microsoft is waking up to the size issues and has already reduced the advantage gained by using specialized products.

    But that is just my opinion. Others may feel that these third-party tools are still worth it. It wouldn’t be too surprising if MS comes up with an “optimize embedded images” feature too. Microsoft is known for adding features that put others out of business.

    > But I think an open XML based standard will only
    > provide more control for office file optimizers

    Yes, the advantage of having open standards.


  3. Sunil Says:

    Maybe they will do lossy compression on images to reduce the file size. But even then, I think people can use the right kind of tools, like StuffIt, which can “re-compress” JPGs. I am sure that similar concepts can be extended to “re-compress” PNGs. This way the optimizer companies can still get some amount of compression gain over Microsoft. I don’t know if that gain would be worthwhile, though. I seriously think the only way Microsoft can put someone out of business in this game is if it can do some flavor of embedded coding. This would give some extraordinary features to the Office suite, in my opinion.

    -Sunil M


  4. sandman Says:

    In short, life is going to be better for the average user.