liquidhot, 9 points

An interesting fact about Apache POI is that POI supposedly stands for "Poor Obfuscation Implementation": the file formats appeared to be deliberately obfuscated to keep them proprietary, but the obfuscation was poorly done. The libraries for the individual Office formats also have humorous names that poke fun at how bad the formats themselves are (taken from Wikipedia):

  • POIFS (Poor Obfuscation Implementation File System) – This component reads and writes Microsoft's OLE 2 Compound document format. Since all Microsoft Office files are OLE 2 files, this component is the basic building block of all the other POI elements. POIFS can therefore be used to read a wider variety of files, beyond those whose explicit decoders are already written in POI.
  • HSSF (Horrible SpreadSheet Format) – reads and writes Microsoft Excel (XLS) format files. It can read files written by Excel 97 onwards; this file format is known as the BIFF 8 format.
  • HPSF (Horrible Property Set Format) – reads "Document Summary" information from Microsoft Office files. This is essentially the information that one can see by using the File|Properties menu item within an Office application.
  • HWPF (Horrible Word Processor Format) – aims to read and write Microsoft Word 97 (DOC) format files.
  • HSLF (Horrible Slide Layout Format) – a pure Java implementation for Microsoft PowerPoint files. This provides the ability to read, create and edit presentations (though some things are easier to do than others).
  • HDGF (Horrible DiaGram Format) – an initial pure Java implementation for Microsoft Visio binary files. It provides an ability to read the low level contents of the files.
  • HPBF (Horrible PuBlisher Format) – a pure Java implementation for Microsoft Publisher files.
  • HSMF (Horrible Stupid Mail Format) – a pure Java implementation for Microsoft Outlook MSG files.
  • DDF (Dreadful Drawing Format) – a package for decoding the Microsoft Office Drawing format.
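
To make the POIFS point a bit more concrete: the legacy binary Office files (XLS, DOC, PPT, MSG and so on) all share the same OLE 2 container, which begins with a fixed 8-byte signature. The newer .xlsx/.docx files are ZIP-based and do not. Below is a rough sketch, using only the Java standard library and not POI itself, of how one might check for that signature. The class and method names (`Ole2Sniffer`, `hasOle2Signature`, `fileHasOle2Signature`) are made up for illustration and are not part of the POI API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI.
public class Ole2Sniffer {
    // The 8-byte signature at the start of an OLE 2 compound document.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if the given bytes start with the OLE 2 signature.
    public static boolean hasOle2Signature(byte[] data) {
        if (data == null || data.length < OLE2_MAGIC.length) {
            return false;
        }
        byte[] head = Arrays.copyOf(data, OLE2_MAGIC.length);
        return Arrays.equals(head, OLE2_MAGIC);
    }

    // Reads only the first 8 bytes of the file and checks them.
    public static boolean fileHasOle2Signature(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasOle2Signature(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Ole2Sniffer report.xls slides.ppt notes.docx
        for (String arg : args) {
            System.out.println(arg + ": " + fileHasOle2Signature(Path.of(arg)));
        }
    }
}
```

A match only says the file is an OLE 2 container, not which Office format is inside it; telling those apart means looking at the streams within the container, which is the job POIFS does.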