I started working on a project where I need to generate reports as text documents that can then be edited with a word processor. The reports will be a combination of rich text blocks from HTML and data sets in tables. Open Document (odt) and Microsoft’s Office Open XML (docx) are both suitable file formats, since both are open and supported by most word processors.
It’s relatively easy to generate Open Document spreadsheets, but converting HTML text into a formatted document is much more difficult. Luckily there are a handful of PHP libraries out there for working with Open Document or Office Open XML files that could make this project a lot easier. To meet my needs the library must be able to create a document, import HTML blocks and create tables. I’m going to test the most promising looking libraries, compare them and post the results here. I’ve created a live demo for testing each library, which can be found at http://sporkcode.com/sandbox/document.
PHPWord looks very promising. It’s open source, actively maintained, has documentation and examples, supports many file formats and has an extensive API for working with documents. It’s also easy to install using Composer.
Unfortunately the library failed to format the document correctly in the demo. The first problem is that it doesn’t recognize <b> or <i> tags in the HTML. This seems like a pretty glaring omission, but it was easy to get around by using HTML Purifier to change the tags into <strong> and <em>. The library also has problems formatting text that has inline HTML tags but is not wrapped in a block-level tag such as <p>. This is especially a problem since text generated by the HTML editor in the demo is not always wrapped in block-level tags.
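To make the two workarounds concrete, here is a minimal sketch of the kind of clean-up the HTML needs before it is handed to the library. It is not what the demo does: the demo uses HTML Purifier for the tag conversion, while this sketch swaps in plain regular expressions so it can run on its own. The function name normalizeHtml and the BLOCK_TAGS list are hypothetical, and the approach assumes reasonably well-formed HTML (no comments, no stray angle brackets, no void block elements such as <hr>).

```php
<?php
// Hypothetical helper for illustration; not part of PHPWord or HTML Purifier.
// Partial list of block-level tags that may appear at the top level of a fragment.
const BLOCK_TAGS = ['p', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
                    'ul', 'ol', 'table', 'blockquote', 'pre'];

function normalizeHtml(string $html): string
{
    // Rename <b>/<i> to <strong>/<em>. The lookahead requires whitespace or ">"
    // after the tag name, so <br>, <body> and <img> are left alone.
    $html = preg_replace('#<(/?)b(?=[\s>])#i', '<${1}strong', $html);
    $html = preg_replace('#<(/?)i(?=[\s>])#i', '<${1}em', $html);

    // Split the fragment into tags and text, keeping the tags as tokens.
    $tokens = preg_split('#(<[^>]+>)#', $html, -1,
                         PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $out = '';
    $inline = '';  // top-level inline content waiting to be wrapped in <p>
    $depth = 0;    // nesting depth inside block-level elements

    // Wrap any pending top-level inline content in a paragraph.
    $flush = function () use (&$out, &$inline) {
        if (trim($inline) !== '') {
            $out .= '<p>' . trim($inline) . '</p>';
        }
        $inline = '';
    };

    foreach ($tokens as $token) {
        $isBlock = preg_match('#^<(/?)([a-z][a-z0-9]*)#i', $token, $m)
            && in_array(strtolower($m[2]), BLOCK_TAGS, true);

        if ($isBlock && $m[1] === '') {   // opening block-level tag
            if ($depth === 0) {
                $flush();
            }
            $depth++;
            $out .= $token;
        } elseif ($isBlock) {             // closing block-level tag
            $depth = max(0, $depth - 1);
            $out .= $token;
        } elseif ($depth > 0) {           // anything inside a block passes through
            $out .= $token;
        } else {                          // top-level text or inline tag
            $inline .= $token;
        }
    }
    $flush();

    return $out;
}
```

Tracking the nesting depth means only content at the top level of the fragment is wrapped, so text inside an existing <p>, list or table is passed through unchanged. For real input, a proper HTML parser or HTML Purifier is likely a safer choice than regular expressions.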
Open Document Text
The Open Document text file had many rendering problems.
As the image above shows, the heading tag did not render any formatting, inline tags did not render correctly when not inside another tag (as mentioned above), the unordered and ordered lists did not render at all, and the table cells did not render any formatting.
Office Open XML
The Office Open XML document rendered better, but still had problems.
There are still problems with inline tags not wrapped in a block-level tag. The table styles rendered correctly, but the first column took up the entire width of the document until I set a static column width.
Portable Document Format (pdf)
The library has the option to render to a PDF, but it requires an additional library, which I didn’t test because of the problems with the Open Document and Office Open XML formats.
I took a quick look at the source code to see how difficult it would be to fix some of the problems and was not very impressed with what I found. The code seems rather thrown together. There are unnecessary static functions, arguments passed by reference and use of reflection classes. Not that there is anything wrong with the code, but it doesn’t follow what I would consider best practices. It appears that the project is a fork of a project of the same name originally hosted on CodePlex, which could explain the patchwork nature of the code.
Another thing that annoyed me is that all of the syntax for document styling uses older HTML attribute names (valign, bgcolor) instead of using CSS conventions.
Overall the library has impressive features and appears well supported, but still has plenty of issues. Even though the project is actively maintained, I have doubts that it will become stable. I was very disappointed by the results of the Open Document support, and if I use this library I will definitely stick with Office Open XML.