Tag Archives: Office Open XML

Using HTML52PDF to create text documents in PHP

I’ve been trying out different PHP libraries to generate text documents. My previous post, “Using PHPWord to generate text documents in PHP”, describes my quest in more detail.

This time around I’m trying out the HTML52PDF library (http://www.html52pdf.com/). Despite what its name suggests, the library is not PDF specific and actually supports the Open Document format directly. The library is actively being developed. Its website is nice looking and has tutorials and documentation. The only downside is that commercial use requires a license that starts at $199, but if the library provides what the website claims it would be well worth it.

The demo I created to test the library is available at http://sporkcode.com/sandbox/document/html52pdf.

The first problem I ran into with the library is that it generates a strict standards level error when you try to create a document. To get around the problem I disabled error reporting of non-fatal errors around the render document function call. After that the document would render, but it’s very surprising that a commercial library would have such an obvious problem in the code.
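The workaround described above can be sketched roughly as follows. This is a minimal illustration, not the code from the demo: the helper name withFatalErrorsOnly is hypothetical, and the closure stands in for the library's render call, whose exact name is not given here.

```php
<?php
/**
 * Run a callable while only fatal-level errors are reported, then restore
 * the previous error_reporting level, even if the callable throws.
 * The function name is hypothetical and exists only for this sketch.
 */
function withFatalErrorsOnly(callable $task)
{
    // Error levels that should still be reported while the task runs.
    $fatalOnly = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR
        | E_USER_ERROR | E_RECOVERABLE_ERROR;

    $previous = error_reporting();
    error_reporting($previous & $fatalOnly);
    try {
        return $task();
    } finally {
        // Restore the original level so the rest of the application
        // keeps reporting notices and warnings as before.
        error_reporting($previous);
    }
}

// Usage: the closure body is a placeholder for the library's render call.
$result = withFatalErrorsOnly(function () {
    return 'rendered';
});
```

Restoring the previous level in a finally block keeps the suppression limited to the one call, so warnings from the rest of the application are not hidden.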

Open Document Format

The Open Document file rendered pretty well, except that it stripped out text that was not wrapped in an HTML tag. This is a problem because that is how most of the text is returned by the Dojo HTML Editor (using Firefox). It is interesting that the PHPWord library had a similar problem.
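One possible workaround, which is not part of the demo, would be to wrap any loose top-level text in <p> tags before handing the HTML to the library. The sketch below uses PHP's DOM extension; the function name wrapLooseText and the list of block-level tags are assumptions made for illustration, and whether the wrapped markup then renders correctly in HTML52PDF is untested.

```php
<?php
/**
 * Wrap top-level text and inline nodes of an HTML fragment in <p> elements.
 * Block-level elements are left where they are. Hypothetical helper.
 */
function wrapLooseText($html)
{
    // Assumed list of tags that should not be wrapped.
    $blockTags = array('p', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
        'ul', 'ol', 'table', 'blockquote', 'pre', 'hr');

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    // The <div> gives the fragment a single container element to walk.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body><div>' . $html . '</div></body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);

    // Copy the child list first, because the loop below moves nodes around.
    $children = array();
    foreach ($root->childNodes as $child) {
        $children[] = $child;
    }

    $paragraph = null;
    foreach ($children as $child) {
        $isBlock = $child->nodeType === XML_ELEMENT_NODE
            && in_array(strtolower($child->nodeName), $blockTags, true);
        if ($isBlock) {
            $paragraph = null; // a block element ends the current run
            continue;
        }
        if ($paragraph === null) {
            if ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) === '') {
                continue; // ignore whitespace between blocks
            }
            $paragraph = $doc->createElement('p');
            $root->insertBefore($paragraph, $child);
        }
        $paragraph->appendChild($child); // moves the node into the <p>
    }

    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo wrapLooseText('Hello <b>world</b><p>para</p>tail');
```

Only direct children of the fragment are examined, so loose text nested inside a block element such as a table cell is left untouched.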

Screen capture of Open Document Text file generated by HTML52PDF

Portable Document Format (pdf) & Office Open XML (docx)

To generate PDF, Office Open XML or RTF documents, the library requires LibreOffice to be installed on the server and set up to use an extension included with HTML52PDF. An additional PHP script must then run as a service, which uses LibreOffice to translate the Open Document file into the other format. I set this up on my development server, but my first attempt generated additional warnings and errors when I tried to generate formats other than Open Document. After changing where I was saving the files, from the system temporary folder to a folder with different permissions, and tweaking the service script, I was able to generate the other formats. I decided against going through the same process on my production server, so the demo does not produce Office Open XML (docx) or PDF files.

Conclusion

It’s really hard to feel confident in a library that generates warnings when using its most basic functions. Even though the warnings don’t prevent the library from functioning, they make me question how much testing went into its release and whether it is worth spending $199 to license software that has such obvious defects.

The dependency on LibreOffice to generate formats other than Open Document seems problematic, as it means extra setup on a production server and may introduce compatibility issues with later versions of LibreOffice. The PHP script that you have to run as a service gave me additional concerns. First, the library doesn’t provide any tools to launch the script, which means additional server setup and maintenance. It also means that if the script crashes, rendering formats other than Open Document will stop working without any warning, unless you set up some kind of monitoring solution to check that the script is running. Finally, the library’s installation guide recommends running the script as root, which is an obvious security risk. If I did go ahead with using this software I think I would only use it to generate Open Document files, but that seriously reduces its usefulness.

In the end the library does a good job rendering Open Document Text files, but there are several issues that make it difficult for me to justify the $199 it costs to license it.

 

Using PHPWord to generate text documents in PHP

I started working on a project where I need to generate reports as text documents that can then be edited with a word processor. The reports will be a combination of rich text blocks from HTML and data sets in tables. Open Document (odt) and Microsoft’s Office Open XML (docx) are both suitable choices for the file format, since they are both open and supported by most word processors.

It’s relatively easy to generate Open Document spreadsheets, but converting HTML text into a formatted document is much more difficult. Luckily there are a handful of PHP libraries out there for working with Open Document or Office Open XML files that could make this project a lot easier. To meet my needs the library must be able to create a document, import HTML blocks and create tables. I’m going to test the most promising looking libraries, compare them and post my results here. I’ve created a live demo for testing each library, which can be found at http://sporkcode.com/sandbox/document.

PHPWord

https://github.com/PHPOffice/PHPWord

Demo http://sporkcode.com/sandbox/document/phpWord

PHPWord looks very promising. It’s open source, actively maintained, has documentation and examples, supports many file formats and has an extensive API for working with documents. It’s also easy to install using Composer.

Unfortunately the library failed to format the document correctly in the demo. The first problem is that it doesn’t recognize <b> or <i> tags in the HTML. This seems like a pretty glaring omission, but it was easy to get around by using HTML Purifier to change the tags into <strong> and <em>. The library also has problems formatting text that has inline HTML tags but is not wrapped in a block-level tag such as a <p>. This is especially a problem since text generated by the HTML editor in the demo is not always wrapped in block-level tags.
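To illustrate the tag conversion mentioned above, here is a small stand-alone sketch. It uses a plain regular expression rather than HTML Purifier, so it is a simplified substitute for what the demo does, and the function name normalizeInlineTags is made up for the example.

```php
<?php
/**
 * Rewrite <b>/<i> tags as <strong>/<em>, keeping any attributes.
 * A regex-based stand-in for the HTML Purifier step; hypothetical helper.
 */
function normalizeInlineTags($html)
{
    $map = array('b' => 'strong', 'i' => 'em');
    foreach ($map as $from => $to) {
        // Matches <b>, </b> and <b attr="...">, but not <br> or <body>,
        // because the tag name must be followed by whitespace or ">".
        $pattern = '#<(/?)' . $from . '(\s[^>]*)?>#i';
        $html = preg_replace($pattern, '<${1}' . $to . '${2}>', $html);
    }
    return $html;
}

echo normalizeInlineTags('<b>bold</b> and <i class="x">italic</i><br>');
// prints: <strong>bold</strong> and <em class="x">italic</em><br>
```

A regular expression is enough for simple editor output like this, but a real HTML filter such as HTML Purifier is the safer choice for untrusted or malformed markup.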

Open Document Text

The Open Document text file had many rendering problems.

Screen capture of Open Document Text file created by PHPWord

As the image above shows, the heading tag did not render any formatting, inline tags did not render correctly when not inside another tag (as mentioned above), the unordered and ordered lists did not render at all and the table cells did not render any formatting.

Office Open XML

The Office Open XML document rendered better, but still had problems.

Screen capture of Office Open XML generated by PHPWord

There are still problems with inline tags not wrapped in a block-level tag. The table styles rendered correctly, but the first column took up the entire width of the document until I set a static column width.

Portable Document Format (pdf)

The library has the option to render to a PDF; however, it requires an additional library, which I didn’t test because of the problems with the Open Document and Office Open XML formats.

Other Issues

I took a quick look at the source code to see how difficult it would be to fix some of the problems and was not very impressed with what I found. The code seems rather thrown together. There are unnecessary static functions, arguments passed by reference and use of reflection classes. Not that there was anything wrong with the code, but it doesn’t follow what I would consider best practices. It appears that the project is a fork of a project of the same name that was hosted on CodePlex, which could explain the patchwork nature of the code.

Another thing that annoyed me is that all of the syntax for document styling uses older HTML attribute names (valign, bgcolor) instead of using CSS conventions.

Conclusion

Overall the library has impressive features and appears well supported, but still has plenty of issues. Even though the project is supported, I have doubts that it will become stable. I was very disappointed by the results of the Open Document support, and if I use this library I will definitely stick with Office Open XML.

Next: Using HTML52PDF to create text documents in PHP