Python Buzz Forum - Why I'm Pining for PDF Support in Firefox/Gecko

A big part of the product I'm working on entails producing PDFs of business reports. The product is completely web-based and all reports are available via the web, but PDF export is necessary for a few reasons:

The reports must go to print and must look professional when printed.
Reports are often sent via email to people who don't have access to the website. (We don't like this and would prefer for everyone to come to the website to view reports. However, PDF/Excel via email is how things have been done in this space forever, so we have to take a gradual approach to bringing our clients' processes to the web.)
The web application presents multi-page reports as multiple web pages (each report page has its own URL) but when exporting to PDF, a bunch of these pages need to go into a single multi-page document.

The product is built on Rails (with lots of custom stuff on top). Each individual report page has an ERB (.rhtml) template that generates the HTML version of the report. We then duplicate the view code in an .rpdf template, which is just Ruby with a PDF::Writer object scoped in. We also export to Excel so we have an additional .rxls template, which is just Ruby with a Spreadsheet::Excel object scoped in. For any given report, we end up with something that looks like this in the project:

cost-comparison.rhtml - The HTML view code.
cost-comparison.rpdf - The PDF view code.
cost-comparison.rxls - The Excel view code.

The system is pretty simple, really, and we've been able to accomplish a lot with it. Our controller and model code is completely view-independent - all routing and template-picking logic lives in a nice little plugin.
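To make the template-picking idea concrete, here is a minimal sketch of how a requested format might map to one of the three template files. This is not the plugin's actual code; the `TEMPLATE_EXTENSIONS` table and the `template_for` helper are hypothetical names invented for illustration, and only the file extensions come from the setup described above.

```ruby
# Hypothetical lookup table: requested format => template file extension.
# Only the extensions (.rhtml, .rpdf, .rxls) come from the project layout.
TEMPLATE_EXTENSIONS = {
  :html => "rhtml",
  :pdf  => "rpdf",
  :xls  => "rxls"
}.freeze

# Returns the template filename for a report and a requested format.
# Raises ArgumentError when no template exists for that format.
def template_for(report_name, format)
  extension = TEMPLATE_EXTENSIONS.fetch(format.to_sym) do
    raise ArgumentError, "no template for format: #{format}"
  end
  "#{report_name}.#{extension}"
end

puts template_for("cost-comparison", :pdf)
```

The point is that the controller only ever names the report and the format; which view file gets rendered is decided in one place.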

Rails 2.0 adds explicit support for multi-view setups like this and, although I haven't looked into it yet with any great depth, it seems to be very well done and should be able to accommodate our needs.

(As an aside, we hacked support for HTTP content negotiation in here as well, which was fun, but absolutely no one cares or will ever use it, so I digress.)

Anyway, what I really want to talk about is that maintaining these PDF templates is a real pain in the ass. PDF is basically a raw 2D vector surface with drawing primitives similar to what you find in SVG (lines, rects, circles, ellipses, paths, text, etc.). You don't get a nice box model and flowing layout as you do in HTML. Ruby's PDF::Writer alleviates a bit of the pain here by providing automatic text wrapping and a few other niceties but, for the most part, you're forced to manage the x, y, width, and height of everything manually. Pain.
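To give a feel for the bookkeeping involved, here is a rough sketch of placing rows of text by hand. It deliberately uses no PDF library at all, just plain Ruby, so none of it should be read as PDF::Writer's API. The page size, margins, line height, and the `place_lines` helper are all assumptions made up for the example.

```ruby
# Page geometry in PDF points (1/72 inch). US Letter height; the margin
# and line height are assumed values for illustration.
PAGE_HEIGHT = 792
MARGIN      = 36
LINE_HEIGHT = 14

# Assigns a page number and an x/y position to each line of text, starting
# a new page whenever the next line would fall below the bottom margin.
# PDF's default coordinate origin is the bottom-left corner, so y shrinks
# as we move down the page.
def place_lines(lines)
  page = 1
  y = PAGE_HEIGHT - MARGIN
  lines.map do |text|
    if y - LINE_HEIGHT < MARGIN
      page += 1
      y = PAGE_HEIGHT - MARGIN
    end
    y -= LINE_HEIGHT
    { :page => page, :x => MARGIN, :y => y, :text => text }
  end
end

rows = (1..60).map { |i| "row #{i}" }
place_lines(rows).each do |p|
  puts "page #{p[:page]} at (#{p[:x]}, #{p[:y]}): #{p[:text]}"
end
```

And that is just single-column text with a fixed line height. Add tables, headers, and variable-height cells and every one of them needs the same kind of arithmetic, which HTML's flow layout simply does for you.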

The PDF templates are especially disgusting from a code clarity perspective. They are typically 3-4x the LOC of our HTML templates, at least half of which is dedicated to managing the incidental complexity of page layout, positioning, and style. The other half is a complete duplication of the presentation logic from the HTML view.

Managing these PDF templates is a nightmare and everyone tries to avoid them if at all possible.

I've been experimenting with different ways to make generating PDFs a bit simpler. One of the more promising experiments was using a CSS print stylesheet on top of the existing HTML-based reports. This turned out to work fairly well in Firefox using MacOS's built-in Print to PDF support. The resulting PDF actually looked quite a bit better than the hand-coded PDF output in a lot of cases.

Instead of maintaining N PDF templates (one for each report page), I could theoretically have a single print stylesheet for the entire project and get PDF versions of all of our HTML-based reports for free.

Theoretically.

The problem is that you can't count on Print to PDF support on the client (MacOS is the only platform I know of that has good support built in). You can get Print to PDF support on Windows, but you have to purchase a separate PDF print driver. But even if you could count on Print to PDF being supported everywhere, the browser incompatibilities when going to print are another barrier you'd have to deal with. And even if that worked, we would still need to generate PDFs on the server because we have a report scheduler that automatically runs reports and sends email with PDF attachments.

Clearly, moving PDF generation client side isn't the answer. But I hope that doesn't necessarily mean we can't use the browser as our rendering engine. What I'd like to do is run Firefox/Gecko on the server. It would load up the report, render it with the print stylesheet and then output the PDF. The concept is not unlike khtml2png or webkit2png but instead of outputting a raster image, it would output a PDF: gecko2pdf, if you will.
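Purely as an illustration of what the server side might look like, here is a sketch of how the report scheduler could build a call to such a tool. To be loud about it: `gecko2pdf` does not exist, and the `--media` and `--output` flags, the `gecko2pdf_command` helper, and the example.com URLs are all invented for this example. The one real requirement it reflects is that several report page URLs have to end up in a single PDF.

```ruby
require "shellwords"

# Builds the argument list for a HYPOTHETICAL `gecko2pdf` command-line tool.
# The tool and its flags are invented for illustration; nothing like this
# is known to exist. Takes the URLs of the individual report pages and the
# path of the single multi-page PDF that should be produced.
def gecko2pdf_command(page_urls, output_path)
  raise ArgumentError, "need at least one page URL" if page_urls.empty?
  ["gecko2pdf", "--media", "print", "--output", output_path, *page_urls]
end

command = gecko2pdf_command(
  ["http://example.com/reports/cost-comparison/1",
   "http://example.com/reports/cost-comparison/2"],
  "cost-comparison.pdf"
)

# Print the shell-escaped command rather than running it, since the
# tool is imaginary.
puts Shellwords.join(command)
```

Keeping the command as an array of arguments (rather than one interpolated string) would let the scheduler hand it straight to `system(*command)` without worrying about shell quoting in report URLs.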

I've been researching the concept off and on for about six months, but I haven't seen anything even approaching this. All of the discussion around Firefox 3.0's PDF support seems specific to saving the page as presented on screen (i.e., without the print CSS applied). Another huge downer is that I can't seem to find many examples of people using Gecko on the server in any sort of automated fashion.

Quite frankly, I'm stumped. I figured I'd take a moment and write down my thoughts to clear my head. Maybe someone has been down this dark path before or maybe I'm just way out in left field with the whole concept.

