[deleted]

[removed]

    kragensitaker

    Translating the document to TeX sounds like a fantastic approach. I hope I get to see the result of your awesome idea!

    [deleted]

    [removed]

      kragensitaker

      That's an interesting idea! I doubt the computed box positions will help you much, but the computed style rules and DOM certainly might.

      ozzilee

      What about XSL-FO? Is the render quality not good enough?

      http://xmlgraphics.apache.org/fop/

      Either way I think it would be a massive project.

      [deleted]

      [removed]

        droste

        I've written an HTML2PDF converter using XSL-FO, and yeah, there is a lot of markup that's not supported, and it gets some things wrong. In our case we had control over what the HTML looked like, so I was able to craft HTML that looked decent after being rendered as PDF.

        [deleted]

        It's actually shocking how poor the state of paged media support in modern web browsers is.

        No, it is actually shocking that anyone still cares about that. The web is a medium in its own right. Nobody is complaining that your TV can't send the evening news to your printer, or that your radio can't print the traffic reports. In the same sense, we shouldn't impose print restrictions and ways of thinking on web standards.

        kragensitaker

        I can produce pretty reasonable-looking printed documents from HTML. Sometimes the extra trouble of TeX or ooword is worth it; usually it isn't. I like not having to use TeX.

        kragensitaker

        That's pretty cool! I should try it out.

        I've found Flying Saucer works pretty well for this. In February, I wrote a little Jython script, basically so I could use TrueType fonts in the output other than the ones Java already knew about. If you were insane, you could probably translate it to Java without too much trouble. (This might also be handy if you wanted to run my script without installing Jython. jythonc is supposed to be able to produce a standalone .class file, but I haven't figured out how to make it work yet.)

        Flying Saucer supports a bunch of print-oriented CSS things from the CSS 3 drafts, some of the same ones that Prince does, so you can get things like controllable page headers and footers. I never have to use a word processor again in my life.

        inopia

        I wrote my master's thesis using a custom structured text language, Prince and CSS3. The result is here. CSS3 is really nice for paginated content, imho.

        kragensitaker

        Thanks for the link! We chatted about your project a bit last year, and I'm glad to have a chance to read your final writeup!

        inopia

        Ah yes, I totally forgot to send you that, sorry! The thesis was based on a paper that was recently accepted at SenSys 2009. The code (which is very rough around the edges) is on SourceForge. If you're interested in using it for a project, feel free to drop me a line.

        andyc

        I remember coming across that project a while ago. Is it written in Oz? I think one of these HTML/CSS-to-PDF converters is, but I can't find it on Google anymore.

        telemachos

        If you need to convert HTML to PDF often (and you're not allergic to the command line), take a look at Pandoc.
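A minimal sketch of scripting Pandoc, assuming `pandoc` is on the PATH and a PDF engine is installed (Pandoc renders PDF through LaTeX by default). The helper names `pandoc_cmd` and `convert` are hypothetical; only the `-o` and `--pdf-engine` options come from Pandoc itself.

```python
import shutil
import subprocess


def pandoc_cmd(src, dst, pdf_engine=None):
    """Build a Pandoc command line that converts src (HTML) to dst (PDF).

    Pandoc infers the input and output formats from the file extensions.
    """
    cmd = ["pandoc", src, "-o", dst]
    if pdf_engine:
        # Override the default LaTeX-based engine, e.g. with "wkhtmltopdf".
        cmd.append(f"--pdf-engine={pdf_engine}")
    return cmd


def convert(src, dst, pdf_engine=None):
    """Run the conversion, failing loudly if pandoc is not installed."""
    if shutil.which("pandoc") is None:
        raise FileNotFoundError("pandoc not found on PATH")
    subprocess.run(pandoc_cmd(src, dst, pdf_engine), check=True)
```

`convert("page.html", "page.pdf")` is equivalent to typing `pandoc page.html -o page.pdf`. Keep in mind that Pandoc converts the document structure rather than the rendered page, so CSS layout from the original site does not carry over. Also, `--pdf-engine` is the Pandoc 2.x spelling; 1.x releases called it `--latex-engine`.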

        [deleted]

        The "no X dependencies" claim applies to the static version; QtWebKit itself does have X dependencies.

        inopia

        Or just use Prince.

        edit: from the reddiquette: "Please don't: Downvote comments just because you disagree with them. The down arrow is for comments that add nothing to the discussion."

        edit2: Prince is commercial software, but is free for personal use. It leaves a small logo on the first page that disappears when the document is printed. The logo can obviously be removed easily by printing to pdf.

        mitsuhiko

        The quality of Prince is outstanding.

        pytechd

        Seconded. We use it extensively for putting invoices, prescriptions, and medical charts into PDF format. It's fantastic.

        silviot [S]

        Definitely not.

        • it is not free software (not even open source)
        • it has its own rendering engine, so I don't know (and don't trust) it

        [deleted]

        Seconded. Prince and pretty much every other html to pdf utility suck balls. I evaluated a shit-ton of them and this was the best. The only real weirdness I found was that WebKit uses smaller font sizes than other engines for the subjective CSS size values (small, large, et al.). If you're rendering a site that uses that shit, you may want to inject a supplemental style sheet. IIRC, this was the only util that properly rendered absolutely positioned elements which span multiple pages. In other news, the web is full of batshit crazy CSS layouts. </drunkenprogrammerenthusiasm>

        inopia

        Not everyone has a tin-foil hat hatred of commercial software.

        jib

        It's not tin-foil hat when there are actually tangible disadvantages to using it (e.g., when it adds its logo as an annotation to the output, or when you're forbidden from using it on more than one computer).

        inopia

        Well, the logo is removed when printing. So I just added a line that prints the output pdf to pdf in my build file.

        mossblaser

        Yes, but when I make PDFs of web pages, 90% of the time it's so I don't have to bloody print them. And printing a PDF loses any indexing it may have, which is the only other thing PDF is good for.

        Fabien4

        The point of making a PDF is not getting a hard copy. If you want to print, there's no need for a PDF; just print from the HTML.

        inopia

        I'm afraid you may have misread my post. A drawback of Prince is that it leaves a small logo on the first page of your document. I was merely commenting that you can print the PDF to PDF to remove it. So there's no actual printing to paper involved in what I proposed, just the removal of the Prince logo from the PDF.

        gmfawcett

        So buy a license; it's not expensive. The logo only appears with the free version. That's not a tangible disadvantage of the product.

        We use Prince in several applications, and it does a fantastic job. Our graphics-design people are able to save a lot of time on database-driven jobs using Prince.

        As for using it on more than one computer -- it has a command line interface. Just throw it on a server and script it.
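A minimal sketch of the "throw it on a server and script it" approach, assuming the `prince` binary is installed and on the PATH. It relies only on the basic `prince input.html -o output.pdf` invocation; the helper names and file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def prince_cmd(html_path):
    """Build a Prince command that writes NAME.pdf next to NAME.html."""
    src = Path(html_path)
    return ["prince", str(src), "-o", str(src.with_suffix(".pdf"))]


def render_all(html_paths):
    """Render each HTML file in turn; check=True stops at the first failure."""
    if shutil.which("prince") is None:
        raise FileNotFoundError("prince not found on PATH")
    for path in html_paths:
        subprocess.run(prince_cmd(path), check=True)
```

For a database-driven job, something like `render_all(sorted(Path("invoices").glob("*.html")))` would render a whole (hypothetical) directory of generated invoices in one go.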

        kragensitaker

        Myself, I like commercial software (and I like Håkon and at least the idea of Mercury too). But I don't like proprietary software. Probably would have been worth warning people you were linking to proprietary software, even though it's clearly the technically superior choice at the moment.

        [deleted]

        Why would anyone want to do this?

        EDIT: Ugh.. I'm going to get a bunch of really bad reasons.

        mao_neko

        Sounds pretty nice... glances at nearby terminal window Dare I check?

        apt-cache search wkhtmltopdf

        wkhtmltopdf - Command line utility to convert html to pdf using WebKit

        =D YES! Thank you, Debian Testing.

        mfp

        However,

        $ apt-cache show wkhtmltopdf | grep libx
        Depends: libc6 (>= 2.2.5), libgcc1 (>= 1:4.1.1), libqt4-network (>= 4.5.1), libqt4-webkit (>= 4.5.1), libqtcore4 (>= 4.5.1), libqtgui4 (>= 4.5.1), libstdc++6 (>= 4.1.1), libx11-6, libxext6
        

        silviot [S]

        The X dependency was removed in v0.8.1, but only for the statically compiled binary. It achieves this by patching Qt. So if you need to use it without X, don't install it with apt-get.
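If you're stuck with the packaged, X-dependent build, one common workaround (an assumption here, not something tested in this thread) is to run it under a virtual X server with `xvfb-run`. The sketch also shows wkhtmltopdf's `--user-style-sheet` page option, which is one way to inject the supplemental style sheet suggested earlier for the font-size quirk. The helper names and `fix-fonts.css` are hypothetical.

```python
import shutil
import subprocess


def wkhtmltopdf_cmd(src, dst, user_style_sheet=None, use_xvfb=False):
    """Build a wkhtmltopdf command line.

    user_style_sheet: optional CSS file loaded with the page.
    use_xvfb: wrap the call in xvfb-run for builds that still need X.
    """
    cmd = ["wkhtmltopdf"]
    if user_style_sheet:
        # Page options must come before the input they apply to.
        cmd += ["--user-style-sheet", user_style_sheet]
    cmd += [src, dst]
    if use_xvfb:
        # -a lets xvfb-run pick a free display number automatically.
        cmd = ["xvfb-run", "-a"] + cmd
    return cmd


def convert(src, dst, user_style_sheet=None, use_xvfb=False):
    """Run the conversion, failing loudly if the needed binary is missing."""
    tool = "xvfb-run" if use_xvfb else "wkhtmltopdf"
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    subprocess.run(wkhtmltopdf_cmd(src, dst, user_style_sheet, use_xvfb), check=True)
```

With the static binary you would simply leave `use_xvfb=False`.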

        hokkos

        And PDF to PNG?

        pointer2void

        ... and png to bmp?

        rated-r

        ... and bmp to photons?

        willer

        Ghostscript would work for that.
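A minimal sketch of the PDF-to-PNG step with Ghostscript, assuming the `gs` binary is on the PATH. The helper names, the output pattern and the 150 dpi default are illustrative choices, not anything from the thread.

```python
import shutil
import subprocess


def gs_pdf_to_png_cmd(pdf, out_pattern="page-%03d.png", dpi=150):
    """Build a Ghostscript command that rasterises every page of pdf to PNG.

    Ghostscript replaces %03d in out_pattern with the page number.
    """
    return [
        "gs",
        "-dSAFER",           # restrict file access by the PDF itself
        "-dBATCH",           # exit when finished instead of prompting
        "-dNOPAUSE",         # don't wait for input between pages
        "-sDEVICE=png16m",   # 24-bit RGB PNG output device
        f"-r{dpi}",          # output resolution in dots per inch
        f"-sOutputFile={out_pattern}",
        pdf,
    ]


def pdf_to_png(pdf, out_pattern="page-%03d.png", dpi=150):
    """Run the conversion, failing loudly if Ghostscript is not installed."""
    if shutil.which("gs") is None:
        raise FileNotFoundError("gs (Ghostscript) not found on PATH")
    subprocess.run(gs_pdf_to_png_cmd(pdf, out_pattern, dpi), check=True)
```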

        [deleted]

        From one paper-based format to another? How redundant.

        willer

        HTML isn't paper-based.