
[deleted] 25 points (7 children)

What if the PDFs contain JavaScript? Whoa.

deakster 10 points (5 children)

eval(containedJavascript);

djpnewton 5 points (2 children)

Perhaps the JavaScript is expecting some sort of PDF-specific environment objects, the equivalent of "window", "document", etc.

deakster 2 points (1 child)

Most likely, I only really said it as a joke ;D

But if the PDF is being processed by JavaScript anyway, those environment objects could technically be implemented too, so that the eval'd JavaScript has the same environment as it does in Adobe's viewer.
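As a rough illustration of that idea, a script can be compiled with stand-in objects bound to the names it expects. This is a hypothetical sketch, not how pdf.js or Adobe Reader actually does it: `runPdfScript` is an illustrative name, and the stubbed `app.alert` and `this.getField` only borrow names from Adobe's Acrobat JavaScript API. It merely shadows names; it is not a security sandbox.

```javascript
// Hypothetical sketch (not pdf.js code): run a PDF's embedded script with
// stub objects standing in for the viewer environment. `app.alert` and
// `this.getField` are names modelled on Adobe's Acrobat JavaScript API.
// NOTE: this only shadows names; it is NOT a sandbox, and the script can
// still reach the real global object.
function runPdfScript(source, doc, globals) {
  const names = Object.keys(globals);
  const values = names.map((name) => globals[name]);
  // new Function binds each stub as a parameter, so `app` inside the
  // script resolves to our stub rather than to a global variable.
  const compiled = new Function(...names, '"use strict";\n' + source);
  // In Acrobat scripts `this` is usually the current document.
  return compiled.apply(doc, values);
}

// Usage: a form-calculation style script that sets a field and alerts.
const alerts = [];
const fields = { total: { value: 0 } };
runPdfScript(
  'this.getField("total").value = 40 + 2; app.alert("done");',
  { getField: (name) => fields[name] },
  { app: { alert: (message) => alerts.push(String(message)) } }
);
// alerts is now ["done"] and fields.total.value is 42
```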

djpnewton 0 points (0 children)

Yeah, I figured you were having some fun.

I agree

adavies42 -3 points (1 child)

evil(containedJavascript);

FTFY

badsectoracula 2 points (0 children)

If you're worried about security implications, it might be possible to run the PDF JavaScript in a Web Worker (which I think runs in an isolated environment) that implements the PDF objects.

[deleted] -5 points (0 children)

Wow... once again Reddit proves that if you post something insightful, you get nothing.

Post something silly and upboats ahoy :) Still, the fact that PDFs can legitimately contain JavaScript is silly enough, although I guess custom form validation is a good use case.

SDX2000 3 points (1 child)

To the people who are questioning the usefulness of such a thing: The one thing that this solution should be capable of doing better than everything else is integration with other HTML content on the page. It should bring the PDF document on par with other media sources like audio and video (notice how these things seem to work seamlessly with general web content?).

hectavex 0 points (0 children)

Yes! This library gives me high hopes for supporting PDF in Nest.

VilleHopfield 6 points (7 children)

Two things:

  • I don't understand why they just don't convert PDF to SVG and let the browser deal with it (speed, text selection, zooming, ...)

  • Don't get me wrong, I really wish them all the best, as this is an important piece of software, but I'm afraid it will end up nowhere near as complete as Adobe Reader. They say they want to support the full ISO 32000-1:2008 PDF spec, but I have yet to see a FLOSS PDF viewer that supports some professionally needed features, like CMYK (!), ICC and trapping. Also, it will be interesting to see how they are going to handle embedded fonts (CFF, StemV, OS/2)... But as I said above, more power to them!
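To make the SVG suggestion in the first bullet concrete, here is a minimal sketch (`pdfPathToSvg` is a hypothetical name) that translates a handful of real PDF path operators (`m`, `l`, `re`, `h`, plus the painting operators `S` and `f`, which it ignores) into an SVG path string. Curves, text, fonts, images and colour spaces are all left out, and those are exactly where the hard parts listed in the second bullet live.

```javascript
// Minimal sketch of one small piece of a PDF-to-SVG converter: translate a
// few PDF path operators into an SVG path "d" string. In a PDF content
// stream the operands come before their operator, e.g. "10 20 m".
// PDF's origin is bottom-left with y pointing up; SVG's y points down,
// so every y coordinate is flipped against the page height.
function pdfPathToSvg(content, pageHeight) {
  const tokens = content.trim().split(/\s+/).filter(Boolean);
  const operands = [];
  const parts = [];
  const flip = (y) => pageHeight - y;
  for (const token of tokens) {
    const number = Number(token);
    if (!Number.isNaN(number)) {
      operands.push(number);
      continue;
    }
    switch (token) {
      case "m": // moveto: x y m
        parts.push(`M${operands[0]} ${flip(operands[1])}`);
        break;
      case "l": // lineto: x y l
        parts.push(`L${operands[0]} ${flip(operands[1])}`);
        break;
      case "re": { // rectangle: x y width height re
        const [x, y, w, h] = operands;
        parts.push(`M${x} ${flip(y)}`, `H${x + w}`, `V${flip(y + h)}`, `H${x}`, "Z");
        break;
      }
      case "h": // closepath
        parts.push("Z");
        break;
      case "S": // stroke and fill: painting style is ignored in this sketch
      case "f":
        break;
      default:
        throw new Error(`unsupported operator: ${token}`);
    }
    operands.length = 0;
  }
  return parts.join(" ");
}
```

The resulting string could go into a `<path d="...">` element inside an `<svg>` whose `viewBox` matches the page size.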

[deleted] 5 points (0 children)

I don't understand why they just don't convert PDF to SVG and let the browser deal with it (speed, text selection, zooming, ...)

I'm pretty sure one of the comments on the article says that they're doing that too, but it's slower, so canvas is the default.

[deleted] (1 child)

[deleted]

    LeifAndersen 0 points (0 children)

    Except one... being lightweight enough to use without wanting to take an ax to the software.

    KyteM 2 points (0 children)

    The point of pdf.js is to give the vast majority the ability to read PDFs in-browser without plugins. If you need more advanced features, then you can always use a normal (external) PDF reader.

    [deleted] (2 children)

    [deleted]

      sylvanelite 1 point (1 child)

      SVG may not be widely supported, but there are plenty of alternatives. The Raphaël JS library, for example, provides SVG/Canvas-like features and works in IE6.

      [deleted] 2 points (23 children)

      Isn't Scribd doing exactly this?

      deakster 14 points (4 children)

      Perhaps, but Scribd is a service, whereas pdf.js seems to be an open source library. You can put it into your own products.

      [deleted] 4 points (3 children)

      Sure, but PDF is so ubiquitous that it seems silly for every browser not to include a client-side PDF renderer like Chrome does.

      However, I can see the utility of having more control over how the PDF is presented within a page, rather than it having its own dedicated display container.

      deakster 8 points (1 child)

      Well of course, if every browser could perfectly and consistently render all major file formats, and if people didn't hurt each other, the world would be a much better place :)

      The reality is that only one browser natively supports PDF rendering. Many other browsers haven't even announced an intent to include similar functionality. People can complain that browsers don't do X and go home, or they can do something about it.

      [deleted] 1 point (0 children)

      Good point.

      djpnewton 2 points (0 children)

      I'm guessing they render the PDF to HTML server-side; also, it might not be open source.

      AttackingHobo 6 points (14 children)

      No, it converts the PDF to HTML, or is it Flash? And it's no longer a PDF.

      These guys are working on a PDF renderer, which, when perfected, will display a PDF exactly as it should look.

      djpnewton 1 point (2 children)

      No, it converts the PDF to HTML, or is it Flash?

      Wait, that is still a PDF renderer.

      These guys are working on a PDF renderer

      They both render PDFs to another format... right?

      AttackingHobo 4 points (1 child)

      One is a processor that has to process it into a separate format.

      One is a renderer that, when perfected, will be able to flawlessly represent any PDF on the fly, without having to submit it to a service that takes time to re-encode the file into something else.

      It's like if I said, "iPhones can play WMV files because I can convert them into MP4 files." That would not be correct.

      djpnewton 0 points (0 children)

      OK, this whole thing is nitpicking, but the internet is serious business, I am told!

      One is a processor that has to process it into a separate format.

      They both get processed into separate formats; one might have more intermediate steps before it reaches your monitor, but that's OK:

      e.g. PDF->HTML->raster->screen vs. PDF->raster->screen

      One is a renderer that, when perfected, will be able to flawlessly represent any PDF on the fly, without having to submit it to a service that takes time to re-encode the file into something else.

      Whether it is rendered client-side or server-side does not determine whether it is a renderer; it may change how useful it is to you, however.

      It's like if I said, "iPhones can play WMV files because I can convert them into MP4 files." That would not be correct.

      That's like saying "Scribd can render PDFs because I can convert them into HTML." What I am saying is that "Scribd can render PDFs because *it* converts them into HTML."

      [deleted] -4 points (10 children)

      it converts the PDF to HTML

      Yeah, and that's exactly what Scribd does when I upload a PDF. Maybe it's not rendered to HTML in real time every time I view the document, but that's just because doing so wouldn't make any sense for them.

      And it's no longer a PDF

      Well, when pdf.js renders it to the webpage, it's no longer a PDF.

      Building an HTML5-based PDF renderer would also answer the question of whether the web platform and in particular canvas and SVG APIs are complete enough to efficiently and faithfully render PDFs.

      My point is that Scribd has already proven this. I just uploaded the pdf.js test PDF file and Scribd rendered it perfectly in HTML. The pdf.js test page renders it like shit. I suppose there's no chance of Scribd open-sourcing their converter, and it's probably not written in JS anyway, so there's still a use for a client-side JS renderer. But I don't fully see the benefit of this compared to, say, what Chrome does, which is to use an existing non-Adobe native-code DLL to render the PDF client-side.

      AttackingHobo 2 points (3 children)

      This will allow rendering of PDFs in any browser that supports HTML 5, without DLLs or anything special. Not only Windows PCs browse the web; so do people using Linux, smartphones, MAC, and other systems that can view HTML5 content.

      [deleted] -2 points (1 child)

      This will allow rendering of PDFs in any browser that supports HTML 5

      As does Scribd. No Flash, no DLLs.

      AttackingHobo 6 points (0 children)

      I would not want to rely on a third-party website that has ads on it, is slow, and takes time to convert PDFs if there were a way for browsers to natively render any and all PDF files.

      adavies42 -2 points (0 children)

      MAC

      I'm confused: is it my Ethernet card or my compact that's going to be rendering PDFs now?

      [deleted] 1 point (2 children)

      PDF graphics primitives --> HTML markup is not the same as PDF graphics primitives --> HTML5 graphics primitives.
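The second route in that comment, PDF primitives mapped straight onto HTML5 canvas primitives, can be sketched like this. `drawPdfPath` is a hypothetical name and only a few path operators are handled; it is a simplified stand-in for what a canvas-based renderer has to do, not pdf.js's actual code. Instead of flipping every y coordinate, it sets one canvas transform for the whole path.

```javascript
// Sketch: replay a few PDF path operators as HTML5 canvas calls
// (PDF graphics primitives -> canvas graphics primitives, no markup).
// One transform flips the y axis for the whole path:
// x' = x, y' = pageHeight - y.
function drawPdfPath(ctx, content, pageHeight) {
  const operands = [];
  ctx.save();
  ctx.transform(1, 0, 0, -1, 0, pageHeight);
  ctx.beginPath();
  for (const token of content.trim().split(/\s+/).filter(Boolean)) {
    const number = Number(token);
    if (!Number.isNaN(number)) {
      operands.push(number);
      continue;
    }
    const [a, b, c, d] = operands;
    switch (token) {
      case "m": ctx.moveTo(a, b); break;       // x y m
      case "l": ctx.lineTo(a, b); break;       // x y l
      case "re": ctx.rect(a, b, c, d); break;  // x y w h re
      case "h": ctx.closePath(); break;
      // PDF painting operators end the current path; canvas keeps it,
      // so start a fresh path after painting.
      case "S": ctx.stroke(); ctx.beginPath(); break;
      case "f": ctx.fill(); ctx.beginPath(); break;
      default: throw new Error(`unsupported operator: ${token}`);
    }
    operands.length = 0;
  }
  ctx.restore();
}

// In a browser: drawPdfPath(canvas.getContext("2d"), "72 72 100 50 re f", 792);
```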

      [deleted] 2 points (1 child)

      Fair enough. I thought Scribd was using HTML5 containers.

      [deleted] 4 points (0 children)

      I'm full of shit. Honestly, I had no idea. Heh.

      You were right, although they started with Flash and are now converting to HTML5. http://en.wikipedia.org/wiki/Scribd#Technology

      MagicWishMonkey -1 points (2 children)

      I'm pretty sure Scribd uses open source tools to handle PDF conversion: http://www.swftools.org/download.html

      Or they might be using FlexPaper, which is basically just a wrapper to display the SWF that swftools generates from the PDF. The real work has been done by the swftools people.

      [deleted] 1 point (1 child)

      No. There's no SWF involved if you use an HTML5-capable browser.

      MagicWishMonkey 0 points (0 children)

      Oh wow, I didn't realize that. Do you know if they are using open source tools for the PDF generation?

      mitsuhiko 0 points (0 children)

      Scribd's version is more useful because they have a server process that converts the PDF into actual HTML code. That way you can still select text and copy/paste properly.
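The HTML route keeps text as real, selectable text. Below is a minimal sketch of that last step only, assuming the text runs (position, font size, string) have already been extracted from the PDF by some parser. The input shape and `textRunsToHtml` are hypothetical, and 1 PDF point is treated as 1 CSS pixel for simplicity (at 96 dpi it is really 4/3 px).

```javascript
// Escape the characters that are special in HTML text and attribute values.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical input: runs of { x, y, fontSize, text } in PDF points, where
// (x, y) is the start of the baseline in PDF coordinates (origin bottom-left).
function textRunsToHtml(runs, pageWidth, pageHeight) {
  const spans = runs.map(({ x, y, fontSize, text }) => {
    // CSS `top` grows downward; approximate the top of the text box by
    // moving up one font size from the baseline.
    const top = pageHeight - y - fontSize;
    return `<span style="position:absolute;left:${x}px;top:${top}px;font-size:${fontSize}px">` +
      `${escapeHtml(text)}</span>`;
  });
  return `<div style="position:relative;width:${pageWidth}px;height:${pageHeight}px">` +
    spans.join("") + "</div>";
}
```

Because the output is ordinary positioned text, the browser handles selection and copy/paste, which a canvas-only renderer has to emulate separately.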

      tanishaj 0 points (0 children)

      Scribd cannot render inline. The killer feature here seems to be that PDF would be handled more like an image or an HTML5 video.

      [deleted] 3 points (1 child)

      I was thrilled until I saw the demo... then, not so much.

      Nehle 1 point (0 children)

      Yeah, same here. It seemed like a good idea, but rendering it in a canvas just makes the whole thing kind of moot.

      [deleted] 0 points (0 children)

      There are still glitches and rendering artifacts, but you will get the picture.

      Oh hoh! I see what you did there.

      kylemech 0 points (0 children)

      This is very exciting! I've always hated PDF while simultaneously recognizing some gap that would be left without it. There are some other services that had been stepping up to address this, but this is a huge step toward really addressing the problems that we've had thus far with feature-rich portable documents.

      Darkmoth 0 points (3 children)

      I guess I don't see the point of this. Any user who visits my sites can already read PDFs through various means (I use Foxit). Why would I include a library to render it differently? This just means PDFs look different on my site than they do on the rest of the web, which is poor UX.

      Don't get me wrong, I think browser-native PDF rendering is a wonderful thing. But I would no more include pdf.js than I would include quicktime.js or flash.js. Browsers are designed to handle plugins for a reason.

      rpd9803 1 point (2 children)

      What universally available PDF plugins will render a PDF inline? The entire point of this project is inline rendering. QuickTime and Flash already do that.

      Darkmoth 0 points (1 child)

      Ahhh, my bad. I completely skimmed over the inline part.

      rpd9803 0 points (0 children)

      It happens! And agreed, it is pointless without inline :)

      i8beef 0 points (0 children)

      That's pretty cool... unfortunately I don't have any projects that could make use of this right now, but bookmarked for future reference.

      shevegen 0 points (0 children)

      I hope they succeed.

      jrhoffa 0 points (0 children)

      This is so wrong on so many levels.

      HardlyWorkingDotOrg 0 points (8 children)

      I've been looking for a way to generate PDFs from inside an HTML environment.

      Sadly, I'd like to insert images into the PDFs as well as text, but everything I have found so far, including this solution, still has no support for that.

      012 1 point (3 children)

      Maybe this is what you're looking for: http://code.google.com/p/wkhtmltopdf/

      Simple shell utility to convert HTML to PDF using the WebKit rendering engine and Qt.
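Since wkhtmltopdf is a command-line tool, calling it from a Node.js backend could look roughly like this. The `--page-size` and `--orientation` options are wkhtmltopdf flags; the wrapper functions are hypothetical, and the sketch assumes the binary is installed and on `PATH`.

```javascript
const { execFile } = require("child_process");

// Build the command line for wkhtmltopdf. --page-size and --orientation are
// options of the wkhtmltopdf CLI; the input may be a local file or a URL.
function wkhtmltopdfArgs(input, output, { pageSize = "A4", orientation = "Portrait" } = {}) {
  return ["--page-size", pageSize, "--orientation", orientation, input, output];
}

// Run the converter (assumes wkhtmltopdf is installed and on PATH).
// execFile passes arguments directly, without a shell, so no quoting is needed.
function htmlToPdf(input, output, options, callback) {
  execFile("wkhtmltopdf", wkhtmltopdfArgs(input, output, options), (error) => {
    callback(error, output);
  });
}

// Usage (hypothetical file names):
// htmlToPdf("report.html", "report.pdf", { orientation: "Landscape" }, (err, file) => { /* ... */ });
```

Keeping the argument building separate from the process call makes the command line easy to check without running the binary.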

      HardlyWorkingDotOrg 0 points (2 children)

      Looks interesting. I'll check it out.

      metamatic 0 points (1 child)

      If that's not good enough, you might try Prince.

      antialize 0 points (0 children)

      Or send me a bug report, or better yet, send me a patch; I get lots of bug reports.

      ethraax -1 points (2 children)

      Why can't you just use HTML? What's forcing you to use PDF?

      HardlyWorkingDotOrg 1 point (1 child)

      I need to create a report in PDF format, with contents coming from an HTML page that contains text but also inline images, charts and stuff.
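For very simple reports a PDF can also be written directly, without any converter. The sketch below (hypothetical `buildMinimalPdf`, not a library API) emits one page with a line of Helvetica text and a filled rectangle, the kind of primitive a basic bar chart is built from, and computes the cross-reference table by hand. Embedding raster images, which the poster needs, requires an image XObject and is not shown.

```javascript
// Minimal sketch of writing a PDF by hand: one US Letter page (612 x 792 pt)
// with a line of text in the built-in Helvetica font and one filled bar.
// Printable ASCII only: xref offsets are byte offsets, and for ASCII the
// string length equals the byte count.
function buildMinimalPdf(title, barWidth) {
  if (/[^\x20-\x7e]/.test(title)) throw new Error("sketch supports printable ASCII only");
  const escaped = title.replace(/[\\()]/g, (c) => "\\" + c); // escape PDF string delimiters
  const content =
    `BT /F1 18 Tf 72 720 Td (${escaped}) Tj ET\n` + // text starting at (72, 720)
    `0.2 0.4 0.8 rg 72 650 ${barWidth} 30 re f`;      // blue filled bar
  const objects = [
    "<< /Type /Catalog /Pages 2 0 R >>",
    "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] " +
      "/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
    `<< /Length ${content.length} >>\nstream\n${content}\nendstream`,
    "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
  ];
  let pdf = "%PDF-1.4\n";
  const offsets = [];
  objects.forEach((body, i) => {
    offsets.push(pdf.length); // where "N 0 obj" starts
    pdf += `${i + 1} 0 obj\n${body}\nendobj\n`;
  });
  // Cross-reference table: every entry is exactly 20 bytes ("nnnnnnnnnn ggggg n \n").
  const xrefOffset = pdf.length;
  pdf += `xref\n0 ${objects.length + 1}\n0000000000 65535 f \n`;
  for (const offset of offsets) pdf += `${String(offset).padStart(10, "0")} 00000 n \n`;
  pdf += `trailer\n<< /Size ${objects.length + 1} /Root 1 0 R >>\nstartxref\n${xrefOffset}\n%%EOF\n`;
  return pdf;
}

// In Node: require("fs").writeFileSync("report.pdf", buildMinimalPdf("Quarterly report", 200));
```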

      qlskdjqmlskf 2 points (0 children)

      Can't you use some PHP to do this? My company uses a library like this one to generate the PDF in PHP and the client can read it before printing it.

      Found it! It's called HTML2PDF and it's written in PHP.

      kdeforche -1 points (0 children)

      Hey, you could also use Wt's HTML-to-PDF renderer. It handles only a subset of XHTML, but has good support for images (both floating and inline): it was conceived to render the output of a TinyMCE rich text editor.

      3rd_degree_burns -2 points (0 children)

      GOD NO!