PDF to DOM conversion using javascript - Stack Overflow

I've been at it for quite some time, and all I could uncover was this pdf2dom parser and what is probably a reverse-engineered version of it. Anyway, here are my questions. For any rendering engine, the input is a stream of data (in my case the PDF content) and the output is a chosen format (in my case DOM, HTML and CSS).

  1. However, instead of using Java or C++, is it possible to get the stream of "PDF data" (which is something I have no idea about) from the server, store it in a JavaScript variable, and use JavaScript to render it and append it to the DOM?

  2. What does the raw "PDF data" look like (is there any particular format, etc.)?

All inputs are welcome.

NOTE: Should be IE compatible.

Asked Dec 19, 2011 at 10:12 by Ashwin Krishnamurthy (edited Dec 19, 2011 at 12:53)

2 Answers

It's been done already: the result is pdf.js. Note that it works by rendering the PDF onto a canvas. Fidelity can be guaranteed that way; some PDF features currently can't be reproduced outside the canvas.

PDF's graphics model is derived from PostScript (without the general-purpose programming features), plus options for embedding Flash, JavaScript and all sorts of other things.
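To give a rough idea of what the raw bytes look like: a PDF file starts with a header comment such as `%PDF-1.4`, contains numbered objects (`1 0 obj ... endobj`), and ends with a `startxref` offset followed by `%%EOF`. Below is a minimal sketch that inspects those markers. It assumes you already have the bytes in an array-like value (for example from an XMLHttpRequest with `responseType = 'arraybuffer'`, where the browser supports it). `describePdfBytes` is a hypothetical helper name, and the sample is an abbreviated illustration, not a valid PDF.

```javascript
// Sketch only: look at the outer structure of raw PDF bytes.
// `describePdfBytes` is a made-up helper, not part of pdf.js or any library.
function describePdfBytes(bytes) {
  // Decode one byte to one character so binary data cannot break decoding.
  var text = '';
  for (var i = 0; i < bytes.length; i++) {
    text += String.fromCharCode(bytes[i]);
  }

  // The file begins with a header comment such as "%PDF-1.4".
  var header = /^%PDF-(\d+\.\d+)/.exec(text);

  // Near the end, "startxref" gives the byte offset of the cross-reference
  // table, and the file finishes with "%%EOF".
  var startxref = /startxref\s+(\d+)\s+%%EOF\s*$/.exec(text);

  return {
    isPdf: header !== null,
    version: header ? header[1] : null,
    xrefOffset: startxref ? parseInt(startxref[1], 10) : null,
    // Rough count only: the pattern could also occur inside binary stream data.
    objectCount: (text.match(/\b\d+ \d+ obj\b/g) || []).length
  };
}

// Abbreviated, illustrative sample (the offset is a placeholder).
var sample = '%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\nstartxref\n9\n%%EOF\n';
var sampleBytes = [];
for (var j = 0; j < sample.length; j++) {
  sampleBytes.push(sample.charCodeAt(j));
}
var summary = describePdfBytes(sampleBytes);
```

In real files most of the interesting content sits inside compressed streams between those object markers, so this only shows the container format, not the page content.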

Trivially translating PDF to HTML (or the DOM) and having it render in a correct-ish manner is all but impossible. As an example, PDF uses JPEG images, but with subtle changes here and there, which means you have to convert them before using them anywhere else. Try reading some presentations from the PDF.js guys, and you'll find quite a long list of WTFs.

However, if you only have simple PDFs (plain text; no images, etc.) and don't care about preserving anything but the simplest layout, you should be able to scrape the string data out of the PDFs and put it into the DOM.
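As a rough illustration of that scraping idea, here is a minimal sketch that pulls literal strings out of a page content stream by looking for the `Tj` and `TJ` text-showing operators. It rests on several assumptions: the stream is already uncompressed (real files usually need inflating first), the font uses a simple encoding so the bytes are readable characters, and strings contain no unescaped nested parentheses or hex strings. `extractText` and `unescapePdfString` are hypothetical helper names.

```javascript
// Sketch only: scrape text from an UNCOMPRESSED PDF content stream.
// Helper names are made up for illustration; this is not a full PDF parser.

// Resolve backslash escapes inside a PDF literal string body.
function unescapePdfString(body) {
  var named = { n: '\n', r: '\r', t: '\t', b: '\b', f: '\f' };
  return body.replace(/\\([0-7]{1,3}|[\s\S])/g, function (whole, code) {
    if (/^[0-7]/.test(code)) return String.fromCharCode(parseInt(code, 8));
    if (named.hasOwnProperty(code)) return named[code];
    return code; // "\(", "\)", "\\" and unknown escapes: drop the backslash
  });
}

// A literal string: "(" ... ")" with backslash escapes allowed inside.
var PDF_STRING = /\(((?:\\[\s\S]|[^\\()])*)\)/g;
// Either "(string) Tj" or "[(str) -20 (ing)] TJ".
var SHOW_TEXT = /\(((?:\\[\s\S]|[^\\()])*)\)\s*Tj|\[((?:\((?:\\[\s\S]|[^\\()])*\)|[^\]()])*)\]\s*TJ/g;

function extractText(contentStream) {
  var chunks = [];
  var match;
  SHOW_TEXT.lastIndex = 0;
  while ((match = SHOW_TEXT.exec(contentStream)) !== null) {
    if (match[0].charAt(0) === '(') {
      chunks.push(unescapePdfString(match[1]));
    } else {
      // TJ takes an array mixing strings and kerning numbers; keep the strings.
      var pieces = [];
      match[2].replace(PDF_STRING, function (whole, body) {
        pieces.push(unescapePdfString(body));
        return whole;
      });
      chunks.push(pieces.join(''));
    }
  }
  return chunks;
}

var stream = 'BT /F1 12 Tf 72 720 Td (Hello, world) Tj 0 -14 Td [(PDF) -20 ( text)] TJ ET';
var chunks = extractText(stream); // ["Hello, world", "PDF text"]

// In a browser, append each chunk to the page as a paragraph.
if (typeof document !== 'undefined') {
  for (var i = 0; i < chunks.length; i++) {
    var p = document.createElement('p');
    p.appendChild(document.createTextNode(chunks[i]));
    document.body.appendChild(p);
  }
}
```

This ignores positioning operators entirely, so reading order and line breaks are only as good as the order in which the PDF producer happened to emit the text.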

Personally, however, I believe it would be simpler either to require users to have a plugin (Flash, Acrobat, ...) or to render the PDFs server-side and serve them to the browser as images.
