I want to download web pages that use JavaScript to output their data. Wget can do everything else, but it cannot run JavaScript.
Even something like: firefox -remote "saveURL(www.mozilla, myfile.html)"
would be great (unfortunately, that kind of command does not exist).
Asked Mar 24, 2009 at 23:07 by Nick Nolan; edited Mar 24, 2009 at 23:10 by UnkwnTech. 5 Answers
I'd look at the Selenium browser automation tool (http://seleniumhq/). You can use it to automate visiting a web page and saving the resulting HTML.
We used it to great success for a similar purpose on a prior project.
I second Alex's suggestion for Selenium. It runs in the browser, so it can capture the output HTML after JavaScript has modified the DOM.
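As a rough illustration of the Selenium approach, the sketch below loads a page, waits until the browser reports the document as loaded, and writes the resulting DOM to a file. It assumes the Python `selenium` package and a local Firefox with geckodriver; the helper names `make_headless_firefox` and `save_rendered_html` and the timeout values are made up for this example. Note that a `readyState` of `complete` does not guarantee that late asynchronous updates have finished, so some pages may need a more specific wait.

```python
import time
from pathlib import Path


def make_headless_firefox():
    """Start Firefox without a visible window (assumes selenium + geckodriver)."""
    # Imported lazily so the rest of this module works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)


def save_rendered_html(driver, url, out_path, timeout=10.0, poll=0.25):
    """Load url, wait for the document to finish loading, save the DOM as HTML."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    # Poll the page's own readyState; scripts may still run after "complete".
    while driver.execute_script("return document.readyState") != "complete":
        if time.monotonic() > deadline:
            raise TimeoutError(f"{url} did not finish loading in {timeout}s")
        time.sleep(poll)
    html = driver.page_source  # serialized DOM after JavaScript has run
    Path(out_path).write_text(html, encoding="utf-8")
    return len(html)
```

Typical use would be `driver = make_headless_firefox()`, then `save_rendered_html(driver, "https://example.com/", "myfile.html")`, and finally `driver.quit()`. Because the function only needs an object with `get`, `execute_script`, and `page_source`, any Selenium WebDriver (Chrome, Firefox, and so on) should work the same way.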
The problem with using a browser-driven approach is that it'll be hard to automate the process of scraping.
Look for a "headless browser" in your programming language of choice. Alternatively, you can use Jaxer to load the DOM server-side, execute the JavaScript and let it manipulate the DOM, and then scrape the modified DOM using the same JavaScript you are already familiar with. This would be my preferred approach.
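One way to try the headless-browser route from a script is sketched below in Python. It drives headless Firefox through Selenium rather than Jaxer, and runs a JavaScript snippet inside the loaded page to pull values out of the modified DOM. The function name `scrape_with_js`, the `LINKS_JS` snippet, and the choice of Selenium are assumptions for illustration, not something this answer prescribes.

```python
# JavaScript evaluated inside the page, after its own scripts have changed the DOM.
LINKS_JS = """
return Array.from(document.querySelectorAll('a[href]'))
            .map(function (a) { return a.href; });
"""


def scrape_with_js(driver, url, script=LINKS_JS):
    """Open url in the given WebDriver and return whatever `script` returns."""
    driver.get(url)
    return driver.execute_script(script)


def main(url):
    # Assumes the selenium package and geckodriver are installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # no visible browser window
    driver = webdriver.Firefox(options=options)
    try:
        for href in scrape_with_js(driver, url):
            print(href)
    finally:
        driver.quit()  # always shut the browser down
```

Calling `main("https://example.com/")` would print the link targets found in the rendered page. Swapping `LINKS_JS` for another snippet lets you reuse the JavaScript you would otherwise have written for the page itself.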
If it can be a Windows-based app, you can try using the browser component available in languages such as C#, Visual Basic, or Delphi to load the page, then inspect the content and save it. The browser component should be based on the IE rendering engine and should support JavaScript. There's a question regarding snapshots of websites here; it may be of some use to you.
Alternatively, you could consider building your own Firefox extension. Take a peek here for further details (there's no "next" button, just the menu on the left for navigation, which confused me at first).
I have done this before using:
- Chickenfoot
- WebKit