browser - saving/mirroring/crawling web pages that use JavaScript to generate content - Stack Overflow

I want to download web pages that use JavaScript to generate their content. Wget can do everything else, but it cannot run JavaScript.

Even something like: firefox -remote "saveURL(www.mozilla, myfile.html)"

would be great (unfortunately, that kind of command does not exist).

Asked Mar 24, 2009 at 23:07 by Nick Nolan; edited Mar 24, 2009 at 23:10 by UnkwnTech
5 Answers

I'd look at Selenium, the browser automation tool (http://seleniumhq/): you can automate visiting a web page and saving the resulting HTML.

We used it to great success for a similar purpose on a prior project.
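For anyone who wants a starting point, here is a minimal sketch of that idea using Selenium's Python bindings. It assumes the `selenium` package (version 4 or later) and Firefox are installed; the function names `save_rendered_page` and `write_html` are made up for this example and are not part of Selenium. Waiting for `document.readyState` is only a rough signal, so pages that load data after the initial load will likely need a page-specific wait condition.

```python
from pathlib import Path


def write_html(html: str, out_path: str) -> Path:
    """Write the HTML string to out_path as UTF-8 and return the path."""
    path = Path(out_path)
    path.write_text(html, encoding="utf-8")
    return path


def save_rendered_page(url: str, out_path: str, timeout: float = 15.0) -> Path:
    """Load url in headless Firefox and save the DOM after scripts have run."""
    # Imported lazily so this module still loads where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")  # run Firefox without a visible window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # "complete" only means the initial load finished; content fetched
        # later by scripts may need a more specific wait condition.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        html = driver.page_source  # serialized DOM, including script changes
    finally:
        driver.quit()  # always close the browser, even on errors
    return write_html(html, out_path)
```

Calling `save_rendered_page("https://example.org/", "myfile.html")` (a placeholder URL) would then do roughly what the hypothetical `saveURL` command in the question asks for.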

I second Alex's suggestion for Selenium. It runs in the browser, so it can capture the output HTML after JavaScript has modified the DOM.

The problem with using a browser-driven approach is that it'll be hard to automate the process of scraping.

Look for a "headless browser" in your programming language of choice. Alternatively, you can use Jaxer to load the DOM server-side, execute the JavaScript and let it manipulate the DOM, and then scrape the modified DOM using the same JavaScript you are already familiar with. This would be my preferred approach.
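As one possible illustration of the headless route (not the Jaxer approach described above), Chrome and Chromium can run without a window and print the DOM after scripts have run, using the `--headless` and `--dump-dom` flags. The sketch below wraps that in Python. The binary name `chromium` is an assumption and varies by system (for example `google-chrome` or `chromium-browser`), and the helper names are invented for this example.

```python
import subprocess


def build_dump_dom_command(url: str, browser: str = "chromium") -> list[str]:
    """Build the argument list for a headless DOM dump of url."""
    # The browser binary name is an assumption; adjust it for your system.
    return [browser, "--headless", "--dump-dom", url]


def dump_dom(url: str, browser: str = "chromium", timeout: float = 30.0) -> str:
    """Run the headless browser and return the serialized DOM from stdout."""
    result = subprocess.run(
        build_dump_dom_command(url, browser),
        capture_output=True,
        encoding="utf-8",
        timeout=timeout,
        check=True,  # raise CalledProcessError if the browser exits non-zero
    )
    return result.stdout
```

Because this needs no display and no browser-driving library, it is easy to call in a loop over a list of URLs, which goes some way toward the automation concern raised above.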

If it can be a Windows-based app, you can try using the browser component of a programming language such as C#, Visual Basic, or Delphi to load the page, then peek into the content and save it. The browser component should be based on the IE rendering engine and should support JavaScript. There's a question regarding snapshots of websites here. It may be of some use to you.

Alternatively, you could consider building your own Firefox extension. Take a peek here for further details (there's no "next" button, just the menu on the left for navigation, which confused me at first).

I have done this before using:

  • Chickenfoot
  • WebKit
