javascript - Is there a way to take a screenshot of every page on a website? - Stack Overflow


We have a couple of legacy sites undergoing an upgrade. It would be useful to be able to screenshot every page, md5 sum the results for both domains, and then test whether everything that renders matches 100%.

I am unsure of how to do this - we have looked at cheerio, which can crawl the site but not take screenshots, and nightwatch, which can take screenshots but not crawl the site. Does anyone have experience doing this?


2 Answers

An easy solution is to use Chrome in headless mode, which can also be controlled from Node with modules like Puppeteer.

Taken from the Google Developers page:

chrome --headless --disable-gpu --screenshot https://www.chromestatus.com/
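
In Node, roughly the same thing can be done with Puppeteer. This is a minimal sketch; the target URL and the output file name are just placeholders:

    // Minimal Puppeteer sketch: load one page headlessly and save a full-page screenshot.
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://www.chromestatus.com/', { waitUntil: 'networkidle0' });
      await page.screenshot({ path: 'chromestatus.png', fullPage: true });
      await browser.close();
    })();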

About crawling, you can use a mix of Cheerio and Puppeteer to crawl links and take screenshots, as sketched below. Alternatively, you could find a tool that lets you export a sitemap with all the website's URLs; at that point it should be easy to loop through them and take a screenshot of each.
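
A rough sketch of that Cheerio + Puppeteer mix could look like the following. Treat it as an outline rather than a finished crawler: the start URL, the same-origin check and the in-memory { url: md5 } map are assumptions for illustration.

    // Sketch: crawl same-origin links with Cheerio, screenshot each page with
    // Puppeteer, and record an md5 of every screenshot for later comparison.
    const puppeteer = require('puppeteer');
    const cheerio = require('cheerio');
    const crypto = require('crypto');

    async function crawlAndScreenshot(startUrl) {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      const seen = new Set([startUrl]);
      const queue = [startUrl];
      const hashes = {};

      while (queue.length > 0) {
        const url = queue.shift();
        await page.goto(url, { waitUntil: 'networkidle0' });

        // Screenshot the rendered page and store its md5 keyed by URL.
        const buffer = await page.screenshot({ fullPage: true });
        hashes[url] = crypto.createHash('md5').update(buffer).digest('hex');

        // Pull links out of the rendered HTML and queue unseen same-origin ones.
        const $ = cheerio.load(await page.content());
        $('a[href]').each((_, el) => {
          const next = new URL($(el).attr('href'), url);
          if (next.origin === new URL(startUrl).origin && !seen.has(next.href)) {
            seen.add(next.href);
            queue.push(next.href);
          }
        });
      }

      await browser.close();
      return hashes; // { url: md5 } for this domain; run once per domain and compare the maps
    }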

You could use StormCrawler with Selenium and write a custom NavigationFilter to take the screenshot and store its md5sum in the document metadata. See the StormCrawler + Selenium tutorial for an introduction.

The next step could be to write a custom indexer and dump the URLs with the md5s into a database or file. Finally, you'd do the same for the newer version of the site and compare the content of the files or rows in the table.
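
If you end up doing the comparison step in Node rather than in StormCrawler's Java stack, that last step is simple. A hedged sketch, assuming each crawl produced a plain { path: md5 } object keyed the same way for both domains:

    // Sketch: report pages whose screenshot hashes differ between the old and
    // new site. The shape of the two maps is an assumption, not a given API.
    function compareHashes(oldHashes, newHashes) {
      const mismatches = [];
      for (const [path, oldMd5] of Object.entries(oldHashes)) {
        if (newHashes[path] !== oldMd5) {
          mismatches.push(path);
        }
      }
      return mismatches; // pages that do not render identically
    }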
