javascript - How to run jQuery commands on HTML in Python for DOM actions/scraping? - Stack Overflow

Let's say I'm using urllib2 and cookiejar (like so) to get responses from websites. Now I'm looking for an easy way to use jQuery to essentially scrape data from the response returned by the web server.

I understand that there are other modules that can be used in Python for web scraping (like), but is it possible with just jQuery commands? I'm assuming I'd need some sort of JS parser within Python?

The reason I want to use jQuery is that I have ~20 Greasemonkey scripts (mostly written by others) that make some interesting modifications to numerous websites and web games. They do all of their DOM modifications with jQuery. Instead of completely refactoring most of this working and dependable code, I'd like to be able to simply port it to Python (enabling simple and effective automation).

Asked Oct 5, 2012 at 14:37 by g19fanatic. Edited May 23, 2017 at 12:07 by Community Bot.

2 Answers

pyquery is suited perfectly for this task.

It allows you to use jQuery like selectors on (X)HTML/XML from Python.

For example:

>>> from pyquery import PyQuery as pq
>>> d = pq('<html><p id="hello" class="hello">Foo</p></html>')

>>> d("#hello")
[<p#hello.hello>]

>>> d('p:first')
[<p#hello.hello>]

See the complete API documentation for details, and the project page on Bitbucket for the source and issue tracker.

Use lxml to parse the HTML and use its cssselect module:

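from lxml import etree
from lxml.cssselect import CSSSelector  # needs the cssselect package installed

# `document` is a filename or file-like object containing the HTML.
# HTMLParser tolerates markup that is not well-formed XML.
tree = etree.parse(document, etree.HTMLParser())

# Compile the CSS selector, then apply it to the parsed tree.
elements = CSSSelector('div.content')(tree)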
