Yes, that sounds overly complicated.
I am trying to mine data from pages on our intranet. The pages are secure. The connection is refused when I try to get the contents with urllib.urlopen().
So I would like to use Python to open a web browser, open the site, then click some links that trigger JavaScript pop-ups containing tables of info that I want to collect.
Any suggestions on where to begin?
I know the format of the page. It is something like this:
<div id="list">
<ul id="list item">
<li><a onclick="Openpopup('1');">blah</a></li>
</ul>
<ul></ul>
etc
Then a hidden frame becomes visible and the fields in the table within are filled.
<div>
<table>
<tr><td><span id="info_i_want">...
asked Jan 26, 2012 at 2:47 by sequoia
4 Answers
First off, I suggest that it's better to figure out what the page needs that JS is providing, and fake that - you'll have an easier time scraping the page if a browser isn't involved.
If it's just JavaScript making an XMLHttpRequest, you can find the page from which the JavaScript fetches the iframe data and connect directly to that.
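For example, a minimal Python 2 sketch of hitting that endpoint directly might look like this (the URL and parameter are hypothetical placeholders - find the real request with your browser's network inspector):

    import urllib2

    # Hypothetical endpoint the popup's XMLHttpRequest fetches; replace it
    # with whatever your browser's network inspector actually shows.
    url = "http://intranet.example/popup_data?id=1"
    req = urllib2.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    print urllib2.urlopen(req).read()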
Even so, you may need a library that does JavaScript execution (if the reverse-engineering is too hard or the site uses challenge tokens). A web-rendering framework like Gecko or WebKit might be appropriate.
Take a good look at Selenium if you insist on using a true web browser or cannot get the programmatic methods to work.
Once you've gotten the page contents via whatever method, you need an HTML parser (such as sgmllib or, if the markup is well-formed, xml.dom). I suggest a DOM library. Parse the DOM and extract the contents from the appropriate node in the resulting tree.
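As a rough sketch of the DOM approach with xml.dom.minidom (which only accepts well-formed markup; messy real-world HTML usually needs a more forgiving parser) - the fragment below is a stand-in for the table from the question:

    from xml.dom.minidom import parseString

    # Stand-in for the popup's table; in practice this is the fetched markup.
    fragment = '<table><tr><td><span id="info_i_want">42</span></td></tr></table>'
    doc = parseString(fragment)
    for span in doc.getElementsByTagName("span"):
        if span.getAttribute("id") == "info_i_want":
            print span.firstChild.data  # -> 42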
The connection is refused when I try to get the contents with urllib.urlopen().
probably means you have to make a POST request using Python's urllib module. I would suggest you use urllib2. You may also need to handle cookies, the Referer header, and the User-Agent from your Python code. To see all the POST requests fired from your browser, use Firefox's Live HTTP Headers extension.
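A minimal Python 2 sketch of that approach - the URL, form field, and header values are hypothetical, copied in real life from what Live HTTP Headers shows the browser sending:

    import cookielib
    import urllib
    import urllib2

    # Cookie-aware opener so the session survives across requests.
    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    data = urllib.urlencode({"item": "1"})  # hypothetical form field
    req = urllib2.Request("http://intranet.example/data", data, {
        "User-Agent": "Mozilla/5.0",
        "Referer": "http://intranet.example/list",
    })
    print opener.open(req).read()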
For the JavaScript part, your best bet is to run a headless browser, e.g. PhantomJS, which understands all the intricacies of JavaScript, the DOM, etc., but you will have to write your code in JavaScript; the benefit is that you can do whatever you want. As @phihag mentioned, Selenium is also a good option.
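A minimal Selenium sketch (Python bindings, old-style element lookup; the URL and CSS selector are hypothetical) that clicks the link and reads back the rendered page:

    from selenium import webdriver

    # Drive a real Firefox: load the list page, click the link that fires
    # Openpopup, then read the rendered HTML. URL and selector are hypothetical.
    driver = webdriver.Firefox()
    driver.get("http://intranet.example/list")
    driver.find_element_by_css_selector("#list li a").click()
    html = driver.page_source  # now includes the filled-in popup table
    driver.quit()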
First of all, you should really find out why the connection is refused when you access the page with Python. Most likely, you'll have to perform HTTP authentication or specify a different User-Agent.
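If it turns out to be HTTP Basic authentication, a minimal Python 2 sketch with urllib2 (host and credentials are placeholders):

    import urllib2

    # Placeholders: swap in your intranet host and credentials.
    mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, "http://intranet.example/", "user", "secret")
    opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(mgr))
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    print opener.open("http://intranet.example/list").read()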
Firing up a browser, navigating, and getting the HTML back is a complex task. Luckily, you can implement it using Selenium.
Consider taking a look at splinter, which offers a simpler webdriver API than Selenium.
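The same click-and-scrape flow in splinter looks roughly like this (URL and selector are hypothetical):

    from splinter import Browser

    # Visit the list page, click the popup link, read the rendered HTML.
    browser = Browser("firefox")
    browser.visit("http://intranet.example/list")
    browser.find_by_css("#list li a").first.click()
    print browser.html  # rendered page, popup table included
    browser.quit()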