Let me preface by saying I have very little programming experience. I've learned a bunch in the last few days trying to write this program. I am running Python 2.7 on Windows 7 using PyCharm, requests, Beautiful Soup, and lxml.
I am trying to scrape data from a website that relies heavily on Javascript. I have two options:
1) The data I need is populated through Javascript and does not necessarily need a login. However I have not been able to figure how to get at this data. I've live monitored headers with live HTTP Headers chrome plugin and I think I've found the Javascript that does it but I'ts beyond my means to figure it out. Its a long bit of code, I'll post it if anyone is interested in taking a look.
or
2)On one of the main pages I found a series of ID numbers which I can use to generate URL's for each of the individual items I am analyzing. Problem is I have to be logged in to see these individual item pages. My code is as follows:
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager
from BeautifulSoup import BeautifulSoup
import ssl
# Request a date from user
UDate = "06/22/2015" # raw_input('Enter a date mm/dd/yyyy\n')
# Open TLSv1 Adapter (Whataver that means)
class MyAdapter(HTTPAdapter):
def init_poolmanager(self, connections, maxsize, block=False):
self.poolmanager = PoolManager(num_pools=connections,
maxsize=maxsize,
block=block,
ssl_version=ssl.PROTOCOL_TLSv1)
# Begin a requests session. Every get from here on out will use TLSv1 Protocol
import requests
payload = {
'LogName': 'xxxxxxxx',
'LogPass': 'xxxxxxxx'
}
s = requests.Session()
s.mount('', MyAdapter())
# Login with post and Request source code from main page.
log = s.post('LoginURL', data=payload)
print log.text
result = s.get(url)
soup = BeautifulSoup(result.content)
print soup
Neither the post or the get show me a logged in website. The logform id's from the HTML source code look like this:
<div id="DivLogForm">
<label for="BadText"><div id="BadText" class="BadText" style="display:none" tabindex="-2">User Name or Password is Invalid</div></label>
<div class="LogLabel">
<label for="LogName" > User Name </label><input tabindex="0" id="LogName" class="LogInput" value="" />
</div>
<div class="LogLabel">
<label for="LogPass" >User Password </label><input tabindex="0"id="LogPass" type="password" class="LogInput" value="" />
</div>
So I'm passing LogName and LogPass with the post.
There is also a logform.js with this bit of code
$("#LogButton").click(function()
{ //$('#divLogForm').hide();
//$('#divLoading').show();
var uName = $("#LogName").val();
var uPass = $("#LogPass").val();
var url = "/index.cfm";
$.post(url, {ZACTION:'AJAX',ZMETHOD:'LOGIN',func:'LOGIN',USERNAME:uName, USERPASS:uPass},
function(data){if (data.isOk =="YES"){location.href="/index.cfm";}
else {$('.BadText').show(); $('#BadText').focus();};
},"json");
});
The LoginURL in my code is taken from the var url in this script. I have tried using USERNAME & USERPASS and I have tried uName and uPass with my post but these didnt work either.
Not sure how to move forward here. Any help is greatly appreciated
Let me preface by saying I have very little programming experience. I've learned a bunch in the last few days trying to write this program. I am running Python 2.7 on Windows 7 using PyCharm, requests, Beautiful Soup, and lxml.
I am trying to scrape data from a website that relies heavily on Javascript. I have two options:
1) The data I need is populated through Javascript and does not necessarily need a login. However I have not been able to figure how to get at this data. I've live monitored headers with live HTTP Headers chrome plugin and I think I've found the Javascript that does it but I'ts beyond my means to figure it out. Its a long bit of code, I'll post it if anyone is interested in taking a look.
or
2)On one of the main pages I found a series of ID numbers which I can use to generate URL's for each of the individual items I am analyzing. Problem is I have to be logged in to see these individual item pages. My code is as follows:
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager
from BeautifulSoup import BeautifulSoup
import ssl
# Request a date from user
UDate = "06/22/2015" # raw_input('Enter a date mm/dd/yyyy\n')
# Open TLSv1 Adapter (Whataver that means)
class MyAdapter(HTTPAdapter):
def init_poolmanager(self, connections, maxsize, block=False):
self.poolmanager = PoolManager(num_pools=connections,
maxsize=maxsize,
block=block,
ssl_version=ssl.PROTOCOL_TLSv1)
# Begin a requests session. Every get from here on out will use TLSv1 Protocol
import requests
payload = {
'LogName': 'xxxxxxxx',
'LogPass': 'xxxxxxxx'
}
s = requests.Session()
s.mount('https://xxxx.xxx', MyAdapter())
# Login with post and Request source code from main page.
log = s.post('LoginURL', data=payload)
print log.text
result = s.get(url)
soup = BeautifulSoup(result.content)
print soup
Neither the post or the get show me a logged in website. The logform id's from the HTML source code look like this:
<div id="DivLogForm">
<label for="BadText"><div id="BadText" class="BadText" style="display:none" tabindex="-2">User Name or Password is Invalid</div></label>
<div class="LogLabel">
<label for="LogName" > User Name </label><input tabindex="0" id="LogName" class="LogInput" value="" />
</div>
<div class="LogLabel">
<label for="LogPass" >User Password </label><input tabindex="0"id="LogPass" type="password" class="LogInput" value="" />
</div>
So I'm passing LogName and LogPass with the post.
There is also a logform.js with this bit of code
$("#LogButton").click(function()
{ //$('#divLogForm').hide();
//$('#divLoading').show();
var uName = $("#LogName").val();
var uPass = $("#LogPass").val();
var url = "/index.cfm";
$.post(url, {ZACTION:'AJAX',ZMETHOD:'LOGIN',func:'LOGIN',USERNAME:uName, USERPASS:uPass},
function(data){if (data.isOk =="YES"){location.href="/index.cfm";}
else {$('.BadText').show(); $('#BadText').focus();};
},"json");
});
The LoginURL in my code is taken from the var url in this script. I have tried using USERNAME & USERPASS and I have tried uName and uPass with my post but these didnt work either.
Not sure how to move forward here. Any help is greatly appreciated
Share Improve this question asked Jun 19, 2015 at 20:00 Gustavo CostaGustavo Costa 1111 gold badge1 silver badge8 bronze badges1 Answer
Reset to default 2The last bit of javascript you posted gives a clue as to why your login POST request isn't working.
According to the javascript, you should be sending a dictionary that looks like the following with your login POST:
{
'ZACTION': 'AJAX',
'ZMETHOD': 'LOGIN',
'func': 'LOGIN',
'USERNAME': '<enter username>',
'USERPASS': '<enter password>'
},
发布者:admin,转转请注明出处:http://www.yc00.com/questions/1745168446a4614776.html
评论列表(0条)