[Update] Unfortunately I never had an opportunity to solve this problem. However, there are some interesting responses below that are worth a try for other readers looking to do something similar.
I'm trying to parse data from a site running ASP.NET. This site has a login page that I've successfully traversed (using a legitimate account) and stored the cookie for, but when I get deeper into the site I need to navigate it by updating UpdatePanels via Asynchronous Postbacks. The UpdatePanels contain the data that I want.
I'm trying to do this all using PHP and curl. I can successfully load the initial page. When I POST to my target page with all the relevant data (obtained via Firefox's Tamper Data plugin), the echoed result returned from curl always clears my page. Typically, echoing the result would just print out (or spew some error/garbled text) further down the page. curl_error() doesn't print out anything, so it's something wrong with what's being returned to me.
I'm at my wits' end about how to proceed from here. Please tell me: a) whether you know what error I'm getting, b) whether this is even possible with PHP alone, and c) whether, conversely, I need to brush up on JavaScript to interact with ASP.NET's UpdatePanels.
$uri = "TARGETURL";
$cl=curl_init();
curl_setopt($cl, CURLOPT_URL, $uri);
curl_setopt($cl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0');
curl_setopt($cl, CURLOPT_COOKIEFILE, "/tmp/cookie2.txt");
curl_setopt($cl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($cl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($cl, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($cl, CURLOPT_POST, 1);
$postdata=array(
"__VIEWSTATE" => $viewstate,
"OTHER DATA" => "asdfkljsddflkjshdjf",
"__ASYNCPOST" => "true",
);
echo "<PRE>";
print_r($postdata);
echo "</PRE>";
curl_setopt ($cl, CURLOPT_POSTFIELDS, $postdata);
$result = curl_exec($cl); // execute the curl command
echo $result;
Here is the Header and Body I am receiving back from the server (e-mailed to myself to bypass the page-clearing happening with the echo statement):
HEADER RESPONSE:
HTTP/1.1 100 Continue
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: text/plain; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.5
X-Content-Type-Options: nosniff
Set-Cookie: culture=en-US; expires=Tue, 27-Nov-2012 20:02:37 GMT; path=/
X-Powered-By: ASP.NET
Date: Mon, 28 Nov 2011 20:02:37 GMT
Content-Length: 112
BODY RESPONSE:
69|dataItem||<script type="text/javascript">window.location="about:blank"</script>|11|pageRedirect||/Error.aspx|
This explains why the page goes blank: the returned JavaScript redirects my browser output to about:blank. It also seems to indicate that the request headers aren't the issue, as I'd expect an HTTP error from bad header values.
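The body above appears to follow the ASP.NET AJAX partial-rendering layout, a sequence of `length|type|id|content|` records. As a minimal sketch under that assumption, the response can be split into records and inspected instead of being echoed straight into the page. The function name `parseDeltaResponse` is made up for illustration, and the code treats the length as a byte count, which only holds for single-byte (ASCII) content.

```php
<?php
// Hypothetical helper: splits a partial-rendering response into records.
// Assumes each record is "length|type|id|content|" and that the length
// can be treated as a byte count (true for ASCII content only).
function parseDeltaResponse(string $body): array
{
    $records = [];
    $pos = 0;
    $total = strlen($body);

    while ($pos < $total) {
        // Locate the three separators that precede the content.
        $lenEnd = strpos($body, '|', $pos);
        if ($lenEnd === false) {
            break;
        }
        $typeEnd = strpos($body, '|', $lenEnd + 1);
        if ($typeEnd === false) {
            break;
        }
        $idEnd = strpos($body, '|', $typeEnd + 1);
        if ($idEnd === false) {
            break;
        }

        $length = (int) substr($body, $pos, $lenEnd - $pos);
        $records[] = [
            'type'    => substr($body, $lenEnd + 1, $typeEnd - $lenEnd - 1),
            'id'      => substr($body, $typeEnd + 1, $idEnd - $typeEnd - 1),
            'content' => substr($body, $idEnd + 1, $length),
        ];

        // Skip the content and its trailing separator.
        $pos = $idEnd + 1 + $length + 1;
    }

    return $records;
}

// The body from the response above.
$sample = '69|dataItem||<script type="text/javascript">window.location="about:blank"</script>|11|pageRedirect||/Error.aspx|';

foreach (parseDeltaResponse($sample) as $record) {
    // Escape the content so embedded scripts are shown rather than executed.
    echo $record['type'], ': ', htmlspecialchars($record['content']), PHP_EOL;
}
```

Read this way, the second record shows the server asking for a redirect to /Error.aspx, which suggests it rejected the postback rather than returning panel data.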
Asked Nov 27, 2011 at 3:33 by David; edited Dec 14, 2012 at 0:23.
- Just tamper the data of the AJAX request and mimic it completely. What's the target URL? You're probably missing fields like __EVENTTARGET and __EVENTARGUMENT. – SSpoke Commented Nov 27, 2011 at 3:43
- It was removed for confidentiality. Same with __EVENTTARGET and __EVENTARGUMENT (though I know those two have the right parameters). – David Commented Nov 27, 2011 at 4:13
- But yeah, since it's not SSL you can easily sniff the traffic using Wireshark and see which line in the request or headers you are not sending. Try to mimic everything to the byte. – SSpoke Commented Nov 28, 2011 at 1:35
- Thanks for the info. I checked out Wireshark and decided I'd likely need to invest a large amount of time into it. I'm going to come back to that when I get an opportunity. Thanks. – David Commented Nov 29, 2011 at 0:52
- Probably kicking in an open door, but isn't it easier to just ask the site you are scraping whether they have an API, an RSS feed, or maybe even a generated XML file to access the data you want? – Rody Commented Nov 30, 2011 at 15:16
4 Answers
A. You state in your request that you are the Firefox browser:
curl_setopt($cl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0');
Do not claim you're Firefox:
- if you cannot process scripts (as Firefox can and does)
- if you want to prevent ASP.NET from sending you a partial rendering response
Make up your own user agent name, or don't send one at all.
ASP.NET checks whether the user agent supports callbacks; see the HttpCapabilitiesBase.SupportsCallback property.
B. Don't send __ASYNCPOST = true (give it a try).
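Points A and B can be sketched roughly as follows, assuming the server falls back to a full page response when the async flag is absent (this has not been verified against the target site). The function names and the user agent string `my-scraper/1.0` are made up for illustration. The sketch also URL-encodes the fields, because passing an array to CURLOPT_POSTFIELDS makes cURL send multipart/form-data, whereas a browser form post is normally application/x-www-form-urlencoded.

```php
<?php
// Hypothetical helper: drops the async flag and URL-encodes the remaining
// fields, so the body matches a regular (non-AJAX) form post.
function buildFullPostbackBody(array $fields): string
{
    unset($fields['__ASYNCPOST']);
    return http_build_query($fields, '', '&');
}

// Hypothetical helper: sends the body with a non-browser user agent.
// Returns the response body, or false on failure.
function sendFullPostback(string $url, array $fields, string $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_USERAGENT      => 'my-scraper/1.0', // made-up name, not a browser
        CURLOPT_COOKIEFILE     => $cookieFile,      // read the stored login cookie
        CURLOPT_COOKIEJAR      => $cookieFile,      // persist any updated cookies
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => buildFullPostbackBody($fields),
    ]);

    return curl_exec($ch);
}
```

If the server then returns a complete HTML page instead of a delta response, the data can be read from that page with an ordinary HTML parser.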
Here is an adapted approach that works for me:
// Method of a scraper class (class declaration omitted)
public function doPostbackToAspDotNetPage()
{
    $uri = '*** THE_URL ***';
    $cl = curl_init();
    curl_setopt($cl, CURLOPT_URL, $uri);
    curl_setopt($cl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:54.0) Gecko/20100101 Firefox/54.0');
    // Optional; note that this option expects a boolean
    curl_setopt($cl, CURLOPT_COOKIESESSION, '*** OPTIONAL ***');
    curl_setopt($cl, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($cl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($cl, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($cl, CURLOPT_POST, 1);
    // Just in case the URL is https and the certificate gives some kind of error.
    // This disables certificate verification, so use it only for testing.
    curl_setopt($cl, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($cl, CURLOPT_SSL_VERIFYPEER, false);

    $postdata = array(
        '__EVENTTARGET' => '*** A value such as: SOME_ID$ctl20$ctl02 ***',
        '__EVENTARGUMENT' => ' *** OPTIONAL ***',
        '__VIEWSTATE' => '*** REQUIRED BUNCH OF CHARACTERS ***',
        '__ASYNCPOST' => 'true',
        '__VIEWSTATEGENERATOR' => '*** OPTIONAL ***',
        '__EVENTVALIDATION' => '*** REQUIRED BUNCH OF CHARACTERS ***',
    );
    curl_setopt($cl, CURLOPT_POSTFIELDS, $postdata);

    $result = curl_exec($cl);
    // curl_exec() returns false on failure; an empty body is not an error
    if ($result === false) {
        echo sprintf('ERROR:%s', PHP_EOL);
        echo curl_error($cl);
    } else {
        echo $result;
    }
    curl_close($cl);
}
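The required values above (__VIEWSTATE, __EVENTVALIDATION, and so on) are normally rendered as hidden inputs in the page returned by the initial GET. A minimal sketch of collecting them with PHP's DOM extension follows; the function name `extractHiddenFields` is made up for illustration, and it assumes the fields appear as plain `<input type="hidden">` elements in the HTML.

```php
<?php
// Hypothetical helper: collects name => value pairs for every hidden input
// in an HTML page, e.g. __VIEWSTATE and __EVENTVALIDATION.
function extractHiddenFields(string $html): array
{
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }

    return $fields;
}
```

The returned array can be merged with the event-specific fields (for example __EVENTTARGET) before it is passed to CURLOPT_POSTFIELDS, so the hidden values always come from the most recently loaded page rather than from a captured session.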
A different approach is to use a very useful PHP tool (a class that emulates browser behavior) that does all the work of keeping track of the form fields and performs the POST/GET requests when you click links or buttons.
Here is the link:
simpletest
I have no clue about PHP and curl, but if I understand correctly, you are trying to send data to an ASP page. Maybe the problem is that the page has the CausesValidation option activated, so the server is not allowing external POSTs to the page.