python - Failed to parse the total results from a webpage, of which my existing script can parse one-third - Stack Overflow

I've created a script that issues a POST HTTP request with the appropriate parameters to fetch the

I've created a script that issues a POST HTTP request with the appropriate parameters to fetch the town, continent, country, and inner_link from this webpage. The script can parse 69 containers, but there are 162 items in total. How can I fetch the rest?

import requests

# Endpoint and per-city link template (URLs redacted in this copy of the post).
url = ''
city_url = '/{}'

# Opaque encrypted parameters (x/y/z) captured from the browser's own request.
payload = {
    "z": "nZ9g0AdFBj7cLRX5v2wSWjjGf2Q5KPpss9DS4wZGh9pvfC4xcJvnTebBg+npAqWaQvdVUFxVD1NZ88siTRUfPo8gB70CGoJG/2MPv9Gu9kC+48KwvV4COpsB3HmER0Mgx0bz2G9pSpw6veTnEUnNR78xonQmhuvL3eztB+ikZaI3OTeuVfRVNetmdX4iDgOkKrM6kLt/2SuRKKwT2aAZHJbdhlTV1I65zj1jD7VBwrm+lJDNh7pZug0/gKCWUDQz4CgmrAdQdnxyJDde2ewzudcsGDimhnWB56bcejoli4LLvevtMB4RUMhmM6FIYn0Tl4sclUD7YLQ8gZQOMmBndDkGctxeq74bpDAwBMOG74qu9gb4WLUFxgB/lWCQ9OnJsfkT0J/kUShhQPoRVr72qUx8f8ldkliIGINoBy9i+lm1RYM3L/NfOJ0kBZ+fbKndVJk2owAZ1kLMupja4iPmpxszQlFGTstpAlF5pTckhL+QYIc6vYbslWqXVs8XrzKs955DHPe1WpWmI714MsJfHhd3XHDsuMy9lfY6mE+cfc0434amFJC5gCgoEhGIQsFQD/kGRaWvqCcMfPYiW/o++nQ017bAKzlg7qb0EfPpy/EMG+u4i7QEU/vvC9mUnVCN0ZzFpxP8HWiTTCF0djuB+UnfUaHKtXciPwwZUTV4o8PtI6v6QdrC4PvtAKSJ9CpIccW+A3SSvOgCgEwOtniCdLxezWaP1Dq3fv9G56HCOvsOGRlQ0RgzNgq/+pCwkvyqFYcs/VtX9NPuaCAAXLi+SFM0xRuI4Sq6nHQr7qs6R2C4gAVHm9bZHfByKZ5x03KJp74IGlGSd1GL9/z9CySVZw==",
    "y": "oht3SrBVqLvR2lXJSwtwWw==",
    "x": "dmpOxF/FB13c+GGFmDW4Y4SPz6jEItrcjegm/WNbqFk="
}

# Mimic the browser's AJAX request so the server answers with JSON.
headers = {
    'accept': 'application/json, text/javascript, */*; q=0.01',
    'accept-language': 'en-US,en;q=0.9',
    'origin': '',
    'referer': '/',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest'
}

response = requests.post(url, json=payload, headers=headers)
print(response.status_code)

# Each hit's '_source' carries the fields of interest.
for hit in response.json()['hits']['hits']:
    source = hit['_source']
    record = (
        source['town_text'],
        source['continent__text__text'],
        source['country__text__text'],
        city_url.format(source['Slug']),
    )
    print(record)

I've created a script that issues a POST HTTP request with the appropriate parameters to fetch the town, continent, country, and inner_link from this webpage. The script can parse 69 containers, but there are 162 items in total. How can I fetch the rest?

import requests

# Search endpoint and template for building each city's detail-page URL.
url = 'https://wenomad.so/elasticsearch/search'
city_url = 'https://wenomad.so/city/{}'

# Opaque encrypted parameters (x/y/z) captured from the browser's own request.
payload = {
    "z": "nZ9g0AdFBj7cLRX5v2wSWjjGf2Q5KPpss9DS4wZGh9pvfC4xcJvnTebBg+npAqWaQvdVUFxVD1NZ88siTRUfPo8gB70CGoJG/2MPv9Gu9kC+48KwvV4COpsB3HmER0Mgx0bz2G9pSpw6veTnEUnNR78xonQmhuvL3eztB+ikZaI3OTeuVfRVNetmdX4iDgOkKrM6kLt/2SuRKKwT2aAZHJbdhlTV1I65zj1jD7VBwrm+lJDNh7pZug0/gKCWUDQz4CgmrAdQdnxyJDde2ewzudcsGDimhnWB56bcejoli4LLvevtMB4RUMhmM6FIYn0Tl4sclUD7YLQ8gZQOMmBndDkGctxeq74bpDAwBMOG74qu9gb4WLUFxgB/lWCQ9OnJsfkT0J/kUShhQPoRVr72qUx8f8ldkliIGINoBy9i+lm1RYM3L/NfOJ0kBZ+fbKndVJk2owAZ1kLMupja4iPmpxszQlFGTstpAlF5pTckhL+QYIc6vYbslWqXVs8XrzKs955DHPe1WpWmI714MsJfHhd3XHDsuMy9lfY6mE+cfc0434amFJC5gCgoEhGIQsFQD/kGRaWvqCcMfPYiW/o++nQ017bAKzlg7qb0EfPpy/EMG+u4i7QEU/vvC9mUnVCN0ZzFpxP8HWiTTCF0djuB+UnfUaHKtXciPwwZUTV4o8PtI6v6QdrC4PvtAKSJ9CpIccW+A3SSvOgCgEwOtniCdLxezWaP1Dq3fv9G56HCOvsOGRlQ0RgzNgq/+pCwkvyqFYcs/VtX9NPuaCAAXLi+SFM0xRuI4Sq6nHQr7qs6R2C4gAVHm9bZHfByKZ5x03KJp74IGlGSd1GL9/z9CySVZw==",
    "y": "oht3SrBVqLvR2lXJSwtwWw==",
    "x": "dmpOxF/FB13c+GGFmDW4Y4SPz6jEItrcjegm/WNbqFk="
}

# Mimic the browser's AJAX request so the server answers with JSON.
headers = {
    'accept': 'application/json, text/javascript, */*; q=0.01',
    'accept-language': 'en-US,en;q=0.9',
    'origin': 'https://wenomad.so',
    'referer': 'https://wenomad.so/',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest'
}

response = requests.post(url, json=payload, headers=headers)
print(response.status_code)

# Each hit's '_source' carries the fields of interest.
for hit in response.json()['hits']['hits']:
    source = hit['_source']
    record = (
        source['town_text'],
        source['continent__text__text'],
        source['country__text__text'],
        city_url.format(source['Slug']),
    )
    print(record)
Share asked Mar 7 at 15:47 robots.txtrobots.txt 1492 gold badges10 silver badges41 bronze badges 8
  • 2 If you View Source on the response, do you see all 162 items? Perhaps the missing ones are backfilled via javascript – JonSG Commented Mar 7 at 15:52
  • None of the 162 items is available in the page source. – robots.txt Commented Mar 7 at 16:01
  • You will likely need selenium – JonSG Commented Mar 7 at 16:19
  • I've scraped the first 69 items without the help of Selenium, even though they are not present in the page source. I hope there is a way to scrape the rest while sticking with requests. – robots.txt Commented Mar 7 at 16:47
  • If the page you're accessing is dependent on JavaScript for its rendering, it's highly unlikely that you'll succeed with requests – Adon Bilivit Commented Mar 7 at 17:11
 |  Show 3 more comments

1 Answer 1

Reset to default 4

You need to replicate the requests to the /elasticsearch/search endpoint, which requires three params: x, y and z. These params are generated by AES encryption in the encode3 function of run.js.

First install PyCryptodome:

pip install pycryptodome

Then you can use this script to get all (162) results:

from Crypto.Cipher import AES
from Crypto.Protocol.KDF import PBKDF2
from Crypto.Hash import MD5
from Crypto.Util.Padding import pad
import base64
import json
import random
import time
import requests


def encode(key, iv, text, appname):
    """AES-CBC encrypt *text* and return the ciphertext base64-encoded.

    Both the key and the IV are first stretched with PBKDF2 (MD5 HMAC,
    7 iterations, *appname* as salt) to 32 and 16 bytes respectively,
    mirroring the site's client-side `encode3` routine in run.js.
    """
    salt = appname.encode()
    aes_key = PBKDF2(key, salt, dkLen=32, count=7, hmac_hash_module=MD5)
    aes_iv = PBKDF2(iv, salt, dkLen=16, count=7, hmac_hash_module=MD5)

    padded = pad(text.encode(), AES.block_size)
    ciphertext = AES.new(aes_key, AES.MODE_CBC, iv=aes_iv).encrypt(padded)
    return base64.b64encode(ciphertext).decode()


def generate_payload(data):
    """Build the encrypted x/y/z parameters the search endpoint expects.

    'z' is the encrypted search body, 'y' the millisecond timestamp plus
    version, and 'x' the random IV string — each produced by encode()
    with the fixed salts the site's JavaScript uses.
    """
    version = "1"
    appname = 'fie'
    now_ms = str(int(time.time() * 1000))
    stamped_version = f'{now_ms}_{version}'
    key = appname + now_ms
    iv = str(random.random())

    # Compact separators match JSON.stringify's output in the browser.
    body = json.dumps(data, separators=(',', ':'))

    return {
        'z': encode(key, iv, body, appname),
        'y': encode(appname, "po9", stamped_version, appname),
        'x': encode(appname, "fl1", iv, appname),
    }


def fetch_all_search_results(data):
    """Page through the /elasticsearch/search endpoint and collect every hit.

    Repeatedly re-encrypts *data* (whose 'from' offset is advanced in
    place after each page) and POSTs it until the server signals the end.

    Parameters:
        data: the plaintext search request dict; must contain an integer
              'from' key. Mutated: 'from' is incremented per page fetched.

    Returns:
        A list of raw hit dicts (each with its '_source' payload).

    Raises:
        requests.HTTPError: if the server answers with an error status.
    """
    headers = {
        'x-requested-with': 'XMLHttpRequest',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36'
    }

    results = []
    while True:
        # The payload must be regenerated each round: the encryption is
        # timestamp- and IV-dependent, and 'from' changes between pages.
        payload = generate_payload(data)
        response = requests.post('https://wenomad.so/elasticsearch/search', headers=headers, json=payload)
        # Fail loudly instead of trying to JSON-parse an error page.
        response.raise_for_status()
        res_json = response.json()

        hits = res_json.get('hits', {}).get('hits', [])
        results.extend(hits)
        data['from'] += len(hits)

        # Stop on the server's end marker — and also on an empty page,
        # otherwise a response missing 'at_end' would loop forever.
        if res_json.get('at_end') or not hits:
            break

    return results


# Plaintext search request, mirroring what the site's front end encrypts
# before POSTing to /elasticsearch/search.
data = {
    "appname": "fie",
    "app_version": "live",
    "type": "custom.town",
    # Only cities flagged active are returned.
    "constraints": [
        {
            "key": "active_boolean",
            "value": True,
            "constraint_type": "equals"
        }
    ],
    # NOTE(review): both sort entries are identical — presumably copied
    # from the captured browser request; confirm whether one is redundant.
    "sorts_list": [
        {
            "sort_field": "overall_rating_number",
            "descending": True
        },
        {
            "sort_field": "overall_rating_number",
            "descending": True
        }
    ],
    # Pagination offset ('from') is advanced in place by the fetch loop;
    # 'n' asks for a large page size in one go.
    "from": 0,
    "n": 9999,
    # Opaque value captured verbatim from the browser request.
    "search_path": "{\"constructor_name\":\"DataSource\",\"args\":[{\"type\":\"json\",\"value\":\"%p3.AAV.%el.cmQus.%el.cmSJO0.%p.%ds\"},{\"type\":\"node\",\"value\":{\"constructor_name\":\"Element\",\"args\":[{\"type\":\"json\",\"value\":\"%p3.AAV.%el.cmQus.%el.cmSJO0\"}]}},{\"type\":\"raw\",\"value\":\"Search\"}]}",
    "situation": "unknown"
}

results = fetch_all_search_results(data)
print(f'{len(results) = }')

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1744919968a4601072.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信