javascript - Scrapy + splash: can't select element - Stack Overflow

I'm learning to use Scrapy with Splash. As an exercise, I'm trying to visit https://www.ubereats.com/stores/, click on the address text box, enter a location, and then press Enter to move to the next page, which lists the restaurants available for that location. I have the following Lua code:

function main(splash)
  local url = splash.args.url
  assert(splash:go(url))
  assert(splash:wait(5))

  local element = splash:select('.base_29SQWm')
  local bounds = element:bounds()
  assert(element:mouseclick{x = bounds.width/2, y = bounds.height/2})
  assert(element:send_text("Wall Street"))
  assert(splash:send_keys("<Return>"))
  assert(splash:wait(5))

  return {
    html = splash:html(),
  }
end

When I click on "Render!" in the splash API, I get the following error message:

  {
      "info": {
          "message": "Lua error: [string \"function main(splash)\r...\"]:7: attempt to index local 'element' (a nil value)",
          "type": "LUA_ERROR",
          "error": "attempt to index local 'element' (a nil value)",
          "source": "[string \"function main(splash)\r...\"]",
          "line_number": 7
      },
      "error": 400,
      "type": "ScriptError",
      "description": "Error happened while executing Lua script"
  }

Somehow my CSS selector doesn't match anything, so splash:select returns nil and the script fails when it tries to index the result. I've tried other expressions, but I can't figure it out!

Q: Does anyone know how to solve this problem?
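For reference, splash:select returns nil when the CSS selector matches no element, so one way to get a clearer failure message is to guard before indexing. This is only a sketch of that idea, keeping the original selector, and assuming Splash's element:mouse_click method:

```lua
function main(splash)
  assert(splash:go(splash.args.url))
  assert(splash:wait(5))

  -- splash:select returns nil when the selector matches nothing,
  -- so check before calling any method on the result
  local element = splash:select('.base_29SQWm')
  if not element then
    error("no element matched '.base_29SQWm'")
  end

  local bounds = element:bounds()
  assert(element:mouse_click{x = bounds.width/2, y = bounds.height/2})
  assert(element:send_text("Wall Street"))
  assert(splash:send_keys("<Return>"))
  assert(splash:wait(5))
  return {html = splash:html()}
end
```

If the error fires, the problem is the selector itself (e.g. a dynamically generated class name that changed), not the clicking logic.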

EDIT: Even though I still would like to know how to actually click on the element, I figured out how to get the same result by just using keys:

function main(splash)
    local url = splash.args.url
    assert(splash:go(url))
    assert(splash:wait(5))
    splash:send_keys("<Tab>")
    splash:send_keys("<Tab>")
    splash:send_text("Wall Street, New York")
    splash:send_keys("<Return>")
    assert(splash:wait(10))

    return {
        html = splash:html(),
        png = splash:png(),
    }
end

However, the HTML/images returned by the Splash API are from the page where you enter the address, not the page you see after entering the address and pressing Enter.

Q2: How do I successfully load the second page?


asked Jan 13, 2017 at 10:46 by titusAdam; edited Feb 14, 2022 at 18:24 by Egor Skriptunoff

1 Answer


Not a complete solution, but here is what I have so far:

import json
import re

import scrapy
from scrapy_splash import SplashRequest


class UberEatsSpider(scrapy.Spider):
    name = "ubereatspider"
    allowed_domains = ["ubereats.com"]

    def start_requests(self):
        script = """
        function main(splash)
            local url = splash.args.url
            assert(splash:go(url))
            assert(splash:wait(10))

            splash:set_viewport_full()

            local search_input = splash:select('#address-selection-input')
            search_input:send_text("Wall Street, New York")
            assert(splash:wait(5))

            local submit_button = splash:select('button[class^=submitButton_]')
            submit_button:click()

            assert(splash:wait(10))

            return {
                html = splash:html(),
                png = splash:png(),
            }
          end
        """
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
        }
        yield SplashRequest('https://www.ubereats.com/new_york/', self.parse, endpoint='execute', args={
            'lua_source': script,
            'wait': 5
        }, splash_headers=headers, headers=headers)

    def parse(self, response):
        script = response.xpath("//script[contains(., 'cityName')]/text()").extract_first()
        pattern = re.compile(r"window.INITIAL_STATE = (\{.*?\});", re.MULTILINE | re.DOTALL)

        match = pattern.search(script)
        if match:
            data = match.group(1)
            data = json.loads(data)
            for place in data["marketplace"]["marketplaceStores"]["data"]["entity"]:
                print(place["title"])

Note the changes in the Lua script: I've located the search input, sent the search text to it, then located the "Find" button and clicked it. On the screenshot I did not see the search results load no matter what delay I set, but I managed to get the restaurant names from the script contents. The place objects contain all the information needed to filter the desired restaurants.
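The regex-and-JSON extraction in parse can be checked in isolation against a made-up script payload. The JSON below just mirrors the keys the spider reads; it is not the real ubereats response:

```python
import json
import re

# Hypothetical <script> contents mimicking the page's embedded state.
sample_script = (
    'window.INITIAL_STATE = {"marketplace": {"marketplaceStores":'
    ' {"data": {"entity": [{"title": "Place A"}, {"title": "Place B"}]}}}};'
)

# Same pattern as the spider: capture the object assigned to INITIAL_STATE.
pattern = re.compile(r"window.INITIAL_STATE = (\{.*?\});", re.MULTILINE | re.DOTALL)

titles = []
match = pattern.search(sample_script)
if match:
    data = json.loads(match.group(1))
    for place in data["marketplace"]["marketplaceStores"]["data"]["entity"]:
        titles.append(place["title"])

print(titles)  # -> ['Place A', 'Place B']
```

The DOTALL flag matters because the real embedded JSON may span multiple lines, and the non-greedy `.*?` stops at the first `};` that terminates the assignment.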

Also note that the URL I'm navigating to is the "New York" one (not the general "stores").

I'm not completely sure why the search result page is not being loaded, but I hope this is a good start for you and that you can improve this solution further.
