I built a small web scraper that has run successfully in a Google Colab notebook over the last few months. It downloads a set of billing codes from the CMS website. Recently the driver started throwing timeout exceptions when retrieving some, but not all, URLs. The reprex below downloads a file from two URLs. It executes successfully when I run it locally, but when running in Google Colab it fails while retrieving the second URL.
The timeout happens in driver.get(url). Strangely, the code works as long as the driver has not previously visited another URL. For example, in the code below, not_working_url will successfully retrieve the webpage and download the file if it does not come after working_url.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

def download_documents() -> None:
    """Download billing code documents from CMS."""
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    driver = webdriver.Chrome(options=chrome_options)

    working_url = "https://www.cms.gov/medicare-coverage-database/view/article.aspx?articleid=59626&ver=6"
    not_working_url = "https://www.cms.gov/medicare-coverage-database/view/lcd.aspx?lcdid=36377&ver=19"

    for row in [working_url, not_working_url]:
        print(f"Retrieving from {row}...")
        driver.get(row)  # Fails on second url
        print("Wait for webdriver...")
        wait = WebDriverWait(driver, 2)

        print("Attempting license accept...")
        # Accept license
        try:
            wait.until(EC.element_to_be_clickable((By.ID, "btnAcceptLicense"))).click()
        except TimeoutException:
            pass

        wait = WebDriverWait(driver, 4)
        print("Attempting pop up close...")
        # Click on Close button of the second pop-up
        try:
            wait.until(
                EC.element_to_be_clickable(
                    (
                        By.XPATH,
                        "//button[@data-page-action='Clicked the Tracking Sheet Close button.']",
                    )
                )
            ).click()
        except TimeoutException:
            pass

        print("Attempting download...")
        driver.find_element(By.ID, "btnDownload").click()

download_documents()
Expected behavior: The code above runs successfully in Google Colab, just like it does locally.
A potentially related issue: Selenium TimeoutException in Google Colab
2 Answers
I was able to run my script successfully by initializing (and closing) the driver on every iteration of the loop rather than just once before it started.
For example, the loop below retrieves the url without timing out on each iteration. I would still appreciate any commentary explaining why I would ever need to reinitialize the driver regardless of my programming environment, but hopefully this solution is helpful for others who run into this issue.
for row in [working_url, not_working_url]:
    driver = webdriver.Chrome(options=chrome_options)
    driver.get(row)
    driver.close()
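For completeness, here is a minimal sketch of the same workaround applied to the question's full loop body, i.e. a fresh driver per URL. The waits, locators, and URLs are taken from the question; the use of quit() (rather than close()) and the try/finally structure are my own assumptions about how the pieces fit together:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

def download_documents() -> None:
    """Download billing code documents from CMS, using a fresh driver per URL."""
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")

    urls = [
        "https://www.cms.gov/medicare-coverage-database/view/article.aspx?articleid=59626&ver=6",
        "https://www.cms.gov/medicare-coverage-database/view/lcd.aspx?lcdid=36377&ver=19",
    ]

    for url in urls:
        # Start a new browser session for every URL (the workaround above).
        driver = webdriver.Chrome(options=chrome_options)
        try:
            driver.get(url)
            # Accept the license dialog if it appears.
            try:
                WebDriverWait(driver, 2).until(
                    EC.element_to_be_clickable((By.ID, "btnAcceptLicense"))
                ).click()
            except TimeoutException:
                pass
            # Close the tracking sheet pop-up if it appears.
            try:
                WebDriverWait(driver, 4).until(
                    EC.element_to_be_clickable(
                        (By.XPATH, "//button[@data-page-action='Clicked the Tracking Sheet Close button.']")
                    )
                ).click()
            except TimeoutException:
                pass
            driver.find_element(By.ID, "btnDownload").click()
        finally:
            # End the session even if a step fails, so the next URL starts clean.
            driver.quit()

download_documents()

Starting a new browser per URL is slower, but it sidesteps whatever per-session state was causing the second driver.get() call to time out in Colab.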
Try the arguments below:
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
chrome_options.add_argument(
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
)
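For reference, a minimal sketch of wiring these options into the driver. The set_page_load_timeout call and its 60-second value are my own addition as an extra safeguard against slow page loads, not part of the original suggestion:

from selenium import webdriver

driver = webdriver.Chrome(options=chrome_options)
# Optional: give slow pages more time before driver.get() raises TimeoutException.
driver.set_page_load_timeout(60)
driver.get("https://www.cms.gov/medicare-coverage-database/view/lcd.aspx?lcdid=36377&ver=19")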