python - How do I disguise my web scraper to read adobe pages? - Stack Overflow

I tried to use the following User-Agent disguise: "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36". I also tried downloading the HTML with Selenium and scanning it afterwards:

    from selenium import webdriver
    from bs4 import BeautifulSoup

    # ... inside my app check:
    elif app == "Acrobat Reader":
        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        options.add_argument("user-agent=Mozilla/5.0")

        driver = webdriver.Chrome(options=options)
        driver.get(url)

        html = driver.page_source

        # temporarily caches the html locally via PowerShell
        with open("C:/PowerShellShit/website.html", "w", encoding="utf-8") as f:
            f.write(html)

        driver.quit()

        soup = BeautifulSoup(html, "html.parser")
        table = soup.find("table")
        version = "Unbekannt"

        if table is not None:  # guard: page may not have loaded a table at all
            rows = table.find_all("tr")
            if len(rows) > 1:
                version = rows[1].find_all("td")[1].text.strip()
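For what it's worth, the table-extraction step can be isolated and tested without hitting the network at all. Below is a minimal sketch; the sample HTML and the `extract_version` helper are made up for illustration and do not come from the real Adobe page:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; the real Adobe table may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Track</th><th>Version</th></tr>
  <tr><td>Continuous</td><td>24.001.20615</td></tr>
</table>
"""

def extract_version(html):
    """Return the second cell of the first data row, or 'Unbekannt' if absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:  # timeout/error page: no table was ever rendered
        return "Unbekannt"
    rows = table.find_all("tr")
    if len(rows) < 2:
        return "Unbekannt"
    cells = rows[1].find_all("td")
    return cells[1].text.strip() if len(cells) > 1 else "Unbekannt"

print(extract_version(SAMPLE_HTML))           # 24.001.20615
print(extract_version("<p>no table here</p>"))  # Unbekannt
```

Testing the parser on canned HTML like this separates "the request timed out" from "the parsing is wrong", which are two different failures.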

I always received the following error: HTTPSConnectionPool(host='helpx.adobe', port=443): Read timed out. (read timeout=20)

Do you guys have any ideas?
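Since the error is a read timeout rather than a 403, the disguise may not even be the problem. In case it helps, this is the kind of plain `requests` attempt I'd compare against, with a browser-like header set and a read timeout more generous than the 20 s in the error; the URL below is a placeholder, not the actual page:

```python
import requests

# Placeholder URL -- substitute the actual Adobe help page being scraped.
URL = "https://example.com/"

HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/122.0.0.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch(url, read_timeout=60):
    # (connect timeout, read timeout): a slow server trips the *read* timeout,
    # so raising only that one is usually what's needed here.
    resp = requests.get(url, headers=HEADERS, timeout=(10, read_timeout))
    resp.raise_for_status()
    return resp.text
```

If this also times out, the server is likely throttling or blocking the client at the network level rather than rejecting the User-Agent.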
