python - Download a PDF with selenium - Stack Overflow

I'm trying to download PDFs with Selenium, but the call driver.download_file(file_name, target_directory) raises "WebDriverException: You must enable downloads in order to work with downloadable files."

I tried setting chrome_options.enable_downloads = True, but it didn't help. I also tried other browsers (Edge gave the same error, and Firefox returned a different one) and several older versions of Selenium, without success.

In the end, all I want is to download PDFs and store them in a specific folder. If anyone has any advice on how I can achieve this, it would be very helpful!

Here is my complete code; please let me know if I can provide anything else :)

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

MY_PATH = "./downloads"  # placeholder for the target folder (real path not shown)

def download_pdf_and_rename(url, filename):
    # Configure Chrome options to download PDFs to a temporary directory
    chrome_options = Options()

    chrome_options.enable_downloads = True

    driver = webdriver.Chrome(options=chrome_options)

    # Access the PDF URL
    driver.get(url)

    time.sleep(5)  # Adjust the sleep time as needed

    # This is the call that raises the WebDriverException
    driver.download_file(filename, MY_PATH)

    # Close the browser
    driver.quit()


download_pdf_and_rename("https://pubs.aeaweb./doi/pdfplus/10.1257/aer.20170866", "my_pdf.pdf")

Thanks!

asked Nov 18, 2024 at 12:08 by Lucie Bois
  • Check the example in the official repository. – cards, Nov 18, 2024 at 12:30

2 Answers

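Setting enable_downloads on the options is not enough here. In Selenium 4, enable_downloads and driver.download_file() belong to Selenium Grid's managed-downloads feature (a webdriver.Remote session against a Grid with downloads enabled). A local webdriver.Chrome session does not report downloads as enabled, so download_file() raises this exception. With a local browser, set Chrome preferences instead; they control where downloaded files are saved and how PDF files are handled.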

import time
import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

def download_pdf_and_rename(url, target_directory, filename):
    # Chrome's download.default_directory should be an absolute path
    target_directory = os.path.abspath(target_directory)
    # Ensure the target directory exists
    os.makedirs(target_directory, exist_ok=True)

    chrome_options = Options()
    chrome_options.add_experimental_option("prefs", {
        "download.default_directory": target_directory,  # where files are saved
        "download.prompt_for_download": False,  # no "Save as" dialog
        "plugins.always_open_pdf_externally": True,  # download PDFs instead of opening the built-in viewer
    })

    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    try:
        driver.get(url)
        time.sleep(10)  # crude wait for the download to finish; adjust as needed

        # "document.pdf" is a guess: use the name the server actually gives the file
        downloaded_file_path = os.path.join(target_directory, "document.pdf")
        renamed_file_path = os.path.join(target_directory, filename)
        if os.path.exists(downloaded_file_path):
            os.rename(downloaded_file_path, renamed_file_path)
            print(f"File downloaded and renamed to: {renamed_file_path}")
        else:
            print("Downloaded file not found. Check the download settings or file name.")
    finally:
        # Close the browser even if something above fails
        driver.quit()


download_pdf_and_rename(
    "https://pubs.aeaweb./doi/pdfplus/10.1257/aer.20170866",
    target_directory="./downloads",
    filename="my_pdf.pdf"
)
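The fixed time.sleep(10) and the guessed file name "document.pdf" are the fragile parts of this approach. While a download is in progress, Chrome writes it to a temporary .crdownload file and renames it when it finishes, so one alternative is to record the folder contents before driver.get(url) and poll until a new, completed .pdf appears. Below is a minimal standard-library sketch under that assumption; the helper name wait_for_new_pdf and its timeout/poll parameters are hypothetical and not part of Selenium.

```python
import os
import time


def wait_for_new_pdf(directory, before, timeout=30.0, poll=0.5):
    """Hypothetical helper: wait until a new, fully written PDF appears.

    `before` is the set of file names that were in `directory` before the
    download started. Returns the full path of the first new .pdf file,
    or raises TimeoutError if none appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = set(os.listdir(directory))
        # Chrome keeps unfinished downloads as *.crdownload files
        in_progress = any(name.endswith(".crdownload") for name in current)
        new_pdfs = sorted(
            name for name in current - before if name.lower().endswith(".pdf")
        )
        if new_pdfs and not in_progress:
            return os.path.join(directory, new_pdfs[0])
        time.sleep(poll)
    raise TimeoutError(f"No new PDF appeared in {directory!r} within {timeout} s")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as folder:
        before = set(os.listdir(folder))
        # Simulate a finished download; with Selenium, driver.get(url) goes here
        open(os.path.join(folder, "document.pdf"), "wb").close()
        print(wait_for_new_pdf(folder, before, timeout=2))
```

In the function above, the snapshot would be taken with before = set(os.listdir(target_directory)) just before driver.get(url), and time.sleep(10) plus the hard-coded "document.pdf" would be replaced by downloaded_file_path = wait_for_new_pdf(target_directory, before), followed by the same rename.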

This is not a Selenium solution, but you can request the file directly from Python and check the response's Content-Disposition header. When the server sends that header, it usually contains the name of the file being downloaded.

The server may block such requests, so you might need to adjust the request headers (for example the User-Agent) to get past the block.
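A minimal sketch of this approach using only the standard library. The answer does not name an HTTP library, so urllib.request is an assumption, as are the hypothetical helper names filename_from_content_disposition and download_pdf and the browser-like User-Agent value. The header is parsed with email.message.Message.get_filename(), which reads the filename parameter of Content-Disposition.

```python
import os
import urllib.request
from email.message import Message


def filename_from_content_disposition(header, default):
    """Return the file name from a Content-Disposition value, or `default`."""
    if not header:
        return default
    msg = Message()
    msg["Content-Disposition"] = header
    # Keep only the base name so a crafted header cannot escape the target folder
    name = os.path.basename(msg.get_filename() or "")
    return name or default


def download_pdf(url, target_directory, default_name="download.pdf"):
    """Hypothetical helper: save `url` into `target_directory`, named from the header."""
    os.makedirs(target_directory, exist_ok=True)
    # Some servers reject Python's default User-Agent; this value is an assumption
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        name = filename_from_content_disposition(
            response.headers.get("Content-Disposition"), default_name
        )
        path = os.path.join(target_directory, name)
        with open(path, "wb") as out:
            out.write(response.read())
    return path
```

If the server still blocks the request, the saved file may be an HTML error page rather than a PDF, so it is worth checking the result before relying on it.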
