I'm trying to download PDFs with Selenium, but the call `driver.download_file(file_name, target_directory)` raises "WebDriverException: You must enable downloads in order to work with downloadable files."
I tried setting the option `chrome_options.enable_downloads = True`, but it didn't help. I also tried other browsers (Edge gave the same error, and Firefox raised a different one) and several older versions of Selenium, without any success.
In the end, all I want is to download PDFs and store them in a specific folder. If anyone has any advice on how I can achieve this, it would be very helpful!
Here is my complete code, please let me know if I can provide anything else :)
```python
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def download_pdf_and_rename(url, filename):
    # Configure Chrome options to download PDFs to a temporary directory
    chrome_options = Options()
    chrome_options.enable_downloads = True
    driver = webdriver.Chrome(options=chrome_options)
    # Access the PDF URL
    driver.get(url)
    time.sleep(5)  # Adjust the sleep time as needed
    driver.download_file('my_pdf.pdf', MY_PATH)
    # Close the browser
    driver.quit()

download_pdf_and_rename("https://pubs.aeaweb./doi/pdfplus/10.1257/aer.20170866", "my_pdf.pdf")
```
Thanks!
asked Nov 18, 2024 at 12:08 by Lucie Bois
Comment (cards, Nov 18, 2024 at 12:30): Check the example in the official repository.
2 Answers
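Selenium's `enable_downloads` option and `driver.download_file()` belong to Selenium Grid's managed-downloads feature. They only work when the session runs on a Grid node with managed downloads enabled; a local `webdriver.Chrome()` session never reports the `se:downloadsEnabled` capability, so `download_file()` raises this error. For a local browser, set Chrome preferences instead to control downloads, including the directory where files are saved and how PDF files are handled.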
```python
import time
import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

def download_pdf_and_rename(url, target_directory, filename):
    # Chrome expects an absolute path for its download directory
    target_directory = os.path.abspath(target_directory)
    # Ensure the target directory exists
    os.makedirs(target_directory, exist_ok=True)

    chrome_options = Options()
    chrome_options.add_experimental_option("prefs", {
        "download.default_directory": target_directory,  # where downloads are saved
        "download.prompt_for_download": False,           # no "Save as" dialog
        "plugins.always_open_pdf_externally": True,      # download PDFs instead of opening the built-in viewer
    })

    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    try:
        driver.get(url)
        time.sleep(10)  # crude fixed wait for the download to finish

        # "document.pdf" is a guess; adjust it to the name the server actually gives the file
        downloaded_file_path = os.path.join(target_directory, "document.pdf")
        renamed_file_path = os.path.join(target_directory, filename)
        if os.path.exists(downloaded_file_path):
            os.rename(downloaded_file_path, renamed_file_path)
            print(f"File downloaded and renamed to: {renamed_file_path}")
        else:
            print("Downloaded file not found. Check the download settings or file name.")
    finally:
        # Close the browser even if something above fails
        driver.quit()

download_pdf_and_rename(
    "https://pubs.aeaweb./doi/pdfplus/10.1257/aer.20170866",
    target_directory="./downloads",
    filename="my_pdf.pdf"
)
```
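The fixed `time.sleep(10)` and the hard-coded `"document.pdf"` name are the fragile parts of this approach. A minimal stdlib-only sketch of a more robust wait, assuming Chrome's convention of keeping in-progress downloads as `*.crdownload` files and assuming the new download is the only PDF that appears in the folder after you take a snapshot of it. `wait_for_new_pdf` is a hypothetical helper, not part of Selenium; other browsers use other partial-file suffixes (Firefox, for example, uses `.part`), hence the `partial_suffixes` parameter.

```python
import os
import time


def wait_for_new_pdf(directory, before, timeout=30.0, poll=0.5,
                     partial_suffixes=(".crdownload",)):
    """Return the path of a PDF that appeared in `directory` after snapshot `before`.

    Waits until no new partial-download files remain, then picks the newest new PDF.
    Raises TimeoutError if no finished PDF shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Only consider files that were not there when the snapshot was taken
        new = set(os.listdir(directory)) - before
        # Assumption: the browser keeps the file under a partial suffix until it completes
        downloading = any(n.endswith(partial_suffixes) for n in new)
        new_pdfs = [n for n in new if n.lower().endswith(".pdf")]
        if new_pdfs and not downloading:
            newest = max(new_pdfs,
                         key=lambda n: os.path.getmtime(os.path.join(directory, n)))
            return os.path.join(directory, newest)
        time.sleep(poll)
    raise TimeoutError(f"no finished PDF appeared in {directory!r} within {timeout} s")
```

In `download_pdf_and_rename`, you would take `before = set(os.listdir(target_directory))` just before `driver.get(url)`, then replace the sleep and the `document.pdf` check with `os.replace(wait_for_new_pdf(target_directory, before), os.path.join(target_directory, filename))`. Taking the snapshot first means older PDFs or stale partial files already in the folder are ignored.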
This is not a Selenium solution, but you can request the PDF URL directly from Python and check the response's `Content-Disposition` header. When the server sends it, that header contains the name of the file being downloaded.
There is a chance that the request will be blocked, so you might need to adjust the request headers (for example, `User-Agent`) to get around it.
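A minimal standard-library sketch of that idea, assuming the URL serves the PDF directly to a non-browser client (sites that require cookies, a login, or JavaScript will not work this way). `filename_from_headers` and `download_pdf` are hypothetical names, and the browser-like `User-Agent` string is an assumption, not something the site is known to require. `urllib` exposes response headers as an `email.message.Message` subclass, so `get_filename()` can read the `filename` parameter of `Content-Disposition`.

```python
import os
import shutil
import urllib.request
from email.message import Message


def filename_from_headers(headers: Message, fallback: str) -> str:
    """Return the file name from Content-Disposition, or `fallback` if none is sent."""
    # Keep only the base name so a server-supplied path cannot escape the folder
    name = os.path.basename(headers.get_filename() or "")
    return name or fallback


def download_pdf(url: str, target_directory: str, fallback_name: str = "download.pdf") -> str:
    """Download `url` into `target_directory` and return the saved file's path."""
    os.makedirs(target_directory, exist_ok=True)
    # Assumed browser-like User-Agent; some sites reject Python's default one
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        name = filename_from_headers(response.headers, fallback_name)
        path = os.path.join(target_directory, name)
        with open(path, "wb") as out:
            shutil.copyfileobj(response, out)  # stream the body to disk
    return path
```

Usage would look like `download_pdf("https://pubs.aeaweb./doi/pdfplus/10.1257/aer.20170866", "./downloads", fallback_name="my_pdf.pdf")`; if the server sends no `Content-Disposition` header, the fallback name is used. You can then rename the returned path however you like.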