I am currently trying to take a screenshot of a given web page using Crawl4ai, but each time I try I either get an error or nothing at all. Here is my code, which is the same as in their own documentation:
import os, asyncio
from base64 import b64decode
from crawl4ai import AsyncWebCrawler, CacheMode

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://en.wikipedia.org/wiki/List_of_common_misconceptions",
            cache_mode=CacheMode.BYPASS,
            pdf=True,
            screenshot=True
        )
        if result.success:
            # Save screenshot
            if result.screenshot:
                with open("wikipedia_screenshot.png", "wb") as f:
                    f.write(b64decode(result.screenshot))
            # Save PDF
            if result.pdf:
                with open("wikipedia_page.pdf", "wb") as f:
                    f.write(result.pdf)
            print("[OK] PDF & screenshot captured.")
        else:
            print("[ERROR]", result.error_message)

if __name__ == "__main__":
    asyncio.run(main())
And the error that I get:
Error: crawl4ai.async_webcrawler.AsyncWebCrawler.aprocess_html() got multiple values for keyword argument 'screenshot'
asked Mar 24 at 19:49 by Bernardo, edited Mar 25 at 7:04 by Basheer Jarrah
1 Answer
I have tested your script, and it mostly works on my machine. It's possible that your Crawl4ai browser setup is misconfigured.
With that particular URL (a long scrolling page), the output PNG is 154 MB, which is far too large for the format to open comfortably.
I added a compression step using Pillow (pip install pillow) and changed the output format to JPEG with fairly aggressive compression (quality=20). This results in a usable 4.8 MB JPEG.
Going much lower is very destructive, but you can try adjusting that quality value.
#!/usr/bin/python
import os
import asyncio
from base64 import b64decode
from crawl4ai import AsyncWebCrawler, CacheMode
from PIL import Image
import io

async def main():
    # Create the output directory if a path component was given
    save_dir = os.path.dirname("wikipedia_screenshot.png")
    if save_dir and not os.path.exists(save_dir):
        os.makedirs(save_dir)
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://en.wikipedia.org/wiki/List_of_common_misconceptions",
            cache_mode=CacheMode.BYPASS,
            pdf=True,
            screenshot=True
        )
        if result.success:
            if result.screenshot:
                # Decode the base64 screenshot and re-encode it as a compressed JPEG
                image_data = b64decode(result.screenshot)
                image = Image.open(io.BytesIO(image_data))
                compressed_image = image.convert("RGB")
                compressed_image_io = io.BytesIO()
                compressed_image.save(compressed_image_io, format='JPEG', optimize=True, quality=20)
                compressed_image_io.seek(0)
                with open("wikipedia_screenshot.jpg", "wb") as f:
                    f.write(compressed_image_io.getvalue())
                print("[OK] Screenshot saved successfully (compressed).")
            if result.pdf:
                with open("wikipedia_page.pdf", "wb") as f:
                    f.write(result.pdf)
                print("[OK] PDF saved successfully.")
            print("[OK] PDF & compressed screenshot captured.")
        else:
            print(f"[ERROR] Failed to retrieve content: {result.error_message}")

if __name__ == "__main__":
    asyncio.run(main())
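If quality=20 is still too destructive, downscaling the image before re-encoding usually preserves text legibility better than lowering the JPEG quality further. A minimal Pillow sketch (the compress_screenshot helper and its defaults are my own, not part of Crawl4ai):

```python
from PIL import Image
import io

def compress_screenshot(png_bytes: bytes, max_width: int = 1600, quality: int = 40) -> bytes:
    """Downscale a (possibly huge) screenshot and re-encode it as JPEG."""
    image = Image.open(io.BytesIO(png_bytes)).convert("RGB")
    if image.width > max_width:
        # Preserve aspect ratio; LANCZOS gives the sharpest downscale.
        new_height = max(1, round(image.height * max_width / image.width))
        image = image.resize((max_width, new_height), Image.LANCZOS)
    out = io.BytesIO()
    image.save(out, format="JPEG", optimize=True, quality=quality)
    return out.getvalue()
```

You would call it on b64decode(result.screenshot) in place of the inline compression step above; tune max_width and quality against your size budget.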
Alternatively, Firefox can capture a full-page screenshot directly: firefox --screenshot https://en.wikipedia.org/wiki/List_of_common_misconceptions. Calling that from Python would require import subprocess. Just an idea; there are many alternatives. – NVRM Commented Mar 25 at 7:10
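The subprocess route that comment suggests can be sketched as follows. This assumes Firefox is installed and on PATH; the function names and default output path are my own, only the firefox flags come from the comment:

```python
import subprocess

def build_screenshot_cmd(url: str, out_path: str = "page.png") -> list:
    # --headless avoids opening a browser window; Firefox writes a PNG to out_path.
    return ["firefox", "--headless", "--screenshot", out_path, url]

def take_screenshot(url: str, out_path: str = "page.png") -> None:
    # check=True raises CalledProcessError if Firefox exits with a non-zero status.
    subprocess.run(build_screenshot_cmd(url, out_path), check=True)
```

Unlike the Crawl4ai route, this gives you no PDF and no page content, only the image, so it is most useful as a quick sanity check that the page itself can be captured.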