python - Can't take a Screenshot using Crawl4ai - Stack Overflow


I am currently trying to take a screenshot of a given web page using Crawl4ai; however, each time I try, I either get an error or nothing at all.

Here is the code I used, which is the same as in their own documentation:

import os, asyncio
from base64 import b64decode
from crawl4ai import AsyncWebCrawler, CacheMode

async def main():
  async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url=";,
        cache_mode=CacheMode.BYPASS,
        pdf=True,
        screenshot=True
    )

    if result.success:
        # Save screenshot
        if result.screenshot:
            with open("wikipedia_screenshot.png", "wb") as f:
                f.write(b64decode(result.screenshot))

        # Save PDF
        if result.pdf:
            with open("wikipedia_page.pdf", "wb") as f:
                f.write(result.pdf)

        print("[OK] PDF & screenshot captured.")
    else:
        print("[ERROR]", result.error_message)

if __name__ == "__main__":
   asyncio.run(main())

And the error that I get:

Error: crawl4ai.async_webcrawler.AsyncWebCrawler.aprocess_html() got multiple values for keyword argument 'screenshot'

asked Mar 24 at 19:49 by Bernardo, edited Mar 25 at 7:04 by Basheer Jarrah
  • As an aside, depending on the environment of course: you can use firefox --screenshot https://en.wikipedia.org/wiki/List_of_common_misconceptions. Calling that from Python would require import subprocess (see the sketch after these comments). Just an idea; there are many alternatives. – NVRM Commented Mar 25 at 7:10
  • I am not sure why it's so large, but you could simply compress it to .jpg in your script, so it's automatic. You get a picture of about 1 MB, then delete the original to save space. This might not be the best way if you have limited power, but doing it locally should be almost instantaneous. There are so many ways; find what is best for your use case, this is how we build our tooling. Later you will appreciate having spent time on this, it will make things easier. GL – NVRM Commented Mar 25 at 19:56
  • Gotta love ghost accounts. – NVRM Commented 2 days ago
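
A minimal sketch of the subprocess idea from the first comment, assuming a firefox binary on PATH whose --screenshot flag renders the page headlessly (the output filename here is just an example):

import subprocess

# Ask Firefox to render the page and write a PNG screenshot to disk.
# In recent Firefox builds, --screenshot implies headless mode.
subprocess.run(
    [
        "firefox",
        "--screenshot", "wikipedia_screenshot.png",
        "https://en.wikipedia.org/wiki/List_of_common_misconceptions",
    ],
    check=True,
)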

1 Answer

I have tested your script, and it mostly works on my machine. It's possible that your Crawl4ai browser setup is misconfigured.
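
If the "multiple values for keyword argument" error persists on your side, one workaround worth trying (a sketch assuming a reasonably recent Crawl4ai release) is to pass the per-run options through a CrawlerRunConfig object instead of as bare keyword arguments to arun():

import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

async def main():
    # Bundle the per-run options into a single config object instead of
    # passing screenshot/pdf directly as keyword arguments.
    config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        screenshot=True,
        pdf=True,
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://en.wikipedia.org/wiki/List_of_common_misconceptions",
            config=config,
        )
        print("success:", result.success)

if __name__ == "__main__":
    asyncio.run(main())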

Using that particular URL (a very long scrolling page), the output PNG is indeed huge: 154 MB, far too large for this format to be opened comfortably.

I have added a compression step using Pillow (pip install pillow) and changed the output format to JPEG with aggressive compression (quality=20); this results in a fairly usable 4.8 MB JPEG.

Going lower in quality is very destructive, but you can adjust that value to suit your needs.

#!/usr/bin/python
import os
import asyncio
from base64 import b64decode
from crawl4ai import AsyncWebCrawler, CacheMode
from PIL import Image
import io

async def main():
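    # Create the output directory if the target path includes one;
    # for a bare filename like this, dirname() is empty and this is a no-op.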
    save_dir = os.path.dirname("wikipedia_screenshot.png")
    if save_dir and not os.path.exists(save_dir):
        os.makedirs(save_dir)

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://en.wikipedia./wiki/List_of_common_misconceptions",
            cache_mode=CacheMode.BYPASS,
            pdf=True,
            screenshot=True
        )

        if result.success:
            if result.screenshot:
                image_data = b64decode(result.screenshot)
                image = Image.open(io.BytesIO(image_data))
                compressed_image = image.convert("RGB")
                compressed_image_io = io.BytesIO()
                compressed_image.save(compressed_image_io, format='JPEG', optimize=True, quality=20)
                compressed_image_io.seek(0)
                with open("wikipedia_screenshot.jpg", "wb") as f:
                    f.write(compressed_image_io.getvalue())
                print("[OK] Screenshot saved successfully (compressed).")


            if result.pdf:
                with open("wikipedia_page.pdf", "wb") as f:
                    f.write(result.pdf)
                print("[OK] PDF saved successfully.")

            print("[OK] PDF & compressed screenshot captured.")
        else:
            print(f"[ERROR] Failed to retrieve content: {result.error_message}")

if __name__ == "__main__":
    asyncio.run(main())
