python - Can't take a Screenshot using Crawl4ai - Stack Overflow


I am currently trying to take a screenshot of a given web page using Crawl4ai; however, each time I try, I either get an error or nothing at all.

Here is the code I used, which is the same as in their own documentation:

import os, asyncio
from base64 import b64decode
from crawl4ai import AsyncWebCrawler, CacheMode

async def main():
  async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url=";,
        cache_mode=CacheMode.BYPASS,
        pdf=True,
        screenshot=True
    )

    if result.success:
        # Save screenshot
        if result.screenshot:
            with open("wikipedia_screenshot.png", "wb") as f:
                f.write(b64decode(result.screenshot))

        # Save PDF
        if result.pdf:
            with open("wikipedia_page.pdf", "wb") as f:
                f.write(result.pdf)

        print("[OK] PDF & screenshot captured.")
    else:
        print("[ERROR]", result.error_message)

if __name__ == "__main__":
   asyncio.run(main())

And the error that I get:

Error: crawl4ai.async_webcrawler.AsyncWebCrawler.aprocess_html() got multiple values for keyword argument 'screenshot'

asked Mar 24 at 19:49 by Bernardo, edited Mar 25 at 7:04 by Basheer Jarrah
  • As an aside, depending on the environment of course: you can use firefox --screenshot https://en.wikipedia.org/wiki/List_of_common_misconceptions. Calling that from Python would require import subprocess (see the sketch after these comments). Just an idea; there are many alternatives. – NVRM Commented Mar 25 at 7:10
  • I am not sure why it's so large, but you could simply compress it to .jpg in your script, so it's automatic. You get a picture of about 1 MB, then delete the original to save space. This might not be the best way if you have limited power, but doing it locally should be almost instantaneous. There are so many ways; find what is best for your use case, this is how we build our tooling. Later you will appreciate having spent time on this, it will make things easier. GL – NVRM Commented Mar 25 at 19:56
  • Gotta love ghost accounts. – NVRM Commented 2 days ago
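
A minimal sketch of the subprocess idea from the first comment, assuming a firefox binary on PATH whose --screenshot flag renders the page headlessly (the output filename here is just an example):

import subprocess

# Ask Firefox to render the page and write a PNG screenshot to disk.
# In recent Firefox builds, --screenshot implies headless mode.
subprocess.run(
    [
        "firefox",
        "--screenshot", "wikipedia_screenshot.png",
        "https://en.wikipedia.org/wiki/List_of_common_misconceptions",
    ],
    check=True,
)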

1 Answer

I have tested your script, and it mostly works on my machine. It's possible that your Crawl4ai browser setup is misconfigured.
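
If the "multiple values for keyword argument" error persists on your side, one workaround worth trying (a sketch assuming a reasonably recent Crawl4ai release) is to pass the per-run options through a CrawlerRunConfig object instead of as bare keyword arguments to arun():

import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

async def main():
    # Bundle the per-run options into a single config object instead of
    # passing screenshot/pdf directly as keyword arguments.
    config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        screenshot=True,
        pdf=True,
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://en.wikipedia.org/wiki/List_of_common_misconceptions",
            config=config,
        )
        print("success:", result.success)

if __name__ == "__main__":
    asyncio.run(main())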

Using that particular URL (a very long scrolling page), the output PNG is indeed huge: 154 MB, far too large for this format to be opened comfortably.

I have added a compression step using Pillow (pip install pillow) and changed the output format to JPEG with aggressive compression (quality=20); this results in a fairly usable 4.8 MB JPEG.

Going lower in quality is very destructive, but you can adjust that value to suit your needs.

#!/usr/bin/python
import os
import asyncio
from base64 import b64decode
from crawl4ai import AsyncWebCrawler, CacheMode
from PIL import Image
import io

async def main():
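    # Create the output directory if the target path includes one;
    # for a bare filename like this, dirname() is empty and this is a no-op.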
    save_dir = os.path.dirname("wikipedia_screenshot.png")
    if save_dir and not os.path.exists(save_dir):
        os.makedirs(save_dir)

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://en.wikipedia./wiki/List_of_common_misconceptions",
            cache_mode=CacheMode.BYPASS,
            pdf=True,
            screenshot=True
        )

        if result.success:
            if result.screenshot:
                image_data = b64decode(result.screenshot)
                image = Image.open(io.BytesIO(image_data))
                compressed_image = image.convert("RGB")
                compressed_image_io = io.BytesIO()
                compressed_image.save(compressed_image_io, format='JPEG', optimize=True, quality=20)
                compressed_image_io.seek(0)
                with open("wikipedia_screenshot.jpg", "wb") as f:
                    f.write(compressed_image_io.getvalue())
                print("[OK] Screenshot saved successfully (compressed).")


            if result.pdf:
                with open("wikipedia_page.pdf", "wb") as f:
                    f.write(result.pdf)
                print("[OK] PDF saved successfully.")

            print("[OK] PDF & compressed screenshot captured.")
        else:
            print(f"[ERROR] Failed to retrieve content: {result.error_message}")

if __name__ == "__main__":
    asyncio.run(main())
