node.js - Bypassing Cloudflare with Puppeteer and FlareSolverr - Stack Overflow

In the past few weeks, we have been working on web scraping on /. Initially, we used only Puppeteer, but quite often, the browser encountered a Cloudflare challenge page displaying the message:

"Waiting for the website to respond."

To overcome this, we tried several alternative approaches:

  • Puppeteer + rotating proxy
  • puppeteer-extra-plugin-stealth
  • puppeteer-real-browser + rotating proxy
  • puppeteer-real-browser + FlareSolverr pre-request + rotating proxy

Current Approach

To address the issue, we decided to make a pre-request to the site using FlareSolverr. We then extracted the obtained cookies and user agent and passed them to Puppeteer for browser navigation.

However, we encountered two key issues:

FlareSolverr fails to solve the Cloudflare challenge

  • When FlareSolverr detects a Cloudflare challenge, it fails to bypass it, logging the error:

Error solving the challenge. Timeout after X seconds.

  • When FlareSolverr does not detect a challenge and successfully completes the pre-request, we extract the cookies and user agent, set them in Puppeteer, and navigate to the target page.

However, Puppeteer still encounters the Cloudflare challenge page. This suggests that FlareSolverr might not be detecting the challenge properly and, therefore, does not retrieve the necessary cookies.
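One way to test this hypothesis is to check whether the pre-request returned a clearance cookie at all. Cloudflare sets a cookie named cf_clearance once a challenge has been passed, so a response without it probably means no challenge was solved during the pre-request. The sketch below is a minimal diagnostic under that assumption; the helper name hasClearanceCookie and the sample cookie values are hypothetical.

```typescript
// Minimal cookie shape; only `name` matters for this check.
interface SolverCookie {
  name: string;
  value: string;
}

// Returns true when the cookie list contains Cloudflare's clearance cookie.
function hasClearanceCookie(cookies: SolverCookie[]): boolean {
  return cookies.some((cookie) => cookie.name === "cf_clearance");
}

// Made-up cookie values, for illustration only.
const withoutClearance: SolverCookie[] = [{ name: "__cf_bm", value: "abc" }];
const withClearance: SolverCookie[] = [
  ...withoutClearance,
  { name: "cf_clearance", value: "xyz" },
];

console.log(hasClearanceCookie(withoutClearance)); // false
console.log(hasClearanceCookie(withClearance)); // true
```

Logging this result before handing the cookies to Puppeteer would show whether the second issue coincides with pre-requests that returned no clearance cookie.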

Question

What are we doing wrong? It seems that FlareSolverr reduces the likelihood of hitting the challenge but fails when it actually encounters one.

What would be the best approach to ensure Puppeteer can bypass Cloudflare protection?

Code Snippet (TypeScript)

let flaresolverrData: any;
let attempts = 0;
const maxAttempts = 5;

// Pre-request to FlareSolverr; retry on failure.
while (attempts < maxAttempts) {
  try {
    let response = await fetch("http://localhost:8191/v1", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        cmd: "request.get",
        url: actionParams.url,
        maxTimeout: 5000, // milliseconds; FlareSolverr's documented default is 60000
        session: flaresolverrSessionId,
      }),
    });

    flaresolverrData = await response.json();

    if (flaresolverrData?.status === "error") {
      throw new Error(flaresolverrData?.message ?? "FlareSolverr error");
    }

    break;
  } catch (error: any) {
    attempts++;
    if (attempts === maxAttempts) {
      throw new Error(`Failed to fetch after ${attempts} attempts: ${error.message}`);
    }
    await new Promise((resolve) => setTimeout(resolve, 1000)); // Wait 1 second between retries
  }
}

if (!flaresolverrData) throw new Error("FlareSolverr data not found");

const cookies = flaresolverrData.solution.cookies;
const userAgent = flaresolverrData.solution.userAgent;

if (!userAgent) throw new Error("User agent not found");

if (cookies.length !== 0) {
  await browser.setCookie(
    ...cookies.map((cookie: any) => ({ ...cookie, expires: cookie?.expiry ?? 0 }))
  );
}

await browserPage.setUserAgent(userAgent);

await delay(Math.random() * 10000 + 1000);
await browserPage.goto(actionParams.url, { waitUntil: "networkidle0" });

const content = await browserPage.content();
return content;
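The snippet above maps a missing expiry to expires: 0. A possible alternative, sketched below, is to leave expires unset when the solver reports no expiry, on the assumption that such a cookie is a session cookie (the Chrome DevTools Protocol treats a cookie with no expiration as a session cookie). The interface names, the helper toBrowserCookie, and the optional fields are hypothetical and only illustrate the mapping. They are not FlareSolverr's or Puppeteer's own types.

```typescript
// Cookie as it appears in the solver response. Fields other than `name`
// and `value` are assumptions based on the snippet above.
interface SolverCookie {
  name: string;
  value: string;
  domain?: string;
  path?: string;
  expiry?: number; // seconds since the Unix epoch, if present
  expires?: number; // alternative field name, if present
  secure?: boolean;
  httpOnly?: boolean;
}

// Shape handed to the browser; `expires` is omitted for session cookies.
interface BrowserCookie {
  name: string;
  value: string;
  domain?: string;
  path?: string;
  expires?: number;
  secure?: boolean;
  httpOnly?: boolean;
}

// Copies the cookie, renaming `expiry` to `expires` and dropping the
// field entirely when no positive expiration time is available.
function toBrowserCookie(cookie: SolverCookie): BrowserCookie {
  const { expiry, expires, ...rest } = cookie;
  const expiresAt = expiry ?? expires;
  return expiresAt !== undefined && expiresAt > 0
    ? { ...rest, expires: expiresAt }
    : { ...rest };
}

// Made-up cookies, for illustration only.
const sessionCookie = toBrowserCookie({ name: "sid", value: "1" });
const persistentCookie = toBrowserCookie({
  name: "cf_clearance",
  value: "xyz",
  expiry: 1700000000,
});

console.log("expires" in sessionCookie); // false
console.log(persistentCookie.expires); // 1700000000
```

In the snippet above, this helper would take the place of the inline cookies.map(...) callback.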
