javascript - Puppeteer Bright Data proxy returning ERR_NO_SUPPORTED_PROXY or CERT errors - Stack Overflow

So I went on Bright Data, made an account, and got my Search Engine Crawler proxy. Here's my scraping function below:

const puppeteerExtra = require('puppeteer-extra'); // puppeteer wrapper used for launch() below
const proxyChain = require('proxy-chain');         // anonymizeProxy() strips the credentials into a local proxy

async function scrape() {
  try {
    const preparePageForTests = async (page) => {

          const userAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36';//'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36';

          await page.setUserAgent(userAgent);

          await page.evaluateOnNewDocument(() => {
            Object.defineProperty(navigator, 'webdriver', {
              get: () => false,
            });
          });

          // Pass the Chrome Test.
          await page.evaluateOnNewDocument(() => {
            // We can mock this in as much depth as we need for the test.
            window.navigator.chrome = {
              app: {
                isInstalled: false,
              },
              webstore: {
                onInstallStageChanged: {},
                onDownloadProgress: {},
              },
              runtime: {
                PlatformOs: {
                  MAC: 'mac',
                  WIN: 'win',
                  ANDROID: 'android',
                  CROS: 'cros',
                  LINUX: 'linux',
                  OPENBSD: 'openbsd',
                },
                PlatformArch: {
                  ARM: 'arm',
                  X86_32: 'x86-32',
                  X86_64: 'x86-64',
                },
                PlatformNaclArch: {
                  ARM: 'arm',
                  X86_32: 'x86-32',
                  X86_64: 'x86-64',
                },
                RequestUpdateCheckStatus: {
                  THROTTLED: 'throttled',
                  NO_UPDATE: 'no_update',
                  UPDATE_AVAILABLE: 'update_available',
                },
                OnInstalledReason: {
                  INSTALL: 'install',
                  UPDATE: 'update',
                  CHROME_UPDATE: 'chrome_update',
                  SHARED_MODULE_UPDATE: 'shared_module_update',
                },
                OnRestartRequiredReason: {
                  APP_UPDATE: 'app_update',
                  OS_UPDATE: 'os_update',
                  PERIODIC: 'periodic',
                },
              }
            };
          });

          await page.evaluateOnNewDocument(() => {
            const originalQuery = window.navigator.permissions.query;
            return window.navigator.permissions.query = (parameters) => (
              parameters.name === 'notifications' ?
                Promise.resolve({ state: Notification.permission }) :
                originalQuery(parameters)
            );
          });

          await page.evaluateOnNewDocument(() => {
            // Overwrite the `plugins` property to use a custom getter.
            Object.defineProperty(navigator, 'plugins', {
              // This just needs to have `length > 0` for the current test,
              // but we could mock the plugins too if necessary.
              get: () => [1, 2, 3, 4, 5],
            });
          });

          await page.evaluateOnNewDocument(() => {
            // Overwrite the `plugins` property to use a custom getter.
            Object.defineProperty(navigator, 'languages', {
              get: () => ['en-US', 'en'],
            });
          });
        }

        //the below is the Search Engine Crawler proxy from the Luminati/Bright Data sign-up (the credentials and superproxy host are placeholders). This returns ERR_CERT_INVALID or ERR_CERT_AUTHORITY_INVALID
        const oldProxyUrl = 'http://lum-customer-customerID-zone-zone1:<password>@zproxy.lum-superproxy.io:22225'
        const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl); //if this line is commented out, I get ERR_NO_SUPPORTED_PROXY

        const browser = await puppeteerExtra.launch({ 
          headless: true, 
          args: [                
            '--no-sandbox', 
            '--disable-setuid-sandbox', 
            `--proxy-server=${newProxyUrl}`
            //If I add 'ignoreHTTPSErrors: true' here then I can bypass the CERT errors but then it seems like I can't navigate the browser anymore to a different page.                     
          ]
        });

        const page = await browser.newPage();

        await preparePageForTests(page);

        await page.setViewport({ width: 1440, height: 1080 });

        await page.goto('https://www.google.com/search?q=concerts+near+new+york');
        
        await page.screenshot({ path: `screenshot.jpeg` });

  } catch(err) {
    console.log(err)
  }
}
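
A note on the two errors described in the code comments above: Chromium will not accept credentials embedded in the --proxy-server value, which is most likely why dropping anonymizeProxy produces ERR_NO_SUPPORTED_PROXY, and ignoreHTTPSErrors is a Puppeteer launch option rather than a Chromium argument, so it belongs next to headless, not inside args. A minimal sketch of both points, reusing the placeholder superproxy host and credentials from the question:

const puppeteerExtra = require('puppeteer-extra');

async function launchWithProxy() {
  const browser = await puppeteerExtra.launch({
    headless: true,
    ignoreHTTPSErrors: true, // launch option, not a Chrome flag
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      // no credentials here: Chromium rejects user:pass in --proxy-server
      '--proxy-server=zproxy.lum-superproxy.io:22225', // placeholder host from the question
    ],
  });

  const page = await browser.newPage();

  // Answer the proxy's authentication challenge directly from Puppeteer,
  // which makes the local proxy-chain hop unnecessary.
  await page.authenticate({
    username: 'lum-customer-customerID-zone-zone1', // placeholder from the question
    password: '<password>',                         // placeholder
  });

  return { browser, page };
}

As a further aside, the manual evasions in preparePageForTests can be swapped for puppeteer-extra-plugin-stealth, which puppeteer-extra is designed to load via puppeteerExtra.use().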

Not sure how to solve this. I believe the problem is with bypassing the CERT errors via ignoreHTTPSErrors. When I don't use a proxy at all, my analysis function (which essentially takes in the first 'ul' list seen below) works fine, but if I use the proxy it for some reason gives me the data on the second page.

Any help would be much appreciated!

The 'ul' is nicely formatted and the data is easy to get at: https://i.sstatic.net/RwiHM.jpg

With the proxy, however, only a few 'ul' elements are visible and then I get a bunch of content I don't want returned. I tried doing a

page.$eval(".BXE0fe", element => element.click())

but that isn't redirecting the page for some reason: https://i.sstatic.net/3DTay.png
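
On the click that isn't redirecting: a click fired inside page.$eval runs purely in the page, and nothing waits for the navigation it triggers, so the next Puppeteer call often runs against the old DOM. The usual pattern is to click via Puppeteer and wait for the navigation in parallel; the .BXE0fe selector below is the one from the question and may not be stable across Google layouts:

// Click and wait for the resulting navigation in parallel to avoid a race.
await Promise.all([
  page.waitForNavigation({ waitUntil: 'networkidle2' }),
  page.click('.BXE0fe'),
]);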

asked Jun 10, 2021 by nickcoding2 (edited Jun 12, 2021)

2 Answers


Aside from the point Yevgeniy made about targeting Google (he's right, by the way: for Google you need to use their SERP product), if you're requesting over HTTPS you need to have their CA certificate installed and to send requests through the Proxy Manager instead of through the Superproxy directly. And none of this would even matter for headless Chromium because of this unresolved Chromium bug.

Bright Data support can definitely help you with this issue - you can contact them through the chat bubble in your control panel or on Skype: luminati.io
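
If you take the Proxy Manager route described above, the browser points at the locally running Proxy Manager rather than at the Superproxy. A rough sketch, assuming the Proxy Manager's commonly used local port of 24000 (treat the port as an assumption and check your own Proxy Manager configuration):

// Rough sketch: route Puppeteer through a local Bright Data Proxy Manager.
// 127.0.0.1:24000 is assumed; use whatever port your Proxy Manager actually exposes.
const browser = await puppeteerExtra.launch({
  headless: true,
  ignoreHTTPSErrors: true, // headless Chromium still needs this while the cert issue mentioned above is unresolved
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--proxy-server=127.0.0.1:24000',
  ],
});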

This issue depends on which domain you are targeting (Bright Data blocks Google domains; you should use a SERP zone for Google).
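
In practice that usually means only the zone part of the proxy username changes, pointing at a zone whose type is SERP. A sketch with a hypothetical zone name ('serp_zone1' is made up; use the SERP zone configured in your account):

// Hypothetical SERP zone: only the zone portion of the username differs from the question.
const serpProxyUrl = 'http://lum-customer-customerID-zone-serp_zone1:<password>@zproxy.lum-superproxy.io:22225';
const anonymizedSerpProxy = await proxyChain.anonymizeProxy(serpProxyUrl);
// then pass `--proxy-server=${anonymizedSerpProxy}` to puppeteerExtra.launch as in the question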
