Can't access innerText property using Puppeteer - .$$eval and .$$ is not yielding results

I am working on a web scraper that searches Google for certain things and then pulls text from the result page, and I am having an issue getting Puppeteer to return the text I need. What I want to return is an array of strings.

Let's say I have a couple nested divs within a div, and each has text like so:

 <div class='mainDiv'>
   <div>Mary Doe </div>
   <div> James Dean </div>
 </div>

In the DOM, I can do the following to get the result I need:

document.querySelectorAll('.mainDiv')[0].innerText.split('\n')

This yields: ["Mary Doe", "James Dean"].

I understand that Puppeteer doesn't return NodeLists, and instead it uses JSHandles, but I still can't figure out how to get any information using the prescribed methods. See below for what I have tried in Puppeteer and the corresponding console output:

In every scenario, I do await page.waitFor('selector') to start.

Scenario 1 (using .$$eval()):

const genreElements = await page.$$eval('div.mainDiv', el => el);
console.log(genreElements) // [] 

Scenario 2 (using evaluate):

function extractItems() {
   const extractedElements = document.querySelectorAll('div.mainDiv')[0].innerText.split('\n')
   return extractedElements
}
      
let items = await page.evaluate(extractItems)
console.log(items) // UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'innerText' of undefined

Scenario 3 (using evaluateHandle):

const selectorHandle = await page.evaluateHandle(() => document.querySelectorAll('div.mainDiv'))
const resultHandle = await page.evaluate(x => x[0], selectorHandle)
console.log(resultHandle) // undefined

Any help or guidance on how I am implementing or how to achieve what I am looking to do is much appreciated. Thank you!

Asked Dec 5, 2018 at 21:16 by Nigel Finley. Edited Jun 8, 2021 at 12:07 by DisappointedByUnaccountableMod.
  • Instead of querySelectorAll()[0] (get all, then throw everything away but the first) why not querySelector() (get the first)? – ggorlen Commented Mar 14, 2023 at 20:34
3 Answers

Use page.$$eval() or page.evaluate():

You can use page.$$eval() or page.evaluate() to run Array.from(document.querySelectorAll()) within the page context and map() the innerText of each element to the result array:

const names_1 = await page.$$eval('.mainDiv > div', divs => divs.map(div => div.innerText));
const names_2 = await page.evaluate(() => Array.from(document.querySelectorAll('.mainDiv > div'), div => div.innerText));
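
A minimal sketch of the `$$eval` approach wrapped in a reusable function, assuming a Puppeteer `page` that has already loaded the results. The name `extractNames`, its default selector, and the trimming and empty-string filtering are illustrative additions, not part of the answer above.

```javascript
// Hypothetical helper: collect the visible text of each child <div>.
// `page` is assumed to be a Puppeteer Page; the default selector matches the
// example markup from the question.
async function extractNames(page, selector = '.mainDiv > div') {
  // The callback runs inside the browser, so it returns plain strings
  // rather than the DOM elements themselves.
  const texts = await page.$$eval(selector, divs =>
    divs.map(div => div.innerText)
  );
  // Back in Node: trim stray whitespace and drop empty entries.
  return texts.map(text => text.trim()).filter(text => text.length > 0);
}
```

It would be called as `const names = await extractNames(page);` once the selector has been waited for.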

Note: Keep in mind that if you use Puppeteer to automate searches on Google, you may be temporarily blocked and end up with an "Unusual traffic from your computer network" notice, requiring you to solve a reCAPTCHA. This may break your web scraper, so proceed with caution.

Try it like this:

let names = await page.evaluate(() => [...document.querySelectorAll('.mainDiv div')].map(div => div.innerText))

That way you can test the expression inside the callback directly in the Chrome console.

Using page.$eval:

const names = await page.$eval('.mainDiv', (element) => {
    return element.innerText
});

Here the element is retrieved by selector and directly passed to the function to be evaluated.
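
Because `$eval` here returns the whole `innerText` as one string, an extra step in Node is needed to get the array the question asks for. A small sketch, where `splitLines` is a hypothetical helper name and the trimming is an added precaution:

```javascript
// Hypothetical helper: turn one multi-line innerText string into an array.
function splitLines(text) {
  return text
    .split('\n')                  // one entry per rendered line
    .map(line => line.trim())     // remove leading/trailing whitespace
    .filter(line => line !== ''); // skip blank lines
}

console.log(splitLines('Mary Doe\n James Dean \n'));
// [ 'Mary Doe', 'James Dean' ]
```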

Using page.evaluate:

const namesElem = await page.$('.mainDiv');
const names = await page.evaluate(namesElem => namesElem.innerText, namesElem);

This is basically the first method split up into two steps. The interesting part is that ElementHandles can be passed as arguments in page.evaluate() and can be evaluated like JSHandles.
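
One detail to guard against in the two-step version: `page.$()` resolves to `null` when nothing matches the selector, and reading `innerText` from `null` inside the page throws much like the error in the question's Scenario 2. A sketch of a defensive variant, where `innerTextOrNull` is a hypothetical name and `page` is assumed to be a Puppeteer Page:

```javascript
// Hypothetical helper: return the element's innerText, or null if the
// selector matches nothing.
async function innerTextOrNull(page, selector) {
  const handle = await page.$(selector);
  if (handle === null) {
    // Nothing matched, so there is no element to read from.
    return null;
  }
  // The ElementHandle is passed as an argument and arrives in the page
  // context as the element itself.
  return page.evaluate(el => el.innerText, handle);
}
```

A `null` result then points at the selector (or the timing of the wait) rather than at the extraction code.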

Note that for simplicity and clarification I used the methods for retrieving single elements. But page.$$() and page.$$eval() work the same way while selecting multiple elements and returning an array instead.
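
For the multi-element case, a sketch under the same assumptions might look like the following. It uses `page.$$()` to get an array of ElementHandles and passes each one to `page.evaluate()`, mirroring the single-element example; `namesFromHandles` and its default selector are hypothetical.

```javascript
// Hypothetical helper: read innerText from every element matching `selector`.
// `page` is assumed to be a Puppeteer Page.
async function namesFromHandles(page, selector = '.mainDiv > div') {
  // page.$$() resolves to an array of ElementHandles (empty if nothing matches).
  const handles = await page.$$(selector);
  // Evaluate each handle in the page context and collect the strings in order.
  return Promise.all(
    handles.map(handle => page.evaluate(el => el.innerText, handle))
  );
}
```

This makes one round trip per element, so for long lists the single `page.$$eval()` call shown earlier is likely the cheaper choice.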
