javascript - Extract only text content from a web page - Stack Overflow

I need to extract all the text content from a web page. I have used 'document.body.textContent', but I get the JavaScript content as well. How do I ensure that I get only the readable text content?

function myFunction() {
  var str = document.body.textContent
  alert(str);
}
<html>
<title>Test Page for Text extraction</title>

<head>I hope this works</head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>

<body>
  <p>Test on this content to change the 5th word to a link
    <p>
      <button onclick="myFunction()">Try it</button>
</body>
</html>

asked Sep 28, 2015 at 14:49 by vjravi

2 Answers

Just remove the tags you don't want read before reading document.body.textContent.

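function myFunction() {
  // Remove every <script> element inside <body> so its source is not part of the text.
  var bodyScripts = document.querySelectorAll("body script");
  for (var i = 0; i < bodyScripts.length; i++) {
    bodyScripts[i].remove();
  }
  // With the scripts gone, textContent holds only the remaining page text.
  var str = document.body.textContent;
  // Replace the page with the extracted text so the result is visible.
  document.body.innerHTML = '<pre>' + str + '</pre>';
}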
<html>
<title>Test Page for Text extraction</title>

<head>I hope this works</head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>

<body>
  <p>Test on this content to change the 5th word to a link
    <p>
      <button onclick="myFunction()">Try it</button>
</body>
</html>
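If you would rather not delete anything from the live page, the same idea can be done without touching the DOM: walk the tree and simply skip the elements you don't want. This is only a rough sketch, not a drop-in replacement. The names collectText and SKIPPED_TAGS are made up for illustration, and the list of skipped tags is an assumption you may want to adjust.

```javascript
// Hypothetical helper: tag names whose text is not meant for readers.
// The exact list is an assumption; extend it as needed.
const SKIPPED_TAGS = new Set(["SCRIPT", "STYLE", "NOSCRIPT"]);

// Recursively concatenates the text of `node`, skipping SKIPPED_TAGS.
// It only reads nodeType, nodeName, nodeValue and childNodes,
// so it never modifies the page.
function collectText(node) {
  // nodeType 3 is a text node: return its text.
  if (node.nodeType === 3) {
    return node.nodeValue || "";
  }
  // nodeType 1 is an element: ignore it entirely if it is on the skip list.
  if (node.nodeType === 1 && SKIPPED_TAGS.has(String(node.nodeName).toUpperCase())) {
    return "";
  }
  // Any other node: concatenate whatever its children yield.
  let out = "";
  const children = node.childNodes || [];
  for (let i = 0; i < children.length; i++) {
    out += collectText(children[i]);
  }
  return out;
}
```

In a browser you would call it as collectText(document.body). Unlike the version above, the script elements stay in the document afterwards.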

Try document.body.innerText.

This MDN article describes the differences between textContent and innerText:

Don't get confused by the differences between Node.textContent and HTMLElement.innerText. Although the names seem similar, there are important differences:

  • textContent gets the content of all elements, including <script> and <style> elements. In contrast, innerText only shows "human-readable" elements.
  • textContent returns every element in the node. In contrast, innerText is aware of styling and won't return the text of "hidden" elements.
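Applied to the question, that could look roughly like the sketch below. The helper name readableText is made up for illustration. The fallback to textContent is an assumption for nodes or environments where innerText is not available as a string; in that case script and style text will still be included.

```javascript
// Hypothetical helper: prefer innerText, which is rendering-aware,
// and fall back to textContent when innerText is not a string.
function readableText(el) {
  if (typeof el.innerText === "string") {
    return el.innerText;
  }
  // textContent can be null for some node types, hence the default.
  return el.textContent || "";
}

// Browser-only usage, mirroring myFunction from the question.
function myFunction() {
  alert(readableText(document.body));
}
```

Keep in mind that innerText depends on how the element is rendered, so the result can change with the page's CSS.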
