How to get the static, original HTML source via JavaScript?

While developing a tool (whose details I don't think matter for this question, since I was able to reduce the issue to the MCVEs below), I noticed that, at least in the Chrome and Firefox versions on my desktop, the string I get from the innerHTML property is not equal to the original source code I wrote statically in the HTML file.

console.log(document.querySelector("div").innerHTML);
/*
  <table>
    <tbody><tr>
      <td>Hello</td>
      <td>World</td>
    </tr>
  </tbody></table>
*/

<div>
  <table>
    <tr>
      <td>Hello</td>
      <td>World</td>
    </tr>
  </table>
</div>

As you may have noticed, a spontaneous <tbody> tag (which I have not added to my HTML source!) showed up, apparently due to preprocessing at some point between the page download and the page onload event. In this particular case, for my application's purposes, this modification doesn't cause an error and could thus be ignored.

It turns out that, in certain cases, this sort of alteration can be catastrophic, especially when all the markup is removed, as in the example below.

console.log(document.querySelector("div").innerHTML);
/*
  Hello
  World
*/

<div>
  <td>Hello</td>
  <td>World</td>
</div>

Obviously, in this case the original markup has issues, but in my application, "misuses" (like a <td> inside a <div>) are accepted. What is not accepted is the innerHTML being left with no HTML markup at all, which leads to the main question: how can I get the original, statically coded HTML markup for the <div> element?

Also, if possible, it would be nice to know why and how this phenomenon occurs, because I'm curious :D

Asked Nov 26, 2014 at 19:28 by Rui Pimentel; edited May 23, 2017 at 12:01

You may want to look here, though it may not be the answer to your question: stackoverflow.com/questions/938083/… – deepakborania Commented Nov 26, 2014 at 19:34
The link is related but still doesn't solve my problem... Anyways, this was informative :D thank you! – Rui Pimentel Commented Nov 26, 2014 at 19:41
The attempted misuse of td as a child of div does not work. You cannot style the td elements or access them in a script, simply because they do not exist: the <td> and </td> tags are just ignored. – Jukka K. Korpela Commented Nov 26, 2014 at 19:44
You ask a valid question, but I wonder if you have a valid use case where this could actually cause a problem in your code. When you do, I expect the fix will be simple and obvious -- but trying to anticipate all the possible problems with a flexible syntax prototyping tool will be neither simple nor obvious, and likely a huge waste of time. As Jukka pointed out, your second example is not exactly a valid use case. – wwwmarty Commented Nov 26, 2014 at 19:51
Yes... you're both right, it's not valid in vanilla HTML. But what my tool does is fill the gap left by the missing <table> and <tr> tags, in this example, by inserting those <td>'s at runtime in an attempt to reduce the markup complexity. It's an internal solution for page prototyping, and it is already working really well, actually, but I want to improve it by removing this barrier. – Rui Pimentel Commented Nov 26, 2014 at 19:53


2 Answers


The browser downloads the HTML source and parses it into a DOM (document object model). Any issues are fixed as well as possible, and elements whose tags can be omitted in the source (such as <tbody>) may be added in the DOM.

From that moment on, this in-memory structure is used to render the page, and it is also the structure you work with in JavaScript. So if you request the innerHTML of an element, you get HTML that is serialized from the DOM, not a copy of what was downloaded. The original source is not available at all in JavaScript.

So, that's the reason why it happens. And also there is not much you can do about it. I think the only workaround is to re-load the entire page using AJAX into a string and get the required piece of source yourself.
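A minimal sketch of that workaround, assuming the page can simply be requested again from the same URL and returns the same markup. The names extractRawInner and fetchRawInner are hypothetical, and the scanner is deliberately naive: it ignores comments, <script> bodies, and ">" characters inside attribute values, so treat it as an illustration rather than a robust parser.

```javascript
// Return the raw text between the first <tagName ...> and its matching
// </tagName> in `source`, counting nested elements with the same tag name.
// Hypothetical helper; returns null when no complete element is found.
function extractRawInner(source, tagName) {
  const tagPattern = new RegExp("<(/?)" + tagName + "\\b[^>]*>", "gi");
  let depth = 0;
  let start = -1;
  let match;
  while ((match = tagPattern.exec(source)) !== null) {
    if (match[1] === "") {
      // Opening tag: remember where the outermost one ends.
      if (depth === 0) start = tagPattern.lastIndex;
      depth += 1;
    } else if (depth > 0) {
      // Closing tag: the outermost element ends when depth returns to 0.
      depth -= 1;
      if (depth === 0) return source.slice(start, match.index);
    }
  }
  return null;
}

// Browser-only usage (assumption: the current URL can be fetched again):
// download the page as plain text and read the unparsed markup from it.
async function fetchRawInner(tagName) {
  const response = await fetch(window.location.href);
  const source = await response.text();
  return extractRawInner(source, tagName);
}
```

In a browser, fetchRawInner("div").then(console.log) would log the <td> tags exactly as they were written in the file, because the text never goes through the HTML parser.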

But a better solution, obviously, would be to remove those "misuses" and make your HTML source valid. If you just need to enclose some information in the page to be used by JavaScript alone, you might choose to embed a script tag that initializes a couple of variables with those values, rather than generating some invalid HTML.

I've tried to do something like this at work before. In some of my solutions I've built a full table, with table rows around the table data elements that I want to use, just so I can use the <td> elements. If you want to do a little more processing on the JavaScript side of things, you could potentially do something like this:

<div>
    <div class="td">Hello</div>
    <div class="td">World</div>
</div>

And then you could process this with JavaScript to turn the div.td elements into actual <td> elements. Just an idea.
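One way this idea might be sketched, assuming each placeholder holds plain text and all of them belong in a single row. The function names are hypothetical, and any markup nested inside a placeholder would be flattened to text by this version.

```javascript
// Escape text so it can be placed safely inside table cell markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a one-row table from an array of cell texts (pure, no DOM needed).
function buildRowTable(cellTexts) {
  const cells = cellTexts
    .map((text) => "<td>" + escapeHtml(text) + "</td>")
    .join("");
  return "<table><tbody><tr>" + cells + "</tr></tbody></table>";
}

// Browser-only: replace the div.td placeholders inside `container`
// with a real table built from their text content.
function promotePlaceholders(container) {
  const texts = Array.from(
    container.querySelectorAll("div.td"),
    (el) => el.textContent
  );
  container.innerHTML = buildRowTable(texts);
}
```

In a browser, promotePlaceholders(document.querySelector("div")) would swap the two placeholders above for a table with "Hello" and "World" cells. Because the placeholders are valid children of a <div>, the parser keeps them, which is what makes this approach work where bare <td> tags do not.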
