Javascript templating language in reverse - Stack Overflow

Is there anything like a templating engine (a la Mustache.js) that can do templating in "reverse"?

That means that I provide rendered html and a template file, run it through the engine, and get data from it (say a JSON structure).

I realize this is the sort of thing that could be done with a "screen scraping library", but I have never seen a screen-scraping library that uses mustache-style templates (whatever those are called).

Asked Aug 11, 2013 at 4:20 by themirror. Comments:
  • If you use Ruby, Nokogiri solves that problem well, though it's not exactly what you're describing – Jonah, Aug 11, 2013 at 5:01
  • There is also an older Python library called Scrapemark which does essentially this. – Troy, Aug 11, 2013 at 5:09
  • aka: how to reduce/compress many similar html pages to json. scrapemark = github.com/arshaw/scrapemark – milahu, Nov 13, 2022 at 10:39
  • generic problem: tree compression of many similar trees – milahu, Nov 13, 2022 at 10:52

2 Answers

A generic solution doesn't exist. E.g. you can never reverse the following template: {{foo}}{{bar}}, since it is impossible to find where the first mustache stops and the second one starts.

For example:

html: 'hello world!'
template: '{{foo}}{{bar}}'
model: {
    foo: '',
    bar: 'hello world!'
}
model2: {
    foo: 'hello world!',
    bar: ''
}

model and model2 both render the exact same html from the template, so they are both valid reverses.
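
The ambiguity can be checked mechanically. The sketch below is only an illustration: `render` is a hypothetical, minimal stand-in for a template renderer (it is not Mustache.js), and `allReverses` simply enumerates every way to split the html between foo and bar.

```javascript
// Minimal stand-in for a template renderer: replaces {{name}} with model[name].
// Illustration only; this is not Mustache.js.
function render(template, model) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) => String(model[name] ?? ''));
}

// Enumerate every model that could have produced `html` from '{{foo}}{{bar}}':
// one candidate per possible split point between foo and bar.
function allReverses(html) {
  const models = [];
  for (let i = 0; i <= html.length; i++) {
    models.push({ foo: html.slice(0, i), bar: html.slice(i) });
  }
  return models;
}

const html = 'hello world!';
const candidates = allReverses(html);
// Every candidate renders back to exactly the same html.
const allMatch = candidates.every((m) => render('{{foo}}{{bar}}', m) === html);
console.log(candidates.length, allMatch);
```

For a 12-character string there are 13 split points, so 13 different models are all equally valid reverses.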

But if you make some rules for the templates, it is possible to do this without ambiguities.

Rules:

  1. Two mustaches can never touch (explained above).
  2. The beginning of the content of a mustache can never be the same as the first text part after the mustache (or we cannot find the end of the mustache).
  3. The first text part in a section cannot be the same as the first text part after the section (or we cannot find the end of the section).
  4. It is better to not use the richtext {{{}}} mustache (it is allowed to contain anything, so reverse matching means it can match the rest of the document).
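
Rule 1 is the easiest of these to check automatically. The following is a rough sketch, not part of any existing library: `findTouchingMustaches` is a hypothetical helper that only understands the {{}}, {{#}} and {{/}} forms and reports pairs of variable mustaches with no literal text between them.

```javascript
// Hypothetical lint for rule 1: report variable mustaches that directly touch.
// Only {{name}}, {{#name}} and {{/name}} are recognised in this sketch.
function findTouchingMustaches(template) {
  const tagPattern = /\{\{([#\/]?)\s*([\w.]+)\s*\}\}/g;
  const problems = [];
  let previous = null;
  for (const match of template.matchAll(tagPattern)) {
    const tag = {
      kind: match[1], // '' for a variable, '#' or '/' for section tags
      name: match[2],
      start: match.index,
      end: match.index + match[0].length,
    };
    // Two variables touch when the second starts exactly where the first ends.
    if (previous && previous.kind === '' && tag.kind === '' && previous.end === tag.start) {
      problems.push([previous.name, tag.name]);
    }
    previous = tag;
  }
  return problems;
}

console.log(findTouchingMustaches('{{foo}}{{bar}}'));
console.log(findTouchingMustaches('<p>{{title}}</p>'));
```

The first call reports the pair foo/bar; the second reports nothing, because literal text separates the mustache from everything around it.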

These rules seem very restrictive for plaintext, but for xml and html they work pretty well (if you are only interested in element and attribute contents). For instance, rule 2 is never a problem if you only use plaintext {{}} mustaches: their output is HTML-escaped, so it can never contain the < or " that starts the markup following the mustache.

For instance, the following template can be reversed without any ambiguity:

<div>
    <p>{{title}}</p>
    <ul>
        {{#list}}
            <li>{{item}}</li>
        {{/list}}
    </ul>
</div>

But adding another <li> just before the </ul> will make the template ambiguous (rule 3).
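
To make this concrete, here is a minimal sketch of how the template above could be reversed under the four rules. It is not the library mentioned in this answer: `squeeze`, `parse`, `matchNodes` and `reverse` are hypothetical names, whitespace between tags is assumed to be insignificant, values are returned still HTML-escaped, and sections are matched greedily without backtracking.

```javascript
// Drop whitespace between tags so template and html line up (an assumption).
const squeeze = (s) => s.trim().replace(/(>|\}\})\s+(?=<|\{\{)/g, '$1');

// Parse into a tree of text, var and section nodes.
// Closing tags are not checked against the name of the section they close.
function parse(template) {
  const root = { children: [] };
  const stack = [root];
  for (const part of template.split(/(\{\{[#\/]?\w+\}\})/)) {
    if (part === '') continue;
    const tag = /^\{\{([#\/]?)(\w+)\}\}$/.exec(part);
    const top = stack[stack.length - 1];
    if (!tag) {
      top.children.push({ type: 'text', value: part });
    } else if (tag[1] === '#') {
      const section = { type: 'section', name: tag[2], children: [] };
      top.children.push(section);
      stack.push(section);
    } else if (tag[1] === '/') {
      stack.pop();
    } else {
      top.children.push({ type: 'var', name: tag[2] });
    }
  }
  return root.children;
}

// Match nodes against html from pos, filling model.
// Returns the new position, or -1 if the nodes do not match.
function matchNodes(nodes, html, pos, model) {
  for (let i = 0; i < nodes.length; i++) {
    const node = nodes[i];
    if (node.type === 'text') {
      if (!html.startsWith(node.value, pos)) return -1;
      pos += node.value.length;
    } else if (node.type === 'var') {
      // Rules 1 and 2: the value ends where the following literal text begins.
      const next = nodes[i + 1];
      if (!next || next.type !== 'text') throw new Error('variable must be followed by literal text');
      const end = html.indexOf(next.value, pos);
      if (end === -1) return -1;
      model[node.name] = html.slice(pos, end);
      pos = end;
    } else {
      // Rule 3: repeat the section body until it stops matching.
      const items = [];
      for (;;) {
        const item = {};
        const after = matchNodes(node.children, html, pos, item);
        if (after === -1 || after === pos) break;
        items.push(item);
        pos = after;
      }
      model[node.name] = items;
    }
  }
  return pos;
}

// Returns the extracted model, or null if the html does not fit the template.
function reverse(template, html) {
  const model = {};
  const input = squeeze(html);
  const end = matchNodes(parse(squeeze(template)), input, 0, model);
  return end === input.length ? model : null;
}

const template = `
<div>
    <p>{{title}}</p>
    <ul>
        {{#list}}
            <li>{{item}}</li>
        {{/list}}
    </ul>
</div>`;
const html = '<div><p>Groceries</p><ul><li>milk</li><li>eggs</li></ul></div>';
console.log(JSON.stringify(reverse(template, html)));
```

With a literal `<li>` inserted just before the `</ul>`, this greedy sketch swallows that element as one more list item, can no longer match the rest of the template, and returns null. That is the rule 3 problem in practice.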

I haven't found any code online that does this, so I have started writing a library for it. It is far from finished, though, and every time I work on it I find new limitations. It only works well for really simple templates (the only mustaches I allow are {{}}, {{#}} and {{/}}).


I found a solution using another templating system: https://github.com/fabiomcosta/mootools-meio-template/tree/master. It seems to have the same limitations.

Parseur is a reverse template engine. It also includes a template editor for creating those "reverse" templates. Of course, Parseur cannot work miracles (see @blerik's answer), but it can repeatably extract data from similar documents.

One nice feature is that you can add more templates and it will check all of them in parallel and pick the one that can extract the most fields.
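
The selection idea itself is easy to sketch. The code below is not Parseur's API or implementation: it uses regular expressions with named groups as hypothetical stand-in templates, tries them one after another rather than in parallel, and keeps the result with the most extracted fields.

```javascript
// Hypothetical stand-in templates: regular expressions with named groups.
const templates = [
  /^Order (?<orderId>\d+)/,
  /^Order (?<orderId>\d+) for (?<customer>.+) on (?<date>\d{4}-\d{2}-\d{2})$/,
];

// Try every template and keep the match that extracts the most fields.
// Returns null when no template matches at all.
function extractBest(text, candidates) {
  let best = null;
  for (const pattern of candidates) {
    const match = pattern.exec(text);
    if (!match) continue;
    // Ignore groups that did not take part in the match.
    const fields = Object.fromEntries(
      Object.entries(match.groups ?? {}).filter(([, value]) => value !== undefined)
    );
    if (!best || Object.keys(fields).length > Object.keys(best).length) {
      best = fields;
    }
  }
  return best;
}

console.log(JSON.stringify(extractBest('Order 42 for Ada on 2013-08-11', templates)));
```

Both stand-in templates match this input, but the second one extracts three fields instead of one, so its result is kept.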

Output is in JSON and fields can be optionally formatted as number, date, address, nested or even in tabular format.
