regex - Javascript: find URLs in a document - Stack Overflow

How do I find URLs (i.e. www.domain.com) within a document, and put those within anchors: <a href="www.domain.com">www.domain.com</a>

html:

Hey dude, check out this link www.google.com and www.yahoo.com!

javascript:

(function(){var text = document.body.innerHTML;/*do replace regex => text*/})();

output:

Hey dude, check out this link <a href="www.google.com">www.google.com</a> and <a href="www.yahoo.com">www.yahoo.com</a>!

asked Apr 14, 2010 at 22:48 by user317005

2 Answers

Firstly, www.domain.com isn't a URL, it's a hostname, and

<a href="www.domain.com">

won't work: it'll look for a .com file called www.domain relative to the current page.

It's not possible to highlight hostnames in the general case because almost anything can be a hostname. You could try to highlight ‘www.something.dot.separated.words’, but it's not really that reliable and there are many sites that don't use the www. hostname prefix. I'd try to avoid that.

/\bhttps?:\/\/[^\s<>"`{}|\^\[\]\\]+/;

This is a very liberal pattern you could use as a starting point for detecting HTTP URLs. Depending on what sort of input you've got, you may want to narrow down what it allows, and it may be worth detecting trailing characters like . or ! that would be valid parts of the URL but in practice generally aren't.
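As a rough sketch of the trailing-character idea, the snippet below matches with the liberal pattern and then strips punctuation from the end of each match. The function name findUrls and the particular set of characters in trailingPunctuation are assumptions made for illustration, not part of the original answer; adjust the set to suit your input.

```javascript
// Liberal URL pattern from the answer, with the global flag so matchAll works.
const urlPattern = /\bhttps?:\/\/[^\s<>"`{}|\^\[\]\\]+/g;

// Hypothetical choice of characters to treat as sentence punctuation
// when they end a match; this set is an assumption, not a standard.
const trailingPunctuation = /[.,!?;:')]+$/;

// Return each URL found in `text` with trailing punctuation removed,
// together with the index where the match starts.
function findUrls(text) {
    const results = [];
    for (const match of text.matchAll(urlPattern)) {
        const url = match[0].replace(trailingPunctuation, '');
        results.push({ index: match.index, url: url });
    }
    return results;
}
```

The trade-off is that a URL which really does end in one of these characters will be cut short, so this is only reasonable for prose-like input.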

(You could use a | to allow either the URL syntax or the www.hostname syntax, if you like.)
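One possible shape for that alternation is sketched below. The www branch is a deliberately narrow assumption (letters, digits and hyphens separated by dots), and the names linkPattern, toHref and findLinks are made up for this example. Since a bare hostname is not a URL, the sketch adds an http:// scheme before the text is used as an href.

```javascript
// Either an explicit http(s) URL, or a bare hostname starting with "www.".
// The www branch is an illustrative assumption and will miss many hostnames.
const linkPattern = /\bhttps?:\/\/[^\s<>"`{}|\^\[\]\\]+|\bwww\.[a-z0-9-]+(?:\.[a-z0-9-]+)+/gi;

// A bare hostname needs a scheme before it can be used as an absolute href.
function toHref(matchedText) {
    return /^https?:\/\//i.test(matchedText) ? matchedText : 'http://' + matchedText;
}

// Return the matched text and the href to use for each link found in `text`.
function findLinks(text) {
    return Array.from(text.matchAll(linkPattern), (m) => ({
        text: m[0],
        href: toHref(m[0]),
    }));
}
```

Because the www branch requires at least one character after each dot, a sentence-ending dot after a hostname is left out of the match.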

Anyhow, once you've settled on your preferred pattern you'll need to find that pattern in text nodes on the page. Don't run the regexp over innerHTML markup. You'll end up completely ruining the page by trying to mark up every href="http://something" that's already inside markup. You'll also destroy any existing JavaScript references, events or form field values when you replace the innerHTML content.
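To make the failure concrete, here is a small illustration that runs the pattern over a markup string, the way a replace on innerHTML would. The function name naiveLinkify and the sample markup are invented for this example.

```javascript
const urlPattern = /\bhttps?:\/\/[^\s<>"`{}|\^\[\]\\]+/g;

// Naive approach: treat serialized markup as if it were plain text.
function naiveLinkify(html) {
    return html.replace(urlPattern, (url) => '<a href="' + url + '">' + url + '</a>');
}

const markup = '<a href="http://example.com/">home</a>';
const broken = naiveLinkify(markup);
// The URL inside the existing href attribute gets wrapped in a second anchor,
// so the result is no longer sensible markup:
// <a href="<a href="http://example.com/">http://example.com/</a>">home</a>
```

The regexp has no way to know that the match sits inside an attribute value, which is why working on text nodes is the safer route.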

In general regexp simply cannot process HTML in any reliable way. So take advantage of the fact that the browser has already parsed the HTML into elements and text nodes, and just look at the text nodes. You'll also want to avoid looking inside <a> elements, since marking up a URL as a link when it's already in a link is silly (and invalid).

// Mark up `http://...` text in an element and its descendants as links.
//
function addLinks(element) {
    var urlpattern= /\bhttps?:\/\/[^\s<>"`{}|\^\[\]\\]+/g;
    findTextExceptInLinks(element, urlpattern, function(node, match) {
        node.splitText(match.index+match[0].length);
        var a= document.createElement('a');
        a.href= match[0];
        a.appendChild(node.splitText(match.index));
        node.parentNode.insertBefore(a, node.nextSibling);
    });
}

// Find text in descendants of an element, in reverse document order.
// pattern must be a regexp with the global flag set.
//
function findTextExceptInLinks(element, pattern, callback) {
    for (var childi= element.childNodes.length; childi-->0;) {
        var child= element.childNodes[childi];
        if (child.nodeType===Node.ELEMENT_NODE) {
            if (child.tagName.toLowerCase()!=='a')
                findTextExceptInLinks(child, pattern, callback);
        } else if (child.nodeType===Node.TEXT_NODE) {
            var matches= [];
            var match;
            while (match= pattern.exec(child.data))
                matches.push(match);
            for (var i= matches.length; i-->0;)
                callback.call(window, child, matches[i]);
        }
    }
}
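For checking a pattern without a DOM, the splitting that addLinks performs on a single text node can be modelled roughly on a plain string. This is only a sketch: splitIntoSegments is a hypothetical helper, it walks matches forwards instead of in reverse, and it drops empty pieces that the DOM version may leave behind as empty text nodes.

```javascript
// Split plain text into alternating text and link segments, which is
// roughly the shape addLinks produces inside a single text node.
// `pattern` must have the global flag, otherwise matchAll throws.
function splitIntoSegments(text, pattern) {
    const segments = [];
    let last = 0;
    for (const match of text.matchAll(pattern)) {
        // Plain text between the previous match and this one.
        if (match.index > last)
            segments.push({ type: 'text', value: text.slice(last, match.index) });
        segments.push({ type: 'link', value: match[0] });
        last = match.index + match[0].length;
    }
    // Remaining plain text after the final match.
    if (last < text.length)
        segments.push({ type: 'text', value: text.slice(last) });
    return segments;
}
```

In a browser, the DOM version above would typically be invoked as addLinks(document.body) once the page has loaded.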

I've never used it, but this looks like a decent bit of code to leverage:

http://github.com/cowboy/javascript-linkify
