javascript - Regex for removing a ** comment in js - Stack Overflow


How do I remove something like this from a string using regex in JavaScript?

/*
    multi-line comment
*/

This is what I have tried:

var regex = /(\/\*(^\*\/)*\*\/)/g;
string = string.replace(regex, '');

asked Mar 28, 2014 at 20:39 by user2981107; edited Mar 28, 2014 at 20:41 by Sampson
  • Wrong tool, use esprima and escodegen. JavaScript is not regular. – Benjamin Gruenbaum Commented Mar 28, 2014 at 20:42
  • I'd say that looks suspiciously like C-style comments. If not, there are no rules, and it's simply /\/\*.*\*\// for greedy, /\/\*.*?\*\// for non-greedy. – user557597 Commented Mar 28, 2014 at 20:53
  • Regular expressions cannot correctly remove multi-line comments from JavaScript. You can come close, but there will be edge cases that will break. – zzzzBov Commented Mar 28, 2014 at 20:57
  • But, if C-style rules apply, comments and quotes dance around to be first. – user557597 Commented Mar 28, 2014 at 20:57

3 Answers

If you'd like to match /* followed by any amount of text that does not contain */, followed by */, then you can use a regular expression; however, it will not correctly remove block comments from JavaScript.

A simple example of where the pattern I described fails is:

var a = '/*'; /* block comment */

Note that the first /* will be matched even though it's contained in a string. If you can guarantee that the content you are searching within does not contain such inconsistencies, or are just using the regular expression to find places to make manual changes, then you should be reasonably safe. Otherwise, don't use regular expressions because they're the wrong tool for the job in this case; you have been warned.
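
The failure is easy to reproduce. The snippet below is a minimal sketch that assumes the pattern is written as /\/\*[\s\S]*?\*\//g (the form this answer arrives at, plus the g flag); the variable names are only for illustration.

```javascript
// Assumed pattern: "/*", then the shortest run of any characters, then "*/".
var blockComment = /\/\*[\s\S]*?\*\//g;

var source = "var a = '/*'; /* block comment */";
var stripped = source.replace(blockComment, '');

// The match starts at the "/*" inside the string literal and runs to the
// end of the real comment, so the statement is left truncated.
console.log(stripped); // var a = '
```

The result is no longer valid JavaScript, which is why the pattern is only safe on input known not to contain comment delimiters inside literals.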

To build the regular expression, you just have to break down my first sentence into its composite parts.

  • / starts the regular expression literal
  • \/\* matches the literal characters /*
  • [\s\S]*? matches any character in a non-greedy manner
  • \*\/ matches the literal characters */
  • / ends the regular expression

All together you end up with:

/\/\*[\s\S]*?\*\//
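
Plugged into the replace call from the question, it might look like the sketch below. The helper name stripBlockComments is hypothetical, and the g flag is added on the assumption that every block comment should be removed, not just the first one.

```javascript
// Hypothetical helper: removes every /* ... */ block from a string.
// It has the limitations described above (it does not understand literals).
function stripBlockComments(text) {
    return text.replace(/\/\*[\s\S]*?\*\//g, '');
}

var input = "/*\n    multi-line comment\n*/\nvar x = 1;";
var output = stripBlockComments(input);

console.log(JSON.stringify(output)); // "\nvar x = 1;"
```

[\s\S] is used instead of . because . does not match line terminators by default, so a comment spanning several lines would otherwise not match.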

The non-greedy matching is necessary to prevent a comment terminator (*/) from being swallowed into the match when multiple block comments are in a file:

/* foo */
var foo = 'bar';
/* fizz */
var fizz = 'buzz';

With the non-greedy matching,

/* foo */

and

/* fizz */

would be matched. Without the non-greedy matching,

/* foo */
var foo = 'bar';
/* fizz */

would be matched.
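
The difference can be checked directly with String.prototype.match. This is a small sketch using the sample file above as a string; the variable names are illustrative.

```javascript
var source = "/* foo */\nvar foo = 'bar';\n/* fizz */\nvar fizz = 'buzz';";

// Non-greedy: each comment is matched on its own.
var lazy = source.match(/\/\*[\s\S]*?\*\//g);

// Greedy: one match stretching from the first "/*" to the last "*/".
var greedy = source.match(/\/\*[\s\S]*\*\//g);

console.log(lazy);   // [ '/* foo */', '/* fizz */' ]
console.log(greedy); // [ "/* foo */\nvar foo = 'bar';\n/* fizz */" ]
```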

All the answers that use a regular expression completely fail here for several cases:

var myString = '/*Hello World!*/'; // inside a string
var a = "/*b", c = /.*/g; // inside a string partially, and inside a regex literal

// /*
alert("This will not fire with the regular expressions, but works in JS");
// */
var/**/b = 5; // perfectly valid, replacing a comment with nothing is simply incorrect

Those are some of the more obvious ones. Regular expressions are simply not powerful enough to strip comments correctly; whatever does the stripping needs to be aware of the language syntax.
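
Two of these cases can be reproduced in a few lines. This is a sketch that assumes the regex-based answers boil down to the pattern /\/\*[\s\S]*?\*\//g; the variable names are made up for the example.

```javascript
var blockComment = /\/\*[\s\S]*?\*\//g;

// Removing the comment outright glues two tokens together.
var joined = "var/**/b = 5;".replace(blockComment, '');

// "/*" and "*/" sit inside line comments here, yet the regex treats them
// as a block comment and deletes the live alert() call between them.
var lineCommented = "// /*\nalert('hi');\n// */".replace(blockComment, '');

console.log(joined);        // varb = 5;
console.log(lineCommented); // only "// " is left
```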

So, a regular expression fails, what's left? A parser. Is it hard? Not really.

Let's look at the JavaScript syntax ourselves! The section on comments states:

MultiLineComment ::
    /* MultiLineCommentChars(opt) */

That's good: it means that once we're inside a multi-line comment, we do not exit it until we reach */, and then we exit it immediately.

But when can comments appear? Pretty much anywhere outside of literals. Of the five kinds of literal we have, multi-line comment tokens can only appear inside string literals and regexp literals.

function parse(code){
    // state
    var isInRegExp = false;
    var isInString = false;
    var terminator = null; // to hold the string terminator
    var escape = false; // last char was an escape
    var isInComment = false;

    var c = code.split(""); // code

    var o = []; // output
    for(var i = 0; i < c.length; i++){
        if(isInString) {  // handle string literal case
             o.push(c[i]); // everything inside a string is kept, backslashes included
             if(escape){ // this char was escaped, so it cannot end the string
                  escape = false;
             } else if (c[i] === "\\") { // escape
                  escape = true;
             } else if(c[i] === terminator){
                  isInString = false;
             }
        } else if(isInRegExp) { // regular expression case
             o.push(c[i]); // everything inside a regexp literal is kept
             if(escape){ // this char was escaped, so it cannot end the regexp
                 escape = false;
             } else if (c[i] === "\\") {
                 escape = true;
             } else if(c[i] === "/"){
                 isInRegExp = false;
             }
        } else if (isInComment) { // comment case
              if(c[i] === "*" && c[i+1] === "/"){
                  isInComment = false;
                  i++;
                  // Note - not pushing comments to output
              }
        } else {   // not in a literal
              if(c[i] === "/" && c[i+1] === "/") { // single line comment
                   while(c[i] !== "\n" && c[i] !== undefined){ //end or new line
                       i++;
                   }
                   i--; // step back so the newline itself is kept in the output
              } else if(c[i] === "/" && c[i+1] === "*"){ // start comment
                    isInComment = true;
                    o.push(" "); // add a space so the surrounding tokens stay separated
                    i++; // don't catch /*/
              } else if(c[i] === "/"){ // start regexp literal
                    // Note - a lone / is always treated as a regexp start here;
                    // the division operator is not distinguished
                    isInRegExp = true;
                    o.push(c[i]);
              } else if(c[i] === "'" || c[i] === '"'){ // string literal
                    isInString = true;
                    o.push(c[i]);
                    terminator = c[i]; // remember which quote closes the string
              } else { // plain ol' code
                    o.push(c[i]);
              }
        }
    }
    return o.join("");
}

I just wrote this in the console, it's long - but can you see how simple it is? It's really simple in concept - it just keeps track of where in the code it is and based on that consumes the word.

Let's try it:

parse("var a = 'hello world'"); // var a = 'hello world' 
parse("var/**/a = 'hello world'"); // var a = 'hello world' 
parse("var myString = '/*Hello World!*/';"); // var myString = '/*Hello World!*/';
parse('var a = "/*b", c = /.*/g;'); // var a = "/*b", c = /.*/g;
parse("var a; /* remove me please! */"); // var a;
parse("var x = /* \n \n Hello World Multiline String \n \n */ 5"); // var x =   5 

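The following is intended to remove comments and also extra whitespace from JavaScript. Note that, as written, the pattern only matches when the whole string is a single run of non-whitespace characters with optional whitespace around it, so it does not strip /* ... */ blocks.

var regex = /^(\s*[^\s]*\s*)$/g;
string = string.replace(regex, '');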

Hope this will help...
