javascript - Regex match every string inside double quotes and include escaped quotation marks - Stack Overflow

There are quite a few similar questions already but none of them works in my case. I have a string that contains multiple substrings inside double quotes and these substrings can contain escaped double quotes.

For example, for the string 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.', the expected result is an array with two elements:

  • "this is some sample text with quotes and \"escaped quotes\" inside"
  • "here is \"another\" one"

The /"(?:\\"|[^"])*"/g regex works as expected on regex101; however, when I use String#match() the result is different. Check out the snippet below:

let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.'
let regex = /"(?:\\"|[^"])*"/g

console.log(str.match(regex))

Instead of two matches, I got four, and the text inside the escaped quotes is not even included.

MDN mentions that if the g flag is used, all results matching the complete regular expression will be returned, but capturing groups will not. If I want to obtain capture groups and the global flag is set, I need to use RegExp.exec(). I've tried it, and the result is the same:

let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.'
let regex = /"(?:\\"|[^"])*"/g
let temp
let matches = []

while (temp = regex.exec(str))
  matches.push(temp[0])

console.log(matches)

How could I get an array with those two matched elements?

asked Jul 28, 2021 at 22:43 by Zsolt Meszaros

2 Answers

Another option is a more efficient regex without the | operator:

const str = String.raw`And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.`
const regex = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g
console.log(str.match(regex))

With String.raw, there is no need to double the backslashes in front of the quotes.

See regex proof. Btw, 28 steps vs. 267 steps.

EXPLANATION

--------------------------------------------------------------------------------
  "                        '"'
--------------------------------------------------------------------------------
  [^"\\]*                  any character except: '"', '\\' (0 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \\                       '\'
--------------------------------------------------------------------------------
    [\s\S]                   any character of: whitespace (\n, \r,
                             \t, \f, and " "), non-whitespace (all
                             but \n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    [^"\\]*                  any character except: '"', '\\' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  "                        '"'
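One difference between the two patterns that the breakdown above hints at: because the group consumes a backslash together with whatever character follows it, an escaped backslash right before the closing quote is handled as well. The snippet below is only a sketch of that behaviour; the `tricky` input and the variable names are made up for illustration and are not from the question.

```javascript
// Pattern from this answer (backslash + any character is consumed as a pair).
const unrolled = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
// Pattern from the question (only \" is treated as an escape).
const alternation = /"(?:\\"|[^"])*"/g;

// Hypothetical input: the first quoted substring ends with an escaped
// backslash (\\), so the quote right after it really is the closing quote.
const tricky = String.raw`say "ends with a backslash \\" then "second"`;

const unrolledMatches = tricky.match(unrolled);
const alternationMatches = tricky.match(alternation);

console.log(unrolledMatches);    // two separate quoted substrings
console.log(alternationMatches); // one match that runs on into the second substring
```

If the input can never contain `\\` inside quotes, both patterns return the same matches and the difference is only in the number of steps.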

The regex doesn't work as expected because a single backslash is an escape character in a JavaScript string literal, so \" ends up in the string as a plain ". You'll need to escape the backslashes in the text:

let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.';
let regex = /"(?:\\"|[^"])*"/g

console.log(str);
console.log(str.match(regex))

str = 'And then, "this is some sample text with quotes and \\"escaped quotes\\" inside". Not that we need more, but... "here is \\"another\\" one". Just in case.';

console.log(str);
console.log(str.match(regex))
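A quick way to check what actually ends up in the string is to compare the different ways of writing the literal. This is a minimal sketch on a shortened sample; the names `plain`, `doubled` and `raw` are only illustrative.

```javascript
// The same short sample written three ways.
const plain = 'say "a \"b\" c"';         // \" collapses to " : no backslash is stored
const doubled = 'say "a \\"b\\" c"';     // \\ stores one real backslash
const raw = String.raw`say "a \"b\" c"`; // String.raw keeps the backslashes as typed

console.log(plain.includes('\\')); // false
console.log(doubled === raw);      // true

// The regex from the question, applied to both variants.
const regex = /"(?:\\"|[^"])*"/g;
console.log(plain.match(regex)); // two fragments: '"a "' and '" c"'
console.log(raw.match(regex));   // one match containing the escaped quotes
```

So the regex itself is fine; it just never sees a backslash when the text is written as an ordinary single-quoted literal.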

Posted by: admin. If reposting, please credit the source: http://www.yc00.com/questions/1745113527a4611988.html