
Regex Split String On Specific Chars Outside Quotes

How can this line be split while preserving quoted strings >div#a.more.style.ui[url='in.tray']{value} where the chars for the split are > # . [ { to yield: >div #a .more .

Solution 1:

Don't use split(); match the tokens instead, and it becomes easy:

result = subject.match(/[>#.[{](?:"[^"]*"|[^">#.[{])+/g);

See it live on regex101.com.

Explanation:

[>#.[{]     # Match a "splitting" character
(?:         # Start of group to match either...
 "[^"]*"    # a quoted string
|           # or
 [^">#.[{]  # any character except quotes and "splitting" characters
)+          # Repeat at least once.
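To make the explanation concrete, here is a small sketch that runs the pattern on the sample input (with the attribute value in double quotes, as in the other answers). The second pattern, `patternBoth`, is not part of the original answer; it is an assumed extension that also treats single-quoted strings as quoted, since the question's own line uses single quotes.

```javascript
// Sample input, with the attribute value in double quotes.
var subject = '>div#a.more.style.ui[url="in.tray"]{value}';

// One delimiter, then a run of quoted strings or non-delimiter, non-quote characters.
var pattern = /[>#.[{](?:"[^"]*"|[^">#.[{])+/g;

// match() returns null when nothing matches, so fall back to an empty array.
var result = subject.match(pattern) || [];
console.log(result);

// Assumed extension (not in the original answer): accept single quotes too.
var patternBoth = /[>#.[{](?:"[^"]*"|'[^']*'|[^"'>#.[{])+/g;
var subjectSingle = ">div#a.more.style.ui[url='in.tray']{value}";
var resultSingle = subjectSingle.match(patternBoth) || [];
console.log(resultSingle);
```

Note that every match has to begin with a splitting character, so any text before the first delimiter (for example the `div` in `div#a`) would not appear in the result.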

Solution 2:

It's hard to come up with a solution using only one regex.

I can propose this:

var s = '>div#a.more.style.ui[url="in.tray"]{value}';
var tokens = s.replace(/("[^"]+"|[^"\s]+)/g, function (v) {
  // Leave quoted chunks alone; put the marker before each separator elsewhere.
  return v.charAt(0) === '"' ? v : v.replace(/([.>#\[{])/g, '@@@$1');
}).split('@@@').filter(Boolean);

(Replace @@@ with a string you know isn't in your input.)

The idea is to

  1. conceptually divide the initial string into chunks outside quotes and chunks inside quotes, the latter keeping their quotes (not a real split)
  2. outside of the quotes, add @@@ before the separator
  3. split on @@@ the joined string
  4. remove the (potential) empty strings using filter
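The four steps above can be sketched as a reusable function. The function name `splitOutsideQuotes` and the way the marker is chosen are illustrative assumptions, not part of the original answer: instead of a fixed @@@, the sketch picks a single character that does not occur in the input, so the marker cannot collide with real content.

```javascript
// Hypothetical helper; the name and the marker selection are illustrative choices.
function splitOutsideQuotes(input) {
  // Pick one character that is absent from the input, starting in the
  // Unicode private use area, so splitting on it is unambiguous.
  var code = 0xE000;
  while (input.indexOf(String.fromCharCode(code)) !== -1) code++;
  var marker = String.fromCharCode(code);

  // Steps 1 and 2: visit quoted and unquoted chunks in turn and put the
  // marker before each separator in the unquoted chunks only.
  var marked = input.replace(/"[^"]*"|[^"]+/g, function (chunk) {
    return chunk.charAt(0) === '"'
      ? chunk
      : chunk.replace(/[.>#\[{]/g, marker + '$&');
  });

  // Steps 3 and 4: split on the marker and drop empty strings.
  return marked.split(marker).filter(Boolean);
}

console.log(splitOutsideQuotes('>div#a.more.style.ui[url="in.tray"]{value}'));
```

Deciding by the chunk's first character, rather than by counting matches, keeps the quoted and unquoted chunks from getting out of step when the input contains whitespace.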

Solution 3:

I do wonder if Regex is really the way to go in this case. I know this was tagged as regex, but I'd like to share a non-Regex solution which simply processes each character:

var string = '>div#a.more.style.ui[url="in.tray"]{value}';
var delims = [ '>', '#', '.', '[', '{' ];
var inQuotes = false;
var parts = [];
var part = string[0]; // Start with first character
for(var i = 1; i < string.length; i++) {
  var character = string[i];

  if(character == '"') inQuotes = !inQuotes;

  if(!inQuotes && delims.indexOf(character) > -1) {
    parts.push(part);
    part = character;
  } else part += character;

  if(i == string.length-1) parts.push(part);
}

console.log(parts);

Output:

[ '>div',
  '#a',
  '.more',
  '.style',
  '.ui',
  '[url="in.tray"]',
  '{value}' ]

The inQuotes toggle will not work for escaped quotes within quotes, e.g. "He said, \"hi there!\"", but it works for simple cases like this one. You could extend it to detect an escaped quote by checking whether the previous character is a backslash while inQuotes is currently true, I suppose, but there are probably better solutions to that.
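One possible way to handle escaped quotes is sketched below. It is a variation on the idea above rather than the exact check described: instead of looking back at the previous character, it consumes a backslash together with the character that follows it while inside quotes, which also copes with an escaped backslash right before a closing quote. The function name `splitWithEscapes` is hypothetical.

```javascript
// Hypothetical helper: the same character loop, plus backslash-escape handling.
function splitWithEscapes(input) {
  var delims = ['>', '#', '.', '[', '{'];
  var parts = [];
  var part = '';
  var inQuotes = false;

  for (var i = 0; i < input.length; i++) {
    var ch = input[i];

    // Inside quotes, copy a backslash and the character after it verbatim,
    // so an escaped quote never toggles inQuotes.
    if (inQuotes && ch === '\\' && i + 1 < input.length) {
      part += ch + input[++i];
      continue;
    }

    if (ch === '"') inQuotes = !inQuotes;

    if (!inQuotes && delims.indexOf(ch) > -1) {
      if (part) parts.push(part); // close the previous part, if any
      part = ch;                  // the delimiter starts the next part
    } else {
      part += ch;
    }
  }

  if (part) parts.push(part);
  return parts;
}

console.log(splitWithEscapes('>p[title="He said, \\"hi.there\\""]{x}'));
```

On the sample line without any escapes, this sketch should produce the same parts as the loop above.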

In terms of readability I think an approach like this is preferred over Regex, though.
