How can I match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()?

I am trying to parse url-encoded strings that are made up of key=value pairs separated by either & or &amp;.

The following will only match the first occurrence, breaking apart the keys and values into separate result elements:

var result = mystring.match(/(?:&|&amp;)?([^=]+)=([^&]+)/)

The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:

['1111342', 'Adam%20Franco']

Using the global flag, 'g', will match all occurrences, but only return the fully matched sub-strings, not the separated keys and values:

var result = mystring.match(/(?:&|&amp;)?([^=]+)=([^&]+)/g)

The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:

['1111342=Adam%20Franco', '&348572=Bob%20Jones']
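A quick way to see both behaviours side by side (a small sketch, runnable in Node or a browser console, using the sample string above). Note that without g, match() also puts the whole matched text at index 0, ahead of the two groups:

```javascript
// Sample input from the question
const mystring = '1111342=Adam%20Franco&348572=Bob%20Jones';

// Without g: only the first match, as [fullMatch, group1, group2]
const first = mystring.match(/(?:&|&amp;)?([^=]+)=([^&]+)/);
console.log(first.slice(0, 3)); // '1111342=Adam%20Franco', '1111342', 'Adam%20Franco'

// With g: every full match, but the capture groups are dropped
const all = mystring.match(/(?:&|&amp;)?([^=]+)=([^&]+)/g);
console.log(all); // '1111342=Adam%20Franco', '&348572=Bob%20Jones'
```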

While I could split the string on & and break apart each key/value pair individually, is there any way using JavaScript's regular expression support to match multiple occurrences of the pattern /(?:&|&amp;)?([^=]+)=([^&]+)/ similar to PHP's preg_match_all() function?

I'm aiming for some way to get results with the sub-matches separated like:

[['1111342', '348572'], ['Adam%20Franco', 'Bob%20Jones']]

or

[['1111342', 'Adam%20Franco'], ['348572', 'Bob%20Jones']]
it's a little odd that no one recommended using replace here. var data = {}; mystring.replace(/(&|&amp;)?([^=]+)=([^&]+)/g, function(a,b,c,d) { data[c] = d; }); done. "matchAll" in JavaScript is "replace" with a replacement handler function instead of a string.
Note that for those still finding this question in 2020, the answer is "don't use regex, use URLSearchParams, which does all of this for you."

Klesun

Hoisted from the comments

2020 comment: rather than using regex, we now have URLSearchParams, which does all of this for us, so no custom code, let alone regex, is necessary anymore. – Mike 'Pomax' Kamermans

Browser support is listed here https://caniuse.com/#feat=urlsearchparams

I would suggest an alternative regex, using sub-groups to capture name and value of the parameters individually and re.exec():

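function getUrlParams(url) {
  // g flag: each exec() call continues from re.lastIndex
  var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
      match, params = {},
      // query strings encode spaces as "+", which decodeURIComponent() leaves alone
      decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};

  // default to the current page's URL (browser only)
  if (typeof url == "undefined") url = document.location.href;

  // exec() returns null once there are no more matches, which ends the loop
  while (match = re.exec(url)) {
    params[decode(match[1])] = decode(match[2]); // group 1 = name, group 2 = value
  }
  return params;
}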

var result = getUrlParams("http://maps.google.de/maps?f=q&source=s_q&hl=de&geocode=&q=Frankfurt+am+Main&sll=50.106047,8.679886&sspn=0.370369,0.833588&ie=UTF8&ll=50.116616,8.680573&spn=0.35972,0.833588&z=11&iwloc=addr");

result is an object:

{
  f: "q",
  geocode: "",
  hl: "de",
  ie: "UTF8",
  iwloc: "addr",
  ll: "50.116616,8.680573",
  q: "Frankfurt am Main",
  sll: "50.106047,8.679886",
  source: "s_q",
  spn: "0.35972,0.833588",
  sspn: "0.370369,0.833588",
  z: "11"
}

The regex breaks down as follows:

(?:            # non-capturing group
  \?|&         #   "?" or "&"
  (?:amp;)?    #   (allow "&amp;", for wrongly HTML-encoded URLs)
)              # end non-capturing group
(              # group 1
  [^=&#]+      #   any character except "=", "&" or "#"; at least once
)              # end group 1 - this will be the parameter's name
(?:            # non-capturing group
  =?           #   an "=", optional
  (            #   group 2
    [^&#]*     #     any character except "&" or "#"; any number of times
  )            #   end group 2 - this will be the parameter's value
)              # end non-capturing group
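One consequence of the first non-capturing group: every name must be preceded by "?" or "&", so a bare query string like the one in the question loses its first pair. A sketch demonstrating this with a Node-friendly copy of getUrlParams() (the url argument is required here, since document.location only exists in a browser):

```javascript
// Copy of getUrlParams() above, minus the browser-only default for url
function getUrlParams(url) {
  var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
      match, params = {},
      decode = function (s) { return decodeURIComponent(s.replace(/\+/g, " ")); };

  while ((match = re.exec(url)) !== null) {
    params[decode(match[1])] = decode(match[2]);
  }
  return params;
}

var qs = '1111342=Adam%20Franco&348572=Bob%20Jones';

// Nothing precedes "1111342", so only the 348572 pair is found
var bare = getUrlParams(qs);

// Prefixing "?" lets both pairs match
var prefixed = getUrlParams('?' + qs);

console.log(bare, prefixed);
```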

This is what I was hoping for. What I've never seen in JavaScript documentation is any mention that the exec() method will continue to return the next result set if called more than once. Thanks again for the great tip!
It does because of this: regular-expressions.info/javascript.html (Read through: "How to Use The JavaScript RegExp Object")
there is a bug in this code: the semicolon after the "while" should be removed.
Because I generally only use normal (i.e. capturing) groups if I'm actually interested in their content.
@KnightYoshi Yes. In JavaScript any expression also produces its own result (like x = y would assign y to x and also produce y). When we apply that knowledge to while (match = re.exec(url)): This A) does the assignment and B) returns the result of re.exec(url) to the while. Now re.exec returns null if there is no match, which is a falsy value. So in effect the loop will continue as long as there is a match.
meouw

You need to use the 'g' switch for a global search

var result = mystring.match(/(&|&amp;)?([^=]+)=([^&]+)/g)

This doesn't actually solve the problem: "Using the global flag, 'g', will match all occurrences, but only return the fully matched sub-strings, not the separated keys and values."
Mike 'Pomax' Kamermans

2020 edit

Use URLSearchParams, as this job no longer requires any kind of custom code. Browsers can do this for you with a single constructor:

const str = "1111342=Adam%20Franco&348572=Bob%20Jones";
const data = new URLSearchParams(str);
for (const pair of data) console.log(pair);

yields

Array [ "1111342", "Adam Franco" ]
Array [ "348572", "Bob Jones" ]

So there is no reason to use regex for this anymore.
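For the exact result shapes asked for in the question, a URLSearchParams object can be converted directly. A sketch (Object.fromEntries needs an ES2019 runtime such as Node 12+; note the values come back already percent-decoded):

```javascript
const str = "1111342=Adam%20Franco&348572=Bob%20Jones";
const data = new URLSearchParams(str);

// [[key, value], ...] with values already decoded
const pairs = Array.from(data);

// [[keys...], [values...]], the other layout from the question
const columns = [Array.from(data.keys()), Array.from(data.values())];

// Plain object keyed by name (if a name repeats, the last value wins)
const obj = Object.fromEntries(data);

console.log(pairs, columns, obj);
```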

Original answer

If you don't want to rely on the "blind matching" that comes with running exec style matching, JavaScript does come with match-all functionality built in, but it's part of the replace function call, when using a "what to do with the capture groups" handling function:

var data = {};

var getKeyValue = function(fullPattern, group1, group2, group3) {
  data[group2] = group3;
};

mystring.replace(/(&|&amp;)?([^=]+)=([^&]+)/g, getKeyValue);

done.

Instead of using the capture group handling function to actually return replacement strings (for replace handling, the first arg is the full pattern match, and subsequent args are individual capture groups) we simply take the groups 2 and 3 captures, and cache that pair.

So, rather than writing complicated parsing functions, remember that the "matchAll" function in JavaScript is simply "replace" with a replacement handler function, and much pattern matching efficiency can be had.
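Because the handler's arguments are positional, the number of capturing groups decides which argument holds what. With the non-capturing prefix (?:&|&amp;)? from the question, key and value arrive as the 2nd and 3rd arguments, and the 4th is the match offset rather than a group. A small sketch logging the arguments (assuming a regex without named groups, which would add one more trailing argument):

```javascript
const mystring = '1111342=Adam%20Franco&348572=Bob%20Jones';
const calls = [];

// Arguments: full match, one per capture group, then the offset and the whole input
mystring.replace(/(?:&|&amp;)?([^=]+)=([^&]+)/g, function (full, key, value, offset) {
  calls.push({ full, key, value, offset });
  return full; // keep the text unchanged; only the side effect matters
});

console.log(calls);
```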


I have a string something "this one" and "that one". I want to place all of the double quoted strings in a list i.e. [this one, that one]. So far mystring.match(/"(.*?)"/) works fine at detecting the first one, but I do not know how to adapt your solution for a single capturing group.
sounds like you should post a question on Stackoverflow for that, rather than trying to solve it in comments.
I've created a new question: stackoverflow.com/questions/26174122/…
Not sure why this answer has so few upvotes but it is the best answer to the question.
Hi @Mike'Pomax'Kamermans, the community guide-lines specifically recommend editing entries to improve them, see: stackoverflow.com/help/behavior . The core of your answer is exceedingly helpful, but I found the language "remember that matchAll is replace" wasn't clear and wasn't an explanation of why your code (which is non-obvious) works. I thought you should get the well-deserved rep, so I edited your answer rather than duplicating it with improved text. As the original asker of this question, I'm happy to revert the acceptance - of this answer (and the edit) if you still want me to.
Aram Kocharyan

For capturing groups, I'm used to using preg_match_all in PHP and I've tried to replicate its functionality here:

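<script>

// Return all pattern matches with captured groups, one array per match
// (the layout PHP calls PREG_SET_ORDER)
RegExp.prototype.execAll = function(string) {
    // without the g flag exec() never advances and the loop below never ends
    if (!this.global) throw new TypeError("execAll() needs a regex with the g flag");
    var match = null;
    var matches = [];
    this.lastIndex = 0; // start at the beginning even if test()/exec() ran before
    while ((match = this.exec(string)) !== null) {
        // copy only the numbered entries (full match + groups), not index/input
        matches.push(Array.prototype.slice.call(match));
        // step past empty matches so they are not found again forever
        if (match[0] === "") this.lastIndex++;
    }
    return matches;
}

// Example
var someTxt = 'abc123 def456 ghi890';
var results = /[a-z]+(\d+)/g.execAll(someTxt);

// Output
// [["abc123", "123"],
//  ["def456", "456"],
//  ["ghi890", "890"]]

</script>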

@teh_senaus you need to specify the global modifier with /g otherwise running exec() won't change the current index and will loop forever.
If I call myRe.test(str) first to validate and then run execAll, it starts at the second match and the first match is lost.
@fdrv You have to reset the lastIndex to zero before starting the loop: this.lastIndex = 0;
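The test() pitfall from these comments is easy to reproduce with a small global regex. A sketch, where collect() is a hypothetical helper standing in for any exec() loop:

```javascript
// A small global regex and string (the same ones MDN uses in its exec() example)
const myRe = /ab*/g;
const str = 'abbcdefabh';

// Hypothetical helper: gather all matches with an exec() loop,
// starting wherever re.lastIndex currently points
function collect(re, s) {
  const found = [];
  let m;
  while ((m = re.exec(s)) !== null) found.push(m[0]);
  return found; // the final, failed exec() resets re.lastIndex to 0
}

myRe.test(str);                      // matches 'abb' and moves myRe.lastIndex to 3
const skipped = collect(myRe, str);  // starts at index 3, so 'abb' is lost

myRe.test(str);                      // lastIndex is 3 again
myRe.lastIndex = 0;                  // the fix from the comment: rewind first
const complete = collect(myRe, str);

console.log(skipped, complete);
```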
Gumbo

Set the g modifier for a global match:

/…/g

This doesn't actually solve the problem: "Using the global flag, 'g', will match all occurrences, but only return the fully matched sub-strings, not the separated keys and values."
randers

Source:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec

Finding successive matches

If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property). For example, assume you have this script:

var myRe = /ab*/g;
var str = 'abbcdefabh';
var myArray;
while ((myArray = myRe.exec(str)) !== null) {
  var msg = 'Found ' + myArray[0] + '. ';
  msg += 'Next match starts at ' + myRe.lastIndex;
  console.log(msg);
}

This script displays the following text:

Found abb. Next match starts at 3
Found ab. Next match starts at 9

Note: Do not place the regular expression literal (or RegExp constructor) within the while condition or it will create an infinite loop if there is a match due to the lastIndex property being reset upon each iteration. Also be sure that the global flag is set or a loop will occur here also.


If I call myRe.test(str) first to validate and then run the while loop, it starts at the second match and the first match is lost.
You can also combine String.prototype.match with the g flag: 'abbcdefabh'.match(/ab*/g) returns ['abb', 'ab']
Klesun

Hello from 2020. Let me bring String.prototype.matchAll() to your attention:

let regexp = /(?:&|&amp;)?([^=]+)=([^&]+)/g;
let str = '1111342=Adam%20Franco&348572=Bob%20Jones';

for (let match of str.matchAll(regexp)) {
    let [full, key, value] = match;
    console.log(key + ' => ' + value);
}

Outputs:

1111342 => Adam%20Franco
348572 => Bob%20Jones
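matchAll() also makes it short to build the two preg_match_all() layouts from the question. A sketch; pregMatchAll() is a hypothetical helper, not a built-in, and unlike PHP it returns [] when nothing matches:

```javascript
// Hypothetical helper mimicking PHP's preg_match_all() result layouts
function pregMatchAll(regex, str, order = 'pattern') {
  // matchAll() throws a TypeError for non-global regexes, so add g if missing
  const re = regex.global ? regex : new RegExp(regex.source, regex.flags + 'g');
  // one array per match: [full, group1, group2, ...]
  const sets = Array.from(str.matchAll(re), m => Array.from(m));
  if (order === 'set') return sets; // like PREG_SET_ORDER
  // like PREG_PATTERN_ORDER (PHP's default): one array per group across all matches
  const groupCount = sets.length ? sets[0].length : 0;
  const byGroup = [];
  for (let g = 0; g < groupCount; g++) byGroup.push(sets.map(set => set[g]));
  return byGroup;
}

const regexp = /(?:&|&amp;)?([^=]+)=([^&]+)/g;
const str = '1111342=Adam%20Franco&348572=Bob%20Jones';

// Drop the full matches to get exactly the shapes from the question
const byGroup = pregMatchAll(regexp, str).slice(1);
const bySet = pregMatchAll(regexp, str, 'set').map(set => set.slice(1));

console.log(byGroup, bySet);
```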

Finally! A note of caution: "ECMAScript 2020, the 11th edition, introduces the matchAll method for Strings, to produce an iterator for all match objects generated by a global regular expression". According to the site linked in the answer, most browsers & nodeJS support it currently, but not IE, Safari, or Samsung Internet. Hopefully support will broaden soon, but YMMV for a while.
fedu

If someone (like me) needs Tomalak's method with array support (i.e. for multiple selects), here it is:

function getUrlParams(url) {
  var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
      match, params = {},
      decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};

  if (typeof url == "undefined") url = document.location.href;

  while (match = re.exec(url)) {
    var name = decode(match[1]), value = decode(match[2]);
    if (!Object.prototype.hasOwnProperty.call(params, name)) {
      params[name] = value;                  // first occurrence: keep a plain string
    } else if (Array.isArray(params[name])) {
      params[name].push(value);              // third and later occurrences
    } else {
      params[name] = [params[name], value];  // second occurrence: switch to an array
    }
  }
  return params;
}
var urlParams = getUrlParams(location.search);

input ?my=1&my=2&my=things

result 1,2,things (earlier returned only: things)
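For comparison, the built-in URLSearchParams mentioned at the top of this page already handles repeated names: getAll() returns every value in order, while get() returns only the first. A short sketch with the same input:

```javascript
// A leading "?" is stripped by the constructor
const params = new URLSearchParams('?my=1&my=2&my=things');

const allValues = params.getAll('my'); // every value for "my", in order
const firstValue = params.get('my');   // only the first value

console.log(allValues, firstValue);
```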


Chris West

Just to stick with the proposed question as indicated by the title, you can actually iterate over each match in a string using String.prototype.replace(). For example the following does just that to get an array of all words based on a regular expression:

function getWords(str) {
  var arr = [];
  str.replace(/\w+/g, function(m) {
    arr.push(m);
  });
  return arr;
}

var words = getWords("Where in the world is Carmen Sandiego?");
// > ["Where", "in", "the", "world", "is", "Carmen", "Sandiego"]

If I wanted to get capture groups or even the index of each match I could do that too. The following shows how each match is returned with the entire match, the 1st capture group and the index:

function getWords(str) {
  var arr = [];
  str.replace(/\w+(?=(.*))/g, function(m, remaining, index) {
    arr.push({ match: m, remainder: remaining, index: index });
  });
  return arr;
}

var words = getWords("Where in the world is Carmen Sandiego?");

After running the above, words will be as follows:

[
  {
    "match": "Where",
    "remainder": " in the world is Carmen Sandiego?",
    "index": 0
  },
  {
    "match": "in",
    "remainder": " the world is Carmen Sandiego?",
    "index": 6
  },
  {
    "match": "the",
    "remainder": " world is Carmen Sandiego?",
    "index": 9
  },
  {
    "match": "world",
    "remainder": " is Carmen Sandiego?",
    "index": 13
  },
  {
    "match": "is",
    "remainder": " Carmen Sandiego?",
    "index": 19
  },
  {
    "match": "Carmen",
    "remainder": " Sandiego?",
    "index": 22
  },
  {
    "match": "Sandiego",
    "remainder": "?",
    "index": 29
  }
]
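In runtimes with String.prototype.matchAll() (ES2020), the same array can be built without using replace() for its side effects, since each match object carries its groups and index. A sketch with the same regex:

```javascript
const text = "Where in the world is Carmen Sandiego?";

// Same lookahead regex as getWords() above; matchAll() requires the g flag
const words = Array.from(text.matchAll(/\w+(?=(.*))/g), m => ({
  match: m[0],     // the word itself
  remainder: m[1], // capture group 1: everything after the word
  index: m.index,  // position of the word in the string
}));

console.log(words);
```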

In order to match multiple occurrences similar to what is available in PHP with preg_match_all you can use this type of thinking to make your own or use something like YourJS.matchAll(). YourJS more or less defines this function as follows:

function matchAll(str, rgx) {
  var arr, extras, matches = [];
  str.replace(rgx.global ? rgx : new RegExp(rgx.source, (rgx + '').replace(/[\s\S]+\//g , 'g')), function() {
    matches.push(arr = [].slice.call(arguments));
    extras = arr.splice(-2);
    arr.index = extras[0];
    arr.input = extras[1];
  });
  return matches[0] ? matches : null;
}

Since you want to parse the query string of a URL, you could also use something like YourJS.parseQS() (yourjs.com/snippets/56), although a lot of other libraries also offer this functionality.
Modifying a variable from an outer scope in a loop that is supposed to return a replacement is kind of bad. You're misusing replace here.
fboes

If you can get away with using map, this is a four-line solution:

var mystring = '1111342=Adam%20Franco&348572=Bob%20Jones';
var result = mystring.match(/(&|&amp;)?([^=]+)=([^&]+)/g) || [];
result = result.map(function(i) { return i.match(/(&|&amp;)?([^=]+)=([^&]+)/); });
console.log(result);

Ain't pretty, ain't efficient, but at least it is compact. ;)


jnnnnn

Use window.URL:

> s = 'http://www.example.com/index.html?1111342=Adam%20Franco&348572=Bob%20Jones'
> u = new URL(s)
> Array.from(u.searchParams.entries())
[["1111342", "Adam Franco"], ["348572", "Bob Jones"]]

ivar

To capture several parameters using the same name, I modified the while loop in Tomalak's method like this:

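  while (match = re.exec(url)) {
    var pName = decode(match[1]);
    var pValue = decode(match[2]);
    // every name maps to an array of all its values
    if (Object.prototype.hasOwnProperty.call(params, pName)) {
      params[pName].push(pValue);
    } else {
      params[pName] = [pValue];
    }
  }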

input: ?firstname=george&lastname=bush&firstname=bill&lastname=clinton

returns: {firstname : ["george", "bill"], lastname : ["bush", "clinton"]}


While I like your idea, it doesn't work nicely with single params, like for ?cinema=1234&film=12&film=34 I'd expect {cinema: 1234, film: [12, 34]}. Edited your answer to reflect this.
p.s.w.g

Well... I had a similar problem: I wanted an incremental/step search with RegExp (e.g. start the search, do some processing, continue the search until the last match).

After lots of internet searching, like always (this is turning into a habit now), I ended up on StackOverflow and found the answer...

What isn't mentioned, and matters, is lastIndex. I now understand why the RegExp object implements the lastIndex property.


pguardiario

Splitting looks like the best option to me:

'1111342=Adam%20Franco&348572=Bob%20Jones'.split('&').map(x => x.match(/(?:&|&amp;)?([^=]+)=([^&]+)/))
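A variant of the same split-then-match idea that keeps only the groups, giving the [[key, value], ...] shape from the question. This sketch assumes plain & separators (an &amp;-encoded string would leave amp; stuck to the keys) and adds decodeURIComponent() as an extra step:

```javascript
const str = '1111342=Adam%20Franco&348572=Bob%20Jones';

const pairs = str
  .split('&')                                  // one chunk per key=value pair
  .map(chunk => chunk.match(/^([^=]+)=(.*)$/)) // non-global: [full, key, value] or null
  .filter(m => m !== null)                     // drop chunks without "="
  .map(([, key, value]) => [key, decodeURIComponent(value)]);

console.log(pairs);
```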

andrew pate

To avoid regex hell, you could find the first match, chop the string off one character past where that match starts, then look for the next match in the remaining substring. In C# it looks something like this (sorry, I haven't ported it to JavaScript):

        long count = 0;
        var remainder = data;
        Match match = null;
        do
        {
            match = _rgx.Match(remainder);
            if (match.Success)
            {
                count++;
                remainder = remainder.Substring(match.Index + 1, remainder.Length - (match.Index+1));
            }
        } while (match.Success);
        return count;
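A possible JavaScript port of the C# loop above, as a sketch (countMatches is a hypothetical name). Because the remainder is cut just one character past each match's start, overlapping matches are counted, unlike match() with the g flag:

```javascript
// Hypothetical port of the C# snippet: count matches by re-searching the remainder
function countMatches(rgx, data) {
  // drop g/y so every exec() searches the remainder from its start
  const re = new RegExp(rgx.source, rgx.flags.replace(/[gy]/g, ''));
  let count = 0;
  let remainder = data;
  let match;
  // also stop on an empty remainder, so a pattern that matches "" cannot loop forever
  while (remainder.length > 0 && (match = re.exec(remainder)) !== null) {
    count++;
    // like remainder.Substring(match.Index + 1, ...) in the C# version
    remainder = remainder.slice(match.index + 1);
  }
  return count;
}

console.log(countMatches(/ab*/, 'abbcdefabh')); // 2
console.log(countMatches(/aa/, 'aaaa'));        // 3: overlapping matches count
console.log('aaaa'.match(/aa/g).length);        // 2: match() with g does not overlap
```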