I am trying to parse url-encoded strings that are made up of key=value pairs separated by either & or &amp;.
The following will only match the first occurrence, breaking apart the keys and values into separate result elements:
var result = mystring.match(/(?:&|&amp;)?([^=]+)=([^&]+)/)
The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:
['1111342=Adam%20Franco', '1111342', 'Adam%20Franco']
Using the global flag, 'g', will match all occurrences, but only return the fully matched sub-strings, not the separated keys and values:
var result = mystring.match(/(?:&|&amp;)?([^=]+)=([^&]+)/g)
The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:
['1111342=Adam%20Franco', '&348572=Bob%20Jones']
While I could split the string on & and break apart each key/value pair individually, is there any way using JavaScript's regular expression support to match multiple occurrences of the pattern /(?:&|&amp;)?([^=]+)=([^&]+)/ similar to PHP's preg_match_all() function?
I'm aiming for some way to get results with the sub-matches separated like:
[['1111342', '348572'], ['Adam%20Franco', 'Bob%20Jones']]
or
[['1111342', 'Adam%20Franco'], ['348572', 'Bob%20Jones']]
replace
here. var data = {}; mystring.replace(/(?:&|&amp;)?([^=]+)=([^&]+)/g, function(match, key, value) { data[key] = value; });
done. "matchAll" in JavaScript is "replace" with a replacement handler function instead of a string.
Hoisted from the comments
2020 comment: rather than using regex, we now have URLSearchParams, which does all of this for us, so no custom code, let alone regex, are necessary anymore. – Mike 'Pomax' Kamermans
Browser support is listed here https://caniuse.com/#feat=urlsearchparams
I would suggest an alternative regex, using sub-groups to capture name and value of the parameters individually and re.exec():
function getUrlParams(url) {
var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
match, params = {},
decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};
if (typeof url == "undefined") url = document.location.href;
while (match = re.exec(url)) {
params[decode(match[1])] = decode(match[2]);
}
return params;
}
var result = getUrlParams("http://maps.google.de/maps?f=q&source=s_q&hl=de&geocode=&q=Frankfurt+am+Main&sll=50.106047,8.679886&sspn=0.370369,0.833588&ie=UTF8&ll=50.116616,8.680573&spn=0.35972,0.833588&z=11&iwloc=addr");
result is an object:
{
  f: "q",
  geocode: "",
  hl: "de",
  ie: "UTF8",
  iwloc: "addr",
  ll: "50.116616,8.680573",
  q: "Frankfurt am Main",
  sll: "50.106047,8.679886",
  source: "s_q",
  spn: "0.35972,0.833588",
  sspn: "0.370369,0.833588",
  z: "11"
}
The regex breaks down as follows:
(?:            # non-capturing group
  \?|&         #   "?" or "&"
  (?:amp;)?    #   (allow "&amp;", for wrongly HTML-encoded URLs)
)              # end non-capturing group
(              # group 1
  [^=&#]+      #   any character except "=", "&" or "#"; at least once
)              # end group 1 - this will be the parameter's name
(?:            # non-capturing group
  =?           #   an "=", optional
  (            #   group 2
    [^&#]*     #     any character except "&" or "#"; any number of times
  )            #   end group 2 - this will be the parameter's value
)              # end non-capturing group
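To see how the optional "=" and the "amp;" allowance behave, the pattern can be run against a small URL. This is only a sketch: the example.com URL and the names `paramRe` and `seen` are made up for illustration.

```javascript
// Same pattern as in getUrlParams above, applied to an illustrative URL
const paramRe = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g;
const sampleUrl = "http://example.com/?a=1&amp;b=2&flag#frag";

const seen = [];
let m;
while ((m = paramRe.exec(sampleUrl)) !== null) {
  // m[1] is the parameter name, m[2] the value ("" when there is no "=")
  seen.push([m[1], m[2]]);
}

console.log(seen); // [["a", "1"], ["b", "2"], ["flag", ""]]
```

A parameter without a value ("flag") still matches, with an empty string as its value, and the fragment after "#" is ignored.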
You need to use the 'g' switch for a global search
var result = mystring.match(/(&|&amp;)?([^=]+)=([^&]+)/g)
2020 edit
Use URLSearchParams, as this job no longer requires any kind of custom code. Browsers can do this for you with a single constructor:
const str = "1111342=Adam%20Franco&348572=Bob%20Jones";
const data = new URLSearchParams(str);
for (const pair of data) console.log(pair)
yields
Array [ "1111342", "Adam Franco" ]
Array [ "348572", "Bob Jones" ]
So there is no reason to use regex for this anymore.
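If a plain object is more convenient than an iterator, the entries can be collected in a few lines. The following is a minimal sketch; the helper name `parseQuery` and the extra `tag` parameters are made up for illustration, and repeated keys are gathered with `getAll()`.

```javascript
// Hypothetical helper (not part of any library): turns a query string into
// an object whose values are arrays, so repeated keys are not lost.
function parseQuery(queryString) {
  const params = new URLSearchParams(queryString);
  const result = {};
  for (const key of new Set(params.keys())) {
    result[key] = params.getAll(key); // every value stored under this key
  }
  return result;
}

const parsed = parseQuery("1111342=Adam%20Franco&348572=Bob%20Jones&tag=a&tag=b");
console.log(parsed["1111342"]); // ["Adam Franco"]
console.log(parsed.tag);        // ["a", "b"]
```

Values come back already percent-decoded, which is one less step than with the regex-based approaches.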
Original answer
If you don't want to rely on the "blind matching" that comes with running exec style matching, JavaScript does come with match-all functionality built in, but it's part of the replace function call, when using a "what to do with the capture groups" handling function:
var data = {};
var getKeyValue = function(fullMatch, key, value) {
  data[key] = value;
};
mystring.replace(/(?:&|&amp;)?([^=]+)=([^&]+)/g, getKeyValue);
done.
Instead of using the capture group handling function to actually return replacement strings (for replace handling, the first arg is the full pattern match, and subsequent args are individual capture groups) we simply take the two captured groups (the leading group is non-capturing, so the key is capture 1 and the value is capture 2), and cache that pair.
So, rather than writing complicated parsing functions, remember that the "matchAll" function in JavaScript is simply "replace" with a replacement handler function, and much pattern matching efficiency can be had.
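To get the nested-array shape asked for in the question, the same callback approach can push pairs instead of assigning to an object. A short sketch, with `input` and `pairs` as illustrative names:

```javascript
const input = "1111342=Adam%20Franco&348572=Bob%20Jones";
const pairs = [];

// The callback receives the full match first, then one argument per capture
// group. The leading (?:...) group is non-capturing, so the key and value
// are the first and second captures.
input.replace(/(?:&|&amp;)?([^=]+)=([^&]+)/g, function (fullMatch, key, value) {
  pairs.push([key, value]);
  return fullMatch; // the result of replace() is discarded; it is used only to iterate
});

console.log(pairs);
// [["1111342", "Adam%20Franco"], ["348572", "Bob%20Jones"]]
```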
something "this one" and "that one". I want to place all of the double quoted strings in a list i.e. [this one, that one]. So far mystring.match(/"(.*?)"/) works fine at detecting the first one, but I do not know how to adapt your solution for a single capturing group.
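One possible adaptation for a single capturing group, assuming the replace-callback approach is the one being asked about: add the g flag and read the second callback argument. The names `text` and `quoted` are illustrative.

```javascript
const text = 'something "this one" and "that one"';
const quoted = [];

// With one capture group, the callback's second argument is that group
text.replace(/"(.*?)"/g, function (fullMatch, inner) {
  quoted.push(inner);
  return fullMatch; // return value is unused; replace() only drives the iteration
});

console.log(quoted); // ["this one", "that one"]
```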
For capturing groups, I'm used to using preg_match_all in PHP and I've tried to replicate its functionality here:
<script>
// Return all pattern matches with captured groups
RegExp.prototype.execAll = function(string) {
var match = null;
var matches = new Array();
while (match = this.exec(string)) {
var matchArray = [];
for (var i in match) {
if (parseInt(i) == i) {
matchArray.push(match[i]);
}
}
matches.push(matchArray);
}
return matches;
}
// Example
var someTxt = 'abc123 def456 ghi890';
var results = /[a-z]+(\d+)/g.execAll(someTxt);
// Output
[["abc123", "123"],
["def456", "456"],
["ghi890", "890"]]
</script>
/g otherwise running exec() won't change the current index and will loop forever.
Set the g modifier for a global match: /…/g
Source:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec
Finding successive matches
If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property). For example, assume you have this script:
var myRe = /ab*/g;
var str = 'abbcdefabh';
var myArray;
while ((myArray = myRe.exec(str)) !== null) {
var msg = 'Found ' + myArray[0] + '. ';
msg += 'Next match starts at ' + myRe.lastIndex;
console.log(msg);
}
This script displays the following text:
Found abb. Next match starts at 3
Found ab. Next match starts at 9
Note: Do not place the regular expression literal (or RegExp constructor) within the while condition or it will create an infinite loop if there is a match due to the lastIndex property being reset upon each iteration. Also be sure that the global flag is set or a loop will occur here also.
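The role of the g flag can be checked directly by inspecting lastIndex. The sketch below reuses the string from the quoted example; the variable names are illustrative.

```javascript
const str = "abbcdefabh";

// With the g flag, the regex object remembers where the last match ended
const globalRe = /ab*/g;
const positions = [];
let match;
while ((match = globalRe.exec(str)) !== null) {
  positions.push([match[0], match.index, globalRe.lastIndex]);
}
console.log(positions); // [["abb", 0, 3], ["ab", 7, 9]]

// Without the g flag, lastIndex stays at 0, so every exec() call returns
// the same first match. A while loop over it would never terminate.
const plainRe = /ab*/;
const first = plainRe.exec(str);
const second = plainRe.exec(str);
console.log(first.index, second.index, plainRe.lastIndex); // 0 0 0
```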
String.prototype.match with the g flag: 'abbcdefabh'.match(/ab*/g) returns ['abb', 'ab']
Hello from 2020. Let me bring String.prototype.matchAll() to your attention:
let regexp = /(?:&|&amp;)?([^=]+)=([^&]+)/g;
let str = '1111342=Adam%20Franco&348572=Bob%20Jones';
for (let match of str.matchAll(regexp)) {
let [full, key, value] = match;
console.log(key + ' => ' + value);
}
Outputs:
1111342 => Adam%20Franco
348572 => Bob%20Jones
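Both result shapes requested in the question can be built from the matchAll() iterator. A sketch, where `pairs` and `columns` are illustrative names:

```javascript
const regexp = /(?:&|&amp;)?([^=]+)=([^&]+)/g;
const str = "1111342=Adam%20Franco&348572=Bob%20Jones";

// One [key, value] pair per match; Array.from accepts a mapping function
const pairs = Array.from(str.matchAll(regexp), (m) => [m[1], m[2]]);

// Transposed form: all keys first, then all values
const columns = [pairs.map((p) => p[0]), pairs.map((p) => p[1])];

console.log(pairs);   // [["1111342", "Adam%20Franco"], ["348572", "Bob%20Jones"]]
console.log(columns); // [["1111342", "348572"], ["Adam%20Franco", "Bob%20Jones"]]
```

Note that matchAll() throws a TypeError if the regex lacks the g flag.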
If someone (like me) needs Tomalak's method with array support (i.e. multiple select), here it is:
function getUrlParams(url) {
var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
match, params = {},
decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};
if (typeof url == "undefined") url = document.location.href;
while (match = re.exec(url)) {
if( params[decode(match[1])] ) {
if( typeof params[decode(match[1])] != 'object' ) {
params[decode(match[1])] = new Array( params[decode(match[1])], decode(match[2]) );
} else {
params[decode(match[1])].push(decode(match[2]));
}
}
else
params[decode(match[1])] = decode(match[2]);
}
return params;
}
var urlParams = getUrlParams(location.search);
input ?my=1&my=2&my=things
result 1,2,things
(earlier returned only: things)
Just to stick with the proposed question as indicated by the title, you can actually iterate over each match in a string using String.prototype.replace(). For example the following does just that to get an array of all words based on a regular expression:
function getWords(str) {
var arr = [];
str.replace(/\w+/g, function(m) {
arr.push(m);
});
return arr;
}
var words = getWords("Where in the world is Carmen Sandiego?");
// > ["Where", "in", "the", "world", "is", "Carmen", "Sandiego"]
If I wanted to get capture groups or even the index of each match I could do that too. The following shows how each match is returned with the entire match, the 1st capture group and the index:
function getWords(str) {
var arr = [];
str.replace(/\w+(?=(.*))/g, function(m, remaining, index) {
arr.push({ match: m, remainder: remaining, index: index });
});
return arr;
}
var words = getWords("Where in the world is Carmen Sandiego?");
After running the above, words will be as follows:
[
{
"match": "Where",
"remainder": " in the world is Carmen Sandiego?",
"index": 0
},
{
"match": "in",
"remainder": " the world is Carmen Sandiego?",
"index": 6
},
{
"match": "the",
"remainder": " world is Carmen Sandiego?",
"index": 9
},
{
"match": "world",
"remainder": " is Carmen Sandiego?",
"index": 13
},
{
"match": "is",
"remainder": " Carmen Sandiego?",
"index": 19
},
{
"match": "Carmen",
"remainder": " Sandiego?",
"index": 22
},
{
"match": "Sandiego",
"remainder": "?",
"index": 29
}
]
In order to match multiple occurrences similar to what is available in PHP with preg_match_all you can use this type of thinking to make your own or use something like YourJS.matchAll(). YourJS more or less defines this function as follows:
function matchAll(str, rgx) {
var arr, extras, matches = [];
str.replace(rgx.global ? rgx : new RegExp(rgx.source, (rgx + '').replace(/[\s\S]+\//g , 'g')), function() {
matches.push(arr = [].slice.call(arguments));
extras = arr.splice(-2);
arr.index = extras[0];
arr.input = extras[1];
});
return matches[0] ? matches : null;
}
YourJS.parseQS() (yourjs.com/snippets/56), although a lot of other libraries also offer this functionality.
If you can get away with using map this is a four-line-solution:
var mystring = '1111342=Adam%20Franco&348572=Bob%20Jones';
var result = mystring.match(/(&|&amp;)?([^=]+)=([^&]+)/g) || [];
result = result.map(function(i) { return i.match(/(&|&amp;)?([^=]+)=([^&]+)/); });
console.log(result);
Ain't pretty, ain't efficient, but at least it is compact. ;)
Use window.URL:
> s = 'http://www.example.com/index.html?1111342=Adam%20Franco&348572=Bob%20Jones'
> u = new URL(s)
> Array.from(u.searchParams.entries())
[["1111342", "Adam Franco"], ["348572", "Bob Jones"]]
To capture several parameters using the same name, I modified the while loop in Tomalak's method like this:
while (match = re.exec(url)) {
var pName = decode(match[1]);
var pValue = decode(match[2]);
params[pName] ? params[pName].push(pValue) : params[pName] = [pValue];
}
input: ?firstname=george&lastname=bush&firstname=bill&lastname=clinton
returns: {firstname : ["george", "bill"], lastname : ["bush", "clinton"]}
?cinema=1234&film=12&film=34 I'd expect {cinema: 1234, film: [12, 34]}. Edited your answer to reflect this.
Well... I had a similar problem: I wanted an incremental / step search with RegExp (e.g. start the search, do some processing, continue the search until the last match).
After lots of internet searching, like always (this is turning into a habit now), I ended up on StackOverflow and found the answer.
What has not been mentioned, and is worth mentioning, is "lastIndex". I now understand why the RegExp object implements the "lastIndex" property.
Splitting it looks like the best option to me:
'1111342=Adam%20Franco&348572=Bob%20Jones'.split('&').map(x => x.match(/(?:&|&amp;)?([^=]+)=([^&]+)/))
To avoid regex hell you could find your first match, chop off a chunk then attempt to find the next one on the substring. In C# this looks something like this, sorry I've not ported it over to JavaScript for you.
long count = 0;
var remainder = data;
Match match = null;
do
{
match = _rgx.Match(remainder);
if (match.Success)
{
count++;
remainder = remainder.Substring(match.Index + 1, remainder.Length - (match.Index+1));
}
} while (match.Success);
return count;
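A rough JavaScript port of the C# loop above might look like the following. This is a sketch, not a tested translation: the function name `countMatches` and its parameters are made up, and the g and y flags are stripped so that lastIndex does not interfere with matching on the substrings.

```javascript
// Counts matches by repeatedly matching against the remaining substring,
// mirroring the C# loop: after each match, drop everything up to and
// including the first character of that match, then try again.
function countMatches(rgx, data) {
  // Work on a non-global, non-sticky copy so exec() always starts at index 0
  const re = new RegExp(rgx.source, rgx.flags.replace(/[gy]/g, ""));
  let count = 0;
  let remainder = data;
  let match;
  while ((match = re.exec(remainder)) !== null) {
    count++;
    remainder = remainder.substring(match.index + 1);
  }
  return count;
}

console.log(countMatches(/ab*/, "abbcdefabh")); // 2
```

Because only one character past the start of each match is removed, overlapping matches are counted too (for example, /aa/ on "aaa" gives 2), and anchors such as ^ are evaluated against each substring rather than the original string.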
x = y would assign y to x and also produce y). When we apply that knowledge to if (match = re.exec(url)): This A) does the assignment and B) returns the result of re.exec(url) to the while. Now re.exec returns null if there is no match, which is a falsy value. So in effect the loop will continue as long as there is a match.
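The behaviour described in this comment can be checked with a few lines. The variable names and the input string "a1b2" are illustrative.

```javascript
let x;
const y = 5;
const produced = (x = y); // assigns y to x; the expression itself evaluates to y
console.log(x, produced); // 5 5

// The same idea drives the loop: each iteration assigns the exec() result
// to match, and the while condition sees that value. null ends the loop.
const digitRe = /\d/g;
let match;
let iterations = 0;
while ((match = digitRe.exec("a1b2"))) {
  iterations++;
}
console.log(iterations, match); // 2 null
```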