
Regular Expression to find a string included between two characters while EXCLUDING the delimiters

I need to extract from a string a set of characters which are included between two delimiters, without returning the delimiters themselves.

A simple example should be helpful:

Target: extract the substring between square brackets, without returning the brackets themselves.

Base string: This is a test string [more or less]

If I use the following regex:

\[.*?\]

The match is [more or less]. I need to get only more or less (without the brackets).

Is it possible to do it?


jottr

Easy done:

(?<=\[)(.*?)(?=\])

Technically, that uses lookahead and lookbehind assertions. See Lookahead and Lookbehind Zero-Width Assertions. The pattern consists of:

a lookbehind, (?<=\[): the match must be preceded by a [, which is not included in the match;

a non-greedy captured group, (.*?): it is non-greedy so that it stops at the first ]; and

a lookahead, (?=\]): the match must be followed by a ], which is not included in the match.

Alternatively you can just capture what's between the square brackets:

\[(.*?)\]

and return the first captured group instead of the entire match.
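
As a rough illustration of the two approaches above, here is a minimal JavaScript sketch. The variable names are made up for the example, and the lookbehind form assumes an engine that supports it (ES2018 or later in JavaScript).

```javascript
// Sketch: two ways to get the text between [ and ] without the brackets.
const sample = "This is a test string [more or less]";

// 1) Lookbehind + lookahead: the whole match already excludes the brackets.
const viaLookaround = sample.match(/(?<=\[).*?(?=\])/);
const lookaroundResult = viaLookaround ? viaLookaround[0] : null;

// 2) Capturing group: index 0 is the whole match (brackets included),
//    index 1 is the first captured group (brackets excluded).
const viaGroup = sample.match(/\[(.*?)\]/);
const wholeMatch = viaGroup ? viaGroup[0] : null;
const groupResult = viaGroup ? viaGroup[1] : null;

console.log(lookaroundResult); // more or less
console.log(wholeMatch);       // [more or less]
console.log(groupResult);      // more or less
```

Reading index 0 instead of index 1 is the usual reason the second pattern seems to "keep the brackets".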


"Easy done", LOL! :) Regular expressions always give me headache, I tend to forget them as soon as I find the ones that solve my problems. About your solutions: the first works as expected, the second doesn't, it keeps including the brackets. I'm using C#, maybe the RegEx object has its own "flavour" of regex engine...
It's doing that because you're looking at the whole match rather than the first matched group.
Does this work if the substring also contains the delimiters? For example in This is a test string [more [or] less] would this return more [or] less ?
@gnzlbg no, it would return "more [or"
This is returning the string along with the begin and end string
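
The nested-delimiter case raised in the comments can be checked with a short JavaScript sketch. The greedy variant is shown only for contrast; it is an assumption that the string holds a single outer pair of brackets, and neither pattern handles arbitrary nesting.

```javascript
// Sketch: how lazy and greedy quantifiers behave with nested brackets.
const nested = "This is a test string [more [or] less]";

// Lazy: stops at the first ] after the first [.
const lazy = nested.match(/\[(.*?)\]/);
const lazyInner = lazy ? lazy[1] : null;

// Greedy: runs to the last ] in the string.
const greedy = nested.match(/\[(.*)\]/);
const greedyInner = greedy ? greedy[1] : null;

console.log(lazyInner);   // more [or
console.log(greedyInner); // more [or] less
```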
Zanon

If you are using JavaScript, the solution provided by cletus, (?<=\[)(.*?)(?=\]) won't work because JavaScript doesn't support the lookbehind operator.

Edit: as of ES2018, JavaScript does support lookbehind. Write the pattern as a regex literal, delimited by slashes, like this:

var regex = /(?<=\[)(.*?)(?=\])/;

Old answer:

Solution:

var regex = /\[(.*?)\]/;
var strToMatch = "This is a test string [more or less]";
var matched = regex.exec(strToMatch);

It will return:

["[more or less]", "more or less"]

So, what you need is the second element (index 1), which holds the first captured group. Use:

var matched = regex.exec(strToMatch)[1];

To return:

"more or less"

What if there are multiple matches of [more or less] in the string?
Lookbehind assertions were added to RegExp in ES2018.
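
For the multiple-matches question, one possible approach (a sketch, not part of the answer above) is String.prototype.matchAll, which needs the g flag and an engine with ES2020 support. The variable names are illustrative.

```javascript
// Sketch: collect the first captured group of every match.
const multi = "This is a test string [more or less], [more] and [less]";

// matchAll returns an iterator of match arrays; m[1] is the captured group.
const allInner = Array.from(multi.matchAll(/\[(.*?)\]/g), (m) => m[1]);

console.log(allInner); // [ 'more or less', 'more', 'less' ]
```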
cletus

You just need to 'capture' the bit between the brackets.

\[(.*?)\]

To capture you put it inside parentheses. You do not say which language this is using. In Perl for example, you would access this using the $1 variable.

my $string ='This is the match [more or less]';
$string =~ /\[(.*?)\]/;
print "match:$1\n";

Other languages will have different mechanisms. In C#, for example, I believe you read it from the Groups collection of the Match object (match.Groups[1]).


Thanks, but this solution didn't work, it keeps including the square brackets. As I wrote in my comment to Cletus' solution, it could be that C# RegEx object interprets it differently. I'm not expert on C# though, so it's just a conjecture, maybe it's just my lack of knowledge. :)
stevec

Here's a general example with obvious delimiters (X and Y):

(?<=X)(.*?)(?=Y)

Here it's used to find the string between X and Y. Rubular example here, or see image:

https://i.stack.imgur.com/jE62L.png
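
The X and Y idea can be generalized to arbitrary delimiters. The JavaScript sketch below uses a capturing group rather than lookarounds, and the helper names escapeRegExp and between are hypothetical, introduced only for this example.

```javascript
// Hypothetical helper: escape characters that are special in a regex.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the text between the first `open` and the
// next `close`, or null when there is no match.
function between(text, open, close) {
  const pattern = new RegExp(escapeRegExp(open) + "(.*?)" + escapeRegExp(close));
  const m = text.match(pattern);
  return m ? m[1] : null;
}

console.log(between("aaXhelloYbb", "X", "Y"));                          // hello
console.log(between("This is a test string [more or less]", "[", "]")); // more or less
console.log(between("no delimiters here", "X", "Y"));                   // null
```

Escaping matters as soon as a delimiter is a regex metacharacter such as [ or ]. Note that . does not match newlines by default, so this sketch assumes single-line input.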


Stieneee

[^\[] Matches any character that is not [.

+ Matches one or more of those characters, so a run can never contain a [.

(?=\]) Positive lookahead for ]. The run must be followed by a ], which is not included in the result.

Done.

[^\[]+(?=\])

Proof.

http://regexr.com/3gobr

This is similar to the solution proposed by null, but the additional \] is not required. As an additional note, it appears that \ is not required to escape the [ after the ^. For readability, I would leave it in.

Does not work when the two delimiters are identical, for example "more or less".


This is a good solution, however I have made a tweak so that it ignores an extra ']' at the end as well: [^\[\]]+(?=\])
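
The difference between the two character classes shows up when an extra ] follows the closing bracket. A small JavaScript sketch, with an input string made up for the comparison:

```javascript
// Sketch: negated character class with and without ] excluded.
const extraBracket = "This is a test string [more or less]]";

// [^\[]+ may consume a ], so the run ends just before the LAST ].
const loose = extraBracket.match(/[^\[]+(?=\])/);
const looseResult = loose ? loose[0] : null;

// [^\[\]]+ cannot consume a ], so the run ends just before the FIRST ].
const strict = extraBracket.match(/[^\[\]]+(?=\])/);
const strictResult = strict ? strict[0] : null;

console.log(looseResult);  // more or less]
console.log(strictResult); // more or less
```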
realloc

PHP:

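$string = 'This is the match [more or less]';
// Non-greedy (.*?) stops at the first ], which matters if there are several bracketed parts
preg_match('#\[(.*?)\]#', $string, $match);
// $match[0] is the whole match with brackets; $match[1] is the captured text
var_dump($match[1]);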

Luis Febro

Most updated solution

If you are using JavaScript, the best solution I came up with uses the match method instead of exec. With the g flag, match returns every bracketed substring; then map over the results and replace each one with its first captured group, $1, which drops the delimiters.

const text = "This is a test string [more or less], [more] and [less]";
const regex = /\[(.*?)\]/gi;
const resultMatchGroup = text.match(regex); // [ '[more or less]', '[more]', '[less]' ]
const desiredRes = resultMatchGroup.map(match => match.replace(regex, "$1"))
console.log("desiredRes", desiredRes); // [ 'more or less', 'more', 'less' ]

As you can see, this also works when the text contains several delimited substrings.


null

This one works specifically with JavaScript's regular expression parser: /[^[\]]+(?=])/g

Just run this in the console:

var regex = /[^[\]]+(?=])/g;
var str = "This is a test string [more or less]";
var match = regex.exec(str);
match;

Cătălin Rădoi

To match the brackets as well (for example, to remove them together with their contents), use:

\[.+\]

But if the string contains two bracketed sets, [] [], this pattern is a problem: .+ is greedy, so it matches from the first [ to the last ]. See i.imgur.com/NEOLHZk.png
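
The two-sets problem comes from the greedy quantifier. A JavaScript sketch of the difference, where the lazy variant \[.+?\] is an assumed fix and not part of the answer above:

```javascript
// Sketch: greedy vs lazy matching when the string has two bracketed sets.
const twoSets = "first [one] then [two]";

// Greedy .+ runs from the first [ to the last ].
const greedyWhole = twoSets.match(/\[.+\]/)[0];

// Lazy .+? stops at the nearest ], so each set is matched separately.
const lazyWholes = twoSets.match(/\[.+?\]/g);

// Removing each bracketed set together with its brackets.
const withoutSets = twoSets.replace(/\[.+?\]/g, "");

console.log(greedyWhole); // [one] then [two]
console.log(lazyWholes);  // [ '[one]', '[two]' ]
console.log(JSON.stringify(withoutSets));
```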
A. Jesús

I had the same problem using regex in a bash script. I used a two-step solution, piping the input through grep -o twice, applying

 '\[(.*?)\]'  

first, then

'\b.*\b'

Obviously this is not as efficient as the other answers, but it is an alternative.


techguy2000

I wanted to find a string between / and #, but the # is sometimes absent. Here is the regex I use:

  (?<=\/)([^#]+)(?=#*)
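
A quick JavaScript check of this pattern, with sample strings invented for the example. Since #* can match zero characters, the lookahead never rejects a match; it is the [^#]+ class that stops at the #. The group-only variant at the end is an assumed equivalent for these inputs that does not need lookbehind support.

```javascript
// Sketch: text after the first / and up to an optional #.
const optionalHash = /(?<=\/)([^#]+)(?=#*)/;

const withHash = "section/intro#top".match(optionalHash);
const withoutHash = "section/intro".match(optionalHash);
const withHashResult = withHash ? withHash[0] : null;
const withoutHashResult = withoutHash ? withoutHash[0] : null;

// Assumed equivalent using only a capturing group.
const viaGroupOnly = "section/intro#top".match(/\/([^#]+)/);
const groupOnlyResult = viaGroupOnly ? viaGroupOnly[1] : null;

console.log(withHashResult);    // intro
console.log(withoutHashResult); // intro
console.log(groupOnlyResult);   // intro
```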

Audwin Oyong

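Here is how I got the text without '[' and ']' in C#:

using System;
using System.Text.RegularExpressions;

var text = "This is a test string [more or less]";

// Match the text between '[' and ']'; group 1 excludes the brackets
Regex regex = new Regex(@"\[(.+?)\]");
var matches = regex.Matches(text);

for (int i = 0; i < matches.Count; i++)
{
    // Groups[0] is the whole match (with brackets); Groups[1] is the inner text
    Console.WriteLine(matches[i].Groups[1].Value);
}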

The output is:

more or less

Nico

If you need to extract the text without the brackets, you can use awk in bash:

echo " [hola mundo] " | awk -F'[][]' '{print $2}'

result:

hola mundo
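
The same field-splitting idea can be sketched in JavaScript by splitting on either bracket. This is an illustrative equivalent, not what awk does internally.

```javascript
// Sketch: split on [ or ] and take the second field, like awk's $2.
const fields = " [hola mundo] ".split(/[\[\]]/);

// awk fields are 1-based, JavaScript arrays are 0-based, so $2 is index 1.
const secondField = fields[1];

console.log(fields.length); // 3
console.log(secondField);   // hola mundo
```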