How to match all occurrences of a regex

ruby regex

Is there a quick way to find every match of a regular expression in Ruby? I've looked through the Regexp class in Ruby's core library and searched on Google to no avail.

I read "How can I search a string for all regex patterns?" and was horribly confused...

Andrew Marshall

Using scan should do the trick:

string.scan(/regex/)
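A quick sketch of the two ways scan reports its results, using a made-up sample string (the input and pattern are assumptions for illustration): without a block it returns an array of every non-overlapping match, and with a block it yields each match in turn.

```ruby
text = "cat bat rat"  # hypothetical sample input

# Without a block, scan returns an array of all non-overlapping matches.
words = text.scan(/\w+at/)
p words     # => ["cat", "bat", "rat"]

# With a block, scan yields each match instead of building the array.
initials = []
text.scan(/\w+at/) { |word| initials << word[0] }
p initials  # => ["c", "b", "r"]
```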

But what about this case? "match me!".scan(/.../) returns ["mat", "ch ", "me!"], but all occurrences of /.../ would be ["mat", "atc", "tch", "ch ", ...]

No, it wouldn't be. scan resumes searching where the previous match ended, so it never returns overlapping matches; /.../ simply consumes three characters at a time. You could try a lazy regexp, but even that probably won't be enough. Have a look at the Regexp docs at ruby-doc.org/core-1.9.3/Regexp.html to express your regexp correctly :)

This seems like a Ruby WTF... why is this on String instead of Regexp with the other regexp stuff? It isn't even mentioned anywhere in the docs for Regexp.

I guess it's because it's defined and called on String, not on Regexp... But it does actually make sense. You can write a regular expression that captures everything with Regexp#match and iterate over the captured groups. Here, though, you write a pattern for a single partial match and want it applied multiple times to a given string, which is not the responsibility of Regexp. I suggest you check the implementation of scan for a better understanding: ruby-doc.org/core-1.9.3/String.html#method-i-scan

@MichaelDickens: In this case, you can use /(?=(...))/.
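A small sketch of why the lookahead trick works: the match itself is zero-width, so scan advances one character at a time, and the capture group inside the lookahead records the three characters ahead of each position. Because the pattern has a group, scan returns nested arrays, hence the flatten.

```ruby
str = "match me!"

# Plain scan: each search resumes after the previous match, so no overlaps.
p str.scan(/.../)                # => ["mat", "ch ", "me!"]

# Zero-width lookahead with a capture inside it yields overlapping triples.
p str.scan(/(?=(...))/).flatten
# => ["mat", "atc", "tch", "ch ", "h m", " me", "me!"]
```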

the Tin Man

To find all the matching strings, use String's scan method.

str = "A 54mpl3 string w1th 7 numb3rs scatter36 ar0und"
str.scan(/\d+/)
#=> ["54", "3", "1", "7", "3", "36", "0"]

If you want MatchData, which is the type of the object returned by the Regexp#match method, use:

str.to_enum(:scan, /\d+/).map { Regexp.last_match }
#=> [#<MatchData "54">, #<MatchData "3">, #<MatchData "1">, #<MatchData "7">, #<MatchData "3">, #<MatchData "36">, #<MatchData "0">]

The benefit of using MatchData is that you can use methods like offset:

match_datas = str.to_enum(:scan, /\d+/).map { Regexp.last_match }
match_datas[0].offset(0)
#=> [2, 4]
match_datas[1].offset(0)
#=> [7, 8]
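Building on the same to_enum trick, a short sketch that pairs each matched string with its starting index via MatchData#begin (the pairing format is just one possible choice):

```ruby
str = "A 54mpl3 string w1th 7 numb3rs scatter36 ar0und"

# Pair each matched string with the character index where it starts.
positions = str.to_enum(:scan, /\d+/).map do
  m = Regexp.last_match
  [m[0], m.begin(0)]
end
p positions.first(3)  # => [["54", 2], ["3", 7], ["1", 17]]
```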

See these questions if you'd like to know more:

"How do I get the match data for all occurrences of a Ruby regular expression in a string?"

"Ruby regular expression matching enumerator with named capture support"

"How to find out the starting point for each match in ruby"

Reading about special variables $&, $', $1, $2 in Ruby will be helpful too.
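As a quick illustration of those special variables, here is a sketch using a hypothetical input string: inside scan's block they describe the match currently being yielded.

```ruby
str = "x1y22z"  # hypothetical sample input
seen = []

# Inside scan's block: $& = whole match, $1 = first group,
# $` = text before the match, $' = text after it.
str.scan(/[a-z](\d+)/) { seen << [$&, $1, $`, $'] }
p seen
# => [["x1", "1", "", "y22z"], ["y22", "22", "x1", "z"]]
```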

the Tin Man

If you have a regexp with groups:

str = "A 54mpl3 string w1th 7 numbers scatter3r ar0und"
re = /(\d+)[m-t]/

you can use String's scan method to find matching groups:

str.scan(re)
#=> [["54"], ["1"], ["3"]]

To find the matching pattern:

str.to_enum(:scan, re).map { $& }
#=> ["54m", "1t", "3r"]

str.scan(/\d+[m-t]/) # => ["54m", "1t", "3r"] is more idiomatic than str.to_enum(:scan,re).map {$&}

Maybe you misunderstood. The regular expression in the example I replied to was /(\d+)[m-t]/, not /\d+[m-t]/. Writing re = /(\d+)[m-t]/; str.scan(re) is the same as str.scan(/(\d+)[m-t]/), and it returns [["54"], ["1"], ["3"]], not ["54m", "1t", "3r"]. The question was: if I have a regular expression with a group and want to capture all the full matches without changing the regular expression (keeping the group), how can I do it? A possible solution, albeit a little cryptic and hard to read, is str.to_enum(:scan, re).map { $& }
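A slightly more readable variant of the same idea, sketched below: Regexp.last_match gives the MatchData for the current scan step, so you can keep the group in the regexp and still get both the full match and the captured group.

```ruby
str = "A 54mpl3 string w1th 7 numbers scatter3r ar0und"
re  = /(\d+)[m-t]/

# For each scan step, take the whole match (m[0]) and the group (m[1]).
pairs = str.to_enum(:scan, re).map do
  m = Regexp.last_match
  [m[0], m[1]]
end
p pairs  # => [["54m", "54"], ["1t", "1"], ["3r", "3"]]
```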

the Tin Man

You can use string.scan(your_regex).flatten. If your regex contains groups, scan returns nested arrays, and flatten turns them into a single flat array.

string = "A 54mpl3 string w1th 7 numbers scatter3r ar0und"
your_regex = /(\d+)[m-t]/
string.scan(your_regex).flatten
=> ["54", "1", "3"]

The regex can contain named groups as well.

string = 'group_photo.jpg'
regex = /\A(?<name>.*)\.(?<ext>.*)\z/
string.scan(regex).flatten
#=> ["group_photo", "jpg"]
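Note that scan returns the named groups positionally, so the names are lost. If you need them for every match, one option is MatchData#named_captures, sketched here with a hypothetical list of two file names:

```ruby
list = "a.jpg, b.png"  # hypothetical input containing two file names
re   = /(?<name>\w+)\.(?<ext>\w+)/

# scan returns plain arrays, one per match, without the group names.
p list.scan(re)  # => [["a", "jpg"], ["b", "png"]]

# named_captures keeps the names: one hash per match.
by_name = list.to_enum(:scan, re).map { Regexp.last_match.named_captures }
p by_name
# => [{"name" => "a", "ext" => "jpg"}, {"name" => "b", "ext" => "png"}]
```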

You can also use gsub; it's one more way to get MatchData objects.

str.gsub(/\d/).map{ Regexp.last_match }
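This works because gsub called with a pattern but no replacement and no block returns an Enumerator, and iterating it sets Regexp.last_match for each match. A sketch using \d+ instead of \d (an arbitrary choice for the example); the enumerator visits the same matches scan would, and gsub (unlike gsub!) never modifies str.

```ruby
str = "A 54mpl3 string w1th 7 numb3rs scatter36 ar0und"

# No replacement and no block: gsub returns an Enumerator over the matches.
enum = str.gsub(/\d+/)
p enum.class                                           # => Enumerator
p enum.map { Regexp.last_match.offset(0) }.first(2)    # => [[2, 4], [7, 8]]
p enum.map { Regexp.last_match[0] } == str.scan(/\d+/) # => true
```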

Remove the grouping from your_regex = /(\d+)[m-t]/ and you won't need to use flatten. Your final example uses last_match which in this case is probably safe, but is a global and could possibly be overwritten if any regex was matched prior to calling last_match. Instead it's probably safer to use string.match(regex).captures # => ["group_photo", "jpg"] or string.scan(/\d+/) # => ["54", "3", "1", "7", "3", "0"] as shown in other answers, depending on the pattern and needs.

Victor

If you have capture groups () inside the regex for other purposes, the proposed solutions with String#scan and String#match are problematic:

String#scan only gets what is inside the capture groups;
String#match only gets the first match, ignoring all the others;
String#matches (the proposed method) gets all the matches.

In this case, we need a way to match the regex while ignoring its capture groups.

String#matches

With refinements you can monkey patch the String class to add String#matches, and the method is only available in the scope where the refinement is activated with `using`. It is a great way to monkey patch classes in Ruby.

Setup

/lib/refinements/string_matches.rb

# This module adds a String refinement that enables multiple String#match()es
# 1. `String#scan` only gets what is inside the capture groups (inside the parens)
# 2. `String#match` only gets the first match
# 3. `String#matches` (proposed method) gets all the matches
module StringMatches
  refine String do
    def matches(regex)
      scan(/(?<matching>#{regex})/).flatten
    end
  end
end

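Technique used: named capture groups. In Ruby, when a regexp contains a named group, plain (...) groups are no longer captured, so wrapping the pattern in (?<matching>...) makes scan return only the whole matches.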
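A minimal demonstration of that rule, with hypothetical one-letter patterns: adding a named group turns the plain groups into non-capturing ones, which is exactly what the refinement relies on.

```ruby
# With a named group present, the plain (y) group is not captured.
p "xy".match(/(?<a>x)(y)/).captures  # => ["x"]
p "xy".match(/(x)(y)/).captures      # => ["x", "y"]

# Wrapping a pattern in one named group therefore hides its inner groups.
inner = /f\((\d)\)/
p "f(1) f(2)".scan(/(?<matching>#{inner})/).flatten  # => ["f(1)", "f(2)"]
```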

Usage

rails c

> require 'refinements/string_matches'

> using StringMatches

> 'function(1, 2, 3) + function(4, 5, 6)'.matches(/function\((\d), (\d), (\d)\)/)
=> ["function(1, 2, 3)", "function(4, 5, 6)"]

> 'function(1, 2, 3) + function(4, 5, 6)'.scan(/function\((\d), (\d), (\d)\)/)
=> [["1", "2", "3"], ["4", "5", "6"]]

> 'function(1, 2, 3) + function(4, 5, 6)'.match(/function\((\d), (\d), (\d)\)/)[0]
=> "function(1, 2, 3)"
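One caveat, sketched below: if the pattern passed to matches already has its own named groups, those are still captured, so the flattened result mixes whole matches with group values. A hypothetical variant of the refinement (the module name StringMatchesEnum is made up) avoids this by collecting Regexp.last_match[0] instead of relying on the named-group rule. Note that `using` must be called at the top level of a file (or in a class/module body).

```ruby
# Hypothetical variant: collect whole matches via Regexp.last_match,
# independent of whatever groups the pattern contains.
module StringMatchesEnum
  refine String do
    def matches(regex)
      to_enum(:scan, regex).map { Regexp.last_match[0] }
    end
  end
end

using StringMatchesEnum

pattern = /(?<letter>[a-z])(\d)/
p "a1 b2".matches(pattern)                         # => ["a1", "b2"]
# The named-group wrapper also returns the inner named group's values:
p "a1 b2".scan(/(?<matching>#{pattern})/).flatten  # => ["a1", "a", "b2", "b"]
```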
