I would like to be able to find a match between the first letter of a word, and one of the letters in a group such as "ABC". In pseudocode, this might look something like:
case Process(word) =>
word.firstLetter match {
case([a-c][A-C]) =>
case _ =>
}
}
But how do I grab the first letter in Scala instead of Java? How do I express the regular expression properly? Is it possible to do this within a case class?
[a-cA-C]
for that regular expression.
Traversable
(like List
and Array
), if you want the first 3 chars, try "my string".take(3)
, for the first "foo".head
You can do this because regular expressions define extractors but you need to define the regex pattern first. I don't have access to a Scala REPL to test this but something like this should work.
val Pattern = "([a-cA-C])".r
word.firstLetter match {
case Pattern(c) => c bound to capture group here
case _ =>
}
Since version 2.10, one can use Scala's string interpolation feature:
implicit class RegexOps(sc: StringContext) {
def r = new util.matching.Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}
scala> "123" match { case r"\d+" => true case _ => false }
res34: Boolean = true
Even better one can bind regular expression groups:
scala> "123" match { case r"(\d+)$d" => d.toInt case _ => 0 }
res36: Int = 123
scala> "10+15" match { case r"(\d\d)${first}\+(\d\d)${second}" => first.toInt+second.toInt case _ => 0 }
res38: Int = 25
It is also possible to set more detailed binding mechanisms:
scala> object Doubler { def unapply(s: String) = Some(s.toInt*2) }
defined module Doubler
scala> "10" match { case r"(\d\d)${Doubler(d)}" => d case _ => 0 }
res40: Int = 20
scala> object isPositive { def unapply(s: String) = s.toInt >= 0 }
defined module isPositive
scala> "10" match { case r"(\d\d)${d @ isPositive()}" => d.toInt case _ => 0 }
res56: Int = 10
An impressive example on what's possible with Dynamic
is shown in the blog post Introduction to Type Dynamic:
object T {
class RegexpExtractor(params: List[String]) {
def unapplySeq(str: String) =
params.headOption flatMap (_.r unapplySeq str)
}
class StartsWithExtractor(params: List[String]) {
def unapply(str: String) =
params.headOption filter (str startsWith _) map (_ => str)
}
class MapExtractor(keys: List[String]) {
def unapplySeq[T](map: Map[String, T]) =
Some(keys.map(map get _))
}
import scala.language.dynamics
class ExtractorParams(params: List[String]) extends Dynamic {
val Map = new MapExtractor(params)
val StartsWith = new StartsWithExtractor(params)
val Regexp = new RegexpExtractor(params)
def selectDynamic(name: String) =
new ExtractorParams(params :+ name)
}
object p extends ExtractorParams(Nil)
Map("firstName" -> "John", "lastName" -> "Doe") match {
case p.firstName.lastName.Map(
Some(p.Jo.StartsWith(fn)),
Some(p.`.*(\\w)$`.Regexp(lastChar))) =>
println(s"Match! $fn ...$lastChar")
case _ => println("nope")
}
}
$
sign as a line end pattern: the compiler complains about lack of string termination.
case p.firstName.lastName.Map(...
pattern—how on earth do I read that?
Regex
everytime the match is checked. And that is quite a costly operation that involves compilation of the regex pattern.
As delnan pointed out, the match
keyword in Scala has nothing to do with regexes. To find out whether a string matches a regex, you can use the String.matches
method. To find out whether a string starts with an a, b or c in lower or upper case, the regex would look like this:
word.matches("[a-cA-C].*")
You can read this regex as "one of the characters a, b, c, A, B or C followed by anything" (.
means "any character" and *
means "zero or more times", so ".*" is any string).
To expand a little on Andrew's answer: The fact that regular expressions define extractors can be used to decompose the substrings matched by the regex very nicely using Scala's pattern matching, e.g.:
val Process = """([a-cA-C])([^\s]+)""".r // define first, rest is non-space
for (p <- Process findAllIn "aha bah Cah dah") p match {
case Process("b", _) => println("first: 'a', some rest")
case Process(_, rest) => println("some first, rest: " + rest)
// etc.
}
[]
), the caret indicates negation, so [^\s]
means 'non-whitespace'.
String.matches is the way to do pattern matching in the regex sense.
But as a handy aside, word.firstLetter in real Scala code looks like:
word(0)
Scala treats Strings as a sequence of Char's, so if for some reason you wanted to explicitly get the first character of the String and match it, you could use something like this:
"Cat"(0).toString.matches("[a-cA-C]")
res10: Boolean = true
I'm not proposing this as the general way to do regex pattern matching, but it's in line with your proposed approach to first find the first character of a String and then match it against a regex.
EDIT: To be clear, the way I would do this is, as others have said:
"Cat".matches("^[a-cA-C].*")
res14: Boolean = true
Just wanted to show an example as close as possible to your initial pseudocode. Cheers!
"Cat"(0).toString
could be more clearly written as "Cat" take 1
, imho.
val r = "[A-Ca-c]".r ; "cat"(0) match { case r() => }
.
First we should know that regular expression can separately be used. Here is an example:
import scala.util.matching.Regex
val pattern = "Scala".r // <=> val pattern = new Regex("Scala")
val str = "Scala is very cool"
val result = pattern findFirstIn str
result match {
case Some(v) => println(v)
case _ =>
} // output: Scala
Second we should notice that combining regular expression with pattern matching would be very powerful. Here is a simple example.
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-11-20" match {
case date(year, month, day) => "hello"
} // output: hello
In fact, regular expression itself is already very powerful; the only thing we need to do is to make it more powerful by Scala. Here are more examples in Scala Document: http://www.scala-lang.org/files/archive/api/current/index.html#scala.util.matching.Regex
Note that the approach from @AndrewMyers's answer matches the entire string to the regular expression, with the effect of anchoring the regular expression at both ends of the string using ^
and $
. Example:
scala> val MY_RE = "(foo|bar).*".r
MY_RE: scala.util.matching.Regex = (foo|bar).*
scala> val result = "foo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = foo
scala> val result = "baz123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match
scala> val result = "abcfoo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match
And with no .*
at the end:
scala> val MY_RE2 = "(foo|bar)".r
MY_RE2: scala.util.matching.Regex = (foo|bar)
scala> val result = "foo123" match { case MY_RE2(m) => m; case _ => "No match" }
result: String = No match
val MY_RE2 = "(foo|bar)".r.unanchored ; "foo123" match { case MY_RE2(_*) => }
. More idiomatically, val re
without all caps.
Success story sharing
val Pattern = "[a-cA-C]".r
will not work. This is because match-case usesunapplySeq(target: Any): Option[List[String]]
, which returns the matching groups.val r = "[A-Ca-c]".r ; 'a' match { case r() => }
. scala-lang.org/api/current/#scala.util.matching.Regexval r = "([A-Ca-c])".r ; "C" match { case r(_*) => }
.