How To Use Regular Expressions In Scala


Hi folks, in this article I am going to explain regular expressions and how to form them in Scala.

What is a Regular Expression:

A regular expression is a string of characters and punctuation that represents a search pattern. Popularized by Perl and command-line utilities like grep, regular expressions are a standard feature in the libraries of most programming languages, including Scala. In Scala, they are represented by the scala.util.matching.Regex class, often referred to simply as Scala Regex.

Scala's regular expression syntax is based on the Java class java.util.regex.Pattern. If you are new to regular expressions, I suggest reading the Javadoc (Java API documentation) for java.util.regex.Pattern carefully, since Java's (and therefore Scala's) regular expressions can differ from the syntax you may have used in other languages and tools.

To convert a string into a regular expression, we call the .r method on that string.

Let’s see through the example:

import scala.util.matching.Regex

val numberPassword: Regex = "[0-9]".r

numberPassword.findFirstMatchIn("testpassword") match {
  case Some(_) => println("Password Is Valid.")
  case None => println("Password must contain a number.")
}

In the above example, numberPassword is a Regex (regular expression) which we use to make sure a password contains a number. Since "testpassword" contains no digit, the example prints "Password must contain a number."

Suppose we want to find a word in a statement. Scala provides a predefined method named findFirstIn() for this. Let's see an example:

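import scala.util.matching.Regex

object RegularExpression {

  def main(args: Array[String]): Unit = {
    // Build a Regex from a plain string with .r
    val matchingWord: Regex = "Scala".r
    val statement = "Scala is a Functional Programming language."

    // findFirstIn returns an Option holding the first match, if any
    println(matchingWord.findFirstIn(statement))
  }
}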

Output:

Some(Scala)

Here, we have called the .r method on the string to get an instance of the Regex class, which represents the pattern. The findFirstIn() method then returns the first match of the regular expression in the statement, wrapped in an Option.
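When we need every occurrence rather than just the first one, the Regex class also offers findAllIn(). The following is a minimal sketch comparing the two methods; the object name FindAllDemo and the sample sentence are made up for illustration.

```scala
import scala.util.matching.Regex

object FindAllDemo {
  // Hypothetical sample data, not taken from the article.
  val word: Regex = "Scala".r
  val statement: String = "Scala is scalable, and Scala runs on the JVM."

  // findFirstIn returns an Option holding the first match, if any.
  val first: Option[String] = word.findFirstIn(statement)

  // findAllIn returns an iterator over every non-overlapping match.
  // Matching is case-sensitive, so "scalable" is not counted.
  val all: List[String] = word.findAllIn(statement).toList

  def main(args: Array[String]): Unit = {
    println(first) // Some(Scala)
    println(all)   // List(Scala, Scala)
  }
}
```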

How to Form a Regular Expression:

The following regular expression operators are supported in Scala and using these operators we can form any type of regular expression:

1.) Basic operators for Regex:

Anchors — ^ and $

^a        matches any string that starts with a
b$        matches any string that ends with b
^a b$     exact string match (matches only the string "a b")
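The anchors above can be tried out with a small sketch like the one below. The object name AnchorsDemo, the helper found, and the sample words are assumptions made for this example.

```scala
import scala.util.matching.Regex

object AnchorsDemo {
  val startsWithA: Regex = "^a".r
  val endsWithB: Regex = "b$".r
  val exactlyAB: Regex = "^a b$".r

  // findFirstIn searches anywhere in the input, so only the
  // anchors decide where a match is allowed to begin or end.
  def found(pattern: Regex, input: String): Boolean =
    pattern.findFirstIn(input).isDefined

  def main(args: Array[String]): Unit = {
    println(found(startsWithA, "apple"))  // true
    println(found(startsWithA, "banana")) // false: contains a, but does not start with it
    println(found(endsWithB, "crab"))     // true
    println(found(exactlyAB, "a b"))      // true
    println(found(exactlyAB, "a b c"))    // false
  }
}
```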

Quantifiers — * + ? and {}

abc*    matches a string that has ab followed by zero or more c
abc+    matches a string that has ab followed by one or more c
abc?    matches a string that has ab followed by zero or one c
abc{2}  matches a string that has ab followed by 2 c
abc{2,} matches a string that has ab followed by 2 or more c
a(bc)*  matches a string that has a followed by zero or more copies of the sequence bc
a(bc){2,5} matches a string that has a followed by 2 up to 5 copies of the sequence bc
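Here is a rough sketch of the quantifiers in action. To make the counts easy to see, it checks whether the whole input matches the pattern, using the standard String.matches method from Java, rather than searching inside a longer string. The object name QuantifiersDemo and the helper whole are hypothetical.

```scala
object QuantifiersDemo {
  // True only when the entire input matches the regex.
  def whole(regex: String, input: String): Boolean = input.matches(regex)

  def main(args: Array[String]): Unit = {
    println(whole("abc*", "ab"))          // true: zero c is allowed
    println(whole("abc+", "ab"))          // false: at least one c is required
    println(whole("abc?", "abcc"))        // false: at most one c is allowed
    println(whole("abc{2}", "abcc"))      // true: exactly two c
    println(whole("abc{2,}", "abccc"))    // true: two or more c
    println(whole("a(bc)*", "abcbc"))     // true: the group bc repeats twice
    println(whole("a(bc){2,5}", "abc"))   // false: only one copy of bc
    println(whole("a(bc){2,5}", "abcbc")) // true: two copies of bc
  }
}
```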

OR operator — | or [ ]

a(b|c)   matches a string that has a followed by b or c (and captures b or c)
a[bc]    same as previous, but without capturing b or c
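The difference between the two forms shows up in the match result: both find the same text, but only the parenthesized version exposes a captured group. A small sketch, with the object name AlternationDemo and the input "xacx" chosen just for illustration:

```scala
import scala.util.matching.Regex

object AlternationDemo {
  val withGroup: Regex = "a(b|c)".r
  val withClass: Regex = "a[bc]".r

  val groupMatch: Option[Regex.Match] = withGroup.findFirstMatchIn("xacx")
  val classMatch: Option[Regex.Match] = withClass.findFirstMatchIn("xacx")

  def main(args: Array[String]): Unit = {
    println(groupMatch.map(_.matched))    // Some(ac)
    println(groupMatch.map(_.group(1)))   // Some(c): the captured alternative
    println(classMatch.map(_.matched))    // Some(ac)
    println(classMatch.map(_.groupCount)) // Some(0): nothing was captured
  }
}
```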

Character classes — \d \w \s and .

\d    matches a single character that is a digit
\w    matches a word character (alphanumeric character plus underscore)
\s    matches a whitespace character (includes tabs and line breaks)
.     matches any character (except line terminators, by default)

\d, \w and \s also have negated forms: \D, \W and \S respectively.
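A quick sketch of these character classes follows. Triple-quoted strings are used so the backslashes do not need to be escaped. The object name CharClassDemo and the sample input are assumptions for this example.

```scala
object CharClassDemo {
  // Hypothetical sample input.
  val input: String = "Room 42, floor_3"

  // \d matches one digit at a time.
  val digits: List[String] = """\d""".r.findAllIn(input).toList

  // \w+ matches runs of word characters (letters, digits, underscore).
  val words: List[String] = """\w+""".r.findAllIn(input).toList

  // \s matches whitespace characters.
  val spaceCount: Int = """\s""".r.findAllIn(input).toList.size

  // \D is the negation of \d, so removing every \D leaves only the digits.
  val onlyDigits: String = """\D""".r.replaceAllIn(input, "")

  // By default, . does not match a line break.
  val dotAcrossNewline: Option[String] = "a.c".r.findFirstIn("a\nc")

  def main(args: Array[String]): Unit = {
    println(digits)           // List(4, 2, 3)
    println(words)            // List(Room, 42, floor_3)
    println(spaceCount)       // 2
    println(onlyDigits)       // 423
    println(dotAcrossNewline) // None
  }
}
```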

2.) Intermediate operators for Regex:

Grouping and capturing — ( )

a(bc)  parentheses create a capturing group with value bc

a(?<foo>bc)  using ?<foo> we put a name to the group

This operator is very useful when we need to extract information from strings or data. When a pattern contains several groups, the captured values are exposed in order on the match result, and we access each one by its index (or by its name, for a named group).
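In Scala, captured groups can be read in a few ways. The sketch below shows three of them under some assumptions: the date pattern, the object name GroupsDemo, and the helper names are made up for illustration, and the named group is read through the underlying java.util.regex classes, since that behaves the same across Scala versions.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object GroupsDemo {
  // Hypothetical pattern with three capturing groups.
  val date: Regex = """(\d{4})-(\d{2})-(\d{2})""".r

  // A Regex works as an extractor: each group binds to one variable.
  // The whole input has to match the pattern for the case to apply.
  def describe(s: String): String = s match {
    case date(year, month, day) => s"year=$year month=$month day=$day"
    case _                      => "not a date"
  }

  // Groups can also be read by index from a Match (group 0 is the whole match).
  val monthByIndex: Option[String] =
    date.findFirstMatchIn("released 2021-03-15").map(_.group(2))

  // Named group from the article, read back by name via java.util.regex.Matcher.
  def fooGroup(input: String): Option[String] = {
    val m = Pattern.compile("a(?<foo>bc)").matcher(input)
    if (m.find()) Some(m.group("foo")) else None
  }

  def main(args: Array[String]): Unit = {
    println(describe("2021-03-15")) // year=2021 month=03 day=15
    println(describe("yesterday"))  // not a date
    println(monthByIndex)           // Some(03)
    println(fooGroup("xabcx"))      // Some(bc)
  }
}
```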

Greedy and Lazy match:

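Quantifiers (* + {}) are greedy operators, so they extend the match as far as they can through the given text. Adding ? after a quantifier (*? +? {}?) makes it lazy, so it matches as few characters as possible.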
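To see the difference between a greedy and a lazy match, we can run the same pattern with and without a ? after the quantifier, which switches it to lazy mode. This is only a sketch; the object name GreedyLazyDemo and the sample markup are assumptions.

```scala
object GreedyLazyDemo {
  // Hypothetical sample input.
  val html: String = "<b>bold</b> and <i>italic</i>"

  // Greedy: .+ grabs as much as possible, up to the last > in the input.
  val greedy: Option[String] = "<.+>".r.findFirstIn(html)

  // Lazy: .+? stops at the first > that lets the pattern succeed.
  val lazyMatch: Option[String] = "<.+?>".r.findFirstIn(html)

  def main(args: Array[String]): Unit = {
    println(greedy)    // Some(<b>bold</b> and <i>italic</i>)
    println(lazyMatch) // Some(<b>)
  }
}
```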

3.) Advanced operators for Regex:

Boundaries — \b and \B

\babc\b   Performs a "whole words only" search pattern

\b is an anchor, like ^ and $: it matches a position where one side is a word character (like \w) and the other side is not a word character (for example, the beginning of the string or a space character). \B is its negation and matches every position where \b does not.
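The following sketch prints the start index of each match, so we can see which occurrences count as whole words. The object name BoundaryDemo and the sample text are assumptions for this example.

```scala
object BoundaryDemo {
  // Hypothetical sample text: abc appears alone, inside abcd, inside xabc, and before a dot.
  val text: String = "abc abcd xabc abc."

  // \babc\b only matches abc when it stands as a whole word.
  val wholeWordStarts: List[Int] =
    """\babc\b""".r.findAllMatchIn(text).map(_.start).toList

  // \Babc\B only matches abc when it is surrounded by word characters.
  val insideWordStarts: List[Int] =
    """\Babc\B""".r.findAllMatchIn("xabcx abc").map(_.start).toList

  def main(args: Array[String]): Unit = {
    println(wholeWordStarts)  // List(0, 14)
    println(insideWordStarts) // List(1)
  }
}
```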

Look-ahead and Look-behind — (?=) and (?<=)

d(?=r)  matches a d only if it is followed by r, but r will not be part of the overall regex match

(?<=r)d  matches a d only if it is preceded by an r, but r will not be part of the overall regex match
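A short sketch of both lookarounds is shown below. It records the position and the matched text, which confirms that the r is checked but never included in the match. The object name LookaroundDemo and the sample words are assumptions.

```scala
object LookaroundDemo {
  // Hypothetical sample text: indexes 0-2 dry, 4-6 dad, 8-11 card.
  val text: String = "dry dad card"

  // Look-ahead: a d that is immediately followed by r.
  val dBeforeR: List[(Int, String)] =
    "d(?=r)".r.findAllMatchIn(text).map(m => (m.start, m.matched)).toList

  // Look-behind: a d that is immediately preceded by r.
  val dAfterR: List[(Int, String)] =
    "(?<=r)d".r.findAllMatchIn(text).map(m => (m.start, m.matched)).toList

  def main(args: Array[String]): Unit = {
    println(dBeforeR) // List((0,d)): the d of dry
    println(dAfterR)  // List((11,d)): the d of card
  }
}
```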

Summary:

As you have seen, regex has many application areas: we can use it for data validation, data extraction, string parsing, and data scraping. Have fun, and don't forget to recommend the article if you liked it.

References:

https://docs.scala-lang.org/tour/regular-expression-patterns.html

Written by 

Aditya Narayan is a Software Consultant at Knoldus Inc. in Noida. He recently completed his B.Tech in Computer Science and Engineering from Abdul Kalam Technical University. He is familiar with C, HTML, CSS, PHP, JavaScript, and SQL. His hobbies include watching movies, reading books, and traveling in his spare time.