Extracting values using Regular Expressions

A regular expression is a sequence of characters that defines a search pattern. Regular expressions are used to find particular sequences in a string or to extract values from it.

Their applications range from validation to parsing and replacing strings, and also include translating data into other formats and web scraping.

To define a search pattern in Scala, import scala.util.matching.Regex (needed to refer to the Regex type) and call the .r method on a string to convert it into a regular expression.
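Putting the import and .r together, a minimal sketch might look like the following. The letters-only pattern mirrors the one discussed in this post, while the object name FirstMatch and the sample input are assumptions for illustration. It also calls findFirstMatchIn, which returns an Option because the input may contain no match at all.

```scala
import scala.util.matching.Regex

object FirstMatch {
  // .r converts the string into a Regex; this one matches one or more ASCII letters
  val pattern: Regex = "[a-zA-Z]+".r

  // Hypothetical sample input
  val input: String = "123 Scala 2.12 regex"

  // findFirstMatchIn returns Option[Regex.Match]; .matched is the matched text
  val firstWord: Option[String] = pattern.findFirstMatchIn(input).map(_.matched)

  // With no letters in the input, the result is None
  val noWord: Option[String] = pattern.findFirstMatchIn("123 456").map(_.matched)

  def main(args: Array[String]): Unit = {
    firstWord match {
      case Some(word) => println(s"First word: $word") // prints "First word: Scala"
      case None       => println("No word found")
    }
    println(noWord) // prints "None"
  }
}
```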

 

Suppose pattern is a Regex that matches words containing only letters. The findFirstMatchIn() method, defined in scala.util.matching.Regex, finds the first occurrence of the pattern in the input string. Its return type is an Option (Option[Match]), because the input may not contain any match. Similarly, findAllMatchIn() returns an iterator over all the matches found in the input string; if foundPatterns holds that iterator, converting it to an array makes it easy to see every match.
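A similar sketch for findAllMatchIn, under the same assumptions (the object name AllMatches and the input string are hypothetical). Because the returned iterator can be traversed only once, converting it to an Array is a convenient way to keep and inspect every match.

```scala
import scala.util.matching.Regex

object AllMatches {
  // Same letters-only pattern as before
  val pattern: Regex = "[a-zA-Z]+".r

  // Hypothetical sample input
  val input: String = "Scala 2.12 supports regex, 2.13 too"

  // findAllMatchIn returns an Iterator[Regex.Match], which can be traversed only once
  val foundPatterns: Iterator[Regex.Match] = pattern.findAllMatchIn(input)

  // Converting to an Array stores every match; foundPatterns is exhausted afterwards
  val words: Array[String] = foundPatterns.map(_.matched).toArray

  def main(args: Array[String]): Unit = {
    println(words.mkString(", ")) // prints "Scala, supports, regex, too"
  }
}
```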

 

 

Regular expressions are also widely used to extract values from a string. For example, to count the number of words in a file, we would usually first remove all the extra spaces and then split the text on commas and dots. A single regular expression that matches any run of spaces, commas and dots makes this easier, because the text can then be split in one step.
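A minimal sketch of this idea, assuming that words are separated by any mix of whitespace, commas and dots. The text is a hypothetical string inlined instead of being read from a file, so the example stays self-contained. Regex.split delegates to Java's Pattern.split, which drops trailing empty strings, so the final dot does not produce an empty word.

```scala
import scala.util.matching.Regex

object WordCount {
  // Any run of whitespace, commas and dots counts as one separator
  val separators: Regex = """[\s,.]+""".r

  // Hypothetical file contents, inlined instead of read from a file
  val text: String = "Scala is concise.  Scala is typed,   and regex   helps."

  // trim removes leading whitespace, which would otherwise yield an empty first element;
  // Regex.split (like java.util.regex.Pattern.split) drops trailing empty strings
  val words: Array[String] = separators.split(text.trim)

  def main(args: Array[String]): Unit = {
    println(s"${words.length} words") // prints "9 words"
    println(words.mkString(" | "))
  }
}
```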

 

Regular expressions have many other use cases. A regex also works as an extractor: the parts of the input that need to be extracted must be wrapped in parentheses (capturing groups), and each group is bound to a variable when the regex is used in a pattern match. For example, a regular expression can extract the id and name of a student from the input string “Student(1, Aashrita)”.
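One way to write this, sketched under the assumption that the id is numeric and the name is a single word; the object name StudentExtractor and the parse helper are hypothetical. The two parenthesised groups become the id and name variables in the match expression. Used this way, the regex must match the entire input string, not just part of it.

```scala
import scala.util.matching.Regex

object StudentExtractor {
  // Two capturing groups: (\d+) for the id and (\w+) for the name
  val student: Regex = """Student\((\d+),\s*(\w+)\)""".r

  // Hypothetical helper: returns the id and name, or None if the input does not fit
  def parse(input: String): Option[(Int, String)] = input match {
    // Used as an extractor, the regex must match the whole input;
    // the groups are bound to id and name in order
    case student(id, name) => Some((id.toInt, name))
    case _                 => None
  }

  def main(args: Array[String]): Unit = {
    println(parse("Student(1, Aashrita)")) // prints "Some((1,Aashrita))"
    println(parse("Student(x, Aashrita)")) // prints "None": x is not a digit
  }
}
```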

 

I hope this helps you get a better understanding of how regular expressions work. For more information, have a look at the references below, and refer to a regex cheat sheet when writing a regular expression for a particular use case.

References

  • https://docs.scala-lang.org/tour/regular-expression-patterns.html
  • https://www.scala-lang.org/api/2.12.5/scala/util/matching/Regex.html
