Regex: A Library in Rust

Regex is one of the most powerful, widely applicable, and sometimes intimidating techniques in software engineering. From validating email addresses to performing complex code refactors, regular expressions have a wide range of uses and are an essential entry in any software engineer’s toolbox.

In this article, we will look at some of the major aspects of regular expressions:

  • What a regular expression is
  • Why we need regular expressions
  • How to avoid compiling the same regular expression again and again
  • Some code snippets that use the regex crate (a library)

What is a Regular Expression?

A regular expression is a way to describe complex search patterns using a sequence of characters. In Rust, the regex crate lets you compile such an expression once and then use it to search, split, or replace text.
For example:

use regex::Regex;
fn main() {
    // Matches a date in YYYY-MM-DD form, e.g. 2014-01-01
    let result = Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();
    assert!(result.is_match("2014-01-01"));
}

Let’s walk through this example piece by piece:

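  • ^ – Matches the start of the text (or the start of each line when multi-line mode, (?m), is enabled).
  • $ – Matches the end of the text (or the end of each line in multi-line mode).
  • r"" – Marks a raw string. A raw string does not process any escape sequences, so "\\d" is the same expression as r"\d".
  • \d – Matches a digit. In the regex crate, \d is Unicode-aware by default, so it matches any Unicode decimal digit, not only 0 to 9; use [0-9] if you need ASCII digits only.
  • \d{n} – Matches exactly n digits in a row, so \d{4} matches a four-digit year.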

Here, regex is the name of the crate (library) and Regex is a struct defined in it. Regex::new() returns a Result, which is an error only when the pattern cannot be compiled, for example because its syntax is invalid. Since the pattern is a fixed string that we wrote ourselves rather than user input, it will not fail once we know it is valid, so calling unwrap() is reasonable here.

result.is_match("2014-01-01") returns true if the string matches the pattern and false otherwise.

Here is one more example:

use regex::Regex;

fn main() {
    // \w{4} matches four consecutive word characters anywhere in the text
    let result = Regex::new(r"\w{4}").unwrap();
    let name = "Test";
    println!("Found match :: {}", result.is_match(name));
}

Why Regex?

Regular expressions are supported by almost every programming language. Knowing regex is useful for validating user input, interacting with the Unix shell, searching and refactoring code in your favorite text editor, performing database text searches, and much more.

Avoid Compiling the Same Regex Repeatedly

Avoid compiling the same regular expression inside a loop, because compilation is relatively expensive: depending on the size of the regex, it can take anywhere from a few microseconds to a few milliseconds.

To make sure a regex is compiled only once, even when the code that uses it runs many times, we can use the lazy_static crate.

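use lazy_static::lazy_static;
use regex::Regex;
lazy_static! {
    // Compiled the first time RESULT is used, then reused afterwards
    static ref RESULT: Regex = Regex::new("...").unwrap();
}
// Hypothetical wrapper showing how the shared regex is used
fn matches_pattern(text: &str) -> bool { RESULT.is_match(text) }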

Specifically, in this example, the regex will be compiled when it is used for the first time. On subsequent uses, it will reuse the previous compilation.

Let’s end this post with one last example:

use regex::Regex;
use std::fs::File;
use std::io::Read;

fn main() {
    // Read the whole file into a String
    let mut file = File::open("regex.txt").expect("file not found");
    let mut file_content = String::new();
    file.read_to_string(&mut file_content).expect("something went wrong with reading this file");

    // (?m) makes ^ and $ match at the start and end of every line,
    // so each line that holds a single email address is matched
    let email_pattern = Regex::new(r"(?m)^([\w\.\-]+)@([\w\-]+)((\.(\w){2,3})+)$").expect("Unable to compile given regex");

    // find_iter yields every non-overlapping match in the text
    for matched_pattern in email_pattern.find_iter(&file_content) {
        println!("{:?}", matched_pattern.as_str());
    }
}

In the above example, we extract the email addresses from a file. First we open the file and read its contents into a String, then we compile the email regex and use find_iter() to go over every match in that text, printing each address.

I hope this post helps you get acquainted with regular expressions in Rust.
Thanks for reading!

Written by 

Pawan Singh Bisht is a Software Consultant at Knoldus Software LLP with more than two years of experience in the technology field. He is well versed in the core implementation of Rust and Java, and he loves giving back to the community he has learned from.

Discover more from Knoldus Blogs
