Back2Basics: Scala Extractors in Detail


When working with case classes, we have a very concise way to decompose an entity using pattern matching. A familiar example is pattern matching on Some(x) and None, which, as most of us know, are a case class and a case object respectively.

The natural question is whether we can use similar patterns without an associated case class.

Let us explore the answer. Scala extractors allow us to define patterns that are decoupled from an object’s representation, including patterns for pre-existing types.

What is an Extractor? What does it do? How does it work?

Please refer to the following blog. This will help you to grasp the basics that are prerequisite before we understand and explore extractors in more depth.

How does Pattern Matching Work?

Pattern matching allows us to decompose or deconstruct a given data structure, extracting the values it was constructed from into variables that become available for further processing.

Scala lets you decompose various kinds of data structures using pattern matching. Frequently seen examples are lists, streams, and instances of case classes.

Let us understand using an example:

class Player

case class FootBallPlayer(fName: String, lName: String, score: Int) extends Player

def selectedPlayers(listOfPlayers: List[FootBallPlayer], selectionScore: Int): List[Option[Player]] =
  listOfPlayers.map {
    case player@FootBallPlayer(_, _, score) if score > selectionScore => Some(player)
    case _ => None
  }

The above snippet works for one simple reason: extractors exist.

In its most primitive form, an extractor has the opposite role of a constructor. While the constructor creates an object from a given list of parameters, an extractor extracts the parameters from which an object passed to it was created.

The Scala library contains some predefined extractors. Case classes also get a companion object generated for them automatically: a singleton object that contains an apply method for creating new instances of the case class and an unapply method, which is the method an object must implement in order to be an extractor.
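As a small illustration of the generated companion, the sketch below builds a FootBallPlayer through apply and takes it apart again with a pattern, which is where the generated extractor comes into play. The wrapper object CompanionDemo and the sample values are made up for this example.

```scala
object CompanionDemo {
  case class FootBallPlayer(fName: String, lName: String, score: Int)

  // FootBallPlayer(...) is shorthand for FootBallPlayer.apply(...)
  val viaShorthand: FootBallPlayer = FootBallPlayer("Sam", "Lee", 7)
  val viaApply: FootBallPlayer = FootBallPlayer.apply("Sam", "Lee", 7)

  // The pattern on the left-hand side relies on the extractor generated
  // for the case class to recover the three constructor arguments.
  val FootBallPlayer(first, last, points) = viaApply
}
```

Here first, last and points end up holding the same values that were passed to apply, which is the "opposite of a constructor" role described above.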

Let us work through another example, keeping in mind that there are several valid signatures for the unapply method.

trait Player {
  def name: String
}

class FootBallPlayer(val name: String) extends Player

class RugbyPlayer(val name: String) extends Player

The next step is to implement extractors for the FootBallPlayer and RugbyPlayer classes in their respective companion objects, just as the Scala compiler would have done had they been case classes. If the extractor is expected to extract only a single parameter from a given object, the signature of the unapply method looks like this:

object FootBallPlayer {
  def unapply(arg: FootBallPlayer): Option[String] = Some(arg.name)
}

object RugbyPlayer {
  def unapply(arg: RugbyPlayer): Option[String] = Some(arg.name)
}

The two unapply methods expect an object of type FootBallPlayer and RugbyPlayer respectively, and each returns an Option[String], String being the type of the parameter it extracts. In general, unapply returns Some[T] if it successfully extracts the parameter from the given object, or None if the parameter could not be extracted according to the rules of the extractor implementation.

Let us see how we can invoke these:

val greatestPlayer: FootBallPlayer = new FootBallPlayer("Jindan")

FootBallPlayer.unapply(greatestPlayer)

In practice, we don’t call this method directly. The call to the extractor’s unapply method is made when the extractor is used in an extractor pattern. If the result of calling unapply is Some[T], the pattern matches and the extracted value is bound to the variable declared in the pattern. If the result is None, the pattern doesn’t match.
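To see the implicit call in action, here is a minimal sketch that uses the two hand-written extractors inside a match expression. The wrapper object SingleValueDemo and the helper describe are hypothetical names introduced only for this illustration.

```scala
object SingleValueDemo {
  trait Player {
    def name: String
  }

  class FootBallPlayer(val name: String) extends Player
  class RugbyPlayer(val name: String) extends Player

  object FootBallPlayer {
    def unapply(arg: FootBallPlayer): Option[String] = Some(arg.name)
  }

  object RugbyPlayer {
    def unapply(arg: RugbyPlayer): Option[String] = Some(arg.name)
  }

  // Hypothetical helper: each case first checks the runtime type of player
  // and only then calls the matching unapply method behind the scenes.
  def describe(player: Player): String = player match {
    case FootBallPlayer(name) => s"Footballer $name"
    case RugbyPlayer(name)    => s"Rugby player $name"
    case _                    => "Unknown player"
  }
}
```

Calling describe with a FootBallPlayer takes the first case and with a RugbyPlayer the second; neither call site mentions unapply explicitly.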

Let us extend the first example further:

abstract class Player
case class FootBallPlayer(fName: String, lName: String, score: Int) extends Player
case class RugbyPlayer(name: String, goals: Int) extends Player

def selectedPlayers(listOfPlayers: List[Player], selectionScore: Int): List[Option[Player]] = listOfPlayers.map {
  case fPlayer@FootBallPlayer(_, _, score) if score > selectionScore => Some(fPlayer)
  case rPlayer@RugbyPlayer(_, goals) if goals > selectionScore => Some(rPlayer)
  // Without a default case, a player below the selection score would cause a MatchError
  case _ => None
}

selectedPlayers(List(RugbyPlayer("Jean", 11)), 3)

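Note that the two hand-written extractors defined earlier (the FootBallPlayer and RugbyPlayer companion objects) never return None. This makes more sense than it might seem at first: the parameter type of unapply already restricts which objects the pattern applies to. For an object that could be of some other type, you can therefore check its type and deconstruct it at the same time.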

In the above snippet, the FootBallPlayer pattern does not match because it expects an object of a different type than the one we passed in. The value is therefore handed to the unapply method of the RugbyPlayer companion object, as that extractor is used in the second pattern. This pattern matches, and the matched value is bound to the variable rPlayer.

Additionally, observe that the extractors above allowed us to extract several values. Any number of class fields can be extracted using pattern matching and the unapply method. In general, if an extractor pattern is to decompose a given data structure into more than one parameter, the signature of the extractor’s unapply method needs to look like this:

def unapply(obj: S): Option[(T1, …, Tn)]
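A concrete instance of this signature with two extracted values might look like the following sketch. It uses a plain class rather than a case class so that the extractor has to be written by hand; MultiValueDemo and summary are hypothetical names chosen for the example.

```scala
object MultiValueDemo {
  // A plain (non-case) class, so no extractor is generated for it
  class RugbyPlayer(val name: String, val goals: Int)

  object RugbyPlayer {
    // Two values are extracted, so the result is an Option of a pair
    def unapply(player: RugbyPlayer): Option[(String, Int)] =
      Some((player.name, player.goals))
  }

  // Hypothetical helper: accepts any value and deconstructs it
  // only if it turns out to be a RugbyPlayer.
  def summary(value: Any): String = value match {
    case RugbyPlayer(name, goals) => s"$name scored $goals"
    case _                        => "not a rugby player"
  }
}
```

The tuple returned by unapply is spread over the pattern's parameters in order, so name receives the first element and goals the second.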

Sometimes we don’t need to extract any parameters from the data structure we are matching against and just want to perform a simple Boolean check. This is where a Boolean extractor comes in handy.

Its unapply method returns a plain Boolean instead of an Option, so the signature becomes def unapply(obj: S): Boolean. A comparable yes/no check can also be written with guards on the case class extractors from the previous example:

def selectedPlayers(listOfPlayers: List[Player], selectionScore: Int) = listOfPlayers.map {
  case fPlayer@FootBallPlayer(_, _, score) if score > selectionScore => true
  case rPlayer@RugbyPlayer(_, goals) if goals > selectionScore => true
  case _ => false
}

You might be wondering what the @ is used for here.

Scala’s pattern matching allows us to bind the matched value to a variable, typed according to what the extractor in the pattern expects. This is done using the @ operator. Since the FootBallPlayer extractor expects an instance of FootBallPlayer, the matched value is bound to a variable fPlayer of type FootBallPlayer.
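For completeness, here is a minimal sketch of an extractor whose unapply really returns a Boolean, combined with an @ binding. The TopScorer object, its threshold of 10 and the label helper are assumptions invented for this illustration; they are not part of the examples above.

```scala
object BooleanExtractorDemo {
  abstract class Player
  case class FootBallPlayer(fName: String, lName: String, score: Int) extends Player
  case class RugbyPlayer(name: String, goals: Int) extends Player

  // Hypothetical Boolean extractor: it extracts nothing and only answers yes or no.
  // The threshold of 10 is an arbitrary value chosen for this sketch.
  object TopScorer {
    def unapply(player: FootBallPlayer): Boolean = player.score > 10
  }

  def label(player: Player): String = player match {
    // A Boolean extractor is written with an empty parameter list.
    // The @ binds the matched value, typed as FootBallPlayer, to fPlayer.
    case fPlayer @ TopScorer() => s"${fPlayer.fName} is a top scorer"
    case _                     => "regular player"
  }
}
```

Because TopScorer.unapply takes a FootBallPlayer, a RugbyPlayer never reaches the score check and falls through to the default case.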

Another very commonly used variant is the infix operation pattern. In Scala, we can deconstruct lists and streams in a way that mirrors one of the ways they are created, using the cons operators :: and #::. For example:

val list = 11 :: 2 :: 3 :: Nil

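list match {
  // The single-element pattern must come first; otherwise head :: tail would shadow it
  case head :: Nil => List(head - 1)
  case head :: tail => tail.map(_ * head)
  // Handle the empty list so the match is exhaustive
  case Nil => Nil
}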

val stream = 58 #:: 43 #:: 93 #:: Stream.empty

stream match {
  case first #:: second #:: _ => (first, second)
  case _ => (-1, -1)
}

If you are wondering how this works, read on.

As an alternative to the extractor pattern notation seen above, Scala also allows extractors to be used in an infix notation. So, instead of writing extractor(p_1, p_2), where p_1 and p_2 are the parameters to be extracted from a given data structure, it’s always possible to write p_1 extractor p_2. So, the infix operation pattern head #:: tail could also be written as #::(head, tail).
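The equivalence of the two notations can be checked with a short sketch on the list from the earlier example. The wrapper object InfixDemo and the result names are made up for the illustration.

```scala
object InfixDemo {
  val list = 11 :: 2 :: 3 :: Nil

  // Infix operation pattern
  val infixResult: Int = list match {
    case head :: _ => head
    case Nil       => -1
  }

  // The same pattern in ordinary extractor notation
  val prefixResult: Int = list match {
    case ::(head, _) => head
    case Nil         => -1
  }
}
```

Both matches select the first element of the list; only the surface syntax of the pattern differs.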

The important question, however, is when we should prefer one notation over the other. Infix operation patterns are only recommended for extractors that are meant to read like operators. That is true for the cons operators of List and Stream, but certainly not for our FootBallPlayer extractor.

Another question that needs answering is: “When should we write custom extractors, given that case classes already provide useful extractors for free?”

Programmers often argue that using case classes and pattern matching against them breaks encapsulation. They object to the way matching couples code to the concrete representation of the data, which does not sit well with the object-oriented point of view.

When doing functional programming in Scala, it’s good practice to use case classes as algebraic data types (ADTs) that contain pure data and no behaviour.

Usually, implementing our own extractors is only necessary if we want to extract something from a type we have no control over, or if we need additional ways of pattern matching against certain data. A common use of extractors is to extract meaningful values from a string.
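As a sketch of the string use case, the extractor below splits a value of the form user@domain into its two parts. String is a type we cannot modify, so a standalone extractor object is the natural fit. The Email object, its deliberately simplistic validation and the domainOf helper are assumptions made for this example, not a complete email parser.

```scala
object EmailDemo {
  // Hypothetical extractor for strings of the form user@domain.
  // It only checks for exactly one @ with non-empty text on both sides.
  object Email {
    def unapply(address: String): Option[(String, String)] =
      address.split("@") match {
        case Array(user, domain) if user.nonEmpty && domain.nonEmpty =>
          Some((user, domain))
        case _ => None
      }
  }

  def domainOf(text: String): String = text match {
    case Email(_, domain) => domain
    case _                => "not an email address"
  }
}
```

Unlike the player extractors shown earlier, this unapply can return None, so the pattern genuinely fails for strings that do not have the expected shape.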