Are you using Scala Collection efficiently?

Table of contents

Reading Time: 5 minutes

In this blog, We will be going to explore how we can use scala collections efficiently .

Though, we are taking care of immutability but still something more can be done to make your code more readable and faster.

List vs Vector:

Vector is a collection with good random access. List is indeed a linked list with very fast append-first operation (::), but the links are unidirectional.

Scala Vector has a very effective iteration implementation and, comparing to List is faster in traversing linearly. if you are planning to do quite some appending, Vector is a better choice and List don’t work well with parallel algorithms.

Most efficient way to create ListBuffer:

Order by efficient:

val list = new mutable.ListBuffer[String]
val list = mutable.ListBuffer.empty[String]
val list = mutable.ListBuffer[String]()

First approach creates only one object ListBuffer itself but other two approaches create an instanceof scala.collection.mutable.AddingBuilder first, which is then asked for a new instance of ListBuffer. Therefore, first approach is very efficient.

Prefer TrieMap over any other mutable Map:

// before
val mutableMap = collection.mutable.Map[T,T]()

// after
val immutableMap = collection.concurrent.TrieMap[T,T]()

If there is need to use mutable Map in multi-threading environment, use TrieMap. A Scala TrieMap is a trie-based concurrent scalable and immutable map implementation. Unlike normal trie maps, a Scala TrieMap has an efficient, non-blocking, O(1) time snapshot operation (and a slightly optimized readOnlySnapshot) operation.

Creating empty collections explicitly:

List[T]() //before

List.empty[T] //after

There are some immutable collection classes provide singleton “empty” implementations, however not all of the factory methods check length of the created collections. Thus, by making collection emptiness apparent at compile time, we could save either heap space (by reusing empty collection instances) or CPU cycles (otherwise wasted on runtime length checks).

Never negate emptiness-related properties:

//before
!list.isEmpty
!list.nonEmpty

//after
list.nonEmpty
list.isEmpty

Simple built-in methods add less visual clutter than a compound expression. It is also applicable to Set, Option,Map, Iterator

Never compute length for emptiness check:

// before
list.length > 0
list.length != 0
list.length == 0

// after
list.nonEmpty
list.nonEmpty
list.isEmpty

Simple property is more easy to perceive than compound expression. And collections are decedents of LinearSeq which will take O(n) time to compute length instead of O(1) time for IndexedSeq. Therefore, we can speed up our code by avoiding unnecessary computations when exact value is not required. Besides that, we can never use length method on infinite streams but we can verify stream emptiness directly.

Never compute full length of collection for length matching:

// before
list.length > n
list.length < n list.length == n
list.length != n

// after
list.lengthCompare(n) > 0
list.lengthCompare(n) < 0
list.lengthCompare(n) == 0
list.lengthCompare(n) != 0

Because computing length might be very “costly” for some collection classes. We can reduce comparisons time from O(length) to O(length min n) for decedents of LinearSeq. This approach won’t work with infinite streams.

Prefer length to size for arrays:

// before
array.size

// after
array.length

While size and length are synonyms, in scala 2.11 Array.size calls are still implemented via implicit conversion, so that intermediate wrapper objects are created for every method call. Unless you enable escape analysis in JVM , those temporary objects will burden GC and can potentially degrade code performance (especially, within loops).

Don’t retrieve first element by index:

// before
list(0)

// after
list.head

The second approach might be slightly faster for some collection classes. Additionally, property access is much simpler(both syntactically and semantically) than calling a method with an argument.

Don’t retrieve last element by index:

// before
list(list.length-1)

// after
list.last

The second approach is more obvious and allows to avoid unnecessary length computation. Besides, some collection classes can retrieve last element more efficiently comparing to by-index access.

Don’t check index bound explicitly:

// before
if (i < list.length) Some(list(i)) else None

// after
list.lift(i)

The second expression is more concise and semantically equivalent.

Don’t imitate headOption:

// before
if (list.nonEmpty) Some(list.head) else None
list.lift(0)

// after
list.headOption

The optimized expression is more concise.

Don’t imitate lastOption :

// before
if (list.nonEmpty) Some(list.last) else None
list.lift(list.length – 1)

// after
list.lastOption

The optimized expression is more concise and faster.

Be careful with indexOf and lastIndexOf argument types:

// before
List(1, 2, 3,4,5).indexOf(“1”) // compilable
List(1, 2, 3,4,5).lastIndexOf(“2”) // compilable

// after
List(1, 2, 3,4,5).indexOf(1)
List(1, 2, 3,4,5).lastIndexOf(2)

indexOf and lastIndexOf accepts arguments of Any type. In practice, that might lead to hard-to-find bugs, which are not discoverable at compile time. That’s where auxiliary IDE inspections come in handy.

Don’t use equality predicate to check element presence:

// before
list.exists(_ == x)

// after
list.contains(x)

The second expression is more concise and semantically equivalent.

Be careful with contains argument type:

// before
List(1, 2, 3,4,5).contains(“1”) // compilable

// after
List(1, 2, 3,4,5).contains(1)

Just like indexOf and lastIndexOf method, contains also accepts argument of type Any, that might lead to hard-to-find bugs, which are not discoverable at compile time. Be careful with the method arguments.

Don’t construct indices range manually:

// before
Range(0, list.length)

// After
list.indices

There is a in-built method that returns the range of all indices of a collection.

Don’t zip collection with its indices manually:

// before
list.zip(list.indices)

// after
list.zipWithIndex

For one thing, the second expression is more concise. Besides, we may expect some performance gain, because we avoid hidden length calculation (which might be expensive for linear sequences). It will also work with infinite streams as well.

Don’t negate filtering predicate:

// before
List(1,2,3,4,5).filter(_ %2 != 0)

// after
List(1,2,3,4,5).filterNot(_ %2 == 0)

The second expression is syntactically simpler (while semantically they are equal).

Don’t sort by a property manually:

// before
list.sortWith(_.property < _.property)

// after
list.sortBy(_.property)

Second expression is more clear and concise.

Don’t sort by identity manually:

// before
list.sortBy(identity)
list.sortWith(_ < _)

// after
list.sorted

There’s a special method for that, more clear and concise.

Perform reverse sorting in one step:

// before
list.sorted.reverse
list.sortBy(_.property).reverse

// after
list.sorted(Ordering[T].reverse)
list.sortBy(_.property)(Ordering[T].reverse)

That’s how we can avoid a temporary collection allocation and exclude the additional transformation step (to save both heap space and CPU cycles).

Don’t use sorting to find the smallest element:

// before
list.sorted.head
list.sortBy(_.property).head

// after
list.min
list.minBy(_.property)

The latter approach is more concise. Besides, it’s more effective, because it doesn’t create a temporary collection.

Don’t use sorting to find the largest element:

// before
list.sorted.last
list.sortBy(_.property).last

// after
list.max
list.maxBy(_.property)

Merge consecutive filter calls:

// before
list.filter(p1).filter(p2)

// after
list.filter(x => p1(x) && p2(x))

That’s how we can avoid creation of an intermediate collection (after the first filter call), and thus relieve GC’s burden.

The predicates p1 and p2 must be pure.

Don’t imitate span:

// before
val list1 = list.takeWhile(p)
val list2 = list.dropWhile(p)

// after
val (list1, list2) = list.span(p)

That’s how we can traverse the sequence and check the predicate only once, instead of twice.

The predicate p must be pure.

Don’t imitate partition:

// before
val list1 = list.filter(p)
val list2 = list.filterNot(p)

// after
val (list1, list2) = list.partition(p)

Again, the benefit is a single-pass computation.

The predicate p must be pure.

Don’t use map when result is not required:

// Before
list.map(???) // the result is ignored

// After
list.foreach(???)

When side effect is all that is needed, there’s no justification for calling map. Such a call is misleading (and less efficient).

I know, this is not the complete list of efficient use of scala collections. There are many other ideas also which we still need to explore.

Hope that this blog is helpful for you. 🙂

References:

3 thoughts on “Are you using Scala Collection efficiently?7 min read”

Mahesh Chand Kandpal says:

November 12, 2017 at 4:06 PM

Reblogged this on Mahesh's Programming Blog.
valenterry says:

November 15, 2017 at 3:02 PM

About merging consecutive filter calls: list.filter(x => p1(x) && p2(x))
Isn’t list.withFilter(p1(x)).withFilter(p2(x)) equally performant? I’m not quite sure about the inner workings of withFilter, but I assumed that its lazy nature would make merging multiple filter unnecessary.
joecamel says:

January 10, 2018 at 8:48 PM

Reblogged this on Coding, Unix & Other Hackeresque Things.