Vector Vs List in Scala

yellow bokeh photo
Reading Time: 5 minutes
Scala Logo

Introduction

In this article, we will compare the List and Vector data structures in Scala and analyze the pros and cons of using them in different use cases. As we all know List is the most popular data structure in scala and quite simply the most beginner friendly of Scala Collections. But is it the obvious choice for all if not most cases? Let us examine Scala’s Vector vs List and their performance in this article and help you pick the right choice for your use case.

What is a Vector?

Vector is a general-purpose, immutable data structure. It provides random access and updates in constant time and very fast appends and prepends. Vectors strike a good balance between fast random selections and fast random functional updates. They are currently the default implementation of immutable indexed sequences.

i.e you get the following result in REPL if you instantiate an IndexedSeq

val seq = IndexedSeq(1, 2)
val seq: IndexedSeq[Int] = Vector(1, 2)

Vector is a base-32 integer trie. A Trie is a type of n-ary search tree data structure that can efficiently store and search data. It is typically used to store strings but can be used to store ordered lists of any type such as digits or bits. Each node contains the value up to that point in the Trie. Each edge is the key we need to insert or search. Here is a sample example of a Trie

Trie with Strings implementation
Trie implementation using strings

Vector is a similar Trie where the root node has 32 children nodes. Each children node can have 32 children and so on. Hence to hold 2^30 elements a Trie has just 5 levels.

This makes traversing an “effectively constant time” (eC) operation. Further Vector elements exist in a batch of 32, thus providing a high locality. This means that they are stored in neighboring memory locations. Accessing such elements invokes the cache to store them hence providing quick access.

What is a List?

A List in scala is an immutable linked-list implementation of ordered elements of type [A]. Hence the splitting of the head and tail is a constant operation. List is a commonly used data structure in scala and is immutable. Adding elements to List or removing elements from the List returns a new one. It is a stack-type data structure optimal for last-in-first-out LIFO access patterns. It is not suited for random access or FIFO-type access patterns.

Vector Vs List in Scala

Both Vectors and Lists provide structural sharing. Structural sharing means copying similar prefix or suffix parts to the new memory location (new variable). This offers some optimization in update operations where we create a new structure from the existing one. However, there are some fundamental performance differences.

Head / Tail

Since List is a linked list, the head or tail operations are incredibly fast and are a constant (C) operation. Whereas in Vector it is nearly-a-constant-time operation(eC) considering factors such as the maximum size of the Vector.

Access

Since List is a LinearSeq, getting an element at a specified index is an O(k) time operation where k is the position of the element in the List. Vector, on the other hand, is an IndexedSeq and is way faster. It offers fast random access. Vectors are Tries and just need to do a constant number of operations (up to 7 in real-time use cases) to look up sufficiently large data. Hence List takes Linear time here while Vector takes nearly constant time.

Prepend / Append

Lists are fast for prepend operations. They just update the new node’s pointer to point to the existing List. Vectors also, provide fast prepend operations due to their tree-ish structure. Structural sharing helps them copy the existing structure and add a new element on top of it. Lists prepend in constant (C) time whereas Vectors prepend in nearly constant (eC) time.

As for append operations, Vectors are significantly faster than Lists. Lists traverse till the last node and then append the new element which takes a linear O(n) time. Vector lookup takes place in O(1) i.e constant time. Hence appending to the element also takes a nearly constant time. With help from structural sharing, Vectors can copy the prefix, add the new element, and return the new Vector.

Updates

Vector is an IndexedSeq which modifies the existing element in nearly constant time. It looks up the position to update in constant time. The element is updated. Finally, the prefix/suffix structure up to that point is shared, returning a Vector with the updated element.Lists on the other hand, have to traverse and find the element before updating them. This takes O(n) time. Hence Lists take Linear time for updating while Vectors take nearly constant time.

Iteration

Lists are a LinearSeq and are a good choice for iterative traversals. But Vectors offer equally good traversal time. Iteration in Lists is following each node’s pointer to its location. This can sometimes result in a cache miss. Addresses of those pointers reside in a different part of the memory and cannot be fetched from the cache.

Traversing elements is slightly more complicated in Vectors. Still, due to their excellent locality, elements belonging to the same level, especially to a particular node are almost next to each other. Hence Vectors reduce cache misses by a significant level. Tries traverse in O(n) time, and so Vectors also traverse in O(n) time.

Parallelism

List does not work well with parallel algorithms. Since a List requires O(n) traversals to create a sublist or access an element from it, it is not suitable to use Lists when implementing parallel algorithms. An example of such an algorithm is splitting a list into sublists and sorting each of the sublists. But this cannot be done in isolation. To traverse each of the sublists, we need to traverse the whole list from the start. Furthermore, Lists themselves contain mutable states in their internal implementation making them gullible to race conditions. A note from the official docs on scala says thus,

Note: Despite being an immutable collection, the implementation uses mutable state internally during construction. These state changes are invisible in single-threaded code but can lead to race conditions in some multi-threaded scenarios.

It also makes sense as to why List was not included in the parallel collections package in scala. Vector is much more suitable in this aspect providing fast random access and updates, thus making it useful in parallel algorithms without any race conditions. Scala ParVector is a specialized data structure that allows for efficient copying and splitting while keeping the data immutable in distributed environments.

As a summarization of the above differences, here are performance benchmarks from the docs, that list the differences between Vector vs List in Scala.

yellow bokeh photo

Conclusion

After a detailed analysis of Vector vs List in Scala, it is clear that Vectors gain the upper hand over Lists in case of random access, and updates while Lists are better at the head, tail, and prepend operations. Vector is the obvious choice when you need an Indexed Sequence collection and when implementing parallel algorithms.

I hope by considering the above points you will be able to decide which data structure is better suited for your use case. Feel free to add to the discussion by dropping your suggestions, and feedback below. Thanks for reading and I hope you have a good day!

Written by 

I'm a budding software engineer who is currently interested and experienced in Scala. Eager to learn and improve myself around technology and programming. Loves to read and write fiction.

Leave a Reply