Cool Breeze of Scala for Easy Computation: Introduction to Breeze Library

Mathematics is a core part of machine learning, and to dive deep into machine learning you need a basic grasp of mathematical concepts. Once you start developing algorithms, though, the mathematics can be a real pain. Thankfully, there are some awesome libraries that ease that pain and let us focus on the actual problem rather than on manipulation techniques. While hunting for such libraries, I found a magical numerical processing library named Breeze.

So in this blog we will focus on Breeze, one of the most popular and powerful linear algebra libraries. Spark MLlib, which provides a powerful framework for scalable machine learning, builds on top of Breeze and Spark.


Breeze is a generic, clean and powerful Scala numerical processing library patterned after NumPy, Matlab, and R, and licensed under the Apache License 2.0. Breeze, the successor to Scalala, provides dense linear algebra, numerical routines, optimization, random number generators and signal processing, among others.


In simple terms, Breeze is a Scala library that extends the Scala collection library to provide support for vectors and matrices, along with a whole bunch of functions for manipulating them. We could safely compare Breeze to NumPy in Python terms. Breeze forms the foundation of MLlib, the machine learning library in Spark.

Breeze comprises four libraries:

  • breeze-math: Numerics and Linear Algebra. Fast linear algebra backed by native libraries (via JBlas) where appropriate.
  • breeze-process: Tools for tokenizing, processing, and massaging data, especially textual data. Includes stemmers, tokenizers, and stop word filtering, among other features.
  • breeze-learn: Optimization and Machine Learning. Contains state-of-the-art routines for convex optimization, sampling distributions, several classifiers, and DSLs for Linear Programming and Belief Propagation.
  • breeze-viz: (Very alpha) Basic support for plotting, using JFreeChart.

Getting Breeze

In this first recipe, we will see how to pull the Breeze libraries into our project using the Scala Build Tool (SBT). Add the following to build.sbt:

name := "Breeze"

version := "0.1"

scalaVersion := "2.11.6"

libraryDependencies ++= Seq("org.scalanlp" % "breeze_2.11" % "0.12")

resolvers += "Sonatype Releases" at ""

Or, from the sbt console:

$ sbt
set libraryDependencies += "org.scalanlp" % "breeze_2.11" % "0.12"
set resolvers += "Sonatype Releases" at ""
set scalaVersion := "2.11.6"

Imports for using Breeze

import breeze.linalg._
import breeze.numerics._

Comparison with other numerical computing environments

Compared to other numerical computing environments, Breeze matrices default to column-major ordering, like Matlab, but indexing is 0-based, like NumPy. The core concepts in Breeze are matrices and column vectors. Row vectors are normally stored as matrices with a single row. This allows for greater type safety, with the downside that conversion of row vectors to column vectors is performed using a transpose-slice (a.t(::,0)) instead of a simple transpose (a.t).


Operation                      Breeze                                          NumPy
Zeroed matrix                  DenseMatrix.zeros[Double](n,m)                  zeros((n,m))
Zeroed vector                  DenseVector.zeros[Double](n)                    zeros(n)
Vector of ones                 DenseVector.ones[Double](n)                     ones(n)
Vector of particular number    DenseVector.fill(n){5.0}                        ones(n) * 5
n element range                linspace(start,stop,numvals)                    linspace(start,stop,numvals)
Identity matrix                DenseMatrix.eye[Double](n)                      eye(n)
Diagonal matrix                diag(DenseVector(1.0,2.0,3.0))                  diag((1,2,3))
Matrix inline creation         DenseMatrix((1.0,2.0), (3.0,4.0))               array([ [1,2], [3,4] ])
Column vector inline creation  DenseVector(1,2,3,4)                            array([1,2,3,4])
Row vector inline creation     DenseVector(1,2,3,4).t                          array([1,2,3,4]).reshape(1,-1)
Vector from function           DenseVector.tabulate(3){i => 2*i}               -
Matrix from function           DenseMatrix.tabulate(3, 2){case (i, j) => i+j}  -
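
To make the table concrete, here is a minimal sketch that builds each kind of object once. The object name CreationSketch and the value names are just for illustration; only the Breeze calls come from the table above.

```scala
import breeze.linalg._

// Illustrative sketch: CreationSketch and its value names are made up for this post.
object CreationSketch {
  val zeroMatrix = DenseMatrix.zeros[Double](2, 3)             // 2 rows, 3 columns of 0.0
  val onesVector = DenseVector.ones[Double](4)                 // four entries of 1.0
  val fives      = DenseVector.fill(3){5.0}                    // three entries of 5.0
  val grid       = linspace(0.0, 1.0, 5)                       // 5 evenly spaced points, endpoints included
  val eye3       = DenseMatrix.eye[Double](3)                  // 3x3 identity
  val diagonal   = diag(DenseVector(1.0, 2.0, 3.0))            // matrix with 1, 2, 3 on the diagonal
  val inline     = DenseMatrix((1.0, 2.0), (3.0, 4.0))         // one tuple per row
  val evens      = DenseVector.tabulate(3){i => 2 * i}         // element i is 2*i
  val indexSums  = DenseMatrix.tabulate(3, 2){case (i, j) => i + j}  // element (i, j) is i+j

  def main(args: Array[String]): Unit = {
    println(inline)
    println(evens)
    println(indexSums)
  }
}
```

Note that evens and indexSums are inferred as Int containers, because the functions passed to tabulate return Int.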

Reading and writing Matrices

Currently, Breeze supports IO for matrices in two ways: Java serialization and CSV. The latter comes from two functions: breeze.linalg.csvread and breeze.linalg.csvwrite. csvread takes a File, and optionally parameters for how the CSV file is delimited (e.g. if it is actually a TSV file, you can set tabs as the field delimiter), and returns a DenseMatrix. Similarly, csvwrite takes a File and a DenseMatrix, and writes the contents of the matrix to the file.
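
As a rough sketch of the CSV round trip described above, the following writes a small matrix to a temporary file and reads it back. The object name CsvSketch and the temp-file prefix are made up for the example, and the separator is passed explicitly only to show where a tab would go for a TSV file.

```scala
import java.io.File
import breeze.linalg._

// Illustrative sketch: CsvSketch and the file name prefix are made up for this post.
object CsvSketch {
  def roundTrip(): DenseMatrix[Double] = {
    val m = DenseMatrix((1.0, 2.0), (3.0, 4.0))

    // Write to a throwaway file so the example leaves nothing behind.
    val file = File.createTempFile("breeze-sketch", ".csv")
    file.deleteOnExit()

    csvwrite(file, m, separator = ',')   // use '\t' here for a TSV file
    csvread(file, separator = ',')       // returns a DenseMatrix[Double]
  }

  def main(args: Array[String]): Unit =
    println(roundTrip())
}
```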

Indexing and Slicing

Operation                 Breeze                                     Matlab     NumPy    R
Basic indexing            a(0,1)                                     a(1,2)     a[0,1]   a[1L,2L]
Extract subset of vector  a(1 to 4) or a(1 until 5) or a.slice(1,5)  a(2:5)     a[1:5]   a[2:5]
(negative steps)          a(5 to 0 by -1)                            a(6:-1:1)  a[5::-1] a[6:1]
(tail)                    a(1 to -1)                                 a(2:end)   a[1:]    a[-1]
(last element)            a( -1 )                                    a(end)     a[-1]    tail(a, n=1)
Extract column of matrix  a(::, 2)                                   a(:,3)     a[:,2]   a[,3]
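
The Breeze column of this table can be tried out with a short sketch like the one below. The object and value names are only illustrative; remember that indices are 0-based and that negative indices count from the end.

```scala
import breeze.linalg._

// Illustrative sketch: IndexingSketch and its value names are made up for this post.
object IndexingSketch {
  val a = DenseVector(0.0, 1.0, 2.0, 3.0, 4.0, 5.0)

  val subset   = a(1 to 4)         // elements at positions 1, 2, 3, 4
  val same     = a(1 until 5)      // same positions, exclusive upper bound
  val reversed = a(5 to 0 by -1)   // all six elements, last to first
  val tail     = a(1 to -1)        // everything except the first element
  val last     = a(-1)             // the last element

  val m      = DenseMatrix((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))
  val elem   = m(0, 1)             // row 0, column 1
  val column = m(::, 2)            // third column, as a DenseVector

  def main(args: Array[String]): Unit = {
    println(subset)
    println(reversed)
    println(column)
  }
}
```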


Operation                     Breeze    Matlab       NumPy      R
Elementwise addition          a + b     a + b        a + b      a + b
Shaped/Matrix multiplication  a * b     a * b        dot(a, b)  a %*% b
Elementwise multiplication    a *:* b   a .* b       a * b      a * b
Elementwise division          a /:/ b   a ./ b       a / b      a / b
Elementwise comparison        a :< b    a < b (*)    a < b      a < b
Elementwise equals            a :== b   a == b (*)   a == b     a == b

(*) Matlab gives a matrix of 1/0 instead of true/false.
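
Here is a small sketch of the arithmetic rows in Breeze. It assumes a Breeze release in which the *:* and /:/ spellings from the table are available; as far as I know, older releases (possibly including the 0.12 pinned earlier in this post) spell them :* and :/ instead, so adjust to your version. The object and value names are illustrative.

```scala
import breeze.linalg._

// Illustrative sketch: OperationsSketch and its value names are made up for this post.
// Assumes a Breeze version that provides the *:* and /:/ operators.
object OperationsSketch {
  val a = DenseVector(1.0, 2.0, 3.0)
  val b = DenseVector(4.0, 5.0, 6.0)

  val added    = a + b      // elementwise addition
  val product  = a *:* b    // elementwise multiplication
  val quotient = b /:/ a    // elementwise division

  val m  = DenseMatrix((1.0, 2.0), (3.0, 4.0))
  val mv = m * DenseVector(1.0, 1.0)   // matrix-vector product: the row sums of m
  val mm = m * m                       // matrix-matrix product

  def main(args: Array[String]): Unit = {
    println(added)
    println(product)
    println(mv)
  }
}
```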

Map and Reduce

For most simple mapping tasks, one can simply use vectorized, or universal, functions. Given a vector v, we can take the log of each element with log(v). Sometimes, however, we want to apply a somewhat idiosyncratic function to each element of a vector. For this, we can use the map function:

val v = DenseVector(1.0,2.0,3.0)
v.map( xi => foobar(xi) )

Breeze provides a number of built-in reduction functions such as sum and mean. You can implement a custom reduction using the higher-order function reduce. For instance, we can sum the integers from 0 to 9 as follows:

val v = linspace(0,9,10)
val s = v.reduce( _ + _ )
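
Putting the pieces of this section together, a self-contained sketch might look like this. Since foobar is only a placeholder in the text above, the body given to it here (square plus one) is an arbitrary stand-in, and the object and value names are illustrative as well.

```scala
import breeze.linalg._
import breeze.numerics._

// Illustrative sketch: MapReduceSketch, its value names, and the body of foobar
// are made up for this post.
object MapReduceSketch {
  // Arbitrary stand-in for the "idiosyncratic" function from the text.
  def foobar(x: Double): Double = x * x + 1.0

  val v = DenseVector(1.0, 2.0, 3.0)

  val logs    = log(v)                     // universal function applied to every element
  val mapped  = v.map(xi => foobar(xi))    // custom function applied to every element
  val total   = sum(v)                     // built-in reduction
  val average = breeze.stats.mean(v)       // built-in reduction from breeze.stats

  // Custom reduction: add up the values 0.0, 1.0, ..., 9.0
  val s = linspace(0, 9, 10).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    println(mapped)
    println(s)
  }
}
```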

Casting and type safety

Compared to NumPy and Matlab, Breeze requires you to be more explicit about the types of your variables. When you create a new vector, for example, you must specify a type (such as in DenseVector.zeros[Double](n)) in cases where a type cannot be inferred automatically. Automatic inference will occur when you create a vector by passing in its initial values (e.g. DenseVector(1.0, 2.0, 3.0)). A common mistake is using integers for initialization (e.g. DenseVector(1, 2, 3)), which would give a vector of integers instead of doubles. Matlab, by contrast, would default to doubles.

Breeze will not convert integers to doubles for you in most expressions. Simple operations like a :+ 3 when a is a DenseVector[Double] will not compile. Breeze provides a convert function, which can be used to cast explicitly. You can also use v.mapValues(_.toDouble).


Operation       Breeze           Matlab  NumPy          R
Convert to Int  convert(a, Int)  int(a)  a.astype(int)  as.integer(a)
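
A minimal sketch of the two casting routes mentioned above, starting from a vector that was accidentally created with integers. The object and value names are illustrative.

```scala
import breeze.linalg._

// Illustrative sketch: CastingSketch and its value names are made up for this post.
object CastingSketch {
  val ints = DenseVector(1, 2, 3)               // inferred as DenseVector[Int]

  val doubles = convert(ints, Double)           // explicit cast with convert
  val viaMap  = ints.mapValues(_.toDouble)      // same result via mapValues

  val shifted = doubles + 3.0                   // Double vector plus a Double scalar
  val back    = convert(doubles, Int)           // and back to DenseVector[Int]

  def main(args: Array[String]): Unit = {
    println(doubles)
    println(shifted)
    println(back)
  }
}
```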


Official wiki page: here

Thanks for reading, keep sharing.



