Cool Breeze of Scala for Easy Computation: Introduction to Breeze Library

Mathematics is at the core of machine learning, and anyone who wants to dive deep into the field needs a basic grounding in mathematical concepts. Once you start developing algorithms, though, the mathematics can be a real pain. Thankfully, there are some awesome libraries that ease that pain and let us focus on our actual requirements rather than on manipulation techniques. While hunting for such libraries, I found a magical numerical processing library named Breeze.

So in this blog we will focus on Breeze, one of the most popular and powerful linear algebra libraries. Spark MLlib, which provides a powerful framework for scalable machine learning, is built on top of Breeze and Spark.

Introduction

Breeze is a generic, clean and powerful Scala numerical processing library patterned after NumPy, Matlab, and R, and licensed under the Apache License 2.0. Breeze, the successor to Scalala, provides dense linear algebra, numerical routines, optimization, random number generators and signal processing, among others.


In simple terms, Breeze is a Scala library that extends the Scala collection library to provide support for vectors and matrices, along with a whole bunch of functions for manipulating them. In Python terms, we could safely compare Breeze to NumPy. Breeze forms the foundation of MLlib, the machine learning library in Spark.

Breeze comprises four libraries:

  • breeze-math: Numerics and Linear Algebra. Fast linear algebra backed by native libraries (via JBlas) where appropriate.
  • breeze-process: Tools for tokenizing, processing, and massaging data, especially textual data. Includes stemmers, tokenizers, and stop word filtering, among other features.
  • breeze-learn: Optimization and Machine Learning. Contains state-of-the-art routines for convex optimization, sampling distributions, several classifiers, and DSLs for Linear Programming and Belief Propagation.
  • breeze-viz: (Very alpha) Basic support for plotting, using JFreeChart.

Getting Breeze

In this first recipe, we will see how to pull the Breeze libraries into our project using the Scala Build Tool (SBT).

name := "Breeze"

version := "0.1"

scalaVersion := "2.11.6"

libraryDependencies ++= Seq("org.scalanlp" % "breeze_2.11" % "0.12")

resolvers += "Sonatype Releases" at "https://oss.sonatype.org/content/repositories/releases/"

Or, from the SBT console:

$ sbt
set libraryDependencies += "org.scalanlp" % "breeze_2.11" % "0.12"
set resolvers += "Sonatype Releases" at "https://oss.sonatype.org/content/repositories/releases/"
set scalaVersion := "2.11.6"
console

Imports for using Breeze

import breeze.linalg._
import breeze.numerics._

Comparison with other numerical computing environments

Compared to other numerical computing environments, Breeze matrices default to column-major ordering, like Matlab, but indexing is 0-based, like NumPy. Breeze's core concepts are matrices and column vectors. Row vectors are normally stored as matrices with a single row. This allows for greater type safety, with the downside that converting a row vector to a column vector requires a transpose-slice (a.t(::,0)) instead of a simple transpose (a.t).
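
To make the ordering and indexing rules concrete, here is a minimal sketch that can be pasted into the SBT console. It assumes the Breeze dependency from the build above; the variable names are only illustrative.

```scala
import breeze.linalg._

// A 2x2 matrix written row by row: [[1, 2], [3, 4]].
val m = DenseMatrix((1.0, 2.0), (3.0, 4.0))

// Indexing is 0-based: row 0, column 1 holds 2.0.
val topRight = m(0, 1)

// Storage is column-major, so the backing array lists column 0 first.
val backing = m.data.toSeq   // 1.0, 3.0, 2.0, 4.0

// Transposing swaps rows and columns: entry (0, 1) is now 3.0.
val transposed = m.t
val swapped = transposed(0, 1)
```

The column-major layout only matters when you touch the raw data array directly; ordinary (row, column) indexing hides it.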

Creation

| Operation | Breeze | NumPy |
|---|---|---|
| Zeroed matrix | DenseMatrix.zeros[Double](n,m) | zeros((n,m)) |
| Zeroed vector | DenseVector.zeros[Double](n) | zeros(n) |
| Vector of ones | DenseVector.ones[Double](n) | ones(n) |
| Vector of particular number | DenseVector.fill(n){5.0} | ones(n) * 5 |
| n element range | linspace(start,stop,numvals) | linspace(start,stop,numvals) |
| Identity matrix | DenseMatrix.eye[Double](n) | eye(n) |
| Diagonal matrix | diag(DenseVector(1.0,2.0,3.0)) | diag((1,2,3)) |
| Matrix inline creation | DenseMatrix((1.0,2.0), (3.0,4.0)) | array([ [1,2], [3,4] ]) |
| Column vector inline creation | DenseVector(1,2,3,4) | array([1,2,3,4]) |
| Row vector inline creation | DenseVector(1,2,3,4).t | array([1,2,3,4]).reshape(1,-1) |
| Vector from function | DenseVector.tabulate(3){i => 2*i} | |
| Matrix from function | DenseMatrix.tabulate(3, 2){case (i, j) => i+j} | |
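
As a quick illustration, the Breeze column of the table can be exercised like this. It is a sketch for the SBT console with small, arbitrarily chosen sizes; the value names are hypothetical.

```scala
import breeze.linalg._

val zerosM = DenseMatrix.zeros[Double](2, 3)       // 2 rows, 3 columns of 0.0
val onesV  = DenseVector.ones[Double](3)           // three 1.0 entries
val fives  = DenseVector.fill(3){5.0}              // three 5.0 entries
val grid   = linspace(0.0, 1.0, 5)                 // 5 evenly spaced points from 0.0 to 1.0
val id     = DenseMatrix.eye[Double](3)            // 3x3 identity
val d      = diag(DenseVector(1.0, 2.0, 3.0))      // 3x3 matrix with 1, 2, 3 on the diagonal
val evens  = DenseVector.tabulate(3){i => 2 * i}   // 0, 2, 4
val sums   = DenseMatrix.tabulate(3, 2){case (i, j) => i + j}   // entry (i, j) is i + j
```

Note that the first argument of DenseMatrix.zeros and DenseMatrix.tabulate is the number of rows and the second is the number of columns.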

Reading and writing Matrices

Currently, Breeze supports IO for matrices in two ways: Java serialization and CSV. The latter comes from two functions: breeze.linalg.csvread and breeze.linalg.csvwrite. csvread takes a File, and optionally parameters for how the CSV file is delimited (e.g. if it is actually a TSV file, you can set tabs as the field delimiter), and returns a DenseMatrix. Similarly, csvwrite takes a File and a DenseMatrix, and writes the contents of the matrix to the file.
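
A minimal round-trip sketch of the two CSV functions follows. The temporary file is a hypothetical scratch location chosen for the example, and the commented-out TSV call assumes the delimiter is the second positional argument of csvread.

```scala
import java.io.File
import breeze.linalg._

// Hypothetical scratch file; any writable path would do.
val file = File.createTempFile("breeze-demo", ".csv")

val original = DenseMatrix((1.0, 2.0), (3.0, 4.0))

// Write with the default comma separator, then read the matrix back.
csvwrite(file, original)
val restored = csvread(file)

// For a tab-separated file, pass the delimiter explicitly, e.g.:
// val fromTsv = csvread(tsvFile, '\t')

file.delete()
```

csvread returns a DenseMatrix[Double], so non-numeric columns such as headers need to be skipped or stripped before reading.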

Indexing and Slicing

| Operation | Breeze | Matlab | NumPy | R |
|---|---|---|---|---|
| Basic indexing | a(0,1) | a(1,2) | a[0,1] | a[1L,2L] |
| Extract subset of vector | a(1 to 4) or a(1 until 5) or a.slice(1,5) | a(2:5) | a[1:5] | a[2:5] |
| (negative steps) | a(5 to 0 by -1) | a(6:-1:1) | a[5::-1] | a[6:1] |
| (tail) | a(1 to -1) | a(2:end) | a[1:] | a[-1] |
| (last element) | a( -1 ) | a(end) | a[-1] | tail(a, n=1) |
| Extract column of matrix | a(::, 2) | a(:,3) | a[:,2] | a[,3] |
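
The Breeze entries above can be tried out with a small vector and matrix. This is a sketch for the SBT console; the sample values are arbitrary and the names are illustrative.

```scala
import breeze.linalg._

val v = DenseVector(0, 10, 20, 30, 40, 50, 60, 70)

val second   = v(1)               // 10, since indexing is 0-based
val last     = v(-1)              // 70, negative indices count from the end
val middle   = v(1 to 4)          // 10, 20, 30, 40 (the end of "to" is inclusive)
val same     = v(1 until 5)       // the same four elements
val reversed = v(5 to 0 by -1)    // 50, 40, 30, 20, 10, 0

val m = DenseMatrix((1, 2, 3), (4, 5, 6))
val entry  = m(0, 1)              // 2
val column = m(::, 2)             // column 2 as a vector: 3, 6
```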

Operations

| Operation | Breeze | Matlab | NumPy | R |
|---|---|---|---|---|
| Elementwise addition | a + b | a + b | a + b | a + b |
| Shaped/Matrix multiplication | a * b | a * b | dot(a, b) | a %*% b |
| Elementwise multiplication | a *:* b | a .* b | a * b | a * b |
| Elementwise division | a /:/ b | a ./ b | a / b | a / b |
| Elementwise comparison | a :< b | a < b (gives matrix of 1/0 instead of true/false) | a < b | a < b |
| Elementwise equals | a :== b | a == b (gives matrix of 1/0 instead of true/false) | a == b | a == b |
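
As a small illustration of the first two rows, here is a sketch of elementwise addition next to shaped (matrix) multiplication. It deliberately sticks to + and *; the values and names are made up for the example.

```scala
import breeze.linalg._

val a = DenseVector(1.0, 2.0, 3.0)
val b = DenseVector(4.0, 5.0, 6.0)

// Elementwise addition of two vectors of the same length.
val added = a + b                 // 5.0, 7.0, 9.0

// Shaped multiplication: a 2x3 matrix times a length-3 column vector.
val m = DenseMatrix((1.0, 0.0, 0.0), (0.0, 2.0, 0.0))
val product = m * a               // 1.0, 4.0

// Matrix times matrix: 2x3 times 3x2 gives 2x2.
val gram = m * m.t
```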

Map and Reduce

For most simple mapping tasks, one can simply use vectorized, or universal, functions. Given a vector v, we can take the log of each element with log(v). Sometimes, however, we want to apply a somewhat idiosyncratic function to each element of a vector. For this, we can use the map function:

val v = DenseVector(1.0,2.0,3.0)
// foobar is a placeholder for any function you define that takes a Double
v.map( xi => foobar(xi) )

Breeze provides a number of built-in reduction functions such as sum and mean. You can implement a custom reduction using the higher-order function reduce. For instance, we can sum the integers from 0 to 9 as follows:

val v = linspace(0,9,10)
val s = v.reduce( _ + _ )
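
Putting the pieces of this section together, here is a self-contained sketch for the SBT console. It assumes mean is imported from breeze.stats, which is where I expect it to live in the Breeze version used in this post; the value names are illustrative.

```scala
import breeze.linalg._
import breeze.numerics.log
import breeze.stats.mean

val v = DenseVector(1.0, 2.0, 3.0)

// Universal function: applied to every element.
val logs = log(v)

// Custom per-element function via map.
val squares = v.map(x => x * x)    // 1.0, 4.0, 9.0

// Built-in reductions.
val total   = sum(v)               // 6.0
val average = mean(v)              // 2.0

// Custom reduction over the values 0.0, 1.0, ..., 9.0.
val s = linspace(0.0, 9.0, 10).reduce(_ + _)   // 45.0
```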

Casting and type safety

Compared to NumPy and Matlab, Breeze requires you to be more explicit about the types of your variables. When you create a new vector, for example, you must specify a type (as in DenseVector.zeros[Double](n)) in cases where the type cannot be inferred automatically. Automatic inference occurs when you create a vector by passing in its initial values (for example, DenseVector(1.0, 2.0, 3.0)). A common mistake is to initialize with integers (e.g. DenseVector(1, 2, 3)), which gives a vector of integers instead of doubles. Matlab would default to doubles here, as do NumPy constructors such as zeros and ones.

Breeze will not convert integers to doubles for you in most expressions. Simple operations like a :+ 3, where a is a DenseVector[Double], will not compile. Breeze provides a convert function, which can be used to cast explicitly. You can also use v.mapValues(_.toDouble).

Casting

| Operation | Breeze | Matlab | NumPy | R |
|---|---|---|---|---|
| Convert to Int | convert(a, Int) | int(a) | a.astype(int) | as.integer(a) |
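
The two explicit casting routes described above can be sketched as follows. The type annotations are there only to make the resulting element types visible, and the names are illustrative.

```scala
import breeze.linalg._

// Integer literals give a DenseVector[Int], not a DenseVector[Double].
val ints: DenseVector[Int] = DenseVector(1, 2, 3)

// Two explicit ways to obtain doubles.
val viaConvert: DenseVector[Double] = convert(ints, Double)
val viaMap: DenseVector[Double]     = ints.mapValues(_.toDouble)

// With Double on both sides, the arithmetic compiles.
val shifted = viaConvert + 3.0     // 4.0, 5.0, 6.0

// Converting whole-number doubles back to Int.
val backToInt: DenseVector[Int] = convert(viaConvert, Int)
```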

 

References:
Official Breeze wiki page.

Thanks for reading, keep sharing.

 




 

Written by 

Nitin Aggarwal is a software consultant at Knoldus Software INC with more than 1.5 years of experience. Nitin likes to explore new technologies and learn new things every day. He loves watching cricket and Marvel movies, playing guitar and exploring new places. Nitin is familiar with programming languages such as Java, Scala, C, C++, HTML and CSS, technologies like Lagom, Akka, Kafka and Spark, databases like Cassandra, MySQL and PostgreSQL, and graph databases like Titan DB.
