Mathematics is a core part of machine learning, and anyone who wants to dive deep into machine learning needs a basic grasp of its concepts. Once you start developing algorithms, though, the mathematics can become a real pain. Thankfully, there are some awesome libraries that ease that pain and let us focus on the actual problem rather than on low-level manipulation techniques. While hunting for such libraries, I found a magical numerical processing library named Breeze.
So in this blog we will focus on Breeze, one of the most popular and powerful linear algebra libraries for Scala. Spark MLlib, which provides a powerful framework for scalable machine learning, builds on top of Breeze and Spark.
Introduction
Breeze is a generic, clean and powerful Scala numerical processing library patterned after NumPy, Matlab, and R, and licensed under the Apache License 2.0. Breeze, the successor to Scalala, provides dense linear algebra, numerical routines, optimization, random number generators and signal processing, among others.
In simple terms, Breeze is a Scala library that extends the Scala collection library to provide support for vectors and matrices, along with a whole bunch of functions for manipulating them. We could safely compare Breeze to NumPy in Python terms. Breeze forms the foundation of MLlib, the machine learning library in Spark.
Breeze comprises four libraries:
- breeze-math: Numerics and Linear Algebra. Fast linear algebra backed by native libraries (via JBlas) where appropriate.
- breeze-process: Tools for tokenizing, processing, and massaging data, especially textual data. Includes stemmers, tokenizers, and stop word filtering, among other features.
- breeze-learn: Optimization and Machine Learning. Contains state-of-the-art routines for convex optimization, sampling distributions, several classifiers, and DSLs for Linear Programming and Belief Propagation.
- breeze-viz: (Very alpha) Basic support for plotting, using JFreeChart.
Getting Breeze
First, let's see how to pull the Breeze libraries into our project using sbt, the Scala build tool. In build.sbt:
name := "Breeze"
version := "0.1"
scalaVersion := "2.11.6"
libraryDependencies ++= Seq("org.scalanlp" % "breeze_2.11" % "0.12")
resolvers += "Sonatype Releases" at "https://oss.sonatype.org/content/repositories/releases/"
Or, from the sbt console:
$ sbt
set libraryDependencies += "org.scalanlp" % "breeze_2.11" % "0.12"
set resolvers += "Sonatype Releases" at "https://oss.sonatype.org/content/repositories/releases/"
set scalaVersion := "2.11.6"
console
Imports for using Breeze
import breeze.linalg._
import breeze.numerics._
Comparison with other numerical computing environments
Compared to other numerical computing environments, Breeze matrices default to column-major ordering, like Matlab, but indexing is 0-based, like NumPy. Breeze has as its core concepts matrices and column vectors. Row vectors are normally stored as matrices with a single row. This allows for greater type safety, with the downside that converting a row vector to a column vector requires a transpose-slice (a.t(::,0)) instead of a simple transpose (a.t).
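The three points above (0-based indexing, column-major storage, and the transpose-slice) can be seen in a short sketch. It is meant to be pasted into the sbt console from the previous section; the value names are only illustrative, and the transpose-slice is written in two steps for readability.

```scala
import breeze.linalg._

// A 2 x 2 matrix written row by row.
val m = DenseMatrix((1.0, 2.0), (3.0, 4.0))

// Indexing is 0-based: row 0, column 1.
val firstRowSecondCol = m(0, 1)        // 2.0

// Storage is column-major: the first column comes first.
val storage = m.toArray                // Array(1.0, 3.0, 2.0, 4.0)

// A row vector stored as a matrix with a single row (1 x 3).
val row = DenseMatrix((1.0, 2.0, 3.0))

// Transposing gives a 3 x 1 matrix; slicing out column 0
// turns it into a proper column vector (DenseVector).
val asColumnMatrix = row.t
val column = asColumnMatrix(::, 0)
```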
Creation
Operation | Breeze | NumPy |
---|---|---|
Zeroed matrix | DenseMatrix.zeros[Double](n,m) | zeros((n,m)) |
Zeroed vector | DenseVector.zeros[Double](n) | zeros(n) |
Vector of ones | DenseVector.ones[Double](n) | ones(n) |
Vector of particular number | DenseVector.fill(n){5.0} | ones(n) * 5 |
n element range | linspace(start,stop,numvals) | |
Identity matrix | DenseMatrix.eye[Double](n) | eye(n) |
Diagonal matrix | diag(DenseVector(1.0,2.0,3.0)) | diag((1,2,3)) |
Matrix inline creation | DenseMatrix((1.0,2.0), (3.0,4.0)) | array([ [1,2], [3,4] ]) |
Column vector inline creation | DenseVector(1,2,3,4) | array([1,2,3,4]) |
Row vector inline creation | DenseVector(1,2,3,4).t | array([1,2,3,4]).reshape(1,-1) |
Vector from function | DenseVector.tabulate(3){i => 2*i} | |
Matrix from function | DenseMatrix.tabulate(3, 2){case (i, j) => i+j} | |
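To try the constructors from the table, here is a small sketch for the sbt console. The sizes and value names are arbitrary choices for illustration.

```scala
import breeze.linalg._

val zerosM = DenseMatrix.zeros[Double](2, 3)      // 2 x 3 matrix of 0.0
val onesV  = DenseVector.ones[Double](3)          // 1.0, 1.0, 1.0
val fives  = DenseVector.fill(3){5.0}             // 5.0, 5.0, 5.0
val grid   = linspace(0.0, 1.0, 5)                // 5 evenly spaced points from 0.0 to 1.0
val eye3   = DenseMatrix.eye[Double](3)           // 3 x 3 identity
val diagM  = diag(DenseVector(1.0, 2.0, 3.0))     // 3 x 3 diagonal matrix

// Element types are inferred from the function's result: Int here.
val evens  = DenseVector.tabulate(3){ i => 2 * i }               // 0, 2, 4
val sums   = DenseMatrix.tabulate(3, 2){ case (i, j) => i + j }  // entry (i, j) is i + j
```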
Reading and writing Matrices
Currently, Breeze supports IO for matrices in two ways: Java serialization and CSV. The latter comes from two functions: breeze.linalg.csvread and breeze.linalg.csvwrite. csvread takes a File, and optionally parameters describing how the CSV file is delimited (e.g. if it is actually a TSV file, you can set tabs as the field delimiter), and returns a DenseMatrix. Similarly, csvwrite takes a File and a DenseMatrix, and writes the contents of the matrix to the file.
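A round trip through csvwrite and csvread might look like the sketch below. The temporary file and the value names are assumptions made for the example; the separator argument is the optional delimiter parameter mentioned above.

```scala
import java.io.File
import breeze.linalg._

val m = DenseMatrix((1.0, 2.0), (3.0, 4.0))

// A throwaway file for the demo; any writable path would do.
val file = File.createTempFile("breeze-demo", ".csv")
file.deleteOnExit()

// Write comma-separated, then read it back.
csvwrite(file, m)
val back = csvread(file)

// Same round trip with tabs as the field delimiter (a TSV file).
csvwrite(file, m, separator = '\t')
val backTsv = csvread(file, separator = '\t')
```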
Indexing and Slicing
Operation | Breeze | Matlab | NumPy | R |
---|---|---|---|---|
Basic Indexing | a(0,1) | a(1,2) | a[0,1] | a[1L,2L] |
Extract subset of vector | a(1 to 4) or a(1 until 5) or a.slice(1,5) | a(2:5) | a[1:5] | a[2:5] |
(negative steps) | a(5 to 0 by -1) | a(6:-1:1) | a[5::-1] | a[6:1] |
(tail) | a(1 to -1) | a(2:end) | a[1:] | a[-1] |
(last element) | a( -1 ) | a(end) | a[-1] | tail(a, n=1) |
Extract column of matrix | a(::, 2) | a(:,3) | a[:,2] | a[,2] |
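A few of the Breeze forms from the table, as a sketch for the sbt console. The sample vector and matrix are made up for illustration.

```scala
import breeze.linalg._

val v = DenseVector(10, 11, 12, 13, 14, 15)
val m = DenseMatrix((1, 2, 3), (4, 5, 6))

val element  = m(0, 1)            // row 0, column 1
val middle   = v(1 to 4)          // elements at indices 1, 2, 3, 4
val same     = v(1 until 5)       // same elements, exclusive upper bound
val reversed = v(5 to 0 by -1)    // whole vector, back to front
val last     = v(-1)              // negative indices count from the end
val thirdCol = m(::, 2)           // column 2 as a DenseVector
```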
Operations
Operation | Breeze | Matlab | NumPy | R |
---|---|---|---|---|
Elementwise addition | a + b | a + b | a + b | a + b |
Shaped/Matrix multiplication | a * b | a * b | dot(a, b) | a %*% b |
Elementwise multiplication | a *:* b | a .* b | a * b | a * b |
Elementwise division | a /:/ b | a ./ b | a / b | a / b |
Elementwise comparison | a :< b | a < b (gives matrix of 1/0 instead of true/false) | a < b | a < b |
Elementwise equals | a :== b | a == b (gives matrix of 1/0 instead of true/false) | a == b | a == b |
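The difference between elementwise addition and shaped multiplication is easiest to see on small matrices. The sketch below sticks to + and *, since the spelling of the other elementwise operators has changed between Breeze releases; the values are chosen only for illustration.

```scala
import breeze.linalg._

val a = DenseMatrix((1.0, 2.0), (3.0, 4.0))
val b = DenseMatrix((5.0, 6.0), (7.0, 8.0))
val x = DenseVector(1.0, 1.0)

// Elementwise addition: each entry of a is added to the matching entry of b.
val sumAB = a + b      // (6.0, 8.0), (10.0, 12.0)

// On two matrices, * is the matrix product, not the elementwise product.
val prodAB = a * b     // (19.0, 22.0), (43.0, 50.0)

// Matrix times column vector gives a column vector.
val ax = a * x         // 3.0, 7.0
```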
Map and Reduce
For most simple mapping tasks, one can simply use vectorized, or universal, functions. Given a vector v, we can take the log of each element with log(v).
Sometimes, however, we want to apply a somewhat idiosyncratic function to each element of a vector. For this, we can use the map function:
val v = DenseVector(1.0,2.0,3.0)
v.map( xi => foobar(xi) )
Breeze provides a number of built-in reduction functions such as sum and mean. You can implement a custom reduction using the higher-order function reduce. For instance, we can sum the integers from 0 to 9 as follows:
val v = linspace(0,9,10)
val s = v.reduce( _ + _ )
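Putting the pieces of this section together, here is a runnable sketch. The squaring function stands in for the placeholder foobar above and is purely illustrative; mean is imported from breeze.stats.

```scala
import breeze.linalg._
import breeze.numerics._
import breeze.stats.mean

val v = DenseVector(1.0, 2.0, 3.0)

// Universal function: applied to every element at once.
val logs = log(v)

// map with a custom function, here x => x * x + 1.
val mapped = v.map(x => x * x + 1.0)   // 2.0, 5.0, 10.0

// Built-in reductions.
val total   = sum(v)                   // 6.0
val average = mean(v)                  // 2.0

// Custom reduction over 0.0, 1.0, ..., 9.0.
val s = linspace(0, 9, 10).reduce(_ + _)   // 45.0
```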
Casting and type safety
Compared to NumPy and Matlab, Breeze requires you to be more explicit about the types of your variables. When you create a new vector, for example, you must specify a type (such as in DenseVector.zeros[Double](n)) in cases where a type cannot be inferred automatically. Automatic inference will occur when you create a vector by passing its initial values in (DenseVector(1.0, 2.0, 3.0)). A common mistake is using integers for initialization (e.g. DenseVector(1, 2, 3)), which would give a vector of integers instead of doubles. Both NumPy and Matlab would default to doubles instead.
Breeze will not convert integers to doubles for you in most expressions. Simple operations like a :+ 3 when a is a DenseVector[Double] will not compile. Breeze provides a convert function, which can be used to cast explicitly. You can also use v.mapValues(_.toDouble).
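Both ways of casting can be tried in the sbt console. This is a minimal sketch with illustrative value names; the type annotations are there only to make the inferred element types visible.

```scala
import breeze.linalg._

// Integer literals give a DenseVector[Int], not a DenseVector[Double].
val ints: DenseVector[Int] = DenseVector(1, 2, 3)

// Explicit cast with convert.
val doubles: DenseVector[Double] = convert(ints, Double)

// The same cast written as a map over the values.
val viaMap: DenseVector[Double] = ints.mapValues(_.toDouble)

// With matching types, arithmetic with a Double scalar compiles.
val shifted = doubles + 0.5
```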
Casting
Operation | Breeze | Matlab | NumPy | R |
---|---|---|---|---|
Convert to Int | convert(a, Int) | int(a) | a.astype(int) | as.integer(a) |
References:
Official Wiki-Page: here Thanks for reading, keep sharing.