NumPy – Say Bye to Loops

It is often said that Python, compared to low-level languages like C, improves development time at the expense of runtime. But there are a handful of ways to speed up execution in Python without sacrificing ease of use. One package well suited to fast numerical operations is NumPy, short for Numerical Python.

What NumPy is, and how to install and use it, is already covered in the blog Introduction to NumPy. This blog instead throws light on what makes NumPy so special and powerful.

Let’s first look at a simple piece of Python code that multiplies each element of one list (a) by the corresponding element of a second list (b). The code looks like this:

# Using List Comprehension
[(x*y) for x,y in zip(a, b)]

Though the code is simple enough, what if a and b each contain millions of numbers? We will have to pay the price for the inefficiencies of looping in Python.

NumPy achieves the same task using ndarrays (a and b here) with the following line of code:

c = a * b

Not only is the syntax simpler, it is more efficient as well. With no explicit Python loop in the execution, the overhead drops considerably. But how does NumPy do that?

When it comes to computation, there are two major concepts that lend NumPy its power:

  • Vectorization
  • Broadcasting


Vectorization is a powerful ability within NumPy that speeds up code execution without using explicit loops. It expresses operations as occurring on entire arrays rather than on their individual elements.

Looping over an array or any other data structure in Python involves a lot of overhead. In NumPy, vectorized operations delegate the looping internally to highly optimized C and Fortran functions, making for cleaner and faster Python code. So, vectorization refers to the concept of replacing explicit for-loops with array expressions, which can then be computed internally with a low-level language like C.
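As a small illustration of replacing an explicit loop with an array expression, here is a minimal sketch. The temperature values and the variable names are made up for this example.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
temps_c = [12.5, 18.0, 21.3, 30.1]

# Explicit Python loop: one interpreted iteration per element
temps_f_loop = []
for t in temps_c:
    temps_f_loop.append(t * 9 / 5 + 32)

# Vectorized form: the same arithmetic expressed on the whole array;
# the per-element loop runs inside NumPy's compiled code
temps_f_vec = np.array(temps_c) * 9 / 5 + 32

# Both approaches agree up to floating-point rounding
print(np.allclose(temps_f_loop, temps_f_vec))  # True
```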

Consider the following lines of code:

import time

size = 1000000
L1 = range(size)
L2 = range(size)

# Time an element-wise multiplication done with a Python-level loop
start = time.time()
result = [x * y for x, y in zip(L1, L2)]
print("python list took :", (time.time() - start) * 1000)  # elapsed milliseconds

The output for the above Python code is:

python list took : 417.16599464416504

Using NumPy to achieve the same result, the code looks like:

import numpy as np
import time

size = 1000000
a1 = np.arange(size)
a2 = np.arange(size)

# Time the same multiplication as a single vectorized array expression
start = time.time()
result = a1 * a2
print("numpy array took :", (time.time() - start) * 1000)  # elapsed milliseconds

Running the above code yields the following output:

numpy array took : 14.219999313354492

The output shows that, in this run, the NumPy code is almost 30 times faster than the non-vectorized Python code.
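A single time.time() measurement can be noisy. One possible way to get steadier numbers is the standard library's timeit module, as sketched below. The helper names multiply_lists and multiply_arrays are made up for this example, the size is reduced so the sketch runs quickly, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

size = 100_000  # smaller than in the text so the sketch runs quickly

L1 = list(range(size))
L2 = list(range(size))
# int64 is requested explicitly so the products cannot overflow
a1 = np.arange(size, dtype=np.int64)
a2 = np.arange(size, dtype=np.int64)

def multiply_lists():
    # Element-wise product with a Python-level loop
    return [x * y for x, y in zip(L1, L2)]

def multiply_arrays():
    # Element-wise product as one vectorized expression
    return a1 * a2

# Run each function 10 times per repeat and keep the best of 5 repeats,
# which reduces noise from other processes
list_time = min(timeit.repeat(multiply_lists, number=10, repeat=5)) / 10
array_time = min(timeit.repeat(multiply_arrays, number=10, repeat=5)) / 10

print(f"python list : {list_time * 1000:.3f} ms per run")
print(f"numpy array : {array_time * 1000:.3f} ms per run")
```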

Another important factor to consider is that NumPy does more than delegate to C for faster processing: some operations, notably the linear algebra routines that rely on BLAS and LAPACK libraries, can also run on multiple threads, depending on how NumPy was built.


Broadcasting is another important NumPy abstraction. For two equal-sized vectors, vectorization takes care of the computation. But when arrays of different shapes are encountered, broadcasting comes into the picture.

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. This helps in vectorizing array operations so that looping occurs in C instead of Python, making the overall execution faster.

The simplest broadcasting example occurs when we combine an array and a scalar value in an operation:

a = np.array([1.0, 2.0, 3.0])
b = 2.0
a * b
array([ 2.,  4.,  6.])

In this code, the scalar b can be imagined as being stretched during the arithmetic operation into an array with the same shape as a. The new elements in b are simply copies of the original scalar. NumPy does not actually stretch the scalar in memory; it simply reuses the original value for each element-wise computation.
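One way to see that no copy is made is np.broadcast_to, which returns a read-only view of the stretched value. This is only a sketch for inspection; the name b_stretched is made up here, and NumPy does not require this call for a * b to work.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = 2.0

# View of the scalar "stretched" to the shape of a; no data is copied
b_stretched = np.broadcast_to(b, a.shape)

print(b_stretched)          # [2. 2. 2.]
print(b_stretched.strides)  # (0,) : every element refers to the same value in memory
print(a * b_stretched)      # [2. 4. 6.]
```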

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when

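  1. they are equal, or
  2. one of them is 1

For example, a 1-D array can be subtracted from every row of a 2-D array:

# 2-D array of shape (3, 2)
arr1 = np.array([0,1,2,4,10,10]).reshape((3,2))
# 1-D array of shape (2,)
arr2 = np.array([4,5])
# arr2 is treated as shape (1, 2) and broadcast across the three rows of arr1
arr1 - arr2

The output for the above array subtraction is: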

array([[-4, -4],
       [-2, -1],
       [ 6,  5]])

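The broadcasting process can be visualised as arr2 being stretched along the first axis into three identical rows [4, 5], after which the subtraction proceeds element by element.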
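The stretched form of arr2 can be printed explicitly with np.broadcast_to. The sketch below rebuilds arr1 and arr2 from the example; the name arr2_stretched is introduced only for illustration.

```python
import numpy as np

arr1 = np.array([0, 1, 2, 4, 10, 10]).reshape((3, 2))
arr2 = np.array([4, 5])

# Read-only view of arr2 repeated along a new leading axis: shape (3, 2)
arr2_stretched = np.broadcast_to(arr2, arr1.shape)
print(arr2_stretched)
# [[4 5]
#  [4 5]
#  [4 5]]

# Subtracting the stretched view gives the same result as arr1 - arr2
print(np.array_equal(arr1 - arr2_stretched, arr1 - arr2))  # True
```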

Some of the scenarios where arrays cannot be broadcast are:

# trailing dimensions do not match
A      (1d array):  3
B      (1d array):  4

# when other dimensions do not match
A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3
# Here the last dimension of the smaller array is 1, but second to last does not match
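The compatibility rule and the failing cases above can be checked with a small pure-Python helper. This is a sketch of the rule as described in this post, not NumPy's internal implementation, and the function name can_broadcast is made up for this example. A missing leading dimension is treated as length 1.

```python
from itertools import zip_longest

def can_broadcast(shape_a, shape_b):
    """Return True if two shapes are compatible under the rule above."""
    # Walk both shapes from the trailing dimension;
    # a missing dimension counts as length 1
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for dim_a, dim_b in pairs:
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True

print(can_broadcast((3,), (4,)))         # False: trailing dimensions 3 and 4 differ
print(can_broadcast((2, 1), (8, 4, 3)))  # False: second-to-last dimensions 2 and 4 differ
print(can_broadcast((3, 2), (2,)))       # True: the arr1 and arr2 shapes from earlier
```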

Rules for Broadcast

There are some significantly more complex cases too. A more rigorous definition of when an arbitrary number of arrays of any shape can broadcast together is as follows.

A set of arrays is “broadcastable” to the same shape if the following rules produce a valid result, meaning one of the following is true:

1. The arrays all have exactly the same shape.

>>> arr3 = np.arange(1,16).reshape((3,5))
>>> arr3
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])
>>> arr4 = np.array([[4,5,6,7,8],[1,2,3,4,5],[0,0,0,0,0]])
>>> arr4
array([[4, 5, 6, 7, 8],
       [1, 2, 3, 4, 5],
       [0, 0, 0, 0, 0]])
# Checking if the two arrays have the same shape
>>> arr3.shape == arr4.shape
True
# Same shape, hence broadcastable
>>> arr3 + arr4
array([[ 5,  7,  9, 11, 13],
       [ 7,  9, 11, 13, 15],
       [11, 12, 13, 14, 15]])

2. The arrays all have the same number of dimensions, and the length of each dimension is either a common length or 1.

# Checking if arr3 and arr4 have the same number of dimensions
>>> arr3.ndim == arr4.ndim
True
# Each pair of dimension lengths is also equal (3 and 3, 5 and 5), hence broadcastable
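Rule 2 also covers the case where a dimension has length 1 in one of the arrays. Here is a minimal sketch, assuming a hypothetical column array col of shape (3, 1) that does not appear in the original example:

```python
import numpy as np

arr3 = np.arange(1, 16).reshape((3, 5))  # shape (3, 5)
col = np.array([[10], [20], [30]])       # shape (3, 1), hypothetical values

# Same number of dimensions; the lengths are 3 vs 3 and 5 vs 1
print(arr3.ndim == col.ndim)  # True

# The length-1 axis of col is stretched across the 5 columns of arr3
print(arr3 + col)
# [[11 12 13 14 15]
#  [26 27 28 29 30]
#  [41 42 43 44 45]]
```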

3. The arrays that have too few dimensions can have their shapes prepended with a dimension of length 1 to satisfy property #2.

# If one of the arrays has fewer dimensions, dimensions of length 1 can be prepended to its shape to match the larger array's number of dimensions
>>> arr3 = np.arange(1,16).reshape((3,5))
>>> arr3.shape
(3, 5)
>>> arr4 = np.array([1,2,3,4,5])
# Prepending 1 to its shape makes it (1,5)
# Comparing the two shapes, the arrays now become broadcastable
>>> arr3 + arr4
array([[ 2,  4,  6,  8, 10],
       [ 7,  9, 11, 13, 15],
       [12, 14, 16, 18, 20]])

>>> arr5 = np.array([1,2,3])
>>> arr5.shape
(3,)
# Prepending 1 makes the shape (1,3), which still does not make arr3 and arr5 broadcastable, because the trailing dimensions 5 and 3 differ
>>> arr3 + arr5
ValueError: operands could not be broadcast together with shapes (3,5) (3,)
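If the intent was to add one value of arr5 to each row of arr3, one possible fix is to give arr5 a trailing axis of length 1 so that the shapes become (3, 5) and (3, 1). The sketch below assumes that intent; the name arr5_col is introduced only for illustration.

```python
import numpy as np

arr3 = np.arange(1, 16).reshape((3, 5))
arr5 = np.array([1, 2, 3])

# Add a trailing axis of length 1: shape (3,) becomes (3, 1)
arr5_col = arr5[:, np.newaxis]
print(arr5_col.shape)  # (3, 1)

# (3, 5) and (3, 1) are compatible, so each row of arr3 gets its own offset
print(arr3 + arr5_col)
# [[ 2  3  4  5  6]
#  [ 8  9 10 11 12]
#  [14 15 16 17 18]]
```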


While working on data analysis or machine learning projects, having a solid understanding of NumPy is nearly mandatory, because packages for data analysis are either built on NumPy or work heavily with it. NumPy, in turn, draws its power from vectorization and broadcasting. So, understanding what these concepts offer gives an extra edge when manipulating data.

