R: The Statistical Programming Language


R is a powerful language used widely for data analysis and statistical computing. It was developed in the early 1990s and is now one of the most popular languages among statisticians, data analysts, researchers, and marketers, who use it to retrieve, clean, analyze, visualize, and present data. It is free and open source. It is also cross-platform: R code written on one operating system can usually be run on another with little or no change.

IEEE Spectrum publishes a ranking of the most popular programming languages each year. R was ranked 5th in 2016, up from 6th in 2015. It is notable for a specialized language like R to rank above a general-purpose language like C#.

R is easy to learn. All you need is data and a clear question that you want to answer by analyzing that data. However, programmers who come from a Python, PHP, or Java background might find R quirky and confusing at first, because its syntax differs somewhat from that of other common programming languages.

To install R on an Ubuntu system, use the following commands:

sudo apt-get update
sudo apt-get install r-base

After installation, type R in your terminal and you are good to go!

Basics of R Programming

R can be used like an interactive calculator, and one of its principal uses is carrying out mathematical and statistical calculations, from simple arithmetic to far more complex work.
To get familiar with the R environment, start by typing a few basic calculations at the console. For instance, entering 2 + 3 at the prompt prints the answer 5.
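A short session along these lines illustrates the idea. This is only a sketch: the variable names are made up for illustration, and the comments show what R prints for each result.

```r
# Simple arithmetic, as you might type it at the R prompt
total     <- 2 + 3      # addition
ratio     <- 10 / 4     # division with / returns a double
power     <- 2^10       # exponentiation
root      <- sqrt(16)   # built-in square root
remainder <- 7 %% 3     # modulo (remainder after division)

print(total)      # [1] 5
print(ratio)      # [1] 2.5
print(power)      # [1] 1024
print(root)       # [1] 4
print(remainder)  # [1] 1
```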

When R prints an answer such as 5, the line begins with [1] rather than the > prompt. R is telling you that 5 is the first element of the answer. At the moment this does not seem very useful, but it becomes helpful later, when answers contain many elements and wrap across several lines.

Datatypes

R works with objects: everything you create and manipulate is stored as a named object. For example, if you are conducting an experiment and collecting data from several samples, you would create several named data objects in R so that you can work on them and run your analyses later.
A vector, a matrix, a data frame, and even a single value are all objects. R has five commonly used basic (atomic) classes of objects:

1. Character
2. Numeric (Real Numbers)
3. Integer (Whole Numbers)
4. Complex
5. Logical (TRUE / FALSE)
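
The class() function reports which of these classes a value belongs to. The snippet below is a small sketch with arbitrary example values; note that a whole number is stored as numeric unless it carries the L suffix.

```r
# One value of each basic class; class() reports how R classifies it
values  <- list("hello", 3.14, 5L, 2+3i, TRUE)
classes <- sapply(values, class)

print(classes)
# [1] "character" "numeric"   "integer"   "complex"   "logical"
```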

On top of these basic classes, R provides several data structures, including vectors (numeric, integer, etc.), lists, matrices, factors, and data frames. Let’s understand them one by one.

Vector

A vector contains elements of the same type. The type can be logical, integer, double, character, complex, or raw.
If you try to mix values of different types in one vector, coercion occurs: R converts all the elements to a single common type. Coercion goes from the more specific to the more general type, that is, from logical to integer to double to character.
Vectors are generally created using the c() function, which combines (concatenates) its arguments.
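As a rough illustration of both points, the sketch below builds a few vectors with c() and then mixes types to trigger coercion. All variable names and values are invented for the example.

```r
# Vectors built with c(); every element in a vector shares one type
heights  <- c(1.62, 1.75, 1.80)   # numeric (double)
initials <- c("a", "b", "c")      # character
flags    <- c(TRUE, FALSE, TRUE)  # logical

# Mixing types triggers coercion toward the more general type
lgl_int  <- c(TRUE, 2L)           # logical + integer -> integer
int_dbl  <- c(1L, 2.5)            # integer + double  -> numeric (double)
anything <- c(1.5, "a", TRUE)     # anything + character -> character

class(lgl_int)    # "integer"
class(int_dbl)    # "numeric"
class(anything)   # "character"
print(anything)   # elements are now "1.5" "a" "TRUE"
```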

List

A list is a special type of vector whose elements can be of different data types. Its elements can also be other data structures, such as vectors, data frames, or other lists.
A list is created using the list() function.
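The sketch below shows one way such a list might look. The element names and values are hypothetical and chosen only to show different types side by side.

```r
# A list can hold elements of different types, including other lists
project <- list(
  name     = "survey",             # character
  year     = 2016L,                # integer
  scores   = c(4.5, 3.8, 5.0),     # numeric vector
  finished = FALSE,                # logical
  contact  = list(city = "Delhi")  # nested list
)

project$name             # "survey"
project[["scores"]][2]   # 3.8
project$contact$city     # "Delhi"
length(project)          # 5
```

Elements can be reached by name with $ or [[ ]], and nested elements by chaining those operators.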

Matrices

Matrices are R objects whose elements are arranged in a two-dimensional rectangular layout. All elements must be of the same atomic type. A matrix can hold only characters or only logical values, but such matrices are less common; most matrices contain numeric elements and are used in mathematical calculations.
A matrix is created using the matrix() function.

The basic syntax for creating a matrix in R is:
matrix(data, nrow, ncol, byrow, dimnames)

Where, data is the input vector whose elements fill the matrix
nrow is the number of rows to create
ncol is the number of columns to create
byrow is a logical flag. If TRUE, the input elements are filled in row by row; the default, FALSE, fills column by column.
dimnames is a list holding the names assigned to the rows and columns
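
To make the arguments concrete, here is a minimal sketch that uses all five of them. The row and column names are arbitrary labels picked for the example.

```r
# Fill a 2 x 3 matrix row by row and label its rows and columns
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
print(m)
#    a b c
# r1 1 2 3
# r2 4 5 6

dim(m)         # 2 3
m["r2", "b"]   # 5

# With the default byrow = FALSE the same data fills column by column
m_col <- matrix(1:6, nrow = 2)
m_col[1, 2]    # 3
```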

Factors
A factor is a data structure used for fields that take only a predefined, finite number of values (categorical data).
For example, a data field such as marital status may contain only the values single, married, separated, divorced, or widowed.
In such a case the possible values are known beforehand, and these predefined, distinct values are called levels. Factors are created using the factor() function.
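Using the marital status example above, a factor could be sketched as follows. The observed values are invented; the levels argument lists every allowed value, including ones that do not occur in the data.

```r
# Observed values, plus the full set of allowed levels
status <- factor(
  c("single", "married", "married", "divorced", "single"),
  levels = c("single", "married", "separated", "divorced", "widowed")
)

levels(status)       # all five allowed values, in the order given
nlevels(status)      # 5
as.integer(status)   # 1 2 2 4 1 (position of each value among the levels)

counts <- table(status)   # frequency of every level, including unused ones
counts[["married"]]       # 2
counts[["widowed"]]       # 0
```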

Data Frames

The data frame is the most commonly used data structure in R. It stores tabular data. It differs from a matrix: in a matrix, every element must have the same class, whereas a data frame is a list of equal-length vectors, so each column can have its own class.
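A small sketch of a data frame with three columns of different classes follows. The column names and values are made up for illustration; stringsAsFactors = FALSE is included so that the text column stays character on R versions older than 4.0.

```r
# Three columns, each with its own class
people <- data.frame(
  name   = c("Asha", "Ben", "Chen"),
  age    = c(28, 35, 42),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

col_classes <- sapply(people, class)
print(col_classes)   # name: character, age: numeric, member: logical

dim(people)          # 3 3 (rows, columns)

# Rows can be filtered with a logical condition on a column
older <- people[people$age > 30, ]
older$name           # "Ben" "Chen"
```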

Functions in R

Functions break code into smaller logical parts that are easier to maintain and understand.
A function is created with the function keyword, followed by its arguments in parentheses and a body in braces, and is usually assigned to a name.
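The general shape can be sketched as below. rectangle_area and its arguments are hypothetical names used only to show the syntax, including a default argument value.

```r
# General shape: name <- function(arguments) { body }
# The value of the last evaluated expression is returned
rectangle_area <- function(width, height = 1) {
  width * height
}

rectangle_area(3, 4)   # 12
rectangle_area(5)      # 5, because height falls back to its default of 1
```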

R provides a number of built-in functions, such as seq(), mean(), max(), sum(), and paste().
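A quick sketch of these built-in functions in use, with arbitrary input values:

```r
evens   <- seq(2, 10, by = 2)       # 2 4 6 8 10
average <- mean(evens)              # 6
largest <- max(evens)               # 10
total   <- sum(evens)               # 30
label   <- paste("Total:", total)   # "Total: 30"

print(label)   # [1] "Total: 30"
```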

User-defined functions
Let’s start with our own Hello World program!
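One possible version of such a program is sketched below. The function name hello_world and its name argument are illustrative choices, not part of R itself.

```r
# A user-defined function with one argument that has a default value
hello_world <- function(name = "World") {
  greeting <- paste0("Hello, ", name, "!")
  print(greeting)
  invisible(greeting)   # return the text without printing it a second time
}

hello_world()      # [1] "Hello, World!"
hello_world("R")   # [1] "Hello, R!"
```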

So, that was the introduction to R. We’ll dive deeper into R in upcoming posts. 🙂

