Topic 3 R Programming - Short Introduction

You might not think that programmers are artists, but programming is an extremely creative profession. It’s logic-based creativity. - John Romero

Here we will cover the basic structures of R programming, including control flow (if else) and loops (iteration routines) followed by writing our first function.
We will confine our discussion to the beginner’s level.

3.1 Programming Control Flow

Control flow (or flow control) is the well-defined sequence of conditional statements, loops and other statements that directs an R script (or code in a general sense) to execute one thing or another, depending on the conditions written in the program.

3.1.1 if-else Conditional Statements

We use if-else conditional statements when we want the R program to branch out in different directions based on a logical condition.
The following example compares the mean prices of two stocks and assigns the greater mean to a variable.

data_stocks = read.csv("data/us_stocks.csv")
# remove NAs from the data
data_stocks = na.omit(data_stocks)
m_msft = mean(data_stocks$MSFT)
m_aapl = mean(data_stocks$AAPL)
if (m_msft > m_aapl) {
    g_mean = m_msft
    message("Msft mean is higher")
} else {
    g_mean = m_aapl
    message("Aapl mean is higher")
}
g_mean  #print greater mean

[1] 207.7967

Because if-else is an expression that returns a value, the if-else block in the example above can be reduced to a single line.
Note that the curly brackets are optional when a branch contains just one statement; they are required when a branch is a block of several statements. It's easiest to always use them to avoid confusion.
R also has a vectorised function \(\mathtt{ifelse}\), which gives the same result as the example above. See \(\mathtt{help(ifelse)}\) for more details.

# arguments to ifelse
args(ifelse)

function (test, yes, no) 
NULL

g_mean = ifelse(m_msft > m_aapl, m_msft, m_aapl)
g_mean

[1] 207.7967
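The one-line form of if-else mentioned above can be sketched as follows. This is a minimal illustration only: the names m_a, m_b, prices and size and all the numbers are made up and stand in for the values computed from the data. The second half shows how \(\mathtt{ifelse}\) differs from a plain if-else: it tests every element of a vector.

```r
# hypothetical mean prices standing in for m_msft and m_aapl
m_a = 26.9
m_b = 207.8

# if-else used as an expression: the value of the chosen branch is assigned
g_mean = if (m_a > m_b) m_a else m_b
g_mean

# ifelse() is vectorised: it tests each element and returns a vector
prices = c(10, 250, 40)  # made-up values
size = ifelse(prices > 100, "high", "low")
size
```

A plain if only accepts a single logical value as its condition, so \(\mathtt{ifelse}\) is the natural choice when the test applies to a whole column.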

3.1.2 Loops

Loops are a common feature of almost all programming languages.
R provides three basic loops using \(\mathtt{for}\), \(\mathtt{while}\) and \(\mathtt{repeat}\). The example below uses a \(\mathtt{for}\) loop to print the running sum of the integers 1 to 15.

# construct the loop
j = 0
for (i in 1:15) {
    j = j + i  #add i to j 
    print(j)  #print the sequential sum
}

[1] 1
[1] 3
[1] 6
[1] 10
[1] 15
[1] 21
[1] 28
[1] 36
[1] 45
[1] 55
[1] 66
[1] 78
[1] 91
[1] 105
[1] 120

3.1.3 \(\mathtt{while}\) loop

A \(\mathtt{while}\) loop evaluates an expression repeatedly for as long as a condition is TRUE. Let's repeat the above example using a \(\mathtt{while}\) loop.

# initialise j and i
j = 0
i = 1
# one can also use i<=15
while (i < 16) {
    j = j + i
    i = i + 1
    print(j)
}

[1] 1
[1] 3
[1] 6
[1] 10
[1] 15
[1] 21
[1] 28
[1] 36
[1] 45
[1] 55
[1] 66
[1] 78
[1] 91
[1] 105
[1] 120
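A \(\mathtt{while}\) loop is most useful when the number of iterations is not known in advance. As a small sketch (the threshold of 100 and the names total and n are illustrative assumptions), the loop below keeps adding integers until the running sum first exceeds 100.

```r
# find how many integers must be added before the running sum exceeds 100
total = 0
n = 0
while (total <= 100) {
    n = n + 1
    total = total + n
}
n      # number of terms added
total  # first running sum above 100
```

The result agrees with the printed sequence above, where 105 is the first running sum greater than 100.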

3.1.4 \(\mathtt{repeat}\) loop

\(\mathtt{repeat}\) executes the same expression over and over until a \(\mathtt{break}\) statement, usually triggered by a condition, ends the loop.

# initialise j and i
j = 0
i = 1
repeat {
    j = j + i
    i = i + 1
    print(j)
    if (i > 15)
        break
}

[1] 1
[1] 3
[1] 6
[1] 10
[1] 15
[1] 21
[1] 28
[1] 36
[1] 45
[1] 55
[1] 66
[1] 78
[1] 91
[1] 105
[1] 120

All three loops can be used for iterative operations. Loops come in handy when you have to iterate over medium to large data by row or column.⁶
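As a small illustration of iterating by column, the sketch below loops over the columns of a made-up data frame and stores the maximum of each column. The data frame toy_prices, its column names and its values are assumptions for illustration; they are not the stock data used elsewhere in this topic.

```r
# a small made-up data frame; names and values are illustrative only
toy_prices = data.frame(A = c(1, 4, 2), B = c(10, 8, 12))

# pre-allocate a numeric vector holding one result per column
col_max = numeric(ncol(toy_prices))
names(col_max) = colnames(toy_prices)

# seq_len() gives 1, 2, ..., ncol and is safe even when there are no columns
for (i in seq_len(ncol(toy_prices))) {
    col_max[i] = max(toy_prices[, i])
}
col_max
```

Pre-allocating the result vector, rather than growing it inside the loop, is a common habit that keeps loops reasonably fast.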

3.2 Functions in R

In every research field there are a few calculations (statistical or otherwise) that users perform frequently, for example the calculation of returns in finance research.
R lets you create your own functions, which evaluate a set of arguments and return an output value. Functions, like other values in R, are stored as objects.
Functions in R are created with the keyword \(\mathtt{function}\), which takes the following syntax.

\[\mathtt{function(arguments)\ body}\]

The arguments are the values/defaults/variables which are used in the body of the function to evaluate an expression.
The body of the function is enclosed in curly braces. The function is assigned to a named object and called by passing arguments to the object.
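As a minimal sketch of this syntax, the function below computes simple returns, the finance calculation mentioned earlier. The name simple_returns, its argument and the price vector are illustrative assumptions and not part of the original material.

```r
# simple_returns is a hypothetical helper: (p_t - p_{t-1}) / p_{t-1}
simple_returns = function(prices) {
    n = length(prices)
    # divide each price change by the previous price
    ret = diff(prices) / prices[-n]
    return(ret)
}

# made-up prices, used only to show the call
p = c(100, 110, 99)
simple_returns(p)
```

The function is assigned to the name simple_returns and then called by passing the vector p as its argument.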
The following example illustrates this by creating a function that calculates the mean of every column in the data set \(\mathtt{data\_stocks}\).

# the following function takes 2 arguments: x, a data frame, and dates, which
# indicates whether the first column holds dates
cal_mean = function(x, dates = TRUE) {
    num_cols = ncol(x)  #calculate the number of columns
    # an alternative would be
    # num_cols = ifelse(dates == TRUE, num_cols - 1, num_cols)
    # but let's use a list and a loop to refresh our concepts
    m_stocks = list()  #creating an empty list

    # choose the starting column for the loop: skip the first column if it
    # holds dates (dates are basically row names, so a more generalised
    # version would check for row names)
    l = ifelse(dates == TRUE, 2, 1)
    j = 1  #starting point in the list m_stocks
    for (i in l:num_cols) {
        m_stocks[[j]] = mean(x[, i])
        j = j + 1
    }
    # index the column names directly so this also works for a single column
    names(m_stocks) = colnames(x)[l:num_cols]
    return(m_stocks)
}

# let's call the function cal_mean
cal_mean(data_stocks, TRUE)

$MSFT
[1] 26.91177

$IBM
[1] 122.3303

$AAPL
[1] 207.7967

$MCD
[1] 58.95141

$PG
[1] 61.32512

$GOOG
[1] 469.9453

# let's call the function with no dates column
cal_mean(data_stocks[, 2:ncol(data_stocks)], FALSE)

$MSFT
[1] 26.91177

$IBM
[1] 122.3303

$AAPL
[1] 207.7967

$MCD
[1] 58.95141

$PG
[1] 61.32512

$GOOG
[1] 469.9453
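The same kind of column-wise calculation can also be written without an explicit loop, using the \(\mathtt{lapply}\) and \(\mathtt{sapply}\) functions from base R. The sketch below is only an illustration: toy_stocks is a made-up data frame standing in for \(\mathtt{data\_stocks}\), with a date column first, and the first column is dropped in the same spirit as dates = TRUE in cal_mean.

```r
# made-up data with a date column first, mimicking the layout cal_mean expects
toy_stocks = data.frame(
    Date = c("2020-01-01", "2020-01-02", "2020-01-03"),
    X = c(1, 2, 3),
    Y = c(10, 20, 60)
)

# drop the date column; drop = FALSE keeps a data frame even for one column
values = toy_stocks[, -1, drop = FALSE]

# sapply applies mean to every column and returns a named numeric vector
col_means = sapply(values, mean)
col_means

# lapply does the same but returns a named list, like cal_mean
col_means_list = lapply(values, mean)
col_means_list
```

Which form is preferable depends on what the result feeds into: a named vector is convenient for further arithmetic, while a list matches the output of cal_mean.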

⁶ Using simple loops can be resource intensive for large datasets or operations. Other approaches, such as iterative functions like \(\mathtt{lapply}\) and \(\mathtt{sapply}\), or parallel computing methods, can give better results in such cases.