# Topic 3 R Programming - Short Introduction

You might not think that programmers are artists, but programming is an extremely creative profession. It’s logic-based creativity. - John Romero

• Here we will cover the basic structures of R programming, including control flow (if else) and loops (iteration routines) followed by writing our first function.

• We will confine our discussion to the beginner’s level.

## 3.1 Programming Control Flow

• Control flow (or flow control) is a well-defined sequence of conditional statements, loops and other statements that directs the R script (or code in a general sense) to execute one thing or another, depending on the conditions written in the program.

### 3.1.1 if-else Conditional Statements

• We use if-else conditional statements when we want the R program to branch out in different directions based on a logical condition.

• The following example compares the mean prices of two stocks and assigns the greater mean to a variable.

data_stocks = read.csv("data/us_stocks.csv")
# remove NAs from the data
data_stocks = na.omit(data_stocks)
# mean of each of the two stocks
m_msft = mean(data_stocks$MSFT)
m_aapl = mean(data_stocks$AAPL)
if (m_msft > m_aapl) {
    g_mean = m_msft
    message("Msft mean is higher")
} else {
    g_mean = m_aapl
    message("Aapl mean is higher")
}
g_mean  #print greater mean
[1] 207.7967
• if-else is also an expression that returns a value, so the if-else block in the example above can be reduced to a single line that assigns the result of the whole if-else directly to $$\mathtt{g\_mean}$$.

• Note that the curly brackets are optional when a branch contains just one statement. They are required when a branch contains a block of several statements. It’s easiest to always use them to avoid confusion.

• R also has a function $$\mathtt{ifelse}$$ which performs the same operation as in the example above and also works element by element on vectors. See $$\mathtt{help(ifelse)}$$ for more details.

# arguments to ifelse
args(ifelse)
function (test, yes, no)
NULL
g_mean = ifelse(m_msft > m_aapl, m_msft, m_aapl)
g_mean
[1] 207.7967
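
The two shortcuts described above can be sketched side by side. This is only an illustration: the means and the vector $$\mathtt{prices}$$ are made-up values, not figures computed from the data file.

```r
# Hypothetical means standing in for the values computed from the data
m_msft = 26.9
m_aapl = 207.8

# if-else is an expression, so its value can be assigned in one line
g_mean = if (m_msft > m_aapl) m_msft else m_aapl
print(g_mean)

# ifelse() is vectorised: the test is applied to every element
prices = c(10, 250, 40)  # made-up prices for illustration
size = ifelse(prices > 100, "high", "low")
print(size)
```

The one-line if-else expects a single TRUE or FALSE as its condition, while $$\mathtt{ifelse}$$ returns one result per element of the test vector.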

### 3.1.2 Loops

• Loops are a common feature of almost all programming languages.

• R provides three basic loops using $$\mathtt{for}$$, $$\mathtt{while}$$ and $$\mathtt{repeat}$$. The following $$\mathtt{for}$$ loop prints the running sum of the integers 1 to 15.

# construct the loop
j = 0
for (i in 1:15) {
    j = j + i  #add i to j
    print(j)  #print the sequential sum
}
[1] 1
[1] 3
[1] 6
[1] 10
[1] 15
[1] 21
[1] 28
[1] 36
[1] 45
[1] 55
[1] 66
[1] 78
[1] 91
[1] 105
[1] 120
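
As a quick cross-check of the loop above, the same running sum can be obtained without an explicit loop using the base R function $$\mathtt{cumsum}$$. This is a minimal sketch; the variable name $$\mathtt{running}$$ is chosen here for illustration.

```r
# Running sum of the integers 1 to 15 in a single vectorised call
running = cumsum(1:15)
print(running)

# The last element matches the final value of j in the for loop
print(running[15])
```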

### 3.1.3 $$\mathtt{while}$$ loop

• A $$\mathtt{while}$$ loop evaluates an expression or a function for as long as a condition is TRUE. Let’s repeat the above example using a $$\mathtt{while}$$ loop.

# initialise j and i
j = 0
i = 1
# one can also use i<=15
while (i < 16) {
    j = j + i
    i = i + 1
    print(j)
}
[1] 1
[1] 3
[1] 6
[1] 10
[1] 15
[1] 21
[1] 28
[1] 36
[1] 45
[1] 55
[1] 66
[1] 78
[1] 91
[1] 105
[1] 120
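
A $$\mathtt{while}$$ loop is most useful when the number of iterations is not known in advance. The sketch below keeps adding integers until the running sum exceeds a threshold; the threshold $$\mathtt{limit}$$ is a hypothetical value picked for illustration.

```r
limit = 100  # hypothetical threshold, not from the original example
j = 0        # running sum
i = 0        # last integer added

# keep adding the next integer until the sum exceeds the limit
while (j <= limit) {
    i = i + 1
    j = j + i
}
print(i)  # number of integers that were added
print(j)  # first running sum above the limit
```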

### 3.1.4 $$\mathtt{repeat}$$ loop

• $$\mathtt{repeat}$$ executes the same expression again and again until a condition triggers a $$\mathtt{break}$$.

# initialise j and i
j = 0
i = 1
repeat {
    j = j + i
    i = i + 1
    print(j)
    # leave the loop once 15 numbers have been added
    if (i > 15) {
        break
    }
}
[1] 1
[1] 3
[1] 6
[1] 10
[1] 15
[1] 21
[1] 28
[1] 36
[1] 45
[1] 55
[1] 66
[1] 78
[1] 91
[1] 105
[1] 120

All three of these loops can be used for iterative operations. Loops come in handy when you have to iterate over medium to large data by row or column (see the footnote at the end of this topic).
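
A minimal sketch of iterating over the columns of a data frame, first with a $$\mathtt{for}$$ loop and then with $$\mathtt{sapply}$$, one of the iterative functions named in the footnote. The data frame $$\mathtt{df}$$ and its values are made up for illustration.

```r
# Small made-up data frame with two numeric columns
df = data.frame(A = c(1, 2, 3), B = c(10, 20, 30))

# Loop version: pre-allocate the result, then fill it column by column
col_means = numeric(ncol(df))
for (k in seq_len(ncol(df))) {
    col_means[k] = mean(df[[k]])
}
names(col_means) = colnames(df)
print(col_means)

# sapply version: applies mean to every column and keeps the column names
col_means_apply = sapply(df, mean)
print(col_means_apply)
```

Both versions give one mean per column. The $$\mathtt{sapply}$$ call is shorter and needs no counter or pre-allocated result.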

## 3.2 Functions in R

• In every research field there are a few statistical (or other) calculations that users perform frequently, for example the calculation of returns in finance research.

• R lets you create your own functions, which evaluate a set of arguments and return an output value. Functions are themselves stored as R objects.

• Functions in R are created with the keyword $$\mathtt{function}$$, which has the following syntax.

$$\mathtt{function(arguments)\ body}$$

• The arguments are the values/defaults/variables which are used in the body of the function to evaluate an expression.

• The body of the function is enclosed in curly braces. The function is assigned to a named object and called by passing arguments to the object.

• The following example illustrates this by creating a function that calculates the mean of every column in the data set $$\mathtt{data\_stocks}$$.

# the following function takes 2 arguments: x, a data frame, and dates, which
# indicates whether there are dates in the first column
cal_mean = function(x, dates = TRUE) {
    num_cols = ncol(x)  #calculate the number of columns
    # num_cols=ifelse(dates==TRUE,num_cols-1,num_cols)
    # let's use a list and a loop to refresh our concepts
    m_stocks = list()  #creating an empty list

    # set the starting column of the for loop based on the dates argument: we
    # skip the dates column if it is present (dates are basically row names,
    # so a more generalised version would check for row names)
    l = ifelse(dates == TRUE, 2, 1)
    j = 1  #starting point in the list m_stocks
    for (i in l:num_cols) {
        m_stocks[[j]] = mean(x[, i])
        j = j + 1
    }
    # name each element after its column (also works for a single column)
    names(m_stocks) = colnames(x)[l:num_cols]
    return(m_stocks)
}
# let's call the function cal_mean
cal_mean(data_stocks, TRUE)
$MSFT
[1] 26.91177

$IBM
[1] 122.3303

$AAPL
[1] 207.7967

$MCD
[1] 58.95141

$PG
[1] 61.32512

$GOOG
[1] 469.9453

# let's call the function on data without the dates column
cal_mean(data_stocks[, 2:ncol(data_stocks)], FALSE)
$MSFT
[1] 26.91177

$IBM
[1] 122.3303

$AAPL
[1] 207.7967

$MCD
[1] 58.95141

$PG
[1] 61.32512

$GOOG
[1] 469.9453
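
The function syntax described in this section, including a default argument, can also be seen in a smaller sketch built around the returns calculation mentioned at the start of the section. The function name $$\mathtt{simple\_return}$$, its argument $$\mathtt{percent}$$ and the prices are hypothetical and are not part of the original example.

```r
# Simple returns from a vector of prices: (p[t] - p[t-1]) / p[t-1]
# percent is a hypothetical option with a default value of FALSE
simple_return = function(prices, percent = FALSE) {
    n = length(prices)
    ret = diff(prices) / prices[-n]  # divide each change by the earlier price
    if (percent) {
        ret = ret * 100
    }
    return(ret)
}

p = c(100, 110, 99)  # made-up prices
print(simple_return(p))                  # default: returns as fractions
print(simple_return(p, percent = TRUE))  # same returns in percent
```

Calling the function without $$\mathtt{percent}$$ uses the default given in the definition, in the same way that $$\mathtt{cal\_mean}$$ assumes $$\mathtt{dates = TRUE}$$ unless told otherwise.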

1. Simple loops can be resource intensive for large data sets or operations. Other approaches, such as iterative functions like $$\mathtt{lapply}$$ and $$\mathtt{sapply}$$ or parallel computing methods, can give better results in such cases.