Topic 3 R Programming - Short Introduction
You might not think that programmers are artists, but programming is an extremely creative profession. It’s logic-based creativity. - John Romero
Here we will cover the basic structures of R programming, including control flow (if-else) and loops (iteration routines), followed by writing our first function.
We will confine our discussion to the beginner’s level.
3.1 Programming Control Flow
- Control flow (or flow control) refers to the conditional statements and loops that determine which parts of an R script (or, more generally, a program) are executed, and how many times, based on the conditions written in the program.
3.1.1 if-else Conditional Statements
We use if-else conditional statements when we want the R program to branch out in different directions based on a logical condition.
The following example compares the mean prices of two stocks and assigns the greater mean to a variable.
data_stocks = read.csv("data/us_stocks.csv")
# remove NAs from the data
data_stocks = na.omit(data_stocks)
# mean price of each stock
m_msft = mean(data_stocks$MSFT)
m_aapl = mean(data_stocks$AAPL)
if (m_msft > m_aapl) {
    g_mean = m_msft
    message("Msft mean is higher")
} else {
    g_mean = m_aapl
    message("Aapl mean is higher")
}
# print the greater mean
g_mean
[1] 207.7967
The if-else construct also works as a function call that returns a value, so the if-else block in the example above can be reduced to a single line.
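A minimal sketch of that one-line form is given here. So that it runs on its own, the two means are typed in by hand, using the values printed for MSFT and AAPL later in this chapter; in practice they would come from data_stocks as above.

```r
# Means copied from the cal_mean() output shown later in this chapter
m_msft = 26.91177
m_aapl = 207.7967

# if-else used as an expression: the value of the taken branch is assigned to g_mean
g_mean = if (m_msft > m_aapl) m_msft else m_aapl
g_mean
# [1] 207.7967
```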
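Note that the curly braces are optional when a branch contains just one statement; they are required when a branch contains a block of several statements. It is often easiest to always use them to avoid confusion.
R also has a function \(\mathtt{ifelse}\), which performs the same selection as the example above. Unlike if-else, it is vectorised: it tests each element of a vector and returns a vector of results. See \(\mathtt{help(ifelse)}\) for more details.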
# arguments to ifelse
args(ifelse)
function (test, yes, no)
NULL
g_mean = ifelse(m_msft > m_aapl, m_msft, m_aapl)
g_mean
[1] 207.7967
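Because \(\mathtt{ifelse}\) is vectorised, it can label every element of a vector in one call, which a plain if-else (expecting a single TRUE/FALSE condition) cannot do. A small sketch with made-up daily returns, used only for illustration:

```r
# Made-up daily returns, for illustration only
returns = c(0.012, -0.004, 0.000, -0.021, 0.007)

# ifelse() evaluates the test element by element and picks "up" or "down" for each
direction = ifelse(returns > 0, "up", "down")
direction  # one label per element of returns; a zero return is labelled "down"
```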
3.1.2 Loops
Loops are a common feature of almost all programming languages.
R provides three basic loops: \(\mathtt{for}\), \(\mathtt{while}\) and \(\mathtt{repeat}\). The following \(\mathtt{for}\) loop adds the integers 1 to 15 one at a time and prints the running sum.
# construct the loop
j = 0
for (i in 1:15) {
    j = j + i  # add i to j
    print(j)  # print the running sum
}
[1] 1
[1] 3
[1] 6
[1] 10
[1] 15
[1] 21
[1] 28
[1] 36
[1] 45
[1] 55
[1] 66
[1] 78
[1] 91
[1] 105
[1] 120
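The running sum printed above is what base R's \(\mathtt{cumsum}\) computes in a single vectorised call. The sketch below stores each partial sum in a pre-allocated vector instead of printing it, then checks that the loop and \(\mathtt{cumsum}\) agree.

```r
n = 15

# 1) for loop, storing each partial sum in a pre-allocated numeric vector
running = numeric(n)
j = 0
for (i in seq_len(n)) {
    j = j + i
    running[i] = j
}

# 2) vectorised alternative from base R
vectorised = cumsum(seq_len(n))

# both give 1, 3, 6, ..., 120 (cumsum of integers is integer, so convert before comparing)
all.equal(running, as.numeric(vectorised))
# [1] TRUE
```

Pre-allocating running avoids growing the vector inside the loop, which becomes slow for long loops.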
3.1.3 \(\mathtt{while}\) loop
- A \(\mathtt{while}\) loop evaluates an expression repeatedly as long as a condition is TRUE; the condition is checked before each iteration. Let's repeat the above example using a \(\mathtt{while}\) loop.
# initialise j and i
j = 0
i = 1
# one can also use i <= 15
while (i < 16) {
    j = j + i
    i = i + 1
    print(j)
}
[1] 1
[1] 3
[1] 6
[1] 10
[1] 15
[1] 21
[1] 28
[1] 36
[1] 45
[1] 55
[1] 66
[1] 78
[1] 91
[1] 105
[1] 120
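A \(\mathtt{while}\) loop is most natural when the number of iterations is not known in advance. As a hypothetical finance-flavoured sketch, the loop below counts the whole years needed for an investment to at least double at an assumed growth rate of 7% per year.

```r
rate = 0.07   # assumed annual growth rate (illustrative)
value = 1     # start with one unit of money
years = 0

# keep compounding until the value has at least doubled
while (value < 2) {
    value = value * (1 + rate)
    years = years + 1
}
years
# [1] 11
```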
3.1.4 \(\mathtt{repeat}\) loop
- A \(\mathtt{repeat}\) loop evaluates the same block of code indefinitely until a \(\mathtt{break}\) statement is reached, so the exit condition must be written inside the loop body.
# initialise j and i
j = 0
i = 1
repeat {
    j = j + i
    i = i + 1
    print(j)
    if (i > 15)
        break
}
[1] 1
[1] 3
[1] 6
[1] 10
[1] 15
[1] 21
[1] 28
[1] 36
[1] 45
[1] 55
[1] 66
[1] 78
[1] 91
[1] 105
[1] 120
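Since \(\mathtt{repeat}\) checks its exit condition wherever \(\mathtt{break}\) is placed, it suits searches that stop as soon as a target is reached. A small sketch, using an arbitrary threshold of 100, finds the smallest n for which 1 + 2 + ... + n exceeds the threshold.

```r
threshold = 100   # arbitrary target, chosen for illustration
total = 0
n = 0

repeat {
    n = n + 1
    total = total + n
    if (total > threshold) break   # leave the loop once the sum passes the threshold
}
n
# [1] 14
total
# [1] 105
```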
All three loops can be used for iterative operations. They come in handy when you need to iterate over medium to large data sets by row or by column.[6]
3.2 Functions in R
In every research field there are a few statistical (or other) calculations that users perform frequently, for example the calculation of returns in finance research.
R lets users create their own functions, which evaluate a set of arguments and return an output value; functions are themselves stored as R objects.
Functions in R are created with the keyword \(\mathtt{function}\), using the following syntax.
\[\mathtt{function(arguments)\ \{\ body\ \}}\]
The arguments are the values (optionally with default values) that are used in the body of the function to evaluate an expression.
The body of the function is enclosed in curly braces. The function is usually assigned to a named object, which is then called by passing arguments to it.
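As a small sketch of this syntax, the hypothetical function below (its name and the prices are illustrative, not from the data set) computes simple returns \(r_t = p_t / p_{t-1} - 1\) from a vector of prices, with a default argument controlling whether the result is expressed in percent.

```r
# Hypothetical helper: simple returns from a vector of prices
simple_returns = function(prices, percent = FALSE) {
    n = length(prices)
    r = prices[-1] / prices[-n] - 1   # each price divided by the previous one, minus 1
    if (percent) r = 100 * r          # optional scaling, controlled by a default argument
    return(r)
}

p = c(100, 110, 99)                   # made-up prices
simple_returns(p)                     # about 0.1 and -0.1
simple_returns(p, percent = TRUE)     # about 10 and -10
```

The result has one element fewer than prices, since the first price has no previous price to compare with.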
The following example illustrates this by creating a function that calculates the mean of every column in the data set \(\mathtt{data\_stocks}\).
# the following function takes 2 arguments: x, a data frame, and dates, a
# logical indicating whether the first column contains dates
cal_mean = function(x, dates = TRUE) {
    num_cols = ncol(x)  # calculate the number of columns
    # let's use a list and a loop to refresh our concepts
    m_stocks = list()  # create an empty list
    # starting column for the for loop: skip the first column if it contains
    # dates (dates are basically row names, so a more general version would
    # check for row names)
    l = ifelse(dates == TRUE, 2, 1)
    j = 1  # starting position in the list m_stocks
    for (i in l:num_cols) {
        m_stocks[[j]] = mean(x[, i])
        j = j + 1
    }
    names(m_stocks) = colnames(x)[l:num_cols]
    return(m_stocks)
}
# let's call the function cal_mean
cal_mean(data_stocks, TRUE)
$MSFT
[1] 26.91177
$IBM
[1] 122.3303
$AAPL
[1] 207.7967
$MCD
[1] 58.95141
$PG
[1] 61.32512
$GOOG
[1] 469.9453
# let's call the function on the data without the dates column
cal_mean(data_stocks[, 2:ncol(data_stocks)], FALSE)
$MSFT
[1] 26.91177
$IBM
[1] 122.3303
$AAPL
[1] 207.7967
$MCD
[1] 58.95141
$PG
[1] 61.32512
$GOOG
[1] 469.9453
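Column-wise summaries like this are common enough that base R offers loop-free alternatives. The sketch below uses \(\mathtt{sapply}\) and \(\mathtt{colMeans}\) on a small made-up data frame that only mimics the assumed layout of data_stocks (a date column followed by price columns); the numbers are illustrative, not real prices.

```r
# Made-up data frame mimicking the assumed layout: dates first, then prices
prices = data.frame(
    Date = c("2020-01-02", "2020-01-03", "2020-01-06"),
    MSFT = c(10, 12, 14),
    AAPL = c(20, 22, 27)
)
num_only = prices[, -1]          # drop the dates column

sapply(num_only, mean)           # named numeric vector: MSFT = 12, AAPL = 23
as.list(colMeans(num_only))      # same values, returned as a named list like cal_mean()
```

\(\mathtt{sapply}\) applies \(\mathtt{mean}\) to each column and simplifies the result to a vector, while \(\mathtt{colMeans}\) is a specialised, faster function for exactly this task.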
[6] Simple loops can be slow and resource-intensive for large data sets or operations. In such cases, alternatives such as the apply-family functions (\(\mathtt{lapply, sapply}\), etc.), vectorised functions, or parallel computing methods often perform better.