Another suggestion from my colleague in the office: he asked me about loops in R. I decided to write a more general post, because loops are structures you will find in almost every programming language. So it is worth understanding how they work first and then applying the concept to a specific language.
Loops are important when you want to repeat an action (or a group of actions) many times. There are two main types of loops:
- loops that do something a known number of times (or once for each value in a known set), called for loops or for do loops. For example, printing the numbers 1 to 10.
- loops that keep doing something until a certain condition is reached, called while loops. For example, start from 10 and keep subtracting 1 until the value drops below 3.
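To make the two examples in the list concrete, here is a minimal R sketch. The while part is only a preview, since while loops get their own post, and the variable names i and x are my own choice.

```r
# for loop: repeat an action a known number of times (print 1 to 10)
for (i in 1:10) print(i)

# while loop (preview only): start at 10 and subtract 1
# while the value is still 3 or more
x = 10
while (x >= 3) {
  x = x - 1
}
print(x)  # x is now 2, the first value below 3
```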
In this post we will see only for do loops. While loops will come in another post.
For do loops
This is the most common type of loop, because we usually need to repeat some actions for a set of known scenarios: for example, running different models, running the same model with different values, or checking the values of a matrix cell by cell.
See the pseudocode below:
for mod A B C D
do
run mod for set of observations 1
end for
Let’s analyze this pseudocode. Here we want to run 4 different models (A, B, C and D) on the same set of observations. In general (I don’t know if this holds for every language), a for do loop structure needs a start point and an end point. In this example the loop starts with the word for and finishes at end for, so the language knows that all actions between these two points should be repeated for each model.
Notice that the names of the models don’t appear inside the loop; only the word mod is used. This is because mod is a variable that takes the value of a different model in each round of the loop.
Translated into plain language, the code reads something like:
for each model A, B, C and D, stored in the variable mod in each round
run mod with the set of observations 1
when all models have been used, finish
In summary, a for do loop structure has a start point and an end point, with all the actions to repeat placed between them.
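A minimal sketch of the first pseudocode in R, assuming that "running" a model can be stood in for by a text message, because A, B, C and D are placeholder names and not real models. The vector runs is a hypothetical name used only to collect what each round did.

```r
# A, B, C and D are placeholders; "running" a model is simulated
# here with a text message only
runs = character(0)
for (mod in c("A", "B", "C", "D")) {
  # in each round, mod holds the next model name
  runs = c(runs, paste("run", mod, "for set of observations 1"))
}
print(runs)
```

Note that the body only ever refers to mod, never to A, B, C or D directly, exactly as in the pseudocode.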
We can also nest loops inside other loops.
Suppose that you want to test each model on 2 different sets of data. You can do something like:
for mod A B C D
do
for nset 1 2
do
run mod for set of observations nset
end for
end for
So, in the pseudocode above, you will run each model twice, each time with a different set of observations. The outer for ... end for pair defines the loop over the models, and the inner for ... end for pair defines the loop over the data sets. The words mod and nset are the variables that store the current model and the current data set. The loop will be processed like this:
Round 1: Run mod = A with nset = data set 1
Round 2: Run mod = A with nset = data set 2
Round 3: Run mod = B with nset = data set 1
...
Round 8: Run mod = D with nset = data set 2
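The nested pseudocode can be sketched the same way in R. Again the models and data sets are placeholders, and the round counter k and the vector rounds are hypothetical names added only to reproduce the list of rounds.

```r
# placeholder model names and data set numbers, as in the pseudocode
rounds = character(0)
k = 0  # round counter
for (mod in c("A", "B", "C", "D")) {    # outer loop: models
  for (nset in 1:2) {                   # inner loop: data sets
    k = k + 1
    rounds = c(rounds, paste0("Round ", k, ": Run mod = ", mod,
                              " with nset = data set ", nset))
  }
}
writeLines(rounds)
```

writeLines(rounds) prints the eight rounds listed above, from "Round 1: Run mod = A with nset = data set 1" to "Round 8: Run mod = D with nset = data set 2": the inner loop finishes all data sets before the outer loop moves to the next model.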
Now let’s see some examples in R and bash.
Examples with R
# let's start with a very simple one
# If the loop does only one thing, you can write the command directly after for()
for (names in c("Paul", "Carol", "Julie", "Jim")) print(paste0("Hi, ", names, "!"))
# Or we can use {} to group all the actions included in the loop.
for (names in c("Paul", "Carol", "Julie", "Jim")) {
  print(paste0("Hi, ", names, "!"))
}
# now nested for do loops
for (names in c("Paul", "Carol", "Julie", "Jim")) {
  for (hg in c("Hi", "Bye")) {
    print(paste0(hg, ", ", names, "!"))
  }
}
As you can see, in R the loop starts with the word for, and inside the “( )” it defines the loop variable and the values this variable will take. The actions to repeat are written between the “{ }”, and the closing “}” marks the end of the loop.
Note: if the loop does only one action, R lets you put the command right after for(), as in for(i in 1:10) print(i). But since in most cases we will do more than one thing, the actions should go between “{ }”.
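A small sketch of why the braces matter when there is more than one action. It relies on the fact that in R a semicolon ends the statement, so without braces only the first command after for() is repeated. The variables a, b, a2 and b2 are just illustrative counters.

```r
# without braces only the first command belongs to the loop:
# a = a + 1 is repeated 3 times, b = b + 1 runs once, after the loop
a = 0
b = 0
for (i in 1:3) a = a + 1; b = b + 1

# with braces both commands are repeated in every round
a2 = 0
b2 = 0
for (i in 1:3) {
  a2 = a2 + 1
  b2 = b2 + 1
}
c(a = a, b = b, a2 = a2, b2 = b2)  # a = 3, b = 1, a2 = 3, b2 = 3
```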
In the nested example, the two loops combine each name with “Hi” and “Bye”.
Another typical use of two nested loops is to walk through a matrix, cell by cell:
## Example walking in a matrix cell by cell
x = matrix(rnorm(200), ncol = 10, nrow = 20)
# i walks the rows and j the columns; print every cell with a value <= 0
for (i in seq_len(nrow(x))) {
  for (j in seq_len(ncol(x))) {
    if (x[i, j] <= 0) print(paste("In row", i, "and column", j, "we have", x[i, j]))
  }
}
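A variation of the matrix walk, assuming you want to keep the positions instead of only printing them. The names hits and check are hypothetical, set.seed() only makes the run reproducible, and which(..., arr.ind = TRUE) is used just as a cross-check that the two loops really visit every cell.

```r
set.seed(1)  # fixed seed only so the example is reproducible
x = matrix(rnorm(200), ncol = 10, nrow = 20)

hits = NULL  # one row (row, col) for each cell with a value <= 0
for (i in seq_len(nrow(x))) {
  for (j in seq_len(ncol(x))) {
    if (x[i, j] <= 0) hits = rbind(hits, c(row = i, col = j))
  }
}

# cross-check without loops: which() finds the same cells,
# but column by column, so sort its result row by row first
check = which(x <= 0, arr.ind = TRUE)
check = check[order(check[, "row"], check[, "col"]), , drop = FALSE]
identical(unname(hits), unname(check))
```

The last line should return TRUE. The sort is needed because which() lists the cells column by column, while the loops go row by row (i is the outer loop).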
So, let’s try a more complex problem.
Let’s simulate two sets of observations and two factors to test some models.
### Two data sets and 2 factors
# Prepare data
f1 = as.factor(sample(1:4, 100, replace = TRUE))
f2 = as.factor(sample(1:10, 100, replace = TRUE))
d1 = rnorm(100)
d2 = d1 + rnorm(100, mean = 10)
# strings used in the loop to write the right-hand side of the models
prepmod1 = "~f1"
prepmod2 = "~f2"
prepmod3 = "~f1+f2"
# and store them in a variable
models = c(prepmod1, prepmod2, prepmod3)
# Two loops: the first one iterates over the different models
for (mod in models) {
  # the second one iterates over the data sets
  for (ds in c("d1", "d2")) {
    # build the formula for lm(), e.g. d1~f1
    form = as.formula(paste0(ds, mod))
    # fit the model once and reuse it
    fit = lm(form)
    # print the summary for each model
    print(summary(fit))
    # save the diagnostic plots; "~" and "+" are replaced to get a clean file name
    png(paste0(ds, gsub("[~+]", "_", mod), ".png"))
    layout(matrix(1:4, ncol = 2))
    plot(fit, main = deparse(form), ask = FALSE)
    layout(1)
    dev.off()
  # finish the second loop
  }
# finish the first loop
}
All data were randomly generated, so don’t expect any meaningful results. This is just an example of combining several actions inside two loops; it is not the best way to do this job.
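If you want to look at the fits after the loop instead of only printing them, one option is to store each lm() in a named list. This is a minimal sketch under my own assumptions: the list name fits, the seed and the use of the adjusted R-squared for comparison are illustrative choices, and the plots are left out.

```r
set.seed(42)  # only for reproducibility
f1 = as.factor(sample(1:4, 100, replace = TRUE))
f2 = as.factor(sample(1:10, 100, replace = TRUE))
d1 = rnorm(100)
d2 = d1 + rnorm(100, mean = 10)
models = c("~f1", "~f2", "~f1+f2")

fits = list()  # one fitted lm per combination, named like "d1~f1"
for (mod in models) {
  for (ds in c("d1", "d2")) {
    fits[[paste0(ds, mod)]] = lm(as.formula(paste0(ds, mod)))
  }
}
names(fits)
# e.g. compare the adjusted R-squared of the six fits
sapply(fits, function(fit) summary(fit)$adj.r.squared)
```

Because the list is filled inside the loops, the names follow the loop order: first d1 and d2 with ~f1, then with ~f2, then with ~f1+f2.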
Examples with bash
Just to see the structure in a different language, below are the simple examples in bash.
# simple loop in bash
for names in Paul Carol Julie Jim
do
  echo Hi, $names!
done
# nested loops in bash
for names in Paul Carol Julie Jim
do
  for hg in Hi Bye
  do
    echo $hg, $names!
  done
done
As you can see, in bash the loop structure is for (a variable and a set of values), do (something), and done to finish.
In another post we will talk about while loops.