Associate Professor of Computer Science at the Université d'Angers
R provides several kinds of data: booleans, strings, numeric values, vectors, matrices and data frames (tables whose columns can hold different value types).
In R, T stands for TRUE and F for FALSE. Boolean expressions are built with operators similar to those of the C language: & (AND), | (OR) and ! (NOT).
> a <- T # a is TRUE
> b <- T # b is TRUE
> a & b # a AND b
[1] TRUE
> a | b # a OR b
[1] TRUE
> a | !b # a OR NOT(b)
[1] TRUE
> a & !b # a AND NOT(b)
[1] FALSE
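As a complement to the session above, here is a minimal sketch (the variable names are illustrative and not part of the original example) of two points that are easy to miss: T and F are ordinary variables that can be overwritten, and the element-wise operators & and | differ from the scalar forms && and ||.

```r
# T and F are ordinary variables, so they can be overwritten by accident;
# TRUE and FALSE are reserved words and cannot be reassigned.
T <- 0                       # legal, but T no longer means TRUE here
t_is_true <- isTRUE(T)       # FALSE
rm(T)                        # remove the masking variable; T is TRUE again

# & and | work element by element on vectors
u <- c(TRUE, FALSE, TRUE)
v <- c(TRUE, TRUE, FALSE)
and_uv <- u & v              # TRUE FALSE FALSE
or_uv  <- u | v              # TRUE TRUE TRUE

# && and || expect single values and stop as soon as the result is known
short <- FALSE && stop("never evaluated")   # FALSE, stop() is not called
```

For this reason, scripts usually spell out TRUE and FALSE, and use && and || inside if conditions.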
There are many functions that operate on strings. Here are some examples:
> m <- "hello world!"
> nchar(m) # size of the string
[1] 12
> toupper(m) # convert to uppercase
[1] "HELLO WORLD!"
> substr(m, 7,4) ## substr(string, start, stop): the result is empty
## here because stop (4) is less than start (7)
[1] ""
> substr(m, 7,11) ## extract substring from characters 7 to 11
[1] "world"
> paste(rep("=*=", 6)) ## without collapse =, the vector is returned as is
[1] "=*=" "=*=" "=*=" "=*=" "=*=" "=*="
> stringi::stri_dup("=*=",6) ## requires the stringi package
[1] "=*==*==*==*==*==*="
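The same repetition can be obtained with base R only. The following is a small sketch, assuming R 3.3.0 or later for strrep(); the names banner, banner2, m2 and msg are illustrative.

```r
m <- "hello world!"

# paste() joins the elements of a vector only when 'collapse' is given
banner <- paste(rep("=*=", 6), collapse = "")   # "=*==*==*==*==*==*="

# strrep() repeats a string directly, without an extra package
banner2 <- strrep("=*=", 6)

# replace the first occurrence of a pattern
m2 <- sub("world", "R", m)                      # "hello R!"

# build a message from values
msg <- sprintf("%s has %d characters", m, nchar(m))
```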
The function strsplit splits a string according to a pattern.
Let's consider the following string: "A␣␣␣␣text␣␣with␣spaces␣␣␣␣"
## use a single space to separate words
> strsplit("A    text  with spaces    ", " ")
[[1]]
[1] "A" "" "" "" "text" "" "with" "spaces"
[9] "" "" ""
## use a regular expression: the separator is a run of one or more spaces
> strsplit("A    text  with spaces    ", "[ ]+", perl = TRUE)
[[1]]
[1] "A" "text" "with" "spaces"
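Two other ways of getting only the words can be sketched as follows: filtering out the empty strings after the split, or trimming the string first. The variable names are illustrative, and trimws() is assumed to be available (R 3.2.0 or later).

```r
s <- "A    text  with spaces    "

# split on single spaces, then drop the empty strings
words <- strsplit(s, " ")[[1]]
words <- words[words != ""]

# or strip the surrounding blanks first and split on runs of spaces
words2 <- strsplit(trimws(s), " +")[[1]]
```

Note that strsplit returns a list with one element per input string, hence the [[1]] to extract the character vector.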
Numeric values are defined naturally:
> a <- 3.1415
> b <- a * 2.3 - 7.55
> a
[1] 3.1415
> b
[1] -0.32455
You can create a vector using different operators and functions:
> x <- c(1.2, 3.5, 4, -7.2)
> x
[1] 1.2 3.5 4.0 -7.2
> y <- 1:5
> y
[1] 1 2 3 4 5
## create a vector of 10 values initialized to 0
> x <- numeric(10)
> x
[1] 0 0 0 0 0 0 0 0 0 0
When you use seq(), you can specify either to = or length.out = :
> seq(from = 1, to = 10, by = 0.5)
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
[16] 8.5 9.0 9.5 10.0
> seq(from = 10, to = 0, by = -2)
[1] 10 8 6 4 2 0
> seq(from = 1, by = 2, length.out=3)
[1] 1 3 5
# repeat 'F' four times
> rep('F', 4)
[1] "F" "F" "F" "F"
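A few more variations on seq() and rep(), given as a short sketch with illustrative variable names:

```r
# 'to' combined with 'length.out': the step is computed for you
halves <- seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# seq_len(n) behaves like 1:n but yields an empty vector when n is 0
idx   <- seq_len(4)                               # 1 2 3 4
empty <- seq_len(0)                               # integer(0)

# rep() can repeat a whole vector or each of its elements
whole <- rep(c("a", "b"), times = 2)              # "a" "b" "a" "b"
each  <- rep(c("a", "b"), each = 2)               # "a" "a" "b" "b"
```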
You can access and modify the contents of a vector. Note that indices start at 1:
> x <- c(1.2, 3.5, 4, -7.2)
> x[1]
[1] 1.2
> x[4]
[1] -7.2
> x[4] <- 333.33
> x
[1] 1.20 3.50 4.00 333.33
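Indices can also be ranges, negative numbers or logical conditions. The sketch below uses illustrative variable names and the same starting values as the first vector example.

```r
x <- c(1.2, 3.5, 4, -7.2)

first_two     <- x[1:2]     # 1.2 3.5
without_first <- x[-1]      # a negative index drops that element
positive      <- x[x > 0]   # a logical condition keeps matching elements

x[x < 0] <- 0               # replace the negative values by 0
x[6] <- 9                   # assigning past the end extends the vector;
                            # the gap (element 5) is filled with NA
```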
To get the list of variables you can use ls() and to remove a variable you can use rm(). See section Manage, save and restore environment for details.
> x <- c(1,3,5,7)
> pi_div_2 = pi / 2
> pi = 3.1415
> pi_div_2 = pi / 2
> rasp_pi = 6.28
# two different variables
> my.long.variable=1
> my_long_variable=2
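The use of ls() and rm() can be sketched as follows. A separate environment (here called env, an illustrative name) is used only to keep the example independent of whatever is already in your workspace; at the console you would normally call ls() and rm(x) directly.

```r
# a separate environment keeps the example independent of the workspace
env <- new.env()
assign("x", c(1, 3, 5, 7), envir = env)
assign("my.long.variable", 1, envir = env)
assign("my_long_variable", 2, envir = env)

before <- ls(env)            # the three names defined above
rm("x", envir = env)         # remove the variable x
after <- ls(env)             # x is no longer listed

has_x <- exists("x", envir = env, inherits = FALSE)   # FALSE
```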
A data frame is used for storing data tables. It is a list of vectors of equal length. In the following example you can see how to handle a data frame. We define information about Intel CPUs:
cpus <- c("i7-4790", "i5-3570K", "i5-7400", "i7-2600")
launch.date <- c("2014-04-01", "2012-04-01", "2017-01-01", "2011-01-01")
tdp <- c(84,77,64,95)
litho <- c(22,22,14,32)
cores <- c(4,4,4,4)
threads <- c(8,4,4,8)
mydata <- data.frame(cpus, launch.date, tdp, litho, cores, threads, stringsAsFactors = FALSE)
# read the data frame
> source("dfdef.rs")
> mydata
/*
cpus launch.date tdp litho cores threads
1 i7-4790 2014-04-01 84 22 4 8
2 i5-3570K 2012-04-01 77 22 4 4
3 i5-7400 2017-01-01 64 14 4 4
4 i7-2600 2011-01-01 95 32 4 8
*/
# show number of rows
> nrow(mydata)
[1] 4
# show number of columns
> ncol(mydata)
[1] 6
# get information about first column using numeric index
> mydata[1]
/*
cpus
1 i7-4790
2 i5-3570K
3 i5-7400
4 i7-2600
*/
# or use
> mydata[,1]
[1] "i7-4790" "i5-3570K" "i5-7400" "i7-2600"
# get the first column by name
> mydata$cpus
[1] "i7-4790" "i5-3570K" "i5-7400" "i7-2600"
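To inspect a data frame further, the following sketch (on a three-column version of the table, with illustrative variable names) checks its dimensions and column types, shows the difference between [ ] and [[ ]], and converts the launch dates from strings to Date values.

```r
cpus <- c("i7-4790", "i5-3570K", "i5-7400", "i7-2600")
launch.date <- c("2014-04-01", "2012-04-01", "2017-01-01", "2011-01-01")
tdp <- c(84, 77, 64, 95)
mydata <- data.frame(cpus, launch.date, tdp, stringsAsFactors = FALSE)

dims  <- dim(mydata)               # 4 rows, 3 columns
types <- sapply(mydata, class)     # "character" "character" "numeric"

# [[ ]] and $ return the column itself, [ ] returns a one-column data frame
is_vec <- is.vector(mydata[["cpus"]])      # TRUE
is_df  <- is.data.frame(mydata["cpus"])    # TRUE

# convert the dates so that they can be compared or formatted
mydata$launch.date <- as.Date(mydata$launch.date)
year <- format(mydata$launch.date, "%Y")   # "2014" "2012" "2017" "2011"
```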
Note that you can set the names of the columns when defining the data frame, or change them afterwards with colnames():
# during definition we use tdp.in.W (Thermal Design Power in Watts)
> mydata <- data.frame(cpus, launch.date, tdp.in.W = tdp, litho, cores, threads)
# get the names of the columns
> colnames(mydata)
[1] "cpus" "launch.date" "tdp.in.W" "litho" "cores"
[6] "threads"
# modify the names of the columns
> colnames(mydata) <- c("CPUS", "launch", "TDP.W", "LITHO.NM", "CORES", "TH")
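When only one column needs a new name, it is not necessary to retype all of them. A minimal sketch on a shortened table (the new names tdp.in.W and litho.in.nm are illustrative):

```r
mydata <- data.frame(cpus = c("i7-4790", "i5-3570K"),
                     tdp = c(84, 77),
                     litho = c(22, 22),
                     stringsAsFactors = FALSE)

# rename a column selected by its current name
names(mydata)[names(mydata) == "tdp"] <- "tdp.in.W"

# rename a column selected by its position
colnames(mydata)[3] <- "litho.in.nm"
```

For a data frame, names() and colnames() refer to the same column names.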
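# insert a new row (here on the data frame with its original column names);
# a list keeps the numeric columns numeric, whereas c() would
# convert every value to a string
> newrow <- list("Pentium-M-760", "2014-04-01", 27, 90, 1, 1)
> mydata <- rbind(mydata[1:2,], newrow, mydata[-(1:2),])
> mydata
cpus launch.date tdp litho cores threads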
1 i7-4790 2014-04-01 84 22 4 8
2 i5-3570K 2012-04-01 77 22 4 4
3 Pentium-M-760 2014-04-01 27 90 1 1
31 i5-7400 2017-01-01 64 14 4 4
4 i7-2600 2011-01-01 95 32 4 8
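One point to keep in mind when building a new row: c() converts mixed values to strings, so numeric columns can silently become character columns. The sketch below, on a shortened two-column table with illustrative names, builds the row as a one-row data frame instead and renumbers the rows afterwards.

```r
mydata <- data.frame(cpus = c("i7-4790", "i5-3570K", "i5-7400"),
                     tdp = c(84, 77, 64),
                     stringsAsFactors = FALSE)

# c() coerces everything to a single type: 27 becomes the string "27"
as_chars <- c("Pentium-M-760", 27)
chars_class <- class(as_chars)              # "character"

# a one-row data frame keeps each column's type
newrow <- data.frame(cpus = "Pentium-M-760", tdp = 27,
                     stringsAsFactors = FALSE)
mydata2 <- rbind(mydata[1:2, ], newrow, mydata[-(1:2), ])

# renumber the rows 1..n
rownames(mydata2) <- NULL
```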
# remove the 4th row
> mydata <- mydata[-c(4), ]
> mydata
cpus launch.date tdp litho cores threads
1 i7-4790 2014-04-01 84 22 4 8
2 i5-3570K 2012-04-01 77 22 4 4
3 Pentium-M-760 2014-04-01 27 90 1 1
4 i7-2600 2011-01-01 95 32 4 8
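# or remove the row by its name; the logical form is safe even when
# no row matches (-which(...) would then return an empty data frame)
> mydata <- mydata[!(rownames(mydata) %in% c("31")),]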
You can select information in a data frame by rows, by columns or using a filter. The examples below use the original four-row data frame.
# selection by columns
> mydata[,c("cpus","tdp")]
/*
cpus tdp
1 i7-4790 84
2 i5-3570K 77
3 i5-7400 64
4 i7-2600 95
*/
# selection by rows
> mydata[c(2:3),]
/*
cpus launch.date tdp litho cores threads
2 i5-3570K 2012-04-01 77 22 4 4
3 i5-7400 2017-01-01 64 14 4 4
*/
# selection by rows and columns
> mydata[c(2:3),c("cpus","tdp")]
/*
cpus tdp
2 i5-3570K 77
3 i5-7400 64
*/
# selection using a filter
# we want the cpus that have a thermal design power
# greater than 80
> mydata[mydata$tdp > 80,]
/*
cpus launch.date tdp litho cores threads
1 i7-4790 2014-04-01 84 22 4 8
4 i7-2600 2011-01-01 95 32 4 8
*/
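Filters can combine several conditions, and subset() offers a more compact syntax. A sketch on the same CPU data, with illustrative variable names:

```r
mydata <- data.frame(cpus = c("i7-4790", "i5-3570K", "i5-7400", "i7-2600"),
                     tdp = c(84, 77, 64, 95),
                     litho = c(22, 22, 14, 32),
                     cores = c(4, 4, 4, 4),
                     threads = c(8, 4, 4, 8),
                     stringsAsFactors = FALSE)

# one condition: TDP greater than 80
hot <- mydata[mydata$tdp > 80, ]

# two conditions combined with & (AND)
hot_22 <- mydata[mydata$tdp > 80 & mydata$litho == 22, ]

# subset() evaluates the condition inside the data frame, so the
# column names can be used directly; 'select' picks the columns
ht <- subset(mydata, threads > cores, select = c(cpus, threads))
```

Here hot holds the i7-4790 and the i7-2600, hot_22 only the i7-4790, and ht the two CPUs that expose more threads than cores.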
# sort by one column
> mydata[order(mydata$litho),]
/*
cpus launch.date tdp litho cores threads
3 i5-7400 2017-01-01 64 14 4 4
1 i7-4790 2014-04-01 84 22 4 8
2 i5-3570K 2012-04-01 77 22 4 4
4 i7-2600 2011-01-01 95 32 4 8
*/
# sort by two columns (litho first, then tdp)
> mydata[order(mydata$litho, mydata$tdp),]
/*
cpus launch.date tdp litho cores threads
3 i5-7400 2017-01-01 64 14 4 4
2 i5-3570K 2012-04-01 77 22 4 4
1 i7-4790 2014-04-01 84 22 4 8
4 i7-2600 2011-01-01 95 32 4 8
*/
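Sorting in decreasing order, or mixing increasing and decreasing keys, can be sketched as follows (same CPU data, illustrative variable names):

```r
mydata <- data.frame(cpus = c("i7-4790", "i5-3570K", "i5-7400", "i7-2600"),
                     tdp = c(84, 77, 64, 95),
                     litho = c(22, 22, 14, 32),
                     stringsAsFactors = FALSE)

# highest TDP first
by_tdp_desc <- mydata[order(mydata$tdp, decreasing = TRUE), ]

# litho increasing, then TDP decreasing: negate the numeric key
mixed <- mydata[order(mydata$litho, -mydata$tdp), ]
```

Referring to the columns as mydata$litho and mydata$tdp makes the call independent of any vectors of the same name left in the workspace.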