This document has no dependencies.

Improvements and corrections to this document can be submitted through its GitHub repository.

A very brief overview of core R object types and how to subset them.

- “An Introduction to R” ships with R and can also be accessed on the web (HTML | PDF). This introduction contains a lot of useful material, but it is written very tersely; you will need to pay close attention to the details. It is useful to re-read this introduction after you have used R for a while; you are likely to learn new details you missed at first.

The most basic object in R is an atomic vector. Examples include `numeric`, `integer`, `logical`, `character` and `factor`. These objects have a single length and can have names, which can be used for indexing:

```
x <- 1:10
names(x) <- letters[1:10]
class(x)
```

`## [1] "integer"`

`x[1:3]`

```
## a b c
## 1 2 3
```

`x[c("a", "b")]`

```
## a b
## 1 2
```

The following types of atomic vectors are used frequently:

- `numeric` for numeric values.
- `integer` for integer values.
- `character` for characters (strings).
- `factor` for factors.
- `logical` for logical values.

All vectors can have missing values.
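As a brief illustration of missing values (a minimal sketch; the vector `v` is made up for this example), `NA` can appear in any atomic vector and `is.na` locates it:

```r
# A hypothetical numeric vector with one missing value
v <- c(1.5, NA, 3)

is.na(v)               # TRUE where the value is missing
anyNA(v)               # TRUE if at least one value is missing
mean(v)                # NA: missing values propagate by default
mean(v, na.rm = TRUE)  # drop missing values before computing the mean

# A vector keeps its type when it contains NA
class(c("a", NA))
class(c(TRUE, NA))
```

Many summary functions accept `na.rm = TRUE`, but it is worth checking the help page of each function rather than assuming it.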

Note: names of vectors do not need to be unique. This can lead to subsetting problems:

```
x <- 1:3
names(x) <- c("A", "A", "B")
x
```

```
## A A B
## 1 2 3
```

`x["A"]`

```
## A
## 1
```

Note that only the first match is returned and you don’t even get a warning, so watch out for non-unique names! You can check for unique names by using the functions `unique`, `duplicated` or (easiest) `anyDuplicated`.

`anyDuplicated(names(x))`

`## [1] 2`

```
names(x) <- c("A", "B", "C")
anyDuplicated(names(x))
```

`## [1] 0`

`anyDuplicated` returns the index of the first duplicated name, so `0` indicates nothing is duplicated.
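If you do end up with non-unique names, the following sketch shows one possible way to retrieve every match and to repair the names. It uses only base R; `make.unique` is one option among several, not a recommendation from the original text.

```r
x <- 1:3
names(x) <- c("A", "A", "B")

# Character subsetting returns only the first match
x["A"]

# A logical comparison on the names returns every match
x[names(x) == "A"]

# duplicated() flags the second and later occurrences of each name
duplicated(names(x))

# make.unique() appends a suffix to repeated names
names(x) <- make.unique(names(x))
names(x)
```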

The default in R is to represent numbers as `numeric`, NOT `integer`. This is something that can usually be ignored, but you might run into some issues in Bioconductor with this. Note that even constructions that look like `integer` are really `numeric`, although sequences created with `:` are `integer`:

```
x <- 1
class(x)
```

`## [1] "numeric"`

```
x <- 1:3
class(x)
```

`## [1] "integer"`

The way to make sure you get an `integer` in R is to append `L` to the number:

```
x <- 1L
class(x)
```

`## [1] "integer"`

So why distinguish between `integer` and `numeric`? Internally, the way computers represent and calculate numbers differs between `integer` and `numeric`:

1. `integer` mathematics is different.
2. `numeric` can hold much larger values than `integer`.
3. `numeric` takes up slightly more RAM (but nothing to worry about).
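The differences listed above can be observed directly. This is a small sketch using base R only; the vector length of 1000 is arbitrary.

```r
class(1L + 1L)    # integer plus integer stays integer
class(1L + 1)     # mixing with a numeric gives numeric
class(1L / 2L)    # ordinary division always gives numeric
5L %/% 2L         # integer division stays integer

# Memory use of 1000 integers versus 1000 numerics
object.size(integer(1000))
object.size(numeric(1000))
```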

Point 2 is something you can sometimes run into in Bioconductor. The maximum `integer` is

`.Machine$integer.max`

`## [1] 2147483647`

`2^31 -1 == .Machine$integer.max`

`## [1] TRUE`

`round(.Machine$integer.max / 10^6, 1)`

`## [1] 2147.5`

This number is smaller than the number of bases in the human genome, so we sometimes (accidentally) add up numbers whose sum exceeds it. The fix is to use `as.numeric` to convert the `integer` to `numeric`.
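A minimal sketch of the overflow and the `as.numeric` fix follows. The vector `lens` is a hypothetical stand-in for something like sequence lengths, and `suppressWarnings` is used only to keep the example quiet; interactively you would see an integer overflow warning.

```r
big <- .Machine$integer.max

# Integer arithmetic that exceeds the maximum yields NA (with a warning)
overflow <- suppressWarnings(big + 1L)
overflow

# Converting to numeric first avoids the overflow
fixed <- as.numeric(big) + 1
fixed

# The same applies when summing an integer vector
lens <- c(big, 1L)           # hypothetical lengths, for illustration only
suppressWarnings(sum(lens))  # NA because the total does not fit in an integer
sum(as.numeric(lens))
```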

This number is also the limit for how long an atomic vector can be. So you cannot have a single vector which is as long as the human genome. In R we are beginning to get support for something called “long vectors” which basically are … long vectors. But the support for long vectors is not yet pervasive.

A `matrix` is a two-dimensional object. All values in a `matrix` must have the same type (`numeric` or `character` or any of the other atomic vector types). It is optional to have `rownames` or `colnames`, and these names do not have to be unique.

```
x <- matrix(1:9, ncol = 3, nrow = 3)
rownames(x) <- c("A","B", "B")
x
```

```
## [,1] [,2] [,3]
## A 1 4 7
## B 2 5 8
## B 3 6 9
```

`dim(x)`

`## [1] 3 3`

`nrow(x)`

`## [1] 3`

`ncol(x)`

`## [1] 3`

Subsetting is two-dimensional; the first dimension is rows and the second is columns. You can even subset with a matrix of the same dimension, but watch out for the return object.

`x[1:2,]`

```
## [,1] [,2] [,3]
## A 1 4 7
## B 2 5 8
```

`x["B",]`

`## [1] 2 5 8`

`x[x >= 5]`

`## [1] 5 6 7 8 9`

(Note how subsetting with a non-unique name does not lead to an error.) If you grab a single row or a single column from a `matrix` you get a vector. Sometimes it is really nice to get a `matrix`; you do that by using `drop=FALSE` in the subsetting:

`x[1,]`

`## [1] 1 4 7`

`x[1,,drop=FALSE]`

```
## [,1] [,2] [,3]
## A 1 4 7
```

There are a lot of mathematical operations working on matrices, for example `rowSums`, `colSums` and things like `eigen` for eigenvector decomposition. I am a heavy user of the package *matrixStats* for the full suite of `rowXX` and `colXX` with `XX` being any standard statistical function such as `sd()`, `var()`, `quantile()` etc.
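As a rough sketch of row-wise and column-wise summaries, the example below uses only base R so that it runs without extra packages. Functions such as `rowSds` in *matrixStats* are intended as faster alternatives to the `apply` calls shown here; they are mentioned as pointers and not run in this example.

```r
x <- matrix(1:9, ncol = 3, nrow = 3)

rowSums(x)   # one sum per row
colSums(x)   # one sum per column
colMeans(x)  # one mean per column

# For functions without a dedicated row/column version, base R's apply()
# works: margin 1 is rows, margin 2 is columns
row_sd <- apply(x, 1, sd)
col_q  <- apply(x, 2, quantile, probs = 0.5)
row_sd
col_q
```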

Internally, a `matrix` is just a `vector` with a dimension attribute. In R we have column-first orientation, so the columns are filled up first:

`matrix(1:9, 3, 3)`

```
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
```

`matrix(1:9, 3, 3, byrow = TRUE)`

```
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
```
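To make the "vector with a dimension attribute" point concrete, here is a small sketch (the object `v` is invented for illustration) that builds a matrix by setting `dim` on a plain vector and then indexes it with a single number:

```r
v <- 1:6
dim(v) <- c(2, 3)   # adding a dim attribute turns the vector into a matrix
is.matrix(v)
v

# Because the data are stored column by column, a single index walks
# down the first column, then the second, and so on
v[3]                # same element as v[1, 2]
attributes(v)       # the only attribute is dim
```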

`list`s are like `vector`s, but can hold objects of arbitrary kinds.

```
x <- list(1:3, letters[1:3], is.numeric)
x
```

```
## [[1]]
## [1] 1 2 3
##
## [[2]]
## [1] "a" "b" "c"
##
## [[3]]
## function (x) .Primitive("is.numeric")
```

```
names(x) <- c("numbers", "letters", "function")
x[1:2]
```

```
## $numbers
## [1] 1 2 3
##
## $letters
## [1] "a" "b" "c"
```

`x[1]`

```
## $numbers
## [1] 1 2 3
```

`x[[1]]`

`## [1] 1 2 3`

See how subsetting with `[` creates another `list`. To get to the actual content of the first element, you need double brackets `[[`. The distinction between `[` and `[[` is critical to understand.
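One way to see why the distinction matters is to compare what each form returns and what happens when the result is passed to a function. This is a sketch on a small list similar to the one above, without the function element:

```r
x <- list(numbers = 1:3, letters = letters[1:3])

class(x[1])      # "list": single brackets keep the container
class(x[[1]])    # "integer": double brackets return the element itself

# The difference matters when passing the result to a function
sum(x[[1]])      # works on the integer vector
# sum(x[1])      # would fail: sum() cannot add up a list

length(x[1])     # 1 (a list with one element)
length(x[[1]])   # 3 (the vector itself)
```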

You can use `$` on a named list. However, R has something called “partial” matching for `$`:

`x$letters`

`## [1] "a" "b" "c"`

`x["letters"]`

```
## $letters
## [1] "a" "b" "c"
```

`x$let`

`## [1] "a" "b" "c"`

`x["let"]`

```
## $<NA>
## NULL
```
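For comparison, `[[` matches names exactly by default but accepts an `exact` argument. The sketch below, on a small named list, shows the three behaviours side by side:

```r
x <- list(numbers = 1:3, letters = letters[1:3])

x$let                      # `$` matches partially
x[["let"]]                 # `[[` requires an exact name by default: NULL
x[["let", exact = FALSE]]  # partial matching can be requested explicitly
```

Using full names with `[[` is a simple way to avoid surprises from partial matching in scripts.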

Trick: sometimes you want a list where each element is a single number. Use `as.list`:

`as.list(1:3)`

```
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
```

`list(1:3)`

```
## [[1]]
## [1] 1 2 3
```

It is quite common to have a `list` where each element is of the same kind, for example a `numeric` vector. You can apply a function to each element in the `list` by using `lapply()`; this returns another `list`, which is named if the input is.

```
x <- list(a = rnorm(3), b = rnorm(3))
lapply(x, mean)
```

```
## $a
## [1] -0.6467477
##
## $b
## [1] 0.6932006
```

If the output of the function is of the same kind for each element, you can simplify the output using `sapply` (simplify apply). This is particularly useful if the function in question returns a single number.

`sapply(x, mean)`

```
## a b
## -0.6467477 0.6932006
```
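The simplification depends on what the function returns. The sketch below uses fixed, made-up values instead of `rnorm` so that the results are reproducible; `vapply` is shown as a stricter relative of `sapply` and is not part of the original text.

```r
x <- list(a = c(1, 5, 3), b = c(10, 2, 6))   # made-up values

sapply(x, mean)          # one number per element: a named vector
r <- sapply(x, range)    # two numbers per element: a matrix, one column each
r

# vapply() does the same but checks the declared shape of each result
vapply(x, mean, numeric(1))
```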

`data.frame`s are fundamental to data analysis. They look like matrices, but each column can be a separate type, so you can mix and match different data types. They are required to have unique column and row names. If no row names are given, `1:nrow` is used.

```
x <- data.frame(sex = c("M", "M", "F"), age = c(32,34,29))
x
```

```
## sex age
## 1 M 32
## 2 M 34
## 3 F 29
```

You access columns by `$` or `[[`:

`x$sex`

```
## [1] M M F
## Levels: F M
```

`x[["sex"]]`

```
## [1] M M F
## Levels: F M
```

Note how `sex` was converted into a `factor`. This is a frequent source of errors, so much so that I highly encourage users to make sure they never have `factor`s in their `data.frame`s. This conversion can be disabled by `stringsAsFactors=FALSE`:

```
x <- data.frame(sex = c("M", "M", "F"), age = c(32,34,29), stringsAsFactors = FALSE)
x$sex
```

`## [1] "M" "M" "F"`

Behind the scenes, a `data.frame` is really a `list`. Why does this matter? Well, for one, it allows you to use `lapply` and `sapply` across the columns:

`sapply(x, class)`

```
## sex age
## "character" "numeric"
```
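The list nature of a `data.frame` can be checked directly. This is a short sketch that recreates the same small `data.frame` so it can be run on its own:

```r
x <- data.frame(sex = c("M", "M", "F"), age = c(32, 34, 29),
                stringsAsFactors = FALSE)

is.list(x)       # TRUE
length(x)        # number of columns, as for any list
names(x)         # the column names are the list names
x[["age"]]       # `[[` extracts a column, just like a list element
x["age"]         # `[` keeps the container: a one-column data.frame
```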

We often have to convert R objects from one type to another. For basic R types (as described above), you have the `as.XX` family of functions, with `XX` being any of the types of objects listed above.

`x`

```
## sex age
## 1 M 32
## 2 M 34
## 3 F 29
```

`as.matrix(x)`

```
## sex age
## [1,] "M" "32"
## [2,] "M" "34"
## [3,] "F" "29"
```

`as.list(x)`

```
## $sex
## [1] "M" "M" "F"
##
## $age
## [1] 32 34 29
```

When we convert the `data.frame` to a `matrix` it becomes a `character` matrix, because there is a `character` column and this is the only way to keep the contents.
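If you need a numeric matrix, one possible approach is to select only the numeric columns before converting. The sketch below recreates the `data.frame` from above; the names `m_all` and `m_age` are introduced here for illustration.

```r
x <- data.frame(sex = c("M", "M", "F"), age = c(32, 34, 29),
                stringsAsFactors = FALSE)

m_all <- as.matrix(x)                          # mixed columns: character
m_age <- as.matrix(x[, "age", drop = FALSE])   # numeric column only: numeric

typeof(m_all)
typeof(m_age)
dim(m_age)       # drop = FALSE keeps the result two-dimensional
```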

For more “complicated” objects there is a suite of `as()` functions, which you use as follows:

```
library(methods)
as(x, "matrix")
```

```
## sex age
## [1,] "M" "32"
## [2,] "M" "34"
## [3,] "F" "29"
```

This is how you convert most Bioconductor objects.
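The same calling pattern, with the target class given as a string, can be tried on a basic type. This is only a sketch on a base R vector; which conversions exist for a given Bioconductor class depends on the package that defines it.

```r
library(methods)

v <- 1:3
as(v, "character")                              # target class as a string
identical(as(v, "character"), as.character(v))  # same result as as.character()
is(v, "integer")                                # is() tests class membership
```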

```
## R version 3.2.1 (2015-06-18)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.5 (Yosemite)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] methods stats graphics grDevices utils datasets base
##
## other attached packages:
## [1] BiocStyle_1.6.0 rmarkdown_0.7
##
## loaded via a namespace (and not attached):
## [1] magrittr_1.5 formatR_1.2 tools_3.2.1 htmltools_0.2.6
## [5] yaml_2.1.13 stringi_0.5-5 knitr_1.11 stringr_1.0.0
## [9] digest_0.6.8 evaluate_0.7.2
```