2  Base R Review

2.1 Vectors

Consider this vector:

char_vector <- c("data", "science", "is", "fun")
char_vector
[1] "data"    "science" "is"      "fun"    

We can append elements to the end of the vector with c():

char_vector2 <- c(char_vector,"!!")
char_vector2
[1] "data"    "science" "is"      "fun"     "!!"     
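`c()` always appends at the end. To place a new element at a specific position, base R's `append()` accepts an `after` argument. A minimal sketch reusing the example words; `char_vector3` is a name introduced here only for illustration:

```r
char_vector <- c("data", "science", "is", "fun")

# Insert "really" after the 2nd element instead of at the end
char_vector3 <- append(char_vector, "really", after = 2)
char_vector3  # "data" "science" "really" "is" "fun"

# Prepending uses c() with the new element listed first
c("!!", char_vector)  # "!!" "data" "science" "is" "fun"
```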

Length of a vector:

length(char_vector2)
[1] 5

You can access an element of a vector by its position (indexing in R starts at 1):

char_vector2[2]
[1] "science"
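Square brackets accept more than a single position. The sketch below shows a few common forms of base R vector indexing on the same five-element vector: several positions at once, negative positions, and a logical condition:

```r
char_vector2 <- c("data", "science", "is", "fun", "!!")

char_vector2[c(1, 3)]               # several positions at once: "data" "is"
char_vector2[-5]                    # a negative index drops that element
char_vector2[length(char_vector2)]  # the last element: "!!"

# Logical indexing keeps the elements where the condition is TRUE
char_vector2[nchar(char_vector2) > 3]  # "data" "science"
```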

The class() function tells us what type of data an object contains:

class(char_vector2)
[1] "character"
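A vector holds a single type, so `class()` reports one value for the whole vector. A short sketch of the other common atomic types, and of what happens when types are mixed (`mixed` is an illustrative name):

```r
class(c(1.5, 2, 3))    # "numeric"
class(1:3)             # "integer" (the : operator produces integers)
class(c(TRUE, FALSE))  # "logical"

# Mixing types coerces every element to the most flexible type
mixed <- c(1, "a", TRUE)
class(mixed)  # "character"
mixed         # "1" "a" "TRUE"
```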

2.2 Dealing with missing data

First we create a vector with missing values:

x <- c("a", NA, "c", "d", NA)

We can see which elements of x are missing with the is.na() function:

is.na(x)
[1] FALSE  TRUE FALSE FALSE  TRUE

We can create a table of missing vs. non-missing values:

table(is.na(x))

FALSE  TRUE 
    3     2 
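Because `is.na()` returns a logical vector, and R treats `TRUE` as 1 and `FALSE` as 0 in arithmetic, a few base functions summarize missingness without building a table. A minimal sketch on the same vector:

```r
x <- c("a", NA, "c", "d", NA)

sum(is.na(x))    # number of missing values: 2
mean(is.na(x))   # proportion missing: 0.4
which(is.na(x))  # positions of the missing values: 2 5
anyNA(x)         # TRUE if at least one value is missing
```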

Note that empty strings are not treated as missing in R:

x <- c("a", "", "c", "d", "")
is.na(x)
[1] FALSE FALSE FALSE FALSE FALSE

It is therefore important to understand the data and decide what should count as “missing”. Once that is known, we can recode those values to NA:

x <- c("a", "", "c", "d", "")
x2 <- ifelse(x == "", NA, x)
is.na(x2)
[1] FALSE  TRUE FALSE FALSE  TRUE
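An alternative to `ifelse()` is to assign `NA` directly to the matching positions. The sketch also assumes, for illustration only, that strings made entirely of spaces should count as missing; `trimws()` strips the spaces before the comparison (`y` is a made-up example vector):

```r
x <- c("a", "", "c", "d", "")

# Replace matching elements in place instead of building a new vector
x[x == ""] <- NA
is.na(x)  # FALSE TRUE FALSE FALSE TRUE

# Assumption: whitespace-only strings are also "missing" in this data
y <- c("a", "  ", "c", "")
y[trimws(y) == ""] <- NA
is.na(y)  # FALSE TRUE FALSE TRUE
```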

2.3 Explore factors

The class of this vector will be character:

gender <- c("male", "female","female","male")
class(gender)
[1] "character"

We will now coerce gender to a factor:

gender <- factor(gender)
class(gender)
[1] "factor"

Now that gender is a factor, we can look at the levels (the distinct categories) R has assigned; by default they are sorted alphabetically:

levels(gender)
[1] "female" "male"  
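The alphabetical default can be overridden with the `levels` argument of `factor()`. A short sketch (`gender_f` is an illustrative name) that also shows the integer codes a factor stores internally:

```r
gender <- c("male", "female", "female", "male")

# Set the level order explicitly instead of using alphabetical order
gender_f <- factor(gender, levels = c("male", "female"))
levels(gender_f)      # "male" "female"

# Each value is stored as an integer code pointing into levels(gender_f)
as.integer(gender_f)  # 1 2 2 1

# table() reports counts in level order
table(gender_f)
```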

2.4 Explore and manipulate data frames

We can combine multiple vectors of the same length into a data frame:

gender <- c("male", "female", "female","male","female","male")
weight <- c(170, 161, 192, 205, 122, 155)
age <- c(25, 33, 59, 19, 47, 66)

df <- data.frame(gender, weight, age)

df
  gender weight age
1   male    170  25
2 female    161  33
3 female    192  59
4   male    205  19
5 female    122  47
6   male    155  66

Look at the class of the data frame:

class(df)
[1] "data.frame"

Look at dimensions of data frame (rows and columns):

dim(df)
[1] 6 3
nrow(df)
[1] 6
ncol(df)
[1] 3
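Beyond dimensions, a few base functions give a quick overview of a data frame's contents. A minimal sketch on the same data; in R 4.0.0 and later `data.frame()` no longer converts strings to factors by default, so `str()` should list `gender` as a character column:

```r
gender <- c("male", "female", "female", "male", "female", "male")
weight <- c(170, 161, 192, 205, 122, 155)
age <- c(25, 33, 59, 19, 47, 66)
df <- data.frame(gender, weight, age)

names(df)    # column names: "gender" "weight" "age"
str(df)      # one line per column: its type and first few values
summary(df)  # min, quartiles, mean and max for numeric columns
```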

Look at the first 3 rows:

head(df,3)
  gender weight age
1   male    170  25
2 female    161  33
3 female    192  59

What is the value in the 2nd row and 3rd column? Inside the brackets, the first number is the row index and the second is the column index:

df[2,3]
[1] 33

What are the values in the 3rd column only? Leaving the row index empty selects all rows:

df[,3]
[1] 25 33 59 19 47 66
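The row and column positions can be mixed with names, and either one can be left empty. Note that selecting a single column returns a plain vector unless `drop = FALSE` is passed. A short sketch on the same data:

```r
df <- data.frame(
  gender = c("male", "female", "female", "male", "female", "male"),
  weight = c(170, 161, 192, 205, 122, 155),
  age    = c(25, 33, 59, 19, 47, 66)
)

df[2, "age"]           # a column name works in place of its position: 33
df[2, ]                # an empty column index returns the whole 2nd row
df[, 3]                # a single column collapses to a plain vector
df[, 3, drop = FALSE]  # drop = FALSE keeps a one-column data frame
```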

You can also select columns by name; passing a vector of names returns several columns at once:

df[,c("weight","age")]
  weight age
1    170  25
2    161  33
3    192  59
4    205  19
5    122  47
6    155  66

You can also extract a single column as a vector by its name with the $ operator:

df$age
[1] 25 33 59 19 47 66
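A logical vector built from a column can go in the row position to filter rows. The sketch below keeps patients older than 50 (an arbitrary cutoff chosen for illustration), first with brackets and then with base R's `subset()`, which evaluates the condition inside the data frame so `df$` is not repeated:

```r
df <- data.frame(
  gender = c("male", "female", "female", "male", "female", "male"),
  weight = c(170, 161, 192, 205, 122, 155),
  age    = c(25, 33, 59, 19, 47, 66)
)

# Rows where the condition is TRUE: rows 3 and 6 (ages 59 and 66)
df[df$age > 50, ]
df[df$age > 50, c("gender", "weight")]

# The same filter and column selection with subset()
subset(df, age > 50, select = c(gender, weight))
```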

2.5 Create new variables

Create a dummy variable that is 1 for weights > 150 and 0 for weights <= 150:

df$weight_dum <- ifelse(df$weight > 150, 1, 0)
head(df)
  gender weight age weight_dum
1   male    170  25          1
2 female    161  33          1
3 female    192  59          1
4   male    205  19          1
5 female    122  47          0
6   male    155  66          1
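Because a comparison such as `weight > 150` already returns `TRUE`/`FALSE`, converting it with `as.integer()` produces the same 0/1 values without `ifelse()`. A minimal sketch (`weight_dum2` is an illustrative name); one difference is that `ifelse(..., 1, 0)` returns doubles while `as.integer()` returns integers:

```r
weight <- c(170, 161, 192, 205, 122, 155)

weight_dum  <- ifelse(weight > 150, 1, 0)
weight_dum2 <- as.integer(weight > 150)  # TRUE -> 1L, FALSE -> 0L

weight_dum2                      # 1 1 1 1 0 1
all(weight_dum == weight_dum2)   # TRUE: same values, different storage type
```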

Create a factor variable for age groups (over 65: Senior; over 45 up to 65: Middle Aged; 45 and under: Young):

df$agecat <- factor(
  ifelse(df$age > 65, "Senior",
         ifelse(df$age > 45 & df$age <= 65, "Middle Aged", "Young"))
)

head(df)
  gender weight age weight_dum      agecat
1   male    170  25          1       Young
2 female    161  33          1       Young
3 female    192  59          1 Middle Aged
4   male    205  19          1       Young
5 female    122  47          0 Middle Aged
6   male    155  66          1      Senior
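A swapped-in alternative to nested `ifelse()` is base R's `cut()`, which bins a numeric vector by break points and returns a factor directly. The sketch uses the same cutoffs (45 and 65); intervals are closed on the right by default, so 45 falls in "Young" and 65 in "Middle Aged". Unlike the alphabetical default, the levels follow the order of `labels`:

```r
age <- c(25, 33, 59, 19, 47, 66)

# Intervals: (-Inf, 45], (45, 65], (65, Inf]
agecat <- cut(age,
              breaks = c(-Inf, 45, 65, Inf),
              labels = c("Young", "Middle Aged", "Senior"))
agecat
levels(agecat)  # "Young" "Middle Aged" "Senior"
```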

Count how many patients fall into each age category:

table(df$agecat)

Middle Aged      Senior       Young 
          2           1           3 
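`table()` extends naturally to proportions and cross-tabulations. The vectors below copy the `gender` and `agecat` values from the data frame printed earlier, so the sketch runs on its own:

```r
gender <- c("male", "female", "female", "male", "female", "male")
agecat <- c("Young", "Young", "Middle Aged", "Young", "Middle Aged", "Senior")

counts <- table(agecat)
prop.table(counts)  # share of patients in each category (sums to 1)

# Two vectors give a cross-tabulation: rows = gender, columns = age category
table(gender, agecat)
```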

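2.6 Create your own function

The function below computes body mass index (BMI): weight in kilograms divided by height in meters squared.

bmi_function <- function(kg, m) {
  # divide kilograms by meters squared
  kg / m^2
}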

bmi_function(72, 1.6)
[1] 28.125
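Because R arithmetic is vectorized, the same function handles many patients in one call. The sketch also shows a hypothetical variant, `bmi_checked`, that refuses non-positive heights with `stop()`; the check is an illustrative assumption, not part of the original function:

```r
bmi_function <- function(kg, m) {
  kg / m^2
}

# One call, two patients: 28.125 and about 27.78
bmi_function(kg = c(72, 90), m = c(1.6, 1.8))

# Hypothetical variant with an input check
bmi_checked <- function(kg, m) {
  if (any(m <= 0, na.rm = TRUE)) stop("height must be positive (in meters)")
  kg / m^2
}
bmi_checked(72, 1.6)  # 28.125
```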

2.7 Remove missing values from a data frame

id <- c('Patient A', 'Patient B', 'Patient C')
weight <- c(123, 145, NA)

df <- data.frame(id, weight)
df
         id weight
1 Patient A    123
2 Patient B    145
3 Patient C     NA

na.omit() drops every row that contains at least one NA:

na.omit(df)
         id weight
1 Patient A    123
2 Patient B    145
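For finer control, `complete.cases()` returns a logical vector marking rows with no `NA` in any column, and `is.na()` on a single column filters on that column alone. A short sketch on the same three patients:

```r
id <- c('Patient A', 'Patient B', 'Patient C')
weight <- c(123, 145, NA)
df <- data.frame(id, weight)

complete.cases(df)        # TRUE TRUE FALSE
df[complete.cases(df), ]  # keeps the same rows that na.omit(df) would

# Drop rows based on one column only
df[!is.na(df$weight), ]

# Count missing values per column: id 0, weight 1
colSums(is.na(df))
```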