Consider this vector:
char_vector <- c("data", "science", "is", "fun")
char_vector
[1] "data" "science" "is" "fun"
We can append elements to the vector with c():
char_vector2 <- c(char_vector,"!!")
char_vector2
[1] "data" "science" "is" "fun" "!!"
Find the length of a vector:
length(char_vector2)
[1] 5
You can index an element of a vector by position (R indexing starts at 1):
char_vector2[2]
[1] "science"
The class() function tells us what type of data the vector contains:
class(char_vector2)
[1] "character"
First we create a vector with missing values:
x <- c("a", NA, "c", "d", NA)
We can see which elements of x are missing with the is.na() function:
is.na(x)
[1] FALSE TRUE FALSE FALSE TRUE
We can create a table of missing data:
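One way to do this, assuming we simply tabulate the result of is.na() (the name tab is introduced here only for illustration), is sketched below:

```r
# Vector with missing values, as defined above
x <- c("a", NA, "c", "d", NA)

# Tabulate the logical vector returned by is.na():
# the FALSE count is the number of observed values, the TRUE count the number missing
tab <- table(is.na(x))
tab
```

Here the TRUE count is 2, matching the two NA entries in x.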
Note that empty strings are not treated as missing in R. It is therefore important to understand the data and decide what should count as “missing”. Once this is known, we can recode the missing data:
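A minimal sketch of both points, using a hypothetical vector y and assuming that empty strings should be counted as missing in this data:

```r
# Hypothetical example vector: "" is an empty string, which is not the same as NA
y <- c("a", "", "c", NA)

# Only the last element is flagged; the empty string is not
is.na(y)

# Assumption: "" should count as missing here, so recode it to NA.
# which() drops the NA produced by comparing NA == "", so only true matches are replaced.
y[which(y == "")] <- NA

# Now the second and fourth elements are both flagged as missing
is.na(y)
```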
The class of this vector will be character:
We will now coerce gender to a factor:
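A minimal sketch of these two steps, assuming gender is the same character vector that appears in the data frame example further on and that factor() is used for the coercion:

```r
# Assumption: gender holds the same values used in the data frame example
gender <- c("male", "female", "female", "male", "female", "male")

# A vector of quoted strings has class "character"
class(gender)

# Coerce to a factor; by default the levels are the sorted unique values
gender <- factor(gender)
class(gender)
```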
Now that gender is a factor we can take a look at the ‘levels’ that have been assigned to the values:
levels(gender)
[1] "female" "male"
We can combine multiple vectors to form a data frame:
gender <- c("male", "female", "female","male","female","male")
weight <- c(170, 161, 192, 205, 122, 155)
age <- c(25, 33, 59, 19, 47, 66)
df <- data.frame(gender, weight, age)
df
  gender weight age
1   male    170  25
2 female    161  33
3 female    192  59
4   male    205  19
5 female    122  47
6   male    155  66
Look at the class of the data frame:
class(df)
[1] "data.frame"
Look at the dimensions of the data frame (rows and columns):
Look at the first 3 rows:
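The dimensions can be checked with dim(), which returns the number of rows followed by the number of columns. A short sketch that rebuilds the same data frame so it runs on its own:

```r
# Rebuild the example data frame from above
gender <- c("male", "female", "female", "male", "female", "male")
weight <- c(170, 161, 192, 205, 122, 155)
age <- c(25, 33, 59, 19, 47, 66)
df <- data.frame(gender, weight, age)

# dim() returns c(rows, columns): here 6 rows and 3 columns
dim(df)

# nrow() and ncol() return each dimension separately
nrow(df)
ncol(df)
```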
head(df,3)
  gender weight age
1   male    170  25
2 female    161  33
3 female    192  59
What is the value in the 2nd row and 3rd column? Inside the brackets, the first index is the row number and the second is the column number:
df[2,3]
[1] 33
What are the values just in the 3rd column?
df[,3]
[1] 25 33 59 19 47 66
Another way to pull one or more columns is to index by column name:
df[,c("weight","age")]
  weight age
1    170  25
2    161  33
3    192  59
4    205  19
5    122  47
6    155  66
You can also pull values from a column by calling the variable name with a $:
df$age
[1] 25 33 59 19 47 66
Create a weight dummy that is 1 for weights > 150 and 0 for weights <= 150:
  gender weight age weight_dum
1   male    170  25          1
2 female    161  33          1
3 female    192  59          1
4   male    205  19          1
5 female    122  47          0
6   male    155  66          1
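A dummy like the one in the table above could be created with ifelse(). This is a sketch that assumes the 1/0 coding shown in the weight_dum column and rebuilds the data frame so it runs on its own:

```r
# Rebuild the example data frame from above
gender <- c("male", "female", "female", "male", "female", "male")
weight <- c(170, 161, 192, 205, 122, 155)
age <- c(25, 33, 59, 19, 47, 66)
df <- data.frame(gender, weight, age)

# ifelse(test, yes, no) is vectorized: 1 where weight > 150, otherwise 0
df$weight_dum <- ifelse(df$weight > 150, 1, 0)
df
```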
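Create a categorical variable for age groups (ifelse() returns a character vector here; wrap the result in factor() if a true factor is needed):
df$agecat <- ifelse(df$age > 65, "Senior",
                    ifelse(df$age > 45 & df$age <= 65, "Middle Aged", "Young"))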
head(df)
  gender weight age weight_dum      agecat
1   male    170  25          1       Young
2 female    161  33          1       Young
3 female    192  59          1 Middle Aged
4   male    205  19          1       Young
5 female    122  47          0 Middle Aged
6   male    155  66          1      Senior
Count how many patients fall into each age category:
table(df$agecat)
Middle Aged      Senior       Young
          2           1           3
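As an alternative to nested ifelse() calls, the same grouping can be sketched with cut(), which returns a factor directly. The cut points 45 and 65 are taken from the conditions used above; the standalone agecat vector is introduced here only for illustration:

```r
# Ages from the example data frame
age <- c(25, 33, 59, 19, 47, 66)

# cut() bins a numeric vector into intervals that are closed on the right by default:
# (-Inf, 45] -> "Young", (45, 65] -> "Middle Aged", (65, Inf) -> "Senior"
agecat <- cut(age,
              breaks = c(-Inf, 45, 65, Inf),
              labels = c("Young", "Middle Aged", "Senior"))

# The table is ordered by factor level rather than alphabetically
table(agecat)
```

Because the levels are given in a meaningful order, tables and plots list Young, Middle Aged, Senior rather than sorting the labels alphabetically.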
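We can also write our own functions. For example, a function that computes body mass index (BMI) from weight in kilograms and height in meters:
bmi_function <- function(kg, m) {
  # divide kilograms by meters squared
  kg / m^2
}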
bmi_function(72, 1.6)
[1] 28.125
Missing values also appear in data frames. Consider this small example:
id <- c('Patient A', 'Patient B', 'Patient C')
weight <- c(123, 145, NA)
df <- data.frame(id, weight)
df
         id weight
1 Patient A    123
2 Patient B    145
3 Patient C     NA
na.omit() drops every row that contains a missing value:
na.omit(df)
         id weight
1 Patient A    123
2 Patient B    145