We will assign the data from GlobalLandTemperaturesByCountry.csv to a variable called mydata.
In RStudio: Import Dataset > From Text (base)
Input
mydata <- read.csv("GlobalLandTemperaturesByCountry.csv")
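To see what read.csv produces without needing the full dataset, here is a minimal sketch that writes a tiny CSV to a temporary file and reads it back. The three rows and the name toy are invented for illustration; only the column names dt, AverageTemperature and Country come from the commands used in this tutorial.

```r
# Write a tiny CSV to a temporary file (values are made up for illustration).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "dt,AverageTemperature,Country",
  "1990-01-01,-5.2,Canada",
  "1990-02-01,,Canada",
  "1990-01-01,3.1,China"
), tmp)

# read.csv returns a data frame; an empty field in a numeric column becomes NA.
toy <- read.csv(tmp)
str(toy)   # one line per column: name, type, first values
```

str is a compact alternative to head when you mainly want to check column types.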
You can inspect the first few lines of mydata with head.
Input
head(mydata, 5)
Input
names(mydata)
Input
head(mydata, n = 10)
Input
table(mydata$Country)
Input
is.factor(mydata$Country)
Input
mydata$Country <- as.factor(mydata$Country)
Input
is.factor(mydata$Country)
Input
is.numeric(mydata$AverageTemperature)
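The type checks above can be tried on a small vector first. This is a sketch with an invented character vector named country standing in for mydata$Country; it shows what changes when a character column is converted to a factor.

```r
# Toy vector standing in for mydata$Country (values are illustrative).
country <- c("Canada", "China", "Canada")

is.factor(country)             # a plain character vector is not a factor
country <- as.factor(country)  # convert to a categorical variable
is.factor(country)             # now it is a factor

levels(country)                # the distinct categories
table(country)                 # number of rows per category
```

A factor stores each distinct value once as a level, which is why table and grouping operations work naturally on it.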
You might have noticed NA in some rows of the dataset. What are they? NA ("Not Available") is how R marks a missing value.
Input
head(mydata, n = 10)
We can remove rows containing NAs with na.omit.
Input
mydata <- na.omit(mydata)
Input
head(mydata, n = 10)
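Before dropping rows it can help to count how many values are missing and to see how NA behaves in calculations. The following is a small sketch on an invented data frame named toy, not on the real dataset.

```r
# Toy data with one missing measurement (values are illustrative).
toy <- data.frame(
  dt = c("1990-01-01", "1990-02-01", "1990-03-01"),
  AverageTemperature = c(-5.2, NA, 1.4)
)

is.na(toy$AverageTemperature)               # TRUE marks the missing value
sum(is.na(toy$AverageTemperature))          # how many values are missing
mean(toy$AverageTemperature)                # NA: missing values propagate
mean(toy$AverageTemperature, na.rm = TRUE)  # ignore NAs for this one call

clean <- na.omit(toy)                       # drops every row that contains an NA
nrow(clean)
```

na.omit removes a row if any of its columns is NA, so on wide data it can discard more than expected; na.rm = TRUE is the lighter option when only one calculation is affected.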
select is useful when your data has many columns and you only need a subset of them.
Input
library(dplyr)  # provides select, filter, mutate and if_else
select(mydata, dt, Country)
Input
filter(mydata, Country=="Canada")
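In short, select picks columns and filter picks rows. Here is a minimal sketch on an invented three-row data frame named toy, assuming the dplyr package is installed.

```r
library(dplyr)

# Toy data standing in for mydata (values are illustrative).
toy <- data.frame(
  dt = c("1990-01-01", "1990-01-01", "1990-02-01"),
  AverageTemperature = c(-5.2, 3.1, -3.8),
  Country = c("Canada", "China", "Canada")
)

cols <- select(toy, dt, Country)          # keeps only the named columns
rows <- filter(toy, Country == "Canada")  # keeps only the matching rows

names(cols)
nrow(rows)
```

Neither call modifies toy; assign the result to a variable if you want to keep it. Note that without library(dplyr), the name filter refers to a different function in the stats package.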
Pipe | (logical OR)
filter keeps any row that matches either condition
Input
filter(mydata, Country=="Canada" | Country == "China")
Ampersand & (logical AND)
filter keeps only the rows that match both conditions
Input
filter(mydata, Country=="Canada" & AverageTemperature > 12)
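The difference between | and & is easiest to see by counting rows. This sketch uses an invented four-row data frame named toy and assumes dplyr is installed; the %in% variant is an addition that is not part of the original commands.

```r
library(dplyr)

# Toy data standing in for mydata (values are illustrative).
toy <- data.frame(
  Country = c("Canada", "Canada", "China", "Brazil"),
  AverageTemperature = c(-5.2, 14.0, 3.1, 25.3)
)

# OR: a row is kept if at least one condition is TRUE.
either <- filter(toy, Country == "Canada" | Country == "China")

# AND: a row is kept only if every condition is TRUE.
both <- filter(toy, Country == "Canada" & AverageTemperature > 12)

# %in% is a shorter way to write several == comparisons joined by |.
same_as_either <- filter(toy, Country %in% c("Canada", "China"))

nrow(either)
nrow(both)
nrow(same_as_either)
```

When the list of countries grows, %in% is usually easier to read than a long chain of | comparisons.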
We might need new columns derived from existing data. For instance, we may want a numeric variable named year, or a categorical variable named era that records whether a measurement comes from the electronic era or the gas & oil era.
Input
mydata <- mutate(mydata, year = as.numeric(format(as.Date(dt), "%Y")))
There are multiple commands packed in the mutate operation. Take a look at each transformation, step by step:
Input
format(as.Date(mydata$dt), "%Y")
as.numeric(format(as.Date(mydata$dt), "%Y"))
mutate(mydata, year = as.numeric(format(as.Date(dt), "%Y")))
mydata <- mutate(mydata, year = as.numeric(format(as.Date(dt), "%Y")))
Input
head(mydata)
is.numeric(mydata$year)
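The same chain of transformations can be followed on a short vector, one step per variable. This is a sketch with three invented dates; the intermediate names dates and year_chr are introduced here only for illustration.

```r
# Toy dates standing in for mydata$dt (values are illustrative).
dt <- c("1849-01-01", "1969-12-01", "2013-08-01")

dates <- as.Date(dt)             # character -> Date (expects YYYY-MM-DD by default)
year_chr <- format(dates, "%Y")  # Date -> four-digit year, still as character
year <- as.numeric(year_chr)     # character -> number, so comparisons like <= work

class(dates)
class(year_chr)
class(year)
```

The final as.numeric step matters: without it the year would be text, and numeric comparisons on it would not be reliable.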
We use an if_else comparison. If year is less than or equal to 1969, we assign the value gas & oil. Otherwise, we assign the value electronic.
Input
mydata <- mutate(mydata, era = if_else(
  year <= 1969, "gas & oil", "electronic"
))
head(mydata, n=5)
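To check the boundary at 1969, the same if_else rule can be applied to a handful of years. This is a sketch on an invented data frame named toy, assuming dplyr is installed.

```r
library(dplyr)

# Toy years chosen to sit on both sides of the 1969 cutoff.
toy <- data.frame(year = c(1849, 1969, 1970, 2013))

# if_else(condition, value_if_true, value_if_false), applied row by row.
toy <- mutate(toy, era = if_else(year <= 1969, "gas & oil", "electronic"))

toy$era
table(toy$era)   # number of rows in each era
```

Because the comparison is <=, the year 1969 itself falls in the gas & oil era. Unlike base R's ifelse, dplyr's if_else expects the two result values to have compatible types, which catches some mistakes early.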