Answer to exercise at descriptive stats:
Input
carbon <- read.csv("CarbonDioxideEmission.csv")
carbon <- na.omit(carbon)
carbon$CarbonDioxide <- as.numeric(carbon$CarbonDioxide)
carbon$year <- as.numeric(carbon$year)
head(carbon)
tail(carbon)
carbon %>%
group_by(year) %>%
summarise(avg_emission = mean(CarbonDioxide))
Can we merge data?
We now have 2 interesting datasets:
-
mydata
- we can obtain the average temperature of each year -
carbon
- we can obtain the average carbon emission of each year
Each dataset contains part of the data needed to answer our research questions:
-
H01 the earth average temperature has not dramatically increased since the advent of electronics
-
H02 the emission of carbon dioxide does not influence the earth average temperature
However, we may need to merge the two datasets to answer our research questions!
Notice that both datasets contain a column for the year. Let’s use year to merge the data!
merge( ): Merge two datasets using one or more columns
Input
carbon <- group_by(carbon, year) %>% # 1
summarise(AverageCarbonEmission = mean(CarbonDioxide)) # 2
newdata <- group_by(mydata, year, era) %>% # 3
summarise(AverageTemperature = mean(AverageTemperature)) # 4
carbon <- merge(newdata, carbon[, c("year", "AverageCarbonEmission")], by="year") # 5
A detailed explanation explained line-by-line:
-
the
carbon
data will be updated with the result of the right-hand operation. The right-hand operation groups the dataset by year -
after grouping, we apply the
summarise
to create a column namedcarbon
the value of the column is themean
of theCarbonDioxide
emission -
we do a similar grouping operation creating a
newdata
data. -
this time, we are interested in the mean of the
AverageTemperature
-
we now have two variables, one with the
year
andAverageCarbonEmission
and another with year andAverageTemperature
. Let’smerge
these two tables in a finalcarbon
table.-
by
indicates the key column to merge the two datasets- Our merging criteria is the
year
- Our merging criteria is the
-
We will copy two columns from the initial carbon table using
carbon[, c("year", "AverageCarbonEmission")]
- this is the same as
select(carbon, year, AverageCarbonEmission)
- this is the same as
-