Inferential Statistics
Please make sure you have merged the two datasets. Check Merging for instructions.
Hypothesis testing
From Wikipedia:
To determine whether a result is statistically significant, a researcher calculates a p-value, which is the probability of observing an effect of the same magnitude or more extreme given that the null hypothesis is true.
The null hypothesis is rejected if the p-value is less than a predetermined level, α.
α is called the significance level, and is the probability of rejecting the null hypothesis given that it is true (a type I error).
α is usually set at or below 5%.
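To make the meaning of α more concrete, here is a minimal simulation sketch. It assumes two groups drawn from the same normal distribution, so the null hypothesis is true by construction; the group size, mean, standard deviation, and seed are illustrative choices, not values from our dataset. Under these assumptions, the share of tests with a p-value below α should land close to α itself.

```r
# Simulate many experiments in which the null hypothesis is TRUE:
# both groups come from the same normal distribution.
set.seed(42)    # arbitrary seed, for reproducibility only
alpha <- 0.05
n_sim <- 2000   # number of simulated experiments (illustrative)

p_values <- replicate(n_sim, {
  group_a <- rnorm(28, mean = 19, sd = 0.4)  # hypothetical group A
  group_b <- rnorm(28, mean = 19, sd = 0.4)  # hypothetical group B
  t.test(group_a, group_b, var.equal = TRUE)$p.value
})

# Share of experiments that wrongly reject the null hypothesis
type1_rate <- mean(p_values < alpha)
print(type1_rate)
```

Each rejection in this simulation is a type I error, because there is no real difference to detect.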
Our null hypotheses
H01: There is no difference in the average temperature between the gas & oil era and the electronic era
Independent T-test
It is often used to test whether the mean of a continuous variable differs between two independent groups
We can only run a t-test if our data meet certain assumptions:
- Independence
- Normality
- Equal variance
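One possible way to check two of these assumptions in R is sketched below. The data frame carbon_demo is simulated and purely hypothetical; it only mimics the column names of our merged dataset so the example runs on its own. shapiro.test() and var.test() are standard functions from the stats package, but other checks (for example, plots) may suit your data better.

```r
# Hypothetical stand-in for the merged dataset (simulated values)
set.seed(1)
carbon_demo <- data.frame(
  era = rep(c("gas & oil", "electronic"), each = 28),
  AverageTemperature = c(rnorm(28, 18.7, 0.4), rnorm(28, 19.1, 0.4))
)

# Normality: Shapiro-Wilk test within each era
# (a small p-value suggests the data are not normally distributed)
normality <- tapply(carbon_demo$AverageTemperature, carbon_demo$era,
                    function(x) shapiro.test(x)$p.value)
print(normality)

# Equal variance: F test comparing the variances of the two eras
# (a small p-value suggests the variances differ)
variance_check <- var.test(AverageTemperature ~ era, data = carbon_demo)
print(variance_check$p.value)
```

Independence cannot be tested this way; it depends on how the data were collected.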
Input
t.test(AverageTemperature ~ era, data=carbon, var.equal=TRUE)
Output
Two Sample t-test
data: AverageTemperature by era
t = 3.7437, df = 54, p-value = 0.0004415
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.1806106 0.5970976
sample estimates:
mean in group electronic mean in group gas & oil
19.13249 18.74364
Interpreting the results:
- The t value guides our analysis: it is the difference between the two group means divided by its standard error, so a larger absolute value means stronger evidence against the null hypothesis
- df = 54 degrees of freedom
- p-value = 0.0004415 is smaller than α = 0.05, so we can reject the null hypothesis
- Which one seems higher?
  - mean in group gas & oil = 18.74364
  - mean in group electronic = 19.13249
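To see how the numbers in the output relate to each other, the sketch below recomputes the p-value and the confidence interval from the reported t value, degrees of freedom, and group means. The variable names are our own, and because the inputs are rounded, the results should match the output only approximately.

```r
# Values copied from the t.test output
t_value <- 3.7437
df <- 54
mean_electronic <- 19.13249
mean_gas_oil <- 18.74364

# Two-sided p-value: probability of a t value at least this extreme
p_value <- 2 * pt(-abs(t_value), df)

# Difference in means and its standard error (t = difference / SE)
mean_diff <- mean_electronic - mean_gas_oil
std_error <- mean_diff / t_value

# 95 percent confidence interval for the difference in means
half_width <- qt(0.975, df) * std_error
conf_int <- c(mean_diff - half_width, mean_diff + half_width)

print(p_value)
print(conf_int)
```

Note that the confidence interval does not contain 0, which agrees with rejecting the null hypothesis at α = 0.05.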
Correlation
H02: There is no association between AverageTemperature and AverageCarbonEmission
Pearson’s correlation
Pearson's correlation is used to examine the linear association between two continuous variables by looking at the direction and strength of the association
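As a rough illustration of what the coefficient measures, the sketch below computes Pearson's correlation by hand and compares it with R's built-in cor(). The vectors x and y are simulated, hypothetical data, not our temperature and emission columns.

```r
# Hypothetical data: y depends linearly on x, plus random noise
set.seed(7)
x <- rnorm(56, mean = 19, sd = 0.4)
y <- 2 * x + rnorm(56, mean = 0, sd = 0.5)

# Pearson's r: co-variation of x and y, scaled by the spread of each
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Built-in equivalent
r_builtin <- cor(x, y, method = "pearson")

print(c(manual = r_manual, builtin = r_builtin))
```

The sign of the result gives the direction of the association and its absolute value gives the strength.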
Input
cor.test(carbon$AverageTemperature, carbon$AverageCarbonEmission, method="pearson")
Output
Pearson's product-moment correlation
data: carbon$AverageTemperature and carbon$AverageCarbonEmission
t = 14.919, df = 54, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8299122 0.9386169
sample estimates:
cor
0.8970832
Interpreting the results:
- p-value < 2.2e-16 is far smaller than α = 0.05, so there is a statistically significant correlation between temperature and carbon emission
- How strong is the correlation? cor = 0.8970832
- Interpretation varies by research field, so results should be interpreted with caution
- cor varies from -1 to 1; positive values indicate that the y variable tends to increase as the x variable increases. In this case, a value close to 1 means a strong positive correlation
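The t value and confidence interval in the output can be recovered from cor and df alone. The sketch below does this with the reported values; the variable names are our own, and the confidence interval uses the Fisher z-transformation, which is the approach cor.test takes for Pearson's correlation. Because the inputs are rounded, expect the results to match the output only approximately.

```r
# Values copied from the cor.test output
r <- 0.8970832
df <- 54
n <- df + 2   # number of paired observations

# Test statistic for H0: true correlation is equal to 0
t_value <- r * sqrt(df) / sqrt(1 - r^2)

# 95 percent confidence interval via the Fisher z-transformation
z <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
conf_int <- tanh(c(z - half_width, z + half_width))

# Share of variance in one variable linearly explained by the other
r_squared <- r^2

print(t_value)
print(conf_int)
print(r_squared)
```

A strong correlation on its own does not show that one variable causes the other.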