Worksheet Week 7
Self-Assessment Questions
- Why do we need to calculate standard errors of estimators?
- Describe the relationship between confidence intervals and hypothesis testing.
- What does \(\sigma^2\) represent substantively?
Please stop here and don’t go beyond this point until we have compared notes on your answers.
Regression – Standard Errors of Coefficients
- Using the following regression calculations, determine the size of the standard errors of \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\):
  - in tabular form
  - in matrix form
i | age (x) | income (y) | \(y-\bar{y}\) | \((y-\bar{y})^2\) | \(x-\bar{x}\) | \((x-\bar{x})^2\) | \((x-\bar{x})(y-\bar{y})\) | \(\hat{y}\) |
---|---|---|---|---|---|---|---|---|
1 | 22 | 700 | -795 | 632025 | -17 | 289 | 13515 | 820.3 |
2 | 19 | 650 | -845 | 714025 | -20 | 400 | 16900 | 701.2 |
3 | 56 | 2300 | 805 | 648025 | 17 | 289 | 13685 | 2170.1 |
4 | 45 | 1900 | 405 | 164025 | 6 | 36 | 2430 | 1733.4 |
5 | 37 | 2000 | 505 | 255025 | -2 | 4 | -1010 | 1415.8 |
6 | 23 | 900 | -595 | 354025 | -16 | 256 | 9520 | 860.0 |
7 | 32 | 1000 | -495 | 245025 | -7 | 49 | 3465 | 1217.3 |
8 | 65 | 2500 | 1005 | 1010025 | 26 | 676 | 26130 | 2527.4 |
9 | 43 | 1800 | 305 | 93025 | 4 | 16 | 1220 | 1654.0 |
10 | 48 | 1200 | -295 | 87025 | 9 | 81 | -2655 | 1852.5 |
MEAN | 39 | 1495 | | | | | | |
SUM | | | | 4202250 | | 2096 | 83200 | |
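Once you have worked through the table by hand, you can check your results in R. The following is a minimal sketch, not the official solution: the object names (`age`, `income`, `sxx`, `sigma2_hat` and so on) are my own choices, and it assumes the usual simple-regression formulas \(\hat{\sigma}^2 = RSS/(n-2)\), \(SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2 / \sum (x-\bar{x})^2}\) and \(SE(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2 \left(1/n + \bar{x}^2 / \sum (x-\bar{x})^2\right)}\), with \(\hat{\sigma}^2 (X'X)^{-1}\) as the matrix equivalent. Because the \(\hat{y}\) column in the table is rounded to one decimal place, hand calculations may differ slightly from what R returns.

```r
# Data from the table above
age    <- c(22, 19, 56, 45, 37, 23, 32, 65, 43, 48)
income <- c(700, 650, 2300, 1900, 2000, 900, 1000, 2500, 1800, 1200)
n      <- length(age)

# Tabular form: sums of squares and cross-products
sxx <- sum((age - mean(age))^2)                       # 2096, as in the SUM row
sxy <- sum((age - mean(age)) * (income - mean(income)))  # 83200, as in the SUM row

b1 <- sxy / sxx                       # slope
b0 <- mean(income) - b1 * mean(age)   # intercept

rss        <- sum((income - (b0 + b1 * age))^2)  # residual sum of squares
sigma2_hat <- rss / (n - 2)                      # estimated error variance

se_b1 <- sqrt(sigma2_hat / sxx)
se_b0 <- sqrt(sigma2_hat * (1 / n + mean(age)^2 / sxx))

# Matrix form: variance-covariance matrix sigma2_hat * (X'X)^(-1)
X         <- cbind(1, age)
vcov_mat  <- sigma2_hat * solve(t(X) %*% X)
se_matrix <- sqrt(diag(vcov_mat))

# Cross-check against lm()
fit   <- lm(income ~ age)
se_lm <- coef(summary(fit))[, "Std. Error"]

print(c(se_b0 = se_b0, se_b1 = se_b1))
print(se_matrix)
print(se_lm)
```

All three routes should print the same two standard errors, which is a useful check that the tabular and matrix calculations agree.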
Regression – Hypothesis Testing & Confidence Intervals
Consider the following regression, where `gdppc` indicates Gross Domestic Product (PPP) in 2005 US Dollars, and `lifeexp` indicates life expectancy at birth in years. I run the regression in the first line, using the data set `wdi`, and store the results in the object `model1`. The second line asks R to display the detailed results for the regression. Data are taken from World Bank (2024), Boix et al. (2018), and Marshall & Gurr (2020).
model1 <- lm(gdppc ~ lifeexp, data = wdi)
summary(model1)
Call:
lm(formula = gdppc ~ lifeexp, data = wdi)
Residuals:
Min 1Q Median 3Q Max
-18457 -9877 -4187 4963 78242
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -94306.8 10406.7 -9.062 3.33e-16 ***
lifeexp 1503.2 144.4 10.412 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 14690 on 167 degrees of freedom
(26 observations deleted due to missingness)
Multiple R-squared: 0.3936, Adjusted R-squared: 0.39
F-statistic: 108.4 on 1 and 167 DF, p-value: < 2.2e-16
- Formulate the null and alternative hypotheses that are tested in this model.
- Build the regression function.
- Is the coefficient significant?
- Interpret the coefficients.
How would you interpret the coefficient if it was insignificant? Think carefully about what an insignificant result means in plain English to answer this question.
- Plot the regression function in a suitable diagram using `ggplot`.
- Explain how the t-value for `lifeexp` is obtained.
- What do our results mean for the hypotheses?
- What does the value of “Multiple R-Squared” (this is equivalent to the R-Squared we calculated by hand last week) mean?
- Calculate the 95% confidence intervals for the coefficient `lifeexp` and the intercept. Compare your results to the R output below.
- Find two explanations in the output for why the coefficient for `lifeexp` is statistically significant at the 5% level.
confint(model1,level = 0.95)
2.5 % 97.5 %
(Intercept) -114852.512 -73761.11
lifeexp 1218.183 1788.24
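The t-values and confidence intervals in the output can be reproduced from the estimates and standard errors alone. The sketch below is one way to do this, assuming the rounded values printed by `summary(model1)` and the 167 residual degrees of freedom it reports; the object names (`est`, `se`, `t_crit`, `ci`) are illustrative and not part of the original worksheet. Because the inputs are rounded, the results match the `confint()` output only approximately.

```r
# Rounded values copied from the summary(model1) output
est <- c(intercept = -94306.8, lifeexp = 1503.2)
se  <- c(intercept = 10406.7,  lifeexp = 144.4)
df  <- 167   # residual degrees of freedom

# t-value: estimate divided by its standard error
t_val <- est / se

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_val), df)

# 95% confidence interval: estimate +/- critical t-value * standard error
t_crit <- qt(0.975, df)
ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se)

print(t_val)
print(p_val)
print(ci)
```

If you have the fitted object available, `coef(summary(model1))` returns the unrounded estimates and standard errors, which removes the small rounding differences.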
Solutions
You can find the Solutions in the Downloads Section.
Some of the content of this worksheet is taken from Reiche (forthcoming).