Worksheet Week 7
Self-Assessment Questions
- Why does \(\hat{\beta}_1\) have a sampling distribution?
- What is the difference between \(\text{se}(\hat{\beta}_1)\) and \(\hat{\text{se}}(\hat{\beta}_1)\)?
- What does \(\sigma^2\) represent substantively?
- If \(\hat{\beta}_1=0\), why is \(\hat{\beta}_0=\bar{y}\)? Give two reasons.
Please stop here and don’t go beyond this point until we have compared notes on your answers.
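Once you have compared notes, a small simulation can help to make the idea of a sampling distribution concrete. The following is only a sketch under stated assumptions: the population values beta0, beta1 and sigma, the grid of x values, and the number of replications are all invented for illustration and are not taken from the worksheet data.

```r
# Hypothetical population parameters, chosen only for illustration
set.seed(42)
beta0 <- 2
beta1 <- 0.5
sigma <- 1
x <- seq(1, 20, length.out = 50)  # regressor held fixed across samples

# Draw 2000 samples of y from the same population and
# re-estimate the slope in each sample
slopes <- replicate(2000, {
  y <- beta0 + beta1 * x + rnorm(length(x), mean = 0, sd = sigma)
  coef(lm(y ~ x))[["x"]]
})

# Standard error of the slope based on the known sigma
# (in real data sigma is unknown and has to be estimated)
se_true <- sigma / sqrt(sum((x - mean(x))^2))

mean(slopes)  # centre of the simulated sampling distribution
sd(slopes)    # spread of the simulated sampling distribution
se_true       # theoretical counterpart of sd(slopes)
```

Running hist(slopes) afterwards shows how the estimates vary from sample to sample even though the population never changes.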
Regression – Standard Errors of Coefficients
- Using the following regression calculations, determine the size of the standard errors of \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\)
- in tabular form
- in matrix form
i | age (x) | income (y) | \(y-\bar{y}\) | \((y-\bar{y})^2\) | \(x-\bar{x}\) | \((x-\bar{x})^2\) | \((x-\bar{x})(y-\bar{y})\) | \(\hat{y}\) |
1 | 22 | 700 | -795 | 632025 | -17 | 289 | 13515 | 820.3 |
2 | 19 | 650 | -845 | 714025 | -20 | 400 | 16900 | 701.2 |
3 | 56 | 2300 | 805 | 648025 | 17 | 289 | 13685 | 2170.1 |
4 | 45 | 1900 | 405 | 164025 | 6 | 36 | 2430 | 1733.4 |
5 | 37 | 2000 | 505 | 255025 | -2 | 4 | -1010 | 1415.8 |
6 | 23 | 900 | -595 | 354025 | -16 | 256 | 9520 | 860.0 |
7 | 32 | 1000 | -495 | 245025 | -7 | 49 | 3465 | 1217.3 |
8 | 65 | 2500 | 1005 | 1010025 | 26 | 676 | 26130 | 2527.4 |
9 | 43 | 1800 | 305 | 93025 | 4 | 16 | 1220 | 1654.0 |
10 | 48 | 1200 | -295 | 87025 | 9 | 81 | -2655 | 1852.5 |
MEAN | 39 | 1495 | | | | | | |
SUM | | | | 4202250 | | 2096 | 83200 | |
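To check hand calculations, the table can be reproduced in R. The sketch below assumes the usual simple-regression formulas, with the error variance estimated as the residual sum of squares divided by n - 2. The object names (sxx, sxy, e_hat, sigma2_hat and so on) are illustrative choices, not notation from the worksheet.

```r
# Data from the table above
age    <- c(22, 19, 56, 45, 37, 23, 32, 65, 43, 48)
income <- c(700, 650, 2300, 1900, 2000, 900, 1000, 2500, 1800, 1200)
n <- length(age)

# Tabular route: sums of squares and cross-products
sxx <- sum((age - mean(age))^2)                          # 2096 in the table
sxy <- sum((age - mean(age)) * (income - mean(income)))  # 83200 in the table
b1  <- sxy / sxx
b0  <- mean(income) - b1 * mean(age)

# Residuals and estimated error variance (n - 2 degrees of freedom)
e_hat      <- income - (b0 + b1 * age)
sigma2_hat <- sum(e_hat^2) / (n - 2)

se_b1 <- sqrt(sigma2_hat / sxx)
se_b0 <- sqrt(sigma2_hat * (1 / n + mean(age)^2 / sxx))

# Matrix route: estimated variance-covariance matrix sigma2_hat * (X'X)^(-1)
X <- cbind(1, age)
vcov_hat  <- sigma2_hat * solve(t(X) %*% X)
se_matrix <- sqrt(diag(vcov_hat))

# Cross-check against R's built-in regression
fit   <- lm(income ~ age)
se_lm <- summary(fit)$coefficients[, "Std. Error"]

c(se_b0 = se_b0, se_b1 = se_b1)
```

The three routes (tabular, matrix, and lm()) should agree up to rounding, so any discrepancy points to an arithmetic slip in the hand calculation.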
Regression – Hypothesis Testing & Confidence Intervals
Consider the following regression, where gdppc indicates Gross Domestic Product per capita (PPP) in 2005 US Dollars, and lifeexp indicates life expectancy at birth in years. The first line runs the regression on the data set wdi and stores the results in the object model1. The second line asks R to display the detailed results for the regression. Data are taken from World Bank (2024), Boix et al. (2018), and Marshall & Gurr (2020).
model1 <- lm(gdppc ~ lifeexp, data = wdi)
summary(model1)

Call:
lm(formula = gdppc ~ lifeexp, data = wdi)

Residuals:
   Min     1Q Median     3Q    Max
-18457  -9877  -4187   4963  78242

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -94306.8    10406.7  -9.062 3.33e-16 ***
lifeexp       1503.2      144.4  10.412  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14690 on 167 degrees of freedom
  (26 observations deleted due to missingness)
Multiple R-squared:  0.3936,	Adjusted R-squared:  0.39
F-statistic: 108.4 on 1 and 167 DF,  p-value: < 2.2e-16
- Formulate the null and alternative hypotheses that are tested in this model.
- Build the regression function.
- Are the coefficients significant?
- Interpret the coefficients.
How would you interpret the coefficient if it were insignificant? Think carefully about what an insignificant result means in plain English to answer this question.
- Plot the regression function in a suitable diagram using R.
- Explain how the t-value for lifeexp is obtained.
- What do our results mean for the hypotheses?
- What does the value of “Multiple R-Squared” (this is equivalent to the R-Squared we calculated by hand last week) mean?
- Calculate the 95% confidence intervals for the coefficient of lifeexp and the intercept. Compare your results to the R output below.
- Find two explanations in the output for why the coefficient for lifeexp is statistically significant at the 5% level.
confint(model1, level = 0.95)
                  2.5 %    97.5 %
(Intercept) -114852.512 -73761.11
lifeexp        1218.183   1788.24
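The t-values and confidence intervals can also be rebuilt from the printed output alone, which is a useful check on hand calculations. The sketch below assumes the rounded estimates and standard errors shown in the regression table, so results may differ from the confint() output in the last digits. The object names (est, se, t_val, t_crit, ci) are illustrative.

```r
# Values copied from the regression output above (rounded as printed)
est <- c(intercept = -94306.8, lifeexp = 1503.2)
se  <- c(intercept = 10406.7,  lifeexp = 144.4)
df  <- 167  # residual degrees of freedom

# t-value: estimate divided by its standard error (null hypothesis: coefficient = 0)
t_val <- est / se

# Two-sided p-value from the t distribution
p_val <- 2 * pt(abs(t_val), df = df, lower.tail = FALSE)

# 95% confidence interval: estimate +/- critical value * standard error
t_crit <- qt(0.975, df = df)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

round(t_val, 3)
round(ci, 1)
```

Working from the rounded table values rather than from model1 keeps the sketch self-contained, since the wdi data set is not needed to run it.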
You can find the Solutions in the Downloads Section.
Some of the content of this worksheet is taken from Reiche (forthcoming).