Worksheet Week 7

Self-Assessment Questions [10]

  1. Why does \(\hat{\beta}_1\) have a sampling distribution?
  2. What is the difference between \(\text{se}(\hat{\beta}_1)\) and \(\hat{\text{se}}(\hat{\beta}_1)\)?
  3. What does \(\sigma^2\) represent substantively?
  4. If \(\hat{\beta}_1=0\), why is \(\hat{\beta}_0=\bar{y}\)? Give two reasons.

Please stop here and don’t go beyond this point until we have compared notes on your answers.


Regression – Standard Errors of Coefficients

  1. Using the following regression calculations, determine the standard errors of \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\):
    1. in tabular form
    2. in matrix form
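Table 8: Regression Data Set

| i | age (x) | income (y) | \(y-\bar{y}\) | \((y-\bar{y})^2\) | \(x-\bar{x}\) | \((x-\bar{x})^2\) | \((x-\bar{x})(y-\bar{y})\) | \(\hat{y}\) |
|---|---|---|---|---|---|---|---|---|
| 1 | 22 | 700 | -795 | 632025 | -17 | 289 | 13515 | 820.3 |
| 2 | 19 | 650 | -845 | 714025 | -20 | 400 | 16900 | 701.2 |
| 3 | 56 | 2300 | 805 | 648025 | 17 | 289 | 13685 | 2170.1 |
| 4 | 45 | 1900 | 405 | 164025 | 6 | 36 | 2430 | 1733.4 |
| 5 | 37 | 2000 | 505 | 255025 | -2 | 4 | -1010 | 1415.8 |
| 6 | 23 | 900 | -595 | 354025 | -16 | 256 | 9520 | 860.0 |
| 7 | 32 | 1000 | -495 | 245025 | -7 | 49 | 3465 | 1217.3 |
| 8 | 65 | 2500 | 1005 | 1010025 | 26 | 676 | 26130 | 2527.4 |
| 9 | 43 | 1800 | 305 | 93025 | 4 | 16 | 1220 | 1654.0 |
| 10 | 48 | 1200 | -295 | 87025 | 9 | 81 | -2655 | 1852.5 |
| MEAN | 39 | 1495 | | | | | | |
| SUM | | | | 4202250 | | 2096 | 83200 | |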
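
Hand calculations for this exercise can be checked in R. The sketch below is one possible check rather than the worksheet's official solution. It re-enters the ten observations from Table 8, computes the standard errors once from the column sums and once from the matrix expression \(\hat{\sigma}^2 (X'X)^{-1}\), and compares both with the values reported by lm(). The object names (age, income, se_tab, se_mat, se_lm) are chosen here purely for illustration.

```r
# Data copied from Table 8
age    <- c(22, 19, 56, 45, 37, 23, 32, 65, 43, 48)
income <- c(700, 650, 2300, 1900, 2000, 900, 1000, 2500, 1800, 1200)
n <- length(age)

# Tabular route: sums of squares and cross products (the SUM row of Table 8)
sxx <- sum((age - mean(age))^2)                          # 2096
syy <- sum((income - mean(income))^2)                    # 4202250
sxy <- sum((age - mean(age)) * (income - mean(income)))  # 83200

b1 <- sxy / sxx
b0 <- mean(income) - b1 * mean(age)

# Estimated error variance: residual sum of squares divided by n - 2
rss        <- syy - b1 * sxy
sigma2_hat <- rss / (n - 2)

se_tab <- c(
  b0 = sqrt(sigma2_hat * (1 / n + mean(age)^2 / sxx)),
  b1 = sqrt(sigma2_hat / sxx)
)

# Matrix route: Var(beta_hat) = sigma2_hat * (X'X)^{-1}
X        <- cbind(1, age)
XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% income
resid    <- income - X %*% beta_hat
sigma2_m <- sum(resid^2) / (n - ncol(X))
se_mat   <- sqrt(diag(sigma2_m * XtX_inv))

# Reference values from R's built-in estimator
se_lm <- coef(summary(lm(income ~ age)))[, "Std. Error"]

print(rbind(tabular = unname(se_tab), matrix = unname(se_mat), lm = unname(se_lm)))
```

All three rows of the printed comparison should agree up to floating-point error. If a hand calculation differs, the usual suspects are dividing by n instead of n - 2 or rounding the coefficients too early.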

Regression – Hypothesis Testing & Confidence Intervals

Consider the following regression, where gdppc indicates Gross Domestic Product per capita (PPP) in 2005 US Dollars, and lifeexp indicates life expectancy at birth in years. The first line runs the regression on the data set wdi and stores the results in the object model1. The second line asks R to display the detailed results for the regression. Data are taken from World Bank (2024), Boix et al. (2018), and Marshall & Gurr (2020).

model1 <- lm(gdppc ~ lifeexp, data = wdi)
summary(model1)

Call:
lm(formula = gdppc ~ lifeexp, data = wdi)

Residuals:
   Min     1Q Median     3Q    Max 
-18457  -9877  -4187   4963  78242 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -94306.8    10406.7  -9.062 3.33e-16 ***
lifeexp       1503.2      144.4  10.412  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14690 on 167 degrees of freedom
  (26 observations deleted due to missingness)
Multiple R-squared:  0.3936,    Adjusted R-squared:   0.39 
F-statistic: 108.4 on 1 and 167 DF,  p-value: < 2.2e-16
  1. Formulate the null and alternative hypotheses that are tested in this model.
  2. Build the regression function.
  3. Are the coefficients significant?
  4. Interpret the coefficients.
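
Once the estimates are read off the output, the regression function can also be written as a small R function, which makes it easy to experiment with predicted values. This is only a sketch: the function name predict_gdppc and the life expectancy values 70 and 71 are hypothetical choices for illustration, and the coefficients are the rounded values printed by summary(model1).

```r
# Coefficients copied from the summary(model1) output
b0 <- -94306.8  # (Intercept)
b1 <- 1503.2    # lifeexp

# Fitted regression function: predicted gdppc for a given life expectancy
predict_gdppc <- function(lifeexp) {
  b0 + b1 * lifeexp
}

# Hypothetical life expectancy values, chosen only for illustration
pred_70 <- predict_gdppc(70)
pred_71 <- predict_gdppc(71)

# The difference between predictions one year apart equals the slope
print(pred_71 - pred_70)
```

Evaluating the function at a life expectancy of zero returns the intercept, which is worth keeping in mind when judging whether the intercept has a meaningful interpretation here.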

How would you interpret the coefficient if it were insignificant? Think carefully about what an insignificant result means in plain English to answer this question.

  1. Plot the regression function in a suitable diagram using ggplot.
  2. Explain how the t-value for lifeexp is obtained.
  3. What do our results mean for the hypotheses?
  4. What does the value of “Multiple R-Squared” (this is equivalent to the R-Squared we calculated by hand last week) mean?
  5. Calculate the 95% confidence intervals for the coefficient of lifeexp and the intercept. Compare your results to the R output below.
  6. Find two explanations in the output for why the coefficient for lifeexp is statistically significant at the 5% level.
confint(model1, level = 0.95)
                  2.5 %    97.5 %
(Intercept) -114852.512 -73761.11
lifeexp        1218.183   1788.24
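
The t-values and confidence intervals can be reproduced from the printed estimates alone, without access to the wdi data. The sketch below assumes only the numbers shown in the summary(model1) output (estimates, standard errors, and 167 residual degrees of freedom). The object names are illustrative. Because the inputs are rounded, the results match the R output only approximately.

```r
# Values copied from the summary(model1) output
est <- c(intercept = -94306.8, lifeexp = 1503.2)
se  <- c(intercept = 10406.7,  lifeexp = 144.4)
df  <- 167  # residual degrees of freedom

# t-value: estimate divided by its estimated standard error
t_values <- est / se

# Two-sided p-values from the t distribution
p_values <- 2 * pt(abs(t_values), df, lower.tail = FALSE)

# 95% confidence interval: estimate +/- critical t-value * standard error
t_crit <- qt(0.975, df)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

print(t_values)
print(p_values)
print(ci)
```

Using qt(0.975, df) rather than the normal approximation of 1.96 is what makes the interval line up with confint(); with 167 degrees of freedom the two critical values are close but not identical.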

Solutions

You can find the Solutions in the Downloads Section.


  [10] Some of the content of this worksheet is taken from Reiche (forthcoming).