Worksheet Week 7

Self-Assessment Questions¹

  1. Why do we need to calculate standard errors of estimators?
  2. Describe the relationship between confidence intervals and hypothesis testing.
  3. What does \(\sigma^2\) represent substantively?

Please stop here and don’t go beyond this point until we have compared notes on your answers.


Regression – Standard Errors of Coefficients

  1. Using the following regression calculations, determine the size of the standard errors of \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\)
    1. in tabular form
    2. in matrix form

Table 5: Regression Data Set

| i | age (x) | income (y) | \(y-\bar{y}\) | \((y-\bar{y})^2\) | \(x-\bar{x}\) | \((x-\bar{x})^2\) | \((x-\bar{x})(y-\bar{y})\) | \(\hat{y}\) |
|---|---|---|---|---|---|---|---|---|
| 1 | 22 | 700 | -795 | 632025 | -17 | 289 | 13515 | 820.3 |
| 2 | 19 | 650 | -845 | 714025 | -20 | 400 | 16900 | 701.2 |
| 3 | 56 | 2300 | 805 | 648025 | 17 | 289 | 13685 | 2170.1 |
| 4 | 45 | 1900 | 405 | 164025 | 6 | 36 | 2430 | 1733.4 |
| 5 | 37 | 2000 | 505 | 255025 | -2 | 4 | -1010 | 1415.8 |
| 6 | 23 | 900 | -595 | 354025 | -16 | 256 | 9520 | 860.0 |
| 7 | 32 | 1000 | -495 | 245025 | -7 | 49 | 3465 | 1217.3 |
| 8 | 65 | 2500 | 1005 | 1010025 | 26 | 676 | 26130 | 2527.4 |
| 9 | 43 | 1800 | 305 | 93025 | 4 | 16 | 1220 | 1654.0 |
| 10 | 48 | 1200 | -295 | 87025 | 9 | 81 | -2655 | 1852.5 |
| MEAN | 39 | 1495 | | | | | | |
| SUM | | | | 4202250 | | 2096 | 83200 | |
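
Once you have worked through the calculation by hand, a short R script can serve as a check. The following is a minimal sketch, assuming the usual OLS formulas with \(n-2\) degrees of freedom for the error variance; all object names (`sxx`, `se_b0`, `se_matrix`, and so on) are illustrative and not part of the worksheet. It uses unrounded coefficients, so the results may differ slightly from calculations based on the rounded \(\hat{y}\) column.

```r
# Data from Table 5 (age = x, income = y)
x <- c(22, 19, 56, 45, 37, 23, 32, 65, 43, 48)
y <- c(700, 650, 2300, 1900, 2000, 900, 1000, 2500, 1800, 1200)
n <- length(x)

# --- Tabular (scalar) form ---
sxx <- sum((x - mean(x))^2)                # sum of squared deviations of x
sxy <- sum((x - mean(x)) * (y - mean(y)))  # sum of cross products
b1  <- sxy / sxx                           # slope
b0  <- mean(y) - b1 * mean(x)              # intercept
res <- y - (b0 + b1 * x)                   # residuals
sigma2_hat <- sum(res^2) / (n - 2)         # estimated error variance
se_b1 <- sqrt(sigma2_hat / sxx)
se_b0 <- sqrt(sigma2_hat * sum(x^2) / (n * sxx))

# --- Matrix form: Var(beta_hat) = sigma^2 (X'X)^{-1} ---
X         <- cbind(1, x)                   # design matrix with a constant
XtX_inv   <- solve(t(X) %*% X)
beta_hat  <- XtX_inv %*% t(X) %*% y
e         <- y - X %*% beta_hat
sigma2_m  <- as.numeric(t(e) %*% e) / (n - ncol(X))
se_matrix <- sqrt(diag(sigma2_m * XtX_inv))

# --- Cross-check against lm() ---
fit   <- lm(y ~ x)
se_lm <- coef(summary(fit))[, "Std. Error"]

print(c(se_b0 = se_b0, se_b1 = se_b1))
print(se_matrix)
print(se_lm)
```

All three printed sets of standard errors should agree up to floating-point precision.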

Regression – Hypothesis Testing & Confidence Intervals

Consider the following regression, where gdppc indicates Gross Domestic Product (PPP) in 2005 US Dollars, and lifeexp indicates life expectancy at birth in years. I run the regression with the first line and store the results in the object model1; the data come from the data set wdi. The second line asks R to display the detailed results for the regression. Data are taken from World Bank (2024), Boix et al. (2018), and Marshall & Gurr (2020).

model1 <- lm(gdppc ~ lifeexp, data = wdi)
summary(model1)

Call:
lm(formula = gdppc ~ lifeexp, data = wdi)

Residuals:
   Min     1Q Median     3Q    Max 
-18457  -9877  -4187   4963  78242 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -94306.8    10406.7  -9.062 3.33e-16 ***
lifeexp       1503.2      144.4  10.412  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14690 on 167 degrees of freedom
  (26 observations deleted due to missingness)
Multiple R-squared:  0.3936,    Adjusted R-squared:   0.39 
F-statistic: 108.4 on 1 and 167 DF,  p-value: < 2.2e-16

  1. Formulate the null and alternative hypotheses that are tested in this model.
  2. Build the regression function.
  3. Is the coefficient significant?
  4. Interpret the coefficients.

How would you interpret the coefficient if it were insignificant? Think carefully about what an insignificant result means in plain English to answer this question.
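
For the regression function asked for above, one way to check your answer is to write the fitted line as a small R function. This is a minimal sketch that copies the rounded estimates from the printed output; `predict_gdppc` is a hypothetical helper name, not something defined in the worksheet.

```r
# Rounded estimates copied from the summary(model1) output
b0 <- -94306.8  # intercept
b1 <- 1503.2    # slope on lifeexp

# Hypothetical helper: predicted gdppc for a given life expectancy
predict_gdppc <- function(lifeexp) b0 + b1 * lifeexp

# Predictions for two illustrative life expectancies
pred <- predict_gdppc(c(65, 75))
print(pred)

# Predictions one year apart differ by exactly the slope
slope_check <- predict_gdppc(71) - predict_gdppc(70)
print(slope_check)
```

With the original data loaded, `predict(model1, newdata = data.frame(lifeexp = c(65, 75)))` should give nearly the same values, differing only through rounding of the coefficients.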

  1. Plot the regression function in a suitable diagram using ggplot.
  2. Explain how the t-value for lifeexp is obtained.
  3. What do our results mean for the hypotheses?
  4. What does the value of “Multiple R-Squared” (this is equivalent to the R-Squared we calculated by hand last week) mean?
  5. Calculate the 95% confidence intervals for the coefficient lifeexp and the intercept. Compare your results to the R output below.
  6. Find two explanations in the output for why the coefficient for lifeexp is statistically significant at the 5% level.

confint(model1, level = 0.95)
                  2.5 %    97.5 %
(Intercept) -114852.512 -73761.11
lifeexp        1218.183   1788.24
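
The t-values and confidence intervals can also be reproduced from the printed estimates and standard errors alone. The sketch below assumes the rounded figures from the summary output and the 167 residual degrees of freedom reported there; the object names (`est`, `se`, `t_val`, `ci`) are illustrative. Because the inputs are rounded, the interval bounds match the confint() output only approximately.

```r
# Rounded values copied from the summary(model1) output
est <- c("(Intercept)" = -94306.8, lifeexp = 1503.2)
se  <- c("(Intercept)" = 10406.7,  lifeexp = 144.4)
df  <- 167  # residual degrees of freedom

# t-value: estimate divided by its standard error (null hypothesis: coefficient = 0)
t_val <- est / se

# Two-sided p-value from the t distribution
p_val <- 2 * pt(abs(t_val), df = df, lower.tail = FALSE)

# 95% confidence interval: estimate +/- critical t-value * standard error
t_crit <- qt(0.975, df = df)
ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se)

print(t_val)
print(p_val)
print(ci)
```

Comparing `p_val` with 0.05 and checking whether zero lies inside each row of `ci` are two ways of reading significance off the same information.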

Solutions

You can find the Solutions in the Downloads Section.


  1. Some of the content of this worksheet is taken from Reiche (forthcoming).