Worksheet Week 5

Self-Assessment Questions [1]

  1. Why do we need a measure to assess goodness of fit?
  2. How do you interpret R-Squared?
  3. Explain in your own words what the residual sum of squares (RSS) means.
  4. Explain one of the mathematical properties of OLS in your own words.

Please stop here and don’t go beyond this point until we have compared notes on your answers.


Mathematical Properties of OLS

  1. I showed in the lecture that the predicted values of y are uncorrelated with the residuals. Show mathematically that this is not the case for the error terms.

Hint: Consider carefully Equation 2 on Slide 8.
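
The contrast between residuals and error terms can also be illustrated numerically before attempting the proof. The sketch below is only an illustration under assumed values: the population regression function (intercept 1, slope 2), the sample size, and the seed are all made up for this example, and the true error terms u are only observable here because the data are simulated. It does not replace the mathematical argument asked for above.

```r
set.seed(42)                    # arbitrary seed, for reproducibility only
n <- 200                        # assumed sample size
x <- rnorm(n)
u <- rnorm(n)                   # true error terms (known only because we simulate)
y <- 1 + 2 * x + u              # assumed population regression function

fit   <- lm(y ~ x)
y_hat <- fitted(fit)            # predicted values of y
e_hat <- resid(fit)             # residuals

cor_resid <- cor(y_hat, e_hat)  # zero up to floating-point error, a property of OLS
cor_error <- cor(y_hat, u)      # not forced to zero in a finite sample
print(c(residuals = cor_resid, errors = cor_error))
```

Re-running the sketch with a different seed changes the second correlation but not the first, which is the numerical counterpart of the property shown in the lecture.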


Goodness of Fit – Using R

We will be using our WDI example again to explore model fit in R. Data are taken from World Bank (2024), Boix et al. (2018), and Marshall & Gurr (2020).

  1. Download the WDI_PO12Q.csv data set. Set your working directory and load the data set as an object called wdi.
  2. We will re-assess our question from Week 1: does the level of GDP per capita (gdppc) influence life expectancy (lifeexp)?
  3. State the null hypothesis and the (directional) alternative hypothesis.
  4. Run the regression model.
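# Load the data (the path is relative to your working directory)
wdi <- read.csv("files/Week 5/WDI_PO12Q.csv")
# Make the columns of wdi accessible by name
attach(wdi)
# Regress life expectancy on GDP per capita
model <- lm(lifeexp ~ gdppc)
summary(model)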

Call:
lm(formula = lifeexp ~ gdppc)

Residuals:
    Min      1Q  Median      3Q     Max 
-17.445  -3.260   1.545   4.760   8.969 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 6.815e+01  5.797e-01  117.56   <2e-16 ***
gdppc       2.619e-04  2.515e-05   10.41   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.129 on 167 degrees of freedom
  (26 observations deleted due to missingness)
Multiple R-squared:  0.3936,    Adjusted R-squared:   0.39 
F-statistic: 108.4 on 1 and 167 DF,  p-value: < 2.2e-16
  5. Build the SRF and interpret the coefficients.
  6. Assess and interpret model fit, using “Multiple R-squared”, which is equivalent to the \(R^2\) we discussed in the lecture.

We have “only” covered the Estimate column in this output so far. We will be working towards interpreting the rest over the next few weeks.
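
One way to explore what the Estimate column implies is to turn the two estimates into a small prediction function. This is a minimal sketch: the numbers are copied from the output above, while the helper name predict_lifeexp and the example values of gdppc are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the Estimate column of the output above
b0 <- 6.815e+01   # intercept
b1 <- 2.619e-04   # slope on gdppc

# Hypothetical helper: predicted life expectancy for a given value of gdppc
predict_lifeexp <- function(gdppc) b0 + b1 * gdppc

# Predictions at a few illustrative values of gdppc
print(predict_lifeexp(c(0, 10000, 20000)))

# Predicted change in life expectancy for a 10,000-unit increase in gdppc
print(predict_lifeexp(20000) - predict_lifeexp(10000))
```

Because the slope is very small per single unit of gdppc, looking at a larger increment can make the interpretation easier to state in words.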

You can also extract specific blocks of the output by placing square brackets [] after the summary() call. For example, summary(model)[1] returns the first block, the model call. Try to extract the block containing model fit, like this:

$r.squared
[1] 0.3936356
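
Blocks can also be extracted by name rather than by position, which may be easier to read. The sketch below uses R's built-in cars data purely as a stand-in, so that it runs without the WDI file; the object names fit, s, r2_by_name and r2_as_block are made up for this example.

```r
# Built-in example data, used here only as a stand-in for the WDI data
fit <- lm(dist ~ speed, data = cars)
s   <- summary(fit)

# List the names of the blocks that can be extracted
print(names(s))

r2_by_name  <- s$r.squared      # a plain number
r2_as_block <- s["r.squared"]   # a one-element list, printed with a $r.squared header
print(r2_as_block)
```

The same pattern should carry over to the model object from this worksheet by replacing fit with model.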

Goodness of Fit – By Hand

  1. Using the data and calculations presented in the table below, calculate the coefficient of determination, \(r^{2}\), with \(\hat{Y}_{i} = -53.1 + 39.7 X_{i}\).
Table 4: Regression Data Set
 i   age (x)   income (y)   \(y-\bar{y}\)   \((y-\bar{y})^2\)
 1   22         700         -795             632025
 2   19         650         -845             714025
 3   56        2300          805             648025
 4   45        1900          405             164025
 5   37        2000          505             255025
 6   23         900         -595             354025
 7   32        1000         -495             245025
 8   65        2500         1005            1010025
 9   43        1800          305              93025
10   48        1200         -295              87025
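
Once you have worked through the calculation by hand, a short script can serve as a check. This is a sketch under assumptions: the age and income values are copied from Table 4, the labels TSS, ESS and RSS are used here for the total, explained and residual sums of squares (the lecture notation may differ), and the regression is re-estimated with lm() rather than taken from the rounded line given in the exercise.

```r
# Data copied from Table 4
age    <- c(22, 19, 56, 45, 37, 23, 32, 65, 43, 48)
income <- c(700, 650, 2300, 1900, 2000, 900, 1000, 2500, 1800, 1200)

fit   <- lm(income ~ age)
y_hat <- fitted(fit)

tss <- sum((income - mean(income))^2)   # total sum of squares
ess <- sum((y_hat - mean(income))^2)    # explained sum of squares
rss <- sum((income - y_hat)^2)          # residual sum of squares

r2 <- ess / tss

# Rounded coefficients, to compare with the fitted line given in the exercise
print(round(coef(fit), 1))

# Two equivalent ways of obtaining the coefficient of determination
print(c(ess_over_tss = r2, one_minus_rss_over_tss = 1 - rss / tss))
```

If your hand calculation uses the rounded coefficients, small differences in the later decimal places are to be expected.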

Homework for Week 7


Solutions

You can find the Solutions in the Downloads Section.



  1. Some of the content of this worksheet is taken from Reiche (forthcoming).