Worksheet Week 5
Self-Assessment Questions
- Why do we need a measure to assess goodness of fit?
- How do you interpret R-Squared?
- Explain in your own words what the residual sum of squares (RSS) means.
- Explain one of the mathematical properties of OLS in your own words.
Please stop here and don’t go beyond this point until we have compared notes on your answers.
Mathematical Properties of OLS
- I showed in the lecture that the predicted values of y are uncorrelated with the residuals. Show mathematically that this is not the case for the error terms.
Hint: Consider carefully Equation 2 on Slide 8.
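The property shown in the lecture can also be checked numerically before you attempt the proof. The following is a minimal sketch on simulated data. The names `x`, `u`, `y` and all parameter values are made up for illustration and have nothing to do with the WDI data. It checks that the predicted values and the residuals of an OLS fit are uncorrelated up to rounding error.

```r
# Simulated data: all names and parameter values are hypothetical
set.seed(123)
n <- 200
x <- rnorm(n, mean = 10, sd = 2)
u <- rnorm(n)              # error term (known only because we simulate it)
y <- 2 + 0.5 * x + u       # hypothetical population regression function

fit   <- lm(y ~ x)
y_hat <- fitted(fit)       # predicted values
e     <- resid(fit)        # residuals

# Correlation between predicted values and residuals: zero up to rounding error
cor_fit_res <- cor(y_hat, e)
print(cor_fit_res)

# The residuals also sum to zero when the model includes an intercept
sum_res <- sum(e)
print(sum_res)
```

Because the data are simulated, the error term `u` is available as well, so you can run the same comparison with `u` in place of `e` while working on the exercise.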
Goodness of Fit – Using R
We will be using our WDI example again to explore model fit in R. Data are taken from World Bank (2024), Boix et al. (2018), and Marshall & Gurr (2020).
- Download the WDI_PO12Q.csv data set. Set your working directory and load the data set as an object called wdi.
- We will re-assess our question from Week 1, whether the level of GDP has an influence on life expectancy.
- State the null and the alternative hypothesis (directional)
- Run the regression model.
wdi <- read.csv("files/Week 5/WDI_PO12Q.csv")
attach(wdi)
model <- lm(lifeexp ~ gdppc)
summary(model)
Call:
lm(formula = lifeexp ~ gdppc)

Residuals:
    Min      1Q  Median      3Q     Max
-17.445  -3.260   1.545   4.760   8.969

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.815e+01  5.797e-01  117.56   <2e-16 ***
gdppc       2.619e-04  2.515e-05   10.41   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.129 on 167 degrees of freedom
  (26 observations deleted due to missingness)
Multiple R-squared:  0.3936,    Adjusted R-squared:  0.39
F-statistic: 108.4 on 1 and 167 DF,  p-value: < 2.2e-16
- Build the SRF and interpret the coefficients.
- Assess and interpret model fit using “Multiple R-squared”, which is equivalent to the \(R^2\) we discussed in the lecture.
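To see where “Multiple R-squared” comes from, it can help to rebuild it from the residual and total sums of squares. The sketch below uses R's built-in `cars` data as a stand-in, so that it runs without the WDI file. The object names `fit`, `rss`, `tss` and `r2_by_hand` are chosen here for illustration only.

```r
# Stand-in example on R's built-in `cars` data (not the WDI data)
fit <- lm(dist ~ speed, data = cars)

y   <- cars$dist
rss <- sum(resid(fit)^2)          # residual sum of squares
tss <- sum((y - mean(y))^2)       # total sum of squares

r2_by_hand <- 1 - rss / tss
r2_summary <- summary(fit)$r.squared

# Both values should agree
print(c(by_hand = r2_by_hand, summary = r2_summary))
```

If you repeat this for the WDI model, keep in mind that R dropped 26 observations with missing values. The total sum of squares then has to be computed on the observations actually used in the fit, which `model.frame(model)` should return.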
We have “only” covered the Estimate column in this output so far. We will be working towards interpreting the rest over the next few weeks.
You can also extract specific blocks of the output table by placing brackets [] after the summary() function, for example summary(model)[1]. Try to extract the block containing model fit, like this:
$r.squared
[1] 0.3936356
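The following sketch shows a few ways of getting at the blocks of a summary object. It again uses the built-in `cars` data as a stand-in for the WDI model, and the object names `s`, `by_name`, `as_value` and `coef_tab` are made up for illustration.

```r
# Stand-in model on R's built-in `cars` data (not the WDI data)
fit <- lm(dist ~ speed, data = cars)
s   <- summary(fit)

# List the names of the blocks stored in the summary object
print(names(s))

by_name  <- s["r.squared"]    # single brackets: a list of length one
as_value <- s$r.squared       # $ returns the number itself
coef_tab <- s$coefficients    # the coefficient table as a matrix

print(by_name)
print(as_value)
```

Extracting by name rather than by position tends to be easier to read, and it keeps working if the order of the blocks differs between models.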
Goodness of Fit – By Hand
- Using the data and calculations presented in the table below, calculate the coefficient of determination, \(r^{2}\), with \(\hat{Y}_{i} = -53.1 + 39.7 X_{i}\).
i | age (x) | income (y) | \(y-\bar{y}\) | \((y-\bar{y})^2\) |
---|---|---|---|---|
1 | 22 | 700 | -795 | 632025 |
2 | 19 | 650 | -845 | 714025 |
3 | 56 | 2300 | 805 | 648025 |
4 | 45 | 1900 | 405 | 164025 |
5 | 37 | 2000 | 505 | 255025 |
6 | 23 | 900 | -595 | 354025 |
7 | 32 | 1000 | -495 | 245025 |
8 | 65 | 2500 | 1005 | 1010025 |
9 | 43 | 1800 | 305 | 93025 |
10 | 48 | 1200 | -295 | 87025 |
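Once you have worked through the calculation by hand, a short R check can confirm your result. The sketch below copies the age and income columns from the table and uses the regression line stated in the exercise. The object names are chosen here for illustration. Because the stated coefficients are rounded, the by-hand value may differ from the `lm()` value in the later decimal places.

```r
# Data copied from the age (x) and income (y) columns of the table
age    <- c(22, 19, 56, 45, 37, 23, 32, 65, 43, 48)
income <- c(700, 650, 2300, 1900, 2000, 900, 1000, 2500, 1800, 1200)

# Fitted values from the regression line stated in the exercise
income_hat <- -53.1 + 39.7 * age

tss <- sum((income - mean(income))^2)   # total sum of squares
rss <- sum((income - income_hat)^2)     # residual sum of squares
r2  <- 1 - rss / tss

# Cross-check against lm(), which estimates the coefficients without rounding
r2_lm <- summary(lm(income ~ age))$r.squared

print(c(by_hand = r2, lm = r2_lm))
```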
Homework for Week 7
- Prepare for the in-class test in Week 7 (see the Reading Week Section).
- Read the items marked “essential” on the reading list (see Talis)
- Work through this week’s flashcards to familiarise yourself with the relevant R functions.
- Find an example for each NEW function and apply it in R to ensure it works
- Complete the Week 5 Moodle Quiz
- Work through the Week 7 “Methods, Methods, Methods” Section
Some of the content of this worksheet is taken from Reiche (forthcoming).