Worksheet Week 4
Self-Assessment Questions
- How does OLS fit the regression line?
- Why do we have to square the residuals for OLS?
- What are the advantages of working with matrices in regression analysis?
- Explain the concept of an identity matrix.
Please stop here and don’t go beyond this point until we have compared notes on your answers.
Regression – Calculations
Consider the following data set:
i | age (x) | income (y) |
---|---|---|
1 | 22 | 700 |
2 | 19 | 650 |
3 | 56 | 2300 |
4 | 45 | 1900 |
5 | 37 | 2000 |
6 | 23 | 900 |
7 | 32 | 1000 |
8 | 65 | 2500 |
9 | 43 | 1800 |
10 | 48 | 1200 |
- Plot the data in Table 3 in a suitable scatter plot. Yes, on paper.
- Draw a line of best fit through the scatter plot (by eye, using a ruler).
- Assuming a regression model of the type \(Y_{i}=\beta_{0}+ \beta_{1}X_{i}+\epsilon_{i}\), calculate the estimators for \(\beta_{0}\) and \(\beta_{1}\).
Create a table like the one I used in the lecture. How do the intermediate calculations in this table relate to the formulae for the coefficients?
- Calculate the regression coefficients \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\) using matrices.
- Specify the sample regression function (SRF) and interpret the estimators of \(\beta_{0}\) and \(\beta_{1}\).
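Once you have done the calculations by hand, the table and matrix steps above can be sketched in R as follows. This is only a minimal sketch: the column layout of the lecture table is an assumption, and the names `calc`, `dx`, `dy`, `dx_dy`, `dx_sq`, `beta0_hat`, `beta1_hat`, `X` and `beta_hat` are made up for illustration. It uses the standard OLS results \(\hat{\beta}_{1}=\sum(X_{i}-\bar{X})(Y_{i}-\bar{Y})/\sum(X_{i}-\bar{X})^{2}\), \(\hat{\beta}_{0}=\bar{Y}-\hat{\beta}_{1}\bar{X}\) and, in matrix form, \(\hat{\beta}=(X'X)^{-1}X'y\).

```r
# Data from the worksheet table
age    <- c(22, 19, 56, 45, 37, 23, 32, 65, 43, 48)
income <- c(700, 650, 2300, 1900, 2000, 900, 1000, 2500, 1800, 1200)

# Intermediate columns (assumed layout): deviations from the means,
# their cross-product and the squared deviation of x
calc <- data.frame(
  x  = age,
  y  = income,
  dx = age - mean(age),
  dy = income - mean(income)
)
calc$dx_dy <- calc$dx * calc$dy
calc$dx_sq <- calc$dx^2
print(calc)

# Slope and intercept from the column sums
beta1_hat <- sum(calc$dx_dy) / sum(calc$dx_sq)
beta0_hat <- mean(income) - beta1_hat * mean(age)

# Matrix version: a column of ones for the intercept, then age
X <- cbind(1, age)
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% income

beta0_hat
beta1_hat
beta_hat
```

The two column sums are exactly the numerator and denominator of the slope formula, which is the link between the table and the coefficients that the question asks about.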
Regression in R
- Load the data set from Excel.
You can also enter the values of a table manually using the matrix() function as follows:
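A minimal sketch of manual entry, assuming the values are typed row by row and then converted to a data frame so that they can be passed to `lm()` through its `data` argument. The object name `values` is made up for illustration; `incomedata` is the name used in the regression call in this worksheet.

```r
# Each row is one observation: (age, income)
values <- matrix(
  c(22,  700,
    19,  650,
    56, 2300,
    45, 1900,
    37, 2000,
    23,  900,
    32, 1000,
    65, 2500,
    43, 1800,
    48, 1200),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("age", "income"))
)

# Convert the matrix to a data frame with columns age and income
incomedata <- as.data.frame(values)
str(incomedata)
```

Setting `byrow = TRUE` matters here: by default `matrix()` fills column by column, which would mix up ages and incomes.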
- Run the regression in R as follows:
```r
regression <- lm(income ~ age, data = incomedata)
summary(regression)
```

```
Call:
lm(formula = income ~ age, data = incomedata)

Residuals:
    Min      1Q  Median      3Q     Max
-652.25 -102.92    6.53  142.21  584.39

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -53.092    304.716  -0.174 0.866011
age           39.695      7.325   5.419 0.000631 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 335.3 on 8 degrees of freedom
Multiple R-squared:  0.7859,	Adjusted R-squared:  0.7592
F-statistic: 29.37 on 1 and 8 DF,  p-value: 0.0006314
```
- Check your results from the first section.
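One way to do this check, sketched under the assumption that you type your hand-calculated coefficients into a vector. The name `hand` is made up for illustration, and the numbers in it are placeholders (the rounded estimates from the output above); replace them with your own results.

```r
# Worksheet data, re-entered so that this snippet runs on its own
incomedata <- data.frame(
  age    = c(22, 19, 56, 45, 37, 23, 32, 65, 43, 48),
  income = c(700, 650, 2300, 1900, 2000, 900, 1000, 2500, 1800, 1200)
)
regression <- lm(income ~ age, data = incomedata)

# Placeholder values: replace with the coefficients you calculated by hand
hand <- c(-53.092, 39.695)

# Side-by-side comparison of R's estimates and the hand calculation
comparison <- cbind(R = coef(regression), hand = hand)
comparison

# TRUE where the two agree to within rounding at three decimal places
agree <- abs(coef(regression) - hand) < 0.001
agree
```

Small differences in the last decimal place usually come from rounding in the intermediate steps of the hand calculation rather than from an error.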
Homework for Week 5
- Read the items marked “essential” on the reading list (see Talis).
- Work through this week’s flashcards to familiarise yourself with the relevant R functions.
- Find an example for each NEW function and apply it in R to ensure it works.
- Complete the Week 4 Moodle Quiz.
- Work through the Week 5 “Methods, Methods, Methods” Section.
Solutions
You can find the Solutions in the Downloads Section.
Make up a data set like the one in this worksheet and practise the calculations. You can check your results either with R or, if you want the intermediate results, with the Excel sheet in the solutions.
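If you would rather let R make up the practice data, the following is one possible sketch. Everything in it is an assumption chosen for illustration: the seed, the age range, the "true" intercept of 100 and slope of 35, the noise level, and the name `practice`.

```r
set.seed(42)  # arbitrary seed so that the same data set can be reproduced

n <- 10                                   # same size as the worksheet table
age <- sample(18:65, n, replace = TRUE)   # made-up ages
# Made-up incomes: a straight line in age plus random noise
income <- round(100 + 35 * age + rnorm(n, mean = 0, sd = 300))

practice <- data.frame(age, income)
practice

# After calculating the coefficients by hand, compare with R
coef(lm(income ~ age, data = practice))
```

Changing the seed gives a fresh data set each time, so you can repeat the exercise as often as you like.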
Some of the content of this worksheet is taken from Reiche (forthcoming).