Worksheet Week 4

Self-Assessment Questions¹

  1. How does OLS fit the regression line?
  2. Why do we have to square the residuals for OLS?
  3. What are the advantages of working with matrices in regression analysis?
  4. Explain the concept of an identity matrix.

Please stop here and don’t go beyond this point until we have compared notes on your answers.


Regression – Calculations

Consider the following data set:

Table 3: Regression Data Set
  i   age (x)   income (y)
  1        22          700
  2        19          650
  3        56         2300
  4        45         1900
  5        37         2000
  6        23          900
  7        32         1000
  8        65         2500
  9        43         1800
 10        48         1200
  1. Plot the data in Table 3 in a suitable scatter plot. Yes, on paper.
  2. Draw a line of best fit through the scatter plot (by eye, using a ruler).
  3. Assuming a regression model of the type \(Y_{i}=\beta_{0}+ \beta_{1}X_{i}+\epsilon_{i}\), calculate the estimators for \(\beta_{0}\) and \(\beta_{1}\).

Create a table like the one I used in the lecture. How do the intermediate calculations in this table relate to the formulae for the coefficients?
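If you want to check your paper table afterwards, the calculation can be sketched in R as follows. This is only one possible layout: the object name `calc` and the column names (`dx`, `dy`, `dx_dy`, `dx_sq`) are assumptions made for this illustration and need not match the table from the lecture. The sketch uses the usual deviation-from-the-mean form of the OLS estimators, \(\hat{\beta}_{1}=\sum(X_{i}-\bar{X})(Y_{i}-\bar{Y})/\sum(X_{i}-\bar{X})^{2}\) and \(\hat{\beta}_{0}=\bar{Y}-\hat{\beta}_{1}\bar{X}\).

```r
# Data from Table 3
age    <- c(22, 19, 56, 45, 37, 23, 32, 65, 43, 48)
income <- c(700, 650, 2300, 1900, 2000, 900, 1000, 2500, 1800, 1200)

# Deviations of each observation from the respective mean
dx <- age - mean(age)
dy <- income - mean(income)

# Intermediate table (layout and column names are assumptions)
calc <- data.frame(i = 1:10, age, income, dx, dy,
                   dx_dy = dx * dy,   # cross-products of the deviations
                   dx_sq = dx^2)      # squared deviations of x
print(calc)

# The column sums are the numerator and denominator of the slope formula
beta1_hat <- sum(calc$dx_dy) / sum(calc$dx_sq)
beta0_hat <- mean(income) - beta1_hat * mean(age)

round(c(beta0_hat = beta0_hat, beta1_hat = beta1_hat), 3)
```

The two column sums (83200 and 2096) are exactly the quantities that enter the slope formula, which is how the table relates to the coefficients.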

  4. Calculate the regression coefficients \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\) using matrices.
  5. Specify the sample regression function (SRF) and interpret the estimators of \(\beta_{0}\) and \(\beta_{1}\).
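Once you have done the matrix calculation by hand, a minimal sketch like the following can confirm it in R. It assumes the standard matrix form of the OLS estimator, \(\hat{\beta}=(X'X)^{-1}X'y\), where \(X\) has a column of ones for the intercept. The object names (`X`, `y`, `XtX`, `beta_hat`) are chosen for this illustration only.

```r
# Data from Table 3
age    <- c(22, 19, 56, 45, 37, 23, 32, 65, 43, 48)
income <- c(700, 650, 2300, 1900, 2000, 900, 1000, 2500, 1800, 1200)

# Design matrix: a column of ones (intercept) and the regressor
X <- cbind(1, age)
y <- income

# X'X is a 2 x 2 matrix; solve() returns its inverse
XtX <- t(X) %*% X
XtX_inv <- solve(XtX)

# Multiplying a matrix by its inverse gives the identity matrix
# (up to rounding error)
round(XtX_inv %*% XtX, 6)

# OLS estimator in matrix form: first row is the intercept, second the slope
beta_hat <- XtX_inv %*% t(X) %*% y
round(beta_hat, 3)
```

The first element of `beta_hat` corresponds to \(\hat{\beta}_{0}\) and the second to \(\hat{\beta}_{1}\), so the result should agree with the coefficients from the table-based calculation.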

Regression in R

  1. Load the data set from Excel.
library(readxl)

incomedata <- read_excel("files/Week 4/PO12Q_4.xlsx")

You can also enter the values of a table manually using the matrix() function as follows:

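table <- matrix(c(22,19,56,45,37,23,32,65,43,48,
                  700,650,2300,1900,2000,900,1000,2500,1800,1200),
                nrow=10,ncol=2)
# name the columns so that lm(income ~ age, ...) can find the variables
colnames(table) <- c("age", "income")
incomedata <- data.frame(table)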
  2. Run the regression in R as follows:
regression <- lm(income ~ age, data = incomedata)
summary(regression)

Call:
lm(formula = income ~ age, data = incomedata)

Residuals:
    Min      1Q  Median      3Q     Max 
-652.25 -102.92    6.53  142.21  584.39 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -53.092    304.716  -0.174 0.866011    
age           39.695      7.325   5.419 0.000631 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 335.3 on 8 degrees of freedom
Multiple R-squared:  0.7859,    Adjusted R-squared:  0.7592 
F-statistic: 29.37 on 1 and 8 DF,  p-value: 0.0006314
  3. Check your results from the first section.
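One way to make this check explicit is sketched below. It rebuilds the data frame from Table 3 so that it runs on its own, and compares the `lm()` coefficients with a by-hand calculation. The by-hand version here uses `cov()` and `var()`, which is an equivalent shortcut for the slope formula rather than the table approach from the lecture; the name `by_hand` is an assumption for this example.

```r
# Data from Table 3, entered directly as a data frame
incomedata <- data.frame(
  age    = c(22, 19, 56, 45, 37, 23, 32, 65, 43, 48),
  income = c(700, 650, 2300, 1900, 2000, 900, 1000, 2500, 1800, 1200)
)

regression <- lm(income ~ age, data = incomedata)

# Slope = sample covariance of x and y divided by the sample variance of x;
# the (n - 1) terms cancel, so this equals the sum-of-deviations formula
slope     <- cov(incomedata$age, incomedata$income) / var(incomedata$age)
intercept <- mean(incomedata$income) - slope * mean(incomedata$age)
by_hand   <- c(intercept, slope)

# Compare with the coefficients reported by lm()
coef(regression)
all.equal(unname(coef(regression)), by_hand)
```

If `all.equal()` does not return `TRUE` for your own numbers, rounding in the intermediate steps is the most likely reason.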

Solutions

You can find the solutions in the Downloads section.


Homework for Week 5

Make up a data set like the one in this worksheet and practise the calculations. You can check your results either with R or, if you want the intermediate results, with the Excel sheet in the solutions.
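If you would rather let R make up the practice data, a minimal sketch could look like this. Everything in it is an assumption for illustration: the seed, the age range, the "true" intercept and slope (200 and 35) and the spread of the error term are arbitrary choices, so change them freely.

```r
# Fix the random number generator so the same data set can be recreated
set.seed(42)

n <- 10

# Made-up ages between 18 and 65
age <- sample(18:65, n, replace = TRUE)

# Made-up incomes: a linear relationship plus random noise
# (200, 35 and sd = 300 are arbitrary illustrative values)
income <- round(200 + 35 * age + rnorm(n, mean = 0, sd = 300))

practice <- data.frame(age, income)
practice

# After doing the calculations by hand, compare with R
coef(lm(income ~ age, data = practice))
```

Because of the random noise, the estimated coefficients will not be exactly 200 and 35, which is a useful reminder of the difference between the population parameters and their estimators.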



  1. Some of the content of this worksheet is taken from Reiche (forthcoming).