Methods, Methods, Methods

This week we will test two of the CLAs, namely homoscedasticity and the absence of collinearity. As always, we will use the American National Election Studies (ANES). I will draw on the data set and regression models from Week 8 to assess these assumptions.

Data Prep

Place the ANES data in the folder you will use as your working directory for this session. Open the “Code for Data Preparation” below and copy it into an RScript. Remember to adjust the working directory with the setwd() command at the beginning. Then run the RScript and you will be ready to proceed to the video.

Code for Data Preparation

######################################
# MMM - Week 10 - Data Preparation
######################################


# Set WD (insert the path to your working directory between the parentheses)
setwd()

# Load packages

library(tidyverse)

# Load data set

anes <- read_csv("anes.csv")

# Get rid of missing values for variables used in analysis today

## 999 (fttrump1) and 99 (income) are missing-value codes, so they need to be recoded to NA
anes$fttrump1 <- with(anes, replace(fttrump1, fttrump1 == 999, NA))
anes$income <- with(anes, replace(income, income == 99, NA))

anes <- filter(anes, 
               !is.na(fttrump1),
               !is.na(age),
               !is.na(income))

# Turn income variable into a numerical variable with mid-points of each level

anes$income <- factor(anes$income)
table(anes$income)
anes <- anes %>%
  mutate(income_fac = recode(income,
                             '1'= "2500",
                             '2'= "7499.5",
                             '3'= "12499.5",
                             '4'= "17499.5",                       
                             '5'= "22499.5",
                             '6'= "27499.5",
                             '7'= "32499.5",
                             '8'= "37499.5",
                             '9'= "42499.5",
                             '10'= "47499.5",
                             '11'= "52499.5",
                             '12'= "57499.5",
                             '13'= "62499.5",
                             '14'= "67499.5",
                             '15'= "72499.5",
                             '16'= "77499.5",
                             '17'= "82499.5",
                             '18'= "87499.5",
                             '19'= "92499.5",
                             '20'= "97499.5",
                             '21'= "112499.5",
                             '22'= "137499.5",
                             '23'= "162499.5",
                             '24'= "187499.5",
                             '25'= "224999.5",
                             '26'= "500000"))

anes$inc <- as.numeric(as.character(anes$income_fac))

# save data set for use in video

write.csv(anes, "anes_week10.csv")
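
The script converts `income_fac` with `as.numeric(as.character(...))` rather than `as.numeric()` alone. Here is a minimal sketch of why, using a made-up toy factor `f` (not the ANES data): applied directly to a factor, `as.numeric()` returns the internal level codes, not the numbers written in the labels.

```r
# Toy factor whose labels look like numbers (hypothetical values, not ANES data)
f <- factor(c("7499.5", "2500", "12499.5", "2500"))

# Levels are sorted as text: "12499.5" < "2500" < "7499.5"
levels(f)

# Direct conversion returns the level codes: 3 2 1 2
codes <- as.numeric(f)
codes

# Converting via character first recovers the values: 7499.5 2500 12499.5 2500
values <- as.numeric(as.character(f))
values
```

This is why the income mid-points only become usable in a regression after the detour through `as.character()`.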

Video and RScript

You can find the video introducing this week’s method by way of a worked example below. The code I type in the video is also available in the “Code for Data Analysis” section. I would encourage you to type it yourself, though, as code tends to sink into the depths of your brain better that way 😉

Code for Data Analysis

######################################
# MMM - Week 10 - BLUE
######################################

# Set WD (insert the path to your working directory between the parentheses)
setwd()

# Load packages

library(tidyverse)

# Load data set

# File name must match the one written by the data preparation script
anes <- read.csv("anes_week10.csv")


# Regression Models
############################

model1 <- lm(fttrump1 ~ inc, data = anes)

model2 <- lm(fttrump1 ~ age, data = anes)

model3 <- lm(fttrump1 ~ inc + age, data = anes)

# Testing for Homoscedasticity
################################

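library(lmtest)

# Breusch-Pagan test
# Null Hypothesis: Homoscedasticity
# Alternative: Heteroscedasticity
# A small p-value (e.g. below 0.05) leads us to reject homoscedasticity

bptest(model1)

bptest(model2)

# Heteroscedasticity-robust (HC3) standard errors for model2

library(sandwich)

coeftest(model2, vcov = vcovHC(model2, type = "HC3"))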

# Testing for Collinearity
################################

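library(car)

# Variance inflation factors for the predictors in model3
# Values close to 1 indicate little collinearity

vif(model3)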
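
To see what a variance inflation factor measures, it can help to compute one by hand. The sketch below uses simulated predictors `x1` and `x2` (made-up names and values, not the ANES variables) and base R only: the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors.

```r
# Simulated predictors (not the ANES data); x2 is correlated with x1 by construction
set.seed(1)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)

# Step 1: regress x1 on the other predictor(s) and keep the R-squared
r2_x1 <- summary(lm(x1 ~ x2))$r.squared

# Step 2: VIF = 1 / (1 - R-squared)
vif_x1 <- 1 / (1 - r2_x1)
vif_x1

# With exactly two predictors this equals 1 / (1 - r^2),
# where r is the correlation between them
vif_from_cor <- 1 / (1 - cor(x1, x2)^2)
vif_from_cor
```

Note that the outcome variable plays no role here: the VIF depends only on how the predictors relate to each other.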

Making Your Regression BLUE
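
The logic behind the Breusch-Pagan test can also be reproduced in a few lines of base R. This is a minimal sketch on simulated data (not the ANES), assuming the studentized (Koenker) version of the test, which is the default in `bptest()`: regress the squared residuals on the predictors, multiply the R² of that auxiliary regression by n, and compare the result to a chi-squared distribution.

```r
# Simulated data (not the ANES): the error spread grows with x,
# so homoscedasticity is violated by construction
set.seed(42)
n <- 500
x <- runif(n, min = 1, max = 10)
y <- 1 + 2 * x + rnorm(n, mean = 0, sd = x)

fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the predictor
aux <- lm(resid(fit)^2 ~ x)

# Studentized Breusch-Pagan statistic: n times the auxiliary R-squared
bp_stat <- n * summary(aux)$r.squared

# Under the null of homoscedasticity the statistic is approximately
# chi-squared, with one degree of freedom per predictor (here: 1)
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(statistic = bp_stat, p.value = bp_p)
```

Because the error variance was built to depend on `x`, the p-value comes out very small and the null of homoscedasticity is rejected. If lmtest is installed, `bptest(fit)` should report the same statistic.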