Methods, Methods, Methods
This week’s method is the two-sample test, for both means and proportions.
Just as last week, we will be working with the American National Election Studies (ANES), or more precisely with the pilot survey conducted before the 2020 presidential election. If you haven’t already done so, you will have to register with ANES in order to download the data set. To do so, please follow this link.
Data Prep
Place the ANES data in a folder which you will use as the working directory for this session. Open the “Code for Data Preparation” below and copy it into an R script. Remember to adjust the working directory with the setwd() command at the beginning. Then run the script and you will be ready to proceed to the video.
Code for Data Preparation
######################################
# MMM - Week 2 - Data Preparation
######################################
# Set WD (placeholder path: replace it with the folder that holds your ANES data)
setwd("path/to/your/folder")
# Load packages
library(tidyverse)
# Load data set
anes <- read_csv("anes.csv")
# Get rid of missing values for variables used in analysis today
## 999 (fttrump1) and 99 (income) are missing-value codes, so they are recoded to NA
anes$fttrump1 <- with(anes, replace(fttrump1, fttrump1 == 999, NA))
anes$income <- with(anes, replace(income, income == 99, NA))
anes <- filter(anes,
               !is.na(fttrump1),
               !is.na(sex),
               !is.na(income))
# Turn support for Trump `fttrump1` into a binary variable measuring support
# (with these breaks, scores up to and including 51 are "no", scores above 51 are "yes")
anes <- anes %>%
  mutate(trump =
           cut(fttrump1, breaks = c(-1, 51, 100),
               labels = c("no", "yes")))
anes$trump <- as.factor(anes$trump)
# Label variable `sex`
anes$sex <- factor(anes$sex)
anes <- anes %>%
  mutate(sex =
           recode(sex, "1" = "Male",
                  "2" = "Female"))
anes$sex <- as.factor(anes$sex)
# Turn income variable into a numerical variable with mid-points of each level
anes$income <- factor(anes$income)
anes <- anes %>%
  mutate(income_fac = recode(income,
                             '1' = "2500",
                             '2' = "7499.5",
                             '3' = "12499.5",
                             '4' = "17499.5",
                             '5' = "22499.5",
                             '6' = "27499.5",
                             '7' = "32499.5",
                             '8' = "37499.5",
                             '9' = "42499.5",
                             '10' = "47499.5",
                             '11' = "52499.5",
                             '12' = "57499.5",
                             '13' = "62499.5",
                             '14' = "67499.5",
                             '15' = "72499.5",
                             '16' = "77499.5",
                             '17' = "82499.5",
                             '18' = "87499.5",
                             '19' = "92499.5",
                             '20' = "97499.5",
                             '21' = "112499.5",
                             '22' = "137499.5",
                             '23' = "162499.5",
                             '24' = "187499.5",
                             '25' = "224999.5",
                             '26' = "500000"))
anes$inc <- as.numeric(as.character(anes$income_fac))
# Save data set for use in video
# (row.names = FALSE avoids writing an extra column of row numbers)
write.csv(anes, "anes_week2.csv", row.names = FALSE)
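Two steps in the preparation script are easy to misread, so here is a minimal sketch on toy values (not ANES data). It assumes the same breaks as above and shows which thermometer scores end up as "no" or "yes", and why the income factor is converted with as.character() before as.numeric(). The object names `scores`, `binned`, `f`, `codes` and `values` are made up for this illustration.

```r
# Toy thermometer scores (illustrative values only, not taken from ANES)
scores <- c(0, 50, 51, 52, 100)

# cut() uses right-closed intervals by default: (-1, 51] and (51, 100]
binned <- cut(scores, breaks = c(-1, 51, 100), labels = c("no", "yes"))
data.frame(scores, binned)  # 51 still falls into "no"; 52 is the first "yes"

# A factor whose labels look like numbers
f <- factor(c("2500", "7499.5", "2500"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(f)

# Converting to character first recovers the actual mid-point values
values <- as.numeric(as.character(f))

codes
values
```

This is why the script builds `inc` with as.numeric(as.character(...)) rather than calling as.numeric() on the factor directly.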
Video and RScript
You can find the video introducing this week’s method by way of a worked example below. You can also access the code I type up in the video in the “Code for Data Analysis” section. I would encourage you to type it yourself, though, as code tends to sink into the depths of your brain much better that way 😉
Code for Data Analysis
######################################
# MMM - Week 2 - t-Test
######################################
# Set WD (placeholder path: replace it with the folder that holds anes_week2.csv)
setwd("path/to/your/folder")
# Load packages
library(tidyverse)
# Load data set
anes <- read_csv("anes_week2.csv")
anes$sex <- as.factor(anes$sex)
anes$trump <- as.factor(anes$trump)
# Two-sample test for proportions
############################
# Is the proportion of men supporting Trump higher than the proportion of women?
table(anes$sex, anes$trump)
# are proportions equal? (two-sided test; the counts are read off the table above)
prop.test(c(606, 682), c(1600, 1469), correct = FALSE)
# is the proportion of women larger?
# (alternative = "greater" tests whether the first proportion exceeds the second)
prop.test(c(606, 682), c(1600, 1469), correct = FALSE, alternative = "greater")
# t-test for a mean
############################
# Do men earn more in the US than women?
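install.packages("car")  # only needed once; comment this line out after installing
library(car)
# Levene's test for equality of variances between the two groups
leveneTest(inc ~ sex, data = anes)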
# we reject the null, variances are not equal
t.test(inc ~ sex, data=anes, var.equal = FALSE)
table(anes$sex)
t.test(inc ~ sex, data=anes, var.equal = FALSE, alternative = "less")
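If you would like to see what prop.test() and t.test() compute under the hood, the following is a rough sketch of both test statistics calculated by hand. The proportion part reuses the counts from the script above. The t-test part uses simulated data, not the ANES data, because the raw incomes are not reproduced here. All object names (`p_pool`, `z`, `a`, `b`, `t_stat`, `df_welch`, and so on) are chosen just for this illustration.

```r
# --- Two-sample test for proportions (counts from the script above) ---
x <- c(606, 682)     # number of "yes" answers in each group
n <- c(1600, 1469)   # size of each group

p_hat  <- x / n                        # sample proportions
p_pool <- sum(x) / sum(n)              # pooled proportion under H0: p1 = p2
se     <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z      <- (p_hat[1] - p_hat[2]) / se   # z statistic

res_prop <- prop.test(x, n, correct = FALSE)

# Without continuity correction, the reported X-squared is the squared z statistic
c(by_hand = z^2, prop_test = unname(res_prop$statistic))
c(by_hand = 2 * pnorm(-abs(z)), prop_test = unname(res_prop$p.value))

# --- Welch t-test (unequal variances) on SIMULATED data ---
set.seed(1)
a <- rnorm(40, mean = 50, sd = 10)
b <- rnorm(60, mean = 55, sd = 20)

v_a <- var(a) / length(a)   # squared standard error of mean(a)
v_b <- var(b) / length(b)   # squared standard error of mean(b)

t_stat   <- (mean(a) - mean(b)) / sqrt(v_a + v_b)
# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (v_a + v_b)^2 /
  (v_a^2 / (length(a) - 1) + v_b^2 / (length(b) - 1))

res_t <- t.test(a, b, var.equal = FALSE)

c(by_hand = t_stat, t_test = unname(res_t$statistic))
c(by_hand = df_welch, t_test = unname(res_t$parameter))
```

In both cases the hand calculation and the built-in function should agree, which is a handy way to convince yourself of what the output means.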
Two-Sample Tests in R