Worksheet Week 9
Group Work – Self-Reflection
- How does the concept of the conditional expected value relate to the interpretation of coefficients in multiple regression?
- How do you select variables for a multiple regression model?
- Why do we apply non-linear transformations to variables?
- How do you interpret the effect of a logarithmized variable?
- Give an example for each of the variable transformations we discussed in the lecture.
Please stop here and don’t go beyond this point until we have compared notes on your answers.
Working with Regression Analysis
We will be using the “crime.csv” data available in the Downloads Section to analyse attitudes and experiences of crime in England and Wales. The data set contains several variables relating to questions asked about experience and perceptions of crime to a representative sample along with demographic details on the respondents. Data are taken from University of Manchester, Cathie Marsh Institute for Social Research (CMIST), UK Data Service, Office for National Statistics (2019). For variable labels, please consult the Crime Data Code Book.
Data Prep
Each respondent was randomly assigned to a different module, indicated by split, and only asked a subsection of the questions. The antisocx variable is part of module A and asks respondents for a score on how much antisocial behaviour there is in the neighbourhood.
- Create a new data set for those who were chosen for the ‘A’ module. Call this crime.a.
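As a minimal sketch of row subsetting in R, the block below builds a small invented data frame and keeps only the module A rows. The toy values and the assumption that split is coded with the letter "A" are illustrative only; check the code book for the actual coding.

```r
# Toy stand-in for the real data; all values are invented for illustration.
# With the real file you would start from: crime <- read.csv("crime.csv")
crime <- data.frame(
  split    = c("A", "B", "A", "C", "A", "D"),
  antisocx = c(0.4, NA, -1.2, NA, 0.9, NA)
)

# Keep only the respondents assigned to module A
# (assumes the module is stored as the character "A").
crime.a <- subset(crime, split == "A")

nrow(crime.a)         # number of module A respondents
table(crime.a$split)  # check that only "A" remains
```

The same result can be obtained with bracket indexing, crime[crime$split == "A", ], although that form also keeps rows where split is NA.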
Data Analysis
- Previous research has suggested men perceive more antisocial behaviour in rural areas than urban areas.
  - Create a linear model using only male respondents from crime.a with the dependent variable antisocx and the independent variable rural2. Do your findings support previous research?
  - Test the same model using only women. How do the two models differ?
- The wburgl variable asks respondents how worried they are about being burgled, with answers ranging from “Very worried” to “Not at all worried” along with “Not applicable” and “Don’t know.”
  - Recode those with “Not applicable” or “Don’t know” as NAs.
  - Using agegrp7 and wburgl as continuous variables, test the hypothesis that older people are more worried about being burgled.
  - Using a dummy variable, test whether those over 65 are more worried about burglary than those who are younger.
  - Using dummies, test whether any of the age groups differ significantly from the youngest age group.
  - What does the intercept in each of the three models represent?
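The techniques used in these exercises (fitting a model on a subset, recoding answers as NA, and entering a predictor as a continuous variable, a single dummy or a set of dummies) can be sketched on invented data as follows. Everything in the toy data frame is an assumption made for illustration: the sex column, the numeric codes 8 and 9 for “Not applicable” and “Don’t know”, and the idea that age groups 6 and 7 cover the over-65s are not taken from the code book.

```r
set.seed(1)
n <- 200

# Invented toy data. Column codes are assumptions for illustration,
# not the coding used in the Crime Data Code Book.
toy <- data.frame(
  sex      = sample(c("male", "female"), n, replace = TRUE),  # hypothetical column
  rural2   = sample(c("urban", "rural"), n, replace = TRUE),
  antisocx = rnorm(n),
  agegrp7  = sample(1:7, n, replace = TRUE),          # 1 = youngest group
  wburgl   = sample(c(1:4, 8, 9), n, replace = TRUE)  # 8 and 9 stand in for N/A and don't know
)

# A model fitted on a subset of respondents only.
m_men <- lm(antisocx ~ rural2, data = subset(toy, sex == "male"))

# Recode the non-substantive answers as NA.
toy$wburgl[toy$wburgl %in% c(8, 9)] <- NA

# Age group treated as a continuous predictor.
m_cont <- lm(wburgl ~ agegrp7, data = toy)

# A single dummy, assuming groups 6 and 7 are the over-65s.
toy$over65 <- as.numeric(toy$agegrp7 >= 6)
m_dummy <- lm(wburgl ~ over65, data = toy)

# One dummy per age group; factor() uses the lowest level as the reference.
toy$age_f <- factor(toy$agegrp7)
m_groups <- lm(wburgl ~ age_f, data = toy)

summary(m_groups)
```

Before reading the sign of any coefficient, check in the code book which end of the wburgl scale means “Very worried”.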
Going Further
I have written these exercises so that you can practise in R a little more. They are a bit more demanding, however. If you skip this section, please make sure to do the exercises in the next section, as they draw on the variable transformations we explored in the lecture.
- There are five variables which ask how worried the respondent is about being the victim of various crimes:
  - Create an additive variable called worry from these variables so that a score of 0 indicates the respondent answered all “Not at all worried” and a score of 15 indicates the respondent answered all “Very worried.” (Tip: Make sure to clean the variables for NAs.)
  - What are the mode, mean and median for worry?
  - Describe the variable educat3. How could it be modified to be used as a continuous variable?
  - Using educat3 as both a continuous and factor variable, evaluate the statement “Worry about being a victim of crime is higher in those with lower education levels.”
  - What are the \(R^2\) values for the two models calculated? Which is higher? Does this make that model better than the other?
  - Calculate \(97.5\%\) confidence intervals for the coefficients for those with GCSEs and those with Degrees. What does this tell you?
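A rough sketch of the building blocks for this section, on invented data: an additive index from five items, its mode, mean and median, and confidence intervals at a non-default level. The item names w1 to w5, the coding 1 = “Very worried” to 4 = “Not at all worried”, and the three education labels are assumptions made for illustration, not the coding in the code book.

```r
# Invented answers to five items; names and coding are hypothetical:
# 1 = "Very worried", 4 = "Not at all worried", NA = missing after cleaning.
worries <- data.frame(
  w1 = c(1, 4, 2, NA, 4),
  w2 = c(1, 4, 3, 2, 4),
  w3 = c(1, 4, 2, 1, 4),
  w4 = c(1, 4, 4, 3, 4),
  w5 = c(1, 4, 1, 2, 4)
)

# Reverse each item so that 0 = "Not at all worried" and 3 = "Very worried",
# then add the five items up. rowSums() gives NA if any item is NA.
worry <- rowSums(4 - worries)

# Base R has no function for the statistical mode; a frequency table
# does the job (table() drops NAs by default).
tab <- table(worry)
worry_mode <- as.numeric(names(tab)[which.max(tab)])

worry_mode
mean(worry, na.rm = TRUE)
median(worry, na.rm = TRUE)

# Confidence intervals at a chosen level, on a separate simulated model.
set.seed(2)
educ <- factor(sample(c("None", "GCSE", "Degree"), 100, replace = TRUE),
               levels = c("None", "GCSE", "Degree"))  # hypothetical labels
score <- rnorm(100, mean = 8, sd = 3)
m <- lm(score ~ educ)

ci <- confint(m, level = 0.975)
ci
summary(m)$r.squared
```

Whether a respondent with one missing item should get an NA score or a score based on the remaining items is a design choice; the sketch takes the first option.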
Transformation of Variables
This Section uses the london_exercises data set available in the Downloads Section. Data are taken from London Data Store (2013). This document provides a full codebook.
- Unemployment rate, defined as the ratio of people in full time employment to population of working age, is often said to be related to crime. Generate an unemployment rate variable for each of the wards.
- It is theorised that unemployment is a driving factor behind crime rates.
  - Plot a scatter graph with unemployment rate, crime rate, and the regression line that may be used to evaluate the theory. Describe the plot and the best fit line.
  - Plot the graph again excluding wards with a crime rate greater than 500. Describe the plot and the best fit line.
  - Plot another graph excluding wards with a crime rate of over 500 with the crime rate log transformed. Describe the plot and the best fit line.
  - Build both models. Interpret both including the effect size. Which model fits the data better?
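The plotting and modelling steps above can be sketched on simulated data as follows. The variable names unemp_rate and crime_rate and all values are invented for illustration; the real column names are in the codebook.

```r
set.seed(3)
n <- 300

# Simulated ward-level toy data; names and values are placeholders.
wards <- data.frame(unemp_rate = runif(n, 2, 15))
wards$crime_rate <- exp(3.5 + 0.08 * wards$unemp_rate + rnorm(n, sd = 0.4))

# Exclude wards with a crime rate above 500.
trimmed <- subset(wards, crime_rate <= 500)

# Scatter plot with the least-squares line, outcome on its original scale.
m_lin <- lm(crime_rate ~ unemp_rate, data = trimmed)
plot(crime_rate ~ unemp_rate, data = trimmed)
abline(m_lin)

# Same plot with the outcome on the natural-log scale.
m_log <- lm(log(crime_rate) ~ unemp_rate, data = trimmed)
plot(log(crime_rate) ~ unemp_rate, data = trimmed)
abline(m_log)

# Log-level model: a one-unit rise in x changes y by
# 100 * (exp(b) - 1) percent.
pct_change <- 100 * (exp(coef(m_log)["unemp_rate"]) - 1)
pct_change
```

The \(R^2\) values of the two models refer to outcomes on different scales, so treat a direct comparison with care and look at the plots as well.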
Homework for Week 10
- Work through this week’s flashcards to familiarise yourself with the relevant R functions.
- Find an example for each NEW function and apply it in R to ensure it works
- Complete the Week 9 Moodle Quiz
- Revise the material of weeks 7-9 and note any questions you might have
- Work through the Week 10 “Methods, Methods, Methods” Section.
All exercises are a reproduction from Reiche (forthcoming).