Worksheet Week 9

Group Work – Self-Reflection

How does the concept of the conditional expected value relate to the interpretation of coefficients in multiple regression?
How do you select variables for a multiple regression model?
Why do we apply non-linear transformations to variables?
How do you interpret the effect of a logarithmized variable?
Using data from London wards (London Data Store, 2013), the regression models in Table 18 explain voter turnout in the 2012 mayoral elections.
1. Interpret the intercept in Model 1.
2. Interpret the slope coefficient in Model 1.
3. Interpret the slope coefficients in Model 3.
4. Explain why the effect sizes of the slope coefficients in Models 1 and 2 are so different.
5. Interpret the model fit measure in Model 3.

Table 18: Regression Models for Self-Reflection Exercises.
	Dependent Variable: Turnout in 2012 Mayoral Elections
	Bivariate (1)	Bivariate (2)	Multiple (3)
Age (median)	0.740***		0.724***
	(0.063)		(0.064)
Crime Rate (per 1,000)		-0.009**	-0.005+
		(0.003)	(0.003)
Constant	7.555***	34.900***	8.536***
	(2.284)	(0.319)	(2.340)
Num.Obs.	625	625	625
R2	0.180	0.015	0.184
R2 Adj.	0.179	0.014	0.182
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
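
For reference, the Multiple (3) column of Table 18 can be written as a fitted equation. The symbols are chosen here for illustration only; the hat marks the predicted turnout, that is, the expected turnout conditional on the values of both predictors.

$$\widehat{\text{Turnout}} = 8.536 + 0.724 \cdot \text{Age} - 0.005 \cdot \text{Crime Rate}$$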

Please stop here and don’t go beyond this point until we have compared notes on your answers.

Working with Regression Analysis

We will be using the “crime.csv” data, available in the Downloads Section, to analyse attitudes towards and experiences of crime in England and Wales. The data set contains several variables based on questions about experiences and perceptions of crime, asked of a representative sample, along with demographic details of the respondents (University of Manchester, Cathie Marsh Institute for Social Research (CMIST), UK Data Service, Office for National Statistics, 2019). For variable labels, please consult the Crime Data Code Book.

Data Analysis

Each respondent was randomly assigned to one of several modules, indicated by the variable split, and was only asked a subset of the questions. The antisocx variable is part of module A and records respondents’ scores for how much antisocial behaviour there is in their neighbourhood. Previous research has suggested that men perceive more antisocial behaviour in rural areas than in urban areas.
1. Create a new data set for those who were chosen for the ‘A’ module. Call this crime.a.
2. Create a linear model using only male respondents from crime.a with the dependent variable antisocx and the independent variable rural2. Do your findings support previous research?
3. Test the same model using only women. How do the two models differ?
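
The subset-then-model workflow in these three tasks can be sketched as follows. This is a minimal illustration on simulated data, not the solution: the codings of split, sex and rural2 ("A", "Male"/"Female", "Urban"/"Rural") are assumptions, so check the Crime Data Code Book for the real values before adapting it.

```r
# Simulated stand-in for crime.csv; all codings below are assumptions.
set.seed(1)
n <- 400
crime <- data.frame(
  split    = sample(c("A", "B", "C", "D"), n, replace = TRUE),
  sex      = sample(c("Male", "Female"), n, replace = TRUE),
  rural2   = sample(c("Urban", "Rural"), n, replace = TRUE),
  antisocx = rnorm(n),
  stringsAsFactors = FALSE
)

# Keep only respondents who were assigned to module A
crime.a <- subset(crime, split == "A")

# Fit the same bivariate model separately for men and for women
m_men   <- lm(antisocx ~ rural2, data = subset(crime.a, sex == "Male"))
m_women <- lm(antisocx ~ rural2, data = subset(crime.a, sex == "Female"))

# Compare the estimated rural/urban difference across the two models
summary(m_men)$coefficients
summary(m_women)$coefficients
```

Because the toy data are random noise, the coefficients here carry no substantive meaning; only the structure of the code is of interest.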
We will now be using the full crime data frame. The wburgl variable asks respondents how worried they are about being burgled with answers ranging from “Very worried” to “Not at all worried” along with “Not applicable” and “Don’t know.”

1. Recode those with “Not applicable” or “Don’t know” as NAs.
2. Using agegrp7 and wburgl as continuous variables, test the hypothesis that older people are more worried about being burgled.
3. Using a dummy variable, test whether those over 65 are more worried about burglary than those who are younger.
4. Using dummies, test whether any of the age groups differ significantly from the youngest age group.
5. What does the intercept in each of the three models represent?
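
One way to set up the recoding and the three model specifications is sketched below on simulated data. The two middle answer labels, the numeric coding of agegrp7 (1 to 7) and the assumption that codes 6 and 7 cover respondents over 65 are hypothetical; the code book gives the actual categories.

```r
# Simulated stand-in for the crime data; labels and codings are assumptions.
set.seed(2)
n <- 300
answers <- c("Very worried", "Fairly worried", "Not very worried",
             "Not at all worried", "Not applicable", "Don't know")
crime <- data.frame(
  wburgl  = sample(answers, n, replace = TRUE),
  agegrp7 = sample(1:7, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Recode the two non-substantive answers as missing
crime$wburgl[crime$wburgl %in% c("Not applicable", "Don't know")] <- NA

# Numeric worry score: 3 = "Very worried", 0 = "Not at all worried"
crime$wburgl_num <- 4 - match(crime$wburgl, answers[1:4])

# Dummy for the oldest respondents (assumed here to be codes 6 and 7)
crime$over65 <- as.integer(crime$agegrp7 >= 6)

m_cont   <- lm(wburgl_num ~ agegrp7, data = crime)          # age as continuous
m_dummy  <- lm(wburgl_num ~ over65, data = crime)           # single dummy
m_groups <- lm(wburgl_num ~ factor(agegrp7), data = crime)  # one dummy per group

coef(m_groups)  # the lowest age code is the reference category
```

Note that lm() drops rows with missing values by default, so the recoded NAs are excluded from all three models.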

Going Further

I have written these exercises so that you can practise in R a little more. They are a bit more demanding, however. If you skip this section, please make sure to do the exercises in the next section, as they draw on the variable transformations we explored in the lecture.

There are five variables which ask how worried the respondent is about being the victim of various crimes:
1. Create an additive variable called worry from these variables so that a score of 0 indicates the respondent answered “Not at all worried” to all five questions and a score of 15 indicates the respondent answered “Very worried” to all five. (Tip: Make sure to clean the variables for NAs.)
2. What are the mode, mean and median for worry?
3. Describe the variable educat3. How could it be modified to be used as a continuous variable?
4. Using educat3 as both a continuous and a factor variable, evaluate the statement “Worry about being a victim of crime is higher in those with lower education levels.”
5. What are the $R^2$ values for the two models calculated? Which is higher? Does this make that model better than the other?
6. Calculate $97.5\%$ confidence intervals for the coefficients for those with GCSEs and those with Degrees. What does this tell you?
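
The mechanics of building an additive index and requesting non-default confidence intervals can be sketched like this. The item names w1 to w5, the 0 to 3 scoring and the three educat3 labels are hypothetical placeholders on simulated data; the real variables need to be looked up in the code book.

```r
# Simulated stand-in; item names, scoring and education labels are assumptions.
set.seed(3)
n <- 250
items <- c("w1", "w2", "w3", "w4", "w5")  # hypothetical item names
crime <- as.data.frame(
  replicate(5, sample(c(0:3, NA), n, replace = TRUE,
                      prob = c(0.23, 0.23, 0.23, 0.23, 0.08)))
)
names(crime) <- items
crime$educat3 <- factor(
  sample(c("None", "GCSE", "Degree"), n, replace = TRUE),
  levels = c("None", "GCSE", "Degree")
)

# Additive index from 0 to 15; rowSums() returns NA if any item is missing
crime$worry <- rowSums(crime[, items])

# Mean, median and mode (the most frequent value) of the index
mean(crime$worry, na.rm = TRUE)
median(crime$worry, na.rm = TRUE)
as.numeric(names(which.max(table(crime$worry))))

# Education as a factor, then as a numeric score
m_fac <- lm(worry ~ educat3, data = crime)
m_num <- lm(worry ~ as.numeric(educat3), data = crime)

# Confidence intervals at the 97.5% level instead of the default 95%
confint(m_fac, level = 0.975)
```

Treating the factor as numeric assumes equal spacing between education levels, which is a modelling choice rather than a property of the data.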

Transformation of Variables

This section uses the london_exercises data set, available in the Downloads Section. Data are taken from London Data Store (2013). This document provides a full codebook.

The unemployment rate, defined here as the ratio of unemployed people to the population of working age, is often said to be related to crime. Generate an unemployment rate variable for each of the wards.
It is theorised that unemployment is a driving factor behind crime rates.
1. Plot a scatter graph of crime rate against unemployment rate, including the regression line that may be used to evaluate the theory. Describe the plot and the best fit line.
2. Plot the graph again, excluding wards with a crime rate greater than 500. Describe the plot and the best fit line.
3. Plot another graph, again excluding wards with a crime rate greater than 500, this time with the crime rate log-transformed. Describe the plot and the best fit line.
4. Build both models (2 and 3). Interpret both, including the effect size. Which model fits the data better?
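
The steps above (filter, plot, fit in levels, fit in logs) might look as follows in outline. The data are simulated and the column names unemp and crime_rate are hypothetical, so the numbers say nothing about London wards.

```r
# Simulated stand-in for london_exercises; column names are assumptions.
set.seed(4)
n <- 300
london <- data.frame(unemp = runif(n, 2, 15))
london$crime_rate <- exp(3.5 + 0.08 * london$unemp + rnorm(n, sd = 0.4))

# Exclude wards with a crime rate greater than 500
wards <- subset(london, crime_rate <= 500)

# Model in levels and model with a log-transformed dependent variable
m_lin <- lm(crime_rate ~ unemp, data = wards)
m_log <- lm(log(crime_rate) ~ unemp, data = wards)

# Scatter plot on the log scale with the fitted regression line
plot(wards$unemp, log(wards$crime_rate),
     xlab = "Unemployment rate", ylab = "log(crime rate)")
abline(m_log)

# In the log model, a one-unit increase in unemp changes the crime rate
# by roughly this percentage
100 * (exp(coef(m_log)["unemp"]) - 1)
```

Keep in mind that the two models have different dependent variables (crime rate versus its logarithm), so their $R^2$ values are not directly comparable.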

Homework for Week 10

Work through this week’s flashcards to familiarise yourself with the relevant R functions.
Find an example for each NEW function and apply it in R to ensure it works.
Complete the Week 9 Moodle Quiz.
Revise the material of Weeks 7–9 and note any questions you might have.
Work through the Week 10 “Methods, Methods, Methods” Section.

Solutions

You can find the Solutions in the Downloads Section.
