Worksheet Week 9

Group Work – Self-Reflection[1]

  1. How does the concept of the conditional expected value relate to the interpretation of coefficients in multiple regression?
  2. How do you select variables for a multiple regression model?
  3. Why do we apply non-linear transformations to variables?
  4. How do you interpret the effect of a logarithmized variable?
  5. Give an example for each of the variable transformations we discussed in the lecture.

Please stop here and don’t go beyond this point until we have compared notes on your answers.


Working with Regression Analysis

We will be using the “crime.csv” data set, available in the Downloads Section, to analyse attitudes to and experiences of crime in England and Wales. The data set contains several variables on the experience and perception of crime, collected from a representative sample, along with demographic details of the respondents. Data are taken from University of Manchester, Cathie Marsh Institute for Social Research (CMIST), UK Data Service, Office for National Statistics (2019). For variable labels, please consult the Crime Data Code Book.

Data Prep

Each respondent was randomly assigned to one of several modules, indicated by the variable split, and was asked only a subset of the questions. The antisocx variable is part of module A and records each respondent’s score for how much antisocial behaviour there is in their neighbourhood.

  1. Create a new data set for those who were chosen for the ‘A’ module. Call this crime.a.
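A minimal base-R sketch of this step. The real crime.csv is not reproduced here, so the sketch builds a tiny made-up data frame in its place; it assumes that split stores the module as the letter "A", "B", and so on. Check the Crime Data Code Book for the actual coding before adapting it.

```r
# Made-up stand-in for: crime <- read.csv("crime.csv")
# (assumption: split is stored as the letters "A", "B", "C")
crime <- data.frame(
  split    = c("A", "B", "A", "C", "A", "B"),
  antisocx = c(-0.4, NA, 1.2, NA, 0.3, NA)
)

# Keep only respondents assigned to module A
crime.a <- subset(crime, split == "A")

nrow(crime.a)  # number of module-A respondents
```

subset() also drops rows where split is missing, which is usually what you want here.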

Data Analysis

  1. Previous research has suggested men perceive more antisocial behaviour in rural areas than urban areas.
    1. Create a linear model using only male respondents from crime.a with the dependent variable antisocx and the independent variable rural2. Do your findings support previous research?
    2. Test the same model using only women. How do the two models differ?
  2. The wburgl variable asks respondents how worried they are about being burgled, with answers ranging from “Very worried” to “Not at all worried”, along with “Not applicable” and “Don’t know.”
    1. Recode those who answered “Not applicable” or “Don’t know” as NA.
    2. Using agegrp7 and wburgl as continuous variables, test the hypothesis that older people are more worried about being burgled.
    3. Using a dummy variable, test whether those over 65 are more worried about burglary than those who are younger.
    4. Using dummies, test whether any of the age groups differs significantly from the youngest age group.
    5. What does the intercept in each of the three models represent?
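The following base-R sketch shows the general shape of these models. It runs on invented data, and every coding it uses is an illustrative guess: sex as "Male"/"Female", rural2 as "Urban"/"Rural", wburgl as text labels, and agegrp7 as 1 to 7 with groups 6 and 7 covering those aged 65 and over. Check these against the Crime Data Code Book before adapting the code.

```r
set.seed(1)
n <- 300
answers <- c("Very worried", "Fairly worried", "Not very worried",
             "Not at all worried", "Not applicable", "Don't know")
# Invented stand-in for crime.a; names and codings are assumptions
crime.a <- data.frame(
  sex      = sample(c("Male", "Female"), n, replace = TRUE),
  rural2   = factor(sample(c("Urban", "Rural"), n, replace = TRUE),
                    levels = c("Urban", "Rural")),
  antisocx = rnorm(n),
  agegrp7  = sample(1:7, n, replace = TRUE),  # assumed 1 = 16-24, ..., 7 = 75+
  wburgl   = sample(answers, n, replace = TRUE)
)

## Exercise 1: the same model fitted separately for men and for women
m.men   <- lm(antisocx ~ rural2, data = subset(crime.a, sex == "Male"))
m.women <- lm(antisocx ~ rural2, data = subset(crime.a, sex == "Female"))
# rural2Rural = mean antisocx of rural minus urban respondents (urban = reference)
coef(m.men); coef(m.women)

## 2a: treat the two non-substantive answers as missing
crime.a$wburgl[crime.a$wburgl %in% c("Not applicable", "Don't know")] <- NA
# Numeric score; the direction is our choice: higher = more worried
crime.a$wburgl.num <- match(crime.a$wburgl,
                            c("Not at all worried", "Not very worried",
                              "Fairly worried", "Very worried"))

## 2b: age group and worry both treated as continuous
## (lm() drops rows with a missing score by default)
m.cont <- lm(wburgl.num ~ agegrp7, data = crime.a)

## 2c: dummy = 1 for the two oldest groups (assumed 65-74 and 75+)
crime.a$over65 <- as.numeric(crime.a$agegrp7 >= 6)
m.dummy <- lm(wburgl.num ~ over65, data = crime.a)

## 2d: one dummy per age group; factor() makes group 1 (youngest) the reference
m.groups <- lm(wburgl.num ~ factor(agegrp7), data = crime.a)
summary(m.groups)
```

Reversing the numeric coding of wburgl flips the sign of every slope, so check which direction the data's own codes run before reading the hypothesis off a coefficient.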

Going Further

I have written these exercises so that you can practise in R a little more. They are a bit more demanding, however. If you skip this section, please make sure to do the exercises in the next section, as they draw on the variable transformations we explored in the lecture.

  1. There are five variables which ask how worried the respondent is about being the victim of various crimes:
    1. Create an additive variable called worry from these variables so that a score of 0 indicates the respondent answered “Not at all worried” to all five and a score of 15 indicates the respondent answered “Very worried” to all five. (Tip: make sure to recode invalid answers as NA first.)
    2. What are the mode, mean and median for worry?
    3. Describe the variable educat3. How could it be modified to be used as a continuous variable?
    4. Using educat3 first as a continuous and then as a factor variable, evaluate the statement “Worry about being a victim of crime is higher in those with lower education levels.”
    5. What are the \(R^2\) values of the two models? Which is higher? Does this make that model better than the other?
    6. Calculate \(97.5\%\) confidence intervals for the coefficients of those with GCSEs and those with degrees. What does this tell you?
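A base-R sketch of these steps on invented data. The five item names (worry_item1 to worry_item5) are hypothetical placeholders for the real variables, each assumed to be already scored from 0 (“Not at all worried”) to 3 (“Very worried”), and educat3 is assumed to have the levels "None", "GCSE" and "Degree"; substitute the real names and levels from the code book.

```r
set.seed(3)
n <- 400
# Hypothetical item names; each item assumed scored 0 = "Not at all worried"
# ... 3 = "Very worried", with NA for invalid answers
items <- paste0("worry_item", 1:5)
crime.a <- as.data.frame(
  sapply(items, function(i) sample(c(0:3, NA), n, replace = TRUE,
                                   prob = c(0.30, 0.30, 0.20, 0.15, 0.05)))
)
# educat3 assumed to have three levels, lowest first
crime.a$educat3 <- factor(sample(c("None", "GCSE", "Degree"), n, replace = TRUE),
                          levels = c("None", "GCSE", "Degree"))

# 1. Additive index from 0 (all "Not at all worried") to 15 (all "Very worried");
#    rowSums() returns NA for anyone with a missing item
crime.a$worry <- rowSums(crime.a[, items])

# 2. mode() in R reports the storage type, so take the most frequent value instead
worry.tab    <- table(crime.a$worry)
worry.mode   <- as.numeric(names(worry.tab)[which.max(worry.tab)])
worry.mean   <- mean(crime.a$worry, na.rm = TRUE)
worry.median <- median(crime.a$worry, na.rm = TRUE)

# 3./4. educat3 as a numeric score (1, 2, 3 in level order) and as a factor
crime.a$educ.num <- as.numeric(crime.a$educat3)
m.num    <- lm(worry ~ educ.num, data = crime.a)
m.factor <- lm(worry ~ educat3,  data = crime.a)  # "None" is the reference level

# 5. R-squared of each model
r2 <- c(numeric = summary(m.num)$r.squared,
        factor  = summary(m.factor)$r.squared)

# 6. 97.5% confidence intervals for the GCSE and Degree coefficients
ci <- confint(m.factor, parm = c("educat3GCSE", "educat3Degree"), level = 0.975)
ci
```

The numeric score forces equal steps between education levels, so that model is a special case of the factor model; fitted on the same rows, the factor model's \(R^2\) can therefore never be lower.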

Transformation of Variables

This Section uses the london_exercises data set available in the Downloads Section. Data are taken from London Data Store (2013). This document provides a full codebook.

  1. The unemployment rate, defined here as the share of the working-age population not in full-time employment (one minus the ratio of people in full-time employment to the working-age population), is often said to be related to crime. Generate an unemployment rate variable for each of the wards.
  2. It is theorised that unemployment is a driving factor behind crime rates.
    1. Plot a scatter graph of crime rate against unemployment rate, with the regression line, that could be used to evaluate the theory. Describe the plot and the best-fit line.
    2. Plot the graph again, excluding wards with a crime rate greater than 500. Describe the plot and the best-fit line.
    3. Plot another graph, again excluding wards with a crime rate greater than 500, this time with the crime rate log-transformed. Describe the plot and the best-fit line.
    4. Build the models behind the last two graphs (untransformed and log-transformed crime rate). Interpret both, including the effect size. Which model fits the data better?
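A base-R sketch of the whole workflow on invented ward data. The column names working_age, fulltime and crime_rate are hypothetical, and the sketch computes the unemployment rate as one minus the ratio of people in full-time employment to the working-age population, since that ratio on its own measures full-time employment.

```r
set.seed(4)
n <- 600
# Invented ward data; column names are hypothetical
london <- data.frame(working_age = round(runif(n, 5000, 15000)))
london$fulltime <- round(london$working_age * runif(n, 0.35, 0.65))

# 1. Unemployment rate: share of working-age residents not in full-time work
london$unemp <- 1 - london$fulltime / london$working_age

# Made-up, right-skewed crime rate that rises with unemployment
london$crime_rate <- exp(3 + 3 * london$unemp + rnorm(n, sd = 0.6))

# 2a. All wards: scatter plot plus OLS line
plot(crime_rate ~ unemp, data = london,
     xlab = "Unemployment rate", ylab = "Crime rate")
abline(lm(crime_rate ~ unemp, data = london), col = "red")

# 2b. Drop wards with a crime rate above 500
london.sub <- subset(london, crime_rate <= 500)
m.lin <- lm(crime_rate ~ unemp, data = london.sub)
plot(crime_rate ~ unemp, data = london.sub,
     xlab = "Unemployment rate", ylab = "Crime rate")
abline(m.lin, col = "red")

# 2c. Same wards, crime rate on the log scale
m.log <- lm(log(crime_rate) ~ unemp, data = london.sub)
plot(log(crime_rate) ~ unemp, data = london.sub,
     xlab = "Unemployment rate", ylab = "log(Crime rate)")
abline(m.log, col = "red")

# 2d. Effect sizes
coef(m.lin)["unemp"]  # change in crime rate as unemp goes from 0 to 1
b <- coef(m.log)[["unemp"]]
pct.per.point <- 100 * (exp(0.01 * b) - 1)  # % change per percentage point
pct.per.point
```

In the log model, a one-unit change in unemp multiplies the crime rate by \(e^{b}\). Because unemp is a proportion, a one-percentage-point rise (0.01) changes the crime rate by \(100 \times (e^{0.01b} - 1)\) percent, which is close to \(b\) percent when \(b\) is small. The \(R^2\) values of the two models refer to different outcomes (the crime rate versus its logarithm), so they are not a like-for-like comparison of fit.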

Homework for Week 10


Solutions

You can find the Solutions in the Downloads Section.



  1. All exercises are reproduced from Reiche (forthcoming).