Significance Testing

Group Work – Self-Reflection¹³

What does a significance test do?
Why can we not sacrifice the randomisation assumption in significance testing?
Explain the difference between a significance test and a confidence interval.
Explain the relationship between the desired $\alpha$ level and the Type I and II Errors.
What are the differences between a one- and a two-sided significance test? Give an example for each.

Please stop here and don’t go beyond this point until we have compared notes on your answers.

R Exercises

These exercises use the ks2.csv dataset. It contains fictitious¹⁴ average grades of Key Stage 2 (KS2) students in the UK: test scores for 1,980 students in reading (reading), mathematics (maths), and grammar, punctuation, and spelling (gps), together with the mean of these three scores (avg_all). The test scores have been standardised: 80 is the lowest possible mark, 120 is the highest possible mark, 100 is the minimum passing mark, and -1 denotes an ungraded test.

Load the ks2.csv dataset into R. Remove any observations that have any ungraded test scores.
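As a starting point rather than a model answer, the filtering step can be sketched as follows. The small data frame is a made-up stand-in so that the snippet runs without ks2.csv; the object names ks2_clean, subjects, and graded are illustrative assumptions.

```r
# Hypothetical stand-in for the real file; with ks2.csv in the working
# directory one would instead start from: ks2 <- read.csv("ks2.csv")
ks2 <- data.frame(
  reading = c(104, -1, 98, 110, 101),
  maths   = c(102, 99, -1, 108, 100),
  gps     = c(106, 103, 97, -1, 105)
)

# A row is kept only if none of the three subjects is coded -1 (ungraded)
subjects <- c("reading", "maths", "gps")
graded <- rowSums(ks2[, subjects] == -1) == 0
ks2_clean <- ks2[graded, ]

nrow(ks2_clean)  # number of fully graded observations
```

Counting the -1 codes per row keeps the rule in one place if further subjects are added later.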
Calculate the means and standard deviations of each of the three subjects. Write a brief description of the insights that can be drawn from these values.
Conduct a t-test for each of the three subjects to see whether its mean mark is statistically different from 100 at the 95% confidence level. Identify three ways in which the output indicates whether each difference is statistically significant.
Conduct a t-test to see if the mean of the average score of the three tests is statistically less than 105 at the 99% confidence level. Interpret the results.
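The descriptive statistics and one-sample t-tests described above might be set up along these lines. This is a minimal sketch on simulated marks, not the ks2.csv data, so the numbers it prints say nothing about the real sample; the object names are assumptions.

```r
set.seed(1)  # simulated marks for illustration only
marks <- data.frame(
  reading = rnorm(200, mean = 104, sd = 6),
  maths   = rnorm(200, mean = 103, sd = 6),
  gps     = rnorm(200, mean = 105, sd = 6)
)
marks$avg_all <- rowMeans(marks[, c("reading", "maths", "gps")])

# Mean and standard deviation of each subject
sapply(marks[, c("reading", "maths", "gps")], mean)
sapply(marks[, c("reading", "maths", "gps")], sd)

# Two-sided test of H0: mu = 100 at the 95% confidence level
two_sided <- t.test(marks$maths, mu = 100,
                    alternative = "two.sided", conf.level = 0.95)

# One-sided test of H1: mu < 105 at the 99% confidence level
one_sided <- t.test(marks$avg_all, mu = 105,
                    alternative = "less", conf.level = 0.99)

two_sided  # prints the t statistic, p-value, and confidence interval
one_sided
```

The same call is repeated for reading and gps by swapping the first argument.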
The following questions will look at students’ English abilities generally.
1. Create a new variable called english, which consists of the average of the reading and grammar, punctuation, and spelling variables.
2. Conduct a t-test to see if the mean of the average score of the new English variable is statistically less than 105 at the 99.9% confidence level. Interpret the results.
3. Conduct a t-test to see if the mean of the average score of the new English variable is statistically different from 105 at the 99.9% confidence level. Interpret the results.
4. Is there any difference in the interpretation between the two tests above? Are there any differences in the results? Why?
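To compare a one-sided and a two-sided test on the same variable, both results can be stored and inspected side by side. The sketch below uses simulated reading and gps marks, so its output is illustrative only; english is the variable name given in the exercise, the other names are assumptions.

```r
set.seed(2)  # simulated marks for illustration only
reading <- rnorm(150, mean = 104, sd = 5)
gps     <- rnorm(150, mean = 105, sd = 5)

# English is the average of reading and gps
english <- (reading + gps) / 2

less <- t.test(english, mu = 105, alternative = "less",      conf.level = 0.999)
two  <- t.test(english, mu = 105, alternative = "two.sided", conf.level = 0.999)

# Put the t statistics and p-values of both tests next to each other
c(t_less = unname(less$statistic), t_two = unname(two$statistic))
c(p_less = less$p.value, p_two = two$p.value)

# The confidence intervals are built differently for the two alternatives
less$conf.int
two$conf.int
```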
You are tasked with investigating the performance of students who passed mathematics and those who did not. For the following tests, use a 95% confidence level.
1. Create a binary variable that has two categories: those who passed mathematics (100 $\leq$ mark) and those who failed mathematics (mark $<$ 100).
2. Conduct a proportion test to see if the proportion of students that fail mathematics is 10% or greater. Interpret the results.
3. Conduct a t-test on the group who fail mathematics to see if they, on average, have marks for English statistically less than 100. Interpret the results.
4. Conduct a t-test on the group who pass mathematics to see if they, on average, have marks for grammar, punctuation, and spelling statistically greater than 105. Interpret the results.
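A possible outline for the steps above, again on simulated marks rather than the real data. The variable name maths_pass is an assumption, and alternative = "greater" in the proportion test is only one reading of the question; the direction should follow from how you state your hypotheses.

```r
set.seed(3)  # simulated marks for illustration only
reading <- round(rnorm(300, mean = 104, sd = 6))
maths   <- round(rnorm(300, mean = 104, sd = 6))
gps     <- round(rnorm(300, mean = 105, sd = 6))
english <- (reading + gps) / 2

# Binary variable: pass if the mark is at least 100, fail otherwise
maths_pass <- ifelse(maths >= 100, "pass", "fail")
n_fail <- sum(maths_pass == "fail")

# Proportion test against 10% (the direction here is an assumption)
prop_fail <- prop.test(x = n_fail, n = length(maths), p = 0.10,
                       alternative = "greater", conf.level = 0.95)

# One-sided t-tests within each mathematics group
fail_english <- t.test(english[maths_pass == "fail"], mu = 100,
                       alternative = "less", conf.level = 0.95)
pass_gps <- t.test(gps[maths_pass == "pass"], mu = 105,
                   alternative = "greater", conf.level = 0.95)

prop_fail
fail_english
pass_gps
```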
Now it is worth investigating how students who fail at least one subject perform.
1. Create a binary variable that has two categories: those who passed all three subjects (all marks greater than or equal to 100) and those who failed at least one subject (one or more marks less than 100).
2. It can be hypothesised that the group of students who failed will have a mean of all of the test marks significantly below the pass mark of 100. Test this and interpret the findings with respect to the statistical and practical significance of the test.
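One way to set up this grouping and test is sketched below with simulated marks. The names passed_all, outcome, and gap are assumptions, and reporting the raw distance from the pass mark is just one simple way to start thinking about practical significance.

```r
set.seed(4)  # simulated marks for illustration only
ks2 <- data.frame(
  reading = round(rnorm(300, mean = 104, sd = 6)),
  maths   = round(rnorm(300, mean = 103, sd = 6)),
  gps     = round(rnorm(300, mean = 105, sd = 6))
)
ks2$avg_all <- rowMeans(ks2[, c("reading", "maths", "gps")])

# Passed all three subjects versus failed at least one
passed_all <- ks2$reading >= 100 & ks2$maths >= 100 & ks2$gps >= 100
ks2$outcome <- factor(ifelse(passed_all, "passed all", "failed at least one"))

# One-sided test: is the mean of avg_all in the failing group below 100?
failed <- ks2$avg_all[ks2$outcome == "failed at least one"]
res <- t.test(failed, mu = 100, alternative = "less", conf.level = 0.95)

# Size of the difference in marks, to set against the p-value
gap <- mean(failed) - 100

res
gap
```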
Imagine you are part of a team working within the Department for Education, tasked with investigating this sample to produce recommendations for policymakers.
1. Normalise the variable that contains the average of all three marks by setting the lowest mark (80) to 0, the highest mark (120) to 100, and the minimum pass mark (100) to 50. Justify why it might be useful to normalise these marks to this scale for non-specialist policymakers.
2. Construct a categorical variable that consists of five categories: 0-49.99 (Fail), 50-59.99 (Pass), 60-69.99 (Merit), 70-79.99 (Distinction), and 80+ (Distinction+).
3. Answer the following questions but write your answers as if intended for a non-specialist policymaker with no knowledge of statistics:
  1. Are the averages of the Pass, Merit, and Distinction groups different from their middle marks (55, 65, and 75, respectively)? If so, in which direction?
  2. Is the average mark of the Distinction+ group lower than the maximum mark of the group?
  3. Is the mean mark of the Fail group higher than the median mark of the group? Which skew does this indicate in the distribution?
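The rescaling and banding can be sketched on a few hand-picked marks. The linear formula below is one mapping that sends 80 to 0, 100 to 50, and 120 to 100; the names norm and band are assumptions, and the half-open intervals in cut() stand in for the "x-x.99" ranges listed above.

```r
# A handful of made-up average marks, including the boundary values
avg_all <- c(80, 99, 100, 104, 108, 112, 120)

# Linear rescaling from the 80-120 scale to a 0-100 scale
norm <- (avg_all - 80) * 100 / (120 - 80)

# Five bands; right = FALSE makes each interval include its lower bound,
# and include.lowest = TRUE keeps the top value (100) in the last band
band <- cut(norm,
            breaks = c(0, 50, 60, 70, 80, 100),
            labels = c("Fail", "Pass", "Merit", "Distinction", "Distinction+"),
            right = FALSE, include.lowest = TRUE)

data.frame(avg_all, norm, band)
table(band)
```

Each band can then be subset, for example norm[band == "Pass"], and passed to t.test() with the relevant value for mu.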

Homework for Week 10

“Sit” the practice exam which uses a case study similar to the one you will receive in the exam.
Work through this week’s flashcards to familiarise yourself with the relevant R functions.
Find an example for each NEW function and apply it in R to ensure it works.
Finish the Week 8 R Exercises.
Prepare a list of questions you have for the exam.

Solutions

You can find the Solutions in the Downloads Section.

¹³ All exercises are a reproduction from Reiche (forthcoming).
¹⁴ The means are based on KS2 scaled score averages which have been averaged over 2016-2019.
