Glossary

Unless otherwise noted, the definitions are taken from Reiche (forthcoming).

Table 13: Glossary for PO11Q
Term Description
analysis A detailed evaluation of data to discover their structure and relevant information to answer a research question
asymmetry The notion that while X causes Y, Y does not cause X. It is established with temporal priority, manipulated events, and/or the independence of causes (see Brady (2011))
attribute A component or characteristic of a concept
background concept The broad constellation of meanings and understandings associated with the concept (Adcock & Collier, 2001, p. 531)
categorical Describing the qualitative categories of a characteristic, for example different religions
causal In order to establish a causal relationship, the following criteria must be met concurrently:
  1. Relevance of the variables within the broader theoretical and empirical context of the research
    1. Clear theoretical framework
    2. Clear conceptualization
    3. Exclusion of alternative explanations
  2. Asymmetry
  3. Significant (and sufficiently strong) statistical association
central limit theorem In random sampling with a large sample size – where n=30 is usually sufficient – the sampling distribution of the sample mean \(\bar{y}\) will be approximately normally distributed, irrespective of the shape of the population distribution
concept Abstract ideas which form the building blocks of theories (Clark et al., 2021, p. 150)
conceptualization Formulating a systematized concept through reasoning about the background concept, in light of the goals of research (Adcock & Collier, 2001, p. 531)
confidence interval A confidence interval is an interval of numbers constructed so that it contains the true parameter of the population (e.g. the mean) in a proportion \((1-\alpha)\) of cases. \(\alpha\) is usually chosen to be small, so that the confidence level is 95% or 99%.
confidence level The confidence level is the probability with which the confidence interval is believed to contain the true parameter of the population and is defined as \((1-\alpha)\)
conflation A variable does not belong to the attribute in question, but to a different one (Munck & Verkuilen, 2002, pp. 13–14)
constant A variable which does not vary
continuous Can assume any value within defined measurement boundaries
critical value The critical value is a threshold that determines the boundary for rejecting the null hypothesis (H\(_0\)) in a hypothesis test. It is a point on the probability distribution of the test statistic beyond which the null hypothesis is rejected. The critical value is chosen based on the significance level (\(\alpha\)) of the test, which represents the probability of making a Type I error (i.e., rejecting a true null hypothesis).
cross-sectional data Look at different units (or cross-sections) \(i\) at a single point in time
data Derives from the Latin "datum", meaning "something given". For our purposes it is a collection of numbers (or quantities) for the purpose of analysis
data set A collection of numerical values for individual observations, separated into distinctive variables
degrees of freedom Degrees of freedom express constraints on our estimation process by specifying how many values in the calculation are free to vary
democracy A system in which the population chooses and holds accountable elected representatives through fair, free, and contested, multi-party elections. The human rights, civil rights, and civil liberties of individuals are protected by law
dependent variable Depends, through some statistical or stochastic process, on the value of an independent variable
descriptive statistics Summarise information about the centre and variability of a variable
deviation The deviation \(d\) of an observation \(y_{i}\) from the sample mean \(\bar{y}\) is the difference between them: \(d=y_{i}-\bar{y}\)
dichotomous Can only assume two mutually exclusive, but internally homogeneous qualitative categories
discrete The result of a counting process
distribution Refers to the display of the values a variable can assume, together with their respective absolute or relative frequency
generalisability The ability to apply the findings made on the basis of a representative sample to the population
histogram Displays through rectangles the frequency with which the values of a continuous variable occur in specific ranges
hypothesis In statistics, a hypothesis is a formal statement about a population parameter or about a relationship between variables. Hypotheses guide statistical tests to determine whether data support or refute them. The hypothesis suggesting an effect or difference is called the alternative hypothesis; it is always paired with a null hypothesis, which suggests no effect or difference.
independent variable Influences or helps us predict the level of a dependent variable. It is often treated as fixed, or “given” in statistical analysis, and is sometimes also called “explanatory variable”
interpretation The explanation of results to answer the research question
interquartile range The difference between the 3\(^{\text{rd}}\) and the 1\(^{\text{st}}\) quartiles
literature review An analytical summary of the literature relating to a particular topic with the objective of identifying a gap and thus motivating a research question
mean Is equal to the sum of the observations divided by the number of observations
measurement Refers to the selection of a measure or variable
median Separates the lower half from the upper half of observations
method A tool for systematic investigation
mode Is the most frequently occurring value
non-probability sampling In non-probability sampling not every unit has the same probability of being sampled
normal distribution The Normal Distribution is a bell-shaped probability distribution which is symmetrical around the mean
outlier Defined as a value larger than the third quartile plus 1.5 times the interquartile range, or smaller than the first quartile minus 1.5 times the interquartile range
p-value The p-value indicates the probability of obtaining a result equal to, or even more extreme than the observed value, assuming the null hypothesis is true. Common thresholds for significance are 0.05, 0.01, and 0.001. A smaller p-value suggests stronger evidence against the null hypothesis. The p-value is denoted as \(p\).
parameter A parameter is the value a statistic would assume in the long run. It is also called the Expected Value
percentile In ordered data, the percentile refers to the value of a variable below which a certain proportion of observations falls
population Collection of all cases which possess certain pre-defined characteristics
population distribution The probability distribution of the population
primary data Primary data are data you have collected yourself
probability Refers to how many times out of a total number of cases a particular event occurs. We can also see it as the chance of a particular event occurring
probability sampling In probability sampling all units have the same probability of entering the sample. In addition, all possible combinations of n cases must have the same probability to be selected
QM The process of testing theoretical propositions through the analysis of numerical data in order to provide an answer to a research question
quartile Divides ordered data into four equal parts; the quartiles indicate the values below which 25%, 50%, and 75% of the ordered observations fall
range The difference between the largest and the smallest observation
redundancy Two or more variables measure the same sub-attribute (Munck & Verkuilen, 2002, p. 13)
reliability Refers to the extent to which repeated measurement produces the same results
representative sample A sample which contains all characteristics of the population in accurate proportions
research question A specific enquiry relating to a particular topic or subject. It forms the starting point of the research cycle
sample A sub-group of the population
sample distribution The probability distribution of a sample
sampling The process of selecting sampling units from the population
sampling distribution The probability distribution of a sample statistic, such as the mean. It can be derived from repeated sampling, or by estimation
sampling error The extent to which the mean of the population and the mean of the sample differ from one another
sampling method The way the sample is created
secondary data Secondary data are data which have been collected by somebody else
significance level The significance level, denoted by \(\alpha\), is the threshold used in hypothesis testing to determine if a result is statistically significant. It represents the probability of rejecting the null hypothesis when it’s actually true (a Type I error). Common levels are 0.05 or 0.01, indicating 5% or 1% risk. We will cover this properly in Week 9.
significance test A significance test is a statistical method used to determine whether observed data provide enough evidence to reject a null hypothesis. It calculates a probability of observing data as extreme as, or more extreme than, the actual sample results, assuming the null hypothesis is true
Social Sciences Are concerned with the study of society and seek to scientifically describe and explain the behaviour of actors
standard deviation The standard deviation s is defined as \[\begin{equation*}s=\sqrt{\frac{\text{sum of squared deviations}}{\text{sample size} -1}}=\sqrt{\frac{\Sigma(y_{i} - \bar{y})^2}{n-1}}\end{equation*}\]
standard error The standard deviation of the sampling distribution. It is defined as:\[\begin{equation*}\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}\end{equation*}\]
symmetry In the context of causation, symmetry is understood as the law-like regularity of events. There needs to be a recipe (causal mechanism) which regularly produces effects from causes (see Brady (2011))
systematized concept A specific formulation of a concept used by a given scholar or group of scholars; commonly involves an explicit definition (Adcock & Collier, 2001, p. 531)
t-distribution The t-Distribution is bell-shaped and symmetrical around a mean of zero. Its shape is dependent on the degrees of freedom in the estimation process.
test statistic A test statistic is a value calculated from the sample data that is used to decide whether to reject the null hypothesis (H\(_0\)) in a hypothesis test. It quantifies the degree to which the observed data diverges from what is expected under the null hypothesis. In a t-test, the test statistic is a t-value, which measures the distance between the sample mean and the (hypothesised) population mean, expressed in units of standard errors.
theory A formal set of ideas that is intended to explain why something happens or exists (Oxford Learner’s Dictionaries, n.d.)
Type I Error A Type I Error occurs when a null hypothesis (H\(_0\)) that is actually true is incorrectly rejected. It is also known as a false positive error, as it suggests that an effect or difference exists when, in fact, it does not. The probability of committing a Type I Error is given by the significance level (\(\alpha\)) of the test, which is typically set before conducting the test (e.g. \(\alpha = 0.05\)). This means that there is a 5% chance of rejecting a true null hypothesis.
Type II Error A Type II Error occurs when a null hypothesis (H\(_0\)) that is actually false is incorrectly accepted (or not rejected). It is also known as a false negative error, as it suggests that no effect or difference exists when, in fact, there is one.
validity The extent to which the measure (variable) you choose genuinely represents the concept in question. The word comes from the Latin word “validus” which means “strong”
variable An element of a conceptual component which varies. We also call these “measures”
variance Is equal to the square of the standard deviation
z-score The z-score, sometimes also referred to as z-value, expresses in units of standard deviation how far an observation of interest falls away from the mean. It is defined as \[\begin{equation*}z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}=\frac{y-\mu}{\sigma}\end{equation*}\]
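Several of the descriptive entries above (mean, median, mode, range, standard deviation, interquartile range, outlier) can be illustrated with a minimal Python sketch using only the standard library. The data below are invented for illustration; `statistics.quantiles` with its default "exclusive" method is one common textbook convention for quartiles, so results may differ slightly from other software.

```python
# Illustration of the descriptive-statistics entries in the glossary.
# The data are invented; 30 is a deliberate outlier.
import statistics

y = [4, 5, 5, 6, 7, 8, 9, 10, 12, 30]
n = len(y)

mean = sum(y) / n                  # sum of observations / number of observations
median = statistics.median(y)      # separates the lower half from the upper half
mode = statistics.mode(y)          # most frequently occurring value
rng = max(y) - min(y)              # largest minus smallest observation

# Sample variance and standard deviation (divisor n - 1, as in the glossary formula)
s2 = sum((yi - mean) ** 2 for yi in y) / (n - 1)
s = s2 ** 0.5

# Quartiles and interquartile range (exclusive method)
q1, q2, q3 = statistics.quantiles(y, n=4)
iqr = q3 - q1

# Outliers: below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
outliers = [yi for yi in y if yi < q1 - 1.5 * iqr or yi > q3 + 1.5 * iqr]

print(mean, median, mode, rng, round(s, 3), iqr, outliers)
```

Note that the variance here is simply `s2 = s ** 2`, matching the glossary's definition of variance as the square of the standard deviation.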
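The central limit theorem and the standard error can also be demonstrated by simulation: drawing repeated random samples of size \(n\) from a skewed population, the standard deviation of the resulting sample means should approximate \(\sigma/\sqrt{n}\). This is a sketch with an invented exponential population; the specific numbers are for illustration only.

```python
# Simulation sketch of the sampling distribution of the mean:
# the SD of repeated sample means approximates sigma / sqrt(n).
import random
import statistics

random.seed(42)

# A right-skewed (exponential) population of 100,000 invented values
population = [random.expovariate(1.0) for _ in range(100_000)]
sigma = statistics.pstdev(population)   # population standard deviation
n = 30                                  # "large enough" per the CLT entry

# Repeated sampling: 5,000 sample means of size n
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(5_000)
]

se_theoretical = sigma / n ** 0.5             # sigma / sqrt(n)
se_simulated = statistics.stdev(sample_means) # SD of the sampling distribution

print(round(se_theoretical, 3), round(se_simulated, 3))
```

Even though the population is skewed, a histogram of `sample_means` would look approximately normal, which is the substance of the central limit theorem entry.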
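Finally, the hypothesis-testing entries (test statistic, critical value, p-value, significance level, confidence interval) fit together in one procedure. The sketch below uses the large-sample normal approximation, since Python's standard library provides `statistics.NormalDist`; the glossary's t-test is analogous but uses the t-distribution with \(n-1\) degrees of freedom. The data and the hypothesised mean are invented.

```python
# Sketch of a two-sided large-sample significance test (normal approximation).
# Invented sample and hypothesised population mean mu0 = 5.0.
from statistics import NormalDist, fmean, stdev

y = [5.1, 4.9, 5.4, 5.8, 5.2, 5.6, 4.8, 5.5, 5.3, 5.7,
     5.0, 5.4, 5.6, 5.2, 4.9, 5.8, 5.1, 5.3, 5.5, 5.2,
     5.4, 5.0, 5.6, 5.3, 5.1, 5.7, 5.2, 5.4, 4.9, 5.5]
mu0 = 5.0      # hypothesised population mean under H0
alpha = 0.05   # significance level = probability of a Type I Error

n = len(y)
ybar = fmean(y)
se = stdev(y) / n ** 0.5        # estimated standard error of the mean

# Test statistic: distance of the sample mean from mu0 in standard errors
z = (ybar - mu0) / se

std_normal = NormalDist()
p_value = 2 * (1 - std_normal.cdf(abs(z)))   # two-sided p-value
z_crit = std_normal.inv_cdf(1 - alpha / 2)   # critical value (about 1.96)

# (1 - alpha) confidence interval for the population mean
ci = (ybar - z_crit * se, ybar + z_crit * se)

reject_h0 = p_value < alpha     # equivalently: abs(z) > z_crit
print(round(z, 2), ci, reject_h0)
```

The two decision rules shown (comparing `p_value` with `alpha`, or `abs(z)` with `z_crit`) always agree; note also that rejecting H\(_0\) at level \(\alpha\) corresponds to `mu0` falling outside the \((1-\alpha)\) confidence interval.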