Glossary

Unless otherwise noted, the definitions are taken from Reiche (forthcoming).

Table 13: Glossary for PO11Q
Term Description
analysis A detailed evaluation of data to discover their structure and relevant information to answer a research question
asymmetry The notion that while X causes Y, Y does not cause X. It is established with temporal priority, manipulated events, and/or the independence of causes (see Brady (2011))
attribute A component or characteristic of a concept
background concept The broad constellation of meanings and understandings associated with the concept (Adcock & Collier, 2001, p. 531)
categorical Describing the qualitative categories of a characteristic, for example different religions
causal In order to establish a causal relationship, the following criteria must be met concurrently:
  1. Relevance of the variables within the broader theoretical and empirical context of the research
    1. Clear theoretical framework
    2. Clear conceptualization
    3. Exclusion of alternative explanations
  2. Asymmetry
  3. Significant (and sufficiently strong) statistical association
central limit theorem In random sampling with a large sample size – where n=30 is usually sufficient – the sampling distribution of the sample mean \(\bar{y}\) will be approximately normally distributed, irrespective of the shape of the population distribution
concept Abstract ideas which form the building blocks of theories (Clark et al., 2021, p. 150)
conceptualization Formulating a systematized concept through reasoning about the background concept, in light of the goals of research (Adcock & Collier, 2001, p. 531)
confidence interval A confidence interval is an interval of numbers constructed so that it contains the true parameter of the population (e.g. the mean) in a proportion \((1-\alpha)\) of cases. \(\alpha\) is usually chosen to be small, so that the confidence level is 95% or 99%.
confidence level The confidence level is the probability with which the confidence interval is believed to contain the true parameter of the population and is defined as \((1-\alpha)\)
conflation A variable does not belong to the attribute in question, but to a different one (Munck & Verkuilen, 2002, pp. 13–14)
constant A variable which does not vary
continuous Can assume any value within defined measurement boundaries
critical value The critical value is a threshold that determines the boundary for rejecting the null hypothesis (H\(_0\)) in a hypothesis test. It is a point on the probability distribution of the test statistic beyond which the null hypothesis is rejected. The critical value is chosen based on the significance level (\(\alpha\)) of the test, which represents the probability of making a Type I error (i.e., rejecting a true null hypothesis).
cross-sectional data Look at different units (or cross-sections) \(i\) at a single point in time
data Derives from the Latin "datum", meaning "something given". For our purposes it is a collection of numbers (or quantities) for the purpose of analysis
data set A collection of numerical values for individual observations, separated into distinctive variables
degrees of freedom Degrees of freedom express constraints on our estimation process by specifying how many values in the calculation are free to vary
democracy A system in which the population chooses and holds accountable elected representatives through fair, free, and contested, multi-party elections. The human rights, civil rights, and civil liberties of individuals are protected by law
dependent variable Depends, through some statistical or stochastic process, on the value of an independent variable
descriptive statistics Summarise information about the centre and variability of a variable
deviation The deviation \(d\) of an observation \(y_{i}\) from the sample mean \(\bar{y}\) is the difference between them: \(d=y_{i}-\bar{y}\)
dichotomous Can only assume two mutually exclusive, but internally homogeneous qualitative categories
discrete The result of a counting process
distribution Refers to the display of the values a variable can assume, together with their respective absolute or relative frequency
generalisability The ability to apply the findings made on the basis of a representative sample to the population
histogram Displays through rectangles the frequency with which the values of a continuous variable occur in specific ranges
hypothesis In statistics, a hypothesis is a formal statement about a population parameter or about a relationship between variables. Hypotheses guide statistical tests to determine whether data support or refute them. The hypothesis suggesting an effect or difference is called the alternative hypothesis; it is always paired with a null hypothesis, which suggests no effect or difference.
independent variable Influences or helps us predict the level of a dependent variable. It is often treated as fixed, or “given” in statistical analysis, and is sometimes also called “explanatory variable”
interpretation The explanation of results to answer the research question
interquartile range The difference between the 3\(^{\text{rd}}\) and the 1\(^{\text{st}}\) quartiles
literature review An analytical summary of the literature relating to a particular topic with the objective of identifying a gap and thus motivating a research question
mean Is equal to the sum of the observations divided by the number of observations
measurement Refers to the selection of a measure or variable
median Separates the lower half from the upper half of observations
method A tool for systematic investigation
mode Is the most frequently occurring value
non-probability sampling In non-probability sampling not every unit has the same probability of being sampled
normal distribution The Normal Distribution is a bell-shaped probability distribution which is symmetrical around the mean
outlier Defined as a value larger than the third quartile plus 1.5 times the interquartile range, or smaller than the first quartile minus 1.5 times the interquartile range
p-value The p-value indicates the probability of obtaining a result equal to, or even more extreme than the observed value, assuming the null hypothesis is true. Common thresholds for significance are 0.05, 0.01, and 0.001. A smaller p-value suggests stronger evidence against the null hypothesis. The p-value is denoted as \(p\).
parameter A parameter is the value a statistic would assume in the long run. It is also called the Expected Value
percentile In ordered data, the percentile refers to the value of a variable below which a certain proportion of observations falls
population Collection of all cases which possess certain pre-defined characteristics
population distribution The probability distribution of the population
primary data Primary data are data you have collected yourself
probability Refers to how many times out of a total number of cases a particular event occurs. We can also see it as the chance of a particular event occurring
probability sampling In probability sampling all units have the same probability of entering the sample. In addition, all possible combinations of n cases must have the same probability to be selected
QM The process of testing theoretical propositions through the analysis of numerical data in order to provide an answer to a research question
quartile Divides ordered data into four equal parts; the quartiles indicate the values below which 25%, 50%, and 75% of the ordered observations fall
range The difference between the largest and the smallest observation
redundancy Two or more variables measure the same sub-attribute (Munck & Verkuilen, 2002, p. 13)
reliability Refers to the extent to which repeated measurement produces the same results
representative sample A sample which contains all characteristics of the population in accurate proportions
research question A specific enquiry relating to a particular topic or subject. It forms the starting point of the research cycle
sample A sub-group of the population
sample distribution The probability distribution of a sample
sampling The process of selecting sampling units from the population
sampling distribution The probability distribution of a sample statistic, such as the mean. It can be derived from repeated sampling, or by estimation
sampling error The extent to which the mean of the population and the mean of the sample differ from one another
sampling method The way the sample is created
secondary data Secondary data are data which have been collected by somebody else
significance level The significance level, denoted by \(\alpha\), is the threshold used in hypothesis testing to determine if a result is statistically significant. It represents the probability of rejecting the null hypothesis when it’s actually true (a Type I error). Common levels are 0.05 or 0.01, indicating 5% or 1% risk. We will cover this properly in Week 9.
significance test A significance test is a statistical method used to determine whether observed data provide enough evidence to reject a null hypothesis. It calculates a probability of observing data as extreme as, or more extreme than, the actual sample results, assuming the null hypothesis is true
Social Sciences Are concerned with the study of society and seek to scientifically describe and explain the behaviour of actors
standard deviation The standard deviation s is defined as \[\begin{equation*}s=\sqrt{\frac{\text{sum of squared deviations}}{\text{sample size} -1}}=\sqrt{\frac{\Sigma(y_{i} - \bar{y})^2}{n-1}}\end{equation*}\]
standard error The standard deviation of the sampling distribution. It is defined as:\[\begin{equation*}\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}\end{equation*}\]
symmetry In the context of causation, symmetry is understood as the law-like regularity of events. There needs to be a recipe (causal mechanism) which regularly produces effects from causes (see Brady (2011))
systematized concept A specific formulation of a concept used by a given scholar or group of scholars; commonly involves an explicit definition (Adcock & Collier, 2001, p. 531)
t-distribution The t-Distribution is bell-shaped and symmetrical around a mean of zero. Its shape is dependent on the degrees of freedom in the estimation process.
test statistic A test statistic is a value calculated from the sample data that is used to decide whether to reject the null hypothesis (H\(_0\)) in a hypothesis test. It quantifies the degree to which the observed data diverges from what is expected under the null hypothesis. In a t-test, the test statistic is a t-value, which measures the distance between the sample mean and the (hypothesised) population mean, expressed in units of standard errors.
theory A formal set of ideas that is intended to explain why something happens or exists (Oxford Learner’s Dictionaries, n.d.)
Type I Error A Type I Error occurs when a null hypothesis (H\(_0\)) that is actually true is incorrectly rejected. It is also known as a false positive error, as it suggests that an effect or difference exists when, in fact, it does not. The probability of committing a Type I Error is given by the significance level (\(\alpha\)) of the test, which is typically set before conducting the test (e.g. \(\alpha = 0.05\)). This means that there is a 5% chance of rejecting a true null hypothesis.
Type II Error A Type II Error occurs when a null hypothesis (H\(_0\)) that is actually false is incorrectly accepted (or not rejected). It is also known as a false negative error, as it suggests that no effect or difference exists when, in fact, there is one.
validity The extent to which the measure (variable) you choose genuinely represents the concept in question. The word comes from the Latin word “validus” which means “strong”
variable An element of a conceptual component which varies. We also call these “measures”
variance Is equal to the square of the standard deviation
z-score The z-score, sometimes also referred to as z-value, expresses in units of standard deviation how far an observation of interest falls away from the mean. It is defined as \[\begin{equation*}z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}=\frac{y-\mu}{\sigma}\end{equation*}\]
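Several of the descriptive entries above (mean, median, mode, range, standard deviation, interquartile range, outlier) can be illustrated with a minimal Python sketch using only the standard library. The data below are invented for illustration; `statistics.quantiles` with its default "exclusive" method is one common textbook convention for quartiles, so results may differ slightly from other software.

```python
# Illustration of the descriptive-statistics entries in the glossary.
# The data are invented; 30 is a deliberate outlier.
import statistics

y = [4, 5, 5, 6, 7, 8, 9, 10, 12, 30]
n = len(y)

mean = sum(y) / n                  # sum of observations / number of observations
median = statistics.median(y)      # separates the lower half from the upper half
mode = statistics.mode(y)          # most frequently occurring value
rng = max(y) - min(y)              # largest minus smallest observation

# Sample variance and standard deviation (divisor n - 1, as in the glossary formula)
s2 = sum((yi - mean) ** 2 for yi in y) / (n - 1)
s = s2 ** 0.5

# Quartiles and interquartile range (exclusive method)
q1, q2, q3 = statistics.quantiles(y, n=4)
iqr = q3 - q1

# Outliers: below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
outliers = [yi for yi in y if yi < q1 - 1.5 * iqr or yi > q3 + 1.5 * iqr]

print(mean, median, mode, rng, round(s, 3), iqr, outliers)
```

Note that the variance here is simply `s2 = s ** 2`, matching the glossary's definition of variance as the square of the standard deviation.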
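The central limit theorem and the standard error can also be demonstrated by simulation: drawing repeated random samples of size \(n\) from a skewed population, the standard deviation of the resulting sample means should approximate \(\sigma/\sqrt{n}\). This is a sketch with an invented exponential population; the specific numbers are for illustration only.

```python
# Simulation sketch of the sampling distribution of the mean:
# the SD of repeated sample means approximates sigma / sqrt(n).
import random
import statistics

random.seed(42)

# A right-skewed (exponential) population of 100,000 invented values
population = [random.expovariate(1.0) for _ in range(100_000)]
sigma = statistics.pstdev(population)   # population standard deviation
n = 30                                  # "large enough" per the CLT entry

# Repeated sampling: 5,000 sample means of size n
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(5_000)
]

se_theoretical = sigma / n ** 0.5             # sigma / sqrt(n)
se_simulated = statistics.stdev(sample_means) # SD of the sampling distribution

print(round(se_theoretical, 3), round(se_simulated, 3))
```

Even though the population is skewed, a histogram of `sample_means` would look approximately normal, which is the substance of the central limit theorem entry.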
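Finally, the hypothesis-testing entries (test statistic, critical value, p-value, significance level, confidence interval) fit together in one procedure. The sketch below uses the large-sample normal approximation, since Python's standard library provides `statistics.NormalDist`; the glossary's t-test is analogous but uses the t-distribution with \(n-1\) degrees of freedom. The data and the hypothesised mean are invented.

```python
# Sketch of a two-sided large-sample significance test (normal approximation).
# Invented sample and hypothesised population mean mu0 = 5.0.
from statistics import NormalDist, fmean, stdev

y = [5.1, 4.9, 5.4, 5.8, 5.2, 5.6, 4.8, 5.5, 5.3, 5.7,
     5.0, 5.4, 5.6, 5.2, 4.9, 5.8, 5.1, 5.3, 5.5, 5.2,
     5.4, 5.0, 5.6, 5.3, 5.1, 5.7, 5.2, 5.4, 4.9, 5.5]
mu0 = 5.0      # hypothesised population mean under H0
alpha = 0.05   # significance level = probability of a Type I Error

n = len(y)
ybar = fmean(y)
se = stdev(y) / n ** 0.5        # estimated standard error of the mean

# Test statistic: distance of the sample mean from mu0 in standard errors
z = (ybar - mu0) / se

std_normal = NormalDist()
p_value = 2 * (1 - std_normal.cdf(abs(z)))   # two-sided p-value
z_crit = std_normal.inv_cdf(1 - alpha / 2)   # critical value (about 1.96)

# (1 - alpha) confidence interval for the population mean
ci = (ybar - z_crit * se, ybar + z_crit * se)

reject_h0 = p_value < alpha     # equivalently: abs(z) > z_crit
print(round(z, 2), ci, reject_h0)
```

The two decision rules shown (comparing `p_value` with `alpha`, or `abs(z)` with `z_crit`) always agree; note also that rejecting H\(_0\) at level \(\alpha\) corresponds to `mu0` falling outside the \((1-\alpha)\) confidence interval.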