Glossary
Unless otherwise noted, the definitions are taken from Reiche (forthcoming).
Term | Description |
---|---|
analysis | A detailed evaluation of data to discover their structure and relevant information to answer a research question |
asymmetry | The notion that while X causes Y, Y does not cause X. It is established with temporal priority, manipulated events, and/or the independence of causes (see Brady (2011)) |
attribute | A component or characteristic of a concept |
background concept | The broad constellation of meanings and understandings associated with the concept (Adcock & Collier, 2001, p. 531) |
categorical | Describing the qualitative categories of a characteristic, for example different religions |
causal | In order to establish a causal relationship, several criteria must be met concurrently (see the entries on *asymmetry* and *symmetry*) |
central limit theorem | In random sampling with a large sample size – where \(n=30\) is usually sufficient – the sampling distribution of the sample mean \(\bar{y}\) will be approximately normally distributed, irrespective of the shape of the population distribution |
concept | Abstract ideas which form the building blocks of theories (Clark et al., 2021, p. 150) |
conceptualization | Formulating a systematized concept through reasoning about the background concept, in light of the goals of research (Adcock & Collier, 2001, p. 531) |
confidence interval | A confidence interval is an interval of numbers which will contain the true parameter of the population (e.g. the mean) in \((1-\alpha)\) of cases. \(\alpha\) is usually chosen to be small, so that the confidence interval has a confidence level of 95% or 99% |
confidence level | The confidence level is the probability with which the confidence interval is believed to contain the true parameter of the population and is defined as \((1-\alpha)\) |
conflation | A variable does not belong to the attribute in question, but to a different one (Munck & Verkuilen, 2002, pp. 13–14) |
constant | A variable which does not vary |
continuous | Can assume any value within defined measurement boundaries |
critical value | The critical value is a threshold that determines the boundary for rejecting the null hypothesis (H\(_0\)) in a hypothesis test. It is a point on the probability distribution of the test statistic beyond which the null hypothesis is rejected. The critical value is chosen based on the significance level (\(\alpha\)) of the test, which represents the probability of making a Type I error (i.e., rejecting a true null hypothesis). |
cross-sectional data | Look at different units (or cross-sections) \(i\) at a single point in time |
data | Derives from the Latin *datum*, meaning “something given”. For our purposes it is a collection of numbers (or quantities) for the purpose of analysis |
data set | A collection of numerical values for individual observations, separated into distinctive variables |
degrees of freedom | Degrees of freedom express constraints on our estimation process by specifying how many values in the calculation are free to vary |
democracy | A system in which the population chooses and holds accountable elected representatives through fair, free, and contested, multi-party elections. The human rights, civil rights, and civil liberties of individuals are protected by law |
dependent variable | Is dependent through some statistical or stochastic process on the value of an independent variable |
descriptive statistics | Summarise information about the centre and variability of a variable |
deviation | The deviation \(d\) of an observation \(y_{i}\) from the sample mean \(\bar{y}\) is the difference between them: \(d=y_{i}-\bar{y}\) |
dichotomous | Can only assume two mutually exclusive, but internally homogeneous qualitative categories |
discrete | The result of a counting process |
distribution | Refers to the display of the values a variable can assume, together with their respective absolute or relative frequency |
generalisability | The ability to apply the findings made on the basis of a representative sample to the population |
histogram | Displays through rectangles the frequency with which the values of a continuous variable occur in specific ranges |
hypothesis | In statistics, a hypothesis is a formal statement about a population parameter or a relationship between variables. Hypotheses guide statistical tests to determine whether data support or refute them. The hypothesis suggesting an effect or difference is called the alternative hypothesis. The alternative hypothesis is always paired with a null hypothesis, suggesting no effect or difference |
independent variable | Influences or helps us predict the level of a dependent variable. It is often treated as fixed, or “given” in statistical analysis, and is sometimes also called “explanatory variable” |
interpretation | The explanation of results to answer the research question |
interquartile range | The difference between the 3\(^{\text{rd}}\) and the 1\(^{\text{st}}\) quartiles |
literature review | An analytical summary of the literature relating to a particular topic with the objective of identifying a gap and thus motivating a research question |
mean | Is equal to the sum of the observations divided by the number of observations |
measurement | Refers to the selection of a measure or variable |
median | Separates the lower half from the upper half of observations |
method | A tool for systematic investigation |
mode | Is the most frequently occurring value |
non-probability sampling | In non-probability sampling not every unit has the same probability of being sampled |
normal distribution | The Normal Distribution is a bell-shaped probability distribution which is symmetrical around the mean |
outlier | Defined as a value larger than the third quartile plus 1.5 times the interquartile range, or smaller than the first quartile minus 1.5 times the interquartile range |
p-value | The p-value indicates the probability of obtaining a result equal to, or even more extreme than the observed value, assuming the null hypothesis is true. Common thresholds for significance are 0.05, 0.01, and 0.001. A smaller p-value suggests stronger evidence against the null hypothesis. The p-value is denoted as \(p\). |
parameter | A parameter is the value a statistic would assume in the long run. It is also called the Expected Value |
percentile | In ordered data, the percentile refers to the value of a variable below which a certain proportion of observations falls |
population | Collection of all cases which possess certain pre-defined characteristics |
population distribution | The probability distribution of the population |
primary data | Primary data are data you have collected yourself |
probability | Refers to how many times out of a total number of cases a particular event occurs. We can also see it as the chance of a particular event occurring |
probability sampling | In probability sampling all units have the same probability of entering the sample. In addition, all possible combinations of \(n\) cases must have the same probability of being selected |
QM | The process of testing theoretical propositions through the analysis of numerical data in order to provide an answer to a research question |
quartile | Divides ordered data into four equal parts and indicates the percentage of observations that falls into the respective quartile and below |
range | The difference between the largest and the smallest observation |
redundancy | Two or more variables measure the same sub-attribute (Munck & Verkuilen, 2002, p. 13) |
reliability | Refers to the extent to which repeated measurement produces the same results |
representative sample | A sample which contains all characteristics of the population in accurate proportions |
research question | A specific enquiry relating to a particular topic or subject. It forms the starting point of the research cycle |
sample | A sub-group of the population |
sample distribution | The probability distribution of a sample |
sampling | The process of selecting sampling units from the population |
sampling distribution | The probability distribution of a sample statistic, such as the mean. It can be derived from repeated sampling, or by estimation |
sampling error | The extent to which the mean of the population and the mean of the sample differ from one another |
sampling method | The way the sample is created |
secondary data | Secondary data are data which have been collected by somebody else |
significance level | The significance level, denoted by \(\alpha\), is the threshold used in hypothesis testing to determine if a result is statistically significant. It represents the probability of rejecting the null hypothesis when it’s actually true (a Type I error). Common levels are 0.05 or 0.01, indicating 5% or 1% risk. We will cover this properly in Week 9. |
significance test | A significance test is a statistical method used to determine whether observed data provide enough evidence to reject a null hypothesis. It calculates a probability of observing data as extreme as, or more extreme than, the actual sample results, assuming the null hypothesis is true |
Social Sciences | Are concerned with the study of society and seek to scientifically describe and explain the behaviour of actors |
standard deviation | The standard deviation s is defined as \[\begin{equation*}s=\sqrt{\frac{\text{sum of squared deviations}}{\text{sample size} -1}}=\sqrt{\frac{\Sigma(y_{i} - \bar{y})^2}{n-1}}\end{equation*}\] |
standard error | The standard deviation of the sampling distribution. It is defined as:\[\begin{equation*}\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}\end{equation*}\] |
symmetry | In the context of causation, symmetry is understood as the law-like regularity of events. There needs to be a recipe (causal mechanism) which regularly produces effects from causes (see Brady (2011)) |
systematized concept | A specific formulation of a concept used by a given scholar or group of scholars; commonly involves an explicit definition (Adcock & Collier, 2001, p. 531) |
t-distribution | The t-Distribution is bell-shaped and symmetrical around a mean of zero. Its shape is dependent on the degrees of freedom in the estimation process. |
test statistic | A test statistic is a value calculated from the sample data that is used to decide whether to reject the null hypothesis (H\(_0\)) in a hypothesis test. It quantifies the degree to which the observed data diverges from what is expected under the null hypothesis. In a t-test, the test statistic is a t-value, which measures the distance between the sample mean and the (hypothesised) population mean, expressed in units of standard errors. |
theory | A formal set of ideas that is intended to explain why something happens or exists (Oxford Learner’s Dictionaries, n.d.) |
Type I Error | A Type I Error occurs when a null hypothesis (H\(_0\)) that is actually true is incorrectly rejected. It is also known as a false positive error, as it suggests that an effect or difference exists when, in fact, it does not. The probability of committing a Type I Error is denoted by the significance level (\(\alpha\)) of the test, which is typically set before conducting the test (e.g. \(\alpha = 0.05\)). This means that there is a 5% chance of rejecting a true null hypothesis. |
Type II Error | A Type II Error occurs when a null hypothesis (H\(_0\)) that is actually false is incorrectly accepted (or not rejected). It is also known as a false negative error, as it suggests that no effect or difference exists when, in fact, there is one. |
validity | The extent to which the measure (variable) you choose genuinely represents the concept in question. The word comes from the Latin word “validus” which means “strong” |
variable | An element of a conceptual component which varies. We also call these “measures” |
variance | Is equal to the squared standard deviation |
z-score | The z-score, sometimes also referred to as z-value, expresses in units of standard deviation how far an observation of interest falls away from the mean. It is defined as \[\begin{equation*}z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}=\frac{y-\mu}{\sigma}\end{equation*}\] |
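To illustrate the descriptive entries above (mean, median, mode, range, standard deviation, variance, z-score), a minimal Python sketch with made-up data; the numbers are illustrative only, not from the glossary:

```python
# Worked example (illustrative data): descriptive statistics,
# following the glossary's formulas term by term.
import statistics

y = [4, 8, 6, 5, 3, 8, 9, 5, 7, 5]
n = len(y)

mean = sum(y) / n                 # sum of observations / number of observations
median = statistics.median(y)     # separates the lower half from the upper half
mode = statistics.mode(y)         # most frequently occurring value
rng = max(y) - min(y)             # largest minus smallest observation

# Standard deviation: sqrt(sum of squared deviations / (n - 1))
s = (sum((yi - mean) ** 2 for yi in y) / (n - 1)) ** 0.5
variance = s ** 2                 # variance is the squared standard deviation

# z-score of the first observation: (observation - mean) / standard deviation
z = (y[0] - mean) / s

print(mean, median, mode, rng, round(s, 3), round(z, 3))
```

Note that the hand-rolled standard deviation matches `statistics.stdev`, which uses the same \(n-1\) denominator.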
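The central limit theorem and the sampling distribution can be seen by simulation. The sketch below (my own illustrative setup, not from the text) draws repeated samples of size \(n=30\) from a decidedly non-normal Uniform(0, 1) population; the sample means nevertheless cluster around the population mean with spread close to \(\sigma/\sqrt{n}\), the standard error:

```python
# Sketch: simulating the sampling distribution of the sample mean.
# Population is Uniform(0, 1), so mu = 0.5 and sigma = sqrt(1/12).
import random
import statistics

random.seed(1)
n, draws = 30, 2000
means = [statistics.fmean(random.uniform(0, 1) for _ in range(n))
         for _ in range(draws)]

se_est = statistics.stdev(means)           # spread of the sampling distribution
se_theory = (1 / 12) ** 0.5 / n ** 0.5     # sigma / sqrt(n), per the glossary
print(round(statistics.fmean(means), 3), round(se_est, 4), round(se_theory, 4))
```

The simulated standard error agrees with the formula in the *standard error* entry, and a histogram of `means` would look approximately normal, as the theorem predicts.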
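The *quartile*, *interquartile range*, and *outlier* entries fit together as one recipe. A short sketch with illustrative data (one deliberately extreme value); `statistics.quantiles` with `n=4` returns the three quartile cut points:

```python
# Sketch: quartiles, IQR, and the 1.5 * IQR outlier fences from the glossary.
import statistics

y = [1, 3, 4, 5, 5, 6, 7, 8, 9, 25]
q1, q2, q3 = statistics.quantiles(y, n=4)     # cut points dividing the data into quarters
iqr = q3 - q1                                  # difference between 3rd and 1st quartiles

lower = q1 - 1.5 * iqr                         # below this: outlier
upper = q3 + 1.5 * iqr                         # above this: outlier
outliers = [v for v in y if v < lower or v > upper]
print(q1, q3, iqr, outliers)
```

Here the value 25 falls above the upper fence and is flagged as an outlier.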
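The *standard error*, *confidence interval*, *confidence level*, and *critical value* entries combine as follows. A minimal sketch with assumed numbers (sample mean, standard deviation, and size are invented for illustration), using the normal critical value, which the central limit theorem justifies for large samples:

```python
# Sketch: a 95% confidence interval for the mean.
# ybar, s, n are illustrative values, not data from the text.
from statistics import NormalDist

ybar, s, n = 6.0, 1.944, 36
alpha = 0.05                           # so the confidence level is 1 - alpha = 95%

se = s / n ** 0.5                      # standard error: s / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, about 1.96

ci = (ybar - z_crit * se, ybar + z_crit * se)
print(round(se, 3), round(z_crit, 2), tuple(round(b, 3) for b in ci))
```

In repeated sampling, intervals built this way would contain the true population mean in \((1-\alpha)\) of cases.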
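Finally, the *test statistic*, *p-value*, and *significance level* entries can be tied together in one significance test. The sketch below uses invented numbers and, because the sample is large, the normal distribution in place of the t-distribution for the p-value (with \(n=100\) the two are nearly identical):

```python
# Sketch: large-sample significance test for a mean (illustrative numbers).
# H0: mu = 6.0; the test statistic counts standard errors from that value.
from statistics import NormalDist

ybar, mu0, s, n = 6.5, 6.0, 2.0, 100   # sample mean, H0 mean, sample sd, size
alpha = 0.05                           # significance level: Type I error risk

se = s / n ** 0.5                      # standard error
test_stat = (ybar - mu0) / se          # distance from mu0 in standard errors

# Two-sided p-value: probability of a result at least this extreme under H0
p = 2 * (1 - NormalDist().cdf(abs(test_stat)))

reject = p < alpha                     # reject H0 when p falls below alpha
print(round(test_stat, 2), round(p, 4), reject)
```

A result with `p < alpha` is called statistically significant; failing to reject a false H\(_0\) would instead be a Type II Error.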