Glossary

Table 21: Glossary for PO12Q
Term Description
adjusted R-squared The coefficient of determination adjusted for the number of independent variables in a multiple regression model
autocorrelation The value of one error term allows us to predict the value of another error term, so their covariances are non-zero. This violates the regression assumption that the error terms are uncorrelated
categorical Describing the qualitative categories of a characteristic, for example different religions
coefficient A coefficient is a numerical expression which is multiplied with the value of a variable
coefficient of determination Indicates the proportion of the variation in the dependent variable which is explained through the independent variable. It is defined as \(\frac{\text{Explained Sum of Squares}}{\text{Total Sum of Squares}}\)
collinearity If two variables are functionally dependent on each other we call them collinear. Should this apply to multiple variables at the same time we call them multi-collinear. (Multi-)collinearity exists if – and only if – the values follow this function precisely. Such a situation is rare, as usually variables are more loosely related to one another
conditional expectation function see Population Regression Function (PRF)
confidence interval A confidence interval constructs an interval of numbers which will contain the true parameter of the population (e.g. the mean) in a proportion \((1-\alpha)\) of cases. \(\alpha\) is usually chosen to be small, so that our confidence interval has a confidence level of 95% or 99%
confidence level The confidence level is the probability with which the confidence interval is believed to contain the true parameter of the population and is defined as \((1-\alpha)\)
correlation The statistical dependence of two random variables which is determined by pairwise comparison of values
degrees of freedom Degrees of freedom express constraints on our estimation process by specifying how many values in the calculation are free to vary
dichotomous Can only assume two mutually exclusive, but internally homogeneous qualitative categories
dummy variable Dummy variables are dichotomous, categorical variables which indicate the presence or the absence of a characteristic.
error term The error term quantifies the distance between each observation and the corresponding point on the regression line. The terms are denoted as \(\epsilon_{i}\)
heteroscedasticity The variability of the error terms changes across the observations of \(X\)
homoscedasticity Refers to a constant variance of the error terms
intercept The intercept is the point at which the regression line intersects the y-axis. In this book we denote it as \(\beta_{0}\)
logarithm Logarithm is defined as the exponent or power to which a base must be raised to yield a given number. Expressed mathematically, \(x\) is the logarithm of \(n\) to the base \(b\) if \(b^x = n\), in which case \(x = \log_b n\)
matrix A set of numbers arranged in rows and columns so as to form a rectangular array
model building Running a number of regression models, each testing a different combination of variables.
multicollinearity A situation in which an independent variable is a function of multiple other independent variables in the regression model
omitted variable bias If you omit an important variable in a regression model, the size of the other coefficients will be biased whenever the omitted variable is correlated with the included variables
Ordinary Least Squares The method of fitting a regression line by means of minimizing the sum of the squared distances between the observations and the estimated values
parsimony Refers to the principle that a model should explain the data with as few variables as possible
partial slope coefficient A partial slope coefficient measures the influence of a variable in multiple regression, holding all other independent variables in the model constant
Population Regression Function The Population Regression Function (PRF) describes the expected distribution of \(y\), given the values of the independent variable(s) \(x\). It is also called the conditional expectation function (CEF) and can be denoted as \(E(y_{i}|x_{i})\)
reference category The category of a dummy variable in respect to which the effect on the value of the dependent variable is displayed
regression Regression analysis determines the direction and magnitude of influence of one or more independent variables on a dependent variable
regression line The regression line describes how the dependent variable is functionally related to the values of the independent variable. It is defined by the intercept \(\beta_{0}\) and the slope \(\beta_{1}\)
residual An estimation of the error term. The difference between an observation \(y_{i}\) and the estimated value \(\hat{y}_{i}\). Denoted as \(\hat{\epsilon}_{i}\)
robust standard errors Also known as Huber-White standard errors, correct for heteroscedasticity by adjusting the model-based standard errors using the empirical variability of the model residuals
Sample Regression Function A regression line based on a randomly drawn sample
significance level The significance level is denoted as \(\alpha\) and defined as \(1-\)confidence level
slope A slope is defined as rise over run, and so it tells us how many units of y we need to climb (or descend if the slope is negative) for every additional unit of the independent variable \(x\)
time-series data Time-series data record a certain characteristic over time \(t\), where \(t\) runs from 1 to \(T\)
Variance Inflation Factor (VIF) A measure to quantify how much larger the observed variance of a coefficient is compared to a scenario in which the variable was completely independent of the other independent variables in the model
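Several of the entries above (Ordinary Least Squares, intercept, slope, residual, coefficient of determination) can be tied together with a short numerical sketch. The data values below are invented for illustration; the formulas are the standard single-regressor OLS estimators.

```python
# Minimal OLS sketch for one independent variable, using made-up data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# OLS minimises the sum of squared distances between the observations and
# the fitted values; with one regressor this reduces to the formulas below.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

# Fitted values and residuals (the residuals estimate the error terms).
y_hat = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, y_hat)]

# Coefficient of determination = Explained Sum of Squares / Total Sum of Squares.
ess = sum((fi - mean_y) ** 2 for fi in y_hat)
tss = sum((yi - mean_y) ** 2 for yi in y)
r_squared = ess / tss

print(round(intercept, 2), round(slope, 2), round(r_squared, 3))  # 0.14 1.96 0.998
```

The intercept \(\beta_{0}\) is where the fitted line crosses the y-axis, the slope \(\beta_{1}\) is the rise per unit of \(x\), and \(R^2\) close to 1 indicates that the regression line explains almost all of the variation in \(y\).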