Glossary

Table 21: Glossary for PO12Q
Term Description
adjusted R-squared The coefficient of determination adjusted for the number of independent variables in a multiple regression model
autocorrelation The value of one error term allows us to predict the value of another error term, so their covariances are non-zero. This violates the regression assumption that the error terms are uncorrelated
categorical Describing the qualitative categories of a characteristic, for example different religions
coefficient A coefficient is a numerical expression which is multiplied with the value of a variable
coefficient of determination Indicates the proportion of the variation in the dependent variable which is explained through the independent variable. It is defined as \(\frac{\text{Explained Sum of Squares}}{\text{Total Sum of Squares}}\)
collinearity If two variables are functionally dependent on each other we call them collinear. Should this apply to multiple variables at the same time we call them multi-collinear. (Multi-)collinearity exists if – and only if – the values follow this function precisely. Such a situation is rare, as usually variables are more loosely related to one another
conditional expectation function see Population Regression Function (PRF)
confidence interval A confidence interval constructs an interval of numbers which will contain the true parameter of the population (e.g. the mean) in a proportion \((1-\alpha)\) of cases. \(\alpha\) is usually chosen to be small, so that our confidence interval has a confidence level of 95% or 99%
confidence level The confidence level is the probability with which the confidence interval is believed to contain the true parameter of the population and is defined as \((1-\alpha)\)
correlation The statistical dependence of two random variables which is determined by pairwise comparison of values
degrees of freedom Degrees of freedom express constraints on our estimation process by specifying how many values in the calculation are free to vary
dichotomous Can only assume two mutually exclusive, but internally homogeneous qualitative categories
dummy variable Dummy variables are dichotomous, categorical variables which indicate the presence or the absence of a characteristic.
error term The error term quantifies the distance between each observation and the corresponding point on the regression line. The terms are denoted as \(\epsilon_{i}\)
heteroscedasticity The variability of the error terms changes across the observations of \(X\)
homoscedasticity Refers to a constant variance of the error terms
intercept The intercept is the point at which the regression line intersects the y-axis. In this book we denote it as \(\beta_{0}\)
logarithm Logarithm is defined as the exponent or power to which a base must be raised to yield a given number. Expressed mathematically, \(x\) is the logarithm of \(n\) to the base \(b\) if \(b^x = n\), in which case \(x = \log_b n\)
matrix A set of numbers arranged in rows and columns so as to form a rectangular array
model building Running a number of regression models, each testing a different combination of variables.
multicollinearity A situation in which an independent variable is a function of multiple other independent variables in the regression model
omitted variable bias If you omit an important variable in a regression model, the size of the other coefficients will be biased whenever the omitted variable is correlated with the included variables
Ordinary Least Squares The method of fitting a regression line by means of minimizing the sum of the squared distances between the observations and the estimated values
parsimony Refers to the principle that a model should explain the data with as few variables as possible
partial slope coefficient A partial slope coefficient measures the influence of a variable in multiple regression, holding all other independent variables in the model constant
Population Regression Function The Population Regression Function (PRF) describes the expected distribution of \(y\), given the values of the independent variable(s) \(x\). It is also called the conditional expectation function (CEF) and can be denoted as \(E(y_{i}|x_{i})\)
reference category The category of a dummy variable in respect to which the effect on the value of the dependent variable is displayed
regression Regression analysis determines the direction and magnitude of influence of one or more independent variables on a dependent variable
regression line The regression line describes how the dependent variable is functionally related to the values of the independent variable. It is defined by the intercept \(\beta_{0}\) and the slope \(\beta_{1}\)
residual An estimation of the error term. The difference between an observation \(y_{i}\) and the estimated value \(\hat{y}_{i}\). Denoted as \(\hat{\epsilon}_{i}\)
robust standard errors Also known as Huber-White standard errors, correct for heteroscedasticity by adjusting the model-based standard errors using the empirical variability of the model residuals
Sample Regression Function A regression line based on a randomly drawn sample
significance level The significance level is denoted as \(\alpha\) and defined as \(1-\)confidence level
slope A slope is defined as rise over run, and so it tells us how many units of y we need to climb (or descend if the slope is negative) for every additional unit of the independent variable \(x\)
time-series data Time-series data record a certain characteristic over time \(t\), where \(t\) runs from 1 to \(T\)
Variance Inflation Factor (VIF) A measure to quantify how much larger the observed variance of a coefficient is compared to a scenario in which the variable was completely independent of the other independent variables in the model
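Several of the entries above (Ordinary Least Squares, intercept, slope, residual, coefficient of determination) can be tied together with a short numerical sketch. The data values below are invented for illustration; the formulas are the standard single-regressor OLS estimators.

```python
# Minimal OLS sketch for one independent variable, using made-up data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# OLS minimises the sum of squared distances between the observations and
# the fitted values; with one regressor this reduces to the formulas below.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

# Fitted values and residuals (the residuals estimate the error terms).
y_hat = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, y_hat)]

# Coefficient of determination = Explained Sum of Squares / Total Sum of Squares.
ess = sum((fi - mean_y) ** 2 for fi in y_hat)
tss = sum((yi - mean_y) ** 2 for yi in y)
r_squared = ess / tss

print(round(intercept, 2), round(slope, 2), round(r_squared, 3))  # 0.14 1.96 0.998
```

The intercept \(\beta_{0}\) is where the fitted line crosses the y-axis, the slope \(\beta_{1}\) is the rise per unit of \(x\), and \(R^2\) close to 1 indicates that the regression line explains almost all of the variation in \(y\).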