adjusted R-squared
|
The coefficient of determination adjusted for the number of independent variables in a multiple regression model.
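A common formulation (added here for illustration, not the book's own notation) is \(\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}\), where \(n\) is the number of observations and \(k\) the number of independent variables.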
|
autocorrelation
|
Autocorrelation exists when the value of one error term allows us to predict the value of another error term, i.e. when the covariances between error terms are not zero. The standard regression assumptions require the absence of autocorrelation, so that these covariances are zero
|
categorical
|
Describing the qualitative categories of a characteristic, for example, different religions
|
coefficient
|
A coefficient is a numerical expression which is multiplied by the value of a variable
|
coefficient of determination
|
Indicates the proportion of the variation in the dependent variable which is explained through the independent variable. It is defined as \(\frac{\text{Explained Sum of Squares}}{\text{Total Sum of Squares}}\)
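Equivalently (a standard identity, added for reference), \(R^2 = 1 - \frac{\text{Residual Sum of Squares}}{\text{Total Sum of Squares}}\)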
|
collinearity
|
If two variables are functionally dependent on each other, we call them collinear. Should this apply to several variables at the same time, we call them multicollinear. (Multi)collinearity in this strict sense exists if, and only if, the values follow the function precisely. Such a situation is rare, as variables are usually more loosely related to one another
|
conditional expectation function
|
see Population Regression Function (PRF)
|
confidence interval
|
A confidence interval is an interval of numbers constructed so as to contain the true parameter of the population (e.g. the mean) in \((1-\alpha)\) of all cases. \(\alpha\) is usually chosen to be small, so that the confidence interval has a coverage probability of 95% or 99%
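For example (a standard large-sample case, not tied to the book's data), a 95% confidence interval for a mean takes the form \(\bar{x} \pm 1.96 \cdot \frac{s}{\sqrt{n}}\), where \(s\) is the sample standard deviation and \(n\) the sample size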
|
confidence level
|
The confidence level is the probability with which the confidence interval is believed to contain the true parameter of the population and is defined as \((1-\alpha)\)
|
correlation
|
The statistical dependence of two random variables which is determined by pairwise comparison of values
|
degrees of freedom
|
Degrees of freedom express constraints on our estimation process by specifying how many values in the calculation are free to vary
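For example, once the mean of \(n\) observations has been estimated, only \(n-1\) of the values are free to vary, leaving \(n-1\) degrees of freedom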
|
dichotomous
|
Can only assume two mutually exclusive, but internally homogeneous qualitative categories
|
dummy variable
|
Dummy variables are dichotomous, categorical variables which indicate the presence or the absence of a characteristic.
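For example (an illustration added here), a dummy for the religious category "Catholic" would take the value 1 for Catholic respondents and 0 for all others.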
|
error term
|
The error term quantifies the distance between each observation and the corresponding point on the regression line. The terms are denoted as \(\epsilon_{i}\)
|
heteroscedasticity
|
The variability of the error terms changes across the values of the independent variable \(X\)
|
homoscedasticity
|
Refers to a constant variance of the error terms
|
intercept
|
The intercept is the point at which the regression line intersects the y-axis. In this book we denote it as \(\beta_{0}\)
|
logarithm
|
Logarithm is defined as the exponent or power to which a base must be raised to yield a given number. Expressed mathematically, \(x\) is the logarithm of \(n\) to the base \(b\) if \(b^x = n\), in which case \(x = \log_b n\)
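For instance, \(2^3 = 8\), so \(\log_2 8 = 3\)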
|
matrix
|
A set of numbers arranged in rows and columns so as to form a rectangular array
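For example, \(\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\) is a \(2 \times 2\) matrix, with two rows and two columns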
|
model building
|
Running a number of regression models, each testing a different combination of variables.
|
multicollinearity
|
A situation in which an independent variable is a function of multiple other independent variables in the regression model
|
omitted variable bias
|
If you omit an important variable from a regression model, the estimated coefficients of the included variables will be biased whenever the omitted variable is correlated with them.
|
Ordinary Least Squares
|
The method of fitting a regression line by means of minimizing the sum of the squared distances between the observations and the estimated values
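In the simple case of one independent variable, this amounts to choosing \(\beta_0\) and \(\beta_1\) so as to minimize \(\sum_{i}(y_i - \beta_0 - \beta_1 x_i)^2\). A minimal sketch in Python with made-up data (an illustration, not the book's own code):

```python
# A minimal OLS sketch with made-up data (illustration only, not the book's code).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])       # dependent variable

X = np.column_stack([np.ones_like(x), x])     # column of ones for the intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes the sum of squared residuals
b0, b1 = beta                                 # estimated intercept and slope
print(b0, b1)
```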
|
parsimony
|
Refers to the principle that a model should explain as much as possible with as few independent variables as possible.
|
partial slope coefficient
|
A partial slope coefficient measures the influence of a variable in multiple regression, holding all other independent variables in the model constant
|
Population Regression Function
|
The Population Regression Function (PRF) describes the expected distribution of \(y\), given the values of the independent variable(s) \(x\). It is also called the conditional expectation function (CEF) and can be denoted as \(E(y_{i}|x_{i})\)
|
reference category
|
The category of a dummy variable relative to which the effect on the value of the dependent variable is displayed
|
regression
|
Regression analysis determines the direction and magnitude of influence of one or more independent variables on a dependent variable
|
regression line
|
The regression line describes how the dependent variable is functionally related to the values of the independent variable. It is defined by the intercept \(\beta_{0}\) and the slope \(\beta_{1}\)
|
residual
|
An estimation of the error term. The difference between an observation \(y_{i}\) and the estimated value \(\hat{y}_{i}\). Denoted as \(\hat{\epsilon}_{i}\)
|
robust standard errors
|
Also known as Huber-White standard errors, correct for heteroscedasticity by adjusting the model-based standard errors using the empirical variability of the model residuals
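A minimal sketch of how this might look in practice, assuming the Python statsmodels library (an illustration, not the book's own code):

```python
# Sketch of robust (Huber-White) standard errors, assuming statsmodels is available.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.uniform(1, 10, 200)
y = 1.0 + 0.5 * x + rng.normal(0, x)    # error spread grows with x: heteroscedasticity

X = sm.add_constant(x)                  # design matrix with an intercept column
fit = sm.OLS(y, X).fit(cov_type="HC1")  # Huber-White (robust) standard errors
print(fit.bse)                          # robust standard errors of the coefficients
```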
|
Sample Regression Function
|
A regression line based on a randomly drawn sample
|
significance level
|
Is denoted as \(\alpha\) and defined as one minus the confidence level.
|
slope
|
A slope is defined as rise over run, and so it tells us how many units of \(y\) we need to climb (or descend if the slope is negative) for every additional unit of the independent variable \(x\)
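Expressed as a formula, \(\beta_1 = \frac{\Delta y}{\Delta x}\)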
|
time-series data
|
Time-series data record a certain characteristic over time \(t\), where \(t\) runs from 1 to \(T\)
|
Variance Inflation Factor (VIF)
|
A measure quantifying how much larger the observed variance of a coefficient is compared to a scenario in which the variable is completely functionally independent of the other independent variables in the model
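The standard formula is \(\text{VIF}_j = \frac{1}{1 - R_j^2}\), where \(R_j^2\) is the coefficient of determination obtained from regressing the \(j\)-th independent variable on all other independent variables in the model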
|