Joint Estimation of Emergence and Survival

Rationale

It is perfectly adequate to estimate the processes of democratic emergence and survival separately. There is, however, a more elegant approach in which emergence and survival are estimated together. This approach is often employed in the literature, for example by Boix & Stokes (2003). The rationale is that the joint model uses all observations at once, which might affect the standard errors and thus statistical significance.

As you will see, the joint estimation requires a lot of manual calculation and general faff. I am mainly showing it here so that you can understand the output in the aforementioned articles. If you want to employ this approach yourself, you could:

  • estimate the joint model as outlined here to check the statistical significance of variables
  • then estimate emergence and survival separately and work with those separately estimated models to calculate predicted probabilities and ROC curves

So, what’s the setup? Again, we incorporate the lagged value \(y_{t-1}\) to encapsulate all the history prior to period \(t\). There is just one little trick I need to introduce before we can start on the model: in Equation (1), I replace \(y_{t-1}\) with an indicator variable \(I_{D}\), which takes the value 1 if a country was a democracy in the previous year, and zero otherwise (notation adapted from Epstein et al., 2006, p. 553). We can then write the model as follows (see the footnote at the end of this section):

\[\begin{equation} P(D_{i,t})=\Phi(\beta_{0}+\beta_{1} X_{i,t} +\beta_{2} I_{D} + \beta_{3} I_{D} X_{i,t} + \epsilon_{i,t}) \tag{1} \end{equation}\]

where \(P(D_{i,t})\) is the probability that country \(i\) is a democracy in year \(t\), \(\Phi\) is the cumulative distribution function of the standard normal distribution (the S-shaped curve you know from the introduction of probit in week 3), \(I_{D}\) is the aforementioned indicator variable, \(X_{i,t}\) is an independent variable for country \(i\) in year \(t\), and \(\epsilon_{i,t}\) is a zero-mean stochastic disturbance. I only include the disturbance for completeness’ sake; as it is irrelevant for us here and now, I will drop it from the following discussion so as not to confuse you unnecessarily. This is a multiplicative interaction model, which allows us to model the probability that a country is a democracy, conditional on its regime type in the previous year. The indicator variable \(I_{D}\), equal to \(y_{t-1}\), captures this information. How does this work?

Let us start by assuming that in year \(t-1\) country \(i\) was an autocracy. In this case, the indicator variable \(I_{D}\) would be equal to zero, and therefore Equation (1) can be re-written as

\[\begin{equation} P(D_{i,t})=\Phi(\beta_{0}+\beta_{1} X_{i,t}) \end{equation}\]

In this case, the coefficient \(\beta_{1}\) represents only the impact of variable \(X_{i,t}\) on the probability of an autocracy transitioning to democracy. More formally, we are dealing with conditional probabilities here: \(\beta_{1}\) captures the impact of variable \(X_{i,t}\) on the probability that a country is a democracy in year \(t\), under the condition that it was an autocracy in year \(t-1\). It is conditional on this because we set the indicator variable to zero.

We can construct a similar scenario for the condition that a country was a democracy in the previous year. In this case, the indicator variable \(I_{D}\) equals \(1\), and we start from Equation (1) again:

\[\begin{equation*} P(D_{i,t})=\Phi(\beta_{0}+\beta_{1} X_{i,t} +\beta_{2} I_{D} + \beta_{3} I_{D} X_{i,t}) \end{equation*}\]

With \(I_{D}=1\), this can be simplified to:

\[\begin{equation} P(D_{it})=\Phi((\beta_{0} +\beta_{2}) + (\beta_{1} + \beta_{3}) X_{i,t}) \end{equation}\]

This equation illustrates very well that the impact of variable \(X_{i,t}\) on the probability of a democracy remaining a democracy is now made up of the sum of the two coefficients \(\beta_{1}\) and \(\beta_{3}\), whereas the constant is the sum of the coefficients \(\beta_{0}\) and \(\beta_{2}\).
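To check the two reductions numerically, here is a minimal R sketch of Equation (1) with the disturbance dropped. The function name prob_democracy() and all beta and X values are hypothetical, chosen only for illustration; pnorm() is R's standard normal cumulative distribution function, i.e. \(\Phi\).

```r
# Equation (1) without the disturbance term.
# prob_democracy() is a hypothetical helper; b = c(b0, b1, b2, b3)
prob_democracy <- function(x, i_d, b) {
  pnorm(b[1] + b[2] * x + b[3] * i_d + b[4] * i_d * x)
}

b <- c(-2, 0.5, 3.5, 0.2)   # hypothetical beta_0 ... beta_3
x <- c(-1, 0, 1, 2)         # hypothetical values of X

# Autocracy in t-1 (I_D = 0): reduces to Phi(b0 + b1 * x)
p_emergence         <- prob_democracy(x, i_d = 0, b = b)
p_emergence_reduced <- pnorm(b[1] + b[2] * x)

# Democracy in t-1 (I_D = 1): reduces to Phi((b0 + b2) + (b1 + b3) * x)
p_survival          <- prob_democracy(x, i_d = 1, b = b)
p_survival_reduced  <- pnorm((b[1] + b[3]) + (b[2] + b[4]) * x)

all.equal(p_emergence, p_emergence_reduced)  # TRUE
all.equal(p_survival, p_survival_reduced)    # TRUE
```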


Example

To make this more tangible, let’s look at an example. Assume we want to look at the effect of per capita GDP on democratic emergence (i.e. the transition from autocracy to democracy) and on democratic survival (i.e. the transition from democracy to democracy). In this case, we would specify the model as follows:

\[\begin{equation} P(D_{it})=\Phi(\beta_{0}+\beta_{1} \text{per capita GDP}_{i,t} +\beta_{2} I_{D} + \beta_{3} I_{D} \text{per capita GDP}_{i,t}) \tag{2} \end{equation}\]

The coefficient indicating the impact of per capita GDP on democratic emergence is \(\beta_{1}\). As illustrated above, the indicator variable \(I_{D}\) is zero in this case, and the equation reduces to

\[\begin{equation} P(D_{it})=\Phi(\beta_{0}+\beta_{1} \text{per capita GDP}_{i,t}) \end{equation}\]

For democratic survival, the coefficient indicating the impact of per capita GDP is the sum of the coefficients \(\beta_{1}\) and \(\beta_{3}\):

\[\begin{equation} P(D_{it})=\Phi((\beta_{0} +\beta_{2}) + (\beta_{1} + \beta_{3}) \text{per capita GDP}_{i,t}) \end{equation}\]


Application in R

How does all of this look in R, and how do you apply it to a real-world scenario? Let’s use the above example of per capita GDP in the global data set. You have already created the indicator variable \(I_{D}\) in the form of the variable l.democracy. What we need next is a variable that captures the product \(I_{D} \times \text{per capita GDP}_{i,t}\), so that we can estimate \(\beta_{3}\). To create this variable, type

# Interaction term I_D x per capita GDP (zero for countries that were autocracies in t-1)
world$gdppc_l.democracy <- world$l.democracy * world$gdppc
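As an aside, R's formula syntax can build this product for you: gdppc * l.democracy expands to gdppc + l.democracy + gdppc:l.democracy. The sketch below uses a small, hypothetical toy data frame (not the world data) to show that the manually created column and the one R generates via model.matrix() are the same.

```r
# Hypothetical toy data standing in for the world data set
toy <- data.frame(gdppc       = c(1200, 3500, 800, 9000),
                  l.democracy = c(0, 1, 0, 1))

# Manual interaction term, created as in the text
toy$gdppc_l.democracy <- toy$l.democracy * toy$gdppc

# Design matrix R builds from the formula shorthand
mm <- model.matrix(~ gdppc * l.democracy, data = toy)
colnames(mm)
all.equal(unname(mm[, "gdppc:l.democracy"]), toy$gdppc_l.democracy)  # TRUE
```

Creating the variable by hand, as this worksheet does, has the advantage that its name matches the coefficient you later feed to the Wald test.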

Now, we are ready to estimate the model. Remember, formally, this is written as:

\[\begin{equation*} P(D_{it})=\Phi(\beta_{0}+\beta_{1} \text{per capita GDP}_{i,t} +\beta_{2} I_{D} + \beta_{3} I_{D} \text{per capita GDP}_{i,t}) \end{equation*}\]

In the R command, we replace the terms of this equation with the equivalent variables:

# Joint probit model of democratic emergence and survival
joint <- glm(democracy ~ gdppc + l.democracy + gdppc_l.democracy,
             data = world,
             na.action = na.exclude,
             family = binomial(link = "probit"))
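To see why the joint model and the two separate models go hand in hand, here is a sketch on simulated data (the sample size, the true coefficients and the data themselves are hypothetical and have nothing to do with the world data). It fits the joint probit as above, then fits emergence and survival separately on the subsets with l.democracy equal to 0 and 1, and compares the point estimates.

```r
set.seed(42)
n   <- 2000
sim <- data.frame(gdppc       = rnorm(n),            # hypothetical standardised GDP
                  l.democracy = rbinom(n, 1, 0.5))   # hypothetical I_D
sim$gdppc_l.democracy <- sim$l.democracy * sim$gdppc

# Hypothetical true beta_0 ... beta_3
eta <- -1 + 0.3 * sim$gdppc + 2 * sim$l.democracy + 0.5 * sim$gdppc_l.democracy
sim$democracy <- rbinom(n, 1, pnorm(eta))

# Joint model, specified as in the text
joint_sim <- glm(democracy ~ gdppc + l.democracy + gdppc_l.democracy,
                 data = sim, family = binomial(link = "probit"))

# Separate models: emergence (I_D = 0) and survival (I_D = 1)
emerg_sim <- glm(democracy ~ gdppc, data = subset(sim, l.democracy == 0),
                 family = binomial(link = "probit"))
surv_sim  <- glm(democracy ~ gdppc, data = subset(sim, l.democracy == 1),
                 family = binomial(link = "probit"))

b <- coef(joint_sim)
# Emergence: beta_0 and beta_1 read off directly
cbind(joint = b[c("(Intercept)", "gdppc")], separate = coef(emerg_sim))
# Survival: beta_0 + beta_2 and beta_1 + beta_3
cbind(joint    = c(b["(Intercept)"] + b["l.democracy"],
                   b["gdppc"] + b["gdppc_l.democracy"]),
      separate = coef(surv_sim))
```

Because every coefficient is interacted with \(I_{D}\), the joint likelihood splits into an emergence part and a survival part, so the point estimates should agree up to numerical convergence tolerance.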

If you have done regression analysis before and worry about (multi-)collinearity in such a model, please note that:

Analysts should include all constitutive terms when specifying multiplicative interaction models except in very rare circumstances. By constitutive terms, we mean each of the elements that constitute the interaction term. Thus, X and Z are the constitutive terms in [this model: \(y=\beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ + \epsilon\)]. (Brambor et al., 2006, p. 66)

You obtain the following output:

summary(joint)

Call:
glm(formula = democracy ~ gdppc + l.democracy + gdppc_l.democracy, 
    family = binomial(link = "probit"), data = world, na.action = na.exclude)

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -1.968e+00  5.139e-02 -38.290  < 2e-16 ***
gdppc             -2.954e-05  1.452e-05  -2.034    0.042 *  
l.democracy        3.587e+00  1.001e-01  35.842  < 2e-16 ***
gdppc_l.democracy  2.513e-04  4.863e-05   5.169 2.36e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 10718.5  on 7731  degrees of freedom
Residual deviance:  1299.9  on 7728  degrees of freedom
  (1724 observations deleted due to missingness)
AIC: 1307.9

Number of Fisher Scoring iterations: 11

For democratic emergence, we can report the intercept ((Intercept)) and the slope coefficient gdppc straight away as -1.968 and -0.00002954, respectively. The slope coefficient is significant, as its p-value is \(<0.05\). This is in line with the findings from estimating emergence separately. For democratic survival, we need to take the sum of (Intercept) and l.democracy to obtain the intercept, and the sum of gdppc and gdppc_l.democracy to obtain the slope coefficient. This calculation yields the same coefficients as the separate, two-step analysis.
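The survival calculation is simple arithmetic on the printed estimates. The sketch below hard-codes the rounded values from the summary(joint) output above; with the fitted object in memory, you could instead use b <- coef(joint) and add b["(Intercept)"] + b["l.democracy"] and b["gdppc"] + b["gdppc_l.democracy"].

```r
# Point estimates as printed by summary(joint) above (rounded)
b0 <- -1.968       # (Intercept)
b1 <- -2.954e-05   # gdppc
b2 <-  3.587       # l.democracy
b3 <-  2.513e-04   # gdppc_l.democracy

# Democratic survival: intercept beta_0 + beta_2, slope beta_1 + beta_3
surv_intercept <- b0 + b2   # 1.619
surv_slope     <- b1 + b3   # 0.00022176
```

Note that the sign flips: per capita GDP has a small negative coefficient for emergence but a positive one for survival.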

The last step is the assessment of statistical significance. For the emergence model, we can once again interpret the output straight away. For the survival scenario, however, we need to test whether \(\beta_{1}\) and \(\beta_{3}\) are jointly different from zero. As the survival effect is a joint venture between these two coefficients, we also need to assess their significance jointly, rather than concentrating on \(\beta_{3}\) alone. This is the mistake Przeworski et al. (2000) made in their seminal book, and it is what the first few pages of the article by Epstein et al. (2006) discuss. To do this, we use a post-estimation test of the null hypothesis that \(\beta_{1}\) and \(\beta_{3}\) are both zero, called a Wald test.

For this, we need to install and load a new package called survey. Its function regTermTest() performs a regression term test: you first state the object in which the results are stored (here: joint), then the terms you want to test, preceded by a tilde and connected by a plus. Lastly, we specify that we want the Wald test.

# install.packages("survey")   # run once if the package is not yet installed
library(survey)
regTermTest(joint, ~gdppc+gdppc_l.democracy, method="Wald")
Wald test for gdppc gdppc_l.democracy
 in glm(formula = democracy ~ gdppc + l.democracy + gdppc_l.democracy, 
    family = binomial(link = "probit"), data = world, na.action = na.exclude)
F =  13.48875  on  2  and  7728  df: p= 1.4194e-06 

With a p-value of \(1.4194 \times 10^{-6}\), we can reject the null hypothesis and conclude that the slope coefficients are jointly different from zero, and thus that per capita GDP explains democratic survival.
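If you want to see what a Wald test does under the hood, here is a minimal sketch computing \(W = \hat{b}^{\top} \hat{V}^{-1} \hat{b}\) from coef() and vcov() of a probit fitted to simulated, hypothetical data (not the world data). The helper wald_test() is hypothetical. Our reading of the survey package is that regTermTest() reports F as W divided by the number of tested terms, with the p-value taken from an F distribution using the model's residual degrees of freedom; if that is right, F = 13.48875 above corresponds to W of roughly 27. Treat this as an assumption rather than documentation.

```r
set.seed(1)
n   <- 2000
sim <- data.frame(gdppc = rnorm(n), l.democracy = rbinom(n, 1, 0.5))
sim$gdppc_l.democracy <- sim$l.democracy * sim$gdppc
eta <- -1 + 0.3 * sim$gdppc + 2 * sim$l.democracy + 0.5 * sim$gdppc_l.democracy
sim$democracy <- rbinom(n, 1, pnorm(eta))

fit <- glm(democracy ~ gdppc + l.democracy + gdppc_l.democracy,
           data = sim, family = binomial(link = "probit"))

# Hypothetical helper: Wald test that all listed coefficients are zero
wald_test <- function(model, terms) {
  b <- coef(model)[terms]
  V <- vcov(model)[terms, terms, drop = FALSE]
  W <- as.numeric(t(b) %*% solve(V) %*% b)   # chi-square statistic
  q <- length(terms)
  c(W = W, df = q,
    p_chisq = pchisq(W, df = q, lower.tail = FALSE),
    F = W / q)
}

wald_test(fit, c("gdppc", "gdppc_l.democracy"))
```

For a single coefficient, W is simply the square of the z value in summary(), which is a handy sanity check.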


  1. This discussion draws on Brambor et al. (2006). Please note that I am deliberately NOT lagging the independent variables (IVs) on this worksheet to keep notation as simple as possible. If you decide to run the interaction model, make sure you lag the IVs as explained above.