Model Building

When we are testing a theory, such as modernisation theory, then we usually have a plethora of independent variables to measure the concepts involved in that theory. Take economic development. In its origins, modernisation theory saw development almost purely as economic growth, a notion enshrined in what we refer to now as Classical Modernisation Theory. But as our understanding of what constitutes “development” has changed over time, so have the propositions of modernisation theory. Diamond (1992) made an important contribution to the literature by proposing that wealth, such as per capita GDP, really is only a means to facilitate social change, such as increasing education levels, allowing people to look after their health, etc. He posited that it is these social changes that facilitate democracy. Wealth is necessary, but it is only a first step.

Modernisation theory can therefore be seen as having two parts: the classical one and the new one. When you look at journal articles, you will notice that these parts are usually tested separately in regression models: first just the classical component, then just the social component, then both of them together. This is an important strategy in regression analysis, as it allows us to isolate the explanatory power not only of each part of the theory, but also of individual variables.

Let me illustrate what I mean by this. First, we need to decide how to actually measure “economic development” and “social change”. There are a lot of options:

  • Economic Development
    • GDP
    • Agricultural Land
    • Access to electricity
    • Mobile Phone Subscriptions
  • Social Development
    • Education: literacy, primary completion rate, school enrolment, etc.
    • Health: life expectancy, tuberculosis incidence, health expenditure, etc.

Once we have selected our variables, we start by running bivariate models between each independent variable and democracy as the dependent variable. This tells us whether each variable is individually able to explain democracy, and if so, how well. Then we start to combine independent variables into multivariate models to test different combinations. This allows us to see whether one independent variable takes explanatory power away from another, and as such explains democracy better. I have done this in the following table:

                         Dependent variable: Polity V
                       (1)            (2)            (3)
per capita GDP       0.0001***                     0.0001*
                    (0.00003)                     (0.00003)
Life Expectancy                     0.160***       0.090
                                   (0.054)        (0.066)
Constant             2.584***      -7.373**       -3.248
                    (0.605)        (3.691)        (4.339)
Observations         150            152            150
R2                   0.064          0.056          0.075
Adjusted R2          0.057          0.050          0.063
Note: *p<0.1; **p<0.05; ***p<0.01

As you can see, both per capita GDP and life expectancy can explain variation in the Polity V score in 2007 (Models 1 and 2, respectively). But when we combine the two in Model 3, life expectancy loses its significance, whilst per capita GDP retains significance, admittedly at a lower level. In these three models, we have thus discovered evidence that Diamond was wrong: when we look at the role of per capita GDP and life expectancy simultaneously, per capita GDP is able to explain democracy whilst life expectancy is not.
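The stepwise strategy described above (bivariate models first, then the combined model) can be sketched in Python. This is only an illustration under stated assumptions: the data are synthetic and generated so that the outcome depends on GDP alone, the variable names gdp, life_exp and polity are placeholders rather than the data behind the table, and the helper ols is a hypothetical minimal implementation, not the software used to produce the results above.

```python
import numpy as np

def ols(y, *predictors):
    """Fit OLS with an intercept; return coefficients, standard errors and R^2."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]                 # residual degrees of freedom
    sigma2 = resid @ resid / dof              # estimated error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, se, r2

# Synthetic data (an assumption for illustration, not real country data).
rng = np.random.default_rng(0)
n = 150
gdp = rng.normal(size=n)                                # standardised "wealth"
life_exp = 0.7 * gdp + rng.normal(scale=0.7, size=n)    # correlated with gdp
polity = 1.0 * gdp + rng.normal(scale=2.0, size=n)      # depends on gdp only

# Models 1 and 2 are bivariate; Model 3 combines both predictors.
models = {
    "(1) GDP only": (gdp,),
    "(2) life expectancy only": (life_exp,),
    "(3) both": (gdp, life_exp),
}
results = {name: ols(polity, *preds) for name, preds in models.items()}

for name, (beta, se, r2) in results.items():
    t_values = beta[1:] / se[1:]              # skip the intercept
    print(name, "coef:", np.round(beta[1:], 3),
          "t:", np.round(t_values, 2), "R2:", round(r2, 3))
```

Because life_exp only matters here through its correlation with gdp, its coefficient will typically shrink once gdp enters the model, which mirrors the pattern in Model 3.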

How would you make use of this in the assessment? It goes without saying that you will have to run a lot of models to find such a story. But it would make little sense to overburden the reader of your assessment with all of these models (quite apart from constraints on your word count). Instead, you would only include in your assessment those regression models in the results tables which allow you to tell this story.

Note that in this case, I have also quite neatly tested both the classical and the new approach to modernisation theory. Model 1 is classical modernisation; Models 2 and 3 test new modernisation theory. We have found evidence that classical modernisation theory is more applicable in this scenario than new modernisation theory.


Control Variables

The selection of independent variables MUST be guided by theory. After all, this is our purpose in running regression models: finding out whether a particular theory can explain an empirical phenomenon we are witnessing. If you select variables that are irrelevant to your theory, then the research design breaks down, as you are no longer focusing on testing your theory. I know it sounds trivial, but you would be surprised how many students mess this up in assessments.

But there is one – and only one – exception to the rule of not including variables that are not motivated by the theory you are testing: so-called control variables.

For example, we suspect that the amount of official development assistance (ODA) a country has received might influence democracy. Modernisation theory is only concerned with processes WITHIN a country, and so a flow of money coming from outside of the country is not part of the theory. Nonetheless, such funds can either facilitate development, in the sense of helping to build infrastructure such as roads, or they can be used for democracy promotion directly (the World Bank does tie some of its ODA funds to this purpose), through funding relevant institutions, facilitating elections, etc. So, even though these external funds have no place in our theory, we have strong reason to believe that they affect the dependent variable. It is for this reason that we would still include them in our regression model, to control for their influence. Why? The principle of ceteris paribus (see Week 3) only applies with respect to those variables included in the model. If we wish to control for ODA, that is, to purify the coefficients which are motivated by theory from the influence of ODA, we need to include this variable in the model.
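To see why the control has to be in the model, here is a minimal Python sketch on synthetic data. Everything in it is an assumption made for illustration: the variable names, the negative link between gdp and oda, and the effect sizes are invented, and ols_coefs is a hypothetical helper. The point is only that the coefficient on gdp moves towards its true value once oda is included.

```python
import numpy as np

def ols_coefs(y, *predictors):
    """Return OLS coefficients (intercept first) for y on the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic data (assumed for illustration only, not real country data).
rng = np.random.default_rng(1)
n = 2000
TRUE_GDP_EFFECT = 0.5

gdp = rng.normal(size=n)
# Assumption: poorer countries receive more aid, so oda is correlated with gdp.
oda = -0.8 * gdp + rng.normal(scale=0.6, size=n)
# The outcome depends on both the theory variable (gdp) and the control (oda).
polity = TRUE_GDP_EFFECT * gdp + 0.5 * oda + rng.normal(size=n)

# Coefficient on gdp without and with the control variable.
b_without_control = ols_coefs(polity, gdp)[1]
b_with_control = ols_coefs(polity, gdp, oda)[1]

print("gdp coefficient, ODA omitted: ", round(float(b_without_control), 3))
print("gdp coefficient, ODA included:", round(float(b_with_control), 3))
print("true gdp effect in the simulation:", TRUE_GDP_EFFECT)
```

When oda is left out, its influence is absorbed by the gdp coefficient, so holding "everything else" constant does not cover ODA. Only in the second model is the gdp coefficient estimated with ODA held fixed.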

Even though I am presenting this here as the last step, you would realistically select these variables together with the ones motivated by the theory, as you will also have to perform conceptualisation and measurement, check data availability, etc.