Assessing the degree and drivers of sensitivity in political science

Michael Ganslmeier and Tim Vlandas have developed a new approach to measure the fragility of findings in political science. Showing that empirical results can change substantially when researchers vary reasonable and equally defensible modelling choices, they advocate for greater use of systematic robustness checks

Recent controversies around research transparency have reignited longstanding concerns about the fragility of empirical evidence in the social sciences. Headlines tend to focus on cases of misconduct and fraud. However, this focus often overlooks how sensitive results in empirical political science research can be to equally defensible modelling choices.

Our new study set out to measure the fragility of findings in political science. How much do empirical results change when researchers vary reasonable and equally defensible modelling choices?

To answer this question, we estimated over 3.6 billion regression coefficients across four widely studied topics in political science: welfare generosity, democratisation, public goods provision, and institutional trust – although we report only results for the latter three in this blog post. Each topic is characterised by well-established theories, strong priors, and extensive empirical literatures.

How much do empirical results change when researchers vary reasonable and equally defensible modelling choices?

Our results reveal a striking pattern: the same independent variable often yields not just a mix of significant and insignificant coefficients, but also a very large number of both statistically significant positive and statistically significant negative effects, depending on how the model is set up. Thus, even good-faith research, conducted using standard methods and transparent data, can sometimes produce contradictory conclusions.

A new approach to sensitivity analysis

Recent advances – such as pre-registration, replication files, and registered reports – have significantly improved research transparency. However, they typically begin from a pre-specified model. And even when researchers follow best practices, they still face a series of equally plausible decisions. They must choose which years or countries to include, how to define concepts like 'welfare generosity', whether and which fixed effects to use, whether and how to adjust standard errors, and so on.

Each of these choices may seem minor on its own, and many researchers already use a wide range of robustness checks to explore their impact. But collectively, these decisions define an entire modelling universe. Navigating that space can have a profound effect on results. Standard robustness checks often examine one decision at a time. Yet this may fail to account for the joint influence of many reasonable modelling paths taken together.

Our goal was not to test a single hypothesis, but to observe how much the sign and significance of key coefficients change across plausible model specifications

To map that model space systematically, we combined insights from extreme bounds analysis and the multiverse approach. We then varied five core dimensions of empirical modelling: covariates, sample, outcome definitions, fixed effects, and standard error estimation. The goal was not to test a single hypothesis, nor indeed to replicate prior studies. Rather, it was to observe how much the sign and significance of key coefficients change across plausible model specifications.
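To make this concrete, the snippet below sketches what enumerating such a model universe might look like in practice, in Python with pandas and statsmodels. The toy panel, the variable names and the particular sets of choices are our own illustrative assumptions, not the authors' data or code, and the real universe in the study is vastly larger.

```python
import itertools

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy country-year panel standing in for the real data (purely illustrative).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "country": np.repeat([f"c{i}" for i in range(20)], 30),
    "year": np.tile(np.arange(1990, 2020), 20),
})
df["x"] = rng.normal(size=len(df))                            # key independent variable
df["z1"] = rng.normal(size=len(df))                           # optional covariates
df["z2"] = rng.normal(size=len(df))
df["y_narrow"] = 0.10 * df["x"] + rng.normal(size=len(df))    # two ways of defining
df["y_broad"] = -0.05 * df["x"] + rng.normal(size=len(df))    # the outcome

# The five dimensions of the model universe.
covariate_sets = [[], ["z1"], ["z2"], ["z1", "z2"]]
samples = {"all_years": df, "post_2000": df[df["year"] >= 2000]}
outcomes = ["y_narrow", "y_broad"]
fixed_effects = ["", " + C(country)", " + C(country) + C(year)"]
se_types = ["nonrobust", "HC1", "cluster"]

# Estimate every combination and store the coefficient of interest.
results = []
for covs, (s_name, data), y, fe, se in itertools.product(
        covariate_sets, samples.items(), outcomes, fixed_effects, se_types):
    formula = f"{y} ~ x" + ("" if not covs else " + " + " + ".join(covs)) + fe
    fit_kwargs = {"cov_type": se}
    if se == "cluster":
        fit_kwargs["cov_kwds"] = {"groups": data["country"]}
    fit = smf.ols(formula, data=data).fit(**fit_kwargs)
    results.append({
        "covariates": "+".join(covs) or "none", "sample": s_name, "outcome": y,
        "fixed_effects": fe.strip(" +") or "none", "se": se,
        "coef": fit.params["x"], "pvalue": fit.pvalues["x"],
    })

universe = pd.DataFrame(results)
print(f"{len(universe)} specifications estimated")
```

Each row of `universe` records both the specification choices and the resulting estimate for the variable of interest, which is the structure the summaries below rely on.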

The fragility of political science research

For many variables commonly used to support empirical claims, we found numerous model specifications where the estimated effect was positive and statistically significant. But we also discovered others where it was strongly negative and statistically significant (see graph below).

One clear implication is that conventional robustness checks, while valuable, may still be too limited in scope. Researchers frequently vary control variables, estimation techniques, or subsamples to assess the stability of their findings. But these checks are typically applied sequentially and independently, examining each modelling decision in isolation. Our results suggest that this approach can miss the larger picture: it is not just which decisions researchers make, but how those decisions combine, that determines the stability of empirical results.

Results from our study suggest that it is not just which decisions researchers make, but how those decisions combine, that determines the stability of empirical results

By systematically exploring a wide modelling space – automating thousands of reasonable combinations of covariates, samples, estimators and operationalisations – our approach assesses the joint influence of modelling choices. This allows us to identify patterns of fragility that are invisible to conventional checks.
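Continuing the hypothetical sketch above, the shares of positive and negative significant coefficients plotted in the figure below can be tabulated from such a table of estimates along the following lines. The 5% significance threshold is our own assumption rather than a detail taken from the study.

```python
import pandas as pd

# `universe` is the table of estimates built in the previous sketch.
alpha = 0.05  # significance threshold (our assumption)
sig = universe["pvalue"] < alpha
universe["category"] = "not significant"
universe.loc[sig & (universe["coef"] > 0), "category"] = "positive significant"
universe.loc[sig & (universe["coef"] < 0), "category"] = "negative significant"

# Share of each outcome across the whole model universe ...
print(universe["category"].value_counts(normalize=True).round(3))

# ... and broken down by each modelling dimension, to see which single
# choice is most associated with a change in sign or significance.
for dim in ["covariates", "sample", "outcome", "fixed_effects", "se"]:
    print(pd.crosstab(universe[dim], universe["category"], normalize="index").round(2))
```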

Share of significant coefficients in the model space for three topics

[Figure: three vertical bar charts comparing factors affecting democratisation (global), public goods provision (China), and trust in institutions (Europe+). Blue bars show positive, significant effects; red bars show negative, significant effects.]
The panels present the share of positive and negative significant coefficients (blue and red, respectively) for all independent variables in the unrestricted model universe in the three test cases: democratisation, public goods provision, and institutional trust. The dashed line indicates 90%. Adapted from the authors' article in the Proceedings of the National Academy of Sciences (PNAS).

The sources of model uncertainty

We estimated feature importance scores for these different model specification choices. To do so, we first extracted a random sample of 250,000 regression coefficients from the unrestricted model universe for each topic. We then fitted a neural network that predicts, from the specification choices, whether an estimate is 'negative significant', 'positive significant' or 'not significant'.
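As a rough illustration of this step, the sketch below reuses the toy `universe` table from the earlier snippets. Where the study fits a neural network on 250,000 sampled estimates and reports SHAP values, the stand-in here uses scikit-learn's MLPClassifier with permutation importance – a simpler but related way of ranking the specification choices by how strongly they predict the significance category.

```python
from sklearn.compose import ColumnTransformer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Features are the specification choices; the label is the significance
# category assigned in the previous sketch.
choice_cols = ["covariates", "sample", "outcome", "fixed_effects", "se"]
X = universe[choice_cols]
y = universe["category"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = Pipeline([
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), choice_cols)])),
    ("net", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)),
])
model.fit(X_train, y_train)

# Permutation importance: how much does shuffling one specification choice
# degrade the classifier's ability to predict the significance category?
imp = permutation_importance(model, X_test, y_test, n_repeats=20, random_state=0)
for col, mean_imp in sorted(zip(choice_cols, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{col:15s} {mean_imp:.3f}")
```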

The graph below shows that it is not the control variables per se that drive the greatest variation, but decisions about sample construction. What matters is which countries or time periods researchers include – and how they define key outcomes. These upstream decisions, often made early and treated as background, exert the strongest influence on whether results are statistically significant – and in which direction.

Feature importance scores of model specification decisions

[Figure: three horizontal scatter plots comparing feature importance across the three cases: democratisation (global), public goods provision (China), and trust in institutions (Europe+). Points are colour-coded by significance and effect direction.]
The panels show the feature importance scores (SHAP values) for the different model specification choices. Adapted from the authors' accompanying article in the Proceedings of the National Academy of Sciences (PNAS).

Lessons for empirical research

To be clear, the implication of our findings is not that nothing is robust, nor that quantitative social science is futile. On the contrary, our work underscores the value of systematically understanding where results are strong and where (and why) they might be less stable.

With this new approach, we hope to provide an additional tool that researchers can use to carry out systematic robustness checks and increase transparency. To that end, we provide our code, which future research can use to analyse and visualise the model space around a given result.

This article was originally published on the LSE EUROPP blog on 14 August 2025.

This article presents the views of the author(s) and not necessarily those of the ECPR or the Editors of The Loop.

Contributing Authors

Michael Ganslmeier, Assistant Professor in Computational Social Science, University of Exeter
Tim Vlandas, Associate Professor of Comparative Social Policy and Fellow of St Antony’s College, University of Oxford
