Michael Ganslmeier and Tim Vlandas have developed a new approach to measuring the fragility of findings in political science. Showing that empirical results can change substantially when researchers vary reasonable and equally defensible modelling choices, they advocate for greater use of systematic robustness checks.
Recent controversies around research transparency have reignited longstanding concerns about the fragility of empirical evidence in the social sciences. Headlines tend to focus on cases of misconduct and fraud, but this focus often obscures how sensitive results in empirical political science research can be to equally defensible modelling choices.
Our new study set out to measure the fragility of findings in political science. How much do empirical results change when researchers vary reasonable and equally defensible modelling choices?
To answer this question, we estimated over 3.6 billion regression coefficients across four widely studied topics in political science: welfare generosity, democratisation, public goods provision, and institutional trust – although we report only results for the latter three in this blog post. Each topic is characterised by well-established theories, strong priors, and extensive empirical literatures.
Our results reveal a striking pattern: depending on how the model is set up, the same independent variable often yields not just a mix of significant and insignificant coefficients, but also a very large number of estimates that are statistically significant in opposite directions. Thus, even good-faith research, conducted using standard methods and transparent data, can sometimes produce contradictory conclusions.
Recent advances – such as pre-registration, replication files, and registered reports – have significantly improved research transparency. However, they typically begin from a pre-specified model. And even when researchers follow best practices, they still face a series of equally plausible decisions. They must choose which years or countries to include, how to define concepts like 'welfare generosity', whether and which fixed effects to use, whether and how to adjust standard errors, and so on.
Each of these choices may seem minor on its own, and many researchers already use a wide range of robustness checks to explore their impact. But collectively, these decisions define an entire modelling universe. Navigating that space can have a profound effect on results. Standard robustness checks often examine one decision at a time. Yet this may fail to account for the joint influence of many reasonable modelling paths taken together.
To map that model space systematically, we combined insights from extreme bounds analysis and the multiverse approach. We then varied five core dimensions of empirical modelling: covariates, sample, outcome definitions, fixed effects, and standard error estimation. The goal was not to test a single hypothesis, nor indeed to replicate prior studies. Rather, it was to observe how much the sign and significance of key coefficients change across plausible model specifications.
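To give a rough sense of what such a specification grid looks like in practice, the sketch below enumerates combinations of covariate sets, outcome definitions, samples and fixed effects for a hypothetical panel dataset and records the sign and p-value of a key predictor in each specification. The dataset `df`, the variable names and the estimator are illustrative assumptions, not the study's actual pipeline.

```python
# Minimal sketch of a specification grid, assuming a hypothetical pandas
# DataFrame `df` with columns such as 'trust_gov', 'trust_parliament',
# 'key_predictor', 'gdp_growth', 'unemployment', 'country' and 'year'.
import itertools
import pandas as pd
import statsmodels.formula.api as smf

covariate_sets = [[], ["gdp_growth"], ["gdp_growth", "unemployment"]]
outcomes = ["trust_gov", "trust_parliament"]          # alternative operationalisations
samples = {"all": lambda d: d,
           "post_2000": lambda d: d[d["year"] >= 2000]}
fixed_effects = ["", " + C(country)", " + C(country) + C(year)"]

results = []
for covs, outcome, (sample_name, subset), fe in itertools.product(
        covariate_sets, outcomes, samples.items(), fixed_effects):
    formula = f"{outcome} ~ key_predictor" + "".join(f" + {c}" for c in covs) + fe
    # Heteroskedasticity-robust SEs; alternative SE choices would be
    # a further branch of the model universe.
    fit = smf.ols(formula, data=subset(df)).fit(cov_type="HC1")
    results.append({"outcome": outcome, "sample": sample_name,
                    "covariates": tuple(covs), "fe": fe.strip(" +"),
                    "coef": fit.params["key_predictor"],
                    "pvalue": fit.pvalues["key_predictor"]})

universe = pd.DataFrame(results)
```

Even this toy grid already produces dozens of specifications; crossing many more choices along each dimension is what pushes the full universe into the billions of estimates reported in the study.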
For many variables commonly used to support empirical claims, we found a large number of model specifications in which the estimated effect was positive and statistically significant, but also others in which it was strongly negative and statistically significant (see graph below).
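Continuing the illustrative `universe` data frame from the sketch above, classifying each estimate by sign and significance (two-sided, 5% level) and tallying the shares could look like this:

```python
import numpy as np

universe["category"] = np.select(
    [(universe["pvalue"] < 0.05) & (universe["coef"] > 0),
     (universe["pvalue"] < 0.05) & (universe["coef"] < 0)],
    ["positive significant", "negative significant"],
    default="not significant",
)
print(universe["category"].value_counts(normalize=True))
```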
One clear implication is that conventional robustness checks, while valuable, may still be too limited in scope. Researchers frequently vary control variables, estimation techniques, or subsamples to assess the stability of their findings, but they typically apply these checks sequentially and independently, examining each modelling decision in isolation. Our results suggest that this approach can miss the larger picture: it is not just which decisions researchers make, but how those decisions combine, that determines the stability of empirical results.
By systematically exploring a wide modelling space – automating estimation across thousands of reasonable combinations of covariates, samples, estimators and operationalisations – our approach assesses the joint influence of modelling choices. This allows us to identify patterns of fragility that are invisible to conventional checks.
We also estimated feature importance scores for these model specification choices. To do so, we first drew a random sample of 250,000 regression coefficients from the unrestricted model universe for each topic. We then fitted a neural network to predict whether each estimate is 'negative significant', 'positive significant' or 'not significant'.
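A hedged sketch of this step, continuing the illustrative `universe` and its `category` column from above, is shown below. It uses a small scikit-learn neural network and permutation importance for brevity; the study's actual architecture, sample size and importance measure may differ.

```python
# Predict the significance category of each estimate from its specification
# choices, then score which dimensions of the specification matter most.
from sklearn.compose import ColumnTransformer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

spec_cols = ["outcome", "sample", "covariates", "fe"]   # specification features
X = universe[spec_cols].astype(str)                     # treat choices as categories
y = universe["category"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = Pipeline([
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), spec_cols)])),
    ("net", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)),
])
model.fit(X_train, y_train)

# Permutation importance: how much predictive accuracy drops when one
# specification dimension is shuffled - a proxy for its influence on
# the sign and significance of the estimates.
imp = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for col, score in sorted(zip(spec_cols, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{col}: {score:.3f}")
```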
The graph below shows that it is not the control variables per se that drive the greatest variation, but decisions about sample construction. What matters most is which countries or time periods researchers include – and how they define key outcomes. These upstream decisions, often made early and treated as background, exert the strongest influence on whether results are statistically significant – and in which direction.
To be clear, the implication of our findings is not that nothing is robust, nor that quantitative social science is futile. On the contrary, our work underscores the value of systematically understanding where results are strong and where (and why) they might be less stable.
With this new approach, we hope to provide an additional tool that researchers can use to carry out systematic robustness checks and to increase transparency. To that end, we provide our code so that future research can analyse and visualise the model space around a given result.
This article was originally published on the LSE EUROPP blog on 14 August 2025.