Statistical analysis (STV4020B)

2020-11-06

Master’s/PhD level. University of Oslo. Department of Political Science. 2020

This course provides a toolkit for students looking to execute their own research projects using quantitative methods. We explain the statistical theory behind models, but we emphasize understanding when different models are useful and how to employ them.

The course helps students 1) identify appropriate statistical models that describe different data types and 2) design studies that enable causal inference (as opposed to merely establishing correlation).

Describing data (generalized linear models)

Many of the phenomena political scientists take an interest in can be classified into categories or events. Examples include voters’ choice of party, the number of violent incidents in a time span, or the time between violent events. The course introduces students to a number of models that are specifically designed to describe the underlying process that generates such data. The focus of the first part of the course is on correlations. Our purpose is to help students gear the statistical analysis towards a realistic description of the data.

Lecture 1: Maximum likelihood

(Ward and Ahlquist, 2018, ch 1, 2, 5-7)

We introduce maximum likelihood as a framework for statistical modeling.
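As a minimal sketch of the idea (in Python with the standard library only; the course itself works in R), the block below writes down the log-likelihood of a Bernoulli model and finds the parameter value that maximizes it on a grid. The data and the grid search are assumptions chosen for illustration; in practice a numerical optimizer would be used.

```python
import math

def bernoulli_log_likelihood(p, outcomes):
    """Log-likelihood of success probability p given a list of 0/1 outcomes."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes)

# Hypothetical data: 1 = voted, 0 = abstained.
outcomes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Evaluate the log-likelihood on a grid of candidate values for p
# and keep the candidate with the highest value.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: bernoulli_log_likelihood(p, outcomes))

# For the Bernoulli model the maximum likelihood estimate has a closed form:
# the sample mean. The grid search should land on the same value.
p_closed_form = sum(outcomes) / len(outcomes)

print(p_hat, p_closed_form)
```

The grid maximizer and the sample mean coincide here, which is the sense in which maximum likelihood picks the parameter value that makes the observed data most probable.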

Lecture 2: Models of outcome and choice: The logit model

(Ward and Ahlquist, 2018, ch 3)

Logistic regression is the baseline example of a generalized linear model (GLM). We go through how to model binary outcomes, how to interpret the results from a logit model, and why we can’t simply run an OLS regression.
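One reason OLS is problematic for binary outcomes can be sketched in a few lines of Python. The toy data and the logit coefficients below are hypothetical (the coefficients are not estimated from the data); the point is only that a fitted line can predict values outside the unit interval, while the inverse logit always returns a probability.

```python
import math

def ols_fit(x, y):
    """Closed-form simple OLS. Returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def inv_logit(eta):
    """Map a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical toy data: a binary outcome y against a predictor x.
x = list(range(10))
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Linear probability model: predictions are not bounded by 0 and 1.
a, b = ols_fit(x, y)
lpm_predictions = [a + b * xi for xi in x]
print(min(lpm_predictions), max(lpm_predictions))

# Hypothetical logit coefficients, chosen for illustration only.
logit_intercept, logit_slope = -4.5, 1.0
logit_predictions = [inv_logit(logit_intercept + logit_slope * xi) for xi in x]
print(min(logit_predictions), max(logit_predictions))

# exp(slope) is the multiplicative change in the odds per unit increase in x.
odds_ratio = math.exp(logit_slope)
print(odds_ratio)
```

The logit coefficients live on the log-odds scale, which is why interpretation usually goes through odds ratios or predicted probabilities rather than the raw coefficients.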

Lecture 3: Models of outcome and choice: Ordered and multinomial regression

(Ward and Ahlquist, 2018, ch 8-9)

We go through models with categorical dependent variables, covering both ordinal and nominal categories. Example data used in class can be downloaded here. It is the same as the data from chapter 10 in laerdegR.

Lecture 4: Event count models: Poisson regression and other alternatives

(Ward and Ahlquist, 2018, ch 10)

Event count models describe grouped events. They are appropriate when we do not have relevant information at the event level, but are able to both count the events and design contextual predictors. We go through the Poisson model and how to check for overdispersion. We then consider different strategies to address possible overdispersion: the quasi-Poisson and negative binomial models, as well as hurdle and zero-inflated models.

Towards the end of the lecture, we consider the decision tree for determining which GLM might best describe our data.
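A rough first check for overdispersion can be sketched as follows, assuming an intercept-only Poisson model and hypothetical counts. Under the Poisson assumption the variance equals the mean, so a variance-to-mean ratio well above 1 suggests overdispersion; comparing the observed share of zeros with the share the Poisson model implies hints at whether a hurdle or zero-inflated model is worth considering. This is a descriptive diagnostic, not a formal test.

```python
import math
from statistics import mean, variance

# Hypothetical event counts, e.g. violent incidents per district.
counts = [0, 0, 1, 0, 3, 0, 7, 0, 2, 12, 0, 1]

mu = mean(counts)

# Variance-to-mean ratio. For an intercept-only Poisson model this equals
# the Pearson chi-square statistic divided by its degrees of freedom (n - 1).
dispersion = variance(counts) / mu

# Observed share of zeros versus the share a Poisson with mean mu implies.
zero_share = counts.count(0) / len(counts)
poisson_zero_share = math.exp(-mu)

print(round(dispersion, 2))
print(zero_share, round(poisson_zero_share, 3))
```

With predictors in the model, the same logic applies to the Pearson residuals of the fitted model rather than to the raw counts.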

Lecture 5: Event history/duration models

(Ward and Ahlquist, 2018, ch 11)

We can leverage information both from whether an event occurred (binary outcome) and from the time it took for the event to take place (count outcome). These models are made to describe the duration until an outcome occurs.
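How the two pieces of information combine can be sketched with the simplest parametric duration model, the exponential model with a constant hazard. That model is an assumption chosen for illustration, not necessarily the one used in the lecture, and the data are hypothetical. With right-censored observations, the maximum likelihood estimate of the hazard rate is the number of observed events divided by the total time at risk.

```python
import math

# Hypothetical spells: time under observation and whether the event
# was observed (1) or the spell was right-censored (0).
durations = [2.0, 5.0, 1.0, 8.0, 4.0]
event_observed = [1, 0, 1, 1, 0]

# Exponential model: events contribute the density, censored spells
# contribute the survival function. The MLE of the hazard rate is
# events / total exposure time.
n_events = sum(event_observed)
time_at_risk = sum(durations)
hazard_rate = n_events / time_at_risk

# Implied median duration under a constant hazard.
median_duration = math.log(2) / hazard_rate

print(hazard_rate, round(median_duration, 2))
```

Dropping the censored spells, or treating them as completed, would both bias the estimated rate, which is why the censoring indicator is carried alongside the duration.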

Causal inference

Beyond statistical description, researchers’ interest in political phenomena is often motivated by a wish to identify their causes. Causal claims focus on whether we would continue to observe $y$ even in the absence of $x$. This is a typical question when studying the effect of policies, for example. The potential outcomes framework helps us understand the conditions under which we can identify causal relationships. The second part of the course is devoted to a set of quantitative research designs that can be used to make causal inferences.

Causal inference

(Morgan and Winship, 2015, ch 1-3; Samii 2016)

We go through the idea of potential outcomes. What are the comparisons we would like to make? And what is it actually possible to compare with?
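The gap between the comparison we want and the comparison we can make is easy to see in a toy example. The sketch below (Python, hypothetical numbers) lists both potential outcomes for every unit, which is never possible with real data, and contrasts the true average treatment effect with the naive difference in observed means.

```python
from statistics import mean

# Hypothetical units: (outcome if untreated, outcome if treated, treated?).
# Units with high untreated outcomes are the ones that select into treatment.
units = [
    (1, 3, 0),
    (2, 4, 0),
    (3, 5, 0),
    (6, 8, 1),
    (7, 9, 1),
    (8, 10, 1),
]

# The comparison we would like to make: each unit against itself.
ate = mean(y1 - y0 for y0, y1, _ in units)

# The comparison that is actually possible: observed treated vs observed controls.
treated_observed = [y1 for _, y1, d in units if d == 1]
control_observed = [y0 for y0, _, d in units if d == 0]
naive_difference = mean(treated_observed) - mean(control_observed)

# Selection bias: the groups already differ in their untreated outcomes.
selection_bias = (mean(y0 for y0, _, d in units if d == 1)
                  - mean(y0 for y0, _, d in units if d == 0))

print(ate, naive_difference, selection_bias)
```

Here the naive difference equals the true effect plus the selection bias, so the observed comparison overstates the effect by exactly the amount by which the two groups differ at baseline.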

In the R-lab we’ll look at matching techniques. We will play with a version of the data from Glynn and Sen’s daughters-effect paper, published in the American Journal of Political Science in 2015.

Regression and matching

(Ho et al., 2007; Iacus, King and Porro, 2011)

We introduce matching as a technique to adjust for observed confounders.
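As a simplified stand-in for the matching procedures in the readings, the sketch below uses exact matching on a single discrete confounder: units are compared only within the same stratum, and the stratum-level differences are averaged with weights equal to the number of treated units. The data and the stratum labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (confounder stratum, treated?, outcome).
data = [
    ("A", 1, 5), ("A", 1, 7), ("A", 0, 4), ("A", 0, 4),
    ("B", 1, 10), ("B", 0, 7), ("B", 0, 9), ("B", 0, 8),
]

# Group outcomes by stratum and treatment status.
strata = defaultdict(lambda: {0: [], 1: []})
for stratum, treated, outcome in data:
    strata[stratum][treated].append(outcome)

# Within-stratum differences, weighted by the number of treated units.
weighted_sum, n_treated = 0.0, 0
for groups in strata.values():
    if groups[0] and groups[1]:  # keep only strata with both groups present
        difference = mean(groups[1]) - mean(groups[0])
        weighted_sum += difference * len(groups[1])
        n_treated += len(groups[1])
att_matched = weighted_sum / n_treated

# Unadjusted comparison for reference.
naive = (mean(o for _, d, o in data if d == 1)
         - mean(o for _, d, o in data if d == 0))

print(att_matched, round(naive, 3))
```

The adjustment only addresses confounders that are observed and matched on; anything unobserved that drives both treatment and outcome remains in the estimate.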

RDD and diff-in-diff

(Angrist and Pischke, 2015, ch 4 and 5; Wing, Simon and Bello-Gomez, 2018; Imbens and Lemieux 2008)

We go through regression discontinuity designs (RDD) where a treatment kicks in at a certain cutpoint of an observed running variable. We then have a look at the difference-in-differences approach (diff-in-diff) where we observe a treatment group and a control group before and after the treatment. It’s all about interaction effects and dummies!
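Both designs can be sketched with toy numbers. In the block below (Python, hypothetical data), the RDD estimate is the gap between two separately fitted lines evaluated at the cutpoint, and the diff-in-diff estimate is the change in the treatment group minus the change in the control group, which equals the coefficient on the treated-by-post interaction in a regression with the two dummies. Bandwidth choice and standard errors are left out.

```python
from statistics import mean

def ols_fit(x, y):
    """Closed-form simple OLS. Returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# --- RDD: running variable centered so that the cutpoint is 0 ---
running = [-3, -2, -1, 1, 2, 3]
outcome = [1, 2, 3, 7, 8, 9]
left_x = [r for r in running if r < 0]
left_y = [o for r, o in zip(running, outcome) if r < 0]
right_x = [r for r in running if r >= 0]
right_y = [o for r, o in zip(running, outcome) if r >= 0]

# With a centered running variable, each intercept is the fitted
# value at the cutpoint, so the jump is the difference in intercepts.
left_intercept, _ = ols_fit(left_x, left_y)
right_intercept, _ = ols_fit(right_x, right_y)
rdd_jump = right_intercept - left_intercept

# --- Diff-in-diff: group means before and after treatment ---
outcomes = {
    ("control", "pre"): [10, 12, 11],
    ("control", "post"): [13, 15, 14],
    ("treated", "pre"): [20, 22, 21],
    ("treated", "post"): [28, 30, 29],
}
m = {cell: mean(values) for cell, values in outcomes.items()}
did = ((m[("treated", "post")] - m[("treated", "pre")])
       - (m[("control", "post")] - m[("control", "pre")]))

print(rdd_jump, did)
```

The diff-in-diff number is only a causal estimate if the two groups would have followed parallel trends without the treatment; the RDD number relies on units just either side of the cutpoint being comparable.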

We’ll play around with data on young adults, death rates and alcohol from both chapter 4 and chapter 5.

Instrumental variables

(Sovey and Green, 2011)

We introduce instrumental variables and provide guidelines for their application in political science.
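The core logic can be sketched with a binary instrument and the Wald estimator: the effect of the instrument on the outcome (reduced form) divided by its effect on the treatment (first stage). The data below are hypothetical, and the sketch assumes the instrument is relevant and affects the outcome only through the treatment.

```python
from statistics import mean

# Hypothetical records: (instrument z, treatment d, outcome y).
data = [
    (0, 0, 1), (0, 0, 1), (0, 0, 1), (0, 1, 3),
    (1, 1, 3), (1, 1, 3), (1, 1, 3), (1, 0, 1),
]

def group_mean(rows, z_value, column):
    """Mean of the given column among rows with instrument == z_value."""
    return mean(row[column] for row in rows if row[0] == z_value)

# First stage: how much the instrument shifts treatment uptake.
first_stage = group_mean(data, 1, 1) - group_mean(data, 0, 1)

# Reduced form: how much the instrument shifts the outcome.
reduced_form = group_mean(data, 1, 2) - group_mean(data, 0, 2)

# Wald estimator: reduced form scaled by the first stage.
wald_estimate = reduced_form / first_stage

print(first_stage, reduced_form, wald_estimate)
```

A first stage close to zero makes the ratio unstable, which is one reason instrument strength is among the first things to report.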

Complete syllabus

Learning outcome

Students will get hands-on practice by learning how to implement the various techniques and by replicating existing studies in R. Students will also learn how to present results in a manner that is understandable to a general audience, for example by using graphs and other visual tools.

More information about the course is available at the University of Oslo’s webpages.

Silje Synnøve Lyder Hermansen

Assistant Professor

Silje’s research concerns democratic representation in courts and parliaments. She also teaches various courses in research methods and comparative politics.
