Statistical analysis (STV4020B)

2019-11-15

Master’s/PhD level. University of Oslo. Department of Political Science. 2019

Course description

This course provides a toolkit for students looking to execute their own research projects using quantitative methods. We go through the statistical theory behind models, but we emphasize understanding when models are useful and how to employ them. The course helps students 1) identify appropriate statistical models that describe different data types and 2) design studies that enable causal inference (as opposed to merely establishing correlation).

Many of the phenomena political scientists take interest in can be classified into categories or events. This includes variables like voters’ choice of party, the number of violent incidents in a time span, or the time between violent events. The course introduces students to a number of models which are specifically designed to describe the underlying phenomenon that generates such data. The focus of the first part of the course is on correlations. Our purpose is to help students gear the statistical analysis towards a realistic description of the data.

Beyond statistical description, researchers’ interest in political phenomena is often motivated by a wish to identify their causes. Causal claims focus on whether we would continue to observe $y$ even in the absence of $x$. This is a typical question when studying the effect of policies, for example. The potential outcomes framework provides a useful heuristic for assessing causal relationships. The second part of the course is devoted to a set of quantitative research designs that can be used to make causal inferences.

Learning outcome

Students will get hands-on practice by learning how to implement the various techniques and by replicating existing studies in R. Students will also learn how to present results in a manner that is understandable to a general audience, for example by using graphs and other visual tools.

More information about the course is available at the University of Oslo’s webpages.

Slides

Maximum likelihood
Logistic regression (Ward and Ahlquist, 2018, ch 3)

Logistic regression is the baseline example of a generalized linear model (GLM). We go through how to model binary outcomes, how to interpret the results from a logit model, and why we can’t simply run OLS.
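To make the contrast with OLS concrete, here is a minimal sketch in plain Python (the course labs themselves use R). The intercept and slope are hypothetical values chosen only for illustration, not estimates from any data used in class. The linear predictor is unbounded, so a linear probability model with these coefficients would predict "probabilities" below 0 and above 1, whereas the inverse logit always stays inside (0, 1). The last lines illustrate the usual odds-ratio reading of a logit coefficient.

```python
import math


def inv_logit(eta):
    """Map a linear predictor (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients, chosen only for illustration.
intercept, slope = -1.0, 0.8

for x in (-5, 0, 5):
    eta = intercept + slope * x   # linear predictor: unbounded
    p = inv_logit(eta)            # logit prediction: always between 0 and 1
    print(f"x={x:>2}  linear predictor={eta:5.1f}  P(y=1)={p:.3f}")

# Interpretation: a one-unit increase in x multiplies the odds by exp(slope).
p0 = inv_logit(intercept)          # P(y=1) at x = 0
p1 = inv_logit(intercept + slope)  # P(y=1) at x = 1
ratio_of_odds = (p1 / (1 - p1)) / (p0 / (1 - p0))
odds_ratio = math.exp(slope)
print(f"ratio of odds = {ratio_of_odds:.3f}, exp(slope) = {odds_ratio:.3f}")
```

The effect on the probability itself is not constant: it depends on where on the curve we start, which is why predicted probabilities are usually easier to present than raw coefficients.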

Ordered and multinomial regression (Ward and Ahlquist, 2018, ch 8-9)

We go through models with categorical dependent variables, both ordinal and nominal. Example data used in class can be downloaded here. It is the same as the data from chapter 10 in laerdegR.

Event count models (Ward and Ahlquist, 2018, ch 10)

Event count models describe grouped events. They are appropriate when we do not have relevant information at the event level, but are able to both count the events and design contextual predictors. We go through the Poisson model and how to check for overdispersion. We then consider different strategies to address possible overdispersion. We study the quasi-Poisson and the negative binomial models as well as hurdle and zero-inflated models.
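A rough first check for overdispersion can be sketched as follows, in plain Python (the labs use R). The Poisson distribution assumes that the variance equals the mean, so a variance-to-mean ratio well above 1 is a warning sign. The counts below are invented for illustration, and the check ignores covariates, so it corresponds to an intercept-only model; with predictors one would look at the dispersion of the residuals instead.

```python
import math
from statistics import mean, variance

# Hypothetical event counts per unit (invented for illustration only).
counts = [0, 0, 1, 0, 3, 0, 12, 2, 0, 7, 0, 1, 25, 0, 4]

m = mean(counts)
v = variance(counts)   # sample variance (n - 1 in the denominator)
dispersion = v / m     # roughly 1 under a Poisson model without covariates

# Share of zeros in the data versus what a Poisson with the same mean implies.
observed_zero_share = counts.count(0) / len(counts)
poisson_zero_share = math.exp(-m)

print(f"mean={m:.2f}  variance={v:.2f}  dispersion={dispersion:.2f}")
print(f"zeros: observed={observed_zero_share:.2f}  Poisson={poisson_zero_share:.2f}")
```

A dispersion ratio far above 1 points towards the quasi-Poisson or negative binomial models, while a large excess of zeros relative to the Poisson expectation is the situation that hurdle and zero-inflated models are meant to handle.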

Event history/duration models (Ward and Ahlquist, 2018, ch 11)

We can leverage information both from whether an event occurred or not (binary outcome) and from the time it took for the event to take place (count outcome). These models are made to describe the duration until an outcome occurs.

Causal inference (Angrist and Pischke, 2015, ch 1)

We go through the idea of potential outcomes. What are the comparisons we would like to make? And what is it actually possible to compare with?
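The gap between the comparison we would like to make and the one we can actually make can be written out as follows. The notation is an assumption on our part, in the style of Angrist and Pischke: $Y_{1i}$ and $Y_{0i}$ are unit $i$'s potential outcomes with and without treatment, and $D_i$ indicates whether the unit was actually treated.

```latex
% The comparison we would like to make (never observed for the same unit):
\tau_i = Y_{1i} - Y_{0i}

% What we observe is only one of the two potential outcomes:
Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}

% The comparison that is actually possible, and what it mixes together:
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
  = \underbrace{E[Y_{1i} - Y_{0i} \mid D_i = 1]}_{\text{average effect on the treated}}
  + \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection bias}}
```

The selection-bias term disappears when treatment is independent of the potential outcomes, as under random assignment.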

In the R-lab we’ll look at matching techniques. We will play with a version of the data from Glynn and Sen’s daughters-effect paper in APSR from 2014.

Regression and matching (Angrist and Pischke, 2015, ch 2)
RDD and diff-in-diff (Angrist and Pischke, 2015, ch 4 and 5)

We go through regression discontinuity designs (RDD), where a treatment kicks in at a certain cutpoint of an observed running variable. We then have a look at the difference-in-differences approach (diff-in-diff), where we observe a treatment group and a control group before and after the treatment. It’s all about interaction effects and dummies!
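The "interaction effects and dummies" point can be illustrated with a small sketch in plain Python (the labs use R). The eight observations are invented for illustration. With two groups and two periods, the diff-in-diff estimate computed from the four cell means is the same number as the coefficient on the interaction term in the saturated model y = a + b*treated + c*post + d*treated*post, whose coefficients are themselves just differences of cell means.

```python
from statistics import mean

# Hypothetical observations, invented for illustration: (treated, post, y)
rows = [
    (0, 0, 10.0), (0, 0, 12.0),   # control group, before
    (0, 1, 13.0), (0, 1, 15.0),   # control group, after
    (1, 0, 11.0), (1, 0, 13.0),   # treatment group, before
    (1, 1, 18.0), (1, 1, 20.0),   # treatment group, after
]


def cell_mean(treated, post):
    """Average outcome for one group in one period."""
    return mean(y for t, p, y in rows if t == treated and p == post)


# Diff-in-diff: change in the treatment group minus change in the control group.
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Coefficients of the saturated dummy regression, expressed via cell means.
a = cell_mean(0, 0)               # intercept: control group, before
b = cell_mean(1, 0) - a           # treated dummy: baseline group difference
c = cell_mean(0, 1) - a           # post dummy: common time trend
d = cell_mean(1, 1) - a - b - c   # interaction: the diff-in-diff estimate

print(f"diff-in-diff = {did}, interaction coefficient = {d}")
```

The estimate has a causal reading only under the parallel-trends assumption, that is, if the treatment group would have followed the same trend as the control group in the absence of treatment.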

We’ll illustrate with data from chapter 4 and chapter 5 in the book.

Instrumental variables (Angrist and Pischke, 2015, ch 3 and 6)
LaTeX + R = Sant (“True”)

A brief introduction to LaTeX and how to use it in combination with R scripts (in Norwegian).

Silje Synnøve Lyder Hermansen

Assistant Professor

Silje’s research concerns democratic representation in courts and parliaments. She also teaches various courses in research methods and comparative politics.