Statistical models beyond linear regression

Master’s level. University of Copenhagen. Department of Political Science. 2024

Here you can find the slides and other material used for the different lectures.

Learning the basics

We will spend our first weeks familiarizing ourselves with R and linear models (OLS).

Week 1: Introduction

Hermansen 2023, ch. 1-4, p. 19-70

We start by familiarizing ourselves with statistical models for situations where the main assumptions underlying linear models (OLS) do not hold. We then move on to R.

Slides:

The best way to follow the class is to code along. You can find information on how to install R (the statistical software) and RStudio (the interface) here or here.

Week 2: Descriptive statistics and graphical display

Hermansen 2023, ch. 5-6, p. 73-119

By now you have had time to familiarize yourselves with base R, functions and objects. Let’s piece this together and explore two new dialects: ggplot2 for graphical display and the tidyverse for data manipulation and recoding.

Notebooks:

If you haven’t installed the RiPraksis package, you may fetch the data here: kap6.rda

Week 3: Linear models and non-linear effects

Hermansen 2023, ch. 7-9, p. 123-194; Gelman 2007, ch. 3-4, p. 29-79; Berry 2012, p. 653-671

We will spend the week familiarizing ourselves with non-linear effects in linear models (interaction effects) and with how to interpret model results.
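As a worked example of what an interaction implies for interpretation, consider a model with an outcome y, a predictor x and a moderator z (the symbols are illustrative, not taken from the course data). Because of the product term, the effect of x is no longer a single coefficient but a function of z. This is the kind of conditional effect discussed in Berry, Golder and Milton (2012). The variance line below is the standard formula for the uncertainty of that conditional effect:

```latex
y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \beta_3 x_i z_i + \varepsilon_i

\frac{\partial\, \mathbb{E}[y_i]}{\partial x_i} = \beta_1 + \beta_3 z_i

\operatorname{Var}(\hat\beta_1 + \hat\beta_3 z) = \operatorname{Var}(\hat\beta_1) + z^2 \operatorname{Var}(\hat\beta_3) + 2z \operatorname{Cov}(\hat\beta_1, \hat\beta_3)
```

Read this way, β1 is the effect of x only where z = 0. Both the marginal effect and its standard error therefore have to be evaluated across the observed range of z rather than read off the regression table.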

Notebooks:

The data we will be working on are a subset of the replication data for “Blurred Lines between electoral and parliamentary representation: The use of constituency staff among Members of the European Parliament”, European Union Politics (2023).

You can download the data here: MEP2014.rda

When data is structured

Week 4-5: Hierarchical/multilevel models

Gelman and Hill (2007), ch. 11-13, p. 235-299

We start by going through the assumptions of the linear model in order to transition to instances where observations share common characteristics (they are not i.i.d.). We then go through the opportunities offered by hierarchical models: varying intercepts, varying slopes, 2-level regression and how these models pool information.
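To make “pooling information” concrete, consider a varying-intercept model with no predictors. The estimated intercept for group j is approximately a precision-weighted average of that group’s own mean and the overall mean. This sketch follows the form given in Gelman and Hill (2007, ch. 12): n_j is the number of observations in group j, σ_y² the within-group variance and σ_α² the between-group variance.

```latex
y_i = \alpha_{j[i]} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_y^2), \qquad \alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2)

\hat{\alpha}_j \approx \frac{\dfrac{n_j}{\sigma_y^2}\,\bar{y}_j + \dfrac{1}{\sigma_\alpha^2}\,\bar{y}_{\text{all}}}{\dfrac{n_j}{\sigma_y^2} + \dfrac{1}{\sigma_\alpha^2}}
```

Groups with few observations (small n_j) are pulled strongly toward the overall mean, while large groups stay close to their own mean. The two extremes are the familiar alternatives. As σ_α² grows very large the estimate approaches the separate group mean (no pooling), and as σ_α² approaches zero every group gets the overall mean (complete pooling).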

Slides:

Problem sets:

R-notebook:

You can download the data I use to exemplify linear assumptions (MEP2014.rda) and hierarchical structures (MEP.rda) here.

When outcomes are discrete

Week 6: Binary outcomes

Ward (2018), ch. 3, p. 43-78; ch. 6, p. 119-132

We will be working on how to model binary outcomes using a logit model. Our focus is on two different ways of understanding the logit model: as a regression on a latent variable, or as a regression on a transformed dependent variable (the log-odds).
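The two readings of the logit model can be written out side by side. This is a standard derivation in which x_i β denotes the linear predictor and Λ the cumulative distribution function of the standard logistic distribution. The second line uses the symmetry of the logistic distribution, 1 − Λ(−x_i β) = Λ(x_i β).

```latex
% (1) Latent-variable view: an unobserved propensity y* crosses a threshold at 0
y_i^* = x_i\beta + \varepsilon_i, \qquad \varepsilon_i \sim \text{Logistic}(0, 1), \qquad
y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}

\Pr(y_i = 1) = \Pr(\varepsilon_i > -x_i\beta) = \Lambda(x_i\beta) = \frac{1}{1 + \exp(-x_i\beta)}

% (2) Log-odds view: the probability is transformed so that it is linear in x_i beta
\log \frac{\Pr(y_i = 1)}{1 - \Pr(y_i = 1)} = x_i\beta
```

Solving (2) for Pr(y_i = 1) returns exactly the expression in (1), so both views imply the same model and the same likelihood. They differ only in how a coefficient is read: as a shift in an unobserved propensity y*, or as a change in the log-odds that y = 1.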

The data I will be exemplifying with are a subset of the replication data for “Blurred Lines between electoral and parliamentary representation: The use of constituency staff among Members of the European Parliament” European Union Politics (2023).

You can download the data here: MEP2016.rda

Slides:

Problem set:

R-notebook:

Week 7-8: Categorical outcomes

Ward 2018, ch. 8-9, p. 141-189

The data I will use as examples are drawn from the European Social Survey.

You can download the data here: kap6.rda and kap10.rda

Slides:

Problem set:

R-notebook:

Week 10: Count outcomes

Ward 2018, ch. 10, p. 190-216

Slides:

Problem set:

Suggested supplementary reading: Gelman and Hill (2007), ch. 6.2, p. 110-116

We’ll use yet another data set on MEPs; this time on the number of legislative proposals they handle during their tenure. You can download the data here: df_yoshinaka.rda

Notebook:

Week 11: Duration outcome

Ward (2018), ch. 11, p. 190-216

Slides:

We will be working with a classic dataset for duration models, from Box-Steffensmeier’s 1996 study of candidates’ campaign funding and the entry of challengers. You can find the R version of the data for the notebook here: warchest.rda

Notebook:

Week 12: Event count and history outcomes: former exams

We’ll be working together on former exams this week. You can find the exam questions and replication data here. You can also find some excellent former student hand-ins for inspiration on Absalon.

Event count models (Scharpf, 2020):

Event history models (Fortna, 2015):

Week 14: Missing data

Our data frequently contain missing observations. This week, we take the time to reflect on and explore what kind of missing observations we have and whether they may bias our results. It pays to take an active approach to handling missing information. We will be using the MEP.rda data on the European Parliament.
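One common way to classify what kind of missing observations we have is Rubin’s typology of missingness mechanisms. In the notation assumed here (not taken from the lecture), R is the indicator of which values are missing, and the data are split into an observed part Y_obs and a missing part Y_mis:

```latex
\begin{aligned}
\text{MCAR:}&\quad \Pr(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = \Pr(R) \\
\text{MAR:}&\quad \Pr(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = \Pr(R \mid Y_{\text{obs}}) \\
\text{MNAR:}&\quad \Pr(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \ \text{depends on } Y_{\text{mis}}
\end{aligned}
```

As a rough guide, dropping incomplete rows costs efficiency but does not in itself bias estimates under MCAR. Under MAR, approaches that condition on the observed data, such as multiple imputation, can recover unbiased estimates. Under MNAR, neither approach solves the problem without further assumptions about why values are missing.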

Slides:

Notebook:

Complete syllabus

Please familiarize yourself with the syllabus.

Download the syllabus

Course plan

| Week | Topic | Date | Reading |
|---|---|---|---|
| 1 | Introduction to R as a statistics software | 05.02; 07.02 | Hermansen 2023, ch. 1-4, p. 19-70 |
| 2 | Descriptive statistics and graphical display | 12.02; 14.02 | Hermansen 2023, ch. 5-6, p. 73-119 |
| 3 | Linear regression | 19.02; 21.02 | Hermansen 2023, ch. 7-9, p. 123-194; Gelman 2007, ch. 3-4, p. 29-79; Berry 2012; King 2000 (supplementary reading) |
| 4-5 | Hierarchical data structures | 26.02; 28.02 | Gelman 2007, ch. 11-12, p. 235-278 |
| | | 04.03; 06.03 | Gelman 2007, ch. 13, p. 279-300; Gelman 2007, ch. 14-15, p. 301-342 (supplementary reading) |
| 6 | Binary outcomes (logistic regression) | 11.03; 13.03 | Ward 2018, ch. 3, p. 43-78; Ward 2018, ch. 6, p. 119-132; Gelman 2007, ch. 6, p. 109-134 (supplementary reading) |
| 7-8 | Categorical outcomes (multinomial and ordered logistic regression) | 18.03; 20.03; 03.04 | Ward 2018, ch. 8-9, p. 141-189 |
| | Assignment 1 is given | 03.04 | |
| 9 | Workshop week | 08.04; 10.04 | Assignment 1 presentation, assignment helpdesk, dynamic reporting |
| | Assignment due (optional) | 14.04 | |
| 10-11 | Count outcomes (Poisson, negative binomial and hurdle models) | 15.04; 17.04 | Linear Digressions podcast on the Poisson distribution; Ward 2018, ch. 10, p. 190-216; Gelman 2007, ch. 6, p. 109-134 (supplementary reading) |
| 11-12 | Event history data (survival models) | 22.04; 24.04; 29.04 | Ward 2018, ch. 11, p. 190-216 |
| | Assignment 2 is given | 29.04 | |
| 13 | Workshop week | 06.05; 08.05 | Assignment 2 presentation, assignment helpdesk, Shiny app |
| 14 | Missing data | 06.05; 13.05 | Ward 2018, ch. 12, p. 249-270; Gelman 2007, ch. 25, p. 529-545 |
| | Deadline portfolio exam | 01.06 | |

Literature

Berry, William D., Matt Golder, and Daniel Milton. 2012. “Improving Tests of Theories Positing Interaction.” The Journal of Politics.

Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.

Hermansen, Silje Synnøve Lyder. 2023. R i praksis - en introduktion for samfundsvidenskaberne. 1st ed. Copenhagen: DJØF Forlag.

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 341–55.

Ward, Michael D., and John S. Ahlquist. 2018. Maximum Likelihood for Social Science: Strategies for Analysis. Analytical Methods for Social Research. Cambridge: Cambridge University Press.

Silje Synnøve Lyder Hermansen
Assistant Professor

Silje’s research concerns democratic representation in courts and parliaments. She also teaches various courses in research methods and comparative politics.
