Statistical models beyond linear regression (2023)

Master’s level. University of Copenhagen. Department of Political Science. 2023

Here you can find the slides and other material used for the different lectures.

Learning R

We will spend our first three weeks familiarizing ourselves with R.

Week 1 (theory)

Hermansen 2023, ch. 1-4, p. 19-70

We start by familiarizing ourselves with statistical models for situations where the main assumptions underlying linear models (OLS) are not met. We then move on to R.

Slides:

Week 2 (lab week)

Hermansen 2023, ch. 5-6, p. 73-119

You have had time to familiarize yourselves with base R, functions and objects. Let’s piece this together and explore two new dialects: ggplot2 for plotting and the tidyverse for data manipulation.

Notebooks:

If you haven’t installed the RiPraksis package, you may fetch the data here: kap6.rda

Week 3 (lab week)

Hermansen 2023, ch. 7-9, p. 123-194; Gelman and Hill (2007), ch 3-4, p. 29-79

We will be working on linear models (OLS) in R. Our focus is on interpretation. Using the R code and concepts from last week, we will make visual and textual interpretations of different model results.

The data we will be working on are a subset of the replication data for “Blurred Lines between electoral and parliamentary representation: The use of constituency staff among Members of the European Parliament”, European Union Politics (2023).

You can download the data here: MEP2014.rda

Notebook:

Week 4 (theory week): Binary outcomes

Ward (2018), ch. 3, p. 43-78; ch. 6, p. 119-132

We will be working on how to model binary outcomes using a logit model. Our focus is on two different ways of understanding the logit model: as a regression on a latent variable or as a regression on a recoded dependent variable (the log-odds).

The data I will use as examples are a subset of the replication data for “Blurred Lines between electoral and parliamentary representation: The use of constituency staff among Members of the European Parliament”, European Union Politics (2023).
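As a rough sketch of how the two readings connect, the derivation below writes out both formulations. The notation is an assumption made for illustration and is not taken from the course material: $y_i^*$ is the latent variable, $x_i$ the predictors, $\beta$ the coefficients, $\varepsilon_i$ an error term with a standard logistic distribution, and $\Lambda$ the logistic distribution function.

```latex
\begin{align*}
% Latent-variable reading: we only observe whether y* crosses zero.
y_i^* &= x_i^\top \beta + \varepsilon_i,
  \qquad \varepsilon_i \sim \text{Logistic}(0, 1),
  \qquad y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \\[4pt]
% Probability of observing a one, using the symmetry of the logistic distribution.
p_i = \Pr(y_i = 1) &= \Pr(\varepsilon_i > -x_i^\top \beta)
  = 1 - \Lambda(-x_i^\top \beta)
  = \Lambda(x_i^\top \beta)
  = \frac{1}{1 + e^{-x_i^\top \beta}} \\[4pt]
% Log-odds reading: solving for the linear predictor gives a regression on the logit of p.
\log \frac{p_i}{1 - p_i} &= x_i^\top \beta
\end{align*}
```

Under these assumptions both readings lead to the same linear predictor: a one-unit change in a predictor shifts the latent variable, and equally the log-odds, by the corresponding coefficient.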

You can download the data here: MEP2016.rda

Slides:

Week 5 (R week)

Ward (2018), ch. 3, p. 43-78; ch. 6, p. 119-132.

Suggested supplementary reading: Hermansen (2023), ch 10; Gelman and Hill (2007), ch 5

We will continue our work on binary outcomes using the logit model. The notebook covers R code for model interpretation, but our focus will be on model evaluation. Be prepared to share your answers to the problem set.

Notebook:

Week 6 (theory week): Categorical and ordered outcomes

Ward (2018), ch. 8-9, p. 141-189

Slides/notebook:

Week 7 (theory week): Event count outcomes

Ward (2018), ch. 10, p. 190-216

Suggested supplementary reading: Gelman and Hill (2007), ch 6.2, p. 110-116

Slides/notebook:

Week 8 (R week): Event count outcomes

Ward (2018), ch. 10, p. 190-216

Suggested supplementary reading: Gelman and Hill (2007), ch 6.2, p. 110-116

We’ll use yet another data set on MEPs; this time on the number of legislative proposals they handle during their tenure. You can download the data here: df_yoshinaka.rda

Slides/notebook:

Week 9 (theory week): Event history outcomes

Ward (2018), ch. 11, p. 190-216

Slides/notebook:

Week 10 (R week): Event history outcomes

Ward (2018), ch. 11, p. 190-216

We will be working with a classic dataset for duration models, used in Box-Steffensmeier’s 1996 study of candidates’ campaign funding and the entry of challengers. You can find the R version of the data for the notebook here: warchest.rda

Slides/notebook:

Week 11 (theory week): Hierarchical models

Gelman and Hill (2007), ch 11-12, p. 235-278

You can find the data for the slides here: MEP.rda

Slides/notebook:

Week 12 (R week): Hierarchical models

Gelman and Hill (2007), ch 11-12, p. 235-278

You can find the data for the slides here: MEP.rda

Slides/notebook:

Week 13 (theory week): Missing data

Ward (2018), ch 12, p. 249-270; Gelman and Hill (2007), ch 25, p. 529-545

You can find the data for all our activities here: MEP.rda

Slides/notebook:

Week 14 (R week): Missing data

Ward (2018), ch 12, p. 249-270; Gelman and Hill (2007), ch 25, p. 529-545

You can find the data for all our activities here: MEP.rda

Slides/notebook:

Complete syllabus

Please familiarize yourself with the syllabus before our first meeting.

Download the syllabus

Course plan

Week | Topic | Date | Reading
--- | --- | --- | ---
1 | Introduction to R as statistical software | 09.02 | Hermansen (2023), ch. 1-4, p. 19-70
2 | Descriptive statistics and graphical display | 16.02 | Hermansen (2023), ch. 5-6, p. 73-119
3 | Linear regression | 23.02 | Hermansen (2023), ch. 7-9, p. 123-194; Gelman and Hill (2007), ch 3-4, p. 29-79
4-5 | Binary outcomes (logistic regression) | 02.03; 09.03 | Ward (2018), ch. 3, p. 43-78; Ward (2018), ch. 6, p. 119-132
6-7 | Categorical outcomes (multinomial and ordered logistic regression) | 09.03; 16.03 | Ward (2018), ch. 8-9, p. 141-189
8-9 | Count outcomes (Poisson, negative binomial and hurdle models) | 23.03; 30.03 | Linear Digressions: episode on the Poisson distribution; Ward (2018), ch. 10, p. 190-216
10-11 | Event history data (survival models) | 13.04; 20.04 | Ward (2018), ch. 11, p. 190-216
12-13 | Hierarchical data structures | 27.04; 04.05 | Gelman and Hill (2007), ch 11-12, p. 235-278; Gelman and Hill (2007), ch 13, p. 279-300; Gelman and Hill (2007), ch 14-15, p. 301-342
14-15 | Missing data | 11.05; 16.05 | Ward (2018), ch 12, p. 249-270; Gelman and Hill (2007), ch 25, p. 529-545

Literature

Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.

Hermansen, Silje Synnøve Lyder. 2023. R i praksis - en introduktion for samfundsvidenskaberne. 1st ed. Copenhagen: DJØF Forlag.

Ward, Michael D., and John S. Ahlquist. 2018. Maximum Likelihood for Social Science: Strategies for Analysis. Analytical Methods for Social Research. Cambridge: Cambridge University Press.

Silje Synnøve Lyder Hermansen
Assistant Professor

Silje’s research concerns democratic representation in courts and parliaments. She also teaches various courses in research methods and comparative politics.
