Master’s level. University of Copenhagen. Department of Political Science. 2024
Here you can find the slides and other material used for the different lectures.
Learning the basics
We will spend our first weeks familiarizing ourselves with R and linear models (OLS).
Week 1: Introduction
Hermansen 2023, ch. 1-4, p. 19-70
We start by familiarizing ourselves with statistical models for situations where the main assumptions underlying linear models (OLS) are not met. We then move on to R.
Slides:
The best way to follow the class is to code along. You can find info on how to install R (the statistical software) and RStudio (the interface) here or here.
Week 2: Descriptive statistics and graphical display
Hermansen 2023, ch. 5-6, p. 73-119
You have had time to familiarize yourself with base R, functions and objects. Let’s piece this together and explore two new dialects: ggplot2 for graphical display and tidyverse for data manipulation/recoding.
Notebooks:
- Data manipulation: dialects and tidyverse pipes
- Descriptive statistics: numeric summaries and visuals using ggplot2
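As a minimal sketch of the two dialects, assuming the `dplyr` and `ggplot2` packages are installed and using R's built-in `mtcars` data rather than the course data: a pipe chains data manipulation steps, and `ggplot()` builds a figure layer by layer.

```r
library(dplyr)
library(ggplot2)

# Pipe: take the data, THEN group it, THEN summarise each group
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())
by_cyl

# ggplot2: data + aesthetic mapping + geometric layer
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
p
```

The object names `by_cyl` and `p` are only illustrative.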
If you haven’t installed the RiPraksis package, you may fetch the data here: kap6.rda
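Files with the `.rda` extension are read with `load()`, which restores the saved objects under their original names. The sketch below uses a hypothetical toy data frame written to a temporary folder, so it runs without the course files; for the real data, you would point `load()` at wherever you saved the download.

```r
# A toy data frame standing in for the course data (hypothetical content)
toy <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# Save it as an .rda file, then remove it from the workspace
path <- file.path(tempdir(), "toy.rda")
save(toy, file = path)
rm(toy)

# load() restores the object and returns the names of what it restored
loaded <- load(path)
loaded
str(toy)
```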
Week 3: Linear models and non-linear effects
Hermansen 2023, ch. 7-9, p. 123-194; Gelman 2007, ch 3-4, p. 29-79; Berry 2012, p 653-671
We will spend the week familiarizing ourselves with non-linear effects in linear models (interaction effects) and with how to interpret model results.
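To illustrate what an interaction effect means for interpretation, here is a small sketch on simulated data (all variable names and coefficient values are made up for the example). With an interaction term, the effect of `x` is no longer a single number: it depends on the value of the moderator `z`.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
z <- rbinom(n, 1, 0.5)                      # binary moderator
y <- 1 + 0.5 * x + 1 * z + 1.5 * x * z + rnorm(n)

fit <- lm(y ~ x * z)                        # expands to x + z + x:z
b <- coef(fit)

# Marginal effect of x at each value of z: b_x + b_xz * z
effect_z0 <- b[["x"]]
effect_z1 <- b[["x"]] + b[["x:z"]]
c(z0 = effect_z0, z1 = effect_z1)

# The standard error of the effect at z = 1 needs the covariance term too
V <- vcov(fit)
se_z1 <- sqrt(V["x", "x"] + V["x:z", "x:z"] + 2 * V["x", "x:z"])
se_z1
```

Reporting the marginal effect with its uncertainty at meaningful values of the moderator is usually more informative than reading the interaction coefficient alone.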
Notebooks:
The data we will be working on are a subset of the replication data for “Blurred Lines between electoral and parliamentary representation: The use of constituency staff among Members of the European Parliament” European Union Politics (2023).
You can download the data here: MEP2014.rda
When data is structured
Week 4-5: Hierarchical/multilevel models
Gelman and Hill (2007), ch 11-13, p. 235-299
We start by going through the assumptions of the linear model in order to transition to instances where observations share common characteristics (they are not i.i.d.). We then go through the opportunities offered by hierarchical models: varying intercepts, varying slopes, 2-level regression and how these models pool information.
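The way a varying-intercept model pools information can be sketched on simulated data. The example assumes the `nlme` package (shipped with standard R installations) rather than whichever package the course notebooks use, and all names and values are invented for illustration. It compares each group's own mean (no pooling) with the model's group intercepts (partial pooling).

```r
library(nlme)

set.seed(2)
J <- 20
n_j <- sample(2:30, J, replace = TRUE)      # unequal group sizes
g <- factor(rep(seq_len(J), times = n_j))
a <- rnorm(J, mean = 0, sd = 1)             # true group deviations
y <- 2 + a[as.integer(g)] + rnorm(length(g), sd = 2)
d <- data.frame(y = y, g = g)

# No pooling: each group's own mean
no_pool <- tapply(d$y, d$g, mean)

# Partial pooling: varying-intercept model
fit <- lme(y ~ 1, random = ~ 1 | g, data = d)
mu <- fixef(fit)[["(Intercept)"]]
part_pool <- coef(fit)[names(no_pool), "(Intercept)"]

# Each estimate is pulled toward the overall mean, small groups the most
data.frame(n = n_j, no_pool = as.numeric(no_pool), part_pool = part_pool)
```

Groups with few observations borrow more strength from the rest of the data, which is why their partially pooled intercepts sit closer to the overall mean `mu`.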
Slides:
- Day 1: Assumptions of the linear model and grouped variation
- Day 2-3: Overview of hierarchical models
Problem sets:
- Problem set 1: Assumptions of the linear model
- Problem set 2: Within- and between-group variation
R-notebook:
You can download the data I use to exemplify linear assumptions (MEP2014.rda) and hierarchical structures (MEP.rda) here.
When outcomes are discrete
Week 6: Binary outcomes
Ward (2018), ch. 3, p. 43-78; ch. 6, p. 119-132
We will be working on how to model binary outcomes using a logit model. Our focus is on two different ways of understanding the logit model: as a regression on a latent variable or as a regression on a recoded dependent variable (the log-odds).
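Both readings of the logit model can be sketched on simulated data (names and parameter values are invented for the example). In the latent-variable view, we generate an unobserved continuous variable with logistic noise and only observe whether it crosses zero. In the log-odds view, the model's linear predictor is the logarithm of the odds p / (1 - p).

```r
set.seed(3)
n <- 2000
x <- rnorm(n)

# Latent-variable view: y* = a + b * x + e, with e ~ standard logistic;
# we only observe whether y* is above zero
y_star <- -0.5 + 1.2 * x + rlogis(n)
y <- as.integer(y_star > 0)

fit <- glm(y ~ x, family = binomial(link = "logit"))
coef(fit)   # estimates of a and b from the latent equation

# Log-odds view: the linear predictor equals log(p / (1 - p))
eta <- predict(fit, type = "link")
p <- predict(fit, type = "response")
all.equal(unname(log(p / (1 - p))), unname(eta))
```

The same fitted model supports both interpretations; which one is more useful depends on the question being asked.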
The data I use as an example are a subset of the replication data for “Blurred Lines between electoral and parliamentary representation: The use of constituency staff among Members of the European Parliament” European Union Politics (2023).
You can download the data here: MEP2016.rda
Slides:
Problem set:
- Problem set 3: Intuitions from the binomial logistic model
R-notebook:
Week 7-8: Categorical outcomes
Ward 2018, ch. 8-9, p. 141-189
The data I use as an example are drawn from the European Social Survey.
You can download the data here: kap6.rda and kap10.rda
Slides:
Problem set:
R-notebook:
Week 10: Count outcomes
Ward 2018, ch. 10, p. 190-216
Slides:
- Day 1: Event count models
Problem set:
- Problem set 5: Interpretation and estimation of Poisson regression
Suggested supplementary reading: Gelman and Hill (2007), ch 6.2 p. 110-116
We’ll use yet another data set on MEPs; this time on the number of legislative proposals they handle during their tenure. You can download the data here: df_yoshinaka.rda
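As a rough sketch of how a Poisson regression is estimated and read, here is an example on simulated counts. The variables `seniority` and `proposals` are hypothetical and are not taken from df_yoshinaka.rda.

```r
set.seed(4)
n <- 1000
seniority <- rnorm(n)                       # hypothetical predictor
proposals <- rpois(n, lambda = exp(0.3 + 0.4 * seniority))

fit <- glm(proposals ~ seniority, family = poisson(link = "log"))
summary(fit)$coefficients

# Coefficients are on the log scale; exponentiating gives rate ratios,
# i.e. the multiplicative change in the expected count per unit of x
exp(coef(fit))
```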
Notebook:
Week 11: Duration outcomes
Ward (2018), ch. 11, p. 190-216
Slides:
We will be working with a classic dataset for duration models, from Box-Steffensmeier’s 1996 study of candidates’ campaign funding and the entry of challengers. You can find the R-version of the data for the notebook here: warchest.rda
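To give a feel for what a duration model looks like in R, here is a sketch on simulated data. It assumes the `survival` package (shipped with standard R installations) and a Cox proportional hazards model, which is one common choice and not necessarily the specification used in the notebook. The variable `funds` is hypothetical and not taken from warchest.rda.

```r
library(survival)

set.seed(5)
n <- 500
funds <- rnorm(n)                            # hypothetical covariate
time <- rexp(n, rate = exp(-0.5 * funds))    # higher funds, lower hazard
cens <- rexp(n, rate = 0.3)                  # independent censoring times

obs_time <- pmin(time, cens)                 # what we actually observe
event <- as.integer(time <= cens)            # 1 = event, 0 = censored

fit <- coxph(Surv(obs_time, event) ~ funds)
exp(coef(fit))   # hazard ratio per unit of funds
```

A hazard ratio below 1 means that higher values of the covariate are associated with a lower risk of the event at any given time, and therefore longer durations.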
Notebook:
Week 12: Event count and history outcomes: former exams
We’ll be working together on former exams this week. You can find the exam questions and replication data here. You can also find some excellent former student hand-ins for inspiration on Absalon.
Event count models (Scharpf, 2020):
- Portfolio questions
- Replication data: df_scharpf
- Article: Scharpf, 2020
Event history models (Fortna, 2015):
- Portfolio questions
- Replication data: fortna.rda
- Article: Fortna, 2015
Week 14: Missing data
Our data frequently contain missing observations. This week, we take the time to explore what kind of missing observations we have and whether they may bias our results. It is useful to take an active approach to missing information rather than letting the software silently drop incomplete cases. We will be using the MEP.rda data on the European Parliament.
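The following sketch, on simulated data with invented names, shows a first inspection of missingness and why it matters. Here low values of `y` are deliberately made more likely to be missing, so that the complete-case mean drifts away from the mean we would have seen with full data.

```r
set.seed(6)
n <- 5000
x <- rnorm(n)
y <- 1 + x + rnorm(n)

# Make low values of y more likely to be missing (missingness depends on y)
y_obs <- ifelse(runif(n) < plogis(-1.5 * y), NA, y)
d <- data.frame(x = x, y = y_obs)

colSums(is.na(d))             # missing values per variable
mean(complete.cases(d))       # share of fully observed rows

# Complete-case mean versus the mean without any missingness
c(complete_case = mean(d$y, na.rm = TRUE), full = mean(y))
```

With real data we never observe the `full` column, which is why reasoning about the missingness mechanism is necessary before choosing a remedy.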
Slides:
Notebook:
Complete syllabus
Please familiarize yourself with the syllabus.
Course plan
| Week | Topic | Date | Reading |
|---|---|---|---|
| 1 | Introduction to R as statistical software | 05.02; 07.02 | Hermansen 2023, ch. 1-4, p. 19-70 |
| 2 | Descriptive statistics and graphical display | 12.02; 14.02 | Hermansen 2023, ch. 5-6, p. 73-119 |
| 3 | Linear regression | 19.02; 21.02 | Hermansen 2023, ch. 7-9, p. 123-194 |
| | | | Gelman 2007, ch. 3-4, p. 29-79 |
| | | | Berry 2012 |
| | | | King 2000 (supplementary reading) |
| 4-5 | Hierarchical data structures | 26.02; 28.03 | Gelman 2007, ch. 11-12, p. 235-278 |
| | | 04.03; 06.03 | Gelman 2007, ch. 13, p. 279-300 |
| | | | Gelman 2007, ch. 14-15, p. 301-342 (supplementary reading) |
| 6 | Binary outcomes (logistic regression) | 11.03; 13.03 | Ward 2018, ch. 3, p. 43-78 |
| | | | Ward 2018, ch. 6, p. 119-132 |
| | | | Gelman 2007, ch. 6, p. 109-134 (supplementary reading) |
| 7-8 | Categorical outcomes (multinomial and ordered logistic regression) | 18.03; 20.03; 03.04 | Ward 2018, ch. 8-9, p. 141-189 |
| | Assignment 1 is given | 03.04 | |
| 9 | Workshop week | 08.04; 10.04 | Assignment 1 presentation, Assignment helpdesk, Dynamic reporting |
| | Assignment due (optional) | 14.04 | |
| 10-11 | Count outcomes (Poisson, negative binomial and hurdle models) | 15.04; 17.04 | Linear Digressions: podcast on the Poisson distribution |
| | | | Ward 2018, ch. 10, p. 190-216 |
| | | | Gelman 2007, ch. 6, p. 109-134 (supplementary reading) |
| 11-12 | Event history data (survival models) | 22.04; 24.04; 29.04 | Ward 2018, ch. 11, p. 190-216 |
| | Assignment 2 is given | 29.04 | |
| 13 | Workshop week | 06.05; 08.05 | Assignment 2 presentation, Assignment helpdesk, shiny app |
| 14 | Missing data | 06.05; 13.05 | Ward 2018, ch. 12, p. 249-270 |
| | | | Gelman 2007, ch. 25, p. 529-545 |
| | Deadline portfolio exam | 01.06 | |
Literature
Berry, William D., Matt Golder, and Daniel Milton. 2012. “Improving Tests of Theories Positing Interaction.” The Journal of Politics 74 (3): 653–71.
Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Leiden: Cambridge University Press.
Hermansen, Silje Synnøve Lyder. 2023. R i praksis - en introduktion for samfundsvidenskaberne. 1st ed. Copenhagen: DJØF Forlag.
King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 341–55.
Ward, Michael D., and John S. Ahlquist. 2018. Maximum Likelihood for Social Science: Strategies for Analysis. Analytical Methods for Social Research. Cambridge: Cambridge University Press.