Statistics in general – and logit models in particular – are based on comparisons. All statements are made in comparison to a reference group. This is true for binomial logistic regression (for binary outcomes), where all observations share a single reference category. When outcomes have several categories, we still play the comparison game, but the reference group works differently. This problem set aims to help you see what implications this has for the interpretation of the models and to give you an intuition for what goes on “under the hood” when the models are estimated.
We will be working on the same data from the European Social Survey (ESS) that is used in chapter 6 of Hermansen (2023). You can download it from my website: kap6.rda
download.file("https://siljehermansen.github.io/teaching/beyond-linear-models//kap6.rda",
              destfile = "kap6.rda")
load("kap6.rda")

# Rename the Danish variable names to English
library(dplyr)
df <- kap6 %>%
  mutate(Education = Uddannelse,
         Income = Indtaegt,
         Female = Kvinde,
         Burden = Belastning,
         Immigrant = Innvandrer)
Be prepared to share on padlet: https://padlet.com/siljesynnove/beyond-linear-regression-yxuvstrvfvjia47y
Many outcomes of interest to political scientists are categorical and cannot immediately be ordered (i.e. they are nominal). These outcomes are often the result of a choice, based on some (latent) preference held by the chooser. Party choice among voters is the most typical example. We can use the data to estimate the utility that choosers derive from different options simply by looking at the options they prefer. Here, we use the multinomial logistic model (Ward and Ahlquist 2018, ch 9).
Let’s formulate the hypothesis that voters’ income has a bearing on their choice of party because parties differ in the amount of redistribution they are willing to support.
Before fitting the model, make sure the outcome (party choice) is a factor with a sensible reference category. Useful functions: `as.factor()` and `relevel()`.
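A minimal sketch of how you might set the reference category (assuming the outcome is stored as `df$Party` and that the party labels include "DF"; check the actual level names in your data first):

df$Party <- as.factor(df$Party)
levels(df$Party)                           # inspect the available party labels
df$Party <- relevel(df$Party, ref = "DF")  # "DF" is just an example; pick your own reference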
library(nnet)
mod.cat <- multinom(Party ~ Education + Income, df)
The model effectively estimates the effect of income at equal levels of education. Which party attracts the voters with the highest/lowest income? In your opinion, does our knowledge of respondents’ income help us understand how voters discriminate between parties?
Interpret the effects by exponentiating the slope coefficients (i.e. as relative risk ratios). Which parties are similar to your reference category with respect to the effect of income? For the party pairs that are significantly different, what is the effect of income on the respondents’ choice of party?
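One way to get there (a sketch, using the `mod.cat` object fitted above):

# Exponentiated slope coefficients: one row per non-reference party
exp(coef(mod.cat))

# Rough z-tests: does each coefficient differ significantly from the reference party?
z <- summary(mod.cat)$coefficients / summary(mod.cat)$standard.errors
p <- 2 * pnorm(abs(z), lower.tail = FALSE)
p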
Make predictions for a scenario where a respondent is rich (Income = 9) by filling in the equation for Dansk Folkeparti (DF), Venstre (V) and Socialistisk Folkeparti (SF). What do you find?
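A sketch of the by-hand calculation (assuming Education enters the model as a numeric variable; I hold it at its sample mean, which is my own choice). The row names of the coefficient matrix tell you which party each equation belongs to:

b <- coef(mod.cat)                              # one row per non-reference party
x <- c(1, mean(df$Education, na.rm = TRUE), 9)  # intercept, Education at its mean, Income = 9

# Linear predictor for each non-reference party, then the multinomial (softmax) transformation:
# Pr(m) = exp(x'b_m) / (1 + sum_k exp(x'b_k))
eta <- b %*% x
probs <- exp(eta) / (1 + sum(exp(eta)))
probs
1 - sum(probs)                                  # probability of the reference party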
Re-estimate your model with a new reference category. What happened?
Based on your new model fit, make predictions for the same scenarios as in question 1e. What did you find? Why?
Fit an intercept-only model for party choice using the `multinom()` function from the nnet package. Be mindful of the reference level you pick.
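A minimal sketch (assuming `Party` is the releveled factor from earlier):

library(nnet)
mod.0 <- multinom(Party ~ 1, df)   # intercept-only: one intercept per non-reference party
summary(mod.0)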
Calculate the regression coefficients in an intercept-only model by hand. You might use my code from the lecture slides for inspiration.
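One possible approach (a sketch): in an intercept-only model, each intercept equals the log of a party’s vote share divided by the reference party’s vote share.

p <- prop.table(table(df$Party))   # observed vote shares

# Log-odds of each party relative to the reference category (the first factor level)
log(p / p[1])

# Compare with the estimated intercepts
coef(mod.0)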
Calculate the parameters of a multinomial model of party choice with a binary predictor by hand. The data contains a predictor that flags the respondent’s gender (`Kvinde`/`Female`).
\[Pr(m_j) \sim log(odds)\] \[log(odds) = a_m + b_m \times Female_i\]
* use the code from exercise a, but subset the data according to the value of the predictor. That is, create two data frames, one for all observations with `Female == 0` (`df0`) and one for `Female == 1` (`df1`).
* calculate the odds of voting for each of the m-1 parties (be mindful of your reference category) in each of the two data frames (`df0` and `df1`).
* slope:
    - calculate the odds ratio by dividing the odds among female respondents (`Female == 1`; `df1`) by the odds among the other respondents (`Kvinde == 0`; `df0`).
    - log-transform the odds ratio.
* intercept: log-transform the odds among respondents with `Female == 0`. A sketch of these steps in R follows after this list.
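Here is one way the steps could look (a sketch, assuming `Party` is a factor whose first level is the reference and that the gender dummy is called `Female` after the renaming above):

# Split the data by the value of the predictor
df0 <- df[df$Female == 0, ]
df1 <- df[df$Female == 1, ]

# Vote shares and odds relative to the reference party in each subset
p0 <- prop.table(table(df0$Party)); odds0 <- p0 / p0[1]
p1 <- prop.table(table(df1$Party)); odds1 <- p1 / p1[1]

a <- log(odds0)           # intercepts: log-odds when Female == 0
b <- log(odds1 / odds0)   # slopes: log of the odds ratio (women vs. men)

cbind(a, b)               # compare with coef(multinom(Party ~ Female, df))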
Compare your results to the regression coefficients from the multinomial model estimated with the `multinom()` function.
The data contains an ordinal variable where the respondents are asked to what extent they consider immigration to be a burden to Danish society (`Belastning`/`Burden`). Here, we use the ordered logit model (Ward and Ahlquist 2018, ch 8).
Recode the variable in R into an ordinal variable. To do so, you can use the `as.factor()` and `ordered()` functions. Store the result in a new variable (e.g. `Burden_ord`).
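A minimal sketch (assuming the renamed `Burden` variable is numeric, so the categories come out in the right order; always check the levels):

df$Burden_ord <- ordered(as.factor(df$Burden))
levels(df$Burden_ord)   # check that the categories are ordered as intended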
Fit an ordinal model where agreement to the statement is a function of the respondent’s income, education and immigrant background.
library(MASS)
mod.ord <- polr(Burden_ord ~ Income + Education + Immigrant,
                data = df,
                Hess = TRUE)
Present the results in stargazer. You can use the argument `ord.intercepts =` to display the intercepts/cutvalues.
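For example (a sketch; `type = "text"` simply prints a plain-text table to the console):

library(stargazer)
stargazer(mod.ord,
          ord.intercepts = TRUE,   # report the cutpoints alongside the slopes
          type = "text")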
What is the marginal effect of income? Can you formulate a sentence that states the finding?
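A sketch of one way to start: `coef()` on a polr object returns only the slope coefficients (not the cutpoints), so exponentiating gives odds ratios.

exp(coef(mod.ord))["Income"]   # multiplicative change in the odds of reporting a higher
                               # burden category for each additional unit of income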
In the ordinal model, the cutpoints (where one category transitions to another) can be understood as intercepts. Specifically, the slope coefficients are estimated by comparing all observations above the cutpoint with all observations below the cutpoint.
\[Pr(y < \tau) \sim log(odds)\] \[log(odds) = a - bx\]
Note that the slope parameter is subtracted from the intercept!
* set a fixed scenario for each of the predictors.
* calculate the log-odds by filling in the equation for each of the intercepts/cutpoints, holding the scenario constant.
* back-transform from log-odds to probability by using the logistic function (a sketch of these steps in R follows after the equation below):
\[Pr(y < \tau) = \frac{exp(logodds)}{1 + exp(logodds)}\]
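A sketch of the calculation, assuming numeric predictors and a scenario with Income = 9 and the other predictors set to 0 (an arbitrary choice on my part; pick your own scenario):

b <- coef(mod.ord)      # slopes (Income, Education, Immigrant)
tau <- mod.ord$zeta     # cutpoints ("0|1", "1|2", ...)

x <- c(Income = 9, Education = 0, Immigrant = 0)   # the fixed scenario

# Cumulative log-odds at each cutpoint: log(odds) = tau_k - x'b
logodds <- tau - sum(b * x)

# Back-transform to cumulative probabilities Pr(y < tau_k)
exp(logodds) / (1 + exp(logodds))   # equivalently: plogis(logodds)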
Using the same scenario, calculate the probability that an observation falls between cutpoints “0|1” and “1|2” by subtracting the cumulative probability at one cutpoint from the cumulative probability at the other.
Check your results by having R calculate the predicted probabilities. Remember to keep your scenario constant.
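For example (a sketch; `newdata` must contain the same scenario you used by hand, with predictors coded the same way as in the model):

newdata <- data.frame(Income = 9, Education = 0, Immigrant = 0)
predict(mod.ord, newdata = newdata, type = "probs")   # one probability per category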
Calculate the cumulative sum of the probabilities using the `cumsum()` function: `predict(mod.ord, newdata, type = "probs") %>% cumsum()`. What is the difference?
Estimate the effect of income on the perception that immigrants are a burden. Include the controls for education and immigrant background as before.
We will check the two assumptions of the model: the parallel lines assumption (i.e. the slope is approximately the same regardless of the threshold/cutpoint it passes) and that the categories are indeed ordered and distinguishable from each other.
Create 10 binary variables (one for each of the cutpoints in the “Burden” variable), then run 10 binomial regressions. You might take inspiration from the slides or from the following code.
# Binomial logit for the first cutpoint: Pr(Burden < 1)
mod <- glm(Burden < 1 ~ Income + Education + Immigrant,
           data = df, family = "binomial")
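To avoid typing ten nearly identical calls, here is one possible sketch (assuming Burden runs from 0 to 10, so the cutpoints are 0|1 up to 9|10):

# Fit one binomial logit per cutpoint: Pr(Burden < k) for k = 1, ..., 10
mods <- lapply(1:10, function(k) {
  glm(Burden < k ~ Income + Education + Immigrant,
      data = df, family = "binomial")
})
names(mods) <- paste0(0:9, "|", 1:10)   # label each model by its cutpoint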
Display the 10 binary models and the ordinal model from the previous exercise in a stargazer() results table. Are the regression coefficients very different?
Create a coefplot of the coefficient on income from each of the 10 models. To do so, you will want to make a small data frame with one observation per model and variables reporting the slope coefficient, its standard error and the cutvalue associated with each model. What do you see?
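One way to build the plot (a sketch, using the `mods` list from the previous sketch and ggplot2):

library(ggplot2)

# One row per model: the income coefficient, its standard error and the cutpoint
coefs <- data.frame(
  cut = names(mods),
  b   = sapply(mods, function(m) unname(coef(m)["Income"])),
  se  = sapply(mods, function(m) summary(m)$coefficients["Income", "Std. Error"])
)

ggplot(coefs,
       aes(x = cut, y = b, ymin = b - 1.96 * se, ymax = b + 1.96 * se)) +
  geom_pointrange() +
  labs(x = "Cutpoint", y = "Coefficient on Income (log-odds)")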
Create a similar coefplot for the intercepts/cutpoints of the models. There is no need to back-transform them.