Statistics in general, and logit models in particular, are based on comparisons. Every statement is made relative to a reference group. Since all the GLMs in this class draw from a probability distribution in the exponential family and use a nonlinear link function, the effect size of one variable on the predicted outcome also depends on the values of the other variables in the regression. These exercises are intended to help you see that.

We will be working on the likelihood that Members of the European Parliament (MEPs) pool resources. The dependent variable PoolsLocal reports whether an MEP shares at least one constituency-based (local) assistant with a fellow party member. The hypothesis is that MEPs who compete in candidate-centered systems (OpenList == 1) have fewer incentives to share resources with party colleagues, since they compete against each other during the election (i.e. there is intra-party competition).

The first exercise focuses on interpreting the results of a model that has already been estimated. The second exercise lets you see how those estimates are calculated.

Exercise 1: Predicted point estimates/first difference (text)

What is the predicted effect of changing electoral system on MEPs’ propensity to share local assistants …

In Bulgaria (LaborCost == 4.4), when the party is small (SeatsNatPal.prop == 0.1).
In Denmark (LaborCost == 42), when the party is small (SeatsNatPal.prop == 0.1).
Is this a realistic set of scenarios?

Compare the two predicted probabilities for each pair of scenarios by calculating the first difference (Ward and Ahlquist (2018), ch 3; Hermansen (2023), ch 7 and 10).

Be prepared to share on padlet: https://padlet.com/siljesynnove/what-is-the-predicted-effect-of-going-from-party-centered-to-f63o5ohmr3lts4d4

**MEPs’ propensity to share local assistants (a binomial logit)**

| | Dependent variable: PoolsLocal |
|---|---|
| OpenList | -1.124*** (0.181) |
| SeatsNatPal.prop | -1.930*** (0.527) |
| LaborCost | 0.056*** (0.009) |
| Constant | -1.094*** (0.286) |
| Observations | 686 |
| Log Likelihood | -392.832 |
| Akaike Inf. Crit. | 793.665 |

Note: Standard errors in parentheses. *p<0.1; **p<0.05; ***p<0.01
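As a rough sketch of how the point estimates could be turned into predicted probabilities and first differences, the R code below plugs the coefficients from the regression table into the inverse logit. The helper names `predict_prob()` and `first_diff()` are hypothetical (they are not part of the course material), and the sketch assumes that OpenList == 0 denotes a party-centered system. It only uses point estimates, so it says nothing about the uncertainty around the predictions.

```r
# Coefficients copied from the regression table (point estimates only)
b <- c(Constant = -1.094, OpenList = -1.124,
       SeatsNatPal.prop = -1.930, LaborCost = 0.056)

# Hypothetical helper: predicted probability for one scenario
predict_prob <- function(open_list, seat_share, labor_cost) {
  logodds <- b[["Constant"]] +
    b[["OpenList"]] * open_list +
    b[["SeatsNatPal.prop"]] * seat_share +
    b[["LaborCost"]] * labor_cost
  plogis(logodds)  # inverse logit: exp(x) / (1 + exp(x))
}

# Hypothetical helper: first difference, i.e. open list minus closed list,
# holding the other variables fixed at the scenario values
first_diff <- function(seat_share, labor_cost) {
  predict_prob(1, seat_share, labor_cost) -
    predict_prob(0, seat_share, labor_cost)
}

fd_bulgaria <- first_diff(seat_share = 0.1, labor_cost = 4.4)
fd_denmark  <- first_diff(seat_share = 0.1, labor_cost = 42)
round(c(Bulgaria = fd_bulgaria, Denmark = fd_denmark), 3)
```

The coefficient on OpenList is the same in both scenarios, yet the two first differences are not: the change in probability depends on where on the logit curve the other variables place the observation.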

Exercise 2: The recoding of the dependent variable (binomial logit)

To obtain a continuous and unbounded dependent variable on which to run a regression, we recode the 0s and 1s. We do this by making comparisons between observations that have a successful outcome (1s) and those that have a failure (0s). That is, we sum over the number of successes and the number of failures, then compare them. All the regression coefficients are in fact the result of such comparisons.
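The recoding described above can be written out as follows. The symbols \(n_1\) (number of successes) and \(n_0\) (number of failures) are notation introduced here for illustration only.

```latex
\[
p = \frac{n_1}{n_1 + n_0}, \qquad
\text{odds} = \frac{p}{1 - p} = \frac{n_1}{n_0}, \qquad
\text{logodds} = \log\left(\frac{n_1}{n_0}\right)
\]
```

The probability is bounded between 0 and 1, the odds range from 0 to infinity, and the logodds are unbounded in both directions, which is what makes them usable as the left-hand side of a regression.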

In this exercise, you will calculate the regression coefficients of two simple binomial logistic regressions. Remember that the intercept is reported in logodds and each slope coefficient as a change in logodds (the log of an odds ratio).

Download the data from my website: https://siljehermansen.github.io/teaching/beyond-linear-models/MEP2016.rda

Exercise 2a: Base-line model without predictors.

We’ll start out by calculating the intercept in an intercept-only model. \(y\) reports the logodds that MEPs share resources.

\(y_i = \alpha\)

In an intercept-only model, there is no change, so the only comparison is between the number of successes and failures in the data.

1. Calculate the intercept

Calculate the probability, then the odds, then the logodds that an MEP will pool resources.

2. Run a binomial regression with only an intercept.

How do your logodds compare with the regression coefficients?

# Intercept-only logit; df holds the MEP data downloaded above
mod0 <- glm(PoolsLocal ~ 1,
            family = "binomial",
            data = df)
summary(mod0)
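The two steps above can be sketched on made-up numbers before trying them on the real data. The vector `y` below is a hypothetical toy outcome (30 successes, 70 failures), not the MEP data. With the course data, one would presumably replace it with `df$PoolsLocal`, assuming that variable is coded 0/1 without missing values.

```r
# Hypothetical toy outcome: 30 successes (1s) and 70 failures (0s)
y <- c(rep(1, 30), rep(0, 70))

p       <- mean(y)        # probability of success
odds    <- p / (1 - p)    # successes relative to failures
logodds <- log(odds)      # what the logit intercept reports

# Intercept-only logit on the same toy outcome
toy0 <- glm(y ~ 1, family = "binomial")

# The hand calculation and the estimated intercept should coincide
c(by_hand = logodds, glm = unname(coef(toy0)))
```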

Exercise 2b: Model with one predictor

Now, let’s extend the analysis with a predictor. We want to describe the likelihood that an MEP will share at least one local assistant as a function of the electoral system.

\(y_i = \alpha + \beta x_i\)

In a model with predictors, we add a second comparison between the groups. In this example, we have a binary predictor (OpenList).

1. Calculate the slope parameter.

First, divide the data in two according to the values of the predictor. Second, calculate the odds of sharing assistants in each group. Third, calculate the ratio between the two odds. Fourth, logtransform.

2. Calculate the intercept. The intercept (\(\alpha\)) is defined as the value of \(y\) when all the \(x\)s are 0. In our model, it reports the logodds of sharing assistants when OpenList == 0.

3. Run the model in R and compare.

# Logit with one binary predictor; df holds the MEP data downloaded above
mod1 <- glm(PoolsLocal ~
              OpenList,
            family = "binomial",
            data = df)
summary(mod1)
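The three steps above can be sketched on made-up numbers first. The data frame `toy` and the helper `odds()` are hypothetical and only serve as illustration. With the course data, `x` would correspond to OpenList and `y` to PoolsLocal.

```r
# Hypothetical toy data: binary predictor x, binary outcome y
toy <- data.frame(
  x = c(rep(0, 100), rep(1, 100)),
  y = c(rep(1, 40), rep(0, 60),    # x == 0: 40 successes, 60 failures
        rep(1, 20), rep(0, 80))    # x == 1: 20 successes, 80 failures
)

# Hypothetical helper: odds of success in a 0/1 vector
odds <- function(v) mean(v) / (1 - mean(v))

odds0 <- odds(toy$y[toy$x == 0])   # odds in the reference group
odds1 <- odds(toy$y[toy$x == 1])   # odds in the comparison group

alpha <- log(odds0)                # intercept: logodds when x == 0
beta  <- log(odds1 / odds0)        # slope: log of the odds ratio

# Same model estimated by glm
toy1 <- glm(y ~ x, family = "binomial", data = toy)

# Hand calculations and estimated coefficients should coincide
rbind(by_hand = c(alpha, beta), glm = unname(coef(toy1)))
```

Because the predictor is binary, the model has one parameter per group, so the estimates reproduce the group-wise odds exactly (up to numerical tolerance).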

Literature

Hermansen, Silje Synnøve Lyder. 2023. R i praksis - en introduktion for samfundsvidenskaberne. 1st ed. Copenhagen: DJØF Forlag. https://www.djoef-forlag.dk/book-info/r-i-praksis.

Ward, Michael D., and John S. Ahlquist. 2018. Maximum Likelihood for Social Science: Strategies for Analysis. Analytical Methods for Social Research. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316888544.

Problem set: Logistic regression

Silje Synnøve Lyder Hermansen

2023-03-04
