Statistics in general, and logit models in particular, are based on comparisons. All statements are made by comparison to a reference group. Since the GLMs in this class model outcomes from distributions in the exponential family through a non-linear link function, the effect size of one variable (on the scale of the outcome, e.g., a probability) depends on the values of the other variables in the regression. These exercises are intended to help you see that.
We will be working on the likelihood that Members of the European Parliament (MEPs) will pool resources. The dependent variable `PoolsLocal` reports whether an MEP shares at least one constituency-based (local) assistant with a fellow party member. The hypothesis is that MEPs who compete in candidate-centered systems (`OpenList == 1`) will have fewer incentives to share resources with party colleagues, since they compete against each other during the election (i.e., there is intra-party competition).
The first exercise focuses on interpreting the results from a model that has already been estimated (Ward and Ahlquist (2018), ch. 3; Hermansen (2023), ch. 7 and 10). The second exercise lets you see how those estimates are calculated.
Be prepared to share your answers on Padlet: https://padlet.com/siljesynnove/blr
Fit a model with the following predictors.
```r
mod <- glm(PoolsLocal ~
             # Candidate-centered system (x = 1) vs party-centered (x = 0)
             OpenList
           # Party size: share of seats in national parliament
           + SeatsNatPal.prop
           # Cost of hiring an assistant in member state
           + LaborCost,
           # Recoding strategy (logit transformation) and binomial probability distribution
           family = binomial(link = "logit"),
           # Data
           data = df)
```
What is the marginal effect of changing the electoral system?
What is the predicted probability of sharing assistants when `OpenList` (\(x_1\)) is 0 and 1, respectively?
To do so, fill in the equation to get the logodds (\(z\)), where \(x_1\) is `OpenList`, \(x_2\) is `SeatsNatPal.prop`, and \(x_3\) is `LaborCost`:
\[z = -1.09 - 1.12x_1 - 1.93x_2 + 0.06x_3\]
Then, use the logistic transformation to back-transform the logodds into a probability:
\[p = \frac{e^z}{1+e^z} = \frac{\exp(z)}{1+\exp(z)}\]
Compare the two predicted probabilities for each pair of scenarios by calculating the first difference.
| | Dependent variable: PoolsLocal |
|:--|:--|
| OpenList | -1.124*** |
| | (0.181) |
| SeatsNatPal.prop | -1.930*** |
| | (0.527) |
| LaborCost | 0.056*** |
| | (0.009) |
| Constant | -1.094*** |
| | (0.286) |
| Observations | 686 |
| Log Likelihood | -392.832 |
| Akaike Inf. Crit. | 793.665 |

Note: \*p<0.1; \*\*p<0.05; \*\*\*p<0.01
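As a minimal sketch of the calculation, the R code below plugs the coefficients from the regression table into the linear predictor and back-transforms with \(e^z/(1+e^z)\). The values for labor cost and party size are hypothetical placeholders, not taken from the data; in practice you would pick meaningful values such as the sample mean or median. Computing the first difference at two different party sizes shows that the effect of `OpenList` on the probability depends on where the other variables are held.

```r
# Coefficients from the regression table above
b0 <- -1.094  # Constant
b1 <- -1.124  # OpenList
b2 <- -1.930  # SeatsNatPal.prop
b3 <- 0.056   # LaborCost

# Predicted probability for one scenario
pred_prob <- function(open_list, seats, labor) {
  z <- b0 + b1 * open_list + b2 * seats + b3 * labor  # logodds
  exp(z) / (1 + exp(z))                               # logistic back-transformation
}

# Hypothetical values for the control variables (NOT taken from the data)
labor       <- 20
small_party <- 0.05
large_party <- 0.40

# First difference (OpenList = 1 minus OpenList = 0) for a small party
fd_small <- pred_prob(1, small_party, labor) - pred_prob(0, small_party, labor)

# The same comparison for a large party
fd_large <- pred_prob(1, large_party, labor) - pred_prob(0, large_party, labor)

c(small = fd_small, large = fd_large)
```

The coefficient on `OpenList` is identical in both scenarios, yet the two first differences are not, because the logistic curve is steeper near \(p = 0.5\) than in the tails.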
To obtain a continuous and unbounded dependent variable on which to run a regression, we recode the 0s and 1s. We do this by comparing observations that have a successful outcome (1s) with those that have a failure (0s). That is, we count the number of successes and the number of failures, then compare them as odds. All the regression coefficients are in fact the result of such comparisons.
In this exercise, you will calculate the regression coefficients of two simple binomial logistic regressions. Remember that the intercept is reported in logodds and the slope coefficients as changes in logodds (i.e., log odds ratios).
Download the data from my website: https://siljehermansen.github.io/teaching/beyond-linear-models/MEP2016.rda
We’ll start out by calculating the intercept in an intercept-only model. \(y\) reports whether each MEP shared an assistant or not, while \(z\) reports the logodds that MEPs share resources.
\[y = \alpha\] \[z = \alpha\]
\[z = \text{logit}(p) = \log\left(\frac{p}{1-p}\right)\]
In an intercept-only model, there are no groups to compare, so the only comparison is between the number of successes and failures in the data.
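Written out, this comparison reduces to a ratio of counts. The symbols \(s\) (number of successes) and \(f\) (number of failures) are introduced here for illustration and are not variables in the data:

\[p = \frac{s}{s+f}, \qquad \frac{p}{1-p} = \frac{s/(s+f)}{f/(s+f)} = \frac{s}{f}, \qquad z = \alpha = \log\left(\frac{s}{f}\right)\]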
a. Calculate the intercept

Calculate the probability, then the odds, then the logodds that an MEP will pool resources (`PoolsLocal`). To do so, you will have to calculate the number of MEPs that share resources (successes) and the number of MEPs that don't share (failures).
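A minimal sketch of these steps in R. To keep it self-contained, it uses a small made-up vector `y` rather than the MEP data; with the real data you would use `df$PoolsLocal` instead (assuming the data frame is called `df`, as in the model code above).

```r
# Toy 0/1 outcome (made up for illustration, not the MEP data)
y <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)

successes <- sum(y == 1, na.rm = TRUE)  # MEPs that pool resources
failures  <- sum(y == 0, na.rm = TRUE)  # MEPs that do not

p       <- successes / (successes + failures)  # probability
odds    <- p / (1 - p)                         # equals successes / failures
logodds <- log(odds)                           # logit(p): the intercept

c(p = p, odds = odds, logodds = logodds)
```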
b. Run a binomial regression with only an intercept.

How do your logodds compare with the estimated intercept?

```r
mod0 <- glm(PoolsLocal ~ 1,
            family = "binomial",
            data = df)
```
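The same check can be sketched on made-up data: in an intercept-only logit, the estimated intercept should match \(\log(\text{successes}/\text{failures})\) up to the numerical tolerance of `glm()`. The data frame `toy` is hypothetical and only mimics the structure of the MEP data.

```r
# Toy data (made up for illustration, not the MEP data)
toy <- data.frame(PoolsLocal = c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0))

# Intercept-only binomial regression
mod_toy <- glm(PoolsLocal ~ 1, family = "binomial", data = toy)

# Hand-calculated logodds: log(successes / failures)
by_hand <- log(sum(toy$PoolsLocal == 1) / sum(toy$PoolsLocal == 0))

c(glm = unname(coef(mod_toy)), by_hand = by_hand)
```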
Now, let’s expand the analysis to a binary predictor. We want to describe the likelihood that an MEP will share at least one local assistant as a function of the electoral system.
\(y = \alpha + \beta x\)
In a model with predictors, we add a second comparison: between the groups. In this example, we have a binary predictor (`OpenList`).
a. Calculate the odds of pooling resources in each electoral system (`OpenList`).

b. Calculate the intercept.
The intercept (\(\alpha\)) is defined as the value of \(y\) when all the \(x\)s are 0. In our model, it reports the logodds of pooling resources when `OpenList == 0`.
c. Calculate the slope parameter.
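Spelled out with counts, both parameters come straight from a cross-tabulation of `PoolsLocal` by `OpenList`. The symbols \(s_0, f_0\) (successes and failures when `OpenList == 0`) and \(s_1, f_1\) (when `OpenList == 1`) are introduced here for illustration:

\[\alpha = \log\left(\frac{s_0}{f_0}\right), \qquad \alpha + \beta = \log\left(\frac{s_1}{f_1}\right)\]
\[\beta = \log\left(\frac{s_1}{f_1}\right) - \log\left(\frac{s_0}{f_0}\right) = \log\left(\frac{s_1/f_1}{s_0/f_0}\right)\]

The slope is therefore the change in logodds between the two electoral systems, i.e., the log of the odds ratio.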
d. Run the model in R and compare.
Did you find the same thing?
```r
mod1 <- glm(PoolsLocal ~
              OpenList,
            family = "binomial",
            data = df)
```
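To see the whole procedure at once, here is a sketch on a small made-up data set (`toy` is hypothetical, not the MEP data): it computes the odds in each electoral system from a cross-tabulation, derives the intercept and slope by hand, and compares them with `glm()`.

```r
# Toy data (made up for illustration, not the MEP data)
toy <- data.frame(
  PoolsLocal = c(1, 1, 0, 0, 0, 1, 0, 0, 0, 0),
  OpenList   = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Cross-tabulate the outcome by electoral system
tab <- table(OpenList = toy$OpenList, PoolsLocal = toy$PoolsLocal)

# Odds of pooling in each electoral system: successes / failures
odds0 <- tab["0", "1"] / tab["0", "0"]
odds1 <- tab["1", "1"] / tab["1", "0"]

alpha_hand <- log(odds0)          # logodds when OpenList == 0
beta_hand  <- log(odds1 / odds0)  # log odds ratio

mod_toy <- glm(PoolsLocal ~ OpenList, family = "binomial", data = toy)

rbind(by_hand = c(alpha_hand, beta_hand), glm = unname(coef(mod_toy)))
```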