Linear models: Non-linear effects

In this session, we will be exploring how to estimate and interpret non-linear effects in linear models. That is, the predictors may have a non-linear effect on the outcome. If we want to model this, we’ll have to recode the predictor. Later, we’ll see that in GLMs, the outcome variable is recoded, meaning that the relationships between all the predictors and the outcome are nonlinear.

Introduction

Main methodological takeaways

Main substantial take-aways:

the equations R estimates are always linear in the parameters
non-linear effects therefore require recoding and/or additional parameters that alter the shape of the curve
interpretation of non-linear effects requires us to fill in values for the predictors (scenarios)
examples of non-linear effects: log-transformation, quadratic terms, interaction effects
interaction effects are symmetric

I will introduce a few additional R-functions:

recoding
- baseR: I(), x^2, x * z
visualization
- ggplot2: xlim(), labs(), geom_line(), ggtitle()
- ggeffects: more uses
- marginaleffects: plot_slopes()

Political science example

We will be working on some of the insights from Hermansen and Pegan’s (2023) study of election seeking among Members of the European Parliament (MEPs). You can find a short version of the paper in this EUROPP-blog post.

I expect that MEPs draw on their parliamentary allowance to increase their chances of being elected. Some MEPs will seek reelection to the European Parliament, but some of them may also wish for a career in national politics. If there is a sufficient number of such MEPs, I’d expect the MEPs’ hiring practices to covary with the national electoral cycle and the national electoral system as well.

The data

We will work with a subset of the replication data for the article. Our data lists all MEPs present in parliament in the spring semester of 2014.

I start by fetching the R packages we will use from the library.

#Data wrangling
library(dplyr); 
#Visualization
library(ggplot2); library(marginaleffects)
#Simulate + visualize results
library(ggeffects)
#Results table
library(stargazer)

For convenience, we’ll be working with cross-sectional data from 2014. We begin by loading in the data.

If I have an internet connection, I can download the file directly from the class’ website and put it in my working directory.

download.file(url = "https://siljehermansen.github.io/teaching/beyond-linear-models/MEP2014.rda",
              destfile = "MEP2014.rda")

Now, I can load it into R and rename the object to my liking (df).

#Load in
load("MEP2014.rda")

#Rename
df <- MEP2014

Descriptive statistics

Let’s start with the basic descriptive statistics, so that you get some repetition and familiarize yourselves with the data set.

I begin by subsetting the data to the variables I’d like to describe. I also create a vector with a plain-English description of each variable for the table.

df_desc <- df %>%
  #Select relevant variables
  dplyr::select(LocalAssistants,
                NationalCandidateCentered,
                OpenList,
                ProxNatElection,
                LaborCost,
                SeatsNatPal.prop,
                VoteShare_LastElection,
                Nationality) %>%
  #Convert to a plain data frame; other formats sometimes cause problems in stargazer
  as.data.frame()

#Define variable names for stargazer
variable_description <- c("Local staff size (#assistants)",
                          "National electoral system is candidate-centered (binary)",
                          "European electoral system is candidate-centered (binary)",
                          "Proximity of national election (years)",
                          "Labor cost (in 1000 euros)",
                          "Party size in national parliament (proportion)",
                          "Vote share in the last election (percentage)")

I then use stargazer() to display the descriptive statistics.
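The call that produces the table is not shown here, so the following is only a sketch of what it could look like. It runs on a small simulated data frame (`toy_desc` and `toy_labels` are hypothetical stand-ins for `df_desc` and `variable_description`), and the arguments are my assumption about a reasonable setup rather than the exact call used for the table below.

```r
library(stargazer)

# Toy stand-in for df_desc (simulated values, not the MEP data)
set.seed(2014)
toy_desc <- data.frame(
  LocalAssistants = rpois(50, lambda = 3),
  ProxNatElection = -runif(50, min = 0, max = 4)
)

# One label per variable, in the same order as the columns
toy_labels <- c("Local staff size (#assistants)",
                "Proximity of national election (years)")

# A data frame passed to stargazer() is summarized as descriptive statistics
desc_table <- capture.output(
  stargazer(toy_desc,
            type = "text",
            title = "Descriptive statistics",
            covariate.labels = toy_labels)
)
cat(desc_table, sep = "\n")
```

With the real data, `toy_desc` and `toy_labels` would be swapped for `df_desc` and `variable_description`. The labels are matched to the variables by position, so their order has to follow the order in `select()`.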

**Descriptive statistics**

Statistic	N	Mean	St. Dev.	Min	Max

Local staff size (#assistants)	739	2.900	3.200	0.000	40.000
National electoral system is candidate-centered (binary)	739	0.650	0.480	0	1
European electoral system is candidate-centered (binary)	739	0.470	0.500	0	1
Proximity of national election (years)	739	-2.300	1.200	-4.000	-0.120
Labor cost (in 1000 euros)	739	23.000	11.000	3.800	41.000
Party size in national parliament (proportion)	722	0.260	0.180	0.000	0.670
Vote share in the last election (percentage)	647	20.000	13.000	0.430	59.000

Our dependent variable is the number of local assistants hired by MEPs.

df %>%
  ggplot() +
  geom_histogram(
    aes(
      LocalAssistants
    )
  ) +
  ggtitle("Distribution of staff size among MEPs")

One of the two main predictors is the proximity of national elections. It is measured as the negative number of years until the next national election (truncated at 4 years), so that higher values mean that the election is closer. Since it is a continuous/metric variable, I describe it using a histogram. As we can see, MEPs vary in where they are in the national electoral cycle.

df %>%
  ggplot() +
  geom_histogram(
    aes(
      ProxNatElection
    )
  ) +
  ggtitle("Proximity of national election")

Electoral system

The second variable of interest is the incentive to cultivate a personal vote in the national election. My variable categorizes the national electoral system as being “Candidate-centered” (incentive to cultivate a personal vote) or “Party-centered” (low incentive to cultivate a personal vote). Since it is binary, I’ll treat it as a categorical variable and describe it using a barplot. I also have a similar variable for the EU electoral system.

For convenience, I’ll recode the two variables to be categorical with some substantive description of their values.

df <- 
  df %>%
  mutate(
    eu_syst = if_else(OpenList == 1,
                      "EU candidate-centered",
                      "EU party-centered"),
    nat_syst = if_else(NationalCandidateCentered == 1,
                       "National candidate-centered",
                       "National party-centered")
  )

df %>%
  ggplot +
  geom_bar(aes(nat_syst)) +
  ggtitle("Distribution of national electoral systems")

Overall, I see that there is plenty of variation in all three variables; enough to have some fun.

Non-linear effects

Political science theories often imply a non-linear effect of x. That is, the effect of x depends on the scenario we consider. This is true when:

the effect of x varies depending on the value of x itself: non-linear recodings, such as equations with quadratic terms and exponential effects (log-transformation)
the effect of x varies depending on the value of another variable (i.e. the moderating variable, z): interaction effects
the effect of x depends on the intercept and all the other predictors in the equation: e.g. generalized linear models (GLMs) where the outcome (y) is recoded to alter the relationship between y and all the predictors in the equation.

Quadratic and cubic terms

Let’s explore the bivariate relationship between the electoral calendar and local staff size. Local regressions (loess curves) are useful to explore the bivariate relationship between two metric variables. What shape would it take if we let the regression line describe itself?

df %>%
  ggplot +
  #Local regression
  geom_smooth(
    aes(
      y = LocalAssistants,
      x = ProxNatElection
    )
  ) +
  ggtitle("Local regression: Effect of electoral calendar on local staff size")

Ouch! This looks challenging… MEPs that are far away from the election have a high staff size. Staff size then drops for MEPs that are three years away from the national election. Staff size is then higher for those that are two years away, then it drops again.

A better test for electoral mobilization would be to follow the change in individual MEPs’ hiring practices over time, but that would be panel data; the topic for next week.

Estimation

We can describe curvy relationships using quadratic and cubic terms. These are in fact interaction effects where the effect of x depends on the value of x.

I can implement this as a curvilinear effect where I add a quadratic term to the equation.

Quadratic term: \(y = a + b_1x + b_2x^2\)

Cubic term: \(y = a + b_1x + b_2x^2 + b_3x^3\)

I could have created a new variable in R before I estimate the models.

df <- df %>% mutate(ProxNatElection2 = ProxNatElection^2)

However, I don’t do that here. Instead, I recode directly in the model formula: the I() function asks R to calculate what is in the parentheses before moving on. Inside the parentheses, I calculate the square of the electoral calendar (I(ProxNatElection^2)). I do this so that the ggeffects package can visualize the effect automatically.

mod.linear <- lm(
  LocalAssistants ~
    #Linear effect
    ProxNatElection,
  df
)


mod.quad <- lm(
  LocalAssistants ~
    ProxNatElection
  #Quadratic term: interact x with x
  + I(ProxNatElection^2),
  df
)


mod.cube <- lm(
  LocalAssistants ~
    ProxNatElection
  #Quadratic + cubic terms: three-way interaction with x
  + I(ProxNatElection^2)
  + I(ProxNatElection^3),
  df
)
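To see what I() does in a formula, here is a minimal sketch on simulated data (the data frame `toy` and its coefficients are made up for illustration and have nothing to do with the MEP data). It compares the inline recoding with a precomputed squared column, and shows what happens when I() is left out.

```r
# Toy data (simulated; not the MEP data)
set.seed(42)
toy <- data.frame(x = runif(200, min = -4, max = 0))
toy$y <- 3 - 0.9 * toy$x - 0.2 * toy$x^2 + rnorm(200)

# Option 1: square inside the formula, protected by I()
fit_inline <- lm(y ~ x + I(x^2), data = toy)

# Option 2: precompute the squared term as its own column
toy$x2 <- toy$x^2
fit_precomputed <- lm(y ~ x + x2, data = toy)

# Same estimates; only the coefficient names differ
same_coefs <- all.equal(unname(coef(fit_inline)),
                        unname(coef(fit_precomputed)))
same_coefs

# Without I(), ^ is read as formula syntax, and x^2 collapses to x
fit_no_I <- lm(y ~ x + x^2, data = toy)
names(coef(fit_no_I))
```

The two first models are the same regression. The last one silently drops the squared term, which is an easy mistake to overlook because R gives no warning.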

**Effect of electoral calendar on local staff size**

	Dependent variable:

	LocalAssistants
	(1)	(2)	(3)

ProxNatElection	0.014	-0.510	-4.000***
	(0.097)	(0.470)	(1.100)

I(ProxNatElection^2)		-0.120	-2.000***
		(0.100)	(0.580)

I(ProxNatElection^3)			-0.290***
			(0.087)

Constant	2.900***	2.500***	0.980
	(0.250)	(0.440)	(0.640)


Observations	739	739	739
R²	0.00003	0.002	0.017
Adjusted R²	-0.001	-0.001	0.013
Residual Std. Error	3.200 (df = 737)	3.200 (df = 736)	3.200 (df = 735)
F Statistic	0.021 (df = 1; 737)	0.660 (df = 2; 736)	4.200*** (df = 3; 735)

Note:	*p<0.1; **p<0.05; ***p<0.01

I’ve reported the results in the results table. Here, we get the impression that there is no effect of the electoral calendar if we only consider its linear effect. The second model allows the curve to bend once; the third model allows the curve to change direction twice, as in the descriptive plot.

I can visualize this effect by using the simulation in the ggpredict() function contained in the ggeffects package.

eff.cub <- ggpredict(mod.cube, terms = "ProxNatElection")
eff.cub %>% plot + ggtitle("Effect of electoral calendar on staff size",
                           subtitle = "Model 3")

This is a very opportunistic way of modeling the data. Quite often, these curvilinear effects are due to confounders; we’re picking up the effect of other variables. Here, I’ll add the type of electoral system at the European and national level as additional covariates.

Let’s describe the bivariate relationship within different electoral systems.

df %>%
  ggplot +
  geom_smooth(
    aes(
      y = LocalAssistants,
      x = ProxNatElection
    ), 
    se = FALSE,
    color = "black",
    lty = 2
  ) +
  geom_smooth(
    aes(
      y = LocalAssistants,
      x = ProxNatElection
    ),
    method = "lm"
  ) +
  facet_wrap(~ eu_syst + nat_syst)

Well… it seems the line is somewhat bell-shaped in three out of four combinations of the European and national electoral systems. We can model that.

mod2 <- lm(LocalAssistants ~ 
             + eu_syst
           +  nat_syst
           + ProxNatElection,
           df)

mod2.quad <- lm(LocalAssistants ~ 
                  + eu_syst
                + nat_syst
                + ProxNatElection
                + I(ProxNatElection^2),
                df)

**Effect of electoral calendar on local staff size**

	Dependent variable:

	LocalAssistants
	(1)	(2)

eu_systEU party-centered	-1.100***	-1.200***
	(0.230)	(0.240)

nat_systNational party-centered	-1.200***	-1.200***
	(0.250)	(0.250)

ProxNatElection	-0.110	-0.930*
	(0.097)	(0.480)

I(ProxNatElection^2)		-0.180*
		(0.100)

Constant	3.600***	3.000***
	(0.280)	(0.440)


Observations	739	739
R²	0.053	0.057
Adjusted R²	0.049	0.051
Residual Std. Error	3.100 (df = 735)	3.100 (df = 734)
F Statistic	14.000*** (df = 3; 735)	11.000*** (df = 4; 734)

Note:	*p<0.1; **p<0.05; ***p<0.01

Reading from the first model in the table, we see that the effect of the electoral calendar is negative and statistically insignificant. That is, as we approach a national election, the size of MEPs’ local staff decreases, but it could be a glitch in the data. Neither the direction nor the precision is what I expected!

In the second model, both coefficients for the national electoral calendar are negative. Does that mean that the size of MEPs’ local staff decreases as we approach elections? No, that depends on where we are in the electoral calendar: the slope of the curve is \(b_1 + 2b_2x\), so its sign changes with the value of x.

Visualization: ggeffects

Let’s visualize the effect using the ggpredict() function.

eff2.quad <- ggpredict(mod2.quad, terms = "ProxNatElection")
eff2.quad %>% plot

Predicted staff size first increases, then decreases as the election approaches. I can calculate the x-coordinate of the vertex, that is, the point in the electoral calendar where the slope switches direction.

It is given by the following formula:

\(x^* = -\frac{b_1}{2b_2}\)

\(x^* = -\frac{-0.93}{2 \times (-0.18)} \approx -2.58\)

Predicted staff size is at its highest about 2.58 years before the next election, at which point the slope changes from positive to negative.
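The same arithmetic can be done in R. The sketch below plugs in the rounded coefficients for the electoral calendar from the second model in the table; `slope_at()` is a helper I define here for illustration.

```r
# Rounded coefficients from the second model in the results table
b1 <- -0.93   # ProxNatElection
b2 <- -0.18   # I(ProxNatElection^2)

# With the fitted model at hand, the unrounded values could be pulled with
# coef(mod2.quad)[c("ProxNatElection", "I(ProxNatElection^2)")]

# Turning point of the parabola: x* = -b1 / (2 * b2)
vertex <- -b1 / (2 * b2)
vertex

# Slope of the curve at a given point in the calendar: b1 + 2 * b2 * x
slope_at <- function(x) b1 + 2 * b2 * x

# Four years and one year before the election
slopes <- slope_at(c(-4, -1))
slopes
```

The slope is positive four years before the election (0.51) and negative one year before it (-0.57), although both coefficients are negative. This is why the sign of the coefficients alone says little about the direction of the effect.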

Interaction effect

Models with interaction effects are like Russian dolls: they contain regressions within regressions. Specifically, they let the effect (b) of one variable (x) depend on the value of another, often called the “moderating variable” (z). To express that relationship, we add a multiplicative term between the two variables and estimate a parameter for it (\(b_3\)).

\(y = a + b_1x + b_2z + b_3xz\)

Interaction effects are notoriously hard to interpret. In my opinion, this is where creating scenarios and inspecting predicted effects becomes crucial.

Interaction effects pose two challenges relating to theory-testing.

The pitfall of p-hacking

Interaction effects draw on a subset of the data to estimate a separate effect of x within a subgroup defined by z (often labelled the “moderating” variable). This opens the question of how big our effective sample is.

If we think of regressions as comparisons of means, any difference between group averages that would be generated “by chance” in fewer than 1 out of 10 samples will be labelled “statistically significant” at the 10% level. In other words, to find a “significant” result, it suffices to try out enough of the interaction effects that your model could potentially allow for. If you try 10 interaction effects at random, you have a fair chance of finding one that is “significant”.

Your choice of interaction effects should therefore be theory-driven.
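To put a number on “a fair chance”, here is a back-of-the-envelope sketch. It assumes that the 10 tests are independent and that there is no true effect in any of them; real interaction terms fitted to the same data are correlated, so this is an illustration rather than an exact figure.

```r
alpha <- 0.10   # lowest significance threshold reported in the tables
k <- 10         # number of interaction effects tried

# Probability that at least one of k independent tests is a false positive
p_any <- 1 - (1 - alpha)^k
p_any

# Simulation check: under the null hypothesis, p-values are uniform on [0, 1]
set.seed(1)
p_values <- matrix(runif(10000 * k), ncol = k)

# Share of simulated "studies" where the smallest p-value falls below alpha
sim_share <- mean(apply(p_values, 1, min) < alpha)
sim_share
```

Under these assumptions, roughly two out of three researchers who try 10 arbitrary interactions would find at least one star in their table.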

Stringent theory testing

Berry, Golder, and Milton (2012) take this approach to a new level. They argue that theories that predict interaction effects can (and should) be tested thoroughly. They highlight that too many researchers forget that interaction effects are symmetric. We should only retain interactive theories that substantiate that 1) the effect of x on y depends on z and that 2) the effect of z on y depends on x. This test should be done substantively (i.e. does it logically make sense in our theoretical story) and statistically (our topic).

To do so, they suggest focusing on marginal effects to validate as many as possible of the following statements:

a. the marginal effect of x is substantial when z is at its minimum
b. the marginal effect of x is substantial when z is at its maximum
c. the marginal effect of z is substantial when x is at its minimum
d. the marginal effect of z is substantial when x is at its maximum
e. the effect of the product term (\(b_3xz\)) should be substantial

In the paper, they rely on marginal effects, where the marginal effect of x is conditional on z: \(ME(x|z)\).
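For the interaction model above, \(y = a + b_1x + b_2z + b_3xz\), the two conditional marginal effects follow from differentiating with respect to each variable in turn. This is a short derivation in the notation of this document, not necessarily that of the paper:

\(ME(x|z) = \frac{\partial y}{\partial x} = b_1 + b_3z\)

\(ME(z|x) = \frac{\partial y}{\partial z} = b_2 + b_3x\)

The same parameter, \(b_3\), appears in both expressions. That is the sense in which the interaction is symmetric: we cannot let the effect of x depend on z without also letting the effect of z depend on x.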

Let’s consider a conditional theory of local investment among MEPs. Specifically, I expect that as national elections draw near, some MEPs (those that would want to transition to national politics) are inclined to hire larger teams of local assistants. However, I also believe that this is more pressing for MEPs that would compete in a candidate-centered system.

\(Local \:Assistants = a + b_1\times Calendar + b_2\times Electoral \:System + b_3 \times Calendar \times Electoral \:System\)

The symmetric element means that it should be equally true that

MEPs from candidate-centered systems hire equally many or more local assistants at all given calendar dates (i.e. they always have more incentives to cultivate personal votes than MEPs from party-centered systems)
MEPs react more to the electoral calendar when they hail from candidate-centered systems.

I implement this as a model in R by interacting the variables ProxNatElection (x, measuring the negative number of years until the next election) and NationalCandidateCentered (z, binary).

Let’s start by figuring out what the authors mean by ME(x|z) by considering interaction effects in a split sample analysis.

Split-sample analysis

Let’s start with some trivariate statistics. What is the linear effect of the electoral calendar on staff size conditional on the national electoral system (moderating variable)? The electoral system is a binary variable, while the electoral calendar is continuous.

We could conceptualize this as a split sample. That is, the electoral calendar has one effect when MEPs compete in candidate-centered systems (one sample) and another effect when they compete on the party label (second sample).

We can visualize this using the facet_wrap() function in ggplot2.

Effect of ELECTORAL CALENDAR conditional on electoral system

df %>%
  ggplot +
  geom_smooth(
    aes(
      #Dependent variable (y)
      y = LocalAssistants,
      #Predictor (x)
      x = ProxNatElection
    ), 
    #Local regression
    method = "loess",
    #No uncertainty
    se = FALSE,
    #Stipulated linetype
    lty = 2,
    #Color
    color = "black"
  ) +
  geom_smooth(
    aes(
      #Dependent variable (y)
      y = LocalAssistants,
      #Predictor (x)
      x = ProxNatElection
    ), 
    #Linear regression
    method = "lm"
  ) +
  #Moderating variable (z)
  facet_wrap(~ nat_syst) +
  ggtitle("Effect of national electoral calendar conditional on electoral system")

We can also model this as a split sample explicitly.

#First split: z = candidate centered
mod.split1 <- lm(LocalAssistants ~
                 + ProxNatElection,
                 df %>%
                   #Filter to only consider candidate-centered systems
                   filter(
                     NationalCandidateCentered == 1
                   )
)

#Second split: z = party centered

mod.split2 <- lm(LocalAssistants ~
                 + ProxNatElection,
                 df %>%
                   #Filter to only consider party-centered systems
                   filter(
                     NationalCandidateCentered == 0
                   )
)

**Effect of electoral calendar on local staff size: split sample**

	Dependent variable:

	LocalAssistants
	Candidate-centered	Party-centered
	(1)	(2)

ProxNatElection	0.330**	-0.740***
	(0.140)	(0.120)

Constant	4.000***	0.230
	(0.320)	(0.370)


Observations	479	260
R²	0.012	0.120
Adjusted R²	0.010	0.120
Residual Std. Error	3.400 (df = 477)	2.500 (df = 258)
F Statistic	5.800** (df = 1; 477)	36.000*** (df = 1; 258)

Note:	*p<0.1; **p<0.05; ***p<0.01

We see both from the regressions and from the plot that the effect of the electoral calendar is different in the two subsamples. While MEPs potentially competing in candidate-centered systems at the national level tend to hire more local assistants as the next national election draws near, it is the opposite in party-centered systems.

In each of these models, the marginal effect of the electoral calendar is represented by b, a single regression slope. This is an approximation of Berry, Golder, and Milton (2012)’s use of the marginal effect (the joint effects of \(b_1\) and \(b_3\)): ME(x|z).

We can always split the sample and run separate analyses. The nice thing is, you can immediately see if the marginal effect of your variable of interest is in the expected direction and whether it is statistically significant.

However, you don’t know whether the difference between the two effects of the electoral calendar is statistically significant. Nor is a split-sample analysis possible if your moderating variable is metric. This is why we resort to interaction effects. When we do so, the effect of x will depend on two parameters: \(b_1\) and \(b_3\). That’s what I do in the following.
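Before moving on, a small sketch on simulated data shows how the two approaches relate. The variables `x`, `z` and `y` and their coefficients are made up for illustration; the point is only that a model where a binary z is interacted with x reproduces the split-sample slopes, and adds a test of their difference.

```r
# Toy data (simulated; not the MEP data)
set.seed(7)
n <- 300
toy <- data.frame(
  x = runif(n, min = -4, max = 0),      # stand-in for the electoral calendar
  z = rbinom(n, size = 1, prob = 0.6)   # stand-in for a binary moderator
)
toy$y <- 0.5 + 3 * toy$z - 0.7 * toy$x + 1 * toy$x * toy$z + rnorm(n)

# Split-sample slopes
slope_z0 <- coef(lm(y ~ x, data = subset(toy, z == 0)))[["x"]]
slope_z1 <- coef(lm(y ~ x, data = subset(toy, z == 1)))[["x"]]

# Full-sample interaction model
fit_int <- lm(y ~ x * z, data = toy)
b <- coef(fit_int)

# b["x"] is the slope when z = 0; b["x"] + b["x:z"] is the slope when z = 1
c(split = slope_z0, interaction = b[["x"]])
c(split = slope_z1, interaction = b[["x"]] + b[["x:z"]])

# The product term comes with its own test of the difference in slopes
summary(fit_int)$coefficients["x:z", ]
```

The point estimates are identical in the two approaches. What the full-sample model adds is the last row: an estimate of the difference between the two slopes, with a standard error.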

Full-sample estimation: interaction effect

The interaction effect consists of three components (\(b_1\), \(b_2\) and \(b_3\)); each variable is involved in two of those three parameters.

\(y = a + b_1x + b_2z + b_3xz\)

In R, we fit the interaction effect as a multiplication between the two variables, electoral system and electoral calendar.

mod5 <- lm(
  LocalAssistants ~
  + NationalCandidateCentered
  * ProxNatElection,
  df
)
summary(mod5)

## 
## Call:
## lm(formula = LocalAssistants ~ +NationalCandidateCentered * ProxNatElection, 
##     data = df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -3.85  -1.87  -0.66   0.97  36.31 
## 
## Coefficients:
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                  0.234      0.459    0.51     0.61
## NationalCandidateCentered                    3.727      0.544    6.85  1.5e-11
## ProxNatElection                             -0.743      0.155   -4.80  1.9e-06
## NationalCandidateCentered:ProxNatElection    1.068      0.198    5.40  9.0e-08
##                                              
## (Intercept)                                  
## NationalCandidateCentered                 ***
## ProxNatElection                           ***
## NationalCandidateCentered:ProxNatElection ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.1 on 735 degrees of freedom
## Multiple R-squared:  0.0629, Adjusted R-squared:  0.0591 
## F-statistic: 16.5 on 3 and 735 DF,  p-value: 2.34e-10

From the model parameters we already understand that the product term is of some size and statistically significant. However, what does this mean?

Marginal effect

Each of the two variables involved in the interaction effect now has two regression coefficients. They can be read as an intercept (\(b_1\)) and a slope coefficient (\(b_3\)) that together define ME(x|z), the joint effect of x. Specifically, there is a baseline effect of x when z == 0 (\(b_1\)) while the slope (\(b_3\)) depends on the value of the moderating predictor (z).

In the following, I implement Berry, Golder, and Milton (2012)’s advice for the effects a. and b. Note that in the following, I’m talking about the effect of x or z, not the predicted value of y.

Intercept: The effect of the calendar is -0.74 when the national electoral system is party-centered (z = 0).

\(ME(Calendar|Electoral\:System = "Party") = b_1 + b_3 \times 0\)

\(-0.74 + 1.07 \times 0 = -0.74\)

Here are a few formulations:

In party-centered systems, a one-unit increase in the electoral calendar corresponds to a 0.74 decrease in the number of local assistants in an MEP’s local team, on average.
Since the moderating variable is binary and the data is cross-sectional, a better formulation would be: MEPs in party-centered systems keep on average 0.74 fewer local assistants on their payroll compared to their colleagues that are one year further away from the next national election.
Since local assistants are easily thought of as human, and there is no such thing as 0.74 humans, we might even say: As elections approach, about three in four MEPs in party-centered systems cut one local assistant per year.

Conditional slope: The effect of the electoral calendar also depends on the coefficient for the interaction term, \(b_3\), and the value of z. Here, the value of z is either 0 or 1, so I’ll exemplify with 1.

\(ME(Calendar|Electoral\:System = "Candidate") = b_1 + b_3 \times 1\)

\(-0.74 + 1.07 \times 1 = 0.33\)

In candidate-centered systems, a one-unit increase in the electoral calendar corresponds to a 0.33 increase in the number of local assistants in an MEP’s local team, on average. All the other formulations also hold for this scenario.

Marginal effect plot with the `marginaleffects` package

In this example the z-predictor has only two values. Berry, Golder, and Milton (2012) focus on the marginal effect and suggest always making scenarios by letting z (the moderating variable) slide from low to high to verify whether the effect of x is consistent with the theory. In the marginal effect plot, they let the x-axis report the value of z, while the y-axis reports the size of the regression coefficient.

In R, we can make marginal effects plots using the marginaleffects package. Here’s the online resource for the package. If you haven’t already installed it, do so before you load it with library(). The package is compatible with ggplot2, so in the following I also add elements to make the graphic more readable.

#Install the package with install.packages("marginaleffects").
library(marginaleffects)
plot_slopes(mod5, 
            #The variable whose slope coefficient you want to plot (x) on the y-axis
            variables = "ProxNatElection",
            #The conditional variable (z)
            condition = "NationalCandidateCentered") +
  #Add a line to show where the null effect is
  geom_hline(aes(yintercept = 0), lty = 3) +
  #Labels for the x- and y-axes
  ylab("ME(x|z): Marginal effect of electoral calendar") +
  xlab("Electoral system (z)") +
  #Plot title
  ggtitle("Marginal effect of electoral calendar")

The marginal effect plot has the advantage that it clearly displays the range of z in which the effect of x is statistically significant (and its direction). In our plot, we see that the slope coefficient for the national electoral calendar is negative and different from zero in party-centered systems (z = 0), and positive and different from zero in candidate-centered systems (z = 1).
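The uncertainty around a marginal effect can also be calculated by hand, since ME(x|z) is a sum of two estimated coefficients. The sketch below does so on simulated data (the data frame `toy` and the helper `me_x()` are made up for illustration), using the variance of a sum and a normal approximation for the interval. I do not claim that this reproduces exactly what the marginaleffects package does internally.

```r
# Toy data (simulated; not the MEP data)
set.seed(11)
n <- 300
toy <- data.frame(x = runif(n, -4, 0), z = rbinom(n, 1, 0.6))
toy$y <- 0.5 + 3 * toy$z - 0.7 * toy$x + 1 * toy$x * toy$z + rnorm(n)
fit <- lm(y ~ x * z, data = toy)

b <- coef(fit)   # point estimates
V <- vcov(fit)   # variance-covariance matrix of the estimates

# Marginal effect of x and its standard error at a given value of z
# Var(b1 + b3 * z) = Var(b1) + z^2 * Var(b3) + 2 * z * Cov(b1, b3)
me_x <- function(z) {
  est <- b[["x"]] + b[["x:z"]] * z
  se <- sqrt(V["x", "x"] + z^2 * V["x:z", "x:z"] + 2 * z * V["x", "x:z"])
  c(z = z, estimate = est, std.error = se,
    lower = est - 1.96 * se, upper = est + 1.96 * se)
}

me_table <- rbind(me_x(0), me_x(1))
me_table
```

When z = 0, the result is simply the coefficient and standard error of x from the regression table. For any other value of z, the covariance between the two coefficients enters the calculation, which is why the significance of ME(x|z) cannot be read directly off the table.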

Effect plot: predicted outcomes with `ggeffects`

Personally, I find it more intuitive to consider the predicted outcomes directly. That is, I prefer predicting outcomes over the full range of x, with a separate slope for different reference values of z.

That is, I’d rather use the usual ggpredict() function from the ggeffects package.

Here, I construct two sets of scenarios: One for the effect of electoral calendar when the national electoral system is candidate-centered, another for the effect of the electoral calendar when the system is party-centered.

eff.int <- ggpredict(mod5,
                     terms = c(
                       #First variable: plotted on the x axis
                       "ProxNatElection",
                       #Second variable: in the grouping (z)
                       "NationalCandidateCentered"
                     ))
eff.int %>% 
  plot +
  #ggplot elements to facilitate reading
  ylab("Predicted local staff size (E(y))") +
  xlab("Years until national election (x)") +
  ggtitle("Local staff size among MEPs: effect of electoral incentives") +
  #Label for the moderating variable
  labs(color = "Electoral system")

All interactions are symmetric

Berry et al. emphasize that all interactions are symmetric, and that the reliance on a “moderating variable”, z, is a fiction. It is just as true that the effect of the national electoral system on MEPs’ hiring practices depends on the electoral calendar as vice versa. They recommend always flipping the interpretation and checking it against our theory. Does the theory predict this type of effect?

In my example, I’ve theorized that MEPs will hire more local staff as elections approach, especially in candidate-centered systems. However, with the interaction effect, we simultaneously create an expectation that the effect of the electoral system depends on the electoral calendar. Specifically, we might say that I expect that MEPs from candidate-centered systems tend to have more local staffers on their payroll, and that this effect is the highest when the electoral incentives are the highest (i.e. close to elections).

To test this, I flip the interpretation and proceed as before. For convenience, I refit model 5 with the nat_syst (the categorical operationalization) instead of NationalCandidateCentered (the numeric operationalization), but that’s just because I want the labels done automatically in ggpredict().

mod5alt <- lm(LocalAssistants ~ 
                nat_syst
              * ProxNatElection,
              df)
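One thing to keep in mind when swapping a 0/1 variable for its labelled version: R picks the reference category alphabetically, so the coefficients may describe a different group even though the model is the same. The sketch below illustrates this on simulated data; `toy`, `z_label` and the labels are made up to mirror how nat_syst was built from the binary variable.

```r
# Toy data (simulated; not the MEP data)
set.seed(3)
n <- 300
toy <- data.frame(x = runif(n, -4, 0), z = rbinom(n, 1, 0.6))
toy$y <- 0.5 + 3 * toy$z - 0.7 * toy$x + 1 * toy$x * toy$z + rnorm(n)

# Character version of z, mirroring how nat_syst was built from the 0/1 variable
toy$z_label <- ifelse(toy$z == 1, "candidate-centered", "party-centered")

fit_num <- lm(y ~ z * x, data = toy)        # reference: z = 0 (party-centered)
fit_lab <- lm(y ~ z_label * x, data = toy)  # reference: first label alphabetically

# Identical predictions ...
same_fit <- all.equal(fitted(fit_num), fitted(fit_lab))
same_fit

# ... but the coefficient on x now refers to a different group
coef(fit_num)[["x"]]   # slope of x in party-centered systems
coef(fit_lab)[["x"]]   # slope of x in candidate-centered systems
```

In the labelled model, the coefficient on x equals the sum of the coefficient on x and the product term in the numeric model. The predictions, and therefore the effect plots, are unaffected; only the reading of the regression table changes.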

Note how I specify the scenarios this time. Berry et al. suggest checking the minimum and maximum values of z (the moderating variable, here electoral calendar). This is useful in the marginal effects plot, but not here. In the effect plot, where I visualize predictions from scenarios, I prefer more realistic values.

eff2.int <- ggpredict(mod5alt,
                     terms = c(
                       "nat_syst",
                       #Three scenarios: 4, 2 and 1 year before the election
                       "ProxNatElection[-4,-2,-1]"
                     ))
eff2.int %>% 
  plot +
  #Plot lines between the average predictions
  geom_line(aes(
    y = predicted,
    x = x
  ),
  #dotted line
  lty = 2) +
  #The moderating variable is recorded as "group" in the eff2.int data frame
  facet_wrap(~group)

Your turn!

Consider the slopes of the electoral calendar for model 5 and the theoretical argument. Is this a convincing support for the theory?
Can you refit the model by including controls for the labor cost (LaborCost) and EU-level electoral system (OpenList)? Call it model 6.
Compare the results from the new model 6 to the effect of the electoral system in model 5. What happens?
Calculate the marginal effect of both electoral calendar and system according to Berry et al’s recommendation (i.e. the five effects).
Visualize the effect of both.

Literature

Berry, William D., Matt Golder, and Daniel Milton. 2012. “Improving Tests of Theories Positing Interaction.” The Journal of Politics 74 (3): 653–71. https://doi.org/10.1017/S0022381612000199.

Hermansen, Silje Synnøve Lyder, and Andreja Pegan. 2023. “Blurred Lines Between Electoral and Parliamentary Representation: The Use of Constituency Staff Among Members of the European Parliament.” European Union Politics 24 (2): 239–63. https://doi.org/10.1177/14651165221149900.