Let’s explore missing data imputation the intuitive way: by doing it!
Our research question is whether Members of the European Parliament (MEPs) who do not have to cultivate a personal vote, but instead face a party selectorate that cares about policy delivery, spend more time in Parliament when they seek re-election.
The dependent variable Attendance reports the number of committee meetings the MEP has attended, while ClosedList indicates whether the MEP will compete in a party-centered system if (s)he were to seek reelection. FutureInEP describes whether the MEP plans on seeking reelection. The “hitch” here is that this variable is mostly missing; it comes from a survey.
How can we address that problem?
For simplicity, we will run a pooled OLS on the data with an interaction term between the MEP’s ambition and electoral system. You can find the data online: https://siljehermansen.github.io/teaching/beyond-linear-models/MEP.rda
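As a starting point, the pooled OLS with the interaction could look roughly like the sketch below. It runs on simulated data rather than the real file: the data frame mep, its values and the share of missing observations are made up for illustration, and only the variable names come from the exercise. With the real data you would first load MEP.rda (the name of the object stored in that file is not stated here).

```r
# Toy stand-in for the MEP data (all values are simulated, hypothetical)
set.seed(1)
n <- 300
mep <- data.frame(
  ClosedList = rbinom(n, 1, 0.5),   # 1 = party-centered system
  FutureInEP = rbinom(n, 1, 0.6)    # 1 = plans to seek reelection
)
mep$Attendance <- rpois(
  n,
  lambda = 10 + mep$FutureInEP + 0.5 * mep$ClosedList +
    2 * mep$FutureInEP * mep$ClosedList
)

# Make the survey variable mostly missing, as in the exercise
mep$FutureInEP[sample(n, 180)] <- NA

# Pooled OLS with an interaction term; lm() drops incomplete rows by default
mod_cc <- lm(Attendance ~ FutureInEP * ClosedList, data = mep)
summary(mod_cc)

# How many observations survived the listwise deletion?
nobs(mod_cc)
```

Checking nobs() is a quick reminder of how much data the default listwise deletion throws away before you try any of the fixes.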
Be prepared to share the table/codes/thoughts on padlet: https://padlet.com/siljesynnove/exercise-1-simple-na-fixes-df5ppo9n2x0a4wuc
Consult Gelman and Hill (2007), pp. 532-34, and follow their descriptions for simple fixes.
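1. Estimate the main model on the observed data, with an interaction between FutureInEP and ClosedList. Personally, I used OLS estimation for this one.
2. Use the EPGroup variable to weigh the observations in an alternative model:
   - Create an indicator of whether an observation is observed or missing on FutureInEP.
   - Predict that indicator of FutureInEP using EPGroup as a predictor.
   - Re-run the main model, weighting each observation by the inverse of its predicted probability of being observed (weights = 1/preds).
3. Replace NA by the mean of the FutureInEP variable.
4. Replace NA by the group mean of the FutureInEP variable. Use EPGroup as the grouping variable.
5. Can you impute the FutureInEP variable by randomly sampling from the observed parts of the variable?

You can use the sample() function in R. Here, I randomly draw 4 values from my x variable. I also specify that the machine is not allowed to draw the same observation twice.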
x <- 1:10
# Draw 4 values from x without replacement
sample(x = x,
       size = 4,
       replace = FALSE)
[1] 8 7 5 1
After having randomly sampled, you’d have to replace only the NAs in FutureInEP with your imputed values, while keeping the observed values intact.
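The simple fixes described above (weighting, mean, group mean and random draws) could be sketched as follows. This is one possible reading of the steps, on simulated data: the data frame df, the group labels and the helper columns (observed, preds, Future_mean, Future_grp, Future_rand) are hypothetical names chosen for illustration. The response model is a logit here, which is an assumption; the exercise leaves the choice of model open.

```r
# Toy stand-in for the MEP data (simulated, hypothetical values)
set.seed(42)
n <- 200
df <- data.frame(
  EPGroup    = sample(c("A", "B", "C"), n, replace = TRUE),
  ClosedList = rbinom(n, 1, 0.5),
  FutureInEP = rbinom(n, 1, 0.6)
)
df$Attendance <- rpois(n, lambda = 10 + 2 * df$FutureInEP * df$ClosedList)
df$FutureInEP[sample(n, 120)] <- NA      # mostly missing, as in the exercise
miss <- is.na(df$FutureInEP)

# Weighting: model the probability of being observed, weight by its inverse
df$observed <- as.numeric(!miss)
resp_mod <- glm(observed ~ EPGroup, family = binomial, data = df)
df$preds <- predict(resp_mod, type = "response")
mod_w <- lm(Attendance ~ FutureInEP * ClosedList, data = df,
            weights = 1 / preds)

# Mean imputation: fill NAs with the overall observed mean
overall_mean <- mean(df$FutureInEP, na.rm = TRUE)
df$Future_mean <- ifelse(miss, overall_mean, df$FutureInEP)

# Group-mean imputation: fill NAs with the observed mean within EPGroup
grp_mean <- ave(as.numeric(df$FutureInEP), df$EPGroup,
                FUN = function(x) mean(x, na.rm = TRUE))
df$Future_grp <- ifelse(miss, grp_mean, df$FutureInEP)

# Random imputation: draw from the observed values, touch only the NAs
df$Future_rand <- df$FutureInEP
df$Future_rand[miss] <- sample(df$FutureInEP[!miss],
                               size = sum(miss),
                               replace = TRUE)

# Re-run the main model on one of the imputed variables and compare
mod_rand <- lm(Attendance ~ Future_rand * ClosedList, data = df)
```

Note that the random draw uses replace = TRUE: when more values are missing than observed, sampling without replacement would fail because there are not enough observed values to go around.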
Can you impute the missing values using regression? Here are a few suggestions for predictors: Age (of the MEP), TermsInOffice (of the MEP), position (of national party towards EU integration). The variable ParlGovID identifies the national parties, while Nationality and Period flag the MEP’s member state and the period of study (one of 10 semesters).
- Fit the regression model of your choice.
- Use the predict() function to extract predicted values. Replace the missing values and re-run the main model. Compare.
- Can you combine your prediction with an element of random sampling?
  - Ask predict() for the standard errors of the predictions as well (se.fit = TRUE).
  - Draw the imputed values using the rnorm() function.

Here, I draw two values, one from each of two normal distributions: the first has a mean of 1 and the second a mean of 2. In both, a standard error of 0.1 serves as the standard deviation.
mean <- c(1,2)
se <- c(0.1, 0.1)
rnorm(n = 2,
mean = mean,
sd = se)
[1] 1.024479 1.980594
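Putting the regression steps together, a minimal sketch could look like this. It uses simulated data and a linear model with Age and TermsInOffice as predictors; the data frame df, the model choice and the helper columns Future_det and Future_sto are assumptions made for illustration, not the only way to do it.

```r
# Toy stand-in for the MEP data (simulated, hypothetical values)
set.seed(123)
n <- 200
df <- data.frame(
  Age           = round(runif(n, 30, 70)),
  TermsInOffice = rpois(n, 1) + 1
)
df$FutureInEP <- rbinom(n, 1, plogis(2 - 0.04 * df$Age + 0.3 * df$TermsInOffice))
df$FutureInEP[sample(n, 120)] <- NA
miss <- is.na(df$FutureInEP)

# Imputation model, fitted on the rows where FutureInEP is observed
imp_mod <- lm(FutureInEP ~ Age + TermsInOffice, data = df)

# Predictions (and their standard errors) for the rows that are missing
pred <- predict(imp_mod, newdata = df[miss, ], se.fit = TRUE)

# Deterministic imputation: plug in the predicted value
df$Future_det <- df$FutureInEP
df$Future_det[miss] <- pred$fit

# Imputation with a random element: draw around each prediction
df$Future_sto <- df$FutureInEP
df$Future_sto[miss] <- rnorm(n    = sum(miss),
                             mean = pred$fit,
                             sd   = pred$se.fit)
```

Keep in mind that se.fit only reflects the uncertainty about the predicted mean. If you also want the draws to reflect the residual variation, predict() returns the residual standard deviation as pred$residual.scale, which you could use to widen the draws.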