Which GLM should I choose?

How can I choose the best (parsimonious) description of my data? Here is my mental map of the criteria I use when deciding which Generalized Linear Model (GLM) to rely on.

Let's assume I want to describe the relationship (β) between an independent variable (x) and a dependent variable (y) in the form of:

y = α + βx + ε

Let's furthermore assume that the ideal description is not only parsimonious (i.e. it gets by with a single parameter, β) but also faithful to the data (i.e. the model predicts outcomes consistently over the entire range of y). GLMs come in handy when we want to describe relationships in which the distribution of the errors (ε) is not normal.
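To make the point about faithfulness concrete, here is a minimal sketch in Python with NumPy. It is an illustration under assumptions of my choosing, not a recipe: the data are simulated counts, the "true" values `alpha_true` and `beta_true` are made up, and `fit_poisson_irls` is a hypothetical helper that fits a Poisson GLM with a log link by iteratively reweighted least squares. The same data are also fitted with the linear model above so the two sets of predictions can be compared.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated count outcome (assumed for illustration):
# y ~ Poisson(exp(alpha + beta * x)).
n = 500
x = rng.uniform(-2.0, 2.0, size=n)
alpha_true, beta_true = 0.5, 0.8
y = rng.poisson(np.exp(alpha_true + beta_true * x))

X = np.column_stack([np.ones(n), x])  # design matrix with an intercept column

# Linear model y = alpha + beta * x + error, fitted by ordinary least squares.
ols_coef, *_ = np.linalg.lstsq(X, y, rcond=None)
ols_pred = X @ ols_coef


def fit_poisson_irls(X, y, n_iter=25):
    """Hypothetical helper: Poisson GLM with log link, fitted by IRLS."""
    coef = np.zeros(X.shape[1])
    coef[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ coef              # linear predictor
        mu = np.exp(eta)            # expected count under the log link
        z = eta + (y - mu) / mu     # working response
        w = mu                      # working weights for the Poisson family
        coef = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return coef


glm_coef = fit_poisson_irls(X, y)
glm_pred = np.exp(X @ glm_coef)

print("OLS coefficients:        ", ols_coef)
print("Poisson GLM coefficients:", glm_coef)
print("Smallest OLS prediction: ", ols_pred.min())
print("Smallest GLM prediction: ", glm_pred.min())
```

In this simulation the linear fit predicts negative counts for low values of x, which no count can take, while the log link keeps every GLM prediction positive. That is the sense in which the GLM describes the data consistently over the whole range of y.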

Decision tree for GLMs

I generally begin by considering how my dependent variable is measured and the data-generating process that I believe produced it. It is then standard procedure to check whether the model's assumptions are satisfied and, if they are not, to reconsider the choice.
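The two steps just described (pick a model from how the outcome is measured, then check its assumptions) can be sketched as a small lookup. The mapping below reflects common textbook defaults and is an assumption on my part, not a reproduction of the decision tree; `suggest_glm`, `Suggestion` and `dispersion_ratio` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    family: str  # assumed error distribution
    link: str    # link function between the linear predictor and E[y]
    check: str   # assumption to verify after fitting


# Common textbook defaults (an assumption, not the author's decision tree).
_DEFAULTS = {
    "continuous": Suggestion(
        "Gaussian", "identity", "residuals roughly normal with constant variance"
    ),
    "binary": Suggestion(
        "Binomial", "logit", "predicted probabilities track observed proportions"
    ),
    "count": Suggestion(
        "Poisson", "log", "variance roughly equal to the mean (no overdispersion)"
    ),
    "positive_continuous": Suggestion(
        "Gamma", "log", "variance grows with the mean"
    ),
}


def suggest_glm(outcome_type: str) -> Suggestion:
    """Return a starting-point family and link for a type of outcome."""
    try:
        return _DEFAULTS[outcome_type]
    except KeyError:
        known = ", ".join(sorted(_DEFAULTS))
        raise ValueError(
            f"unknown outcome type {outcome_type!r}; expected one of: {known}"
        ) from None


def dispersion_ratio(counts):
    """Sample variance divided by the mean; values well above 1 suggest
    that a Poisson model may understate the spread of the data."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean


choice = suggest_glm("count")
print(choice.family, choice.link, "| check:", choice.check)
```

The lookup only offers a first guess. The `check` field is a reminder that the choice is provisional until the fitted model has been compared with the data, for instance with a rough diagnostic such as `dispersion_ratio` for counts.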

Silje Synnøve Lyder Hermansen
Assistant Professor

Silje’s research concerns democratic representation in courts and parliaments. She also teaches various courses in research methods and comparative politics.
