Deviance change test in R

The deviance change test is one method of comparing nested models. A significant test result indicates that the model containing more parameter estimates accounts for significantly more variability in the outcome variable than the model containing fewer parameter estimates.

If the two models differ by only one parameter, the deviance change test can be used to determine whether the contribution of a single predictor variable is significantly greater than nothing.

Deviance tests only work with nested models. Models are nested if all of the effects (fixed or random) in the simpler model are also included in the more complex model. In other words, the “bigger” model is achieved simply by attaching one or more additional predictors to the “smaller” model.
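
For instance, here is a quick, made-up illustration using lm() and R’s built-in mtcars data; the model names small, big, and other are just placeholders. small is nested in big because every predictor in small also appears in big, while other is not nested in big.

small <- lm(mpg ~ wt, data = mtcars)
big <- lm(mpg ~ wt + hp, data = mtcars)        # adds hp to the smaller model
other <- lm(mpg ~ wt + qsec, data = mtcars)    # not nested in big: qsec does not appear in big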

Here’s a little function to get the job done:

deviance <- function(a, b) {
  # a = the smaller (reduced) model, b = the bigger (fuller) model
  # difference in -2 log-likelihoods between the two models
  diffneg2LL <- (-2 * as.numeric(logLik(a))) - (-2 * as.numeric(logLik(b)))
  # difference in the number of estimated parameters
  dfneg2LL <- (attr(logLik(b), "df") - attr(logLik(a), "df"))
  # p-value from the chi-square distribution
  return(1 - pchisq(diffneg2LL, dfneg2LL))
}
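
A small aside: the final line could equivalently use pchisq’s lower.tail argument, which skips the subtraction from 1 and behaves a bit better numerically when p-values get very small.

return(pchisq(diffneg2LL, dfneg2LL, lower.tail = FALSE))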

Once this is defined in your script, you simply pass it two model objects: the smaller, reduced model as “a” and the bigger, fuller model as “b.” You’ll need to have created the two models using lm(), lmer(), or some such.
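
With the hypothetical lm() fits sketched earlier, for example, the call would put the reduced model first:

deviance(small, big)  # tests whether adding hp improves the fit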

Here is an example of two nested models. Both are multilevel models: one with random slopes and random intercepts, the other with random intercepts only.

library(lme4)  # lmer comes from the lme4 package

full <- lmer(Y ~ X*W + (X|GROUP), data = dataFrame)
reduced <- lmer(Y ~ X*W + (1|GROUP), data = dataFrame)

Here Y is the outcome variable, X is the level 1 predictor (often a continuous measure on individuals within a group), W is the level 2 predictor (often a category describing groups), and GROUP is an ID variable uniquely identifying each group. dataFrame holds all the values: Y, X, W, and GROUP are columns in this data set.
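
If it helps to see the expected shape of the data, here is a purely made-up dataFrame with that structure; every name and value below is simulated for illustration only.

set.seed(123)
nGroups <- 20; perGroup <- 10
dataFrame <- data.frame(
  GROUP = rep(1:nGroups, each = perGroup),                # group ID
  X = rnorm(nGroups * perGroup),                          # level 1 predictor
  W = rep(rbinom(nGroups, 1, 0.5), each = perGroup)       # level 2 predictor
)
groupEff <- rep(rnorm(nGroups, sd = 0.5), each = perGroup)  # group-level variation
dataFrame$Y <- 0.5 * dataFrame$X + 0.3 * dataFrame$W + groupEff + rnorm(nGroups * perGroup)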

Once the models and deviance function are defined in your script, you just need to call the function and pass it the model objects.

deviance(reduced,full)

What does the function do? Using the logLik function, it finds the log-likelihood of each model and multiplies each by negative two. For models fit with lmer’s default settings, this -2 log-likelihood is what the R package lme4 reports as the “REML criterion at convergence.”
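
You can inspect those pieces yourself on the full model from above (assuming it has been fit):

logLik(full)                    # log-likelihood, with a "df" attribute
-2 * as.numeric(logLik(full))   # the -2LL value
attr(logLik(full), "df")        # number of estimated parameters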

Because the fuller model includes additional parameters, it accounts for more variability than the reduced model (or, at worst, no less variability), so its -2LL is the smaller of the two. This smaller -2LL value is subtracted from the larger one and stored as a variable, diffneg2LL. (It’s probably not the most streamlined approach to create additional variables like this, but I opted for explicitness over efficiency here.)

The logLik function also reports the degrees of freedom for each model, which is the number of parameters estimated. The fuller model has more df because it contains more parameters. In the case of the two models above, the full model has two more degrees of freedom than the reduced model, because the reduced model lacks both an estimate of the variance of the random slopes and an estimate of the covariance between the intercepts and slopes.

The difference in degrees of freedom between the two models is equal to the number of parameters that differ between the models.

The difference in -2 log likelihoods is (approximately) chi-square distributed, and the degrees of freedom for that chi-square distribution is the number of parameters that differ between the two models. The last line of the function, then, calls the pchisq function, passing it the -2LL difference and the number of parameters. Subtract this number from 1, and you get a p-value indicating the probability of obtaining a -2LL difference this large (or larger) if the parameters added to the fuller model really contributed nothing to improving predictions.
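
Putting it all together, the call deviance(reduced, full) is equivalent to doing the arithmetic by hand (again assuming the two lmer models from above have been fit):

diffneg2LL <- (-2 * as.numeric(logLik(reduced))) - (-2 * as.numeric(logLik(full)))
dfneg2LL <- attr(logLik(full), "df") - attr(logLik(reduced), "df")  # 2 for this example
1 - pchisq(diffneg2LL, dfneg2LL)  # the p-value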