How complex should models be?

Photo by Stef Lewandowski, licensed under Creative Commons

There is a longstanding discussion about how complex species distribution models need to be in order to maximise the usefulness of their predictions. The discussion started with observations that more complex models fitted species distribution data better than simpler models. See, for example, Segurado & Araújo 2004 and Elith et al. 2006.

Of course, at the root of the problem is the issue of “usefulness”. As discussed elsewhere, the perception of usefulness depends on the particular use of the models and on the particular component of the niche that is being predicted. There is circumstantial evidence that complex models fit data better, whereas simpler models predict independent events better. In other words, complex models would work well when the goal is to predict the distributions of unknown populations within the same region from which the samples used to fit the models were drawn (thus being proficient at predicting the realised niche). Simpler models would be better suited to characterising species-environment response curves and therefore to generalising outside the training window (thus being arguably better equipped to approximate the fundamental niche).

In a recent paper (Callejas and Araújo 2015), we introduced the concept of computational complexity, widely used in theoretical computer science, to quantify the complexity of different species distribution models. In addition to model complexity, we characterised the complexity of species distribution data by examining their geometrical properties. We then generated virtual species distributions and tested the ability of models with varying degrees of complexity to predict data of varying complexity under both climate change and climate stasis.
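
To make the idea of a virtual species more concrete, here is a minimal sketch in Python. It is not the procedure used in the paper: the Gaussian response to a single temperature gradient, the parameter values (optimum, tolerance, 200 sites, 3 degrees of warming) and the function names are all assumptions chosen purely for illustration. It shows how presence/absence data with a known response curve can be simulated for a training set, for an independent sample under an unchanged climate (stasis) and for a sample under a warmed climate (change).

```python
import math
import random


def suitability(temp, optimum=15.0, tolerance=4.0):
    """Assumed Gaussian response: 1.0 at the optimum, falling towards 0 away from it."""
    return math.exp(-0.5 * ((temp - optimum) / tolerance) ** 2)


def simulate_presence(temps, rng, optimum=15.0, tolerance=4.0):
    """Draw presence (1) or absence (0) at each site with probability equal to suitability."""
    return [1 if rng.random() < suitability(t, optimum, tolerance) else 0
            for t in temps]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical study region: 200 sites with mean temperatures between 0 and 30 degrees C.
current_temps = [rng.uniform(0.0, 30.0) for _ in range(200)]

# Climate change scenario: the same sites, uniformly warmed by an assumed 3 degrees C.
future_temps = [t + 3.0 for t in current_temps]

training = simulate_presence(current_temps, rng)  # data used to fit a model
stasis = simulate_presence(current_temps, rng)    # independent sample, same climate
change = simulate_presence(future_temps, rng)     # independent sample, warmed climate

for name, sample in [("training", training), ("stasis", stasis), ("change", change)]:
    print(name, "prevalence:", sum(sample) / len(sample))
```

Because the true response curve is known, any model fitted to the training sample can be scored against the stasis and change samples, which is the kind of comparison that virtual species make possible.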

Of the eight species distribution models analysed, five (random forest, boosted regression trees, generalized additive models, multivariate adaptive regression splines and MaxEnt) showed similar performance despite differences in computational complexity. The ability of models to forecast distributions under climate change was also not affected by model complexity. In contrast, geometrical characteristics of the data were related to model performance in several ways: complex datasets were consistently more difficult to model, and the complexity of the data was affected by the choice of predictors and the type of data analysed.

Given our definition of complexity, our study contradicts the widely held view that the complexity of species distribution models has a significant effect on their predictive ability. At the same time, our findings support previous observations that the geometrical properties of species distribution data, and their relationship with the environment, are strong predictors of model success.
