Thursday, December 23, 2010

R squared irrelevant?

In addition to problems in experimental design, the key results were felt to be weak.  In Fig. 2, the most significant result is the relationship between physiological drought tolerance and abundance for uplands, which has an R2 of 0.12.  The data presented are not strong enough to support the main points in the abstract.

There are two metrics used to judge a particular result: the coefficient of determination (r2) and the P-value, which is the probability of seeing a result at least as extreme as the one observed if the null hypothesis were true, that is, by chance alone.
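To make the two statistics concrete, here is a minimal Python sketch. It computes r2 for a simple linear regression and estimates a P-value by shuffling the response many times. The data are simulated purely for illustration (they are not the Konza data), and the function names and the permutation approach are my own choices, not the analysis used in the paper.

```python
import random

def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def permutation_p(x, y, n_perm=2000, seed=0):
    """Share of shuffled datasets whose r2 is at least as large as the observed r2."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # break any real link between x and y
        if r_squared(x, y_shuffled) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 60 "species" with a weak trait-abundance relationship.
rng = random.Random(42)
trait = [rng.gauss(0, 1) for _ in range(60)]
abundance = [0.35 * t + rng.gauss(0, 1) for t in trait]

print("r2 =", round(r_squared(trait, abundance), 3))
print("P  =", round(permutation_p(trait, abundance), 4))
```

The r2 says how much of the variation the trait accounts for; the P-value says how often shuffled, relationship-free data do at least as well. The two answer different questions.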

For some modelers seeking to replicate observed phenomena, the coefficient of determination is the key statistic to assess. Eddy covariance modelers often seek to explain observed patterns in carbon flux, and when they can generate a high enough r2 by including different independent variables in their model, they feel they've modeled the system well enough and move on.

For hypothesis testing in ecology, r2 is irrelevant. Think about the extremes. Let's say we are attempting to test whether a given trait explains the abundance of species in an ecosystem. Trait A explains 90% of the observed variation in abundance among species. P < 0.001. This would generally be considered highly ecologically significant and important to report. 

Now think about the opposite extreme. Trait A explains 2% of the variation. P = 0.9. You cannot reject the null, and if the test had reasonable statistical power you can state with fair confidence that trait A is not important. Publishable? Absolutely, if there are strong hypotheses relating the two. If we believed beforehand that trait A was likely to predict abundance, then it is arguably more important to publish that it didn't explain abundance than it would have been had it explained a high proportion of the variance.

There are some cases where r2 values are important to examine. For example, a model result might be r2 = 0.4, P = 0.1. This happens when there isn't much statistical power, typically because the sample is small. This would be one case where the r2 is a helpful statistic.
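The link between r2, sample size, and P can be sketched for the simplest case, a regression with one predictor, where t = sqrt(r2 / (1 - r2) * (n - 2)) with n - 2 degrees of freedom. The helper below is an illustrative assumption on my part: it uses only the Python standard library and gets the two-sided P by numerically integrating the t density. It is not how any particular statistics package computes it.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + t * t / df))

def p_from_r2(r2, n, steps=2000):
    """Two-sided P for the slope of a one-predictor regression, given r2 and n."""
    df = n - 2
    t = math.sqrt(r2 / (1 - r2) * df)
    # Simpson's rule (steps must be even) for the area under the density on [0, t].
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The density is symmetric, so the two tails together hold 1 - 2 * area.
    return 1 - 2 * area

# Same r2, different sample sizes.
for n in (8, 15, 30):
    print("n =", n, " P =", format(p_from_r2(0.4, n), ".2g"))
```

With only 8 observations, r2 = 0.4 gives a P-value a little under 0.1; with 30 observations the same r2 is significant well beyond the 0.001 level. The r2 is what tells you that the small study may be underpowered rather than null.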

What about the above-referenced case of r2 = 0.12, which happened to have a P = 0.008? Is the absolute value of the r2 relevant here? We ran a study that compared physiological drought tolerance to the abundance of 60 species. On the one hand, 12% can be a lot. Let's say there are 8 factors that equally explain abundance. None have ever been identified. You identify one of the 8 with high statistical significance. r2 is only 0.12, about one-eighth. That seems important. Or say there are only two factors that explain abundance and you've identified one of the two, but there is a lot of measurement error or random variation inherent to the system. r2 could still come out at 0.12. [See Shipley's work on relativizing r2 to account for this.] In short, 12% of the variation is 12% more than we knew before and might be about all we can ever hope to know.
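Both scenarios can be checked with a small simulation. This is a sketch under stated assumptions: the factors are independent, standard normal, and additive; the "species" are simulated (and far more numerous than any real community, to keep sampling noise small); and the noise variance of 19/3 is chosen by me so that one of two factors explains 1 / (2 + 19/3) = 0.12 of the variance. None of this comes from the Konza data.

```python
import math
import random

def r2_of_first_factor(n_factors, noise_sd, n_species=50000, seed=1):
    """r2 from regressing simulated abundance on just one of its underlying factors."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_species):
        factors = [rng.gauss(0, 1) for _ in range(n_factors)]
        xs.append(factors[0])  # the one factor we managed to measure
        ys.append(sum(factors) + rng.gauss(0, noise_sd))  # abundance
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy ** 2 / (sxx * syy)

# Scenario 1: eight equally important factors, no noise; expect about 1/8.
eight_equal = r2_of_first_factor(8, 0.0)

# Scenario 2: two factors plus heavy noise; expect about 0.12.
two_noisy = r2_of_first_factor(2, math.sqrt(19 / 3))

print("eight equal factors:", round(eight_equal, 3))
print("two factors + noise:", round(two_noisy, 3))
```

The two scenarios give nearly the same r2 even though in one you have found an eighth of the story and in the other half of it, which is why the raw number cannot be judged without context.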

On the other hand, 12% can be considered low if you expected a lot more. In our case we showed that for one contrast, drought tolerance explained 12% of the variation in abundance, while in a paired contrast it explained 0.1% (P = 0.8). If you expect that drought is important and it turns out that drought tolerance explains a low proportion of the variation, then that's a scientifically important result that is only strengthened by the low r2.

r2's in and of themselves are irrelevant. They have to be contextualized to expectations. A high r2 that confirms expectations might be considered less important to publish than a low r2 that contradicts expectations.

In the specific case referenced above, we view an r2 of 0.12 both ways. Our statements in the abstract were 1) "In this mesic grassland, physiological drought tolerance appears to increase the abundance of plants in xeric uplands, but does not in the mesic lowlands", and 2) "In all, drought appears to have a limited role in structuring the Konza plant community."

At Konza, drought tolerance explains 12% of the variation in abundance of species in uplands and is responsible for explaining 50-fold variation in abundance. That's a lot, but it happens against a background of 5 orders of magnitude of variation in abundance across species. At the same time, if one's expectation is that drought is very important in a drought-prone ecosystem where production is highly sensitive to interannual variation in precipitation, that same 12% (when paired with 0%) is actually not a lot.

The Achilles' heel of modern scientific publishing is the negative result. It would be silly to publish only papers with positive slopes rather than negative ones, yet it seems to be considered straightforward to reject papers because r2's are low. Time and time again, it's been shown that not publishing negative results (low r2, high P value) biases our scientific understanding.

Coefficients of determination are certainly not irrelevant, but they are only relevant within the context of expectations and the statistical significance of a test.