Thursday, August 16, 2012

How to design traits experiments

Figure: Relationship between the number of species I sampled and the number of replicates per species across all the trait screening experiments and surveys I have done over the past 15 years.

One of the fundamental tradeoffs in plant trait science (and ecology in general) is how to array replicates among subjects vs. within them. In trait screening research, this usually takes the form of deciding whether to measure many species or many replicates per species.

If one measures a large number of species with little replication, then there is little certainty about any one point.

If one measures few species with a lot of replication, generality in relationships across species is compromised, potentially to the point where an existing relationship among species is not detected.

The tension is an old one and is often addressed by "balancing" designs: measuring an intermediate number of species with an intermediate level of replication.

That would be wrong.

Different scenarios require different approaches. If the goal of a project is to test for relationships among species, then replicate among species first and only then within them. This often means just one replicate per species, which seems flawed, but the species itself is the replicate. For a fixed amount of sampling effort, replicating within a species means sampling fewer species, which reduces statistical power to detect an overall relationship. Skipping within-species replication maximizes that power, even though there is then no way to assess the confidence of any one point.*

*This approach was in the milk at Cedar Creek. In 2001, I published a trait study from Cedar Creek that stated explicitly what was commonly understood: "Only one plant or clone was sampled per species. Although this minimizes our confidence in the value for a parameter of any one species, for a given amount of sampling effort, this approach maximizes the confidence in the overall relationship among all species." That experiment had 76 species, which seems paltry in a way.
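A quick simulation makes the among-versus-within argument concrete. This is a minimal sketch under assumed conditions, not the analysis from the 2001 study: a fixed budget of 60 measured individuals, a species-level predictor x measured without error, species scattering about a linear relationship (`sd_among`), and individuals varying within species (`sd_within`), with both standard deviations set to 1 purely for illustration. The function `slope_sd` is a hypothetical name; it reports how much the estimated among-species slope varies across simulated studies, so a smaller value means more power for the same effort.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_sd(n_species, reps, true_slope=0.5, sd_among=1.0, sd_within=1.0,
             n_sims=2000, seed=1):
    """SD of the estimated among-species slope across simulated studies.

    Each simulated study measures n_species species with `reps` individuals
    each, regresses the species-mean trait on a species-level predictor x,
    and records the slope. All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_sims):
        xs, ys = [], []
        for _ in range(n_species):
            x = rng.gauss(0, 1)
            # the species' true trait value scatters around the relationship
            species_value = true_slope * x + rng.gauss(0, sd_among)
            # species mean of `reps` individuals measured with within-species noise
            mean_y = statistics.fmean(
                species_value + rng.gauss(0, sd_within) for _ in range(reps)
            )
            xs.append(x)
            ys.append(mean_y)
        slopes.append(ols_slope(xs, ys))
    return statistics.stdev(slopes)


budget = 60  # total individuals measured in every design
for n_species, reps in [(60, 1), (20, 3), (6, 10)]:
    assert n_species * reps == budget
    print(n_species, "species x", reps, "reps:", round(slope_sd(n_species, reps), 3))
```

Under these assumptions, 60 species with one individual each should give the smallest spread in the slope, and 6 species with 10 individuals each the largest. Raising `sd_within` relative to `sd_among` narrows the gap, which is why the among- vs. within-species variability discussed in the comments matters.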

Other scenarios call for other approaches. For example, if traits are measured to better understand the performance of species in an experiment, then first array replicates across all the species in that experiment and then replicate within each species.

If the objective is to generate community-weighted means, then replicates should be added in proportion to each species' abundance. I'm pretty sure no one has ever done this design.
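Allocating replicates in proportion to abundance needs a rounding rule, because abundances rarely divide a replicate budget into whole numbers. The sketch below assumes largest-remainder rounding (my choice for illustration; the post does not specify one), and both function names and the cover values in the example are hypothetical. Species too rare to earn a whole replicate get zero, and the community-weighted mean is then computed over the measured species only.

```python
import math


def allocate_by_abundance(abundance, total_reps):
    """Split total_reps among species in proportion to their abundance.

    Uses largest-remainder rounding so the counts sum exactly to total_reps.
    Very rare species may be allocated zero replicates.
    """
    total = sum(abundance.values())
    quotas = {sp: total_reps * a / total for sp, a in abundance.items()}
    counts = {sp: math.floor(q) for sp, q in quotas.items()}
    leftover = total_reps - sum(counts.values())
    # hand the remaining replicates to the largest fractional remainders
    by_remainder = sorted(quotas, key=lambda sp: quotas[sp] - counts[sp], reverse=True)
    for sp in by_remainder[:leftover]:
        counts[sp] += 1
    return counts


def community_weighted_mean(abundance, trait_means):
    """Abundance-weighted mean trait, renormalized over the measured species."""
    total = sum(abundance[sp] for sp in trait_means)
    return sum(abundance[sp] * trait_means[sp] for sp in trait_means) / total


cover = {"A": 48, "B": 27, "C": 17, "D": 8}  # hypothetical percent cover
print(allocate_by_abundance(cover, 12))  # {'A': 6, 'B': 3, 'C': 2, 'D': 1}
```

With 12 replicates, the dominant species A gets half of them and the rare species D gets one, so sampling effort tracks each species' contribution to the community mean.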

There are other important questions about trait screening design (that I'm trying to work out in a paper). How to incorporate phylogeny and when and how to array replicates across growth environments all influence the design. With any luck, the manuscript I'm leading will document best practices and can be referred to by researchers and reviewers.

The key to emphasize here is that often the best design will have no replication within species. You measure as little as you can on each species for as many species as possible. One measurement on one individual for each of many species is the best experimental design in many cases and should not be compromised with replicates.


  1. Nice post - thanks. When I think about sampling efficiency, my mind often goes to thinking about efficiency of effort, and what we might save by going with a shotgun approach.

    The shotgun approach (where you sample more or less what you come across) has a number of downsides. However, its major advantage is that you don't expend effort in choosing the samples. If it takes a significant time investment to find every last species, perhaps you're better off forgetting about those last few points and adding more within-species replicates instead. You might, for example, get 10 more within-species replicates for the effort you'd spend finding a single additional species.

    If there's lots of post-harvest effort or cost (such as nutrient analyses), then being very careful about each replicate you choose makes sense. But what if the majority of the effort is expended on the front end?

    1. Seems like a reasonable thing to consider. It should not be too hard to add that to the equation. Part of the problem in doing the math on sampling effort is not having knowledge of the variability among vs. within species. It's hard to know how much an additional replicate of a species will provide vs. an additional species.
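Adding front-end cost "to the equation" can be sketched with the classic two-stage sampling allocation result. Assume (for illustration only) that species-mean traits are regressed on a species-level predictor, each new species costs c_s to find and identify, each individual measured costs c_w, species scatter about the relationship with variance var_among, and individuals vary within species with variance var_within. With a budget C = k(c_s + n c_w) spread over k species and n individuals each, the slope variance is proportional to (var_among + var_within / n) / k. Minimizing it gives n* = sqrt(c_s var_within / (c_w var_among)). When n* falls below 1, one individual per species is best, which matches the post; expensive species searches push n* up, which matches the comment. The function name is hypothetical, and as the reply notes, the two variances are rarely known in advance.

```python
import math


def optimal_reps_per_species(var_among, var_within, cost_species, cost_individual):
    """Within-species replicates that minimize slope variance for a fixed budget.

    Minimizes (var_among + var_within / n) * (cost_species + n * cost_individual),
    which is proportional to the among-species slope variance when the budget
    buys k = budget / (cost_species + n * cost_individual) species.
    Assumes var_among > 0 and cost_individual > 0.
    Returns the continuous optimum n* and the best integer choice (at least 1).
    """
    n_star = math.sqrt(cost_species * var_within / (cost_individual * var_among))

    def objective(n):
        return (var_among + var_within / n) * (cost_species + n * cost_individual)

    # the objective is convex in n, so the best integer is floor(n*) or ceil(n*)
    candidates = {max(1, math.floor(n_star)), max(1, math.ceil(n_star))}
    return n_star, min(candidates, key=objective)


# a new species costs 10x an extra individual; equal variance components
print(optimal_reps_per_species(1.0, 1.0, 10.0, 1.0))  # n* is about 3.16, best integer 3
```

With cheap species searches (cost_species no larger than cost_individual, equal variances), n* is at most 1 and the function returns a single replicate per species.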