Tuesday, May 24, 2016

Why we cite papers

Scholarly works are set apart from other types of writings by the use of citations. An essay on natural history might cover a scientific topic, but it is just an essay until it contains citations. Scientific papers are not scientific without citations.***

***This blog post is certainly not scientific...no citations here. OK, maybe one.

Most scientists do not question the need for citations nor the role they play in the paper itself. When we do not have a common understanding of the role of citation, we have trouble determining when citations are improper and what to do when what we think to be true shifts.

Most of us think that the big debate about citations is formatting. Do we number our citations or list the authors and dates each time? There are deeper issues than that. They have nothing to do with formatting.

I first really thought about citations when I read that Stephen Jay Gould once got into trouble for citing a paper in his thesis that was not held in his school's library.* His advisors questioned the link between the statement he was making and the original citation. They were not disputing that his statement was true, only that he could know it was true, because he could not have examined the original source. Another author's judgment on the assessment of truth was insufficient. That's how rigorous citation can be.

*This is a place where a citation is really needed. But, I can't remember which of his books I read this in. Structure of Evolutionary Theory? Panda's Thumb? I'm fuzzy on the details here, but whether it happened or not, it could have happened, which is all that is necessary here.

When I think about how I use citations, I feel there are two types of citations that I use.

The first I call vertical citations.

Vertical citations are the links between what has been found to be true in the past and a statement we currently would like to make to establish the truth.

For example, here is the first sentence of a paper that I just submitted to a journal:

There are approximately 1 billion cattle in the world with cattle populations steadily increasing over the past few decades (Estell et al. 2014).

This is a vertical citation. I am going back into the literature to provide evidence of the truth of a statement. I personally have not counted how many cattle there are in the world. Nor have I determined whether cattle populations have been increasing or decreasing over the past few decades. So, instead of going out and counting cattle, I cite a paper that has established this to be true or has cited the papers that have established this to be true. The paper I chose to cite is Estell et al. 2014.***

***et al. stands for et alia (in the neuter form), which means and others in Latin. Et alia is almost always abbreviated et al., which is funny because we really aren't saving that many characters. Really just one. I think, in part, it gets abbreviated because the actual Latin phrase depends on whether the "others" are male, female, or both. Easier to write "et al." than determine whether et alii, et aliae, et alia is more appropriate.

So, when are vertical citations necessary?

Any time we make a statement in a scientific paper about what we consider to be true outside of the personal experience we are describing, a citation is necessary.

Any time.

If we want to say that there are a billion cattle in the world, we need a citation. If we want to say that atmospheric CO2 concentrations are increasing, we need a citation. The sky is blue? Citation. Gravity exists? Citation.

Now, if we want to say that we performed a certain procedure in an experiment, we do not need a citation. We hold it true that we might have measured something at a certain temperature, but there is no citation for this since it comes from our experience, not the literature.

Vertical citations go back into the literature to provide justification for the truth of statements we are making. Think of Newton's phrase: if I have seen further, it is by standing on the shoulders of giants. When we cite a previous work, we are placing our foot on the shoulder of a giant that has come before us. We are reaching down vertically to build something taller.

As opposed to vertical citations, there are also horizontal citations. Like vertical citations, they reach down into the literature to establish the truth, but the purpose is different.

Horizontal citations are primarily for context. In the introduction, horizontal citations are typically used to identify intellectual tension. Study A found this. Study B found that. This and that cannot be both true under our current intellectual framework. We cite these papers to show what other researchers have found to justify our work.

In the discussion, horizontal citations are used in a similar manner, but it is not to establish that there is intellectual tension, but to see if there is intellectual tension. Study A found this. Study B found that. We found this, too. Therefore, it seems like this is more likely to be true than that.

With horizontal citations, we are not citing other giants, but instead other dwarfs (or other Isaac Newtons).**

**the original metaphor was "dwarfs standing on the shoulders of giants". Citation here. We think of Newton as a giant now, but originally he would have been a dwarf in the metaphor.

So, when I think about how I reference the literature, it is generally vertically or horizontally. I am either reaching down to stand taller, or reaching across to build linkages.

That's probably a long enough post for now. Down the line, I should cover the consequences of failing to cite the literature correctly and the consequences of determining that the findings of a published paper were not true: what happens when a giant tumbles?

Mostly as a note to myself, comparing legal citations and scientific citations is also instructive. The law only cares about what was legally true at the time the law was being examined. Science cares about what is known to be true at the time the scientific fact was established and after. Hence, changes in the law and changes in scientific understanding have much different consequences.

Wednesday, March 9, 2016

Declines in tree nutrient concentration over past 25 years

I've been trying to catch up on journals lately. Apparently, I hadn't read anything from Global Change Biology over the past 2 years. Must have been distracted. No time like the present...

Here's one that struck me as amazing.

Researchers in Europe resampled forest leaves from 1992 - 2009 across a large number of plots in Europe. At each site for a subset of species they assessed nutrient concentrations and leaf mass--a pretty simple and standard measurement. Doing this allowed them to examine the trajectory of nutrient concentrations (and contents). Nutrient concentrations in leaves are critical to determining tree productivity as well as interactions with herbivores, so knowing whether concentrations are going up or down is critical to modeling the future productivity of these forests.

Here's the simplified result: almost all nutrient concentrations were declining. 20 nutrients had declining concentrations. 2 were increasing.

Here's an example of the pattern for beech. White bars are concentrations; grey bars are contents.

The authors focus on P nutrition the most, emphasizing the role of N deposition in promoting P limitation. Yet, even N concentrations were declining. These declines must be more than just N deposition causing imbalances, especially since N deposition has been declining over the time period. 

The authors suggest elevated atmospheric CO2 might also be playing a role, as well as droughts and warming, but this paper mostly describes the pattern, which is fine.

The big question is: What is causing this massive, continental decline in nutrient concentrations?

Monday, March 7, 2016

ASA statement on P-values

The American Statistical Association's statement on the use of p-values can be found here.

The short list is:

  1. P-values can indicate how incompatible the data are with a specified statistical model. 
  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone. 
  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold. 
  4. Proper inference requires full reporting and transparency. 
  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result. 
  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. 

My personal take is that a few corrections are needed in how p-values are used.

1) p < 0.05 is arbitrary. Report exact p-values and think of them as a continuum. Don't think a paper should be accepted just because p < 0.05. Don't reject a paper just because p > 0.05.

2) The p-value reported needs to be contextualized with the number of comparisons made. This is where p-hacking shows up. If you do 20 independent analyses where no real effect exists, you expect about 1 to have p < 0.05 by chance. You need to state that you did an additional 19 analyses if you are reporting the 20th. If you went and added more data or looked more carefully for outliers because a p-value wasn't low enough, this needs to be reported.

3) P-values and effect sizes must be reported together. An independent assessment of whether the measured effect is biologically relevant is needed.
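To put rough numbers on the multiple-comparisons point, here is a minimal Python sketch. It assumes the analyses are independent and that every null hypothesis is true (no real effects anywhere); the function names are made up for illustration.

```python
# Chance of at least one "significant" result among k independent tests
# when every null hypothesis is true. Assumes independence between tests.

def prob_at_least_one_false_positive(k: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) across k independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** k


def expected_false_positives(k: int, alpha: float = 0.05) -> float:
    """Expected number of tests with p < alpha when all nulls are true."""
    return k * alpha


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k,
              round(prob_at_least_one_false_positive(k), 3),
              expected_false_positives(k))
```

With 20 analyses, the expected number of false positives is 1, and the chance of getting at least one is roughly 64%. That is why a lone p < 0.05 means little if the other 19 analyses go unreported.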

#2 on the list is the hardest to comprehend because it involves logical assumptions of the test. 

The manuscript's explanation of this is:

Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.

At RetractionWatch, the author explains it this way:

Retraction Watch: Some of the principles seem straightforward, but I was curious about #2 – I often hear people describe the purpose of a p value as a way to estimate the probability the data were produced by random chance alone. Why is that a false belief? 
Ron Wasserstein: Let’s think about what that statement would mean for a simplistic example. Suppose a new treatment for a serious disease is alleged to work better than the current treatment. We test the claim by matching 5 pairs of similarly ill patients and randomly assigning one to the current and one to the new treatment in each pair. The null hypothesis is that the new treatment and the old each have a 50-50 chance of producing the better outcome for any pair. If that’s true, the probability the new treatment will win for all five pairs is (½)^5 = 1/32, or about 0.03. If the data show that the new treatment does produce a better outcome for all 5 pairs, the p-value is 0.03. It represents the probability of that result, under the assumption that the new and old treatments are equally likely to win. It is not the probability the new treatment and the old treatment are equally likely to win.
This is perhaps subtle, but it is not quibbling.  It is a most basic logical fallacy to conclude something is true that you had to assume to be true in order to reach that conclusion.  If you fall for that fallacy, then you will conclude there is only a 3% chance that the treatments are equally likely to produce the better outcome, and assign a 97% chance that the new treatment is better. You will have committed, as Vizzini says in “The Princess Bride,” a classic (and serious) blunder.
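
I'm still looking for the right wording on this one, but it seems like the p-value is the probability of seeing an effect at least as large as the one observed, given that the null hypothesis is true. It is not the probability that the null hypothesis is true given the effect observed.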
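Wasserstein's five-pair example can be worked through numerically. The sketch below first computes the p-value under the null (each pair is a 50-50 coin flip), then uses Bayes' rule to compute the probability that the null is true. The second step is not in the ASA statement: it needs a prior probability for the null and a specific alternative (here a hypothetical 80% win rate for the new treatment). Both are assumptions chosen purely for illustration.

```python
from math import comb


def binomial_tail(wins: int, pairs: int, p_win: float = 0.5) -> float:
    """P(at least `wins` wins out of `pairs`) if each pair is won with p_win."""
    return sum(
        comb(pairs, k) * p_win**k * (1 - p_win) ** (pairs - k)
        for k in range(wins, pairs + 1)
    )


def posterior_null(prior_null: float, p_win_alt: float,
                   wins: int = 5, pairs: int = 5) -> float:
    """P(null | exactly `wins` wins) via Bayes' rule.

    prior_null and p_win_alt are illustrative assumptions, not data.
    """
    like_null = comb(pairs, wins) * 0.5**pairs
    like_alt = comb(pairs, wins) * p_win_alt**wins * (1 - p_win_alt) ** (pairs - wins)
    numerator = prior_null * like_null
    return numerator / (numerator + (1 - prior_null) * like_alt)


if __name__ == "__main__":
    # p-value: probability of 5 wins in 5 pairs under the null = 1/32
    print("p-value:", binomial_tail(5, 5))
    # Probability the null is true depends on what we assumed going in
    for prior in (0.5, 0.9):
        print("prior", prior, "-> P(null | data):",
              round(posterior_null(prior, 0.8), 3))
```

The p-value stays at 1/32 no matter what, while the probability that the null is true moves with the prior and the assumed alternative. They are different quantities, which is the point of #2.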

Saturday, March 5, 2016

Biogeochemical Planetary Boundary: Beyond the zone of uncertainty? (Part II)

I think of scientists as having two jobs.

One is to create intellectual tension.

The other is to resolve it.

Creating intellectual tension is generating hypotheses. Hypotheses that we do not know to be true or false represent intellectual tension. Competing hypotheses about how the world works are also intellectual tension. We do not know which is true. This is the tension.

Resolving intellectual tension can sometimes occur by identifying logical flaws in one hypothesis. Generally, intellectual tension is resolved by collecting data. It is a fair question whether a hypothesis can ever be proven or disproven and therefore whether intellectual tension is ever fully resolved, but the process of science works to reduce intellectual tension by favoring some hypotheses over others.

In the previous post, I identified some important intellectual tension in the scientific world.

There is the hypothesis that the planet has exceeded a biogeochemical "planetary boundary". Too much nitrogen is being fixed and entering ecosystems. This is the hypothesis.

Yet, it is unclear whether this is causing planetary-scale eutrophication of terrestrial ecosystems or aquatic ecosystems.

On the one hand, we have a hypothesis where the world is awash in nitrogen. We fix more nitrogen than ever and apply it to ecosystems on a massive scale. As a result, nitrogen is leaking out into waterways creating dead zones in the oceans. Nitrogen is also entering the atmosphere and raining down on even the most remote ecosystems on earth. As a result, terrestrial ecosystems are becoming eutrophied. Species adapted to low nitrogen availability are being crowded out by faster growing plants. Biodiversity is plummeting. Productivity is increasing unsustainably. With all this extra nitrogen, we have exceeded a biogeochemical planetary boundary. Civilization as we know it is threatened.

Yet, the intellectual tension on this hypothesis actually takes the form of a competing hypothesis. It is possible that not only have we not exceeded a planetary boundary for nitrogen, but ecosystems might be becoming more nitrogen limited over time. As temperatures warm and atmospheric CO2 builds up, demand for N might be stimulated faster than N is being supplied. Plants and microbes become more limited by N. Plant N concentrations decline. Photosynthesis declines. Plants that compete well for N become more dominant. Less N leaks out of ecosystems into streams. Productivity becomes more and more constrained by the lack of nitrogen. Vegetation sequesters less and less carbon than it could, all because there is not enough nitrogen. As a result, more CO2 accumulates in the atmosphere than would if forests had more nitrogen. Climates warm even faster. Civilization as we know it is threatened.

Intellectual tension could not be more stark than this.

If you reduce the world to one pixel, there is either too much nitrogen. Or there is too little.

Resolving this tension requires data. On the one hand, we know that N is being fixed in ever greater amounts. On the other hand, CO2 continues to increase, which shifts demand for N even higher. Back again, N is still raining down on ecosystems at an elevated rate. Yet, the NO3- concentrations of water in streams are so low that stream water is approaching the NO3- concentrations of distilled water.

The only way to resolve this tension is to collect data on N availability.

Yet we need long-term measurements of N availability to know for sure whether N is becoming more or less limiting.

We don't have these.

We could use the species composition of plant communities in conjunction with indices of what plants represent low or high N availability, but again we have not invested in long-term monitoring of our plant communities.

The tension of whether the world is becoming more eutrophic or more oligotrophic has existed for a long time now.

It probably is not a bad thing to think that civilization is threatened. But we should at least know whether it is because there is too much nitrogen or too little before we try to fix it. Or else our remedies might exacerbate the situation.

Without the right data, we cannot resolve this tension. That means we start monitoring key indices like N availability and species composition now and try to answer the question in 10 years.

Or we find a different dataset that allows us to reconstruct N availability on broad spatial scales far enough back in time to discern the trajectory of N availability.

Do we have the data to resolve this tension?

I think we might...

Let's see what reviewers say.

Biogeochemical Planetary Boundary: Beyond the zone of uncertainty? (Part I)

The cycling of nitrogen in a terrestrial ecosystem determines its primary (and secondary) productivity, its diversity, and how much (and how) nitrogen is lost to the atmosphere and waters. In general, plant productivity is limited by the availability of nitrogen. Add a little more nitrogen, and not much changes. Productivity increases, but qualitatively, the ecosystem functions the same. Add a little more, and the ecosystem changes quantitatively, but not qualitatively. Productivity increases. N concentrations increase a bit, but it still is qualitatively similar to the unfertilized ecosystem.

Keep fertilizing the ecosystem with N, and eventually the ecosystem reaches a threshold. Not only does productivity increase, but a lot of other things change. Suddenly, plant N concentrations increase a lot. The plant community shifts towards plants that thrive under higher N. They have high N concentrations, they use alkaloids instead of tannins to defend themselves, their leaves are built to capture as much light as possible, rather than avoid capturing too much light. In the soil, the soil microbial community shifts and the richness of N causes N to start leaving the soils in ways it hadn't before. More NO3- comes out in the waters. More gaseous N is lost to the atmosphere.

This threshold has been repeated experimentally in individual ecosystems throughout the world. And we've seen it when we non-experimentally add a lot of N to pastures or croplands or even forests.

What we see at the plot level or even at the level of the stand or region could potentially have analogs at the planetary level. As humans fix more and more N and more and more N is added to ecosystems, could the whole planet flip states and autocatalyze from an oligotrophic world to a eutrophic world? Could N limitation become the exception, rather than the rule?

In 2009, Rockström et al. published their summary of the state of the earth with respect to Planetary Boundaries (see my 2012 post on the issue here). These planetary boundaries are planet-wide environmental boundaries or ‘tipping points’. Exceed these thresholds, and humanity is at risk.

That paper was updated last year by Steffen et al. As before, the authors state that for climate change, we have entered a "zone of uncertainty" with "increasing risk". Despite all the warming, the sea level rise, the collapsing ice sheets, the potential for a shutdown of the thermohaline circulation, losses of coral reefs, thawing of permafrost, and climatic reorganization underway, their summary is that humanity is still in a safe operating space climatically.

In contrast, for the global nitrogen cycle, the status is the same as in 2009. We are apparently beyond the zone of uncertainty, and humanity is currently at high risk of exceeding a planetary threshold.

That sounds pretty dire.

But are we?

The basis for this assessment is from a recent paper by de Vries et al. 2013.

According to the paper, the planet has exceeded a planetary boundary for N if at least one of the following has exceeded its safe operating space:

1) eutrophication of terrestrial ecosystems
2) eutrophication of marine ecosystems
3) acidification of soils and fresh waters
4) N2O, a greenhouse gas
5) ozone formation
6) groundwater contamination
7) stratospheric ozone depletion

There is really no evidence of too much tropospheric ozone or too much groundwater contamination for humans to safely inhabit the planet. Soils do not appear to be becoming acidified due to N deposition and fertilization globally. N2O levels are not deathly high. Stratospheric ozone levels are still recovering from CFC phase-outs.

Therefore, if humanity has exceeded a biogeochemical planetary boundary, then there must be evidence of planetary-scale eutrophication of terrestrial or marine ecosystems.

In a future post, I'll examine the intellectual tension about this idea...

Monday, December 7, 2015

Highlights from Nature Climate Change in 2015

Catching up on a year's worth of articles from Nature Climate Change. It's like binge-watching your favorite program. There are a lot of great "episodes", but here are some that stood out for me:

The central US is experiencing a greater frequency of floods, most likely due to a greater frequency of heavy rainfall events and rain-on-snow events.

Mallakpour, I. and G. Villarini. 2015. The changing nature of flooding across the central United States. Nature Climate Change 5:250-254.

Growing season length is increasing almost everywhere.
Buitenwerf, R., L. Rose, and S. I. Higgins. 2015. Three decades of multi-dimensional change in global leaf phenology. Nature Climate Change 5:364-368.

Increasing CO2 decreases plant N:P, while warming and water increase it. 
Yuan, Z. Y. and H. Y. H. Chen. 2015. Decoupling of nitrogen and phosphorus in terrestrial plants associated with global changes. Nature Climate Change 5:465-469.

Temperature is more tightly coupled with greenhouse gases than insolation. "This confirms the existence of a positive feedback operating in climate change whereby warming itself may amplify a rise in GHG concentrations." Note the new analytical techniques here to evaluate complex systems.
van Nes, E. H., M. Scheffer, V. Brovkin, T. M. Lenton, H. Ye, E. Deyle, and G. Sugihara. 2015. Causal feedbacks in climate change. Nature Climate Change 5:445-448.

Microbial decomposition generates heat that thaws permafrost faster.
Hollesen, J., H. Matthiesen, A. B. Møller, and B. Elberling. 2015. Permafrost thawing in organic Arctic soils accelerated by ground heat production. Nature Climate Change 5:574-578.

European forests are more efficient with their water use, but longer growing season, warmer temperatures, and increased leaf area lead to transpiring more water.
Frank, D. C., B. Poulter, M. Saurer, J. Esper, C. Huntingford, G. Helle, K. Treydte, N. E. Zimmermann, G. H. Schleser, A. Ahlström, et al. 2015. Water-use efficiency and transpiration across European forests during the Anthropocene. Nature Climate Change 5:579-583.

Tall, leafy trees are most likely to get nailed by drought in the future.
McDowell, N. G. and C. D. Allen. 2015. Darcy's law predicts widespread forest mortality under climate warming. Nature Climate Change 5:669-672.

Tuesday, December 1, 2015

Ecological and environmental funding priorities

The above graph does not display patterns of ecological priorities. It's about funding at NIH (recently published in Science). 

The y-axis is millions of dollars spent in 2010. The x-axis is disability-adjusted life years (DALY), the cumulative number of years lost to ill health, disability, or death.

The relationship is a good one, in statistical terms. Diseases with a low burden are funded less than diseases with a high burden. There are residuals, too. Diseases with a global presence (malaria, AIDS) appear to be funded at a greater rate than their US DALY. Lung cancer, migraines, and suicide, which are not as trendy (or likely thought to be the fault of the stricken) are funded less.

My question today is why don't we have a graph like this for ecology/environmental science?

The first question would be what do we put on the x-axis? Do we have an ecological equivalent of DALY? Probably not. That right there is one of the biggest failings in how we prioritize. We don't have a common standard to compare against when prioritizing**.

**If anything, we make expert lists like "Top 50 Priorities in [insert discipline]"

That means the ecological funding graph is likely to be a bar chart. Still, I'd like to see that.**

**I guess another way to do it would be to put it in terms of societal benefit. Ecological goods and services. X-axis would be dollars, then.

Then what are the categories? 

We can't do it by standard categories like carbon cycling, population dynamics, community composition. These do not necessarily speak to societal challenges. 

You need the equivalent of diseases. So, disease would be one category. Climate change, elevated CO2, nitrogen deposition, drought, biodiversity loss, water quality...

Then I guess you'd need to categorize funding for a given year from NSF and maybe EPA, USGS...

That graph wouldn't be as good as the DALY-funding graph, but I'd still like to see it. 

The reason is that I suspect funding priorities are all out of whack relative to societal need. If we could have a bivariate plot, it'd be messier than the DALY graph.

My other suspicion is that the y-axis would be a lot smaller. Many of the most important ecological/environmental issues of our time wouldn't even hit the psoriasis level of funding, much less autism.**

**Heck, all of the NSF Directorate for Biological Sciences (~$700M) is about what NIH spends on depression.

If we could put together this graph, that should help arguments on how ecology/environmental science is funded. 

Because my last suspicion is that we're underfunded. We wish we were funded like peptic ulcers, or even migraines relative to the importance of the issues.** 

**I think down the line there is going to be a grand bargain where Congress will want NSF to justify their funding based on societal need. It might be a bargain worth taking if the ecological/environmental science gets funded at similar proportions to societal need as disease.