Tuesday, May 31, 2016

CUDOS in science

Here's something I hadn't seen before.

I've always found it interesting that there isn't a Hippocratic oath for scientists (scientists didn't exist in the days of Hippocrates). It turns out there are norms for the scientific community, though.

I read about them in Wootton's book but had never heard of them before. These norms were described in 1942 by Robert Merton in his work on the sociology of science.

The norms of science go by the acronym of CUDOS:

Communalism
Universalism
Disinterestedness
Organized Skepticism

The Wikipedia page describes them fairly well, as does this blog post.

In short, these norms describe the ideals of science. The results should be open to everyone. Ideas (and opportunities) are evaluated blind to the characteristics of the individual. Scientists report results independent of the consequences of the outcome. All ideas are subject to scrutiny.

Wootton's analogy between science and the law in these norms is pretty interesting. The legal profession holds similar ideas**. For example, evidence should never be withheld from opposing parties, which is similar to communalism.

**Though Wootton does not delve into it, the adversarial nature of legal actions is not generally replicated in science, even though there is still a tradition of "defending" a thesis or dissertation.

Most scientists believe they deserve more kudos for the work they do. Apparently CUDOS are built into science.

Monday, May 30, 2016

Book review: The Invention of Science

Currently, science is undergoing a convulsion. The very way that science operates is changing. It's a change that appears to be unprecedented in modern times.

For the first time, science is being forced to deal with bias. Questions of reproducibility have become a crisis. The review process is under renewed scrutiny. The nature and openness of publishing is being assaulted legally and illegally. Everyone from editors to scientists to funding agencies is being forced to reckon with the consequences of retractions, which are occurring at an unprecedented rate.

The Invention of Science, a new book by David Wootton, addresses none of these modern ills. But modern crises are often an important time to revisit our history. The Invention of Science is an unparalleled examination of the long, slow (and sometimes convulsive) birth of science.

Note: this thing is a wrist-breaker, with 600 pages before you get to the endnotes. That's a good thing. Understanding the history of a topic is not something to do in CliffsNotes form. To absorb the lessons, you need a comfortable chair and a pen for the margins.

The thesis of this book is that science (as we currently define it) once did not exist. Knowledge was generated through means other than science. In order for science to be invented, a number of conventions had to be created, too. We needed a new vocabulary. People needed to act and interact differently. The conceptual framework that we recognize as scientific had to not only be assembled, it had to displace previous frameworks.

A book of this scope is hard to summarize with any justice.

Here is the first sentence of the book. "Modern science was invented between 1572, when Tycho Brahe saw a nova...and 1704, when Newton published his Opticks...." Science took a bit over 100 years to invent. It's only a bit over 300 years old.

Over a hundred years to invent something that seems so simple that we do it every day? Why so long?

The book answers why it wasn't as easy as people might think.

The middle chapters are the ones I've spent the most time on. These are their titles: Facts, Experiments, Laws, Hypotheses/Theories, Evidence and Judgment.

These chapters lay out the history of the main elements of the modern scientific approach.

I'm going to have to read these chapters one or two more times before I can crystallize them, but their contents are raw material for anyone trying to understand, if not shape, modern science.

For example, the word "fact" (in its modern meaning) once did not exist in any language. The Greeks and Romans had no word for "fact". The concept of a "fact" did not exist. And facts are not the same as the truth.

Let me quote here.

"What is a fact? It is a sort of trump card in an intellectual game...Facts are a linguistic device which ensures that experience always trumps authority and reason."

Facts are a linguistic device? Since when is the truth a device? Facts must be something other than what we recognize them to be.

The experiments chapter describes a number of the early experiments. Here's a quote: "This is the first 'proper' experiment, in that it involves a carefully designed procedure, verification (the onlookers are there to ensure this really is a reliable account), repetition and independent replication, followed rapidly by dissemination."

When did this happen? In 1648, when Pascal's brother-in-law climbed a mountain with a barometer.

But note his definition of an experiment. It involves verification. Repetition. Replication. Followed by dissemination. Our modern crisis comes about because of a lack of verification, repetition, and replication (or reproducibility, as we now call it). The author only touches on it, but he highlights the motto of an Italian society: provando e riprovando. Test and retest. Hard to imagine that as any modern society's motto.

The evidence and judgment chapter has interesting nuggets, too. In part, it examines the legal frameworks of different European countries, which affected how scientists came to prove things. Drawing on techniques from a judicial system that relies on judges, rather than one that relies on juries, leads to a different way of conducting science. That thumbprint is still with us today. Like any organism that has evolved, modern science still bears the marks of its history and past forms.

Here's a quick example he provides:

"A friend of mine was once in hospital in Paris. The doctors told him that they had an hypothesis regarding the nature of his illness which they intended to prove, where in England they would have told him that he had certain symptoms which suggested a diagnosis which they would run tests to confirm."

This is a subtle difference, but its significance should be obvious to anyone practicing science. Different paths do not always lead to the same destination, so choose the path wisely.

Right now, our science is in the middle of a transformation. The question is whether a new layer will simply be added or whether parts will be torn down and rebuilt. Anyone who offers an opinion on how science should be reformulated would be wise to know its history. This is a good book to start with.

But buy it in hardcover so you can write in the margins.

The only drawback is that the margins are not wide enough.

Tuesday, May 24, 2016

Why we cite papers

Scholarly works are set apart from other types of writings by the use of citations. An essay on natural history might cover a scientific topic, but it is just an essay until it contains citations. Scientific papers are not scientific without citations.***

***This blog post is certainly not scientific...no citations here. OK, maybe one.

Most scientists do not question the need for citations or the role they play in the paper itself. But without a common understanding of the role of citation, we have trouble determining when citations are improper and what to do when what we think to be true shifts.

Most of us think the big debates about citations are about formatting. Do we number our citations or list the authors and dates each time? There are deeper issues than that. They have nothing to do with formatting.

The first time I really thought about citations was when I read that Stephen Jay Gould once got into trouble for citing a paper in his thesis that was not held in his school's library.* His advisors questioned the link between the statement he was making and the original citation. They were not claiming that his statement was false, only that he didn't know it was true, because he could not have examined the original source. Another author's judgment on the assessment of truth was insufficient. That's how rigorous citation can be.

*This is a place where a citation is really needed. But, I can't remember which of his books I read this in. Structure of Evolutionary Theory? Panda's Thumb? I'm fuzzy on the details here, but whether it happened or not, it could have happened, which is all that is necessary here.

When I think about how I use citations, I see two types.

The first I call vertical citations.

Vertical citations are the links between what has been found to be true in the past and a statement we currently would like to make to establish the truth.

For example, here is the first sentence of a paper that I just submitted to a journal:

There are approximately 1 billion cattle in the world with cattle populations steadily increasing over the past few decades (Estell et al. 2014).

This is a vertical citation. I am going back into the literature to provide evidence of the truth of a statement. I personally have not counted how many cattle there are in the world. Nor have I determined whether cattle populations have increased or decreased over the past few decades. So, instead of going out and counting cattle, I cite a paper that has established this to be true or that cites the papers that established it. The paper I chose to cite is Estell et al. 2014.***

***et al. stands for et alia (the neuter form), which means "and others" in Latin. Et alia is almost always abbreviated et al., which is funny because we really aren't saving that many characters. Really just one. I think, in part, it gets abbreviated because the actual Latin phrase depends on whether the "others" are male, female, or both. It's easier to write "et al." than to determine whether et alii, et aliae, or et alia is more appropriate.

So, when are vertical citations necessary?

Any time we make a statement in a scientific paper about what we consider to be true outside of the personal experience we are describing, a citation is necessary.

Any time.

If we want to say that there are a billion cattle in the world, we need a citation. If we want to say that atmospheric CO2 concentrations are increasing, we need a citation. The sky is blue? Citation. Gravity exists? Citation.

Now, if we want to say that we performed a certain procedure in an experiment, we do not need a citation. We hold it to be true that we measured something at a certain temperature, but there is no citation for this because it comes from our experience, not the literature.

Vertical citations go back into the literature to justify the truth of the statements we are making. Think of Newton's phrase: "If I have seen further, it is by standing on the shoulders of giants." When we cite a previous work, we are placing our foot on the shoulder of a giant who came before us. We are reaching down vertically to build something taller.

As opposed to vertical citations, there are also horizontal citations. Like vertical citations, they reach down into the literature to establish the truth, but the purpose is different.

Horizontal citations are primarily for context. In the introduction, horizontal citations are typically used to identify intellectual tension. Study A found this. Study B found that. This and that cannot both be true under our current intellectual framework. We cite these papers to show what other researchers have found, which justifies our work.

In the discussion, horizontal citations are used in a similar manner, but rather than establishing that there is intellectual tension, we use them to see whether our findings ease it. Study A found this. Study B found that. We found this, too. Therefore, it seems like this is more likely to be true than that.

With horizontal citations, we are not citing other giants, but instead other dwarfs (or other Isaac Newtons).**

**The original metaphor was "dwarfs standing on the shoulders of giants." Citation here. We think of Newton as a giant now, but originally he would have been a dwarf in the metaphor.

So, when I think about how I reference the literature, it is generally vertically or horizontally. I am either reaching down to stand taller, or reaching across to build linkages.

That's probably a long enough post for now. Down the line, I should cover the consequences of failing to cite the literature correctly and the consequences of determining that the findings of a published paper were not true: what happens when a giant tumbles?

Mostly as a note to myself, comparing legal citations and scientific citations is also instructive. The law only cares about what was legally true at the time the law was being examined. Science cares about what is known to be true at the time the scientific fact was established and after. Hence, changes in the law and changes in scientific understanding have very different consequences.

Wednesday, March 9, 2016

Declines in tree nutrient concentration over past 25 years

I've been trying to catch up on journals lately. Apparently, I hadn't read anything from Global Change Biology over the past 2 years. Must have been distracted. No time like the present...

Here's one that struck me as amazing.

Researchers resampled forest leaves from 1992 to 2009 across a large number of plots in Europe. At each site, for a subset of species, they measured nutrient concentrations and leaf mass, a pretty simple and standard measurement. Doing this allowed them to examine the trajectory of nutrient concentrations (and contents). Nutrient concentrations in leaves are critical to determining tree productivity as well as interactions with herbivores, so knowing whether concentrations are going up or down is essential to modeling the future productivity of these forests.

Here's the simplified result: almost all nutrient concentrations were declining. 20 nutrients had declining concentrations. 2 were increasing.

Here's an example of the pattern for beech. White bars are concentrations; grey bars are contents.

The authors focus on P nutrition the most, emphasizing the role of N deposition in promoting P limitation. Yet, even N concentrations were declining. These declines must be more than just N deposition causing imbalances, especially since N deposition has been declining over the time period. 

The authors suggest elevated atmospheric CO2 might also be playing a role, as well as droughts and warming, but this paper mostly describes the pattern, which is fine.

The big question is: What is causing this massive, continental decline in nutrient concentrations?

Monday, March 7, 2016

ASA statement on P-values

The American Statistical Association's statement on the use of p-values can be found here.

The short list is:

  1. P-values can indicate how incompatible the data are with a specified statistical model. 
  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone. 
  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold. 
  4. Proper inference requires full reporting and transparency. 
  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result. 
  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. 
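Point 5 is easy to see with a toy calculation. The sketch below is only an illustration under stated assumptions, not anything from the ASA statement: two equal-sized groups, a known standard deviation of 1, a normal (z-test) approximation, and the hypothetical effect sizes treated as if they were exactly what was observed. A trivially small effect with a huge sample gets a tiny p-value, while a large effect with a small sample does not clear 0.05.

```python
import math


def two_sided_p(effect_sd: float, n_per_group: int) -> float:
    """Two-sided z-test p-value for a difference in means between two
    equal-sized groups, assuming a known standard deviation of 1.

    effect_sd is a hypothetical observed difference in means, expressed in
    standard-deviation units.
    """
    standard_error = math.sqrt(2.0 / n_per_group)  # SE of a difference in means
    z = effect_sd / standard_error
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical case 1: tiny effect, huge sample.
p_tiny_effect = two_sided_p(effect_sd=0.02, n_per_group=100_000)
# Hypothetical case 2: large effect, small sample.
p_large_effect = two_sided_p(effect_sd=0.5, n_per_group=10)

print(f"effect 0.02 SD, n=100000 per group: p = {p_tiny_effect:.2g}")
print(f"effect 0.50 SD, n=10 per group:     p = {p_large_effect:.2g}")
```

The smaller p-value belongs to the effect that is 25 times smaller, which is exactly why a p-value says nothing on its own about size or importance.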

My personal take is that a few corrections are needed in how p-values are used.

1) p < 0.05 is arbitrary. Report the exact p-values and think of them as a continuum. Don't accept a paper just because p < 0.05, and don't reject one just because p > 0.05.

2) The reported p-value needs to be put in the context of the number of comparisons made. This is where p-hacking shows up. If you run 20 independent analyses and none of the effects is real, you would expect about 1 of them to have p < 0.05 by chance alone. If you are reporting the 20th, you need to state that you did an additional 19 analyses. If you went and added more data or looked more carefully for outliers because a p-value wasn't low enough, this needs to be reported.

3) Report p-values and effect sizes together. An independent assessment of whether the measured effect is biologically relevant is needed.
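The arithmetic behind point 2 can be sketched in a few lines. This assumes every one of the 20 analyses tests a null hypothesis that is actually true and that the tests are independent, so each p-value is uniform on [0, 1]. The Bonferroni correction shown is one common, conservative adjustment, included here for illustration; it is not something the ASA statement prescribes.

```python
import random

ALPHA = 0.05
N_TESTS = 20

# With all nulls true, each test independently has a 5% chance of p < 0.05.
expected_false_positives = N_TESTS * ALPHA
p_at_least_one = 1 - (1 - ALPHA) ** N_TESTS

# Bonferroni: test each analysis at ALPHA / N_TESTS instead of ALPHA.
bonferroni_threshold = ALPHA / N_TESTS
p_at_least_one_bonf = 1 - (1 - bonferroni_threshold) ** N_TESTS

# Monte Carlo check of the uncorrected rate (seeded so runs are repeatable).
rng = random.Random(42)
trials = 20_000
hits = sum(
    any(rng.random() < ALPHA for _ in range(N_TESTS))  # one "study" of 20 null tests
    for _ in range(trials)
)
simulated_rate = hits / trials

print(f"expected false positives per study: {expected_false_positives:.1f}")
print(f"P(at least one p < 0.05):           {p_at_least_one:.3f}")
print(f"simulated rate:                     {simulated_rate:.3f}")
print(f"Bonferroni threshold:               {bonferroni_threshold}")
print(f"P(at least one) after Bonferroni:   {p_at_least_one_bonf:.3f}")
```

Roughly two studies in three will turn up at least one "significant" result from pure noise, which is why the other 19 analyses have to be reported alongside the one that worked.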

Point #2 on the ASA list is the hardest to grasp because it involves the logical assumptions behind the test.

The manuscript's explanation of this is:

Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.

At Retraction Watch, Ron Wasserstein, one of the statement's authors, explains it this way:

Retraction Watch: Some of the principles seem straightforward, but I was curious about #2 – I often hear people describe the purpose of a p value as a way to estimate the probability the data were produced by random chance alone. Why is that a false belief? 
Ron Wasserstein: Let’s think about what that statement would mean for a simplistic example. Suppose a new treatment for a serious disease is alleged to work better than the current treatment. We test the claim by matching 5 pairs of similarly ill patients and randomly assigning one to the current and one to the new treatment in each pair. The null hypothesis is that the new treatment and the old each have a 50-50 chance of producing the better outcome for any pair. If that’s true, the probability the new treatment will win for all five pairs is (½)^5 = 1/32, or about 0.03. If the data show that the new treatment does produce a better outcome for all 5 pairs, the p-value is 0.03. It represents the probability of that result, under the assumption that the new and old treatments are equally likely to win. It is not the probability the new treatment and the old treatment are equally likely to win.
This is perhaps subtle, but it is not quibbling.  It is a most basic logical fallacy to conclude something is true that you had to assume to be true in order to reach that conclusion.  If you fall for that fallacy, then you will conclude there is only a 3% chance that the treatments are equally likely to produce the better outcome, and assign a 97% chance that the new treatment is better. You will have committed, as Vizzini says in “The Princess Bride,” a classic (and serious) blunder.
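Wasserstein's arithmetic can be checked by brute force. Under the null hypothesis, each pair is a fair coin flip, so all 32 win/loss patterns across 5 pairs are equally likely. The p-value here is one-sided (only "new treatment wins every pair" counts as at least as extreme), which matches the example as stated. Note that the whole calculation happens inside the assumption that the null is true.

```python
from fractions import Fraction
from itertools import product

N_PAIRS = 5

# Under the null: 1 = new treatment wins the pair, 0 = old treatment wins,
# each with probability 1/2, so every pattern below is equally likely.
outcomes = list(product([0, 1], repeat=N_PAIRS))  # 2**5 = 32 patterns

# Observed: the new treatment wins all 5 pairs.
# One-sided p-value = P(a result at least this extreme | null).
as_extreme = [o for o in outcomes if sum(o) >= N_PAIRS]
p_value = Fraction(len(as_extreme), len(outcomes))

assert p_value == Fraction(1, 2) ** N_PAIRS
print(p_value, float(p_value))  # 1/32 0.03125
```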
I'm still looking for the right wording on this one, but the trap seems to be reading the p-value as the probability that the null hypothesis is true given the effect size observed.
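To see why a p-value of 0.03 does not mean a 3% chance that the treatments are equally good, here is a minimal Bayes' rule sketch. Everything beyond the 1/32 comes from hypothetical assumptions that are not in the ASA statement or the interview: the alternative is simplified to a single hypothesis in which the new treatment wins each pair with probability 0.8, and two made-up prior beliefs in the null are compared.

```python
def posterior_null(prior_null: float, p_win_if_better: float, n_pairs: int = 5) -> float:
    """P(null | new treatment wins all n_pairs), via Bayes' rule.

    Hypothetical two-hypothesis simplification:
      null: new treatment wins each pair with probability 0.5
      alt:  new treatment wins each pair with probability p_win_if_better
    """
    like_null = 0.5 ** n_pairs              # P(data | null); 1/32 for 5 pairs
    like_alt = p_win_if_better ** n_pairs   # P(data | alt)
    prior_alt = 1.0 - prior_null
    evidence = prior_null * like_null + prior_alt * like_alt
    return prior_null * like_null / evidence


# Hypothetical priors; the data (5 wins out of 5) are the same in both cases.
even_odds = posterior_null(prior_null=0.5, p_win_if_better=0.8)
skeptical = posterior_null(prior_null=0.9, p_win_if_better=0.8)

print(f"P(null | 5/5 wins), prior 0.5: {even_odds:.3f}")
print(f"P(null | 5/5 wins), prior 0.9: {skeptical:.3f}")
```

With identical data and an identical p-value, the probability that the null is true comes out near 9% under one prior and near 46% under the other. Neither is 3%, because that number answers a question the p-value was never computed to answer.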

Saturday, March 5, 2016

Biogeochemical Planetary Boundary: Beyond the zone of uncertainty? (Part II)

I think of scientists as having two jobs.

One is to create intellectual tension.

The other is to resolve it.

Creating intellectual tension is generating hypotheses. A hypothesis whose truth we do not yet know represents intellectual tension. Competing hypotheses about how the world works are also intellectual tension. We do not know which is true. This is the tension.

Resolving intellectual tension can sometimes occur by identifying logical flaws in one hypothesis. Generally, intellectual tension is resolved by collecting data. It is a fair question whether a hypothesis can ever be proven or disproven, and therefore whether intellectual tension is ever fully resolved, but the process of science works to reduce intellectual tension by favoring some hypotheses over others.

In the previous post, I identified some important intellectual tension in the scientific world.

There is the hypothesis that the planet has exceeded a biogeochemical "planetary boundary". Too much nitrogen is being fixed and entering ecosystems. This is the hypothesis.

Yet, it is unclear whether this is causing planetary-scale eutrophication of terrestrial ecosystems or aquatic ecosystems.

On the one hand, we have a hypothesis where the world is awash in nitrogen. We fix more nitrogen than ever and apply it to ecosystems on a massive scale. As a result, nitrogen is leaking out into waterways, creating dead zones in the oceans. Nitrogen is also entering the atmosphere and raining down on even the most remote ecosystems on earth. As a result, terrestrial ecosystems are becoming eutrophied. Species adapted to low nitrogen availability are being crowded out by faster growing plants. Biodiversity is plummeting. Productivity is increasing unsustainably. With all this extra nitrogen, we have exceeded a biogeochemical planetary boundary. Civilization as we know it is threatened.

Yet, the intellectual tension on this hypothesis actually takes the form of a competing hypothesis. It is possible that not only have we not exceeded a planetary boundary for nitrogen, but ecosystems might be becoming more nitrogen limited over time. As temperatures warm and atmospheric CO2 builds up, the demand for N might increase faster than it is being supplied. Plants and microbes become more limited by N. Plant N concentrations decline. Photosynthesis declines. Plants that compete well for N become more dominant. Less N leaks out of ecosystems into streams. Productivity becomes more and more constrained by the lack of nitrogen. Vegetation sequesters less carbon than it could, all because there is not enough nitrogen. As a result, more CO2 accumulates in the atmosphere than would if forests had more nitrogen. Climates warm even faster. Civilization as we know it is threatened.

Intellectual tension could hardly be starker than this.

If you reduce the world to one pixel, there is either too much nitrogen. Or there is too little.

Resolving this tension requires data. On the one hand, we know that N is being fixed in ever greater amounts. On the other hand, CO2 continues to increase, which shifts demand for N even higher. Back again, N is still raining down on ecosystems at an elevated rate. Yet NO3- concentrations in streams are so low that stream water is approaching the NO3- concentrations of distilled water.

The only way to resolve this tension is to collect data on N availability.

Yet we need long-term measurements of N availability to know for sure whether N is becoming more or less limiting.

We don't have these.

We could use the species composition of plant communities in conjunction with indices of which plants indicate low or high N availability, but again we have not invested in long-term monitoring of our plant communities.

The tension of whether the world is becoming more eutrophic or more oligotrophic has existed for a long time now.

It probably is not a bad thing to think that civilization is threatened. But we should at least know whether it is because there is too much nitrogen or too little before we try to fix it. Or else our remedies might exacerbate the situation.

Without the right data, we cannot resolve this tension. That means either we start monitoring key indices like N availability and species composition now and try to answer the question in 10 years.

Or we find a different dataset that allows us to reconstruct N availability on broad spatial scales far enough back in time to discern the trajectory of N availability.

Do we have the data to resolve this tension?

I think we might...

Let's see what reviewers say.

Biogeochemical Planetary Boundary: Beyond the zone of uncertainty? (Part I)

The cycling of nitrogen in a terrestrial ecosystem determines its primary (and secondary) productivity, its diversity, and how much (and how) nitrogen is lost to the atmosphere and waters. In general, plant productivity is limited by the availability of nitrogen. Add a little more nitrogen, and not much changes. Productivity increases, but qualitatively, the ecosystem functions the same. Add a little more, and the ecosystem changes quantitatively, but not qualitatively. Productivity increases. N concentrations increase a bit, but it still is qualitatively similar to the unfertilized ecosystem.

Keep fertilizing the ecosystem with N, and eventually the ecosystem reaches a threshold. Not only does productivity increase, but a lot of other things change. Suddenly, plant N concentrations increase a lot. The plant community shifts towards plants that thrive under higher N. They have high N concentrations, they use alkaloids instead of tannins to defend themselves, and their leaves are built to capture as much light as possible rather than avoid capturing too much light. In the soil, the microbial community shifts, and the abundance of N causes N to start leaving the soils in ways it hadn't before. More NO3- comes out in the waters. More gaseous N is lost to the atmosphere.

This threshold has been reproduced experimentally in individual ecosystems throughout the world. And we've seen it when we non-experimentally add a lot of N to pastures or croplands or even forests.

What we see at the plot level or even at the level of the stand or region could potentially have analogs at the planetary level. As humans fix more and more N and more and more N is added to ecosystems, could the whole planet flip states and autocatalyze from an oligotrophic world to a eutrophic world? Could N limitation become the exception, rather than the rule?

In 2009, Rockstrom et al. published their summary of the state of the earth with respect to Planetary Boundaries (see my 2012 post on the issue here). These planetary boundaries are planet-wide environmental boundaries or ‘tipping points’. Exceed these thresholds, and humanity is at risk.

That paper was updated last year by Steffen et al. As before, the authors state that for climate change, we have entered a "zone of uncertainty" with "increasing risk". Despite all the warming, the sea level rise, the collapsing ice sheets, the potential for a shutdown of the thermohaline circulation, losses of coral reefs, thawing of permafrost, and climatic reorganization underway, their summary is that humanity is still in a safe operating space climatically.

In contrast, for the global nitrogen cycle, the status is the same as in 2009. We are apparently beyond the zone of uncertainty, and humanity is currently at high risk of exceeding a planetary threshold.

That sounds pretty dire.

But are we?

The basis for this assessment is a recent paper by de Vries et al. 2013.

According to the authors of that paper, for the planet to have exceeded a planetary boundary for N, one of the following must have exceeded its safe operating space:

1) eutrophication of terrestrial ecosystems
2) eutrophication of marine ecosystems
3) acidification of soils and fresh waters
4) NOx, a greenhouse gas
5) ozone formation
6) groundwater contamination
7) stratospheric ozone depletion

There is really no evidence of too much tropospheric ozone or too much groundwater contamination for humans to safely inhabit the planet. Globally, soils do not appear to be acidifying due to N deposition and fertilization. NOx levels are not deathly high. Stratospheric ozone levels are still recovering from the CFC phase-out.

Therefore, if humanity has exceeded a biogeochemical planetary boundary, then there must be evidence of planetary-scale eutrophication of terrestrial or marine ecosystems.

In a future post, I'll examine the intellectual tension about this idea...