The American Statistical Association's statement on the use of p-values can be found here.
The short list is:
- P-values can indicate how incompatible the data are with a specified statistical model.
- P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
- Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
- Proper inference requires full reporting and transparency.
- A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
- By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
My personal take is that a few corrections are needed in how p-values are used.
1) p < 0.05 is arbitrary. Report exact p-values and think of them as a continuum. A paper should not be accepted simply because p < 0.05, nor rejected simply because p > 0.05.
2) A reported p-value needs to be contextualized with the number of comparisons made. This is where p-hacking shows up: if you run 20 independent analyses of data with no real effect, on average one will show p < 0.05 by chance alone. If you are reporting the 20th analysis, you need to state that you ran the other 19. Likewise, if you added more data or hunted for outliers because a p-value wasn't low enough, that needs to be reported.
3) p-values and effect sizes must be reported together, along with an independent assessment of whether the measured effect is biologically relevant.
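The multiple-comparisons point in #2 is easy to verify numerically. This is a minimal sketch (not from the original post) that computes, both analytically and by simulation, how often at least one of 20 independent tests crosses p < 0.05 when every null hypothesis is actually true:

```python
import random

random.seed(0)
n_tests = 20      # independent analyses performed
alpha = 0.05      # conventional significance threshold
n_trials = 10_000 # simulated "research projects"

# Analytic: probability at least one of 20 tests is "significant"
# by chance alone = 1 - P(all 20 stay above alpha).
analytic = 1 - (1 - alpha) ** n_tests  # about 0.64

# Simulation: under a true null, a p-value is uniform on [0, 1].
hits = 0
for _ in range(n_trials):
    pvals = [random.random() for _ in range(n_tests)]
    if min(pvals) < alpha:
        hits += 1

print(f"analytic: {analytic:.3f}, simulated: {hits / n_trials:.3f}")
```

So with 20 looks at noise, a "significant" result appears roughly two times out of three, which is why the other 19 analyses must be disclosed.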
#2 on the list is the hardest to comprehend because it involves the logical assumptions underlying the test.
The manuscript's explanation of this is:
Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.
At Retraction Watch, the author explains it this way:
Retraction Watch: Some of the principles seem straightforward, but I was curious about #2 – I often hear people describe the purpose of a p value as a way to estimate the probability the data were produced by random chance alone. Why is that a false belief?
Ron Wasserstein: Let’s think about what that statement would mean for a simplistic example. Suppose a new treatment for a serious disease is alleged to work better than the current treatment. We test the claim by matching 5 pairs of similarly ill patients and randomly assigning one to the current and one to the new treatment in each pair. The null hypothesis is that the new treatment and the old each have a 50-50 chance of producing the better outcome for any pair. If that’s true, the probability the new treatment will win for all five pairs is (1/2)^5 = 1/32, or about 0.03. If the data show that the new treatment does produce a better outcome for all 5 pairs, the p-value is 0.03. It represents the probability of that result, under the assumption that the new and old treatments are equally likely to win. It is not the probability the new treatment and the old treatment are equally likely to win.
This is perhaps subtle, but it is not quibbling. It is a most basic logical fallacy to conclude something is true that you had to assume to be true in order to reach that conclusion. If you fall for that fallacy, then you will conclude there is only a 3% chance that the treatments are equally likely to produce the better outcome, and assign a 97% chance that the new treatment is better. You will have committed, as Vizzini says in “The Princess Bride,” a classic (and serious) blunder.
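The gap between the two probabilities can be made concrete with a small Bayes calculation. This sketch uses an assumed 50/50 prior on the null and an assumed alternative in which the new treatment wins 80% of pairs; both numbers are illustrative choices, not from the interview:

```python
# p-value: probability of 5 straight wins ASSUMING the null is true.
p_value = 0.5 ** 5  # 1/32, about 0.03

# To get P(null | data) you need a prior and an alternative model.
prior_null = 0.5        # assumed prior belief that treatments are equal
q_alt = 0.8             # assumed win rate for the new treatment if it's better
lik_null = 0.5 ** 5     # P(5 wins | null)
lik_alt = q_alt ** 5    # P(5 wins | alternative)

# Bayes' rule: posterior probability the null is true given 5 wins.
post_null = (prior_null * lik_null) / (
    prior_null * lik_null + (1 - prior_null) * lik_alt
)

print(f"p-value: {p_value:.3f}, P(null | data): {post_null:.3f}")
```

Under these assumptions the posterior probability of the null is roughly 0.09, not 0.03, so reading the p-value as "a 3% chance the treatments are equal" is exactly the blunder the interview describes. Different priors or alternatives give different posteriors; the p-value alone determines none of them.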
I'm still looking for the right wording on this one, but what researchers actually want seems to be the probability that the null hypothesis is true given the observed data, and that is a quantity the p-value does not provide.