Claire Duvallet wrote a great post about the Science papers (the original and the follow-up) that examined the role of environmental factors versus “bad luck” in the acquisition of cancer. Claire talks about a lot of the angles around these papers that I also find really interesting: the scope of the work, how it was communicated, and how microbiome science compares to cancer science in its fairness.
I wanted to expand a little bit on the statistical pedagogy of the thing. The original article said:
A linear correlation equal to 0.804 suggests that 65% (39% to 81%; 95% CI) of the differences in cancer risk among different tissues can be explained by the total number of stem cell divisions in those tissues. Thus, the stochastic effects of DNA replication appear to be the major contributor to cancer in humans.
In other words, two-thirds of the variation in cancer risk across tissues can be explained by differences in the number of cell divisions in those tissues. For example, colorectal stem cells divide relatively many times and colorectal cancer is relatively common, while bone cells divide relatively few times and bone cancers are rare.
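The 65% in the quote is just the square of the correlation coefficient. For a simple linear correlation, the share of variance "explained" is $r^2$:

$$
r^2 = 0.804^2 \approx 0.646 \approx 65\%
$$

So the number is a statement about how tightly the tissues line up along a trend, not a count of cancers.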
The interesting link here is that, with each cell division, there is a chance of a cancer-causing error cropping up in that cell’s daughters’ DNA. Thus, this result was widely interpreted as saying that two-thirds of cancers are due to random chance, leaving only one-third of cancers due to environmental factors. Under this reasoning, most people who smoked and got lung cancer actually got lung cancer because they were unlucky, not because they smoked.
This idea is wrong, but it’s wrong in a very subtle way. I was fascinated to read this WebMD article, where the journalist got it wrong, and where a quote from the scientific paper’s authors said it right. The journalist’s lede says: “Although about one-third of cancers can be linked to environmental factors or inherited genes, new research suggests the remaining two-thirds may be caused by random mutations.” I struggle to simplify this (incorrect) statement any further, except to say, maybe: Most cancer is from bad luck.
The paper’s author said: “If two-thirds of cancer incidence across tissues is explained by random DNA mutations that occur when stem cells divide, then changing our lifestyle and habits will be a huge help in preventing certain cancers, but this may not be as effective for a variety of others.” This point is much more subtle: it’s saying that changing environmental factors should change the rates of some cancers more than others.
(Frustratingly, even Science got it wrong: the teaser before the article says “Remarkably, this ‘bad luck’ component explains a far greater number of cancers than do hereditary and environmental factors.” What does it mean to “explain” a number of cancers? I’m pretty sure that’s not a well-formed sentence.)
So I think there are two interesting communication/pedagogy hiccups here.
First, there’s a lack of statistical literacy that led to the conflation of the statements “X% of the differences in rates of A across B is explained by differences in C” and the entirely different “X% of A is caused by a correlate of C”. For example, say I told you that 40% of the variance in female-victim homicide rates across states is explained by gun ownership rates in different states. That is not the same as saying that 40% of women who are murdered are murdered with guns. It’s not even close to saying that!
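To make the gap between those two statements concrete, here is a minimal toy sketch. Everything in it is made up for illustration: the five tissues, their division counts, the scatter, and above all the assumption that environmental exposure multiplies every tissue's baseline risk by the same factor. It is not the paper's model or data. Under that assumption, the variance explained across tissues stays exactly the same while the fraction of cancers attributable to the environment can be anything from 0% upward.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical tissues: log10 of lifetime stem cell divisions (made-up values).
LOG_DIVISIONS = [6.0, 7.5, 9.0, 10.5, 12.0]
# Made-up scatter so the correlation is strong but not perfect.
SCATTER = [0.8, -1.0, 0.6, -1.2, 0.8]
# Assumed log10 lifetime risk if replication errors were the only cause.
BASELINE_LOG_RISK = [0.5 * d - 8.0 + e for d, e in zip(LOG_DIVISIONS, SCATTER)]

def scenario(env_multiplier):
    """Return (R^2 across tissues, fraction of cancers due to environment).

    Toy assumption: the environment multiplies every tissue's baseline
    risk by the same factor, env_multiplier >= 1.
    """
    # Multiplying risk by k adds log10(k) to every tissue's log risk.
    log_risk = [b + math.log10(env_multiplier) for b in BASELINE_LOG_RISK]
    r_squared = pearson_r(LOG_DIVISIONS, log_risk) ** 2
    # Of every k cancers, 1 would have happened at baseline anyway.
    env_fraction = 1.0 - 1.0 / env_multiplier
    return r_squared, env_fraction

for k in (1, 2, 10):
    r2, frac = scenario(k)
    print(f"multiplier={k:>2}  R^2={r2:.3f}  environmental fraction={frac:.0%}")
```

Adding a constant to every tissue's log risk cannot change a correlation, so the R² column is identical in every row, while the environmental fraction moves freely. Knowing the first number tells you nothing about the second.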
Second, I think the authors or journalists could have cleared up a lot of this confusion by saying something like “Because colorectal cells divide more than bone cells, it stands to reason that the percentage of colorectal cancers that are due to random chance is greater than the percentage of bone cancers that are due to random chance”. Or, more pointedly, they could have said the thing that I think is true but isn’t said anywhere: that _we don’t know_ what fraction of cancers are due to random chance.
Please someone tell me if that last thing is true or not.