Lies, Damn Lies, Statistics

#1 How we got the cholesterol obsession and many other pseudo-scientific, so-called evidence-based global health catastrophes. #2 Why your doctor can’t see the tree (you) for the forest of diagnostics and treatments that are derived from, driven by and calibrated on conventional statistics. An article in The Scientist:

http://www.the-scientist.com/?articles.view/articleNo/36781/title/Opinion–Statistical-Misconceptions/

Below are the main points; in the full article online you can read illustrative examples:

“Misconception #1: Correlation implies causality

Every scientist knows that “correlation does not imply causation.” Indeed, both variables may incidentally show the same tendency of quantitative variability without any logical and natural relationship between them at all. Alternatively, two variables may trend together since they are under the impact of the same confounding factors that are causing the changes in both. Nevertheless, the inappropriate assumption of causality is the biggest source of error in interpreting the results of correlation analysis…

Misconception #2: Individuals follow the group

It is not always possible to make inferences about the nature of individuals from information about the group to which those individuals belong. Many researchers do make such assumptions, however, thereby falling victim to the ecological inference fallacy…

Misconception #3: A correlation of zero implies independence

Based on the previous two examples, it is clear that high values of the linear correlation coefficient cannot by themselves be sufficient to conclude about the relationship between the variables. Conversely, a correlation coefficient of zero does not mean that the variables are independent. That is because the correlation coefficient measures linear association only. A U-shaped, non-monotonic relationship, for example, may have a correlation of zero…”
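
A quick way to see misconception #3 for yourself is the minimal sketch below. It is my own toy example, not taken from the article: the `pearson` helper and the choice of y = x² on a symmetric range are assumptions made purely for illustration. Here y is completely determined by x, yet the linear correlation over the whole U comes out as essentially zero.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data (an assumption for illustration): x is symmetric around zero,
# and y depends on x exactly, with no noise at all.
xs = [i / 10 for i in range(-50, 51)]
ys = [x * x for x in xs]

r_full = pearson(xs, ys)               # both arms of the U together
r_right = pearson(xs[50:], ys[50:])    # right arm only (x >= 0)

print(f"full range: r = {r_full:.3f}")
print(f"right arm:  r = {r_right:.3f}")
```

Looking at only one arm of the U gives a strong positive correlation, so the same data can suggest "no relationship" or "strong relationship" depending on which slice of the range was studied.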

The use of false proxies may be based on misconception #1, but it also leads to misleading and wrong conclusions at a new level of error, beyond the statistical interpretation of the data.
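
Misconception #1 is easy to reproduce in a simulation. The sketch below is a hypothetical setup of my own, not from the article: a hidden factor `z` drives both `x` and `y`, which never influence each other. All names, the noise levels and the seed are assumptions chosen for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

random.seed(0)  # fixed seed so the sketch is repeatable
n = 10_000

# Hypothetical variables: z is the hidden common cause (the confounder).
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]  # driven by z, never by y
y = [zi + random.gauss(0, 1) for zi in z]  # driven by z, never by x

r_raw = pearson(x, y)

# Subtract the confounder's contribution (known exactly in this toy setup)
# and correlate what is left over.
r_adjusted = pearson(
    [xi - zi for xi, zi in zip(x, z)],
    [yi - zi for yi, zi in zip(y, z)],
)

print(f"x vs y, raw:             r = {r_raw:.2f}")
print(f"x vs y, confounder removed: r = {r_adjusted:.2f}")
```

With these settings the raw correlation sits around 0.5 even though neither variable causes the other, and it falls to about zero once the common cause is removed. In real data the confounder is usually not known exactly, which is what makes the error so easy to commit.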

#3 is related to what is called in math a ‘local optimum’: if you do not consider the full range of possible independent variables and circumstances, you may find an optimum within your limited scope of study that is not the more general, global optimum. This is the error behind low-fat diets. If you compare the health results of 10, 20 and 30% fat diets and don’t consider 40, 50 and 60%, you might miss the greater benefit of diets at the higher end of the fat range. You would also overlook the fact that fat/carb/protein is a two-dimensional, not a one-dimensional, space (the three shares add up to 100%, so two of them can vary freely), and with it the more general negative impact of carbs and glycemic load on health.
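
The local-versus-global optimum trap can be sketched in a few lines. The `score` curve below is an arbitrary two-peaked function invented for this illustration. It is not diet data and is not fitted to any study; only the shape of the argument matters.

```python
import math

def score(x):
    """Invented two-peaked curve: a small peak near 20, a taller one near 50."""
    small_peak = 1.0 * math.exp(-((x - 20) / 8) ** 2)
    tall_peak = 2.0 * math.exp(-((x - 50) / 8) ** 2)
    return small_peak + tall_peak

narrow_range = [10, 20, 30]              # the limited scope of study
wide_range = [10, 20, 30, 40, 50, 60]    # the fuller range

best_narrow = max(narrow_range, key=score)
best_wide = max(wide_range, key=score)

print(f"best within narrow range: {best_narrow} (score {score(best_narrow):.2f})")
print(f"best within wide range:   {best_wide} (score {score(best_wide):.2f})")
```

A study confined to the narrow range would correctly report 20 as the best value it tested, and would still be wrong about the best value overall, which in this made-up curve is 50.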

See also the earlier post, The Half Life Of Truth

It is misleading to use the term Evidence-Based to characterize modern science and medicine. Even Aristotelian science was based in part on evidence. Is there no interpretation, extrapolation and logical inference in modern science? What characterizes modern science are the specific statistical methods, tools and analyses currently in vogue, which, as this article shows, change over time. What would be a more honest name for the modern scientific method? Actuarial science? Lies, damn lies, science? Monte Carlo, Las Vegas, Macau, casino science? Thermodynamics? In all these MOs (modus operandi) the bottom line is the population or ensemble; that is what counts, and individuals are not important.

Another view of scientific error, the distinction between Type I and Type II errors, carries over to cost/benefit analysis and risk management.

“All statistical hypothesis tests have a probability of making type I and type II errors. For example, all blood tests for a disease will falsely detect the disease in some proportion of people who don’t have it, and will fail to detect the disease in some proportion of people who do have it.”

https://en.wikipedia.org/wiki/Type_I_and_type_II_errors
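
To put rough numbers on the blood test example in the quote, here is a minimal sketch. The prevalence, the false-positive rate and the false-negative rate are hypothetical values I picked for illustration; they do not describe any real test. Taking "no disease" as the null hypothesis, a false positive is a Type I error and a false negative is a Type II error.

```python
def screen(population, prevalence, false_positive_rate, false_negative_rate):
    """Split a screened population into the four possible test outcomes."""
    sick = population * prevalence
    healthy = population - sick
    false_neg = sick * false_negative_rate        # Type II errors
    true_pos = sick - false_neg
    false_pos = healthy * false_positive_rate     # Type I errors
    true_neg = healthy - false_pos
    return true_pos, false_pos, false_neg, true_neg

# Hypothetical numbers, chosen only for illustration.
true_pos, false_pos, false_neg, true_neg = screen(
    population=100_000,
    prevalence=0.01,            # 1% of people actually have the disease
    false_positive_rate=0.05,   # 5% of healthy people test positive
    false_negative_rate=0.10,   # 10% of sick people test negative
)

# Of everyone who tests positive, what share is actually sick?
share_truly_sick = true_pos / (true_pos + false_pos)

print(f"true positives:  {true_pos:.0f}")
print(f"false positives: {false_pos:.0f}")
print(f"false negatives: {false_neg:.0f}")
print(f"positives who are really sick: {share_truly_sick:.1%}")
```

With these made-up rates, 900 sick people and 4,950 healthy people test positive, so only about 15% of positive results belong to someone who has the disease. A test can look accurate at the level of the population and still be wrong most of the time for the individual holding a positive result.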