Statistics: Neither “lies” nor “damned lies”

Encountered yet another “lies, damned lies, and statistics” adherent this morning. My response:

When you adhere to the rules of statistics, there are no “lies” or “damned lies.” There’s only statistics, which is neither. It’s a mathematical science, much like calculus. Calculus, originally called infinitesimal calculus or “the calculus of infinitesimals,” is the mathematical study of continuous change, in the same way that geometry is the study of shape and algebra is the study of generalizations of arithmetic operations.
You understand geometry, right?

Calculus has two major branches, differential calculus and integral calculus. Differential calculus concerns instantaneous rates of change and the slopes of curves.

Are you with me?

Integral calculus concerns accumulation of quantities and the areas under and between curves. These two branches are related to each other by the fundamental theorem of calculus. Both branches make use of the fundamental notions of convergence of infinite sequences and infinite series to a well-defined limit.

How’re you doing so far?

Statistics, on the other hand, is the discipline that concerns the collection, organization, display, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as “all people living in a country” or “every atom composing a crystal.” Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.

If you’re handling data (collecting it, organizing it, measuring it, analyzing it, and presenting it), you must know statistics. If you don’t know statistics, you’re not handling data. You’re just messing with it.

The only rub, i.e., friction, with statistics involves those who don’t understand statistics yet keep walking around chanting that old false-flag tri-mantra denigrating statistics, as if they had the slightest clue as to how statistics even works.

Allow me to give you a real-world example:

Welles, Atkins, and Montgomery, a medical partnership, rarely performs surgeries. But when they do, they ask the patient for overall feedback on a scale of 1 to 10. In the month of August, Welles performed five surgeries, with overall patient feedback scores of 2, 4, 9, 3 and 2, whereas Atkins’ six surgeries resulted in scores of 3, 7, 5, 8, 4 and 3.

What test, exactly, do you use to compare means in order to determine whether you have enough data to say with any statistical certainty that Atkins’ mean of 5 is actually higher than Welles’ mean of 4? Or is there simply not enough data to reach that conclusion, i.e., do the bounds of variability, combined with the small quantity of data, fail to support any definitive differentiation between the means?

You see, if you erroneously believe “there are lies, damned lies, and statistics,” you might be tempted to say, 5 > 4 so Atkins wins this month’s bonus.

But if you actually knew what the heck you were talking about, i.e., if you knew statistics, you’d compare the means using a pooled t-test, which takes about two minutes if you know what you’re doing, and determine that the variability is too high relative to the number of samples to make a definitive determination.
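For anyone who wants to check that two-minute claim, here is a minimal sketch of the pooled t-test on the August scores, using only Python’s standard library. The function name `pooled_t_test` is made up for this illustration, and the 5% significance level (two-sided critical value 2.262 for 9 degrees of freedom, taken from a standard t table) is an assumption, since no threshold is named above.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(a, b):
    """Return (t, df, pooled_variance) for a two-sample pooled t-test.

    Hypothetical helper for illustration; assumes both groups share
    the same population variance.
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    # Standard error of the difference between the two sample means.
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(b) - mean(a)) / std_err
    return t, df, pooled_var


welles = [2, 4, 9, 3, 2]     # mean 4
atkins = [3, 7, 5, 8, 4, 3]  # mean 5

t_stat, df, pooled_var = pooled_t_test(welles, atkins)

# Assumed threshold: two-sided critical t for df = 9 at alpha = 0.05.
T_CRITICAL = 2.262
significant = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, df = {df}, pooled variance = {pooled_var:.3f}")
print("significant" if significant else "not significant at the 5% level")
```

The t statistic comes out at roughly 0.66, nowhere near the 2.262 needed, so these eleven scores give no grounds for saying Atkins outperformed Welles. If SciPy is installed, `scipy.stats.ttest_ind(welles, atkins, equal_var=True)` runs the same pooled test and also reports a p-value.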

THAT’S statistics.

It’s why we rolled our eyes at those in Stan Eval every time they posted a highlight rip with three — just three — scores and scrawled, “We’re noticing a trend!!!” That’s NOT statistics. That’s merely a lie.

So, then, why do we use pooled variance?

“In statistics, pooled variance (also known as combined, composite, or overall variance) is a method for estimating variance of several different populations when the mean of each population may be different, but one may assume that the variance of each population is the same. The numerical estimate resulting from the use of this method is also called the pooled variance.

Under the assumption of equal population variances, the pooled sample variance provides a higher precision estimate of variance than the individual sample variances. This higher precision can lead to increased statistical power when used in statistical tests that compare the populations, such as the t-test.”
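For two samples, that definition can be written out as below. The symbols are my own shorthand, not part of the quoted text: n is a sample size, s² a sample variance, and x̄ a sample mean, with subscript 1 for Welles and 2 for Atkins. The numbers are the August scores from the example above.

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
      = \frac{4(8.5) + 5(4.4)}{9}
      = \frac{56}{9} \approx 6.22

t = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}
  = \frac{5 - 4}{\sqrt{6.22 \left( \frac{1}{5} + \frac{1}{6} \right)}}
  \approx 0.66
```

With 9 degrees of freedom, a t of about 0.66 is far too small to separate the two means at any conventional significance level.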

If all this is going over your head, then perhaps you shouldn’t be walking around falsely claiming, “there’s lies, damned lies, and statistics.” It’s not that doing so makes you look like an idiot. It’s that doing so does a great disservice to one of the most powerful and practically useful mathematical disciplines known to mankind.
