Confidence Intervals

On Friday, I responded to a prompt on the platform formerly known as Twitter asking for controversial statistics opinions. I offered one of my own:

This take did prove controversial—I saw some people I consider very smart who agreed with me and some who disagreed. In light of the discussions, I think I have a slightly more modest take I’d like to push forward. But it will take a little bit to get there. First, let’s get the basics out of the way and briefly talk about the distinction at the center of this discussion.

The Distinction

A confidence interval is a statistic—a function of data. If the assumptions of a 95% confidence interval hold, then before we collect the data, we know that there is a 95% probability we will collect a data set that results in a confidence interval containing the true population parameter.

After we collect the data, the confidence interval is just an interval, and the population parameter is either in it or it isn’t (though we don’t know which). Given a realized (collected) data set, it is no longer correct to make a probabilistic statement about a confidence interval.
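
To make the “before versus after” point concrete, here’s a minimal simulation sketch (the normal population, its mean and standard deviation, the sample size, and the number of repetitions are all arbitrary choices for illustration, not anything from the discussion above): it repeatedly runs the usual t-interval procedure for a population mean and checks how often the realized intervals end up containing the true value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

true_mean = 10.0   # population parameter (we only know it because we're simulating)
sigma = 3.0        # population standard deviation
n = 30             # sample size per data set
reps = 10_000      # number of times we repeat the whole procedure

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mean, sigma, size=n)
    xbar = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    margin = stats.t.ppf(0.975, df=n - 1) * se   # 95% two-sided t critical value
    lo, hi = xbar - margin, xbar + margin
    covered += (lo <= true_mean <= hi)           # each realized interval either covers or it doesn't

print(f"Fraction of intervals containing the true mean: {covered / reps:.3f}")
# Comes out close to 0.95: the 95% describes the procedure, not any one realized interval.
```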

Why I Said What I Said

I’m going to introduce an analogy here. Let’s say I’m going to flip a coin, and my prediction is that I’m going to get heads. I’m playing a game, and if my prediction is right I get 1 point; if I’m wrong I get 0.

  1. Before I flip the coin, the number of points I’m going to get is a Bernoulli random variable.
  2. Next I flip the coin, but I cover the coin with my hand as it lands. So the number of points I’m going to get is now locked in—it’s no longer random; it’s now just a fixed number (either 0 or 1), though I don’t know what that number is.
  3. Finally, I lift my hand and see whether I was right or wrong. So now I know whether or not I got a point.

The important thing to notice here is that in Step 1, the number of points I’m going to get is a random variable, while in Steps 2 and 3 the number of points I’m going to get is just a fixed number—nothing’s random anymore.
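
For what it’s worth, the same before/after distinction can be written down in a few lines of illustrative Python (the seed and the number of repetitions are arbitrary): the 50% is a property of the flipping procedure, visible by repeating it, while any single executed flip leaves behind a plain 0 or 1.

```python
import random

random.seed(42)  # arbitrary seed, just for reproducibility

# Step 1: before the flip, my point total is a Bernoulli(0.5) random variable.
# The "50%" is a statement about the procedure, which shows up if we repeat it:
long_run = sum(random.random() < 0.5 for _ in range(100_000)) / 100_000
print(f"Long-run fraction of flips that earn a point: {long_run:.3f}")  # about 0.5

# Step 2: one flip actually happens. From here on, `points` is a fixed number
# (0 or 1), even while my hand is still covering the coin.
points = int(random.random() < 0.5)

# Step 3: lift the hand and look.
print(f"Points from this flip: {points}")
```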

Here are two possible assessments of Step 2.

  • Assessment A: “I am 50% confident that I gained one point from this coin flip.”
  • Assessment B: “The probability that I gained one point from that coin flip is 50%.”

In the confidence interval situation, before we collect data, the outcome of the procedure to construct a 95% confidence interval is a random variable (like Step 1 of the above analogy). After we collect the data, the outcome is locked in (realized), and the population parameter is either in the interval or it’s not, but we don’t know which, just like Step 2 of the above analogy. (Step 3 never happens for confidence intervals—if we knew the true population parameter, we wouldn’t be collecting data and constructing a confidence interval in the first place.)

The distinction we’re talking about here for confidence intervals is between these two assessments of Step 2. The way that introductory statistics is taught presupposes that, all else equal, it’s a really big deal that students say something like Assessment A when they talk about confidence intervals, and that they know it is incorrect to say something like Assessment B. (By “really big deal,” what I mean is that on the final exam, realistically we’re going to be able to test like 5–10 skills, tops, and the status quo is that this distinction is going to be one of those things we test.)

The crux of my perspective is that I think we should re-examine whether insisting that students avoid phrasing like Assessment B in favor of Assessment A is really something we ought to devote non-negligible time and attention to (and grade students on).

Why I Think We Should Think About This

First of all, part of my perspective is that when we are discussing how an introductory statistics course ought to be taught, we ought to optimize towards the experience of people who are not likely to take another statistics course. Statistics is a topic that touches a lot of other areas, and for that reason lots of people are required to take statistics. In fact, the bulk of students in introductory statistics classes probably fall in this category. I think it’s important for society that those students walk away with a reasonable understanding of the important concepts. And I think aiming for this goal is one of the best ways for statistics instructors to have a positive impact.

Further, among that group, I think we should follow the triage principle and specifically be most interested in catering towards people who are on the borderline of grasping the most important concepts we want students to leave this class grasping—people who might understand if we teach the material in a way that they understand, but won’t if we don’t. We don’t need to worry so much about people who are going to understand the material even if the curriculum isn’t catered to them. We also don’t need to hold up the rest of the class to cater to students who, for whatever reason, are not putting a baseline level of effort into the class. (I’ve 100% seen situations where students who were doing this were prioritizing correctly, so I’m writing here without judgment of students in that group.)

So here’s the kind of scenario I have in mind. In my experience with teaching and tutoring college-level introductory statistics (or, very similarly, AP Statistics), students who are on the borderline of grasping the basic concepts in statistics can be confused about why we object so strongly to the phrasing “the probability the true value is in the confidence interval is 95%,” and their confusion is orthogonal to the concepts we really care about them understanding. I’ve seen students who grasp that a 95% confidence interval will contain the true value 95% of the time, who understand that for a single realization the true value either is or is not in the interval, but who still get hung up on why Assessment B is wrong.

Imagine a student like this who understands the important concepts, but they get back their quiz or test and see that they lost points on the question that asks them not to say “the probability the true value is in the confidence interval is 95%.” They’re frustrated; they’re putting in effort, but they’ve been struggling all semester, and they really just want to get a B. Now we’ve moved on to hypothesis testing. Do we want to create an incentive for this student to spend more time dwelling on the distinction between Assessment A and Assessment B, or do we want them to focus on understanding hypothesis testing (another tricky concept)?

I think their confusion is understandable, and not that central to the most important concepts in introductory statistics. I don’t think getting to this level of precision about what a probability really means is a valuable use of the time and attention of students who are on the borderline of grasping the basic concepts.

I agree that it’s important for students to grasp both (1) that in the long run a 95% confidence interval will contain the true value 95% of the time and (2) that for a single realization the true value either is or is not in the interval. If they grasp that, let’s move on to the next topic.

One Other Caveat

I think there are other groups of students one could validly consider targeting in an introductory statistics class.

  • For example, theoretical machine learning researchers seem to come disproportionately from computer science and engineering departments, even though statisticians are clearly well-positioned to do this kind of research too and bring a valuable perspective. I’ve noticed anecdotally that there are lots of people out there who are under the vague impression that statistics is not mathematically rigorous, probably because of the way we teach introductory statistics. (If you are in this position and you’re wondering what mathematically rigorous statistics treatments are out there, some of the many good books you could check out include van der Vaart, Wainwright, Bühlmann and van de Geer, or DasGupta.)
  • Relatedly, another group we should maybe think about is people who are on the borderline of deciding whether to proceed with a statistics major instead of a different STEM major. To appeal to these students, we could choose to focus on some of the most interesting topics in modern statistics, like machine learning, causal inference, or selective inference.

My hunch is that it would probably be better to have a separate introductory course available catering to these kinds of students.