Where is the Randomness?

What’s the probability that a standard six-sided die lands with exactly 3 dots showing? We’ll probably agree that the answer is one out of six. But why this value, and what does it mean? The standard explanation that you’ll find in a typical textbook goes something like this:

Probabilities represent “long run frequencies” of events. If we toss the same die many times, recording the number of dots showing after each throw, we’ll find that 3 dots are showing about one-sixth of the time. And the more observations we make, the closer that frequency will get to one-sixth.
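The "long run frequency" story is easy to simulate. A minimal sketch (the seed and toss counts are arbitrary choices of mine):

```python
import random

# Simulate the textbook "long run frequency" story: toss a fair die
# many times and watch the frequency of threes approach 1/6 = 0.1667.
random.seed(0)  # fixed seed so the run is reproducible

for n in [100, 10_000, 1_000_000]:
    threes = sum(1 for _ in range(n) if random.randint(1, 6) == 3)
    print(f"{n:>9} tosses: frequency of 3 = {threes / n:.4f}")
```

As the number of tosses grows, the observed frequency settles ever closer to one-sixth.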

Now suppose you toss a six-sided die 10,000 times and observe it landing with 3 dots showing after 5,000 of those tosses. Would you shrug and think well, I guess the probability of rolling a 3 is really one-half? Or would you instead question the experimental setup: how you're tossing the die, whether the die is weighted, and so on?

If you aren’t satisfied with the conclusion that the probability of rolling a 3 is really one-half, then you must already have some sense of what that probability is.

In other words, our notion of probability precedes the experiment. And the fact that we’re quicker to question the experimental setup (how we’re rolling, the condition of the die, etc.) than we are to redefine the probability of rolling a 3 indicates that the experiment simply acts to confirm what we already suspect.

Repeatable Events and Randomness

The textbook formulation of probability allows us to assign probabilities to “repeatable events”, like tossing a die or flipping a coin. This is the kind of thing that sounds good in principle despite nobody really knowing what it means.

Do I need to toss the die the exact same way each time? This doesn’t seem possible (because it isn’t). The die isn’t the same, and I’m not the same, not to mention our surroundings. Moreover, if it were possible to toss the die in exactly the same way, shouldn’t the outcome be the exact same each time?

So requiring “exact repeatability” is too strong. Presumably what we want instead is for each toss to be “similar enough” to the other tosses: we shouldn’t toss it really high in some cases, while dropping it from an eighth of an inch off the table in others. In fact, we should never drop it from that close to the table, since in that case we’ve pretty much guaranteed how it’s going to land.

At this point it’s tempting to identify the quality we’re homing in on as randomness. The problem with the close-to-the-table-drop toss is that it isn’t random enough. The kinds of things we can assign probabilities to are random repeatable events, like tossing a die or flipping a coin in a random way.

Unfortunately “randomness”, much like “repeatable event”, is the kind of concept that sounds good on paper without anyone being able to articulate exactly what it is. What makes one way of flipping a coin random and another way nonrandom? With a great deal of practice you might be able to flip a coin in a way that looks random to the casual observer, while still influencing how it will land; you could then use this trick to flip 90 heads in 100 trials.

Since I don’t know what randomness actually is, I can only guess at the response to this thought experiment: your “trick flips” aren’t random precisely because they resulted in 90 heads out of 100 flips. And yet it’s possible to flip a fair coin and observe 90 heads in 100 flips. Perhaps I just got really lucky. The response to this is surely: But you knew in advance how the coin was going to land! Truly random events don’t work like that.
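Just how lucky would "really lucky" be? The tail probability is a short binomial computation:

```python
from math import comb

# Probability of 90 or more heads in 100 flips of a fair coin:
# P(X >= 90) = sum_{k=90}^{100} C(100, k) / 2^100.
p = sum(comb(100, k) for k in range(90, 101)) / 2**100
print(f"P(at least 90 heads) = {p:.2e}")  # on the order of 10^-17
```

Possible, yes, but at roughly one chance in a hundred quadrillion, "I just got really lucky" is not a persuasive explanation.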

Ah, we’re getting warmer.

Banishing Randomness

We invented the concept of randomness in an attempt to disqualify certain “unfair” ways of tossing a die. While we were unable to determine what randomness really is, we discovered that a key component of it is uncertainty. In fact, as far as I can tell, that’s the only component.

Where does this leave us? The textbook formulation of probability attempted to draw a very crisp line around the things we can assign probabilities to: repeatable random events. But we’ve now seen that the only salient aspect of “repeatable random events”—ill-defined as that notion is—is that we’re uncertain about their outcomes.

It’s this observation that motivates an alternative and much more powerful notion of probability: probabilities represent our degree of belief in propositions.

Make no mistake: this is a fundamental shift in perspective. According to the textbook formulation, probabilities are “out there in the world”: the probability of rolling a 3 is a property of the die (and possibly the environment in which it’s tossed), and we determine the true value of this probability by performing experiments. The alternative definition denies this, instead insisting that probabilities are inside our minds.

One immediate consequence of adopting this alternative point of view is that certain statements about probabilities no longer make sense. For example, there is no “true value” of the probability of rolling a 3; you and I may very well calculate different values depending on the circumstances. It also makes no sense to try to discover the probability of an event by performing experiments. To paraphrase Jaynes, this is like attempting to determine how much I like apple pie by experimenting on a piece of apple pie. My affinity for apple pie exists in my mind, not inside a piece of pie.

What Have We Gained?

Adopting this alternative (Bayesian) point of view cleanly sidesteps the issues we presented above: we don’t need to figure out what a “repeatable event” is, and we don’t need to postulate a magical quality called “randomness”. But we’ve gained much more besides. In this new frame of mind, we can now assign probabilities to one-off events, and even to events that have already occurred.

As a first taste of this, consider the following: Alice stands facing away from Bob; she flips a coin and observes which face is showing, but doesn’t reveal this to Bob. Carol asks Bob: “What’s the probability that the coin landed heads-up?” Bob says it’s one-half. Then Carol asks Alice the same question. Unlike Bob, she claims it’s zero: she saw the coin land tails-up.

This simple scenario is enough to exhibit the flavor of Bayesianism as opposed to the classical (Frequentist) view: probabilities reflect an agent’s state of knowledge, so two agents with different information can justifiably assign different probabilities to the very same event.

Are Probabilities Subjective?

Doesn’t this mean that probabilities (à la Bayes) are completely subjective? Absolutely! But this doesn’t matter nearly as much as it might seem. That’s because a subjective statement is just an objective statement with a certain amount of implied context; in theory, by explicitly including the context, we can transform any subjective statement into an equivalent objective one.

For example, when Alice says, “vanilla is the best flavor of ice cream”, the statement, “vanilla is the best flavor of ice cream” is clearly subjective. But the statement, “Alice thinks vanilla is the best flavor of ice cream” is entirely objective. Equivalently, we might say, “If I were Alice, I would think vanilla is the best flavor of ice cream”. In other words: if I had all of the information she had (instead of the information I have), I would arrive at the same conclusion.

This explains how Alice and Bob can state different values for the probability that the coin landed heads: Alice knows more than Bob does. Both should agree that, were they put in the other’s position, they’d have reached the same conclusion (as the other did).

So probabilities are subjective because “subjective” just means “dependent on an individual’s mind”, but they’re far from arbitrary.

Tutorial

With a bit of work, it’s possible to develop rules that probabilities must abide by. The result is an extension of logic to include propositions whose truth we’re uncertain about. Specifically, we’ll imagine a function $\mathrm{C}$ that takes a proposition as input and indicates how strongly we believe the proposition by outputting a value from 0 to 1 (0 meaning “not at all” and 1 meaning “certain”).

Actually, this function will take two propositions as arguments: the first being the proposition under consideration (e.g. “The coin will land heads-up”), and the second being all of the background information we have available:

$$\mathrm{C} : \mathrm{Prop} \times \mathrm{Prop} \rightarrow [0, 1]$$

$$\underbrace{\mathrm{C}(A \vert X)}_{\text{our certainty, or credence, in } A \text{, in light of background knowledge } X}$$

We have no idea how to define $\mathrm{C}$! In some sense, the definition would be a recipe for a “rational mind”, something AI researchers would love to discover.

But we do know something about how $\mathrm{C}$ should behave. Specifically, given any proposition $A$, our certainty that $A$ is the case and our certainty that $A$ isn’t the case (given some background information $X$) should sum to 1:

$$\mathrm{C}(A \vert X) + \mathrm{C}(\overline{A} \vert X) = 1$$
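We can check this rule by enumeration for the die example (the enumeration style here is my own illustration):

```python
from fractions import Fraction

# Complement rule, checked over a fair die's six equally likely outcomes,
# with A = "the die shows 3": C(A|X) + C(not-A|X) = 1.
outcomes = range(1, 7)
c_a = Fraction(sum(1 for o in outcomes if o == 3), 6)
c_not_a = Fraction(sum(1 for o in outcomes if o != 3), 6)
print(c_a, c_not_a)  # 1/6 5/6
print(c_a + c_not_a == 1)  # True
```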

For example, if we think that the probability of rolling a 3 is one-sixth, then we ought to think that the probability of not rolling a 3 is five-sixths. Second, we can “decompose” (or “factor”) our confidence in a conjunction into a product of probabilities:

$$\begin{aligned} \mathrm{C}(AB \vert X) &= \mathrm{C}(A \vert X) \, \mathrm{C}(B \vert AX) \\ &= \mathrm{C}(B \vert X) \, \mathrm{C}(A \vert BX) \end{aligned}$$
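The product rule, too, can be verified by brute-force enumeration. A sketch using a tiny made-up deck (the deck contents and propositions are my own illustration):

```python
from fractions import Fraction
from itertools import permutations

# Draw two cards without replacement from a 4-card toy deck. Let
# A = "first card is the ace" and B = "second card is a heart".
deck = ["ace_spades", "2_hearts", "3_hearts", "4_clubs"]
worlds = list(permutations(deck, 2))  # equally likely (first, second) draws

def C(pred, given=lambda w: True):
    """Fraction of worlds satisfying `given` in which `pred` holds."""
    possible = [w for w in worlds if given(w)]
    return Fraction(sum(1 for w in possible if pred(w)), len(possible))

A = lambda w: w[0] == "ace_spades"
B = lambda w: "hearts" in w[1]

# Both factorings of C(AB|X) agree:
assert C(lambda w: A(w) and B(w)) == C(A) * C(B, given=A) == C(B) * C(A, given=B)
print(C(lambda w: A(w) and B(w)))  # 1/6
```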

TODO Finish

Logical, Not Causal

TODO Hypergeometric example
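Pending the worked example, here is a placeholder sketch of the standard urn setup in the style of Jaynes (the urn contents and numbers are my own illustration, not the author’s): learning the outcome of a *later* draw changes our credence about an *earlier* one, even though no causal influence runs backwards in time.

```python
from fractions import Fraction
from itertools import permutations

# An urn holds 3 red and 2 white balls; we draw two without replacement.
# Knowing the SECOND draw was red lowers our credence that the FIRST was
# red. The second draw can't causally affect the first; the dependence
# between the propositions is logical, not causal.
balls = ["r1", "r2", "r3", "w1", "w2"]
worlds = list(permutations(balls, 2))  # equally likely ordered draws

def C(pred, given=lambda w: True):
    """Fraction of worlds satisfying `given` in which `pred` holds."""
    possible = [w for w in worlds if given(w)]
    return Fraction(sum(1 for w in possible if pred(w)), len(possible))

first_red = lambda w: w[0].startswith("r")
second_red = lambda w: w[1].startswith("r")

print(C(first_red))                    # 3/5
print(C(first_red, given=second_red))  # 1/2: updated "backwards in time"
```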

The real difference between Frequentists and Bayesians is that Frequentists want probabilities to be ontological, whereas Bayesians see them as epistemological. Frequentists see probabilities as part of the territory, whereas Bayesians locate them on the map.