Probability Basics Explained: Understanding Chance, Events, and Rules

Updated June 2026
Probability is the branch of mathematics that quantifies uncertainty, assigning numbers between 0 and 1 to events to express how likely they are to occur. It forms the theoretical foundation of all inferential statistics, enabling researchers to determine how likely observed results would be under various assumptions and to make rational decisions in the face of incomplete information.

What Probability Measures

At its core, probability assigns a number to an event that reflects how likely that event is to occur. A probability of 0 means the event is impossible, a probability of 1 means it is certain, and values between 0 and 1 express degrees of likelihood. Flipping a fair coin gives heads a probability of 0.5 because, over many flips, heads appears about half the time. Rolling a 6 on a fair die has probability 1/6 (approximately 0.167) because the die has six equally likely outcomes.

The sample space is the set of all possible outcomes of an experiment or process. For a single coin flip, the sample space is {heads, tails}. For rolling two dice, the sample space contains 36 ordered pairs, from (1,1) to (6,6). An event is any subset of the sample space. The event "rolling an even number" on a single die corresponds to the subset {2, 4, 6}, which contains 3 of the 6 equally likely outcomes, giving it probability 3/6 = 0.5.
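The sample-space idea above can be sketched in a few lines of Python. This is a minimal illustration, assuming equally likely outcomes; the helper name `probability` and the "sum to 7" event are hypothetical additions for demonstration, not part of the original discussion.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two dice: all ordered pairs (first die, second die).
two_dice = list(product(range(1, 7), repeat=2))

def probability(event, sample_space):
    """Classical probability: favorable outcomes / total equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

# Event from the text: an even number on a single die.
single_die = list(range(1, 7))
p_even = probability(lambda face: face % 2 == 0, single_die)

# Illustrative extra event (an assumption, not from the text): the dice sum to 7.
p_sum_seven = probability(lambda pair: sum(pair) == 7, two_dice)

print(len(two_dice), p_even, p_sum_seven)  # 36 1/2 1/6
```

Using `Fraction` rather than floats keeps the results exact, which makes it easier to compare them with hand calculations.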

Three main interpretations of probability exist. The classical (theoretical) interpretation defines probability as the ratio of favorable outcomes to total equally likely outcomes. The frequentist interpretation defines probability as the long-run relative frequency of an event over many repeated trials. The subjective (Bayesian) interpretation treats probability as a degree of belief that can be updated as new evidence arrives. Each interpretation leads to the same mathematical rules, but they differ in philosophy and application.

Fundamental Rules of Probability

Three axioms, established by mathematician Andrey Kolmogorov in 1933, form the foundation from which all probability rules derive.

First, the probability of any event is non-negative: P(A) >= 0 for any event A. Second, the probability of the entire sample space is 1: P(S) = 1, meaning some outcome must occur. Third, for any countable collection of mutually exclusive events (events no two of which can occur together), the probability of their union equals the sum of their individual probabilities.

The complement rule states that P(not A) = 1 - P(A). If the probability of rain tomorrow is 0.3, the probability of no rain is 0.7. This rule is especially useful when calculating the probability of "at least one" occurrence, since P(at least one) = 1 - P(none).
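As a rough sketch of the "at least one" shortcut, the function below computes 1 - P(none). It assumes the trials are independent with the same success probability, so P(none) is a simple power. The function name `at_least_one` and the four-dice example are illustrative assumptions.

```python
from fractions import Fraction

def at_least_one(p_single, trials):
    """P(at least one success) = 1 - P(no successes).

    Assumes independent trials that share the same success probability.
    """
    p_none = (1 - p_single) ** trials
    return 1 - p_none

# Complement of the rain example from the text: P(no rain) = 1 - 0.3.
p_no_rain = 1 - Fraction(3, 10)

# Illustrative example: at least one six in four rolls of a fair die.
p_six_in_four = at_least_one(Fraction(1, 6), 4)

print(p_no_rain, p_six_in_four, float(p_six_in_four))
```

Computing the same quantity directly would require adding the probabilities of exactly one, two, three, and four sixes, which is why the complement is usually the easier route.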

The addition rule for any two events states: P(A or B) = P(A) + P(B) - P(A and B). You subtract the intersection because outcomes that belong to both events would otherwise be counted twice. For mutually exclusive events, P(A and B) = 0, so the formula simplifies to P(A or B) = P(A) + P(B). The probability of drawing a heart or a king from a standard deck is 13/52 + 4/52 - 1/52 = 16/52, where the subtracted 1/52 is the king of hearts, which is both a heart and a king.
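The heart-or-king calculation can be checked by enumerating the deck. The sketch below models events as Python sets, so union and intersection map directly onto `|` and `&`; the card representation and the helper `p` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def p(event):
    """Probability of an event (a set of cards) for one random draw."""
    return Fraction(len(event), len(deck))

hearts = {card for card in deck if card[1] == "hearts"}
kings = {card for card in deck if card[0] == "K"}

direct = p(hearts | kings)                            # count the union directly
by_rule = p(hearts) + p(kings) - p(hearts & kings)    # addition rule

print(direct, by_rule)  # 4/13 4/13, i.e. 16/52
```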

The multiplication rule for independent events states: P(A and B) = P(A) x P(B). Two events are independent when the occurrence of one does not affect the probability of the other. Successive coin flips are independent because each flip has no memory of previous results. The probability of getting heads on three consecutive flips is 0.5 x 0.5 x 0.5 = 0.125.
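One way to sanity-check the three-heads figure is to list every sequence of three fair flips and count. This is a small sketch under the assumption that all eight sequences are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All sequences of three flips: ('H','H','H'), ('H','H','T'), ...
flips = list(product("HT", repeat=3))

# Count the sequences that are all heads and divide by the total.
all_heads = [seq for seq in flips if seq == ("H", "H", "H")]
p_three_heads = Fraction(len(all_heads), len(flips))

# Multiplication rule for independent events: 1/2 x 1/2 x 1/2.
p_by_rule = Fraction(1, 2) ** 3

print(p_three_heads, p_by_rule, float(p_by_rule))  # 1/8 1/8 0.125
```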

Conditional Probability

When events are not independent, the probability of one event changes depending on whether another event has occurred. Conditional probability, written P(A|B), means "the probability of A given that B has occurred." The formula is:

P(A|B) = P(A and B) / P(B)

Consider a standard deck of 52 cards. The probability of drawing an ace is 4/52. Now suppose you know the card drawn is either a face card or an ace (16 cards in total). Let A be "the card is an ace" and B be "the card is a face card or an ace." Then P(A and B) = 4/52 and P(B) = 16/52, so P(A|B) = (4/52) / (16/52) = 4/16 = 0.25. Conditioning on B restricts the sample space to only those outcomes where B is true.
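The ace example can be reproduced by applying the formula to sets of cards. This is a minimal sketch; the helpers `p` and `conditional` and the card representation are hypothetical names chosen for illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(RANKS, SUITS))

def p(event):
    """Probability of an event (a set of cards) for one random draw."""
    return Fraction(len(event), len(deck))

def conditional(a, b):
    """P(A|B) = P(A and B) / P(B); undefined when P(B) = 0."""
    if not b:
        raise ValueError("conditioning event has probability 0")
    return p(a & b) / p(b)

aces = {card for card in deck if card[0] == "A"}
face_or_ace = {card for card in deck if card[0] in {"A", "J", "Q", "K"}}

print(p(aces), conditional(aces, face_or_ace))  # 1/13 1/4
```

Dividing by P(B) is equivalent to counting aces among only the 16 cards in B, which is the "restricted sample space" view.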

Conditional probability explains why many intuitive probability judgments go wrong. The probability of testing positive for a rare disease given that you actually have it (sensitivity) is very different from the probability of having the disease given that you tested positive (positive predictive value). Confusing these two conditional probabilities leads to unnecessary panic when screening tests return positive results for rare conditions.

Bayes' Theorem

Bayes' theorem provides a formal method for reversing conditional probabilities, computing P(A|B) from P(B|A):

P(A|B) = P(B|A) x P(A) / P(B)

This theorem is the mathematical engine behind Bayesian statistics. It tells you how to update your beliefs (the prior probability P(A)) in light of new evidence B, using the likelihood P(B|A) and the overall probability of the evidence P(B), to arrive at revised beliefs (the posterior probability P(A|B)).

A practical example: suppose a disease affects 1 in 1000 people (P(disease) = 0.001). A test detects the disease 99% of the time when it is present (sensitivity = 0.99) and correctly returns negative 95% of the time when the disease is absent (specificity = 0.95). If you test positive, what is the probability you actually have the disease? The denominator P(positive) is the sum of true positives (0.99 x 0.001) and false positives (0.05 x 0.999, where 0.05 = 1 - specificity). Applying Bayes' theorem:

P(disease|positive) = (0.99 x 0.001) / ((0.99 x 0.001) + (0.05 x 0.999)) = 0.00099 / 0.05094 = approximately 0.019

That is about 2%. Despite the test's apparent accuracy, a positive result for a rare disease still means you probably do not have it, because false positives from the large healthy population overwhelm true positives from the small affected population.
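The screening calculation can be wrapped in a small function so the effect of prevalence is easy to explore. This is a sketch, not a clinical tool; the function name `posterior` and the second call with a 10% prior are illustrative assumptions.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem.

    prior        -- P(disease) before testing
    sensitivity  -- P(positive | disease)
    specificity  -- P(negative | no disease)
    """
    true_positive = sensitivity * prior
    false_positive = (1 - specificity) * (1 - prior)
    return true_positive / (true_positive + false_positive)

# Numbers from the text: prevalence 1 in 1000, sensitivity 0.99, specificity 0.95.
rare = posterior(prior=0.001, sensitivity=0.99, specificity=0.95)

# Hypothetical comparison: the same test when 10% of those tested have the disease.
common = posterior(prior=0.10, sensitivity=0.99, specificity=0.95)

print(f"rare disease: {rare:.3f}, common disease: {common:.3f}")
```

Raising the prior while keeping the test unchanged raises the posterior sharply, which shows that the low 2% figure comes from the rarity of the disease rather than from a weak test.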

Independence and Dependence

Two events are independent if knowing that one occurred provides no information about whether the other will occur. Mathematically, A and B are independent if and only if P(A and B) = P(A) x P(B); when P(B) > 0, this is equivalent to P(A|B) = P(A). Successive coin flips, rolls of separate dice, and random draws made with replacement are standard examples of independent events.

Events are dependent when one event affects the probability of another. Drawing cards without replacement creates dependence: after drawing one ace from a deck of 52, the probability of drawing a second ace changes from 4/52 to 3/51. Manufacturing defects on an assembly line can exhibit dependence if a machine malfunction causes a burst of defective items rather than isolated random failures.
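The two-ace example can be checked by enumerating every ordered pair of distinct cards, which is what drawing without replacement produces. The sketch below also computes the with-replacement figure for contrast; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(RANKS, SUITS))

# Without replacement: ordered pairs of two different cards (52 x 51 of them).
ordered_draws = list(permutations(deck, 2))
both_aces = [draw for draw in ordered_draws if draw[0][0] == "A" and draw[1][0] == "A"]
p_without_replacement = Fraction(len(both_aces), len(ordered_draws))

# With replacement the draws are independent, so the probabilities multiply.
p_with_replacement = Fraction(4, 52) ** 2

print(p_without_replacement, p_with_replacement)  # 1/221 1/169
```

The enumerated result matches 4/52 x 3/51, the product of the first-draw probability and the conditional probability for the second draw.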

Mistaking dependent events for independent ones (or vice versa) leads to serious errors. The gambler's fallacy assumes that independent random outcomes are dependent: believing that after a streak of 10 heads, tails is "due." The coin has no memory, so each flip remains 50-50 regardless of history. Conversely, treating stock market returns as independent when they exhibit serial correlation leads to underestimating the probability of extended market crashes.

The Law of Large Numbers

The law of large numbers guarantees that as the number of independent, identically distributed trials increases, the sample average converges toward the expected value (population mean). Flip a fair coin 10 times and you might observe 70% heads, but flip it 10,000 times and the observed proportion will almost certainly fall within a couple of percentage points of 50%. This law justifies the frequentist interpretation of probability and explains why casinos reliably profit despite the randomness of individual bets.

The law does not say that outcomes must "balance out" in the short run. After 10 heads in a row, the coin does not owe you tails. Rather, future results will eventually dilute the early imbalance as the sample grows. The proportion converges because the typical gap between the number of heads and half the number of flips grows only in proportion to the square root of the number of flips, while the number of flips itself grows much faster, so the gap shrinks relative to the total.
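A short simulation can make the convergence visible. This is a sketch that assumes Python's pseudo-random generator is an adequate stand-in for a fair coin; the function name, the checkpoints, and the seed are arbitrary choices, and exact figures will differ with another seed.

```python
import random

def running_proportions(checkpoints, seed=0):
    """Simulate fair coin flips and record the proportion of heads at each checkpoint."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    flips = 0
    results = {}
    for target in sorted(checkpoints):
        while flips < target:
            heads += rng.random() < 0.5  # True counts as 1
            flips += 1
        results[target] = heads / target
    return results

proportions = running_proportions([10, 100, 10_000, 100_000])
for n, proportion in proportions.items():
    print(f"{n:>7} flips: proportion of heads = {proportion:.4f}")
```

The early checkpoints can sit well away from 0.5, while the later ones typically settle close to it, without any individual flip compensating for earlier results.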

The Central Limit Theorem

The central limit theorem (CLT) is arguably the most important result in probability for applied statistics. It states that when you take sufficiently large random samples from any population with finite mean and variance, the distribution of sample means will be approximately normal, regardless of the shape of the original population distribution.

This result is remarkable because it means you can use normal distribution-based methods for sample means (z-tests, confidence intervals) even when the underlying data is skewed, uniform, bimodal, or follows any other non-normal pattern. The approximation improves as sample size increases, with n = 30 often cited as a rough threshold for "sufficiently large," though highly skewed populations may require larger samples.

The CLT also quantifies how much sample means vary: the standard deviation of the sampling distribution (called the standard error) equals the population standard deviation divided by the square root of the sample size. This relationship explains why larger samples produce more precise estimates and why quadrupling the sample size cuts the standard error in half.
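The standard error relationship can be checked by simulation. The sketch below draws repeated samples from an exponential distribution (a skewed population with mean 1 and standard deviation 1, chosen here as an assumption for illustration) and compares the observed spread of sample means with the population standard deviation divided by the square root of n. The function name, replication count, and seed are arbitrary.

```python
import math
import random
import statistics

def sample_mean_spread(sample_size, replications=2000, seed=0):
    """Standard deviation of sample means drawn from an exponential(1) population."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(replications)
    ]
    return statistics.stdev(means)

se_30 = sample_mean_spread(30)
se_120 = sample_mean_spread(120)  # four times the sample size

print(f"n = 30:  observed {se_30:.4f}, theory {1 / math.sqrt(30):.4f}")
print(f"n = 120: observed {se_120:.4f}, theory {1 / math.sqrt(120):.4f}")
print(f"ratio of spreads: {se_30 / se_120:.2f}")
```

Because the simulated spreads are estimates, they should land near the theoretical values rather than match them exactly, and the ratio should come out close to 2, in line with the claim that quadrupling the sample size halves the standard error.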

Key Takeaway

Probability provides the mathematical language for quantifying uncertainty. The addition and multiplication rules handle combined events, conditional probability reveals how knowledge changes likelihood, and the central limit theorem connects sample data to population-level conclusions regardless of the underlying distribution shape.