1 Introduction

Charles Darwin presented his theory of evolution in The Origin of Species,Footnote 1 published in 1859. In so doing, he fundamentally transformed the practice of biology. Prior to Darwin, biology was mostly a descriptive discipline concerned with dissecting organisms and with arranging species into coherent systems of classification. After his work, biology was a proper theoretical science, one worthy of a place right alongside physics and chemistry. In the light of evolution, the patterns of anatomical similarities and differences among species were no longer brute facts, but were instead the logical outcome of a comprehensible historical process. Classification now had an objective basis in the one true tree of life and no longer had to be based on arbitrary human preferences.Footnote 2 The empirical and philosophical questions raised by evolution gradually initiated a panoply of research programs that continue to bear fruit in the present.

There has been extensive scholarship exploring the historical effect of evolution on biological practice.Footnote 3 Less well-explored is the effect of Darwin’s work on mathematical practice. I do not mean by this that evolution altered the way mathematicians viewed the philosophical underpinnings of their discipline. Rather, I mean that evolution offered surprising new opportunities for mathematical modeling, especially with regard to probability theory and statistics.

Darwin himself was famously ambivalent about mathematics, and the Origin contains no equations or mathematical models. That notwithstanding, the eventual ascent of his ideas to the centerpiece of biological thought was in large measure the result of abstract, probabilistic modeling undertaken first by Ronald Fisher and later by Sewall Wright and J. B. S. Haldane, in the early twentieth century. Probability is a major part of modern evolutionary theory, playing a central role in models of short-term allelic change in gene pools, as well as in methods of phylogenetic reconstruction.

Darwin’s theory implied that chance was an ineliminable part of natural history and, therefore, that probability and statistics were essential to understanding it. From the mathematical side, many innovations in probability and statistics arose from a desire to model evolutionary processes.

However, for as long as there have been evolutionists, there have also been anti-evolutionists, meaning those who believe that no fully naturalistic process can explain the broad sweep of natural history and that it is necessary to appeal in some way to the actions of an intelligent designer. They make a variety of arguments in defense of their views, and some of these arguments rely explicitly on probability. Scientists are mostly dismissive of these arguments, finding them to be both biologically and mathematically fallacious.Footnote 4 I agree with this view. However, their superficial sophistication and their cachet among lay audiences in cultural disputes about the proper relations between church and state make them worth taking seriously. Science and mathematics should not be seen merely as ivory tower disciplines, and when their methods are abused to serve political agendas, it behooves scholars to take notice.Footnote 5

This chapter is organized as follows: Section 2 provides definitions for certain key terms that may be unfamiliar to some readers. Section 3 describes those elements of Darwin’s theory of evolution relevant to our later discussions. Section 4 situates Darwin’s work in the broader historical context of the so-called probabilistic revolution and notes some interesting parallels between Darwinian natural selection and the introduction of statistical techniques in physics.

The decades following the publication of the Origin saw the emergence of two schools of thought on how to test Darwinian hypotheses. The so-called “biometricians” applied statistical methods to the study of naturally occurring variations, while the “Mendelians” preferred methods drawn from combinatorics and discrete probability. This dichotomy is explored in Sect. 5. While both approaches had merit, it was the methods of the Mendelians that had more relevance specifically to evolution. Their work led to the modern science of population genetics, and we present some of the basic models and equations of this discipline in Sect. 6. Section 7 presents a brief discussion of more modern uses of probability in evolutionary studies. Finally, we discuss the probabilistic arguments of evolution’s critics in Sect. 8.

A short survey article of this sort must inevitably begin with an apology for everything that has been left out. Whole books get written on the subject matter of any of the major sections of this paper. It is hoped, however, that these brief remarks will serve as an introduction to a fascinating chapter in the history of both mathematics and biology.

2 Biological Terminology

As we go along, it will be necessary to employ some of the standard terminology used by geneticists and evolutionary biologists. It will be convenient to have all the needed definitions in one place, to avoid frequent disruptions to our story later on. Readers familiar with basic biology can skip this section.

It is customary to make a distinction between genotype and phenotype. The genotype of an organism is its collection of genes. Its phenotype is the sum total of its observable characteristics. When we are discussing a population of interbreeding organisms, the collection of all genes found in any of the individuals is called the gene pool for that population. A particular version of a gene that appears in a specific organism is called an allele. The location on a chromosome where an allele resides is called the locus of that allele.

The two main mechanisms of evolutionary change are natural selection and genetic drift. In both cases, we imagine that an allele initially appears with some frequency in a population and that, at some later time, we find that frequency has changed. If the change in frequency occurred because the presence or absence of the allele affects the probability that an organism will survive long enough to reproduce, then we say the change resulted from natural selection. If instead the change occurred because of random sampling error or some other stochastic factor, then the change is attributed to genetic drift.

If we sequence a particular genetic locus in a large number of interbreeding organisms, we invariably find many variations of the same gene. This variation is referred to as genetic polymorphism. A common scenario among sexually reproducing organisms is that an individual inherits one copy of a gene from its mother and one from its father. If the two copies are the same, then the individual is said to be homozygous for this gene. Otherwise, it is said to be heterozygous.

Finally, phylogenetics refers to the branch of evolutionary biology devoted to working out the evolutionary relationships among modern species, or possibly extinct species in the context of paleontology. Such relationships are typically expressed in the form of trees of descent.

3 The Basics of Darwinian Evolution

Darwin’s theory begins with two empirical observations:

  1.

    Species are composed of individuals that vary from one another, and at least some of these variations are heritable.

  2.

    Species tend to reproduce at a geometric rate, implying that their numbers would increase quickly if this tendency were not checked by environmental constraints.

We then draw two conclusions from these observations:

  3.

    There is a struggle for existence among the individuals of a species. Some individuals will fortuitously possess variations that give them an advantage in this struggle, and they will tend to leave more offspring than those not possessing those variations.

  4.

    These variations will accumulate through the generations, to the point where they alter the average physical characteristics of the species.

Darwin referred to the process through which favorable variations accumulate over time as “natural selection.”Footnote 6

None of these points is controversial, and they have been readily agreed to by everyone from Darwin onward. Items (1) and (2) are simple empirical facts, and items (3) and (4) seem like undeniable conclusions from them. The controversy did not arise until Darwin extended these ideas beyond what many of his contemporaries felt were their appropriate scope.

Darwin believed that natural selection was so powerful that over time the accumulation of small variations could lead to new species. He argued that all modern species arose through a process of descent with modification from more ancient species, which themselves arose from more ancient species still. Eventually, this process traces back to the origin of life, or perhaps to a small number of very ancient original life forms. Darwin made no attempt to explain the origin of life, but instead took as given some relatively simple form of ancient life. He argued that through all the eons in which that ancient life was evolving, natural selection was an especially important mechanism of organic change.

To persuade others of this conclusion, Darwin began The Origin of Species with a chapter about pigeons. He noted the success of human breeders in producing pigeon varieties by selectively breeding from among those possessing characteristics the breeder found desirable. This was meant as proof of concept for the notion that small, naturally-occurring variations could add up to significant change in the average traits of a species. This chapter was followed by several others of a theoretical nature, in which Darwin laid out the central concepts underlying the process of natural selection and responded to possible objections to his ideas.

This led to the heart of the book, in which Darwin laid out the evidence for evolution. His argument took the form of a consilience of inductions, meaning that he showed how evolution neatly accounted for the facts in several seemingly distinct branches of the life sciences. In particular, he pointed to facts drawn from the fields of biogeography, classification, morphology, and paleontology. In each case, he argued that the facts were best explained by a process of descent with modification, rather than through a process of separate or special creations.

These ideas raise a host of scientific and philosophical questions, and Darwin’s critics were quick to point them out. Was natural selection really so powerful that it could transform one species into another? Could complex structures like wings or eyes actually arise through a fully natural process? Can a scientific theory legitimately be defended by a consilience of inductions in the absence of direct experimental confirmation? Does evolution entail nominalism over essentialism with regard to species? If evolution plays out through natural selection, then what room is there for teleology, final causes, or divine action?

These are all good questions, and all have been the subject of intense discussion from Darwin’s time right through to the present. For our purposes, however, the most important part of his theory, and arguably the most radical, is that he placed chance right at the heart of natural history.

4 Evolution as Part of the “Probabilistic Revolution”

A modern mathematical reader will quickly note the probabilistic machinery appropriate to Darwin’s theory. Evolution has the structure of a Markov chain. Roughly, the states of the chain are the gene pools of organisms and their environments. The transition probabilities are such that states genetically close to what already exist are more likely to arise in the next generation than those that are far away. The process is “memoryless” in the sense that the future evolution of a species depends only on its current state and not on the historical vagaries leading to that state. More specifically, we see aspects of a random walk, as when genetic drift and neutral mutations lead to evolutionary change in the absence of natural selection, as well as branching processes, as when a local population becomes isolated from its ancestral stock and eventually becomes a new species.
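To make the Markov chain picture concrete, consider the following minimal sketch, in Python, of genetic drift in the standard Wright-Fisher model. The function name and parameter values are illustrative choices of mine, not anything from the historical record; the point is only that the state of the gene pool in one generation is a random function of the previous state alone.

```python
import random

def wright_fisher(n_alleles, initial_count, generations, seed=0):
    """Genetic drift as a Markov chain (the Wright-Fisher model).

    The state is the number of copies of allele A1 in a gene pool of
    n_alleles; each generation redraws the pool from the current one.
    """
    rng = random.Random(seed)
    count = initial_count
    history = [count]
    for _ in range(generations):
        p = count / n_alleles  # current frequency of A1
        # Memorylessness: the next state depends only on the current count.
        count = sum(rng.random() < p for _ in range(n_alleles))
        history.append(count)
    return history

# In a small gene pool, the allele frequency wanders until A1 is either
# fixed (count = 100) or lost (count = 0), with no selection at all.
print(wright_fisher(n_alleles=100, initial_count=50, generations=30))
```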

However, this sort of mathematical machinery did not exist in 1859. Probability itself was in a somewhat rudimentary state at that time and was still mostly concerned with combinatorial problems related to games of chance.Footnote 7 Philosophically, probability was dominated by the work of Laplace, who took an epistemic view of the subject. Having accepted the law-bound picture of the universe inherent in Newtonian physics, he believed that statements of probability could only be expressions of human ignorance. This view was considered so fundamental that he opened his Philosophical Essay on Probabilities by writing:

All events, even those which on account of their insignificance do not seem to follow the great laws of nature, are a result of it just as necessarily as the revolutions of the sun. In ignorance of the ties which unite such events to the entire system of the universe, they have been made to depend upon final causes or upon hazard, according as they occur and are repeated with regularity, or appear without regard to order; but these imaginary causes have gradually receded with the widening bounds of knowledge and disappear before sound philosophy, which sees in them only the expression of our ignorance of the true causes.Footnote 8

Laplace did not present this as a viewpoint in need of defense, but simply as an obvious conclusion of then-modern science.

Given the prevalence of this view, you can imagine the response of many of Darwin’s contemporaries to his theory, which featured two distinct roles for chance: favorable variations were said to arise fortuitously, and selection was understood as a statistical tendency of certain members of a population to leave more offspring than their competitors. This ran into philosophical opposition from those who deplored the idea of chance playing an explanatory role in natural history, as well as theological opposition from those who feared that Darwin was ruling out a role for divine providence in the origin of species.

“Chance” can be a challenging notion to pin down since it means different things in different contexts.Footnote 9 To say that some event occurred by chance might mean that it occurred with no cause at all, as is sometimes suggested in modern discussions of quantum mechanics. It might mean that the event did not result from the plan of an intelligent agent. Sometimes the intent is that the event was inherently unpredictable beforehand on account of the limitations on human knowledge. Still another meaning is that the event was the result of a large number of independent causal chains, as when we consider all of the things that had to happen just so for a traffic accident to occur.

Darwin’s notion of chance was not really any of these.Footnote 10 He certainly did not mean the variations were uncaused. Rather, his intention was that the variations among organisms are random with respect to the needs of the organisms. He argued that we should not imagine that an organism senses that a particular variation would be useful and then through some inner drive manifests just that variation. In a passage from his book The Variation of Plants and Animals Under Domestication, he offered a useful analogy:

[I] have spoken of selection as the paramount power, yet its action absolutely depends on what we in our ignorance call spontaneous or accidental variability. Let an architect be compelled to build an edifice with uncut stones, fallen from a precipice. The shape of each fragment may be called accidental; yet the shape of each has been determined by the force of gravity, the nature of the rock, and the slope of the precipice – events and circumstances, all of which depend on natural laws; but there is no relation between these laws and the purpose for which each fragment is used by the builder. In the same manner the variations of each creature are determined by fixed and immutable laws; but these bear no relation to the living structure which is slowly built up through the power of selection.Footnote 11

However, with regard to scientists’ openness to chance as an explanatory factor in nature, there was a fortuitous aspect to the timing of The Origin of Species. The mid-nineteenth century is a period in scientific history referred to by some scholars as the “probabilistic revolution.”Footnote 12 Researchers in a variety of disciplines were coming to realize that statistical and probabilistic techniques were not just useful, but essential to solving practical problems.

In an interesting historical coincidence, Darwin published his book just a year before James Clerk Maxwell published his seminal work on the probability distribution appropriate to the velocities of individual particles in large ensembles of gas molecules, which later became foundational in the development of statistical mechanics.Footnote 13 This is especially noteworthy since Darwinian evolution has some interesting parallels with statistical mechanics.

At the heart of statistical mechanics is the idea that order can arise from randomness. The predictable, law-like behavior of matter arises from the random motions and collisions among large ensembles of microscopic particles. This approach led to tremendous advances in our understanding of thermodynamics. For example, the second law holds that entropy will never spontaneously decrease in an isolated system. In practical terms, this means that the energy of an isolated system tends to assume more dissipated forms and becomes less available for doing work. In classical thermodynamics, this was simply an empirical observation that was believed in general because it was seen to hold in any specific system in which the principle could be tested. Statistical mechanics, with its distinction between macrostates and microstates, provides insight into why energy dissipation in isolated systems is to be expected.

Darwin was essentially doing for natural history what statistical mechanics had done for thermodynamics. He showed there was no need to reference the master plan of an omnipotent engineer to explain the existence of modern species. Instead, they could arise from a chaotic jumble of random variations passed through the sieve of natural selection. Moreover, taking this view allowed tremendous insight into other aspects of biology.

Philosophically, the scientists of Darwin’s time might have been more willing to countenance chance explanations than were their counterparts in previous eras. However, that still left open a central question: How can we put Darwinian natural selection to empirical tests?

5 Biometry Versus Mendelism

Darwin was largely successful in convincing the scientists of his time that evolution had occurred. The numerous lines of evidence he adduced from various branches of anatomy, coupled with the evidence from paleontology and biogeography, were sufficient to convince scientists that modern species shared a far higher degree of relatedness with one another than had previously been appreciated.Footnote 14

However, Darwin was far less successful in convincing people that natural selection was the primary mechanism of evolution. Skeptics wondered if the small, random variations central to Darwin’s theory could really accumulate to the point of creating new species. Moreover, the favored theory of heredity at the time held that the characteristics of the offspring were a literal blend of the characteristics of the parents. If this were correct, then small variations would quickly vanish when the organism possessing the variation mated with those who did not. Regression to the mean would quickly occur, making significant directional evolution impossible.

A rival explanation of evolutionary change relied on so-called sports, meaning offspring that exhibited large variations relative to their parents. According to this school of thought, it was the variations themselves that led to significant evolutionary change. Natural selection, to the extent it was relevant at all, served only to effect minor changes in the characteristics of species. Advocates of this view came to be known as “mutationists.”Footnote 15

Plainly, the impasse could only be broken through a careful, quantitative study of naturally occurring variation among individuals. However, this presented a problem since the physical basis of heredity was entirely unknown at that time. The very existence of genes, much less their chemical structure, would not be confirmed until the early twentieth century. Consequently, scientists could only study observable, phenotypic variations in organisms. This led to an area of research known as “biometry.”

The general philosophy underlying this approach was clearly expressed by Walter Weldon, an especially prominent biometrician, in a commentary on one of his papers:

The questions raised by the Darwinian hypothesis are purely statistical, and this statistical method is the only one at present obvious by which that hypothesis can be experimentally checked.Footnote 16

Elsewhere he elaborated on what this meant:

It cannot be too strongly urged that the problem of animal evolution is essentially a statistical problem: that before we can properly estimate the changes at present going on in a race or species we must know accurately (a) the percentage of animals which exhibit a given amount of abnormality with regard to a particular character; (b) the degree of abnormality of other organs which accompanies a given abnormality of one; (c) the difference between the death rate per cent. in animals of different degree of abnormality of parents, with respect to any organ; (d) the abnormality of offspring in terms of the abnormality of parents and vice versa. These are all questions of arithmetic; and when we know the numerical answers to these questions for a number of species we shall know the direction and the rate of change in these species at the present day – a knowledge which is the only legitimate basis for speculations as to their past history and future fate.Footnote 17

In Weldon’s view, the main problem of evolutionary research was to determine, for some measurable trait in an organism, the extent of its variation around the population’s mean and its degree of correlation with the same trait in the previous generation. In other words, how much variation was there and how heritable was this variation?

Reading Weldon’s papers today, it is impossible not to be impressed by the sheer patience and attention to detail this work required. In an example we can take to be representative of biometrical research generally, Weldon examined variations in the carapaces of the abundant crab species Carcinus moenas. By taking measurements on thousands of samples and then comparing those measurements with those taken on ancestral samples by prior researchers, Weldon was able to measure the degree of selective advantage of certain carapace designs over others. Here is a typical passage:

The belief in which the work was undertaken was, that the law of variation would be found throughout to be that of the ordinary probability equation and this belief was tested in the following way: – In each of the thirty-five groups, the arithmetic mean of the frontal breadths, and the mean of all the deviations from it, were determined; and from the “mean error” found in this way the modulus of the probability function was calculated. Then, by calling the mean of each group zero, and expressing the deviations from the mean in terms of the modulus, a number of curves were obtained, in each of which the modulus was unity and mean zero; a similar curve of adults was constructed, and the corresponding ordinate of all the thirty-six curves so obtained were added together.Footnote 18

To clarify, “the ordinary probability equation” is what we today call the normal or Gaussian distribution. However, in Weldon’s time this distribution was called the “law of errors,” from its historical roots in studying measurement errors in experimentation. This explains Weldon’s references to “mean errors.” We should note, however, that this term has some biological significance as well. Prior to Darwin, the general belief was that species had immutable essences and that individual variation represented a distraction from understanding that essence.Footnote 19 Darwin reversed that thinking since it was a consequence of his theory that species were ephemeral and that it was the variations themselves that mattered. That Weldon was still thinking of variations as “errors” suggests that he had not fully absorbed Darwin’s way of thinking, even while trying to collect data in support of evolution by natural selection.

The theoretical development of statistical methods in the late nineteenth century owes a lot to biometry and especially to the work of its most distinguished practitioner, Karl Pearson. Among other contributions, Pearson developed several techniques for determining curves of best fit, as well as the “chi squared” test for statistical significance. Pearson published a series of 18 papers under the title “Mathematical Contributions to the Theory of Evolution,” and believe me when I tell you they do not make for light reading.Footnote 20

Biometricians focused on traits that varied continuously, such as height, weight, and shell length. They sided with Darwin against the mutationists by emphasizing the importance of selection acting on small variations. They produced examples, such as Weldon’s crab studies, showing that this process could lead to directional change.

Starting around 1900, however, they faced a strong challenge from researchers focusing on discrete character traits. The challengers took the pioneering work of Gregor Mendel as their inspiration. Mendel was a Moravian monk who undertook some of the first systematic investigations of inheritance. He focused on varieties of peas, distinguishing, in particular, those that were wrinkled from those that were smooth. He cross-bred various combinations of pea plants, keeping careful track of the frequency of each variety in subsequent generations. He discovered that a trait might be absent from one generation, only to reappear later on. Data of this sort eventually led him to the now standard notions that there were particles of heredity (today called “genes,” of course), that each offspring received one copy of each gene from each parent, and that some genes were dominant and others recessive.

Mendel’s work was largely ignored in his own time, but it was reintroduced to the scientific community in the early twentieth century by researchers such as Hugo de Vries. The sort of mathematics appropriate to understanding particulate inheritance of discrete traits differs from the sort appropriate to studying continuous variation. Contrary to Weldon’s assertions about statistics, the new mathematics of inheritance was combinatorial and probabilistic.Footnote 21

An early success in this vein is the Hardy-Weinberg law, named for the mathematician G. H. Hardy and the physician Wilhelm Weinberg, who published it independently in 1908. It is an example of a “one locus, two allele” model, meaning that we imagine a single genetic locus that is home to one of two alleles, which we shall denote A1 and A2. Suppose that A1 and A2 initially appear with frequencies p and q = 1 – p. We want to determine the frequencies of these alleles, and of the corresponding genotypes, in the next generation, if we assume there is no migration, mutation, selection, or genetic drift.

Given our set-up, there are three possible genotypes at this locus, which we can denote by A1A1, A1A2, and A2A2. Let us assume that the population is very large and that mating is random with respect to this allele. The frequencies p and q can be interpreted as the probabilities that a randomly chosen allele from the population will be A1 or A2, respectively. In this case, it is easy to see that the frequencies of the three genotypes in the next generation will be p², 2pq, and q², respectively. Note that this conclusion depended only on the initial frequencies of the two alleles and that these frequencies did not change from one generation to the next. Consequently, these frequencies will be constant down through the generations, at least until the environment changes so as to make one of our assumptions no longer valid.
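As a quick numerical illustration (a sketch of my own, not anything from Hardy’s letter), the following Python fragment carries out one round of random mating and confirms both the genotype frequencies and the constancy of the allele frequency:

```python
def random_mating(p):
    """One generation of random mating in a one locus, two allele model.

    Returns the Hardy-Weinberg genotype frequencies and the frequency
    of A1 among the resulting offspring.
    """
    q = 1 - p
    genotypes = {"A1A1": p**2, "A1A2": 2 * p * q, "A2A2": q**2}
    # A1A1 parents transmit only A1; heterozygotes transmit A1 half the time.
    p_next = genotypes["A1A1"] + 0.5 * genotypes["A1A2"]
    return genotypes, p_next

genotypes, p_next = random_mating(0.3)
print(genotypes)  # approx. {'A1A1': 0.09, 'A1A2': 0.42, 'A2A2': 0.49}
print(p_next)     # approx. 0.3: the allele frequency is unchanged
```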

Hardy published this in a letter to the editor that opened like this:

I am reluctant to intrude in a discussion concerning matters of which I have no expert knowledge, and I should have expected the very simple point which I wish to make to have been familiar to biologists. However, some remarks of Mr. Udny Yule, to which Mr. R. C. Punnett has called my attention, suggest that it may still be worth making.Footnote 22

Yule had muffed a question about the frequency of dominant alleles in Mendelian populations, and one can sense Hardy’s frustration at having to ride in to set things right. Historically, it tells us something about the novelty of the combinatorial approach at this time that a distinguished statistician like Yule could have erred on a point that is mathematically elementary.

The combinatorial approach of the Mendelians was initially seen as anti-Darwinian, in the sense that particulate inheritance seemed to support discontinuous evolution through the appearance of sports and not gradual evolution through the accumulation of small variations. However, the work of Ronald Fisher, culminating in his 1930 book The Genetical Theory of Natural Selection, reconciled Darwinian gradualism with particulate inheritance. Fisher’s mathematical modeling showed that the biometrical emphasis on continuous variation was entirely compatible with Mendelian combinatorics.

Fisher’s work was supplemented with additional efforts from Sewall Wright and J. B. S. Haldane, and the modern science of population genetics was born. This body of work was seen as putting natural selection on a firm theoretical footing.Footnote 23 Their work made it clear that even small variations conferring tiny reproductive advantages on their possessors would eventually spread through the population, under a variety of plausible assumptions about nature.

6 Some Equations of Population Genetics

Evolutionary biology addresses big questions about human origins and our place in the natural world. This is why it is sometimes perceived as intruding on religion’s territory and why it receives so much attention from the general public.

However, the quotidian concerns of professional biologists are generally more mundane than this. The grand sprawl of natural history does not really lend itself to mathematical modeling, forcing theoreticians to focus instead on the short-term fate of specific alleles over a few generations. It was precisely this sort of modeling that was pioneered by Fisher, Haldane, and Wright, and it remains a focus of attention for researchers today.

Biologist John Gillespie offers the following humorous description of population genetics:

Population geneticists spend most of their time doing one of two things: describing the genetic structure of populations or theorizing on the evolutionary forces acting on populations. On a good day, these two activities mesh and true insights emerge.Footnote 24

On the theoretical side, the goal is to develop models reflecting various assumptions about the environment and mating habits of a hypothetical species. We then use these models to prove theorems about the likely fate of a given allele, or perhaps a collection of alleles. In these models, there is always a huge trade-off between biological plausibility and mathematical tractability. Evolutionary change in gene pools is governed by so many factors that our models can easily come to look like alphabet soup. That notwithstanding, it is impressive how often biologically simplistic models still turn out to be useful. To experience some of the flavor of this field, let us derive a standard result.Footnote 25

For our starting point, we consider a single genetic locus at which two alleles, still denoted A1 and A2, reside. We assume that A1 initially appears with frequency p and that A2 appears with frequency q = 1 – p. Our interest is in the frequency of A1 as the generations go forth.

Now, there are four factors that might affect the frequency of A1 through the generations: migration, mutation, genetic drift, and natural selection. A proper model would contain variables for all of them, but to keep things tractable we will focus primarily on natural selection. We will assume that mating is random with respect to A1 and A2, meaning that individuals choose their mates without any regard for their genotype at this locus.

Recall that genetic drift refers to changes in gene frequency that result from stochastic causes. It is intuitively clear, and easy to prove mathematically, that genetic drift will be a more powerful force in small populations than in large ones.Footnote 26 For this reason, when modeling natural selection, it is customary to assume the population is so large that it is effectively infinite.

Now, with two alleles we have the three genotypes A1A1, A1A2, and A2A2. We assume that initially the three genotypes are at their Hardy-Weinberg frequencies of p², 2pq, and q², respectively. The three genotypes each have a certain probability of surviving long enough to reproduce, and we will denote these probabilities by w11, w12, and w22, respectively. Each probability individually is said to represent the “fitness” of its associated genotype.

We can also define the mean fitness of the population by the formula

$$ \overline{w}={p}^2{w}_{11}+2 pq{w}_{12}+{q}^2{w}_{22}. $$

In probabilistic terms, this is the expected fitness of a randomly chosen individual in the population.

The number of successful gametes produced by each genotype is proportional to its fitness. If the population size is N, then the total number of successful gametes produced is \( N\overline{w} \). If we let p′ denote the frequency of A1 among the offspring, then we can write

$$ p' = \frac{Np^2 w_{11} + \frac{1}{2}\left(2N\,pq\,w_{12}\right)}{N\overline{w}} = \frac{p^2 w_{11} + pq\,w_{12}}{\overline{w}}. $$

If this is unclear, note that the A1A1 genotype only produces A1 gametes, while the A1A2 genotype produces A1 and A2 gametes in equal number.

Finally, we denote the change in frequency of A1 due to selection by Δsp. We compute

$$ \Delta_s p = p' - p = \frac{p^2 w_{11} + pq\,w_{12} - p\overline{w}}{\overline{w}}. $$

Keeping in mind that q = 1 – p, this eventually simplifies to

$$ \Delta_s p = \frac{pq\left[p\left(w_{11}-w_{12}\right)+q\left(w_{12}-w_{22}\right)\right]}{p^2 w_{11} + 2pq\,w_{12} + q^2 w_{22}}. \tag{1} $$

We have summarized our findings in Table 1.

Table 1 Our set-up for analyzing the effect of selection on the frequency of an allele

Genotype:   A1A1   A1A2   A2A2
Frequency:  p²     2pq    q²
Fitness:    w11    w12    w22

Equation (1) becomes easier to study if we employ the following notational convention: Scale the fitnesses of the three genotypes so that A1A1 has fitness 1 (implying that A1A2 and A2A2 have fitnesses w12/w11 and w22/w11, respectively). We now let s denote the “selection coefficient” for the relative fitnesses of the homozygous genotypes, and we let h denote the “heterozygous effect,” which measures the fitness of the heterozygote relative to the selective difference between the homozygotes. With this notation, we can express the fitnesses of A1A1, A1A2, and A2A2 respectively, as 1, 1 – hs, and 1 – s. A positive value of s indicates that A1 has a selective advantage over A2, while a negative value indicates the reverse.

Equation (1) now simplifies to

$$ \Delta_s p = \frac{pqs\left[ph+q\left(1-h\right)\right]}{\overline{w}}. $$

In this form, it becomes straightforward to analyze the effect of selection on the frequency p, given different assumptions about s and h.

By analyzing the various cases, one is led to the conclusion that natural selection will always alter the frequency p in a direction that increases the mean fitness of the population. This result is known as the “fundamental theorem of natural selection.”Footnote 27
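Here is a small sketch of this analysis in code, iterating the recursion for p′ under the scaled fitnesses 1, 1 – hs, and 1 – s. The particular values s = 0.05 and h = 0.5 are arbitrary illustrations of mine; running the loop shows the frequency climbing toward 1 while the mean fitness increases monotonically, as the fundamental theorem predicts.

```python
def select_step(p, s, h):
    """One generation of selection with scaled fitnesses
    w11 = 1, w12 = 1 - h*s, w22 = 1 - s."""
    q = 1 - p
    w11, w12, w22 = 1.0, 1.0 - h * s, 1.0 - s
    w_bar = p**2 * w11 + 2 * p * q * w12 + q**2 * w22  # mean fitness
    p_next = (p**2 * w11 + p * q * w12) / w_bar        # recursion for p'
    return p_next, w_bar

p = 0.01  # A1 starts rare but slightly advantageous (s > 0)
for generation in range(1, 501):
    p, w_bar = select_step(p, s=0.05, h=0.5)
    if generation % 100 == 0:
        print(f"generation {generation}: p = {p:.3f}, mean fitness = {w_bar:.4f}")
```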

Models of this sort played a central role in the establishment of the so-called synthetic theory of evolution, or simply the “modern synthesis,” in the 1940s. Prior to this theoretical modeling, many believed that the small genetic variations on which Darwin put so much emphasis would simply be invisible to natural selection. It was argued that in a typical case, whatever reproductive advantage such variations provided would be too small for selection to overcome other environmental forces. Population genetic models can be used to show that this objection is unfounded. Even very small selective advantages will be sufficient for natural selection to notice, so to speak.Footnote 28

Models of the sort we have considered in this section are sometimes said to comprise “classical population genetics.” Let us now briefly consider a more modern approach to the subject.

7 Coalescent Trees and Phylogenetic Reconstruction

The study of heredity has made a lot of progress since the time of the Mendelians and the biometricians. Today we understand genes down to the molecular level. Researchers no longer have to make do with measurable phenotypic traits of organisms or with hypothetical entities existing only in mathematical equations. Instead, they can study evolution down to the level of individual base pair substitutions in DNA molecules.

The widespread availability of genetic sequence data has opened up new possibilities for population genetics. Whereas the classical approach looked forward in time to model the short-term evolution of allele frequencies, the modern approach looks backward to model the evolutionary history of polymorphism data collected in the present. The result of such an analysis is a rooted, binary tree. The tree’s leaves represent the genetic sequence data in the present, while its root represents the most recent common ancestor. If we orient the tree vertically with the root on top, then we can follow time backward from the bottom to the top. We would then see initially disparate nodes gradually uniting, with the number of tree branches decreasing by one with each such unification. In the terminology of population genetics, each such unification is called a “coalescence,” and the resulting diagram is called a “coalescent tree.”

This is illustrated in Fig. 1, which shows two possible coalescent trees on a starting sample of size four. The two trees have different topologies, corresponding to different possible evolutionary histories for the sample. The subscripts on the time units to the right of the figure indicate the number of branches at that stage of the evolutionary process. We can interpret T(k), where 2 ≤ k ≤ N, to be a random variable that records the length of time during which the number of branches is k.

Fig. 1 Two coalescent trees on samples of size four

The goal of coalescent analysis is to develop mathematical models for understanding genetic polymorphisms in modern species. The sequence data of the present presumably arose through a lengthy and complex evolutionary process that was influenced by numerous variables. We want to be able to use coalescent analysis to look at the data and say something like, “Data of precisely this sort is what we would expect if the variables X, Y, and Z were at play in the evolutionary process.” Clearly, any such model will have to be stochastic. There are so many chance factors in evolution that our conclusions about genealogical relationships, except in the most trivial cases, will have to be probabilistic.

In what follows, to keep our models mathematically tractable, we will assume there is no selection or migration. For our purposes, evolution proceeds solely by genetic drift, with possible mutations along the way.

Probability distributions on evolutionary trees were extensively studied in the 1960s, most notably by Motoo Kimura and Warren Ewens. They based their models on an analogy between genetic drift and Brownian motion. In a stochastic evolutionary process, random interactions among genes in one generation lead to further such interactions in later generations, producing essentially chaotic changes in gene frequencies further down the line. This is roughly analogous to Brownian motion, in which the chaotic movements of microscopic particles in a medium are explained by a cascading sequence of collisions with other microscopic particles.

There is a rich mathematical theory underlying Brownian motion. This theory is based on diffusion equations, which are a class of partial differential equations. Even though we generally think of evolution as a discrete time process, it can nonetheless be effectively studied with continuous time models. Kimura and Ewens adapted the theory of diffusion equations to the needs of population genetics, thereby developing probabilistic models for evolutionary trees. These models were then tested against the results of Monte Carlo simulations, a pioneering use of computers at the time.Footnote 29 Modern coalescent theory is largely based on adapting their models to look backward in time instead of forward.Footnote 30

Let us see a standard, if somewhat basic, calculation in this field.Footnote 31 Our model assumes that all evolution is random, meaning that in any generation, any individual is as likely to reproduce as any other. This implies that as we go backward in time, any two lineages are as likely to coalesce as any other pair. Likewise, any tree topology will be as probable as any other.

If N is the number of endpoints in the present, then the probability that a pair of lineages coalesce one generation back in time is just 1/N, and the probability that they remain distinct is therefore 1 – (1/N). The probability that they remain distinct for at least τ generations is then (1 – (1/N))^τ.

If we scale time so that one unit corresponds to N generations, then the probability that two lineages stay distinct for more than t time units is

$$ {\left(1-\frac{1}{N}\right)}^{\left\lfloor Nt\right\rfloor}\to {e}^{-t}, $$

as N → ∞. With this scaling, the coalescence time is exponentially distributed with a mean of 1 in the limit.

We can extend this to k lineages. The probability that none of them coalesce in the previous generation is

$$ \prod \limits_{i=0}^{k-1}\frac{N-i}{N}=\prod \limits_{i=1}^{k-1}\left(1-\frac{i}{N}\right)=1-\frac{\binom{k}{2}}{N}+O\left(\frac{1}{N^2}\right), $$

and the probability that more than two lineages coalesce is O(1/N²). If N is very large, then this last probability is negligible.

It is illuminating to ask for the expected time to the most recent common ancestor for the entire sample (as opposed to the ancestor for a subset of the initial sample). Our previous work implies that T(k) is exponentially distributed with mean 2/(k(k – 1)). Using E(X) to denote the expected value of the random variable X, we now compute

$$ E\left(\sum \limits_{k=2}^NT(k)\right)=\sum \limits_{k=2}^NE\left(T(k)\right)=\sum \limits_{k=2}^N\;\frac{2}{k\left(k-1\right)}=2\left(1-\frac{1}{N}\right). $$

On the other hand, we also have E(T(2)) = 1. We conclude that the expected time during which there are only two branches is greater than half the total expected height of the tree.

A useful analogy is to picture a group of cannibalistic beetles in a box. In each beetle encounter, one beetle eats the other. When there are many beetles in the box, collisions will happen frequently and the number of beetles will quickly go down. When there is a small number of beetles in the box, collisions will be far more infrequent.Footnote 32
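The expectations derived above are easy to check by simulation. The following sketch draws each T(k) from its exponential distribution and averages over many trials; the sample size n = 10, the trial count, and the helper name tree_times are all illustrative choices of mine.

```python
import random

def tree_times(n, rng):
    """Draw T(k) for k = n, ..., 2, where T(k) is exponential with rate
    k(k-1)/2 (hence mean 2/(k(k-1))), and return the total tree height
    together with T(2)."""
    height = 0.0
    t_k = 0.0
    for k in range(n, 1, -1):
        t_k = rng.expovariate(k * (k - 1) / 2)
        height += t_k
    return height, t_k  # after the loop, t_k holds T(2)

rng = random.Random(1)
n, trials = 10, 100_000
results = [tree_times(n, rng) for _ in range(trials)]
print(sum(h for h, _ in results) / trials)  # approx. 2*(1 - 1/10) = 1.8
print(sum(t for _, t in results) / trials)  # approx. 1: over half the height
```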

Let us conclude this section by noting that coalescent theory is a subfield of phylogenetic reconstruction more generally. The grand project of evolutionary biology is to work out the one true tree of life that produced the modern riot of extant species. Trees are fundamentally combinatorial objects, and similar mathematical questions arise whether our data come from microscopic DNA sequences or from the macroscopic characters paleontologists study. The study of Markov models on trees is very highly developed, and this theory is increasingly finding applications in phylogenetic research.Footnote 33

8 The Abuse of Probability by Anti-Evolutionists

With the previous section, we complete our brief survey of the relationships between probability theory and evolutionary biology. We have seen that, historically, the relationship has gone both ways. A desire to test Darwin’s ideas led to progress in statistics, and probabilistic modeling was essential to placing evolutionary biology on a solid theoretical footing. The relationship continues into the present, with ever more sophisticated probabilistic methods being applied to the problems of phylogenetic reconstruction.

However, this is not the end of our story. More so than most branches of science, evolutionary biology has a political dimension, especially in the United States. Within certain faith communities, evolution is seen as contrary to religion, and this sometimes leads to conflicts over the science curriculum to be taught in public schools. Given this social relevance, and given that evolution’s critics frequently employ probability theory in their discourse, it behooves us to take a look at their arguments, if only to understand why most scientists find them unconvincing.

Leaving aside religious considerations, evolution really does seem to run afoul of a powerful intuition that natural forces do not construct complex, functional systems, no matter how much time you give them. Our everyday experience is that machines break down and fall apart unless energy and intelligence are expended to maintain them, but evolution claims that organisms have become more complex over time. Mathematical anti-evolutionism is about trying to place that intuition on a more rigorous footing. Since probability theory is the branch of mathematics that quantifies our notions of what is likely and what is not, it is unsurprising that evolution’s critics rely so heavily on it in their writing.

In this regard, a classic anti-evolutionist argument points to the emergence of complex proteins as the thing that is too improbable to countenance. Proteins can be viewed as long chains of amino acids in which each link is drawn from among 20 possibilities. This makes it possible to think of protein formation in combinatorial terms. Here is an especially clear version of the argument, as presented in a mainstream American periodical:

The specificity of hemoglobin is described by the improbability of the specific amino acid sequence occurring by random chance. Such specificity is capable of exact calculation in the permutation formula

$$ P=\frac{N!}{n_1!\times {n}_2!\times {n}_3!\dots \mathrm{etc}.}, $$

where N is the total number of amino acids in hemoglobin (574); n1, etc. are the number of separate kinds of amino acids; … In the case of hemoglobin, … the specific numerical value of the solution is P = 10⁶⁵⁴. Thus, we can state that the improbability of hemoglobin occurring by random selection can be represented by the infinitely small number 10⁻⁶⁵⁴, which means 10 divided by itself 654 times: as near to zero as one could consider.Footnote 34

Arguments of this general sort are ubiquitous in anti-evolutionist literature.Footnote 35

There are many variations on the basic theme, but the logic always conforms to the following general scheme:

  1.

    Identify a complex biological structure, such as a specific gene or protein.

  2.

    Model its evolution as a process of randomly selecting one item from a very large space of equiprobable possibilities.

  3.

    Use elementary combinatorics to determine the size of the space, which we shall call S.

  4.

    Conclude that the probability of the structure having evolved by chance is 1/S, and assert that this is too small for evolution to be plausible.

I shall refer to this as the Basic Argument from Improbability (BAI).
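To see the shape of the arithmetic, here is a sketch of steps (3) and (4) for a hypothetical protein of 100 residues, treating all 20 amino acids as equiprobable at every position. The length is an arbitrary choice of mine, and the uniform model is the BAI’s own assumption, which is precisely what is criticized below.

```python
import math

# Step (3): the size of the space of all sequences of a hypothetical
# length L over the 20-letter amino acid alphabet.
L = 100
S = 20 ** L

# Step (4): the probability the BAI assigns to the structure, namely a
# single uniform draw from the entire space.
print(f"S = 20^{L} is approximately 10^{math.log10(S):.0f}")
print(f"claimed probability 1/S is approximately 10^{-L * math.log10(20):.0f}")
```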

Since proteins arise ultimately from genes, let us speak, for the moment, of “genotype space” as opposed to “protein space.” Mathematically speaking, genotype space is a metric space equipped with a probability distribution. The metric is such that genotypes that are mutationally close to our starting point are assigned a smaller distance than those that are far away. The probability distribution assigns a higher probability to genotypes that are more fit than it does to genotypes that are less fit.

The BAI is fallacious because it completely ignores the effect of natural selection on the probabilistic and metric structures of genotype space. Evolution does not navigate genotype space by choosing points at random. Instead, it finds its starting point at the origin of life and then carries out a series of local searches in the neighborhood of wherever it happens to be. The probabilistic and metric structures guarantee that most of the space has a probability close to zero of ever arising, while fit genotypes close to our starting point have a high probability of arising. The probability distribution appropriate to genotype space is highly non-uniform, and this fact shows that the calculation put forth by the BAI is based on a hopelessly unrealistic model.Footnote 36
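The difference between uniform sampling and selection-driven local search can be dramatized with a toy model in the spirit of Dawkins’ well-known “weasel” program. The fixed target makes it a caricature of real evolution, and every name and parameter below is my own illustrative choice; the point is only the gulf between the two search strategies.

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
rng = random.Random(42)

def mutate(s, rate=0.05):
    """Copy a string, changing each character with small probability."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c
                   for c in s)

def score(s):
    """Number of positions agreeing with the target."""
    return sum(a == b for a, b in zip(s, TARGET))

# Cumulative selection: keep the best of the parent and 100 mutant
# copies each generation -- a local search, not a uniform draw.
current = "".join(rng.choice(ALPHABET) for _ in TARGET)
generation = 0
while current != TARGET:
    candidates = [current] + [mutate(current) for _ in range(100)]
    current = max(candidates, key=score)
    generation += 1
print(f"target reached in {generation} generations")
# A single uniform draw succeeds with probability 27**-28, about 10**-40.
```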

Moving on, a more sophisticated probabilistic, anti-evolutionary argument has been put forward by mathematician William Dembski in a series of books and articles.Footnote 37 He defines a quantity he calls “complex, specified information (CSI).” His basic idea is that it is not improbability by itself that suggests design, but rather improbability coupled with a recognizable pattern. Any sequence of Hs and Ts is as unlikely as any other when we flip a coin 100 times, but 100 Hs, or a perfect alternation of Hs and Ts, show a clear pattern and would therefore suggest trickery of some sort. Any pattern of crags and grooves on a mountain is highly unlikely, but the faces on Mt. Rushmore also fit a pattern and therefore must be explained by intelligent design. Thus, “complex” means low probability, and “specified” means that we find a recognizable pattern. “Information” refers to any event or object whose origin we wish to explain.

As applied to evolutionary questions, the argument might play out like this: Choose a functional structure like the flagellum used for propulsion by some bacteria. It is complex in the sense that it is composed of many individual proteins that must be arranged in a precise way. It is specified in the sense that it functions in a way that is strongly analogous to an outboard motor on a boat. Thus, the flagellum is complex and specified and, therefore, must be designed.

If we are to treat this as a rigorous argument, then within Dembski’s framework we must answer two questions: How do we carry out a meaningful probability calculation to establish complexity in the sense he intends? And how do we distinguish the design-suggesting patterns from those we impose on nature through excessive imagination? (For example, it is very easy to look at a fluffy, cumulus cloud and imagine that you see a dragon.) Let us see how Dembski tries to answer these questions.

To justify the reduction of probability to combinatorics, Dembski relies on the notion of an “irreducibly complex” system. The idea is the brainchild of biochemist Michael Behe, who defined an irreducibly complex system to be one in which several distinct parts work together to carry out some function, such that the removal of any one part causes the system to cease functioning. In his 1996 book Darwin’s Black Box, Behe argued that such a system could not evolve gradually because the precursor systems would inevitably be missing some of the parts and therefore could not function. Since natural selection has no foresight, it will not preserve a non-functional system in the hope that later developments will somehow render it useful.

Dembski now explains the relevance of irreducible complexity to carrying out probability calculations:

An irreducibly complex system is a discrete combinatorial object. Probabilities therefore naturally arise and attach to such objects. Such objects are invariably composed of building blocks. Moreover, these building blocks need to converge on some location. Finally, once at this location the building blocks need to be configured to form the object. It follows that the probability of obtaining an irreducibly complex system is the probability of originating the building blocks required for the system, multiplied times the probability of locating them in one place once the building blocks are given, multiplied times the probability of configuring them once the building blocks are given and located in one place.Footnote 38

Dembski’s argument was recently endorsed in an article published in the academic Journal of Theoretical Biology. Authors Steinar Thorvaldsen and Ola Hössjer present the argument like this:

[D]embski proposes an equation based on three independent events: Ap: originating the building blocks (protein chains) of the protein complex …, Al: localizing the building blocks in the same place, and Ac: configuring the building blocks correctly to form the complex. Then the probability of a protein complex is the multiplicative product of the probabilities of the origination of its constituent parts, the localization of those parts in one place, and the configuration of those parts into the resulting system (contact topology). This leads to the following estimate for the probability of a protein complex (PC) composed of N independent building blocks:

$$ \hat{P}\left({A}_{PC}\right)=\prod \limits_{n=1}^N\left[P\left({A}_p^{(n)}|{\hat{\theta}}_p^{(n)}\right)\cdot P\left({A}_l^{(n)}|{\hat{\theta}}_l^{(n)}\right)\cdot P\left({A}_c^{(n)}|{\hat{\theta}}_c^{(n)}\right)\right], $$

where \( {\theta}_p^{(n)},{\theta}_l^{(n)},\mathrm{and}\ {\theta}_c^{(n)} \) are the parameters involved in forming the protein chain, the localization, and the configuration of the nth building block.Footnote 39

While this argument is an improvement over the BAI, it is not difficult to find grave errors in its formulation. The first is that it is not legitimate to treat an irreducibly complex system as a discrete, combinatorial object because there are several plausible scenarios for how such a system could evolve gradually. For example, the removal of redundancy can lead to irreducible complexity, as when a scaffold supports an arch during its construction. When the arch is complete, the scaffold is removed, leaving behind an irreducibly complex structure (the arch cannot support itself without its capstone, and the capstone has nothing to rest on without the arch). An interdependence of parts can also arise when a part is not required at the time it is first introduced, but later becomes essential because of changes in the surrounding system.Footnote 40

Moreover, modeling the evolution of a complex structure as a threefold process of origination, localization, and configuration is biologically unrealistic. Instead of combinatorial arrangements of proteins or other sorts of component parts, we should be thinking about the underlying genetic instructions leading to the finished system. However, the same instructions that dictate which proteins get produced also direct those proteins to specific locations and mandate the relationships among them. This shows that the three probabilities in Thorvaldsen and Hössjer’s equation are not independent and therefore cannot be multiplied.

Now let us see how Dembski deals with specification. To address the problem of distinguishing design-suggesting patterns from those we simply impose on nature, he argues that the pattern must be “detachable” from the object itself.

To see the basic idea, we imagine firing an arrow at the wall of a barn. After the arrow embeds itself in the wall, we could paint a small circle around it, thereby making it appear that we hit a small target.Footnote 41 This is a non-detachable pattern since we had to see where the arrow landed before identifying the target. A detachable pattern can be described separately from the object or event itself.

Dembski provides a very technical definition of “detachability”:

Given a reference class of possibilities Ω, a chance hypothesis H, a probability measure induced by H and defined on Ω (i.e., P(· | H)), and an event/sample E from Ω, a rejection function f is detachable from E if and only if a subject possesses background knowledge K that is conditionally independent of E (i.e., P(E | H&K) = P(E | H)) and such that K explicitly and univocally identifies the function f. Any rejection region R of the form Tγ = {ω ∈ Ω | f(ω) ≥ γ} or Tδ = {ω ∈ Ω | f(ω) ≤ δ} is then said to be detachable from E as well. Furthermore, R is then called a specification of E and E is said to be specified.Footnote 42

Readers with some statistical background will recognize this as standard Fisherian hypothesis testing. In the present context, however, this definition seems unhelpful because we have no way of matching up Dembski’s formalism with biologically realistic counterparts in real life. If we are pondering the origins of a bacterial flagellum, for example, then what, exactly, plays the roles of Ω, K, E, f, Tγ, and Tδ?

Indeed, when it comes time to discuss “specification” in the context of particular systems, Dembski does not refer back to this formalism. Instead, he writes:

Biological specification always refers to function. An organism is a functional system comprising many functional subsystems. In virtue of their function, these systems embody patterns that are objectively given and can be identified independently of the systems that embody them.Footnote 43

This is too casual to serve as an argument that functional systems are specified in the precise technical sense required by his theorizing. Biologists say that evolution by natural selection produces functional systems as a matter of course, and this lends some urgency to the question of whether the function of a system can serve as a proper specification. Does the flagellum instantiate a design-suggesting pattern, or is it just the sort of thing that evolution is able to do?

Our conclusion, then, is that Dembski’s theorizing about CSI is not persuasive. We have no way of carrying out meaningful probability calculations, and we have no way of distinguishing design-suggesting patterns in biology from those that can arise through natural causes.

There are many other mathematical arguments in anti-evolutionist discourse, but inevitably they are unsuccessful for the kinds of reasons discussed here.Footnote 44 The metaphor of evolution exploring a vast genotype space can be helpful in many contexts, and biologists often employ it in explaining their reasoning. However, the probabilistic and metric structures of the space are too complex to permit broad mathematical conclusions to be drawn regarding what is possible and what is not.

9 Conclusion

Biologist Sergey Gavrilets, himself a major contributor to the mathematical theory of evolution, writes:

Since the time of the Modern Synthesis, evolutionary biology has arguably remained one of the most mathematized branches of the life sciences, in which mathematical models and methods continuously guide empirical research, provide tools for testing hypotheses, explain complex interactions between multiple evolutionary factors, train biological intuition, identify crucial parameters and factors, evaluate relevant temporal and spatial scales, and point to the gaps in biological knowledge, as well as provide simple and intuitive tools and metaphors for thinking about complex phenomena.Footnote 45

Indeed, among Darwin’s most important legacies is the discovery that probability theory is essential to mathematical modeling in biology. With some clear thinking and technical skill, probability can elucidate many formerly mysterious aspects of natural history. When its methods are abused by ideologues, it can create obfuscation and confusion. Regardless, explorations of the connections between evolutionary biology and probability have significantly affected the practice of both disciplines, much to their mutual benefit.