Forensic laboratories generally provide estimates of the frequency of a matching DNA profile
among members of three broad racial groups in North America: Caucasians, African-Americans, and
Hispanics.^{28} The frequency estimates are derived from data bases that record the DNA profiles
of a large number of individuals (usually several hundred) from each racial group. The individuals profiled
in the data bases are usually “convenience samples” of blood donors or paternity case litigants.

To generate frequency estimates that may be as rare as one in a billion, or even one in a trillion, from a data base of several hundred individuals, forensic laboratories typically follow a three-step procedure. First, they estimate the frequency of each allele in the DNA profile by simple counting, that is, by determining the proportion of all alleles at that locus in the data base that are of that type. If two percent of the alleles at a particular locus are type A and three percent are type B, their frequencies would be stated as .02 and .03, respectively.
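
The counting step can be sketched in Python as follows. This is a minimal illustration, not any laboratory's actual software: the function name `allele_frequencies` and the 50-person database are hypothetical, chosen only so that the counts reproduce the .02 and .03 figures above. Each person contributes two alleles at the locus, so the denominator is twice the number of people.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus by simple counting.

    `genotypes` is a list of (allele1, allele2) pairs, one pair per person
    in the (hypothetical) data base; each person contributes two alleles.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())  # equals 2 x number of people
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical data base of 50 people (100 alleles) at a single locus:
# 2 A alleles, 3 B alleles, and 95 alleles of other types, lumped as "X".
database = [("A", "X"), ("A", "X"), ("B", "X"), ("B", "X"), ("B", "X")]
database += [("X", "X")] * 45

freqs = allele_frequencies(database)
print(freqs["A"], freqs["B"])  # 0.02 0.03
```

Real data bases contain several hundred people and many distinct allele types per locus; the lumped "X" category only keeps the example short.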

Second, they estimate the frequency of each genotype by using the formula 2pq, where p and q are
the frequencies of the two different alleles in the genotype. Suppose, for example, that a genotype consisted of
alleles A and B. The frequency of genotype AB would be estimated to be 2 x .02 x .03 = .0012
(approximately 1 in 833).^{29} This formula assumes that the frequencies of the two alleles in a genotype are
statistically independent and may significantly underestimate the frequency of genotypes if the allele
frequencies are not independent.^{30}
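
The second step reduces to a single multiplication. The sketch below is illustrative only: `genotype_frequency` is a hypothetical helper name, and it carries the same independence assumption as the 2pq formula described above.

```python
def genotype_frequency(p, q):
    """Frequency of a genotype with two different alleles, via 2pq.

    Assumes the two alleles are statistically independent
    (Hardy-Weinberg equilibrium); p and q are the allele frequencies.
    """
    return 2 * p * q

f_ab = genotype_frequency(0.02, 0.03)
print(round(f_ab, 6), round(1 / f_ab))  # 0.0012 833
```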

Third, they estimate the frequency of the overall DNA profile by multiplying the frequencies of
each genotype. For example, suppose that there is a three-locus match between the suspect and the
evidentiary sample. At the first locus, both have genotype AB, which has an estimated frequency of
0.0012; at the second locus, both have genotype CD, which has an estimated frequency of 0.005; at the
third locus both have genotype EF, which has an estimated frequency of 0.01. An analyst would typically
report that the frequency of the overall profile, across the three loci, is .0012 x .005 x .01 = .00000006, or approximately one in 16.7 million. This formula, sometimes called the product rule, assumes that the frequencies of the
genotypes are statistically independent and may significantly underestimate the frequency of the multilocus
genotype if the frequencies are not independent.^{31}
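
The product rule can be sketched the same way. The helper name `profile_frequency` is hypothetical; the code simply multiplies the three genotype frequencies from the example, so it inherits the independence assumption just noted. `math.prod` requires Python 3.8 or later.

```python
from math import prod

def profile_frequency(genotype_freqs):
    """Product rule: multiply the per-locus genotype frequencies.

    Assumes genotypes at different loci are statistically independent
    (linkage equilibrium).
    """
    return prod(genotype_freqs)

freq = profile_frequency([0.0012, 0.005, 0.01])
print(f"{freq:.1e}", f"1 in {1 / freq / 1e6:.1f} million")  # 6.0e-08 1 in 16.7 million
```

Each additional locus multiplies in another small factor, which is how a data base of a few hundred people can yield estimates of one in a billion or rarer.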

^{28} Some laboratories divide Hispanics into subcategories (Southwestern and Southeastern Hispanics) and
some include additional groups (e.g., Orientals, American Indians).

^{29} The product of the individual allele frequencies is multiplied by 2 because there are two ways a person
can get a given genotype. A person may have genotype AB as a result of receiving A from his father and B
from his mother, or vice versa. By analogy, there are two ways to roll an eleven with a pair of dice: a
five on the first die and a six on the second, or vice versa. Hence, the probability of rolling an eleven is 2 x
1/6 x 1/6 = 1/18.
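
Enumerating the outcomes makes the counting argument of this footnote explicit. The sketch below is for illustration only: it lists every ordered roll of two dice, then applies the same reasoning to genotype AB with the allele frequencies used in the text, assuming the alleles received from the father and the mother are independent.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))
elevens = [roll for roll in outcomes if sum(roll) == 11]
print(elevens)                                # [(5, 6), (6, 5)]
print(Fraction(len(elevens), len(outcomes)))  # 1/18

# Same logic for genotype AB (assumes independent parental contributions):
# A from the father and B from the mother, or B from the father and A from the mother.
p, q = Fraction(2, 100), Fraction(3, 100)
p_ab = p * q + q * p
print(p_ab, float(p_ab))                      # 3/2500 0.0012
```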

^{30} When the two alleles at each locus are statistically independent in a particular population, the population is
said to be in Hardy-Weinberg equilibrium. See NRC Report, p. 78.

^{31} When the genotypes at different loci are statistically independent in a given population, the population
is said to be in linkage equilibrium. See NRC Report, pp. 78-79.
