Random Match Probability - Dr. Dan Krane, Wright State University



Forensic laboratories generally provide estimates of the frequency of matching DNA profiles among members of three broad racial groups in North America: Caucasians, African-Americans, and Hispanics.28 The frequency estimates are derived from data bases that record the DNA profiles of a large number of individuals (usually several hundred) from each racial group. The individuals profiled in the data bases are usually “convenience samples” of blood donors or paternity case litigants.

To generate frequency estimates that may be as rare as one in a billion, or even one in a trillion, from a data base of several hundred individuals, forensic laboratories typically follow a three-step procedure. First, they estimate the frequency of each allele in the DNA profile by simply counting how often that allele occurs among all the alleles recorded at that locus in the data base. If two percent of the alleles at a particular locus are type A and three percent are type B, their frequencies would be stated as .02 and .03, respectively.
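As a minimal sketch of the counting step, the Python below estimates allele frequencies at a single locus. The 50-person data base, the allele labels, and the function name allele_frequencies are hypothetical, chosen only so that A and B come out at the .02 and .03 used in the example; a real data base holds several hundred profiles typed at many loci.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus by counting alleles.

    Each genotype is a pair of allele labels, so N people contribute 2N alleles.
    (Hypothetical helper for illustration.)
    """
    counts = Counter()
    for first, second in genotypes:
        counts[first] += 1
        counts[second] += 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical 50-person data base for a single locus (100 alleles in total).
database = [("A", "C")] * 2 + [("B", "C")] * 3 + [("C", "C")] * 45

freqs = allele_frequencies(database)
print(freqs["A"], freqs["B"])  # 0.02 0.03
```

Counting alleles rather than people matters: a person who carries A contributes one A out of that person's two alleles.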

Second, they estimate the frequency of each genotype using the formula 2pq, where p and q are the frequencies of the two alleles in the genotype. Suppose, for example, that a genotype consists of alleles A and B. The frequency of genotype AB would be estimated as 2 x .02 x .03 = .0012 (approximately 1 in 833).29 This formula assumes that the two alleles in a genotype are statistically independent of each other and may significantly underestimate the frequency of a genotype if they are not.30
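The genotype step can be sketched the same way. The function name heterozygote_frequency is hypothetical, and the sketch simply applies 2pq exactly as described, assuming the two alleles are independent; it does not model any adjustment a laboratory might make when that assumption is in doubt.

```python
def heterozygote_frequency(p, q):
    """Estimated frequency of a genotype with two different alleles (2pq).

    Assumes the two alleles are statistically independent
    (Hardy-Weinberg equilibrium). Hypothetical helper for illustration.
    """
    return 2 * p * q


freq_ab = heterozygote_frequency(0.02, 0.03)
print(f"{freq_ab:.4f}")           # 0.0012
print(f"1 in {1 / freq_ab:.0f}")  # 1 in 833
```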

Third, they estimate the frequency of the overall DNA profile by multiplying the frequencies of the individual genotypes. For example, suppose that there is a three-locus match between the suspect and the evidentiary sample. At the first locus, both have genotype AB, which has an estimated frequency of 0.0012; at the second locus, both have genotype CD, which has an estimated frequency of 0.005; at the third locus, both have genotype EF, which has an estimated frequency of 0.01. An analyst would typically report that the frequency of the overall profile across the three loci is .0012 x .005 x .01 = .00000006, or about one in 16.7 million. This formula, sometimes called the product rule, assumes that the genotypes at different loci are statistically independent and may significantly underestimate the frequency of the multilocus genotype if they are not.31
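A minimal sketch of the product rule follows, assuming independence across loci as the text describes. profile_frequency is a hypothetical helper, and the three inputs are the genotype frequencies from the three-locus example. It uses math.prod, which requires Python 3.8 or later.

```python
import math


def profile_frequency(genotype_freqs):
    """Multiply per-locus genotype frequencies (the product rule).

    Assumes genotypes at different loci are statistically independent
    (linkage equilibrium). Hypothetical helper for illustration.
    """
    return math.prod(genotype_freqs)


# Genotype frequencies for AB, CD, and EF from the three-locus example.
freq = profile_frequency([0.0012, 0.005, 0.01])
print(f"{freq:.1e}")                         # 6.0e-08
print(f"1 in {1 / freq / 1e6:.1f} million")  # 1 in 16.7 million
```

Each additional matching locus multiplies in another small factor, which is how a data base of a few hundred people yields estimates of one in a billion or rarer.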


28 Some laboratories divide Hispanics into subcategories (Southwestern and Southeastern Hispanics) and some include additional groups (e.g., Orientals, American Indians).

29 The product of the individual allele frequencies is multiplied by 2 because there are two ways a person can get a given genotype: a person may have genotype AB as a result of receiving A from his father and B from his mother, or vice versa. By analogy, there are two ways to roll an eleven with a pair of dice: a five on the first die and a six on the second, or vice versa. Hence, the probability of rolling an eleven is 2 x 1/6 x 1/6 = 1/18.

30 When the two alleles of a genotype at a given locus are statistically independent in a particular population, the population is said to be in Hardy-Weinberg equilibrium. See NRC Report, p. 78.

31 When the genotypes at different loci are statistically independent in a given population, the population is said to be in linkage equilibrium. See NRC Report, pp. 78-79.
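The dice analogy in footnote 29 can be checked by brute force. This short Python sketch enumerates all 36 equally likely rolls of two six-sided dice and keeps those that sum to eleven; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
elevens = [roll for roll in outcomes if sum(roll) == 11]

print(elevens)                                # [(5, 6), (6, 5)]
print(Fraction(len(elevens), len(outcomes)))  # 1/18
```

The two ordered outcomes play the same role as the two parental orders (A from the father and B from the mother, or the reverse) behind the factor of 2 in 2pq.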





