Implications of flawed databases - Dr. Larry Mueller, UC Irvine

The United Kingdom and all fifty American states now have government-operated databanks containing the DNA profiles of known offenders. Many crimes have been solved when a databank search revealed a match between the DNA profile of a blood or semen sample left by the perpetrator at a crime scene and the profile of a known individual in the databank. A databank match is called a cold hit.

The FBI maintains a national databank of DNA profiles known as CODIS (Combined DNA Index System), which includes a Convicted Offender Index (containing profiles of offenders submitted by states) and a Forensic Index (containing DNA profiles of evidence related to unsolved crimes). CODIS allows government crime laboratories at the state and local levels to conduct national searches that might reveal, for example, that semen deposited during an unsolved rape in Florida could have come from a known offender from Virginia.
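To make the idea of a databank search and a "cold hit" concrete, here is a minimal Python sketch. It is not the CODIS software or its matching procedure, which supports several levels of match stringency. It assumes each profile is stored as a mapping from STR locus name to an unordered pair of allele repeat numbers, and it treats a hit as exact agreement at every locus typed in the evidence sample. The locus names are real STR loci, but the `find_matches` and `genotype` helpers, the record IDs and all allele values are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical representation: locus name -> unordered pair of allele repeat numbers.
# A homozygote collapses to a one-element set, e.g. genotype(13, 13) == frozenset({13}).
Profile = Dict[str, FrozenSet[int]]


def genotype(a: int, b: int) -> FrozenSet[int]:
    """Build an unordered genotype from two allele calls."""
    return frozenset({a, b})


def find_matches(evidence: Profile, databank: Dict[str, Profile]) -> List[str]:
    """Return IDs of databank profiles that agree with the evidence at every
    locus typed in the evidence sample (simplified exact-match search)."""
    hits = []
    for record_id, profile in databank.items():
        # profile.get() returns None for an untyped locus, which never matches.
        if all(profile.get(locus) == gt for locus, gt in evidence.items()):
            hits.append(record_id)
    return hits


# Illustrative data only; the allele values are made up.
databank = {
    "offender-001": {"D8S1179": genotype(12, 14), "TH01": genotype(6, 9), "FGA": genotype(21, 24)},
    "offender-002": {"D8S1179": genotype(13, 13), "TH01": genotype(7, 9), "FGA": genotype(22, 23)},
}
evidence = {"D8S1179": genotype(13, 13), "TH01": genotype(9, 7), "FGA": genotype(23, 22)}

print(find_matches(evidence, databank))  # ['offender-002']
```

Storing each genotype as an unordered set means the order in which a laboratory reports the two alleles does not affect whether profiles match.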

Government databanks were initially limited to convicted violent or sex offenders. However, there has been serious discussion of expanding databanks to include arrestees, or even to make them universal (perhaps by sampling DNA from all citizens at birth), in the interest of better crime control.

Civil libertarians have expressed concern that government agencies could use the genetic information they collect in an intrusive or inappropriate manner. The information included in CODIS is limited to numerical data that designate RFLP and STR profiles. These profiles are useful for identifying individuals but are linked to no known medical or behavioral characteristics. However, most states have retained the blood samples of those included in state databanks, and these samples could in principle be reanalyzed for other genetic information. State and federal statutes limit the disclosure of information contained in government databanks and generally specify that it be used solely for law enforcement purposes.

When police have the DNA profile of a perpetrator but cannot establish his or her identity, they sometimes conduct what has become known as a DNA dragnet, in which large numbers of individuals in the relevant community are asked to submit samples voluntarily for DNA testing. Police generally collect samples by rubbing the inside of the individual's cheek with a cotton swab. Even if the guilty party does not submit a sample, the DNA dragnet may help police by narrowing the number of possible suspects. The first DNA dragnet, which was chronicled in Joseph Wambaugh's book "The Blooding," helped police solve two murders in Leicester, England, in 1987. The guilty man was identified when, in an effort to avoid suspicion, he asked a friend to submit a sample in his place. DNA dragnets have since been used repeatedly in Britain and are becoming more common in the U.S.

Prosecutors in some jurisdictions have developed a procedural innovation called a DNA warrant as a means of avoiding the statute of limitations in cases where they have DNA from the perpetrator but have not yet identified a suspect. Before the statute of limitations runs out, charges are formally filed in the case, but the "defendant" is identified by DNA profile rather than by name. The legality and constitutionality of this practice are still subject to debate.

The assumption that the alleles in DNA profiles are statistically independent has been a key point of contention. When DNA evidence was first introduced, a number of experts raised the concern that human populations might be structured, such that certain DNA profiles are particularly common in people of the same ethnic, religious or geographic subgroup. If there is a significant amount of structure in U.S. populations, then the standard method of calculating DNA profile frequencies, which combines the frequencies of the individual alleles by multiplication on the assumption that they are statistically independent (the "product rule"), would be invalid and might greatly underestimate the frequency of a matching profile.
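The product rule can be sketched in a few lines of Python. This is a simplified illustration, not any laboratory's software. It assumes Hardy-Weinberg proportions within each locus (a homozygote for an allele with frequency p has genotype frequency p², and a heterozygote with allele frequencies p and q has 2pq) and independence across loci, so the single-locus frequencies are simply multiplied. The three-locus profile and its allele frequencies are invented for illustration.

```python
from math import prod
from typing import List, Tuple


def genotype_frequency(p: float, q: float, homozygous: bool) -> float:
    """Hardy-Weinberg genotype frequency at one locus (q is ignored for homozygotes)."""
    return p * p if homozygous else 2 * p * q


def profile_frequency(loci: List[Tuple[float, float, bool]]) -> float:
    """Product rule: multiply single-locus genotype frequencies,
    assuming every locus is statistically independent of the others."""
    return prod(genotype_frequency(p, q, hom) for p, q, hom in loci)


# Hypothetical profile: (frequency of allele 1, frequency of allele 2, homozygous?)
loci = [(0.1, 0.2, False), (0.05, 0.05, True), (0.3, 0.1, False)]
freq = profile_frequency(loci)
print(f"{freq:.2e}, about 1 in {round(1 / freq):,}")  # 6.00e-06, about 1 in 166,667
```

Every multiplication in `profile_frequency` relies on the independence assumption; if alleles cluster within subgroups, the true frequency can be much larger than this product.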

By analogy, suppose that a population survey showed that 10 percent (1 in 10) of Europeans have blond hair, 10 percent have blue eyes, and 10 percent have fair skin. Multiplying these frequencies yields a figure of 0.001 (1 in 1,000) for the frequency of Europeans with all three traits. This estimate is invalid because the traits are not independent: they tend to occur together among Nordics. The estimate of 0.001 is obviously far too low for Scandinavia, where Nordics are concentrated. Moreover, because Nordics constitute a significant percentage of the European population, the estimate of 0.001 is also too low for Europe as a whole.[32]
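The blond-hair analogy can be checked numerically with a sketch that treats Europe as a mixture of two hypothetical subgroups: a "Nordic" subgroup making up a fraction f of the population, in which each trait is common, and everyone else, in which each trait is rare. The rare-group frequency is chosen so that each trait still has an overall frequency of 10 percent. The values f = 0.08 and an 80 percent within-subgroup frequency are invented for illustration, and the traits are assumed independent within each subgroup, so all of the clustering comes from the mixture itself.

```python
from typing import Tuple


def mixture_joint_frequency(f: float, p_sub: float, overall: float,
                            n_traits: int = 3) -> Tuple[float, float]:
    """Return (product-rule estimate, true joint frequency) for a two-subgroup mixture.

    f       -- fraction of the population in the subgroup (hypothetical)
    p_sub   -- frequency of each trait within the subgroup (hypothetical)
    overall -- frequency of each trait in the population as a whole
    Traits are assumed independent *within* each subgroup.
    """
    # Frequency in the rest of the population that keeps the overall frequency fixed.
    p_rest = (overall - f * p_sub) / (1 - f)
    if not 0 <= p_rest <= 1:
        raise ValueError("parameters are inconsistent with the overall frequency")
    naive = overall ** n_traits
    true = f * p_sub ** n_traits + (1 - f) * p_rest ** n_traits
    return naive, true


naive, true = mixture_joint_frequency(f=0.08, p_sub=0.8, overall=0.1)
print(f"product rule: {naive:.4f}  true: {true:.4f}  ratio: {true / naive:.0f}x")
# product rule: 0.0010  true: 0.0410  ratio: 41x
```

Under these made-up numbers, the product-rule figure understates the Europe-wide frequency of all three traits about fortyfold, even though each single-trait frequency is exactly right.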

Whether there is sufficient structure in human populations to invalidate forensic statistics was a hotly debated issue in the early 1990s,[33] although empirical research has since allayed much of the concern. At that time, the debate led courts in several jurisdictions to exclude DNA evidence under the Frye standard, on the grounds that the method of statistical computation was not generally accepted.[34] A second National Research Council report, issued in 1996 and commonly referred to as NRC II, indicated that the population substructure controversy had subsided. In place of the "ceiling principle" and "interim ceiling principle" proposed in the first NRC report (1992), it recommended that a corrective factor, often referred to as "theta," be applied in product rule calculations, but only at those loci where an individual possesses two copies of the same allele. As the report put it: "The abundance of data in different ethnic groups within the major races and the genetically and statistically sound methods recommended in this report imply that the ceiling principle and the interim ceiling principle are unnecessary." Most laboratories today follow the NRC II recommendations.
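The theta adjustment can be sketched as a change to the homozygote term of the product rule. In the form recommended by NRC II, a homozygote for an allele with frequency p is assigned p² + p(1 − p)θ instead of p², while heterozygotes keep the unmodified 2pq. The value θ = 0.01 below is the figure NRC II suggested for general use (it suggested a larger value, such as 0.03, for small, isolated populations). The allele frequencies are invented, and this is an illustration rather than any particular laboratory's implementation.

```python
from math import prod
from typing import List, Tuple


def nrc2_genotype_frequency(p: float, q: float, homozygous: bool,
                            theta: float = 0.01) -> float:
    """Single-locus frequency with the NRC II homozygote correction.

    Homozygotes:   p^2 + p(1 - p) * theta  (always at least p^2)
    Heterozygotes: 2pq                     (left unchanged)
    """
    if homozygous:
        return p * p + p * (1 - p) * theta
    return 2 * p * q


# Hypothetical profile: (frequency of allele 1, frequency of allele 2, homozygous?)
loci: List[Tuple[float, float, bool]] = [(0.1, 0.2, False), (0.05, 0.05, True), (0.3, 0.1, False)]

unmodified = prod((p * p if hom else 2 * p * q) for p, q, hom in loci)
corrected = prod(nrc2_genotype_frequency(p, q, hom) for p, q, hom in loci)
print(f"unmodified: {unmodified:.2e}  with theta=0.01: {corrected:.2e}")
# unmodified: 6.00e-06  with theta=0.01: 7.14e-06
```

Because the correction only ever increases the homozygote term, it moves the estimated profile frequency in the direction that favors the defendant.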

One of the most recent statements of acceptance of the unmodified product rule was made by the Supreme Court of California, a court that has rigorously examined DNA evidence. In People v. Soto, the court concluded that “the [courts below] correctly determined that the unmodified product rule, as applied in DNA forensic analysis, is generally accepted in the relevant scientific community of population geneticists, and that statistical calculations made utilizing that rule meet the Kelly standard of admissibility.”

[32] Any errors caused by population structure are exacerbated when the frequency of individual characteristics is estimated from an inappropriate database. For example, if one relied on a population of Sicilians to estimate the frequency of blond hair, blue eyes and fair skin among Europeans, one might mistakenly conclude that each characteristic was found in one person in 100 rather than one in 10. Multiplication would then lead to an estimate that only one person in a million has blond hair, blue eyes and fair skin.

[33] For reviews, see K. Roeder, DNA Fingerprinting: A Review of the Controversy, 9 Stat. Sci. 222 (1994), and accompanying commentary by multiple authors; B. Weir, Population Genetics in the Forensic DNA Debate, 89 Proc. Natl. Acad. Sci. 11654 (1992); D. Kaye, DNA Evidence: Probability, Population Genetics, and the Courts, 7 Harv. J.L. & Tech. 101 (1993); Thompson, supra note *, at 61-89.

[34] Commonwealth v. Curnin, 565 N.E.2d 440 (Mass. 1991); Commonwealth v. Lanigan, 596 N.E.2d 311 (Mass. 1992); People v. Barney, 8 Cal.App.4th 798, 10 Cal.Rptr.2d 731 (1992); State v. Vandebogart, 136 N.H. 365, 616 A.2d 843 (1992); U.S. v. Porter, 618 A.2d 629 (D.C. App. 1992); People v. Wallace, 17 Cal.Rptr.2d 721 (Cal.App. 1 Dist. 1993); State v. Bible, 1993 WL 306544 (Ariz.Sup.Ct. August 12, 1993).


