GenoStat Table of Contents
1. Introduction
2. Mixture resolution method
3. Cumulative statistics
4. Entering DNA profile data
5. Additional analysis parameters
Appendix
1. Introduction
GenoStat® is a powerful tool used to calculate DNA statistics and resolve DNA mixtures (separate mixtures into their contributor components). GenoStat is written in Java so that
it can run on virtually any computer. To use GenoStat's statistics mode, simply check the allele boxes representing the DNA profile in your sample. To use GenoStat's
mixture resolution mode, enter the RFU values for the peaks found in a sample's electropherogram. The mathematically possible mixture separations
(hypotheses) will be listed in the resolved mixtures area.
The default databases used by GenoStat are the FBI's "2015 Expanded FBI STR Population Data":
Hares, Douglas R. (2015), Selection and implementation of expanded CODIS core loci in the United States. Forensic Science International: Genetics, 17: 33-34. doi: 10.1016/j.fsigen.2015.03.006
Previous versions of the FBI databases as well as many additional population databases are bundled with GenoStat. Please see Appendix A.5 for further information.
2. Mixture resolution method
Samples containing DNA from two or more individuals can be difficult to interpret. Peak height information can be used as the basis of an automatable approach that
objectively resolves the DNA profiles of contributors to mixed forensic samples. With this approach, all possible alternative combinations of genotypes that can account
for the alleles at a given locus are evaluated for their ability to satisfy a set of two generally accepted constraints (peak height balance and peak height additivity).
Hypothetical genotype combinations that are not eliminated from consideration can then be used to generate either a set of random match probabilities or a "constrained" combined
probability of inclusion for the locus. This approach is objective in that only information from the evidentiary sample is required for mixture resolution.
For each locus, our approach to mixture resolution postulates all possible genotype combinations and tests each for compliance with the predicted conditions that must be satisfied
in order for that genotype combination to be acceptable. Consider the case of there being exactly two contributors to a mixed sample. The n peaks present at a given
locus are ranked by height (or area) and labeled P1, P2, ..., Pn, where P1 is a peak of minimal height and Pn a peak of
maximal height. All potential contributor genotype combinations are then listed. For example, at a locus with four peaks
(labeled P1-P4) the possible sets of genotypes for two individuals that could explain the observation of all four peaks are:
[(P4, P3), (P2, P1)], [(P4, P2), (P3, P1)], and [(P4, P1), (P3, P2)].
The conditions that must be satisfied for each mixture combination pair are:
Hypothesis    | Contributor #1 | Contributor #2 |
Hypothesis #1 | P4 and P3      | P2 and P1      |
Hypothesis #2 | P4 and P2      | P3 and P1      |
Hypothesis #3 | P4 and P1      | P3 and P2      |
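The three hypotheses listed for a four-peak locus can be enumerated mechanically. The following is a minimal sketch, not GenoStat's own code: the function name `two_person_hypotheses` and the RFU values are hypothetical, and the sketch assumes exactly four distinct peaks from two heterozygous contributors.

```python
def two_person_hypotheses(peaks):
    """List the ways four distinct peaks can be split into two heterozygous genotypes.

    `peaks` maps a peak label to its height in RFU (illustrative values only).
    """
    if len(peaks) != 4:
        raise ValueError("this sketch only handles the four-peak, two-contributor case")
    # Rank labels from tallest to shortest so the output reads like the table (P4 first).
    ranked = sorted(peaks, key=peaks.get, reverse=True)
    tallest, others = ranked[0], ranked[1:]
    hypotheses = []
    # Keep the tallest peak in contributor #1 so that each split is listed only once.
    for partner in others:
        contributor_1 = (tallest, partner)
        contributor_2 = tuple(p for p in others if p != partner)
        hypotheses.append((contributor_1, contributor_2))
    return hypotheses


peaks = {"P1": 200, "P2": 250, "P3": 900, "P4": 1000}  # hypothetical RFU values
for contributor_1, contributor_2 in two_person_hypotheses(peaks):
    print(contributor_1, contributor_2)
```

Fixing the tallest peak in contributor #1 is only a convenience for avoiding duplicate listings; it does not imply that contributor #1 is the major contributor.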
Each possible pairing of contributor genotypes represents a hypothesis that is tested for satisfiability against determined conditions of peak height balance and additivity.
If either of the satisfiability conditions fails for a given hypothesis, that hypothesis is removed from further consideration. If all but one of the alternative hypotheses for a
given locus have been eliminated from consideration, then the remaining hypothesis represents an unambiguously resolved genotype.
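The elimination step can be illustrated with a short sketch. It is an assumption-laden example rather than GenoStat's implementation: the helper names are hypothetical, only the peak height balance condition is tested (using the approximately 70% figure quoted in section 5.2), and additivity is left out because no peak is shared between contributors in a four-peak locus.

```python
def is_balanced(height_a, height_b, phr=0.7):
    """Peak height balance: shorter peak / taller peak must be at least `phr`."""
    low, high = sorted((height_a, height_b))
    return low / high >= phr


def surviving_hypotheses(peaks, hypotheses, phr=0.7):
    """Keep only hypotheses in which every contributor's peak pair is balanced."""
    kept = []
    for hypothesis in hypotheses:
        if all(is_balanced(peaks[a], peaks[b], phr) for a, b in hypothesis):
            kept.append(hypothesis)
    return kept


peaks = {"P1": 200, "P2": 250, "P3": 900, "P4": 1000}  # hypothetical RFU values
hypotheses = [
    (("P4", "P3"), ("P2", "P1")),
    (("P4", "P2"), ("P3", "P1")),
    (("P4", "P1"), ("P3", "P2")),
]
kept = surviving_hypotheses(peaks, hypotheses)
resolved = len(kept) == 1  # a single survivor means the locus is unambiguously resolved
print(kept, resolved)
```

With these illustrative heights only the first hypothesis survives (900/1000 and 200/250 are both above 0.7, while 250/1000 and 200/1000 are not), so the locus would count as resolved.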
3. Cumulative statistics
The Cumulative statistics window forms the right side of the GenoStat screen. This area contains the results of the statistical calculations across all entered loci.
3.1 Locus selection
Statistics are only calculated for selected (checked) loci. The list of loci can be found on the right side of the GenoStat screen. Loci can be included or
removed from generation of statistics by checking or unchecking the box next to the locus name. There are separate selection boxes for the random match
probability (RMP) and combined probability of inclusion (CPI). Any profile can be included in the combined probability of inclusion (CPI) calculation,
but only resolved loci (or loci with one selected mixture hypothesis) can be included in the random match probability (RMP) calculation.
3.2 Cumulative combined probability of inclusion (CPI)
The "unconstrained CPI" is the standard CPI formula, which includes all possible
mixture contributor profiles for a given mixture profile. The "constrained" combined probability of inclusion (CPI) only considers the alternative hypotheses
of genotype combinations that have not been eliminated from consideration. If a locus is fully resolved, then the "solved CPI" is reported.
A full explanation of the statistical methods can be found in the appendix.
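As a rough illustration of the unconstrained calculation, the sketch below uses the textbook form of the statistic: the probability of inclusion at a locus is the square of the summed frequencies of the observed alleles, and the cumulative CPI is the product across loci. The function names, locus names, and allele frequencies are hypothetical, and the appendix remains the authority for GenoStat's exact formulas, including the constrained variant, which is not sketched here.

```python
from math import prod


def locus_pi(allele_frequencies):
    """Unconstrained probability of inclusion for one locus: (sum of frequencies)^2."""
    return sum(allele_frequencies) ** 2


def cumulative_cpi(loci):
    """Multiply the per-locus probabilities of inclusion across all selected loci."""
    return prod(locus_pi(freqs) for freqs in loci.values())


# Hypothetical frequencies of the alleles observed at two selected loci.
loci = {"LocusA": [0.10, 0.20, 0.20], "LocusB": [0.25, 0.25]}
print(cumulative_cpi(loci))  # approximately 0.0625 (0.5^2 * 0.5^2)
```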
3.3 Cumulative random match probability (RMP)
The cumulative random match probability is the random match probability (RMP) across all selected loci that have been fully resolved and/or loci with only a single mixture hypothesis selected.
The level of relatedness can be selected with the drop-down box located above the reported random match probabilities. The random match probability estimates the chance that a randomly selected,
unrelated individual would have a profile matching that seen in a particular sample. GenoStat also estimates the chance that a randomly selected related individual would have the same
profile that was observed in a particular sample. Different degrees of relatedness result in different chances of finding a matching profile. GenoStat can generate statistical estimates for the
following levels of relatedness: siblings; half-siblings; parents and children; aunts/uncles and nephews/nieces; first cousins; and second cousins.
A full explanation of these formulas can be found in the RMP appendix.
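For the unrelated case, the calculation can be sketched with the basic Hardy-Weinberg genotype frequencies: p² for a homozygote and 2pq for a heterozygote, multiplied across loci. This is a simplified illustration with hypothetical function names and frequencies. It assumes no population substructure correction and does not reproduce the relatedness formulas, which are given in the RMP appendix.

```python
from math import prod


def locus_rmp(p, q=None):
    """Genotype frequency at one locus for an unrelated individual.

    Pass one allele frequency for a homozygote, two for a heterozygote.
    """
    if q is None:
        return p * p      # homozygote: p^2
    return 2 * p * q      # heterozygote: 2pq


def cumulative_rmp(genotypes):
    """Multiply the per-locus genotype frequencies across all resolved loci."""
    return prod(locus_rmp(*freqs) for freqs in genotypes)


# Hypothetical resolved profile: heterozygote at one locus, homozygote at another.
genotypes = [(0.10, 0.20), (0.25,)]
print(cumulative_rmp(genotypes))  # approximately 0.0025 (0.04 * 0.0625)
```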
3.4 Cumulative mixture ratio
The cumulative mixture ratio is an approximation of the amount of DNA present from each contributor. The value is presented as [amount of contributor 1]:[amount of contributor 2].
The mixture ratio is determined by utilizing the peak height information from loci that are fully resolved or loci where only one mixture hypothesis has been selected.
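The text above does not spell out how the peak heights are combined, so the following is only one plausible sketch: it sums each contributor's peak heights over the resolved loci and scales the totals so that the smaller one equals 1. The function name, the data layout, and the summing rule are all assumptions made for illustration.

```python
def cumulative_mixture_ratio(resolved_loci):
    """Approximate [contributor 1]:[contributor 2] from resolved loci.

    `resolved_loci` is a list of (contributor_1_heights, contributor_2_heights)
    pairs, one pair per resolved locus, with heights in RFU.
    """
    total_1 = sum(sum(heights_1) for heights_1, _ in resolved_loci)
    total_2 = sum(sum(heights_2) for _, heights_2 in resolved_loci)
    smaller = min(total_1, total_2)
    if smaller <= 0:
        raise ValueError("each contributor needs a positive total peak height")
    # Scale so that the minor contributor is reported as 1.
    return total_1 / smaller, total_2 / smaller


# Hypothetical heights at two resolved loci.
resolved_loci = [((1000, 900), (250, 200)), ((800, 820), (210, 190))]
ratio_1, ratio_2 = cumulative_mixture_ratio(resolved_loci)
print(f"{ratio_1:.1f}:{ratio_2:.1f}")
```

In this example contributor 1 totals 3520 RFU and contributor 2 totals 850 RFU, giving a ratio of roughly 4.1:1.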
3.5 Database selection
This version of GenoStat contains the three main databases published by the FBI: African American, Caucasian, and Southwest Hispanics. See the appendix for citations.
3.6 Minimum frequencies
For many tables included in GenoStat, including the default tables, a minimum allele frequency of 5/(2N), where N
is the number of individuals sampled, is enforced. This method was chosen in accordance with:
National Research Council. The Evaluation of Forensic DNA Evidence. Washington, DC: The National Academies Press, 1996.
(freely available for download at NAP.edu)
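The minimum-frequency rule amounts to replacing any observed frequency below 5/(2N) with that floor. A minimal sketch, with a hypothetical function name and sample size:

```python
def apply_minimum_frequency(observed_frequency, individuals_sampled):
    """Raise an allele frequency to the 5/(2N) floor when it falls below it."""
    floor = 5 / (2 * individuals_sampled)
    return max(observed_frequency, floor)


# With a hypothetical database of N = 200 individuals the floor is 5/400 = 0.0125.
print(apply_minimum_frequency(0.004, 200))  # rare allele is raised to the floor
print(apply_minimum_frequency(0.100, 200))  # common allele is left unchanged
```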
4. Entering DNA profile data
The left side of the GenoStat screen contains the areas where peak information is entered, loci are resolved, and locus statistics are shown.
4.1 Enter peak height information
Select one of the tested loci by selecting its tab along the top-left portion of the screen. Next, find the box corresponding to the first allele in
the sample's electropherogram and enter its RFU value. Repeat the process for the remaining peaks in the locus.
Once all alleles for a given locus have been entered, click the button labeled "Click here to resolve mixture."
NOTE: This version of GenoStat supports at most two contributors. Therefore, you cannot resolve a locus with more than four peaks.
4.1.1 Off-ladder alleles
GenoStat supports off-ladder alleles. Click the button labeled "Off-ladder alleles" located under the allele entry boxes for a given locus.
Here, you will be able to enter the allele's label and RFU value. To avoid any potential confusion, please ensure that the off-ladder allele's label is unique.
Do not label an allele with the same label as an on-ladder allele and do not label two off-ladder alleles with the same name.
4.2 Mixture hypotheses
Once you have clicked the button labeled "Click here to resolve mixture", GenoStat will attempt to resolve the mixture. If a separation is mathematically possible,
each possible set of contributor profiles (hypotheses) will be listed in the "resolved mixtures" box. If only one hypothesis is available, then the
locus is fully resolved. If multiple hypotheses are available, then it is possible to remove a hypothesis from consideration by unchecking its
"include" box (the box next to the hypothesis).
NOTE: A locus is only included in the random match probability (RMP) statistics if its RMP box is checked in the main window
and if the locus has only a single mixture hypothesis. If a locus is not fully resolved, you must deselect all but one mixture hypothesis for that locus to
be included in the RMP statistics.
4.3 Locus statistics
The locus statistics window contains the random match probabilities (RMPs) for the contributor profiles in each mixture hypothesis for a given locus.
In addition, the combined probability of inclusion (CPI) is reported for the locus. The "unconstrained CPI" is the standard CPI formula, which includes all possible
mixture contributor profiles for a given mixture profile. The "constrained CPI" only considers the alternative hypotheses
of genotype combinations that have not been eliminated from consideration. If a locus is fully resolved, then the "solved CPI" is reported.
A full explanation of the statistical methods can be found in the appendix.
5. Additional analysis parameters
5.1 Minimum peak height threshold (MPHT)
When three or fewer alleles are observed at a particular locus, it is sometimes also possible that alleles possessed by one or both contributors are present at levels below the detection
capability of the equipment used for genotyping (i.e., allelic dropout). The label MPHT is used to represent potential peaks below the minimum peak height threshold that may
need to be considered in order to evaluate all possible contributor profiles.
The minimum peak height threshold (MPHT) is a user-defined parameter that represents the RFU threshold utilized during the course of the electronic analysis of the DNA data.
GenoStat uses Applied Biosystems' default minimum peak height threshold of 50 RFU.
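The effect of the threshold on entered data can be sketched as a simple filter. The function name and allele labels are hypothetical. Treating a peak exactly at the threshold as detected is an assumption of this sketch, not a documented GenoStat rule.

```python
def detected_peaks(peaks, mpht=50):
    """Keep only peaks at or above the minimum peak height threshold (in RFU)."""
    return {allele: rfu for allele, rfu in peaks.items() if rfu >= mpht}


# Hypothetical locus: allele "13" falls below the 50 RFU default and is not called.
peaks = {"12": 640, "13": 48, "15": 310}
print(detected_peaks(peaks))  # {'12': 640, '15': 310}
```

Any allele that a hypothesis requires but that is absent from the filtered set would then have to be treated as a potential sub-threshold (MPHT) peak.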
5.2 Peak height imbalance ratio (PHR)
Peak height balance demands that two peaks from the same contributor must have peak heights within a specific constant multiplier of each other.
Thus, in order for a profile containing (P1, P2) to satisfy peak height balance, it must be true that:
P1/P2 ≥ PHR (where P1 ≤ P2)
for the specific value of PHR appropriate for the measurement technology used in analyzing the sample.
General practice has found that, "[t]he peak height ratio, as measured by dividing the height of the lower quantity peak in relative fluorescence units by the
height of the higher quantity allele peak, should be greater than approximately 70% in a single source sample" (Butler, 2001).
5.3 Magnitude-dependent peak height ratio (MD-PHR)
The extent to which two peaks are balanced varies with the magnitude of the peaks at a locus.
As the magnitude of peaks increases, peak balance also tends to become greater (i.e., higher than 70%).
Similarly, peak height balance ratios can fall below 70% for smaller peak pairs.
Expected peak imbalance thresholds can be determined based on the average peak height of a contributor at a locus with the formula:
MD-PHR = 0.059 • ln(Ave peak height) + 0.36
This equation forms the 95% decision boundary for a validation study of 1763 heterozygous loci (manuscript forthcoming).
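The formula can be applied as in the sketch below, which compares the same 70% peak ratio at two different magnitudes. The function names are hypothetical, and taking "Ave peak height" to be the mean of the two peaks in a heterozygous pair is an assumption of this example.

```python
from math import log


def md_phr(average_peak_height):
    """Magnitude-dependent threshold: 0.059 * ln(average peak height) + 0.36."""
    return 0.059 * log(average_peak_height) + 0.36


def is_balanced_md(height_a, height_b):
    """Test a peak pair against the threshold computed from its own average height."""
    low, high = sorted((height_a, height_b))
    threshold = md_phr((low + high) / 2)
    return low / high >= threshold


# The same 0.70 ratio passes for small peaks but fails for large ones.
print(is_balanced_md(70, 100))    # True: threshold is about 0.62 at an average of 85 RFU
print(is_balanced_md(700, 1000))  # False: threshold is about 0.76 at an average of 850 RFU
```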
The magnitude-dependent peak height ratio can be enabled with the checkbox on the main screen (MD-PHR).