TL;DR: An expectation-maximization (EM) algorithm leading to maximum-likelihood estimates of molecular haplotype frequencies under the assumption of Hardy-Weinberg proportions is implemented and appears to be useful for the analysis of nuclear DNA sequences or highly variable loci.
Abstract: Molecular techniques allow the survey of a large number of linked polymorphic loci in random samples from diploid populations. However, the gametic phase of haplotypes is usually unknown when diploid individuals are heterozygous at more than one locus. To overcome this difficulty, we implement an expectation-maximization (EM) algorithm leading to maximum-likelihood estimates of molecular haplotype frequencies under the assumption of Hardy-Weinberg proportions. The performance of the algorithm is evaluated for simulated data representing both DNA sequences and highly polymorphic loci with different levels of recombination. As expected, the EM algorithm is found to perform best for large samples, regardless of recombination rates among loci. To ensure finding the global maximum likelihood estimate, the EM algorithm should be started from several initial conditions. The present approach appears to be useful for the analysis of nuclear DNA sequences or highly variable loci. Although the algorithm, in principle, can accommodate an arbitrary number of loci, there are practical limitations because the computing time grows exponentially with the number of polymorphic loci.
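Illustration (not from the paper): a minimal Python sketch of the kind of EM iteration the abstract describes, under simplifying assumptions. Loci are assumed biallelic, each unphased genotype is coded as a tuple of 0/1/2 counts of allele "1" per locus, and there is no missing data. The function names compatible_pairs and em_haplotype_freqs, the seed argument, and the convergence tolerance are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple with the count (0, 1 or 2) of allele 1 at each biallelic locus.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        h = tuple(base)
        return [(h, h)]
    pairs = []
    # Pin allele 1 of the first heterozygous locus to haplotype a so that each
    # unordered pair is listed once: 2**(len(het) - 1) pairs in total.
    for bits in product((0, 1), repeat=len(het) - 1):
        a, b = base[:], base[:]
        a[het[0]], b[het[0]] = 1, 0
        for locus, bit in zip(het[1:], bits):
            a[locus], b[locus] = bit, 1 - bit
        pairs.append((tuple(a), tuple(b)))
    return pairs


def em_haplotype_freqs(genotypes, seed=None, max_iter=1000, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg proportions.

    seed=None starts from uniform frequencies; an integer seed gives a random
    start, so several starting points can be compared.
    """
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    if seed is None:
        freqs = {h: 1.0 / len(haps) for h in haps}
    else:
        rng = random.Random(seed)
        raw = {h: rng.random() + 1e-6 for h in haps}
        norm = sum(raw.values())
        freqs = {h: v / norm for h, v in raw.items()}

    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in pair_lists:
            # E-step: posterior weight of each possible phase, with pair
            # probability p_a**2 (a == b) or 2 * p_a * p_b (a != b).
            weights = [(1 if a == b else 2) * freqs[a] * freqs[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of gametes.
        new_freqs = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in haps)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs
```

In this sketch a genotype heterozygous at k loci expands into 2^(k-1) candidate haplotype pairs, which is one way to see why the cost grows exponentially with the number of polymorphic loci. Calling em_haplotype_freqs with several different seed values and comparing the resulting estimates is a simple way to follow the advice about multiple initial conditions.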
TL;DR: The association of alleles among different loci was studied in natural populations of Hordeum spontaneum, the evolutionary progenitor of cultivated barley, and the variance of the number of heterozygous loci in two randomly chosen gametes affords a useful measure of such association.
Abstract: The association of alleles among different loci was studied in natural populations of Hordeum spontaneum, the evolutionary progenitor of cultivated barley. The variance of the number of heterozygous loci in two randomly chosen gametes affords a useful measure of such association. The behavior of this statistic in several particular models is described. Generally, linkage (gametic phase) disequilibrium tends to increase the variance above the value expected under complete independence. This increase is greatest when disequilibria are such as to maximize the sum of squares of the two-locus gametic frequencies. When data on several loci per individual are available, the observed variance may be tested for its agreement with that expected under the hypothesis of complete interlocus independence, using the sampling theory of this model. When applied to allozyme data from 26 polymorphic populations of wild barley, this test demonstrated the presence of geographically widespread multilocus organization. On average, the variance was 80% higher than expected under random association. Gametic frequencies for four esterase loci in both of these populations of wild barley and two composite crosses of cultivated barley were analyzed. Most generations of the composites showed less multilocus structure, as measured by the indices of association, than the wild populations.
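Illustration (not from the paper): a small Python sketch of the statistic described above, assuming phased gametes are available as equal-length tuples of allele labels. For every distinct pair of gametes it counts K, the number of loci at which the two differ, and compares the variance of K with the sum over loci of h_j(1 - h_j), the value expected if loci were independent, where h_j is the fraction of pairs differing at locus j. The function name association_variances is hypothetical, and the paper's sampling theory and significance test are not reproduced here.

```python
from itertools import combinations


def association_variances(gametes):
    """Return (observed, expected) variance of K over all distinct gamete pairs.

    K is the number of loci at which two gametes carry different alleles.
    `expected` is sum_j h_j * (1 - h_j), the variance K would have if loci
    were independent, with h_j estimated from the same set of pairs.
    """
    n_loci = len(gametes[0])
    pairs = list(combinations(gametes, 2))

    # Number of differing loci for each pair of gametes.
    ks = [sum(a != b for a, b in zip(g1, g2)) for g1, g2 in pairs]
    mean_k = sum(ks) / len(ks)
    observed = sum((k - mean_k) ** 2 for k in ks) / len(ks)

    # Per-locus fraction of pairs that differ, then the independence variance.
    expected = 0.0
    for j in range(n_loci):
        h_j = sum(g1[j] != g2[j] for g1, g2 in pairs) / len(pairs)
        expected += h_j * (1.0 - h_j)
    return observed, expected
```

The difference between the two returned values equals twice the sum of the between-locus covariances of the per-locus difference indicators, so an observed variance well above the expected one points to association among loci. In very small samples the comparison is distorted because gametes are paired without replacement.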
TL;DR: This work used fluorescence microscopy and variation at nine microsatellite loci to show that the false spider mite, Brevipalpus phoenicis, consists of haploid female parthenogens, and showed that this reproductive anomaly is caused by infection by an undescribed endosymbiotic bacterium, which results in feminization of haploid genetic males.
Abstract: The dominance of the diploid state in higher organisms, with haploidy generally confined to the gametic phase, has led to the perception that diploidy is favored by selection. This view is highlighted by the fact that no known female organism within the Metazoa exists exclusively (or even for a prolonged period) in a haploid state. We used fluorescence microscopy and variation at nine microsatellite loci to show that the false spider mite, Brevipalpus phoenicis, consists of haploid female parthenogens. We show that this reproductive anomaly is caused by infection by an undescribed endosymbiotic bacterium, which results in feminization of haploid genetic males.
TL;DR: PHASE usually had a very low false-positive rate; the majority of genotypes that could not be resolved with high confidence included an allele occurring only once in a dataset, and genotypes involving two low-frequency alleles were disproportionately represented in the pool of unresolved genotypes.
Abstract: A widely-used approach for screening nuclear DNA markers is to obtain sequence data and use bioinformatic algorithms to estimate which two alleles are present in heterozygous individuals. It is common practice to omit unresolved genotypes from downstream analyses, but the implications of this have not been investigated. We evaluated the haplotype reconstruction method implemented by PHASE in the context of phylogeographic applications. Empirical sequence datasets from five non-coding nuclear loci with gametic phase ascribed by molecular approaches were coupled with simulated datasets to investigate three key issues: (1) haplotype reconstruction error rates and the nature of inference errors, (2) dataset features and genotypic configurations that drive haplotype reconstruction uncertainty, and (3) impacts of omitting unresolved genotypes on levels of observed phylogenetic diversity and the accuracy of downstream phylogeographic analyses. We found that PHASE usually had very low false-positives (i.e., a low rate of confidently inferring haplotype pairs that were incorrect). The majority of genotypes that could not be resolved with high confidence included an allele occurring only once in a dataset, and genotypic configurations involving two low-frequency alleles were disproportionately represented in the pool of unresolved genotypes. The standard practice of omitting unresolved genotypes from downstream analyses can lead to considerable reductions in overall phylogenetic diversity that is skewed towards the loss of alleles with larger-than-average pairwise sequence divergences, and in turn, this causes systematic bias in estimates of important population genetic parameters. A combination of experimental and computational approaches for resolving phase of segregating sites in phylogeographic applications is essential. We outline practical approaches to mitigating potential impacts of computational haplotype reconstruction on phylogeographic inferences. With targeted application of laboratory procedures that enable unambiguous phase determination via physical isolation of alleles from diploid PCR products, relatively little investment of time and effort is needed to overcome the observed biases.
TL;DR: A general likelihood‐based approach to inferring haplotype‐disease associations in studies of unrelated individuals is described, and an application to the Carolina Breast Cancer Study reveals significant haplotype effects and haplotype‐smoking interactions in the development of breast cancer.
Abstract: The associations between haplotypes and disease phenotypes offer valuable clues about the genetic determinants of complex diseases. It is highly challenging to make statistical inferences about these associations because of the unknown gametic phase in genotype data. We describe a general likelihood-based approach to inferring haplotype-disease associations in studies of unrelated individuals. We consider all possible phenotypes (including disease indicator, quantitative trait, and potentially censored age at onset of disease) and all commonly used study designs (including cross-sectional, case-control, cohort, nested case-control, and case-cohort). The effects of haplotypes on phenotype are characterized by appropriate regression models, which allow various genetic mechanisms and gene-environment interactions. We present the likelihood functions for all study designs and disease phenotypes under Hardy-Weinberg disequilibrium. The corresponding maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. We provide simple and efficient numerical algorithms to calculate the maximum likelihood estimators and their variances, and implement these algorithms in a freely available computer program. Extensive simulation studies demonstrate that the proposed methods perform well in realistic situations. An application to the Carolina Breast Cancer Study reveals significant haplotype effects and haplotype-smoking interactions in the development of breast cancer.