TL;DR: This article presents a statistical framework for background adjustment of Affymetrix GeneChip arrays, based on simple hybridization theory from molecular biology and on experiments specifically designed to help develop it.
Abstract: High density oligonucleotide expression arrays are widely used in many areas of biomedical research. Affymetrix GeneChip arrays are the most popular. In the Affymetrix system, a fair amount of further pre-processing and data reduction occurs following the image processing step. Statistical procedures developed by academic groups have been successful at improving the default algorithms provided by the Affymetrix system. In this paper we present a solution to one of the pre-processing steps, background adjustment, based on a formal statistical framework. Our solution greatly improves the performance of the technology in various practical applications. Affymetrix GeneChip arrays use short oligonucleotides to probe for genes in an RNA sample. Typically each gene will be represented by 11-20 pairs of oligonucleotide probes. The first component of these pairs is referred to as a perfect match probe and is designed to hybridize only with transcripts from the intended gene (specific hybridization). However, hybridization by other sequences (non-specific hybridization) is unavoidable. Furthermore, hybridization strengths are measured by a scanner that introduces optical noise. Therefore, the observed intensities need to be adjusted to give accurate measurements of specific hybridization. One approach to adjusting is to pair each perfect match probe with a mismatch probe that is designed with the intention of measuring non-specific hybridization. The default adjustment, provided as part of the Affymetrix system, is based on the difference between perfect match and mismatch probe intensities. We have found that this approach can be improved via the use of estimators derived from a statistical model that use probe sequence information.
The model is based on simple hybridization theory from molecular biology and experiments specifically designed to help develop it. A final step in the pre-processing of these arrays is to combine the 11-20 probe pair intensities, after background adjustment and normalization, for a given gene to define a measure of expression that represents the amount of the corresponding mRNA species. In this paper we illustrate the practical consequences of not adjusting appropriately for the presence of non-specific hybridization and provide a solution based on our background adjustment procedure. Software that computes our adjustment is available as part of the Bioconductor project (http://www.bioconductor.
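The default adjustment described in this abstract, the difference between perfect match (PM) and mismatch (MM) intensities, can be sketched in a few lines. This is only an illustration of plain PM minus MM subtraction, not the model-based estimator the paper proposes; the function name `pm_minus_mm` and the intensity values are made up for the example.

```python
import numpy as np

def pm_minus_mm(pm, mm):
    """Naive background adjustment: subtract each MM intensity from its paired PM."""
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    if pm.shape != mm.shape:
        raise ValueError("PM and MM arrays must have the same shape")
    return pm - mm

# Hypothetical intensities for one probeset of 11 probe pairs (not real data).
pm = [520, 310, 1200, 95, 430, 610, 150, 880, 270, 340, 505]
mm = [200, 330, 400, 120, 180, 250, 140, 300, 290, 150, 210]

adjusted = pm_minus_mm(pm, mm)
# Count probe pairs where MM exceeds PM, giving a negative adjusted value.
n_negative = int((adjusted < 0).sum())
print(adjusted)
print("negative adjusted values:", n_negative)
```

In this toy data some MM intensities exceed their PM partner, so the subtraction yields negative values, which cannot be log-transformed directly. That is one practical drawback of adjusting by simple subtraction.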
TL;DR: This chapter demonstrates Bioconductor tools useful for creating lists of genes that are differentially expressed in two populations and starts from the raw probe level data (CEL files) and concludes with the creation of annotated reports.
Abstract: The predominant use for microarrays is the measurement of genome-wide expression levels, and the most commonly used microarray platform is the Affymetrix GeneChip. Affymetrix GeneChip arrays use short oligonucleotides to probe for genes in an RNA sample. Genes are represented by a set of oligonucleotide probes, each with a length of 25 bases. Because of their short length, multiple probes are used to improve specificity. Affymetrix arrays typically use between 11 and 20 probe pairs, referred to as a probeset, for each gene. One component of these pairs is referred to as a perfect match probe (PM) and is designed to hybridize only with transcripts from the intended gene (specific hybridization). However, hybridization to the PM probes by other mRNA species (non-specific hybridization) is unavoidable. Therefore, the observed intensities need to be adjusted to be accurately quantified. The other component of a probe pair, the mismatch probe (MM), is constructed with the intention of measuring only the non-specific component of the corresponding PM probe. Affymetrix's strategy is to make MM probes identical to their PM counterpart except that the 13th base is exchanged with its complement. The identification of genes that are differentially expressed in two populations is a popular application of Affymetrix GeneChip technology. Due to the cost of this technology, experiments using a small number of arrays are common. A situation we often see is the case where three arrays are used for each population. In this lab, we give an example of how to quickly create lists of genes that are interesting in the sense that they appear to be differentially expressed, starting from the raw probe level data (CEL files). In Section 2, we briefly describe the functions necessary to import the data into Bioconductor. In Section 3 we talk about preprocessing. In Section 4, we describe ways to rank genes and decide on a cutoff. Finally, in Section 5 we describe how to make annotated reports and examine the PubMed literature related to the genes in our list.
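The MM construction rule stated in this abstract (a 25-base probe identical to the PM probe except that the 13th base is replaced by its complement) can be illustrated as follows. This is a minimal sketch: the function name `mismatch_probe` and the example sequence are hypothetical and are not taken from any Bioconductor package.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(pm_seq, position=13, probe_length=25):
    """Return the MM probe: the PM sequence with the base at the
    1-based `position` exchanged for its complement."""
    pm_seq = pm_seq.upper()
    if len(pm_seq) != probe_length:
        raise ValueError(f"expected a {probe_length}-base probe")
    i = position - 1  # convert to 0-based index
    return pm_seq[:i] + COMPLEMENT[pm_seq[i]] + pm_seq[i + 1:]

# Hypothetical 25-base PM probe (not a real Affymetrix sequence).
pm = "ACGTACGTACGTACGTACGTACGTA"
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

The two sequences differ only at the middle base, which is the position where a single mismatch is flanked by 12 matching bases on each side.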
TL;DR: In this method, the binding free energy of each fragment with its perfect complementary sequence is calculated; if any binding free energy is above a predetermined, fixed threshold, the fragment is extended one nucleotide at a time until the binding free energy is below the threshold or the fragment has the same length as the probe.
Abstract: A computer-implemented method as follows. Providing a list of target sequences associated with one or more organisms in a list of organisms. Providing a list of candidate prototype sequences suspected of hybridizing to one or more of the target sequences. Generating a collection of probes corresponding to each candidate prototype sequence, each collection of probes having a set of probes for every subsequence having a predetermined, fixed subsequence length of the corresponding candidate prototype sequence. The sets consist of the corresponding subsequence and every variation of the corresponding subsequence formed by varying a center nucleotide of the corresponding subsequence. Generating a set of fragments corresponding to each target sequence, each set of fragments having every fragment having a predetermined, fixed fragment length of the corresponding target sequence. Calculating the binding free energy of each fragment with a perfect complementary sequence of the fragment. If any binding free energy is above a predetermined, fixed threshold, the fragment is extended one nucleotide at a time until the binding free energy is below the threshold or the fragment is the same length as the probe, generating a set of extended fragments. Determining which extended fragments are perfect matches to any of the probes. Assembling a base call sequence corresponding to each candidate prototype sequence. The base call sequence has a base call corresponding to the center nucleotide of each probe of the corresponding prototype sequence that is a perfect match to any extended fragment, but for which the other members of the set of probes containing the perfect match probe are not perfect matches to any extended fragment and a non-base call in all other circumstances.
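The probe-generation step described in this abstract (one set of probes per fixed-length subsequence, consisting of the subsequence plus every variant obtained by changing its center nucleotide) can be sketched as below. This covers only that step; the free-energy calculation and fragment extension are not shown because the abstract does not specify the energy model. The function name `probe_sets`, the odd-length requirement, and the toy sequence are assumptions made for this illustration.

```python
BASES = "ACGT"

def probe_sets(prototype, length):
    """For every window of `length` bases in `prototype`, return a list
    holding the window itself followed by its three center-base variants."""
    if length % 2 == 0:
        # Assumption: an odd length is required so the center is unambiguous.
        raise ValueError("length must be odd")
    center = length // 2
    sets = []
    for start in range(len(prototype) - length + 1):
        sub = prototype[start:start + length]
        variants = [
            sub[:center] + base + sub[center + 1:]
            for base in BASES
            if base != sub[center]
        ]
        sets.append([sub] + variants)
    return sets

# Toy prototype sequence and a short probe length, for readability only.
sets = probe_sets("ACGTA", 3)
for probe_set in sets:
    print(probe_set)
```

Each set has four members, one per possible center base, which is what lets a base call be made when exactly one member of a set matches an extended fragment.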
TL;DR: In this article, a stem-loop probe for single nucleotide polymorphism (SNP) genotyping of individual SNP nucleic acid target sequences comprises first (1), second (2), and third (3) single stranded nucleic acids portions.
Abstract: A stem-loop probe for single nucleotide polymorphism (SNP) genotyping of individual SNP nucleic acid target sequences comprises first (1), second (2), and third (3) single stranded nucleic acid portions. The second single stranded nucleic acid portion (2) is located between the first (1) and the third (3) single stranded nucleic acid portions. The first (1) and the third (3) single stranded nucleic acid portions build a double stranded, intramolecular stem (10). The second single stranded nucleic acid portion (2) forms a single stranded oligonucleotide loop (20) with a nucleotide sequence that is complementary to individual SNP nucleic acid target sequences. The nucleotide sequence of the stem-loop probe is chosen such that perfect match probe/target hybrids have a melting point Tm that is at least 5 °C higher than the Tm of mismatched probe/target hybrids. The first (1) and the third (3) single stranded nucleic acid portions of the stem-loop probe comprise a 3' or 5' end configured as an A, T, or C nucleotide, to which A, T, or C nucleotide a non-quenched fluorophore is conjugated. In a method of detecting single nucleotide polymorphism (SNP) in nucleic acid containing samples, a pair of such stem-loop probes for SNP genotyping of two individual SNP nucleic acid target sequences of a sample is utilized. The stem-loop probes (100) comprise the same first (1), second (2), and third (3) single stranded nucleic acid portions and a ratio of perfect match probe/target hybrids to mismatched probe/target hybrids is detected at a certain temperature.