TL;DR: ECgene (Gene modeling by EST Clustering) is a novel gene-modeling method that combines genome-based EST clustering and the transcript assembly procedure in a coherent and consistent fashion and takes alternative splicing events into consideration.
Abstract: With the availability of the human genome map and fast algorithms for sequence alignment, genome-based EST clustering has become a viable method for gene modeling. We developed a novel gene-modeling method, ECgene (Gene modeling by EST Clustering), which combines genome-based EST clustering and the transcript assembly procedure in a coherent and consistent fashion. Specifically, ECgene takes alternative splicing events into consideration. The positions of splice sites (i.e., exon-intron boundaries) in the genome map are used as the critical information throughout the procedure. Sequences that share any splice site are grouped together to define an EST cluster, in a manner similar to the genome-based version of the UniGene algorithm. Transcript assembly is achieved with a graph-theoretic approach that represents the exon connectivity in each cluster as a directed acyclic graph (DAG). Distinct paths through the exons correspond to possible gene models encompassing all alternative splicing events. EST sequences in each cluster are further subclustered according to their compatibility with the gene structure of each splice variant, and these subclusters can be regarded as clone evidence for the corresponding isoform. The reliability of each isoform is assessed from the nature of the cluster members and from the minimum number of clones required to reconstruct all exons in the transcript.
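The clustering rule above, grouping sequences that share any splice site, amounts to finding transitive groups of ESTs, and a union-find (disjoint-set) structure is a natural way to compute them. The Python sketch illustrates only that rule; the `Alignment` record, its coordinate convention, and the choice to leave unspliced ESTs as singletons are illustrative assumptions, not ECgene's documented data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one EST aligned to the genome."""
    est_id: str
    chrom: str
    strand: str
    exons: tuple  # ((start, end), ...) in ascending genomic order


def splice_sites(aln):
    """Exon-intron boundaries implied by consecutive exon blocks."""
    sites = set()
    for (_, left_end), (right_start, _) in zip(aln.exons, aln.exons[1:]):
        sites.add((aln.chrom, aln.strand, left_end, "exon_end"))
        sites.add((aln.chrom, aln.strand, right_start, "exon_start"))
    return sites


class UnionFind:
    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_by_splice_sites(alignments):
    """Group ESTs transitively whenever they share at least one splice site."""
    uf = UnionFind(aln.est_id for aln in alignments)
    first_seen = {}  # splice site -> first EST observed with it
    for aln in alignments:
        for site in splice_sites(aln):
            if site in first_seen:
                uf.union(first_seen[site], aln.est_id)
            else:
                first_seen[site] = aln.est_id
    clusters = defaultdict(list)
    for aln in alignments:
        clusters[uf.find(aln.est_id)].append(aln.est_id)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical alignments, for illustration only.
ests = [
    Alignment("est1", "chr1", "+", ((100, 200), (300, 400))),
    Alignment("est2", "chr1", "+", ((150, 200), (500, 600))),  # shares exon end 200
    Alignment("est3", "chr1", "+", ((700, 800), (900, 950))),
    Alignment("est4", "chr1", "+", ((1000, 1100),)),  # unspliced: no splice sites
]
print(cluster_by_splice_sites(ests))  # [['est1', 'est2'], ['est3'], ['est4']]
```

Because merging is transitive, two ESTs with no splice site in common still land in one cluster when a third EST shares a site with each of them; that is how "share any splice sites" is interpreted in this sketch.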
TL;DR: Findings and systems biology implications of biomarker candidates from a mouse model of human pancreatic ductal adenocarcinoma and a mouse model of human Her2/neu-induced breast cancer are summarized.
Abstract: Alternative splicing plays an important role in protein diversity without increasing genome size. Once thought to be uncommon, alternative splicing now appears to affect the majority of genes. Alternative splice variants have been detected at the mRNA level in many diseases. We have designed and demonstrated a discovery pipeline for alternative splice variant (ASV) proteins from tandem mass spectrometry (MS/MS) datasets. We created a modified, periodically updated ECgene database whose entries come from exhaustive three-frame translation of Ensembl transcripts and ECgene gene models. The human database has 14 million entries; the mouse database has 10 million. We match MS/MS findings against these potential translation products to identify and quantify known and novel ASVs. In this review, we summarize findings and systems biology implications of biomarker candidates from a mouse model of human pancreatic ductal adenocarcinoma [28] and a mouse model of human Her2/neu-induced breast cancer [27]. The same approach is being applied to human tumors, plasma, and cell line studies of other cancers.
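Three-frame translation itself is simple to sketch. The Python block assumes transcript sequences given on their own strand and the standard genetic code, and it splits each frame at stop codons; the `build_search_entries` and `match_peptide` helpers are hypothetical, and exact substring lookup merely stands in for the spectrum-level scoring a real MS/MS search engine performs.

```python
# Standard genetic code (NCBI translation table 1); codon index = 16*i + 4*j + k
# where i, j, k are the positions of the three bases in "TCAG".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(seq):
    """Translate DNA codon by codon; '*' is a stop, 'X' an unreadable codon."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def three_frame_translation(transcript):
    """Translate the transcript's own strand in reading frames 0, 1 and 2."""
    return [translate(transcript[frame:]) for frame in range(3)]


def build_search_entries(transcripts):
    """Split every frame at stop codons into candidate protein fragments."""
    entries = []
    for tx_id, seq in transcripts.items():
        for frame, protein in enumerate(three_frame_translation(seq)):
            for fragment in protein.split("*"):
                if fragment:
                    entries.append((tx_id, frame, fragment))
    return entries


def match_peptide(peptide, entries):
    """Return the (transcript, frame) pairs whose fragments contain the peptide."""
    return sorted({(tx_id, frame) for tx_id, frame, fragment in entries if peptide in fragment})


# Hypothetical transcript sequences, for illustration only.
entries = build_search_entries({"tx1": "ATGGCCTAA", "tx2": "CATGGCC"})
print(three_frame_translation("ATGGCCTAA"))  # ['MA*', 'WP', 'GL']
print(match_peptide("MA", entries))          # [('tx1', 0), ('tx2', 1)]
```

The same peptide can be explained by different transcripts in different frames, which is why a database built this way grows to millions of entries and why a match alone does not pin down a single isoform.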
TL;DR: ECgene's AS modeling and EST clustering are expanded to nine organisms for which sufficient EST data are available in GenBank, and several new applications to analyze differential expression are introduced.
Abstract: ECgene (http://genome.ewha.ac.kr/ECgene) was developed to provide functional annotation for alternatively spliced genes. The applications encompass genome-based transcript modeling for alternative splicing (AS), domain analysis with Gene Ontology (GO) annotation, and expression analysis based on EST and SAGE data. We have expanded ECgene's AS modeling and EST clustering to nine organisms for which sufficient EST data are available in GenBank. For the human genome, we have also introduced several new applications to analyze differential expression. ECprofiler is an ontology-based candidate gene search system that allows users to select an arbitrary combination of gene expression patterns and GO functional categories. DEGEST is a database of differentially expressed genes and isoforms based on EST information. Importantly, gene expression is analyzed at three distinct levels: gene, isoform and exon. The user interfaces for functional and expression analyses have been substantially improved. ASviewer is a dedicated Java application that visualizes the transcript structure and functional features of alternatively spliced variants. The SAGE part of the expression module provides many additional features, including SNPs, differential expression and alternative tag positions.
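The abstract does not state which statistic DEGEST applies to EST counts. As a hedged illustration, the sketch substitutes Fisher's exact test, a common choice for 2x2 count tables, and then mimics an ECprofiler-style query by intersecting the enriched units with one functional category. The function names, counts and annotations are hypothetical; the same test could be applied to counts at the gene, isoform or exon level.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, tail)


def enriched_units(counts, tissue_total, other_total, alpha=0.01):
    """Flag genes, isoforms or exons whose ESTs are over-represented in one tissue.

    counts maps unit id -> (ESTs from the tissue library, ESTs from all others).
    No multiple-testing correction is applied here.
    """
    hits = {}
    for unit, (k, m) in counts.items():
        p = fisher_exact_two_sided(k, tissue_total - k, m, other_total - m)
        # Keep only over-representation, not depletion, in the tissue.
        if p < alpha and k / tissue_total > m / other_total:
            hits[unit] = p
    return hits


def candidate_genes(hits, go_terms, wanted_term):
    """Intersect an expression pattern with a functional category."""
    return sorted(unit for unit in hits if wanted_term in go_terms.get(unit, ()))


# Hypothetical EST counts and annotations, for illustration only.
counts = {"geneA": (30, 10), "geneB": (2, 40), "geneC": (5, 100)}
go_terms = {"geneA": {"apoptotic process"}, "geneC": {"apoptotic process"}}
hits = enriched_units(counts, tissue_total=1000, other_total=10000)
print(candidate_genes(hits, go_terms, "apoptotic process"))  # ['geneA']
```

Here geneC carries the requested annotation but is not enriched in the tissue, so only geneA survives both filters; with many units tested at once, a real analysis would also need a multiple-testing correction.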
TL;DR: ASmodeler is a novel web-based utility that finds gene models, including alternative splicing events, from genomic alignments of mRNA, EST and protein sequences; it essentially combines the genome-based sequence clustering and transcript assembly procedures in a coherent fashion.
Abstract: Alternative splicing is an important mechanism for modulating gene function and expression, and it greatly expands transcriptome diversity. ASmodeler is a novel web-based utility that finds gene models, including alternative splicing events, from genomic alignments of mRNA, EST and protein sequences. User-supplied sequences are aligned against the genome map using the BLAT and SIM4 programs. The resulting exon connectivity is analyzed with graph-theoretic methods to build all possible gene models, including splice variants. The algorithm essentially combines the genome-based sequence clustering and transcript assembly procedures in a coherent fashion. In addition to the user-supplied sequences, UniGene clusters and many well-known gene predictions such as Genscan, Ensembl and Acembly may be included in gene modeling. The current implementation supports the human, mouse and rat genomes. ASmodeler is available at http://genome.ewha.ac.kr/ECgene/ASmodeler/.
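The graph-theoretic step can be sketched as path enumeration on a directed acyclic graph whose nodes are exons and whose edges are the introns observed in any alignment. This Python sketch assumes exact, non-overlapping exon coordinates listed in ascending order, and it treats every exon without an incoming edge as a start and every exon without an outgoing edge as an end; how ASmodeler actually handles partial ESTs and alternative splice-site boundaries is not described in the abstract.

```python
from collections import defaultdict


def build_exon_graph(alignments):
    """Collect exon nodes and intron edges from aligned sequences.

    Each alignment is a list of (start, end) exon blocks in genomic order,
    so every edge points rightward and the graph cannot contain a cycle.
    """
    nodes, edges = set(), defaultdict(set)
    for exons in alignments:
        nodes.update(exons)
        for left, right in zip(exons, exons[1:]):
            edges[left].add(right)
    return nodes, edges


def enumerate_gene_models(alignments):
    """Return every source-to-sink path through the exon DAG."""
    nodes, edges = build_exon_graph(alignments)
    has_incoming = {exon for targets in edges.values() for exon in targets}
    sources = sorted(exon for exon in nodes if exon not in has_incoming)
    models = []

    def walk(path):
        nexts = sorted(edges.get(path[-1], ()))
        if not nexts:  # no outgoing intron: this exon ends a model
            models.append(tuple(path))
            return
        for nxt in nexts:
            walk(path + [nxt])

    for src in sources:
        walk([src])
    return models


# Hypothetical aligned sequences, for illustration only.
transcripts = [
    [(100, 200), (300, 400), (500, 600)],  # includes the cassette exon
    [(100, 200), (500, 600)],              # skips it
    [(300, 400), (700, 800)],              # partial sequence with another 3' end
]
for model in enumerate_gene_models(transcripts):
    print(model)
```

The model ending at (700, 800) does not appear in any single input sequence; it arises from combining edges contributed by two different sequences. This is what "all possible gene models" means in practice, and it is also why the number of paths can grow exponentially with the number of alternative exons.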
TL;DR: A new method was developed that effectively distinguishes the true isoform among multiple isoforms of a gene; its scoring algorithm identified 94% of true isoforms.