TL;DR: The human erythropoietin gene has been isolated from a genomic phage library by using mixed 20-mer and 17-mer oligonucleotide probes and encodes a 27-amino acid signal peptide and a 166-amino acid mature protein with a calculated Mr of 18,399.
Abstract: The human erythropoietin gene has been isolated from a genomic phage library by using mixed 20-mer and 17-mer oligonucleotide probes. The entire coding region of the gene is contained in a 5.4-kilobase HindIII-BamHI fragment. The gene contains four intervening sequences (1562 base pairs) and five exons (582 base pairs). It encodes a 27-amino acid signal peptide and a 166-amino acid mature protein with a calculated Mr of 18,399. The erythropoietin gene, when introduced into Chinese hamster ovary cells, produces erythropoietin that is biologically active in vitro and in vivo.
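The lengths quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the 582 bp exon figure counts only protein-coding sequence plus a single stop codon; the abstract does not state this, so treat it as an illustrative consistency check rather than a description of the authors' analysis. All names in the code are hypothetical.

```python
# Figures quoted in the erythropoietin abstract above.
SIGNAL_PEPTIDE_AA = 27
MATURE_PROTEIN_AA = 166
EXON_BP = 582
INTRON_BP = 1562
FRAGMENT_BP = 5400  # the 5.4-kilobase HindIII-BamHI fragment

BP_PER_CODON = 3


def coding_length_bp(residues: int, stop_codons: int = 1) -> int:
    """Base pairs needed to encode `residues` amino acids plus stop codons.

    Assumption (not from the abstract): exactly one stop codon is counted.
    """
    return (residues + stop_codons) * BP_PER_CODON


# Signal peptide plus mature protein gives the full precursor length.
precursor_aa = SIGNAL_PEPTIDE_AA + MATURE_PROTEIN_AA
# Coding length implied by the precursor, to compare with the quoted 582 bp.
expected_exon_bp = coding_length_bp(precursor_aa)
# Exons plus introns, to compare with the size of the cloned fragment.
exon_plus_intron_bp = EXON_BP + INTRON_BP

print(precursor_aa, expected_exon_bp, exon_plus_intron_bp)
```

Under that assumption, the 193-residue precursor accounts for exactly the 582 bp of exon sequence, and the 2144 bp of exons plus introns fits comfortably within the 5.4-kilobase fragment.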
TL;DR: A new member of the tyrosine kinase proto-oncogene family has been identified on the basis of its amplification in a human mammary carcinoma.
Abstract: The cellular gene encoding the receptor for epidermal growth factor (EGF) has considerable homology to the oncogene of avian erythroblastosis virus. In a human mammary carcinoma, a DNA sequence was identified that is related to v-erbB but amplified in a manner that appeared to distinguish it from the gene for the EGF receptor. Molecular cloning of this DNA segment and nucleotide sequence analysis revealed the presence of two putative exons in a DNA segment whose predicted amino acid sequence was closely related to, but different from, the corresponding sequence of the erbB/EGF receptor. Moreover, this DNA segment identified a 5-kilobase transcript distinct from the transcripts of the EGF receptor gene. Thus, a new member of the tyrosine kinase proto-oncogene family has been identified on the basis of its amplification in a human mammary carcinoma.
TL;DR: The LDL receptor appears to be a mosaic protein built up of exons shared with different proteins, and it therefore belongs to several supergene families.
Abstract: The multifunctional nature of coated pit receptors predicts that these proteins will contain multiple domains. To establish the genetic basis for these domains, the gene for the low-density lipoprotein (LDL) receptor was analyzed. This gene is more than 45 kilobases in length and contains 18 exons, most of which correlate with functional domains previously defined at the protein level. Thirteen of the 18 exons encode protein sequences that are homologous to sequences in other proteins: five of these exons encode a sequence similar to one in the C9 component of complement; three exons encode a sequence similar to a repeat sequence in the precursor for epidermal growth factor (EGF) and in three proteins of the blood clotting system (factor IX, factor X, and protein C); and five other exons encode nonrepeated sequences that are shared only with the EGF precursor. The LDL receptor appears to be a mosaic protein built up of exons shared with different proteins, and it therefore belongs to several supergene families.
TL;DR: Two different human genomic DNA libraries were screened for the gene for blood coagulation factor IX by employing a cDNA for the human protein as a hybridization probe, and five overlapping lambda phages were identified that contained the gene for factor IX.
Abstract: Two different human genomic DNA libraries were screened for the gene for blood coagulation factor IX by employing a cDNA for the human protein as a hybridization probe. Five overlapping lambda phages were identified that contained the gene for factor IX. The complete DNA sequence of about 38 kilobases for the gene and the adjacent 5' and 3' flanking regions was established by the dideoxy chain termination and chemical degradation methods. The gene contained about 33.5 kilobases of DNA, including seven introns and eight exons within the coding and 3' noncoding regions of the gene. The eight exons code for a prepro leader sequence and 415 amino acids that make up the mature protein circulating in plasma. The intervening sequences range in size from 188 to 9473 nucleotides and contain four Alu repetitive sequences, including one in intron A and three in intron F. A fifth Alu repetitive sequence was found immediately flanking the 3' end of the gene. A 50 base pair insert in intron A was found in a clone from one of the genomic libraries but was absent in clones from the other library. Intron A as well as the 3' noncoding region of the gene also contained alternating purine-pyrimidine sequences that provide potential left-handed helical DNA or Z-DNA structures for the gene. KpnI repetitive sequences were identified in intron D and the region flanking the 5' end of the gene. The 5' flanking region also contained a 1.9-kb HindIII subfamily repeat. The seven introns in the gene for factor IX were located in essentially the same position as the seven introns in the gene for human protein C, while the first three were found in positions identical with those in the gene for human prothrombin.
TL;DR: Transcription Termination in Eukaryotes Is Heterogeneous and Depends on Specific DNA Sequences Downstream of the 3' Processing Site(s). In all genes of eukaryotes studied so far, with the possible exception of yeast, RNA polymerase II transcribes across the polyadenylation site(s) and terminates, sometimes very far, downstream of the DNA sequences coding for the 3' mRNA termini.
Abstract: Max L. Birnstiel, Meinrad Busslinger, and Katharina Strub, Institut für Molekularbiologie II der Universität ... Legon et al., 1979). While transcription termination is dispensable, there is some evidence that RNA processing is essential. For example, mutations in the polyadenylation signal in the α2- and β-globin genes are responsible for thalassemias in man (Higgs et al., 1983; Orkin et al., 1985). The downstream mRNA segment that is cleaved from the eukaryotic mRNA precursor in the 3' processing reaction is uncapped, extremely unstable in vivo, and is usually available for analysis only in pulse-labeled nuclear RNA or as nascent RNA chains during run-on experiments. A recent estimate suggests a half life of considerably less than 10 minutes for the unused RNA sequences of the mouse β-globin gene (Citron et al., 1984). Because of the great metabolic instability of the downstream transcripts, 3' cleavage of pre-mRNA in eukaryotes is, in many ways, the phenotypic equivalent of transcription termination. The transcription termination and/or processing events responsible for generating the 3' terminus are not important merely for producing a defined end to the molecule. In several cases, alternative 3' ends can be generated, and the actual 3' end achieved in a particular molecule may in turn control processing events in preceding regions of the molecule; for instance, the pattern of splicing may be changed, leading to the synthesis of alternative proteins. Examples are presented by complex gene units such as the heavy immunoglobulin gene (Mather et al., 1984), the adeno late transcription unit (for review, see Darnell, 1982; Nevins, 1983) and the calcitonin gene (for review, see Rosenfeld et al., 1984). Recently, it has been discovered that cell cycle regulation of histone gene expression depends largely on sequences near the 3' terminus (Lüscher et al., 1985).
Thus, the events at the 3' end of mRNA provide a focal point for the study of gene regulation. Transcription Termination in Eukaryotes Is Heterogeneous and Depends on Specific DNA Sequences Downstream of the 3' Processing Site(s). In all genes of eukaryotes studied so far, with the possible exception of yeast, RNA polymerase II transcribes across the polyadenylation site(s) and terminates, sometimes very far, downstream of the DNA sequences coding for the 3' mRNA termini. This was first recognized through analysis of pulse-labeled RNAs of the adeno major late transcription unit (Nevins and Darnell, 1978; reviewed by Darnell, 1982; Nevins, 1983) and has, more recently, been fully supported by in vitro nuclear run-on studies on a variety of chromosomal genes (Hofer et al., 1981; Sheffery et al., 1984; Weintraub et al., 1981; Mather et al., 1984; Frayne et al., 1984; Amara et al., 1984). In these latter experiments, transcripts of the spacer downstream of the polyadenylation site(s) were shown to be present in near equimolar amounts compared to those of the mRNA coding sequences. So far, the sequences in which transcription actually ceases are ill-defined. The observation that there is a gradual reduction in hybridization of transcripts to DNA restriction fragments distal to the genes (Hofer and Darnell, 1981; Citron et al., 1984; Hagenbüchle et al., 1984) and the results of S1 mapping experiments (Hagenbüchle et al., 1984) have suggested that transcription termination occurs at multiple sites in a stretch of DNA extending over hundreds, or even thousands, of nucleotides. The most rapid falloff in density of transcribing polymerases was found in the chicken ovalbumin gene, where ~90% of transcription terminates in a discrete region 900 bp downstream of the last exon (LeMeur et al., 1984). This region extends over 170 bp and includes AT-rich sequences that resemble some terminal mRNA sequences of yeast (see below).
At face value, the combined data mean that transcription termination in eukaryotes depends on weak, serially-repeated termination sequences. The sequences directing transcription termination are presently being investigated by sequence manipulation experiments. The first such study, carried out on a H2A histone gene, involved the classic strategy of identifying
TL;DR: Southern blot analysis showed close similarity of the restriction patterns of the rat c-erbB-2 gene and the rat neu oncogene, suggesting possible involvement of c-erbB-2 in human cancer.
Abstract: From a human genomic library, we obtained six v-erbB-related DNA clones. A DNA probe prepared from one of the clones, lambda 107, hybridized to EcoRI fragments of 6.4 and 13 kilobase pairs of human DNA. Neither of these fragments was amplified in A431 vulva carcinoma cells, in which the gene encoding the epidermal growth factor receptor is amplified. In addition, the probe from lambda 107 hybridized with a single, 4.8-kilobase poly(A)+ RNA species and did not react with EGF receptor mRNA. Thus, we conclude that clone lambda 107 represents a v-erbB-related gene (c-erbB-2) that is distinct from the EGF receptor gene. In contrast, the other five clones were shown to represent the EGF receptor gene (c-erbB-1). Partial nucleotide sequence analysis of the lambda 107 insert showed that this clone contained at least seven putative exons and that six of them could encode the kinase domain characteristic of protein products of the src oncogene family. Southern blot analysis showed close similarity of the restriction patterns of the rat c-erbB-2 gene and the rat neu oncogene, suggesting possible involvement of c-erbB-2 in human cancer. In fact, approximately 30-fold amplification of c-erbB-2 was observed in a human adenocarcinoma of the salivary gland.
TL;DR: The results indicate that what appears to be a gene duplication event giving rise to these two distinct regions must have arisen a long time ago in the evolution of this gene locus.
Abstract: The organization and sequences of the human beta-chain T-cell receptor diversity, joining, and constant region segments are described. The beta chain of the human T-cell receptor, analogous to the mouse counterpart, consists of two distinct constant region genes approximately 10 kilobases apart. The two constant region genes, C beta 1 and C beta 2, are very similar not only in sequence but also in genomic organization. The coding sequences of each of these C beta constant region genes are divided into four exons. The first two exons encode most of the extracellular constant domain. The third exon encodes a major part of the presumed transmembrane portion, and the last exon contains the cytoplasmic coding sequence as well as 3' untranslated sequences. Except for a stretch of approximately 95 highly conserved nucleotides extending 3' of the first exon of the C region genes, little homology can be found between the intron sequences of C beta 1 and C beta 2. A small cluster of joining region (J beta) gene segments is located approximately 5 kilobases upstream of each of these two constant regions. The first cluster, J beta 1, contains six functional J gene segments while the second, J beta 2, contains seven functional J gene segments. In addition, diversity region (D beta) gene segments are located approximately 600 base pairs upstream of each J beta. Recombinational signals containing highly conserved heptamer and nonamer sequences separated by 12 or 23 bases are found adjacent to all of these D beta and J beta gene segments. These signal sequences are thought to be involved in the somatic recombination processes. These results indicate that what appears to be a gene duplication event giving rise to these two distinct regions must have arisen a long time ago in the evolution of this gene locus.
TL;DR: It is demonstrated that small exons with characteristic split codon structure are differentially spliced in intricate combinatorial patterns to generate a minimum of 10, and potentially 64, distinct troponin T mRNAs, encoding different isoforms, in a developmentally regulated and tissue-specific manner.
TL;DR: Cosmid clones containing the gene for human adenosine deaminase (ADA) were isolated and it was shown in a functional assay that a stretch of 135 bp immediately preceding the cap site has promoter activity.
Abstract: Cosmid clones containing the gene for human adenosine deaminase (ADA) were isolated. The gene is 32 kb long and split into 12 exons. The exact sizes and boundaries of the exon blocks including the transcription start sites were determined. The sequence upstream from this cap site lacks the TATA and CAAT boxes characteristic for eukaryotic promoters. Nevertheless, we have shown in a functional assay that a stretch of 135 bp immediately preceding the cap site has promoter activity. This 135-bp DNA fragment is extremely rich in G/C residues (82%). It contains three inverted repeats that allow the formation of cruciform structures, a 10-bp and a 16-bp direct repeat and five G/C-rich motifs (GGGCGGG) disposed in a strikingly symmetrical fashion. Some of these structural features were also found in the promoter region of other genes and we discuss their possible function. Knowledge of the exact positions of the intron-exon boundaries allowed us to propose models for abnormal RNA processing that occurs in previously investigated ADA-deficient cell lines.
TL;DR: Both in extracts and in cells, an exon became optional when sequestered in a hairpin loop, and the same types of alternatively spliced RNAs were formed when a similar template was introduced into HeLa cells by transfection.
TL;DR: When inserted into SV40‐based expression vectors the human p53 cDNA successfully directs the production of a polypeptide with an apparent mol. wt. of 55 kd which can be precipitated by monoclonal antibodies to p53.
Abstract: A 2.5-kb cDNA clone for human p53 tumor antigen has been isolated. This clone contains the entire coding region including 135 bp upstream of the first ATG. Comparison of the nucleotide sequence of human p53 and mouse p53 demonstrates that the first ATG in human p53 corresponds to the second ATG (codon No. 4) in mouse p53. The human p53 comprises 393 residues and is longer than the mouse p53 due to six additional codons present at the region corresponding to exon 4 of the mouse p53 gene. The DNA sequence homology between the coding regions of mouse and human p53 is 81% and the conservation of homology is not equally distributed along the molecule. When inserted into SV40-based expression vectors the human p53 cDNA successfully directs the production of a polypeptide with an apparent mol. wt. of 55 kd which can be precipitated by monoclonal antibodies to p53.
TL;DR: A cDNA clone that expresses granulocyte‐macrophage colony stimulating factor (GM‐CSF) activity in COS‐7 cells has been isolated from a pcD library prepared from mRNA derived from concanavalin A‐activated mouse helper T cell clones, and the mouse and human GM‐CSF genes were isolated based on homology with the corresponding GM‐CSF cDNA sequences.
Abstract: A cDNA clone that expresses granulocyte-macrophage colony stimulating factor (GM-CSF) activity in COS-7 cells has been isolated from a pcD library prepared from mRNA derived from concanavalin A-activated mouse helper T cell clones. Based on homology with the mouse GM-CSF cDNA sequence, the mouse GM-CSF gene was isolated. The human GM-CSF gene was also isolated based on homology with the human GM-CSF cDNA sequence. The nucleotide sequences determined for the genes and their flanking regions revealed that both the mouse and human GM-CSF genes are composed of three introns and four exons. The organization of the mouse and human GM-CSF genes is highly homologous and strong sequence homology between the two genes is found both in the coding and non-coding regions. A 'TATA'-like sequence was found 20-25 bp upstream from the transcription initiation site. In the 5'-flanking region, there is a highly homologous region extending 330 bp upstream of the putative TATA box. This sequence may play a role in regulation of expression of the GM-CSF gene. These structures are compared with those of different lymphokine genes and their regulatory regions.
TL;DR: There is probably a single MBP gene in the mouse genome, and there is a single major 5' end for mouse MBP transcripts, 47 bp 5' of the initiator methionine codon.
TL;DR: To establish whether the Iμ transcripts have any translational potential and to elucidate the structure of their promoter region, sequence analysis of cloned Iμ complementary DNAs, primer extension and S1 nuclease mapping were used, showing that these transcripts have remarkable 5′ heterogeneity.
Abstract: Transcriptional competence of the immunoglobulin heavy-chain locus (IgH) is established at an early stage of lymphoid cell development, leading to the appearance of RNA components, previously called Cμ RNA1 or sterile-μ RNA2, which contain constant-region sequences but lack variable-region sequences. These components are of two types: those which initiate in the D region of alleles that have undergone DJH (diversity–joining region) rearrangement (Dμ transcripts) and those which initiate within the JH–Cμ intron (hereafter termed Iμ transcripts)3,4. In pre-B and early B cells, Dμ and Iμ transcripts are nearly as abundant as the messenger RNA encoding μ heavy chain2,3,5. The Dμ transcripts are spliced into RNAs containing D, JH and Cμ sequences, and in some, but not all, cases these RNAs are translated into Dμ proteins4. To establish whether the Iμ transcripts have any translational potential and to elucidate the structure of their promoter region, we have determined their transcription initiation sites and their mode of splicing. As reported here, by using sequence analysis of cloned Iμ complementary DNAs, primer extension and S1 nuclease mapping, we have found that these transcripts have remarkable 5′ heterogeneity: there are more than five distinct start sites spanning a region of 44 nucleotides that is located downstream of an octanucleotide found in all variable-region promoters. Such imprecise initiation may result from the lack of a well-defined TATAA motif and the unusual proximity of the octanucleotide to the enhancer region. Approximately 700 nucleotides downstream from these initiation sites, a cryptic splice site is used to create a nontranslatable exon (‘nontron’) which is joined to the Cμ1 domain. The properties of the nontron may be important for the mechanism of allelic exclusion.
TL;DR: Using modified adenovirus and beta-globin transcription units, pairs of transcripts were constructed that contained mutually complementary sequences in their introns and raised the possibility that some of the mRNAs in a cell acquire exons from more than one primary transcript.
TL;DR: The gene encoding the human interleukin-2 (IL-2) receptor consists of 8 exons spanning more than 25 kilobases on chromosome 10, and alternative messenger RNA (mRNA) splicing may delete exon 4 sequences, resulting in a mRNA that does not encode a functional IL-2 receptor.
Abstract: The gene encoding the human interleukin-2 (IL-2) receptor consists of 8 exons spanning more than 25 kilobases on chromosome 10. Exons 2 and 4 were derived from a gene duplication event and unexpectedly also are homologous to the recognition domain of human complement factor B. Alternative messenger RNA (mRNA) splicing may delete exon 4 sequences, resulting in a mRNA that does not encode a functional IL-2 receptor. Leukemic T cells infected with HTLV-I and normal activated T cells express IL-2 receptors with identical deduced protein sequences. Receptor gene transcription is initiated at two principal sites in normal activated T cells. Adult T cell leukemia cells infected with HTLV-I show activity at both of these sites, but also at a third transcription initiation site.
TL;DR: The human T-cell receptor α-chain gene consists of a number of noncontiguous V and J gene segments and a C region gene.
Abstract: An essential property of the immune system is its ability to generate great diversity in antibody and T-cell immune responses. The genetic and molecular mechanisms responsible for the generation of antibody diversity have been investigated during the past several years. The gene for the variable (V) region, which determines antigen specificity, is assembled when one member of each of the dispersed clusters of V gene segments, diversity (D) elements (for heavy chains only) and joining (J) segments are fused by DNA rearrangement. The cloning of the beta-chain of the T-cell antigen receptor revealed that the organization of the beta-chain locus, which is similar to that of immunoglobulin genes, is also composed of noncontiguous segments of V, D, J and constant (C) region genes. The structure of the alpha-chain seems to consist of a V and a C domain connected by a J segment. We report here that the human T-cell receptor alpha-chain gene consists of a number of noncontiguous V and J gene segments and a C region gene. The V region gene segment is interrupted by a single intron, whereas the C region contains four exons. The J segments, situated 5' of the C region gene, are dispersed over a distance of at least 35 kilobases (kb). Signal sequences, which are presumably involved in DNA recombination, are found next to the V and J gene segments.
TL;DR: It is concluded that the levels of alpha1(III) and alpha2(I) collagen mRNA are often but not necessarily coordinately regulated by transformation in mouse cells.
TL;DR: Using cloned human prealbumin cDNA as a probe, Southern blot hybridization of human genomic DNA revealed that the prealbumin gene consists of a unique, single-copy DNA.
TL;DR: The exon/intron arrangement of the Type III gene was established by electron microscopic analysis in conjunction with sequencing of selected genomic regions, and some evolutionary features of this important family of proteins were confirmed.
TL;DR: A strong homology was found between this area of the IFN‐activated (2′‐5′) oligo A synthetase gene and the corresponding region of the human fibroblast IFN‐beta 1 gene, whose transcription is also stimulated by IFN priming.
Abstract: The (2'-5') oligo A synthetase E, one of the translational inhibitory enzymes whose synthesis is strongly induced by all interferons (IFNs), is shown to be encoded in human cells by a 13.5-kb gene. By a cell-specific differential splicing, between the seventh and an additional eighth exon of this gene, two active E mRNAs of 1.6 and 1.8 kb are produced, along with several longer transcripts. cDNA clones for the two mRNAs were obtained and their sequences indicate that the human (2'-5') oligo A synthetase gene codes for two forms of the enzyme of mol. wt. 41 000 and 46 000, which differ only by their C-terminal ends. The product of the 1.6-kb RNA (E16) has a very hydrophobic C terminus, which is replaced by a longer acidic C-terminal sequence in the 1.8-kb RNA product (E18). The transcriptional start site of the gene was identified and 200 bp of the 5' flanking region were sequenced. A strong homology was found between this region of the IFN-activated (2'-5') oligo A synthetase gene and the corresponding region of the human fibroblast IFN-beta 1 gene, whose transcription is also stimulated by IFN priming. The gene has two polyadenylation sites which share a common undecanucleotide, but are used in a cell-specific manner to give rise to the 1.6- and 1.8-kb mRNAs.
TL;DR: This work has shown that the amino acid sequences of the human low-density lipoprotein (LDL) receptor and the human precursor for epidermal growth factor (EGF) show 33 percent identity over a stretch of 400 residues, suggesting that the homologous region may have resulted from a duplication of an ancestral gene.
Abstract: The amino acid sequences of the human low-density lipoprotein (LDL) receptor and the human precursor for epidermal growth factor (EGF) show 33 percent identity over a stretch of 400 residues. This region of homology is encoded by eight contiguous exons in each respective gene. Of the nine introns that separate these exons, five are located in identical positions in the two protein sequences. This finding suggests that the homologous region may have resulted from a duplication of an ancestral gene and that the two genes evolved further by recruitment of exons from other genes, which provided the specific functional domains of the LDL receptor and the EGF precursor.
TL;DR: In higher eukaryotes at least three sequence elements participate in the initiation of the splicing reaction: the 5′ splice site, 3′ splice site consensus sequence and the RNA branchpoint.
Abstract: Pre-mRNA splicing has been shown to occur by a two-step pathway. In the first stage, the pre-mRNA is cleaved at the 5' splice site, generating the first exon RNA species and an RNA species composed of the intron and second exon (IVS1-exon 2 RNA species). In the second stage, cleavage at the 3' splice site and ligation of the exons occurs, resulting in the excision of the intact intron. The excised intron and IVS1-exon 2 RNA species are in the form of a lariat in which the 5' end of the intron is joined to an adenosine residue near the 3' end of the intron by a 2'-5' phosphodiester bond. Here we show that although cleavage at the 3' splice site does not occur until the second stage of the splicing reaction, at least a portion of the 3' splice site consensus sequence is necessary for 5' splice site cleavage and lariat formation. Thus, in higher eukaryotes at least three sequence elements participate in the initiation of the splicing reaction: the 5' splice site, 3' splice site consensus sequence and the RNA branchpoint.
TL;DR: Results suggest that selection of the 3' splice site accompanied by the association of a factor with the branch point may be the initial step in mammalian pre-mRNA splicing.
TL;DR: The TACTAAC sequence in the yeast CYH2m gene intron is altered to TACTACC, which changes the nucleotide at the normal position of the branch in intron RNA lariats produced during pre-mRNA splicing, and it prevents splicing in vivo.
TL;DR: The exact termini of the gene as well as its 5' flanking sequences (promoter region) are described and two short repetitive sequences closely associated with the pro-alpha 1(I) gene have been identified to be different members of the AluI family of repeats.
TL;DR: The close sequence homology between the two phenobarbital-inducible cytochrome P-450 genes is found to extend to the promoter region with one notable exception and this may somehow be related to the difference in the level of cytochrome P-450b and P-450e in the inductive phase of phenobarbital administration.
TL;DR: It is suggested that both the calcitonin and CGRP exons arose from a common primordial sequence, suggesting that duplication and rearrangement events are responsible for the generation of this complex transcription unit.
Abstract: Two mRNAs generated as a consequence of alternative RNA processing events in expression of the human calcitonin gene encode the protein precursors of either calcitonin or calcitonin gene-related peptide (CGRP). Both calcitonin and CGRP RNAs and their encoded peptide products are expressed in the human pituitary and in medullary thyroid tumors. On the basis of sequence comparison, it is suggested that both the calcitonin and CGRP exons arose from a common primordial sequence, suggesting that duplication and rearrangement events are responsible for the generation of this complex transcription unit.
TL;DR: A comparison of the organization of the GFAP gene with that of genes encoding other IF proteins reveals that the structure of IF genes is highly conserved in spite of considerable divergence at the amino acid level, while most of the evolutionary events leading to the placement of introns in IF genes must have occurred prior to the duplication and subsequent divergence of IF genes from a presumptive common ancestral sequence.
Abstract: We report the complete sequence of the gene encoding mouse glial fibrillary acidic protein (GFAP), the intermediate filament (IF) protein specific to astrocytes. The 9.8 kb gene includes nine exons separated by introns ranging in size from 0.2 to 2.5 kb. A comparison of the organization of the GFAP gene with that of genes encoding other IF proteins reveals that the structure of IF genes is highly conserved in spite of considerable divergence at the amino acid level. Thus, most of the evolutionary events leading to the placement of introns in IF genes must have occurred prior to the duplication and subsequent divergence of IF genes from a presumptive common ancestral sequence. The conserved gene organization is unrelated to structural features of IF proteins. A curious feature of the GFAP gene is the large number of repeated sequences found in the introns. Six tracts of reiterated di- or trinucleotides are present, plus tandem repeats of two different novel sequences. One repeat is unique to the GFAP gene; the other occurs elsewhere in the mouse genome, although at relatively low frequency.