About: Compiler-compiler is a research topic. Over its lifetime, 295 publications have been published within this topic, receiving 6326 citations. The topic is also known as: parser generator, compiler generator.
TL;DR: This book is the definitive guide to using the completely rebuilt ANTLR v3 and describes all features in detail, including the amazing new LL(*) parsing technology, tree construction facilities, the StringTemplate code generation template engine, and the sophisticated ANTLRWorks GUI development environment.
Abstract: ANTLR v3 is the most powerful, easy-to-use parser generator built to date, and represents the culmination of more than 15 years of research by Terence Parr. This book is the essential reference guide to using this completely rebuilt version of ANTLR, with its amazing new LL(*) parsing technology, tree construction facilities, StringTemplate code generation template engine, and sophisticated ANTLRWorks GUI development environment. Learn to use ANTLR directly from the author! ANTLR is a parser generator: a program that generates code to translate a specified input language into a nice, tidy data structure. You might think that parser generators are only used to build compilers. But in fact, programmers usually use parser generators to build translators and interpreters for domain-specific languages such as proprietary data formats, common network protocols, text processing languages, and domain-specific programming languages. Domain-specific languages are important to software development because they represent a more natural, high-fidelity, robust, and maintainable means of encoding a problem than simply writing software in a general-purpose language. For example, NASA uses domain-specific command languages for space missions to improve reliability, reduce risk, reduce cost, and increase the speed of development. Even the first Apollo guidance control computer from the 1960s used a domain-specific language that supported vector computations.
You'll learn all about ANTLR grammar syntax, resolving grammar ambiguities, parser fault tolerance and error reporting, embedding actions to interpret or translate languages, building intermediate-form trees, extracting information from trees, generating source code, and how to use the ANTLR Java API.
TL;DR: This paper describes a method for automatically generating an actual compiler from a formal description of a language; the method is, in some sense, the partial evaluation of a computation process.
Abstract: This paper reports on the relationship between a formal description of the semantics of a programming language (i.e., an interpreter) and an actual compiler. It also describes a method for automatically generating an actual compiler from a formal description; the method is, in some sense, the partial evaluation of a computation process. A compiler-compiler based on this method differs from conventional ones in that the semantics of a programming language are defined by describing an evaluation procedure (an interpreter), whereas a conventional compiler-compiler requires a description of the translation process.
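The idea of obtaining a compiler by partially evaluating an interpreter can be illustrated with a toy sketch. The example below is not the paper's algorithm: the expression language, the function names `interpret` and `specialize`, and the hand-written specializer are all assumptions made for illustration. It shows only the core observation that fixing the program argument of an interpreter, and doing the work that depends only on that program ahead of time, leaves a residual function that behaves like compiled code (commonly called the first Futamura projection).

```python
from typing import Callable, Dict, Tuple, Union

# Toy language (hypothetical): an int literal, a variable name,
# or a tuple (op, left, right) with op in {"+", "*"}.
Expr = Union[int, str, Tuple]
Env = Dict[str, int]


def interpret(expr: Expr, env: Env) -> int:
    """Interpreter: inspects the expression tree on every evaluation."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    a, b = interpret(left, env), interpret(right, env)
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator: {op!r}")


def specialize(expr: Expr) -> Callable[[Env], int]:
    """Hand-written specialization of `interpret` to one fixed program.

    All dispatch on the shape of `expr` happens here, once. The returned
    closure only performs the work that depends on the runtime `env`.
    """
    if isinstance(expr, int):
        return lambda env: expr
    if isinstance(expr, str):
        return lambda env: env[expr]
    op, left, right = expr
    f, g = specialize(left), specialize(right)
    if op == "+":
        return lambda env: f(env) + g(env)
    if op == "*":
        return lambda env: f(env) * g(env)
    raise ValueError(f"unknown operator: {op!r}")


program = ("+", ("*", "x", 3), 2)   # x * 3 + 2
compiled = specialize(program)      # "compile" once
result = compiled({"x": 4})         # run many times with different inputs
```

Here `specialize` plays the role of the generated compiler for this one language. A real partial evaluator would derive it mechanically from `interpret` rather than have it written by hand.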
TL;DR: ReRAGs are an object-oriented technique for rewriting abstract syntax trees in order to simplify compilation; they allow compilers to be written in a high-level declarative and modular fashion, supporting language extensibility as well as reuse of modules for different compiler-related tools.
Abstract: This paper presents an object-oriented technique for rewriting abstract syntax trees in order to simplify compilation. The technique, Rewritable Reference Attributed Grammars (ReRAGs), is completely declarative and supports both rewrites and computations by means of attributes. We have implemented ReRAGs in our aspect-oriented compiler-compiler tool JastAdd II. Our largest application is a complete static-semantic analyzer for Java 1.4. ReRAGs use three synergistic mechanisms for supporting separation of concerns: inheritance for model modularization, aspects for cross-cutting concerns, and rewrites that allow computations to be expressed on the most suitable model. This allows compilers to be written in a high-level declarative and modular fashion, supporting language extensibility as well as reuse of modules for different compiler-related tools. We present the ReRAG formalism, its evaluation algorithm, and examples of its use. Initial measurements using a subset of the Java class library as our benchmarks indicate that our generated compiler is only a few times slower than the standard compiler, javac, in J2SE 1.4.2 SDK. This shows that ReRAGs are already useful for large-scale practical applications, even though optimization has not been our primary concern so far.
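The combination of attributes and rewrites described above can be sketched roughly as follows. This is a loose Python illustration, not JastAdd syntax and not the paper's evaluation algorithm: the classes `Node`, `Block`, `Use`, `VarUse`, and `TypeUse`, the `lookup` attribute, and the rewrite-on-access strategy in `child` are all assumptions chosen for the example. A name parsed as a generic `Use` is replaced by a more specific node once an attribute reveals what it refers to.

```python
from typing import Dict, List, Optional


class Node:
    def __init__(self, *children: "Node") -> None:
        self.parent: Optional["Node"] = None
        self._children: List["Node"] = list(children)
        self._final: List[bool] = [False] * len(children)
        for c in children:
            c.parent = self

    def rewrite(self) -> "Node":
        """Return a replacement node, or self when no rewrite rule applies."""
        return self

    def child(self, i: int) -> "Node":
        """Access child i, applying rewrites until none applies (then cache)."""
        if not self._final[i]:
            node = self._children[i]
            while True:
                new = node.rewrite()
                if new is node:
                    break
                new.parent = self   # attach before the next rewrite attempt
                node = new
            self._children[i] = node
            self._final[i] = True
        return self._children[i]

    def lookup(self, name: str) -> Optional[str]:
        """Attribute computed from context: by default, ask the parent."""
        return self.parent.lookup(name) if self.parent else None


class Block(Node):
    def __init__(self, decls: Dict[str, str], *children: Node) -> None:
        super().__init__(*children)
        self.decls = decls          # name -> "var" or "type"

    def lookup(self, name: str) -> Optional[str]:
        return self.decls.get(name) or super().lookup(name)


class Use(Node):
    """A name whose kind is unknown at parse time."""
    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name

    def rewrite(self) -> Node:
        kind = self.lookup(self.name)   # the rewrite condition uses an attribute
        if kind == "var":
            return VarUse(self.name)
        if kind == "type":
            return TypeUse(self.name)
        return self


class VarUse(Use):
    def rewrite(self) -> Node:
        return self                 # already in its final form


class TypeUse(Use):
    def rewrite(self) -> Node:
        return self


block = Block({"x": "var", "T": "type"}, Use("x"), Use("T"), Use("y"))
```

Later analyses can then be written against `VarUse` and `TypeUse` directly, which is the sense in which a rewrite lets a computation be expressed on the most suitable model.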
TL;DR: The PQCC experience reveals that—contrary to common practice and belief—retargetability and a high level of optimization are not incompatible.
Abstract: The PQCC experience reveals that—contrary to common practice and belief—retargetability and a high level of optimization are not incompatible.
TL;DR: A natural language parse ranker of a natural language processing (NLP) system employs a goodness function to rank the possible grammatically valid parses of an utterance.
Abstract: A natural language parse ranker of a natural language processing (NLP) system employs a goodness function to rank the possible grammatically valid parses of an utterance. The goodness function generates a statistical goodness measure (SGM) for each valid parse. The parse ranker orders the parses based upon their SGM values. It presents the parse with the greatest SGM value as the one that most likely represents the intended meaning of the speaker. The goodness function of this parse ranker is highly accurate in representing the intended meaning of a speaker. It also has reasonable training data requirements. With this parse ranker, the SGM of a particular parse is the combination of the probabilities of all nodes within the parse tree of that parse. The probability at a given node is the probability of taking a transition (“grammar rule”) at that point. The probability at a node is conditioned on highly predictive linguistic phenomena, such as “phrase levels,” “null transitions,” and “syntactic history”.
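The ranking step can be sketched as below, under two stated assumptions: that the "combination" of node probabilities is a product (computed here as a sum of logarithms to avoid underflow), and that each node already carries its conditional transition probability. The abstract does not spell out either detail, and the names `Node`, `log_sgm`, and `rank_parses`, along with the probability values, are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    rule: str                       # the transition ("grammar rule") taken here
    prob: float                     # assumed: conditional probability of that transition
    children: List["Node"] = field(default_factory=list)


def log_sgm(node: Node) -> float:
    """Log of the product of the transition probabilities of all nodes in the tree."""
    return math.log(node.prob) + sum(log_sgm(c) for c in node.children)


def rank_parses(parses: List[Node]) -> List[Node]:
    """Order candidate parses from highest to lowest goodness measure."""
    return sorted(parses, key=log_sgm, reverse=True)


# Two made-up parses of the same utterance.
parse_a = Node("S -> NP VP", 0.5, [Node("NP -> N", 0.4), Node("VP -> V", 0.5)])
parse_b = Node("S -> NP VP", 0.5, [Node("NP -> Adj N", 0.1), Node("VP -> V", 0.5)])

ranked = rank_parses([parse_b, parse_a])
best = ranked[0]                    # the parse presented as the most likely meaning
```

In this sketch the conditioning on phrase levels, null transitions, and syntactic history is taken as already folded into each node's `prob`; estimating those probabilities from training data is outside its scope.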