TL;DR: In this article, a model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced, and certain operations on relations are discussed and applied to the problems of redundancy and consistency in the user's model.
Abstract: Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information. Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model.
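As a rough illustration of what "operations on relations" can look like, the sketch below treats a relation as a set of rows and implements projection and a natural join. The helper names (`make_relation`, `project`, `natural_join`) and the supplier/part/project data are assumptions made for this example; they are not taken from the article.

```python
def make_relation(rows):
    """Build a relation (a set of rows) from an iterable of dicts."""
    return {frozenset(row.items()) for row in rows}


def project(relation, attrs):
    """Keep only the named attributes; duplicate rows vanish because the result is a set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in relation}


def natural_join(r, s):
    """Combine every pair of rows from r and s that agree on all shared attributes."""
    result = set()
    for row_r in r:
        dr = dict(row_r)
        for row_s in s:
            ds = dict(row_s)
            shared = dr.keys() & ds.keys()
            if all(dr[a] == ds[a] for a in shared):
                result.add(frozenset({**dr, **ds}.items()))
    return result


# Hypothetical data: which supplier supplies which part, and which project uses which part.
supply = make_relation([
    {"supplier": 1, "part": "bolt"},
    {"supplier": 2, "part": "nut"},
])
usage = make_relation([
    {"part": "bolt", "project": "A"},
    {"part": "bolt", "project": "B"},
])

# A relation derivable from the two above. Storing it separately would be
# redundant, and it would have to be kept consistent with its sources.
joined = natural_join(supply, usage)
supplier_project = project(joined, {"supplier", "project"})
```

Because `supplier_project` can be recomputed from `supply` and `usage`, it is an example of the kind of derivable relation that raises the redundancy and consistency questions the abstract mentions.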
TL;DR: This paper treats the analysis of cross-classified data: independence, quasi-independence, and interactions in contingency tables with or without missing entries.
Abstract: (1968). The Analysis of Cross-Classified Data: Independence, Quasi-Independence, and Interactions in Contingency Tables with or without Missing Entries. Journal of the American Statistical Association: Vol. 63, No. 324, pp. 1091-1131.
TL;DR: The authors propose an alternative to the conditional independence hypothesis, based on a permanence of updating ratios, that guarantees all limit conditions even in the presence of complex data interdependence; the recombination formula is extended to any number n of data events.
Abstract: Consider the assessment of any unknown event A through its conditional probability P(A | B,C) given two data events B, C of different sources. Each event could involve many locations jointly, but the two data events are assumed such that the probabilities P(A | B) and P(A | C) can be evaluated. The challenge is to recombine these two partially conditioned probabilities into a model for P(A | B,C) without having to assume independence of the two data events B and C. The probability P(A | B,C) is then used for estimation or simulation of the event A. In the presence of actual data dependence, the combination algorithm provided by the traditional conditional independence hypothesis is shown to be nonrobust, leading to various inconsistencies. An alternative based on a permanence of updating ratios is proposed, which guarantees all limit conditions even in the presence of complex data interdependence. The resulting recombination formula is extended to any number n of data events and a paradigm is offered to introduce formal data interdependence.
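The abstract does not spell out the recombination formula. A minimal sketch follows, assuming the usual statement of the permanence-of-ratios hypothesis. Write a = (1 - P(A)) / P(A), b = (1 - P(A | B)) / P(A | B), c = (1 - P(A | C)) / P(A | C), and x = (1 - P(A | B,C)) / P(A | B,C). The hypothesis x / b = c / a then gives P(A | B,C) = a / (a + bc). The function names and the requirement that all inputs lie strictly between 0 and 1 are choices made for this example.

```python
def odds_against(p):
    """Return (1 - p) / p, read here as a 'distance' to the event occurring."""
    if not 0.0 < p < 1.0:
        # The limit cases p = 0 and p = 1 are excluded to avoid division by zero.
        raise ValueError("probability must lie strictly between 0 and 1")
    return (1.0 - p) / p


def permanence_of_ratios(prior, conditionals):
    """Combine P(A | D_i) for several data events D_i into an estimate of P(A | D_1, ..., D_n).

    prior        -- P(A), the probability of A before any data event
    conditionals -- iterable of P(A | D_i), one per data event

    Assumes x / a equals the product of the individual ratios x_i / a.
    """
    a = odds_against(prior)
    x = a
    for p in conditionals:
        x *= odds_against(p) / a
    return 1.0 / (1.0 + x)


# Two sources that each raise P(A) from 0.5 to 0.8 reinforce each other.
combined = permanence_of_ratios(0.5, [0.8, 0.8])

# A data event that leaves the prior unchanged has no effect on the result.
unchanged = permanence_of_ratios(0.3, [0.6, 0.3])
```

In the second call the event C is uninformative (P(A | C) equals the prior), so the combined probability falls back to P(A | B) = 0.6.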
TL;DR: Starburst is a research prototype of an extensible relational database management system under development at the IBM Almaden Research Center; its extensions include hierarchies of user-defined types and functions, large unstructured and structured complex objects, and user-defined rules that respond to changes in the database.
Abstract: Starburst is a research prototype of an extensible relational database management system that is under development at the IBM Almaden Research Center. Through extensions to Starburst, we are incorporating the advanced structuring and data behavior features offered by object-oriented database management systems, while retaining the significant gains in data independence and data integrity of the relational model and upward compatibility with its standard access language, SQL. Some of the advanced features supported by Starburst extensions, described in this paper, include hierarchies of user-defined types and functions, large unstructured and structured complex objects, and user-defined rules to respond to changes in the database.
TL;DR: Storage structures are defined declaratively as functions of a logical schema, and an algorithm, integrated with a conventional query optimizer, translates queries over that schema into plans that access the storage structures. This allows the structures to be tuned to the expected or observed workload and achieves significantly better performance than is possible with conventional techniques.
Abstract: Physical data independence is touted as a central feature of modern database systems. It allows users to frame queries in terms of the logical structure of the data, letting a query processor automatically translate them into optimal plans that access physical storage structures. Both relational and object-oriented systems, however, force users to frame their queries in terms of a logical schema that is directly tied to physical structures. We present an approach that eliminates this dependence. All storage structures are defined in a declarative language based on relational algebra as functions of a logical schema. We present an algorithm, integrated with a conventional query optimizer, that translates queries over this logical schema into plans that access the storage structures. We also show how to compile update requests into plans that update all relevant storage structures consistently and optimally. Finally, we report on experiments with a prototype implementation of our approach that demonstrate how it allows storage structures to be tuned to the expected or observed workload to achieve significantly better performance than is possible with conventional techniques.
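To make the idea of physical data independence concrete, here is a toy sketch. It is not the paper's declarative language or its translation algorithm. It assumes one hypothetical logical relation, Employee(name, dept, salary), and two hypothetical storage structures, each built as a function of that relation. The query is phrased only against the logical schema, and the plan is chosen from whichever structures exist.

```python
# Hypothetical contents of the logical relation Employee(name, dept, salary).
EMPLOYEES = [
    ("ann", "db", 120),
    ("bob", "os", 100),
    ("cy", "db", 90),
]


def build_heap(rows):
    """Storage structure 1: an unordered copy of every row."""
    return list(rows)


def build_dept_index(rows):
    """Storage structure 2: a map from dept to the names in that dept."""
    index = {}
    for name, dept, _salary in rows:
        index.setdefault(dept, []).append(name)
    return index


def names_in_dept(dept, storage):
    """Logical query: names of employees in `dept`.

    The caller never says how the data is stored; the plan is picked here.
    """
    if "dept_index" in storage:
        # Plan A: a single lookup in the index.
        return sorted(storage["dept_index"].get(dept, []))
    # Plan B: scan the heap and filter.
    return sorted(name for name, d, _salary in storage["heap"] if d == dept)


heap_only = {"heap": build_heap(EMPLOYEES)}
with_index = {"heap": build_heap(EMPLOYEES), "dept_index": build_dept_index(EMPLOYEES)}

# The same logical query returns the same answer under either physical design.
result_scan = names_in_dept("db", heap_only)
result_index = names_in_dept("db", with_index)
```

Adding or dropping `dept_index` changes only the plan, not the query text, which is the kind of workload-driven tuning the abstract describes.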