TL;DR: It is shown that a specific problem solving episode, or case, may be viewed as data, information, or knowledge, depending on its role in decision making and learning from experience, and a conceptual framework for integration is suggested by focusing on their different roles and frames of reference within a decision-making process.
Abstract: The unclear distinction between data, information, and knowledge has impaired their combination and utilization for the development of integrated systems. There is a need for a unified definitional model of data, information, and knowledge based on their roles in computational and cognitive information processing. An attempt to clarify these basic notions is made, and a conceptual framework for integration is suggested by focusing on their different roles and frames of reference within a decision-making process. On this basis, ways of integrating the functionalities of databases, information systems and knowledge-based systems are discussed by taking a knowledge level perspective to the analysis and modeling of systems behaviour. Motivated by recent work in the area of case-based reasoning related to decision support systems, it is further shown that a specific problem solving episode, or case, may be viewed as data, information, or knowledge, depending on its role in decision making and learning from experience. An outline of a case-based system architecture is presented, and used to show that a focus on the retaining and reuse of past cases facilitates a gradual and evolutionary transition from an information system to a knowledge-based system.
TL;DR: This paper proposes an integration of two theoretically well founded ORM techniques: FORM and PSM, with main focus on a common terminological framework, and on the notion of subtyping.
Abstract: Although Entity-Relationship (ER) modelling techniques are commonly used for information modelling, Object-Role Modelling (ORM) techniques are becoming increasingly popular, partly because they include detailed design procedures providing guidelines for the modeller. As with the ER approach, a number of different ORM techniques exist. In this paper, we propose an integration of two theoretically well founded ORM techniques: FORM and PSM. Our main focus is on a common terminological framework, and on the notion of subtyping. Subtyping has long been an important feature of semantic approaches to conceptual schema design. It is also the concept in which FORM and PSM differ the most in their formalization. The subtyping issue is discussed from three different viewpoints covering syntactical, identification, and population issues. Finally, a wider comparison of approaches to subtyping is made, which encompasses other ER-based and ORM-based information modelling techniques, and highlights how formal subtype definitions facilitate a comprehensive specification of subtype constraints.
TL;DR: The definition of CoLan is described, a high-level declarative Constraint Description Language for use with an Object-Oriented Database (OODB), which has features of both first-order logic and functional programming and is based on Daplex.
Abstract: This paper is about the definition of CoLan, a high-level declarative Constraint Description Language, for use with an Object-Oriented Database (OODB). CoLan has features of both first-order logic and functional programming and is based on Daplex. CoLan expressions are translated into Prolog code that implements the operational semantics of the constraint. Pieces of generated code are cached inside the class descriptor of the ‘host’ class attached to appropriate slots. The pieces of code are retrieved along an inheritance path when an update on the database is attempted. If the update violates any of the retrieved constraints then it is rejected with an informative message. Thus constraints are expressed declaratively and they can even be retracted individually. However, they are implemented efficiently as code-generated methods, triggered selectively by an update. The implementation is described for the ADAM OODB, which uses meta-classes of the CoLan system to generate class descriptions.
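The update-time checking described in this abstract can be pictured with a small sketch. It is written in Python rather than the Prolog that CoLan generates, and every name in it (`ClassDescriptor`, `add_constraint`, `update`, `ConstraintViolation`) is a hypothetical illustration, not part of CoLan or ADAM. It only shows the general idea: constraints are attached to slots of a class descriptor, collected along the inheritance path when an update is attempted, and a violating update is rejected with a message.

```python
class ConstraintViolation(Exception):
    """Raised when an attempted update violates a retrieved constraint."""


class ClassDescriptor:
    """Hypothetical class descriptor that caches constraints per slot."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.constraints = {}  # slot name -> list of (message, predicate)

    def add_constraint(self, slot, message, predicate):
        self.constraints.setdefault(slot, []).append((message, predicate))

    def constraints_for(self, slot):
        # Walk the inheritance path from this class up to the root.
        cls = self
        while cls is not None:
            yield from cls.constraints.get(slot, [])
            cls = cls.parent


def update(obj, cls, slot, value):
    """Apply the update only if no constraint on `slot` is violated."""
    candidate = {**obj, slot: value}  # state the object would have
    for message, predicate in cls.constraints_for(slot):
        if not predicate(candidate):
            raise ConstraintViolation(f"update of {slot} rejected: {message}")
    obj[slot] = value


person = ClassDescriptor("person")
student = ClassDescriptor("student", parent=person)
person.add_constraint("age", "age must be non-negative", lambda o: o["age"] >= 0)
student.add_constraint("age", "a student must be under 100", lambda o: o["age"] < 100)
```

Only the constraints attached to the updated slot are retrieved, which mirrors the selective triggering mentioned in the abstract; removing one entry from a slot's list would correspond to retracting a single constraint.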
TL;DR: Algorithm SETM is described, a set-oriented algorithm for mining association rules that is simple, fast, and stable over the range of parameter values; its performance makes it feasible to build interactive data mining tools for large databases.
Abstract: Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing.
In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases.
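As a rough illustration of the set-oriented style described above, the following Python sketch counts frequent itemsets by repeatedly joining a relation of (transaction, itemset) rows with the original sales relation and grouping the result. It is not the published SETM algorithm: the function name `setm_sketch`, the `(tid, item)` row layout, and the use of an in-memory dictionary in place of a sort and merge-scan join are all assumptions made for brevity.

```python
from collections import Counter


def setm_sketch(sales, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets.

    `sales` is an iterable of (tid, item) rows, i.e. a flat relation.
    """
    sales = sorted(set(sales))
    # Index the sales relation by tid; stands in for a merge-scan join.
    items_by_tid = {}
    for tid, item in sales:
        items_by_tid.setdefault(tid, []).append(item)

    # R_1: one row per (tid, 1-itemset).
    rows = [(tid, (item,)) for tid, item in sales]
    frequent = {}
    while rows:
        # GROUP BY itemset, HAVING COUNT(*) >= min_support.
        counts = Counter(itemset for _, itemset in rows)
        large = {s: c for s, c in counts.items() if c >= min_support}
        if not large:
            break
        frequent.update(large)
        # Keep supported rows, then join with sales on tid, extending each
        # itemset only with a lexicographically larger item.
        rows = sorted(
            (tid, itemset + (item,))
            for tid, itemset in rows
            if itemset in large
            for item in items_by_tid[tid]
            if item > itemset[-1]
        )
    return frequent
```

Each pass corresponds to one join followed by one grouping step, which is why such a loop maps naturally onto SQL queries.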
TL;DR: Fragments of a method for object oriented analysis are presented, together with the process algebra that allows a conceptual schema built according to this method to be formally verified for deadlock behaviour.
Abstract: Object oriented models model structural and behavioural aspects of objects in the Universe of Discourse. As the dynamic aspects of objects include parallelism and synchronisation of object life cycles, conceptual schemes must be verified for problematic behaviour like deadlock. In this paper we present fragments of a method for object oriented analysis and the process algebra that allows a conceptual schema built according to this method to be formally verified for deadlock behaviour.
TL;DR: This research analyzes sets as abstractions of individual members, in order to develop a list of criteria for a ‘good’ set modelling construct, referred to as a grouping, within an object-oriented context.
Abstract: Although high-level data abstractions have been recognized as very important for modelling complex applications, one abstraction that is not widely incorporated into design methodologies is that of sets. This research analyzes sets as abstractions of individual members, in order to develop a list of criteria for a ‘good’ set modelling construct, referred to as a grouping. In addition, ways in which the member-of relationship, mapping individual members onto a set, can be enriched with additional semantics are proposed within an object-oriented context.
TL;DR: It is argued that, from a pragmatic viewpoint, both layered and integrated approaches to support active capability need to be pursued and that a design that better meets the active database objectives is needed.
Abstract: The need for active capability for non-traditional applications and its concomitant benefits are well-established. Although the event-based technique for monitoring conditions (leading to the integrated architecture) is the most versatile of all the techniques, from a practical viewpoint there is a need for enhancing pre-existing non-active DBMSs to support active capability. The set of techniques that can be used for providing this add-on active capability (leading to the layered architecture) imposes certain limitations on the extent of active capability that can be supported. Insights into the details of techniques as well as their impact on the architecture entail a design that better meets the active database objectives. This paper identifies a repertoire of techniques for condition monitoring and discusses their suitability to different architectures. This paper argues that, from a pragmatic viewpoint, both layered and integrated approaches to support active capability need to be pursued. It then compares polling and event-based (asynchronous) monitoring techniques using an implementation on Symbolics using Common Lisp with Flavors. The focus of this comparison is on: techniques, performance, influence of implementation strategies on performance, and identification of opportunities for optimization.
TL;DR: The complementary problem of using multiple sets of integrity constraints to create a new set of global integrity constraints is examined; these global constraints facilitate both query optimization and update validation tasks.
Abstract: In a heterogeneous distributed database environment, each component database is characterized by its own logical schema and its own set of integrity constraints. The task of generating a global schema from constituent local schemata has been addressed by many researchers. The complementary problem of using multiple sets of integrity constraints to create a new set of global integrity constraints is examined in this paper. These global integrity constraints facilitate both query optimization and update validation tasks.
TL;DR: The language Elisa-D, which is based on the grammar of the communication in the Universe of Discourse, is described; its expressions have a direct meaning in the Universe of Discourse, while natural language expressions are easily formalised in it.
Abstract: In this paper, we introduce a query language for evolving information systems. Evolving information systems go beyond the capacity of conventional database systems, not only as they incorporate a time dimension, but also since they allow all aspects of the system to evolve. The introduced language is related to the philosophy underlying NIAM (Natural language Information Analysis Method). This method investigates the grammar of the communication in the Universe of Discourse. Usually this grammar is depicted as an information structure diagram (NIAM or ER schema). This paper describes the language Elisa-D, which is based on this grammar. As a result, expressions in this language have a direct meaning in the universe of discourse, while natural language expressions are easily formalised in this language.
TL;DR: A conceptual framework for explaining database design expertise is proposed and the components of the framework are applied to each phase of the design process and used to provide guidelines for the level of expertise developers might strive to obtain.
Abstract: Database design is a complex and time-consuming process. In order to automate database design, an understanding of the nature of expertise that goes into the design process is needed. Although a number of expert systems have been developed to assist or replace a database designer, database design expertise has not been examined in any detail. This paper proposes a conceptual framework for explaining this type of expertise. The components of the framework are applied to each phase of the design process and used to provide guidelines for the level of expertise developers might strive to obtain. Several representative systems are analyzed, based on the framework, to explore the degree to which expertise is being captured. Implications for the future development of database design expert systems are discussed.
TL;DR: This paper presents a framework for agent communication based on the blackboard paradigm which is able to manage temporal information, and it provides its multiple access and coherence management protocols.
Abstract: The multi-agent system paradigm emerges as an interesting approach in the Knowledge Based System (KBS) field, when distributed problem-solving techniques are required for solving problems that can be represented as a collection of groups of cooperating intelligent individuals. A key concept in multi-agent systems is the interaction between agents. On the other hand, time plays a crucial role in a wide range of KBS applications. Temporal reasoning and representation consist of formalizing the notion of time and providing means to represent and reason about the temporal aspects of knowledge. This paper presents a framework for agent communication based on the blackboard paradigm which is able to manage temporal information, and it provides its multiple access and coherence management protocols.
TL;DR: An algorithmic method for transforming a binary-relationship conceptual schema to an object-oriented (OO) database schema, which first identifies the essential objects in the BR schema, together with their relationships and constraints, maintaining the semantics and all types of constraints present in the conceptual schema.
Abstract: We describe an algorithmic method for transforming a binary-relationship (BR) conceptual schema to an object-oriented (OO) database schema. The BR schema is a semantically rich diagram that represents the reality being modeled in terms of objects, relationships and constraints. It is easy to understand and serves as a communication tool between users and designers. Therefore it can be created in the early stages of system development, and later on be transformed into a specific OO database schema. The transformation method employs a multi-stage algorithm, which first identifies the essential objects in the BR schema, together with their relationships and constraints. These are then mapped to object classes, attributes, and constraints, maintaining the semantics and all types of constraints present in the conceptual schema.
TL;DR: A planner that reasons explicitly about time and a safe reaction in time is presented, illustrated by a railway control and transportation application that is relevant since it is safety-critical and so complex that knowledge-based techniques are necessary to master the complexity appropriately.
Abstract: This paper presents a planner that reasons explicitly about time and a safe reaction in time. A distributed architecture supports usage of knowledge-based techniques to find effective solutions and procedural knowledge to react very fast to asynchronous events. If enough time is available, the knowledge-based planner ‘programs’ the procedural component with an improved reaction. The approach is illustrated by a railway control and transportation application that is relevant since it is safety-critical and so complex that knowledge-based techniques are necessary to master the complexity appropriately. The planner is implemented in a PROLOG environment.
TL;DR: The main advantage of the proposed method is that the integrity check is performed primarily at compile time, and it was compared both fundamentally and experimentally with existing methods.
Abstract: An update of a consistent database can influence the integrity of the database. The available integrity checking methods in deductive databases are often criticized for their lack of efficiency. The main goal of this paper is to present a new integrity checking method which does not have some of the disadvantages of existing methods. The main advantage of the proposed method is that the integrity check is performed primarily at compile time. In order to demonstrate the improvement in efficiency of the proposed method it was compared both fundamentally and experimentally with existing methods.
TL;DR: This work identifies semantic problems associated with the querying and updating of spatio-temporal interval data and proposes operations which alleviate these problems, and formally shows the equivalence of the non-optimised and optimised operations, and discusses the performance gains of the latter.
Abstract: We identify semantic problems associated with the querying and updating of spatio-temporal interval data and propose operations which alleviate these problems. We first motivate two key requirements for the manipulation of such data, namely that no two tuples of a relation should intersect or be mergeable. We then examine the properties of two operations, unfold and fold, and show how they can be used to define three further operations which, respectively: eliminate intersecting or mergeable data from a relation incorporating interval attributes, yielding a so-called canonical relation; add data to a canonical relation while preserving the canonicity property; and remove data from a canonical relation while also preserving canonicity. We formally show the correctness of all these operations. An examination of their space and time requirements then leads us to define an equivalent set of optimised operations. We formally show the equivalence of the non-optimised and optimised operations, and discuss the performance gains of the latter.
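The role of unfold and fold can be illustrated with a deliberately simplified sketch. It assumes a relation of `(key, (lo, hi))` rows with a single closed integer interval attribute, which is far narrower than the spatio-temporal setting of the paper, and the function names `normalise`, `insert` and `delete` are hypothetical labels for the three derived operations. The versions shown are the naive, non-optimised ones that materialise every point.

```python
def unfold(relation):
    """Replace each closed integer interval by its individual points."""
    return {(key, p) for key, (lo, hi) in relation for p in range(lo, hi + 1)}


def fold(points):
    """Merge points with the same key into maximal closed intervals."""
    by_key = {}
    for key, p in points:
        by_key.setdefault(key, []).append(p)
    result = set()
    for key, ps in by_key.items():
        ps.sort()
        lo = prev = ps[0]
        for p in ps[1:]:
            if p != prev + 1:  # gap found: close the current interval
                result.add((key, (lo, prev)))
                lo = p
            prev = p
        result.add((key, (lo, prev)))
    return result


def normalise(relation):
    """Remove intersecting or mergeable rows, giving a canonical relation."""
    return fold(unfold(relation))


def insert(canonical, new_rows):
    """Add rows to a canonical relation while keeping it canonical."""
    return fold(unfold(canonical) | unfold(new_rows))


def delete(canonical, old_rows):
    """Remove rows from a canonical relation while keeping it canonical."""
    return fold(unfold(canonical) - unfold(old_rows))
```

Because every interval is expanded point by point, the space and time cost grows with interval length, which suggests why optimised equivalents are worth defining.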
TL;DR: A mechanism is provided to establish preference hierarchies of norms, and conditional deontic logic is augmented with defeasible reasoning in order to maintain consistency in the presence of deontic conflicts and resolve deontic paradoxes.
Abstract: Deontic logic, developed as a logic of normative reasoning, is often too rigid to be applied to practical normative systems. First, it fails to reason about situations in the presence of deontic conflicts. Also, it suffers from various logical paradoxes. A possible approach to solving problems of deontic conflicts is to establish preference hierarchies of norms and to reason about the nonmonotonicity. This paper provides a mechanism to establish such hierarchies and augments conditional deontic logic with defeasible reasoning in order to maintain consistency in the presence of deontic conflicts and resolve deontic paradoxes.
TL;DR: Computer based design support associated with the IOOM methodology is discussed and the concept of specification refinement level allows the developer to examine and develop specifications at different levels of abstraction.
Abstract: The ITHACA application development approach emphasizes reuse of components, stored in a repository, during all development phases. Methodological support is needed in particular in the first development phases: requirement collection and analysis, and requirement specification. The ITHACA Object-Oriented Methodology (IOOM) is based on the concepts of object-orientation to facilitate composition of reusable specification components, on the concept of role, that permits a flexible composition of specifications, and on the concept of specification refinement level, that allows the developer to examine and develop specifications at different levels of abstraction. Computer based design support associated with the IOOM methodology is discussed.
TL;DR: The representation and implementation techniques used to build SPARK, a logic-based support system for designers engaged in concurrent engineering, are detailed, with an example from Printed Wiring Board Design.
Abstract: In this paper we detail the representation and implementation techniques used to build SPARK, a logic-based support system for designers engaged in concurrent engineering. Design rules are represented as constraints in a constraint satisfaction problem. This problem is translated into equivalent order-sorted logic formulae that form a concurrent engineering logic problem. The solution to this problem is determined through interactive constraint satisfaction performed by a deduction system and associated proof strategy. This is illustrated with an example from Printed Wiring Board Design.
TL;DR: The proposed document model (called the D_model) combines the relational and object-oriented paradigms and adopts a very natural view for describing the office documents, and an algebra for manipulating frame instances contained in folders is presented.
Abstract: This paper proposes a new approach to modeling documents in a personal office environment. The proposed document model (called the D_model) combines the relational and object-oriented paradigms and adopts a very natural view for describing the office documents. Documents are grouped into classes. Each class is characterized by a frame template, which describes the type for the class of documents. A frame template is instantiated by providing it with values to form a frame instance, representing a synopsis of a particular document associated with the template. Based on the nature of their contents, different frame instances can be grouped into a folder. Thus, a folder is a set of frame instances which may or may not be associated with the same template. The D_model describes documents using dual hierarchies: a document type hierarchy, depicting the structural organization of the documents, and a folder organization, representing the user's logical file structure. The document type hierarchy exploits structural commonalities between frame templates. Such a hierarchy helps to classify various documents. The folder organization mimics the user's real-world document filing system and provides the user with an intuitively clear view of his/her file structure. Such a view facilitates document retrieval and filing activities. We also present an algebra (called the D_algebra) for manipulating frame instances contained in folders. In contrast to existing algebraic languages, the D_algebra provides operators for manipulating heterogeneous sets (i.e. sets with elements of different types). The proposed document model and algebraic language have been implemented as part of TEXPROS, a personal document processing system currently running in our laboratory.
TL;DR: The syntax and the semantics of the language are described, and its use in statistical data modeling is discussed, and the basis for devising inference techniques for the language is described, based on an interesting correspondence between the language and propositional dynamic logic.
Abstract: We describe a new language for statistical data modeling. The language offers a general framework for the representation of elementary and summary data, and has three main characteristics: (i) the types of modeling primitives it provides are particularly suited for representing objects from a statistical point of view; (ii) it includes a rich set of structuring mechanisms for both elementary and summary data, which are given a formal semantics by means of logic; (iii) it is equipped with specialized inference procedures, allowing different kinds of checks to be performed on the representation. The language is intended to be used during the specification phase of a statistical database, which we consider a knowledge-driven activity, where the availability of both powerful structuring mechanisms and suitable reasoning techniques constitutes a valuable tool for the designer. The main focus of this paper is on the formal foundation of our approach. We describe the syntax and the semantics of the language, and we discuss its use in statistical data modeling. Also, we describe the basis for devising inference techniques for our language. Such techniques are based on an interesting correspondence between the language and propositional dynamic logic.
TL;DR: MOODD distinguishes itself by generating an object-oriented data definition from requirements written in a natural language by using a Requirement Specification Language (RSL) to represent user requirements.
Abstract: In this paper we propose a Method for Object-Oriented Database Design, called MOODD. Considerable recent research [22] has focused on object-oriented design techniques. MOODD distinguishes itself by generating an object-oriented data definition from requirements written in a natural language. Examples to illustrate the method are given for a particular OODBS called O2 [19]. MOODD consists of three phases. The first phase is to use a Requirement Specification Language (RSL) [27] to represent user requirements. Next, the RSL requirements are converted into an object-oriented conceptual model composed of the Nested Entity-Relationship (NER) [10]/ Update Protocol Model (UPM) [9]. Finally, the NER/UPM schemas are implemented by an object-oriented data model.
TL;DR: It is initially shown that all proposed Valid Time (VT) models can be applied to areas of practical interest, not related to VT data management, and the properties of these models are identified.
Abstract: In recent years many divergent approaches have been proposed for the management of Valid Time (Historical) data. However, no systematic effort has been reported concerning the identification of the properties of such a model. In the present work, it is initially shown that all proposed Valid Time (VT) models can be applied to areas of practical interest, not related to VT data management. All VT models are also classified with respect to two orthogonal parameters, the way time is represented, and the level at which it is incorporated (tuple or attribute). This makes it possible to identify two reference VT models that can be specified: VT-1NF, a simple extension to the conventional relational model, and VT-NESTED, a more general one, which supports, in addition, relation-valued attributes. The properties of these models are identified. Two more reference interval relational models are proposed, I-1NF and I-NESTED, which support any type of interval data. I-NESTED is the most general, in that it can be applied to all the areas in which all others are applicable. Results are also reported concerning the evaluation of all VT models.
TL;DR: This paper inherits and extends the treatment of relationships found in semantic data models to behavioural object-oriented models by presenting an approach to uniformly capture the update rules for user-defined relationships.
Abstract: In semantic data models, abstract relationship (e.g. generalization, aggregation, etc.) semantics are defined, specifying how insertion, deletion and modification operations made at a higher level of abstraction can affect the objects abstracted over and vice versa. These semantics, also known as structural constraints, are expressed through so-called update rules. This perspective has been somewhat lost in most object-oriented systems, where user-defined relationships are supported as simple pointers and their semantics are embedded, distributed and replicated within the operations accessing these pointers. This paper inherits and extends the treatment of relationships found in semantic data models to behavioural object-oriented models by presenting an approach to uniformly capture the update rules for user-defined relationships. The stress is not on supporting relationships as first-class objects, but on describing their update rules (or operational semantics) through a set of constructors, namely reaction, anticipation, delegation and exception. The approach has been borne out by an implementation in an active object-oriented database system.
TL;DR: The syntax and the semantics of this language, called DatalogA, are given by showing that model theoretic properties of ordinary Datalog extend to DatalogA.
Abstract: In this paper the problem of extending the logic database language Datalog with primitives to support array definitions and manipulations is addressed. The syntax and the semantics of this language, called DatalogA, are given by showing that model theoretic properties of ordinary Datalog extend to DatalogA. DatalogA fixpoint semantics and its implementation are also studied and presented. Sufficient conditions assuring program evaluation convergence when manipulating real-valued arrays are finally discussed.
TL;DR: The presentation includes material on the ER graph rewriting formalism, i.e., the actual tool, as well as an introduction to some formal graph rewriting prerequisites, to clarify the underlying formal concepts.
Abstract: Sequential graph rewriting systems are proposed as a meta-level formalism providing the concise and sound definition of different ER diagram languages. These rewriting systems can be used to define ER-based approaches for various DB modelling subtasks like schema design, evolution and integration. In addition, they are a natural choice for syntax directed ER CASE workbenches. In particular, by using specialized ER graph rewriting systems as meta input for CASE tools, the resulting tool behavior can be guided and controlled. Moreover, grammar driven modelling tools can be easily adapted for the needs of a particular enterprise or software factory without superimposing a particular ER dialect on the end users. Additional benefits result from the use of ER graph rewriting systems as a comparison framework for the continuously enlarging set of ER dialects. The presentation includes material on the ER graph rewriting formalism, i.e., the actual tool, as well as an introduction to some formal graph rewriting prerequisites. An exemplary application, in particular ER graph generation, is used to clarify the underlying formal concepts.
TL;DR: In the absence of a data structure which can provide all types of access equally efficiently, an integrated data structure is an acceptable solution which offers an efficient way for increasing the performance of database management systems.
Abstract: In the past a number of file organizations have been proposed for processing different types of queries efficiently. To our knowledge none of the existing file organizations is capable of supporting all types of accesses equally efficiently. In this paper we have taken a different approach for designing an integrated data structure which offers multiple access paths for processing different types of queries efficiently. The data structure reported here can be implemented on disk based as well as main memory database systems; however, in this paper we report its behavior mainly in a main memory database environment. Our approach is to fuse those data structures which offer an efficient access path for a particular type. To show the feasibility of our scheme we fused the B+-tree, the grid file and extendible hashing structures, using a proper interface. We implemented it and measured its performance through simulation modeling. Our results show that the data structure does improve concurrency and offers a higher throughput for a variety of transaction processing workloads. We argue that our scheme is different from creating secondary indexes for improving concurrency. In the absence of a data structure which can provide all types of access equally efficiently, an integrated data structure is an acceptable solution which offers an efficient way for increasing the performance of database management systems.
TL;DR: A coarse-grained algorithm for active and procedural object distribution is developed in order to endow the individual workstations with as much autonomy as possible, and a mechanism is devised to enable the model to transition from one state to another with a minimum level of perturbation.
Abstract: We present a conceptual framework for a distributed office system using concepts inherent to object-oriented formalism. The functions of the active object and the procedure servers are described. So are the attributes of the objects stored in these two servers. In order to endow the individual workstations with as much autonomy as possible, we develop a coarse-grained algorithm for active and procedural object distribution. An office model should have a high level of evolvability. To accommodate this phenomenon we devise a mechanism to enable the model to transition from one state to another with a minimum level of perturbation.
TL;DR: A new object identification scheme for object-oriented databases is suggested, which is a logical identification scheme, but the indirection of references is implemented effectively, and experiments confirm the benefits of the proposed organization.
Abstract: A new object identification scheme for object-oriented databases is suggested. Its purpose is to choose object surrogates so that they convey information about preferred clustering of stored objects. The surrogates are composed of two fields: cluster code and sequence number. A tailored variant of extendible hashing is proposed for the access method. A hash function is evaluated to produce the cluster code at object creation time, so that surrogates themselves act as hashed pseudokeys. The problem of how to manage variable-size logical clusters is solved by chopping the large ones into physical subclusters of restricted size. The suggested approach shares the advantages of plain surrogates and structured addresses: it is a logical identification scheme, but the indirection of references is implemented effectively. Experiments confirm the benefits of the proposed organization.
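A two-field surrogate of this kind might look roughly as follows. This is only a sketch under several assumptions that the abstract does not state: the field widths (`CLUSTER_BITS`, `SEQ_BITS`), the use of CRC-32 as the hash function, the string-valued clustering key, and the names `Surrogate` and `SurrogateAllocator` are all hypothetical, and the extendible hashing access method and subcluster splitting are not modelled.

```python
import zlib
from dataclasses import dataclass
from itertools import count

CLUSTER_BITS = 16  # assumed width of the cluster code field
SEQ_BITS = 32      # assumed width of the sequence number field


@dataclass(frozen=True)
class Surrogate:
    cluster_code: int
    sequence_number: int

    def pack(self):
        """Encode both fields in one integer, cluster code in the high bits."""
        return (self.cluster_code << SEQ_BITS) | self.sequence_number

    @staticmethod
    def unpack(value):
        return Surrogate(value >> SEQ_BITS, value & ((1 << SEQ_BITS) - 1))


class SurrogateAllocator:
    """Hands out surrogates; the hash is evaluated once, at creation time."""

    def __init__(self):
        self._next_seq = {}  # cluster code -> sequence number generator

    def new_surrogate(self, clustering_key):
        digest = zlib.crc32(clustering_key.encode("utf-8"))
        code = digest & ((1 << CLUSTER_BITS) - 1)
        counter = self._next_seq.setdefault(code, count())
        return Surrogate(code, next(counter))
```

Objects created with the same clustering key share a cluster code, so the surrogate alone indicates where an object is preferably stored, while the sequence number distinguishes members of the same logical cluster.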
TL;DR: This paper presents a CF estimation procedure which can be applied to totally clustered attributes and shows the accuracy of the proposed CF estimates and the improvement in their behaviour compared to previously published estimates.
Abstract: Cost models based on the clustering factor (CF) of the attributes have been proposed and shown to be attractive for block access estimation in databases, thanks to their accuracy and economy of use. While query optimizers can use the actual CFs, measured from the data, physical design methods and tools must rely on estimates before the data are stored. In this paper we present a CF estimation procedure which can be applied to totally clustered attributes (e.g. ordered attributes). Simple and accurate approximations of the derived formulas are also introduced. Simulations show the accuracy of the proposed CF estimates and the improvement in their behaviour compared to previously published estimates. Reliability for physical design of cost models based on the CF in the presence of a skewed data distribution is also discussed.