TL;DR: Intelligent, tool-supported techniques for information extraction and integration from both structured and semistructured data sources are proposed and provided in the framework of the MOMIS system, which is based on a conventional wrapper/mediator architecture.
Abstract: Developing intelligent tools for the integration of information extracted from multiple heterogeneous sources is a challenging issue in effectively exploiting the numerous sources available on-line in global information systems. In this paper, we propose intelligent, tool-supported techniques for information extraction and integration from both structured and semistructured data sources. An object-oriented language, with an underlying Description Logic, called ODLI3, derived from the standard ODMG is introduced for information extraction. ODLI3 descriptions of the source schemas are exploited first to set a Common Thesaurus for the sources. Information integration is then performed in a semiautomatic way by exploiting the knowledge in the Common Thesaurus and ODLI3 descriptions of source schemas with a combination of clustering techniques and Description Logics. This integration process gives rise to a virtual integrated view of the underlying sources for which mapping rules and integrity constraints are specified to handle heterogeneity. Integration techniques described in the paper are provided in the framework of the MOMIS system based on a conventional wrapper/mediator architecture.
TL;DR: An overview of the basic key enabling technologies needed to build intelligent information agents, and respective examples of information agent systems currently deployed on the Internet are presented.
Abstract: The vast amount of heterogeneous information sources available on the Internet demands advanced solutions for acquiring, mediating, and maintaining relevant information for the common user. Intelligent information agents are autonomous computational software entities that are especially meant to (1) provide pro-active resource discovery, (2) resolve information impedance of information consumers and providers, and (3) offer value-added information services and products. These agents are supposed to cope with the difficulties associated with the information overload of the user, preferably just in time. Based on a systematic classification of intelligent information agents, this paper presents an overview of the basic key enabling technologies needed to build such agents, and respective examples of information agent systems currently deployed on the Internet.
TL;DR: This paper extends the traditional organizational meta model with teams, proposes a team-enabled workflow reference model, and uses the Object Constraint Language (OCL) to express constraints on the distribution of work to teams.
Abstract: Today's workflow systems assume that each work item is executed by a single worker. From the viewpoint of the system, a worker with the proper qualifications selects a work item, executes the associated work, and reports the result. There is usually no support for teams, i.e., groups of people collaborating by jointly executing work items (e.g., the program committee of a conference, the management team of a company, a working group, and the board of directors). In this paper, we propose the addition of a team concept to today's workflow management systems. Clearly, this involves a marriage of workflow and groupware technology. To shed light on the introduction of teams, we extend the traditional organizational meta model with teams and propose a team-enabled workflow reference model. For this reference model and to express constraints with respect to the distribution of work to teams, we use object constraint language (OCL).
TL;DR: This work uses ontologies to derive a canonical structure, i.e. a DTD, to access sets of distributed XML documents on a conceptual level to lead to applications providing a broad range of high quality information.
Abstract: Currently dozens of XML-based applications exist or are under development. Many of them offer DTDs that define the structure of actual XML documents. Access to these documents relies on special purpose applications or on query languages that are closely tied to the document structures. Our approach uses ontologies to derive a canonical structure, i.e. a DTD, to access sets of distributed XML documents on a conceptual level. We will show how the combination of conceptual modeling, inheritance, and inference mechanisms on the one hand with the popularity, simplicity, and flexibility of XML on the other hand leads to applications providing a broad range of high quality information.
TL;DR: Today's data warehouse and OLAP systems offer little support for automating decision tasks that occur frequently and for which well-established decision procedures are available; such functionality can be provided by extending the conventional data warehouse architecture with analysis rules.
Abstract: Conventional data warehouses are passive. All tasks related to analysing data and making decisions must be carried out manually by analysts. Today's data warehouse and OLAP systems offer little support for automating decision tasks that occur frequently and for which well-established decision procedures are available. Such functionality can be provided by extending the conventional data warehouse architecture with analysis rules, which mimic the work of an analyst during decision making. Analysis rules extend the basic event/condition/action (ECA) rule structure with mechanisms to analyse data multidimensionally and to make decisions. The resulting architecture is called an active data warehouse.
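The event/condition/action structure mentioned in this abstract can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `AnalysisRule` class, the toy cube keyed by (region, product), the threshold, and the action string are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical toy cube: (region, product) -> sales amount.
Cube = Dict[Tuple[str, str], float]


@dataclass
class AnalysisRule:
    event: str                          # name of the triggering event
    condition: Callable[[Cube], bool]   # multidimensional check on the cube
    action: Callable[[Cube], str]       # decision produced when the check holds


def total_by_region(cube: Cube, region: str) -> float:
    # Roll up the product dimension for one region.
    return sum(value for (r, _), value in cube.items() if r == region)


def fire(rules: List[AnalysisRule], event: str, cube: Cube) -> List[str]:
    # Evaluate every rule registered for the event and collect the decisions.
    return [rule.action(cube) for rule in rules
            if rule.event == event and rule.condition(cube)]


cube = {("north", "tea"): 40.0, ("north", "coffee"): 30.0, ("south", "tea"): 120.0}
rule = AnalysisRule(
    event="end_of_quarter",
    condition=lambda c: total_by_region(c, "north") < 100.0,  # assumed threshold
    action=lambda c: "discount north",
)
decisions = fire([rule], "end_of_quarter", cube)
```

The rule fires only for its own event and only when the rolled-up total satisfies the condition, which is the part an analyst would otherwise check by hand.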
TL;DR: It is proved formally that, in general, lying and refusal are incomparable in many respects, but, under fairly natural assumptions, lies and refusals lead to surprisingly similar behaviors and convey exactly the same information to the user.
Abstract: Security policies and the corresponding enforcement mechanisms may have to deal with the logical consequences of the data encoded in information systems. Users may apply background knowledge about the application domain and about the system to infer more information than what is explicitly returned as answers to their queries. Some of the approaches to dealing with such a scenario are dynamic. For each query, the correct answer is first judged by some censor and then – if necessary – appropriately modified to preserve security. In this paper we contribute to the formal study of such approaches by extending to the case of known potential secrets the comparison of the two possible answer modifications, namely, lying and refusal. First, we explicitly define the security requirements. Second, we extend to such requirements previous results on security preservation using lies. Then we introduce a variant of the refusal-based approach, suitable for potential secrets. Finally, we extensively analyze and compare the two approaches. We prove formally that, in general, they are incomparable in many respects, but, under fairly natural assumptions, lies and refusals lead to surprisingly similar behaviors and convey exactly the same information to the user. The latter result leads to a fundamental new insight on the relative benefits of the two approaches.
TL;DR: This paper presents the semantic knowledge that needs to be captured during transformation to ensure a correct relational schema and shows an algorithm that can derive such semantic knowledge from a given XML Document Type Definition and preserve it as semantic constraints in relational database terms.
Abstract: As Extensible Markup Language (XML) is emerging as the data format of the Internet era, there are increasing needs to efficiently store and query XML data. One path to this goal is transforming XML data into relational format in order to use relational database technology. Although several transformation algorithms exist, they are incomplete in the sense that they focus only on structural aspects and ignore semantic aspects. In this paper, we present the semantic knowledge that needs to be captured during transformation to ensure a correct relational schema. Further, we show an algorithm that can (1) derive such semantic knowledge from a given XML Document Type Definition (DTD) and (2) preserve the knowledge by representing it as semantic constraints in relational database terms. By combining existing transformation algorithms and our constraints-preserving algorithm, one can transform XML DTD to relational schema where correct semantics and behaviors are guaranteed by the preserved constraints. Experimental results are also presented.
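To make the idea of deriving semantic constraints from a DTD concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it handles only a flat, comma-separated content model, and the mapping in `OCCURRENCE_HINTS` from DTD occurrence operators to relational hints is an assumption chosen for illustration.

```python
import re

# Hypothetical mapping from DTD occurrence operators to relational hints.
OCCURRENCE_HINTS = {
    "": "exactly one: NOT NULL column",
    "?": "zero or one: nullable column",
    "+": "one or more: child table, at least one row per parent",
    "*": "zero or more: child table, no minimum",
}


def child_constraints(element_decl):
    # Accept only declarations of the form <!ELEMENT name (a, b+, c?)>.
    match = re.fullmatch(r"\s*<!ELEMENT\s+(\w+)\s+\((.*)\)\s*>\s*", element_decl)
    if match is None:
        raise ValueError("only flat sequence declarations are supported")
    parent, body = match.groups()
    hints = {}
    for token in body.split(","):
        token = token.strip()
        if not token:
            continue
        # The trailing operator, if any, carries the cardinality semantics.
        suffix = token[-1] if token[-1] in "?+*" else ""
        hints[token.rstrip("?+*")] = OCCURRENCE_HINTS[suffix]
    return parent, hints


parent, hints = child_constraints("<!ELEMENT paper (title, author+, abstract?)>")
```

A purely structural transformation would create the same tables for `author+` and `author*`; recording the hint is what lets the "at least one" semantics survive in the relational schema.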
TL;DR: A general method is provided that, given a set of select-project-join queries to be satisfied by the DW, generates sets of materialized views that satisfy all the input queries.
Abstract: A global data warehouse (DW) integrates data from multiple distributed heterogeneous databases and other information sources. A global DW can be abstractly seen as a set of materialized views. The selection of views for materialization in a DW is an important decision in the design of a DW. Current commercial products do not provide tools for automatic DW design. We provide a general method that, given a set of select-project-join queries to be satisfied by the DW, generates sets of materialized views that satisfy all the input queries. This process is complex since 'common subexpressions' between the queries need to be detected and exploited. Our method is then applied to solve the problem of selecting such a materialized view set that fits in the space allocated to the DW for materialization and minimizes the combined overall query evaluation and view maintenance cost. We design algorithms which are implemented and we report on their experimental evaluation.
TL;DR: The problem of finding traversal patterns from collections of frequently occurring access sequences is examined, and three algorithms are presented: one that is level-wise with respect to the lengths of the patterns and two that are not.
Abstract: In data models that have graph representations, users navigate following the links of the graph structure. Conducting data mining on collected information about user accesses in such models involves the determination of frequently occurring access sequences. In this paper, the problem of finding traversal patterns from such collections is examined. The determination of patterns is based on the graph structure of the model. For this purpose, three algorithms are presented: one that is level-wise with respect to the lengths of the patterns and two that are not. Additionally, we consider the fact that accesses within patterns may be interleaved with random accesses due to navigational purposes. The definition of the pattern type generalizes existing ones in order to take into account this fact. The performance of all algorithms and their sensitivity to several parameters is examined experimentally.
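A level-wise search over pattern lengths can be sketched in Python as follows. This is a simplified illustration, not one of the three algorithms from the paper: it counts only contiguous access sequences, ignores the graph structure and interleaved random accesses, and defines support as the number of sessions containing a pattern. The function name and the sample sessions are hypothetical.

```python
from collections import Counter


def frequent_paths(sessions, min_support):
    """Return {pattern: support} for contiguous access sequences."""
    frequent = {}
    previous = None  # frequent patterns found at the previous level
    length = 1
    while True:
        counts = Counter()
        for session in sessions:
            seen = set()  # count each pattern at most once per session
            for i in range(len(session) - length + 1):
                window = tuple(session[i:i + length])
                # Prune: a pattern can only be frequent if its prefix is.
                if previous is not None and window[:-1] not in previous:
                    continue
                seen.add(window)
            counts.update(seen)
        level = {p: c for p, c in counts.items() if c >= min_support}
        if not level:
            return frequent
        frequent.update(level)
        previous = set(level)
        length += 1


sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"]]
patterns = frequent_paths(sessions, min_support=2)
```

Each pass handles one pattern length and stops as soon as a level yields no frequent pattern, which is what "level-wise" refers to here.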
TL;DR: A novel mechanism based on generic models of product individuals organised into a specialisation hierarchy to support multiple abstraction levels is defined and a set of transformation operations on models is defined.
Abstract: The need for product customisation is driving industrial companies towards a very large product variety, which affects many functions of a company, including after-sales. Systematic maintenance records of very different product individuals cannot be kept without an abstract view of the population of delivered products. However, the older the product individual, the less systematically recorded information there usually is about it. We defined a novel mechanism based on generic models of product individuals organised into a specialisation hierarchy to support multiple abstraction levels. For creating such hierarchies, we defined a set of transformation operations on models.
TL;DR: The view selection problem under the maintenance time constraint is investigated and two efficient, heuristic algorithms for the problem are proposed to reduce the problem to some well-solved optimization problems.
Abstract: A data warehouse is a data repository which collects and maintains a large amount of data from multiple distributed, autonomous and possibly heterogeneous data sources. Often the data is stored in the form of materialized views in order to provide fast access to the integrated data. One of the most important decisions in designing a data warehouse is the selection of views for materialization. The objective is to select an appropriate set of views that minimizes the total query response time with the constraint that the total maintenance time for these materialized views is within a given bound. This view selection problem is totally different from the view selection problem under the disk space constraint. In this paper the view selection problem under the maintenance time constraint is investigated. Two efficient, heuristic algorithms for the problem are proposed. The key to devising the proposed algorithms is to define good heuristic functions and to reduce the problem to some well-solved optimization problems. As a result, an approximate solution of the known optimization problem will give a feasible solution of the original problem. (C) 2001 Elsevier Science B.V. All rights reserved.
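The shape of the optimization problem (maximize query benefit subject to a bound on total maintenance time) can be illustrated with a generic greedy heuristic in Python. This is not either of the two algorithms proposed in the paper. It assumes, unrealistically, that each view has an independent benefit and a strictly positive maintenance cost; the view names and numbers are made up.

```python
def select_views(views, maintenance_budget):
    """views maps name -> (query_time_saved, maintenance_time)."""
    chosen, used = [], 0.0
    # Rank by benefit per unit of maintenance time (knapsack-style greedy).
    ranked = sorted(views.items(),
                    key=lambda item: item[1][0] / item[1][1],
                    reverse=True)
    for name, (benefit, cost) in ranked:
        if used + cost <= maintenance_budget:
            chosen.append(name)
            used += cost
    return chosen, used


views = {
    "sales_by_region": (90.0, 30.0),
    "sales_by_day": (100.0, 50.0),
    "sales_by_product": (40.0, 40.0),
}
chosen, used = select_views(views, maintenance_budget=80.0)
```

In a real warehouse the benefit of one view depends on which other views are already materialized, so a sketch like this only shows how the maintenance bound limits the selection.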
TL;DR: This research proposes a smart web query (SWQ) method for the semantic retrieval of web data that uses domain semantics represented as context ontologies to specify and formulate appropriate web queries to search and relies on semantic search filters to identify and rank relevant web pages semi-automatically.
Abstract: The efficient query and extraction of web data is often difficult, because web data does not conform to any data organization standard. In addition, the development of web search technology is still at a relatively early stage. Search engines provide only primitive data query capabilities, and require a detailed syntactic specification to retrieve relevant data. Furthermore, web data exists in a myriad of formats including PDF documents, images, and sound clips that are difficult to query. This research proposes a smart web query (SWQ) method for the semantic retrieval of web data. The SWQ method uses domain semantics represented as context ontologies to specify and formulate appropriate web queries to search. This method also relies on semantic search filters to identify and rank relevant web pages semi-automatically. Unlike traditional ontologies that are structured in a hierarchy, terms and their relationships that pertain to a particular domain are organized with a flexible structure by the context ontologies. An SWQ engine is being developed to test the proposed method. Financial trading (e.g. stocks, bonds, unit trusts) is adapted as an example domain (i.e., context) to test and validate the SWQ method and engine.
TL;DR: A knowledge representation scheme, a Description Logic (DL), is described, and it is shown through an example how a DL can play a part in the classification construction process, aiding in the production of coherent hierarchies and ensuring that the relationships represented in a thesaurus are sensible.
Abstract: Semantic metadata describing subject content plays a vital role in supporting indexing and retrieval in Digital Libraries. Mechanisms used to deliver this metadata include keyword collections, thesauri and classifications. Constructing a large thesaurus, however, is a difficult process which can be facilitated through the application of knowledge representation techniques developed for managing and reasoning about concepts. We describe such a scheme – a Description Logic (DL) – and show through an example how a DL can play a part in the classification construction process, aiding in the production of coherent hierarchies and ensuring that the relationships represented in a thesaurus are sensible.
TL;DR: An approach to integrate object life-cycles in object-oriented database schemas using a notation based on Petri nets is presented and different methods of integration depending on identified semantic correspondences between the object types in the views are introduced.
Abstract: Database schemas are often not defined by a single person but by several future users of the database who define their possibly different views on the proposed database schema. These views are collected and integrated during the subsequent design step of view integration. In object-oriented databases, view integration must handle the integration of the structure and the behavior of objects. Whereas integration of object structure has been treated in the realm of semantic data models in the past, integration of object behavior has received little attention so far. Behavior is usually defined at two levels of detail: by activities, which correspond to methods in object-oriented languages, and by object life-cycles, which represent the overall behavior of objects during their life time. This paper presents an approach to integrate object life-cycles in object-oriented database schemas using a notation based on Petri nets. We will introduce different methods of integration depending on identified semantic correspondences between the object types in the views.
TL;DR: This paper defines the dense region location problem as an optimization problem and develops a chunk scanning algorithm to compute dense regions, and proves a lower bound on the accuracy of the dense regions computed.
Abstract: On-line analytical processing (OLAP) has become a very useful tool in decision support systems built on data warehouses. Relational OLAP (ROLAP) and multidimensional OLAP (MOLAP) are two popular approaches for building OLAP systems. These two approaches have very different performance characteristics: MOLAP has good query performance but bad space efficiency, while ROLAP can be built on mature RDBMS technology but it needs sizable indices to support it. Many data warehouses contain many small clustered multidimensional data (dense regions), with sparse points scattered around in the rest of the space. For these databases, we propose that the dense regions be located and separated from the sparse points. The dense regions can subsequently be represented by small MOLAPs, while the sparse points are put in a ROLAP table. Thus the MOLAP and ROLAP approaches can be integrated in one structure to build a high performance and space efficient dense-region-based data cube. In this paper, we define the dense region location problem as an optimization problem and develop a chunk scanning algorithm to compute dense regions. We prove a lower bound on the accuracy of the dense regions computed. Also, we analyze the sensitivity of the accuracy on user inputs. Finally, extensive experiments are performed to study the efficiency and accuracy of the proposed algorithm.
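The separation of dense regions from sparse points can be sketched in two dimensions with Python. This is only an illustration of the idea, not the chunk scanning algorithm from the paper: it uses fixed square chunks, does not merge neighbouring chunks into larger regions, and assumes distinct points with non-negative integer coordinates. The function names and the density threshold are hypothetical.

```python
from collections import Counter


def dense_chunks(points, chunk_size, min_density):
    # Count how many points fall into each chunk of the grid.
    counts = Counter((x // chunk_size, y // chunk_size) for x, y in points)
    capacity = chunk_size * chunk_size  # number of cells in one chunk
    return {chunk for chunk, n in counts.items() if n / capacity >= min_density}


def split_points(points, chunk_size, min_density):
    # Points in dense chunks would go to small MOLAP arrays,
    # the remaining sparse points to a ROLAP table.
    dense = dense_chunks(points, chunk_size, min_density)
    molap, rolap = [], []
    for x, y in points:
        target = molap if (x // chunk_size, y // chunk_size) in dense else rolap
        target.append((x, y))
    return molap, rolap


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
molap, rolap = split_points(points, chunk_size=2, min_density=0.75)
```

With these inputs the fully populated chunk at the origin is treated as dense and the isolated point is left for the relational table.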
TL;DR: The results of the evaluation show that the performance of the RDB implementation transferred from an OO conceptual model using the object-relational transformation methodology is better than the relational implementation using conventional relational modelling.
Abstract: The emergence of the object-oriented (OO) methodology has shown its capabilities in modelling the real world better than the earlier relational methodology. However, object-oriented databases (OODBs) are still considered immature in comparison with relational databases (RDBs) which have existed for many years. RDBs still continue to dominate the implementation of databases constituting more than 90% of all database implementations [28]. It was felt worthwhile to exploit the great modelling power of OO methodology, while still facilitating relational implementations. These reasons have led us to develop an object-relational transformation methodology [20], [21], [22], [23], [24], [25] which allows us to use the OO methodology for data modelling and to transform it into a relational logical model for implementation in relational database management systems (RDBMSs). The main purpose of this paper is to present a performance evaluation of the transformation methodology. The evaluation covers I/O cost models of different types of queries. The type of evaluation is basically comparison-based, in which the performance of SQL operations upon a set of tables derived from the relational data model is compared with the tables derived from the OO data model using the transformation methodology. The results of the evaluation show that the performance of the RDB implementation transferred from an OO conceptual model using our object-relational transformation methodology is better than the relational implementation using conventional relational modelling. Moreover, in many cases, the relational modelling is not applicable since it cannot capture the design semantics particularly relating to collection types. Our object-relational methodology solves this problem.
TL;DR: A method for selecting and materializing views is proposed, which selects and horizontally fragments a view, then recomputes the size of the stored partitioned view while deciding further views to select.
Abstract: Data warehouse views typically store large aggregate tables based on a subset of dimension attributes of the main data warehouse fact table. Aggregate views can be stored as 2^n subviews of a data cube with n attributes. Methods have been proposed for selecting only some of the data cube views to materialize in order to speed up query response time, accommodate storage space constraint and reduce warehouse maintenance cost. This paper proposes a method for selecting and materializing views, which selects and horizontally fragments a view, then recomputes the size of the stored partitioned view while deciding further views to select. © 2001 Elsevier Science B.V. All rights reserved.
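The count of 2^n subviews follows from choosing, for each of the n dimension attributes, whether to group by it or aggregate it away. A short Python sketch enumerates them; the function name and the three dimension names are hypothetical.

```python
from itertools import combinations


def cube_views(dimensions):
    # One view per subset of dimensions, from the full group-by
    # down to the empty tuple (the grand total).
    return [combo
            for size in range(len(dimensions), -1, -1)
            for combo in combinations(dimensions, size)]


views = cube_views(["part", "supplier", "customer"])
```

For three dimensions this yields eight views, which is why view selection methods materialize only a subset as n grows.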
TL;DR: This paper defines a new technique to manage cover stories and proposes a formal representation of a multilevel database containing cover stories that can be interpreted for any kind of database (e.g., relational, object-oriented, etc.).
Abstract: In a multilevel database, cover stories are usually managed using the ambiguous technique of polyinstantiation. In this paper, we define a new technique to manage cover stories and propose a formal representation of a multilevel database containing cover stories. Our model aims to be a generic model, that is, it can be interpreted for any kind of database (e.g. relational, object-oriented, etc.). We then consider the problem of updating a multilevel database containing cover stories managed with our technique.
TL;DR: This paper proposes several prefetch policies, especially Hilbert curve-based ones, which can reduce user response time under the assumption that the user callback access pattern has spatial locality, and shows that the prefetch strategies based on the Hilbert curve achieve higher efficiency than naive or no prefetching.
Abstract: In a Web-enabled geographic information system (GIS) application, it would be possible for users to navigate existing spatial objects (e.g., points, lines) or spatial query results containing large objects (e.g., raster images, Web documents) on the Web browser. For efficiency, relatively 'light' spatial objects exist on the Web browser, while 'heavy' real information like large objects resides in the remote server. Only when the users call back real information for a certain spatial object does the server transmit it to the browser. In this paper, we propose several prefetch policies, especially the Hilbert curve-based ones, which can reduce user response time under the assumption that the user callback access pattern has spatial locality. We conducted diverse experiments to show that our prefetch strategies based on the Hilbert curve achieve higher efficiency than the other naive or no prefetch ones.
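A Hilbert curve maps grid cells to positions on a line so that cells close on the curve are also close in space, which is what makes it useful for prefetching under spatial locality. The Python sketch below uses the standard coordinate-to-index conversion for an n by n grid where n is a power of two. The `prefetch_candidates` policy, its `window` parameter, and the sample objects are assumptions for illustration and are not the policies evaluated in the paper.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def prefetch_candidates(objects, requested, n, window):
    # Hypothetical policy: prefetch objects whose Hilbert index lies
    # within `window` positions of the object just requested.
    target = hilbert_index(n, *objects[requested])
    return sorted(name for name, (x, y) in objects.items()
                  if name != requested
                  and abs(hilbert_index(n, x, y) - target) <= window)


objects = {"a": (0, 0), "b": (1, 0), "c": (3, 3), "d": (1, 1)}
candidates = prefetch_candidates(objects, "a", n=4, window=2)
```

When the user calls back object "a", the server would also send its neighbours on the curve and skip the distant object "c".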
TL;DR: An approach for designing an object-relational database that inspects each case of semantic relationships and proposes some solutions for implementation, and makes a comparison with the relational data model.
Abstract: We propose an approach for designing an object-relational database. We inspect each case of semantic relationships (one-to-one, one-to-many, many-to-many and n-ary). For each of these cases we list the different solutions for implementation and propose some solutions. We also make a comparison with the relational data model. Our results can be applied to current RDBMS including object extensions (Oracle8, DB2-IBM, Informix…).
TL;DR: A conceptual architecture is presented for the organization of the information space across collections of component systems in a multi-database network; it provides serendipity, exploration and contextualisation support so that users can achieve logical connections between concepts they are familiar with and schema terms employed in multi-database systems.
Abstract: The promises of network-accessible information are increasingly difficult to achieve. These difficulties are due to a variety of causes, such as the rapid growth in the volume of network-available information and the increasing complexity, diversity and terminological fluctuations of the different information sources available. This paper presents a conceptual architecture for the organization of the information space across collections of component systems in a multi-database network that provides serendipity, exploration and contextualisation support so that users can achieve logical connections between concepts they are familiar with and schema terms employed in multi-database systems. Large-scale searching for multi-database schema information is guided by a combination of lexical, structural and semantic aspects of schema terms in order to reveal more meaning both about the contents of an information term and about its placement within the distributed information space.
TL;DR: Warehousing the web in this context consists of creating different virtual web views with layered databases of descriptors organized hierarchically and using a declarative ad hoc mining language to locate explicit as well as implicit knowledge from the web warehouse.
Abstract: We are so used to the ubiquitous World-Wide Web (WWW) that we take it for granted. There is no need to emphasize how dynamic, large, rich, and unstructured, yet important the web is. From researchers and engineers to children and retired elderly, everyone uses the WWW for a variety of needs. A multitude of tools and search engines were developed to find and retrieve resources from the web. However, everyone knows how frustrating the experience with search engines can be. It is very difficult to find relevant information or patterns, if they are found at all, within resources on the Internet. The idea presented in this paper is to “warehouse” the web in a structure that would allow efficient information retrieval and knowledge discovery from the Internet. Warehousing the web in this context consists of creating different virtual web views with layered databases of descriptors organized hierarchically. Using a declarative ad hoc mining language, one can find and pinpoint explicit as well as implicit knowledge from the web warehouse.
TL;DR: This work generalizes the notion of pivoting such that decomposing relationship types no longer requires the existence of a given unary functional dependency, proves the necessity of path cardinality constraints, and gives an appropriate foundation for this concept.
Abstract: In the relational data model, the problem of data redundancy has been successfully tackled via decomposition. In advanced data models, decomposition by pivoting provides a similar concept. Pivoting has been introduced by Biskup, Menzel and Polle, and used for decomposing relationship types according to a unary non-key functional dependency. Our objective is to study pivoting in the presence of cardinality constraints which are commonly used in semantic data models. For this, we generalize the notion of pivoting such that decomposing relationship types no longer requires the existence of a given unary functional dependency. In order to ensure the equivalence of the given schema and its image under pivoting, the original application-dependent constraints have to be preserved. We discuss this problem for sets of participation and co-occurrence constraints. In particular, we prove the necessity of path cardinality constraints, and give an appropriate foundation for this concept.
TL;DR: A methodology for the design of an efficient storage structure of OODB that minimizes the database operating costs and uses a genetic algorithm to solve the intractable problem of inheritance of instance variables.
Abstract: Object-oriented databases (OODBs) are known to be rich in functionality but poor in performance. One of the important factors that affect performance is the physical database design. We developed a methodology for the design of an efficient storage structure of OODB that minimizes the database operating costs. The input for our method is the logical OODB schema and a set of user transactions of retrieval and update types. The output of our method is the determination of which instance variables should be inherited from direct and indirect superclasses and stored in which subclasses. We used a genetic algorithm (GA) to solve this intractable problem. The methodology was applied to a university database. Compared to previous storage models, the storage model produced with our methodology showed database performance improvement ranging from 26% to 31%, on average. Our results demonstrate a cost-effective storage structure design that boosts the operating performance of OODBs.
TL;DR: This paper describes operations according to the DataBase Version (DBV) model, which allows an efficient management of as many versions as needed of the real world entities in multiversion databases.
Abstract: Multiversion databases allow several states, or versions, of real-world entities to be represented in a database. To take into account the new dimension introduced by versioning, new operations must be added to conventional database programming languages. In this paper, we describe such operations according to the DataBase Version (DBV) model, which allows an efficient management of as many versions as needed. Operations are first presented intuitively, then formal definitions of their syntax and semantics are given. The work presented is considered as a syntactical framework for the development of sophisticated tools for design applications and configuration management. Special attention is paid to operations on complex object versions.
TL;DR: The PESKI project, which is concerned with assisting a human expert to build knowledge-based systems under uncertainty, is examined, with a focus on how verification and validation are currently achieved in PESKI.
Abstract: Knowledge-base V&V primarily addresses the question: “Does my knowledge-base contain the right answer and can I arrive at it?” One of the main goals of our work is to properly encapsulate the knowledge representation and allow the expert to work with manageable-sized chunks of the knowledge-base. This work develops a new methodology for the verification and validation of Bayesian knowledge-bases that assists in constructing and testing such knowledge-bases. Assistance takes the form of ensuring that the knowledge is syntactically correct, correcting “imperfect” knowledge, and also identifying when the current knowledge-base is insufficient as well as suggesting ways to resolve this insufficiency. The basis of our approach is the use of probabilistic network models of knowledge. This provides a framework for formally defining and working on the problems of uncertainty in the knowledge-base. In this paper, we examine the PESKI project which is concerned with assisting a human expert to build knowledge-based systems under uncertainty. We focus on how verification and validation are currently achieved in PESKI.
TL;DR: It is shown that, while CQL is easy to use and user-friendly, it is nonetheless more than first-order complete.
Abstract: Concept-based query languages allow users to specify queries directly against conceptual schemas. The primary goal of their development is ease-of-use and user-friendliness. However, existing concept-based query languages require the end-user to explicitly specify query paths in totality, thereby rendering such systems not as easy to use and user-friendly as they could be. The conceptual query language (CQL) discussed in this paper also allows end-users to specify queries directly against the conceptual schemas of database applications, using concepts and constructs that are native to and exist on the schemas. Unlike other existing concept-based query languages, however, CQL queries are abbreviated, i.e., the entire path of a query does not have to be specified. CQL is, therefore, an abbreviated concept-based query language. CQL is developed with the aim of combining the ease-of-use and user-friendliness of concept-based languages with the power of formal languages. It does not require end-users to be familiar with the structure and organization of the application database, but only with the content. Therefore, it makes minimal demands on end-users' cognitive knowledge of database technology without sacrificing expressive power. In this paper, the formal semantics and the theoretical basis of CQL are presented. It is shown that, while CQL is easy to use and user-friendly, it is nonetheless more than first-order complete. A contribution of this study is the use of the semantic roles played by entities in their associations with other entities to support abbreviated conceptual queries. Although only mentioned here in passing, a prototype of CQL has been implemented as a front-end to a relational database manager.
TL;DR: An extension of WordNet is presented, which contains a number of special types of relationships that are not available in WordNet and can be used in a special CASE-tool for supporting COLOR-X.
Abstract: In this article we discuss what kind of information can be obtained from WordNet and what kind of information should be added to WordNet in order to make it better suited to supporting the COLOR-X method. We present an extension of WordNet (called WordNet++), which contains a number of special types of relationships that are not available in WordNet. Additionally, WordNet++ is instantiated with knowledge about a particular domain. We also show some example scenarios of how WordNet++ can be used for the verification of COLOR-X models, and how this lexicon can be used in a special CASE-tool for supporting COLOR-X.
TL;DR: This work claims that inserting a terminology between informal textual documents and their formalization can help to serve both of these goals.
Abstract: Modeling often concerns the translation of informal texts into formal representations. This translation process requires support both for the translation itself and for its traceability. We claim that inserting a terminology between informal textual documents and their formalization can help to serve both of these goals. Modern terminology extraction tools support the formalization process by using terms as a first sketch of formalized concepts. Moreover, the terms can be employed for linking the concepts and the textual sources, so that they act as a powerful navigation structure. This is exemplified through the presentation of a fully implemented system.
TL;DR: This paper is the first to propose comprehensive support for relationship evolution during schema evolution, and presents an approach that de-couples the constraints from the schema evolution code, thereby enabling their update without any re-coding effort.
Abstract: Relationships have been repeatedly identified as an important object-oriented modeling construct. Most emerging modeling standards such as the object database management group (ODMG) object model and UML have some support for relationships. However object-oriented database (OODB) systems have largely ignored the existence of relationships during schema evolution. We are the first to propose comprehensive support for relationship evolution. A complete schema evolution facility for any OODB system must provide primitives to manipulate all object model constructs, and maintenance strategies for the structural and referential integrity of the database under such evolution. We propose a set of basic evolution primitives for relationships as well as a compound set of changes that can be applied to the same. However, given the myriad of possible change semantics a user may desire in the future, any pre-defined set is not sufficient. Rather we present a flexible schema evolution framework that allows the user to define new relationship transformations as well as to extend existing ones. Addressing the second problem, namely of updating schema evolution primitives to conform to the new set of invariants, can be a very expensive re-engineering effort. In this paper we present an approach that de-couples the constraints from the schema evolution code, thereby enabling their update without any re-coding effort. We also present an approach that can be used to verify the correctness of these complex evolution operations using the de-coupled constraints.