Top 56 papers presented at Data and Knowledge Engineering in 2001

Journal Article•10.1016/S0169-023X(00)00047-1•

Semantic integration of heterogeneous information sources

Sonia Bergamaschi¹, Silvana Castano², Maurizio Vincini¹, Domenico Beneventano¹•Institutions (2)

University of Modena and Reggio Emilia¹, University of Milan²

1 Mar 2001

TL;DR: Intelligent, tool-supported techniques for information extraction and integration from both structured and semistructured data sources are proposed and provided in the framework of the MOMIS system, which is based on a conventional wrapper/mediator architecture.

Abstract: Developing intelligent tools for the integration of information extracted from multiple heterogeneous sources is a key challenge in effectively exploiting the numerous sources available on-line in global information systems. In this paper, we propose intelligent, tool-supported techniques for information extraction and integration from both structured and semistructured data sources. An object-oriented language with an underlying Description Logic, called ODLI3 and derived from the ODMG standard, is introduced for information extraction. ODLI3 descriptions of the source schemas are exploited first to set up a Common Thesaurus for the sources. Information integration is then performed in a semiautomatic way by exploiting the knowledge in the Common Thesaurus and the ODLI3 descriptions of source schemas with a combination of clustering techniques and Description Logics. This integration process gives rise to a virtual integrated view of the underlying sources, for which mapping rules and integrity constraints are specified to handle heterogeneity. The integration techniques described in the paper are provided in the framework of the MOMIS system, which is based on a conventional wrapper/mediator architecture.

348 citations

Journal Article•10.1016/S0169-023X(00)00049-5•

Information agent technology for the Internet: a survey

Matthias Klusch¹•Institutions (1)

German Research Centre for Artificial Intelligence¹

1 Mar 2001

TL;DR: An overview of the basic key enabling technologies needed to build intelligent information agents, and respective examples of information agent systems currently deployed on the Internet are presented.

Abstract: The vast amount of heterogeneous information sources available on the Internet demands advanced solutions for acquiring, mediating, and maintaining relevant information for the common user. Intelligent information agents are autonomous computational software entities that are especially meant to (1) provide pro-active resource discovery, (2) resolve information impedance of information consumers and providers, and (3) offer value-added information services and products. These agents are supposed to cope with the difficulties associated with the information overload of the user, preferably just in time. Based on a systematic classification of intelligent information agents, this paper presents an overview of the basic key enabling technologies needed to build such agents, and respective examples of information agent systems currently deployed on the Internet.

226 citations

Journal Article•10.1016/S0169-023X(01)00034-9•

A reference model for team-enabled workflow management systems

W.M.P. van der Aalst¹, Akhil Kumar²•Institutions (2)

Eindhoven University of Technology¹, Bell Labs²

1 Sep 2001

TL;DR: This paper extends the traditional organizational meta model with teams and proposes a team-enabled workflow reference model and uses object constraint language (OCL) to express constraints with respect to the distribution of work to teams.

Abstract: Today's workflow systems assume that each work item is executed by a single worker. From the viewpoint of the system, a worker with the proper qualifications selects a work item, executes the associated work, and reports the result. There is usually no support for teams, i.e., groups of people collaborating by jointly executing work items (e.g., the program committee of a conference, the management team of a company, a working group, and the board of directors). In this paper, we propose the addition of a team concept to today's workflow management systems. Clearly, this involves a marriage of workflow and groupware technology. To shed light on the introduction of teams, we extend the traditional organizational meta model with teams and propose a team-enabled workflow reference model. For this reference model and to express constraints with respect to the distribution of work to teams, we use object constraint language (OCL).

127 citations

Journal Article•10.1016/S0169-023X(00)00048-3•

How to structure and access XML documents with ontologies

Michael Erdmann, Rudi Studer

1 Mar 2001

TL;DR: This work uses ontologies to derive a canonical structure, i.e. a DTD, for accessing sets of distributed XML documents on a conceptual level, leading to applications that provide a broad range of high-quality information.

Abstract: Currently dozens of XML-based applications exist or are under development. Many of them offer DTDs that define the structure of actual XML documents. Access to these documents relies on special purpose applications or on query languages that are closely tied to the document structures. Our approach uses ontologies to derive a canonical structure, i.e. a DTD, to access sets of distributed XML documents on a conceptual level. We will show how the combination of conceptual modeling, inheritance, and inference mechanisms on the one hand with the popularity, simplicity, and flexibility of XML on the other hand leads to applications providing a broad range of high quality information.

121 citations

Journal Article•10.1016/S0169-023X(01)00042-8•

Active data warehouses: complementing OLAP with analysis rules

Thomas Thalhammer, Michael Schrefl, Mukesh K. Mohania¹•Institutions (1)

IBM¹

1 Dec 2001

TL;DR: Today's data warehouse and OLAP systems offer little support for automating decision tasks that occur frequently and for which well-established decision procedures are available; such functionality can be provided by extending the conventional data warehouse architecture with analysis rules.

Abstract: Conventional data warehouses are passive. All tasks related to analysing data and making decisions must be carried out manually by analysts. Today's data warehouse and OLAP systems offer little support for automating decision tasks that occur frequently and for which well-established decision procedures are available. Such functionality can be provided by extending the conventional data warehouse architecture with analysis rules, which mimic the work of an analyst during decision making. Analysis rules extend the basic event/condition/action (ECA) rule structure with mechanisms to analyse data multidimensionally and to make decisions. The resulting architecture is called an active data warehouse.
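The event/condition/action structure mentioned in this abstract can be illustrated with a minimal sketch. It is not the rule language of the paper: the `AnalysisRule` class, the `fire` function, the toy cube, and the "withdraw" decision are all hypothetical names chosen for illustration, and the condition is assumed to be a simple roll-up over two dimensions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical fact table: (region, product, quarter) -> units sold.
Cube = Dict[Tuple[str, str, str], int]


@dataclass
class AnalysisRule:
    event: str                              # name of the triggering event
    condition: Callable[[Cube], List[str]]  # multidimensional analysis step
    action: Callable[[str], str]            # decision applied to each selected item


def low_selling_products(cube: Cube, threshold: int = 100) -> List[str]:
    """Roll up over region and quarter, then keep products below the threshold."""
    totals: Dict[str, int] = {}
    for (_region, product, _quarter), units in cube.items():
        totals[product] = totals.get(product, 0) + units
    return sorted(p for p, total in totals.items() if total < threshold)


def fire(rule: AnalysisRule, event: str, cube: Cube) -> List[str]:
    """Run the rule's action for every item selected by its condition."""
    if event != rule.event:
        return []
    return [rule.action(item) for item in rule.condition(cube)]


cube: Cube = {
    ("north", "tea", "Q1"): 40,
    ("south", "tea", "Q1"): 30,
    ("north", "coffee", "Q1"): 90,
    ("south", "coffee", "Q1"): 80,
}
rule = AnalysisRule(
    event="end_of_quarter",
    condition=low_selling_products,
    action=lambda product: f"withdraw {product}",
)
decisions = fire(rule, "end_of_quarter", cube)
```

Keeping the analysis step separate from the action mirrors the split between analysing data and making a decision that the abstract describes; a real system would evaluate the condition against the warehouse rather than an in-memory dictionary.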

110 citations

Journal Article•10.1016/S0169-023X(01)00024-6•

Lying versus refusal for known potential secrets

Joachim Biskup¹, Piero A. Bonatti²•Institutions (2)

Technical University of Dortmund¹, University of Milan²

1 Aug 2001

TL;DR: It is proved formally that, in general, lying and refusal are incomparable in many respects, but, under fairly natural assumptions, lies and refusals lead to surprisingly similar behaviors and convey exactly the same information to the user.

Abstract: Security policies and the corresponding enforcement mechanisms may have to deal with the logical consequences of the data encoded in information systems. Users may apply background knowledge about the application domain and about the system to infer more information than what is explicitly returned as answers to their queries. Some of the approaches to dealing with such a scenario are dynamic. For each query, the correct answer is first judged by some censor and then, if necessary, appropriately modified to preserve security. In this paper we contribute to the formal study of such approaches by extending to the case of known potential secrets the comparison of the two possible answer modifications, namely, lying and refusal. First, we explicitly define the security requirements. Second, we extend to such requirements previous results on security preservation using lies. Then we introduce a variant of the refusal-based approach, suitable for potential secrets. Finally, we extensively analyze and compare the two approaches. We prove formally that, in general, they are incomparable in many respects, but, under fairly natural assumptions, lies and refusals lead to surprisingly similar behaviors and convey exactly the same information to the user. The latter result leads to a fundamental new insight on the relative benefits of the two approaches.

83 citations

Journal Article•10.1016/S0169-023X(01)00028-3•

CPI: constraints-preserving inlining algorithm for mapping XML DTD to relational schema

Dongwon Lee¹, Wesley W. Chu²•Institutions (2)

Penn State College of Information Sciences and Technology¹, University of California, Los Angeles²

1 Oct 2001

TL;DR: This paper presents the semantic knowledge that needs to be captured during transformation to ensure a correct relational schema and shows an algorithm that can derive such semantic knowledge from a given XML Document Type Definition and preserve it as semantic constraints in relational database terms.

Abstract: As Extensible Markup Language (XML) is emerging as the data format of the Internet era, there are increasing needs to efficiently store and query XML data. One path to this goal is transforming XML data into relational format in order to use relational database technology. Although several transformation algorithms exist, they are incomplete in the sense that they focus only on structural aspects and ignore semantic aspects. In this paper, we present the semantic knowledge that needs to be captured during transformation to ensure a correct relational schema. Further, we show an algorithm that can (1) derive such semantic knowledge from a given XML Document Type Definition (DTD) and (2) preserve the knowledge by representing it as semantic constraints in relational database terms. By combining existing transformation algorithms and our constraints-preserving algorithm, one can transform XML DTD to relational schema where correct semantics and behaviors are guaranteed by the preserved constraints. Experimental results are also presented.
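As a rough illustration of what preserving DTD semantics as relational constraints can look like, the sketch below reads the occurrence indicators (`?`, `*`, `+`) of a flat DTD sequence and turns them into column nullability. This is not the CPI algorithm: the helper names `child_constraints` and `inline_columns`, the `TEXT` column type, and the restriction to flat comma-separated content models are assumptions made for the example.

```python
import re
from typing import Dict, List


def child_constraints(content_model: str) -> Dict[str, Dict[str, bool]]:
    """Map each child in a flat DTD sequence such as "(title, author+, year?)"
    to whether it is required and whether it may repeat."""
    constraints: Dict[str, Dict[str, bool]] = {}
    for token in content_model.strip().strip("()").split(","):
        match = re.fullmatch(r"\s*([\w.-]+)\s*([?*+]?)\s*", token)
        if match is None:
            raise ValueError(f"unsupported content particle: {token!r}")
        name, indicator = match.groups()
        constraints[name] = {
            "required": indicator in ("", "+"),    # no indicator or "+" means at least one
            "repeatable": indicator in ("*", "+"),  # "*" or "+" means possibly many
        }
    return constraints


def inline_columns(constraints: Dict[str, Dict[str, bool]]) -> List[str]:
    """Column definitions for children that occur at most once.
    Repeatable children are skipped here; they would need a separate table."""
    return [
        f"{name} TEXT{' NOT NULL' if c['required'] else ''}"
        for name, c in constraints.items()
        if not c["repeatable"]
    ]


paper = child_constraints("(title, author+, year?)")
columns = inline_columns(paper)
```

A structure-only mapping would emit the same columns without `NOT NULL`, which is the kind of lost information the abstract refers to when it calls existing algorithms incomplete.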

79 citations

Journal Article•10.1016/S0169-023X(01)00041-6•

View selection for designing the global data warehouse

Dimitri Theodoratos¹, Spyros Ligoudistianos¹, Timos Sellis¹•Institutions (1)

National Technical University of Athens¹

1 Dec 2001

TL;DR: A general method is provided that, given a set of select-project-join queries to be satisfied by the DW, generates sets of materialized views that satisfy all the input queries.

Abstract: A global data warehouse (DW) integrates data from multiple distributed heterogeneous databases and other information sources. A global DW can be abstractly seen as a set of materialized views. The selection of views for materialization in a DW is an important decision in the design of a DW. Current commercial products do not provide tools for automatic DW design. We provide a general method that, given a set of select-project-join queries to be satisfied by the DW, generates sets of materialized views that satisfy all the input queries. This process is complex since 'common subexpressions' between the queries need to be detected and exploited. Our method is then applied to solve the problem of selecting a materialized view set that fits in the space allocated to the DW for materialization and minimizes the combined overall query evaluation and view maintenance cost. We design algorithms which are implemented and we report on their experimental evaluation.

71 citations

Journal Article•10.1016/S0169-023X(01)00008-8•

Mining patterns from graph traversals

Alexandros Nanopoulos, Yannis Manolopoulos¹•Institutions (1)

University of Cyprus¹

1 Jun 2001

TL;DR: The problem of finding traversal patterns from collections of frequently occurring access sequences is examined, and three algorithms are presented: one that is level-wise with respect to the lengths of the patterns and two that are not.

Abstract: In data models that have graph representations, users navigate following the links of the graph structure. Conducting data mining on collected information about user accesses in such models involves the determination of frequently occurring access sequences. In this paper, the problem of finding traversal patterns from such collections is examined. The determination of patterns is based on the graph structure of the model. For this purpose, three algorithms are presented: one that is level-wise with respect to the lengths of the patterns and two that are not. Additionally, we consider the fact that accesses within patterns may be interleaved with random accesses due to navigational purposes. The definition of the pattern type generalizes existing ones in order to take this fact into account. The performance of all algorithms and their sensitivity to several parameters are examined experimentally.

71 citations

Journal Article•10.1016/S0169-023X(00)00034-3•

Multiple abstraction levels in modelling product structures

Tomi Männistö¹, Hannu Peltonen¹, Timo Soininen¹, Reijo Sulonen¹•Institutions (1)

Helsinki University of Technology¹

1 Jan 2001

TL;DR: A novel mechanism, based on generic models of product individuals organised into a specialisation hierarchy, is defined to support multiple abstraction levels, together with a set of transformation operations on models for creating such hierarchies.

Abstract: The need for product customisation is driving industrial companies towards a very large product variety, which affects many functions of a company, including the after-sales. Systematic maintenance records of very different product individuals cannot be kept without an abstract view to the population of delivered products. However, the older the product individual, the less systematically recorded information there usually is about it. We defined a novel mechanism based on generic models of product individuals organised into a specialisation hierarchy to support multiple abstraction levels. For creating such hierarchies, we defined a set of transformation operations on models.

55 citations

Journal Article•10.1016/S0169-023X(01)00007-6•

Materialized view selection under the maintenance time constraint

Weifa Liang¹, Hui Wang², Maria E. Orlowska²•Institutions (2)

Australian National University¹, University of Queensland²

1 May 2001

TL;DR: The view selection problem under the maintenance time constraint is investigated, and two efficient heuristic algorithms are proposed that reduce the problem to well-solved optimization problems.

Abstract: A data warehouse is a data repository which collects and maintains a large amount of data from multiple distributed, autonomous and possibly heterogeneous data sources. Often the data is stored in the form of materialized views in order to provide fast access to the integrated data. One of the most important decisions in designing a data warehouse is the selection of views for materialization. The objective is to select an appropriate set of views that minimizes the total query response time with the constraint that the total maintenance time for these materialized views is within a given bound. This view selection problem is totally different from the view selection problem under the disk space constraint. In this paper the view selection problem under the maintenance time constraint is investigated. Two efficient heuristic algorithms for the problem are proposed. The key to devising the proposed algorithms is to define good heuristic functions and to reduce the problem to some well-solved optimization problems. As a result, an approximate solution of the known optimization problem will give a feasible solution of the original problem.

Journal Article•10.1016/S0169-023X(01)00017-9•

A smart web query method for semantic retrieval of web data

Roger H. L. Chiang¹, Cecil Eng Huang Chua², Veda C. Storey²•Institutions (2)

University of Cincinnati¹, Georgia State University²

1 Jul 2001

TL;DR: This research proposes a smart web query (SWQ) method for the semantic retrieval of web data that uses domain semantics represented as context ontologies to specify and formulate appropriate web queries to search and relies on semantic search filters to identify and rank relevant web pages semi-automatically.

Abstract: The efficient query and extraction of web data is often difficult, because web data does not conform to any data organization standard. In addition, the development of web search technology is still at a relatively early stage. Search engines provide only primitive data query capabilities, and require a detailed syntactic specification to retrieve relevant data. Furthermore, web data exists in a myriad of formats, including PDF documents, images, and sound clips, that are difficult to query. This research proposes a smart web query (SWQ) method for the semantic retrieval of web data. The SWQ method uses domain semantics represented as context ontologies to specify and formulate appropriate web queries to search. This method also relies on semantic search filters to identify and rank relevant web pages semi-automatically. Unlike traditional ontologies that are structured in a hierarchy, terms and their relationships that pertain to a particular domain are organized with a flexible structure by the context ontologies. An SWQ engine is being developed to test the proposed method. Financial trading (e.g. stocks, bonds, unit trusts) is adopted as an example domain (i.e., context) to test and validate the SWQ method and engine.

Journal Article•10.1016/S0169-023X(00)00052-5•

Thesaurus construction through knowledge representation

Sean Bechhofer¹, Carole Goble¹•Institutions (1)

University of Manchester¹

1 Apr 2001

TL;DR: A knowledge representation scheme, a Description Logic (DL), is described, and it is shown through an example how a DL can play a part in the classification construction process, aiding in the production of coherent hierarchies and ensuring that the relationships represented in a thesaurus are sensible.

Abstract: Semantic metadata describing subject content plays a vital role in supporting indexing and retrieval in Digital Libraries. Mechanisms used to deliver this metadata include keyword collections, thesauri and classifications. Constructing a large thesaurus, however, is a difficult process which can be facilitated through the application of knowledge representation techniques developed for managing and reasoning about concepts. We describe such a scheme – a Description Logic (DL) – and show through an example how a DL can play a part in the classification construction process, aiding in the production of coherent hierarchies and ensuring that the relationships represented in a thesaurus are sensible.

Journal Article•10.1016/S0169-023X(00)00043-4•

View integration of behavior in object-oriented databases

Günter Preuner¹, Stefan Conrad², Michael Schrefl³•Institutions (3)

Johannes Kepler University of Linz¹, Ludwig Maximilian University of Munich², University of South Australia³

1 Feb 2001

TL;DR: An approach to integrate object life-cycles in object-oriented database schemas using a notation based on Petri nets is presented and different methods of integration depending on identified semantic correspondences between the object types in the views are introduced.

Abstract: Database schemas are often not defined by a single person but by several future users of the database who define their possibly different views on the proposed database schema. These views are collected and integrated during the subsequent design step of view integration. In object-oriented databases, view integration must handle the integration of the structure and the behavior of objects. Whereas integration of object structure has been treated in the realm of semantic data models in the past, integration of object behavior has received little attention so far. Behavior is usually defined at two levels of detail: by activities, which correspond to methods in object-oriented languages, and by object life-cycles, which represent the overall behavior of objects during their life time. This paper presents an approach to integrate object life-cycles in object-oriented database schemas using a notation based on Petri nets. We will introduce different methods of integration depending on identified semantic correspondences between the object types in the views.

Journal Article•10.1016/S0169-023X(00)00027-6•

Towards the building of a dense-region-based OLAP system

David W. Cheung¹, Bo Zhou², Ben Kao¹, Hu Kan³, Sau Dan Lee¹•Institutions (3)

University of Hong Kong¹, Zhejiang University², Tsinghua University³

1 Jan 2001

TL;DR: This paper defines the dense region location problem as an optimization problem and develops a chunk scanning algorithm to compute dense regions, and proves a lower bound on the accuracy of the dense regions computed.

Abstract: On-line analytical processing (OLAP) has become a very useful tool in decision support systems built on data warehouses. Relational OLAP (ROLAP) and multidimensional OLAP (MOLAP) are two popular approaches for building OLAP systems. These two approaches have very different performance characteristics: MOLAP has good query performance but bad space efficiency, while ROLAP can be built on mature RDBMS technology but needs sizable indices to support it. Many data warehouses contain many small clusters of multidimensional data (dense regions), with sparse points scattered around in the rest of the space. For these databases, we propose that the dense regions be located and separated from the sparse points. The dense regions can subsequently be represented by small MOLAPs, while the sparse points are put in a ROLAP table. Thus the MOLAP and ROLAP approaches can be integrated in one structure to build a high performance and space efficient dense-region-based data cube. In this paper, we define the dense region location problem as an optimization problem and develop a chunk scanning algorithm to compute dense regions. We prove a lower bound on the accuracy of the dense regions computed. Also, we analyze the sensitivity of the accuracy to user inputs. Finally, extensive experiments are performed to study the efficiency and accuracy of the proposed algorithm.

Journal Article•10.1016/S0169-023X(01)00026-X•

Performance evaluation of the object-relational transformation methodology

Johanna Wenny Rahayu¹, Elizabeth Chang¹, Tharam S. Dillon¹, David Taniar²•Institutions (2)

La Trobe University¹, Monash University²

1 Sep 2001

TL;DR: The results of the evaluation show that the performance of the RDB implementation transformed from an OO conceptual model using the object-relational transformation methodology is better than that of the relational implementation using conventional relational modelling.

Abstract: The emergence of the object-oriented (OO) methodology has shown its capabilities in modelling the real world better than the earlier relational methodology. However, object-oriented databases (OODBs) are still considered immature in comparison with relational databases (RDBs), which have existed for many years. RDBs still continue to dominate the implementation of databases, constituting more than 90% of all database implementations [28]. It was felt worthwhile to exploit the great modelling power of OO methodology, while still facilitating relational implementations. These reasons have led us to develop an object-relational transformation methodology [20], [21], [22], [23], [24], [25] which allows us to use the OO methodology for data modelling and to transform it into a relational logical model for implementation in relational database management systems (RDBMSs). The main purpose of this paper is to present a performance evaluation of the transformation methodology. The evaluation covers I/O cost models of different types of queries. The evaluation is basically comparison-based: the performance of SQL operations upon a set of tables derived from the relational data model is compared with that upon the tables derived from the OO data model using the transformation methodology. The results of the evaluation show that the performance of the RDB implementation transformed from an OO conceptual model using our object-relational transformation methodology is better than that of the relational implementation using conventional relational modelling. Moreover, in many cases, relational modelling is not applicable since it cannot capture the design semantics, particularly those relating to collection types. Our object-relational methodology solves this problem.

Journal Article•10.1016/S0169-023X(00)00044-6•

Selecting and materializing horizontally partitioned warehouse views

Christie I. Ezeife¹•Institutions (1)

University of Windsor¹

1 Feb 2001

TL;DR: A method for selecting and materializing views is proposed, which selects and horizontally fragments a view, then recomputes the size of the stored partitioned view while deciding which further views to select.

Abstract: Data warehouse views typically store large aggregate tables based on a subset of dimension attributes of the main data warehouse fact table. Aggregate views can be stored as 2^n subviews of a data cube with n attributes. Methods have been proposed for selecting only some of the data cube views to materialize in order to speed up query response time, accommodate storage space constraints and reduce warehouse maintenance cost. This paper proposes a method for selecting and materializing views, which selects and horizontally fragments a view, then recomputes the size of the stored partitioned view while deciding which further views to select.
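The count of aggregate subviews follows from taking one group-by per subset of the cube's attributes, which gives 2^n subviews for n attributes. A minimal sketch that enumerates them is shown below; the function name `cube_subviews` and the three dimension names are assumptions for illustration, and the sketch does not perform any view selection or fragmentation.

```python
from itertools import combinations
from typing import List, Tuple


def cube_subviews(dimensions: List[str]) -> List[Tuple[str, ...]]:
    """List every group-by subset of the dimensions, from the most detailed
    view (all dimensions) down to the fully aggregated one (no dimensions)."""
    views: List[Tuple[str, ...]] = []
    for size in range(len(dimensions), -1, -1):
        views.extend(combinations(dimensions, size))
    return views


views = cube_subviews(["part", "supplier", "customer"])
```

Because the number of candidates doubles with every added attribute, materializing all of them quickly becomes impractical, which is why selection methods such as the one proposed in this paper materialize only some of the views.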

Journal Article•10.1016/S0169-023X(01)00006-4•

Cover story management

Frédéric Cuppens, Alban Gabillon¹•Institutions (1)

University of the South, Toulon-Var¹

1 May 2001

TL;DR: This paper defines a new technique to manage cover stories and proposes a formal representation of a multilevel database containing cover stories that can be interpreted for any kind of database (e.g., relational, object-oriented, etc.).

Abstract: In a multilevel database, cover stories are usually managed using the ambiguous technique of polyinstantiation. In this paper, we define a new technique to manage cover stories and propose a formal representation of a multilevel database containing cover stories. Our model aims to be a generic model, that is, it can be interpreted for any kind of database (e.g., relational, object-oriented, etc.). We then consider the problem of updating a multilevel database containing cover stories managed with our technique.

Journal Article•10.1016/S0169-023X(01)00002-7•

Prefetch policies for large objects in a web-enabled GIS application

Dongjoo Park¹, Hyoung-Joo Kim¹•Institutions (1)

Seoul National University¹

1 Apr 2001

TL;DR: This paper proposes several prefetch policies, especially Hilbert curve-based ones, which can reduce user response time under the assumption that the user callback access pattern has spatial locality, and shows that the prefetch strategies based on the Hilbert curve achieve higher efficiency than the naive or no-prefetch alternatives.

Abstract: In a Web-enabled geographic information system (GIS) application, it would be possible for users to navigate existing spatial objects (e.g., points, lines) or spatial query results containing large objects (e.g., raster images, Web documents) on the Web browser. For efficiency, relatively 'light' spatial objects exist on the Web browser, while 'heavy' real information like large objects resides in the remote server. Only when a user calls back the real information for a certain spatial object does the server transmit it to the browser. In this paper, we propose several prefetch policies, especially Hilbert curve-based ones, which can reduce user response time under the assumption that the user callback access pattern has spatial locality. We conducted diverse experiments to show that our prefetch strategies based on the Hilbert curve achieve higher efficiency than the naive or no-prefetch alternatives.
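The idea of using a space-filling curve for prefetching can be sketched as follows: cells that are close along the curve tend to be close in space, so objects with nearby curve positions are natural prefetch candidates. The code uses the standard iterative conversion from grid coordinates to a position on the curve; `prefetch_candidates`, its `window` parameter, and the grid layout are hypothetical and are not the policies evaluated in the paper.

```python
from typing import Dict, List, Tuple


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the curve over an n x n grid.
    n must be a power of two and 0 <= x, y < n."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def prefetch_candidates(
    objects: Dict[str, Tuple[int, int]], requested: str, n: int, window: int
) -> List[str]:
    """Ids of objects whose curve position is within `window` of the requested
    object's position, nearest first (a hypothetical policy for illustration)."""
    order = {oid: hilbert_index(n, x, y) for oid, (x, y) in objects.items()}
    centre = order[requested]
    near = [
        oid for oid, d in order.items()
        if oid != requested and abs(d - centre) <= window
    ]
    return sorted(near, key=lambda oid: abs(order[oid] - centre))
```

When a user calls back the large object attached to one spatial object, a server following this sketch would also queue the large objects of the returned candidates, on the assumption that the next callback is likely to be spatially close.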

Journal Article•10.1016/S0169-023X(00)00035-5•

Modeling relationships in object-relational databases

Christian Soutou¹•Institutions (1)

University of Toulouse¹

1 Jan 2001

TL;DR: An approach for designing an object-relational database that inspects each case of semantic relationships and proposes some solutions for implementation, and makes a comparison with the relational data model.

Abstract: We propose an approach for designing an object-relational database. We inspect each case of semantic relationships (one-to-one, one-to-many, many-to-many and n-ary). For each of these cases we list the different solutions for implementation and propose some solutions. We also make a comparison with the relational data model. Our results can be applied to current RDBMS including object extensions (Oracle8, DB2-IBM, Informix…).

Journal Article•10.1016/S0169-023X(00)00050-1•

Landscaping the information space of large multi-database networks

Mike P. Papazoglou¹, Henderik A. Proper, Jian Yang¹•Institutions (1)

Tilburg University¹

1 Mar 2001

TL;DR: A conceptual architecture is presented for the organization of the information space across collections of component systems in a multi-database network; it provides serendipity, exploration and contextualisation support so that users can make logical connections between concepts they are familiar with and schema terms employed in multi-database systems.

Abstract: The promises of network-accessible information are increasingly difficult to achieve. These difficulties are due to a variety of causes, such as the rapid growth in the volume of network-available information and the increasing complexity, diversity and terminological fluctuations of the different information sources available. This paper presents a conceptual architecture for the organization of the information space across collections of component systems in a multi-database network that provides serendipity, exploration and contextualisation support so that users can make logical connections between concepts they are familiar with and schema terms employed in multi-database systems. Large-scale searching for multi-database schema information is guided by a combination of lexical, structural and semantic aspects of schema terms in order to reveal more meaning both about the contents of an information term and about its placement within the distributed information space.

Journal Article•10.1016/S0169-023X(01)00037-4•

Building virtual web views

Osmar R. Zaïane¹•Institutions (1)

University of Alberta¹

1 Nov 2001

TL;DR: Warehousing the web in this context consists of creating different virtual web views with layered databases of descriptors organized hierarchically and using a declarative ad hoc mining language to locate explicit as well as implicit knowledge from the web warehouse.

Abstract: We are so used to the ubiquitous World-Wide Web (WWW) that we take it for granted. There is no need to emphasize how dynamic, large, rich, and unstructured, yet important the web is. From researchers and engineers to children and retired elderly, everyone uses the WWW for a variety of needs. A multitude of tools and search engines were developed to find and retrieve resources from the web. However, everyone knows how frustrating the experience with search engines can be. It is very difficult to find, if ever found, relevant information or patterns from within resources on the Internet. The idea presented in this paper is to “warehouse” the web in a structure that would allow efficient information retrieval and knowledge discovery from the Internet. Warehousing the web in this context consists of creating different virtual web views with layered databases of descriptors organized hierarchically. Using a declarative ad hoc mining language, one can find and pinpoint explicit as well as implicit knowledge from the web warehouse.

Journal Article•10.1016/S0169-023X(01)00031-3•

Decomposing relationship types by pivoting and schema equivalence

Sven Hartmann¹•Institutions (1)

University of Rostock¹

1 Oct 2001

TL;DR: This work proves the necessity of path cardinality constraints, and gives an appropriate foundation for the notion of pivoting such that decomposing relationship types no longer requires the existence of a given unary functional dependency.

Abstract: In the relational data model, the problem of data redundancy has been successfully tackled via decomposition. In advanced data models, decomposition by pivoting provides a similar concept. Pivoting has been introduced by Biskup, Menzel and Polle, and used for decomposing relationship types according to a unary non-key functional dependency. Our objective is to study pivoting in the presence of cardinality constraints which are commonly used in semantic data models. For this, we generalize the notion of pivoting such that decomposing relationship types no longer requires the existence of a given unary functional dependency. In order to ensure the equivalence of the given schema and its image under pivoting, the original application-dependent constraints have to be preserved. We discuss this problem for sets of participation and co-occurrence constraints. In particular, we prove the necessity of path cardinality constraints, and give an appropriate foundation for this concept.

Journal Article•10.1016/S0169-023X(01)00004-0•

An object-oriented database design for improved performance

Narasimhaiah Gorla¹•Institutions (1)

Wayne State University¹

1 May 2001

TL;DR: A methodology for the design of an efficient storage structure of OODB that minimizes the database operating costs and uses a genetic algorithm to solve the intractable problem of inheritance of instance variables.

Abstract: Object-oriented databases (OODBs) are known to be rich in functionality but poor in performance. One of the important factors that affect performance is the physical database design. We developed a methodology for the design of an efficient storage structure of OODB that minimizes the database operating costs. The input for our method is the logical OODB schema and set of user transactions of retrieval and update types. The output of our method is the determination of which instance variables should be inherited from direct and indirect superclasses and stored in which subclasses. We used a genetic algorithm (GA) to solve this intractable problem. The methodology was applied on a university database. Compared to previous storage models, the storage model produced with our methodology showed database performance improvement ranging from 26% to 31%, on the average. Our results demonstrate a cost-effective storage structure design that boosts the operating performance of OODBs.

Journal Article•10.1016/S0169-023X(00)00033-1•

A framework for programming multiversion databases

Stéphane Gançarski, Geneviève Jomier¹•Institutions (1)

Paris Dauphine University¹

1 Jan 2001

TL;DR: This paper describes operations according to the DataBase Version (DBV) model, which allows an efficient management of as many versions as needed of the real world entities in multiversion databases.

Abstract: Multiversion databases allow several states, or versions, of real-world entities to be represented in a database. To take into account the new dimension introduced by versioning, new operations must be added to conventional database programming languages. In this paper, we describe such operations according to the DataBase Version (DBV) model, which allows an efficient management of as many versions as needed. Operations are first presented intuitively, then formal definitions of their syntax and semantics are given. The work presented is considered as a syntactical framework for the development of sophisticated tools for design applications and configuration management. Special attention is paid to operations on complex object versions.

Journal Article•10.1016/S0169-023X(01)00011-8•

Verification and validation of Bayesian knowledge-bases

Eugene Santos Jr.¹•Institutions (1)

University of Connecticut¹

1 Jun 2001

TL;DR: The PESKI project, which is concerned with assisting a human expert to build knowledge-based systems under uncertainty, is examined, with a focus on how verification and validation are currently achieved in PESKI.

Abstract: Knowledge-base V&V primarily addresses the question: “Does my knowledge-base contain the right answer and can I arrive at it?” One of the main goals of our work is to properly encapsulate the knowledge representation and allow the expert to work with manageable-sized chunks of the knowledge-base. This work develops a new methodology for the verification and validation of Bayesian knowledge-bases that assists in constructing and testing such knowledge-bases. Assistance takes the form of ensuring that the knowledge is syntactically correct, correcting “imperfect” knowledge, and also identifying when the current knowledge-base is insufficient as well as suggesting ways to resolve this insufficiency. The basis of our approach is the use of probabilistic network models of knowledge. This provides a framework for formally defining and working on the problems of uncertainty in the knowledge-base. In this paper, we examine the PESKI project which is concerned with assisting a human expert to build knowledge-based systems under uncertainty. We focus on how verification and validation are currently achieved in PESKI.

Journal Article•10.1016/S0169-023X(00)00042-2•

A formal basis for an abbreviated concept-based query language

Vesper Owei¹, Shamkant B. Navathe²•Institutions (2)

University of Illinois at Chicago¹, Georgia Institute of Technology²

1 Feb 2001

TL;DR: It is shown that, while CQL is easy to use and user-friendly, it is nonetheless more than first-order complete.

Abstract: Concept-based query languages allow users to specify queries directly against conceptual schemas. The primary goal of their development is ease-of-use and user-friendliness. However, existing concept-based query languages require the end-user to explicitly specify query paths in totality, thereby rendering such systems not as easy to use and user-friendly as they could be. The conceptual query language (CQL) discussed in this paper also allows end-users to specify queries directly against the conceptual schemas of database applications, using concepts and constructs that are native to and exist on the schemas. Unlike other existing concept-based query languages, however, CQL queries are abbreviated, i.e., the entire path of a query does not have to be specified. CQL is, therefore, an abbreviated concept-based query language. CQL is developed with the aim of combining the ease-of-use and user-friendliness of concept-based languages with the power of formal languages. It does not require end-users to be familiar with the structure and organization of the application database, but only with the content. Therefore, it makes minimal demands on end-users' cognitive knowledge of database technology without sacrificing expressive power. In this paper, the formal semantics and the theoretical basis of CQL are presented. It is shown that, while CQL is easy to use and user-friendly, it is nonetheless more than first-order complete. A contribution of this study is the use of the semantic roles played by entities in their associations with other entities to support abbreviated conceptual queries. Although only mentioned here in passing, a prototype of CQL has been implemented as a front-end to a relational database manager.

Journal Article•10.1016/S0169-023X(01)00014-3•

WordNet++: a lexicon for the color-X-method

Frank Dehne¹, Ans A. G. Steuten¹, Reind P. van de Riet¹•Institutions (1)

VU University Amsterdam¹

1 Jul 2001

TL;DR: An extension of WordNet is presented, which contains a number of special types of relationships that are not available in WordNet and can be used in a special CASE tool for supporting Color-X.

Abstract: In this article we discuss what kind of information can be obtained from WordNet and what kind of information should be added to WordNet in order to make it better suited for the support of the Color-X method. We will present an extension of WordNet (called WordNet++), which contains a number of special types of relationships that are not available in WordNet. Additionally, WordNet++ is instantiated with knowledge about some particular domain. Also, we will show some example scenarios for how WordNet++ can be used for verification of Color-X models and we will show how this lexicon can be used in a special CASE tool for supporting Color-X.

Journal Article•10.1016/S0169-023X(01)00015-5•

Traceability between models and texts through terminology

Farid Cerbah¹, Jérôme Euzenat²•Institutions (2)

Dassault Aviation¹, French Institute for Research in Computer Science and Automation²

1 Jul 2001

TL;DR: This work claims that inserting a terminology between informal textual documents and their formalization can help to support both the translation process and its traceability.

Abstract: Modeling often concerns the translation of informal texts into formal representations. This translation process requires support for itself and for its traceability. We claim that inserting a terminology between informal textual documents and their formalization can help to serve both of these goals. Modern terminology extraction tools support the formalization process by using terms as a first sketch of formalized concepts. Moreover, the terms can be employed for linking the concepts and the textual sources. They act as a powerful navigation structure. This is exemplified through the presentation of a fully implemented system.

Journal Article•10.1016/S0169-023X(01)00029-5•

ROVER: flexible yet consistent evolution of relationships

Kajal T. Claypool¹, Elke A. Rundensteiner¹, George T. Heineman¹•Institutions (1)

Worcester Polytechnic Institute¹

1 Oct 2001

TL;DR: This paper is the first to propose comprehensive support for relationship evolution during schema evolution, and presents an approach that de-couples the constraints from the schema evolution code, thereby enabling their update without any re-coding effort.

Abstract: Relationships have been repeatedly identified as an important object-oriented modeling construct. Most emerging modeling standards such as the object database management group (ODMG) object model and UML have some support for relationships. However object-oriented database (OODB) systems have largely ignored the existence of relationships during schema evolution. We are the first to propose comprehensive support for relationship evolution. A complete schema evolution facility for any OODB system must provide primitives to manipulate all object model constructs, and maintenance strategies for the structural and referential integrity of the database under such evolution. We propose a set of basic evolution primitives for relationships as well as a compound set of changes that can be applied to the same. However, given the myriad of possible change semantics a user may desire in the future, any pre-defined set is not sufficient. Rather we present a flexible schema evolution framework that allows the user to define new relationship transformations as well as to extend existing ones. Addressing the second problem, namely of updating schema evolution primitives to conform to the new set of invariants, can be a very expensive re-engineering effort. In this paper we present an approach that de-couples the constraints from the schema evolution code, thereby enabling their update without any re-coding effort. We also present an approach that can be used to verify the correctness of these complex evolution operations using the de-coupled constraints.
