Top 55 papers published in Data and Knowledge Engineering in 2004


Journal Article•10.1016/J.DATAK.2004.03.010•

AGENTWORK: a workflow system supporting rule-based workflow adaptation


Robert Müller¹, Ulrike Greiner¹, Erhard Rahm¹•Institutions (1)

1 Nov 2004

TL;DR: AGENTWORK is a workflow management system that supports automated workflow adaptations in a comprehensive way; it uses temporal estimates to determine which remaining parts of running workflows are affected by an exception and can predictively perform suitable adaptations.


Abstract: Current workflow management systems still lack support for dynamic and automatic workflow adaptations. However, this functionality is a major requirement for next-generation workflow systems to provide sufficient flexibility to cope with unexpected failure events. We present the concepts and implementation of AGENTWORK, a workflow management system supporting automated workflow adaptations in a comprehensive way. A rule-based approach is followed to specify exceptions and necessary workflow adaptations. AGENTWORK uses temporal estimates to determine which remaining parts of running workflows are affected by an exception and is able to predictively perform suitable adaptations. This helps to ensure that necessary adaptations are performed in time with minimal user interaction which is especially valuable in complex applications such as for medical treatments.


358 citations

Journal Article•10.1016/S0169-023X(03)00103-4•

Building and maintaining ontologies: a set of algorithms


Nadira Lammari¹, Elisabeth Métais¹•Institutions (1)

Conservatoire national des arts et métiers¹

1 Feb 2004

TL;DR: Algorithms are provided to obtain a normalized hierarchy starting either from concepts or from instances using Boolean functions, and a way to give synthetic views of the hierarchy is provided.


Abstract: "Is_A" links are the core component of all ontologies and are organized into "hierarchies of concepts". In this paper we first address the problem of providing automatic help for building sound hierarchies. Dependencies called "existence constraints" are the foundation for the definition of a "normalized" hierarchy of concepts. In the first part of the paper, algorithms are provided to obtain a normalized hierarchy starting either from concepts or from instances using Boolean functions. The second part of the paper is devoted to hierarchy maintenance: automatically inserting, merging or removing pieces of knowledge. We also provide a way to give synthetic views of the hierarchy.


117 citations

Journal Article•10.1016/J.DATAK.2003.06.002•

Pushing the envelope: challenges in a frame-based representation of human anatomy


Natalya F. Noy¹, Mark A. Musen¹, José L. V. Mejino², Cornelius Rosse²•Institutions (2)

Stanford University¹, University of Washington²

1 Mar 2004

TL;DR: This work shows that traditional frame-based techniques such as is-a hierarchies, slots (roles) and role restrictions are not sufficient for a comprehensive model of this domain, and posits that even though the modeling structure imposed by frame-based systems may sometimes lead to complicated solutions, it is still worthwhile to use frame-based representation for very large-scale projects such as this one.


Abstract: One of the main threads in the history of knowledge-representation formalisms is the trade-off between the expressiveness of first-order logic on the one hand and the tractability and ease-of-use of frame-based systems on the other hand. Frame-based systems provide intuitive, cognitively easy-to-understand, and scalable means for modeling a domain. However, when a domain model is particularly complex, frame-based representation may lead to complicated and sometimes awkward solutions. We have encountered such problems when developing the Digital Anatomist Foundational Model, an ontology aimed at representing comprehensively the physical organization of the human body. We show that traditional frame-based techniques such as is-a hierarchies, slots (roles) and role restrictions are not sufficient for a comprehensive model of this domain. The diverse modeling challenges and problems in this project required us to use such knowledge-representation techniques as reified relations, metaclasses and a metaclass hierarchy, different propagation patterns for template and own slots, and so on. We posit that even though the modeling structure imposed by frame-based systems may sometimes lead to complicated solutions, it is still worthwhile to use frame-based representation for very large-scale projects such as this one.


101 citations

Journal Article•10.1016/J.DATAK.2003.08.007•

Algorithms for processing K-closest-pair queries in spatial databases


Antonio Corral¹, Yannis Manolopoulos², Yannis Theodoridis³, Michael Vassilakopoulos⁴•Institutions (4)

University of Almería¹, Aristotle University of Thessaloniki², University of Piraeus³, Technological Educational Institute of Thessaloniki⁴

1 Apr 2004

TL;DR: A pruning heuristic and two updating strategies for minimizing the pruning distance are proposed and used in the design of three non-incremental branch-and-bound algorithms for K-CPQ between spatial objects stored in two R-trees.


Abstract: This paper addresses the problem of finding the K closest pairs between two spatial datasets (the so-called K closest pairs query, K-CPQ), where each dataset is stored in an R-tree. There are two different techniques for solving this kind of distance-based query. The first technique is the incremental approach, which returns the output elements one-by-one in ascending order of distance. The second one is the non-incremental alternative, which returns the K elements of the result all together at the end of the algorithm. In this paper, based on distance functions between two MBRs in the multidimensional Euclidean space, we propose a pruning heuristic and two updating strategies for minimizing the pruning distance, and use them in the design of three non-incremental branch-and-bound algorithms for K-CPQ between spatial objects stored in two R-trees. Two of those approaches are recursive, following a Depth-First searching strategy, and one is iterative, obeying a Best-First traversal policy. The plane-sweep method and the search ordering are used as optimization techniques for improving the naive approaches. In addition, a number of interesting extensions of the K-CPQ (K-Self-CPQ, Semi-CPQ, K-FPQ (the K-farthest pairs query), etc.) are discussed. An extensive performance study is also presented. This study is based on experiments performed with real datasets. A wide range of values for the basic parameters affecting the performance of the algorithms is examined in order to designate the most efficient algorithm for each setting of parameter values. Finally, an experimental study of the behavior of the proposed K-CPQ branch-and-bound algorithms in terms of scalability of the dataset size and the K value is also included.


92 citations
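
To make the K-CPQ query in the abstract above concrete, here is a minimal brute-force sketch in Python. It does not use R-trees, MBR distance functions, or the paper's branch-and-bound algorithms; it only illustrates the notion of a pruning distance, taken here to be the K-th smallest distance found so far and kept at the top of a max-heap. The function name `k_closest_pairs` and the tuple representation of points are assumptions made for this illustration.

```python
import heapq
import math


def k_closest_pairs(P, Q, k):
    """Return the k closest (distance, p, q) pairs between point sets P and Q.

    Brute-force illustration only: every pair is examined. The max-heap keeps
    the k best pairs seen so far, and its largest distance acts as the
    pruning distance.
    """
    if k <= 0:
        return []
    heap = []  # stores (-distance, p, q) so the worst kept pair is on top
    for p in P:
        for q in Q:
            d = math.dist(p, q)
            if len(heap) < k:
                heapq.heappush(heap, (-d, p, q))
            elif d < -heap[0][0]:
                # Closer than the current pruning distance: replace the worst pair.
                heapq.heapreplace(heap, (-d, p, q))
    # Report pairs in ascending order of distance.
    return sorted((-neg_d, p, q) for neg_d, p, q in heap)
```

In an R-tree based algorithm the same pruning distance would presumably be compared against lower bounds on the distance between two MBRs, so that whole pairs of subtrees can be skipped instead of individual point pairs.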

Journal Article•10.1016/J.DATAK.2003.03.001•

Transaction policies for service-oriented computing


Stefan Tai¹, Thomas Mikalsen¹, Eric Wohlstadter², Nirmit Desai³, Isabelle Rouvellou¹•Institutions (3)

IBM¹, University of California, Davis², North Carolina State University³

1 Oct 2004

TL;DR: This paper argues for the use of declarative policy assertions to advertise and match support for different transaction styles, and introduces the concept of and system support for transaction coupling modes as the policy-based contracts guiding transactional business process execution.


Abstract: Service-oriented computing is emerging as a distributed computing model where autonomous services interact with each other using standard Internet technology. In addition to the application-specific functions that services provide, (different) services may also support (different) sets of protocols and formats addressing extra-functional concerns such as transaction processing and reliable messaging. This raises the need for services to complement their functional service descriptions with descriptions of extra-functional capabilities, requirements, and/or preferences, which must be matched and enforced for service interactions. In this paper, we address the problem of transactional coordination in service-oriented computing. We argue for the use of declarative policy assertions to advertise and match support for different transaction styles (direct transaction processing, queued transaction processing, and compensation-based transaction processing), and introduce the concept of and system support for transaction coupling modes as the policy-based contracts guiding transactional business process execution. We focus on concrete, protocol-specific policies that apply to relevant Web services specifications. Using transaction policies and our middleware system, we are able to support a reliable SOC environment.


77 citations

Journal Article•10.1016/S0169-023X(03)00105-8•

Content-based text querying with ontological descriptors


Troels Andreasen¹, Per Anker Jensen², Jørgen Fischer Nilsson³, Patrizia Paggio, Bolette Sandford Pedersen, Hanne Erdman Thomsen⁴•Institutions (4)

Roskilde University¹, University of Southern Denmark², Technical University of Denmark³, Copenhagen Business School⁴

1 Feb 2004

TL;DR: A method and a system for content-based querying of texts based on the availability of an ontology for the concepts in the text domain and the extraction of conceptual content of noun phrases into descriptors forming an integral part of the ontology are described.


Abstract: This paper describes a method and a system for content-based querying of texts based on the availability of an ontology for the concepts in the text domain. A key principle in the system is the extraction of conceptual content of noun phrases into descriptors forming an integral part of the ontology. The retrieval of text passages rests on matching descriptors from the text against descriptors from the noun phrases in the query. The match does not need to be exact but is mediated by the ontology, invoking in particular taxonomic reasoning with sub- and super-concepts. The paper also reports on a prototype implementation of the system.


76 citations
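
The inexact, ontology-mediated matching mentioned in the abstract above can be illustrated with a small Python sketch. It reduces descriptors to single concept names and treats two concepts as matching when they are equal or related as sub- and super-concept. The toy `IS_A` table and the function names are hypothetical; the descriptors and reasoning in the paper's system are richer than this.

```python
# Hypothetical toy ontology: each concept maps to its direct super-concept.
IS_A = {
    "poodle": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
}


def ancestors(concept):
    """Return the chain of super-concepts of `concept`, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def matches(query_concept, text_concept):
    """True if the concepts are equal or related as sub-/super-concept."""
    return (
        query_concept == text_concept
        or query_concept in ancestors(text_concept)
        or text_concept in ancestors(query_concept)
    )
```

With this table, a query about `dog` would match a passage mentioning `poodle`, while `cat` and `dog` would not match each other, since neither is a super-concept of the other.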

Journal Article•10.1016/J.DATAK.2004.03.006•

An EREC framework for e-contract modeling, enactment and monitoring


P. Radha Krishna¹, Kamalakar Karlapalem², Dickson K.W. Chiu•Institutions (2)

Institute for Development and Research in Banking Technology¹, International Institute of Information Technology²

1 Oct 2004

TL;DR: This paper presents an EREC framework for designing e-contract processes, a mechanism that allows modeling, enactment and monitoring; the framework centers on the EREC model, which bridges between the XML contract document and the Web Services based implementation model of an e-contract.


Abstract: A contract is an agreement between two or more parties to create mutual business relations or legal obligations. It defines a set of activities to be performed by different parties satisfying a set of terms and conditions (clauses). An e-contract is a contract modeled, specified, executed and enacted (controlled and monitored) by a software system (such as a workflow system). As contracts are complex, their enactment is predominantly established and fulfilled with significant human involvement. This necessitates a comprehensive framework for generic fulfillment of e-contracts. In this paper, we present an EREC framework for designing e-contract processes, a mechanism that allows modeling, enactment and monitoring. This framework centers on the EREC model that bridges between the XML contract document and Web Services based implementation model of an e-contract.


62 citations

Journal Article•10.1016/S0169-023X(03)00108-3•

Multimedia indexing through multi-source and multi-language information extraction: the MUMIS project


Horacio Saggion¹, Hamish Cunningham¹, Kalina Bontcheva¹, Diana Maynard¹, Oana Hamza¹, Yorick Wilks¹•Institutions (1)

University of Sheffield¹

1 Feb 2004

TL;DR: This work describes the creation of a composite index from multiple and multi-lingual sources for the Multimedia Indexing and Searching Environment, a project aiming at developing technology to produce formal annotations about essential events in multimedia programme material.


Abstract: We describe our work on information extraction from multiple sources for the Multimedia Indexing and Searching Environment, a project aiming at developing technology to produce formal annotations about essential events in multimedia programme material. The creation of a composite index from multiple and multi-lingual sources is a unique aspect of this project. The domain chosen for tuning the software components and testing is football. Our information extraction system is based on the use of finite state machinery pipelined with full semantic analysis and discourse interpretation.


55 citations

Journal Article•10.1016/J.DATAK.2003.09.002•

Clustering classifiers for knowledge discovery from physically distributed databases


Grigorios Tsoumakas¹, Lefteris Angelis¹, Ioannis Vlahavas¹•Institutions (1)

Aristotle University of Thessaloniki¹

1 Jun 2004

TL;DR: This paper presents an approach for clustering distributed classifiers in order to discover groups of similar classifiers and thus similar databases with respect to a specific classification task, and shows that clustering distributed classifiers as a pre-processing step for classifier combination enhances the achieved predictive performance of the ensemble.


Abstract: Most distributed classification approaches view data distribution as a technical issue and combine local models aiming at a single global model. This, however, is unsuitable for inherently distributed databases, which are often described by more than one classification model that might differ conceptually. In this paper we present an approach for clustering distributed classifiers in order to discover groups of similar classifiers and thus similar databases with respect to a specific classification task. We also show that clustering distributed classifiers as a pre-processing step for classifier combination enhances the achieved predictive performance of the ensemble.


50 citations

Journal Article•10.1016/J.DATAK.2003.11.001•

Materialization of fragmented views in multidimensional databases


Matteo Golfarelli¹, Vittorio Maniezzo¹, Stefano Rizzi¹•Institutions (1)

University of Bologna¹

1 Jun 2004

TL;DR: This paper investigates the benefits of materializing views in vertical fragments and formalizes the fragmentation problem as a 0-1 integer linear programming problem, which is then solved by means of a standard integer programming solver to determine the optimal fragmentation for a given workload.


Abstract: The most effective technique to enhance the performance of multidimensional databases consists of materializing redundant aggregates called views. In the classical approach to materialization, each view includes all and only the measures of the cube it aggregates. In this paper we investigate the benefits of materializing views in vertical fragments, aimed at minimizing the workload response time. We formalize the fragmentation problem as a 0-1 integer linear programming problem, which is then solved by means of a standard integer programming solver to determine the optimal fragmentation for a given workload. Finally, we demonstrate the usefulness of fragmentation by presenting a large set of experimental results based on the TPC-H benchmark.


37 citations

Journal Article•10.1016/J.DATAK.2003.08.003•

The Tripod spatio-historical data model


Tony Griffiths¹, Alvaro A. A. Fernandes¹, Norman W. Paton¹, Robert Barr¹•Institutions (1)

University of Manchester¹

1 Apr 2004

TL;DR: A spatio-historical object model based on a specialized mechanism, called a history, for maintaining knowledge about entities that change over time is described and its capabilities in a crime data management application are illustrated.


Abstract: The storage and analysis of large amounts of time-varying spatial and aspatial data is becoming an important feature of many application domains. This has fuelled the need for spatio-temporal extensions to data models and their associated querying facilities. To date, much of this work has focused on the relational data model, with object data models receiving far less consideration. Where descriptions of such object models do exist, these models fail to fully integrate their spatial, aspatial and temporal dimensions into a uniform and coherent model. In addition, there is currently a lack of systems which build upon these models to produce database architectures that address the broad spectrum of issues related to the delivery of a fully functional spatio-temporal DBMS. This paper presents a foundation for the development of such a system, called Tripod, by describing a spatio-historical object model based on a specialized mechanism, called a history, for maintaining knowledge about entities that change over time. Key features of the resulting model include: (i) consistent representations of primitive spatial and timestamp types; (ii) a component-based design in which spatial, timestamp and historical extensions are formalized incrementally, for subsequent use together or separately; (iii) compatibility with mainstream query processing frameworks for object databases; and (iv) the integration of the spatio-temporal proposal with the ODMG object database standard. The paper presents a comprehensive formal characterization of the model and illustrates its capabilities in a crime data management application. It is also shown how the model can be programmed using an extension to the ODMG language bindings. The model and language bindings have been fully implemented.


Journal Article•10.1016/J.DATAK.2003.08.006•

Supporting temporal text-containment queries in temporal document databases


Kjetil Nørvåg¹•Institutions (1)

Norwegian University of Science and Technology¹

1 Apr 2004

TL;DR: This paper describes and discusses different index structures that can improve temporal text-containment querying and shows that even a very simple time-indexing approach can reduce query cost by up to three orders of magnitude.


Abstract: In temporal document databases and temporal XML databases, temporal text-containment queries are a potential performance bottleneck. In this paper we describe how to manage documents and index structures in such databases in a way that makes temporal text-containment querying feasible. We describe and discuss different index structures that can improve such queries. Three of the alternatives have been implemented in the V2 temporal document database system, and the performance of the index structures is studied using temporal web data. The results show that even a very simple time-indexing approach can reduce query cost by up to three orders of magnitude.
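
As a rough illustration of what a temporal text-containment query asks, the Python sketch below keeps an inverted index whose postings carry the validity interval of each document version, so a lookup can be restricted to versions valid at a given time. This is a generic sketch under assumed names (`TemporalTextIndex`, half-open intervals `[start, end)`, whitespace tokenization); it is not one of the index structures implemented in V2.

```python
from collections import defaultdict


class TemporalTextIndex:
    """Inverted index whose postings record when a document version was valid."""

    def __init__(self):
        # word -> list of (doc_id, start, end) with validity interval [start, end)
        self._postings = defaultdict(list)

    def add_version(self, doc_id, text, start, end):
        """Index one version of a document, valid from `start` up to `end`."""
        for word in set(text.lower().split()):
            self._postings[word].append((doc_id, start, end))

    def containing_at(self, word, t):
        """Return ids of documents whose version valid at time `t` contains `word`."""
        hits = {
            doc_id
            for doc_id, start, end in self._postings.get(word.lower(), [])
            if start <= t < end
        }
        return sorted(hits)
```

A posting list that stores intervals lets the query discard versions outside the requested time without fetching the documents themselves, which is the general kind of saving a time index aims for.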


Journal Article•10.1016/S0169-023X(04)00003-5•

Correctness criteria for dynamic changes in workflow systems--a survey


Stefanie Rinderle, Manfred Reichert, Peter Dadam

1 Jul 2004

TL;DR: This survey systematically classifies different approaches in the area of adaptive workflows and discusses their strengths and limitations along typical problems related to dynamic WF change.


Abstract: The capability to dynamically adapt in-progress workflows (WF) is an essential requirement for any workflow management system (WfMS). This fact has been recognized by the WF community for a long time and different approaches in the area of adaptive workflows have been developed so far. This survey systematically classifies these approaches and discusses their strengths and limitations along typical problems related to dynamic WF change. Along this classification we present important criteria for the correct adaptation of running workflows and analyze how actual approaches satisfy them. Furthermore, we provide a detailed comparison of these approaches and sketch important further issues related to dynamic change.


Journal Article•10.1016/S0169-023X(03)00107-1•

Grammatical specification of domain ontologies


Troels Andreasen¹, Jørgen Fischer Nilsson²•Institutions (2)

Roskilde University¹, Technical University of Denmark²

1 Feb 2004

TL;DR: The proposed domain specification methodology is applied to ontology-guided content-based information retrieval in text databases and is advanced also as a general purpose methodology for ontological engineering.


Abstract: This paper presents a formalism supporting the analysis and specification of domain ontologies. The method is founded theoretically on conventional context-free grammars. The use of production rules admits recursive formation of compound categories from given categories subjected to combinability constraints. The proposed domain specification methodology is applied to ontology-guided content-based information retrieval in text databases. It is advanced also as a general purpose methodology for ontological engineering.


Journal Article•10.1016/J.DATAK.2004.05.002•

Self-monitoring query execution for adaptive query processing


Anastasios Gounaris¹, Norman W. Paton¹, Alvaro A. A. Fernandes¹, Rizos Sakellariou¹•Institutions (1)

University of Manchester¹

1 Dec 2004

TL;DR: This paper discusses monitoring of query plan execution as a topic in its own right, and advocates an approach based on self-monitoring algebraic operators, which is shown to be generic and independent of any specific adaptation mechanism.


Abstract: Adaptive query processing generally involves a feedback loop comprising monitoring, assessment and response. So far, individual proposals have tended to group together an approach to monitoring, a means of assessment, and a form of response. However, there are many benefits in decoupling these three phases, and in constructing generic frameworks for each of them. To this end, this paper discusses monitoring of query plan execution as a topic in its own right, and advocates an approach based on self-monitoring algebraic operators. This approach is shown to be generic and independent of any specific adaptation mechanism, easily implementable and portable, sufficiently comprehensive, appropriate for heterogeneous distributed environments, and more importantly, capable of driving on-the-fly adaptations of query plan execution. An experimental evaluation of the overheads and of the quality of the results obtained by monitoring is also presented.
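
The idea of a self-monitoring algebraic operator can be sketched in Python as an iterator-style selection operator that records statistics about its own execution while it runs. The class name `MonitoredSelect` and the particular statistics collected (tuple counts, time spent in the predicate, observed selectivity) are assumptions chosen for illustration; the paper's operators and the information they report may differ.

```python
import time


class MonitoredSelect:
    """Selection operator that records statistics about its own execution."""

    def __init__(self, child, predicate):
        self.child = child          # any iterable of input tuples
        self.predicate = predicate  # function: tuple -> bool
        self.tuples_in = 0
        self.tuples_out = 0
        self.elapsed = 0.0          # seconds spent evaluating the predicate

    def __iter__(self):
        for row in self.child:
            started = time.perf_counter()
            self.tuples_in += 1
            keep = self.predicate(row)
            self.elapsed += time.perf_counter() - started
            if keep:
                self.tuples_out += 1
                yield row

    def selectivity(self):
        """Observed selectivity so far, or None before any input is seen."""
        return self.tuples_out / self.tuples_in if self.tuples_in else None
```

Because the counters live in the operator itself, a separate assessment component could read `selectivity()` during execution and compare it with the optimizer's estimate, without the operator knowing how any adaptation is carried out.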


Journal Article•10.1016/J.DATAK.2004.03.007•

Events as atomic contracts for component integration


Monique Snoeck¹, Wilfried Lemahieu¹, Frank Goethals¹, Guido Dedene¹, Jacques Vandenbulcke¹•Institutions (1)

Katholieke Universiteit Leuven¹

1 Oct 2004

TL;DR: This paper describes an integration approach based on an event-based coordination paradigm, with particular emphasis on distributed, loosely coupled environments such as web services.


Abstract: Today many companies rely on third party applications and application services for (part of) their information systems. When applications from different parties are used together, an integration problem arises. Similarly, cross-organisational application integration requires the coordination of distributed processing across several autonomous applications. In this paper, we describe an integration approach based on an event-based coordination paradigm. Interaction is based on atomic units of interaction called "business events". Each business event mirrors some event in the real world that requires the coordination of actions within a number of components. The coordination between applications is achieved by having applications specify preconditions for business events. As a result, a business event becomes a small scale contract between involved applications: each application can insert its own clauses into the contract by specifying preconditions. Moreover, a formal method for contract analysis is proposed, to verify whether the contract is free from contradictions and inconsistencies. Finally, in addition to its contracting aspect, the event-based communication paradigm entails a dispatching and coordination mechanism, which offers the additional advantage of a complete separation of the coordination aspects from the functionality aspects. The paper discusses different alternative architectures for event-based coordination, with particular emphasis on distributed, loosely coupled environments such as web services.
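
The notion of a business event as a small contract, where each participating application contributes its own preconditions, can be sketched as follows in Python. The `EventDispatcher` class and its method names are hypothetical, and the sketch runs in a single process; it does not reflect the distributed architectures or the formal contract analysis discussed in the paper.

```python
class EventDispatcher:
    """Coordinates business events across components via preconditions."""

    def __init__(self):
        # event name -> list of (precondition, action) pairs, one per participant
        self._participants = {}

    def subscribe(self, event, precondition, action):
        """Register a component's clause (precondition) and reaction (action)."""
        self._participants.setdefault(event, []).append((precondition, action))

    def dispatch(self, event, payload):
        """Apply the event everywhere, or nowhere if any precondition fails."""
        participants = self._participants.get(event, [])
        # Check every precondition first; a single refusal rejects the event.
        if not all(pre(payload) for pre, _ in participants):
            return False
        for _, action in participants:
            action(payload)
        return True
```

Checking all preconditions before running any action is what makes the event atomic in this sketch: either every participant processes it or none does.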


Journal Article•10.1016/J.DATAK.2003.07.001•

Broadcast program generation for webcasting


Dimitrios Katsaros¹, Yannis Manolopoulos¹•Institutions (1)

Aristotle University of Thessaloniki¹

1 Apr 2004

TL;DR: A new algorithm, CascadedWebcasting, is proposed, for the generation of Webcasting programs, scalable in terms of the number of items transmitted and able to produce programs very close to optimal.


Abstract: The advances in computer and communication technologies have made possible a ubiquitous computing environment where clients equipped with portable devices can send and receive data anytime and from anyplace. In such an asymmetric communication environment, data push has emerged as a very effective and scalable way to deliver information. Recently, the combination of push technology with the Internet and the Web [IEEE Trans. Comput. 50 (2001) 506, ACM/Kluwer Mobile Networks Appl. 7 (2002) 67] (referred to as Webcasting) has emerged "as a way out of the Web maze". Any broadcast program employed for Webcasting must be able to scale to the large number of transmitted pages. We study the issue of creating hierarchical Webcasting programs. We propose a new algorithm, CascadedWebcasting, for the generation of Webcasting programs, scalable in terms of the number of items transmitted and able to produce programs very close to optimal. We give an analytic model of the time complexity of the proposed method and we present a performance evaluation of CascadedWebcasting and a detailed comparison with existing algorithms using synthetic as well as real data. The experiments show that CascadedWebcasting has negligible execution time and achieves an average access delay very close to that of the optimal algorithm.


Journal Article•10.1016/J.DATAK.2003.10.006•

Finding aliases on the web using latent semantic analysis


Vinay Bhat¹, Tim Oates¹, Vishal Shanbhag¹, Charles Nicholas¹•Institutions (1)

University of Maryland, Baltimore County¹

1 May 2004

TL;DR: It is demonstrated empirically that under a broad range of circumstances LSA performs poorly, and a two-stage algorithm based on LSA that performs significantly better is described.


Abstract: A common problem faced when gathering information from the web is the use of different names to refer to the same entity. For example, the city in India referred to as Bombay in some documents may be referred to as Mumbai in others because its name officially changed from the former to the latter in 1995. Multiplicity of names can cause relevant documents to be missed by search engines. Our goal is to develop an automated system that discovers additional names for an entity given just one of its names. Latent semantic analysis (LSA) is generally thought to be well-suited for this task [Numerical linear algebra with applications 3(4) (1996) 301]. We demonstrate empirically that under a broad range of circumstances LSA performs poorly, and describe a two-stage algorithm based on LSA that performs significantly better.


Journal Article•10.1016/S0169-023X(03)00104-6•

Linguistic based search facilities in snowflake-like database schemes


Antje Düsterhöft, Bernhard Thalheim¹•Institutions (1)

Brandenburg University of Technology¹

1 Feb 2004

TL;DR: A simple and powerful approach to search based on a generalization of the theory of word fields to concept fields and providing an appropriate meta-structuring within database schemata that reflects context- or application-based search in a more appropriate way is developed.


Abstract: Development of generic and general search is one of the most difficult tasks in website development. In everyday life, search is not as difficult. The simplification is based on association and context facilities provided by the language and the application area. We aim to develop a simple and powerful approach to search based on a generalization of the theory of word fields to concept fields and on providing an appropriate meta-structuring within database schemata that reflects context- or application-based search in a more appropriate way. The internal meta-structuring is based on star and snowflake meta-structures within the schema.


Journal Article•10.1016/S0169-023X(03)00120-4•

Adaptive cell-based index for moving objects


Wonik Choi¹, Bongki Moon², Sukho Lee¹•Institutions (2)

Seoul National University¹, University of Arizona²

1 Jan 2004

TL;DR: The AIM is a cell-based multiversion access structure adopting an overlapping technique, which refines cells adaptively to handle regional data skew, and achieved higher query performance compared with R-tree based methods.


Abstract: R-tree based access methods for moving objects are hardly applicable in practice, due mainly to excessive space requirements and high management costs. To overcome the limitations of such R-tree based access methods, we propose a new index structure called AIM (Adaptive cell-based Index for Moving objects). The AIM is a cell-based multiversion access structure adopting an overlapping technique. The AIM refines cells adaptively to handle regional data skew, whose location may change over time. Through extensive performance studies, we observed that the AIM consumed at most 30% of the space required by R-tree based methods and achieved higher query performance than those methods.


Journal Article•10.1016/S0169-023X(03)00123-X•

Man bites dog: looking for interesting inconsistencies in structured news reports


Emma Byrne¹, Anthony Hunter¹•Institutions (1)

University College London¹

1 Mar 2004

TL;DR: A framework for identifying interesting information in news reports by finding interesting inconsistencies is presented and an implemented system based on this framework outputs news reports of interest to the user together with associated explanations of why they are interesting.


Abstract: Much useful information in news reports is often that which is surprising or unexpected. In other words, we harbour many expectations about the world, and when any of these expectations are violated (i.e. made inconsistent) by news, we have a strong indicator of some information that is interesting for us. In this paper we present a framework for identifying interesting information in news reports by finding interesting inconsistencies. An implemented system based on this framework (1) accepts structured news reports as inputs, (2) translates each report to a logical literal, (3) identifies the story of which the report is a part, (4) looks for inconsistencies between the report, the background knowledge, and a set of expectations, (5) classifies and evaluates those inconsistencies, and (6) outputs news reports of interest to the user together with associated explanations of why they are interesting.


Journal Article•10.1016/S0169-023X(03)00092-2•

A framework for abstracting data sources having heterogeneous representation formats


Domenico Rosaci¹, Giorgio Terracina², Domenico Ursino¹•Institutions (2)

Mediterranea University of Reggio Calabria¹, University of Calabria²

1 Jan 2004

TL;DR: A new approach is proposed which is capable of semi-automatically carrying out the abstraction of a data source possibly encoded according to one among a variety of formats such as structured databases, OEM graphs and XML documents.


Abstract: This paper deals with the issue of abstracting a data source characterized by one among several possible representation formats. First we show that data source abstraction plays a central role in several important application problems in the area of information system design. Then we propose a new approach which is capable of semi-automatically carrying out the abstraction of a data source possibly encoded according to one among a variety of formats such as structured databases, OEM graphs and XML documents. The capability to handle heterogeneous formats is obtained via the usage of a particular conceptual model, called SDR-Network, which is able to uniformly represent and handle data sources with different formats. As a significant application of the presented data source abstraction algorithm, the construction of an Intensional Repository is also illustrated.


Journal Article•10.1016/J.DATAK.2004.04.004•

Fast range query estimation by N-level tree histograms


Francesco Buccafurri, Gianluca Lax

1 Nov 2004

TL;DR: A new histogram, called nLT, is presented; it is based on a hierarchical decomposition of the original data distribution kept in a full binary tree and uses bit saving for representing integer numbers, so that the reduced storage space makes it possible to increase the tree resolution and, consequently, its accuracy.


Abstract: Histograms are a lossy compression technique widely applied in various application contexts, such as query optimization, statistical and temporal databases, OLAP applications, data streams, and so on. In most cases, the accuracy with which original information can be reconstructed from the histogram plays a crucial role. Thus, several proposals for constructing histograms that try to maximize accuracy have been made in the recent past. Besides bucket-based histograms (i.e., histograms whose construction is driven by the search for a "good" domain partition), there are different new histograms, characterized by more complex structures (for instance, wavelet-based histograms). This paper presents a new histogram, called nLT, belonging to the latter class. It is based on a hierarchical decomposition of the original data distribution kept in a full binary tree. This tree, containing a set of pre-computed hierarchical queries, uses bit saving for representing integer numbers, so that the reduced storage space allows us to increase the tree resolution and, consequently, its accuracy. Experimental comparison shows the superiority of nLT w.r.t. state-of-the-art histograms.
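
The idea of a hierarchical decomposition kept in a full binary tree can be sketched as follows. This is a minimal illustration, not nLT itself: it stores plain range sums at every level, omits the bit-saving representation entirely, and assumes a uniform distribution inside a partially covered leaf. The names `build_tree` and `estimate` are hypothetical.

```python
def build_tree(freqs, levels):
    """Return a list of levels; level k holds 2**k equal-width range sums.

    Assumes len(freqs) is divisible by 2**(levels - 1).
    """
    n = len(freqs)
    tree = []
    for k in range(levels):
        parts = 2 ** k
        width = n // parts
        tree.append([sum(freqs[i * width:(i + 1) * width]) for i in range(parts)])
    return tree


def estimate(tree, n, lo, hi, level=0, idx=0):
    """Estimate the sum of frequencies in the half-open range [lo, hi)."""
    width = n // (2 ** level)
    start, end = idx * width, (idx + 1) * width
    overlap = min(hi, end) - max(lo, start)
    if overlap <= 0:
        return 0.0                      # node lies outside the query range
    total = tree[level][idx]
    if overlap == width:
        return float(total)             # node fully covered: stored sum is exact
    if level == len(tree) - 1:
        return total * overlap / width  # leaf partly covered: assume uniformity
    return (estimate(tree, n, lo, hi, level + 1, 2 * idx)
            + estimate(tree, n, lo, hi, level + 1, 2 * idx + 1))


freqs = [4, 0, 2, 2, 8, 0, 0, 0]        # assumed toy data distribution
tree = build_tree(freqs, levels=3)
aligned = estimate(tree, len(freqs), 0, 4)    # aligned with a stored node
unaligned = estimate(tree, len(freqs), 1, 3)  # cuts through two leaves
```

The aligned query is answered exactly from a stored node, while the unaligned one is only approximated (4.0 against a true sum of 2), which is why a deeper tree in the same storage budget improves accuracy.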


Journal Article•10.1016/S0169-023X(03)00109-5•

Self-maintaining web pages: from theory to practice


Martin Bernauer¹, Michael Schrefl•Institutions (1)

Vienna University of Technology¹

1 Jan 2004

TL;DR: This paper presents a declarative language for the definition of parameterized fragments and web pages, shows how fragments are stored, and describes how previously presented algorithms for propagating modifications of relations to web pages can be realized by database triggers.


Abstract: The self-maintaining web pages (SMWP) approach employs concepts from distributed database design and active databases to keep pre-generated web pages in synchronization with database content. It maps fragments of relations to web pages and propagates modifications of relations incrementally to web pages. This paper shows how the SMWP approach can be put into practice using off-the-shelf relational database technology. It presents a declarative language for the definition of parameterized fragments and web pages, shows how fragments are stored, and describes how previously presented algorithms for propagating modifications of relations to web pages can be realized by database triggers.
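
The general mechanism of propagating relation modifications to pre-generated page content through triggers can be sketched with SQLite, used here only as a stand-in for off-the-shelf relational technology. The SMWP definition language and its propagation algorithms are not reproduced; the tables `product` and `page_fragment`, both triggers, and `render_page` are hypothetical names chosen for illustration, and updates are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, category TEXT, name TEXT);

-- One stored fragment row per product, grouped by the page parameter (category).
CREATE TABLE page_fragment (
    category   TEXT,
    product_id INTEGER,
    html       TEXT,
    PRIMARY KEY (category, product_id)
);

-- Propagate inserts incrementally to the stored fragments.
CREATE TRIGGER product_ins AFTER INSERT ON product
BEGIN
    INSERT INTO page_fragment
    VALUES (NEW.category, NEW.id, '<li>' || NEW.name || '</li>');
END;

-- Propagate deletes incrementally to the stored fragments.
CREATE TRIGGER product_del AFTER DELETE ON product
BEGIN
    DELETE FROM page_fragment
    WHERE category = OLD.category AND product_id = OLD.id;
END;
""")


def render_page(conn, category):
    """Assemble a page from the stored fragments for one category."""
    rows = conn.execute(
        "SELECT html FROM page_fragment WHERE category = ? ORDER BY product_id",
        (category,),
    ).fetchall()
    return "<ul>" + "".join(row[0] for row in rows) + "</ul>"


conn.execute("INSERT INTO product VALUES (1, 'books', 'SQL Primer')")
conn.execute("INSERT INTO product VALUES (2, 'books', 'Petri Nets')")
conn.execute("DELETE FROM product WHERE id = 1")
page = render_page(conn, "books")
```

The page is never rebuilt from the base relation: each modification touches only the affected fragment row, which is the incremental behaviour the abstract describes.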


Journal Article•10.1016/J.DATAK.2004.05.003•

On demand synchronization and load distribution for database grid-based web applications


Wen-Syan Li¹, Kemal Altintas, Murat Kantarcioǧlu•Institutions (1)

IBM¹

1 Dec 2004

TL;DR: A scalable scheme for adaptive synchronization of data centers is developed to maintain the load and application precision requirements, and experimental results show the effectiveness of the proposed scheme in maintaining both application result precision and load distribution.


Abstract: With the availability of content delivery networks (CDN), many database-driven Web applications rely on data centers that host applications and database contents for better performance and higher reliability. However, this raises additional issues associated with database/data center synchronization, query/transaction routing, load balancing, and application result correctness/precision. In this paper, we investigate these issues in the context of data center synchronization for such load- and precision-critical Web applications in a distributed data center infrastructure. We develop a scalable scheme for adaptive synchronization of data centers to maintain the load and application precision requirements. A prototype has been built for the evaluation of the proposed scheme. The experimental results show the effectiveness of the proposed scheme in maintaining both application result precision and load distribution, and in adapting to traffic patterns and system capacity limits.


Journal Article•10.1016/J.DATAK.2004.06.002•

The Active Vertice method: a performant filtering approach to high-dimensional indexing


Sören Balko¹, Ingo Schmitt¹, Gunter Saake¹•Institutions (1)

Otto-von-Guericke University Magdeburg¹

1 Dec 2004

TL;DR: The Active Vertice method is proposed as a novel filtering approach that enhances the discriminatory power of the approximations while maintaining their compactness in terms of storage consumption and is shown to be superior to existing filtering approaches.


Abstract: The problem of finding nearest neighbors has emerged as an important foundation of feature-based similarity search in multimedia databases. Most spatial index structures based on the R-tree have failed to efficiently support nearest neighbor search in arbitrarily distributed high-dimensional data sets. In contrast, the so-called filtering principle as represented by the popular VA-file has turned out to be a more promising approach. Query processing is based on a flat file of compact vector approximations. In a first stage, those approximations are sequentially scanned and filtered so that in a second stage the nearest neighbors can be determined from a relatively small fraction of the data set. In this paper, we propose the Active Vertice method as a novel filtering approach. As opposed to the VA-file, approximation regions are arranged in a quad-tree like structure. High-dimensional feature vectors are assigned to ellipsoidal approximation regions on different levels of the tree. A compact approximation of a vector corresponds to the path within the index from the root to the respective tree node. When compared to the VA-file, our method enhances the discriminatory power of the approximations while maintaining their compactness in terms of storage consumption. To demonstrate its effectiveness, we conduct extensive experiments with synthetic as well as real-life data and show the superiority of our method over existing filtering approaches.
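
The two-stage filtering principle mentioned above can be sketched in the style of a VA-file: a grid cell index per dimension serves as the compact approximation, a lower-bound distance to the cell drives the filter, and exact distances are computed only for surviving candidates. This illustrates the baseline principle, not the Active Vertice method (there are no ellipsoidal regions and no quad-tree here). The constants `BITS` and `CELLS` and all function names are assumptions, and vectors are assumed to lie in the unit hypercube.

```python
import math

BITS = 2                 # assumed bits per dimension
CELLS = 2 ** BITS        # grid cells per dimension over [0, 1]


def approximate(vec):
    """Compact approximation: the grid cell index in each dimension."""
    return tuple(min(int(x * CELLS), CELLS - 1) for x in vec)


def lower_bound(query, cell):
    """Smallest possible Euclidean distance from query to any point in cell."""
    total = 0.0
    for q, c in zip(query, cell):
        lo, hi = c / CELLS, (c + 1) / CELLS
        if q < lo:
            total += (lo - q) ** 2
        elif q > hi:
            total += (q - hi) ** 2
    return math.sqrt(total)


def nearest_neighbor(query, vectors, approximations):
    """Return (index of nearest vector, number of exact distances computed)."""
    # Stage 1: sequential scan over the approximations only.
    candidates = sorted((lower_bound(query, a), i)
                        for i, a in enumerate(approximations))
    best_dist, best_idx, exact_computed = math.inf, None, 0
    # Stage 2: refine in order of increasing lower bound and stop as soon as
    # no remaining candidate can beat the best exact distance found so far.
    for bound, i in candidates:
        if bound >= best_dist:
            break
        exact_computed += 1
        dist = math.dist(query, vectors[i])
        if dist < best_dist:
            best_dist, best_idx = dist, i
    return best_idx, exact_computed


vectors = [(0.1, 0.1), (0.9, 0.9), (0.6, 0.4), (0.15, 0.8)]  # toy data
approximations = [approximate(v) for v in vectors]
result = nearest_neighbor((0.55, 0.45), vectors, approximations)
```

In this toy run only one of the four vectors needs an exact distance computation. Tighter approximation regions raise the lower bounds and prune more candidates, which is the discriminatory power the abstract refers to.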


Journal Article•10.1016/J.DATAK.2003.10.004•

Towards building logical views of websites


Zehua Liu¹, Wee Keong Ng¹, Ee-Peng Lim¹, Feifei Li¹•Institutions (1)

Nanyang Technological University¹

1 May 2004

TL;DR: This paper proposes the WICCAP Data Model, a data model that maps websites from their physical structure into commonly perceived logical views, and implements a visual tool, called the Mapping Wizard, to facilitate and automate the process of producing WICCAP Data Models.


Abstract: Information presented in a Website is usually organized into a certain logical structure that is intuitive to users. It would be useful to model websites with such a logical structure so that extraction of Web data from these sites can be performed in a simple and efficient manner. However, the recognition and reconstruction of such a logical structure by a software agent is not straightforward due to the complex hyper-link structure among webpages and the HTML formatting within each webpage. In this paper, we propose the WICCAP Data Model, a data model that maps websites from their physical structure into commonly perceived logical views. To enable easy and rapid creation of such data models, we have implemented a visual tool, called the Mapping Wizard, to facilitate and automate the process of producing WICCAP Data Models. Using the tool, the time required to construct a logical representation for a given Website is significantly reduced.


Journal Article•10.1016/J.DATAK.2003.08.008•

Benchmarking access methods for time-evolving regional data


Theodoros Tzouramanis¹, Michael Vassilakopoulos², Yannis Manolopoulos³•Institutions (3)

University of the Aegean¹, American Hotel & Lodging Educational Institute², Aristotle University of Thessaloniki³

1 Jun 2004

TL;DR: Experimental results show that in most cases the Overlapping Linear Quadtrees method is the best choice for accessing time-evolving regional data.


Abstract: In this paper we present a performance comparison of access methods for time-evolving regional data. Initially, we briefly review four temporal extensions of the Linear Region Quadtree: the Time-Split Linear Quadtree, the Multiversion Linear Quadtree, the Multiversion Access Structure for Evolving Raster Images and Overlapping Linear Quadtrees. These methods comprise a family of specialized access methods that can efficiently store and manipulate consecutive raster images. We suggest a new, simpler implementation solution that, through these methods, provides efficient support for spatio-temporal queries referring to the past. An extensive experimental space and time performance comparison of all the above access methods follows. The comparison is made under a common and flexible benchmarking environment in order to choose the best technique depending on the application and on the image characteristics. These experimental results show that in most cases the Overlapping Linear Quadtrees method is the best choice.


Journal Article•10.1016/J.DATAK.2003.10.005•

Dynamically evolving concurrent information systems specification and validation: a component-based Petri nets proposal


Nasreddine Aoumeur¹, Gunter Saake¹•Institutions (1)

Otto-von-Guericke University Magdeburg¹

1 Aug 2004

TL;DR: An adequate Petri net-based meta-level is proposed for coping with dynamic changing of structural and behavioral aspects of CO-NETS components and how this level brings a satisfactory solution to the well-known inheritance-anomaly problem is discussed.


Abstract: Besides the steady growth in the size-complexity and distribution of present-day information systems, business volatility with rapid changes in users' wishes and technological upgrading is creating an overwhelming need for more advanced conceptual modeling approaches. Such advanced conceptual models should coherently and soundly reflect these three crucial dimensions, namely the size, space and (evolution over) time dimensions. As a contribution towards such advanced conceptual approaches, we presented in [Data Know. Eng. 42 (2) (2002) 143] a new form of integration of object-orientation, with emphasis on componentization, into a variety of algebraic Petri nets, which we referred to as CO-NETS. The purpose of the present paper is to soundly extend this proposal for coping with dynamic changing of structural and behavioral aspects of CO-NETS components. To this aim, we propose an adequate Petri net-based meta-level that may be sketched as follows. First, we construct two "meta-nets" for each component: one concerns the modification of behavioral aspects and the other deals with structural aspects. While the meta-net for behavioral dynamics enables the dynamics of any transition in a given component to be modified at runtime, the meta-net for structural aspects completes and enhances these capabilities by allowing the involved messages and object signatures (i.e. structure) to be dynamically manipulated. In addition to a rigorous description of this meta-level and its illustration using a medium-complexity banking system example, we also discuss how this level brings a satisfactory solution to the well-known inheritance-anomaly problem.


Journal Article•10.1016/J.DATAK.2003.12.001•

Encoding multiple inheritance hierarchies for lattice operations


M. F. van Bommel¹, Ping Wang¹•Institutions (1)

St. Francis Xavier University¹

1 Aug 2004

TL;DR: A new method is proposed, based on the top-down encoding of Caseau but without the lattice completion requirement, which permits incremental updates to the hierarchy to add nodes at the leaves, and supports efficient lattice computations in applications where the classes of objects are stored as codes.


Abstract: Incremental updates to multiple inheritance hierarchies are becoming more prevalent with the increasing number of persistent applications supporting complex objects. Efficient computation of the lattice operations greatest lower bound (GLB), least upper bound (LUB), and subsumption is critical. Techniques for the compact encoding of a hierarchy are required that support the operations. One method is to plunge the given ordering into a Boolean lattice of binary words, and perform lattice operations via Boolean operators. An overview of the approach is given and several methods are examined and compared. A new method is proposed, based on the top-down encoding of Caseau but without the lattice completion requirement, which permits incremental updates to the hierarchy to add nodes at the leaves. The algorithm requires polynomial time and space for encoding, and supports efficient lattice computations in applications where the classes of objects are stored as codes. Experimental results illustrate its effectiveness, and an analysis is provided on the effect of the order of insertion on the encoding.
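
The general idea of mapping a hierarchy to binary words and answering lattice queries with Boolean operators can be sketched as below. This is deliberately the simplest such encoding (one bit per class, with each code being the union of the ancestors' bits), so it is not compact and is neither Caseau's encoding nor the method proposed in the paper. The `Hierarchy` class and its method names are hypothetical.

```python
class Hierarchy:
    """Toy top-down bit-vector encoding of a multiple inheritance hierarchy."""

    def __init__(self):
        self.codes = {}  # class name -> integer used as a bit mask

    def add(self, name, parents=()):
        """Add a class below already-encoded parents (incremental leaf insert)."""
        code = 1 << len(self.codes)      # fresh bit identifying this class
        for parent in parents:
            code |= self.codes[parent]   # inherit every ancestor bit
        self.codes[name] = code

    def subsumes(self, general, specific):
        """True if `general` is `specific` itself or one of its ancestors."""
        g = self.codes[general]
        return (self.codes[specific] & g) == g

    def lub(self, a, b):
        """Least upper bound, or None if no unique one exists."""
        shared = self.codes[a] & self.codes[b]   # bits of all common ancestors
        for name, code in self.codes.items():
            if code == shared:
                return name
        return None


h = Hierarchy()
h.add("Object")
h.add("Person", ["Object"])
h.add("Student", ["Person"])
h.add("Employee", ["Person"])
h.add("TeachingAssistant", ["Student", "Employee"])
h.add("Intern", ["Employee"])            # leaf added later; no re-encoding needed
```

Subsumption reduces to one AND and one comparison, and the LUB lookup works because the AND of two codes is exactly the set of common ancestors. Adding `Intern` leaves all existing codes unchanged, which mirrors the incremental leaf updates the abstract emphasizes.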
