Top 18 papers presented at Database Programming Languages in 2007

Book Chapter•10.1007/978-3-540-75987-4_10•

Provenance as dependency analysis

James Cheney¹, Amal Ahmed², Umut A. Acar²•Institutions (2)

University of Edinburgh¹, Toyota Technological Institute at Chicago²

23 Sep 2007

TL;DR: It is argued that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how part of the output of a query depends on (parts of) its input.

Abstract: Provenance is information recording the source, derivation, or history of some information. Provenance tracking has been studied in a variety of settings; however, although many design points have been explored, the mathematical or semantic foundations of data provenance have received comparatively little attention. In this paper, we argue that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input. We introduce a semantic characterization of such dependency provenance, show that this form of provenance is not computable, and provide dynamic and static approximation techniques.
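
To make the idea of a dynamic approximation concrete, here is a minimal sketch of run-time dependency tracking. It is an illustration only, not the paper's formalism: the `Tracked` class, the helper names, and the sample table are all hypothetical. Each value carries the set of input locations it may depend on, and every operation propagates the union of its operands' sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    """A value paired with the input locations it may depend on (hypothetical type)."""
    value: object
    deps: frozenset


def source(value, location):
    """Wrap an input field, tagging it with its own location."""
    return Tracked(value, frozenset([location]))


def add(a, b):
    """The sum depends on everything either operand depends on."""
    return Tracked(a.value + b.value, a.deps | b.deps)


def if_then_else(cond, then_val, else_val):
    """The result depends on the condition and on the branch actually taken."""
    chosen = then_val if cond.value else else_val
    return Tracked(chosen.value, chosen.deps | cond.deps)


# Hypothetical input table with per-field locations.
rows = [
    {"price": source(10, "r1.price"), "on_sale": source(True, "r1.on_sale")},
    {"price": source(20, "r2.price"), "on_sale": source(False, "r2.on_sale")},
]


def total_sale_price(rows):
    """Sum the prices of rows that are on sale, tracking dependencies."""
    total = Tracked(0, frozenset())
    for row in rows:
        zero = Tracked(0, frozenset())
        total = add(total, if_then_else(row["on_sale"], row["price"], zero))
    return total


result = total_sale_price(rows)
print(result.value, sorted(result.deps))
```

In this run `r2.price` is absent from the dependency set, because the second row is not on sale and its price was never read into the result.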

81 citations

Book Chapter•10.1007/978-3-540-75987-4_13•

Efficient evaluation of HAVING queries on a probabilistic database

Christopher Ré¹, Dan Suciu¹•Institutions (1)

University of Washington¹

23 Sep 2007

TL;DR: In this paper, the authors study the evaluation of positive conjunctive queries with predicate aggregates using MIN, MAX, COUNT, SUM, AVG or COUNT(DISTINCT) on probabilistic databases.

Abstract: We study the evaluation of positive conjunctive queries with Boolean aggregate tests (similar to HAVING queries in SQL) on probabilistic databases. Our motivation is to handle aggregate queries over imprecise data resulting from information integration or information extraction. More precisely, we study conjunctive queries with predicate aggregates using MIN, MAX, COUNT, SUM, AVG or COUNT(DISTINCT) on probabilistic databases. Computing the precise output probabilities for positive conjunctive queries (without HAVING) is #P-hard, but is in P for a restricted class of queries called safe queries. Further, for queries without self-joins either a query is safe or its data complexity is #P-hard, which shows that safe queries exactly capture tractable queries without self-joins. In this paper, for each aggregate above, we find a class of queries that exactly capture efficient evaluation for HAVING queries without self-joins. Our algorithms use a novel technique to compute the marginal distributions of elements in a semiring, which may be of independent interest.
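
As a rough illustration of what evaluating an aggregate test over uncertain data involves, the sketch below computes the probability that `COUNT(*) >= k` over a single table. It assumes a tuple-independent model (each tuple is present independently with its own probability), which the abstract does not spell out, and it is a textbook dynamic program rather than the paper's semiring algorithm. The function names are hypothetical.

```python
def count_distribution(probs):
    """Return dist where dist[c] is the probability that exactly c tuples are present.

    Assumes each tuple i is present independently with probability probs[i].
    """
    dist = [1.0]  # with no tuples considered, the count is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for c, mass in enumerate(dist):
            nxt[c] += mass * (1 - p)   # tuple absent: count unchanged
            nxt[c + 1] += mass * p     # tuple present: count grows by one
        dist = nxt
    return dist


def prob_count_at_least(probs, k):
    """Probability that the HAVING-style test COUNT(*) >= k holds."""
    return sum(count_distribution(probs)[k:])


print(count_distribution([0.5, 0.5]))
print(prob_count_at_least([0.5, 0.5], 1))
```

The table is processed in one pass and the running distribution has at most n + 1 entries, so the cost is quadratic in the number of tuples rather than exponential in the number of possible worlds.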

37 citations

Book Chapter•10.1007/978-3-540-75987-4_16•

Efficient inclusion for a class of XML types with interleaving and counting

Giorgio Ghelli¹, Dario Colazzo², Carlo Sartiani¹•Institutions (2)

University of Pisa¹, University of Paris-Sud²

23 Sep 2007

TL;DR: It is proved here that inclusion for XML types with interleaving and counting can be decided in polynomial time in the presence of two important restrictions: no element appears twice in the same content model, and Kleene star is only applied to disjunctions of single elements.

Abstract: Inclusion between XML types is important but expensive, and is much more expensive when unordered types are considered. We prove here that inclusion for XML types with interleaving and counting can be decided in polynomial time in the presence of two important restrictions: no element appears twice in the same content model, and Kleene star is only applied to disjunctions of single elements. Our approach is based on the transformation of each such type into a set of constraints that completely characterizes the type. We then provide a complete deduction system to verify whether the constraints of one type imply all the constraints of another one.

35 citations

Book Chapter•10.1007/978-3-540-75987-4_4•

A methodology for coupling fragments of XPath with structural indexes for XML documents

George H. L. Fletcher¹, Dirk Van Gucht², Yuqing Wu², Marc Gyssens³, Sofia Brenes², Jan Paredaens⁴•Institutions (4)

Washington State University Vancouver¹, Indiana University², University of Hasselt³, University of Antwerp⁴

23 Sep 2007

TL;DR: In XPath query evaluation, indices similar to those used in relational database systems - namely, value indices on tags and text values - are first used, together with structural join algorithms, which turn out to be simple and efficient.

Abstract: Supporting efficient access to XML data using XPath [3] continues to be an important research problem [6, 12]. XPath queries are used to specify node-labeled trees which match portions of the hierarchical XML data. In XPath query evaluation, indices similar to those used in relational database systems - namely, value indices on tags and text values - are first used, together with structural join algorithms [1, 2, 19]. This approach turns out to be simple and efficient. However, the structural containment relationships native to XML data are not directly captured by value indices.

33 citations

Proceedings Article•

Efficient Evaluation of.

Christopher Ré, Dan Suciu

1 Jan 2007

TL;DR: This paper finds a class of queries that exactly capture efficient evaluation for HAVING queries without self-joins, and uses a novel technique to compute the marginal distributions of elements in a semiring, which may be of independent interest.

31 citations

Book Chapter•10.1007/978-3-540-75987-4_11•

A theory of stream queries

Yuri Gurevich¹, Dirk Leinders², Jan Van den Bussche²•Institutions (2)

Microsoft¹, University of Hasselt²

23 Sep 2007

TL;DR: Issues investigated include abstract definitions of computability of stream queries; the connection between abstract computability, continuity, monotonicity, and non-blocking operators; and bounded memory computability of stream queries using abstract state machines (ASMs).

Abstract: Data streams are modeled as infinite or finite sequences of data elements coming from an arbitrary but fixed universe. The universe can have various built-in functions and predicates. Stream queries are modeled as functions from streams to streams. Both timed and untimed settings are considered. Issues investigated include abstract definitions of computability of stream queries; the connection between abstract computability, continuity, monotonicity, and non-blocking operators; and bounded memory computability of stream queries using abstract state machines (ASMs).
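
An informal sketch of a stream query in the untimed setting: the hypothetical `new_maxima` function below maps a stream to the stream of its successive record highs. It is only an intuition aid and does not use the paper's ASM-based definitions. It keeps a single element of state, emits output without waiting for the input to end, and is monotone in the sense that the output on a prefix is a prefix of the output on any extension.

```python
from itertools import count, islice


def new_maxima(stream):
    """Yield each element that exceeds every element seen before it."""
    best = None  # the only state kept: the largest element so far
    for x in stream:
        if best is None or x > best:
            best = x
            yield x


finite = [3, 1, 4, 1, 5]
print(list(new_maxima(finite)))

# The operator never blocks, so it also works on an infinite stream;
# islice takes the first three outputs.
print(list(islice(new_maxima(count()), 3)))
```

By contrast, a query such as "output the overall maximum" is not non-blocking: no output can safely be produced until the stream ends.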

29 citations

Book Chapter•10.1007/978-3-540-75987-4_8•

On the consistent rewriting of conjunctive queries under primary key constraints

Jef Wijsen¹•Institutions (1)

University of Mons¹

23 Sep 2007

TL;DR: Novel techniques are used to characterize classes of queries that have a consistent FO rewriting, and an Ehrenfeucht-Fraïssé game is used to show that no consistent FO rewriting exists for R(x, y) ∧ R(y, c), where c is a constant and the first coordinate of R is the primary key.

Abstract: This article deals with the computation of consistent answers to queries on relational databases that violate primary key constraints. A repair of such an inconsistent database is obtained by selecting a maximal number of tuples from each relation without ever selecting two distinct tuples that agree on the primary key. We are interested in the following problem: given a Boolean conjunctive query q, compute a Boolean first-order (FO) query ψ such that for every database db, ψ evaluates to true on db if and only if q evaluates to true on every repair of db. Such a ψ is called a consistent FO rewriting of q. We use novel techniques to characterize classes of queries that have a consistent FO rewriting. In this way, we are able to extend previously known classes and discover new ones. Finally, we use an Ehrenfeucht-Fraïssé game to show the non-existence of a consistent FO rewriting for (the existential closure of) R(x, y) ∧ R(y, c), where c is a constant and the first coordinate of R is the primary key.
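
The definitions of repair and consistent answer can be illustrated with a brute-force sketch that enumerates every repair of a small binary relation and evaluates the query R(x, y) ∧ R(y, c) on each. This is deliberately naive (the number of repairs can be exponential) and is not a rewriting technique from the paper. The function names and the sample data are hypothetical.

```python
from itertools import product


def repairs(relation):
    """Yield every repair of a set of (key, value) tuples.

    A repair keeps exactly one tuple for each primary-key value.
    """
    groups = {}
    for key, value in relation:
        groups.setdefault(key, []).append((key, value))
    for choice in product(*groups.values()):
        yield set(choice)


def query(r, c):
    """Boolean query: there exist x, y with R(x, y) and R(y, c)."""
    return any((y, c) in r for (_, y) in r)


def certain(relation, c):
    """True if the query holds on every repair of the relation."""
    return all(query(rep, c) for rep in repairs(relation))


# Key "b" is violated: two tuples agree on the first coordinate.
db = {("a", "b"), ("b", "c"), ("b", "d")}
print(query(db, "c"))    # evaluated on the inconsistent database itself
print(certain(db, "c"))  # evaluated over all repairs
```

Here the query is true on the raw database but false on the repair that keeps ("b", "d"), so it is not a consistent answer.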

28 citations

Book Chapter•10.1007/978-3-540-75987-4_12•

Querying structural and behavioral properties of business processes

Daniel Deutch¹, Tova Milo¹•Institutions (1)

Tel Aviv University¹

23 Sep 2007

TL;DR: A query evaluation algorithm of polynomial data complexity that can be applied uniformly to queries on the structure of the process specification as well as on the potential behavior of the defined process is proposed.

Abstract: BPQL is a novel query language for querying business process specifications, introduced recently in [5,6]. It is based on an intuitive model of business processes as rewriting systems, an abstraction of the emerging BPEL (Business Process Execution Language) standard [7]. BPQL allows users to query business processes visually, in a manner very analogous to the language used to specify the processes. The goal of the present paper is to study the formal model underlying BPQL and investigate its properties as well as the complexity of query evaluation. We also study its relationship to previously suggested formalisms for process modeling and querying. In particular we propose a query evaluation algorithm of polynomial data complexity that can be applied uniformly to queries on the structure of the process specification as well as on the potential behavior of the defined process. We show that unless P=NP the efficiency of our algorithm is asymptotically optimal.

28 citations

Book Chapter•10.1007/978-3-540-75987-4_17•

Towards practical typechecking for macro tree transducers

Alain Frisch¹, Haruo Hosoya²•Institutions (2)

French Institute for Research in Computer Science and Automation¹, University of Tokyo²

23 Sep 2007

TL;DR: A first step toward an implementation of an mtt typechecker with practical efficiency is reported; the approach represents an input type obtained from a backward inference as an alternating tree automaton, in a style similar to Tozawa's XSLT0 typechecking.

Abstract: Macro tree transducers (mtt) are an important model that both covers many useful XML transformations and allows decidable exact typechecking. This paper reports our first step toward an implementation of an mtt typechecker with practical efficiency. Our approach is to represent an input type obtained from a backward inference as an alternating tree automaton, in a style similar to Tozawa's XSLT0 typechecking. In this approach, typechecking reduces to checking emptiness of an alternating tree automaton. We propose several optimizations (Cartesian factorization, state partitioning) on the backward inference process in order to produce much smaller alternating tree automata than the naive algorithm, and we present our efficient algorithm for checking emptiness of alternating tree automata, where we exploit the explicit representation of alternation for local optimizations. Our preliminary experiments confirm that our algorithm has practical performance and can typecheck simple transformations with respect to the full XHTML in a reasonable time.

21 citations

Book Chapter•10.1007/978-3-540-75987-4_9•

Relational completeness of query languages for annotated databases

Floris Geerts¹, Jan Van den Bussche¹•Institutions (1)

Transnational University Limburg¹

23 Sep 2007

TL;DR: It is shown that the color algebra is relationally complete: it is equivalent to the relational algebra on the explicit annotations, which extends a similar completeness result established for the query algebra of the MONDRIAN annotation system, from unions of conjunctive queries to the full relational algebra.

Abstract: Annotated relational databases can be queried either by simply making the annotations explicitly available along with the ordinary data, or by adapting the standard query operators so that they have an implicit effect also on the annotations. We compare the expressive power of these two approaches. As a formal model for the implicit approach we propose the color algebra, an adaptation of the relational algebra to deal with the annotations. We show that the color algebra is relationally complete: it is equivalent to the relational algebra on the explicit annotations. Our result extends a similar completeness result established for the query algebra of the MONDRIAN annotation system, from unions of conjunctive queries to the full relational algebra.

20 citations

Book Chapter•10.1007/978-3-540-75987-4_2•

Efficient algorithms for the tree homeomorphism problem

Michaela Götz¹, Christoph Koch¹, Wim Martens•Institutions (1)

Saarland University¹

23 Sep 2007

TL;DR: This paper first proves that deciding whether there is a tree homeomorphism is LOGSPACE-complete, improving on the current LOGCFL upper bound, and develops a practical algorithm for the tree homeomorphism decision problem that is both space- and time-efficient.

Abstract: Tree pattern matching is a fundamental problem that has a wide range of applications in Web data management, XML processing, and selective data dissemination. In this paper we develop efficient algorithms for the tree homeomorphism problem, i.e., the problem of matching a tree pattern with exclusively transitive (descendant) edges. We first prove that deciding whether there is a tree homeomorphism is LOGSPACE-complete, improving on the current LOGCFL upper bound. As our main result we develop a practical algorithm for the tree homeomorphism decision problem that is both space- and time-efficient. The algorithm is in LOGDCFL and space consumption is strongly bounded, while the running time is linear in the size of the data tree. This algorithm immediately generalizes to the problem of matching the tree pattern against all subtrees of the data tree, preserving the mentioned efficiency properties.
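
To show what the decision problem asks, here is a naive bottom-up sketch. It is not the paper's algorithm and has none of its space guarantees: it stores a set of pattern nodes per recursive call and runs in time proportional to the product of the two tree sizes. The sketch assumes trees are `(label, [children])` tuples, that the mapping need not be injective, and that the pattern root may match anywhere in the data tree. The name `embeds` is hypothetical.

```python
def embeds(pattern, data):
    """Return True if pattern matches data when every pattern edge means 'descendant'."""
    nodes = []  # flattened pattern: (label, [indices of children])

    def index(p):
        """Number the pattern nodes in post-order and return this node's index."""
        label, children = p
        child_ids = [index(c) for c in children]
        nodes.append((label, child_ids))
        return len(nodes) - 1

    root = index(pattern)

    def visit(d):
        """Return the pattern nodes that match at d or at some descendant of d."""
        label, children = d
        below = set()  # pattern nodes matched at proper descendants of d
        for child in children:
            below |= visit(child)
        here = {
            i for i, (plabel, kids) in enumerate(nodes)
            if plabel == label and all(k in below for k in kids)
        }
        return here | below

    return root in visit(data)


pattern = ("a", [("b", []), ("c", [])])
data_yes = ("a", [("x", [("b", []), ("c", [])])])
data_no = ("a", [("b", [])])
print(embeds(pattern, data_yes), embeds(pattern, data_no))
```

The recursion depth equals the height of the data tree, so very deep documents would need an explicit stack instead.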

Book Chapter•10.1007/978-3-540-75987-4_6•

A better semantics for XQuery with side-effects

Giorgio Ghelli¹, Nicola Onose², Kristoffer H. Rose³, Jérôme Siméon³•Institutions (3)

University of Pisa¹, University of California, San Diego², IBM³

23 Sep 2007

TL;DR: This work formalizes the compilation of XQuery extended with updates into a database algebra by mapping both the source language and the algebra to a common core language with list comprehensions and extensible tuples.

Abstract: Formal semantics for XQuery with side-effects have been proposed in [13,16]. We propose a different semantics which is better suited for database compilation. We substantiate this claim by formalizing the compilation of XQuery extended with updates into a database algebra. We prove the correctness of the proposed compilation by mapping both the source language and the algebra to a common core language with list comprehensions and extensible tuples.

Book Chapter•10.1007/978-3-540-75987-4_7•

Repairing inconsistent XML write-access control policies

Loreto Bravo¹, James Cheney¹, Irini Fundulaki¹•Institutions (1)

University of Edinburgh¹

23 Sep 2007

TL;DR: This paper investigates the problem of deciding whether a policy is consistent and, if not, how its inconsistencies can be repaired; it shows that finding minimal repairs is NP-complete and gives heuristics for finding repairs.

Abstract: XML access control policies involving updates may contain security flaws, here called inconsistencies, in which a forbidden operation may be simulated by performing a sequence of allowed operations. This paper investigates the problem of deciding whether a policy is consistent, and if not, how its inconsistencies can be repaired. We consider policies expressed in terms of annotated DTDs defining which operations are allowed or denied for the XML trees that are instances of the DTD. We show that consistency is decidable in PTIME for such policies and that consistent partial policies can be extended to unique "least-privilege" consistent total policies. We also consider repair problems based on deleting privileges to restore consistency, show that finding minimal repairs is NP-complete, and give heuristics for finding repairs.

Book Chapter•10.1007/978-3-540-75987-4_1•

XML publishing: bridging theory and practice

Wenfei Fan¹•Institutions (1)

University of Edinburgh¹

23 Sep 2007

TL;DR: An overview of recent advances in XML publishing is provided and a notion of publishing transducers recently developed for studying the expressive power and complexity of XML publishing languages is presented.

Abstract: Transforming relational data into XML, also known as XML publishing, is often necessary when one wants to exchange data residing in databases or to create an XML interface of a traditional database. This paper aims to provide an overview of recent advances in XML publishing. We present a notion of publishing transducers recently developed for studying the expressive power and complexity of XML publishing languages. In terms of publishing transducers we then characterize XML publishing languages being used in practice. In addition, we address dynamic aspects of XML publishing, namely, incremental maintenance and update management of XML views published from relational data.

Book Chapter•10.1007/978-3-540-75987-4_3•

Datalog programs over infinite databases, revisited

Sara Cohen¹, Joseph Gil¹, Evelina Zarivach¹•Institutions (1)

Technion – Israel Institute of Technology¹

23 Sep 2007

TL;DR: This paper finds that the universe of Java code can be effectively modeled as an infinite database, and presents an algorithm to generate an efficient evaluation scheme for closed queries, which is a generalization of Vieille's famous QSQR algorithm for top-down evaluation of Datalog programs.

Abstract: This paper's revisit of infinite relational databases, a model traditionally perceived as purely theoretical, was sparked by a concrete implementation setting, and the results obtained here were used in a practical database problem. In the course of implementing a database system for querying Java software, we found that the universe of Java code can be effectively modeled as an infinite database. This modeling makes it possible to distinguish between queries which are "open-ended," that is, whose result may grow as software components are added into the system, and queries which are "closed," in that their result does not change as the software base grows. Further, closed queries can be implemented much more efficiently than open queries. Achievements include an algorithm for distinguishing between these two kinds of queries (we assume that queries are written in Datalog), and an algorithm to generate an efficient evaluation scheme of closed queries, which is a generalization of Vieille's famous QSQR algorithm for top-down evaluation of Datalog programs. A by-product of this work is a rather terse and elegant representation of QSQR.

Book Chapter•10.1007/978-3-540-75987-4_14•

Succinctness of pattern-based schema languages for XML

Wouter Gelade¹, Frank Neven¹•Institutions (1)

University of Hasselt¹

23 Sep 2007

TL;DR: The investigation is carried out relative to three kinds of vertical pattern languages (regular, linear, and strongly linear patterns), and the complexity of the simplification problem is considered for each of the pattern-based schemas.

Abstract: Martens et al. defined a pattern-based specification language equivalent in expressive power to the widely adopted XML Schema definitions (XSDs). This language consists of rules of the form (r, s) where r and s are regular expressions and can be seen as a type-free extension of DTDs with vertical regular expressions. Sets of such rules can be interpreted in either an existential or a universal way. In the present paper, we study the succinctness of both semantics w.r.t. each other and w.r.t. the common abstraction of XSDs in terms of single-type extended DTDs. The investigation is carried out relative to three kinds of vertical pattern languages: regular, linear, and strongly linear patterns. We also consider the complexity of the simplification problem for each of the considered pattern-based schemas.

Book Chapter•10.1007/978-3-540-75987-4_15•

Analysis of imperative XML programs

Michael G. Burke¹, Igor Peshansky¹, Mukund Raghavachari¹, Christoph Reichenbach²•Institutions (2)

IBM¹, University of Colorado Boulder²

23 Sep 2007

TL;DR: This paper presents a program analysis, based on a flow-sensitive type system, for detecting both redundant computations and redundant traversals in XML processing programs, and describes two optimizations that take advantage of it.

Abstract: The widespread adoption of XML has led to programming languages that support XML as a first class construct. In this paper, we present a method for analyzing and optimizing imperative XML processing programs. In particular, we present a program analysis, based on a flow-sensitive type system, for detecting both redundant computations and redundant traversals in XML processing programs. The analysis handles declarative queries over XML data and imperative loops that traverse XML values explicitly in a uniform framework. We describe two optimizations that take advantage of our analysis: one merges queries that traverse the same set of XML nodes, and the other replaces an XPath expression by a previously computed result. We show the effectiveness of our method by providing performance measurements on XMark benchmark queries and XLinq sample queries.

Book Chapter•10.1007/978-3-540-75987-4_5•

Conjunctive query containment over trees

Henrik Björklund¹, Wim Martens¹, Thomas Schwentick¹•Institutions (1)

Technical University of Dortmund¹

23 Sep 2007

TL;DR: The complexity of containment and satisfiability of conjunctive queries over finite, unranked, labeled trees is studied with respect to the axes Child, NextSibling, their transitive and reflexive closures, and Following.

Abstract: The complexity of containment and satisfiability of conjunctive queries over finite, unranked, labeled trees is studied with respect to the axes Child, NextSibling, their transitive and reflexive closures, and Following. For the containment problem a trichotomy is presented, classifying the problems as in PTIME, coNP-complete, or Π₂ᴾ-complete. For the satisfiability problem most problems are classified as either in PTIME or NP-complete.
