Showing papers presented at "Database Programming Languages in 2007"
Book Chapter•10.1007/978-3-540-75987-4_10•
Provenance as dependency analysis


James Cheney1, Amal Ahmed2, Umut A. Acar2•
University of Edinburgh1, Toyota Technological Institute at Chicago2
23 Sep 2007
TL;DR: It is argued that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how part of the output of a query depends on (parts of) its input.
Abstract: Provenance is information recording the source, derivation, or history of some information. Provenance tracking has been studied in a variety of settings; however, although many design points have been explored, the mathematical or semantic foundations of data provenance have received comparatively little attention. In this paper, we argue that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input. We introduce a semantic characterization of such dependency provenance, show that this form of provenance is not computable, and provide dynamic and static approximation techniques.

81 citations

Book Chapter•10.1007/978-3-540-75987-4_13•
Efficient evaluation of HAVING queries on a probabilistic database


Christopher Ré1, Dan Suciu1•
University of Washington1
23 Sep 2007
TL;DR: In this paper, the authors study the evaluation of positive conjunctive queries with predicate aggregates using MIN, MAX, COUNT, SUM, AVG or COUNT(DISTINCT) on probabilistic databases.
Abstract: We study the evaluation of positive conjunctive queries with Boolean aggregate tests (similar to HAVING queries in SQL) on probabilistic databases. Our motivation is to handle aggregate queries over imprecise data resulting from information integration or information extraction. More precisely, we study conjunctive queries with predicate aggregates using MIN, MAX, COUNT, SUM, AVG or COUNT(DISTINCT) on probabilistic databases. Computing the precise output probabilities for positive conjunctive queries (without HAVING) is #P-hard, but is in P for a restricted class of queries called safe queries. Further, for queries without self-joins either a query is safe or its data complexity is #P-Hard, which shows that safe queries exactly capture tractable queries without self-joins. In this paper, for each aggregate above, we find a class of queries that exactly capture efficient evaluation for HAVING queries without self-joins. Our algorithms use a novel technique to compute the marginal distributions of elements in a semiring, which may be of independent interest.
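
As a rough illustration of what "output probability" of a HAVING-style test means, the sketch below enumerates every possible world of a tiny table and sums the weights of the worlds that pass the aggregate test. It assumes a tuple-independent model (each tuple is present independently with its own probability), which the abstract does not state, and the function name `having_probability` is hypothetical. This is the naive exponential baseline, not the paper's algorithm.

```python
from fractions import Fraction
from itertools import product


def having_probability(tuples, test):
    """Probability that `test` holds, by brute-force world enumeration.

    tuples: list of (value, probability) pairs; ASSUMPTION: tuples are
            independent of one another (tuple-independent database).
    test:   predicate on the list of values present in one possible world,
            e.g. a COUNT or SUM comparison.
    """
    total = Fraction(0)
    # Each tuple is either absent (False) or present (True): 2^n worlds.
    for present in product([False, True], repeat=len(tuples)):
        weight = Fraction(1)
        world = []
        for (value, p), keep in zip(tuples, present):
            weight *= p if keep else 1 - p
            if keep:
                world.append(value)
        if test(world):
            total += weight
    return total


# Hypothetical two-tuple table, each tuple present with probability 1/2.
table = [(10, Fraction(1, 2)), (20, Fraction(1, 2))]
count_at_least_one = having_probability(table, lambda w: len(w) >= 1)
sum_at_least_twenty = having_probability(table, lambda w: sum(w) >= 20)
```

The enumeration visits 2^n worlds for n tuples, which is why classes of queries admitting polynomial-time evaluation are of interest.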

37 citations

Book Chapter•10.1007/978-3-540-75987-4_16•
Efficient inclusion for a class of XML types with interleaving and counting


Giorgio Ghelli1, Dario Colazzo2, Carlo Sartiani1•
University of Pisa1, University of Paris-Sud2
23 Sep 2007
TL;DR: It is proved here that inclusion for XML types with interleaving and counting can be decided in polynomial time in the presence of two important restrictions: no element appears twice in the same content model, and Kleene star is only applied to disjunctions of single elements.
Abstract: Inclusion between XML types is important but expensive, and is much more expensive when unordered types are considered. We prove here that inclusion for XML types with interleaving and counting can be decided in polynomial time in the presence of two important restrictions: no element appears twice in the same content model, and Kleene star is only applied to disjunctions of single elements. Our approach is based on the transformation of each such type into a set of constraints that completely characterizes the type. We then provide a complete deduction system to verify whether the constraints of one type imply all the constraints of another one.

35 citations

Book Chapter•10.1007/978-3-540-75987-4_4•
A methodology for coupling fragments of XPath with structural indexes for XML documents


George H. L. Fletcher1, Dirk Van Gucht2, Yuqing Wu2, Marc Gyssens3, Sofia Brenes2, Jan Paredaens4 •
Washington State University Vancouver1, Indiana University2, University of Hasselt3, University of Antwerp4
23 Sep 2007
TL;DR: In XPath query evaluation, indices similar to those used in relational database systems - namely, value indices on tags and text values - are first used, together with structural join algorithms, which turn out to be simple and efficient.
Abstract: Supporting efficient access to XML data using XPath [3] continues to be an important research problem [6, 12]. XPath queries are used to specify node-labeled trees which match portions of the hierarchical XML data. In XPath query evaluation, indices similar to those used in relational database systems - namely, value indices on tags and text values - are first used, together with structural join algorithms [1, 2, 19]. This approach turns out to be simple and efficient. However, the structural containment relationships native to XML data are not directly captured by value indices.

33 citations

Proceedings Article•
Efficient Evaluation of.


Christopher Ré, Dan Suciu
1 Jan 2007
TL;DR: This paper finds a class of queries that exactly capture efficient evaluation for HAVING queries without self-joins, and uses a novel technique to compute the marginal distributions of elements in a semiring, which may be of independent interest.
Abstract: We study the evaluation of positive conjunctive queries with Boolean aggregate tests (similar to HAVING queries in SQL) on probabilistic databases. Our motivation is to handle aggregate queries over imprecise data resulting from information integration or information extraction. More precisely, we study conjunctive queries with predicate aggregates using MIN, MAX, COUNT, SUM, AVG or COUNT(DISTINCT) on probabilistic databases. Computing the precise output probabilities for positive conjunctive queries (without HAVING) is #P-hard, but is in P for a restricted class of queries called safe queries. Further, for queries without self-joins either a query is safe or its data complexity is #P-Hard, which shows that safe queries exactly capture tractable queries without self-joins. In this paper, for each aggregate above, we find a class of queries that exactly capture efficient evaluation for HAVING queries without self-joins. Our algorithms use a novel technique to compute the marginal distributions of elements in a semiring, which may be of independent interest.

31 citations

Book Chapter•10.1007/978-3-540-75987-4_11•
A theory of stream queries


Yuri Gurevich1, Dirk Leinders2, Jan Van den Bussche2•
Microsoft1, University of Hasselt2
23 Sep 2007
TL;DR: Issues investigated include abstract definitions of computability of stream queries; the connection between abstract computability, continuity, monotonicity, and non-blocking operators; and bounded memory computability of stream queries using abstract state machines (ASMs).
Abstract: Data streams are modeled as infinite or finite sequences of data elements coming from an arbitrary but fixed universe. The universe can have various built-in functions and predicates. Stream queries are modeled as functions from streams to streams. Both timed and untimed settings are considered. Issues investigated include abstract definitions of computability of stream queries; the connection between abstract computability, continuity, monotonicity, and non-blocking operators; and bounded memory computability of stream queries using abstract state machines (ASMs).

29 citations

Book Chapter•10.1007/978-3-540-75987-4_8•
On the consistent rewriting of conjunctive queries under primary key constraints


Jef Wijsen1•
University of Mons1
23 Sep 2007
TL;DR: Novel techniques are used to characterize classes of queries that have a consistent FO rewriting, and an Ehrenfeucht-Fraïssé game is used to show that no consistent FO rewriting exists for (the existential closure of) R(x, y) ∧ R(y, c), where c is a constant and the first coordinate of R is the primary key.
Abstract: This article deals with the computation of consistent answers to queries on relational databases that violate primary key constraints. A repair of such an inconsistent database is obtained by selecting a maximal number of tuples from each relation without ever selecting two distinct tuples that agree on the primary key. We are interested in the following problem: Given a Boolean conjunctive query q, compute a Boolean first-order (FO) query ψ such that for every database db, ψ evaluates to true on db if and only if q evaluates to true on every repair of db. Such ψ is called a consistent FO rewriting of q. We use novel techniques to characterize classes of queries that have a consistent FO rewriting. In this way, we are able to extend previously known classes and discover new ones. Finally, we use an Ehrenfeucht-Fraïssé game to show the non-existence of a consistent FO rewriting for (the existential closure of) R(x, y) ∧ R(y, c), where c is a constant and the first coordinate of R is the primary key.
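
The notions of repair and consistent answer described above can be made concrete with a small brute-force sketch. It enumerates every repair of a single binary relation R whose first coordinate is the primary key, and checks whether the query R(x, y) ∧ R(y, c) holds in all of them. The function names and the sample data are hypothetical, and the enumeration is exponential; it only illustrates the definitions and is not a rewriting technique from the paper.

```python
from itertools import product


def repairs(relation):
    """Yield every repair of `relation`, a set of (key, value) tuples.

    A repair keeps exactly one tuple per primary-key value, i.e. a maximal
    subset in which no two distinct tuples agree on the key.
    """
    groups = {}
    for t in relation:
        groups.setdefault(t[0], []).append(t)
    for choice in product(*groups.values()):
        yield set(choice)


def q(r, c):
    """Boolean query: there exist x, y with R(x, y) and R(y, c)."""
    return any((y, c) in r for (_, y) in r)


def consistently_true(relation, c):
    """True iff q holds on every repair of `relation` (brute force)."""
    return all(q(rep, c) for rep in repairs(relation))


# Hypothetical data: key "a" has two conflicting tuples in `inconsistent`.
consistent = {("a", "b"), ("b", "c")}
inconsistent = {("a", "b"), ("a", "d"), ("b", "c")}
result_consistent = consistently_true(consistent, "c")
result_inconsistent = consistently_true(inconsistent, "c")
```

In the second database the repair that keeps ("a", "d") falsifies the query, so the query is not consistently true there.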

28 citations

Book Chapter•10.1007/978-3-540-75987-4_12•
Querying structural and behavioral properties of business processes


Daniel Deutch1, Tova Milo1•
Tel Aviv University1
23 Sep 2007
TL;DR: A query evaluation algorithm of polynomial data complexity that can be applied uniformly to queries on the structure of the process specification as well as on the potential behavior of the defined process is proposed.
Abstract: BPQL is a novel query language for querying business process specifications, introduced recently in [5,6]. It is based on an intuitive model of business processes as rewriting systems, an abstraction of the emerging BPEL (Business Process Execution Language) standard [7]. BPQL allows users to query business processes visually, in a manner very analogous to the language used to specify the processes. The goal of the present paper is to study the formal model underlying BPQL and investigate its properties as well as the complexity of query evaluation. We also study its relationship to previously suggested formalisms for process modeling and querying. In particular we propose a query evaluation algorithm of polynomial data complexity that can be applied uniformly to queries on the structure of the process specification as well as on the potential behavior of the defined process. We show that unless P=NP the efficiency of our algorithm is asymptotically optimal.

28 citations

Book Chapter•10.1007/978-3-540-75987-4_17•
Towards practical typechecking for macro tree transducers


Alain Frisch1, Haruo Hosoya2•
French Institute for Research in Computer Science and Automation1, University of Tokyo2
23 Sep 2007
TL;DR: A first step toward a practically efficient typechecker for macro tree transducers (mtt) is reported; the approach represents an input type obtained from a backward inference as an alternating tree automaton, in a style similar to Tozawa's XSLT0 typechecking.
Abstract: Macro tree transducers (mtt) are an important model that both covers many useful XML transformations and allows decidable exact typechecking. This paper reports our first step toward an implementation of mtt typechecker that has a practical efficiency. Our approach is to represent an input type obtained from a backward inference as an alternating tree automaton, in a style similar to Tozawa's XSLT0 typechecking. In this approach, typechecking reduces to checking emptiness of an alternating tree automaton. We propose several optimizations (Cartesian factorization, state partitioning) on the backward inference process in order to produce much smaller alternating tree automata than the naive algorithm, and we present our efficient algorithm for checking emptiness of alternating tree automata, where we exploit the explicit representation of alternation for local optimizations. Our preliminary experiments confirm that our algorithm has a practical performance that can typecheck simple transformations with respect to the full XHTML in a reasonable time.

21 citations

Book Chapter•10.1007/978-3-540-75987-4_9•
Relational completeness of query languages for annotated databases


Floris Geerts1, Jan Van den Bussche1•
Transnational University Limburg1
23 Sep 2007
TL;DR: It is shown that the color algebra is relationally complete: it is equivalent to the relational algebra on the explicit annotations, which extends a similar completeness result established for the query algebra of the MONDRIAN annotation system, from unions of conjunctive queries to the full relational algebra.
Abstract: Annotated relational databases can be queried either by simply making the annotations explicitly available alongside the ordinary data, or by adapting the standard query operators so that they have an implicit effect also on the annotations. We compare the expressive power of these two approaches. As a formal model for the implicit approach we propose the color algebra, an adaptation of the relational algebra to deal with the annotations. We show that the color algebra is relationally complete: it is equivalent to the relational algebra on the explicit annotations. Our result extends a similar completeness result established for the query algebra of the MONDRIAN annotation system, from unions of conjunctive queries to the full relational algebra.

20 citations

Book Chapter•10.1007/978-3-540-75987-4_2•
Efficient algorithms for the tree homeomorphism problem


Michaela Götz1, Christoph Koch1, Wim Martens•
Saarland University1
23 Sep 2007
TL;DR: This paper first proves that deciding whether there is a tree homeomorphism is LOGSPACE-complete, improving on the current LOGCFL upper bound, and develops a practical algorithm for the tree homeomorphism decision problem that is both space- and time-efficient.
Abstract: Tree pattern matching is a fundamental problem that has a wide range of applications in Web data management, XML processing, and selective data dissemination. In this paper we develop efficient algorithms for the tree homeomorphism problem, i.e., the problem of matching a tree pattern with exclusively transitive (descendant) edges. We first prove that deciding whether there is a tree homeomorphism is LOGSPACE-complete, improving on the current LOGCFL upper bound. As our main result we develop a practical algorithm for the tree homeomorphism decision problem that is both space- and time-efficient. The algorithm is in LOGDCFL and space consumption is strongly bounded, while the running time is linear in the size of the data tree. This algorithm immediately generalizes to the problem of matching the tree pattern against all subtrees of the data tree, preserving the mentioned efficiency properties.
Book Chapter•10.1007/978-3-540-75987-4_6•
A better semantics for XQuery with side-effects


Giorgio Ghelli1, Nicola Onose2, Kristoffer H. Rose3, Jérôme Siméon3•
University of Pisa1, University of California, San Diego2, IBM3
23 Sep 2007
TL;DR: This work formalizes the compilation of XQuery extended with updates into a database algebra by mapping both the source language and the algebra to a common core language with list comprehensions and extensible tuples.
Abstract: Formal semantics for XQuery with side-effects have been proposed in [13,16]. We propose a different semantics which is better suited for database compilation. We substantiate this claim by formalizing the compilation of XQuery extended with updates into a database algebra. We prove the correctness of the proposed compilation by mapping both the source language and the algebra to a common core language with list comprehensions and extensible tuples.
Book Chapter•10.1007/978-3-540-75987-4_7•
Repairing inconsistent XML write-access control policies


Loreto Bravo1, James Cheney1, Irini Fundulaki1•
University of Edinburgh1
23 Sep 2007
TL;DR: This paper investigates the problem of deciding whether an XML write-access control policy is consistent and, if not, how its inconsistencies can be repaired; it shows that finding minimal repairs is NP-complete and gives heuristics for finding repairs.
Abstract: XML access control policies involving updates may contain security flaws, here called inconsistencies, in which a forbidden operation may be simulated by performing a sequence of allowed operations. This paper investigates the problem of deciding whether a policy is consistent, and if not, how its inconsistencies can be repaired. We consider policies expressed in terms of annotated DTDs defining which operations are allowed or denied for the XML trees that are instances of the DTD. We show that consistency is decidable in PTIME for such policies and that consistent partial policies can be extended to unique "least-privilege" consistent total policies. We also consider repair problems based on deleting privileges to restore consistency, show that finding minimal repairs is NP-complete, and give heuristics for finding repairs.
Book Chapter•10.1007/978-3-540-75987-4_1•
XML publishing: bridging theory and practice


Wenfei Fan1•
University of Edinburgh1
23 Sep 2007
TL;DR: An overview of recent advances in XML publishing is provided and a notion of publishing transducers recently developed for studying the expressive power and complexity of XML publishing languages is presented.
Abstract: Transforming relational data into XML, also known as XML publishing, is often necessary when one wants to exchange data residing in databases or to create an XML interface of a traditional database. This paper aims to provide an overview of recent advances in XML publishing. We present a notion of publishing transducers recently developed for studying the expressive power and complexity of XML publishing languages. In terms of publishing transducers we then characterize XML publishing languages being used in practice. In addition, we address dynamic aspects of XML publishing, namely, incremental maintenance and update management of XML views published from relational data.
Book Chapter•10.1007/978-3-540-75987-4_3•
Datalog programs over infinite databases, revisited


Sara Cohen1, Joseph Gil1, Evelina Zarivach1•
Technion – Israel Institute of Technology1
23 Sep 2007
TL;DR: While implementing a database system for querying Java software, the authors found that the universe of Java code can be effectively modeled as an infinite database; they give an algorithm for generating an efficient evaluation scheme for closed queries, which generalizes Vieille's QSQR algorithm for top-down evaluation of Datalog programs.
Abstract: This paper's revisit of infinite relational databases, a model traditionally perceived as purely theoretical, was sparked by a concrete implementation setting, and the results obtained here were used in a practical database problem. In the course of implementing a database system for querying Java software, we found that the universe of Java code can be effectively modeled as an infinite database. This modeling makes it possible to distinguish between queries which are "open-ended," that is, whose result may grow as software components are added into the system, and queries which are "closed," in that their result does not change as the software base grows. Further, closed queries can be implemented much more efficiently than open queries. Achievements include an algorithm for distinguishing between these two kinds of queries (we assume that queries are written in Datalog), and an algorithm to generate an efficient evaluation scheme of closed queries, which is a generalization of Vieille's famous QSQR algorithm for top-down evaluation of Datalog programs. A by-product of this work is a rather terse and elegant representation of QSQR.
Book Chapter•10.1007/978-3-540-75987-4_14•
Succinctness of pattern-based schema languages for XML


Wouter Gelade1, Frank Neven1•
University of Hasselt1
23 Sep 2007
TL;DR: The investigation is carried out relative to three kinds of vertical pattern languages (regular, linear, and strongly linear patterns), and the complexity of the simplification problem is considered for each of the pattern-based schemas.
Abstract: Martens et al. defined a pattern-based specification language equivalent in expressive power to the widely adopted XML Schema definitions (XSDs). This language consists of rules of the form (r, s) where r and s are regular expressions and can be seen as a type-free extension of DTDs with vertical regular expressions. Sets of such rules can be interpreted in either an existential or a universal way. In the present paper, we study the succinctness of both semantics w.r.t. each other and w.r.t. the common abstraction of XSDs in terms of single-type extended DTDs. The investigation is carried out relative to three kinds of vertical pattern languages: regular, linear, and strongly linear patterns. We also consider the complexity of the simplification problem for each of the considered pattern-based schemas.
Book Chapter•10.1007/978-3-540-75987-4_15•
Analysis of imperative XML programs


Michael G. Burke1, Igor Peshansky1, Mukund Raghavachari1, Christoph Reichenbach2•
IBM1, University of Colorado Boulder2
23 Sep 2007
TL;DR: This paper presents a program analysis, based on a flow-sensitive type system, for detecting both redundant computations and redundant traversals in XML processing programs, and describes two optimizations that take advantage of it.
Abstract: The widespread adoption of XML has led to programming languages that support XML as a first class construct. In this paper, we present a method for analyzing and optimizing imperative XML processing programs. In particular, we present a program analysis, based on a flow-sensitive type system, for detecting both redundant computations and redundant traversals in XML processing programs. The analysis handles declarative queries over XML data and imperative loops that traverse XML values explicitly in a uniform framework. We describe two optimizations that take advantage of our analysis: one merges queries that traverse the same set of XML nodes, and the other replaces an XPath expression by a previously computed result. We show the effectiveness of our method by providing performance measurements on XMark benchmark queries and XLinq sample queries.
Book Chapter•10.1007/978-3-540-75987-4_5•
Conjunctive query containment over trees


Henrik Björklund1, Wim Martens1, Thomas Schwentick1•
Technical University of Dortmund1
23 Sep 2007
TL;DR: The complexity of containment and satisfiability of conjunctive queries over finite, unranked, labeled trees is studied with respect to the axes Child, NextSibling, their transitive and reflexive closures, and Following.
Abstract: The complexity of containment and satisfiability of conjunctive queries over finite, unranked, labeled trees is studied with respect to the axes Child, NextSibling, their transitive and reflexive closures, and Following. For the containment problem a trichotomy is presented, classifying the problems as in PTIME, coNP-complete, or Π₂ᴾ-complete. For the satisfiability problem most problems are classified as either in PTIME or NP-complete.
