TL;DR: This paper combines advances in graph databases and in temporal relational databases to propose an evolving graph model, including a representation called TGraph and an algebra called TGA, that adheres to point-based semantics and can concisely express a wide range of common analysis tasks.
Abstract: Graph representations underlie many modern computer applications, capturing the structure of such diverse networks as the Internet, personal associations, roads, sensors, and metabolic pathways. While analysis of static graphs is a well-explored field, new emphasis is being placed on understanding and representing the ways in which networks change over time. Current research is delving into graph evolution rate and mechanisms, the impact of specific events on network evolution, and spatial and spatio-temporal patterns. However, systematic support for evolving graph querying and analytics is still lacking. Our goal is to fill this gap, giving users the ability to concisely express a wide range of common analysis tasks. In this paper we combine advances in graph databases and in temporal relational databases and propose an evolving graph model, including a representation called TGraph and an algebra called TGA, that adheres to point-based semantics. TGA includes principled temporal generalizations of conventional graph operators, as well as novel operators that support exploratory analysis of evolving graphs at different levels of temporal and structural granularity.
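To make the notion of point-based semantics concrete, the following is a minimal Python sketch, not the paper's TGraph representation or any TGA operator. It assumes edges annotated with half-open validity intervals over integer time points; the names `coalesce` and `snapshot` are hypothetical. Under point-based semantics, two interval sets that cover the same time points denote the same fact, which is why intervals are merged into a canonical form.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumption of this sketch
Edge = Tuple[str, str]


def coalesce(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a canonical list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def snapshot(graph: Dict[Edge, List[Interval]], t: int) -> Set[Edge]:
    """Return the edges that exist at time point t."""
    return {
        edge
        for edge, intervals in graph.items()
        if any(lo <= t < hi for lo, hi in intervals)
    }
```

Coalescing guarantees that `[(1, 3), (3, 5)]` and `[(1, 5)]` are stored identically, so equality of representations matches equality of the underlying point sets.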
TL;DR: This paper describes a parametric inference mechanism in which a single parameter specifies the chosen trade-off between succinctness and precision for the inferred type; the mechanism is designed for massive JSON collections and hence admits a simple and efficient map-reduce implementation.
Abstract: Type systems express structural information about data, are human readable and hence crucial for understanding code, and are endowed with a formal definition that makes them a fundamental tool when proving program properties. Internal data structures of a database store quantitative information about data, information that is essential for optimization purposes, but is not used for documentation or for correctness proofs. In this paper we propose a new idea: raising a part of the quantitative information from the system-level structures to the type level. Our proposal is motivated by the problem of schema inference for massive collections of JSON data, which are nowadays often collected from external sources and stored in NoSQL systems without an a-priori schema, which makes a-posteriori schema inference extremely useful. NoSQL systems are oriented towards the management of heterogeneous data, and in this context we claim that quantitative information is important in order to assess the relative weight of different variants. We propose a type system where the same collection can be described at different levels of abstraction. Different abstraction levels are useful for different purposes, hence we describe a parametric inference mechanism, where a single parameter specifies the chosen trade-off between succinctness and precision for the inferred type. This algorithm is designed for massive JSON collections, and hence admits a simple and efficient map-reduce implementation.
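The map-reduce shape of such an inference can be illustrated with a small Python sketch. This is not the paper's type system or algorithm: the type encoding, the function names `infer` and `summarize`, and the two levels `"precise"` and `"succinct"` are assumptions chosen for illustration. The map step types each document, the reduce step adds up counts (an associative, commutative merge), and the parameter selects how aggressively record types are fused.

```python
import json
from collections import Counter
from functools import reduce


def infer(value):
    """Map step: the type of one JSON value, encoded as a hashable term."""
    if value is None:
        return "Null"
    if isinstance(value, bool):  # must precede the numeric check
        return "Bool"
    if isinstance(value, (int, float)):
        return "Num"
    if isinstance(value, str):
        return "Str"
    if isinstance(value, list):
        return ("Arr", frozenset(infer(v) for v in value))
    # JSON object: keys are unique, so sorting compares keys only.
    return ("Rec", tuple(sorted((k, infer(v)) for k, v in value.items())))


def summarize(docs, level="precise"):
    """Reduce step: count types; optionally fuse all record types into one."""
    counts = reduce(lambda a, b: a + b,
                    (Counter([infer(d)]) for d in docs), Counter())
    if level == "precise":
        return counts  # one entry per distinct type, with its frequency
    # "succinct" (hypothetical level): a single record type in which each
    # field carries the number of documents where it occurs.
    # Non-record values are ignored by this sketch.
    fields, total = Counter(), 0
    for t, n in counts.items():
        if isinstance(t, tuple) and t[0] == "Rec":
            total += n
            for key, _ in t[1]:
                fields[key] += n
    return {"records": total, "fields": dict(fields)}


docs = [json.loads(s) for s in ('{"a": 1, "b": "x"}', '{"a": 2}')]
precise = summarize(docs)
succinct = summarize(docs, "succinct")
```

In the succinct summary a field whose count is below `records` is effectively optional, which is one way quantitative information can surface at the type level.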
TL;DR: A novel abstraction called a variational database is proposed that provides a compact and structured representation of general forms of data variations and enables users to query database variations easily.
Abstract: Data variations are prevalent in real-world applications. For example, software vendors handle variations in the business requirements, conventions, and environmental settings of a software product using hundreds of features, each combination of which creates a different version of the product. In database-backed software, the database of each version may have a different schema and different content. Variations in the value and representation of each element in a dataset give rise to numerous variants in these applications. Users often would like to express information needs over all such variants. For example, a software vendor would like to perform common tests over all versions of its product, e.g., whether each relation in a relational database has a primary key. Hence, users need a query interface that hides the variational nature of the data and processes a query over all variations of a dataset. We propose a novel abstraction called a variational database that provides a compact and structured representation of general forms of data variations and enables users to query database variations easily.
TL;DR: This work shows that each fragment of the relation algebras where intersection and/or difference is only used on edges (and not on complex compositions) is expressively equivalent to a fragment of the semi-join algebras; this equivalence holds for node queries that evaluate to sets of nodes.
Abstract: Many graph query languages rely on the composition operator to navigate graphs and select nodes of interest, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting towards queries that use semi-joins instead. In this way, the cost of evaluating queries can be significantly reduced. We study techniques to recognize and apply such rewritings. Concretely, we study the relationship between the expressive power of the relation algebras, which rely heavily on composition, and the semi-join algebras, which replace the composition operator with semi-join operators. As our main result, we show that each fragment of the relation algebras where intersection and/or difference is only used on edges (and not on complex compositions) is expressively equivalent to a fragment of the semi-join algebras. This expressive equivalence holds for node queries that evaluate to sets of nodes. For practical relevance, we exhibit constructive steps for rewriting relation algebra queries to semi-join algebra queries, and prove that these steps lead to only a well-bounded increase in the number of steps needed to evaluate the rewritten queries. In addition, on node-labeled graphs that are sibling-ordered trees, we establish new relationships among the expressive power of Regular XPath, Conditional XPath, FO-logic, and the semi-join algebra augmented with restricted fixpoint operators.
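The intuition behind replacing composition with a semi-join for node queries can be seen in a small Python sketch. It illustrates the general idea only and does not reproduce the paper's rewriting steps; the function names are hypothetical, and a graph is assumed to be a set of `(source, target)` pairs. The query asks for all nodes that start a path of length two.

```python
def compose(r, s):
    """Relational composition: pairs (a, c) such that a -> b in r and b -> c in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def sources(r):
    """Project a binary relation onto its first column (a node query)."""
    return {a for (a, _) in r}


def via_composition(edges):
    # Materializes every two-step pair before projecting to nodes.
    return sources(compose(edges, edges))


def via_semijoin(edges):
    # Keeps only edges whose target has an outgoing edge; no pairs are built.
    has_out = sources(edges)
    return {a for (a, b) in edges if b in has_out}


edges = {(1, 2), (2, 3), (3, 4), (5, 6)}
same = via_composition(edges) == via_semijoin(edges)
```

Both functions return the same node set, but the semi-join version never constructs the intermediate relation, whose size can be quadratic in the number of nodes.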
TL;DR: This talk presents the embedding of the multilingual GraalVM into the Oracle Database and the MySQL database to allow for executing stored procedures and user-defined functions, and shows how the use of JavaScript as a stored procedure language can remedy common disadvantages of stored procedures.
Abstract: Stored procedures provide a way to centralize business logic involving multiple SQL statements and run them inside a database management system. They are typically executed inside the address space of the database. Doing so helps avoid expensive network round trips and saves time and memory by having direct access to the data that is being processed. However, despite those benefits, the use of stored procedures is often considered harmful for a variety of reasons: (1) stored procedure languages are often vendor-specific, (2) developers for specialized stored procedure languages are hard to find, (3) stored procedures are stored in the database and often harder to keep track of within modern version control systems, and (4) tool support often lags behind that of other programming languages. The contributions of this talk are twofold: First, we present the embedding of the multilingual GraalVM into the Oracle Database and the MySQL database to allow for executing stored procedures and user-defined functions. Second, we show how the use of JavaScript as a stored procedure language can remedy the above-mentioned disadvantages. GraalVM is a multilingual virtual machine that is being developed by Oracle Labs. GraalVM allows for executing many modern programming languages such as JavaScript, R or Python with very high performance. Additionally, GraalVM has been designed to run on a Java Virtual Machine (JVM) as well as being embeddable into systems such as the Oracle Database that are not running on a JVM. In order to embed GraalVM and use it for executing stored procedures, it needs to access data stored in the database efficiently. If the types of the host system (e.g. SQL types) and the types of the stored procedure language match, this should happen without converting or copying the data. Otherwise, conversion of the data between the two type systems is required. We show how the speculative just-in-time compiler provided by GraalVM can be leveraged to perform such conversions with minimal effort. In the talk, we also demonstrate how JavaScript can be used as a stored procedure language in the Oracle Database and MySQL. JavaScript is one of today's most popular programming languages with a vibrant open-source community providing a vast number of reusable software packages. We show an effective development workflow for implementing stored procedures in JavaScript and how developers can leverage popular open-source packages. With this, we want to remedy the above-mentioned disadvantages and bring stored procedures to a larger developer audience.
TL;DR: A property-based testing tool for SPARQL that randomly generates test cases consisting of instances of an ontology; the query output is tested with a Boolean property defined in terms of membership of ontology individuals in ontology classes.
Abstract: In this paper we describe a property-based testing tool for SPARQL. Given a SPARQL query, the tool randomly generates test cases which consist of instances of an ontology. The tool checks the well-typedness of the SPARQL query as well as the consistency of the test cases with the ontology axioms. With this aim, a type system has been defined for SPARQL. Test cases are later used to execute the SPARQL query. The output of the SPARQL query is tested with a Boolean property which is defined in terms of membership of ontology individuals in ontology classes. The testing tool reports counterexamples when the Boolean property is not satisfied.
TL;DR: This paper shows that a typed notation for linear algebra exists and can be useful in formalizing and reasoning about data aggregation operations; one such operation, the construction of a data cube, is shown to be easily expressible as a linear algebra operator.
Abstract: There is a need for a typed notation for linear algebra applicable to the fields of econometrics and data mining. In this paper we show that such a notation exists and can be useful in formalizing and reasoning about data aggregation operations. One such operation, the construction of a data cube, is shown to be easily expressible as a linear algebra operator. The construction is shown to be type-generic and some of its properties are derived from its typed definition and proved using matrix algebra. Other forms of data aggregation such as rollup and cross tabulation are shown to be algebraically derivable from data cubes.
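How aggregation can be phrased with matrices may be easier to see in a small Python sketch. It does not use the paper's typed notation or its cube operator; it assumes a common encoding in which each dimension column becomes a 0/1 matrix with one row per value and one column per record, and all function names are hypothetical. A cross tabulation is then a matrix product, and rollups are obtained by multiplying with a vector of ones.

```python
def one_hot(column, values):
    """0/1 matrix: one row per value, one column per record."""
    return [[1 if x == v else 0 for x in column] for v in values]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain matrix product on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def cross_tab(rows_m, cols_m, measure):
    """Compute rows_m . diag(measure) . cols_m^T."""
    weighted = [[x * w for x, w in zip(row, measure)] for row in rows_m]
    return matmul(weighted, transpose(cols_m))


# Toy data (invented for this sketch): one record per position.
model = ["A", "B", "A", "B"]
color = ["red", "red", "blue", "red"]
sales = [5, 3, 2, 4]

table = cross_tab(one_hot(model, ["A", "B"]),
                  one_hot(color, ["blue", "red"]), sales)
by_color = matmul([[1, 1]], table)      # rollup over model
by_model = matmul(table, [[1], [1]])    # rollup over color
```

Here `table` holds sales per (model, color) pair, while `by_color` and `by_model` are the marginal totals, so the rollups are derived from the cross tabulation by algebra alone.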
TL;DR: A unified programming environment for analytical applications is presented that includes AL, a programming language that combines concepts of various common analytical domains, and a flexible compilation system that uses a language-, domain-, and platform-independent program intermediate representation to separate high-level application logic and physical organisation.
Abstract: Data-driven organizations gather information on various aspects of their endeavours and analyze that information to gain valuable insights or to increase automation. Today, these organizations can choose from a wealth of specialized analytical libraries and platforms to meet their functional and non-functional requirements. Indeed, many common application scenarios involve the combination of multiple such libraries and platforms in order to provide a holistic perspective. Due to the scattered landscape of specialized analytical tools, this integration can result in complex and hard-to-evolve applications. In addition, the necessary movement of data between tools and formats can introduce a serious performance penalty. In this article we present a unified programming environment for analytical applications. The environment includes AL, a programming language that combines concepts of various common analytical domains. Further, the environment also includes a flexible compilation system that uses a language-, domain-, and platform-independent program intermediate representation to separate high-level application logic and physical organisation. We provide a detailed introduction of AL, establish our program intermediate representation as a generally useful abstraction, and give a detailed explanation of the translation of AL programs into workloads for our experimental shared-memory processing engine.
TL;DR: This paper shows that for a given XQuery expression and a nested-relational DTD, the input expression can be transformed into an expression that can be evaluated without (potentially costly) ordering operations even if the input query requires its result to be in DDO.
Abstract: XQuery has an order-sensitive semantics in the sense that it requires nodes to be sorted in document order without duplicates (or in Distinct Document Order, DDO for short). This paper shows that for a given XQuery expression and a nested-relational DTD, the input expression can be transformed into an expression that can be evaluated without (potentially costly) ordering operations even if the input query requires its result to be in DDO. To this end, we propose an XQuery transformation algorithm that consists of simple rewriting rules. The basic idea is inspired by a generate-and-test approach as commonly used for solving search problems. We apply this approach when constructing the transformed expression: first, a skeleton query is prepared for the generate phase. This skeleton query can be evaluated without DDO, but it has the ability to return all nodes in DDO for all XML documents that conform to the input DTD. Second, an output expression is generated by injecting conditions for the test phase, which are extracted from the input expression, into the skeleton query. The key to performing both the extraction and injection of conditions in a systematic way is to utilize XQuery transformations that preserve equivalence up to DDO.
TL;DR: This work proposes a hybrid shipping approach based on static analysis that automatically partitions client code and ships to the server only the code that is likely to improve performance, and demonstrates the viability of this approach in a prototype system called Locomotor.
Abstract: Server-side execution is a well-known method for improving the performance of database applications. Running code on the database server eliminates round trips to the client application, resulting in significantly reduced latency. However, the common approach of explicitly writing server-side code in stored procedures has significant drawbacks. Application developers must develop and maintain code in two separate languages and manually partition code between the client and server. Code shipping is a viable alternative but still requires an explicit specification of which code can be run on the server. We propose a hybrid shipping approach based on static analysis which automatically partitions client code and ships to the server only the code that is likely to improve performance. We demonstrate the viability of this approach in a prototype system which we call Locomotor. Locomotor operates on Python applications using the Redis key-value store. Through static analysis, it identifies fragments of code which can benefit from being executed on the server and automatically performs translation to execute the fragments on the server. Unlike some previous systems, Locomotor is not pattern-based and is able to ship a wide variety of code. By shipping code to the server, Locomotor is able to achieve significant performance gains over client-side execution with no modifications to the application code.
TL;DR: GraphScript is introduced, a domain-specific graph query language tailored to advanced graph analysis tasks and to the specification of complex graph algorithms that can be customized to the user's needs.
Abstract: Real-world graph applications are typically domain-specific and model complex business processes in the property graph data model. To implement a domain-specific graph algorithm in the context of such a graph application, simply providing a set of built-in graph algorithms is usually not sufficient, nor does it allow algorithm customization to the user's needs. To cope with these issues, graph database vendors provide, in addition to their declarative graph query languages, procedural interfaces to write user-defined graph algorithms. In this paper, we introduce GraphScript, a domain-specific graph query language tailored to serve advanced graph analysis tasks and the specification of complex graph algorithms. We describe the major language design of GraphScript, discuss graph-specific optimizations, and describe the integration into an enterprise data platform.