Proceedings Article | DOI: 10.1145/2882903.2899396
Emma in Action: Declarative Dataflows for Scalable Data Analysis
Alexander Alexandrov,Andreas Salzmann,Georgi Krastev,Asterios Katsifodimos,Volker Markl +4 more
- 26 Jun 2016
- pp 2073-2076
10
TL;DR: Emma is a programming language embedded in Scala that promotes parallel collection processing through native constructs like Scala's for-comprehensions, a declarative syntax akin to SQL.
Abstract: Parallel dataflow APIs based on second-order functions were originally seen as a flexible alternative to SQL. Over time, however, their complexity increased due to the number of physical aspects that had to be exposed by the underlying engines in order to facilitate efficient execution. To retain a sufficient level of abstraction and lower the barrier of entry for data scientists, projects like Spark and Flink currently offer domain-specific APIs on top of their parallel collection abstractions. This demonstration highlights the benefits of an alternative design based on deep language embedding. We showcase Emma - a programming language embedded in Scala. Emma promotes parallel collection processing through native constructs like Scala's for-comprehensions - a declarative syntax akin to SQL. In addition, Emma also advocates quasi-quoting the entire data analysis algorithm rather than its individual dataflow expressions. This allows for decomposing the quoted code into (sequential) control flow and (parallel) dataflow fragments, optimizing the dataflows in context, and transparently offloading them to an engine like Spark or Flink. The proposed design promises increased programmer productivity due to avoiding an impedance mismatch, thereby reducing the lag times and cost of data analysis.
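To make the comparison between for-comprehensions and SQL concrete, here is a minimal sketch in plain Scala. It assumes nothing about Emma's actual API: it runs on the standard library's Seq rather than on Emma's collection type, and the Movie and Rating records and their sample data are hypothetical, introduced only for illustration. It shows a join with a filter written once as a for-comprehension and once with the second-order functions (withFilter, flatMap, map) that the Scala compiler desugars the comprehension into.

```scala
// Hypothetical record types and data, for illustration only (not from the paper).
case class Movie(id: Int, title: String, year: Int)
case class Rating(movieId: Int, stars: Int)

object ComprehensionSketch {
  val movies: Seq[Movie] =
    Seq(Movie(1, "Alpha", 2014), Movie(2, "Beta", 2016), Movie(3, "Gamma", 2016))
  val ratings: Seq[Rating] =
    Seq(Rating(1, 4), Rating(2, 5), Rating(2, 3), Rating(3, 2))

  // Declarative form, roughly:
  //   SELECT m.title, r.stars FROM movies m, ratings r
  //   WHERE m.year = 2016 AND r.movieId = m.id
  val joined: Seq[(String, Int)] =
    for {
      m <- movies
      if m.year == 2016
      r <- ratings
      if r.movieId == m.id
    } yield (m.title, r.stars)

  // The same query spelled out with second-order functions,
  // mirroring how the compiler desugars the comprehension above.
  val desugared: Seq[(String, Int)] =
    movies.withFilter(_.year == 2016).flatMap { m =>
      ratings.withFilter(_.movieId == m.id).map(r => (m.title, r.stars))
    }

  def main(args: Array[String]): Unit = {
    // Both formulations yield the same pairs in the same order.
    assert(joined == desugared)
    println(joined)
  }
}
```

On local collections both forms evaluate the join as a nested loop. The abstract's point is that the comprehension form keeps the join visible as a single declarative expression, which is what would let a deeply embedded language optimize it before handing it to an engine.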
Citations
An intermediate representation for optimizing machine learning pipelines
Andreas Kunft,Asterios Katsifodimos,Sebastian Schelter,Sebastian Breß,Tilmann Rabl,Volker Markl +5 more
- 01 Jul 2019
TL;DR: Presents Lara, a declarative domain-specific language for collections and matrices with an intermediate representation (IR) that reflects on the complete program, i.e., UDFs, control flow, and both data types, to enable holistic optimization of ML training pipelines.
Bridging the gap: towards optimization across linear and relational algebra
Andreas Kunft,Alexander Alexandrov,Asterios Katsifodimos,Volker Markl +3 more
- 26 Jun 2016
TL;DR: Presents the design of Lara, a language deeply embedded in Scala that supports authoring scalable programs using two abstract data types (DataBag and Matrix) and control flow constructs, enabling joint optimizations over both relational and linear algebra.
39
Rethinking elastic online scheduling of big data streaming applications over high-velocity continuous data streams
TL;DR: Experimental results conclusively demonstrate that the proposed E-Stream provides better system response time and application fairness than the existing Storm framework.
DASM: Data-Streaming-Based Computing in Nonvolatile Memory Architecture for Embedded System
TL;DR: DASM, a data-streaming design for NVM-based CIM, achieves a speedup over the NVIDIA Jetson TK1 embedded GPU board, the Intel Xeon E5-2640 CPU, and a state-of-the-art field-programmable gate array (FPGA) design, with much lower power consumption.
17
Compile-Time Query Optimization for Big Data Analytics
Leonidas Fegaras
- 01 Jan 2019
TL;DR: A new query language for data-intensive scalable computing that is deeply embedded in Scala, called DIQL, and a query optimization framework that optimizes and translates DIQL queries to byte code at compile-time are introduced.
References
Large-Scale Parallel Collaborative Filtering for the Netflix Prize
Yunhong Zhou,Dennis M. Wilkinson,Robert Schreiber,Rong Pan +3 more
- 23 Jun 2008
TL;DR: This paper describes a CF algorithm, alternating-least-squares with weighted-λ-regularization (ALS-WR), which is implemented on a parallel Matlab platform, and shows empirically that the performance of ALS-WR monotonically improves with both the number of features and the number of ALS iterations.
872
Implicit Parallelism through Deep Language Embedding
Alexander Alexandrov,Andreas Kunft,Asterios Katsifodimos,Felix Schüler,Lauritz Thamsen,Odej Kao,Tobias Herb,Volker Markl +7 more
- 27 May 2015
TL;DR: This paper proposes a language for complex data analysis embedded in Scala, which allows for declarative specification of dataflows and hides the notion of data-parallelism and distributed runtime behind a suitable intermediate representation.
Haskell boards the ferry: database-supported program execution for Haskell
George Giorgidze,Torsten Grust,Tom Schreiber,Jeroen Weijers +3 more
- 01 Sep 2010
TL;DR: A Haskell library for database-supported program execution that avoids unnecessary data transfer and context switching between the database coprocessor and the programming language runtime by ensuring that the number of generated relational queries is only determined by the program fragment’s type and not by the database size.
Organizing functional code for parallel execution or, foldl and foldr considered slightly harmful
Guy L. Steele
- 31 Aug 2009
TL;DR: This talk will discuss three ideas that I have found to be especially powerful in organizing Fortress programs so that they may be executed equally effectively either sequentially or in parallel: user-defined associative operators (and, more generally, user-defined monoids); conjugate transforms of data; and monoid-caching trees.
29