About: Extract, transform, load is a research topic. Over its lifetime, 117 publications have appeared within this topic, receiving 1282 citations. The topic is also known as: ETL.
TL;DR: This survey covers the conceptual and logical modeling of ETL processes, along with some design methods, visits each stage of the E-T-L triplet, and examines the problems that fall within each of these stages.
Abstract: The software processes that facilitate the original loading and the periodic refreshment of the data warehouse contents are commonly known as Extraction-Transformation-Loading (ETL) processes. The intention of this survey is to present the research work in the field of ETL technology in a structured way. To this end, we organize the coverage of the field as follows: (a) first, we cover the conceptual and logical modeling of ETL processes, along with some design methods, (b) we visit each stage of the E-T-L triplet, and examine problems that fall within each of these stages, (c) we discuss problems that pertain to the entirety of an ETL process, and, (d) we review some research prototypes of academic origin.
TL;DR: In this paper, the authors present a DataStage Service Architecture (DSA) that helps automate and control the ETL process and allows developers to easily view and update the process.
Abstract: Novel tools support the development and operation of ETL (Extract Transform Load) systems for populating databases. An embodiment uses metadata tables to describe the relationships between the jobs that are run to process data. These relationships can include parent-child job relations and priority. The tools create a DataStage Service Architecture (DSA) that helps automate and control the ETL process. Other tools allow developers to easily view and update the ETL process.
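The idea of driving job execution from metadata that records parent-child relations and priority can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `JOB_METADATA` rows, the job names, the `run_order` helper, and the convention that a lower priority number runs first are hypothetical and are not taken from the described architecture or from DataStage.

```python
import heapq

# Hypothetical metadata table: one row per job.
# parent=None marks a root job; a lower priority number runs first (an assumption).
JOB_METADATA = [
    {"job": "load_fact_sales", "parent": "transform_sales", "priority": 1},
    {"job": "extract_sales", "parent": None, "priority": 2},
    {"job": "extract_customers", "parent": None, "priority": 1},
    {"job": "transform_sales", "parent": "extract_sales", "priority": 1},
    {"job": "load_dim_customer", "parent": "extract_customers", "priority": 2},
]


def run_order(rows):
    """Return job names so that every parent precedes its children.

    Among the jobs that are ready to run, the lowest priority number
    goes first; the job name breaks ties deterministically.
    """
    # Group the rows by parent so children can be released once the parent ran.
    children = {}
    for row in rows:
        children.setdefault(row["parent"], []).append(row)

    # Root jobs (no parent) are ready immediately.
    ready = [(r["priority"], r["job"]) for r in children.get(None, [])]
    heapq.heapify(ready)

    order = []
    while ready:
        _, job = heapq.heappop(ready)
        order.append(job)
        for child in children.get(job, []):
            heapq.heappush(ready, (child["priority"], child["job"]))

    # Jobs never reached have an unknown parent or sit on a cycle.
    if len(order) != len(rows):
        raise ValueError("some jobs have an unknown parent or form a cycle")
    return order


order = run_order(JOB_METADATA)
```

Keeping the relationships in data rather than in code means the schedule can be changed by editing rows, which is in the spirit of the metadata-table approach the abstract describes.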
TL;DR: This work validated the correctness of the extract, transform, and load (ETL) process that populates the West Virginia Clinical and Translational Science Institute's Integrated Data Repository, a clinical data warehouse that includes data extracted from two EHR systems.
TL;DR: This paper presents the Python-based framework pygrametl, which offers commonly used functionality for ETL development, and proposes doing ETL programming by writing code; its experiments show that both development time and running time are short when the framework is used.
Abstract: Extract-Transform-Load (ETL) processes are used for extracting data, transforming it and loading it into data warehouses (DWs). Many tools for creating ETL processes exist. The dominating tools all use graphical user interfaces (GUIs) where the developer visually defines the data flow and operations. In this paper, we challenge this approach and propose to do ETL programming by writing code. To make the programming easy, we present the (Python-based) framework pygrametl which offers commonly used functionality for ETL development. By using the framework, the developer can efficiently create effective ETL solutions from which the full power of programming can be exploited. Our experiments show that when pygrametl is used, both the development time and running time are short when compared to an existing GUI-based tool.
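To make the code-based style of ETL concrete, the following is a minimal sketch that uses only the Python standard library. It does not use pygrametl's API; the table names, the sample CSV data, and the `ensure_product` and `run_etl` helpers are hypothetical and chosen purely for illustration. It extracts rows from a CSV source, transforms the amount to an integer, and loads a small dimension table and fact table into an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file or an operational database.
SOURCE_CSV = """product,category,amount
apple,fruit,3
pear,fruit,2
apple,fruit,4
"""


def ensure_product(conn, name, category):
    """Return the surrogate key of a product, inserting the dimension row if missing."""
    row = conn.execute(
        "SELECT product_id FROM product WHERE name = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO product (name, category) VALUES (?, ?)", (name, category)
    )
    return cur.lastrowid


def run_etl(conn, source_text):
    """Create the target tables, then extract, transform, and load each source row."""
    conn.execute(
        "CREATE TABLE product ("
        "product_id INTEGER PRIMARY KEY, name TEXT UNIQUE, category TEXT)"
    )
    conn.execute("CREATE TABLE sales (product_id INTEGER, amount INTEGER)")
    for record in csv.DictReader(io.StringIO(source_text)):  # extract
        amount = int(record["amount"])  # transform: text to integer
        key = ensure_product(conn, record["product"], record["category"])
        conn.execute("INSERT INTO sales VALUES (?, ?)", (key, amount))  # load
    conn.commit()


conn = sqlite3.connect(":memory:")
run_etl(conn, SOURCE_CSV)

# Total amount per product, read back from the loaded warehouse tables.
totals = conn.execute(
    "SELECT p.name, SUM(s.amount) FROM sales s "
    "JOIN product p USING (product_id) GROUP BY p.name ORDER BY p.name"
).fetchall()
```

Because the whole flow is ordinary code, loops, functions, and tests can be applied to it directly, which is the kind of flexibility the abstract argues a GUI-defined data flow lacks.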