Denormalization

Papers

Journal Article • DOI: 10.1109/TKDE.2009.127

Bayesian Classifiers Programmed in SQL

Carlos Ordonez¹, Sasi K. Pitchaimalai¹

University of Houston¹

01 Jan 2010, IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work introduces two classifiers programmed in SQL, naive Bayes and a classifier based on class decomposition using K-means clustering; the resulting Bayesian classifier achieves high classification accuracy, can efficiently analyze large data sets, and scales linearly.

Abstract: The Bayesian classifier is a fundamental classification technique. In this work, we focus on programming Bayesian classifiers in SQL. We introduce two classifiers: naive Bayes and a classifier based on class decomposition using K-means clustering. We consider two complementary tasks: model computation and scoring a data set. We study several layouts for tables and several indexing alternatives. We analyze how to transform equations into efficient SQL queries and introduce several query optimizations. We conduct experiments with real and synthetic data sets to evaluate classification accuracy, query optimizations, and scalability. Our Bayesian classifier is more accurate than naive Bayes and decision trees. Distance computation is significantly accelerated with horizontal layout for tables, denormalization, and pivoting. We also compare naive Bayes implementations in SQL and C++: SQL is about four times slower. Our Bayesian classifier in SQL achieves high classification accuracy, can efficiently analyze large data sets, and has linear scalability.

62 citations
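The abstract splits the work into model computation and scoring. A minimal sketch of that split for naive Bayes, assuming numeric attributes modeled by per-class Gaussians and a horizontal layout (one column per attribute); the table and column names (X, x1, x2, g) are hypothetical, and the paper's own queries and optimizations are not reproduced. Model computation is a single GROUP BY pass; scoring is done in Python because SQLite's LN/EXP functions are not available in every build.

```python
import math
import sqlite3

# Hypothetical training set, horizontal layout: one column per attribute
# (x1, x2) plus the class label g.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL, g INTEGER)")
conn.executemany(
    "INSERT INTO X VALUES (?, ?, ?, ?)",
    [(1, 1.0, 2.0, 0), (2, 1.2, 1.8, 0), (3, 0.8, 2.2, 0),
     (4, 5.0, 6.0, 1), (5, 5.3, 5.7, 1), (6, 4.7, 6.3, 1)],
)

# Model computation: one aggregation pass yields, per class, the count and the
# per-attribute mean and variance (variance computed as E[x^2] - E[x]^2).
MODEL_SQL = """
SELECT g, COUNT(*),
       AVG(x1), AVG(x1 * x1) - AVG(x1) * AVG(x1),
       AVG(x2), AVG(x2 * x2) - AVG(x2) * AVG(x2)
FROM X
GROUP BY g
"""
rows = conn.execute(MODEL_SQL).fetchall()
n_total = sum(r[1] for r in rows)
# Class -> (prior, [(mean, variance) per attribute]); variances are floored
# to avoid division by zero for constant attributes.
model = {
    g: (n / n_total, [(m1, max(v1, 1e-9)), (m2, max(v2, 1e-9))])
    for g, n, m1, v1, m2, v2 in rows
}

def log_gaussian(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def classify(point):
    """Scoring: pick the class maximizing log P(g) + sum_j log P(x_j | g)."""
    def log_posterior(g):
        prior, params = model[g]
        return math.log(prior) + sum(
            log_gaussian(x, m, v) for x, (m, v) in zip(point, params))
    return max(model, key=log_posterior)

print(classify((1.1, 2.1)), classify((5.1, 5.9)))  # prints: 0 1
```

Keeping the per-class sufficient statistics in one aggregation query means the training data is scanned once, regardless of the number of classes.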

Journal Article • DOI: 10.14778/2732951.2732965

WideTable: an accelerator for analytical data processing

Yinan Li¹, Jignesh M. Patel¹

University of Wisconsin-Madison¹

1 Jun 2014

TL;DR: This paper presents WideTable, a technique that improves the speed of analytical data processing systems by denormalizing the database into a wide table and then converting complex queries into simple scans on that table.

Abstract: This paper presents a technique called WideTable that aims to improve the speed of analytical data processing systems. A WideTable is built by denormalizing the database; complex queries are then converted into simple scans on the underlying (wide) table. To avoid the pitfalls associated with denormalization, e.g., space overheads, WideTable uses a combination of techniques including dictionary encoding and columnar storage. When denormalizing the data, WideTable uses outer joins to ensure that queries on tables in the schema graph, which are now nested as embedded tables in the WideTable, are processed correctly. Then, using a packed code scan technique, even complex queries on the original database can be answered by using simple scans on the WideTable(s). We experimentally evaluate our methods in a main-memory setting using the queries in TPC-H, and demonstrate the effectiveness of our methods, both in terms of raw query performance and scalability when running on many-core machines.

61 citations
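A minimal sketch of the core WideTable idea under a toy three-table schema (all table and column names are hypothetical, loosely TPC-H-flavored): the joins are computed once into a wide table using outer joins, after which an aggregate that needed two joins becomes a single-table scan. It does not reproduce WideTable's columnar storage or packed code scans; `dictionary_encode` is a hypothetical helper that only illustrates dictionary encoding of one column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized schema: orders -> customer -> nation.
conn.executescript("""
CREATE TABLE nation   (n_key INTEGER PRIMARY KEY, n_name TEXT);
CREATE TABLE customer (c_key INTEGER PRIMARY KEY, c_name TEXT, c_nation INTEGER);
CREATE TABLE orders   (o_key INTEGER PRIMARY KEY, o_cust INTEGER, o_total REAL);
INSERT INTO nation   VALUES (1, 'FRANCE'), (2, 'JAPAN');
INSERT INTO customer VALUES (10, 'alice', 1), (11, 'bob', 2), (12, 'carol', NULL);
INSERT INTO orders   VALUES (100, 10, 50.0), (101, 10, 25.0), (102, 11, 40.0), (103, 12, 10.0);
""")

# Denormalize once. Outer joins keep the order whose customer has no nation;
# an inner join would silently drop it from the wide table.
conn.execute("""
CREATE TABLE wide AS
SELECT o.o_key, o.o_total, c.c_name, n.n_name
FROM orders o
LEFT OUTER JOIN customer c ON o.o_cust = c.c_key
LEFT OUTER JOIN nation   n ON c.c_nation = n.n_key
""")

# The aggregate on the normalized schema needs two joins...
joined = conn.execute("""
SELECT n.n_name, SUM(o.o_total)
FROM orders o
LEFT OUTER JOIN customer c ON o.o_cust = c.c_key
LEFT OUTER JOIN nation   n ON c.c_nation = n.n_key
GROUP BY n.n_name ORDER BY n.n_name
""").fetchall()

# ...but on the wide table it is a plain scan with grouping.
scanned = conn.execute(
    "SELECT n_name, SUM(o_total) FROM wide GROUP BY n_name ORDER BY n_name"
).fetchall()

def dictionary_encode(values):
    """Replace each value with a small integer code; return (codes, dictionary)."""
    dictionary = {}
    codes = [dictionary.setdefault(v, len(dictionary)) for v in values]
    return codes, dictionary

names = [r[0] for r in conn.execute("SELECT n_name FROM wide ORDER BY o_key")]
codes, dictionary = dictionary_encode(names)
print(scanned)  # [(None, 10.0), ('FRANCE', 75.0), ('JAPAN', 40.0)]
```

Dictionary encoding is what keeps the repeated dimension values in a wide table cheap: each repeated string is stored once, and the table holds only small integer codes.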

Patent

Automatic and transparent denormalization support, wherein denormalization is achieved through appending of fields to base relations of a normalized database

John D. Conley, Richard P. Whitehurst¹

CA Technologies¹

30 Mar 1990

TL;DR: A system is proposed that enables a database administrator to selectively denormalize a database transparently to users and programmers, by keeping a record of the mapping between the denormalized fields and the base fields from which they are derived.

Abstract: A system may be used to enable a database administrator to selectively denormalize a database transparently to users and programmers. The system keeps a record of the mapping between the denormalized fields and the base fields from which they are derived. Processors access those recorded links to keep the database self-consistent and to retrieve data from denormalized fields whenever possible.

60 citations
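The abstract describes a recorded mapping from denormalized fields to their base fields, used both to keep copies consistent and to serve reads from the copies. A minimal sketch of that idea, assuming SQLite as the store and a single hypothetical mapping record (orders.cust_name appended to the orders relation as a copy of customer.name); the `Derivation` record and `update_base` function are illustrative names, not the patent's design.

```python
import sqlite3
from typing import NamedTuple

class Derivation(NamedTuple):
    """Hypothetical record: target_table.target_field copies base_table.base_field."""
    target_table: str
    target_field: str
    target_fk: str   # column in target_table that references base_table
    base_table: str
    base_field: str
    base_pk: str

# Hypothetical mapping record kept by the system, invisible to users.
MAPPINGS = [Derivation("orders", "cust_name", "cust_id", "customer", "name", "id")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, cust_name TEXT);
INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES (10, 1, 'Acme'), (11, 1, 'Acme'), (12, 2, 'Globex');
""")

def update_base(conn, table, pk_col, pk_value, field, new_value):
    """Update a base field, then propagate it to every recorded copy.

    Identifiers are interpolated only from trusted schema metadata, never
    from user input; values are always passed as query parameters.
    """
    with conn:  # one transaction: the base row and its copies change together
        conn.execute(f"UPDATE {table} SET {field} = ? WHERE {pk_col} = ?",
                     (new_value, pk_value))
        for m in MAPPINGS:
            if (m.base_table, m.base_field, m.base_pk) == (table, field, pk_col):
                conn.execute(
                    f"UPDATE {m.target_table} SET {m.target_field} = ? "
                    f"WHERE {m.target_fk} = ?",
                    (new_value, pk_value))

def customer_name_for_order(conn, order_id):
    """Read the denormalized copy directly instead of joining to customer."""
    return conn.execute("SELECT cust_name FROM orders WHERE id = ?",
                        (order_id,)).fetchone()[0]

update_base(conn, "customer", "id", 1, "name", "Acme Corp")
```

Because callers only ever go through `update_base`, they never need to know which fields have been copied, which is the sense in which the denormalization stays transparent.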

Journal Article • DOI: 10.3233/IDA-2011-0485

Data set preprocessing and transformation in a database system

Carlos Ordonez¹

University of Houston¹

1 Jun 2011

TL;DR: This article summarizes experience and recommendations for computing data set preprocessing and transformation, the most time-consuming task in data mining projects, inside a database system, and identifies advantages and disadvantages from a practical standpoint based on feedback from data mining users.

Abstract: In general, a significant amount of data mining analysis is performed outside a database system, which creates many data management issues. This article presents a summary of our experience and recommendations to compute data set preprocessing and transformation inside a database system (i.e., data cleaning, record selection, summarization, denormalization, variable creation, coding), which is the most time-consuming task in data mining projects. This aspect is largely ignored in the literature. We present practical issues, common solutions and lessons learned when preparing and transforming data sets with the SQL language, based on experience from real-life projects. We then provide specific guidelines to translate programs written in a traditional programming language into SQL statements. Based on successful real-life projects, we present time performance comparisons between SQL code running inside the database system and external data mining programs. We highlight which steps in data mining projects become faster when processed by the database system. More importantly, we identify advantages and disadvantages from a practical standpoint based on feedback from data mining users.

60 citations
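The abstract mentions guidelines for translating programs written in a traditional programming language into SQL statements. The sketch below is not the article's guidelines; it only shows one common instance of the pattern, with hypothetical data and names: a per-record loop that accumulates summaries and codes a categorical variable becomes a single GROUP BY with a CASE expression, computed inside the database system.

```python
import sqlite3
from collections import defaultdict

# Hypothetical transaction data: (customer id, amount, channel).
ROWS = [(1, 10.0, "web"), (1, 5.5, "store"), (2, 20.0, "web"),
        (2, 2.5, "web"), (3, 7.0, "store")]

def summarize_with_loop(rows):
    """Procedural version: one pass, accumulating per-customer variables."""
    acc = defaultdict(lambda: [0, 0.0, 0])  # count, total amount, web purchases
    for cust, amount, channel in rows:
        a = acc[cust]
        a[0] += 1
        a[1] += amount
        a[2] += 1 if channel == "web" else 0  # coding a categorical variable
    return [(cust, *vals) for cust, vals in sorted(acc.items())]

def summarize_with_sql(rows):
    """Same summarization and coding expressed as one SQL statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (cust_id INTEGER, amount REAL, channel TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return conn.execute("""
        SELECT cust_id,
               COUNT(*),
               SUM(amount),
               SUM(CASE WHEN channel = 'web' THEN 1 ELSE 0 END)
        FROM t
        GROUP BY cust_id
        ORDER BY cust_id
    """).fetchall()

print(summarize_with_sql(ROWS))  # [(1, 2, 15.5, 1), (2, 2, 22.5, 2), (3, 1, 7.0, 0)]
```

The loop's accumulator variables map onto aggregate functions, and its `if` on the category maps onto `CASE`, so the data never has to leave the database to be summarized.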

Patent

Denormalization system and method of operation

Michael P. Taborn¹, Steven Michael Burchfiel¹, David Terrence Matheny¹

IBM¹

27 Feb 1995

TL;DR: A method for denormalizing a floating-point result is presented that reuses existing pipeline resources through the floating-point unit's feedback path, using one of the exponent-equalizing alignment shifters and an incrementor to round the denormalized result.

Abstract: A system and method for denormalizing a floating-point result is disclosed. Denormalized operands can represent much smaller values than numbers normalized under ANSI/IEEE Standard 754-1985, which governs the representation of numbers in floating-point notation to ensure uniformity among its users. Because the majority of results will be normalized, the floating-point unit pipeline is optimized to produce normalized results, but it contains wider exponent fields so that it can represent values received as denormalized numbers. To return a result as a denormalized number with the smaller ANSI/IEEE exponent field, denormalization reuses the same pipeline resources through the floating-point unit's feedback path, using one of the exponent-equalizing alignment shifters and an incrementor to round the denormalized result. In this way, denormalized results can be provided without stopping the dispatching of instructions, without providing status bits in the register files and rename registers, and without the hold signals often present in other floating-point units to accomplish denormalization.

58 citations
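This patent uses "denormalization" in the floating-point sense, not the database sense. The patent concerns FPU hardware; the sketch below only illustrates the number format it targets, using Python's floats, which are IEEE 754 binary64 on common platforms (an assumption: the patent's pipeline may handle other formats). A denormalized (subnormal) value has an all-zero exponent field, which lets it represent magnitudes below the smallest normalized number.

```python
import math
import struct
import sys

# IEEE 754 binary64, the format of Python floats on common platforms.
SMALLEST_NORMAL = sys.float_info.min         # 2**-1022
SMALLEST_SUBNORMAL = math.ldexp(1.0, -1074)  # 2**-1074

def exponent_field(x: float) -> int:
    """Return the 11-bit biased exponent field of a binary64 value."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF

def is_subnormal(x: float) -> bool:
    """Nonzero values with an all-zero exponent field are denormalized."""
    return x != 0.0 and exponent_field(x) == 0

# Gradual underflow: halving the smallest normalized number yields a
# denormalized number rather than zero.
half = SMALLEST_NORMAL / 2
print(exponent_field(SMALLEST_NORMAL), exponent_field(half), is_subnormal(half))
# prints: 1 0 True
```

Below the smallest subnormal value there is nothing left to represent: halving 2**-1074 is an exact tie that rounds to zero under round-to-nearest-even.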

...

Papers published on a yearly basis

Year	Papers
2021	3
2020	9
2019	7
2018	13
2017	9
2016	10

Related Topics (5)

Performance Metrics