Top 668 papers published in the topic of Column (database) in 2019

Showing papers on "Column (database)" published in 2019

Proceedings Article • DOI: 10.1145/3292500.3330993

Sherlock: A Deep Learning Approach to Semantic Data Type Detection

Madelon Hulsebos¹, Kevin Hu¹, Michiel A. Bakker¹, Emanuel Zgraggen¹, Arvind Satyanarayan¹, Tim Kraska¹, Çağatay Demiralp, César A. Hidalgo¹

Massachusetts Institute of Technology¹

25 Jul 2019

TL;DR: Sherlock, a multi-input deep neural network for detecting semantic types, achieves a support-weighted F$_1$ score of $0.89$, exceeding machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations.

Abstract: Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery. Existing data preparation and analysis systems rely on dictionary lookups and regular expression matching to detect semantic types. However, these matching-based approaches often are not robust to dirty data and only detect a limited number of types. We introduce Sherlock, a multi-input deep neural network for detecting semantic types. We train Sherlock on $686,765$ data columns retrieved from the VizNet corpus by matching $78$ semantic types from DBpedia to column headers. We characterize each matched column with $1,588$ features describing the statistical properties, character distributions, word embeddings, and paragraph vectors of column values. Sherlock achieves a support-weighted F$_1$ score of $0.89$, exceeding that of machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations.

206 citations
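
The abstract groups Sherlock's per-column features into statistical properties, character distributions, word embeddings, and paragraph vectors. As a rough illustration of the first two groups only, the Python sketch below computes a handful of such features for one column of string values. The function names and the specific features are hypothetical choices for this example; they are not Sherlock's 1,588 features, and no embedding features are computed.

```python
from statistics import mean, pstdev


def is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def column_features(values: list[str], chars: str = "0123456789-@.") -> dict[str, float]:
    """Compute a small, hypothetical subset of column-level features."""
    n = len(values)
    if n == 0:
        return {}
    lengths = [len(v) for v in values]
    features = {
        # Statistical properties of the column as a whole.
        "frac_unique": len(set(values)) / n,
        "frac_numeric": sum(is_number(v) for v in values) / n,
        "mean_length": mean(lengths),
        "std_length": pstdev(lengths),
    }
    # Character-distribution features: average occurrences of each character per value.
    for ch in chars:
        features[f"avg_count[{ch}]"] = sum(v.count(ch) for v in values) / n
    return features


dates = ["1990-01-02", "2001-12-31", "1985-07-04"]
print(column_features(dates)["avg_count[-]"])  # 2.0
```

In a full pipeline, vectors like this would be computed per column and fed, alongside the embedding-based feature groups, to the classifier.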

Proceedings Article • DOI: 10.1109/ICDAR.2019.00029

TableNet: Deep Learning Model for End-to-end Table Detection and Tabular Data Extraction from Scanned Document Images

Shubham Paliwal, Vishwanath D, Rohit Rahul, Monika Sharma, Lovekesh Vig

1 Sep 2019

TL;DR: This paper proposes TableNet, a novel end-to-end deep learning model for both table detection and structure recognition, which exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions.

Abstract: With the widespread use of mobile phones and scanners to photograph and upload documents, the need for extracting the information trapped in unstructured document images such as retail receipts, insurance claim forms and financial invoices is becoming more acute. A major hurdle to this objective is that these images often contain information in the form of tables and extracting data from tabular sub-images presents a unique set of challenges. This includes accurate detection of the tabular region within an image, and subsequently detecting and extracting information from the rows and columns of the detected table. While some progress has been made in table detection, extracting the table contents is still a challenge since this involves more fine-grained table structure (rows and columns) recognition. Prior approaches have attempted to solve the table detection and structure recognition problems independently using two separate models. In this paper, we propose TableNet: a novel end-to-end deep learning model for both table detection and structure recognition. The model exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions. This is followed by semantic rule-based row extraction from the identified tabular sub-regions. The proposed model and extraction approach were evaluated on the publicly available ICDAR 2013 and Marmot Table datasets, obtaining state-of-the-art results. Additionally, we demonstrate that feeding additional semantic features further improves model performance and that the model exhibits transfer learning across datasets. Another contribution of this paper is to provide additional table structure annotations for the Marmot data, which currently only has annotations for table detection.

167 citations

Journal Article • DOI: 10.1609/AAAI.V33I01.330129

ColNet: Embedding the Semantics of Web Tables for Column Type Prediction

Jiaoyan Chen¹, Ernesto Jiménez-Ruiz², Ian Horrocks¹, Charles Sutton³

University of Oxford¹, The Alan Turing Institute², University of Edinburgh³

17 Jul 2019

TL;DR: This paper proposes ColNet, a neural-network-based column type annotation framework that integrates KB reasoning and lookup with machine learning and automatically trains Convolutional Neural Networks for prediction.

Abstract: Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables. Current methods rely on either table metadata like column name or entity correspondences of cells in the KB, and may fail to deal with growing web tables with incomplete meta information. In this paper we propose a neural network based column type annotation framework named ColNet which is able to integrate KB reasoning and lookup with machine learning and can automatically train Convolutional Neural Networks for prediction. The prediction model not only considers the contextual semantics within a cell using word representation, but also embeds the semantics of a column by learning locality features from multiple cells. The method is evaluated with DBpedia and two different web table datasets, T2Dv2 from the general Web and Limaye from Wikipedia pages, and achieves higher performance than the state-of-the-art approaches.

102 citations

Proceedings Article • DOI: 10.1109/ICDAR.2019.00226

DeepTabStR: Deep Learning based Table Structure Recognition

Shoaib Ahmed Siddiqui¹, Imran Ali Fateh¹, Syed Tahseen Raza Rizvi¹, Andreas Dengel¹, Sheraz Ahmed¹

German Research Centre for Artificial Intelligence¹

1 Sep 2019

TL;DR: A novel method for the analysis of tabular structures in document images using deformable convolutional networks, together with a new image-based table structure recognition dataset, TabStructDB, built on the ICDAR-17 Page-Object Detection dataset and comprising 1,081 tables densely labeled with row and column information.

Abstract: This paper presents a novel method for the analysis of tabular structures in document images using the potential of deformable convolutional networks. In order to assess the suitability of the model to the task of table structure recognition, most of the prior methods have been tested on the smaller ICDAR-13 table structure recognition dataset comprising just 156 tables. We curated a new image-based table structure recognition dataset, TabStructDB, comprising 1,081 tables densely labeled with row and column information. Instead of collecting new images for this purpose, we leveraged the famous Page-Object Detection dataset from ICDAR-17, and added structural information for all the tabular regions present in the dataset. This new publicly available dataset will enable the development of more sophisticated table structure recognition techniques in the future. We performed extensive evaluation on the two datasets (ICDAR-13 and TabStructDB) including cross-dataset testing in order to evaluate the efficacy of the proposed approach. We achieved state-of-the-art results with deformable models on ICDAR-13 with an average F-Measure of 92.98% (89.42% for rows and 96.55% for columns) and report baseline results on TabStructDB for guiding future research efforts with an F-Measure of 93.72% (91.26% for rows and 95.59% for columns). Despite promising results, structural analysis of tables with arbitrary layouts is still far from achievable at this point.

91 citations

Proceedings Article • DOI: 10.1145/3343031.3350898

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

Zhi-Qi Cheng¹, Jun-Xiu Li¹, Qi Dai², Xiao Wu¹, Jun-Yan He¹, Alexander G. Hauptmann³

Southwest Jiaotong University¹, Microsoft², Carnegie Mellon University³

15 Oct 2019

TL;DR: Extensive experiments on four challenging benchmarks show that McML can significantly improve the original multi-column networks and outperform the other state-of-the-art approaches.

Abstract: Tremendous variation in the scale of people/head size is a critical problem for crowd counting. To improve the scale invariance of feature representation, recent works extensively employ Convolutional Neural Networks with multi-column structures to handle different scales and resolutions. However, due to the substantial redundant parameters in columns, existing multi-column networks invariably exhibit almost the same scale features in different columns, which severely affects counting accuracy and leads to overfitting. In this paper, we attack this problem by proposing a novel Multi-column Mutual Learning (McML) strategy. It has two main innovations: 1) A statistical network is incorporated into the multi-column framework to estimate the mutual information between columns, which can approximately indicate the scale correlation between features from different columns. By minimizing the mutual information, each column is guided to learn features with different image scales. 2) We devise a mutual learning scheme that can alternately optimize each column while keeping the other columns fixed on each mini-batch training data. With such asynchronous parameter update process, each column is inclined to learn different feature representation from others, which can efficiently reduce the parameter redundancy and improve generalization ability. More remarkably, McML can be applied to all existing multi-column networks and is end-to-end trainable. Extensive experiments on four challenging benchmarks show that McML can significantly improve the original multi-column networks and outperform the other state-of-the-art approaches.

84 citations

Proceedings Article • DOI: 10.18653/V1/D19-1310

Table-to-Text Generation with Effective Hierarchical Encoder on Three Dimensions (Row, Column and Time)

Heng Gong¹, Xiaocheng Feng¹, Bing Qin¹, Ting Liu¹

Harbin Institute of Technology¹

1 Nov 2019

TL;DR: This work develops a table cell fusion gate to combine representations from row, column and time dimension into one dense vector according to the saliency of each dimension’s representation.

Abstract: Although Seq2Seq models for table-to-text generation have achieved remarkable progress, modeling table representation in one dimension is inadequate. This is because (1) the table consists of multiple rows and columns, which means that encoding a table should not depend only on one dimensional sequence or set of records and (2) most of the tables are time series data (e.g. NBA game data, stock market data), which means that the description of the current table may be affected by its historical data. To address the aforementioned problems, not only do we model each table cell considering other records in the same row, we also enrich table’s representation by modeling each table cell in context of other cells in the same column or with historical (time dimension) data respectively. In addition, we develop a table cell fusion gate to combine representations from row, column and time dimension into one dense vector according to the saliency of each dimension’s representation. We evaluated our methods on ROTOWIRE, a benchmark dataset of NBA basketball games. Both automatic and human evaluation results demonstrate the effectiveness of our model, with an improvement of 2.66 in BLEU over the strong baseline, and show that it outperforms the state-of-the-art model.

71 citations
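
The cell fusion gate described in the abstract can be pictured as a softmax-weighted sum of a cell's row, column, and time representations. The Python sketch below is an illustrative attention-style gate under assumed inputs: `fuse_cell` and the scoring vector `w` (which turns each representation into a saliency score) are hypothetical, and the paper's actual gate parameterization may differ.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_cell(reps: list[list[float]], w: list[float]) -> tuple[list[float], list[float]]:
    """Fuse a cell's per-dimension representations into one dense vector.

    reps: [row_rep, col_rep, time_rep], all of equal length.
    w: hypothetical scoring vector; score_k = w . reps[k] acts as saliency.
    Returns (fused_vector, weights).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, r)) for r in reps]
    alphas = softmax(scores)
    dim = len(reps[0])
    # Weighted sum over the three dimensions, coordinate by coordinate.
    fused = [sum(a * r[j] for a, r in zip(alphas, reps)) for j in range(dim)]
    return fused, alphas


row_rep, col_rep, time_rep = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
fused, alphas = fuse_cell([row_rep, col_rep, time_rep], w=[0.0, 0.0])
print(alphas)  # equal weights of 1/3 when all scores tie
```

With a learned scorer, the weights shift toward whichever dimension is most salient for a given cell, for instance the time dimension when historical data is most informative.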

Journal Article • DOI: 10.1002/EQE.3164

Seismic fragilities of single-column highway bridges with rocking column-footing

Yazhou Xie¹, Jian Zhang², Reginald DesRoches¹, Jamie E. Padgett¹

Rice University¹, University of California, Los Angeles²

01 Jun 2019 • Earthquake Engineering & Structural Dynamics

TL;DR: In this paper, the authors evaluated the effectiveness of using rocking isolation as a retrofit strategy for single-column concrete box-girder highway bridges in California, and derived the response class probabilities of rocking uplift and overturning.

Abstract: Rocking isolation has been increasingly studied as a promising design concept to limit the earthquake damage of civil structures. Despite the difficulties and uncertainties of predicting the rocking response under individual earthquake excitations (due to negative rotational stiffness and complex impact energy loss), in a statistical sense, the seismic performance of rocking structures has been shown to be generally consistent with the experimental outcomes. To this end, this study assesses, in a probabilistic manner, the effectiveness of using rocking isolation as a retrofit strategy for single-column concrete box-girder highway bridges in California. Under earthquake excitation, the rocking bridge could experience multi-class responses (e.g., fully contacted or uplifting foundation) and multi-mode damage (e.g., overturning, uplift impact, and column nonlinearity). A multi-step machine learning framework is developed to estimate the damage probability associated with each damage scenario. The framework consists of the dimensionally consistent generalized linear model for regression of seismic demand, the logistic regression for classification of distinct response classes, and the stepwise regression for feature selection of significant ground motion and structural parameters. Fragility curves are derived to predict the response class probabilities of rocking uplift and overturning, and the conditional damage probabilities such as column vibrational damage and rocking uplift impact damage. The fragility estimates of rocking bridges are compared with those for as-built bridges, indicating that rocking isolation is capable of reducing column damage potential. Additionally, there exists an optimal slenderness angle range that enables the studied bridges to experience much lower overturning tendencies and significantly reduced column damage probabilities at the same time.

69 citations
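
A fragility curve maps a ground-motion intensity measure (IM) to the probability of a response class or damage state. Since the framework above uses logistic regression to classify response classes, the Python sketch below assumes one possible form, a logistic model in ln(IM) with hypothetical coefficients `b0` and `b1`. It is not the paper's fitted model, which also conditions on structural parameters selected by stepwise regression.

```python
import math


def logistic_fragility(im: float, b0: float, b1: float) -> float:
    """P(response class | IM) under an assumed logistic model in ln(IM).

    b0, b1 are hypothetical regression coefficients; im must be > 0.
    """
    z = b0 + b1 * math.log(im)
    return 1.0 / (1.0 + math.exp(-z))


def median_im(b0: float, b1: float) -> float:
    """IM at which the modeled probability equals 0.5 (where b0 + b1*ln(IM) = 0)."""
    return math.exp(-b0 / b1)


b0, b1 = 1.2, 2.5  # hypothetical coefficients, for illustration only
curve = [(im, logistic_fragility(im, b0, b1)) for im in (0.1, 0.3, 0.6, 1.0)]
```

With `b1 > 0` the curve rises monotonically with IM. Comparing curves fitted for rocking and as-built bridges at the same IM is how a reduction in damage probability would show up.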

Proceedings Article • DOI: 10.1145/3331184.3331333

Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval

Li Zhang¹, Shuo Zhang¹, Krisztian Balog¹

University of Stavanger¹

18 Jul 2019

TL;DR: Table embeddings, utilized in three table-related tasks (row population, column population, and table retrieval), significantly improve upon the performance of state-of-the-art baselines.

Abstract: Tables contain valuable knowledge in a structured form. We employ neural language modeling approaches to embed tabular data into vector spaces. Specifically, we consider different table elements, such as captions, column headings, and cells, for training word and entity embeddings. These embeddings are then utilized in three particular table-related tasks (row population, column population, and table retrieval) by incorporating them into existing retrieval models as additional semantic similarity signals. Evaluation results show that table embeddings can significantly improve upon the performance of state-of-the-art baselines.

69 citations
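
The abstract describes using embeddings as an additional semantic similarity signal inside existing retrieval models. One simple way to form such a signal is the cosine similarity between the centroid of a query's term vectors and the centroid of a table's term vectors. The Python sketch below assumes that centroid strategy and uses toy embeddings; both are hypothetical and are not the paper's trained Table2Vec models.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def semantic_similarity(query_terms, table_terms, emb) -> float:
    """Similarity between query and table terms; out-of-vocabulary terms are skipped."""
    q = [emb[t] for t in query_terms if t in emb]
    d = [emb[t] for t in table_terms if t in emb]
    if not q or not d:
        return 0.0
    return cosine(centroid(q), centroid(d))


emb = {"city": [1.0, 0.0], "country": [0.9, 0.1], "score": [0.0, 1.0]}  # toy vectors
```

In a retrieval model this score would typically enter as one extra feature next to the baseline's own score; how it is weighted is not specified here.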

Journal Article • DOI: 10.1080/00987913.2019.1644891

Microsoft Power BI: Extending Excel to Manipulate, Analyze, and Visualize Diverse Data

Louis Thacher Becker¹, Elyssa M. Gould

University of Tennessee¹

29 Jul 2019 • Serials Review

TL;DR: This segment of the Sharpest Tool in the Shed column introduces Microsoft’s Power BI software and associated functionality built into recent (2013 and newer) versions of Microsoft's Excel.

66 citations

Posted Content

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

Zhi-Qi Cheng¹, Jun-Xiu Li¹, Qi Dai², Xiao Wu¹, Jun-Yan He¹, Alexander G. Hauptmann³

Southwest Jiaotong University¹, Microsoft², Carnegie Mellon University³

17 Sep 2019 • arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors propose a novel Multi-column Mutual Learning (McML) strategy that improves the scale invariance of feature representation by incorporating a statistical network to estimate the mutual information between columns, which approximately indicates the scale correlation between features from different columns.

Abstract: Tremendous variation in the scale of people/head size is a critical problem for crowd counting. To improve the scale invariance of feature representation, recent works extensively employ Convolutional Neural Networks with multi-column structures to handle different scales and resolutions. However, due to the substantial redundant parameters in columns, existing multi-column networks invariably exhibit almost the same scale features in different columns, which severely affects counting accuracy and leads to overfitting. In this paper, we attack this problem by proposing a novel Multi-column Mutual Learning (McML) strategy. It has two main innovations: 1) A statistical network is incorporated into the multi-column framework to estimate the mutual information between columns, which can approximately indicate the scale correlation between features from different columns. By minimizing the mutual information, each column is guided to learn features with different image scales. 2) We devise a mutual learning scheme that can alternately optimize each column while keeping the other columns fixed on each mini-batch training data. With such asynchronous parameter update process, each column is inclined to learn different feature representation from others, which can efficiently reduce the parameter redundancy and improve generalization ability. More remarkably, McML can be applied to all existing multi-column networks and is end-to-end trainable. Extensive experiments on four challenging benchmarks show that McML can significantly improve the original multi-column networks and outperform the other state-of-the-art approaches.

63 citations

Proceedings Article • DOI: 10.1109/ICDAR.2019.00225

Rethinking Semantic Segmentation for Table Structure Recognition in Documents

Shoaib Ahmed Siddiqui¹, Pervaiz Iqbal Khan¹, Andreas Dengel¹, Sheraz Ahmed¹

German Research Centre for Artificial Intelligence¹

1 Sep 2019

TL;DR: The obtained results advocate that constraining the problem space in the case of FCN by imposing valid constraints can lead to significant performance gains.

Abstract: Based on the recent advancements in the domain of semantic segmentation, Fully-Convolutional Networks (FCN) have been successfully applied for the task of table structure recognition in the past. We analyze the efficacy of semantic segmentation networks for this purpose and simplify the problem by proposing prediction tiling based on the consistency assumption which holds for tabular structures. For an image of dimensions $H \times W$, we predict a single column for the rows ($\hat{y}_{row} \in \mathbb{R}^{H}$) and a single row for the columns ($\hat{y}_{col} \in \mathbb{R}^{W}$). We use a dual-headed architecture where initial feature maps (from the encoder-decoder model) are shared while the last two layers generate class-specific (row/column) predictions. This allows us to generate predictions for both rows and columns simultaneously using a single model, whereas previous methods relied on two separate models for inference. With the proposed method, we were able to achieve state-of-the-art results on the ICDAR-13 image-based table structure recognition dataset with an average F-Measure of 92.39% (91.90% and 92.88% F-Measure for rows and columns respectively). The obtained results advocate that constraining the problem space in the case of FCN by imposing valid constraints can lead to significant performance gains.
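
Prediction tiling, as we read the abstract, rests on the assumption that a row label is constant along the full width of the image and a column label along the full height. The pure-Python sketch below shows both directions: collapsing an H × W score map to one label per image row, and tiling a per-row vector (length H) or per-column vector (length W) back to a full H × W mask. The function names are hypothetical, and the mean-then-threshold reduction in `collapse_rows` is an assumption for illustration, not the paper's network head.

```python
def tile_rows(y_row: list[int], width: int) -> list[list[int]]:
    """Repeat each per-row label across all `width` pixel columns -> H x W mask."""
    return [[label] * width for label in y_row]


def tile_cols(y_col: list[int], height: int) -> list[list[int]]:
    """Repeat the per-column label vector down all `height` pixel rows -> H x W mask."""
    return [list(y_col) for _ in range(height)]


def collapse_rows(scores: list[list[float]], threshold: float = 0.5) -> list[int]:
    """Hypothetical reduction: average each image row's scores, then threshold to 0/1."""
    return [int(sum(r) / len(r) >= threshold) for r in scores]


y_row = collapse_rows([[0.1, 0.2], [0.9, 0.7], [0.0, 0.4]])  # -> [0, 1, 0]
row_mask = tile_rows(y_row, width=2)  # [[0, 0], [1, 1], [0, 0]]
col_mask = tile_cols([1, 0], height=3)  # [[1, 0], [1, 0], [1, 0]]
```

Because each head only has to produce a length-H or length-W vector, the output space is far smaller than a free-form H × W segmentation, which is plausibly the kind of constraint the abstract credits for the gains.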

Proceedings Article • DOI: 10.1145/3299869.3319855

Uni-Detect: A Unified Approach to Automated Error Detection in Tables

Pei Wang¹, Yeye He²

Simon Fraser University¹, Microsoft²

25 Jun 2019

TL;DR: This work proposes Uni-Detect, a unified framework to automatically detect diverse types of errors, and makes surprising discoveries of thousands of FD violations, numeric outliers, spelling mistakes, etc., with better accuracy than existing algorithms specifically designed for each type of error.

Abstract: Data errors are ubiquitous in tables. Extensive research in this area has resulted in a rich variety of techniques, each often targeting a specific type of error, e.g., numeric outliers, constraint violations, etc. While these diverse techniques clearly improve data quality, they place a significant burden on humans to configure them with suitable rules and parameters for each data set. For example, an expert is expected to define suitable functional dependencies between column pairs, or tune appropriate thresholds for outlier-detection algorithms, all of which are specific to one individual data set. As a result, users today often hire experts to cleanse only their high-value data sets. We propose Uni-Detect, a unified framework to automatically detect diverse types of errors. Our approach employs a novel "what-if" analysis that performs local data perturbations to reason about data abnormality, leveraging classical hypothesis tests on a large corpus of tables. We test Uni-Detect on a wide variety of tables including Wikipedia tables, and make surprising discoveries of thousands of FD violations, numeric outliers, spelling mistakes, etc., with better accuracy than existing algorithms specifically designed for each type of error. For example, for spelling mistakes, Uni-Detect outperforms the state-of-the-art spell-checker from a commercial search engine.
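
The "what-if" idea can be illustrated for a single error type, numeric outliers: perturb the column locally by removing one value at a time and measure how much more regular the remainder becomes. The Python sketch below is a simplified, hypothetical illustration (`what_if_outlier` and the standard-deviation ratio are our choices). Uni-Detect's actual statistics and its calibration by hypothesis tests over a large table corpus are not reproduced, so the ratio is reported without a decision threshold.

```python
from statistics import pstdev
from typing import Optional


def what_if_outlier(values: list[float]) -> Optional[tuple[int, float]]:
    """Find the value whose removal makes the column most regular.

    For each index i, perturb the column by dropping values[i] and compare the
    population standard deviation before and after. Returns (index, ratio),
    where ratio = std_without / std_with; a ratio near 0 means a single value
    accounts for almost all of the column's spread. Returns None if the
    column has fewer than 2 values or zero spread.
    """
    if len(values) < 2:
        return None
    base = pstdev(values)
    if base == 0:
        return None
    best: Optional[tuple[int, float]] = None
    for i in range(len(values)):
        rest = values[:i] + values[i + 1:]
        ratio = pstdev(rest) / base
        if best is None or ratio < best[1]:
            best = (i, ratio)
    return best


print(what_if_outlier([10, 11, 9, 10, 12, 500]))  # index 5 is flagged
```

Deciding whether a given ratio is surprising enough to flag requires a reference distribution; per the abstract, Uni-Detect obtains this by leveraging hypothesis tests over a large corpus of tables.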

Journal Article • DOI: 10.1609/AAAI.V33I01.3301281

Meimei: An Efficient Probabilistic Approach for Semantically Annotating Tables

Kunihiro Takeoka¹, Masafumi Oyamada¹, Shinji Nakadai¹, Takeshi Okadome²

NEC¹, Kwansei Gakuin University²

17 Jul 2019

TL;DR: A novel approach for table data annotation that combines a latent probabilistic model with multi-label classifiers. It is more versatile, more accurate, and more efficient, the last because potential functions based on multi-label classifiers reduce the computational cost of annotation.

Abstract: Given a large amount of table data, how can we find the tables that contain the contents we want? A naive search fails when the column names are ambiguous, such as if columns containing stock price information are named “Close” in one table and named “P” in another table. One way of dealing with this problem that has been gaining attention is the semantic annotation of table data columns by using canonical knowledge. While previous studies successfully dealt with this problem for specific types of table data such as web tables, the problem remains for various other types of table data: (1) most approaches do not handle table data with numerical values, and (2) their predictive performance is not satisfactory. This paper presents a novel approach for table data annotation that combines a latent probabilistic model with multi-label classifiers. It features three advantages over previous approaches due to using highly predictive multi-label classifiers in the probabilistic computation of semantic annotation. (1) It is more versatile due to using multi-label classifiers in the probabilistic model, which enables various types of data such as numerical values to be supported. (2) It is more accurate due to the multi-label classifiers and probabilistic model working together to improve predictive performance. (3) It is more efficient due to potential functions based on multi-label classifiers reducing the computational cost for annotation. Extensive experiments demonstrated the superiority of the proposed approach over state-of-the-art approaches for semantic annotation of real data (183 human-annotated tables obtained from the UCI Machine Learning Repository).

Journal Article • DOI: 10.1371/JOURNAL.PONE.0214311

ukbtools: An R package to manage and query UK Biobank data

Ken B. Hanscombe¹, Jonathan R. I. Coleman¹, Matthew Traylor², Cathryn M. Lewis¹

King's College London¹, University of Cambridge²

31 May 2019 • PLOS ONE

TL;DR: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis, and provides a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file.

Abstract: Introduction: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names. Results: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants; query of disease diagnoses for one or more individuals, and estimating disease frequency relative to a reference variable; and to retrieve genetic metadata. Conclusion: Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file, will rapidly enable UKB users to undertake their research.

Journal Article • DOI: 10.1029/2019JF005258

Relations Between the Characteristics of Granular Column Collapses and Resultant High-Frequency Seismic Signals

Maxime Farin¹,², Anne Mangeney², Julien de Rosny¹, Renaud Toussaint³,⁴, Phuong-Thu Trinh⁵

PSL Research University¹, Institut de Physique du Globe de Paris², University of Strasbourg³, University of Oslo⁴, Total S.A.⁵

01 Dec 2019 • Journal of Geophysical Research

TL;DR: In this paper, the authors investigated the relationship between the characteristics of granular flows and the generated seismic signals and established empirical scaling laws that can be tested in the field, for a large set of column masses, aspect ratios, particle diameters, and slope angles.

Abstract: Deducing relations between the dynamic characteristics of landslides and rockfalls and the resultant high-frequency (> 1 Hz) seismic signal is challenging. To investigate relations that can be tested in the field, we conducted laboratory experiments of 3-D granular column collapse on a rough inclined thin plate, for a large set of column masses, aspect ratios, particle diameters, and slope angles. The dynamics of the granular flows were recorded using a high-speed camera, and the generated seismic signal was measured using piezoelectric accelerometers. Empirical scaling laws are established between the characteristics of the granular flows and deposits and that of the generated seismic signals. The radiated seismic energy scales with particle diameter as $d^3$, column mass as $M$, and aspect ratio as $a^{1.1}$. The increase of the radiated seismic energy as slope angle increases correlates with a similar increase in particle agitation. Based on our experimental results, we revisit scaling laws reported in the field and discuss their possible physical origin. The discrepancy between field and experimental observations can be explained by the complex influence of the substrate on seismic signal and the difference of flow initiation in both cases. However, our empirical scaling laws allow us to determine which flow parameters could be inferred from a given seismic characteristic in the field. In particular, by assuming the flow average speed is known, we show that we can retrieve parameters $d$, $M$, and $a$ within a factor of two from the seismic signal.
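
If, purely as an assumption for illustration, the separate scalings for particle diameter $d$, column mass $M$, and aspect ratio $a$ are combined multiplicatively at a fixed slope angle, the radiated energy $E_s$ (our symbol) behaves as $E_s \propto d^3 M a^{1.1}$. The Python sketch below uses that assumed form to compute how $E_s$ changes when parameters change, and to invert it for $d$ when the other ratios are known. The function names are hypothetical, and this is not the paper's field-inversion procedure, which also assumes a known flow average speed.

```python
def energy_ratio(d_ratio: float, m_ratio: float = 1.0, a_ratio: float = 1.0) -> float:
    """Factor change in radiated seismic energy under the assumed E_s ∝ d^3 · M · a^1.1."""
    return d_ratio ** 3 * m_ratio * a_ratio ** 1.1


def infer_d_ratio(e_ratio: float, m_ratio: float = 1.0, a_ratio: float = 1.0) -> float:
    """Invert the assumed scaling for the particle-diameter ratio."""
    return (e_ratio / (m_ratio * a_ratio ** 1.1)) ** (1.0 / 3.0)


print(energy_ratio(2.0))  # 8.0: doubling particle diameter gives 8x the energy
```

Under this form the cube-root inversion damps errors: a factor-of-two error in the energy estimate alone maps to roughly a 26% error in $d$, since $2^{1/3} \approx 1.26$.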

Journal Article • DOI: 10.1109/TKDE.2017.2685998

Interactive Data Exploration with Smart Drill-Down

Manas Joglekar¹, Hector Garcia-Molina¹, Aditya Parameswaran²

Stanford University¹, University of Illinois at Urbana–Champaign²

01 Jan 2019 • IEEE Transactions on Knowledge and Data Engineering

TL;DR: Smart drill-down is an operator for interactively exploring a relational table to discover and summarize "interesting" groups of tuples, each of which is described by a rule.

Abstract: We present smart drill-down, an operator for interactively exploring a relational table to discover and summarize “interesting” groups of tuples. Each group of tuples is described by a rule. For instance, the rule $(a, b, \star, 1000)$ tells us that there are 1,000 tuples with value $a$ in the first column and $b$ in the second column (and any value in the third column). Smart drill-down presents an analyst with a list of rules that together describe interesting aspects of the table. The analyst can tailor the definition of interesting, and can interactively apply smart drill-down on an existing rule to explore that part of the table. We demonstrate that the underlying optimization problems are NP-hard, and describe an algorithm for finding the approximately optimal list of rules to display when the user uses a smart drill-down, and a dynamic sampling scheme for efficiently interacting with large tables. Finally, we perform experiments on real datasets on our experimental prototype to demonstrate the usefulness of smart drill-down and study the performance of our algorithms.
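
The rule notation $(a, b, \star, 1000)$ is easy to operationalize. The Python sketch below, using a hypothetical `STAR = None` wildcard, counts the tuples a rule covers and lists the one-step refinements that a drill-down on that rule could consider, each with its count. It does not implement the paper's scoring of "interesting" rules, its approximation algorithm for the NP-hard selection problem, or dynamic sampling.

```python
STAR = None  # wildcard: matches any value in that column


def matches(rule: tuple, row: tuple) -> bool:
    """True if every non-wildcard position of the rule equals the row's value."""
    return all(r is STAR or r == v for r, v in zip(rule, row))


def count(rule: tuple, table: list[tuple]) -> int:
    """Number of tuples covered by the rule (the trailing number in (a, b, *, 1000))."""
    return sum(1 for row in table if matches(rule, row))


def refinements(rule: tuple, table: list[tuple]) -> dict[tuple, int]:
    """Replace one wildcard with each value observed among covered tuples, with counts."""
    covered = [row for row in table if matches(rule, row)]
    out: dict[tuple, int] = {}
    for i, r in enumerate(rule):
        if r is not STAR:
            continue
        for row in covered:
            child = rule[:i] + (row[i],) + rule[i + 1:]
            out[child] = out.get(child, 0) + 1
    return out


table = [("a", "b", "x"), ("a", "b", "y"), ("a", "c", "x"), ("d", "b", "x")]
print(count(("a", "b", STAR), table))  # 2
```

Drilling down on `("a", STAR, STAR)` yields candidates such as `("a", "b", STAR)` with count 2; choosing which few of these to display is where the paper's optimization comes in.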

Journal Article • DOI: 10.1680/JSTBU.17.00006

Column effective lengths in sway-permitted modular steel-frame buildings

Guo-Qiang Li¹, Ke Cao¹, Lu Ye¹

Tongji University¹

1 Jan 2019

TL;DR: Modular steel-frame buildings (MSFBs) are a fast-evolving alternative to traditional on-site construction, providing factory-level quality and significant time savings.

Abstract: Modular steel-frame buildings (MSFBs) are a fast-evolving alternative to traditional on-site construction, providing factory-level quality and significant time savings. The main difference between ...

Journal Article • DOI: 10.14778/3358701.3358703

Discovery and ranking of embedded uniqueness constraints

Ziheng Wei¹, Uwe Leck², Sebastian Link¹

University of Auckland¹, University of Flensburg²

1 Sep 2019

TL;DR: This first study on the discovery of embedded uniqueness constraints (eUCs) shows that the decision variant of discovering a minimal eUC is NP-complete and W[2]-complete, characterizes the maximum possible solution size, and shows which families of eUCs attain that size.

Abstract: Data profiling is an enabler for efficient data management and effective analytics. The discovery of data dependencies is at the core of data profiling. We conduct the first study on the discovery of embedded uniqueness constraints (eUCs). These constraints represent unique column combinations embedded in complete fragments of incomplete data. We showcase their implementation as filtered indexes, and their application in integrity management and query optimization. We show that the decision variant of discovering a minimal eUC is NP-complete and W[2]-complete. We characterize the maximum possible solution size, and show which families of eUCs attain that size. Despite the challenges, experiments with real-world and synthetic benchmark data show that our column-efficient (row-efficient) algorithms perform well with a large number of columns (rows), and our hybrid algorithm combines ideas from both. We show how to rank eUCs to help identify relevant eUCs.
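
To make "unique column combinations embedded in complete fragments" concrete: as we read it, an eUC, written here as $E{:}X$ with $X \subseteq E$, holds when no two tuples that are complete (null-free) on the columns $E$ agree on all columns of $X$. This is the condition a unique filtered index over $X$, filtered to rows where every column of $E$ is non-null, would enforce. The Python sketch below validates one given eUC on rows represented as dicts with `None` for missing values; `holds_euc` is a hypothetical name, and this is neither the paper's discovery algorithm nor its ranking.

```python
def holds_euc(rows: list[dict], embedding: list[str], unique_cols: list[str]) -> bool:
    """Check the eUC embedding:unique_cols on a list of rows (bag semantics).

    Only rows with no None among the `embedding` columns are considered;
    among those, no two rows may share values on `unique_cols`.
    """
    if not set(unique_cols) <= set(embedding):
        raise ValueError("unique_cols must be a subset of embedding")
    seen = set()
    for row in rows:
        if any(row.get(c) is None for c in embedding):
            continue  # row is outside the complete fragment on `embedding`
        key = tuple(row[c] for c in unique_cols)
        if key in seen:
            return False
        seen.add(key)
    return True


rows = [
    {"a": 1, "b": None, "c": 5},
    {"a": 1, "b": 2, "c": 6},
    {"a": 2, "b": 3, "c": 6},
]
print(holds_euc(rows, ["a", "b"], ["a"]))  # True
```

Here `a` is not unique over the whole table, yet it is unique within the fragment that is complete on `{a, b}`, which is exactly the kind of constraint a plain uniqueness check would miss.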

Proceedings Article•10.1145/3299869.3320248•

Natural Language Querying of Complex Business Intelligence Queries

[...]

Jaydeep Sen¹, Fatma Ozcan¹, Abdul Quamar¹, Greg Stager¹, Ashish Mittal¹, Manasa Jammi¹, Chuan Lei¹, Diptikalyan Saha¹, Karthik Sankaranarayanan¹•Institutions (1)

IBM¹

25 Jun 2019

TL;DR: This work presents an NLIDB system with extended capabilities for business applications that require complex nested SQL queries, without prior training or feedback from a human in the loop. It uses novel algorithms that combine linguistic analysis with deep domain reasoning to solve core challenges in handling nested queries.

Abstract: A Natural Language Interface to Databases (NLIDB) eliminates the need for an end user to use complex query languages like SQL by translating input natural language statements to SQL automatically. Although NLIDB systems have seen a rapid growth of interest recently, current state-of-the-art systems can at best handle point queries that retrieve certain column values satisfying some filters, or aggregation queries involving basic SQL aggregation functions. In this demo, we showcase our NLIDB system with extended capabilities for business applications that require complex nested SQL queries, without prior training or feedback from a human in the loop. In particular, our system uses novel algorithms that combine linguistic analysis with deep domain reasoning to solve core challenges in handling nested queries. To demonstrate these capabilities, we propose a new benchmark dataset containing realistic business intelligence queries, conforming to an ontology derived from the FIBO and FRO financial ontologies. In this demo, we will showcase a wide range of business intelligence queries of increasing complexity against our benchmark dataset. Users will be able to examine the generated SQL queries and will also be provided with an English description of the interpretation.

Posted Content•

Table-to-Text Generation with Effective Hierarchical Encoder on Three Dimensions (Row, Column and Time)

[...]

Heng Gong¹, Xiaocheng Feng¹, Bing Qin¹, Ting Liu¹•Institutions (1)

Harbin Institute of Technology¹

05 Sep 2019-arXiv: Computation and Language

TL;DR: The authors propose a table cell fusion gate that combines representations from the row, column, and time dimensions into one dense vector according to the saliency of each dimension's representation.

Abstract: Although Seq2Seq models for table-to-text generation have achieved remarkable progress, modeling the table representation in one dimension is inadequate. This is because (1) a table consists of multiple rows and columns, which means that encoding a table should not depend only on a one-dimensional sequence or set of records, and (2) most tables are time-series data (e.g., NBA game data, stock market data), which means that the description of the current table may be affected by its historical data. To address the aforementioned problems, we not only model each table cell in the context of other records in the same row, but also enrich the table's representation by modeling each table cell in the context of other cells in the same column and of historical (time-dimension) data, respectively. In addition, we develop a table cell fusion gate to combine the representations from the row, column, and time dimensions into one dense vector according to the saliency of each dimension's representation. We evaluated our methods on ROTOWIRE, a benchmark dataset of NBA basketball games. Both automatic and human evaluation results demonstrate the effectiveness of our model, with an improvement of 2.66 BLEU over the strong baseline, and show that it outperforms the state-of-the-art model.
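
The abstract describes the fusion gate only as weighting the three views "according to the saliency of each dimension's representation". The sketch below is one plausible reading of that idea and not the paper's exact gate. It assumes each view gets a scalar score (a dot product with a parameter vector that a trained model would learn), that the scores are softmax-normalized, and that the output is the resulting convex combination. `fuse_cell`, `softmax`, and `saliency_w` are hypothetical names.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> list[float]:
    """Normalize scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_cell(row_repr: Vector, col_repr: Vector, time_repr: Vector,
              saliency_w: Vector) -> list[float]:
    """Fuse the row, column and time views of one cell (hypothetical gate).

    Each view receives a scalar saliency score, here a dot product with a
    parameter vector `saliency_w` that a real model would learn. The scores
    are softmax-normalized and used as weights in a convex combination.
    """
    views = [row_repr, col_repr, time_repr]
    dim = len(saliency_w)
    if any(len(v) != dim for v in views):
        raise ValueError("all representations must have the same dimension")
    scores = [sum(a * b for a, b in zip(v, saliency_w)) for v in views]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, views)) for i in range(dim)]
```

With an all-zero `saliency_w`, every view receives weight 1/3, so `fuse_cell([3.0, 0.0], [0.0, 3.0], [0.0, 0.0], [0.0, 0.0])` gives approximately `[1.0, 1.0]`, a plain average. As the parameters favour one view, the output moves toward that view's representation.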

Journal Article•10.1061/(ASCE)ST.1943-541X.0002295•

Database and Review of Beam-to-Column Connections for Seismic Design of Composite Special Moment Frames

[...]

Zhichao Lai¹, Erica C. Fischer², Amit H. Varma³•Institutions (3)

Fuzhou University¹, Oregon State University², Purdue University³

01 May 2019-Journal of Structural Engineering (ASCE)

TL;DR: This paper reviews the experimental research on beam-to-column connections for composite special moment frames (C-SMFs) and presents an experimental database consisting of 165 tests conduct...

Abstract: This paper reviews the experimental research on beam-to-column connections for composite special moment frames (C-SMFs) and presents an experimental database consisting of 165 tests conduct...

Journal Article•10.14778/3352063.3352123•

Native store extension for SAP HANA

[...]

1 Aug 2019

TL;DR: An overview of SAP HANA's Native Store Extension (NSE) is presented. NSE is based on a hybrid in-memory and paged column store architecture composed from data access primitives, and it substantially increases database capacity, allowing the database to scale far beyond available system memory.

Abstract: We present an overview of SAP HANA's Native Store Extension (NSE). This extension substantially increases database capacity, allowing the database to scale far beyond available system memory. NSE is based on a hybrid in-memory and paged column store architecture composed from data access primitives. These primitives enable the processing of hybrid columns using the same algorithms optimized for HANA's traditional in-memory columns. Using only three key primitives, we fabricated byte-compatible counterparts for complex memory-resident data structures (e.g., dictionary and hash index), compression schemes (e.g., sparse and run-length encoding), and exotic data types (e.g., geospatial). We developed a new buffer cache that optimizes the management of paged resources with smart strategies sensitive to page type and access patterns. The buffer cache integrates with HANA's new execution engine, which issues pipelined prefetch requests to improve disk access patterns. A novel load unit configuration, along with a unified persistence format, allows the hybrid column store to dynamically switch between in-memory and paged data access to balance performance and storage economy according to application demands while reducing Total Cost of Ownership (TCO). A new partitioning scheme supports load unit specification at the table, partition, and column level. Finally, a new advisor recommends optimal load unit configurations. Our experiments illustrate the performance and memory footprint improvements on typical customer scenarios.
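
The abstract says load units can be specified at the table, partition, and column level, but it does not state how overlapping settings interact. The following sketch assumes the most specific explicit setting wins (column over partition over table), with a fallback default. The enum values, the default, and `effective_load_unit` are hypothetical illustrations and not HANA's SQL syntax or documented resolution rules.

```python
from enum import Enum
from typing import Optional


class LoadUnit(Enum):
    IN_MEMORY = "in-memory"  # the whole column is loaded into memory
    PAGED = "paged"          # pages are loaded on demand through the buffer cache


def effective_load_unit(table: Optional[LoadUnit] = None,
                        partition: Optional[LoadUnit] = None,
                        column: Optional[LoadUnit] = None,
                        default: LoadUnit = LoadUnit.IN_MEMORY) -> LoadUnit:
    """Resolve the load unit of one column within one partition.

    Assumed rule (hypothetical): the most specific explicit setting wins,
    i.e. column over partition over table; None means 'not specified'.
    """
    for setting in (column, partition, table):
        if setting is not None:
            return setting
    return default
```

Under this assumption, a large history table declared `PAGED` can still keep one frequently scanned column fully in memory by setting `column=LoadUnit.IN_MEMORY`. Every other column of that table resolves to `PAGED`.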

Journal Article•10.1109/MC.2019.2926614•

The IEEE Standard 754: One for the History Books

[...]

David G. Hough

21 Nov 2019-IEEE Computer

TL;DR: The history of IEEE Standard 754, the Standard for Floating Point Arithmetic, which has flourished in many commercial microprocessors and other computer platforms, is recounted.

Abstract: IEEE Standard 754, Standard for Floating Point Arithmetic, had its beginnings more than 40 years ago. Implementations of the standard have flourished in many commercial microprocessors and other computer platforms. In June, a revision of the standard was approved by the IEEE Standards Association Standards Board. This column recounts some of the interesting history behind the standard.

Posted Content•

Combinatorial inequalities

[...]

Igor Pak

02 Apr 2019-arXiv: Combinatorics

TL;DR: This is an expanded version of the Notices of the AMS column with the same title; the text is unchanged, but the author added acknowledgements and a large number of endnotes that provide context and references.

Abstract: This is an expanded version of the Notices of the AMS column with the same title. The text is unchanged, but we added acknowledgements and a large number of endnotes which provide the context and the references.

Journal Article•10.1016/J.CHROMA.2018.09.018•

Column selection for comprehensive two-dimensional liquid chromatography using the hydrophobic subtraction model

[...]

Rebecca Lindsey¹, Becky L. Eggimann², Dwight R. Stoll³, Peter W. Carr¹, Mark R. Schure, J. Ilja Siepmann¹•Institutions (3)

University of Minnesota¹, Wheaton College (Illinois)², Gustavus Adolphus College³

29 Mar 2019-Journal of Chromatography A

TL;DR: A computational screening method is proposed whereby virtual 2D chromatograms are calculated utilizing the Snyder-Dolan hydrophobic subtraction model (HSM) for reversed-phase column selectivity and shows a strong sensitivity to the choice of the second dimension column and a preference for those with embedded polar moieties.
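
For reference, the hydrophobic subtraction model is commonly written in the standard literature form shown below. This form is not taken from this listing. Here α is the retention of a solute relative to ethylbenzene (EB), k and k_EB are the corresponding retention factors, the primed Greek letters are solute parameters, and H, S*, A, B, and C are column parameters for hydrophobicity, steric resistance, hydrogen-bond acidity, hydrogen-bond basicity, and cation-exchange activity. Each term separates a solute property from a column property, so retention can in principle be predicted for any column whose parameters are tabulated. A virtual screening of column pairs presumably relies on this property.

```latex
\log \alpha = \log \frac{k}{k_{\mathrm{EB}}}
  = \eta' H - \sigma' S^{*} + \beta' A + \alpha' B + \kappa' C
```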

Journal Article•10.36478/JEASCI.2019.1162.1168•

Behavior of Different Materials for Stone Column Construction

[...]

Kwa Sally Fahmi, E.S. Kolosov, Mohammed Y. Fattah

31 Dec 2019-Journal of Engineering and Applied Sciences

Proceedings Article•10.1145/3299869.3320234•

MorphStore - In-Memory Query Processing based on Morphing Compressed Intermediates LIVE

[...]

Dirk Habich¹, Patrick Damme¹, Annett Ungethüm¹, Johannes Pietrzyk¹, Alexander Krause¹, Juliana Hildebrandt¹, Wolfgang Lehner¹•Institutions (1)

Dresden University of Technology¹

25 Jun 2019

TL;DR: MorphStore, an in-memory column store with a novel compression-aware query processing concept, is presented, able to speed up the query execution by morphing compressed intermediate results from one scheme to another scheme to dynamically adapt to the changing data characteristics during query processing.

Abstract: In this demo, we present MorphStore, an in-memory column store with a novel compression-aware query processing concept. Compression using lightweight integer compression algorithms already plays an important role in existing in-memory column stores, but mainly for base data. The continuous handling of compression from the base data to the intermediate results during query processing has already been discussed but not investigated in detail, since the computational effort for compression as well as decompression is often assumed to exceed the benefit of a reduced transfer cost between CPU and main memory. However, as we are going to show in our demo, this argument increasingly loses its validity. Our compression-aware query processing concept is characterized by the ability to speed up query execution by morphing compressed intermediate results from one scheme to another, dynamically adapting to the changing data characteristics during query processing. Our morphing decisions are made using a cost-based approach.
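
The listing does not give MorphStore's formats or cost model. The sketch below illustrates only the general idea of cost-based morphing between two lightweight integer schemes. The two schemes are plain bit packing and delta coding followed by bit packing. The cost model is size only (a 64-bit header plus bit width times value count), and morphing is done by decompressing and recompressing. All of these are assumptions chosen for illustration. `Compressed`, `cheapest`, and `morph` are hypothetical names, and the "packed" payload is kept as plain Python integers rather than real bit-packed words.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import Optional, Sequence


@dataclass(frozen=True)
class Compressed:
    scheme: str               # "bitpack" or "delta+bitpack"
    bit_width: int            # bits per packed value
    base: int                 # first value; only used by "delta+bitpack"
    payload: tuple[int, ...]  # the values that would be bit-packed

    def size_bits(self) -> int:
        # Rough cost model: a 64-bit header plus the packed payload.
        return 64 + self.bit_width * len(self.payload)


def _width(values: Sequence[int]) -> int:
    # Bits needed for the largest value; at least 1 bit per value.
    return max(1, max(values, default=0).bit_length())


def encode_bitpack(values: Sequence[int]) -> Compressed:
    if any(v < 0 for v in values):
        raise ValueError("sketch assumes non-negative integers")
    return Compressed("bitpack", _width(values), 0, tuple(values))


def encode_delta(values: Sequence[int]) -> Optional[Compressed]:
    # Applicable only to non-empty, non-decreasing input (all deltas >= 0).
    if not values:
        return None
    deltas = [b - a for a, b in zip(values, values[1:])]
    if any(d < 0 for d in deltas):
        return None
    return Compressed("delta+bitpack", _width(deltas), values[0], tuple(deltas))


def decode(c: Compressed) -> list[int]:
    if c.scheme == "bitpack":
        return list(c.payload)
    # Prefix sums of the deltas, starting from the stored base value.
    return list(accumulate(c.payload, initial=c.base))


def cheapest(values: Sequence[int]) -> Compressed:
    """Cost-based choice: the candidate encoding with the smallest size."""
    candidates = [encode_bitpack(values)]
    delta = encode_delta(values)
    if delta is not None:
        candidates.append(delta)
    return min(candidates, key=Compressed.size_bits)


def morph(c: Compressed) -> Compressed:
    """Re-encode an intermediate into whichever scheme now costs least.

    This sketch decompresses and recompresses; a real engine would try to
    convert between formats directly.
    """
    return cheapest(decode(c))
```

For a sorted intermediate such as the row IDs 1000 to 1099, bit packing needs 11 bits per value, while the deltas are all 1 and need only 1 bit each. `morph` therefore switches that intermediate to `"delta+bitpack"`. An unsorted intermediate stays bit-packed because delta coding is not applicable to it.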

Proceedings Article•

Workload-Driven and Robust Selection of Compression Schemes for Column Stores.

[...]

Martin Boissier¹, Max Jendruk•Institutions (1)

Hasso Plattner Institute¹

1 Jan 2019

TL;DR: This paper presents an automated selection framework for compression configurations in autonomous database systems and introduces Hyrise's compression framework, which implements an efficient and maintainable interface for various column compression techniques.

Abstract: Modern main memory-optimized column stores employ a variety of compression techniques. Choosing one compression technique over others for a given memory budget can be challenging, since each technique has different trade-offs whose impact on large workloads is not obvious. We present an automated selection framework for compression configurations. Most database systems provide means to automatically choose a compression configuration but lack two crucial properties: the compression selection cannot be constrained (e.g., by a given storage budget), and the robustness of the compression configuration is not considered. Our approach uses workload information to determine robust configurations under the given constraints. The runtime performance of the various compression techniques is estimated using adapted regression models.

1 COLUMN COMPRESSION IN HYRISE

Two of the main driving forces of current database development, both industrial and research, are autonomous database systems and cloud-based installations. Both topics are strongly connected, as database vendors are increasingly interested in optimizing their operational costs for large self-hosted database installations. One way to lower the costs, especially for main memory-optimized database systems, is to reduce the memory consumption of large databases. Such a reduction allows storing databases on smaller and thus less expensive server machines or adding more instances to a shared server. But the sheer size of large cloud installations hampers manual optimization of compression configurations by database administrators. This development has recently sparked research on autonomous database systems. The work presented in this paper is an intermediate step toward optimizing memory consumption while still retaining the performance advantages of main memory-optimized databases.
When cost considerations gain importance, the optimization objective for compression configurations is less about improving runtime performance than about retaining the current runtime performance while minimizing storage requirements. With the goal of automatically finding a compression configuration for a given memory budget, this project intends to provide the building blocks for autonomous systems. Data compression has been studied thoroughly for decades in database research. Virtually all modern database systems implement various techniques to compress data, and most commercial systems further provide means to adjust the compression level (e.g., Oracle's declarative policies for automatic data compression (ADO), cf. [12], or SQL Server's Database Engine Tuning Advisor (DTA), cf. [11]). However, we see two distinct issues that remain open from a research perspective: (i) workload- and constraint-based compression configurations and (ii) the determination of configurations whose runtime performance is robust to changing workloads. We present and discuss the three main components of our research database Hyrise [10] with which we approach workload-driven and robust compression configurations:

• We introduce Hyrise's compression framework, which implements an efficient and maintainable interface for various column compression techniques (Section 2).
• We present our runtime estimation, which predicts the performance of compression techniques (Section 3).
• We discuss the applicability of existing approaches for the optimization of physical database designs and how they perform for the task of compression selection (Section 4).

© 2019 Copyright held by the owner/author(s). Published in Proceedings of the 22nd International Conference on Extending Database Technology (EDBT), March 26-29, 2019, ISBN 978-3-89318-081-3 on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.
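
The selection problem described in this entry chooses one encoding per segment so that the total size fits a memory budget while estimated runtime stays low. The listing does not spell out Hyrise's selection algorithm, so the sketch below uses a plain greedy heuristic as an explicitly labeled stand-in. Every segment starts with its fastest option. Then, while the total exceeds the budget, the sketch applies the switch with the smallest runtime penalty per byte saved. `Option`, `select_encodings`, and the sizes and runtimes in `example` are hypothetical; in the paper, runtimes come from regression-based estimates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Option:
    encoding: str        # e.g. "unencoded", "dictionary", "run-length"
    size_bytes: int
    est_runtime: float   # estimated workload runtime contribution (any unit)


def select_encodings(segments: dict[str, list[Option]],
                     budget_bytes: int) -> Optional[dict[str, Option]]:
    """Greedy budget-constrained selection (hypothetical heuristic).

    Start every segment with its fastest option, then repeatedly apply the
    switch that saves memory at the lowest runtime cost per byte saved,
    until the total size fits the budget. Returns None if infeasible.
    """
    choice = {seg: min(opts, key=lambda o: o.est_runtime)
              for seg, opts in segments.items()}
    total = sum(o.size_bytes for o in choice.values())
    while total > budget_bytes:
        best = None  # (runtime penalty per byte saved, segment, option)
        for seg, opts in segments.items():
            current = choice[seg]
            for opt in opts:
                saved = current.size_bytes - opt.size_bytes
                if saved <= 0:
                    continue  # only consider switches that free memory
                penalty = max(0.0, opt.est_runtime - current.est_runtime)
                ratio = penalty / saved
                if best is None or ratio < best[0]:
                    best = (ratio, seg, opt)
        if best is None:
            return None  # no remaining switch reduces memory
        _, seg, opt = best
        total -= choice[seg].size_bytes - opt.size_bytes
        choice[seg] = opt
    return choice


# Illustrative numbers only.
example = {
    "a": [Option("unencoded", 1000, 1.0), Option("dictionary", 400, 1.2)],
    "b": [Option("unencoded", 1000, 1.0), Option("run-length", 100, 3.0)],
}
```

With a 1,500-byte budget, only segment `a` switches to its dictionary option, because that switch frees 600 bytes for a small runtime penalty. Tightening the budget to 500 bytes also moves `b` to run-length, and a 100-byte budget is reported as infeasible. A greedy pass like this is cheap but offers no optimality guarantee.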
2 COLUMN COMPRESSION FRAMEWORK

Virtually every database management system for hybrid transactional and analytical processing (HTAP) employs a variety of compression schemes. Besides reducing the main memory footprint, lightweight compression can even improve runtime performance, e.g., by reducing memory traffic (cf. [2, 4]) or broadening the applicability of vectorization (cf. [17]). But supporting a variety of compression schemes is challenging, as it requires balancing maintainability and efficiency. Most existing approaches either (i) optimize for performance while hampering maintainability and increasing complexity, or (ii) provide unified interfaces for improved maintainability, which potentially introduces runtime issues.

2.1 Hyrise's Storage Concept

Hyrise is a main memory-optimized database with a column-major storage format [10]. Each table in Hyrise is horizontally partitioned into n chunks with a predefined maximum size. Each attribute of a table is hence distributed over all chunks, and a column within a chunk is referred to as a segment. Modifications (i.e., insertions or MVCC-enabled updates) are appended to the most recent mutable chunk. When this chunk reaches its size limit, it is considered immutable and a new mutable chunk is created. Immutable chunks may be compressed asynchronously. Hyrise encodes and compresses segments independently.

2.2 Balancing Performance and
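
The storage concept in Section 2.1 can be modeled in a few lines. A table is a list of chunks, each chunk holds one segment per column, appends go to the newest mutable chunk, and a full chunk is frozen when a new one is opened. Frozen segments are then encoded independently. This is a sketch written in Python for brevity. `Table`, `Chunk`, `dictionary_encode`, and `encode_immutable` are hypothetical names and not Hyrise's C++ API. Dictionary encoding is used only as one example scheme, and encoding happens synchronously here rather than in the background.

```python
from typing import Any


class Chunk:
    """Horizontal partition of a table holding one segment per column."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = columns
        self.segments: dict[str, list[Any]] = {c: [] for c in columns}
        self.mutable = True

    def row_count(self) -> int:
        return len(self.segments[self.columns[0]])


class Table:
    def __init__(self, columns: list[str], max_chunk_rows: int) -> None:
        if not columns or max_chunk_rows < 1:
            raise ValueError("need at least one column and a positive chunk size")
        self.columns = list(columns)
        self.max_chunk_rows = max_chunk_rows
        self.chunks = [Chunk(self.columns)]

    def append(self, row: dict[str, Any]) -> None:
        chunk = self.chunks[-1]  # the most recent chunk is the mutable one
        for name in self.columns:
            chunk.segments[name].append(row[name])
        if chunk.row_count() == self.max_chunk_rows:
            # Full chunk: freeze it and open a new mutable chunk.
            chunk.mutable = False
            self.chunks.append(Chunk(self.columns))


def dictionary_encode(segment: list[Any]) -> tuple[list[Any], list[int]]:
    """Encode one segment independently: sorted dictionary plus value IDs."""
    dictionary = sorted(set(segment))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in segment]


def encode_immutable(table: Table, column: str) -> list[tuple[list[Any], list[int]]]:
    # Only frozen chunks are encoded; the mutable chunk keeps accepting appends.
    return [dictionary_encode(c.segments[column])
            for c in table.chunks if not c.mutable]
```

Appending five rows to a table with `max_chunk_rows=2` yields two frozen chunks and one mutable chunk holding the fifth row. Because every frozen segment gets its own dictionary, each segment could use a different encoding, which is the per-segment freedom the selection framework exploits.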

Journal Article•10.1007/S10586-017-1566-0•

Effective failure nodes detection using matrix calculus algorithm in wireless sensor networks

[...]

R. Palanikumar, K. Ramasamy

01 Sep 2019-Cluster Computing

TL;DR: A new matrix calculus (MCS) method is proposed that generates row-wise and column-wise round trip paths (RTPs), making it possible to detect multiple failed nodes by comparing row-wise and column-wise RTP delays.

Abstract: Wireless sensor networks (WSNs) are wireless autonomous networks with a large number of distributed sensor nodes. The quality of service in such WSNs is mainly affected by the failure of sensor nodes. In the existing method, failed sensor nodes are detected by measuring the round trip delay (RTD) time of discrete round trip paths (RTPs) and comparing it with a threshold value. A new matrix calculus (MCS) method is proposed that generates row-wise and column-wise round trip paths. With the newly generated RTPs, it is possible to detect multiple failed nodes by comparing row-wise RTP delays with column-wise RTP delays.
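
One way to read the row-wise/column-wise idea is sketched below. Assume the sensor nodes form an R x C grid with one round trip path per row and one per column, and assume a path through a failed node shows a delay above the threshold. A failed node then lies at the intersection of a slow row and a slow column. The grid layout, the delay model, and the names `simulate_rtp_delays` and `locate_failures` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Sequence


def simulate_rtp_delays(rows: int, cols: int, failed: Iterable[tuple[int, int]],
                        normal_delay: float = 1.0, penalty: float = 10.0
                        ) -> tuple[list[float], list[float]]:
    """Toy model (hypothetical): one round trip path per grid row and column.

    A path that passes through at least one failed node is assumed to be
    slowed down by `penalty` (for example, by retransmissions or a timeout).
    """
    failed_set = set(failed)
    bad_rows = {r for r, _ in failed_set}
    bad_cols = {c for _, c in failed_set}
    row_delays = [normal_delay + (penalty if r in bad_rows else 0.0) for r in range(rows)]
    col_delays = [normal_delay + (penalty if c in bad_cols else 0.0) for c in range(cols)]
    return row_delays, col_delays


def locate_failures(row_delays: Sequence[float], col_delays: Sequence[float],
                    threshold: float) -> list[tuple[int, int]]:
    """Flag every node at the intersection of a slow row and a slow column."""
    slow_rows = [r for r, d in enumerate(row_delays) if d > threshold]
    slow_cols = [c for c, d in enumerate(col_delays) if d > threshold]
    return [(r, c) for r in slow_rows for c in slow_cols]
```

A single failure at (1, 2) in a 3 x 4 grid is located exactly. With two failures in different rows and columns, such as (0, 0) and (2, 3), this simple intersection also flags the "ghost" candidates (0, 3) and (2, 0). A practical scheme would need additional probing to separate real failures from these candidates.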

Journal Article•10.1109/MSP.2019.2898421•

Reproducible Research: Best Practices and Potential Misuse [Perspectives]

[...]

Emil Björnson¹•Institutions (1)

Linköping University¹

26 Apr 2019-IEEE Signal Processing Magazine

TL;DR: The scientific world is becoming more open to the public and fellow researchers; the next step is the open code and data paradigm, which was briefly discussed in the "From the Editor" column in the November 2018 issue of IEEE Signal Processing Magazine (SPM).

Abstract: The scientific world is becoming more open to the public and fellow researchers. Open access publishing is becoming accepted, even if some publishers are resisting. The next step is the open code and data paradigm, which was briefly discussed in the "From the Editor" column in the November 2018 issue of IEEE Signal Processing Magazine (SPM) [1]. In this column, I follow up on this topic by sharing my experiences, best practices, and thoughts about reproducible research.
