TL;DR: Sherlock, a multi-input deep neural network for detecting semantic types, achieves a support-weighted F$_1$ score of $0.89$, exceeding that of machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations.
Abstract: Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery. Existing data preparation and analysis systems rely on dictionary lookups and regular expression matching to detect semantic types. However, these matching-based approaches often are not robust to dirty data and only detect a limited number of types. We introduce Sherlock, a multi-input deep neural network for detecting semantic types. We train Sherlock on $686,765$ data columns retrieved from the VizNet corpus by matching $78$ semantic types from DBpedia to column headers. We characterize each matched column with $1,588$ features describing the statistical properties, character distributions, word embeddings, and paragraph vectors of column values. Sherlock achieves a support-weighted F$_1$ score of $0.89$, exceeding that of machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations.
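The support-weighted F$_1$ score reported above can be illustrated with a minimal sketch. It assumes the usual definition (per-class F$_1$ averaged with weights proportional to the number of true examples of each class); the function name and the toy labels are hypothetical and are not taken from the Sherlock code.

```python
from collections import Counter

def support_weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its number of true examples."""
    support = Counter(y_true)          # true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)   # denominator >= n >= 1
        score += (n / total) * f1
    return score

# Toy example with two hypothetical semantic types.
y_true = ["city", "city", "city", "name"]
y_pred = ["city", "city", "name", "name"]
print(support_weighted_f1(y_true, y_pred))
```

Weighting by support means frequent types dominate the score, so a model can score well overall while still performing poorly on rare types.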
TL;DR: This paper proposes TableNet, a novel end-to-end deep learning model for both table detection and structure recognition, which exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions.
Abstract: With the widespread use of mobile phones and scanners to photograph and upload documents, the need for extracting the information trapped in unstructured document images such as retail receipts, insurance claim forms and financial invoices is becoming more acute. A major hurdle to this objective is that these images often contain information in the form of tables and extracting data from tabular sub-images presents a unique set of challenges. This includes accurate detection of the tabular region within an image, and subsequently detecting and extracting information from the rows and columns of the detected table. While some progress has been made in table detection, extracting the table contents is still a challenge since this involves more fine-grained table structure (rows and columns) recognition. Prior approaches have attempted to solve the table detection and structure recognition problems independently using two separate models. In this paper, we propose TableNet: a novel end-to-end deep learning model for both table detection and structure recognition. The model exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions. This is followed by semantic rule-based row extraction from the identified tabular sub-regions. The proposed model and extraction approach were evaluated on the publicly available ICDAR 2013 and Marmot Table datasets, obtaining state-of-the-art results. Additionally, we demonstrate that feeding additional semantic features further improves model performance and that the model exhibits transfer learning across datasets. Another contribution of this paper is to provide additional table structure annotations for the Marmot data, which currently only has annotations for table detection.
TL;DR: In this paper, a neural network based column type annotation framework named ColNet is proposed, which is able to integrate KB reasoning and lookup with machine learning and can automatically train Convolutional Neural Networks for prediction.
Abstract: Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables. Current methods rely on either table metadata like column name or entity correspondences of cells in the KB, and may fail to deal with growing web tables with incomplete meta information. In this paper we propose a neural network based column type annotation framework named ColNet which is able to integrate KB reasoning and lookup with machine learning and can automatically train Convolutional Neural Networks for prediction. The prediction model not only considers the contextual semantics within a cell using word representation, but also embeds the semantics of a column by learning locality features from multiple cells. The method is evaluated with DBPedia and two different web table datasets, T2Dv2 from the general Web and Limaye from Wikipedia pages, and achieves higher performance than the state-of-the-art approaches.
TL;DR: A novel method for the analysis of tabular structures in document images using deformable convolutional networks is presented, together with TabStructDB, a new image-based table structure recognition dataset built on the ICDAR-17 Page-Object Detection dataset and comprising 1081 tables densely labeled with row and column information.
Abstract: This paper presents a novel method for the analysis of tabular structures in document images using the potential of deformable convolutional networks. In order to assess the suitability of the model to the task of table structure recognition, most of the prior methods have been tested on the smaller ICDAR-13 table structure recognition dataset comprising just 156 tables. We curated a new image-based table structure recognition dataset, TabStructDB, comprising 1081 tables densely labeled with row and column information. Instead of collecting new images for this purpose, we leveraged the famous Page-Object Detection dataset from ICDAR-17, and added structural information for all the tabular regions present in the dataset. This new publicly available dataset will enable the development of more sophisticated table structure recognition techniques in the future. We performed extensive evaluation on the two datasets (ICDAR-13 and TabStructDB) including cross-dataset testing in order to evaluate the efficacy of the proposed approach. We achieved state-of-the-art results with deformable models on ICDAR-13 with an average F-Measure of 92.98% (89.42% for rows and 96.55% for columns) and report baseline results on TabStructDB for guiding future research efforts with an F-Measure of 93.72% (91.26% for rows and 95.59% for columns). Despite promising results, structural analysis of tables with arbitrary layouts is still far from achievable at this point.
TL;DR: Extensive experiments on four challenging benchmarks show that McML can significantly improve the original multi-column networks and outperform the other state-of-the-art approaches.
Abstract: Tremendous variation in the scale of people/head size is a critical problem for crowd counting. To improve the scale invariance of feature representation, recent works extensively employ Convolutional Neural Networks with multi-column structures to handle different scales and resolutions. However, due to the substantial redundant parameters in columns, existing multi-column networks invariably exhibit almost the same scale features in different columns, which severely affects counting accuracy and leads to overfitting. In this paper, we attack this problem by proposing a novel Multicolumn Mutual Learning (McML) strategy. It has two main innovations: 1) A statistical network is incorporated into the multi-column framework to estimate the mutual information between columns, which can approximately indicate the scale correlation between features from different columns. By minimizing the mutual information, each column is guided to learn features with different image scales. 2) We devise a mutual learning scheme that can alternately optimize each column while keeping the other columns fixed on each mini-batch training data. With such asynchronous parameter update process, each column is inclined to learn different feature representation from others, which can efficiently reduce the parameter redundancy and improve generalization ability. More remarkably, McML can be applied to all existing multi-column networks and is end-to-end trainable. Extensive experiments on four challenging benchmarks show that McML can significantly improve the original multi-column networks and outperform the other state-of-the-art approaches.
TL;DR: This work develops a table cell fusion gate to combine representations from row, column and time dimension into one dense vector according to the saliency of each dimension’s representation.
Abstract: Although Seq2Seq models for table-to-text generation have achieved remarkable progress, modeling table representation in one dimension is inadequate. This is because (1) the table consists of multiple rows and columns, which means that encoding a table should not depend only on a one-dimensional sequence or set of records and (2) most of the tables are time series data (e.g. NBA game data, stock market data), which means that the description of the current table may be affected by its historical data. To address the aforementioned problems, not only do we model each table cell considering other records in the same row, we also enrich the table's representation by modeling each table cell in the context of other cells in the same column or with historical (time dimension) data respectively. In addition, we develop a table cell fusion gate to combine representations from row, column and time dimension into one dense vector according to the saliency of each dimension's representation. We evaluated our methods on ROTOWIRE, a benchmark dataset of NBA basketball games. Both automatic and human evaluation results demonstrate the effectiveness of our model, which improves BLEU by 2.66 over the strong baseline and outperforms the state-of-the-art model.
TL;DR: In this paper, the authors evaluated the effectiveness of using rocking isolation as a retrofit strategy for single-column concrete box-girder highway bridges in California, and derived the response class probabilities of rocking uplift and overturning.
Abstract: Rocking isolation has been increasingly studied as a promising design concept to limit the earthquake damage of civil structures. Despite the difficulties and uncertainties of predicting the rocking response under individual earthquake excitations (due to negative rotational stiffness and complex impact energy loss), in a statistical sense, the seismic performance of rocking structures has been shown to be generally consistent with the experimental outcomes. To this end, this study assesses, in a probabilistic manner, the effectiveness of using rocking isolation as a retrofit strategy for single-column concrete box-girder highway bridges in California. Under earthquake excitation, the rocking bridge could experience multi-class responses (e.g., fully contacted or uplifting foundation) and multi-mode damage (e.g., overturning, uplift impact, and column nonlinearity). A multi-step machine learning framework is developed to estimate the damage probability associated with each damage scenario. The framework consists of the dimensionally consistent generalized linear model for regression of seismic demand, the logistic regression for classification of distinct response classes, and the stepwise regression for feature selection of significant ground motion and structural parameters. Fragility curves are derived to predict the response class probabilities of rocking uplift and overturning, and the conditional damage probabilities such as column vibrational damage and rocking uplift impact damage. The fragility estimates of rocking bridges are compared with those for as-built bridges, indicating that rocking isolation is capable of reducing column damage potential. Additionally, there exists an optimal slenderness angle range that enables the studied bridges to experience much lower overturning tendencies and significantly reduced column damage probabilities at the same time.
TL;DR: Evaluation results show that table embeddings can significantly improve upon the performance of state-of-the-art baselines and be utilized in table-related tasks, row population, column population, and table retrieval.
Abstract: Tables contain valuable knowledge in a structured form. We employ neural language modeling approaches to embed tabular data into vector spaces. Specifically, we consider different table elements, such as captions, column headings, and cells, for training word and entity embeddings. These embeddings are then utilized in three particular table-related tasks (row population, column population, and table retrieval) by incorporating them into existing retrieval models as additional semantic similarity signals. Evaluation results show that table embeddings can significantly improve upon the performance of state-of-the-art baselines.
TL;DR: This segment of the Sharpest Tool in the Shed column introduces Microsoft’s Power BI software and associated functionality built into recent (2013 and newer) versions of Microsoft's Excel.
TL;DR: Zhang et al. propose a novel multi-column mutual learning (McML) strategy that improves the scale invariance of feature representation by incorporating a statistical network to estimate the mutual information between columns, which approximately indicates the scale correlation between features from different columns.
TL;DR: The obtained results advocate that constraining the problem space in the case of FCN by imposing valid constraints can lead to significant performance gains.
Abstract: Based on the recent advancements in the domain of semantic segmentation, Fully-Convolutional Networks (FCN) have been successfully applied for the task of table structure recognition in the past. We analyze the efficacy of semantic segmentation networks for this purpose and simplify the problem by proposing prediction tiling based on the consistency assumption which holds for tabular structures. For an image of dimensions H × W, we predict a single column for the rows (ŷ_row ∊ H) and a single row for the columns (ŷ_col ∊ W). We use a dual-headed architecture where initial feature maps (from the encoder-decoder model) are shared while the last two layers generate class-specific (row/column) predictions. This allows us to generate predictions using a single model for both rows and columns simultaneously, where previous methods relied on two separate models for inference. With the proposed method, we were able to achieve state-of-the-art results on the ICDAR-13 image-based table structure recognition dataset with an average F-Measure of 92.39% (91.90% and 92.88% F-Measure for rows and columns respectively). The obtained results advocate that constraining the problem space in the case of FCN by imposing valid constraints can lead to significant performance gains.
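The prediction tiling idea described above can be sketched as follows. This is only an illustration of the consistency assumption (a row label is constant across the image width and a column label is constant across the height); the function name and the 0/1 label encoding are assumptions, not the authors' implementation.

```python
def tile_predictions(row_pred, col_pred):
    """Expand 1-D row and column predictions into full H x W masks.

    row_pred: length-H sequence, one label per image row.
    col_pred: length-W sequence, one label per image column.
    """
    h, w = len(row_pred), len(col_pred)
    # Each row label is repeated across the full width.
    row_mask = [[row_pred[i]] * w for i in range(h)]
    # Each column label is repeated down the full height.
    col_mask = [list(col_pred) for _ in range(h)]
    return row_mask, col_mask

# Hypothetical 3 x 2 image: 1 marks a table row/column, 0 marks a separator.
row_mask, col_mask = tile_predictions([0, 1, 1], [1, 0])
print(row_mask)
print(col_mask)
```

Under this assumption the network only has to output H + W values instead of 2 × H × W, which is the sense in which the problem space is constrained.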
TL;DR: This work proposes Uni-Detect, a unified framework to automatically detect diverse types of errors, and makes surprising discoveries of thousands of FD violations, numeric outliers, spelling mistakes, etc., with better accuracy than existing algorithms specifically designed for each type of error.
Abstract: Data errors are ubiquitous in tables. Extensive research in this area has resulted in a rich variety of techniques, each often targeting a specific type of error, e.g., numeric outliers, constraint violations, etc. While these diverse techniques clearly improve data quality, they place a significant burden on humans to configure them with suitable rules and parameters for each data set. For example, an expert is expected to define suitable functional dependencies between column pairs, or tune appropriate thresholds for outlier-detection algorithms, all of which are specific to one individual data set. As a result, users today often hire experts to cleanse only their high-value data sets. We propose Uni-Detect, a unified framework to automatically detect diverse types of errors. Our approach employs a novel "what-if" analysis that performs local data perturbations to reason about data abnormality, leveraging classical hypothesis tests on a large corpus of tables. We test Uni-Detect on a wide variety of tables including Wikipedia tables, and make surprising discoveries of thousands of FD violations, numeric outliers, spelling mistakes, etc., with better accuracy than existing algorithms specifically designed for each type of error. For example, for spelling mistakes, Uni-Detect outperforms the state-of-the-art spell-checker from a commercial search engine.
TL;DR: A novel approach for table data annotation combines a latent probabilistic model with multi-label classifiers; it is more versatile and more accurate than previous approaches, and more efficient because potential functions based on multi-label classifiers reduce the computational cost of annotation.
Abstract: Given a large amount of table data, how can we find the tables that contain the contents we want? A naive search fails when the column names are ambiguous, such as if columns containing stock price information are named “Close” in one table and named “P” in another table. One way of dealing with this problem that has been gaining attention is the semantic annotation of table data columns by using canonical knowledge. While previous studies successfully dealt with this problem for specific types of table data such as web tables, it still remains for various other types of table data: (1) most approaches do not handle table data with numerical values, and (2) their predictive performance is not satisfactory. This paper presents a novel approach for table data annotation that combines a latent probabilistic model with multi-label classifiers. It features three advantages over previous approaches due to using highly predictive multi-label classifiers in the probabilistic computation of semantic annotation. (1) It is more versatile due to using multi-label classifiers in the probabilistic model, which enables various types of data such as numerical values to be supported. (2) It is more accurate due to the multi-label classifiers and probabilistic model working together to improve predictive performance. (3) It is more efficient due to potential functions based on multi-label classifiers reducing the computational cost for annotation. Extensive experiments demonstrated the superiority of the proposed approach over state-of-the-art approaches for semantic annotation of real data (183 human-annotated tables obtained from the UCI Machine Learning Repository).
TL;DR: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis, and provides a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file.
Abstract: Introduction The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names. Results ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants; query of disease diagnoses for one or more individuals, and estimating disease frequency relative to a reference variable; and to retrieve genetic metadata. Conclusion Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file, will rapidly enable UKB users to undertake their research.
TL;DR: In this paper, the authors investigated the relationship between the characteristics of granular flows and the generated seismic signals and established empirical scaling laws that can be tested in the field, for a large set of column masses, aspect ratios, particle diameters, and slope angles.
Abstract: Deducing relations between the dynamic characteristics of landslides and rockfalls and the resultant high-frequency (> 1 Hz) seismic signal is challenging. To investigate relations that can be tested in the field, we conducted laboratory experiments of 3-D granular column collapse on a rough inclined thin plate, for a large set of column masses, aspect ratios, particle diameters, and slope angles. The dynamics of the granular flows were recorded using a high-speed camera, and the generated seismic signal was measured using piezoelectric accelerometers. Empirical scaling laws are established between the characteristics of the granular flows and deposits and those of the generated seismic signals. The radiated seismic energy scales with particle diameter as $d^3$, column mass as $M$ and aspect ratio as $a^{1.1}$. The increase of the radiated seismic energy as slope angle increases correlates with a similar increase in particle agitation. Based on our experimental results, we revisit scaling laws reported in the field and discuss their possible physical origin. The discrepancy between field and experimental observations can be explained by the complex influence of the substrate on seismic signal and the difference of flow initiation in both cases. However, our empirical scaling laws allow us to determine which flow parameters could be inferred from a given seismic characteristic in the field. In particular, by assuming the flow average speed is known, we show that we can retrieve parameters $d$, $M$, and $a$ within a factor of two from the seismic signal.
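To make the reported scaling concrete, the sketch below computes how the radiated seismic energy would change between two experiments, assuming the three exponents combine multiplicatively (energy proportional to d^3 · M · a^1.1) and everything else, including slope angle, is held fixed. The function name is hypothetical and the prefactor of the law is deliberately left out, since the abstract does not give it.

```python
def energy_ratio(d_ratio, m_ratio, a_ratio):
    """Ratio of radiated seismic energies between two column collapses.

    Assumes energy ~ d**3 * M * a**1.1 with all other parameters unchanged;
    each argument is the ratio (new / reference) of that parameter.
    """
    return d_ratio ** 3 * m_ratio * a_ratio ** 1.1

# Doubling the particle diameter alone multiplies the energy by 2**3 = 8.
print(energy_ratio(2.0, 1.0, 1.0))  # 8.0
# Tripling the column mass alone multiplies the energy by 3.
print(energy_ratio(1.0, 3.0, 1.0))  # 3.0
```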
TL;DR: Smart drill-down is an operator for interactively exploring a relational table to discover and summarize "interesting" groups of tuples, each of which is described by a rule.
Abstract: We present smart drill-down, an operator for interactively exploring a relational table to discover and summarize “interesting” groups of tuples. Each group of tuples is described by a rule. For instance, the rule $(a, b, \star, 1000)$ tells us that there are 1,000 tuples with value $a$ in the first column and $b$ in the second column (and any value in the third column). Smart drill-down presents an analyst with a list of rules that together describe interesting aspects of the table. The analyst can tailor the definition of interesting, and can interactively apply smart drill-down on an existing rule to explore that part of the table. We demonstrate that the underlying optimization problems are NP-Hard, and describe an algorithm for finding the approximately optimal list of rules to display when the user uses a smart drill-down, and a dynamic sampling scheme for efficiently interacting with large tables. Finally, we perform experiments on real datasets on our experimental prototype to demonstrate the usefulness of smart drill-down and study the performance of our algorithms.
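The rule notation above can be illustrated with a small sketch that counts the tuples covered by a rule pattern. Here the pattern holds only the column values (with a wildcard standing in for $\star$) and the count is computed rather than stored; the names `STAR`, `matches`, and `count_matches` and the toy table are hypothetical, and the sketch does not attempt the rule-selection optimization the paper addresses.

```python
STAR = "*"  # wildcard: matches any value in that column

def matches(pattern, row):
    """True if every non-wildcard position of the pattern equals the row value."""
    return all(p == STAR or p == v for p, v in zip(pattern, row))

def count_matches(pattern, table):
    """Number of tuples in the table covered by the pattern."""
    return sum(1 for row in table if matches(pattern, row))

table = [
    ("a", "b", "x"),
    ("a", "b", "y"),
    ("a", "c", "x"),
    ("d", "b", "x"),
]
# The rule (a, b, *, 2): two tuples have a in column 1 and b in column 2.
print(count_matches(("a", "b", STAR), table))  # 2
```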
TL;DR: Modular steel-frame buildings (MSFBs) are a fast-evolving alternative to traditional on-site construction, providing factory-level quality and significant time savings, according to industry experts.
Abstract: Modular steel-frame buildings (MSFBs) are a fast-evolving alternative to traditional on-site construction, providing factory-level quality and significant time savings. The main difference between ...
TL;DR: This first study on the discovery of embedded uniqueness constraints (eUCs) shows that the decision variant of discovering a minimal eUC is NP-complete and W[2]-complete, characterizes the maximum possible solution size, and shows which families of eUCs attain that size.
Abstract: Data profiling is an enabler for efficient data management and effective analytics. The discovery of data dependencies is at the core of data profiling. We conduct the first study on the discovery of embedded uniqueness constraints (eUCs). These constraints represent unique column combinations embedded in complete fragments of incomplete data. We showcase their implementation as filtered indexes, and their application in integrity management and query optimization. We show that the decision variant of discovering a minimal eUC is NP-complete and W[2]-complete. We characterize the maximum possible solution size, and show which families of eUCs attain that size. Despite the challenges, experiments with real-world and synthetic benchmark data show that our column (row)-efficient algorithms perform well with a large number of columns (rows), and our hybrid algorithm combines ideas from both. We show how to rank eUCs to help identify relevant eUCs.
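One way to read "unique column combinations embedded in complete fragments of incomplete data" is sketched below: restrict the table to rows with no missing values on an embedding set of columns, then check that a subset of those columns is unique on that fragment. This reading, the function name, and the toy data are assumptions for illustration; the paper's formal definition and discovery algorithms are not reproduced here.

```python
def satisfies_euc(rows, embedding, unique_cols):
    """Check a hypothetical eUC (embedding, unique_cols) on a list of dict rows.

    Rows with a missing value (None) in any embedding column are ignored;
    the remaining rows must have pairwise distinct values on unique_cols.
    """
    seen = set()
    for row in rows:
        if any(row[c] is None for c in embedding):
            continue  # row is outside the complete fragment
        key = tuple(row[c] for c in unique_cols)
        if key in seen:
            return False
        seen.add(key)
    return True

rows = [
    {"id": 1, "email": "x", "phone": None},
    {"id": 1, "email": "y", "phone": "5"},
    {"id": 2, "email": "z", "phone": "6"},
]
# id is unique among rows that have both id and phone present...
print(satisfies_euc(rows, ["id", "phone"], ["id"]))  # True
# ...but not among rows that have both id and email present.
print(satisfies_euc(rows, ["id", "email"], ["id"]))  # False
```

A check of this kind corresponds loosely to a filtered unique index, which enforces uniqueness only on rows meeting a completeness condition.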
TL;DR: This work presents a NLIDB system with extended capabilities for business applications that require complex nested SQL queries without prior training or feedback from human in-the-loop, and uses novel algorithms that combine linguistic analysis with deep domain reasoning for solving core challenges in handling nested queries.
Abstract: Natural Language Interface to Database (NLIDB) eliminates the need for an end user to use complex query languages like SQL by translating the input natural language statements to SQL automatically. Although NLIDB systems have seen rapid growth of interest recently, the current state-of-the-art systems can at best handle point queries to retrieve certain column values satisfying some filters, or aggregation queries involving basic SQL aggregation functions. In this demo, we showcase our NLIDB system with extended capabilities for business applications that require complex nested SQL queries without prior training or feedback from human in-the-loop. In particular, our system uses novel algorithms that combine linguistic analysis with deep domain reasoning for solving core challenges in handling nested queries. To demonstrate the capabilities, we propose a new benchmark dataset containing realistic business intelligence queries, conforming to an ontology derived from FIBO and FRO financial ontologies. In this demo, we will showcase a wide range of complex business intelligence queries against our benchmark dataset, with increasing level of complexity. The users will be able to examine the SQL queries generated, and also will be provided with an English description of the interpretation.
TL;DR: Li et al. propose a table cell fusion gate to combine representations from row, column and time dimension into one dense vector according to the saliency of each dimension's representation.
TL;DR: This paper reviews the experimental research on beam-to-column connections for composite special moment frames (C-SMFs) and presents an experimental database consisting of 165 tests conduct...
Abstract: This paper reviews the experimental research on beam-to-column connections for composite special moment frames (C-SMFs) and presents an experimental database consisting of 165 tests conduct...
TL;DR: An overview of SAP HANA’s Native Store Extension (NSE) is presented; NSE is based on a hybrid in-memory and paged column store architecture composed of data access primitives, and it substantially increases database capacity, allowing the database to scale far beyond available system memory.
Abstract: We present an overview of SAP HANA's Native Store Extension (NSE). This extension substantially increases database capacity, allowing the database to scale far beyond available system memory. NSE is based on a hybrid in-memory and paged column store architecture composed of data access primitives. These primitives enable the processing of hybrid columns using the same algorithms optimized for HANA's traditional in-memory columns. Using only three key primitives, we fabricated byte-compatible counterparts for complex memory-resident data structures (e.g. dictionary and hash-index), compression schemes (e.g. sparse and run-length encoding), and exotic data types (e.g. geo-spatial). We developed a new buffer cache which optimizes the management of paged resources by smart strategies sensitive to page type and access patterns. The buffer cache integrates with HANA's new execution engine that issues pipelined prefetch requests to improve disk access patterns. A novel load unit configuration, along with a unified persistence format, allows the hybrid column store to dynamically switch between in-memory and paged data access to balance performance and storage economy according to application demands while reducing Total Cost of Ownership (TCO). A new partitioning scheme supports load unit specification at table, partition, and column level. Finally, a new advisor recommends optimal load unit configurations. Our experiments illustrate the performance and memory footprint improvements on typical customer scenarios.
TL;DR: The history of IEEE Standard 754, the Standard for Floating Point Arithmetic, which has flourished in many commercial microprocessors and other computer platforms, is recounted.
Abstract: IEEE Standard 754, Standard for Floating Point Arithmetic, had its beginnings more than 40 years ago. Implementations of the standard have flourished in many commercial microprocessors and other computer platforms. In June, a revision of the standard was approved by the IEEE Standards Association Standards Board. This column recounts some of the interesting history behind the standard.
TL;DR: This is an expanded version of the Notices of the AMS column with the same title; the text is unchanged, but the authors added acknowledgements and a large number of endnotes which provide the context and the references.
Abstract: This is an expanded version of the Notices of the AMS column with the same title. The text is unchanged, but we added acknowledgements and a large number of endnotes which provide the context and the references.
TL;DR: A computational screening method is proposed in which virtual 2D chromatograms are calculated using the Snyder-Dolan hydrophobic subtraction model (HSM) for reversed-phase column selectivity; the screening shows a strong sensitivity to the choice of the second-dimension column and a preference for columns with embedded polar moieties.
TL;DR: MorphStore, an in-memory column store with a novel compression-aware query processing concept, is presented, able to speed up the query execution by morphing compressed intermediate results from one scheme to another scheme to dynamically adapt to the changing data characteristics during query processing.
Abstract: In this demo, we present MorphStore, an in-memory column store with a novel compression-aware query processing concept. Basically, compression using lightweight integer compression algorithms already plays an important role in existing in-memory column stores, but mainly for base data. The continuous handling of compression from the base data to the intermediate results during query processing has already been discussed, but not investigated in detail since the computational effort for compression as well as decompression is often assumed to exceed the benefits of a reduced transfer cost between CPU and main memory. However, this argument increasingly loses its validity as we are going to show in our demo. Generally, our novel compression-aware query processing concept is characterized by the fact that we are able to speed up the query execution by morphing compressed intermediate results from one scheme to another scheme to dynamically adapt to the changing data characteristics during query processing. Our morphing decisions are made using a cost-based approach.
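The cost-based choice between lightweight integer compression schemes can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not MorphStore's implementation: the two schemes (run-length encoding and fixed-width bit packing), the helper names, and the size-only cost model are all hypothetical stand-ins chosen for illustration.

```python
from itertools import groupby
from typing import List, Tuple

Run = Tuple[int, int]  # (value, run length)


def rle_encode(values: List[int]) -> List[Run]:
    """Run-length encode a list of integers."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand run-length encoded data back into plain integers."""
    return [v for v, n in runs for _ in range(n)]


def rle_cost_bits(runs: List[Run], word_bits: int = 32) -> int:
    """Assumed cost model: one word for the value, one for the length."""
    return 2 * word_bits * len(runs)


def bitpack_cost_bits(values: List[int]) -> int:
    """Assumed cost model: every value stored at the width of the largest.

    Only meaningful for non-negative integers.
    """
    width = max((v.bit_length() for v in values), default=0)
    return len(values) * max(1, width)


def choose_scheme(values: List[int]) -> str:
    """Pick the scheme with the smaller estimated size (hypothetical rule)."""
    runs = rle_encode(values)
    if rle_cost_bits(runs) <= bitpack_cost_bits(values):
        return "rle"
    return "bitpack"
```

In this toy model, an intermediate result with long runs would stay run-length encoded, while one with many distinct small values would be re-encoded as bit-packed data. A real cost model would also have to account for the computational effort of compression and decompression that the abstract mentions.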
TL;DR: An automated selection framework for compression configurations for autonomous database systems is presented, along with Hyrise's compression framework, which implements an efficient and maintainable interface for various column compression techniques.
TL;DR: A new matrix calculus (MCS) method is proposed that generates row-wise and column-wise round trip paths (RTPs), making it possible to detect multiple failed nodes by comparing row-wise RTP delays with column-wise RTP delays.
Abstract: Wireless sensor networks (WSNs) are autonomous wireless networks with a large number of distributed sensor nodes. The quality of service in such WSNs is mainly affected by the failure of sensor nodes. In the existing method, failed sensor nodes are detected by measuring the round trip delay (RTD) time of discrete round trip paths (RTPs) and comparing it with a threshold value. A new matrix calculus (MCS) method is proposed that generates row-wise and column-wise round trip paths. With the newly generated RTPs, it is possible to detect multiple failed nodes by comparing row-wise RTP delays with column-wise RTP delays.
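The row-wise and column-wise comparison can be sketched as follows. This is an illustrative sketch under assumptions the abstract does not confirm: nodes are laid out on a logical grid, one round trip delay is measured per row path and per column path, and a node is flagged when both of its paths exceed the threshold. The function name and parameters are hypothetical.

```python
from typing import List, Set, Tuple


def detect_failed_nodes(
    row_delays: List[float],
    col_delays: List[float],
    threshold: float,
) -> Set[Tuple[int, int]]:
    """Return grid positions (row, col) suspected of failure.

    A position is flagged when both the row-wise and the column-wise
    round trip path through it exceed the delay threshold.
    """
    slow_rows = [r for r, delay in enumerate(row_delays) if delay > threshold]
    slow_cols = [c for c, delay in enumerate(col_delays) if delay > threshold]
    # Candidates are the intersections of slow rows and slow columns.
    return {(r, c) for r in slow_rows for c in slow_cols}


# Example with a 3 x 4 grid: row 1 and column 2 are slow.
suspects = detect_failed_nodes(
    row_delays=[10.0, 48.0, 11.0],
    col_delays=[9.0, 10.0, 51.0, 12.0],
    threshold=30.0,
)
```

With a single slow row and a single slow column, the intersection pinpoints one node. When several rows and columns are slow at once, this simple intersection can also flag healthy nodes, so the candidates would need further checks.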
TL;DR: The scientific world is becoming more open to the public and fellow researchers, and the next step is the open code and data paradigm, which was briefly discussed in the "From the Editor" column in the November 2018 issue of IEEE Signal Processing Magazine (SPM).
Abstract: The scientific world is becoming more open to the public and fellow researchers. Open access publishing is becoming accepted, even if some publishers are resisting. The next step is the open code and data paradigm, which was briefly discussed in the "From the Editor" column in the November 2018 issue of IEEE Signal Processing Magazine (SPM) [1]. In this column, I follow up on this topic by sharing my experiences, best practices, and thoughts about reproducible research.