Top 55133 papers published in the topic of Table (database) in 2023

Showing papers on "Table (database)" published in 2023

Tree Visualization By One Table (tvBOT): a web application for visualizing, modifying and annotating phylogenetic trees

Jianmin Xie, Yuerong Chen, Runlin Cai, Zhong Yun Hu, Hui Wang

05 May 2023-Nucleic Acids Research

TL;DR: tvBOT is a web application for visualizing, modifying, and annotating phylogenetic trees, with a data-driven engine that only requires practical data organized in uniform formats and saved as one table file.

Abstract: tvBOT is a user-friendly and efficient web application for visualizing, modifying, and annotating phylogenetic trees. It is highly efficient in data preparation without requiring redundant style and syntax data. Tree annotations are powered by a data-driven engine that only requires practical data organized in uniform formats and saved as one table file. A layer manager is developed to manage annotation dataset layers, allowing the addition of a specific layer by selecting the columns of a corresponding annotation data file. Furthermore, tvBOT renders style adjustments in real-time and diversified ways. All style adjustments can be made on a highly interactive user interface and are available for mobile devices. The display engine allows the changes to be updated and rendered in real-time. In addition, tvBOT supports the combination display of 26 annotation dataset types to achieve multiple formats for tree annotations with reusable phylogenetic data. Besides several publication-ready graphics formats, JSON format can be exported to save the final drawing state and all related data, which can be shared with other users, uploaded to restore the final drawing state for re-editing or used as a style template for quickly retouching a new tree file. tvBOT is freely available at: https://www.chiplot.online/tvbot.html.

398 citations

Journal Article•10.1111/1742-6723.14233•

Generative artificial intelligence: Can ChatGPT write a quality abstract?

Franz E Babl

04 May 2023-Emergency Medicine Australasia

TL;DR: The freely available version of ChatGPT, used by a non-medically trained person with a fictitious but accurately calculated data table, produced a conference abstract that was well written, without obvious errors, and followed the abstract instructions, although one of the references was fictitious.

Abstract: ChatGPT is a generative artificial intelligence chatbot which may have a role in medicine and science. We investigated if the freely available version of ChatGPT can produce a quality conference abstract using a fictitious but accurately calculated data table as applied by a non-medically trained person. The resulting abstract was well written without obvious errors and followed the abstract instructions. One of the references was fictitious, known as 'hallucination'. ChatGPT or similar programmes, with careful review of the product by authors, may become a valuable scientific writing tool. The scientific and medical use of generative artificial intelligence, however, raises many questions.

40 citations

Journal Article•10.1016/j.patcog.2022.109006•

Robust Table Detection and Structure Recognition from Heterogeneous Document Images

Chixiang Ma (Microsoft Research Asia, China)

01 Jan 2023-Pattern Recognition

TL;DR: A new table detection and structure recognition approach named RobusTabNet is proposed to detect the boundaries of tables and reconstruct the cellular structure of each table from heterogeneous document images.

38 citations

Journal Article•10.18653/v1/2023.findings-eacl.83•

Large Language Models are few(1)-shot Table Reasoners

Wenhu Chen

1 Jan 2023

TL;DR: Large language models are effective few-shot table reasoners, achieving performance comparable to state-of-the-art models with only a single demonstration.

Abstract: Recent literature has shown that large language models (LLMs) are generally excellent few-shot reasoners to solve text reasoning tasks. However, the capability of LLMs on table reasoning tasks is yet to be explored. In this paper, we aim at understanding how well LLMs can perform table-related tasks with few-shot in-context learning. Specifically, we evaluated LLMs on popular table QA and fact verification datasets like WikiTableQuestion, FetaQA, TabFact, and FEVEROUS and found that LLMs are competent at complex reasoning over table structures, though these models are not pre-trained on any table corpus. When combined with ‘chain of thoughts’ prompting, LLMs can achieve very strong performance with only a 1-shot demonstration, even on par with some SoTA models. We show that LLMs are even more competent at generating comprehensive long-form answers on FetaQA than tuned T5-large. We further manually studied the reasoning chains elicited from LLMs and found that these reasoning chains are highly consistent with the underlying semantic form. We believe that LLMs can serve as a simple yet generic baseline for future research. The code and data are released in https://github.com/wenhuchen/TableCoT.

27 citations

Journal Article•10.14778/3594512.3594528•

Learned Index: A Comprehensive Experimental Evaluation

Zhaoyan Sun, Xuanhe Zhou, Guoliang Li

01 Apr 2023-Proceedings of the VLDB Endowment

TL;DR: This paper reviews learned indexes in detail and discusses the design choices of their key components, including key lookup (position inference, which predicts the position of a key, and position refinement, which re-searches the position if the predicted position is incorrect), key insert, concurrency, and bulk loading.

Abstract: Indexes can improve query-processing performance by avoiding full table scans. Although traditional indexes (e.g., B+-tree) have been widely used, learned indexes are proposed to adopt machine learning models to reduce the query latency and index size. However, existing learned indexes are (1) not thoroughly evaluated under the same experimental framework and are (2) not comprehensively compared with different settings (e.g., key lookup, key insert, concurrent operations, bulk loading). Moreover, it is hard to select appropriate learned indexes for practitioners in different settings. To address those problems, this paper detailedly reviews existing learned indexes and discusses the design choices of key components in learned indexes, including key lookup (position inference which predicts the position of a key, and position refinement which re-searches the position if the predicted position is incorrect), key insert, concurrency, and bulk loading. Moreover, we provide a testbed to facilitate the design and test of new learned indexes for researchers. We compare state-of-the-art learned indexes in the same experimental framework, and provide findings to select suitable learned indexes under various practical scenarios.
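
The two-stage key lookup described in the abstract (position inference followed by position refinement) can be illustrated with a minimal Python sketch. This is a toy example, not one of the indexes evaluated in the paper: the class name `LinearLearnedIndex`, the single least-squares linear model, and the error-bounded binary search are all assumptions chosen for brevity, and the key list is assumed to be non-empty with unique keys.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index over sorted, unique numeric keys (illustrative only)."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Record the worst prediction error so refinement can bound its search.
        self.max_err = max(abs(p - self._predict(k)) for p, k in enumerate(self.keys))

    def _predict(self, key):
        # Position inference: the model guesses where the key is stored.
        guess = round(self.slope * key + self.intercept)
        return min(max(guess, 0), len(self.keys) - 1)

    def lookup(self, key):
        # Position refinement: binary-search only inside the error window.
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None


index = LinearLearnedIndex([3, 8, 15, 16, 42, 100, 1000])
print(index.lookup(42))  # position of the key in the sorted array
print(index.lookup(7))   # None, because 7 is not stored
```

The better the model fits the key distribution, the smaller the error window and the cheaper the refinement step; a skewed key set such as the one above yields a wide window, which is one reason practical learned indexes use more than a single linear model.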

26 citations

Journal Article•10.1016/j.fct.2023.113698•

Microplastic contamination and risk assessment in table salts: Turkey.

Zehra Özçifçi, Burhan Başaran, Hakkı Türker Akçay

01 Mar 2023-Food and Chemical Toxicology

TL;DR: Microplastics in table salts (n = 36) were characterized by FT-IR, exposure to microplastics from table salt consumption was calculated with a deterministic model, and a risk assessment of table salt was performed using the polymer risk index.

25 citations

Journal Article•10.1021/acs.jctc.3c00279•

Lifelong Machine Learning Potentials

Marco Eckhoff, Markus Reiher

10 Mar 2023-Journal of Chemical Theory and Computation

TL;DR: The authors propose element-embracing atom-centered symmetry functions (eeACSFs), which combine structural properties and element information from the periodic table, for the development of a lifelong machine learning potential.

Abstract: Machine learning potentials (MLPs) trained on accurate quantum chemical data can retain the high accuracy, while inflicting little computational demands. On the downside, they need to be trained for each individual system. In recent years, a vast number of MLPs have been trained from scratch because learning additional data typically requires retraining on all data to not forget previously acquired knowledge. Additionally, most common structural descriptors of MLPs cannot represent efficiently a large number of different chemical elements. In this work, we tackle these problems by introducing element-embracing atom-centered symmetry functions (eeACSFs), which combine structural properties and element information from the periodic table. These eeACSFs are key for our development of a lifelong machine learning potential (lMLP). Uncertainty quantification can be exploited to transgress a fixed, pretrained MLP to arrive at a continuously adapting lMLP, because a predefined level of accuracy can be ensured. To extend the applicability of an lMLP to new systems, we apply continual learning strategies to enable autonomous and on-the-fly training on a continuous stream of new data. For the training of deep neural networks, we propose the continual resilient (CoRe) optimizer and incremental learning strategies relying on rehearsal of data, regularization of parameters, and the architecture of the model.

23 citations

Journal Article•10.1016/j.ins.2023.02.006•

Three-way conflict analysis based on hybrid situation tables

Hai-Long Yang, Ye Wang, Zhi-Lian Guo

01 Feb 2023-Information Sciences

TL;DR: A three-way conflict analysis model based on hybrid situation tables is proposed, in which strong alliances, weak alliances, and maximal alliances are defined to find a conflict resolution method in a hybrid situation table.

22 citations

Journal Article•10.1016/j.mineng.2023.108108•

Research on intelligent implementation of the beneficiation process of shaking table

You Keshun, Weng Chengyu, Liu Huizhong

01 Aug 2023-Minerals Engineering

TL;DR: The authors developed deep learning semantic segmentation methods using HALCON to extract multi-dimensional zoning information and, by combining machine learning models, determined the mapping relationship between zoning attributes, concentration grade, and recovery rate.

22 citations

Journal Article•10.1109/tse.2023.3288901•

NLP-based Automated Compliance Checking of Data Processing Agreements against GDPR

01 Jan 2023-IEEE Transactions on Software Engineering

TL;DR: An automated, NLP-based solution is proposed to check the compliance of a given data processing agreement (DPA) against GDPR, in place of manual checking, which requires significant time and effort; the approach is evaluated on a dataset of 30 actual DPAs.

Abstract: When the entity processing personal data (the processor) differs from the one collecting personal data (the controller), processing personal data is regulated in Europe by the General Data Protection Regulation (GDPR) through data processing agreements (DPAs). Checking the compliance of DPAs contributes to the compliance verification of software systems as DPAs are an important source of requirements for software development involving the processing of personal data. However, manually checking whether a given DPA complies with GDPR is challenging as it requires significant time and effort for understanding and identifying DPA-relevant compliance requirements in GDPR and then verifying these requirements in the DPA. Legal texts introduce additional complexity due to convoluted language and inherent ambiguity leading to potential misunderstandings. In this paper, we propose an automated solution to check the compliance of a given DPA against GDPR. In close interaction with legal experts, we first built two artifacts: (i) the “shall” requirements extracted from the GDPR provisions relevant to DPA compliance and (ii) a glossary table defining the legal concepts in the requirements. Then, we developed an automated solution that leverages natural language processing (NLP) technologies to check the compliance of a given DPA against these “shall” requirements. Specifically, our approach automatically generates phrasal-level representations for the textual content of the DPA and compares them against predefined representations of the “shall” requirements. By comparing these two representations, the approach not only assesses whether the DPA is GDPR compliant but it further provides recommendations about missing information in the DPA. Over a dataset of 30 actual DPAs, the approach correctly finds 618 out of 750 genuine violations while raising 76 false violations, and further correctly identifies 524 satisfied requirements. The approach has thus an average precision of 89.1%, a recall of 82.4%, and an accuracy of 84.6%. Compared to a baseline that relies on off-the-shelf NLP tools, our approach provides an average accuracy gain of ≈20 percentage points. The accuracy of our approach can be improved to ≈94% with limited manual verification effort.
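
The reported figures can be checked against the counts given in the abstract. The short Python sketch below recomputes recall, accuracy, and precision from those counts; the variable names are chosen here for illustration, and the computation pools all 30 DPAs together, which is an assumption. The abstract reports an average precision, so the pooled precision may differ slightly from the published 89.1%.

```python
# Counts taken from the abstract above.
true_violations_found = 618   # true positives
genuine_violations = 750      # true positives + false negatives
false_violations = 76         # false positives
satisfied_identified = 524    # true negatives

# Violations the approach missed (false negatives).
missed = genuine_violations - true_violations_found
total = true_violations_found + missed + false_violations + satisfied_identified

recall = true_violations_found / genuine_violations
accuracy = (true_violations_found + satisfied_identified) / total
# Pooled over all documents; the abstract reports a per-document average instead.
pooled_precision = true_violations_found / (true_violations_found + false_violations)

print(f"recall    = {recall:.1%}")
print(f"accuracy  = {accuracy:.1%}")
print(f"precision = {pooled_precision:.1%} (pooled)")
```

Recall and accuracy computed this way agree with the 82.4% and 84.6% stated in the abstract.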

21 citations

Journal Article•10.1021/acs.jcim.2c01259•

OpticalBERT and OpticalTable-SQA: Text- and Table-Based Language Models for the Optical-Materials Domain

Jiuyang Zhao, Shu Huang, Jacqueline M. Cole

20 Mar 2023-Journal of Chemical Information and Modeling

TL;DR: The authors presented two text-based language models for optical research, OpticalBERT and OpticalPureBERT, which are trained on a large corpus of scientific literature in the optical-materials domain.

Abstract: Text mining in the optical-materials domain is becoming increasingly important as the number of scientific publications in this area grows rapidly. Language models such as Bidirectional Encoder Representations from Transformers (BERT) have opened up a new era and brought a significant boost to state-of-the-art natural-language-processing (NLP) tasks. In this paper, we present two “materials-aware” text-based language models for optical research, OpticalBERT and OpticalPureBERT, which are trained on a large corpus of scientific literature in the optical-materials domain. These two models outperform BERT and previous state-of-the-art models in a variety of text-mining tasks about optical materials. We also release the first “materials-aware” table-based language model, OpticalTable-SQA. This is a querying facility that solicits answers to questions about optical materials using tabular information that pertains to this scientific domain. The OpticalTable-SQA model was realized by fine-tuning the Tapas-SQA model using a manually annotated OpticalTableQA data set which was curated specifically for this work. While preserving its sequential question-answering performance on general tables, the OpticalTable-SQA model significantly outperforms Tapas-SQA on optical-materials-related tables. All models and data sets are available to the optical-materials-science community.

Journal Article•10.1016/j.drudis.2023.103496•

The FDA modernisation act 2.0: bringing non-animal technologies to the regulatory table.

Alastair Stewart, Delphine Denoyer, Xumei Gao, Yi-Chin Toh

01 Jan 2023-Drug Discovery Today

TL;DR: The FDA Modernisation Act 2.0 is game-changing legislation that enables drug registration without the absolute requirement for the use of animals in safety toxicology assessment; the implications of this most recent chapter in the evolution of the drug regulation pathway are discussed.

Book Chapter•10.1093/oxfordhb/9780198704355.001.0001•

The Oxford Handbook of Transitional Justice

18 Sep 2023

TL;DR: The Oxford Handbook of Transitional Justice is currently under development, with articles publishing online ahead of print publication. The table of contents will grow as additional articles are added.

Abstract: This handbook is currently in development, with individual articles publishing online in advance of print publication. At this time, we cannot add information about unpublished articles in this handbook, however the table of contents will continue to grow as additional articles pass through the review process and are added to the site. Please note that the online publication date for this handbook is the date that the first article in the title was published online. For more information, please read the site FAQs.

Journal Article•10.1109/cvpr52729.2023.01071•

Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling

Yongshuai Huang, Ning Lü, Dapeng Chen, Yibo Li, Zhong Xie, Shenggao Zhu, Liangcai Gao, Ping Wei

1 Jun 2023

TL;DR: Improving table structure recognition by leveraging visual-alignment sequential coordinate modeling and incorporating local visual details into the logical representation.

Abstract: Table structure recognition aims to extract the logical and physical structure of unstructured table images into a machine-readable format. The latest end-to-end image-to-text approaches simultaneously predict the two structures by two decoders, where the prediction of the physical structure (the bounding boxes of the cells) is based on the representation of the logical structure. However, the previous methods struggle with imprecise bounding boxes as the logical representation lacks local visual information. To address this issue, we propose an end-to-end sequential modeling framework for table structure recognition called VAST. It contains a novel coordinate sequence decoder triggered by the representation of the non-empty cell from the logical structure decoder. In the coordinate sequence decoder, we model the bounding box coordinates as a language sequence, where the left, top, right and bottom coordinates are decoded sequentially to leverage the inter-coordinate dependency. Furthermore, we propose an auxiliary visual-alignment loss to enforce the logical representation of the non-empty cells to contain more local visual details, which helps produce better cell bounding boxes. Extensive experiments demonstrate that our proposed method can achieve state-of-the-art results in both logical and physical structure recognition. The ablation study also validates that the proposed coordinate sequence decoder and the visual-alignment loss are the keys to the success of our method.

Journal Article•10.1109/jproc.2022.3223791•

A Perspective Vision of Micro/Nano Systems and Technologies as Enablers of 6G, Super-IoT, and Tactile Internet

Jacopo Iannacci

01 Jan 2023-Proceedings of the IEEE

TL;DR: The paper argues that IoT, IoE, and 5G, although they appear orthogonal to each other, are closely linked: IoT and IoE target pervasivity of services, while 5G is the pillar for transmitting massive amounts of data and information.

Abstract: Modern research in technology fields, such as electronics, distributed networks of sensing/functional nodes, and wireless and wearable devices, is relentlessly converging around wide application paradigms, such as Internet of Things (IoT) [1] and Internet of Everything (IoE) [2] (Table 1, at the end of section, offers a full list of used acronyms). From a different perspective, recent advances in electronics, hardware (HW) technologies, information technology (IT), and artificial intelligence (AI) for telecommunication networks, standards, and protocols look to unavoidably fall under the umbrella of fifth generation of mobile communications (5G) [3]. Even though they appear orthogonal to each other, IoT, IoE, and 5G are closely linked together. In a nutshell, IoT and IoE target pervasivity of services, while 5G is the pillar upon which transmission of massive amounts of data and information should lay [4]. As brief recap, 5G poses on the three cornerstone drivers of enhanced mobile broadband (eMBB), massive machine-type communications (mMTCs), and ultrareliable low latency communications (URLLC) [5], to enable data-centric applications such as machine-to-machine (M2M), vehicle-to-vehicle (V2V), and vehicle-to-everything (V2X) communications, along with virtual reality (VR), augmented reality (AR), and extended reality (XR).

Proceedings Article•10.5220/0011685000003417•

An End-to-End Multi-Task Learning Model for Image-based Table Recognition

Nam Tuan Ly, Atsuhiro Takasu

15 Mar 2023

TL;DR: An end-to-end multi-task learning model for image-based table recognition is proposed, consisting of one shared encoder, one shared decoder, and three separate decoders that learn three sub-tasks of table recognition: table structure recognition, cell detection, and cell-content recognition.

Abstract: Image-based table recognition is a challenging task due to the diversity of table styles and the complexity of table structures. Most of the previous methods focus on a non-end-to-end approach which divides the problem into two separate sub-problems: table structure recognition; and cell-content recognition and then attempts to solve each sub-problem independently using two separate systems. In this paper, we propose an end-to-end multi-task learning model for image-based table recognition. The proposed model consists of one shared encoder, one shared decoder, and three separate decoders which are used for learning three sub-tasks of table recognition: table structure recognition, cell detection, and cell-content recognition. The whole system can be easily trained and inferred in an end-to-end approach. In the experiments, we evaluate the performance of the proposed model on two large-scale datasets: FinTabNet and PubTabNet. The experiment results show that the proposed model outperforms the state-of-the-art methods in all benchmark datasets.

Dataset•10.5281/zenodo.7624666•

Blind Prediction Competition - Sera.ta - Seismic Response of Masonry Cross Vaults: Shaking table tests and numerical validations

Nicoletta Bianchini, Chiara Calderini, Nuno Mendes, Paulo Candeias, Paulo B. Lourenço

9 Feb 2023

TL;DR: The seismic response of masonry cross vaults is a complex topic that requires further research. The research project aims to better understand the seismic behaviour of masonry cross vaults through shaking table tests and numerical validations.

Abstract: Masonry vaults play a much relevant role in the seismic response of heritage masonry buildings, ranging from housing to the greatest cathedrals. Acting as both a ceiling and a structural horizontal diaphragm with significant mass, their mechanical behaviour affects the overall seismic response of buildings, in terms of strength, stiffness, and ductility. Moreover, local damage and collapse of vaults may produce significant losses in terms of cultural assets and casualties. In spite of the importance of this topic, the evaluation of the complex three-dimensional behaviour of vaults is still an important challenge for researchers. The main objectives of the present research project are:
1) to better understand the seismic behaviour of masonry cross vaults by means of shaking table tests on both full-scale and small-scale models;
2) to assess the capability of different modelling/analysis approaches to predict the seismic response of these masonry structures.
In particular, three sets of shaking table tests are planned:
a. Tests on a 1:1 scale model of a brick unreinforced masonry cross vault: to investigate the behaviour of brick masonry cross vaults under different seismic inputs, in terms of damage, displacement capacity and peak acceleration.
b. Tests on a 1:1 scale model of a brick reinforced masonry cross vault: to evaluate the effectiveness of reinforcing techniques to repair the vaults tested in a).
In addition to the experimental tests, a blind prediction competition is performed to assess the efficacy of different modelling strategies and analysis techniques. The final aims are to improve the safety assessment procedures proposed for historic masonry buildings in Eurocode 8.3 and to provide better seismic assessment techniques and strengthening measures.

Journal Article•10.1257/aer.20210949•

Why Do Households Leave School Value Added on the Table? The Roles of Information and Preferences

01 Apr 2023-The American Economic Review

TL;DR: The authors investigate why households leave value added "on the table" when choosing schools and show that households have preferences for a variety of school traits, so that fully correcting households' beliefs would eliminate at most a quarter of the value added that households leave unexploited.

Abstract: Romanian households could choose schools with one standard deviation worth of additional value added. Why do households leave value added “on the table”? We study two possibilities: (i) information and (ii) preferences for other school traits. In an experiment, we inform randomly selected households about schools' value added. These households choose schools with up to 0.2 standard deviations of additional value added. We then estimate a discrete choice model and show that households have preferences for a variety of school traits. As a result, fully correcting households' beliefs would eliminate at most a quarter of the value added that households leave unexploited. (JEL D12, D83, I21, I28)

Journal Article•10.1145/3539618.3591708•

Large Language Models are Versatile Decomposers: Decomposing Evidence and Questions for Table-based Reasoning

Yunhu Ye, Binyuan Hui, Min Yang, Binhua Li, Fei Huang, Yongbin Li

18 Jul 2023

TL;DR: Large language models are versatile decomposers for effective table-based reasoning by decomposing evidence and questions into smaller components to improve performance and interpretability.

Abstract: Table-based reasoning has shown remarkable progress in a wide range of table-based tasks. It is a challenging task, which requires reasoning over both free-form natural language (NL) questions and (semi-)structured tabular data. However, previous table-based reasoning solutions usually suffer from significant performance degradation on "huge" evidence (tables). In addition, most existing methods struggle to reason over complex questions since the essential information is scattered in different places. To alleviate the above challenges, we exploit large language models (LLMs) as decomposers for effective table-based reasoning, which (i) decompose huge evidence (a huge table) into sub-evidence (a small table) to mitigate the interference of useless information for table reasoning, and (ii) decompose a complex question into simpler sub-questions for text reasoning. First, we use a powerful LLM to decompose the evidence involved in the current question into the sub-evidence that retains the relevant information and excludes the remaining irrelevant information from the "huge" evidence. Second, we propose a novel "parsing-execution-filling" strategy to decompose a complex question into simpler step-by-step sub-questions by generating intermediate SQL queries as a bridge to produce numerical and logical sub-questions with a powerful LLM. Finally, we leverage the decomposed sub-evidence and sub-questions to get the final answer with a few in-context prompting examples. Extensive experiments on three benchmark datasets (TabFact, WikiTableQuestion, and FetaQA) demonstrate that our method achieves significantly better results than competitive baselines for table-based reasoning. Notably, our method outperforms human performance for the first time on the TabFact dataset. In addition to impressive overall performance, our method also has the advantage of interpretability, where the returned results are to some extent tractable with the generated sub-evidence and sub-questions. For reproducibility, we release our source code and data at: https://github.com/AlibabaResearch/DAMO-ConvAI.
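
The idea of shrinking the evidence and using an intermediate SQL query as a bridge to a numerical sub-question can be sketched in Python with the standard `sqlite3` module. This is only a loose illustration, not the authors' implementation: the table `results`, its rows, and the question are invented, and the SQL that an LLM would generate in the paper's pipeline is written by hand here.

```python
import sqlite3

# Hypothetical "huge" evidence table; rows and column names are invented.
rows = [
    ("Ann", "Norway", 2019, 12),
    ("Bo", "Sweden", 2019, 7),
    ("Cy", "Norway", 2020, 9),
    ("Di", "Finland", 2020, 15),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (player TEXT, country TEXT, year INTEGER, goals INTEGER)"
)
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)

# Step 1 (evidence decomposition): keep only the rows and columns relevant to
# the question "Did Norwegian players score more than 20 goals in total?".
sub_evidence = conn.execute(
    "SELECT player, goals FROM results WHERE country = 'Norway'"
).fetchall()

# Step 2 (question decomposition): an intermediate SQL query answers the
# numerical sub-question "What is the total number of goals?".
(total_goals,) = conn.execute(
    "SELECT SUM(goals) FROM results WHERE country = 'Norway'"
).fetchone()

# Step 3: the remaining logical sub-question is a simple comparison.
answer = total_goals > 20
print(sub_evidence, total_goals, answer)
```

Because each step leaves an inspectable artifact (the reduced table, the query, the intermediate number), a sketch like this also hints at why the abstract describes the decomposed results as interpretable.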

Proceedings Article•10.5220/0011682600003411•

Rethinking Image-based Table Recognition Using Weakly Supervised Methods

Nam Tuan Ly, Atsuhiro Takasu, Phuc Tri Nguyen, Hideaki Takeda

14 Mar 2023

TL;DR: A weakly supervised model named WSTabNet is proposed for table recognition that relies only on HTML (or LaTeX) code-level annotations of table images.

Abstract: Most of the previous methods for table recognition rely on training datasets containing many richly annotated table images. Detailed table image annotation, e.g., cell or text bounding box annotation, however, is costly and often subjective. In this paper, we propose a weakly supervised model named WSTabNet for table recognition that relies only on HTML (or LaTeX) code-level annotations of table images. The proposed model consists of three main parts: an encoder for feature extraction, a structure decoder for generating table structure, and a cell decoder for predicting the content of each cell in the table. Our system is trained end-to-end by stochastic gradient descent algorithms, requiring only table images and their ground-truth HTML (or LaTeX) representations. To facilitate table recognition with deep learning, we create and release WikiTableSet, the largest publicly available image-based table recognition dataset built from Wikipedia. WikiTableSet contains nearly 4 million English table images, 590K Japanese table images, and 640k French table images with corresponding HTML representation and cell bounding boxes. The extensive experiments on WikiTableSet and two large-scale datasets: FinTabNet and PubTabNet demonstrate that the proposed weakly supervised model achieves better, or similar accuracies compared to the state-of-the-art models on all benchmark datasets.

Proceedings Article•10.1145/3555041.3589409•

Table Discovery in Data Lakes: State-of-the-art and Future Directions

Grace Fan, Jin Wang, Yuliang Li, Renée J. Miller

4 Jun 2023

TL;DR: This tutorial gives a comprehensive overview of the most recent table discovery techniques developed by the data management community, covering table understanding tasks such as domain discovery, table annotation, and table representation learning, which help data lake systems capture the semantics of tables.

Abstract: Data discovery refers to a set of tasks that enable users and downstream applications to explore and gain insights from massive collections of data sources such as data lakes. In this tutorial, we will provide a comprehensive overview of the most recent table discovery techniques developed by the data management community. We will cover table understanding tasks such as domain discovery, table annotation, and table representation learning which help data lake systems capture semantics of tables. We will also cover techniques enabling various query-driven discovery and table exploration tasks, as well as how table discovery can support key data science applications such as machine learning and knowledge base construction. Finally, we will discuss future research directions on developing new table discovery paradigms by combining structured knowledge and dense table representations, as well as improving the efficiency of discovery using state-of-the-art indexing techniques, and more.

Journal Article•10.1016/j.procir.2023.02.115•

Disassembly sequence planning for target parts of end-of-life smartphones using Q-learning algorithm

Li Li, Fu Zhao, John W. Sutherland, Fengfu Yin

01 Jan 2023-Procedia CIRP

TL;DR: This article proposes an improved method that uses a Q-learning algorithm to optimize the disassembly sequence for target parts of end-of-life (EoL) smartphones; a constraint relationship among the EoL smartphone parts is developed first.

Book Chapter•10.22459/cbpw.2023.01•

Negotiating at an uneven table

30 Nov 2023

Journal Article•10.1145/3588710•

GitTables: A Large-Scale Corpus of Relational Tables

26 May 2023

TL;DR: GitTables is a corpus of 1M relational tables extracted from GitHub and annotated with semantic types, hierarchical relations, and descriptions from Schema.org and DBpedia.

Abstract: The success of deep learning has sparked interest in improving relational table tasks, like data preparation and search, with table representation models trained on large table corpora. Existing table corpora primarily contain tables extracted from HTML pages, limiting the capability to represent offline database tables. To train and evaluate high-capacity models for applications beyond the Web, we need resources with tables that resemble relational database tables. Here we introduce GitTables, a corpus of 1M relational tables extracted from GitHub. Our continuing curation aims at growing the corpus to at least 10M tables. Analyses of GitTables show that its structure, content, and topical coverage differ significantly from existing table corpora. We annotate table columns in GitTables with semantic types, hierarchical relations and descriptions from Schema.org and DBpedia. The evaluation of our annotation pipeline on the T2Dv2 benchmark illustrates that our approach provides results on par with human annotations. We present three applications of GitTables, demonstrating its value for learned semantic type detection models, schema completion methods, and benchmarks for table-to-KG matching, data search, and preparation. We make the corpus and code available at https://gittables.github.io.

Journal Article•10.3389/fmtec.2023.1154132•

Optical character recognition on engineering drawings to achieve automation in production quality control

Javier Villena Toro, Anton Wiberg, Mehdi Tarkian

20 Mar 2023-Frontiers in Manufacturing Technology

TL;DR: The eDOCr tool provides an effective solution for automated text detection and recognition in engineering drawings, enabling seamless integration between engineering drawings and quality control for mechanical products.

Abstract: Introduction: Digitization is a crucial step towards achieving automation in production quality control for mechanical products. Engineering drawings are essential carriers of information for production, but their complexity poses a challenge for computer vision. To enable automated quality control, seamless data transfer between analog drawings and CAD/CAM software is necessary. Methods: This paper focuses on autonomous text detection and recognition in engineering drawings. The methodology is divided into five stages. First, image processing techniques are used to classify and identify key elements in the drawing. The output is divided into three elements: information blocks and tables, feature control frames, and the rest of the image. For each element, an OCR pipeline is proposed. The last stage is output generation of the information in table format. Results: The proposed tool, called eDOCr, achieved a precision and recall of 90% in detection, an F1-score of 94% in recognition, and a character error rate of 8%. The tool enables seamless integration between engineering drawings and quality control. Discussion: Most OCR algorithms have limitations when applied to mechanical drawings due to their inherent complexity, including measurements, orientation, tolerances, and special symbols such as geometric dimensioning and tolerancing (GD&T). The eDOCr tool overcomes these limitations and provides a solution for automated quality control. Conclusion: The eDOCr tool provides an effective solution for automated text detection and recognition in engineering drawings. The tool's success demonstrates that automated quality control for mechanical products can be achieved through digitization. The tool is shared with the research community through GitHub.
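
The character error rate quoted in the abstract above is conventionally defined as the edit distance between the recognized text and the ground truth, divided by the ground-truth length. The following is a minimal sketch of that standard definition, not code from eDOCr; the function names `edit_distance` and `cer` and the sample strings are illustrative assumptions.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of
    # `reference` and the first j characters of `hypothesis`.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        cur = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cur.append(min(
                prev[j] + 1,                           # delete ref_char
                cur[j - 1] + 1,                        # insert hyp_char
                prev[j - 1] + (ref_char != hyp_char),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate relative to the ground-truth length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical dimension label and a misread OCR output (illustrative only).
ground_truth = "Ø12.5 ±0.1"
ocr_output = "012.5 +0.1"
print(f"CER = {cer(ground_truth, ocr_output):.2f}")
```

Because the denominator is the ground-truth length, the rate can exceed 1 when the OCR output contains many spurious characters.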

Journal Article•10.1111/2041-210x.14065•

An exact version of Life Table Response Experiment analysis, and the R package exactLTRE

Oliver Coates¹

Cornell University¹

01 Feb 2023-Methods in Ecology and Evolution

TL;DR: The authors use the functional analysis of variance framework to derive an exact LTRE method, which calculates the exact response of λ to the difference or variance in a given vital rate, including all interactions among vital rates.

Abstract: Matrix population models are frequently built and used by ecologists to analyse demography and elucidate the processes driving population growth or decline. Life Table Response Experiments (LTREs) are comparative analyses that decompose the realized difference or variance in population growth rate (λ) into contributions from the differences or variances in the vital rates (i.e. the matrix elements). Since their introduction, LTREs have been based on approximations and have not included biologically relevant interaction terms. We used the functional analysis of variance framework to derive an exact LTRE method, which calculates the exact response of λ to the difference or variance in a given vital rate, for all interactions among vital rates, including higher-order interactions neglected by the classical methods. We used the publicly available COMADRE and COMPADRE databases to perform a meta-analysis comparing the results of exact and classical LTRE methods. We analysed 186 and 1487 LTREs for animal and plant matrix population models, respectively. We found that the classical methods often had small errors, but that very high errors were possible. Overall error was related to the difference or variance in the matrices being analysed, consistent with the Taylor series basis of the classical method. Neglected interaction terms accounted for most of the errors in fixed design LTRE, highlighting the importance of two-way interaction terms. For random design LTRE, errors in the contribution terms present in both classical and exact methods were comparable to errors due to neglected interaction terms. In most examples we analysed, evaluating exact contributions up to three-way interaction terms was sufficient for interpreting 90% or more of the difference or variance in λ. Relative error, previously used to evaluate the accuracy of classical LTREs, is not a reliable metric of how closely the classical and exact methods agree. Error compensation between estimated contribution terms and neglected contribution terms can lead to low relative error despite faulty biological interpretation. Trade-offs or negative covariances among matrix elements can lead to high relative error despite accurate biological interpretation. Exact LTRE provides reliable and accurate biological interpretation, and the R package exactLTRE makes the exact method accessible to ecologists.
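
The idea of an exact, interaction-aware decomposition for a fixed design can be illustrated with a small sketch. It assumes a functional-ANOVA-style inclusion-exclusion over the matrix elements that differ between a reference and a treatment matrix; this is an illustration of the general technique, not the exactLTRE package's code or API. The 2×2 matrices and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def growth_rate(m):
    """Dominant eigenvalue of a non-negative 2x2 matrix (closed form)."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    return (trace + sqrt(trace * trace - 4 * det)) / 2


def swap(ref, trt, cells):
    """Copy of `ref` with the listed (row, col) cells taken from `trt`."""
    out = [list(row) for row in ref]
    for i, j in cells:
        out[i][j] = trt[i][j]
    return out


def exact_contributions(ref, trt):
    """Main effects and interactions of every differing matrix element."""
    differing = [(i, j) for i in range(2) for j in range(2)
                 if ref[i][j] != trt[i][j]]
    # Growth rate for every subset of elements switched to treatment values.
    lam = {s: growth_rate(swap(ref, trt, s))
           for k in range(len(differing) + 1)
           for s in combinations(differing, k)}
    terms = {}
    for subset in lam:
        if not subset:
            continue
        # Inclusion-exclusion: remove all lower-order effects from lam[subset].
        terms[subset] = sum(
            (-1) ** (len(subset) - k) * lam[sub]
            for k in range(len(subset) + 1)
            for sub in combinations(subset, k))
    return terms


# Hypothetical two-stage matrices (top row: fertility, bottom row: survival).
reference = [[0.0, 2.0], [0.5, 0.8]]
treatment = [[0.0, 3.0], [0.4, 0.8]]
contributions = exact_contributions(reference, treatment)
for cells, value in contributions.items():
    print(cells, round(value, 4))
```

Under this construction the terms sum exactly to the difference in λ between the two matrices, so nothing is lost to approximation; the cost is that the number of terms grows exponentially with the number of differing elements, which is why truncating at low interaction orders matters in practice.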

Book Chapter•10.1007/978-3-031-21435-6_44•

Using Technical and Structural Coefficients of Economic Statistics to Equalize Flows of Input-Output Table

Seyit Kerimkhulle¹

L.N.Gumilyov Eurasian National University¹

1 Jan 2023

Journal Article•10.1016/j.postharvbio.2022.112155•

Role of epicuticular wax involved in quality maintenance of table grapes: Evidence from transcriptomic data

Ming-hua Yang, Zisheng Luo, Dong Nan Li, Chao-Nan Ma, Lin Li

01 Feb 2023-Postharvest Biology and Technology

TL;DR: In this article, the epicuticular wax of table grapes was removed to mimic natural wax loss or damage, and the quality attributes of the fruit were investigated; wax removal triggered fruit weight loss, softening, and browning during storage.

Posted Content•10.48550/arxiv.2301.13808•

Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning

31 Jan 2023

TL;DR: The authors decompose huge evidence (a huge table) into sub-evidence (a small table) to mitigate the interference of useless information for table reasoning, and decompose complex questions into simpler sub-questions for text reasoning.

Abstract: Table-based reasoning has shown remarkable progress in combining deep models with discrete reasoning, which requires reasoning over both free-form natural language (NL) questions and structured tabular data. However, previous table-based reasoning solutions usually suffer from significant performance degradation on huge evidence (tables). In addition, most existing methods struggle to reason over complex questions since the required information is scattered in different places. To alleviate the above challenges, we exploit large language models (LLMs) as decomposers for effective table-based reasoning, which (i) decompose huge evidence (a huge table) into sub-evidence (a small table) to mitigate the interference of useless information for table reasoning; and (ii) decompose complex questions into simpler sub-questions for text reasoning. Specifically, we first use the LLMs to break down the evidence (tables) involved in the current question, retaining the relevant evidence and excluding the remaining irrelevant evidence from the huge table. In addition, we propose a "parsing-execution-filling" strategy to alleviate the hallucination dilemma of the chain of thought by decoupling logic and numerical computation in each step. Extensive experiments show that our method can effectively leverage decomposed evidence and questions and outperforms the strong baselines on TabFact, WikiTableQuestion, and FetaQA datasets. Notably, our model outperforms human performance for the first time on the TabFact dataset.

Journal Article•10.1002/env.2801•

Mitigating spatial confounding by explicitly correlating Gaussian random fields

Isa Marques, Thomas Kneib, Nadja Klein

01 Jun 2023-Environmetrics

TL;DR: This article develops a computationally efficient spatial model that explicitly correlates a Gaussian random field for the covariate of interest with the Gaussian random field in the main model equation, and integrates novel prior structures to reduce spatial confounding.

Abstract: Spatial models are used in a variety of research areas, such as environmental sciences, epidemiology, or physics. A common phenomenon in such spatial regression models is spatial confounding. This phenomenon is observed when spatially indexed covariates modeling the mean of the response are correlated with a spatial random effect included in the model, for example, as a proxy of unobserved spatial confounders. As a result, estimates for regression coefficients of the covariates can be severely biased and interpretation of these is no longer valid. Recent literature has shown that typical solutions for reducing spatial confounding can lead to misleading and counterintuitive results. In this article, we develop a computationally efficient spatial model that explicitly correlates a Gaussian random field for the covariate of interest with the Gaussian random field in the main model equation and integrates novel prior structures to reduce spatial confounding. Starting from the univariate case, we extend our prior structure also to the case of multiple spatially confounded covariates. In simulation studies, we show that our novel model flexibly detects and reduces spatial confounding in spatial datasets, and it performs better than typically used methods such as restricted spatial regression. These results are promising for any applied researcher who wishes to interpret covariate effects in spatial regression models. As a real data illustration, we study the effect of elevation and temperature on the mean of monthly precipitation in Germany.
