TL;DR: Evaluation of the algorithm on document images from the publicly available UNLV dataset shows competitive performance in comparison to the table detection module of a commercial OCR system.
Abstract: Detecting tables in document images is important, since not only do tables contain important information, but most layout analysis methods also fail in the presence of tables in the document image. Existing approaches for table detection mainly focus on detecting tables in single columns of text and do not work reliably on documents with varying layouts. This paper presents a practical algorithm for table detection that works with high accuracy on documents with varying layouts (company reports, newspaper articles, magazine pages, ...). An open source implementation of the algorithm is provided as part of the Tesseract OCR engine. Evaluation of the algorithm on document images from the publicly available UNLV dataset shows competitive performance in comparison to the table detection module of a commercial OCR system.
TL;DR: It is shown that Reflect leads to more balanced collaboration, but only under certain conditions, and that the table has different effects on over- and underparticipators.
Abstract: We describe an interactive table designed for supporting face-to-face collaborative learning. The table, Reflect, addresses the issue of unbalanced participation during group discussions. By displaying a shared visualization of member participation on its surface, Reflect is meant to encourage participants to avoid the extremes of over- and underparticipation. We report on a user study that validates some of our hypotheses about the effect the table would have on its users. Namely, we show that Reflect leads to more balanced collaboration, but only under certain conditions. We also show that the table has different effects on over- and underparticipators.
TL;DR: The graph traversal pattern and its use in computing are discussed; the pattern makes use of index-free, local traversals, in contrast to the index-intensive, set-theoretic operations of relational databases.
Abstract: A graph is a structure composed of a set of vertices (i.e., nodes, dots) connected to one another by a set of edges (i.e., links, lines). The concept of a graph has been around since the late 19th century; however, only in recent decades has there been a strong resurgence in both theoretical and applied graph research in mathematics, physics, and computer science. In applied computing, since the late 1960s, the interlinked table structure of the relational database has been the predominant information storage and retrieval model. With the growth of graph/network-based data and the need to efficiently process such data, new data management systems have been developed. In contrast to the index-intensive, set-theoretic operations of relational databases, graph databases make use of index-free, local traversals. This article discusses the graph traversal pattern and its use in computing.
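To make "index-free, local traversal" concrete, here is a minimal sketch assuming an in-memory graph in which each vertex object stores direct references to its neighbors, so each hop follows a pointer instead of looking up an edges table through an index. The `Vertex` class and `traverse` function are illustrative names, not taken from any graph database.

```python
from collections import deque


class Vertex:
    """A vertex holding direct references to its neighbors (index-free adjacency)."""

    def __init__(self, name):
        self.name = name
        self.out = []  # adjacent Vertex objects, reached without any global index

    def link(self, other):
        self.out.append(other)


def traverse(start, max_depth):
    """Breadth-first traversal that only ever follows local references."""
    seen = {start}
    order = []
    frontier = deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        order.append(vertex.name)
        if depth == max_depth:
            continue  # do not expand beyond the requested number of hops
        for neighbor in vertex.out:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return order
```

The cost of each hop depends only on the local degree of the current vertex, not on the total size of the graph, which is the property the abstract contrasts with relational joins.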
TL;DR: In this paper, a new formulation of DPC is presented that makes it possible to analyze the shortcomings of this kind of algorithm, mainly regarding the power limits within which table-based algorithms are valid.
Abstract: Direct power control (DPC) has gained increasing attention in recent years as a robust and simple control scheme. Yet the implications of some of the assumptions made by the original table-based DPC are not well explained in the literature. Here, a new formulation of DPC is presented that makes it possible to analyze the shortcomings of this kind of algorithm, mainly regarding the power limits within which table-based algorithms are valid. This is a key aspect for microgrid applications, as they require a wide range of possible active and reactive power setpoints in order to control the system's voltage and frequency. A new table is then presented that overcomes those limitations. The proposed table is valid at any point within the inverter's power limits.
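Table-based DPC is commonly described as two hysteresis comparators (on the active and reactive power errors) plus the angular sector of the voltage vector, which together index a switching table. The sketch below shows only that lookup mechanism, assuming 12 sectors starting at 0 rad and errors defined as reference minus measured value. Sector origin and sign conventions vary between formulations, and the table contents are deliberately left as a caller-supplied parameter rather than reproduced from the paper. All function names are hypothetical.

```python
import math


def hysteresis(error, band, previous):
    """Two-level comparator: 1 if error > band, 0 if error < -band, otherwise hold."""
    if error > band:
        return 1
    if error < -band:
        return 0
    return previous


def sector(theta, n_sectors=12):
    """Index 0..n_sectors-1 of the angular sector containing theta (radians).

    Assumes sector 0 starts at 0 rad; many DPC tables shift this by half a sector.
    """
    width = 2 * math.pi / n_sectors
    return int((theta % (2 * math.pi)) // width) % n_sectors


def select_vector(table, p_ref, p, q_ref, q, theta, state, band_p, band_q):
    """Look up the converter voltage vector for the current power errors and sector.

    table: caller-supplied dict mapping (sp, sq, sector) -> voltage vector label.
    state: previous (sp, sq) comparator outputs, needed for the hysteresis hold.
    Returns the selected vector and the new comparator state.
    """
    sp = hysteresis(p_ref - p, band_p, state[0])
    sq = hysteresis(q_ref - q, band_q, state[1])
    return table[(sp, sq, sector(theta))], (sp, sq)
```

The paper's point is that which entries such a table should contain depends on the operating power range, so a fixed table is valid only within certain limits.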
TL;DR: The approach towards establishing a complete and publicly available, hence open environment for the benchmarking of table spotting and structural analysis is described and free access to the ground truthing tool and evaluation mechanism described is provided.
Abstract: Table spotting and structural analysis are just a small fraction of the tasks relevant to table analysis. Today, quite a large number of different approaches to these tasks have been described in the literature or are available as part of commercial OCR systems that claim to deal with tables in scanned documents and to treat them accordingly. However, the problem of detecting tables is far from solved. Different approaches have different strengths and weaknesses. Some fail in certain situations or layouts where others perform better. How can one know which approach or system is best for a specific job? Answering this question requires an objective comparison of different approaches that address the same task of spotting tables and recognizing their structure. This paper describes our approach towards establishing a complete and publicly available, hence open, environment for the benchmarking of table spotting and structural analysis. We provide free access to the ground truthing tool and evaluation mechanism described in this paper, describe the ideas behind them, and also provide ground truth for the 547 documents of the UNLV and UW-3 datasets that contain tables. In addition, we applied the quality measures to the results generated by the T-Recs system, which we developed some years ago and began to advance further a few months ago.
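As an illustration of what an objective comparison against ground truth can look like, here is a generic sketch that matches detected table regions to ground-truth regions by intersection-over-union (IoU) and reports precision and recall. The 0.5 threshold, the greedy one-to-one matching, and the `Box` type are assumptions for illustration; they are not the quality measures defined in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned table region in page coordinates (hypothetical representation)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self):
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a, b):
    """Intersection area divided by union area of two boxes."""
    ix = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    iy = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = ix * iy
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detected, truth, threshold=0.5):
    """Greedily match each detection to the best unmatched ground-truth table."""
    matched = set()
    true_positives = 0
    for d in detected:
        best, best_score = None, threshold
        for i, t in enumerate(truth):
            if i in matched:
                continue
            score = iou(d, t)
            if score >= best_score:
                best, best_score = i, score
        if best is not None:
            matched.add(best)
            true_positives += 1
    precision = true_positives / len(detected) if detected else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall
```

A single IoU threshold cannot distinguish split or merged tables. Benchmarks for table spotting therefore usually report such cases separately.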
TL;DR: In this paper, a method and apparatus is provided for optimizing queries received by a database system that relies on an intelligent data storage server to manage storage for the database system. Storing compression units in hybrid columnar format, the storage manager evaluates simple predicates and only returns data blocks containing rows that satisfy those predicates. The returned data blocks are not necessarily stored persistently on disk.
Abstract: A method and apparatus is provided for optimizing queries received by a database system that relies on an intelligent data storage server to manage storage for the database system. Storing compression units in hybrid columnar format, the storage manager evaluates simple predicates and only returns data blocks containing rows that satisfy those predicates. The returned data blocks are not necessarily stored persistently on disk; that is, the storage manager is not limited to returning disk block images. The hybrid columnar format enables optimizations that provide better performance when processing typical database workloads, including both fetching rows by identifier and performing table scans.
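The idea of evaluating simple predicates inside the storage layer can be sketched as follows, assuming a hypothetical `CompressionUnit` that holds a group of rows column by column (compression itself is omitted). Only the qualifying rows are shipped back, and units with no matches return nothing at all. Names and layout are illustrative and do not describe the patented format.

```python
from dataclasses import dataclass


@dataclass
class CompressionUnit:
    """Hypothetical hybrid columnar unit: a group of rows stored column by column."""
    row_ids: list
    columns: dict  # column name -> list of values, aligned with row_ids


def storage_scan(units, column, predicate):
    """Evaluate a simple predicate next to the data and return only qualifying rows."""
    result = []
    for unit in units:
        values = unit.columns[column]
        hits = [i for i, v in enumerate(values) if predicate(v)]
        if not hits:
            continue  # nothing from this unit is sent to the database server
        result.append({
            "row_ids": [unit.row_ids[i] for i in hits],
            "columns": {name: [col[i] for i in hits] for name, col in unit.columns.items()},
        })
    return result
```

Because the predicate column is stored contiguously, it can be scanned without touching the other columns until a match is found.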
TL;DR: In this paper, the authors describe techniques to automatically infer a (partial) semantic model for information in tables using both table headings, if available, and the values stored in table cells and to export the data the table represents as linked data.
Abstract: Much of the world’s knowledge is contained in structured documents like spreadsheets, database relations and tables in documents found on the Web and in print. The information in these tables might be much more valuable if it could be appropriately exported or encoded in RDF, making it easier to share, understand and integrate with other information. This is especially true if it could be linked into the growing linked data cloud. We describe techniques to automatically infer a (partial) semantic model for information in tables using both table headings, if available, and the values stored in table cells and to export the data the table represents as linked data. The techniques have been prototyped for a subset of linked data that covers the core of Wikipedia.
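Once a semantic model is known, exporting a table as linked data amounts to emitting RDF triples per cell. The sketch below assumes the column-to-property mapping has already been obtained (here it is supplied by hand rather than inferred), uses a hypothetical `http://example.org/` namespace, and writes cell values as plain literals instead of linking them to existing entities.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for illustration only


def table_to_ntriples(header, rows, subject_col, column_properties):
    """Emit one N-Triples line for every non-subject cell of the table."""
    s_idx = header.index(subject_col)
    lines = []
    for row in rows:
        subject = f"<{BASE}resource/{quote(row[s_idx].replace(' ', '_'))}>"
        for col, value in zip(header, row):
            if col == subject_col:
                continue
            # fall back to a minted property URI when no mapping is known
            prop = column_properties.get(col, f"{BASE}property/{quote(col)}")
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{prop}> "{literal}" .')
    return lines
```

In the approach described above, the hand-written mapping would be replaced by inferred classes and properties, and literals such as country names would become links to entities in the linked data cloud.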
TL;DR: A table navigation system includes: a table identifier to identify a table and columns within the table; a navigation identifier to determine whether a navigation input by a user to navigate within a column of the table exceeds a threshold, wherein the threshold relates to an expectation of continued navigation input.
Abstract: A table navigation system includes: a table identifier to identify a table and columns within the table; a navigation identifier to determine whether a navigation input by a user to navigate within a column of the table exceeds a threshold, wherein the threshold relates to an expectation of continued navigation input by the user; a filter to filter unique elements in the column into separate categories; and a display engine to present the unique elements in separate categories in an interactive display that overlays the table.
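A minimal sketch of the described flow, assuming navigation is measured as a count of consecutive navigation inputs within one column and that each unique value forms its own category. The function name, the counting rule, and the `(value, count)` output shape are hypothetical.

```python
from collections import Counter


def categories_if_sustained(column_values, navigation_events, threshold):
    """Return per-value categories once navigation in a column exceeds the threshold.

    navigation_events: number of consecutive navigation inputs within this column.
    Returns None while navigation stays at or below the threshold (no overlay).
    """
    if navigation_events <= threshold:
        return None
    counts = Counter(column_values)
    # one category per unique element, sorted for a stable overlay order
    return sorted(counts.items())
```

A display engine would then render the returned categories as an overlay, letting the user jump to a category instead of continuing to scroll.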
TL;DR: An approach is described that uses linked data to interpret tables and associate their components with nodes in a reference linked data collection; it assigns a class to table columns, links table cells to entities, and maps inferred relations between columns to properties.
Abstract: Vast amounts of information are available in structured forms such as spreadsheets, database relations, and tables found in documents and on the Web. We describe an approach that uses linked data to interpret such tables and associate their components with nodes in a reference linked data collection. Our proposed framework assigns a class (i.e., type) to table columns, links table cells to entities, and maps inferred relations between columns to properties. The resulting interpretation can be used to annotate tables, confirm existing facts in the linked data collection, and propose new facts to be added. Our implemented prototype uses DBpedia as the linked data collection and Wikitology for background knowledge. We evaluated its performance using a collection of tables from Google Squared, Wikipedia and the Web.
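Assigning a class to a column can be illustrated with a simple majority vote over the classes of each cell's candidate entities. This is a simplifying assumption, not necessarily the paper's scoring method. The `candidate_types` dict stands in for a real linked data lookup, and the class names are hypothetical.

```python
from collections import Counter


def column_class(cells, candidate_types):
    """Pick the class most often shared by the candidate entities of the cells.

    candidate_types: maps a cell string to the set of classes of its candidate
    entities (a plain dict here, standing in for a linked data lookup).
    """
    votes = Counter()
    for cell in cells:
        for cls in candidate_types.get(cell, set()):
            votes[cls] += 1
    if not votes:
        return None
    best = max(votes.values())
    # break ties alphabetically so the result is deterministic
    return min(c for c, n in votes.items() if n == best)
```

Voting lets a column recover from individual ambiguous cells, such as "Paris" also naming a person, because the other cells pull the decision toward the shared class.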
TL;DR: In this article, a user interface is presented to the user that allows the user to specify a multi-condition data filter, which includes a set of filter conditions connected by logical operators, and the filter expressions are applied to the summary data from which the summary table is displayed.
Abstract: Technologies are described herein for allowing a user of an interactive summary table to specify multi-condition data filters to modify the data displayed in the summary table. A user interface is displayed to the user that allows the user to specify a multi-condition data filter. The specification of the multi-condition data filter includes a set of filter conditions connected by logical operators. One or more filter expressions are parsed from the specification of the multi-condition data filter based on the filter conditions and the logical operators, and the filter expressions are applied to the summary data from which the summary table is displayed.
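A minimal sketch of parsing and applying such a filter. It assumes conditions arrive as `(field, operator, value)` triples joined by `"AND"`/`"OR"` and that AND binds tighter than OR; the abstract does not specify precedence, and all names here are hypothetical. Each AND-group is one parsed filter expression, and a row is kept if any expression matches.

```python
import operator

OPS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}


def parse_filter(spec):
    """spec: [cond, 'AND' | 'OR', cond, ...] where cond = (field, op, value).

    Returns the filter expressions as a list of AND-groups joined by OR.
    """
    groups = [[spec[0]]]
    for i in range(1, len(spec), 2):
        logical, cond = spec[i], spec[i + 1]
        if logical == "AND":
            groups[-1].append(cond)
        elif logical == "OR":
            groups.append([cond])
        else:
            raise ValueError(f"unknown logical operator: {logical}")
    return groups


def apply_filter(rows, groups):
    """Keep the summary rows that satisfy at least one AND-group."""
    def matches(row):
        return any(
            all(OPS[op](row[field], value) for field, op, value in group)
            for group in groups
        )
    return [row for row in rows if matches(row)]
```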
TL;DR: In this article, a table (or data from a plurality of rows thereof) is first compressed into a "compression unit" using any of a wide variety of compression techniques.
Abstract: A database server stores compressed units in data blocks of a database. A table (or data from a plurality of rows thereof) is first compressed into a “compression unit” using any of a wide variety of compression techniques. The compression unit is then stored in one or more data block rows across one or more data blocks. As a result, a single data block row may comprise compressed data for a plurality of table rows, as encoded within the compression unit. Storage of compression units in data blocks maintains compatibility with existing data block-based databases, thus allowing the use of compression units in preexisting databases without modification to the underlying format of the database. The compression units may, for example, co-exist with uncompressed tables. Various techniques allow a database server to optimize access to data in the compression unit, so that the compression is virtually transparent to the user.
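The storage scheme can be sketched as compressing several rows together into one unit and then splitting that unit across fixed-size data block rows. Here zlib and JSON stand in for "any of a wide variety of compression techniques", and the block row size is a hypothetical parameter. This is not the patented on-disk format.

```python
import json
import zlib


def build_compression_unit(rows):
    """Compress several table rows together into one compression unit."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def store_in_block_rows(unit, block_row_size):
    """Split one compression unit across as many fixed-size block rows as needed."""
    return [unit[i:i + block_row_size] for i in range(0, len(unit), block_row_size)]


def read_rows(block_rows):
    """Reassemble the block rows and decompress them back into table rows."""
    return json.loads(zlib.decompress(b"".join(block_rows)).decode("utf-8"))
```

Because each stored piece is an ordinary byte string, it fits into existing block-based storage, while a single block row may carry compressed data for many table rows.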
TL;DR: In this article, a method for translating data between object oriented programs and database storage tables is proposed, where the user input includes a plurality of parts, each part includes a specification of a source (such as a type source), optionally a filter, and a projection.
Abstract: Facilitating translation of data between object oriented programs and database storage tables. A method includes receiving user input from a user. The user input includes a plurality of parts. Each part includes a specification of a source (such as a type source), optionally a filter, and a projection. Each projection assigns values to table columns. Based on the plurality of parts received, the method includes generating one or more views. The one or more views describe relationships between model extents and database tables.
TL;DR: In this paper, the authors develop a Summary of Findings (SoF) table for use in Cochrane reviews that is understandable and useful for health professionals, acceptable to Cochrane Collaboration stakeholders, and feasible to implement.
Abstract: Objective: To develop a Summary of Findings (SoF) table for use in Cochrane reviews that is understandable and useful for health professionals, acceptable to Cochrane Collaboration stakeholders, and feasible to implement. Study Design and Setting: We gathered stakeholder feedback on the format and content of an SoF table from an advisory group of more than 50 participants and their constituencies through e-mail consultations. We conducted user tests using a think-aloud protocol method, collecting feedback from 21 health professionals and researchers in Norway and the UK. We analyzed the feedback, defined problem areas, and generated new solutions in brainstorming workshops. Results: Stakeholders were concerned about precision in the data representation and about production feasibility. User testing revealed unexpected comprehension problems, mainly confusion about what the different numbers referred to (class reference). Resolving the tension between achieving table precision and table simplicity became the main focus of the working group. Conclusion: User testing led to a table that was more useful and understandable for clinical audiences. We arrived at an SoF table that was acceptable to the stakeholders and in principle feasible to implement technically. Some challenges remain, including presenting continuous outcomes and technical/editorial implementation. 2010 Elsevier Inc. All rights reserved.
TL;DR: This paper describes the file format of the MySQL Database 5.1.32 with the InnoDB Storage Engine and explains, with a practical example, how to reconstruct the data of any SQL table from the file system.
Abstract: Whenever data is being processed, there are many places where parts of the data are temporarily stored; thus, forensic analysis can reveal past activities, create a (partial) timeline, and recover deleted data. While this fact is well known in computer forensics and multiple forensic tools exist to analyze data, the systematic analysis of database systems has only recently begun. This paper describes the file format of the MySQL Database 5.1.32 with the InnoDB Storage Engine. It further explains, with a practical example, how to reconstruct the data of any SQL table found in the file system. We will show how to reconstruct the table as it is, read data sets from the file, and interpret the gained information.
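A first step in reading InnoDB files by hand is walking the pages and decoding each page's FIL header. The sketch assumes the default 16 KiB page size and the 38-byte, big-endian FIL header that begins every InnoDB page (checksum, page number, previous/next page, LSN, page type, flush LSN, space ID). It does not parse records inside the page, and the function names are hypothetical.

```python
import struct

PAGE_SIZE = 16 * 1024                      # InnoDB default page size
FIL_HEADER = struct.Struct(">IIIIQHQI")    # 38-byte FIL header, big-endian
FIL_PAGE_INDEX = 0x45BF                    # page type of B-tree (index/data) pages


def parse_fil_header(page):
    """Decode the FIL header at the start of one InnoDB page."""
    (checksum, page_no, prev_page, next_page,
     lsn, page_type, flush_lsn, space_id) = FIL_HEADER.unpack_from(page, 0)
    return {
        "checksum": checksum,
        "page_no": page_no,
        "prev": prev_page,     # 0xFFFFFFFF (FIL_NULL) when there is no previous page
        "next": next_page,
        "lsn": lsn,
        "page_type": page_type,
        "flush_lsn": flush_lsn,
        "space_id": space_id,
    }


def iter_pages(data):
    """Yield the FIL header of every complete page in a tablespace image."""
    for offset in range(0, len(data) - PAGE_SIZE + 1, PAGE_SIZE):
        yield parse_fil_header(data[offset:offset + PAGE_SIZE])
```

Filtering for `page_type == FIL_PAGE_INDEX` and following the `prev`/`next` links yields the chain of index pages that holds a table's rows.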
TL;DR: In this paper, a packet-forwarding table for providing traffic management across servers in a server group is proposed, where a hash value is computed from data in one or more fields in the header of a received packet.
Abstract: A switch device includes a packet-forwarding table for providing traffic management across servers in a server group. Each table entry maps a hash value to a server in the server group. A hash value is computed from data in one or more fields in the header of a received packet. The computed hash value is used as an index into the packet-forwarding table to access a table entry and to identify from the table entry the server in the server group to which the table entry maps the computed hash value. The switch device forwards the packet to the identified server. Implementing traffic management decisions in hardware enables packet switching at the line rate of the switch ports. In addition, the hardware-based traffic management performed by the switch device eliminates session tables and the memory to store them, enabling the switch device to handle an unlimited number of client connections.
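The lookup described above can be sketched as hashing selected header fields and using the result as an index into a table of servers. CRC32 over the four-tuple, the round-robin table fill, and the function names are assumptions for illustration; switch hardware uses its own hash and field selection.

```python
import zlib


def build_table(servers, size):
    """Each entry maps a hash value (the index) to a server; filled round-robin."""
    return [servers[i % len(servers)] for i in range(size)]


def forward(table, src_ip, dst_ip, src_port, dst_port):
    """Hash header fields, index the forwarding table, and return the target server."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    index = zlib.crc32(key) % len(table)
    return table[index]
```

Because the same header fields always produce the same hash, packets of one connection reach the same server without any per-session state, which is why no session table is needed.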
TL;DR: Results show that the efficiency of production record-keeping and decision support is improved by the simple and user-friendly PRDS, developed on the Windows Mobile platform and invoking a Geographic Information System (GIS) control.
TL;DR: The table access protocol (TAP) defines a service protocol for accessing general table data, including astronomical catalogs as well as general database tables.
Abstract: The table access protocol (TAP) defines a service protocol for accessing general table data, including astronomical catalogs as well as general database tables. Access is provided for both database and table metadata as well as for actual table data. This version of the protocol includes support for multiple query languages, including queries specified using the Astronomical Data Query Language (ADQL [1]) and the Parameterised Query Language (PQL, under development) within an integrated interface. It also includes support for both synchronous and asynchronous queries. Special support is provided for spatially indexed queries using the spatial extensions in ADQL. A multi-position query capability permits queries against an arbitrarily large list of astronomical targets, providing a simple spatial cross-matching capability. More sophisticated distributed cross-matching capabilities are possible by orchestrating a distributed query across multiple TAP services.
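A synchronous TAP query is an HTTP request to the service's `/sync` endpoint carrying `REQUEST=doQuery`, `LANG=ADQL`, and the query text, as in TAP 1.0. The sketch only builds the URL and an ADQL cone-search query using the spatial `CONTAINS`/`POINT`/`CIRCLE` functions. The service URL, the table name, and the `ra`/`dec` column names are hypothetical.

```python
from urllib.parse import urlencode


def tap_sync_url(service_url, adql):
    """Build a synchronous TAP query URL for an ADQL query string."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{service_url.rstrip('/')}/sync?{urlencode(params)}"


def cone_search_adql(table, ra, dec, radius_deg, columns="*"):
    """ADQL for all rows within radius_deg of (ra, dec); assumes ra/dec columns."""
    return (
        f"SELECT {columns} FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra}, {dec}, {radius_deg}))"
    )
```

The resulting URL can be fetched with any HTTP client. Results come back as a VOTable by default, and long-running queries would instead go to the asynchronous `/async` endpoint.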
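TL;DR: In this paper, an in-memory database server hosting a tenant of a multi-tenant software architecture receives a definition of a custom data field that is unique to an organization having isolated access to the tenant, forms a tenant-dependent table that includes the custom field, and presents it for access via a database client at the organization.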
Abstract: An in-memory database server hosting a tenant of a multi-tenant software architecture can receive a definition of a custom data field that is unique to an organization having isolated access to the tenant. The custom data field can extend a standard table defined by central metadata stored at a system tenant of the multi-tenant software architecture. Tenant private metadata that includes the definition can be stored in memory accessible only to the tenant. A tenant-dependent table that includes the custom data field can be formed, for example by retrieving central metadata defining the standard table from the system tenant and adding the custom data field using the definition. The tenant-dependent table can be presented for access via a database client at the organization. Related systems, articles of manufacture, and computer-implemented methods are disclosed.
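Forming a tenant-dependent table can be sketched as combining the standard table's central metadata with the tenant's private field definitions, without modifying the central copy. The dictionaries, table, and field names below are hypothetical stand-ins for the metadata stores.

```python
# Hypothetical central metadata held by the system tenant: table -> column list
CENTRAL_METADATA = {"customer": ["id", "name", "country"]}


def define_custom_field(tenant_meta, table, field):
    """Store a custom field definition in the tenant's private metadata."""
    tenant_meta.setdefault(table, []).append(field)


def tenant_dependent_table(table, tenant_meta, central=CENTRAL_METADATA):
    """Standard columns from central metadata plus the tenant's custom columns."""
    standard = central[table]
    custom = [f for f in tenant_meta.get(table, []) if f not in standard]
    return standard + custom  # a new list; central metadata stays unchanged
```

Other tenants that share the same central metadata keep seeing only the standard columns.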
TL;DR: In an MFP, use history information, which records the functions each user uses frequently, and combination information, which records combinations of functions set by users, are received from a server and stored in a use history management table and a function combination management table.
Abstract: In an MFP, use history information, which records the functions each user uses frequently, and combination information, which records combinations of functions set by users, are received from a server and stored in a use history management table and a function combination management table. When a mode is selected by a logged-in user, the use history management table is read and a function frequently used by that user is displayed on a touch panel. When the selected function is established, the function combination management table is read, and one or more functions frequently combined with the established function are displayed on the touch panel. When a job ends, the selected combination of functions is transmitted to the server.
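The two management tables can be modeled as counters: one per user and function, and one per ordered pair of functions used in the same job. In this sketch the tables are kept in-process rather than received from a server, and the class and method names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


class UseHistory:
    def __init__(self):
        self.use = defaultdict(Counter)  # use history table: user -> function -> count
        self.combos = Counter()          # combination table: (function, partner) -> count

    def record_job(self, user, functions):
        """Update both tables with the functions used in one finished job."""
        for f in functions:
            self.use[user][f] += 1
        for a, b in permutations(set(functions), 2):
            self.combos[(a, b)] += 1

    def frequent_functions(self, user, n=1):
        """Functions to show first on the touch panel for this user."""
        return [f for f, _ in self.use[user].most_common(n)]

    def frequent_partners(self, function, n=2):
        """Functions most often combined with an established function."""
        partners = [(b, c) for (a, b), c in self.combos.items() if a == function]
        partners.sort(key=lambda p: (-p[1], p[0]))
        return [b for b, _ in partners[:n]]
```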
TL;DR: In this paper, a technique for analyzing software in which uninstrumented components can be discovered and dynamically instrumented during a runtime of the software is presented, and performance data is gathered from the instrumentation, and it may be learned that the performance of some methods is an issue.
Abstract: A technique for analyzing software in which un-instrumented components can be discovered and dynamically instrumented during a runtime of the software. Initially, an application is configured with a baseline set of instrumented components, such as methods. As the application runs, performance data is gathered from the instrumentation, and it may be learned that the performance of some methods is an issue. To analyze the problem, any methods that are callable from a method at issue are discovered by inspecting the byte code of loaded classes in a JAVA Virtual Machine (JVM). The byte code of the class is parsed to identify opcodes that call other methods. An index to an entry in a constant pool table is identified based on an opcode. A decision can then be made to instrument and/or report the discovered methods.
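In JVM byte code, each invoke instruction is followed by two operand bytes that form an unsigned big-endian index into the class's constant pool. The sketch assumes the method's byte code has already been decoded into `(opcode, operand_bytes)` pairs, since a full decoder needs the length of every opcode. The opcode values are from the JVM specification, while the function name is hypothetical.

```python
# Invoke opcodes as defined in the JVM specification
INVOKE_OPCODES = {
    0xB6: "invokevirtual",
    0xB7: "invokespecial",
    0xB8: "invokestatic",
    0xB9: "invokeinterface",
    0xBA: "invokedynamic",
}


def invoked_constant_indices(instructions):
    """Return (opcode name, constant pool index) for every invoke instruction.

    instructions: decoded (opcode, operand_bytes) pairs for one method body.
    """
    found = []
    for opcode, operands in instructions:
        if opcode in INVOKE_OPCODES:
            # first two operand bytes: indexbyte1 << 8 | indexbyte2
            index = (operands[0] << 8) | operands[1]
            found.append((INVOKE_OPCODES[opcode], index))
    return found
```

Resolving each index in the constant pool yields the class and method name of the callee, which is what a tool needs in order to decide what to instrument next.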
TL;DR: In this article, a technique for database table look-up is described. It stores one or more column attributes of a database table in a data structure that also comprises a record identification (RID) column of a table, one or more predicate columns corresponding to the RID column, and a sequence number column that is associated with updated records.
Abstract: Techniques for database table look-up are provided. The techniques include storing one or more column attributes of a database table in a data structure, wherein the data structure also comprises a record identification (RID) column of a table, one or more predicate columns corresponding to the RID column, and a sequence number column that is associated with one or more updated records, generating a key using one or more portions from one or more of the one or more predicate columns, using the key to partition the data structure, wherein partitioning the data structure comprises partitioning the one or more predicate columns for evaluation, and evaluating the one or more predicate columns against the data structure for each matching predicate column-data structure partition.
TL;DR: In this paper, the authors describe techniques for performing scalable layer two (L2) learning in computer networks, where a network device that includes interfaces and a control unit may implement these techniques.
Abstract: In general, techniques are described for performing scalable layer two (L2) learning in computer networks. A network device that includes interfaces and a control unit may implement these techniques. The control unit stores an L2 learning table having entries that are each associated with a service tag identifying a service virtual local area network. In response to receiving a packet that includes a service tag, the interfaces access the L2 learning table using the service tag to determine whether any of the entries of the L2 learning table are associated with the service tag. When none of the entries are associated with the service tag, the L2 learning module updates the L2 learning table to create a new entry defining an association between the interface that received the packet and the service tag.
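The learning step can be sketched as a dictionary keyed by service tag: on a miss, the ingress interface is recorded; on a hit, the stored interface is returned. The class name and interface names are hypothetical, and a real device would also age out entries.

```python
class L2LearningTable:
    """Hypothetical learning table keyed by service tag (S-VLAN ID)."""

    def __init__(self):
        self.entries = {}  # service tag -> interface that first received it

    def on_packet(self, service_tag, ingress_interface):
        """Look up the tag; if no entry matches, learn tag -> ingress interface.

        Returns (learned, interface), where learned is True for a new entry.
        """
        known = self.entries.get(service_tag)
        if known is None:
            self.entries[service_tag] = ingress_interface
            return True, ingress_interface
        return False, known
```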
TL;DR: FlexPref, a framework for extensible preference evaluation in database systems that is implemented in the query processor, aims to support a wide array of preference evaluation methods in a single extensible code base.
Abstract: Personalized database systems give users answers tailored to their personal preferences. While numerous preference evaluation methods for databases have been proposed (e.g., skyline, top-k, k-dominance, k-frequency), the implementation of these methods at the core of a database system is a double-edged sword. Core implementation provides efficient query processing for arbitrary database queries; however, this approach is not practical, as each existing (and future) preference method requires a custom query processor implementation. To solve this problem, this paper introduces FlexPref, a framework for extensible preference evaluation in database systems. FlexPref, implemented in the query processor, aims to support a wide array of preference evaluation methods in a single extensible code base. Integration with FlexPref is simple, involving the registration of only three functions that capture the essence of the preference method. Once integrated, the preference method “lives” at the core of the database, enabling the efficient execution of preference queries involving common database operations. To demonstrate the extensibility of FlexPref, we provide case studies showing the implementation of three database operations (single table access, join, and sorted list access) and five state-of-the-art preference evaluation methods (top-k, skyline, k-dominance, top-k dominance, and k-frequency). We also experimentally study the strengths and weaknesses of an implementation of FlexPref in PostgreSQL over a range of single-table and multi-table preference queries.
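The plug-in idea can be illustrated with a generic evaluator that knows nothing about a specific preference method and only calls a registered hook. FlexPref's actual interface consists of three functions that are not reproduced here. This sketch uses a single hypothetical `dominates()` hook and block-nested-loop evaluation, which is correct for transitive preferences such as skyline but not for non-transitive ones such as k-dominance.

```python
class PreferenceMethod:
    """Hypothetical plug-in interface: one dominance test per method."""

    def dominates(self, a, b):
        raise NotImplementedError


class Skyline(PreferenceMethod):
    """Smaller is better on every attribute."""

    def dominates(self, a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def evaluate(method, tuples):
    """Block-nested-loop evaluation that works with any registered method."""
    window = []
    for t in tuples:
        if any(method.dominates(w, t) for w in window):
            continue  # t is beaten by a tuple already kept
        # drop kept tuples that t beats, then keep t
        window = [w for w in window if not method.dominates(t, w)]
        window.append(t)
    return window
```

Adding another transitive preference method would mean writing one more subclass, while the evaluation loop stays unchanged.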
TL;DR: In this article, a radio node and an RFID tag are mounted on a moving target to provide a technology for specifying a base station that connects to a moving radio node by efficient processing.
Abstract: PROBLEM TO BE SOLVED: To provide a technology for specifying a base station that connects to a moving radio node by efficient processing. SOLUTION: In one embodiment of the invention, a radio node and an RFID tag are mounted on a moving target. A management server 112 obtains RFID-R/W ID information read from the RFID tag, and ID information of the read RFID tag. The management server 112 refers to a string table 212, an RFID-R/W table 213, and a base station table 214, and specifies the base station to be connected to the radio node.
TL;DR: This paper presents a method for incremental re-training of an SMT system, in which a local phrase table is created and incrementally updated as a file is translated and post-edited.
Abstract: A method is presented for incremental re-training of an SMT system, in which a local phrase table is created and incrementally updated as a file is translated and post-edited. It is shown that translation data from within the same file has higher value than other domain-specific data. In two technical domains, within-file data increases BLEU score by several full points. Furthermore, a strong recency effect is documented; nearby data within the file has greater value than more distant data. It is also shown that the value of translation data is strongly correlated with a metric defined over new occurrences of n-grams. Finally, it is argued that the incremental re-training prototype could serve as the basis for a practical system which could be interactively updated in real time in a post-editing setting. Based on the results here, such an interactive system has the potential to dramatically improve translation quality.
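The core of a local, incrementally updated phrase table can be sketched as running counts that yield relative-frequency estimates p(target | source). Word alignment and phrase extraction from each post-edited segment are assumed to have been done already. Lexical weighting and combination with the background phrase table are omitted, and the class name is hypothetical.

```python
from collections import Counter


class LocalPhraseTable:
    def __init__(self):
        self.pair_counts = Counter()    # (source phrase, target phrase) -> count
        self.source_counts = Counter()  # source phrase -> count

    def add(self, phrase_pairs):
        """Update counts with phrase pairs extracted from one post-edited segment."""
        for src, tgt in phrase_pairs:
            self.pair_counts[(src, tgt)] += 1
            self.source_counts[src] += 1

    def prob(self, tgt, src):
        """Relative-frequency estimate p(tgt | src); 0.0 for unseen sources."""
        if self.source_counts[src] == 0:
            return 0.0
        return self.pair_counts[(src, tgt)] / self.source_counts[src]
```

Because counts are updated after every segment, translations confirmed earlier in the same file immediately influence later segments, which is where the reported within-file gains come from.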
TL;DR: In this article, semantic queries are expressed and executed within a relational database by defining semantic rules applied to execute the semantic queries using table-valued functions and common table expressions, and then simply calling the defined table-valued functions to execute the queries.
Abstract: Semantic queries are expressed and executed within a relational database. This can be done by defining semantic rules applied to execute the semantic queries using table valued functions and common table expressions, and then simply calling the defined table valued functions to execute the queries.
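Here is a small example of a semantic rule executed with a recursive common table expression over a triples table: "if x has type C and C is a (transitive) subclass of D, then x has type D". SQLite does not support SQL-defined table-valued functions, so a Python function wrapping the CTE stands in for the table-valued function. The schema, data, and names are hypothetical.

```python
import sqlite3


def make_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    con.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
        ("Dog", "subClassOf", "Mammal"),
        ("Mammal", "subClassOf", "Animal"),
        ("rex", "type", "Dog"),
    ])
    return con


# Rule: x type C, C subClassOf* D  =>  x type D
RULE_SQL = """
WITH RECURSIVE ancestors(cls) AS (
    SELECT o FROM triples WHERE p = 'type' AND s = :x
    UNION
    SELECT t.o FROM triples t JOIN ancestors a ON t.s = a.cls
    WHERE t.p = 'subClassOf'
)
SELECT cls FROM ancestors ORDER BY cls
"""


def inferred_types(con, x):
    """Stand-in for a table-valued function: all asserted and inferred types of x."""
    return [row[0] for row in con.execute(RULE_SQL, {"x": x})]
```

Using `UNION` rather than `UNION ALL` discards duplicate classes, so the recursion also terminates on cyclic class hierarchies.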
TL;DR: Monte Carlo simulations are used to estimate the confidence limits for survival and death probabilities, life expectancy, and healthy life expectancy, as well as any other quantity based on a conventional life table or on Sullivan’s healthy life table.
Abstract: In this report, we use Monte Carlo simulations to estimate the confidence limits for survival and death probabilities, life expectancy, and healthy life expectancy, as well as any other quantity based on a conventional life table or on Sullivan’s healthy life table. Two Excel spreadsheets for use with this method are provided.
Background
Life-table measures computed from empirical data can be seen as deterministic or random variables. The deterministic approach is by far the more popular in demography. It assumes that the population under consideration is a general universe, and that life expectancy and other life-table quantities precisely describe the true mortality regime in this population. However, studies on mortality in small populations tend to produce unstable values for life-table functions. These fluctuations require researchers to apply the stochastic approach by estimating the likely magnitude of stochastic error. The growing use of Sullivan’s (1971) method for estimating health expectancy, which combines the life table with survey-based prevalences of poor health, calls for an even greater degree of statistical estimation. Indeed, estimates of health expectancy can be imprecise even in large populations if the prevalence of ill health comes from samples consisting of a few thousand respondents. In this case, stochastic errors in the age-specific prevalence of poor health (disability, chronic illness) can be important. One of the most widely used approaches to statistical inference for life expectancy and other aggregated mortality measures was introduced by Chin Long Chiang (1961, 1984). Chiang considered an unbiased estimate of the probability of death, $\hat{q} = D/N$, where $D$ denotes the number of deaths (number of events) within a certain age interval and $N$ the cohort size (number of trials) at the beginning of the same age interval.
For estimation of the standard error of the probability of death, Chiang considered the scheme of Bernoulli trials. The random death numbers are then distributed binomially around the mean value, estimated as $N\hat{q}$, with the standard deviation estimated as $\sqrt{N\hat{q}(1-\hat{q})}$. In this case, the standard error of the probability of death is $S_{\hat{q}} = \sqrt{\hat{q}(1-\hat{q})/N}$. If the observed number of deaths and the population size are higher than 15-20, the normal distribution can be considered a good approximation of the binomial distribution. In this case, the upper and lower confidence limits for the confidence level $1-\alpha$ can be expressed by the Wald formula, $CI = \hat{q} \pm z_{\alpha/2} S_{\hat{q}}$.
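The Wald interval above, and a Monte Carlo interval for life expectancy in the spirit of the report, can be sketched as follows. The Monte Carlo part redraws each interval's deaths from a binomial distribution around the observed $\hat{q}$, recomputes life expectancy at birth, and takes percentiles. It assumes one-year intervals, deaths occurring at mid-interval, and a final interval whose deaths equal its cohort ($\hat{q}=1$). This is a simplified illustration, not the report's spreadsheets.

```python
import math
import random
from statistics import NormalDist


def wald_ci(deaths, n, level=0.95):
    """Wald interval q_hat +/- z_(alpha/2) * sqrt(q_hat (1 - q_hat) / N)."""
    q = deaths / n
    se = math.sqrt(q * (1 - q) / n)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return q - z * se, q + z * se


def binomial_draw(n, p, rng):
    """Number of deaths among n Bernoulli trials (simple sampler for small cohorts)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def life_expectancy(qs):
    """e0 for consecutive one-year intervals, radix 1, deaths at mid-interval."""
    survivors, e0 = 1.0, 0.0
    for q in qs:
        next_survivors = survivors * (1 - q)
        e0 += (survivors + next_survivors) / 2  # person-years lived in the interval
        survivors = next_survivors
    return e0


def monte_carlo_e0(deaths, cohorts, sims=1000, level=0.95, seed=1):
    """Point estimate of e0 plus a percentile interval from simulated death counts."""
    rng = random.Random(seed)
    q_hat = [d / n for d, n in zip(deaths, cohorts)]
    results = sorted(
        life_expectancy([binomial_draw(n, q, rng) / n for n, q in zip(cohorts, q_hat)])
        for _ in range(sims)
    )
    lo = results[int(sims * (1 - level) / 2)]
    hi = results[int(sims * (1 + level) / 2) - 1]
    return life_expectancy(q_hat), lo, hi
```

Unlike the Wald formula, the simulated interval needs no normal approximation, which matters most for the small populations discussed above.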
TL;DR: In this article, the authors present a system and methods for providing change notifications of changes made in a database table to a remote application by registering an application to receive notifications when a first table of a plurality of tables in the database is changed.
Abstract: Systems and methods for providing notifications of changes made in a database table to a remote application are presented. The systems and methods manage change notification of a table in a database by notifying an application registered to receive notifications of changes made to a database table. A database manager executing on a device may receive a request to register an application to be notified when a first table of a plurality of tables in a database is changed. The database manager may establish a notification table comprising fields of the first table and one or more additional fields for managing notification. The database manager may establish a trigger on the first table to invoke a trigger procedure to copy a changed row of the first table to the notification table. The database manager may create a rule for the notification table to notify registered applications when a new row is inserted into the notification table.
TL;DR: In this article, a method of hashing for networks and systems thereof is described. The method includes receiving a first element, generating a first plurality of hash values based on the first element and hash functions, determining a plurality of buckets in a table, each of the buckets associated with a different one of the hash values, selecting one of the buckets, storing a first associated value in the selected bucket, and encoding an identifier (ID) of the hash function generating the hash value associated with the selected bucket into a filter based on the hash value.
Abstract: Example embodiments are directed to methods of hashing for networks and systems thereof. At least one example embodiment provides a method of processing elements in a system. The method includes receiving a first element, generating a first plurality of hash values based on the first element and a first plurality of hash functions, determining a first plurality of buckets in a table based on the first plurality of hash values, each of the first plurality of buckets associated with a different one of the hash values, selecting one of the first plurality of buckets, storing a first associated value in the selected bucket, the first associated value being associated with the first element, and encoding an identifier (ID) of the hash function generating the hash value associated with the selected bucket into a filter based on the hash value.
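Here is a sketch of the described insertion: compute several candidate buckets with different hash functions, store the value in one of them, and record which hash function was used so that a lookup can go straight to one bucket. Choosing the least-loaded candidate, building the hash family from SHA-256 with a prefix byte, and using a fingerprint-keyed dict as the "filter" are all assumptions. A compact design would use a probabilistic structure instead. Lookup falls back to the other candidates if the filter entry is missing or was overwritten by a colliding fingerprint.

```python
import hashlib


def h(i, key, m):
    """i-th hash function of a hypothetical family: SHA-256 with prefix byte i."""
    digest = hashlib.sha256(bytes([i]) + key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


class MultiHashTable:
    def __init__(self, m, k=2):
        self.m, self.k = m, k
        self.buckets = [[] for _ in range(m)]  # each bucket holds (key, value) pairs
        self.filter = {}                       # fingerprint -> ID of hash function used

    def fingerprint(self, key):
        return h(255, key, 1 << 16)

    def insert(self, key, value):
        candidates = [(h(i, key, self.m), i) for i in range(self.k)]
        # pick the least-loaded candidate bucket (an assumption, not from the source)
        index, fid = min(candidates, key=lambda c: (len(self.buckets[c[0]]), c[1]))
        self.buckets[index].append((key, value))
        self.filter[self.fingerprint(key)] = fid

    def lookup(self, key):
        fid = self.filter.get(self.fingerprint(key))
        order = ([fid] if fid is not None else []) + [i for i in range(self.k) if i != fid]
        for i in order:  # usually only the first bucket is probed
            for stored_key, value in self.buckets[h(i, key, self.m)]:
                if stored_key == key:
                    return value
        return None
```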