Structured content

Papers

Proceedings Article

Web-scale Data Integration: You can only afford to Pay As You Go

Jayant Madhavan¹, Shawn R. Jeffery², Shirley Cohen³, Xin Dong⁴, David Ko¹, Cong Yu⁵, Alon Halevy¹

Google¹, University of California, Berkeley², University of Pennsylvania³, University of Washington⁴, University of Michigan⁵

1 Jan 2007

TL;DR: This paper proposes a new data integration architecture, PAYGO, which is inspired by the concept of dataspaces and emphasizes pay-as-you-go data management as means for achieving web-scale data integration.

Abstract: The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like Flickr, and sites like Google Base. While this phenomenon is creating an opportunity for structured data management, dealing with heterogeneity on the web-scale presents many new challenges. In this paper, we highlight these challenges in two scenarios – the Deep Web and Google Base. We contend that traditional data integration techniques are no longer valid in the face of such heterogeneity and scale. We propose a new data integration architecture, PAYGO, which is inspired by the concept of dataspaces and emphasizes pay-as-you-go data management as means for achieving web-scale data integration.

386 citations

Patent

Method and apparatus for normalizing and converting structured content

Edward A. Green, Ramon Krosley, Kevin L. Markey

26 Jun 2001

TL;DR: This patent discloses a method and apparatus for transforming information from one semantic environment to another, providing substantially real-time transformation of electronic messages. One implementation is a SOLx system comprising a Normalization/Translation (NorTran) Workbench and a SOLx server.

Abstract: A method and apparatus are disclosed for transforming information from one semantic environment to another. In one implementation, a SOLx system (1700) includes a Normalization/Translation NorTran Workbench (1702) and a SOLx server (1708). The NorTran Workbench (1702) is used to develop a knowledge base based on information from a source system (1712), to normalize legacy content (1710) according to various rules, and to develop a database (1706) of translated content. During run time, the SOLx server (1708) receives transmissions from the source system (1712), normalizes the transmitted content, accesses the database (1706) of translated content and otherwise translates the normalized content, and reconstructs the transmission to provide substantially real-time transformation of electronic messages.

317 citations

Journal Article • DOI: 10.5465/AMR.1980.4288954

Structured Content Analysis of Cases: A Complementary Method for Organizational Research

Lawrence R. Jauch¹, Richard N. Osborn¹, Thomas N. Martin¹

Southern Illinois University Carbondale¹

01 Oct 1980, Academy of Management Review

TL;DR: This article introduces a comparatively new method that uses case materials to develop and test hypotheses, explains the key role of the content analysis schedule, and provides an illustration centering on environmental volatility.

Abstract: In this article, we introduce a comparatively new method that uses case materials for the development and testing of hypotheses. After comparing cases to questionnaires as a data source, we explain the key role of the content analysis schedule, and provide an illustration centering on environmental volatility.

263 citations

Proceedings Article • DOI: 10.1145/3351095.3372862

Garbage in, garbage out?: do machine learning application papers in social computing report where human-labeled training data comes from?

R. Stuart Geiger¹, Kevin Yu¹, Yanlai Yang¹, Mindy Dai¹, Jie Qiu¹, Rebekah Tang¹, Jenny Huang¹

University of California¹

27 Jan 2020

TL;DR: In this paper, the authors investigate to what extent a sample of machine learning application papers in social computing, specifically papers from ArXiv and traditional publications performing an ML classification task on Twitter data, give specific details about whether such best practices were followed.

Abstract: Many machine learning projects for new application areas involve teams of humans who label data for a particular purpose, from hiring crowdworkers to the paper's authors labeling the data themselves. Such a task is quite similar to (or a form of) structured content analysis, which is a longstanding methodology in the social sciences and humanities, with many established best practices. In this paper, we investigate to what extent a sample of machine learning application papers in social computing --- specifically papers from ArXiv and traditional publications performing an ML classification task on Twitter data --- give specific details about whether such best practices were followed. Our team conducted multiple rounds of structured content analysis of each paper, making determinations such as: Does the paper report who the labelers were, what their qualifications were, whether they independently labeled the same items, whether inter-rater reliability metrics were disclosed, what level of training and/or instructions were given to labelers, whether compensation for crowdworkers is disclosed, and if the training data is publicly available. We find a wide divergence in whether such practices were followed and documented. Much of machine learning research and education focuses on what is done once a "gold standard" of training data is available, but we discuss issues around the equally-important aspect of whether such data is reliable in the first place.

175 citations

Patent

Method and system for removing content entity object in a hierarchically structured content object stored in a database

William J. Baer¹, Edward Hanapole¹, Robert C. Hartman¹, Richard D. Hennessy¹, Eugene Johnson¹, I-Ming Kao¹, Janet L. Murray¹, Jerry D. Robertson¹, Richard W. Walkus¹

IBM¹

21 Jan 2000

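TL;DR: This patent provides a web-based system, method and program product for managing a content object stored in a data repository as a group of hierarchically related content entities. Content is removed from the compilation by removing the container or noncontainer identifier from the object's list or outline.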

Abstract: A web-based system, method and program product are provided for adding content to a content object stored (e.g., a custom compilation or prepublished work) in a data repository as a group of hierarchically related content entities. Each noncontainer content object is preferably stored as a separate entity in the data repository. Each content entity is also stored as a row in a digital library index class as a collection of attributes and references to related content entities and containers. As the user selects desired objects for inclusion in a content object, the system arranges the objects hierarchically, e.g., into volumes, chapters and sections according to the order specified by the user. The system then creates a file object (e.g., a CBO) defining the content object that contains a list or outline of the container and noncontainer entities selected, their identifiers, order and structure. This file object is stored separately in the data repository. Content is removed from the compilation by removing the container or noncontainer identifier from the list or outline. This is achieved through a user interface by providing a mechanism for enabling a user to select a container or noncontainer (e.g., by title) to be removed.

164 citations

Performance Metrics

Number of papers in the topic in previous years
Year	Papers
2021	16
2020	11
2019	10
2018	9
2017	11
2016	4