TL;DR: The paper argues that multivariate prediction approaches are best suited to the high-dimensional sparse data matrices that arise from Semantic Web data, and that within this statistical framework the approach scales up to large domains and can deal with highly sparse relationship data.
Abstract: One of the main characteristics of Semantic Web (SW) data is that it is notoriously incomplete: in the same domain a great deal might be known for some entities and almost nothing for others. A popular example is the well-known friend-of-a-friend data set, where some members document exhaustive private and social information whereas, because of privacy concerns and other reasons, almost nothing is known about other members. Although deductive reasoning can be used to complement factual knowledge based on the ontological background, a tremendous number of potentially true statements still remain to be uncovered. This paper focuses on the prediction of potential relationships and attributes by exploiting regularities in the data using statistical relational learning algorithms. We argue that multivariate prediction approaches are most suitable for dealing with the resulting high-dimensional sparse data matrix. Within the statistical framework, the approach scales up to large domains and is able to deal with highly sparse relationship data. A major goal of the presented work is to formulate an inductive learning approach that can be used by people with little machine learning background. We present experimental results using a friend-of-a-friend data set.
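As a rough illustration of the multivariate idea in the abstract above, the sketch below treats a tiny, made-up friend-of-a-friend "knows" relation as a sparse 0/1 matrix, fits a rank-1 approximation by power iteration, and ranks the unobserved entries of one person by their reconstructed score. The toy data, the function names (`rank1_scores`, `top_unobserved`), and the choice of a rank-1 factorization are assumptions made for illustration only; the abstract does not say which multivariate method the authors use.

```python
import math

def rank1_scores(matrix, iterations=100):
    """Rank-1 reconstruction of `matrix` via power iteration on A^T A.

    Assumption: a rank-1 factorization stands in for the (unspecified)
    multivariate predictor discussed in the abstract.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # start from a uniform unit vector
    for _ in range(iterations):
        # u = A v
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        # w = A^T u
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all-zero matrix: nothing to learn
            break
        v = [x / norm for x in w]
    # u = A v already carries the singular value, so u[i] * v[j] is the
    # rank-1 approximation of entry (i, j).
    u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]

def top_unobserved(matrix, scores, row):
    """Column with the highest score among entries not observed as 1."""
    candidates = [j for j, value in enumerate(matrix[row]) if value == 0]
    return max(candidates, key=lambda j: scores[row][j])

# Hypothetical toy data: rows are persons, columns are possible acquaintances.
# Person 2 resembles persons 0 and 1 but has no recorded link to column 2.
knows = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
scores = rank1_scores(knows)
suggestion = top_unobserved(knows, scores, row=2)
print("suggested new link for person 2: column", suggestion)
```

Because every column is scored jointly from the same low-rank structure, a person with few recorded links can borrow strength from similar, better-documented persons, which is the property that makes this family of methods attractive for sparse relationship data.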
TL;DR: A computationally fast frequentist counterpart is introduced, based on the penalized likelihood estimation framework, and the construction of confidence intervals is also discussed.
TL;DR: This article describes an alternating type online statistical map generation and releasing device and a method thereof; the device comprises a statistical composite element resource generation module, a statistical graphic processing module and a statistical map release processing module.
Abstract: The invention discloses an alternating type online statistical map generation and releasing device and a method thereof. The device comprises a statistical composite element resource generation module, a statistical graphic processing module and a statistical map release processing module. The statistical composite element resource generation module generates a statistical element resource library from input statistical attribute data and statistical unit base spatial data. The statistical graphic processing module carries out hierarchical layer description and statistical layer description, respectively, according to data input from the statistical element resource library, a set thematic statistical symbol script library and a color scheme library; it then combines the result with a base map layer resource library, using a layer resource ID as the identifier, to generate the statistical map resource library. The statistical map release processing module provides a user requirement customization input interface and generates and outputs statistical map data customized by the user according to the data entered through that interface.
TL;DR: In this paper, the contracting-out problem in service sector analysis is defined and considered from the viewpoint of choice of statistical unit, and it is shown that both the enterprise statistical unit and the establishment-based unit are unsatisfactory for economic analysis.
Abstract: The contracting-out problem in service sector analysis is defined and considered from the viewpoint of choice of statistical unit. It is shown that both the enterprise statistical unit and the establishment-based unit are unsatisfactory for economic analysis. This leads to the recommendation for an “intermediate” statistical unit, namely the “division.” The division, by construction and definition, is shown to have desirable properties for analysis of the contracting-out problem (and own-account problem) relating to services. Some empirical evidence with respect to the Canadian service sector economy supports the analysis and suggests a new interpretation of conventional service sector growth statistics.
TL;DR: This work introduces a new general method for constructing likelihood functions for symbolic data based on a desired probability model for the underlying measurement-level data, while only observing the distributional summaries.
Abstract: Symbolic data analysis (SDA) is an emerging area of statistics concerned with understanding and modelling data that takes distributional form (i.e. symbols), such as random lists, intervals and histograms. It was developed under the premise that the statistical unit of interest is the symbol, and that inference is required at this level. Here we consider a different perspective, which opens a new research direction in the field of SDA. We assume that, as with a standard statistical analysis, inference is required at the level of individual-level data. However, the individual-level data are aggregated into symbols (group-based distributional-valued summaries) prior to the analysis. In this way, large and complex datasets can be reduced to a smaller number of distributional summaries that may be analysed more efficiently than the original dataset. As such, we develop SDA techniques as a new approach for the analysis of big data. In particular, we introduce a new general method for constructing likelihood functions for symbolic data based on a desired probability model for the underlying measurement-level data, while only observing the distributional summaries. This approach opens the door for new classes of symbol design and construction, in addition to developing SDA as a viable tool to enable and improve upon classical data analyses, particularly for very large and complex datasets. We illustrate this new direction for SDA research through several real and simulated data analyses.
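To make the idea of a likelihood built from summaries more concrete, here is a minimal sketch of one simple case. It assumes the individual-level data are i.i.d. normal and that only a histogram over fixed bins is observed. Under those assumptions the bin counts are multinomial, with each bin probability equal to the difference of the model CDF across that bin. The bin edges, counts, fixed standard deviation, and grid search below are all hypothetical choices for illustration, not the paper's general construction.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2); handles x = +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def histogram_log_likelihood(edges, counts, mu, sigma):
    """Log-likelihood of fixed-bin counts, up to the multinomial constant.

    Assumption: the underlying (unobserved) data are i.i.d. N(mu, sigma^2).
    """
    total = 0.0
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        p = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
        if n > 0:
            if p <= 0.0:
                return float("-inf")  # observed counts in a zero-probability bin
            total += n * math.log(p)
    return total

# Hypothetical histogram symbol: six bins covering the real line, 100 points.
edges = [float("-inf"), -1.0, 0.0, 1.0, 2.0, 3.0, float("inf")]
counts = [2, 14, 34, 34, 14, 2]

# Estimate mu by grid search, holding sigma fixed at 1.0 for simplicity.
grid = [k / 100.0 for k in range(-100, 301)]
mu_hat = max(grid, key=lambda mu: histogram_log_likelihood(edges, counts, mu, 1.0))
print("estimated mean from the histogram alone:", mu_hat)
```

The estimate is computed from six counts rather than from the 100 underlying observations, which is the kind of data reduction the abstract describes.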