TL;DR: This paper proposes a generic metamodel for multidimensional data that keeps the user as the focal point and achieves a clear abstraction for all users, providing a formal, adaptable, user-centric data warehouse modeling approach to handle large multidimensional datasets.
Abstract: Emerging scientific applications in geosciences, sensor and spatio-temporal domains require adaptive analysis frameworks that can handle large datasets with multiple dimensions. However, existing conceptual design strategies for multidimensional data using the data warehousing framework are not suitable for users, since they involve complex extensions of traditional database design frameworks like E/R and UML diagrams, or the relational star and snowflake schema. There is a lack of a generalized model that provides a user-centric design approach to let analysts abstractly design and query multidimensional data. In this paper, we propose a solution to this problem by presenting a generic metamodel for multidimensional data that keeps the user as the focal point and achieves a clear abstraction for all users. Our model, called the BigCube, provides users with a set of multidimensional abstract data types for data modeling and includes aggregate operations for performing analysis. Overall, we provide a formal and adaptable, user-centric data warehouse modeling approach to handle large multidimensional datasets.
TL;DR: This paper discusses the design and implementation of a service-based dynamic metadata repository over heterogeneous data sources in a distributed event stream processing environment, in which data resources can be registered or unregistered at run time.
Abstract: Enterprise-level applications are becoming complex with the need for event and stream processing, multiple query processing and data analysis over heterogeneous data sources such as relational databases and XML data. Such applications require access to the metadata for these different data sources. This paper discusses the design and implementation of a service-based dynamic metadata repository over heterogeneous data sources in a distributed event stream processing environment. The metadata repository is dynamic in that data resources can be registered or unregistered at run time. The design of such a metadata database is the first step in researching multiple query optimization over various query expressions to detect and materialize common subexpressions over relational and XML structured data sources.
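As a rough illustration of run-time registration, the sketch below keeps metadata for heterogeneous sources in an in-memory registry. The class, method, and field names (`MetadataRepository`, `register`, `unregister`, `lookup`, `SourceMetadata`) are hypothetical and not taken from the paper, which describes a service-based, distributed design rather than a single-process dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class SourceMetadata:
    """Hypothetical metadata record for one data source."""
    name: str
    kind: str                      # e.g. "relational" or "xml"
    schema: dict = field(default_factory=dict)


class MetadataRepository:
    """In-memory stand-in for a dynamic metadata repository (assumption)."""

    def __init__(self):
        self._sources = {}

    def register(self, meta: SourceMetadata) -> None:
        # Sources can be added while the system is running.
        self._sources[meta.name] = meta

    def unregister(self, name: str) -> bool:
        # Returns True if the source was present and has been removed.
        return self._sources.pop(name, None) is not None

    def lookup(self, name: str):
        # Returns the metadata record, or None if the source is unknown.
        return self._sources.get(name)

    def sources_of_kind(self, kind: str) -> list:
        # Sorted names of all registered sources of the given kind.
        return sorted(m.name for m in self._sources.values() if m.kind == kind)
```

A query optimizer built on such a registry would call `lookup` at planning time, so a source unregistered in the meantime simply stops appearing in new plans.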
TL;DR: A web-based interactive data visualization system, which can display various low-dimensional outlier subspaces to allow users to observe and analyze the distributions of outliers and help the developers of outlier detection applications to validate their experiment results.
Abstract: Detecting outliers in high-dimensional data is a challenging task, since outliers mainly reside in various low-dimensional subspaces of the data. To tackle this challenge, outlier detection approaches based on subspace analysis have been proposed recently. Detecting the outlying subspaces in which a given data point is an outlier facilitates a better characterization process for detecting outliers in high-dimensional data streams, and makes outlier mining for large high-dimensional data sets more manageable. In this paper, to facilitate outlier subspace analysis from the perspective of human perception, in support of the development of efficient solutions for high-dimensional data, we propose a web-based interactive data visualization system, which can display various low-dimensional outlier subspaces to allow users to observe and analyze the distributions of outliers. The proposed visualization tool can help the developers of outlier detection applications to directly examine the distributions of outliers in various low-dimensional subspaces to validate their experimental results.
TL;DR: A method for configuring web service standards to enforce security requirements on service interaction specification documents within a SOA, which serves as a mechanism to direct the population of constraints derived from security controls within standards specification documents, such as WS-Policy.
Abstract: Security certification assesses the security posture of a software system to verify its compliance with diverse, pre-specified security controls identified by guidelines from NIST and the US Department of Defense. Service-oriented architectures (SOA) are difficult to certify because they require compliance verification over a mix of local, global, and interaction criteria dictated by the policies of the participating services and SOA governance. Web services further contribute to this difficulty because they lack direct methods to express security controls. Besides being understandable, the method of expression should indicate potential problems complying with chosen services. This paper presents a method for configuring web service standards to enforce security requirements on service interaction specification documents within a SOA. The outcome serves as a mechanism to direct the population of constraints derived from security controls within standards specification documents, such as WS-Policy. We focus on security controls for auditing and how these can be enforced in a SOA. We introduce a reusable architecture to notate the comparison of security controls across services.
TL;DR: A novel score is proposed using the label correlation in combination with the correlation between features; a Combinatorial Score feature selection algorithm in P-Tree structure is combined with the K-Nearest-Neighbor algorithm for breast cancer clinical metastasis time prediction.
Abstract: DNA microarray experiments are being used to gather information from tissue and cell samples by generating thousands of gene expression measurements. Many researchers are studying gene expression differences, which are useful in disease diagnosis, outcome prediction, cancer type classification, and so on. In mining high-dimensional microarray data, feature selection is an important pre-processing stage. In the literature, nearly all existing supervised feature selection methods use class labels as supervision information. In this paper, we propose a novel score using the label correlation in combination with the correlation between features. We design a Combinatorial Score feature selection algorithm in P-Tree structure and combine it with the K-Nearest-Neighbor algorithm for breast cancer clinical metastasis time prediction. Our experiments suggest that our Combinatorial Score feature selection algorithm can find a subset of genes with high computational efficiency and significant performance for breast cancer clinical metastasis prediction.
TL;DR: This paper sets forth a proposed solution to the challenge of generating derived aggregated normalized views from large, distributed data sets of clinical lab data intended for re-use within clinical translational research.
Abstract: [Author(s): Wynden, Rob A; Hudson, Donna L.] Within the CTSA (Clinical Translational Sciences Awards) program, academic medical centers are tasked with the storage of clinical laboratory data within an Integrated Data Repository (IDR) and the subsequent exposure of that data over grid computing environments for hypothesis generation and cohort selection. Lab data that is collected from multiple machines over long periods of time from many labs and across multiple institutions requires normalization before data sets can be aggregated and compared. However, lab data normalization is difficult when published reference intervals are not always reliable and when the lab data collected is not always normally distributed. This paper sets forth a proposed solution to the challenge of generating derived aggregated normalized views from large, distributed data sets of clinical lab data intended for re-use within clinical translational research.
TL;DR: A parametric model for software test estimation, along with a test graph for matching test cases with requirements and for test case analysis, to aid in producing more accurate estimates and tracking.
Abstract: Testing is a key activity for ensuring software quality. There is always pressure from project sponsors and management for the software development team to commit to a shorter schedule and lower cost, especially for testing. Some of the main challenges in testing today are to match the test cases with requirements correctly, and to provide accurate estimates and track the test progress accordingly. In this paper, we present a parametric model for software test estimation, along with a test graph for matching test cases with requirements and for test case analysis, to aid in producing more accurate estimates and tracking. The model and the test graph can be used jointly or individually.
TL;DR: This work presents a framework for reasoning about safety that is based on the observation that safety hazards sometimes lead to accidents when certain quality requirements of the system are not satisfied.
Abstract: Architects use a variety of techniques to evaluate designs to determine the degree to which a product produced from the architecture would possess the desired levels of specific quality attributes. Reasoning frameworks are used to guide architecture definition by predicting the extent to which a software architecture satisfies its quality requirements. There has been much research about such direct runtime attributes as performance and modifiability but much less work has been done concerning such indirect attributes as safety. We present a framework for reasoning about safety that is based on the observation that safety hazards sometimes lead to accidents when certain quality requirements of the system are not satisfied. This naturally leads to the use of reasoning frameworks for these other qualities as a means to indirectly reason about safety. We present our technique that utilizes standard safety engineering activities and a risk-based qualitative reasoning approach to make a judgment on the satisfaction of safety requirements by the architecture.
TL;DR: Entity predicates define whether a design component has a specific class (abstract or concrete), which method (or attribute) is defined in a class, ...
Abstract: Abstract specification of a component (slides dated 11/06/2013, SEDE 2010, San Francisco). The abstract specification contains a formal model of a design component, called a design component contract. A design component contract includes a structural contract, a behavioural contract, and an interface contract. The abstract specification contract is defined by: ASC ::= { , , , }. The slides annotate this definition in turn as follows: for all i, j with i ≠ j, name.cpi ≠ name.cpj; the relations of the constructs of each design component are described; the finite set of input or output ports attached to a design component, and the set of messages sent to or received by a component; the behavioural properties are constraints such as event ordering and action sequence of each design component. Structural contracts: the structural aspect of a design component contract SC is a tuple SC = (C, A, M, T, Ar, Pc, Pa), where C is a set of classes in the design component, A is a set of attributes defined in classes C, M is a set of methods defined in classes C, T is a set of types, Ar is a set of access rights = {public, protected, private}, Pc is a set of connection predicate symbols that capture the relationships (for example, inherit, association, aggregation, ...), and Pa is a set of action predicate symbols for actions that can be performed in a design component (for example, invoke, new, return, ...). The structural aspect can be formalized using a subset of First Order Logic (FOL). The subset of FOL used to describe the structural aspect of a design component comprises variable symbols, connectives ('∧'), quantifiers ('∃'), the element relation ('∈'), and predicate symbols acting upon variable symbols.
The variable symbols represent classes and objects, while the predicate symbols represent permanent relations. There are two kinds of predicates: entity predicates and relationship predicates. Entity predicates define whether a design component has a specific class (abstract or concrete), which method (or attribute) is defined in a class, .... Relationship predicates define the relations between classes, attributes, and operations, and the actions that a role can perform in a component.
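To make the structural contract more concrete, the following is a minimal sketch of the tuple SC = (C, A, M, T, Ar, Pc, Pa) as a Python data class, with two entity predicates and one relationship predicate. The field layout, the predicate names (`has_class`, `defines_method`, `connected`), and the helper `exists_subclass_of` are assumptions made for illustration; the slides do not give a concrete encoding.

```python
from dataclasses import dataclass, field


@dataclass
class StructuralContract:
    """Sketch of SC = (C, A, M, T, Ar, Pc, Pa); the encoding is an assumption."""
    classes: set                                     # C
    attributes: dict = field(default_factory=dict)   # A: class -> attribute names
    methods: dict = field(default_factory=dict)      # M: class -> method names
    types: set = field(default_factory=set)          # T
    access_rights: frozenset = frozenset({"public", "protected", "private"})  # Ar
    connections: set = field(default_factory=set)    # Pc facts: (predicate, a, b)
    actions: set = field(default_factory=set)        # Pa facts: (predicate, a, b)

    # Entity predicates (hypothetical names)
    def has_class(self, c):
        return c in self.classes

    def defines_method(self, c, m):
        return m in self.methods.get(c, set())

    # Relationship predicate (hypothetical name)
    def connected(self, pred, a, b):
        return (pred, a, b) in self.connections


def exists_subclass_of(sc, parent):
    # Encodes the FOL formula: ∃c (c ∈ C ∧ Inherit(c, parent))
    return any(sc.connected("Inherit", c, parent) for c in sc.classes)
```

For example, a contract with classes `Shape` and `Circle` and the single fact `("Inherit", "Circle", "Shape")` satisfies `exists_subclass_of(sc, "Shape")` but not `exists_subclass_of(sc, "Circle")`.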
TL;DR: The proposed ontology is the first cyber forensics ontology to integrate both network forensics domain knowledge and problem-solving knowledge, and can be used as a knowledge base for developing sophisticated intelligent network forensics systems that support complex chains of reasoning.
Abstract: In this paper, we propose a new ontology for network forensics analysis. The proposed ontology is the first cyber forensics ontology to integrate both network forensics domain knowledge and problem-solving knowledge. As such, it can be used as a knowledge base for developing sophisticated intelligent network forensics systems that support complex chains of reasoning. We use a real-life network intrusion scenario to show how our ontology can be integrated and used in intelligent network forensics systems.
TL;DR: This paper shows how evolution practices undertaken by distinct developers and architects prove to be strongly similar and stresses the necessity to abstract those practices for subsequent (re)uses.
Abstract: While pattern engineering is well adopted by the developer community, software evolution does not yet espouse any archetypes or styles. In practice, there is no work on evolution specification that would make it possible to quickly tailor changes on demand. Starting from pragmatic object-oriented examples, this paper shows how evolution practices undertaken by distinct developers and architects prove to be strongly similar and stresses the necessity to abstract those practices for subsequent (re)uses. We leverage evolution styles to specify some identified recurring evolutions and show how they can be instantiated to meet change requirements.
TL;DR: This work presents an unsupervised framework to detect fraudulent applications for identity certificates by extracting identity patterns from the web, and crossing these patterns with information contained in the application forms in order to detect inconsistencies or anomalies.
Abstract: Identity fraud is a growing concern for most government and private institutions. In the literature, identity fraud is categorized into two classes, namely application fraud and behavioral (or transactional) fraud. Most previous work in the area of identity fraud prevention and detection has focused primarily on credit transaction fraud. The work described in this paper is one of the very few that focus on application fraud detection. We present an unsupervised framework to detect fraudulent applications for identity certificates by extracting identity patterns from the web, and crossing these patterns with information contained in the application forms in order to detect inconsistencies or anomalies. The outcome of this process is submitted to a decision tree classifier generated on the fly from a rule base which is derived from heuristics and expert knowledge, and updated as more information is obtained on fraudulent behavior. We evaluate the proposed framework by collecting real identity information online and generating synthetic fraud cases.
TL;DR: A pilot game is presented in order to show that real-time strategy games can in fact be enhanced with role-playing constructs, and a successful application of the Unified Modeling Language (UML) to game design is described.
Abstract: The computer game industry has grown to a million-dollar industry with new titles coming out every month. However, with all these great achievements, the video game industry does have one significant problem: games are played in similar ways. One particular genre for which this is true is the group of real-time strategy games. Almost all of these games have the same structure, in which players first build and upgrade a base, with the property that the more upgraded the base, the more powerful the units that can be built, and the more powerful the units, the better the chance of winning. With the prospect that a successful game may reward a company with perhaps millions of dollars, more games are now flooding the shelves and quantity has become more important than quality [2][6]. This paper reviews our work on improving real-time strategy (RTS) games by incorporating aspects from role-playing games (RPGs) [1]. In this case, the four major components from the role-playing game are: character equipment, character advancement, character customization, and character classification. In addition, this paper describes a successful application of the Unified Modeling Language (UML) to game design. By demonstrating game design through UML, programmers could see potential problems with game play before the entire code was written. This paper presents a pilot game which was developed in order to show that real-time strategy games can in fact be enhanced with role-playing constructs.
TL;DR: This paper proposes a simple learning algorithm based on Hypernet neural networks to predict in vivo human hepatic clearance of drugs and uses a quadratic discriminant function to do so.
Abstract: Accurate prediction of human hepatic clearance of drugs plays a key role in the development of new drugs. Doing so is challenging due to the complex nature of the human liver. Numerous hepatic mechanisms are involved in clearing drugs and toxins from the bloodstream, some of which are not well understood. In this paper, we propose a simple learning algorithm based on Hypernet neural networks to predict in vivo human hepatic clearance of drugs. The algorithm uses a quadratic discriminant function. A set of 85 compounds was assembled from various sources. The feature space consists of 20 publicly available physicochemical properties calculated from compound molecular structures. In addition, in vitro and in vivo rat, and in vitro human clearance data were used as features. Prediction performance was poor when all 85 compounds were used. However, dividing the dataset into smaller normalized sets significantly improved the success rate. In particular, approximately 80% of the predicted values were successful when data from [13] was used (2-fold error = 20%, r = 0.775).
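The two reported figures can be reproduced in spirit with a short sketch. It assumes the common reading of the 2-fold criterion, in which a prediction counts as successful when it lies between half and twice the observed value, and it uses the standard Pearson correlation for r; the abstract does not spell out either definition, and the function names are hypothetical.

```python
import math


def within_fold(predicted, observed, fold=2.0):
    """Fraction of predictions with observed/fold <= predicted <= observed*fold."""
    hits = sum(1 for p, o in zip(predicted, observed) if o / fold <= p <= o * fold)
    return hits / len(observed)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Under this reading, a success rate of about 80% corresponds to a 2-fold error rate of about 20%, that is, `1 - within_fold(predicted, observed)`.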