TL;DR: This guide to SPSS covers designing a study, preparing and cleaning a data file, and running preliminary analyses (descriptive statistics, graphs, scale reliability, choosing the right statistic), followed by statistical techniques for exploring relationships among variables and for comparing groups.
Abstract: Preface; Data files and website; Introduction and overview. Part One: Getting started (Designing a study; Preparing a codebook; Getting to know SPSS). Part Two: Preparing the data file (Creating a data file and entering data; Screening and cleaning the data). Part Three: Preliminary analyses (Descriptive statistics; Using graphs to describe and explore the data; Manipulating the data; Checking the reliability of a scale; Choosing the right statistic). Part Four: Statistical techniques to explore relationships among variables (Correlation; Partial correlation; Multiple regression; Logistic regression; Factor analysis). Part Five: Statistical techniques to compare groups (Non-parametric statistics; T-tests; One-way analysis of variance; Two-way between-groups ANOVA; Mixed between-within subjects analysis of variance; Multivariate analysis of variance; Analysis of covariance). Appendix: Details of data files; Recommended reading; References; Index.
TL;DR: The CIAO (Chandra Interactive Analysis of Observations) software package was first released in 1999 following the launch of the Chandra X-ray Observatory and is used by astronomers across the world to analyze Chandra data as well as data from other telescopes.
Abstract: The CIAO (Chandra Interactive Analysis of Observations) software package was first released in 1999 following the launch of the Chandra X-ray Observatory and is used by astronomers across the world to analyze Chandra data as well as data from other telescopes. From the earliest design discussions, CIAO was planned as a general-purpose scientific data analysis system optimized for X-ray astronomy, and consists mainly of command line tools (allowing easy pipelining and scripting) with a parameter-based interface layered on a flexible data manipulation I/O library. The same code is used for the standard Chandra archive pipeline, allowing users to recalibrate their data in a consistent way. We will discuss the lessons learned from the first six years of the software's evolution. Our initial approach to documentation evolved to concentrate on recipe-based "threads" which have proved very successful. A multi-dimensional abstract approach to data analysis has allowed new capabilities to be added while retaining existing interfaces. A key requirement for our community was interoperability with other data analysis systems, leading us to adopt standard file formats and an architecture which was as robust as possible to the input of foreign data files, as well as re-using a number of external libraries. We support users who are comfortable with coding themselves via a flexible user scripting paradigm, while the availability of tightly constrained pipeline programs is of benefit to less computationally-advanced users. As with other analysis systems, we have found that infrastructure maintenance and re-engineering is a necessary and significant ongoing effort and needs to be planned into any long-lived astronomy software.
TL;DR: A file system contains at least one directory with at least one file containing data about which at least one file has no information; a metadata repository describes the data in the files, and named phantom files point to data in base files without specifying a path name to those base files.
Abstract: A file system and method serve to create and manage content. The file system includes at least one directory having at least one file containing data, but about which at least one file has no information. A repository of metadata provides information about the data in the files. Phantom files are created which are designated by names and associated attributes and which point to data in base files without specifying a path name to the base files.
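The abstract does not say how a phantom file locates its data, so the following is only a minimal sketch under stated assumptions: base files are addressed by an opaque integer identifier rather than a path, and a phantom file is a named byte range with free-form attributes. The names `PhantomFile`, `MetadataRepository`, `base_id`, `offset`, and `length` are hypothetical and do not come from the source.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PhantomFile:
    """A named view onto part of a base file (hypothetical layout)."""
    name: str
    base_id: int          # opaque identifier of the base file, not a path
    offset: int           # assumed: start of the referenced byte range
    length: int           # assumed: size of the referenced byte range
    attributes: dict = field(default_factory=dict)


class MetadataRepository:
    """Holds base-file contents and the phantom files that point into them."""

    def __init__(self) -> None:
        self._base_data: dict[int, bytes] = {}
        self._phantoms: dict[str, PhantomFile] = {}

    def register_base(self, base_id: int, data: bytes) -> None:
        self._base_data[base_id] = data

    def create_phantom(self, name: str, base_id: int, offset: int,
                       length: int, **attributes) -> PhantomFile:
        phantom = PhantomFile(name, base_id, offset, length, dict(attributes))
        self._phantoms[name] = phantom
        return phantom

    def read(self, name: str) -> bytes:
        """Resolve a phantom file by name and return the bytes it points to."""
        p = self._phantoms[name]
        return self._base_data[p.base_id][p.offset:p.offset + p.length]

    def find(self, **attributes) -> list[str]:
        """Return names of phantom files whose attributes match all filters."""
        return sorted(
            name for name, p in self._phantoms.items()
            if all(p.attributes.get(k) == v for k, v in attributes.items())
        )


repo = MetadataRepository()
repo.register_base(1, b"intro|chapter one|chapter two")
repo.create_phantom("ch1", base_id=1, offset=6, length=11, kind="chapter")
```

Callers in this sketch address content only by phantom name or attribute, never by the location of the base file.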
TL;DR: The invention provides continuous backup for a computer network: a synchronization process replicates selected source data files stored on the network into a corresponding set of target data files on a backup server, while agents capture byte-level changes in journal files that are then written to the target data files.
Abstract: The invention provides systems and methods for continuous backup of data stored on a computer network. To this end, the systems of the invention include a synchronization process that replicates selected source data files stored on the network to create a corresponding set of replicated data files, called the target data files, that are stored on a backup server. This synchronization process builds a baseline data structure of target data files. In parallel with this synchronization process, the system includes a dynamic replication process that includes a plurality of agents, each of which monitors a portion of the source data files to detect and capture, at the byte level, changes to the source data files. Each agent may record the changes to a respective journal file, and as the dynamic replication process detects that the journal files contain data, the journal files are transferred or copied to the backup server so that the captured changes can be written to the appropriate ones of the target data files.
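The two cooperating processes described above, a baseline synchronization plus journal-based replication of byte-level changes, can be illustrated with a small in-memory sketch. It is not the patented implementation: files are modeled as dictionary entries, and the agent intercepts writes directly instead of monitoring a real file system. The names `Change`, `Agent`, `synchronize`, and `apply_journal` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Change:
    """One byte-level change captured by an agent (hypothetical record)."""
    path: str
    offset: int
    data: bytes


def _patch(buf: bytearray, offset: int, data: bytes) -> None:
    """Overwrite bytes at offset, zero-padding if the write starts past the end."""
    if offset > len(buf):
        buf.extend(b"\0" * (offset - len(buf)))
    buf[offset:offset + len(data)] = data


def synchronize(source: dict[str, bytes]) -> dict[str, bytearray]:
    """Build the baseline set of target files from the source files."""
    return {path: bytearray(content) for path, content in source.items()}


class Agent:
    """Records each write to its portion of the source files in a journal."""

    def __init__(self, source: dict[str, bytes]) -> None:
        self.source = source
        self.journal: list[Change] = []

    def write(self, path: str, offset: int, data: bytes) -> None:
        buf = bytearray(self.source[path])
        _patch(buf, offset, data)
        self.source[path] = bytes(buf)
        self.journal.append(Change(path, offset, data))


def apply_journal(target: dict[str, bytearray], journal: list[Change]) -> None:
    """Replay journaled changes on the backup side, then empty the journal."""
    for change in journal:
        _patch(target[change.path], change.offset, change.data)
    journal.clear()


source = {"a.txt": b"hello world"}
target = synchronize(source)            # baseline copy on the backup server
agent = Agent(source)
agent.write("a.txt", 6, b"there")       # byte-level change to the source
apply_journal(target, agent.journal)    # journal shipped and replayed
```

Only the changed bytes travel in the journal, so the target converges on the source without the whole file being copied again.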
TL;DR: This work presents a mechanism to reclaim space from this incidental duplication to make it available for controlled file replication, and includes convergent encryption, which enables duplicate files to be coalesced into the space of a single file, even if the files are encrypted with different users' keys.
Abstract: The Farsite distributed file system provides availability by replicating each file onto multiple desktop computers. Since this replication consumes significant storage space, it is important to reclaim used space where possible. Measurement of over 500 desktop file systems shows that nearly half of all consumed space is occupied by duplicate files. We present a mechanism to reclaim space from this incidental duplication to make it available for controlled file replication. Our mechanism includes: (1) convergent encryption, which enables duplicate files to be coalesced into the space of a single file, even if the files are encrypted with different users' keys; and (2) SALAD, a Self-Arranging Lossy Associative Database for aggregating file content and location information in a decentralized, scalable, fault-tolerant manner. Large-scale simulation experiments show that the duplicate-file coalescing system is scalable, highly effective, and fault-tolerant.
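The idea behind convergent encryption can be sketched as follows: the file is encrypted under a key derived from its own content, so identical files yield identical ciphertext, and only that content key is then protected with each user's key. This is a toy illustration, not Farsite's implementation. The XOR keystream built from SHA-256 stands in for a real cipher and should not be used for actual security, and the function names are hypothetical.

```python
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (stand-in for a real cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a keystream derived from key (its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def convergent_encrypt(plaintext: bytes, user_key: bytes):
    """Return (ciphertext, wrapped_key) for one user's copy of a file."""
    content_key = hashlib.sha256(plaintext).digest()  # key depends only on content
    ciphertext = _xor(plaintext, content_key)         # same file -> same ciphertext
    wrapped_key = _xor(content_key, user_key)         # per-user protection of the key
    return ciphertext, wrapped_key


def convergent_decrypt(ciphertext: bytes, wrapped_key: bytes, user_key: bytes) -> bytes:
    content_key = _xor(wrapped_key, user_key)
    return _xor(ciphertext, content_key)


document = b"quarterly report, final version"
alice_ct, alice_wrapped = convergent_encrypt(document, b"alice-secret-key")
bob_ct, bob_wrapped = convergent_encrypt(document, b"bob-secret-key")
```

Because `alice_ct` and `bob_ct` are byte-for-byte identical, a storage system could keep a single copy of the ciphertext and store only the small wrapped keys per user.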