Open Access • Posted Content
Asymptotic Distribution-Free Change-Point Detection for Modern Data
Lynna Chu, Hao Chen +1 more
- 01 Jul 2017
TL;DR: Analytic p-value approximations to the significance of the new test statistics for the single change-point alternative and changed interval alternative are derived, making the new approaches easy off-the-shelf tools for large datasets.
Abstract: We consider the testing and estimation of change-points, locations where the distribution abruptly changes, in a sequence of multivariate or non-Euclidean observations. We study a nonparametric framework that utilizes similarity information among observations, which can be applied to various data types as long as an informative similarity measure on the sample space can be defined. The existing approach along this line has low power and/or biased estimates for change-points under some common scenarios. We address these problems by considering new tests based on similarity information. Simulation studies show that the new approaches exhibit substantial improvements in detecting and estimating change-points. In addition, under some mild conditions, the new test statistics are asymptotically distribution free under the null hypothesis of no change. Analytic p-value approximations to the significance of the new test statistics for the single change-point alternative and changed interval alternative are derived, making the new approaches easy off-the-shelf tools for large datasets. The new approaches are illustrated in an analysis of New York taxi data.
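The general idea of scanning for a change-point with similarity information can be illustrated with a minimal sketch. It is not the paper's new test statistics: it uses the basic between-sample edge count on a k-nearest-neighbor graph, and it calibrates significance by permutation rather than by the analytic p-value approximations mentioned in the abstract. The function names (`knn_edges`, `between_counts`, `scan_change_point`), the Euclidean distance, and the parameters `k`, `margin` and `n_perm` are all assumptions made for illustration.

```python
import numpy as np

def knn_edges(x, k):
    """Undirected k-NN graph on the rows of x, as an (m, 2) array of pairs i < j."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # Euclidean (assumed)
    np.fill_diagonal(d, np.inf)              # an observation is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]
    edges = {(min(i, int(j)), max(i, int(j))) for i in range(len(x)) for j in nbrs[i]}
    return np.array(sorted(edges))

def between_counts(edges, order, splits):
    """For each split t, count edges joining the first t time positions to the rest."""
    pos = np.empty(len(order), dtype=int)
    pos[order] = np.arange(len(order))       # pos[i] = time position of observation i
    a = np.minimum(pos[edges[:, 0]], pos[edges[:, 1]])
    b = np.maximum(pos[edges[:, 0]], pos[edges[:, 1]])
    return np.array([np.sum((a < t) & (b >= t)) for t in splits])

def scan_change_point(x, k=5, margin=10, n_perm=200, seed=0):
    """Return (estimated split, max standardized statistic, permutation p-value)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    edges = knn_edges(x, k)
    splits = np.arange(margin, n - margin + 1)
    obs = between_counts(edges, np.arange(n), splits)
    # Null reference: shuffle the time order while keeping the graph fixed.
    perm = np.array([between_counts(edges, rng.permutation(n), splits)
                     for _ in range(n_perm)])
    mu, sd = perm.mean(axis=0), perm.std(axis=0) + 1e-12
    # Fewer between-sample edges than expected is evidence of a change.
    z_obs = (mu - obs) / sd
    z_perm = (mu - perm) / sd
    stat = z_obs.max()
    p_value = (1 + np.sum(z_perm.max(axis=1) >= stat)) / (1 + n_perm)
    return int(splits[z_obs.argmax()]), float(stat), float(p_value)
```

A possible usage, on synthetic data with a mean shift after the 60th observation, is `scan_change_point(np.vstack([rng.normal(0, 1, (60, 5)), rng.normal(3, 1, (40, 5))]))` with `rng = np.random.default_rng(1)`. Permutation calibration is chosen here only because it needs no distributional theory; it is far slower on large datasets than an analytic approximation would be.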
Figures

TABLE III THE EMPIRICAL POWER (DETECTION ACCURACY) IN PERCENT UNDER SETTINGS I-III. HERE Mg-NN IS THE PROPOSED METHOD. 
TABLE II THE SPECIFIC CHANGES FOR DIFFERENT SETTINGS AND ALTERNATIVES. 
TABLE VI THE EMPIRICAL POWER WITH THE AVERAGE CHANGE-POINT ESTIMATION ERROR IN PARENTHESES. Mg-NN IS THE PROPOSED METHOD. THE METHOD WITH THE HIGHEST EMPIRICAL POWER IS HIGHLIGHTED IN BOLD. IF TWO METHODS HAVE THE SAME POWER, CHOOSE THE ONE WITH THE SMALLER AVERAGE CHANGE-POINT ESTIMATION ERROR. 
Fig. 9. Density heatmap of taxi pick-ups for dates 01/01 and 02/01 in year 2014.
TABLE I EMPIRICAL SIZE OF Mg-NN AFTER SKEWNESS CORRECTION AT 0.05 NOMINAL LEVEL WITH n = 1000 UNDER SETTINGS (I), (II) AND (III). THE k-NNG FOR VARIOUS k’S IS CONSIDERED. HERE k1 = [n^0.5], k2 = [n^0.65] AND k3 = [n^0.8].
Fig. 6. Empirical power of Mg-NN, GET, and MET over 1000 repetitions under each setting.
Citations
Optimal change point detection and localization in sparse dynamic networks
TL;DR: This work proposes a computationally simple novel algorithm for network change point localization, called Network Binary Segmentation, which relies on weighted averages of the adjacency matrices, and devises a more sophisticated algorithm based on singular value thresholding, called Local Refinement, that delivers more accurate estimates of the change point locations.
•Posted Content
Optimal Change Point Detection and Localization in Sparse Dynamic Networks
TL;DR: In this paper, the authors study the problem of change point localization in dynamic networks, where the underlying distributions of the adjacency matrices are piecewise constant and may change over a subset of the time points, called change points.
•Proceedings Article
Counting Motifs with Graph Sampling.
Jason M. Klusowski, Yihong Wu +1 more
- 03 Jul 2018
TL;DR: This paper quantifies how much more informative neighborhood sampling is than subgraph sampling, and proposes a family of estimators, encompassing and outperforming the Horvitz-Thompson estimator and achieving the sampling ratio.
•Posted Content
Change point localization in dependent dynamic nonparametric random dot product graphs
TL;DR: This paper proposes a novel change point detection algorithm built on a nonparametric version of the CUSUM statistic that allows for temporal dependence; the method is supported by theoretical results and by extensive numerical experiments, which illustrate state-of-the-art performance.
•Posted Content
$\alpha$-Ball divergence and its applications to change-point problems for Banach-valued sequences
Qiang Zhang, Wenliang Pan, Xin Chen, Xueqin Wang +3 more
- 05 Aug 2018
TL;DR: In this paper, a measure of divergence between two distributions, namely Ball divergence, is extended to a new one, called $\alpha$-Ball divergence, which can be used to test whether two weakly dependent sequences of Banach-valued random vectors have the same distribution.
References
•Posted Content
A Fast and Efficient Change-point Detection Framework for Modern Data
Yi-Wei Liu, Hao Chen +1 more
TL;DR: A novel approach that uses approximate nearest neighbor information of the observations is proposed, and an analytic formula to control the type I error is derived. The approach also incorporates a useful pattern of high-dimensional data so that the proposed method can detect various types of changes in the sequence.
Likelihood Ratio Tests for a Change in the Multivariate Normal Mean
TL;DR: In this article, a sequence of independent multivariate normal vectors with equal but possibly unknown variance matrices is hypothesized to have equal mean vectors, and the authors test this against the alternative that the mean vectors changed after an unknown point in the sequence.
•Posted Content
Seeded Binary Segmentation: A general methodology for fast and optimal change point detection.
TL;DR: This work shows that seeded binary segmentation leads to a near-linear time approach (i.e. linear up to a logarithmic factor) independent of the underlying number of change points, and demonstrates the methodology for high-dimensional settings with an inverse covariance change point detection problem.
Detecting simultaneous changepoints in multiple sequences.
TL;DR: It is shown using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants and the multisample segmentation algorithm is applied to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations.
Consistent and powerful graph-based change-point test for high-dimensional data.
TL;DR: A distribution-free graph-based change-point detection method for high-dimensional data, using a Bayesian-type statistic based on the shortest Hamiltonian path, is proposed and proven to be consistent.