Xiaoting He
Microsoft
6 Papers
Xiaoting He is an academic researcher at Microsoft who has contributed to research on topics including computer science and triage. The author has an h-index of 5 and has co-authored 5 publications. Previous affiliations include the Chinese Academy of Sciences.
Papers
Robust log-based anomaly detection on unstable log data
Xu Zhang, Yong Xu, Qingwei Lin, Bo Qiao, Hongyu Zhang, Yingnong Dang, Chunyu Xie, Xinsheng Yang, Qian Cheng, Ze Li, Junjie Chen, Xiaoting He, Randolph Yao, Jian-Guang Lou, Murali Chintalapati, Furao Shen, Dongmei Zhang
12 Aug 2019
TL;DR: Experimental results show that the proposed log-based anomaly detection approach, LogRobust, effectively addresses log instability and achieves accurate, robust results on real-world, ever-changing log data.
577 citations
An empirical investigation of incident triage for online service systems
Junjie Chen, Xiaoting He, Qingwei Lin, Yong Xu, Hongyu Zhang, Dan Hao, Feng Gao, Zhangwei Xu, Yingnong Dang, Dongmei Zhang
27 May 2019
TL;DR: An empirical study of incident triage on 20 large-scale online service systems at Microsoft finds that incident reports are frequently assigned incorrectly, incurring unnecessary cost, especially for high-severity incidents.
106 citations
Continuous incident triage for large-scale online service systems
Junjie Chen, Xiaoting He, Qingwei Lin, Hongyu Zhang, Dan Hao, Feng Gao, Zhangwei Xu, Yingnong Dang, Dongmei Zhang
10 Nov 2019
TL;DR: DeepCT, a Deep-learning-based approach to automated Continuous incident Triage, incorporates a novel GRU-based (Gated Recurrent Unit) model with an attention-based mask strategy and a revised loss function, allowing it to incrementally learn from discussions and update incident-triage results.
94 citations
How incidental are the incidents?: characterizing and prioritizing incidents for large-scale online service systems
Junjie Chen, Shu Zhang, Xiaoting He, Qingwei Lin, Hongyu Zhang, Dan Hao, Yu Kang, Feng Gao, Zhangwei Xu, Yingnong Dang, Dongmei Zhang
21 Dec 2020
TL;DR: The first large-scale empirical analysis of incidents collected from 18 real-world online service systems at Microsoft finds that although a large number of incidents can occur within a short period, many of them do not actually matter; that is, after manually identifying their root causes, engineers do not fix them with high priority.
48 citations
Identifying linked incidents in large-scale online service systems
Yujun Chen, Xian Yang, Hang Dong, Xiaoting He, Hongyu Zhang, Qingwei Lin, Junjie Chen, Pu Zhao, Yu Kang, Feng Gao, Zhangwei Xu, Dongmei Zhang
08 Nov 2020
TL;DR: This work investigates the incidents and their links in a representative real-world incident management (IcM) system and proposes LiDAR (Linked Incident identification with DAta-driven Representation), a deep learning based approach to incident linking.