TL;DR: In this paper, a procedure, jet trimming, is proposed to mitigate the sources of contamination in jets initiated by light partons, which is complimentary to existing methods developed for boosted heavy particles.
Abstract: Initial state radiation, multiple interactions, and event pileup can contaminate jets and degrade event reconstruction Here we introduce a procedure, jet trimming, designed to mitigate these sources of contamination in jets initiated by light partons This procedure is complimentary to existing methods developed for boosted heavy particles We find that jet trimming can achieve significant improvements in event reconstruction, especially at high energy/luminosity hadron colliders like the LHC
TL;DR: The basic idea is to iteratively train the network to a certain performance criterion, compute a measure of relevance that identifies which input or hidden units are most critical to performance, and automatically trim the least relevant units.
Abstract: This paper proposes a means of using the knowledge in a network to determine the functionality or relevance of individual units, both for the purpose of understanding the network's behavior and improving its performance. The basic idea is to iteratively train the network to a certain performance criterion, compute a measure of relevance that identifies which input or hidden units are most critical to performance, and automatically trim the least relevant units. This skeletonization technique can be used to simplify networks by eliminating units that convey redundant information; to improve learning performance by first learning with spare hidden units and then trimming the unnecessary ones away, thereby constraining generalization; and to understand the behavior of networks in terms of minimal "rules."
TL;DR: In this paper, the authors propose a means of using the knowledge in a network to determine the functionality or relevance of individual units, both for the purpose of understanding the network's behavior and improving its performance.
Abstract: This paper proposes a means of using the knowledge in a network to determine the functionality or relevance of individual units, both for the purpose of understanding the network's behavior and improving its performance. The basic idea is to iteratively train the network to a certain performance criterion, compute a measure of relevance that identifies which input or hidden units are most critical to performance, and automatically trim the least relevant units. This skeletonization technique can be used to simplify networks by eliminating units that convey redundant information; to improve learning performance by first learning with spare hidden units and then trimming the unnecessary ones away, thereby constraining generalization; and to understand the behavior of networks in terms of minimal "rules."
TL;DR: Although trimming can improve inferences in some settings, in order to consistently improve the performance of propensity score weighting, analysts should focus on the procedures leading to the generation of weights rather than relying on ad-hoc methods such as weight trimmed.
Abstract: Propensity score weighting is sensitive to model misspecification and outlying weights that can unduly influence results. The authors investigated whether trimming large weights downward can improve the performance of propensity score weighting and whether the benefits of trimming differ by propensity score estimation method. In a simulation study, the authors examined the performance of weight trimming following logistic regression, classification and regression trees (CART), boosted CART, and random forests to estimate propensity score weights. Results indicate that although misspecified logistic regression propensity score models yield increased bias and standard errors, weight trimming following logistic regression can improve the accuracy and precision of final parameter estimates. In contrast, weight trimming did not improve the performance of boosted CART and random forests. The performance of boosted CART and random forests without weight trimming was similar to the best performance obtainable by weight trimmed logistic regression estimated propensity scores. While trimming may be used to optimize propensity score weights estimated using logistic regression, the optimal level of trimming is difficult to determine. These results indicate that although trimming can improve inferences in some settings, in order to consistently improve the performance of propensity score weighting, analysts should focus on the procedures leading to the generation of weights (i.e., proper specification of the propensity score model) rather than relying on ad-hoc methods such as weight trimming.
TL;DR: Nine different trimming algorithms are evaluated in four datasets and three common NGS-based applications (RNA-Seq, SNP calling and genome assembly) to increase the quality and reliability of the analysis, with concurrent gains in terms of execution time and computational resources needed.
Abstract: Next Generation Sequencing is having an extremely strong impact in biological and medical research and diagnostics, with applications ranging from gene expression quantification to genotyping and genome reconstruction. Sequencing data is often provided as raw reads which are processed prior to analysis 1 of the most used preprocessing procedures is read trimming, which aims at removing low quality portions while preserving the longest high quality part of a NGS read. In the current work, we evaluate nine different trimming algorithms in four datasets and three common NGS-based applications (RNA-Seq, SNP calling and genome assembly). Trimming is shown to increase the quality and reliability of the analysis, with concurrent gains in terms of execution time and computational resources needed.