TL;DR: This work formally defines a knee for continuous functions using the mathematical concept of curvature, compares that definition against alternatives, and evaluates Kneedle, a general knee-detection approach, for accuracy against existing algorithms on both synthetic and real data sets and for performance in two different applications.
Abstract: Computer systems often reach a point at which the relative cost to increase some tunable parameter is no longer worth the corresponding performance benefit. These "knees" typically represent beneficial points that system designers have long selected to best balance inherent trade-offs. While prior work largely uses ad hoc, system-specific approaches to detect knees, we present Kneedle, a general approach to online and offline knee detection that is applicable to a wide range of systems. We define a knee formally for continuous functions using the mathematical concept of curvature and compare our definition against alternatives. We then evaluate Kneedle's accuracy against existing algorithms on both synthetic and real data sets, and evaluate its performance in two different applications.
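As a rough illustration of the curvature-based definition mentioned in the abstract, the sketch below estimates the standard curvature of a function graph, K(x) = f''(x) / (1 + f'(x)^2)^1.5, with central finite differences and returns the grid point where its magnitude is largest. This is not the Kneedle algorithm itself. The names `curvature` and `knee_by_curvature`, the step size `h`, the grid resolution, and the use of the absolute value are all assumptions made for this example.

```python
def curvature(f, x, h=1e-4):
    """Estimate K(x) = f''(x) / (1 + f'(x)^2)^1.5 with central differences."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)            # first derivative
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)  # second derivative
    return d2 / (1 + d1 * d1) ** 1.5


def knee_by_curvature(f, lo, hi, steps=1000):
    """Return the grid point in [lo, hi] where |curvature| is largest.

    Taking the absolute value is an assumption of this sketch, so that
    concave and convex curves are handled the same way.
    """
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(xs, key=lambda x: abs(curvature(f, x)))


if __name__ == "__main__":
    # Illustrative diminishing-returns curve; analytically, its curvature
    # magnitude peaks at x = 1.
    knee = knee_by_curvature(lambda x: 5 - 1 / x, 0.2, 5.0, steps=480)
    print(f"estimated knee near x = {knee:.2f}")
```

Curvature is defined only for continuous, twice-differentiable functions, so a sketch like this one applies to a known function or a smooth fit rather than directly to noisy discrete measurements.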
TL;DR: This paper describes Haystack, an object storage system optimized for Facebook's Photos application, which provides a less expensive and higher performing solution than the previous approach, which leveraged network attached storage appliances over NFS.
Abstract: This paper describes Haystack, an object storage system optimized for Facebook's Photos application. Facebook currently stores over 260 billion images, which translates to over 20 petabytes of data. Users upload one billion new photos (∼60 terabytes) each week, and Facebook serves over one million images per second at peak. Haystack provides a less expensive and higher performing solution than our previous approach, which leveraged network attached storage appliances over NFS. Our key observation is that this traditional design incurs an excessive number of disk operations because of metadata lookups. We carefully reduce this per-photo metadata so that Haystack storage machines can perform all metadata lookups in main memory. This choice conserves disk operations for reading actual data and thus increases overall throughput.
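The idea of keeping all per-photo metadata in main memory can be sketched with a toy store. The sketch below is a hypothetical illustration, not Haystack's actual design or on-disk format: the class `TinyHaystack`, its methods, and the `(offset, size)` index entries are assumptions made for this example. Blobs are appended to a single volume, and a small in-memory dictionary records where each one lives, so a read needs one memory lookup followed by one read of the data itself.

```python
import io


class TinyHaystack:
    """Toy append-only blob store with an in-memory index (illustrative only)."""

    def __init__(self, volume=None):
        # Any seekable binary file object works; BytesIO stands in for a disk file.
        self.volume = volume if volume is not None else io.BytesIO()
        self.index = {}  # photo_id -> (offset, size), held entirely in memory

    def put(self, photo_id, data):
        offset = self.volume.seek(0, io.SEEK_END)  # append at the end of the volume
        self.volume.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self.index[photo_id]  # metadata lookup touches memory only
        self.volume.seek(offset)
        return self.volume.read(size)        # single read for the actual data


if __name__ == "__main__":
    store = TinyHaystack()
    store.put(1, b"abc")
    store.put(2, b"hello")
    print(store.get(2))
```

Keeping each index entry small is what makes it plausible to hold the index for a very large number of photos in memory, which is the trade-off the abstract describes.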
TL;DR: In this article, the authors propose algorithms that can simultaneously deal with huge datasets and that can find very subtle effects, finding both needles in the haystack and finding very small haystacks that were undetected in previous measurements.
Abstract: Scientific instruments and computer simulations are creating vast data stores that require new scientific methods to analyze and organize the data. Data volumes are approximately doubling each year. Since these new instruments have extraordinary precision, the data quality is also rapidly improving. Analyzing this data to find the subtle effects missed by previous studies requires algorithms that can simultaneously deal with huge datasets and find very subtle effects: both needles in the haystack and very small haystacks that were undetected in previous measurements.
TL;DR: Haystack is a prototype system for the detection of intrusions in multiuser US Air Force computer systems that reduces voluminous system audit trails to short summaries of user behavior, anomalous events, and security incidents.
Abstract: Haystack is a prototype system for the detection of intrusions in multiuser US Air Force computer systems. Haystack reduces voluminous system audit trails to short summaries of user behavior, anomalous events, and security incidents. The system is designed to help the system security officer detect and investigate intrusions, particularly by insiders (authorized users). Haystack's operation is based on behavioral constraints imposed by security policies and on models of typical behavior for user groups and individual users.
TL;DR: Screening of phage-displayed libraries of proteins and peptides has been used to solve an increasing diversity of problems, including identification of binding motifs for much smaller targets and the use of novel screening methods to identify chemical activities.