TL;DR: This paper presents a general detection framework that is independent of botnet C&C protocol and structure, and requires no a priori knowledge of botnets (such as captured bot binaries and hence the botnet signatures, and C &C server names/addresses).
Abstract: Botnets are now the key platform for many Internet attacks, such as spam, distributed denial-of-service (DDoS), identity theft, and phishing. Most of the current botnet detection approaches work only on specific botnet command and control (C&C) protocols (e.g., IRC) and structures (e.g., centralized), and can become ineffective as botnets change their C&C techniques. In this paper, we present a general detection framework that is independent of botnet C&C protocol and structure, and requires no a priori knowledge of botnets (such as captured bot binaries and hence the botnet signatures, and C&C server names/addresses). We start from the definition and essential properties of botnets. We define a botnet as a coordinated group of malware instances that are controlled via C&C communication channels. The essential properties of a botnet are that the bots communicate with some C&C servers/peers, perform malicious activities, and do so in a similar or correlated way. Accordingly, our detection framework clusters similar communication traffic and similar malicious traffic, and performs cross cluster correlation to identify the hosts that share both similar communication patterns and similar malicious activity patterns. These hosts are thus bots in the monitored network. We have implemented our BotMiner prototype system and evaluated it using many real network traces. The results show that it can detect real-world botnets (IRC-based, HTTP-based, and P2P botnets including Nugache and Storm worm), and has a very low false positive rate.
TL;DR: This paper proposes an approach that uses network-based anomaly detection to identify botnet C&C channels in a local area network without any prior knowledge of signatures or C &C server addresses, and shows that BotSniffer can detect real-world botnets with high accuracy and has a very low false positive rate.
Abstract: Botnets are now recognized as one of the most serious security threats. In contrast to previous malware, botnets have the characteristic of a command and control (C&C) channel. Botnets also often use existing common protocols, e.g., IRC, HTTP, and in protocol-conforming manners. This makes the detection of botnet C&C a challenging problem. In this paper, we propose an approach that uses network-based anomaly detection to identify botnet C&C channels in a local area network without any prior knowledge of signatures or C&C server addresses. This detection approach can identify both the C&C servers and infected hosts in the network. Our approach is based on the observation that, because of the pre-programmed activities related to C&C, bots within the same botnet will likely demonstrate spatial-temporal correlation and similarity. For example, they engage in coordinated communication, propagation, and attack and fraudulent activities. Our prototype system, BotSniffer, can capture this spatial-temporal correlation in network traffic and utilize statistical algorithms to detect botnets with theoretical bounds on the false positive and false negative rates. We evaluated BotSniffer using many real-world network traces. The results show that BotSniffer can detect real-world botnets with high accuracy and has a very low false positive rate.
TL;DR: A new kind of network perimeter monitoring strategy, which focuses on recognizing the infection and coordination dialog that occurs during a successful malware infection, and contrast this strategy to other intrusion detection and alert correlation methods.
Abstract: We present a new kind of network perimeter monitoring strategy, which focuses on recognizing the infection and coordination dialog that occurs during a successful malware infection BotHunter is an application designed to track the two-way communication flows between internal assets and external entities, developing an evidence trail of data exchanges that match a state-based infection sequence model BotHunter consists of a correlation engine that is driven by three malware-focused network packet sensors, each charged with detecting specific stages of the malware infection process, including inbound scanning, exploit usage, egg downloading, outbound bot coordination dialog, and outbound attack propagation The BotHunter correlator then ties together the dialog trail of inbound intrusion alarms with those outbound communication patterns that are highly indicative of successful local host infection When a sequence of evidence is found to match BotHunter's infection dialog model, a consolidated report is produced to capture all the relevant events and event sources that played a role during the infection process We refer to this analytical strategy of matching the dialog flows between internal assets and the broader Internet as dialog-based correlation, and contrast this strategy to other intrusion detection and alert correlation methods We present our experimental results using BotHunter in both virtual and live testing environments, and discuss our Internet release of the BotHunter prototype BotHunter is made available both for operational use and to help stimulate research in understanding the life cycle of malware infections
TL;DR: This paper attempts to clear the fog surrounding botnets by constructing a multifaceted and distributed measurement infrastructure, which shows that botnets represent a major contributor to unwanted Internet traffic and provides deep insights that may facilitate further research to curtail this phenomenon.
Abstract: The academic community has long acknowledged the existence of malicious botnets, however to date, very little is known about the behavior of these distributed computing platforms. To the best of our knowledge, botnet behavior has never been methodically studied, botnet prevalence on the Internet is mostly a mystery, and the botnet life cycle has yet to be modeled. Uncertainty abounds. In this paper, we attempt to clear the fog surrounding botnets by constructing a multifaceted and distributed measurement infrastructure. Throughout a period of more than three months, we used this infrastructure to track 192 unique IRC botnets of size ranging from a few hundred to several thousand infected end-hosts. Our results show that botnets represent a major contributor to unwanted Internet traffic - 27% of all malicious connection attempts observed from our distributed darknet can be directly attributed to botnet-related spreading activity. Furthermore, we discovered evidence of botnet infections in 11% of the 800,000 DNS domains we examined, indicating a high diversity among botnet victims. Taken as a whole, these results not only highlight the prominence of botnets, but also provide deep insights that may facilitate further research to curtail this phenomenon.
TL;DR: This work presents work on using machine learning-based classification techniques to identify the command and control (C2) traffic of IRC-based botnets - compromised hosts that are collectively commanded using Internet relay chat (IRC).
Abstract: To date, techniques to counter cyber-attacks have predominantly been reactive; they focus on monitoring network traffic, detecting anomalies and cyber-attack traffic patterns, and, a posteriori, combating the cyber-attacks and mitigating their effects. Contrary to such approaches, we advocate proactively detecting and identifying botnets prior to their being used as part of a cyber-attack (Strayer et al., 2006). In this paper, we present our work on using machine learning-based classification techniques to identify the command and control (C2) traffic of IRC-based botnets - compromised hosts that are collectively commanded using Internet relay chat (IRC). We split this task into two stages: (I) distinguishing between IRC and non-IRC traffic, and (II) distinguishing between botnet and real IRC traffic. For stage I, we compare the performance of J48, naive Bayes, and Bayesian network classifiers, identify the features that achieve good overall classification accuracy, and determine the classification sensitivity to the training set size. While sensitive to the training data and the attributes used to characterize communication flows, machine learning-based classifiers show promise in identifying IRC traffic. Using classification in stage II is trickier, since accurately labeling IRC traffic as botnet and non-botnet is challenging. We are currently exploring labeling flows as suspicious and non-suspicious based on telltales of hosts being compromised