TL;DR: The design and implementation details of the first malware analysis pipeline specifically tailored for Linux malware are presented and the first large-scale measurement study conducted on 10,548 malware samples is presented documenting detailed statistics and insights that can help directing future work in the area.
Abstract: For the past two decades, the security community has been fighting malicious programs for Windows-based operating systems. However, the recent surge in adoption of embedded devices and the IoT revolution are rapidly changing the malware landscape. Embedded devices are profoundly different than traditional personal computers. In fact, while personal computers run predominantly on x86-flavored architectures, embedded systems rely on a variety of different architectures. In turn, this aspect causes a large number of these systems to run some variants of the Linux operating system, pushing malicious actors to give birth to ""Linux malware."" To the best of our knowledge, there is currently no comprehensive study attempting to characterize, analyze, and understand Linux malware. The majority of resources on the topic are available as sparse reports often published as blog posts, while the few systematic studies focused on the analysis of specific families of malware (e.g., the Mirai botnet) mainly by looking at their network-level behavior, thus leaving the main challenges of analyzing Linux malware unaddressed. This work constitutes the first step towards filling this gap. After a systematic exploration of the challenges involved in the process, we present the design and implementation details of the first malware analysis pipeline specifically tailored for Linux malware. We then present the results of the first large-scale measurement study conducted on 10,548 malware samples (collected over a time frame of one year) documenting detailed statistics and insights that can help directing future work in the area.
TL;DR: This paper systematically reconstructs the lineage of IoT malware families, and track their relationships, evolution, and variants through the use of binary code similarity, on a dataset of more than 93k samples submitted to VirusTotal over a period of 3.5 years.
Abstract: The recent emergence of consumer off-the-shelf embedded (IoT) devices and the rise of large-scale IoT botnets has dramatically increased the volume and sophistication of Linux malware observed in the wild. The security community has put a lot of effort to document these threats but analysts mostly rely on manual work, which makes it difficult to scale and hard to regularly maintain. Moreover, the vast amount of code reuse that characterizes IoT malware calls for an automated approach to detect similarities and identify the phylogenetic tree of each family. In this paper we present the largest measurement of IoT malware to date. We systematically reconstruct – through the use of binary code similarity – the lineage of IoT malware families, and track their relationships, evolution, and variants. We apply our technique on a dataset of more than 93k samples submitted to VirusTotal over a period of 3.5 years. We discuss the findings of our analysis and present several case studies to highlight the tangled relationships of IoT malware.
TL;DR: A forensic analysis of Linux executable and linkable format (ELF) files provides insight into different features that have the potential to discriminate malicious executables from benign ones and shows that ELF-Miner provides more than 99% detection accuracy with less than 0.1% false alarm rate.
Abstract: Linux malware can pose a significant threat—its (Linux) penetration is exponentially increasing—because little is known or understood about Linux OS vulnerabilities. We believe that now is the right time to devise non-signature based zero-day (previously unknown) malware detection strategies before Linux intruders take us by surprise. Therefore, in this paper, we first do a forensic analysis of Linux executable and linkable format (ELF) files. Our forensic analysis provides insight into different features that have the potential to discriminate malicious executables from benign ones. As a result, we can select a features’ set of 383 features that are extracted from an ELF headers. We quantify the classification potential of features using information gain and then remove redundant features by employing preprocessing filters. Finally, we do an extensive evaluation among classical rule-based machine learning classifiers—RIPPER, PART, C4.5 Rules, and decision tree J48—and bio-inspired classifiers—cAnt Miner, UCS, XCS, and GAssist—to select the best classifier for our system. We have evaluated our approach on an available collection of 709 Linux malware samples from vx heavens and offensive computing. Our experiments show that ELF-Miner provides more than 99% detection accuracy with less than 0.1% false alarm rate.
TL;DR: In this article, the authors provide a comprehensive survey on the latest developments in cross-architectural IoT malware detection and classification approaches and discuss the feature representations, feature extraction techniques, and machine learning models employed in the surveyed works.
Abstract: In recent years, the increase in non-Windows malware threats had turned the focus of the cybersecurity community. Research works on hunting Windows PE-based malwares are maturing, whereas the developments on Linux malware threat hunting are relatively scarce. With the advent of the Internet of Things (IoT) era, smart devices that are getting integrated into human life have become a hackers’ highway for their malicious activities. The IoT devices employ various Unix-based architectures that follow ELF (Executable and Linkable Format) as their standard binary file specification. This study aims at providing a comprehensive survey on the latest developments in cross-architectural IoT malware detection and classification approaches. Aided by a modern taxonomy, we discuss the feature representations, feature extraction techniques, and machine learning models employed in the surveyed works. We further provide more insights on the practical challenges involved in cross-architectural IoT malware threat hunting and discuss various avenues to instill potential future research.