Instruction2vec: Efficient Preprocessor of Assembly Code to Detect Software Weakness with CNN

doi:10.3390/APP9194086

Open Access • Journal Article

Yongjun Lee, +5 more

- 30 Sep 2019

- Applied Sciences

- Vol. 9, Iss: 19, pp 4086

50

TL;DR: The paper proposes Instruction2vec, an improved static binary analysis technique using machine learning. Experimental results show that the proposed scheme detects software vulnerabilities in assembly code with an accuracy of 91%.

Abstract: Potential software weaknesses, which can lead to exploitable security vulnerabilities, continue to pose a risk to computer systems. According to Common Vulnerabilities and Exposures (CVE), 14,714 vulnerabilities were reported in 2017, more than twice the number reported in 2016. Automated vulnerability detection has been recommended as a way to detect vulnerabilities efficiently. Among detection techniques, static binary analysis detects software weaknesses by matching existing patterns or rules. Because it depends on those patterns and rules, new rules must be written and patched in whenever an unknown vulnerability is encountered. To overcome this limitation, we propose Instruction2vec, an improved static binary analysis technique using machine learning. Our framework consists of two steps: (1) it models assembly code efficiently using Instruction2vec, which is based on Word2vec; and (2) it learns the features of software weakness code through Text-CNN feature extraction, without creating patterns or rules, and detects new software weaknesses. To assess the efficiency of Instruction2vec, we compared the preprocessing performance of three frameworks: Instruction2vec, Word2vec, and Binary2img. We used the Juliet Test Suite, specifically the part related to Common Weakness Enumeration (CWE)-121, for training, and Securely Taking On New Executable Software of Uncertain Provenance (STONESOUP) for testing. Experimental results show that the proposed scheme detects software vulnerabilities in assembly code with an accuracy of 91%.
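The two-step pipeline described in the abstract can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the token layout (an opcode plus two operands, each split into a base slot and an offset slot), the sizes `EMBED_DIM`, `TOKENS_PER_INSN` and `MAX_INSNS`, and every function name are hypothetical. Inputs are assumed to be simple Intel-syntax lines without size prefixes such as `DWORD PTR`. The deterministic pseudo-random vectors stand in for the token embeddings that Instruction2vec would obtain from Word2vec training, and the convolution step follows the general Text-CNN pattern (filters spanning whole rows, ReLU, max-over-time pooling) with random rather than learned weights.

```python
import random
import re

# Hypothetical sizes chosen for illustration; not the paper's settings.
EMBED_DIM = 4         # length of each token vector
TOKENS_PER_INSN = 5   # opcode + 2 operands x (base, offset) slots (assumed layout)
MAX_INSNS = 8         # instructions kept per sample (pad or truncate)
PAD = "<pad>"


def split_operand(op: str) -> list[str]:
    """Split one operand into (base, offset) slots, e.g. '[rbp-0x10]' -> ['rbp', '-0x10']."""
    op = op.strip().lower()
    m = re.fullmatch(r"\[(\w+)([+-]\w+)?\]", op)
    if m:
        return [m.group(1), m.group(2) or PAD]
    return [op, PAD]  # register or immediate: no offset slot


def tokenize(insn: str) -> list[str]:
    """Turn one instruction into exactly TOKENS_PER_INSN tokens (extra operands are dropped)."""
    opcode, _, rest = insn.strip().partition(" ")
    operands = [op for op in rest.split(",") if op.strip()][:2]
    tokens = [opcode.lower()]
    for op in operands:
        tokens += split_operand(op)
    return tokens + [PAD] * (TOKENS_PER_INSN - len(tokens))


def embed(token: str) -> list[float]:
    """Stand-in for a trained Word2vec lookup: a fixed pseudo-random vector per token."""
    if token == PAD:
        return [0.0] * EMBED_DIM
    rng = random.Random(token)  # seeding with the string makes the vector deterministic
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


def encode_function(asm_lines: list[str]) -> list[list[float]]:
    """One row per instruction (concatenated token vectors), zero-padded to MAX_INSNS rows."""
    width = TOKENS_PER_INSN * EMBED_DIM
    rows = [[x for tok in tokenize(line) for x in embed(tok)] for line in asm_lines[:MAX_INSNS]]
    return rows + [[0.0] * width for _ in range(MAX_INSNS - len(rows))]


def conv_max_pool(matrix: list[list[float]], weights: list[list[float]]) -> float:
    """Slide a filter spanning len(weights) whole rows, apply ReLU, then max-over-time pooling."""
    h = len(weights)
    scores = []
    for i in range(len(matrix) - h + 1):
        s = sum(w * x for r in range(h) for w, x in zip(weights[r], matrix[i + r]))
        scores.append(max(0.0, s))
    return max(scores)


def extract_features(asm_lines: list[str], filter_heights=(2, 3, 4), seed: int = 0) -> list[float]:
    """One pooled feature per filter; random weights stand in for learned Text-CNN filters."""
    matrix = encode_function(asm_lines)
    rng = random.Random(seed)
    width = TOKENS_PER_INSN * EMBED_DIM
    features = []
    for h in filter_heights:
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(width)] for _ in range(h)]
        features.append(conv_max_pool(matrix, weights))
    return features
```

For example, `extract_features(["push rbp", "mov rbp, rsp", "mov eax, [rbp-0x10]", "ret"])` returns three non-negative pooled features that a downstream classifier could consume. Giving every token a fixed slot means each instruction has the same width, so a function becomes a fixed-size matrix that a filter can slide over one instruction at a time.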

Citations

•Posted Content

Automated Vulnerability Detection in Source Code Using Deep Representation Learning

R. Russell, +7 more

- 11 Jul 2018

- arXiv: Learning

TL;DR: We developed a fast and scalable vulnerability detection tool based on deep feature representation learning that directly interprets lexed source code.

327

•Journal Article•10.1109/TDSC.2021.3076142

VulDeeLocator: A Deep Learning-based Fine-grained Vulnerability Detector

Zhen Li, +5 more

- 08 Jan 2020

- arXiv: Cryptography and Security

TL;DR: VulDeeLocator (Vulnerability Deep Learning-based Locator), a deep learning-based fine-grained vulnerability detector for C programs with source code, advances the state of the art by simultaneously achieving high detection capability and high locating precision.

139

•Book Chapter•10.1007/978-1-4842-3381-8_8

Using Machine Learning

Molly Maskrey, +1 more

- 01 Jan 2018

TL;DR: The latest developments in AI focus less on hand-coding all possibilities and more on machine learning.

129

•Journal Article•10.1109/ACCESS.2020.3034324

Cyber Resilience in Healthcare Digital Twin on Lung Cancer

Jun Zhang, +5 more

- 28 Oct 2020

- IEEE Access

TL;DR: A new deep neural model is developed to capture bidirectional context relationships among risky code keywords when searching for IoT vulnerabilities in healthcare digital twins; it outperforms state-of-the-art DL-based methods for vulnerability detection.

115

•Journal Article•10.1109/tdsc.2021.3076142

VulDeeLocator: A Deep Learning-Based Fine-Grained Vulnerability Detector

- 01 Jul 2022

- IEEE Transactions on Dependable and Secu...

TL;DR: VulDeeLocator (Vulnerability Deep Learning-based Locator) is a deep learning-based fine-grained vulnerability detector that simultaneously achieves high detection capability and high locating precision.

72

References

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: A large deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art ImageNet classification performance.

88.4K

•Proceedings Article•10.3115/V1/D14-1181

Convolutional Neural Networks for Sentence Classification

Yoon Kim

- 25 Aug 2014

TL;DR: The CNN models improve on the state of the art on 4 of 7 tasks, including sentiment analysis and question classification; a modification to the architecture is also proposed that allows both task-specific and static word vectors to be used.

16.1K

•Posted Content

Convolutional Neural Networks for Sentence Classification

Yoon Kim

- 25 Aug 2014

- arXiv: Computation and Language

TL;DR: CNNs are trained on top of pre-trained word vectors for sentence-level classification tasks, and a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks.

7.8K

•Posted Content

word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method.

Yoav Goldberg, +1 more

- 15 Feb 2014

- arXiv: Computation and Language

TL;DR: This note is an attempt to explain equation (4) (negative sampling) in "Distributed Representations of Words and Phrases and their Compositionality" by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean.

1.7K

•Proceedings Article•10.1145/2016904.2016908

Malware images: visualization and automatic classification

Lakshmanan Nataraj, +3 more

- 20 Jul 2011

TL;DR: Preliminary experimental results are promising, with 98% classification accuracy on a malware database of 9,458 samples from 25 malware families; the technique also shows resilience to popular obfuscation techniques such as section encryption.

1.1K
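The Binary2img baseline named in the abstract treats a binary as a grayscale picture, in the spirit of the malware-image approach of Nataraj et al. listed above. The following is a minimal sketch, not the configuration used in the comparison: the fixed row width of 16 bytes and the zero padding of the last row are assumptions (the malware-image work ties the width to file size, and the Binary2img settings are not stated here), and `bytes_to_image` and `to_pgm` are hypothetical names. Each byte becomes one pixel with intensity 0 to 255, and the result can be written out as a binary PGM (P5) image for inspection.

```python
def bytes_to_image(data: bytes, width: int = 16) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` grayscale pixels (0-255), zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = [list(data[i:i + width]) for i in range(0, len(data), width)]
    if rows and len(rows[-1]) < width:
        rows[-1].extend([0] * (width - len(rows[-1])))
    return rows


def to_pgm(rows: list[list[int]]) -> bytes:
    """Encode pixel rows as a binary PGM (P5) grayscale image with maxval 255."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(p for row in rows for p in row)
```

For instance, `bytes_to_image(bytes(range(40)))` yields three rows of 16 pixels, the last one padded with eight zeros. Unlike an instruction-level encoding, this view ignores instruction boundaries entirely, which is one reason it makes a useful preprocessing baseline.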