A Lightweight Cross-Version Binary Code Similarity Detection Based on Similarity and Correlation Coefficient Features
Hui Guo, Shu-guang Huang, Cheng Huang, Min Zhang, Zulie Pan, Fan Shi, Hui Huang, Donghui Hu, Xiaoping Wang
TL;DR: A new, lightweight method for the cross-version BCSD problem is proposed: it transforms binary functions into vectors and signals, then computes similarity coefficient and correlation coefficient values over multiple features.
Abstract: Binary code similarity detection (BCSD) has been applied in many fields, such as malware detection, plagiarism detection, and vulnerability search. Existing solutions to the BCSD problem usually either compare specific features between binaries based on the control flow graphs (CFGs) of their functions, or compute embedding vectors of binary functions and solve the problem with deep learning algorithms. In this paper, from another research perspective, we propose a new and lightweight method to solve the cross-version BCSD problem based on multiple features. It transforms binary functions into vectors and signals and computes their similarity coefficient and correlation coefficient values. Without relying on the CFGs of functions, deep learning algorithms, or other related attributes, our method works directly on the raw bytes of each binary, and it can serve as an alternative method for coping with the various complex situations that exist in real-world environments. We implement the method and evaluate it on a custom dataset of about 423,282 samples. The results show that the method performs well in the cross-version BCSD field: its recall reaches 96.63%, which is almost the same as that of the state-of-the-art static solution.
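The abstract does not spell out which features or coefficients are used, so the following is only a minimal sketch of the general idea: treat a function's raw bytes both as a vector and as a signal, then score a pair of functions with a similarity coefficient and a correlation coefficient. The byte-frequency histogram, the choice of cosine similarity and Pearson correlation, the truncation to a common length, and every function name below are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def byte_histogram(code: bytes) -> list[float]:
    """Vector view (assumed): normalized frequency of each of the 256 byte values."""
    counts = Counter(code)
    total = len(code) or 1  # avoid division by zero for empty input
    return [counts.get(b, 0) / total for b in range(256)]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Similarity coefficient (assumed to be cosine similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson_correlation(x: list[int], y: list[int]) -> float:
    """Correlation coefficient (assumed to be Pearson's r) of two byte signals.

    The signals are truncated to their common length, which is a
    simplification chosen for this sketch.
    """
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    x, y = x[:n], y[:n]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def compare_functions(f1: bytes, f2: bytes) -> tuple[float, float]:
    """Return (similarity coefficient, correlation coefficient) for two functions."""
    similarity = cosine_similarity(byte_histogram(f1), byte_histogram(f2))
    correlation = pearson_correlation(list(f1), list(f2))
    return similarity, correlation
```

Under these assumptions, two versions of the same function would be expected to score high on both values, and a pair could be flagged as a match when both exceed chosen thresholds. How the two values are actually combined in the paper is not stated in the abstract.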
Citations
•Posted Content
Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned.
TL;DR: This is the first systematic study on the basic features used in BCSA by leveraging interpretable feature engineering on a large-scale benchmark and shows that a simple interpretable model with a few basic features can achieve a comparable result to that of recent deep learning-based approaches.
Definition, approaches, and analysis of code duplication detection (2006–2020): a critical review
TL;DR: A critical review of previous works on code duplication for code clone and plagiarism detection is performed and future research direction of code duplication is described.
Revisiting Binary Code Similarity Analysis Using Interpretable Feature Engineering and Lessons Learned
TL;DR: A systematic study of the basic features used in binary code similarity analysis (BCSA), conducted by leveraging interpretable feature engineering on a large-scale benchmark.
A Review of Deep Learning-Based Binary Code Similarity Analysis
Jiang Du, Qiang Wei, Yisen Wang, Xiangjie Sun
TL;DR: This study summarizes the research patterns and development trends in binary code similarity analysis (BCSA), thereby proposing potential directions for future research.
HGE-BVHD: Heterogeneous Graph Embedding Scheme of Complex Structure Functions for Binary Vulnerability Homology Discrimination
Jiyuan Xing, Senlin Luo, Limin Pan, Jingwei Hao, Yingdan Guan, Zhouting Wu
TL;DR: This paper proposes HGE-BVHD, a novel method for binary vulnerability homology discrimination, addressing key problems in homologous function detection, including structurally complex functions, cross-architecture programs, and false positives, achieving state-of-the-art results.
References
•Proceedings Article
Signature Verification using a "Siamese" Time Delay Neural Network
Jane Bromley, Isabelle Guyon, Yann LeCun, E. Sackinger, Roopak Shah
- 29 Nov 1993
TL;DR: An algorithm for verification of signatures written on a pen-input tablet based on a novel, artificial neural network called a "Siamese" neural network, which consists of two identical sub-networks joined at their outputs.
Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection
Xiaojun Xu, Chang Liu, Qian Feng, Heng Yin, Le Song, Dawn Song
- 30 Oct 2017
TL;DR: This work proposes a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function; the evaluation shows that the resulting prototype, Gemini, outperforms state-of-the-art approaches by large margins in similarity detection accuracy.
•Proceedings Article
Discriminative embeddings of latent variable models for structured data
Hanjun Dai, Bo Dai, Le Song
- 19 Jun 2016
TL;DR: structure2vec is an effective and scalable approach for structured data representation, based on the idea of embedding latent variable models into feature spaces and learning such feature spaces using discriminative information.
Scalable Graph-based Bug Search for Firmware Images
Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, Heng Yin
- 24 Oct 2016
TL;DR: A new bug search scheme is proposed that addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy; it is implemented as a bug search engine, Genius, and compared with state-of-the-art bug search approaches.
Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization
Steven H. H. Ding, Benjamin C. M. Fung, Philippe Charland
- 19 May 2019
TL;DR: An assembly code representation learning model that can find and incorporate rich semantic relationships among tokens appearing in assembly code and significantly outperforms existing methods against changes introduced by obfuscation and optimizations is developed.