Eth2Vec: Learning Contract-Wide Code Representations for Vulnerability Detection on Ethereum Smart Contracts
Nami Ashizawa, Naoto Yanai, Jason Paul Cruz, Shingo Okamura
- 24 May 2021
- pp 47-59
TL;DR: The authors propose Eth2Vec, a machine-learning-based static analysis tool for vulnerability detection in smart contracts that automatically learns features of vulnerable Ethereum Virtual Machine (EVM) bytecodes with tacit knowledge through a neural network for natural language processing.
Abstract: Ethereum smart contracts are programs that run on the Ethereum blockchain, and many smart contract vulnerabilities have been discovered in the past decade. Many security analysis tools have been created to detect such vulnerabilities, but their performance decreases drastically when the code to be analyzed is rewritten. In this paper, we propose Eth2Vec, a machine-learning-based static analysis tool for vulnerability detection in smart contracts. Eth2Vec is also robust against code rewrites, i.e., it can detect vulnerabilities even in rewritten code. Existing machine-learning-based static analysis tools for vulnerability detection require, as inputs, features that analysts create manually. In contrast, Eth2Vec automatically learns features of vulnerable Ethereum Virtual Machine (EVM) bytecodes with tacit knowledge through a neural network for natural language processing. Eth2Vec can therefore detect vulnerabilities in smart contracts by comparing the code similarity between target EVM bytecodes and the EVM bytecodes it has already learned. We conducted experiments with existing open databases, such as Etherscan, and our results show that Eth2Vec outperforms a recent model based on a support vector machine (SVM) in terms of well-known metrics, i.e., precision, recall, and F1-score.
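The abstract compares Eth2Vec with the SVM-based model using precision, recall, and F1-score. For reference, these standard metrics can be computed from detection counts as in the minimal Python sketch below; the function name `precision_recall_f1` and the example counts are hypothetical and are not results from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative counts. Zero denominators yield 0.0."""
    # Precision: share of reported vulnerabilities that are real.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of real vulnerabilities that were reported.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not from the paper).
p, r, f = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Note that F1 can equivalently be written as 2·TP / (2·TP + FP + FN), which is why it drops whenever either false positives or false negatives grow.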
Figures

Fig. 2: Throughput of Eth2Vec: We measured the throughput of each process of Eth2Vec. The Eth2Vec Test item corresponds to vulnerability detection, which covers both the detection itself and saving the detection results. The Summarizing item is the processing time for displaying the saved data. The entire process until a user obtains an output is the sum of EVM Extractor, Eth2Vec Test, and Summarizing. For this measurement, we randomly picked 20 contract files, used a threshold of 0.8, and output at most five clones, i.e., vulnerability candidates, for each function.
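The Fig. 2 setting (a threshold of 0.8 and at most five clones per function) suggests a threshold-plus-top-k similarity lookup. The minimal Python sketch below illustrates that idea under stated assumptions: each function is assumed to be already mapped to a fixed-length vector, cosine similarity is assumed as the similarity measure, and the names `find_clones` and `labeled` and the toy 3-dimensional vectors are hypothetical. This is not Eth2Vec's actual implementation or API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_clones(target, labeled, threshold=0.8, max_clones=5):
    """Return up to max_clones (name, vulnerability, similarity) tuples whose
    similarity to the target vector is at least threshold, most similar first.
    Assumption: similarity is cosine similarity over precomputed vectors."""
    scored = [(name, vuln, cosine(target, vec)) for name, vuln, vec in labeled]
    hits = [s for s in scored if s[2] >= threshold]
    hits.sort(key=lambda s: s[2], reverse=True)
    return hits[:max_clones]


# Hypothetical toy "function embeddings"; a real system would obtain these
# from a trained model rather than writing them by hand.
labeled = [
    ("withdraw_v1", "reentrancy", [0.9, 0.1, 0.0]),
    ("transfer_ok", None, [0.0, 1.0, 0.0]),
    ("withdraw_v2", "reentrancy", [0.8, 0.2, 0.1]),
]
target = [1.0, 0.15, 0.05]

for name, vuln, sim in find_clones(target, labeled):
    print(name, vuln, round(sim, 3))
```

Here both `withdraw_*` entries clear the 0.8 threshold while `transfer_ok` does not. Raising the threshold trades recall for precision, and `max_clones` caps how many candidates a user must review per function.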
TABLE III: List of vulnerabilities: In this paper, we target the following vulnerabilities for the evaluation of Eth2Vec.
Fig. 3: Output example by Eth2Vec: The left side of the window lists the functions contained in each contract, and the right side lists the vulnerabilities included in the selected function.
Fig. 1: Overview of Eth2Vec 
TABLE I: Precision of Clone Detection of Eth2Vec. This table shows the average and standard deviation of precision measured over 10 executions on 10-fold cross-validation. The row "SVM w/o few clones" shows the result of the SVM-based setting computed with few clones removed. The numbers are truncated to one decimal place.
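Table I reports the mean and standard deviation of precision under 10-fold cross-validation. The minimal Python sketch below shows one way to produce such numbers. It assumes that the 10 executions correspond to the 10 folds and that the sample standard deviation is used; the caption does not say either. `k_fold_splits`, `cv_precision`, and `dummy_evaluate` are hypothetical names, and `dummy_evaluate` only stands in for training on one split and measuring clone-detection precision on the other.

```python
import random
import statistics


def k_fold_splits(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, nearly equal folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cv_precision(n, evaluate, k=10, seed=0):
    """Return (mean, sample standard deviation) of the per-fold precision
    values produced by evaluate(train_idx, test_idx)."""
    scores = [evaluate(tr, te) for tr, te in k_fold_splits(n, k, seed)]
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical placeholder: a real evaluator would train on train_idx and
# return the clone-detection precision measured on test_idx.
def dummy_evaluate(train_idx, test_idx):
    return 0.9 if len(test_idx) % 2 == 0 else 0.8


mean, std = cv_precision(n=95, evaluate=dummy_evaluate)
print(f"precision: {mean * 100:.1f} +/- {std * 100:.1f}")
```

The format string rounds to one decimal place, whereas Table I truncates; truncating floats reliably needs care (for example with `decimal.Decimal`), so it is omitted here.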
Citations
Ethereum Smart Contract Analysis Tools: A Systematic Review
01 Jan 2022
TL;DR: In this article, the authors present a systematic review of security analysis tools for smart contracts and highlight challenges and future recommendations in the field of smart contracts, including taint analysis, symbolic execution, and fuzzing.
Smart contract vulnerability detection combined with multi-objective detection
TL;DR: The authors propose a Multiple-Objective Detection Neural Network (MODNN), a more scalable smart contract vulnerability detection tool that supports the parallel detection of multiple vulnerabilities, eliminating the need to train a separate model for each type of vulnerability and significantly reducing time and labor costs.
A novel fraud detection and prevention method for healthcare claim processing using machine learning and blockchain technology
TL;DR: In this article, a decision tree classification algorithm is adopted to classify the original claims dataset, and the extracted knowledge is programmed into an Ethereum smart contract to detect and prevent healthcare fraud.
Dynamic Vulnerability Detection on Smart Contracts Using Machine Learning
Mojtaba Eshghie, Cyrille Artho, Dilian Gurov
- 21 Jun 2021
TL;DR: Dynamit is a monitoring framework that detects reentrancy vulnerabilities in Ethereum smart contracts using only transaction metadata and balance data from the blockchain; the approach requires no domain knowledge, code instrumentation, or special execution environment.
A Novel Smart Contract Vulnerability Detection Method Based on Information Graph and Ensemble Learning
TL;DR: The authors propose an ensemble learning (EL)-based contract vulnerability prediction method that builds on seven different neural networks trained with contract vulnerability data for contract-level vulnerability detection; it achieves higher accuracy and robustness than other data-driven methods in predicting smart contract vulnerabilities.
References
Random Forests
Leo Breiman
- 01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Kai Chen, Greg S. Corrado, Jeffrey Dean
- 16 Jan 2013
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents extensions to the Skip-gram model, which learns high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships; the extensions improve both the quality of the vectors and the training speed.
Ethereum: A Secure Decentralised Generalised Transaction Ledger
Gavin Wood
- 01 Jan 2013
TL;DR: Ethereum is a transactional singleton machine with shared state. It can be seen as a simple application on a decentralised, but singleton, compute resource, and it provides a plurality of resources, each with a distinct state and operating code but able to interact with others through a message-passing framework.
Distributed Representations of Sentences and Documents
Quoc V. Le, Tomas Mikolov
TL;DR: The authors propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents, and achieve new state-of-the-art results on several text classification and sentiment analysis tasks.