1. How do cyber criminals use the DGA technique?
Cyber criminals use the Domain Generation Algorithm (DGA) technique to generate a large number of pseudo-random malicious domain names within a short period of time. They then use one of these domain names to resolve, through the Domain Name System (DNS), the address of the Command and Control (C&C) server and establish a secure communication channel with the attacker. Once this channel is established, the malware exchanges data and instructions with the attacker. The attacker then seizes complete control of the compromised system and spreads malware (either botnet malware or ransomware). The compromised system or network is then used to target one or more computers within the network in order to steal confidential data, disable or hijack the system, or launch further attacks. These attacks can include distributed denial-of-service attacks, man-in-the-middle attacks, phishing attacks, and SQL injection attacks.
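To make the idea of pseudo-random domain generation concrete, here is a minimal toy sketch in Python. It is not the algorithm of any real malware family: the seed string, the hash-based construction, the `generate_domains` name, and the reserved `.example` suffix are all assumptions chosen for illustration. It only shows why both sides can derive the same candidate list from a shared seed and date, which is also what gives such names their random-looking character makeup.

```python
import hashlib
from datetime import date


def generate_domains(seed: str, day: date, count: int = 5,
                     length: int = 12, tld: str = ".example") -> list[str]:
    """Toy DGA: derive `count` deterministic, random-looking names.

    Hypothetical illustration only. The same (seed, day) pair always
    yields the same list, so two parties sharing the seed can compute
    identical candidates without communicating.
    """
    domains = []
    for i in range(count):
        material = f"{seed}-{day.isoformat()}-{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map each hex digit (0-15) to a letter in the range a-p.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in generate_domains("demo-seed", date(2024, 1, 1)):
        print(name)
```

Because the output depends only on the seed and the date, blocking a single name has little effect: the list changes every day, which is why detection tends to focus on the characteristics of the names rather than on fixed blocklists.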
2. What is the approach used in DGA detection?
The approach used in DGA detection is an ensemble machine learning approach that covers both botnet and ransomware DGA malware. It uses the Cisco Umbrella list of the top 1 million most visited domain names as a training dataset, together with attributes extracted from the domain name data itself. The methodology follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) model. Ten features are extracted from the domain name data, including length, numbers, and special characters. Four machine learning models are deployed: Naive Bayes, support vector machines, random forest, and a classification and regression tree model. The models are trained with 4 and with 10 features on 300,000 randomly selected training samples. They are evaluated using 80% of the data for training and 20% for validation, with 10-fold cross-validation for efficiency. The speed-accuracy trade-off is considered in choosing the best-fit model, balancing execution time against prediction accuracy.
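As a rough sketch of the feature-extraction step, the Python snippet below computes the three features named above (length, numbers, and special characters) from a domain name. The exact definitions used in the study are not given here, so the choices below are assumptions: the function name `extract_features` is hypothetical, digits are counted per character, and every non-alphanumeric character (including dots and hyphens) counts as special. The remaining seven features are not listed in this summary and are therefore not reproduced.

```python
def extract_features(domain: str) -> dict[str, int]:
    """Compute three lexical features from a domain name.

    Assumed definitions (not confirmed by the source):
      length  - number of characters after normalisation
      digits  - number of digit characters
      special - number of non-alphanumeric characters ('.', '-', ...)
    """
    name = domain.strip().lower().rstrip(".")  # drop any trailing root dot
    return {
        "length": len(name),
        "digits": sum(ch.isdigit() for ch in name),
        "special": sum(not ch.isalnum() for ch in name),
    }


if __name__ == "__main__":
    for d in ("example.com", "x7k2-q9zt1b4.example"):
        print(d, extract_features(d))
```

Feature rows of this kind can then be fed to any of the four classifiers; keeping the features purely lexical means no DNS lookups are needed at prediction time.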
3. Which machine learning model performed best in terms of accuracy and execution time?
The Naive Bayes model performed best in terms of accuracy and execution time, with an accuracy of more than 90% and no malicious domain names wrongly classified as benign. This model is recommended for deployment because of its speed and accuracy. Future work could involve using live domain name data as training data to further improve the model's performance.
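The two quantities reported above, accuracy and the number of malicious names wrongly classified as benign (false negatives), can be computed as in the following minimal sketch. The `evaluate` helper and the label strings are hypothetical, and the sample labels are made up purely to exercise the function; they are not results from the study.

```python
def evaluate(y_true: list[str], y_pred: list[str],
             positive: str = "malicious") -> dict[str, float]:
    """Return accuracy plus false-negative and false-positive counts.

    A false negative is a malicious name predicted as benign; a false
    positive is a benign name predicted as malicious.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    false_negatives = sum(t == positive and p != positive for t, p in pairs)
    false_positives = sum(t != positive and p == positive for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "false_negatives": false_negatives,
        "false_positives": false_positives,
    }


if __name__ == "__main__":
    # Illustrative labels only, not data from the study.
    truth = ["malicious", "malicious", "benign", "benign"]
    preds = ["malicious", "malicious", "malicious", "benign"]
    print(evaluate(truth, preds))
```

Reporting false negatives separately from accuracy matters here because a missed malicious domain is usually more costly than a benign domain flagged by mistake.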