Leveraging the power of Machine Learning as a tool, we delve into the realm of app permissions to discern the true nature of applications, whether they harbor malicious or benign intent. By analyzing and predicting based on these permissions, we unlock valuable insights to safeguard users in the digital landscape.


Findcoding/Android-Malware-Detection-System-Using-Machine-Learning

Purpose:

Project at IIITD under the course CSE343: Machine Learning, under the guidance of Professor Anubha Gupta.

Contributors:

Motivation:

As the Android market continues to expand, so does the prevalence of malicious apps. According to ZDNet, as many as 10%-24% of apps available on the Play Store could be malicious in nature. These apps may appear innocuous at first glance, but they can harm a user's system in a variety of ways. Unfortunately, current methods for detecting malware are both resource-intensive and exhaustive, and they struggle to keep up with the rapid pace at which new malware is being developed.

What can help us overcome these challenges?

  • Developing a comprehensive strategy to assess and analyze data from confirmed malicious applications.
  • Creating a model that can accurately predict the presence of malicious applications based on their permissions.
  • Introducing a machine learning-based malware detection model that utilizes publicly available metadata information. This model will be evaluated to determine its effectiveness as a first-stage filter for detecting Android malware.

Introduction:

Despite the growing threat of malware, there is still no reliable and robust method for detecting malicious applications. However, with the increasing use of machine learning in various fields, we believe that this issue can be addressed through the application of machine learning techniques. Our project aims to conduct a thorough and systematic investigation into the use of machine learning for malware detection, with the ultimate goal of developing an efficient ML model capable of accurately classifying apps as either benign (0) or malware (1) based on their requested permissions.

This study proposes:

  • Conducting an in-depth examination and evaluation of Android metadata and permissions as predictors of malware.
  • Introducing a machine learning-based malware detection strategy that utilizes publicly available metadata information.
  • Analyzing the effectiveness of this model and assessing its potential as a first-stage filter for detecting Android malware.

Dataset Description:

  • The dataset was taken from Kaggle.
  • It contains permission details for almost 30,000 apps.
  • There are 183 features, such as Dangerous Permissions Count, Default : Access DRM content, and Default : Move application resource.
  • There is one binary target (0/1) named ‘Class’, indicating benign (0) and malware (1) applications.
  • There are 29,999 records: 20,000 malware apps and 9,999 benign apps.
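
The class counts above imply a noticeable imbalance. As a quick sanity check, the arithmetic below works out the malware share and the accuracy that a trivial "always predict malware" rule would reach, which is a useful reference point when reading accuracy figures. The variable names are illustrative and not taken from the project code.

```python
# Class counts as stated in the dataset description.
n_malware = 20_000
n_benign = 9_999
n_total = n_malware + n_benign

# Share of each class in the full dataset.
malware_share = n_malware / n_total
benign_share = n_benign / n_total

# A rule that labels every app as malware (1) is right exactly
# as often as malware occurs, so its accuracy equals the malware share.
majority_baseline_accuracy = malware_share

print(f"total records:              {n_total}")
print(f"malware share:              {malware_share:.3f}")
print(f"benign share:               {benign_share:.3f}")
print(f"majority baseline accuracy: {majority_baseline_accuracy:.3f}")
```

Any model whose accuracy sits near this baseline has learned little beyond the class prior, however high its recall on the malware class.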

Preprocessing, Visualization and Analysis:

The data is first imported from a CSV file and loaded into a dataframe for ease of use. The necessary attributes are then extracted from the dataset. To gain a better understanding of the data, several plots are generated. The data is checked for null or missing values, and any such values are replaced with the mean of the corresponding column. The distribution of malware and benign applications across various settings is then analyzed, and the results are visualized through a series of plots created using Matplotlib and Seaborn.
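
The mean-imputation step described above can be sketched in plain Python as follows. This is a minimal illustration on a made-up four-row table, not the project's code; the helper name `impute_column_means` is hypothetical. With pandas, the same idea is usually written as `df.fillna(df.mean(numeric_only=True))`.

```python
from statistics import mean


def impute_column_means(rows):
    """Return a copy of rows with each None replaced by its column mean."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        col_means.append(mean(observed) if observed else 0.0)
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Toy table: two binary permission flags and one count column.
rows = [
    [1, 0, 3],
    [None, 1, 5],
    [0, None, 4],
    [1, 1, None],
]

filled = impute_column_means(rows)
for row in filled:
    print(row)
```

Note that mean imputation turns a missing binary permission flag into a fraction between 0 and 1, which is worth keeping in mind when interpreting those columns later.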

Plots:

  • Unsampled Class Distribution
  • Undersampled Class Distribution
  • Oversampled Class Distribution
  • Columns Name vs Missing Values

Exploratory Data Analysis(EDA):

The EDA for the Android Permission Dataset provided valuable insights into the relationships between different features in the dataset and helped us identify the most important features for predicting the app's class (benign or malware). It also provided a foundation for further analysis using machine learning techniques.


Methodology:

After preprocessing, the data is split into training and testing sets at an 8:2 ratio. We attempted both undersampling and oversampling on the dataset, but the results were not promising. We then applied various classifiers, including logistic regression, decision trees, and Naive Bayes, but the outcomes were unsatisfactory. Upon further inspection of the dataset, we discovered that it contained several multivariate data tables, which required us to apply PCA to each dataset. We plotted the variance percentage after using PCA and chose to use the inverse transform. We then applied Random Forest to the dataset, which resulted in a significant improvement in accuracy. We then used the boosting approach to further increase prediction accuracy, both on an unsampled dataset and on one with reliable features selected. The results showed that the model was improving. Finally, we applied SVM and MLP to the final dataset and achieved our best results. Comparing the results obtained after feature selection and boosting shows significant progress towards our final accuracy.
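
The 8:2 split and the two resampling strategies mentioned above can be sketched in plain Python. This is an illustration under stated assumptions: the split is stratified by class and resampling is applied to the training indices only, neither of which the text confirms. All function names are hypothetical, and the 200/100 toy labels only mimic the roughly 2:1 class ratio of the dataset.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split row indices into train/test, keeping class ratios in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def group_by_class(indices, labels):
    """Map each class label to the list of indices that carry it."""
    groups = {}
    for i in indices:
        groups.setdefault(labels[i], []).append(i)
    return groups


def undersample(indices, labels, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(indices, labels)
    target = min(len(g) for g in groups.values())
    return [i for g in groups.values() for i in rng.sample(g, target)]


def oversample(indices, labels, seed=0):
    """Randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(indices, labels)
    target = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(g)
        out.extend(rng.choices(g, k=target - len(g)))
    return out


# Toy labels: 200 malware (1) and 100 benign (0).
labels = [1] * 200 + [0] * 100
train_idx, test_idx = stratified_split(labels)
under_idx = undersample(train_idx, labels)
over_idx = oversample(train_idx, labels)

print("train:", Counter(labels[i] for i in train_idx))
print("test: ", Counter(labels[i] for i in test_idx))
print("under:", Counter(labels[i] for i in under_idx))
print("over: ", Counter(labels[i] for i in over_idx))
```

Resampling only the training indices keeps the test set at the original class ratio, so reported test metrics stay comparable across the unsampled, oversampled, and undersampled runs.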


PCA features vs Variance Percentage
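
A plot of this kind is typically built from the per-component explained-variance percentages. The sketch below, using scikit-learn on synthetic binary data, shows one way to obtain those percentages, pick a component count, and map the reduced data back with the inverse transform. The 95% variance threshold and the synthetic data are assumptions made for illustration; the text does not state which threshold or component count was used.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for the permission matrix: 500 apps x 20 binary flags,
# generated from 3 latent factors so that the columns are correlated.
latent = rng.normal(size=(500, 3))
weights = rng.normal(size=(3, 20))
noise = 0.3 * rng.normal(size=(500, 20))
X = (latent @ weights + noise > 0).astype(float)

# Fit PCA with all components to get the variance percentage per component.
full_pca = PCA().fit(X)
variance_pct = 100 * full_pca.explained_variance_ratio_
cumulative = np.cumsum(full_pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 95%
# (the threshold is an assumption, not a value from the project).
k = int(np.searchsorted(cumulative, 0.95) + 1)

# Project onto k components, then map back to the original feature space.
pca = PCA(n_components=k)
Z = pca.fit_transform(X)
X_restored = pca.inverse_transform(Z)

print("variance % per component:", np.round(variance_pct[:5], 1), "...")
print("components kept:", k)
print("reduced shape:", Z.shape, "restored shape:", X_restored.shape)
```

The restored matrix has the original column layout but contains only the variation captured by the kept components, which is one way to denoise features before fitting a classifier.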



Libraries Used:

Results and Analysis:

On Basic Models

| Model | Metric | Unsampled | Oversampled | Undersampled |
|---|---|---|---|---|
| Logistic Regression | Training Accuracy | 0.69 | 0.63 | 0.63 |
| | Test Accuracy | 0.68 | 0.62 | 0.63 |
| | Recall Score | 0.95 | 0.66 | 0.67 |
| | ROC Score | 0.53 | 0.61 | 0.62 |
| Naive Bayes | Training Accuracy | 0.68 | 0.53 | 0.53 |
| | Test Accuracy | 0.67 | 0.53 | 0.53 |
| | Recall Score | 0.97 | 0.98 | 0.99 |
| | ROC Score | 0.52 | 0.51 | 0.50 |
| Decision Tree | Training Accuracy | 0.67 | 0.57 | 0.55 |
| | Test Accuracy | 0.67 | 0.55 | 0.56 |
| | Recall Score | 0.99 | 0.68 | 0.79 |
| | ROC Score | 0.51 | 0.54 | 0.55 |

As the table shows, sampling is not effective in our case, so we move forward with the unsampled data only.
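
To make the metrics in the table easier to interpret, the sketch below computes accuracy, recall, and a ROC score from confusion-matrix counts. It assumes the ROC score is computed from hard 0/1 predictions, in which case the area under the curve reduces to the average of recall and specificity; the text does not confirm how the score was obtained. The counts are hypothetical, chosen only to show how high recall can coexist with a ROC score near 0.5, and are not the project's confusion matrix.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, recall, specificity, and hard-label ROC AUC."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)          # share of malware apps that are caught
    specificity = tn / (tn + fp)     # share of benign apps that are cleared
    # With a single operating point, the ROC curve is two line segments
    # and its area equals (recall + specificity) / 2.
    roc_auc_hard = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "roc_auc": roc_auc_hard,
    }


# Hypothetical test set of 6,000 apps (4,000 malware, 2,000 benign)
# scored by a model that labels almost everything as malware.
metrics = binary_metrics(tp=3800, fn=200, tn=220, fp=1780)
for name, value in metrics.items():
    print(f"{name:12s} {value:.2f}")
```

In this illustration the model catches 95% of malware but clears only 11% of benign apps, so its ROC score stays close to the 0.5 of a model that ignores the features.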

| Model | Optimal Parameter | Training Accuracy | Test Accuracy | Recall | ROC |
|---|---|---|---|---|---|
| SVM | default | 0.85 | 0.85 | 0.94 | 0.80 |
| Random Forest | n_estimators=200, n_jobs=-1 | 0.87 | 0.86 | 0.93 | 0.81 |
| MLP | random_state=42, max_iter=300 | 0.85 | 0.85 | 0.95 | 0.80 |
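
The parameters in the table match scikit-learn's estimator arguments, so a comparison along these lines can be sketched as below. This is an assumption-laden illustration rather than the project's code: the data is synthetic, the split is stratified with a fixed seed, and the ROC score is computed from hard predictions. Only the estimator parameters come from the table.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-in: 600 apps x 15 binary permission flags.
# An app counts as malware (1) when at least two of the first three flags are set.
X = rng.integers(0, 2, size=(600, 15)).astype(float)
y = ((X[:, 0] + X[:, 1] + X[:, 2]) >= 2).astype(int)

# 8:2 split; stratification and the seed are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Estimator parameters as listed in the results table.
models = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=200, n_jobs=-1),
    "MLP": MLPClassifier(random_state=42, max_iter=300),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_pred = model.predict(X_test)
    results[name] = {
        "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
        "test_accuracy": accuracy_score(y_test, test_pred),
        "recall": recall_score(y_test, test_pred),
        "roc": roc_auc_score(y_test, test_pred),
    }
    print(name, {k: round(float(v), 2) for k, v in results[name].items()})
```

Reporting training and test accuracy side by side, as the table does, makes overfitting visible: a large gap between the two would suggest the model memorizes the training apps.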

The results show that all three models perform about the same, with Random Forest slightly ahead at a test accuracy of 86%. As seen in the table, accuracy follows the order Random Forest > MLP > SVM (MLP and SVM tie at 0.85 when rounded to two decimals).

Conclusion:

  • Learning:
      • Different ways to visualize the data for a better understanding of the features.
      • Machine learning models such as Logistic Regression, Naive Bayes and Decision Tree to model the problem.
      • How to use platforms like Kaggle and Google Colab.
      • How to work and collaborate in teams.

References:

  • [1] Android Malware Prediction using Machine Learning Techniques: A Review

  • [2] An Efficient Android Malware Prediction Using Ensemble Machine Learning Algorithms

  • [3] Android Permission Dataset
