Findcoding/Android-Malware-Detection-System-Using-Machine-Learning


This project uses machine learning on the permissions requested by Android apps to predict whether an application is malicious or benign, with the aim of helping protect users.


Android Malware Detection System Using Machine Learning

Purpose:

Project at IIITD under the course CSE343: Machine Learning, under the guidance of Professor Anubha Gupta

Contributors:

Bijendar Prasad

Motivation:

As the Android market continues to expand, so does the prevalence of malicious apps. According to ZDNet, as many as 10%-24% of apps available on the Play Store could be malicious in nature. These apps may appear innocuous at first glance, but they can harm a user's system in a variety of ways. Current methods for detecting malware are resource-intensive and exhaustive, and they struggle to keep up with the rapid pace at which new malware is being developed.

What can help us overcome these challenges?

Developing a comprehensive strategy to assess and analyze data from confirmed malicious applications.
Creating a model that can accurately predict the presence of malicious applications based on their permissions.
Introducing a machine learning-based malware detection model that utilizes publicly available metadata information. This model will be evaluated to determine its effectiveness as a first-stage filter for detecting Android malware.

Introduction:

Despite the growing threat of malware, there is still no reliable and robust method for detecting malicious applications. However, with the increasing use of machine learning in various fields, we believe that this issue can be addressed through the application of machine learning techniques. Our project aims to conduct a thorough and systematic investigation into the use of machine learning for malware detection, with the ultimate goal of developing an efficient ML model capable of accurately classifying apps as either benign (0) or malware (1) based on their requested permissions.

This study proposes:

Conducting an in-depth examination and evaluation of Android metadata and permissions as predictors of malware.
Introducing a machine learning-based malware detection strategy that utilizes publicly available metadata information.
Analyzing the effectiveness of this model and assessing its potential as a first-stage filter for detecting Android malware.

Dataset Description:

The dataset has been taken from Kaggle.
The data contains the permission details of almost 30k apps.
There are 183 features in the dataset, such as Dangerous Permissions Count, Default : Access DRM content, Default : Move application resource, etc.
There is one binary target column (0/1) named 'Class', indicating Benign (0) and Malware (1) applications.
There are 29,999 records: 20,000 malware and 9,999 benign apps.
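The class counts above imply a useful reference point: a classifier that labels every app as malware already reaches about two-thirds accuracy. The short sketch below works this out from the stated counts only; the variable names are illustrative and not taken from the project's notebook.

```python
# Class counts as stated in the dataset description.
n_malware = 20_000
n_benign = 9_999
total = n_malware + n_benign

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline_accuracy = max(n_malware, n_benign) / total

# Such a classifier flags every malware sample, so its recall on malware is 1.0.
always_malware_recall = n_malware / n_malware

print(f"records: {total}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.3f}")
print(f"recall of the always-malware baseline: {always_malware_recall:.1f}")
```

Any model trained on this data is worth comparing against this baseline, since accuracy alone can look reasonable on an imbalanced dataset without the model having learned much.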

Preprocessing, Visualization and Analysis:

The data is first imported from a CSV file and loaded into a dataframe for ease of use. The necessary attributes are then extracted from the dataset. To gain a better understanding of the data, several plots are generated. The data is checked for null or missing values, and any such values are replaced with the mean of the corresponding column. The distribution of malware and benign applications across various settings is then analyzed, and the results are visualized through a series of plots created using Matplotlib and Seaborn.
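The missing-value step described above can be sketched with pandas as follows. This is a minimal illustration on a tiny hand-made dataframe, not the project's actual code: the real data would come from `pd.read_csv` on the dataset file (whose path is not given here), and the column `Safe Permissions Count` is a hypothetical stand-in for one of the 183 features.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the real dataset (which would be loaded with pd.read_csv).
df = pd.DataFrame({
    "Dangerous Permissions Count": [3.0, np.nan, 5.0, 4.0],
    "Safe Permissions Count": [1.0, 2.0, np.nan, 3.0],  # hypothetical column
    "Class": [1, 0, 1, 0],
})

# Count missing values before imputation.
missing_before = int(df.isnull().sum().sum())

# Replace each missing feature value with the mean of its column.
# The target column 'Class' is left untouched.
feature_cols = df.columns.drop("Class")
df[feature_cols] = df[feature_cols].fillna(df[feature_cols].mean())

missing_after = int(df.isnull().sum().sum())
print(missing_before, missing_after)
```

Computing the column means on the training split only, and reusing them for the test split, would avoid leaking test information into preprocessing; the text does not say which order the project used.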

Plots:

Exploratory Data Analysis(EDA):

The EDA for the Android Permission Dataset provided valuable insights into the relationships between different features in the dataset and helped us identify the most important features for predicting the app rating. It also provided a foundation for further analysis using machine learning techniques.

Methodology:

After preprocessing the data, it is split into training and testing sets at an 8:2 ratio. We attempted both undersampling and oversampling techniques on the dataset, but the results were not promising. We then applied various classifiers, including logistic regression, decision trees, and Naive Bayes, but the outcomes were unsatisfactory. Upon further inspection of the dataset, we discovered that it contained several multivariate data tables, which required us to apply PCA to each dataset. We plotted the variance percentage after using PCA and chose to use the inverse transform. We then applied Random Forest to the dataset, which resulted in a significant improvement in accuracy. We then used the boosting approach to further increase prediction accuracy, both on an unsampled dataset and on one with reliable features selected. The results showed that the model was improving. Finally, we applied SVM and MLP to the final dataset and achieved our best results. Comparing the results obtained after feature selection and boosting shows that we made significant progress toward our final accuracy.
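The core of this pipeline (8:2 split, PCA with inverse transform, then Random Forest) can be sketched with scikit-learn as below. This is a sketch under stated assumptions rather than the project's notebook code: the data is synthetic (`make_classification`), PCA is applied to the whole feature matrix, and the 95% variance threshold, stratified split, and `random_state` values are choices made for the illustration. The Random Forest settings `n_estimators=200, n_jobs=-1` are the ones listed in the results table.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the permission features (assumption: not the real data).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8,
    weights=[1 / 3, 2 / 3], random_state=42,
)

# 8:2 train/test split; stratification is an assumption made here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit PCA on the training data only, keeping enough components
# to explain 95% of the variance (threshold chosen for illustration).
pca = PCA(n_components=0.95)
Z_train = pca.fit_transform(X_train)
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)

# Map the reduced data back to the original feature space.
X_train_rec = pca.inverse_transform(Z_train)
X_test_rec = pca.inverse_transform(pca.transform(X_test))

# Random Forest with the parameters listed in the results table.
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=42)
clf.fit(X_train_rec, y_train)
test_accuracy = clf.score(X_test_rec, y_test)

print(f"components kept: {pca.n_components_}")
print(f"cumulative variance explained: {cumulative_variance[-1]:.3f}")
print(f"test accuracy: {test_accuracy:.3f}")
```

The inverse transform returns data with the original number of columns but with the discarded low-variance directions removed, so it acts as a denoising step before the classifier.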

Libraries Used:

Results and Analysis:

On Basic Models

Models	Unsampled	Oversampled	Undersampled
Logistic Regression	Training Accuracy 0.69, Test Accuracy 0.68, Recall Score 0.95, ROC Score 0.53	Training Accuracy 0.63, Test Accuracy 0.62, Recall Score 0.66, ROC Score 0.61	Training Accuracy 0.63, Test Accuracy 0.63, Recall Score 0.67, ROC Score 0.62
Naive Bayes	Training Accuracy 0.68, Test Accuracy 0.67, Recall Score 0.97, ROC Score 0.52	Training Accuracy 0.53, Test Accuracy 0.53, Recall Score 0.98, ROC Score 0.51	Training Accuracy 0.53, Test Accuracy 0.53, Recall Score 0.99, ROC Score 0.50
Decision Tree	Training Accuracy 0.67, Test Accuracy 0.67, Recall Score 0.99, ROC Score 0.51	Training Accuracy 0.57, Test Accuracy 0.55, Recall Score 0.68, ROC Score 0.54	Training Accuracy 0.55, Test Accuracy 0.56, Recall Score 0.79, ROC Score 0.55

As the table shows, sampling is not effective in our case, so we move forward with the unsampled data only.
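For readers unfamiliar with the two sampling strategies compared above, the sketch below shows plain random oversampling and undersampling with NumPy. It is an illustration only: the text does not say which resampling method or library the project used, and the function name `random_resample` and the toy data are hypothetical.

```python
import numpy as np

def random_resample(X, y, mode, rng):
    """Balance classes by random oversampling ('over') or undersampling ('under')."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max() if mode == "over" else counts.min()
    parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        if count < target:
            # Oversample: keep all originals and add random duplicates.
            extra = rng.choice(cls_idx, size=target - count, replace=True)
            chosen = np.concatenate([cls_idx, extra])
        else:
            # Undersample (or keep as is): draw without replacement.
            chosen = rng.choice(cls_idx, size=target, replace=False)
        parts.append(chosen)
    idx = rng.permutation(np.concatenate(parts))
    return X[idx], y[idx]

rng = np.random.default_rng(42)

# Toy data with a 2:1 malware-to-benign ratio, similar to the dataset's imbalance.
y = np.array([1] * 20 + [0] * 10)
X = rng.normal(size=(30, 4))

X_over, y_over = random_resample(X, y, "over", rng)
X_under, y_under = random_resample(X, y, "under", rng)

print(np.bincount(y_over))   # class counts after oversampling
print(np.bincount(y_under))  # class counts after undersampling
```

Resampling is normally applied to the training split only, so that the test set keeps the original class distribution.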

Models	Optimal Parameter	Accuracy	Recall	ROC
SVM	default	Training Accuracy 0.85 Test Accuracy 0.85	0.94	0.80
Random Forest	n_estimators=200, n_jobs = -1	Training Accuracy 0.87 Test Accuracy 0.86	0.93	0.81
MLP	random_state = 42, max_iter = 300	Training Accuracy 0.85 Test Accuracy 0.85	0.95	0.80

Looking at the results, all three models perform more or less the same, with Random Forest slightly ahead at a test accuracy of 86%. We rank accuracy in the following order: Random Forest > MLP > SVM (MLP and SVM both round to 0.85 at the precision shown in the table).
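The three metrics reported in the tables above can be computed with scikit-learn as sketched below, here on a small hand-made set of labels. One assumption is worth flagging: the text does not say whether the "ROC Score" was computed from predicted probabilities or from hard 0/1 predictions. If hard predictions were used, the ROC AUC reduces to the average of recall on malware and recall on benign apps, which the sketch verifies.

```python
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

# Toy labels: 1 = malware, 0 = benign (made up for illustration).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)

# Recall on the malware class: fraction of malware correctly flagged.
recall = recall_score(y_true, y_pred)

# Recall on the benign class, also called specificity.
specificity = recall_score(y_true, y_pred, pos_label=0)

# ROC AUC computed from hard predictions instead of probabilities.
auc_from_labels = roc_auc_score(y_true, y_pred)

print(f"accuracy: {accuracy:.3f}")
print(f"recall: {recall:.3f}")
print(f"specificity: {specificity:.3f}")
print(f"ROC AUC from hard labels: {auc_from_labels:.3f}")
print(f"mean of recall and specificity: {(recall + specificity) / 2:.3f}")
```

Under that assumption, a model with high recall but a ROC score close to 0.5 is one that labels most benign apps as malware, which is why recall and ROC are best read together.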

Conclusion:

Learning:

Different ways to visualize the data for a better understanding of features.
Machine Learning models like Logistic Regression, Naive Bayes and Decision Tree to model the problem.
How to use platforms like Kaggle and Google Colab.
How to work and collaborate in teams.

References:

[1] Android Malware Prediction using Machine Learning Techniques: A Review
[2] An Efficient Android Malware Prediction Using Ensemble Machine Learning Algorithms
[3] Android Permission Dataset
