DEV Community

Abdul Ghani

Real-time Phishing Attack Detection using ML 💻

My Final Project

So, I built this project called RPAD-ML in my final year. It is essentially an Android app coupled with a machine learning backend server that detects 🕵️ any link to a possible phishing site in REAL TIME ⚡. It can detect malicious/phishing links from any app. Open any app that has external links 🔗, and RPAD-ML will check them and give you a warning message ⚠️ right away.

Demo

Download RPAD-ML Demo APK

I know there are already lots of tools available, like Google Safe Browsing. But those are mostly limited to web browsers such as Chrome. So what I've done is combine a machine learning model trained on phishing sites with Google Safe Browsing; given a URL, it predicts whether it is a phishing website or not.

Link to Code

abdulghanitech / rpad-ml

Real-time Phishing Attack Detection using ML 💻

The repo contains code for both the ML server and the Android app, which together detect phishing sites in real time. Below is a flow chart of it.

Screenshot




How I built it

I've got a machine learning model built using a dataset of phishing sites.

DATA SELECTION

The dataset was downloaded from the UCI Machine Learning Repository. It contains 31 columns (30 features and 1 target) and 2456 observations.

MODELS

To fit the models, the dataset is split into training and testing sets. The split ratio is 75-25, where 75% goes to the training set.
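As a rough sketch of that 75-25 split, here is how it could look with scikit-learn's `train_test_split`. The data below is a synthetic stand-in with the same shape as the dataset described above, not the real phishing data, and `stratify` and `random_state` are my own assumptions rather than settings from the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the dataset described in the post
# (2456 rows, 30 features, binary target). This is NOT the real phishing data.
X, y = make_classification(n_samples=2456, n_features=30, random_state=0)

# Hold out 25% of the rows for testing. stratify=y keeps the class balance
# similar in both sets; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)
```

With 2456 rows, this leaves 1842 rows for training and 614 for testing.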

Now the training set is used to train the classifier. The classifiers chosen are:

* Logistic Regression

* Random Forest Classification

* Support Vector Machine

We will see which one fits our dataset best.

1. Logistic Regression

After fitting logistic regression and building a confusion matrix of predicted versus actual values, I got 92.3% accuracy, which is good for a logistic regression model.
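A minimal sketch of this step, assuming scikit-learn and using synthetic data in place of the real dataset, so the numbers it prints will not match the 92.3% above. `max_iter=1000` is an assumption to avoid convergence warnings, not a setting from the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the real phishing dataset).
X, y = make_classification(n_samples=2456, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the classifier on the training set only.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Compare predictions on the held-out test set with the true labels.
y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)  # rows: actual class, columns: predicted
acc = accuracy_score(y_test, y_pred)

print(cm)
print(f"accuracy: {acc:.4f}")
```

The accuracy is simply the diagonal of the confusion matrix (correct predictions) divided by the total number of test rows.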

2. Support Vector Machine

A support vector machine with an RBF kernel, using GridSearchCV to find the best parameters, was a really good choice. Fitting the model with the best parameters found, I got 96.47% accuracy, which is pretty good.
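Here is a hedged sketch of that search using scikit-learn's `GridSearchCV` with an RBF-kernel `SVC`. The parameter grid, the 5-fold cross-validation, and the small synthetic dataset are all assumptions for illustration; the post does not say which values were actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Small synthetic stand-in so the search runs quickly (NOT the real data).
X, y = make_classification(n_samples=600, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid: C controls regularisation, gamma the RBF kernel width.
param_grid = {"C": [1, 10, 100], "gamma": ["scale", 0.01, 0.1]}

# Cross-validate every combination on the training set, then refit the
# best one on the whole training set (refit=True is the default).
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)

# Score the refitted best model on the held-out test set.
test_acc = search.score(X_test, y_test)
print(f"test accuracy: {test_acc:.4f}")
```

Keeping the test set out of the grid search matters here: the parameters are picked using cross-validation on the training data only, so the final test accuracy stays an honest estimate.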

3. Random Forest Classification

The next model I wanted to try was random forest, which also lets me get feature importances. Again using GridSearchCV to find the best parameters and fitting the model with them, I got a very good accuracy of 97.26%.
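The same idea for random forest, again as a sketch under assumptions: the grid values, the 3-fold cross-validation, and the synthetic data are mine, not the project's. It also shows how the fitted forest exposes `feature_importances_`, which is where a feature importance ranking can come from.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Small synthetic stand-in (NOT the real phishing dataset).
X, y = make_classification(n_samples=600, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid over forest size and tree depth.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3
)
search.fit(X_train, y_train)

best_rf = search.best_estimator_
print("best parameters:", search.best_params_)
print(f"test accuracy: {best_rf.score(X_test, y_test):.4f}")

# One importance value per feature; the values sum to 1.
importances = best_rf.feature_importances_

# Indices of the five most important features, highest first.
top5 = np.argsort(importances)[::-1][:5]
for idx in top5:
    print(f"feature {idx}: {importances[idx]:.4f}")
```

With the real dataset, the feature indices would map to the 30 named columns, so the ranking shows which URL and page properties the model leans on most.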

Random forest gave the best accuracy of the three. We could also try an artificial neural network to see if it improves the accuracy further.

FEATURE IMPORTANCES

FEATURE IMPORTANCE
ML Model: Phishcoop

Hosting online as a server

I've used the Heroku platform (Hobby plan provided by GitHub education) to host this machine learning model online. I used pickle to save and load the machine learning model and hosted it using Flask.

The idea was to expose this as a service and then call it from the Android app.
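To give a feel for the pickle-plus-Flask setup, here is a minimal sketch, not the actual server code. The `/predict` route, the JSON shape, and the convention that label 1 means phishing are all assumptions, and the model is a stand-in trained on synthetic data. It also takes an already-computed feature vector, because the post does not describe how the features are extracted from a URL.

```python
import pickle

from flask import Flask, jsonify, request
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

N_FEATURES = 30

# Stand-in model trained on synthetic data, then pickled. In a real
# deployment the training happens offline and only the pickle is shipped.
X, y = make_classification(n_samples=200, n_features=N_FEATURES, random_state=0)
model_bytes = pickle.dumps(LogisticRegression(max_iter=1000).fit(X, y))

# On the server the bytes would be read from a file instead.
# Only unpickle files you created yourself: pickle can run arbitrary code.
model = pickle.loads(model_bytes)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    # Reject requests that do not carry exactly one value per feature.
    if not isinstance(features, list) or len(features) != N_FEATURES:
        return jsonify(error=f"expected {N_FEATURES} features"), 400
    label = int(model.predict([features])[0])
    # Assumption: label 1 means "phishing" in this sketch.
    return jsonify(phishing=bool(label))

# To try it locally, call app.run() and POST JSON to /predict.
```

A client such as the Android app would then send a POST request with a JSON body like `{"features": [...]}` and read the `phishing` flag from the response.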

Android App

Essentially, this is the front end that calls this service. I've used Android's accessibility API to capture the URLs being opened in any app.

After getting the URL, I first call the Google Safe Browsing API to check whether it is a phishing site. If it is, I show a warning dialog. Otherwise, I call the machine learning backend server, and if its result says the URL is a phishing site, I show the warning dialog as well.
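The two-step check described above can be sketched as a small function. This is written in Python for consistency with the backend examples, not the Android code itself, and both checkers are hypothetical stubs: the real app would call the Google Safe Browsing API and the ML server instead.

```python
from typing import Callable

UrlCheck = Callable[[str], bool]


def classify_url(url: str, safe_browsing_flags: UrlCheck,
                 ml_model_flags: UrlCheck) -> str:
    """Return which check flagged the URL, or "clean" if neither did."""
    # Step 1: ask Safe Browsing first; a hit means we can warn immediately.
    if safe_browsing_flags(url):
        return "safe_browsing"
    # Step 2: only if Safe Browsing found nothing, ask the ML backend.
    if ml_model_flags(url):
        return "ml_model"
    return "clean"


# Hypothetical stubs standing in for the two remote services.
KNOWN_BAD = {"http://known-bad.example/login"}
ml_calls = []  # records which URLs reached the ML backend


def fake_safe_browsing(url: str) -> bool:
    return url in KNOWN_BAD


def fake_ml_backend(url: str) -> bool:
    ml_calls.append(url)
    return "suspicious" in url


for url in [
    "http://known-bad.example/login",
    "http://suspicious.example/verify",
    "https://example.com/",
]:
    verdict = classify_url(url, fake_safe_browsing, fake_ml_backend)
    action = "show warning" if verdict != "clean" else "no warning"
    print(f"{url} -> {action} ({verdict})")
```

Checking Safe Browsing first means the ML server is only contacted for URLs that Safe Browsing does not already flag.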

Additional Thoughts / Feelings / Stories

This was more of a prototype. It isn't perfect, but hey, it works 🙌🏻. And the best thing is I've learnt so much by working on this project 🤓
