Phishing is a type of social engineering attack in which criminals use fake websites to obtain a victim’s sensitive information, such as login credentials and credit card details. Phishing websites, emails and apps disguise themselves as trustworthy by looking almost visually indistinguishable from the real ones. Criminals put a lot of effort into making their sites so similar to the originals that even experienced users, if they are not attentive enough, can be tricked into thinking they are on a legitimate site. The user, convinced that they are on the original site, enters their password, but instead of reaching the legitimate service, it goes to the criminals, who can use it for malicious purposes.
The damage done by cybercriminals can be significant and costly, especially if they gain access to the victim’s electronic banking account. Therefore, it is important to prevent this type of attack and detect threats as early as possible. Modern browsers and operating systems have built-in mechanisms to defend users.
These are usually based on lists of known phishing sites. For example, Firefox Phishing and Malware Protection works by checking the sites that you visit against lists of reported phishing, unwanted software and malware sites.
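As a rough illustration of the list-based idea, the following minimal Python sketch checks a URL's hostname, and each of its parent domains, against a local set of known bad domains. The blocklist entries and the function names `hostname_of` and `is_blocklisted` are hypothetical and chosen for this example only. This is not how Firefox implements its protection, and real browsers rely on much larger, regularly updated lists.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entries, for illustration only.
BLOCKLIST = {"examp1e-bank.test", "secure-login.example"}


def hostname_of(url: str) -> str:
    """Return the lowercased hostname of a URL, or '' if there is none."""
    host = urlsplit(url).hostname or ""
    return host.lower().rstrip(".")


def is_blocklisted(url: str, blocklist=BLOCKLIST) -> bool:
    """Check the hostname and every parent domain against the blocklist."""
    labels = hostname_of(url).split(".")
    # For "www.a.b" this tests "www.a.b", then "a.b", then "b".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


if __name__ == "__main__":
    for candidate in (
        "https://examp1e-bank.test/signin",
        "http://www.secure-login.example:8080/account",
        "https://example.org/",
    ):
        print(candidate, is_blocklisted(candidate))
```

A list like this can only flag sites that have already been reported, so a freshly registered phishing domain passes the check until the list is updated.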
There is also extensive literature on building machine learning models from scraped page content, IP information or other HTTP properties.
In this article, we present an approach based on screenshots of suspicious sites. This gives us a mechanism that is versatile and resistant to the code obfuscation techniques that make crawling difficult. We will create a machine learning model that mimics the way a real person detects fraudulent websites by visual inspection.