Roy, Sayak Saha; Karanjit, Unique; Nilizadeh, Shirin. Evaluating the effectiveness of Phishing Reports on Twitter. 2021-11-13.

Phishing attacks are an increasingly potent web-based threat, with nearly 1.5 million such websites created on a monthly basis. In this work, we present the first study towards identifying such attacks through phishing reports shared by users on Twitter. We evaluated over 16.4k such reports posted by 701 Twitter accounts between June and August 2021, which contained 11.1k unique URLs, and analyzed their effectiveness using various quantitative and qualitative measures. Our findings indicate that not only do these users share a high volume of legitimate phishing URLs, but their reports also contain more information regarding the phishing websites (which can expedite the process of identifying and removing these threats) than two popular open-source phishing feeds: PhishTank and OpenPhish. We also notice that the reported websites had very little overlap with the URLs in those feeds, and remained active for longer periods of time. Despite these attributes, the reports receive very little interaction from other Twitter users, especially from the domains and organizations targeted by the reported URLs. Moreover, nearly 31% of the URLs were still active a week after being reported, and 27% of them were detected by very few anti-phishing tools, suggesting that a large majority of these reports remain undiscovered, even though the follower base of these accounts consists mostly of security-focused users. Thus, this work highlights the effectiveness of the reports, and the benefits of using them as an open-source knowledge base for identifying new phishing websites.
Phishing websites are a prominent social-engineering threat whose volume has significantly increased over the past few years [1], [2]. To counter these attacks, there has been a prolonged effort from the security community in the form of automated tools which use sophisticated machine learning [3]-[6], deep learning [5], [7], [8], rule-based [9]-[12] and heuristic [13]-[16] techniques. However, phishing websites are highly elusive in nature, often evolving to leverage loopholes and adversarial tactics to circumvent automated detection [17]-[20]. Thus, domain hosting services, anti-phishing tools and web browsers often rely on one or more phishing feeds/blocklists, which are curated knowledge bases containing a frequently updated list of phishing URLs, added either manually or through a combination of automated discovery, manual reporting and human verification. As noted by Oest et al. [21], phishing feeds are reactive in nature, with a considerable time gap between the appearance of a website and its subsequently being reported. Even then, these feeds usually go through some form of manual evaluation, ideally making them less error-prone than automated detection systems. In this work, we investigate phishing reports which are shared on Twitter [22], the popular micro-blogging platform. To the best of our knowledge, this constitutes the first study evaluating this resource as a viable knowledge base for identifying new phishing websites, and throughout the paper we concentrate on determining its effectiveness and comparing its characteristics and performance against two popular open-source phishing feeds: PhishTank and OpenPhish. More specifically, this paper: (i) Determines the reliability and volume of information shared by these phishing reports, and how they compare against the two other open-source phishing feeds.
(ii) Being hosted on Twitter, these reports can also be visibly interacted with by other users on the platform, a feature not available in the other two phishing feeds that we study. We thus evaluate the frequency of these interactions, including those from the domain registrars and organizations targeted by the reported phishing URLs, and examine the impact of these interactions on the detection and removal of the reported URLs. (iii) We also analyze the aftermath of sharing these reports, i.e., how long the URLs remain active after being reported, as well as how quickly anti-phishing tools detect them. Both are factors which can protect users from inadvertently accessing the threat. We collected and analyzed more than 16k tweets containing 11k unique URLs over the period of 21st June to 17th August 2021. Using a combination of automated and manual investigation, we repeatedly tracked a range of properties extracted from these posts, including the activity of the URLs, anti-phishing tool detection, the information shared by the posts (such as relevant hashtags, images, etc.), the true positive rate (the percentage of URLs which are legitimate phishing attacks), and interactions with other users (likes, comments and replies). We also compared the relevant statistics with two other phishing feeds: PhishTank and OpenPhish. In Section III, we describe how we collected the phishing reports from Twitter, and characterize them based on the domains they are hosted on. In Section III-C, we discuss several drive-by download URLs which are also reported by these accounts, a threat category absent from the other two feeds. In Section IV, we evaluate the information shared by these phishing reports (IV-A), as well as by PhishTank (IV-A1) and OpenPhish (IV-B).
We focus on the information shared by these reports (such as screenshots, IP addresses, names of targeted registrars/organizations, labelling of threats, etc.), as well as their reliability and validity, in Section IV-C. Our goal in that section is to evaluate how the information shared by phishing reports on Twitter compares with the offerings of PhishTank and OpenPhish, which is later summarized in Table I. Unlike the two other phishing feeds, which usually do not allow user interaction, in Section V we measure the volume of interactions (favourites/retweets/comments) that these reports receive on Twitter, and whether interaction from the targeted domain/organization has an impact on how quickly the reported websites are removed. We also qualitatively explore what these interactions look like (Fig. 7) and determine the technological proficiency of users who typically interact with these reports (V-B). Finally, in Section VI, we determine how long these URLs stay active after being reported, and how the rate (and pace) of removal compares with URLs posted on PhishTank and OpenPhish (Section IV). We also check the coverage of the phishing URLs by anti-phishing engines (Section VI-C). Our main findings can be summarized as follows: 1) Twitter is a viable candidate for being utilized as a knowledge base of phishing reports. Over the course of three months, users consistently shared over 16.4k such reports, covering 11.1k unique URLs hosted across 203 unique registrars and targeting 146 different organizations. Unlike PhishTank and OpenPhish, these accounts also reported URLs distributing drive-by downloads (7%). 2) The majority of URL reports taken from Twitter contained more information than PhishTank and OpenPhish, which can help domain registrars and anti-phishing tools expedite the process of threat identification.
3) The URLs shared in these reports have a high true positive rate (87%), with only one account contributing the majority (11%) of the false positives in our dataset. 4) These reports receive very low engagement, with only 13.8% of the posts receiving at least one comment. The domain registrars and organizations targeted by the reported URLs (referred to as targets henceforth) contributed only 4% of these comments. However, when they did respond to these reports, the URLs became inactive more quickly than URLs which did not receive such interaction. Moreover, only 10.2% of the targets follow at least one such phishing report account, indicating that they are either not aware of these reporting accounts, or do not consider them a credible source. 5) 31% of the reported URLs remain active even a week after their first appearance in our dataset. Moreover, anti-phishing tools consistently have lower detection rates for at least 27% of the URLs when compared to URLs which show up in the other phishing feeds. Thus, our evaluation indicates that phishing reports shared on Twitter are a reliable and efficient source of information regarding new phishing websites. Relying on these reports can help domain registrars and anti-phishing tools expedite the process of identifying newer phishing threats. Additionally, based on the volume of information shared by these reports, they prove to be a valuable resource for researchers in building detailed ground-truth datasets with less effort and more efficiency compared to other open phishing feeds like PhishTank and OpenPhish. We explore our findings in broader detail in the following sections. II. BACKGROUND AND RELATED WORK Phishing websites: These are web-based threats which usually attempt to trick users into entering their personal information, often by pretending to be legitimate organizations.
Based on recent measurements, nearly 1.4 million phishing websites are created every month [23]. There is no dearth of literature regarding the development of automated phishing detection strategies, including machine learning and deep learning approaches [3]-[5], as well as heuristic and rule-based implementations [13]-[15], [24]. However, as highlighted by Vayansky and Kumar [25], unlike file-based threats, the success of a phishing attack largely depends on human-interaction factors, which makes it challenging to counter. In fact, several qualitative and quantitative studies have determined that users are not proficient at identifying phishing websites [26]-[30]. Additionally, phishing attacks often evolve based on socio-economic conditions, such as the 2008 financial crisis [31] and, more recently, the COVID-19 pandemic [32], [33], as well as leveraging several adversarial tactics to circumvent automated detection [17]-[20]. Thus, domain hosting services, anti-phishing tools and web browsers often rely on one or more phishing feeds/blocklists, which are frequently updated knowledge bases containing a list of new phishing URLs. These URLs are either manually annotated by security-conscious individuals or discovered through automated crawlers. Our work focuses on one such knowledge base distributed across Twitter [22], the micro-blogging platform, and how it fares against two popular and open phishing feeds: PhishTank [34] and OpenPhish [35]. Effectiveness of phishing feeds: These are specialized feeds dedicated to keeping track of new phishing threats distributed across the web. These feeds are both closed (proprietary) and open in nature. In this work we only compare phishing reports posted on Twitter with two feeds belonging to the open-source category, PhishTank (PT) and OpenPhish (OP), since it is easier to collect and analyze a large volume of data from them.
Despite the utility of these open phishing feeds, academic research on them is limited; even so, prior work has highlighted several pitfalls that these feeds consistently face. For example, Sheng et al. [36] noted how they had very low efficiency at identifying newer threats at hour zero, the time when phishing threats are at their most potent, and continue to have low coverage even after a few hours. Bell et al. [37] note that PhishTank and OpenPhish have very few overlapping URLs, suggesting that using them collectively can help cover a larger volume of these threats. In this work we determine the volume of URLs that are reported exclusively by phishing reports posted on Twitter, and the need for using them as an anti-phishing knowledge base. Moreover, Moore et al. [38] point out the unreliability of PhishTank, one of the most popular community-driven phishing feeds, because it is prone to false positives and even deliberate poisoning, the former of which we discuss further in our work. Finally, Oest et al. [21] suggest that an evidence-based phishing reporting feed containing additional artifacts such as screenshots can expedite the process of detecting and removing the threat, which, based on our analysis, neither of the two feeds currently does very efficiently. Considering that these feeds are used exhaustively by web browsers [39]-[41], as well as by anti-phishing tools and organizations [42], these shortcomings can inadvertently impact the protection that these services offer to users. By critically evaluating the effectiveness of phishing reports posted on Twitter, we discuss the shortcomings of PhishTank and OpenPhish in both reliability and coverage. To automate the collection of tweets containing phishing reports using the Twitter API [43], we qualitatively analyzed 500 random posts containing such reports and identified the attributes which were unique to them.
We found that the majority of such tweets report the URLs in an obfuscated format, usually replacing 'http'/'https' with case-insensitive variants of 'hxxp'/'hxxps'. This strategy is popularly known as 'URL defanging' [44], and is used to prevent users from accidentally visiting the malicious link. Thus, we utilized two search terms, 'hxxp' and 'hxxps', to populate our dataset. However, in some cases, other parts of the URL are also defanged, for example, http://abc[.]com, but these were usually accompanied by the hashtags #phishing and #scam. We thus collected tweets using those hashtags as well, and then used a regular expression which reverses the defanging by replacing the obfuscating characters in the URL, making them usable for our experiments. We utilized this data collection approach to collect new phishing reports every 30 minutes between June 21st and August 17th, 2021, acquiring 16,486 tweets posted by 701 unique reporters, which contained 11,139 unique URLs. During each 30-minute period several companion processes were also run, including tracking whether each URL was active, checking whether the URL was present in the phishing feeds provided by PhishTank and OpenPhish, and tracking how many anti-phishing engines were detecting the URL by using VirusTotal [45], [46], an online URL scanning tool which scans URLs with 80 different anti-phishing engines and returns the aggregated total of engines which detected the URL as malicious. It is used frequently by researchers to create a ground truth of malicious URLs [47]-[50]. This approach enabled us to get a full picture of how both registrars and anti-phishing engines reacted to the URLs shared by these reports.
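The refanging step described above can be sketched as a small normalization pass. This is a minimal sketch: beyond the 'hxxp'/'hxxps' substitution and bracketed dots mentioned in the text, the exact obfuscation patterns handled are assumptions.

```python
import re

def refang(defanged_url: str) -> str:
    """Reverse common URL defanging so the link can be scanned
    programmatically: 'hxxp(s)' back to 'http(s)' (case-insensitive)
    and bracketed separators back to their originals."""
    url = re.sub(r'hxxp', 'http', defanged_url, flags=re.IGNORECASE)
    url = url.replace('[.]', '.').replace('(.)', '.').replace('[:]', ':')
    return url
```

A report reading `hxxps://login-example[.]com` (a made-up URL) would then be normalized to `https://login-example.com` before the activity and VirusTotal checks are run.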
Additionally, to evaluate the amount of information shared by the phishing reports, as well as their efficiency (i.e., whether the URLs shared were actually phishing websites), we also collected screenshots of both the tweets and the websites they reported. To study how other users reacted to these reports, we collected a snapshot of all interactions with these posts at the end of every day, which included the comments posted on them, as well as the user IDs of the individuals who liked and retweeted them. We used WHOIS [51] to determine the hosting records of the 11,139 unique URLs and found that the reported URLs were distributed across 203 unique domains. Moreover, 5% (n=631) of the URLs belonged to an adversarial threat category highlighted in work by Saha Roy et al. [19]. These URLs leverage popular free web-hosting domains (which are often white-listed by anti-phishing vendors and phishing feeds alike) to host phishing websites, and in turn remain active for a long time after their first appearance, while also evading detection by several anti-phishing engines. Overall, around 52% of these reports used hashtags (keywords prefixed with the # symbol) to refer to the names of the domains or organizations targeted by the URLs. Hashtags are widely used to define a shared context for specific events or topics [52], and we assume that the reporters use them to: (a) inform the domain registrar service and organization that the website is phishing and should be investigated, and (b) inform other users about where the website is being hosted and/or which brand or organization it is targeting. We explore in detail the other informational attributes shared by these reports (such as screenshots of the URL, threat category, location, etc.), and how they compare against PhishTank and OpenPhish, in Section IV.
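The hashtag-based target attribution can be sketched as a simple regular-expression pass over the tweet text. This is a minimal sketch under the assumption that hashtags follow the usual '#'-plus-word-characters shape; the authors' actual extraction code is not specified.

```python
import re

# A hashtag is '#' followed by word characters (letters, digits, '_').
HASHTAG_RE = re.compile(r'#(\w+)')

def extract_hashtags(tweet_text: str) -> list:
    """Return lower-cased hashtags so registrar/organization names
    can be tallied consistently across reports."""
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet_text)]
```

Tallying these tags across the dataset yields the per-registrar and per-organization counts behind the 52% figure above.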
Unlike other phishing feeds, the reports on Twitter can be interacted with by other users in the Twitter community, including accounts maintained by the domain registrars and organizations targeted by the reported URLs. Thus, in Section V, we explore the responsiveness of these parties to the posts, and how it affects the activity of the respective phishing URLs. Figure 1 illustrates the distribution of the URLs across different registrars and drive-by download categories (n=11,139). We find that large numbers of the reported URLs are hosted across popular domain registrars such as GoDaddy, Namecheap, Namesilo, Public Domain Registry, etc. This indicates that these posts are not focused on a particular registrar or group of registrars, but cover URLs from several sources. Similarly, these reports also cover URLs which host a wide range of file-based threats, including Trojan horses, infected PDFs and malicious APKs. We explore the distribution of these threats in Section III-C. Phishing report tweets in our dataset were posted by 701 accounts. Interestingly, one account posted 48.2% of the tweets in our dataset (n=7,946), 25 accounts posted more than 100 such tweets, and 21 accounts posted more than 50. Because a single user contributed such a large portion of the tweets, we report our findings both with and without this user (whom we refer to as the top poster henceforth). Also, 65% of the users in our dataset shared only one tweet. In fact, the distribution of posts contributed by these accounts is heavily skewed towards a few particular users, as illustrated in Figure 3. Despite this, our goal is not to concentrate on any one user, but instead to investigate the content shared by all of these accounts as a form of distributed knowledge base, and to determine the reliability of the information provided by these reports and whether it can benefit the identification of new phishing threats.
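The per-account skew described above can be measured with a simple counter over each tweet's reporter field. This is a sketch with hypothetical data; the authors' aggregation code and schema are not given.

```python
from collections import Counter

def contribution_shares(reporter_ids):
    """Fraction of all reports contributed by each account,
    in descending order of contribution."""
    counts = Counter(reporter_ids)
    total = sum(counts.values())
    return [(account, n / total) for account, n in counts.most_common()]
```

Applied to the full dataset, the first entry of this list corresponds to the top poster's 48.2% share.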
While phishing URLs leverage social-engineering techniques using various persuasion principles, such as authority and distraction, to deceive users into sharing their private information [53], websites distributing drive-by downloads might contain no phishing component at all, but can still acquire sensitive information by installing malicious files or applications on the user's system and exploiting critical vulnerabilities [54]. We found 829 unique URLs, shared over 902 reports, which distributed drive-by downloads. We monitored file downloads triggered automatically by visiting the URLs in our dataset; the downloaded files were then scanned using VirusTotal, and a URL was labelled as a drive-by download only if its files were detected by at least two different engines (a threshold considered a standard for labelling malicious files in both the industry and research communities [55]). We then distributed these files equally among our team of four security researchers, who executed each of them in a secure VM environment, and based on the characteristics of these files, each was assigned a label indicating the threat family it belonged to. We adhered to the threat family labels mentioned in Cisco's Cyber-security Trend report [56], but also added two more categories which were distinctly present in our dataset: malicious APKs (Android-based malware) and infected PDFs. Figure 2 breaks down the distribution of the malicious files across 9 different threat families. We find that 26.8% of the drive-by downloads distributed Trojan horses [57], malicious files disguised as legitimate software. A good portion (13.2%) of the URLs distributed Android-based malware [58], ranging from apps which attempted to send premium text messages, to those showing intrusive advertisements or trying to gain access to system resources.
We also found crypto-mining malware, or crypto-jacking attacks (6.4%), which tend to use large amounts of system resources to illicitly mine crypto-currency for the attacker's gain [59]. 13.8% of the files consisted of spyware, ranging from browser hijackers [60] to keyloggers [61]. Also present were scareware [62] and ransomware [63], which restrict or deny access to system data and resources, and utilize social-engineering techniques to threaten victims into sharing their private information. Thus, it is evident that the drive-by download URLs shared by the phishing reports on Twitter cover a large array of threat families. In Section IV, we further look into the coverage of drive-by download URLs by PhishTank and OpenPhish. In this section, we determine the characteristics and volume of information shared by the phishing reports posted on Twitter, and also compare those attributes with the URLs shared on PhishTank and OpenPhish in Sections IV-A1 and IV-B, respectively. We further extend the comparison in Section IV-D by looking at what portion of URLs overlap between the phishing reports and the two other feeds, as well as what portion of URLs are alive when they are shared. Finally, in IV-C, we use machine learning tools, as well as qualitative analysis, to determine what portion of URLs shared by the reporters on Twitter are legitimate (true positives). Using regular expressions, as well as extracting the hashtags from these tweets, we were able to analyze the content presented by these reports. Overall, we found that they shared much more information than just the suspected URL. This included the IP address (31%), hosting registrar (52%), targeted organization (47%), the category of the threat (for example, phishing, scam or malware; 36%), as well as a full image (23.5%) of the phishing website. Figure 4 provides examples of two such reports and highlights the information shared by them.
Without considering the top poster, these statistics increase considerably, with 44% of the posts sharing IP addresses, 61% and 53% sharing the hosting registrar and targeted domain, respectively, 28% sharing full images of the websites and 42% sharing the name of the threat. This indicates that the tweets shared by the top poster often contain less information than those of other reporters. We now consider each of the features (IP address, hosting registrar, targeted organization, etc.) identified from these reports and determine whether the other phishing feeds, PhishTank and OpenPhish, provide similar information: 1) PhishTank: PhishTank allows any individual to add URLs to its feed, which can then be verified by other users. It provides neither hosting registrar information nor the IP address of the URLs submitted to its website, and relies only on user submissions to populate its feed. A valid submission only requires the user to provide the URL to be reported and then select the target from a list (with Other being a valid option for targets that are not present in the list). Users can also provide an open-ended response indicating the contents of the phishing page/email; however, this information does not appear anywhere on the feed. Downloading the comprehensive PhishTank feed (which contained 10,622 URLs at the time), we find that nearly 85% of the URLs carried the Other label under targeted organization, thus providing no conclusive information about the organization that the URL had targeted. Also, since PhishTank relies on verifier accounts to label whether submitted URLs are phishing or not, the feed data provided by PhishTank contains information about when each URL first appeared on PhishTank and when it was verified (the median verification time was around 12.96 minutes). However, PhishTank's downloadable feed only provides URLs which have already been verified.
Thus, to determine the efficacy of the live feed (which also contains unverified URLs), we first monitored 1k new URLs taken from PhishTank to check what percentage of them are verified. Through continued observation of these URLs, we found that PhishTank verified 724 of them with a median verification time of 11.49 minutes, marking 639 as phishing (VALID) and 85 as benign (INVALID). Among the remaining 276 URLs, we found 119 to be phishing websites, and we could not assess 37 of them because they were already inactive. Interestingly, among the phishing websites which remained unverified, 53 seemed to originate from unconventional phishing domains [19], a family of phishing threats which are very difficult to detect for both registrars and anti-phishing engines alike. We verified the remaining 120 URLs as false positives, which were added to the 85 URLs that the verifiers had labelled as INVALID. Thus, we find that these 1k URLs had a false positive rate of 20.5%. Considering that researchers often rely on PhishTank as a viable source for collecting phishing URLs [64], this rate of false positives might add significant noise to their datasets. Also, PhishTank takes screenshots automatically when a URL is submitted (PhishTank does not ask the submitter to provide this information during submission), and if said website is already down then these screenshots do not contribute any useful information about the appearance of the website. We found that 29% of the URLs on PhishTank had screenshots which indicated that the website was already inactive before submission, a phenomenon we investigate more closely in Section IV-D1. Moreover, PhishTank provides a label which indicates whether a URL is Online.
However, by checking the labels of these 1k URLs, we found that for 33% of them, PhishTank provides incorrect information about the activeness of the website (it identifies a website as Online when it is actually Offline, or vice versa). Thus, using this indicator might provide incorrect information about the activeness of the website. 2) OpenPhish: Similar to PhishTank, we focus on 1k URLs collected dynamically from the OpenPhish feed. Among these URLs, we found that about 39% provided hosting registrar information, and 23% provided the IP address of the URL. We also noticed that 74% of the URLs identified a relevant targeted organization. OpenPhish also reports when a URL first appeared in its feed. To the best of our knowledge, OpenPhish does not report a screenshot of the webpage, either on its website or through its API access. Unlike PhishTank, OpenPhish also does not report on the activity of the URL, i.e., whether a URL is online or not, nor does it identify the threat category of the URL. How OpenPhish obtains URLs is ambiguous, as they note: "OpenPhish receives millions of unfiltered URLs from a variety of sources on its global partner network." [35]. However, we assume that these partners are curated by OpenPhish themselves, and thus they might be more reliable than the open-ended anonymous submission approach implemented by PhishTank. This is further corroborated by the low false positive rate of these submissions, as we identified only 41 (out of 1k) URLs which were incorrectly marked as phishing. Later on, in Section VI, we sample from this set of URLs to track the activity of the reported URLs, how quickly they are detected by anti-phishing engines, and how this compares to the phishing reports provided by Twitter accounts. We have established that different phishing feeds share different volumes and variations of information, and illustrated how they compare to the Twitter phishing reports.
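The false positive rates for the two feeds follow directly from the counts above; a quick arithmetic check:

```python
# PhishTank sample of 1,000 URLs: 85 marked INVALID by the verifiers,
# plus 120 unverified URLs we manually judged to be benign.
phishtank_fp = (85 + 120) / 1000   # 0.205, i.e., 20.5%

# OpenPhish sample of 1,000 URLs: 41 incorrectly marked as phishing.
openphish_fp = 41 / 1000           # 0.041, i.e., 4.1%
```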
However, since both researchers and industrial entities rely on these feeds to some capacity, one of the most important aspects of these reports is the validity of the URLs that they share. In the previous section we determined that PhishTank and OpenPhish have false positive rates of 20.5% and 4.1%, respectively, based on our investigation of 1k URLs collected randomly from each feed. In this section we evaluate the validity of URLs shared by the phishing reports posted on Twitter. We evaluated the false positive rate of the URLs posted by the different report sources by scanning the URLs on VirusTotal, using manual observation, and applying an ensemble machine learning approach. We report the methodology and findings of our evaluation below: We used VirusTotal as an initial filter to reduce the number of phishing websites requiring manual evaluation. URLs which had at least 2 detections a day after their appearance in our dataset were marked as true positives. We found that nearly 31% of the tweets (n=5,109), containing 3,827 unique URLs (34%), did not reach this threshold. Manually labelling such a large volume of URLs is not practical, so we used two machine learning based implementations, one being a tool developed by Papernot et al. [65] trained on UCI's Phishing Website Dataset (Mustafa et al. [66]), and the other being Sharkcop [67], [68], to automatically label these URLs. The two tools were used together for consensus, i.e., a URL was only considered phishing or benign if both tools produced the same label. To gauge the effectiveness of these tools, we manually observed 200 URLs from our dataset and observed an accuracy of 94% for our setup. Any URL for which the tools had disparate labels was set aside for manual labelling. In this way our setup was able to mark 2,464 URLs, detecting 1,619 as phishing and 845 as benign. The remaining 1,363 URLs were labelled by 4 independent coders.
To make sure the coders did not directly interact with the potentially malicious websites, we provided them with screenshots of the websites, with each image also containing the URL of the website. The coders verified 824 URLs as phishing and 539 as benign. Thus, for URLs which had fewer than 2 detections on VirusTotal, we found 2,443 URLs to be phishing and 1,384 to be benign. In total, we found 9,755 URLs to be phishing (87% of all unique URLs), contained in 15,241 tweets. Therefore, it can be established that the URLs reported by these phishing reporters have a high true positive rate. However, our dataset is highly skewed towards the top poster, who contributed 48.2% of the tweets in our dataset. Interestingly, we found that out of the 1,384 benign URLs, 712 were posted by this user alone, which constitutes 11.3% of all URLs posted by this account (n=6,258 tweets, out of which 4,188 URLs were unique). As mentioned in Section IV-A, we found that the top poster shares fewer details in their reports compared to other users. Since the distribution of tweets shared by the reporters in our dataset is non-uniform, with a large number of users sharing only one post, we construct a cumulative distribution of the weighted false positive rate, based on how many posts each user shared versus how many of those posts contained false positive URLs, illustrated in Figure 5. We find that the top poster is one of two outliers in the distribution, with only one other user whose feed contributed a further 10% of the false positives. However, both of these users also have high true positive rates of 91% and 88%, respectively. Outside of these two outliers, as is evident from the diagram, most users have a false positive rate of less than 1%. Thus, the majority of these reporters are much more reliable than PhishTank and OpenPhish with respect to the validity of the URLs that they report.
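The consensus step of this labelling pipeline, accepting a label only when both classifiers agree and deferring disagreements to manual review, can be sketched as follows. The classifier internals here are stand-ins, not the actual tools from [65]-[68].

```python
def consensus_label(label_a, label_b):
    """Return the shared label when both classifiers agree,
    or None to send the URL to the manual-review queue."""
    return label_a if label_a == label_b else None

def triage(urls, classify_a, classify_b):
    """Split URLs into auto-labelled and manual-review buckets."""
    auto, manual = {}, []
    for url in urls:
        label = consensus_label(classify_a(url), classify_b(url))
        if label is None:
            manual.append(url)
        else:
            auto[url] = label
    return auto, manual
```

Requiring agreement trades coverage for precision: the 2,464 auto-labelled URLs above are those on which both tools concurred, while the 1,363 disagreements went to the human coders.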
1) Dead on arrival rate (DoA): We identify a URL as dead on arrival when it is already inactive upon first appearing in a phishing report/feed. We randomly selected 1k URLs from our phishing report dataset, checked whether they were active when they first appeared in a report, and compared them with the 1k URLs we had already selected from OpenPhish and PhishTank. We found that 24.2% of the URLs on PhishTank are dead on arrival; the statistic is 11.4% for OpenPhish. In comparison, only 3.8% of URLs posted by phishing reporters on Twitter exhibit this behaviour. This indicates that URLs posted in Twitter reports are far more likely to be alive, and thus need immediate intervention from the targeted registrars and organizations. 2) Overlap between reported URLs and other phishing feeds: For each URL labelled as a true positive in Section IV-C, we queried its availability on OpenPhish and PhishTank using their respective APIs. Prior literature [37] has noted that URLs keep appearing and disappearing from these phishing feeds, depending on whether they are still active. Thus, we kept checking for each URL in both OpenPhish and PhishTank every 30 minutes, until a week after its first appearance in the respective dataset. We find that few URLs overlap with entries in these feeds: only 13% of the Twitter phishing report URLs appear on OpenPhish, and a mere 4% on PhishTank. This indicates that many true positive URLs posted by the phishing reporters on Twitter do not appear in either PhishTank or OpenPhish. Interestingly, 5.8% of the overlapping URLs that showed up on OpenPhish did so at a median time of 6 hours after being posted on Twitter. The same statistic stands at 1.3% for PhishTank.
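The polling loop for the overlap measurement can be sketched as below. `fetch_feed` is a placeholder for an OpenPhish/PhishTank API call returning the current set of listed URLs; the clock and sleep are injectable so the schedule logic can be tested without waiting.

```python
import time

CHECK_INTERVAL = 30 * 60        # 30 minutes, in seconds
WINDOW = 7 * 24 * 60 * 60       # one week after first appearance

def overlap_after_week(url, first_seen, fetch_feed,
                       now=time.time, sleep=time.sleep):
    """Poll a feed every 30 minutes for one week after `first_seen`;
    return the first timestamp at which `url` shows up, else None.
    `fetch_feed` stands in for the real feed API query."""
    while now() - first_seen < WINDOW:
        if url in fetch_feed():
            return now()
        sleep(CHECK_INTERVAL)
    return None
```

With a fake clock, a URL that a feed lists 90 minutes after its first Twitter appearance is picked up on the third check.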
While it is difficult to ascertain whether these feeds take some input from these phishing reports, our findings do suggest that the phishing reports on Twitter are a faster medium for discovering new phishing threats, as they surface these URLs more quickly than both OpenPhish and PhishTank. Considering that registrars and even anti-phishing engines often rely (at least partially) on these phishing feeds to identify URLs, the feeds' failure to cover a large percentage of the URLs reported on Twitter can be detrimental to user protection. Thus, our findings indicate that the phishing reports are an untapped resource for quickly acquiring a vast breadth of information about new phishing websites, when compared to PhishTank and OpenPhish. We summarize the functionalities exhibited by the reports from each of these phishing feeds in Table I. In the next section, we determine how other users on Twitter interact with these phishing reports. We collected the comments posted on each of the phishing report tweets, and found that only 2,285 tweets received at least one reply, around 14% of the dataset. Moreover, very few of these interactions come from the registrars or the targeted organizations (752 out of the 2,285 conversations with at least 1 reply, 4% overall). This is despite the fact that 55.2% of these reports contain a hashtag citing the concerned services. Figure 7 illustrates the interaction of the registrars/targeted organizations with the phishing reports. We see that even among services which have more than 100 tweets dedicated to them by the reporters (via hashtags), only 2 replied to about 30% of the tweets they were tagged in, with 5 targets not replying to any. Thus, the CDF indicates that targets have very low interaction with these reports, despite the reports containing URLs which have a high chance of being true positives.
We noticed that the median time for getting a reply from the domain registrars is 103 minutes, whereas the same for targeted organizations is 171 minutes. We term this form of interaction from the registrars/targets explicit interaction, because in these cases we can say for sure that the target has noticed the report. Later, in Section VI, we explore how this interaction influences the pace at which the reported websites go offline, and how it compares to posts which do not receive any explicit interaction. We have already seen in Section IV-C that there is a high chance that the posts shared by the phishing reporting users contain legitimate phishing URLs. Additionally, the URLs posted by these accounts have a low overlap with the URLs posted in the other phishing feeds that we investigated. Thus, the visibility of these tweets is vital for recognizing the new phishing websites that they report. However, we have determined that these reports receive very few interactions from the community, as well as from the targeted domains and organizations. Another way these tweets gain visibility is through likes and retweets [69]. We found that these posts have very few interactions in the form of likes (median=0) and retweets (median=2) as well. In fact, nearly 82% of the tweets in our dataset (n=13,511) did not receive any likes, and 58% (n=9,596) did not get retweeted. The cumulative distribution of the number of likes (favourites) and retweets received by the report tweets is illustrated in Figure 10. Thus, the lack of this form of interaction further limits the propagation of these phishing reports through the Twitter community. However, we find that the total number of retweets (n=34,284) is 4.5 times the total number of likes received (n=7,527) when all the tweets in our dataset are considered together.
This indicates that the users who interact with these posts have the intention of passing the information along to their peers, which might lead to more attention towards these tweets. We are now interested in determining what percentage of the likes/retweets received by these tweets come from users in the technological communities, especially in the security field. We do so by examining the profile descriptions of the users who liked and retweeted the tweets in our dataset. Since it is impossible to qualitatively analyze the profile descriptions of all such users, we assigned four coders to go through the profile descriptions of 500 users (who liked/retweeted the reports), to identify which of them indicate that the individual's line of profession/interest is Technological (Tech), Computer Security (Security), or unrelated to Tech (Non-Tech). Based on this labelling, we picked out the profile descriptions of the users marked as Security and created a word cloud, as illustrated in Figure 8. We obtained the top 20 most frequently occurring words and their combinations, and matched them against the profile descriptions of the users who liked (n=7,527) and/or retweeted (n=34,284) the phishing reports. We find that about 37% of likes/retweets came from users who are interested or work in computer security. Do note that our findings are based on the keywords that we selected from the word cloud, and that about 14% of the users had a blank or irrelevant profile description. Thus, realistically, the number of security focused users who interact with these tweets might be even higher. Even so, a large number of these interactions came from individuals belonging to the security community, which might increase the chances of the reports being noticed by a registrar/targeted organization. However, we also note that the majority of likes and retweets come from only 5% of the total number of accounts that belonged to the security community (based on their profile descriptions).
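The keyword-matching step can be sketched as follows. The stopword list, tokenizer, and example bios are illustrative assumptions; the paper's actual keyword set came from the coder-labelled word cloud in Figure 8.

```python
import re
from collections import Counter

STOPWORDS = {"and", "the", "of", "a", "in", "for", "with"}

def top_keywords(descriptions, k=20):
    """Most frequent words across coder-labelled 'Security' profile bios;
    a simplified stand-in for the word-cloud step."""
    words = [w for w in re.findall(r"[a-z0-9#]+", " ".join(descriptions).lower())
             if w not in STOPWORDS]
    return {w for w, _ in Counter(words).most_common(k)}

def is_security_user(bio, keywords):
    """Flag a profile as security-related if its bio shares any keyword."""
    tokens = set(re.findall(r"[a-z0-9#]+", bio.lower()))
    return bool(tokens & keywords)
```

Applying `is_security_user` over the bios of every liker/retweeter yields the kind of 37% estimate reported above.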
We find that the accounts in our dataset have a median follower count of 472 and a median listed count of 7. Despite our previous findings that the phishing reports receive low explicit interaction, as well as very few likes and retweets, the large majority of phishing reporting accounts have a decent number of followers, with 523 accounts having more than 100 followers. We also checked the listed count rate (LCR), i.e., an account's listed count as a percentage of its total number of followers. Listed count is considered a metric for credibility [70], i.e., users tend to list accounts which they rely on for information regarding specific topic(s). Interestingly, we find that 3 users have a listed count higher than their total number of followers; however, 93% (n=652) of the reporters have an LCR of less than 10%, with 40% of accounts (n=283) having an LCR of less than 1%. This indicates that despite the users having a decent number of followers, most of them are either not recognized or not considered a credible source of information, as indicated by their low LCR. Incidentally, the top poster account has an LCR of only 2.9%, despite contributing the majority of the URLs to our dataset. Using the keywords that we found in the profile descriptions of security related users in Section V-A, we find that at least 33% of the followers belong to the security community. While it is interesting to see that a sizeable share of the users who follow these accounts belong to a relevant community, as we found earlier in Section V-A, the number of unique Security users who actually interact with these tweets through likes and retweets is much lower (5%). We have already observed that the domain registrars and organizations targeted by the reported URLs have very low explicit interactions (posting comments) with said reports.
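The LCR metric defined above is a one-line ratio; a minimal sketch, using the median reporter's counts from the text as the example input:

```python
def listed_count_rate(listed_count, follower_count):
    """LCR: an account's listed count expressed as a percentage of its
    follower count. Accounts with no followers get an LCR of 0."""
    if follower_count == 0:
        return 0.0
    return 100.0 * listed_count / follower_count

# The median reporter in the dataset: listed by 7 users, 472 followers
median_lcr = listed_count_rate(7, 472)   # roughly 1.5%
```

An LCR above 100 corresponds to the rare case, noted above, of an account listed by more users than follow it.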
But since we have already established that these reports are reliable and provide substantial information about the phishing websites, it is very important that these reports are discoverable, i.e., that the targeted entities can notice them and thus expedite the process of removing the URLs. The most convenient way to discover such new reports is to follow the phishing reporting accounts, as posts from these accounts will then show up in the followers' personalized feeds. Out of the 349 registrars and organizations that were tagged by these reports, 303 (87%) have an account on Twitter. Using the Twitter API, we collected the names of all followers of each of our 701 accounts, and then looked for the presence of these 303 registrars'/organizations' Twitter accounts in the follower lists. We found only 31 targets/organizations (10.2%) which follow at least one of the phishing reporting accounts, with one user following a maximum of 12 accounts. Figure 9 illustrates the distribution of the domains/organizations across the number of phishing report accounts that they follow. While it is difficult to ascertain how and whether registrars/organizations keep track of URLs shared by these phishing reports, our findings imply that a large majority of targeted domains/organizations do not follow these reporting accounts, either because they are not aware of them, or because they do not consider them a credible source of phishing reports. Fig. 9. Distribution of the domain/organization accounts across the number of phishing report accounts that they follow. The age of an account denotes how long it has been active on Twitter. We found that the phishing reporting accounts in our dataset have been active for a median period of 2,129 days (5.83 years), with only 83 accounts (12%) having an age of less than a year.
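Once the follower lists are collected, the overlap check reduces to set membership. A minimal sketch, with hypothetical handles; the real data would come from the Twitter API's followers endpoint:

```python
def targets_following_reporters(reporter_followers, target_handles):
    """For each targeted registrar/organization handle, count how many
    phishing-reporting accounts it follows. `reporter_followers` maps a
    reporter handle to the set of handles that follow it. Targets that
    follow no reporter are omitted."""
    counts = {}
    for target in target_handles:
        n = sum(target in followers
                for followers in reporter_followers.values())
        if n:
            counts[target] = n
    return counts

# Hypothetical example: one target follows two reporters, one follows none
followers = {"reporter1": {"brand_a", "alice"},
             "reporter2": {"brand_a"},
             "reporter3": {"bob"}}
result = targets_following_reporters(followers, {"brand_a", "registrar_x"})
```

Summing over all 303 target accounts in this fashion yields the 31 (10.2%) figure reported above.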
Prior literature has recognized that accounts which tend to distribute spam and misinformation have a low account age [71], [72], and thus the longevity of these accounts can serve as yet another feature/indicator for domain registrars and anti-phishing tools to determine whether they should rely on the reports. We continuously checked whether each unique URL in the phishing reports was active, and found that throughout the duration of the study, 39% of the URLs reported by the accounts were still active after a day, and 31% after a week. Since only 752 tweets received a reply from the registrar/targeted organization (4% of all tweets in our dataset, containing 671 unique URLs), we compare these tweets with an equal number of randomly selected unique URLs from reports which did not receive such a reply. Note that for the latter group, we only selected URLs which had become inactive. We performed a Mann-Whitney U test [73] on both groups of URLs, and found that URLs in reports which get a reply from the targeted organization become inactive sooner than URLs in reports which do not (p<0.01). URLs in posts which received a reply from an organization all became inactive, at a median time of 403 minutes; for URLs which did not get a reply, the median time of removal was 1,172 minutes. However, the latter group of URLs can be further bifurcated into two groups. We have seen earlier that 52% of these reports use a hashtag which cites the registrar or targeted organization. Thus, to determine whether there is a difference between the activity time of URLs which were reported with relevant hashtags versus those which were not, we randomly selected 500 posts (each containing unique URLs) from the two groups, and performed a Mann-Whitney U test again.
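The Mann-Whitney U test compares two groups of removal times without assuming normality. A dependency-free sketch using the normal approximation is shown below (in practice one would typically reach for `scipy.stats.mannwhitneyu`); the removal times in the example are invented.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.
    U here counts the (x_i, y_j) pairs with x_i < y_j (+0.5 per tie),
    one of the two equivalent U statistics; |z| is the same either way."""
    u = sum((xi < yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return u, p

# Hypothetical removal times (minutes): reports with a reply vs without
replied = [100, 200, 300, 403, 500]
not_replied = [900, 1000, 1172, 1300, 2000]
u_stat, p_value = mann_whitney_u(replied, not_replied)
```

For clearly separated samples like these, every pair satisfies x < y, U hits its maximum of n1·n2, and the p-value falls below 0.05.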
Our results indicate that URLs in posts which tag the hosting or targeted organization are removed more quickly than those in posts which do not contain such hashtags (p<0.01). The median time of removal for URLs reported with a relevant hashtag was 847 minutes, while those without had a median time of removal of 1,591 minutes. It is to be noted that all URLs which garnered a reply from the target had relevant hashtags. However, these reports were only 8% of the overall tweets which used hashtags (752 out of 8,572 posts), indicating that the majority of reports with hashtags do not receive an explicit interaction from the target or hosting organizations. Thus, it is hard to determine what factor determines whether a targeted registrar/organization will reply to these reports. Reported URLs become inactive the quickest when their report receives a reply from the targeted registrar/organization, with the 671 unique URLs removed at a median time of 403 minutes. Comparing this time with the same number of (true positive) URLs chosen from PhishTank and OpenPhish, and performing the respective Mann-Whitney U tests on removal times between each feed and the phishing reports, we notice that URLs found on PhishTank are removed at a median time of 132 minutes, a significant edge over the Twitter URLs (p<0.05). The same holds for OpenPhish, whose URLs are removed at a median time of 71 minutes (p<0.01). Our findings thus suggest that URLs appearing in these feeds get removed much faster than those found in the phishing reports. However, in Section IV-D1, we have already found that far more of the URLs submitted on PhishTank and OpenPhish are dead on arrival, compared to those found on Twitter.
This, combined with the minimal overlap of the URLs in the Twitter reports with the two other phishing feeds, further suggests that: (a) phishing reports on Twitter are a viable solution for finding new phishing URLs, most of which are not found on at least two other popular phishing feeds, and (b) registrars and targeted organizations are slower at removing websites which show up in these reports, something that can be easily improved upon. We expand upon these findings in Section VI-C, where we compare the coverage by anti-phishing tools of URLs in phishing reports against those found on the two other phishing feeds. Around 31% of unique URLs (n=3,453) remained active even after a week. It is interesting to note that none of the 671 URLs which were part of posts that targets replied to fall in this category, suggesting that an explicit interaction from the targeted registrar/organization leads to the removal of the website. Almost 67% of the URLs which were not removed after a week (n=2,311) lacked any relevant hashtags. Thus, we hypothesize that the lack of such hashtags might make these reports harder for targets to search for or index, compared to those which have one. The higher rate of removal for URLs reported with relevant hashtags further hints at implicit interaction between the target registrars/organizations and these phishing reports, i.e., targets may investigate (and remove) the reported URLs without directly interacting with the reports. However, this assumption is not comprehensive, as URLs in posts with hashtags might have also shown up in phishing feeds not covered in this study; a future study could clarify this by focusing only on URLs which appear exclusively in these phishing reports. 1) The case of unconventional phishing URLs: Work by Saha Roy et al.
[19] explored a new category of phishing URLs which use free hosting domains to remain undetected by anti-phishing tools, and which similarly are not removed by registrars for long periods of time, if at all. In our phishing report dataset, we found 5% (n=631) of URLs belonging to this category, out of which 53 received a reply from the registrar/targeted organization. We found that all of these URLs were removed at a median time of 319 minutes, which is much quicker (by several days) than what has been previously established. We selected 100 random URLs of this category from each of the other phishing feeds, and noticed that such URLs on PhishTank are removed at a median time of 1,047 minutes after appearing, whereas the statistic for OpenPhish is 892 minutes. Thus, our preliminary analysis suggests that phishing reports are a much quicker way of ensuring these hard-to-detect URLs are removed, when compared to OpenPhish and PhishTank. Fig. 11. Tracking median VirusTotal scores through the first day of appearance for URLs in phishing reports which received a comment from the registrar/target organization, reports which did not, as well as PhishTank and OpenPhish. In this section, we investigate how quickly anti-phishing tools pick up on URLs shared by phishing reports, and how this compares to OpenPhish and PhishTank. Since we only have 752 posts on Twitter, containing 671 unique URLs, which received explicit interaction from the targets, it would not be fair to compare them with a large volume of URLs from the other conditions, i.e., Twitter posts without explicit interaction and the two feeds. Thus, we sample 500 random tweets from each of these sets. Since most phishing URLs and campaigns are only online for less than a day, we tracked how many anti-phishing tools detected these URLs at 30-minute intervals over a period of 24 hours. We illustrate the detection of the URLs through time in Figure 11.
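The detection-tracking loop can be sketched as follows. `scan` is a placeholder for a VirusTotal-style lookup returning the current number of engines flagging a URL; the real study would query the API at each interval rather than pass the time in.

```python
import statistics

def detection_timeline(urls, scan, checks=48, interval_min=30):
    """Record how many engines flag each URL at 30-minute intervals
    over 24 hours (48 checks). `scan(url, t)` stands in for a
    VirusTotal-style lookup at minute t."""
    timeline = {u: [] for u in urls}
    for step in range(checks):
        t = step * interval_min
        for u in urls:
            timeline[u].append(scan(u, t))
    return timeline

def median_series(timeline):
    """Median detection count across URLs at each time step, i.e. one
    curve of the kind plotted in Figure 11."""
    return [statistics.median(step) for step in zip(*timeline.values())]

# Hypothetical scanner: URL "a" gains one detection per hour, "b" none
timeline = detection_timeline(["a", "b"],
                              lambda u, t: t // 60 if u == "a" else 0)
medians = median_series(timeline)
```

Plotting one such median curve per condition (replied, not replied, PhishTank, OpenPhish) reproduces the comparison in Figure 11.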
To avoid congestion in the figure, we extended the time bins to 1 hour instead of 30 minutes. Our results indicate that URLs on OpenPhish and PhishTank are detected by many more engines within a short time after their appearance, compared to those included in Twitter phishing reports. However, reports which are explicitly interacted with by targeted registrars and organizations see a rapid rise in detection rate by anti-phishing engines, going almost head to head with the other phishing feeds, if not exceeding them. URLs which did not get explicit interactions, in contrast, tend to have consistently lower anti-phishing tool detection throughout the day. We have already noticed in Section VI that the majority of the URLs included in Twitter phishing reports remained alive or were removed very slowly; here we see that they are very sparsely detected by anti-phishing tools as well. To reliably analyze the qualitative content shared by the tweets used in this study, we only examined reports which were in English. We also limited our data collection to tweets which contained four terms/hashtags: 'hxxp', 'hxxps', #phishing and #scam. Thus, our dataset is not exhaustive, with the possibility of more phishing reports existing in other languages or using different text formats. We also do not account for how the phishing reporters on Twitter obtain the URLs in the first place. Thus, as a future study, we propose a qualitative line of research which can investigate these reporters on a case-by-case basis, conducting interviews and questionnaires to understand how they maintain their accounts on Twitter.
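The 'hxxp'/'hxxps' terms above are defanged URL schemes, which reporters use so the links are not clickable; before a reported URL can be scanned it has to be restored. A minimal refanging sketch, which also handles the common `[.]` defanging (an assumption beyond the four collection terms the paper lists):

```python
import re

DEFANGED_SCHEME = re.compile(r"hxxp(s?)://", re.IGNORECASE)
BRACKET_DOT = re.compile(r"\[\.\]")

def refang(text):
    """Restore a defanged URL (hxxp://, hxxps://, [.]) to its
    scannable form."""
    return BRACKET_DOT.sub(".", DEFANGED_SCHEME.sub(r"http\1://", text))
```

For example, `refang` turns a reported `hxxps://evil[.]example[.]com/login` back into a URL that VirusTotal or a feed API will accept.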
Outside of Twitter, we only examined two other open phishing feeds; since we did not attempt to study proprietary (closed) phishing feeds, it is not possible to determine why phishing reports which received no interaction from the targeted entities still got removed, as those URLs may have shown up in closed feeds, which might have triggered their removal. In this study, we establish phishing reports posted on Twitter as a new and reliable resource for sharing information regarding these web-based threats. Compared to two other open phishing feeds, PhishTank and OpenPhish, our findings indicate that phishing reports on Twitter tend to share more information about the phishing URLs, cover an extra threat category (drive-by downloads), and tend to have a lower volume of false positives. Considering that the best defense against phishing websites is taking them offline, several of these reports also use hashtags to notify the domain registrars and targeted organizations that they have been targeted by the reported URLs. When the targeted entities (targets) interact with (comment on) the posts, the reported URL is deactivated quickly and detected by more anti-phishing engines. However, these interactions were noticed in only 4% of the posts in our dataset. Among the URLs contained in reports which did not get any interactions, nearly 31% were still active even after a week, in addition to being detected by fewer anti-phishing engines compared to URLs posted on OpenPhish and PhishTank. We also noticed that only 10.2% of the targets follow at least one phishing reporting account, with one entity following a maximum of 12 reporting accounts.
This indicates that, despite the majority of these tweets reporting true positive phishing threats, targeted entities are either not aware of these reporting accounts or do not find them a viable source for gathering information about new phishing URLs. Additionally, while many of the users who follow these accounts belong to the security community, only a very small minority (5%) actually interact with or share these reports, which might negatively impact the discoverability of the reports. Thus, our evaluation brings to light the effectiveness of phishing reports that are shared on Twitter. The reliability and volume of information shared by these reports can expedite the process of moderation and removal of newer phishing threats. Also, considering the low rate of false positives, security researchers can especially benefit from extracting information from these reports and utilizing it for ground-truth labelling. Prevalent anti-phishing tools can likewise extract information from this resource to enhance their own blocklists. However, the current situation indicates that domain registrars and organizations targeted by phishing threats tend to ignore, or are not aware of, these reports, despite many of the reported URLs being exclusive to these phishing reports and not showing up on PhishTank or OpenPhish. This is further exacerbated by the fact that security focused users who follow these accounts tend not to share these tweets through their networks to raise awareness and discoverability of the reports. Thus, we hope our findings can raise awareness of the effectiveness of this existing knowledge-base on Twitter, such that it can be integrated into prevalent phishing moderation and research workflows, as well as motivate further research into analyzing these accounts and into whether similar useful knowledge-bases can be found within other online social media networks.
References
Google registers record two million phishing websites in 2020
Must-know phishing statistics: Updated 2020
Machine learning based phishing detection from urls
Detection of phishing websites using an efficient feature-based machine learning framework
Phishing e-mail detection by using deep learning algorithms
Phishing detection: A recent intelligent machine learning comparison based on models content and features
Phishing website detection based on multidimensional features driven by deep learning
Web phishing detection using a deep learning framework
New rule-based phishing detection method
Phidma - a phishing detection model with multi-filter approach
Phishwho: Phishing webpage detection via identity keywords extraction and target domain name finder
kn0w thy doma1n name: unbiased phishing detection using domain name based features
Detecting phishing web sites: A heuristic url-based approach
Heuristic nonlinear regression strategy for detecting phishing websites
Intelligent phishing url detection using association rule mining
Systems and methods for risk rating and pro-actively detecting malicious online ads
Cracking classifiers for evasion: a case study on the google's phishing pages filter
Creative persuasion: a study on adversarial behaviors and strategies in phishing attacks
What remains uncaught?: Characterizing sparsely detected malicious urls on twitter
Bypassing detection of url-based phishing attacks using generative adversarial deep neural networks
Phishtime: Continuous longitudinal measurement of the effectiveness of anti-phishing blacklists
1.4 million phishing websites are created every month
Intelligent rule-based phishing websites classification
Phishing - challenges and solutions
Priming and warnings are not effective to prevent social engineering attacks
On the anatomy of social engineering attacks - a literature-based dissection of successful attacks
Why phishing still works: User strategies for combating phishing attacks
Security awareness of computer users: A phishing threat avoidance perspective
Phishing happens beyond technology: The effects of human behaviors and demographics on each step of a phishing process
Coronavirus scams, feeding off investor fears, mimic fraud from the 2008 financial crisis
Corona virus (covid-19) pandemic and work from home: Challenges of cybercrimes and cybersecurity
Phishing attacks increase 350 percent amid covid-19 quarantine
PhishTank
Phishing feed
An empirical analysis of phishing blacklists
An analysis of phishing blacklists: Google safe browsing, openphish, and phishtank
Evaluating the wisdom of crowds in assessing phishing websites
How does built-in phishing and malware protection work
Opera introduces fraud protection, powered by geotrust and phishtank: New release expands opera's commitment to secure browsing
Safari and privacy
Friends of PhishTank [Infographic]
Email Security - Defanging URLs
VirusTotal
Automated malicious advertisement detection using virustotal, urlvoid, and trendmicro
Analysis of malware download sites by focusing on time series variation of malware
Automating url blacklist generation with similarity search approach
Rmvdroid: towards a reliable android malware dataset with app metadata
Python-whois
Dynamical classes of collective attention in twitter
An analysis of social engineering principles in effective phishing
All your iframes point to us
Opening the blackbox of virustotal: Analyzing online phishing scan engines
What is a Trojan? Is it a virus or is it malware
Cryptojacking - What is it
Demystifying a Keylogger - How They Monitor What You Type and What You Can Do About It
What is Scareware
What is Ransomware
Opening the blackbox of virustotal: Analyzing online phishing scan engines
Detecting phishing websites using a decision tree
Phishing websites data set
Sharkcop
Sharkcop
Emoji, playfulness, and brand engagement on twitter
Modeling topic specific credibility on twitter
A framework for real-time spam detection in twitter
Feature engineering for detecting spammers on twitter: Modelling and analysis
Mann-whitney u test