key: cord-0641011-o5jw441f authors: Wu, Peilun; Yan, Fan; Guo, Hui title: Holmes: An Efficient and Lightweight Semantic Based Anomalous Email Detector date: 2021-04-16 journal: nan DOI: nan sha: 2ab217063daf7a05a1f2a6bf039a2a62e99b4fb7 doc_id: 641011 cord_uid: o5jw441f

Email threats are a serious issue for enterprise security and cover various malicious scenarios, such as phishing, fraud, blackmail and malvertisement. A traditional anti-spam gateway commonly maintains a greylist to filter out unexpected emails based on suspicious vocabulary in the mail subject and content. However, this signature-based approach cannot effectively discover novel and unknown suspicious emails that exploit current hot topics, such as COVID-19 and the US election. To address the problem, we present Holmes, an efficient and lightweight semantic based engine for anomalous email detection. Holmes converts each email event log into a sentence through word embedding and then extracts the interesting items among them by novelty detection. Based on our observations, we claim that, in an enterprise environment, there is a stable relation between senders and receivers, whereas suspicious emails commonly come from unusual sources, which can be detected through rareness selection. We evaluate the performance of Holmes in a real-world enterprise environment that sends and receives around 5,000 emails each day. As a result, Holmes achieves a high detection rate (it outputs around 200 suspicious emails per day) while maintaining a low false alarm rate for anomaly detection.

Though instant messaging software, such as Facebook and WeChat, has gained increasing popularity, the email service is still indispensable for enterprises. Since the email service is a public-facing application, attackers can target it as an easy entrance to the internal network.
Based on our observations, fraud, malvertisement and spread-phishing are the main email threats frequently received by enterprise users. These emails use deceptive subjects to disguise themselves. Usually, malware-infected attachments or malicious URLs are embedded in the email body to trick recipients into further action. Once the attachment is downloaded or a link is clicked, the recipient's system is compromised or confidential information is leaked [1]. To alleviate the problem, an enterprise often deploys anti-spam gateways to filter out unexpected emails. However, the associated spam detection techniques, such as greylisting and subject analysis, cannot effectively discover novel and unknown email threats that are elaborately constructed around current hot topics, such as COVID-19 and the US election. These unknown threats can easily bypass the anti-spam gateway and penetrate the target system, leading to a series of damaging consequences, such as administrator account theft, database attacks and financial blackmail.

In this paper, we introduce a novel artificial intelligence based anomalous email detector, HOLMES, that can effectively tackle the challenges mentioned above. HOLMES combines word embedding with novelty detection to discover anomalous behaviours from a high volume of mirrored SMTP traffic in a large-scale enterprise environment. To improve the interpretability of the results, we trace the real source IP addresses of suspicious emails in line with their geographical positions and further visualize the correlated relations in a force-directed graph. Our contributions are summarized as follows:

• We propose an efficient and lightweight semantic based anomalous email detector, HOLMES, which not only discovers new email threats but also maintains a low false positive rate in a real-world environment.
• Unlike other detectors that usually need to examine email bodies, HOLMES discovers anomalies based solely on email headers, which significantly reduces resource consumption and avoids accessing email bodies (a sensitive security issue).
• We demonstrate the correlated relations of detected suspicious emails via graph visualization and show that the attacker portrait (based on geographical positions) is in line with the cyber threat intelligence provided.
• We evaluate HOLMES against a commercial anti-spam gateway deployed in a real-world enterprise environment. HOLMES not only accurately detects the email threats that were blocked by the anti-spam gateway, but also discovers a large number of email threats that escaped the gateway.
• We also compare HOLMES with several commercial email detectors offered by different security vendors in VirusTotal [2], which shows that HOLMES outperforms those detectors with a very high detection rate when used for threat hunting in the wild.

The remainder of the paper is structured as follows. Section II briefly discusses related work on email detection. Section III introduces the proposed semantic based anomalous email detector, HOLMES. Section IV presents our evaluation results for HOLMES and several commercial security products, and demonstrates how the visualization can be used to reconstruct the attack stories. The enhancements to HOLMES for real-world deployment are given in Section V. The paper is concluded in Section VI. As an add-on section, we append some extra discussions at the end of this paper.

Anomalous emails can be classified into external threats and internal threats in accordance with the MITRE ATT&CK Matrix [3].
External threats are emails sent from external sources, whereas internal threats are emails sent from legitimate users within an organization whose email accounts have been stolen and used for lateral movement attacks. Most previous research focuses on one specific threat type, such as URL-based lateral phishing [4] or phishing web pages from search engines in large-scale cyberspace [5]. There are still many open questions and unsolved challenges that need to be addressed holistically. Some issues and the existing solutions are presented below.

No Built-In Authentication in SMTP. The lack of a native authentication mechanism in the SMTP service presents a security loophole to attackers. Attackers can easily forge the email header, pretending to be someone the recipient knows or a business the recipient has a relationship with, so as to deceive recipients and avoid spam block lists [6]. To address the problem, several frameworks, such as SPF [7] (Sender Policy Framework), DKIM [8] (DomainKeys Identified Mail) and DMARC [9] (Domain-based Message Authentication, Reporting, and Conformance), have been developed to incorporate authentication into the email system. However, these designs are still not very effective in practice. When authentication is integrated into a mail system with a typical component-based software design, inconsistencies arise between the software components offered by different parties [10], such as incompatible mail forwarding servers, which allow numerous email threats to escape detection.

Lack of Sensitivity to Unknown Variations. The unreliability of SMTP has allowed email threats to evolve into many variations that are difficult for traditional security products to discover. We have evaluated several malicious email detection modules within our internal security products that use pattern matching of attack signatures for anomaly detection.
None of them can discover crafted phishing emails that use business-related content to appear normal and evade detection. We have also used the crafted phishing samples collected from our real-world hunting to evaluate the detection rate of 60+ typical detection engines in VirusTotal (Enterprise Service). The evaluation result again shows their low sensitivity to unknown threats: in fact, all testing samples escaped detection by those engines. This weakness in detecting unknown attack variations has motivated the security community to turn to AI-based methods for anomaly detection.

High False Positive Rate. Research on anomaly detection for cyber threat hunting has been around for decades. The main concern in applying machine learning to anomaly detection is the significant false positive rate (FPR). Even though new designs are continuously proposed aiming for improvement [11]-[13], they have rarely been evaluated in a real-world working environment, let alone put into use in commercial systems.

High Cost and Performance Bottleneck. The imbalance between the cost of data collection and the performance of algorithmic consumption is a significant challenge for most AI-based detectors. Though the complexity of AI computing algorithms has been constantly improved, most AI modules still require large computing and storage resources, which makes the existing attack detectors hard to use and very slow to respond to attacks. Furthermore, detectors that use supervised machine learning require labeled input data records and often need to be retrained once their performance begins to degrade, which also makes machine learning ineffective for detection automation.

Lack of Provenance Analysis. So far, few detectors have integrated provenance analysis into the detection mechanism.
We believe provenance analysis is an important and enabling component in malicious email detection. Provenance analysis [14] can reveal the attack story and the details of the attacker portrait behind the email, such as (1) where the email is from, (2) who the real sender is, (3) how the malicious shellcode executes, and (4) what the potential correlations between malicious events are. This information is important for the security team to analyze the attack tactics, techniques and procedures (TTPs) and further helps security experts identify the individual attackers or organizations.

To address the above challenges, we introduce an efficient and lightweight semantic oriented anomalous email detector, HOLMES, that can detect an email attack by analyzing the sender-recipient relation, which is available in the email header of SMTP. HOLMES is a threat hunting tool for incident response and investigation. It works on mirrored network traffic in bypass mode, so it inspects and reports anomalies but cannot block them directly. HOLMES assists the incident response team in discovering more concealed threats, which escape the detection of the traditional anti-spam gateway and threaten network security. HOLMES is a self-adaptive learning machine: it learns from the historical SMTP traffic of the last 24 hours and then detects anomalies that deviate from the baseline of historical behaviour in the next 24 hours.

Fig. 1 shows an overview of HOLMES. For each email, HOLMES takes its header and converts it into a numeric representation through word embedding. The numeric data records are then input to the machine learning unit (Novelty Detection). The unit generates a list of novel emails, which are, in turn, processed by the rareness selection procedure to narrow down the detection targets. The detected results are finally presented in a human readable format, and the correlations of the related email attacks are also pictured with a graph.
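The 24-hour learning and detection windows described above can be illustrated with a small sketch. It assumes each event is a dictionary carrying a `timestamp` field; that field name, the function name and the sample events are hypothetical and are not taken from the original code.

```python
from datetime import datetime, timedelta

def split_windows(events, boundary, window=timedelta(hours=24)):
    """Split events into a baseline window and the detection window after it.

    `events` is an iterable of dicts with a datetime under "timestamp"
    (a hypothetical field name). Events in [boundary - window, boundary)
    form the baseline; events in [boundary, boundary + window) are scored.
    """
    train, test = [], []
    for event in events:
        ts = event["timestamp"]
        if boundary - window <= ts < boundary:
            train.append(event)
        elif boundary <= ts < boundary + window:
            test.append(event)
    return train, test

# Illustrative events only; the subjects are made up for the example.
boundary = datetime(2020, 12, 2)
events = [
    {"timestamp": datetime(2020, 12, 1, 9, 0), "subject": "weekly report"},
    {"timestamp": datetime(2020, 12, 2, 10, 30), "subject": "New Sign-in Attempt"},
    {"timestamp": datetime(2020, 11, 29, 8, 0), "subject": "too old"},
]
train, test = split_windows(events, boundary)
```

Sliding `boundary` forward by one window at a time gives the rolling baseline: yesterday's detection window becomes today's training window.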
HOLMES is also a very lightweight and highly efficient detector, which was originally written in Python with only 52 lines of code. Based on the run-time analysis, HOLMES can complete the entire detection in less than 73 seconds with 127 MB of memory on datasets of around 700 MB. In this section, we open-source the original code and detail the implementation of the main detection functions, in the hope of helping researchers evaluate and reuse HOLMES in their future research.

Since the textual header information of an email cannot be used directly for machine learning, it is important to represent the textual data effectively in a machine-understandable form. Most algorithm engineers use OneHotEncoder [15], OrdinalEncoder [16] or Bag-of-Words (BOW) [17], which can be simply implemented with the open-source library Scikit-Learn. However, those methods cannot effectively maintain the semantic correlations of the data in either the temporal or the spatial dimension. To address the problem, we use paragraph vector (Doc2Vec) [18] for the conversion, as detailed by the Python code in Listing 1.

Compared to other conversion methods, Doc2Vec better preserves the semantics of the words, or more formally the distances between the words, for texts of variable length ranging from sentences to documents. Doc2Vec is a semi-supervised learning algorithm: its input is unlabeled, but what will be learned is specified (supervised). In our code, the inputs are email headers and what is to be learned are the features in the header, as highlighted in blue in the top-right block in Fig. 1. Besides basic attributes such as subject, header.from or user-agent, which are often forged by hackers, we design two additional features that can also help identify anomalies: the direction of the email (direction) and the country of the source IP address (srcIp.country). The code in Listing 1 converts each email event to a feature vector, as illustrated in the middle boxes on the right side of Fig. 1.
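As a rough illustration of the conversion step, the sketch below turns one email header into the kind of token "sentence" that a paragraph-vector model consumes. The field names follow those mentioned above (subject, header.from, user-agent, direction, srcIp.country); the tokenization rule, the `field=token` prefixing and the sample header are assumptions made for illustration and are not taken from Listing 1.

```python
import re

# Header fields named in the text; the exact keys in the original code may differ.
FIELDS = ["subject", "header.from", "user-agent", "direction", "srcIp.country"]

def header_to_sentence(header):
    """Flatten selected header fields into a list of lowercase tokens.

    Each token is prefixed with its field name so that, for example, a
    country code and a subject word with the same spelling stay distinct.
    """
    tokens = []
    for field in FIELDS:
        value = str(header.get(field, ""))
        for word in re.findall(r"[a-z0-9@._-]+", value.lower()):
            tokens.append(f"{field}={word}")
    return tokens

# Hypothetical header values for the example.
header = {
    "subject": "New Sign-in Attempt",
    "header.from": "support@example.com",
    "user-agent": "Mozilla",
    "direction": "inbound",
    "srcIp.country": "PL",
}
sentence = header_to_sentence(header)
```

Each token list could then be paired with a unique tag and passed to a paragraph-vector implementation, for example the `Doc2Vec` class in gensim, to obtain a fixed-length feature vector per email.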
The feature vectors are then used for novelty detection. Anomalous emails are usually unknown and novel. Their behaviours often deviate from the trace of historical normal activities. We use Local Outlier Factor (LOF) [19] to discover those emails, as given in Listing 2 and Listing 3.

[Listing 2: def local_outlier_factor(self, train_feature, test_feature): ...]

The LOF algorithm shown in Listing 2 learns the feature vectors of historical emails (i.e. the train feature dataset in the code) and then provides an outlier score for newly seen emails (from the test feature dataset). There are some compelling advantages of applying LOF for novelty detection: (1) it allows the model to be trained on contaminated data; (2) it has low computational complexity and can be used for online learning, hence avoiding performance degradation and the cost of retraining; (3) it is not sensitive to fine-tuning, which benefits the effectiveness and stability of parametric learning.

[Listing 3: def novelty_analysis(self, factor, test_feature): ...]

The decision scores from the LOF code can be negative or positive. Negative values indicate abnormalities and positive values indicate normal behaviours. We regard any vector with a score smaller than a threshold as belonging to an anomalous email, which is traced by its index in the dataset, as shown in Listing 3.

If we consider the relation between sender and recipient, anomalous emails are often associated with a weak relation; that is, a hacker will not periodically exchange emails with the same recipient. We can therefore further narrow down the malicious emails based on the sender-recipient relation: those abnormal emails with a weak sender-receiver relation are selected as the final detection result, as described in Listing 4. In our design, the relationship is measured by the combination of a set of email features: source IP (src ip), direction, sender (mail from), and receiver (mail to).
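The novelty detection step described earlier in this section (Listings 2 and 3) can be sketched with a small, dependency-free LOF. This is an illustrative re-implementation under stated assumptions, not the authors' code: it thresholds the raw LOF ratio (values well above 1 suggest an outlier) rather than the signed decision score described above, and the toy vectors, `k` and `threshold` are arbitrary. In practice a library implementation, for example scikit-learn's `LocalOutlierFactor` with `novelty=True`, would normally be used instead.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _knn(point, data, k, skip=None):
    """Return the k nearest (distance, index) pairs in `data`, optionally skipping one index."""
    pairs = [(_dist(point, other), i) for i, other in enumerate(data) if i != skip]
    return sorted(pairs)[:k]

def fit_lof(train, k):
    """Precompute k-distance and local reachability density (lrd) for each baseline vector."""
    neighbours = [_knn(p, train, k, skip=i) for i, p in enumerate(train)]
    k_dist = [nb[-1][0] for nb in neighbours]
    lrd = []
    for nb in neighbours:
        reach = [max(k_dist[j], d) for d, j in nb]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    return {"train": train, "k": k, "k_dist": k_dist, "lrd": lrd}

def lof_score(model, query):
    """LOF of a new vector relative to the baseline vectors."""
    nb = _knn(query, model["train"], model["k"])
    reach = [max(model["k_dist"][j], d) for d, j in nb]
    lrd_q = len(reach) / max(sum(reach), 1e-12)
    return sum(model["lrd"][j] for _, j in nb) / (len(nb) * lrd_q)

def novelty_indices(model, test, threshold=1.5):
    """Indices of test vectors whose LOF exceeds the (assumed) threshold."""
    return [i for i, v in enumerate(test) if lof_score(model, v) > threshold]

# Toy baseline: a 3x3 grid of "normal" feature vectors.
baseline = [(float(x), float(y)) for x in range(3) for y in range(3)]
model = fit_lof(baseline, k=3)
flagged = novelty_indices(model, [(1.0, 1.2), (10.0, 10.0)])
```

Returning indices mirrors the description above, where an anomalous vector is traced back to its email by its position in the dataset.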
For a strong sender-receiver relation, there should be many emails with the same IP-direction-sender-receiver value. Therefore, we count the emails for each distinct IP-direction-sender-receiver value and select those with a low count (smaller than a threshold) as malicious emails. The threshold is usually fine-tuned by the operation team for each network environment.

Most prior research has overlooked one problem: what is the relation among the anomalies? The lack of an effective solution significantly increases the load on security analysts, blurs the attacker portraits, and further makes provenance analysis difficult. To address the problem, we introduce a correlation graph analysis (CGA) module to improve the clarity of attacker portrait descriptions by correlating different anomalous events. CGA is a force-directed graph in which each node consists of the selected header features: country, srcIp, sender and subject. The force-directed layout pulls densely connected nodes closer together and pushes apart nodes that have sparse or no connections. The graph depicts the similarity of different anomalies (such as the same srcIp, the same subject or the same sender) and centralizes each cluster in line with its geographical location, hence significantly improving the interpretability of provenance analysis. The graph on the bottom right of Fig. 1 demonstrates the visualization result of CGA, where two clusters (one in red and one in blue) highlight the connected components that are centralized in accordance with the country of srcIp. The blue cluster shows the same malicious email sent from different sources, and the red cluster reveals a single source sending multiple different malicious emails. The CGA module can be used to generate active IOCs (Indicators of Compromise) for the Cyber Threat Intelligence Platform, where we can match similar or identical malicious incidents that occurred at other customers based on the IOCs.
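The rareness selection and the correlation step described above can be sketched together. This is a minimal illustration, not Listing 4 or the CGA module itself: it assumes the counts are taken over all emails observed in the window, links two anomalies whenever they share a source IP, sender or subject, and computes connected components rather than a visual force-directed layout. All field names, addresses and the threshold are hypothetical.

```python
from collections import Counter, defaultdict

def relation_key(e):
    """The IP-direction-sender-receiver combination used to measure a relation."""
    return (e["src_ip"], e["direction"], e["mail_from"], e["mail_to"])

def select_rare(anomalies, observed, threshold):
    """Keep anomalies whose relation key occurs fewer than `threshold` times."""
    counts = Counter(relation_key(e) for e in observed)
    return [e for e in anomalies if counts[relation_key(e)] < threshold]

def correlate(emails, link_fields=("src_ip", "mail_from", "subject")):
    """Group emails into connected components via shared field values (union-find)."""
    parent = list(range(len(emails)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    first_seen = {}
    for i, e in enumerate(emails):
        for field in link_fields:
            key = (field, e[field])
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i
    groups = defaultdict(list)
    for i in range(len(emails)):
        groups[find(i)].append(i)
    return sorted(groups.values())

def event(src_ip, mail_from, mail_to, subject, direction="inbound"):
    return {"src_ip": src_ip, "direction": direction,
            "mail_from": mail_from, "mail_to": mail_to, "subject": subject}

# Made-up traffic: one frequent internal relation and three one-off senders.
observed = [event("10.0.0.5", "alice@corp.example", "bob@corp.example", "status")] * 5
observed += [
    event("198.51.100.7", "x1@mail.example", "bob@corp.example", "transfer bitcoin"),
    event("203.0.113.9", "x2@mail.example", "carol@corp.example", "transfer bitcoin"),
    event("192.0.2.44", "dhl@mail.example", "dave@corp.example", "invoice"),
]
anomalies = observed[4:]  # pretend novelty detection flagged these four
rare = select_rare(anomalies, observed, threshold=2)
clusters = correlate(rare)
```

In this toy run the frequent internal relation is dropped by the rareness filter, and the two emails sharing a subject end up in one cluster, which loosely corresponds to the "same email from different sources" pattern described for the blue cluster.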
HOLMES has been deployed in an enterprise environment, where it reads mirrored SMTP records from the Elasticsearch (ES) server. In this section, we first present some case studies on the malicious emails detected by HOLMES, then show how correlation analysis can reveal the attack scenarios caused by malicious emails, and finally compare HOLMES with other popular commercial email detectors.

According to our monthly email system data, HOLMES discovers around 1,000 anomalous emails each day. Among them, about 23% are truly malicious, and most of the malicious emails contain either phishing links or malware-infected attachments. The rest are mainly spam and only a few are false positives. Based on the detection results, we retrieved from our email server some malicious emails that were not blocked by the anti-spam gateway but were identified by our security analysts as high risks, in order to reconstruct the attack stories. Here, we showcase some delicately crafted phishing emails and describe their malicious behaviour in detail.

1) Case A: Fig. 2 (a) shows the execution flow of a malicious email that impersonates the DHL service and plays the following tricks: (1) the email uses a commonly seen subject that is associated with an invoice document; (2) the sender information has been modified to 'DHL Express', which can be done with hacking tools such as swaks [20] or Cobalt Strike [21]; (3) the email includes an attachment named invoice.doc, which is, in fact, a malicious Trojan document that exploits the CVE-2017-11882 [22] vulnerability; (4) the email contains a carefully made picture of the DHL delivery service to deceive recipients. In this attack scenario, an attacker who successfully exploited the vulnerability could run arbitrary code in the context of the current user (the recipient). If the user is logged on with administrative user rights, the attacker could take control of the affected system.
The attacker could then install programs; view, change, or delete data; or create new accounts with full user rights. As we can see, users whose accounts are configured with fewer user rights on the system could be less impacted than users who operate with administrative user rights.

2) Case B: Fig. 2 (b) shows the execution flow of a malicious email that uses the deceptive subject "New Sign-in Attempt", aiming to trick recipients into changing their email account password. Once the recipient clicks the "Update security settings" button, the browser is redirected to the phishing website https://controladmin.7m.pl/login[.]html? #xxx@xxx.gov.xx, which induces the victim to type in the username and password. The web page will, in the end, return to the homepage of the enterprise where the victim works. On the hacker side, the back-end server receives the event log of the failed login attempts from the victim and records the username and password. The hacker can then use the legitimate email account to sign in to services such as web pages or the email server, and can even send an elaborately crafted phishing email to one of the victim's frequent contacts, which is hard for most security products to detect.

3) Case C: The malicious execution flow shown in Fig. 2 (c) is similar to the attack shown in Fig. 2 (b) in that it also has a link embedded in the mail content for a phishing campaign. However, the phishing link https://armonaoil.com/admin/images/npgtr/newsl/potcpanel[.]html belongs to a legitimate website rather than a purpose-built malicious website, which indicates that the legitimate website has been compromised for use in the darknet market. Through further analysis, we found that the enterprise had indeed opened its cPanel web hosting server for public access, which was vulnerable to brute-force attacks and remote external control.
Furthermore, we examined recent activities on some popular darknet markets and found that access to more than 3,000 cPanel sites had been on sale in a darknet market (raidforums) since 2020-11-22. Hence, it can be confirmed that the email attack is a phishing campaign caused by third-party information leakage.

As discussed in Section III-D, email attacks can be clustered by features such as senders and subjects, and the clusters may be geographically linked. In our investigation, we can clearly observe the email clusters on the force-directed graph generated by our CGA module, as demonstrated in Fig. 1. Those clusters can reveal attack scenarios. Some typical examples are presented below.

1) Bitcoin Fraud: This attack scenario is related to a cluster in which hundreds of senders from different srcIp addresses sent the same email to deceive recipients, with the content: "Your computer has been controlled...transfer bitcoin to the wallet....". Based on our experience of threat [...]

[Fig. 3: comparison of detection results from the commercial detectors and HOLMES.]

2) Eye-Catching Spam: This spam was discovered in another cluster, in which several srcIp addresses from a single country sent a large amount of spam with similar themes, involving sensitive political topics, gambling and erotic content. These emails are basically harmless but annoy users if the anti-spam gateway fails to filter them out. However, it is still important to investigate such unusual email communication behaviours, because some of the spam may be related to potential spying activities, which may lead to the leakage of internal classified documents through social engineering.
3) Periodical Anomalous Behaviour: The CGA module also revealed a notable clustered behaviour: a group of hackers from the same country, using different srcIp addresses, periodically sent our customers malicious emails with similar subjects and the same URL redirection technique for phishing links. These emails use crafted content carrying the brand of the targeted victims, such as the name of their affiliation or IT support team. Once users follow the instructions and finish all actions, they are redirected to their own organization's homepage. This periodical anomalous behaviour has been confirmed by our security analysts as a long-term, targeted phishing activity.

To evaluate the detection capability of HOLMES, we compare it with commercial email detectors offered by six key security vendors in VirusTotal: Microsoft, Tencent, Kaspersky, FireEye, McAfee and Qihoo-360. We select 15 malicious emails as testing samples. Each either contains a phishing link or has a malware-infected attachment. All the testing samples were collected from real-world threat hunting throughout December 2020; these samples had bypassed the enterprise anti-spam gateway and were successfully detected by HOLMES. The comparison demonstrates the proportion of highly concealed malicious emails that the other commercial tools still cannot discover. The result shows why a behavioural anomaly detection tool like HOLMES is still needed for anomalous email detection. The comparison table is given in Fig. 3, where the email subjects representing the 15 malicious emails are listed in the first column and the remaining columns are the detection results from the commercial detectors and HOLMES. A FALSE value from a detector on a malicious email indicates that the detector failed to identify the malicious email.
The detectors from Microsoft, Kaspersky and FireEye perform well on the malicious emails that contain malware-infected attachments. However, they fail to detect the malicious emails that contain phishing links. Based on further analysis by our security experts, most of the phishing domains had been registered for no more than three months, and some of them even belong to legitimate, well-known enterprises. Moreover, each phishing link includes a specific URL for a crafted phishing web page under a domain name that expires in around three days. Such a short lifetime significantly increases the difficulty of anomaly detection. From the comparison table, we can also see that McAfee demonstrates a moderate detection rate on the malicious emails that contain malware-infected attachments. Like Microsoft, Kaspersky and FireEye, McAfee cannot detect the malicious emails that contain a newly registered phishing link. Compared to all the above detectors, Tencent and Qihoo-360 have a low detection rate: among the 15 malicious emails, only two are detected by Tencent and three by Qihoo-360. We would like to clarify that the detection engines used for the comparison are supplied by the VirusTotal Enterprise Service. Since the versions of the detectors may not be the same as those used in the vendors' commercial products, the comparison result cannot completely indicate the detection capability of [...]

After the first deployment to the enterprise environment, as mentioned in the above section, HOLMES has been upgraded with a few enhancements. In the latest version of HOLMES, we rebuilt the code base so that HOLMES discovers anomalies more efficiently in a much smaller rolling time window. The improvement is achieved by moving the data query system from the Elasticsearch (ES) server to the real-time Kafka computing platform. The main difference between ES and Kafka is the way the data is processed.
ES uses batch processing whereas Kafka uses stream processing, and stream processing is more timely and efficient. Kafka is an open-source distributed event streaming platform. It consists of producers, a cluster (brokers) and consumers. Due to its high throughput and availability, Kafka has been widely used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. In our case, the Kafka producers are probes deployed in the enterprise network to sniff network traffic for detection, the brokers are the middleware that distributes the data streams, and the consumers are the HOLMES detectors. The advantage of using Kafka is that HOLMES can detect anomalies in less than one minute without the risk of a server crash, significantly reducing the computing consumption. Fig. 4 and Fig. 5 respectively show the detection performance of HOLMES in December 2020 and the run-time performance improvement after switching from batch processing to stream processing. Furthermore, the use of Kafka also helps our security analysts to better schedule time for threat identification and improves the efficiency of threat response.

Based on the report from our automated security operation center (SoC), most of the email threats come from external sources, particularly from overseas. Lateral phishing beacons occur with a much lower frequency than inbound email threats, whereas cases of data leakage occur with a medium frequency. To improve the interpretability of the results, in the new implementation we further classify the anomalous emails into three categories: inbound email threats, lateral phishing threats and outbound data leakage. The classification helps our security analysts to locate the potential threats easily, so that the procedure of threat identification can be further accelerated.
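The three report categories can be illustrated with a short sketch. The text does not state how the categories are assigned, so the rule below, which compares the sender and recipient domains against a set of internal domains, is purely an assumption for illustration; the function name, the domain set and the "unclassified" fallback are hypothetical.

```python
def classify_direction(mail_from, mail_to, internal_domains):
    """Assign one of the three report categories from sender/recipient domains.

    Assumed rule (not from the paper): internal -> internal is lateral,
    internal -> external is outbound, external -> internal is inbound.
    """
    def is_internal(address):
        return address.rsplit("@", 1)[-1].lower() in internal_domains

    sender_in = is_internal(mail_from)
    recipient_in = is_internal(mail_to)
    if sender_in and recipient_in:
        return "lateral phishing threat"
    if sender_in:
        return "outbound data leakage"
    if recipient_in:
        return "inbound email threat"
    return "unclassified"  # neither side is internal, e.g. relayed traffic

internal = {"corp.example"}
labels = [
    classify_direction("x1@mail.example", "bob@corp.example", internal),
    classify_direction("alice@corp.example", "bob@corp.example", internal),
    classify_direction("alice@corp.example", "drop@mail.example", internal),
]
```

A label of this kind only routes an already flagged email to the right analyst queue; whether the email is actually malicious is still decided by the detection steps and the analysts.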
With the new HOLMES implementation, we are able to provide a daily threat report and offer the IOCs as Software as a Service (SaaS) on a cloud platform to alert our customers, so that they can respond to the threats in real time.

In this paper, we introduce HOLMES, a lightweight semantic based anomalous email detector, which can effectively discover malicious emails in real-world cyber threat hunting. HOLMES also demonstrates a viable way to transfer AI technology to the cyber security field, with an excellent trade-off between the cost of algorithmic consumption and the detection performance. We measure the performance of HOLMES and compare its detection capability with several well-known commercial detectors offered by security companies in VirusTotal. Our evaluation result shows that, for cyber threat hunting, HOLMES significantly outperforms those commercial products in a range of malicious attack scenarios, which demonstrates its practical value in commercial competition. Part of the work has been successfully transferred and integrated into our security products, and we are now happy to open-source the code of the prototype implementations.

Since we shared our evaluation results with other security vendors, we have received some questions regarding our work. Here, we would like to share some FAQs and our replies from the discussions.

A. Q1: Do you only select the testing samples that favour your evaluation result?

HOLMES is a hunting tool that aims to help security experts discover the most critical incidents based on a controlled number of alerts. The 15 testing samples all come from our daily threat hunting in the wild with HOLMES; they were not purposely made in favour of our design. In fact, the real-world situation of undetected concealed malicious emails is much worse than the proportion shown in our evaluation.
However, due to the page limit, we cannot present all the testing samples here and only select some representative samples that have had negative effects on our customers. The evaluation result in Fig. 3 mainly shows the proportion of high-risk malicious emails that still cannot be discovered by other security vendors with traditional methods.

B. Q2: Why do you not measure the overall FPR and TNR?

Following Question Q1, the purpose of developing HOLMES is to discover unknown threats that the traditional email-body based tools fail to detect. The main reason why we do not measure the FPR and TNR lies in the way the tools are implemented. The tools mentioned in Fig. 3 are deployed in the cloud, which requires them to examine malicious emails universally, from a range of different sources. This diversity means the detectors cannot afford to generate many false positives, which leads to a poor detection rate (DR) on unknown threats. HOLMES, in contrast, is deployed locally in a specific enterprise environment, which enables its detection capability to be closely associated with the enterprise business in a self-adaptive learning way. Therefore, the FPR and TNR change dynamically with the environment and are difficult to measure. In addition, based on the feedback of our incident response team, the number of suspicious emails reported by HOLMES in a single enterprise usually stays stable and controlled (around 100 to 1,000, depending on the scale of the enterprise), so the team can analyze all the suspicious events in less than 10 minutes. In that situation the FPR and TNR are not the important metrics; what matters most to the team and our customers is that no threat is missed.

Much work related to anomaly detection in the email security area can be found in the literature. However, most previous approaches are implemented on a set of emails that are extracted from a well-defined file format (such as the EML format).
HOLMES, however, detects the email header from the stream of network packets in real time, which makes it difficult to evaluate it against other academic works under the same conditions. In addition, we previously measured a number of academic works [23], [24] related to network security (not limited to email) that reported ideal results in their evaluations. However, due to the diversity and complexity of network environments, their performance drops significantly in real-world network environments, and some of them are even worse than the traditional methods they were compared with in the literature. Therefore, rather than relying strictly on academic metrics alone, we tend to focus on a fact [25] from the cyber security landscape: discovering more concealed threats under controlled circumstances is much more valuable than a 1% increment in metrics.

Security by any other name: On the effectiveness of provider based email security
Detecting and characterizing lateral phishing at scale
Large-scale automatic classification of phishing pages
Sender Policy Framework (SPF) for authorizing use of domains in e-mail, version 1
DomainKeys Identified Mail (DKIM) signatures
Domain-based Message Authentication, Reporting, and Conformance (DMARC), RFC 7489
Composition kills: A case study of email sender authentication
Densely connected residual network for attack recognition
Pelican: A deep residual network for network intrusion detection
LuNet: A deep neural network for network intrusion detection
Tactical provenance analysis for endpoint detection and response systems
Categorical features transformation with compact one-hot encoder for fraud detection in distributed environment
Similarity encoding for learning with dirty categorical variables
Understanding bag-of-words model: a statistical framework
Distributed representations of sentences and documents
LoOP: local outlier probabilities
Anomaly-based network intrusion detection: Techniques, systems and challenges
Applying machine learning and natural language processing to detect phishing email
Outside the closed world: On using machine learning for network intrusion detection