key: cord-0541949-a8jzhcvc authors: Lykousas, Nikolaos; Koutsokostas, Vasilios; Casino, Fran; Patsakis, Constantinos title: The Cynicism of Modern Cybercrime: Automating the Analysis of Surface Web Marketplaces date: 2021-05-25 journal: nan DOI: nan sha: 7b8b8ec7635f385dc10a06293d176d48ac8ff4d6 doc_id: 541949 cord_uid: a8jzhcvc
Cybercrime is continuously growing in volume and becoming more sophisticated. Currently, there are various monetisation and money laundering methods, creating a huge underground economy worldwide. A clear indicator of these activities is online marketplaces, which allow cybercriminals to trade their stolen assets and services. While these marketplaces are traditionally available through the dark web, several of them have emerged on the surface web. In this work, we perform a longitudinal analysis of a surface web marketplace. The information was collected through targeted web scraping that allowed us to identify hundreds of merchants' profiles for the most widely used surface web marketplaces. In this regard, we discuss the products traded in these markets, their prices, their availability, and the exchange currency. This analysis is performed in an automated way through a machine learning-based pipeline, allowing us to quickly and accurately extract the needed information. The outcomes of our analysis evince that illegal practices are leveraged in surface marketplaces and that there are no effective mechanisms for their takedown at the time of writing.
that there is some "control" on who can access this information and to retain the anonymity of the perpetrators. However, should this information be openly disseminated in public channels, it implies that the promoted behaviour is widely practised and is considered a norm by some groups. In the past few years, there has been a significant increase in reported data leaks, online extortion schemes and credential trading.
One of our initial research questions was whether such actions are so widely performed that they can be observed on the surface web. In this regard, we wanted to check whether perpetrators were using surface web platforms to advertise their "loot", and whether dedicated marketplaces exist on the surface web. Currently, there are several such marketplaces operating with similar functionality; however, this work mainly focuses on Shoppy (https://shoppy.gg/), which appears to have the most users and products at the time of writing. Nonetheless, similar illicit trends have been found in the rest of the surface web marketplaces. The goal of this work is to provide an overview of what is actually being sold in such a marketplace, and to leverage methods (e.g. machine learning) to automatically determine which products are illegal and which organisations are mainly affected. The main limitation in automating such a task is the lack of text. Sellers do not need to add much text about what they trade in these marketplaces, and on many occasions there are typos, abbreviations, and slang, which further complicate the analysis of the derived text. In addition to the analysis of the traded products, we discuss the modus operandi of the sellers and provide some insight regarding the pricing of "big leaked data".
The rest of this work is structured as follows. First, we provide an overview of the related work and a brief discussion of these marketplaces. Then, we detail our data collection methodology. In Section 4, we analyse the collected data to extract actual knowledge out of the short descriptions of the shops in an automated way. Finally, the article concludes, summarising our contributions and highlighting some ideas for future work.
In recent years, there have been multiple incidents of massive data breaches affecting a broad spectrum of online services and service providers, including retailers, payment processors, and government entities [12].
Malicious actors gain internal access to sensitive data sources, and then acquire millions of credit and debit card details, user credentials, as well as sensitive data which can be used to identify individuals uniquely. The sheer quantity of data that can be acquired has given rise to a burgeoning market for actors who sell the information that they obtain, through, e.g. hacking and other forms of data theft, to other users. Participants in these illegally acquired data markets leverage various communication and networking methods, enabling them to freely form communities and interaction mechanisms. The most prevalent forms of such marketplaces, as identified in the literature, are Internet forums and Internet-Relay-Chat (IRC) channels [37, 3] . In particular, forums have been shown to comprise the principal medium for cybercriminals to network, form communities, and operate online stolen data markets, despite numerous successful infiltrations by law enforcement agencies [55] . To a large extent, these marketplaces reside in the dark web, commonly behind Tor [15] , and are referred to as "Darknet markets". Darknet markets are popular among criminals since they enable them to anonymously trade illegal goods and services, extending well beyond stolen data. The latter was discussed by Thomas et al. [51] by pointing out the complex value chain of the underground market economy at scale. These marketplaces comprise the essential pillars of this global-scale cybercrime economy and thus have become the key information source for investigating the cybercriminal ecosystem. An extensive body of literature has explored the darknet marketplaces [54] , the involved stakeholders and their communication patterns [20] , and their modus operandi [53] . 
Of particular interest to researchers are the marketplaces dedicated to the sale of stolen personal and financial information, known as "carding forums", where cybercriminals sell the artefacts of large-scale data breaches, often containing stolen financial information [37, 25, 18]. Subsequently, the compromised credit and debit card information enables malicious actors to commit crimes such as identity theft, financial fraud, and most importantly, online money laundering [12, 31]. Moreover, due to their illicit and underground nature, carding forums and marketplaces are characterised by unique trading dynamics between vendors and buyers, since the quality of merchandise and the identities of traders are unknown to potential buyers. In this regard, a number of works focus on untangling the mechanics of transactions in this particular kind of underground marketplace [56, 17, 19, 48]. Identifying key players is essential when investigating emergent threats and developing efficient disruption strategies [34], in particular considering that members of such communities are characterised by cross-forum posting activity, which can be used to identify user roles based on the type of posts and their frequencies. The indicators of trustworthiness and reputation of a seller play a pivotal role in the sale of illicit services and stolen data through underground hacking forums and markets, as users are more likely to conduct business with sellers who hold reputable standing. As such, it has been shown that reputation status can be used to identify prominent players in illicit online marketplaces [56]. Apart from the complexity of its dynamics, the ecosystem of illegal online markets is characterised by an equally wide range of offerings, services and products relevant to a variety of illicit topics, such as underground drug economies, data breaches, and cyber warfare.
For instance, the work proposed in [44] focuses on malicious assets traded in hacker forums, such as hacking tools, rootkits and exploits, from which cyber threat intelligence can be distilled, and analyses the extracted data for predicting and mitigating cyber attacks. In [29], the authors provide both quantitative and qualitative categorisation of offerings in 17 different marketplaces. Their findings indicate the existence of both highly specialised products with respect to particular vendors and markets, as well as the cross-listing of products on multiple sites and nearly identical products for sale by multiple vendors. Nonetheless, the most prevalent categories are related to stolen credentials and information, extending beyond financial accounts. In this direction, Madarie et al. [27] examined how stolen credentials are disseminated by malicious actors as "account dumps" across a diverse set of outlets. Their analysis revealed that the illicit dissemination of stolen account credentials covers a broad spectrum of online services, and highlighted Pastebin as one of the main sites used to spread the information. Pastebin is a paste website intended for sharing plain text snippets, which is radically different from typical marketplaces and forums due to its complete lack of structure. Interestingly enough, while stolen combinations of usernames and passwords for various online services were posted, data thieves used advertisements (embedded within the dumps) to establish communication with potential customers seeking access to financial or more sensitive data.
After thorough literature research, we observed a gap in the literature regarding surface web marketplaces and the analysis of their activities, including automated methodologies for the extraction and processing of information to discover potentially malicious behaviour.
First, we explored two well-known hacking-related forums, namely blackhatworld and cracked, looking for indications of the emergence of marketplaces supporting illegal activities, similar to the deep web forums/marketplaces. In this regard, we observed that Shoppy was widely used in such platforms to monetise some of the reported activities. In fact, such activities were openly advertised. Therefore, we established a methodology to analyse what types of activities and products were being sold in Shoppy. Shoppy is a shop hosting service that gives individual vendors the opportunity to sell their products, supports payments in different forms, and offers a set of APIs to, e.g. advertise one's products in forums. A crucial difference between Shoppy and the underground marketplaces studied in the literature is that the former does not offer a centralised listing of vendors and the products they sell. Each vendor obtains a unique URL where they can host their shop, and the platform provides no means for a user to look for similar shops or products offered by different vendors, a common feature in e-commerce platforms [45]. The decentralised architecture of Shoppy hinders the extraction of knowledge, and thus, our proposed data collection methodology specifically aims to discover shops associated with illicit offerings and services, given the context established by focusing on hacking-related forums.
The methodology that we adopted to address this challenge consists of several steps, as depicted in Figure 1. First, we crawled the blackhatworld and cracked forums, collecting usernames, as well as references to Shoppy accounts in post signatures. Given the size of these two communities, we restricted our crawling to the "Marketplace" forums. To this end, we adopted the architecture of the Structure-driven Incremental Forum crawler (SInFo) [36], which enabled us to crawl data from the aforementioned forums.
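The signature-harvesting step described above can be illustrated with a minimal sketch. It assumes that shop links take the form `shoppy.gg/@<name>` and that the crawler yields `(username, signature)` pairs; the pattern, the function names and the input layout are illustrative assumptions, not the authors' implementation.

```python
import re

# Assumed link shape: [http(s)://][www.]shoppy.gg/@<shop-name> (hypothetical pattern).
SHOPPY_LINK = re.compile(
    r"(?:https?://)?(?:www\.)?shoppy\.gg/@([A-Za-z0-9_.-]+)", re.IGNORECASE
)

def extract_shop_names(signature):
    """Return the distinct shop names referenced in one post signature."""
    # Strip a trailing full stop picked up from sentence punctuation, then normalise case.
    return {m.group(1).rstrip(".").lower() for m in SHOPPY_LINK.finditer(signature)}

def collect_candidates(posts):
    """Map each forum username to the shop names found across their signatures.

    posts: iterable of (username, signature_text) pairs (assumed crawler output).
    """
    candidates = {}
    for username, signature in posts:
        names = extract_shop_names(signature)
        if names:
            candidates.setdefault(username, set()).update(names)
    return candidates

if __name__ == "__main__":
    demo = [
        ("alice", "Buy here: https://shoppy.gg/@AliceShop."),
        ("bob", "no links in this signature"),
    ]
    print(collect_candidates(demo))
```

Each candidate name would then still have to be checked against the platform, since a link in a signature does not guarantee that the shop exists. The crawler that supplies the pairs is outside the scope of this sketch.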
Nevertheless, we did not leverage user accounts that could potentially allow us to access even more content, restricted to authenticated users [49]. Next, we examined the extent to which the collected usernames and Shoppy account data could be correlated with existing shops in the Shoppy ecosystem. The data collection process lasted from March to April of 2020. We collected a total of 68,045 usernames and Shoppy links from forum post signatures, 2,906 of which were linked to existing Shoppy shops at the time of crawling. The results are summarised in Table 1. Notably, a large fraction of the links to Shoppy accounts found in post signatures did not resolve to existing shops, indicating that accounts in Shoppy may be banned, deleted, or renamed. With the collected data, we used the open Shoppy API to retrieve all the information associated with these shops, including products, prices, and their corresponding metadata, to create a curated dataset.
In the following sections, we explore the Shoppy data in different steps. First, we provide a quantitative review of the collected dataset. Next, we detail our topic modelling approach and, finally, we perform an exploratory analysis of a subset of the surface data.
In this section, we provide a quantitative analysis of the collected Shoppy shops and advertised products, and highlight particular behaviours of vendors. In total, our dataset contains 64,726 products advertised by 2,906 vendors. Shoppy provides vendors with the ability to categorise their products as accounts, services or files. The distribution of product categories in our dataset is provided in Table 2. "Account" is the default category, which evidently dominates the other two by a large margin. Figure 2 shows the distribution of the number of products per shop (Fig 2a) and of the product prices, in USD (Fig 2b). We can observe that, while around 40% of the shops have fewer than ten items listed, there exist shops with thousands of items.
As seen in Fig 2b, the price distribution of products is remarkably well described by a lognormal distribution (µ = 1.6, σ = 1.58), highlighting that the prices of approximately 62% of the products fall within a small range, between 1 and 10 USD. Moreover, the median price of Shoppy products is 5 USD, and, as observed in our dataset, the prices can reach up to 10,000 USD. To gain deeper insight into how different types of products are priced, focusing on possible outliers priced well above the median of 5 USD, we bin the products based on their price and calculate the fraction of each product type in each bin in Figure 3. We can observe that while the lowest price bin is dominated by accounts, the fraction of services per bin follows a consistently increasing trend as the prices increase. In contrast, the relative representation of accounts decreases as the price increases, ultimately making services the predominant product category (approx. 70% of total) for the last bin, which reflects the highest-priced offerings (≥ 500$). The fractions of the file type products, which as previously shown comprise only a small fraction of the total offerings, are generally sustained, accounting for less than 10% of the products in each bin.
Table 3: Some illustrative false "services", priced ≥ 500$.
It is worth noting that our initial observation regarding the use of default categories is reflected in Figure 3, which shows that, for instance, account products are well represented across the whole range of prices. The latter behaviour seems quite unrealistic in a real and competitive market scenario and is further supported by the experiments performed in the next sections. To investigate the high-priced services dominating the upper price bracket, we manually examined them and provide some illustrative examples in Table 3.
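The reported lognormal fit can be sanity-checked with a few lines of code. The sketch below assumes that µ and σ are the mean and standard deviation of the natural logarithm of the price in USD, an assumption consistent with the reported median, since e^1.6 ≈ 4.95. The function names are illustrative. Note that the share of products quoted in the text is an empirical count and need not coincide exactly with the model-implied mass computed here.

```python
import math

# Parameters of the fit reported in the text; natural-log scale is an assumption.
MU, SIGMA = 1.6, 1.58

def lognormal_cdf(x, mu=MU, sigma=SIGMA):
    """P(X <= x) for a lognormal variable with log-mean mu and log-std sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_median(mu=MU):
    """The median of a lognormal distribution is exp(mu)."""
    return math.exp(mu)

def mass_between(lo, hi, mu=MU, sigma=SIGMA):
    """Model-implied probability that a price falls in the interval (lo, hi]."""
    return lognormal_cdf(hi, mu, sigma) - lognormal_cdf(lo, mu, sigma)

if __name__ == "__main__":
    print("median (USD):", round(lognormal_median(), 2))
    print("model mass in (1, 10] USD:", round(mass_between(1, 10), 3))
    print("model mass above 500 USD:", round(1.0 - lognormal_cdf(500), 4))
```

Under the same fit, prices of 500 USD or more lie far in the upper tail, which is where the manually examined "services" listed in Table 3 are found.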
Evidently, these items are false products and rather contain information such as merchants' terms of service, notes regarding provided shop feedback, support information, and links to Discord servers and Telegram channels maintained by the merchants. This behaviour has been highlighted in recent literature by arguing that the unregulated and anonymous nature of platforms such as Telegram and Discord, makes them the perfect habitats for scammers and cybercriminals [33, 52] . In this section, we analyse the Shoppy stores and elaborate a topic-based characterisation of the offered products by analysing their titles. To this end, we consider a statistical model, namely "topic model", which is a method well suited to the study of high-level relationships between text documents. Specifically, we leverage Latent Dirichlet Allocation (LDA), a generative probabilistic model proposed in [4] . It comprises an endogenous NLP technique, which as highlighted in [5] "involves the use of machine-learning techniques to perform semantic analysis of a corpus by building structures that approximate concepts from a large set of documents" without relying on any external knowledge base. LDA, as the name implies, is a latent variable model in which each item in a collection (e.g., each text document in a corpus) is modelled as a finite mixture over an underlying set of topics. Each of these topics is characterised by a distribution over item properties (e.g. words). LDA assumes that these properties are exchangeable (i.e. ordering of words is ignored, as in many other "bag of words" approaches in text modelling), and that the properties of each document are observable (e.g. the words in each document are known). The word distribution for each topic and the topic distribution for each document are unobserved; they are learned from the data. Since LDA is an unsupervised topic modelling method, there is no direct measure to identify the optimal number of topics to include in a model. 
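The generative view of LDA described above can be made concrete with a small simulation. This is a toy sketch of the generative process only, not of the inference used to fit a model; the vocabulary, the two topic-word distributions and the function names are invented for illustration.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topic_word, alpha, n_words, rng):
    """Generate one bag-of-words document under the LDA generative assumptions.

    topic_word: list of K distributions over a shared vocabulary (one per topic).
    alpha: symmetric Dirichlet prior on the per-document topic mixture.
    """
    theta = sample_dirichlet([alpha] * len(topic_word), rng)  # topic mixture of this document
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)                 # latent topic of this word
        words.append(sample_categorical(topic_word[z], rng))  # observed word index
    return theta, words

if __name__ == "__main__":
    vocab = ["netflix", "vpn", "combo", "database"]        # toy vocabulary (assumption)
    topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]  # toy topic-word distributions
    theta, words = generate_document(topics, alpha=0.5, n_words=8, rng=random.Random(0))
    print([round(t, 2) for t in theta], [vocab[w] for w in words])
```

Fitting LDA amounts to inverting this process: only the words are observed, and the topic mixtures and topic-word distributions are estimated from them.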
In this sense, LDA assigns documents to different clusters of topics with certain probabilities (i.e. the number of clusters is defined with an integer number k provided by the user), where these probabilities depend on the occurrence of words which are assumed to co-occur in documents belonging to the same topic (Dirichlet prior assumption). This exemplifies the main idea behind all unsupervised topic models, that language is organised by latent dimensions that actors may not even be aware of [30]. Researchers have recommended various approaches to establish the optimal k (e.g. [6, 1, 13, 42, 57]). These approaches provide a good range of possible k values that are mathematically plausible. However, according to [14], when topic modelling is used to identify themes and assist in interpretation (like in the present study), rather than to predict a knowable state or quantity, there is no statistical test for the optimal number of topics or the quality of a solution. A simple way to evaluate topic models is to look at the qualities of each topic and discern whether they are reasonable [30]. To the best of our knowledge, the topic coherence measure with the largest correlation to human interpretability is the C_v score defined in [42], which we also adopt in this study to establish the optimal number of topics.
In our setting, we consider as a document the aggregate titles of the offered products in each of the 2,906 shops in our dataset. For training LDA models on the generated documents, we employed the implementation provided by the Machine Learning for Language Toolkit (MALLET). To obtain the most coherent topic model for our data, we considered the number of topics (k) within the range from 5 to 50 with a step of 5 and trained the LDA models with 1,000 Gibbs sampling iterations and priors α = 5/k, β = 0.01. For each trained model, we compute the C_v(k) metric.
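The model-selection loop described above can be sketched as follows. The grid (k from 5 to 50 in steps of 5), the priors (α = 5/k, β = 0.01) and the 1,000 iterations come from the text. The `train_and_score` callable is a hypothetical placeholder for whatever trains an LDA model and returns its coherence score; it does not correspond to any specific MALLET interface.

```python
def candidate_topic_counts(start=5, stop=50, step=5):
    """Candidate numbers of topics: 5, 10, ..., 50."""
    return list(range(start, stop + 1, step))

def priors_for(k, alpha_sum=5.0, beta=0.01):
    """Priors stated in the text: alpha = 5/k per topic, beta = 0.01."""
    return {"alpha": alpha_sum / k, "beta": beta}

def select_num_topics(documents, train_and_score, iterations=1000):
    """Train one model per candidate k and keep the k with the highest coherence.

    train_and_score(documents, k, alpha, beta, iterations) -> float is a
    hypothetical callable supplied by the caller (e.g. a wrapper around an
    LDA trainer followed by a coherence computation).
    """
    scores = {}
    for k in candidate_topic_counts():
        p = priors_for(k)
        scores[k] = train_and_score(documents, k, p["alpha"], p["beta"], iterations)
    best_k = max(scores, key=scores.get)
    return best_k, scores

if __name__ == "__main__":
    # Stand-in scorer for demonstration only: peaks at k = 20 by construction.
    fake_scorer = lambda docs, k, alpha, beta, iters: -abs(k - 20)
    print(select_num_topics(["title one", "title two"], fake_scorer))
```

With a real trainer plugged in, the selection reduces to taking the k whose model attains the highest C_v(k).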
This metric combines the indirect cosine measure with the normalised pointwise mutual information (NPMI) and the boolean sliding window technique, to determine the optimal number of topic classes according to the data distribution [43]. According to Figure 4, the highest coherence value is C_v(20) = 0.621 and thus, we set the number of topics k to 20. In Table 4, we present the topics learned by our best LDA model, including the most relevant terms describing each topic and the number of shops where each topic is dominant. To obtain the most descriptive terms for topic interpretation, we adopted the approach of ranking individual terms within topics presented in [46]. To provide an insight into the products sold by the shops classified in each topic, Table 5 includes some indicative examples per topic, selected with respect to the number of topic-relevant terms contained in their titles. The latter further allows us to characterise each one of the learned topics in a qualitative manner. Topics #1 and #2 describe "premium" accounts for a variety of online services and software products including streaming and VPN services. Topic #3 describes accounts associated with popular restaurants and fast food companies. Topic #4 reflects accounts associated with in-game items and collectables for the popular online game Fortnite. This topic is dominant in more shops than any other topic, with 466 occurrences (i.e. 16% of all shops). Although selling game accounts can be perceived as an innocuous activity, given the context of our data collection, these selling activities could be linked with money laundering schemes, based on the idea of converting stolen money to virtual currencies which are used to purchase in-game items [10, 32]. Topic #5 focuses on OpenBullet configurations.
OpenBullet is a brute-forcing tool used for performing credential stuffing attacks against online services [24]. The attacks are described by configuration files ("configs"), which offer features such as checking multiple credentials simultaneously (advertised by metrics such as CPM, standing for "Checks Per Minute") and bypassing rate-limiting. Topic #6 contains several classes associated with a broad spectrum of products ranging from game accounts to hacking and reconnaissance tools such as dorks. Topic #7 includes mainly subscriptions to various sports and video streaming services. Topic #8 highlights accounts, hacking tools and in-game items for the popular video game Minecraft. Topic #9 models the false products previously described (cf. Table 3), containing information regarding vendors' terms of service and links to external Discord servers, Telegram channels, etc. Topic #10 includes product licences and keys for a variety of software packages, games and operating systems. Topic #11 describes subscription plans for streaming services, similar to Topic #7. Topic #12 involves accounts for the popular game League of Legends. Topic #13 describes the sale of leaked user data from security breaches, in the form of combo lists, i.e. combinations of usernames/emails and passwords [28], which can be used for compromising accounts with the same credentials in other services, by means of credential stuffing attacks, as seen in Topic #5. Topic #14 involves mostly guides and e-books regarding carding and other methods of financial fraud. Topic #15 contains discount codes and accounts containing redeemable credits for various online shops and e-commerce platforms. Topic #16 is closely related to topics #14 and #15 and includes vouchers for online purchases in various venues as well as methods to commit fraud or to scam sellers. Topic #17 mainly includes subscriptions for online services and products with a focus on mobile apps.
Topic #18 is related to serial numbers for computer peripherals such as monitors, keyboards, etc. Topic #19 provides assorted "random" accounts for various social media and sites. Finally, Topic #20 is related to products such as redeemable gift cards, mainly for restaurants and food suppliers. In this section, we focus on the Topics #5 and #13, which as highlighted above, model products related with cybercriminal activities such as selling breached credential dumps and using tools for automating the compromise of accounts in different online services. To this end, we leverage the term-salience metric defined in [9] , which given the set of representative terms per topic, ranks them according to their distinctiveness, i.e. how informative a specific term is for determining the generating topic, versus a randomly-selected term. Subsequently, we select the top-3 most salient terms for topics #5 (config, openbullet, capture) and #13 (combo, database, records), and we use them to query product titles, in order to identify the most prevalent products modelled by these topics. For Topic #13, we additionally include the term db which is a common abbreviation for the term database. As previously reported (Table 5) , Topic #13 models leaked data from online data breaches, which are sold in the form of username/email and password combinations, along with other personal information. Such listings usually advertise the number of the breached records, as well as the source of the leak. In Table 6 we present some of the largest account dumps found in our Shoppy dataset, along with their prices. Indeed, we discovered that popular password breaches checker platforms, such as https://haveibeenpwned.com, list the majority of the account database dumps sold on Shoppy. Moreover, this could explain the relatively low price tag for leaks, including up to millions of records, as the respective breaches have already been made public. 
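The term-salience ranking used above to pick query terms can be sketched in a few lines. The sketch follows the usual formulation of the measure from [9]: distinctiveness(w) = Σ_t P(t|w) · log(P(t|w) / P(t)) and saliency(w) = P(w) · distinctiveness(w). The toy topic-word probabilities and the function name are illustrative assumptions, not values from the fitted model.

```python
import math

def term_saliency(topic_word, topic_prior):
    """Compute saliency(w) = P(w) * sum_t P(t|w) * log(P(t|w) / P(t)).

    topic_word: list of dicts, topic_word[t][w] = P(w | t).
    topic_prior: list of floats, topic_prior[t] = P(t).
    """
    vocab = set().union(*(tw.keys() for tw in topic_word))
    saliency = {}
    for w in vocab:
        # Joint probabilities P(t, w) = P(t) * P(w | t), and the marginal P(w).
        joint = [topic_prior[t] * topic_word[t].get(w, 0.0) for t in range(len(topic_word))]
        p_w = sum(joint)
        distinctiveness = 0.0
        for t, j in enumerate(joint):
            if j > 0.0:
                p_t_given_w = j / p_w
                distinctiveness += p_t_given_w * math.log(p_t_given_w / topic_prior[t])
        saliency[w] = p_w * distinctiveness
    return saliency

if __name__ == "__main__":
    # Toy model (assumption): "account" is shared by both topics, the others are exclusive.
    topics = [{"combo": 0.8, "account": 0.2}, {"config": 0.8, "account": 0.2}]
    scores = term_saliency(topics, [0.5, 0.5])
    for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(term, round(score, 3))
```

A term that is spread evenly over topics, such as "account" in the toy model, carries no information about the generating topic and receives a saliency of zero, whereas terms concentrated in a single topic rank highest and make good query terms.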
In Table 7, we list some illustrative products with titles including at least one of the selected salient terms for Topic #5. We observe that these products represent configurations for software such as OpenBullet/BlackBullet/Storm. As previously stated, such tools can be used to automate credential stuffing attacks [41] against various online services, as shown by the product titles. Sellers of such "configs" often advertise features such as CPM (checks per minute) and the capturing functionality offered, i.e. the ability to capture specific information associated with a compromised account, such as saved credit cards and payment methods, reward points, etc. From the above, we can largely infer the modus operandi of the account sellers of Shoppy and other cybercriminal markets: a buyer can purchase massive quantities of breached credentials and, by exploiting the password reuse behaviour exhibited by many users [40], compromise user accounts with the same credentials in other online services by using credential stuffing tools with different configurations.
There are several conclusions that can be extracted from the analysis and the outcomes obtained in the previous sections. First and foremost, we found evidence of malicious activities which usually take place in the dark web, yet this time arise on the surface web. In this sense, the cynicism of the malicious actors perpetrating these activities is shielded by a lack of methodologies and takedown mechanisms, due to several factors such as, e.g. the decentralised nature of the marketplaces. To the best of our knowledge, this is the first work that provides a solid and automated methodology to find, quantify and classify such activities in a comprehensive way. Nevertheless, despite the promising analysis leveraged in this article, malicious actors always find a way to circumvent analysis, since, e.g. they only use such platforms as a contact point, redirecting all of their activities to other external channels such as Telegram or Discord. Moreover, the use of technologies such as IPFS can augment the possibilities and resilience of such malicious practices [35, 39]. Another dimension yet to be explored is the underlying connection between the activities reported in this article and further criminal campaigns. Therefore, despite the fact that most of the sold products can be classified as 'soft' cybercrime (i.e. passwords, credit card credentials, personal data), they can cause significant damage to individuals and businesses, and they may be just the tip of the iceberg. More concretely, money laundering and the financing of other, probably more dangerous activities, may be happening right in front of our eyes [22, 31, 11].
As previously stated, there exist several challenges for the analysis and takedown of the illegal activities being hosted on such platforms. The decentralised nature of, e.g. Shoppy hinders crawling mechanisms that could be used to collect all the stored information. Moreover, Shoppy, like Sellix and Selly, is a resilient platform, as users can easily get their shops back. As a matter of fact, the activities reported in this article are taking place at the moment of writing without restrictions. The possibility of linking these activities by using novel blockchain platforms is a further issue that needs to be thoroughly explored. In particular, the immutable nature of blockchain may permit the development of shopping platforms which offer private and permanent selling services [23, 26, 7]. The latter fact is a critical issue due to the lack of efficient erasure mechanisms [2, 38, 8]. We argue that more effort should be devoted to the development of robust AI methods as well as data collection procedures such as the one proposed in this article to locate and quantify the extent of such activities.
Moreover, robust investigation protocols and more support from law enforcement towards the prosecution of these activities, as well as legislation related to this phenomenon, are necessary. Finally, proactive measures, including strategies such as abnormal behaviour detection and the corresponding mitigation actions, should be implemented by design, especially in the cases in which a platform is using immutable architectures.
In this article, we showed that most of the activities that are leveraged in the dark web are also taking place on the surface web and yet, no effective mechanisms or takedown measures are in place. This claim is supported by our thorough analysis of a marketplace, namely Shoppy. First, we collected account identifiers (usernames and Shoppy links) from two well-known forums, namely cracked and blackhatworld. Next, due to the decentralised and anonymous nature of Shoppy, we used these identifiers to crawl and retrieve data regarding shops, products and descriptions. Subsequently, we used topic modelling-based analysis to categorise and further explore the collected data by reporting several qualitative and quantitative features. Our findings evince the cybercriminal nature of a myriad of shops and users in the Shoppy ecosystem, supporting our initial claim. Finally, to raise awareness and highlight the relevance of our findings, we discussed the implications of our research, the current challenges and limitations, and proposed some measures to overcome them. Future work will focus on exploring similar marketplaces and trying to find correlations between different platforms in an automated way. Moreover, we plan to analyse the possible links between the activities leveraged in such marketplaces and cryptocurrencies, as well as other widely used financial platforms.
[1] On finding the natural number of topics with latent Dirichlet allocation: Some observations
[2] Redactable blockchain - or - rewriting history in bitcoin and friends
[3] Exploring threats and vulnerabilities in hacker web: Forums, IRC and carding shops
[4] Latent Dirichlet allocation
[5] Jumping NLP curves: A review of natural language processing research
[6] A density-based method for adaptive LDA model selection
[7] An efficient blockchain-based privacy-preserving collaborative filtering architecture
[8] Immutability and decentralized storage: An analysis of emerging threats
[9] Termite: Visualization techniques for assessing textual topic models
[10] In-game currencies, skin gambling, and the persistent threat of money laundering in video games
[11] Cybercrime in online gaming
[12] Criminals and signals: An assessment of criminal performance in the carding underworld
[13] Accurate and effective latent concept modeling for ad hoc information retrieval. Document numérique
[14] Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of US government arts funding
[15] Tor: The second-generation onion router
[16] Wild wide web consequences of digital fragmentation
[17] Understanding online carding forums: On products, prices and sellers
[18] All your cards are belong to us: Understanding online carding forums
[19] Examining signals of trust in criminal markets online
[20] A crime script analysis of the online stolen data market
[21] Internet Crime Complaint Center (IC3). 2019 internet crime report
[22] Detecting money laundering and terrorism financing activity in Second Life and World of Warcraft
[23] A privacy-preserving e-commerce system based on the blockchain technology
[24] Game-like captchas for intrusion detection
[25] Identifying and profiling key sellers in cyber carding community: AZSecure text mining system
[26] A blockchain-based framework of cross-border e-commerce supply chain
[27] Stolen account credentials: An empirical comparison of online dissemination on different platforms
[28] Gathering and analyzing identity leaks for a proactive warning of affected users
[29] Product offerings in malicious hacker markets
[30] Differentiating language usage through topic models
[31] Cards, money and two hacking forums: An analysis of online money laundering schemes
[32] Gaming the system: Money laundering through online games
[33] Charting the landscape of online cryptocurrency manipulation
[34] Hackers hedging bets: A cross-community analysis of three online hacking forums
[35] Hydras and IPFS: A decentralised playground for malware
[36] SInFo: Structure-driven incremental forum crawler that optimizes user-generated content retrieval
[37] Data breaches: What the underground world of carding reveals
[38] Blockchain mutability: Challenges and proposed solutions
[39] Delegated content erasure in IPFS
[40] Password reuse behavior: How massive online data breaches impacts personal data in web
[41] Is credential stuffing the new phishing? Computer Fraud & Security, 2020
[42] Exploring the space of topic coherence measures
[43] Exploring the space of topic coherence measures
[44] Exploring emerging hacker assets and key hackers for proactive cyber threat intelligence
[45] E-commerce recommendation applications. Data mining and knowledge discovery
[46] LDAvis: A method for visualizing and interpreting topics
[47] Cyberethics: Morality and law in cyberspace. Jones & Bartlett Learning
[48] Profiling underground merchants based on network behavior
[49] Characterizing activity on the deep and dark web
[50] Cybercrime losses: An examination of US manufacturing and the total economy
[51] Framing dependencies introduced by underground commoditization
[52] A tight scrape: Methodological approaches to cybercrime research data collection in adversarial environments
[53] You are your photographs: Detecting multiple identities of vendors in the darknet marketplaces
[54] The dark net: Self-regulation dynamics of illegal online markets for identities and related services
[55] Why forums? An empirical analysis into the facilitating factors of carding forums
[56] Trust among cybercriminals? Carding forums, uncertainty and implications for policing
[57] A heuristic approach to determine an appropriate number of topics in topic modeling
This work was supported by the European Commission under the Horizon 2020 Programme (H2020), as part of the projects CyberSec4Europe (Grant Agreement no. 830929) and LOCARD (Grant Agreement no. 832735). The content of this article does not reflect the official opinion of the European Union. Responsibility for the information and views expressed therein lies entirely with the authors.