key: cord-0045631-pzdb6fg7
authors: Wang, Siyuan; Gao, Yue; Shi, Jinqiao; Wang, Xuebin; Zhao, Can; Yin, Zelin
title: Look Deep into the New Deep Network: A Measurement Study on the ZeroNet
date: 2020-05-26
journal: Computational Science - ICCS 2020
DOI: 10.1007/978-3-030-50371-0_44
sha: 325f5b9d808d9d358ae5ab5e316aca1cdd96f480
doc_id: 45631
cord_uid: pzdb6fg7

ZeroNet is a new decentralized, web-like network of peer-to-peer users created in 2015. Described as an open, free and robust network, ZeroNet has attracted a rising number of users. Most P2P networks are used for file-sharing or data-storage systems, and the characteristics of these networks have been widely investigated. However, there are obvious differences between ZeroNet and conventional P2P networks. Existing research rarely covers ZeroNet, and its characteristics and robustness are unknown. To address this problem, the present study measures the ZeroNet peer resources and site resources separately and proposes collection methods for both. Unlike simulation-based studies, the experiments in this paper are run in a real-world environment. This is also the first measurement study of ZeroNet. Experimental results show that the topology of the peer network in ZeroNet has scarce edges, short distances and low clustering coefficients, and that its degree distribution exhibits a special distribution. These results indicate that the peer network of ZeroNet has poor robustness, and the resilience experiments confirm this. In addition, this paper presents an improved peer exchange method to enhance the robustness of ZeroNet. We also measure the topology characteristics, languages, sizes and versions of the sites in ZeroNet. We find that the size of the sites and the client version are also reasons for the low robustness of ZeroNet.
With a BitTorrent-based architecture and Bitcoin-cryptography-based accounts [3], ZeroNet has no single point of failure, is uncensorable, and is free [2]. In recent years, these advantages have attracted a rising number of users. Studies of ZeroNet are lacking, but P2P networks in general have been studied extensively: [6, 10, 12, 16] study P2P networks from the perspective of topology, while [5, 7, 9] pay more attention to their robustness. However, there are significant differences between ZeroNet and traditional P2P networks. Most P2P networks are used for file-sharing or data-storage systems, whereas ZeroNet is essentially a site publishing platform. The sites shared in ZeroNet are small and interlinked. This differs from conventional P2P networks and may change the topology and robustness. To the best of our knowledge, there is no relevant measurement of ZeroNet yet.

The goal of this paper is to tackle the aforementioned problems. In this work, the communication protocol of ZeroNet is analyzed and a protocol-based resource collection method is proposed. On this foundation, we model and analyze the peer network and the site network separately. For the peer network of ZeroNet, three important properties are considered: the degree distribution, the small-world characteristics, and the resilience of the graph to failures or targeted attacks. For the site network, in addition to the above characteristics, the languages, sizes and versions of the sites in ZeroNet are also measured. The contributions of this paper can be summarized as follows:
- By analyzing the communication protocol of ZeroNet, a protocol-based resource collection method is proposed. On this basis, we estimate the number of peers and sites in the whole network.
- The present study finds that the peer network in ZeroNet has scarce edges, short distances and low clustering coefficients, and that its degree distribution, unlike that of other P2P networks, is irregular. This is closely related to the peer exchange mechanism and the site size of ZeroNet, both of which reduce its robustness.
- An improved node exchange method is proposed, which increases the site access success rate.
- This study also obtains the topology characteristics and the language, version and size distributions of the site network in ZeroNet. Experimental results show that the language distribution of ZeroNet is basically the same as that of the Dark Web and the Surface Web. The small size of sites and low client versions are also reasons for the low robustness of ZeroNet.

ZeroNet, created in 2015, is a decentralized, web-like network composed of peer-to-peer users. It has a BitTorrent-based structure and transfers site files in a similar manner. ZeroNet is essentially a site publishing platform. Sites can be accessed through an ordinary web browser, with the ZeroNet application acting as a local web host for the pages. In addition, ZeroNet uses trackers and peer exchange (PEX) messages from the BitTorrent network to negotiate connections between peers. There are three fundamental components in the ZeroNet network: peer, tracker and site.
- peer: Each peer is a running ZeroNet client (distinguished by IP and port).
- tracker: A ZeroNet tracker is a special type of server that assists communication between peers using the BitTorrent protocol. The tracker keeps track of where site copies reside on peer machines and which ones are available at the time of a client request, and it helps coordinate efficient transmission and reassembly of the copied files.
- site: Sites in ZeroNet are like files in the BitTorrent network. The client obtains the peers holding the site from the tracker or through PEX.
It then downloads the site files from the obtained peers for browsing. Instead of having an IP address, sites are identified by a public key (specifically a Bitcoin address). The private key allows the owner of a site to sign and publish changes, which propagate through the network.

There are four crucial protocols in the ZeroNet network: PEX, Announce, ListModify and StreamFile.
- PEX: Peer exchange (PEX) is part of the ZeroNet site file sharing protocol. It allows a group of peers that are collaborating to share a given file to do so more swiftly and efficiently. PEX greatly reduces the reliance of peers on a tracker by allowing each peer to directly update others in the swarm as to which peers are currently in the swarm.
- Announce: Announce is used to request the peers of a site from a tracker server. When a client sends an Announce message to the tracker server, the tracker returns the other nodes that it knows are holding the site and records the client as a site holder.
- ListModify: The ListModify protocol checks the update status of a site without downloading the site. When a node receives a ListModify message, it returns the latest update time of the site.
- StreamFile: ZeroNet uses the StreamFile protocol to download site files. When a peer receives a StreamFile message, it transmits the site file data over the TCP link.

Many measurement studies have been performed on P2P networks [13], such as [11, 14]. [6, 10, 12, 16] studied the topology of P2P networks: [6, 16] measure the topology of BitTorrent in real and simulated environments, respectively, [12] detects peers and reports the proportion of nodes contributing to Ethereum, and [10] measures the extent of decentralization of Bitcoin and Ethereum. However, the purpose and use of ZeroNet are completely different from those of these systems. ZeroNet is essentially a site publishing platform rather than a file sharing system.
The files shared in ZeroNet are sites, which are very small, and this may change the topology. [5, 7, 9] analyzed the resilience and robustness of P2P networks: [9] analyzes the robustness of Bitcoin under AS-level attacks, [7] examines the influence of Sybil attacks on P2P networks, and [5] measures the impact of neighbor selection on the performance and resilience of P2P networks. ZeroNet has attracted a large number of users due to its declared high robustness, but its true robustness is unknown. Because ZeroNet is based on the BitTorrent architecture and its sites are censorship-resistant, studies of the dark web [4] are also highly relevant to ZeroNet. The most relevant work is [15], which crawls the sites in Tor [8] and measures some of their properties and privacy.

This section gives a brief description of the work on peer resources. The work is divided into two parts: peer resource collection and topology analysis. The resilience experiment shows that the robustness of ZeroNet is poor. In addition, an improved peer exchange method is proposed.

By deploying a collection system, we have collected most of the peers in ZeroNet. The peer coverage ratio and the average connection number are also calculated to verify that the vast majority of peers in ZeroNet are detected. Figure 1 depicts the structure of the peer network in ZeroNet. The larger a node in the graph, the more neighbors it has.

Resource Collection Method. To collect as many ZeroNet peers as possible, our collection system has two components: an active part and a passive part. The active part is faster and more targeted, while the passive part avoids extra network burden and discovers nodes behind NATs. The active part obtains peer information from trackers through Announce messages and from peers through PEX messages.
In each epoch, every site is used to construct PEX and Announce messages to obtain peers from other peers and from trackers. The passive part is deployed by encapsulating a modified ZeroNet client in a Docker container, so that one computer can run multiple ZeroNet clients. The modified clients record the peers that establish connections with us. Resource collection ran from November 11th to 13th. Two active-collecting machines and 15 passive-collecting machines with 90 passive clients were deployed.

Coverage Ratio. The peer coverage ratio p(n) is the percentage of peers detected by the collection system, and it represents the discovery ability of the system. Borrowing from a method used in prior work, p(n) is evaluated by Eq. (1), where C(m) is the total number of distinct peers discovered after the m-th epoch, ΔC is the number of newly discovered peers, and N is the number of all distinct peers (identified by IP address and listening port). One epoch takes 5 s. As shown in Fig. 2, after 40 epochs the numbers of new peers and of distinct peers discovered by the server per epoch remain basically stable. The collection system finds on average 303.74 distinct peers per epoch, of which 1.33 are new, so the peer coverage ratio can be estimated at nearly 99.56% (1 − 1.33/303.74 ≈ 0.9956). A snapshot is taken every 100 s. We also compare the average number of connections per peer with that of the modified ZeroNet clients deployed by us, and find that the numbers are similar.

In this part, we select the peer networks of four sites for topological measurement; these sites have different sizes and types. Some of their basic properties are shown in Table 1. The topology of ZeroNet is represented by an undirected graph G = (V, E), where V denotes the set of ZeroNet peers and E denotes the set of connections between peers. For example, (u, v) ∈ E means that peer u is in the peer list of peer v.
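The graph construction described above can be sketched as follows. This is a minimal illustration, not the collection system used in the study: the function names, the `sample` data and the port number are hypothetical, and peers are keyed by (IP, port) as in the text. An edge is added whenever one peer appears in another peer's list, and the graph is treated as undirected.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Peer = Tuple[str, int]  # (ip, port), the peer identity used in the text


def build_peer_graph(peer_lists: Dict[Peer, Iterable[Peer]]) -> Dict[Peer, Set[Peer]]:
    """Build an undirected adjacency map from collected peer lists."""
    adj: Dict[Peer, Set[Peer]] = defaultdict(set)
    for v, listed in peer_lists.items():
        adj[v]  # register v as a vertex even if its peer list is empty
        for u in listed:
            if u == v:
                continue  # ignore self-references
            # (u, v) is an edge because u appears in v's peer list
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def edge_count(adj: Dict[Peer, Set[Peer]]) -> int:
    """Each undirected edge is stored twice, once per endpoint."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


# Hypothetical sample data (documentation-range addresses, made-up port).
sample = {
    ("192.0.2.1", 15441): [("192.0.2.2", 15441), ("192.0.2.3", 15441)],
    ("192.0.2.2", 15441): [("192.0.2.1", 15441)],
    ("192.0.2.3", 15441): [],
}
graph = build_peer_graph(sample)
# Degree sequence in descending order, ready for a rank/degree plot.
degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
```

Storing neighbors in sets removes duplicate reports of the same connection, which matters when the same pair of peers is observed in several epochs.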
Three topological properties of the peer network in ZeroNet are measured: degree distribution, small-world characteristics and resilience.

Degree Distribution. The degree distribution is one of the most important metrics of the structure of the peer network in ZeroNet. Existing research has found that the degree distribution is usually closely related to the resilience and routing latency of a network. The degree distribution of the peer clusters is shown in Fig. 3, where the x-axis is the node rank in descending order of degree and the y-axis is the degree. Unlike in most P2P networks, the measured degree distribution of the peer clusters is irregular.

Small World. The small-world property is characterized by dense local clustering, or cliquishness, of connections between neighboring nodes. Most P2P networks have been shown to be small-world networks, because PEX or PEX-like mechanisms help peers learn about each other within a short distance. Our goal is to verify whether the peer network in ZeroNet exhibits small-world properties even though its PEX mechanism is weak. Although not all connections of the peer network can be obtained, we can still get an approximate value of the shortest path length L between peers (most peers and connections are detected), where dis_ij is the length of the shortest path between node i and node j. The clustering coefficient cof is calculated by Eq. (5) and Eq. (6), where d_v is the number of neighbors of node v and E_v is the number of connections between the neighbors of node v. To gauge the average shortest distance and clustering coefficient of the peer network in ZeroNet, an analogous random network with the same number of nodes and edges was simulated. The results for these networks are given in Table 1. The peer network in ZeroNet has a short peer distance and a high clustering coefficient, which is orders of magnitude larger than that of the analogous random networks. It is therefore a small-world network.

Resilience.
For a robust network, connectivity should remain good when some nodes leave the network. The resilience of the peer network in ZeroNet is examined under two modes of node removal: random removal, and targeted removal of the highest-degree nodes first. Figure 4 shows the proportion of remaining nodes in the largest connected component of the four peer networks under random and targeted deletion. The figure illustrates that peer networks in ZeroNet are extremely robust to random peer removal (fragmenting only after 22-55% of peers are removed). However, when the highest-degree nodes are removed first, the network quickly becomes very fragmented (after 8-22% of peers are removed). This means that the resilience of ZeroNet is very poor. The peer network is centered on a small number of nodes, and when these nodes leave, the robustness of the network is seriously affected. We also find that the larger the cluster network, the stronger its resilience; conversely, the smaller the cluster, the weaker its resilience.

Through the analysis of the characteristics of the peer networks in ZeroNet, we find that the degree of some nodes is very low, and that the network is highly centralized and has significant small-world characteristics. The possible reasons for these characteristics are as follows:
(1) Because of NATs and firewalls, many peers cannot be connected to, so there are few connections between them.
(2) ZeroNet has no DHT and depends heavily on trackers. The lack of a DHT leaves ZeroNet with a single, inadequate form of node exchange. Because a tracker in ZeroNet can also be a peer, a large number of sites can be downloaded from these peers without any PEX operation to get more peers. The node network is therefore clearly centered on trackers.
(3) Unlike file-sharing P2P networks, the file resources in ZeroNet are very small (see the site resource measurement and analysis section).
When a user visits a site, the download time is very short and few seeders are required, so the connections between nodes are sparse.

If more online nodes can be found, the success rate of ZeroNet visits can be significantly improved, thus improving the robustness of ZeroNet. This paper puts forward an improved peer exchange mechanism for the ZeroNet client. By similar methods, we have collected far more sites than clients can access. We start a daemon thread that continuously performs PEX with online nodes to make up for the deficiency of ZeroNet's native PEX operations. The daemon thread performs the following steps:
(1) Collect all currently online nodes.
(2) Use the most popular sites to perform PEX with all online nodes to get new nodes.
(3) Verify that the newly found nodes are online.
(4) Repeat (2) and (3).
In this way, the importance of trackers in ZeroNet is weakened. Experiments show that even if trackers are blocked, a crawler using the improved peer exchange mechanism can still download most of the sites in ZeroNet. Therefore, our method can improve the connectivity of the network and enhance the robustness of the peer network in ZeroNet.

Sites in ZeroNet are equivalent to files in a BT network. However, sites have link relationships between them that files in BT do not have. In this section, we introduce how site resources are collected and the characteristics of sites in ZeroNet. Figure 5 depicts the topology of the site network in ZeroNet. The more connections a site has to other ZeroNet sites, the larger it appears in the graph.

The crawler is designed specifically around the ZeroNet protocol, which makes it faster and lighter than a regular Dark Web crawler [15]. Furthermore, we check for site updates via ListModify messages without downloading the site. Sites are downloaded through StreamFile messages from the nodes found by the peer collection module.
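The update check used by the crawler can be sketched as follows. This is an illustrative sketch only: `select_sites_to_download`, `last_seen` and `query_modified` are hypothetical names, and `query_modified` stands in for a ListModify exchange with a peer rather than calling any real ZeroNet API. Only sites whose reported update time is newer than the local record are queued for a StreamFile download.

```python
from typing import Callable, Dict, List, Optional


def select_sites_to_download(
    last_seen: Dict[str, int],
    query_modified: Callable[[str], Optional[int]],
) -> List[str]:
    """Return the site addresses whose remote copy is newer than ours.

    last_seen maps a site address to the update time of the copy we hold
    (0 for a site that has never been downloaded).
    query_modified is a stand-in for a ListModify exchange: it returns the
    latest update time reported by a peer, or None if no peer answered.
    """
    stale: List[str] = []
    for address, local_time in last_seen.items():
        remote_time = query_modified(address)
        if remote_time is None:
            continue  # no peer is serving this site right now
        if remote_time > local_time:
            stale.append(address)  # changed or new: fetch via StreamFile
    return stale


# Hypothetical data: site addresses and timestamps are made up.
last_seen = {"siteA": 100, "siteB": 200, "siteC": 0, "siteD": 300}
remote_times = {"siteA": 150, "siteB": 200, "siteC": 50}
to_download = select_sites_to_download(last_seen, remote_times.get)
```

In this example `siteB` is skipped because it is unchanged and `siteD` because no peer answered, so only two of the four sites would be transferred.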
When a new or changed site is downloaded, we extract new ZeroNet sites from it. The experiment lasted from November 1, 2019 to December 1, 2019. Using the above method, we found more than 14000 sites in ZeroNet and downloaded more than 1300 online sites. The crawl results show that even though the design of ZeroNet keeps a website online as long as a node that has downloaded it is online, a large number of sites still cannot be accessed. One of the most important reasons is that many sites are created but very few become popular. Unpopular sites are hardly visited, so no peers serve them.

Coverage Ratio. The most important way to publicize a new site in ZeroNet is to publish it on popular sites. [15] collects site data by using a headless browser to simulate user clicks, which can collect all sites that clients can access. We use both the method in [15] and our proposed site collection method for resource collection. The sites collected by our method cover 98.7% of the sites collected by [15], and 45.3% of the sites collected by our method cannot be collected by the method of [15]. Existing data cannot give the exact current coverage ratio, but we have verified that our site collection method collects far more sites than the ZeroNet client can visit. Figure 6 plots the distinct sites and new sites found by our crawler. After 10 days of data collection, the number of sites found daily stabilized. Our collection method can collect most online sites.

As with the peer network topology measurement, we used the same method to measure the topological characteristics of the site network: degree distribution and small-world characteristics.
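The small-world comparison applied to both networks can be sketched as follows. The sketch assumes the standard definitions, which may differ in normalization from the ones used in the study: the average shortest path length is the mean BFS distance over reachable node pairs, and the clustering coefficient is the mean over nodes of 2E_v / (d_v(d_v − 1)). The random baseline is a graph with the same number of nodes and edges. All function names and the `example` graph are hypothetical.

```python
import random
from collections import deque
from itertools import combinations
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def average_shortest_path(adj: Graph) -> float:
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def average_clustering(adj: Graph) -> float:
    """Mean of 2*E_v / (d_v*(d_v-1)); nodes with degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        # E_v: number of edges among the neighbors of this node
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (d * (d - 1))
    return total / len(adj)


def random_graph_like(adj: Graph, seed: int = 0) -> Graph:
    """Random graph with the same node and edge counts (small graphs only,
    since all candidate node pairs are enumerated)."""
    rng = random.Random(seed)
    nodes = list(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    result: Graph = {v: set() for v in nodes}
    for a, b in rng.sample(list(combinations(nodes, 2)), m):
        result[a].add(b)
        result[b].add(a)
    return result


# Hypothetical example: a triangle a-b-c with a pendant node d attached to c.
example: Graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
path_len = average_shortest_path(example)
clustering = average_clustering(example)
baseline = random_graph_like(example, seed=1)
```

A network is usually called small-world when its path length is close to that of the random baseline while its clustering coefficient is much larger; for graphs of realistic size, the baseline should be generated by sampling edges incrementally rather than enumerating all pairs.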
The resilience of the site network is not measured, because it is not meaningful: most popular sites will not go offline as long as one peer holding the site is online. The site network is modeled as an undirected graph G = (V, E) of sites. The measurement results show that ZeroNet has small-world characteristics, similar to the surface web and the dark web. The degree distribution of the site network is shown in Fig. 7 and the small-world property in Table 2. Except for the tail of the curve, the site degree distribution follows a clear power law, and a large number of sites without external links exist in ZeroNet. A possible reason is that creating a site in ZeroNet is cheap, since no server costs are required, so a large number of trivial sites are created (with only a "Hello!" on their pages). Comparison with the random graph shows that the site network in ZeroNet has a high clustering coefficient and a short average shortest distance. In particular, the clustering coefficient is many orders of magnitude higher than that of the random graph. Hence, the site network in ZeroNet is clearly centralized and has the small-world property. We also checked the types of the central sites and found that most of them are navigation sites. Because ZeroNet is based on the BT architecture, popular sites in ZeroNet will not go offline due to a single point of failure, so the centralization of the site network does not reduce the robustness of ZeroNet.

In this subsection, we measure the distribution of languages, versions and sizes among the active ZeroNet sites.

Languages. Following [15], we use the Google Translate API to detect the language of the web pages. Each website may be classified into multiple languages. We found ZeroNet sites in 58 different languages.
Among them, English, Russian, Chinese, German, French and Spanish are the most used, accounting for 66.2%, 17.3%, 5.5%, 3.4%, 3.1% and 2.3% respectively. Except that Chinese sites account for a relatively high proportion, the rankings of the other five languages are basically consistent with the surface web [1] and the dark web [15].

Version and Size. In this subsection, characteristics unique to ZeroNet sites are measured: the client versions of the sites and the distribution of site sizes. The version is the client version with which a site was published, and the site size is the total size of all files in the site directory. Version and size information is obtained from the content.json file of each online site downloaded by the crawler. As of December 1, 2019, the latest version of ZeroNet is 0.7.1. The ZeroNet client has no automatic update mechanism. The top site-publishing versions are shown in Fig. 8. 77.6% of sites were published by clients with a version lower than 0.6.*, which suggests that most popular and online sites were created a long time ago. Only 12.2% of sites were published by the newest client. Even though the new client version includes many optimizations, a large number of users still do not actively update the client. This may have a bad influence on the robustness of ZeroNet.

The distribution of site sizes in ZeroNet is as follows: 95.2% of sites are smaller than 10 MB, 3.8% are between 10 MB and 100 MB, and 1.0% are larger than 100 MB. Most ZeroNet sites are smaller than 10 MB, which is also a consequence of the default site size limit (10 MB). The small site size makes the download time very short, so PEX between peers is insufficient, which also reduces the robustness of ZeroNet.

In this paper, we first propose a resource collection method for ZeroNet. On this basis, we measure and analyze the peer network and site network of ZeroNet.
The experimental results reveal that the peer network of ZeroNet has obvious small-world characteristics, while its robustness is poor: in particular, when a large number of high-degree nodes leave the network, connectivity becomes extremely poor. Our analysis attributes this mainly to the tracker and the peer exchange mechanism of ZeroNet. Therefore, we propose an improved peer exchange mechanism, so that ZeroNet can still be visited even when most trackers are unavailable. This article is also the first to measure the sites of ZeroNet. The results show that the degree distribution of the site network in ZeroNet follows a power law and that the site network also has small-world characteristics. The content of the sites in ZeroNet is also measured from three angles: languages, sizes and versions. The results show that English, Russian, Chinese, German, French and Spanish are the most popular languages among ZeroNet sites. ZeroNet sites are generally small, and most online sites were created using lower ZeroNet client versions. This is also one of the reasons why ZeroNet is less robust.

References
The darknet and the future of content protection
Impact of neighbor selection on performance and resilience of structured P2P networks
Evolution and enhancement of BitTorrent network topologies
Defending the Sybil attack in P2P networks: taxonomy, challenges, and a proposal for self-registration
Tor: The second-generation onion router
Analyzing the deployment of Bitcoin's P2P network under an AS-level perspective
Decentralization in Bitcoin and Ethereum networks
A measurement study of a large-scale P2P IPTV system
Measuring Ethereum network peers
Peer-to-peer computing
On the measurement of P2P file synchronization: Resilio Sync as a case study
The onions have eyes: a comprehensive structure and privacy analysis of Tor hidden services
A measurement study on the topologies of BitTorrent networks