key: cord-0069018-qu1hf678
authors: Moskovkin, V. M.; Gakhova, N. N.; Nabokov, A. Yu.
title: Downloading Articles by Russian Researchers Using the Sci-Hub Resource
date: 2021-10-28
journal: Sci
DOI: 10.3103/s0147688221030059
sha: ba87316e3ee1a84596e3c54fec565d2be0028d26
doc_id: 69018
cord_uid: qu1hf678

On the basis of data on 28 million articles downloaded through the Sci-Hub resource from September 1, 2015 to February 29, 2016, which J. Bohannon and A. Elbakyan posted on the Internet, about 1.5 million articles downloaded by Russian researchers were identified. They were distributed by publishers of scientific periodicals and by the cities and regions of Russia from which the downloads took place. For example, among the 521 cities in Russia, the largest numbers of downloads came from researchers in Moscow (731 100 articles), St. Petersburg (132 600), Novosibirsk (57 500), Kazan (55 100), and Tomsk (26 400). Comparisons are made with similar downloads by Ukrainian researchers.

After the Sci-Hub pirate resource was launched in September 2011, publications about it were at first mostly emotional and journalistic in nature. The topic entered the scientific discourse after John Bohannon and Alexandra Elbakyan, founder of Sci-Hub, made public the data on 28 million user requests to Sci-Hub for the period from September 1, 2015 to February 29, 2016 [1]. This allowed interested researchers around the world to analyze the use of Sci-Hub in their own countries and in specific research areas.
John Bohannon found that this resource is used not only by scientists from developing and underdeveloped countries, where access to subscription journals is difficult, but also by scientists from developed countries (a quarter of the requests come from OECD countries), who have good access to subscription journals [1] but do not want to sacrifice their comfort by obtaining legal access through their scientific libraries. This is confirmed by the survey of John Travis [2]: "17% of the respondents said that accessing the full text through Sci-Hub was easier than through legal channels." He also found that 37% of the respondents were unable to legally access the articles they needed, and 23% chose Sci-Hub because they disagreed with the pricing of major commercial publishers of scientific periodicals. All this was best described by Simon Oxenham, who summarized his interview with Alexandra Elbakyan under the catchy headline "Meet the Robin Hood of science" [3]: "The efficiency of the system is really quite astounding, working far better than the comparatively primitive modes of access given to researchers at top universities, tools that universities must fork out millions of pounds for every year." M. Parkhill [4] loaded the top 100 articles from [1] into the PlumX tool and determined that most of them were published in 2015; that is, Sci-Hub users prefer to obtain the latest articles. Moreover, a large number of the articles were devoted to physics, technical sciences, and life sciences. Z. Babutsidze [5] studied arrays of downloaded articles on economic topics [1]. D. Himmelstein et al. [8] found that Sci-Hub provides free access to more than 85% of the scientific articles from subscription journals, as well as to 97% of the articles from Elsevier, which, as we know, has repeatedly sued this pirate resource. S.
Nazarovets [9] used the data of [1] to obtain the distribution of articles downloaded by Ukrainian researchers by publisher and region; he identified the main areas of knowledge to which these articles belong (chemistry, physics, and astronomy accounted for 69% of the articles; medical and pharmaceutical sciences, 13%; life sciences, 12%; and social sciences, 6%) and the most common journals (Journal of the American Chemical Society, 6769 articles; Organic Chemistry, 6038; Physical Review B, 4325; and Medicinal Chemistry, 3712 articles). In [10], journals with IF (WoS) > 1 were selected using the access of the University Association for Contemporary European Studies (UACES) to European Studies journals. Their analysis, together with the data on article downloads from [1], revealed that readers are mainly interested in issues related to populism, extremism, and the economic crisis. Using the same data [1], D. Androćec [11] studied publications in the field of computer science, which made up 5.95% of the total number of publications, and listed the 20 most popular articles. The top five countries whose researchers downloaded articles in this field were India, Iran, China, the United States, and Indonesia. Russia was in seventh place in this ranking with 46659 articles. B. Greshake [12] showed that, of the 62 million articles pirated through Sci-Hub, 80% are from nine publishers.

We have presented an overview of publications from 2016-2017 (with the exception of article [9]) that rest on the empirical basis of work [1]. However, alongside the statistical analysis of articles downloaded from Sci-Hub, research was also conducted through surveys of users of this pirate resource.
We note only work [13], which describes the results of the large-scale Early Career Researchers (ECRs) project, which examined the use of Sci-Hub by 106 young researchers from seven countries (Great Britain, Israel, Spain, China, Malaysia, Poland, and France). These researchers were interviewed annually for 3 years. It was shown that the popularity of Sci-Hub was growing: in 2016 the resource was used by 6% of the project participants, and in 2018 by 25%. It was most popular among young researchers in France. It was also shown that Sci-Hub is heavily blocked in China, which has its own pirate resource, 91lib.com. Even where university libraries are well stocked with subscriptions to scholarly periodicals, Sci-Hub is preferred over licensed access through the libraries for its convenience. It is noted that the ResearchGate network was used by 75% of the project participants.

One of the most recent surveys of researchers and students about their dependence on Sci-Hub was published in early January 2021 on SpicyIP, an Indian blog repository on intellectual property and innovation policy [14]. From December 22, 2020 to January 2, 2021, 212 respondents were surveyed, of whom 140 (66%) rated their dependence on Sci-Hub as strong (8-10 points on a ten-point scale). Before the COVID-19 pandemic, 51.9% of the respondents preferred to obtain articles through their libraries (48.1% through Sci-Hub), while during the pandemic this ratio shifted in favor of Sci-Hub (164 respondents, or 77.3%, depended strongly on Sci-Hub for access to paid resources).

To conclude our review, we note that articles downloaded from Sci-Hub are cited 2.21 times more often than those not downloaded from this resource [15]. This review, which covers all articles identified through Google Scholar, has shown that there is no research into the downloading of pirated articles from Sci-Hub by Russian researchers. Here, we try to fill this gap.
The data of work [1] consist of six files with the extension .tab, each of which reflects user requests for a certain period. The files contain
• the date and time of the request;
• the DOI identifier, which includes the code of the publisher and the code of a specific article in the journal, generated by the publisher;
• the user's IP address;
• the name of the country;
• the city name;
• the geographic coordinates (latitude and longitude).
Along with the six data files, a file of articles in CSV format was downloaded, which contains
▪ the name of the publisher;
▪ the publisher prefix;
▪ the date of the last save;
▪ the date of the last request.
To obtain the results, only requests from Russian IP addresses were selected. The source files were processed in the Python programming language using the PyCharm development environment, yielding the results on the downloading of articles by Russian researchers.

When the source file of articles was processed, it turned out that if the names of publishers are selected by prefix, the number of downloaded articles is 1780431, which does not match the number of articles downloaded by cities of Russia, 1521434. The discrepancy is due to duplicate publisher lines in the original file. When the file with the initial data on downloaded articles is processed and publisher names are looked up by prefix, a merge of two data frames is used, similar to a join in SQL. Duplicate lines are thus counted as well, which inflates the number of articles. After the duplicates were removed, the number of articles with Russian IP addresses was 1521434. During data processing, it was also noted that the total number of downloaded articles by country is not equal to the total number by city. The reason lies in the source files: some of the data lines lack the name of the city and contain N/A instead.
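The processing steps described above (selecting Russian requests, looking up publishers by DOI prefix, and handling the duplicated publisher lines and the missing city names) can be illustrated with a minimal sketch. It does not reproduce the authors' script: plain Python lists and dictionaries stand in for the data frames, and the sample rows, the column order, and all function and variable names are assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the column order is assumed from the field list above.
REQUESTS_TAB = (
    "2015-09-01 00:00:05\t10.1016/j.example.2015.01.001\tu1\tRussia\tMoscow\t55.75,37.62\n"
    "2015-09-01 00:01:10\t10.1016/j.example.2015.02.002\tu2\tRussia\tMoscow\t55.75,37.62\n"
    "2015-09-01 00:02:30\t10.1016/j.example.2015.03.003\tu3\tRussia\tN/A\tN/A\n"
    "2015-09-01 00:03:45\t10.1021/ex500001a\tu4\tRussia\tKazan\t55.79,49.12\n"
    "2015-09-01 00:04:50\t10.1021/ex500002b\tu5\tRussia\tTomsk\t56.50,84.97\n"
    "2015-09-01 00:05:55\t10.1016/j.example.2015.04.004\tu6\tUkraine\tKiev\t50.45,30.52\n"
)

# Hypothetical publisher table with one duplicated line, as in the discrepancy described above.
PUBLISHERS_CSV = (
    "publisher,prefix\n"
    "Elsevier,10.1016\n"
    "Elsevier,10.1016\n"
    "American Chemical Society,10.1021\n"
)

FIELDS = ("timestamp", "doi", "user", "country", "city", "coords")


def doi_prefix(doi):
    """Return the publisher part of a DOI (the text before the first slash)."""
    return doi.split("/", 1)[0]


def load_requests(text, country):
    """Parse tab-separated request rows and keep only those from one country."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [dict(zip(FIELDS, row)) for row in reader if row and row[3] == country]


def join_by_prefix(requests, publishers):
    """Pair each request with every publisher line sharing its prefix (like an SQL join)."""
    return [
        (req["doi"], pub["publisher"])
        for req in requests
        for pub in publishers
        if pub["prefix"] == doi_prefix(req["doi"])
    ]


def unique_publishers(publishers):
    """Keep the first line for each prefix and drop later duplicates."""
    first_seen = {}
    for pub in publishers:
        first_seen.setdefault(pub["prefix"], pub)
    return list(first_seen.values())


russian = load_requests(REQUESTS_TAB, "Russia")
publishers = list(csv.DictReader(io.StringIO(PUBLISHERS_CSV)))

inflated = join_by_prefix(russian, publishers)                  # duplicates counted twice
clean = join_by_prefix(russian, unique_publishers(publishers))  # one row per request

by_publisher = Counter(name for _, name in clean)
missing_city = sum(1 for req in russian if req["city"] == "N/A")
by_city = Counter(req["city"] for req in russian if req["city"] != "N/A")

print(len(inflated), len(clean), missing_city)
print(by_publisher.most_common())
print(by_city.most_common())
```

In this toy data the duplicated Elsevier line makes the join return 8 rows for 5 Russian requests; once a single line per prefix is kept, the join returns 5 rows, one of which has no city name. With pandas, the same effect would typically be obtained by calling `drop_duplicates` on the publisher table before `merge`.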
The number of lines with N/A in place of the city name was counted; it was 29264. Thus, 1492170 lines (1521434 minus 29264) were analyzed, which corresponds to the number of articles downloaded in Russia.

We present the results of processing the data of [1] on the distribution of downloaded articles by publishers, cities, and regions of Russia. Table 1 shows a ranked list of publishers with at least 900 downloaded articles. The data of Table 1 were compared with similar results for Ukraine obtained by Sergei Nazarovets [9]. To do this, we combined the data on Springer-Verlag and the Nature Publishing Group, obtaining a total of 206153 articles, and the data on Wiley Blackwell (Blackwell Publishing) and Wiley Blackwell (John Wiley & Sons), obtaining a total of 120391 articles. For the five leading publishers by number of articles downloaded by Russian researchers, the downloads exceed those by Ukrainian researchers by the following factors: Elsevier, 4.3; Springer Nature, 4.5; Wiley Blackwell, 4.2; American Chemical Society, 3.5; Institute of Electrical and Electronics Engineers, 6.0. The list of leading publishers whose articles were downloaded was approximately the same for researchers in both countries.

During data processing, 521 cities and settlements were identified; each of the last 35 of them showed a single download over the entire 6-month period. Among these are some well-known cities: Tuapse, Derbent, Mozdok, Nazran, and Pizhma. Table 2 provides information on the top 100 cities. Comparing the data in Table 2 with the data of [9], it can be seen that Moscow is ahead of Kiev by a factor of 3.9 in the number of downloaded articles, although Kiev has more downloaded articles per capita than Moscow (64 versus 60 per thousand people). The first cities in both countries are ahead of the second cities in terms of downloads by approximately the same factor (5.1 and 5.2). The slight difference in the downloading of articles for Moscow and St.
Petersburg as regions (subjects) of the Russian Federation, compared with the downloading for them as cities, is due to the fact that these regions include small cities, such as Lomonosov and Peterhof for the St. Petersburg region (Table 3). In comparison with the Ukrainian situation [9], the Kharkiv region, the third largest Ukrainian region in terms of the number of pirated downloads, is inferior in this indicator, apart from the first two Russian cities, only to the Moscow and Novosibirsk regions and the Republic of Tatarstan.

On the basis of the large array of 28 million downloaded articles from the Sci-Hub resource made public in [1], we identified publications pirated by Russian researchers. These publications were distributed among publishers, as well as cities and regions of Russia. The top three in each category were as follows: Elsevier, Springer-Verlag, and the American Chemical Society; Moscow, St. Petersburg, and Novosibirsk; Moscow and St. Petersburg as subjects of the Russian Federation, and the Moscow region. We plan to continue processing the data by determining the distribution of the selected articles by field of research and by journal. It would be relevant, in our opinion, to select data from Sci-Hub for a current period, for example, from September 1, 2021 to February 28, 2022, in order to obtain exactly a 6-year interval relative to the previous sample. It would then become clear what kind of scientific information Russian researchers need.

Here are a few general thoughts on this phenomenon and its relationship to the open access movement. Paper [12] concluded that, despite the growth of Open Access, illegal access to scientific articles is becoming more widespread. For the 6-month period considered above, the scientists of Madrid, Barcelona, and Valencia downloaded 98143, 78535, and 26634 articles, respectively, while for the whole of 2017 they downloaded 868322, 488101, and 215690 articles [16].
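The comparison of these six-month and annual counts can be checked with a short calculation. The sketch below assumes that a six-month count is put on an annual basis by simple doubling, which the source does not state explicitly; the dictionary and function names are illustrative. Under this assumption the growth factors come out near 4.4, 3.1, and 4.0, whereas the ratio for Valencia without doubling is about 8.1.

```python
# Six-month counts (September 2015 to February 2016) and full-year 2017 counts quoted above.
six_month = {"Madrid": 98143, "Barcelona": 78535, "Valencia": 26634}
year_2017 = {"Madrid": 868322, "Barcelona": 488101, "Valencia": 215690}


def annualized_growth(half_year, full_year):
    """Ratio of a full-year count to twice a six-month count (doubling is an assumption)."""
    return full_year / (2 * half_year)


# Growth factor on an annual basis, and the raw ratio without doubling, for each city.
growth = {city: annualized_growth(six_month[city], year_2017[city]) for city in six_month}
raw = {city: year_2017[city] / six_month[city] for city in six_month}

for city in six_month:
    print(f"{city}: annualized x{growth[city]:.1f}, raw x{raw[city]:.1f}")
```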
Thus, in annualized terms, pirate downloads in these cities had grown only a year later by factors of 4.4, 3.1, and 4.0. The same is occurring all over the world. Enthusiasts of the Open Access movement worked hard towards their goal, and 11-12 years after the launch of this movement, a single, but even greater, enthusiast instantly opened almost 100% access to scientific publications. This access can be called the Black Open Access Revolution. A young student with communist views brought all commercial publishers to their knees and caught government officials around the world by surprise. Neither their lawsuits nor government bans have any effect here. Publishers have not yet felt any losses, since the illegal content is obtained by those who could get it legally, as well as by scientists from underdeveloped countries whose scientific organizations do not have the money to access it. But they will soon feel them when scientific libraries begin to cancel subscriptions, which will become unnecessary. This will serve the legal Open Access movement well, because it will accelerate the transition of commercial publishers of subscription journals to the open access model; otherwise they will go bankrupt. When this happens, the Sci-Hub pirate project will die out by itself, as Alexandra Elbakyan herself wrote.

Data from: Who's downloading pirated papers?
In survey, most give thumbs-up to pirated papers
Meet the Robin Hood of science, Big Think
Sci-Hub: The academic cat is out the bag, Plum Anal
Bibliogifts in LibGen? Study of a text sharing platform driven by biblioleaks and crowdsourcing
Fast and furious (at publishers): The motivations behind crowdsourced research sharing
Sci-Hub provides access to nearly all scholarly literature
Black open access in Ukraine: Analysis of downloading Sci-Hub publications by Ukrainian Internet users
Pirating European studies
Analysis of Sci-Hub downloads of computer science papers
Looking into Pandora's Box: The content of Sci-Hub and its usage
The new and ultimate disruptor? View from the front
The Sci-Hub case: Why it is time to stop favouring the doctrinal approach to law over an empirical one
The Sci-Hub effect: Sci-Hub downloads lead to more article citations
Sci-Hub, a challenge for academic and research libraries, El Prof. de la Inf