title: Accelerating science with human versus alien artificial intelligences
authors: Jamshid Sourati; James Evans
date: 2021-04-12

Data-driven artificial intelligence models fed with published scientific findings have been used to create powerful prediction engines for scientific and technological advance, such as the discovery of novel materials with desired properties and the targeted invention of new therapies and vaccines. These AI approaches typically ignore the distribution of human prediction engines (scientists and inventors) who continuously alter the landscape of discovery and invention. As a result, AI hypotheses are designed to substitute for human experts, failing to complement them for punctuated collective advance. Here we show that incorporating the distribution of human expertise into self-supervised models, by training on inferences cognitively available to experts, dramatically improves AI prediction of future human discoveries and inventions. Including expert awareness in models that propose (a) valuable energy-relevant materials increases the precision of materials predictions by ~100%, (b) repurposing thousands of drugs to treat new diseases increases precision by 43%, and (c) COVID-19 vaccine candidates examined in clinical trials increases precision by 260%. These models succeed by predicting human predictions and the scientists who will make them. By tuning AI to avoid the crowd, however, it generates scientifically promising "alien" hypotheses unlikely to be imagined or pursued without intervention, not only accelerating but punctuating scientific advance. By identifying and correcting for collective human bias, these models also suggest opportunities to improve human prediction by reformulating science education for discovery.

Research across applied science and engineering, from materials discovery to drug and vaccine development, is hampered by enormous design spaces that overwhelm researchers' ability to evaluate the full range of potentially valuable candidate designs by simulation and experiment 7. To face this challenge, researchers have initialized data-driven AI models with published scientific results to create powerful prediction engines. These models are being used to enable discovery of novel materials with desirable properties 2 and targeted construction of new therapies 4. But such efforts typically ignore the distribution of scientists and inventors (human prediction engines) who continuously alter the landscape of discovery and invention. As a result, AI algorithms unwittingly compete with human experts, failing to complement them and augment collective advance. As we demonstrate below, incorporating knowledge of human experts and expertise can improve predictions of future discoveries by more than 100% above AI methods that ignore them. Nevertheless, with tens of millions of active scientists and engineers around the world, is the production of artificial intelligences that mimic human capacity our most strategic or ethical investment? By not mimicking, but rather avoiding, human inferences, we can design "alien" AIs that radically augment rather than replace human capacity.
Identifying the bias of collective human discovery, we demonstrate how human-avoiding or "alien" algorithms broaden the scope of things discovered by identifying hypotheses that scientists and inventors are unlikely to imagine or pursue, yet which carry undiminished signs of scientific and technological promise. Our analysis builds on insights underlying the wisdom of crowds 8, which hinges on the independence and diversity of crowd members' information 9 and approach 10. In scientific crowds, findings established by more distinct methods and researchers are much more likely to replicate 11, 12. This diversity of scientific viewpoints was implicitly drawn upon by Donald Swanson in a heuristic approach to knowledge generation. He hypothesized that if Raynaud's disorder was linked to blood viscosity in one literature, and fish oil was known to decrease that viscosity in another, then fish oil might lessen the symptoms of Raynaud's disorder, a connection unlikely to be drawn by the sparse scientific community positioned to infer it 13-15, and one of several hypotheses later experimentally demonstrated 16-18. Our approach scales and makes this heuristic continuous, combining it with explicit measurement of the distribution of scientific expertise, and drawing upon advances in unsupervised manifold learning 19. Recent efforts to generate scientific hypotheses rely heavily on the scientific literature, but ignore equally available publication metadata. By programmatically incorporating information on the evolving distribution of scientific expertise, our approach balances exploitation and exploration in experimental search, enabling us both (1) to accelerate discoveries predicted to appear in the future and (2) to punctuate advance by identifying promising experiments unlikely to be pursued without intervention. The distribution of research experts across topics and time represents a critical social fact that stably improves our inference about whether surrounding facts have been tried and abandoned (and should be treated as negative knowledge) or remain available for profitable hypothesis generation 20. First, we do this alongside a precise replication of a recent analysis in Nature that predicted materials having desirable electrochemical properties from prior literature encoded with unsupervised neural network methods 1, but ignorant of the distribution of human expertise. We show that by simply adding information about the location of scientists and their likely inferences, using a formally identical approach, we dramatically (~100%) improve predictions of future materials. Next, we extend this approach to identify a much broader matrix of materials and their functional properties 21, including drugs and vaccines. Finally, we use expert awareness to identify and validate the scientific and technological promise of research avenues unlikely to be explored by human experts unaided. Specifically, we model the distribution of inferences cognitively available to scientists by constructing a hypergraph over research publications. A hypergraph is a generalized graph in which an edge connects a set, rather than a pair, of nodes. Our research hypergraph is mixed, containing nodes corresponding not only to materials and properties mentioned in titles or abstracts, but also the researchers who investigate them (Fig. 1b). Random walks over this hypergraph suggest paths of inference cognitively available to active scientists.
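To make the data structure concrete, the following is a minimal Python sketch of such a mixed hypergraph, assuming each paper record carries a unique id, disambiguated author IDs, and the material/property terms ("entities") extracted from its title and abstract; the field names are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def build_hypergraph(papers):
    """Each paper is one hyperedge connecting its author, material, and property nodes."""
    node_to_edges = defaultdict(set)  # node -> ids of papers (hyperedges) containing it
    edge_to_nodes = {}                # paper id -> list of member nodes
    for paper in papers:
        members = ([("author", a) for a in paper["authors"]] +
                   [("entity", e) for e in paper["entities"]])
        edge_to_nodes[paper["id"]] = members
        for node in members:
            node_to_edges[node].add(paper["id"])
    return node_to_edges, edge_to_nodes
```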
If a valuable material property (e.g., ferroelectricity, a reversible electric polarization useful in sensors) is investigated by a scientist who, in prior research, worked with lead titanate (PbTiO3, a ferroelectric material), that scientist is more likely to consider whether lead titanate is ferroelectric than a scientist without that research experience. If that scientist coauthors with another who has previously worked with sodium nitrite (NaNO2, also a ferroelectric material), that scientist is more likely, through conversation, to consider that sodium nitrite may have the property than a scientist without the personal connection. The density of the distribution of random walks over this research hypergraph will be proportional to the density of cognitively plausible inferences. If two literatures share no scientists, a random walk over our hypergraph will rarely bridge them, just as a scientist will rarely consider connecting a property valued in one with a material understood in another (Fig. 1a). Our model (1) initiates a random walk over the research hypergraph with a valued property (e.g., ferroelectricity), then (2) randomly selects an article (hyperedge) with that property, then (3) randomly selects a material or author from that article, then (4) randomly selects another article with that material or author, and so on, following a Markov process 22, 23. Such a random walk induces similarity metrics that capture the relevance of nodes to one another. The first metric we use draws upon the local hypergraph structure to estimate the probability that a random walker travels from one node to another in a fixed number of steps (see Supplementary Information). Our second metric is based on a popular, unsupervised neural network-based embedding algorithm (deepwalk 24) over the generated random walks. This method is formally identical to the word embedding method used in the replicated prior work that ignores the distribution of scientists 1, but we apply it to our hypergraph, considering every random walk sequence as a "sentence" linking materials, experts and functional properties (e.g., store energy, cure breast cancer, vaccinate against COVID-19). The resulting embedding maps every node to a numerical vector, with the dot product between any pair reflecting the relatedness of the corresponding nodes. We also created a comparable embedding space using deeper graph convolutional neural networks, which did not change the pattern of results presented here (see Methods and Supplementary Information).
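Under the same illustrative assumptions as the sketch above, the shared machinery of these two metrics can be sketched as follows: truncated random walks that alternate between papers (hyperedges) and their member nodes, with each walk then treated as a "sentence" for a skip-gram Word2Vec model. Uniform node sampling is used here for clarity (the biased author/entity sampling is described in Methods), and the hyperparameters mirror those reported in Methods; this is a sketch, not the authors' pipeline.

```python
import random
from gensim.models import Word2Vec

def random_walk(start, node_to_edges, edge_to_nodes, length=20):
    walk, node = [start], start
    for _ in range(length - 1):
        papers = node_to_edges.get(node)
        if not papers:                        # dead-end node: stop the walk early
            break
        paper = random.choice(tuple(papers))  # an article containing the current node
        node = random.choice(edge_to_nodes[paper])  # a material, property, or author in it
        walk.append(node)
    return walk

def embed_property(property_node, node_to_edges, edge_to_nodes, n_walks=250_000):
    walks = [random_walk(property_node, node_to_edges, edge_to_nodes)
             for _ in range(n_walks)]
    # For discovery prediction, author tokens are dropped from the "sentences".
    sentences = [[name for kind, name in walk if kind == "entity"] for walk in walks]
    return Word2Vec(sentences, vector_size=200, window=8, sg=1, negative=15,
                    alpha=0.01, min_alpha=0.001, epochs=5, min_count=1)
```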
Accelerating science by predicting future discoveries

Pairwise relevances estimated across our mixed hypergraph reveal distinct phenomena. The relevance of a material to a scientist measures the likelihood that she is or will become familiar with that concept through research experience, related reading, or conversation. The co-relevance of materials suggests that they may be substitutes or complements within the same experiment. The relevance of a material to a property suggests not only the likelihood that the material possesses the property, but also that a scientist will discover and publish it (Extended Data, Fig. 1a, 1b). In this way, our hypergraph-induced similarities incorporate physical and material properties latent within the literature alongside the complementary distribution of scientists, enabling us to anticipate likely inferences and predict upcoming discoveries.

We assessed the pool of materials available to scientists in the literature published prior to the prediction year, ranked materials in terms of their discovery likelihood based on transition probabilities and unsupervised embeddings, then compared those rankings with actual first-time published linkages between materials and properties in published research (see Methods for further details). To demonstrate the power of accounting for human experts, we considered the valuable electrochemical properties of thermoelectricity, ferroelectricity and photovoltaic capacity against a pool of 100K candidate compounds, contrasting our predictions with replicated prior work that did not account for human expertise 1. We repeated identical analyses for 17 prediction periods, with prediction years ranging from 2001 to 2017, predicting future discoveries as a function of research publicly available to contemporary scientists. We computed annual precisions until the end of 2018, such that the longest precision array spanned nearly two decades (18 years, from 2001 to 2018) and the shortest spanned 2 (2017-2018; Extended Data, Fig. 1c). Replicating the evaluation method of Tshitoyan et al. on the same dataset (1.5M articles about inorganic materials) 1, predictions that account for the distribution of materials scientists outperformed baselines for all properties and materials by an average of 100% (Fig. 2b-d).

Drug repurposing. We used the same approach to explore the repurposing of ~4K existing FDA-approved drugs to treat 100 critical human diseases. We used the MEDLINE database of biomedical research publications and set the prediction year to 2001. Ground-truth discoveries were based on drug-disease associations established by expert curators of the Comparative Toxicogenomics Database (CTD) 25, which chronicles the capacity of chemicals to influence human health. Figure 1a reports prediction precisions 19 years after the prediction year, revealing how accounting for the distribution of biomedical experts in our unsupervised hypergraph embedding yields predictions with 43% higher precision than identical models accounting for article content alone. Moreover, we found a strong correlation between prediction precision and drug occurrence frequency in the literature (r=0.74, p<0.001), implying that our predictors work best for diseases whose relevant drugs are mentioned frequently in prior research. Finally, we considered therapies and vaccines to treat or prevent SARS-CoV-2 infection. Here the prediction year was set to 2020, when the global search for relevant drugs and vaccines began in earnest. Following Gysi et al. 26, we considered a therapy relevant to COVID-19 if it amassed sufficient evidence to merit a COVID-related clinical trial, as reported by ClinicalTrials.gov. Results shown in Figure 1e indicate that 36% and 38% of the predictions made by the deepwalk-based and transition probability metrics entered trials within 12 months of the date of prediction, respectively, 260 to 280% higher than the precision of discovery candidates generated by semantic content alone (10%). These precisions were even higher than those of a predictive model based on an ensemble of deep and shallow learning predictors trained on multiply measured protein interactions between COVID-19 and the pool of 3,948 relevant compounds from DrugBank 26, information to which our model was blind (see Extended Data, Fig. 2 for alternative measurement).
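The evaluation protocol shared by all of these experiments amounts to a precision-at-50 computation, sketched below; ground_truth is assumed to hold the materials first linked to the property after the prediction year, and model is an embedding such as the one sketched earlier (names are illustrative).

```python
def precision_at_k(model, prop, candidates, ground_truth, k=50):
    """Fraction of the k top-ranked candidates later discovered to have the property."""
    ranked = sorted(candidates,  # candidates assumed present in the embedding vocabulary
                    key=lambda m: model.wv.similarity(prop, m), reverse=True)
    return sum(m in ground_truth for m in ranked[:k]) / k
```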
The success of these COVID-19 predictions suggests how fast-paced research on COVID therapies and vaccines increased the relevance of scientists' prior research experiences and relationships for the therapies and vaccines they would come to imagine, evaluate and champion in clinical trials. Consider the clinical trial of the female sex hormone progesterone for treating COVID-19 27. The trial was motivated by factors including the lower global death rate of women than men from COVID-19 and the anti-inflammatory properties of progesterone, which may moderate the immune system's overreaction to COVID-19 in men 28. Random walks in our model frequently traversed the path between the "coronavirus" and "progesterone" literatures to predict clinical study of progesterone for coronavirus complications (Extended Data, Fig. 5). Our technique traced a pathway similar to the one articulated by the researchers sponsoring the trial: 75% of trial-cited papers published within the five-year period we considered in building our hypergraph (2015-2019) were identified and used by our prediction model, and 60% of the scientists authoring those studies were sampled in our random walk sequences. Our predictive models use the distribution of discovering experts to successfully improve discovery predictions. This is demonstrated by time to discovery, which is inversely related to the size of the expert population that has studied both property and material in prior research. If we define the expert density between a property and a material as the Jaccard index of experts who mentioned both in recent publications, higher densities suggest the two are cognitively available to more scientists, and that their underlying relationship (if any) is more likely to be investigated earlier. For materials, COVID-19 therapies and vaccines, and a majority of the 100 diseases we considered, correlations between discovery date and expert densities were negative, significant and substantial (Extended Data, Fig. 3), showing that materials considered by experts familiar with a property are discovered sooner. Our predictive models efficiently incorporate these expert densities (Extended Data, Fig. 4). Similar results can be derived from embedding proximities: Fig. 3 (top row) illustrates how our predictions cluster atop density peaks in a joint embedding space of experts and the materials they investigate. These expert-material proximities predict the discoverers most likely to publish discoveries based on their unique research backgrounds and relationships. Computing the probability of transition from properties to experts through a single intermediate material across 17 prediction years (2001 to 2017), we found that 40% of the top 50 ranked potential discoverers became discoverers of thermoelectric and ferroelectric materials one year after prediction, and 20% of the top 50 discovered novel photovoltaics (Fig. 3, bottom; see also Extended Data, Fig. 6).
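As an illustration, the expert-density measure used above reduces to a small Jaccard computation, assuming the inputs are sets of disambiguated author IDs drawn from recent publications mentioning the property and the material, respectively.

```python
def expert_density(property_experts, material_experts):
    """Jaccard index of the two expert sets; 0.0 when both sets are empty."""
    union = property_experts | material_experts
    return len(property_experts & material_experts) / len(union) if union else 0.0
```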
Punctuating science by predicting unlikely discoveries

As illustrated above, by identifying properties and materials cognitively available to human experts, we maximize the precision of predicting published material discoveries. Almost all published discoveries lie in close proximity to desired properties in the hypergraph induced from prior literature (Fig. 4a). By contrast, if we avoid the distribution of human experts, we can produce inhuman, "alien" predictions designed to complement the scientific community. These predictions are cognitively unavailable to human experts based on the organization of scientific fields, prevailing scientific attention, and expert education, but nevertheless manifest strong mechanistic promise for possessing the desired scientific properties (Fig. 4b). Here we propose a generic framework for identifying disruptive discovery candidates expected to possess desired properties, but least likely to be studied by human scientists or discovered in the near future without machine recommendation (Fig. 1a, right). Our framework combines two components: an alien component that measures the degree to which candidate materials lie beyond the scope of human experts' research experiences and relationships, and a second that rules out those predicted to be scientifically irrelevant (Fig. 4c). The components score entities based on human availability and scientific plausibility, respectively. The two scores are then combined with a simple mixing coefficient α. Setting α=0 implies full emphasis on scientific plausibility, blind to the distribution of experts. Decreasing α imitates human experts and increasing α avoids them. At the extremes, α=-1 and α=1 yield algorithms that generate predictions very familiar or very strange to experts, respectively, regardless of their scientific merit. Non-zero positive values of α balance exploitation of relevant materials with exploration of areas unlikely to be considered or examined by human experts. Materials are ranked by their final scores, with the highest-ranked items reported as candidates for disruptive discovery. Human availability can be assessed with any graph distance metric that varies with expert density (e.g., unsupervised neural embeddings, Markov transition probabilities, self-avoiding walks from Schramm-Loewner evolutions). Scientific merit is quantified through theory-driven simulation of material properties. For thermoelectricity, the power factor (PF) represents an important component of the overall thermoelectric figure of merit, zT, calculated using density functional theory for candidate materials as a strong indication of thermoelectricity 29, 30. For COVID-19, proximity between SARS-CoV-2 and candidate compounds in protein-protein interaction networks suggests the likelihood that a material will recognize and engage with the virus 26. If theoretical predictions are unavailable, one may approximate scientific relevance with proximity in unsupervised literature embeddings 1. Fig. 4d shows the results of running our hybrid model with different values of α for thermoelectricity, and Extended Data, Fig. 7 shows the same for COVID-19. In both, we normalize, rescale and linearly compose alienness derived from shortest-path distance, and scientific plausibility from word embedding proximity (see Methods). We reserve the theory-driven indicators based on power factor and protein-protein network proximity to evaluate our predictions, rather than to establish scientific plausibility, as they would in a deployed system. Increasing α from zero to one, candidate materials were less likely to be conceived, discovered, and published, but PF and protein interaction likelihood remained strong for all but the most alien predictions. Intermediate values of α resulted in a balanced trade-off, with strong values of PF and protein interaction even for distant and completely disconnected materials. This demonstrates the capability of our framework for punctuating scientific advance by proposing alien but scientifically promising candidate materials, with only a naive combination and weighting system.
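The following sketch illustrates one such naive combination. The exact composition rule is an assumption: it is one plausible reading consistent with the stated behavior, where α=0 ranks by scientific plausibility alone, α=1 by alienness alone, and α=-1 by familiarity (negated alienness) alone, with both score sets assumed pre-normalized to comparable scales (see the Van der Waerden transform in Methods).

```python
def alien_ranking(alienness, plausibility, alpha, k=50):
    """Rank materials by an alpha-weighted blend of alienness and plausibility."""
    scores = {m: alpha * alienness[m] + (1 - abs(alpha)) * plausibility[m]
              for m in plausibility}  # dicts: material -> normalized score
    return sorted(scores, key=scores.get, reverse=True)[:k]  # disruptive candidates
```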
More sophisticated metrics could be employed by incorporating all available prior scientific knowledge and learning combination metrics through self-supervised, multi-headed graph convolutional neural networks. These models demonstrate the power of incorporating expert awareness into artificial intelligence systems for accelerating and punctuating future discovery. Our models succeed by directly predicting human discoveries and the human experts who will make them, yielding an average of 100% improvement in prediction precision. By tuning these algorithms to avoid the crowd, however, they generate scientifically promising "alien" hypotheses unlikely to be imagined, pursued or published without machine recommendation. By identifying and correcting for collective patterns of human attention, formed by field boundaries and institutionalized education, these models complement the contemporary scientific community. A further class of alien predictions could be tuned to compensate not only for emergent bias, but for universal cognitive constraints, such as limits on the human capacity to conceive of or search through complex combinations (e.g., high-order drug cocktails 31). Disorienting hypotheses from such a system will not be beautiful, but being inconceivable, they break unbroken ground and sidestep the path-dependent "burden of knowledge" whereby scientific institutions require new advances to build upon the old for ratification and support 32, 33. Our approach can also be used to identify individual and collective biases that limit productive exploration, and to suggest opportunities to improve human prediction by reformulating science education for discovery. Insofar as research experiences and relationships condition the questions scientists investigate, education tuned to discovery would conceive of each student as a new experiment, recombining knowledge and opportunity in novel ways. Our investigation underscores the power of incorporating human and social factors to produce artificial intelligence that complements rather than substitutes for human expertise. By making AI hypothesis generation aware of human expertise, it can race with rather than against the scientific community to expand the scope of human imagination and discovery.

Fig. 1. (a) Two possible relations between expert, property and material nodes when there exists a hidden underlying relationship between the property and the material (dashed line) to be discovered. Uncolored circles represent human experts and each colored node indicates a material (colored blue and denoted M) or a desirable property it possesses (colored red and denoted P). Solid lines show existing links between expert and material nodes, and dashed lines represent existing property-material links that have not yet been discovered. The left case, where concepts P and M share a common collection of experts, is likely to be discovered and published in the near future (it is predictable by scientists), whereas the right case is likely to escape scientists' attention because there is no shared community of experts; its pursuit would disrupt the current course of science. (b) Illustration of our mixed coauthorship hypergraph for three papers. Uncolored shapes represent authors and colored shapes represent properties (red) or materials (blue) mentioned in article titles and abstracts. These three papers constitute a hypergraph with three hyperedges, traced by ellipses. (c) Two initial steps of a random walk process on the hypergraph shown in part (b).
Blue and red shapes represent material and property keywords, respectively. Papers (hyperedges) are sampled uniformly whereas, if β is set, nodes are selected such that the probability of sampling an entity is β times the probability of sampling an author. Note that β is the only parameter of this non-uniform sampling (the node sampling distribution can be uniquely determined from β). (d) Four example random walk paths starting from the "coronavirus" property node and ending at progesterone (a chemical under clinical-trial investigation for therapeutic efficacy). Each arrow connecting two nodes indicates a sampling step, where the paper shown atop the receiving node comprises a hyperedge containing that material and the property, author, or material from the prior step.

Fig. 4. The property node is located at the center; each concentric orbit represents a particular (range of) SP-ds, where the last orbit includes materials disconnected from the property (infinite SP-d). The size and color of each arc show the total number of compounds with the corresponding SP-d from the property and their average PF scores, respectively. The further a compound is located from the property, the less cognitively available it is to scientists. (a) shows that human discoveries mainly lie in close proximity to the property node (highly cognitively available). Nevertheless, (b) shows that candidate discoveries in each year are distributed more broadly across the network of scientists. Moreover, there exist materials with strong PF scores in distant orbits, including the last one, which is completely disconnected from the property. (c) indicates that our alien AI algorithm can capture cognitively unavailable hypotheses that are also evaluated to be scientifically plausible (strong PF values), except for very high values of α. (d) Illustration of our general alien AI framework as a weighted combination of human (un)availability and scientific plausibility scores. Combining these scores results in a final ranking of materials from which candidate hypotheses are chosen. The mixing coefficient α (varying in the range [-1,1]) determines how much weight we give to the availability of hypotheses to human scientists. Setting α=0 implies full attention to scientific plausibility with no emphasis on human (un)availability. At the extremes, α=-1 and α=1 set the objective to solely imitating or avoiding the expert distribution, respectively.

Given a specific property and a pool of materials, each discovery prediction experiment consists of computing a set of scores based on the literature prior to the prediction year and selecting the 50 materials with the highest scores. The precision of predictions can then be computed against ground-truth discoveries in subsequent months or years. We collected several corpora of scientific articles and considered relationships between materials and various properties. Forming a mixed hypergraph requires a disambiguated set of authors for all scientific articles. Our testbed consisted of two datasets: a collection of ~1.5M articles published between 1937 and 2018, classified by Tshitoyan et al. (2019) as relating to inorganic materials 1, and the MEDLINE database, which includes more than 28M articles published in various biomedical fields over the span of more than two centuries. We downloaded the former using the Scopus API provided by Elsevier (https://dev.elsevier.com/), which readily assigns unique codes to distinct authors.
To disambiguate authors in the MEDLINE database, we used disambiguation results provided by the PubMed Knowledge Graph (PKG) 34, which were obtained by combining information from the Author-ity disambiguation of PubMed 35 and the more recent Semantic Scholar database 36. For energy-related materials science, we extracted the pool of materials from the collected 1.5M articles using Python Materials Genomics 37 and direct rule-based string processing. A material-property association was considered established if the material co-occurred with any of the property-related keywords. First-time co-occurrences were defined as ground-truth discoveries, following relevant prior work 1. For the case of drug repurposing, we began with a pool of 7,800 approved candidate drugs downloaded from the DrugBank database. We then built our drug pool using approximately 4,000 drugs possessing simple names (e.g., by dropping complex names containing several numerical parts). We chose the 100 diseases from the Comparative Toxicogenomics Database (CTD) 25 that had the largest number of relevant drugs in our drug pool. We searched for names of drugs and diseases in MEDLINE to detect their occurrence within papers and build the hypergraph. Ground-truth relevant drugs for the selected diseases were extracted from the associations curated by CTD. The discovery date for each disease-drug association was set to the earliest publication reported by CTD for the curated or inferred relevance. We ran separate prediction experiments for each disease. The same pool of drugs and corpus of papers were used in the case of COVID-19, where relevance to COVID-19 was identified based on involvement in COVID-related studies reported by ClinicalTrials.gov in or after 2020, regardless of the studies' results. The date of discovery for each relevance was set to the date the corresponding study was first posted; if a drug was involved in multiple trials, we considered the earliest date. A total of 4,899 trials had been posted as of March 3, 2021 (ignoring 32 trials dated before 2020), whose designs included 251 drugs from our pool (~6%). In practice, coauthorships that occurred long before the time of prediction will neither be cognitively available nor perceived as continuingly relevant. Therefore, we restrict our prediction experiments to literature produced in the 5 years prior to the year of prediction. For each property, we took 250,000 non-lazy, truncated random walk sequences starting from the property node and terminating after 20 steps or upon reaching a dead-end node with no further connections. Without constraining the space, the majority of hypergraph nodes would belong to experts: there are more authors on the average article than materials studied within it. We devised a biased random walk algorithm to compensate for this imbalance, controlled by a parameter β, which defines the probability ratio of selecting conceptual (e.g., molecule or material) nodes to author nodes in any given paper. Larger β results in a higher frequency of selecting conceptual nodes, and β=1 implies a balanced mixture of authors and entities (Fig. 1c; also see Methods). Note that deepwalk similarity is much more global than transition probability, given that the length of our walks (~20 steps) is much longer than the number of transition steps considered (2-3), and it is more flexible, as the walker's edge selection probability distribution can easily be modified to explore the network structure more deeply 38.
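A minimal sketch of this β-biased node sampling, consistent with the mixture weights given in the next paragraph: within a sampled paper, the author block is chosen with probability 1/(1+β) and the entity block with probability β/(1+β), after which a member is drawn uniformly within the chosen block.

```python
import random

def sample_node(authors, entities, beta=1.0):
    """Draw one member of a hyperedge (paper) under the beta-biased mixture."""
    if authors and entities:
        # entity block with probability beta/(1+beta); author block otherwise
        pool = entities if random.random() < beta / (1.0 + beta) else authors
    else:
        pool = authors or entities  # fall back to whichever block is non-empty
    return random.choice(pool)
```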
Note that authors heavily outnumbered materials in all our databases. To mitigate this imbalance, we introduced a non-uniform node sampling distribution parameterized by β, defined as the ratio of the probability of sampling a material or property to the probability of sampling an author in any given paper. A random walker with β=1 tends to select roughly equal numbers of authors and materials. In practice, we sampled from a mixture of two uniform distributions with weights 1/((1+β)|A|) and β/((1+β)|M|) assigned to authors in set A and materials/properties in set M, respectively, where |A| denotes the cardinality of set A. Multistep transition probabilities are directly computed from transition matrices using Bayes' rule and Markovian assumptions (Supplementary Information). For the deepwalk representation, we trained a skipgram Word2Vec model with an embedding dimensionality of 200 over the truncated random walk sequences. In the task of discovery prediction, we discarded author nodes from the generated random walk sequences and performed training over property/material tokens only. The training hyperparameters were set equal to those used when training the Word2Vec baseline model, i.e., a window size of 8, a negative sampling size of 15 and a learning rate of 0.01 that linearly decayed to 0.001 during iterations. The only exception was the number of epochs, which was 30 for the baseline and 5 for the network representation. The vocabularies produced from deepwalk sentences contained far fewer tokens than the baselines; as a result, less training was required to capture inter-node relationships. We also ran our prediction experiments after replacing the deepwalk representation with a graph convolutional neural network. We used the Graph Sample and Aggregate (GraphSAGE) model 39 with hidden and output layer dimensionalities of 400 and 200, respectively, and Rectified Linear Units (ReLU) as the non-linear activation in the network. Convolutional models require feature vectors for all nodes, but our hypergraph is inherently feature-less. Therefore, we utilized the word embeddings obtained by our Word2Vec baseline as feature vectors for material and property nodes. A graph auto-encoder was then built using the GraphSAGE architecture as the encoder and an inner-product decoder, and its parameters were tuned by minimizing the unsupervised link-prediction loss function 40. We took the output of the encoder as the embedded vectors and selected the top 50 discovery candidates by choosing entities with the highest cosine similarities to the property node. In order to evaluate the importance of the distribution of experts for our prediction power, we trained this model on our full hypergraph and also after withdrawing the author nodes (see Supplementary Information). Running the convolutional model on energy-related materials and properties yielded precisions of 62%, 58% and 74% on the full graph, and 48%, 50% and 58% on the author-less graph, for thermoelectricity, ferroelectricity and photovoltaics, respectively. These results show a pattern similar to those obtained from deepwalk, although with a somewhat smaller margin, likely due to the use of Word2Vec-based feature vectors, which limit the domain of exploration by the new embedding model to the proximity of the baseline. Our alien knowledge discovery machine assigns human availability (or alienness) and scientific plausibility scores to each material with respect to a given property, which are then combined with a mixing weight α.
In our AAI experiments, human unavailability was measured through shortest-path distance (SP-d) to the property node, and scientific relevance was quantified by semantic similarities based on word embedding models (e.g., word2vec). The latter often yields continuous scores distributed similarly to a Gaussian variable, but the former offers unbounded ordinal scores. This prevents us from directly combining them through Z-scores. To address this issue, we first transformed the two variables according to the Van der Waerden formulation 41 before taking the weighted average of their Z-scores (see Supplementary Materials for comparison to other combination methods). When evaluating AAI, we leveraged the property's theoretical scores, obtained from theory or prior knowledge in the relevant fields, to assess the scientific validity of candidates. For thermoelectricity, we used the power factor (PF) as a scalar score indicating how likely a material is to be thermoelectric based on theoretical simulations. PF is proportional to the electrical conductivity and is a key ingredient of the more general figure of merit, zT, which also grows with the absolute temperature. Moreover, Tshitoyan et al. showed that materials that have been studied in conjunction with thermoelectricity in the literature tend to have higher PF scores. In our AAI experiments, we restricted the pool of entities to those for which pre-calculated scores existed in the same database that Tshitoyan et al. had used 1, 30, which comprised 30% of the unstudied materials in the corresponding five-year period (1996 to 2000). The theoretical scores we used for evaluating our AAI method in the case of COVID-19 were based on protein-protein interactions of the drugs with the SARS-CoV-2 viral targets. Recently, Gysi et al. showed that existing drugs whose target proteins are within or in the vicinity of the COVID-19 disease module are potentially strong candidates for repurposing 26. They employed 12 network-based strategies, individually and collectively, to identify the most relevant candidate drugs, among which their rank-based combination yielded the best performance. We utilized the inverse of the aggregated ranks from their ensemble strategy as scores to theoretically measure material relevance to COVID-19. These scores were based on prior knowledge of the target proteins associated with drugs and disease, to which our AAI method was blind.
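This normalization-and-combination step can be sketched as follows, assuming the two raw score arrays are aligned over the same candidate pool and using the same assumed α-weighting as the earlier sketch; an illustration with scipy's rank utilities, not the authors' code.

```python
from scipy.stats import norm, rankdata

def van_der_waerden(x):
    """Rank-transform raw scores to standard normal quantiles."""
    ranks = rankdata(x)                      # average ranks for ties
    return norm.ppf(ranks / (len(x) + 1.0))  # quantiles strictly inside (0, 1)

def aai_scores(sp_distance, semantic_similarity, alpha):
    alien_z = van_der_waerden(sp_distance)           # larger SP-d => more alien
    plausible_z = van_der_waerden(semantic_similarity)
    return alpha * alien_z + (1 - abs(alpha)) * plausible_z
```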
Extended Data, Fig. 1. Sanity checks on our hypergraph-induced transition probability similarity metric. (a) Between an author and a conceptual node: histogram of the similarities between the nodes of two sets of authors and the node associated with the term "coronavirus". The two sets of authors are defined as the authors of 5,000 randomly selected papers from the journals Nature Medicine (red) and Applied Optics (turquoise) published between 1990 and 2019. We computed similarities between the hypernodes as the logarithm of the average transition probabilities with one and two random walk steps. The histograms are plotted considering only non-zero transition probabilities: 92% of the Nature Medicine authors (28,396 in total) and 51% of the selected Applied Optics authors (18,530 in total) had non-zero similarity values. Also, the average non-zero similarity of Nature Medicine authors (red dashed line) is almost 5 times larger than that of Applied Optics authors (turquoise dashed line), implying that, by the hypergraph-induced similarity metric, authors publishing in Nature Medicine write papers more relevant to coronavirus than those publishing in Applied Optics. (b) Between two conceptual nodes: similarities between several conceptual keywords shown on the x-axis and the node corresponding to "coronavirus". Similarities between the hypernodes are computed as the average transition probabilities with one and two intermediate nodes. The terms and symptoms known to be more relevant to coronavirus have larger average transition probabilities.

Extended Data, Fig. 3. Spearman correlation coefficients of expert density (Jaccard index) between individual properties and the actual discoveries, against date of discovery. Negative correlations imply that entities with higher expert densities are likely to be discovered earlier than others. These results were obtained for discoveries after 2001 for the energy-related properties and drug repurposing applications, and after 2020 for COVID-19. Turquoise bars represent correlations with statistical significance (p-value<0.05), while red bars had larger p-values, indicating nonsignificant results. Moreover, for seven diseases in the CTD database, all actual drug repurposings (i.e., actual discoveries) occurred in a single year (we did not have reliable access to the month or day of discoveries from this database), and hence no correlation coefficients could be computed for them. The results indicate that the materials science properties and COVID-19 showed strong negative correlations. In the case of the CTD database, 67 out of 100 diseases (i.e., properties) showed statistically significant correlations, among which only one disease showed a positive coefficient. The average correlation coefficient across these 67 diseases was -0.18.

Extended Data, Fig. 4. Distribution of expert densities between predicted discoveries and the corresponding properties: (a) drug repurposing application (considering only the 67 diseases with statistically significant Spearman correlation coefficients; see Extended Data, Fig. 3); (b-d) energy-related materials science properties, i.e., thermoelectricity, ferroelectricity and photovoltaic capacity, respectively; and (e) therapies and vaccines for COVID-19. Curves show normalized densities over the logarithm of Jaccard indices, plotted by fitting a Beta distribution over the expert densities of the 50 predictions. Solid and dashed lines represent the mean values of the corresponding densities. The distributions of expert densities for the hypergraph-induced metrics (transition probability and deepwalk-based similarity) are concentrated around larger Jaccard index values than those of word embedding models tracing content alone. In content models, all estimated densities peak at zero.