key: cord-0967638-iuocntyz authors: Ghouchan Nezhad Noor Nia, Raheleh; Jalali, Mehrdad; Mail, Matthias; Ivanisenko, Yulia; Kübel, Christian title: Machine Learning Approach to Community Detection in a High-Entropy Alloy Interaction Network date: 2022-04-04 journal: ACS Omega DOI: 10.1021/acsomega.2c00317 sha: a2caebe34703d7fbc7f35710c24aa4d8fdc6bf5e doc_id: 967638 cord_uid: iuocntyz There is a growing trend toward the use of interaction network methods and algorithms, including community-based detection methods, in various fields of science. The approach is already used in many applications, for example, in social sciences and health informatics to analyze behavioral patterns during the COVID-19 pandemic, protein–protein networks in biological sciences, agricultural science, economy, and so forth. This paper attempts to build interaction networks based on high-entropy alloy (HEA) descriptors in order to discover HEA communities with similar functionality. In addition, these communities could be leveraged to discover new alloys not yet included in the data set without any experimental laboratory effort. This research has been carried out using two community detection algorithms, the Louvain algorithm and the enhanced particle swarm optimization (PSO) algorithm. The data set, which is used in this paper, includes 90 HEAs and 6 descriptors. The results reveal 13 alloy communities, and the accuracy of the results is validated by the modularity. The experimental results show that the method with the PSO-based community detection algorithm can achieve alloy communities with an average accuracy improvement of 0.26 compared to the Louvain algorithm. Furthermore, some characteristics of HEAs, for example, their phase composition, could be predicted by the extracted communities. Also, the HEA phase composition has been predicted by the proposed method and achieved about 93% precision. 
Since the ancient eras, human civilization has attempted to discover new and unknown materials, for example, metals and alloys that can all play a key role in the overall quality of human life. Since the Bronze Age, alloys have been produced based on a "basic element" pattern containing one principal element. Various elements are added to the basic element to improve selected properties. 1 Over the past decades, a new approach has been introduced to design alloys, which involves mixing typically five elements or more in equimolar amounts to produce balanced alloys called high-entropy alloys (HEAs). 1 These were initially introduced and developed by Cantor et al. 2 and Yeh et al. 3 The entropy of mixing for these complex alloys is high. The atoms used to create the alloys have a similar size. 6 HEAs have been widely investigated due to their attractive properties, for example, thermal and electrical conductivity, 4 high corrosion resistance, 5 and high strength in combination with high ductility. A parametric method is commonly used to understand and predict the phase stability, often used in pairs presented by a two-dimensional diagram. 17 Although there are many parametric and statistical methods in the field of materials science, machine learning (ML) is considered one of the most effective methods in materials science. 7 ML algorithms are capable of learning models to explore communities and provide results effectively. The purpose of the present study is to introduce a new model for the HEA interaction network, which is made based on HEA descriptors. This model measures similarities among HEA descriptors by creating a network of interactions based on similarities. Communities are extracted from the interaction network so that each community comprises similar HEA compounds. These compositions are available for interpretation. 
The outcomes of this paper are communities that can help anticipate HEA phases and detect HEA functionalities through the ability to better analyze them. With that, it might be possible to suggest more efficient alloys for selected applications. The HEAs containing at least five elements with equal or similar atomic percentages have high strength. 32 These alloys are different from conventional alloys due to four main effects, which include the (1) high-entropy effect, (2) sluggish diffusion effect, (3) lattice distortion effect, and (4) cocktail effect. 32 These effects contribute to the ultimate strength or hardness of HEAs. 32 In a study carried out by Ye et al., 1 the phase formation of HEAs and their new properties are discussed, such as strength, mechanical performance at high and cryogenic temperatures, ductility, hardness, magnetism, and electrical conductivity. By using ML in HEAs, the design of alloys can be facilitated and used to discover new compounds with desirable properties. Dai et al. 8 introduced a method that creates a low dimensional descriptor for predicting the phase content of HEAs based on their composition. The process behind their method has several stages: first a coefficient analysis is used to select closely related highly relevant descriptors. It then increases dimensions, which is based on the main structure of primary functions and finally, it is important to select descriptors to explain the material. 8 The main focus is on predicting the alloy phase composition. Using a Pearson correlation, the highly correlated descriptors are removed. They created a new nonlinear descriptor, which analyzes the relationship between descriptors to eliminate additional descriptors. 8 Based on the study performed, the authors proposed a framework to collect HEAs data for interpretation. 27 The features are removed which are causing weak phase predictions. 
27 Various ML classifiers are used to predict the HEA phases, but no HEA interaction network is created to extract similar compounds as a community. 27 In a study performed by Kaufmann and Vecchio, 21 an ML method is applied to HEAs to predict their solid-solution-forming ability. Thermodynamic and chemical features are used to make predictions with a random forest model. 21 The HEA descriptors are not used to compute a hybrid similarity or to construct an HEA interaction network. 21 In another study, the authors presented a framework that uses a genetic algorithm to choose an effective ML model and features for HEA phase prediction. 22 The proposed technique does not consider content and structural similarity to build an HEA interaction network. 22 Similar HEA compositions are not detected as a cluster. 22 To the best of our knowledge, a full interaction network of HEAs has not yet been created and analyzed with social network tools. One of the social network analysis tools is community detection. Communities are groups of compounds that can be used to improve functionality and to discover new compounds and new descriptors. Most studies address phase detection of alloys by ML methods, which predict the phase composition of a compound; HEA communities can be extracted to predict phases more accurately. In addition, each community consists of HEAs with similar descriptors and behavior. In a study carried out by M'Barek et al., 26 biological interaction networks, such as those of genes or proteins, are explored. The communities extracted from these biological networks are sets of proteins or genes that collaborate on a similar cellular functionality. Using a genetic algorithm, they presented a specific fitness function based on the amount of similarity and interaction between genes. 26 They used the semantic similarity in a KEGG data set, which they took as a score for the structure of communities. 
It is calculated according to the semantic similarity method based on the Gene Ontology. 26 In another study, the authors presented a modified DeepWalk algorithm that predicts links in a protein–protein interaction network. 31 The feature dimensions are reduced to integrate the network structure and features for link prediction. 31 An HEA network built from descriptors and used to detect communities of similar nodes is not considered. 31 In another study, the authors proposed a new tool named MOFSocialNet based on creating social networks using a metal−organic framework (MOF) database. MOFSocialNet is able to guide MOF researchers through the vast chemical space of existing and hypothetical MOFs. For a demonstration, they used social network analysis to identify the most representative MOFs in this research data set and to detect MOF communities. 33 In another study carried out by Ahajjam et al., 25 a scalable and deterministic approach, called the community leader recognition approach, is proposed to identify communities using leader nodes. Their approach has two main steps: the first step is to retrieve the leaders, and the second step is to identify the communities using the similarities between the nodes. Two important issues in their work are community recognition and leader detection in complex networks. 25 The leader nodes of the network are responsible for disseminating influence, and the communities around each leader are then formed using the similarities between the nodes. In social networks, the central nodes are responsible for spreading influence. The advantage of this method is that no prior knowledge of the number of leaders and communities is needed. They start by finding the leaders, which are the most influential nodes, and then extract the communities. For each leader, a community is obtained by calculating the similarity between the nodes. 
25 They distinguish communities based on the similarities of the nodes with the leader, who are all in the leader's neighborhood. They used real social network data sets and used the Jaccard, Salton, Human Development Index (HDI) and Human Poverty Index (HPI) to calculate the similarity for finding out which works best in finding the leader of their method. 25 In a study carried out by Zhao et al., 28 a community detection algorithm based on graph compression is introduced, which is effective in large networks. The compressed graph is first obtained by repeatedly merging nodes of degree one or two with their bigger degree neighbors. Then, two indexes, namely, the density and quality of nodes, are defined to evaluate the probability of nodes as the seed of a community. With these two criteria in a compressed social network, the number of communities and the initial members of the related community are determined. 28 They use the real social network data set to evaluate their method. There is no similarity computation between nodes to create a HEA interaction network. In a study conducted by Rostami et al., 30 a community detection based on a genetic algorithm is presented for feature selection, which has three steps. It first calculates the similarities of the features, then the features are classified into clusters by community detection algorithms, and finally the features are selected by a genetic algorithm with repair operation based on a new community. A community detection method is used in their approach to divide features into different groups. Using Pearson correlation similarity, the similarity of the features is examined. 30 Clustering is performed on features, and a threshold is set for determining the number of features in each cluster to reach that number using random repair or score repair. 
Their proposed method selects the optimal number of features, which is determined automatically from the overall feature structure and the internal similarities of the features. 30 They achieved accurate results in community detection with the genetic algorithm. 30 In a study conducted by Ozaki et al., 20 a pruning method was added to the Louvain algorithm to reduce computational time while maintaining the quality of the community detection. This simplifies the process because the quality of the clusters is not recalculated for all nodes at each phase; instead, the calculations are done only for the nodes that are used in the next phase of the community detection. Ozaki et al. 20 applied the Louvain algorithm to networks in which the similarity among nodes is calculated using cosine similarity. 20
The novelty of this paper lies in establishing an interaction network for HEAs, which has not been done in the field of alloy metallurgy so far. In addition, an interaction network analysis method is used to analyze the HEAs. This method uses ML algorithms for alloy community detection, namely the Louvain algorithm and the particle swarm optimization algorithm.
In this section, we present our approach for community detection based on interaction networks of HEAs, using the concepts of the Louvain and modified particle swarm optimization (PSO) algorithms. The community members are HEAs, which form the nodes of the network; they are merged into communities by the Louvain method and by the modified PSO in order to find the best number of communities. Our method solves the community detection problem by maximizing an objective function called modularity. Initially, descriptors for the HEAs are selected and preprocessing is carried out. Communities are then extracted by the Louvain and modified PSO algorithms. Our approach is based on the following five steps:
• Data set was preprocessed to perform ML algorithms. 
• Descriptor content similarity was calculated. • Interaction network of HEAs was created. • Descriptor structural similarity was calculated. • Communities that maximize the objective function were extracted The data set used in this paper contains 90 HEA alloys, as listed in Table 1 in ref 1, each of which is characterized by six descriptors (Appendix A). 1 δ is the atomic size difference. 9 ΔH mix (in kJ/mol) is the mixing enthalpy, calculated using eq 1: 1 S c (in k B /atom, where k B is Boltzmann's constant) is the configurational entropy of mixing for an ideal solid solution. 1 φ is a single dimensionless thermodynamic parameter for designing HEAs. 6 ε RMS is the root-mean-square residual strain, usually measured through the energy storage density of the elastic pressure. 10 VEC is considered an important parameter in the selection of the valence electron concentration of the alloys due to the lack of robust atomic size difference. 1 It should be noted that the data class label can also be called phase, which is not considered in these calculations, and the most important challenge in the current article is to find the relationship among similar alloy types. The six descriptors and a portion of the data set are listed in Table A1 in Appendix A, the first column of which contains the number for each chemical composition of the HEA alloy used in the results in Section 5. The second column in Table A1 gives the HEA chemical composition, and the other six columns show the values of the six descriptors for each composition. Algorithm 1 shows the pseudocode of the proposed method that detects communities using Louvain and PSO algorithms: The flowchart of the proposed method is shown in Figure 1 . There are three stages in the proposed method. The first stage is preparing the data, which consists of three steps including HEA feature selection, feature vector creation, and normalization. 
The second stage is creating an HEA interaction network by using similarity and graph pruning methods. The third stage is to apply an ML algorithm that extracts communities from the network. Finally, the modularity is measured, which shows the quality of the communities.
3.1. Data Normalization. Normalization is used when the data values are not in the same range; it prevents descriptors with large values from dominating the overall performance of the system. Additionally, normalization can minimize the impact of out-of-range scales and keeps all inputs in a single interval. In the present article, min−max normalization was used to map the property values to the interval [0, 1] using eq 2: 13
$$v' = \frac{v - \min_A}{\max_A - \min_A} \tag{2}$$
where $\min_A$ and $\max_A$ are the minimum and maximum values of property A in the data set, and $v$ and $v'$ are the original and normalized values of the property, respectively. With eq 2, the maximum and minimum of the normalized values are 1 and 0. 13
3.2. Content Cosine Similarity Criteria. Content cosine similarity is the cosine of the angle between two vectors and indicates to what extent the vectors are codirectional. 11 As shown in the data set in Appendix A, each property of a composition can be compared with the same property of another compound. 1 Equation 3 gives the content cosine similarity: 11
$$\mathrm{sim}(x, y) = \frac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2}\,\sqrt{\sum_{i} y_i^2}} \tag{3}$$
where $x_i$ is the ith property of the first compound and $y_i$ is the ith property of the second compound.
3.3. Structural Jaccard Similarity Criteria. The Jaccard index is mainly used to compare the structural similarity of data sets. 12 The Jaccard similarity coefficient between two sets is obtained by dividing the number of properties common to the two sets by the total number of properties in the two sets. 
12 Because the Jaccard criterion requires the graph structure of the interaction network as input, the matrix obtained by content cosine similarity must first be examined with different thresholds to find an appropriate value and create the graph of the network, so that structural similarities can be measured on this graph. The threshold applied to the content cosine similarity to obtain the graph was set at 0.98 in the current study. The structural Jaccard similarity is given in eq 4: 12
$$\mathrm{Jaccard}(v_i, v_j) = \frac{|N_i \cap N_j|}{|N_i \cup N_j|} \tag{4}$$
where $v_i$ and $v_j$ are two nodes representing HEA compounds, $|N_i \cap N_j|$ is the number of common properties of the two compounds $v_i$ and $v_j$, and $|N_i \cup N_j|$ is the number of all properties of $v_i$ and $v_j$. Note that this criterion can be applied to all common pairs of attributes.
3.4. α Coefficient. The calculation of the content and structural similarities results in two matrices of similarity values. To detect communities, a single hybrid similarity matrix that combines both is needed as input. The α coefficient determines the relative effect of the content similarity and the structural Jaccard similarity in this combination. The output of this phase is the hybrid similarity matrix required by the community detection algorithm.
Each community in the interaction network contains alloys that depend on each other and perform the same functionality. A complex network can be mapped to a graph G(V,E), where V is the node set and E is the edge set. A network C(v,e) is said to be a subnetwork if v is a subset of V and e is a subset of E. 
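Sections 3.1 to 3.4 can be read as a small pipeline: normalize the descriptors, compute the content cosine similarity, threshold it into a graph, compute the structural Jaccard similarity on that graph, and blend the two matrices. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, the neighbor sets N_i are taken to exclude the node itself, and the blend α·content + (1 − α)·structural is an assumed form, since the text only states that α determines the effect of each similarity.

```python
import math


def min_max_normalize(rows):
    """Scale every descriptor column of `rows` to the interval [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        # A constant column carries no information; map it to 0.0.
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def cosine(x, y):
    """Content cosine similarity between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def neighbor_sets(sim, threshold):
    """Neighbors of each node in the graph kept after thresholding `sim`."""
    n = len(sim)
    return [
        {j for j in range(n) if j != i and sim[i][j] >= threshold}
        for i in range(n)
    ]


def jaccard(n_i, n_j):
    """Structural Jaccard similarity of two neighbor sets."""
    union = n_i | n_j
    return len(n_i & n_j) / len(union) if union else 0.0


def hybrid_similarity(rows, alpha, threshold):
    """Assumed blend: alpha * content + (1 - alpha) * structural."""
    feats = min_max_normalize(rows)
    n = len(feats)
    content = [[cosine(feats[i], feats[j]) for j in range(n)] for i in range(n)]
    nbrs = neighbor_sets(content, threshold)
    structural = [[jaccard(nbrs[i], nbrs[j]) for j in range(n)] for i in range(n)]
    return [
        [alpha * content[i][j] + (1 - alpha) * structural[i][j] for j in range(n)]
        for i in range(n)
    ]
```

With the 90 × 6 descriptor table as `rows`, a call such as `hybrid_similarity(rows, alpha=0.9, threshold=0.98)` would use the α and cosine threshold values quoted in the text; whether the authors combine the matrices in exactly this way is not stated.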
Let A be the adjacency matrix; two nodes are adjacent if they have an edge between them. If there exists a link between vertex i and vertex j, then A ij = 1; otherwise A ij = 0. A weighted network has weight w joined to the edges, where w is a real number. Communities in networks are the groups of nodes, which are more profoundly connected to each other than to the rest of the nodes within the network. Community detection is the key characteristic, which may well be utilized to extract valuable information from networks. 29 4.2. Louvain Algorithm. The content of science studies can usually be represented as complex networks, in which the topology of interconnected vertices is obtained from either an organized or random compound. 14 The Louvain algorithm is a metaheuristic method that is introduced to identify and detect communities and groups within the provided graph. In addition, each extracted group represents a community, and this type of algorithm is considered to be an ascending clustering method. 15 Furthermore, a parameter called modularity is used to determine the quality of the obtained communities in this algorithm, and the maximization of this particular parameter is considered to be of great importance. This specific parameter selects the type of communities that are integrated with the target vertex and creates highly modular communities. 15 Despite difficulties in the calculation of modularity in large graphs, the Louvain algorithm can overcome this issue by speeding up the processing of large graphs. 15 This unique property of the algorithm led to its popularity. 16 It is also essential to add that the Louvain algorithm is considered the fastest and most effective algorithm for community detection that tends to operate tirelessly to achieve maximum modularity. The implemented algorithm is divided into two phases that are alternately repeated. Imagine that the procedure begins through a weighted interaction network with N nodes. 
It first places each node in a separate community, so that there are initially as many communities as nodes in the network. It then examines the neighbors of each node and evaluates the gain in modularity that would result from removing the node from its community and transferring it to the community of a neighbor. The node is placed in the community with the highest positive gain; otherwise, it remains in its current community. This process is repeated for all the nodes of the HEA interaction network until no further improvement is achieved, which completes the first stage. 17 Although the process may be repeated several times for each node, the first stage is completed when a local maximum of the modularity is reached and the modularity remains stable. The order in which the nodes are examined may affect the computation time, which requires further study. The gain in modularity ΔQ obtained by transferring an isolated node i into a community C is calculated via eq 5, 17 where Σ in is the sum of the weights of the links within the community C, Σ tot is the total weight of the links incident to the nodes in C, k i is the total weight of the links incident to node i, k i,in is the sum of the weights of the links from node i to the nodes in C, and m is the total weight of all the links in the network. A similar expression is used to evaluate the change in modularity when node i is removed from its community; the overall change is therefore measured by removing node i from its community and placing it in the neighboring community. 
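To make the modularity gain concrete, the following sketch computes ΔQ for moving an isolated node into a neighboring community and can be checked against the difference of two full modularity evaluations. It is an illustration under stated assumptions rather than the paper's code: the graph is an undirected weighted edge list without self-loops, the node being moved sits alone in its own community, and modularity is taken in the common form Q = Σ_c [l_c/m − (d_c/2m)²], under which the gain reduces to k_i,in/m − Σ_tot·k_i/(2m²). The function names are hypothetical.

```python
def modularity(edges, community_of):
    """Q = sum over communities of l_c/m - (d_c/(2m))^2.

    `edges` is a list of (u, v, weight); `community_of` maps node -> label.
    """
    m = sum(w for _, _, w in edges)
    internal, degree = {}, {}
    for u, v, w in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    return sum(
        internal.get(c, 0.0) / m - (d / (2 * m)) ** 2 for c, d in degree.items()
    )


def louvain_gain(edges, community_of, node, target):
    """Gain from moving an isolated `node` into community `target`."""
    m = sum(w for _, _, w in edges)
    # k_i: total weight of links incident to the node.
    k_i = sum(w for u, v, w in edges if node in (u, v))
    # k_i,in: weight of links from the node into the target community.
    k_i_in = sum(
        w
        for u, v, w in edges
        if (u == node and community_of[v] == target)
        or (v == node and community_of[u] == target)
    )
    # Sigma_tot: total weight of link ends attached to nodes of the target.
    sigma_tot = sum(
        w for u, v, w in edges for x in (u, v) if community_of[x] == target
    )
    return k_i_in / m - sigma_tot * k_i / (2 * m * m)
```

For a triangle 0-1-2 with a pendant node 3 attached to node 2 (unit weights), moving node 3 into the triangle's community gives the same value from `louvain_gain` as from subtracting the two `modularity` results, which is the property Louvain relies on to avoid recomputing Q from scratch.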
17 The second stage of the algorithm builds a new network whose nodes are the communities found during the first phase. The weight of the link between two new nodes is the total weight of the links between the nodes of the two corresponding communities, and links between nodes of the same community lead to a self-loop for this community in the new network. After the second phase is completed, the first phase of the algorithm can be reapplied to the resulting weighted network, and the combination of these two phases is termed a pass. As a result, the number of meta-communities decreases with each pass, and the highest computational load occurs in the first pass. These passes are repeated until the maximum modularity is reached and no further changes occur. The algorithm can handle highly complex networks and operates hierarchically, so that the final communities are built through an iterative process of aggregation. Moreover, the height of the hierarchy is determined by the number of passes, which is usually small. The algorithm has several advantages: its steps are intuitive and easy to implement, and the outcome is obtained without supervision. In addition, the algorithm is fast and can calculate the modularity gain simply from eq 5; after several passes, the number of communities is reduced through aggregation. Most of the computing time is spent in the first iteration of the first phase. 
17 Also, the qualitative limitations and boundaries of modularity have been eliminated, due to the multilevel nature of the algorithm. Finally, it is also worth mentioning that the isolated nodes are transferred from one community to another, in the first phase of the algorithm. 17 The probability of merging two separate communities through transfer of nodes one-by-one is considered extremely low. However, take note that these communities can very well be merged later, after the consolidation of the nodes is complete. 17 4.3. Community Detection Based on the Particle Swarm Optimization Algorithm. Kennedy and Eberhart initially introduced the PSO algorithm in 1995, which was inspired by the characteristics and behavior of birds. 24 The PSO algorithm is considered to be one of the most important and useful swarm intelligence algorithms, which frequently offers better overall solutions compared to other available algorithms. The mobility found within the particles, which is an array (90 × 1) of nodes, is potentially the best possible way to update each particle for community detection. 19 Optimization of the algorithm may lead to rapid convergence as well as a reduction in the rate of references to the proportionality function, which is directly related to the modularity criterion of community quality. 19 For example, suppose there is a major optimization issue found in the dimensional space of d, where X i = (X i1 , X i2 , ..., X id ) and V i = (V i1 , V i2 , ..., V id ) are the position and velocity vectors, respectively. Let pbest i be the best possible solution for particle i (i = 1, 2, ..., P size ) and gbest be the best possible solution among any type of particle. Furthermore, collaborative and ML of particles are also conducted in each update of pbest i and gbest. 
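The bookkeeping described above (positions X_i, velocities V_i, personal bests pbest_i, and the global best gbest) can be sketched with the textbook continuous PSO update, V ← w·V + c1·rand1·(pbest − X) + c2·rand2·(gbest − X) followed by X ← X + V. This is the standard form with an inertia weight and is shown only for orientation; the discrete, neighbor-based variant used for community detection in this paper differs in how positions are updated. The function name and argument layout are hypothetical.

```python
import random


def pso_step(X, V, pbest, gbest, w, c1, c2, rng=random):
    """One standard PSO iteration over all particles.

    X, V, pbest: lists of d-dimensional lists (one per particle).
    gbest: d-dimensional list shared by the swarm.
    Returns the updated (X, V) without modifying the inputs.
    """
    new_X, new_V = [], []
    for x_i, v_i, p_i in zip(X, V, pbest):
        v_next = [
            # inertia + pull toward personal best + pull toward global best
            w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
            for x, v, p, g in zip(x_i, v_i, p_i, gbest)
        ]
        new_V.append(v_next)
        new_X.append([x + v for x, v in zip(x_i, v_next)])
    return new_X, new_V
```

After each step, a caller would re-evaluate the fitness of every particle and refresh pbest_i and gbest before the next iteration.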
Besides, with each iteration of the PSO algorithm, the current velocity and position of the particles are updated using eqs 6−8, 23 in which t is the iteration number, w is the inertia coefficient, c 1 and c 2 are the learning rates, rand 1 and rand 2 are random numbers generated uniformly in the interval [0, 1], and ρ is a predefined threshold. 23
4.3.1. Optimal PSO Algorithm and Group Learning. Given that independent communities are obtained by sorting the set of HEA compounds in the line graph L(G) of the interaction network, they are optimal communities and smaller than G. To identify independent communities in a network, the independent communities of the corresponding line graph need to be discovered. The developed PSO algorithm is used together with group learning techniques, resulting in LEPSO, which can be used to optimize the results obtained by line graph segmentation. 23
4.3.2. Presentation of Community Detection Using Optimal PSO. The line graph for the chemical composition of alloys is represented as L(G) = ⟨N, E⟩, where N = (n 1 , n 2 , ..., n k ) and k = |N|, and a partition of L(G) can be represented by a particle X i = (X i1 , X i2 , ..., X id ). If X ij = m, the particle X i contains a link e = ⟨n j , n m ⟩ between the two compounds, which means that n j and n m belong to the same community of L(G). To determine the initial communities, each particle must first be considered as an array of alloy compounds. In this regard, the adjacency matrix of the primary interaction network provides the linked nodes of each compound. Potential drawbacks of this design include the random initialization of the particles and the frequent updates of the particle positions. 
Moreover, this issue is often so major that the particle components may potentially display links that have never existed before. To solve such problems, particles are recommended to be presented on a list of regular neighbors. 23 The foundation of this particular design is essentially based on the use of data distribution of the neighbors for each node as a representative of an alloy composition, which potentially ensures that newly entered particles used in the process of transference or initialization are all allowed. However, the complete removal of unauthorized particles as well as the prevention of the production of local optimal communities, with the use of repetitive binary division and automated community detection methods, are all considered some of the potential advantages of this optimization method in PSO. 23 4.3.3. Particle Fitness Function in the Optimal PSO. The comprehensible definition of community can encourage researchers to introduce new and different types of quality indicators to evaluate the possible benefits of a partition. The main assumption behind modularity is that the edge density of a cluster should be higher than the predicted density of the sub-graph, so the nodes can randomly be linked. In order to complete the discretization process of the provided algorithm, each node and its relationship with the other available nodes has to be individually analyzed and checked. Therefore, the link between the initial compound and the other compositions is obtained first, and the adjacency matrix is established subsequently. 
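One way to read the neighbor-list representation described above is that each component of a particle selects one of the node's neighbors, and nodes connected through these selections fall into the same community. The sketch below decodes such a particle with a union-find pass and scores the result with the usual modularity form, Σ_c [l_c/|E| − (d_c/2|E|)²], where d_c is taken as the sum of node degrees in community c. This is an interpretation under stated assumptions, not the LEPSO implementation: the 1-based neighbor index, the data layout, and the function names are all illustrative.

```python
def decode_particle(neighbors, particle):
    """Turn a neighbor-choice particle into one community label per node.

    neighbors[j] is the ordered neighbor list of node j; particle[j] is a
    1-based index into that list (assumed encoding).
    """
    parent = list(range(len(neighbors)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for j, choice in enumerate(particle):
        if neighbors[j]:
            parent[find(j)] = find(neighbors[j][choice - 1])
    return [find(j) for j in range(len(neighbors))]


def fitness(neighbors, labels):
    """Modularity of the labeled partition on an unweighted graph."""
    num_edges = sum(len(nb) for nb in neighbors) / 2
    if num_edges == 0:
        return 0.0
    internal, degree = {}, {}
    for j, nb in enumerate(neighbors):
        c = labels[j]
        degree[c] = degree.get(c, 0) + len(nb)
        # Each internal edge is seen from both endpoints, hence the / 2 below.
        internal[c] = internal.get(c, 0) + sum(1 for k in nb if labels[k] == c)
    return sum(
        internal[c] / 2 / num_edges - (degree[c] / (2 * num_edges)) ** 2
        for c in degree
    )
```

Because every component can only point at an existing neighbor, a particle decoded this way never introduces a link that is absent from the network, which is the property the neighbor-list design is meant to guarantee.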
Finally, the particle fitness function, which determines the quality of the final communities and is also known as modularity, is shown in eq 9: 23
$$\mathrm{fit}(P_i) = \sum_{c=1}^{m} \left[ \frac{l_c}{|E|} - \left( \frac{d_c}{2|E|} \right)^{2} \right] \tag{9}$$
In this equation, fit(P i ) is the fitness value of particle P i , m is the number of communities in the partition C of the network G = ⟨N, E⟩, l c is the number of edges that link the vertices within community c ∈ C, d c is the sum of the degrees of the nodes in c, and |E| is the total number of edges in G.
Particle Speed Update. A particle velocity updating procedure called GbestGenerator is used to avoid local optima; it applies a voting-based clustering technique to take full advantage of valuable hidden community patterns found in less efficient particles and gbest values. If the fitness of gbest does not improve in T max consecutive iterations, meaning that the swarm is trapped, a member particle set MPS is created by selecting the gbest particles of these T max consecutive iterations, and the MPS particles are combined to produce a new gbest particle. Accordingly, each particle has a minimum and a maximum velocity. 23 Equation 6 shows that the inertia coefficient w plays a significant role in the particle velocity update. The adjustment strategy of w is expressed by eq 10, 23 where w max and w min are the initial and final inertia coefficients, respectively, t max is the maximum number of iterations, and t is the current iteration. As can be seen in eq 10, in the initial stage (t = 0), w t equals w max . As t approaches t max , w t gradually decreases toward w min . 
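The behavior described for the inertia coefficient (starting at w max and decreasing toward w min as t approaches t max) matches the widely used linearly decreasing schedule, sketched below. Treat the linear form as an assumption, since other decreasing schedules satisfy the same description; the default values 0.9 and 0.4 are common textbook choices and are not taken from this paper.

```python
def inertia(t, t_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight (assumed schedule).

    Returns w_max at t = 0 and w_min at t = t_max.
    """
    if t_max <= 0:
        raise ValueError("t_max must be positive")
    t = min(max(t, 0), t_max)  # clamp so w stays within [w_min, w_max]
    return w_max - (w_max - w_min) * t / t_max
```

A large early value favors exploration with fast-moving particles, while the smaller late value damps the velocities so the swarm can settle.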
Furthermore, for the algorithm to converge in the early stages, larger coefficient values are needed so that the particles move faster, whereas in later stages, much smaller coefficient values are used to gradually improve the stability of the particles. According to eq 7, each component of the position vector takes the value 0 or 1, which is not well suited to the neighbor-based representation of particles. Accordingly, the previous position of a particle corresponds to its previous community, and its new position corresponds to its final community. Therefore, the value of $X_{ij}$, a component of particle $i$, is an integer in the range from 1 to $\deg(n_j)$, that is, $X_{ij} \in \{1, 2, ..., \deg(n_j)\}$, which improves the PSO and the search ability of the system. The particle position updates are given by eqs 11 and 12, 23 where $k = \mathrm{rand} \times \deg(n_j)$, $k \neq X_{ij}(t)$, $\deg(n_j)$ is the degree of vertex $n_j$, and $\rho$ is a threshold set by the user. Note that the position value is generated from the degree distribution: if the value of node $v_j$ is greater than those of its surrounding neighbors, or if $\mathrm{sig}(V_{ij}(t + 1))$ is greater than $\rho$, the neighbor of the node must be switched to the currently selected neighbor. Therefore, the sigmoid function sig() in eq 11 is modified to solve this issue. The particle position is likely to change as the particle velocity decreases, so the PSO gradually converges toward the global optimum. 23

In view of the novelty of this study, several experiments were carried out to evaluate the efficiency of the proposed method. To build the HEA interaction network, the similarity of alloy features must be addressed.
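One plausible way to quantify the similarity of alloy features is the cosine similarity between descriptor vectors. The sketch below assumes that each alloy is represented by a vector of six descriptors already scaled to comparable ranges. The alloy names and descriptor values are hypothetical, and the measure used in the paper may differ or combine several similarity terms.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def similarity_weights(descriptors):
    """Map each unordered pair of alloys to its similarity weight."""
    names = list(descriptors)
    return {
        (p, q): cosine_similarity(descriptors[p], descriptors[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


# Hypothetical, pre-scaled descriptor vectors (six values per alloy).
descriptors = {
    "alloy_A": [0.2, 0.5, 0.1, 0.9, 0.4, 0.3],
    "alloy_B": [0.3, 0.4, 0.1, 0.8, 0.5, 0.3],
    "alloy_C": [0.9, 0.1, 0.7, 0.1, 0.0, 0.8],
}
weights = similarity_weights(descriptors)
```

Each pair appears once because the network is undirected, so n alloys yield n(n - 1)/2 weighted links before any thresholding.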
The weighted interaction network of the HEAs is shown in Figure 2, where the weight of a link between two compounds gives the degree of similarity between them. This interaction network was initially formed based on the content and structural similarity of the alloy descriptors. All compositions were linked, leading to the formation of a complete graph. The primary interaction network had 90 nodes corresponding to the HEA compounds, and each compound is identified by the number defined in Appendix A. As shown in Figure 2, every compound was linked to every other compound using 3968 edges. The interaction network is an undirected graph. The degree of each node in the HEA interaction network is the number of edges that connect it to other nodes, and the degree distribution is the probability distribution of these degrees over the whole network. The degree distribution of the constructed graph is shown in Figure 3. Based on this diagram, the average degree is 14.20. In the next step, illustrated in Figure 4, an α coefficient of 0.9 and a threshold of 0.6 are applied to the network, which removes the weak connections between less similar nodes. As shown in Figure 4, the resulting interaction network contains 632 edges, which maintain the connections among all 90 nodes. In Figure 4, the size of each node reflects its degree. As presented in Figure 5, the Louvain algorithm applied to the HEA interaction network extracts 13 communities with an overall quality of approximately 0.71. Each of the 13 communities in Figure 5 is indicated by a unique color, and the compounds in every community are fully connected.
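The thresholding step and the degree statistics described above can be sketched as follows. The sketch assumes that similarity weights are already available for every pair of compounds and that an edge is kept when its weight is greater than or equal to the threshold. The role of the α coefficient is not modeled, and the node numbers and weights are hypothetical.

```python
def threshold_network(weights, threshold):
    """Keep only the edges whose similarity weight reaches the threshold."""
    return {pair: w for pair, w in weights.items() if w >= threshold}


def node_degrees(nodes, edges):
    """Number of remaining edges attached to each node."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def average_degree(degree):
    """Mean degree, equal to 2|E| / |N| for an undirected graph."""
    return sum(degree.values()) / len(degree)


# Hypothetical weighted complete graph on four compounds.
nodes = [1, 2, 3, 4]
weights = {
    (1, 2): 0.95, (1, 3): 0.70, (1, 4): 0.20,
    (2, 3): 0.65, (2, 4): 0.55, (3, 4): 0.61,
}
kept = threshold_network(weights, threshold=0.6)
degree = node_degrees(nodes, kept)
mean_degree = average_degree(degree)
```

In this toy case, four of the six links survive the threshold of 0.6 and every node keeps at least one connection. The same identity, average degree = 2|E|/|N|, gives 2 × 632 / 90 ≈ 14.04 for a 90-node network with 632 edges.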
As shown in Figure 6, the communities are also extracted from the HEA interaction network with the optimal PSO algorithm in 100 iterations. The 13 optimal communities found by PSO, displayed in Figure 6, have an enhanced quality of approximately 0.89. Because the compounds of each community are not connected to the compounds of the other communities, the communities obtained from the optimal PSO algorithm have a higher modularity. The analysis of each community shows that the neighbors of every compound have the same phase label and include similar elements. In this paper, modularity is the main criterion for assessing the quality of the communities found by both community detection algorithms. 14 If the number of edges within a community is no greater than that expected in a random graph, the modularity is zero. The maximum modularity is obtained when all the internal nodes of a community are connected and there is no external edge to other communities. 14 One of the basic features of modularity is that it allows communities obtained with different methods to be compared. Because different algorithms do not necessarily extract the same results, many existing criteria cannot assess the quality of communities. Therefore, with Louvain's hierarchical bottom-up method, the trend of the modularity during the division or merging of communities can be investigated. The maximum value of this parameter is considered the best outcome. Moreover, the modularity is a scalar value between −1 and 1 that measures the density of the links inside communities in comparison with the links between communities; 15,16 a modularity between 0.3 and 0.7 indicates a strong community structure. 18 Because the criterion obtained here is close to 1, the communities are of high quality.
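As a worked check on the scale of these values, the modularity definition implies an upper bound that depends only on the number of communities. The derivation below is a sketch that assumes the standard form of modularity, with $l_c$ the number of internal edges and $d_c$ the degree sum of community $c$. It is not taken from the paper.

```latex
\begin{aligned}
Q &= \sum_{c=1}^{m}\left[\frac{l_c}{|E|} - a_c^{2}\right],
  \qquad a_c = \frac{d_c}{2|E|}, \qquad \sum_{c=1}^{m} a_c = 1, \\
\sum_{c=1}^{m} \frac{l_c}{|E|} &\le 1
  \quad \text{(internal edges are a subset of all edges)}, \\
\sum_{c=1}^{m} a_c^{2} &\ge \frac{1}{m}
  \quad \text{(Cauchy--Schwarz inequality)}, \\
Q &\le 1 - \frac{1}{m},
  \qquad m = 13 \;\Rightarrow\; Q \le \frac{12}{13} \approx 0.923 .
\end{aligned}
```

Equality holds only when no edge crosses between communities and all communities have the same degree sum. Both reported values, approximately 0.71 and 0.89, lie below this bound for 13 communities.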
According to the experimental results, the modularity of both algorithms in this paper is higher than 0.7 (Table 1). As an example, four of the communities extracted with the developed Louvain algorithm are shown in Figure 7. For example, in the blue community (Figure 7 [4.4, 9.2], and $S_c$ is 1.61, equal for all alloys in the red cluster. Four communities obtained with the PSO algorithm are illustrated in Figure 8. As an example, the yellow community includes FeCoNiCrCuAl and AlCrCuFeMnNi, the same as the red community (Figure 7) extracted by the Louvain method. Considering the communities in the present article, it can be concluded that these clusters have high quality and accuracy, as shown by the modularity criterion. Table 1 shows the modularity of the communities obtained with the developed Louvain algorithm and the optimal PSO method. As shown in Table 1, the modularity of the developed Louvain algorithm is constant at 0.71. In the optimal PSO, the modularity starts at 0.87, increases to 0.89 after 30 iterations, and no longer changes from 60 to 150 iterations. Therefore, the modularity obtained by the optimal PSO is about 0.89.

Finally, the benefits of HEA community detection are discussed. In biology, the analysis of protein networks is useful because proteins in the same community have common behaviors and properties. By analogy, the purpose of community detection here is to group HEAs with similar elements in their composition and similar phases, such as the alloys of the colored communities (Figure 7). Because HEAs in a community have similar properties, the properties of an HEA can be recognized as soon as its community is determined.
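The idea that an alloy's properties follow from its community can be sketched as a majority vote over the labeled members of that community. The sketch below is an illustration under stated assumptions and not the authors' implementation: the alloy identifiers, the community assignment, and the labels (shown here as phase names) are hypothetical, and the leave-one-out agreement is only one possible way to score such predictions.

```python
from collections import Counter


def predict_by_community(community_of, known_labels, target):
    """Most common known label among the other members of target's community."""
    community = community_of[target]
    votes = Counter(
        label
        for node, label in known_labels.items()
        if node != target and community_of.get(node) == community
    )
    if not votes:
        return None  # no labeled neighbor in the community
    return votes.most_common(1)[0][0]


def leave_one_out_agreement(community_of, labels):
    """Share of labeled nodes whose label matches their community's vote."""
    hits = sum(
        predict_by_community(community_of, labels, node) == labels[node]
        for node in labels
    )
    return hits / len(labels)


# Hypothetical community assignment and labels; "A8" has no known label.
community_of = {"A1": 0, "A2": 0, "A3": 0, "A4": 0,
                "A5": 1, "A6": 1, "A7": 1, "A8": 1}
labels = {"A1": "FCC", "A2": "FCC", "A3": "FCC", "A4": "BCC",
          "A5": "BCC", "A6": "BCC", "A7": "BCC"}

predicted = predict_by_community(community_of, labels, "A8")
agreement = leave_one_out_agreement(community_of, labels)
```

In this toy case, the unlabeled alloy inherits the label shared by its community, and the only disagreement comes from the single alloy whose label differs from the majority of its community.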
For example, the maximal number of elements in an alloy could be predicted from the extracted communities. Another advantage of community detection in HEA interaction networks is phase prediction using ML techniques, which is presented in this paper. Because most extracted communities in the HEA network contain a single phase, the phase composition of a compound whose phase is unknown can be predicted. The phase of an unseen compound is identified from the other compounds in the same community. Table 2 lists the number of phases in each community. Table 2 also shows the precision of phase prediction, which is approximately 88% for Louvain and approximately 93% for PSO.

In this study, we used MATLAB version R2019a (https://www.mathworks.com/). As described and referenced in Section 3, the data set and source code for this paper are available on GitHub. The data set is located at https://github.com/rghoochannejad/HEAs-Community-Detection/tree/Dataset and the source code at https://github.com/rghoochannejad/HEAs-Community-Detection/tree/main. The data set for this article can be found online at https://doi.org/10.1016/j.mattod.2015.11.026.

The present study aimed to present a novel ML-based approach to community detection that identifies HEA compounds that behave similarly to each other. First, the descriptors of each compound are analyzed, and the similarities among the alloys in terms of phase composition are calculated. Second, an interaction network of HEAs is established on the basis of these similarities.
Additionally, the quality and accuracy of the extracted communities were analyzed through their modularity using two methods, the Louvain and PSO algorithms, indicating that the proposed method achieves high quality in community detection. This evaluation shows that the detected clusters have robust internal connections among the compounds. Although the results of the current method indicate high quality and precision, the method can still be developed further. Other methods can be implemented in future studies to determine more advanced properties of alloys. The present method can also be extended to larger data sets while maintaining its quality. Other ML methods still have great potential for obtaining better results, although the statistical methods and ML algorithms used here already speed up research in the field of materials science. The introduced method is not the only efficient way to detect communities, but it can be applied in other areas of materials science, leading to the detection of other beneficial alloy compositions that can be used in industry. Finally, HEA community detection is useful for finding new common features of similar alloys. Moreover, phase prediction can be performed through community detection with good precision, as shown in this study. The HEA compositions and their descriptors are given in Table A1.
Karlsruhe Institute of Technology (KIT)

High-entropy alloy: challenges and prospects
Microstructural development in equiatomic multicomponent alloys
Nanostructured high-entropy alloys with multiple principal elements: novel alloy design concepts and outcomes
Microstructure, thermophysical and electrical properties in Al x CoCrFeNi (0 ≤ x ≤ 2) high-entropy alloys
Electrochemical kinetics of the high entropy alloys in aqueous environments - a comparison with type 304 stainless steel
Design of high entropy alloys: A single-parameter thermodynamic rule
Machine learning for molecular and materials science
Using machine learning and feature engineering to characterize limited material datasets of high-entropy alloys
Solid-solution phase formation rules for multi-component alloys
On lattice distortion in high entropy alloys
Learning similarity with cosine similarity ensemble
Using of Jaccard coefficient for keywords similarity
Dynamic selection of normalization techniques using data complexity measures
Fast algorithm for detecting community structure in networks
Community structure in social and biological networks
Modularity and community structure in networks
Fast unfolding of communities in large networks
Visual analysis of contact patterns in school environments
MDPCluster: a swarm-based community detection algorithm in large-scale graphs
A simple acceleration method for the Louvain algorithm
Searching for high entropy alloys: A machine learning approach
Phase prediction in high entropy alloys with a rational selection of materials descriptors and machine learning models
Overlapping community detection for multimedia social networks
A new optimizer using particle swarm theory
A new scalable leader-community detection approach for community detection in social networks
Genetic algorithm for community detection in biological networks
A community detection algorithm based on graph compression for large-scale social networks
Network community detection: A review and visual survey
A novel community detection based genetic algorithm for feature selection
A novel link prediction algorithm for protein-protein interaction networks by attributed graph embedding
Exploiting Metal-Organic Framework Relationships via Social Network Analysis

The authors declare no competing financial interest.

The authors acknowledge funding by the Federal Ministry of Education and Research of Germany for the project STREAM ("Semantische Repräsentation, Vernetzung und Kuratierung von qualitätsgesicherten Materialdaten", ID: 16QK11C). Furthermore, the authors acknowledge support by the KIT-Publication Fund of the Karlsruhe Institute of Technology.