key: cord-0790342-kx0zkpxg authors: Azam, Naveed Ahmed; Shurbevski, Aleksandar; Nagamochi, Hiroshi title: An Efficient Algorithm to Count Tree-Like Graphs with a Given Number of Vertices and Self-Loops date: 2020-08-22 journal: Entropy (Basel) DOI: 10.3390/e22090923 sha: e67adeeaca7556ffd9b7d463829008643d630d5b doc_id: 790342 cord_uid: kx0zkpxg Graph enumeration with given constraints is an interesting problem considered to be one of the fundamental problems in graph theory, with many applications in natural sciences and engineering such as bio-informatics and computational chemistry. For any two integers [Formula: see text] and [Formula: see text] , we propose a method to count all non-isomorphic trees with n vertices, [Formula: see text] self-loops, and no multi-edges based on dynamic programming. To achieve this goal, we count the number of non-isomorphic rooted trees with n vertices, [Formula: see text] self-loops and no multi-edges, in [Formula: see text] time and [Formula: see text] space, since every tree can be uniquely viewed as a rooted tree by either regarding its unicentroid as the root, or in the case of bicentroid, by introducing a virtual vertex on the bicentroid and assuming the virtual vertex to be the root. By this result, we get a lower bound and an upper bound on the number of tree-like polymer topologies of chemical compounds with any “cycle rank”. Counting and generation of discrete objects are two fundamental problems in combinatorial mathematics and have many applications in the fields of natural science and engineering, such as computational chemistry and bioinformatics. The counting problem asks to count all possible objects under given constraints. On the other hand, the generation problem asks to list all possible objects under given constraints. One of the notable advantages of the counting problem is that we can know the size of the solution space before generating all solutions. Different kinds of enumeration methods are used to solve counting and generation problems, where branching algorithms and Polya's enumeration theorem are the two most commonly used methods for these problems. In branching algorithms, the computation is performed by following a computation tree, and the required solutions are attained at the leaves of the computation tree. It is important to mention that the branching algorithms can only count all solutions after generating each one of them, and therefore they are inefficient for the problem where we first want to know the size of the solution space before the generation of solutions. The well-known Polya's enumeration theorem [1, 2] is used for counting all distinct objects. The idea of this method is to use the cyclic index of the group of symmetries of the underlying object to develop a generating function, which is then used to count all possible objects. Note that finding the group of symmetries and its cyclic index is a challenging task, which may make the use of Polya's theorem harder for some problems. The drawback of branching algorithms discussed above and the difficulty of using Polya's theorem necessitate the exploration of new enumeration methods to solve counting problems efficiently. For an enumeration method, it is necessary to satisfy the following three conditions: (i) Consider all solutions: The method does not miss any of the required objects; (ii) Avoid duplication: The method does not count and generate isomorphic objects; and (iii) Low computational complexity: The method can count and generate all solutions in low time and space complexity. Designing such a method is not an easy task, because of the underlying symmetries and the computation difficulty for their detection. Counting and generation of chemical compounds have a long history and numerous applications in designing novel drugs [3] [4] [5] [6] [7] [8] and structure elucidation [9] . The problem of counting and generation of chemical compounds can be viewed as the problem of enumerating graphs with given constraints. There are several available chemical compound enumeration tools [10] [11] [12] . We can divide these tools into two classes. One class of enumeration tools treats general graph structures [10, 12] . In the other class, the tools are focused on enumerating some restricted chemical compounds. One such tool is Enumol2 [11] . Enumeration of restricted chemical compounds with specialized tools is more efficient than with the tools which use general graph structures. This led to a new trend of developing efficient enumeration of restricted chemical compounds in the field of chemoinformatics [13] . A polymer is a large molecule with interesting chemical properties consisting of many sub-molecules. From a graph-theoretic perspective, we represent the structure of a polymer with a graph G called polymer topology, possibly with self-loops and multi-edges, such that G is connected and the degree of each vertex in G is at least three [14] . For a chemical graph, we get its polymer topology by repeatedly removing the vertices of degree one and two. For example, the polymer topology of Remdesivir C 27 H 35 N 6 O 8 P Figure 1a , a potential candidate of treatment for COVID-19, is illustrated in Figure 1b . Tezuka and Oike [15] pointed out that a classification of polymer topologies will lay a foundation for the elucidation of structural relationships between different macro-chemical molecules and their synthetic pathways. Different kinds of graph-theoretic approaches have been applied to classify and enumerate polymer topologies [16, 17] . For a connected graph G, possibly with self-loops and multi-edges, the cycle rank is defined to be the number of edges that must be removed to get a simple spanning tree of G. Recently, Haruna et al. [14] proposed a method to enumerate all polymer topologies with cycle rank up to five. Notice that trees with no multi-edges but with ∆ ≥ 0 self-loops have cycle rank ∆ and include all polymer topologies with the said structure. Therefore, it is of interest to count and generate all trees with no multi-edges and a given number of vertices and self-loops. We use dynamic programming (DP) to count all mutually non-isomorphic trees with n vertices, ∆ self-loops and no multi-edges. The basic idea of DP is to partition the original problem into subproblems that satisfy some recursive relations, and the union of their solution sets is equal to the solution set of the original problem. Unlike branching algorithms and Polya's theorem, the main advantage of using the DP is that we can count all non-isomorphic structures without their generation and calculation of their group of symmetries. As an application of our results, we get lower and upper bounds on the number of tree-like polymer topologies with self-loops of a given cycle rank. The rest of the paper is organized as follows: Section 2 reviews some notions and results related to graph theory. Section 3 explains our tree counting method. Section 4 makes some concluding remarks. Throughout this draft, the term graph stands for an undirected graph with no multi-edges and possibly with self-loops unless stated otherwise. Let G be a graph. We denote an edge between two vertices u and v in G by uv (= vu). Let V(G) and E(G) denote the vertex set and edge set of G, respectively. Let s(G) denote the number of self-loops in G. For a vertex v ∈ V(G), we denote by s(v) the number of self-loops on the vertex v. For a vertex v in G, let N G (v) denote the set of vertices incident to v except v itself and the degree deg G (v) of v in G is defined to be |N G (v)|. A graph H with the properties V(H) ⊆ V(G) and E(G) ⊆ E(G) is called a subgraph of G. A simple path between two distinct vertices u, v ∈ V(G) is defined to be a subgraph P of G with vertex set V(P) = {u = w 1 , w 2 , . . . , v = w k } and edge set E(P) = {w i w i+1 | 1 ≤ i ≤ k − 1}. A graph is called a connected graph if there is a path between any two distinct vertices in the graph. A connected component of a graph G is defined to be a maximal connected subgraph H of G, i.e., for any vertex v ∈ V(G) \ V(H) it holds that every subgraph with the vertex set V(H) ∪ {v} is disconnected. By Jordan [18] , any simple tree with n ≥ 1 vertices has either a unique vertex or edge, the removal of which creates connected components with at most (n − 1)/2 or exactly n/2 vertices, respectively. Such a vertex is called the unicentroid, the edge is called the bicentroid, and collectively they are called the centroid of the tree. It is important to note that there exits a bicentroid only for trees with an even number of vertices. A tree with a fixed vertex r is called a rooted tree with root r. Note that any tree can be uniquely viewed as a rooted tree by either regarding its unicentroid as the root, or in the case of a bicentroid, by introducing a virtual vertex on the bicentroid and assuming the virtual vertex as the root. Let H be a rooted tree. Let r H denote the root of H. For any two distinct vertices u, v ∈ V(H), let P H (u, v) denote the unique simple path between them in H. For a vertex v ∈ V(H) \ {r H }, we define the ancestors of v to be the vertices on the path . We call the vertex v a child of p(v). Two vertices with the same parent in H are called siblings. For a vertex v ∈ V(H), let H v denote the subtree of H rooted at v induced by v and its descendants. Two rooted trees T and H are called isomorphic if there exists a bijection σ : For any two integers n ≥ 1 and ∆ ≥ 0, let H(n, ∆) denote a maximal set of mutually non-isomorphic rooted trees with n vertices and ∆ self-loops, and we define h(n, ∆) |H(n, ∆)|. We develop a method to compute for any two integers n ≥ 1 and ∆ ≥ 0, the size h(n, ∆) of a maximal set H(n, ∆) of mutually non-isomorphic rooted trees with n vertices and ∆ self-loops; i.e., we are interested in the following problem: Counting Problem Input: Two integers n ≥ 1 and ∆ ≥ 0. Output: h(n, ∆). We solve this problem by using dynamic programming based on the information of the number of vertices and self-loops in the subtrees rooted at the children of the root of each tree in H(n, ∆). We define the following notions. Let n ≥ 1 and ∆ ≥ 0 be any two integers. For each tree H ∈ H(n, ∆), we define Note that for any tree H ∈ H(1, ∆), it holds that Max v (H) = 0 and Max s (H) = 0. Let m, d ≥ 0 be any two integers. We define Observe that by the definition of H(n, ∆, m ≤ , d ≤ ) it holds that Therefore, from now on, we assume that m ≤ n − 1 and d ≤ ∆. Further, by the definition of H(n, ∆, m ≤ , d ≤ ) it holds that H(n, ∆, m ≤ , d ≤ ) = ∅ (resp., H(n, ∆, m ≤ , d ≤ ) = ∅) if "n = 1" or "n − 1 ≥ m ≥ 1" (resp., otherwise (n ≥ 2 and m = 0)). We define It follows from the definition of H(n, , otherwise (n ≥ 2 and m = 0)). Further we have the following relation: Note that if "n = 1 and d = 0" or "n − 1 ≥ m ≥ 1" (resp., otherwise ("n = 1 and d ≥ 1" or "n ≥ 2 and m = 0")), then by the definition of H(n, ∆, m = , d = ) it holds that H(n, ∆, m = , d = ) = ∅ (resp., H(n, ∆, m = , d = ) = ∅). Furthermore, we get the following relation for H(n, ∆, m = , d ≤ ): Proof. The case (i) follows by Equation (1). The case (ii) follows by Equation (2) and the fact that for The case (iv) follows by Equation (4) and the fact that for d ≥ 1 it holds that H(n, ∆, Next we discuss some boundary conditions for our DP to compute h(n, ∆). For any four integers n − 1 ≥ m ≥ 0, and ∆ ≥ d ≥ 0, it holds that (i) h(n, ∆, 0 = , d = ) = 1 (resp., h(n, ∆, 0 = , d = ) = 0) if n = 1 and d = 0 (resp., otherwise ("n = 1 and d ≥ 1" or " n ≥ 2")); (resp., otherwise (n ≥ 2)); (iii) h(n, ∆, 1 = , d = ) = 1 if " n = 2" or " n ≥ 3 and d = 0"; and (iv) h(n, ∆, 1 = , d ≤ ) = h(n, ∆, 1 ≤ , d ≤ ) = d + 1 if " n = 2" or " n ≥ 3 and d = 0". Furthermore, by Lemma 1(iii) it holds that h(n, ∆, 1 ≤ , d ≤ ) = h(n, ∆, 0 = , d ≤ ) + h(n, ∆, 1 = , d ≤ ). By Lemma 2(ii), we have h(n, ∆, 1 ≤ , d ≤ ) = h(n, ∆, 1 = , d ≤ ). Hence the result follows by Equation (5). By Lemma 2, we can get that h(1, ∆) = 1 and h(2, ∆) = ∆ + 1. Furthermore, Lemma 1(i)-(iv) give recursive relations for h(n, ∆, m ≤ , d ≤ ) and h(n, ∆, m = , d ≤ ) which depend on h(n, ∆, m = , d = ). Thus for n ≥ 3, m ≥ 1, and ∆ ≥ d ≥ 0, our next goal is to develop a recursive relation for h(n, ∆, m = , d = ). For any tree H ∈ H(n, ∆, m = , d = ) and any vertex v ∈ N H (r H ), the subtree H v of H satisfies exactly one of the following three conditions: The residual tree of H belongs to exactly one of the families H(n − qm, . This implies that q ≥ 1. Also, it holds that n − 1 ≥ mq and ∆ ≥ dq. This implies that q ≤ (n − 1)/m with q ≤ ∆/d when d ≥ 1. Let K denote the residual tree of H. By the definition of K it holds that K ∈ H(n − mq, ∆ − dq, n − mq − 1 ≤ , ∆ − dq ≤ ). Furthermore, for each vertex v ∈ N H (r H ) ∩ V(K), the tree H v satisfies exactly one of the conditions (C-2) and (C-3). Now, if there exists a vertex v ∈ N H (r H ) ∩ V(K) such that H v satisfies condition (C-2), then d − 1 ≥ 0, and hence K ∈ H(n − qm, ∆ − dq, m = , min{∆ − dq, d − 1} ≤ ). On the other hand, if condition (C-2) does not hold for This implies that for a fixed integer q in the range given in the lemma, the number of trees K in the family H(n, ∆, m = , d = ) with exactly q subtrees Note that, for m = 1 and d = 0, we have 1 ≤ q ≤ n − 1, and by Lemma 2(ii) it holds that h(n − q, ∆, 0 ≤ , ∆ ≤ ) = 0 (resp., h(n − q, ∆, 0 ≤ , ∆ ≤ ) = 1), if 1 ≤ q ≤ n − 2 (resp., otherwise (if q = n − 1)). This implies that any tree H ∈ H(n, ∆, 1 = , 0 = ) has exactly q = n − 1 subtrees H v ∈ H(1, 0, 0 ≤ , 0 ≤ ), for v ∈ N H (r H ). However, observe that for each integer m ≥ 2 or d ≥ 1, and q satisfying the conditions given in the lemma, there exists at least one tree H ∈ H(n, ∆, m = , d = ) such that H has exactly q subtrees H v ∈ H(m, d, m − 1 ≤ , d ≤ ), for v ∈ N H (r H ). Hence, this and case (a) (resp., case (b)) imply Lemma 4(i) (resp., Lemma 4(ii)). Hence, Lemma 4(iii) and (iv) follow from Lemma 4(i) and (ii), respectively. We design a DP algorithm to compute h(n, ∆) based on the recursive structures of h(n, ∆, m ≤ , d ≤ ), h(n, ∆, m = , d ≤ ) and h(n, ∆, m = , d = ), 0 ≤ m ≤ n − 1 and 0 ≤ d ≤ ∆, as given in Lemmas 1 and 4, where h(n, ∆) = h(n, ∆, n − 1 ≤ , ∆ ≤ ) for n ≥ 1 and ∆ ≥ 0. For any four integers n − 1 ≥ m ≥ 0, and ∆ ≥ d ≥ 0, h(n, ∆, m ≤ , d ≤ ) can be obtained in O(nm(n + ∆(n + d · min{n, ∆}))) time and O(nm(∆(d + 1) + 1)) space. The proof of Lemma 5 follows from Algorithm 1 and Lemma 6. For any two integers n ≥ 1 and ∆ ≥ 0, h(n, ∆, n − 1 ≤ , ∆ ≤ ) can be obtained in O(n 2 (n + ∆(n + ∆ · min{n, ∆}))) time and O(n 2 (∆ 2 + 1)) space. Next, for any four integers n − 1 ≥ m ≥ 0, and ∆ ≥ d ≥ 0, we present Algorithm 1 for solving the problem of calculating h(n, ∆, m ≤ , d ≤ ). In this algorithm, for each integers store the values of h(i, j, k ≤ , p ≤ ), h(i, j, k = , p ≤ ), and h(i, j, k = , p = ), respectively. O(min{n, ∆}) ) if p = 0 (resp., otherwise). Thus from line 5-36, Algorithm 1 takes O(n 2 m) (resp., O(nm∆(n + d · min{n, ∆}))) time if ∆ = 0 (resp., otherwise). Therefore, Algorithm 1 takes O(nm(n + ∆(n + d · min{n, ∆}))) time. The algorithm stores three four-dimensional arrays. When ∆ = 0, for each integer 1 ≤ i ≤ n, . Further, if n is even, then there exist trees with n vertices and a bicentroid. This implies that the number of mutually non-isomorphic trees with n vertices and ∆ self-loops is h(n, ∆, (n − 1)/2 ≤ , ∆ ≤ ) when n is odd. Let n be an even integer. Then any tree H with n vertices, ∆ self-loops and a bicentroid has two connected components, A and B obtained by the removal of the bicentroid such that A ∈ H(n/2, i, n/2 − 1 ≤ , i ≤ ) and B ∈ H(n/2, ∆ − i, n/2 − 1 ≤ , ∆ − i ≤ ) for some 0 ≤ i ≤ ∆/2 , where if ∆ is even then for i = ∆/2, both of the components A and B belong to H(n/2, ∆/2, n/2 − 1 ≤ , ∆/2 ≤ ). Note that for any 0 ≤ i ≤ (∆ − 1)/2 , it holds that Therefore, when ∆ is odd (resp., even), the number of mutually non-isomorphic trees with n vertices, ∆ self-loops, and a bicentroid is such that α = 0 (resp., α = 1). Thus, the number of mutually non-isomorphic trees with n vertices and ∆ self-loops is such that α = 0 (resp., α = 1) when ∆ is odd (resp., even). Moreover, for each 0 ≤ i ≤ ∆, Algorithm 1 also computes and stores h(n/2, i, n/2 − 1 ≤ , i ≤ ) during the calculation of h(n, ∆, (n − 1)/2 ≤ , ∆ ≤ ), and therefore the required result follows from Lemma 6. We implemented the proposed DP algorithm and counting trees with a given number of vertices and self-loops. The experimental results in Table 1 show that the proposed method efficiently counts trees with n vertices and ∆ self-loops. We next give a lower bound and an upper bound on the number of tree-like polymer topologies with self-loops of a given rank. For this we prove the following results. For an integer n ≥ 2, there exists at least one tree-like polymer with n vertices and ∆ self-loops if ∆ ≥ n 2 + 1. Proof. Consider a tree T of n vertices of diameter n 2 such that T contains a path of length n 2 , in which each non-end vertex has degree at least 3. Observe that when n is even, the tree T has exactly n 2 − 1 vertices of degree 3, and hence n − n 2 + 1 = n 2 + 1 vertices of degree less than 3. When n is odd, the tree T has n 2 − 3 vertices of degree 3 and one vertex of degree 4. Thus, in this case, the number of vertices of degree less than 3 is n − n 2 + 2 = 2 n 2 − 1 − n 2 + 2 = n 2 + 1. This implies that T can be transformed into a polymer with n 2 + 1 self-loops by assigning a self-loop to each vertex of degree less than 3. Hence, n 2 + 1 self-loops are sufficient to get a tree-like polymer with n vertices. For two integers n ≥ 1 and ∆ ≥ 0, let t(n, ∆) denote the number of trees with n vertices and ∆ self-loops. For r ≥ 1, let p(r) denote the number of tree-like polymers with self-loops and no multi-edges of rank r. Observe that a tree with n vertices and k self-loops at each vertex is a polymer with n vertices of cycle rank kn. From this fact and Lemma 7 it holds that ∑ n,k∈Z + :nk=r t(n, 0) ≤ p(r) ≤ ∑ n∈Z + : n 2 +1≤r t(n, r). This paper presented an efficient method to count the number of all mutually non-isomorphic trees with a given number of vertices and self-loops. The proposed method is based on dynamic programming where we count the number of all mutually non-isomorphic rooted trees with a given number n of vertices and ∆ self-loops in O(n 2 (n + ∆(n + ∆ · min{n, ∆}))) time and O(n 2 (∆ 2 + 1)) space. As an application of our results, we gave lower and upper bounds on the number of tree-like polymer topologies with a given cycle rank. This is an interesting application of DP to objects such as trees, and offers the advantage of getting the size of the entire solution space at low computational complexity without explicitly generating each object. An interesting direction for future research is to efficiently generate all mutually non-isomorphic trees with a given number of vertices and self-loops by using the result from the developed counting method. Further, another possible extension of this research is to count and generate all mutually non-isomorphic tree-like polymer topologies with a given number of vertices and self-loops. Kombinatorische anzahlbestimmungen für gruppen, graphen und chemische verbindungen 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13 A method for the inverse QSAR/QSPR based on artificial neural networks and mixed integer linear programming A novel method for the inverse QSAR/QSPR to monocyclic chemical compounds based on artificial neural networks and integer programming A novel method for inference of chemical compounds of cycle index two with desired properties based on artificial neural networks and integer programming De novo generation of hit-like molecules from gene expression signatures using artificial intelligence Scaffold-based molecular design with a graph generative model Small molecule identification with MOLGEN and mass spectrometry MOLGEN+, a generator of connectivity isomers and stereoisomers for molecular structure elucidation OMG: Open molecule generator Chemoinformatics: A view of the field and current trends in method development On the enumeration of polymer topologies Topological polymer chemistry Some applications of graph theory to the study of polymer configuration The dimensions of chain molecules containing branches and rings This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution Funding: This research is partially funded by JSPS KAKENHI Grant Number 18J23484. The authors declare no conflict of interest. Algorithm 1 DP based counting algorithm for h(n, ∆, m ≤ , d ≤ ) Input: Integers n − 1 ≥ m ≥ 0 and ∆ ≥ d ≥ 0. Output: h(n, ∆, m ≤ , d ≤ ).h [1, j, for i := 3, 4, . . . , n do for j := 0, 1, . . . , ∆ do for k := 1, 2, . . . , min{i, m} do for p := 0, 1, . . . , min{j, d} do if p = 0 and k = 1 thenTheorem 1. For any two integers n ≥ 1 and ∆ ≥ 0, the number of non-isomorphic trees with n vertices and ∆ self-loops can be obtained in O(n 2 (n + ∆(n + ∆ · min{n, ∆}))) time and O(n 2 (∆ 2 + 1)) space.Proof. By Jordan [18] , we can uniquely consider any tree as a rooted tree by either regarding its unicentroid as the root, or in the case of a bicentroid, by introducing a virtual vertex on the bicentroid and assuming the virtual vertex as the root of the tree. By the definition of a unicentroid, the number of mutually non-isomorphic trees with n vertices, ∆ self-loops and a unicentroid is h(n, ∆, (n −