key: cord-0047182-a22km6my authors: Bouyukliev, Iliya title: The Program Generation in the Software Package QextNewEdition date: 2020-06-06 journal: Mathematical Software - ICMS 2020 DOI: 10.1007/978-3-030-52200-1_18 sha: 75888fd30df27a684d47c478dbf9379784d3bbc7 doc_id: 47182 cord_uid: a22km6my This paper is devoted to the program Generation which is a self-containing console application for classification of linear codes. It can be used for codes over fields with [Formula: see text] elements and with wide-range parameters. The base of the implemented algorithm is the concept of canonical augmentation. The classification problem of linear codes is important and difficult. Computer algorithms have been used to find the best linear codes for given length and dimension. There are many computational results for classification of linear codes over finite fields (see for example [3, 12, 13] ), but there is not much related software available (for example Magma [4] , GUAVA [1] , Orbiter [2] , Q-Extension [5] ). Our paper is a contribution to this research. The system QextNewEdition is a software package consisting of several user interface programs for classification of linear codes over finite fields, along with the necessary supporting functions. Here we describe the program Generation for classification of linear codes over fields with q < 8 elements and with wide-range parameters. Despite its simple interface, it allows a lot of restrictions on the considered codes. It gives the possibility to classify not only codes with fixed parameters but also all codes with a given length n and dimensions from k 0 to k for given integers 1 ≤ k 0 ≤ k. To use the program, a knowledge of a programming language is not needed. This program is supported by many different basic functions which implement complicated (in some cases) algorithms and has specific data organizations. The most important of these functions give the minimum distance, a list of codewords with weights smaller than a given integer (for large dimensions Brouwer-Zimmerman algorithm is applied), canonical form, automorphism group. This paper tries to give answers to the following questions: -What type of algorithms are implemented and why they allow parallel implementation? -What is the data organization? -What is the difference with the previous version? -How can the interface be used to enter parameters and what type of restrictions are possible and suitable for different cases? -What can be expected from the program? The remaining part of the paper is organized as follows. Section 2 contains the needed definitions. In Sects. 3 and 4 we describe the main algorithms in the program Generation and its basic functions and data organization, respectively. Sections 5 and 6 answer the questions how we can use Generation for code classification and what can be expected from the program. Finally, we draw a brief conclusion in Sect. 7. In this section we present some definitions following [11] . Let F n q denote the vector space of n-tuples over the q-element field F q . A q-ary linear code C of length n and dimension k, or an [n, k] q code, is a k-dimensional subspace of F n q . A k × n matrix G whose rows form a basis of C is called a generator matrix of C. The number of nonzero coordinates of a vector x ∈ F n q is called its Hamming weight wt(x). The Hamming distance d(x, y) between two vectors x, y ∈ F n q is defined by d(x, y) = wt(x − y). The minimum distance of a linear code C is A q-ary linear code of length n, dimension k and minimum distance d is said to be an [n, k, d] q code. Let A i denote the number of codewords in C of weight i. Then the n + 1-tuple (A 0 , . . . , A n ) is called the weight spectrum of the code C. An inner product (x, y) of vectors x, y ∈ F n q defines orthogonality: Two vectors are said to be orthogonal if their inner product is 0. The set of all vectors of F n q orthogonal to all codewords in C is called the orthogonal code C ⊥ to C: It is well-known that the code C ⊥ is a linear [n, n−k] q code. If C ⊆ C ⊥ , the code C is called self-orthogonal. Self-orthogonal codes with n = 2k are of particular interest, then C = C ⊥ and these codes are called self-dual. The program Generation has an option for classification of self-orthogonal codes over fields with 2, 3 and 4 elements. In the binary and ternary cases, we consider Euclidean inner product defined by u · v = u 1 v 1 + u 2 v 2 + · · ·+ u n v n ∈ F q for u = (u 1 , u 2 , . . . , u n ), and v = (v 1 , v 2 , . . . , v n ). For q = 4 the considered inner product is the Hermitian inner product defined by u · v = u 1 v 2 1 + u 2 v 2 2 + · · · + u n v 2 n ∈ F 4 where u = (u 1 , u 2 , . . . , u n ), v = (v 1 , v 2 , . . . , v n ) ∈ F n 4 . Two linear q-ary codes C 1 and C 2 are said to be equivalent if the codewords of C 2 can be obtained from the codewords of C 1 via a sequence of transformations of the following types: 1. permutation of coordinates; 2. multiplication of the elements in a given coordinate by a nonzero element of F q ; 3. application of a field automorphism to the elements in all coordinates simultaneously. This equivalence may not preserve self-orthogonality over fields with q ≥ 5 elements, for that reason we exclude the classification of self-orthogonal codes over fields with 5 and 7 elements. An automorphism of a linear code C is a sequence of such transformations that maps each codeword of C onto a codeword of C. The automorphisms of a code C form a group, called the automorphism group of the code and denoted by Aut(C). Practically, we will identify a linear code with its generator matrix. We consider the code classification problem as follows. Given a set of parameters q, n, k, d find generator matrices of all inequivalent [n, k, d] q-ary codes. In geometrical aspect, we can define an [n, k, d] q code C as a multiset of n points in This definition is equivalent to the one given in [9] . Codes which are equivalent belong to the same equivalence class. Every code can serve as a representative for its equivalence class. We use the concept for a canonical representative, selected on the base of some specific conditions. Let G be a group that acts on a set Ω. This action defines an equivalence relation in Ω as two elements X, Y ∈ Ω are equivalent, X ∼ = Y , if they belong to the same orbit. A canonical representative map for this action is a function ρ : Ω → Ω that satisfies the following two properties: We take Ω to be the set of all linear [n, k] q codes. For a code C ∈ Ω, the code ρ(C) is the canonical form of C with respect to ρ. Analogously, C is in canonical form if ρ(C) = C. The code ρ(C) is the canonical representative of its equivalence class with respect to ρ. Let γ C : C → ρ(C) maps the code C to its canonical form, or γ C (C) = ρ(C). According to the definition given above, γ C induces a permutation of the coordinates which we denote by π C . The permutation π C defines an ordering of the coordinates and the orbits of C with respect to the action of Aut(C). To find the canonical form and the automorphism group of C, we need a sufficiently large set M (C) of codewords of the code C (we will call it sufficient set) with the following properties: This set is not uniquely determined. Usually, we can accept as a sufficient set the set of all codewords with minimum weight. If the rank of this set is smaller than the dimension of the code, a larger set of codewords is used. In the program Generation any linear code is represented by its generator matrix. The program has two main parts. The first one implements a construction method for generator matrices. This method is based on row by row backtracking with k×k identity matrix as a fixed part. In the m-th step the considered matrices have the following form where the columns of the matrix A m are lexicographically ordered, and X is the unknown part of G. In that case any vector v m of length n − k which fits for the m-th row of A m strictly depends on one of the vectors put on the previous rows. Only the last two vectors give [11, 3, 6] even codes (the other four codes have minimum distance ≤ 4). In the same way we consider the second and the third vectors in T 2 . By exhausting all possibilities in this way, we get all inequivalent codes we are looking for. As a result, we obtain only one (up to equivalence) binary even [11, 3, 6] code. This example explains the construction part given above. There are several advantages of this approach: -the number of equivalent candidates in the search tree becomes smaller, -the construction of generator matrices is very effective, -it allows us to consider codes with relatively large length -more than a hundred in the binary case. Moreover, this construction is also appropriate to the other part of the program that determines inequivalent objects. In fact, this has been a key idea in the package Q-Extension (more detail for this approach and the implementation can be found in [8] ). To the rest of construction part we add functions for minimum and dual distances, orthogonality check and restrictions on weights. The second part of the program is related to the identification of nonequivalent objects in the whole generation process. The general method which we apply is known as canonical augmentation [13, 14] . Description for this specific case is given in [6] . The basic idea is to accept only non-equivalent objects without an equivalence test (in some cases with a small number of tests) at every step of the generation process. Instead of an equivalence test, a canonical form of the objects and a canonical ordering of orbits are used. So for every vector v m in the construction that fits as a m-th row (we call these vectors possible solutions), the algorithm decides acceptance (possible solution becomes real ) or rejection. In this model, the different branches of the search tree are independent and therefore it is easy for parallel implementation. The main algorithms are developed by the basic functions of the package. Some of them are presented in the next section. To present the basic functions used in the program Generation, we have to give some information for the whole package QextNewEdition. It contains several hierarchically ordered modules with functions written in C/C++. Each module depends on the previous one and makes it possible to realize the functions of the next one. The interface programs (like Generation) stays on the top of this hierarchy. The first module deals with the safe allocation of dynamic memory for the whole package. The main structures of the package are matrices (two dimensional arrays) of different types. These structures are used to store generator matrices, check matrices, sets of generator matrices with non-intersecting information sets, sets of all or some of the codewords of considered linear codes, sufficient sets, their corresponding binary matrices, the canonical forms and so on. The concept of the package is to investigate linear codes one by one (in consecutive execution). Therefore, it is convenient to use some global variables. The size of the dynamic variables for different types of data related to linear codes changes when the main function considers the next object. In the beginning, the first module allocates memory for the first object. If this memory is not enough for some of the following objects, it allocates more memory by default. The second module consists of functions related to the rank of a system of codewords, information set, orthogonal code, construction of different generator matrices (with non-intersecting information sets), etc. The following module is related to functions for generating some or all codewords. They give minimum distance, weight spectrum, sufficient set of codewords, coset leaders, etc. We use two general approaches for calculating the weight characteristics of linear codes. One of them is exhaustive search (for small dimensions only) and the other is based on Brouwer-Zimmerman algorithm. Many of the functions check if the minimum distance, weight spectrum and other distance parameters are suitable. A very important part of the package is the module for canonical form and automorphism group. The central object here is the (0,1)-matrix or bipartite graph. The main function in this module obtains canonical form, generators of the automorphism group and orbits of rows and columns (and their ordering) of a given binary matrix. For a linear code C, we use sufficient set M (C) of codewords and invertible mapping of this set to a binary matrix T (C) (see [7] ). If two codes C 1 and C 2 are equivalent their corresponding binary matrices T (C 1 ) and T (C 2 ) are isomorphic. Moreover, the automorphism groups of C and T (C) are isomorphic, too. In this section, we show how the program Generation can be used with examples. Let us consider the binary codes with parameters [24, 7, 10] . It is known that there are 6 inequivalent codes with these parameters [12] . After starting, Generation gives us the following by default: Generating Linear Codes (Generation v1.1 QextNewEdition first module) Generate [24,12,8;2] Linear codes With weights: wt1= 8, wt2= 12, wt3= 16, wt4= 20, wt5= 24, Proportional columns: d2->800, d3->800, d4->800, d5->800, d6->800, d7->800, d8->800, d9->800, d10->800, d11->800, d12->800, To obtain the generator matrices of all 6 inequivalent binary [24,7,10] codes in the file with name 24_7_10.2 we just have to choose point 2, enter the parameters and start the calculations choosing 1. The generator matrices of all inequivalent codes obtained in the generation process (157 in our case) will be written in a file with name 24_7_10.2h. They correspond to all real solutions for k = 1, 7. The table of optimal codes [10] indicates that binary linear codes with parameters [22, 6, 10] do not exist. Therefore binary [24, 7, 10] codes do not have two proportional coordinates and can be obtained from [23, 6, 10] codes. That is why we can use restrictions for proportional coordinates (point 4) as follows: up to 4 proportional coordinates in dimension five, 2 in dimension six and no (enter 1) in dimension 7. In this case, the calculation time is 25% less. If we are interested only in codes with dual distance 4, we can use point 5. The program looks for the codes with dual distance 2 in dimension five and 3 in dimension six. The number of inequivalent codes in the file 24_7_10.2h becomes smaller -146. The program has two options for restrictions on possible weights of the codes under investigation. With point 2 we can set an integer w which divides all the weights. After that with point 3 we can choose only some of the weights between d and n, divisible by w. The restriction for self orthogonality works only for codes over fields with 2, 3 and 4 elements. In the general case, when the program have to generate [n, k, d] codes, the codes with parameters For optimal search of all codes with fixed n and d, and dimensions from k min to k max , we use point 6. In that case the results will be written in files with extensions "2b" and "2bh". For example, the search trees for constructing [25, 6, 10] , [25, 5, 10] and [25, 4, 10] self-orthogonal binary codes have 226, 289 and 99 nodes, respectively (614 summary). If we look for the self-orthogonal codes with the same parameters simultaneously by point 6, the nodes of the corresponding search tree are only 430. In this section we present some examples. To obtain the results, we use one thread of Intel Xeon E5-2620 V4 processor. For natural reasons, the calculational time in the case of relatively small parameters depends on the size of the search tree. For given n, k and q the search tree strictly depends on the restrictions for minimum distance, self-orthogonality, possible weights, dual distance and proportional columns. In the case of codes with large length, the number of objects that need to be checked for acceptability increases exponentially. That is why, even with a small search tree, the computational time grows. The following table contains classification results for linear codes with different parameters and restrictions. The first and second columns show the parameters and the used restrictions, respectively. Column 3 contains the execution times in seconds, and the number of equivalent codes in each case is given in the fourth column. In this paper, we present the first interface program Generation of the software package QextNewEdition. There are freely available versions for Windows and Linux on the webpage http://www.moi.math.bas.bg/moiuser/~data/Software/QextNewEdition The package contains two more programs, namely LengthExtension and DimExtension which will be available on the same webpage. The package QextNewEdition is a successor of Q-Extension [5] . The aim of both systems is classification of linear codes with different properties and restrictions. They share some ideas in the development of algorithms and have similar interface. The package Q-Extension is written in Pascal (Delphi) with static variables depending on the size of the field. QextNewEdition is a new software system, written in C/C++, designed to be widely portable and suitable for parallelization. All basic functions are rewritten, looking for optimal implementation. The main concept and used methods for classification are different. The classification here is based on canonical augmentation as opposed to Q-Extension where the used method is isomorph-free generation via recorded objects [13] . There are many differences between QextNewEdition and Q-Extension. We list some new points: -Programming language is C/C++ which make program portable and proper for MPI parallelization. -Dynamically allocated variables are used. This means that the size of the input data depends only on the hardware and the range of the program can easily be extended to larger fields. -The implementation of the generating part presented in Sect. 3 is different. In the beginning, it was by nested loops and now it is based on specific integer partitions [8] . -The algorithm for canonical form is optimized by additional invariants for the partitioning process. -The representation of the sufficient set as a binary matrix now is much more flexible [7] . With these features, the program Generation is a powerful tool for classifying linear codes. Classifying discrete objects with orbiter Error-Correcting Linear Codes -Classification by Isometry and Applications The Magma algebra system I: The user language What is Q-Extension? Computer classification of linear codes Representing equivalence problems for combinatorial objects About an approach for constructing combinatorial objects Codes and projective multisets Bounds on the minimum distance of linear codes and quantum codes Fundamentals of Error-Correcting Codes Optimal binary linear codes of length ≤ 30 Classification Algorithms for Codes and Designs Isomorph-free exhaustive generation Acknowledgements. We are greatly indebted to the unknown referees for their useful suggestions.