key: cord-0278524-318bin36
authors: Yang, Chin Jian; Russell, Joanne; Ramsay, Luke; Thomas, William; Powell, Wayne; Mackay, Ian
title: Overcoming barriers to the registration of new varieties
date: 2020-10-09
journal: bioRxiv
DOI: 10.1101/2020.10.08.331892
sha: b7fe81125971fa01ddbe991126365590a0b1fdbd
doc_id: 278524
cord_uid: 318bin36

Distinctness, Uniformity and Stability (DUS) is an intellectual property system introduced in 1961 by the International Union for the Protection of New Varieties of Plants (UPOV) to safeguard investment and reward innovation in the development of new plant varieties. Despite the rapid advances in our understanding of crop biology over the past 60 years, the DUS system has not changed and still depends on a set of morphological traits for testing candidate varieties. As the demand for more plant varieties increases, the barriers to the registration of new varieties become more acute, and the system urgently requires review. To highlight the challenges in the current system and possible remedies, we evaluated a comprehensive panel of 805 UK barley varieties that span the entire history of DUS testing. Our findings reveal deficiencies in the system and provide evidence for a shift towards a robust, genomics-enabled registration system for new crop varieties.

The pressure on the current DUS system stems from multiple issues. As more new varieties arise, the DUS trait combinatorial space becomes more limited, and breeding unique DUS trait combinations requires additional effort. Many DUS traits have low heritabilities 7, which means greater trait variability due to environmental fluctuations and limited reliability of DUS trait scores outside the trial environment. While the current system is well established for major crops, it is hard to implement in minor or orphan crops, since suitable DUS traits are hard to determine 8, 9.
Furthermore, the current DUS system is largely designed for inbred species or varieties, which makes it hardly practical for outbreeding species or hybrid varieties 10. Lastly, there is also

thus lower power. Another reason is that some traits are rare or not segregating in either spring or winter germplasm. Examples are traits 3 (lowest leaves: hairiness of leaf sheaths), 12 (ear: number of rows), 23 (grain: husk), 26 (grain: hairiness of ventral furrow) and 27 (grain: disposition of lodicules). A major QTL for trait 3 is tightly linked to Vrn-H2, a major vernalisation locus 20, while traits 12, 23 and 27 are largely monomorphic in the UK barley breeding pool owing to the preference for two-rowed barley with hulled grains and clasping (collar-type) lodicules. Compared with a previous GWAS of DUS traits 7, the number of loci increased from 16 to 32, with 12 loci in common.

In accordance with the UPOV guidelines 17, molecular markers can only be used in DUS if they have a direct relationship with the DUS traits. This might work well for the 14 traits with known major loci, although it risks ignoring the effects of minor or exotic loci. One such example would be the anthocyanin-related traits in the flag leaf (trait 4) and awn (trait 8), where anthocyaninless 1 (ant1) and ant2 are segregating in winter but not spring barley varieties in the UK (

To extend beyond locus-specific markers, a small marker set for DUS has been proposed 21, although our evaluation showed that it has limited distinguishing power. By simulating F6 progeny from known parent pairs, we compared the marker profiles of these simulated progeny with those of their parents, the actual variety (the progeny of the parent pair) and other simulated progeny. While most of the simulated progeny remained unique for older varieties, this is not true for newer varieties (Fig. 3a, 3b & S5), especially in spring barley.
For example, LG Goddess matched perfectly with 7.5% of the simulated progeny, and its parents Octavia and Shada matched perfectly with 8.0% and 7.8% of the simulated progeny, respectively (Table S6). Furthermore, 88.4% of the simulated progeny have a probability of over 1% of matching other simulated progeny (Table S6). A small marker set for DUS is problematic in a crop whose genomic diversity progressively narrows over time. Of the 39 markers in the proposed set 21, only 4 to 22 are segregating between the parents analysed. Moreover, these markers are not randomly distributed: some are in strong linkage disequilibrium (LD) with one another and are therefore not independently informative.

As a follow-up, we investigated the number of markers required to properly separate varieties in DUS and determined that approximately 500 to 1,000 markers are likely the minimum (Fig. 4a). When we compared the Manhattan distances calculated from all 28 DUS traits with those calculated from randomly sampled marker sets of increasing size, the correlation between the two distances began to plateau at about 500 to 1,000 markers. The correlation reaches a maximum of about 0.60, which is similar to the value previously observed by Jones et al. 10. This is not surprising, given that the correlation depends on the DUS trait heritabilities: Manhattan distances determined from DUS traits with high heritabilities ($h^2 > 0.50$) showed a stronger correlation with Manhattan distances from the marker data than those from DUS traits with low heritabilities ($h^2 < 0.50$) (Fig. 4a). Additionally, the variances of the distance distributions stabilise in a similar range (Table S7), which supports the conclusion that any set of fewer than 500 markers is insufficient.

Given the various issues we have described in the DUS system so far, the remaining option is to use genomic markers.
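The marker-number analysis above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R workflow: `traits` and `markers` are hypothetical per-variety lists (DUS trait scores and haploid 0/1 marker calls, in the same variety order), marker-set sizes grow in steps of 0.1 on the log10 scale as described in the Methods, and the Pearson correlation is taken over all variety pairs. All helper names are our own.

```python
import math
import random
from itertools import combinations


def manhattan(a, b):
    """Manhattan (city-block) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def subset_sizes(total, step=0.1):
    """Marker-set sizes from 1 to `total`, spaced `step` apart on the log10 scale."""
    sizes, k = set(), 0.0
    while 10 ** k <= total:
        sizes.add(round(10 ** k))
        k += step
    sizes.add(total)  # always include the full marker set
    return sorted(sizes)


def trait_marker_correlations(traits, markers, seed=1):
    """For each subset size, correlate pairwise trait distances with marker distances.

    traits:  list of per-variety trait-score lists (hypothetical layout)
    markers: list of per-variety 0/1 marker lists, same variety order as `traits`
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(len(traits)), 2))
    trait_d = [manhattan(traits[i], traits[j]) for i, j in pairs]
    n_markers = len(markers[0])
    results = {}
    for size in subset_sizes(n_markers):
        cols = rng.sample(range(n_markers), size)  # one random marker subset of this size
        marker_d = [sum(abs(markers[i][c] - markers[j][c]) for c in cols) for i, j in pairs]
        # Pearson correlation is undefined if either distance vector is constant.
        if len(set(marker_d)) > 1 and len(set(trait_d)) > 1:
            results[size] = pearson(trait_d, marker_d)
    return results
```

Only one random subset is drawn per size here; averaging over repeated draws per size would give a smoother correlation curve when looking for the plateau.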
There are multiple ways to implement genomic markers in DUS; here we provide a simple example using Manhattan distance, which is one of many measures of dissimilarity among varieties. Under haploid marker coding of 0 and 1, the Manhattan distance between any two varieties is equivalent to $2 \times (1 - \text{similarity})$, where similarity is measured as the proportion of exact marker matches between the two varieties. As in the current DUS system, we need a reference panel (the common knowledge varieties set) and genomic marker data for that panel.

To demonstrate how genomic markers work in DUS, we simulated 1,000 F6 and BC1S4 progeny from two pairs of parents in spring barley. The first parent pair, Propino and Quench, has a distance of 0.20 and thus represents the "low"-distance parents. The second parent pair, Riviera and Cooper, has a distance of 0.59 and thus represents the "high"-distance parents. Given an arbitrary minimum threshold of 0.05 for distinctness, 13.0% of the F6 progeny and 59.6% of the BC1S4 progeny from the low-distance parents would be rejected for lack of distinctness, whereas none of the F6 progeny and 4.9% of the BC1S4 progeny from the high-distance parents would be rejected (Fig. 4c).

Another important consequence of using genomic markers in DUS concerns the regulation of essentially derived varieties (EDVs). Under current standards, the definition of EDVs is unclear, and determining whether a variety is an EDV often involves complicated and expensive court proceedings 22. Furthermore, whether a marketed variety is an EDV is generally not disclosed to the public, and it is possible that no EDV ever reaches the market. With genomic markers, any variety submitted for DUS evaluation that fails to pass the minimum distance threshold would be considered for EDV status.
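The distance-threshold check described above can be sketched as follows. This is a minimal sketch under assumptions of our own: markers are haploid 0/1 calls, and distances are divided by the number of markers so that they lie between 0 and 1, the same range as the 0.05 threshold and the parent distances quoted above. Under that scaling the Manhattan distance equals 1 - similarity; the factor of 2 given in the text would apply if homozygous genotypes were coded 0 and 2, which is our assumption about the original scaling rather than something stated in the source. The function names, reference panel and variety names are hypothetical.

```python
def scaled_manhattan(a, b):
    """Manhattan distance divided by the number of markers (0/1 coding -> range 0 to 1)."""
    if len(a) != len(b):
        raise ValueError("marker vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def similarity(a, b):
    """Proportion of markers at which two varieties carry the same allele."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def distinctness_check(candidate, reference_panel, threshold=0.05):
    """Compare a candidate against every reference variety.

    reference_panel maps a variety name to its 0/1 marker list (hypothetical layout).
    Returns (is_distinct, closest_variety, closest_distance). A candidate whose
    nearest reference variety lies below `threshold` fails distinctness and, as
    proposed in the text, would be considered for EDV status.
    """
    closest, best = None, float("inf")
    for name, markers in reference_panel.items():
        d = scaled_manhattan(candidate, markers)
        if d < best:
            closest, best = name, d
    return best >= threshold, closest, best


# Hypothetical example: 100 markers; the candidate differs from ref_A at 3 markers.
panel = {"ref_A": [0, 1] * 50, "ref_B": [1, 0] * 50}
candidate = list(panel["ref_A"])
for i in (0, 1, 2):
    candidate[i] = 1 - candidate[i]  # flip three alleles
print(distinctness_check(candidate, panel))  # (False, 'ref_A', 0.03)
```

Treating a distance exactly equal to the threshold as a pass is an arbitrary choice in this sketch; the text does not specify how ties are handled.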
If these varieties demonstrate a justifiable value for cultivation and use (VCU) compared with the common knowledge varieties, then EDV status should be granted. Curiously, four varieties in our reference panel did not pass our arbitrary minimum threshold of 0.05 (Fig. 4b).

In the genomic era, we have access to valuable genomic resources, such as the barley 50k iSelect SNP array 29, for application in DUS. As an example, we have illustrated how genomic markers can be used to evaluate the distinctness, uniformity and stability of new varieties (Fig. 5). Instead of relying on morphological trait differences from the common knowledge varieties in the reference panel, we can determine a distance threshold based on genomic markers that would allow us to decide whether a variety is sufficiently distinct. By sampling multiple seeds (or multiple pools of seeds), we can also test for uniformity based on the distances among these seeds or pools. For instance, uniformity could be defined such that the distances among the seeds of a candidate variety must not exceed its distances to the common knowledge varieties. We can quantify stability by measuring the genomic heterogeneity of the variety seed pool, since a fully homogeneous seed pool ensures genomic stability in subsequent generations of seed production. In an inbred species, this can be achieved by checking for genomic heterogeneity between seeds from the initial DUS application and those from the final commercial seed lot. In an outcrossing species, this could be done by evaluating the change in allele frequencies between the initial and final seed lots after accounting for possible genetic drift. Overall, genomic markers provide a robust and effective option for improving DUS testing.

Discussion

Our analysis of the current DUS system, using UK barley as an example, has shown that morphological traits are not fit for DUS purposes.
The trait combinatorial space narrows over time, and the problem is likely worse in crop species with limited genetic variation. DUS traits with low heritabilities are not replicable outside the DUS trial, and hence these traits have limited value for variety fingerprinting. As a consequence, there is no easy way for farmers to verify the identities of the varieties sown in their fields. Genetic correlations between DUS traits and yield are detrimental to crop breeding because they constrain selection for higher yield and away from the common DUS trait combinatorial space. In addition, the current DUS process is time-consuming and costly, which is a particular burden for small breeding companies. Unfortunately, alternatives such as trait-specific markers and small marker sets are inadequate for DUS.

It is evident that the current DUS system is due for an update, and we have shown that genomic markers are the best way forward. Besides addressing various shortcomings of the current system, genomic DUS also opens up opportunities for bringing molecular editing into breeding practice and clarifies the boundary between new and essentially derived varieties. Genome-edited varieties, while remaining superior in agronomic performance, can be traced back to their original non-edited varieties. Given the role of the DUS system in granting varietal rights, it is the ideal setting for addressing the lack of genetic diversity in modern crops, which threatens food security 46. This is only possible with genomic markers. In addition, with the impacts of Brexit (in the UK and EU) and Covid-19 looming for the foreseeable future, there may be heavy restrictions on seed movement that impede the process of getting varieties onto the market. Such limitations are costly, since only a small fraction of candidate varieties end up passing the DUS test, while the rest are a waste of time and money.
With genomic markers for DUS, it is straightforward for testing centres to receive either DNA samples from breeders or marker data from another testing centre in a different country. Lastly, genomic DUS will unlock a new opportunity for an improved seed certification system to better protect breeders, farmers and

the SASA data. Therefore, we used the NIAB data for our primary analyses and the SASA data only for a comparative analysis between the two. Although we attempted to source as many varieties with DUS trait data as possible, we did not have an exhaustive list of all UK barley varieties to date, as we were limited to those that are publicly available.

Marker data from the UK national list were obtained from the IMPROMALT project

analysis that requires separation of the data by seasonal type. The trait and marker data were merged by their AFP numbers. Unlike variety names, which are occasionally recycled, AFP numbers are unique to each variety. They are also ordered by date of submission for DUS testing. Overall, 710 varieties were common to the DUS trait and marker data, and these serve as our primary data for analysis.

DUS trait comparative analysis. We calculated the DUS trait discrepancies between NIAB and SASA as the absolute values of the trait score differences. Most of the traits were scored on a scale with an increment of 1, except for traits 3, 23 and 26, which were scored as either 1 or 9. To maintain a fair comparison across all traits, we converted those trait scores from 1 or 9 to 1 or 2. All DUS trait comparisons were performed only where complete pairwise data were available for NIAB and SASA.

Additionally, we subset the DUS trait data into spring and winter barley to calculate the change in trait combinatorial space over time. This analysis was done by first sorting the barley varieties by their AFP number.
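A minimal sketch of this windowed analysis, assuming (our interpretation) that the rolling mean is the mean of all pairwise Manhattan distances among the 20 consecutive varieties in each window, with the window advancing by one variety as stated in the following sentence. The input layout (AFP number paired with a list of trait scores) and the function names are hypothetical, and this is not the authors' R code.

```python
from itertools import combinations


def manhattan(a, b):
    """Manhattan distance between two DUS trait-score vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rolling_trait_space(varieties, window=20):
    """Mean pairwise Manhattan distance within windows of `window` consecutive varieties.

    varieties: list of (afp_number, trait_scores) tuples (hypothetical layout).
    Varieties are sorted by AFP number (submission order) and the window advances
    one variety at a time; lower values indicate a narrower trait combinatorial space.
    """
    if window < 2:
        raise ValueError("window must contain at least two varieties")
    ordered = [scores for _, scores in sorted(varieties, key=lambda v: v[0])]
    means = []
    for start in range(len(ordered) - window + 1):
        block = ordered[start:start + window]
        dists = [manhattan(a, b) for a, b in combinations(block, 2)]
        means.append(sum(dists) / len(dists))
    return means


# Hypothetical data: the last five submissions all share one trait profile.
data = [(afp, [afp % 3, afp % 2, 1]) for afp in range(1, 6)]
data += [(afp, [1, 1, 1]) for afp in range(6, 11)]
print(rolling_trait_space(data, window=5))
```

Sorting inside the function means the input order does not matter, which matches the use of AFP numbers as a submission-date ordering.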
Next, using the dist function in R 47, we computed a rolling mean of the Manhattan distances over windows of 20 varieties, advancing the window by one variety at a time. The lower the mean distance, the narrower the trait combinatorial space.

Univariate mixed linear model analyses of DUS traits. By leveraging the genomic relationships among the varieties, we partitioned the DUS phenotypic variance into additive genetic and residual variances using the mmer function in the "sommer" package 48 in R 47. Briefly, the mixed model is $y = X\beta + g + e$. For any DUS trait with $n$ varieties, $y$ is an $n \times 1$ vector of DUS trait scores, $X$ is an $n \times m$ incidence matrix relating the observations to the fixed effects, $\beta$ is an $m \times 1$ vector of $m$ fixed effects, $g$ is an $n \times 1$ vector of random additive genetic effects and $e$ is an $n \times 1$ vector of residual effects. The $m$ fixed effects included the intercept, year of entry into national listing and seasonal type, although the last effect was dropped when the spring and winter barley datasets were analysed separately. The random additive genetic effect was assumed to follow a normal distribution, $g \sim N(0, \sigma_g^2 A)$, where $\sigma_g^2$ is the additive genetic variance and $A$ is an $n \times n$ additive genetic relationship matrix calculated using the A.mat function in "sommer". Similarly, the residual effect followed $e \sim N(0, \sigma_e^2 I)$, where $\sigma_e^2$ is the residual variance and $I$ is an $n \times n$ identity matrix. For every DUS trait, we fitted the model using data from the spring barley dataset ($n = 370$), winter barley dataset ($n = 335$) and combined dataset ($n = 710$). We then extracted the genetic ($\sigma_g^2$) and phenotypic ($\sigma_g^2 + \sigma_e^2$) variances and calculated heritability as $h^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$.

Calculating best linear unbiased estimates (BLUEs) for yield. We obtained the raw dry matter yield data for spring barley from Mackay et al.
19 and the Agriculture and Horticulture Development Board (AHDB) website for 509 varieties that were included in the VCU trials from 1948 to 2019. These varieties were trialled in multiple environments and years. The dry matter yield data from 1983 onwards were taken from fungicide-treated trials, whereas earlier data were taken from "best local practice" trials, in which fungicide use was left to the discretion of the managers at each trial. To account for this difference, we created a "management" variable, scored as 1 for varieties from 1983 onwards and 0 for earlier varieties.

The raw dry matter yield data were fitted with a mixed linear model using the lmer function in the "lme4" package 49 in R 47. Briefly, the raw dry matter yield was the response variable, variety was a fixed effect, and management, management-by-year, management-by-year-by-variety and management-by-year-by-location were random effects. Next, we calculated the best linear unbiased estimates (BLUEs) for yield using the emmeans function in the "emmeans" package 50

for DUS trait, $\sigma_{e2}^2$ is the residual variance for yield, $\rho_e$ is the residual correlation between the DUS trait and yield and $I$ is an $n \times n$ identity matrix. From the bivariate mixed models, we extracted the genetic correlation as $\rho_g$ and the phenotypic correlation as

$$\frac{\rho_g \sigma_{g1} \sigma_{g2} + \rho_e \sigma_{e1} \sigma_{e2}}{\sqrt{(\sigma_{g1}^2 + \sigma_{e1}^2)(\sigma_{g2}^2 + \sigma_{e2}^2)}}.$$

GWAS on DUS traits. We performed GWAS on each DUS trait using data from the spring barley dataset ($n = 370$), winter barley dataset ($n = 335$) and combined dataset ($n = 710$). For GWAS, we used a model similar to the univariate mixed linear model, as provided by the GWAS function in the "sommer" package 48 in R 47.
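The variance-component formulas above reduce to simple arithmetic once a model has been fitted. The sketch below, with hypothetical function names, takes the estimated components as plain numbers (in the original work they came from mixed models fitted in R) and returns the heritability and the phenotypic correlation between trait 1 (a DUS trait) and trait 2 (yield).

```python
import math


def heritability(var_g, var_e):
    """h^2 = sigma_g^2 / (sigma_g^2 + sigma_e^2), from additive genetic and residual variances."""
    return var_g / (var_g + var_e)


def phenotypic_correlation(rho_g, rho_e, var_g1, var_e1, var_g2, var_e2):
    """Phenotypic correlation of traits 1 and 2 from bivariate-model components.

    Numerator: genetic covariance plus residual covariance.
    Denominator: product of the two phenotypic standard deviations.
    """
    cov_p = rho_g * math.sqrt(var_g1 * var_g2) + rho_e * math.sqrt(var_e1 * var_e2)
    return cov_p / math.sqrt((var_g1 + var_e1) * (var_g2 + var_e2))


# Hypothetical components: both traits have h^2 = 0.25, genetic correlation 0.5,
# and no residual correlation.
print(heritability(1.0, 3.0))  # 0.25
print(phenotypic_correlation(0.5, 0.0, 1.0, 3.0, 1.0, 3.0))  # 0.125
```

With zero residual correlation the expression reduces to $\rho_g h_1 h_2$ (here $0.5 \times 0.5 \times 0.5$), which shows why low-heritability DUS traits can show weak phenotypic correlations with yield even when the genetic correlation is substantial.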
Briefly, the GWAS model is $y = X\beta + m_i k_i + g + e$, where $m_i$ is an $n \times 1$ vector of marker genotypes, $k_i$ is the marker effect and $i$ is the marker index, running from one to the total number of markers. The other terms are as described for the univariate mixed linear model. We declared markers significant at a false discovery rate (FDR) of 0.05, as determined with the qvalue function in the "qvalue" package 51 in R 47. Because barley is an inbreeding species, linkage disequilibrium (LD) can complicate GWAS results, especially when there is a highly significant marker. Therefore, for any trait where the most significant marker had $-\log_{10}(p) > 10$, we performed a follow-up GWAS with that marker as a fixed effect. The re-evaluation threshold was set at 10 to minimise the number of GWAS runs, as we were only interested in identifying potential peaks masked by major segregating loci. If any markers on other chromosomes were initially significant because of LD with the causative locus, these markers should drop below the significance threshold in the second GWAS.

Evaluating the usefulness of a small marker set in DUS via simulation. To evaluate the 45 DUS markers in Owen et al. 21, we simulated these markers in the progeny of known parent pairs. We used 39 of the 45 markers for simulation, as six markers were either absent or of low quality in our dataset. Based on the pedigree information, there were 212 varieties that were derived from a cross between two parents for which marker data were available. For each variety and its parents, we simulated 10,000 F6 progeny using the "AlphaSimR" package 52 in R 47. We then compared the simulated progeny with the known progeny (the variety) and its two parents, and counted the number of exact matches in the DUS markers.
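As a much simpler stand-in for the AlphaSimR simulation described above, the sketch below assumes unlinked markers, homozygous parents coded 0/1 and fully inbred progeny (no residual heterozygosity). Under these assumptions an F6-like line inherits each marker at which the parents differ from either parent with probability 1/2, and a BC1S4-like line (backcrossed once to the first parent) inherits it from that parent with probability 3/4. Because real markers are linked, results from a genetic-map-based simulation such as AlphaSimR's would generally differ; the function names and parent profiles are hypothetical.

```python
import random
from collections import Counter


def simulate_inbred_lines(p1, p2, n, p_from_p1=0.5, seed=None):
    """Simulate n fully inbred lines from homozygous parents p1 and p2 (0/1 coding).

    At each marker where the parents differ, the line fixes p1's allele with
    probability p_from_p1 (0.5 ~ F6 line, 0.75 ~ BC1S4 line backcrossed to p1).
    Markers are treated as unlinked, which is a simplifying assumption.
    """
    rng = random.Random(seed)
    return [
        tuple(a if a == b or rng.random() < p_from_p1 else b for a, b in zip(p1, p2))
        for _ in range(n)
    ]


def exact_match_rate(lines, target):
    """Fraction of simulated lines identical to `target` at every marker."""
    target = tuple(target)
    return sum(line == target for line in lines) / len(lines)


def haplotype_counts(lines):
    """Number of simulated lines carrying each distinct marker haplotype."""
    return Counter(lines)


# Hypothetical parents: 39 markers, only 4 of which differ between the parents.
parent1 = [0] * 39
parent2 = [1] * 4 + [0] * 35
f6 = simulate_inbred_lines(parent1, parent2, 10_000, seed=42)
# With 4 unlinked segregating markers there are 2**4 = 16 equally likely haplotypes,
# so roughly 1/16 of the lines are expected to match parent1 exactly.
print(exact_match_rate(f6, parent1))
print(len(haplotype_counts(f6)))  # distinct haplotypes observed (at most 16)
```

This mirrors the point made in the Results: when only a handful of the 39 markers segregate between the parents, a large share of the progeny carry identical marker profiles.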
Additionally, we bootstrapped the comparisons 1,000 times to obtain a better estimate of the mean count of exact matches. For comparisons within the simulated progeny, we tabulated the number of occurrences of each unique DUS marker haplotype among the progeny.

To evaluate the number of markers needed for DUS, we randomly sampled marker sets ranging in size from one marker to the maximum number of markers, in increments of 0.1 on the log10 scale. We then calculated the Manhattan distances from the DUS traits and from the markers using the dist function in R 47. For each set of markers, we computed the correlation between the Manhattan distances from the DUS traits and those from the marker data. In addition, we separated the DUS traits into a high-heritability group ($h^2 > 0.5$) and a low-heritability group ($h^2 < 0.5$) and computed the correlations in the same way.

Demonstrating the use of genomic markers in DUS via simulation. To test how genomic markers can be used in DUS, we chose two known spring barley parent pairs with low and high genomic distances. Acumen's parents, Propino and Quench, with a distance of 0.20, represent the low-distance option, while Berwick's parents, Riviera and Cooper, with a distance of 0.59, represent the high-distance option. From each of these parent pairs, we simulated 1,000 F6 and BC1S4 progeny using the "AlphaSimR" package 52 in R 47. We then computed the Manhattan distances for each simulated progeny group using the dist function in R 47.

References

Intellectual property rights in plant varieties. International legal regimes and policy options for national governments
General introduction to the examination of distinctness, uniformity and stability and the development of harmonized descriptions of new varieties of plants
Insights into deployment of DNA markers in plant variety protection and registration
on the common catalogue of varieties of agricultural plant species
Genetic strategies for improving crop yield
Plant genetic resources and molecular markers: variety registration in a new era
Genome-wide association mapping to candidate polymorphism resolution in the unsequenced barley genome
Seed systems in Kenya and their relationship to on-farm conservation of food crops
Enhancing African orphan crops with genomics
Evaluation of the use of high-density SNP genotyping to implement UPOV Model 2 for DUS testing in barley
Genetic variation detected by use of the M13 "DNA fingerprint" probe in Malus, Prunus, and Rubus (Rosaceae)
Discriminating maize inbred lines using molecular and DUS data
Evaluation of diagnostic molecular markers for DUS phenotypic assessment in the cereal crop, barley
Identification and DUS testing of rice varieties through microsatellite markers
Development of maizeSNP3072, a high-throughput compatible SNP array, for DNA fingerprinting identification of Chinese maize varieties
Evaluation of soybean molecular marker public resources for potential application in plant variety protection