key: cord-0010796-7100xir1
authors: LEARNER, M. A.; DENSEM, J. W.; ILES, T. C.
title: A comparison of some classification methods used to determine benthic macro‐invertebrate species associations in river survey work based on data obtained from the River Ely, South Wales
date: 2006-05-29
journal: Freshw Biol
DOI: 10.1111/j.1365-2427.1983.tb00654.x
sha: 8755d8c84765eabfc2e3dd9082c484b4750e496c
doc_id: 10796
cord_uid: 7100xir1

SUMMARY. The results of a survey of the macro‐invertebrates of the polluted River Ely, South Wales, are used as a basis for comparing several classification methods which have been used previously in river survey work to determine species groupings. The methods compared are product‐moment correlation (clustered by the nearest neighbour technique), Kendall's tau coefficient (clustered by the nearest neighbour and average linkage techniques), and Squared Euclidean‐Distance coefficient (clustered by nearest neighbour and Ward's techniques). The species groupings determined by these methods were influenced both by the association coefficient and the technique used to cluster it. Some species were grouped together by all or most of the methods. The ecological validity of these robust groups is examined. A clear recommendation regarding the most appropriate method is frustrated by incomplete knowledge of the ecological requirements of most of the aquatic macro‐invertebrates used in the data‐set. However, Kendall's tau coefficient clustered by the average linkage technique appeared to produce ecologically meaningful species groups. Product‐moment correlation was also reasonably successful and since it is based on absolute abundance data whereas Kendall's tau coefficient is based on relative abundance data, the use of the two together is recommended for determining robust groups.

Throughout the present century there has been a steady development of methods for summarizing the large quantity of biological data obtained from river surveys.
These methods, recently reviewed by Wilhm (1975), Hawkes (1977) and Hellawell (1977a, b, 1978), usually result in the production of biotic and diversity indices. They are intended to codify the response of groups of organisms to changes in water quality and have been developed principally to detect and assess pollution. Such methods are essentially pragmatic in their conception; much of the information contained within the samples remains unused, and only rarely does their use further our knowledge regarding the ecology of riverine species.

Correspondence: Dr M. A. Learner, Department of Applied Biology, University of Wales Institute of Science and Technology, King Edward VII Avenue, Cardiff CF1 3NU, Wales.

Now that computers are more readily available, greater use can be made of the range of classification and ordination methods for determining species groups within large data-sets (Greig-Smith, 1964; Clifford & Stephenson, 1975; Green, 1979). Since these methods make greater use of the information content of the samples, they are potentially a more sensitive way of identifying environmental change, and ecological knowledge is likely to be advanced by their adoption. Edwards, Hughes & Read (1975) and Hamer & Soulsby (1980) showed that the objective establishment of species groups using multivariate analysis could provide a more reliable and sensitive method for identifying environmental discontinuities, especially those induced by pollution, than the use of biotic and diversity indices. However, the attractiveness of multivariate analysis for this purpose is reduced because many multivariate techniques exist and there is evidence (Williams, 1971; Pinkham & Pearson, 1976; Clarke, 1977; Hellawell, 1978) that application of different techniques to the same data-set is likely to produce different groupings of species.
This clearly complicates comparison between studies where different methods have been applied and impedes wider acceptance of the species groups identified in any one study as being reliable and sensitive indicators of particular environmental conditions. The objectives of the present study were to establish the extent to which the application to the same data-set of some of the classification methods which have been used previously by freshwater ecologists to determine benthic macro-invertebrate species groups would produce different groupings of species, and to clarify, if possible, the appropriateness of the methods for producing ecologically meaningful species groups.

The raw data used for the comparison were obtained during a macro-invertebrate survey of the polluted River Ely, South Wales, in August 1973. The river (Fig. 1) rises about 366 m above mean sea level in moorland on the southern edge of the South Wales coalfield just west (SS 990903) of Williamstown and flows south-easterly for about 45 km before entering the Bristol Channel at Cardiff. The total catchment area is 169 km². The river above Llantrisant has cut through rock of the upper coal measures and the valleys are relatively steep sided. The mean river gradient between S1 and S4 (Fig. 1) is 9.1 m km⁻¹. Downstream of Llantrisant the river crosses the plain of the Vale of Glamorgan, composed chiefly of drift deposits (boulder clay, sands and gravels). In this region the gradient is much less, being 2.0 m km⁻¹ between S4 and S9. Flow is continuously recorded at two sites (Fig. 1). Flows for the year October 1972 to September 1973 at the upper site ranged from 9.5 × 10³ m³ d⁻¹ (August) to 3819 × 10³ m³ d⁻¹ (November) with a mean of 102 × 10³ m³ d⁻¹. Corresponding flows at the lower site were 54 × 10³ m³ d⁻¹ to 11,456 × 10³ m³ d⁻¹ with a mean of 301 × 10³ m³ d⁻¹.

The river upstream of S1 was affected by the dumping of domestic rubbish, particularly vegetable refuse.
A major decline in water quality occurred between S1 and S2, caused principally by the discharge of ammonia and inert particulate matter, and occasionally phenols and thiocyanates, from the Coedely Coking Plant. Between S2 and S4 water quality remained poor because of the effluent from an overloaded sewage works. Heavy metal contamination, especially by copper, also occurred in this reach. Downstream of S4 the river was affected by industrial sources of biodegradable organic matter and upstream of S5 the entry of the Afon Clun brought in coal particles. Between S6 and S7 the river received the effluent from an overloaded sewage works. From S7 to S9 water quality improved, the river flowing through an agricultural region until it reaches Cardiff and receiving few discharges apart from sewage effluents and land drainage. The Nant Mychydd is unpolluted for most of its length but receives an industrial effluent upstream of S3.

Sites were selected in relation to accessibility and known sources of pollution (Fig. 1). Riffles were sampled partly to reduce variability between sites and partly because they generally support a more diverse benthic macro-invertebrate fauna than pools (Hynes, 1970) and hence have greater information content. It is likely, therefore, that changes in the composition of the riffle fauna provide a more sensitive measure of environmental differences between sites. Riffle reaches predominated upstream of Llantrisant but were less frequent downstream where deposition of particles generally prevailed.

Five or ten samples of the benthic macro-invertebrate fauna were collected at roughly equal intervals across the width of the river at each site. Transect sampling has been recommended by Cummins (1962) as providing more information per unit sample effort than other sampling procedures. Each sample was collected by means of a Surber-type sampler which covered an area of 0.1 m² and which had a net of 400 µm diagonal pore-aperture (24 meshes cm⁻¹).
Edwards et al. (1972) concluded that this mesh size was the best for general sampling purposes of the three meshes they compared in the nearby River Taff system. Ten samples were collected at sites where artificial-substrate samplers were also being used. A comparison of the results obtained using the different sampling methods will be given elsewhere. Five minutes were devoted to the collection of each sample after preliminary work at S1 and S9, using one-minute sequential sampling, had shown that sampling for this length of time accounted for between 90 and 99% of the total macro-invertebrate abundance obtainable by the mesh size used and 96-100% of the species present in each part of the river bed sampled. All samples were collected over the period 16-17 August and the animals preserved in formaldehyde for later examination. Larger animals were removed from whole samples but a standardized sub-sampling procedure (Elliott, 1977, p. 135) was adopted for smaller animals.

Additional samples were collected several times during August and early September from all sites. These were used to provide live chironomid larvae for rearing to adults. In this way, all the numerically important larvae were identified to species.

Morphometric details, current velocity measurements and substrate samples were all collected a week after the collection of the macro-invertebrate samples. The results are summarized in Table 1. At least fifteen depth measurements were made at equally spaced intervals across the river at each site. Current velocity was determined by means of a small Ott current meter held (a) just below the water surface, and (b) just above the substratum, at a quarter, half and three-quarters of the distance across the river. Substrate samples were collected by means of a cylinder with a toothed rim which was forced into the bed of the river.
The cylinder delimited 0.1 m² of bed, and substrate to a depth of approximately 10 cm was transferred into a conical canvas bag, terminated by a 1-litre plastic bottle, which was attached around a large opening in the downstream side of the cylinder. One sample was collected at each site from a region of the river bed subjectively assessed to be typical of the riffle. Each sample was dried and approximately 500 g at a time were shaken for 10 min through a nest of nine 20-cm diameter sieves of 16, 8, 4, 2 and 1 mm, and 500, 250 and 63 µm mesh, respectively, by means of a mechanical sieve-shaker (Cummins, 1962). Large stones were weighed separately.

Water quality data were provided by the Welsh Water Authority for eight stations on the R. Ely (Fig. 1) for the period January 1972 to December 1973. These data are summarized in Table 2. Unfortunately, the stations used by the Authority for routine chemical sampling were unsuitable for macro-invertebrate sampling and therefore Table 2 provides only a general description of the quality of the water at the sites sampled for macro-invertebrates.

Objective techniques for the determination of species groups generally involve two steps. Firstly, a coefficient of association (sometimes dissociation) between each pair of species is calculated and secondly, the resulting matrix of coefficients is subjected to a clustering strategy which produces groups of species. The measures of association (or dissociation) used in the present study were chosen because they are widely referred to in the literature and because they have been used previously by river ecologists (Edwards et al., 1975; Hughes, 1975; Ghetti & Bonazzi, 1977; Pollard, 1977; Brooker & Morris, 1980; Scullion & Edwards, 1980). These measures were product-moment correlation, Kendall's 'tau' rank correlation and, as a distance coefficient, Squared Euclidean-Distance.
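The two-step procedure can be illustrated with a minimal Python sketch. It is not the CLUSTAN implementation used in the study: the species names and abundances are hypothetical, the function names are invented for illustration, and the grouping is read off at a single, arbitrarily chosen similarity level rather than from a full dendrogram. It relies on the fact that nearest neighbour (single linkage) groups at a given level are the connected sets of species linked by coefficients at or above that level.

```python
from itertools import combinations
from math import sqrt


def product_moment(x, y):
    """Product-moment correlation between two abundance vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def association_matrix(abundance, coefficient):
    """Step 1: one coefficient for every pair of species."""
    names = sorted(abundance)
    return {frozenset((a, b)): coefficient(abundance[a], abundance[b])
            for a, b in combinations(names, 2)}


def nearest_neighbour_groups(abundance, matrix, level):
    """Step 2: nearest neighbour groups at one similarity level."""
    parent = {name: name for name in abundance}

    def root(name):
        while parent[name] != name:
            name = parent[name]
        return name

    # Join two groups whenever any one pair of their members is similar enough.
    for pair, value in matrix.items():
        if value >= level:
            a, b = (root(name) for name in pair)
            parent[a] = b

    groups = {}
    for name in abundance:
        groups.setdefault(root(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical mean abundances of four species at five sites.
abundance = {
    "sp_A": [10, 20, 30, 40, 50],
    "sp_B": [12, 18, 33, 41, 48],
    "sp_C": [50, 40, 30, 20, 10],
    "sp_D": [45, 42, 28, 22, 9],
}
matrix = association_matrix(abundance, product_moment)
groups = nearest_neighbour_groups(abundance, matrix, level=0.9)
print(groups)
```

With these invented data the two species that increase along the site sequence form one group and the two that decrease form another. Substituting a different coefficient or a different clustering rule in either step is what produces the differences between methods examined in this paper.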
Measures were selected which required the use of quantitative data because examination of the raw data indicated that most of the numerically important species in the R. Ely occurred at most of the sites and therefore qualitative (presence-absence) data would not discriminate between the species. This view was confirmed in an analysis of the data using Jaccard's coefficient of similarity (Jaccard, 1912). Although the measures selected used quantitative data, product-moment correlation and Squared Euclidean-Distance are based on the absolute abundance of each species while Kendall's rank correlation method is based on relative abundance.

Most previous workers (Edwards et al., 1975; Hughes, 1975; Pollard, 1977; Scullion & Edwards, 1980) who have applied correlation methods to riverine macro-invertebrate data have grouped species on the basis of the degree of association between them (significance of the correlation coefficient). This is permissible if samples are collected at random and if the correlated measures are normally distributed (Greig-Smith, 1964). However, because the macro-invertebrate data resulting from the R. Ely survey did not satisfy this requirement, the correlation coefficient was used simply to describe the presence of association between sets of variables. Its use in this context does not require assumptions to be made of the statistical distribution of the variables and it was unnecessary to transform the data prior to calculating the product-moment correlation coefficient. This is advantageous because transformed data may not be so easily interpretable biologically even though a transformation may be statistically appropriate (Gertz, 1978; Meeter & Livingston, 1978).

Species restricted to a few sites only are likely to be associated on the basis of their joint absence from the majority of sites. Such associations have doubtful ecological significance because different reasons could account for the absence of species from a particular site.
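The distinction between a coefficient based on absolute abundance and one based on relative abundance can be seen in a small Python sketch. The abundances are hypothetical, and the tie-corrected form of Kendall's coefficient used here (often called tau-b) is an assumption made for illustration; it is not necessarily the exact computation performed by the package used in the study.

```python
from itertools import combinations
from math import sqrt


def product_moment(x, y):
    """Product-moment correlation: sensitive to absolute abundances."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kendall_tau(x, y):
    """Tie-corrected Kendall's tau: depends only on the ordering of sites."""
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
        elif dx == 0 and dy != 0:
            tied_x += 1          # pair tied in x only
        elif dy == 0 and dx != 0:
            tied_y += 1          # pair tied in y only
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + tied_x) * (pairs + tied_y))


# Hypothetical abundances at five sites: the two species rank the sites
# identically, but species_b is enormously abundant at the last site.
species_a = [1, 2, 3, 4, 5]
species_b = [2, 4, 6, 8, 500]

tau = kendall_tau(species_a, species_b)
r = product_moment(species_a, species_b)
print(tau, round(r, 3))
```

Because the two hypothetical species order the sites identically, the rank coefficient reports perfect association, whereas the product-moment coefficient is pulled well below 1 by the single very large count.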
To avoid associations based on joint absences, we excluded from the data matrix those species which occurred at fewer than four sites. There was no evidence that species restricted to four sites were associated on the basis of joint absences. Where a species occurred at more than four sites, any zero values were included in the matrix. Data treated in exactly the same manner were used in both Kendall's rank correlation procedure and in the calculation of the Squared Euclidean-Distance coefficient.

Although Brooker & Morris (1980) used Spearman's rank correlation coefficient, Bullock (1971) clearly preferred Kendall's tau coefficient, partly because differences in rank are treated arithmetically whereas Spearman's method treats rank differences in a geometric manner which tends to over-emphasize major differences in rank. Siegel (1956) also considered Kendall's coefficient preferable to Spearman's when dealing with small sample sizes because it has a distribution close to a normal one for sample sizes as low as nine. Kendall's tau coefficient was therefore preferred in the present comparison. Tied ranks were dealt with using the mid-rank method (Kendall, 1955).

The Squared Euclidean-Distance coefficient was the only one of the three selected which measured the degree of dissociation between species (Clifford & Stephenson, 1975). The resulting data matrix was clustered using Ward's method of hierarchical grouping (Wishart, 1975) and also the nearest neighbour method. Ward's method is most appropriately used with distance measures and therefore the nearest neighbour method was also used because of its wider applicability. Jardine & Sibson (1968) believed that this was the only method of clustering that met all the mathematical criteria they suggested should be applied to any method of cluster analysis when assessing its suitability.
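The exclusion of species recorded at fewer than four sites, described earlier in this section, amounts to a simple filter on the species-by-site matrix. The following Python sketch uses hypothetical species names and counts for nine sites; the function names are invented for illustration.

```python
def sites_occupied(counts):
    """Number of sites at which a species was recorded."""
    return sum(1 for c in counts if c > 0)


def exclude_rare_species(abundance, min_sites=4):
    """Keep species found at min_sites or more sites; zeros are retained."""
    return {name: counts for name, counts in abundance.items()
            if sites_occupied(counts) >= min_sites}


# Hypothetical mean abundances at nine sites (S1 to S9).
abundance = {
    "widespread": [12, 0, 5, 30, 8, 0, 2, 1, 9],
    "four_sites": [0, 0, 3, 0, 7, 1, 0, 0, 4],
    "three_sites": [0, 0, 0, 0, 0, 15, 20, 6, 0],
}
kept = exclude_rare_species(abundance)
print(sorted(kept))
```

The species found at only three sites is dropped, while the zero counts of the retained species stay in the matrix so that every coefficient is calculated over the same set of sites.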
Because of this assessment by Jardine & Sibson, the nearest neighbour method of hierarchical grouping was also applied to the product-moment correlation and to Kendall's tau correlation measures of association. However, nearest neighbour grouping is prone to the chaining effect (Shepherd & Willmott, 1968) in that species tend to join existing clusters because they resemble one member of the cluster although they may be very different from other members. The species associations resulting from the use of the various clustering techniques were examined carefully to make certain that the species forming the association were closely interlinked and that species were not being incorporated into the clusters by chaining. No chaining was observed in these clusters.

Despite Jardine & Sibson's (1968) views, average linkage (unweighted pair-group) clustering was applied to Kendall's tau coefficient. This technique is considered generally satisfactory although conservative by Clifford & Stephenson (1975). It would appear sensible for groups to be joined at an average level of similarity rather than at the level of closest similarity between two species, one in each group, as occurs in nearest neighbour clustering, and because of this Field & McFarlane (1968) recommended the use of average linkage clustering.

Wishart (1975) observed that Euclidean-Distance coefficients are biased towards those variables (species) which have large variances. Thus, those species with very variable abundances may assume exaggerated importance in the clustering process. In order to reduce this bias the data were standardized by subtracting the mean abundance of each species at a site from its abundance in the sample and dividing by the standard deviation. Such standardization should place more emphasis on distribution but it also means that species are being grouped more on the basis of their proportional occurrence than on their absolute abundance. Because of this,
Squared Euclidean-Distance coefficients were also calculated using non-standardized data for comparison. All the clustering methods used in the present study are available in the CLUSTAN 1C computer package (Wishart, 1975).

A list of the species found during the survey and their average abundance at each site is provided in Appendix 1. An indication of the distribution of species within replicate samples is provided by giving (a) the number of samples in which the species occurred at a site and (b) the standard deviation of the mean based on these samples, i.e. zero counts not included (Appendix 1). Elliott (1977) concluded that it is best to express population density in the form of the arithmetic mean of the original counts along with confidence limits calculated using the factor derived from the logarithmic transformation. However, this approach was frustrated by the difficulty of finding a suitable model for transforming the Ely data. The commonly used log (x + 1) was unsuitable. The average abundances (zero counts included in their calculation) were used when calculating the coefficients of association.

The species clusters resulting from the different analyses are listed in Table 3. Clearly different procedures have led to the delimitation of different species clusters. Even where the same measure of association was used, e.g. Squared Euclidean-Distance, different clustering techniques produced different species clusters. Some methods proved very conservative. The Squared Euclidean-Distance coefficient when clustered by nearest neighbour produced few species groups (Table 3) irrespective of whether the data had been standardized or not. Ward's clustering of the Squared Euclidean-Distance coefficient produced fewer species groups when used with non-standardized data than when used with standardized data.
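The effect of standardization on the Squared Euclidean-Distance coefficient can be sketched in Python as follows. The sketch assumes that each species is standardized across sites to zero mean and unit standard deviation, and that the population form of the standard deviation is used; both are assumptions made for illustration, as are the species names and counts.

```python
from statistics import mean, pstdev


def standardize(counts):
    """Express each site value as a deviation from the species mean,
    in units of the species' standard deviation (population form assumed)."""
    centre, spread = mean(counts), pstdev(counts)
    return [(c - centre) / spread for c in counts]


def squared_euclidean(x, y):
    """Squared Euclidean distance between two species across sites."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Hypothetical species with the same pattern of distribution across four
# sites but very different absolute abundances.
scarce = [1, 2, 3, 4]
abundant = [100, 200, 300, 400]

raw = squared_euclidean(scarce, abundant)
scaled = squared_euclidean(standardize(scarce), standardize(abundant))
print(raw, scaled)
```

On the raw counts the two hypothetical species are far apart because the distance is dominated by the abundant species; after standardization the distance is effectively zero, since only the pattern of distribution across sites remains. This is the sense in which standardized data group species on proportional occurrence rather than absolute abundance.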
A basic assumption underlying the use of association analysis is that the grouping of species together implies that they may have similar environmental requirements (Greig-Smith, 1964). Knowledge of the ecology of at least the commoner macro-invertebrate species in the R. Ely might have established the extent to which this was true for the methods used in this study. Unfortunately, in practice, insufficient information was available to determine this with any certainty. However, some species were grouped together by all or most of the procedures used. Green (1979) stated that if several clustering procedures, based on different algorithms related to different definitions of a group, all produce the same clusters of samples, the conclusion that they are real groups of some kind is a robust one. These robust groups of species are shown in Table 3. If these robust groups could also be shown to have ecological validity, they might be used as a basis for comparing the appropriateness of the clustering techniques used.

Group 1 is composed of the ephemeropteran Baetis rhodani and the chironomids Polypedilum (laetum group), Cricotopus trifascia, Eukiefferiella claripennis and possibly E. calvescens. This group was particularly dominant at S7 and S8 (Fig. 2). The former site was downstream of Miskin Sewage Works and had the highest maximum BOD₅, about 17 mg O₂ l⁻¹, of the sites investigated; this had declined to about 8 mg O₂ l⁻¹ at Site 8. However, the dissolved oxygen concentration remained relatively high, in excess of 4 mg O₂ l⁻¹ at both sites, because of the shallow turbulent nature of the river. Although phosphate concentrations were not determined at the time of the survey, later studies (Murphy, 1980) showed that orthophosphate concentrations could be as high as 2 mg P l⁻¹ at Site 7 and 1.2 mg P l⁻¹ at Site 8. These, together with high nitrate concentrations (Table 2), stimulated prolific growths of Cladophora at both sites.
Baetis rhodani is noted for its tolerance of pollution by rapidly biodegradable organic matter such as sewage; mild pollution of this nature promotes an increase in numbers (Hawkes, 1979). Ghetti & Bonazzi (1977) found this species to dominate the macro-invertebrate fauna at Langhirano on the Torrente Parma, Italy, where the water quality was similar to that of the R. Ely. It was less dominant at sites where the water quality was better or poorer than this. Less is known about the other species. Lehmann (1972) refers to E. claripennis (= E. hospita) as living principally among mosses in swift running water. Hawkes & Davies (1971) and Szczesny (1974) have observed the increased dominance of this species in reaches of rivers recovering from sewage pollution. The former authors believed that oxygen concentration was the principal factor determining the distribution of E. claripennis, and also B. rhodani, in the river they studied. In particular, the summer incidence of both species was progressively suppressed as the amount of organic pollution increased because of the greater severity of oxygen depletion during the summer months. Eukiefferiella calvescens is also particularly associated with the presence of mosses, but generally in unpolluted rivers (Learner et al., 1971; Lehmann, 1972). However, Wasson (1977) found this species to dominate the macro-invertebrate fauna of an organically enriched region of the Isere where Cladophora was abundant at certain times. Polypedilum and C. trifascia were particularly prominent at eutrophic sites on the R. Ely where the river was recovering from sewage pollution. Similarly, Wasson (1977) found P. laetum and C. trifascia largely restricted to organically enriched reaches of the Isere. Besch & Hofmann (1968) noted that P. laetum was particularly tolerant of organically polluted waters in the lower Steinach where it was associated with the bacterium Sphaerotilus. In the R. Ely Polypedilum (not necessarily P.
laetum itself) was the only member of Group 1 which occurred at S2 downstream of the Coedely Coking Plant.

Product-moment correlation, unlike the other methods used, linked B. rhodani with the crustacean Asellus aquaticus as a separate group whereas the use of Kendall's tau coefficient resulted in A. aquaticus being grouped with all the Group 1 species (Table 3). This is probably because Kendall's tau is calculated from the ranked values of abundance at the sites, which are very similar in this instance, whereas product-moment correlation is sensitive to differences in absolute abundance, which fluctuate more widely. Hawkes & Davies (1971) demonstrated the association between large numbers of A. aquaticus and eutrophic conditions as indicated by thick growths of Cladophora. Similarly in the R. Ely highest population densities occurred where there was an abundance of filamentous plants in the eutrophic lower reaches. Thus, the grouping of A. aquaticus with the Group 1 species would appear ecologically sound although, unlike the other Group 1 species, it was widely distributed, relatively abundant at S1 and absent from S3.

Group 2 consists of two copepod species, Paracyclops fimbriatus and the closely related P. poppei, empidids of the genus Wiedemannia, the naidid worm Pristina idrensis and the chironomid Synorthocladius semivirens. The group was particularly dominant at S1 where it formed 15% of the total macro-invertebrate population (Fig. 2). The water at this site was relatively clean although affected by the dumping of domestic refuse into one of the upstream tributaries. Pristina idrensis and the two Paracyclops species occur in sewage filter-beds (Curds & Hawkes, 1975). However, P. idrensis was restricted to sites unpolluted by organic matter in the R. Cynon (Learner et al., 1971) and, although occurring at some organically polluted sites in the R. Ely, it was most abundant at the relatively unpolluted site S1.
The Paracyclops species are benthic in habit and crawl on and within the substratum. Hynes (1974) noted P. poppei occurring in sub-surface gravels along with Pristina plumaseta. In the R. Ely Wiedemannia sp. was found particularly associated with mosses and did not seem greatly affected by organic pollution. Hughes (1975) found that Wiedemannia bistigma was most common at organically polluted sites on the R. Cynon. The association of Wiedemannia with the other Group 2 species may therefore be determined more by the distribution of aquatic mosses than by water quality. Synorthocladius semivirens appears to be typical of the potamon (Besch, Hofmann & Ellenberger, 1967; Lehmann, 1971; Brooker & Morris, 1980) but the population densities of this species are clearly enhanced in montane rivers where enrichment from sewage occurs. Szczesny (1974) recorded that this species was tolerant of gross inorganic pollution in the Kryniczanka stream in Poland, where turbulence maintained a relatively high dissolved oxygen content, but it is generally most abundant under conditions of mild organic pollution (Szczesny, 1974; Kownacki, 1977), especially where fine-grained substrates occur. It appears that the presence of Group 2 species together is generally indicative of very mild organic enrichment.

Group 3 is less clearly defined than the previous two groups (Table 3). It is composed of the worm Nais elinguis and the chironomids Conchapelopia melanops, Microcricotopus rectinervis [= Nanocladius rectinervis; see Saether (1977)] and Brillia longifurca. Nais elinguis is known to occur abundantly in shallow turbulent rivers with stony substrates which are organically polluted (Szczesny, 1974; Dumnicka & Pasternak, 1978; Learner, Lochhead & Hughes, 1978), especially where finer particles occur (Szczesny, 1974).
This species is, however, much less tolerant of high BOD values, low oxygen concentrations and high ammonia concentrations, related to gross organic pollution, than Tubifex tubifex, Limnodrilus hoffmeisteri and Lumbricillus rivalis (Korn, 1963; Dumnicka & Pasternak, 1978), which also occur in the R. Ely. Much less is known about the response of Conchapelopia melanops to pollution. This species, living primarily on vegetation, is tolerant of a wide range of conditions, being of widespread occurrence in both upland and lowland streams and along lake shores (Lehmann, 1971; Lindegaard-Petersen, 1972). The larvae of the other two chironomids in Group 3 are also found predominantly on vegetation (Lehmann, 1971; Lindegaard-Petersen, 1972). Brillia longifurca was originally considered a lake species (Thienemann, 1944) but in recent years it has been associated frequently with sewage pollution of riverine habitats (Besch & Hofmann, 1968; Hawkes & Davies, 1971; Learner et al., 1971; Szczesny, 1974) although it also occurs in unpolluted reaches (Lindegaard-Petersen, 1972). In the R. Ely it was only found at more heavily polluted sites, even occurring at S2 downstream of the Coedely Coking Plant. Microcricotopus rectinervis was first recorded in Britain quite recently (Pinder, 1974) but its occurrence in the R. Ely and in the R. Wye, particularly in the lower reaches (Brooker & Morris, 1980), indicates that it has been overlooked in the past. Lehmann (1971) found this species in the middle and lower course of the Fulda. In the Ely, although widely distributed, it was most abundant at organically polluted sites.

Group 3 species were particularly abundant at S4 where they accounted for 15% of the total macro-invertebrate population. Above this site the main river was badly polluted by sewage and coking effluent and, despite substantial dilution by relatively clean water from the Nant Mychydd, water quality at S4 is poor (Table 2).
In view of the tolerance of these species to some organic enrichment it is curious that this group was not more abundant at S7 and S8 downstream of Miskin Sewage Works where the water quality was similar to that at S4 (Table 2). A possible explanation, in view of the importance of vegetation for these species, is that aquatic mosses were present at S4 but not at S7 and S8. The other two sites where Group 3 species were relatively important, S3 and S6, were also characterized by the presence of mosses. The greater proportion of fine substrate at S4 (Table 1) may also have been beneficial for N. elinguis (Szczesny, 1974). Group 3 species were intolerant of conditions at S2 downstream of the Coking Plant and at S5, which was particularly affected by high concentrations of coal particles and high permanganate values (Table 2). It is probable that phenols and other toxic substances were also present. In the Ely it would appear that Group 3 species together are indicative of mild to moderate organically polluted conditions provided suitable plants are available.

Group 4 consists of the snail Ancylus fluviatilis, the chironomid Conchapelopia pallidula, the naidid worm Nais variabilis and the beetle Limnius volckmari. These four were grouped together by the Squared Euclidean-Distance coefficient whatever clustering method was employed (Table 3), probably because this coefficient tended to cluster together species having similar absolute abundances or similar standardized values of abundance (standardized data). This group was much less clearly differentiated by the use of Kendall's tau or product-moment correlation. Ancylus fluviatilis is used in biotic indices of water quality such as the Chandler's score (Chandler, 1970) as an indicator of unpolluted or virtually unpolluted waters. However, Wasson (1977) found greatest numbers of this species associated with the filamentous bacterium Sphaerotilus in the Isere and,
from its presence at most sites along the lower Ely, it appears relatively tolerant of sewage pollution provided the water is well aerated. Bryce et al. (1978) noted that Ancylus was absent from those reaches of the R. Lee system where the oxygen saturation value remained below 50% for considerable periods. Limnius volckmari is also generally considered to be intolerant of organic pollution but it did occur in reduced numbers in the middle and lower reaches of the Ely and is able to survive in places where turbulence keeps the water well aerated (Learner et al., 1971). Conchapelopia pallidula inhabits a wide variety of substrates (Dittmar, 1955). Learner et al. (1971) found it at all the sites they investigated on the R. Cynon including those polluted by sewage and coal particles. However, in the present study this species had a more restricted distribution (Appendix 1), particularly in the lower reaches of the Ely. Nais variabilis has frequently been found associated with sewage pollution in shallow turbulent streams and rivers (Learner et al., 1971; Szczesny, 1974; Kownacki, 1977) although this species may be less tolerant than Nais elinguis and N. barbata (Korn, 1963), both of which also occur in the Ely. Although all the Group 4 species are tolerant of some organic pollution, they were most important at the very mildly polluted sites (S1 and S3) where they accounted for 6% and 20% of total macro-invertebrate abundance respectively (Fig. 2).

The anthomyiid Limnophora, the cyclopoid Eucyclops agilis, and the chironomids Macropelopia nebulosa and Cricotopus tremulus were grouped together by virtually all the clustering procedures used. However, their contribution to macro-invertebrate abundance was about 1% or less at all sites and the significance of their association is unclear.
Product-moment and Kendall's tau coefficients of association indicate that the worm Nais barbata, and the chironomids Paratrichocladius rufiventris and Cricotopus bicinctus may also be part of Group 5. These three species were generally very abundant at all sites affected by organic pollution except S9 (Fig. 2). Nais barbata often occurs abundantly in the organically polluted reaches of rivers with stony substrates, especially in association with Cladophora (Learner et al., 1978). Both P. rufiventris and C. bicinctus similarly occur abundantly in such environments although the filamentous alga may not necessarily be Cladophora (Hawkes & Davies, 1971; Learner et al., 1971; Wasson, 1977). Cricotopus bicinctus has also been found in large numbers associated with growths of Sphaerotilus (Wasson, 1977). These three species were together most abundant at S4 where they accounted for over 60% of total macro-invertebrate abundance. In this respect they show some similarity to Group 3 but unlike Group 3 they were also dominant at S5.
This group consists of the snail Lymnaea peregra, the leech Glossiphonia complanata, the amphipod Gammarus pulex and the water mite Hygrobates fluviatilis. The tolerance of L. peregra to sewage pollution is well known (Hawkes, 1979), probably because, being air-breathing, it is unaffected by low oxygen concentrations. Dussart (1979) observed that the abundance of this species tended to increase as the calcium, potassium and chloride (probably related to sewage pollution) content of the water increased. Lymnaea peregra was most abundant at S9 on the R. Ely where the alkalinity was the highest of any site sampled and the chloride concentration was relatively high. This site was one of the least polluted of the sites investigated (Table 3). Glossiphonia complanata occurred at most sites on the R. Ely but was found in greatest numbers at S9.
Murphy (1980), in a more recent survey of the Ely, noted that high summer densities of this species occurred where the river was unpolluted or mildly polluted with sewage. Bryce et al. (1978) refer to this species as occurring in unpolluted or mildly polluted waters in the R. Lee system, and Matysiak (1976, 1978) concluded that G. complanata is less tolerant of sewage than the often co-existing leech Erpobdella octoculata. This latter species was much more abundant in the R. Ely than G. complanata but was restricted to Sites 7, 8 and 9 (Appendix 1) and could not be included in the species-association analysis. Greatest numbers of E. octoculata occurred at S7 and lowest at S9, which is the converse of the pattern of G. complanata abundance. The association between G. complanata and L. peregra probably indicates some similarity in environmental response but there may also be a biotic influence in that molluscs are the chief prey of G. complanata (Elliott & Mann, 1979). Young & Ironmonger (1980) have shown this species, when offered a wide variety of foods under laboratory conditions, to feed extensively on L. peregra. Gammarus pulex is adversely affected by sewage pollution because of its intolerance of low oxygen concentrations (Hawkes & Davies, 1971). However, this species may be fairly numerous in the well-aerated riffles of organically enriched reaches of turbulent rivers (Hawkes, 1979); this appears to be the situation in the lower Ely. The water mite Hygrobates fluviatilis was widespread in the R. Ely. The larvae are parasitic on chironomid midges (Jones, 1967) but the host spectrum is not known. Learner et al. (1971) also found this species to be widespread in the R. Cynon. However, highest populations in both rivers occurred at unpolluted or mildly polluted sites. Group 6 species occurred in relatively low numbers but accounted for 7% of total macro-invertebrate abundance at S9 (Fig. 2).
This was one of the least polluted sites but was probably not that dissimilar in water quality from S1 and S3, and yet this group was, for unknown reasons, much less abundant at these latter sites.
These groups (Table 3) are best dealt with together. The tubificid worm Tubifex tubifex and the enchytraeid worm Lumbricillus rivalis are associated together as a species pair by all the association methods used in the present study except Kendall's tau, which groups the tubificid worm Limnodrilus hoffmeisteri along with them. This latter species was grouped with the ephemeropteran Baetis scambus by the other methods. Lumbricillus rivalis and T. tubifex often occur together in shallow turbulent rivers wherever pollution from sewage occurs (Learner et al., 1971; Edwards et al., 1972; Dumnicka, 1978; Dumnicka & Pasternak, 1978; Scullion & Edwards, 1980) although Dumnicka (1978) also noted the co-dominance of L. rivalis and the tubificid Limnodrilus udekemianus where the substrate was slimy sand rather than mud. The latter species was not found during our study although it does occur in low numbers in the R. Ely (Murphy, 1980). The L. rivalis-T. tubifex combination appears to be much less important in sewage polluted reaches of lowland rivers. It is probable that L. rivalis is less tolerant of muddy substrates than is T. tubifex because of differences in respiratory behaviour (Berg, Jonasson & Ockelmann, 1962). The respiratory rate of L. rivalis is dependent upon the environmental oxygen concentration whereas that of T. tubifex is not, except at very low oxygen concentrations. Dumnicka & Pasternak (1978) found that the abundance of L. rivalis decreased as the dissolved oxygen concentration decreased from 75 to 1.6% of the air-saturated value; T. tubifex abundance was not correlated with dissolved oxygen concentration. In the R. Ely these two species were an important component of the fauna at Sites 5-9, all of which were organically polluted.
However, these sites were also affected by the deposition of coal particles. The amount of fine matter in the sediments increases from S5 to S9 (Table 1) and so also does the abundance of both species. Ladle (1971) found much larger populations of T. tubifex in fine sediments compared with coarser ones. It is likely that the L. rivalis-T. tubifex association (Group 8) indicates a common response to sewage pollution and also to an increased incorporation of fine inert material into the substratum. This group also dominates the macro-invertebrate fauna at S2 where it accounts for 76% of total abundance. This is undoubtedly partly a response to sewage pollution (a stormwater overflow constantly discharged raw sewage upstream of Site 2), coupled with a tolerance of particularly high ammonia concentrations (Table 2). Limnodrilus hoffmeisteri also occurred at S2 but although it often occurs with T. tubifex in organically enriched rivers (Aston, 1973) and lakes (Milbrink, 1980), the two species do not necessarily display the same pattern of quantitative response to environmental change (Brinkhurst & Kennedy, 1965; Learner et al., 1971; Szczesny, 1974; Wisniewski, 1976; Dumnicka & Pasternak, 1978; Sarkka & Aho, 1980). Although published observations do not provide an explanation for the differences in the pattern of quantitative change between these two species in the R. Ely, they do indicate that there may be an ecological justification for grouping T. tubifex and L. rivalis separately from L. hoffmeisteri. As stated earlier, those species-association coefficients based on absolute, rather than relative, abundance data grouped L. hoffmeisteri with Baetis scambus (Group 7). The latter species was the dominant ephemeropteran in the organically polluted but well aerated reaches of the R. Cynon (Learner et al., 1971) and Bryce et al. (1978) also noted a possible association between B. scambus and organic enrichment in the R. Lee. In the R. Ely, B.
scambus was the dominant ephemeropteran at Sites 1, 8 and 9, which were relatively mildly polluted by sewage, whereas B. rhodani dominated at the more grossly polluted sites. Little information is available in the literature concerning the environmental requirements of B. scambus because until recently (Macan, 1979) it was considered impossible to distinguish between B. scambus nymphs and those of B. bioculatus unless the nymphs were reared or adults collected to determine which species was present. Scullion & Edwards (1980) have shown that the abundance of B. scambus is greatly reduced in the presence of coal particles. However, the concentrations in the river they studied were greatly in excess of those in the R. Ely and it is unlikely that the distribution of B. scambus in the R. Ely was influenced in this way. Group 7 species were a particularly important part of the fauna at S9, but their importance at S1 is almost entirely due to B. scambus and at S2 entirely due to L. hoffmeisteri; therefore, it is not clear whether the grouping together of these two species has ecological support.
One objective of this study was to determine which methods produced ecologically meaningful species groups. In general there appears to be ecological support for most of the robust groups identified (Table 3). However, none of the methods we compared produced species groupings identical with the robust groupings. Nearest neighbour clustering of the Squared Euclidean-Distance coefficient lacked discrimination. This method produced some small groups that seem ecologically meaningful but the tendency was for species to be associated into one or two large groups for which there appears to be little ecological justification. Of the less conservative methods,
Ward's clustering of the Squared Euclidean-Distance coefficient based on standardized data grouped species from Robust Groups 1, 5 and 6 together, which does not appear ecologically realistic in view of their dominance distributions (Fig. 2), whereas most of the species groups produced by the product-moment and Kendall's tau methods appear reasonable on the basis of the present state of knowledge about the species concerned. However, several species included in groups generated by nearest neighbour clustering of the product-moment correlation coefficient and by average linkage clustering of the Kendall's tau rank coefficient were not included when nearest neighbour clustering of Kendall's tau coefficient was used. Of the methods we compared, Kendall's tau coefficient clustered by the average linkage technique appears most likely to provide ecologically meaningful species groups. Product-moment correlation is also useful and because it is based on absolute abundance data whereas Kendall's method is based on relative abundance data, the use of the two together provides a reasonable method for determining robust groups.
This study supports experience gained elsewhere (Williams, 1971; Pinkham & Pearson, 1976; Clarke, 1977) that species groups detected by the use of similarity measures and clustering techniques are to an appreciable extent influenced by the particular methods used. It is not surprising therefore that the species groups determined as a result of the present study bear little relationship to the macro-invertebrate species groups determined for the R. Cynon (Edwards et al., 1975; Hughes, 1975; Pollard, 1977), another river of the South Wales coalfield somewhat similar to the R. Ely. However, other aspects are also important apart from the statistical procedures adopted. Although the above authors used similar macro-invertebrate sampling techniques and the same mesh size as ourselves, data refer to different times of the year.
Edwards et al.'s (1975) data refer to July-August, Hughes (1975) used the maximum abundance of each species attained during a year, Pollard's (1977) data refer to May and ours were obtained during August. Undoubtedly species associations based on quantitative data will be affected by seasonal changes in population abundance associated with the life-histories of the species involved and also with any seasonal changes in water quality (Hawkes & Davies, 1971; Frost, Chiu & Thomas, 1976; Clarke, 1977). Season will have less effect on the structure of the species groups based on presence-absence data but, as mentioned earlier for the Ely data, this approach may be too insensitive to be useful. Hamer & Soulsby (1980) also found quantitative differences to be more important than qualitative ones when differentiating between stream macro-invertebrate faunas in their study.
For many years river ecologists have attempted to distinguish macro-invertebrate communities characteristic of particular river types or particular river zones. We believe the statistical establishment of species groups is an important tool for this work, as well as for detecting the presence of pollution. However, Williams (1971) points out that many procedures exist for classifying data and the problem is not so much one of finding a method of analysis but of choosing the most appropriate one from the many that exist. We hope this study will aid selection of an appropriate method.
Tubificids and water quality: a review. Environmental Pollution
Le macrobenthos sur des substrats de polyéthylène dans les eaux courantes. 2. La Steinach, une rivière de la zone à truite
Das Makrobenthos auf Polyäthylensubstraten in Fliessgewässern. 1.
Die Kinzig, ein Fluss der unteren Salmoniden- und oberen Barbenzone
The respiration of some animals from the profundal zone of a lake
Studies on the biology of the Tubificidae (Annelida, Oligochaeta) in a polluted stream
A survey of the macroinvertebrate riffle fauna of the R
The investigation of samples containing many species. 2. Sample comparison. Biological Journal of the Linnean Society
A biological approach to water quality management
The use of multivariate techniques in analysing effects of industrial effluents on benthic communities in Central Canada
An Introduction to Numerical Classification
An evaluation of some techniques for the collection and analysis of benthic samples with special emphasis on lotic waters
Ecological Aspects of Used-water Treatment
Ein Sauerlandbach
Communities of oligochaetes (Oligochaeta) of the River Nida and tributaries
(1978) The influence of physico-chemical properties of water and bottom sediments in the River Nida on the distribution and numbers of Oligochaeta. Acta Hydrobiologica
Life cycles and distribution of the aquatic gastropod molluscs Bithynia tentaculata (L.), Gyraulus albus (Muller)
A biological survey of the River Taff
Biological survey in the detection and assessment of pollution
Some methods for the statistical analysis of samples of benthic invertebrates
A key to the British freshwater leeches
Numerical methods in marine ecology 1. A quantitative 'similarity' analysis of rocky shore samples in False Bay
Seasonal changes of invertebrate populations in the polluted River Medlock
Use of ranking methods to assess environmental data
A comparison between various criteria for the interpretation of biological data in the analysis of the quality of running water
Sampling Design and Statistical Methods for Environmental Biologists
Quantitative Plant Ecology
An approach to chemical and biological river monitoring systems
River zonation and classification
Biological classification of rivers: conceptual basis and ecological validity
Invertebrates as indicators of river water quality
Some effects of organic enrichment on benthic invertebrate communities in stream riffles
Biological surveillance and water quality monitoring
Change in natural and managed ecosystems: detection, measurement and assessment. Proceedings of the
Biological Surveillance of Rivers
(1970) The Ecology of Running Waters
Further studies on the distribution of stream animals within the substratum
The distribution of the flora of the Alpine zone
The construction of hierarchic and non-hierarchic classifications
Choice of methods of automatic classification
Descriptions of the larvae of Aturus scaber Kramer, Protzia eximia Protz, and Piona uncata Koenike with notes on the life-histories of the latter two
Rank Correlation Methods, 2nd edn
Studien zur Ökologie der Oligochaeten in der oberen Donau unter Berücksichtigung der Abwassereinwirkungen
Biocenosis of a high mountain stream under the influence of tourism 4. The bottom fauna of the stream Rybi Potok (the High Tatra Mountains)
The biology of Oligochaeta from Dorset chalk streams
A review of the biology of British Naididae (Oligochaeta) with emphasis on the lotic environment
Die Chironomiden der Fulda. Archiv für Hydrobiologie, Supplement, 37, 466-
Lehmann J. (1972) Revision der europäischen Arten (Puppen ♂♂ und Imagines