ORIGINAL ARTICLE

T. T. Carpenter · A. S. H. Kent

A new method of quantifying endometriosis using digital photography

Received: 19 December 2004 / Accepted: 18 February 2005 / Published online: 13 May 2005
© Springer-Verlag Berlin / Heidelberg 2005

Abstract The revised American Fertility Society scoring system for quantifying endometriosis is a relatively insensitive tool when assessing peritoneal endometriosis. We describe a new technique for quantifying endometriosis that uses digital photography and a specifically designed computer analysis package to calculate lesion surface area. Using this technique we were able to demonstrate good intra-observer reproducibility, although inter-observer variability was relatively poor.

Keywords Endometriosis · Quantification

Introduction

Although the revised American Fertility Society scoring system (rAFS) [1–3] is the standard technique for classifying endometriosis, it is rather limited in its discriminatory powers when dealing with peritoneal disease. Because it was developed as a scoring system intended to correlate with fertility, points are allocated heavily for disease affecting the fallopian tubes and ovaries, with very few points being available for peritoneal disease. In fact, for peritoneal disease alone the highest possible score is six, which places it in the mild category. Thus, if one is specifically interested in change in peritoneal disease, the rAFS is a rather blunt tool.

In view of this, we undertook a pilot study to assess the feasibility and reproducibility of using digital imaging to assess the surface area of specific endometriotic lesions.

Method

Subjects

This was a three-centre study with multicentre regional ethics committee approval. Six patients were recruited (two from each site).
Patients were eligible if they were due to undergo a laparoscopy for suspected or known endometriosis, were over 18 years old, had all gynaecological organs present and had given informed consent. There was no limitation on stage of disease.

Image capture

All patients underwent laparoscopy in the proliferative phase of the cycle. Laparoscopic entry was carried out according to the usual practice of the gynaecologist, with particular care being taken when the uterus was instrumented not to be too vigorous with bimanual examination. On entry, the abdominopelvic cavity was inspected in the usual way, at which time an endometriotic "index" lesion was selected. This was a lesion that was anatomically relatively easy to photograph with no or minimal manipulation, and next to which a needle could be placed. The operating camera system was then detached from the laparoscope and replaced by a Nikon COOLPIX 4500 digital camera with a special adaptor to allow attachment of the camera to the end of the laparoscope. The camera had 4.0 megapixel resolution.

Camera set-up

A straight 13 mm surgical needle (with 2.0 vicryl attached) was introduced via a second port and placed as close to the lesion as possible in the same plane. The light setting on the camera was set to incandescent with a light intensity of zero (range −3 to +3).

T. T. Carpenter (✉) · A. S. H. Kent
Dept. of Gynaecology, Royal Surrey County Hospital, Egerton Road, Guildford, GU2 7XX, UK
E-mail: t.carpenter@doctors.org.uk

T. T. Carpenter
7 Pointout Road, Southampton, SO16 7DL, UK

Gynecol Surg (2005) 2: 119–125
DOI 10.1007/s10397-005-0095-7

Using auto focus, supplemented by manual focus where necessary, the lesion was photographed close-up, ensuring that the whole lesion and the needle were in view. The laparoscope was then pulled back, the camera refocused, and a wide-angle picture taken to allow the location of the lesion to be demonstrated.
The laparoscope and needle were then removed and the camera system detached. The whole process was then repeated to obtain a second set of images of the same lesion. The remainder of the laparoscopy was completed by laser ablation of all visible endometriosis, and the abdominal port sites were closed according to the usual practice of the operating gynaecologist.

Images captured on the digital flash card were then downloaded onto the hard drive of a Compaq Evo computer, and copies of each pair of images (close-up and wide angle) were stored on individual compact discs.

Image analysis

The images were analysed independently by two gynaecologists using a specifically prepared software package produced by Virtualscopics, Rochester, NY, USA. The 12 individual close-up images were presented in random order for each assessor to quantify.

The surface area was calculated by first defining the needle for scale. The individual components of each lesion (red, black and white, as defined by the rAFS definition [1]) were then circumscribed by the investigators using any combination of the various functions. Functions available were:

Filters: Red, Green, Blue, Black, White

Delineators:

Live wire mode: Allows you to optimise the path drawn between successive user-defined points.
Shrink-wrap mode: Allows you to define a structure by roughly tracing the outside perimeter of a well-defined structure.
Region growth mode: Allows you to identify, with one mouse click, an entire well-defined structure.
3-D region growth mode: Similar to the region growth mode, but the growth will proceed in three dimensions.
Geometrically constrained region growth (GEORG) mode: Allows you to define the shape of the geometric model. The model is used to smooth region boundaries and limit growth outside a structure of interest.
3-D GEORG mode: Operates on the same principle as GEORG, with the additional functionality of growth proceeding in three dimensions.
Add mode: Allows you to use free-hand tracing to modify a currently finalized (red) contour by adding a new area.
Adjust mode: Allows you to modify the active contour. Each time the left mouse button is clicked, the contour is forced to pass through the clicked point.
Continuous trace mode: Allows you to perform free-hand tracing of structure boundaries.
Erase mode: Allows you to modify the currently finalized contour by using free-hand tracing to delete the unwanted portion of the region.
Polygon mode: Allows you to manually trace structure boundaries by connecting points that the user made in the Image window. Straight lines are used to connect points defined by successive mouse clicks.
Rectangle mode: Allows you to define a rectangle.
Select mode: Allows you to convert a finalized (red) contour to an active (blue) contour.

When the investigator was happy with the defined area, the image was finalised and the computer calculated the surface area using the needle as a reference.

Statistical analysis

Intra- and inter-observer reproducibility was assessed by variance, coefficient of variation, and subjectively using plots.

Results

The total lesion area and the breakdown of red, black and white components as assessed by the two assessors are shown in Table 1. The range in sizes of the index lesions selected for assessment is wide (0.5–69 mm²). The index lesions for three out of the six subjects contained only red tissue and had no black or white component. These were also the three smallest lesions. The remaining three lesions had all three component areas, with the largest component being white scar tissue.

Plots of the lesion areas for the duplicate assessments made on each subject are presented by area type in Figs. 1, 2, 3 and 4.

The within- and between-subject variance components of the index lesion assessments are shown in Table 2 and the within-subject coefficients of variation (SD/mean) are shown in Table 3.
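The variance components and coefficients of variation named above can be sketched in a few lines. The paper does not state which estimator was used, so the one-way random-effects estimator below is an assumption, and the function names are hypothetical. Applied to the assessor 1 red areas from Table 1, it gives values close to the Red row for assessor 1 in Tables 2 and 3.

```python
from math import sqrt
from statistics import mean, variance


def variance_components(pairs):
    """Estimate variance components from duplicate measurements.

    ASSUMPTION: a one-way random-effects model; the paper does not name
    its estimator. `pairs` holds one (image A, image B) tuple per subject,
    in mm^2. Returns (between_subject, within_subject, overall_mean).
    """
    # Within-subject component: average of the per-subject sample variances.
    within = mean(variance(p) for p in pairs)
    subject_means = [mean(p) for p in pairs]
    # The variance of the subject means includes within/k (k = 2 replicates),
    # so subtract it to isolate the between-subject component.
    k = 2
    between = max(variance(subject_means) - within / k, 0.0)
    return between, within, mean(subject_means)


def cv_percent(var, overall_mean):
    """Coefficient of variation (SD / mean) expressed as a percentage."""
    return 100.0 * sqrt(var) / overall_mean


# Red areas, assessor 1, images A and B for subjects 1 to 6 (Table 1).
red_assessor1 = [(0.54, 0.53), (2.01, 2.18), (25.72, 26.11),
                 (1.75, 2.29), (14.38, 14.19), (12.5, 9.68)]

between, within, grand_mean = variance_components(red_assessor1)
cv_within = cv_percent(within, grand_mean)
cv_total = cv_percent(between + within, grand_mean)
print(round(between, 2), round(within, 2), round(cv_within, 1), round(cv_total, 1))
```

The same function can be run on any other column of Table 1 by swapping in the corresponding pairs of duplicate measurements.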
Table 1 Lesion surface areas (mm²) as assessed by each assessor (A1 = assessor 1, A2 = assessor 2)

Subject Image Total         Red           Black         White         Subject mean  Centre mean
              A1     A2     A1     A2     A1     A2     A1     A2     A1     A2     A1     A2
1       A     0.54   0.48   0.54   0.48   0      0      0      0      0.54   0.55   1.32   3.04
1       B     0.53   0.62   0.53   0.62   0      0      0      0
2       A     2.01   3.74   2.01   3.74   0      0      0      0      2.10   5.53
2       B     2.18   7.32   2.18   7.32   0      0      0      0
3       A     68.94  15.57  25.72  9.6    9.03   5.97   34.19  0      67.73  17.34  45.63  21.85
3       B     66.52  19.1   26.11  10.68  10.21  8.42   30.2   0
4       A     19.13  17.94  1.75   0.91   0.25   0.6    17.13  16.43  23.54  26.37
4       B     27.94  34.79  2.29   1.47   0.65   1.28   25     32.04
5       A     14.38  11.4   14.38  7.3    0      0      0      4.1    14.29  11.63  22.29  19.42
5       B     14.19  11.85  14.19  8.91   0      0      0      2.94
6       A     33.09  35.43  12.5   8.03   1.78   1.19   18.81  26.21  30.29  27.22
6       B     27.49  19.01  9.68   6.65   1.56   1.55   16.25  10.81

Fig. 1 Plots of total lesion surface area for each pair of lesions and for each assessor

Discussion

The ranges in lesion size and lesion components in these patients are quite wide, with some small lesions having only red components and other larger lesions being dominated by white areas.

There is no single standard "test" that can be applied to assess the reproducibility of a measurement. Instead one must use a combination of quantitative and qualitative assessments.

In this experiment we used the coefficient of variation (CV) to assess variability quantitatively; however, this should be interpreted with great caution, as the number of assessments is small and the CV will therefore be markedly influenced by any outlying values. As a general rule, a CV of 100% is used as the cut-off for acceptable variability, but the figure is obviously a continuum, with lower figures indicating less variability. The CVs obtained in this study for both assessors show good reproducibility for within-subject analysis, with all figures being below 100%. Variability in all cases was less for assessor 1 than for assessor 2, with the within-subject variation contributing only between 0.7 and 3.6% of the total variance for assessor 1, compared to between 6.7 and 34% for assessor 2. However, it is important to note that the percentage figures given in Table 2 are proportions of the total variance. Therefore, if the between-subject variance component is high, as in the case of assessor 1, then proportionately, and thus as a percentage, the within-subject variance will be low. This can give an artificial impression of a smaller within-subject variance than is actually the case. Looking at the absolute figures for variance, it is still clear that assessor 1 gave consistently less within-subject variability than assessor 2, although to a lesser extent than is apparent from the percentage values.

Upon reviewing the techniques used by both assessors, assessor 1 used considerably more manual tracing of the regions via the live wire mode than assessor 2, who predominantly used the more automated functions of the region growth and geometrically constrained region growth modes. This would imply that, although more time-consuming, semi-automated manual drawing of the lesions is a more reproducible technique.

In both cases the most reproducible component of analysis was the red area. This is encouraging, as red areas are considered most active [4], and thus if one were to test any new treatment for active endometriosis, one would expect these areas to respond first and possibly to a greater extent.

The total CVs for the two assessors clearly showed far greater variability.
As a function of both between-subject variance and within-subject variance, this would be expected due to the wide ranges in lesion sizes and compositions. When assessing the efficacy of any treatment (or indeed a simple change in disease) one is interested in the changes in individual lesions or components of individual lesions, not the changes in total areas of lesions in a combined set of patients. As such this variation is unimportant.

As explained earlier, assessment of reproducibility is enhanced by the combination of both quantitative and qualitative methods. Thus the plots of lesion size are equally important when drawing conclusions on reproducibility, and they also allow us to assess intra- and inter-observer variability. From Tables 1, 2 and 3 and Figs. 1, 2, 3 and 4, the intra-observer reproducibility of the red and black areas appears very good in both observers. Note, however, that assessor 1 found no black areas in three of the subjects and assessor 2 found no black areas in two subjects, thereby making the plots of the black areas for the six subjects look better than perhaps is truly the case. Red areas, however, were present in all subjects, and the reproducibility is good throughout. The total lesion areas and white lesion areas do seem to show greater variability, with the greatest variability being apparent in the larger lesions. The reduced reproducibility of these areas compared to the red and black areas is probably related to the fact that the red and black areas have more obvious borders and thus confines. We are looking at lesions on a background of peritoneum which itself has a white/greyish appearance.

Fig. 2 Plots of red lesion surface area for each pair of lesions and for each assessor

Fig. 3 Plots of black lesion surface area for each pair of lesions and for each assessor
Identifying a border between a white area of endometriosis and normal peritoneum is therefore considerably harder, and hence more prone to error, than defining a border between a red or black area and normal peritoneum. The greater variability seen in the larger lesions is most likely a consequence of size. The potential for error increases with lesion size since the border to be defined is longer. In addition, we show actual lesion size in the plots rather than relative differences between the two plotted assessments. Thus, a 20% difference in actual surface area between the two assessments of a small lesion will be markedly smaller than a 20% difference in a larger lesion. In reviewing plots of the two assessments, the larger lesion will have a much steeper slope between the two points than the smaller lesion, despite the relative difference in the two assessments being the same.

Fig. 4 Plots of white lesion surface area for each pair of lesions and for each assessor

Table 2 Variance of lesion surface areas for each assessor (A1 = assessor 1, A2 = assessor 2)

Lesion  Between-subject    Within-subject    Total variance     % of total from    % of total from
area    component          component                            between-subject    within-subject
        A1       A2        A1      A2        A1       A2        A1      A2         A1      A2
Total   609.58   94.6      9.57    48.3      619.15   142.9     98.5    66         2.5     34
Red     96.82    14.17     0.71    1.57      97.53    15.74     99.3    90         0.7     10
Black   14.45    7.62      0.13    0.55      14.58    8.17      99.1    93.3       0.9     6.7
White   186.95   97.03     7.03    40.18     193.98   137.21    96.4    70.7       3.6     29.3

Assessment of inter-observer variability is best made by viewing Figs. 1, 2, 3 and 4.
The more horizontal the line, the lower the intra-observer variability, and the closer each pair of lines lies to each other, the lower the inter-observer variability. From all plots it is clear that the reproducibility of analysis is considerably worse between assessors. The differences between the assessors when assessing the black areas do not seem great; however, as previously mentioned, this is because no black areas were present in two of the subjects (as assessed by both assessors), and with the exception of subject 3 the black areas in the other subjects were very small. The lesion from subject 3 showed wide variation in all areas of assessment between the two assessors. On review, this is a very complex lesion, with a potentially large chance of error.

There was no consistent difference between the assessors, although it is interesting to note that whilst the red area was most reproducible in within-observer analysis, it was in fact the area that showed most variation between observers. Thus it would appear that, although the borders of these areas are easier to define than those of other areas, the subjective decision as to whether an area is red, black or white still remains to be made by the assessor, and this appears to show more variability.

When assessing reproducibility one must be aware of the components contributing to the variability. The tables and figures concentrate on the variability within and between the two assessors. In fact the pairs of lesions analysed in each case are not the same image of the lesion but two different images of the same lesion taken at different times (albeit during the same operation). Thus the within-subject variation is a function not only of the variability of the assessor but also of the variability of the image taken.
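One route by which image variability can reach the measured area is the needle used for scale. The sketch below assumes that the software converts a pixel count to area using a single scale factor derived from the 13 mm needle; the paper does not describe the package's internals, so the function names and pixel figures here are hypothetical. Under that assumption the area depends on the square of the scale, so a needle that appears 5% shorter in the image (for example because it lies slightly out of the plane of the lesion) inflates the area by roughly 11%.

```python
NEEDLE_LENGTH_MM = 13.0  # straight needle length stated in the Method


def mm_per_pixel(needle_length_px, needle_length_mm=NEEDLE_LENGTH_MM):
    """Linear scale implied by the needle as traced in the image."""
    return needle_length_mm / needle_length_px


def area_mm2(lesion_pixels, needle_length_px):
    """Convert a lesion pixel count to mm^2 (ASSUMED single-scale model)."""
    scale = mm_per_pixel(needle_length_px)
    # Area scales with the square of the linear scale factor.
    return lesion_pixels * scale ** 2


# Hypothetical numbers, chosen only for illustration.
reference = area_mm2(lesion_pixels=20000, needle_length_px=650.0)
# Same lesion outline, but the needle appears 5% shorter in the image.
foreshortened = area_mm2(lesion_pixels=20000, needle_length_px=617.5)
relative_error = foreshortened / reference - 1.0
print(reference, round(relative_error, 3))
```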
Whilst efforts have been made to minimise the variability between the images, inevitably the photographs will not be taken at exactly the same angle, the needle will not be in exactly the same position for each photograph, and other variables will differ slightly between them too. It is important to be aware of these differences, as they are likely to be minimised in this study by the fact that the images were captured during the same operation a short time apart. In studies assessing the efficacy of a treatment over time, the images will be captured during separate operations some time apart, almost certainly increasing the variability.

The aim of developing new analysis techniques is to facilitate the detection of clinically significant effects of a treatment on a disease, which in this case means that we would like to detect a clinically significant difference in endometriotic lesions. In endometriosis, whilst one can make assumptions as to what would be a clinically significant reduction in pain, for example, the lack of correlation of symptoms with disease means that it is not possible to logically select a value for clinically significant change in lesion surface area (with the exception of total disease eradication). This technique is therefore most useful as a research tool. For ethical and economic reasons it is not possible to run long, large-scale trials of new treatments without some initial indication of efficacy. Research tools such as this will allow investigators to detect changes, which may be relatively small, over a short period of time, which may then be used to justify a larger, more pragmatic study of a particular treatment.

What is important, however, is to be aware of the ability, or limitations, of a test used to detect a difference.
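One conventional way to express that limitation is a repeatability coefficient: the smallest difference between two measurements of the same lesion that would be unlikely to arise from measurement variability alone. This is not a calculation reported in the paper. The sketch below applies the standard 1.96 × √2 × within-subject SD formula to the within-subject variance components for total lesion area in Table 2, and assumes roughly normal errors that do not depend on lesion size, which may not hold here.

```python
from math import sqrt


def smallest_detectable_difference(within_subject_variance, z=1.96):
    """Repeatability coefficient for a single pair of measurements.

    The difference of two measurements of the same lesion has variance
    2 * within_subject_variance, so about 95% of such differences are
    expected to fall below z * sqrt(2 * variance). Not taken from the paper.
    """
    return z * sqrt(2.0 * within_subject_variance)


# Within-subject variance components for total lesion area (Table 2).
sdd_assessor1 = smallest_detectable_difference(9.57)
sdd_assessor2 = smallest_detectable_difference(48.3)
print(round(sdd_assessor1, 2), round(sdd_assessor2, 2))  # values in mm^2
```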
A test that has relatively high variability is going to be unable to detect small differences between two groups because the difference will be "masked" by the background intrinsic variability. From the results obtained using this technique, assessor 1 should be able to detect a within-subject variance of more than 3.6%, and assessor 2 a within-subject variance of more than 34%. These figures should be kept in mind when interpreting the results.

Because of the relatively small number of lesions used in this study and the differences between each pair of lesions examined, it was not possible to assess whether there was a "learning curve" with this technique.

One criticism of this technique is that it only measures the diseased surface and takes no account of depth of invasion and volume of disease. This would be particularly applicable to patients with relatively advanced disease or those with disease in areas such as the uterosacral ligaments, which often have little visible disease at the surface but have deep deposits within. At present, the most likely route to quantifying this sort of disease would be via other means of imaging, such as magnetic resonance imaging. While work has been undertaken in this area, a reproducible technique has not been found to date [5].

Table 3 Coefficients of variation for each assessor

Lesion area  Overall mean              CV (Total)                CV (Within-Subject)
             Assessor 1  Assessor 2    Assessor 1  Assessor 2    Assessor 1  Assessor 2
Total        23.08       14.77         107.8       80.93         13.4        47.1
Red          9.32        5.48          106.0       72.4          9.0         22.9
Black        1.96        1.58          194.8       180.9         18.4        46.9
White        11.80       7.71          118.0       151.9         22.5        82.2

Conclusions

This technique demonstrates acceptable intra-observer variability for both assessors; however, reproducibility is strongly operator-dependent. The inter-observer reproducibility is relatively poor.
This technique allows us to estimate an observer's variability, thereby highlighting what might be considered a clinically significant change when power calculations are performed for future studies.

Acknowledgments We would like to acknowledge Pfizer UK for provision of the digital equipment and statistical analysis. We would also like to acknowledge Mr. S. Ewen and Mr. A. Pooley, who undertook the digital photography at two of the sites.

References

1. American Society for Reproductive Medicine (1997) Revised American Society for Reproductive Medicine classification of endometriosis. Fertil Steril 67(5):817–821
2. American Fertility Society (1985) Revised American Fertility Society classification of endometriosis. Fertil Steril 43(3):351–352
3. American Fertility Society (1979) American Fertility Society for Classification of endometriosis. Fertil Steril 32(6):633–634
4. Khan KN et al (2004) Higher activity by opaque endometriotic lesions than nonopaque lesions. Acta Obstet Gynecol Scand 83(4):375–382
5. Kinkel K et al (1999) Magnetic resonance imaging characteristics of deep endometriosis. Hum Reprod 14(4):1080–1086