key: cord-1044933-uf708a72
authors: Quispe-Coica, Alejandro; Pérez-Foguet, Agustí
title: Preprocessing alternatives for compositional data related to water, sanitation and hygiene
date: 2020-06-25
journal: Sci Total Environ
DOI: 10.1016/j.scitotenv.2020.140519
sha: de6b66e2c1ee7dd7bd8e98721e70d0d489f12c0b
doc_id: 1044933
cord_uid: uf708a72

Abstract

The Sustainable Development Goals (SDGs) 6.1 and 6.2 measure the progress of urban and rural populations in accessing different levels of water, sanitation and hygiene (WASH) services, based on multiple sources of information. Because the service levels add up to 100%, they are compositional data (CoDa). Although these sources are known to contain zero values, missing data and outliers, the treatment of such irregularities with different statistical techniques has not yet been analyzed for CoDa in the WASH sector. As a result, estimates may be biased, and decisions based on them may not be appropriate. In this article, we therefore: i) evaluate methodological imputation alternatives that address zero values, missing values, or both simultaneously; and ii) argue that the point-by-point outlier identification of the WHO/UNICEF Joint Monitoring Program (JMP) should be complemented with other robust alternatives, chosen according to the number of data points. Both proposals are implemented with CoDa statistics based on the isometric log-ratio (ilr) transformation. A selection of illustrative cases is presented to compare the performance of the different alternatives.

Monitoring access to WASH services is a multiscale process involving bodies from the local level (to support the planning and implementation of government public policies; Jiménez Fdez. de Palencia and Pérez-Foguet, 2011) to the international level (WHO/UNICEF, 2017). WASH monitoring has evolved substantially over the last 15 years.
A key development has been the move from single performance indicators (such as coverage of water and sanitation by improved and unimproved technologies) to multidimensional frameworks that relate WASH to concepts such as poverty ([...] Pérez-Foguet, 2013a, 2019) and human rights (Baquero et al., 2015; Giné-Garriga et al., 2017), or that adopt the perspective of vulnerable and marginalized groups (Redman-Maclaren et al., 2018; Anthonj et al., 2020a). Integrating these concepts is far more complex than measuring the coverage of a population by one technical solution or another. This multidimensional nature was first measured through aggregated indicators such as the WASH poverty index ([...] Pérez-Foguet, 2013a, 2013b), which extended the seminal proposal of the Water Poverty Index (Sullivan, 2002; Sullivan et al., 2003; Giné-Garriga and Pérez-Foguet, 2010; Pérez-Foguet and Giné-Garriga, 2011). Likewise, some limitations of aggregated indicators, such as the compensability between dimensions and the lack of mechanisms to account for cross-influences between dimensions, have been tackled with different techniques (Ezbakhe and Pérez-Foguet, [...]

[...] preventing a direct application of the compositional approach. Zero values are common in countries that have made significant progress in providing improved water and sanitation services: there, the share of the population relying on unimproved sources has fallen drastically, in many cases to zero or near zero. Because ilr coordinates are built from logarithms of ratios between parts, they cannot be computed unless the zero values are first excluded or imputed. Exclusion is the easiest way to address the problem, but when the sector has few data points, it can reduce the predictive capacity of the models.
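The failure of the ilr transformation on zeros can be seen directly. The minimal NumPy sketch below (the composition is hypothetical, not taken from any JMP data set) computes all pairwise log-ratios of one four-part composition; every ratio that involves a zero part is infinite or undefined, so no ilr coordinate that depends on those parts can be formed.

```python
import numpy as np


def pairwise_log_ratios(x):
    """Return ln(x_i / x_j) for every pair of parts i < j of one composition."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    # log(0) = -inf and -inf - (-inf) = nan: silence the warnings and let
    # np.isfinite expose the undefined ratios instead.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log(x[i]) - np.log(x[j])


# Hypothetical four-part composition (%) ordered as X1, X2, X3, X4,
# where the last two categories have reached zero.
water = [92.0, 8.0, 0.0, 0.0]
print(np.isfinite(pairwise_log_ratios(water)))  # only the X1/X2 ratio is finite
```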
Accordingly, the literature proposes alternatives for imputing zero values depending on their type under CoDa properties: rounded zeros (Palarea-Albaladejo et al., 2007; Palarea-Albaladejo and Martín-Fernández, 2008; Martín-Fernández et al., 2012; Templ et al., 2016; Chen et al., 2018), count zeros and essential zeros (Aitchison and Kay, 2003). Techniques for rounded zeros are the most convenient imputation alternatives for the WASH sector: even in more developed countries, small percentages of the population are likely to lack access to any kind of water service, so recorded zeros are better treated as values too small to be measured than as true zeros. Simple replacement and multiplicative replacement have already been addressed in previous studies of the sector (Pérez-Foguet et al., 2017; [...]). Despite their simplicity, these methods tend to underestimate the variability of the data; it is therefore advisable to use them only when few zeros are present (Palarea-Albaladejo and Martín-Fernández, 2008). When zero values are abundant, other imputation alternatives are recommended, according to the variability of the data in the time series. Missing data in the composition are also an important issue in the sector, as they affect some categories of analysis. For example, according to the national survey (PNAD17) for rural Brazil, 88.4% of the population has access to improved drinking water sources (82.7% by pipe), but no information is given on access to surface water sources (WHO/UNICEF, 2019a). The lack of one or more data points for a specific year means that the ilr transformation [...]

[...] Indonesia was reported to be 6.6% by the National Socio-economic Survey in 2016, whereas another source (Performance Monitoring and Accountability; PMA16) reported 41.5% (WHO/UNICEF, 2019b).
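For reference, the multiplicative replacement of rounded zeros mentioned above (Martín-Fernández et al., 2003) can be sketched as follows. This is an illustrative NumPy version, not the zCompositions implementation; the replacement value `delta` is the analyst's choice, and the value used here is hypothetical. Every zero receives the same constant, which is why the method adds no variability to the imputed data.

```python
import numpy as np


def multiplicative_replacement(x, delta, total=100.0):
    """Replace zeros by `delta` and shrink the non-zero parts multiplicatively
    so that the composition still sums to `total`."""
    x = np.asarray(x, dtype=float)
    zeros = x == 0
    # Non-zero parts are scaled by the same factor (1 - delta * n_zeros / total)
    out = x * (1.0 - delta * zeros.sum() / total)
    out[zeros] = delta
    return out


water = [92.0, 8.0, 0.0, 0.0]                  # hypothetical composition (%)
imputed = multiplicative_replacement(water, delta=0.5)
print(imputed)  # zeros become 0.5; the other parts shrink by 1%
```

Because all non-zero parts are scaled by the same factor, their ratios (and hence the log-ratios among the observed categories) are unchanged.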
Such discrepancies stem from the use of multiple sources of information and cannot easily be resolved automatically, yet they directly influence the estimates obtained with any model. A method has recently been proposed to deal with uncertainties originating in statistical sampling, using compositional trend models applied to water and sanitation data. However, complementing the point-by-point validation of the JMP with techniques and procedures to detect outliers or other data errors not caused by sampling (Bain et al., 2018) is still pending. It is therefore necessary to evaluate identification alternatives for the sector's CoDa. With CoDa, outliers cannot be identified for one variable independently of the rest: multivariate methods are needed to detect outliers adequately and to identify data with evident errors that can distort the estimates (Filzmoser and Hron, 2008; Filzmoser et al., 2009, 2012). Filzmoser and Hron (2008) proposed robust identification techniques based on the Mahalanobis distance (MD). The proposal applies to general regression models, such as generalized additive models (GAM). However, the small amount of data available for some countries can limit this application. In such cases, ordinary least squares (OLS) regression is a better option, but applying OLS directly is not advisable, because it can be strongly influenced by outliers. Robust estimators for linear regression models are therefore needed. Several exist in the literature, including M-estimation and S-estimation (Rousseeuw and Yohai, 1984), MM-estimation (Yohai, 1987; Koller, 2011) and others (see the overview in Maronna et al., 2019). In this study, MM-type estimators are applied, given the good results obtained with them in other studies.
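The paper fits MM-type estimators with the lmrob function (R package robustbase). As a dependency-free illustration of the same idea, the sketch below uses a simpler robust fit that is not MM-estimation and will not reproduce lmrob results: a high-breakdown starting line (Siegel's repeated median), followed by iteratively reweighted least squares with Tukey's bisquare weights and a residual scale held fixed during the iterations. The function names, the tuning constant c = 4.685 (a common bisquare choice, giving 95% efficiency under normal errors) and the data are illustrative assumptions.

```python
import numpy as np


def repeated_median_line(t, y):
    """Siegel's repeated-median line, a high-breakdown starting fit (distinct t)."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    n = len(t)
    slopes = []
    for i in range(n):
        others = np.arange(n) != i
        slopes.append(np.median((y[others] - y[i]) / (t[others] - t[i])))
    b = np.median(slopes)
    a = np.median(y - b * t)
    return a, b


def bisquare_trend(t, y, c=4.685, max_iter=50, tol=1e-10):
    """Robust trend y = a + b*t by Tukey-bisquare IRLS from a repeated-median start.

    Returns (a, b, w), where w are the final robustness weights in [0, 1].
    """
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    a, b = repeated_median_line(t, y)
    r = y - a - b * t
    # The start makes median(r) = 0, so median(|r|) is the MAD; dividing by
    # 0.6745 makes it consistent for normal errors. The scale is kept fixed.
    s = max(np.median(np.abs(r)) / 0.6745, np.finfo(float).eps)
    X = np.column_stack([np.ones_like(t), t])

    def weights(beta):
        u = (y - X @ beta) / (c * s)
        return np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)

    beta = np.array([a, b])
    for _ in range(max_iter):
        sw = np.sqrt(weights(beta))
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta[0], beta[1], weights(beta)


t = np.arange(5.0)                            # years since first survey (hypothetical)
y = np.array([3.0, 1.19, 1.40, 1.61, 1.79])   # one ilr coordinate; point 0 is an outlier
a, b, w = bisquare_trend(t, y)
print(round(b, 3), np.round(w, 2))            # point 0 gets a robustness weight of 0
```

As with lmrob, the suspicious point is not deleted beforehand; its influence is driven to zero by the weights, which is what makes this family of estimators attractive for very short series.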
Note that robust estimators do not necessarily exclude outliers; rather, they modulate their influence on the calibrated model, which is a strong advantage when data are limited. This work proposes and analyzes different coupled strategies for treating zeros, missing data and outliers in compositional trend models, as applied to the international monitoring of the WASH sector, building on previous work in this regard and facilitating its practical application to the available data. Specifically, it addresses the following objectives: [...]

[...] The algorithm proposed in Figure 1 follows statistical procedures and techniques for CoDa that can easily be applied and replicated in any sector or area of analysis. Understanding them requires some basic concepts: i) CoDa are vectors of D strictly positive components whose sum is a constant k, as shown in Eq. (1); ii) their sample space is the simplex S^D, so statistical analysis requires moving to real Euclidean space with ilr transformations, which map the D components to D-1 coordinates.

$$\mathbf{x} = (x_1, x_2, \ldots, x_D), \quad x_i > 0 \; (i = 1, 2, \ldots, D), \quad \sum_{i=1}^{D} x_i = k \qquad (1)$$

where k can be 1, 100, or any other positive constant. Although these concepts and terms seem simple, they are not common in the WASH sector, so they must be clear before the CoDa method of analysis can be understood. If the information is given in population (P) units, the proportions of the service categories are formed according to Eq. (2). Vectors are then constructed from the parts, whose sum is the constant k (100 if given as percentages, or 1 if given as proportions). In vectors with missing data, "NA" is used.
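Assuming Eq. (2) divides the population of each service category by the total population (the equation itself is not reproduced here), the step from population units to a composition can be sketched in NumPy. The function name and population figures are hypothetical; a missing category is kept as NaN, playing the role of "NA", which only works when the total population is supplied separately.

```python
import numpy as np


def to_composition(counts, total=None, k=100.0):
    """Assumed form of Eq. (2): x_i = k * P_i / P_total; NaN (NA) categories stay NaN."""
    counts = np.asarray(counts, dtype=float)
    if total is None:
        if np.isnan(counts).any():
            raise ValueError("with missing categories, pass the total population")
        total = counts.sum()
    return k * counts / total


# Hypothetical rural population (persons) in four water service categories
population = [45_000, 30_000, 15_000, 10_000]
print(to_composition(population))           # percentages, k = 100
print(to_composition(population, k=1.0))    # proportions, k = 1
# A category with no information (cf. surface water in PNAD17) stays NA
print(to_composition([880, 120, np.nan], total=1000))
```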
Composition vectors with irregular data (zero values, missing values, or both simultaneously) and outliers are treated with functions that involve ilr transformations according to Eq. (5) of Egozcue et al. (2003), each with its own particular balances V. The same procedure is applied to generate the models. [...]
i. with access to improved (X1 × X2) and unimproved (X3 × X4) services;
ii. next, with access to network services (X1) and other improved (X2) forms of access;
iii. finally, with access to services (X3) and other unimproved (X4) forms of access.
Hygiene balances comprise three parts and follow this procedure, with each balance carried out between the proportions of the population:
i. with a handwashing facility on premises (Xh1 × Xh2) and with no handwashing facility (no service) (Xh3);
ii. next, with access to basic service (Xh1) and limited service (Xh2).
The results of the balances and transformations are shown in Table 2. Countries are classified into two groups according to the amount of data, with six data points as the threshold, following Fuller et al. (2016). However, because low data quality also affects the predictive capacity of the models, we opted to fit robust models in both groups, as detailed below.
For countries with fewer than six data points, the models are built with robust OLS regression on the transformed data of Table 2B, using the lmrob function, which computes an MM-type regression estimator as described above (Yohai, 1987; Yohai et al., 1991; Koller and Stahel, 2011). The influence of outliers on the linear regression models is evaluated through the robustness weights. Standard linear regression models are also fitted to the transformed and untransformed data for comparison with the robust alternative.
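The balances listed above can be computed from a sign matrix of the sequential binary partition, using the standard balance formula b = sqrt(r s / (r + s)) ln(g(x+) / g(x-)), where g is the geometric mean of the r parts in the numerator group and of the s parts in the denominator group. The NumPy sketch below builds the corresponding orthonormal contrast matrix V. Which group sits in the numerator of each balance is an assumption here (Table 2 may orient them differently, which only flips the sign of a coordinate), and all names are hypothetical.

```python
import numpy as np


def contrast_matrix(sbp):
    """Orthonormal contrast matrix V (D x (D-1)) from a sequential binary
    partition: rows are balances, +1 = numerator group, -1 = denominator, 0 = absent."""
    sbp = np.asarray(sbp, dtype=float)
    V = np.zeros(sbp.T.shape)
    for k, row in enumerate(sbp):
        r, s = np.sum(row == 1), np.sum(row == -1)
        scale = np.sqrt(r * s / (r + s))
        V[row == 1, k] = scale / r
        V[row == -1, k] = -scale / s
    return V


def ilr(x, V):
    """ilr balances z = ln(x) V; each column of V sums to zero, so the
    closure constant k cancels out."""
    return np.log(np.asarray(x, dtype=float)) @ V


# Water/sanitation, D = 4: {X1, X2} vs {X3, X4}, then X1 vs X2, then X3 vs X4
SBP_WASH4 = [[1, 1, -1, -1],
             [1, -1, 0, 0],
             [0, 0, 1, -1]]
# Hygiene, D = 3: {Xh1, Xh2} vs Xh3, then Xh1 vs Xh2
SBP_HYG3 = [[1, 1, -1],
            [1, -1, 0]]

print(ilr([50.0, 30.0, 15.0, 5.0], contrast_matrix(SBP_WASH4)))  # hypothetical %
```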
For countries with six or more data points, the model-fitting procedure includes outlier identification as part of the preprocessing and then excludes the identified data from the analysis to generate robust models, as described below:
i. Outliers in multivariate data are identified by computing the robust Mahalanobis distance (Eq. (6)) on the isometric log-ratio coordinates of Eq. (5), using the outCoDa function (Templ et al., 2011):

$$\mathrm{MD}(\mathbf{z}_i) = \sqrt{(\mathbf{z}_i - T)^{\top} C^{-1} (\mathbf{z}_i - T)} \qquad (6)$$

where $\mathbf{z}_i$ is the vector of ilr coordinates of observation i, and T and C are estimators of location and covariance, respectively (Mahalanobis, 1936). Robustness is achieved by replacing T and C with the minimum covariance determinant (MCD) estimators, which are robust (Filzmoser and Hron, 2008). Potential outliers are those whose squared robust MD exceeds the cut-off value, the 0.975 quantile of the $\chi^2_{D-1}$ distribution. [...]

[...] predictive capacity of the model between the two is compared with the adjusted coefficient of determination (R-adj); the closer its value is to one, the better the predictive capacity of the model. The models are fitted with the gam function. The interpolated or extrapolated values of the transformed data are returned to the simplex through the inverse transformation of Eq. (7), where X is the vector of Eq. (3) or Eq. (4). For the WASH sector, the interpolations and extrapolations of the models must be examined in the different categories of access to WASH, so the inverse transformation is necessary. The whole process of the algorithm, up to STEP IV, allows the interpolations and extrapolations of the different alternatives in the categories of access to WASH to be evaluated and compared with quality metrics. To assess the impact of the STEP II alternatives on the scale of the data, the root mean square error (RMSE) is applied to the models expressed in terms of X.
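The outlier screening in step i relies on outCoDa (robCompositions), which uses MCD estimates. The NumPy sketch below is a simplified stand-in, not outCoDa: it finds the raw MCD by brute force over every h-subset, with h = floor((n + p + 1) / 2), which is only practical for short national time series; it then rescales the squared distances so their median matches the χ² median (a simple consistency correction), and flags points above the 0.975 quantile of χ² with p = D-1 degrees of freedom. The χ² constants are hard-coded for p ≤ 3, which covers the three-part (hygiene) and four-part (water, sanitation) compositions; the reweighting step of full MCD implementations is omitted, and the example data are hypothetical.

```python
import itertools

import numpy as np

# Chi-square constants indexed by p = D - 1 (number of ilr coordinates)
CHI2_975 = {1: 5.023886, 2: 7.377759, 3: 9.348404}     # 0.975 quantiles
CHI2_MEDIAN = {1: 0.454936, 2: 1.386294, 3: 2.365974}  # medians


def mcd_brute_force(Z):
    """Raw MCD location and scatter: mean and covariance of the h-subset with
    the smallest covariance determinant, found by checking every subset."""
    n, p = Z.shape
    h = (n + p + 1) // 2
    best_det, best = np.inf, None
    for idx in itertools.combinations(range(n), h):
        sub = Z[list(idx)]
        cov = np.atleast_2d(np.cov(sub, rowvar=False))
        det = np.linalg.det(cov)
        if 0 < det < best_det:
            best_det, best = det, (sub.mean(axis=0), cov)
    if best is None:
        raise ValueError("every h-subset has a singular covariance matrix")
    return best


def robust_md_outliers(Z):
    """Squared robust Mahalanobis distances (Eq. (6) squared) and outlier flags
    for an (n x p) array of ilr coordinates, p <= 3."""
    Z = np.asarray(Z, dtype=float)
    p = Z.shape[1]
    center, cov = mcd_brute_force(Z)
    diff = Z - center
    md2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    # Simple consistency rescaling: match the median to the chi-square median
    md2 *= CHI2_MEDIAN[p] / np.median(md2)
    return md2, md2 > CHI2_975[p]


# Hypothetical example: 12 years of 3 ilr coordinates with one corrupted survey
rng = np.random.default_rng(0)
Z = rng.normal(scale=0.1, size=(12, 3))
Z[4] += 1.0
md2, flags = robust_md_outliers(Z)
print(np.flatnonzero(flags))  # indices of potential outliers
```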
The predictive capacity of the models is evaluated with the dimensionless Nash-Sutcliffe efficiency (NSE) goodness-of-fit indicator (Nash and Sutcliffe, 1970), applied to the observed and estimated X of each model. NSE = 1 indicates a perfect fit, whereas NSE < 0 indicates that the observed mean is a better predictor than the model (Ritter and Muñoz-Carpena, 2013). The statistical computations of Figure 1 are performed in R (R Core Team, 2019). [...]

The countries without data irregularities are represented by Benin and Ghana for access to hygiene, and by Indonesia for access to rural water. For hygiene, the small amount of data is mainly due to its recent incorporation into the Sustainable Development Goals (SDG 6.2) as part of the monitoring indicators (Craven et al., 2013), whereas access to water and sanitation has been monitored since 1990. STEP II of the algorithm does not apply to this type of data. Data with irregularities occur in three different forms: i) the first case is represented by Nigeria and Paraguay, whose data contain 1.14% and 7.14% zero values, respectively. The categories for Paraguay reveal that this occurs when the provision of water services from improved sources is high (Figure 3A); consequently, the indicators of access to unimproved water show trends toward zero or zero values. [...]

[...] example, the metrics of the impCoda function are better for Benin and South Africa, while the lrEM function is better for Brazil; for Zambia, in contrast, there are no significant differences between the two functions (impCoda and lrEM). In countries with zero values and missing values simultaneously (see Table 4C), replacing the zero values with "NA" and treating them as missing values with the impCoda function gives better results for Bangladesh and Egypt.
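The two quality metrics can be written in a few lines. This NumPy sketch follows the usual definitions, RMSE = sqrt(mean((o - e)²)) and NSE = 1 - Σ(o - e)² / Σ(o - ō)², applied to observed (o) and estimated (e) values of one service category X; the function names and values are illustrative.

```python
import numpy as np


def rmse(observed, estimated):
    """Root mean square error, in the units of X (e.g. percentage points)."""
    o, e = np.asarray(observed, dtype=float), np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((o - e) ** 2)))


def nse(observed, estimated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches predicting the
    observed mean, and negative values mean the mean predicts better."""
    o, e = np.asarray(observed, dtype=float), np.asarray(estimated, dtype=float)
    return float(1 - np.sum((o - e) ** 2) / np.sum((o - o.mean()) ** 2))


observed = [20.0, 24.0, 30.0, 33.0]   # hypothetical access to one category (%)
estimated = [21.0, 25.0, 28.5, 33.5]
print(rmse(observed, estimated), nse(observed, estimated))
```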
This occurs when the percentage of data with missing values is higher than that of zero values. The opposite situation occurs in the data set from Uruguay, which has 15% zero values and 3.33% missing values, and for which the lrEMplus function is the better alternative. Finally, while each of the methods evaluated is adequate in at least one case (depending on the case), all of them are better than multiplicative replacement or other simple alternatives, because they preserve variability in the imputed data. This advantage is more significant when a higher percentage of the data points show these irregularities. If no alternative is applied (either a simple one or one of those shown in this paper), many countries in the sector would have to be excluded from the analysis. This is especially important when the loss of information is significant (as happens in South American countries; Quispe-Coica and Pérez-Foguet, 2018). Moreover, once the Sustainable Development Goals were agreed upon (United Nations General Assembly, 2015; UN Water, 2016), each country assumed the responsibility of reducing the share of the population relying on unimproved WASH services. To this end, many countries are defining and implementing public policies to close these gaps; as a result, the data will tend toward extreme values, making imputation alternatives for zero values even more necessary.

This section addresses the case of countries with little data, where the influence of outliers is penalized in the coupled model. The access of rural populations to the different levels of hygiene services in Benin and Ghana illustrates this situation. Figure 4A and Figure 4B present the model fit on the transformed data with standard and robust linear regression.
The regression lines of the two methods are similar for the ilr2 coordinate but differ for ilr1 (with more drastic changes in Figure 4E). The difference arises mainly because, in the robust method, point 1 for Ghana and point 5 for Benin have a strong negative influence on the model, so the robust fit assigns them robustness weights of zero. How the influence of the data is modulated creates significant differences in the estimated categories of hygiene services accessed by the population. In the case of Ghana, the effect on each category is even greater when compared with the other alternatives (Figure 4F-H). In both Benin and Ghana, the curve generated by robust OLS (ilr) fits the data best. Looking at the results qualitatively, it is also more reasonable to exclude point 1 in Ghana and point 5 in Benin, which supports robust linear regression as an excellent option for regression models when the data contain outliers. Another feature to note is that with OLS (ilr) or robust OLS (ilr), the extrapolations of the service categories for 2000 and 2017 never exceed the limits of 0 and 100% (Table 5). This happens because the inverse transformation includes a closure operation (Eq. (4) [...]

[...] The reasons for outliers in data can be diverse. However, in the data analyzed here, outliers clearly tend to occur when several sources of information are available. To illustrate this point, consider the rural population of South Africa, for which information on the sewer category in 2011 comes from three different sources: the Census (CEN) reported 6.03% access, the Income and Expenditure Survey (IES) reported 44.16% access, and the General Household Survey (GHS) reported 5.07% access.
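The closure argument can be checked numerically. Assuming the back-transformation takes the usual form x = C(exp(V z)) with an orthonormal contrast matrix V (Eq. (7) itself is not reproduced here), the NumPy sketch below back-transforms hypothetical linear ilr trends for the three hygiene categories, extrapolated well beyond the data. Because exp is always positive and the result is rescaled to sum to 100, every category stays between 0 and 100%. The contrast matrix follows the hygiene partition described earlier, with an assumed sign orientation.

```python
import numpy as np

# Orthonormal contrast matrix for the hygiene partition (assumed orientation):
# column 1 = {Xh1, Xh2} vs Xh3, column 2 = Xh1 vs Xh2
c1, c2 = np.sqrt(2 / 3), np.sqrt(1 / 2)
V_HYG = np.array([[c1 / 2, c2],
                  [c1 / 2, -c2],
                  [-c1, 0.0]])


def ilr(x, V):
    return np.log(np.asarray(x, dtype=float)) @ V


def ilr_inverse(z, V, k=100.0):
    """Back-transform ilr coordinates: x = C(exp(V z)), closed to sum k."""
    y = np.exp(np.asarray(z, dtype=float) @ V.T)
    return k * y / y.sum(axis=-1, keepdims=True)


# Hypothetical linear trends in ilr space, extrapolated far past the data
years = np.array([2000, 2017, 2030])
z = np.column_stack([0.5 + 0.08 * (years - 2000), -0.3 + 0.05 * (years - 2000)])
print(ilr_inverse(z, V_HYG))  # every row lies strictly between 0 and 100 and sums to 100
```

A straight line fitted to untransformed percentages has no such guarantee: extrapolated far enough, it crosses 0 or 100%.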
Given the large difference between the IES value and those of the other two sources (CEN and GHS), it is reasonable to suspect that it is an outlier even before applying any validation method. However, because the census and the GHS survey differ by only 0.96 percentage points, it is difficult to know whether either value is atypical. Given this doubt, the robust MD can be applied to the country's time series. The results show that only the IES data point is an outlier (Figure 5A.2), which confirms the initial assumption. The point-by-point validation carried out by the JMP (2019) (see the Excel tab "Data Summary/Sanitation" for 2011), by contrast, identifies and excludes both the CEN and IES data points from the model. Such differences in identification, seen here for one country and year, can also occur for other countries when a time series is analyzed. For the 2020 estimates for Uruguay, there is no significant difference between the two alternatives (models with and without outliers). For example, for the category of access to piped water, the difference between the two models was 0.004%. The remaining three categories showed similarly negligible differences. In countries that have achieved almost universal water service provision, modeling and comparison therefore no longer appear to be relevant. Nonetheless, modeling cannot be ruled out for trend data approaching extreme values, because small proportions, once converted to population units, can have significant effects, as in China and India. We must also emphasize two points: i) the estimates cannot exceed the extreme values of 0 and 100% in any service category; and ii) it is very important to use adequate statistical techniques, such as those in STEP II, to treat zero values according to the variability of the time series data, since this allows models to be built without excluding data.
The quality metrics in Table 7 reinforce the hypothesis that outliers influence the quality of the models. The metrics of the four indicators are the same or better when outliers are excluded in six of the ten countries (namely, South Africa, Brazil, [...]

[...] The temporal trends of the service categories also show the inequalities in access to water and sanitation between the urban and rural sectors. In Indonesia and South Africa, access to water and sanitation by other improved forms is increasing (Figure 6D.2 and 6E.2), whereas in Uruguay this category tends toward zero (Figure 6F.2). Comparing only Indonesia and Uruguay, the rural-urban gap in access to piped water widens further, mirroring the global disparities in access to water and sanitation between the two sectors reported in the literature (Bain et al., 2014; Chitonge et al., 2020). In the context of the SDGs, which seek to ensure that no one is left behind (United Nations General Assembly, 2015), the rural sector in both Indonesia and South Africa faces a greater challenge in the provision and safe management of water and sanitation services. Finally, once outliers have been identified, they should not be eliminated automatically, because this can discard relevant information that helps explain the specific situation or time series of the country. Additionally, excluding data ignores other factors the analyst does not weigh, such as the cost of obtaining representative data through a survey, census or other means; therefore, the essential step before excluding outliers is to understand why the values are anomalous. One way to understand these data could be to consult the institutions that produced the information sources.
However, obtaining answers becomes complicated when it depends on third-party institutions (for instance, for SDG reporting, countries generally have statistical or other specialized institutions responsible for collecting, processing and sharing information with interested parties). In these cases, exclusion is simply a necessity because of the improvements it brings to the models. The presence of zero values, missing values or both simultaneously makes it necessary to treat the data in a differentiated manner, and distinct treatment options are available. While these options are not equivalent, there are no clear criteria for choosing exactly which one to use, and all alternatives may be equally good. Further, these options are suitable for analyzing data with variations in temporal evolution, which is not possible with multiplicative replacement (Martín-Fernández et al., 2003). For countries with small amounts of data, we conclude that robust linear regression (robust OLS (ilr)) is suitable for analyzing WASH sector data, since it limits the influence of outliers on the calibrated model. The declaration of outliers can be validated both quantitatively and qualitatively. In countries with six or more data points, the identification of outliers with the robust Mahalanobis distance tends to give more than the qualitative classification made by the JMP (specifically, for nine of the ten countries evaluated), which [...]

[...] Furthermore, in all cases (i.e., fewer than six or six or more data points), interpolations and extrapolations of the models in the service categories can never exceed the limit values of 0 or 100%. This finding agrees with and extends the conclusion of Pérez-Foguet et al. (2017), since we have now analyzed a wide range of data with different irregularities and included the analysis of access to hygiene.
Finally, the proposed algorithm, which integrates models for a wide range of linear and non-linear data including outliers, is expected to help improve data analysis in the sector, especially where the sources of information differ. This work complements the proposal made by Pérez-Foguet et al. (2017). [...]

[...] Subsequently, each organization performs a correlation analysis of the selected categories and reaches consistent or contradictory conclusions, depending on the category of analysis. Notes: The analysis categories for water and sanitation are the same as in Table 1. The correlation matrix is computed from the country's time series before preprocessing. From Table A1-1, organization A infers that the correlation between the water access categories X1w and X3w is positive and low (correlation = 0.27), while organization B concludes that X1w* and X3w* are strongly but negatively correlated (correlation = -0.75). For the population with access to sanitation (Table A1-2), a similar analysis leads organization A to conclude that the relationship between X3s and X4s is positive (correlation = 0.40), and organization B that it is negative (correlation = -0.30). Thus, for the same categories of analysis, the different methods lead to different conclusions.
The statistical analysis of compositional data
Possible solutions of some essential zero problems in compositional data analysis
A systematic review of water, sanitation and hygiene among Roma communities in Europe: Situation analysis, cultural context, and obstacles to improvement
Geographical inequalities in drinking water in the Solomon Islands
Impact of Community-Led Total Sanitation and Hygiene on Prevalence of Diarrheal Disease and Associated Factors among Under-Five Children: A Comparative Cross-Sectional Study in Selected Woredas of Gamo Gofa Zone, Southern Ethiopia
Establishing Sustainable Development Goal Baselines for Household Drinking Water, Sanitation and Hygiene Services
Rural:urban inequalities in post 2015 targets and indicators for drinking-water
Reporting progress on the human right to water and sanitation through JMP and GLAAS
Global Monitoring of Water Supply and Sanitation: History, Methods and Future Challenges
Global access to handwashing: implications for COVID-19 control in low-income countries
Water and Sanitation Inequality in Africa: Challenges for SDG 6
Introducing hygiene elements into sanitation monitoring
Groups of parts and their balances in compositional data analysis
Isometric Logratio Transformations for Compositional Data Analysis
Leaving no one behind: Evaluating access to water, sanitation and hygiene for vulnerable and marginalized groups
Estimating access to drinking water and sanitation: The need to account for uncertainty in trend analysis
Multi-Criteria Decision Analysis Under Uncertainty: Two Approaches to Incorporating Data Uncertainty into Water, Sanitation and Hygiene Planning
Outlier Detection for Compositional Data Using Robust Methods
Interpretation of multivariate outliers for compositional data
Univariate statistical analysis of environmental (compositional) data: Problems and possibilities
Improved monitoring framework for local planning in the water, sanitation and hygiene sector: From data to decision-making
Water-sanitation-hygiene mapping: An improved approach for data collection at local level
Monitoring and targeting the sanitation poor: A multidimensional approach
Water, sanitation, hygiene and rural poverty: issues of sector monitoring and the role of aggregated indicators
Unravelling the Linkages Between Water, Sanitation, Hygiene and Rural Poverty: The WASH Poverty Index
Improved Method to Calculate a Water Poverty Index at Local Scale
A novel planning approach for the water, sanitation and hygiene (WaSH) sector: The use of object-oriented bayesian networks
Inequality in access to improved drinking water sources and childhood diarrhoea in low- and middle-income countries
Exploring the link between handwashing proxy measures and child diarrhea in 25 countries in sub-Saharan Africa: A cross-sectional study
Imputation of missing values for compositional data using classical and robust methods
Sharpening Wald-type inference in robust regression for small samples
Compositional Data Analysis in Population Studies
Effect of handwashing on child health: A randomised controlled trial
Potential utilities of mask-wearing and instant hand hygiene for fighting SARS-CoV-2
On the generalized distance in statistics
Robust statistics: theory and methods (with R)
Bayesian-multiplicative treatment of count zeros in compositional data sets
Dealing with Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation
Model-based replacement of rounded zeros in compositional data: Classical and robust approaches
Dealing with Zeros
River flow forecasting through conceptual models part I - A discussion of principles
Treatment of Zeros, Left-Censored and Missing Values in Compositional Data Sets
zCompositions - R package for multivariate imputation of left-censored data under a compositional approach
A modified EM alr-algorithm for replacing rounded zeros in compositional data sets
A Parametric Approach for Dealing with Compositional Rounded Zeros
Water, Sanitation, and Hygiene (WASH) Conditions and Their Association with Selected Diseases in Urban India
Analyzing Water Poverty in Basins
Compositional data for global monitoring: The case of drinking water and sanitation
Burden of disease from inadequate water, sanitation and hygiene in low- and middle-income settings: a retrospective analysis of data from 145 countries
Burden of disease from inadequate water, sanitation and hygiene for selected adverse health outcomes: An updated analysis with a focus on low- and middle-income countries
Evolución del Acceso al Agua y Saneamiento en América del Sur Mediante Técnicas Estadísticas Composicionales [Evolution of Access to Water and Sanitation in South America Using Compositional Statistical Techniques]
R: A Language and Environment for Statistical Computing
Water, sanitation and hygiene systems in pacific island schools to promote the health and education of girls and children with disability: A systematic scoping review
Performance evaluation of hydrological models: Statistical significance for reducing subjectivity in goodness-of-fit assessments
Robust Regression by Means of S-Estimators
Unmasking Multivariate Outliers and Leverage Points
Software cost estimation with incomplete data
Calculating a Water Poverty Index
The water poverty index: Development and application at the community scale
robCompositions: An R-package for Robust Statistical Analysis of Compositional Data
Imputation of rounded zeros for high-dimensional compositional data
Measuring progress towards sanitation and hygiene targets: a critical review of monitoring methodologies and technologies
Monitoring Water and Sanitation in the 2030 Agenda for Sustainable Development. An Introd
Progress on household drinking water, sanitation and hygiene 2000-2017: special focus on inequalities, WHO. United Nations Children's Fund (UNICEF) and World Health Organization
General Assembly Resolution A/RES/70/1. Transforming our world: the 2030 Agenda for Sustainable Development
Joint Monitoring Programme for Water Supply, Sanitation, and Hygiene: Estimates on the use of water, sanitation and hygiene in Brazil
Joint Monitoring Programme for Water Supply, Sanitation, and Hygiene: Estimates on the use of water, sanitation and hygiene in Indonesia
JMP methodology 2017 update & SDG baselines
Progress on drinking water, sanitation and hygiene: 2017 update and SDG baselines, World Health Organization and UNICEF. World Health Organization