key: cord-0934318-iuj621h2
authors: Rua, Catarina; Clarke, William T; Driver, Ian D; Mougin, Olivier; Morgan, Andrew T.; Clare, Stuart; Francis, Susan; Muir, Keith; Wise, Richard; Carpenter, Adrian; Williams, Guy; Rowe, James B; Bowtell, Richard; Rodgers, Christopher T
title: Multi-centre, multi-vendor reproducibility of 7T QSM and R(2)* in the human brain: results from the UK7T study
date: 2020-09-09
journal: Neuroimage
DOI: 10.1016/j.neuroimage.2020.117358
sha: 1f0c9330622554f38fca7d08135128d9f06c7ebb
doc_id: 934318
cord_uid: iuj621h2

INTRODUCTION: We present the reliability of ultra-high field T(2)* MRI at 7T, as part of the UK7T Network's “Travelling Heads” study. T(2)*-weighted MRI images can be processed to produce quantitative susceptibility maps (QSM) and R(2)* maps. These reflect iron and myelin concentrations, which are altered in many pathophysiological processes. The relaxation parameters of human brain tissue are such that R(2)* mapping and QSM show particularly strong gains in contrast-to-noise ratio at ultra-high field (7T) vs clinical field strengths (1.5 - 3T). We aimed to determine the inter-subject and inter-site reproducibility of QSM and R(2)* mapping at 7T, in readiness for future multi-site clinical studies. METHODS: Ten healthy volunteers were scanned with harmonised single- and multi-echo T(2)*-weighted gradient echo pulse sequences. Participants were scanned five times at each “home” site and once at each of four other sites. The five sites had 1x Philips, 2x Siemens Magnetom, and 2x Siemens Terra scanners. QSM and R(2)* maps were computed with the Multi-Scale Dipole Inversion (MSDI) algorithm (https://github.com/fil-physics/Publication-Code). Results were assessed in relevant subcortical and cortical regions of interest (ROIs) defined manually or by the MNI152 standard space. RESULTS AND DISCUSSION: Mean susceptibility (χ) and R(2)* values agreed broadly with literature values in all ROIs. The inter-site within-subject standard deviation was 0.001 – 0.005 ppm (χ) and 0.0005 – 0.001 ms(−1) (R(2)*). For χ this is 2.1-4.8 fold better than 3T reports, and 1.1-3.4 fold better for R(2)*. The median ICC from within- and cross-site R(2)* data was 0.98 and 0.91, respectively. Multi-echo QSM had greater variability vs single-echo QSM especially in areas with large B(0) inhomogeneity such as the inferior frontal cortex. Across sites, R(2)* values were more consistent than QSM in subcortical structures due to differences in B(0)-shimming. On a between-subject level, our measured χ and R(2)* cross-site variance is comparable to within-site variance in the literature, suggesting that it is reasonable to pool data across sites using our harmonised protocol. CONCLUSION: The harmonized UK7T protocol and pipeline delivers on average a 3-fold improvement in the coefficient of reproducibility for QSM and R(2)* at 7T compared to previous reports of multi-site reproducibility at 3T. These protocols are ready for use in multi-site clinical studies at 7T.

Neurodegenerative diseases are a significant global health burden. In many instances, neurodegeneration is associated with the deposition of iron in the brain.

Understanding the patterns of deposition and their association with other risk factors is a key area of clinical research, but progress has been limited by the need to scale over multi-centre trials (Moeller et al., 2019) . The EUFIND (Düzel et al., 2019) is an example of a network focused on advancements in neurodegenerative research by running large-scale multi-centre imaging studies. Also, the UK7T network (http://www.uk7t.org) has recently run a multi-site study with a dementia cohort to assess feasibility in patient groups. Imaging as part of the C-MORE study (Capturing the MultiORgan Effects of is also including harmonized multi-centre sequences which might provide insights into the long-term impact in survivors of . Yet, in order to perform such multi-centre studies, it is necessary to first guarantee the consistency and reproducibility of imaging markers.

A popular approach to estimating iron concentration in the human brain uses gradientecho (GE) magnetic resonance imaging (MRI). In grey matter, iron is mainly found in the protein ferritin which, due to its antiferromagnetic core and the presence of uncompensated spins at the surface or in the core, exhibits a superparamagnetic behaviour (Makhlof et al., 1997; Langkammer et al., 2012) . This paramagnetic iron interacts with the MRI scanner's static magnetic field (B 0 ) causing local dipolar field perturbations. These accentuate the rate of transverse signal decay causing T 2 * relaxation in surrounding tissue, which is visible as decreasing signal amplitude with increasing echo time in a series of GE images. This effect causes an increase in the rate of transverse relaxation, R 2 *, which correlates well with non-heme iron concentrations in grey matter (Gelman et al., 1999; Langkammer et al., 2010) , and has been used to investigate the distribution of iron in the healthy brain and in disease (Haacke et al., 2005; Yao et al., 2009; Li et al., 2019) .

The local presence of iron (and to a lesser extent myelin and calcium) also affects the signal phase of GE images because of the effect of the field perturbation on the local Larmor frequency (House et al., 2007; He et al., 2009; Lee et al., 2012) . Quantitative Susceptibility Mapping (QSM) methods attempt to deconvolve these dipole phase patterns to identify the sources of the magnetic field inhomogeneity. In other words, QSM estimates quantitative maps of tissue magnetic susceptibility χ from GE phase data (Li and Leigh, 2004; Reichenbach, 2012; Wang and Liu, 2015) . This approach has shown sensitivity to several neurological conditions (Lotfipour et al., 2012; Acosta-Cabronero et al., 2013; Blazejewska et al., 2015; Acosta-Cabronero et al., 2016) and offers advantages over magnitude R 2 * such as having reduced blooming artifacts or being able to distinguish between paramagnetic and diamagnetic substances (Eskreis-Winkler et al., 2017) . R 2 * imaging and QSM have been shown to provide reproducible results in single-site and cross-site studies at 1.5T and 3T (Hinoda et al., 2015; Cobzas et al., 2015; Deh et al., 2015; Lin et al., 2015; Santin et al., 2017; Feng et al., 2018; Spincemaille et al., 2019) .

The dipole-inversion problem at the heart of QSM methods benefits from the increased sensitivity to magnetic susceptibility variation and spatial resolution at ultrahigh fields (B 0 ≥ 7 T) (Yacoub et al., 2001; Reichenbach et al., 2001; Tie-Qiang et al., 2006; Duyn et al., 2007; Wharton and Bowtell, 2010) . At 7T, close attention must be paid to B 0 shimming and gradient linearity to achieve accurate QSM and R 2 * mapping (Yang et al., 2010) . Head position is also an important factor that affects the susceptibility anisotropy (Lancione et al., 2017; Li et al., 2017) .

In this study, we introduce single-echo and multi-echo GE imaging protocols for QSM and R 2 * mapping at 7T which were standardised on three different 7T MRI scanner platforms, from two different vendors. We applied this standardised protocol in the UK7T Network's "Travelling Heads" study on 10 subjects scanned at 5 sites. We report reproducibility for derived R 2 * and QSM maps and make recommendations for the design of future multi-centre studies. (Nehrke et al., 2012; Ehses et al., 2019) was subsequently acquired and the transmit voltage (or power attenuation) was then adjusted for all subsequent imaging based on the mean flip-angle from the brain in an anatomically-specified axial slice of the 3D DREAM flip angle map as described in Clarke et al. (2019) . Single-echo 0.7mm isotropic resolution T 2 *-weighted GE data were then acquired with: TE/TR=20/31ms; FA=15°; bandwidth=70Hz/px; in-plane acceleration-factor=4 (Sites-1/2/4/5) or 2x2 (Site-3); FOV=224x224x157mm 3 ; scan-time=~9min. Multi-echo 1.4mm isotropic resolution T 2 *-weighted GE data were acquired with: TE 1 /TR=4/43ms; 8 echoes with monopolar gradient readouts; echo-spacing=5ms; FA=15°; bandwidth=260Hz/px; acceleration-factor=4 (Sites-1/2/4/5) or 2x1.5 (Site-3); FOV=269x218x157mm 3 ; scantime ~6min (Sites-1/2/4/5) and ~4min (Site-3). For Siemens data, coil combination was performed using a custom implementation of Roemer's algorithm, as previously described . Subject 6's single-echo scan failed to reconstruct using Roemer's method on data from the first visit at Site-5 so a sum-of-squares (SoS) algorithm was used for coil combination for that scan instead. A 0.7mm isotropic MP2RAGE scan was used for within-and cross-site registration as previously described .

2.2. QSM and R 2 * data processing QSM maps were generated from both the single-echo and multi-echo T 2 *-weighted datasets using the Multi-Scale Dipole Inversion (MSDI) algorithm, as implemented in QSMbox v2.0 (Acosta-Cabronero et al., 2018) . Briefly: first the local field was estimated by phase unwrapping (Abdul-Rahman et al., 2005) and magnitude-weighted least squares phase echo fitting was performed on the multi-echo data. Then, independently for both single-echo and multi-echo data, background field was removed using the Laplacian Boundary Value (LBV) method followed by the variable Spherical Mean Value (vSMV) algorithm with an initial kernel radius of 40mm (Zhou et al., 2014; Acosta-Cabronero et al., 2018) . MSDI inversion was estimated with two scales: the self-optimised lambda method was used on the first scale with filtering performed using a kernel with 1mm radius, and on the second scale the regularization term was set to λ=10 2.7 (the optimal value for in-vivo 7T datasets found in (Acosta-Cabronero et al., 2018)) and filtering was done with a kernel radius set to 5mm. Brain masks used in the analysis were obtained with FSL's Brain Extraction Tool (BET) with fractional intensity threshold=0.2 for single-echo data (Smith, 2002) . These were then mapped to multi-echo data space.

On the multi-echo data, QSM was reconstructed seven more times: with only one echo at 19 ms (matching the single echo data), with the two shortest echoes (i.e. TE 1 /TE 2 = 4/9 ms), with the three shortest echoes (i.e. TE 1 /TE 2 /TE 3 = 4/9/14 ms), and so forth.

On the multi-echo dataset, voxel-wise quantitative maps of R 2 * were obtained using the Auto-Regression on Linear Operations (ARLO) algorithm for fast monoexponential fitting (Pei et al., 2015) . R 2 * was also fitted five more times: with data from the first three echoes (TE1/TE2/TE3=4/9/14 ms), then with the first four echoes (TE1/TE2/TE3/TE4=4/9/14/19 ms), and so forth.

The neck was cropped from the magnitude data with FSL's "robustfov" command (https://fsl.fmrib.ox.ac.uk/fsl/), applied to the single-echo data and the 4 th echo of the multi-echo data. High-resolution single-echo and multi-echo templates were made from this cropped data for each subject with antsMultivariateTemplateConstruction2.sh from the Advanced Normalization Tools (ANTs, http://stnava.github.io/ANTs/). Two approaches were compared:

transformations using rigid registration with mutual information similarity metric (denoted as "Rigid" below) or using symmetric diffeomorphic image registration with cross-correlation similarity metric (denoted "SyN" below). Other settings were kept the same for both approaches: 4 steps with 0.1 gradient step size, maximum iterations per step 1000, 500, 250 and 100, smoothing factors per step of 4, 3, 2, and 1 voxels, and shrink factors per step of 12x, 8x, 4x, and 2x. The resulting registrations were then applied to the QSM and R 2 * maps which were averaged to create single-echo and multi-echo QSM and R 2 * templates for each subject.

Five regions of interest (Substantia Nigra, Red Nucleus, Caudate Nucleus, Putamen and Globus Pallidus) were manually segmented based on the subject-specific QSM templates of the single-echo data registered with the "SyN" approach. In order to minimize the amount of segmentation variability, these ROIs were then mapped to the single-echo "Rigid", and multi-echo "SyN" and multi-echo "Rigid" spaces with nearest neighbour interpolation and via non-linear registrations obtained with the default settings in the antsRegistrationSyN.sh command in ANTs.

Magnitude data were first registered to the T 1 -weighted MP2RAGE scans (Rigid transformations; MI similarity metric) and later to the standard T 1 "MNI152 brain"

(Montreal Neurological Institute 152) (using settings in antsRegistrationSyN.sh) applied to the single-echo data and to the 1 st echo of the multi-echo data. These registrations were then used to map the 48 probabilistic cortical ROIs, "cortical ROIs", from the Harvard-Oxford Cortical Atlas and the 21 probabilistic subcortical ROIs, "subcortical ROIs", from the Harvard Oxford Subcortical Atlas to the QSM and R 2 * template spaces.

The T 1 -weighted MP2RAGE data was bias-field corrected, brain extracted, and segmented into five tissues using SPM (https://www.fil.ion.ucl.ac.uk/spm/): the grey matter (GM), white matter (WM) and cerebral-spinal fluid (CSF) volumes were mapped into each subject-specific QSM template space. Then, using "fslmaths" from FSL (https://fsl.fmrib.ox.ac.uk/fsl/), the mapped cortical ROIs were thresholded at 10% of the "robust range" of non-zero voxels and multiplied by the GM tissue map in order to obtain GM-specific cortical ROIs. The mapped subcortical ROIs were thresholded at 50% of the "robust range" of non-zero voxels. From these, any CSF voxels were excluded from the left and right Caudate Nucleus, Putamen and Globus Pallidus, and the voxel sets from the left and right counterparts were merged together.

From the single-echo and multi-echo data, average χ and R 2 * values were extracted from the manual and atlas-based ROIs for all volunteers and sessions in template space (values given in Supplementary Material 1).

In order to estimate where the magnetic field is spatially more variable, field-maps were first estimated from the multi-echo datasets. was calculated from the background field removal step of the QSM pipeline, and was defined, per-voxel, as the average difference between the field in a voxel and its immediate nearest neighbors.

The average was extracted for each of the cortical ROIs and averaged across all subjects and sessions. Then the cortical ROIs were divided into two groups based on the values: wherever | | the ROI was grouped into "high " regions, otherwise it was grouped into "low " regions. was calculated from the multi-echo pipeline only, as differences to values calculated using single-echo data were minimal (Figure 1 , Supplementary Material 2).

We explored three possible susceptibility reference regions for QSM processing. The average QSM signal was extracted from: 1. A whole brain mask, "wb"; 2. A whole-brain CSF mask eroded in two steps, "csf"; 3. A manually placed cylindrical ROI in the right ventricle, "cyl" (across all subjects the ROI volume was 10411 mm 3 ).

Statistical analysis was performed with R 3.5.3 (R Core Team, 2013). Cross-site analysis used only the 1 st scan at the "home" site along with the scans at the other four sites.

To obtain the within subject average, AV w , the χ and R 2 * values were averaged within the same site and across the sites and then averaged across subjects:

where is the number of sessions ( for within-site and cross-site) and the number of subjects. Relative reliability was measured using the intra-class correlation coefficient (ICC) from within and cross-site data independently for each ROI (Weir, 2005) :

where and are the between-subjects and within-subjects mean square from a random-effects, one-way analysis of variance (ANOVA) model. Intra-subject absolute variability is assessed by measuring the within-subject standard-deviation (SD w ) calculated as (Santin et al., 2017) :

where ̅ ∑ ⁄ is the replicate average for each subject. SD w was computed using within-site data and cross-site data independently. Similarly, cross-subject variability was calculated by measuring the between-subject standard-deviation (SD b ):

where ∑ ∑ is the measurement average across subjects and sessions. Note that SD b is computed using data from all sites.

Statistical testing on AV w , SD w and ICC values extracted from manual and templatebased ROIs was done by first fitting the data with normal, log-normal, gamma and logistic distributions. The goodness-of-fit statistics for the parametric distributions were calculated and the distribution which showed the lowest Akaikes Information Criterion (AIC) was then used on a general linear model fitting. All models included as fixed main effects ROI number and data type (within-and cross-site). When evaluating the data registration type, the model also included registration type ("Rigid" and "SyN") as a fixed main effect. When testing for QSM reference, the model also included reference region ("wb", "csf", and "cyl") as a fixed main effect. On multi-echo QSM data, a model was fitted which also included the number of echoes processed as a fixed main effect. When comparing the manual and subcortical ROIs, the ROI type (manual vs. atlas-based) was also included as a fixed main effect. Finally, on the data from the cortical ROIs, ROI number was replaced with "high " and "low " ROI type as covariate. A p-value less than 0.05 was considered significant.

We investigated the effect of head orientation on QSM variability. Since all our data was acquired with transverse slice orientation, the slice normal vector in the acquired images was parallel to B 0 . We used the per-subject rotation matrices of the affine transforms from this acquired transverse data to MNI space to estimate the z-axis rotation  with respect to the B 0 vector (0,0,1) ( Figure 7 (A)):

where is the 3 rd row, 3 rd column of the affine transform matrix.

Two linear mixed effects models, 'mod1' and 'mod2', were fitted on the within-site and cross-site data separately: both models included site, ROI, and session number as fixed effects, and subject number as a random effect, while 'mod2' also included as a fixed effect. For each model, the R 2 was evaluated and both models were compared with a chi-squared test.

Finally, from 'mod2' the fit coefficients were used to estimate corrected -values based on a chosen standard for all of the measurements ( radians).

Then, new within-site and cross-site SD w of the corrected were calculated based on the same approach as in sub-section 2.5.

Representative slices of single-echo  (A) multi-echo  (B) and R 2 * maps (C) from an example subject templates. The right Caudate Nucleus (a), Putamen (b) and Globus Pallidus (c) are shown in green. Multi-echo  maps calculated with data from all 8 echoes.

3. Results 3.2. Reproducibility of QSM and R 2 * Figure 3 shows boxplots over ROIs of the within-and cross-site AV w (A), SD w (B) and ICC (C) values for the manual ROIs on the χ and R 2 * maps. The AV w from R 2 * maps measured on the same site is systematically higher compared to the AV w measured across sites (p < 0.0001; e.g., on the Putamen ROI, AV w_within-site = 0.0493 ms -1 vs AV w_cross-site = 0.0489 ms -1 ). On this comparison, QSM data did not show significant differences between within-site and cross-site groups for either single-echo data (p = 0.053) or multi-echo data (p = 0.65).

From all the data in the manual ROIs, the median SD w of single-echo χ-values was approximately 29% lower than for multi-echo χ-values (p = 0.0010). There was a significantly larger SD w cross-site compared to within-site on single-echo χ data (p < 0.0001; e.g., on the PN ROI, SD w_within-site = 0.00088 ppm vs SD w_cross-site = 0.0014 ppm), multi-echo χ (p = 0.033) and on R 2 * data (p < 0.0001).

The ICC values for within-and cross-site R 2 * data (median ICC was 0.98 and 0.91, respectively) were found to be significantly higher than values for single-echo χ 

The within-and cross-site standard deviations for one axial slice from one example subject using "Rigid" and "SyN" registration approaches are shown in Figure 4 .

Generally, with both registration methods, within-site and cross-site SD w increases in veins, in the orbitofrontal regions and at the cortical surface (white and green arrows, Figure 4 ). These are areas associated with large B 0 inhomogeneities and gradient nonlinearity. However, there is a decrease in the cross-site standard deviation in the orbitofrontal region and close to the edges of the cortex when using the "SyN" compared to the "Rigid" method (green arrows, Figure 4 ).

On the manual ROIs increased variability was observed for R 2 * on "Rigid" registered data compared to "SyN" (SD w : p < 0.0001; ICC: p < 0.013) but not for single-echo or multi-echo χ: for example, the median cross-site R 2 * SD w from all ROIs was 0.00066 ms -1 using "SyN" method and 0.00086 ms -1 using the "Rigid" registration method. On the atlas-based cortical ROIs, the same significant trend was observed for R 2 * and singleecho χ data (Table 2 , Supplementary Material 2).

Boxplots from data obtained on the manual ROIs of within-and cross-site SD w (red and green, respectively) of single-echo QSM (A) and multi-echo QSM (B) with a whole-brain reference (wb), with a csf reference (csf), and with a cylinder reference (cyl). Data from each ROI is shown with a different marker for each boxplot. Legend: SN=Substantia Nigra; RN: Red Nucleus; CN: Caudate Nucleus; Pu: Putamen; GP: Globus Pallidus. Multi-echo -maps were calculated with data from all eight echoes.

To assess the optimal QSM susceptibility referencing, Figure 5 shows boxplots of the SD w for single-echo and multi-echo χ using different referencing methods on the manual ROIs. On single-echo χ data, compared to "wb" correction (chosen correction for this study), the "csf" reference did not increase significantly the SD w (p = 0.93) but with "cyl" the median SD w increased by approximately 14% (p < 0.0001).

Multi-echo χ data showed an increase in the median SD w of, respectively, 11% (p = 0.00096) and 8% (p = 0.00064) when using "csf" and "cyl" methods for correction. The effect of varying the referencing of QSM data was similar in within-site and cross-site data, for all methods tested. 

On average across all the manual ROIs and compared to single echo data, multi-echo data (using two or more echoes) showed a significant 14% increase of the SD w ( Figure   6 ) and 3% of the ICC ( there is no significant difference in AV w (p = 0.79) or in SD w (p = 0.11) from χ computed from multiple echoes (i.e. 2 or more echoes in the QSM analysis). Yet, in the atlasbased cortical ROIs, long echo times (i.e. using 6 or more echoes) showed an average increase of 15.7% in SD w (p < 0.0001) compared to using 2 to 5 echoes and a decrease of 1.75% in ICC (p < 0.0001) ( In the manual ROIs, R 2 * showed no significant change in variability across all ROIs when different number of echoes were used in the fitting (SD w : p = 0.11; ICC: p = 0.95) ( Figure 6 (B) ) or on AV w (p = 0.97). In the atlas-based cortical ROIs, the number of echoes used influenced the average R 2 * value (AV w : p < 0.0001), weakly ICC (p = 0.021), but not SD w (p = 0.61). 

On the atlas-based cortical ROIs the SD w increased by approximately 28% and 88% on "high " regions compared to "low " regions on multi-echo χ and R 2 * data, respectively (p = 0.0011 and p < 0.0001) ( 

When analysing in the manually-defined ROIs with respect to , a consistent negative trend was observed for all subjects. In addition, for , the within-site SD w was nearly half of the cross-site SD w (0.011 and 0.028 radians, respectively), indicating that there was larger variability in head orientation across sites (subject-wise variability of , of equation [3] , is plotted in Figure 7 (B)).

Separately for within-site and cross-site data, we assessed the goodness-of-fit of a model containing as an explanatory variable. On single-echo within-site data, the marginal R 2 increased from 0.71 with 'mod1' to 0.76 with 'mod2' (which includes ) (Chi-squared test, p = 0.041). The corresponding cross-site R 2 s were: 0.77 and 0.80 (Chi-squared test, p = 0.057). On multi-echo data, the marginal R 2 increased from 0.75 with 'mod1' to 0.79 with 'mod2' on within-site data (Chi-squared test, p = 0.041) and maintained at 0.79 on both models for cross-site data (Chi-squared test, p = 0.14).

From the corrected -values at , results show a slight decrease in the ratio of within-site to cross-site SD w (Table 2 ), but variability of obtained from cross-site data was still higher than from within-site ( with -correction, p=0.01; uncorrected , p < 0.0001 (subsection 3.2)). For multi-echo data, the SD w obtained from the correctedvalues were similar on within-site compared to cross-site ( with -correction, p=0.11; uncorrected , p = 0.033 (subsection 3.2)).

In QSM, it is assumed that the macroscopic susceptibility in an imaging voxel is isotropic. However, it has been shown that this assumption is too simplistic for single head orientation QSM methods, complicating the interpretation of the QSM results (Li et al., 2017) . We investigated the effect of head orientation on QSM estimation in our data: (A) Considering that data was all acquired in the transverse plane with B 0 perpendicular to the imaging slice, subjects had a variable head rotation with respect to B 0 . To estimate , we used MNI space as a common head orientation (z MNI ) across all scans. From the affine registration matrix converting acquired data into MNI space, the angle of rotation from the rotated z-axis, z 2 , will be given by where is the 3 rd row, 3 rd column of the affine transform matrix. (B) Subject-wise within-site and cross-site measurements on . (C) Single-echo and (D) multi-echo scatter plots of measurements according to on the Globus Pallidus manual ROI. For each subject a linear trend is also plotted and the fit coefficients are given in the plot legend. Data from each site is displayed with a different symbol. Multi-echo -maps were calculated with data from all eight echoes.

Single-echo 0.82 0.67

Multi-echo 0.91 0.82 Table 2 : Within-site to Cross-site ratio of the median SD w obtained from all five manually-defined ROIs on single-echo and multi-echo without and with -correction.

In this paper, the reproducibility of QSM  and R 2 * measurements in cortical and subcortical regions of the brain was assessed for the first time in a multi-site study at 7T for two different protocols (a single-echo 0.7mm isotropic T 2 *-weighted scan and a 1.5mm isotropic multi-echo T 2 *-weighted scan), using three different scanner platforms provided by two different vendors.

Previous studies at 1.5T and 3T have shown good reproducibility for  and R 2 * data acquired on the same scanner or across sites (1.5T and 3T) (Hinoda et al., 2015; Cobzas et al., 2015; Deh et al., 2015; Lin et al., 2015; Santin et al., 2017; Feng et al., 2018; Spincemaille et al., 2019) . In terms of QSM and depending on the subcortical region, intra-scanner 3T repeatability studies report an SD w of 0.002-0.005 ppm (Feng et al., 2018) and 0.004-0.006 ppm (Santin et al., 2017) , and the cross-site 3T study by Lin et al. (2015) reported an average SD W of 0.006-0.010 ppm. We observed a within-site SD w range of 0.0009-0.004 ppm and cross-site SD w range of 0.001-0.005 ppm at 7T.

Compared to 3T studies, this is a 2.0-5.3 fold decrease in the within-site SD w , and a 2.1-4.8 decrease in the cross-site SD w .

The range of within-site SD w values for R 2 * was averaged 0.0003-0.001 ms -1 in our study and the cross-site SD w range was 0.0005-0.001 ms -1 . The cross-site values are comparable to the same site reported at 3T: 0.0005-0.0009 ms -1 (Feng et al., 2018) , 0.0006-0.002 ms -1 (Santin et al., 2017) . Compared to the latter, our cross-site results

show a 1.1-3.4 improvement over the same brain regions in R2* variability.

The study from Hinoda et al. (2015) measured QSM reproducibility at 1.5T and 3T by scanning subjects twice on each of the scanners. They showed there is a 1.1-2.1 fold decrease in the upper and lower limits in Bland-Altman plots at 3 T compared to 1.5 T, which is in line with the expected signal-to-noise ratio (SNR) increase between these two field strengths (Edelstein et al., 1986; Wardlaw et al., 2012) . Compared to 3T

reports, there is, on average, an improvement of approximately 3-fold in our QSM and R 2 * 7T measurements of reproducibility. This is in line with the expected SNR increase in brain imaging from 3T to 7T (Pohmann et al., 2015) .

The higher values of cross-site SD w compared to the within-site values in our study may be attributed to the different gradient systems and automatic distortion corrections used in the different scanner platforms and to the different approaches to shimming, which lead to different geometrical distortions and dropout regions (Figure 3 and 4, Supplementary Material 2) (Yang et al., 2010) . In our study we verified that not only regions in the cortex close to air-tissue interfaces show differences in B 0 across scanners, but also large subcortical regions such as the CN, the Pu and the GP ROIs.

We also showed that the use of a non-linear registration method (here, "SyN" in ANTs)

significantly reduced the inter-scanner variability of cortical QSM compared to rigidbody registration, indicating that differences in geometric distortion across scanners were present. The R 2 * results for both cortical and subcortical structures also show significantly lower inter-scanner variability when a non-linear registration was used.

For QSM, higher cross-site variability may also be attributed to the head orientation with respect to B 0 (Lancione et al., 2017; Li et al., 2017) . Our results indicate head orientation varied somewhat between scans and there was greater variation between sites than intra-site; we also observed a consistent negative correlation between and head orientation ( ). Using a linear model to attempt to regress-out the effects of head rotation improved the reproducibility of both within-site and cross-site data. It also reduced the penalty for multi-site scanning vs single-site scanning, but not completely.

In this study, the reproducibility of QSM using single-echo, high-resolution (0.7 mm isotropic resolution; TE=20ms) and multi-echo standard-resolution (1.4 mm isotropic resolution; TE=4, 9, 14, 19, 24, 29, 34 and 39 ms) protocols were compared, and the results show that the multi-echo QSM data has a significantly higher variability than single-echo QSM. Although multi-echo phase data has been combined with a magnitude-weighted least squares regression of phase to echo time, it may carry inconsistent phase accumulation across echoes that were inconsistently unwrapped. This is also particularly relevant for regions of large field inhomogeneities, where phase accumulation could exceed  between neighbouring voxels, resulting in multiple phase wraps, which the unwrapping algorithm maybe unable to correct (Cronin et al., 2017) . This has also been verified on the analysis of QSM data from the cortical ROIs reconstructed with different numbers of echoes: long echo times increase significantly the test-retest variability. Alternatively, phase unwrapping can be completely omitted by calculating the phase change over all echo-times using a complex fitting routine (Liu et al., 2012; Liu et al., 2013) (fit_ppm_complex.m of MEDI toolbox) and calculating the Laplacian directly from the resulting, still wrapped, temporal phase change data .

It has been shown that resolution influences QSM estimation. Haacke et al. (2015) showed on phantom data that by decreasing slice thickness from 3mm to 0.5mm partial volume effects are reduced, absolute susceptibility values decrease, and accuracy improves up to 25%. Similar findings on in vivo brain data are reported in Sun et al. (2017) (single-echo data) and Karsa et al. (2018) (multi-echo data) . Our results support the suggestion that a reduction of partial volume effects at higher-resolution might play a role in decreasing both test-retest and cross-site variability on the singleecho high-resolution data compared to the multi-echo low-resolution data.

R 2 * values show significantly lower variability, reflected in the higher ICC within and across-sites compared to corresponding values for  in subcortical areas. This may be because the  estimation is globally more sensitive to background field inhomogeneity compared to magnitude data. However, in orbitofrontal and lower temporal regions large through-plane field variations from tissue-air interfaces dominate the field changes and produce dropouts in the signal magnitude and increase the background phase, affecting both QSM and R 2 * maps by increasing variability and decreasing ICC across sites. In addition, because of large field variations, the estimated cortical R 2 * increases significantly when late echo times are used for the fitting, but this effect is not seen in subcortical areas.

QSM can only determine relative susceptibility differences (Cheng et al., 2009 ) and most approaches to calculation of susceptibility from measured phase yield maps in which the average value of susceptibility is zero over the masked imaging volume.

Issues related to referencing of QSM data have been investigated (Feng et al., 2018; Straub et al., 2017) , with aim of finding a reference region or tissue to which all susceptibility values are referred that produces well-defined and reproducible values of susceptibility. Here we investigated how the choice of reference affects the withinsite and cross-site variability of measured susceptibility at ultra-high-field. We tested three accepted reference regions: total whole brain signal, "wb", whole brain CSF eroded in order to exclude any pial or skull surfaces, "csf", and a manually selected cylindrical ROI in the right ventricle, "cyl". We found that the "cyl" referencing generally increased the variability of the cross-site and within-site susceptibility measurements in cortical and subcortical ROIs compared to "wb" referencing. In the case of the multi-echo acquisition the "csf" referencing also increased the variability relative to "wb" data. This may be because of imprecision in systematically obtaining average QSM signal from CSF regions. Referencing using a small ROI in the ventricles might be prone to subjectivity given the natural variation in ventricle size in healthy subjects and in disease. Furthermore, the ventricles do not contain pure CSF: they are traversed by blood vessels with a different  (Sullivan et al., 2002) . This makes wholebrain referencing attractive in many situations. Yet, in patient cohorts where there is substantial iron load in subcortical structures (Snyder and Connor, 2009) , whole brain referencing might not be an appropriate approach. In this case, the more appropriate approach will be to choose a small reference region which shows no changes in the particular disease to be "zero" susceptibility at a cost of a slight increase in SD.

To eliminate operator-dependent bias in segmentation when determining brain structures, we have analysed data using both manual and atlas-based segmentation. In this study, harmonized protocols were produced for all five scanners without any significant sequence alterations, as a product 3D gradient echo (GE) sequence was readily available on all systems (the product 'gre' sequence from Siemens and the product 'ffe' from Philips). The protocols and an example dataset are provided in (Clarke, 2018) . Generally, we also relied on the vendors' reconstruction. However, at the end of the reconstruction pipeline of the Siemens systems we adopted a different coil combination approach based on Roemer et al. (1990) and Walsh et al. (2000) , to match the SENSE approach implemented on Philips scanners (Pruessmann et al., 1999; Robinson et al., 2017) . This was required due to artifacts appearing on phase images in (Santin et al., 2017) . However, other coil combination methods such as a selective channel combination approach (Vegh et al., 2016) or the COMPOSER (COMbining Phase data using a Short Echo-time Reference scan) method (Bollmann et al., 2018) have also been shown to reduce open-ended fringe lines and noise in the signal phase. For future investigations, the raw k-space data collected from all sites in this study has been stored and is available from the authors upon request.

On the QSM reconstruction, an imperfect background field filtering can influence the reproducibility of QSM data. For this reason, we performed background removal in two steps as implemented in QSMbox v2.0 and as described in (Acosta-Cabronero et al., 2018) : first with the LBV approach and then followed by the vSMV method.

Regularized field-to-susceptibility inversion strategies have been proposed to overcome the ill-posed problem in QSM with data acquired at a single head orientation (de Rochefort et al., 2010) . We opted to use the MSDI implementation in QSMbox v2.0 (Acosta-Cabronero et al., 2018) , as it ranked top-10 in all metrics of the 2016 QSM Reconstruction Challenge (Langkammer et al., 2018) , and also now includes a new selfoptimized local scale, which results in a better preservation of phase noise texture and low susceptibility contrast features. On the second step, the regularization factor, λ, used for this study was set to 10 2.7 , as recommended by Acosta-Cabronero et al. (2018) based on an L-curve analysis (Hansen et al., 1993) with high-resolution 7T data.

The standard multi-echo GE protocol in this study was produced as a harmonised sequence that could be performed at all sites, with a relatively short acquisition time (approximately 5 minutes), which is acceptable for patient studies. Mid-brain structures such as the basal ganglia are identifiable, yet small subcortical structures will suffer from partial-volume effects, which could be a limitation of this harmonized protocol for future ultra-high field multi-site studies.

At ultra-high field there can be variations in SNR in magnitude data caused by the variable B 1 + across the brain (Abduljalil et al., 2003) . As R 2 * is estimated voxel-wise, and as there is always a reasonable SNR in the magnitude data, the coefficient in the exponential fit that estimates R 2 * will not be strongly affected by variations in B 1 + . QSM maps are estimated from filtered phase data which is not strongly affected by transmit B 1 + variations. On our data, no correlations were found between QSM or R 2 * maps and B 1 + flip-angle maps collected in the same session ( Figure 6 , Supplementary

Material 2).

Illustration of the feasibility of a 7T QSM clinical study.  (A) and R 2 * (B) for four ROIs (Substantia Nigra, SN; Caudate Nucleus, CN; Putamen, Pu; Globus Pallidus, GP) from healthy volunteer (HV) and synthetic "patient" (PT) data for which AV lit and SD lit were obtained from Langkammer et al. (2016) and SD b were calculated from data of the current study. AV lit values for R 2 * were linearly scaled to 7T according to Yao et al. (2007) . Blue bars show the AV lit  SD lit and green bars the AV lit  SD b . Statistical differences between HV and PT obtained from Langkammer et al. (2016) are also shown. For each ROI, the sample size that would have been needed to give a significant effect was calculated from the group means, AV lit , and the SD b per ROI and is shown in circles. Multi-echo -maps were calculated with data from all eight echoes.

To minimise confounding effects of age or pathology, we assessed test-retest reliability and cross-site variability with ten healthy young subjects. The cross-site, betweensubject standard-deviation, SD b , measured in this study was evaluated together with healthy and Parkinson's disease data from (Langkammer et al., 2016) . A power analysis revealed a sample size that would have been required for a multi-site clinical study in each ROI as shown in Figure 8 . For all the significant ROIs the number of subjects that would have been required per group was less or equal to 44. Since this is lower than the sample size we have used in this study (90 healthy volunteer scans) and the numbers in the Langkammer study (66 patients and 58 control subjects), it gives strong confidence of feasibility for future 7T QSM clinical studies.

We investigated test-retest reliability and reproducibility of T 2 *-weighted imaging protocols at ultra-high field MRI. Considering the increase in susceptibility effects at 7T, we found that variability of measurements of QSM  and R 2 * in the basal ganglia are reduced compared to reports from lower field strengths, 1.5T and 3T. Scanner hardware differences give more modest improvements for cortical measurements of QSM  and R 2 *. Multi-echo protocols do not benefit from long echo times as these increase the imprecision in the estimation of QSM. We suggest that 7T MRI is suitable for multicentre quantitative analyses of brain iron, in health and disease.

The UK7T 

JBR is supported by the Wellcome Trust (WT103838)

Conceptualization, Methodology, Formal analysis, Investigation, Writing -Original Draft, Writing -Review & Editing William T Clarke: Conceptualization, Methodology, Investigation, Writing -Review & Editing Ian D Driver: Conceptualization, Methodology, Investigation, Writing -Review & Editing Olivier Mougin: Conceptualization, Methodology, Investigation, Writing -Review & Editing Andrew T. Morgan: Conceptualization, Methodology, Investigation. Stuart Clare: Conceptualization, Methodology

Richard Wise: Conceptualization, Methodology, Project administration, Funding acquisition

Guy Williams: Conceptualization, Methodology, Project administration, Funding acquisition

Writing -Review & Editing, Funding acquisition

Writing -Review & Editing, Funding acquisition

Conceptualization, Methodology, Project administration, Investigation, Writing -Review & Editing, Funding acquisition

Fast three-dimensional phaseunwrapping algorithm based on sorting by reliability following a non-continuous path

Enhanced gray and white matter contrast of phase susceptibility-weighted images in ultra-high-field magnetic resonance imaging

A robust multi-scale approach to quantitative susceptibility mapping

In Vivo MRI mapping of brain iron deposition across the adult lifespan

In vivo quantitative susceptibility mapping (QSM) in Alzheimer's disease

Susceptibility weighted imaging: differentiating between calcification and hemosiderin

Highresolution characterisation of the aging brain using simultaneous quantitative susceptibility mapping (QSM) and R2* measurements at 7 T

MRI estimates of brain iron concentration in normal aging using quantitative susceptibility mapping

Increase in the iron content of the substantia nigra and red nucleus in multiple sclerosis and clinically isolated syndrome: a 7 Tesla MRI study

The challenge of bias-free coil combination for quantitative susceptibility mapping at ultra-high field

Understanding phase maps in MRI: a new cutline phase unwrapping method

Limitations of calculating field distributions and magnetic susceptibilities in MRI using a Fourier based method

The impact of coregitration of gradient recalled echo images on quantitative susceptibility and R2* mapping at 7T. bioRxiv

UK7T Network harmonized neuroimaging protocols

Multi-site harmonization of

Subcortical gray matter segmentation and voxel-based analysis using transverse relaxation and quantitative susceptibility mapping with application to multiple sclerosis

Use of registration for cohort studies

Quantitative susceptibility map reconstruction from MR phase data using Bayesian regularization: validation and application to brain imaging

Reproducibility of quantitative susceptibility mapping in the brain at two field strengths from two vendors

High-resolution MR imaging of the human brainstem in vivo at 7

European Ultrahigh-Field Imaging Network for Neurodegenerative Diseases (EUFIND)

Highfield MRI of brain cortical substructure based on signal phase

The intrinsic signal-to-noise ratio in NMR imaging

Whole-brain B1-mapping using three-dimensional DREAM

The clinical utility of QSM: disease diagnosis, medical management, and surgical planning

Quantitative susceptibility mapping (QSM) and R2* in the human brain at 3 T: Evaluation of intra-scanner repeatability

MR imaging of human brain at 3.0 T: preliminary report on transverse re-laxation rates and relation to estimated iron content

Imaging iron stores in the brain using magnetic resonance imaging

Quantitative susceptibility mapping: current status and future directions

The use of the l-curve in the regularization of discrete illposed problems

Biophysical mechanisms of phase contrast in gradient echo MRI

Quantitative susceptibility mapping at 3 T and 1.5 T: evaluation of consistency and reproducibility

Correlation of proton transverse relaxation rates (R2) with iron concentrations in postmortem brain tissue from Alzheimer's disease patients

The effect of low resolution and coverage on the accuracy of susceptibility mapping. Magnetic resonance in medicine

Effects of aging on T1, T2 * , and QSM MRI values in the subcortex

The impact of white matter fiber orientation in single-acquisition quantitative susceptibility mapping

Quantitative susceptibility mapping: report from the 2016 reconstruction challenge

Quantitative susceptibility mapping in Parkinson's disease

Quantitative susceptibility mapping (QSM) as a means to measure brain iron? A post mortem validation study

Quantitative MR imaging of brain iron: a postmortem validation study

The contribution of myelin to magnetic susceptibility-weighted contrasts in high-field MRI of the brain

3D texture analysis within substantia nigra of Parkinson's disease patients on quantitative susceptibility maps and R2* maps

Quantifying arbitrary magnetic susceptibility distributions with MR

Susceptibility tensor imaging (STI) of the brain

Quantitative susceptibility mapping of human brain at 3T: a multisite reproducibility study

Nonlinear formulation of the magnetic field to source relationship for robust quantitative susceptibility mapping

Morphology enabled dipole inversion for quantitative susceptibility mapping using structural consistency between the magnitude image and the susceptibility map

High resolution magnetic susceptibility mapping of the substantia nigra in Parkinson's disease

Magnetic hysteresis anomalies in ferritin

Iron, myelin, and the brain: Neuroimaging meets neurobiology

Robustness of PSIR segmentation and R1 mapping at 7T: a travelling head study

DREAM--a novel approach for robust, ultrafast, multislice B(1) mapping

Algorithm for fast monoexponential fitting based on auto-regression on linear operations (ARLO) of data

Signal-to-noise ratio and MR tissue parameters in human brain imaging at 3, 7, and 9.4 tesla using current receive coil arrays. Magnetic resonance in medicine

SENSE: sensitivity encoding for fast MRI

R: A language and environment for statistical computing. R Foundation for Statistical Computing

The future of susceptibility contrast for assessment of anatomy and function

High-resolution blood oxygen-level dependent MR venography (HRBV): a new technique

An illustrated comparison of processing methods for MR phase imaging and QSM: combining array coil signals and phase unwrapping

The NMR phased array

Reproducibility of R2* and quantitative susceptibility mapping (QSM) reconstruction methods in the basal ganglia of healthy subjects

Quantitative imaging of intrinsic magnetic tissue properties using MRI signal phase: an approach to in vivo brain iron metabolism

Toward online reconstruction of quantitative susceptibility maps: superfast dipole inversion

Fast robust automated brain extraction

Iron, the substantia nigra and related neurological disorders

Clinical integration of automated processing for brain quantitative susceptibility mapping: multi-site reproducibility and single-site robustness

Suitable reference tissues for quantitative susceptibility mapping of the brain

Differential rates of regional brain change in callosal and ventricular size: a 4-year longitudinal MRI study of elderly men

Structural and functional quantitative susceptibility mapping from standard fMRI studies

Extensive heterogeneity in white matter intensity in high-resolution T2*-weighted MRI of the human brain at 7.0 T

Selective channel combination of MRI signal phase

Adaptive reconstruction of phased array MR imagery

Quantitative susceptibility mapping (QSM): decoding MRI data for a tissue magnetic biomarker

A systematic review of the utility of 1.5 versus 3 Tesla magnetic resonance brain imaging in clinical practice and research

Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM

Whole-brain susceptibility mapping at high field: a comparison of multiple-and single-orientation methods

Imaging brain function in humans at 7 Tesla

Postprocessing correction for distortions in T2* decay caused by quadratic cross-slice b0 inhomogeneity

Neuro image susceptibility contrast in high field MRI of human brain as a function of tissue iron content

Brain iron in MR imaging: R2* and phase shift at different field strengths

Background field removal by solving the Laplacian boundary value problem