Augmenting disease maps: a Bayesian meta-analysis approach

Analysis of spatial patterns of disease is a significant field of research. However, access to unit-level disease data can be difficult for privacy and other reasons. As a consequence, estimates of interest are often published at the small area level as disease maps. This motivates the development of methods for analysis of these ecological estimates directly. Such analyses can widen the scope of research by drawing more insights from published disease maps or atlases. The present study proposes a hierarchical Bayesian meta-analysis model that analyses the point and interval estimates from an online atlas. The proposed model is illustrated by modelling the published cancer incidence estimates available as part of the online Australian Cancer Atlas (ACA). The proposed model aims to reveal patterns of cancer incidence for the 20 cancers included in ACA in major cities, regional and remote areas. The model results are validated using the observed areal data created from unit-level data on cancer incidence in each of 2148 small areas. It is found that the meta-analysis models can generate similar patterns of cancer incidence based on urban/rural status of small areas compared with those already known or revealed by the analysis of observed data. The proposed approach can be generalized to other online disease maps and atlases.


Introduction
A major field of research in spatial epidemiology involves the construction of disease maps showing the spatial distribution of disease [1]. Disease mapping is usually undertaken using observational disease data (spatially aggregated count data) to show small area estimates of disease incidence, prevalence, mortality or relative risk of a specific disease in small areas [2].
Sometimes the disease maps are published in the form of an atlas, for example, Surveillance Atlas of Infectious Diseases [3], the Atlas of Heart Disease and Stroke [4], the Environment and Health Atlas of England and Wales [5], the U.S. Atlas of Cancer Mortality [6], Atlas of Cancer in Queensland [7], Cancer Atlas of the United Kingdom and Ireland [8] and Australian Cancer Atlas [9]. Since the observational disease data are usually subject to privacy and confidentiality constraints, the modelled disease rates for small areas of a country or region are reported in the atlases. If these reported estimates can be used to extract important information about the spatial distribution of a disease and the influence of different risk factors on the incidence and/or survival, this would widen the scope of research in spatial epidemiology.
Meta-analysis is a popular method for combining results from different studies quantitatively [10], particularly when those results are in the form of aggregate or summary data [11,12]. The present study aims to develop a Bayesian hierarchical meta-analysis model which will model published estimates of disease from atlases to identify specific patterns in disease incidence and will yield similar types of inferences to those obtained using the raw data. The proposed model is illustrated for a specific disease, cancer, in the present study.
Cancer is the second leading cause of death in the world [13] and is an important topic of research in the field of spatial epidemiology [14]. In the twenty-first century, cancer is expected to become the single most significant obstacle to increasing life expectancy in every country in the world [15]. In 2018, the estimated number of new cases of cancer was 18.1 million and estimated number of deaths from cancer was 9.6 million worldwide [15].
A major field of research in cancer epidemiology is the assessment of spatial patterns of cancer. Substantial research has focused on assessing spatial patterns of cancer in different parts of the world [16][17][18][19][20][21][22][23]. The common sources of spatially referenced cancer data used for spatial analysis of cancer are population-based cancer registries; those are supplemented with additional population data, health survey data along with environmental and remote sensing data [24]. Most of the previous studies modelling geographical patterns of cancer incidence used observed population-based counts of new cases of cancer in a region or area, adjusted by the size of the population in each area, for specified periods obtained from a population or cancer registry [20,25,26].
Models for spatial patterns often incorporate spatial smoothing. Among many methods of smoothing, Bayesian methods [27] are very popular, for example, via Gaussian Markov Random fields [28], especially conditional autoregressive (CAR) models, as a prior for the spatial effects [29]. Some common model formulations using CAR representations for spatial smoothing in disease mapping are the Besag-York-Mollié (BYM) model [30], Leroux model [31] and Cressie model [32].
Often, researchers wish to undertake further analysis of disease patterns in these atlases. As an alternative to accessing the underlying observational data, the reported estimates in the atlases can be further modelled using a Bayesian hierarchical meta-analysis model to gain additional insights. In the literature, there are many studies that perform meta-analysis of disease estimates (standardized incidence or mortality ratio, SIR/SMR) across different published studies [33][34][35][36]. However, metaanalysis models using estimates derived from areal data from a single database, particularly to analyse the outcomes of a cancer atlas have not been explored yet, to the best of our knowledge.
The proposed Bayesian hierarchical meta-analysis model approach is not a traditional application of meta-analysis to estimates from different published studies; rather, it is an application of meta-analysis to estimates from a single study, that quantifies the ecological effect of covariates using model-based estimates. The proposed model is applied to analyse the spatially smoothed point and interval estimates available in the Australian Cancer Atlas (ACA) for the 20 different cancers in 2148 small areas (Statistical Area level 2, SA2, defined by Australian Bureau of Statistics (ABS), Australian Statistical Geography Standard (ASGS) 2011 boundaries [37]) across the country. The primary question of interest that is used to illustrate the approach is focused on whether cancer incidence varied in urban, regional and remote areas across Australia. Each of the SA2s belongs to each of the three remoteness categories according to geographical and accessibility to services within each area.
In cancer epidemiology, studies to examine the relationship of cancer incidence/survival/mortality and geographic remoteness is well researched [38][39][40][41]. For example, the relationship between risk of advanced colorectal cancer incidence in Queensland and geographical remoteness was found to be significant for those diagnosed with colon cancer [42]. A classification tree approach has also confirmed a significant association between remoteness and the incidence of several cancers [41]. The cancer disparities in different remoteness categories have also been researched in [43][44][45][46], etc. All the mentioned studies focused on the influence of remoteness on cancer outcomes using population-based cancer data. Hence, in the present study, this well-researched and important research question is chosen to provide additional comparisons by which the validity of the proposed approach can be assessed.
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 7: 192151 The meta-analysis approach was proposed because this opens up other avenues for extracting insights from data when only the summary data are published and the original data are unavailable. This is often the case for health data which are subject to privacy and confidentiality. It is true that the original authors could be approached to undertake the follow-on analyses, but this may not always be possible: for example, they could simply refuse, or not have time or funding to implement the request. Moreover, even if the original data were available, the analysis of primary data often requires some domain knowledge. The proposed approach provides a statistically valid methodology to model the published point estimates, taking into account their associated uncertainty, in straightforward manner. We show that this can facilitate new insights, in our case an enhanced understanding of the spatial distribution of cancer.
Following this introduction the paper is organized as follows: §2 consists of a description of data; §3 describes the methods and §4 reports the results; §5 outlines some possible model extensions followed by a discussion in §6.

Data
The present study uses publicly available small area estimates from the Australian Cancer Atlas (ACA) [9]. The ACA is a freely accessible and interactive online platform, showing the spatial variation in standardized incidence and excess deaths for 20 cancers across Australia (see table 13 for a list of the cancers). A key feature of the ACA is the use of Bayesian spatial models to generate the point estimates and their 95% credible intervals for the estimates of standardized incidence ratios (SIR) and excess hazard ratios (EHR) for each cancer in each of 2148 geographical areas (SA2) covering Australia. To generate the estimates of SIR, unit-level data on each cancer over different time periods were modelled using Bayesian spatial models. For 14 cancer types (oesophageal, stomach, liver, pancreatic, cervical, uterine, ovarian, kidney, brain, thyroid, non-Hodgkin lymphoma, leukaemia, myeloma and head and neck), data on a 10-year time period (2005-2014) were used. For the remaining six cancer types (bowel, lung, melanoma, breast, prostrate, all cancers combined), data on a 5-year time period (2010-2014) were used.
Statistical Areas level 2 (SA2) are medium-sized general-purpose areas designed to represent a community that interacts together socially and economically [37]. There are 2196 SA2 regions (ASGS 2011 Boundaries [37]) covering the whole of Australia without gaps or overlaps. In the ACA, SA2s with zero/nominal population, far-flung islands are excluded.
To address the research question, we focused on the SIR and information on the remoteness status in 2011 of each SA2. This information on remoteness was obtained using the Remoteness Index provided by the Australian Bureau of Statistics (ABS), which classifies small areas in Australia into five categories of remoteness based on their relative access to services. There are five categories of Remoteness-Major city, Inner regional, Outer regional, Remote and Very remote. The original five categories were combined into three classes as: 1 = Major Cities (1242 SA2s), 2 = Inner/Outer Regional (810 SA2s) and 3 = Remote/Very Remote (96 SA2s) in the present study.

Bayesian meta-analysis Model
Meta-analysis can be accomplished by applying fixed effects or random effects models to analyse aggregate or summary data or individual data published in different studies on the same subject [11,12]. There has been an active literature on Bayesian approaches of meta-analysis of different types of data, since the use of hierarchical Bayesian model to cast a random effects model [47]. For more details on meta-analysis using Bayesian inference, choice of prior distributions, Bayesian computation and interpretation with examples, see Koricheva et al. [48]. The Bayesian spatial model used to generate the estimates reported in the ACA is described in §3.2. Since this model has already incorporated spatial smoothing to protect the identity of the individuals, additional smoothing terms are not considered in the proposed meta-analysis model, although we revisit this in the section on Model extensions. In the following, we adopt the approach taken in the ACA and model each cancer individually.

Model specification
The data available for modelling are the published estimates of the posterior mean standardized incidence rate (SIR) and corresponding 95% credible intervals for each of N = 2148 statistical areas (SA2). Using notation based on [49], let Y i[ j ] be the modelled log(SIR) for the ith SA2 (i = 1, 2, ….2148) royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 7: 192151 in the jth remoteness region ( j = 1, 2, 3, 1 = major cities, 2 = regional and 3 = remote). Similarly, let S i[ j ] be the associated standard deviation of Y i[ j ] , obtained from (log(UCL) − log(LCL))/(2 × where θ j is the region-specific mean of the logSIR for the specific cancer and s 2 j is the corresponding variance. The region-specific mean θ j can in turn be modelled as where μ 0 is the overall mean log(SIR) and s 2 0 is the associated variance. Priors of the above hierarchical Bayesian meta-analysis model can be specified as: x 2 n , ( 3 :5) and where ν is the associated degrees of freedom for s 2 ij , t j ¼ 1=s 2 j and t 0 ¼ 1=s 2 0 are the precision parameters associated with the region-specific variance and the overall variance, and N + denotes the zero-mean truncated normal distribution truncated from below at zero.
The above model is fitted for each sex and each of the 20 cancers independently. Thus, we did not include any cancer-specific or sex-specific indices.
The choices of priors and associated hyperparameters were made on the basis of the results of the sensitivity analysis. A total of 76 combinations of priors using different distributions and different values for the hyperparameters were compared and the above model was evaluated to perform reasonably well in terms of convergence (assessed by visual diagnostics and the Gelman-Rubin statistic [50]) and posterior predictive checks (posterior predictive distribution plots with observed value for summary measures, Bayesian posterior predictive p-values and visual comparisons between replicated data drawn using a posterior predictive distribution with the estimates from the ACA). For the prior on the overall mean, μ 0 , a normal prior with many different choices of variance parameter were compared. For the variance parameters, s 2 j and s 2 0 , after comparing all the conventional priors (inverse Gamma, half normal for standard deviation and so on), we found that, for modelling the atlas output data, placing half normal priors on precision parameters gave better model performance, hence we made the above choices. We set ν = 2, as a small value of ν reflects little a priori knowledge about the within-study variation. For more details of the sensitivity analysis, see the electronic supplementary material.

Model implementation
The proposed model was fitted using the R programming language (R 3.5.3) [51] using the package R2jags v. 0.5-7 [52]. The MCMC output of the JAGS model was summarized in R using the coda package [53]. The JAGS code for the model is given in appendix A.
Three parallel MCMC chains, each with 500 000 iterations with a burn in period of 400 000 iterations were run to fit the proposed model. Convergence was examined by using visual diagnostics for the parameters of interest. The parameters for which posterior samples are drawn are: μ ij , s 2 ij , θ j , τ j , μ 0 and τ 0 . In this study, we focused on the region-specific estimates of mean log(SIR) represented by θ j , the corresponding precision parameters τ j and the overall mean μ 0 along with the corresponding precision parameter τ 0 . The region-specific means of log(SIR), θ j , or e uj , which is the mean SIR in jth region, are the key parameters of interest to describe the pattern of incidence of each cancer in major cities, regional and remote areas.

Model inferences
A range of measures were computed in order to identify patterns of cancer incidence by remoteness of the regions. Using the posterior estimates of region-specific means of log(SIR), θ j , in each MCMC iteration the regions were ranked in descending order and the summary of ranks along with 95% credible intervals and probability distributions of ranks for each region were calculated for each cancer. The posterior means of θ j ( j = 1, 2, 3) were also used to obtain pairwise comparisons of region-specific incidence of each cancer probabilistically. For each of the cancers, the values of θ j in each of the MCMC iterations were ordered, thus enabling calculation of the probabilities of major cities having higher posterior means of log(SIR) than that of regional and remote areas, also the probability of regional areas having higher incidence than those in remote areas for males and females. These probabilities provided additional support to the comparisons made using ranks for the observed patterns of cancer incidence by remoteness. In addition to the pairwise probabilistic comparison of major cities, regional and remote areas, the point and interval estimates of posterior mean differences were calculated and reported for each of the three pairs (major cities and regional, major cities and remote, regional and remote) for all the cancers by sex.

Model validation
The results of the proposed Bayesian meta-analysis model for all 20 cancers were verified by comparing with those obtained from unit-level analysis of cancer registry data from each cancer in 2148 SA2s in Australia. Following the choices adopted in ACA, the Leroux model [31] was fitted using the observed incidence data for each cancer. Let Z ij be the number of reported cancer diagnoses in the ith SA2, and jth region modelled as where E ij represents the expected number of cancers in the ith SA2 and jth region, which can be calculated for each SA2 in each region by multiplying the SA2 population with the ratio of number of cancers diagnosed in Australia to the Australian population in each 5-year age group, then summing over all age groups. Under this model, the log(SIR), μ ij , was modelled as, where I j ∈ (0, 1) is an indicator for the jth region ( j = 1, 2, 3 for major cities, regional and remote areas, respectively) and β j is the associated coefficient. Note that the model adopted in the ACA does not have the covariate I j to consider the remoteness of an area in the model; it was included here in order to validate the proposed meta-analysis model. The term R i accounts for spatial autocorrelation between the SA2s. The spatial random effects term has a conditional distribution [31]; . . , 2148 areas: That is, the expected random effects are weighted averages of the random effects of neighbouring areas, R −1 = {R 1 , R 2 , …, R i−1 , R i+1 , …, R N } with weight ρ and have a global mean of 0 with weight (1 − ρ). If the spatial dependence parameter, ρ, equals 1, the Leroux prior is the same as the intrinsic CAR prior [30], and if ρ equals 0, the areas are spatially independent and no spatial smoothing occurs. The term W ik is the (i, k)th element of N × N spatial adjacency matrix having value 1 if areas i and k are considered to be neighbours and 0 otherwise. Neighbours are generally defined by shared boundaries, although islands are assigned neighbours predominately based on the closest mainland access points. An inverse Gamma prior is specified for the variance s 2 R and a uniform (0, 1) prior is chosen for the spatial-dependent parameter, ρ. For further details, see https://atlas.cancer.org.au/ methodology/model-for-cancer-diagnosis/.
In the Leroux model applied to observed incidence data (equations (3.8)-(3.10)), the regions, major cities, regional and remote were used as covariates and a spatial random effects term (R i ) was included to account for spatial autocorrelation. In the proposed meta-analysis model, the regions are added in a hierarchy of the model (see equations (3.1)-(3.7)) and no spatial term was included explicitly since this was already in the reported ACA estimates.
To compare the results of the proposed meta-analysis model and the Leroux model fitted to the observed (real) areal data of cancer incidence, the relative difference between the posterior estimates royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 7: 192151 of region-specific means of the meta-analysis model (denoted by e u j ) and the unit record model (denoted by e b j ), was calculated as The posterior distribution of ranks of the region-specific mean SIRs (for major cities, regional and remote areas) are also calculated for both models.

Results
The proposed model was applied to analyse the outputs of each of the 20 cancers included in the ACA for both males and females, where applicable. In §4.1, the detailed results obtained by fitting the proposed model for two cancers, lung cancer and thyroid cancers for males and females are reported. Following this, the findings for all the cancers are summarized in §4.2. This is followed in §4.3 by a comparison of the outputs of meta-analyses with those found by analysing the raw incidence data in order to validate the proposed modelling approach. . The estimates of θ 1 , θ 2 and θ 3 are the region-specific means of the log(SIR) in major cities, regional and remote areas, respectively. From the summary results, we can see that the standardized incidence of lung cancer is relatively higher than the Australian average in remote areas and regional areas, and lower than the Australian average in major cities. For a more detailed comparative analysis of geographical disparities in lung cancer incidence, the region-specific means, θ j , were ranked in descending order (region with highest mean log(SIR) is ranked 1) in each iteration of the MCMC chains generated for fitting the proposed model. The summaries of the posterior distribution of ranks of each region and the probability distribution of ranks per region are shown in tables 2 and 3.
From tables 2 and 3 of the ranks of the area-specific means, it is clear that areas within major cities are more likely to have lower incidence of lung cancer and remote areas are most likely to have higher incidence of lung cancer for both sexes. This is supported by the pairwise comparisons given in table 4, the pairwise posterior mean differences of log(SIR), shown in table 5 and the posterior relative risks of SIRs for regional and remote areas with reference to major cities shown in table 6. This is also well established in the literature that lung cancer is known as a low socio-economic status (SES) cancer, having higher risk in the areas with low SES neighbourhood [54,55]. In Australia, lung cancer risk is more in remote areas in comparison to major cities and regional areas [56]. So the output of the proposed meta-analysis models for lung cancer by remoteness categories are supported by the existing literature as well. Figures 1 and 2 show the SIRs of lung cancer (for males and females) from the ACA and the fitted Bayesian hierarchical meta-analysis models. There is larger variability in the SIRs from the atlas  compared with those of the fitted model, since the effect of remoteness has been removed from the estimates in our fitted model. The atlas estimates are obtained grouping all small-area estimates by remoteness categories without considering remoteness in the model, whereas in the proposed model, the posterior SIRs are the result of a model where remoteness categories are included as a covariate to obtain region-specific SIRs for major cities, regional and remote areas.  . The estimates of θ 1 , θ 2 and θ 3 are the specific means of the log(SIR) in major cities, regional and remote areas, respectively. From the summary results, we can see that the standardized incidence of thyroid cancer is relatively higher than the Australian average in major cities and lower than the Australian average in regional and remote areas. The summaries of the posterior distribution of ranks of each region and the probability distribution of ranks per region for thyroid cancer are shown in tables 8 and 9.
From tables 8 and 9, it is apparent that thyroid cancer is most likely to have higher incidence in the areas within major cities for both sexes, which is supported by the pairwise comparisons given in table 10, pairwise  posterior mean differences of log(SIR), shown in table 11, and the posterior relative risks of SIRs for regional and remote areas with reference to major cities in table 12. Figures 1 and 2 also depicts the SIRs of thyroid cancer (for males and females) from the ACA and the fitted Bayesian hierarchical meta-analysis models.
Regarding the mismatches in the probabilities and posterior mean differences, the width of credible intervals constructed for the posterior mean differences obviously reflects the amount of variation in these estimates. While calculating the probabilities, we calculated the proportion of times the fitted mean for major cities is larger than the corresponding means for regional and remote areas, and so on. We did not consider the magnitude of the differences or the uncertainty of the obtained probabilities. We inferred differences between regions based on the credible intervals for the posterior mean differences.

Summary of outputs for Bayesian meta-analysis model
A summary of outputs for all the 20 cancers included in ACA is reported in this section. Table 13 shows the pairwise probabilities of major cities having higher SIR than regional and remote regions, as well as the probabilities of regional areas exceeding remote regions with respect to SIR, for each of the 20 cancers (by sex where applicable). For instance, on average, the SIR in major cities is almost certainly larger than the SIR in regional and remote areas for eight cancer/sex groups (for males: non-Hodgkin lymphoma, stomach and for females: breast, myeloma, non-Hodgkin lymphoma, ovarian, stomach, thyroid). Conversely, on average, major cities are not likely or substantially less likely to have higher SIRs than regional and remote areas for 10 cancer sex group (for males: bowel, head and neck, lung, oesophageal and for females: bowel, cervical, head and neck, lung, melanoma, oesophageal). On the contrary, for leukaemia (females), the probability that the SIR is larger in major cities than regional areas is 0.476 which indicates little difference between the SIR in major cities and regional areas, on average. Similar interpretation can be made for pancreatic cancer (for males: P(major cities > remote) =  0.573, P(regional > remote) = 0.487; for females: P(major cities > regional) = 0.587), bowel cancer (for females: P(major cities > regional = 0.587) and kidney cancer (for males: P(major cities > regional) = 0.605). So a probability close to 0.5 does not indicate substantial difference between the pair under comparison. However, the ranking of the three regions is still able to show a pattern, so in the mentioned cases the pattern is suggestive, not substantial. In addition to the ranks and probability,  the magnitude increase or decrease in the SIRs of regional and remote areas in Australia compared with those of major cities, on the average. Figures 3 and 4 show the fitted values of SIRs obtained from the proposed meta-analysis model for major cities, regional and remote areas for each cancer for males and females, respectively; the horizontal line represents the Australian average. These figures show, at a glance, the large variation in the SIRs by remoteness categories. A comparison of these fitted values to the SIRs from the atlas is shown in figures 5

Validation of outputs of Bayesian meta-analysis model
In this section, the region-specific estimates obtained from the proposed Bayesian hierarchical metaanalysis model on the ecological data (equations (3.1)-(3.7)) and the Leroux model using the observed areal incidence data with remoteness categories as covariates (equations (3.8)-(3.10)) are compared.
The posterior distribution of ranks of the region-specific mean SIRs (for major cities, regional and remote areas) for each cancer revealed consistently the same patterns. For instance, according to both models, SIR of brain cancer is most likely to have higher values in major cities compared with regional and remote areas, melanoma is most likely to have a higher SIR in regional areas compared with major cities and remote areas (table 13).
The difference in posterior estimates of mean SIR for each cancer, for males and females, between the output of unit record model and Bayesian hierarchical meta-analysis model are displayed in figures 7 and 8 respectively. Table 15 shows the relative difference between the posterior estimates of region-specific means of the meta-analysis model and the unit record model. The relative differences between the posterior means for major cities and regional areas are small, with the relative differences being less than 5% for most of the cancers, and less than 10% for only four cancers (table 15). However, the relative differences between the posterior mean estimates for remote areas are much larger. This is because of the smaller number of remote areas and the greater variability between the estimated SIRs in this category. See appendix A.2. for further exposition.

Model extensions
The proposed Bayesian hierarchical meta-analysis models have been applied to ACA data to identify relationship between remoteness categories and incidence of 20 different cancer types by sex. The

Inclusion of a spatial component
In the present study, we have modelled spatially smoothed estimates of log(SIR) from ACA. Since the estimated SIRs were already results of a Bayesian spatial model (with a Leroux prior for the spatial term), we did not use any spatial component in our proposed hierarchical Bayesian meta-analysis model. Further investigation of the residuals of the fitted models for 20 different cancer types by sex resulted in Moran's I values ranging from 0.15380 to 0.89485 with corresponding standard deviations from 0.01447 to 0.01450 and all p-values less than 0.0001 (see appendix A. 3, table 22). To improve the model performance, a spatial term could be added to the proposed meta-analysis model and a spatial prior could be chosen for modelling. This would be a straightforward extension to the specified model (equations (3.1)-(3.7)); instead of equation (3.1) in the present model, we can adopt the following: ) is an unstructured error component and ψ i[ j ] is a spatial random effect having a spatial prior. Any suitable spatial prior could be chosen to model the spatial component [30,31]. For illustration, the proposed Bayesian meta-analysis model with spatial component is fitted for two cancer types, pancreatic and liver cancer (males), which have Moran's I values 0.15380 and 0.89485, respectively (table 22). The updated model codes adding a spatial component are available in appendix A.4.

Meta-analysis models with spatial component for liver and pancreatic cancer (males)
The Bayesian hierarchical meta-analysis model with spatial component introduced at first step as shown by equation (5.1) are fitted in R using R2WinBUGS package. We fitted the model for liver and pancreatic cancer (males). The results are summarized in this section.
The posterior estimates of the model parameters and the corresponding 95% credible intervals for the two cancers are presented in table 16. The posterior estimates of region-specific means θ j , j = 1, 2, 3 for each of the three regions (major cities, regional and remote areas) are ranked and the ranks are summarized in royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 7: 192151 table 17. The ranks show that major cities have highest incidence of liver and pancreatic cancer. The median ranks are the same in terms of identifying the region with lowest incidence compared with those obtained by the model without the spatial component (table 13). However, for the region with highest incidence, the 95% credible intervals for both models agree in spite of differences in median ranks, as observed from the credible intervals of the ranks. The probabilities of major cities having larger incidence than regional and remote areas are shown in table 18, followed by the posterior mean differences with 95% credible intervals in table 19.
The posterior relative risks of liver and pancreatic cancer for the extended model (table 20) and the model without a spatial component (table 14) are also similar. Hence adding an additional spatial component to the Bayesian hierarchical meta-analysis model did not change the inference for the two cancers modelled here. In figures 9 and 10, we can observe the posterior mean SIRs for each region for both models. As expected, there are differences in the magnitude due to the additional spatial smoothing, but the observed pattern remains the same for both the cancers.
The relative differences of estimated log(SIR) in the three different remoteness regions using a metaanalysis model with a spatial component and the model using observed incidence data are shown in table 21. The relative differences of the model without spatial component (table 15) and the extended model with a spatial term are very close and they are all very small (less than 5%). However, the extended model could be fitted to cancers with larger relative differences such as myeloma, brain, prostate for males and liver, head and neck, ovarian for females to check whether adding a spatial component can reduce the relative difference from the estimates obtained using the observed data.
Moran's I of the residuals after fitting the meta-analysis model with a spatial component for liver and pancreatic cancer (males) are 0.8857 (s.d. 0.1449, p-value , 0:0001) and 0.2051 (s.d. 0.0144,  p-value , 0:0001), respectively, which are still statistically significant and very similar to what we   From this further investigation, we recommend using the proposed Bayesian hierarchical meta-analysis model without the additional spatial component to model estimated cancer incidence data if the estimates were results of spatial models, for example, in case of ACA data. However, if the estimates are not obtained using a spatial random effects model, then including a spatial component in the meta-analysis  Figure 10. Posterior mean SIR of pancreatic cancer (males) for meta-analysis models with and without spatial component. with spatial without spatial model SIR major cities regional region remote Figure 9. Posterior mean SIR of liver cancer (males) for meta-analysis models with and without spatial component. model might be useful and indeed essential. It is also noted that these observations are based on this particular dataset; this motivates further research in order to make more general statements.

Adding continuous covariates
In the proposed Bayesian hierarchical meta-analysis model, only one covariate is included as a level of hierarchy, which is remoteness categories. The model can be modified to include continuous covariates if we replace the θ j by θ in equation (3.2) as where X is a design matrix comprising continuous covariates and β is the vector of corresponding regression coefficients. The resulting model equation can thus be rewritten as a meta regression model with normal priors on the regression coefficients [48].

Discussion
This study has proposed and illustrated a methodology for using information provided in the form of disease maps in atlases. The benefits of the proposed approach that has been demonstrated include the ability to gain further insights from the atlases without having access to the original data, which are sometimes unavailable due to privacy and other reasons. The utility of this approach has been illustrated by the substantive analysis of estimated point and interval estimates of SIRs available in the ACA. The proposed methodology was applied to analyse the outputs from the ACA for 2148 small areas across Australia. This study aimed to determine whether a meta-analysis approach using modelled estimates was able to obtain similar inferences with those obtained using observed areal incidence data in identifying differences in cancer incidence by remoteness of an area. The output of the proposed meta-analysis model demonstrated patterns of cancer incidence by remoteness categories for most of the cancer types. It was observed that some cancers are more likely to occur in remote areas (head and neck cancers, liver cancer, lung cancer, oesophageal cancer for males and females and cervical cancer, uterine cancer for females), while some cancers are more likely to occur in regional areas (all cancers, bowel cancer, melanoma for both sexes, kidney cancer for females, leukaemia and prostate cancer for males) and some cancers are more likely to have greater incidence in major cities (brain, myeloma, non-Hodgkin lymphoma, pancreatic, stomach, thyroid cancer for both sexes, kidney cancer for males, leukaemia and ovarian cancer for females). Using the estimated SIRs reported in the ACA, the proposed model probabilistically identified these patterns of cancer incidence for different regions (major cities, regional and remote areas) in Australia. These probabilistic inferences could be useful to policy-makers by supplementing the information from the ACA.
The meta-analysis method used in our study is, in effect, calculating the average of already averaged SIR estimated for each SA2; this means that there will be some measurement error compared with average SIR estimates based on observed areal incidence data. However, given the slight differences in time periods, the general consistency between the remoteness patterns in cancer incidence we have reported in our study with those reported by the Australian Institute of Health and Welfare (AIHW) in the CIMAR (Cancer Incidence and Mortality Across Regions) books using observed areal incidence data [38] is encouraging. This is particularly so when we focus on our results that had very high (∼1) or low (∼0) probabilities of being higher or lower than other remoteness categories. Clearly, there is greater uncertainty when the probabilities are closer to 0.5, and so these patterns should be interpreted with much greater caution. As an example, our model reported that incidence of kidney cancer for males in major cities could be higher than that of regional areas with a probability of 0.605, which still leaves a substantial 30% probability that regional areas do have higher incidence (as was reported by the AIHW [38]). The remoteness patterns from our proposed meta-analysis model for Australia are also similar to that reported in the Atlas of Cancer in Queensland [57]. The authors considered 13 cancers and four geographical regions: major cities, inner regional, outer regional and remote areas. Cancers having higher incidence in remote areas were lung cancer, oesophageal cancer, cervical cancer; cancers having higher incidence in major cities were breast cancer, kidney cancer; and cancers which have higher incidence in regional (inner or outer) areas were melanoma, prostate and non-Hodgkin lymphoma.
The output of the proposed meta-analysis model was also validated by modelling the observed areal incidence data from 2148 SA2s around Australia. The meta-analysis approach was able to unveil the original regional patterns of cancer incidence found by the unit record model in the majority of cases.
While insights about the different patterns of cancer incidence by remoteness of regions have been widely published [38,57], the rationale of using this methodology to regenerate these patterns was to provide evidence supporting the validation of the proposed modelling approach. The purpose was to confirm whether the proposed model could reveal the same patterns as those observed using the raw data so that this methodology can later be applied to obtain additional insights into other ecological factors which are not already known.
There are limitations to the proposed approach. First, the posterior mean of overall average of each cancer using the proposed meta-analysis model is sometimes different from the Australian average 1 (for lung cancer: 1.05 for males, 1.01 for females, thyroid cancer: males 0.901, females 0.814). However these differences are not substantial as the 95% credible interval around the overall fitted mean contains 1 in all the cases. The reason behind the mean of the parameter not being exactly equal to the Australian average is that, in our proposed meta-analysis model, we are modelling the SIRs, which are ratios themselves and the average of the ratios and the ratio of averages will not be the same [58].
Second, to be able to apply the proposed methodology to online atlases, both point estimates as well as interval estimates or uncertainty estimates need to be available. Moreover, the proposed model might not provide accurate numerical estimates in the presence of outliers; this may explain the more limited validation of the proposed model for remote areas.
We also acknowledge that the proposed methodology could be subject to Simpson's paradox [59][60][61], which refers to patterns observed for groups of data reversing or disappearing when combined together, since we are analysing summary statistics, as opposed to individual-level data [62,63]. In the spatial context, Simpson's paradox could be considered as a form of modifiable areal unit problem, where results differ depending on the spatial units chosen [64]. Our analysis has the advantage of retaining the original areas within the analysis, and in the case study presented, the areas are the smallest possible available. Results can be considered valid provided they are not extrapolated to an individual level.
The proposed Bayesian hierarchical meta-analysis model is somewhat sensitive to the choice of priors and hyperparameters since they are needed at the lower levels of hierarchy, i.e. for overall mean, overall precision and region-specific precision parameters. Sensible choices of the hyperparameters and prior distributions must be made for the priors of overall mean and precision parameters (or variance parameters) in the model to ensure better model performance. Details of a comprehensive sensitivity analysis to explore this issue are reported in the electronic supplementary material.
The modelling approach proposed in this study can be applied to other published disease atlases that provide point and interval estimates of disease rates. This study explored the behaviour of cancer incidence according to urban/rural status of the small areas. Similar types of analyses can be carried out to identify the influence of different socio-demographic, economic, ecological or environmental factors on specific disease incidence or mortality. This implies that further ecological modelling applying the proposed approach can answer important additional research questions to complement the information already published in the atlas or associated literature. The proposed model can also be extended relatively straightforwardly to a multivariate framework.
Another possible extension of the proposed approach is to combine estimates from other atlases together with ACA and pool the effect of remoteness. There will be some additional challenges in appropriately describing the heterogeneity arising within and between the different studies, and accommodating different definitions of the remoteness categories from the atlases. This is a worthy objective for a future study.
Data accessibility. Two data sources were used. 1. Modelled estimates, freely available and downloaded from: atlas.cancer.org.au/app on 5 October 2018. The URL of the database is: https://atlas.cancer.org.au/app.
After launching the atlas, the top right corner (beside the cancer's name, there is an options button, clicking on that button opens a pop-up for data download. Interested people can type their email in the designated place and the link to download full atlas dataset, in the form of a zip file will be emailed to them within a couple of minutes. This step is necessary for obtaining the data and the Atlas wants to keep record of people who accessed the data.
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 7: 192151 A.2. More on larger variation in posterior estimates of remote areas Apparently, the modelled estimates from the ACA exhibits more variability in remote areas in comparison to major cities and regional areas on the average. We particularly investigated those cancers which showed larger variability in relative differences in estimated SIRs from meta-analysis output and those obtained from analysing the real data. The reasons could be one of the following: -The smaller number of remote areas than the number of major cities and regional areas among the 2148 small areas ( figure 11). -The larger variability in the estimates of each remote area could be the result of smaller population sizes and fewer cases of some cancers in respective small areas. The inherent variability in estimated SIR in the remote areas for the selected cancers from the ACA makes the credible interval for remote areas wider, which will result in larger relative differences in the posterior means (figures 12 and 13). Perhaps, this can be improved a little by considering the median of the posterior SIRs (from metaanalysis models and model fitted using observed data).

A.3. Measures of spatial autocorrelation
The spatial autocorrelation among the residuals of the fitted models for all 20 cancers by sex are listed in table 22.     royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 7: 192151