Spatial analysis method

This protocol is extracted from research article:

Spatiotemporal characteristics and meteorological determinants of hand, foot and mouth disease in Shaanxi Province, China: a county-level analysis

**BMC Public Health**, Feb 17, 2021; DOI: 10.1186/s12889-021-10385-9

Procedure

Two laws of geography underpin this analysis. The first, proposed by Waldo Tobler, states: “Everything is associated with others, and close things are more related compared with distant things” [24]. It describes the relationship between distance and association. The second, proposed by Michael Goodchild, is the law of spatial heterogeneity: “The separation of space accounts for the difference between regions, namely, heterogeneity, including spatial local heterogeneity and spatial stratified heterogeneity” [25]. It implies that the values of individual units can differ from those of the surrounding regions, and such units can be regarded as hot or cold spots. Based on these laws, spatial autocorrelation analysis was formulated to reveal the spatial dependence and hierarchical spatial enumeration.

Appendix 2 illustrates the different types of spatial clustering. Each circle represents the value of a variable in a specific unit, and the circles are associated with each other. Red circles represent higher values, while blue circles denote lower values. The left graph shows positive spatial autocorrelation: similar values cluster, so the red circles lie near each other and the blue circles lie near each other. The middle graph shows no spatial autocorrelation, because high and low values are distributed randomly. The right graph shows negative spatial autocorrelation: high values are surrounded by low values.

Moran’s I is one of the most commonly used indicators in spatial autocorrelation analysis and has a global and a local form. Global Moran’s I reflects the first law, measuring the spatial dependence across the whole research region, whereas local Moran’s I, as an expression of the second law, reflects regional differences. In our study, global Moran’s I reveals the overall spatial distribution characteristics of the study region, and local Moran’s I identifies the specific cluster regions within it [26].

In this study, the value of global Moran’s I, ranging from −1 to 1, reflects the overall spatial distribution of HFMD in Shaanxi Province. A value near 1 indicates positive spatial autocorrelation, meaning that counties with high incidence rates of HFMD tend to cluster [27, 28]. A value of zero indicates no spatial autocorrelation, with high and low values scattered randomly across Shaanxi. A value near −1 indicates negative spatial autocorrelation, meaning that counties with high and low values border each other. The equation of global Moran’s I is as follows:

$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij}\,(X_i - \overline{X})(X_j - \overline{X})}{\sum_{i=1}^{n} (X_i - \overline{X})^2}$$

Where *X*_{i} and *X*_{j} are the incidence rates of HFMD in counties *i* and *j*, and $\overline{X}$ is the mean incidence rate of HFMD in Shaanxi. The deviation of each county's incidence rate from this mean determines whether a pair of counties contributes positively or negatively to the index. *n* is the number of counties in Shaanxi. *W*_{ij} is an important tool in spatial modelling, as it quantifies the spatial dependence between observations, and is normally expressed as an n × n non-negative matrix W:

$$W = \begin{pmatrix} W_{11} & W_{12} & \cdots & W_{1n} \\ W_{21} & W_{22} & \cdots & W_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ W_{n1} & W_{n2} & \cdots & W_{nn} \end{pmatrix}$$

Where n is the number of spatial units and *W*_{ij} represents the spatial dependency relationship between region i and region j. The larger the weight value, the stronger the spatial dependency between the regions. The spatial weight matrix was constructed based on a contiguity relationship. Therefore, every value on the main diagonal of the matrix is zero, which means that no area is adjacent to itself, namely, *W*_{ii} = 0. At the same time, if areas i and j are adjacent, then *W*_{ij} = *W*_{ji}, so the spatial weight matrix is symmetrical.
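As a minimal sketch of the definitions above, the following Python code builds a binary contiguity matrix and computes global Moran's I with NumPy. The function names, the five-unit chain layout and the example values are hypothetical and chosen only for illustration; they are not the study's data, and the original analysis may have used dedicated software and a different weighting scheme.

```python
import numpy as np

def contiguity_weights(neighbours, n):
    """Build a binary n x n contiguity matrix from {unit: [adjacent units]}."""
    W = np.zeros((n, n))
    for i, adjacent in neighbours.items():
        for j in adjacent:
            if i != j:           # a unit is never its own neighbour (W_ii = 0)
                W[i, j] = 1.0
                W[j, i] = 1.0    # contiguity is mutual, so W is symmetric
    return W

def global_morans_i(x, W):
    """Global Moran's I = (n / sum(W)) * (z' W z) / (z' z), with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

# Hypothetical layout: five units in a chain, 0-1-2-3-4
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
W = contiguity_weights(neighbours, 5)

clustered = [10.0, 9.0, 5.0, 1.0, 0.0]       # similar values are adjacent
alternating = [10.0, 0.0, 10.0, 0.0, 10.0]   # high and low values alternate

I_clustered = global_morans_i(clustered, W)
I_alternating = global_morans_i(alternating, W)
print(round(I_clustered, 3), round(I_alternating, 3))
```

In this toy example the clustered pattern gives a positive index (about 0.61) and the alternating pattern gives −1, matching the interpretation of positive and negative spatial autocorrelation.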

For local Moran’s I, a positive value indicates that a county resembles its neighbours, that is, counties with high (or low) incidence rates of HFMD cluster with counties of the same category. A negative value indicates the opposite, that is, counties with high incidence rates tend to be near counties with low incidence rates, or vice versa. Based on the value and the significance level, the clusters can be classified into four types: High-high (HH, regions with high incidence rates surrounded by other high incidence rate regions), High-low (HL, regions with high incidence rates surrounded by low incidence rate regions), Low-high (LH, regions with low incidence rates surrounded by high incidence rate regions), and Low-low (LL, regions with low incidence rates surrounded by other low incidence rate regions). The equation of local Moran’s I is as follows [26]:

$$I_i = \frac{X_i - \overline{X}}{m_0} \sum_{j=1}^{n} W_{ij}\,(X_j - \overline{X})$$

Where *m*_{0} is a constant across all county-units; the other parameters are defined as for the global Moran’s I. To show where the clustering of HFMD incidence is statistically significant, a map displaying the counties with a significant local Moran’s I is presented. This map is also known as a LISA (local indicators of spatial association) map.
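The local index and the four cluster types can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the constant *m*_{0} is taken here to be the mean squared deviation, the five-unit chain and its values are hypothetical, and the significance test behind a LISA map (usually a permutation test) is omitted.

```python
import numpy as np

def local_morans_i(x, W):
    """Local Moran's I_i = (z_i / m0) * sum_j W_ij z_j, with z = x - mean(x).

    Assumption: m0 = sum(z**2) / n, a common choice for the constant.
    """
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    m0 = (z ** 2).sum() / len(x)
    lag = W @ z                   # weighted sum of the neighbours' deviations
    return z * lag / m0, z, lag

def quadrant_labels(z, lag):
    """Label units HH, HL, LH or LL from the signs of z_i and its spatial lag.

    Deviations of exactly zero are labelled "L" in this simple sketch.
    """
    return [("H" if zi > 0 else "L") + ("H" if li > 0 else "L")
            for zi, li in zip(z, lag)]

# Hypothetical layout: five units in a chain, 0-1-2-3-4 (binary contiguity)
W = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
x = [10.0, 9.0, 3.0, 1.0, 7.0]    # illustrative incidence rates

I_local, z, lag = local_morans_i(x, W)
labels = quadrant_labels(z, lag)
print(labels)
```

Units 0 and 1 fall in the HH quadrant, units 2 and 3 in LL, and unit 4 (a high value next to a low one) in HL with a negative local index. In practice only units whose index is statistically significant would be coloured on the map.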

In this study, the Spatial Lag Panel Model (SLPM), Spatial Error Panel Model (SEPM), and Spatial Durbin Panel Model (SDPM) were introduced to reveal the relationship between HFMD and meteorological factors, based on the following model derived from the measurement of variables [29–32]. Taking the logarithm of a variable does not change the nature or correlation of the data, but it compresses the scale of the variable. After log transformation, the data were more stable, and the collinearity and heteroscedasticity of the model were weakened. In this article, the logarithm of rainfall played an important role in weakening the heteroscedasticity. Temperature, however, takes negative values, and humidity is expressed as a percentage, so neither is suitable for log transformation. The final model was therefore as follows:

Where *i* indexes the 107 county-units (*i* = 1, 2, …, 107); *t* is the time variable (*t* = 2009, 2010, …, 2018); *α* denotes the constant term; and *ε*_{it} represents the error term. The SLPM analyses the influence of the dependent variable in neighbouring counties by adding the spatial lag of the dependent variable as an explanatory variable. Spatial dependence can also be reflected in the error term, namely, when variables missing from the model are spatially correlated with HFMD, or when unobservable random variables are spatially correlated with HFMD. The SEPM is applied in such circumstances. The SDPM is useful for reflecting the influence of surrounding regions on a specific region. However, although the SDPM can reveal the relationship between dependent and independent variables inside and outside the local region, its coefficients cannot be interpreted directly, because the derivative of *y* with respect to *x* usually does not equal *β*_{k}. Hence, the effect of each variable is decomposed into direct and spill-over effects.
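The decomposition into direct and spill-over effects can be illustrated with a short Python sketch. It assumes the common spatial Durbin specification y = ρWy + Xβ + WXθ + ε, for which the matrix of partial derivatives of *y* with respect to the *k*-th variable is (I − ρW)^{-1}(β_{k}I + θ_{k}W). The row-standardised weights, the four-unit chain and the coefficient values are hypothetical and are not estimates from this study.

```python
import numpy as np

def row_standardise(W):
    """Divide each row by its sum so that every row of W sums to 1."""
    return W / W.sum(axis=1, keepdims=True)

def sdm_effects(W, rho, beta_k, theta_k):
    """Average direct, spill-over (indirect) and total effects of variable k.

    S = (I - rho W)^{-1} (beta_k I + theta_k W) holds dy_i/dx_jk in entry (i, j).
    """
    n = W.shape[0]
    identity = np.eye(n)
    S = np.linalg.solve(identity - rho * W, beta_k * identity + theta_k * W)
    direct = np.trace(S) / n      # mean own-region effect (diagonal of S)
    total = S.sum() / n           # mean row sum of S
    indirect = total - direct     # mean effect transmitted across regions
    return direct, indirect, total

# Hypothetical layout: four units in a chain, 0-1-2-3
binary = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
W = row_standardise(binary)

# Illustrative coefficients only
direct, indirect, total = sdm_effects(W, rho=0.3, beta_k=0.5, theta_k=0.2)
print(direct, indirect, total)
```

With row-standardised weights the total effect equals (β_{k} + θ_{k}) / (1 − ρ), and the direct effect exceeds β_{k} because changes in a region feed back to it through its neighbours. This is why β_{k} alone cannot be read as the marginal effect.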

With the functions of the spatial econometric panel models established, a standard model selection strategy is followed in four steps. In the first step, Moran’s I or the LM test is used to examine the spatial autocorrelation, namely, whether spatial analysis methods are appropriate. In the second step, the Wald test and the LR test are used to choose among the SLPM, SEPM and SDPM. In the third step, the Hausman test is applied to determine whether a fixed effect model or a random effect model should be used. In the last step, if a fixed effect model is used, the type of fixed effect is determined: individual fixed effects (controlling the “space-specific, time-invariant” variables, which are excluded from the model), time fixed effects (controlling the “time-specific, space-invariant” variables, which are excluded from the model), or both. The choice depends on the sample size and the time span.
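The first three steps can be summarised as a small decision function in Python. This is only a sketch of conventional decision rules, not the authors' procedure: the function name, the p-value arguments and the 0.05 threshold are assumptions, and the p-values themselves would come from whatever software runs the LM, Wald or LR, and Hausman tests.

```python
def choose_spatial_panel_model(p_lm, p_simplify_to_lag, p_simplify_to_error,
                               p_hausman, alpha=0.05):
    """Hypothetical helper that maps test p-values to a model choice.

    p_lm                : test for spatial autocorrelation (H0: none)
    p_simplify_to_lag   : Wald/LR test, H0: the SDPM simplifies to the SLPM
    p_simplify_to_error : Wald/LR test, H0: the SDPM simplifies to the SEPM
    p_hausman           : Hausman test, H0: random effects are consistent
    """
    # Step 1: without spatial autocorrelation a spatial model is not warranted
    if p_lm >= alpha:
        return {"model": "non-spatial panel model", "effects": None}

    # Step 2: keep the SDPM only if both simplifications are rejected
    reject_lag = p_simplify_to_lag < alpha
    reject_error = p_simplify_to_error < alpha
    if reject_lag and reject_error:
        model = "SDPM"
    elif reject_error:
        model = "SLPM"
    elif reject_lag:
        model = "SEPM"
    else:
        model = "SLPM or SEPM (undetermined by these tests)"

    # Step 3: rejecting the Hausman null favours fixed effects
    effects = "fixed" if p_hausman < alpha else "random"
    return {"model": model, "effects": effects}

# Illustrative p-values only
choice = choose_spatial_panel_model(0.001, 0.01, 0.02, 0.003)
print(choice)
```

The fourth step (individual, time, or both fixed effects) is left out of the sketch because, as described above, it depends on the sample size and time span rather than on a single test result.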

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
