SPATIAL SMOOTHING AND WEIGHTING
Notes from Haining (2003) Chapters 5 and 7.1, and lecture
There are different conceptual models of spatial variation:
- The Regional Model: The emphasis is on the definition of regions as the spatial units used for analysis. There are three types of region of particular interest:
- Formal/Uniform: Constructed by sharply partitioning space into homogeneous or quasi-homogeneous areas, with borders based upon changes in attribute levels. This is fairly simple to achieve with a small number of variables, but becomes harder as the number increases unless there is strong covariance between them all. The method is also tricky when the variable(s) of interest are continuous over space.
- Functional/Nodal: regions are constructed using interaction data. Whereas formal regions are characterized by uniformity of attribute level, the functional region is bound together by patterns of economic or other interaction that set it apart from neighbouring districts. E.g. labour market regions are defined by interaction patterns such as travel-to-work flows.
- Administrative: These regions are the consequence of political decisions.
- Rough and Smooth: This model assumes there are two components to spatial data such that
data = smooth + rough
The smooth part is regular or predictable such that knowing part of the distribution allows for extrapolation at the local level. The rough part is irregular, and is what is left over once the smooth component has been accounted for. The rough component cannot be used for extrapolation even though it may be explainable in terms of the underlying data generating process. Smooth structures could include topographical features, a propensity for similar data values to be found clustered together (spatial autocorrelation), trends or gradients etc. Rough could include local hot/cold spots, spatial outliers, and localized areas of discontinuity.
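The decomposition can be illustrated with a toy example; the gradient, noise level, and window size below are arbitrary assumptions, and a simple moving average stands in for whatever smoother is actually used:

```python
import numpy as np

# Hypothetical 1-D transect: a smooth gradient plus irregular local noise.
rng = np.random.default_rng(0)
s = np.linspace(0.0, 10.0, 101)                  # spatial coordinate
data = 2.0 + 0.5 * s + rng.normal(0.0, 0.3, s.size)

# Estimate the smooth component with an 11-point moving average, then
# take the rough component as whatever is left over (note that the
# estimate is unreliable near the edges of the transect).
kernel = np.ones(11) / 11
smooth = np.convolve(data, kernel, mode="same")
rough = data - smooth
```

By construction the two components add back up to the data exactly; the interesting question is how much structure ends up in each part.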
- Scales of Spatial Variation: This recognizes the different scales of spatial variation such that:
Spatial data = macro-scale variation + medium/micro-scale variation + error
The error here includes measurement error. The macro component refers to trends or gradients present across the whole study area, e.g. a south-north linear trend. The medium/micro component refers to more localized structures conceptualized as superimposed upon the macro variation, e.g. localized disease hot spots or cold spots. If the data map is dominated by the micro variation, then it displays spatial heterogeneity. In such a circumstance, analysis should probably be conducted at the local level.
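One way to separate the macro component is to fit a trend surface and treat the residual as medium/micro variation plus error. A sketch under assumed values (the gradient of 0.3, the hotspot location, and the noise level are all invented for illustration):

```python
import numpy as np

# Hypothetical gridded data: a south-north macro trend, one localized
# hotspot (medium/micro variation), and measurement error.
rng = np.random.default_rng(1)
x, y = np.meshgrid(np.arange(20.0), np.arange(20.0))
macro = 0.3 * y                                        # south-north gradient
hotspot = 2.0 * np.exp(-((x - 5) ** 2 + (y - 5) ** 2) / 4.0)
error = rng.normal(0.0, 0.1, x.shape)
data = macro + hotspot + error

# Estimate the macro component by fitting a plane (linear trend surface)
# with ordinary least squares; the residual holds everything else.
A = np.column_stack([np.ones(x.size), x.ravel(), y.ravel()])
coef, *_ = np.linalg.lstsq(A, data.ravel(), rcond=None)
trend = (A @ coef).reshape(data.shape)
residual = data - trend
```

The fitted y-coefficient comes out close to the true gradient, while the hotspot survives in the residual map.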
Smoothing
The aim of map smoothing is to remove the distracting noise or extreme values present in data in order to reveal spatial features, trends, and so on; in other words, to smooth away the rough component of the data to reveal the underlying smooth pattern. The goal is to improve the precision of the data without introducing bias.
Resistant Smoothing of Graph Plots
This method is applied to graphical plots to aid visualization and to identify trends and relationships. This essentially fits smooth curves to scatter plots using locally weighted regression with robustness iterations. A vertical line is centred on an observation of x, and then a bandwidth applied around it. The paired observations that fall within this bandwidth or window are assigned neighbourhood weights using a weights function that has its maximum at the centre of the window and decays smoothly and symmetrically to become zero at the boundary of the window. A line is fitted to this subset of data by weighted least squares to create fitted values of y. The window is then slid along the x axis and the process repeated to create a sequence of smoothed y values. The bands overlap as generally the window is applied to each ordered value of x.
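The sliding-window procedure just described can be sketched as a locally weighted linear regression. This is a minimal version without the robustness iterations; the tricube weight function and the bandwidth are conventional choices, not something fixed by the notes:

```python
import numpy as np

def loess_fit(x, y, bandwidth):
    # For each x0, weight nearby points with a tricube kernel that peaks
    # at the window centre and decays to zero at the window boundary,
    # fit a line by weighted least squares, and record the fitted value.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.empty_like(y)
    A = np.column_stack([np.ones_like(x), x])
    for i, x0 in enumerate(x):
        u = np.abs(x - x0) / bandwidth
        w = np.where(u < 1.0, (1.0 - u ** 3) ** 3, 0.0)   # tricube weights
        sw = np.sqrt(w)
        # Weighted least squares: minimise sum_j w_j (y_j - a - b x_j)^2
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0] + coef[1] * x0
    return fitted
```

Because a fresh line is fitted at every ordered value of x, the overlapping windows trace out a smooth curve through the scatter plot.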
Essentially the idea is as follows:
- x(s) is a variable that has a spatial element s
- m(s) is the smooth part of the data and is a function of s although the function is unknown and very likely to be non-linear. This is the part we are trying to estimate
- ε is the rough part we are trying to smooth away
The most common way to get at the estimation of m(s) takes the form:

m̂(s*) = ∑_{j} w(s*, s_{j}) x(s_{j})

where w(s*, s_{j}) is a scalar weight assigned to data point j given its distance (or other weighting scheme) from location s*, which is where the window is focused. It should be noted that the weights sum to 1.
Thus the predicted value of m(s) is basically a moving weighted average.
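As a sketch, the moving weighted average can be written directly; the Gaussian distance-decay weight function and the bandwidth h = 1 below are illustrative assumptions, not the only possible choices:

```python
import numpy as np

def smooth_at(s_star, s, x, weight_fn):
    # Moving weighted average: m_hat(s*) = sum_j w(s*, s_j) x_j, with the
    # raw weights normalised so that they sum to 1.
    w = np.array([weight_fn(s_star, sj) for sj in s], dtype=float)
    w = w / w.sum()
    return float(w @ np.asarray(x, dtype=float))

# Example weight function: Gaussian distance decay with bandwidth h = 1
# (both the bandwidth and the 1-D coordinates are assumptions).
h = 1.0
def gauss(s0, sj):
    return np.exp(-0.5 * ((s0 - sj) / h) ** 2)
```

With constant weights this reduces to the ordinary mean of the windowed values, which is a useful sanity check.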
There are various different weighting structures that can be used.
- Nearest Neighbour: Here the weights are based on some predefined set of k nearest neighbours, such that w(s*, s_{j}) = 1/k if data point j is one of the k nearest neighbours to location s*, and 0 otherwise.
- Kernel Regressions: this assigns weights based upon the distance of data point j from the kernel point s* such that:

w(s*, s_{j}) = K(d(s*, s_{j})/h) / ∑_{k} K(d(s*, s_{k})/h)

Here h is the bandwidth, so the window width is the same at every kernel point. Decisions about how best to aggregate the data are needed here, and should be drawn from some understanding of the underlying process. The denominator ensures that the weights sum to 1 over the relevant bandwidth. Different kernels can be used, such as the uniform (rectangular) kernel or the normal (Gaussian) kernel.
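Both weighting structures can be sketched directly; the 1-D coordinates and the choice of Gaussian versus uniform kernel below are illustrative assumptions:

```python
import numpy as np

def knn_weights(s_star, s, k):
    # Nearest-neighbour weights: w(s*, s_j) = 1/k if j is among the k
    # nearest neighbours of s*, and 0 otherwise.
    d = np.abs(np.asarray(s, dtype=float) - s_star)
    w = np.zeros(len(d))
    w[np.argsort(d)[:k]] = 1.0 / k
    return w

def kernel_weights(s_star, s, h, kernel="gaussian"):
    # Kernel weights: w(s*, s_j) = K(d_j / h) / sum_k K(d_k / h); the
    # denominator normalises the weights so they sum to 1.
    u = np.abs(np.asarray(s, dtype=float) - s_star) / h
    if kernel == "gaussian":
        K = np.exp(-0.5 * u ** 2)
    elif kernel == "uniform":
        K = (u <= 1.0).astype(float)
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    return K / K.sum()
```

Note the contrast the notes draw: the nearest-neighbour weights jump abruptly from 1/k to 0, while the Gaussian kernel decays smoothly with distance.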
Using these weights we can run a locally weighted regression that estimates a coefficient for every point of data j based upon the bandwidth and weights assigned. These local polynomials can then be made to join locally to create a spline. It should be noted that these methods encounter problems at the edges of the data.
Map Smoothing
The above methods apply to plots in two dimensions, smoothing along a single axis (left-right). However, when using spatial data, which varies in more directions (north, south, east, west), the spatial kernels can be combined into just one dimension: distance. This assumes that points can be weighted equally in every direction from s*. In practice, however, we may need to weight the N-S or E-W directions differently (for example, house prices in south London may be more likely to increase as we move north toward the centre rather than south toward the suburbs). In other words, the method assumes the spatial function is the same in every direction (isotropy). If this is unlikely to be the case, there is an argument for using narrow bandwidths.
The idea of map smoothing is to improve the precision associated with area data values whilst not introducing serious levels of bias. It is very important to decide upon what size of window is to be used for the smoothing. A large window improves the precision of the statistic because it borrows a lot of information from the rest of the observations (effectively this is increasing the sample size) but the cost is that there is a greater risk of introducing bias as information is borrowed from areas further away that may be different in terms of the data generating process. A small window reduces the risk of bias, but also decreases the precision of the estimate.
The effectiveness of local information borrowing when smoothing depends upon local homogeneity. If local areas are very different (i.e. the rough component is very large) then even very localized smoothing can introduce bias. This homogeneity will itself be affected by the size of the area that is sampled (or density of the sampling points) relative to the true scale of spatial variation [see MAUP]. In the case of high heterogeneity smoothing using local information may not be as effective as smoothing using information borrowed from data points that are similar to the area in question (even if they are not located nearby).
There are a variety of different methods for map smoothing:
- Mean and Median Smoothers: here the value at data point j is replaced with the median or mean of the set of values (including itself) contained within the window imposed on the map. The window may be defined in terms of distance, adjacency, or some other characteristic. The rough component of the data can then be extracted by subtracting the median/mean from the observed value. The choice between the two matters: mean smoothing tends to blur spikes and edges, and so may be appropriate in environmental science, where localized spikes are not generally expected; median smoothing tends to preserve these spikes, and hence may be more useful for social applications, where large localized spikes can occur. In any case, the performance of these smoothers depends upon the weights assigned to them.
- Distance weighting is often used. This can be a simple nearest neighbour scheme as explained above, although that type of neighbourhood weight function causes values to change abruptly, and smoothers based on it can have undesirable properties. Alternatively, a simple inverse distance weighting scheme can be used:

w_{ij} = d_{ij}^{-1} / ∑_{j} d_{ij}^{-1}, and then the predicted value is m̂(s_{i}) = ∑_{j} w_{ij} x_{j}
An additional restriction can be added such that only values satisfying d_{ij} < D are assigned non-zero weights and used in the estimation; all others get weight 0. This is then effectively a kernel smoother as described above. It is traditional to omit the observed value at the kernel point from the analysis (i.e. exclude observation i from the calculation of the mean for point i).
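A minimal sketch of both map smoothers, using hypothetical 1-D coordinates: a windowed mean/median smoother, and an inverse distance weighted smoother with an optional cutoff D that also omits the kernel point itself, as described above:

```python
import numpy as np

def window_smooth(x, coords, radius, stat="median"):
    # Replace each value with the mean/median of all values (including
    # itself) whose coordinates fall within `radius` of it.
    x = np.asarray(x, dtype=float)
    coords = np.asarray(coords, dtype=float)
    fn = np.median if stat == "median" else np.mean
    out = np.empty_like(x)
    for i, c in enumerate(coords):
        out[i] = fn(x[np.abs(coords - c) <= radius])
    return out

def idw_smooth(x, coords, D=np.inf, power=1.0):
    # Inverse distance weighting: w_ij = d_ij^-p / sum_j d_ij^-p, summed
    # only over points with 0 < d_ij < D (the kernel point i is omitted).
    x = np.asarray(x, dtype=float)
    coords = np.asarray(coords, dtype=float)
    out = np.empty_like(x)
    for i in range(len(x)):
        d = np.abs(coords - coords[i])
        mask = (d > 0) & (d < D)
        w = d[mask] ** -power
        out[i] = np.dot(w / w.sum(), x[mask])
    return out
```

Subtracting the smoothed values from the observed ones then recovers the rough component for either smoother.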
The literature suggests that the precise choice of weighting function is generally less important than the choice of window. There is always a variance/bias trade-off to be made, and the choice should be based upon knowledge of the nature of local mean variability.
There are other types of smoothing such as “headbanging” and median polishing, but at this time they appear to go beyond the scope of the course.
[Figure: house price points smoothed onto a raster grid, each raster square assigned a value using an inverse distance weighting scheme.]
In practice weighting schemes can be based on a variety of different things: social weights, network distances, migration flows, travel times, income differences etc.