Category Archives: Techniques of Spatial Analysis (GY460)



Notes from Lecture and Various Papers 

Instrumental Variables

Instrumental variables are used when OLS estimates are biased by endogeneity or measurement error. The process is based upon identifying exogenous variation in the key independent variable.


I’m not going to go into how the IV estimator is constructed, as it is well documented in the EC406 notes; see also e.g. Stock and Watson.


If the regression is overidentified (i.e. there are more instruments than endogenous regressors) then a Hansen-Sargan test can be used to test the exclusion restriction – although the instruments will pass the test if they are all equally endogenous, i.e. it is a weak test. In general the first-stage F-stat should be > 10, and there should be strong theoretical reasoning behind the instrument (such that the “compliers” are meaningfully identified).
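The mechanics can be sketched in a few lines of numpy on synthetic data (all names and coefficients below are illustrative, not from any paper): manual two-stage least squares, plus the first-stage F-statistic mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic data: u drives both x and y, so OLS is biased upward;
# z shifts x but enters y only through x (the exclusion restriction).
u = rng.normal(size=n)
z = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)   # true effect of x is 2.0

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# First stage: regress x on a constant and the instrument z.
X1 = np.column_stack([np.ones(n), z])
g = ols(X1, x)
x_hat = X1 @ g

# First-stage F-statistic for the single instrument (t-squared here).
resid = x - x_hat
sigma2 = resid @ resid / (n - 2)
var_g = sigma2 * np.linalg.inv(X1.T @ X1)
F_first_stage = g[1] ** 2 / var_g[1, 1]

# Second stage: regress y on the fitted values of x.
beta_iv = ols(np.column_stack([np.ones(n), x_hat]), y)
beta_ols = ols(np.column_stack([np.ones(n), x]), y)
```

With this design the IV slope recovers the true effect while plain OLS overstates it, and the first stage is comfortably above the F > 10 rule of thumb.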


In the spatial context, spatially lagged X variables have often been used as instruments for the spatial lag of Y. However, as we have already seen, this method is not without its complications (the functional form must be correctly specified and the exogeneity restriction may be violated). Thus the literature has begun to move toward quasi-experimental methods, searching for instruments based on policy changes, boundaries, geological features and other similar events.


Some examples


Hoxby, Does Competition among Public Schools Benefit Students and Taxpayers? (2000)

This is surely one of the most famous examples of a spatial IV. The paper examines whether increased school competition, in the form of a greater number of school districts within a municipality, has benefits for the population studied. OLS estimates are biased because the supply of school districts is in part a response to the demand for school districts, which is probably driven by wealth, ability, parental involvement, and other unobservable characteristics which codetermine student outcomes and cannot be readily controlled for. Thus, Hoxby uses an instrument to attempt to isolate exogenous variation in the supply of school districts, to get a consistent estimate of the effect competition has on student outcomes. The instrument is based on the number of streams and rivers within a municipality. The logic is that in the 19th century, when school districts were being drawn up, geographical features such as streams presented barriers to movement, such that districts were often drawn up with the streams forming natural boundaries. Thus, a municipality with more streams would have more school districts, hence the instrument is relevant. Over time, the importance of streams in determining outcomes has diminished, and hence the presence of more streams has no effect on educational outcomes other than through its effect in determining school districts in the 19th century, and hence the exclusion restriction is satisfied.


There are problems with the strategy. Specifically, rivers may still have an economic effect today, and this could feed back into educational outcomes. Additionally, the way the instrument was constructed has been criticized, as it was subject to much subjective judgement.


Luechinger, Valuing Air Quality Using the Life Satisfaction Approach (2009)

This paper tries to gauge how important air quality is to affected populations. The hedonic method of valuation (which seeks to determine the unobserved price of a public good by using prices embedded in private goods) tends to underestimate the value of air quality, as migration is costly and private goods prices are based on perceived rather than objective risk. Any residual effect that air pollution has on life satisfaction is an indication that compensation has not been fully capitalized in house prices, for the reasons just stated.


However, an OLS estimate of air quality on life satisfaction would be biased, as cleaner air is the product not only of exogenous policy change (even assuming it is exogenous), but also of local industrial decline and economic downturn. These simultaneous developments can have a countervailing effect on life satisfaction and housing rents. Thus he uses an instrument for SO2 levels: the mandated installation of scrubbers at power plants.


The construction of the instrument is somewhat convoluted as it relies upon a difference-in-differences estimation. Desulphurization following the retroactive fitting of scrubbers at power plants is the treatment, with a county being downwind or upwind of the power plant determining assignment to the treatment and control group respectively. Yet, as being in treatment/control is a question of degree rather than kind, the treatment variable is a frequency measure of how often in the period of study the county in question is downwind of the plant. This is further multiplied by a distance decay function, and the pre-desulphurization emission levels of the plant in question are controlled for.


The main finding is that SO2 concentration does negatively affect life satisfaction, with estimates being much larger for the OLS specification indicating that reductions in sulphur levels are indeed accompanied by factors that have a countervailing effect on satisfaction.


Gibbons et al. Choice, Competition and Pupil Achievement (2008)

This paper uses a boundary discontinuity to construct an instrument for primary school competition in the UK, which gets around the endogeneity concern in OLS estimates, namely that motivated parents may move closer to popular schools. The boundaries in question are the Local Education Authority boundaries. Whilst families are allowed to apply to schools outside their LEA, cross-LEA attendance is extremely uncommon.


They construct indices for choice: for each school they define a travel-to-school zone that a) encompasses all residential addresses within the same LEA and b) is contained within a circle whose radius is the median travel-to-school distance for the pupils at that school. Pupil choice is then the number of travel-to-school zones in which the student lives, and the school competition measure is the average of this value for students actually attending a given school (i.e. the number of alternatives available to students of a particular school). If families sort spatially near to high performing schools this will tend to decrease apparent competitiveness.


They then exploit the fact that families living near boundaries face longer journeys to school than those in the interior, and as such they are more likely to attend their local school. This is because the catchment area is bounded and hence shrinks. Thus the distance between a pupil’s home and the LEA boundary is an instrument for school choice, and the distance between a school and the boundary is an instrument for competitiveness. They do not find evidence that school competition increases pupil achievement.


Differencing Methods

Often there will be spatial sorting and heterogeneity, i.e. differences between places that lead to biased estimates. This sorting will often be on observable characteristics, but just as frequently on unobserved characteristics.


One method for dealing with this is the fixed effects model. This can be estimated with panel or cross sectional data using area dummies, or by making the within groups transformation (de-meaning) and then estimating with OLS. This removes the area specific time invariant determinants of the dependent variable.


With panel data the data can also be time-differenced, which has the same effect. Time dummies can also be included to strip out variation common across regions due to time trends. The remaining variation is time-variant, region-specific variation, and as such for the estimates to be unbiased there can be no correlation between region-specific time-variant shocks and the error term. For example, there could be no sudden shock to the educational system in a given area that induced people to sort spatially into that area.
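The within-groups logic is easy to demonstrate with a small numpy sketch (entirely synthetic panel; the coefficients are illustrative): demeaning by region sweeps out the fixed effect that biases pooled OLS.

```python
import numpy as np

rng = np.random.default_rng(1)
R, T = 50, 6                             # regions and time periods, illustrative

alpha = rng.normal(size=R)               # time-invariant region effects
x = alpha[:, None] + rng.normal(size=(R, T))            # x correlated with alpha
y = 1.5 * x + 2.0 * alpha[:, None] + rng.normal(size=(R, T))  # true slope 1.5

def slope(x, y):
    """Bivariate OLS slope on flattened arrays."""
    x, y = x.ravel(), y.ravel()
    xd, yd = x - x.mean(), y - y.mean()
    return (xd @ yd) / (xd @ xd)

beta_pooled = slope(x, y)                # biased: alpha enters both x and y

# Within-groups transformation: subtract each region's time mean,
# sweeping out the time-invariant alpha.
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
beta_within = slope(xw, yw)
```

The within estimate sits near the true slope while the pooled estimate is pushed upward by the omitted region effects.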


The difference in difference method is usually applied to evaluating policy interventions where a treatment and control can be created. I am not going to go into the mechanics here as it is well documented elsewhere.


Some Examples


Machin et al. Resources and Standards in Urban Schools (2007)

The paper is concerned with whether additional resources can be used to improve the outcomes of hard-to-reach pupils, specifically evaluating the Excellence in Cities programme that gave extra funding to schools based upon their level of disadvantage as measured by the proportion of pupils eligible for a free lunch. They use a DID strategy comparing the outcomes in EiC schools with a comparison group. A direct comparison between EiC and non-EiC schools would not be valid, as there is no reason to assume that the parallel trends assumption holds. Mindful of this, the authors use propensity scores based on a host of school and pupil level characteristics to create a subset of non-EiC schools which are statistically similar to the pre-treatment EiC schools, and they use this subset as the control group. They do not make a hugely convincing argument for this method, and indeed there are statistically significant differences in the outcome measures in the pre-treatment periods, indicating that there is only limited reason to suspect that the key identifying assumption holds.


They find that the policy was effective in raising pupil attainment in the treatment schools but that the benefits were restricted to the students best able to take advantage of the policy (i.e. the most gifted).


Duranton et al. Assessing the Effects of Local Taxation Using Microgeographic Data (2011)

This is an interesting paper that seeks to identify the effect of local property taxation on the growth of firms. Estimating this has been difficult as site characteristics are heterogeneous, and many characteristics will be correlated with unobservable determinants. Secondly, firms are heterogeneous, and these differences are often largely unobservable, yet they cause firms to sort spatially. Lastly, tax systems may be endogenous to the location decisions of firms.


Using panel data they estimate a model which includes firm specific observable characteristics which removes firm specific time varying observable variation. They include a firm fixed effect to remove the time invariant firm specific unobservable variation. They also include higher level fixed effects (site, and region). They then difference the data in the usual way which implements the fixed effect strategy as noted above.


They then take a spatial difference: the difference (in the differences) between each establishment and any other establishment located at a distance less than d from that establishment. If there is a term αzt for each site z in time t, and this is not controlled for, then any local shock to firms that also affects tax rates will bias the panel estimates above. However, if we are able to assume that for small values of d, Δαzt ≈ 0 (i.e. local shocks are smooth over small amounts of space), then by spatially differencing the alpha term falls away, and the time-varying local shocks are effectively controlled for.
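A minimal numpy sketch of the idea (entirely synthetic; the smooth surface below stands in for the local shock αzt): differencing across nearby pairs removes a spatially smooth confounder that biases the naive regression.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
coords = rng.uniform(0, 10, size=(n, 2))

# A smooth local shock: varies over space but is ~constant at short range.
shock = np.sin(coords[:, 0] / 5) + np.cos(coords[:, 1] / 5)
x = rng.normal(size=n) + shock                    # regressor contaminated by shock
y = 1.0 * x + 3.0 * shock + rng.normal(size=n, scale=0.1)  # true slope 1.0

def slope(x, y):
    xd, yd = x - x.mean(), y - y.mean()
    return (xd @ yd) / (xd @ xd)

beta_naive = slope(x, y)                          # biased by the shock

# Spatial difference: for each pair within distance d, difference x and y.
d = 0.5
dx, dy = [], []
for i in range(n):
    dist = np.hypot(*(coords - coords[i]).T)
    for j in np.nonzero((dist < d) & (np.arange(n) > i))[0]:
        dx.append(x[i] - x[j])
        dy.append(y[i] - y[j])
dx, dy = np.array(dx), np.array(dy)
beta_sd = (dx @ dy) / (dx @ dx)                   # shock ~cancels within pairs
```

Because the shock is nearly constant within d = 0.5, the spatially differenced slope lands close to the true value while the naive slope does not.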


They then combine this with an instrumentation strategy that instruments tax rates using political variables.  




Notes from Lecture 


Firms and individuals have choices over discrete alternatives such as which mode of transport to take, or where to locate their businesses. These choices are modeled using the random utility model in order to aid in the economic interpretation of those choices.

Random Utility Model

This was developed by Daniel McFadden and underlies the discrete choice model. This model holds that preferences over alternatives are a function of biological taste templates, experiences and other personal characteristics some of which are observable, others of which are not (cultural tastes etc.), and the function is heterogeneous within a given population. This indicates that an individual/firm’s utility from choice j can be decomposed into two components:

Uij = Vij – εij 

where V is an element common to everyone given the same characteristics and constraints. This might include representative tastes of the population such as the effects of time and cost on travel mode choices. ε is a random error that reflects the idiosyncratic tastes of the individual concerned as well as the unobserved attributes of the choice j.

V is observable based on consumer/firm choice characteristics such that:

Vij = αtij + βpij + δzij

where t is time and p is price and z is other observable characteristics.

In a setting where there are two choices (e.g. car or bus to work) we observe whether an individual chooses car (yi = 0 ) or bus (yi = 1). Assuming that individuals maximize their utility, they will choose bus if this exceeds the utility from going by car Ui1 > Ui0 which means that Vi1– εi1 > Vi0 – εi0 which indicates that εi1 – εi0 < Vi1 – Vi0. Therefore the probability that we see an individual choose to go by bus is:

P(εi1 – εi0 < Vi1 – Vi0)

which is equal to P(εi1 – εi0 < α(Ti1 – Ti0) + β(Pi1 – Pi0))

If we are willing to assume that the probability depends linearly on the observed characteristics then this can be estimated by running the following OLS regression:

Yi1 = α(Ti1 – Ti0) + β(Pi1 – Pi0) + εi1

At this point further observable characteristics can be added, z.

However, as is well known, the OLS model is not bounded by 0 and 1, whereas probability functions are. This means that this estimation may return results outside the possible range of probabilities. In order to counter this problem we can estimate a probability function using probit or logit estimators, which are calculated using the maximum likelihood method [of which I am not going to write anything – assuming it will not be examined in detail].
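Both estimators can be sketched on simulated commuting data (all numbers illustrative): a linear probability model by OLS, and a binary logit fitted by Newton-Raphson maximum likelihood, whose fitted probabilities stay inside (0, 1) by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Simulated commuting choice: y = 1 (bus) when the utility gap favours it.
dt = rng.normal(size=n)                # time difference, illustrative
dp = rng.normal(size=n)                # price difference, illustrative
v = -0.8 * dt - 0.5 * dp               # "true" systematic utility difference
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-v))).astype(float)

X = np.column_stack([np.ones(n), dt, dp])

# Linear probability model by OLS: fitted values need not stay in [0, 1].
beta_lpm = np.linalg.lstsq(X, y, rcond=None)[0]
fitted_lpm = X @ beta_lpm

# Binary logit estimated by Newton-Raphson maximum likelihood.
beta = np.zeros(3)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))            # probabilities at current beta
    grad = X.T @ (y - mu)                       # score vector
    H = X.T @ (X * (mu * (1 - mu))[:, None])    # information matrix
    beta += np.linalg.solve(H, grad)

fitted_logit = 1 / (1 + np.exp(-X @ beta))
```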

The McFadden paper deals with car versus bus commuting in the SF Bay area.

Multiple Choices

Often we want to think about more than one choice, which requires us to extend this model. We can extend the random utility model to many choices Uij = Vij + εij. Now an actor will choose alternative k if the utility derived from this choice is higher than for all other choices:

Vik + εik > Vij + εij for all j≠k 

If we assume an extreme value distribution then the solution for the probability choice is given by P(yi = k) = exp(Vik) / ∑ exp(Vij). This is a generalization of the logit model with many alternatives, hence the name “multinomial logit”. The model compares choices to some predetermined base case.
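The choice probability formula is straightforward to compute directly; a small numpy sketch with illustrative utilities:

```python
import numpy as np

def choice_probs(V):
    """Multinomial logit probabilities P(y = k) = exp(Vk) / sum_j exp(Vj).
    Subtracting the max keeps the exponentials numerically stable."""
    e = np.exp(V - V.max())
    return e / e.sum()

V = np.array([1.0, 0.5, -0.2])   # systematic utilities, illustrative
p = choice_probs(V)              # probabilities sum to 1, ordered like V
```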

Independence of Irrelevant Alternatives (IIA)

One drawback of the multinomial logit method is the IIA problem. This is driven by the assumption underlying the model that if one choice is eliminated in time t=1, the ratio of individuals choosing the remaining options must remain constant from the pre-elimination period t=0. For example, suppose in t=0 40 people take bus A, 12 people take bus B and 20 people drive, and then in t=1 the bus B company goes bust. In t=0, the ratio of people taking bus A relative to those driving is 2:1. This must remain constant in t=1, so the model assumes that 24 people will drive and 48 will take bus A. This might not be a valid assumption if bus seats are not supplied elastically, or if bus A and bus B were not close substitutes.

It is simple to see why this is the case, as the underlying assumption of the model is that P(yi = k) = exp(Vik) / ∑ exp(Vij), and this clearly cannot change simply because one of the other alternatives has been eliminated. 
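The bus example can be checked numerically: under IIA, the eliminated alternative’s riders are reallocated in proportion to the surviving shares.

```python
# Shares in t=0: 40 take bus A, 12 take bus B, 20 drive (72 commuters).
counts = {"bus_A": 40, "bus_B": 12, "car": 20}
total = sum(counts.values())

# Under IIA, eliminating bus B reallocates its riders proportionally,
# preserving the 2:1 ratio of bus A to car.
remaining = {k: v for k, v in counts.items() if k != "bus_B"}
scale = total / sum(remaining.values())
predicted = {k: v * scale for k, v in remaining.items()}
```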

This can be solved using the nested logit model. Conceptually this decomposes the choices into two separate stages. In the first stage the individual chooses whether to take his car or public transport. If he decides on public transport then he must decide between bus A and bus B. This choice structure is estimated using sequential logits whereby the value placed on the alternatives in the second stage are entered into the choice probabilities in the first stage.

Aggregate Choice Models

Aggregate choice models are useful when individual data are not available, and also when computing power is an issue (due to many fewer observations). All of the above models have aggregate equivalents. In fact, using the Poisson model with a max likelihood estimation method, aggregated data give exactly the same coefficient estimates as the conditional logit model when the only data available are the choice characteristics (i.e. how many people chose what). Multinomial logit will be better when there are accompanying individual/group-level characteristics.

Gravity Models

Choices can also be modeled as flows between origins and destinations. This is widely applied in the fields of trade, migration and commuting.  A flow from place j to k can be modeled as:

Ln(njk) = βXjk + αj + αk + εjk

where the alphas represent characteristics of the source and destination such as population, wages etc., a cost of moving measure can also be included. This literature has found strong distance decay effects, which are puzzling in many cases (e.g. trade) as the cost of moving goods further is now fairly marginal.
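One way the gravity equation can be estimated is by OLS with origin and destination dummies standing in for the alphas; a numpy sketch on simulated flows (all parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
J = 12                                          # number of places, illustrative
pop = rng.lognormal(mean=10, sigma=0.5, size=J)
xy = rng.uniform(0, 100, size=(J, 2))

# Simulate flows with size effects and a distance-decay elasticity of -1.2.
rows = []
for j in range(J):
    for k in range(J):
        if j == k:
            continue
        d = np.hypot(*(xy[j] - xy[k]))
        log_n = (0.7 * np.log(pop[j]) + 0.7 * np.log(pop[k])
                 - 1.2 * np.log(d) + rng.normal(scale=0.1))
        rows.append((j, k, d, log_n))

# ln(n_jk) = beta*ln(d_jk) + alpha_j + alpha_k, estimated by OLS with
# origin and destination dummies as the fixed effects.
n_obs = len(rows)
X = np.zeros((n_obs, 1 + 2 * J))
yv = np.zeros(n_obs)
for r, (j, k, d, log_n) in enumerate(rows):
    X[r, 0] = np.log(d)
    X[r, 1 + j] = 1.0            # origin dummy alpha_j
    X[r, 1 + J + k] = 1.0        # destination dummy alpha_k
    yv[r] = log_n
beta = np.linalg.lstsq(X, yv, rcond=None)[0]
distance_decay = beta[0]         # recovers ~ -1.2
```

The population terms are absorbed by the dummies, so only the distance-decay coefficient is reported here.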

Discrete v Aggregate: discrete choice models have the advantage that firm level characteristics can be incorporated, and there is a strong theoretical model underlying the estimations. Aggregate flows on the other hand are easier to compute and there is no need to make assumptions about the functional form that are necessary for the non-linear maximum likelihood estimators. One disadvantage is that no separation of the individual/aggregate factors is possible.



Notes from lecture and various articles 


Generally there is very little reason to suppose that a process will be generated randomly over space. Spatial statistics help us to gauge to what extent the values that data take are related to other observations in the vicinity. 

Spatial statistics broadly fall into two categories:

1)     Global – these allow us to evaluate if there are spatial patterns in the data (clusters)

2)     Local – these allow us to evaluate where these spatial patterns are generated

Differences between these two types of statistic can be summarized thus:



Global                            Local
Single-valued                     Multi-valued
Assumed invariant over space      Variant over space
Non-mappable                      Mappable
Used to search for regularities   Used to search for irregularities
Aspatial                          Spatial

Generally these statistics are based upon:

  1. Local means – see spatial weighting sections above (smoothing techniques such as kernel regression and interpolation).
  2. Covariance methods – comparing the covariances of neighbourhood variables (Moran’s I, and LISA)
  3. Density methods – the closeness of data points (Ripley’s K, Duranton & Overman’s K-density).

Moran’s I

This is one of the most frequently encountered measures of global association. It is based on the covariance between deviations from the global mean between a data point and its neighbours (howsoever defined – e.g. queen’s/rook’s contiguity at the first/second order etc.).

It is computed in the following way:

I = (n / ∑i∑jWij) × [∑i∑jWij(yi – Yg)(yj – Yg)] / [∑i(yi – Yg)2]

where there are n data values, y is the outcome variable at location i or its neighbour j, the global mean is Yg and the proximity between locations i and j is given by the weights Wij.

A z statistic can be calculated in order to assess the significance of the Moran’s I estimate (compared in the usual way to a critical value, e.g. 1.96 for 5% significance).

Problems with this measure are that it assumes constant variation over space. This may mask a significant amount of heterogeneity in spatial patterns, and it does not allow for local instability of variation. Thus a focus on local patterns of spatial association may be more appropriate. This could involve decomposing this type of global indicator into the contribution of each individual observation. One further issue is that the problems associated with MAUP (see above summaries) are built into the Moran statistic.
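The statistic is simple to compute from the definition; a numpy sketch with a toy 1-D neighbour structure (illustrative data only):

```python
import numpy as np

def morans_i(y, W):
    """Global Moran's I for values y and spatial weight matrix W."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    n = len(z)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Toy 1-D layout: 6 locations, adjacent cells are neighbours.
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0

y_clustered = np.array([1.0, 2.0, 2.5, 8.0, 9.0, 8.5])   # similar values adjacent
y_dispersed = np.array([1.0, 9.0, 1.0, 9.0, 1.0, 9.0])   # checkerboard pattern

I_clustered = morans_i(y_clustered, W)    # positive: clustering
I_dispersed = morans_i(y_dispersed, W)    # negative: dispersion
```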

Local Moran

The Local Moran is a Local Indicator of Spatial Association (LISA) as defined by Anselin (1995). He posits two requirements for a statistic to be considered a LISA:

  1. The LISA for each observation gives an indication of the extent of spatial clustering of similar values around that observation.
  2. The sum of the LISAs for all observations is proportional to a global indicator of spatial association.

The local Moran statistic allows us to identify locations where clustering is significant. It may turn out to be similar to the global statistic, but it is equally possible that the local pattern is an aberration in which case the global statistic would not have identified it.

It is calculated like this:

Ii = zi ∑j Wij zj,   j ≠ i


where z are the deviations of observation i or j from the global mean, and W is the weighting system. If Ii is positive then the location in question has similarly high (low) values as its neighbours, thus forming a cluster.
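A companion sketch for the local version (toy data, illustrative only); note how the sum of the local values is proportional to the global statistic, per Anselin’s second requirement:

```python
import numpy as np

def local_moran(y, W):
    """Local Moran I_i = z_i * sum_{j != i} W_ij z_j (W has a zero diagonal),
    so the sum of the I_i is proportional to the global statistic."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    return z * (W @ z)

W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0            # adjacent cells are neighbours

y = np.array([1.0, 2.0, 2.5, 8.0, 9.0, 8.5])   # low cluster then high cluster
Ii = local_moran(y, W)                          # one value per location
```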

This statistic can be plotted on the y axis, with the individual observation on the x axis, to investigate outliers, and see whether there is dispersion or clustering.


There are problems with this measure. Firstly, the local Morans will be correlated between two locations when they share common elements (neighbours). Due to this problem the usual interpretation of significance will be flawed, hence the need for a Bonferroni correction to the significance values (thus reducing the probability of a type I error – wrongly rejecting the null of no clustering). MAUP is similarly an issue, as above.

Point Pattern Analysis

This type of analysis looks for patterns in the location of events. It is related to the above techniques, although those are based on aggregated data of which points are the underlying observations. As the analysis here is based on the disaggregated points themselves, there is no concern about MAUP driving the results.

Ripley’s K

This method counts a firm or other observation’s number of neighbours within a given distance and calculates the average number of neighbours of every firm at every distance – thus a single statistic is calculated for each specified distance. The benchmark test is to look for CSR (complete spatial randomness) which states that observations are located in any place with the same constant probability, and they are so located independently of the location of other observations. This implies a homogenous expected density of points in every part of the territory under examination.

Essentially a circle of given distance (bandwidth) is centred on an observation, and the K statistic is calculated based on all other points that are located within that circle using the following formula:

K(d) = (α / n2) ∑i ∑j≠i I{distanceij < d}

where alpha is the area of the study zone, and I is the indicator counting the points that satisfy the Euclidean distance restriction. If there is an average density of points µ, then the expected number of points in a circle of radius r is µπr2. As the K statistic is the average number of neighbours divided by the density µ, this means that CSR leads to K(r) = πr2.
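A numpy sketch of the K computation and the CSR benchmark (simulated point patterns; no edge correction is applied, so the CSR estimate sits slightly below πr2 near boundaries):

```python
import numpy as np

def ripley_k(points, d, area):
    """K(d) = (area / n^2) * count of ordered pairs (i, j), i != j,
    with distance_ij < d. No edge correction applied."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    np.fill_diagonal(dist, np.inf)               # exclude i == j
    return (area / n**2) * np.sum(dist < d)

rng = np.random.default_rng(5)
d = 0.05

# CSR benchmark: uniform points on the unit square.
pts_random = rng.uniform(0, 1, size=(500, 2))
K_csr = ripley_k(pts_random, d, area=1.0)
K_theory = np.pi * d**2                          # expected K under CSR

# Clustered pattern: points scattered tightly around 10 centres.
centres = rng.uniform(0, 1, size=(10, 2))
pts_clustered = (centres[rng.integers(0, 10, size=500)]
                 + rng.normal(scale=0.01, size=(500, 2)))
K_clustered = ripley_k(pts_clustered, d, area=1.0)
```

The clustered pattern returns a K far above the CSR benchmark at this distance, which is how clustering is read off the plot.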

Again, the returned density by distance can be plotted against the uniform distribution to see whether observations are clustered or dispersed relative to CSR.

Marcon and Puech (2003) outline some issues with this measure. Firstly, since the distribution of K is unknown, the variance cannot be evaluated, which necessitates using the Monte Carlo simulation method for constructing confidence intervals. Secondly there are issues at the boundaries of the area studied, as part of the circle will fall outside the boundary (and hence be empty) which may lead to an underestimation at that point. This can be partially corrected for by using only the part of the circle’s area that is under study.

Additionally, CSR is not always a particularly useful null hypothesis; other benchmarks may be preferable.

Kernel Density

These measures yield local estimates of intensity at a specified point in the study. The most basic form centres a circle on the data point, calculates the number of points in the area and divides by the area of the circle. i.e:

δ(s) = N(C(s, r)) / πr2

where s is the individual observation, N is the number of points within a circle of radius r. The problem with this estimate is that the r is arbitrary, but more seriously, small movements of the circle will cause data points to jump in and out of the estimate which can create discontinuities. One way to improve on this therefore is to specify some weighting scheme where points closer to the centroid contribute more to the calculation than those further away. This type of estimation is called the kernel intensity estimate:

δ(s) = ∑i=1n (1/h2) k((s – si) / h)


where h is the bandwidth (a wider bandwidth reduces variance but introduces bias) and k is the kernel weighting function.
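A sketch of the kernel intensity estimate with a Gaussian kernel (simulated points; the Gaussian choice is an assumption, the notes do not fix a kernel):

```python
import numpy as np

def kernel_intensity(s, points, h):
    """delta(s) = (1/h^2) * sum_i k((s - s_i)/h) with a 2-D Gaussian kernel
    (an assumption here; any symmetric kernel would do)."""
    d2 = np.sum((points - s) ** 2, axis=1) / h**2
    return np.sum(np.exp(-d2 / 2) / (2 * np.pi)) / h**2

rng = np.random.default_rng(6)
cluster = rng.normal(loc=[0.5, 0.5], scale=0.05, size=(200, 2))
h = 0.1                                          # bandwidth, illustrative

at_centre = kernel_intensity(np.array([0.5, 0.5]), cluster, h)   # high intensity
far_away = kernel_intensity(np.array([2.0, 2.0]), cluster, h)    # ~zero
```

Because the kernel downweights distant points smoothly, small movements of s no longer make points jump in and out of the estimate.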

IMD and Mayoral Election Maps

The following is a visualization of the Index of Multiple Deprivation by LSOA for London. The higher the IMD score, the more deprived the area. So light blue is the least deprived decile, and dark pink is the most deprived decile…

The second map represents the 2008 London Mayoral Elections. The darker the red, the greater the Labour (Livingstone) majority in that winning seat. The darker the blue, the greater the Conservative (Johnson) majority in that winning seat.



An Investigation into the Rail Network and Social Exclusion



In recent years it has been recognized that the public transport opportunities that accrue to individuals may play a part in determining their level of social exclusion. In particular, Tony Blair’s Social Exclusion Unit was specifically tasked with examining the linkages between social exclusion, transport, and the location of services with a particular emphasis on “opportunities that have the most impact on life-chances, such as work, learning and healthcare.”[1] A number of researchers have proposed methods for evaluating links between exclusion and transport, generally focusing on accessibility to services. This short paper aims to add to this growing literature by investigating the links between access to the rail network in England, and measures of social exclusion. I use spatially referenced data on the locations of train stations to construct an accessibility measure based on distance to the rail network and use regression techniques to investigate the effect that this measure has on social exclusion.

 The results of this analysis are somewhat surprising in that I find that a higher value of the accessibility measure (meaning that the rail network is further away) is associated with lower levels of social exclusion. This result is robust to a number of specifications and controls for the availability of other types of transport as well as controls for housing and environmental quality. The literature on exclusion and transportation does not offer a theoretical justification for why this should be so. As such, even in the event that the model is correctly specified and takes into account all relevant variables, which is highly unlikely, it would be unwise to conclude that there is any causal mechanism at work.

 Interestingly, higher values of an accessibility measure to the bus and coach network are associated with higher levels of social exclusion. This may be tentative evidence that the public transport network that is most pertinent for social inclusion is the bus/coach network, although further research would be needed to substantiate such a claim.

 The most that can be concluded is that with the data that has been made available for this analysis I am unable to uncover a meaningful link between access to the rail network and levels of social exclusion. This could be due to pertinent variables being omitted from the analysis or that the accessibility measure based on distance is not capturing what is important about access to public transport networks.

[1] Making the Connections: Final Report on Transport and Social Exclusion, Social Exclusion Unit (2003), p. 1


My Maps

I recently completed an investigation into the links between social exclusion and access to the rail network in England. The full report (including maps) is posted in pdf format above, but I have extracted the main maps, as they are colourful and awesome, and I’m dead proud of myself!

Index of Multiple Deprivation 2010

Index of Multiple Deprivation 2010

This map shows the English Index of Multiple Deprivation. Low scores (light blue) indicate low deprivation, and high scores (pink) indicate high levels of deprivation. The gradient colours are deciles, so light blue is the least deprived 10% of regions, and pink is the most deprived 10% of regions.

Distance to Rail Network

The above map indicates how far a region is from its nearest train station. The coloured gradients represent how far the region is, in metres, from the closest mainline rail station.



T. Lyytikainen SERC Discussion Paper 82

A Brief Summary 

This paper uses spatial instrumental variables in order to estimate the neighbourhood effects of tax rate setting. Using the IV approach the empirical results suggest that there is no significant interaction in tax rate choices among Finnish municipalities.


The Finnish case was chosen because in 2000 there was a reform to the system. Tax rates on property were previously selected by municipalities from a band of possible rates set by the government. In 2000 the lower rate was raised by the government from 0.2% to 0.5%. This new lower limit was binding for 40% of the municipalities.


He discusses the spatial lag and spatial instrumental variables models that are generally used. The method ultimately used is similar to the traditional spatial IV technique with one key difference: it is the policy intervention that is used as the instrument rather than a higher-order weighted average.


The actual imposed tax rate changes are not observable, as there is no information about what rates would have been chosen had the lower bound not been altered. However, he constructs a measure of the predicted imposed increase in tax rates that looks like this:


Zi2000 = D(T2000 > T1998i)(T2000 – T1998i)


where D(T2000 > T1998i) is a dummy variable equal to 1 if the municipality’s 1998 tax rate was below the newly imposed year-2000 lower limit. He then uses this as an instrument for the spatially lagged tax rate change. He says the instrument is relevant in the first stage (but does not report it), and conducts a placebo test.
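The instrument’s construction is mechanical enough to sketch directly (the tax rates below are made up for illustration; only the 0.5% lower limit comes from the paper):

```python
import numpy as np

# Illustrative 1998 rates (%); the new lower limit imposed in 2000 is 0.5%.
t_1998 = np.array([0.2, 0.3, 0.45, 0.6, 0.8])
lower_2000 = 0.5

D = (t_1998 < lower_2000).astype(float)   # dummy: 1998 rate below new limit
Z = D * (lower_2000 - t_1998)             # predicted imposed increase
```

Municipalities already above the new limit get Z = 0; those below it get the size of the forced increase.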


The weighting system is nearest neighbour, but for robustness purposes, population weighting and a combination weighting scheme are also used.


The results are that although the coefficient on neighbour’s tax rates is positive, it is very small, and also statistically insignificant. This result is robust to different weighting systems.

Interestingly, he tests the data using the SAR and general spatial IV methods and finds strong and highly significant coefficients, which casts doubt upon their reliability, given that the IV method he uses should be stronger due to diminished endogeneity concerns.



J.K. Brueckner & L.A. Saavedra, National Tax Journal Vol 56, No. 2

A Brief Summary 

In a Nutshell

The authors use city level data from the US to estimate a model of strategic tax competition and specifically the tax reaction function. They find that this function has a non-zero slope, which indicates that changes in a local competitor’s rates affect choices made by a different community.


The data are drawn from a sample of 70 cities that comprise the Boston Metropolitan area. Working under the assumption that community i‘s tax decision is a function of the tax rates in other communities they use a SAR model of weighted averages of neighbouring jurisdictions as a spatial lag. To check their results are not driven by the weighting scheme used (as it is arbitrary), they test different weighting schemes as part of their robustness checks (contiguous neighbour, distance decay, population weighting, and combinations thereof).

They are aware of the simultaneity problem and the bias that would introduce using OLS, so their estimations are made using Max Likelihood.

The principal finding was that the coefficient on the spatial lag was positive and significant, and this was robust to the different weighting measures. This implies that, for the period studied, strategic tax rate setting occurred, and the best response of a community faced with increased rates in a neighbouring community was to raise its own rates. In game-theoretic terms this means that the communities' tax rates are strategic complements.



S. Gibbons & H. Overman

A Summary with some additions from the lecture and Under the Hood Issues in the Specification and Interpretation of Spatial Regression Models, L. Anselin, Agricultural Economics 27 (2002) 

Spatial Models and Their Motivation

The inclusion of spatial effects is typically motivated on theoretical grounds that there is some spatial/social interaction that is affecting economic outcomes. Evidence of this will be spatial interdependence. Thus models are created that seek to answer how interaction between agents can lead to emergent collective behaviour and aggregate patterns. These might also be termed neighbourhood effects.

To start with a basic linear regression:


yi = x'iβ + µi


Where x is a vector of explanatory variables and β is a vector of parameters, with µ as ever being the error term. This basic format assumes that each observation is independent of the others. This is generally too strong in a spatial context, as events in one place often affect events in another, particularly if they are close to each other. A simple way of capturing the effects that nearby observations have on each other is to define a weights vector wi for each observation, which reflects how the other observations affect it (for example through distance weighting). The product w'iy is then, for observation i, a linear combination of all the y values to which it is connected. If the weights sum to 1, this gives a weighted average of the neighbours of i.
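The construction can be sketched in a few lines (a toy contiguity matrix for four locations on a line; the weighting scheme is illustrative):

```python
import numpy as np

# Four locations on a line; W[i, j] = 1 when i and j are contiguous.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W /= W.sum(axis=1, keepdims=True)  # row-normalize so each row sums to 1

y = np.array([10.0, 20.0, 30.0, 40.0])
Wy = W @ y   # element i is the weighted average of i's neighbours
print(Wy)    # -> [20. 20. 30. 30.]
```

Row-normalization is what turns the linear combination into a weighted average of neighbours.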


Spatial Autoregressive Model

This weighted average can then be used to construct the spatial autoregressive model (SAR) which is also known as the “spatial y” model, and is referred to as a spatial lag. This model attempts to uncover the spatial reaction function, or spillover effect. The model looks like this:


yi = ρw'iy + x'iβ + µi


The idea is that an individual observation is affected both by their own characteristics and recent outcomes of other nearby agents who are capable of influencing his behaviour. One example may be that when determining at what price to sell one’s house, the individual characteristics such as number of bedrooms are taken into account as well as property prices achieved by others in the vicinity. In this case the Beta captures the effects of the individual characteristics and the Rho captures the causal effect of neighbourhood actions.
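A hedged simulation sketch of this data generating process: the SAR reduced form is y = (I − ρW)⁻¹(Xβ + µ), so each outcome embeds feedback from its neighbours. The weight matrix and parameter values below are illustrative, not from any paper:

```python
import numpy as np

# Simulate a SAR process from its reduced form y = (I - rho*W)^(-1)(x*beta + u).
rng = np.random.default_rng(0)
n = 100
W = np.zeros((n, n))
for i in range(n):                 # circular two-neighbour contiguity,
    W[i, (i - 1) % n] = 0.5        # row-normalized so rows sum to 1
    W[i, (i + 1) % n] = 0.5

rho, beta = 0.4, 2.0
x = rng.normal(size=n)
u = rng.normal(size=n)
y = np.linalg.solve(np.eye(n) - rho * W, beta * x + u)

# Naive OLS of y on (Wy, x) is inconsistent, because Wy contains u.
Wy = W @ y
coef = np.linalg.lstsq(np.column_stack([Wy, x]), y, rcond=None)[0]
print(coef)  # compare the first element with the true rho = 0.4
```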


Spatial X Model

Alternatively we may drop the assumption that yi  is affected by neighbouring y outcomes, and instead assume that it is affected by spatial lags of the observable characteristics. This is then a spatial x model (SLX):


yi = x'iβ + w'iXγ + µi


This assumes that the observable neighbourhood characteristics are determinants of yi. As in the above example this could be the characteristics of neighbourhood housing such as appearance, size etc. influencing individual price decisions. Beta is as above, and Gamma is the causal effect of neighbourhood characteristics.


Spatial Durbin Model

The spatial Durbin model (SD) combines SAR and SLX:


yi = ρw'iy + x'iβ + w'iXγ + µi


Interpretation is as indicated above.


Spatial Error Model

This model drops the assumption that outcomes are explained by spatial lags of the dependent or explanatory variables, and instead assumes a SAR-type autocorrelation in the error process. This yields:


yi = x'iβ + µi ; µi = ρw'iµ + vi


This model assumes that outcomes are dependent upon the unobservable characteristics of the neighbours.



OLS with a spatially lagged y variable (SAR and SD) yields inconsistent estimates unless Rho equals 0. This is because w'iy is correlated with the error term. Solving the model for y gives the reduced form y = (I – ρW)^(-1)(Xβ + µ), so the average neighbouring dependent variable includes the neighbour's error term, the neighbour's neighbour's error term, and so on, such that any observation i depends to some extent on the error terms of all the other observations. Restricting the weights to only the nearest neighbour does not remove the problem, because the feedback runs in both directions: the intuition is that you are your neighbour's neighbour. In the simple i-j case the following occurs:


yi = ρyj + xiβ + εi (1)

yj = ρyi + xjβ + εj (2)


Substituting (2) into (1) we get:


yi = ρ(ρyi + xjβ + εj) + xiβ + εi


Which shows that yi depends in part upon itself; equivalently, yj is a function of εi, so the regressor in (1) is correlated with the error term.
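Carrying the substitution one step further makes the endogeneity explicit (a short derivation consistent with (1) and (2), assuming ρ² ≠ 1):

```latex
\begin{align*}
y_i &= \rho(\rho y_i + x_j\beta + \varepsilon_j) + x_i\beta + \varepsilon_i\\
(1-\rho^2)\,y_i &= x_i\beta + \rho x_j\beta + \varepsilon_i + \rho\varepsilon_j\\
y_i &= \frac{x_i\beta + \rho x_j\beta + \varepsilon_i + \rho\varepsilon_j}{1-\rho^2}
\end{align*}
```

By symmetry, yj contains εi, so the regressor yj in equation (1) is correlated with the error term whenever ρ ≠ 0, which violates the OLS exogeneity assumption.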


Using OLS for the SLX model is also problematic, as the assumption underlying OLS is that the error term is not correlated with the regressors. For the SLX model this means that E(ε| x) = 0 and E(ε| Wx) = 0. However, if there is spatial sorting, for example when motivated parents locate themselves near good schools, then this assumption is violated as E(ε| Wx) ≠ 0.


The SE model may generate consistent coefficient estimates, as the assumption that the error is uncorrelated with the regressors holds; however, standard errors will be inconsistent because, by definition, the model has spatially autocorrelated error terms. This can lead to mistaken inferences.


Under OLS, standard errors are inconsistently estimated for all of these models.


Additionally, the different types of model are difficult to distinguish without assuming prior knowledge of the data generating process which in practice we do not have.



Maximum Likelihood

These problems can be circumvented using Maximum Likelihood estimation, which provides consistent estimators. The likelihood is the probability of observing the data y given values for the parameters Rho and Beta (and the error variance); a computer uses iterative numerical maximization techniques to find the parameter values that maximize the likelihood function. The complication relative to OLS is that the spatial transformation of y introduces a Jacobian term, the determinant of (I – ρW), into the likelihood.
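The mechanics can be sketched for the SAR model by maximizing a concentrated log-likelihood over a grid of Rho values. This is a toy illustration on simulated data; real packages use more refined numerics, and the W, parameters and grid are assumptions:

```python
import numpy as np

# Simulate SAR data, then estimate rho by concentrated maximum likelihood.
rng = np.random.default_rng(1)
n = 120
W = np.zeros((n, n))
for i in range(n):                 # circular contiguity, row-normalized
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

rho_true, beta_true = 0.5, 1.5
x = rng.normal(size=n)
y = np.linalg.solve(np.eye(n) - rho_true * W,
                    beta_true * x + rng.normal(size=n))

X = np.column_stack([np.ones(n), x])

def concentrated_loglik(rho):
    z = y - rho * (W @ y)                     # spatially filtered y
    b = np.linalg.lstsq(X, z, rcond=None)[0]  # beta-hat given rho
    e = z - X @ b
    sign, logdet = np.linalg.slogdet(np.eye(n) - rho * W)  # Jacobian term
    return logdet - (n / 2) * np.log(e @ e / n)

grid = np.linspace(-0.9, 0.9, 181)
rho_hat = grid[np.argmax([concentrated_loglik(r) for r in grid])]
print(rho_hat)  # should land near the true rho of 0.5
```

The log-determinant term is what a naive least-squares criterion omits; dropping it is exactly what makes OLS inconsistent here.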


The issue with this specification is that it assumes that the spatial econometric model estimated is the true data generating process. This is an incredibly strong assumption that is unlikely to hold in any circumstance.


Instrumental Variables

In theory the spatial lags of the explanatory variables, w'ix, w2'ix (or even third, fourth order), can be used as instruments for w'iy, and this "exogenous" variation in the neighbourhood outcome used to determine yi, under the assumption that the instruments are correlated with Wy but not directly with yi. Expanding the reduced form gives a first stage of the form:


w'iy = w'ixβ + ρw2'ixβ + ρ2w3'ixβ + …


The predicted values of w'iy are then used in the second stage regression with yi as the dependent variable.
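The two stages can be sketched on simulated data (illustrative W and parameters, with plain least-squares algebra rather than a packaged 2SLS routine):

```python
import numpy as np

# Simulate SAR data, then 2SLS using Wx and W^2 x as instruments for Wy.
rng = np.random.default_rng(2)
n = 80
W = np.zeros((n, n))
for i in range(n):                 # circular contiguity, row-normalized
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

rho, beta = 0.4, 2.0
x = rng.normal(size=n)
y = np.linalg.solve(np.eye(n) - rho * W, beta * x + rng.normal(size=n))
Wy = W @ y

# First stage: project the endogenous Wy on the exogenous x, Wx, W^2 x.
Z = np.column_stack([np.ones(n), x, W @ x, W @ (W @ x)])
Wy_hat = Z @ np.linalg.lstsq(Z, Wy, rcond=None)[0]

# Second stage: regress y on the fitted values and x.
X2 = np.column_stack([np.ones(n), Wy_hat, x])
coef = np.linalg.lstsq(X2, y, rcond=None)[0]
print(coef[1])  # the 2SLS estimate of rho
```

Note that Wx and W²x are close to collinear here, which is exactly the weak-instrument worry raised below.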


There are problems also with this technique. Firstly it is unlikely that the true nature of w is known, and that it is correctly specified is crucial to the model. For example X variables may have an effect over a 5km distance, but the weighting system incorrectly restricts analysis to 2km.  Secondly the higher order lags of the X variables could still be having an effect upon yi and hence the exogeneity restriction is violated, and the 2SLS results are biased. Lastly the different spatial lags are likely to be highly correlated, and as such there will be little independent variation which is essentially a weak instruments problem. Weak instruments can severely bias second stage coefficients which will additionally be measured imprecisely.


The Way Forward

  • Panel data can allow for differencing over time to control for fixed effects. The problems above remain, however, now in the context of differenced data.
  • In terms of the IV strategies, genuinely exogenous instruments should be found such as changes to institutional rules [see later tax paper summary].
  • They argue that the SAR model should be dropped, and if neighbourhood effects cannot be identified using genuine instruments, a reduced form of the SLX model should be used.
  • Natural experimental techniques from other economic literatures should also be borrowed e.g. DID, Matching. These techniques may help us to find causal effects but the tradeoff is that they are only relevant to some sub-set of the population (as in the Local Average Treatment Effect for IVs).




E.L. Glaeser and J.A. Scheinkman

A Brief Summary


Identifying Social Interactions

Inequality, concentrations of poverty and other outcomes may be the partial result of social interactions. Thus interventions that seek to address these phenomena may operate through both their incentives upon individual actors and through these social interactions. However, as policies are generally aimed at the former, it is difficult to quantify the effect, if any, the policy has through social networks. Whilst theory abounds that attests to the effect these interactions can have on the distribution of private outcomes, simply studying the outcomes tells us nothing about whether it was the interactions, or the private incentives that were responsible.


Different methods have been used to identify the interactions. One way is to look for multiplier effects, i.e. identifying the social effects as those that operate above and beyond the private effect. However, this requires being able to state exactly what the private effects should be in order to look at the difference, and this is generally not possible. A more promising approach looks at the results of interventions that operate directly on the social interactions and not on the private incentives. Three approaches are relevant:

  1. Interventions that change group membership. If group membership is changed (with no change to private incentives), then any change in outcomes can plausibly be attributed to the group effect. The problems here are that private incentives often change alongside membership, and that such interventions are hard to enforce, so individuals may simply revert to their original groupings.
  2. Changing the private incentives for a sub-set and seeing if there are effects on others whose private incentives are not changed.
  3. Interventions that seek to directly challenge social norms such as mass media campaigns. The identification issue here is that the changes may simply affect private preferences rather than acting directly upon the social norms.

Econometric identification of social interactions is very hard, and perhaps the strongest evidence for these phenomena are the persistent degrees of stratification amongst populations.


Econometric Possibilities

The basic concept of social interactions is that one individual’s actions are made based in part upon the actions of another individual or grouping of individuals. Various techniques are used to empirically test these interactions [see later summaries for spatial context]. In general however, these specifications are subject to three problems:

  1. Simultaneity: A’s actions may be affected by B’s, but B’s will most likely be affected by A’s at the same time. This means that any regression that includes B’s actions as an explanation of A’s will suffer from endogeneity. Endogenous and exogenous interactions cannot be separately identified.
  2. Correlated unobservables problem and the related errors-in-variables problem. This arises if there is some group-specific component of the error term that varies across groups and is correlated with the exogenous characteristics of the individuals. The unobservables could arise from preferences or environmental settings.
  3. Endogenous membership problem – people may sort into groups based on unobservable characteristics. This is similar to selection bias.


The challenge then is to see whether these issues that can generally be lumped together as endogeneity issues, can be circumvented using techniques such as instrumental variables, quasi-experiments, or randomized control trials.


Spatial Smoothing and Weighting


Notes from Haining (2003) Chapters 5 and 7.1, and lecture

There are different conceptual models of spatial variation: 

  • The Regional Model: The emphasis is on the definition of regions as spatial units used for analysis. There are three types of region of particular interest
    1. Formal/Uniform: Constructed using sharp partitioning of space into homogenous or quasi-homogenous areas. Borders are based upon changes in attribute levels. This is fairly simple to achieve with small numbers of variables, but becomes harder as the number increases unless there is strong covariance between them all. This method is also tricky when the variable(s) of interest are continuous over space.
    2. Functional/Nodal: regions are constructed using interaction data. Whereas formal regions are characterized by uniformity of attribute level, the functional region is bound together by patterns of economic or other interaction which set it apart from neighbouring districts. E.g. labour market regions are defined by patterns of interaction such as travel-to-work flows.
    3.  Administrative: These regions are the consequence of political decisions.


  • Rough and Smooth: This model assumes there are two components to spatial data such that

data = smooth + rough

 The smooth part is regular or predictable such that knowing part of the distribution allows for extrapolation at the local level. The rough part is irregular, and is what is left over once the smooth component has been accounted for. The rough component cannot be used for extrapolation even though it may be explainable in terms of the underlying data generating process. Smooth structures could include topographical features, a propensity for similar data values to be found clustered together (spatial autocorrelation), trends or gradients etc. Rough could include local hot/cold spots, spatial outliers, and localized areas of discontinuity.  

  • Scales of Spatial Variation: This recognizes the different scales of spatial variation such that: 

Spatial Data = Macro(scale variation) + medium/micro(scale variation) + error

The error here includes measurement error. The macro component refers to trends or gradients present across the whole study area, e.g. a south-north linear trend. The medium/micro component refers to more localized structures that are conceptualized as superimposed upon the macro variation, e.g. localized disease hot spots or cold spots. If the data map is dominated by the micro variation, then it displays spatial heterogeneity. In such a circumstance analysis should probably be conducted at the local level.


The aim of map smoothing is to remove the distracting noise or extreme values present in data in order to reveal spatial features, trends etc.; in other words, to smooth away the rough component of the data to reveal the underlying smooth pattern. It tries to improve the precision of data without introducing bias.

 Resistant Smoothing of Graph Plots

This method is applied to graphical plots to aid visualization and to identify trends and relationships. This essentially fits smooth curves to scatter plots using locally weighted regression with robustness iterations. A vertical line is centred on an observation of x, and then a bandwidth applied around it. The paired observations that fall within this bandwidth or window are assigned neighbourhood weights using a weights function that has its maximum at the centre of the window and decays smoothly and symmetrically to become zero at the boundary of the window. A line is fitted to this subset of data by weighted least squares to create fitted values of y. The window is then slid along the x axis and the process repeated to create a sequence of smoothed y values. The bands overlap as generally the window is applied to each ordered value of x.

 Essentially the idea is as follows:

  •  x(s) is a variable that has a spatial element s
  • m(s) is the smooth part of the data and is a function of s although the function is unknown and very likely to be non-linear. This is the part we are trying to estimate
  • ε is the rough part we are trying to smooth away

 The most common way to estimate m(s) takes the form:


m̂(s*) = ∑j w(s*, sj) x(sj)


Where w(s*, sj) is a scalar weight assigned to data point j given its distance (or other weighting scheme) from location s*, which is where the window is focused. It should be noted that the weights sum to 1.

Thus the predicted value of m(s) is basically a moving weighted average.  

There are various different weighting structures that can be used.

  • Nearest Neighbour: Here the weights are based on some predefined set of k nearest neighbours such that w(s*, sj) = 1/k if data point j is one of the k nearest neighbours to location s* and 0 otherwise.
  • Kernel Regressions: this assigns weights based upon the distance of data point j from the kernel point s* such that:


w(s*, sj) = K((s* – sj)/h) / ∑k K((s* – sk)/h)


Here h is the bandwidth, so the window is always the same width. Decisions about how best to aggregate the data are needed here, and should be drawn from some understanding of the underlying process. The denominator ensures that the weights sum to 1 over the relevant bandwidth. Different kernels can be used, such as the Uniform (rectangular) kernel or the Normal (Gaussian) kernel.
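A minimal sketch of such kernel weights, assuming a Gaussian kernel:

```python
import numpy as np

def kernel_weights(s_star, s, h):
    # Gaussian kernel values, normalized so the weights sum to 1.
    k = np.exp(-0.5 * ((s - s_star) / h) ** 2)
    return k / k.sum()

s = np.linspace(0, 10, 11)        # locations of the data points
w = kernel_weights(5.0, s, h=2.0)
print(w.argmax())  # heaviest weight at the point nearest the kernel centre
```

Widening h spreads the weight over more points (more smoothing); shrinking it concentrates weight near s*.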

 Using these weights we can run a locally weighted regression that estimates a coefficient for every point of data j based upon the bandwidth and weights assigned. These local polynomials can then be made to join locally to create a spline. It should be noted that these methods encounter problems at the edges of the data.
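The windowed weighted-least-squares idea can be sketched as follows (a minimal locally weighted regression without the robustness iterations, using an assumed tricube weight function on simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=200)  # smooth signal + rough noise

def local_fit(x0, h=1.0):
    u = np.abs(x - x0) / h
    w = np.where(u < 1, (1 - u**3) ** 3, 0.0)    # tricube window weights
    X = np.column_stack([np.ones_like(x), x - x0])
    XtW = X.T * w                                # X' with weights applied
    beta = np.linalg.solve(XtW @ X, XtW @ y)     # weighted least squares
    return beta[0]                               # fitted value at the centre

smoothed = np.array([local_fit(x0) for x0 in x])
print(smoothed[:3])  # the smoothed curve should track sin(x)
```

Sliding the window across every ordered x value is exactly the procedure described above; the edge problems show up because the windows at either end are one-sided.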

Map Smoothing

The above methods smooth along a single axis (left to right along a graph plot). However, when using data that has more spatial dimensions (north-south, east-west), the separate spatial kernels can be combined into just one dimension called distance. This method assumes that points can be weighted equally in every direction from s*. In practice, however, we may need to weight the N-S or E-W directions differently (for example, house prices in south London may be more likely to increase as we move north toward the centre, rather than south toward the suburbs). In other words, the method assumes the spatial function is the same in every direction. If this is unlikely to be the case, then there is an argument for using narrow bandwidths.

The idea of map smoothing is to improve the precision associated with area data values whilst not introducing serious levels of bias. It is very important to decide upon what size of window is to be used for the smoothing. A large window improves the precision of the statistic because it borrows a lot of information from the rest of the observations (effectively this is increasing the sample size) but the cost is that there is a greater risk of introducing bias as information is borrowed from areas further away that may be different in terms of the data generating process. A small window reduces the risk of bias, but also decreases the precision of the estimate.


The effectiveness of local information borrowing when smoothing depends upon local homogeneity. If local areas are very different (i.e. the rough component is very large) then even very localized smoothing can introduce bias. This homogeneity will itself be affected by the size of the area that is sampled (or density of the sampling points) relative to the true scale of spatial variation [see MAUP]. In the case of high heterogeneity smoothing using local information may not be as effective as smoothing using information borrowed from data points that are similar to the area in question (even if they are not located nearby).


There are a variety of different methods for map smoothing


  • Mean and Median Smoothers: in this case the value of data point j is replaced with the median, or mean drawn from the set of values (including itself) that are contained within the window imposed on the map. The window may be defined in terms of distance, adjacency, or other characteristic. The rough component of the data can then be extracted by subtracting the median/mean from the observed value. The choice between the two is important. Mean smoothing tends to blur spikes and edges and so may be appropriate in environmental science where localized spikes are not generally expected. Median smoothing tends to preserve these spikes and hence may be more useful for social applications where there can be large localized spikes. In any case, the performance of these smoothers will depend upon the weights assigned to them.
  • Distance weighting is often used. This can be a simple nearest neighbour weight scheme as explained above, although this type of neighbourhood weight function causes values to change abruptly and smoothers based on them can have undesirable properties. A simple distance weighting scheme can be used where data are assigned the values based on weights:

 wij = dij-1 / ∑j dij-1 (inverse-distance weights), and then the predicted value is m̂(si) = ∑j wij xj


An additional restriction can be added such that the only values assigned weights and used in the estimation conform to some rule dj < D otherwise they equal 0. This then has become a kernel smoother as described above. It is traditional to omit the observed value of the kernel point from the analysis (i.e. exclude observation i from the calculation of the mean for point i).
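A minimal sketch of this inverse-distance smoother on made-up points, omitting the kernel point itself as described:

```python
import numpy as np

# Hypothetical point locations and attribute values (e.g. house prices).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
values = np.array([1.0, 2.0, 3.0, 10.0])

def idw_smooth(i):
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = np.zeros_like(d)
    others = np.arange(len(d)) != i   # omit the kernel point itself
    w[others] = 1.0 / d[others]       # inverse-distance weights
    w /= w.sum()                      # weights sum to 1
    return w @ values

smoothed = np.array([idw_smooth(i) for i in range(len(values))])
print(smoothed)  # each value is pulled toward its nearer neighbours
```

Adding the dj < D cut-off from the text would simply zero out the weights beyond distance D, turning this into the kernel smoother described above.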

The literature has shown that the precise choice of weighting function is generally less important than the choice of window. There is always a variance/bias trade-off to be made, and the choice should be based upon knowledge of the nature of local mean variability.

There are other types of smoothing such as “headbanging” and median polishing, but at this time they appear to go beyond the scope of the course.  

In the lecture example, house price points are smoothed into a raster grid, with each raster square assigned a value using an inverse distance weighting scheme.

In practice weighting schemes can be based on a variety of different things: social weights, network distances, migration flows, travel times, income differences etc.

Modifiable Area Unit Problem (MAUP)



A. Briant, P. P. Combes & M. Lafourcade, Journal of Urban Economics, 67 (2010)

(Notes from Lecture, Problem set and above referenced paper) 


The Modifiable Area Unit Problem (MAUP) arises when working with statistical data, and is concerned with the sensitivity of results to the particular choice of zoning system that is used in the analysis. The core of the problem is that we often do not know how driven our statistical results are by the shape/size of the areas under examination and how data within those areas have been aggregated. As the paper shows, coefficients can vary depending on how data are aggregated within different area units.

For example, when examining the effects of agglomeration on regional economies, it is often unclear whether results are truly driven by knowledge spillovers, labour pooling effects etc., or whether they are simply a product of how the data are organized. This is an important issue when thinking about policy on cluster formation strategies. When investigating economic processes that have spatial characteristics, results will be affected by the shape/size of the units used whenever those units do not accurately reflect the underlying economic realities. Put another way, if data are generated by a particular spatial process, the results of analysis will be distorted when the units employed do not reflect that underlying process. Returning to the agglomeration example, if the units of analysis are not well chosen, then the zones will pick up (or fail to pick up) the effects of industries outside (or inside) their true field of economic influence, and will thus tend to over- or understate the effects of agglomeration on economic outcomes.


The shape effect refers to results that are driven by the different shapes used in analysis. In the lecture illustration, if a black dot represents high (skilled) labour productivity and a red dot low (unskilled) productivity, then one partition of the boxes shows an even distribution of productivity, while redrawing the shapes identifies two clusters of high-productivity workers and two clusters of low-productivity workers.


The size effect refers to the size of the units. I have not drawn an example but it is easy to see that using smaller triangles may find smaller less dense clusters of high productivity areas, and similarly clustered low productivity areas.


Illustrations with Simulated Data

Arbia (1989) shows that the problems of shape and size distortion are minimized (not eliminated) when there is exact equivalence of the sub-areas (in terms of size and shape) and an absence of spatial autocorrelation. In practice these two conditions will very rarely be met. Spatial autocorrelation in particular will almost always be present, as there are likely to be spillovers from activities, outcomes or processes in one area upon another.


Amrhein (1995) simulated data by randomly generated variables and randomly assigning them to a Cartesian address (i.e. no autocorrelation). He then varied the unit size to 100, 49 and 9 squares of observation. Under these conditions he was able to show that means do not show any particular sensitivity to shape/size effects, although variances increase (and hence standard errors) as the sample size decreases (i.e. when moving to a smaller number of units).


Amrhein & Reynolds (1997) then went on to show that distortions using real census data were sensitive not only to shape and size effects but also to the aggregation process (i.e. whether data are summed or averaged). This is fairly intuitive: if data are aggregated by summation, they will be more distorted by an increase in unit size (as more observations are added) than if they are averaged (where the effect of adding more observations is dampened by the averaging).
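An Amrhein-style experiment of this kind can be sketched with simulated, spatially random data; the lattice size and aggregation schemes below are illustrative, not a replication of his study:

```python
import numpy as np

# Spatially random point data on a 36 x 36 lattice (no autocorrelation).
rng = np.random.default_rng(4)
data = rng.normal(loc=100, scale=15, size=(36, 36))

for k in (2, 6, 12):   # side length of the square aggregation units
    blocks = data.reshape(36 // k, k, 36 // k, k)
    unit_means = blocks.mean(axis=(1, 3))   # averaged data
    unit_sums = blocks.sum(axis=(1, 3))     # summed data
    # means stay near 100 under every zoning; the variance of unit means
    # collapses as units grow, and sums scale with the unit area
    print(k, round(unit_means.mean(), 2), round(unit_means.var(), 2),
          round(unit_sums.mean(), 2))
```

The grand mean is invariant to the zoning, while the dispersion of unit values (and anything built on sums) changes mechanically with unit size.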


Correlation Distortions

The MAUP can produce distortions in multivariate analysis of data drawn from spatial units. The effects of shape/size and the effects of aggregation have been separated out. Amrhein (1995) finds that his coefficients are sensitive both to size and shape, but if the model is well specified the method of aggregation seems to imply fewer distortions. When aggregation affects the dependent and independent variables in the same way, the effects are small, and this goes for the shape effects also. However, when they are aggregated differently, they will not have the same degree of spatial autocorrelation, and hence the size and shape effects will be larger.


Thus the problem is likely to be smaller for wage regressions (where data are averaged) than for gravity regressions (where data are summed).


Testing the Problem

The authors look at the effects of employment density on wage levels in France using a variety of different zoning methods. Specifically they use

  • 341 Employment areas – government units based on minimizing daily commute between zones.
  • 21 Regions
  • 94 departments
  • 22 Large squares
  • 91 medium squares
  • 341 small squares
  • 100 semi-random zones.

Some of the data are summed (employment/trade flows) whereas some are averaged (wage rates); the former increase with the size of the units, whereas the latter are relatively more stable across different zoning systems.


The first thing they do is to calculate Gini coefficients of spatial concentration within every zoning system for the 98 industries examined over 18 years, and rank the industries. Using rank correlation analysis they show that the ranking of industries is virtually unaffected by changes in zoning system. However, when they rank the industries using different indices (e.g. the Ellison and Glaeser index), the rank correlations are lower, meaning the choice of index moves the rankings more than the choice of zoning system does. This indicates that the index, rather than the MAUP, is the more significant problem, and thus careful specification of the index should be a more primary concern than issues to do with units of observation.


They then undertake basic regression analysis of log wages on log employment density and a vector of controls. They find that the effects of moving from one zoning system to another are generally small, especially when moving from very small to slightly less small units. The grid system is generally more sensitive, which indicates that boundaries that do not reflect administrative/economic realities do generate more error. However, when they control for observed and unobserved skill levels (in order to see if workers are sorting into high-density employment areas), the coefficient on density changes by an order of magnitude more than it did for changes due to MAUP. Thus, specification of the model again appears to be significantly more important than MAUP.


Similar changes are observed when market potential (measured as distance to other employment centres) and different definitions of market potential are controlled for. These changes to specification are all more important than MAUP.


Lastly they look at gravity equations for which the dependent variable (trade flows) are summed within zones, and these summations make the coefficients more sensitive to MAUP problems.

Intro to Spatial Analysis


Notes from Lecture and Chapter 2/3 Fothering 

Spatial Analysis

Quantitative geography (spatial analysis) differs from econometrics in as much as it is concerned with data that have a spatial element, i.e. data that combine attribute data with locational data. Its prime use is to generate knowledge about how data are generated in space, and hence it seeks to detect the spatial patterns that are generated by physical and social phenomena. Data can be visualized to enable detection of patterns, and quantitative analysis can then be used to examine the role of randomness in generating those patterns and to test hypotheses about them. All in all, spatial analysis is a testing ground for ideas about spatial processes.

Spatial Data

Spatial data are observations of phenomena that possess a spatial reference. Such data may be captured by digitizing maps, collected by survey, or remotely sensed by satellite (amongst other ways).


  • Spatial Objects:        Spatial objects are of three basic types: points, lines or areas. Essentially they are things that can be represented in 2-dimensional space by drawing some kind of shape, e.g. houses, railway lines, county boundaries. They all have a spatial reference that describes the position of the object on the surface of the earth. As well as the spatial reference, the object can be associated with some observable characteristic such as elevation, test scores in schools, or the amount of output at a factory.
  • Fields:          Fields are used as an alternative to objects. Measuring some continuous variable such as elevation, or air density may be hard to achieve using points as the variation is spatially continuous. These types of spatially continuous variables are called fields, and whilst for some very basic fields it may be possible to derive the function that describes the spatial variation, in practice for the vast majority of fields the function remains unknown. In such a case it is simpler to measure the field in a discretized form at regular intervals such that the observations form a regular lattice or grid (they may also be measured at irregular spaces, but this is not so common). The field is a measure of variable x that is geographically referenced to a location on the earth’s surface. Fields can be either fixed (non-stochastic) as in elevation, or random as in income.


Even if we have data on the entire population (which in itself would be rare), it remains only a snapshot in time. There is some underlying process that is generating that data set, and to uncover that process we need to conduct statistical analysis. In other words, simply collecting data for every member of the population will not necessarily lead to an understanding of the underlying processes.



In order to conduct analysis we need a consistent means of describing locations on the earth. Latitude and longitude are one such method, latitude being measured north from the equator and longitude being measured east or west of the Greenwich meridian. Calculating distances in this coordinate system requires spherical trigonometry. As this can be somewhat cumbersome, it is often easier to ignore the curvature of the earth and treat the data as lying on a flat plane. A Cartesian coordinate system can then be used, and Pythagoras’ theorem can be used to calculate distances. This is only appropriate when examining a relatively small area, such that ignoring the curvature does not produce undue distortions. The British National Grid is one example of such a system.


On the plane, the distance between points i and j can then be calculated as:

d_ij = √[(x_i − x_j)² + (y_i − y_j)²]

However, this distance may not be the most meaningful: to a car or pedestrian in a city, the straight-line distance fails to take into account buildings and other obstacles that lie in the path between the two points.
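A minimal sketch of the two distance calculations described above: Pythagoras on a flat plane, and a great-circle (haversine) distance from latitude/longitude using spherical trigonometry. The coordinates in the usage line are approximate and purely illustrative.

```python
from math import radians, sin, cos, asin, sqrt

def euclidean(x1, y1, x2, y2):
    """Plane (Pythagoras) distance; appropriate over small areas."""
    return sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

def haversine(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance in km via spherical trigonometry."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * r * asin(sqrt(a))

print(euclidean(0, 0, 3, 4))  # 5.0
# Approximate London to Cambridge coordinates, roughly 80 km:
print(haversine(51.5074, -0.1278, 52.2053, 0.1218))
```

Neither measure accounts for the road network; network distance requires the obstacles and links themselves to be represented.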


Representing Spatial Data

Spatial data are generally represented in two ways.


  • Vector Model:           In the vector model the physical representation of objects and lines closely matches the conceptual reality. A single point with attributes is represented by an ordered coordinate pair and a vector of attributes. Areas are frequently stored as closed polygons (such as boundary lines). This method can also be used to store network lines, such as a rail network.
  • Raster Model:           The raster model only stores data values ordered by column within row, or row within column, i.e. a grid or lattice. Data detailing the origin of the lattice are also stored, so that each cell can be related back to a geographical location on the surface of the earth. The accuracy of the location is dependent upon the size of the grid cells. Additionally, only a single attribute can be stored per cell, so if two points with different attributes fall in the same cell some decision has to be made as to which one is stored. Raster data can be discrete (e.g. urban or rural) or continuous (e.g. population density).

Which method is used tends to depend upon the type of data. If the data are already in lattice form (such as satellite data) then the raster method is easiest, whereas if positional accuracy is a concern then the vector method is preferable. Often raster data are used to represent fields, but this need not be the case.
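A rough sketch of the two storage models (the coordinates, attribute names, grid origin, and cell size below are invented for illustration):

```python
# Vector model: each object is an ordered coordinate pair
# plus a vector of attributes.
schools = [
    {"coords": (531000.0, 179000.0), "name": "School A", "test_score": 72},
    {"coords": (532500.0, 181200.0), "name": "School B", "test_score": 65},
]

# Raster model: a grid holding one attribute value per cell, plus
# the lattice origin and cell size so that each cell can be
# related back to a geographical location.
origin = (530000.0, 178000.0)  # south-west corner of the grid
cell = 1000.0                  # cell size in map units
grid = [[None] * 5 for _ in range(5)]

for s in schools:
    col = int((s["coords"][0] - origin[0]) // cell)
    row = int((s["coords"][1] - origin[1]) // cell)
    # Only one value fits per cell: if two objects fall in the
    # same cell, the later one overwrites the earlier.
    grid[row][col] = s["test_score"]

print(grid[1][1], grid[3][2])  # 72 65
```

Note how positional accuracy in the raster version is limited to the cell size, whereas the vector version retains the exact coordinates.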

Problems and Opportunities

There are a range of problems associated with spatial data analysis such as identifying spatial outliers, edge effects etc. However, there are two that are of particular importance.


  • Spatial Autocorrelation:      A fundamental assumption of statistical analysis is that observations are independently drawn, i.e. the value of one observation does not affect another. In spatial analysis this is hard to assume: everything is related to everything else, but near things are more related than distant things. Data from geographic units are tied together by a variety of factors (contiguity, the social character of an area, different people being in different locations). There can be spillovers from activity in one area to another, and hence independence is violated. This is called spatial autocorrelation. Its main consequence is that the variance of estimators is affected, which in turn affects statistical significance and the construction of confidence intervals. If we ignore positive spatial autocorrelation the standard errors/confidence intervals will be too small, and if we ignore negative autocorrelation they will be too wide. This affects decision rules when hypothesis testing, and can lead to incorrect conclusions.
  • MAUP:         The modifiable areal unit problem is concerned with data that have been aggregated into different zones. This will be examined in detail in forthcoming summaries.
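Spatial autocorrelation is commonly measured with Moran's I. The sketch below computes it directly from the formula for a hypothetical four-area example with a simple contiguity weights matrix (all data invented); positive values indicate that near things take similar values, negative values that they alternate.

```python
import numpy as np

def morans_i(values, W):
    """Moran's I statistic for spatial autocorrelation.
    values: attribute vector; W: spatial weights matrix with
    w_ij > 0 when i and j are neighbours and a zero diagonal."""
    z = values - values.mean()
    n = len(values)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Four areas arranged in a line; contiguity (neighbour) weights:
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

clustered = np.array([10.0, 11.0, 30.0, 31.0])  # near things alike
dispersed = np.array([10.0, 30.0, 10.0, 30.0])  # near things differ

print(morans_i(clustered, W))  # positive: positive autocorrelation
print(morans_i(dispersed, W))  # negative: negative autocorrelation
```

In applied work a library such as PySAL would normally supply the weights matrix and a significance test against the null of spatial randomness, rather than computing the statistic by hand.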