Intro to Spatial Analysis


Notes from Lecture and Chapter 2/3 Fothering 

Spatial Analysis

Quantitative geography (spatial analysis) differs from econometrics inasmuch as it is concerned with data that have a spatial element, i.e. data that combine attribute data with locational data. Its prime use is to generate knowledge about how data are generated in space, and hence it seeks to detect the spatial patterns produced by physical and social phenomena. Visualizing the data can help to detect patterns, and quantitative analysis can then be used to examine the role of randomness in generating those patterns and to test hypotheses about them. All in all, spatial analysis is a testing ground for ideas about spatial processes.

Spatial Data

Spatial data are observations of phenomena that possess a spatial reference. Such data may be captured by digitizing maps, collected by survey, or remotely sensed by satellite (amongst other ways).


  • Spatial Objects:        Spatial objects are of three basic types: points, lines or areas. Essentially they are things that can be represented in two-dimensional space by drawing some kind of shape, e.g. houses, railway lines, county boundaries. They all have a spatial reference that describes the position of the object on the surface of the earth. As well as the spatial reference, the object can be associated with some observable characteristic, such as elevation, test scores in schools, or the amount of output at a factory.
  • Fields:          Fields are used as an alternative to objects. A continuous variable such as elevation or air density is hard to capture with points alone, because its variation is spatially continuous. Spatially continuous variables of this type are called fields. For some very basic fields it may be possible to derive the function that describes the spatial variation, but in practice the function remains unknown for the vast majority of fields. In such cases it is simpler to measure the field in discretized form at regular intervals, so that the observations form a regular lattice or grid (fields may also be measured at irregular intervals, but this is less common). The field is a measure of a variable x that is geographically referenced to a location on the earth’s surface. Fields can be either fixed (non-stochastic), as with elevation, or random, as with income.
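The distinction between objects and fields can be made concrete with a small sketch. This is only an illustration, not a standard data model: the `SpatialObject` class, the `sample_field` helper, the coordinates, the attribute values and the elevation function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Coord = Tuple[float, float]  # (x, y) spatial reference


@dataclass
class SpatialObject:
    """A point, line or area with a spatial reference and attributes."""
    kind: str                      # "point", "line" or "area"
    coords: List[Coord]            # one pair for a point, several otherwise
    attributes: Dict[str, float] = field(default_factory=dict)


# Hypothetical objects of each basic type.
school = SpatialObject("point", [(3.0, 4.0)], {"mean_test_score": 71.5})
railway = SpatialObject("line", [(0.0, 0.0), (5.0, 1.0), (9.0, 1.5)])
county = SpatialObject(
    "area", [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0), (0.0, 0.0)]
)


def sample_field(f: Callable[[float, float], float],
                 x0: float, y0: float, step: float,
                 n_rows: int, n_cols: int) -> List[List[float]]:
    """Measure a continuous field f(x, y) at regular intervals (a lattice)."""
    return [[f(x0 + c * step, y0 + r * step) for c in range(n_cols)]
            for r in range(n_rows)]


def elevation(x: float, y: float) -> float:
    """Made-up elevation surface; real fields rarely have a known function."""
    return 100.0 + 2.0 * x + 0.5 * y


grid = sample_field(elevation, x0=0.0, y0=0.0, step=5.0, n_rows=3, n_cols=4)
```

Here the objects carry their own coordinates, whereas the field is only known through its sampled values on the lattice.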


Even if we have data on the entire population (which in itself would be rare), it remains only a snapshot in time. There is some underlying process that is generating that data set, and to uncover that process we need to conduct statistical analysis. In other words, simply collecting data for every member of the population will not necessarily lead to an understanding of the underlying processes.



In order to conduct analysis we need a consistent means of describing locations on the earth. Latitude and longitude are one such method, latitude being measured north or south from the equator, and longitude being measured east or west of the Greenwich meridian. To calculate distances from latitude and longitude, spherical trigonometry is used. As this can be somewhat cumbersome, it is often easier to ignore the curvature of the earth and consider the data to lie on a flat plane. A Cartesian coordinate system can then be used, and distances can be calculated with Pythagoras’ theorem. This is only appropriate when examining a relatively small area, such that ignoring the curvature does not produce undue distortions. British National Grid is one example of such a system.
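To give a feel for the trade-off described above, the following sketch compares a spherical calculation with a flat-plane approximation. The haversine formula is one common choice of spherical trigonometry; it is not necessarily the one used in the lecture. The function names and the mean earth radius of 6371 km are assumptions made for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_km(lat1, lon1, lat2, lon2):
    """Flat-plane approximation: convert degrees to km, then use Pythagoras."""
    mean_phi = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_phi) * EARTH_RADIUS_KM
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_KM
    return math.hypot(dx, dy)


# Approximate coordinates, for illustration only.
london = (51.5074, -0.1278)
cambridge = (52.2053, 0.1218)
new_york = (40.7128, -74.0060)

short_gap = abs(haversine_km(*london, *cambridge) - planar_km(*london, *cambridge))
long_gap = abs(haversine_km(*london, *new_york) - planar_km(*london, *new_york))
print(short_gap, long_gap)
```

Over the short London to Cambridge hop the two methods should agree closely, while over the transatlantic pair the flat-plane figure drifts noticeably, which is the distortion the text warns about.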


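The distance between two points i and j, with coordinates (x_i, y_i) and (x_j, y_j), can then be calculated thus:

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$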


However, this distance may not be the most meaningful. For a car or pedestrian in a city, for example, the line-of-sight distance fails to take into account buildings and other obstacles that lie on the path between the two points.
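As a rough illustration of the point, the sketch below sets the straight-line (Pythagorean) distance beside a city-block (Manhattan) distance. The Manhattan metric is not mentioned in the notes; it is brought in here only as one simple stand-in for travel along a grid of streets, and the function names are hypothetical.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two (x, y) points on a flat plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """City-block distance: travel is restricted to the x and y directions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean(a, b))  # 5.0
print(manhattan(a, b))  # 7.0
```

The city-block distance is never shorter than the straight-line distance, and real travel distances through a street network would need network data rather than either formula.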


Representing Spatial Data

Spatial data are generally represented in two ways.


  • Vector Model:           In the vector model the physical representation of objects and lines closely matches the conceptual reality. A single point with attributes is represented by an ordered coordinate pair and a vector of attributes. Data is frequently stored within closed polygons (such as boundary lines). This method can also be used to store network lines such as a rail network.
  • Raster Model:           The raster model stores only data values, ordered by column within row or row within column, i.e. as a grid or lattice. The origin of the lattice is also stored, so that each cell can be related back to a geographical location on the surface of the earth. The accuracy of the location depends upon the size of the grid cells. Additionally, only a single attribute value can be stored in each cell, so if two points with different attributes fall in the same cell, some decision has to be made as to which one is stored. Raster data can be discrete (e.g. urban or rural) or continuous (e.g. population density).

Which method is used tends to depend upon the type of data involved. If the data are already in lattice form (such as satellite data) then the raster method is easiest, whereas if positional accuracy is a concern then the vector method is preferable. Raster data are often used to represent fields, but this need not be the case.
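The raster model's reliance on an origin and a cell size, and the clash that arises when two points fall in one cell, can be sketched as follows. The `Raster` class is hypothetical, the origin is assumed to be the lower-left corner, and "last write wins" is just one of many possible rules for resolving a clash.

```python
import math


class Raster:
    """Minimal raster: one value per cell, located via origin and cell size."""

    def __init__(self, x_origin, y_origin, cell_size, n_rows, n_cols, nodata=None):
        self.x_origin = x_origin    # assumed lower-left corner of the lattice
        self.y_origin = y_origin
        self.cell_size = cell_size  # larger cells mean coarser locations
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.cells = [[nodata] * n_cols for _ in range(n_rows)]

    def cell_of(self, x, y):
        """Return the (row, col) of the cell containing the point (x, y)."""
        col = math.floor((x - self.x_origin) / self.cell_size)
        row = math.floor((y - self.y_origin) / self.cell_size)
        if not (0 <= row < self.n_rows and 0 <= col < self.n_cols):
            raise ValueError("point lies outside the lattice")
        return row, col

    def store(self, x, y, value):
        """Store a value; an existing value in the cell is overwritten."""
        row, col = self.cell_of(x, y)
        self.cells[row][col] = value
        return row, col


raster = Raster(x_origin=0.0, y_origin=0.0, cell_size=10.0, n_rows=3, n_cols=3)
first = raster.store(12.0, 3.0, "urban")
second = raster.store(18.0, 9.0, "rural")  # same cell, so "urban" is lost
```

A vector representation would have kept both points with their exact coordinates, which is why it is preferred when positional accuracy matters.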

Problems and Opportunities

There is a range of problems associated with spatial data analysis, such as identifying spatial outliers and dealing with edge effects. However, two are of particular importance.


  • Spatial Autocorrelation:      A fundamental assumption of statistical analysis is that observations are independently drawn, i.e. the value of one observation does not affect another. In spatial analysis this is particularly hard to assume, because everything is related to everything else, but near things are more related to each other than distant things. Data from geographic units are tied together by a variety of factors: contiguity, the social character of the area, and different people being in different locations. There can be spillovers from activity in one area to another, and hence independence is violated. This is called spatial autocorrelation.
  • The main problem with spatial autocorrelation is that the variance of estimators is affected, which in turn affects statistical significance and hence the construction of confidence intervals. If we ignore positive spatial autocorrelation, the standard errors will be too small and the confidence intervals too narrow; if we ignore negative spatial autocorrelation, they will be too wide. This affects the decision rules used in hypothesis testing and can lead to incorrect conclusions.
  • MAUP:         The modifiable areal unit problem is concerned with data that have been aggregated into different zones. This will be examined in detail in forthcoming summaries.
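One way to see positive and negative spatial autocorrelation in numbers is Moran's I. The notes do not name a specific statistic, so Moran's I is brought in here purely as a common illustration. The sketch assumes binary contiguity weights on a simple chain of areas; the function names and the data are made up.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights: w_ij = 1 if j is a neighbour of i."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0      # sum of w_ij * dev_i * dev_j over all neighbour pairs
    weight_sum = 0.0  # sum of all weights
    for i, js in neighbours.items():
        for j in js:
            cross += dev[i] * dev[j]
            weight_sum += 1.0
    return (n / weight_sum) * (cross / sum(d * d for d in dev))


def chain_neighbours(n):
    """Contiguity for n areas in a row: each touches the areas beside it."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


w = chain_neighbours(6)
clustered = morans_i([1, 1, 1, 5, 5, 5], w)    # similar values sit together
alternating = morans_i([1, 5, 1, 5, 1, 5], w)  # neighbours are dissimilar
print(clustered, alternating)
```

The clustered pattern gives a positive value (about 0.6 here) and the alternating pattern a negative one (about -1.0). Testing whether such a value differs from what randomness alone would produce requires a further inferential step that this sketch omits.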