Monthly Archives: April 2012



I.W.H. Parry, M. Walls & W. Harrington

Journal of Economic Literature Vol. XLV (2007) pp. 373-399 

In a Nutshell

This is a wide-ranging and very detailed paper that describes a host of negative externalities imposed by drivers on the road, and the policy responses that seek to address them. The main conclusion is that for the most persistent externalities, which cause gridlock in so many places, the best hope is offered by congestion charging via electronic road pricing. To improve highway safety, some mileage-based insurance policies should be considered. Whilst it is unlikely that these measures will internalize global-warming-type externalities, it is argued that doing so through transport policy would not be the most efficient approach. To reduce carbon emissions some form of carbon trading is preferable, or a tax on all oil products, not just on gasoline. These measures could be buttressed by R&D into new technologies to aid the shift away from fossil fuels.


A Typology of Externalities

Local Air Pollution: gasoline vehicles emit harmful pollutants into the atmosphere which can damage the health of those in the vicinity. These pollutants can be reduced by reducing VKT and by lowering emissions through technology. A fuel tax may achieve the former, but it cannot achieve the latter and so should not be pursued in isolation. In fact emissions have fallen dramatically due to progressively stringent emissions standards. A willingness-to-pay estimate for the cost of avoiding poor health puts the value of the externality at 2.3 cents per mile.


Global Air Pollution: cars etc. account for 20% of nationwide carbon emissions. A fuel tax is effectively a tax on carbon emissions, as there are no available technologies for reducing carbon emissions from light vehicles. Various estimates of the cost of a rise in temperature have been proffered, and they differ wildly, largely due to the different social discount rates used. They yield estimates of 5, 12 and 72 cents per gallon of gasoline.


Oil Dependency: dependence exposes the country to volatility and price manipulation.  Like the vulnerability to oil price volatility and the cost of military presence in the Middle East, this externality is pretty murky and ill defined, and is not clearly related to market failure. See the paper for more details.


Congestion: I do not go again into how congestion is related to externalities. Estimates of the externality imposed in terms of reduced speed indicate a tax of $1.05 per gallon. However, if this were applied to fuel the effect may not be as intended, as peak driving decisions are often much more inelastic with respect to price – people have to get to work. Thus such a fuel tax would probably only have an effect on the least congested roads.


Traffic Accidents: there are around 40,000 deaths per year on US highways. What is needed is a tax on VMT reflecting differences in marginal costs across drivers, vehicles and regions. Using quality-adjusted life year estimates, the value appears to be around 15 cents per mile.


Noise Costs: are estimated at 0.4 cents per mile for passenger vehicles.


Others considered are highway maintenance, urban sprawl, parking subsidies etc. 


Fuel Tax: fuel taxes have declined in real terms as they have not kept up with inflation and improved fuel economy. Behaviour changes in response to fuel prices, with the elasticity of VMT with respect to the fuel price between -0.1 and -0.6. This could thus be an avenue for policy. However, there are equity issues (fuel is a proportionately larger part of the poor's budget, so the tax is thought to be regressive), although these could be partially solved by recycling the tax dollars into pro-poor policy. There are also political issues with a strong auto/oil lobby.
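As a rough numerical illustration of what an elasticity in this range implies (the baseline figures below are hypothetical, not from the paper):

```python
# Back-of-the-envelope effect of a fuel price rise on vehicle miles
# traveled (VMT), using the elasticity range quoted above.
# The 10% price rise is a made-up illustration.

def vmt_change(price_rise_pct, elasticity):
    """Percentage change in VMT for a given percentage fuel price rise."""
    return elasticity * price_rise_pct

price_rise = 10.0  # a 10% increase in fuel prices
for e in (-0.1, -0.6):
    print(f"elasticity {e}: VMT changes by {vmt_change(price_rise, e):.1f}%")
# A 10% price rise cuts driving by somewhere between 1% and 6%.
```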


Fuel Economy Standards: economy standards reduce emissions and dependence although they may increase other externalities as people are encouraged to drive more VKTs.


Alternative fuel technologies are another possible response.


Congestion Tolls: This is attractive. Building new roads is now hard (given high levels of existing development) and may not even be efficient. New income is needed for highway maintenance as fuel tax revenues have fallen. Congestion tolls can now be collected electronically, which reduces bottlenecks at toll booths. On the other hand, it may be hard to make meaningful estimates of what the marginal pricing structure should be, and it may represent a substantial information barrier that the consumer is not able to react to efficiently. There are political problems too.


G. Duranton & M.A. Turner

NBER Working Paper 15376

Principal Research Question and Key Result Does increasing the size of the interstate highway system relieve congestion? The key finding is that the elasticity of vehicle kilometres traveled to highway lane kilometres is almost 1 across all specifications which indicates that the amount of traffic increases proportionately with the size of the highway network. In other words, building roads is not a good means of reducing congestion.


Theory When deciding whether to enter the road system a driver assesses the marginal benefit of driving an extra kilometre (which is assumed to be a decreasing function), and the marginal cost (including his time, fuel etc.) of driving that kilometre. He drives until the marginal benefit equals the marginal cost. However, the marginal social cost of driving that kilometre is higher than the marginal private cost, because his presence on the road adds to congestion in general and so imposes an externality on other drivers. Thus the socially optimal amount of driving is lower than the private equilibrium amount. Transport policy can intervene to better equate the social and private costs so that the social optimum is reached.
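This wedge between the private and social equilibrium can be shown with a stylised linear example (the functional forms and numbers here are my own illustration, not from the paper): with marginal benefit MB(q) = a − bq, constant marginal private cost c, and a congestion externality e per kilometre, the driver picks q where MB equals c, while the planner would pick the lower q where MB equals c + e.

```python
# Stylised congestion externality: linear marginal benefit, constant
# marginal private cost, plus a per-km cost imposed on other drivers.
# All parameter values are illustrative only.

def equilibrium_km(a, b, marginal_cost):
    """Kilometres driven where marginal benefit a - b*q equals marginal cost."""
    return (a - marginal_cost) / b

a, b = 10.0, 0.5   # MB(q) = 10 - 0.5q
mpc = 2.0          # private cost per km (time, fuel)
ext = 1.0          # congestion cost imposed on others per km

q_private = equilibrium_km(a, b, mpc)        # driver ignores the externality
q_social = equilibrium_km(a, b, mpc + ext)   # planner prices it in

print(q_private, q_social)  # 16.0 14.0 -- the private choice overshoots
```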

This can occur in a variety of ways. Fuel tax could be used, for example. However, this has been shown to affect largely leisure trips and not the travel-to-work trips (which are presumably less elastic with respect to price) that are the main cause of congestion. Another option would be to charge per metre of road used, with different pricing mechanisms for the time of day and the amount of traffic. This would be hard to implement, and it would also be hard for an individual to respond rationally to a complex and changing charging mechanism. Congestion charging is a limited form of this, a point that will be returned to in later summaries.

The option under examination here is to build more roads. As this increases capacity on the road network it should reduce the negative externality each driver imposes on others, moving the marginal social cost nearer to the marginal private cost and bringing congestion nearer to the socially optimal level. However, if more roads simply attract more drivers, then all equilibria are shifted outwards, and the result will be no nearer to the social optimum than the previous equilibrium.


Motivation The cost of congestion is huge. Between 1995 and 2001 the time spent on household travel increased 10% whilst distances remained constant, which is equivalent to billions of dollars’ worth of lost time.


Data Using Metropolitan Statistical Areas (MSA), they use official highway data to generate variables detailing lane kilometres, vehicle kilometres traveled (VKT), and average annual daily traffic (AADT). They then do the same for other major roads. The summary stats show that AADT increased from 4,832 vehicles per lane kilometre of highway in 1983 to 9,361 in 2003. They have three cross sections of data.


Strategy There are a variety of strategies. Firstly they pool the cross sections and run a simple OLS regression with VKT on the left-hand side and lane kilometres on the right, with geographic, climatic, socio-economic and population controls.

Using the panel format they then control for fixed effects, and time fixed effects by differencing the data.

They recognize that there could be endogeneity issues. Specifically, if VKT is correlated with some unobserved demand for driving, and planners respond to that demand by building roads, then the coefficients will be overestimated: the increase in vehicle kilometres traveled will be due to the demand for driving, not a consequence of the road building. Thus they have an IV strategy.

IV1: Planned highway kilometres from the 1947 highway plan. This was a plan to connect the major population centres as directly as possible. Clearly this will be very relevant, and they argue it is exogenous as the plan was drawn up to connect the population centres of the time, without a thought for future traffic demand. The instrument is only claimed to be exogenous conditional on population, i.e. the exclusion restriction is assumed to hold once population is controlled for.

IV2: Rail network in 1898. Railroads connected a lot of cities and towns in the 19th century, and as the importance of railways waned, roads were built that followed their routes as substitutes. Given that the economy was very different when the railroads were constructed, and that they were built primarily by private companies concerned with relatively short-term gain, it is unlikely that they were laid out with future traffic flows in mind, and this, they argue, adds credibility to the exogeneity argument. They claim the instrument need only be exogenous conditional on the controls, so controlling for historical populations and geographic variables is sufficient to guarantee exogeneity. (check this)

IV3: Expedition routes between 1835 and 1850. Again they control for historical populations and geography and say it is hard to imagine how the explorers were selecting routes with travel between future cities in mind [I don’t see how that is the point particularly].

They then instrument for lane kilometres using all instruments (though they also test them separately). As the F-stat in the first stage is less than 10, they also perform a LIML estimation, which is supposedly more robust to weak instrument problems.
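The two-stage least squares logic here – instrumenting road supply with a planned-kilometres variable that shifts supply but is unrelated to the demand shock – can be sketched on simulated data (a minimal sketch; all data and variable names below are made up, not the paper's):

```python
import numpy as np

# 2SLS on simulated data. lane_km is endogenous (correlated with an
# unobserved demand shock); planned_km plays the role of the 1947 plan.
rng = np.random.default_rng(0)
n = 1000
planned_km = rng.normal(size=n)              # instrument
demand_shock = rng.normal(size=n)            # unobserved demand for driving
lane_km = planned_km + 0.5 * demand_shock + rng.normal(size=n)
vkt = 1.0 * lane_km + demand_shock + rng.normal(size=n)  # true elasticity = 1

X = np.column_stack([np.ones(n), lane_km])
Z = np.column_stack([np.ones(n), planned_km])

# OLS is biased upwards because lane_km is correlated with the shock.
beta_ols = np.linalg.lstsq(X, vkt, rcond=None)[0]

# 2SLS: project X on Z, then regress vkt on the fitted values.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_iv = np.linalg.lstsq(X_hat, vkt, rcond=None)[0]

print(beta_ols[1], beta_iv[1])  # OLS overstates the slope; IV recovers ~1
```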

Results They have a coefficient of around 1 in all specifications (adding controls one at a time). This is the case when instruments are used one at a time, and also when amalgamated into a single first stage.

It appears then that new road capacity is met with a proportional increase in driving, thus confirming what Downs called the fundamental law of road congestion.


Robustness That the coefficients are robust to a wide variety of specifications is fairly good evidence that the results are not being driven by the nature of the model. (a more pessimistic interpretation would be that all specifications are affected equally by endogeneity).

They use data on availability of public transport and find that increasing public transport does not affect congestion. This is because public transport may take some people off the road, but as that effectively increases road capacity in a similar way as building new roads, the VKT demand response is the same.

Using data on what type of vehicles are using the road network over time they try to decompose VKT to understand where the extra demand is coming from. They find that commercial traffic accounts for between 10-20% of the increased VKT. Individuals account for around 11-45% of the increase. Population is thought to increase with new highways as economic activity increases. They find that a 10% increase in the road network causes a 1.3% increase in population in an MSA over 10 years, and this accounts for around 5-15% of the extra VKT. Another mechanism could be diversion from other roads to the highways, but when they test this they find only very small effects, suggesting that traffic creation, not diversion, is the problem (the test regresses highway VKT on non-highway lane measures).


Problems Bad controls – including socioeconomic controls could be dangerous as they are direct outcomes of the independent variable (kms of highway). Introducing outcome measures as additional controls biases estimates in indeterminate ways. Some comfort is taken from the fact that the results do not change significantly.

IV – exogeneity concerns remain particularly for the 1947 highway plan. Comfort is taken from the fact that the results are broadly the same across all specifications.

Weak Instruments – the instruments are weak, and it is not totally clear that the LIML estimation solves this problem. Again, it is comforting that all estimates are broadly similar across specifications.


Implications It appears that new road capacity is met with a proportional increase in driving, confirming what Downs called the fundamental law of road congestion.

Public transport probably will not affect congestion levels. They do back of the envelope welfare calculations and find that the time saved by new highways is probably not worth the cost, whereas improvements in public transport are most likely to be welfare improving.



IMD and Mayoral Election Maps

The following is a visualization of the Index of Multiple Deprivation by LSOA for London. The higher the IMD score, the more deprived the area: light blue is the least deprived decile, and dark pink is the most deprived decile…

The second map represents the 2008 London Mayoral Elections. The darker the red the greater the Labour (Livingston) Majority in that winning seat. The darker the blue the greater the Conservative (Johnson) Majority in that winning seat.



O. Dube & J. Vargas

Mimeo MYC (2011)

Principal Research Question and Key Result How do income shocks affect conflict in Colombia? Specifically they examine how conflict is affected by different types of shock. The key result is that negative price shocks in labour-intensive industries such as agriculture increase the incidence of conflict in areas more dependent on that industry. Positive price shocks in capital-intensive industries, on the other hand, increase the incidence of conflict in regions more dependent on that type of industry (i.e. the effects run in opposite directions).


Theory Two theoretical mechanisms linking price shocks and conflict are identified.

  1. Opportunity cost: a rise in workers' wages (due to price shocks) increases the opportunity cost of participating in conflict if workers decide between working in agriculture or receiving the wages paid by paramilitary-type groups. This means that in labour-intensive industries (e.g. agriculture) a fall in wages may incentivize some workers to move into conflict participation. Thus areas that are more dependent on agriculture will see a differential rise in the incidence of violence compared with regions that are less dependent.
  2. Rapacity: a rise in the price of commodities produced using capital-intensive methods (such as oil) increases the amount of contestable wealth within a region and thus the returns to predation/conflict rise. This predicts that a positive oil price shock will be associated with a rise in violence in areas that are more reliant on the oil industry than on labour-intensive industries.


Motivation Other papers have looked at income shocks on conflict and found positive results. In the Miguel paper (to be summarized shortly) he instruments for GDP using rainfall, and shows that negative shocks to income affect the incidence of violence. This paper aims to go one further by identifying different channels through which income can affect conflict.

The use of panel data is important here. Cross-country data do not allow controlling for fixed/time effects, and data may not be comparable across countries. However, even within-country panel data cannot take account of time-varying, region-specific effects, and the results are less easily generalizable than cross-country ones (lower external validity).


  • Data are from Colombia: 21,000 war-related episodes in 950 municipalities from 1988 to 2005. The conflict data are separated into guerilla attacks, paramilitary attacks, clashes and casualties.
  • Commodity intensity is drawn from land use survey 1997 for coffee production. For oil the data are barrels per day in 1988 and the length of pipeline in 2000. Prices are taken from international statistics, and internal statistics.
  • Colombia makes a good case study as data are available in panel format, and there is lots of variation in violence experienced. Additionally, oil and coffee are different types of industry and both are major contributors to national income.


Strategy In order to get at the two theoretical channels they look at two commodities, coffee (which experienced a major price slump in the period – thus affecting wages) and oil (which experienced a major price rise – thus affecting contestable income as municipalities that produce and transport oil receive royalties from its sale). They interact the price of those commodities with a measure of intensity at the municipal level in order to get at the differential effects of the shocks by municipality.

yjt = αj + βt + γ(OilProdj * OPt) + δ(OilPipej * OPt) + θ(Cofj * CPt) + λXjt + εjt


yjt is the conflict outcome in municipality j at time t. α and β are municipality and time fixed effects. OilProd measures oil production in municipality j in 1988 and OilPipe is the length of pipeline in municipality j in 2000. OP is the natural log of the international oil price. Cof is the number of hectares devoted to coffee production in 1997, and CP is the natural log of the internal coffee price. X is a vector of control variables. γ and δ capture the extent to which the oil price induces a differential change in violence in municipalities that produce or transport oil more intensively, and θ shows the extent to which coffee price shocks affect violence differentially in coffee-producing regions.

Whilst the oil price may plausibly be exogenous (as Colombia only produces 1% of world oil so is a price taker), the coffee price is unlikely to be so, given that Colombia is a major exporter. If violence restricted production and this increased the price, or similar, then this could confound the results. So they instrument for the coffee price using the log of foreign coffee exports. This seems a fairly plausible strategy.


However, endogeneity concerns still exist as the level of coffee produced is not fixed, and farmers may substitute into and out of production based on prices, conflict etc. Thus they instrument for (Cofj * CPt) using a composite instrument constructed using data on rainfall, temperature and slope of the land, which determine the possibility to produce coffee in a given municipality. They use a topographical instrument to determine exogenous possibilities for a municipality hosting an oil pipeline.


Results θ is negative and significant for all types of violence, indicating that as coffee prices fell, coffee-intensive municipalities witnessed a differential increase in all types of violence. γ and δ are positive but only significant for paramilitary attacks. The instrumental variable results are similar (although less significant for coffee), but the signs and significance of the oil-related coefficients vary considerably, which may be an indication of a more spurious link between the oil price and conflict.


  • They present graphs of attacks by two types of region, those that are coffee intensive, and those that are oil intensive. Before the years of the price shocks, the different types of violence seem to be moving more or less parallel to each other in the coffee/non-coffee municipalities, and then there is divergence, which lends some credibility to the results. There is some parallelism in the oil/non-oil municipalities, but to a far lesser degree, and this is less convincing.
  • They include wage levels as the dependent variable to show that price changes of coffee (but not oil) did affect agricultural wages. They do the same for local gov revenue which was affected by oil price changes but not coffee price changes. This means there are probably no spillover effects between the industries.
  • They show similar effects for other agricultural commodities (although with a more limited data set) which adds credibility to the opportunity cost story. They do the same for gold and coal and the results are similar but more mixed in terms of which types of violence are affected. As there is little theoretical justification for these different effects the results are hard to interpret.
  • As coca production could be a substitute for coffee production, and could attract more violence, then there could be bias in the estimates. To counter this notion, they include area under coca cultivation as the dependent variable and find no significant link between coffee price and coca cultivation. They also remove all coca producing municipalities from the sample and still find similar effects.


Problems Whilst they deal with the endogeneity between price and conflict quite convincingly, serious endogeneity concerns remain. In particular there could be variables that determine both which crops are produced and the potential for conflict. For example, local institutions may mediate conflict and provide a stable investment environment for the introduction of certain crops such as coffee. Geographical/topographical conditions may also determine both the choice of crop and the potential for violence. The instrumentation strategy they use is not convincing: if geographical factors are the source of the endogeneity concern, they cannot then be used as instruments, as the exclusion restriction is violated (in particular, slope has been shown to influence conflict, and it also determines agricultural production possibilities). The same goes for the pipeline instrument.

The F-stat is 5.5 in the first stage, so the instruments are weak, and as there are quite a few of them this compounds the bias problems associated with weak instruments. Lastly, the construction of the instruments is extremely opaque – there is no discussion of why they are constructed as they are, and it is therefore not clear for what subset of the sample the LATE is being estimated. Generally there is no discussion of the exclusion restriction; the results without instrumentation may be subject to endogeneity bias, and the results with instrumentation should be taken with a big pinch of salt due to the violated exclusion restriction and the weak instruments.

There is no clear reason why oil prices should only increase paramilitary attacks. The authors state that oil production is densely located and thus there is only room for one type of criminal organization to operate. This anecdotal evidence is not empirically supported, and it remains unclear why casualties should not be significant if attacks increase.

Since they have wage data at the municipal level (even if only from individual-level data) it is unclear why they do not test the wage channel directly. They show that wages were affected by the fall in price, and that the fall in price is linked to increased violence, but there is no necessary causal link between these two pieces of evidence. They could have instrumented wages using the export values of foreign producers (or similar) and then used the predicted values to estimate the effect on conflict. There would have been some level of aggregation, and there is a chance the wage sample is not representative, but it would have been useful to include some discussion.


Implications This is an interesting paper that advances the theory linking income shocks and conflict. However, the identification strategy is not very clean, especially in relation to the use of geographic instruments, which casts doubt on the results. In general the opportunity cost story seems better supported than the rapacity story.

If the opportunity cost story is believed, then support for those dependent upon agriculture in times of falling prices could be a policy used to prevent the outbreak or perpetuation of armed conflict.





An Investigation into the Rail Network and Social Exclusion



In recent years it has been recognized that the public transport opportunities that accrue to individuals may play a part in determining their level of social exclusion. In particular, Tony Blair’s Social Exclusion Unit was specifically tasked with examining the linkages between social exclusion, transport, and the location of services with a particular emphasis on “opportunities that have the most impact on life-chances, such as work, learning and healthcare.”[1] A number of researchers have proposed methods for evaluating links between exclusion and transport, generally focusing on accessibility to services. This short paper aims to add to this growing literature by investigating the links between access to the rail network in England, and measures of social exclusion. I use spatially referenced data on the locations of train stations to construct an accessibility measure based on distance to the rail network and use regression techniques to investigate the effect that this measure has on social exclusion.

 The results of this analysis are somewhat surprising in that I find that a higher value of the accessibility measure (meaning that the rail network is further away) is associated with lower levels of social exclusion. This result is robust to a number of specifications and controls for the availability of other types of transport as well as controls for housing and environmental quality. The literature on exclusion and transportation does not offer a theoretical justification for why this should be so. As such, even in the event that the model is correctly specified and takes into account all relevant variables, which is highly unlikely, it would be unwise to conclude that there is any causal mechanism at work.

 Interestingly, higher values of an accessibility measure to the bus and coach network are associated with higher levels of social exclusion. This may be tentative evidence that the public transport network that is most pertinent for social inclusion is the bus/coach network, although further research would be needed to substantiate such a claim.

 The most that can be concluded is that with the data that has been made available for this analysis I am unable to uncover a meaningful link between access to the rail network and levels of social exclusion. This could be due to pertinent variables being omitted from the analysis or that the accessibility measure based on distance is not capturing what is important about access to public transport networks.

[1] Making the Connections: Final Report on Transport and Social Exclusion, Social Exclusion Unit (2003), p. 1


My Maps

I recently completed an investigation into the links between social exclusion and access to the rail network in England. The full report (including maps) is posted in pdf format above, but I have extracted the main maps, as they are colourful and awesome, and I’m dead proud of myself!

Index of Multiple Deprivation 2010


This map shows the English Index of Multiple Deprivation. Low scores (light blue) indicate low deprivation, and high scores (pink) indicate high levels of deprivation. The colour gradations are deciles, so light blue is the least deprived 10% of regions, and pink is the most deprived 10% of regions.

Distance to Rail Network

 The above map indicates how far a region is from its nearest train station. The coloured gradients represent the distance in metres from the region to the closest mainline rail station.



T. Lyytikainen SERC Discussion Paper 82

A Brief Summary 

This paper uses spatial instrumental variables in order to estimate the neighbourhood effects of tax rate setting. Using the IV approach the empirical results suggest that there is no significant interaction in tax rate choices among Finnish municipalities.


The Finnish case was chosen because in 2000 there was a reform to the system. Tax rates on property were previously selected by municipalities from a band of possible rates set by the government. In 2000 the lower limit was raised by the government from 0.2% to 0.5%. This new lower limit was binding for 40% of the municipalities.


He discusses the spatial lag and spatial instrumental variables models that are generally used. The method ultimately used is similar to the traditional spatial IV technique with one key difference: it is the policy intervention that is used as an instrument rather than a higher-order weighted average.


The actual imposed tax rate changes are not observable, as there is no information about what rates would have been chosen had the lower bound not been altered. However, he constructs a measure of the predicted imposed increase in tax rates that looks like this:


Zi2000 = D(T2000 > Ti1998)(T2000 – Ti1998)


Where T2000 is the new lower limit imposed in 2000, Ti1998 is municipality i's tax rate in 1998, and D(·) is a dummy variable equal to 1 if the municipality's 1998 tax rate was below the new lower limit. Zi2000 is thus the increase in the tax rate forced on municipality i by the reform. He then uses this as an instrument for the spatially lagged tax rate change. He says the instrument is relevant in the first stage (but does not report it), and conducts a placebo test.


The weighting system is nearest neighbour, but for robustness purposes, population weighting and a combination weighting scheme are also used.


The results are that although the coefficient on neighbour’s tax rates is positive, it is very small, and also statistically insignificant. This result is robust to different weighting systems.

Interestingly, he tests the data using the SAR and general spatial IV methods and finds strong and highly significant coefficients, which casts doubt upon their reliability, given that the IV method he uses should be stronger due to diminished endogeneity concerns.



J.K. Brueckner & L.A. Saavedra, National Tax Journal Vol 56, No. 2

A Brief Summary 

In a Nutshell

The authors use city level data from the US to estimate a model of strategic tax competition, and specifically the tax reaction function. They find that this function has a non-zero slope, which indicates that changes in a local competitor's rates affect the choices made by a different community.


The data are drawn from a sample of 70 cities that comprise the Boston Metropolitan area. Working under the assumption that community i‘s tax decision is a function of the tax rates in other communities they use a SAR model of weighted averages of neighbouring jurisdictions as a spatial lag. To check their results are not driven by the weighting scheme used (as it is arbitrary), they test different weighting schemes as part of their robustness checks (contiguous neighbour, distance decay, population weighting, and combinations thereof).

They are aware of the simultaneity problem and the bias it would introduce under OLS, so their estimations are made using maximum likelihood.

The principal finding was that the coefficient on the spatial lag was positive and significant, and this was robust to the different weighting measures. This implies that for the period, strategic tax rate setting occurred, and the best response of a community faced with increased rates in a neighbouring community was to raise its own rates. In game theory terms, tax rates are strategic complements.



S. Gibbons & H. Overman

A Summary with some additions from the lecture and Under the Hood Issues in the Specification and Interpretation of Spatial Regression Models, L. Anselin, Agricultural Economics 27 (2002) 

Spatial Models and Their Motivation

The inclusion of spatial effects is typically motivated on theoretical grounds that there is some spatial/social interaction that is affecting economic outcomes. Evidence of this will be spatial interdependence. Thus models are created that seek to answer how interaction between agents can lead to emergent collective behaviour and aggregate patterns. These might also be termed neighbourhood effects.

To start with a basic linear regression:


yi = x’iβ + µi


Where x is a vector of explanatory variables, β is a vector of parameters, and µ is the error term. This basic format assumes that each observation is independent of the others. This is generally too strong in a spatial context, as events in one place often affect events in another, particularly if the places are close to each other. A simple way of capturing the effects that nearby observations have on each other is to define a weights vector wi which reflects how strongly other observations affect observation i (for example through distance weighting). Multiplying this by y gives the scalar w’iy, which for observation i is a linear combination of all the y values to which it is connected. If the weights sum to 1 then this gives a weighted average of the neighbours of i.
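The weighted-average construction just described can be sketched in a few lines. This is a minimal illustration (pure Python; the coordinates, outcome values, and inverse-distance scheme are all invented for the example):

```python
import math

coords = [(0, 0), (1, 0), (0, 1), (2, 2)]   # four hypothetical locations
y = [10.0, 12.0, 11.0, 20.0]                # outcome observed at each location

n = len(coords)
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if i != j:
            W[i][j] = 1.0 / math.dist(coords[i], coords[j])  # inverse distance
    row_sum = sum(W[i])
    W[i] = [w / row_sum for w in W[i]]      # row-standardise: weights sum to 1

# The spatial lag w'y: for each i, a weighted average of the other outcomes
Wy = [sum(W[i][j] * y[j] for j in range(n)) for i in range(n)]
```

Each element of Wy is a weighted average of the other observations' outcomes, which is exactly the spatial lag used in the models below.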


Spatial Autoregressive Model

This weighted average can then be used to construct the spatial autoregressive model (SAR) which is also known as the “spatial y” model, and is referred to as a spatial lag. This model attempts to uncover the spatial reaction function, or spillover effect. The model looks like this:


yi = ρw’iy + x’iβ + µi


The idea is that an individual observation is affected both by its own characteristics and by the recent outcomes of nearby agents capable of influencing its behaviour. One example: when deciding at what price to sell one’s house, individual characteristics such as the number of bedrooms are taken into account, as well as the prices recently achieved by other properties in the vicinity. In this case the Beta captures the effects of the individual characteristics and the Rho captures the causal effect of neighbourhood actions.


Spatial X Model

Alternatively we may drop the assumption that yi  is affected by neighbouring y outcomes, and instead assume that it is affected by spatial lags of the observable characteristics. This is then a spatial x model (SLX):


yi = x’iβ + w’iXγ + µi


This assumes that the observable neighbourhood characteristics are determinants of yi. As in the above example this could be the characteristics of neighbourhood housing such as appearance, size etc. influencing individual price decisions. Beta is as above, and Gamma is the causal effect of neighbourhood characteristics.


Spatial Durbin Model

The spatial Durbin model (SD) combines SAR and SLX:


yi = ρw’iy + x’iβ + w’iXγ + µi


Interpretation combines the two models above: Rho captures the effect of neighbourhood outcomes and Gamma the effect of neighbourhood characteristics.


Spatial Error Model

This model drops the assumption that outcomes are explained by spatial lags of the outcome or the explanatory variables, and instead assumes a SAR-type autocorrelation in the error process. This yields:


yi = x’iβ + µi ; µi = ρw’iµ + vi


This model assumes that outcomes are dependent upon the unobservable characteristics of the neighbours.



OLS with a lagged y variable (SD and SAR) yields inconsistent estimates unless Rho equals 0. This is because w’iy is correlated with the error term: the average neighbouring dependent variable includes the neighbour’s error term, the neighbour’s neighbour’s error term, and so on, such that any observation i depends to some extent on the error terms of all the other observations. Note that restricting the weights to only the nearest neighbour does not remove the problem, because the feedback remains. The intuition is that you are your neighbour’s neighbour. In the simple i-j case the following occurs:


yi = ρyj + xiβ + εi (1)

yj = ρyi + xjβ + εj (2)


Substituting (2) into (1) we get:


yi = ρ(ρyi + xjβ + εj) + xiβ + εi


Solving for yi gives:


yi = (xiβ + ρxjβ + εi + ρεj) / (1 − ρ2)


So yi depends in part upon its own error term, and by symmetry yj depends upon εi. The regressor yj in (1) is therefore correlated with the error term, which is why OLS is inconsistent.


Using OLS for the SLX model is also problematic, as OLS assumes that the error term is not correlated with the regressors. For the SLX model this means that E(ε| x) = 0 and E(ε| Wx) = 0. However, if there is spatial sorting, for example when motivated parents locate themselves near to good schools, then this assumption is violated as E(ε| Wx) ≠ 0.


The SE model may generate consistent estimates, as the assumption that the error is uncorrelated with the regressors holds; however, the standard errors will be inconsistent because, by definition, the model has autocorrelated error terms. This can lead to mistaken inferences.


Standard errors are inconsistently estimated for all models.


Additionally, the different types of model are difficult to distinguish without assuming prior knowledge of the data generating process which in practice we do not have.



Maximum Likelihood

These problems can be circumvented using Maximum Likelihood estimation, which provides consistent estimators. The likelihood is the probability of observing the data y given values for the parameters Rho and Beta; a computer uses iterative numerical maximization techniques to find the parameter values that maximize the likelihood function.
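The procedure can be sketched as follows. This toy example (pure Python, simulated data) profiles out Beta and the error variance and grid-searches Rho over the concentrated log-likelihood; the weighting scheme (a line of locations), sample size, and parameter values are all invented, and a real application would use a proper optimizer rather than a grid:

```python
import math, random

random.seed(0)
n = 50

# Row-standardised contiguity weights on a line of n locations (invented scheme)
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
    for j in nbrs:
        W[i][j] = 1.0 / len(nbrs)

def mat_vec(M, v):
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def log_det_I_minus_rho_W(rho):
    """ln|I - rho*W| by Gaussian elimination with partial pivoting."""
    A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)]
         for i in range(n)]
    ld = 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        if p != k:
            A[k], A[p] = A[p], A[k]
        ld += math.log(abs(A[k][k]))
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
    return ld

# Simulate y = rho0*Wy + x*beta0 + eps by fixed-point iteration (this converges
# because |rho0| times the spectral radius of the row-standardised W is < 1)
rho0, beta0 = 0.5, 2.0
x = [random.gauss(0, 1) for _ in range(n)]
base = [beta0 * xi + random.gauss(0, 0.5) for xi in x]
y = base[:]
for _ in range(200):
    y = [rho0 * wyi + bi for wyi, bi in zip(mat_vec(W, y), base)]

def concentrated_loglik(rho):
    """Profile out beta and sigma^2, leaving a function of rho alone."""
    wy = mat_vec(W, y)
    z = [yi - rho * wyi for yi, wyi in zip(y, wy)]
    beta = sum(xi * zi for xi, zi in zip(x, z)) / sum(xi * xi for xi in x)
    e = [zi - beta * xi for zi, xi in zip(z, x)]
    sigma2 = sum(ei * ei for ei in e) / n
    return log_det_I_minus_rho_W(rho) - 0.5 * n * math.log(sigma2)

grid = [i / 100 for i in range(-90, 91, 5)]
rho_hat = max(grid, key=concentrated_loglik)   # crude grid search over rho
```

The ln|I − ρW| term is what distinguishes the SAR likelihood from a simple least-squares criterion: it accounts for the feedback between observations that makes OLS inconsistent.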


The issue with this specification is that it assumes that the spatial econometric model estimated is the true data generating process. This is an incredibly strong assumption that is unlikely to hold in any circumstance.


Instrumental Variables

In theory a second-order spatial lag w2’xi (or even third or fourth order) can be used as an instrument for w’iy, and this “exogenous” variation in the neighbourhood outcome then used to determine yi, under the assumption that the instruments are correlated with Wy but not directly with yi. The first stage would look like this:


wy = w’xβ + ρw2xβ + ρ2w3xβ… 

and then the predicted values of wy would be used in the second stage regression with yi as the dependent variable.
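A stylised sketch of the two stages (pure Python, simulated data; the line-of-locations weighting scheme and all parameter values are invented, and only the single instrument Wx is used rather than the full set of higher-order lags):

```python
import random

random.seed(1)
n = 200

# Row-standardised contiguity weights on a line of n locations (invented scheme)
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
    for j in nbrs:
        W[i][j] = 1.0 / len(nbrs)

def lag(v):
    """Spatial lag Wv."""
    return [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]

# Simulate y = rho0*Wy + x*beta0 + eps by fixed-point iteration
rho0, beta0 = 0.5, 2.0
x = [random.gauss(0, 1) for _ in range(n)]
base = [beta0 * xi + random.gauss(0, 0.5) for xi in x]
y = base[:]
for _ in range(100):
    y = [rho0 * wyi + bi for wyi, bi in zip(lag(y), base)]

def ols2(u, v, target):
    """OLS of target on two regressors (no intercept), via Cramer's rule."""
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    sut = sum(a * t for a, t in zip(u, target))
    svt = sum(b * t for b, t in zip(v, target))
    d = suu * svv - suv * suv
    return (sut * svv - svt * suv) / d, (svt * suu - sut * suv) / d

wy, wx = lag(y), lag(x)

# First stage: regress the endogenous Wy on the exogenous x and instrument Wx
a, b = ols2(x, wx, wy)
wy_hat = [a * xi + b * wxi for xi, wxi in zip(x, wx)]

# Second stage: regress y on the fitted Wy and x
rho_hat, beta_hat = ols2(wy_hat, x, y)
```

Adding the higher-order lags W2x, W3x as instruments, as described above, would simply add columns to the first stage.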


There are also problems with this technique. Firstly, it is unlikely that the true nature of w is known, and its correct specification is crucial to the model: for example, the x variables may have an effect over a 5km distance, but the weighting system may incorrectly restrict analysis to 2km. Secondly, the higher-order lags of the x variables could still have a direct effect upon yi, in which case the exogeneity restriction is violated and the 2SLS results are biased. Lastly, the different spatial lags are likely to be highly correlated, leaving little independent variation; this is essentially a weak instruments problem. Weak instruments can severely bias second-stage coefficients, which will additionally be measured imprecisely.


The Way Forward

  • Panel data can allow for differencing over time to control for fixed effects, but the problems described above remain, only now in the context of differenced data.
  • In terms of the IV strategies, genuinely exogenous instruments should be found such as changes to institutional rules [see later tax paper summary].
  • They argue that the SAR model should be dropped, and if neighbourhood effects cannot be identified using genuine instruments, a reduced form of the SLX model should be used.
  • Natural experimental techniques from other economic literatures should also be borrowed e.g. DID, Matching. These techniques may help us to find causal effects but the tradeoff is that they are only relevant to some sub-set of the population (as in the Local Average Treatment Effect for IVs).




E.L. Glaeser and J.A. Scheinkman

A Brief Summary


Identifying Social Interactions

Inequality, concentrations of poverty and other outcomes may be the partial result of social interactions. Thus interventions that seek to address these phenomena may operate both through their incentives upon individual actors and through these social interactions. However, as policies are generally aimed at the former, it is difficult to quantify what effect, if any, a policy has through social networks. Whilst there is abundant theory attesting to the effect these interactions can have on the distribution of private outcomes, simply studying the outcomes tells us nothing about whether it was the interactions or the private incentives that were responsible.


Different methods have been used to identify the interactions. One way is to look for multiplier effects, i.e. identifying the social effects as those that operate above and beyond the private effect. However, this requires being able to state exactly what the private effects should be in order to look at the difference, and this is generally not possible. A more promising approach looks at the results of interventions that operate directly on the social interactions and not on the private incentives. Three approaches are relevant:

  1. Interventions that change group membership. If group membership is changed (with no change to private incentives), then any change in outcome can plausibly be attributed to the group effect. The problems here are that private incentives often change alongside membership, and that such interventions are hard to enforce, as individuals tend simply to revert to their original groupings.
  2. Changing the private incentives for a sub-set and seeing if there are effects on others whose private incentives are not changed.
  3. Interventions that seek to directly challenge social norms such as mass media campaigns. The identification issue here is that the changes may simply affect private preferences rather than acting directly upon the social norms.

Econometric identification of social interactions is very hard, and perhaps the strongest evidence for these phenomena are the persistent degrees of stratification amongst populations.


Econometric Possibilities

The basic concept of social interactions is that one individual’s actions are made based in part upon the actions of another individual or grouping of individuals. Various techniques are used to empirically test these interactions [see later summaries for spatial context]. In general however, these specifications are subject to three problems:

  1. Simultaneity: A’s actions may be affected by B’s, but B’s will most likely be affected by A’s at the same time. This means that any regression that includes B’s actions as an explanation of A’s will suffer from endogeneity. Endogenous and exogenous interactions cannot be separately identified.
  2. Correlated unobservables problem and the related errors-in-variables problem. This arises if there is some group-specific component of the error term that varies across groups and is correlated with the exogenous characteristics of the individuals. The unobservables could arise from preferences or environmental settings.
  3. Endogenous membership problem – people may sort into groups based on unobservable characteristics. This is similar to selection bias.


The challenge then is to see whether these issues, which can generally be lumped together as endogeneity problems, can be circumvented using techniques such as instrumental variables, quasi-experiments, or randomized control trials.


Spatial Smoothing and Weighting


Notes from Haining (2003) Chapters 5 and 7.1, and lecture

There are different conceptual models of spatial variation: 

  • The Regional Model: The emphasis is on the definition of regions as spatial units used for analysis. There are three types of region of particular interest
    1. Formal/Uniform: Constructed using sharp partitioning of space into homogenous or quasi-homogenous areas. Borders are based upon changes in attribute levels. This is fairly simple to achieve with small numbers of variables, but becomes harder as the number increases unless there is strong covariance between them all. This method is also tricky when the variable(s) of interest are continuous over space.
    2. Functional/Nodal: regions are constructed using interaction data. Whereas formal regions are characterized by uniformity of attribute level, the functional region is bound together by patterns of economic or other interaction which set it apart from neighbouring districts. E.g. labour market regions are defined by commuting patterns (such as travel-to-work flows).
    3.  Administrative: These regions are the consequence of political decisions.


  • Rough and Smooth: This model assumes there are two components to spatial data such that

data = smooth + rough

 The smooth part is regular or predictable such that knowing part of the distribution allows for extrapolation at the local level. The rough part is irregular, and is what is left over once the smooth component has been accounted for. The rough component cannot be used for extrapolation even though it may be explainable in terms of the underlying data generating process. Smooth structures could include topographical features, a propensity for similar data values to be found clustered together (spatial autocorrelation), trends or gradients etc. Rough could include local hot/cold spots, spatial outliers, and localized areas of discontinuity.  

  • Scales of Spatial Variation: This recognizes the different scales of spatial variation such that: 

Spatial Data = Macro(scale variation) + medium/micro(scale variation) + error

The error here includes measurement error. The macro component refers to trends or gradients present across the whole study area, e.g. a south-north linear trend. The medium/micro component refers to more localized structures that are conceptualized as superimposed upon the macro variation, e.g. localized disease hot spots or cold spots. If the data map is dominated by the micro variation, then it displays spatial heterogeneity. In such a circumstance analysis should probably be conducted at the local level.


The aim of map smoothing is to remove the distracting noise or extreme values present in data in order to reveal spatial features, trends etc. in other words to smooth away the rough component of the data to reveal the underlying smooth pattern. It tries to improve the precision of data without introducing bias.

 Resistant Smoothing of Graph Plots

This method is applied to graphical plots to aid visualization and to identify trends and relationships. This essentially fits smooth curves to scatter plots using locally weighted regression with robustness iterations. A vertical line is centred on an observation of x, and then a bandwidth applied around it. The paired observations that fall within this bandwidth or window are assigned neighbourhood weights using a weights function that has its maximum at the centre of the window and decays smoothly and symmetrically to become zero at the boundary of the window. A line is fitted to this subset of data by weighted least squares to create fitted values of y. The window is then slid along the x axis and the process repeated to create a sequence of smoothed y values. The bands overlap as generally the window is applied to each ordered value of x.
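The procedure can be sketched as follows (pure Python; the noisy sine data, tricube weight function, and bandwidth are invented for illustration):

```python
import math, random

random.seed(2)

# Invented scatter: a smooth sine signal plus noise (the "rough" component)
xs = [i / 20 for i in range(100)]
ys = [math.sin(x) + random.gauss(0, 0.2) for x in xs]

def tricube(u):
    """Weight that peaks at the window centre and decays to 0 at the boundary."""
    return (1 - abs(u) ** 3) ** 3 if abs(u) < 1 else 0.0

def smooth_at(x0, h=0.75):
    """Weighted least-squares line fitted within the window centred on x0."""
    w = [tricube((x - x0) / h) for x in xs]       # zero outside the window
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * yv for wi, yv in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (yv - ybar) for wi, x, yv in zip(w, xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)       # fitted value at x0

# Slide the window along x, producing one smoothed value per observation
smoothed = [smooth_at(x0) for x0 in xs]
```

A full resistant smoother would add the robustness iterations mentioned above, down-weighting points with large residuals and refitting; this sketch shows only a single pass.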

 Essentially the idea is as follows:

  •  x(s) is a variable that has a spatial element s
  • m(s) is the smooth part of the data and is a function of s although the function is unknown and very likely to be non-linear. This is the part we are trying to estimate
  • ε is the rough part we are trying to smooth away

 The most common way to estimate m(s) takes the form:


m̂(s*) = ∑j w(s*, sj) x(sj)


Where w(s*, sj) is a scalar weight assigned to data point j given its distance (or other weighting scheme) from location s*, which is where the window is focused. It should be noted that the weights sum to 1.

Thus the predicted value of m(s) is basically a moving weighted average.  

There are various different weighting structures that can be used.

  • Nearest Neighbour: Here the weights are based on some predefined set of k nearest neighbours such that w(s*, sj) = 1/k if data point j is one of the k nearest neighbours to location s* and 0 otherwise.
  • Kernel Regressions: this assigns weights based upon the distance of data point j from the kernel point s* such that:


w(s*, sj) = K((s* − sj)/h) / ∑k K((s* − sk)/h)


Here h is the bandwidth, so the window is always the same size. Decisions about how best to aggregate the data are needed here, and should be drawn from some understanding of the underlying process. The denominator ensures that the weights sum to 1 over the relevant bandwidth. Different kernels can be used, such as the uniform (rectangular) kernel or the normal (Gaussian) kernel.
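The kernel-weight construction can be illustrated directly (pure Python; the Gaussian kernel, locations, and bandwidth are arbitrary choices):

```python
import math

def gaussian_kernel_weights(s_star, sites, h):
    """K((s* - s_j)/h) for a Gaussian kernel, normalised to sum to 1."""
    k = [math.exp(-0.5 * ((s - s_star) / h) ** 2) for s in sites]
    total = sum(k)
    return [ki / total for ki in k]

sites = [0.0, 1.0, 2.0, 3.0, 10.0]          # invented locations
w = gaussian_kernel_weights(1.5, sites, h=1.0)
# The two sites nearest the kernel point (1.0 and 2.0) receive the largest,
# equal weights; the distant site at 10.0 receives essentially zero weight
```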

 Using these weights we can run a locally weighted regression that estimates a coefficient for every point of data j based upon the bandwidth and weights assigned. These local polynomials can then be made to join locally to create a spline. It should be noted that these methods encounter problems at the edges of the data.

Map Smoothing

The above methods apply in one dimension (sliding left-right along an axis). However, when using data with more dimensions (north, south, east, west) as in spatial data, the spatial kernels can be combined into a single dimension called distance. This method assumes that points can be weighted equally in every direction from s*. In practice, however, we may need to weight the N-S and E-W directions differently (for example, house prices in south London may be more likely to increase moving north toward the centre than south toward the suburbs). In other words, the method assumes the spatial function is the same in every direction; if this is unlikely to be the case, then there is an argument for using narrow bandwidths.

The idea of map smoothing is to improve the precision associated with area data values whilst not introducing serious levels of bias. It is very important to decide upon what size of window is to be used for the smoothing. A large window improves the precision of the statistic because it borrows a lot of information from the rest of the observations (effectively this is increasing the sample size) but the cost is that there is a greater risk of introducing bias as information is borrowed from areas further away that may be different in terms of the data generating process. A small window reduces the risk of bias, but also decreases the precision of the estimate.


The effectiveness of local information borrowing when smoothing depends upon local homogeneity. If local areas are very different (i.e. the rough component is very large) then even very localized smoothing can introduce bias. This homogeneity will itself be affected by the size of the area that is sampled (or density of the sampling points) relative to the true scale of spatial variation [see MAUP]. In the case of high heterogeneity smoothing using local information may not be as effective as smoothing using information borrowed from data points that are similar to the area in question (even if they are not located nearby).


There are a variety of different methods for map smoothing


  • Mean and Median Smoothers: in this case the value of data point j is replaced with the median, or mean drawn from the set of values (including itself) that are contained within the window imposed on the map. The window may be defined in terms of distance, adjacency, or other characteristic. The rough component of the data can then be extracted by subtracting the median/mean from the observed value. The choice between the two is important. Mean smoothing tends to blur spikes and edges and so may be appropriate in environmental science where localized spikes are not generally expected. Median smoothing tends to preserve these spikes and hence may be more useful for social applications where there can be large localized spikes. In any case, the performance of these smoothers will depend upon the weights assigned to them.
  • Distance weighting is often used. This can be a simple nearest neighbour weight scheme as explained above, although this type of neighbourhood weight function causes values to change abruptly and smoothers based on them can have undesirable properties. A simple distance weighting scheme can be used where data are assigned the values based on weights:

 wij = dij−1 / ∑j dij−1  and then the predicted value m̂(si) = ∑j wij xj


An additional restriction can be added such that only values with dj < D are assigned weights and used in the estimation; otherwise the weight is 0. This is then a kernel smoother as described above. It is traditional to omit the observed value at the kernel point from the analysis (i.e. exclude observation i from the calculation of the mean for point i).
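A sketch of this inverse-distance smoother (pure Python; the coordinates and values are hypothetical, and at least one neighbour within D is assumed so the weights are well defined):

```python
import math

# Hypothetical points: (x, y, value)
pts = [(0.0, 0.0, 10.0), (1.0, 0.0, 12.0), (0.0, 1.0, 11.0), (5.0, 5.0, 30.0)]

def idw_smooth(i, D=2.0):
    """Inverse-distance weighted mean of the values within distance D of
    point i, omitting point i itself (leave-one-out)."""
    xi, yi, _ = pts[i]
    nbrs = []
    for j, (xj, yj, vj) in enumerate(pts):
        if j == i:
            continue                         # omit the kernel point itself
        d = math.dist((xi, yi), (xj, yj))
        if d < D:
            nbrs.append((1.0 / d, vj))       # weight d^-1, normalised below
    total = sum(wgt for wgt, _ in nbrs)
    return sum(wgt * v for wgt, v in nbrs) / total

smoothed_0 = idw_smooth(0)   # the far point at (5, 5) falls outside the window
```

For point 0 the two neighbours at distance 1 get equal weight, so the smoothed value is simply their average, 11.5; the distant point contributes nothing.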

The literature has shown that the precise choice of weighting function is generally less important than the choice of window. There is always a variance/bias trade-off to be made, and the choice should be based upon knowledge of the nature of local mean variability.

There are other types of smoothing such as “headbanging” and median polishing, but at this time they appear to go beyond the scope of the course.  

As an example, house price points can be smoothed into a raster grid, with each raster square assigned a value using an inverse-distance weighting scheme.

In practice weighting schemes can be based on a variety of different things: social weights, network distances, migration flows, travel times, income differences etc.

Modifiable Area Unit Problem (MAUP)



A. Briant, P.P. Combes & M. Lafourcade, Journal of Urban Economics, 67 (2010)

(Notes from Lecture, Problem set and above referenced paper) 


The Modifiable Area Unit Problem (MAUP) arises when working with statistical data, and is concerned with the sensitivity of results to the particular choice of zoning system that is used in the analysis. The core of the problem is that we often do not know how driven our statistical results are by the shape/size of the areas under examination and how data within those areas have been aggregated. As the paper shows, coefficients can vary depending on how data are aggregated within different area units.

For example, when examining the effects of agglomeration on regional economies, it is often unclear whether results are truly driven by knowledge spillovers, labour pooling effects etc., or whether they are simply the product of how the data are organized. This is an important issue when thinking about policy regarding cluster formation strategies. When investigating economic processes that have spatial characteristics, results will be affected by the shape/size of the units used whenever those units do not accurately reflect the underlying economic realities. Put another way, if data are generated by a particular spatial process, the results of analysis will be distorted when the units employed do not reflect that underlying process. Returning to the agglomeration example, if the units of analysis are poorly chosen, the zones will pick up the effects of industries outside their true field of economic influence, or fail to pick up the effects of industries within it, and will accordingly over- or understate the effects of agglomeration on economic outcomes.


The shape effect refers to results that are driven by the different shapes used in the analysis. Imagine a map of dots in which black dots represent skilled-labour productivity and red dots unskilled: one partition of the map may show an even distribution of productivity, while redrawing the boundaries identifies two clusters of high-productivity workers and two clusters of low-productivity workers.


The size effect refers to the size of the units. It is easy to see that using smaller units may find smaller, less dense clusters of high-productivity areas, and similarly clustered low-productivity areas.


Illustrations with Simulated Data

Arbia (1989) shows that the problems of shape and size distortion are minimized (not eliminated) when the sub-areas are exactly equivalent (in terms of size and shape), and there is an absence of spatial autocorrelation. In practice these two conditions will very rarely be met. In particular, spatial autocorrelation is almost always present, as there are likely to be spillovers from activities, outcomes, or processes in one area to another.


Amrhein (1995) simulated data by randomly generated variables and randomly assigning them to a Cartesian address (i.e. no autocorrelation). He then varied the unit size to 100, 49 and 9 squares of observation. Under these conditions he was able to show that means do not show any particular sensitivity to shape/size effects, although variances increase (and hence standard errors) as the sample size decreases (i.e. when moving to a smaller number of units).
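A small simulation in the spirit of this experiment (not a replication; the grid size, distribution, and block sizes are invented). Spatially random values are aggregated by averaging into square units of two different sizes: the overall mean is insensitive to the zoning, while the spread across units shrinks as the units grow, illustrating how aggregation smooths the data:

```python
import random, statistics

random.seed(3)
side = 30    # invented 30 x 30 lattice of cells

# Values with no spatial autocorrelation: random draws assigned to addresses
cells = {(r, c): random.gauss(100, 15)
         for r in range(side) for c in range(side)}

def unit_means(block):
    """Average the cell values inside each block x block square unit."""
    means = []
    for r0 in range(0, side, block):
        for c0 in range(0, side, block):
            vals = [cells[(r, c)]
                    for r in range(r0, r0 + block)
                    for c in range(c0, c0 + block)]
            means.append(statistics.mean(vals))
    return means

fine = unit_means(3)      # 100 small units
coarse = unit_means(10)   # 9 large units
# The mean of the unit means is identical under both zonings, while the
# variance across units is far smaller for the coarse zoning
```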


Amrhein & Reynolds (1997) then went on to show that distortions using real census data were sensitive to both shape and size effects but also the aggregation process (i.e. whether data are summed or averaged). It is fairly intuitive to understand this. If data are aggregated by summation, they will be more distorted by an increase in size (as more observations are added) than averaging (the effect of adding more observations is reduced due to the effect of averaging).


Correlation Distortions

The MAUP can produce distortions in multivariate analysis of data drawn from spatial units. The effects of shape/size and the effects of aggregation have been separated out. Amrhein (1995) finds that his coefficients are sensitive to both size and shape, but if the model is well specified the method of aggregation seems to introduce fewer distortions. When aggregation affects the dependent and independent variables in the same way, the effects are small, and the same goes for the shape effects. However, when the variables are aggregated differently, they will not have the same degree of spatial autocorrelation and hence the size and shape effects will be larger.


Thus the problem is likely to be smaller for wage regressions (where data are averaged) than for gravity regressions (where data are summed).


Testing the Problem

The authors look at the effects of employment density on wage levels in France using a variety of different zoning methods. Specifically they use

  • 341 Employment areas – government units based on minimizing daily commute between zones.
  • 21 Regions
  • 94 departments
  • 22 Large squares
  • 91 medium squares
  • 341 small squares
  • 100 semi-random zones.

Some of the data are summed (employment/trade flows) whereas some are averaged (wage rates); the former increase with the size of the units, whereas the latter are relatively more stable across different zoning systems.


The first thing they do is calculate Gini coefficients within every zoning system for each of the 98 industries examined over 18 years, and rank the industries. Using rank correlation analysis they show that the ranking of industries is virtually unaffected by changes in zoning system. However, when they use different indices (e.g. the Ellison and Glaeser index) to rank the industries, the rank correlations across zoning systems are lower, meaning MAUP problems are more pronounced with those indices. This indicates that the choice of index, rather than the MAUP, is the more significant problem, and thus careful specification of the index should be a more primary concern than issues to do with units of observation.


They then undertake basic regression analysis of log wages on log employment density and a vector of controls. They find that the effects of moving from one zoning system to another are generally small. This is so especially when moving from very small to slightly less small units. The grid system is generally more sensitive which indicates that boundaries that do not reflect administrative/economic realities do generate more error. However, when they control for observed and unobserved skill levels (in order to see if workers are sorting into high density employment areas) the coefficient on density changes by an order of magnitude more than it did for changes due to MAUP. Thus, specification of the model again appears to be significantly more important than MAUP.


Similar changes are observed when market potential (measured as distance to other employment centres) and different definitions of market potential are controlled for. These changes to specification are all more important than MAUP.


Lastly they look at gravity equations for which the dependent variable (trade flows) are summed within zones, and these summations make the coefficients more sensitive to MAUP problems.

Intro to Spatial Analysis


Notes from Lecture and Chapter 2/3 Fothering 

Spatial Analysis

Quantitative geography (spatial analysis) differs from econometrics in as much as it is concerned with data that have a spatial element, i.e. data that combine attribute data with locational data. Its prime use is to generate knowledge about how data are generated in space, and hence it seeks to detect the spatial patterns generated by physical and social phenomena. Data can be visualized, which can enable the detection of patterns, and quantitative analysis can then be used to examine the role of randomness in generating those patterns and to test hypotheses about them. All in all, spatial analysis is a testing ground for ideas about spatial processes.

Spatial Data

Spatial data are observations of phenomena that possess a spatial reference. Such data may be captured by digitizing maps, collected by survey, or remotely sensed by satellite (amongst other ways).


  • Spatial Objects:        Spatial objects are of three basic types, points, lines or areas. Essentially they are things that can be represented in 2 dimensional space by drawing of some kind of shape e.g. houses, railway lines, county boundaries. They all have some spatial reference that describes the position of the object on the surface of the earth. As well as the spatial reference the object can be associated with some observable characteristic of that object such as elevation, test scores in schools, amount of output at a factory.
  • Fields:          Fields are used as an alternative to objects. Measuring some continuous variable such as elevation, or air density may be hard to achieve using points as the variation is spatially continuous. These types of spatially continuous variables are called fields, and whilst for some very basic fields it may be possible to derive the function that describes the spatial variation, in practice for the vast majority of fields the function remains unknown. In such a case it is simpler to measure the field in a discretized form at regular intervals such that the observations form a regular lattice or grid (they may also be measured at irregular spaces, but this is not so common). The field is a measure of variable x that is geographically referenced to a location on the earth’s surface. Fields can be either fixed (non-stochastic) as in elevation, or random as in income.


Even if we have data on the entire population (which in itself would be rare), it remains only a snapshot in time. There is some underlying process that is generating that data set, and to uncover that process we need to conduct statistical analysis. In other words, simply collecting data for every member of the population will not necessarily lead to an understanding of the underlying processes.



In order to conduct analysis we need a consistent means of describing locations on the earth. Latitude and longitude are one such method, latitude being measured north from the equator and longitude east or west of the Greenwich meridian. To calculate distances under this system, spherical trigonometry is used. As this can be somewhat cumbersome, it is often easier to ignore the curvature of the earth and consider the data to lie on a flat plane. Then a Cartesian coordinate system can be used and Pythagoras’ theorem can be used to calculate distances. This is only appropriate when examining a relatively small area, such that ignoring the curvature does not produce undue distortions. The British National Grid is one example of such a system.


Distance can then be calculated using Pythagoras’ theorem:

d = √((x₁ − x₂)² + (y₁ − y₂)²)


However, this distance may not be the most meaningful: to a car or pedestrian in a city, the straight-line distance fails to take into account buildings and other obstacles that lie on the path between the two points.
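The two distance calculations discussed above can be sketched as follows. This is a minimal illustration, not from the text: the function names and the 6,371 km earth radius are choices made here, and the haversine formula stands in for the "spherical trigonometry" mentioned above.

```python
import math

def euclidean(x1, y1, x2, y2):
    # Pythagoras on a flat plane, e.g. British National Grid
    # eastings/northings measured in metres
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

def haversine(lat1, lon1, lat2, lon2, r=6371.0):
    # Great-circle distance in km from latitude/longitude in degrees,
    # using the haversine formula on a sphere of radius r
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))
```

On a small study area the two agree closely once the coordinates are projected, which is why ignoring curvature is often acceptable.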


Representing Spatial Data

Spatial data are generally represented in two ways.


  • Vector Model:           In the vector model the physical representation of objects and lines closely matches the conceptual reality. A single point with attributes is represented by an ordered coordinate pair and a vector of attributes. Areas are frequently stored as closed polygons (such as boundary lines). This method can also be used to store network lines such as a rail network.
  • Raster Model:           The raster model only stores data values ordered by column within row, or row within column i.e. a grid or lattice. Data is stored that details the origin of the lattice such that each cell can be related back to a geographical location on the surface of the earth. The accuracy of the location is dependent upon the size of the grid cells. Additionally only a single attribute can be stored in each cell, so if two points with different attributes fall in the same cell some decision will have to be made as to which one is stored. Raster data can be discrete (e.g. urban or rural) or continuous (e.g. population density).

Which method is used tends to depend upon the type of data being used. If the data are already in lattice form (such as satellite data) then the raster method is easiest, whereas if positional accuracy is a concern then the vector method is preferable. Often raster data is used to represent fields, but this need not be the case.
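The raster idea of relating a cell back to a geographical location can be sketched in a few lines. This hypothetical helper is not from the text; it assumes the stored origin is the lower-left corner of the grid.

```python
def cell_index(x, y, origin_x, origin_y, cell_size):
    # Map a geographic coordinate onto (row, col) in a raster grid.
    # Positional accuracy is limited by cell_size: any two points
    # inside the same cell become indistinguishable.
    col = int((x - origin_x) // cell_size)
    row = int((y - origin_y) // cell_size)
    return row, col
```

Shrinking cell_size improves positional accuracy but multiplies storage, which is the basic raster trade-off; two points landing in the same cell is exactly the attribute-collision problem noted above.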

Problems and Opportunities

There are a range of problems associated with spatial data analysis such as identifying spatial outliers, edge effects etc. However, there are two that are of particular importance.


  • Spatial Autocorrelation:      A fundamental assumption of statistical analysis is that observations are independently drawn i.e. the value of one observation does not affect another. In spatial analysis this is particularly hard to assume, as everything is related to everything else, but near things are more related to each other than distant things. Data from geographic units are tied together by a variety of factors – contiguity, the social character of the area, different people being in different locations. There can be spillovers from activity in one area to another, and hence independence is violated. This is called spatial autocorrelation.
  • The main problem with spatial autocorrelation is that the variance of estimators is affected, and this in turn affects statistical significance and the construction of confidence intervals. If we ignore positive spatial correlation the standard errors/confidence intervals will be too small, and if we ignore negative correlation they will be too wide. This can distort decision rules when hypothesis testing and lead to incorrect conclusions.
  • MAUP:         The modifiable areal unit problem is concerned with data that have been aggregated into different zones. This will be examined in detail in forthcoming summaries.
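Spatial autocorrelation is commonly quantified with Moran's I, a statistic not covered in the text but worth sketching: under a spatial weights matrix it is positive when similar values cluster together and negative when dissimilar values neighbour each other. The example below uses simple binary rook-contiguity weights for four zones in a row.

```python
def morans_i(values, weights):
    # Moran's I = (n / S0) * sum_ij w_ij (x_i - xbar)(x_j - xbar)
    #                        / sum_i (x_i - xbar)^2
    n = len(values)
    xbar = sum(values) / n
    dev = [v - xbar for v in values]
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    s0 = sum(weights[i][j] for i in range(n) for j in range(n))
    return (n / s0) * (num / den)

# Four zones in a row; w_ij = 1 if zones i and j share an edge
w = [[1 if abs(i - j) == 1 else 0 for j in range(4)] for i in range(4)]

smooth = morans_i([1, 2, 3, 4], w)        # similar neighbours: positive I
alternating = morans_i([1, -1, 1, -1], w) # dissimilar neighbours: negative I
```

A significantly non-zero I is the usual warning sign that the independence assumption discussed above is violated.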




E. Duflo, P. Dupas & M. Kremer (2009)

Principal Research Question and Key Result Is it enough to simply add more resources to education in order to increase measurable outcomes, or is there a need to also change the incentives that face education providers? Using a field experiment the authors can directly compare the outcomes of two such policies that occur in the same context. They find that there are significant effects on test scores for children from assigning them to a class with a short-term contract teacher as opposed to a regular civil service teacher. There are also significant effects when both types of teacher are subject to supervision by a trained committee of parents. Overall the results only persist for kids in contract teacher classes with parental supervision.


Theory Throwing more resources (teachers) at educational facilities may not actually increase test scores. This could be for a number of reasons for example insufficient learning materials, poor incentive structures, other variables (such as health) that prevent learning, or simply that class sizes even when significantly reduced are still too large for the extra resources to have any meaningful impact. It is important to assess the practical relevance of such factors as they will have implications for policy. The factor being examined here is incentives. They want to examine in effect whether reforming school systems might be more effective than simply increasing resources.


Motivation Whilst access to education has increased hugely in the developing world, it has become clear that this has not necessarily translated into increased competency in basic skills. There is evidence that increasing participation without changing methodology or environment (deworming, school meals etc.) may have little effect.  Some countries prefer to hire less experienced, short contract teachers on lower pay than government teachers as they are thought to be easier to motivate. However, field experiments have not been designed to evaluate both the effect of increasing resources, and changing incentives. Until now that is…
  • The data are from an experimental program in Western Kenya. 140 out of 210 schools were randomly selected. Of these, 70 became part of the Extra Teacher Program (ETP) and 70 were control schools. The ETP schools were given funding to hire an extra contract teacher. Whilst contract teachers are already used in Kenya, as they are self-funded by parent contributions there is a chance that the presence of a contract teacher is correlated with how important education is to the community. The use of experimental data therefore ensures that the treatment is distributed independently of other characteristics that may affect exam results. The contract teacher would teach a randomly assigned 50% of year 1 students for a whole year, and then follow them into the second year. The regular civil service teacher had the remainder of the class. For a further subset of 35 of the ETP schools a school committee was trained in how to monitor and evaluate the contract teacher, and was encouraged to hold a review after 1 year to decide whether the contract would be renewed.
  • There were very few requests to move classes, so randomization should have been largely preserved.
  • The outcome variable was scores in standardized tests.


  • OLS regression with dummies for whether students were in ETP schools, interacted with dummies for civil/contract teacher further interacted with dummies for whether there was a committee overseeing.
  • As well as test scores as dependent variable, they also use attendance measures to evaluate teacher incentives in terms of effort.
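The dummy-interaction specification above can be illustrated with made-up numbers: in a saturated dummy regression the OLS coefficients are just differences in group means, so the "treatment effects" can be read straight off the group averages. The scores and group labels below are invented for illustration, not the paper's data.

```python
# Hypothetical standardized test scores by assignment group
scores = {
    "control":      [0.00, 0.10, -0.05, 0.05],  # non-ETP schools
    "etp_civil":    [0.02, 0.08, -0.02, 0.04],  # ETP, civil service teacher
    "etp_contract": [0.25, 0.30,  0.15, 0.22],  # ETP, contract teacher
}

def mean(xs):
    return sum(xs) / len(xs)

baseline = mean(scores["control"])
# These equal the coefficients on the ETP-by-teacher-type dummies
effect_civil = mean(scores["etp_civil"]) - baseline
effect_contract = mean(scores["etp_contract"]) - baseline
```

In the actual study the regression additionally interacts these dummies with a committee indicator and clusters inference at the school level; the mean-difference logic is the same.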


Results Teacher Effort

  • Civil service teachers in ETP schools were 15% less likely to be in school relative to comparison schools. They may have taken advantage of the extra teacher to work less. Contract teachers in ETP schools with no committee were 30% more likely to be in school than their civil service counterparts in the same ETP schools, indicating strong incentives to perform due to the contract. The presence of a committee did not appear to change the attendance of the contract teachers. The civil service teacher with a committee was 9% more likely to be in class, perhaps because the incentives facing the contract teacher meant he would not cover the work of the civil service teacher.


  • Students in the reduced class sizes showed no statistically significant improvement in scores.
  • Students in contract teacher classes in ETP schools score 0.21 SDs higher than their colleagues in civil service classes and 0.24 SDs higher than students in non-ETP schools. This is robust to controls for teacher demographics. This could be either because the committees are better at picking teachers, or because contract teachers face better incentives.
  • Students in the contract class in committee schools did not do better than non-committee contract kids. However students in the civil service class in committee schools did do better than the comparison schools, so the committee seems to have had an effect on the civil service teachers.
  • The effects all disappear in post program evaluation except for the committee schools.


  • There was a roughly 20% attrition rate in tests, which is worrying for the results: if those students were the less able, then the results will be upward biased. It is not clear why a subsample of only 60 children was given tests and not every child.
  • It is not totally clear that incentives are driving the different results. The committees could be better at picking teachers. Additionally the contract teacher remained with the class for two years, whereas the civil service teachers chop and change; thus it could be the continued presence of one teacher that is making the difference.
  • Whilst the findings on class size are interesting they can say nothing about the effect of reducing class size from say 80 to 10.
  • It is not clear that the presence of committees improves outcomes through monitoring rather than general awareness about education in the community.
  • External validity is hard to assert, as the Kenyan system is very specific in terms of the process of hiring teachers etc. Additionally, we cannot assume that making all teachers contract workers would have beneficial effects as the positive increase in effort by contract teachers could be being driven by the fact that they want to become civil service teachers.


  • The fact that smaller class sizes did not improve performance indicates that simply reducing class size is not efficient (perhaps because reducing from 80 kids to 40 is not enough of a difference, or alternately because reduction in class size also reduced teacher effort).
  • The fact that only the committee schools saw persistent benefits suggests that whatever advantage contract teachers bring is only made permanent by the presence of committees.





E. Duflo

American Economic Review, Sept 2001

Principal Research Question and Key Result Can investments in infrastructure increase educational attainment and does this then have labour market implications? The estimates suggest that in the context of Indonesia, each new school per 1000 children was associated with an increase of 0.12 to 0.19 years of education, resulting in a 1.5 to 2.7% increase in earnings for those fully exposed to the program.


Theory If education increases the productive capacity of an individual, then having more education will lead to increased earnings, all else constant, as the wage rate equals the marginal product of labour.  There are important assumptions underlying this theory that may not always be satisfied in practice. For example, firms are assumed to have no wage setting power and to be able to directly observe levels of human capital. Additionally there can be no externalities, and in particular there can be no offsetting equilibrium effects if we are to find statistically significant associations.


Motivation There is a large body of evidence that suggests that there are significant returns to education, and that these returns tend to be higher in developing countries where the marginal effect of further education is higher. However cross country regressions of wage on education are difficult due to the incomparability of educational quality (and hence data), as well as problems of controlling for important unobserved characteristics such as community and ability. This paper uses a natural experiment to look for exogenous changes in the availability of schooling in regions of Indonesia, how this changed the amount of education attained, and consequently how that impacted the wages of those affected.


  • In 1973 oil revenues in Indonesia were mobilized for the INPRES program to provide increased educational facilities, with a focus on provision in areas where enrollment rates were historically low. c.62,000 schools were constructed. At the same time the government recruited and trained a suitable number of teachers.
  • The data are a cross section from the 1995 census in Indonesia of men born between 1950 and 1972. This was matched with regional census data on where these men were schooled and how many INPRES schools were constructed in that area. The date of birth and the region of birth jointly determine how much an individual was exposed to the program. E.g. children over 12 in 1974 did not benefit as they had already left primary education by the time the schools came into being. For younger children, exposure to the programme is a decreasing function of their age i.e. the older they are, the less exposed they were. Region of birth also denotes intensity of exposure as some regions received more schools than others.
  • The schools measure is schools constructed per 1000 children.


Strategy Effect on Education

  • First, basic DID using means is calculated in the form of summary stats.
  • This is then done in a regression analysis including an interaction term between the intensity in region of birth, and whether the individual is young enough to have benefitted (i.e. falls in the treatment group). The control group is therefore those who were too old to have benefitted from the policy. She controls for birth year fixed effects, region specific fixed effects and a vector of region specific observables.
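The DID-from-means logic in the first step can be sketched with hypothetical numbers (the figures below are invented for illustration, not Duflo's actual table): difference young vs old cohorts within high- and low-intensity regions, then difference those differences.

```python
# Hypothetical mean years of education by program intensity in the
# region of birth and by cohort (young = exposed, old = not exposed)
educ = {
    ("high", "young"): 8.49,
    ("high", "old"):   8.02,
    ("low",  "young"): 9.76,
    ("low",  "old"):   9.40,
}

# Within-region cohort differences net out fixed regional factors;
# differencing across regions nets out common cohort trends.
gain_high = educ[("high", "young")] - educ[("high", "old")]
gain_low = educ[("low", "young")] - educ[("low", "old")]
did = gain_high - gain_low   # DID estimate of the program effect
```

The regression version of this adds birth-year and region fixed effects plus region-specific observables, but the identifying comparison is the same double difference.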


  • For the effect on wages, the same DID idea as above is applied, with wages as the dependent variable.

IV Strategy

  • If the program had no effect on wages other than through education then the program can be used as an instrument. Indeed she shows using different subsamples that there is no wage effect in regions that did not see an education effect, which indicates that the increase in wages in the applicable regions is being driven by the increase in years of education. This makes the exclusion restriction very plausible.
  • Thus she uses the interactions between the age in 1974 and the program intensity in region of birth as instruments for changes in educational achievement.
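Why the instrument helps can be sketched with toy numbers (everything below is made up for illustration; the paper's actual 2SLS uses many age-by-intensity interactions as instruments). When an unobserved error drives both education x and wages y, OLS is biased, but the simple one-instrument (Wald) estimator cov(z, y)/cov(z, x) recovers the true effect because the program exposure z is uncorrelated with the error.

```python
def cov(a, b):
    # Population covariance of two equal-length sequences
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

z = [0, 0, 1, 1]             # instrument: program exposure
v = [1, -1, 1, -1]           # unobserved error, uncorrelated with z
x = [2 * zi + vi for zi, vi in zip(z, v)]        # education, contaminated by v
y = [0.5 * xi + vi for xi, vi in zip(x, v)]      # wages; true effect of x is 0.5

beta_ols = cov(x, y) / cov(x, x)  # biased upward: x and the error share v
beta_iv = cov(z, y) / cov(z, x)   # IV (Wald) estimator: recovers 0.5
```

This is the same logic behind the paper's finding that 2SLS rules out upward ability/community bias, though there the concern runs in the opposite direction as well.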


Results Education

  • Summary stats reveal that kids in areas of high program intensity had lower years of education and wages, which reflects the fact that schools were targeted primarily at educationally poor regions. The difference between the high/low intensity areas, differenced by the difference between those who did and did not receive treatment (by age), is the causal effect. That is, one extra school per 1000 children increases education by 0.13 years.
  • The DID regression gives results of 0.12 years for the whole sample, and 0.2 years for the subsample of those employed in 1995.


  • The DID specification indicates that increasing one school per thousand increases earnings of those fully exposed to the program by 1.6 to 4%.


  • The 2SLS estimates are larger than the OLS estimates although they are statistically equal, and they are robust to the inclusion of the controls as outlined above. They are higher in sparsely populated regions, which indicates that a local average treatment effect (LATE) is being estimated (see problems below).  Wage earners are found to earn c.10% more if there was 1 extra school per 1000 children in their area.


  • A placebo difference in differences using older age groups from 1974 returns results very close to 0, which lends support to the assumptions underlying the DID.
  • She controls for other INPRES programs in the region as well as initial enrollment rates, which increases the estimates somewhat, indicating that the main results are not being upward biased by omitted variables.
  • A further control [placebo] experiment in the DID regression is undertaken that returns very tiny results.
  • She interacts a dummy for being of each age in 1974 with the schools constructed per region to trace the effect across all age ranges. The coefficients only start increasing for cohorts younger than 12 in 1974, and are significant for ages 0 to 8. This indicates that those who were too old to benefit from INPRES in fact did not, which lends support to the identification strategy.
  • Presents results for subsamples of regions that show that the program had no effect in densely populated areas. In sparsely populated areas each new school significantly reduced the distance to education.


  • Whilst it is suggested that the results are conservative because the program was targeted at areas that had low enrolment rates, the data suggest otherwise. When log(1-enrolment rate) is regressed on log(INPRES schools) the coefficient is 0.12. If the program had been targeted as stated this coefficient should be close to 1. This means that there could be bias in the results if the program was actually targeted at areas where the returns to education were already likely to be higher based on wealth, or some other unobserved characteristics.
  • If employment opportunities in different regions were changing such that people’s attitudes to education were changing (but to different degrees) then we have time varying and region specific variation, that could lead people to exploit the program in different ways. This correlation would confound the DID results which rely on the fact that there is no region specific time varying variable that is correlated with the program.
  • Micro level data cannot properly account for the externalities generated by such a program. For example there were almost certainly spillover benefits on fertility, mortality etc. which are not accounted for, and therefore it is possible that the wage estimates are somewhat underestimated.
  • There is likely to be bias in the wage results as data are only collected for the employed, excluding the self-employed and the unemployed. If family pressure to be employed meant that children of such families were likely to receive more education, then we have an omitted variable problem that could upward bias the results. This should be partially solved by the IV strategy, which is admittedly strong.
  • The IV restrictions are plausibly upheld. The instruments are strong in the first stage (relevance) and plausibly exogenous. However, the resulting estimate is only a LATE for those capable of being affected by the instrument. In this case that will be those for whom distance to school is a factor that colours their decision as to whether to attend or not.
  • As with any such study there are external validity concerns. In particular the huge drive for education in Indonesia at that time made the program ripe for success, a success which is by no means guaranteed in other contexts. There could also be long run general equilibrium effects which would confound any short run benefits. Additionally, since the IV indicates that the program only affects those for whom distance to school is an important factor when deciding whether to attend, such a program will not work in all settings (e.g. urban settings). The program was particularly well designed, and care was taken to ensure that teacher quality was not affected. In other settings it is not clear that teacher quality could be maintained such that the benefits of expansion of schooling are not offset by reductions in quality.


  • The IV results indicate that estimations of returns to education in developing countries are not biased upwards due to unobserved community effects as some have argued.
  • Increasing access to education can have benefits, but generally only in regions where distance to nearest educational facility is an issue. In the densely populated regions in this study where the program would only have affected class size, there were no observable effects.  For such urban regions other tactics will need to be used in order to increase participation rates in education.