SPATIAL INTERACTION, GRAVITY AND DISCRETE CHOICE MODELS

Notes from Lecture 

Intro

Firms and individuals have choices over discrete alternatives, such as which mode of transport to take or where to locate their businesses. These choices are modeled using the random utility model in order to aid the economic interpretation of those choices.

Random Utility Model

This was developed by Daniel McFadden and underlies the discrete choice model. This model holds that preferences over alternatives are a function of biological taste templates, experiences and other personal characteristics some of which are observable, others of which are not (cultural tastes etc.), and the function is heterogeneous within a given population. This indicates that an individual/firm’s utility from choice j can be decomposed into two components:

Uij = Vij – εij 

where V is an element common to everyone given the same characteristics and constraints. This might include representative tastes of the population such as the effects of time and cost on travel mode choices. ε is a random error that reflects the idiosyncratic tastes of the individual concerned as well as the unobserved attributes of the choice j.

V depends on observable characteristics of the consumer/firm and of the choice, such that:

Vij = αtij + βpij + δzij

where t is time and p is price and z is other observable characteristics.

In a setting where there are two choices (e.g. car or bus to work) we observe whether an individual chooses car (yi = 0) or bus (yi = 1). Assuming that individuals maximize their utility, they will choose bus if the utility from doing so exceeds the utility from going by car, Ui1 > Ui0. This means that Vi1 – εi1 > Vi0 – εi0, which rearranges to εi1 – εi0 < Vi1 – Vi0. Therefore the probability that we see an individual choose to go by bus is:

P(εi1 – εi0 < Vi1 – Vi0)

which is equal to P(εi1 – εi0 < α(ti1 – ti0) + β(pi1 – pi0)).

If we are willing to assume that the probability depends linearly on the observed characteristics then this can be estimated by running the following OLS regression (a linear probability model):

yi = α(ti1 – ti0) + β(pi1 – pi0) + ui

where yi is the observed choice and ui is a regression error.

At this point further observable characteristics can be added, z.

However, as is well known, the fitted values from an OLS model are not bounded between 0 and 1, whereas probabilities are. This means that the estimation may return predictions outside the possible range of probabilities. To counter this problem we can estimate a probability function using probit or logit estimators, which are calculated using the maximum likelihood method [about which I am not going to write anything, assuming it will not be examined in detail].
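To make the bounding problem concrete, here is a minimal sketch comparing the linear probability prediction with the logit probability of choosing bus. The coefficient values and the time and price differences are hypothetical, chosen only for illustration, and the logit form assumes the difference in errors follows a logistic distribution.

```python
import math

# Hypothetical coefficients: longer and dearer bus trips lower the utility of bus.
ALPHA = -0.05  # effect of the time difference t_i1 - t_i0 (minutes)
BETA = -0.4    # effect of the price difference p_i1 - p_i0

def utility_gap(dt, dp, alpha=ALPHA, beta=BETA):
    """V_i1 - V_i0 = alpha * (t_i1 - t_i0) + beta * (p_i1 - p_i0)."""
    return alpha * dt + beta * dp

def linear_probability(dt, dp):
    """Linear probability model: the prediction is the index itself, unbounded."""
    return utility_gap(dt, dp)

def logit_probability(dt, dp):
    """Logit: P(bus) = 1 / (1 + exp(-(V_i1 - V_i0))), always strictly in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-utility_gap(dt, dp)))

# Bus much quicker and cheaper, identical, and much slower and dearer than car.
for dt, dp in [(-30, -2.0), (0, 0.0), (40, 3.0)]:
    print(dt, dp, round(linear_probability(dt, dp), 3), round(logit_probability(dt, dp), 3))
```

The linear prediction leaves the unit interval at both extremes, while the logit probability stays inside it however large the utility gap becomes.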

The McFadden paper deals with car versus bus commuting in the SF Bay area.

Multiple Choices

Often we want to think about more than two alternatives, which requires us to extend this model. The random utility model extends directly to many choices, with Uij = Vij + εij (the error is written here with a positive sign, the more common convention; this only relabels ε). Now an actor will choose alternative k if the utility derived from this choice is higher than for all other choices:

Vik + εik > Vij + εij for all j≠k 

If we assume that the errors εij are independent draws from a type I extreme value distribution, then the choice probability has the closed form P(yi = k) = exp(Vik) / ∑j exp(Vij), where the sum runs over all alternatives j. This is a generalization of the logit model to many alternatives, hence the name “multinomial logit”. The model compares choices to some predetermined base case.
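The multinomial logit formula is straightforward to compute directly. The sketch below uses hypothetical systematic utilities for three modes; the function name and the numbers are illustrative assumptions, not taken from the lecture.

```python
import math

def mnl_probabilities(v):
    """P(y = k) = exp(V_k) / sum_j exp(V_j) for a dict {alternative: V}."""
    v_max = max(v.values())  # subtracting the max avoids overflow and cancels out
    exp_v = {k: math.exp(val - v_max) for k, val in v.items()}
    total = sum(exp_v.values())
    return {k: e / total for k, e in exp_v.items()}

# Hypothetical systematic utilities for one individual.
utilities = {"car": 0.2, "bus": -0.3, "train": 0.0}
probs = mnl_probabilities(utilities)
for mode, p in probs.items():
    print(mode, round(p, 3))
```

With only two alternatives the same function reduces to the binary logit probability 1 / (1 + exp(-(V1 - V0))).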

Independence of Irrelevant Alternatives (IIA)

One drawback of the multinomial logit method is the IIA problem. This is driven by the assumption underlying the model that, if one choice is eliminated in time t=1, the ratio of individuals choosing the remaining options must remain constant from the pre-elimination period t=0. For example, suppose that in t=0 40 people take bus A, 12 people take bus B and 20 people drive, and that in t=1 the bus B company goes bust. In t=0 the ratio of people taking bus A relative to those driving is 2:1. This must remain constant in t=1, so the model assumes that 48 will take bus A and 24 people will drive. This might not be a valid assumption if bus seats are not supplied elastically, or if bus A and bus B are not substitutes in the proportions the model implies (for example, if former bus B riders are far more likely to switch to bus A than to start driving).

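It is simple to see why this is the case. The underlying assumption of the model is that P(yi = k) = exp(Vik) / ∑ exp(Vij), so the ratio of the probabilities of any two alternatives k and l is exp(Vik) / exp(Vil). This ratio depends only on those two alternatives, and clearly cannot change simply because one of the other alternatives has been eliminated.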
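The bus example can be checked numerically. In this sketch the systematic utilities are backed out from the t=0 counts as the log of each share (an assumption made purely so that the model reproduces those counts), and bus B is then dropped from the choice set.

```python
import math

def mnl_probabilities(v):
    """P(y = k) = exp(V_k) / sum_j exp(V_j) for a dict {alternative: V}."""
    total = sum(math.exp(val) for val in v.values())
    return {k: math.exp(val) / total for k, val in v.items()}

counts = {"bus_A": 40, "bus_B": 12, "car": 20}  # t=0 counts from the example
n_people = sum(counts.values())

# Utilities that reproduce the t=0 shares exactly (illustrative assumption).
v = {k: math.log(n / n_people) for k, n in counts.items()}

before = mnl_probabilities(v)
after = mnl_probabilities({k: val for k, val in v.items() if k != "bus_B"})

# Predicted t=1 counts once bus B is eliminated.
predicted = {k: n_people * p for k, p in after.items()}
print(predicted)

# The bus A to car ratio is the same before and after.
print(before["bus_A"] / before["car"], after["bus_A"] / after["car"])
```

The 12 former bus B riders are split 8 to bus A and 4 to car, in proportion to the existing shares, which is exactly the restriction IIA imposes.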

This can be addressed using the nested logit model. Conceptually this decomposes the choice into two separate stages. In the first stage the individual chooses whether to take his car or public transport. If he decides on public transport then he must decide between bus A and bus B. This choice structure is estimated using sequential logits, whereby the value placed on the alternatives in the second stage is entered into the choice probabilities in the first stage.
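A minimal sketch of nested logit choice probabilities follows, using the standard formulation in which each nest m has a dissimilarity parameter lam[m] and the second-stage alternatives enter the first stage through an inclusive value (the log of the summed exponentiated scaled utilities). The utilities, nest names and parameter values are hypothetical, and no nest-level covariates are included.

```python
import math

def nested_logit_probabilities(nests, lam):
    """nests: {nest: {alternative: V}}; lam: {nest: dissimilarity parameter in (0, 1]}."""
    inclusive, within = {}, {}
    for m, alts in nests.items():
        scaled = {j: math.exp(v / lam[m]) for j, v in alts.items()}
        denom = sum(scaled.values())
        inclusive[m] = math.log(denom)                       # inclusive value of nest m
        within[m] = {j: s / denom for j, s in scaled.items()}  # second-stage probabilities
    top = {m: math.exp(lam[m] * inclusive[m]) for m in nests}  # first-stage weights
    top_total = sum(top.values())
    return {j: (top[m] / top_total) * p for m in nests for j, p in within[m].items()}

# Hypothetical case: all three alternatives have the same systematic utility.
full = {"private": {"car": 0.0}, "public": {"bus_A": 0.0, "bus_B": 0.0}}
reduced = {"private": {"car": 0.0}, "public": {"bus_A": 0.0}}
lam = {"private": 1.0, "public": 0.5}

p_full = nested_logit_probabilities(full, lam)
p_reduced = nested_logit_probabilities(reduced, lam)
print(p_full)
print(p_reduced)
```

With lam equal to 1 in every nest the function returns the multinomial logit probabilities. With lam below 1 for public transport, removing bus B changes the car to bus A ratio, so substitution is no longer forced to be proportional across nests.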

Aggregate Choice Models

Aggregate choice models are useful when individual data are not available, and also when computing power is an issue (aggregation leaves far fewer observations). All of the above models have aggregate equivalents. In fact, a Poisson model estimated by maximum likelihood on aggregated data gives exactly the same coefficient estimates as the conditional logit model when the only data available are the choice characteristics (i.e. how many people chose what). Multinomial logit will be better when there are accompanying individual/group-level characteristics.
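The equivalence can be illustrated with a small numerical sketch. The counts and the single choice characteristic below are made up, the Poisson model is assumed to include a constant (concentrated out analytically), and a grid search stands in for a proper optimizer.

```python
import math

x = [1.0, 2.0, 3.5, 5.0]  # hypothetical characteristic of each alternative
n = [30, 22, 11, 5]       # hypothetical number of people choosing each alternative
N = sum(n)

def clogit_loglik(b):
    """Conditional logit log-likelihood using only alternative characteristics."""
    log_denom = math.log(sum(math.exp(b * xj) for xj in x))
    return sum(nj * (b * xj - log_denom) for nj, xj in zip(n, x))

def poisson_loglik(b):
    """Poisson log-likelihood for the counts, E[n_j] = exp(a + b * x_j).
    The constant a is set to its optimum given b; ln(n_j!) terms are dropped."""
    a = math.log(N / sum(math.exp(b * xj) for xj in x))
    return sum(nj * (a + b * xj) - math.exp(a + b * xj) for nj, xj in zip(n, x))

grid = [i / 1000 for i in range(-2000, 2001)]
b_clogit = max(grid, key=clogit_loglik)
b_poisson = max(grid, key=poisson_loglik)
print(b_clogit, b_poisson)
```

The two log-likelihoods differ only by the constant N ln N - N, so they are maximized at the same coefficient.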

Gravity Models

Choices can also be modeled as flows between origins and destinations. This is widely applied in the fields of trade, migration and commuting.  A flow from place j to k can be modeled as:

Ln(njk) = βXjk + αj + αk + εjk

where the alphas represent characteristics of the source and destination such as population, wages etc. A measure of the cost of moving (such as distance) can also be included. This literature has found strong distance decay effects, which are puzzling in many cases (e.g. trade) as the cost of moving goods further is now fairly marginal.
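As a rough sketch of how β might be recovered, the code below simulates flows from the gravity equation and estimates β by removing the origin and destination effects with a two-way demeaning. It assumes a complete matrix of flows between every origin and destination, a single Xjk equal to log distance, and no error term; all locations and parameter values are hypothetical.

```python
import math

origins = [0.0, 1.0, 3.0, 7.0]             # hypothetical origin locations on a line
destinations = [0.5, 2.0, 4.0, 9.0, 12.0]  # hypothetical destination locations
BETA_TRUE = -1.2                           # assumed distance decay parameter
alpha_o = [2.0, 1.5, 1.0, 0.3]             # assumed origin effects
alpha_d = [0.4, 0.9, 0.1, 1.1, 0.6]        # assumed destination effects

# X_jk: log distance between origin j and destination k.
x = [[math.log(abs(o - d)) for d in destinations] for o in origins]
# ln(n_jk) = beta * X_jk + alpha_j + alpha_k (noise omitted for clarity).
y = [[BETA_TRUE * x[j][k] + alpha_o[j] + alpha_d[k]
      for k in range(len(destinations))] for j in range(len(origins))]

def double_demean(m):
    """Subtract row and column means and add back the grand mean."""
    J, K = len(m), len(m[0])
    row = [sum(r) / K for r in m]
    col = [sum(m[j][k] for j in range(J)) / J for k in range(K)]
    grand = sum(row) / J
    return [[m[j][k] - row[j] - col[k] + grand for k in range(K)] for j in range(J)]

xt, yt = double_demean(x), double_demean(y)
num = sum(a * b for rx, ry in zip(xt, yt) for a, b in zip(rx, ry))
den = sum(a * a for rx in xt for a in rx)
beta_hat = num / den
print(beta_hat)
```

The demeaning is equivalent to including a dummy for every origin and every destination only when the flow matrix is complete; with missing or zero flows the fixed effects would need to be estimated explicitly.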

Discrete v Aggregate: discrete choice models have the advantage that individual- or firm-level characteristics can be incorporated, and there is a strong theoretical model underlying the estimations. Aggregate flow models, on the other hand, are easier to compute and do not require the functional form assumptions that are necessary for the non-linear maximum likelihood estimators. One disadvantage of the aggregate approach is that no separation of the individual and aggregate factors is possible.