Friday 27 September 2013

Origin Destination Data Expansion Tool

If an origin-destination (OD) survey only collects a sample of the trips made across a cordon area, this sample can be combined with total flow counts at each site to produce a complete dataset. The expanded data can represent the true situation on the ground quite accurately, and collecting only a sample can reduce the cost of carrying out such a survey.

Using ANPR technology, detection rates in excess of 95% can often be achieved (albeit at quite a high cost), which leaves little for the tool to do. However, if you have collected Bluetooth, WiFi or RFID signatures, or adverse weather has affected an ANPR survey, the sample rate can be significantly lower. In these cases it is especially useful to be able to carry out 'bi-proportional matrix balancing', sometimes known as the 'Furness method', easily.
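To make the idea concrete, here is a minimal sketch of bi-proportional (Furness) balancing in Python with NumPy. It is not the tool's own code: the function name `furness`, its parameters and the example numbers are hypothetical. It assumes origin sites are matrix rows, destination sites are columns, and that the origin and destination targets share the same grand total, which the method needs in order to converge.

```python
import numpy as np


def furness(seed, row_targets, col_targets, iterations=50, tol=1e-6):
    """Bi-proportional (Furness) balancing of an origin-destination matrix.

    seed        -- sampled trips, origins as rows and destinations as columns
    row_targets -- total counted flow entering the cordon at each origin site
    col_targets -- total counted flow leaving the cordon at each destination site
    """
    m = np.asarray(seed, dtype=float).copy()
    rows = np.asarray(row_targets, dtype=float)
    cols = np.asarray(col_targets, dtype=float)
    for _ in range(iterations):
        # Scale each row to its origin target (rows with no seed trips are left alone).
        r = m.sum(axis=1)
        m *= np.divide(rows, r, out=np.ones_like(rows), where=r > 0)[:, None]
        # Scale each column to its destination target.
        c = m.sum(axis=0)
        m *= np.divide(cols, c, out=np.ones_like(cols), where=c > 0)[None, :]
        # Column totals now match (where a column has seed trips); stop once rows agree too.
        if np.allclose(m.sum(axis=1), rows, atol=tol):
            break
    return m


# Hypothetical three-site sample with no same-site trips.
seed = [[0, 5, 3],
        [4, 0, 2],
        [6, 2, 0]]
expanded = furness(seed, row_targets=[80, 60, 70], col_targets=[100, 60, 50], iterations=500)
print(expanded.round(1))
```

Cells seeded with zero stay at zero, so the method only ever scales up OD pairs that were actually observed in the sample.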

I have developed a free online tool to carry out this process.

The same process is also available in Buchanan Computing's MicroMatch, but that is paid software, and it may not suit your data, since it is intended only for registration-plate data.

The process can similarly be applied to data collected through road-side interviews or questionnaires where only a small proportion of travellers provide information about their trips.

The software employs a process similar to that described here, and my implementation uses Python and NumPy.

Another use for the tool is to 'factor up' existing OD data from a previous survey to match new total flows collected more recently. There are arguably better approaches to this type of problem, however, such as a gravity model.
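For comparison, the gravity-model alternative mentioned above can be sketched in the same style. This is a swapped-in technique, not something the tool does: rather than balancing an old OD matrix, a doubly constrained gravity model balances a deterrence matrix exp(-beta * cost). The cost matrix (e.g. travel times between sites), the value of `beta` (which would normally be calibrated) and the function name are all assumptions for illustration.

```python
import numpy as np


def gravity_model(origin_totals, destination_totals, cost, beta, iterations=200):
    """Doubly constrained gravity model with an exponential deterrence function.

    T_ij = A_i * O_i * B_j * D_j * exp(-beta * c_ij); the balancing factors A_i and B_j
    are found implicitly by Furness-balancing the deterrence matrix.
    """
    o = np.asarray(origin_totals, dtype=float)
    d = np.asarray(destination_totals, dtype=float)
    # Positive seed, so row and column sums stay non-zero while all targets are positive.
    trips = np.exp(-beta * np.asarray(cost, dtype=float))
    for _ in range(iterations):
        trips *= (o / trips.sum(axis=1))[:, None]
        trips *= (d / trips.sum(axis=0))[None, :]
    return trips


# Hypothetical two-site example: new counts, travel costs in minutes, and beta.
trips = gravity_model([100, 200], [150, 150], cost=[[1, 3], [3, 1]], beta=0.5)
print(trips.round(1))
```

The deterrence function decides how trips are shared out, which is why this approach does not depend on an old survey's pattern being still valid.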

For full information about how to use it, please read on, as there is limited information on the webpage.


Two CSV (comma separated values) files are required - see the following examples:
LFData.csv
TCData.csv

The first contains the total flows into and out of the cordon at each site for each time interval. The software supports multiple vehicle classes, each in its own column with a labelled header. 5-minute, 15-minute or hourly intervals should all work, but there must be an entry for every time interval, even if all its counts are 0. These values are the targets that the seed data is factored up to, and will often be ATC data or manually classified vehicle link counts by direction.
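A minimal sketch of reading a target file of this kind and enforcing the "every interval must be present" rule, using only the standard library. The column names (`Site`, `Direction`, `Time`, then one column per class) are assumptions for illustration, not the actual layout of LFData.csv. The check only catches intervals that appear for some other site, so an interval missing from the whole file would go unnoticed.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout: the real LFData.csv columns may differ.
SAMPLE = """Site,Direction,Time,Car,HGV
1,In,07:00,120,8
1,In,07:15,0,0
1,Out,07:00,95,5
1,Out,07:15,101,6
"""

KEY_COLUMNS = ("Site", "Direction", "Time")


def load_targets(text):
    """Read target counts keyed by (site, direction, time), one value per class column."""
    reader = csv.DictReader(io.StringIO(text))
    classes = [name for name in reader.fieldnames if name not in KEY_COLUMNS]
    counts = {}
    all_times = set()
    times_by_site = defaultdict(set)
    for row in reader:
        site_dir = (row["Site"], row["Direction"])
        all_times.add(row["Time"])
        times_by_site[site_dir].add(row["Time"])
        counts[site_dir + (row["Time"],)] = {c: float(row[c]) for c in classes}
    # Every site/direction needs a row for every interval, even when all counts are 0.
    missing = {k: sorted(all_times - t) for k, t in times_by_site.items() if all_times - t}
    if missing:
        raise ValueError(f"missing intervals: {missing}")
    return classes, counts


classes, counts = load_targets(SAMPLE)
print(classes, counts[("1", "In", "07:15")])
```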

The second contains the seed data to be expanded. This could be ANPR, Bluetooth or RFID match data for the cordon area, which constitutes a sample of the vehicles using the cordon.
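Before balancing, matched records have to be counted into a seed matrix. The sketch below assumes (hypothetically) that each record has already been reduced to an (origin site, destination site) pair; the function name and sample records are illustrative only. In practice you would build one such matrix per time interval and vehicle class.

```python
import numpy as np


def seed_matrix(matches, sites):
    """Count matched (origin site, destination site) pairs into an OD matrix.

    matches -- iterable of (origin, destination) tuples, one per matched vehicle
    sites   -- ordered list of site labels used for the matrix rows and columns
    """
    index = {site: i for i, site in enumerate(sites)}
    od = np.zeros((len(sites), len(sites)))
    for origin, destination in matches:
        od[index[origin], index[destination]] += 1
    return od


# Hypothetical matched records from an ANPR or Bluetooth survey.
matches = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A")]
od = seed_matrix(matches, sites=["A", "B", "C"])
print(od)
```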

Seed values from this file can optionally be fixed so that they are not expanded. This lets the software serve a different purpose: 'filling in' OD match pairs where data is unavailable. This can happen when large junctions are enumerated from video footage by following vehicles through the junction manually, and one or more turning movements cannot be seen clearly enough. In that case, this method may provide an acceptable alternative to an expensive resurvey. When using the tool this way, the missing turning movements are seeded with low values, which the balancing then hopefully adjusts until they reach equilibrium. You may want to run the data through twice: once to fill in the missing OD match pair, and then again to adjust for any inaccuracy in the recorded turning counts.
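Fixing seed cells can be sketched as a small extension of Furness balancing: fixed cells are set aside, their totals are subtracted from the targets, and only the remaining cells are scaled. This is an assumed approach rather than the tool's documented behaviour, and the function name and junction numbers are hypothetical. It also does not handle the case where the fixed counts already exceed a target, which would leave a negative residual.

```python
import numpy as np


def furness_with_fixed(seed, fixed, row_targets, col_targets, iterations=100):
    """Furness balancing in which cells marked True in `fixed` keep their seed value."""
    seed = np.asarray(seed, dtype=float)
    fixed = np.asarray(fixed, dtype=bool)
    held = np.where(fixed, seed, 0.0)
    free = np.where(fixed, 0.0, seed)
    # Free cells only have to make up what the fixed cells do not already account for.
    rows = np.asarray(row_targets, dtype=float) - held.sum(axis=1)
    cols = np.asarray(col_targets, dtype=float) - held.sum(axis=0)
    for _ in range(iterations):
        r = free.sum(axis=1)
        free *= np.divide(rows, r, out=np.ones_like(rows), where=r > 0)[:, None]
        c = free.sum(axis=0)
        free *= np.divide(cols, c, out=np.ones_like(cols), where=c > 0)[None, :]
    return held + free


# Hypothetical three-arm junction: turning counts from video, with B -> C not visible
# and seeded with a low value of 1. Everything except B -> C is fixed.
turns = [[0, 40, 25],
         [30, 0, 1],
         [20, 35, 0]]
fixed = [[True, True, True],
         [True, True, False],
         [True, True, True]]
result = furness_with_fixed(turns, fixed, row_targets=[65, 50, 55], col_targets=[50, 75, 45])
print(result[1, 2])  # filled-in B -> C movement
```

Here the link counts leave exactly 20 vehicles unaccounted for on entry arm B and on exit C, so the missing B -> C movement is scaled from 1 up to 20 while the observed turns are untouched.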

In some cases it may also help to untick the option to expand return trips. This can be useful if you have 95% of the data and the tool is adding unwanted extra pairs that seem out of place (u-turns at a junction, for example). This may happen if there were problems with the raw match data.
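One way to picture the return-trip option, assuming (an interpretation, not confirmed by the tool) that "return trips" are vehicles entering and leaving the cordon at the same site, i.e. the diagonal of the OD matrix, and that unticking the option holds those cells at their seed values instead of expanding them. The function name and numbers are hypothetical.

```python
import numpy as np


def expand_without_return_trips(seed, row_targets, col_targets, iterations=200):
    """Furness balancing that leaves origin == destination cells at their seed values."""
    seed = np.asarray(seed, dtype=float)
    held = np.diag(np.diag(seed))  # return trips: same site in and out, kept as observed
    free = seed - held             # all other OD pairs are expanded
    rows = np.asarray(row_targets, dtype=float) - held.sum(axis=1)
    cols = np.asarray(col_targets, dtype=float) - held.sum(axis=0)
    for _ in range(iterations):
        r = free.sum(axis=1)
        free *= np.divide(rows, r, out=np.ones_like(rows), where=r > 0)[:, None]
        c = free.sum(axis=0)
        free *= np.divide(cols, c, out=np.ones_like(cols), where=c > 0)[None, :]
    return held + free


seed = [[2, 5, 3],
        [4, 1, 2],
        [6, 2, 0]]
expanded = expand_without_return_trips(seed, row_targets=[82, 61, 70], col_targets=[102, 61, 50])
print(expanded.round(1))
```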

The number of iterations you require depends on how well your dataset is converging, and what characteristics you require from the data. Experimenting with this value is probably a good idea.
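To judge how many iterations are enough, it can help to record how far the totals are from their targets after each pass. A sketch with a hypothetical helper name and example data: the error measured is the largest absolute difference between row totals and origin targets, since the column totals match exactly after each column step whenever every column has seed trips.

```python
import numpy as np


def furness_history(seed, row_targets, col_targets, iterations):
    """Run a fixed number of Furness iterations, recording the largest row-total error."""
    m = np.asarray(seed, dtype=float).copy()
    rows = np.asarray(row_targets, dtype=float)
    cols = np.asarray(col_targets, dtype=float)
    history = []
    for _ in range(iterations):
        r = m.sum(axis=1)
        m *= np.divide(rows, r, out=np.ones_like(rows), where=r > 0)[:, None]
        c = m.sum(axis=0)
        m *= np.divide(cols, c, out=np.ones_like(cols), where=c > 0)[None, :]
        history.append(float(np.abs(m.sum(axis=1) - rows).max()))
    return m, history


seed = [[0, 5, 3],
        [4, 0, 2],
        [6, 2, 0]]
_, history = furness_history(seed, [80, 60, 70], [100, 60, 50], iterations=50)
print(f"largest row error after 1 iteration: {history[0]:.4f}")
print(f"largest row error after {len(history)} iterations: {history[-1]:.4f}")
```

If the error stops shrinking well above zero, the seed pattern and targets may be inconsistent (for example, a site with a target but no seed trips), and more iterations will not help.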

Once you click 'Expand Data', the data should be processed within a minute, and you will be presented with several files containing different views of the new dataset.

Processed Data - this is the expanded trip data in the format you supplied
Time Series Data - this is convenient for graphing the distribution of trips over time
Target Match - this shows how closely the seed values converged upon the target values
Processed Data Summary - this aggregates the Processed Data over the whole survey period
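The Target Match and Processed Data Summary views can be approximated as follows. This is an illustration only: the report layout, the "in"/"out" labelling (rows treated as sites where trips enter, columns as sites where they leave) and the (interval, origin, destination) array shape are assumptions, not the tool's actual file formats.

```python
import numpy as np


def target_match(balanced, row_targets, col_targets):
    """Return (direction, site index, target, achieved, difference) for every site."""
    balanced = np.asarray(balanced, dtype=float)
    report = []
    for direction, achieved, targets in (
        ("in", balanced.sum(axis=1), row_targets),
        ("out", balanced.sum(axis=0), col_targets),
    ):
        for site, (t, a) in enumerate(zip(targets, achieved)):
            report.append((direction, site, float(t), float(a), float(a - t)))
    return report


def summarise(per_interval):
    """Collapse an (interval, origin, destination) array into one OD matrix for the survey."""
    return np.asarray(per_interval, dtype=float).sum(axis=0)


report = target_match([[0, 2], [3, 0]], row_targets=[2, 3], col_targets=[3, 2])
for line in report:
    print(line)
survey_total = summarise(np.ones((4, 2, 2)))  # four intervals of a 2 x 2 OD matrix
print(survey_total)
```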


I would welcome any comments or suggestions regarding this tool. If you have data in a format that you cannot convert to the required one, please let me know; if time allows, I may add support for it.

If your requirements are more complicated than this, I would also be happy to discuss them with you.
