Predicting GDP with spatial network analysis

Here’s an interesting little plot I thought I’d share. It shows Mean Angular Distance (MAD, at the 320 km scale) on the x axis and GDP per capita on the y axis. High MAD means less accessible, so the plot shows a trend where lower accessibility is associated with lower productivity.

The blue outlier is Aberdeen, by the way – rich in oil income but situated in the back of beyond. And London is a long way off the chart to the top left.

Interestingly, MAD measures only the accessibility of spatial networks rather than their raw quantity. You might think raw network quantity (highly correlated with population and jobs, r² > 0.9) would be a better predictor of GDP than accessibility. Maybe it is, if you get the scale of measurement right (accessibility on a large scale could be a proxy for quantity on a small scale). But for the range of scales I tested (10, 20, 40, 80, 160, 320 and 640 km) it wasn’t.

Anyway, this is just tinkering out of interest, rather than a serious piece of research. More importantly, it was a test of a truly large-scale network analysis with the upcoming sDNA version 2. We saved computation time by only computing accessibility for a sample of points in each region (sDNA will have the capacity to do this). In fact the whole computation took less time than loading and saving the data, which for a million-link feature class is quite substantial. The whole job was done in an hour or two.

UPDATE 30/10/2014

I followed this up with the new sDNA+, making a model with population assigned to the network. It was tested for many more radii: 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 140, 160, 180, 200, 240, 280, 320, 360 and 400 km; in each NUTS region I sampled 20 points on the network to compute accessibility on these scales.

It turns out by far the best predictor was plain ol’ population (Weight, within sDNA – when you have assigned population data to network weight). Population within a 10km network buffer has r=0.7 with GDP per capita. This beat network gravity models formed with the MAD, MED, NQPDA and NQPDE accessibility stats.
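For anyone wanting to repeat this kind of comparison, it boils down to one correlation per candidate predictor against GDP per capita. Here is a minimal sketch in plain Python. The function names pearson_r and rank_predictors are my own illustrative choices (not part of sDNA), and the toy numbers are made up purely to show the call, not taken from the analysis.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def rank_predictors(predictors, gdp_per_capita):
    """Return (name, r) pairs sorted by strength of correlation, strongest first.

    predictors: dict mapping a (hypothetical) column name to one value per region.
    """
    scored = [(name, pearson_r(values, gdp_per_capita))
              for name, values in predictors.items()]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Toy, made-up values for four imaginary regions (illustration only).
toy_gdp = [10.0, 12.0, 15.0, 19.0]
toy_predictors = {
    "population_10km": [1.0, 2.0, 3.0, 4.0],
    "mad_320km": [9.0, 7.0, 8.0, 3.0],
}
for name, r in rank_predictors(toy_predictors, toy_gdp):
    print(name, round(r, 2))
```

Sorting on the absolute value matters here because a measure like MAD is expected to correlate negatively with GDP, and a strong negative r is still a strong predictor.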

Computing accessibility on all those scales, for 20 points in each of 113 regions, ran overnight on my Core i3 (not too fast) processor. I skipped the more expensive sDNA computations by putting nobetweenness;nojunctions;nohull into the advanced config. In fact, I might as well paste the whole advanced config for this analysis here:

nobetweenness;nojunctions;nohull;skipzeroweightorigins;origweight=compute;destweight=link_population

Here compute is a field I set to 1 for links I wanted to compute accessibility for, and 0 otherwise. I chose these links at random, with a probability inversely proportional to the number of links in each region.
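The random selection can be sketched roughly as follows. This is not the exact script I used, just one way to do it under some assumptions: each link carries a region identifier, and the constant of proportionality is set so that each region gets about 20 sampled links on average. The function name assign_compute_flags and the parameter target_per_region are made up for illustration.

```python
import random
from collections import Counter

def assign_compute_flags(link_regions, target_per_region=20, seed=0):
    """Return one 0/1 flag per link.

    link_regions: the region identifier of each link, in link order.
    Each link is selected with probability target_per_region divided by
    the number of links in its region (capped at 1), so large regions
    are not over-sampled.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    links_per_region = Counter(link_regions)
    flags = []
    for region in link_regions:
        probability = min(1.0, target_per_region / links_per_region[region])
        flags.append(1 if rng.random() < probability else 0)
    return flags

# Toy example: a big region "A" with 1000 links and a tiny region "B" with 10.
toy_regions = ["A"] * 1000 + ["B"] * 10
toy_flags = assign_compute_flags(toy_regions)
print(sum(toy_flags[:1000]), sum(toy_flags[1000:]))
```

Note that sampling this way gives roughly 20 links per region rather than exactly 20. If you need an exact count, drawing a fixed-size sample per region (for instance with random.sample) would be the alternative.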