Saturday, 22 November 2014

Where is that ATM?

Mike Sanderson suggests how Big Data can help us make better decisions - provided we can crack the thorny issue of ontology

About thirty years ago we had this marketing strapline for a multi-dimensional cube analyser. It went along the lines of being able to carry out work efficiently (doing the job right) and effectively (doing the right job). Twenty years ago Hammer & Champy¹ set the IT consultancy world alight with the concept of business process re-engineering (BPR). This was about doing the job right and re-organising processes accordingly.

The IT industry now has the tools to identify what the right job is, to go alongside our ability to engineer business processes. It’s known as Big Data in the industry. My proposition is simple: we now have tools to analyse structured and unstructured data to make better decisions, like where to find an ATM at the local supermarket.

Like BPR, its origins came from the consultancy world. In the Big Data phenomenon, McKinsey² grabbed a large share of the zeitgeist from day one. Like most in our industry, I suppose, I groaned inwardly as the cynic in me read the column inches on Big Data. After all, as Henderson said in September’s issue³, we have always had big data in spatial. The trouble is that McKinsey’s view of big data is built around personal location data – that coming from tablets and mobiles. What must we do so that the location industry plays a part?

About co-ordinates

Using location data is dangerous. Examine the two images at the top of the next page concerning the possible range of North Korean missiles. That on the left might equate to a big data answer without geodesy playing a part. That on the right ... well, make up your own mind.
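
To make the geodesy point concrete, here is a minimal sketch in Python. It is an illustration only, not the method behind either image: it assumes a spherical Earth, and the launch point, range and function names are hypothetical. It compares a "flat map" range endpoint, where a degree is treated as the same distance in every direction, with one computed along a great circle.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical Earth is an assumption


def destination(lat_deg, lon_deg, bearing_deg, distance_km):
    """Point reached by travelling distance_km along a great circle."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    brg = math.radians(bearing_deg)
    ang = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    lat2 = math.asin(math.sin(lat1) * math.cos(ang)
                     + math.cos(lat1) * math.sin(ang) * math.cos(brg))
    lon2 = lon1 + math.atan2(math.sin(brg) * math.sin(ang) * math.cos(lat1),
                             math.cos(ang) - math.sin(lat1) * math.sin(lat2))
    # Normalise longitude to the range [-180, 180)
    lon2_deg = (math.degrees(lon2) + 180.0) % 360.0 - 180.0
    return math.degrees(lat2), lon2_deg


def naive_destination(lat_deg, lon_deg, bearing_deg, distance_km):
    """Flat-map shortcut: treat one degree as about 111.32 km everywhere."""
    deg = distance_km / 111.32
    brg = math.radians(bearing_deg)
    return lat_deg + deg * math.cos(brg), lon_deg + deg * math.sin(brg)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on the sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical launch point and range, chosen only for illustration
LAUNCH = (39.0, 125.75)
RANGE_KM = 6000.0

geodesic_point = destination(*LAUNCH, 90.0, RANGE_KM)
naive_point = naive_destination(*LAUNCH, 90.0, RANGE_KM)

geodesic_reach = haversine_km(*LAUNCH, *geodesic_point)
naive_reach = haversine_km(*LAUNCH, *naive_point)
```

Under these assumptions the geodesic endpoint sits at the intended range, while the flat-map endpoint due east falls well under 5,000 km from the launch point. The naive circle is therefore the wrong size and shape at this latitude.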

About scale

In thinking about this subject I came across the Data Science@Berkeley Blog on “What is Big Data?” The first contributor (John Akred, CTO of Silicon Valley Data Science) included these words in his contribution:

“Advances in sensing technologies, the digitization of commerce and communications, and the advent and growth in social media are a few of the trends which have created the opportunity to use large scale, fine grained data…”.

So I have no need to be worried about scale. It seems they get it after all, or perhaps not? As with the geodesy example, the unwary will not necessarily get the right answer. We can smile knowingly about these issues or we can help the big data users out. But before we get to how we might do this, there is one area with which the spatial industry and the big data industry both struggle.

About semantics

We can’t get to the right answer if we can’t frame a question that big data and the spatial industry jointly understand. This is not about a point-in-polygon search; it’s about ontology, the word that frightens everyone off. What do I mean when I say St. Pancras station? It could be the façade, the concourse, the tracks, the platforms, the hotel or all of these. But unless we know, we will arrive at either the wrong answer or the wrong location. Big data will need to solve this issue if it is going to succeed.
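
The St. Pancras example can be sketched as a tiny data structure. This is a hedged illustration in Python rather than a real ontology: the `Place` class and the `is_ambiguous` helper are hypothetical names, and the parts listed are simply those named above. The point is that an unqualified name resolves to several candidate features, so a system has to ask which one is meant before it can return a location.

```python
from dataclasses import dataclass, field


@dataclass
class Place:
    """A named place made up of several distinct parts (hypothetical model)."""
    name: str
    parts: list = field(default_factory=list)

    def resolve(self, part=None):
        """Return the parts a query refers to; all of them when unqualified."""
        if part is None:
            return list(self.parts)
        matches = [p for p in self.parts if p == part]
        if not matches:
            raise KeyError(f"{self.name} has no part called {part!r}")
        return matches


def is_ambiguous(place, part=None):
    """A query is ambiguous when it matches more than one part."""
    return len(place.resolve(part)) > 1


st_pancras = Place(
    "St. Pancras station",
    ["façade", "concourse", "tracks", "platforms", "hotel"],
)
```

Asking for `st_pancras.resolve()` returns all five parts, which is the ambiguity described above; asking for `st_pancras.resolve("hotel")` returns one.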

What next?

I can think of three options where the spatial industry can play a part. There are probably others, but I do not have space for them here. Some of these ideas were also discussed last month by Millard (Frankenstein’s Data, GeoConnexion, Sep 2014); I am just trying to formalise them here. They are:

  • Intermediate

  • Add co-ordinates to enterprise data

  • Provide a platform to assure data quality.

Intermediate: This involves taking the complexity of scale and geodesy out of spatial data for the non-spatial user. The user interface will have to solve the ontology issue described above and, as Millard describes, the customer will want to know the costs and benefits of using open data only, a mixture of open and licensed data, or licensed data only.

Add co-ordinates to enterprise data: The biggest issue is that most enterprise data does not have geo co-ordinates attached. Two examples where geo co-ordinates are not typically entered are the work orders raised in asset-intensive industries through enterprise work management applications, and the buy and sell orders of the financial markets. In the first example we may be trying to optimise work planning; in the second, to detect fraud. The 1990s solution to these challenges involved heavy lifting of data and trying to integrate GIS and work management applications. That hasn’t worked effectively.

Our industry has been good at geocoding and reverse geocoding but has tended to internalise the results of these operations within the GIS. It’s time to do it the other way around: attach the co-ordinates to the enterprise data and, if a map is the required output, then interface with the GIS.
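
Doing it "the other way around" might look like the following minimal sketch in Python. Everything named here is an assumption made for illustration: the `GAZETTEER` lookup, the asset identifiers, the placeholder co-ordinates and the `add_coordinates` function are hypothetical, and a real system would call a geocoding service rather than a dictionary. The idea is that the co-ordinates are written back onto the work order record itself instead of staying inside the GIS.

```python
# Hypothetical gazetteer: asset id -> (latitude, longitude).
# The identifiers and co-ordinates are placeholders, not real assets.
GAZETTEER = {
    "PUMP-001": (51.50, -0.10),
    "VALVE-017": (53.48, -2.24),
}


def add_coordinates(work_orders, gazetteer):
    """Return copies of the work orders with lat/lon attached where known."""
    enriched = []
    for order in work_orders:
        record = dict(order)  # copy, so the source record is left untouched
        location = gazetteer.get(order.get("asset_id"))
        if location is None:
            # Unknown asset: keep the record, but flag the missing location
            record["lat"], record["lon"] = None, None
        else:
            record["lat"], record["lon"] = location
        enriched.append(record)
    return enriched


# Hypothetical work orders as they might come out of a work management system
orders = [
    {"order_id": 1, "asset_id": "PUMP-001", "task": "inspect"},
    {"order_id": 2, "asset_id": "METER-404", "task": "replace"},
]

located_orders = add_coordinates(orders, GAZETTEER)
```

Records that cannot be located are kept with empty co-ordinates rather than dropped, so the enterprise application can see which work orders still need a location.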

Provide a platform to assure data quality: I would contend that it is not enough to let the market decide what represents a good quality data set. As I and others have described, spectacular failures can occur when spatial data are misused. 1Spatial has provided a cloud-based platform for nearly five years for users to test data for geometric and topological consistency as part of a triple A data approach.

The genie is out of the bottle in terms of personal location data being available to big data decision-makers (and it has been to governmental decision-makers for even longer), so I can do no better than quote Dr. Brian T. Gray, Assistant Deputy Minister, Earth Sciences Sector, Natural Resources Canada:

“Canadian government departments are working together to launch the Federal Geospatial Platform to enable online discovery, access, integration and visualization of multiple layers of accurate, authoritative and accessible geospatial information. That way users can search once and find everything.” 


So will the combination of BPR and Big Data make organisations more effective and more efficient? The answer to that is yes if the GIS industry helps the enterprise handle location more easily. And we will both benefit from working together to crack the ontology problem.
