Tag Archives: public data

UK crime data as clear as mud

I’m a nosy neighbour. I like to know what’s going on in my area: who’s bought the house next door, local planning applications, any dodgy activity going on? My husband and I are both self employed so there is usually at least one of us out and about in Caversham during the day. That means we have the chance to chat with our local postman, workmen digging up the road, Police Community Support Officers doing their rounds and with people in the local shops, bank and post office. Crime, not surprisingly, is a major topic on our “watch list” and just over two years ago police forces in England and Wales started to provide access to local crime statistics via online maps. The new service allowed you to drill down to ward level and view trends in burglary, robbery, theft, vehicle crime, violent crime and anti-social behaviour.

The format varied from one police force to another. For example Thames Valley Police provided a basic map and tables of data:

Thames Valley Police 2008 crime rates

Others such as the Metropolitan Police included additional graphical representation of the statistics such as  bar charts:

Metropolitan Police 2008 Crime Rates

None of them pinned down incidents to individual streets or addresses but they did give you an idea of the level of crime in a particular neighbourhood, how it compared with the same period the previous year and whether it was high, above average, average, below average, low or no crime. They were short, though, on detailed definitions of what each category of crime included. I looked at these maps out of personal curiosity rather than using them for any serious business application, and I made certain assumptions such as murder being included under ‘Violence against the person’. That may not have been the case.

Some police forces placed obvious links to the information on their home pages whilst others buried the data in obscure corners of their web sites. The crime maps where then all moved to the CrimeMapper web site – the Thames Valley Police map can still be seen at http://maps.police.uk/view/thames-valley – but that has now been integrated into Police.uk website, which “includes street-level crime data and many other enhancements“.

All you have to do is go to http://www.police.uk/, type in your postcode, town, village or street into the search box and “get instant access to street-level crime maps and data, as well as details of your local policing team and beat meetings“. The first screen looks good with news of local meetings, events, recent tweets, YouTube videos and – as the home page promised – information on my local policing team.

Police UK page for RG4 5BE

When I focus on the map to look at the detail there are markers for the location of the crimes and clicking on them gives you a brief description of the crime:

Detail on Police UK crime rates for Caversham

In this example, the detail box had details of two crimes “on or near Anglefield Road” and this is where I started to become confused. Were the burglary and the violent crime part  of the same incident or totally separate? Furthermore, if you look in the left hand panel of the screen you will see “To protect privacy, individual addresses are not pinpointed on the map. Crimes are mapped to an anonymous point on or near the road where they occurred.” Fair enough, but I would like to know how near ‘near’ is. 100, 200, 400 yards? Half a mile, a mile? And does the focus shift from one street to another from one month to the next? If it stays put then a street could gain a crime rate reputation that it does not deserve but if it shifts there is no way one can compare data from one month or year to another, which brings me to my next question.

Why is there only one month’s data? Previous versions of the crime maps gave you three months data for the current and the previous year for comparison. There is nothing about this in the Help section of  the Police UK site but the Guardian reports:

police forces have indicated that whenever a new set of data is uploaded – probably each month – the previous set will be removed from public view, making comparisons impossible unless outside developers actively store it.” (Crime maps are ‘worse than useless’, claim developers http://www.guardian.co.uk/technology/2011/feb/02/uk-crime-maps-developers-unhappy?CMP=twt_iph).

This means that if you want to run comparisons over time you will have to download the files and store them on your own system each month, or find someone else who is already doing it.

The Guardian article also says:

the Information Commissioner‘s Office (ICO) advised that tying crime reports to postcodes or streets with fewer than 12 addresses would render the individuals involved too identifiable. The police have also decided to remove data about murders or sexual assaults.

With respect to the latter the help file on the Police UK site suggests otherwise:

Crimes have been grouped into six categories following advice from the Information Commissioner’s Office. This doesn’t mean that the crimes listed under ‘other’ are not seen as important. Rather it ensures that for some of the more sensitive crimes there is even greater privacy for the victims.

So which is it: murders and sexual assaults are not included at all or aggregated under “other”? Jonathan Raper says on his blog Placr News (“Five reasons to be cautious about street level crime data” http://placr.co.uk/blog/2011/02/five-reasons-to-be-cautious-about-street-level-crime-data/):

Some data is redacted eg sexual offences, murder. The Metropolitan Police has already released this data to ward level though… and it is easy to cross-reference one murder in one ward to reports in the local press at the same time

Data visualisations and mashups are becoming increasingly popular and make it considerably easier to assess a situation and view trends. The Guardian Datablog (http://www.guardian.co.uk/news/datablog), for example, encourages people to take data sets, mash them up and create their own visualisations, and upload a screen shot to  the Guardian Datastore on Flickr (http://www.flickr.com/groups/1115946@N24/). It is vital, though, that the source of the data, whether the full data set or just a selection has been used, and whether or not it is going to be updated is clearly spelt out. All too often one or even all of these are missing from the accompanying notes, and in some cases there are no notes at all!

An example of good practice is “UK transport mapped: Every bus stop, train station, ferry port and taxi rank in Britain” (http://www.guardian.co.uk/news/datablog/2010/sep/27/uk-transport-national-public-data-repository). The posting clearly states the source (http://data.gov.uk/dataset/nptdr) and its coverage:

“A snapshot of every public transport journey in Great Britain for a selected week in October each year. The dataset is compiled with information from many sources, including local public transport information from each of the traveline regions, also coach services from the national coach services database and rail information from the Association of Train Operating Companies”

It then goes on to specify the time period  (5-11 October, 2009) and the tools that were used to create the visualisation.

Another is the “Live map of London Underground trains” (http://traintimes.org.uk/map/tube/). This shows “all trains on the London Underground network in approximately real time“. The source is a live data feed from Transport for London (TfL) and the notes state that a “small number of stations are misplaced or missing; occasional trains behave oddly; some H&C and Circle stations are missing in the TfL feed.” It would be helpful to have a list of those missing stations, but the site has at least brought the issue of potential missing data to the users’ attention.

Returning to the Police.uk crime data, there are three major problems with the site for me as a researcher:

1. Are all crimes included in the database, or are some such as murders and sexual assaults excluded altogether or aggregated under “other”? More detailed and unambiguous scope notes please.

2. The street data level is useless. The markers are not exact locations but “near” to, there is no definition of “near”, no information on how the position of the marker is calculated or the geographic radius that it covers. It would be better to return to aggregated data at the ward level.

3. There are no options for comparing time periods and it seems that historical data will not be available on the web site. An ad hoc researcher will have to spend time and effort tracking down a developer or a web site that is downloading and keeping copies of all of the datasets as they are published.

The new crime data web site is a retrograde step. We need transparency and clarity rather than the muddle and confusion that has been generated by the lack of information on what is being provided.