Some of the slides that I used as part of my June 2016 workshops on Business Information are now available on both SlideShare and authorSTREAM. The workshop held in the last week of June inevitably included a session on the EU referendum and the Brexit result. A few of those extra slides are included in this edited version of the presentation.
On one of my recent workshops I was asked if I used Google as my default search tool, especially when conducting business research. The short answer is “It depends”. The long answer is that it depends on the topic and type of information I am looking for. Yes, I do use Google a lot, but if I need to make sure that I have covered as many sources as possible I also use Google alternatives such as Bing, Millionshort, Blekko etc. On the other hand, depending on the type of information I require, I may ignore Google and its ilk altogether and go straight to one or more of the specialist websites and databases.
Here are just a few of the free and pay-per-view resources that I use.
Information on industry sectors
My favourite collection of guides on industry sectors is The British Library Business Information and IP Centre’s industry guides. These highlight relevant industry directories, databases, publications and web sites and are excellent starting points if you are new to a sector.
There are at least a dozen statistics sites that I use on a regular basis, but if I’m unsure of where to look or want to make sure I haven’t missed anything I use OFFSTATS – Official Statistics on the Web at http://www.offstats.auckland.ac.nz. It is a great starting point for official statistical sources by country, region, subject or a combination of categories. All of the content in the database is in the public domain, is available through the Internet and has been quality assessed by staff at The University of Auckland Library.
Official company information
If I want to confirm the existence of a company or obtain filings and accounts I usually go direct to the relevant official company registry. I have a list of the registries that can be searched online at http://www.rba.co.uk/sources/registers.htm. As many of my enquiries are for UK companies I am a frequent visitor to Companies House at http://www.companieshouse.gov.uk/. Some information is free but filings and accounts are priced. There are several companies that repackage Companies House data and sometimes make extra data or analysis available free of charge, for example Company Check at http://www.companycheck.co.uk/, which enables you to search by company or director’s name. Risk reports, information on CCJs, and some official filings are priced if you do not have a subscription to the full service.
Share price information
For free share price information I use Yahoo Finance (http://uk.finance.yahoo.com/) and Google Finance (https://www.google.co.uk/finance). Both of these services provide charts and news on shares on the major stock markets. Google’s graphs are ‘annotated’ with labels that link to news articles listed to the right of the graph, so you can see whether or not a particular event or announcement has affected a share price. Both offer free, daily historical share prices in figures. As well as viewing historical graphs for share prices you can download the data as a spreadsheet.
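Once you have downloaded the historical prices, a few lines of code will show how much the price moved on each day, which makes it easier to match large moves against the news annotations. This is a minimal sketch that assumes the data has been saved as CSV with `Date` and `Close` columns, oldest first; the actual column names and ordering depend on the service, and the figures below are made up.

```python
import csv
import io

def daily_changes(csv_text):
    """Return (date, % change from previous close) pairs.

    Assumes a CSV with 'Date' and 'Close' columns sorted oldest first.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    changes = []
    # Pair each day with the day before it
    for prev, curr in zip(rows, rows[1:]):
        prev_close = float(prev["Close"])
        curr_close = float(curr["Close"])
        pct = (curr_close - prev_close) / prev_close * 100
        changes.append((curr["Date"], round(pct, 2)))
    return changes

# Made-up prices for illustration only
sample = """Date,Close
2016-01-04,100.0
2016-01-05,102.0
2016-01-06,91.8
"""

for day, pct in daily_changes(sample):
    print(day, pct)
```

A sudden double-digit drop in the output is the cue to go back to the chart and look for the news item attached to that date.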
For news alerts I use a mixture of bookmarked searches, Google email and RSS alerts, and RSS feeds from a wide range of blogs and news sites. I find that Google alerts are erratic and unreliable but they do sometimes pick up something unique, so I still include them in the mix. RSS feeds are my main source of current awareness, and when a news feed is rather broad in coverage I use my RSS reader’s search function to identify the articles that are of interest to me. I use a desktop reader called RSSOwl (http://www.rssowl.org/) but Inoreader (http://inoreader.com/) is a web-based service that offers similar features and options.
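Filtering a broad feed by keyword, as an RSS reader’s search function does, can be sketched with Python’s standard library. This assumes an RSS 2.0 feed (Atom feeds use different element names) that you have already fetched; the feed content below is invented for the example.

```python
import xml.etree.ElementTree as ET

def matching_items(rss_xml, keywords):
    """Return (title, link) for RSS 2.0 items whose title or
    description mentions any keyword (case-insensitive)."""
    root = ET.fromstring(rss_xml)
    wanted = [k.lower() for k in keywords]
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        text = (title + " " + description).lower()
        if any(k in text for k in wanted):
            results.append((title, item.findtext("link", default="")))
    return results

# Invented feed for illustration
feed = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>New company data release</title><link>http://example.com/1</link>
<description>Filings and accounts</description></item>
<item><title>Football results</title><link>http://example.com/2</link>
<description>Weekend scores</description></item>
</channel></rss>"""

print(matching_items(feed, ["accounts", "statistics"]))
```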
Who is behind a site?
If I am to use any information from the web for business purposes I need to know who is behind the website. DomainTools http://www.domaintools.com/ is one of many services that will tell you who owns a domain name, unless the owner is hiding behind an agent or privacy protection service. There is also a Whois+ extension for Chrome (my default browser) that can be used to run a quick and easy check on the domain name of a displayed page.
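Services like DomainTools sit on top of the WHOIS protocol, which is simple enough to query directly: open a TCP connection to a registry’s WHOIS server on port 43, send the domain name and read the reply (RFC 3912). The sketch below assumes a small, hypothetical and incomplete mapping of top-level domains to servers, falling back to IANA’s server, which returns details of the TLD and its registry; check the right server for each TLD before relying on it.

```python
import socket

# Hypothetical, incomplete TLD -> WHOIS server mapping (an assumption)
WHOIS_SERVERS = {
    "uk": "whois.nic.uk",
    "com": "whois.verisign-grs.com",
    "net": "whois.verisign-grs.com",
}

def choose_server(domain):
    """Pick a WHOIS server from the domain's top-level domain."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return WHOIS_SERVERS.get(tld, "whois.iana.org")

def whois(domain, timeout=10):
    """Send a plain WHOIS query over TCP port 43 and return the reply."""
    server = choose_server(domain)
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

# Example (needs network access):
# print(whois("rba.co.uk"))
```

As with the web services, a reply that names only a privacy or proxy service tells you that the owner has chosen to stay hidden.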
If you are interested in finding out more about business information resources I am running a workshop for TFPL in London on June 6th, 2014 (originally scheduled for March 6th). Details are on the TFPL website.
Update: please note change of date for the next business information workshop. It is now being held on June 6th, 2014.
I recently mentioned Gridwatch (How the UK’s electricity is generated http://www.rba.co.uk/wordpress/2013/08/21/how-the-uks-electricity-is-generated/) as a way of tracking how much energy is passing through the National Grid and the technology used to generate that electricity. Although Gridwatch is a great way of observing the total amount of electricity that is generated by each technology – gas, coal, wind etc – it does not go into any detail with respect to individual installations. The Digest of UK energy statistics (DUKES) produced by the UK Department of Energy & Climate Change (http://www.gov.uk/government/organisations/department-of-energy-climate-change/series/digest-of-uk-energy-statistics-dukes) includes a spreadsheet listing all of the operational power stations, fuel that they use, installed capacity, location, and the year that generation began. The direct link to the spreadsheet is http://www.gov.uk/government/uploads/system/uploads/attachment_data/file/226895/dukes5_11.xls. This is historical data and the current list refers to plants in operation at the end of May 2013.
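If you save the DUKES spreadsheet as CSV you can quickly total installed capacity by fuel. This is a minimal sketch; the column names `Fuel` and `Installed Capacity (MW)` are assumptions, so match them to the headings in your copy of the file, and the stations and figures below are invented.

```python
import csv
import io
from collections import defaultdict

def capacity_by_fuel(csv_text, fuel_col="Fuel",
                     capacity_col="Installed Capacity (MW)"):
    """Sum installed capacity (MW) per fuel type.

    Column names are assumptions; adjust them to the spreadsheet.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[fuel_col].strip()] += float(row[capacity_col])
    return dict(totals)

# Invented rows for illustration
sample = """Station Name,Fuel,Installed Capacity (MW)
Station A,gas,1200
Station B,coal,2000
Station C,gas,400
"""

print(capacity_by_fuel(sample))
```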
UK Energy Watch has a map (http://www.ukenergywatch.org/Electricity/PowerStations) showing the location of UK power stations of 400 MW or larger, so it is by no means comprehensive. It does allow you, though, to click on a plant and display current generation, except for CCGT (Combined Cycle Gas Turbine) stations.
There are more options available when it comes to what are called renewables (for example wind, solar, biomass, hydro). The DECC’s RESTATS interactive map at http://restats.decc.gov.uk/app/pub/map/map/ enables you to search by technology, region, county, planning authority and application status. It also has a separate map for wind farm capacities.
The site information includes installed capacity and details of the planning application, but not how much energy is actually being produced. RESTATS says that “Information is held on the performance of operational projects but owing to the need to maintain the commercially sensitive nature of these data, specific site details and performance figures are not disclosed”.
UK Data Explorer has produced a renewables map at http://ukdataexplorer.com/renewables/ that uses the RESTATS data and shows operational renewable electricity sites over 0.01 MW. The different colours represent the type of plant and the area of each circle indicates installed capacity (maximum power output).
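Because it is the area of a circle, not its radius, that represents capacity, a plant with four times the capacity of another is drawn with only twice the radius. Since area = πr², the radius has to grow with the square root of capacity. The scale factor below is an arbitrary, hypothetical value; the point is the square-root relationship.

```python
import math

def circle_radius(capacity_mw, px_per_sqrt_mw=2.0):
    """Radius of a circle whose area is proportional to capacity.

    area = pi * r**2, so r grows with sqrt(capacity).
    px_per_sqrt_mw is an arbitrary scale factor (an assumption).
    """
    return px_per_sqrt_mw * math.sqrt(capacity_mw)

print(circle_radius(100), circle_radius(400))
```

This is worth bearing in mind when eyeballing the map: circles that look only a little bigger can represent much larger installations.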
To see details of a specific installation you should be able to hover over a point on the map. This did not work for me with some of the smaller plants and when I tried to zoom in on an area I often lost the background map.
The Interactive Map of Renewable and Alternative Energy Projects in the UK at http://www.renewables-map.co.uk is another interactive map and can be filtered by technology type and planning status.
According to the website the information is gathered from “a wide range of web resources, in all cases these will be referenced, usually by a link to that information. Locations are either taken from existing data, usually from planning applications, or by painstakingly identifying the location on the ground using online maps.” I am not sure how up to date the map is and I noticed that the smaller hydro installations along the Thames are missing. Another problem that I have experienced with this site is that when I click on “More details” for an installation I get far too many “internal server errors”. However, when the information does appear it includes useful comments on the technology, links to relevant websites and the latest news.
The final one in my list is from the energy generating company RWE Innogy (http://www.rwe.com/web/cms/en/194914/rwe-innogy/sites/production-data-live/). Its interactive map provides information on most of its European plants and includes wind farms, hydro power plants and biomass CHP (Combined Heat and Power). The production data is updated every minute. To see information on an installation, click on its icon on the map. The information includes live production, location, type of installation and when production started.
These are by no means the only websites offering interactive maps and information on UK energy production, and none of them give the full picture. They are good starting points, though, if you are interested in researching individual technologies or individual power stations.
Statista (http://www.statista.com/) is a portal for statistics, market data, market research and survey results. It aggregates data on over 600 international industries from more than 18,000 sources, including market researchers, trade organizations, scientific journals, and government databases. You can get a rough idea of what the service covers from the list of topics at http://www.statista.com/topics/, although this does give the impression that it is heavily US biased, which is not the case.
Searching Statista is straightforward. The default is to include what it calls word fragments, so a search on the term bank will also find banking. This can be changed to identical words if you require an exact match. To search for phrases place double quote marks around your terms, and documents containing a specific term can be excluded by prefixing the word with an exclamation mark. You can ‘OR’ terms by inserting the vertical bar symbol (|) between them, for example dog | cat. To refine your search further use the options on the right hand side of the results page. These include date of publication, region, free/fee-based/all, and category.
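The operators described above can be combined into a single query string. This sketch simply assembles a query using that syntax (double quotes for phrases, | between alternatives, ! to exclude); the function name is hypothetical, and how Statista groups an OR’d pair with the other terms is not stated, so check the results.

```python
def statista_query(all_words=(), phrases=(), any_of=(), exclude=()):
    """Assemble a query using the Statista operators described above:
    "..." for phrases, | between OR'd terms, ! to exclude a term."""
    parts = list(all_words)
    parts += ['"%s"' % p for p in phrases]
    if any_of:
        parts.append(" | ".join(any_of))
    parts += ["!" + term for term in exclude]
    return " ".join(parts)

print(statista_query(all_words=["pet", "food"],
                     any_of=["dog", "cat"], exclude=["insurance"]))
```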
Trying to view the data and charts that you find can be a frustrating process as most of the information is only available on subscription. There is a limited amount of free information but you still have to register for what Statista calls its basic free access. The premium account for an individual user is USD 49/month and runs for a minimum of 12 months. This is automatically renewed so make sure you cancel before the year is up if you no longer wish to use it. The corporate subscription is USD 199.90/month. Again the subscription runs for 12 months. A free trial is available for the corporate service but not for individual users. Prices for universities, libraries and education are available on request.
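Because both paid plans run for a minimum of 12 months, the monthly price understates the commitment. A quick calculation using the prices quoted above (which may have changed since, and may exclude tax):

```python
def minimum_commitment(monthly_usd, months=12):
    """Total payable over the minimum subscription term."""
    return round(monthly_usd * months, 2)

for plan, monthly in [("Premium (individual)", 49.00), ("Corporate", 199.90)]:
    print(f"{plan}: USD {minimum_commitment(monthly):,.2f} for the first year")
```

So a one-off project needing a single month’s access would still cost USD 588 on the individual plan.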
On the results page you can easily identify the subscription content by the red plus sign next to the chart icon to the left of the result. A red ‘P’ indicates a priced report that is free to corporate clients but can be purchased by basic and premium account holders.
Clicking on a result takes you to a page where the data is displayed as a chart or graph depending on what Statista thinks is the most appropriate for that statistic, but you can change the representation if you wish. There are also options for downloading the data in various formats and links to the original source.
Information is provided on the source and includes title of the original document, the publisher, publication date, time period covered by the survey and the geographical regions covered. Statista also adds notes such as definitions of terms used in the original report and observations on possible missing data.
Much of the information on Statista can be found via Google or Bing using advanced search commands and options, but you will probably have to wade through a significant amount of irrelevant information before you find the data you need. Statista can certainly save you a considerable amount of time, and the data displays, like the metadata and notes, are a great way of helping you identify the potential usefulness of a report.
Is the subscription worth it? It is definitely cost effective for corporates in terms of saving search time and ensuring accuracy and quality of information. I have had good feedback about Statista from several of my clients. I am not so sure about individual users. Some of my own research is highly specialised and I am not convinced that Statista covers all of the areas of interest to me. It is difficult for me to assess its value without trialling the full service first, and there are no pay-as-you-go options such as a daily or weekly rate. I might even consider paying for a month’s access for a project, but it is a full year’s subscription or nothing. Nevertheless, I recommend you take a serious look at it and if you are contemplating a corporate subscription you can ask for a free trial. It is now on my list of essential research tools.
Towards the end of June I headed off to Swansea Central Library to facilitate a workshop on search tools and techniques for finding business information and statistics. The session was organised for the libraries of the wonderfully named SWAMP – South West and Mid Wales Partnership.
We had fantastic views from the library of the sea and shoreline, so the delegates did very well to remain focused on the work in hand. The top tips that the group suggested at the end of the day were a mixture of search techniques and business information sites.
1. Don’t give up and don’t get stuck in a rut.
If your first attempts fail to produce anything useful try a different approach to your search. Try some of the tips mentioned below: use advanced search commands, a different search tool or go direct to a website that covers your subject area or type of information.
2. Verbatim search.
Google automatically looks for variations on your search terms and sometimes drops terms from your search without telling or asking you. To beat Google into submission and make it run your search exactly as you have typed it in, click on ‘Search tools’ in the menu above your results, then click on the arrow next to ‘All results’ and from the drop down menu select Verbatim.
3. Private Browsing.
To stop search engines personalising your results according to your previous searches and browsing behaviour, find out where the private browsing option is in your browser (in Chrome it is called Incognito). This ignores all cookies and past search history and is as close as you can get to unfiltered results.
Short cuts to private browsing in the main browsers are:
Chrome – Ctrl+Shift+N
Firefox – Ctrl+Shift+P
Internet Explorer – Ctrl+Shift+P
Opera – Ctrl+Shift+N
Safari – click on Safari next to the Apple symbol in the menu bar, select Private Browsing and then click on OK.
4. The site: command.
Include the site: command in your search to focus your search on particular types of site, for example site:ac.uk, or to search inside a large rambling site. You can also use -site: to exclude sites from your search. For example, if you are searching for information on Wales and Australian websites mentioning New South Wales keep coming up, include -site:au in your search.
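If you run the same kind of restricted search often, it can help to build the query string in one place. This minimal sketch (the function name is hypothetical) adds at most one site: restriction, since several plain site: terms would all have to match at once, plus any number of -site: exclusions.

```python
def site_query(terms, site=None, exclude_sites=()):
    """Add a site: restriction and/or -site: exclusions to a search."""
    parts = [terms]
    if site:
        parts.append("site:" + site)
    parts += ["-site:" + s for s in exclude_sites]
    return " ".join(parts)

print(site_query("wales tourism statistics", exclude_sites=["au"]))
print(site_query("business information", site="ac.uk"))
```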
5. The filetype: command.
Use the filetype: command to limit your search to PowerPoint for presentations, spreadsheets for data and statistics, or PDF for research papers and industry/government reports. Note that in Google filetype:ppt and filetype:xls will not pick up the newer .pptx and .xlsx formats, so you will need to incorporate both into your strategy, for example filetype:ppt OR filetype:pptx, or run separate searches for each one. In Bing.com, though, filetype:pptx will pick up both .ppt and .pptx files.
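For Google, the OR workaround above can be generated automatically. This sketch pairs each older Office extension with its newer equivalent; the mapping (including .doc/.docx, which the tip does not mention) is an assumption you can extend, and other types such as pdf are passed through unchanged.

```python
# Older and newer Office extensions to search together in Google
# (an assumption; extend as needed)
FORMATS = {
    "ppt": ["ppt", "pptx"],
    "xls": ["xls", "xlsx"],
    "doc": ["doc", "docx"],
}

def filetype_query(terms, kind):
    """Append filetype: clauses, OR-ing old and new formats together."""
    exts = FORMATS.get(kind, [kind])
    clause = " OR ".join("filetype:" + e for e in exts)
    return f"{terms} {clause}"

print(filetype_query("uk energy statistics", "xls"))
print(filetype_query("annual report", "pdf"))
```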
6. Guardian Data Store (http://www.guardian.co.uk/data/)
For datasets and visualisations relating to stories in the news. This is proving to be a very popular site on both the public and in-house workshops. As well as the graphs and interactive maps the source of the data is always given and there are links to the original datasets that are used in the articles.
7. Company Check (http://www.companycheck.co.uk/)
Company Check repackages Companies House data and provides 5 years of figures and graphs for Cash at Bank, Net Worth, Total Liabilities and Total Current Liabilities free of charge. It also lists the directors of a company. Click on a director’s name and you can view other current and past directorships for that person.
8. BL BIPC industry Guides
The British Library Business Information and IP Centre’s industry guides at http://www.bl.uk/bipc/dbandpubs/Industry%20guides/industry.html highlight relevant industry directories, databases, publications and web sites. Excellent starting points if you are new to the sector.
9. Web archives for documents, pages and sites that are no longer “live”.
Most people know about the Internet Archive’s Wayback Machine at http://www.archive.org/ and its collection of snapshots of websites taken over the years. There is also a collection of old UK government webpages at http://www.nationalarchives.gov.uk/webarchive/, and the British Library has a UK web archive at http://www.webarchive.org.uk/ukwa/.
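The Wayback Machine also has a simple availability API that returns the archived snapshot closest to a given date. This sketch assumes the endpoint and JSON reply shape documented by the Internet Archive at the time of writing (check the current documentation before relying on it); the sample reply is illustrative, and fetching it needs network access.

```python
import json
from urllib.parse import urlencode

API = "https://archive.org/wayback/available"

def availability_url(page_url, timestamp=None):
    """Build an availability query; timestamp is YYYYMMDD (or longer)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)

def closest_snapshot(response_text):
    """Return the closest archived URL from the JSON reply, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None

# Illustrative reply; in practice fetch availability_url(...) with
# urllib.request.urlopen and pass the decoded body here.
sample_reply = ('{"archived_snapshots": {"closest": {"available": true, '
                '"url": "http://web.archive.org/web/20130919044612/http://example.com/", '
                '"timestamp": "20130919044612", "status": "200"}}}')

print(closest_snapshot(sample_reply))
```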
10. Keep up to date
Keep up to date with what the search engines are up to, changes to key resources and new sites. Identify blogs and commentators that are relevant to your research interests and subject areas and follow them using RSS or email alerts.
I regularly mention Zanran (http://www.zanran.com/) in my workshops on search and business information, and it often finds its way into the Top Tips compiled by the delegates at the end of the day.
Zanran is not a Google alternative. Rather than searching the text of web pages, it extracts and indexes numerical data presented as tables, charts and images in PDF reports, spreadsheets and ordinary web pages. You can simply type in your search terms but there are additional options for narrowing down the search by location of the web server, specifying an individual site, selecting a time period and limiting by file type.
The results page lists the files it has found with an extract highlighting the content containing your terms. In this example I am looking for data on agricultural methane emissions in the UK.
To the left of each entry is a thumbnail. Moving the cursor over the thumbnail brings up a preview of the page containing the relevant chart, table or image. This enables you to immediately assess the relevance of the data without having to download and go through a lengthy document.
If you click on the thumbnail or the title to view the whole document you have to register (free of charge) as copies of the indexed documents are stored by Zanran. If you prefer to go to the original document click on the URL button attached to the summary of the page and click on the link that is then revealed. Unfortunately, you may see “page not found” especially if it is on a UK government department web site. Many of these have now been closed and their content archived making it difficult to track them down. Registering with Zanran is by far the easier option. Also, rather than deluge you with documents from a single site, as Google all too often does, Zanran gives you a link telling you if and how many other results are available on a site.
How does it compare with Google? Well, Google did come up with relevant results for my search but I had to spend a lot of time ploughing through them to identify the best documents. And Google’s first 100 results did not include the very useful archived UK government documents that Zanran gave me.
If you are looking for data or statistics Google still does a very good job but I recommend you also run a search in Zanran. It may well come up with a real gem, as it often has for me.
The EU has launched a public beta of its new open data portal at http://open-data.europa.eu/open-data/. Open data is information that can be freely used, re-used and redistributed by anyone. The EU portal covers all the information that public bodies in the European Union produce, collect or pay for. At present it has 5,811 datasets of which 5,634 come from Eurostat, the statistical office of the EU.
You can search the datasets by keyword and refine your results using the keywords and publishers listed on the right of the screen.
Alternatively there are options for browsing the datasets using tags and keywords. This may be easier if you are not sure of what terms to use.
Using the tags also seems to be more reliable. A search on coal production gave me one relevant dataset but the rest of the results only had production as a keyword. I was seeing sets for carrot production, production of butter, sunflower production etc. I assume that ‘coal’ had been dropped because there were so few results containing both terms. Searching on just coal reduced the number of results from around 5000 to 7, one of which was highly relevant (Primary production of coal and lignite). The other 6 covered energy production in general including coal. Browsing and narrowing down the sets using the tags does seem to be the best way of navigating the data at the moment.
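My guess about ‘coal’ being dropped is easier to see with a toy model of the two ways a search engine can combine terms. This is only an illustration using dataset titles like those I was seeing; it is not how the portal’s search engine actually works, which is not documented.

```python
titles = [
    "Primary production of coal and lignite",
    "Carrot production",
    "Production of butter",
    "Sunflower production",
]

def search(titles, terms, mode="and"):
    """Toy keyword search: 'and' needs every term, 'or' needs any."""
    words = [t.lower() for t in terms]
    match = all if mode == "and" else any
    return [t for t in titles if match(w in t.lower() for w in words)]

print(search(titles, ["coal", "production"], "and"))  # 1 result
print(search(titles, ["coal", "production"], "or"))   # all 4
```

If a search engine silently falls back from the first behaviour to the second, every ‘production’ dataset floods the results, which is what I saw.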
Once you have identified a relevant dataset, additional information such as time span and date last modified is provided, together with links for downloading the data.
It’s then up to you to find a way of viewing and analysing the data!
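As a starting point, this sketch groups a downloaded table into one series per country. It assumes the data has been saved as CSV in ‘long’ format with `geo`, `year` and `value` columns (an assumption; downloads vary in layout) and skips non-numeric values such as the ‘:’ that Eurostat uses for data that are not available. The figures are invented.

```python
import csv
import io

def series_by_country(csv_text):
    """Group rows of (geo, year, value) into {geo: {year: value}}.

    Column names are assumptions; non-numeric values such as ':'
    (not available) are skipped.
    """
    out = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float(row["value"])
        except ValueError:
            continue
        out.setdefault(row["geo"], {})[int(row["year"])] = value
    return out

# Invented figures for illustration
sample = """geo,year,value
UK,2010,17.8
UK,2011,18.6
PL,2010,:
PL,2011,139.2
"""

print(series_by_country(sample))
```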
StatsWales is the key website to visit for statistics on Wales. A new version of the site, StatsWales ‘Beta’, has been launched with a full launch planned for Monday 3rd December.
New features include:
- improved search capability
- enhanced charting
- direct URI access to data catalogue and reports
- better sharing of reports including those personally tailored/configured
- additional direct data access formats
- more powerful personalisation
- support for legacy links
The old platform will be available until December 31st.
While data is being transferred to the new system access to both the new and the old services will be provided as follows:
https://statswales.wales.gov.uk – will link to the new system
http://statswales1.wales.gov.uk – link to the old StatsWales system will work until 31st December
http://statswales.wales.gov.uk – will point to the actual current system in use during this transition period
A video tutorial on the new system is available at https://www.youtube.com/watch?v=a08s26rDM1g
Public Data Explorer has escaped the cull at Google Labs, from which it has now “graduated”. There were signs that it might survive when Google started adding public data charts to the top of the results for some statistics searches. It can now be found at http://www.google.com/publicdata/home. Public Data Explorer allows you to search and compare sets of public data such as population, energy consumption, mobile phone usage and government debt across countries. You can select the countries or states that you wish to compare and view the data as line graphs, bar charts, map visualisations or bubble charts.
Data sets include information from the OECD, World Bank, Eurostat and various US government departments. There is also an option to upload your own data sets – details are at http://www.google.com/publicdata/admin. This probably accounts for some of the very specific data that is now available, for example: Australian Population Estimates, Unemployment rate Italy and Catalan municipalities indicators. The full list is at http://www.google.com/publicdata/directory. It is worth exploring the more generic titles such as “World Bank, World Development Indicators” and “OECD Factbook 2010” to uncover the full range of what is available.
I’m a nosy neighbour. I like to know what’s going on in my area: who’s bought the house next door, local planning applications, any dodgy activity going on? My husband and I are both self-employed so there is usually at least one of us out and about in Caversham during the day. That means we have the chance to chat with our local postman, workmen digging up the road, Police Community Support Officers doing their rounds and with people in the local shops, bank and post office. Crime, not surprisingly, is a major topic on our “watch list” and just over two years ago police forces in England and Wales started to provide access to local crime statistics via online maps. The new service allowed you to drill down to ward level and view trends in burglary, robbery, theft, vehicle crime, violent crime and anti-social behaviour.
The format varied from one police force to another. For example Thames Valley Police provided a basic map and tables of data:
Others such as the Metropolitan Police included additional graphical representation of the statistics such as bar charts:
None of them pinned down incidents to individual streets or addresses but they did give you an idea of the level of crime in a particular neighbourhood, how it compared with the same period the previous year and whether it was high, above average, average, below average, low or no crime. They were short, though, on detailed definitions of what each category of crime included. I looked at these maps out of personal curiosity rather than using them for any serious business application, and I made certain assumptions such as murder being included under ‘Violence against the person’. That may not have been the case.
Some police forces placed obvious links to the information on their home pages whilst others buried the data in obscure corners of their web sites. The crime maps were then all moved to the CrimeMapper web site – the Thames Valley Police map can still be seen at http://maps.police.uk/view/thames-valley – but that has now been integrated into the Police.uk website, which “includes street-level crime data and many other enhancements”.
All you have to do is go to http://www.police.uk/, type your postcode, town, village or street into the search box and “get instant access to street-level crime maps and data, as well as details of your local policing team and beat meetings”. The first screen looks good with news of local meetings, events, recent tweets, YouTube videos and – as the home page promised – information on my local policing team.
When I focus on the map to look at the detail there are markers for the location of the crimes and clicking on them gives you a brief description of the crime:
In this example, the detail box had details of two crimes “on or near Anglefield Road” and this is where I started to become confused. Were the burglary and the violent crime part of the same incident or totally separate? Furthermore, if you look in the left hand panel of the screen you will see “To protect privacy, individual addresses are not pinpointed on the map. Crimes are mapped to an anonymous point on or near the road where they occurred.” Fair enough, but I would like to know how near ‘near’ is. 100, 200, 400 yards? Half a mile, a mile? And does the focus shift from one street to another from one month to the next? If it stays put then a street could gain a crime rate reputation that it does not deserve but if it shifts there is no way one can compare data from one month or year to another, which brings me to my next question.
Why is there only one month’s data? Previous versions of the crime maps gave you three months’ data for the current and the previous year for comparison. There is nothing about this in the Help section of the Police UK site but the Guardian reports:
“police forces have indicated that whenever a new set of data is uploaded – probably each month – the previous set will be removed from public view, making comparisons impossible unless outside developers actively store it.” (Crime maps are ‘worse than useless’, claim developers http://www.guardian.co.uk/technology/2011/feb/02/uk-crime-maps-developers-unhappy?CMP=twt_iph).
This means that if you want to run comparisons over time you will have to download the files and store them on your own system each month, or find someone else who is already doing it.
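If you do decide to keep your own copies, a small script can file each month’s download under a date-stamped name so that nothing is overwritten. This is a minimal sketch; the folder layout and file name are hypothetical, and you still have to download the file from the site yourself each month.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path

def archive_download(downloaded_file, archive_dir, month=None):
    """Copy a downloaded data file into archive_dir with a YYYY-MM
    prefix so each month's release survives being replaced online."""
    month = month or date.today().strftime("%Y-%m")
    src = Path(downloaded_file)
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{month}_{src.name}"
    if dest.exists():
        raise FileExistsError(f"{dest} has already been archived")
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest

# Demonstration with throwaway files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "thames-valley-street.csv"  # hypothetical name
    sample.write_text("Crime type,Location\n")
    saved = archive_download(sample, Path(tmp) / "archive", month="2011-02")
    saved_exists = saved.exists()
    print(saved.name)
```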
The Guardian article also says:
“the Information Commissioner’s Office (ICO) advised that tying crime reports to postcodes or streets with fewer than 12 addresses would render the individuals involved too identifiable. The police have also decided to remove data about murders or sexual assaults.”
With respect to the latter, the help file on the Police UK site suggests otherwise:
“Crimes have been grouped into six categories following advice from the Information Commissioner’s Office. This doesn’t mean that the crimes listed under ‘other’ are not seen as important. Rather it ensures that for some of the more sensitive crimes there is even greater privacy for the victims.”
So which is it: murders and sexual assaults are not included at all or aggregated under “other”? Jonathan Raper says on his blog Placr News (“Five reasons to be cautious about street level crime data” http://placr.co.uk/blog/2011/02/five-reasons-to-be-cautious-about-street-level-crime-data/):
“Some data is redacted eg sexual offences, murder. The Metropolitan Police has already released this data to ward level though… and it is easy to cross-reference one murder in one ward to reports in the local press at the same time”
Data visualisations and mashups are becoming increasingly popular and make it considerably easier to assess a situation and view trends. The Guardian Datablog (http://www.guardian.co.uk/news/datablog), for example, encourages people to take data sets, mash them up, create their own visualisations and upload a screenshot to the Guardian Datastore on Flickr (http://www.flickr.com/groups/1115946@N24/). It is vital, though, that the source of the data, whether the full data set or just a selection has been used, and whether or not it is going to be updated are clearly spelt out. All too often one or even all of these are missing from the accompanying notes, and in some cases there are no notes at all!
An example of good practice is “UK transport mapped: Every bus stop, train station, ferry port and taxi rank in Britain” (http://www.guardian.co.uk/news/datablog/2010/sep/27/uk-transport-national-public-data-repository). The posting clearly states the source (http://data.gov.uk/dataset/nptdr) and its coverage:
“A snapshot of every public transport journey in Great Britain for a selected week in October each year. The dataset is compiled with information from many sources, including local public transport information from each of the traveline regions, also coach services from the national coach services database and rail information from the Association of Train Operating Companies”
It then goes on to specify the time period (5-11 October, 2009) and the tools that were used to create the visualisation.
Another is the “Live map of London Underground trains” (http://traintimes.org.uk/map/tube/). This shows “all trains on the London Underground network in approximately real time”. The source is a live data feed from Transport for London (TfL) and the notes state that a “small number of stations are misplaced or missing; occasional trains behave oddly; some H&C and Circle stations are missing in the TfL feed.” It would be helpful to have a list of those missing stations, but the site has at least brought the issue of potential missing data to the users’ attention.
Returning to the Police.uk crime data, there are three major problems with the site for me as a researcher:
1. Are all crimes included in the database, or are some such as murders and sexual assaults excluded altogether or aggregated under “other”? More detailed and unambiguous scope notes please.
2. The street-level data is useless. The markers are not exact locations but “near” to them, there is no definition of “near”, and there is no information on how the position of the marker is calculated or the geographic radius that it covers. It would be better to return to aggregated data at the ward level.
3. There are no options for comparing time periods and it seems that historical data will not be available on the web site. An ad hoc researcher will have to spend time and effort tracking down a developer or a web site that is downloading and keeping copies of all of the datasets as they are published.
The new crime data web site is a retrograde step. We need transparency and clarity rather than the muddle and confusion that has been generated by the lack of information on what is being provided.