
More UK information vanishes into GOV.UK

Just when you’ve finally worked out how to search some of the key UK government web resources, they disappear into the black hole that is GOV.UK.

The statistics publication hub went over a few weeks ago and the link http://www.statistics.gov.uk/ now redirects to http://www.gov.uk/government/statistics/announcements. Similarly, Companies House is now to be found at http://www.gov.uk/government/organisations/companies-house and the Land Registry is at http://www.gov.uk/government/organisations/land-registry. Most of the essential data, such as company information and ownership of properties, can still be found via GOV.UK and in fact some remains in databases on the original websites. For example, following the links on GOV.UK for information on a company eventually leads you to the familiar WebCHeck service at http://wck2.companieshouse.gov.uk/. Companies House’s useful list of overseas registries, however, seems at first to have totally disappeared but is in fact hidden in a general section covering all government “publications” (http://www.gov.uk/government/publications/overseas-registries#reg).

Documents may no longer be directly accessible from the new departmental home pages so a different approach is needed if you are conducting in-depth research. GOV.UK is fine for finding out how to renew your car tax or book your driving theory test – two of the most popular searches at the moment – but its search engine is woefully inadequate when it comes to locating detailed technical reports or background papers. Using Google’s or Bing’s site command to search GOV.UK is the only way to track them down quickly, for example biofuels public transport site:www.gov.uk.  Note that you need to include the ‘www’ in the site command as site:gov.uk would also pick up articles published on local government websites. This assumes, though, that the document you are seeking has been transferred over to GOV.UK.
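The site-restricted search described above can also be built as a link, which is handy if you run the same kind of query often. The following is only a minimal sketch: the function name `site_search_url` is made up for illustration, and it assumes the search engine accepts the query in a `q` parameter on its `/search` page, as Google and Bing currently do.

```python
from urllib.parse import urlencode


def site_search_url(terms, site="www.gov.uk",
                    engine="https://www.google.com/search"):
    """Build a search URL restricted to one site with the site: command.

    `engine` is an assumption: any search page that takes the query in a
    `q` parameter, for example https://www.bing.com/search, should work.
    """
    # Keep the 'www' so that local government sites under gov.uk are excluded.
    query = f"{terms} site:{site}"
    return engine + "?" + urlencode({"q": query})


print(site_search_url("biofuels public transport"))
# https://www.google.com/search?q=biofuels+public+transport+site%3Awww.gov.uk
```

Passing `site="gov.uk"` instead would widen the search to local government websites as well, which is exactly what the ‘www’ is there to avoid.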

There have been complaints from researchers, including myself, that an increasing number of valuable documents and research papers have gone AWOL as more departments and agencies are assimilated Borg-like by GOV.UK. Some of the older material has been moved to the UK Government Web Archive at http://www.nationalarchives.gov.uk/webarchive/.

This offers you various options including an A-Z of topics and departments and a search by keyword, category or website. The latter is slow and clunky with a tendency to keel over when presented with complex queries. I have spent hours attempting to refine my search and wading through page after page of results only to find that the article I need is not there, nor anywhere else, which is an experience several of my colleagues have had. This has led to conspiracy theories suggesting that the move to GOV.UK has provided a golden opportunity to “lose” documents.

I am reminded of a scene from Yes Minister:

James Hacker: [reads memo] This file contains the complete set of papers, except for a number of secret documents, a few others which are part of still active files, some correspondence lost in the floods of 1967…

James Hacker: Was 1967 a particularly bad winter?

Sir Humphrey Appleby: No, a marvellous winter. We lost no end of embarrassing files.

James Hacker: [reads] Some records which went astray in the move to London and others when the War Office was incorporated in the Ministry of Defence, and the normal withdrawal of papers whose publication could give grounds for an action for libel or breach of confidence or cause embarrassment to friendly governments.

James Hacker: That’s pretty comprehensive. How many does that normally leave for them to look at?

James Hacker: How many does it actually leave? About a hundred?… Fifty?… Ten?… Five?… Four?… Three?… Two?… One?… *Zero?*

Sir Humphrey Appleby: Yes, Minister.

From “Yes Minister”, The Skeleton in the Cupboard (TV Episode 1982) – Quotes – IMDb, http://www.imdb.com/title/tt0751825/quotes

For “floods of 1967” substitute “transfer of files to GOV.UK”.

Tweets from the past

Embarrassed by some of your first tweets from 2007? Wish you hadn’t got involved in that drunken virtual brawl on Twitter last Christmas? There was a time when you could safely assume that those ramblings would be lost in the mists of Twitter’s archive, never to be seen again. A search on Twitter would only give the last few days’ worth of postings and Google no longer archives the whole of Twitter. True, the Library of Congress does keep copies of every single tweet for posterity but access is only allowed for serious research purposes. So far, the Library has received about 400 inquiries but has not yet been able to provide access (http://blogs.loc.gov/loc/2013/01/update-on-the-twitter-archive-at-the-library-of-congress/). So you can breathe easily again? Unfortunately not.

There are commercial organisations such as Datasift (http://datasift.com/) and Gnip (http://gnip.com/) that charge an arm and a leg for analysing tweets and other social media comments, but the cost puts their services out of the reach of the casual searcher. You may find, though, that your forthright hashtagged tweets at a conference have been recorded for all to see free of charge (Sharing (or Over-Sharing?) at #ILI2012, http://ukwebfocus.wordpress.com/2012/11/02/sharing-or-over-sharing-at-ili2012/). And Twitter, itself, is finally providing access to historical tweets.

You can now download your entire collection. Go to your Twitter home page, click on the cog wheel in the upper right-hand corner and select Settings.

Twitter Settings

At the bottom of the Settings page is a link to request your archive.

Request your archive

You should receive an email a few minutes later with a download link. The file is zipped and once you have unpacked it you can browse your tweets by year and month or search the archive using keywords or hashtags.

Downloaded Twitter Archive
Browse downloaded Twitter archive by year and month
Search downloaded Twitter archive
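If you would rather search the download with a script than through the browser pages, something along these lines might do. It is a rough sketch resting on assumptions: that the unpacked archive includes a CSV file of your tweets, and that the file has `timestamp` and `text` columns, with the timestamp beginning with the four-digit year. Check the column names in your own download first. The function name `search_tweets` and the two sample rows are invented for illustration.

```python
import csv
import io


def search_tweets(csv_file, keyword, year=None):
    """Return the rows whose text contains keyword, optionally for one year.

    Assumes columns named 'timestamp' and 'text'; adjust these to match
    the headings in your own archive.
    """
    keyword = keyword.lower()
    matches = []
    for row in csv.DictReader(csv_file):
        if keyword not in row["text"].lower():
            continue
        if year is not None and not row["timestamp"].startswith(str(year)):
            continue
        matches.append(row)
    return matches


# Invented sample data standing in for the real file.
sample = io.StringIO(
    "timestamp,text\n"
    "2012-10-30 09:15:00,Heading to #ILI2012 in London\n"
    "2013-01-05 18:02:00,Testing the new archive download\n"
)
for row in search_tweets(sample, "#ili2012"):
    print(row["timestamp"], row["text"])
```

With a real archive you would open the CSV file, for example with `open(path, newline="", encoding="utf-8")`, and pass the file object in place of `sample`.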

I have not been able to work out how often you are allowed to download your archive and, rather annoyingly, there is no top-up option for fetching only the tweets posted since your last download.

Twitter also runs searches on its entire archive – sort of. There is no obvious date option at the moment, not even under advanced search, so it appears to be all or nothing, and it does not give you everything straightaway. I thought I would have a look at the tweets on Internet Librarian International 2009, hashtag #ili2009, and was surprised that there seemed to be so few. I scrolled down to the bottom of the results and saw “You’ve reached the end of the Top Tweets for #ili2009” with a link to “View all tweets”. Twitter then loaded the remaining tweets as I continued to scroll down the page. About Top Tweets, Twitter says:

“We’ve built an algorithm that finds the Tweets that have caught the attention of other users. Top Tweets will refresh automatically and are surfaced for popularly-retweeted subjects based on this algorithm. We do not hand-select Top Tweets.”

There are also links at the top of the results page that enable you to view Top, All, and tweets from just ‘People you follow’.

Twitter Archive search

There are in fact advanced search commands that can be used to include a date range in your search (see https://support.twitter.com/articles/71577 for details). Changing my search to #ili2009 since:2009-10-01 until:2009-10-31 did seem to work. I am not convinced, though, that Twitter is giving me everything, even when I choose ‘All’. It’s a start and long overdue, but I’m not going to abandon my own archiving strategies just yet.
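For anyone who runs date-limited searches regularly, the query string can be put together programmatically. This is just a sketch of the since:/until: syntax used in the search above; the function name `twitter_date_query` is made up, and it only builds the text to paste into Twitter’s search box, it does not call Twitter.

```python
from datetime import date


def twitter_date_query(terms, since, until):
    """Append since: and until: operators (YYYY-MM-DD) to a search."""
    if since > until:
        raise ValueError("'since' must not be later than 'until'")
    return f"{terms} since:{since.isoformat()} until:{until.isoformat()}"


print(twitter_date_query("#ili2009", date(2009, 10, 1), date(2009, 10, 31)))
# #ili2009 since:2009-10-01 until:2009-10-31
```

Using `date` objects rather than typed strings means an impossible date such as 31 September is rejected before the search is run.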