
The case of the disappearing press release

UK government departments and organisations frequently change their names, merge or disappear altogether. The same applies to their websites and to the documents held on those sites. Tracking down copies of older reports, data and superseded guidelines and regulations is becoming increasingly difficult, especially as so many sites are now being closed down. Information is supposed to be transferred to the new Gov.uk web site (http://www.gov.uk/), but historical material is in danger of vanishing for good.

I recently needed to get back to a press release issued by the Potato Council (yes, there really is such a thing!) dated November 9, 2007. The title of the document was “Provisional Estimate of GB Potato Supply for 2007” and I had the original URL in my notes. The URL no longer works on the Potato Council’s web site, and searching the site failed to turn up the document. Searching the Potato Council’s web site using the Google site: command also failed to find it. I next ran the URL through Google, Bing and DuckDuckGo and found two references to it in research papers, but not the press release itself.

As I had the URL, my next stop was the Internet Archive Wayback Machine (http://www.archive.org/), but it had no copy. The Wayback Machine periodically takes snapshots of web sites and lets you browse those copies by date. You can enter the URL of a home page or of an individual page. Snapshots are not taken every time a web site changes, so there are gaps in the coverage and a page or document can be missed. In case the press release had been moved to a different URL at some point, I browsed copies of the Potato Council’s site for late 2007 and early 2008, but no joy.
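Browsing snapshots by hand can also be done programmatically. The sketch below assumes the Internet Archive's public CDX server (http://web.archive.org/cdx/search/cdx) and its JSON output, where the first row is a header; it lists every capture under a site path within a date window, optionally filtered by MIME type, which is one way to spot a document whose URL has changed. The domain and dates in the usage note are hypothetical placeholders, not the Potato Council's actual address.

```python
"""Sketch: list Wayback Machine captures for a site within a date window.

Assumes the Internet Archive's CDX server API and its JSON output format
(array of arrays, first row = field names). Not an official client.
"""
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(site: str, start: str, end: str,
                    mimetype: Optional[str] = None) -> str:
    """Build a CDX query URL for captures under `site` between `start` and `end`.

    Dates are Wayback-style timestamps such as "20071101" (YYYYMMDD).
    """
    params = {
        "url": site,
        "matchType": "prefix",  # every capture below this path, not just one URL
        "from": start,
        "to": end,
        "output": "json",
    }
    if mimetype:
        # CDX filters take the form field:regex, e.g. mimetype:application/pdf
        params["filter"] = f"mimetype:{mimetype}"
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(text: str) -> List[Dict[str, str]]:
    """Turn the CDX JSON array-of-arrays into dicts keyed by the header row."""
    if not text.strip():
        return []  # treat an empty body as "no captures"
    rows = json.loads(text)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def snapshot_url(record: Dict[str, str]) -> str:
    """Wayback playback URL for one capture: /web/<timestamp>/<original URL>."""
    return f"http://web.archive.org/web/{record['timestamp']}/{record['original']}"


def list_captures(site: str, start: str, end: str,
                  mimetype: Optional[str] = None) -> List[Dict[str, str]]:
    """Fetch and parse captures (requires network access)."""
    with urlopen(build_cdx_query(site, start, end, mimetype), timeout=30) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))
```

For example, `list_captures("example.org/news/", "20071101", "20080331", "application/pdf")` would ask for PDF captures under a hypothetical news path between November 2007 and March 2008, and `snapshot_url` turns each result into a link you can open in a browser. As with browsing by hand, an empty list only means the crawler never captured the file, not that it never existed.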

Next I tried the UK Government Web Archive at the National Archives (http://www.nationalarchives.gov.uk/webarchive/). This is similar to the Wayback Machine but concentrates on UK government sites and related official bodies. One of the options is to browse the A-Z directory. There were fewer archive copies of the Potato Council site than in the Wayback Machine, but I hoped that the single entry for 2008 might come up trumps. Unfortunately, it did not.

Archive copies of the Potato Council web site

Another possibility was that Zanran (http://www.zanran.com/) might have a copy. Zanran concentrates on indexing and searching information contained in charts, graphs and tables of data. It keeps archive copies of the documents it indexes, and I have used it several times to track down information that has been removed from the live web. A search for potato supply estimate UK 2007 returned a list of results with my document at the top.

Zanran search result

At first glance, it does not appear to match the document I am looking for because the title is different. The titles listed by Zanran are not always those of the whole document but the labels or captions associated with individual charts and tables. If you hover over the thumbnail to the left of the entry, you can see a preview of a much larger section of the page, which helps confirm that you have the right document. Clicking on the thumbnail or title will usually take you to Zanran’s archive copy.

Had I not found the press release on Zanran, I would next have contacted the Potato Council. My experience, though, is that very few organisations are able or willing to supply older documents such as press releases. My last resort would have been to contact the authors of the two papers I had found via Google to see if they had kept copies.

I usually keep copies of all papers and pages that I use as part of my research on major projects, but inevitably there are times when I forget. As demonstrated above, there are several tools that can be used to try to track down documents that have disappeared from the web, but success is not guaranteed.