Tag Archives: Zanran

The case of the disappearing press release

UK government departments and organisations frequently change their names, merge or disappear altogether. The same applies to their websites and documents held on those sites. Tracking down copies of older reports, data and superseded guidelines and regulations is becoming increasingly difficult, especially as so many sites are now being closed down. Information is supposed to be transferred to the new Gov.uk web site (http://www.gov.uk/) but historical information is in danger of vanishing altogether.

I recently needed to get back to a press release issued by the Potato Council (yes, there really is such a thing!) dated November 9, 2007. The title of the document was “Provisional Estimate of GB Potato Supply for 2007” and I had the original URL in my notes. The URL no longer works on the Potato Council’s web site, and searching the site failed to turn up the document. Searching the Potato Council’s web site using the Google site: command also failed to find it. I next ran the URL through Google, Bing and DuckDuckGo and found two references to it in research papers but not the press release itself.

As I had the URL my next stop was the Internet Archive Wayback Machine (http://www.archive.org/) but the archive found nothing. The Wayback Machine periodically takes snapshots of web sites and lets you browse those copies by date. You can enter the URL of a home page or an individual page. The snapshots are not taken every time a website changes so there are gaps in its coverage, and a page or document can be missed. Hoping that the URL might have changed at some point I browsed copies of the Potato Council’s site for late 2007 and early 2008, but no joy.
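For anyone who has a batch of old URLs to check, the same Wayback Machine lookup can be scripted rather than typed into the search box one at a time. The following is only a rough sketch: it assumes the Internet Archive's availability endpoint at https://archive.org/wayback/available and its JSON reply with an "archived_snapshots" / "closest" entry, and the example.org address is a made-up placeholder, not the Potato Council's real URL.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for the Wayback Machine availability lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_query(page_url, date_yyyymmdd):
    """Build a lookup URL asking for the snapshot closest to a given date."""
    params = {"url": page_url, "timestamp": date_yyyymmdd}
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (snapshot_url, timestamp) from a JSON reply, or None if no copy exists."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


# Hypothetical page address and the press release date (9 November 2007).
query = build_availability_query(
    "http://www.example.org/press/2007-estimate.pdf", "20071109"
)
print(query)

# An empty "archived_snapshots" object means the archive holds no copy.
print(closest_snapshot('{"archived_snapshots": {}}'))
```

The query URL can be pasted into a browser or fetched with urllib.request.urlopen. A result of None corresponds to the "found nothing" outcome described above, in which case browsing the site's snapshots by date is still worth a try.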

Next I tried the UK Government Web Archive at the National Archives (http://www.nationalarchives.gov.uk/webarchive/). This is similar to the Wayback Machine but concentrates on UK government sites and related official bodies. One of the options is to browse the A-Z directory. I found fewer archive copies than in the Wayback Machine but hoped that the one entry for 2008 might come up trumps. Unfortunately it did not.

Archive copies of the Potato Council web site

Another possibility was that Zanran (http://www.zanran.com/) might have a copy. Zanran concentrates on indexing and searching information contained in charts, graphs and tables of data. It archives copies of the documents and I have used it several times to track down information that has been removed from the live web. A search on potato supply estimate UK 2007 came up with a list of results with my document at the top.

Zanran search result

At first glance, it does not appear to match the document I am looking for because the title is different. The titles listed by Zanran are not always those of the whole document; they are often the labels or captions associated with the individual charts and tables. If you hover over the thumbnail to the left of the entry you can see a preview of a much larger section to make sure you have the right document. Clicking on the thumbnail or title will usually take you to Zanran’s archive copy.

Had I not found the press release on Zanran, I would next have contacted the Potato Council. My experience, though, is that very few organisations are able or willing to supply older documents such as press releases. My last resort would have been to contact the authors of the two papers I had found via Google to see if they had kept copies.

I usually keep copies of all papers and pages that I use as part of my research on major projects but inevitably there are times when I forget. As demonstrated above, there are several tools that can be used to try to track down documents that have disappeared from the web, but success is not guaranteed.

Zanran – great for data in tables, charts and graphs

I regularly mention Zanran (http://www.zanran.com/) in my workshops on search and business information, and it often finds its way into the Top Tips compiled by the delegates at the end of the day.

Zanran is not a Google alternative. Rather than search the text of web pages it extracts and indexes numerical data presented as tables, charts and images in PDF reports, spreadsheets and ordinary web pages. You can simply type in your search terms but there are additional options for narrowing down the search by location of the web server, specifying an individual site, selecting a time period and limiting by file type.

The results page lists the files it has found with an extract highlighting the content containing your terms. In this example I am looking for data on agricultural methane emissions in the UK.

Zanran search results

To the left of each entry is a thumbnail. Moving the cursor over the thumbnail brings up a preview of the page containing the relevant chart, table or image. This enables you to immediately assess the relevance of the data without having to download and go through a lengthy document.

Zanran document preview

If you click on the thumbnail or the title to view the whole document you have to register (free of charge), as copies of the indexed documents are stored by Zanran. If you prefer to go to the original document, click on the URL button attached to the summary of the page and then on the link that is revealed. Unfortunately, you may see “page not found”, especially if the document was on a UK government department web site. Many of these sites have now been closed and their content archived, making the documents difficult to track down. Registering with Zanran is by far the easier option. Also, rather than deluge you with documents from a single site, as Google all too often does, Zanran gives you a link telling you whether, and how many, other results are available on that site.

How does it compare with Google? Well, Google did come up with relevant results for my search but I had to spend a lot of time ploughing through them to identify the best documents. And the very useful archived UK government documents that Zanran gave me did not appear in Google's first 100 results.

Google v Zanran

If you are looking for data or statistics Google still does a very good job but I recommend you also run a search in Zanran. It may well come up with a real gem, as it often has for me.