UK company information now free of charge

A year ago Companies House announced that it would make all of its company information available free of charge to everyone. The press release was short on detail, and many of us wondered what format the data would be in and how easy it would be to use. Daily files containing accounts data registered on the previous day were already available (http://download.companieshouse.gov.uk/en_accountsdata.html), but these are huge zip files that, when unpacked, contain files with meaningless names. Unless you have software that can manage and search the data, it is impossible to identify which files contain information on the company you are researching.

For most of us the files are useless.  Was this to be the format of the free service? Thankfully, no.
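For anyone who does still need to mine the daily files, here is a minimal, hedged sketch of the kind of software alluded to above. The function name and sample filenames are my own, and it assumes that each unpacked iXBRL/HTML file mentions the registered company number somewhere in its text:

```python
# Hypothetical sketch: scan one of the daily accounts zip files for members
# whose content mentions a given registered company number. Assumes the
# unpacked files are iXBRL/HTML documents containing the number as text.
import zipfile

def find_company_files(zip_path, company_number):
    """Return the names of files inside zip_path that mention company_number."""
    matches = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            with zf.open(name) as member:
                if company_number.encode() in member.read():
                    matches.append(name)
    return matches
```

Crude, but it illustrates why the raw files are unusable without some tooling: every file has to be opened and searched just to find the one company you want.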

A new beta service at http://beta.companieshouse.gov.uk/ now enables you to search for companies by name or number and obtain free of charge:

  • Company overviews
  • Current and resigned officers
  • Document images
  • Mortgage charge data
  • Previous company names
  • Insolvency data

For each officer you can see what other companies they are involved with. What you cannot do at the moment is search by director name from the start. That is a “planned feature”, as are disqualified directors search, company monitoring, company name availability, dissolved companies and overseas data. For those options you have to revert to the old WebCHeck service at http://wck2.companieshouse.gov.uk/.

The new beta service is easy to use and at last we have access to UK company documents and accounts free of charge. So, does this mean the end of services such as Company Check (http://companycheck.co.uk/) and DueDil (http://www.duedil.com/)? Not necessarily. Company Check, for example, already has an option for searching by director name, and there are also useful charting, monitoring and structure options as well as access to some European companies. They also offer access to risk scores, credit reports and County Court Judgments (all priced). All of these services only allow you to search for companies one at a time: there is no multi-criteria search that you can use to find companies by turnover, number of employees or industry sector, for example. Neither can you compare companies or conduct a detailed peer group analysis. For that you still have to use priced services such as BvD (http://www.bvdinfo.com/).

Overall, a move in the right direction and ideal if your needs are simple, for example accounts and director information for a live company. But look carefully at what features are available before you cancel your subscription service.

 

Business information – sources and search techniques

I am running my full-day business information sources and search techniques workshop for the Commercial, Legal and Scientific Information Group (CLSIG).

Date: Thursday, 16 July 2015, 9:30am to 4:30pm

Venue: CILIP, 7 Ridgmount Street, London WC1E 7AE. See map: Google Maps

Cost: CLSIG/CILIP members £85; non-members £100; concessions £50

Contact for bookings: Marie.cannon@nortonrosefulbright.com

For further details of the workshop content contact karen.blakeman@rba.co.uk

Search engines, government and official information sources, and the EU regulatory environment are continually changing. All of these affect how we search and the information that is presented to us. In some cases information may be deliberately excluded from our results. This one-day workshop will look at what’s new, key resources for business and official information, and how to use search tools to ensure you are picking up everything that you need. There will be time for practical sessions so that you can try some of the exercises provided, or experiment with your own searches. Lunch and refreshments are included.

Topics covered include:

  • effect of EU legislation on research and due diligence
  • increase in official open data – accessibility, quality, usability
  • changes to Google and other search tools, and their impact on research
  • starting points, evaluated listings and government sources
  • company information: official sources; free open data sources worldwide; companies that repackage official company information – pros and cons
  • news sources and alerting services
  • the value of social media and professional networks for business intelligence
  • statistics, market and industry data

Please email Marie Cannon to book your place (Marie.cannon@nortonrosefulbright.com)

Flickr messes up big time

My “abstract” cat (or possibly dog), according to Flickr

A few days ago Flickr revamped its website yet again. Flickr users have become used to changes that offer no improvement in functionality, and it rarely comes as a surprise when some aspects of the service are made worse. The most recent updates did not seem that significant. The layout is different; search is just as bad as ever, with odd and irrelevant results popping up; and you still cannot directly edit an incorrectly assigned Flickr location. The last is possible, but it involves a somewhat Heath Robinson approach, more of which in a separate posting.

This time, though, Flickr has made a huge mistake. It has been using image recognition technology for about a year to automatically generate tags for users’ photos but, until now, those tags have been hidden from users. They are now visible. The official announcement is on the Help Forum, Updates on tags (http://www.flickr.com/help/forum/en-us/72157652019487118/), followed by many pages of users’ comments, mostly negative. Flickr’s mistake is not in making the tags visible or doing the tagging at all, but in not allowing users the option to opt out or offering a global tag deletion tool.

The computer generated tags have been added retrospectively to everyone’s photos, so some of us now have the prospect of checking thousands of images for incorrect or irrelevant tags. My experience, so far, is that most of them are useless. I honestly cannot see how the tags “indoor” or “outdoor”, which seem to be applied to the majority of my photos, are helpful in a search. If the auto generated tags have already been used in Flickr’s search it would explain why the results are often rubbish.

It is easy to spot the difference between user and Flickr generated tags: the former are in a grey box and the latter in a white or light grey box.

Flickr user and automatically generated tags

If you want to delete a Flickr generated tag you have to do it tag by tag, photo by photo. Do not go on a tag deletion frenzy just yet, though. There are reports that the deleted tags sometimes reappear.

Oddities that I have spotted so far in my own photostream include a photo of our local polling station auto-tagged with “shop” (http://www.flickr.com/photos/rbainfo/17209179077/), and an image of a building site tagged with “snow” (http://www.flickr.com/photos/rbainfo/17332657995/). I suspect that in the latter case Flickr was confused by the amount of dust and debris surrounding what remains of the buildings.

To see the full horror of what Flickr has done, click on the Camera Roll link on your Photostream page and then Magic View. My cat has been tagged several times as a dog and once as abstract, which I suggest should be replaced by “Zen”. And to a photo of three hippos in Prague Zoo have been added animal, ape, elephant, tortoise, baby, child and people (http://www.flickr.com/photos/rbainfo/8712618469/). Note that Magic View only uses Flickr auto generated tags; we users are obviously not to be trusted!

I admit that there are a handful of instances where Flickr has reminded me of potentially relevant tags, so I might be tempted by an option whereby Flickr suggests additional tags. But I want to make the final decision as to whether to add them or not. I most certainly do not want Flickr adding, without my permission, thousands of tags to my back catalogue. And by the way, Flickr, whatever happened to my privacy setting of who can “Add notes, tags, and people: Only you”, which you have clearly breached?

It is bad enough to have to deal with the rubbish that Google dishes out, but to have to cope with Flickr’s lunacy as well is too much. Flickr, you have seriously messed up this time. Many of us do know what we are doing most of the time when we tag our photos. Carry on down this route and you won’t just annoy your users but risk losing a substantial number of them, some of whom pay for Pro accounts.

Google dumps Reading Level search filter

It seems that Google has dumped the Reading Level search filter. This was not one that I used regularly, but it was very useful when I wanted more serious, in-depth research or technically biased articles rather than consumer or retail focused pages. It often featured in the Top Tips suggested by participants of my advanced Google workshops.

It was not easy to find. To use it you had to first run your search and then, from the menu above the results, select ‘Search tools’, then ‘All results’, and from the drop-down menu ‘Reading level’. Options for switching between basic, intermediate and advanced reading levels then appeared just above the results.

Slide showing Google Reading Levels from one of my search workshops

More details of how it worked are in the blog posting I wrote when it was launched in 2010 (http://www.rba.co.uk/wordpress/2010/12/13/x-factor-web-pages-are-advanced-says-googles-reading-level/).

So another tool that helped serious researchers find relevant material bites the dust. I daren’t say what I suspect might be next but, if I’m right, its disappearance could make Google unusable for research.

And you thought Google couldn’t get any worse

We’ve all come across examples of how Google can get things wrong: incorrect supermarket opening hours (http://www.rba.co.uk/wordpress/2015/01/02/google-gets-it-wrong-again/), false information and dubious sources used in Quick Answers (http://www.rba.co.uk/wordpress/2014/12/08/the-quality-of-googles-results-is-becoming-more-strained/), authors who die 400 years before they are born (http://googlesystem.blogspot.co.uk/2013/11/google-knowledge-graph-gets-confused.html), a photo of the actress Jane Seymour ending up in a carousel of Henry VIII’s wives (http://www.slate.com/blogs/future_tense/2013/09/23/google_henry_viii_wives_jane_seymour_reveals_search_engine_s_blind_spots.html) and many more. What is concerning is that in many cases no source is given. According to Search Engine Land (http://searchengineland.com/google-shows-source-credit-quick-answers-knowledge-graph-203293) Google doesn’t provide a source link when the information is basic factual data and can be found in many places. But what if the basic factual data is wrong? It is worrying enough that incorrect or poor quality information is being presented in the Quick Answers at the top of our results and in the Knowledge Graph to the right, but the rot could spread to the main results.

An article in New Scientist (http://www.newscientist.com/article/mg22530102.600-google-wants-to-rank-websites-based-on-facts-not-links.html) suggests that Google may be looking at significantly changing the way in which it ranks websites by counting the number of false facts in a source and ranking by “truthfulness”. The article cites a paper by Google employees that has appeared in arXiv (http://arxiv.org/abs/1502.03519), “Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources”. It is heavy going, so you may prefer to stick with just the abstract:

“The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy. The facts are automatically extracted from each source by information extraction methods commonly used to construct knowledge bases. We propose a way to distinguish errors made in the extraction process from factual errors in the web source per se, by using joint inference in a novel multi-layer probabilistic model. We call the trustworthiness score we computed Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources. We then apply it to a database of 2.8B facts extracted from the web, and thereby estimate the trustworthiness of 119M webpages. Manual evaluation of a subset of the results confirms the effectiveness of the method.”

If this is implemented in some way, and based on Google’s track record so far, I dread to think how much more time we shall have to spend on assessing each and every source that appears in our results. It implies that if enough people repeat something on the web it will be deemed true and trustworthy, and that pages containing contradictory information may fall down in the rankings. The former is of concern because it is so easy to spread and duplicate misinformation throughout the web and social media. The latter is of concern because a good scientific review on a topic will present all points of view and inevitably contain multiple examples of contradictory information. How will Google allow for that?

It will all end in tears – ours, not Google’s.

More UK information vanishes into GOV.UK

Just when you’ve finally worked out how to search some of the key UK government web resources they disappear into the black hole that is GOV.UK.

The statistics publication hub went over a few weeks ago, and the link http://www.statistics.gov.uk/ now redirects to http://www.gov.uk/government/statistics/announcements. Similarly, Companies House is now to be found at http://www.gov.uk/government/organisations/companies-house and the Land Registry is at http://www.gov.uk/government/organisations/land-registry. Most of the essential data, such as company information and ownership of properties, can still be found via GOV.UK, and in fact some remains in databases on the original websites. For example, following the links on GOV.UK for information on a company eventually leads you to the familiar WebCHeck service at http://wck2.companieshouse.gov.uk/. Companies House’s useful list of overseas registries, however, seems to have totally disappeared but is in fact hidden in a general section covering all government “publications” (http://www.gov.uk/government/publications/overseas-registries#reg).

Documents may no longer be directly accessible from the new departmental home pages, so a different approach is needed if you are conducting in-depth research. GOV.UK is fine for finding out how to renew your car tax or book your driving theory test – two of the most popular searches at the moment – but its search engine is woefully inadequate when it comes to locating detailed technical reports or background papers. Using Google’s or Bing’s site command to search GOV.UK is the only way to track them down quickly, for example biofuels public transport site:www.gov.uk. Note that you need to include the ‘www’ in the site command, as site:gov.uk would also pick up articles published on local government websites. This assumes, though, that the document you are seeking has been transferred over to GOV.UK.
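As a small illustration of that tip, here is a hedged sketch that assembles such a site-restricted query into a Google search URL. The helper name is my own invention; urlencode simply takes care of quoting the query string:

```python
# Sketch: build a Google search URL restricted to www.gov.uk, as described
# above. urlencode handles the escaping of spaces and the colon.
from urllib.parse import urlencode

def govuk_search_url(terms):
    # Include the 'www' so local-government *.gov.uk sites are excluded.
    query = f"{terms} site:www.gov.uk"
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, govuk_search_url("biofuels public transport") produces a URL whose query string restricts the search to www.gov.uk pages only.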

There have been complaints from researchers, including myself, that an increasing number of valuable documents and research papers have gone AWOL as more departments and agencies are assimilated Borg-like by GOV.UK. Some of the older material has been moved to the UK Government Web Archive at http://www.nationalarchives.gov.uk/webarchive/.

This offers you various options including an A-Z of topics and departments and a search by keyword, category or website. The latter is slow and clunky with a tendency to keel over when presented with complex queries. I have spent hours attempting to refine my search and wading through page after page of results only to find that the article I need is not there, nor anywhere else, which is an experience several of my colleagues have had. This has led to conspiracy theories suggesting that the move to GOV.UK has provided a golden opportunity to “lose” documents.

I am reminded of a scene from Yes Minister:

James Hacker: [reads memo] This file contains the complete set of papers, except for a number of secret documents, a few others which are part of still active files, some correspondence lost in the floods of 1967…

James Hacker: Was 1967 a particularly bad winter?

Sir Humphrey Appleby: No, a marvellous winter. We lost no end of embarrassing files.

James Hacker: [reads] Some records which went astray in the move to London and others when the War Office was incorporated in the Ministry of Defence, and the normal withdrawal of papers whose publication could give grounds for an action for libel or breach of confidence or cause embarrassment to friendly governments.

James Hacker: That’s pretty comprehensive. How many does that normally leave for them to look at?

James Hacker: How many does it actually leave? About a hundred?… Fifty?… Ten?… Five?… Four?… Three?… Two?… One?… *Zero?*

Sir Humphrey Appleby: Yes, Minister.

From “Yes Minister” The Skeleton in the Cupboard (TV Episode 1982) – Quotes – IMDb  http://www.imdb.com/title/tt0751825/quotes 

For “floods of 1967” substitute “transfer of files to GOV.UK”.

Google gets it wrong again

Yesterday, on New Year’s Day, I came across yet another example of Google getting its Knowledge Graph wrong. I wanted to double check which local shops were open and the first one on the list was Waitrose. I vaguely recalled seeing somewhere that the supermarket would be closed on January 1st but a Google search on waitrose opening hours caversham suggested otherwise. Google told me in its Knowledge Graph to the right of the search results that Waitrose was in fact open.

Waitrose New Year’s Day opening according to Google

Knowing that Google often gets things wrong in its Quick Answers and Knowledge Graph I checked the Waitrose website. Sure enough, it said “Thursday 01 Jan: CLOSED”.

Waitrose New Year opening hours according to Waitrose

If you look at the above screenshot of the opening times you will see that there are two tabs: Standard and Seasonal. Google obviously used the Standard tab for its Knowledge Graph.

I was at home working from my laptop but had I been out and about I would have used my mobile, so I checked what that would have shown me. Taking up nearly all of the screen was a map showing the supermarket’s location and the times 8:00 am – 9:00 pm. I had to scroll down to see the link to the Waitrose site, so I might have been tempted to rely on what Google told me on the first screen. But I know better. Never trust Google’s Quick Answers or Knowledge Graph.

The quality of Google’s results is very strained

I recently received an email from a friend asking whether it was acceptable for a student to cite Google as a source in their work. My friend’s instinct was to say no, but there was a problem getting beyond Google to the original source of the answer. The student had used the Google define search option to find a definition of the term “leadership”, which Google duly supplied but without giving the source of the definition. My response to citing Google as a source is always “No”, unless it is an example of how Google presents results or a comment on the quality (or lack of it) of the information that has been found. The answers that appear at the top of the results, such as the definitions or the new quick answers, have been created and compiled by someone else, so Google should not get the credit for them. In addition, what is displayed by Google in response to the search will vary from day to day, and in creating these quick answers Google sometimes introduces errors or gets it completely wrong.

There have been several well documented instances of Google providing incorrect information in the knowledge graph to the right of search results and in the carousel that sometimes appears at the top of the page (see http://googlesystem.blogspot.co.uk/2013/11/google-knowledge-graph-gets-confused.html and http://www.slate.com/blogs/future_tense/2013/09/23/google_henry_viii_wives_jane_seymour_reveals_search_engine_s_blind_spots.html). The same problems beset the quick answers. For a short time, a Google search on David Schaal came up with a quick answer saying that he had died on April 11th, 2003! (As far as I am aware, he is still very much alive).

No source was given nor was there any indication of where this information had come from. Many have questioned Google on how it selects information for quick answers and why it does not always give the source. Google’s response is that it doesn’t provide a link when the information is basic factual data (http://searchengineland.com/google-shows-source-credit-quick-answers-knowledge-graph-203293), but as we have seen the “basic factual data” is sometimes wrong.

Quick answers above the Google results have been around for a while. Type in the name of a Premier League football club and Google will give you the results for the most recent match as well as the scores and schedule for the current season. Not being a fan myself I would have to spend some time checking the accuracy of that data or I could, like most people, accept what Google has given me as true. Looking for flights between two destinations? Google will come up with suggestions from its Google Flights; and this is where it starts to get really messy. I’ve played around with the flights option for several destinations. Although Google gives you an idea of which airlines fly between those two airports and possible costs, the specialist travel sites and airline websites give you a far wider range of options and cheaper deals. It is when we come to health related queries, though, that I have major concerns over what Google is doing.

Try typing in a search along the lines of symptoms of [insert medical condition of your choice] and see what comes up. When I searched for symptoms of diabetes the quick answer that Google gave me was from Diabetes UK.

Google Quick Answer - symptoms of diabetes

At least Google gives the source for this type of query so that I can click through to the site for further information and assess the quality. In this case I am happy with the information and the website. Having worked in the past for an insulin manufacturer I am familiar with the organisation and the work it does. It was a very different story for some of the other medical conditions I searched for.

A search for symptoms of wheat intolerance gave me a quick answer from an Australian site whose main purpose seemed to be the sale of books on food allergies and intolerances, and very expensive self-diagnosis food diaries. The quality of information and advice on the topic was contradictory and sometimes wrong. The source for the quick answer for this query varied day by day, and the quality ranged from appalling to downright dangerous. A few days ago it was the Daily Mail that supplied the quick answer, which actually turned out to be the best of the bunch, probably because the information had been copied from an authoritative site on the topic.

Today, Google unilaterally decided that I was actually interested in gluten sensitivity and gave me information from Natural News.

Google quick answer for wheat intolerance

I shall leave you to assess whether or not this page merits being a reliable, quick answer (the link to the page is http://www.naturalnews.com/038170_gluten_sensitivity_symptoms_intolerance.html).

Many of the sources that are used for a Google quick answer appear within the first three results for my searches and a few are listed at number four or five. This one, however, came in at number seven. Given that Google customises results one cannot really say whether or not the page’s position in the results is relevant or if Google uses some other way of determining what is used. Google does not say. In all of the medical queries I tested relevant pages from the NHS Choices website, which I expected to be a quick answer in at least a couple of queries, were number one or two in the results but they have never appeared as a quick answer.

Do not trust Google’s quick answers on medical queries, or anything else. Always click through to the website that has been used to provide the answer or, even better, work your way through the results yourself.

So what advice did I suggest my friend give their student? No, don’t cite Google. I already know who Google currently uses for its define command but a quick way to find out is to simply phrase search a chunk of the definition. That took me straight to an identical definition at Oxford Dictionaries (http://www.oxforddictionaries.com/), and I hope that is the source the student cited.

Turn2Us database of hardship grants

I came across Turn2Us via the Paul Lewis Money blog (http://paullewismoney.blogspot.co.uk/2014/11/hardship-grants-available.html), where there is an excellent overview of the service. Turn2Us is part of the Elizabeth Finn Care charity. As well as a useful benefits search tool, it provides a searchable database of over 3,000 charities that distribute £288 million in grants to individuals in financial hardship every year. To find potential sources of grants in your location, type in your postcode, gender and age. Some of the charities only provide grants for people who have worked for a particular company or in a particular industry, but there are many that offer support to the general population and can help with education costs, living costs or hardship in retirement.

When I typed in my postcode (Reading), age and gender it came up with a list of 72 charities. Some were bizarre in their specificity. The Edmund Godson Charity, for example, offers “one-off grants for people in need who wish to emigrate and who currently live in and around Woolwich, Shinfield near Reading, north east Herefordshire and Tenbury in Worcestershire”. I was also intrigued by one that provides grants and annuities for “older women in the UK who are not ‘of the artisan class’”, and was left wondering whether I would qualify.

Those idiosyncrasies aside, there is a wide range of help available here from charities that are little known and not easy to find.

How to alienate and seriously annoy your users

LinkedIn is seriously annoying some of its users. Megan Roberts recently reported her experiences of the network’s data insecurity on her blog, in “LinkedIn and data insecurity” (http://meganjroberts.wordpress.com/2014/06/10/linkedin-and-data-insecurity/). But it seems they’ve upped their game.

I had (or possibly still do have) a Personal Premium account. As I don’t find the limited extra features of any use I decided to cancel my premium account about three weeks ago, well in advance of the renewal date. Having filled out the online forms I assumed that was all I had to do, but each time I logged in to my account it was still marked as a Premium account. So I went through the cancellation process again. I waited a few days but my account was still marked as premium. I went through the cancellation procedure again. My account was still labelled as Premium but when I went to try and cancel it a fourth time it was marked as already cancelled. Success? Well – no.

Today I checked my business bank account and saw that LinkedIn had debited my account for the renewal fee despite my cancellation. Perhaps I should have been alerted to potential problems when confirmation emails failed to arrive. But under my account settings the premium account was finally marked as cancelled, so I assumed that was that.

I have raised a ticket with LinkedIn but I doubt I’ll get any sense from them – I never have done in the past. First thing in the morning I am reporting the debit to my bank as an unauthorised transaction.

Congratulations, LinkedIn, on developing a strategy that is guaranteed to thoroughly p*** off your users.

Update: LinkedIn have now apologised for the “misunderstanding”. My account has been reset to “basic” and they have refunded my money.
