Many thanks to Emily Scott who alerted me on Twitter to a priceless example of Google Knowledge Graph getting it totally wrong.
For those of you who don’t know what the Knowledge Graph is, it is the box that sometimes appears on the right-hand side of your results, which pulls together information on your topic from a variety of sources. To quote Search Engine Land, it is “a system that Google launched in May 2012 that understands facts about people, places and things and how these entities are all connected”. The problem is that Google quite often gets it wrong, although usually it is just one fact that is incorrect. One of the better known examples is when Google decided that the American author Robert Greene was born in 1959 but died in 1592. It had confused him with the 16th century English writer of the same name. As I always say in my Google workshops, never trust the information that appears in the Knowledge Graph. The data comes from different sources that may be referring to entities that are not related at all.
The example that Emily encountered, though, is in a league of its own. She was searching for frugivores (fruit eaters) and this is what Google’s Knowledge Graph suggested:
As far as I am aware, fruit is not the preferred food of wolves, cats or lions. Clicking on the “View 45+ more” option for representative species we see that Google is under the impression that cheetahs, killer whales, polar bears and leopards are also frugivores.
I’ll allow raccoons although I wouldn’t say that fruit is their preferred food. But, hey, what do I know about raccoons other than that my US friends tell me the little s***s raid trash cans and will eat anything they can get their paws on.
No doubt someone has already reported the error via the feedback link and someone at Google is busy correcting it. Enjoy and take screenshots while it is still there.
As well as advanced Google search features and alternative search tools I comment on the direction Google is going in. Note that this presentation was given before the Alphabet announcement. Those of you who have attended my Google and non-Google search tool workshops should know most of what is in the slides, but they might serve as a useful reminder.
As well as the general dumbing down and relentless removal of search options, it covers the new technologies that Google is experimenting with: artificial intelligence, driver-less cars, robotics, home environment sensors and controls. Some of this is already being integrated with search and “mobile”.
I am running a “New Google, New Challenges” workshop for UKeiG this autumn in Manchester and London. It concentrates on search, how the changes at Google are impacting the way it manages our search and presents results, and how to use what is left of the advanced search techniques and specialist databases for more relevant research results.
It seems that Google has dumped the Reading Level search filter. This was not one that I used regularly but it was very useful when I wanted more serious, in-depth, research or technically biased articles rather than consumer or retail focused pages. It often featured in the Top Tips suggested by participants of my advanced Google workshops.
It was not easy to find. To use it you had to first run your search and then from the menu above the results select ‘Search tools’, then ‘All results’, and from the drop down menu ‘Reading level’. Options for switching between basic, intermediate and advanced reading levels then appeared just above the results.
So another tool that helped serious researchers find relevant material bites the dust. I daren’t say what I suspect might be next but, if I’m right, its disappearance could make Google unusable for research.
“The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy. The facts are automatically extracted from each source by information extraction methods commonly used to construct knowledge bases. We propose a way to distinguish errors made in the extraction process from factual errors in the web source per se, by using joint inference in a novel multi-layer probabilistic model. We call the trustworthiness score we computed Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources. We then apply it to a database of 2.8B facts extracted from the web, and thereby estimate the trustworthiness of 119M webpages. Manual evaluation of a subset of the results confirms the effectiveness of the method.”
If this is implemented in some way, and based on Google’s track record so far, I dread to think how much more time we shall have to spend on assessing each and every source that appears in our results. It implies that if enough people repeat something on the web it will be deemed to be true and trustworthy, and that pages containing contradictory information may fall down in the rankings. The former is of concern because it is so easy to spread and duplicate misinformation throughout the web and social media. The latter is of concern because a good scientific review on a topic will present all points of view and inevitably contain multiple examples of contradictory information. How will Google allow for that?
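To make that worry concrete, here is a deliberately naive sketch in Python. It is not the multi-layer probabilistic model described in the abstract. It simply assumes that "truth" is decided by a majority vote across sources, which is the scenario I am concerned about. The site names, the fact label and the helper functions are all made up for illustration.

```python
from collections import Counter


def consensus(claims_by_source):
    """Pick the most frequently repeated value for each fact (naive majority vote)."""
    votes = {}
    for claims in claims_by_source.values():
        for fact, value in claims.items():
            votes.setdefault(fact, Counter())[value] += 1
    return {fact: counter.most_common(1)[0][0] for fact, counter in votes.items()}


def naive_trust(claims_by_source):
    """Score each source by the fraction of its claims that agree with the consensus."""
    truth = consensus(claims_by_source)
    scores = {}
    for source, claims in claims_by_source.items():
        if not claims:
            continue  # a source with no extracted facts cannot be scored
        agreeing = sum(1 for fact, value in claims.items() if truth[fact] == value)
        scores[source] = agreeing / len(claims)
    return scores


# Hypothetical data: three sites copy the same mistake, one careful source is right.
claims = {
    "copy_site_1": {"fact_x": "repeated error"},
    "copy_site_2": {"fact_x": "repeated error"},
    "copy_site_3": {"fact_x": "repeated error"},
    "careful_review": {"fact_x": "correct value"},
}

scores = naive_trust(claims)
print(consensus(claims))
print(scores)
```

Under this assumed voting scheme the three copies score 1.0 and the careful source scores 0.0, simply because the error has been repeated more often.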
Yesterday, on New Year’s Day, I came across yet another example of Google getting its Knowledge Graph wrong. I wanted to double check which local shops were open and the first one on the list was Waitrose. I vaguely recalled seeing somewhere that the supermarket would be closed on January 1st but a Google search on waitrose opening hours caversham suggested otherwise. Google told me in its Knowledge Graph to the right of the search results that Waitrose was in fact open.
Knowing that Google often gets things wrong in its Quick Answers and Knowledge Graph I checked the Waitrose website. Sure enough, it said “Thursday 01 Jan: CLOSED”.
If you look at the above screenshot of the opening times you will see that there are two tabs: Standard and Seasonal. Google obviously used the Standard tab for its Knowledge Graph.
I was at home working from my laptop but had I been out and about I would have used my mobile, so I checked what that would have shown me. Taking up nearly all of the screen was a map showing the supermarket’s location and the times 8:00 am – 9:00 pm. I had to scroll down to see the link to the Waitrose site so I might have been tempted to rely on what Google told me on the first screen. But I know better. Never trust Google’s Quick Answers or Knowledge Graph.
I recently received an email from a friend asking whether it was acceptable for a student to cite Google as a source in their work. My friend’s instinct was to say no, but there was a problem getting beyond Google and to the original source of the answer. The student had used the Google define search option to find a definition of the term “leadership”, which Google duly provided but without giving the source of the definition. My response to citing Google as a source is always “No” unless it is an example of how Google presents results or a comment on the quality (or lack of it) of the information that has been found. The answers that appear at the top of the results, such as the definitions or the new quick answers, have been created and compiled by someone else so Google should not get the credit for them. In addition, what is displayed by Google in response to the search will vary from day to day and in creating these quick answers Google sometimes introduces errors or gets it completely wrong.
No source was given nor was there any indication of where this information had come from. Many have questioned Google on how it selects information for quick answers and why it does not always give the source. Google’s response is that it doesn’t provide a link when the information is basic factual data (http://searchengineland.com/google-shows-source-credit-quick-answers-knowledge-graph-203293), but as we have seen the “basic factual data” is sometimes wrong.
Quick answers above the Google results have been around for a while. Type in the name of a Premier League football club and Google will give you the results for the most recent match as well as the scores and schedule for the current season. Not being a fan myself I would have to spend some time checking the accuracy of that data or I could, like most people, accept what Google has given me as true. Looking for flights between two destinations? Google will come up with suggestions from its Google Flights; and this is where it starts to get really messy. I’ve played around with the flights option for several destinations. Although Google gives you an idea of which airlines fly between those two airports and possible costs, the specialist travel sites and airline websites give you a far wider range of options and cheaper deals. It is when we come to health related queries, though, that I have major concerns over what Google is doing.
Try typing in a search along the lines of symptoms of [insert medical condition of your choice] and see what comes up. When I searched for symptoms of diabetes the quick answer that Google gave me was from Diabetes UK.
At least Google gives the source for this type of query so that I can click through to the site for further information and assess the quality. In this case I am happy with the information and the website. Having worked in the past for an insulin manufacturer I am familiar with the organisation and the work it does. It was a very different story for some of the other medical conditions I searched for.
A search for symptoms of wheat intolerance gave me a quick answer from an Australian site whose main purpose seemed to be the sale of books on food allergies and intolerances, and very expensive self-diagnosis food diaries. The quality of information and advice on the topic was contradictory and sometimes wrong. The source for the quick answer for this query varied day by day and the quality ranged from appalling to downright dangerous. A few days ago, it was the Daily Mail that supplied the quick answer, which actually turned out to be the best of the bunch, probably because the information had been copied from an authoritative site on the topic.
Today, Google unilaterally decided that I was actually interested in gluten sensitivity and gave me information from Natural News.
Many of the sources that are used for a Google quick answer appear within the first three results for my searches and a few are listed at number four or five. This one, however, came in at number seven. Given that Google customises results, one cannot really say whether or not the page’s position in the results is relevant or if Google uses some other way of determining what is used. Google does not say. In all of the medical queries I tested, relevant pages from the NHS Choices website, which I expected to be a quick answer in at least a couple of queries, were number one or two in the results but they have never appeared as a quick answer.
Do not trust Google’s quick answers on medical queries, or anything else. Always click through to the website that has been used to provide the answer or, even better, work your way through the results yourself.
So what advice did I suggest my friend give their student? No, don’t cite Google. I already know who Google currently uses for its define command but a quick way to find out is to simply phrase search a chunk of the definition. That took me straight to an identical definition at Oxford Dictionaries (http://www.oxforddictionaries.com/), and I hope that is the source the student cited.
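If you want to repeat the phrase search trick for several definitions, it can be sketched in a few lines of Python. This is only a minimal sketch: the `/search?q=` address and the function name `phrase_search_url` are my own assumptions rather than anything Google documents here, and the snippet is a placeholder, not the actual definition.

```python
from urllib.parse import urlencode


def phrase_search_url(snippet, base="https://www.google.co.uk/search"):
    """Build a search URL that phrase searches the snippet (wrapped in double quotes)."""
    cleaned = " ".join(snippet.split())  # collapse line breaks and repeated spaces
    cleaned = cleaned.replace('"', "")   # stray quote marks would break the phrase
    query = '"' + cleaned + '"'
    return base + "?" + urlencode({"q": query})


# Placeholder text: paste in a chunk of the definition you want to trace.
url = phrase_search_url("a chunk of   the definition goes here")
print(url)
```

Open the resulting address in your browser and look for pages carrying the identical wording.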
These Top Ten search tips come from an advanced workshop I recently ran for a group in Oxford. If this is the first Top Tips that you have read on this blog, here are a few words of explanation as to how these are generated. These are not my own personal tips but are nominated by people who have attended my full day workshops and tried out the various commands and techniques during the practical sessions.
The participants on this particular workshop were experienced, heavy duty researchers so I was keen to see what they came up with.
1. Verbatim
This is a regular in the Top Ten lists on this blog. It is an essential tool for making Google behave and forcing it to run your search the way you want it run, but it is well hidden. Google automatically looks for variations on your terms and sometimes drops terms from the search. To make Google carry out your search exactly as you have typed it in, first run your search, then click on ‘Search tools’ in the menu above your results. In the second line of options that appears click on ‘All results’ and from the drop down menu select Verbatim. This is very useful when searching for an article by title and Google decides to ignore the double quote marks, which it sometimes does if it thinks you don’t have enough results. If you are carrying out in-depth research it is worth using Verbatim even if your “normal” Google results seem to be OK. You may see very different content in your results list.
2. site: search and -site:
Use the site: command to focus your search on particular types of site, for example site:ac.uk for UK academic websites, or to search inside a large rambling site. If you prefer you can use the Advanced search screen at http://www.google.co.uk/advanced_search and fill in the site or domain box. You can also use -site: to exclude sites from your search.
3. filetype:
Use the filetype: command to limit your research to PowerPoint for presentations, spreadsheets for data and statistics or PDF for research papers and industry/government reports.
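If you build the same kind of search again and again, the site:, -site: and filetype: commands can be assembled with a small helper. This is only a sketch: the function name `build_query` and the example terms are hypothetical, and the output is simply the text you would type into the search box.

```python
def build_query(terms, site=None, exclude_sites=(), filetype=None):
    """Combine search terms with optional site:, -site: and filetype: commands."""
    parts = [terms.strip()]
    if site:
        parts.append("site:" + site)        # focus on one site or type of site
    for unwanted in exclude_sites:
        parts.append("-site:" + unwanted)   # exclude a site from the results
    if filetype:
        parts.append("filetype:" + filetype)
    return " ".join(parts)


# Research papers on UK academic sites, in PDF format.
academic = build_query("solar panels efficiency", site="ac.uk", filetype="pdf")
print(academic)

# The same terms, but leaving out one site (example.com is a placeholder).
filtered = build_query("solar panels efficiency", exclude_sites=["example.com"])
print(filtered)
```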
4. Asterisk * between terms
Use the asterisk between two words to stand in for 1-5 words. This is useful if you want two of your keywords close to one another but suspect that there may often be one or two words separating them. For example solar * panels will find solar photovoltaic panels, solar water heating panels etc.
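To check whether text you have already collected contains two keywords within a few words of each other, you can imitate the asterisk locally with a regular expression. This is my own emulation of the 1-5 word gap described above, not how Google implements it, and the function name `near_pattern` is hypothetical.

```python
import re


def near_pattern(first, second, max_gap=5):
    """Match `first`, then between 1 and `max_gap` other words, then `second`."""
    gap = r"(?:\s+\S+){1,%d}" % max_gap
    return re.compile(
        r"\b" + re.escape(first) + gap + r"\s+" + re.escape(second) + r"\b",
        re.IGNORECASE,
    )


pattern = near_pattern("solar", "panels")

for text in [
    "solar photovoltaic panels",
    "solar water heating panels",
    "solar panels",  # no word in between, so no match
]:
    print(text, "->", bool(pattern.search(text)))
```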
5. Numeric range search
This command is unique to Google. Use it for anything to do with numbers: years, temperatures, weights, distances, prices etc. Simply type in your two numbers separated by two full stops as part of your search. For example, you could use it to limit your search to forecasts covering a future time period.
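The syntax is easy to generate if you are preparing a batch of searches. The sketch below is illustrative only: the function names and the example terms and years are made up.

```python
def numeric_range(low, high):
    """Return two numbers separated by two full stops, smallest first."""
    if low > high:
        low, high = high, low
    return "{}..{}".format(low, high)


def with_range(terms, low, high):
    """Append a numeric range to the search terms."""
    return terms.strip() + " " + numeric_range(low, high)


# Hypothetical example: forecasts that mention years in a future period.
query = with_range("oil demand forecast", 2020, 2030)
print(query)
```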
6. Incognito/private browsing
Even if you are not signed in to a Google account, Google personalises your results according to your search and browsing behaviour using the cookies that are stored on your computer. If you want to burst out of the filter bubble, as it is often called, use a private browser window or incognito (Chrome). Google will then ignore tracking and search cookies on your machine. To call up a private browser or incognito window use the following keys:
Chrome – Ctrl+Shift+N
Firefox – Ctrl+Shift+P
Internet Explorer – Ctrl+Shift+P
7. Public Data Explorer
The Public Data Explorer is one of Google’s best kept secrets. It can be found at http://www.google.com/publicdata/ and allows you to search open data sets from organisations such as the IMF, OECD, IM, Eurostat and the World Bank. You can compare the data in a number of ways and there are several charting options.
8. Repeat search terms
If you are fed up with seeing the same results for a search repeat your main search term or terms. This often changes the emphasis of your search and the order in which the results appear.
9. Change order of terms
Changing the order in which you type in your search terms can change the order of your results. The pages that contain the terms in the order you specified in your search are usually given a higher weighting. This is another useful tip for when you are stuck in a search rut and are seeing the same results over and over again.
10. Different country versions
The country versions of Google give priority to the country’s local content, although it might be in the local language. This is a useful strategy when searching for research groups, companies and people that are active in a specific country. Use the standard ISO two letter country code, for example http://www.google.fr/ for Google France, http://www.google.it/ for Google Italy. It is also worth trying your search in Google.com. Your results may be more international or US focused and Google usually rolls out new search features in Google.com before launching in other country versions. If Google insists on redirecting you to your own local country version, go to the bottom right hand corner of the Google home page and you should see a link to Google.com.
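If you want to run the same search across several country versions, the addresses can be generated rather than typed. This is a minimal sketch that assumes the simple google.xx pattern used in the examples above and a `/search?q=` address. Both are assumptions, and the pattern does not hold everywhere: the UK version, for example, lives at google.co.uk. The function name is hypothetical.

```python
from urllib.parse import urlencode


def country_search_url(country_code, terms):
    """Build a search URL for a country version that follows the google.xx pattern."""
    cc = country_code.strip().lower()
    if len(cc) != 2 or not (cc.isascii() and cc.isalpha()):
        raise ValueError("expected a two letter country code, got %r" % country_code)
    query = urlencode({"q": terms})
    return "https://www.google." + cc + "/search?" + query


urls = [country_search_url(cc, "solar energy") for cc in ("fr", "it")]
for url in urls:
    print(url)
```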
I have several Google accounts used for different purposes. I set up the first in the very early days of Google, long before even Gmail arrived on the scene, in order to manage analytics and what I then called “serious stuff” related to my business website. I subsequently used it for managing my YouTube videos. I set up a second account when Google Labs and Gmail came along and regarded that as my experimental account. Gradually, I used the second one more and more as my main account but kept the first for my business website applications. When Google+ came along I “upgraded” the second account and set up a profile.
Everything was fine until one day I tried to access my YouTube videos that were linked to my first, non-Google+ account. YouTube encouraged me to set up a Google+ profile for this account but I declined. YouTube responded by making my videos invisible to everyone, including myself! So I gave in and set up a second Google+ profile.
If only that had been the end of it. People started adding this new profile to their circles rather than my main one. I tried to find ways around this but in the end decided to just abandon the YouTube videos and delete the superfluous Google+ profile. It is easily done via your Google+ settings page but of course there are numerous dire warnings of all the wonderful things that you will no longer be able to enjoy (not a lot actually!). Despite what has been implied in the past, deleting, or what Google calls “downgrading”, your Google+ account does NOT delete your ordinary Google account.
This is a feature which I have been seeing on and off for a few months so I’m not sure if it is one of their experiments or if it is being rolled out gradually. It’s very simple: advertisements that appear at the top of your results lists and in the panel to the right are marked with a little yellow box with ‘Ad’ written inside.
Over the years it has become harder to identify ads at the top of results as the pale pastel backgrounds to them became more subtle. It has been suggested that the more obvious marker is a consequence of discussions between Google and various regulatory authorities.
News and comments on search tools and electronic resources for research