Tag Archives: Search Strategies

Google still thinks coots are possibly cats (or cows)

I have been dining out on the ‘Google thinks coots are lions’ story for several months but decided that its inclusion in my presentation at INFORUM 2011 in Prague should be its last outing. (See my blog postings at http://www.rba.co.uk/wordpress/2011/02/12/google-decides-that-coots-are-really-lions/ and http://www.rba.co.uk/wordpress/2011/02/21/update-on-coots-vs-lions/ for the details on this story.) Towards the end of my talk I pointed out that Google has now abandoned coots=lions and carries out what I consider to be a normal search for coots mating behaviour, or as normal as any Google search can be. I had checked in Google.co.uk, Google.com and Google.cz a couple of weeks before the INFORUM conference and coots were definitely black, medium sized water birds and not large furry mammals with huge fangs and claws. As I concluded my presentation, though, I saw a few people in the audience staring at their laptops and shaking their heads. One of them came up to me during the break and pointed out that Google Czech Republic was offering cats instead of coots for the first two results. This prompted a quick review of the Google coots/cats/lions situation.

The search: coots mating behaviour

Google.co.uk gives a reasonable set of results but having blogged and included details of the search in so many presentations and newsletters my own pages are taking over the top positions in the results.

Google UK coots search

Google.com gives similar results.

Google.cz however has different ideas. It offers me three articles from Google Scholar and then says “Did you mean cats mating behaviour” in Czech and gives me two results on that subject. The rest of the results are all about coots, so at least Google.cz is giving me my original search as an option rather than unilaterally deciding I really meant cats.

Google Czech Republic and Coots

Looking at other country versions of Google, Google.no and Google.se came up with similar results. Google Germany, however, thinks coots are cows and even throws in a YouTube video:

Google Germany Coots

I am not going to even begin to try and work out what is going on. Three of us nearly went mad attempting to get to the bottom of the original coots=lions oddity. But it does make one wonder even more whether Google can be trusted to come up with even a handful of useful results.
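For anyone who wants to repeat the country-by-country comparison, here is a minimal Python sketch that builds the same search URL for each country version mentioned above. The `/search?q=` URL pattern, the `google.de` domain for "Google Germany" and the function name `country_search_urls` are assumptions for illustration; results will also vary with your location, login status and web history.

```python
from urllib.parse import urlencode

# Country versions named in the post; google.de is assumed for "Google Germany".
GOOGLE_DOMAINS = [
    "google.co.uk",
    "google.com",
    "google.cz",
    "google.no",
    "google.se",
    "google.de",
]


def country_search_urls(query, domains=GOOGLE_DOMAINS):
    """Return a dict mapping each Google domain to a search URL for the query."""
    # The /search?q=... pattern is an assumption, not a documented guarantee.
    return {d: f"https://www.{d}/search?" + urlencode({"q": query}) for d in domains}


urls = country_search_urls("coots mating behaviour")
for domain, url in urls.items():
    print(domain, url)
```

Opening the URLs side by side in separate tabs makes it easier to spot where one country version substitutes cats or cows for coots.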

All About Google – Top Tips

As well as the “Anything BUT Google” sessions, I have also been running “All About Google” workshops. The participants are asked to come up with a group Top 10 Tips and a combined list from the last three events is listed below. Many tips were common to all three so the final list has 16 tips. I also spotted people experimenting with the Google Art Project (http://www.googleartproject.com/), Fusion Tables (http://www.google.com/fusiontables/), Google Custom Search Engines (http://www.google.com/cse), Google Internet Statistics (http://www.google.co.uk/intl/en/landing/internetstats/), and one person found Google Labs Transliteration (http://www.google.com/transliterate/) very useful.

1. Use the filetype: command or the file format option on the Advanced Search screen to limit your research to PowerPoint for presentations, spreadsheets for data and statistics or PDF for research papers and industry/government reports. Note that filetype:ppt, for example, will not pick up the newer .pptx so you will need to incorporate both into your strategy, for example filetype:ppt OR filetype:pptx

2. Use the plus sign (+) before a term or phrase to try and force an exact match – be aware, though, that Google sometimes still does what it wants with your terms – or use the minus sign immediately before a term to exclude pages that contain it. The minus sign can also be used with commands to exclude, for example, a specific site (-site:nameofsite.com) or a file format (-filetype:ppt) from your results.

3. Include the site: command in your strategy or use the domain/site box on the advanced search screen to focus your search on particular types of site, for example site:nhs.uk

4. Try the two proximity commands. An asterisk (*) between two words will look for your words in the order specified and separated by one or more terms, for example solar * panels. The AROUND(n) command, which is undocumented, looks for your terms in either order separated by the number of words (n) specified, for example solar AROUND(2) panels. Note that AROUND did not work for everyone on the workshop.

5. Usage rights. Use the Advanced Search screens for the web and image search to limit your search to Creative Commons material. The options are in the pull down menu under Usage Rights.

6. Use Google Realtime (http://www.google.com/realtime) for searching Twitter. Other social networks are supposedly included but the results are usually dominated by Twitter. Archives go back to February 2010 and there is a useful timeline that enables you to visualise activity over time and look at specific dates.

7. Use the tilde (~) before a term to search for synonyms. For example ~energy will search for energy, power, oil, gas, electricity or electric.

8. Wonder wheel. This can be found in the side bar to the left of your web search results page. Google pulls out terms and phrases from the top results and represents them as spokes on a wheel. Click on one of them and your search is revised and another wheel created. You can view the list of results to the right of the wheel. Note: the Wonder wheel is not available if you have Instant Search switched on.

9. Change the order in which you enter your search terms. This will change the order in which your results are presented and in some cases can change the search completely.

10. Repeat important terms to change the order in which results are presented. Like changing the order of your search terms, this can sometimes significantly alter the results.

11. Google Reader (http://www.google.com/reader). As well as using it to aggregate RSS feeds that you have entered individually, you can use the Add Subscription box to search for new feeds using keywords.

12. Google Scholar (http://scholar.google.com/). Although there are serious limitations to Google Scholar and the advanced search options are unreliable, it can be very useful in tracking down the details of a half-remembered reference. One member of the workshop explained that students often fail to accurately note down articles mentioned in lectures. The specialist databases do not always retrieve the references in these cases whereas Google Scholar often does.

13. Google Scholar for citations. Although it is far from comprehensive and sometimes inaccurate, not everyone can afford the more reliable but expensive databases. (Note: although it does not cover all subjects, it is worth looking at Microsoft Academic Search at http://academic.research.microsoft.com/ as an alternative.)

14. Quality. Just because you found something through a Google search does not mean it is true or a trusted source, or that it is the most relevant document. Young students in particular often need to be reminded of this.

15. Open up the side bars to the left of your results. The options change depending on the type of search (general web search, images, news, books, recipes) and it is the key to narrowing down your search, especially by date.

16. Stand your ground! Don’t let Google take over. Clear your web history, cache and cookies. If you are responsible for access to the internet in your information centre or library, set up the browsers so that web histories and caches are cleared every time a user logs out. (You may need to enlist help from IT to set this up.)
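Several of the tips above (filetype:, site: and the minus sign) can be combined in a single strategy. The following is a minimal Python sketch that assembles such a query string; the helper name `build_query` and its parameters are hypothetical, and the example search terms are only there for illustration. As noted in tip 2, Google may still reinterpret whatever you send it.

```python
from urllib.parse import quote_plus


def build_query(terms, filetypes=(), site=None, exclude_terms=(), exclude_sites=()):
    """Assemble a Google query string from the operators in tips 1 to 3."""
    parts = [terms]
    if filetypes:
        # filetype:ppt does not pick up .pptx, so OR the variants together (tip 1)
        parts.append(" OR ".join(f"filetype:{ft}" for ft in filetypes))
    if site:
        # Focus on a particular site or type of site (tip 3)
        parts.append(f"site:{site}")
    # A minus sign immediately before a term or command excludes it (tip 2)
    parts.extend(f"-{term}" for term in exclude_terms)
    parts.extend(f"-site:{s}" for s in exclude_sites)
    return " ".join(parts)


query = build_query(
    "macular degeneration",
    filetypes=("ppt", "pptx"),
    site="nhs.uk",
    exclude_terms=("surgery",),
)
print(query)
# The URL pattern below is an assumption about how Google accepts queries.
print("https://www.google.co.uk/search?q=" + quote_plus(query))
```

Building the string in one place makes it easy to switch an operator on or off and see how the results change.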

 

Anything but Google – URLs

I omitted to include the URLs of some of the specialist tools mentioned in the Anything but Google presentation. You could Bing or Yahoo the names of the services (we’re not going to Google them are we?) but to save time I’ve listed them below.

ChemSpider – Database of Chemical Structures and Property Predictions
http://www.chemspider.com/
Owned by the Royal Society of Chemistry, ChemSpider links together compound information across the web and provides free text and structure search of millions of chemical structures. Search by systematic name, synonym, trade name, registry number, SMILES or InChI.

Biznar http://biznar.com/
Live federated search from Deep Web Technologies and covering 60 business collections. As well as presenting you with a standard list of results, the pages are organised into folders on the left hand side of the screen covering topics, authors, publications, publishers and dates (years).

TechXtra http://www.techxtra.ac.uk/
This is an initiative of Heriot-Watt University providing a free service for finding articles, books, industry news, job announcements, technical reports, technical data, full text eprints, theses and dissertations in engineering, mathematics and computing.

Scirus http://www.scirus.com/
Owned by Elsevier, Scirus covers scientific information. (See the About Us section for the full details). Some of the information is from free web resources but it also includes many priced articles.

PhilPapers: Online Research in Philosophy http://philpapers.org/
Directory of online philosophical articles and books by academic philosophers. Its purpose is “to facilitate the exchange and development of philosophical research through the Internet. Our service gathers and organizes philosophical research on the Internet, and provides tools for philosophers to access, organize, and discuss this research.”

Microsoft Academic Search http://academic.research.microsoft.com/
Currently concentrates on chemistry, computer science, engineering, mathematics and physics. It has advanced search options that actually work (unlike Google Scholar!), lists citations and has a wonderful Visual Explorer.

Not mentioned in the slides but discussed briefly during the session was HealthMash http://healthmash.com/, a semantic metasearch health search engine with “clustering and advanced linguistic capabilities.” I’d be interested in people’s experiences and views of this one.

Update on coots vs. lions

If you have landed on this page thinking that this is a post about your favourite football or rugby team, please note that this is an update on my earlier article ‘Google decides that coots are really lions’ (http://www.rba.co.uk/wordpress/2011/02/12/google-decides-that-coots-are-really-lions/). It has nothing to do with sporting activities unless you count trying to work out what Google is doing with your search! The original post was about how and why Google decided that a search on coots mating behaviour should really have been lions mating behaviour.

The first response to my posting was a comment from Arthur Weiss (http://www.rba.co.uk/wordpress/2011/02/12/google-decides-that-coots-are-really-lions/comment-page-1/#comment-14207).
He suggested that Google was treating coots and lions as synonyms (both are living creatures). I thought that was pushing synonyms too far even for Google. (Sorry, Arthur).

I then had two comments in quick succession from Susanna Winter via Twitter (@Mrs_Figaro). The first is at (http://twitter.com/Mrs_Figaro/statuses/36714410223341568):

Twitter comment on lions vs coots

Moving coots from the beginning to the end of the strategy resulted in an exact match and not a single lion in sight:

Mating behaviour coots

Changing the order of the search terms is a trick I often use to change the order of my results or bring up pages that might be buried in the hundreds or thousands, but I have never seen such a dramatic change as this.
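If you want to try every ordering of a short strategy rather than just one or two, a few lines of Python will list them all. This is only a sketch: the function name `term_orderings` is hypothetical, and it treats each word as a separate term.

```python
from itertools import permutations


def term_orderings(terms):
    """Return every ordering of the given search terms as a query string."""
    return [" ".join(order) for order in permutations(terms)]


orderings = term_orderings(["coots", "mating", "behaviour"])
for q in orderings:
    print(q)
```

Three terms give six orderings, including the original strategy and the one with coots at the end; the number grows quickly, so this is only practical for short strategies.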

Susanna’s search strategy ‘coots feeding behaviour’, which came up with an exact match, muddied the waters even more. Perhaps there is a search frequency algorithm coming into play? Are there more searches for lions mating behaviour than for coots, but not lions feeding behaviour? I am not convinced that this explains Google’s insistence on looking for lions rather than our animal of choice. Susanna’s next tweet suggests what is going on (http://twitter.com/Mrs_Figaro/statuses/36715389190676480):

Google spelling correction

What you see is:

Google coots search minus lions

So Arthur was on the right track. (My apologies, Arthur).  What probably happened with our search is, as Susanna said, that Google first assumed a typo and then did a synonym search on cats. What puzzles me, though, is how Google arrived at cats from coots. Surely coyotes or goats would be nearer when it comes to typographical errors?
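One conventional way to measure how "near" two words are as typos is the edit (Levenshtein) distance: the smallest number of single-character insertions, deletions or substitutions needed to turn one word into the other. The sketch below computes it for the candidates mentioned here. It is an assumption that anything like this plays a part in Google's spelling correction, which almost certainly also draws on query frequency and other signals.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


for candidate in ["cats", "goats", "coyotes", "lions"]:
    print(candidate, edit_distance("coots", candidate))
# cats 2
# goats 2
# coyotes 2
# lions 3
```

On this measure alone cats, goats and coyotes are all equally close to coots, so edit distance by itself does not explain why cats was chosen; it does suggest that lions was unlikely to have been reached as a direct typo correction.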

I have two final variations on our search to confuse you even further.

The first is repeating coots at the start of the strategy. An exact match:

Repeating coots in the search

Now move one of the ‘coots’ to the end of the strategy and Google asks “Did you mean lions mating behaviour coots”:

Repeating coots in the search

I give up!

Google decides that coots are really lions

First of all let us make sure we all know the difference between lions and coots. As far as I can recall, lions are huge, snarly, growly, land animals that are liable to eat you if you cross their path. This appears to be confirmed by Wikipedia (http://en.wikipedia.org/wiki/Lions) but of course Wikipedia could be wrong. Coots are  medium sized water birds (http://en.wikipedia.org/wiki/Coots) and the worst that could befall you should you antagonise one is a severe pecking.

I was walking by the Thames in Caversham today and took several photos of the birds on the river. One was of two coots who were having what appeared to be a minor domestic or an argument over territory, but a friend suggested to me that what I saw was coot mating behaviour. What do you do in a situation such as this? You Google.

My search on coots mating behaviour came up with:

Google's interpretation of search on 'coots mating behaviour'

Where the [expletive deleted] did the lions come from?? I just do not understand how Google managed to replace coots with lions. One is a water bird with wings, feathers, and a beak and the other a large, aggressive land mammal with fur, claws and big teeth. But Google, yet again, has decided to go off and run its own search. (See my posting Oi! Google – you have seriously overstepped the mark http://www.rba.co.uk/wordpress/2011/01/03/oi-google-you-have-seriously-overstepped-the-mark/).

So did I get what I wanted by clicking on “Search instead for coots mating behaviour”? Yes I did, but Google still thinks I really want to search for lions and asks “Did you mean: lions mating behaviour”. Google has totally lost the plot.

What Google should have given me in the first place

And the photo that started it all? That can be found on my Flickr account at http://www.flickr.com/photos/rbainfo/5438769506/. I think you will agree that coots are very different from lions (http://commons.wikimedia.org/wiki/File:P_l_Bleyenberghi.jpg)

AROUND: Google proximity search operator

Several people have already blogged about Google’s AROUND proximity operator: Digital Inspiration, ResearchBuzz, SearchReSearch and Phil Bradley to name just four. According to SearchReSearch the command has been available for 5-6 years, which begs the question “Why has no-one picked up on it before now?” Could it possibly be because the operator does not do what it says on the tin? Perish the thought and wash my brain out with soap and water for even considering such a thing.

The AROUND command allows you to specify the maximum number of words that separate your search terms. The syntax is firstword AROUND(n) secondword. For example oil AROUND(2) production.
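As a small illustration of the syntax, here is a Python sketch that builds an AROUND query from two terms and a maximum separation. The helper name `around_query` is hypothetical; the only thing it encodes is the firstword AROUND(n) secondword pattern, with the operator kept in upper case.

```python
def around_query(first, second, n):
    """Build a 'first AROUND(n) second' proximity query string."""
    if n < 0:
        raise ValueError("n must be zero or a positive whole number")
    # AROUND is written in capitals so that it is not treated as a search term
    return f"{first} AROUND({n}) {second}"


print(around_query("oil", "production", 2))
# oil AROUND(2) production
```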

The reason I have not commented on AROUND so far is because – how can I put this politely – I am finding it difficult to find a search in which it is of practical value. I shall illustrate with just one of my searches, macular degeneration, but my experiences with other test and “real” searches are similar. When testing search features the relevance of the documents that appear on the first few pages of the results is more important than the number of hits, especially as the latter are often guesstimates from Google and can vary enormously depending on which version of Google you use. Nevertheless, the numbers are interesting even if they only serve to confuse us further and I have included them with the screen shots. All of the following searches were run in Google.co.uk.

Let’s kick off with a very basic version of my test search: macular degeneration

Number of results: 7,340,000

Macular Degeneration simple search

The results are relevant and as usual Google appears to be listing first those pages where the terms appear next to one another. If we did want to be more precise and reduce the number we could search for the phrase: "macular degeneration".

Number of results: 1,690,000

Macular degeneration phrase search

Not surprisingly the number of results has been reduced significantly to 1,690,000.

Let us now say that my enquirer has come back with an amendment to the original request. They have been told that there are several forms of macular degeneration, for example macular disciform degeneration, and they want a selection of articles covering as many of them as possible. I have a biomedical background and can easily identify the relevant phrases and run separate searches on them, but what if I didn’t have a clue where to start? I could use Google’s asterisk (*) between my two terms to stand in for one or more words.

The strategy macular * degeneration gives us a massive 21,500,000 results, far more than our first basic search if the numbers are to be believed.

Macular degeneration asterisk search

In just the first 6 results we have picked up vitelliform and disciform degeneration, and more are picked up in the subsequent 20-30 results.

Google’s search tips say “If you include * within a query, it tells Google to try to treat the star as a placeholder for any unknown term(s) and then find the best matches.” It is not clear from this whether the asterisk stands in for one or more terms. Adding more asterisks to the search does not alter the number of results, which in any case are only an estimate. We do, though, see very different content and now variations on our terms (for example macula)  are appearing emboldened in the page summaries.

Comparison of asterisk searches

We could try and force an exact match search by placing a plus sign before macular in our strategy, but let’s try and keep this exercise simple.

Now for three searches using AROUND(n). Note that AROUND must be in capital letters, otherwise Google will treat it as just another search term. Specifying the number of separating words as 1, 2 and 3 gave me 1,710,000, 1,710,000 and 1,720,000 results respectively.

Google AROUND operator

The results are very different from the searches incorporating the asterisk and AROUND(2) and AROUND(3) were identical. Also, it seems that with the AROUND operator Google is giving priority to documents where the terms are a phrase and not separated by any other words. It was only when I reached around 650 that I started to see phrases where my two terms were separated by one other word.

Using just AROUND without any number gave me 1,610,000 results that looked very similar to those obtained with AROUND(1).

Logically, one might think that macular AROUND(0) degeneration would be the same as a search on the phrase "macular degeneration". It isn’t!

Phrase versus AROUND(0)

Not only is the number of results different (AROUND(0) comes back with 4,250,000 compared with 1,690,000 from the phrase search) but so is the content.

Finally, I decided to follow Phil Bradley’s lead and see what happens when I try and exclude the phrase from the AROUND(0) search: macular AROUND(0) degeneration -"macular degeneration". I got 43,000 results in which the terms seemed to appear anywhere within the document, in any order and separated by any number of other words.
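To rerun the whole series of test searches on your own topic, the variants used above can be generated in one go. This is a minimal Python sketch: the function name `query_variants`, the labels and the Google.co.uk URL pattern are assumptions for illustration, and the result counts you see will not necessarily match mine.

```python
from urllib.parse import urlencode

BASE = "https://www.google.co.uk/search?"  # assumed URL pattern


def query_variants(first, second):
    """Return the test searches from this post for a pair of terms."""
    phrase = f'"{first} {second}"'
    queries = {
        "basic": f"{first} {second}",
        "phrase": phrase,
        "asterisk": f"{first} * {second}",
    }
    # AROUND must stay in capitals or Google treats it as a search term
    for n in (1, 2, 3, 0):
        queries[f"around_{n}"] = f"{first} AROUND({n}) {second}"
    # Exclude the exact phrase from the AROUND(0) search
    queries["around_0_minus_phrase"] = f"{first} AROUND(0) {second} -{phrase}"
    return queries


variants = query_variants("macular", "degeneration")
for label, q in variants.items():
    print(label, "|", q, "|", BASE + urlencode({"q": q}))
```

Comparing the first page or two of each result set is more informative than comparing the reported totals, for the reasons given earlier.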

In conclusion, despite what I said earlier I think AROUND does work but it is difficult to test because Google always seems to give priority to pages in which your terms appear as a phrase and not separated by any other words. Its effect is probably more obvious if you are dealing with a topic that would otherwise return a very small number of results. The ranking and sorting of the results changes significantly, though, when you use AROUND so it might be worth trying if you are fed up with seeing the same documents and sites again and again. In all of the test searches I have carried out so far I still prefer the asterisk, especially if I want to be able to identify expanded phrases quickly and easily. But, as the saying goes, your mileage may vary. Feedback on your own experiences, please.

Oi! Google – you have seriously overstepped the mark

Yes, I am talking to you, Google, and this time you really have gone too far.

All I wanted to do was check up on the background of a photo I had taken of the wall surrounding the graveyard of a church in Reading. The church in question is St Laurence. We have all become accustomed to the “Did you mean….?” option at the top of our search results. I found it invaluable early in the morning or late at night when typos were inevitable in my search strategy: yes, thank you, I really did mean ‘widget manufacturers’ and not ‘wigdet manufacturers’. Recently, though, Google has abandoned the optional corrected search and now runs instead the corrected strategy as the default with yours as the extra option. Google has taken this a stage further and runs your search as it thinks fit.

So Google decided that I really meant to search for Saint Lawrence and has included that in the search. There is no option to search on just Saint Laurence:

Google St Laurence search

On this occasion there were some relevant pages in my results. But yes, Google, I really did want to search for Saint Laurence! Now, it seems, I have to prefix all of my search terms with a plus sign or enclose them in double quote marks to stop Google’s dictatorial behaviour.  But why should I have to do that?

In one of my presentations last year on Google vs. Bing/Yahoo I commented that Google would have to do something really stupid before users would switch to another search engine. For me, Google has done that really stupid thing. I am now seriously contemplating switching search engines for basic web searching. My final decision will be based on relevance of results and how quickly they are delivered. I have to spend too much time and click too many times to get them on Google.

UPDATE: It has just got worse. I tried a search on the phrase “Saint Laurence” thinking Google would carry out an exact match search, but Google will have no truck with such obvious ploys. (Ignore the Twitter search at the top of the results screen – that is a Greasemonkey script add-on for Firefox.)

Google search changes

I now have to click on the option for “Saint Laurence” to get results for the search I had originally requested. Putting a plus sign before my phrase in the search box does not change Google’s mind. “Excuse me, Google, but I do know what I am doing and when I tell you to carry out an exact match search I WANT AN EXACT MATCH SEARCH! Got it?”

My Online Information 2010 presentations

If you have not already spotted the links on Twitter, Facebook, LinkedIn etc. to the various presentations I gave at “Online” in London earlier this month, here they are all in one place. I gave two talks as part of the free seminar programme that was part of the exhibition, a conference presentation and a pre-conference workshop. They all have a Creative Commons attribution non-commercial license assigned to them (see http://creativecommons.org/licenses/by-nc/3.0/ for further information on the license).

Google’s New Search Features: has it gone too far?
1st December 2010

This presentation was given in the exhibition area as part of the free seminar and masterclass programme. I have added comments to some of the screen shots so that they make a little more sense to those who were not there.

Google’s New Search Features: has it gone too far

authorSTREAM
Slideshare

Challenges of Finding Quality Business Information
1st December 2010

A second presentation I gave as part of the exhibition free seminar programme. Again, I have annotated some of the screen shots.

authorSTREAM
Slideshare

Search Engine Wars: let battle commence
30th November 2010

This is a presentation I gave as part of the Online Information conference. It is quite different from the one I gave  with the same title to INFORUM in Prague earlier this year. I wish I could say it was because so much has changed since then: unfortunately very little has changed.

authorSTREAM
Slideshare

Guide to Using Social Media to Promote Your Organisation and Services
29th November 2010

This was a one-day pre-conference workshop. The slides merely formed a framework for the day. There were more services and issues discussed within the group than are shown in the presentation. The link given below, which is a direct link to a ppt file on the RBA Information Services web site, will not be available indefinitely. The presentations on my social media page are updated every time I run a workshop or give a seminar on the topic.

PowerPoint Presentation (9.5 MB)