Flickr pulls out all the stops with automatic tagging

Flickr really went to town with its automatically generated tags for my photo today. The photo was of star anise, which I took for the Challenge Friday Group; this week’s challenge is stars. A straightforward, simple photo of the spice, I thought, but Flickr read a lot more into it.

This is the photo:

Star_Anise_20160311_Signed

And these are the tags:

Flickr_Star_Anise_Tags

The ones at the top with the grey background are the tags that I assigned to the photo. The rest are Flickr’s.

I deleted several of them before I thought of taking a screenshot for posterity, but what is left gives you a flavour of the range of concepts that Flickr feels are relevant. Some of them I agree with (pattern, star shape, symmetry). Others I shall delete as they are of no help to anyone searching on them (foliage, leaf, landscape, tree, forest, blossom, pastel).

I do, though, rather like “minimalism” even though the structure and complexity of flavour and aroma of star anise are far from minimalist. It stays.

Business information key resources and search strategies – Top 10

The participants of the business information workshop I ran on March 8th had a variety of interests: search strategies and commands for Google et al, UK government information, statistics, open data, social media, companies, locating scientific research. So it was quite tough limiting the Top Tips that I asked them to nominate at the end of the day to just 10.

This is what they came up with.

  1. Get to know the key resources and starting points for different types of business information, e.g. Companies House and OFFSTATS, and go direct to those rather than Google. It will save you time in the long run.
  2.  Verbatim. An invaluable tool for research when Google insists on rewriting your search and dropping terms. To make Google search for all of your terms without variation, but in any order, first run your search. Then click on ‘Search tools’ in the line of options above your results. In the second line of options that appears click on ‘All results’ and from the drop down menu select Verbatim. If you are carrying out in-depth research it is worth using Verbatim even if your “normal” Google results seem to be OK. You may see very different content in the Verbatim list.
  3. Combine advanced search commands such as site: and filetype: to focus your search on types of information (PDF reports, PPT presentations, spreadsheets containing data) and websites (government, academic, individual sites). Also try using the minus sign to exclude documents containing specific terms or sites that are irrelevant.
  4. Phil Bradley’s UK Newspapers Google Custom Search Engine. http://www.philb.com/nationaluknewspapers.html
    PhilB_News_Search
    A relatively new tool that enables you to search all of the major national UK newspapers and regional newspapers. A real time saver if you are searching for local information on a local business or entrepreneur and don’t want to have to track down all the local papers and search them one by one.
  5. OFFSTATS – The University of Auckland Library http://www.offstats.auckland.ac.nz/ A good starting point for official statistical sources by country, region, subject or combination of categories. All of the content in the database has been chosen and quality assessed by staff at The University of Auckland Library.
  6. Zanran http://zanran.com/ A tool for searching information contained in charts, graphs and tables of data. Enter your search terms and optionally limit your search by date and/or format type. Zanran comes up with a list of documents that match your criteria with thumbnails to the left of each entry. Hover over the thumbnail to see a preview of the page containing your data and further information on the document.
  7. Advanced Twitter Search. http://twitter.com/search-advanced Essential tool if you are using Twitter to look for news on product developments, announcements, conferences, discussions on technologies/companies, or how companies interact with customers.
  8. Wayback Machine http://www.archive.org/ Want to see what was on a website a few years ago or trying to track down a document that seems to have vanished from the web? Try the Internet Archive Wayback Machine at http://www.archive.org/. Enter the URL of the website or document and you should then see a calendar of the snapshots that the archive has. Choose a date from the calendar to view the page. The archive does not have everything but it is worth a try. See also the UK National Archives of old government websites and pages at http://www.nationalarchives.gov.uk/webarchive/
  9. OUsefulInfo, http://blog.ouseful.info/ “Trying to find useful things to do with emerging technologies in open education and data journalism”. Maintained by Tony Hirst, this blog has useful information and descriptions of what can be involved when dealing with and manipulating open data.
  10. DuckDuckGo http://duckduckgo.com/ This was not covered in the workshop but one of the participants recommended it as a useful alternative to Google. Aside from the absence of tracking and personalisation, it provides different results, and a greater variety of them, when compared with Google.
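
Tip 3 can be sketched in a few lines of Python. This is only an illustration of how the commands fit together into one search string: the function name build_query and the example terms are made up for this sketch, and nothing here talks to Google.

```python
def build_query(terms, site=None, filetype=None, exclude=()):
    """Compose a search string from plain terms and optional commands.

    Hypothetical helper for illustration; it only builds text.
    """
    parts = list(terms)
    if site:
        # Commands are lower case with no space after the colon.
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    # A minus sign immediately before a term excludes documents containing it.
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


query = build_query(
    ["chocolate", "consumption", "statistics"],
    site="gov.uk",
    filetype="pdf",
    exclude=["recipes"],
)
print(query)
```

The resulting string can be pasted into the search box as it stands; whether Google honours every term is a separate matter, which is where Verbatim comes in.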

Edited highlights of the workshop slides can be found on authorSTREAM and Slideshare.

My next business information related workshop is Discover Open Data on the 7th April. The next advanced Google workshop (New Google, New Challenges) is on the 13th April and Essential non-Google Search Tools is on the 12th April.

Business information resources and search strategies

Statista_2

I’ve finally finished “updating” my slides and notes for the first of this year’s workshops on business information, which takes place next week. It was not so much an update, more a rewrite. With all the changes at Google, UK government website transfers and disappearances, the “right to be forgotten” and the many website design changes by business resources to make them mobile friendly, I was virtually starting from scratch.

Next week’s workshop is being organised by TFPL in central London on Tuesday 8th March.  TFPL have made the event the “Course of the Week” and reduced the price to £249, but you have to book by  midnight on Friday 4th March to get the reduced price.

The topics I shall be covering include:

How the legal and regulatory environment is affecting search and the provision of information

Starting points, evaluated listings and government sources

Company information: share prices, financials, official data

Statistics, market and industry data, open data

News sources and alerting services

Essential search techniques for tracking down business information

How to use social media and professional networks for business intelligence

Where to find older, archived material

As usual, there will be practical sessions for people to try out resources and search techniques for themselves. If you are interested, go to the TFPL website or contact their learning team on 020 7378 5477.

Google advanced search – get it right!

When running advanced search workshops, and especially Google sessions, I prefer not to dwell on commands and search options that are no longer supported. They are gone and that is that, and it is far better to concentrate on how to get the best out of what is left. Of course it is unavoidable when your slides have been prepared several days before the event and  Google decides to pull the plug on one of your favourite search features just before you start! Similarly I tend not to show “this is how NOT to use….” a command  or incorrect syntax. It is often the incorrect format that one remembers.  Recently, though, I have added slides  to my presentations that cover both defunct commands and errors in syntax and format.

The problem is that not only are many people unaware that some search options are no longer available but also some fact sheets and articles covering advanced search are getting it wrong. The recent Guardian article on top search tips for Google almost got it right  but referred to the tilde, which was dropped in 2013, and did not really understand how Google automatically looks for synonyms and variations on a term (see my earlier blog posting Guardian’s top search tips for Google not quite tiptop). I have also seen a couple of recently produced Google  fact sheets riddled with mistakes.

The wonderful thing about Google is that it can take the most tortuous and error ridden search string and still come back with something that is sensible – most of the time.  The downside of this is that one assumes the search query has worked as intended when in fact Google has totally rewritten the search for you.  At some point, though, Google will rewrite the search in such a way that it brings back rubbish. So, it is important to know what commands are available and how they should be used.

Let’s get started.

Plus (+) sign before a word to force an exact match.

This was discontinued in October 2011 because Google intended to use it as a way to search for Google+ pages. That has been abandoned and it is now a searchable character.  If you want to force an exact match search on a term precede the term with intext: for example intext:agriculture.

I have also seen examples claiming that a plus sign between words acts as a Boolean AND. No, it doesn’t.  If you do get different results when using + it is because Google is searching for that as well as your terms.

Tilde  (~) for synonyms

This was withdrawn in  June 2013  because not many people used it and it was no longer needed. Google now looks for synonyms by default.

thesaurus: for alternative terms

‘thesaurus:’ sort of works because Google treats ‘thesaurus’, having ignored the colon, as a search term. So ‘thesaurus:eclectic’ will give you links to pages and websites of dictionaries and definitions that give synonyms for eclectic. It does not give you a straightforward list of alternatives in the same way that ‘define’ does. If you use thesaurus you have to go to the websites in turn to view the synonyms.

eclectic-thesaurus

eclectic-define

The asterisk *

The asterisk (*) is a placeholder for terms between two words e.g. solar * panels finds solar photovoltaic panels, solar PV panels, solar thermal panels. It is NOT a truncation symbol. Again, you might think it is because Google ignores the asterisk and automatically looks for  words that begin with the letters you have typed in.

The example I gave in my earlier blog posting was a search on phenobarb*. I expected Google to pick up references to phenobarbitone. It picked up 76,000 results including phenobarbital but there was no mention of phenobarbitone in the first 100. Phenobarb without the asterisk picked up the exact same results. A search on phenobarbitone, with and without the asterisk, came up with 241,000 results. I have no idea how or when Google decides to stop looking for variations on your string but it is obvious from the above example that the asterisk is not a truncation symbol.
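
To make the distinction concrete, here is a small Python sketch of the two behaviours using regular expressions. It only illustrates the concepts of a placeholder and of truncation on local text; it is a simplified model under my own assumptions and says nothing about how Google matches pages internally. The function names are invented for the example.

```python
import re


def placeholder_pattern(query):
    """Treat * as standing for one or more whole words between other words."""
    pieces = [
        r"\w+(?:\s+\w+)*" if word == "*" else re.escape(word)
        for word in query.split()
    ]
    return re.compile(r"\b" + r"\s+".join(pieces) + r"\b", re.IGNORECASE)


def truncation_pattern(stem):
    """Classic truncation: any word that begins with the stem."""
    return re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)


solar = placeholder_pattern("solar * panels")
print(bool(solar.search("cheap solar thermal panels for sale")))
print(bool(solar.search("solar panels")))

stems = truncation_pattern("phenobarb")
print(stems.findall("phenobarbital and phenobarbitone"))
```

The first pattern needs at least one word between "solar" and "panels"; the second is what a true truncation symbol would do, which is what the asterisk in Google does not offer.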

Do NOT capitalise the first letter of commands, and NO spaces

Commands such as intitle:, intext:, filetype: and site: must be all lower case, with NO spaces between the colon and the search term. Capitalise the first letter or add a space after the colon, or both, and Google treats the command as an ordinary searchable word.

The correct format for an intitle: search is, for example, intitle:caversham and finds the following:

Google_Intitle_Correct

Capitalise the first letter of the command or insert a space or both and you find:

Google_Intitle_Wrong

I do understand why so many fact sheets, and presentations, show commands with an initial capital letter. You spend ages preparing your information and when you have sent off your slides for printing or converted your document to a PDF you discover that Microsoft Office has changed the format of the command. Because your search example is on a separate line with the command at the start, Office, bless it, decides to auto-correct and capitalise the first letter. I know, it has happened to me! So, please, check and double check your support materials.
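
One way to double check support materials is to run the example searches through a small script. The following Python sketch flags the two mistakes described above. The list of commands and the function name check_commands are chosen for illustration only; they are not anything provided by Google.

```python
import re

# Commands mentioned in this post; extend the tuple as needed.
COMMANDS = ("intitle", "intext", "inurl", "filetype", "site", "related")

# A command name in any letter case, then the colon, then any spaces after it.
_COMMAND_RE = re.compile(
    r"\b(" + "|".join(COMMANDS) + r"):(\s*)", re.IGNORECASE
)


def check_commands(query):
    """Return a list of problems found in the commands of a search string."""
    problems = []
    for match in _COMMAND_RE.finditer(query):
        name, spaces = match.group(1), match.group(2)
        if name != name.lower():
            problems.append(f"'{name}:' should be all lower case")
        if spaces:
            problems.append(f"remove the space after '{name.lower()}:'")
    return problems


for example in ("intitle:caversham", "Intitle:caversham", "intitle: caversham"):
    print(example, "->", check_commands(example) or "OK")
```

An empty list means the commands are in the format Google expects; it does not mean Google will not rewrite the rest of the search.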

Google searches for all of your terms by default

Not always. If your search, as it stands, finds zero or a low number of results then Google will drop one or more terms that are usually shown as strikethroughs.  In the above screenshot you can see that the third entry in the results has a “Missing: caversham” at the end of the snippet.

If Google is dropping a term that is essential to your search then prefix it with intext:, for example intext:caversham. If you want all of your terms to be included, and without any variations, then use the Verbatim search option. If you are using a desktop or laptop run your search and then click on the Search tools option at the top of your results. A second line of options will appear. Click on All results and select Verbatim. The layout and location of Verbatim on mobile devices will usually be different.
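
If you run the same searches repeatedly, Verbatim can apparently also be switched on through the results URL. The sketch below assumes that the tbs=li:1 parameter, which appears in the address bar after selecting Verbatim, continues to work. It is not a documented interface and Google could change it at any time, so treat this as an assumption rather than a supported method.

```python
from urllib.parse import urlencode


def verbatim_url(query):
    """Build a Google results URL with Verbatim switched on.

    Assumption: 'tbs=li:1' is the undocumented parameter that the
    Verbatim menu option adds to the URL.
    """
    params = {"q": query, "tbs": "li:1"}
    return "https://www.google.com/search?" + urlencode(params)


print(verbatim_url("caversham heritage"))
```

The colon in li:1 is percent-encoded by urlencode, which is harmless in a query string.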

Double quotation marks around phrases

Double quotation marks around phrases, titles of papers, song titles, famous sayings etc. work most of the time. But, again, if Google finds zero or only a handful of results it will ignore the marks. Google may also alter the spelling of one or more words within the double quotation marks. Use Verbatim if you are sure that the phrase is correct and you want to bring Google to heel.

Full nested Boolean search

Google has NEVER supported full nested Boolean search. I still meet people who are adamant that Google does, but when pushed they admit that they often get unexpected results. You can, though, use OR for alternative terms and the minus sign before a term to exclude documents containing that term.

This is how Google interprets the search (confectionery OR chocolate) AND (production OR manufacture) AND (france OR Germany OR UK OR switzerland) NOT belgium

Google_Boolean

Note that pages containing Belgium are included rather than excluded.

Remove the ‘NOT Belgium’ and this is what we see:

Google_Boolean_2

Add ‘-belgium’ to the end of the search instead of ‘NOT belgium’ and we get:

Google_Boolean_3

Running Verbatim on our original Boolean search shows that Google is treating AND and NOT as lower case, searchable words:

Google_Boolean_Verbatim

If you really want to use full Boolean, then get thee hence to Bing.
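
As a sketch of the advice in this section, the Python function below rewrites a conventional Boolean string into the form Google does understand: AND is dropped, and NOT before a single term becomes a minus sign. It is a hypothetical helper, handles only a single word after NOT, and leaves the parentheses in place purely for readability.

```python
import re


def to_google_syntax(boolean_query):
    """Rewrite AND/NOT operators into the form described in this post."""
    # 'NOT belgium' becomes '-belgium' (single term only).
    query = re.sub(r"\bNOT\s+(\S+)", r"-\1", boolean_query)
    # Upper-case AND between terms is removed; OR is left untouched.
    query = re.sub(r"\s+AND\s+", " ", query)
    return query


original = (
    "(confectionery OR chocolate) AND (production OR manufacture) "
    "AND (france OR Germany OR UK OR switzerland) NOT belgium"
)
print(to_google_syntax(original))
```

The substitutions are case sensitive, so an ordinary lower-case "and" or "not" inside the search is left alone.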

If you want to learn more about Google search, Dan Russell, who works at Google, is currently running an online course on Power Searching with Google. Alternatively, if you want a more business or academic research and UK/European oriented workshop on what Google can do, I am running an advanced Google workshop with UKeiG on April 13th, 2016.

Google’s Knowledge Graph a total fruitcake

Many thanks to Emily Scott who alerted me on Twitter to a priceless example of Google Knowledge Graph getting it totally wrong.

For those of you who don’t know what the Knowledge Graph is, it is the box that sometimes appears on the right hand side of your results, which pulls together information on your topic from a variety of sources. To quote Search Engine Land it is “a system that Google launched in May 2012 that understands facts about people, places and things and how these entities are all connected”. The problem is that Google quite often gets it wrong, although usually it is just one fact that is incorrect. One of the more well known examples is when Google decided that the American author Robert Greene was born in 1959 but died in 1592. It had confused him with the 16th century English writer of the same name. As I always say in my Google workshops, never trust the information that appears in the Knowledge Graph. The data comes from different sources that may be referring to entities that are not related at all.

The example that Emily encountered, though, is in a league of its own. She was searching for frugivores (fruit eaters) and this is what Google’s Knowledge Graph suggested:

Google_Frugivore

As far as I am aware fruit is not the preferred food of wolves, cats or lions.  Clicking on the “View 45+more” option for representative species we see that Google is under the impression that cheetahs, killer whales, polar bears and leopards are also frugivores.

Google_Frugivore_2

I’ll allow raccoons although I wouldn’t say that fruit is their preferred food. But, hey, what do I know about raccoons other than that my US friends tell me the little s***s raid trash cans and will eat anything they can get their paws on.

No doubt someone has already reported the error via the feedback link and someone at Google is busy correcting it. Enjoy and take screenshots while it is still there.

Guardian’s top search tips for Google not quite tiptop

I have just been alerted by fellow search expert Alison McNab to an article by Samuel Gibbs (@SamuelGibbs) in the Guardian on top search tips for Google. I had to double check the date of the article because, although it is OK for the most part, it has got a few things wrong, one of the commands it covers was withdrawn some time ago, and it has missed what I consider to be one of the most important Google search options.

So let’s have a look at the tips one by one.

1. Exact phrase

Yes, placing double quote marks around words usually makes Google search for the exact phrase. However, Google does sometimes ignore the quote marks.

2. Exclude terms

Yes, preceding a term with a minus sign will exclude documents containing that term.

3. Either OR

Yes, the OR command does work when searching on alternative terms – most of the time. Make sure the OR is in capital letters.

4. Synonym search

Tilde symbol (~) for a synonym search? No! Google withdrew it over two years ago  because not many people used it. Google now looks for synonyms by default. If you precede a term with a tilde Google ignores it and carries on as normal. I’ve just tried several searches with and without the tilde and get exactly the same results.

5. Search within a site

Yes. The site: command is one of the most powerful advanced search commands and enables you to search within a single site, for example site:www.gov.uk, or a type of site, for example site:ac.uk for UK academic sites.

6. The power of the asterisk

Yes, the asterisk can stand in for one or more terms between two words, for example solar * panels. No, it is not a truncation symbol.

The example given by The Guardian  is a search on architect*, which finds “architect, but also architectural, architecture, architected, architecting and any other word which starts with architect.” As with synonyms, Google searches for variations on a word by default.

I ran a search on phenobarb* expecting Google to pick up references to phenobarbitone. It picked up 76,000 results including phenobarbital but there was no mention of phenobarbitone in the first 100 documents. Phenobarb without the asterisk picked up the exact same results. Excluding phenobarbitone by using the minus sign gave me 70,000 results. A search on phenobarbitone, with and without the asterisk, came up with 241,000 results.

7. Searching between two values

Yes. The number range search does work and is great for searching within a range of values or years.  For example:

chocolate consumption forecasts 2016..2020

top 10..100 UK car insurance companies

toblerone 1..5 kg

8. Search for word in the body, title or URL of a page

This covers the commands intext:, intitle: and inurl:.  All correct but intext: is especially useful in that it forces Google to include that term in the search. It is invaluable if you find Google dropping key terms from your strategy, which it does if you are likely to retrieve zero results or it thinks the number of results is too low.

9. Search for related sites

The related: command looks for similar sites, for example related:theguardian.com finds other news organisations. It works but only shows you 20-30 sites. Worthwhile using, though, if you want to broaden your search to other but similar organisations and only have one or two to start with.

10. Combine them

I wholeheartedly agree with this one. Once you have a few advanced commands under your belt you can really start to focus your search and retrieve more relevant results.

What’s missing?

I’m surprised that filetype: was not included. It is nearly always on the list of top tips that my advanced search workshop participants suggest at the end of the day. It’s a quick and easy way of finding presentations (filetype:ppt, filetype:pptx), government documents and research papers (filetype:pdf) and datasets (filetype:xls, filetype:xlsx, filetype:csv).

The major omission for me, though, is Verbatim. It is different from the rest in that it is not a command that you can type in. You have to run your search first. From the menu at the top of the results select ‘Search tools’, followed by ‘All results’  and  then ‘Verbatim’. Use this when Google is wreaking havoc on your search by leaving out terms and using weird and wonderful terms that have nothing to do with your subject. Verbatim will search on all of your terms without dropping any or looking for variations and synonyms.

Verbatim

If you are interested in learning more about advanced search in Google and other search tools, some of my past presentations and fact sheets are available at http://rba.co.uk/as/.  If you are interested in attending a workshop my public access training schedule for 2016 is at http://www.rba.co.uk/training/ (more events will be added shortly).

“Do not track” does not mean anonymous browsing

A question that I’m often asked is “do search engines that don’t track your search history also anonymize your IP address?” DuckDuckGo is the first search tool that often springs to mind with respect to “do not track”.  It does not store searches, web history or IP addresses when you use it to search. Also, it does not pass on the search terms you used to the sites that you visit. However, the sites that you visit will still be able to see your IP address.  See https://duckduckgo.com/privacy for further details.

Ixquick (http://ixquick.com/) and StartPage (http://startpage.com/) are similar but have an additional feature that gives you the option to display a page from the results list using a proxy. Run the search as normal and you’ll see the usual set of results. Next to each result you should see a “proxy” link. Click on that and you go through a proxy server making you invisible to the website you are visiting.

Ixquick

Any links that you subsequently click on and which are on the same site also go through the proxy. As soon as you follow any links that take you off that site, you are warned that you will be “unproxied”.

Ixquick2

The disadvantages of using the proxy option are that it can be slower, some functions on the page may not work, and I have come across some pages that do not display at all.

Articles and top tips in eLucidate

The latest eLucidate from UKeiG is now out and available at http://www.cilip.org.uk/uk-einformation-group/elucidate/elucidate-current-issue. My contributions to this issue are Alphabet Soup (about the changes and restructuring of Google), top tips on Exploiting Google and Kicking the Google Habit.

The two “top tips” articles came out of two workshops I facilitated for UKeiG in Manchester and came from the participants themselves. I am repeating the workshops – significantly updated following recent announcements –  next week in London; Essential non-Google Search Tools and New Google, New Challenges.  If you are interested and want to learn more, there is still time to book a place on either or both of the workshops.

Wayback Machine gets funding to rebuild and add keyword searching

The Wayback Machine (http://www.archive.org/), also known as the Internet Archive,  is always a popular site on my search workshops. It is a fantastic way of discovering how web pages looked in the past and for tracking down documents that are no longer on the live web.

Wayback-UKOLUG
UKOLUG Home Page 27th April 1999

It isn’t 100% guaranteed to have what you are looking for and at present you need the URL of the web site or document in order to use it. People often ask if keyword searching is possible; it isn’t at the moment but it will be.

The Internet Archive has received support from the Laura and John Arnold Foundation (LJAF) and will be re-building the Wayback Machine. When it is completed in 2017, the next generation Wayback Machine will have more webpages that are easier to find and will include keyword indexing of homepages.

Further details of the rebuild are on the Internet Archive blog at http://blog.archive.org/2015/10/21/grant-to-develop-the-next-generation-wayback-machine/
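
Until keyword searching arrives, lookups still start from a URL. As a minimal sketch, the Python below uses the Internet Archive’s availability endpoint at https://archive.org/wayback/available to ask for the snapshot closest to a given date. The helper names are invented for this example, example.com is a placeholder, and the response handling assumes the JSON layout the endpoint returned at the time of writing.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the request URL; timestamp is YYYYMMDD, optionally with hhmmss."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the closest snapshot URL from a decoded response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(page_url, timestamp=None):
    """Query the endpoint over the network and return a snapshot URL or None."""
    with urlopen(availability_url(page_url, timestamp)) as resp:
        return closest_snapshot(json.load(resp))
```

Calling lookup("example.com", "19990427") performs the network request and returns the address of the nearest archived copy, or None if the archive has nothing for that URL.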

 

Google introduces RankBrain

Original at https://www.flickr.com/photos/healthblog/8384110298

We’ve known for some time that Google has been buying heavily into artificial intelligence and looking at applying it not only to its robotics and driverless cars projects but also to search. Now it is official: artificial intelligence and machine learning plays a major role in processing Google queries and is, Google says,  the third most important signal in ranking results. It has been named RankBrain.

Danny Sullivan covers the story in Search Engine Land and looks at the implications for search. There is a follow up story  by Danny that goes into more detail, FAQ: All About The New Google RankBrain Algorithm, and he makes a guess at what the number 1 and 2 ranking signals are (Google won’t say!).

Both are very interesting articles on how Google is using RankBrain in search, especially the FAQ, which is a “must read” if you want to begin to understand how Google is now handling your search.

News and comments on search tools and electronic resources for research