Category Archives: Search Engines

How search works – sort of

Google has put together a site showing how Google search works (http://www.google.com/insidesearch/howsearchworks/thestory/). The main page is a scrolling animated graphic that just gives you some elementary facts but there are links to more detailed information and videos on the main topics of crawling and indexing, the searching and ranking algorithms, fighting spam and Google’s general policies. They are a useful set of pages for anyone who does not already know the basics of how Google works, but if you are looking for something that tells you how to get sensible results from Google you’ll be disappointed. As Phil Bradley says:

“…. boils down to ‘we find some stuff, do magic to it, filter out the crap that our magic didn’t get and then give it to you.’ Yes folks, an entire site to say that. Wasted opportunity.”

Top tips for finding research information

Free Search Tools for Finding Research Information

This week I was in Canterbury leading a workshop and discussion on Google and Google Scholar for finding research information. Although the emphasis was on Google we also covered other specialist tools designed to search for scientific and research information. We also had an interesting discussion on h-index, other citation indices and services such as ORCID and ResearchGate. The slides for the session are available on authorSTREAM (http://www.authorstream.com/Presentation/karenblakeman-1706478-google-scholar-research-information/), Slideshare (http://www.slideshare.net/KarenBlakeman/scholar-research-information) and temporarily at http://www.rba.co.uk/as/.

Anyone who has attended one of my workshops knows that I ask the group to propose their top tips at the end of the session. These are the Canterbury group’s top 10 tips.

1. What’s going on?
Try and find out what’s going on behind the scenes and how the different search tools work. For example, Google and Google Scholar are quite different in the way they manage your search. Understanding how they operate means that you can adapt your search strategy accordingly and also manage your expectations; for example, Google Scholar does not use the publishers’ metadata so author and date searches are unreliable.

2. Personalisation and ‘unpersonalisation’
Google personalises your search based on past activity, who is in your social networks, and a whole host of other ‘stuff’. You can quickly ‘unpersonalise’ your results by using a separate browser window that does not use cookies or your web history as part of the search algorithm.

If you use Chrome as your browser, open what is called an incognito window. In the top right hand corner of your screen there is an icon with three lines. Click on it and from the drop down menu select New incognito window. Alternatively press Ctrl+Shift+N on your keyboard.

If you use Firefox, from the menu at the top of the screen select Tools followed by Start Private Browsing.

In Internet Explorer select Tools followed by InPrivate Browsing. If you cannot see InPrivate under Tools try looking under the Safety option.

3. Advanced search commands
Use Google advanced commands such as filetype: to focus on PDFs, presentations or spreadsheets containing data, and site: to look for information on just one site or a range of sites such as UK government. Although the advanced search screen has boxes for you to fill in for the commands, the file format or filetype option is limited. It does not include options for the newer Microsoft Office formats such as .pptx and .xlsx. Use filetype: as part of your search strategy, for example:

nasa dark energy dark matter filetype:pptx

Google Scholar commands are more limited – see slide 28 of the presentation.
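If you run the same kind of search often, the filetype: and site: commands from this tip can be assembled programmatically. The following is only a minimal sketch in Python: the helper names build_query and search_url are made up for illustration, and the sketch assumes the usual Google search address with the search text carried in the q parameter.

```python
from urllib.parse import urlencode


def build_query(terms, filetype=None, site=None):
    """Join plain search terms with optional filetype: and site: commands."""
    parts = [terms]
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Return a URL that runs the query (assumes the q parameter holds the search text)."""
    return f"{base}?{urlencode({'q': query})}"


# The example search from this tip
query = build_query("nasa dark energy dark matter", filetype="pptx")
print(query)
print(search_url(query))
```

The same helper accepts a site value, for example site="gov.uk", if you want to restrict the search to UK government pages.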

4. intext:
Google automatically looks for variations on your terms and sometimes omits words from your search if it thinks the number of results is too low. Prefixing a term with intext: tells Google that it must be included in your search and exactly as you have typed it in. For example:

UK public transport intext:biodiesel statistics

tells Google that biodiesel must be included in the search and exactly as typed in.

5. Reading Level
Use Reading level if Google is failing to return any research oriented documents for a query. Run the search and from the menu above the results select Search tools, then All results, and then from the drop down menu Reading level. Options for switching between basic, intermediate and advanced reading levels should then appear just above the results. Google does not give much away as to how it calculates the reading level and it has nothing to do with the reading age that publishers assign to publications. It seems to involve an analysis of sentence structure, the length of sentences, the length of the document and whether scientific or industry specific terminology appears in the page.

6. Date options
In Google web search, use the date options in the menus at the top of the results page to restrict your results to information that has been published within the last hour, day, week, month, year or your own date range. Click on Search tools, then Any time and select an option. This works best with news, discussion boards, and blogs and web sites that use blogging software to generate pages, but Google is getting better at identifying the correct date of a web page.

Google Scholar handles publication dates differently. On the results page you can select a date range from the menu on the left hand side of the page. Alternatively, you can run a Google advanced search and enter your publication years. However, Google Scholar looks for publication years in the area of the document where the date is most likely to be. As a result it may identify a page number or part of an author’s address as a year!
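To see why a page number or an address can be mistaken for a year, consider the short Python illustration below. It is emphatically not how Google Scholar works, as Google does not publish its method; it simply shows what happens when any four-digit number that looks like a year is accepted. The sample header text is invented for the purpose.

```python
import re

# Naive rule: any standalone four-digit number from 1900 to 2099 "is" a year
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def guess_years(text):
    """Return every four-digit token that looks like a year."""
    return YEAR.findall(text)


# Invented article header containing a page range, a street number and a real date
header = ("Journal of Examples 12(3), pp. 1987-1999. "
          "Dept of Chemistry, 2010 Campus Road. Published 2011.")
print(guess_years(header))
```

Only one of the four candidates found is the real publication year, which is why it pays to check the dates in Scholar results against the document itself.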

7. Google Scholar alerts
To be used with caution as the searches periodically stop without warning, and so have to be set up again, and they sometimes include documents that are several years old. Whatever your search you can set up an alert by selecting Create alert from the menu on the left hand side of the results page.

If the author has created a profile on Google Scholar, from their profile page you can follow new articles and/or new citations for that author. From past experience I warn you that this is not entirely reliable.

Google Scholar Follow Author

8. Metrics – top publications
Although it claims to search all scholarly literature, Google Scholar does not always cover all of the key journals in a subject area. There is no complete source list but there is a list of top publications by subject and language under the ‘Metrics’ link in the upper right hand corner of the Scholar home page.

9. Microsoft Academic Search – visualisations
Microsoft Academic Search (http://academic.research.microsoft.com/) is a direct competitor to Google Scholar. The site is sometimes slow to load and it often assigns authors to the wrong institution. Nevertheless, the visualisations such as the co-author and citation maps can be useful in identifying who else is working in a particular area of research. The visualisations can be accessed by clicking on the Citation Graph image to the left of the search results or author profile.

Microsoft academic search citation graph
Author Citation Graph


10. Mednar visual
Deep Web Technologies has developed in conjunction with various institutions a number of science and research specific portals, some of which are publicly available. The sources that they cover are different but they all have similar search and display options. Results are automatically ranked by relevance but this can be changed to date, title or author. In addition to the standard relevance ranked list of results the portals create clusters of topics on the left hand side of the screen. The topics include broad subject headings, authors, publications, publishers, and year of publication and are a useful tool for narrowing down a search. Some of the portals, such as Mednar (http://mednar.com/), offer a clickable ‘visual’ of topics and sub-topics.

Mednar Macular Degeneration Visual

Forthcoming workshops

I am running three workshops in April on business information and search. All three have a practical element so that you can try out resources and techniques for yourself.

Introduction to Business Research

This is being organised by TFPL and will be held in London on Thursday, 18th April. This course provides an introduction to many areas of business research including statistics, official company information, market information, biographical information and news sources. It will cover explanations of the jargon and terminology, regulatory issues, assessing the quality of information, primary and secondary sources. Further information is available on the TFPL web site at http://www.tfpl.com/services/coursedesc.cfm?id=TR1116&pageid=-9&cs1=&cs2=f

Business information: key web resources

This is also being organised by TFPL in London and is being held on Friday, 19th April. This workshop looks in more detail at the resources that are available for different types of information, alerting services and free vs. fee. It also covers search strategies for tracking down industry, market and corporate reports. Further information is available at http://www.tfpl.com/services/coursedesc.cfm?id=TR945&pageid=-9&cs1=&cs2=f

Make Google behave: techniques for better results

This is a very popular workshop and is being organised by UKeiG. It is being held in Manchester on Tuesday, 30th April.

Topics include:

  • How Google works
  • Recent developments and their impact on search results
  • How Google personalises your results and can you stop it?
  • How to use existing and new features to focus your search and control Google
  • How and when to use Google’s specialist tools and databases
  • What Google is good at and when you should consider alternatives

The workshop will be repeated in London on Wednesday, 30th October. Details and booking information are on the UKeiG website at http://www.ukeig.org.uk/trainingevent/make-google-behave-techniques-better-results-karen-blakeman

New Search Strategies articles

There are three new articles available in the subscribers area of Search Strategies:

Searching for research information: Institutional Repositories HTML article and PDF

Mendeley as a search tool for research papers. Available as an HTML article and PDF

Scirus. Available as an HTML article and PDF

Annual individual subscription rates are £48/year (£40 + £8 VAT). Multi-user and corporate rates are available on request. For further details contact Karen Blakeman publications@rba.co.uk.

To purchase a subscription go to http://www.rba.co.uk/search/purchase.shtml

Oi, Google! NO!!

I’ve been seeing what looks like a new annoying Google search “feature” for a few weeks. I have been trying to ignore it in the hope that it would go away but it hasn’t. The problem is that Google has started giving me long lists of YouTube videos for some of my queries, even though I am in web search. For example a search on comfrey compost tea came up with about a dozen videos before giving me web pages with text describing the benefits of comfrey compost, which was what I wanted. In addition, in the menus on the left hand side of the screen Google offered me options to refine my video search by duration. But, Dear Google, I did NOT want videos at all!

Google search for comfrey compost tea

It did not matter whether or not I was signed in to my Google account. The videos were still given priority. I wondered if this was just an issue with Chrome so I switched to Firefox. The list of videos disappeared and was replaced by just one entry for YouTube at the top.

Comfrey Compost Tea in Firefox and Incognito

This gave me a clue as to what might be going on. I use Chrome for most of my “personalised” search. I generally stay logged in to my account, have enabled web search history and do not clear out the search cookies. In contrast I use Firefox for “de-personalised” search. I stay logged out of Google and social networks, and cookies and history are cleared after each session. I usually watch permaculture and gardening videos in Chrome, which probably explains why YouTube was taking pride of place in many of my search results. To test the theory I paused and deleted my web search history, and cleared cookies and browsing data. I then signed out of Google, cleared cookies again and re-ran the search. The blasted videos were still there.

What if I ran the search in a Chrome incognito window? The results were identical to those when using Firefox. Back to a normal Chrome window and the videos returned. I then checked that my web history was off and deleted. It wasn’t and it steadfastly refused to go away. Then the penny dropped. All my Chrome data – bookmarks, history etc – are synced to my Google account so no matter how often I try and delete the stuff locally it will all come back down again from my account. I disconnected my Google account under Chrome’s settings and, “Hey presto”, no more videos. I reconnected and they were back. It appears that if you are using Chrome and have synced it with your Google account you will get personalised results, even if you are signed out of your account.

So, if you are a Chrome user you may think that you have switched off personalisation by logging out of your account but that may not be the case. If you are conducting serious research it is always worth running your searches in an Incognito window, using a different browser or a completely different search engine like DuckDuckGo (http://duckduckgo.com/).

Postscript: I forgot to mention that I also tried Verbatim, but to no avail. Verbatim makes sure that all your terms are in the pages/documents exactly as you have typed them in but that still gives Google plenty of leeway in presenting those results. Google still bombarded me with videos although some were different from my original search.

Rediscovering BananaSlug for “long tail” search

I think it must have been seeing Phil Bradley the other night that made me think of revisiting BananaSlug.com (http://bananaslug.com/). I don’t mean that Phil reminds me of a banana slug but he did introduce me to the search tool via his blog way back in 2005. I have been looking at ways of getting out of what I call “search ruts”. You keep seeing the same results again and again but suspect that there may be something more relevant if only you could get to it. Million Short, which I mentioned in a previous blog post (http://www.rba.co.uk/wordpress/2012/10/04/million-short-unearthing-stuff-hidden-in-the-dungeons-of-googles-results/), is one way to tackle the problem. BananaSlug takes a different approach to what is known as long tail search. It adds a random term to your search and pulls up pages buried way down in the results list that you would probably never see. Just type in your search and then select a category, for example Animals, Great Ideas, Random Number, Themes from Shakespeare. BananaSlug then adds a random word from that category to your terms.
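The idea of adding a random word to a search is easy to try for yourself. The Python sketch below is an assumption-laden illustration rather than BananaSlug's own code: the word lists are tiny made-up samples and the function name add_random_term is hypothetical.

```python
import random

# Hypothetical sample word lists; the real categories will hold far more words
CATEGORIES = {
    "Colors": ["red", "green", "indigo"],
    "Animals": ["otter", "heron", "badger"],
}


def add_random_term(query, category, rng=random):
    """Append one randomly chosen word from the category to the query."""
    word = rng.choice(CATEGORIES[category])
    return f"{query} {word}"


# Each run may pick a different word, so the output varies
print(add_random_term('zeolites "environmental remediation"', "Colors"))
```

Passing your own random.Random instance as rng makes the choice repeatable, which is handy if you want to re-run exactly the same search later.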

At first glance this approach to search may seem appropriate for frivolous, fun stuff only but I find that it works really well with serious research topics. Running one of my test searches zeolites "environmental remediation" through the categories pulled up information that could have taken me hours or even days to find otherwise. Bear in mind that BananaSlug uses Google so synonyms and variations of the random word will be included in the search. When I selected Colors as my category red was added to my search and Google included reddish and reds.

BananaSlug Search Results

Most of the categories came up with something useful although Random Number, inevitably for this type of search, came up with page numbers of journal articles. I didn’t think Themes from Shakespeare would work but the random word it suggested was storm and there were several interesting papers on storm water management and treatment.

Banana Slug Shakespeare Storm

This may seem a bizarre way to explore search alternatives but if you are stuck for ideas give it a go.

Note: for more information on the banana slug Ariolimax see http://en.wikipedia.org/wiki/Banana_slug. The Pacific banana slug is the second-largest species of terrestrial slug in the world, growing up to 25 centimetres (9.8 in) long.

Million Short: unearthing stuff hidden in the dungeons of Google’s results

Fed up with seeing the same results from Google again and again? Wondering if that elusive document is buried somewhere at the bottom of Google’s 2,000,000 hits? Then get thee hence to Million Short (http://millionshort.com/). Million Short runs your search and then removes the most popular web sites from the results. Originally it removed the top 1 million, as its name suggests, but the default has changed to the top 10,000. The principle remains the same, though: exclude the more popular sites and you could uncover a real gem. The page that best answers your question might not be well optimised for search engines or might cover a topic that is so “niche” that it never makes it into the top results. Million Short does not say what it uses for search results or how it determines what are the most popular web sites. According to Webmonkey “Sanjay Arora, founder of Exponential Labs, tells Webmonkey that Million Short is using “the Bing API… augmented with some of our own data” for search results. What constitutes a “top site” in Million Short is determined by Alexa and Million Short’s own crawl data.” (http://www.webmonkey.com/2012/05/million-short-a-search-engine-for-the-very-long-tail/).

Using Million Short is straightforward. Type in your search and select how many sites you want to exclude (top 10K, top million, top 100). The results page includes a list of the sites that have been removed and you can opt to add one or more back in. You can also block a site using a link next to it in the results or click on “Boost!” so that pages from the site go to the top.
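The principle of removing popular sites and listing what was removed can be sketched in a few lines of Python. This is only an illustration of the idea, not Million Short's implementation: the function name drop_popular is hypothetical, the URLs are placeholders, and matching is on the exact host name with any leading www. removed.

```python
from urllib.parse import urlparse


def drop_popular(results, popular_domains):
    """Split result URLs into (kept, removed) according to a set of popular domains."""
    kept, removed = [], []
    for url in results:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        (removed if host in popular_domains else kept).append(url)
    return kept, removed


# Placeholder results and a placeholder list of "top" sites
results = ["https://www.example.com/zeolites", "https://small.example.org/zeolite-notes"]
kept, removed = drop_popular(results, {"example.com"})
print("Kept:", kept)
print("Removed:", removed)
```

Keeping the removed list alongside the kept one mirrors the option of adding a site back in after seeing what was excluded.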

Million Short results

Million Short automatically tries to detect which country you are in but you can change it under “Manage Settings and Country”. I didn’t notice much difference when I changed countries but then most of the queries I pass through Million Short tend to be scientific or technical. On the same page you can manage sites that you have blocked, added or boosted.

Does it work? I would not use it instead of the existing major search engines such as Google, Bing or DuckDuckGo but as an additional tool to surface material that is not easily found in the likes of Google. As well as web search there are image and news searches, but I’m not convinced that I’d find those all that useful.

If you are interested in comparing Million Short with Google try Million Short It On at http://www.millionshortiton.com/index.html. I had several goes at this and most of the results were a draw. That is no surprise as the searches I ran were very specific and I wanted to see if Million Short would pull up additional information, which it did. Million Short won outright on a couple and Google on one. The Google win was by default because Million Short did not come up with anything for comparison (the search in question was biofuels public transport carbon emissions).

There are a number of techniques that you can use to improve Google results, for example changing the order of the words in your search, Verbatim, filetype or Reading Level, but I would also recommend trying Million Short. The results should at least be different and may reveal vital information for your research.

Top search tips from North Wales

August is usually a quiet month for me with respect to work. Time for a holiday away and then a couple of weeks ambling along the Thames Path or pottering around the garden. This year, though, as soon as I was back from my travels I was knuckling down and updating my notes for two search workshops in North Wales. Both were for the North Wales Library Partnership (NWLP), the first taking place at Coleg Menai in Bangor and the second at Deeside College. Both venues had excellent training facilities and IT, which meant we could concentrate on getting to grips with what Google is doing with search and experiment with different approaches to making Google do what we want it to do.

At the end of the workshops both groups were asked to come up with a list of Top 10 Tips. I’ve combined the two lists and removed the duplicates to generate the list of 16 tips below.

  1. Repeat one or more of your search terms one or more times
    Fed up with seeing the same results for your search? Repeat your main search term or terms to change the order of your results.
  2. Menus on left hand side of Google results pages
    Use the menus on the left hand side of the results page to focus your search and see extra search features. To see all of the options click on the ‘More’ and ‘More search tools’ links. The content of the menus changes with the type of search you are running, for example Image search has a colour option.
  3. Verbatim
    Google automatically looks for variations of your terms and no longer looks for all of your terms in a document. If you want Google to run your search exactly as you have typed it in, click on the ‘More search tools’ options at the bottom of the left hand menu on your results page and then on Verbatim at the bottom of the extended menu that appears.
  4. intext:
    Google’s automatic synonym search can be helpful in looking for alternative terms but if you want just one term to be included in your search exactly as you typed it in then prefix the word with intext:. For example carbon emissions buses intext:biofuels flintshire. The command sometimes has the effect of prioritizing pages where your term is the main focus of the article.
  5. Advanced search screen and search commands
    Use the options on the advanced search screen or the search commands (for example filetype: and site:) in the standard search box to narrow down your search. A link to the advanced search screen can usually be found under the cog wheel in the upper right hand area of the screen. If you can’t see a cog wheel or the link has disappeared from the menu go to http://www.google.co.uk/advanced_search. A list of the more useful Google commands is at http://www.rba.co.uk/search/SelectedGoogleCommands.shtml
  6. Try something different
    Get a fresh perspective by trying something different. The two most popular alternatives during these two workshops seemed to be DuckDuckGo (http://duckduckgo.com/) and Million Short (http://millionshort.com). Other search engines to try include Bing (http://www.bing.com/) and Blekko (http://blekko.com/).
  7. Use the country versions of Google for information that is country specific
    This will ensure that the country’s local content will be given priority, although it might be in the local language. Useful for companies and people who are based in or especially active in a particular country, or to research holiday destinations. Use Google followed by the standard ISO two letter country code, for example http://www.google.de/ for Google Germany or http://www.google.no/ for Google Norway.
  8. Filetype to search for document formats or types of information
    For example PowerPoint for experts or presentations, spreadsheets for data and statistics, or PDF for research papers and industry/government reports. Note that filetype:ppt will not pick up the newer .pptx so you will need to include both in your search, for example filetype:ppt OR filetype:pptx. You will also need to look for .xlsx if you are searching for Excel spreadsheets and .docx for Word documents. The Advanced Search screen file type box does not search for the newer Microsoft Office extensions.
  9. Clear cookies
    Even if you are logged out of your Google account when you search, information on your activity is stored in cookies on your computer. These can personalise your results according to your past search and browsing history. Many organisations have set up their IT systems so that these tracking cookies are automatically deleted at least once a day or whenever a person logs in or out of their computer account. At home, your anti-virus/firewall software may perform the same function. If you want to make sure that cookies are deleted or want to control them manually How to delete cookies at http://aboutcookies.org/Default.aspx?page=2 has instructions on how to do this for most browsers.
  10. Looking for research papers? Google Scholar (http://scholar.google.com/) is one place to look but there may be additional material hidden somewhere on an academic institution’s web site. Include advanced search commands, for example filetype:pdf site:ac.uk, in your search.
  11. For the latest news, comments and analysis on what is happening in an industry or research area carry out a Google blog search and limit your search by date. Simply run your search as usual in the standard Google search box. On the results page click on Blogs in the menu on the left hand side of the screen and then select the appropriate time option.
  12. site: and -site:
    Use the site: command to search within a single site or type of site. For example:

    2011 carbon emissions public transport site:statistics.gov.uk

    to search just the UK official statistics web site

    asthma prevalence wales site:gov.uk OR site:nhs.uk

    to search all UK government and NHS web sites

    If you are fed up with a site dominating your results use -site: to exclude it from your search.

    For example:

    Dylan Thomas -site:bbc.co.uk

  13. Reading level – from tourism to research
    Use this option in the menus on the left hand side of your results page to change the type of information. For example run a search on copper mines north wales. Then click on Reading Level in the left hand menus. Selecting “Basic” from the options that appear at the top of the results gives you pages on tourism and holiday attractions. “Advanced” gives you research papers, journal articles and mineral databases. Google does not give much away as to how it calculates the reading level and it has nothing to do with the reading age that publishers assign to books. It could involve sentence structure, grammar, the length of sentences on a web page, the length of the document, the terminology used and doubtless many other criteria.
  14. Google.com
    Apart from presenting your search results in a different order Google.com is where Google tries out new features. As well as seeing pages that may not be highly ranked in Google.co.uk you will get an idea of how Google search may look in the UK version in the future.
  15. Numeric range search
    Use this for anything to do with numbers – years, temperatures, weights, distances, prices etc. Use the boxes on the Advanced Search screen or just type in your two numbers separated by two full stops as part of your search. For example: world oil demand forecasts 2015..2030
  16. An understanding of copyright is important if you intend to re-use information found in the web and absolutely essential if you are going to use images. Creative Commons licences clearly state what you can and can’t do with an image but they are not all the same. The list at Creative Commons http://creativecommons.org/licenses/ outlines the terms and conditions. “FAQs – Copyright – University of Reading” at http://www.reading.ac.uk/internal/imps/Copyright/imps_copyrightfaqs.aspx gives some guidance on copyright but if in doubt always ask! An example of what can happen if you get it wrong is demonstrated by “Bloggers Beware: You CAN Get Sued For Using Pics on Your Blog” http://www.roniloren.com/blog/2012/7/20/bloggers-beware-you-can-get-sued-for-using-pics-on-your-blog.html.
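Several of the tips above (4, 8, 12 and 15) involve typing commands by hand, and it is easy to forget the newer Office extensions or mistype a range. The small Python helpers below are a sketch only; the function names are made up for illustration and the output is just the text you would paste into the Google search box.

```python
# Older Microsoft Office extensions and their newer counterparts (tip 8)
NEWER = {"ppt": "pptx", "xls": "xlsx", "doc": "docx"}


def filetype_clause(ext):
    """Return a filetype: command, adding the newer Office extension where one exists."""
    if ext in NEWER:
        return f"filetype:{ext} OR filetype:{NEWER[ext]}"
    return f"filetype:{ext}"


def intext(term):
    """Tip 4: require a term exactly as typed."""
    return f"intext:{term}"


def exclude_site(domain):
    """Tip 12: drop a site that is dominating the results."""
    return f"-site:{domain}"


def numeric_range(low, high):
    """Tip 15: two numbers separated by two full stops."""
    return f"{low}..{high}"


print("world oil demand forecasts " + numeric_range(2015, 2030))
print("Dylan Thomas " + exclude_site("bbc.co.uk"))
print("carbon emissions buses " + intext("biofuels") + " " + filetype_clause("ppt"))
```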

 

Personalised vs non-personalised search – a word cloud comparison

My talk at the recent INFORUM 2012 conference held in Prague was about the issue of personalisation and the impact of our social network activities on search results. I believe that personalisation, and in particular contributions from our social and professional networks and even Google+, can present us with an alternative view of a topic or person that can be an important part of our analysis of a situation. I always have two different browsers open. One is not logged in to any account of any sort, has all cookies cleared at the end of each research session, and has search history disabled. The other is permanently logged in to a Google+ enabled account, social and professional accounts, and has web history enabled. This enables me to quickly switch between two very different environments to give me very different results when I am conducting research on Google or even Bing. Demonstrating this at a workshop or conference can be difficult, though, because postings and comments from the social elements of the search results may have been restricted to friends or limited circles.

For the INFORUM 2012 conference I decided to generate word clouds for personalised and non-personalised results for a Google.co.uk search on the single word Prague. The titles and up to the first 250 words of the top 20 results for the searches were scraped into a document from which the clouds were generated. In the graphic below, which has been taken from my presentation, the first word cloud represents a search that is as non-personalised as I could make it and the second has been personalised by several weeks of research on what to do and see in Prague. There are no prizes for guessing what we were interested in visiting!

Word cloud
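A word cloud is essentially a picture of word counts, so the comparison can be approximated by counting words in the scraped titles and opening text. The Python sketch below shows that counting step only. It is not the tool used for the presentation, and the stop word list and sample results are invented for illustration.

```python
import re
from collections import Counter

# Assumed minimal stop word list; a real one would be much longer
STOPWORDS = {"the", "and", "of", "in", "to", "a", "is", "for"}


def snippet(title, body, max_words=250):
    """Combine a result's title with up to the first max_words words of its text."""
    return title + " " + " ".join(body.split()[:max_words])


def word_counts(snippets):
    """Count the words across all snippets, ignoring case and stop words."""
    counts = Counter()
    for text in snippets:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


# Invented sample results standing in for the scraped top 20
results = [
    ("Prague Castle", "The castle in Prague is open to visitors"),
    ("Prague beer", "A guide to beer in Prague"),
]
counts = word_counts(snippet(title, body) for title, body in results)
print(counts.most_common(3))
```

Running the same count over the personalised and the non-personalised result sets and comparing the most common words gives a rough text version of the two clouds.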

Search gets personal and social

My INFORUM 2012 presentation on “Search gets personal and social” is available on authorSTREAM at http://www.authorstream.com/Presentation/karenblakeman-1431533-search-gets-personal-and-social/

It is also available temporarily at http://www.rba.co.uk/as/

A paper is also available on the INFORUM web site at http://www.inforum.cz/en/proceedings. It covers much of what I said but bear in mind it was written a few weeks beforehand and the presentation was updated with new developments the night before I gave the talk.