Tag Archives: advanced search

Order matters with Google advanced search commands

The great thing about running search workshops is that you have so many people experimenting with advanced commands that someone is bound to spot an anomaly that you haven’t. We’ve become used to seeing different results when changing the order in which we enter keywords but not when using advanced search commands. During one of my workshops we had a couple of people playing around with Google’s allintitle command. This tells Google to look for all of the keywords following allintitle in the title of a document.

The search that was initially used was allintitle:diabetic retinopathy and came back with 277,000 results. Restricting the search to UK academic sites by using allintitle:diabetic retinopathy site:ac.uk reduced the number to about 2,190 and gave sensible results. But changing the order of the commands to site:ac.uk allintitle:diabetic retinopathy gave two very bizarre results:

Site and Allintitle  Commands

Both results are from academic sites but allintitle as a search command seems to have been ignored. The first entry includes intitle, diabetic and retinopathy and the second has allintitle, diabetic and retinal. Using the Verbatim option from the menus on the left-hand side of the results page gave us zero!

Next we tried combining allintitle with filetype:pdf.

allintitle:diabetic retinopathy filetype:pdf

gave us 3490 results of which at least the first 100 were relevant.

Switching the order to:

filetype:pdf allintitle:diabetic retinopathy

gave 495,000 results, some of which were relevant, but many did not contain all of our terms, nor did they contain both diabetic and retinopathy in the title. Google was also looking for variations on our terms.

Order of advanced search commands

 

Using Verbatim on this search gave us zero again.

Advanced Commands and Verbatim

When we looked at the advanced search screen Google had put everything in the right boxes. If we used the advanced search screen to enter our terms afresh the search worked with Google putting the allintitle command at the start of the search.

Was this a general problem or just with allintitle? We then played around with the intitle command.

intitle:diabetic intitle:retinopathy site:ac.uk – 2220 sensible results (slightly more than our original allintitle search)

site:ac.uk intitle:diabetic intitle:retinopathy – 2220 sensible results identical to those above

intitle:diabetic intitle:retinopathy filetype:pdf – 3480 sensible results

filetype:pdf intitle:diabetic intitle:retinopathy – 3480 sensible results same as previous search

We then tried using a phrase after intitle:

intitle:"diabetic retinopathy" site:ac.uk – 2130 sensible results

site:ac.uk intitle:"diabetic retinopathy" – 2130 sensible results identical to previous search

Following a suggestion made by Tamara Thompson of PIBuzz (http://pibuzz.com/), changing the search slightly to site:ac.uk "intitle:diabetic intitle:retinopathy" gave exactly the same results.

Just to make sure that it wasn’t just us in the UK seeing this I asked fellow members of AIIP (http://www.aiip.org/) to run the original two allintitle searches. They saw exactly the same thing.

It seems, then, that there is a problem when allintitle is not the first command in a search. The intitle alternatives appear more reliable. If you prefer to use the command line rather than fill in the boxes on the Advanced Search screen, remember that order sometimes matters.
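For anyone who builds searches in a script rather than typing them, here is a minimal Python sketch of the workaround: it writes one intitle: per term instead of relying on allintitle, so the position of site: or filetype: should matter less. The function names intitle_query and search_url are my own inventions for illustration, and the sketch assumes the usual public search address with the query passed in the q parameter. It only builds the search string; it does not run the search.

```python
from urllib.parse import urlencode


def intitle_query(title_terms, *other_clauses):
    """Build a query with one intitle: per term instead of allintitle:.

    title_terms   -- words or phrases that must appear in the title
    other_clauses -- further commands such as "site:ac.uk" (hypothetical helper)
    """
    parts = []
    for term in title_terms:
        # Multi-word terms are quoted so they are searched as a phrase.
        if " " in term:
            parts.append(f'intitle:"{term}"')
        else:
            parts.append(f"intitle:{term}")
    parts.extend(other_clauses)
    return " ".join(parts)


def search_url(query):
    # Assumption: the standard search URL with the query in the "q" parameter.
    return "https://www.google.com/search?" + urlencode({"q": query})


query = intitle_query(["diabetic", "retinopathy"], "site:ac.uk")
print(query)  # intitle:diabetic intitle:retinopathy site:ac.uk
print(search_url(query))
```

Passing ["diabetic retinopathy"] as a single term produces the phrase form intitle:"diabetic retinopathy" used in the searches above.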

Does this affect other combinations of commands? I left it at allintitle and intitle but I wouldn’t be at all surprised.
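If you want to test other combinations systematically, a few lines of Python can list every ordering of a set of commands so that each one can be pasted into Google and the result counts compared by hand. This is only a sketch: order_variants is a hypothetical helper of mine, and the three-command combination below is an illustration rather than a search we ran in the workshop.

```python
from itertools import permutations


def order_variants(clauses):
    """Return every ordering of the given search commands as a query string."""
    return [" ".join(ordering) for ordering in permutations(clauses)]


# Each command is kept as one unit, so "allintitle:diabetic retinopathy"
# moves around as a whole rather than being split into separate words.
clauses = ["allintitle:diabetic retinopathy", "site:ac.uk", "filetype:pdf"]

for variant in order_variants(clauses):
    print(variant)
```

Three commands give six orderings, so the list stays manageable for a manual comparison.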

x-Factor web pages are “advanced” says Google’s reading level

Google has rolled out a new search option that assigns a reading level to the pages in your results list. Don’t be surprised if you haven’t spotted it yet; it is hidden on the advanced search screen. Under the “Need more tools?” section you can choose from the drop down menu to see all of the results with reading level annotations, basic results, intermediate results or advanced results.

Google Reading Level

Google does not give much away as to how it calculates the reading level and it has nothing to do with the reading age that publishers assign to books. It could involve sentence structure, grammar, the length of sentences on a web page, the length of the document, the terminology used and doubtless many other criteria. But Google isn’t saying.

If you have opted to see the annotations, at the top of your results page you will see a graphic showing the percentages for each of the categories. Under the title of each entry in your results list is the reading level.

Google Reading Level Results

Click on the Basic, Intermediate or Advanced links next to the bar chart to see pages for that reading level. The eagle-eyed amongst you will have spotted that Google appears to be mathematically challenged because the numbers do not add up to 100%. In all of the searches I have done so far 1 or 2% are missing from the statistics. Looking through the lists of results some pages have no reading level assigned to them and they seem to be documents that contain very little information, have more numbers than text, and some are formatted files. Note, though, that most file formats do have a reading level so why some are not picked up remains a mystery to me. Some Daily Mail articles do not have a reading level either but many would argue that they fall into the ‘very little information’ category!

Once you have used the Reading Level in the advanced search screen you can change your search on the results page and it remains as part of your search strategy until you close down your browser or tab.

You can also check out an entire web site by using the site command, for example site:rba.co.uk

Google Reading Level for RBA site

And this is where you can start to have some fun comparing sites (WARNING – this is addictive!). Phil Bradley has done some in his blog posting Google adds reading level
(http://philbradley.typepad.com/phil_bradleys_weblog/2010/12/google-adds-reading-level.html). He also highlights some potential problems with labelling pages in this way. For example ‘basic’ does not necessarily mean stupid, but some people may be deterred from selecting basic pages because of the tag.

Most of my pages are classed as intermediate and I am happy with that. Many of them are listings and analyses of business information sources. My husband’s blog on the other hand is 71% advanced and 27% intermediate. This comes as no surprise to me as he has a habit of littering his postings with complex calculations on topics such as wind turbine energy generation and the EROEI of tar sands oil production. (Just the sort of thing not to read before you have had your second cup of coffee of the day.) That plus the industry specific jargon that he uses makes an advanced tag inevitable.

Google Reading Level Energy Balance Blog

The evidence so far seems to be suggesting that using terms or jargon that are relatively uncommon in the whole of the Google database is a heavy factor in determining the reading level. Let’s look at what one might consider to be an intellectually challenging topic: the use of zeolites in environmental remediation.

Google Reading Level Zeolites search

That seems to confirm it.

As a final test and for a bit of fun let’s look at what Google makes of a search on the recent x Factor final.

Google Reading Level xFactor

Noooooo! Surely some mistake? The X factor home page is rated as basic but 93% of the results are advanced. There is indeed a mistake but it was my sloppy search strategy. Changing the x factor part of the search to a phrase gives what I would expect and a switch to 53% basic, 40% intermediate and 6% advanced.
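The phrase search figures show the same gap in the percentages mentioned earlier. A quick check, sketched in Python with a function name of my own choosing, assumes the percentages are the whole numbers Google displays; whether the shortfall comes from pages with no reading level or simply from rounding is something Google does not tell us.

```python
def unclassified_share(basic, intermediate, advanced):
    """Percentage points not accounted for by the three reading levels."""
    return 100 - (basic + intermediate + advanced)


# Figures from the phrase search: 53% basic, 40% intermediate, 6% advanced.
print(unclassified_share(53, 40, 6))  # 1
```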

ReadingLevelxFactor2.jpg

Out of curiosity, I looked at the content of the advanced pages and am now totally bemused. I cannot see how they could ever have been classified as such, but then this is Google we’re talking about. Perhaps Google cannot comprehend the scoring system, why so many people watch it or why the programme exists at all?

Google Reading Level xFactor

I have experimented with several other searches. Some came up with results as bizarre as those for the x Factor search but it is interesting how the breakdown can be changed by slightly modifying your search strategy, for example by using phrases when appropriate or a plus sign before a term to force an exact match search. Google’s Reading Level could be useful as a training tool to show how small alterations to a search strategy can radically change the results. But as with all things Google, we do not know how it works and the results can sometimes be very strange. Use with caution.

IFEG Advanced Search, Statistics & Market Research

I have now uploaded the slides for my workshop at the Information for Energy Group (IFEG). As usual, I have uploaded them to several different web sites in case one or more are blocked by corporate firewalls. If you have problems accessing any of the locations, let me know and I’ll sort out some other means of getting the presentation to you.

Workshop: Advanced Internet Searching for Energy Information & Market Research
Organised for:
Information for Energy Group
Venue: The Energy Institute, New Cavendish Street, London.
Date: Thursday 13 May 2010

PowerPoint Presentation (download from the RBA site – 7.5 MB)
authorSTREAM
Slideboom
Slideshare

Another workshop – another Top 10 Search Tips

The participants at the latest advanced search workshop were all from the public sector and had very strong views on some of the new developments in search. They were definitely not impressed by Google automatically enabling web history with a view to “personalizing” search results. (See Your Google results are about to get weirder
http://www.rba.co.uk/wordpress/2009/12/17/your-google-results-are-about-to-get-weirder/). (The workshop participants are switching off Web History as soon as they get back to the office!) There were several sites and search features, though, that did impress them. This is their list of Top 10 Search Tips.

1. The Google Wonderwheel was the clear winner of the day with this group. When your results page appears on screen, click on “Show options” just above the results and to the left of the screen. Then select Wonderwheel from the list on the left of the page. (For further details see Google new search and display options
http://www.rba.co.uk/wordpress/2009/10/05/google-new-search-and-display-options/)

2. Google’s Timeline was a close second in the popularity stakes. This is also under Show options in Google when you do a default web search and is also available in Google News. It shows the distribution of your articles over time and gives you an idea of when something started to become a “hot topic” and how a story has developed over time. It is not 100% accurate but is good enough to give you an overall picture of how interest in a subject has waxed and waned.

3. LGSearch http://lgsearch.net/ They liked this one a lot! This is a Google Custom Search Engine (CSE) set up by Dave Briggs (http://davepress.net/) that searches UK public sector web sites in one go. On the results page you can, if you wish, narrow down your search further to Local Government, Central Government, Health, Police & Fire, LG Related or Social Media.

4. Slideshare http://www.slideshare.net/. A site used by many people and organisations to provide access to PowerPoint presentations. Search for presentations on any topic or by a specific person then view online or download the original if the author permits. Once you have selected a relevant presentation Slideshare also shows you a list of other presentations containing similar content. No registration required if you just want to search.

5. Try something other than Google. As well as giving Yahoo or Bing a go, think about the type of information you are looking for: news, video, statistics, what people are talking about. Then use the appropriate search tool for that type of information.

6. Twitter search http://search.twitter.com/ You may not want to indulge in Twitter yourself but it can give you an idea of what people are saying about a topic. It is also an essential part of reputation monitoring and competitive intelligence: what are people saying about you or your products and services? You do not have to have a Twitter account to search Twitter, just go to search.twitter.com.

7. Google Blogsearch (http://blogsearch.google.com/) and Blogpulse (http://www.blogpulse.com/) Blogs are another useful source of views and opinions on every topic imaginable. Blogpulse has a “trend this” option on the results page that displays a graph showing you how many blog posts mention your search terms over time.

8. Zuula.com (http://www.zuula.com/) for quick and easy access to a wide range of search tools covering different types of information. Enter your search once, click on the tab for the type of resource (video, images, reference, news), and then work your way through the list of search engines.

9. Google Custom Search Engines (CSE). We looked at several Google CSEs, LGsearch.net and Directionlessgov (http://directionlessgov.com) being just two of them. You can, though, set up your own CSE at http://www.google.com/cse/. Useful if you search the same web sites day after day. You will need a Google account or Gmail account to set up a CSE but you can host your CSE on your own web site or on Google. CSEs can be made public or kept private.

10. University of Auckland Official Statistics (OFFSTATS) http://www.offstats.auckland.ac.nz/ This set of web pages provides information on Official Statistics on the Web and is an excellent starting point for official statistics by country and subject/industry.

Workshop: Advanced Internet Search Strategies 29th October

If you have booked a place on my advanced search workshop taking place this week in London on the 29th, you should by now have received confirmation, joining instructions etc. via post, fax, or email (or all three!). If you have not yet received anything from me, contact me straight away via email, phone or fax. Details are at http://www.rba.co.uk/about/contactkb.htm

Workshop on Advanced Search Strategies, London

Several people have asked me when I am next running my workshop on advanced search strategies (sometimes known as Google and Beyond) in London. The next date for London is Wednesday, 18th February and there are still some places left. The venue is InTuition House, Borough High Street, London SE1 1JX, which is close to Borough tube station and London Bridge. The cost is £150 + VAT (total: £172.50) and includes refreshments and a buffet lunch.

Full details of the workshop together with a booking form are on my web site at http://www.rba.co.uk/training/searching.htm . You can pay by credit card, PayPal or request to be invoiced for the event.

For those of you who live in the Manchester area, I am running a similar event for UKeiG on April 1st. Details are at http://www.ukeig.org.uk/training/2009/April/GoogleandBeyondManchester200904.html