Tag Archives: Million Short

Essential Non-Google Search Tools for Researchers – Top Tips

This is the list of Top Tips that delegates attending the UKeiG workshop on 7th September 2016 in London came up with at the end of the training day. Some of the usual suspects such as the ‘site:’ command, Carrot Search and Offstats are present, but it is good to see Yandex included in the list for the first time.

  1. Carrotsearch http://search.carrotsearch.com/carrot2-webapp/search or http://carrotsearch.com/ and click on the “Live Demo” link on the left-hand side of the page.
    This was recommended for its clustering of results and also the visualisations of terms and concepts via the circles and “foam tree”. The Web Search uses eTools.ch for the general searches and there is also a PubMed option.

    Carrot Search Foam PubMed Foam Tree
  1. Advanced Twitter Search http://twitter.com/search-advanced
    The best way to search Twitter! Use the Advanced Search http://twitter.com/search-advanced or click on “More Options” on the results page. There is a detailed description of the commands and how they can be used at https://blog.bufferapp.com/twitter-advanced-search
  1. Yandex http://www.yandex.com/
    The international version of the Russian search engine, with a collection of advanced commands – including a proximity operator – that make it a worthy competitor to Google. Run your search and on the results page click on the two lines next to the search box.

    Yandex Advanced Search

    Alternatively, use the search operators. Most of them are listed at https://yandex.com/support/search/how-to-search/search-operators.xml. There is also a /n operator that enables you to specify that words/phrases must appear within a certain distance of each other, for example:

    "University of Birmingham" nanotechnology /2 2020

    There are country versions of Yandex for Russia, Ukraine, Belarus, Kazakhstan and Turkey. You will, though, need to know the languages to get the best out of them and, apart from the Turkish version, they use a different alphabet.

  1. Millionshort http://millionshort.com/
    If you are fed up with seeing the same results from Google again and again, give MillionShort a try. MillionShort enables you to remove the most popular web sites from the results. The page that best answers your question might not be well optimised for search engines or might cover a topic that is so specialised that it never makes it into the top results in Google or Bing. Originally, as its name suggests, it removed the top 1 million sites but you can change the number that you want omitted. There are filters to the left of the results enabling you to remove or restrict your results to ecommerce sites, sites with or without advertising, live chat sites and location. The sites that have been excluded are listed to the right of the results.
  1. site: command
    Use the site: command to focus your search on particular types of site, for example include site:ac.uk in your search for UK academic websites. Or use it to search inside large rambling sites with useless navigation, for example site:www.gov.uk. You can also use -site: to exclude individual sites or a type of site from your search. All of the major web search engines support the command.
  1. Microsoft Academic Search http://academic.research.microsoft.com/
    An alternative to Google Scholar. “Semantic search provides you with highly relevant search results from continually refreshed and extensive academic content from over 80 million publications.” This was recently revamped and, although it now loads and searches faster than it used to, the new version has lost the citation and co-author maps that were so useful. It can be a useful way of identifying researchers, publications and citations but do not rely on the information too much. It can get things very wrong indeed. For example, I’ve found that for some reason the affiliation of several authors from the Slovak Technical University in Bratislava is given as the Technical University of Kenya!
  1. Wolfram Alpha https://www.wolframalpha.com/
    This is very different from the typical search engine in that it uses its own curated data. Whether or not you get an answer from it depends on the type of question and how you ask the question. The information is pulled from its own databases and for many results it is almost impossible to identify the original source, although it does provide a possible list of resources. If you want to see what WolframAlpha can do try out the examples and categories that are listed on its home page.
  1. OFFSTATS – The University of Auckland Library http://www.offstats.auckland.ac.nz/
    This is a great starting point for locating official statistical sources by country, region or subject. All of the content in the database is assessed by humans for quality and authority, and is freely available.
  1. Meltwater IceRocket http://www.icerocket.com/
    IceRocket specialises in real-time search and was recommended for inclusion in the Top Tips for its blog search and advanced search options. There is also a Trends tool that shows you the frequency with which terms are mentioned in blogs over time and which enables you to compare several terms on the same graph.

    IceRocket Trends

    Very useful for comparing, for example, mentions of products, companies, people in blogs.

  1. Behind the Headlines NHS Choices http://www.nhs.uk/news/Pages/NewsIndex.aspx
    Behind the Headlines provides an unbiased and evidence-based analysis of health stories that make the news. It is a good source of information for confirming or debunking the health/medical claims made by general news reporting services, including the BBC. For each “headline” it summarises in plain English the story, where it came from and who did the research, what kind of research it was, the results, the researchers’ interpretation, conclusions and whether the headline’s claims are justified.
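
The operator-based tips in this list (site:, -site: and the Yandex /n proximity operator) are easy to assemble in a script if you run the same kind of search repeatedly. What follows is only a minimal Python sketch: the function names are made up for illustration, and the base URL and the `q` parameter in `search_url` are placeholders, not the real address or parameter of any particular search engine.

```python
from urllib.parse import urlencode


def build_query(terms, include_site=None, exclude_sites=()):
    """Join search terms with optional site: and -site: operators."""
    parts = list(terms)
    if include_site:
        parts.append(f"site:{include_site}")
    parts.extend(f"-site:{site}" for site in exclude_sites)
    return " ".join(parts)


def with_proximity(query, distance, word):
    """Append a Yandex-style /n proximity operator followed by a word."""
    return f"{query} /{distance} {word}"


def search_url(base_url, query, param="q"):
    """Build a results-page URL. base_url and param are assumptions:
    check the search engine you use for its real values."""
    return f"{base_url}?{urlencode({param: query})}"


# Restrict a search to UK academic sites.
academic = build_query(["nanotechnology"], include_site="ac.uk")

# Exclude a single site from the results.
filtered = build_query(["nanotechnology"], exclude_sites=["wikipedia.org"])

# Rebuild the Yandex proximity example from the list above.
yandex = with_proximity('"University of Birmingham" nanotechnology', 2, "2020")

print(academic)
print(filtered)
print(yandex)
print(search_url("https://www.example.com/search", academic))
```

The query strings can equally well be pasted straight into the search box; the script only saves retyping the operators.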

Million Short: unearthing stuff hidden in the dungeons of Google’s results

Fed up with seeing the same results from Google again and again? Wondering if that elusive document is buried somewhere at the bottom of Google’s 2,000,000 hits? Then get thee hence to Million Short (http://millionshort.com/). Million Short runs your search and then removes the most popular web sites from the results. Originally it removed the top 1 million, as its name suggests, but the default has changed to the top 10,000. The principle remains the same, though: exclude the more popular sites and you could uncover a real gem. The page that best answers your question might not be well optimised for search engines or might cover a topic that is so “niche” that it never makes it into the top results. Million Short itself does not say what it uses for search results or how it determines which web sites are the most popular. According to Webmonkey, “Sanjay Arora, founder of Exponential Labs, tells Webmonkey that Million Short is using ‘the Bing API… augmented with some of our own data’ for search results. What constitutes a ‘top site’ in Million Short is determined by Alexa and Million Short’s own crawl data.” (http://www.webmonkey.com/2012/05/million-short-a-search-engine-for-the-very-long-tail/).
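
The principle of removing the most popular sites can be illustrated with a few lines of Python. This is a sketch of the general idea only, not Million Short’s actual method, which is not published beyond the Webmonkey quote. The function name, the URLs and the popularity ranks below are all invented for the example.

```python
from urllib.parse import urlparse


def drop_popular_sites(results, popularity_rank, top_n):
    """Split ranked result URLs into (kept, removed).

    popularity_rank maps a domain to its popularity rank (1 = most
    popular). A result is removed when its domain ranks within top_n.
    Domains missing from popularity_rank are treated as unpopular.
    """
    kept, removed = [], []
    for url in results:
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        rank = popularity_rank.get(domain)
        if rank is not None and rank <= top_n:
            removed.append(url)
        else:
            kept.append(url)
    return kept, removed


# Hypothetical search results and popularity ranks, for illustration only.
results = [
    "https://www.example.com/popular-page",
    "https://smallsite.example/niche-report",
    "https://example.org/overview",
]
popularity_rank = {
    "example.com": 1,
    "example.org": 250000,
    "smallsite.example": 4200000,
}

kept, removed = drop_popular_sites(results, popularity_rank, top_n=1_000_000)
print("Kept:", kept)
print("Removed:", removed)
```

Lowering `top_n` to 10,000 in this example would let the `example.org` page back into the results, which mirrors the effect of choosing a smaller exclusion setting.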

Using Million Short is straightforward. Type in your search and select how many sites you want to exclude (top 10K, top million, top 100). The results page includes a list of the sites that have been removed and you can opt to add one or more back in. You can also block a site using a link next to it in the results or click on “Boost!” so that pages from the site go to the top.

Million Short results

Million Short automatically tries to detect which country you are in but you can change it under “Manage Settings and Country”. I didn’t notice much difference when I changed countries but then most of the queries I pass through Million Short tend to be scientific or technical. On the same page you can manage sites that you have blocked, added or boosted.

Does it work? I would not use it instead of the existing major search engines such as Google, Bing or DuckDuckGo but as an additional tool to surface material that is not easily found in the likes of Google. As well as web search there are image and news searches, but I’m not convinced that I’d find those all that useful.

If you are interested in comparing Million Short with Google try Million Short It On at http://www.millionshortiton.com/index.html. I had several goes at this and most of the results were a draw. That is no surprise as the searches I ran were very specific and I wanted to see if Million Short would pull up additional information, which it did. Million Short won outright on a couple and Google on one. The Google win was by default because Million Short did not come up with anything for comparison (the search in question was biofuels public transport carbon emissions).

There are a number of techniques that you can use to improve Google results, for example changing the order of the words in your search, Verbatim, filetype or Reading Level, but I would also recommend trying Million Short. The results should at least be different and may reveal vital information for your research.