Tag clouds for analysing documents

CV not getting you those all-important interviews? Nobody answering your job advert? Or perhaps your corporate publicity is not doing the biz? Processing your document through a tag cloud generator might give you a clue as to where you are going wrong. Sue Hill gave a presentation at the recent City Information Group open day on CPD and skills. In passing she mentioned that they sometimes run a CV or job description through a tag cloud generator to show people why their lovingly created prose is way off the mark. The tag cloud brings to the fore your most used terms, and it can be a shock to discover that you have placed the emphasis in totally the wrong area. It then struck me that you could do this with any form of literature – a web page, training publicity, membership recruitment forms.

There are dozens, if not hundreds, of tag cloud generators on the Web and most of them are free. For starters try Wordle, Tagcrowd, or Tag Cloud Generator. The example below is a tag cloud of the UKeiG home page generated by Wordle.

UKeiG tag cloud generated by Wordle
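
If you are curious about what a tag cloud generator is doing behind the scenes, the core of it is a count of your most used terms. The following is a minimal sketch in Python, not how Wordle or any of the other tools actually work: the function name top_terms, the short stop-word list and the sample text are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this
# sketch); real tag cloud generators use much longer lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"}

def top_terms(text, n=10):
    """Return the n most frequent non-stop-words as (term, count) pairs."""
    # Lower-case the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Ignore stop words and single-character fragments.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 1)
    return counts.most_common(n)

# Made-up CV extract, for illustration only.
sample = ("Managed the team. Managed the budget. Managed suppliers and "
          "managed reporting for the research library.")

for term, count in top_terms(sample, 3):
    print(f"{term}: {count}")
```

In this made-up extract "managed" comes out well ahead of everything else, which is exactly the kind of misplaced emphasis a tag cloud makes visible.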

Top Search Tips

I ran another advanced search workshop (Google and Beyond) for UKeiG on June 11th, this time in London. Twenty people attended the event and came up with the following list of top search tips at the end of the day.

1. Use the Advanced Search screen. There are lots of goodies to be found on the advanced search screens: options for focussing your search by file format (e.g. xls for data and statistics, ppt for expert presentations, pdf for industry or government reports); site and domain search to limit your search to just one web site or a type of organisation (e.g. UK government, US academic); and in Google there is a numeric range search.

2. Google Custom Search Engines (Google CSE) at http://www.google.com/coop/cse/. This made its first appearance in the Top Tips from the Liverpool workshop earlier this year. Ideal for building collections of sites that you regularly search, to create a searchable subject list, or to offer your users a more focused search option.

3. See what Google does with your search string.

a) If you use the default search box and Google comes back with odd results, click on Advanced Search to see what it has done with your search terms.

b) If you use the Advanced Search screen and fill in the boxes, see how Google formats the search strategy by looking at the search box at the top of the results page. By learning the commands and prefixes you can build more specific searches more quickly on the default search page.

4. Cached copies. Look at the search engine’s cached copy of a web page if you can’t find your search terms in the document or if the page is nothing like the description in the results list. You will see the version of the page that the search engine used for indexing, with your terms highlighted.

5. Use tools such as Intelways and Zuula for quick and easy access to a wide range of search tools covering different types of information. Enter your search once, click on the tab for the type of resource for which you are searching (video, images, reference, news etc.), and then work your way through the list of search engines.

6. Alacrawiki. The Alacra Spotlights section is a good starting point for evaluated sites and information on industry sectors. It is also a good example of what to look for when assessing the quality of a wiki and how easy it is for anyone to edit the pages. In the Spotlights sections there is no edit option, not even if you register for an account and log in. Only the Alacra editors can edit the pages.

7. Open access journals. Google Scholar sometimes leads you to copies of journal articles in institutional repositories and open access journals, but there are also directories of open access journals. For example: http://www.doaj.org/ , http://www.wsis-si.org/oa-journals.html, http://www.abc.chemistry.bsu.by/current/fulltext.htm . This is not my area of expertise so comments on other directories are welcome.

8. Social bookmarking sites. Try social bookmarking sites, not only for creating your own evaluated lists of sites but also for searching other people’s. For example, FURL, Del.icio.us, Connotea and 2Collab. Connotea (owned by the Nature Publishing Group) and 2Collab (owned by Elsevier) are aimed at researchers and scientists.

9. Search results visualisation. Try out some of the newer search tools that present results and search options in a different way. For example, Cluuz, Kartoo, Kvisu and Quintura. (Some of the participants specifically mentioned Cluuz and Kvisu.)

10. The Internet Archive (Wayback Machine) at http://www.archive.org/ for pages, sites and documents that have disappeared. Ideal for tracking down lost documents, seeing how organisations presented themselves on the Web in the past, and for collecting evidence for a legal case (e.g. ‘passing off’, copyright infringement).
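
Tips 1 and 3 both come down to the commands that the Advanced Search screen writes into the search box for you. As a rough illustration, here is a minimal Python sketch that puts together a search string using the filetype: and site: prefixes and the numeric range (..) syntax. The function build_query and the example search terms are made up for this sketch, and the search URL printed at the end is there only to show how the string would be encoded.

```python
from urllib.parse import urlencode

def build_query(terms, filetype=None, site=None, num_range=None):
    """Assemble a search string with filetype:, site: and numeric range (..)."""
    parts = [terms]
    if filetype:
        # e.g. xls for data and statistics, ppt for presentations, pdf for reports
        parts.append(f"filetype:{filetype}")
    if site:
        # a single web site, or a domain such as gov.uk
        parts.append(f"site:{site}")
    if num_range:
        low, high = num_range
        parts.append(f"{low}..{high}")
    return " ".join(parts)

# Hypothetical search, for illustration only.
query = build_query("wind energy statistics", filetype="xls", site="gov.uk")
print(query)
# The same string encoded as the q parameter of a search URL.
print("http://www.google.com/search?" + urlencode({"q": query}))
```

Typing the resulting string straight into the default search box should give the same search as filling in the equivalent boxes on the Advanced Search screen.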

Energy Export Databrowser

The Energy Export Databrowser, set up by Jonathan Callahan, is based on BP’s 2007 Statistical Review and provides a quick and easy way to view country data on consumption, import and export of crude oil and natural gas. It covers over 80 countries and the data goes back to the 1960s. There is feedback on the browser itself and an interesting discussion on the accuracy and validity of the underlying data on The Oildrum.

Directories: Major Companies of the World 2008

Seven new editions of the World’s Major Companies Series have been published by Graham & Whiteside and are now available for purchase on the dataresources web site.

Major Chemical and Petrochemical Companies of the World 2008
This directory covers more than 7,000 of the leading chemical and petrochemical companies worldwide.

Major Energy Companies of the World 2008
More than 4,000 companies involved in coal mining and coal products; electricity supply; fuel distribution; natural gas supply; nuclear engineering; oil and gas exploration and production; oil and gas services and equipment; and oil refining worldwide.

Major Financial Institutions of the World 2008 (2 Vols)
Over 9,000 leading financial institutions worldwide, including banks, investment, insurance and leasing companies.

Major Food and Drink Companies of the World 2008
9,800 of the leading food, alcoholic and non-alcoholic drink companies worldwide.

Major Information Technology Companies of the World 2008
This directory covers more than 3,100 of the leading information technology companies worldwide.

Major Pharmaceutical and Biotechnology Companies of the World 2008
The world’s largest pharmaceutical companies, providing essential business profiles of the international leaders in the industry.

Major Telecommunications Companies of the World 2008
Profiles of more than 3,500 of the leading telecommunications companies worldwide, including many of the top Internet companies.