Tales from the Terminal Room

September 2001, Issue No. 24


Please Note: This is an archive copy of the newsletter. The information and links that it contains are not updated.



Tales from the Terminal Room ISSN 1467-338X
Editor: Karen Blakeman
Published by: RBA Information Services

Tales from the Terminal Room (TFTTR) is a monthly newsletter, with the exception of July and August, which are published as a single issue. TFTTR includes reviews and comparisons of information sources and search tools; updates to the RBA Web site Business Sources and other useful resources; dealing with technical and access problems on the Net; and news of RBA's training courses and publications.


In this issue:

  • Review: Meta-search Software for your PC
    • Copernic vs. LexiBot vs. WebSeeker
  • Updates to the RBA Web site
    • Introduction to the Euro
    • Capital Market Guide
    • Guide to SEC Filings
    • Deloitte VAT/GST Rates
    • PricewaterhouseCoopers Global VAT Online
    • PricewaterhouseCoopers EdgarScan
    • Lawyer Locator
    • Internet Search Tools & Portals
  • Gizmo of the Month
    • Spam Proof Email Links

Review

Meta-search software for your PC: Copernic vs. LexiBot vs. WebSeeker

Meta-search tools - tools that carry out your search across several engines at once - are nothing new. They have been around for as long as the search engines themselves. Recently, though, there have been complaints that the Web-based services are dominated by adverts and that the quality of their results has been declining, with irrelevant, paid-for links appearing in the top placings. Are the PC-based programs any better?

I have been using Copernic on and off for about 18 months and thought it was time to look at it in more detail along with two recent entrants into the market: LexiBot and WebSeeker.

All three were put to the test using my standard set of test searches:

  1. Data on the confectionery market in the UK using the search strategy: confectionery per capita consumption UK. The results were compared with those generated by Google.
  2. Checking whether or not one of our new Web sites has been added to the major search engine databases.
  3. Search for current news on the Russian company Lukoil. Results were compared with an alert set up using the Northern Light News search and with searches in the FT Global Archive and LexisNexis.
  4. Searches for various book titles.
  5. File search: search for download sites offering Netscape 6 (N6setup.exe).

The first part of this review looks at the general features of the three packages. This is followed by a summary of their performance in the test searches, and then some conclusions.

Copernic

http://www.copernic.com/

The free version of Copernic, which is supported by banner advertising, covers 80 search engines grouped into seven categories, including Web, Newsgroups, Buy Books and Email addresses. You can select and deselect tools within each category, but there is no option for creating your own personalised category (a feature available in both LexiBot and WebSeeker).

The plus version costs USD 39.95, carries no advertising and covers 1,000 search engines in over 90 categories. These include country-specific Web search categories (e.g. UK, Switzerland, Poland, Canada, Portugal), Science, Recipes, Pets, File search, Technical news and Patents. Many of the subject categories are US-biased, but there are enough international and country-specific categories to make this a worthwhile investment for many non-US residents. If you look behind the scenes, many of the categories include Web sites and databases that are not accessible to the major search engines (the so-called "invisible Web"), so in some respects Copernic is similar to LexiBot.

The Pro version (USD 79.95) includes options for automatic scheduled updates to searches, email notification of newly found documents, and a query spell checker.

The basic Web meta-search option covers 17 search engines, including Google, FAST, AltaVista and HotBot but not Northern Light. By default it returns a maximum of 10 results from each engine, but this can be increased to as many as 300 from each. That is not advisable, though: with all 17 engines selected you could end up with over 5,000 results to sift through (17 engines x 300 results each)!

Searching is simple: select the category you wish to use, type in your search terms and select one of "any of the words", "all of the words", "exact phrase".

The results list presents you with titles, URLs, relevance scores and the search engines that found the pages. There is an option to show "extracts", a couple of lines containing your search terms. The default sort is by relevance, but you can re-sort results by title, search engine or URL. Duplicates are removed automatically. You can also validate the results to remove any dead or inaccessible links.
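None of the three vendors documents how results are merged, so the following is only a rough Python sketch of the general idea behind a results list like Copernic's: combining the per-engine result lists, dropping duplicate URLs, noting every engine that found each page and sorting by relevance. The `Hit` class, the `merge_results` function, the URL normalisation and the 0-1 relevance scale are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """One merged result: a page plus every engine that returned it (hypothetical model)."""
    url: str
    title: str
    score: float  # hypothetical relevance score on a 0-1 scale
    engines: list[str] = field(default_factory=list)


def merge_results(per_engine: dict[str, list[tuple[str, str, float]]]) -> list[Hit]:
    """Merge (url, title, score) lists from several engines into one de-duplicated list.

    Duplicates are detected with a crude URL normalisation (lower case, no
    trailing slash); the best score is kept and every finding engine is recorded.
    """
    merged: dict[str, Hit] = {}
    for engine, hits in per_engine.items():
        for url, title, score in hits:
            key = url.rstrip("/").lower()  # crude normalisation; real tools may differ
            if key not in merged:
                merged[key] = Hit(url, title, score)
            hit = merged[key]
            hit.score = max(hit.score, score)
            hit.engines.append(engine)
    # Default sort: highest relevance first; re-sorting by title or URL is just another key.
    return sorted(merged.values(), key=lambda h: h.score, reverse=True)
```

Keeping the list of engines for each page is what makes this kind of display useful when checking whether a new site has been indexed.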

You can store your results for future reference, and if you re-run your search at a later date, any new documents that it finds are marked with a star for easy identification. This is useful if, for example, you are monitoring a company: although the new documents will not necessarily be recent ones, the stars save you the bother of sifting through your whole set of results in an effort to identify them.
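The "new document" star amounts to comparing the saved result set with the URLs returned by the re-run. A minimal Python sketch, assuming results are identified by URL alone; `flag_new` and `render` are hypothetical names, not Copernic functions.

```python
def flag_new(previous_urls: set[str], current_urls: list[str]) -> list[tuple[str, bool]]:
    """Pair each URL from a re-run search with a 'new' flag (hypothetical helper).

    A URL counts as new if it was not in the saved result set. This says
    nothing about when the page itself was published.
    """
    return [(url, url not in previous_urls) for url in current_urls]


def render(flagged: list[tuple[str, bool]]) -> list[str]:
    """Prefix new results with a star, as a results list might display them."""
    return [("* " if is_new else "  ") + url for url, is_new in flagged]
```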

Double clicking on a result brings up the page in Copernic's own browser - a version of Internet Explorer - but you can use the browse option to view it in your default browser. The Copernic browser is generally faster and has the advantage that it highlights your search terms within the page.

If you have the "paid for" version, you can install new categories compiled by Copernic and the program will periodically check for updates to your installed categories.

Overall: I find this an easy and fast package to use, and one that consistently returns relevant results.

LexiBot

http://www.lexibot.com/

BrightPlanet, the producers of LexiBot, published the oft-quoted report "The Deep Web: Surfacing Hidden Value", and LexiBot covers some of the "deep web" or "invisible web" sources mentioned in that report. Its promise of "2,200 Deep Web databases and search engines in 200 pre-configured channels" persuaded me to trial the product. The deep or invisible Web consists of resources that the standard search engines do not normally index. They are often databases such as telephone directories, library catalogues or password-protected sites, which may be free or available only on subscription.

LexiBot can be downloaded free of charge for a month but costs USD 289.95 to purchase.

The LexiBot Web site claims:

"The program is simple and straightforward to work with. There are six main screens that provide you with an interface to set up and conduct powerful Internet and database searches. They also allow you to interact with and manipulate the returned search results."

Having had many trials and tribulations with this software, I have to disagree most strongly with the first sentence. Copernic automatically displays the search categories that are available to you on the left hand side of the screen. With LexiBot, it takes a while to work out which button you need to click in order to pull up the list of groups. Furthermore, the help files and tutorials tend to concentrate on the power of the software rather than giving a straightforward "1-2-3 this is how to use LexiBot".

The Web sources button presents you with a list of groups from which you select one or more for your search. The help files do warn you, though, not to select too many as this may significantly slow down your search. As a search on just half a dozen sites took me anywhere between four and seventeen minutes, that is advice worth heeding. The default "Starter group" covers AlltheWeb (FAST), AltaVista, Excite, Google, HotBot, Infoseek [sic], NorthernLight, WebCrawler and Yahoo. When I wanted to change my search groups, the problem was identifying the most appropriate one and, as with Copernic, many of them are heavily US-biased. There are about 200 groups, and when I looked for a group for my book search I plumped for the first one in the list that seemed relevant, "Books and Literature". I subsequently realised that I should have continued scrolling down the list and selected "Shopping - Books".

Getting to know the groups and the resources they encompass can take a couple of hours, but a nice feature of LexiBot is that you can combine several groups, perhaps deselect a few individual resources, and then create your own group.

Once you have selected the sources, you can start searching. Type in your search terms and then click on a sliding scale that ranges from "Fast" to "Quality". At the "Quality" end of the spectrum LexiBot must be carrying out some form of analysis on the pages it finds to improve relevance. Not surprisingly, there is no information on the algorithms used, but I found that, apart from an increased search time, there was little if any improvement in relevance, and in some cases the quality decreased.

There are "manual" options for refining your search, for example limiting the search to the last 20 days, but this had no obvious impact. Watching LexiBot at work is fascinating, if time-consuming. It appears that LexiBot goes to each site and identifies everything that is clickable, including advertisements. Many of these are eventually rejected, but this is probably why LexiBot takes so long to process a search.
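BrightPlanet does not publish how LexiBot crawls, but the behaviour observed above (collecting everything clickable, adverts included) can be illustrated with Python's standard `html.parser` module. Every link collected is one more candidate to fetch and score, which is one plausible reason such a search is slow. This is a sketch of the general idea only; `LinkCollector` and `clickable_links` are hypothetical names, not LexiBot code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target of every <a href> and <area href> on a page (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names; values keep their case.
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def clickable_links(html: str) -> list[str]:
    """Return every link target on the page, adverts and navigation included."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```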

When the search is eventually completed your results are displayed by relevance with the title, URL and size. There are additional folders where you can look at scores, terms, sources and rejects. The Scores Folder is used to compare the scores among all of the returned documents. You can use this folder to select one or more documents as masters, and then use the Re-Rank facility to analyse and score all of the other documents against those masters you've selected. Thus, says LexiBot, "once you find a document that has the kind of information for which you're searching, the LexiBot makes it very quick and easy to find all of the other documents that are similar". I regret to say that it didn't work for me, even though I struggled with it for over an hour.
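LexiBot does not say how Re-Rank scores documents against the masters. One standard way to measure how similar two documents are is cosine similarity between their word-count vectors, and the Python sketch below uses that swapped-in technique purely to illustrate the idea. The function names and the simple tokeniser are hypothetical, and LexiBot's own scoring may be quite different.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lower-cased word counts: a deliberately simple bag-of-words model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0 = nothing in common)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rerank(documents: dict[str, str], masters: list[str]) -> list[tuple[str, float]]:
    """Score every non-master document by its best similarity to any master (hypothetical)."""
    if not masters:
        raise ValueError("select at least one master document")
    master_vecs = [term_vector(documents[m]) for m in masters]
    scores = []
    for name, text in documents.items():
        if name in masters:
            continue
        vec = term_vector(text)
        scores.append((name, max(cosine(vec, mv) for mv in master_vecs)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```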

The Terms Folder is used to assess the terms on the returned Web documents and to refine your search further. The terms can be sorted alphabetically or by count in ascending or descending order. You can highlight any number of terms and click on the OR or AND radio button and LexiBot will display the URLs of the pages in your current set that match your additional criteria.
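The Terms Folder's AND/OR option behaves like a Boolean filter over the current result set. A minimal Python sketch, assuming each result is held as a URL plus its page text and that "count" means total occurrences across the set (LexiBot does not say which); all names are hypothetical.

```python
import re
from collections import Counter


def words(text: str) -> list[str]:
    """Split page text into lower-cased words (hypothetical tokeniser)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_counts(pages: dict[str, str]) -> list[tuple[str, int]]:
    """Total occurrences of each term across all pages, highest count first, then A-Z."""
    counts = Counter()
    for text in pages.values():
        counts.update(words(text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def match(pages: dict[str, str], terms: list[str], mode: str = "AND") -> list[str]:
    """Return URLs of pages containing all (AND) or any (OR) of the chosen terms."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    combine = all if mode == "AND" else any
    chosen = {t.lower() for t in terms}
    result = []
    for url, text in pages.items():
        page_words = set(words(text))
        if combine(t in page_words for t in chosen):
            result.append(url)
    return result
```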

The Rejects Folder displays those documents that have been rejected by LexiBot for "not meeting the search criteria".

When it comes to "browsing" the results, you can view them in your default browser or in LexiBot's text browser. Some pages are unreadable as plain text because of the way in which they are designed but for the rest it is a quick way of viewing your results.

Overall: I found this a difficult, non-intuitive package to get to grips with and one that took far too long to return results, most of which were irrelevant.

WebSeeker

http://www.bluesquirrel.com/products/seeker/

WebSeeker is not available as a free download at present so you will have to pay USD 29.95 if you are interested in the product. The first version that I looked at included a standard Web meta search option and a handful of alternative groups including Music/MP3, News, Newsgroups, Software, and Business & Finance. They are now adding extra categories that can be automatically downloaded if you allow WebSeeker to periodically check for upgrades.

The default Web option covers AltaVista, Ask Jeeves, Go To, Google, HotBot, Jump City, MSN, NorthernLight, WebCrawler and Yahoo. Search tools can be deselected from a category if you use the Advanced search screen. Alternatively you can set up your own category, which can include tools already supported by WebSeeker or tools added by you using the Add Custom Search Engine option. This promises to be a very useful feature and one that I have only just started to investigate in detail.

You can choose to use the Search Wizard or the Advanced search. The wizard takes you through the essential steps: selecting your category; entering your search terms (all of the words, any of the words, exact phrase); choosing the type of "find" you require; and setting the number of results to be retrieved per search engine or in total. WebSeeker offers Instant Find (de-duplicated links), Clean Find (de-duplicates and removes inaccessible links) and Filter Find (as Clean Find, but also downloads and indexes the results to your hard drive). The Advanced Search has similar options but also gives you the opportunity to select the search tools that you want WebSeeker to use.
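The difference between Instant Find and Clean Find can be sketched as two stages of the same pipeline. This Python sketch is an assumption about the general approach, not WebSeeker's code: `find`, `head_ok` and the mode names are hypothetical, an HTTP HEAD request stands in for whatever check WebSeeker actually performs, and Filter Find's download-and-index step is left out.

```python
import urllib.error
import urllib.request
from typing import Callable


def head_ok(url: str, timeout: float = 10.0) -> bool:
    """Rough liveness check (hypothetical): does an HTTP HEAD request succeed?"""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # HTTPError (4xx/5xx), DNS failures, timeouts and malformed URLs all count as dead.
        return False


def find(urls: list[str], mode: str = "instant",
         is_alive: Callable[[str], bool] = head_ok) -> list[str]:
    """'instant': de-duplicate only. 'clean': de-duplicate, then drop links failing is_alive."""
    unique = list(dict.fromkeys(urls))  # keeps the first occurrence, preserves order
    if mode == "instant":
        return unique
    if mode == "clean":
        return [u for u in unique if is_alive(u)]
    raise ValueError(f"unsupported mode: {mode!r}")
```

Passing the checker in as a parameter keeps the de-duplication logic testable without network access.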

The search process is fast and results are displayed with the title and URL. You can view pages in WebSeeker's own browser, which highlights your search terms in the page, or use the Browse function to view them in your default browser. When you have finished you can reject results from the list, mark selected pages for monitoring and save the results list to file.

Overall: A fast and relatively easy package to use. The number of categories and search tools is limited, but there are options for setting up your own categories and adding additional search engines not already listed.

Search Results

How did these three perform in the test searches?

1. Confectionery per capita consumption UK

Both Copernic and WebSeeker came up with relevant results which compared favourably with searches conducted directly in Google. I opted to search for all the words in both cases, used the full Copernic "Web" category, but in WebSeeker deselected Ask Jeeves, MSN and Jump City.

In LexiBot I selected "A Starter Group", which covers the major search engines, and opted for a "quality" search. This took 9 minutes and 45 seconds to "accept" 21 records. Top of the list was a reference from Dictionary.com about the UK. This had apparently been picked up by Ask Jeeves, which in turn had been picked up via a link on one of the search engine pages (Ask Jeeves is not actually listed in the LexiBot starter group). Documents 2-4 were identical but relevant news items from the BBC, 5 and 6 were references to the same ICC Keynote report, 9 was the Cadbury Schweppes home page and 10 was the history of tobacco. Several highly relevant pages picked up by WebSeeker and Copernic were rejected by LexiBot!

Changing the LexiBot search setting from Quality to Fast improved the relevance of the results, but at 7 minutes 22 seconds I would not call it particularly fast. I did try other Web searches in LexiBot but found it far too slow, and relevance was poor when compared to Google, Copernic and WebSeeker.

2. Checking whether a new Web site has been added to the major search engine databases.

Neither LexiBot nor WebSeeker displays the names of all the search engines that find a particular page, so for this type of search Copernic was the outright winner.

3. Search for current news on the Russian company Lukoil.

Before I started I suspected that this was going to be a difficult one to crack but I had expected LexiBot, with its deep web resources, to outperform the other two. The first problem I had with LexiBot was selecting the right category. In the end I decided to combine several, one of which included LexisNexis, deselected many sources that I felt would be irrelevant and saved the resulting collection as a new category. Within the Manual options, I chose to search for pages that had a date within the last 30 days.

Even when I chose Fast search, LexiBot was appallingly slow (around 9-10 minutes) and many of the results had 1998 dates, but more disappointing still, there were no results from LexisNexis. I eventually found these in the rejects folder tagged as "Size limit". I suspect that the search in LexisNexis generated more than 1,000 results, which causes LexisNexis to prompt you to refine your search, a message that LexiBot promptly rejects.

In Copernic there were several categories that looked relevant, but as there is no way of merging them I had to carry out separate searches in each. UK Newspapers and Newswire categories came up with relevant and recent articles but, as there is no sort by date option, you either have to guess or view the documents one by one to see how recent they are.

WebSeeker's News category covers CNNfn, TechWeb, ZDNet and USA Today and returned no results.

Compared with the alerts facility in Northern Light and searches in the FT Global Archive and LexisNexis, the results for all three were poor. I was not at all surprised that this was the case with Copernic and WebSeeker but that LexiBot failed so dramatically was a surprise, especially as it is marketed as a deep web search tool.

4. Searches for various book titles.

I looked for both current and out of print titles. WebSeeker does not have a books category but Copernic's Buy books worked well. Once I had worked out in LexiBot that "Shopping - Books", and not "Books and Literature", was the relevant category, that too worked albeit very slowly. For non-US users, though, there could be a problem in actually purchasing the titles as all the sites are US based. For myself, I will stick to using Amazon UK for current titles and Just Books for out-of-print publications.

5. File search: search for download sites offering Netscape 6 (N6setup.exe)

LexiBot does not have a category for locating software or files.

WebSeeker has a Software category that has four resources: ZDNet, Tucows, Dave Central and Gaming Depot. These are fine if you are looking for Web sites and using the general program name, for example Netscape, in a text search. If, though, you are looking for a specific file on an ftp site, as I often need to do, then Copernic File Search does the job superbly.

Conclusions

For me, LexiBot is a non-starter. It does not pretend to be just a meta-search engine, and it is a waste of time and resources to use it as such. However, its performance when using the other categories, and in particular the deep web resources, was equally dismal. There are good ideas behind it, but in practice it is not worth USD 289.95.

I find it difficult to choose between Copernic and WebSeeker. Both are fast in presenting results. Copernic has more categories and displays the names of the search engines that find a page, so it is more useful to me when I want to check which search tools have indexed one of my Web sites. WebSeeker, on the other hand, allows you to set up your own categories and add search tools not already covered by the program.

If you just want a straightforward meta-search tool, take a look at the free version of Copernic but do consider upgrading and paying for the plus version if you start using it on a regular basis (USD 39.95). At the very least you will not have to put up with the adverts, which do slow down the performance of the free version, and you may find the additional categories useful.

There is no free evaluation version of WebSeeker at present, but if your research interests are such that you would like to create your own groupings of resources then the USD 29.95 will be money well spent.

Finally: Useful though these tools are, they are no replacement for the advanced search features that are available directly through the individual search engines, and no substitute for getting to know essential "invisible" resources in your subject area.

Karen Blakeman


Information Resources

General Sources & Lists of Sites http://www.rba.co.uk/sources/general.htm

The Scottish Business Information Service (http://www.scotbis.com/), listed under Information Brokers in the July/August issue of TFTTR, is mentioned again this month, but under General Sources & Lists of Sites. The Essential Links section of Scotbis lists over 500 evaluated sites covering areas such as business support, company information, export, country information, news and statistics. There is some local bias, for example in the business support section, but the remainder of the links are international in coverage. Worth adding to your favorites or bookmarks as a key starting point for business information.

Miscellaneous day to day essentials http://www.rba.co.uk/sources/misc.htm

Several sites have been added to the Day to Day Essentials. The first is Introduction to the Euro (http://www.primark.com/pfid/), a PDF document from Primark Worldscope with general information on the Euro and the impact of its introduction on company accounts data. On the same site, the Capital Market Guide is a superb guide to corporate financial filings, disclosure requirements for companies in 81 countries and forms of business in each country. It is quite a large PDF file but worth downloading for future reference. Similarly, the Guide to SEC Filings covers all you need to know about US SEC filings. To access or download these PDFs click on "Guidebooks and Other Communications."

A new section on VAT and Sales Taxes has been added. The Deloitte Tax site (http://www.tax.deloitte.com/) has a free table of VAT and sales tax rates across the world. From the home page select Services followed by Indirect Tax and then VAT/GST rates. PricewaterhouseCoopers Global VAT Online (http://www.globalvatonline.pwcglobal.com/) has a vast amount of information on the subject but is a priced service.

Stock Markets & Company Financials http://www.rba.co.uk/sources/stocks.htm

PricewaterhouseCoopers EdgarScan (http://edgarscan.pwcglobal.com/)
Many services that re-package the US SEC filings and related reports are now charging for access to certain types of data and downloadable formats, notably the spreadsheet and RTF formats. PricewaterhouseCoopers offers a free service providing access to SEC EDGAR filings, some of which are made available as spreadsheet and RTF files. There are options for further analysis using the Java based Benchmark Assistant. As I have only just discovered this site, I have not had time to evaluate the service in full and have experienced a few problems with the Benchmark Assistant, but those could be due to the high security settings in my browser.

Trade & Service Directories http://www.rba.co.uk/sources/trade.htm

Martindale-Hubbell have launched Lawyer Locator (http://www.lawyerlocator.co.uk/), a database of over 54,000 solicitors, barristers and law firms in the UK, Isle of Man and the Channel Islands. On the default search screen you enter your town or the first part of your postcode, and can combine that with an Area of Law, for example Conveyancing, Charities or Intellectual Property.

To locate a specific firm of solicitors you enter their name, although this does not work on some formats of names. A search for my own solicitors "Barrett & Co" came up with zero hits but searching on just "Barrett" did bring up a set of results with "Barrett & CO" at the top. The advanced search includes an additional option for searching for individual solicitors or barristers by name; firm, chamber or company; area of law; town and post code.

Search Strategies for the Internet

Our two-page A4 leaflet "Internet Search Tools & Portals" has been updated and can be retrieved as a PDF file (18K) from http://www.rba.co.uk/search/list.pdf

The leaflet lists major search tools, directories and business portals. If you have problems downloading or printing out the leaflet do please contact us at info@rba.co.uk giving your name and address, and we will pop a good old-fashioned hardcopy in the post to you!


Gizmo of the Month

Spam Proof Email Links

Fed up with spambots harvesting your Web page email links for their junk mailing lists? Then take a look at http://www.mways.co.uk/hidemail.php

Developed by Jolyon Ralph of Mysterious Ways in response to a request from a client, this script is relatively straightforward to use. It creates the code for a link that hides the email address reasonably well and should fool most if not all spambots. There are no guarantees as spambot authors are always "upgrading" their harvesting techniques but it should give you some respite. Don't spoil it all, though, by displaying the email addresses in the visible text of the page! Use something like "Email us" or "Contact us".
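The Mysterious Ways page does not explain the method its script uses, so the following is only an illustration of one common technique: writing the mailto: address as HTML character references (&#64; for "@" and so on). Browsers decode these when the link is clicked, while a harvester that simply scans raw page source for address patterns will not see a plain address. This is an assumption-laden Python sketch; `obfuscate`, `mailto_link` and the example address are hypothetical, and, as noted above, there are no guarantees, since spambot authors can decode entities too.

```python
import html


def obfuscate(text: str) -> str:
    """Encode every character as a decimal HTML character reference (&#NNN;)."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def mailto_link(address: str, label: str = "Email us") -> str:
    """Build a mailto: link (hypothetical helper) whose address is entity-encoded.

    The visible label is a plain phrase such as "Email us", so the address
    never appears in the page's visible text.
    """
    return f'<a href="{obfuscate("mailto:" + address)}">{html.escape(label)}</a>'
```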


TFTTR Contact Information

Karen Blakeman, RBA Information Services
UK Tel: 0118 947 2256, Int. Tel: +44 118 947 2256
UK Fax: 020 8020 0253, Int. Fax: +44 20 8020 0253
Address: 88 Star Road, Caversham, Berks RG4 5BE, UK

Archives

TFTTR archives: http://www.rba.co.uk/tfttr/archives/index.shtml

Subscribe and Unsubscribe

To subscribe to the newsletter fill in the online registration form at http://www.rba.co.uk/tfttr/index.shtml

To unsubscribe, use the registration form at http://www.rba.co.uk/tfttr/index.shtml and check the unsubscribe radio button.


Privacy Statement

Subscribers' details are used only to enable distribution of the newsletter Tales from the Terminal Room. The subscriber list is not used for any other purpose, nor will it be disclosed by RBA or made available in any form to any other individual, organisation or company.


Creative Commons License
This work is licensed under a Creative Commons Attribution 2.5 License.

You are free:
  • to Share - to copy, distribute, display, and perform the work
  • to Remix - to make derivative works
Under the following conditions:
  • Attribution. You must attribute the work to Karen Blakeman, and cite Tales from the Terminal Room as the source and include the year and month of publication.
  • For any reuse or distribution, you must make clear to others the license terms of this work.
  • Any of these conditions can be waived if you get permission from the copyright holder.

This page was last updated on 26th September 2001