Last changed 11 Dec 1997 ............... Length about 2,000 words (14,000 bytes).
This is a WWW document by Steve Draper, installed at http://www.psy.gla.ac.uk/~steve/resources/www.search.html. You may copy it. How to refer to it.

Using the Internet as a Search Tool

This document was first written by Stephany Biello, and since revised by her and Steve Draper.



Purpose of this Station


This information is designed to help you think about using the Internet for research during your time here at the University. "Surfing" the Internet can be as overwhelming and time-consuming as browsing a large library without a card catalogue.

First, it is important to understand that there is no single authority governing the explosion of resources on the Internet. In fact, the Internet itself is a network of networks which have different origins and purposes. Because anyone can be a "publisher" on the Net, we will want to consider the source of any information we obtain.

Many people and organisations have attempted to structure the available information as a "virtual library", but the amount and variety of information is so vast and changing so fast that no single source can be comprehensive.

Software developers have designed programs (called "search engines") to search the World Wide Web. It is important to understand how they search and what they search because different search engines will deliver different results.

The World Wide Web has emerged as a viable and legitimate way to publish information. Librarians are being asked by their patrons to help them find this information. Experience is starting to suggest that certain kinds of information can be found more effectively on the Web than in print sources.

I. Subject vs. Keyword Searching:


A. Understanding the Differences

In a traditional library, books and other materials are catalogued and even arranged on the shelves according to their subjects. These are usually fairly broad general topics. The number of subject headings is limited, and may not reflect the complexity and variety of topics covered in a particular article. Subject searching can be difficult if the category is not obvious. For example, a library may catalogue thalamus under thalamus- brain- central nervous system.

Computerised keyword searching allows you to find more information because the computer looks at words in the titles and content of a source as well as the subject. It also allows you to find more specific information, because each source yields many more keywords than subjects. The challenge in using keyword searching is to refine your topic so that the search yields an adequate number of useful citations. It is also important to understand the differences in the way searches are conducted by different search engines.

For example, let's say you were doing a critical review on the causes of Alzheimer's disease, and you wanted to use the net to find more information. First, let's try finding information by subject.

B. Subject Searching:

You could try looking at the

World Wide Web Virtual Library: Category Subtree (the address for this site is http://vlib.stanford.edu/Overview2.html; type this in where it says Location in Netscape).

Find the topic Neurobiology. This can be done in one of two ways. After the World Wide Web Virtual Library: Category Subtree has fully loaded, you can go to the Find option under Edit. Type in Neurobiology in the space and click on Find Next. Alternatively, scroll down the list of subjects until you find Neurobiology. You can scroll down the page by using the mouse to click on the arrow on the bar that runs vertically down the right-hand side of the page. Hint: Neurobiology is located under the Bio Sciences section.

Having resisted the tendency to select a topic such as Beer & Brewing within the Agriculture heading, click on Neurobiology.

Scroll down and have a look at some of the information within the category of neurobiology. Then use the find option to locate information about Alzheimer's disease. Have a look at the information by clicking on the Alzheimer's Association. When you have reached the Alzheimer's Association site, write down the location of the site here: ________________________________ This could be used if you wanted to return directly to the site at a later time.

Click on research and medical, then on Advances in Alzheimer Research to have a look at some past articles in their National Newsletter.

Now click on the Back button at the top of the page to return to the Alzheimer's Association home page and look this time at related resources. Clicking on related resources will bring you to an option to look at Links to other Alzheimer's related materials on the Web. Click on links, have a look at what information is available.

Now try searching by subject with another resource: Social Issue Resources (http://www.sirs.com/tree/tree.htm). Again, go to this site by entering the address in the location area at the top of the Netscape screen. Information about Alzheimer's disease is likely to be located within the Science section, so click on the Science label to the left of the loaded Social Issue Resources page. Scroll down the page until you find the Medical Science heading (or go directly to it with the find command). Select (or click on) Alzheimer Web. Once this page has loaded, have a look at some of the information located there.

On this page you will see a figure of the brain which highlights cholinergic pathways affected in the Alzheimer brain. What do they tell you is one of the brain areas involved in these cholinergic pathways?________________________________

Questions about the sources in general (answer briefly):

Question #1. Which source gave you the largest number of results?

Question #2. Which source gave you the most useful results?

C. Keyword Searching:

Now try keyword searching "Alzheimer's disease" in two different ways on the Internet:

WebCrawler (http://webcrawler.com/) is popular because it features a simple search form which is easy to use.

SavvySearch (http://www.cs.colostate.edu/~dreiling/smartform.html) is a multithreaded page which allows searching from several sources simultaneously.

Question #3. Which source gave you the largest number of results?

Question #4. Which source gave you the most useful results?

II. What is a Search Engine?


A search engine is a tool to help you find information on the Internet. Search engines periodically scan the contents of the Web to rebuild their massive indexes of Web pages. Some search titles or headers of documents, others search the documents themselves, and still others search other indexes or directories. When you request specific KEYWORDS, they then search the indexes they have built for those words. Your search is not a "live" search of the web, but a search of that engine's index.
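The idea of searching a stored index rather than the live Web can be pictured with a small sketch. It is only an illustration under stated assumptions: the page names, page texts and function names below are invented, and real search engines build their indexes in far more elaborate ways.

```python
from collections import defaultdict

# Hypothetical mini "web": page name -> page text (invented for illustration).
PAGES = {
    "page1": "Alzheimer's disease affects cholinergic pathways in the brain",
    "page2": "The thalamus is part of the brain",
    "page3": "Brewing beer at home",
}

def tokenize(text):
    """Lower-case the text and split it into words on whitespace."""
    return text.lower().split()

def build_index(pages):
    """Scan every page once and record which pages contain each word."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def lookup(index, keyword):
    """Answer a query from the stored index, without rereading any page."""
    return sorted(index.get(keyword.lower(), set()))

index = build_index(PAGES)
print(lookup(index, "brain"))   # ['page1', 'page2']

# A page added after the index was built stays invisible until the next rebuild.
PAGES["page4"] = "New research on the brain"
print(lookup(index, "brain"))   # still ['page1', 'page2']
```

The second lookup misses page4 because the query consults only the index as it stood at the last scan, which is why a very new page may not appear in your results.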

Since different methods are used to build search engine indexes, it is important to note that different search engines will give you different results. Acquaint yourself with and use several different search engines for more in-depth research, much as you would look in several books of a library's reference section to research any topic for applicable information. Be sure you take a close look at the source of the information you receive. The Internet has no filter, so personal opinions and links to personal home pages, in addition to well-documented primary research sources, may show up in your results list.

Two features will probably influence your choice of a "favourite" search engine. One is ease of use: it should allow you to customise your searches without offering so many options that using it is confusing. Second, a good search engine should be accurate: if properly configured, it will return a reasonable quantity of fairly precise results.

One way to learn about the particular characteristics of any search engine is to read its "help" screens. Alta Vista's Advanced Search provides a detailed description and some sophisticated examples.


Computerised search mechanisms are based on Boolean logic; the better we understand how it works, the better will be the results we obtain.

Boolean logic utilises three primary operators: AND, OR, and NOT.

Using the word "AND" actually narrows the results obtained in a search, while using the word "OR" broadens the results.

To see the difference, enter a custom search for Alzheimer's Disease on Lycos (http://www-uk.lycos.com/). Try the search once with "OR" and then again with "AND" by checking the appropriate boxes.

Question #5. Which search gave you the largest number of results?

Question #6. Which search gave you the most useful results?

Lycos defaults to search for ANY of the words you enter (Boolean "or"); other search engines assume you want ALL of the words (Boolean "and").
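A rough sketch of why this default matters: the same two-word query returns different numbers of pages depending on whether the engine treats it as ANY or ALL. The index contents and the "match" parameter are invented for illustration and do not describe how Lycos or any other engine is actually built.

```python
# Hypothetical index: word -> set of pages containing it (invented data).
INDEX = {
    "alzheimer's": {"p1", "p2", "p3"},
    "disease": {"p2", "p3", "p4", "p5"},
}

def search(query, match="any"):
    """Return pages matching ANY of the query words (OR) or ALL of them (AND)."""
    sets = [INDEX.get(word, set()) for word in query.lower().split()]
    if not sets:
        return set()
    if match == "all":
        return set.intersection(*sets)
    return set.union(*sets)

query = "Alzheimer's Disease"
print(len(search(query, match="any")))  # 5
print(len(search(query, match="all")))  # 2
```

An engine that defaults to ANY reports five pages here, while one that defaults to ALL reports two, even though both hold exactly the same index.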

The Boolean operator "OR" can be very useful when there are many synonyms for a concept, and we do not know which one might have been chosen by the author or indexer. For example, teenagers OR adolescents OR youth will yield many more citations than any of these words by itself.

We can also limit the results by using the Boolean operator "NOT". For example, you could search for Alzheimer's Disease NOT dementia.
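The three operators can be pictured as operations on sets of pages: AND keeps only the pages common to both terms, OR pools the pages of either term, and NOT removes pages. The sketch below uses an invented index with single-letter page names; it is a way of thinking about the operators, not a description of any particular engine.

```python
# Hypothetical index: keyword -> set of pages containing it (invented data).
INDEX = {
    "alzheimer's": {"a", "b", "c"},
    "disease": {"b", "c", "d"},
    "dementia": {"c", "e"},
}

def pages_with(word):
    """Return the set of pages that contain the given keyword."""
    return INDEX.get(word, set())

both = pages_with("alzheimer's") & pages_with("disease")     # AND: intersection
either = pages_with("alzheimer's") | pages_with("disease")   # OR: union
without = both - pages_with("dementia")                       # NOT: difference

print(sorted(both))     # ['b', 'c']
print(sorted(either))   # ['a', 'b', 'c', 'd']
print(sorted(without))  # ['b']
```

Note how AND can never return more pages than either term alone, while OR can never return fewer.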

Different search engines incorporate Boolean logic in different ways. It is important to read the instructions for each search engine before entering your search terms.

III. Comparing Search Engines:

A. Combined Subject and Keyword Searchers

Yahoo (http://www.yahoo.com/) features a hierarchically organised subject tree of information resources which have registered. It offers limited search options, but is often a useful starting place because of its large database of authoritative sources. Restrict your search to one of the subcategories to avoid retrieving business-related entries.

Webcrawler (http://webcrawler.com) is lightning fast and returns a weighted list of links. It analyses the full text of documents, allowing the searcher to locate keywords which may have been buried deep within a document's text. Read its Search Tips.

Lycos (http://www-uk.lycos.com/) searches document titles, headings, links, and keywords, and returns the first 50 words of each page it indexes for your search. Its search engine is more configurable than Webcrawler. Choose Sites by Subject to browse by topic.

B. Web Robots

Web "robots" depend on software which automatically searches the World Wide Web for new material.

Excite (http://www.excite.com) currently covers one and a half million web pages, Usenet news articles and classified ads, as well as links to current news, weather, etc. It presents results with a detailed summary so that you can select appropriate sites more quickly.

Alta Vista (http://www.altavista.digital.com/) is "arguably the most comprehensive index of documents on the web." Use Boolean operators on the Advanced Search screen.

C. Multithreaded Search Engines

Multithreaded search engines are slower, but give you more search options and compare results from more than one search engine. They are likely to return more specific information with more precision than the robots.

SavvySearch (http://www.cs.colostate.edu/~dreiling/smartform.html) allows you to specify the sources and types of information you want to retrieve.

Metacrawler (http://www.metacrawler.com/index.html) is highly configurable, similar to SavvySearch, and verifies and collates the results.
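Collating results from several engines can be sketched as merging their lists, dropping duplicates, and listing first the addresses that more engines agree on. This is only one plausible approach under invented data: the engine names and addresses below are made up, and the actual methods used by SavvySearch and Metacrawler are not described in this document.

```python
# Hypothetical ranked result lists from three engines (invented addresses).
RESULTS = {
    "engine_a": ["http://example.org/alz", "http://example.org/brain"],
    "engine_b": ["http://example.org/brain", "http://example.org/memory"],
    "engine_c": ["http://example.org/alz", "http://example.org/brain"],
}

def collate(results_by_engine):
    """Merge result lists, drop duplicates, and put widely reported addresses first."""
    votes = {}       # address -> number of engines that returned it
    first_seen = {}  # address -> order of first appearance, used to break ties
    for results in results_by_engine.values():
        for address in results:
            votes[address] = votes.get(address, 0) + 1
            first_seen.setdefault(address, len(first_seen))
    return sorted(votes, key=lambda a: (-votes[a], first_seen[a]))

for address in collate(RESULTS):
    print(address)
```

Each address appears once in the merged list, however many engines returned it, which is part of why a multithreaded search can feel more precise than reading three separate result pages.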

Question #7. What was a particularly interesting site you located?

For further information, have a look at two articles at the site for PC Magazine:

1. Researching With the Web
http://www.zdnet.com/pcmag/issues/1511/pcmg0081.htm

2. Finding Your Needles in the Web's Haystack
http://www.zdnet.com/pcmag/issues/1513/pcmg0080.htm

Question #8. Try your Alzheimer's Disease Topic and search for information about it in each of the above search engines. Compare the results. Now, what's your favourite search engine? Why?

IV. Evaluating Information Sources: Applying Critical Standards to the Internet

As said before, it is important to understand that there is no single authority governing the explosion of resources on the Internet. In fact, the Internet itself is a network of networks which have different origins and purposes. Because anyone can be a "publisher" on the Net, you must consider the source of any information you obtain.

A. Reliability:

What is the source of the information: did it come from an academic, government or commercial site or a Usenet newsgroup? If the information was obtained from a commercial site, what is the site designed to sell? Does that goal affect the quality or objectivity of the information provided?

B. Authority:

Postings to Usenet newsgroups frequently reflect the author's individual opinion. What do you know about this author's credentials?

C. Objectivity:

Is the information presented objectively, or does it reflect the biases of its author or web site? How thorough is the coverage compared to other sources?

D. Relevance:

If information about your topic is changing rapidly, how current is the information? How recently was the web site updated? Does the information you retrieved from the Internet add a significant perspective to your research?

Question #9. Go back to the site you located that you thought was particularly interesting. Briefly indicate the objectivity and relevance of the site. Also briefly comment on the reliability of the information it contains.

objectivity_______________________________________________________

relevance_________________________________________________________

reliability_________________________________________________________

If you have any extra time, look at an article from the January 1996 issue of Information Technology, "Beyond Surfing: Tools and Techniques for Searching the Web"

http://magi.com/~mmelick/it96jan.htm