Introduction to Online Search Tools

The Internet, and particularly the World Wide Web (WWW or "the web", one of the largest "components" of the Internet), is a wonderful tool for either commencing or enhancing your research. You can use the information you find online for brainstorming, gauging coverage of a given topic, enhancing more traditional research methods, developing a more detailed perspective on a given topic, and a variety of other purposes. But how do you go about finding this information? There are four broad classes of tools you can use to help you track down information, all of which can be found online: 1. search engines; 2. categorically-organised directories; 3. topic-specific indexes or "resource pages"; and 4. "deep web tools".

Search Engines

A search engine is an online computer program that uses a "robot" (specifically, a "crawler") to automatically crawl around the Internet, following links from page to page, indexing the information it finds and putting it into a huge searchable database. When you go to a search engine's website, you are asked to type in keywords that describe the information you want to find; the search engine then consults its index and points you to websites and other locations on the Internet which contain the information that most closely matches your keywords. The process is akin to searching for a name in a phone book: you search for a particular name and are then given a corresponding phone number that allows you to connect to the person you want. Similarly, the search engine gives you the name of the information you want (usually in the form of a title of a document or file on a website) and provides its own "phone number" in the form of a link which takes you to that information (again, usually a document or file located on a website). Search engines index many types of documents and files, including HTML files (like the one you're reading), graphics and photographs, and multimedia files like audio and video.
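
To make the phone book analogy concrete, here is a minimal sketch in Python of how a keyword index might work. It is an illustration only, not how any particular search engine is actually built: the three "pages" in PAGES are made up, and the function names build_index and search are hypothetical. A real crawler would fetch pages over the network rather than read them from a small table.

```python
from collections import defaultdict

# Hypothetical mini "web": page title -> page text. A real crawler would
# fetch these pages over the network by following links.
PAGES = {
    "Phone Book History": "the history of the phone book",
    "Search Engine Basics": "how a search engine builds its index",
    "Crawler Notes": "a crawler visits pages and feeds the search index",
}


def build_index(pages):
    """Map each lowercase word to the set of page titles that contain it."""
    index = defaultdict(set)
    for title, text in pages.items():
        for word in text.lower().split():
            index[word].add(title)
    return index


def search(index, query):
    """Return titles containing every keyword in the query, sorted by title."""
    keywords = query.lower().split()
    if not keywords:
        return []
    matches = set(index.get(keywords[0], set()))
    for word in keywords[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


INDEX = build_index(PAGES)
print(search(INDEX, "search index"))
```

The index is built once, ahead of time; answering a query is then just a lookup, much like flipping to a name in the phone book. Note that this sketch only checks whether the keywords appear on a page. It makes no attempt to rank results by relevance, which is where real search engines differ most from one another.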

Because search engines use robots which continually visit websites and other Internet locations without any human selection, they are often the best tool for giving you the greatest "breadth" of information. However, because the process is automated and indiscriminate and does not involve much human intervention (i.e., humans don't review the information that the search engines index), quality is sometimes sacrificed for breadth. In addition to finding highly relevant documents, files, and sites, you are often also presented with documents and websites which, although they seem to match your keywords, aren't really useful to you or aren't of any redeeming quality.

Each search engine differs somewhat in the exact method (or "algorithm") it uses to index online information, which means that, depending on what you're looking for, each one will vary in terms of how "thorough", "detailed", "relevant", and "important" its results are. For instance, although one search engine's claim to fame may be that it indexes more websites than any other, the results it gives you may not be as relevant to your query as the results given by a smaller, albeit more selective, search engine. Given that search engines differ in the algorithms they use, you should try to use more than one if you want to find as much (quality) information as you can on a given topic. Time, however, is always a constraint, and you may have time to use only one or two search engines. In such cases, read through the listing of search engines below and try to figure out which one would give you the best results based on what you're looking for, the kind of results you want, how specific you want the results to be, and so forth.

Examples of search engines include Google, All The Web, AltaVista, and Teoma. Some search engines are more specialised, such as those that search just for news articles (e.g., Google News), images (e.g., Picsearch), or content from weblogs ("blogs") and RSS feeds (e.g., FaganFinder). A few search engines even let you search through archives of old websites and webpages that are no longer online (e.g., The Wayback Machine).

Online Directories

Online directories differ from search engines in that they are compiled by human beings. As opposed to automated robots, real people scour the Internet looking for quality websites and then index those websites into directories organised by topical category. When you visit one of these online directories, you are presented with a huge number of categories, hierarchically organised by topic. There are two ways to find information using these directories. First, you can browse through the hierarchy of categories until you get to the category which best describes what you're looking for; you can then look through the list of relevant websites, documents, or files and view the ones you feel would be most useful to you. Alternatively, you can just type in a few keywords describing what you're looking for and let the online directory either 1. take you to what it thinks are the relevant categories that will contain links to the kind of information you want; or 2. scour all of its categories and return a list of what it thinks are relevant websites and documents from all of its categories.
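
The two ways of using a directory can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the categories and site names in DIRECTORY are invented, and the function names browse, find_categories, and find_sites are hypothetical rather than taken from any real directory.

```python
# Hypothetical directory: nested dicts are categories, lists are site entries.
DIRECTORY = {
    "Science": {
        "Psychology": ["Example Psychology Links"],
        "Biology": ["Example Cell Biology Primer"],
    },
    "Society": {
        "Law": ["Example Legal Research Guide"],
    },
}


def browse(directory, path):
    """Walk down the hierarchy along `path` and return what is found there."""
    node = directory
    for category in path:
        node = node[category]
    return node


def find_categories(directory, keyword, trail=()):
    """Return the paths of all categories whose name contains the keyword."""
    hits = []
    for name, child in directory.items():
        here = trail + (name,)
        if keyword.lower() in name.lower():
            hits.append(" > ".join(here))
        if isinstance(child, dict):
            hits.extend(find_categories(child, keyword, here))
    return hits


def find_sites(directory, keyword):
    """Return all site titles, from any category, that contain the keyword."""
    hits = []
    for child in directory.values():
        if isinstance(child, dict):
            hits.extend(find_sites(child, keyword))
        else:
            hits.extend(s for s in child if keyword.lower() in s.lower())
    return hits


print(browse(DIRECTORY, ["Science", "Psychology"]))
print(find_categories(DIRECTORY, "psych"))
print(find_sites(DIRECTORY, "guide"))
```

Here browse corresponds to clicking down through the categories yourself, find_categories to a keyword search that takes you to matching categories, and find_sites to a keyword search that returns matching sites from every category.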

The obvious benefit of online directories is that they are edited and managed by real people. Reviewers visit various websites and place them under intuitively designed categories if they feel the sites are relevant and of sufficient quality. When you search an online directory, you generally encounter fewer irrelevant sites than you do when you use a search engine. However, because humans are not machines, they cannot possibly visit as many sites as do the robots (crawlers) used by search engines. Therefore, what you gain in quality, you may sacrifice in breadth. Furthermore, determining what is a "quality" website is a subjective process; what one reviewer at Yahoo thinks is "quality" (or even "relevant") may not be quality or relevant to you. In fact, you might wonder why certain sites were included in a given category, while others weren't.

Examples of popular online directories include Yahoo and the Open Directory Project ("the largest, most comprehensive human-edited directory of the Web").

Topic-Specific Indexes and Resource Pages

Topic-specific indexes and resource pages are similar to online directories in that they are human-edited lists of websites. The difference is that they tend to focus on one specific topic. Resource lists and indexes are usually created, edited, and updated by individual Internet users or organisations that have a special interest or expertise in a given topic. Because these users and organisations focus their energies on one given topic, they have more time than the editors at the larger online directories to scour the Internet for websites, documents, and files relevant to that topic. In addition, their interest or expertise in a given topic is often greater than that of the reviewers at larger online directories. As a result, you get a much greater number of quality Internet resources relevant to your topic of interest. The drawbacks of indexes and resource lists are the same as those of online directories.

Examples of topic-specific indexes and resource pages include FindLaw, PsychCrawler (which looks like a robot-driven search engine, but is actually human-edited), my own Psychology Resources List, and About.com (a website with a large number of resource lists edited by individual users or "guides" with special expertise in a given area).

Caveat: There is More to the Internet Than The Above Search Tools Can Possibly Reach

Although search engines, directories, and indexes are powerful tools in helping you locate online information, they are still only able to access about 10% of the total information available on the Internet. Why is this? Well, there are at least five reasons. First, the majority of websites are not recognised by these tools because they do not use proper titles, keyword meta tags, or other identifiers. Second, even if a search engine or other tool is able to access a given site, it does not generally index every single document or file on that site (given that some sites have thousands of pages, this would be impractical and would make search engines' and other programs' databases too unwieldy to search efficiently). Third, these tools tend to focus on data found on the World Wide Web; much more data is available in the other "components" of the Internet (e.g., FTP servers, newsgroups, Telnet sites, etc.). Fourth, the search tools reach only the "surface layer" of the World Wide Web itself; 500 times more information is available in what is called the "deep web" (see below). Finally, because the Internet is so vast and grows exponentially by the minute, these tools cannot possibly explore all the content that is available online.

Given this limitation, what's your best bet for accessing as much relevant online information as you can? First, use a variety of search engines, indexes, and directories when looking for information on a given topic or when searching for a specific document or file; if one tool fails to turn up what you're looking for, another might do the trick. Second, when you reach a website or server to which you were directed by a search engine or other search tool, make sure to look through the site's "map" or use the search feature on the site or server so you can manually explore everything on it and see if there is anything else of interest to you. Third, have a look at this article by Robert J. Lackie: "Those Dark Hiding Places: The Invisible Web Revealed". Finally, read the next section on "Deep Web Tools".

Deep Web Tools

There are two "layers" to the World Wide Web portion of the Internet. The first is the "surface layer", which consists of static, permanent webpages with their own permanent links. These are the pages you find when you visit a website directly or find it through a search engine or other search tool. The second layer is the "deep layer", which is roughly 500 times bigger than the surface layer. The deep layer consists of information held in thousands of special databases. This information is not, however, available in the form of permanent web pages because information retrieved through databases is always "dynamic": different formulations of information will be created depending upon the query a given user makes when they visit a database at any given time. Because dynamic information cannot be transformed into permanent webpages, traditional search engines are unable to locate this information. The only way to view the contents of these databases is to visit each one, one by one, and tell them what you want from them. Fortunately, there are now some special tools (called "deep web tools") that allow you to query multiple databases simultaneously. This can save you a tremendous amount of time.
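
The idea of querying several databases at once can be sketched in Python as follows. This is a rough illustration, not a description of how BrightPlanet or any other deep web tool actually works: the three databases in DATABASES and their records are invented, and the function names query_database and query_all are hypothetical. Real deep web databases sit behind their own query forms, each of which a tool would have to fill in separately.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for separate searchable databases. Each maps a
# record title to its text; real deep web databases sit behind query forms.
DATABASES = {
    "court_records": {"Case 12": "contract dispute ruling"},
    "medical_library": {
        "Study A": "sleep and memory",
        "Study B": "memory loss in ageing",
    },
    "patent_office": {"Patent 7": "memory chip design"},
}


def query_database(name, records, keyword):
    """Run one query against one database and return (database, title) hits."""
    return [
        (name, title)
        for title, text in records.items()
        if keyword.lower() in text.lower()
    ]


def query_all(databases, keyword):
    """Send the same query to every database at once and pool the results."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(query_database, name, records, keyword)
            for name, records in databases.items()
        ]
        results = []
        for future in futures:
            results.extend(future.result())
    return sorted(results)


print(query_all(DATABASES, "memory"))
```

Each result is created only when the query is made, which is the "dynamic" quality described above: there is no permanent page for a crawler to find, so the databases have to be asked directly.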

Is it worth exploring the deep Web for information? The folks at BrightPlanet—owners of one of the major deep Web search tools—describe the vast amount of information waiting to be found in the deep Web:

Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web. The deep Web contains 7,500 terabytes of information, compared to 19 terabytes of information in the surface Web. The deep Web contains nearly 550 billion individual documents compared to the 1 billion of the surface Web. More than an estimated 100,000 deep Web sites presently exist. Sixty of the largest deep Web sites collectively contain about 750 terabytes of information – sufficient by themselves to exceed the size of the surface Web by 40 times.
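
As a quick sanity check, the ratios implied by the figures quoted above can be worked out in a few lines of Python. The variable names are my own; the numbers are simply those given in the quotation.

```python
# Figures as quoted from BrightPlanet above.
deep_tb, surface_tb = 7_500, 19
deep_docs, surface_docs = 550_000_000_000, 1_000_000_000
top_sixty_tb = 750

size_ratio = deep_tb / surface_tb            # deep web size vs. surface web
doc_ratio = deep_docs / surface_docs         # deep web documents vs. surface web
top_sixty_ratio = top_sixty_tb / surface_tb  # sixty largest sites vs. surface web

print(round(size_ratio), round(doc_ratio), round(top_sixty_ratio))
```

The ratios come out at roughly 395 (by size), 550 (by document count), and 39 (for the sixty largest sites alone), which is consistent with the quoted "400 to 550 times" and "40 times".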

BrightPlanet also describes the kind of information you will find in the deep Web:

Deep Web sites tend to be narrower with deeper content than conventional surface sites. Total quality content of the deep Web is at least 1,000 to 2,000 times greater than that of the surface Web. Deep Web content is highly relevant to every information need, market and domain. More than half of the deep Web content resides in topic specific databases. A full 95% of the deep Web is publicly accessible information—not subject to fees or subscriptions.

Because the content found in the deep Web is highly specific and detailed, the staff at Invisible Web—another deep Web search tool—encourage the following:

In general, we like the idea of comparing the resources available on the Invisible Web to a good collection of reference works. The challenge is to be familiar with some key resources prior to needing them. Information professionals have always done this with canonical reference books, and often with traditional, proprietary databases like Dialog and Lexis-Nexis. We encourage you to approach the Invisible Web in the same way—consider each specialized search tool as you would an individual reference resource.

To learn more about the deep web, visit http://www.brightplanet.com/deepcontent/index.asp and http://www.invisible-web.net.

For Additional Information

For more details on how to find information on the Internet, consider the following:

How Internet Search Engines Work
http://computer.howstuffworks.com/search-engine.htm
Great little article if you want a simpler explanation of how Internet search tools work.

Search Engine Watch
http://www.searchenginewatch.com
A very comprehensive site featuring articles on how different search tools and engines work, detailed descriptions of all the major online search tools (including search tools for specific topics and media types), tips on how to search the Internet, and news on search engine developments and technology. There are also many articles with suggestions on how to optimise your own website so that search engines will visit and index it. Everything you ever wanted to know about search engines and other tools is on this site (and also on Search Engine Guide, listed below).

Search Engine Guide
http://www.searchengineguide.com
Another great site featuring: search engine news from all over the Internet (including over 4,600 archived articles); a listing of almost 3,000 search engines (including specialised search engines); and an extensive list of relevant books and resources.

Yahoo maintains an excellent index of sites devoted to Internet research and search tools (including sites which give you more information on how to search the Internet) as well as an extensive listing of nearly 500 search tools. The Open Directory Project also has a listing of useful sites related to Internet searching, as well as an index of about 1,300 directories and 300 search engines. Some of the research sites listed in these indexes feature really helpful articles on finding online information that most of the general search engines don't pick up.

Copyright © 2004, by Eddy M. Elmer

Permanent URL: http://www.eddyelmer.com/search_intro.htm
