Search Engines made EZ
or Going Fishing in Uncharted Waters!
Search Engines:
- Search engines are connected to large databases that have “crawled” and indexed websites
- Not all websites are indexed on search engines
Statistics:
| Search Engine | Showdown | Claim |
| Google | 3,033 | 3,083 |
| AlltheWeb | 2,106 | 2,112 |
| AltaVista | 1,689 | 1,000 |
| WiseNut | 1,453 | 1,500 |
| Hotbot | 1,147 | 3,000 |
| MSN Search | 1,018 | 3,000 |
| Teoma | 1,015 | 500 |
| NLResearch | 733 | 125 |
| Gigablast | 275 | 150 |
Google Solidly in Lead
AlltheWeb Moves Back to 2nd
AltaVista Up to 3rd
Database Total Size Estimates:
Little Overlap Despite Database Growth!
This analysis compares the results of four small searches run
on ten different search engines. The four searches found a total of 334 hits,
141 of which represented specific Web pages. Of those 141 Web pages, 71 were
found by only one of the ten search engines while another 30 were found by only
two.
Several of the largest search engines have shown significant
growth since Feb. 2000, when the overlap comparison was last run. Even so,
about half of all pages found were found by only one of the search engines,
and not always the same one. Over 78% were found by three search engines at
most. Each pie slice in the chart represents the number of hits found by the
given number of search engines. For example, the “by 1 (71)”
slice represents the number of unique hits found by one (and only one) search
engine.
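As a quick check on the proportions quoted above, the arithmetic can be worked through in a few lines. This is only a sketch: the three counts come from the analysis, while the variable names are invented here for illustration.

```python
# Counts taken from the overlap analysis quoted above.
total_pages = 141    # distinct Web pages found across all four searches
found_by_one = 71    # pages returned by exactly one search engine
found_by_two = 30    # pages returned by exactly two search engines

share_one = found_by_one / total_pages
share_one_or_two = (found_by_one + found_by_two) / total_pages

print(f"Found by exactly one engine: {share_one:.1%}")
print(f"Found by one or two engines: {share_one_or_two:.1%}")
```

In other words, roughly seven out of ten pages turned up in no more than two of the ten databases, which is why it pays to fish in more than one search engine.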
Information provided by:
Search Engine Showdown - http://searchengineshowdown.com/
Boolean Search Strings
Before we take a look at
the specifics of these search engines, let's take a look at Boolean Search Strings.
A PRIMER IN BOOLEAN LOGIC
Boolean logic consists of three
logical operators:
OR - AND - NOT
Each operator
can be visually described by using Venn diagrams, as shown below.
OR
college OR university
college OR university OR campus
AND
poverty AND crime AND gender
NOT
cats NOT dogs
PHRASE
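One way to picture the operators above without the diagrams is as operations on sets of matching documents: OR is a union, AND is an intersection, NOT is a difference, and a phrase requires the words side by side in order. The sketch below assumes a tiny made-up corpus; `DOCS`, `having`, and `having_phrase` are hypothetical names used only for this illustration.

```python
# Hypothetical mini-corpus: document id -> text (invented for illustration).
DOCS = {
    1: "college admissions guide",
    2: "university campus map",
    3: "cats and dogs living together",
    4: "caring for cats",
    5: "poverty crime and gender studies",
    6: "crime statistics by gender",
}

def having(term):
    """Return the ids of documents that contain the given word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}

def having_phrase(phrase):
    """Return the ids of documents that contain the words adjacent and in order."""
    return {doc_id for doc_id, text in DOCS.items() if f" {phrase} " in f" {text} "}

or_result = having("college") | having("university")                    # OR  = union
and_result = having("poverty") & having("crime") & having("gender")     # AND = intersection
not_result = having("cats") - having("dogs")                            # NOT = difference
phrase_result = having_phrase("crime statistics")                       # PHRASE = exact sequence

print(or_result, and_result, not_result, phrase_result)
```

OR widens the catch, while AND, NOT, and a phrase each narrow it.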
Reality Check
Let's say we are
looking for a software engineer with C++ and UNIX (any flavor) experience, but we want to stay
away from UNIX systems administrators who call themselves software engineers
and can program in C++.
We can create a
search string using Boolean operators:
“software engineer” AND C++ AND (unix OR sun OR solaris OR hpux) NOT administrator
You probably
already use something like this on your job boards or in-house resume tracking
system.
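To see how such a string filters candidates, here is a minimal sketch that applies the same logic to resume text. It is not how any particular job board or tracking system works; the `tokens` and `matches` helpers and the sample resumes are invented for illustration, and matching is on exact words only (so "administrators" would slip past the NOT).

```python
import re

def tokens(text):
    """Lowercase the text and split it into words, keeping '+' so that 'C++' survives."""
    return re.findall(r"[a-z0-9+#]+", text.lower())

def matches(resume):
    """Apply: "software engineer" AND C++ AND (unix OR sun OR solaris OR hpux) NOT administrator."""
    words = tokens(resume)
    word_set = set(words)
    normalized = " " + " ".join(words) + " "

    has_phrase = " software engineer " in normalized                   # PHRASE
    has_cpp = "c++" in word_set                                        # AND C++
    has_unix = bool(word_set & {"unix", "sun", "solaris", "hpux"})     # AND (... OR ...)
    is_admin = "administrator" in word_set                             # NOT administrator
    return has_phrase and has_cpp and has_unix and not is_admin

# Hypothetical sample resumes.
developer = "Software Engineer, 5 years of C++ on Solaris and Linux."
sysadmin = "Software engineer / UNIX systems administrator, scripts in C++."
windows_dev = "Software engineer, Java on Windows."

for resume in (developer, sysadmin, windows_dev):
    print(matches(resume), "-", resume)
```

Only the first resume passes: the second trips the NOT clause, and the third lacks both C++ and a UNIX flavor.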
BACK TO THE SEARCH ENGINES!
Let’s explore
the top 3 search engines:
Google - http://www.google.com/
AllTheWeb -
http://www.alltheweb.com/
Alta Vista -
http://www.altavista.com/
The basic Google
search has an implied AND built in, so we do not have to use the operator.
The engine
supports AND as well as OR and uses a minus sign “-” for NOT. Here is a quick list of the “Advanced Operators”
that Google supports.
Advanced Operators:
cache:, link:, related:, info:, stocks:, site:, allintitle:, intitle:, allinurl:, inurl: and filetype:
Let’s look at a
few of these.
link: The query [link:] will list webpages that have links
to the specified webpage.
site: Google will restrict the results to those websites in
the given domain.
intitle: Google will restrict the results to documents
containing that word in the title.
filetype: is
worth mentioning too, as many PDFs contain lists that can be very valuable.
Back to our
search for a software engineer. What if
we wanted to find an engineer from a specific company? We will use www.artesiansolutions.com
as our example target company.
(link:artesiansolutions.com OR site:artesiansolutions.com) AND (intitle:resume OR intitle:cv) AND “software engineer” AND C++ AND (unix OR sun OR solaris OR hpux) NOT administrator
This should get
us pages that either link to our target company or sit on the target
company's website, that have “resume” or “CV” in the HTML
title section of the code, and that meet our technical requirements for a
software engineer.
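Since Google implies AND and uses a minus sign for NOT, the same search can be written in Google's own shorthand. The sketch below assembles such a string and URL-encodes it. The `google_query` helper is hypothetical, the `search?q=` address form is an assumption that does not come from this handout, and nothing here checks whether Google honors every combination of operators.

```python
from urllib.parse import urlencode

def google_query(phrases=(), required=(), any_of_groups=(), excluded=()):
    """Assemble a Google-style query: implied AND, OR inside parentheses, minus sign for NOT."""
    parts = [f'"{phrase}"' for phrase in phrases]        # exact phrases go in quotes
    parts += list(required)                              # plain terms are ANDed implicitly
    for group in any_of_groups:                          # each group is an OR list
        parts.append("(" + " OR ".join(group) + ")")
    parts += [f"-{term}" for term in excluded]           # minus sign stands in for NOT
    return " ".join(parts)

query = google_query(
    phrases=["software engineer"],
    required=["C++"],
    any_of_groups=[
        ["link:artesiansolutions.com", "site:artesiansolutions.com"],
        ["intitle:resume", "intitle:cv"],
        ["unix", "sun", "solaris", "hpux"],
    ],
    excluded=["administrator"],
)

# Assumed address form; urlencode escapes the quotes, plus signs and parentheses.
url = "http://www.google.com/search?" + urlencode({"q": query})

print(query)
print(url)
```

Keeping the pieces in separate lists makes it easy to swap in a different target company or skill set without retyping the whole string.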
Our friends at
Google have decided that this is way too much information for most people and
have made a friendly interface for us.
The Boolean search strings still make for a good case study and lesson, as
we can apply this strategy to other search engines.