Search Engines made EZ or Going Fishing in Uncharted Waters!

 

  • Most recruiters are “fishing” for candidates in the same pool of active candidates
  • Balance your recruiting efforts between active and passive candidates

 

Search Engines:

  • Search engines are backed by large databases built by programs that “crawl” and index websites

  • Not all websites are indexed by search engines

 

Statistics:

[Bar chart]

Search Engine    Showdown Estimate (millions)    Claim (millions)
Google           3,033                           3,083
AlltheWeb        2,106                           2,112
AltaVista        1,689                           1,000
WiseNut          1,453                           1,500
Hotbot           1,147                           3,000
MSN Search       1,018                           3,000
Teoma            1,015                           500
NLResearch       733                             125
Gigablast        275                             150
 

Google Solidly in Lead
AlltheWeb Moves Back to 2nd
AltaVista Up to 3rd

 

 

 

 

Database Total Size Estimates:

 

 

 

 

 

 

 

 

Little Overlap Despite Database Growth!

This analysis compares the results of four small searches run on ten different search engines. The four searches found a total of 334 hits, 141 of which represented specific Web pages. Of those 141 Web pages, 71 were found by only one of the ten search engines while another 30 were found by only two.

Several of the largest search engines have shown significant growth since Feb. 2000, when the overlap comparison was last run. Even so, about half of all pages found (71 of 141) were found by only one of the search engines, and not always the same one. Over 78% were found by three search engines at most. Each pie slice in the chart represents the number of hits found by the given number of search engines. For example, the “by 1 (71)” slice represents the number of unique hits found by one (and only one) search engine.
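To make the overlap numbers concrete, here is a quick sketch of the arithmetic using only the counts quoted above (141 pages, 71 found by one engine, 30 found by two). The variable names are just for illustration.

```python
# Counts quoted in the overlap analysis above.
total_pages = 141    # distinct Web pages found across all ten engines
found_by_one = 71    # pages found by exactly one engine
found_by_two = 30    # pages found by exactly two engines

share_one = found_by_one / total_pages
share_one_or_two = (found_by_one + found_by_two) / total_pages

print(f"Found by exactly one engine: {share_one:.1%}")
print(f"Found by one or two engines: {share_one_or_two:.1%}")
```

In other words, roughly half of the pages would be missed by nine of the ten engines.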

Pie Chart

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

  • Bottom line: over 78% of the Web pages in this test were found by three or fewer of the ten search engines, and about half by only one.  No single engine covers everything, so use more than one when doing a search; even three engines will miss pages!

 

Information provided by: Search Engine Showdown - http://searchengineshowdown.com/

 

 

Boolean Search Strings

Before we look at the specifics of these search engines, let's take a look at Boolean search strings.

 

A PRIMER IN BOOLEAN LOGIC

 

 

Boolean logic consists of three logical operators:

OR - AND - NOT

Each operator can be visually described by using Venn diagrams, as shown below.

 

 

OR

Venn diagram for OR

college OR university

 

  • In this search, we will retrieve records in which AT LEAST ONE of the search terms is present. We are searching on the terms college and also university since documents containing either of these words might be relevant.

 

Venn diagram for OR

college OR university OR campus

 

  • The more terms or concepts we combine in a search with OR logic, the more records we will retrieve.

AND
Venn diagram for AND

poverty AND crime

  • In this search, we retrieve records in which BOTH of the search terms are present
  • This is illustrated by the shaded area overlapping the two circles representing all the records that contain both the word "poverty" and the word "crime"
  • Notice how we do not retrieve any records with only "poverty" or only "crime"

Venn diagram for AND

poverty AND crime AND gender

  • The more terms or concepts we combine in a search with AND logic, the fewer records we will retrieve.

 

 

NOT
Venn diagram for NOT

cats NOT dogs

  • In this search, we retrieve records that contain the first term but NOT the second: records with “cats” that do not also mention “dogs”
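If it helps to think of the three operators as set operations, here is a small sketch. The record IDs are made up for illustration; each set stands for the records that contain a given word.

```python
# Hypothetical record IDs for the records that mention each term.
college = {1, 2, 3, 4}
university = {3, 4, 5}
campus = {5, 6}

either = college | university               # OR: at least one term present
all_three = college | university | campus   # more OR terms, more records
both = college & university                 # AND: both terms present
only_college = college - university         # NOT: first term without the second

print(sorted(either), sorted(both), sorted(only_college))
```

OR can only grow the result, AND can only shrink it, and NOT removes whatever overlaps with the excluded term.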

 

 

PHRASE

 

  • Double quotes are used for a phrase:  “cat dog” will retrieve only records that contain the words “cat dog” together, in that order.

 

Reality Check

 

  • So how can we use this as a tool?

 

 

Let's say we are looking for a software engineer with C++ and UNIX (any flavor), but we want to stay away from UNIX systems administrators who call themselves software engineers and can program in C++.

 

We can create a search string using Boolean operators:

 

“software engineer” AND C++ AND (unix OR sun OR solaris OR hpux) NOT administrator
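To see what that string actually asks for, here is a rough sketch that applies the same logic to a piece of resume text. The function names and the sample text are hypothetical, and real engines tokenize text in their own ways, so treat this only as an illustration of the logic.

```python
import re

def words(text):
    """Lower-case the text and split it into tokens (keeps '+' so 'c++' survives)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())

def matches(resume_text):
    """Apply: "software engineer" AND C++ AND (unix OR sun OR solaris OR hpux) NOT administrator."""
    tokens = words(resume_text)
    padded = " " + " ".join(tokens) + " "
    has_title = " software engineer " in padded   # phrase: both words, in order
    has_cpp = "c++" in tokens
    has_unix = any(t in tokens for t in ("unix", "sun", "solaris", "hpux"))
    is_admin = "administrator" in tokens
    return has_title and has_cpp and has_unix and not is_admin

print(matches("Software Engineer with C++ on Solaris"))
print(matches("UNIX Systems Administrator, software engineer, C++"))
```

The first sample passes every test; the second is knocked out by the NOT clause even though it meets all of the other requirements.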

 

You probably already use something like this on your job boards or in your in-house resume tracking system.

 

BACK TO THE SEARCH ENGINES!

 

Let’s explore the top 3 search engines:

            Google - http://www.google.com/

            AllTheWeb - http://www.alltheweb.com/

            Alta Vista - http://www.altavista.com/

 

Google –

The basic Google search has an implied AND built in, so we do not have to use the operator.

The engine supports AND as well as OR and uses a minus sign “-” for NOT.    Here is a quick list of the “Advanced Operators” that Google supports.

 

Advanced Operators: 

cache:, link:, related:, info:, stocks:, site:, allintitle:, intitle:, allinurl:, inurl: and filetype:

 

Let’s look at a few of these.

link: Google will list webpages that have links to the specified webpage.

site: Google will restrict the results to pages within the given domain.

intitle: Google will restrict the results to documents containing that word in the title.

filetype: is worth mentioning too, as many PDF files contain lists that can be very valuable.

 

  • How can we use this?

 

Back to our search for a software engineer.  What if we wanted to find an engineer from a specific company?  We will use www.artesiansolutions.com as our example target company.

 

(link:artesiansolutions.com OR site:artesiansolutions.com) AND (intitle:resume OR intitle:cv) AND “software engineer” AND C++ AND (unix OR sun OR solaris OR hpux) -administrator

 

This should return pages that either link to our target company or sit on the target company's website, that have “resume” or “CV” in the HTML title, and that also meet our technical requirements for a software engineer.
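If you save your strings and want to turn them into links you can click, a sketch like the following may help. It assumes Google's ordinary search address with the query passed in the q parameter; the function name and the query text are just examples.

```python
from urllib.parse import quote_plus

def google_search_url(query):
    """Return a Google search link for the given query string (URL-encoded)."""
    return "https://www.google.com/search?q=" + quote_plus(query)

target = "artesiansolutions.com"  # example target company from the text
query = (
    f"(link:{target} OR site:{target}) (intitle:resume OR intitle:cv) "
    '"software engineer" C++ (unix OR sun OR solaris OR hpux) -administrator'
)
url = google_search_url(query)
print(url)
```

The encoding step matters for strings like C++, because a bare plus sign in a URL is read as a space.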

 

Our friends at Google have decided that this is way too much information for most people and have made a friendly interface for us.  The Boolean search strings still make a good case study, because we can apply the same strategy to other search engines.

 

  • Let's look at the user-friendly version:

 

 

 

Google Advanced Search (form fields):

  • Find results: with all of the words; with the exact phrase; with at least one of the words; without the words
  • Language: return pages written in a selected language
  • File Format: return results of a selected file format
  • Date: return web pages updated in a selected period
  • Occurrences: return results where my terms occur in a selected part of the page
  • Domain: return results from the site or domain (e.g. google.com, .org)
  • SafeSearch: no filtering, or filter using SafeSearch

Page-Specific Search:

  • Similar: find pages similar to the page (e.g. www.google.com/help.html)
  • Links: find pages that link to the page

  • It uses “All of the words” for our AND operator, “exact phrase” for our phrase, “at least one of the words” for our OR operator and “without the words” for our NOT.
  • We also have “File Format”, “Domain” and “Links” for our advanced operators.
  • The “Date” and “Similar” fields could be interesting as well.

 

  • Google also offers separate searches of the “Usenet” newsgroups, of Government (.gov) and Military (.mil) sites, and of specific university sites for alumni or college recruiting.

 

 

 

 

 

AllTheWeb

 

AllTheWeb has many of the same functions as Google.  You can do Boolean searches (note that it uses ANDNOT where Google uses the minus sign).  It also introduces two more operators: the asterisk (*) and the caret (^).  The * is the old DOS (does anybody remember DOS?) “wild card”: it accepts any characters that follow, so Nic* will return Nick, Nicole, Nicholas, Nicky, Nicolas, etc.  The ^ anchors the search to the exact term.
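If the wild card idea is new to you, here is a small sketch using Python's standard fnmatch module, which follows the same DOS-style convention. The list of names is made up for illustration; it is not how AllTheWeb itself is implemented.

```python
from fnmatch import fnmatchcase

names = ["Nick", "Nicole", "Nicholas", "Nicky", "Nicolas", "Nate", "Monica"]

# "Nic*" means: starts with "Nic", followed by any characters (or none).
matched = [name for name in names if fnmatchcase(name, "Nic*")]
print(matched)
```

Note that “Monica” contains the letters “nic” but does not start with them, so the wild card does not pick it up.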

 

We have our advanced operators link:, site:, filetype:, url: and title: (note that Google uses intitle:).

Does AlltheWeb support any advanced operators in a query?

AlltheWeb allows a number of advanced operators to construct a query, or query of phrases. These operators allow you to build phrases, and include or exclude specific terms or phrases in your query. The following operators are supported:

Character(s)   Example                    Description
" "            "pac man"                  Results will contain the phrase "pac man".
( )            (pac man)                  Results will contain either the term "pac" or the term "man". Putting multiple terms in parentheses is equivalent to using the Boolean or between single terms.
+              +pac                       Results must include the term "pac". You can create a more complex query by stringing along a sequence of terms. For example, "+pac +man" will find results that include both the term "pac" and the term "man". Using the + character is equivalent to using the Boolean and between single terms. By default, AlltheWeb assumes the use of the + operator.
-              "dog breeders" -poodles    Results will include pages with information on "dog breeders" excluding the term "poodles". Using the - character is equivalent to using the Boolean andnot between single terms.

Does AlltheWeb support Boolean query language?

The following Boolean operators are supported. However, it is important to note that they will only work from the Advanced Web Search page with "Boolean Query" selected from the "Query Type" pulldown menu. Currently you can use the following keywords to find results that include or exclude terms:

Operator   Example           Description
and        pac and man       All search results will contain both the terms "pac" and "man".
or         pac or man        Results will contain either the term "pac" or the term "man".
andnot     pac andnot man    Results will contain the term "pac" but not the term "man".
rank       pac rank man      Results will contain the term "pac" and preferably include the term "man".

Each of these Boolean operators can be used in conjunction with another and with parentheses, allowing you to make complex Boolean queries. The following are some examples of how these operators can be used together for more query precision.

Complex Boolean Examples

florida and golf andnot "Arnold Palmer"

Results must contain the term "florida" and the term "golf" but not the phrase "Arnold Palmer".

florida and golf andnot "Arnold Palmer" rank LPGA

Results will include "florida" and "golf" without the phrase "Arnold Palmer" preferably including the phrase "LPGA".

Does AlltheWeb support any special keywords in a query?

AlltheWeb supports a number of keyword shortcuts to many advanced search features. You can also use special keywords in your search to find pages within a specified domain, or pages with certain words in their titles or URLs. Currently the following keywords are supported:

Keyword

Example

Description

site:

auctions site:domain.com

Finds pages for "auctions" within "domain.com". This can be helpful in obtaining highly relevant results from a specific site. By default, the site: keyword anchors the value of the keyword to match the end of the hostname, in this case, "domain.com". This query will find pages for "auctions" within "domain.com", but not in "domain.com.sg".

The following examples illustrate how to use the caret (^) operator to anchor the value of the "site:" keyword to match the start, substring or exact hostname.

 

antiques site:^auctions.domain.com*

Using both the caret (^) and asterisk (*) operators anchor the value of the keyword to match the start of the hostname, in this case, "auctions.domain.com". This query will find pages for "antiques" within "auctions.domain.com.au", but not in "www.auctions.domain.com".

promotional offers site:^www.domain.com

The caret (^) operator anchors the value of the keyword to match the exact hostname, in this case, "www.domain.com". This query will find pages for "promotional offers" within "www.domain.com" only. It will not search "www.domain.com.sg" or "domain.com".

"antique furniture auctions" site:domain.com*

The asterisk (*) operator un-anchors the value of the keyword to match a substring in a hostname, in this case, "domain.com". This query will find pages for "antique furniture auctions" within any domain including "domain.com" in the hostname. These could include "www.domain.com", "www.domain.com.au", or "au.auctions.domain.com".

url:

url:football

Finds pages with the specific word or phrase in the URL. This example will find all pages that have the term "football" anywhere in the URL (e.g. http://www.domain.com/football.html would be in the results).

link:

link:www.alltheweb.com

Finds all pages with a link to www.AlltheWeb.com.

title:

title:AlltheWeb

Finds pages that contain the term "AlltheWeb" in the page title (which appears in the title bar of most browsers).

language:

heippa language:fi

Finds pages that contain the term "heippa" restricted to pages written in Finnish. Note: The language that you define must be a subset of the languages you are searching in. For example, if you have your preferred languages set to English and German, and you run a query with language:fi, you won't get any results.

filesize:

landscapes filesize:<1024

Finds pages that contain the term "landscapes" on pages less than 1024 bytes.

landscapes filesize:[1024;2048]

Finds pages that contain the term "landscapes" on pages between 1024 and 2048 bytes.

filetype:

specifications filetype:pdf

Finds PDF files containing the term "specifications".

specifications filetype:msword

Finds Microsoft Word files containing the term "specifications".

specifications filetype:flash

Finds Macromedia Flash files containing the term "specifications".
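The caret and asterisk rules for the site: keyword above are easier to follow with a small sketch. The function below is hypothetical and only mimics the four cases as the AlltheWeb help text describes them; it compares plain strings, so it does not check that a match falls on a dot boundary.

```python
def site_matches(pattern, hostname):
    """Mimic the described site: anchoring for one pattern and one hostname."""
    anchored_start = pattern.startswith("^")
    open_end = pattern.endswith("*")
    value = pattern.lstrip("^").rstrip("*")
    if anchored_start and open_end:
        return hostname.startswith(value)   # ^value* : match start of hostname
    if anchored_start:
        return hostname == value            # ^value  : exact hostname only
    if open_end:
        return value in hostname            # value*  : substring anywhere
    return hostname.endswith(value)         # value   : match end of hostname

print(site_matches("domain.com", "domain.com.sg"))
print(site_matches("^auctions.domain.com*", "auctions.domain.com.au"))
print(site_matches("domain.com*", "au.auctions.domain.com"))
```

The hostnames used here are the same ones that appear in the help text, so you can compare each result with the description.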

 

  • Here is the screen for the Advanced Search:

 

 

First select a type of search

Query language guide

Search for -

Boolean -

Create a boolean query using the operators and, or, andnot and rank.

see examples

 

 

  • More options consist of “Must include” for our AND, “Must not include” for our NOT, “Should include” for our OR, “Domain” for our site:, “File format” for filetype:, date ranges and a nice feature for “geographic region” so we can focus on North America.

 

 


  Use the following to include and exclude additional criteria:

  • Language - find results written in a selected language
  • Word Filters - add a filter
  • Domain Filters - filter results from specific domains (com, gov, dell.com, etc.): include results from, exclude results from
  • Geographic region - only find results from a specific geographic region
  • IP-address Filters - only find results from the following IP-address(es) and/or range(s)
  • Media Types - find results that contain (include or exclude) the following media types: Images, Macromedia Flash, Audio (midi, wav, au), Java applets, Video (mov, qt, avi), JavaScript, RealVideo & RealAudio, VBScript
  • File format - only find results that are in a selected format
  • Date - only find results updated after or before a given date
  • Size - only find results that are of a given size
  • Presentation - display a chosen number of results per page
  • Offensive Content Filter - turn filter on or off

      

 

AltaVista

 

AltaVista has three search options: the standard search, which assumes AND between your words and tries to interpret a question; a “More Precision” search; and the “Advanced Search”.

 

More Precision uses “all these words” for AND, “this exact phrase” for phrases, “any of these words” for OR and “none of these words” for NOT.

 

AltaVista More Precision search (form fields):

  • Search types: Web, Image, MP3/Audio, Video, Directory, News
  • All these words
  • this exact phrase
  • any of these words
  • and none of these words
  • SEARCH: Worldwide or U.S.; RESULTS IN: All languages or English, Spanish

 

Advanced Search:

With this tool we can build queries, build Boolean expressions, use a date range and look for specific file types.

 

AltaVista Advanced Web Search (form fields):

  • Build a query with: all of these words; this exact phrase; any of these words; and none of these words
  • Search with: this boolean expression (use terms such as AND, OR, AND NOT, NEAR)
  • sorted by: pages with these words will be ranked highest
  • SEARCH: Worldwide or U.S.; RESULTS IN: All languages or English, Spanish
  • Date: by timeframe, or by date range (dd/mm/yy)
  • File type
  • Location: by domain (Domain/Country Code Index), or only this host or URL
  • Display: site collapse (on/off); results per page

 

 

 

 

Here are the Advanced Operators used in AltaVista:

Note that AltaVista uses AND NOT and introduces NEAR.  We also have our standards: domain:, host:, link:, url: and title:

 

From AltaVista Help > Search > Special search terms:

 


You can use these terms for both basic and advanced Web searches. For advanced searches, type these into the free-form Boolean box.

 

AND

Finds documents containing all of the specified words or phrases. Peanut AND butter finds documents with both the word peanut and the word butter.

OR

Finds documents containing at least one of the specified words or phrases. Peanut OR butter finds documents containing either peanut or butter. The found documents could contain both items, but not necessarily.

AND NOT

Excludes documents containing the specified word or phrase. Peanut AND NOT butter finds documents with peanut but not containing butter. NOT must be used with another operator, like AND. AltaVista does not accept 'peanut NOT butter'; instead, specify peanut AND NOT butter.

NEAR

Finds documents containing both specified words or phrases within 10 words of each other. Peanut NEAR butter would find documents with peanut butter, but probably not any other kind of butter.

*

The asterisk is a wildcard; any letters can take the place of the asterisk. Bass* would find documents with bass, basset and bassinet.
You must type at least three letters before the *.
You can also place the * in the middle of a word. This is useful when you're unsure about spelling.
Colo*r would find documents that contain color and colour.

(  )

Use parentheses to group complex Boolean phrases. For example, (peanut AND butter) AND (jelly OR jam) finds documents with the words 'peanut butter and jelly' or 'peanut butter and jam' or both.

anchor:text

Finds pages that contain the specified word or phrase in the text of a hyperlink. anchor:job +programming would find pages with job in a link and with the word programming in the content of the page.

Do not put a space before or after the colon. You must repeat the keyword to search for more than one word or phrase; for example, anchor:job OR anchor:career to find pages with anchors containing either the word job or the word career.

applet:class

Finds pages that contain a specified Java applet. Use applet:morph to find pages using applets called morph.

object:class

Finds pages that contain a specified object created by another program (eg. a Flash object). Use object:money to find pages using objects called money.

domain:domainname

Finds pages within the specified domain. Use domain:uk to find pages from the United Kingdom, or use domain:com to find pages from commercial sites.

host:hostname

Finds pages on a specific computer. The search host:www.shopping.com would find pages on the Shopping.com computer, and host:dilbert.unitedmedia.com would find pages on the computer called dilbert at unitedmedia.com.

image:filename

Finds pages with images having a specific filename. Use image:beaches to find pages with images called beaches.

like:URLtext

Finds pages similar to or related to the specified URL. For example, like:www.abebooks.com finds Web sites that sell used and rare books, similar to the www.abebooks site. like:sfpl.lib.ca.us/ finds public and university library sites. like:http://www.indiaxs.com/ finds sites about culture on the Indian subcontinent.

link:URLtext

Finds pages with a link to a page with the specified URL text. Use link:www.myway.com to find all pages linking to myway.com.

text:text

Finds pages that contain the specified text in any part of the page other than an image tag, link, or URL. The search text:graduation would find all pages with the term graduation in them.

title:text

Finds pages that contain the specified word or phrase in the page title (which appears in the title bar of most browsers). The search title:sunset would find pages with sunset in the title.

url:text

Finds pages with a specific word or phrase in the URL. Use url:garden to find all pages on all servers that have the word garden anywhere in the host name, path, or filename.
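NEAR is the one operator in the list above that the other engines did not have, so here is a rough sketch of the idea. It assumes “within 10 words” means the two word positions differ by at most 10; AltaVista's exact counting rule is not stated in the help text, and the function name is made up.

```python
import re

def near(text, first, second, max_distance=10):
    """Return True if the two words occur within max_distance words of each other."""
    tokens = re.findall(r"\w+", text.lower())
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )

print(near("Peanut butter and jelly sandwich", "peanut", "butter"))
print(near("peanut " + "filler " * 20 + "butter", "peanut", "butter"))
```

For recruiting, NEAR can be handy for tying a skill to a title, since both words have to appear close together rather than just anywhere on the page.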


 

X-Ray: Searching within the URL of a website is sometimes referred to as “X-raying”. To X-ray a site means to search for the pages you want that come ONLY from that site. You can X-ray any website for anything. Sites that many people belong to are especially good to search, such as companies, virtual communities, and ISPs (America Online, for example, is both an ISP and a virtual community).  For this we would use the site: or url: advanced operators.

Peeling Back URLs: Don't get so excited over finding a great candidate online that you forget the page you're on may be only one level removed from all of that candidate's peers, if you strip the URL down to the higher-level folder. Other directory names that typically imply more candidates are /member(s) or /people.  For instance, what if our search came up with a resume link like this: http://www.artesiansolutions.com/user/scherry.html?  We can “peel back” the “scherry.html” in the address bar of the browser and possibly find a list of his peers!
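Peeling back is easy to do by hand, but if you want to list every level to try, a sketch like this works. The function name is hypothetical; it only builds the candidate addresses and does not visit them.

```python
from urllib.parse import urlsplit, urlunsplit

def peel_back(url):
    """Return the parent URLs of a page, from the nearest folder up to the site root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    while segments:
        segments.pop()                      # drop the last file or folder name
        path = "/" + "/".join(segments)
        if segments:
            path += "/"                     # keep the trailing slash on folders
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents

for parent in peel_back("http://www.artesiansolutions.com/user/scherry.html"):
    print(parent)
```

For the example link this gives the /user/ folder first and then the site's home page, which is the order you would try them in.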

Peeling back may also work well when a link in your search results leads to a page error. Sometimes the page is password-protected, sometimes it's gone entirely. But if you go one level higher, you may be at visible pages of people again.  Along the same lines, some search engines keep a “cache”: saved copies of pages that may no longer be online at the original URL.

Flip: Searching for the pages that link to a site is called searching a link, or “flipping”. In the presentation, I demonstrated how you can find out which pages on the Internet link to a particular domain by using the link: advanced operator: link:artesiansolutions.com

You can substitute any domain name in the above.

Having done our industry research, we are ready to source targeted companies for useful information using the familiar Flip, X-ray, and Peel-Back techniques, as well as some inferential logic... 

When we "flip" a target company for resumes/CVs, there are 3 things we can expect to find: 

1. Resumes of employees of the target company

2. Resumes of individuals who use the technology sold by the target company

3. Links to associations and industry hubs that involve the target company's product 

In most cases, the odds are not in your favor to find the right candidate by "flipping" the target company…the value of the effort lies in discovering the industry hubs, mailing lists, discussion groups, conferences, and associations where you will discover your candidate. This information is hidden in the resumes/CVs the initial "flip" reveals. 

You don’t have to spend a lot of money on an AIRS class.  Just play around with different combinations and strings and you will get results.  If you do have some money, an excellent tool for finding resumes using these techniques, without having to put a lot of time into creating the strings, is Talent Hook (www.talenthook.com).

 

Play around with these searches and engines.  Save your search strings to a Word document so that you can cut and paste them into the different engines and tweak them for the nuances of each search engine.  In some cases it will take a few tries to narrow down the results to what you want.  Substitute “attendees” or “list” or “directory” for resume while searching domains to find those elusive passive candidates.
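One of the nuances to tweak is how each engine spells NOT. As a sketch, assuming only the three spellings covered in the sections above (minus sign for Google, andnot for AllTheWeb, AND NOT for AltaVista), you could keep one base string and add the exclusion per engine. The function name and the base string are just examples.

```python
# How each engine writes "exclude this term", per the sections above.
NOT_SYNTAX = {
    "google": "-{term}",
    "alltheweb": "andnot {term}",
    "altavista": "AND NOT {term}",
}

def with_exclusion(base_query, excluded_term, engine):
    """Append an exclusion to base_query using the engine's own NOT spelling."""
    return base_query + " " + NOT_SYNTAX[engine].format(term=excluded_term)

base = '"software engineer"'
for engine in NOT_SYNTAX:
    print(engine, "->", with_exclusion(base, "administrator", engine))
```

The rest of the string may still need adjusting by hand, for example title: versus intitle:.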

 

Good luck and have FUN with it.

Nick Patti

recruitersd@cox.net

 

*P.S. Thanks for attending the presentation on April 17. Some of the feedback indicated that it would be nice to see more examples so here they are:

This is a good one to start with as a stub for San Diego recruiting.
(title:resume OR title:CV OR title:bio OR title:homepage OR url:resume OR resume) AND NOT (job OR "career opportunity" OR "equal opportunity employer" OR "employment at" OR EOE OR "employment opportunity" OR opening OR "submit resume" OR "your resume" OR "sample resume" OR "career development" OR classified OR book OR books) AND (619 OR 858 OR 760 OR 909) AND (insert skill set here)


Here is one I used at BAE to find a data modeler:
(title:resume OR title:CV OR title:bio OR title:homepage OR url:resume OR resume) AND NOT (job OR "career opportunity" OR "equal opportunity employer" OR "employment at" OR EOE OR "employment opportunity" OR opening OR "submit resume" OR "your resume" OR "sample resume" OR "career development" OR classified OR book OR books) AND (link:www.raytheon.com OR link:www.lmco.com OR link:www.bah.com OR link:www.boozallen.com OR link:www.saic.com OR link:www.trw.com OR link:www.litton.com OR link:www.orincon.com OR link:www.boeing.com OR link:www.northgrum) AND ("data model*" OR "modeling and simulation") AND (hyperformix OR "ses workbench")
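Strings this long are easier to maintain if you build them from parts. Here is a minimal sketch of that idea; the helper name is made up, and the lists are shortened versions of the ones in the example above, so swap in your own companies and skills.

```python
def any_of(terms):
    """Join terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"

title_clause = any_of(["title:resume", "title:CV", "title:bio", "url:resume", "resume"])
noise_clause = any_of(["job", '"career opportunity"', '"submit resume"', '"sample resume"'])
company_clause = any_of("link:" + domain for domain in ["www.raytheon.com", "www.boeing.com"])
skill_clause = any_of(['"data model*"', '"modeling and simulation"'])

query = f"{title_clause} AND NOT {noise_clause} AND {company_clause} AND {skill_clause}"
print(query)
```

Keeping the title and noise clauses as a reusable stub means you only have to change the company and skill lists for each new search.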


References:

http://www.searchengineshowdown.com/stats/size.shtml

http://www.searchengineshowdown.com/stats/sizeest.shtml

http://www.searchengineshowdown.com/stats/overlap.shtml

http://library.albany.edu/internet/boolean.html

http://www.google.com/

http://www.google.com/help/features.html#sitesearch

http://www.google.com/help/refinesearch.html

http://www.google.com/advanced_search

http://www.alltheweb.com/

http://www.alltheweb.com/help/faqs/query_language.html

http://www.alltheweb.com/help/faqs/query_language.html#2

http://www.alltheweb.com/advanced?cs=utf-8

http://www.altavista.com/

http://www.altavista.com/?qbmode=

http://www.altavista.com/web/adv

http://www.altavista.com/help/adv_search/syntax

 
