Quality of Retrieval by Major Search Engines

Assignment 1 in Internet Search Techniques and Business Intelligence

Task

The task is to compare performance of Internet search engines by running search queries, and measuring precision and recall of their search results.

You work in a group of 4 persons. Consider the following steps of your work:

Select 4 information needs that will serve as your test queries.
Select 3 search engines to compare.
In order to measure recall, you will need pooling. Further down follows a description of your pooling process.
Run your search queries, and measure precision and recall, as described below.
Interpolate the 4 individual precision-recall curves of each search engine and average them to obtain one representative precision-recall curve per search engine.
Compare the search engines.
Write and submit the report.

Information Needs

Here are sample information needs. Feel free to choose any other information need.

You are looking for information on the history of La Liga.
You want to explain to a ten-year-old child who Harry Potter is.
You want to buy a new smartphone but cannot decide on a model.
You are looking for cheap accommodation in Scotland.
You want to find out how hashing and hash tables work.
You are interested in the latest activities of Greenpeace (visiting www.greenpeace.org is not enough).
What is the difference between iPhone and Samsung Galaxy?
What is the relation between a person's gender and income in Sweden?
Being tired of spam mail, you finally decide to find out what the catch with cheap mortgage loans is.
You are burned out and you are looking for psychological advice for highly stressed people.
Any news about the war in Syria?
What is the population of Sierra Leone?
You are interested in protection of animal rights.

Search Engines

Feel free to choose a general purpose search engine from the list of search engines.

Pooling

The purpose of pooling is to identify your collection of relevant documents for each information need (you can't handle a collection of hundreds of billions of documents).

For each separate information need that you have selected:
- Do "magic magnet" in order to create a pool of links:
  - For each search engine that you have selected:
    - Formulate the information need as a query q and run it.
      Formulate the query in the way you think works best for the given search engine. Use quotation marks, the plus sign '+', or Boolean operators if you need them. Some search engines offer an advanced search option. If you decide to use advanced search, stick to it for all 4 queries you submit to that search engine.
    - Take the top 30 links of the "natural" search result and add them to a pool of links; ignore paid links (advertising). We declare that only the top 30 retrieved links may refer to relevant documents; links ranked 31st and lower are declared non-relevant.
- Reduce the size of the pool. You have 3 · 30 links in the pool, but some links overlap because the same link is retrieved by several search engines or two links point to the same document. Join overlapping links, leave only one representative link per document. Remove dead links.
  Now you have between 30 and 90 links in the pool; 30 if all your search engines have the same links among top 30, 90 if each search engine has a unique set of top links.
- Evaluate relevance of the links in the pool to the information need. Apply subjective judgement and two values: relevant / non-relevant.
  See examples of pooling.
- Pooling for this information need is finished. Now you have your collection of relevant links for this information need.

Please observe that you do pooling for each of the 4 information needs separately and obtain 4 pools, one pool per information need.
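
The merging step of pooling can be sketched in Python as follows. This is only a minimal illustration, not a required method: the function names `normalize` and `build_pool`, the URL normalization rules, and the example links are all assumptions. Simple normalization catches only trivially different links (scheme, "www.", trailing slash); two different URLs that point to the same document, dead links, and relevance still have to be checked by hand.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Hypothetical normalization: ignore scheme, 'www.', fragment and trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def build_pool(results_by_engine, depth=30):
    """Merge the top `depth` links of every engine into one de-duplicated pool."""
    pool = {}  # normalized key -> one representative link
    for links in results_by_engine.values():
        for link in links[:depth]:
            # Keep the first link seen for each normalized key.
            pool.setdefault(normalize(link), link)
    return pool


# Made-up example with two engines and two links each.
results = {
    "engine_a": ["http://www.example.org/a/", "http://example.org/b"],
    "engine_b": ["https://example.org/a", "http://example.org/c"],
}
pool = build_pool(results)
print(len(pool), "links in the pool")
```

Run it once per information need, since each information need gets its own pool.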

Measuring precision and recall

Consider your 3 selected search engines.

For each separate search engine that you have selected:
- For each query q you have run:
  - Measure precision and recall for the top 5, 10, 15, 20, 25, 30 documents (meaning documents 1-5, 1-10, 1-15 ..., not 1-5, 6-10, 11-15 ...). When measuring recall for q, the total number of relevant documents for q is the number of links in the pool for q that you judged relevant.
    Observe how precision and recall change as you consider more documents. Draw a precision-recall curve. Please note that precision is not defined if recall is 0. See an example of precision-recall calculations and curves.
  - Calculate the interpolated precision values for this curve at the standard recall values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 (it may happen, however, that your measured recall values will end much sooner, e.g. at 0.5). Write them down in a table; you get one table per search engine. See an example of interpolation.
    Typical mistake: Please observe that an interpolated precision value is the highest, not closest, measured precision value "to the right" (which includes your exact measured precision if your measured recall value happens to be equal to the standard recall value being considered). A sequence of interpolated precision values is either flat or falling, never rising.
- After you have acquired 4 precision-recall curves (one for each query, the same search engine) and their interpolated precision values, calculate the average interpolated precision values for all 4 curves together at the standard recall values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0. Write the average interpolated precision values in the same table where all the interpolated precision values for this search engine reside.
  When you calculate the average interpolated precision value, use 0 if you have a missing interpolated precision value.
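
The measuring and interpolation steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the made-up links `r1` ... `r30` and `x1` ... `x6`, and the use of `None` for a missing interpolated value are illustrative choices, and the engine is assumed to return at least 30 links. Here `relevant` stands for the set of pool links judged relevant for the query.

```python
CUTOFFS = (5, 10, 15, 20, 25, 30)
STANDARD_RECALL = [i / 10 for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0


def precision_recall_at_cutoffs(ranked_links, relevant, cutoffs=CUTOFFS):
    """Return (recall, precision) for the top-k links at every cutoff k."""
    points = []
    for k in cutoffs:
        hits = sum(1 for link in ranked_links[:k] if link in relevant)
        precision = hits / k  # assumes the engine returned at least k links
        recall = hits / len(relevant)
        points.append((recall, precision))
    return points


def interpolate(points, levels=STANDARD_RECALL):
    """Interpolated precision at a level is the highest measured precision
    at any measured recall >= that level; None if no such point exists."""
    result = {}
    for level in levels:
        # Small tolerance guards against floating-point rounding.
        candidates = [p for r, p in points if r >= level - 1e-9]
        result[level] = max(candidates) if candidates else None
    return result


# Made-up example: 10 relevant links in the pool, 4 of them found by this engine.
ranked = [f"r{i}" for i in range(1, 31)]
relevant = {"r1", "r3", "r6", "r12", "x1", "x2", "x3", "x4", "x5", "x6"}
points = precision_recall_at_cutoffs(ranked, relevant)
interp = interpolate(points)
# Measured recall never exceeds 0.4 here, so the levels 0.5 to 1.0 stay None.
for level in STANDARD_RECALL:
    print(level, interp[level])
```

Taking the maximum over all points "to the right" is what keeps the interpolated sequence flat or falling, never rising.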

Now you have average interpolated precision values for the 3 selected search engines. Draw 3 average interpolated precision-recall curves, one for each search engine, at the standard recall values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 in one picture.
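
The averaging rule (a missing interpolated precision value counts as 0) can be sketched in Python as follows. The function name `average_interpolated`, the representation of a curve as a dictionary from recall level to precision, and all numbers are made up for illustration; only three recall levels are shown to keep the example short.

```python
def average_interpolated(curves, levels):
    """Average the interpolated precision of several queries per recall level.

    Each curve maps a recall level to an interpolated precision. A level that
    is absent or None (recall never reached) contributes 0 to the average.
    """
    averages = {}
    for level in levels:
        values = [curve.get(level) or 0.0 for curve in curves]
        averages[level] = sum(values) / len(curves)
    return averages


# Made-up interpolated values for 4 queries on one search engine.
levels = [0.1, 0.2, 0.3]
curves = [
    {0.1: 0.8, 0.2: 0.6, 0.3: None},
    {0.1: 0.4, 0.2: 0.4, 0.3: 0.2},
    {0.1: 1.0, 0.2: None, 0.3: None},
    {0.1: 0.6, 0.2: 0.2},
]
avg = average_interpolated(curves, levels)
for level in levels:
    print(level, round(avg[level], 3))
```

Note that the sum is always divided by the number of queries (4 in the assignment), not by the number of queries that happen to have a value at that level.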

Compare the search engines.

Report

Write down each query for each search engine. Write it once if you didn’t change it.

For each information need, tell how many relevant documents you got in the pool.

Present 3 precision-recall pictures, one for each selected search engine. In each precision-recall picture show

the 4 measured precision-recall curves, one per information need, and
a table as required above for the interpolated precision values at the standard recall values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.

In one final picture, show the 3 average interpolated precision-recall curves (one curve per search engine) for your selected search engines. State which engine you think is best, and explain why.

In case you have trouble drawing precision-recall curves, here is an Excel example ("Raw P-R", "Interpolated P"); Google Sheets most probably works the same way.

In order to be sure you calculated the right thing, write the precision and recall formulas you used. Describe how you did interpolation of precision values at the standard recall values.

Don't forget your name on the report.

Eriks Sneiders