Quality of Retrieval by Major Search Engines

Assignment 1 in Internet Search Techniques and Business Intelligence

Task

The task is to compare the performance of Internet search engines by running search queries and measuring the precision and recall of their search results.

You work in a group of 4 people. Your work consists of the following steps:

  1. Select 4 information needs that will serve as your test queries.
  2. Select 3 search engines to compare.
  3. To measure recall, you will need pooling. The pooling process is described further down.
  4. Run your search queries, and measure precision and recall, as described below.
  5. For each search engine, interpolate its 4 individual precision-recall curves (one per information need) and average them into one representative precision-recall curve for that engine.
  6. Compare the search engines.
  7. Write and submit the report.

Information Needs

Here are sample information needs. Feel free to choose any other information need.

Search Engines

Feel free to choose any general-purpose search engine from the list of search engines.

Pooling

The purpose of pooling is to identify the collection of relevant documents for each information need (you cannot handle the total collection of 8 000 000 000 documents).

Note that you do pooling for each of the 4 information needs separately, which gives 4 pools, one per information need.
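A common way to build a pool is to take the union of the top results returned by every engine for one information need, remove duplicates, and then judge each pooled document for relevance by hand. The sketch below shows that idea for one information need. The cutoff `K`, the crude URL normalization, and the helper names `normalize`, `build_pool` and `relevant_set` are hypothetical choices for illustration, not part of the assignment.

```python
# Hypothetical sketch: build the pool for ONE information need.
# Assumptions: each engine's results are a ranked list of URLs, only the
# top K results per engine are pooled (K is an illustrative choice), and
# relevance of each pooled document is judged manually afterwards.

K = 10  # hypothetical cutoff; choose your own


def normalize(url: str) -> str:
    """Crude URL normalization so the same page from two engines counts once."""
    url = url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def build_pool(results_by_engine: dict[str, list[str]], k: int = K) -> list[str]:
    """Union of every engine's top-k results, duplicates removed, first-seen order kept."""
    pool: list[str] = []
    seen: set[str] = set()
    for ranked in results_by_engine.values():
        for url in ranked[:k]:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                pool.append(key)
    return pool


def relevant_set(pool: list[str], judgments: dict[str, bool]) -> set[str]:
    """Relevant documents for this information need: pooled documents judged relevant."""
    return {doc for doc in pool if judgments.get(doc, False)}


# Example with two made-up engines and k=2:
results = {
    "engine_a": ["https://www.example.org/a", "http://example.org/b/"],
    "engine_b": ["example.org/b", "https://example.org/c"],
}
pool = build_pool(results, k=2)
# pool == ["example.org/a", "example.org/b", "example.org/c"]
```

The size of `relevant_set(...)` for an information need is the number of relevant documents in its pool, which is the denominator of recall for that query.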

Measuring precision and recall

Consider your 3 selected search engines.

Now you have average interpolated precision values for the 3 selected search engines. In one picture, draw the 3 average interpolated precision-recall curves, one for each search engine, at the standard recall values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.
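The computation behind these curves can be sketched for one search engine as follows: measure raw precision and recall after each retrieved document, interpolate precision at the standard recall values, and average the interpolated values over the 4 information needs. This is a minimal sketch, not the course's reference solution; it assumes the usual definition of interpolated precision (the maximum precision at any rank whose recall is at least the given level) and assumes that a recall level the engine never reaches gets precision 0.0. The function names are hypothetical.

```python
# Hypothetical sketch, not the course's reference solution.
# Assumptions: `ranked` is one engine's result list for one query, normalized
# the same way as the pool; `relevant` is that query's set of relevant
# documents from the pool; interpolated precision at recall level r is the
# maximum precision at any rank whose recall is >= r.

STANDARD_RECALL = [i / 10 for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0


def raw_precision_recall(ranked: list[str], relevant: set[str]) -> list[tuple[float, float]]:
    """Return (recall, precision) measured after each retrieved document."""
    if not relevant:
        raise ValueError("recall is undefined when the pool has no relevant documents")
    points = []
    hits = 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    return points


def interpolate(points: list[tuple[float, float]], levels: list[float] = STANDARD_RECALL) -> list[float]:
    """Interpolated precision at each recall level.

    A level the engine never reaches gets 0.0 (an assumed convention).
    """
    return [max((p for r, p in points if r >= level), default=0.0) for level in levels]


def average_curve(curves: list[list[float]]) -> list[float]:
    """Average several interpolated curves (e.g. the 4 queries of one engine) level by level."""
    return [sum(values) / len(values) for values in zip(*curves)]


# Example: 4 relevant documents in the pool, 3 of them retrieved in the top 5.
points = raw_precision_recall(["a", "x", "b", "y", "c"], {"a", "b", "c", "d"})
curve = interpolate(points)
# curve == [1.0, 1.0, 2/3, 2/3, 2/3, 0.6, 0.6, 0.0, 0.0, 0.0]
```

Calling `interpolate` for each of the 4 queries of one engine and passing the 4 resulting lists to `average_curve` gives the 10 points of that engine's average curve. Taking the maximum over all ranks with recall at or above the level makes each interpolated curve non-increasing, which removes the saw-tooth shape of the raw curve.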

Compare the search engines.

Report

Write down each query for each search engine. If you used the same query for all engines, write it only once.

For each information need, state how many relevant documents the pool contains.

Present 3 precision-recall pictures, one for each selected search engine. In each precision-recall picture show

In one final picture, show the 3 average interpolated precision-recall curves (one curve per search engine) for your selected search engines. Which engine do you think is best, and why?

If you have trouble drawing precision-recall curves, here is an Excel example ("Raw P-R", "Interpolated P"); Google Sheets most likely works the same way.

To make sure you calculated the right quantities, write down the precision and recall formulas you used. Describe how you interpolated precision values at the standard recall values.
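For reference, the standard textbook definitions are given below; whether the course expects exactly this form is an assumption. Here "relevant" means the relevant documents in the pool for one information need, "retrieved" means the results inspected so far, P(r') is the raw precision at a point with recall r', and N = 4 is the number of information needs.

```latex
\text{Precision} = \frac{|\,\text{relevant} \cap \text{retrieved}\,|}{|\,\text{retrieved}\,|}
\qquad
\text{Recall} = \frac{|\,\text{relevant} \cap \text{retrieved}\,|}{|\,\text{relevant}\,|}

P_{\text{interp}}(r) = \max_{r' \ge r} P(r')

\bar{P}(r_j) = \frac{1}{N} \sum_{i=1}^{N} P_{\text{interp},i}(r_j),
\qquad r_j \in \{0.1, 0.2, \dots, 1.0\}
```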

Don't forget your name on the report.


Eriks Sneiders