a) DSV football
b) Nikos DSV
See an example of calculating similarity between a document and a query. In your report, you are not required to repeat all the details that you find in the example. Note that because term frequencies are used as the only term weights, it is possible to convert the formula to something very simple which makes the similarity calculations trivial.
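As a rough illustration of this kind of calculation, the sketch below assumes the similarity is the cosine measure over raw term-frequency vectors. The formula in the referenced example may differ, and the function name `cosine_tf` and the sample document text are made up for illustration.

```python
from collections import Counter
from math import sqrt


def cosine_tf(query: str, document: str) -> float:
    """Cosine similarity with raw term frequencies as the only weights (assumed measure)."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    # Only terms shared by query and document contribute to the dot product.
    dot = sum(q[t] * d[t] for t in q)
    norm_q = sqrt(sum(v * v for v in q.values()))
    norm_d = sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


# Hypothetical document text, not taken from the assignment collection.
doc = "DSV football team plays football"
print(f"{cosine_tf('DSV football', doc):.5f}")
```

When every query term occurs once, the dot product is simply the sum of the document's frequencies for the query terms, which is presumably the kind of simplification the note above refers to.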
Use any means of calculation you find appropriate. Allow 5 digits after the decimal point. Consider examples:
Verify your page rank calculation. Remove the external page, i.e. set 0.ddmm = 0, and recalculate the page rank values. The average page rank value over the five documents A through E should be 0.65301. Please note that the average is not 1 because Document E has no outgoing links and therefore does not return its page rank to the collection.
Typical mistakes. Be careful when you count the number of outgoing links. Document A has 3 outgoing links, Document B has 2, Document C has 2, Document D has 4, and Document E has no outgoing links.
If you do the iterations in Excel, be careful: check several times that every cell still contains the right formula.
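One way to guard against both mistakes is to script the iteration and check the out-degree counts in code. The sketch below is only an illustration under stated assumptions: it uses the formula PR(p) = (1 - d) + d * sum(PR(q) / out(q)) with damping factor d = 0.85, and the link targets in `links` are hypothetical. Only the out-degree counts (3, 2, 2, 4, 0) come from this handout, so the resulting values will not reproduce the assignment's figures.

```python
# Hypothetical link targets: only the out-degree counts (A=3, B=2, C=2, D=4, E=0)
# come from the handout; which pages each document links to is made up.
links = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A", "D"],
    "D": ["A", "B", "C", "E"],
    "E": [],
}


def page_rank(links, d=0.85, iterations=100):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / out(q)) over pages q linking to p.

    The damping factor d = 0.85 is an assumed value.
    """
    pr = {p: 1.0 for p in links}
    for _ in range(iterations):
        new = {p: 1.0 - d for p in links}
        for q, targets in links.items():
            for t in targets:
                # Each page splits its current rank evenly over its outgoing links.
                new[t] += d * pr[q] / len(targets)
        pr = new
    return pr


out_degree = {p: len(targets) for p, targets in links.items()}
ranks = page_rank(links)
average = sum(ranks.values()) / len(ranks)

print(out_degree)
for p in sorted(ranks, key=ranks.get, reverse=True):
    print(p, f"{ranks[p]:.5f}")
print("average", f"{average:.5f}")
```

With this formula the average stays at 1 only if every document passes its rank on. A document without outgoing links, such as E, makes the average drop below 1.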
Re-calculate the page rank of each document, re-order the documents:
To show how the relevance of the documents to the query changes under the different formulas, fill in the table below. Remember to re-order the documents for each similarity metric, so that each row shows the document that holds that placement in the list.
Placement in the list | Initial Page Rank: Doc id | Initial Page Rank: PR value | New Page Rank: Doc id | New Page Rank: PR value | sim(q, d): Doc id | sim(q, d): sim value | SIM1(q, d): Doc id | SIM1(q, d): SIM value | SIM2(q, d): Doc id | SIM2(q, d): SIM value |
---|---|---|---|---|---|---|---|---|---|---|
1 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
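Filling in one pair of columns amounts to sorting the documents by the metric's value, highest first, and reporting each value with 5 decimals. A minimal sketch follows, assuming ties are broken by document id. The helper `rank_documents` and the scores are placeholders for illustration, not assignment results.

```python
def rank_documents(scores):
    """Return (doc id, score rounded to 5 decimals) pairs, highest score first.

    Ties are broken alphabetically by doc id (an assumed convention).
    """
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(doc, round(value, 5)) for doc, value in ordered]


# Placeholder scores for illustration only; these are not assignment results.
example_scores = {"A": 0.123456, "B": 0.5, "C": 0.25, "D": 0.5, "E": 0.0}

for place, (doc, value) in enumerate(rank_documents(example_scores), start=1):
    print(place, doc, f"{value:.5f}")
```

Running the same helper once per metric gives the Doc id and value columns for that metric.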
Don't forget to write your name on the report. Attach the Excel sheets and program code if you have any. This makes error detection easier.