Current Research Topics, March 1996

Introduction to filtering

Streams of Electronic Messages: News, Email
- Mandatory (must read, internal priority)
- Educational (should read, internal interestingness)
- Entertainment (read when possible, internal interestingness)
- NOISE: thread mutation, flame, short comments

No time to read it all.
Classification and selection take time.
Short messages: must read most of the message to know its classification.

HUMAN FILTERING
  cognitive  Subject:  keywords, phrases, idioms
  social     Sender:   commitment, quality, content
  social     To: Cc:   personal, group, mailing-list
  economic   Length:   content, reading time, transfer time

  Combinations:
    Sender & Length  -> contents
    Subject & Length -> contents (bird.jpg, 122 bytes)

Email is classified twice:
- when received, for priority
- when stored, for retrieval

FK's matrix
              Work   Private
  Personal     1       2
  Group        3       4

  Subdivisions: orders, questions, requests, information, advertising.
  Orders, Information and Advertising mostly statements.
  Questions and Requests mostly questions.

IDIOMS support string matching
  Re:  followup marker (references field)
  WB   wants to buy
  FS   for sale
  RFI  Request For Input
  CFP  Call for papers

  Subject abuse: ======**** IMPORTANT ****======

             MESSAGE FLOW   INFORMATION RETRIEVAL
  purpose    serendipity    search
  query      good-?-bad     information
  source     upstream       downstream (repository)
  novelty    recent         archived
  noise      high noise     low noise
  structure  threads        articles

Alta Vista Search Engine

Complexity of human classification
- knowledge, stress, interests, priorities

A survey of filtering systems SLIDES
GroupLens MIT, Univ. of Minnesota
The PEFNA system
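
The note that idioms support string matching, and the Subject & Length combination (bird.jpg, 122 bytes), can be illustrated with a minimal Python sketch. It is not taken from any of the systems named above. The regular expressions, the function names classify_subject and length_contradicts_subject, and the 1024-byte threshold are all assumptions chosen for illustration.

```python
import re

# Subject-line idioms from the notes. The patterns themselves are
# illustrative assumptions, not part of the original material.
IDIOMS = [
    ("followup", re.compile(r"^\s*Re:", re.IGNORECASE)),
    ("wants to buy", re.compile(r"\bWB\b")),
    ("for sale", re.compile(r"\bFS\b")),
    ("request for input", re.compile(r"\bRFI\b")),
    ("call for papers", re.compile(r"\bCFP\b")),
]

# Hypothetical heuristic for "subject abuse": a run of three or more
# decorative characters such as ======**** IMPORTANT ****======
ABUSE = re.compile(r"[=*!]{3,}")


def classify_subject(subject):
    """Return the labels of all idioms found in a Subject: line."""
    labels = [name for name, pattern in IDIOMS if pattern.search(subject)]
    if ABUSE.search(subject):
        labels.append("subject abuse")
    return labels


def length_contradicts_subject(subject, length_bytes, min_image_bytes=1024):
    """Subject & Length combination: a subject that names an image file
    but a body too short to contain one. The threshold is an assumption."""
    names_image = re.search(r"\.(jpg|jpeg|gif)\b", subject, re.IGNORECASE) is not None
    return names_image and length_bytes < min_image_bytes


print(classify_subject("Re: CFP filtering workshop"))
print(classify_subject("======**** IMPORTANT ****======"))
print(length_contradicts_subject("bird.jpg", 122))
```

The first call reports both the followup marker and the call-for-papers idiom, the second flags subject abuse, and the third flags the bird.jpg case, where a 122-byte message cannot hold the picture its subject promises. Plain string matching of this kind only covers the cognitive Subject: cue; the social and economic cues (Sender, To/Cc, Length) would need separate rules.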