Sixth ScandSum Minutes Bergen 7 October 2003

Program and Participants of Sixth ScandSum meeting 7 October 2003,
UiB, Bergen, Norway

Participants
Hercules Dalianis KTH
Martin Hassel KTH
Koenraad de Smedt Univ of Bergen
Anja Liseth Univ of Bergen
Till Cristopher Lech Cognit, Norway
Bart Jongejan CST, Denmark

Presentations
En evaluering av NorSum, Evaluation of NorSum - Anja Liseth and Koenraad de Smedt Univ of Bergen
Anja is carrying out the research in cooperation with Bergens Tidende (BT). Christian Lepsöe and Svend Solheim are the contact persons at BT.
Articles from NTB (Norges Telegrambyrå) are cut from the bottom, which gives few real summaries.

Anja's Bergen Extract tool preprocesses the texts, and the sentences are then selected manually.
The database holds 30 articles, and every sentence in each article has a unique ID.
Do the informants work differently from the editors?
Should one adapt NorSum to the editing work?

Bredt-Projektet by Anja Liseth
Behandling av referensielle enheter i diskursteori, Treatment of referential units in discourse theory
Marking of discourse chains (anaphora resolution) in a tagged Norwegian corpus that contains fiction texts.

The difference between news texts and fiction texts is that fiction has long chains of pronouns, whereas news uses referential noun phrases. Pronouns obscure keywords by suppressing their frequencies; pronoun resolution will solve this problem.
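
The frequency effect described above can be illustrated with a minimal sketch. The token list, its antecedent annotations, and the function name are hypothetical and only serve the illustration; they are not taken from the Bredt corpus or from NorSum.

```python
from collections import Counter

# Hypothetical toy input: each token is paired with the antecedent it refers
# to, or None when the token is not a resolved pronoun.
tokens = [
    ("minister", None), ("said", None), ("he", "minister"),
    ("would", None), ("resign", None), ("he", "minister"), ("left", None),
]

def keyword_counts(tagged_tokens, resolve_pronouns):
    """Count word frequencies, optionally replacing pronouns by antecedents."""
    words = [
        antecedent if (resolve_pronouns and antecedent) else word
        for word, antecedent in tagged_tokens
    ]
    return Counter(words)

before = keyword_counts(tokens, resolve_pronouns=False)
after = keyword_counts(tokens, resolve_pronouns=True)
print(before["minister"], after["minister"])
```

Without resolution the keyword is counted only where it appears literally; with resolution every pronoun mention adds to the count of its antecedent.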

From SweSum to FarsiSum - Hercules Dalianis
In cooperation with Nima Mazdak (engineer and computational linguist), a version for Farsi has been made. This version uses only removal of stopwords and verbs in the identification of keywords. Stopwords are frequent non-content words collected from a concordance. Verbs are removed by a heuristic based on SOV-order in Farsi. Texts are coded in UTF-8.
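
A rough sketch of this kind of keyword identification is given below. The minutes do not spell out the verb heuristic, so the sketch assumes the simplest reading of SOV order: the verb is the last token of each sentence and is dropped. The stopword list, the function name, and the romanized toy sentences are all hypothetical; real input would be Farsi script in UTF-8.

```python
from collections import Counter

# Hypothetical stopword list; the real one was collected from a concordance.
STOPWORDS = {"man", "ra", "va"}

def candidate_keywords(sentences):
    """Return keyword counts after stopword removal and verb removal.

    Assumption: the verb is the last token of each sentence (SOV order),
    so dropping the final token approximates verb removal.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        tokens = tokens[:-1]  # drop the assumed sentence-final verb
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

# Toy romanized sentences, for illustration only.
sentences = ["man ketab ra xandam", "ali ketab ra did"]
print(candidate_keywords(sentences))
```

A position-based rule like this needs no tagger or lexicon, which is presumably its appeal, but it will also drop a non-verb that happens to end a sentence.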

KTH Extract Korpus - To create an extract corpus and an automatic evaluation schema - Martin Hassel
Named Entity (NE) together with summarization gives too little background information.
The KTH eXtract tool was used on 10 texts, and 11 informants produced an extract corpus of 96 extracts, which was used to evaluate SweSum.
One observation is that there is 40 percent overlap between different extracts; when one looks at the average extract, the overlap is 70 percent.

Using SweSum without NE gives 57 percent overlap between the SweSum summaries and the extracts, while using SweSum with NE gives 34 percent overlap with the extract corpus. Summarization works better without NE. Hassel 2003 (PDF).
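
Overlap figures of this kind can be computed from sets of sentence IDs. The minutes do not define the measure, so the sketch below assumes one plausible definition: the share of a summary's sentences that also occur in an informant's extract, averaged over all extracts. The function names and the data are hypothetical.

```python
def overlap_percent(summary, extract):
    """Percentage of sentence IDs in `summary` that also occur in `extract`."""
    summary, extract = set(summary), set(extract)
    if not summary:
        return 0.0
    return 100.0 * len(summary & extract) / len(summary)

def mean_overlap(summary, extracts):
    """Average overlap of one summary against several informant extracts."""
    return sum(overlap_percent(summary, e) for e in extracts) / len(extracts)

# Hypothetical sentence IDs chosen by three informants and by a summarizer.
extracts = [{1, 2, 5}, {1, 3, 5}, {2, 5, 6}]
summary = {1, 5, 7, 8}
print(round(mean_overlap(summary, extracts), 1))
```

Other definitions, for example dividing by the size of the extract or of the union, would give different percentages, so the measure should be stated alongside any reported figure.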

Kunnskapsbasert dokumentanalyse og sammendrag, Knowledge-based document analysis and summarization - Till Lech
A three-year research project, KunDoc, together with UiB, funded by Norskt sprogråd.
The aim of the project is to find relations between discourse relations and coreference relations.
The project will also include a network-building summer school and conference. Till will be the PhD student.
Use:
Search and summarize
Categorization of text content

Unni Eiken, a "hovedfag" (master's-level) student, will work on “Classification of predicate and arguments” and will look at disambiguating ambiguous coreferents.

Final meeting

18-21 March 2004, Fjällgården, Åre, Sweden

Tasks by next meeting Iceland

Write Norfa Yearbook 2003 (deadline end October 2003)
Finalize evaluation work
Write joint paper on summarization before March 18, 2004 (Coordinator: Koenraad?)

Slides from the meeting (in PDF)
Anja Liseth OH-slides OH-slides
Hercules Dalianis OH-slides
Martin Hassel OH-slides
Till Lech OH-slides

Latest change 8 November 2003.