Program and Participants of the Seventh ScandSum meeting, 20-21 March 2004, Fjällgården, Åre


Hercules Dalianis - KTH Stockholm
Martin Hassel - KTH Stockholm
Nima Mazdak - KTH and Stockholm University
Koenraad de Smedt - University of Bergen
Anja Liseth - University of Bergen
Till Christopher Lech - Cognit and University of Bergen
Jürgen Wedekind - CST - Copenhagen University
Henrik Holmboe - Norfa - Aarhus School of Business
Kaili Müürisep - University of Tartu

Farsisum by Nima Mazdak
The summarizer for Persian - Master thesis link

Evaluation strategies by Martin Hassel
Question from Koenraad: why not use many queries per summary to measure quality in the question-answering scheme?

NorSum evaluation – Anja Liseth
Problems in constructing an extract corpus: when two sentences get the same selection frequency, that is, the same number of votes, which one should be selected? We choose the one with the highest position rank.
What happens if the ideal (gold standard) summary contains mutually exclusive sentences?
Should we include them in the ideal summary? We need methods to calculate these.
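
The tie-breaking rule above can be illustrated with a minimal sketch. It assumes that each informant's extract is a list of 0-based sentence positions and that "highest position rank" means the sentence occurring earliest in the text; the function name `build_extract` and the toy votes are hypothetical and not taken from the NorSum work.

```python
from collections import Counter

def build_extract(votes, size):
    """Pick the `size` most-voted sentences from the informants' extracts.

    votes: one list of 0-based sentence positions per informant.
    Ties in vote count are broken in favour of the earlier sentence
    (an assumed reading of "highest position rank").
    """
    # Count each sentence at most once per informant.
    counts = Counter(i for chosen in votes for i in set(chosen))
    # Sort by descending vote count, then by ascending position.
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    # Return the selected sentences in text order.
    return sorted(ranked[:size])

# Toy data: sentences 2, 3 and 5 all receive two votes.
votes = [[0, 2, 5], [0, 3, 5], [0, 2, 3]]
extract = build_extract(votes, 3)
print(extract)
```

Here sentence 0 is selected by all three informants, and the remaining two slots go to the earliest of the tied sentences.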

Bergen counts compression rate at the sentence level, Stockholm at the word level.
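
The difference between the two ways of counting can be seen in a small sketch. It assumes that the compression rate is the summary length divided by the source length, and it uses a naive sentence splitter and whitespace tokenization; neither group's actual procedure is documented here, and the helper names are hypothetical.

```python
import re

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def sentence_rate(source, summary):
    # Compression rate counted in sentences.
    return len(split_sentences(summary)) / len(split_sentences(source))

def word_rate(source, summary):
    # Compression rate counted in whitespace-separated words.
    return len(summary.split()) / len(source.split())

source = ("The meeting was held in Åre. It lasted two days. "
          "Nine people from four countries took part in the discussions.")
summary = "Nine people from four countries took part in the discussions."

print(sentence_rate(source, summary))  # one of three sentences
print(word_rate(source, summary))      # ten of twenty words
```

The same summary gets a different rate under the two measures whenever the selected sentences are longer or shorter than the average sentence, which matters when results from the two sites are compared.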

Anja's informants felt that the texts were full of air and therefore easy to summarize down to four sentences.

KunDoc project and demo of new Summarizer in MS-Office – Till
One can summarize a document in Word and obtain keywords that are used directly to search with Google.

Semantic web
Make the web structured using ontologies:
XTM predicate logic, RDF, DAML+OIL, OWL.
The Darpa homepage is full of ontologies in different domains.
Using the semantic web, one can ask questions like “which project had a meeting in Åre 2004?”
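
As a rough illustration of how such a question could be answered once the facts are structured, here is a minimal sketch using plain subject-predicate-object triples in Python. The predicate names, the identifiers `meeting7` and `meetingX`, and the second project are made up for the example; a real system would use RDF or OWL data and a proper query language.

```python
# Toy triple store: (subject, predicate, object). All names are hypothetical.
TRIPLES = {
    ("ScandSum", "type", "Project"),
    ("ScandSum", "hadMeeting", "meeting7"),
    ("meeting7", "place", "Åre"),
    ("meeting7", "year", "2004"),
    ("OtherProject", "type", "Project"),
    ("OtherProject", "hadMeeting", "meetingX"),
    ("meetingX", "place", "Bergen"),
    ("meetingX", "year", "2004"),
}

def projects_meeting_in(place, year, triples=TRIPLES):
    """Answer: which project had a meeting in `place` in `year`?"""
    # Meetings that match both the place and the year.
    in_place = {s for s, p, o in triples if p == "place" and o == place}
    in_year = {s for s, p, o in triples if p == "year" and o == year}
    meetings = in_place & in_year
    # Only subjects that are declared to be projects.
    projects = {s for s, p, o in triples if p == "type" and o == "Project"}
    return sorted(s for s, p, o in triples
                  if p == "hadMeeting" and o in meetings and s in projects)

print(projects_meeting_in("Åre", "2004"))
```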

Tools for the semantic web: Text-To-Onto, Protégé.

Results from automatic evaluation of DanSum and SweSum
Hercules & Martin
The results show that Danish informants agree on extracts 67% on average, at average summarization length.
The results show that Swedish informants agree on extracts 61% on average, at average summarization length.
The Danish and Swedish human extracts have average lengths of 32% and 34%, respectively.
Martin found, for all languages, the extract that best matches the majority vote. Some of these extracts had 100% overlap with the majority vote. Now we have to compare these best extracts with SweSum.
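
One possible way to find the extract closest to the majority vote is sketched below. It assumes that the majority vote consists of the sentences chosen by more than half of the informants and that overlap is the share of majority sentences present in an extract; the actual measures used for DanSum and SweSum may differ, and the informant labels and data are invented.

```python
from collections import Counter

def majority_extract(extracts):
    """Sentences chosen by more than half of the informants."""
    counts = Counter(i for e in extracts for i in set(e))
    threshold = len(extracts) / 2
    return {i for i, c in counts.items() if c > threshold}

def overlap(extract, reference):
    """Share of the reference sentences that the extract contains."""
    if not reference:
        return 0.0
    return len(set(extract) & reference) / len(reference)

# Toy data: four informants, sentences given as 0-based positions.
extracts = {"A": [0, 2, 5], "B": [0, 2, 3], "C": [0, 2, 5], "D": [1, 2, 5]}
majority = majority_extract(list(extracts.values()))

# The informant whose extract overlaps most with the majority vote
# (the first one wins when several share the top score).
best = max(extracts, key=lambda name: overlap(extracts[name], majority))
print(sorted(majority), best, overlap(extracts[best], majority))
```

The best human extract found this way can then serve as the reference against which an automatic summary is scored with the same `overlap` function.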

Danish summarization – Jürgen
Summarize more coherently. Users would prefer either a 15% (newspaper editing) or an 85% (news surveillance) compression rate. Summarize the whole article and treat different segments in different ways.

Demonstration - Grim -
a language learning environment for writers of Swedish (spelling and grammar checking, translation, summarization) - Hercules

Language technology resources for Estonian text summarization - Kaili Müürisep
In Tallinn there are two places for language technology: the Institute of Estonian Languages (dictionaries, morphology) and the Institute of Cybernetics (speech processing). In Tartu, the University of Tartu also works with text summarization on a small scale. Two systems have been developed. AutoSum, with a powerful analysis, was a short bachelor project. EstSum is a smaller but more modular system, written in Perl, that is being continued with small means. Kaili also described that ! and ? give penalty points in the summary, so questions and exclamations can be removed. EstSum uses a corpus of half a million words to calculate word frequencies, though no lemmatizer is used yet.
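
The scoring ideas mentioned for EstSum (corpus word frequencies, no lemmatizer, penalty points for ! and ?) might look roughly like the sketch below. It is written in Python rather than Perl, and both the weighting formula and the penalty size are assumptions made for illustration, not EstSum's actual scoring.

```python
import re
from collections import Counter

def score_sentences(text, corpus_freq, penalty=1.0):
    """Return (score, sentence) pairs for every sentence in `text`.

    corpus_freq: word -> count in a background corpus (surface forms,
    since no lemmatizer is assumed). Words that are frequent in the
    document but rare in the corpus contribute most to the score.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    doc_freq = Counter(w.lower() for w in re.findall(r"\w+", text))
    scored = []
    for s in sentences:
        words = [w.lower() for w in re.findall(r"\w+", s)]
        weight = sum(doc_freq[w] / (1 + corpus_freq.get(w, 0)) for w in words)
        score = weight / max(len(words), 1)
        # Questions and exclamations receive penalty points.
        if s.endswith(("!", "?")):
            score -= penalty
        scored.append((score, s))
    return scored

text = ("Tartu builds a summarizer. Is it ready? "
        "The summarizer is written in Perl.")
# Hypothetical background counts for a few common words.
corpus_freq = {"is": 1000, "it": 900, "a": 2000, "the": 3000, "in": 1500}
for score, sentence in score_sentences(text, corpus_freq):
    print(round(score, 2), sentence)
```

With a large enough penalty, questions and exclamations fall to the bottom of the ranking and drop out of the summary.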

TvärSök proposal - Hercules
Hercules described a research proposal on cross-language information retrieval in three languages, Danish, Swedish and Norwegian, using three different approaches:
lexicon lookup, fuzzy matching, and Random Indexing.
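
Of the three approaches, fuzzy matching is the easiest to sketch. The example below uses Python's standard `difflib.SequenceMatcher` to match a Swedish query term against a small Norwegian and Danish vocabulary; the function name, the cutoff value, and the word list are assumptions for illustration and do not come from the TvärSök proposal.

```python
from difflib import SequenceMatcher

def fuzzy_translate(term, vocabulary, cutoff=0.5):
    """Return vocabulary words similar to `term`, best match first."""
    scored = [(SequenceMatcher(None, term.lower(), w.lower()).ratio(), w)
              for w in vocabulary]
    return [w for ratio, w in sorted(scored, reverse=True) if ratio >= cutoff]

# Swedish "sjukhus" (hospital) against Norwegian and Danish index terms.
vocabulary = ["sykehus", "sygehus", "bibliotek", "universitet"]
print(fuzzy_translate("sjukhus", vocabulary))
```

Because the Scandinavian languages share many cognates with small spelling differences, string similarity alone can recover candidate translations without a bilingual lexicon, at the cost of false matches when the cutoff is set too low.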

Paper writing
Norfa Årbog contribution
Longer article, 20-25 pages, at an advanced popular-science level, containing an overview of ScandSum work and future directions. Graphs more than numbers.

Koenraad is the editor. Deadline for the paper: June 30, 2004.

Hercules will contact Sasan

New research proposal and
Fundraising for continuation of work

Nordplus Nabo deadline March 1, 2005
Nordplus Sprog decision April 15, 2004
Vetenskapsrådet April 20, 2004
NFR IT-funk

Existing projects
Nordic Graduate School in Language Technology

Slides from the meeting (in PDF)
Hercules Dalianis OH-slides, 1,  2, and 3
Nima Mazdak OH-slides
Martin Hassel OH-slides OH-slides
Anja Liseth OH-slides
Jürgen Wedekind OH-slides
Till Lech OH-slides
Kaili Müürisep OH-slides

Latest change 24 March 2004.