Minutes of the Second ScandSum meeting June 9, 2002, Holmen Fjordhotell, Oslo, Norway

Participants of the meeting

Hercules Dalianis, KTH
Koenraad de Smedt, Univ in Bergen
Jürgen Wedekind, CST Copenhagen
Janne Bondi Johannessen, Univ in Oslo
Helge Dyvik, Univ in Bergen
Henrik Holmboe, Norfa
Victoria Rosén, Univ in Bergen
Veronika Haderlein, FAST Search and Transfer, Oslo
Lotte Weilgaard, Syddansk Univ, Kolding
Margrete H. Møller, Syddansk Univ, Kolding

Meeting

Hercules described the current Swedish text summarizer; see also the
minutes of the previous (first) ScandSum meeting in Åre.

Janne Bondi Johannessen described the Oslo-Bergen tagger, which can be used for tagging Norwegian texts. The tagger can be adapted to generate SGML-based tags. The coming new version of the SweSum summarization engine will be able to read these tags and hence summarize Norwegian texts.

The tagger can be tested here: http://decentius.hit.uib.no:8005/cl/cgp/test.html

We plan to use a server-based tagger that can be reached over the SSH protocol, so that the new summarization system will be fully distributed and hence easy to support and update.
The same will apply to the Granska tagger.
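
A minimal sketch of how a summarizer could call such a remote tagger over SSH, assuming the tagger is installed as a command on the server and reads plain text on standard input. The host name and command name used in the usage note are hypothetical placeholders; nothing about the actual server setup was decided at the meeting.

```python
import subprocess


def build_tagger_command(host, remote_command):
    """Return the argument list for running the tagger on a remote host via ssh."""
    return ["ssh", host, remote_command]


def tag_remotely(text, host, remote_command, timeout=60):
    """Send plain text to the remote tagger on stdin and return its tagged output.

    Assumes the remote command reads UTF-8 text on stdin and writes
    the tagged text to stdout.
    """
    result = subprocess.run(
        build_tagger_command(host, remote_command),
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,        # raise if ssh or the tagger exits with an error
        timeout=timeout,   # do not hang forever on an unreachable server
    )
    return result.stdout.decode("utf-8")
```

A call would then look like tag_remotely("Har du sett eksemplet?", "tagger.example.org", "run-tagger"), where both the host and the command are made up for illustration.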

The tagger format will be similar to the following:

<text>
<paragraph>
<sentence>
<clause>
<word lemma="ha" tag="verb">Har</word>
...
</clause>
</sentence>
</paragraph>
</text>

If the text structure is not always hierarchical, however, we must still be able to handle it,
for example by adding these tags:

<div type="s"> between sentences

<word> eksemplet <lemma type="eksempel" cat="subst"> </word>
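
A minimal sketch of how the summarizer side might read the hierarchical format, assuming the tagger output is well-formed enough to be parsed as XML. The non-hierarchical SGML variants with unclosed tags would need a different parser. The function name read_tagged_text is an illustrative choice, not part of SweSum.

```python
import xml.etree.ElementTree as ET

# The sample from the minutes, with the elided "..." left out.
SAMPLE = """<text>
<paragraph>
<sentence>
<clause>
<word lemma="ha" tag="verb">Har</word>
</clause>
</sentence>
</paragraph>
</text>"""


def read_tagged_text(markup):
    """Return one list per sentence, each holding (token, lemma, tag) triples."""
    root = ET.fromstring(markup)
    sentences = []
    for sentence in root.iter("sentence"):
        words = [
            (word.text or "", word.get("lemma"), word.get("tag"))
            for word in sentence.iter("word")
        ]
        sentences.append(words)
    return sentences


print(read_tagged_text(SAMPLE))  # [[('Har', 'ha', 'verb')]]
```

Working on lemmas rather than surface tokens lets the summarizer count "Har" and "ha" as the same keyword.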

Victoria Rosén described the lexicon from the SCARRIE project. This lexicon contains relations between possible lexical variants in Bokmål, for example between høyesterett and høgsterett. This offers great potential for grouping variants of keywords. The SCARRIE lexicon can be used as a basis for constructing a word list for the current SweSum architecture, or it can be used to define keyword links on the output of the tagger as well as on user keywords.

A paper on SCARRIE is available at
http://ling.uib.no/~desmedt/papers/MONS8-paper.html
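
A minimal sketch of how variant relations could be used to group keywords, assuming the lexicon can be exported as a table mapping each variant to one chosen base form. Only the høyesterett / høgsterett pair comes from the minutes; the table layout and the choice of høyesterett as the base form are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical variant table: variant form -> chosen base form.
VARIANTS = {"høgsterett": "høyesterett"}


def canonical(word):
    """Lower-case a word and map it to its base form if it is a known variant."""
    lowered = word.lower()
    return VARIANTS.get(lowered, lowered)


def keyword_counts(words):
    """Count keywords so that all variants of a word share a single count."""
    return Counter(canonical(word) for word in words)


counts = keyword_counts(["Høyesterett", "høgsterett", "dom"])
print(counts["høyesterett"])  # 2
```

The same canonical function could be applied to user keywords, so that a query for one variant also matches the other.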

Helge Dyvik talked about word senses based on parallel corpora.
Helge described a method that goes back and forth between processed parallel corpora: starting with one lexical item, it finds the most closely related items in the other language, then goes back again to the first language, and in this way finds the lexical items that are most closely related semantically. With sufficiently large parallel corpora, this method can partly automate the building of wordnets / ontologies.
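
A rough sketch of the back-and-forth idea, not Helge's actual algorithm: each word is looked up in the other language and the translations are then looked up back again, and words that share translations with the starting word are counted as related. The alignment table is toy data invented for illustration; in practice it would be extracted from word-aligned parallel corpora.

```python
# Toy alignment data, invented for illustration: source word -> aligned target words.
FORWARD = {
    "tak": {"roof", "ceiling"},
    "hustak": {"roof"},
    "himling": {"ceiling"},
    "loft": {"attic"},
}


def invert(mapping):
    """Build the reverse table: target word -> set of aligned source words."""
    inverse = {}
    for source, targets in mapping.items():
        for target in targets:
            inverse.setdefault(target, set()).add(source)
    return inverse


def related_words(word, forward, backward):
    """Rank same-language words by how many translations they share with `word`."""
    scores = {}
    for target in forward.get(word, ()):        # step 1: go to the other language
        for back in backward.get(target, ()):   # step 2: come back again
            if back != word:
                scores[back] = scores.get(back, 0) + 1
    # Highest shared-translation count first, ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(related_words("tak", FORWARD, invert(FORWARD)))
# [('himling', 1), ('hustak', 1)]
```

In this toy table "loft" is not returned because it shares no translation with "tak"; with real corpora the counts would give a graded measure of relatedness.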

New people that could contribute:

Viggo Kann KTH
Bergen people
Janne Bondi Johannessen / Paul Meurer Univ of Bergen
Ari Pirkola or Kalervo Järvelin University in Tampere Finland
Kristin Bjaradottir, Inst. of Lexicography, Iceland
Tiit Roosma University of Tartu, Estonia
Everita Milconoka or Inguna Skadina Univ of Latvia
Vidas Daudaravicius CLC at Vytautas Magnus University, Kaunas, Lithuania.
Veronika Haderlein FAST Search and Transfer Oslo

Possible dates

* 13-15 Sept 2002, Skagen, Denmark
* 25-28 Jan 2003, Geilo or Voss, Norway
* 5-8 April 2003, Åre

Tasks before the next meeting in Skagen

Invite new nodes / people to Skagen
Make the Norwegian tagger work
Prepare discussion of Danish resources