Guest professorship at CST - Centre for Language
Technology, University of Copenhagen, funded by Norfa
Hercules Dalianis has a part-time guest professorship at CST, Centre
for Language Technology, University of Copenhagen, during 2002-2005.
The aim of the guest professorship at CST, Centre for Language
Technology, is to transfer and develop competence in language technology
between Swedish and Danish within the following areas:
- Automatic text summarization
- Information retrieval using human language technology
- Evaluation of the above techniques
The amount of information on the Internet is growing rapidly, and we need
tools to tame this flow: tools based on filtering and extracting information.
Text summarization
Automatic text summarization is the technique by which a computer
summarizes a text by extracting the most important information and
compiling it into a new, non-redundant text. An example of this technique is
the SweSum automatic text summarizer (Dalianis 2000) for Swedish news text.
We adapted the Swedish text summarizer SweSum to Danish as DanSum, using the
Danish STO-lexicon (in Danish). We also adapted SweSum to Norwegian as NorSum.
DanSum has been evaluated in the project DefSum at Danmarks Elektroniske
Forskningsbibliotek. DanSum was also evaluated by creating a Danish extract
corpus collection; the evaluation is described in Hassel (2004) and in
de Smedt et al. (2005). The evaluation of NorSum is described in Liseth (2004).
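The extraction idea described above can be illustrated with a minimal sketch. This is not SweSum's actual method: it assumes a plain word-frequency score, a tiny hypothetical English stop-word list, and a naive sentence splitter, and it simply keeps the highest-scoring sentences in their original order.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not SweSum's lexicon).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on"}


def split_sentences(text):
    """Split on whitespace that follows sentence-final punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole text serve as importance weights.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favoured.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    print(summarize("Cats chase mice. Cats sleep a lot. Dogs bark.", 2))
```

A real summarizer for news text would typically combine several cues (for example sentence position and keyword lexicons) rather than frequency alone; the sketch only shows the extract-and-compile step.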
Information retrieval
The SiteSeeker search engine is a (Swedish) language-sensitive search engine
for web sites and intranets. We also wanted to adapt SiteSeeker to
Danish and Norwegian to improve precision and recall. We started by
connecting SiteSeeker to the Scandinavian multilingual web site Nordoknet.
Then we adapted the Swedish stemmer, first to Danish and then to
Norwegian. The work was rather straightforward, since both Danish and
Norwegian are closely related to Swedish. We also used the CST
lemmatizer to automatically create stemmers from the keyword dictionaries
used by each of our text summarizers. The work was a success: the stemmers
became very accurate, except for Norwegian, where the manual rule-based
stemmer was more precise.
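One way to picture how a stemmer can be derived automatically from lemmatized word lists is the sketch below. It is an illustration under stated assumptions, not the procedure or API of the CST lemmatizer: the (word form, lemma) pairs are hand-written stand-ins for lemmatizer output, and the function names `induce_rules` and `stem` are hypothetical. Suffix rewrite rules are learned from the pairs, and the longest matching rule is applied to unseen words.

```python
from collections import Counter, defaultdict


def common_prefix_len(a, b):
    """Length of the longest shared prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def induce_rules(pairs):
    """Learn suffix rewrite rules (suffix -> replacement) from (word, lemma) pairs."""
    votes = defaultdict(Counter)
    for word, lemma in pairs:
        k = common_prefix_len(word, lemma)
        suffix, replacement = word[k:], lemma[k:]
        if suffix:  # skip pairs where the word form adds nothing to the shared prefix
            votes[suffix][replacement] += 1
    # Keep the most frequent replacement seen for each suffix.
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


def stem(word, rules, min_stem=2):
    """Apply the longest matching suffix rule, keeping at least min_stem characters."""
    for suffix in sorted(rules, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: len(word) - len(suffix)] + rules[suffix]
    return word


if __name__ == "__main__":
    # Hand-written Danish examples standing in for lemmatizer output (assumption).
    pairs = [("bilen", "bil"), ("bilerne", "bil"),
             ("husene", "hus"), ("huset", "hus")]
    rules = induce_rules(pairs)
    for w in ["bogen", "stolene", "hus"]:
        print(w, "->", stem(w, rules))
```

Rules induced this way are only as good as the word list behind them, which fits the observation above that a manual rule-based stemmer can still be more precise for a given language.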
Cross-language information retrieval was carried out in an early
prototype described in Wedekind (2005). In the final period of the guest
professorship we obtained funding from the Nordic Council to construct
TvärSök, a cross-language search engine for the Scandinavian languages.
References
Dalianis, H. 2000. SweSum - A Text Summarizer for Swedish. Technical
report TRITA-NA-P0015, IPLab-174, NADA, KTH, October 2000.
de Smedt, K., A. Liseth, M. Hassel, H. Dalianis. 2005. How short is
good? An evaluation of automatic summarization. In Holmboe, H. (ed.)
Nordisk Sprogteknologi 2004. Årbog for Nordisk
Språkteknologisk Forskningsprogram 2000-2004, pp 267-287, Museum
Tusculanums Forlag.
Hassel, M. 2004. Evaluation of automatic text summarization – a
practical implementation. Licentiate thesis, Stockholm, NADA-KTH.
Liseth, A. 2004. Hvor kort er godt? : En evaluering av NorSum: en
automatisk tekstsammenfatter for norsk. Hovedoppgave. Department of
Linguistics, University of Bergen. (In Norwegian).
Wedekind, J. 2005. Towards Multilingual Retrieval of Document
Information on Language Technology. In Holmboe, H. (ed.) Nordisk
Sprogteknologi 2005. Årbog for Nordisk Språkteknologisk
Forskningsprogram 2000-2004, pp 33-38, Museum Tusculanums Forlag.
Press releases
DanSum, the first text summariser for Danish, Norfa, October 9, 2002.
Summarised news for the mobile phone, March 29, 2004.
Sammanfattade nyheter i mobilen (Summarised news on the mobile phone), March 29, 2004 (in Swedish).
Latest change August 18, 2005.