What is automatic text summarization?

Automatic text summarization is the technique by which a computer summarizes a text. A text is given to the computer, which returns a summarized text: a non-redundant extract of the original.

The technique has its roots in the 1960s and has been developed over 30 years, but today, with the Internet and the WWW, it has become more important.

Microsoft Word has included a summarizer for documents since 1997 (look under Tools, where you can find Summary).

Automatic text summarization can be used:

SweSum is the first automatic text summarizer for Swedish.
It summarizes Swedish news text in HTML/text format on the WWW.
During summarization, 5-10 key words are also produced as a mini summary.
Accuracy: 84% at a 40% summary of news texts with an average original length of 181 words.

Automatic text summarization is based on statistical, linguistic and heuristic methods, in which the summarization system calculates how often certain key words occur (the Swedish system has 700 000 possible Swedish entries pointing at 40 000 Swedish base key words). The key words belong to the so-called open class words. The summarization system calculates the frequency of the key words in the text, which sentences they occur in, and where these sentences are located in the text. It also considers whether the text is marked with a bold text tag or a first paragraph tag, or contains numerical values. All this information is compiled and used to summarize the original text.
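
The scoring described above can be sketched roughly as follows. This is a minimal illustration in Python, not SweSum's actual implementation: the stop list, the function names, the doubling of the first sentence's score and the +1 bonus for numerical values are all assumptions made for the example, and the bold text tag is left out because the sketch works on plain text.

```python
import re
from collections import Counter

# Hypothetical stop list standing in for closed class words. The real system
# uses a large lexicon of open class key words instead.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "are", "to",
              "it", "was", "for", "by", "at", "this", "that", "with"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def key_words(sentence):
    """Lower-cased alphabetic tokens that are not in the stop list."""
    return [w for w in re.findall(r"[^\W\d_]+", sentence.lower())
            if w not in STOP_WORDS]


def summarize(text, ratio=0.4, n_keywords=5):
    """Return (summary sentences in original order, most frequent key words)."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in key_words(s))

    scores = []
    for i, s in enumerate(sentences):
        score = sum(freq[w] for w in key_words(s))  # key word frequency
        if i == 0:
            score *= 2   # assumed weight for the opening sentence
        if re.search(r"\d", s):
            score += 1   # assumed bonus for numerical values
        scores.append(score)

    # Keep the best-scoring share of the sentences, at least one.
    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:keep])  # restore original text order

    top = [w for w, _ in freq.most_common(n_keywords)]
    return [sentences[i] for i in chosen], top
```

Calling summarize(text, ratio=0.4) keeps roughly 40% of the sentences, and the second return value plays the role of the key word mini summary. A real system would replace the stop list with a lexicon lookup that maps inflected forms to base key words.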

SweSum is also available for Danish, Norwegian, English, Spanish, French, Italian, Greek, Farsi (Persian) and German texts.

Read more

2007 Hassel, M. Resource Lean and Portable Automatic Text Summarization, PhD-Thesis, School of Computer Science and Communication, KTH, ISBN-978-917178-704-0, pdf.

2005 Müürisep, Kaili and Pilleriin Mutso. ESTSUM - Estonian newspaper texts summarizer. Proceedings of The Second Baltic Conference on Human Language Technologies. April 4-5, 2005. Tallinn, pages 311-316. pdf.

2005 Hassel, M. and H. Dalianis. Generation of Reference Summaries. In the proceedings of the 2nd Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, April 21-23 2005, Poznan, Poland, pdf.

2005 de Smedt, K., A. Liseth, M. Hassel and H. Dalianis. How short is good? An evaluation of automatic summarization. In Holmboe, H. (ed.) Nordisk Sprogteknologi 2004. Årbog for Nordisk Språkteknologisk Forskningsprogram 2000-2004, pp. 267-287, Museum Tusculanums Forlag, pdf.

2005 Pachantouris, George. GreekSum - A Greek Text Summarizer, Master Thesis, Department of Computer and Systems Sciences, KTH-Stockholm university, pdf.

2004 Liseth, Anja. Hvor kort er godt?: En evaluering av NorSum: en automatisk tekstsammenfatter for norsk. Hovedoppgave (Master's thesis). Seksjon for lingvistiske fag, Universitetet i Bergen (in Norwegian), html.

2004 Hassel, Martin: Evaluation of automatic text summarization - a practical implementation. Licentiate thesis, Stockholm, NADA-KTH, pdf.

2004 Dalianis, H., M. Hassel, K. de Smedt, A. Liseth, T.C. Lech and J. Wedekind. Porting and evaluation of automatic summarization. In Holmboe, H. (ed.) Nordisk Sprogteknologi 2003. Årbog for Nordisk Språkteknologisk Forskningsprogram 2000-2004, pp. 107-121. Museum Tusculanums Forlag, pdf.

2004 Hassel, M. and N. Mazdak. FarsiSum - a Persian text summarizer. In the proceedings of Computational Approaches to Arabic Script-based Languages, Workshop at Coling 2004, the 20th International Conference on Computational Linguistics, August 28 2004, Geneva, Switzerland, pdf.

2004 Mazdak, Nima. FarsiSum - a Persian text summarizer, Master thesis, Department of Linguistics, Stockholm University, pdf.

2003 Decker, Anna. Towards automatic grammatical simplification of Swedish text. Master thesis, Computational Linguistics, Department of Linguistics, Stockholm University, pdf.

2003 Dalianis, H., M. Hassel, J. Wedekind, D. Haltrup, K. de Smedt and T.C. Lech. Automatic text summarization for the Scandinavian languages. In Holmboe, H. (ed.) Nordisk Sprogteknologi 2002: Årbog for Nordisk Språkteknologisk Forskningsprogram 2000-2004, pp. 153-163. Museum Tusculanums Forlag, pdf.

2003 Hassel, Martin. Exploitation of Named Entities in Automatic Text Summarization for Swedish. In the proceedings of NODALIDA 2003, the 14th Nordic Conference of Computational Linguistics, Reykjavik, May 30-31, 2003. (pdf)

2003 Fallahi, Sasan. Computer aided text summarization. Using SweSum in a real newspaper environment. Overhead slides available here. (pdf)

2003 Wedekind, J. Brugervenligt værktøj til automatisk resummering af videnskabelige dokumenter. Danmarks Elektroniske Forskningsbibliotek. (html)

2002 Hassel, M. Development of a Swedish Corpus for Evaluating Summarizers and other IR-tools pdf

2001 Evaluation of the French text summarizer (in French) pdf

2001 Hassel, M. Pronominal Resolution in Automatic Text Summarisation pdf

2000 Dalianis, H. SweSum - A Text Summarizer for Swedish, Technical report TRITA-NA-P0015, IPLab-174, NADA, KTH, October 2000, html


Responsible for this page: Hercules Dalianis <hercules@kth.se>
Latest change March 22, 2017