Dataset: Turkish Multi-document Summarization (MDS) Corpus
Date
2014-10-01
Publisher
Boğaziçi University
Contact Person
Arzucan, Özgür, arzucan.ozgur@boun.edu.tr, Boğaziçi University
Description
The corpus consists of four folders. The folder "clusters" holds the original documents to be summarized, organized into 21 subfolders, one per topic. Each subfolder contains about 10 documents on the same topic, which together form one multi-document input. The folders "models1", "models2", and "models3" hold three independent, manually prepared summaries of these 21 topics. Each of these summary folders contains 21 text files, one per topic, and each file is a multi-document summary of all the documents in that topic's subfolder.
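After downloading, the counts given in the description (four top-level folders, 21 topic subfolders, 21 summary files in each models folder) can be checked with a short script. The Python below is a minimal sketch and is not part of the corpus distribution. The function name `check_layout` is hypothetical, and the sketch assumes that the four folders sit directly under a single root directory.

```python
from pathlib import Path

# Counts taken from the corpus description; treated as expectations to verify.
EXPECTED_TOPICS = 21
MODEL_DIRS = ("models1", "models2", "models3")


def check_layout(root):
    """Return a list of problems found under the corpus root (empty if none).

    Hypothetical helper: assumes "clusters", "models1", "models2" and
    "models3" are direct children of `root`.
    """
    root = Path(root)
    clusters = root / "clusters"
    if not clusters.is_dir():
        return [f"missing folder: {clusters}"]

    problems = []
    # One subfolder per topic inside "clusters".
    topics = [p for p in clusters.iterdir() if p.is_dir()]
    if len(topics) != EXPECTED_TOPICS:
        problems.append(
            f"clusters: expected {EXPECTED_TOPICS} topic subfolders, found {len(topics)}"
        )

    # One summary file per topic inside each models folder.
    for name in MODEL_DIRS:
        folder = root / name
        if not folder.is_dir():
            problems.append(f"missing folder: {folder}")
            continue
        files = [p for p in folder.iterdir() if p.is_file()]
        if len(files) != EXPECTED_TOPICS:
            problems.append(
                f"{name}: expected {EXPECTED_TOPICS} summary files, found {len(files)}"
            )
    return problems
```

An empty list means the local copy matches the stated layout; each string in a non-empty list names the folder that differs.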
Example:
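The files in the folder clusters/1/:
1.txt: Zonguldak'ta ruhsatsız olduğu ileri sürülen … ("In Zonguldak, allegedly unlicensed …")
…
9.txt: Zonguldak'ta, ruhsatsız kömür ocağında … ("In Zonguldak, in an unlicensed coal mine …")
The corresponding summary file in the folder models1:
1: Zonguldak'ta yaşanan 2 ayrı maden kazasında … ("In two separate mining accidents in Zonguldak …")
The corresponding summary file in the folder models2:
1: Zonguldak'ta çalışma ruhsatı olmayan … ("In Zonguldak, without an operating license …")
The corresponding summary file in the folder models3:
1: Zonguldak'ta ruhsatsız olduğu ortaya çıkan … ("In Zonguldak, found to be unlicensed …")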
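Following the naming in the example, a topic's source documents can be paired with its three reference summaries. The Python below is a sketch of a hypothetical helper, `load_topic`. It relies on two assumptions that the description does not state: the files are UTF-8 text, and each summary file is named by its topic id, with or without an extension (the example lists it only as "1").

```python
from pathlib import Path


def load_topic(root, topic, encoding="utf-8"):
    """Return (documents, summaries) for one topic.

    Hypothetical helper. Assumptions: UTF-8 files; source documents are
    clusters/<topic>/<n>.txt; each models folder has exactly one summary
    file whose name (before any extension) equals the topic id.
    """
    root = Path(root)
    topic = str(topic)

    # Sort numerically where possible so "10.txt" follows "9.txt".
    def doc_key(p):
        return (0, int(p.stem), "") if p.stem.isdigit() else (1, 0, p.stem)

    doc_paths = sorted((root / "clusters" / topic).glob("*.txt"), key=doc_key)
    documents = [p.read_text(encoding=encoding) for p in doc_paths]

    summaries = []
    for model_dir in ("models1", "models2", "models3"):
        # Match the topic id exactly, so topic "1" does not pick up "10".
        matches = [
            p for p in (root / model_dir).iterdir()
            if p.is_file() and p.name.split(".")[0] == topic
        ]
        if len(matches) != 1:
            raise FileNotFoundError(
                f"{model_dir}: expected one summary for topic {topic}, found {len(matches)}"
            )
        summaries.append(matches[0].read_text(encoding=encoding))
    return documents, summaries
```

For example, `load_topic("corpus_root", 1)` would return the texts of clusters/1/1.txt through 9.txt and the three topic-1 summaries from models1, models2 and models3, in that order. The directory name "corpus_root" is a placeholder.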
Keywords
Multi-document text summarization