Scientific Abstracts Corpus

Authors: Öztürk, Seçil; Sankur, Bülent; Güngör, Tunga; Yılmaz, Mustafa Berkay; Köroglu, Bilge; Ağın, Onur; İşbilen, Mustafa; Ulaş, Çağdaş; Ahat, Mehmet
Dates: 2023-03-03; 2023-03-03; 2014-04-01
URL: https://tulap.cmpe.boun.edu.tr/handle/20.500.12913/78

The dataset is a labeled text corpus in 35 academic disciplines compiled from journals and conference proceedings. For each discipline, 200 papers were compiled. Each text includes the topic, the name of the source, the title of the paper, the abstract, and keywords (if available).

The corpus consists of 34 XML files, where each file corresponds to a discipline. Each XML file contains information about 200 papers. The information for a paper has the following format:

    <makale>
      <Etiket>discipline</Etiket>
      <Başlık>paper title</Başlık>
      <Özetçe>paper abstract</Özetçe>
      <Anahtar>keywords separated by commas</Anahtar>
      <Kaynak>journal/conference name</Kaynak>
      <TürkçeKarakter>Sorunsuz/Sorunlu</TürkçeKarakter>
    </makale>

The tag names are Turkish: makale (article), Etiket (label), Başlık (title), Özetçe (abstract), Anahtar (keywords), Kaynak (source), and TürkçeKarakter (Turkish characters), whose value is either Sorunsuz (no problems) or Sorunlu (problematic).

Example:

    <makale>
      <Etiket>Arkeoloji</Etiket>
      <Başlık>Burdur Bölgesi Neolitik Çağ Mimarlığı ve Anadolu'daki Çağdaşları Arasındaki Konumu Hakkında</Başlık>
      <Özetçe> Bu makalede yeni kazılarda elde edilen bilgiler ışığında Burdur yöresinde yaklaşık 2000 yıl (İÖ 7000 - 5300) süren Neolitik Çağ boyunca mimaride gözlenen özellikleri irdeleyeceğiz. ... </Özetçe>
      <Anahtar></Anahtar>
      <Kaynak>Adalya - Akdeniz Medeniyetleri Araştırma Enstitüsü Yıllığı</Kaynak>
      <TürkçeKarakter>Sorunsuz</TürkçeKarakter>
    </makale>

Language: Turkish
License: Apache License 2.0
Keywords: Scientific papers; Abstract; Text classification
Type: corpus
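
The record format described above can be read with a standard XML parser. The following is a minimal sketch in Python, not the authors' own tooling. It assumes that each discipline file wraps its <makale> records in a single root element; the root name "makaleler", the English dictionary keys, and the function name parse_papers are illustrative assumptions that do not come from the corpus description.

```python
import xml.etree.ElementTree as ET

# Tag names are taken from the record format in the description.
# The English keys on the right are our own labels (an assumption).
FIELDS = {
    "Etiket": "discipline",
    "Başlık": "title",
    "Özetçe": "abstract",
    "Anahtar": "keywords",
    "Kaynak": "source",
    "TürkçeKarakter": "turkish_chars",
}


def parse_papers(xml_text):
    """Return one dict per <makale> record found in xml_text."""
    root = ET.fromstring(xml_text)
    papers = []
    # iter() finds <makale> at any depth, so the root element name is irrelevant.
    for makale in root.iter("makale"):
        paper = {}
        for tag, key in FIELDS.items():
            # findtext() gives None for a missing tag and "" for an empty one.
            paper[key] = (makale.findtext(tag) or "").strip()
        # Keywords are comma-separated and may be absent.
        paper["keywords"] = [
            k.strip() for k in paper["keywords"].split(",") if k.strip()
        ]
        papers.append(paper)
    return papers


# The example record from the description, wrapped in a hypothetical root.
SAMPLE = """<makaleler>
  <makale>
    <Etiket>Arkeoloji</Etiket>
    <Başlık>Burdur Bölgesi Neolitik Çağ Mimarlığı ve Anadolu'daki Çağdaşları Arasındaki Konumu Hakkında</Başlık>
    <Özetçe> Bu makalede yeni kazılarda elde edilen bilgiler ... </Özetçe>
    <Anahtar></Anahtar>
    <Kaynak>Adalya - Akdeniz Medeniyetleri Araştırma Enstitüsü Yıllığı</Kaynak>
    <TürkçeKarakter>Sorunsuz</TürkçeKarakter>
  </makale>
</makaleler>"""

if __name__ == "__main__":
    for paper in parse_papers(SAMPLE):
        print(paper["discipline"], "|", paper["source"], "|", paper["keywords"])
```

Mapping the discipline label to the abstract text in this way yields the (text, label) pairs that a text classification experiment would need. Records marked Sorunlu in TürkçeKarakter could be filtered out at the same step if clean Turkish characters matter for the task.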