Dataset: BOUN Treebank v2.11
dc.contributor.author | Marşan, Büşra | |
dc.contributor.author | Türk, Utku | |
dc.contributor.author | Atmaca, Furkan | |
dc.contributor.author | Özateş, Şaziye Betül | |
dc.contributor.author | Berk, Gözde | |
dc.contributor.author | Bedir, Seyyit Talha | |
dc.contributor.author | Köksal, Abdullatif | |
dc.contributor.author | Başaran, Balkız Öztürk | |
dc.contributor.author | Güngör, Tunga | |
dc.contributor.author | Özgür, Arzucan | |
dc.contributor.author | Uskudarli, Susan | |
dc.contributor.author | Akkurt, Salih Furkan | |
dc.date.accessioned | 2023-03-03T22:22:46Z | |
dc.date.available | 2023-03-03T22:22:46Z | |
dc.date.issued | 2022 | |
dc.description | This dataset is the re-annotated version of the BOUN Treebank. Extracted from the Turkish National Corpus (TNC), the BOUN Treebank consists of 9,761 sentences (121,214 tokens) from five text types: biographical texts, national newspapers, instructional texts, popular culture articles, and essays. The syntactic dependency relations and morphological features of the sentences were manually annotated by linguists following the Universal Dependencies (UD) scheme. Some statistics on the treebank: although the dataset shows word order variation, more than 70% of the sentences have OV and SV word order; the average token count of the updated treebank is 12.74 and the average arc length is 2.90. | |
dc.description.sponsorship | TÜBİTAK, 16909, Dilbilim Temelli Türkçe Doğal Dil İşleme Platformu, nationalFunds | |
dc.identifier.uri | https://tulap.cmpe.boun.edu.tr/handle/20.500.12913/65 | |
dc.language.iso | tur | |
dc.publisher | Boğaziçi University | |
dc.relation.isreferencedby | https://arxiv.org/abs/2207.11782 | |
dc.subject | dependency annotation | |
dc.subject | universal dependencies | |
dc.title | BOUN Treebank v2.11 | |
dc.type | corpus | |
dspace.entity.type | Dataset | |
local.contact.person | Büşra, Marşan, busra.marsan@boun.edu.tr, Boğaziçi University | |
local.size.info | 9761, sentences |
Files
Original bundle
Name | Size | Format | Description
tr_boun_v2-dev.conllu | 944.41 KB | Unknown data format | Turkish BOUN Treebank v2, dev file
tr_boun_v2-test.conllu | 933.25 KB | Unknown data format | Turkish BOUN Treebank v2, test file
tr_boun_v2-train.conllu | 7.55 MB | Unknown data format | Turkish BOUN Treebank v2, train file
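
The description quotes an average token count and an average arc length without defining either. The following is a minimal sketch of how such figures could be computed from one of the `.conllu` files, not the authors' method. It assumes the files follow the standard tab-separated CoNLL-U layout (ID in the first column, HEAD in the seventh), that the token count is taken per sentence over syntactic words only (multiword ranges and empty nodes are skipped), and that arc length means the absolute distance between a word and its head, with root arcs excluded. The function names `read_sentences` and `treebank_stats` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_sentences(lines: Iterable[str]) -> Iterator[List[Tuple[int, int]]]:
    """Yield each sentence as a list of (word id, head id) pairs."""
    sentence: List[Tuple[int, int]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        # Assumption: skip multiword ranges ("1-2"), empty nodes ("1.1"),
        # and rows whose HEAD is not a plain integer.
        if len(cols) < 7 or not cols[0].isdigit() or not cols[6].isdigit():
            continue
        sentence.append((int(cols[0]), int(cols[6])))
    if sentence:
        yield sentence


def treebank_stats(lines: Iterable[str]) -> Tuple[float, float]:
    """Return (average words per sentence, average arc length)."""
    n_sentences = n_words = n_arcs = total_length = 0
    for sentence in read_sentences(lines):
        n_sentences += 1
        n_words += len(sentence)
        for word_id, head in sentence:
            if head == 0:
                continue  # assumption: the root arc has no length
            n_arcs += 1
            total_length += abs(head - word_id)
    avg_words = n_words / n_sentences if n_sentences else 0.0
    avg_arc = total_length / n_arcs if n_arcs else 0.0
    return avg_words, avg_arc


# Toy three-word sentence for illustration only; it is not taken from the treebank.
SAMPLE = (
    "# text = Ali geldi .\n"
    "1\tAli\tAli\tPROPN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tgeldi\tgel\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    print(treebank_stats(SAMPLE.splitlines()))
```

To run this on a downloaded file, an open file handle can be passed in place of the sample lines, for example `treebank_stats(open("tr_boun_v2-train.conllu", encoding="utf-8"))`. The results may differ from the figures in the description if the authors counted tokens or arcs differently.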