BOUN Treebank v2.11

This dataset is the re-annotated version of the BOUN Treebank. Extracted from the Turkish National Corpus (TNC), the BOUN Treebank consists of 9,761 sentences (121,214 tokens) from five text types: biographical texts, national newspapers, instructional texts, popular culture articles, and essays. Linguists manually annotated the syntactic dependency relations and morphological features of the sentences following the Universal Dependencies (UD) scheme.

Some statistics on the treebank:
- Although word order varies across the dataset, more than 70% of the sentences have OV and SV word order.
- The average number of tokens per sentence in the updated treebank is 12.74, and the average arc length is 2.90.
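Statistics like the average sentence length and average arc length can be recomputed from the annotation files. The script below is a minimal sketch. It assumes the files follow the standard CoNLL-U format of UD releases, with ten tab-separated columns, `#` comment lines, and a blank line between sentences. It counts only syntactic words (integer IDs) and skips multiword-token ranges and empty nodes. It defines arc length as |HEAD - ID| over non-root dependencies. These counting choices are assumptions and may differ from how the figures above were computed. The function name `conllu_stats` and the demo sentence are hypothetical and not taken from the treebank.

```python
from typing import Iterable

# Hypothetical demo sentence in CoNLL-U format ("Ali read the book", OV order).
SAMPLE = (
    "# sent_id = demo-1\n"
    "# text = Ali kitabı okudu.\n"
    "1\tAli\tAli\tPROPN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tkitabı\tkitap\tNOUN\t_\t_\t3\tobj\t_\t_\n"
    "3\tokudu\toku\tVERB\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)


def conllu_stats(lines: Iterable[str]) -> tuple[int, int, float, float]:
    """Return (sentences, words, avg words per sentence, avg arc length)."""
    n_sents = n_words = 0
    arc_total = n_arcs = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if in_sentence:
                n_sents += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as sent_id and text
        cols = line.split("\t")
        token_id, head = cols[0], cols[6]
        # Skip multiword-token ranges ("3-4") and empty nodes ("5.1").
        if not token_id.isdigit():
            continue
        in_sentence = True
        n_words += 1
        # Assumption: root attachments (HEAD == 0) are not counted as arcs.
        if head != "0":
            arc_total += abs(int(head) - int(token_id))
            n_arcs += 1
    if in_sentence:  # the file may not end with a blank line
        n_sents += 1
    avg_len = n_words / n_sents if n_sents else 0.0
    avg_arc = arc_total / n_arcs if n_arcs else 0.0
    return n_sents, n_words, avg_len, avg_arc


if __name__ == "__main__":
    print(conllu_stats(SAMPLE.splitlines()))
```

For the demo sentence this prints `(1, 4, 4.0, 1.3333333333333333)`. To run it on a real split, pass an open file handle, for example `conllu_stats(open(path, encoding="utf-8"))`, where `path` points to one of the treebank's `.conllu` files. The function streams line by line, so it does not need to load a whole split into memory.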

Keywords

dependency annotation, universal dependencies

Referenced by

https://arxiv.org/abs/2207.11782

Sponsor

TÜBİTAK, 16909, Dilbilim Temelli Türkçe Doğal Dil İşleme Platformu (Linguistics-Based Turkish Natural Language Processing Platform), national funds

URI

https://tulap.cmpe.boun.edu.tr/handle/20.500.12913/65
