Dataset: TBMM Corpus
Date
2018-05-01
Publisher
Boğaziçi University
Contact Person
Onur, Güngör, onurgu@gmail.com, Boğaziçi University
Description
This corpus contains the transcripts of the meetings of the Grand National Assembly of Turkey (TBMM), spanning nearly a century from 1920 to 2015. It comprises 208 million tokens in 12,645 documents. The documents are shared in an easily parsable format, together with accompanying code for viewing, browsing, and querying the corpus. This code is required because the raw corpus files store not the surface forms of the tokens but only unique IDs, which are resolved to surface forms using the accompanying vocabulary file.
Below are the first few tokens from the document that contains the transcript of the first session, which took place on 15/12/1965.
'TOPLANTI', 'T', 'B', 'M', 'M', 'TUTANAK', 'DERGİSİ', 'Cilt', '5', '15', '12', '1965', 'tarihli', '1', 'nci', 'Birleşimden', '6', '7', '1966', 'tarihli', '10', 'ncu', 'Birleşime', 'kadar', '1965', '1966', 'Fihrist', 'ÇEŞİTLİ', 'İŞLER', 'Sayfo', 'Sayfa', 'Aydın', 'Milletvekili', 'Reşat', 'Özarda', 'nın', 'Türkiye', 'Büyük', 'Millet', 'Meclisi', 'Birleşik', 'Toplantı', 'İçtüzüğünün', '14', 've', '16', 'ncı', 'maddelerinin', 'değiştirilmesi', 'hakkındaki', 'tüzük', 'teklifinin', 'Cumhuriyet', 'Senatosu', 'Adalet', 've', 'Anayasa', 've', 'Millet', 'Meclisi', 'Anayasa', 'komisyonlarından', 'seçilecek', '7', …
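The resolution of token IDs to surface forms described above can be sketched roughly as follows. This is only an illustration and not the code distributed with the corpus: the vocabulary layout (one tab-separated ID and surface form per line), the function names load_vocabulary and resolve, and the toy IDs are all assumptions made for the example. The actual file formats should be taken from the accompanying code.

```python
from typing import Dict, Iterable, List


def load_vocabulary(lines: Iterable[str]) -> Dict[int, str]:
    """Build an ID -> surface form mapping.

    Assumed (hypothetical) format: one "<id>\t<surface form>" entry per line.
    """
    vocab: Dict[int, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        token_id, surface = line.split("\t", 1)
        vocab[int(token_id)] = surface
    return vocab


def resolve(token_ids: Iterable[int], vocab: Dict[int, str],
            unknown: str = "<UNK>") -> List[str]:
    """Replace each token ID with its surface form; unseen IDs map to `unknown`."""
    return [vocab.get(token_id, unknown) for token_id in token_ids]


# Toy vocabulary and document, invented for illustration only.
vocab_lines = ["0\tTOPLANTI", "1\tT", "2\tB", "3\tM", "4\tTUTANAK", "5\tDERGİSİ"]
vocab = load_vocabulary(vocab_lines)

document_ids = [0, 1, 2, 3, 3, 4, 5]
tokens = resolve(document_ids, vocab)
print(tokens)
```

Storing IDs instead of strings keeps the raw files compact, at the cost of needing the vocabulary file for every read. With a real vocabulary file, the same load_vocabulary function could be passed an open file handle instead of a list of strings.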
Keywords
Parliamentary text, Historical records, Transcriptions
Sponsor
Turkish Ministry of Development, DPT2007K120610, TAM Project, nationalFunds