Dataset: TBMM Corpus
dc.contributor.author | Güngör, Onur | |
dc.contributor.author | Tiftikçi, Mert | |
dc.contributor.author | Sönmez, Çağıl | |
dc.date.accessioned | 2023-03-03T22:22:46Z | |
dc.date.available | 2023-03-03T22:22:46Z | |
dc.date.issued | 2018-05-01 | |
dc.description | This corpus contains the transcripts of the meetings of the Grand National Assembly of Turkey (TBMM), spanning nearly a century from 1920 to 2015. The corpus contains 208 million tokens in 12645 documents. The documents are shared in an easily parsable format, together with accompanying code for viewing, browsing, and querying the corpus. This code is required because the raw corpus files do not store surface forms; they store only unique ids, which are resolved using the accompanying vocabulary file. Below are the first few tokens from the document that contains the transcripts of the first convention, which took place on 15/12/1965. 'TOPLANTI', 'T', 'B', 'M', 'M', 'TUTANAK', 'DERGİSİ', 'Cilt', '5', '15', '12', '1965', 'tarihli', '1', 'nci', 'Birleşimden', '6', '7', '1966', 'tarihli', '10', 'ncu', 'Birleşime', 'kadar', '1965', '1966', 'Fihrist', 'ÇEŞİTLİ', 'İŞLER', 'Sayfo', 'Sayfa', 'Aydın', 'Milletvekili', 'Reşat', 'Özarda', 'nın', 'Türkiye', 'Büyük', 'Millet', 'Meclisi', 'Birleşik', 'Toplantı', 'İçtüzüğünün', '14', 've', '16', 'ncı', 'maddelerinin', 'değiştirilmesi', 'hakkındaki', 'tüzük', 'teklifinin', 'Cumhuriyet', 'Senatosu', 'Adalet', 've', 'Anayasa', 've', 'Millet', 'Meclisi', 'Anayasa', 'komisyonlarından', 'seçilecek', '7', … | |
dc.description.sponsorship | Turkish Ministry of Development, DPT2007K120610, TAM Project, nationalFunds | |
dc.identifier.uri | https://tulap.cmpe.boun.edu.tr/handle/20.500.12913/66 | |
dc.language.iso | Turkish | |
dc.publisher | Boğaziçi University | |
dc.relation.isreferencedby | http://lrec-conf.org/workshops/lrec2018/W2/pdf/19_W2.pdf | |
dc.rights | Apache License 2.0 | |
dc.rights.uri | http://opensource.org/licenses/Apache-2.0 | |
dc.subject | Parliamentary text | |
dc.subject | Historical records | |
dc.subject | Transcriptions | |
dc.title | TBMM Corpus | |
dc.type | corpus | |
dspace.entity.type | Dataset | |
local.contact.person | Onur, Güngör, onurgu@gmail.com, Boğaziçi University |
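Note on id resolution: the description states that the raw corpus files hold unique ids rather than surface forms, and that the ids are resolved through the accompanying vocabulary file. The sketch below only illustrates that lookup. The on-disk layout is an assumption made here, not something this record documents: the vocabulary is taken to hold one surface form per line, with the id equal to the 0-based line index, and a document is taken to be a sequence of integer ids. The function names `load_vocabulary` and `resolve_ids` are hypothetical and are not part of the code shipped with the corpus.

```python
from typing import Iterable, List, Sequence


def load_vocabulary(lines: Iterable[str]) -> List[str]:
    """Build an id -> surface form list.

    ASSUMPTION: one surface form per line, id = 0-based line index.
    The real vocabulary file may be laid out differently.
    """
    return [line.rstrip("\n") for line in lines]


def resolve_ids(token_ids: Sequence[int], vocabulary: Sequence[str]) -> List[str]:
    """Map each id to its surface form; ids outside the vocabulary become '<UNK>'."""
    return [
        vocabulary[i] if 0 <= i < len(vocabulary) else "<UNK>"
        for i in token_ids
    ]


# Toy data built from the first tokens quoted in the description.
# The id values are made up for illustration only.
toy_vocabulary = load_vocabulary(
    ["TOPLANTI\n", "T\n", "B\n", "M\n", "TUTANAK\n", "DERGİSİ\n"]
)
toy_document = [0, 1, 2, 3, 3, 4, 5]

tokens = resolve_ids(toy_document, toy_vocabulary)
print(tokens)
```

For actual use, the viewing and querying code bundled with the corpus is the reference for the real file layout.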
Files
Original bundle
Name | Size | Format | Description
turkish-parliament-texts-0.4b.tar.gz | 38.26 MB | Unknown data format |
v0.4b.tar.gz | 38.26 MB | Unknown data format |
tbmm-corpus-v0.4b.tar.bz2 | 1.21 GB | Unknown data format | Unknown