Dataset:
TBMM Corpus

dc.contributor.author: Güngör, Onur
dc.contributor.author: Tiftikçi, Mert
dc.contributor.author: Sönmez, Çağıl
dc.date.accessioned: 2023-03-03T22:22:46Z
dc.date.available: 2023-03-03T22:22:46Z
dc.date.issued: 2018-05-01
dc.description: This corpus contains the transcripts of meetings of the Grand National Assembly of Turkey (TBMM), spanning nearly a century from 1920 to 2015. The corpus contains 208 million tokens in 12,645 documents. The documents are shared in an easily parsable format, together with accompanying code for viewing, browsing, and querying the corpus. This code is required because the raw corpus files do not store surface forms, only designated unique ids, which are resolved using the accompanying vocabulary file. Below are the first few tokens from the document that contains the transcripts of the first convention, which took place on 15/12/1965. 'TOPLANTI', 'T', 'B', 'M', 'M', 'TUTANAK', 'DERGİSİ', 'Cilt', '5', '15', '12', '1965', 'tarihli', '1', 'nci', 'Birleşimden', '6', '7', '1966', 'tarihli', '10', 'ncu', 'Birleşime', 'kadar', '1965', '1966', 'Fihrist', 'ÇEŞİTLİ', 'İŞLER', 'Sayfo', 'Sayfa', 'Aydın', 'Milletvekili', 'Reşat', 'Özarda', 'nın', 'Türkiye', 'Büyük', 'Millet', 'Meclisi', 'Birleşik', 'Toplantı', 'İçtüzüğünün', '14', 've', '16', 'ncı', 'maddelerinin', 'değiştirilmesi', 'hakkındaki', 'tüzük', 'teklifinin', 'Cumhuriyet', 'Senatosu', 'Adalet', 've', 'Anayasa', 've', 'Millet', 'Meclisi', 'Anayasa', 'komisyonlarından', 'seçilecek', '7', …
dc.description.sponsorship: Turkish Ministry of Development, DPT2007K120610, TAM Project, nationalFunds
dc.identifier.uri: https://tulap.cmpe.boun.edu.tr/handle/20.500.12913/66
dc.language.iso: Turkish
dc.publisher: Boğaziçi University
dc.relation.isreferencedby: http://lrec-conf.org/workshops/lrec2018/W2/pdf/19_W2.pdf
dc.rights: Apache License 2.0
dc.rights.uri: http://opensource.org/licenses/Apache-2.0
dc.subject: Parliamentary text
dc.subject: Historical records
dc.subject: Transcriptions
dc.title: TBMM Corpus
dc.type: corpus
dspace.entity.type: Dataset
local.contact.person: Onur, Güngör, onurgu@gmail.com, Boğaziçi University
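
The description says the raw corpus files store unique ids rather than surface forms, and that a vocabulary file resolves them. The following is a minimal sketch of that lookup idea only. It does not use the corpus's accompanying code, and the vocabulary layout (one "id<TAB>surface form" per line), the id values, and the names load_vocabulary and resolve are assumptions made for illustration; the actual file formats may differ.

```python
from typing import Dict, Iterable, List


def load_vocabulary(lines: Iterable[str]) -> Dict[int, str]:
    """Build an id -> surface form map.

    ASSUMPTION: each line looks like "<id>\t<surface form>".
    The real vocabulary file in the corpus may use another layout.
    """
    vocab: Dict[int, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        token_id, surface = line.split("\t", 1)
        vocab[int(token_id)] = surface
    return vocab


def resolve(ids: Iterable[int], vocab: Dict[int, str], unknown: str = "<UNK>") -> List[str]:
    """Replace each id with its surface form, or a placeholder if missing."""
    return [vocab.get(i, unknown) for i in ids]


# Toy data: the ids are invented; the surface forms are the first
# tokens quoted in the description.
vocab_lines = ["0\tTOPLANTI", "1\tT", "2\tB", "3\tM", "4\tTUTANAK", "5\tDERGİSİ"]
vocab = load_vocabulary(vocab_lines)

doc_ids = [0, 1, 2, 3, 3, 4, 5]
tokens = resolve(doc_ids, vocab)
print(tokens)
```

With a real vocabulary file, the same functions could be fed an open file handle in place of vocab_lines, provided the file matches the assumed layout.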
Files
Original bundle
Name: turkish-parliament-texts-0.4b.tar.gz
Size: 38.26 MB
Format: Unknown data format

Name: v0.4b.tar.gz
Size: 38.26 MB
Format: Unknown data format

Name: tbmm-corpus-v0.4b.tar.bz2
Size: 1.21 GB
Format: Unknown data format
Description: Unknown

License bundle
Name: license.txt
Size: 1.62 KB
Format: Plain Text