Turkish Word Embeddings

dc.contributor.author	Güngör, Onur
dc.contributor.author	Yıldız, Eray
dc.date.accessioned	2023-03-03T22:22:45Z
dc.date.available	2023-03-03T22:22:45Z
dc.date.issued	2017-05-01
dc.description	This resource is a set of Turkish word embeddings learned with the skip-gram algorithm. A corpus of 940 million tokens was used to obtain 2 million word embeddings. The corpus was built by collecting text from several online Turkish sources such as news outlets, forums, blogs, and e-books. The package contains both the embeddings and the corpus. Each line of the embeddings file contains one word surface form and the 300 values that make up the dimensions of its embedding. The corpus is a single file with one sentence per line.
dc.identifier.uri	https://tulap.cmpe.boun.edu.tr/handle/20.500.12913/62
dc.language.iso	Turkish
dc.publisher	Huawei Türkiye Ar-Ge Merkezi
dc.publisher	Boğaziçi University
dc.relation.isreferencedby	https://ieeexplore.ieee.org/document/7960223
dc.rights	Apache License 2.0
dc.rights.uri	http://opensource.org/licenses/Apache-2.0
dc.subject	Word embedding
dc.subject	Word2Vec
dc.subject	Skip-gram
dc.subject	Negative sampling
dc.title	Turkish Word Embeddings
dc.type	corpus
dspace.entity.type	Dataset
local.contact.person	Onur, Güngör, onurgu@gmail.com, Boğaziçi University
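
The embeddings file layout given in dc.description (one word surface form plus 300 values per line) could be read with a sketch like the following. Everything not stated in the record is an assumption: that the word comes first on the line, that fields are separated by whitespace, that the file is UTF-8 text, and that a word2vec-style header line may or may not be present. The function name `parse_embedding_lines` is hypothetical and is not part of the dataset.

```python
from typing import Dict, Iterable, List


def parse_embedding_lines(lines: Iterable[str], dim: int = 300) -> Dict[str, List[float]]:
    """Parse lines of the form '<word> <v1> ... <v_dim>' into a dict.

    Assumption: fields are whitespace-separated and the word comes first.
    Lines that do not have exactly dim + 1 fields (for example a
    word2vec-style header such as '2000000 300') are skipped.
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # header, blank, or malformed line
        word, values = parts[0], parts[1:]
        try:
            vectors[word] = [float(v) for v in values]
        except ValueError:
            continue  # a field that should be numeric was not
    return vectors
```

To use it on the real file, pass an open file object, for example `parse_embedding_lines(open(path, encoding="utf-8"))`, where `path` is wherever the embeddings file was unpacked. Holding 2 million 300-dimensional vectors as Python lists takes a lot of memory, so a NumPy array or a streaming filter over a target vocabulary may be preferable in practice.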