Tool: Tokenizer
| Field | Value |
|---|---|
| dc.contributor.author | Ak, Buse |
| dc.contributor.author | Güngör, Tunga |
| dc.date.accessioned | 2023-03-03T22:21:39Z |
| dc.date.available | 2023-03-03T22:21:39Z |
| dc.date.issued | 2022-06-01 |
| dc.description | Tokenization is the process of segmenting a text into tokens. Given a text, the tokenizer identifies the tokens (words, punctuation marks, etc.) within the text and outputs them separately. This step is necessary for applications that work on a per-token basis. |
| dc.description.sponsorship | Boğaziçi University, 16909, Research Fund, ownFunds |
| dc.identifier.uri | https://tulap.cmpe.boun.edu.tr/handle/20.500.12913/53 |
| dc.language.iso | Turkish |
| dc.publisher | Boğaziçi University |
| dc.source.uri | https://github.com/BOUN-TABILab-TULAP/tokenizer |
| dc.subject | Tokenization |
| dc.subject | Word splitting |
| dc.subject | Word segmentation |
| dc.title | Tokenizer |
| dc.type | toolService |
| dspace.entity.type | Tool |
| local.contact.person | Buse, Ak, buse.ak@boun.edu.tr, Boğaziçi University |
| local.demo.uri | https://tulap.cmpe.boun.edu.tr/demo/tokenizer |
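
The description above (dc.description) can be illustrated with a minimal sketch of rule-based tokenization. This is only an assumption-laden illustration of the general technique, not the actual implementation of the TULAP tool, which is available at the dc.source.uri link:

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens.

    A minimal regex-based sketch: runs of word characters become
    word tokens, and each remaining non-space character becomes a
    separate punctuation token. The TULAP tokenizer itself may use
    a different, more sophisticated approach.
    """
    # \w+ matches a run of word characters (Unicode-aware in Python 3,
    # so Turkish letters like ü and ğ are handled); [^\w\s] matches a
    # single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Merhaba, dünya!"))  # ['Merhaba', ',', 'dünya', '!']
```

A sketch like this illustrates the input/output contract the record describes: a raw text goes in, and a list of separated word and punctuation tokens comes out, ready for per-token processing.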