Tool: Tokenizer

dc.contributor.author: Ak, Buse
dc.contributor.author: Güngör, Tunga
dc.date.accessioned: 2023-03-03T22:21:39Z
dc.date.available: 2023-03-03T22:21:39Z
dc.date.issued: 2022-06-01
dc.description: Tokenization is the process of segmenting a text into tokens. Given a text, the tokenizer identifies the tokens (words, punctuation marks, etc.) within it and outputs each token separately. This step is necessary for applications that operate on a per-token basis.
dc.description.sponsorship: Boğaziçi University, 16909, Research Fund, ownFunds
dc.identifier.uri: https://tulap.cmpe.boun.edu.tr/handle/20.500.12913/53
dc.language.iso: Turkish
dc.publisher: Boğaziçi University
dc.source.uri: https://github.com/BOUN-TABILab-TULAP/tokenizer
dc.subject: Tokenization
dc.subject: Word splitting
dc.subject: Word segmentation
dc.title: Tokenizer
dc.type: toolService
dspace.entity.type: Tool
local.contact.person: Buse, Ak, buse.ak@boun.edu.tr, Boğaziçi University
local.demo.uri: https://tulap.cmpe.boun.edu.tr/demo/tokenizer
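The behavior described above (splitting a text into word and punctuation tokens) can be sketched with a minimal regex-based tokenizer. This is only an illustration of the concept, not the implementation used by the TULAP tool linked in dc.source.uri; the `tokenize` function name and the regex are assumptions for the sketch.

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens.

    A simple sketch: matches runs of word characters (Unicode-aware,
    so Turkish letters are handled) or single punctuation marks.
    This is NOT the actual TULAP tokenizer's algorithm.
    """
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Merhaba, dünya!"))
```

Running this on the Turkish greeting above yields the tokens `['Merhaba', ',', 'dünya', '!']`, i.e. the words and punctuation marks output separately, as the description specifies.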