Tool: Tokenizer
dc.contributor.author | Ak, Buse
dc.contributor.author | Güngör, Tunga
dc.date.accessioned | 2023-03-03T22:21:39Z
dc.date.available | 2023-03-03T22:21:39Z
dc.date.issued | 2022-06-01
dc.description | Tokenization is the process of segmenting a text into tokens. Given a text, the tokenizer identifies the tokens (words, punctuation marks, etc.) within it and outputs each token separately. This step is necessary for applications that operate on a per-token basis.
dc.description.sponsorship | Boğaziçi University, 16909, Research Fund, ownFunds
dc.identifier.uri | https://tulap.cmpe.boun.edu.tr/handle/20.500.12913/53
dc.language.iso | Turkish
dc.publisher | Boğaziçi University
dc.source.uri | https://github.com/BOUN-TABILab-TULAP/tokenizer
dc.subject | Tokenization
dc.subject | Word splitting
dc.subject | Word segmentation
dc.title | Tokenizer
dc.type | toolService
dspace.entity.type | Tool
local.contact.person | Buse, Ak, buse.ak@boun.edu.tr, Boğaziçi University | |
local.demo.uri | https://tulap.cmpe.boun.edu.tr/demo/tokenizer |
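As a rough illustration of the tokenization process described above, here is a minimal regex-based sketch in Python. This is not the tool's actual implementation (which lives in the linked GitHub repository); the function name and the pattern are illustrative assumptions.

```python
import re

def tokenize(text):
    # Greedily match runs of word characters, otherwise a single
    # non-space character (punctuation). A simplistic stand-in for
    # a real tokenizer; \w covers Turkish letters in Python 3.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Merhaba, dünya!"))  # ['Merhaba', ',', 'dünya', '!']
```

A production tokenizer must also handle abbreviations, numbers, dates, and multi-word expressions, which a single regular expression cannot capture reliably.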