Tool:
Tokenizer

Date
2022-06-01
Publisher
Boğaziçi University
Contact Person
Buse, Ak, buse.ak@boun.edu.tr, Boğaziçi University
Description
Tokenization is the process of segmenting a text into tokens. Given a text, the tokenizer identifies the tokens (words, punctuation marks, etc.) within it and outputs each token separately. This step is necessary for applications that operate on a per-token basis.
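
As a rough illustration of the idea described above (a sketch only, not the tool's actual implementation), a minimal regex-based tokenizer in Python could separate word tokens from punctuation marks like this:

```python
import re

def tokenize(text):
    # Match either a run of word characters (a word token)
    # or a single character that is neither a word character
    # nor whitespace (a punctuation token).
    return re.findall(r"\w+|[^\w\s]", text)

# Example on a Turkish sentence; Python's \w matches Unicode
# letters by default, so accented characters stay intact.
print(tokenize("Merhaba, dünya!"))  # ['Merhaba', ',', 'dünya', '!']
```

A real tokenizer must also handle language-specific cases (abbreviations, clitics, numbers with separators), which a single regular expression like this does not cover.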
Keywords
Tokenization, Word splitting, Word segmentation
Sponsor
Boğaziçi University, 16909, Research Fund, ownFunds