Tool: Tokenizer
| Field | Value |
|---|---|
| dc.contributor.author | Ak, Buse |
| dc.contributor.author | Güngör, Tunga |
| dc.date.accessioned | 2023-03-03T22:21:39Z |
| dc.date.available | 2023-03-03T22:21:39Z |
| dc.date.issued | 2022-06-01 |
| dc.description | Tokenization is the process of segmenting a text into tokens. Given a text, the tokenizer identifies the tokens (words, punctuation marks, etc.) within the text and outputs them separately. This step is necessary for applications that work on a per-token basis. |
| dc.description.sponsorship | Boğaziçi University, 16909, Research Fund, ownFunds |
| dc.identifier.uri | https://tulap.cmpe.boun.edu.tr/handle/20.500.12913/53 |
| dc.language.iso | Turkish |
| dc.publisher | Boğaziçi University |
| dc.source.uri | https://github.com/BOUN-TABILab-TULAP/tokenizer |
| dc.subject | Tokenization |
| dc.subject | Word splitting |
| dc.subject | Word segmentation |
| dc.title | Tokenizer |
| dc.type | toolService |
| dspace.entity.type | Tool |
| local.contact.person | Buse, Ak, buse.ak@boun.edu.tr, Boğaziçi University |
| local.demo.uri | https://tulap.cmpe.boun.edu.tr/demo/tokenizer |
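
The description above (dc.description) can be illustrated with a minimal sketch of rule-based tokenization. This is only an assumption-laden illustration of the general technique, not the actual implementation of the TULAP tool, which is available at the dc.source.uri link:

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens.

    A minimal regex-based sketch: runs of word characters become
    word tokens, and each remaining non-space character becomes a
    separate punctuation token. The TULAP tokenizer itself may use
    a different, more sophisticated approach.
    """
    # \w+ matches a run of word characters (Unicode-aware in Python 3,
    # so Turkish letters like ü and ğ are handled); [^\w\s] matches a
    # single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Merhaba, dünya!"))  # ['Merhaba', ',', 'dünya', '!']
```

A sketch like this illustrates the input/output contract the record describes: a raw text goes in, and a list of separated word and punctuation tokens comes out, ready for per-token processing.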