The process of breaking text into smaller parts, like words or variables, to make it easier for machines to interpret and translate.
Tokenization refers to the process of breaking down text into smaller units, known as tokens.
In software localization and Natural Language Processing (NLP), tokenization plays a key role in analyzing, processing, and translating content accurately.
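As a rough illustration, a minimal rule-based tokenizer might look like the sketch below; the regular expression and the `tokenize` helper are assumptions chosen for this example, and production NLP tokenizers handle far more edge cases (contractions, Unicode scripts, language-specific rules):

```python
import re

def tokenize(text: str) -> list[str]:
    # Capture runs of word characters, or any single
    # non-space, non-word character (punctuation).
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Hello, world!"))
# ['Hello', ',', 'world', '!']
```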
In localization workflows, tokenization helps identify which parts of the text should be translated, preserved (e.g., code variables, names, placeholders), or treated differently.
Proper tokenization supports features like string segmentation, translation memory, and automatic formatting. Inaccurate token boundaries may cause translation errors or layout issues in the final localized product, whereas good tokenization keeps code and formatting intact when localizing dynamic UI content.
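To sketch how a localization pipeline might separate translatable text from protected tokens, the following example labels each span of a string. The `PLACEHOLDER` pattern and the `split_translatable` function are hypothetical names for this illustration, not any specific tool's API:

```python
import re

# Hypothetical pattern covering two common placeholder styles:
# curly-brace variables like {username} and printf-style %1.
PLACEHOLDER = re.compile(r"\{\w+\}|%\d+")

def split_translatable(segment: str) -> list[tuple[str, str]]:
    """Label each span as 'text' (translate) or 'placeholder' (preserve)."""
    spans, pos = [], 0
    for match in PLACEHOLDER.finditer(segment):
        if match.start() > pos:
            spans.append(("text", segment[pos:match.start()]))
        spans.append(("placeholder", match.group()))
        pos = match.end()
    if pos < len(segment):
        spans.append(("text", segment[pos:]))
    return spans

print(split_translatable("Welcome back, {username}! You have %1 new messages."))
# [('text', 'Welcome back, '), ('placeholder', '{username}'),
#  ('text', '! You have '), ('placeholder', '%1'),
#  ('text', ' new messages.')]
```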
In software development, tokenization can also refer to placeholder management, where variables (like `{username}` or `%1`) are inserted into translatable strings and need protection during translation.
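One common protection technique is to mask placeholders with opaque markers before translation and restore them afterwards, so a translator or machine translation engine never alters them. The sketch below reuses the same hypothetical `PLACEHOLDER` pattern; the `protect`/`restore` helpers and the `__PH0__` marker format are illustrative assumptions, not any specific platform's mechanism:

```python
import re

PLACEHOLDER = re.compile(r"\{\w+\}|%\d+")  # same hypothetical pattern as above

def protect(segment: str) -> tuple[str, dict[str, str]]:
    """Swap placeholders for opaque markers that translation won't touch."""
    mapping: dict[str, str] = {}
    def mask(match: re.Match) -> str:
        key = f"__PH{len(mapping)}__"
        mapping[key] = match.group()
        return key
    return PLACEHOLDER.sub(mask, segment), mapping

def restore(translated: str, mapping: dict[str, str]) -> str:
    """Reinsert the original placeholders after translation."""
    for key, original in mapping.items():
        translated = translated.replace(key, original)
    return translated

masked, mapping = protect("Hi {username}, you have %1 alerts.")
# masked == "Hi __PH0__, you have __PH1__ alerts."
print(restore(masked, mapping))
# "Hi {username}, you have %1 alerts."
```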
Localazy handles tokenized content safely, preserving placeholders during translation and helping translators focus only on what needs to be translated.