In OpenAI's models, a token is a unit of text that represents a common sequence of characters. For example, the word "hamburger" gets broken up into the tokens "ham", "bur", and "ger", while a short, common word like "pear" is a single token¹. OpenAI's large language models use tokens to process and generate text.
The number of tokens in a piece of text is calculated by applying a tokenizer, which is a function that splits the text into tokens. Different models use different tokenizers, so the same text may have different token counts depending on the model. For example, newer models like GPT-3.5 and GPT-4 use a different tokenizer than previous models and will produce different tokens for the same input text.
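To see this in practice, here is a minimal sketch using the open-source tiktoken library (assumed to be installed with `pip install tiktoken`); the exact token splits and counts shown in the comments will vary by model, since each model maps to its own tokenizer.

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Return the number of tokens the given model's tokenizer produces for `text`."""
    encoding = tiktoken.encoding_for_model(model)  # look up the tokenizer used by that model
    return len(encoding.encode(text))

# Counts are illustrative; different models/tokenizers may split these words differently.
print(count_tokens("hamburger"))  # e.g. 3 tokens ("ham", "bur", "ger")
print(count_tokens("pear"))       # e.g. 1 token
```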
When a prompt exceeds the token limit, the model truncates the prompt to fit within that limit. For example, if the limit is 2048 tokens and the prompt is 2500 tokens, the model uses only the first 2048 tokens as input and ignores the rest. The output is also limited by a token count, and the model stops generating text when it reaches that limit. For example, if the output limit is 512 tokens and the model has already generated 500 tokens, it will generate at most 12 more tokens before stopping.
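Here is a minimal sketch of handling both limits on the client side, again with tiktoken; the 2048/512 figures are just the hypothetical budgets from the example above, not limits of any particular model.

```python
import tiktoken

PROMPT_LIMIT = 2048  # hypothetical prompt budget from the example above
OUTPUT_LIMIT = 512   # hypothetical output budget

def truncate_to_limit(text: str, limit: int, model: str = "gpt-3.5-turbo") -> str:
    """Keep only the first `limit` tokens of `text`, mirroring the truncation described above."""
    encoding = tiktoken.encoding_for_model(model)
    tokens = encoding.encode(text)
    return encoding.decode(tokens[:limit])

long_prompt = "some very long prompt text " * 500  # stand-in for an oversized prompt
safe_prompt = truncate_to_limit(long_prompt, PROMPT_LIMIT)

# The output side is capped by the `max_tokens` parameter of the completion request,
# e.g. passing max_tokens=OUTPUT_LIMIT when calling the OpenAI API.
```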
Hope this helps.
¹ Tokenizer, OpenAI Platform. https://platform.openai.com/tokenizer