Building a transformer model from scratch is often the only option for more specialized use cases. Although BERT and other transformer models have been pre-trained for many languages and domains, they do not cover everything.
Often, it is these less common use cases, such as an uncommon language or a specialized domain, that stand to gain the most from a purpose-built transformer model.
BERT is the most popular transformer for a wide range of language-based machine learning tasks, from sentiment analysis to question answering. It has enabled a diverse range of innovation across many languages and industries.
The first step for many in designing a new BERT model is the tokenizer. In this article, we’ll look at the WordPiece tokenizer used by BERT — and see how we can build our own from scratch.
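Before building the tokenizer itself, it helps to see the core rule WordPiece applies once a vocabulary exists: each word is split greedily, longest-match-first, with non-initial pieces marked by a `##` prefix. The sketch below is a minimal pure-Python illustration of that matching step; the tiny vocabulary and the function name `wordpiece_tokenize` are illustrative, not part of any library.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word.

    A toy sketch of the inference-time rule: repeatedly take the longest
    prefix of the remaining characters that exists in the vocabulary,
    prefixing non-initial pieces with '##'.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # non-initial pieces carry the ## marker
            if piece in vocab:
                match = piece
                break
            end -= 1  # shrink the candidate until it is in the vocab
        if match is None:
            return [unk]  # no piece matched: the whole word is unknown
        tokens.append(match)
        start = end
    return tokens


# A toy vocabulary, purely for demonstration
vocab = {"hug", "face", "##ging", "##s"}
print(wordpiece_tokenize("hugging", vocab))  # → ['hug', '##ging']
```

The real tokenizer we build in this article learns the vocabulary from a corpus; this snippet only shows how that vocabulary is applied to text afterwards.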