The Evolution of Tokenization - Byte Pair Encoding in NLP
Introduction to Byte Pair Encoding
NLP may have been a little late to the AI epiphany but it is doing wonders with organisations like Google, OpenAI releasing state-of-the-art(SOTA) language models like BERT and GPT-2/3 respectively.
GitHub Copilot and OpenAI codex are among a few very popular applications that are in the news. As someone who has very limited exposure to NLP, I decided to…



