High Signal AI

The Evolution of Tokenization - Byte Pair Encoding in NLP

Introduction to Byte Pair Encoding

Harshit Tyagi
Oct 03, 2021

NLP may have been a little late to the AI epiphany, but it is now doing wonders, with organisations like Google and OpenAI releasing state-of-the-art (SOTA) language models such as BERT and GPT-2/3, respectively.

GitHub Copilot and OpenAI Codex are among the popular applications currently in the news. As someone who has very limited exposure to NLP, I decided to…
